3 minute read

Tags: ,

Microsoft Azure Coursera


  • Data abundance

    • Q: Structured data can be stored in Azure Blob storage, Azure SQL Database, and Azure SQL Data Warehouse.
      • A: True

        A data engineer deals with data that can be structured, unstructured, or aggregated. Structured data can be stored in Azure Blob storage, Azure SQL Database, and Azure SQL Data Warehouse.

  • The difference between on-premises and cloud-based servers

    • On-premises environments
      • Computing environment
      • Licensing
      • Maintenance
      • Scalability
      • Availability
      • Support
      • Multilingual support
      • Total cost of ownership
    • Cloud-based
      • 同上
      • 微軟好棒棒

Microsoft Certified: Azure Data Engineer Associate

Exam DP-203: Data Engineering on Microsoft Azure

Course 1 Recap

  • Structured data
    • Relational database DB
  • Unstructured data
    • Non-relational systems
    • no sql system
  • Imgur
  • Imgur

  • Azure Storage

    • four configuration options
      • Imgur
        • Azure Blob
        • Azure Files
        • Azure Queue
        • Azure Table
    • Tools ingest data
      • Imgur
        • Azure Data Factory
        • Storage Explorer
        • AzCopy tool
        • PowerShell
        • Visual Studio
  • Azure Data Lake Storage
    • Scalability
    • Hadoop compatible
    • Security support
    • POSIX compliant
    • ABFS driver
    • Zone-redundant
    • Geo-redundant
  • Azure Cosmos DB
    • ingest
    • json
    • queries
  • Azure SQL Database

  • Quiz

    • Question 1: In Azure Synapse Analytics, the Data Movement Service (DMS) coordinates and transports data between compute nodes as necessary. Azure Synapse Analytics supports which types of distributed tables.

      • Replicated
      • Round Robin
      • Hash
    • Question 2: Data that is continuously sent to a system for further processing in real-time is an example of which of the following data types?
      • Streaming data
    • Question 3: As part of an ETL process, data in date-formats may be modified to match the destination format. At what step in the ETL process is this carried out?
      • Transformation
    • Question 4: Which of the following describes the collection of real-time telemetry data?
      • Stream Processing
    • Question 5: Which of the following is an Azure database product that works with non-relational data such as JSON data?
      • Cosmos DB
    • Question 6: Which storage solution supports role-based access control (RBAC) at the file and folder level?
      • Azure Data Lake Storage
    • Question 7: What is the purpose of data ingestion?
      • To capture data flowing into a data warehouse as quickly as possible.
    • Question 8: Data Engineers configure ingestion components of Azure Stream Analytics by configuring data inputs from sources including Azure Event Hubs, Azure IoT Hub, or Azure Blob storage. Which of the following is a feature of Azure Event Hub?
      • Big data streaming service
    • Question 9: Azure Data Lake Storage Gen 2 is designed to store massive amounts of data for big-data analytics. Features of Azure Data Lake include which of the following.

      • Unlimited scalability
      • Security support for access control lists (ACLs)
      • Hadoop compatibility
    • Question 10: Synapse SQL offers both serverless and dedicated resource models to work with both descriptive and diagnostic analytical scenarios. To provide predictable performance and cost, you should create which of the following?

      • Serverless SQL endpoints

Course 2 Recap

  • structured
  • semi-structured
  • Unstructured

  • Data Serialization
    • type of data
    • operations
    • expected latency
    • transactional support
  • ACID (transaction)
    • Atomicity
    • Consistency
    • Isolation
    • Durability
  • Azure Services
    • SQL Database
    • Analysis Services
    • Cosmos DB

    • Blob storage
      • photos
      • videos

Imgur Imgur

  • Azure Active Directory (AD) and RBAC support Imgur

  • Shared Access Signature
    • third-party apps or untrusted client
    • Imgur
  • Storage Three Type blobs

    • block blobs
    • append blobs
    • page blobs

Course 3 Recap

  • Imgur
  • Imgur


  • examtopics

  • Imgur

    • Parquet
      • Parquet stores data in columns, while Avro stores data in a row-based format. By their very nature, column-oriented data stores are optimized for read-heavy analytical workloads, while row-based databases are best for write-heavy transactional workloads.
    • Avro
      • An Avro schema is created using JSON format. AVRO supports timestamps. Note: Azure Data Factory supports the following file formats (not GZip or TXT).

Synapse Analytics

  • Design tables using Synapse SQL in Azure Synapse Analytics

Slowly Choosing Dimension (SCD)

  • Star schema design theory refers to common SCD types. The most common are Type 1 and Type 2. In practice a dimension table may support a combination of history tracking methods, including Type 3 and Type 6. Let’s get to know the difference in these SCD types.




  • You have an enterprise data warehouse in Azure Synapse Analytics. Using PolyBase, you create an external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2 without importing the data to the data warehouse. The external table has three columns. You discover that the Parquet files have a fourth column named ItemID. Which command should you run to add the ItemID column to the external table?

  • Imgur