Orchestrating Big Data Solutions with Azure Data Factory
Lab 1 - Getting Started with Azure Data Factory

Overview
In this lab, you will provision an Azure Data Factory, and use the Copy Wizard to copy data from a file in Azure Blob Storage to a table in Azure SQL Database.

What You’ll Need
To complete the labs, you will need the following:
• A web browser
• A Microsoft account
• A Microsoft Azure subscription
• A Windows, Linux, or Mac OS X computer
• The lab files for this course

Note: To set up the required environment for the lab, follow the instructions in the Setup document for this course.

Exercise 1: Provisioning Azure Resources
In this exercise, you will create the Azure Storage account, Azure SQL Database instance, and Azure Data Factory instance.

Note: The Microsoft Azure portal is continually improved in response to customer feedback. The steps in this exercise reflect the user interface of the Microsoft Azure portal at the time of writing, but may not match the latest design of the portal exactly.

Create a Storage Account and a Blob Container
The source data for your data pipeline will be stored in an Azure storage account:
1. In the Microsoft Azure portal, in the menu, click New. Then in the Storage menu, click Storage account.
2. In the Create storage account blade, enter the following settings and click Create:
   • Name: Enter a unique name (and make a note of it!)
   • Deployment model: Resource manager
   • Account kind: General purpose
   • Performance: Standard
   • Replication: Locally-redundant storage (LRS)
   • Storage service encryption: Disabled
   • Subscription: Select your Azure subscription
   • Resource group: Create a new resource group with a unique name
   • Location: Select any available region
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the storage account to be deployed (this can take a few minutes).
4. After the storage account has been created, browse to its blade in the Azure portal.
5. On the blade for your storage account, click Blobs, and add a container with the following properties:
   • Name: adf-data
   • Access type: Private
6. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the container to be created (this should take a few seconds).
7. After the container has been created, return to the blade for your storage account, and click Access keys. Note that this blade lists the storage account name and two keys that client applications can use for authentication when connecting (an optional connectivity check using these values is sketched after this procedure).
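If you want to confirm the storage account name and key outside the portal, the following minimal sketch uses the azure-storage-blob Python package (not part of the lab files) to list the containers in the account. The account name and key are placeholders for the values you noted from the Access keys blade; this is an optional check under those assumptions, not a required lab step.

# Optional check (not a lab step): list containers to confirm adf-data exists.
# The account name and key below are placeholders for your own values.
from azure.storage.blob import BlobServiceClient

account_name = "<your-storage-account-name>"   # placeholder
account_key = "<key1-from-access-keys-blade>"  # placeholder

service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)

for container in service.list_containers():
    print(container.name)   # the list should include adf-data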

Create an Azure SQL Database
Your data pipeline will copy the source data to an Azure SQL Database. SQL databases are hosted in servers, so you will create both a database and a server to host it.
1. In the Microsoft Azure portal, in the menu, click New. Then in the Databases menu, click SQL Database.
2. In the SQL Database blade, enter the following settings, and then click Create:
   • Database name: DataDB
   • Subscription: Select your Azure subscription
   • Resource Group: Select the resource group you created previously
   • Select source: Blank database
   • Server: Create a new server with the following settings:
     • Server name: Enter a unique name (and make a note of it!)
     • Server admin login: Enter a user name of your choice (and make a note of it!)
     • Password: Enter and confirm a strong password (and make a note of it!)
     • Region: Select the same location as your storage account
     • Allow Azure services to access server: Selected
   • Elastic pool: Not enabled
   • Pricing tier: View all and select Basic
   • Collation: SQL_Latin1_General_CP1_CI_AS
   • Pin to dashboard: Unselected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the SQL database to be deployed (this can take a few minutes).
4. After the database has been created, browse to your Azure SQL server (not the database) and under Settings, click Properties.
5. Note the fully qualified name of your server (which should take the form server.database.windows.net, where server is the server name you specified earlier) and the server admin user name (which should be the login you specified earlier). An optional connectivity check using these values is sketched after this procedure.
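To confirm that the fully qualified server name and the admin credentials work from your own computer, the following minimal sketch uses Python and the pyodbc package; it is optional and not part of the lab. The server name, login, password, and ODBC driver name are placeholders or assumptions, and connecting from a local client also requires a firewall rule for your client IP address on the server (the "Allow Azure services to access server" setting only covers services running inside Azure).

# Optional connectivity check (not a lab step). Requires pyodbc, a SQL Server
# ODBC driver, and a server firewall rule that allows your client IP address.
import pyodbc

server = "<your-server>.database.windows.net"   # placeholder: the fully qualified name noted above

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"     # assumed driver name; use whichever SQL Server ODBC driver you have installed
    f"SERVER={server};DATABASE=DataDB;"
    "UID=<your-admin-login>;PWD=<your-password>"  # placeholders
)
print(conn.execute("SELECT @@VERSION;").fetchone()[0])
conn.close()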

Create an Azure Data Factory
Now that you have your data stores in place, you are ready to create an Azure Data Factory.

1. In the Microsoft Azure portal, in the menu, click New. Then in the Data + Analytics menu, click Data Factory.
2. In the New data factory blade, enter the following settings, and then click Create:
   • Name: Enter a unique name (and make a note of it!)
   • Subscription: Select your Azure subscription
   • Resource Group: Select the resource group you created previously
   • Location: Select the location you specified for your storage account (if it is not available, select any other location)
   • Pin to dashboard: Unselected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the data factory to be deployed (this can take a few minutes).

Exercise 2: Using the Azure Data Factory to Copy Data
For simple data copy pipelines, Azure Data Factory provides an easy-to-use wizard. In this exercise, you will use the wizard to copy data from your Azure Blob Storage account to your Azure SQL Database.

Upload a Data File to the Blob Container
The source data is a comma-delimited text file containing details of sales transactions.
1. In the data subfolder of the folder where you extracted the lab files for this course, open the transactions.txt file in a text editor.
2. Review the data this file contains, which consists of multiple rows of dates and amounts. Then close the text editor without saving any changes.
3. Start Azure Storage Explorer, and if you are not already signed in, sign into your Azure subscription.
4. Expand your storage account and the Blob Containers folder, and then double-click the adf-data blob container you created in the previous procedure.
5. In the Upload drop-down list, click Folder. Then upload the data folder (which contains the transactions.txt file) as a block blob to the root of the container. (An optional SDK-based alternative to steps 3–5 is sketched below.)
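If Azure Storage Explorer is not available, the same upload can be done programmatically. The following minimal sketch uses the azure-storage-blob Python package and assumes transactions.txt is in a local data subfolder, with the account name and key as placeholders for your own values; it is an optional alternative, not a required lab step.

# Optional alternative to steps 3-5: upload data/transactions.txt so that it
# lands at the blob path data/transactions.txt inside the adf-data container.
# The account name and key below are placeholders for your own values.
from azure.storage.blob import BlobServiceClient

account_name = "<your-storage-account-name>"   # placeholder
account_key = "<key1-from-access-keys-blade>"  # placeholder

service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)
blob = service.get_blob_client(container="adf-data", blob="data/transactions.txt")

with open("data/transactions.txt", "rb") as source_file:
    blob.upload_blob(source_file, overwrite=True)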

Create a Table in the Database
You will copy the sales transaction data to a table named transactions, which contains id, tdate, and amount fields.
1. Click All Resources, and then click your Azure SQL Database.
2. On the database blade, click Tools. Then on the Tools blade, click Query editor. This opens the web-based query interface for your Azure SQL Database.
3. In the toolbar for the query editor, click Login, and then log into your database using SQL Server authentication, entering the login name and password you specified when provisioning the Azure SQL Database server.
4. In the query editor, enter the following Transact-SQL query to create a table named transactions in your database:

   CREATE TABLE transactions(id int identity, tdate date, amount decimal);

5. Click Run to run the Transact-SQL statement.

Use the Azure Data Factory Copy Wizard to Copy the Data
1. In the Microsoft Azure portal, browse to the blade for your data factory, and click the Copy data tile. This opens a new tab in your browser.
2. On the Properties page of the Copy Data wizard, enter the following details and then click Next:
   • Task name: Wizard Copy
   • Task description: Copy transactions
   • Task cadence (or) Task schedule: Run once now
   • Expiration time: 3:00:00:00
3. On the Source data store page, on the Connect to a Data Store tab, select Azure Blob Storage. Then click Next.
4. On the Specify the Azure Blob storage account page, enter the following details and then click Next:
   • Connection name: blob-store
   • Account selection method: From Azure subscriptions
   • Azure subscription: Select your subscription
   • Storage account name: Select your storage account
5. On the Choose the input file or folder page, double-click the adf-data blob container you created previously, and then select the data folder (which contains the transactions.txt file). Then click Choose, and click Next.
6. On the File format settings page, wait a few seconds for the data to be read, and then verify the following details, ensuring that the rows of data in the Preview section match the table below, and click Next:
   • File format: Text format
   • Column delimiter: Comma (,)
   • Row delimiter: Carriage return and line feed (\r\n)
   • Skip line count: 0
   • Column names in first data row: Selected
   • Treat empty column value as null: Selected
   • Preview:

     tdate          amount
     2016-01-01     129.99
     2016-01-01     125.49
     2016-01-01     99.75
     …

7. On the Destination data store page, on the Connect to a Data Store tab, select Azure SQL Database. Then click Next.
8. On the Specify the Azure SQL database page, enter the following details and then click Next:
   • Connection name: sql-database
   • Server / database selection method: From Azure subscriptions
   • Azure subscription: Select your subscription
   • Server name: Select your Azure SQL server
   • Database name: DataDB
   • User name: The server admin login name you specified when creating the database
   • Password: The password for your Azure SQL server admin login
9. On the Table mapping page, in the Destination list, select [dbo].[transactions] and click Next.
10. On the Schema mapping page, ensure that the following settings are selected, and click Next:

    Blob path: adf-data/data/    [dbo].[transactions]    Include this column
    tdate (DateTime)             tdate (DateTime)        ✓
    amount (Double)              amount (Decimal)        ✓

    Repeatability settings: Method: None

11. On the Performance settings page, expand Advanced settings to review the default values. Then click Next.
12. On the Summary page, click Finish.
13. On the Deploying page, wait for the deployment to complete.
14. Wait a few minutes to allow the pipeline created by the wizard to run.

Verify that the Data Has Been Copied
The Copy Data wizard should have created a pipeline, and run it to copy the transactions data from your blob store to your Azure SQL Database.
1. Return to the Query editor for your Azure SQL Database and run the following query:

   SELECT * FROM dbo.transactions;

2. Verify that the table now contains 10 rows of transaction data, copied from the text file in your blob store. (An optional programmatic check is sketched below.)

Note: You will use the resources you created in this lab when performing the next lab, so do not delete them.
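If you would rather verify the copy from outside the portal query editor, the following minimal sketch uses Python and pyodbc to count the copied rows; the connection values and driver name are placeholders, as in the earlier connectivity check, and this step is optional.

# Optional verification (not a lab step): count the rows copied into
# dbo.transactions. Server, login, password, and driver name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=DataDB;"
    "UID=<your-admin-login>;PWD=<your-password>"
)
row_count = conn.execute("SELECT COUNT(*) FROM dbo.transactions;").fetchone()[0]
print(row_count)   # the lab's transactions.txt file should yield 10 rows
conn.close()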
