ETL Processing on Google Cloud Using Dataflow and BigQuery
Overview
In this lab you build several data pipelines that ingest data from a publicly available dataset into BigQuery, using these Google Cloud services:
Cloud Storage
Dataflow
BigQuery
You will create your own data pipeline, working through the design considerations as well as the implementation details, to ensure that your prototype meets the requirements. Be sure to open the Python files and read the comments when instructed to.
Task 1. Ensure that the Dataflow API is successfully enabled
To ensure access to the necessary API, restart the connection to the Dataflow API.
In the Cloud Console, enter "Dataflow API" in the top search bar. Click on the result for Dataflow API.
Click Manage.
Click Disable API.
If asked to confirm, click Disable.
Click Enable.
When the API has been enabled again, the page will show the option to disable it.
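If you prefer the command line, the same disable/re-enable cycle can be done with gcloud from Cloud Shell (a sketch; this assumes your Cloud Shell session is already authenticated against the lab project):

# Disable the Dataflow API (--force also disables any services that depend on it).
gcloud services disable dataflow.googleapis.com --force
# Re-enable the Dataflow API.
gcloud services enable dataflow.googleapis.com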
Task 2. Download the starter code
Run the following command to get Dataflow Python Examples from Google Cloud's professional services GitHub:
gsutil -m cp -R gs://spls/gsp290/dataflow-python-examples .
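You can list the downloaded directory to confirm the copy succeeded (an optional check; the exact contents may vary):

# Show the example pipeline files that were just copied down.
ls -R dataflow-python-examples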
Now set a variable equal to your project ID. The value below is this lab's example; substitute your own project ID:
export PROJECT=qwiklabs-gcp-02-6bcb441b1e26
gcloud config set project $PROJECT
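To confirm that the variable and the active gcloud configuration are set as expected (optional):

# Both commands should print the same project ID.
echo $PROJECT
gcloud config get-value project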
Task 3. Create Cloud Storage Bucket
Use the make bucket (mb) command to create a new regional bucket in the us-east1 region within your project:
gsutil mb -c regional -l us-east1 gs://$PROJECT
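You can verify the bucket's location and storage class with gsutil (optional):

# List the bucket's metadata; the location constraint should read US-EAST1.
gsutil ls -L -b gs://$PROJECT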
Test completed task
Click Check my progress to verify your performed task.
Create a Cloud Storage Bucket
Task 4. Copy files to your bucket
Use the gsutil command to copy files into the Cloud Storage bucket you just created:
gsutil cp gs://spls/gsp290/data_files/usa_names.csv gs://$PROJECT/data_files/
gsutil cp gs://spls/gsp290/data_files/head_usa_names.csv gs://$PROJECT/data_files/
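To confirm both files landed in the bucket (optional):

# Both usa_names.csv and head_usa_names.csv should appear.
gsutil ls gs://$PROJECT/data_files/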
Test completed task
Click Check my progress to verify your performed task.
Copy Files to Your Bucket
Task 5. Create the BigQuery dataset
Create a dataset in BigQuery called lake. This is where all of your tables will be loaded:
bq mk lake
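You can confirm that the dataset was created by listing the datasets in the project (optional):

# The new lake dataset should appear in the listing.
bq ls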
Test completed task
Click Check my progress to verify your performed task.
Create the BigQuery Dataset (name: lake)
Task 6. Build a Dataflow pipeline