ETL Processing on Google Cloud Using Dataflow and BigQuery

Overview
In this lab, you build several data pipelines that ingest data from a publicly available dataset into BigQuery, using these Google Cloud services:

Cloud Storage
Dataflow
BigQuery

You will create your own data pipeline, including the design considerations and implementation details, to ensure that your prototype meets the requirements. Be sure to open the Python files and read the comments when instructed to.


Task 1. Ensure that the Dataflow API is successfully enabled

To ensure access to the necessary API, restart the connection to the Dataflow API.

In the Cloud Console, enter "Dataflow API" in the top search bar. Click on the result for Dataflow API.

Click Manage.

Click Disable API.

If asked to confirm, click Disable.

Click Enable.
When the API has been enabled again, the page will show the option to disable.
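As an optional sanity check, you can also confirm from Cloud Shell that the API is active:

gcloud services list --enabled | grep dataflow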

Task 2. Download the starter code

Run the following command to get Dataflow Python Examples from Google Cloud's professional services GitHub:

gsutil -m cp -R gs://spls/gsp290/dataflow-python-examples .
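To see what was downloaded, you can list the new directory:

ls dataflow-python-examples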
Now set a variable equal to your project ID (replace <YOUR-PROJECT-ID> with your actual project ID):
export PROJECT=<YOUR-PROJECT-ID>

gcloud config set project $PROJECT
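You can confirm the active project with:

gcloud config get-value project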

Task 3. Create a Cloud Storage bucket

Use the make bucket (mb) command to create a new regional bucket in the us-east1 region within your project:

gsutil mb -c regional -l us-east1 gs://$PROJECT
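As an optional check, you can confirm the bucket's location with:

gsutil ls -L -b gs://$PROJECT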
Test completed task
Click Check my progress to verify your performed task.
Create a Cloud Storage Bucket
Task 4. Copy files to your bucket
Use the gsutil command to copy files into the Cloud Storage bucket you just created:

gsutil cp gs://spls/gsp290/data_files/usa_names.csv gs://$PROJECT/data_files/
gsutil cp gs://spls/gsp290/data_files/head_usa_names.csv gs://$PROJECT/data_files/
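You can list the bucket's contents to confirm both files arrived:

gsutil ls gs://$PROJECT/data_files/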
Test completed task
Click Check my progress to verify your performed task.
Copy Files to Your Bucket
Task 5. Create the BigQuery dataset
Create a dataset in BigQuery called lake. This is where all of your tables will be loaded in BigQuery:

bq mk lake
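You can confirm the dataset was created by listing the datasets in your project:

bq ls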
Test completed task
Click Check my progress to verify your performed task.
Create the BigQuery Dataset (name: lake)
Task 6. Build a Dataflow pipeline
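A Dataflow pipeline for this lab reads a CSV file from Cloud Storage, transforms each row, and loads the result into BigQuery. As a minimal sketch of that idea, here is what such an Apache Beam pipeline could look like. The column list and the all-STRING schema are assumptions for illustration, not the lab's verbatim starter code; open the Python files in the dataflow-python-examples directory and read the comments for the real implementation.

import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_line(line):
    # Map one CSV row of usa_names.csv to a BigQuery-ready dict.
    # The column order is an assumption based on the USA names dataset.
    fields = ['state', 'gender', 'year', 'name', 'number', 'created_date']
    return dict(zip(fields, line.split(',')))


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', required=True)   # gs:// path to the input CSV
    parser.add_argument('--output', required=True)  # BigQuery table, e.g. lake.usa_names
    known_args, pipeline_args = parser.parse_known_args(argv)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (p
         | 'Read CSV' >> beam.io.ReadFromText(known_args.input, skip_header_lines=1)
         | 'Parse rows' >> beam.Map(parse_line)
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
             known_args.output,
             schema='state:STRING,gender:STRING,year:STRING,'
                    'name:STRING,number:STRING,created_date:STRING',
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))


if __name__ == '__main__':
    run()

Saved as, say, data_ingestion.py, a pipeline like this could be launched on Dataflow with an invocation along these lines (the flag values are illustrative, not the lab's exact command):

python data_ingestion.py \
    --project=$PROJECT \
    --region=us-east1 \
    --runner=DataflowRunner \
    --staging_location=gs://$PROJECT/test \
    --temp_location=gs://$PROJECT/test \
    --input=gs://$PROJECT/data_files/head_usa_names.csv \
    --output=lake.usa_names \
    --save_main_session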