ETL Processing on Google Cloud Using Dataflow and BigQuery
Overview
In this lab you build several data pipelines that ingest data from a publicly available dataset into BigQuery, using these Google Cloud services:
Cloud Storage
Dataflow
BigQuery
You will create your own data pipeline, working through the design considerations as well as the implementation details, to ensure that your prototype meets the requirements. Be sure to open the Python files and read the comments when instructed to.
Task 1. Ensure that the Dataflow API is successfully enabled
To ensure access to the necessary API, restart the connection to the Dataflow API.
In the Cloud Console, enter "Dataflow API" in the top search bar. Click on the result for Dataflow API.
Click Manage.
Click Disable API.
If asked to confirm, click Disable.
Click Enable.
When the API has been enabled again, the page will show the option to disable it.
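If you prefer the command line, the same disable/re-enable cycle can be done with gcloud from Cloud Shell (a sketch; this assumes your Cloud Shell session is already authenticated against the lab project):

# Disable the Dataflow API (--force also disables any services that depend on it).
gcloud services disable dataflow.googleapis.com --force
# Re-enable the Dataflow API.
gcloud services enable dataflow.googleapis.com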
Task 2. Download the starter code
Run the following command to get Dataflow Python Examples from Google Cloud's professional services GitHub:
gsutil -m cp -R gs://spls/gsp290/dataflow-python-examples .
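You can list the downloaded directory to confirm the copy succeeded (an optional check; the exact contents may vary):

# Show the example pipeline files that were just copied down.
ls -R dataflow-python-examples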
Now set a variable equal to your project ID. The value below is this lab's example; substitute your own project ID:
export PROJECT=qwiklabs-gcp-02-6bcb441b1e26
gcloud config set project $PROJECT
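To confirm that the variable and the active gcloud configuration are set as expected (optional):

# Both commands should print the same project ID.
echo $PROJECT
gcloud config get-value project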
Task 3. Create Cloud Storage Bucket
Use the make bucket (mb) command to create a new regional bucket in the us-east1 region within your project:
gsutil mb -c regional -l us-east1 gs://$PROJECT
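You can verify the bucket's location and storage class with gsutil (optional):

# List the bucket's metadata; the location constraint should read US-EAST1.
gsutil ls -L -b gs://$PROJECT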
Test completed task
Click Check my progress to verify your performed task.
Create a Cloud Storage Bucket
Task 4. Copy files to your bucket
Use the gsutil command to copy files into the Cloud Storage bucket you just created:
gsutil cp gs://spls/gsp290/data_files/usa_names.csv gs://$PROJECT/data_files/
gsutil cp gs://spls/gsp290/data_files/head_usa_names.csv gs://$PROJECT/data_files/
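To confirm both files landed in the bucket (optional):

# Both usa_names.csv and head_usa_names.csv should appear.
gsutil ls gs://$PROJECT/data_files/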
Test completed task
Click Check my progress to verify your performed task.
Copy Files to Your Bucket
Task 5. Create the BigQuery dataset
Create a dataset in BigQuery called lake. This is where all of your tables will be loaded:
bq mk lake
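You can confirm that the dataset was created by listing the datasets in the project (optional):

# The new lake dataset should appear in the listing.
bq ls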
Test completed task
Click Check my progress to verify your performed task.
Create the BigQuery Dataset (name: lake)
Task 6. Build a Dataflow pipeline