Overview
Imagine you have datasets that live in different locations around the globe. Let's say your data is in Google Cloud Storage buckets, or in BigQuery tables. How can you organize that data so that it gets consolidated and analyzed, and give you insights on your business?
Cloud Composer can help you build workflows and move your data between regions and storage systems, with an intuitive graphical view. Among others, it has templates for easy and reliable data moving between BigQuery and Cloud Storage, both ways.
In this advanced lab, you'll be able to explore that. You will learn how to create and run an Apache Airflow workflow in Cloud Composer that completes the following tasks:
Reads from a config file the list of tables to copy
Exports the list of tables from a BigQuery dataset located in US to Cloud Storage
Copies the exported tables from US to EU Cloud Storage buckets
Imports the list of tables into the target BigQuery Dataset in EU
Task 1. Create Cloud Composer environment
First, create a Cloud Composer environment by clicking on Composer in the Navigation menu:
Composer highlighted in the Navigation menu
Then click Create environment.
In dropdown menu, select Composer 1.
Set the following parameters for your environment:
Name: composer-advanced-lab
Location: us-east1
Image Version: composer-1.20.12-airflow-2.4.3
DAG Graph View
Zone: us-east1-c
Leave all other settings as default.
Click Create.
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ history
1 gcloud auth list
2 gcloud config list project
3 ls
4 pwd
5 sudo apt-get install -y virtualenv
6 python3 -m venv venv
7 source venv/bin/activate
8 DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket
9 echo $DAGS_BUCKET
10 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
11 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket {UNIQUE_ID}-eu
12 history
13 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
14 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
15 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us
16 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu
17 history
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
Welcome to Cloud Shell! Type "help" to get started.
Your Cloud Platform project in this session is set to qwiklabs-gcp-01-ac5d04c8b2c2.
Use “gcloud config set project [PROJECT_ID]” to change to a different project.
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud auth list
Credentialed Accounts
ACTIVE: *
ACCOUNT: student-01-adbf4448682a@qwiklabs.net
To set the active account, run:
$ gcloud config set account `ACCOUNT`
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud config list project
[core]
project = qwiklabs-gcp-01-ac5d04c8b2c2
Your active configuration is: [cloudshell-18847]
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ ls
README-cloudshell.txt
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ pwd
/home/student_01_adbf4448682a
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ sudo apt-get install -y virtualenv
********************************************************************************
You are running apt-get inside of Cloud Shell. Note that your Cloud Shell
machine is ephemeral and no system-wide change will persist beyond session end.
To suppress this warning, create an empty ~/.cloudshell/no-apt-get-warning file.
The command will automatically proceed in 5 seconds or on any key.
Visit https://cloud.google.com/shell/help for more information.
********************************************************************************
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
libpcre2-posix2
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools python3-six python3-virtualenv python3-zipp
The following NEW packages will be installed:
python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools python3-six python3-virtualenv python3-zipp
virtualenv
0 upgraded, 9 newly installed, 0 to remove and 7 not upgraded.
Need to get 331 kB of archives.
After this operation, 1,464 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bullseye/main amd64 python3-appdirs all 1.4.4-1 [12.7 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 python3-distlib all 0.3.2+really+0.3.1-0.1 [123 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 python3-filelock all 3.0.12-2 [8,036 B]
Get:4 http://deb.debian.org/debian bullseye/main amd64 python3-six all 1.16.0-2 [17.5 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 python3-more-itertools all 4.2.0-3 [42.7 kB]
Get:6 http://deb.debian.org/debian bullseye/main amd64 python3-zipp all 1.0.0-3 [6,060 B]
Get:7 http://deb.debian.org/debian bullseye/main amd64 python3-importlib-metadata all 1.6.0-2 [10.3 kB]
Get:8 http://deb.debian.org/debian bullseye/main amd64 python3-virtualenv all 20.4.0+ds-2+deb11u1 [89.1 kB]
Get:9 http://deb.debian.org/debian bullseye/main amd64 virtualenv all 20.4.0+ds-2+deb11u1 [21.4 kB]
Fetched 331 kB in 0s (3,066 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package python3-appdirs.
(Reading database ... 140081 files and directories currently installed.)
Preparing to unpack .../0-python3-appdirs_1.4.4-1_all.deb ...
Unpacking python3-appdirs (1.4.4-1) ...
Selecting previously unselected package python3-distlib.
Preparing to unpack .../1-python3-distlib_0.3.2+really+0.3.1-0.1_all.deb ...
Unpacking python3-distlib (0.3.2+really+0.3.1-0.1) ...
Selecting previously unselected package python3-filelock.
Preparing to unpack .../2-python3-filelock_3.0.12-2_all.deb ...
Unpacking python3-filelock (3.0.12-2) ...
Selecting previously unselected package python3-six.
Preparing to unpack .../3-python3-six_1.16.0-2_all.deb ...
Unpacking python3-six (1.16.0-2) ...
Selecting previously unselected package python3-more-itertools.
Preparing to unpack .../4-python3-more-itertools_4.2.0-3_all.deb ...
Unpacking python3-more-itertools (4.2.0-3) ...
Selecting previously unselected package python3-zipp.
Preparing to unpack .../5-python3-zipp_1.0.0-3_all.deb ...
Unpacking python3-zipp (1.0.0-3) ...
Selecting previously unselected package python3-importlib-metadata.
Preparing to unpack .../6-python3-importlib-metadata_1.6.0-2_all.deb ...
Unpacking python3-importlib-metadata (1.6.0-2) ...
Selecting previously unselected package python3-virtualenv.
Preparing to unpack .../7-python3-virtualenv_20.4.0+ds-2+deb11u1_all.deb ...
Unpacking python3-virtualenv (20.4.0+ds-2+deb11u1) ...
Selecting previously unselected package virtualenv.
Preparing to unpack .../8-virtualenv_20.4.0+ds-2+deb11u1_all.deb ...
Unpacking virtualenv (20.4.0+ds-2+deb11u1) ...
Setting up python3-filelock (3.0.12-2) ...
Setting up python3-distlib (0.3.2+really+0.3.1-0.1) ...
Setting up python3-six (1.16.0-2) ...
Setting up python3-appdirs (1.4.4-1) ...
Setting up python3-more-itertools (4.2.0-3) ...
Setting up python3-zipp (1.0.0-3) ...
Setting up python3-importlib-metadata (1.6.0-2) ...
Setting up python3-virtualenv (20.4.0+ds-2+deb11u1) ...
Setting up virtualenv (20.4.0+ds-2+deb11u1) ...
Processing triggers for man-db (2.9.4-2) ...
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ python3 -m venv venv
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ source venv/bin/activate
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ echo $DAGS_BUCKET
us-east1-composer-advanced--8a7c7892-bucket
((venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \--location us-east1 variables -- \set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing withcreated
in the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable table_list_file_path created
((venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable gcs_source_bucket created
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable gcs_dest_bucket created
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \
--location us-east1 variables -- \
get gcs_source_bucket
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
20236552634-us
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \
--location us-east1 variables -- \
get gcs_source_bucket
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
20236552634-us
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ cd ~
gsutil -m cp -r gs://spls/gsp283/python-docs-samples .
Copying gs://spls/gsp283/python-docs-samples/.DS_Store...
Copying gs://spls/gsp283/python-docs-samples/.git/description...
Copying gs://spls/gsp283/python-docs-samples/third_party/apache-airflow/plugins/gcs_plugin/operators/gcs_to_gcs.py...
- [146/146 files][ 1.7 MiB/ 1.7 MiB] 100% Done
Operation completed over 146 objects/1.7 MiB.
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gsutil cp -r python-docs-samples/third_party/apache-airflow/plugins/* gs://$DAGS_BUCKET/plugins
Copying file://python-docs-samples/third_party/apache-airflow/plugins/gcs_plugin/__init__.py [Content-Type=text/x-python]...
Copying file://python-docs-samples/third_party/apache-
Copying file://python-docs-samples/third_party/apache-airflow/plugins/__init__.py [Content-Type=text/x-python]...
\ [6 files][ 28.8 KiB/ 28.8 KiB]
Operation completed over 6 objects/28.8 KiB.
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ echo $DAGS_BUCKET
us-east1-composer-advanced--8a7c7892-bucket
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gsutil cp python-docs-samples/composer/workflows/bq_copy_across_locations.py gs://$DAGS_BUCKET/dags
gsutil cp python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv gs://$DAGS_BUCKET/dags
Copying file://python-docs-samples/composer/workflows/bq_copy_across_locations.py [Content-Type=text/x-python]...
/ [1 files][ 6.7 KiB/ 6.7 KiB]
Operation completed over 1 objects/6.7 KiB.
Copying file://python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv [Content-Type=text/csv]...
/ [1 files][ 108.0 B/ 108.0 B]
Operation completed over 1 objects/108.0 B.
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ history
1 gcloud auth list
2 gcloud config list project
3 ls
4 pwd
5 sudo apt-get install -y virtualenv
6 python3 -m venv venv
7 source venv/bin/activate
8 DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket
9 echo $DAGS_BUCKET
10 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
11 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket {UNIQUE_ID}-eu
12 history
13 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
14 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
15 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us
16 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu
17 history
18 gcloud composer environments run composer-advanced-lab --location us-east1 variables -- get gcs_source_bucket
19 gcloud composer environments run composer-advanced-lab --location eu variables -- get gcs_source_bucket
20 cd ~
21 gsutil -m cp -r gs://spls/gsp283/python-docs-samples .
22 gsutil cp -r python-docs-samples/third_party/apache-airflow/plugins/* gs://$DAGS_BUCKET/plugins
23 echo $DAGS_BUCKET
24 gsutil cp python-docs-samples/composer/workflows/bq_copy_across_locations.py gs://$DAGS_BUCKET/dags
25 gsutil cp python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv gs://$DAGS_BUCKET/dags
26 history
No comments:
Post a Comment