Cloud Composer: Copying BigQuery Tables Across Different Locations

 Overview
Imagine you have datasets that live in different locations around the globe. Let's say your data is in Google Cloud Storage buckets, or in BigQuery tables. How can you organize that data so that it gets consolidated and analyzed, and give you insights on your business?

Cloud Composer can help you build workflows and move your data between regions and storage systems, with an intuitive graphical view. Among others, it has templates for easy and reliable data moving between BigQuery and Cloud Storage, both ways.

In this advanced lab, you'll be able to explore that. You will learn how to create and run an Apache Airflow workflow in Cloud Composer that completes the following tasks:

Reads from a config file the list of tables to copy
Exports the list of tables from a BigQuery dataset located in US to Cloud Storage
Copies the exported tables from US to EU Cloud Storage buckets
Imports the list of tables into the target BigQuery Dataset in EU






Task 1. Create Cloud Composer environment

First, create a Cloud Composer environment by clicking on Composer in the Navigation menu:
Composer highlighted in the Navigation menu

Then click Create environment.

In dropdown menu, select Composer 1.

Set the following parameters for your environment:

Name: composer-advanced-lab
Location: us-east1
Image Version: composer-1.20.12-airflow-2.4.3
DAG Graph View

Zone: us-east1-c
Leave all other settings as default.

Click Create.
































(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ history
    1  gcloud auth list
    2  gcloud config list project
    3  ls
    4  pwd
    5  sudo apt-get install -y virtualenv
    6  python3 -m venv venv
    7  source venv/bin/activate
    8  DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket
    9  echo $DAGS_BUCKET
   10  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
   11  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket {UNIQUE_ID}-eu
   12  history
   13  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
   14  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
   15  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us
   16  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu
   17  history
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 




Welcome to Cloud Shell! Type "help" to get started.
Your Cloud Platform project in this session is set to qwiklabs-gcp-01-ac5d04c8b2c2.
Use “gcloud config set project [PROJECT_ID]” to change to a different project.
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud auth list
Credentialed Accounts

ACTIVE: *
ACCOUNT: student-01-adbf4448682a@qwiklabs.net

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud config list project
[core]
project = qwiklabs-gcp-01-ac5d04c8b2c2

Your active configuration is: [cloudshell-18847]
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ ls
README-cloudshell.txt
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ pwd
/home/student_01_adbf4448682a
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ sudo apt-get install -y virtualenv
********************************************************************************
You are running apt-get inside of Cloud Shell. Note that your Cloud Shell  
machine is ephemeral and no system-wide change will persist beyond session end. 

To suppress this warning, create an empty ~/.cloudshell/no-apt-get-warning file.
The command will automatically proceed in 5 seconds or on any key. 

Visit https://cloud.google.com/shell/help for more information.                 
********************************************************************************
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  libpcre2-posix2
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools python3-six python3-virtualenv python3-zipp
The following NEW packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools python3-six python3-virtualenv python3-zipp
  virtualenv
0 upgraded, 9 newly installed, 0 to remove and 7 not upgraded.
Need to get 331 kB of archives.
After this operation, 1,464 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bullseye/main amd64 python3-appdirs all 1.4.4-1 [12.7 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 python3-distlib all 0.3.2+really+0.3.1-0.1 [123 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 python3-filelock all 3.0.12-2 [8,036 B]
Get:4 http://deb.debian.org/debian bullseye/main amd64 python3-six all 1.16.0-2 [17.5 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 python3-more-itertools all 4.2.0-3 [42.7 kB]
Get:6 http://deb.debian.org/debian bullseye/main amd64 python3-zipp all 1.0.0-3 [6,060 B]
Get:7 http://deb.debian.org/debian bullseye/main amd64 python3-importlib-metadata all 1.6.0-2 [10.3 kB]
Get:8 http://deb.debian.org/debian bullseye/main amd64 python3-virtualenv all 20.4.0+ds-2+deb11u1 [89.1 kB]
Get:9 http://deb.debian.org/debian bullseye/main amd64 virtualenv all 20.4.0+ds-2+deb11u1 [21.4 kB]
Fetched 331 kB in 0s (3,066 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package python3-appdirs.
(Reading database ... 140081 files and directories currently installed.)
Preparing to unpack .../0-python3-appdirs_1.4.4-1_all.deb ...
Unpacking python3-appdirs (1.4.4-1) ...
Selecting previously unselected package python3-distlib.
Preparing to unpack .../1-python3-distlib_0.3.2+really+0.3.1-0.1_all.deb ...
Unpacking python3-distlib (0.3.2+really+0.3.1-0.1) ...
Selecting previously unselected package python3-filelock.
Preparing to unpack .../2-python3-filelock_3.0.12-2_all.deb ...
Unpacking python3-filelock (3.0.12-2) ...
Selecting previously unselected package python3-six.
Preparing to unpack .../3-python3-six_1.16.0-2_all.deb ...
Unpacking python3-six (1.16.0-2) ...
Selecting previously unselected package python3-more-itertools.
Preparing to unpack .../4-python3-more-itertools_4.2.0-3_all.deb ...
Unpacking python3-more-itertools (4.2.0-3) ...
Selecting previously unselected package python3-zipp.
Preparing to unpack .../5-python3-zipp_1.0.0-3_all.deb ...
Unpacking python3-zipp (1.0.0-3) ...
Selecting previously unselected package python3-importlib-metadata.
Preparing to unpack .../6-python3-importlib-metadata_1.6.0-2_all.deb ...
Unpacking python3-importlib-metadata (1.6.0-2) ...
Selecting previously unselected package python3-virtualenv.
Preparing to unpack .../7-python3-virtualenv_20.4.0+ds-2+deb11u1_all.deb ...
Unpacking python3-virtualenv (20.4.0+ds-2+deb11u1) ...
Selecting previously unselected package virtualenv.
Preparing to unpack .../8-virtualenv_20.4.0+ds-2+deb11u1_all.deb ...
Unpacking virtualenv (20.4.0+ds-2+deb11u1) ...
Setting up python3-filelock (3.0.12-2) ...
Setting up python3-distlib (0.3.2+really+0.3.1-0.1) ...
Setting up python3-six (1.16.0-2) ...
Setting up python3-appdirs (1.4.4-1) ...
Setting up python3-more-itertools (4.2.0-3) ...
Setting up python3-zipp (1.0.0-3) ...
Setting up python3-importlib-metadata (1.6.0-2) ...
Setting up python3-virtualenv (20.4.0+ds-2+deb11u1) ...
Setting up virtualenv (20.4.0+ds-2+deb11u1) ...
Processing triggers for man-db (2.9.4-2) ...



student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ python3 -m venv venv
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ source venv/bin/activate

(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket

(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ echo $DAGS_BUCKET
us-east1-composer-advanced--8a7c7892-bucket

((venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \--location us-east1 variables -- \set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv

kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing withcreated

in the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable table_list_file_path  created

((venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us

kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable gcs_source_bucket created


(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu

kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
Variable gcs_dest_bucket created


(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 




(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \
    --location us-east1 variables -- \
    get gcs_source_bucket

kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
20236552634-us


(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 


(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ 
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gcloud composer environments run composer-advanced-lab \
    --location us-east1 variables -- \
    get gcs_source_bucket
kubeconfig entry generated for us-east1-composer-advanced--8a7c7892-gke.
Executing within the following Kubernetes cluster namespace: composer-1-20-12-airflow-2-4-3-8a7c7892
20236552634-us



(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ cd ~
gsutil -m cp -r gs://spls/gsp283/python-docs-samples .

Copying gs://spls/gsp283/python-docs-samples/.DS_Store...
Copying gs://spls/gsp283/python-docs-samples/.git/description...                
Copying gs://spls/gsp283/python-docs-samples/third_party/apache-airflow/plugins/gcs_plugin/operators/gcs_to_gcs.py...
- [146/146 files][  1.7 MiB/  1.7 MiB] 100% Done                                
Operation completed over 146 objects/1.7 MiB.                                    

(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gsutil cp -r python-docs-samples/third_party/apache-airflow/plugins/* gs://$DAGS_BUCKET/plugins

Copying file://python-docs-samples/third_party/apache-airflow/plugins/gcs_plugin/__init__.py [Content-Type=text/x-python]...
Copying file://python-docs-samples/third_party/apache-
Copying file://python-docs-samples/third_party/apache-airflow/plugins/__init__.py [Content-Type=text/x-python]...
\ [6 files][ 28.8 KiB/ 28.8 KiB]                                                
Operation completed over 6 objects/28.8 KiB.                                     


(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ echo $DAGS_BUCKET
us-east1-composer-advanced--8a7c7892-bucket

(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ gsutil cp python-docs-samples/composer/workflows/bq_copy_across_locations.py gs://$DAGS_BUCKET/dags
gsutil cp python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv gs://$DAGS_BUCKET/dags

Copying file://python-docs-samples/composer/workflows/bq_copy_across_locations.py [Content-Type=text/x-python]...
/ [1 files][  6.7 KiB/  6.7 KiB]                                                
Operation completed over 1 objects/6.7 KiB.                                      
Copying file://python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv [Content-Type=text/csv]...
/ [1 files][  108.0 B/  108.0 B]                                                
Operation completed over 1 objects/108.0 B.                                      
(venv) student_01_adbf4448682a@cloudshell:~ (qwiklabs-gcp-01-ac5d04c8b2c2)$ history
    1  gcloud auth list
    2  gcloud config list project
    3  ls
    4  pwd
    5  sudo apt-get install -y virtualenv
    6  python3 -m venv venv
    7  source venv/bin/activate
    8  DAGS_BUCKET=us-east1-composer-advanced--8a7c7892-bucket
    9  echo $DAGS_BUCKET
   10  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
   11  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket {UNIQUE_ID}-eu
   12  history
   13  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set table_list_file_path /home/airflow/gcs/dags/bq_copy_eu_to_us_sample.csv
   14  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket {UNIQUE ID}-us
   15  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_source_bucket 20236552634-us
   16  gcloud composer environments run composer-advanced-lab --location us-east1 variables -- set gcs_dest_bucket 20236552634-eu
   17  history
   18  gcloud composer environments run composer-advanced-lab     --location us-east1 variables --     get gcs_source_bucket
   19  gcloud composer environments run composer-advanced-lab     --location eu variables --     get gcs_source_bucket
   20  cd ~
   21  gsutil -m cp -r gs://spls/gsp283/python-docs-samples .
   22  gsutil cp -r python-docs-samples/third_party/apache-airflow/plugins/* gs://$DAGS_BUCKET/plugins
   23  echo $DAGS_BUCKET
   24  gsutil cp python-docs-samples/composer/workflows/bq_copy_across_locations.py gs://$DAGS_BUCKET/dags
   25  gsutil cp python-docs-samples/composer/workflows/bq_copy_eu_to_us_sample.csv gs://$DAGS_BUCKET/dags
   26  history
 





















No comments:

Post a Comment

AppEngine - Python

tudent_04_347b5286260a@cloudshell:~/python-docs-samples/appengine/standard_python3/hello_world (qwiklabs-gcp-00-88834e0beca1)$ sudo apt upda...