Import data from CSV text files into Cloud SQL and then carry out some basic data analysis using simple queries.
The dataset used in this lab comes from the US Bureau of Transport Statistics and contains historical information about internal flights in the United States.
This dataset can be used to demonstrate a wide range of data science concepts and techniques and is used in all of the labs in the Data Science on Google Cloud Platform quest.
- Create Cloud SQL instance
- Create a Cloud SQL database
- Import text data into Cloud SQL
- Build an initial data model using queries
Clone Data Science repo.
Create the environment variables used later in the lab for your project ID and the storage bucket that contains your data:
export PROJECT_ID=$(gcloud info --format='value(config.project)')
export BUCKET=${PROJECT_ID}-ml
Copied!
Enter following command to stage the file into Cloud Storage bucket:
student_03_09233bd91fc2@cloudshell:~ (qwiklabs-gcp-04-885cace79f9b)$ git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp/
$ cd data-science-on-gcp/
$ export PROJECT_ID=$(gcloud info --format='value(config.project)')
$ export BUCKET=${PROJECT_ID}-ml
$ cd 03_sqlstudio/
$ gcloud sql instances create flights \
--database-version=POSTGRES_13 --cpu=2 --memory=8GiB \
--region=us-west1 --root-password=Passw0rd
Creating Cloud SQL instance for POSTGRES_13...done.
Created [https://sqladmin.googleapis.com/sql/v1beta4/projects/qwiklabs-gcp-04-885cace79f9b/instances/flights].
NAME: flights
DATABASE_VERSION: POSTGRES_13
LOCATION: us-west1-c
TIER: db-custom-2-8192
PRIMARY_ADDRESS: 34.83.69.16
PRIVATE_ADDRESS: -
STATUS: RUNNABLE
student_03_09233bd91fc2@cloudshell:~/data-science-on-gcp/03_sqlstudio (qwiklabs-gcp-04-885cace79f9b)$ export ADDRESS=$(curl -s http://ipecho.net/plain)/32
student_03_09233bd91fc2@cloudshell:~/data-science-on-gcp/03_sqlstudio (qwiklabs-gcp-04-885cace79f9b)$ gcloud sql instances patch flights --authorized-networks $ADDRESS
When adding a new IP address to authorized networks, make sure to also include any IP addresses that have already been authorized. Otherwise, they will be overwritten and
de-authorized.
Do you want to continue (Y/n)? Y
The following message will be used for the patch API method.
{"name": "flights", "project": "qwiklabs-gcp-04-885cace79f9b", "settings": {"ipConfiguration": {"authorizedNetworks": [{"value": "34.82.7.91/32"}]}}}
Patching Cloud SQL instance...done.
Updated [https://sqladmin.googleapis.com/sql/v1beta4/projects/qwiklabs-gcp-04-885cace79f9b/instances/flights].
$
$ gcloud sql connect flights --user=postgres
Allowlisting your IP for incoming connection for 5 minutes...done.
Connecting to database with SQL user [postgres].Password:
psql (15.3 (Debian 15.3-1.pgdg110+1), server 13.10)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
postgres=> \c bts;
Password:
psql (15.3 (Debian 15.3-1.pgdg110+1), server 13.10)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
You are now connected to database "bts" as user "postgres".
bts=>
bts=> SELECT "Origin", COUNT(*) AS num_flights
FROM flights GROUP BY "Origin"
ORDER BY num_flights DESC
LIMIT 5;
Origin | num_flights
--------+-------------
(0 rows)
bts=>
No comments:
Post a Comment