ketan_patel@cloudshell:~ (new-user-learning)$ gcloud config list project
[core]
project = new-user-learning
Your active configuration is: [cloudshell-1078]
ketan_patel@cloudshell:~ (new-user-learning)$
ketan_patel@cloudshell:~ (new-user-learning)$ gcloud services list --enabled | grep -i dataflow
ketan_patel@cloudshell:~ (new-user-learning)$ gcloud services list --available | grep -i dataflow
NAME: dataflow.googleapis.com
TITLE: Dataflow API
ketan_patel@cloudshell:~ (new-user-learning)$ gcloud services enable dataflow.googleapis.com
Operation "operations/acf.p2-457904926486-ab0c6ab9-cb9f-4367-99a0-01052a03314a" finished successfully.
ketan_patel@cloudshell:~ (new-user-learning)$ gcloud services list --enabled | grep -i dataflow
NAME: dataflow.googleapis.com
TITLE: Dataflow API
Google Dataflow is a managed service for stream and batch data processing at scale. When you need to process a large volume of streamed data, such as clickstream events or readings from IoT devices, Dataflow is the starting point for processing the incoming stream. The processed data can then be sent to storage (BigQuery, Bigtable, GCS) for further work such as ML:
CREATE BUCKET (STORAGE):
kp@cloudshell:~ $ gsutil mb gs://ketanlearningbucket0818
Creating gs://ketanlearningbucket0818/...
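The Dataflow job later in this post runs in us-central1. If you want the staging bucket in that same region, gsutil accepts a location flag; this is an alternative form of the command above, not an additional step:
kp@cloudshell:~ $ gsutil mb -l us-central1 gs://ketanlearningbucket0818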
CREATE PUBSUB TOPIC:
kp@cloudshell:~ $ gcloud pubsub topics create ketantopic0818
Created topic [projects/new-user-learning/topics/ketantopic0818].
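To confirm the topic was created, you can describe it (an optional check; the output should include the fully qualified topic name):
kp@cloudshell:~ $ gcloud pubsub topics describe ketantopic0818
name: projects/new-user-learning/topics/ketantopic0818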
CREATE DATASET IN BIGQUERY:
kp@cloudshell:~$ bq mk ketandataset0818
Dataset 'new-user-learning:ketandataset0818' successfully created.
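You can also confirm the dataset exists by listing the datasets in the project (an optional check; the output should include ketandataset0818):
kp@cloudshell:~ $ bq ls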
CREATE TABLE IN DATASET:
Next, create a new table in this dataset; the Dataflow job will write the messages it pulls from PubSub into this BigQuery table.
As a simple example, take the following message in JSON (JavaScript Object Notation) format:
{
"name" : "Ketan",
"language" : "Cloud Eng"
}
kp@cloudshell:~ $ bq mk ketandataset0818.table01 name:STRING,language:STRING
Table 'new-user-learning:ketandataset0818.table01' successfully created
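To double-check that the table schema matches the JSON fields you plan to publish, dump it (an optional check; the output should list the two STRING columns, name and language):
kp@cloudshell:~ $ bq show --schema --format=prettyjson ketandataset0818.table01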

CONNECT PUBSUB TO BIGQUERY USING DATAFLOW:
CREATE DATAFLOW JOB:
USE GUI TO CREATE DATAFLOW JOB:
In the console, open Dataflow, create a job from a template, and choose the “Pub/Sub Topic to BigQuery” template; fill in the input topic, the BigQuery output table, and a temporary location in the bucket created above.
INSTEAD OF THE GUI, YOU CAN USE THE FOLLOWING CLI COMMAND TO CREATE IT (BUT IT'S MORE COMPLICATED):
$ gcloud dataflow jobs run Ketandataflowjob0818 \
    --gcs-location gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
    --region us-central1 \
    --staging-location gs://ketanlearningbucket0818/temp \
    --parameters inputTopic=projects/new-user-learning/topics/ketantopic0818,outputTableSpec=new-user-learning:ketandataset0818.table01
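Once the job is submitted, you can watch it from the CLI as well (the job ID is printed when the job is created; JOB_ID below is a placeholder):
$ gcloud dataflow jobs list --region us-central1
$ gcloud dataflow jobs show JOB_ID --region us-central1
This is a streaming job, so it keeps running until you cancel or drain it; make sure it reaches the Running state before publishing messages.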
RUN JOB:
Now, go to the PubSub topic and click on the “+ PUBLISH MESSAGE” button.
Copy and paste the JSON message from above, or write your own message using the fields you defined in the table schema.
{
"name" : "Ketan",
"language" : "Cloud Eng"
}
Click the “PUBLISH” button. The message is published to PubSub, and the running Dataflow job picks it up and writes it into the BigQuery table (using the Google Cloud Storage bucket for staging).
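If you prefer the CLI over the console button, the same message can be published with gcloud, and you can then verify that the row landed in BigQuery (give the streaming pipeline a minute or two):
kp@cloudshell:~ $ gcloud pubsub topics publish ketantopic0818 --message='{"name": "Ketan", "language": "Cloud Eng"}'
kp@cloudshell:~ $ bq query --use_legacy_sql=false 'SELECT name, language FROM ketandataset0818.table01'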
PUBLISH 100 MESSAGES:
{
"name" : "Patel",
"language" : "GCP Cloud Eng"
}
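Clicking PUBLISH a hundred times in the console is tedious, so a small shell loop in Cloud Shell does the same thing (the loop itself is my own shortcut, not part of the console flow):
$ for i in $(seq 1 100); do gcloud pubsub topics publish ketantopic0818 --message='{"name": "Patel", "language": "GCP Cloud Eng"}'; done
All 100 rows should show up in ketandataset0818.table01 once the Dataflow job has processed them.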