World of GCP by Ketan Patel: BigQuery

BigQuery is a fully-managed enterprise data warehouse that enables super-fast SQL queries.

The BigQuery UI helps you complete tasks like running queries, loading data, and even creating and training ML models.

Storing and querying massive datasets can be time consuming and expensive without the right hardware and infrastructure.

BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google's infrastructure.

Simply move your data into BigQuery and let us handle the hard work.

You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

You can access BigQuery in the Console, the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

There are also a variety of third-party tools that you can use to interact with BigQuery, such as visualizing the data or loading the data.

How to query public tables and load sample data into BigQuery.

Query a public dataset
Create a new dataset
Load data into a new table
Query a custom table

Task 1. Open BigQuery

The BigQuery console provides an interface to query tables, including public datasets offered by BigQuery. The query you will run accesses a table from a public dataset that BigQuery provides. It uses standard query language to search the dataset, and limits the results returned to 10.

Open the BigQuery console

In the Google Cloud Console, select Navigation menu > BigQuery.

The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.

Click Done.

The BigQuery console opens.

Task 2. Query a public dataset

Click + (SQL query) to create a new query. Copy and paste the following query into the BigQuery Query editor:

#standardSQL

SELECT

weight_pounds, state, year, gestation_weeks

FROM

`bigquery-public-data.samples.natality`

ORDER BY weight_pounds DESC LIMIT 10;

Create a new dataset

To load custom data into a table, you first need to create a BigQuery dataset.

Datasets help control access to tables and views in a project. This lab uses only one table, but you still need a dataset to hold the table.

In the Explorer pane, near your project id, click on View actions (view actions icon) then click Create dataset.

Load data into a new table

Next you create a table inside the babynames dataset, then load the data file from your storage bucket into the new table.

The custom data file you'll use contains approximately 7 MB of data about popular baby names, provided by the US Social Security Administration.

In the Cloud Console, select Navigation menu > BigQuery to return to the BigQuery console.

Navigate to the babynames dataset, by clicking View actions (view actions icon) near your dataset then click Create table.

In the Create table dialog, set the following fields, leave all others at the default value:

Field Value

Create table from Google Cloud Storage

Select file from GCS bucket spls/gsp072/baby-names/yob2014.txt

File format CSV

Table names_2014

Schema > Edit as text Slide on, then add the following in the textbox: name:string,gender:string,count:integer

Click the Create table button.

Task 5. Preview the table

Check your table! View the first few rows of the data.

Click the names_2014 table in the left panel, then click Preview.

Task 6. Query a custom dataset

Running a query against custom data is identical to the querying a public dataset that you did earlier, except that now you're querying your own table instead of a public table.

In BigQuery, click the + (SQL query) icon at the top.

Paste or type the following query into the query Editor.

#standardSQL

SELECT

name, count

FROM

`babynames.names_2014`

WHERE

gender = 'M'

ORDER BY count DESC LIMIT 5;

World of GCP by Ketan Patel

BigQuery

BigQuery is a fully-managed enterprise data warehouse that enables super-fast SQL queries.

Task 1. Open BigQuery

Create a new dataset

Load data into a new table

BigQuery is a fully-managed enterprise data warehouse that enables super-fast SQL queries.

No comments:

Post a Comment

Streaming Data Processing: Publish Streaming Data into PubSub

Report Abuse

Labels