Overview
In this lab you partition a dataset into two datasets: a training dataset and a test dataset. You use the training dataset to develop a model and the test dataset to evaluate its accuracy, which lets you evaluate predictive models in a repeatable manner.
You store the data in BigQuery and use JupyterLab to perform the analysis.
This lab uses a dataset provided by the US Bureau of Transportation Statistics. The dataset provides historical information about domestic flights in the United States and can be used to demonstrate a wide range of data science concepts and techniques.
BigQuery is a RESTful web service that enables interactive analysis of massive datasets. BigQuery works in conjunction with Google Cloud Storage. See BigQuery for more information.
JupyterLab is a web-based interactive development environment for notebooks, code, and data. See JupyterLab for more information.
Objectives
Partition a BigQuery dataset into a training dataset and a test dataset
Create a predictive model using the training dataset
Evaluate the predictive model using the test dataset
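The three objectives above can be sketched in plain Python before touching BigQuery. This is only an illustration, not the lab's actual method: it assumes the split is keyed on the flight date, and the field names (fl_date, dep_delay, arrived_late), the helper functions, the single-threshold "model", and the synthetic rows are all made up for the example. Hashing the key, rather than drawing random numbers, is one way to make the partition repeatable, because the same date always lands in the same dataset.

```python
import hashlib


def assign_split(key, train_fraction=0.8):
    """Deterministically map a key (here, a flight date) to 'train' or 'test'."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "train" if bucket < train_fraction * 100 else "test"


def partition(rows, key_field, train_fraction=0.8):
    """Split rows into (train, test) so that all rows sharing a key stay together."""
    train, test = [], []
    for row in rows:
        if assign_split(row[key_field], train_fraction) == "train":
            train.append(row)
        else:
            test.append(row)
    return train, test


def accuracy(rows, threshold):
    """Fraction of rows where 'dep_delay >= threshold' matches the label."""
    correct = sum((r["dep_delay"] >= threshold) == r["arrived_late"] for r in rows)
    return correct / len(rows)


def fit_threshold(train_rows):
    """Toy model: the departure-delay threshold with the best training accuracy."""
    candidates = sorted({r["dep_delay"] for r in train_rows})
    return max(candidates, key=lambda t: accuracy(train_rows, t))


# Synthetic rows, invented for this sketch (not real flight data).
rows = []
for day in range(1, 31):
    for i in range(5):
        delay = (day * 7 + i * 11) % 60 - 10
        rows.append({
            "fl_date": f"2015-01-{day:02d}",
            "dep_delay": delay,
            "arrived_late": delay >= 15,
        })

train, test = partition(rows, "fl_date")
if train and test:
    threshold = fit_threshold(train)  # develop the model on training data only
    print("threshold:", threshold)
    print("test accuracy:", accuracy(test, threshold))  # evaluate on held-out data
```

Running the script twice produces the same partition, so any change in the reported accuracy comes from the model rather than from a different split.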