In this module, we will learn how to train, tune, and serve a model manually from a Jupyter Notebook on AI Platform. Here's the agenda for today. We will first briefly go over all the components of the model development process and how they interact with each other. Then, we will focus on the dataset creation part of the story. Next, we will explain how to write an ML model in scikit-learn, how hyperparameters can be tuned with AI Platform, and how to package the model into a Docker training container. Then, we'll show you how to build, push, train, and tune the model using the training container. Finally, we'll go over how to deploy the model as a REST API on AI Platform and query it.

When you're building a machine learning model, there are essentially three steps. Step 1 is to create the dataset that will be used to build the model. After you create the dataset and apply any needed transformations to it, you build the model. When the model is built and performing as expected, the last major step is to operationalize it, which means you train it at scale and deploy it.

Operationalizing the model on AI Platform itself consists of three main steps. The first step is to implement a tunable training application, which requires writing your model into a train.py file. This file contains the training code and the configuration of the parameters. The second step is to package the training code into a Docker container with all of its dependencies (operating system, libraries, assets, etc.). This Docker container is used to kick off the training, at scale, on AI Platform. The definition and configuration of this container is specified in a file called a Dockerfile. The last step is to specify the training configuration, such as the hyperparameter ranges to be tuned, in a config.yaml file. We'll go over the details of each of these three steps in the next sections. 
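For the last step, a config.yaml for AI Platform Training declares the tuning goal, the metric to optimize, and the search range for each hyperparameter. The parameter names, ranges, and trial counts below are illustrative assumptions, not values from the course:

```yaml
# config.yaml -- sketch of a hyperparameter tuning configuration.
# All concrete values here are illustrative.
trainingInput:
  scaleTier: BASIC
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: alpha
        type: DOUBLE
        minValue: 0.00001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      - parameterName: max_iter
        type: INTEGER
        minValue: 100
        maxValue: 2000
        scaleType: UNIT_LINEAR_SCALE
```

Each `parameterName` must match a command-line flag accepted by train.py, since the service passes the sampled values to the training container as flags.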
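To make the first step concrete, here is a minimal sketch of what a tunable train.py might look like. The hyperparameter names (`--alpha`, `--max_iter`), the synthetic data, and the model choice are illustrative assumptions, not the course's exact code; a real train.py would load its transformed dataset splits from Cloud Storage instead.

```python
# train.py -- minimal sketch of a tunable training application.
# The hyperparameters, data, and model below are illustrative only.
import argparse

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split


def train_and_evaluate(alpha, max_iter):
    """Train a simple scikit-learn model and return it with its validation accuracy."""
    # Placeholder data; real code would read the exported CSV splits
    # from Cloud Storage instead of generating a synthetic dataset.
    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = SGDClassifier(alpha=alpha, max_iter=max_iter, random_state=42)
    model.fit(X_train, y_train)
    return model, model.score(X_val, y_val)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Each tunable hyperparameter is exposed as a command-line flag so
    # the training service can pass different values on every trial.
    parser.add_argument("--alpha", type=float, default=0.0001)
    parser.add_argument("--max_iter", type=int, default=1000)
    args = parser.parse_args()
    _, accuracy = train_and_evaluate(args.alpha, args.max_iter)
    print(f"validation accuracy: {accuracy:.3f}")
```

Exposing every tunable value as a command-line flag is what makes the application "tunable": the hyperparameter tuning service simply launches the same container with different flag values on each trial.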
When developing an ML model, developers and data scientists usually write most of their code in Jupyter notebooks. An AI Platform Notebook is a configurable Jupyter notebook server on AI Platform. This is a typical flow: Load the training data from BigQuery. You can then store the training files, which have already been transformed, in Cloud Storage. Package the training code into a train.py file. This code will be pushed, as a Docker image, to Container Registry. Trigger the training on AI Platform Training. AI Platform also stores the training artifacts (such as the trained model) in Cloud Storage. Deploy the trained model using AI Platform Prediction, so the model can be served. AI Platform Prediction does that by retrieving the saved model from Cloud Storage and deploying it as an API.

You probably noticed that I mentioned only the orange boxes in this diagram, not the white ones. This is because the description I just gave you covers a manual process for the ML steps we discussed. The focus of this course is how to automate this process, which includes services that allow for version control of the source code, continuous integration and deployment, and pipelines.

Here's a different view of all the components we've discussed so far. Let's see, with the help of this diagram, how we are going to use them and how they interact with each other. At the center, there is the JupyterLab notebook where we experiment with our code and interact with all the other components. Again, these components are: BigQuery, which you can use to create repeatable splits of the dataset (train, validation, and test splits). Cloud Storage, where we export the dataset splits as CSV files. These CSV files are then consumed by the model. We also store the trained models in Cloud Storage. The next component is Container Registry, or gcr.io, where we store the training container that packages our training code (the Docker container). 
AI Platform Training, which is in charge of running the training containers for training and hyperparameter tuning. And finally, AI Platform Prediction, which takes a trained model stored in Cloud Storage and deploys it as a REST API that we can query. In the next section, we'll dissect each part of this diagram and learn how to use each of these components.

In the experimental phase that we have discussed so far in this module, every step of the process is done manually, which means that we manually run a cell in the notebook for a given action of the process to be triggered (such as building the training container around the training code or pushing it to Container Registry). In the next module, we'll see how to automate this process. We want each push to the code repository that contains our training code to trigger a rebuild of the assets that constitute our machine learning pipeline: the training container, the hyperparameter tuning configuration, and all the others. We can then extend this automation to automatically retrain the model, export the newly trained model to the model registry, and deploy it to the specified serving infrastructure. This is called continuous integration and continuous delivery, or CI/CD.
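The repeatable dataset splits mentioned earlier are typically produced in BigQuery by hashing a key column (for example with FARM_FINGERPRINT) and bucketing rows on the hash, so that a given row always lands in the same split no matter how often the query runs. Here is a hedged Python sketch of the same idea; the key column and split percentages are illustrative assumptions:

```python
# Repeatable train/validation/test splitting by hashing a row key.
# This mirrors the FARM_FINGERPRINT + MOD technique used in BigQuery:
# the split depends only on the key, so every rerun assigns each row
# to the same split. Percentages below are illustrative.
import hashlib


def assign_split(key, train_pct=80, valid_pct=10):
    """Deterministically map a row key to 'train', 'valid', or 'test'."""
    bucket = int(hashlib.md5(str(key).encode()).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"
```

The BigQuery equivalent for the training split would filter on something like `MOD(ABS(FARM_FINGERPRINT(CAST(key AS STRING))), 100) < 80`, which is what makes the splits repeatable rather than random.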
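To query a model deployed on AI Platform Prediction, clients send a JSON body whose `instances` field holds the feature rows to score. A minimal sketch of building such a request in Python; the project name, model name, and feature values are hypothetical, and the actual network call (shown in comments) would go through the Google API client library:

```python
# Build a prediction request body for a model served as a REST API.
# The feature values below are hypothetical placeholders.
import json


def build_predict_request(rows):
    """Wrap feature rows in the {'instances': [...]} body the API expects."""
    return {"instances": rows}


body = build_predict_request([[0.2, 1.5, 3.0], [0.1, 0.9, 2.2]])
payload = json.dumps(body)

# The actual call could then use the Google API client, e.g.:
#   from googleapiclient import discovery
#   service = discovery.build("ml", "v1")
#   name = "projects/YOUR_PROJECT/models/YOUR_MODEL"  # hypothetical names
#   response = service.projects().predict(name=name, body=body).execute()
print(payload)
```

The response mirrors the request shape, returning one prediction per submitted instance.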