Back in Course 1, we introduced the Machine Learning Process Lifecycle, or MLPL. Throughout this specialization, we've referenced it once or twice. In this video, we're going to summarize our discussions and show you how to use the MLPL to structure milestones for machine learning projects. By the end, you'll have an understanding of the deliverables and documentation for each stage of the MLPL.

The ML Process Lifecycle is a framework that captures the iterative process of developing a machine learning solution for a specific problem, from problem formulation to handing over the project to the client. ML solution development is an exploratory and experimental process, where different learning algorithms and methods are tried before arriving at a satisfactory solution. The journey to reach an ML solution that meets business expectations is rarely linear. As you advance to different stages of the process and uncover more information, you may need to go back to previous stages of the MLPL to make changes or start over completely. We use the MLPL to capture and explain this dynamic workflow between different stages and the sequence in which these stages are carried out.

Remember, the process is split into four stages: business understanding and problem discovery, data acquisition and understanding, machine learning modeling and evaluation, and finally delivery and acceptance. As we've said, the phases are iterative, but keep in mind that you can't skip ahead. For example, if you're running into problems in the modeling phase, you can either revisit your data or revisit your problem definition. But any time you go backwards, you have to repeat the steps that follow. Let's go through each of the stages once more. The first phase is business understanding and problem discovery, or BUPD.
In this phase, you want to understand the business context of the machine learning project, identify the relevant stakeholders, and come up with a clear, specific question for the model to answer. You also need to think about how the model is going to be used and evaluated. We discussed these points in more detail in the first course of the specialization. In this phase, you should also consider things like what data is accessible, and whether there are any specific requirements for your model, such as explainability, computation constraints, or access limitations. At the end of this phase, you will have defined your goals and explicitly outlined the related criteria and details about the process. This will help you understand the problem better, structure your project, and dive deeper into subsequent stages of the MLPL.

It's important to record the insights you develop during this phase. This could be as simple as a bulleted list stored in a code repository, but it can be helpful to put it together as a report for relevant stakeholders. The work done at this stage lays the groundwork for the entire project and speaks to your larger machine learning strategy, so it doesn't hurt to formalize it. Whatever you do, make sure to record both the answers and the points of contention, because you will be revisiting these questions.

The next stage is data acquisition and understanding. In this phase, you want to make sure that you're collecting the required data and think about all the necessary transformations so that the data can be used for machine learning. Also, make sure that it's possible to answer the question you're looking at based on the data you have. As we discussed in the previous course, your learning data should be representative of your operational data. We also talked about the importance of understanding your data and doing data cleaning, as a significant portion of the time spent on a machine learning project is here.
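To make the "clean the data, but also record what you found" idea concrete, here's a minimal sketch in Python using pandas. The tiny table and its column names are invented purely for illustration; the point is to log what the raw data looked like before imputing anything, so those notes can feed into the stage's report.

```python
import pandas as pd

# Hypothetical learning data, invented for illustration only.
df = pd.DataFrame({
    "tenure_months": [3, 24, None, 12, 60],
    "monthly_spend": [29.0, 80.5, 45.0, None, 99.9],
    "churned": [1, 0, 0, 1, 0],
})

# Record what the raw data looks like before touching it; these notes
# become part of this stage's deliverable.
missing_per_column = df.isna().sum()

# One simple, explicit cleaning step: fill numeric gaps with the
# column median. Whatever strategy you pick, document it.
cleaned = df.fillna(df.median(numeric_only=True))

print(missing_per_column.to_dict())   # which columns had gaps
print(int(cleaned.isna().sum().sum()))  # 0: no missing values remain
```

In a real project the cleaning steps will be far more involved, but keeping the "before" statistics next to the cleaning code is what makes the process reportable later.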
This stage is also when you need to set up a data pipeline. This might also be when you split your learning data into training and test sets. Clean data is an obvious deliverable for this stage, but don't underestimate the importance of communication. An overview of the cleaning process, along with the associated code and a report on the insights gained through exploratory data analysis, has huge value to an organization. So again, record these insights either through your development tools or as a formal report. The more value you can demonstrate through all stages of the MLPL, the easier it is to get buy-in for further ML development.

Once you complete the data acquisition and understanding stage, you start the third stage: machine learning modeling and evaluation. Course 2 was all about this phase, where we showed you how to implement several different kinds of machine learning models and discussed some of the issues with evaluation. In this stage, we select the appropriate learning algorithms and prepare the dataset to be compatible with the machine learning algorithm. Feature engineering is extensive in this phase. Of course, you're going to evaluate the model to see if it's addressing the defined machine learning problem at an acceptable level of performance. Refining the model by identifying the best parameters for each of the learning algorithms, that is, hyperparameter tuning, also happens at this stage. Identifying the evaluation metric is important. As we've discussed, evaluation has two different aspects: technical and business assessment. This stage is the most purely technical, so you can focus on the technical side of model evaluation. An obvious deliverable for this stage is your actual model, the one that performs best according to the criteria you decided on. But again, communication is important as well.
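The steps above, holding out a test set, tuning hyperparameters, and scoring against a chosen metric, can be sketched with scikit-learn. The synthetic dataset and the particular model and grid are stand-ins, not course-prescribed choices; what matters is the shape of the workflow: split once, tune only on the training data, and touch the test set only for the final technical assessment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in learning data; in a real project this comes out of your data pipeline.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out a test set once, before any modeling decisions are made.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameter tuning on the training set only, via cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="accuracy",  # the evaluation metric you identified for the problem
    cv=5,
)
search.fit(X_train, y_train)

# The held-out test set is used only for the final technical assessment.
test_score = search.score(X_test, y_test)
print(search.best_params_, round(test_score, 3))
```

Recording the grid you searched and the scores you observed, not just the winning model, is exactly the kind of experimental documentation this stage should produce.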
Think about all the things you had to settle to build that best model: the feature engineering, the experimental code, your explorations of the mistakes the model makes, and the differences between algorithms. There's valuable information there. You might want to consider running and reporting explicit sensitivity tests at this stage. This is your chance not only to find the best model to answer your target question, but also to understand how and why it performs the way it does. Even if explainability wasn't a specific criterion for your model, you still need to understand when the model fails.

When you're setting milestones, don't just think, "I must have a perfect model by this specific time." Think of insights that are relevant whether or not the model succeeds. For example, has the complicated model improved much over a baseline? What are those baselines? Sometimes you find out that the heuristic you're currently using is just as good as the most complicated neural network. That's a valuable insight.

After iterating through the process of modifying, refining, tuning, and testing multiple times, we arrive at a model we're happy with. Then we move on to the last stage, delivery and acceptance. As we saw earlier in this course, this is the stage where we confirm that the ML solution is addressing the main business problem. You'll be referring back to your documentation from phase 1, but your understanding of the problem and the model has necessarily evolved. Having a conversation with the client or end user at this point is important to understand whether the business problem is adequately addressed. If you're developing an ML system for a client, you should document what you built and how you built it. Also, train and educate the stakeholders who are going to use and maintain the system before the handover. Package up all those scattered notes, code comments, and repository notations into one clean document.
Even when you're building a system for internal use, the delivery and acceptance phase is very important. It's rare that the exact people who built the model are the only ones who will be using it, and we always forget more than we think we will. So documentation is key.

People new to machine learning often think that building an ML system is a simple linear process, but this is usually not the case. That's why we chose to include "cycle" as part of the name of our framework. Remember, the four phases have to happen in the order we just saw, but you often have to double back and sometimes even restart from the beginning. This picture shows how frequently, in reality, you might have to apply the brakes and go back to one of the previous stages, or to units in the same stage. This switch to a different stage or unit is what we call a lifecycle switch. A lifecycle switch forces you to revisit some of the modules that you've already done because, remember, it's all about answering the original business problem. If any changes happen midway, there's a high chance that the other components you've already visited will have to change. Use lifecycle switches to your advantage by documenting each stage as you go through it and making the reason for each switch explicit.

At first, you'll find lifecycle switches are often prompted by insufficient work in the previous stages, with poorly defined questions and misunderstood data being the two most common causes. But as you gain experience, you'll find the reasons for lifecycle switches get more sophisticated. Lots of things can cause a lifecycle reset, and not all of them are possible to anticipate. Business objectives might change, we uncover problems with our data's distribution or sufficiency, the exploration goes off in the wrong direction or gets fixated on the wrong aspect, or the performance just plain isn't good enough.
Monitoring the nature of lifecycle switches can give you a measure of how your business is progressing through the machine learning adoption stages. So always take note. As I said back at the beginning, developing a machine learning solution for a business problem is an iterative and exploratory process, and it requires a few back-and-forth hops between different stages of the MLPL. Yes, it can be discouraging to see the unpredictability of the process, but that's research for you. If it were easy, you'd already be able to buy it off the shelf. Understanding these stages, along with their iterative and sometimes unpredictable nature, is key to setting the right expectations among stakeholders. With care and documentation, you can benefit from every stage and use this process to move your organization into the future.