I was sitting in my office, talking on the phone. In the hallway, I could hear the faint chit-chat of a few colleagues. Meanwhile Marley, the energetic and very open person I was talking to, was eager to tell me this: "I have finally persuaded my management to change the implementation of the policy we talked about. Remember that our ANOVA showed that the underlying factor had absolutely no effect on our KPIs? I'm so happy that we found this and that the plans have changed."
So why am I telling you this? Well, this is one of the many times I have realized that a tool as basic and intuitive as ANOVA can really make a difference. It helps people understand how something works, or how it doesn't, and use that knowledge to make better-informed decisions.
In this series of three videos you will learn how to perform such an ANOVA. ANOVA stands for analysis of variance. In this first video I will explain the basics. In the second video I will show you how to interpret the outcome. And in the third and last video I will show you how to validate the outcomes.
Remember that you saw this diagram before, in the introduction video on data analysis. This diagram helps you select the appropriate tool for the problem at hand. So when do we use ANOVA? We see that you have to look at the type of data: ANOVA is the appropriate statistical technique when the Y variable is numerical and the X variable is categorical.
Let me start with an example that illustrates when to use the ANOVA technique. Imagine you work in a factory producing coffee and you are in charge of the quality inspections. The raw coffee beans are roasted before they are ground and packed. One of the important quality metrics is the moisture content of the coffee beans. If there is too much moisture, the coffee will rot very quickly. If there is too little moisture, the coffee will not taste good.
As a quality inspector, one of the problems you have noticed is that this moisture percentage is not always within specifications. How would you solve this problem? Well, you start a project and at one point you have an idea. There are four different machines in the factory that make this coffee. Could it be that the machine influences the moisture content of the coffee?
In order to answer this question, you first collect some data. From each machine, you collect ten batches of coffee and, for each batch, measure the moisture percentage. You also record which machine the batch was made on. This is the collected data. What will be your next step?
This is a graph of the data. It shows the moisture percentage per machine. What do you see? There appears to be a difference between the machines. More specifically, coffee produced on machine 1 shows a higher moisture content, on average at least, than coffee produced on machine 2. And this is exactly how ANOVA will turn out to work. It makes groups according to your categorical influence factor, and it tests whether these groups have equal means or not.
Note that we have only measured ten data points per machine. Is this enough to conclude that there are differences between these machines in the whole population? Suppose that we take a second sample tomorrow. Do you expect machine 1 to still give the highest moisture percentage? In other words, can we generalize these conclusions to the entire population of batches? To answer this question, we will perform a statistical analysis that generalizes our findings from the sample.
But which statistical analysis should you use? For that, you have to think in terms of Y and X variables. The Y variable is the variable you wish to understand or explain. In a Six Sigma project, this is nearly always your CTQ. In this example it is the moisture percentage. Now look at your X variable. That is the variable that potentially influences your Y. In our example, that is the machine.
Now ask yourself, what type of variables are these? Well, your Y variable, the moisture percentage, is numerical. And your X variable, machine, is categorical. So use the tree diagram to see which analysis method you should use, and here that is ANOVA. So, ANOVA is used to study the relationship between a numerical Y variable, moisture percentage, and a categorical X variable, machine.
ANOVA can be performed in three steps. The first step is to organize your data. The X and Y variables should each have their own column. Each row in your dataset needs to contain one unit, a batch in our example. The dataset you saw was not in this format: each column contained the ten measurements from one machine. If your dataset is already in the correct format, you can obviously skip this step.
In Minitab you can stack your data. In the Data menu, under Stack, you have the possibility to stack columns. You enter the columns you want, and the result should look like this. We see that in our example we have two columns with 40 units. Each row now contains a batch that we have measured, and moisture percentage and machine are the variables in the columns.
With data structured like this, we can go to step two and actually perform the ANOVA. The third step consists of an assumption check, because after performing an ANOVA we need to verify the reliability of the results. Steps two and three are discussed in the next two videos.
Okay, let's summarize. ANOVA is a suitable statistical method if you are comparing means across various groups. The analysis consists of three steps: organizing your data, performing the ANOVA, and performing a residual analysis to check whether the assumptions are met.
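For those working outside Minitab, the stacking in step one can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the moisture values below are made up (three per machine rather than the ten in the video), and the names wide, stack and group_means are hypothetical helpers, not part of any tool mentioned here.

```python
# Hypothetical wide-format data: one column (list) per machine.
# The moisture percentages are invented for illustration only.
wide = {
    "Machine 1": [12.1, 11.8, 12.4],
    "Machine 2": [10.9, 11.2, 11.0],
    "Machine 3": [11.5, 11.7, 11.4],
    "Machine 4": [11.9, 12.0, 11.6],
}


def stack(wide_table):
    """Turn one-column-per-machine data into one row per batch."""
    rows = []
    for machine, values in wide_table.items():
        for moisture in values:
            rows.append({"machine": machine, "moisture": moisture})
    return rows


def group_means(rows):
    """Average the Y variable (moisture) within each level of X (machine)."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["machine"], (0.0, 0))
        totals[row["machine"]] = (total + row["moisture"], count + 1)
    return {machine: total / count for machine, (total, count) in totals.items()}


long_rows = stack(wide)          # two variables, one batch per row
means = group_means(long_rows)   # the group means that ANOVA compares

for row in long_rows[:4]:
    print(row["machine"], row["moisture"])
for machine, mean in means.items():
    print(f"{machine}: mean moisture {mean:.2f}")
```

Each record in long_rows is one batch with its X value (machine) and its Y value (moisture), which is the layout step two expects. The group means are printed only to show what is being compared; whether their differences generalize beyond the sample is the question the ANOVA itself answers.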