If you recall, we have previously discussed that the main categories of machine learning tasks are classification, regression, cluster analysis, and association analysis. We have discussed classification in detail. Now let's look at the other categories, starting with regression. After this video you will be able to define what regression is, explain the difference between regression and classification, and name some applications of regression.

Before we talk about regression, let's review classification. In a classification problem the input data is presented to the machine learning model, and the task is to predict the target corresponding to the input data. The target is a categorical variable, so the classification task is to predict the category, or label, of the target given the input data. The classification example shown here is one we have seen before. The input variables are measurements such as temperature, relative humidity, atmospheric pressure, wind speed, wind direction, and so on. The task for the model is to predict the weather category associated with the input data. The possible values for the weather category are sunny, windy, rainy, and cloudy. Since we're predicting the category, this is a classification task.

With that context in mind, let's now discuss regression. When the model has to predict a numeric value instead of a category, the task becomes a regression problem. An example of regression is predicting the price of a stock. The stock price is a numeric value, not a category, so this is a regression task instead of a classification task. Note that if you were to predict not the actual price of the stock, but whether the stock price will go up or go down, that would be a classification task. That is the main difference between classification and regression: in classification you're predicting a category, and in regression you're predicting a numeric value. Here are some examples where regression can be used.
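Before turning to those examples, here is a minimal sketch of the distinction in code. The 1-nearest-neighbor rule, the feature values, and the targets below are all illustrative assumptions, not the course's data or algorithm; the point is only that the same kind of input can feed either a classifier (categorical target) or a regressor (numeric target).

```python
def nearest(sample, data):
    """Return the training row whose features are closest to `sample` (squared distance)."""
    return min(data, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], sample)))

# Features: (temperature, relative humidity) -- made-up values for illustration.
classification_data = [  # target is a category
    ((30.0, 0.20), "sunny"),
    ((22.0, 0.90), "rainy"),
    ((25.0, 0.60), "cloudy"),
]
regression_data = [      # target is a number, e.g. tomorrow's high in Celsius
    ((30.0, 0.20), 31.5),
    ((22.0, 0.90), 21.0),
    ((25.0, 0.60), 24.8),
]

sample = (29.0, 0.25)
label = nearest(sample, classification_data)[1]  # classification: predicts a category
value = nearest(sample, regression_data)[1]      # regression: predicts a numeric value

print(label)  # -> "sunny"
print(value)  # -> 31.5
```

The inputs are identical in both cases; only the type of the target, and therefore the type of prediction, changes.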
Forecast the high temperature for the next day, estimate the average housing price for a particular region, determine the demand for a new product, a new book for example, based on similar existing products, or predict the power usage for a particular power grid.

This is what the dataset might look like for the regression task of predicting tomorrow's high temperature. The input variables could be the high temperature for today, the low temperature for today, and the month, and the target is the high temperature for tomorrow. The model has to predict this target value for each sample. Recall that in a supervised task the target is provided, while for an unsupervised task the target is not available or not known. Since the target label is provided for each sample here, this regression task is a supervised one, similar to classification.

As with classification, building a regression model also involves two phases: a training phase in which the model is built, and a testing phase in which the model is applied to new data. The model is built using training data and evaluated on test data. Similar to classification, the goal in building a regression model is to have a model that performs well on training data as well as generalizes to new data.

The use of three different datasets that we have previously discussed also applies to regression. Recall that the three datasets are used as follows. The training dataset is used to train the model, that is, to adjust the parameters of the model to learn the input-to-output mapping. The validation dataset is used to determine when training should stop in order to avoid overfitting. And the test dataset is used to evaluate the performance of the model on new data.

In summary, in regression the model needs to predict the numeric value corresponding to the input data. Since a target is provided for each sample, regression is a supervised task. The target is always a numeric variable in regression.
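The workflow just described can be sketched as follows. The temperature values, the train/validation/test split sizes, and the use of an ordinary least-squares line are all illustrative assumptions made for this sketch; for brevity it uses a single input variable (today's high) rather than the three mentioned above, and it is not the specific algorithm covered in this course.

```python
# Rows are (today's high, tomorrow's high) in degrees Celsius -- made-up values.
data = [
    (20.0, 21.0), (22.0, 22.5), (25.0, 24.0), (27.0, 28.0), (30.0, 29.5),
    (18.0, 19.0), (24.0, 25.0), (26.0, 26.5), (21.0, 20.5), (28.0, 27.5),
]

# Split into the three datasets discussed above.
train, validation, test = data[:6], data[6:8], data[8:]

# Training phase: fit y = a*x + b on the training set with ordinary least squares.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in train)
     / sum((x - mean_x) ** 2 for x, _ in train))
b = mean_y - a * mean_x

def mse(rows):
    """Mean squared error of the fitted line on a set of (x, y) rows."""
    return sum((a * x + b - y) ** 2 for x, y in rows) / len(rows)

# The validation error would tell an iterative learner when to stop training
# to avoid overfitting; the test error estimates performance on new data.
print("validation MSE:", mse(validation))
print("test MSE:", mse(test))
```

Note that the prediction `a * x + b` is a number, not a label, which is exactly what makes this a regression task.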
In the next lecture, we will discuss a specific algorithm to build a regression model.