This video introduces regression analysis and discusses its application to smart city data. First, what is regression? It's a technique used for modeling and analyzing numerical data. That is, it explores the relationship between two or more variables. The purpose is to gain information about one of them, which we call the target variable or dependent variable, and this is done knowing the values of the other variables, also known as explanatory variables or independent variables. Now, the important thing about regression is that the target variable should be continuous, that is, it can take any real-number value. Explanatory variables, on the other hand, can be either discrete or continuous. There is no restriction on that.
What is regression analysis used for? One application is estimation. That is, we want to find the value of the dependent variable for unobserved values of the independent variables. Another is the prediction task, also known as extrapolation. Here, we estimate the value of the dependent variable in the future, so one of the independent variables is time. An example is the weather forecasting problem, in which we predict the maximum temperature or humidity for the next seven days.
Another application of regression is hypothesis testing, in which we can hypothesize about the value of an individual regression parameter. For example, we can say that a parameter has a value of, say, 0.7, and then test whether that is true or false. We can also test the significance of the regression, where the hypothesis states that none of the variables has a linear effect on the response; rejecting it tells us that the dependent variable varies linearly with one or more of the independent variables. Regression is also helpful in modeling causal relationships. That is, it helps to determine which independent variables cause changes in our dependent variable.
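To make the estimation and prediction uses described above concrete, here is a minimal Python sketch. The temperature readings are made-up values, the straight-line model is an assumption chosen for simplicity, and the helper name `predict_temp` is hypothetical; the fit uses NumPy's `polyfit`, which performs a least-squares fit.

```python
import numpy as np

# Hypothetical daily maximum temperatures (deg C); day 3 was not observed.
days = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0])
temps = np.array([20.0, 21.0, 22.0, 24.0, 25.0, 26.0])

# Least-squares fit of a straight line: temp = slope * day + intercept.
slope, intercept = np.polyfit(days, temps, deg=1)


def predict_temp(day):
    """Evaluate the fitted line at the given day (hypothetical helper)."""
    return slope * day + intercept


# Estimation: an unobserved day inside the observed range.
estimate_day3 = predict_temp(3.0)

# Prediction (extrapolation): a future day beyond the observed range.
forecast_day8 = predict_temp(8.0)

print(f"estimated day 3: {estimate_day3:.2f}")
print(f"forecast day 8:  {forecast_day8:.2f}")
```

The same fitted function serves both tasks; the only difference is whether the queried day lies inside or beyond the range of the observed data.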
If we look at a small illustration using a figure, let's say a person is getting married: her number of husbands changes from 0 to 1 in a matter of one day, that is, from yesterday to today. But we cannot just use these two data points to extrapolate that by late next month she will have around four dozen husbands.
Now, coming to the application of regression to smart cities. One very important application is urban mobility management, that is, predicting which mode, or combination of modes, of travel a user is likely to choose. The choice depends on several factors, such as the time of travel, the travel purpose, the origin and destination, the weather conditions (is it sunny or rainy), and whether it's a weekday or a weekend. These choices impact the overall quality of life in cities, and modeling them helps in personalizing mobility services, in understanding the interaction between a city and its transportation system, and in predicting future traffic flow.
Another application is energy usage prediction, in which we can predict power usage at the customer and industry level. By aggregating these levels, we can find city-level power demand on a daily and seasonal basis. This helps in managing the power supply, which in turn helps in planning for future power demand.
One more application in the smart city domain is smart waste management, in which we can predict the amount of waste based on several factors and predict routes for garbage collection. Air quality management is another important application, in which we can predict seasonal pollutants, which helps in managing air, water, and soil pollution.
Looking at the mathematical formulation of regression, let's say we are given a dataset D, which consists of a set of observations. Each observation is a tuple (x_i, y_i), where x_i is the set of attributes for the ith observation and y_i is the target response, our dependent variable.
Regression aims to find the target function f such that y_i can be represented as f(x_i, Beta) plus Epsilon_i, where Beta denotes the parameters characterizing the role of x_i, that is, how y_i depends on x_i, and Epsilon_i represents the stochastic nature of the relationship between y_i and x_i, that is, the difference between the function value and the actual value. To get the best f, regression aims to minimize the total error. That is either the absolute error, which is the summation of the absolute value of y_i minus f(x_i), or the squared error, which is the summation of (y_i minus f(x_i)) squared.
The main challenge when performing regression analysis is that the function f is not known a priori. We do not know how the dependent variable changes with the various independent variables. For this, we can look at a scatter plot of the data for suggestions; based on whether the scatter plot looks like a line or a parabola, we can choose the form of f. We can also hypothesize about the nature of the underlying process. Sometimes we have too few or too many independent variables. That is, some variables included in the regression function may be unimportant and unnecessarily complicate the function, while at other times some important variable might be missed. There are also issues with measurement and systematic errors. Sometimes we also have too little or too much data, in which case either we are not able to fit any function or the noise becomes too much.
If we look at a small illustration, the data points that have been observed are shown by these black dots. We can fit either a straight line or a sinusoidal function. Depending on the model complexity and the amount of error we have, we need to make a judicious choice.
This video discussed regression analysis and its various applications to smart-city-generated data. Thank you.
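As a supplementary sketch of the error measures and the straight-line versus sinusoidal choice discussed in this video, the Python snippet below fits both models to the same points and compares their total absolute and squared errors. The data values, the small ripple standing in for noise, and the helper name `total_errors` are assumptions made for illustration, not part of the lecture. Both fits use ordinary least squares through NumPy's `lstsq`, so they minimize the squared error only; the absolute error is simply reported afterwards.

```python
import numpy as np

# Hypothetical observations: one period of a sine wave plus a small
# deterministic ripple standing in for measurement noise.
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x) + 0.1 * np.cos(5.0 * x)


def total_errors(y_true, y_pred):
    """Return (sum of absolute errors, sum of squared errors)."""
    residuals = y_true - y_pred
    return np.sum(np.abs(residuals)), np.sum(residuals ** 2)


# Model 1: straight line f(x) = b0 + b1 * x (two parameters).
line_design = np.column_stack([np.ones_like(x), x])
line_beta, *_ = np.linalg.lstsq(line_design, y, rcond=None)
line_abs, line_sq = total_errors(y, line_design @ line_beta)

# Model 2: sinusoid f(x) = b0 + b1 * sin(x) + b2 * cos(x) (three parameters).
sin_design = np.column_stack([np.ones_like(x), np.sin(x), np.cos(x)])
sin_beta, *_ = np.linalg.lstsq(sin_design, y, rcond=None)
sin_abs, sin_sq = total_errors(y, sin_design @ sin_beta)

print(f"line:     absolute={line_abs:.3f}, squared={line_sq:.3f}")
print(f"sinusoid: absolute={sin_abs:.3f}, squared={sin_sq:.3f}")
```

On data like this, the sinusoid gives the smaller squared error at the cost of one extra parameter, which is the kind of complexity versus error trade-off that the choice of f involves.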