In this session, we discuss a specific application of the instrumental variable approach introduced in the previous session. We employ the dataset 146 cities digital service.xlsx. This dataset collects information on 146 cities in the US, identified by the variable code. We have data on the population of the city, the income per capita of the city, an index between 1 and 5 on how easy it is to open a new business in the city, the share of the city's population with a university degree, the yearly revenue from consultancy in digital services in thousand dollars, and the average price of a unit of digital consultancy service in the city (for example, one day of a team of four people). Once again, these data have been made up for the purpose of this example, and they are not representative of any real situation. They are only used to discuss and illustrate our concepts. Suppose that a firm wants to set up a new digital consulting service and would like to know how much demand reacts to price. After you bring the data into the Stata environment, you could run an OLS regression: reg R_digital pop ypcap edupop price. However, price is endogenous. R_digital is a measure of the demand for digital consultants in the city and, in this respect, price affects demand. Thus, a regression with R_digital as the dependent variable and price as an independent variable will estimate the demand for these services as a function of their price. But price is also the choice variable of the firms, which increase price if they see higher demand. This is the source of endogeneity: price affects demand, but demand could also affect the price. It's a standard reverse causality situation. Since R_digital as a function of price and our controls is a demand relation between digital services and their price, a natural excluded instrument would be something that affects supply and not demand.
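The loading step and the baseline OLS regression described above can be sketched as follows. This is only a minimal sketch: it assumes the spreadsheet sits in the current working directory, that its first row holds the variable names, and that the variables are called R_digital, pop, ypcap, edupop and price as in the lecture. The file extension is assumed to be .xlsx.

```stata
* Load the spreadsheet (file name taken from the lecture; the .xlsx
* extension and the header row are assumptions)
import excel using "146 cities digital service.xlsx", firstrow clear

* Take a quick look at the variables before estimating anything
describe
summarize R_digital pop ypcap edupop price

* Baseline OLS: demand on price plus controls.
* Price is endogenous here, so its coefficient should not be
* read as the causal effect of price on demand.
regress R_digital pop ypcap edupop price
```

Note that the variable names are separated by spaces, not commas: in Stata, anything after a comma is read as an option.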
Ease of business is then a good excluded instrument because it affects supply, by increasing the number of new firms that enter the market, but not demand. By increasing competition, ease of business reduces price and influences demand through the price, with no direct impact on demand itself. In Stata, the command for two-stage least squares is ivreg2 or simply ivreg. The format is to write the regression as in the standard reg command, using ivreg2 or ivreg as shown here. The endogenous variable goes in parentheses, followed by an equals sign and the name of the excluded instrument. If you add the option first after a comma at the end of the regression line, the output also reports the first-stage regression. If you run this regression as an exercise, you will see that the coefficient of price in the second-stage regression is negative and precisely estimated, with a low p-value. You will also see that ease of business has a positive and statistically significant impact on price in the first-stage regression. The p-value of ease of business in the first stage is just below 5%, which is acceptable, even though strong instruments normally require stronger correlations. You can run a standard OLS regression for comparison and show that OLS implies a less negative impact of price. Thus, the causal effect is more negative than implied by the OLS estimate. As an exercise, try the log-log regression. You will have to transform the dependent variable and price in logs. You can also choose to transform in logs the excluded instrument, ease of business, or any other independent variable. Just recall that with a log dependent variable, the coefficient of a log independent variable is an elasticity: approximately the percentage change in the dependent variable produced by a 1% change in the independent variable.
If the independent variable is in levels, the coefficient multiplied by 100 is approximately the percentage change in the dependent variable from a one-unit change in the independent variable.
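The two-stage least squares regression, the OLS comparison and the log-log exercise discussed above can be sketched as follows. This is a sketch under assumptions, not the exact commands shown in the lecture. The name ease for the ease-of-business index is hypothetical, because the transcript does not give the variable name, and the ln_ names are chosen here only for illustration. ivreg2 is a user-written command that has to be installed first. ivregress 2sls is shown as a built-in alternative that follows the same syntax.

```stata
* ivreg2 is user-written; install it once with:
*   ssc install ivreg2

* 2SLS: price is endogenous, "ease" (hypothetical name) is the
* excluded instrument; the option first also prints the first stage
ivreg2 R_digital pop ypcap edupop (price = ease), first

* Built-in alternative with the same structure
ivregress 2sls R_digital pop ypcap edupop (price = ease), first

* First-stage diagnostics after ivregress, to judge instrument strength
estat firststage

* OLS for comparison with the 2SLS coefficient on price
regress R_digital pop ypcap edupop price

* Log-log exercise: transform the dependent variable and price
generate ln_R_digital = ln(R_digital)
generate ln_price = ln(price)

* The coefficient on ln_price is now read as a price elasticity;
* the controls stay in levels
ivreg2 ln_R_digital pop ypcap edupop (ln_price = ease), first
```

In the last regression, the coefficients of pop, ypcap and edupop follow the levels interpretation: multiplied by 100, they approximate the percentage change in revenue from a one-unit change in the control.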