In this session, we discuss a more traditional approach to estimating causal effects: the instrumental variable approach. Consider a linear regression: y equals b times x plus an error. We learned that x should not be correlated with the error. If it is, we have an endogeneity problem. Endogeneity can arise from omitted variables, reverse causality, or other reasons; they all boil down to the fact that x and the error are correlated. For example, if you omit a variable correlated with both x and y, it is captured by the error term, which is then correlated with x. Similarly, reverse causality means that the true model has x as a function of y, so if we estimate y equals b times x plus error, the error affects x through y.

If x and the error are positively correlated, OLS overestimates the change in y with respect to x. Intuitively, you attribute to x a positive effect that comes at least in part from the error, because the error also increases x. If x and the error are negatively correlated, OLS underestimates the effect of x for symmetric reasons.

The solution is to find the so-called instrumental variables: variables correlated with x but not with the error. Another way to say it is that they should be correlated with y only through x. In other words, a variable z is a good instrument if it produces no direct change in y, only an indirect change through x. That is, z affects x, which affects y, but there is no effect that goes directly from z to y. Moreover, instruments must be strongly correlated with x, otherwise their power to estimate the effect is limited. Good instruments of this kind, strongly correlated with x but not with the error, are not easy to find. However, once you have such an instrument, you run two regressions.
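To make the bias concrete, here is a small simulation sketch (all numbers and variable names are illustrative, not from the lecture). The variable x is positively correlated with the error u, so OLS overestimates the true slope, while the simple IV ratio cov(z, y) / cov(z, x) recovers it:

```python
import random

random.seed(0)
n = 100_000
TRUE_B = 2.0  # true causal effect of x on y (illustrative value)

z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)                 # instrument: moves x, unrelated to u
    u = random.gauss(0, 1)                  # error term (unobserved confounder)
    xi = 0.8 * zi + u + random.gauss(0, 1)  # x is endogenous: correlated with u
    z.append(zi)
    x.append(xi)
    y.append(TRUE_B * xi + u)               # u also moves y directly

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

ols = cov(x, y) / cov(x, x)  # biased upward, since cov(x, u) > 0
iv = cov(z, y) / cov(z, x)   # instrumental variable estimate

print(f"OLS: {ols:.3f}  IV: {iv:.3f}  (true effect: {TRUE_B})")
```

Because x and the error are positively correlated here, the OLS slope comes out well above the true effect, while the IV ratio lands close to it.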
One, a first stage regression, whose dependent variable is the endogenous variable x and whose independent variables are the instrument z and the other independent variables, which we call controls. Two, a second stage regression, whose dependent variable is y and whose independent variables are x and the controls of the first stage regression, but not the instrument, which is therefore called the excluded instrument. The estimation method is two-stage least squares. Intuitively, this method estimates the first stage regression, finds the predicted value of x, and uses this predicted value as the independent variable x in the second stage regression. The method adjusts the distribution of the estimator to take this process into account, which is why you cannot simply estimate the first stage, obtain the predicted x, and run a second OLS of the second stage regression using the predicted x as the independent variable: the point estimate would be right, but the standard errors would not. As we will see below, there are standard packages, including in Stata, for running two-stage least squares regressions.

You could have two or more endogenous variables in a regression. In that case, you need at least as many instruments as endogenous variables. Since instruments are hard to find, this often becomes complicated in practice. More generally, the difference between field experiments and instrumental variables is that, as we said, good instrumental variables for the specific causality you want to nail down are hard to find. A field experiment, instead, enables you to design the experiment you want for the goals you want. This is why, especially today, with wide availability of data and contexts for experiments, including digital ones, field experiments can be very effective managerial tools for making decisions, and we strongly encourage their use.

A famous example of the instrumental variable approach is a paper by Morten Bennedsen from INSEAD and colleagues, published in the Quarterly Journal of Economics in 2007.
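The two stages can be sketched by hand on simulated data (a minimal illustration with made-up numbers; in real work you would use a packaged 2SLS routine, such as Stata's ivregress, precisely because the manual second stage gets the standard errors wrong even though the point estimate matches):

```python
import random

random.seed(1)
n = 50_000
TRUE_B = 1.5  # illustrative true effect

z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous
y = [TRUE_B * xi + ui for xi, ui in zip(x, u)]

def slope(a, b):
    """OLS slope of b on a (single regressor with intercept)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = sum((ai - ma) ** 2 for ai in a)
    return num / den

# First stage: regress x on z, form the predicted values x_hat.
gamma = slope(z, x)
mx, mz = sum(x) / n, sum(z) / n
x_hat = [mx + gamma * (zi - mz) for zi in z]

# Second stage: regress y on x_hat; the slope is the 2SLS point estimate.
b_2sls = slope(x_hat, y)
print(f"2SLS estimate: {b_2sls:.3f}  (true effect: {TRUE_B})")
```

Algebraically, the slope of y on the predicted x_hat collapses to cov(z, y) / cov(z, x), which is why the two-step point estimate equals the IV estimate.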
They studied whether firms that appoint new CEOs from the same family as the old CEO, as sometimes happens in family-controlled firms, perform worse than firms that appoint new CEOs from outside the family. Running a regression with performance as the dependent variable and a dummy for whether the new CEO comes from the same family as the old CEO has natural endogeneity problems. For example, a worse-performing firm may not be able to attract good external CEOs, or there could be other reasons. We are not interested in the specific source of this endogeneity; as long as we see a potential endogeneity problem, we need to address it.

The paper uses the gender of the first-born child of the CEO as the excluded instrument. This is an excellent, canonical example of what we should look for in an excluded instrument. The gender of the first-born child of the CEO cannot be chosen by the decision maker, and choice by the decision maker is a natural source of endogeneity. It has no direct effect on the performance of the firm. However, it may affect the choice of appointing the CEO from the family: for example, sociological or psychological reasons suggest that families are less likely to appoint women as CEOs. We are not interested in why the excluded instrument is correlated with the endogenous variable, only in the fact that it is correlated. And we can check this correlation by looking at the statistical significance of the coefficient of the excluded instrument in the first stage regression.

The paper uses two-stage least squares to estimate the two regressions, as shown here. The first stage results indicate a strong correlation between the first-born gender instrument and the appointment of a family CEO. In the table taken from the paper, two asterisks denote that the p-value of the corresponding coefficient is smaller than 5%, and three asterisks that it is smaller than 1%.
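With a single binary instrument like first-born gender, the IV estimate reduces to the Wald ratio: the difference in mean performance between the two instrument groups divided by the difference in family-CEO rates between those groups. Here is a sketch on simulated data (all coefficients are made up for illustration; they are not the paper's estimates):

```python
import random

random.seed(2)
n = 200_000
TRUE_B = -1.4  # made-up negative effect of a family CEO on performance

z, f, y = [], [], []
for _ in range(n):
    zi = 1 if random.random() < 0.5 else 0  # instrument: first-born is male
    u = random.gauss(0, 1)                  # unobserved firm quality
    # Family succession is more likely when the first-born is male,
    # and (in this simulation) when unobserved quality is high -> endogeneity.
    fi = 1 if 0.7 * zi + 0.5 * u + random.gauss(0, 1) > 0.8 else 0
    z.append(zi)
    f.append(fi)
    y.append(TRUE_B * fi + u + random.gauss(0, 0.5))  # firm performance

def group_mean(v, g, val):
    sel = [vi for vi, gi in zip(v, g) if gi == val]
    return sum(sel) / len(sel)

# Wald estimator: reduced-form difference over first-stage difference.
first_stage = group_mean(f, z, 1) - group_mean(f, z, 0)
reduced_form = group_mean(y, z, 1) - group_mean(y, z, 0)
wald = reduced_form / first_stage

# Naive OLS-style comparison: family vs non-family mean performance.
naive = group_mean(y, f, 1) - group_mean(y, f, 0)

print(f"first stage: {first_stage:.3f}  Wald (IV): {wald:.3f}  naive: {naive:.3f}")
```

In this simulation the naive comparison is less negative than the IV estimate, because higher-quality firms are (by construction) more likely to keep a family CEO — the same direction of bias the lecture describes for the OLS estimates.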
It is easy to see that the p-values on the gender of the first-born child are all statistically very robust in the first stage family CEO regression. This suggests that the instrument is strong. In the second stage, the paper studies the impact of the endogenous variable, family CEO. You can see from the table, again taken from the paper, that the coefficient on family CEO in the second stage regression is negative and statistically robust. This suggests that appointing a CEO from the family has a negative impact on performance.

The table reports both the OLS and the instrumental variable estimates, so that we can compare them. In general, the OLS estimate produces a less negative and less stable coefficient than the instrumental variable estimate. This suggests that controlling for endogeneity produces a stronger negative effect: the impact of a family appointment appears to be more detrimental to performance than what we would predict from OLS. Moreover, this impact is causal. This means that, other things being equal, if we appoint a non-family CEO, average firm performance increases. There may be different reasons why firms want to appoint a family CEO, for example to keep control, and we are not here to judge those calls. However, based on the results of the study, appointing a non-family CEO increases financial performance.