To fit an AR model to real data, we need to determine the order of the AR process. One possible way is to repeat the analysis for different values of the model order p and choose the best model based on some criterion. The two most widely known and used criteria are the so-called Akaike information criterion, AIC, and the Bayesian information criterion, BIC. We now describe how to select the order of an AR process using AIC or BIC.
Suppose we have data $y_{1:T}$. Since we are comparing models with different numbers of parameters, we do so based on a common sample size. Thus, we fix a maximum order $p^*$, and we consider only $p \le p^*$. When we compare models of various orders $p \le p^*$, we do so using the latter $n = T - p^*$ observations of the time series. For any order $p \le p^*$, the AR(p) model can be written as $y = F^\top \phi + \epsilon$. Here $y$ is our data, running from $y_{p^*+1}$ to $y_T$.
Now the design matrix $F$. The first column is the design vector corresponding to the first term, $y_{p^*+1}$. It starts from $y_{p^*}$, then $y_{p^*-1}$, and so on down to $y_{p^*-p+1}$. Remember we are fitting an AR(p) process, so the first column should have p elements. That's why we end at $y_{p^*-p+1}$. Similarly, the second column is $y_{p^*+1}, y_{p^*}, \ldots, y_{p^*-p+2}$. For the last column, we have the design vector for $y_T$, which is $y_{T-1}, y_{T-2}, \ldots, y_{T-p}$. There are p rows and $T - p^*$ columns in the F matrix, so F is a p by n matrix. The AR coefficient vector $\phi$ is just $(\phi_1, \phi_2, \ldots, \phi_p)$. The error term $\epsilon$ follows a normal distribution with mean zero and variance matrix $\nu$ times the identity matrix, that is, $\epsilon \sim N(0, \nu I_n)$. We now build on the notation above.
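For reference, the setup described so far can be written out in matrix form. This is a reconstruction from the spoken description, not a copy of the slide; the bold symbols and the subscript on $I_n$ are notational choices made here.

```latex
\begin{aligned}
\mathbf{y} &= F^{\top}\boldsymbol{\phi} + \boldsymbol{\epsilon}, \qquad
\boldsymbol{\epsilon} \sim N(\mathbf{0}, \nu I_n), \qquad n = T - p^{*}, \\
\mathbf{y} &= (y_{p^{*}+1}, \ldots, y_{T})^{\top}, \qquad
\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)^{\top}, \\
F &= \begin{pmatrix}
y_{p^{*}}     & y_{p^{*}+1}   & \cdots & y_{T-1} \\
y_{p^{*}-1}   & y_{p^{*}}     & \cdots & y_{T-2} \\
\vdots        & \vdots        &        & \vdots  \\
y_{p^{*}-p+1} & y_{p^{*}-p+2} & \cdots & y_{T-p}
\end{pmatrix} \in \mathbb{R}^{p \times n}.
\end{aligned}
```

Each column of $F$ holds the $p$ lagged values that predict one entry of $\mathbf{y}$, which is why only the number of rows changes when $p$ changes.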
The conditional likelihood is $f(y_{(p^*+1):T} \mid \phi, \nu, y_{1:p^*})$, that is, the density of the latter observations conditional on the AR coefficients $\phi$, the variance parameter $\nu$, and the first $p^*$ observations. This is a multivariate normal distribution with mean $F^\top \phi$ and variance $\nu$ times the identity matrix, where $F$ and $\phi$ are as defined above. If we plug in the density function of the multivariate normal, we can expand this as $(2\pi\nu)^{-n/2}$ times the exponential of a fraction with $-2\nu$ in the denominator; as for the numerator, we have $(y - F^\top\phi)^\top (y - F^\top\phi)$.
Now we use a trick you may have seen in any linear regression class. Define $\hat{\phi} = (F F^\top)^{-1} F y$. We can expand the term $(y - F^\top\phi)^\top (y - F^\top\phi)$ by subtracting $F^\top\hat{\phi}$ and adding it back. This is equal to $(y - F^\top\hat{\phi} + F^\top\hat{\phi} - F^\top\phi)^\top (y - F^\top\hat{\phi} + F^\top\hat{\phi} - F^\top\phi)$. Here we just subtract $F^\top\hat{\phi}$ and add it back, so the right-hand side equals the left-hand side. If we consider the two parts separately, this further equals $(y - F^\top\hat{\phi})^\top (y - F^\top\hat{\phi}) + (\hat{\phi} - \phi)^\top F F^\top (\hat{\phi} - \phi)$. The cross-product term vanishes because of the orthogonal projection property. Let us call the first part R.
The definitions of AIC and BIC are as follows. AIC equals 2 times p, which is the order of the AR process, plus the number of observations n times the log of R over n minus p: $\mathrm{AIC} = 2p + n \log(R/(n-p))$. BIC is defined as log n times p plus the same term: $\mathrm{BIC} = \log(n)\,p + n \log(R/(n-p))$. Notice that the second term of AIC and BIC is the same. It is a measure of goodness of fit. The difference is only in the first term, which serves the role of penalizing models with too many parameters.
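The step where the cross-product term drops out, and the quantities that follow from it, can be written compactly as below. This is a reconstruction of the spoken derivation under the assumption that $FF^{\top}$ is invertible, which the definition of $\hat{\phi}$ already requires.

```latex
\begin{aligned}
F\,(\mathbf{y} - F^{\top}\hat{\boldsymbol{\phi}})
  &= F\mathbf{y} - FF^{\top}(FF^{\top})^{-1}F\mathbf{y} = \mathbf{0}, \\
(\mathbf{y} - F^{\top}\boldsymbol{\phi})^{\top}(\mathbf{y} - F^{\top}\boldsymbol{\phi})
  &= R + (\hat{\boldsymbol{\phi}} - \boldsymbol{\phi})^{\top} FF^{\top}
     (\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}), \qquad
R = (\mathbf{y} - F^{\top}\hat{\boldsymbol{\phi}})^{\top}
    (\mathbf{y} - F^{\top}\hat{\boldsymbol{\phi}}), \\
\mathrm{AIC} &= 2p + n \log\!\left(\frac{R}{n-p}\right), \qquad
\mathrm{BIC} = \log(n)\,p + n \log\!\left(\frac{R}{n-p}\right).
\end{aligned}
```

The first line is the orthogonality property: the residual $\mathbf{y} - F^{\top}\hat{\boldsymbol{\phi}}$ is orthogonal to every row of $F$, so the cross term $2(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi})^{\top} F (\mathbf{y} - F^{\top}\hat{\boldsymbol{\phi}})$ is zero.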
Values of p leading to small AIC and BIC values are taken as indicative of relatively good model fits within the class of AR models so explored. Larger values of p will tend to give smaller variance estimates, which decreases the goodness-of-fit term in both expressions. But this decrease is penalized for parameter dimension by the first term. BIC tends to choose simpler models than AIC.
We simulate an AR process of order 2 and implement the AIC and BIC criteria to check whether the best model selected has order 2. Firstly, this code simulates an AR(2) process with 100 observations. The process is simulated with this formula, and it is shown here. Now, we fix $p^*$ at 15, and since capital T is 100, we will use the latter 85 observations for the analysis. We will compute and plot both AIC and BIC for values of p ranging from 1 to 15. In terms of coding, we set up n.all, which is capital T in our notation, p.star, y, n, and p at the beginning.
Now, let's take a look at the formulas for AIC and BIC. They require the calculation of R, which has this form. To calculate R, we need $\hat{\phi}$. If we plug the form of $\hat{\phi}$ into R, then R depends only on the design matrix F and our data y. In addition, the design matrix F is the only term that changes with the AR order p. The data y is always fixed because, no matter which order we use, we always use the latter $T - p^*$ observations as our data. The key step is to get the design matrix F for each p. This is done by the function called design.matrix. It defines the design matrix row by row. For the i-th row, from this form of F, the row starts at index $p^* - i + 1$ and has length n, so the last index is the start plus $n - 1$. For example, let's do a simple check here. If we want to define the first row, that is i equals 1, then the start is $p^* + 1 - 1$, which is just $p^*$. The last index is $p^* + n - 1$.
Remember $p^* + n$ is just capital T, so the last index is $T - 1$. Therefore, the first row is y from index $p^*$ to $T - 1$, which is exactly the first row of F we have here.
After we have the design matrix, the remaining calculation for AIC and BIC is quite straightforward. For each order p.cur, we first get the corresponding design matrix using the function we just defined, then calculate $\hat{\phi}$, R, AIC, and BIC using these formulas and this code. Finally, the function returns the AIC and BIC values corresponding to the order p.cur. We apply this function to all values of p we want to consider, which is from 1 to 15 in this case. The result is plotted here. For both AIC and BIC we prefer the model with the smallest value of the criterion, so from the plot we will use p equals 2, which is the same as the order we used to simulate the data.
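The whole procedure can be sketched in a few lines. The identifiers mentioned in the lecture (n.all, p.star, design.matrix) suggest the original code is in R; what follows is a plain-Python sketch, not the lecture's code. The AR(2) coefficients 0.9 and -0.5, the noise standard deviation 1, the seed, and the burn-in are assumptions, since the transcript does not state the simulation formula. The names `simulate_ar2`, `solve`, and `aic_bic` are hypothetical helpers introduced here.

```python
import math
import random


def simulate_ar2(n_all, phi1, phi2, sd, seed=0, burn_in=200):
    """Draw n_all values from y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t, e_t ~ N(0, sd^2)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(burn_in + n_all):
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, sd))
    return y[-n_all:]  # drop the burn-in so the start-up zeros do not matter


def design_matrix(y, p, p_star):
    """Return F as p rows of length n = len(y) - p_star.

    Row i (1-based) holds y_{p*-i+1}, ..., y_{T-i} in the lecture's 1-based
    notation; y is a 0-based list, so that row starts at index p_star - i.
    """
    n = len(y) - p_star
    return [y[p_star - i:p_star - i + n] for i in range(1, p + 1)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, k))
        x[r] = (M[r][k] - s) / M[r][r]
    return x


def aic_bic(y, p, p_star):
    """Return (AIC, BIC) for an AR(p) fit to the last len(y) - p_star observations."""
    n = len(y) - p_star
    F = design_matrix(y, p, p_star)
    y_fit = y[p_star:]  # y_{p*+1}, ..., y_T
    FFt = [[sum(a * b for a, b in zip(F[i], F[j])) for j in range(p)] for i in range(p)]
    Fy = [sum(a * b for a, b in zip(F[i], y_fit)) for i in range(p)]
    phi_hat = solve(FFt, Fy)  # (F F^T)^{-1} F y
    resid = [y_fit[t] - sum(phi_hat[i] * F[i][t] for i in range(p)) for t in range(n)]
    R = sum(e * e for e in resid)
    fit = n * math.log(R / (n - p))  # goodness-of-fit term shared by both criteria
    return 2 * p + fit, math.log(n) * p + fit


n_all, p_star = 100, 15  # T and p* as in the lecture
y = simulate_ar2(n_all, phi1=0.9, phi2=-0.5, sd=1.0)  # hypothetical parameters
criteria = {p: aic_bic(y, p, p_star) for p in range(1, p_star + 1)}
best_aic = min(criteria, key=lambda p: criteria[p][0])
best_bic = min(criteria, key=lambda p: criteria[p][1])
print("order chosen by AIC:", best_aic, "| order chosen by BIC:", best_bic)
```

Whether both criteria pick order 2 depends on the simulated draw, so the sketch prints the selected orders instead of assuming them. Because the BIC penalty log(n) is larger than 2 when n is 85, the order chosen by BIC in this sketch can never exceed the order chosen by AIC.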