In this section, we'll look at the basics of estimation for multiple Cox regression models, and, as we've done with other regression models and with all of our estimates in general, we'll talk about handling the uncertainty in the resulting estimates. We're going to extend the concept of partial likelihood, which we already saw in the context of simple Cox regression, to the estimation of a multiple Cox regression model. We're going to learn how to compute 95 percent confidence intervals for individual slopes from a Cox regression model, and these will be exponentiated to get confidence intervals for hazard ratios. We'll also explain how to do hypothesis tests for individual slopes, and both the confidence interval creation and the testing will work the same way we've been doing them all along. Then we'll discuss the likelihood ratio test, which we also saw for logistic regression, that allows for testing multiple slopes, and hence multiple hazard ratios, at once in a single model. This is useful for testing a multi-categorical predictor, whether it's the only predictor of interest or it's in a model with other predictors as well.

For estimating the Cox regression equation, the algorithm the computer uses is called, just as it was with simple Cox regression, partial maximum likelihood. If we have p x's in the model, the slope estimates beta one hat through beta p hat are the values that make the observed data on our time-to-event outcome, the sample we have, most likely among all possible choices for beta one hat through beta p hat. So you can compute how likely the observed sample is under a given set of choices for the slopes in the model, and the algorithm searches through different possibilities for the slope set until it finds the one that maximizes the likelihood of the given data. After the slopes are estimated by this partial likelihood approach, the baseline hazard function, the general shape of risk over time for the groups we're looking at based on their x values, is estimated via another process, which also generates estimates of the uncertainty in this estimated function. In general, this estimation always has to be done with a computer; it's complicated. We get standard error estimates for our slopes, and we also get estimates of the uncertainty in that baseline "intercept" function over time, which will be used when we put confidence limits on predicted survival curves.

The values chosen for the slope estimates are just estimates based on a single sample, and as with everything we've done, had we taken different samples from the same population and estimated the same model, the slope estimates that best fit the data in any given sample would vary across samples just by chance. So all regression slopes have an associated standard error that can be used to make statements about the true relationship between the time-to-event outcome and a given x based on a single sample. This method of partial likelihood, as I said before, yields standard error estimates for the slope estimates, and we can use an estimated slope and its standard error to create confidence intervals.
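To make this concrete, here is a minimal sketch of fitting a multiple Cox regression model in Python using the lifelines package. The file name and column names below are hypothetical stand-ins for the PBC variables, not the trial's actual variable names, and lifelines is just one of several packages that implement partial maximum likelihood.

```python
# Minimal sketch, assuming a data frame with hypothetical column names
# (time, death, and the predictor columns); not the actual PBC file.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("pbc_example.csv")  # one row per subject

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="death")

# Slope estimates (log hazard ratios) and their standard errors,
# obtained by maximizing the partial likelihood.
print(cph.params_)
print(cph.standard_errors_)
```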
So, the standard errors allow for the computation of 95 percent confidence intervals and p-values for the slopes, and as with all the other regressions we've looked at, the sampling distribution of a regression slope is approximately normal. So it's "business as usual" for getting 95 percent confidence intervals and doing hypothesis tests for the slopes. But again, the slope is on the log hazard ratio scale, so the confidence intervals created on the slope scale will ultimately be exponentiated to get confidence intervals for the adjusted hazard ratio estimates.

Just as we've done each time, it's good to remind you what this means. For any given slope, if we were to run the study over and over again and estimate the same Cox regression model with the same predictors on multiple random samples of the same size, and then make a histogram of the estimates of that slope across all of those samples, the distribution of the estimates would be approximately normal and would, on average, equal the truth, the unobservable true value of the adjusted association. We'll get one estimate from somewhere under this curve that describes the distribution of all possible estimates; it could be far out in a tail, it could be close to the truth, it may even land right on top of the truth, but we'll never know for sure. Because the distribution is normal, though, we know that most of the estimates we could get fall within two standard errors of the unknown truth. So for most of the estimates we could get, if we start by adding and subtracting two standard errors, the resulting interval will include the unknown truth. Same philosophy, same result that has been working every time we've talked about sampling variability and confidence intervals.

In terms of hypothesis testing, the null value on the slope scale for no association between a given predictor x sub i and the time-to-event outcome, after accounting for the other predictors, is that the slope for that predictor equals zero, which corresponds to a hazard ratio of one. We start by assuming that null is the truth, then measure how far our particular slope estimate is from zero. If it's far enough that less than five percent of results that could have occurred by chance under the null are as far or farther from zero than ours, we reject the null; otherwise we do not. It's the same story we've been working with for hypothesis testing since the first part of the course.

The intercept in a Cox model is a function of time; uncertainty estimates for the value of this function over time are computed and are used for putting confidence limits on predicted survival curves based on the Cox results. In a subsequent section, we'll show the idea of looking at predicted survival curves from a Cox regression, and we'll talk about those confidence intervals. But for any single slope, the process is as we've seen before: to get the 95 percent confidence interval for the slope, we take the estimate plus or minus two estimated standard errors, and then exponentiate the endpoints to get the 95 percent confidence interval for the corresponding adjusted hazard ratio.
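As a small illustration of that recipe, here is a sketch in Python. The function name is made up for this example, and it uses the lecture's "plus or minus two standard errors" rule; software typically uses 1.96 rather than 2.

```python
import numpy as np

def slope_and_hr_ci(beta_hat, se):
    """Approximate 95% CI for a Cox slope (log hazard ratio) and for the
    corresponding hazard ratio, using the 'estimate +/- two SEs' rule."""
    lo, hi = beta_hat - 2 * se, beta_hat + 2 * se
    # Exponentiate the endpoints to move from the log hazard ratio scale
    # to the (adjusted) hazard ratio scale.
    return (lo, hi), (float(np.exp(lo)), float(np.exp(hi)))
```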
In order to get a p-value for testing the null of no association between the time-to-event outcome and a particular predictor, when that predictor can be described or modeled by a single x, the null is that the slope for that predictor is zero at the population level, corresponding to an adjusted hazard ratio of one. The way we do this for each of our slopes is to take the estimated slope itself and divide by its standard error, which measures how far our estimate is from zero, the value we'd expect under the null, in terms of standard errors, and this distance is then translated into a p-value.

So, let's return to the Cox regression results for predictors of mortality in the PBC, or primary biliary cirrhosis, trial of the drug D-penicillamine, or DPCA. Let's look at the results for model 2, which includes as potential predictors the drug, the age of the patient at the time of enrollment in the study, their bilirubin level in milligrams per deciliter at the time of enrollment, and their biological sex. Here's the resulting underlying Cox regression model for these data. You could get these slopes from the previous table simply by taking the natural logarithm of each of the adjusted hazard ratio estimates presented for model 2. So the slope for treatment is the log of the adjusted hazard ratio for treatment, the log of 1.10, which is just 0.10, and so on. We have the age-quartile indicators, the slope for bilirubin, and the slope for sex, where sex is 1 for females. I'll just give you the standard errors for two of these slopes: the standard error for the slope of treatment, whose estimate was 0.10, is 0.19, and the standard error for the slope of bilirubin, beta five hat, equal to 0.15, is 0.013.

If we wanted a 95 percent confidence interval for the slope of treatment, in other words for the log of the adjusted hazard ratio of mortality for those in the DPCA group compared to those in the placebo group, we take the estimated slope, the log of the adjusted hazard ratio, and add and subtract two standard errors, and we get a confidence interval for the log of the hazard ratio that goes from negative 0.28 to 0.48. Notice that on the slope scale this includes the null value of zero, so we know the result is not statistically significant. But to present it on the hazard ratio scale, we first take our estimate of the log hazard ratio, 0.10, and exponentiate it to get an adjusted hazard ratio estimate of 1.10. Then, to get the confidence interval on the hazard ratio scale, we exponentiate the endpoints of the confidence interval for the slope, negative 0.28 and 0.48. This gives a confidence interval for the adjusted hazard ratio of treatment of 0.77 to 1.6. Similarly, we could do the same thing for the bilirubin association and all the others in the model, but I can only fit two on the slide here. We take the slope for bilirubin, 0.15, and add and subtract two estimated standard errors, giving a confidence interval for the true slope, the true log adjusted hazard ratio with bilirubin, of 0.124 to 0.176. Notice this confidence interval for the slope does not include the null value of zero. To put this on the adjusted hazard ratio scale, we first exponentiate the estimate, e to the 0.15, which is 1.16, and that's the adjusted hazard ratio; then we exponentiate the endpoints of the confidence interval for the slope to get the endpoints on the hazard ratio scale, a confidence interval of 1.13 to 1.19.
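Plugging the two quoted slopes and standard errors into the sketch function defined above reproduces these intervals, up to rounding (the lecture's 0.77 to 1.6 versus the 0.76 to 1.62 produced by the rounded inputs here).

```python
# Treatment (DPCA vs placebo): slope 0.10, SE 0.19
slope_ci, hr_ci = slope_and_hr_ci(0.10, 0.19)
# slope_ci ~ (-0.28, 0.48); hr_ci ~ (0.76, 1.62), the 0.77 to 1.6 quoted above

# Bilirubin (per 1 mg/dl): slope 0.15, SE 0.013
slope_ci, hr_ci = slope_and_hr_ci(0.15, 0.013)
# slope_ci ~ (0.124, 0.176); hr_ci ~ (1.13, 1.19)
```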
Now suppose we wanted to get a p-value. Let's just use beta one here, the slope for treatment, but we could do this for any of the other single slopes. Testing whether treatment is a statistically significant predictor of mortality after accounting for age, bilirubin, and sex, the null on the slope scale is that the true slope, the true log hazard ratio with treatment, is zero. On the exponentiated scale, the null would be that the exponentiated slope, the adjusted hazard ratio, is one. So again, we assume the null hypothesis is true and calculate the distance of our slope from zero in units of standard errors. Our estimated slope was 0.10 and our estimated standard error is 0.19, so we have a result that's 0.53 standard errors above what we'd expect the slope to be under the null, a slope of zero. We know that's not particularly far, and if we translate that distance into a p-value, it's a p-value of 0.6. That's consistent with what we saw with the confidence interval for the hazard ratio for treatment, which included one, so we knew the p-value would be greater than 0.05; now we see what that p-value actually is.
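Here is a sketch of that calculation using the normal approximation described above; the numbers are the ones quoted in the lecture.

```python
from scipy import stats

beta_hat, se = 0.10, 0.19            # treatment slope and its standard error
z = (beta_hat - 0) / se              # distance from the null slope of 0, in SEs: ~0.53
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the normal approximation
# p_value ~ 0.6, consistent with the hazard ratio CI that included 1
```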
Let's talk about age. You can see from the table that you could take the same approach: standard errors are given for each slope on the indicators for the three age quartiles that are not the reference, and we can get confidence intervals for each adjusted hazard ratio for these three particular age comparisons, 42 to 50 years versus the reference of less than 42 years, 50 to 57 years versus the same reference, and so on. But to be thorough, as with the other regressions we've done where a predictor is multi-categorical and hence modeled with multiple x's, in order to formally test whether age is statistically significantly associated with the outcome, it's not enough to test each slope individually. The comparisons made by the individual slopes are not the entire set of possible comparisons across the categories: it may be that none of those is statistically significant, but some of the comparisons not directly estimated by a single slope are. In this model, x2 through x4 are indicators of age quartiles two, three, and four, respectively. To formally test whether age is a statistically significant predictor of mortality after accounting for treatment, bilirubin, and sex, we need to test the null that the slopes for the three age categories, taken together, are all zero. Again, the reason we need to do this is that it's very possible that the difference between age quartile two and the reference is not statistically significant, the difference between age quartile three and the reference is not statistically significant, and the difference between age quartile four and the reference is not statistically significant. One might look at all three slopes, see that none is significant, and conclude that age is not a statistically significant predictor. But just because of the way we coded it, we're not seeing a direct head-to-head comparison of, say, age quartile four to age quartile two, age quartile four to age quartile three, or age quartile three to age quartile two. It's possible that even if none of the differences between the three non-reference categories and the reference is significant, some of these other comparisons are. The joint test covers all of this, and it's invariant to how we've coded age and what we've chosen as the reference group.

The way you do this is that the computer will do it; I just want you to be aware that this method exists. It's called a likelihood ratio test, as it was for logistic regression. It compares the amount of information about mortality explained by treatment, age, bilirubin, and sex to the amount of information explained by treatment, bilirubin, and sex only. So we have our model that includes age, which we might call our extended model, and we're comparing that to a model that includes everything else in the extended model except for age; we're going to leave age out. Our null model is the one without age, which includes just treatment, bilirubin, and sex. I've kept the names of these coefficients the same just to show the parallels between the pieces of these two models; their values will be different, and usually, in a model with only three x's, we would call them x1, x2, and x3. If the extended model has enough additional information about the outcome, beyond the information in the null model, to justify estimating the three extra age slopes with the same amount of data, then the null is rejected; otherwise, the null model is preferred. In this situation, as reported in the table of results we looked at at the beginning of this lecture set and in the first lecture set of examples, the p-value is less than 0.001, so age is a statistically significant predictor. You would have noted from the table that some of the comparisons between the age groups and the reference were significant, so we already had a heads-up that the joint test would be statistically significant. But again, this is a robust test that covers us even when the comparisons we've chosen by our coding of the indicator variables are not significant. This needs to be done with a computer.

The approach is generalizable to any null and extended model setup, as long as one model is nested in the other: everything in the null model is also in the extended model, which contains more things. What the likelihood ratio test tests is that all the slopes for the additional predictors in the extended model are zero. If we reject that null, it says at least some of the information in the extra predictors in the extended model is useful, and we should go with the extended model. Certainly, when the extended model adds only one predictor that has multiple categories, the test is asking about that particular predictor, as in this example with age.
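The likelihood ratio test is carried out by the computer, but the underlying calculation can be sketched simply: twice the difference in maximized log partial likelihoods between the extended and null models is compared to a chi-squared distribution with degrees of freedom equal to the number of extra slopes (here, the three age-quartile indicators). The log-likelihood values below are placeholders, not the actual PBC results.

```python
from scipy import stats

# Hypothetical maximized log partial likelihoods (placeholders only,
# not the actual PBC values).
loglik_null = -640.0      # treatment + bilirubin + sex
loglik_extended = -620.0  # treatment + age quartiles + bilirubin + sex

lr_stat = 2 * (loglik_extended - loglik_null)  # likelihood ratio statistic
df = 3                                         # three extra age slopes tested jointly
p_value = stats.chi2.sf(lr_stat, df)
# A small p-value rejects the null that all three age slopes are zero
```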
So, in summary, constructing confidence intervals for slopes from a multiple Cox regression model works the same way as before: take the estimate and add and subtract two estimated standard errors, at least for large samples. In smaller samples, this will again be handled by the computer, where 95 percent confidence intervals are based on exact, computer-based methods, but the interpretation of the confidence intervals and p-values is the same regardless of sample size. Confidence intervals for slopes are confidence intervals for the adjusted log hazard ratios, and we saw that we can exponentiate their endpoints to get 95 percent confidence intervals for the hazard ratios themselves. The partial likelihood algorithm also gives an estimate of the uncertainty in the estimated baseline "intercept" function of time. We can't do anything directly with this by hand, but this information will be used, via the computer, when creating confidence limits for survival curve estimates based on the resulting multiple Cox regression; we'll see examples of that in a subsequent section. Formally testing multi-categorical predictors requires testing two or more slopes, and hence two or more hazard ratios, at once, together, as opposed to each individually, which is already done by the confidence interval and p-value for each slope on its own. This can be done using a likelihood ratio test; I'm just making you aware of the name of the test and the idea behind it, as you may see it referenced in articles.