We talked about demographics as the first common data type. The next common data type is diagnosis. Let's go through the specs of this data type. There are multiple places in a health data source where you might find a diagnosis. Of course, you can see a lot of diagnostic codes in the encounter table, indicating whether a patient has a disease or not. However, diagnostic codes can also be used for signs and symptoms, injuries, health status factors, family history, the problem list, and so on, so diagnostic codes might be in multiple tables in a database. Just as a bit of background, it's good to know that chronic diseases are on the rise in the US, and you might find the same trend in your database as well. There are a bunch of derived variables that you can compute from the diagnostic codes, such as the severity of a diagnosis, its trajectory, or its history. However, a lot of this requires some clinical knowledge to understand whether and how some of these diagnostic codes can be merged. There are a number of coding standards available that are commonly used here in the US and in other countries, like the International Classification of Diseases, or ICD.
Here is a sample table, again from a database, in which each row has a patient. You can see there is a patient identification number. There is the ICD code version, which tells you that this is ICD-9, not ICD-10; there are multiple versions of these coding standards. You can see when the visit happened; that's the date there. Then there are nine diagnostic code columns after it. Of course, this is more of a research table. This is not a normalized database, which means it's not scalable. For example, what if a patient has more than nine diagnostic codes in a visit? What should I do then? Should I add a new row, or add a new column? Adding a column each time is not really feasible, at least for these relational databases. You can see all the diagnoses are in one row next to one patient.
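
To make the normalization point concrete, here is a minimal sketch of how a wide table like this could be reshaped so that each diagnosis gets its own row, which removes the nine-code limit. The column names (`patient_id`, `icd_version`, `visit_date`, `dx1` through `dx9`) and the example visit are assumptions for illustration, not the actual names or data in the table on the slide.

```python
from typing import Dict, Iterable, List

# Hypothetical column names; the table on the slide may label them differently.
DX_COLUMNS = [f"dx{i}" for i in range(1, 10)]  # dx1 .. dx9


def wide_to_long(rows: Iterable[Dict]) -> List[Dict]:
    """Turn one-row-per-visit records into one-row-per-diagnosis records."""
    long_rows = []
    for row in rows:
        for position, column in enumerate(DX_COLUMNS, start=1):
            code = row.get(column)
            if code is None or str(code).strip() == "":
                continue  # this diagnosis slot was never filled for the visit
            long_rows.append({
                "patient_id": row["patient_id"],
                "icd_version": row["icd_version"],
                "visit_date": row["visit_date"],
                "dx_position": position,
                "dx_code": str(code).strip(),
            })
    return long_rows


# Made-up example visit with three of the nine slots filled.
visit = {
    "patient_id": "P001", "icd_version": 9, "visit_date": "2012-03-14",
    "dx1": "997.49", "dx2": "E878.8", "dx3": "V45.81",
}
long_rows = wide_to_long([visit])
for r in long_rows:
    print(r["patient_id"], r["visit_date"], r["dx_position"], r["dx_code"])
```

In the long layout, a visit with a tenth or twentieth diagnosis simply produces more rows, so no schema change is needed.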
Again, this is not a normalized table, but a lot of researchers still use these diagnostic tables. If you look at arrow one, you can see that the patient also has some diagnostic codes that start with E, which in ICD-9 means an external cause of injury. At arrow number two there is a diagnostic code starting with V, which is a factor influencing health status. You need to know how the ICD coding system, in this case, works: the classification, the special notations, and the naming conventions for the different diseases, meaning how they write them, number them, and code them, in order to understand what these tables mean. Of course, if you look at arrow number three, there are instances where some diagnostic code columns were never filled because the patient didn't have that many codes, and at arrow number four you can actually see more of them. As an example, one of the ICD codes there is 997.49, and if you look it up in the ICD classification system, you can see it means other digestive system complications.
The last common data type that I'm going to talk about is medications. You can see the specs. Depending on the data source, it might be a prescription, as in the EHR, or a record of whether the medication was actually filled, which is on the claims side. In terms of background, there are obviously new medications coming in, and there are updates almost on a daily basis to some of the medication coding systems. Overall, the cost of medication is increasing in the US because of the new biologic medications, which are very costly to produce, at least for now. There are a lot of derived variables that you can compute from this type of data, such as medication adherence rates. You can see two examples here, which I will come back to and explain in the next slide. You can see some of the coding standards here, like the National Drug Codes. Here's a sample table.
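
One aside on the diagnosis codes before looking at the medication table: the E and V prefixes described earlier can be checked with a few lines of code. This is only a rough sketch, and the function name and category labels are made up for illustration. It applies to ICD-9 only; in ICD-10 the same leading letters mean something else (E codes there are endocrine and metabolic diseases), which is why the version column matters.

```python
def icd9_code_type(code: str, icd_version: int = 9) -> str:
    """Classify an ICD-9 diagnosis code by its leading character.

    Hypothetical helper: the labels returned are illustrative, not an
    official vocabulary.
    """
    if icd_version != 9:
        # The prefix rule below is specific to ICD-9.
        raise ValueError("this sketch only handles ICD-9 codes")
    cleaned = code.strip().upper()
    if not cleaned:
        return "missing"  # an unfilled diagnosis slot
    if cleaned.startswith("E"):
        return "external cause of injury"
    if cleaned.startswith("V"):
        return "factor influencing health status"
    if cleaned[0].isdigit():
        return "disease or injury"
    return "unrecognized"


# 997.49 is the code from the slide; the E and V codes are made-up examples.
for example in ["997.49", "E878.8", "V45.81", ""]:
    print(repr(example), "->", icd9_code_type(example))
```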
You can see, again, that each individual row is a medication given to a patient at a specific visit. Here, arrow number one is listing all of the product names. For example, if you scroll down, you'll see two medications listed with the same name, but if you go to the right side of the table, you can see that they come in two different packages. That is basically what an NDC code does: it identifies not only what ingredient the medication is made of, but also what packaging it comes in. That's why NDC codes are heavily used in the claims world, because it's all about how many meds are in that package, and that's how you pay for it. That's arrow number two, which shows the packaging description.
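
As a small illustration of the same-name, different-package situation, here is a sketch that splits a hyphenated NDC into its three segments (labeler, product, package) and lists the distinct package descriptions per product name. The field names (`product_name`, `ndc`, `package_description`), the drug name, and the NDC values are all made up for the example and are not taken from the table on the slide.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class NdcParts(NamedTuple):
    labeler: str  # who makes or distributes the drug
    product: str  # the specific drug product
    package: str  # package size and type


def split_ndc(ndc: str) -> NdcParts:
    """Split a hyphenated NDC such as '12345-678-01' into its three segments."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a hyphenated three-segment NDC, got {ndc!r}")
    return NdcParts(*parts)


def packages_by_product(rows: List[Dict]) -> Dict[str, List[str]]:
    """Collect the distinct package descriptions seen for each product name."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["product_name"]].add(row["package_description"])
    return {name: sorted(descs) for name, descs in grouped.items()}


# Made-up rows: same product name, two different packages.
rows = [
    {"product_name": "EXAMPLE DRUG 10 MG TABLET", "ndc": "12345-678-01",
     "package_description": "30 TABLET in 1 BOTTLE"},
    {"product_name": "EXAMPLE DRUG 10 MG TABLET", "ndc": "12345-678-02",
     "package_description": "100 TABLET in 1 BOTTLE"},
]
packages = packages_by_product(rows)
first, second = split_ndc(rows[0]["ndc"]), split_ndc(rows[1]["ndc"])
# Same labeler and product segments, different package segment.
print(first[:2] == second[:2], first.package, second.package)
print(packages)
```

Grouping on product name alone would hide the package difference, which is exactly the detail a claims payment depends on.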