[MUSIC] This second lesson focuses on data. What we have seen in the previous week is that in order to create visualizations, we have to start with some data, right? So in this week's module, we focus on understanding what data is and what role it plays in the context of information visualization. Some important questions include: What is data? How is data generated? And why do we use it? What is it for?
So let's start with the first one. What is data? Here, I have the definition that you receive if you Google "data." [LAUGH] So the first definition is that data is facts and statistics collected together for reference or analysis. And this is the definition that Merriam-Webster gives: factual information, such as measurements or statistics, used as a basis for reasoning, discussion, or calculation. I really like this definition because it includes the idea that we use data as a basis for reasoning, discussion, and calculation. Wikipedia also has a very long description of what data is, and in particular, I really like this sentence: data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.
So, what these definitions have in common is the idea that some information is recorded, typically in a digital format, and that this information may be useful to do some calculations or some analysis and, in general, to understand some reality, some phenomenon, better. This is what I described in the previous module: the idea that data visualization is about taking some data, finding a visual representation, and interacting with the data and the visualization. But the ultimate goal is to understand the phenomenon that is described by the data, and the ultimate purpose is to generate useful knowledge. That's a very important aspect. So why do we visualize data? Typically, because we want to generate some kind of useful and, hopefully, accurate knowledge.
So now, more precisely, I want to describe the role that data plays in the visualization process. To do that, I will keep using the same diagram, but with a small difference. Rather than going directly from data to visualization, with visual encoding in the middle, here I introduce an additional element, which is transformed data. Why do I do that? Well, because very often in practice, what happens is that you receive some data from someone, or you find data, or you collect data. And when you want to visualize this data, you first need some transformations.
So typically, there are three main stages of data processing. The first one is the one that creates the data: we go from the phenomenon to actually collecting some data. So in this step, we have data collection. The second step is transformation. Typically, we need to transform the data in order to have it in the format or configuration that is necessary to turn it into a visual representation. And then we have the last step, which is visual encoding: how to transform this data into a visual form.
So here is a little example that describes these last two steps. In the table that you see on the left, we have the raw data. I'm using, once again, the vehicle collision data set that we introduced in the previous lesson. If you remember, every single row in the table represents one collision in New York City, and the columns represent the different aspects of the collisions, like where the collision happened, at what time, what types of vehicles were involved, and so on. In this example, I am trying to visualize how many collisions there are in the data set for each of the boroughs recorded in the data set. Okay? So what you see in the bar chart on the right is that every bar represents one single borough, and the height of the bar represents the number of collisions recorded in the data set.
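As a minimal sketch of these last two steps, assuming a tiny made-up table in place of the real collision data set, the transformation and the visual encoding could look like the following. The rows, the column names (borough, time, vehicle_type), and the two function names are all invented for illustration, and the text bars simply stand in for the bar chart on the slide.

```python
from collections import Counter

# Hypothetical raw data: one row per collision (all values invented for illustration).
raw_collisions = [
    {"borough": "BROOKLYN", "time": "08:15", "vehicle_type": "SEDAN"},
    {"borough": "QUEENS", "time": "09:40", "vehicle_type": "TAXI"},
    {"borough": "BROOKLYN", "time": "13:05", "vehicle_type": "BICYCLE"},
    {"borough": "MANHATTAN", "time": "17:30", "vehicle_type": "SEDAN"},
    {"borough": "BROOKLYN", "time": "22:10", "vehicle_type": "VAN"},
    {"borough": "QUEENS", "time": "23:55", "vehicle_type": "SEDAN"},
]


def count_by_borough(rows):
    """Transformation: collapse one-row-per-collision into one count per borough."""
    return Counter(row["borough"] for row in rows)


def text_bar_chart(counts):
    """Visual encoding: one bar per borough, bar length = number of collisions."""
    width = max(len(name) for name in counts)
    lines = []
    for name, n in sorted(counts.items()):
        lines.append(f"{name.ljust(width)} | {'#' * n} {n}")
    return "\n".join(lines)


transformed = count_by_borough(raw_collisions)  # transformed data
chart = text_bar_chart(transformed)             # visual representation
print(chart)
```

Notice that the raw table keeps every detail of every collision, while the transformed data keeps only one number per borough, which is all the bar chart needs.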
What we have in the middle is a transformation. Why do we need this transformation? Well, because originally, our raw data collects information about every single instance, every single collision. But the information that we need in order to create the bar chart is only, for every single borough, how many data items we have there. So, in order to obtain this information, we need some transformation from the raw data to the transformed data.
So now, another important question: How does one decide how to visualize data? Once I receive a data set, how do I know what the appropriate visual representation for this data set is? That's a very important question. One very useful tool to start reasoning about what may be an appropriate visual representation for a given data set is the concept of data abstraction.