In this video, I'm going to give you a brief introduction to the field of educational data mining, a fairly recent field in which we analyze data that comes from educational settings. To start, I think it's no exaggeration to say that computers are pretty much everywhere nowadays. You are probably watching this video on a computer, you might have a computer in your pocket in the form of your phone, and you might even have one on your wrist in the form of a watch, so computers are a big part of our daily lives. So it's not surprising that they're also very present in education. That might simply mean institutions collecting data on computers, but it can also mean students using computers to learn: playing educational video games, or working with intelligent tutoring systems, scientific simulations, or online courses. With all these computers involved in education, there's a lot of data that can be collected about how students learn in those digital learning environments, and this data can come from many different sources, including log files with detailed traces of what the student is doing, webcam feeds of the student's facial expressions, cameras that capture the student's posture, or even audio recordings of what the students are saying. In educational data mining, what we want to do is see how we can leverage that data to inform the study of education and to improve learning environments: we can use it to study how students interact with digital learning environments, to build models of students and their different behaviors in those environments, and to study the impact of those behaviors on learning outcomes.
As I said earlier, educational data mining is a fairly recent field, and there are two main research communities that focus heavily on big data in education. The first is the International Educational Data Mining Society, which held its first official conference in 2008, and the second is the Society for Learning Analytics Research, which held its initial conference in 2011. The International Educational Data Mining Society defines educational data mining as an emerging discipline concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and using those methods to better understand students and the settings in which they learn. The Society for Learning Analytics Research, on the other hand, defines learning analytics as the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. So you can see that there's a lot of overlap between the two communities, even though they are still distinct: they share the goal of exploring how this big data can be used to support learners and learning, and they also share many techniques and methods.

What I want to do now is quickly go over a few techniques that have been used in educational data mining and learning analytics, just to give you a brief overview. Many of these techniques come from computer science, mostly from the fields of machine learning and data mining, which use computers to analyze large amounts of data in ways that are not easy for humans. Educational data mining and learning analytics are interested in how we can apply those techniques to educational data, either to study learning or to provide better real-time support for learners as they are learning.
The first type of technique is what we call predictive modeling, and more specifically two kinds of predictive modeling: classification and regression. In predictive modeling, we want to develop a model that infers one specific aspect of the data, which we call the predicted variable, from other aspects of the data. The prediction can be about future events, for example, what score a student is going to get on a future standardized exam, or it can be about something happening right now that the learning environment doesn't directly tell us about, for example, whether the student is disengaging from the learning environment at this moment. Regression and classification are very similar; they differ in the type of variable being predicted. In regression, we predict continuous variables, such as a student's score on an exam, whereas in classification we predict categorical variables, for example, whether the student is currently engaged or disengaged.

Another type of analysis that we do a lot in educational data mining is latent knowledge estimation. The idea behind latent knowledge estimation is to estimate what the student knows and what the student doesn't know. To do that, we assess the student's knowledge of a specific knowledge component from observations of when the student succeeded or failed at applying that knowledge component in the past. One popular algorithm for latent knowledge estimation is knowledge tracing.

Another type of analysis we can do is structure discovery, which includes algorithms such as clustering, factor analysis, domain structure discovery, and network analysis.
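Before going further into structure discovery, here is a minimal sketch of the knowledge tracing idea mentioned above, assuming the widely used Bayesian Knowledge Tracing formulation (the video does not specify a variant). Each knowledge component is treated as either known or unknown, with four per-skill parameters: the probability the skill is known before practice, the probability of learning it at each practice opportunity, a guess probability, and a slip probability. After each attempt, the estimate is updated with Bayes' rule and then with the learning transition. The class `BKTParams`, the function `bkt_trace`, and the parameter values in the example are hypothetical illustrations; in practice these parameters are usually fit to student data rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Per-skill parameters of a Bayesian Knowledge Tracing model (illustrative values only)."""
    p_init: float   # P(L0): probability the skill is already known before any practice
    p_learn: float  # P(T): probability an unknown skill becomes known after an opportunity
    p_guess: float  # P(G): probability of a correct answer when the skill is NOT known
    p_slip: float   # P(S): probability of an incorrect answer when the skill IS known


def bkt_trace(observations, params):
    """Return the estimated P(known) after each attempt (True = correct, False = incorrect)."""
    p_known = params.p_init
    estimates = []
    for correct in observations:
        # Bayesian update: how likely is "known" given this observed response?
        if correct:
            evidence_known = p_known * (1 - params.p_slip)
            evidence_unknown = (1 - p_known) * params.p_guess
        else:
            evidence_known = p_known * params.p_slip
            evidence_unknown = (1 - p_known) * (1 - params.p_guess)
        posterior = evidence_known / (evidence_known + evidence_unknown)
        # Learning transition: an unknown skill may be learned during this opportunity.
        p_known = posterior + (1 - posterior) * params.p_learn
        estimates.append(p_known)
    return estimates


if __name__ == "__main__":
    # Hypothetical parameters and a hypothetical sequence of attempts on one skill.
    params = BKTParams(p_init=0.2, p_learn=0.1, p_guess=0.2, p_slip=0.1)
    for attempt, p in enumerate(bkt_trace([True, True, False, True], params), start=1):
        print(f"after attempt {attempt}: P(known) = {p:.3f}")
```

The guess and slip parameters are what keep a single lucky answer or careless mistake from flipping the estimate all the way to "known" or "unknown".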
Unlike with predictive modeling, when we're doing structure discovery we don't know exactly what the model is going to give us. We may have an idea of what we want to study, but we don't know what we're going to find, so the algorithms try to pick up on structures that emerge naturally from the data. One approach we can use is clustering: it looks at different data points, for example different students, examines the data for each of those students, and tries to find groups of students who are similar to each other. One classic application in online courses is using clustering to identify the major profiles of engagement behavior.

A second type of structure discovery method is social network analysis, the study of social interactions between students or other actors, where again we try to identify patterns that emerge from those interactions. One source of data for social network analysis is how students interact with each other in the discussion forums of online courses. We might look at whether communities of students form, and then at the learning outcomes of the members of each of those communities: do students who succeed tend to group together, or do they also interact with students who tend not to succeed?

Another family of analysis is relationship mining, which includes techniques like association rule mining, correlation mining, sequential pattern mining, and causal data mining. Similarly to structure discovery, when we're doing relationship mining we don't know exactly what is going to come out of the models we build and the analyses we run. What we want is to allow the techniques to discover meaningful and unexpected relationships in the data.
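Returning to clustering, the sketch below groups students with k-means, one common clustering algorithm; the video does not say which algorithm was used in the online-course studies it mentions. It alternates two steps until no student changes group: assign each student to the nearest cluster center, then move each center to the average of its members. The two features per student (lecture videos watched, forum posts written), the toy data, and the deterministic farthest-first seeding are hypothetical choices made to keep the example small and reproducible.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and update steps until the
    cluster assignment stops changing (or max_iter is reached)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each student joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no student changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical students: (lecture videos watched, forum posts written).
    students = [(10, 1), (12, 2), (11, 0), (40, 15), (38, 12), (42, 14)]
    labels, centroids = kmeans(students, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
    print(centroids)
```

With real data you would usually standardize the features first, because k-means relies on Euclidean distance and a feature with a larger numeric range would otherwise dominate the grouping.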
For example, we might use association rule mining to discover conditional rules of the form "if something, then something else." Looking at the courses that students take, association rule mining might reveal that if a student takes a class on educational data mining, then they're also very likely to take a class on digital learning environments, which would make sense because educational data mining is readily applied to digital learning environments. We also have sequential pattern mining, which is similar to association rule mining but adds a temporal relationship between the events. Here there's really a time component to the relationship: for example, if a student took the educational data mining class and successfully completed it, then maybe they are more likely to later publish at an educational data mining conference. The order matters: you first succeed in the course, and then you publish.

Finally, another type of analysis we can do is distillation of data for human judgment, for example using visualization. The idea here is that we have so much data that we can't just look at it in its raw form and get meaningful information out of it. So we try to come up with visualizations that let us understand at a glance how the data is structured. That might be for researchers, to understand how students interact with a digital learning environment, but it might also be for students, or for teachers, to understand what's going on in their classroom and then use that information to plan future lessons.

This is just an overview of a few types of techniques; many others can be used, including text mining algorithms and the analysis of video and audio data.
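The course-taking example of association rule mining can be sketched as follows, restricted to rules between single courses. It computes three standard rule metrics: support (the fraction of students who take both courses), confidence (the fraction of students taking the first course who also take the second), and lift (confidence divided by how common the second course is overall, so a value above 1 means the second course is more likely among students who took the first). The enrollment records, course abbreviations, and thresholds are hypothetical; full algorithms such as Apriori extend the same counting to rules with larger sets of items.

```python
from itertools import permutations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pairwise_rules(transactions, min_support, min_confidence):
    """Find rules "if A then B" between single items.

    support(A -> B)    = support({A, B})
    confidence(A -> B) = support({A, B}) / support({A})
    lift(A -> B)       = confidence(A -> B) / support({B})
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):  # ordered pairs: A -> B and B -> A are distinct rules
        sup_ab = support({a, b}, transactions)
        if sup_ab < min_support:
            continue  # pair is too rare to be interesting
        confidence = sup_ab / support({a}, transactions)
        if confidence >= min_confidence:
            lift = confidence / support({b}, transactions)
            rules.append((a, b, sup_ab, confidence, lift))
    return rules


if __name__ == "__main__":
    # Hypothetical enrollment records: the set of courses each student took.
    enrollments = [
        {"EDM", "DLE", "Stats"},
        {"EDM", "DLE"},
        {"EDM", "DLE", "HCI"},
        {"EDM", "Stats"},
        {"HCI", "Stats"},
    ]
    for a, b, s, c, l in pairwise_rules(enrollments, min_support=0.5, min_confidence=0.7):
        print(f"if {a} then {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

On this toy data only the EDM/DLE pair clears a support threshold of 0.5, so the function returns that rule in both directions; the two directions share the same support and lift but have different confidence, which is why confidence is computed per direction.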
In video analysis, for example, we might look at a student's facial expressions as they engage with learning content and use those expressions to try to identify whether the student is confused, frustrated, or bored. We could also use data from Kinect sensors, which give us information about the student's posture and track their movements, so we can see whether a student leans in or leans back. What might that mean? Often, when people lean in, it may be because they're more engaged with the content, so we can try to detect that.

To quickly conclude, this was a very brief overview of what educational data mining and learning analytics are. They are very recent fields of study, but they are growing rapidly thanks to the increasingly large amounts of digital data that come from educational settings. Educational data mining has many applications, including the scientific study of the learning process itself, insights into how to better design digital learning environments, and models that automatically adapt digital learning environments.