Hi folks. Welcome back to Module 2: Data Warehouse. This module will introduce you to a different perspective on managing data. It will also compare two perspectives: the one we have learned so far, the structured operational database, and the one we are introducing today. You will also learn the data structure that supports this new perspective.
Here are the learning objectives for this module. Upon finishing it, you will be able to explain what data warehouses are and how they differ from structured, or relational, databases. You will be able to explain how OLAP differs from OLTP across a set of criteria, explain the typical OLAP operations, and explain the basic data structure, the data cube: what it is, how to interpret it, and how to create it.
Let's warm up with what you have learned. In course 1 of the specialization, you learned relational database design: how to go from an idea to an entity-relationship model, then to a relational model, and then to normalize it to third normal form so you can implement it in a physical relational database. In course 2, you learned Structured Query Language for relational databases, so you can ask a question, write the queries, and get the answers. In the first module of this course, you learned the role of database administration: how to do concurrency control and authorization, and the ethics of maintaining a database as an administrator. All of those things fall under the same envelope, structured databases. Structured relational databases do have some pros.
They are simple: easy to use, easy to learn, and easy to implement. We can maintain a good level of data accuracy, and by enforcing which values are mandatory and which are optional, we can reinforce data integrity. We can use normalization to minimize data redundancy, so we can easily update, delete, and maintain the database. We also have concurrency control to enable collaboration among multiple users, and security control to keep sensitive and important data away from lower-level users. Those are the good things about structured databases.
There are some cons as well. We have to maintain the structured relational database, and the cost is not low: there is a cost to developing it and a cost to maintaining it. We have to keep the data in physical storage, and we have to store not only the data but also the metadata. In the rows and columns we keep the many details of daily transactions, which takes a lot of storage. The scalability of a structured database is not that good, simply because we maintain the data in tables of rows and columns. If we have a large-scale business with millions of rows, the database may not be able to update and query efficiently. The complexity of our queries, with their joins and nested queries, explodes when a query becomes very sophisticated. Performance over time is also a major issue, because a database gets larger and larger over time and its performance gets worse and worse.
Even so, the pros explain why structured databases are still widely used nowadays and remain one of the mainstream choices in business.
However, as the term itself says, a structured database cannot fit every single scenario in our lives, and we may need a different perspective on organizing our data. That naturally leads to my next question: how do we organize the things around us?
Start from yourself. Think about where you are when you are watching our videos, doing a practice, or preparing the labs. You probably have a table. On the table you have your laptop, your camera, and your phone. If you have organized your table efficiently, it holds the things you need most: the things you need immediately and the things you need very frequently. For things you need less frequently than those on your table, you may have a shelf. To use the shelf, you have to take several steps away, reach for it, and take things out. The difference between the table and the shelf comes down to cost and efficiency. We cannot build a table of unlimited size, so size is something we have to consider. If the size of the table is limited, you have to set priorities based on how frequently you need each thing. Based on those priorities, the most important and most frequently needed things go on your table, and the less frequently needed things go on the shelf. Still, both the table and the shelf give you pretty convenient access to the things you put there.
Compare that with another level of organizing your things: a storage unit or a garage. If you have an apartment, you may be assigned a storage unit downstairs in the basement, and if you have a house, you may have a garage. Think about the things you put in storage or in your garage.
Those things you are going to check maybe once a year. You are not going to need them frequently, so you just put them away. You still have access to them, but not as often as the ones on the table and the shelf. The things that belong to you, you have now prioritized into three levels: table, shelf, and storage or garage.
There is one more type of thing that does not even belong to you, but that you may need some access to: the books, journals, and old newspapers in a library. You have to get a card, go there, check them out, read them, and return them. That is a lot of effort, but the benefit is that you may need that material just one time.
So here are the table, the shelf, the storage, and the library. We differentiate these ways of organizing by how frequently you need things: from the most frequent, to the less frequent, to the not often, to just once in a lifetime. That brings us a question: which one is the best? Hypothetically, if we could build a table of unlimited size, of course we would want everything on the table, so we could access anything whenever we wanted it. However, that is not the case. The table size is limited, and the cost of building that table would be very high. We have to set up different levels of storage so that we can put data at the level that matches how we are going to use it, analyze it, and access it. That is the actual question you may want to ask yourself. The answer is that we need all of them; we just need to set up different rules for how we organize them.
In answering this question, one criterion you can work with is the schema: how should my data be logically organized? The schema can be approached from different perspectives on how the data relates to you.
For example, if you really want minimal redundancy, or you want to remove partial and transitive dependencies, then you mainly need normalization. That is one perspective. Another is the most common use cases, or views, of your data. What queries will be executed most often? What joins among tables will be accessed most often? What are the typical questions you want to ask of your data? Then there is access control: should we design levels of users, so that only top-level users can access the most important and sensitive data while lower-level users access just the routine, normal data?
There is also the SQL versus NoSQL debate, and you may want to ask which one you want. Structured Query Language requires a structured relational database. NoSQL does not require a relational database, but the data still needs some format, and how you are going to store your data determines which query language you pick. Another question is who actually uses the database: is it the regular workers who enter the data, or is it the management team who use the data for decision-making? Also, what are the typical operations on the database? Is it adding, deleting, and maintaining, or is it querying, summarizing, and aggregating? You need to find out what the most common operations are for the database you are going to use.
Those are a few questions you need to think about and ask yourself, and these questions relate to the data warehouse, which is what we are introducing today. We will be back a little later. Now is the time for you to think, and you can post your answers on our discussion board. That is optional, but you are welcome to join the discussion. I'll see you later.
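One of the questions above asks whether the typical operations on a database are adding and deleting, or summarizing and aggregating. Here is a minimal sketch of that contrast, assuming a hypothetical `sales` table in an in-memory SQLite database. The table, its columns, and the function names are made up for illustration and do not come from the lecture.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def record_sale(conn, region, amount):
    """Transaction-style operation: add one row and commit it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", (region, amount)
        )
    return cur.lastrowid


def cancel_sale(conn, sale_id):
    """Transaction-style operation: delete one row by its key."""
    with conn:
        conn.execute("DELETE FROM sales WHERE id = ?", (sale_id,))


def total_by_region(conn):
    """Analysis-style operation: summarize many rows into a few totals."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


conn = build_demo()
first_id = record_sale(conn, "East", 100.0)
record_sale(conn, "East", 50.0)
record_sale(conn, "West", 75.0)
cancel_sale(conn, first_id)
print(total_by_region(conn))
```

The first two functions touch one row at a time, which is what data-entry users mostly do. The last one reads every row to produce a summary, which is closer to what a management team asks for when making decisions.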