Hi. Last week, we looked at NoSQL technology and how it differs from relational databases. This week, we will be introduced to key-value databases, their properties, scalability, indexing, et cetera, and we will refer more specifically to Apache Cassandra. Let's start.

Remember that a key-value database is a system that stores values indexed by keys. It can store structured and unstructured data. Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra supports clusters spanning multiple data centers, with asynchronous masterless replication allowing low-latency operations for all clients. It is a highly scalable, eventually consistent, distributed store of key-value structures. It was started at Facebook and is now an open-source Apache project written in Java, which makes it a multi-platform database, that is, a database that can run on different operating systems.

Some advantages of Cassandra for web development are the following. Cassandra is developed to be a distributed server, but it can also be run as a single node. It scales horizontally, so you add new hardware when necessary. It gives quick answers even if demand grows, and high write speeds to manage growing data volumes. It offers distributed storage, the ability to change the data structure when users demand more functionality, and a simple and clean API for your favorite programming language. It has automatic fault detection. It is decentralized: every node knows about the others, so there is no single point of failure, and it is fault tolerant. Finally, it allows the use of Hadoop to run MapReduce jobs.
Some disadvantages are that Cassandra can have problems with ad hoc queries, that is, queries that users keep changing to suit their requirements; with aggregations, or the use of SQL-style aggregate functions; and with unpredictable performance, for instance, changes in query response time.

Cassandra uses a gossip protocol for internal communication within a ring, so that each node knows about the other nodes. This supports decentralization and partition tolerance. Cassandra is designed to be distributed over several machines that appear as a single machine to clients. The outermost structure of Cassandra is the cluster, or ring. Each node holds replicas for different ranges of data, so if something goes wrong on one node, another replica can respond. The replication_factor parameter, set when a keyspace is created, indicates how many machines in the cluster will receive copies of the same data.

Cassandra is more focused on availability and fault tolerance than on consistency. According to the Apache Software Foundation, it offers eventual consistency, typically reached within milliseconds. So Cassandra is AP in CAP terms: it mainly supports availability and partition tolerance. Your system can return stale data, but it will always be available, even in the face of a network partition. Designed as a scalable distributed database, Cassandra sacrifices ACID guarantees for performance, availability, and operational management advantages.

Now we need to learn some concepts in order to understand how Cassandra works. A column is composed of a name, a value, and a timestamp. A cluster is the set of machines that make up an instance of Cassandra, and it can contain several keyspaces. A keyspace is a namespace for a set of column families associated with an application. It is usually compared to a database in the relational model. A column family contains multiple columns. It is usually compared to a table in the relational model.
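To make the hierarchy just described more concrete, here is a minimal Python sketch of a column, a column family, and a keyspace. It is only an in-memory illustration and not Cassandra's actual API: the class names, the insert and create_column_family helpers, and the sample data are all hypothetical, and grouping columns under a row identifier is an assumption made so that the table analogy holds.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """Basic storage unit: a name, a value, and a timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class ColumnFamily:
    """Roughly a relational table (illustrative model, not Cassandra code)."""
    name: str
    # Assumption: columns are grouped per row identifier, keyed by column name.
    rows: Dict[str, Dict[str, Column]] = field(default_factory=dict)

    def insert(self, row_id: str, column: Column) -> None:
        # Create the row on first use, then store the column under its name.
        self.rows.setdefault(row_id, {})[column.name] = column


@dataclass
class Keyspace:
    """Roughly a relational database: a namespace for column families."""
    name: str
    replication_factor: int  # how many machines receive a copy of the data
    column_families: Dict[str, ColumnFamily] = field(default_factory=dict)

    def create_column_family(self, name: str) -> ColumnFamily:
        family = ColumnFamily(name)
        self.column_families[name] = family
        return family


# Hypothetical sample data, for illustration only.
keyspace = Keyspace(name="shop", replication_factor=3)
users = keyspace.create_column_family("users")
users.insert("user42", Column("email", "ana@example.com", timestamp=1))
users.insert("user42", Column("city", "Lima", timestamp=1))
```

Here the keyspace plays the role of the database, the users column family plays the role of a table, and each Column carries its own timestamp, as in the description above.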
A supercolumn is a set of columns that themselves have sub-columns. Unlike columns, supercolumns do not have timestamps. A supercolumn is analogous to a record, or tuple, in a relational database. In Cassandra, the basic storage unit is the column, although there are also supercolumns, column families, and the keyspace. Timestamps store the last update time of a column and are used for conflict resolution. A column name is analogous to an attribute name in a table in a relational database.

A supercolumn is composed of an array of several columns. It is specified with a name and an ordered map of columns. Columns that belong to a supercolumn are grouped using a common lookup value called the row key. In other words, a supercolumn is a nested key-value pair of columns: the outer key-value pair forms the supercolumn, while the inner pairs correspond to the columns. The figure shows the structure of a supercolumn.

A column family contains columns, or grouped supercolumns, that share a single common row key. It can be seen as a set of key-value pairs, where the keys are row keys and the values are maps keyed by column name. The figure visually presents the structure of a column family.

Well, we can rest for now. Next session, we will continue learning more concepts. See you then.
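As a recap of the nesting covered in this session, the following minimal Python sketch models a family of supercolumns as nested dictionaries: row key, then supercolumn name, then column name. It is an illustration under stated assumptions, not Cassandra code: the write helper and the sample data are hypothetical, and conflict resolution is simplified to "the newest timestamp wins, ties keep the existing value".

```python
from typing import Dict, Tuple

# A column is modeled as (value, timestamp); its name is the dictionary key.
Column = Tuple[str, int]
SuperColumn = Dict[str, Column]       # column name -> column (inner pairs)
Row = Dict[str, SuperColumn]          # supercolumn name -> supercolumn (outer pairs)
SuperColumnFamily = Dict[str, Row]    # row key -> row


def write(family: SuperColumnFamily, row_key: str, super_name: str,
          column_name: str, value: str, timestamp: int) -> None:
    """Store a column; on conflict, keep the write with the newest timestamp."""
    columns = family.setdefault(row_key, {}).setdefault(super_name, {})
    current = columns.get(column_name)
    # Simplifying assumption: an equal timestamp keeps the existing value.
    if current is None or timestamp > current[1]:
        columns[column_name] = (value, timestamp)


# Hypothetical sample data, for illustration only.
family: SuperColumnFamily = {}
write(family, "user42", "address", "city", "Lima", timestamp=1)
write(family, "user42", "address", "city", "Cusco", timestamp=5)
write(family, "user42", "address", "city", "Quito", timestamp=3)  # older, ignored
write(family, "user42", "phones", "mobile", "555-0100", timestamp=2)
```

Note that the timestamp lives on each column, not on the supercolumn, which matches the point that supercolumns themselves have no timestamps.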