Welcome to R Packages for Data Science. In this video, you’ll learn about the tidyverse library, a collection of R packages that you can use to perform operations on your datasets. You’ll also learn how to select and filter data using some of the key functions in the dplyr package.
An R package bundles together code, data, documentation, and tests, and is easy to share with others. One of the reasons that R is so successful is that there is a large variety of packages available. There is a high chance that someone has already solved a problem similar to the one you’re working on, and you can benefit from their work by downloading their package. In this course, you’ll primarily use the tidyverse library, which is a collection of essential R packages for data science. The core tidyverse library contains packages that you’re likely to use in everyday data analysis.
The tidyverse library is divided into four groups. The Data Wrangling and Transformation group includes packages like dplyr and tidyr. A major advantage of these packages is that you can use the pipe operator to combine different functions. From filtering to grouping the data, these packages do it all. The Data Import and Management group includes the readr package. This package solves the problem of parsing a flat file, like a .csv file, into a tibble. The Functional Programming group includes the purrr package. This package provides tools for applying a function to each element of a list or each column of a dataset, for example, to calculate the mean value for each column. And the Data Visualization and Exploration group includes the ggplot2 package. Data scientists love using ggplot2 to produce charts and visualizations, such as box plots, density plots, violin plots, tile plots, and time series plots.
The dplyr package from the tidyverse introduces functions that perform some of the most common operations you’ll use when working with data.
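To make the readr and purrr descriptions concrete, here is a minimal sketch. It assumes the readr and purrr packages are installed. The file contents, the column names, and the variable names `csv_path`, `flights`, and `col_means` are made up for illustration and are not part of the course dataset.

```r
# Attach only the two packages used here; library(tidyverse) would attach them too.
library(readr)
library(purrr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Month,DepDelay,ArrDelay",
    "1,5,10",
    "1,15,20",
    "2,10,30"),
  csv_path
)

# readr: parse the flat file into a tibble, reading every column as a double.
flights <- read_csv(csv_path, col_types = cols(.default = col_double()))

# purrr: apply mean() to each column and collect the results
# in a named numeric vector (one entry per column).
col_means <- map_dbl(flights, mean)
print(col_means)
```

Here map_dbl() is just one way to get a mean per column. The point is that purrr applies a function you choose to each column, and the statistic itself comes from that function, in this case mean().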
There are five key dplyr functions: the select() function selects variables by their names, the filter() function filters observations based on their values, the summarize() function computes summary statistics, the arrange() function reorders the rows, and the mutate() function creates new variables.
For example, let’s say that you’re only interested in a few of the columns in the Airline dataset. You can use the select() function to select only the columns that you want. You first specify the dataset that you want to perform the action on, in this case “sub_airline”, and then the columns that you would like to select. You can also filter the dataset to show only the American Airlines flights. You can use the filter() function to return the rows where Reporting_Airline is equal to “AA”.
But what if you want to select a few columns and then filter the observations based on some criteria? To combine functions, use the pipe operator. You can read the pipe operator as “then”. For example, you start with a dataset, “then” you select some columns, and “then” you perform a filter. This code combines the two separate steps in the previous slides. It applies the select() and filter() functions to the sub_airline dataset by first selecting the columns Month, DayOfWeek, Reporting_Airline, and CRSDepTime, and then filtering the rows where Reporting_Airline is equal to “AA”. You can use the pipe operator to combine as many functions as needed.
In this video, you learned that tidyverse packages, including dplyr, tidyr, readr, purrr, and ggplot2, provide a wealth of functionality that you can leverage in your data analysis. Functions provided in packages like dplyr help you perform some of the most common operations you’ll use when working with data, like selecting, filtering, summarizing, arranging, and mutating, and you can even combine functions using the pipe operator to build more powerful operations.
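As a recap of the select(), filter(), and pipe steps described in this video, here is a minimal sketch. It assumes the dplyr package is installed. The sub_airline tibble below is a small, made-up stand-in for the real dataset: its values, the ArrDelay column, and the result names (`selected`, `aa_only`, `aa_flights`, `delay_summary`) are illustrative assumptions.

```r
library(dplyr)

# Hypothetical stand-in for sub_airline; the values are invented for illustration.
sub_airline <- tibble(
  Month             = c(1, 1, 2, 3),
  DayOfWeek         = c(1, 5, 3, 7),
  Reporting_Airline = c("AA", "DL", "AA", "UA"),
  CRSDepTime        = c(900, 1230, 1745, 600),
  ArrDelay          = c(5, -3, 20, 0)
)

# select(): dataset first, then the columns to keep.
selected <- select(sub_airline, Month, DayOfWeek, Reporting_Airline, CRSDepTime)

# filter(): keep only the rows where Reporting_Airline equals "AA".
aa_only <- filter(sub_airline, Reporting_Airline == "AA")

# Pipe: start with sub_airline, "then" select columns, "then" filter rows.
aa_flights <- sub_airline %>%
  select(Month, DayOfWeek, Reporting_Airline, CRSDepTime) %>%
  filter(Reporting_Airline == "AA")
print(aa_flights)

# The other key functions chain the same way: mutate() adds a column,
# arrange() reorders the rows, and summarize() collapses them to one summary row.
delay_summary <- sub_airline %>%
  mutate(Late = ArrDelay > 0) %>%
  arrange(desc(ArrDelay)) %>%
  summarize(mean_delay = mean(ArrDelay), late_flights = sum(Late))
print(delay_summary)
```

Because each dplyr function takes a dataset as its first argument and returns a dataset, the pipe can pass the result of one step straight into the next.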