Exploratory data analysis in R
In this post we are going to use two packages designed for exploratory data analysis: skimr and dataxray. Exploratory analysis is usually the first thing to do with any database or data frame, to get to know the data, its distribution and any missing values.
Data
For this example, we are going to use the flights dataset.
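The post does not say where flights comes from. The sketch below assumes it is the flights data frame shipped with the nycflights13 package, which is the usual source of a dataset with that name in R tutorials.

```r
# Assumption: `flights` is the data frame from the nycflights13 package.
# install.packages("nycflights13")  # run once if the package is missing
library(nycflights13)

# Make the dataset available in the current session
data("flights", package = "nycflights13")

# Quick look at its size and the first few rows
dim(flights)
head(flights)
```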
skimr
Skimr is a package designed to skim over the data and return a basic description of it. This description includes:
- Number of rows and columns.
- Number of variables by type (character, numeric, date).
- Number and percentage of missing values.
- Number of unique values (character) or mean, sd and quartiles (numeric).
- Histograms.
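A minimal sketch of producing the summary described above, again assuming that flights is taken from nycflights13:

```r
# install.packages(c("skimr", "nycflights13"))  # run once if missing
library(skimr)
library(nycflights13)  # assumed source of `flights`

# skim() summarises every column, grouped by variable type
skimmed <- skim(flights)

# Printing the result shows the data summary followed by one table per type
skimmed
```

The object returned by skim() is itself a data frame, so it can be stored and inspected like any other table.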
You can also use traditional dplyr syntax to skim only particular variables or only the rows without missing values.
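As an illustration, the sketch below combines dplyr verbs with skim(). The column names (dep_delay, arr_delay, carrier) are assumptions based on the nycflights13 version of flights and were chosen only as examples.

```r
library(dplyr)
library(skimr)
library(nycflights13)  # assumed source of `flights`

# Skim only a few selected variables
selected_summary <- flights %>%
  select(dep_delay, arr_delay, carrier) %>%
  skim()

# Keep only rows where dep_delay is not missing, then skim two columns;
# skim() also accepts column names directly
non_missing_summary <- flights %>%
  filter(!is.na(dep_delay)) %>%
  skim(dep_delay, arr_delay)

selected_summary
non_missing_summary
```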
dataxray
Dataxray is a package that performs an exploratory data analysis similar to that of skimr. Its main advantage is that it has an interactive interface and a fancier design. It shows almost the same information as skimr; however, it takes more time to show the results.
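A hedged sketch of how the interactive report might be produced. It assumes the make_xray() and view_xray() functions described in the dataxray README, that the package is installed from GitHub (agstn/dataxray), and that flights comes from nycflights13. The sample of 5,000 rows is only there to shorten the wait and is not part of the original post.

```r
# remotes::install_github("agstn/dataxray")  # assumed install route
library(dplyr)
library(dataxray)
library(nycflights13)  # assumed source of `flights`

# Work on a random sample to reduce rendering time (illustrative choice)
set.seed(123)
flights_sample <- slice_sample(flights, n = 5000)

# Build the summary object, then open the interactive table
xray <- make_xray(flights_sample)
view_xray(xray)
```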