Using data visualization, an exploratory data analysis technique, to reveal patterns and hidden properties in data. A non-exhaustive but informative read.
2. Introduction
Exploratory data analysis (EDA), as the name implies, is a technique for exploring data. EDA is
an iterative process that entails:
1. Generating questions about your data.
2. Searching for answers through visualizations, transformations, and modeling.
3. Using acquired answers to generate new questions.
Most data scientists consider EDA an art because it does not follow a formal process
with strict rules (well, I belong to that category 😁). This group of slides is not
exhaustive, but it covers many important ideas that will help you find the most common
patterns in data using data visualization. It is a recommended read, especially for
beginners.
3. Histogram
Histograms show the distribution of values a
variable takes in a particular set of data.
They are particularly useful for seeing the shape
of the data distribution in some detail.
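As a rough illustration of the idea above, here is a minimal sketch of a histogram in Python. It assumes matplotlib and NumPy (the slides do not name a library), and the data is synthetic, generated only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration): 1,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
# hist returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of a synthetic variable")
fig.savefig("histogram.png")
```

The number of bins (20 here) is a judgment call: too few bins hide the shape of the distribution, too many make it noisy.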
4. Box plot
Box plots summarize the range of values a variable takes.
They are useful for seeing where most of the data fall
and for catching outliers.
The red line in the center is the median, the edges
of the box are the 25th and 75th percentiles,
and the isolated points are outliers.
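The pieces of a box plot described above can be computed by hand and then drawn. This is a sketch assuming matplotlib and NumPy, with synthetic data and two hand-placed extreme values. The 1.5 × IQR rule used to flag outliers is matplotlib's default (`whis=1.5`), not something the slides specify.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
# Synthetic data: mostly values near 50, plus two hand-placed extreme values.
values = np.concatenate([rng.normal(loc=50, scale=5, size=200), [5.0, 95.0]])

# The box edges are the 25th and 75th percentiles; the line inside is the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Matplotlib's default rule: points more than 1.5 * IQR beyond the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

fig, ax = plt.subplots()
ax.boxplot(values, medianprops={"color": "red"})  # red median line, as on the slide
ax.set_ylabel("Value")
ax.set_title("Box plot of a synthetic variable")
fig.savefig("boxplot.png")
```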
5. Cumulative Distribution Function (CDF)
CDFs show how much of the data is at or below a
certain amount. They are useful for comparing the
data distribution to some reference distribution.
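One way to make the comparison concrete is to plot an empirical CDF next to a reference CDF. The sketch below assumes matplotlib and NumPy, uses synthetic data, and picks the standard normal distribution as the reference purely for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=2)
values = rng.normal(loc=0, scale=1, size=500)  # synthetic sample

# Empirical CDF: after sorting, the i-th smallest value has i/n of the data at or below it.
x = np.sort(values)
ecdf = np.arange(1, len(x) + 1) / len(x)

# Reference distribution (an assumption): standard normal CDF via the error function.
reference = np.array([0.5 * (1 + math.erf(v / math.sqrt(2))) for v in x])

fig, ax = plt.subplots()
ax.step(x, ecdf, where="post", label="Empirical CDF")
ax.plot(x, reference, label="Standard normal CDF")
ax.set_xlabel("Value")
ax.set_ylabel("Fraction of data at or below value")
ax.legend()
fig.savefig("cdf.png")
```

If the two curves track each other closely, the sample is plausibly drawn from the reference distribution; large gaps point to a different shape, location, or spread.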
6. Scatter plot
Scatter plots show the relationship between two variables.
They are useful when trying to find out what kind of
relationship exists between the variables.
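A minimal sketch of a scatter plot follows, assuming matplotlib and NumPy. The two variables are synthetic, with a linear relationship plus noise built in by hand. The Pearson correlation in the title is an extra summary added here, not part of the slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)
# Synthetic variables: y depends linearly on x, plus random noise.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(scale=2.0, size=200)

# Pearson correlation coefficient as a one-number summary of the linear relationship.
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Scatter plot (Pearson r = {r:.2f})")
fig.savefig("scatter.png")
```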
7. Bar plot
Bar plots show comparisons between discrete
categories. Thus, they are highly useful for exploring
and summarising categorical data.
One axis of the plot shows the specific categories
being compared, and the other axis represents a
measured value.
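The sketch below counts a small, hypothetical list of categorical observations and draws one bar per category. It assumes matplotlib; the fruit labels are made up for the example.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt

# Hypothetical categorical observations.
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple", "cherry", "apple"]

# The measured value here is simply how often each category occurs.
counts = Counter(fruits)
categories = sorted(counts)
heights = [counts[c] for c in categories]

fig, ax = plt.subplots()
ax.bar(categories, heights)  # one axis holds the categories, the other the measured value
ax.set_xlabel("Category")
ax.set_ylabel("Count")
fig.savefig("barplot.png")
```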
8. Line graph
Line graphs are useful for visualizing trends over time.
The vertical axis could represent any variable, but
the horizontal axis ordinarily represents a time variable.
The continuous line connects observations recorded in
sequence, showing how the quantity changes over time.
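A line graph of a daily measurement might be sketched as follows, assuming matplotlib and NumPy. The series is synthetic, with an upward trend added by hand; a real series can just as well fall or fluctuate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots in a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=4)
days = np.arange(1, 31)  # horizontal axis: time, in days
# Synthetic measurement: a gentle upward trend plus random noise.
measurements = 20 + 0.5 * days + rng.normal(scale=2.0, size=days.size)

fig, ax = plt.subplots()
ax.plot(days, measurements, marker="o")  # points joined in time order
ax.set_xlabel("Day")
ax.set_ylabel("Measured value")
ax.set_title("Trend over time")
fig.savefig("linegraph.png")
```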