🔖 [Data Science]Seaborn

2018 - 09 - 16
0. [Seaborn] In the last post, we plotted a graph from the WorldCup dataframe using matplotlib, which is the most basic and common library for data visualization in Jupyter. In this post, we will use another library. Seaborn is also a plotting library, built on top of matplotlib, with which we can draw colourful diagrams and various graphs for data analysis. To begin, we have to install seaborn. Open a terminal and type:
pip install seaborn

1. [Import libraries] After installing the Seaborn library, we can move on to the Jupyter Notebook. Open Jupyter and create a new notebook. Then import the libraries we need:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

2. [Import data] There are a few steps to construct a diagram using Seaborn:
  1. Import data
  2. Setup figure
  3. Plotting
  4. Customize the diagram
We can use the same dataset, WorldCup, for this practice. Import the dataset and check the head.
df = pd.read_csv('WorldCups.csv')
df.head()
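As a rough end-to-end sketch of the four steps listed above, the snippet below uses a tiny made-up dataframe in place of a real CSV. The column name `category`, the figure size and the label texts are all assumptions chosen for illustration, not values from the WorldCups data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Import data (toy values standing in for a real CSV file)
toy = pd.DataFrame({'category': ['a', 'b', 'a', 'c', 'a', 'b']})

# 2. Set up the figure
sns.set_style('darkgrid')
fig, ax = plt.subplots(figsize=(6, 4))

# 3. Plot: one bar per distinct value in the column
sns.countplot(x='category', data=toy, ax=ax)

# 4. Customize the diagram
ax.set_title('Rows per category')
ax.set_xlabel('Category')
ax.set_ylabel('Count')
```

Passing `ax=ax` keeps the plot inside the figure we created, so the matplotlib methods on `ax` can be used for the customization step.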

3. [Setting of figure] Similar to plt, before plotting the actual diagram, we have to configure the figure. Remember, the figure is just like a container for the plot. To configure the style, we use the function sns.set_style(), which accepts one of the following values:
{darkgrid, whitegrid, dark, white, ticks}
Personally, I prefer darkgrid because it is easier to read the diagram on a white notebook background. So we define:
sns.set_style('darkgrid')
Also, we can define the size of the diagram:
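One common way to do both, shown here as a minimal sketch, is to set the style with Seaborn and create the figure with matplotlib. The size of 10 by 6 inches is an assumed value, not one taken from the original notebook.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply the preferred style to all plots drawn afterwards
sns.set_style('darkgrid')

# Create the figure (the container) with an explicit size in inches.
# (10, 6) is an arbitrary choice for illustration.
fig, ax = plt.subplots(figsize=(10, 6))
```

This size applies to plots drawn onto `ax`. Figure-level functions such as sns.lmplot create their own figure and are sized with their `height` and `aspect` arguments instead.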

4. [Types of plots in Seaborn] There are many types of graphs we can construct using Seaborn, which makes it useful for data visualization and data analysis. Graphs we can plot include, but are not limited to:
  • lmplot
  • distplot
  • jointgrid
  • countplot
  • heatmap
Of course, there are far more options to choose from, which will be covered when we move on to data analysis.
5. [Example: lmplot] Since Seaborn is based on Matplotlib, simple line graphs are still plotted using plt.plot. What if we want to show the relationship between two columns? sns.lmplot draws a scatter plot of X against Y and fits a linear regression line through the points, so we can check whether the two columns are linearly related.
sns.lmplot(x='MatchesPlayed', y='GoalsScored', data=df)
The graph should look like this. It suggests that MatchesPlayed and GoalsScored have no direct linear relationship.
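Reading a relationship off a regression plot by eye can be tricky, so it may help to compute a correlation coefficient next to it. The sketch below uses made-up values in hypothetical columns `x` and `y` (not the WorldCups data), and the `height` and `aspect` values are arbitrary.

```python
import pandas as pd
import seaborn as sns

# Toy values for illustration only; not taken from WorldCups.csv
toy = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                    'y': [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]})

# lmplot is a figure-level function: it returns a FacetGrid,
# and its size is controlled with height and aspect
grid = sns.lmplot(x='x', y='y', data=toy, height=4, aspect=1.5)

# Pearson correlation: values near 1 or -1 indicate a strong linear
# relationship, values near 0 indicate a weak one
r = toy['x'].corr(toy['y'])
print(round(r, 3))
```

The same two lines can be run on the real dataframe, for example `df['MatchesPlayed'].corr(df['GoalsScored'])`, to put a number on what the plot shows.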
6. [Example: distplot] To show the distribution of data, we can use sns.distplot, which shows the histogram of the data together with the KDE (Kernel Density Estimation) curve.
sns.distplot(df['QualifiedTeams'])
In the resulting diagram, the bars represent the histogram while the line is the KDE. We can switch either one off with the kde and hist arguments, for example to show only the histogram:
sns.distplot(df['QualifiedTeams'], kde=False, hist=True)

