## What is Plotting?

Plotting data is an essential part of any exploratory or predictive data analysis, and arguably the most important part of report writing. There are three principal approaches to programmable plotting: incremental, monolithic, and layered. We begin an incremental plot with a blank canvas and then add graphs, axes, labels, legends, and so on, one at a time, using specialized functions. Finally, we display the plot image and optionally save it to a file. Examples of incremental plotting tools include the R language function plot(), the Python module pyplot, and the gnuplot command-line plotting program.
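The incremental workflow can be sketched with pyplot as follows. This is only a minimal illustration under stated assumptions: the data is made up, and the output file name and location are arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data, for illustration only
xs = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in xs]
cubes = [x ** 3 for x in xs]

# 1. Begin with a blank canvas
plt.figure()

# 2. Add the graphs one at a time
plt.plot(xs, squares, "-o", label="x squared")
plt.plot(xs, cubes, "-s", label="x cubed")

# 3. Add decorations with specialized functions
plt.xlabel("x")
plt.ylabel("value")
plt.title("Incremental plotting")
plt.legend()

# 4. Save the finished plot to a file (hypothetical name and location)
out_path = os.path.join(tempfile.gettempdir(), "incremental-demo.png")
plt.savefig(out_path)
```

Each call changes the current figure in place, which is why the order of the steps is flexible but the final save has to come last.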

Monolithic plotting systems pass all the necessary parameters describing the graphs, charts, axes, labels, legends, and so on to a single plotting function. We plot, decorate, and store the final plot at once. An example of a monolithic plotting tool is the R language function xyplot().

Finally, layered tools define what to plot, how to plot it, and any additional features as virtual “layers”; we add more layers to the “plot” object as required. An example of a layered plotting tool is the R language function ggplot().

### Example

```python
import matplotlib, matplotlib.pyplot as plt
import pickle, pandas as pd

# The NIAAA frame has been pickled before
# (the file name "alco.pickle" is an assumption; adjust it to your setup)
with open("alco.pickle", "rb") as infile:
    alco = pickle.load(infile)
del alco["Total"]
columns, years = alco.unstack().columns.levels

# The state abbreviations come straight from the file
states = pd.read_csv(
    "states.csv",
    names=("State", "Standard", "Postal", "Capital"))
states.set_index("State", inplace=True)

# Alcohol consumption will be sorted by year 2009
frames = [pd.merge(alco[column].unstack(), states,
                   left_index=True, right_index=True).sort_values(2009)
          for column in columns]

# How many years are covered?
span = max(years) - min(years) + 1
```

The first code fragment simply imports all necessary modules and frames. It then combines NIAAA data and the state abbreviations into one frame and splits it into three separate frames by beverage type. The next code fragment is in charge of plotting.

```python
# Select a good-looking style
matplotlib.style.use("ggplot")

STEP = 5
# Plot each frame in a subplot
for pos, (draw, style, column, frame) in enumerate(zip(
        (plt.contourf, plt.contour, plt.imshow),
        (plt.cm.autumn, plt.cm.cool, plt.cm.spring),
        columns, frames)):

    # Select the subplot with 2 rows and 2 columns
    plt.subplot(2, 2, pos + 1)

    # Plot the frame; only imshow() makes use of the aspect option
    extra = {"aspect": "auto"} if draw is plt.imshow else {}
    draw(frame[frame.columns[:span]], cmap=style, **extra)

    # Add embellishments
    plt.colorbar()
    plt.title(column)
    plt.xlabel("Year")
    plt.xticks(range(0, span, STEP), frame.columns[:span:STEP])
    # One tick per STEP rows; frame.shape[0] is the number of rows
    plt.yticks(range(0, frame.shape[0], STEP), frame.Postal[::STEP])
    plt.xticks(rotation=-17)
```

The functions imshow(), contour(), and contourf() display the matrix as an image, a contour plot, and a filled contour plot, respectively. Don't use these three functions (or any other plotting functions) in the same subplot unless you intend to overlay them, because each call superimposes a new plot on the previously drawn ones. The optional parameter cmap specifies a prebuilt palette (color map) for the plot.
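The superimposing behavior is easy to see on a small synthetic matrix. The sketch below is an illustration only: the matrix Z is made up and stands in for real data, and the choice of palettes is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic matrix standing in for real data
x = np.linspace(-2, 2, 40)
y = np.linspace(-2, 2, 30)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X ** 2 + Y ** 2))

fig = plt.figure()

# Left subplot: the matrix as an image only
plt.subplot(1, 2, 1)
plt.imshow(Z, cmap=plt.cm.autumn, aspect="auto")
plt.title("imshow only")

# Right subplot: the second call draws over the first one
plt.subplot(1, 2, 2)
plt.imshow(Z, cmap=plt.cm.autumn, aspect="auto")
plt.contour(Z, cmap=plt.cm.cool)
plt.title("imshow + contour")
overlay = plt.gca()
```

In the right subplot, the contour lines are drawn on top of the image because both calls target the same axes.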

You can also add notes with annotate(), arrows with arrow(), and a legend block with legend(). In general, refer to the pyplot documentation for the complete list of embellishment functions and their arguments, but let’s at least add some arrows, notes, and a legend to an already familiar NIAAA graph:

### Example

```python
import matplotlib, matplotlib.pyplot as plt
import pickle, pandas as pd

# The NIAAA frame has been pickled before
# (the file name "alco.pickle" is an assumption; adjust it to your setup)
with open("alco.pickle", "rb") as infile:
    alco = pickle.load(infile)

# Select the right data
BEVERAGE = "Beer"
years = alco.index.levels[1]  # second index level: the years
states = ("New Hampshire", "Colorado", "Utah")

# Select a good-looking style
plt.xkcd()
matplotlib.style.use("ggplot")

# Plot the charts
for state in states:
    # .loc replaces the .ix indexer, which was removed from pandas
    ydata = alco.loc[state][BEVERAGE]
    plt.plot(years, ydata, "-o")
    # idxmax() returns the year (index label) of the peak value
    peak_year = ydata.idxmax()
    plt.annotate("Peak", xy=(peak_year, ydata.max()),
                 xytext=(peak_year + 0.5, ydata.max() + 0.1),
                 arrowprops={"facecolor": "black", "shrink": 0.2})

# Add labels and legends
plt.ylabel(BEVERAGE + " consumption")
plt.title("And now in xkcd...")
plt.legend(states)
plt.savefig("../images/pyplot-legend-xkcd.pdf")
```

## Plotting with Pandas

Both pandas frames and series support plotting through pyplot. When the plot() function is called without any parameters, it line-plots either the series or all frame columns with labels. If you specify the optional parameters x and y, the function plots column y against column x.
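The three call patterns can be sketched on a toy frame. The numbers below are invented for illustration and are not NIAAA data; the variable names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers, for illustration only
demo = pd.DataFrame({"Year": [2005, 2006, 2007, 2008],
                     "Beer": [1.30, 1.28, 1.27, 1.25],
                     "Wine": [0.40, 0.42, 0.45, 0.47]})

# A series is drawn as a single line on the current axes
plt.figure()
ax_series = demo["Beer"].plot()

# A frame is drawn as one labeled line per column
ax_frame = demo.set_index("Year").plot()

# With x and y, one column is plotted against another
ax_xy = demo.plot(x="Year", y="Wine")
```

Each call returns the matplotlib axes it drew on, so the usual pyplot embellishments can still be applied afterward.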

pandas also supports other types of plots via the optional parameter kind. The admissible values of the parameter are “bar” and “barh” for bar plots, “hist” for histograms, “box” for boxplots, “kde” for density plots, “area” for area plots, “scatter” for scatter plots, “hexbin” for hexagonal bin plots, and “pie” for pie charts. All plots allow a variety of embellishments, such as legends, color bars, controllable dot sizes (option s), and colors (option c).
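As a rough sketch of the kind parameter, the snippet below draws the same toy frame three different ways. The data is made up for illustration, and only three of the admissible values are shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers, for illustration only
demo = pd.DataFrame(
    {"Beer": [1.30, 1.28, 1.27, 1.25],
     "Wine": [0.40, 0.42, 0.45, 0.47],
     "Spirits": [0.70, 0.72, 0.73, 0.75]},
    index=pd.Index([2005, 2006, 2007, 2008], name="Year"))

# Grouped bars: one group per year, one bar per column
ax_bar = demo.plot(kind="bar")

# One box per column
ax_box = demo.plot(kind="box")

# Stacked areas: one band per column
ax_area = demo.plot(kind="area")
```

The same plots are also available as methods of the plot accessor, for example demo.plot.bar() instead of demo.plot(kind="bar").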

### Example

```python
import matplotlib, matplotlib.pyplot as plt
import pickle, pandas as pd

# The NIAAA frame has been pickled before
# (the file name "alco.pickle" is an assumption; adjust it to your setup)
with open("alco.pickle", "rb") as infile:
    alco = pickle.load(infile)

# Select a good-looking style
matplotlib.style.use("ggplot")

# Do the scatter plot
STATE = "New Hampshire"
# .loc replaces the .ix indexer, which was removed from pandas
statedata = alco.loc[STATE].reset_index()
statedata.plot.scatter("Beer", "Wine", c="Year", s=100, cmap=plt.cm.autumn)

plt.title("%s: From Beer to Wine in 32 Years" % STATE)
plt.savefig("../images/scatter-plot.pdf")
```
