    What is Plotting?

    Plotting data is an essential part of any exploratory or forecasting data analysis, and probably the most essential part of report writing. There are three principal approaches to programmable plotting. In incremental plotting, we begin with a blank plot canvas and then add graphs, axes, labels, legends, and so on, one at a time, using specialized functions. At the end, we display the plot image and optionally save it to a file. Examples of incremental plotting tools include the R language function plot(), the Python module matplotlib.pyplot, and the gnuplot command-line plotting program.
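    Here is a minimal sketch of the incremental approach using matplotlib.pyplot. It assumes matplotlib is installed and uses made-up data. The off-screen Agg backend and the in-memory buffer are illustrative choices. Passing a file name such as "plot.png" to savefig() would store the image on disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up toy data, for illustration only
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]

plt.figure()                                # 1. start with a blank canvas
plt.plot(xs, ys, "-o", label="x squared")   # 2. add a graph
plt.xlabel("x")                             # 3. add decorations one at a time
plt.ylabel("y")
plt.title("Built step by step")
plt.legend()

buf = io.BytesIO()                          # 4. store the image (a file name works too)
plt.savefig(buf, format="png")
```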

    Monolithic plotting systems pass all important parameters describing the graphs, charts, axes, labels, legends, and so on to the plotting function in a single call. We plot, decorate, and store the final plot at once. An example of a monolithic plotting tool is the R language function xyplot() from the lattice package.
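    Python has no exact counterpart to xyplot(). The plot() method of a pandas frame comes close to the monolithic style, though: the graph type, line style, title, grid, legend, and figure size can all go into one call. This sketch assumes pandas and matplotlib are installed and uses made-up data. Saving the image is still a separate call.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import pandas as pd

# Made-up toy data, for illustration only
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

# A single call describes the graph type, line style, and decorations
ax = df.plot(x="x", y="y", kind="line", style="-o", title="All at once",
             grid=True, legend=False, figsize=(4, 3))

# Saving remains a separate step (a file name would also work)
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```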

    Finally, layered tools define what to plot, how to plot, and any additional features as virtual “layers”; we add more layers to the “plot” object as required. An example of a layered plotting tool is the R language function ggplot() from the ggplot2 package.
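    ggplot() is an R function, so this sketch only imitates the layered idea in plain Python on top of matplotlib. The LayeredPlot class and the points(), line(), and title() helpers are hypothetical names invented for this illustration and are not part of any library. Each layer is a function that draws one feature. The + operator returns a new plot object with one more layer, and nothing is drawn until render() is called.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt


# Hypothetical layer constructors: each returns a function that draws
# one feature onto a matplotlib Axes
def points(xs, ys):
    return lambda ax: ax.scatter(xs, ys)


def line(xs, ys):
    return lambda ax: ax.plot(xs, ys)


def title(text):
    return lambda ax: ax.set_title(text)


class LayeredPlot:
    """Toy plot object: it only collects layers until render() is called."""

    def __init__(self, layers=()):
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new object with one more layer, leaving self unchanged
        return LayeredPlot(self.layers + [layer])

    def render(self):
        fig, ax = plt.subplots()
        for layer in self.layers:
            layer(ax)
        return fig, ax


# Made-up data; describe the plot layer by layer, then draw it
xs, ys = [1, 2, 3], [2, 3, 5]
p = LayeredPlot() + points(xs, ys) + line(xs, ys) + title("Layers")
fig, ax = p.render()
```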

    Example

    import matplotlib, matplotlib.pyplot as plt
    import pickle, pandas as pd

    # The NIAAA frame has been pickled before
    with open("alco.pickle", "rb") as infile:
        alco = pickle.load(infile)
    del alco["Total"]
    columns, years = alco.unstack().columns.levels

    # The state abbreviations come straight from the file
    states = pd.read_csv(
        "states.csv",
        names=("State", "Standard", "Postal", "Capital"))
    states.set_index("State", inplace=True)

    # Alcohol consumption will be sorted by year 2009
    frames = [pd.merge(alco[column].unstack(), states,
                       left_index=True, right_index=True).sort_values(2009)
              for column in columns]

    # How many years are covered?
    span = max(years) - min(years) + 1

    The first code fragment imports all necessary modules and loads the data. It then builds three frames, one per beverage type. Each frame combines the NIAAA data with the state abbreviations and is sorted by the 2009 consumption values. The next code fragment is in charge of plotting.
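    Before moving on to the plotting code, you can try the reshaping steps of the first fragment on a tiny stand-in. This sketch assumes the pickled NIAAA frame is indexed by (State, Year) with one column per beverage, as the code above implies. The two states, the numbers, and the one-column abbreviations table are made up.

```python
import pandas as pd

# Hypothetical miniature stand-in for the NIAAA frame:
# rows indexed by (State, Year), one column per beverage; numbers are made up
index = pd.MultiIndex.from_product([["Alaska", "Utah"], [2008, 2009]],
                                   names=["State", "Year"])
alco = pd.DataFrame({"Beer": [1.1, 1.2, 0.8, 0.7],
                     "Wine": [0.5, 0.6, 0.2, 0.9]}, index=index)

# Hypothetical abbreviations table, indexed by state name like the CSV above
states = pd.DataFrame({"Postal": ["AK", "UT"]},
                      index=pd.Index(["Alaska", "Utah"], name="State"))

# unstack() moves Year into the columns; the levels are beverages and years
columns, years = alco.unstack().columns.levels

# One frame per beverage: states as rows, years as columns, plus abbreviations,
# ordered by the 2009 values
frames = [pd.merge(alco[column].unstack(), states,
                   left_index=True, right_index=True).sort_values(2009)
          for column in columns]

span = max(years) - min(years) + 1
```

    Each beverage frame gets its own row order because it is sorted by its own 2009 values. Here, Utah comes first for beer but last for wine.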

    # Select a good-looking style
    matplotlib.style.use("ggplot")
    STEP = 5

    # Plot each frame in a subplot
    for pos, (draw, style, column, frame) in enumerate(zip(
            (plt.contourf, plt.contour, plt.imshow),
            (plt.cm.autumn, plt.cm.cool, plt.cm.spring),
            columns, frames)):
        # Select the subplot with 2 rows and 2 columns
        plt.subplot(2, 2, pos + 1)
        # Plot the frame; contour() and contourf() have no aspect
        # parameter, so the aspect is set on the axes afterward
        draw(frame[frame.columns[:span]], cmap=style)
        plt.gca().set_aspect("auto")
        # Add embellishments
        plt.colorbar()
        plt.title(column)
        plt.xlabel("Year")
        plt.xticks(range(0, span, STEP), frame.columns[:span:STEP])
        plt.yticks(range(0, frame.shape[0], STEP), frame.Postal[::STEP])
        plt.xticks(rotation=-17)

    # Keep the subplots and color bars from overlapping, then display
    plt.tight_layout()
    plt.show()

    The functions imshow(), contour(), and contourf() (passed to the loop as draw) display the matrix as an image, a contour plot, and a filled contour plot, respectively. Don’t use these three functions (or any other plotting functions) in the same subplot, because each new plot is superimposed on the previously drawn ones, unless that’s your intention, of course. The optional parameter cmap (passed to the loop as style) specifies a prebuilt palette (color map) for the plot.
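    The difference between the three functions is easiest to see when one matrix is drawn three times, each in its own subplot. This sketch uses a made-up smooth bump rather than the NIAAA data and assumes matplotlib and NumPy are installed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up smooth matrix: a single bump, 40 rows by 60 columns
y, x = np.mgrid[-2:2:40j, -3:3:60j]
z = np.exp(-(x ** 2 + y ** 2))

fig = plt.figure(figsize=(10, 3))
names = ("imshow", "contour", "contourf")
for pos, (name, draw) in enumerate(zip(names, (plt.imshow, plt.contour, plt.contourf))):
    plt.subplot(1, 3, pos + 1)   # a fresh subplot per function: nothing is superimposed
    draw(z, cmap=plt.cm.autumn)  # same matrix, same palette, three renderings
    plt.colorbar()
    plt.title(name)
```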

    You can also add notes with annotate(), arrows with arrow(), and a legend block with legend(). In general, refer to the pyplot documentation for the complete list of embellishment functions and their arguments, but let’s at least add some arrows, notes, and a legend to an already familiar NIAAA graph:

    Example

    import matplotlib, matplotlib.pyplot as plt
    import pickle, pandas as pd

    # The NIAAA frame has been pickled before
    with open("alco.pickle", "rb") as infile:
        alco = pickle.load(infile)

    # Select the right data
    BEVERAGE = "Beer"
    years = alco.index.levels[1]
    states = ("New Hampshire", "Colorado", "Utah")

    # Select a good-looking style
    plt.xkcd()
    matplotlib.style.use("ggplot")

    # Plot the charts
    for state in states:
        ydata = alco.loc[state][BEVERAGE]
        plt.plot(years, ydata, "-o")
        # Add annotations with arrows; idxmax() returns the year of the
        # maximum, whereas argmax() would return its integer position
        plt.annotate("Peak", xy=(ydata.idxmax(), ydata.max()),
                     xytext=(ydata.idxmax() + 0.5, ydata.max() + 0.1),
                     arrowprops={"facecolor": "black", "shrink": 0.2})

    # Add labels and legends
    plt.ylabel(BEVERAGE + " consumption")
    plt.title("And now in xkcd...")
    plt.legend(states)
    plt.savefig("../images/pyplot-legend-xkcd.pdf")
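    One detail of the example above deserves attention. In current pandas, Series.argmax() returns the integer position of the maximum, while Series.idxmax() returns its index label (here, the year), and the annotation coordinates need the year. The self-contained sketch below shows the same annotate-and-legend pattern without the pickled data. It uses two made-up series for hypothetical regions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly values for two hypothetical regions
years = [2000, 2001, 2002, 2003]
data = {
    "Region A": pd.Series([2.0, 2.4, 2.2, 2.1], index=years),
    "Region B": pd.Series([1.5, 1.6, 1.9, 1.7], index=years),
}

plt.figure()
for ydata in data.values():
    plt.plot(ydata.index, ydata, "-o")
    peak = ydata.idxmax()  # index label (a year), not the position argmax() returns
    plt.annotate("Peak", xy=(peak, ydata.max()),
                 xytext=(peak + 0.5, ydata.max() + 0.1),
                 arrowprops={"facecolor": "black", "shrink": 0.2})

plt.legend(list(data))
ax = plt.gca()
```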

    Plotting with Pandas

    Both pandas frames and series support plotting through pyplot. When the plot() method is called without any parameters, it line-plots either the series or all numeric frame columns against the index, labeling each line. If you specify the optional parameters x and y, the method plots column y against column x.
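    This sketch shows both calling conventions. It assumes pandas and matplotlib are installed and uses a made-up frame.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import pandas as pd

# Made-up frame: a year column and two measurements
df = pd.DataFrame({"Year": [2000, 2001, 2002],
                   "A": [1.0, 1.2, 1.1],
                   "B": [0.4, 0.5, 0.7]})

# No parameters: one labeled line per numeric column, plotted against the index
ax_all = df.plot()

# x and y given: column "A" on the vertical axis, column "Year" on the horizontal
ax_xy = df.plot(x="Year", y="A")
```

    Without parameters, plot() also draws the Year column as a line, since it is just another numeric column. Passing x="Year" moves it to the horizontal axis instead.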

    pandas also supports other types of plots via the optional parameter kind. Besides the default “line”, its admissible values are “bar” and “barh” for bar plots, “hist” for histograms, “box” for boxplots, “kde” for density plots, “area” for area plots, “scatter” for scatter plots, “hexbin” for hexagonal bin plots, and “pie” for pie charts. The plots allow a variety of embellishments, such as legends and color bars. Scatter plots additionally support controllable dot sizes (option s) and colors (option c).
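    This sketch draws three values of kind into side-by-side subplots using a made-up frame. In the scatter plot, s takes per-point sizes from one column and c colors the points by another. It assumes pandas and matplotlib are installed, and the column names are invented.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame; "size" holds per-point marker sizes for the scatter plot
df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2, 4, 3, 5, 6],
                   "size": [20, 40, 60, 80, 100]})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df["y"].plot(kind="bar", ax=axes[0], title="bar")
df["y"].plot(kind="hist", ax=axes[1], title="hist")  # 10 bins by default
# Scatter: sizes from the "size" column (s), colors from the "y" column (c)
df.plot(kind="scatter", x="x", y="y", s=df["size"], c="y",
        cmap=plt.cm.autumn, ax=axes[2], title="scatter")
```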

    Example

    import matplotlib, matplotlib.pyplot as plt
    import pickle, pandas as pd

    # The NIAAA frame has been pickled before
    with open("alco.pickle", "rb") as infile:
        alco = pickle.load(infile)

    # Select a good-looking style
    matplotlib.style.use("ggplot")

    # Do the scatter plot
    STATE = "New Hampshire"
    statedata = alco.loc[STATE].reset_index()
    statedata.plot.scatter("Beer", "Wine", c="Year", s=100, cmap=plt.cm.autumn)
    plt.title("%s: From Beer to Wine in 32 Years" % STATE)
    plt.savefig("../images/scatter-plot.pdf")

    Copyright 1999- Ducat Creative, All rights reserved.