
Continuing my series on using Python and matplotlib to generate common plots and figures, today I will be discussing how to make histograms, a plot type that shows how often values occur across the range of a continuous or discrete variable. Histograms are useful whenever you need to examine the statistical distribution of a variable in some sample, like the brightness of radio galaxies or the distances of quasars.

### What Kind of Data are we talking about?

Histograms are useful for plotting the distribution of numbers across a range of possible values. A histogram takes a list of numbers, divides the range of values into a set of intervals (bins), and counts how many of the numbers fall into each bin. I’ve used histograms at least once a week in my research, as they are fantastic tools for comparing populations, checking theoretical distributions against observed data, and countless other tasks. For one of my senior labs as an undergrad, I used the histogram shown below to determine the velocity of cosmic ray muons from the time dilation of their decay lifetimes. A histogram answers two questions at once: “how many?” and “where?”.
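To make the “bin and count” idea concrete, here is a minimal hand-rolled version. The function name `bin_and_count` and the sample values are made up for illustration; the edge rule it follows (every bin is half-open except the last, which also includes its right edge) is the convention `numpy.histogram` uses, so the two results agree.

```python
import numpy as np


def bin_and_count(values, edges):
    """Count how many values fall in each bin defined by consecutive edges.

    Bins are [edges[i], edges[i+1]) except the last, which is closed on
    both sides, matching numpy.histogram's convention.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break  # each value lands in at most one bin
    return counts


values = [0.1, 0.4, 0.5, 1.2, 1.9, 2.0]
edges = [0.0, 1.0, 2.0]
print(bin_and_count(values, edges))          # [3, 3]
print(np.histogram(values, bins=edges)[0])   # [3 3]
```

Note that 2.0 sits exactly on the outermost edge and is still counted in the last bin.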

### Getting Started with a simple example

In order to make a histogram, we obviously need some data. Rather than make canned data manually, like in the last post, we are going to use the power of NumPy, the Python numerical library. If you don’t have NumPy installed and run a Debian-based distribution, just fire up the following command to install it on your machine:

```shell
sudo apt-get install python3-numpy
```

What we will use for our data is 1000 random numbers, drawn from a Gaussian distribution. This is the common “normal” distribution, or the “bell curve” that occurs so frequently in nature. We will use a Gaussian centred about zero, with a standard deviation of 1.0 (this is the default for numpy.random.normal):

```python
from numpy.random import normal

gaussian_numbers = normal(size=1000)
```
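Each run of the snippet above draws a different sample, so your plots will look slightly different every time. If you want reproducible figures, one option (an alternative to the original `numpy.random.normal` call, available in NumPy 1.17 and later) is a seeded generator from `np.random.default_rng`. The seed value 42 below is arbitrary.

```python
import numpy as np

# A seeded generator returns the same "random" sample on every run.
rng = np.random.default_rng(42)
gaussian_numbers = rng.normal(loc=0.0, scale=1.0, size=1000)

# With 1000 draws, the sample mean and standard deviation land close to
# the requested 0 and 1, but not exactly on them.
print(gaussian_numbers.mean(), gaussian_numbers.std())
```

Re-creating the generator with the same seed reproduces exactly the same 1000 numbers.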

Now that we have something to plot, let’s do it! The pyplot.hist() method is used for generating histograms, and will automatically select the appropriate range to bin our data. With axis labels, a title, and the show() method, our code will look like this:

```python
import matplotlib.pyplot as plt
from numpy.random import normal

gaussian_numbers = normal(size=1000)

plt.hist(gaussian_numbers)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```

Matplotlib’s histogram will default to using 10 bins, as the figure below shows.
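`plt.hist()` also returns what it computed, which is handy when you want the numbers and not just the picture: the counts per bin, the bin edges, and the drawn patches. The sketch below assumes the non-interactive `Agg` backend so it runs without a display, and uses a seeded generator purely so the run is repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

# Counts per bin, bin edges, and the bar artists that were drawn
n, bins, patches = plt.hist(gaussian_numbers)

print(len(n))     # 10 bins ...
print(len(bins))  # ... are bounded by 11 edges
print(n.sum())    # every one of the 1000 values lands in some bin
plt.close()
```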

### Formatting & Tweaking Our Histogram

We have 1000 points, so 10 bins is a bit small, and makes our histogram look pretty blocky. Let’s up the resolution by forcing matplotlib to use 20 bins instead.
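Picking 20 bins by eye works fine, but `bins` also accepts the name of one of NumPy’s automatic binning rules (such as `"sturges"`, `"fd"` or `"auto"`), which matplotlib hands on to NumPy. A minimal way to see what each rule would choose for a sample, assuming NumPy 1.15 or later for `np.histogram_bin_edges`:

```python
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

# Ask NumPy how many bins each automatic rule picks for this sample
for rule in ("sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(gaussian_numbers, bins=rule)
    print(rule, len(edges) - 1, "bins")

# The same strings can be passed straight to matplotlib, e.g.
# plt.hist(gaussian_numbers, bins="auto")
```

Sturges’ rule uses roughly log2(N) + 1 bins, so for 1000 points it lands at 11, close to the default of 10; the other rules also take the spread of the data into account.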

```python
plt.hist(gaussian_numbers, bins=20)
```

Next, let’s try plotting things as a probability distribution instead of raw frequency counts. With `density=True`, matplotlib divides each bin’s count by the total number of values times the bin width, so the total area of the histogram is 1. The bar heights are then probability densities rather than counts; multiplying a bar’s height by its width gives the probability of finding a number in that bin. (Older matplotlib versions called this option `normed`; it has since been removed in favour of `density`.)
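A quick numerical check of that normalization, using the arrays that `plt.hist()` returns. This is a sketch assuming the `Agg` backend and a seeded sample; it confirms that the area sums to 1 and that height times width matches the fraction of values in each bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

n, bins, _ = plt.hist(gaussian_numbers, bins=20, density=True)
widths = np.diff(bins)

# Each height is count / (total count * bin width), so the area is 1
print(np.sum(n * widths))

# Height times width is the probability of landing in that bin,
# i.e. the plain fraction of the sample counted there
bin_probabilities = n * widths
counts, _ = np.histogram(gaussian_numbers, bins=bins)
print(np.allclose(bin_probabilities, counts / counts.sum()))
plt.close()
```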

```python
plt.hist(gaussian_numbers, bins=20, density=True)
```

Another task we might want to do is plot a cumulative distribution function. This shows the probability of finding a number in a bin *or any lower bin.* Making one is as simple as passing one more keyword argument to hist(), just as we did for the probability distribution.
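With `density=True` and `cumulative=True` together, each bar is the running total of the per-bin probabilities, so the curve never decreases and the last bin reaches 1. A small sketch that checks this against `np.cumsum` (assuming the `Agg` backend and a seeded sample):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

n, bins, _ = plt.hist(gaussian_numbers, bins=20, density=True, cumulative=True)

# Rebuild the same curve by hand: running count divided by the total
counts, _ = np.histogram(gaussian_numbers, bins=bins)
running = np.cumsum(counts) / counts.sum()

print(np.allclose(n, running))  # the two agree
print(n[-1])                    # the final bin reaches 1 (up to rounding)
plt.close()
```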

```python
plt.hist(gaussian_numbers, bins=20, density=True, cumulative=True)
```

Matplotlib will automatically compute appropriate bins for us, but often we need to know where our bins begin and end. Matplotlib allows us to pass a sequence of values defining the edges of our bins. Let’s see how many numbers are between -10 and -1, between -1 and 1, and between 1 and 10.
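If you only want the counts in those custom bins, `np.histogram` accepts the same edges and returns the numbers directly. Keep in mind that each bin includes its left edge but not its right, except the last bin, which includes both (so a value of exactly 10 would still be counted). The seeded sample below is an assumption for repeatability.

```python
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

# Bins: [-10, -1), [-1, 1), [1, 10]
counts, edges = np.histogram(gaussian_numbers, bins=(-10, -1, 1, 10))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>5} to {hi:>4}: {c}")

# For a standard normal, about 68% of values lie within one standard
# deviation of zero, so the middle bin should hold roughly 680 of 1000.
print(counts[1] / counts.sum())
```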

```python
plt.hist(gaussian_numbers, bins=(-10, -1, 1, 10))
```

You also might want to change the look of the histogram. Let’s plot an unfilled, stepped line rather than filled bars. Personally, I prefer the ‘stepfilled’ option for histtype, as it removes the ugly black lines between the bins (older matplotlib versions drew these outlines around every bar by default). Those lines can get rather crowded if you have more than a few hundred bins, and end up really wrecking the look of your plot.

```python
plt.hist(gaussian_numbers, bins=20, histtype='step')
```
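`histtype` accepts `'bar'` (the default), `'barstacked'` (for several datasets stacked on each other), `'step'` and `'stepfilled'`. It only changes how the bins are drawn, not what is counted, which the sketch below checks by drawing the single-dataset styles in separate figures (assuming the `Agg` backend and a seeded sample).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

gaussian_numbers = np.random.default_rng(0).normal(size=1000)

results = {}
for histtype in ("bar", "step", "stepfilled"):
    plt.figure()  # one figure per style so they don't overlap
    n, bins, _ = plt.hist(gaussian_numbers, bins=20, histtype=histtype)
    plt.title(f"histtype='{histtype}'")
    results[histtype] = n
plt.close("all")

# Same counts in every case: only the drawing style differs
print(all(np.array_equal(results["bar"], v) for v in results.values()))
```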

Like a line plot, we can also plot two sets of values on the same axes with a histogram. In this case, though, the plots will obscure each other if the histograms are filled. We can fix this problem easily using matplotlib’s ability to handle alpha transparency. Let’s make a histogram of uniformly distributed random numbers from -3 to 3 in red with 50% transparency on top of the blue Gaussian.

```python
import matplotlib.pyplot as plt
from numpy.random import normal, uniform

gaussian_numbers = normal(size=1000)
uniform_numbers = uniform(low=-3, high=3, size=1000)

# Blue, opaque Gaussian underneath
plt.hist(gaussian_numbers, bins=20, histtype='stepfilled', density=True,
         color='b', label='Gaussian')
# Red uniform sample on top, 50% transparent so the Gaussian shows through
plt.hist(uniform_numbers, bins=20, histtype='stepfilled', density=True,
         color='r', alpha=0.5, label='Uniform')

plt.title("Gaussian/Uniform Histogram")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.legend()
plt.show()
```
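One detail worth knowing: with `bins=20`, each `hist()` call spreads 20 bins over the range of its own data, so the Gaussian bins and the uniform bins end up with different widths and don’t line up. A minimal sketch of one fix, computing a single set of edges over both samples and passing it to both calls (the `Agg` backend and the seeded generator are assumptions for a repeatable, display-free run):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
gaussian_numbers = rng.normal(size=1000)
uniform_numbers = rng.uniform(low=-3, high=3, size=1000)

# One set of 21 edges spanning both samples, shared by both histograms
edges = np.histogram_bin_edges(
    np.concatenate([gaussian_numbers, uniform_numbers]), bins=20)

n_gauss, _, _ = plt.hist(gaussian_numbers, bins=edges, histtype='stepfilled',
                         density=True, color='b', label='Gaussian')
n_unif, _, _ = plt.hist(uniform_numbers, bins=edges, histtype='stepfilled',
                        density=True, color='r', alpha=0.5, label='Uniform')
plt.legend()
plt.close()
```

Because both histograms share `edges`, each red bar sits exactly over the corresponding blue bar.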

Well, there you have it. You should be able to go out and make your own histograms using matplotlib, python, and numpy. In the next post, I will introduce you to the power of matplotlib’s figure().

### Basic Data Plotting With Matplotlib

Part 2: Lines, Points & Formatting

Part 4: Multiple Plots (Coming Soon)

Part 5: ?


xylem galadhon

said: This was one of the nicest intros to matplotlib histo plotting I found on the web, thx, and hope you guys keep it up!

-XTG

(cosmology postdoc)

Pedro

said:I need to draw a histogram for some data I have stored in a file. Can’t figure or find on the web a way to do it though. Can you help me?

vtn

said:Same as xylem, enjoyed this histo tutorial


rlazo

said:Definitely, a great introduction to matplotlib and histogram plotting. Great work! I’ll be waiting for the next post 🙂


Jason

said:Hey really looking forward to part 4: Multiple plots.

Shaohong

said:This is very nice! Thanks!

Ripan

said:Thanks for this wonderful tutorial

Brian

said: Might be a simple question, but for someone just beginning, could you possibly demonstrate how to use data from a txt or csv file rather than generating it? Great read, thank you!

Emmanuel

said:You will need to (1) first read the data and (2) store it in a numpy array. (3) Then use the array as the data.

For example:

```python
import numpy as np

# Read the raw text from text.txt in someDir/
with open('someDir/text.txt') as f:
    dataFromFile = f.read()

# Save the data into a numpy array. This might not be straightforward and
# greatly depends on the nature of your data. In this example I assume the
# data is merely numbers separated by commas.
myNumpyArray = np.array(dataFromFile.strip().split(','), dtype=float)

# Now you can use myNumpyArray for the plots, following the good examples
# shown at the top of this page.
```
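For files that contain nothing but delimited numbers, NumPy’s `np.loadtxt` can do the reading and conversion in one step. In the sketch below, `io.StringIO` stands in for a real file so the example is self-contained; the sample numbers are made up, and you would pass a path such as `'someDir/text.txt'` instead.

```python
import io

import numpy as np

# Stand-in for a real file on disk; loadtxt also accepts a filename
fake_file = io.StringIO("0.5,1.2,-0.3\n2.1,0.0,-1.7\n")

# Parse comma-separated numbers straight into a float array
data = np.loadtxt(fake_file, delimiter=",")

# A file with several rows comes back as a 2-D array; flatten it so every
# number ends up in one 1-D sample for the histogram
values = data.ravel()
print(values)
```

From there, `plt.hist(values, bins=20)` works exactly like the examples above.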

joepassman

said:Awesome! Thank you. I am doing some protein simulations and am trying to convince the post-doc I work under that python is just as versatile as gnuplot.

jack parmer

said:Nice post! Here are the same histograms made with Python in Plotly: https://plot.ly/~jackp/639

Marc Telesha

said: I was sad to see you never did the other post 😦 Any chance you might be encouraged to keep this series going?

Dilip Kale

said:So clear, so illustrative ! Please, please , please continue and complete the series to cover all topics. I am highly obliged even for this much…


Mick

said:Awesome tutorial, thank u ^.^

Hayden

said:Hey this was really helpful thank you! All I wanted was a simple template for how to do a very basic histogram plot and you answered my call! The basics of how this part of the package works makes perfect sense now after using your example and having a wee tinker.
