13. Random Numbers, Histograms, and a Simulation#

13.1. Introduction#

Random numbers are often useful both for simulation of physical processes and for generating a collection of test cases. Here we will do a mathematical simulation: approximating \(\pi\) on the basis that the unit circle occupies a fraction \(\pi/4\) of the \(2 \times 2\) square enclosing it.

Disclaimer#

Actually, the best we have available is pseudo-random numbers, generated by algorithms that actually produce a very long but eventually repeating sequence of numbers.

13.2. Module random within package numpy#

The pseudo-random number generator we use are provided by package Numpy in its module random – full name numpy.random. This module contains numerous random number generators; here we look at just a few.

We introduce the abbreviation “npr” for this, along with the standard abbreviations “np” for numpy and “plt” for module matplotlib.pyplot witit package matplotlib

import numpy.random as npr
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

13.3. Uniformly distributed real numbers: numpy.random.rand#

First, the function rand (full name numpy.random.rand) generates uniformly-distributed real numbers in the semi-open interval \([0,1)\).

To generate a single value, use it with no argument:

n_samples = 4
for sample in range(n_samples):
    print(npr.rand())
0.0013886060730762262
0.2550717355467501
0.3525373297323471
0.7304954501319436

Arrays of random values#

To generate an array of values all at once, one can specify how many as the first and only input argument:

pseudorandomnumbers = npr.rand(n_samples)
print(pseudorandomnumbers)
[0.69294602 0.35719477 0.0616249  0.30263391]

However, the first method has an advantage in some situations: neither the whole list of integers from 0 to n_samples - 1 nor the collection of random numbers is stored at any time: instead, just one value at a time is provided, used, and then “forgotten”. This can be beneficial or even essential when a very large number of random values is used; it is not unusual for a simulation to require more random values than the computer’s memory can hold.

Multi-dimensional arrays of random values#

We can also generate multi-dimensional arrays, by giving the lengths of each dimension as arguments:

numbers2d = npr.rand(2,3)
print('A two-dimensional array of random numbers:\n', numbers2d)
numbers3d = npr.rand(2,3,4)
print('\nA three-dimensional array of random numbers:\n', numbers3d)
A two-dimensional array of random numbers:
 [[0.8568646  0.01446659 0.99323541]
 [0.15164152 0.20092263 0.07016001]]

A three-dimensional array of random numbers:
 [[[0.60417816 0.1394466  0.39245547 0.49640564]
  [0.10991885 0.67334607 0.8062766  0.07716147]
  [0.41702894 0.41843031 0.45446995 0.41832134]]

 [[0.43781662 0.78649574 0.19269988 0.85384925]
  [0.54796197 0.79126265 0.0709657  0.24002659]
  [0.78334332 0.57768274 0.42831938 0.7356059 ]]]

13.4. Normally distributed real numbers: numpy.random.randn#

The function randn has the same interface, but generates numbers with the standard normal distribution of mean zero, standard deviation one:

print('Ten normally distributed values:\n', npr.randn(20))
Ten normally distributed values:
 [-0.09559602  0.27136876 -0.41885046 -0.56531452 -1.19216858 -1.03662914
 -1.07756701  0.14098146  0.00537863 -0.56485856  0.28438076 -1.6964393
  0.4285249  -0.55258009 -0.10073067  0.51608622  1.73088298  1.27510835
  0.34128797 -1.88994861]
n_samples = 10**7
normf_samples = npr.randn(n_samples)
mean = sum(normf_samples)/n_samples
print('The mean of these', n_samples, 'samples is', mean)
standard_deviation = np.sqrt(sum(normf_samples**2)/n_samples - mean**2)
print('and their standard deviation is', standard_deviation)
The mean of these 10000000 samples is 0.0005127728163497473
and their standard deviation is 0.9998744785989478

Note The exact mean and standard deviation of the standard normal distribtion are 0 and 1 respectively, so the slight variations above are due to these being only a sample mean and sample standard deviation.

13.5. Histogram plots with matplotlib.random.hist#

matplotlib.random has a function hist(x, bins, ...) for plotting histograms, so we can check what this normal distribution actually looks like.

Input parameter x is the list of values, and when input parameter bins is given an integer value, the data is binned into that many equally wide intervals.

The function hist also returns three values:

  • n, the number of values in each bin (the bar heights on the histogram)

  • bins (which I prefer to call bin_edges), the list of values of the edges between the bins

  • patches, which we can ignore!

Since for now we do not need the returned values, we can suppress them with a semi-colon:

plt.hist(normf_samples, 200);
_images/random-numbers-and-simulation_16_0.png

13.6. Random integers: numpy.random.randint#

One can generate pseudo-random integers, uniformly distributed between specified lower and upper values.

n_dice = 60
dice_rolls = npr.randint(1, 6+1, n_dice)
print(n_dice, 'random dice rolls:\n', dice_rolls)
# Count each outcome: this needs a list instead of an array:
dice_rolls_list = list(dice_rolls)
for value in (1, 2, 3, 4, 5, 6):
    count = dice_rolls_list.count(value)
    print(value, 'occured', count, 'times')
60 random dice rolls:
 [5 5 5 2 1 6 5 3 6 6 5 5 1 3 4 5 2 5 3 5 4 6 1 5 2 6 4 3 2 3 2 1 1 6 4 4 2
 5 5 2 3 4 5 3 4 2 6 1 1 5 2 6 6 1 4 3 2 2 2 2]
1 occured 8 times
2 occured 13 times
3 occured 8 times
4 occured 8 times
5 occured 14 times
6 occured 9 times

Specifying bin edges for the histogram#

This time, it is best to explicitly specify a list of the edges of the bins, by making the second argument bins a list.

With six values, seven edges are needed, and it looks nicest if they are centered on the integers.

bin_edges = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
plt.hist(dice_rolls, bin_edges);
_images/random-numbers-and-simulation_21_0.png

Run the above several times, redrawing the histogram each time; you should see a lot of variation.

Things average out with more rolls:

n_dice = 10**6
dice_rolls = npr.randint(1, 6+1, n_dice)
# Count each outcome: this needs a list instead of an array:
dice_rolls_list = list(dice_rolls)
for value in (1, 2, 3, 4, 5, 6):
    count = dice_rolls_list.count(value)
    print(value, 'occured a fraction', count/n_dice, 'of the time')
plt.hist(dice_rolls, bins = bin_edges);
1 occured a fraction 0.167125 of the time
2 occured a fraction 0.166378 of the time
3 occured a fraction 0.166272 of the time
4 occured a fraction 0.165994 of the time
5 occured a fraction 0.167016 of the time
6 occured a fraction 0.167215 of the time
_images/random-numbers-and-simulation_23_1.png

The histogram is now visually boring, but mathematically more satisfying.

13.7. Exercises#

Exercise A: approximating \(\pi\)#

We can compute approximations of \(\pi\) by using the fact that the unit circle occupies a fraction \(\pi/4\) of the circumscribed square:

plt.figure(figsize=[8, 8])
angle = np.linspace(0, 2*np.pi)
# Red circle 
plt.plot(np.cos(angle), np.sin(angle), 'r')
# Blue square
plt.plot([-1,1], [-1,-1], 'b')  # bottom side of square
plt.plot([1,1], [-1,1], 'b')  # right side  of square
plt.plot([1,-1], [1,1], 'b')  # top side of square
plt.plot([-1,-1], [1,-1], 'b');  # left side of square
_images/random-numbers-and-simulation_27_0.png

We can use this fact as follows:

  • generate a list of \(N\) random points in the square \([-1,1] \times [-1,1]\) that circumscribes the unit circle, by generating successive unifromly distributed random values for both the \(x\) and \(y\) coordinates.

  • compute the fraction of these that are inside the unit circle, which should be approximately \(\pi/4\). (\(N\) needs to be fairly large; you could try \(N=100\) initially, but will increase it later.)

  • Multiply by four and there you are!

Do multiple trials with the same number \(N\) of samples, to see the variation as an indication of accuracy.

Collect the various approximations of \(\pi\) in a list, compute the (sample) mean and (sample) standard deviation of the list, and illustrate with a histogram.

Exercise B#

It takes a lot of samples to get decent accuracy, so after part (a) is working, experiment with successively more samples; increase the number \(N\) of samples per trial by big steps, like factors of 100.

For each choice of sample size \(N\), compute the mean and standard deviation, and plot the histogram.