logo

Learning from data

  • About this Jupyter Book

Course overview

  • Objectives

Topics

  • 1. Basics of Bayesian statistics
    • 1.1. Lecture 1
    • 1.2. Exploring PDFs
    • 1.3. Checking the sum and product rules, and their consequences
    • 1.4. Lecture 2
    • 1.5. Interactive Bayesian updating: coin flipping example
    • 1.6. Standard medical example by applying Bayesian rules of probability
    • 1.7. Radioactive lighthouse problem
    • 1.8. Lecture 3
  • 2. Bayesian parameter estimation
    • 2.1. Lecture 4: Parameter estimation
    • 2.2. Parameter estimation example: Gaussian noise and averages
    • 2.3. Assignment: 2D radioactive lighthouse location using MCMC
    • 2.4. Lecture 5
    • 2.5. Parameter estimation example: fitting a straight line
    • 2.6. Linear Regression and Model Validation demonstration
    • 2.7. Lecture 6
    • 2.8. Amplitude of a signal in the presence of background
    • 2.9. Assignment: Follow-ups to Parameter Estimation notebooks
    • 2.10. Linear Regression exercise
    • 2.11. Linear algebra games including SVD for PCA
    • 2.12. Follow-up: fluctuation trends with # of points and data errors
  • 3. MCMC sampling I
    • 3.1. Lecture 7
    • 3.2. Metropolis-Hasting MCMC sampling of a Poisson distribution
    • 3.3. Lecture 8
    • 3.4. Exercise: Random walk
  • 4. Why Bayes is better
    • 4.1. Lecture 9
    • 4.2. A Bayesian Billiard game
    • 4.3. Lecture 10
    • 4.4. Parameter estimation example: fitting a straight line II
    • 4.5. Lecture 11
    • 4.6. Error propagation: Example 3.6.2 in Sivia
    • 4.7. Visualization of the Central Limit Theorem
    • 4.8. Building intuition about correlations (and a bit of Python linear algebra)
    • 4.9. Lecture 12
    • 4.10. Overview: MCMC Diagnostics
    • 4.12. Lecture 13
    • 4.13. Dealing with outliers
  • 5. Model selection
    • 5.1. Lecture 14
    • 5.2. Lecture 15
    • 5.3. Evidence calculation for EFT expansions
    • 5.4. Lecture 16
    • 5.5. Example: Parallel tempering for multimodal distributions
    • 5.6. Example: Parallel tempering for multimodal distributions vs. zeus
  • 6. MCMC sampling II
    • 6.1. Lecture 17
    • 6.2. Quick check of the distribution of normal variables squared
    • 6.3. Liouville Theorem Visualization
    • 6.4. Solving orbital equations with different algorithms
    • 6.5. Lecture 18
    • 6.6. PyMC3 Introduction
    • 6.7. Getting started with PyMC3
    • 6.8. Comparing samplers for a simple problem
    • 6.9. zeus: Sampling from multimodal distributions
  • 7. Gaussian processes
    • 7.1. Lecture 19
    • 7.2. Gaussian processes demonstration
    • 7.3. Learning from data: Gaussian processes
    • 7.4. Exercise: Gaussian Process models with GPy
    • 7.5. Lecture 20
  • 8. Assigning probabilities
    • 8.1. Lecture 21
    • 8.2. Ignorance pdfs: Indifference and translation groups
    • 8.3. MaxEnt for deriving some probability distributions
    • 8.4. Maximum Entropy for reconstructing a function from its moments
    • 8.5. Making figures for Ignorance PDF notebook
  • 9. Machine learning: Bayesian methods
    • 9.1. Lecture 22
    • 9.2. Bayesian Optimization
    • 9.3. Lecture 23
    • 9.4. What Are Neural Networks?
    • 9.5. Neural networks
    • 9.6. Neural network classifier demonstration
    • 9.7. Bayesian neural networks
    • 9.8. Lecture 24
    • 9.9. Variational Inference: Bayesian Neural Networks
    • 9.10. What is a convolutional neural network?
  • 10. PCA, SVD, and all that
    • 10.1. Lecture 25
    • 10.2. Linear algebra games including SVD for PCA

Mini-projects

  • Mini-project I: Parameter estimation for a toy model of an EFT
  • Mini-project IIa: Model selection basics
  • Mini-project IIb: How many lines are there?
  • Mini-project IIIa: Bayesian optimization
  • Mini-project IIIb: Bayesian Neural Networks

Reference material

  • Bibliography
  • Related topics
  • Using Anaconda
  • Using GitHub
  • Python and Jupyter notebooks
    • Python and Jupyter notebooks: part 01
    • Python and Jupyter notebooks: part 02
  • Examples: Jupyter jb-book

Notebook keys

  • Checking the sum and product rules, and their consequences Key
  • Standard medical example by applying Bayesian rules of probability Key
  • Radioactive lighthouse problem Key
Powered by Jupyter Book
Contents
  • A. Parameter estimation example: fitting a straight line
    • First make tables
    • Now make a function for rerunning
    • Now make linear and log-log plots

2.12. Follow-up: fluctuation trends with # of points and data errors¶

This is a follow-up to Assignment: Follow-ups to Parameter Estimation notebooks, which focuses on the trends of fluctuations and how to visualize them.

A. Parameter estimation example: fitting a straight line¶

  2.   Do exercise 3: “Change the random number seed to get different results and comment on how the maximum likelihood results fluctuate. How are size of the fluctuations related to the number of data points \(N\) and the data error standard deviation \(dy\)? (Try changing them!)”

The size of the fluctuations decrease as the square root of the number of points N. As the data error standard deviation increases, the size of the fluctuations increases linearly with dy.

How do we obtain, visualize, and understand these results?

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn; seaborn.set('talk') # for plot formatting
from scipy import optimize

def make_data(intercept, slope, N=20, dy=5, rseed=None):
    """Given a straight line defined by intercept and slope:
          y = slope * x + intercept
       generate N points randomly spaced points from x=0 to x=100
       with Gaussian (i.e., normal) error with mean zero and standard
       deviation dy.
       
       Return the x and y arrays and an array of standard deviations.
    """
    rand = np.random.RandomState(rseed) 
    x = 100 * rand.rand(N)  # choose the x values randomly in [0,100]
    y = intercept + slope * x  # This is the y value without noise
    y += dy * rand.randn(N)    # Add in Gaussian noise
    return x, y, dy * np.ones_like(x)  # return coordinates and error bars

def log_likelihood(theta, x, y, dy):
    y_model = theta[0] + theta[1]*x
    return -0.5*np.sum(np.log(2*np.pi*dy**2)+ (y - y_model)**2/dy**2)

def minfunc(theta, x, y, dy):
    """
    Function to be minimized: minus the logarithm of the likelihood.
    """
    return -log_likelihood(theta, x, y, dy)

First make tables¶

intercept = 25.   # true intercept (called b elsewhere)
slope = 0.5       # true slope (called m elsewhere)
theta_true = [intercept, slope]  # put parameters in a true theta vector

iterations = 10
# Fix dy and vary Npts geometrically
dy_data = 5
for Npts in [20, 80, 320]:
    print(f'N = {Npts}, dy = {dy_data}')
    print('          intercept   slope')
    print(f'true:       {intercept:.3f}    {slope:.3f}')
    
    for i in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fit, slope_fit = result.x
    
        print(f'dataset {i}:  {intercept_fit:.3f}    {slope_fit:.3f}')
    print('------------------------------\n')
N = 20, dy = 5
          intercept   slope
true:       25.000    0.500
dataset 0:  26.121    0.488
dataset 1:  24.379    0.477
dataset 2:  28.448    0.406
dataset 3:  31.338    0.434
dataset 4:  19.791    0.566
dataset 5:  23.434    0.500
dataset 6:  21.570    0.543
dataset 7:  23.114    0.549
dataset 8:  28.597    0.457
dataset 9:  19.860    0.545
------------------------------

N = 80, dy = 5
          intercept   slope
true:       25.000    0.500
dataset 0:  24.694    0.508
dataset 1:  25.215    0.490
dataset 2:  24.527    0.520
dataset 3:  24.962    0.485
dataset 4:  23.997    0.499
dataset 5:  24.504    0.497
dataset 6:  23.002    0.529
dataset 7:  22.897    0.537
dataset 8:  24.026    0.514
dataset 9:  24.027    0.502
------------------------------

N = 320, dy = 5
          intercept   slope
true:       25.000    0.500
dataset 0:  25.780    0.496
dataset 1:  24.403    0.505
dataset 2:  25.067    0.504
dataset 3:  24.019    0.518
dataset 4:  25.243    0.498
dataset 5:  24.903    0.494
dataset 6:  25.476    0.497
dataset 7:  24.273    0.508
dataset 8:  25.363    0.496
dataset 9:  24.602    0.499
------------------------------
# Fix Npts and vary dy geometically
Npts = 80
for dy_data in [1, 5, 25]:
    print(f'N = {Npts}, dy = {dy_data}')
    print('          intercept   slope')
    print(f'true:       {intercept:.3f}    {slope:.3f}')
    
    for i in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fit, slope_fit = result.x
    
        print(f'dataset {i}:  {intercept_fit:.3f}    {slope_fit:.3f}')
    print('------------------------------\n')
N = 80, dy = 1
          intercept   slope
true:       25.000    0.500
dataset 0:  25.281    0.494
dataset 1:  24.989    0.504
dataset 2:  25.134    0.499
dataset 3:  25.022    0.502
dataset 4:  24.651    0.512
dataset 5:  25.229    0.497
dataset 6:  24.828    0.502
dataset 7:  25.318    0.495
dataset 8:  25.282    0.497
dataset 9:  25.282    0.495
------------------------------

N = 80, dy = 5
          intercept   slope
true:       25.000    0.500
dataset 0:  24.720    0.508
dataset 1:  23.637    0.521
dataset 2:  26.342    0.468
dataset 3:  24.035    0.514
dataset 4:  26.363    0.462
dataset 5:  23.992    0.523
dataset 6:  25.620    0.485
dataset 7:  25.440    0.502
dataset 8:  25.571    0.475
dataset 9:  26.452    0.472
------------------------------

N = 80, dy = 25
          intercept   slope
true:       25.000    0.500
dataset 0:  20.129    0.527
dataset 1:  18.857    0.604
dataset 2:  18.320    0.707
dataset 3:  27.873    0.399
dataset 4:  19.448    0.561
dataset 5:  21.213    0.558
dataset 6:  19.209    0.550
dataset 7:  16.658    0.656
dataset 8:  28.626    0.451
dataset 9:  24.319    0.450
------------------------------

Now make a function for rerunning¶

def std_of_fit_data(Npts, dy_data, iterations, theta_true=theta_true):
    """Calculate the standard deviations of the slope and intercept 
       for a given number of iterations
    """ 
    intercept_fits = np.zeros(iterations)
    slope_fits = np.zeros(iterations)

    for j in np.arange(iterations):        
        x, y, dy = make_data(*theta_true, N=Npts, dy=dy_data, rseed=None)
        result = optimize.minimize(minfunc, x0=[0, 0], args=(x, y, dy))
        intercept_fits[j], slope_fits[j] = result.x
        
    return intercept_fits.std(), slope_fits.std()    
std_of_fit_data(20, 5, 20)
(3.3694189754675885, 0.05485217437350401)
std_of_fit_data(80, 5, 20)
(1.1785970628509084, 0.018145922809239247)
std_of_fit_data(320, 5, 20)
(0.5223960837377885, 0.008440770742473696)

Now make linear and log-log plots¶

Which is better? How do you read a power law from a log-log plot?

Npts_array = [20 * 2**i for i in range(10)]
Npts_array
[20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240]
# Fix dy and vary Npts geometrically
dy_data = 5

Npts_array = [20 * 2**j for j in range(10)]
intercept_std_array = np.zeros(len(Npts_array))
slope_std_array = np.zeros(len(Npts_array))

iterations = 20
for i, Npts in enumerate(Npts_array):
    intercept_std_array[i], slope_std_array[i] = std_of_fit_data(Npts, dy_data, iterations)   

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0,0].set_title(r'intercept fluctuations vs $N$')
axes[0,0].set_xlabel(r'$N$')
axes[0,0].plot(Npts_array, intercept_std_array)

axes[0,1].set_title(r'intercept fluctuations vs $N$')
axes[0,1].set_xlabel(r'$N$')
axes[0,1].loglog(Npts_array, intercept_std_array)
axes[0,1].set_xlim(10,1e4)
axes[0,1].set_ylim(.01,10)
axes[0,1].set_aspect('equal')

axes[1,0].set_title('slope fluctuations vs $N$')
axes[1,0].set_xlabel(r'$N$')
axes[1,0].plot(Npts_array, slope_std_array)

axes[1,1].set_title('slope fluctuations vs $N$')
axes[1,1].set_xlabel(r'$N$')
axes[1,1].loglog(Npts_array, slope_std_array)
axes[1,1].set_xlim(10,1e4)
axes[1,1].set_ylim(1e-3,1e-1)
axes[1,1].set_aspect('equal')

fig.tight_layout()
../../_images/fluctuation_trend_with_number_of_points_and_data_errors_14_0.png
# Fix Npts and vary dy geometrically
Npts = 20

dy_array = [1 * 2**j for j in range(10)]
intercept_std_array = np.zeros(len(dy_array))
slope_std_array = np.zeros(len(dy_array))

iterations = 20
for i, dy_data in enumerate(dy_array):
    intercept_std_array[i], slope_std_array[i] = std_of_fit_data(Npts, dy_data, iterations)   

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0,0].set_title(r'intercept fluctuations vs $dy$')
axes[0,0].set_xlabel(r'$dy$')
axes[0,0].plot(Npts_array, intercept_std_array)

axes[0,1].set_title(r'intercept fluctuations vs $dy$')
axes[0,1].set_xlabel(r'$dy$')
axes[0,1].loglog(Npts_array, intercept_std_array)
axes[0,1].set_xlim(1e1,1e4)
axes[0,1].set_ylim(.1,1000)
#axes[0,1].set_aspect('equal')

axes[1,0].set_title('slope fluctuations vs $dy$')
axes[1,0].set_xlabel(r'$dy$')
axes[1,0].plot(Npts_array, slope_std_array)

axes[1,1].set_title('slope fluctuations vs $dy$')
axes[1,1].set_xlabel(r'$dy$')
axes[1,1].loglog(Npts_array, slope_std_array)
axes[0,1].set_xlim(1e1,1e4)
axes[1,1].set_ylim(1e-3,1e+1)
#axes[1,1].set_aspect('equal')

fig.tight_layout()
../../_images/fluctuation_trend_with_number_of_points_and_data_errors_15_0.png
  1. In both sets of joint posterior graphs, are the slope and intercept correlated? How do you know? Does it make sense?

  2. For the first set of data, answer the question: “What do you conclude about how the form of the prior affects the final posterior in this case?”

  3. For the second set of data, answer the question: “Why in this case does the form of the prior have a clear effect?” You should consider both the size of the error bars and the number of data points (try changing them to explore the impact).

2.11. Linear algebra games including SVD for PCA 3. MCMC Sampling I

By Dick Furnstahl
© Copyright 2021.