Fitting

Jed Rembold

April 10, 2023

Announcements

  • HW5 should hopefully be coming out over the coming weekend
    • Things have been a bit slowed by my travel
  • I almost got through HW2 feedback before I left, but not quite. I’m so sorry.
  • If my current connection does not allow a decent stream, then I’ll make a recording and post it.
  • On Tuesday we will have another remote lecture

Recap

  • Rotation or velocity curves measure the speed of orbiting objects as you move outwards from a central body
    • Commonly assume circular or largely circular orbits
    • Generally measured through Doppler shifts
  • The shape of a rotation curve tells you about the distribution of material throughout the region
  • The rotation curves of galaxies do not match what we would expect, thus leading to conjectures about dark matter

Discussing Today

  • What do we want from a model fit?
  • Why might we need methods beyond least-mean squares?
  • How can randomness help us determine the output of integrals?

Why fit data?

  • A “fit” model computes the values of unknown constants within the model
  • In science, and in astronomy in particular, such models are usually derived from fundamental first principles, and thus those constants are linked to physical properties or laws
    • The T when we fit Planck’s Law, for instance
  • Individual fits themselves may still not be that important: rather, they can contribute to an overall story that the data is telling
    • “Hotter things emit more light and bluer light”

Fitting Neglect

  • In the fitting that we have done so far this semester, we have been leaving some important things out
    • What role does any known error or variance play in our fitting of data?
    • How confident should we be in our fit parameters? That is, what variance should we expect in the fit parameters?
  • All of these things can sometimes be estimated easily from current techniques, and can sometimes have easy analytic results.
    • But what about in more complicated situations?

A seeming aside: Solving Integrals

  • Suppose you want to find the area under a function \(f(x)\) between \(x=a\) and \(x=b\), such that you are interested in \[ \int_a^b f(x)\,dx \]
  • If \(f(x)\) is simple and you know a bit of calculus, this is relatively straightforward
  • But what if \(f(x)\) is some terribly complicated function that you don’t know how to integrate?
    • Or worse, what if \(f(x)\) is a largely black-box, wherein you put in an input and then some time later get an output, without seeing what happens in the middle?

Alternative Approaches

  • When posed with an intractable integral, most folks these days look to:
    • Wolfram Alpha, or some other computation engine, for aid with analytic integration
    • Numerical techniques
  • Numerical techniques rely on breaking the integral up into simple approximations
    • The key here is that they are always approximations (midpoint rule, trapezoid rule, Simpson’s rule, etc.)
    • As approximations, there will always be some inherent error associated with using these methods

Entering the Multiverse

  • For nice one-dimensional problems, these methods can work out nicely
  • But what if our function was instead dependent on, for instance, 6 parameters? \[ f(x,y,z,a,b,c) \quad\rightarrow\quad \int\int\int\int\int\int f(x,y,z,a,b,c)\,dx\,dy\,dz\,da\,db\,dc \]
  • The trapezoid rule is a decent approximation method, but if you work it out, its error scales as: \[ \epsilon \propto \frac{1}{N^{2/d}} \] where \(N\) is the number of sample points (how skinny are the rectangles) and \(d\) the number of dimensions being integrated over

Compiling Errors

  • In order to keep a constant error value then as the number of dimensions increase: \[ N = (1/\epsilon)^{d/2} \]
  • This is highly problematic for higher dimensional data, as we need exponentially more data to keep the errors at a given level!
    • We need to compute the area of each of those trapezoids, so exponentially more trapezoids means exponentially longer running time

A Random Solution

  • What if, instead of slicing up the entire region of interest, we instead randomly sampled from that region
  • Computing, for instance, just the area at a given, randomly selected point
  • It turns out this error scales as: \[ \epsilon \propto \frac{1}{\sqrt{N}} \] which no longer has any \(d\) dependence!!
  • This is the basis behind random (or Monte Carlo) selection methods

Random Areas?

  • How do random points get us an accurate portrayal of the area under the curve?
    • At each random point, compute the value of the function at that point, this will be the height of your rectangle
    • The width of your rectangle is just the span of your boundaries
  • Some points will overestimate the area, and some will underestimate the area
  • The amazing aspect is that given enough points, the average of the point areas will closely approximate the actual area under the curve

Example

Suppose we wanted to evaluate the area under the curve \[ f(x) = -\frac{4}{5}x + 4 \] from \(x=0\) to \(x=5\), using Monte Carlo methods. This describes a triangle with height 4 and width 5, and thus should have an area of 20. Is that what we get?

Example 2

What if we slightly complicate this function by looking instead at a piece-wise function: \[ f(x) = \begin{cases} 0 & x < 3; \\ -\frac{4}{5}x + 4 & 3 \leq x \leq 5; \\ 0 & x > 5 \end{cases} \] This now describes a triangle with height of 1.6 and width of 2, thus having an area of 1.6. Does the Monte Carlo method still work?

Probability Distributions

  • A probability distribution describes the likelihood of selecting a particular value over a range of values.
  • The area under the probability distribution must sum to 1, so the odds add up.
  • In our current examples, any number between 0 and 5 was equally likely, which would have led to the probability distribution to the right.

Importance

  • For some functions, all points might not be equally interesting, and thus a constant probability distribution makes no sense.
  • We instead might want to weigh certain regions higher in our random selection, as that is where the interesting things are happening
  • If we do so, we need to recognize that in computing our areas, we are now doing \[ A = \frac{f(x_k)}{p(x_k)} \] where \(p(x_k)\) is the probability of selecting a particular \(x\)
    • This is the same as what we were doing before, since for a uniform sampling, the probability weights looked like 0.2, and 1/0.2 is 5.

Selecting from a probability distribution

  • Trying to select from a continuous probability distribution discretely is not ideal, but that is all that is immediately available through Numpy

  • For a better option, roll your own selector:

    def sample_from_prob_dist(pdf, lower, upper, num):
        samples = []
        while len(samples) < num:
            x = np.random.uniform(lower, upper)
            p = pdf(x)
            if np.random.uniform(0,1) <= p:
                samples.append(x)
        return np.array(samples)

Example 3

Now suppose we wanted to evaluate the area under the same piece-wise function as earlier: \[ f(x) = \begin{cases} 0 & x < 3; \\ -\frac{4}{5}x + 4 & 3 \leq x \leq 5; \\ 0 & x > 5 \end{cases} \] But this time choosing points based on two different probability distributions: \[ {p(x) = \begin{cases} 0 & x < 3; \\ 0.5 & 3 \leq x \leq 5; \\ 0 & x > 5 \end{cases}}\qquad {p(x) = \begin{cases} 0 & x < 3; \\ \frac{1}{2}\left(x - 3\right) & 3 \leq x \leq 5; \\ 0 & x > 5 \end{cases}} \]

An Unknown PDF

  • What if we don’t know or have a PDF to work with?
  • The perfect PDF would look just like the function we are trying to integrate, but scaled to have area = 1
    • But if we could easily calculate the area, we wouldn’t be doing this integration in the first place
  • Instead we will construct a Markov Chain to semi-randomly “walk” it’s way around our initial function
    • Goal is to have it spend more of its time near “higher” parts of the function
    • Then we’ll look at the distribution of where it spent time

Markov Chains

  • A Markov chain is a model that represents a sequence of events where the probability of the next event is determined only by the current event
  • In our case, the point we are currently “at” will make it likely that our next point is nearby
  • Our process will then follow that of Metropolis-Hastings and is shown below. Note that you must keep track of the visited points along the way!
    • Choose a starting point
    • For some number of iterations:
      • Randomly choose a nearby point
      • Compare that point to your current point, to see if it would move you “higher” up the function
      • If above a random threshold, move to that point, otherwise stay where you are at

Choosing a Nearby Point

  • Randomly choosing a nearby point inherently pulls from a probability distribution centered on the current point

  • Most frequently used is a Gaussian probability distribution

    • Centered (mean) on the current point
    • Spread (std) can be tuned, but 1 is a reasonable starting value
    np.random.normal(loc=0, spread=1)
  • Working in higher dimensions? A multi-variate Gaussian will still work!

    np.random.multivariate_normal(mean=VEC, cov=MAT)

Move onward or stay?

  • Once you have the new point candidate, you need to decide whether to keep it or your current point
  • In Metropolis-Hastings, you:
    • Take the ratio of your sampled function evaluated at the new point over the sampled function evaluated at the current \[ \frac{f(x^\prime)}{f(x)} \] where \(x\) is your current position and \(x^\prime\) your proposed new position
    • Compare that ratio to a randomly generated number between 0 and 1
      • If bigger, keep the new position, making it your new current
      • If smaller, keep the old position (but still add it to your visited list again)

Backing out a PDF

  • The history of the random walk contains information about where the walker spent the most time
    • Due to how we constructed the walk, this should be near the maximum portions of our function
  • Creating a histogram or KDE plot will allow us to visualize the probability distribution.
  • We could use the histogram information to directly sample from the distribution
// reveal.js plugins // Added plugins ,