Homework 12

For the problem below, the expectation is that you submit a standalone HTML file (any images should be embedded) back to GitHub. There is no provided data for this problem. Since this is the first assignment in a new unit, you’ll need to form your new groups when you accept the assignment. The first person to accept the assignment should form an easily identifiable group name (likely using the names of your group members). Others in the group should also accept the assignment, and then they can just join the existing group.

Accept Assignment

Problem 1: Dark Matter

This problem investigates dark matter density models and how they affect the resulting rotation curve of orbiting objects. It will initially focus on a simpler model of planets in the solar system, before moving on to a galactic rotation curve that requires a dark matter treatment.

Part A: Planets

The file planets_rot.csv in the data directory contains distance and rotational speed information for all planets in the solar system. Note that the column units are specified in the second row of the CSV file and preceded with a #, so you can provide an option when reading in the CSV to ignore those rows. For these data you will adopt a simple density profile of: \rho(r) = \begin{cases} \rho_0, &\text{if }r < r_0 \\ 0, &\text{otherwise}\end{cases} where r_0 and \rho_0 are the two free parameters you’ll fit the data to. We saw in class how the velocity is given by: v(r) = \sqrt{\frac{GM_{in}(r)}{r}} where M_{in}(r) is the total mass within radius r, which is just given by the integrated density \rho(r):

M_{in}(r) = \int_0^r 4\pi \rho(r^\prime) {r^\prime}^2\,dr^\prime

in spherical coordinates, as we are assuming spherical symmetry. While you could compute M_{in} numerically using your own method or an establish numerical solver like Scipy’s quad or R’s integrate, I’d recommend using the provided calc_velocity model provided in the repository. It uses a simple Riemann sum to compute the integral, but doing so allows reusing parts of the sum, which, when you are calculating an integral for each data point, saves a LOT of time!

Import or source your model calc_velocity, and then proceed to define your density, prior, likelihood, and pdf functions. Sample from the posterior distribution using MCMC over at least 2000 iterations, and feel free to using existing Python or R libraries to help with the MCMC here. Starting parameters around r_0 = 7\times10^8\,\text{m} and \rho_0 = 1500\,\text{kg}/\text{m}^3 works fairly well. Looking at your parameter evolution over the course of their walks, you may be concerned that they don’t really seem to be converging on anything. This is actually ok! Given our model and our data, there is a huge range of possible r_0 and \rho_0 values which could fit our data very well. The important thing will be the relationship between them. Choose a reasonable burn-in point and plot a 2d KDE plot of the distribution of r_0 vs \rho_0. You should see that, despite the range of possible values, r_0 and \rho_0 have a quite tight relationship with one another in order to fit the data.

Randomly select from the walked parameter chain to create an ensemble of 100 model results at different distances from the Sun, and plot them against the original data. You’ll likely notice that, despite there being a significant range in the values of r_0 and \rho_0, the spread of your plotted models on the plot is basically non-existent!

Part B: Galaxies with a Poor Model

The file UGC06787_rot.csv contains rotational speed information for the galaxy UGC 6787. While it looks very similar to the planet data, note that the units on the distances are very different!

You might need to slightly tweak your prior or likelihood functions, but then go ahead and still use the same density and velocity model to sample from the posterior distribution for a fit to this new data. New starting values around r_0 = 1\times10^{20}\,\text{m} and \rho_0=5\times10^{-20}\,\text{kg}/\text{m}^3 worked well for me. Plot both the 2d KDE plot of the parameter distributions, as well 100 sampled model fits against your data. You’ll notice here that, while the model is clearly a rubbish fit to the data, our sampling of parameter distribution still creates a fairly tight range of models on the plot. MCMC sampling tells us about how the parameters vary in parameter space as we seek to maximize the likelihood: it does not tell us anything about how well we did in maximizing the likelihood!

Part C: Galaxies with Dark Matter

Clearly our model is missing something, which, as we discussed in class, we might ascribe to dark matter. We need to thus add a term to our density function to account for the density distribution of dark matter throughout the galaxy. There are a few different approaches here, but one common one is called the Burkert profile, and looks like: \rho_{dm}(r) = \frac{\rho_1}{\left(1+\frac{r}{r_1}\right)\left(1 + \left(\frac{r}{r_1}\right)^2\right)} This can be added to your density of the “light” matter from Part A, such that your new density would look like: \rho_{tot} = \rho_{lm} + \rho_{dm}

Redefine your density function to account for both sources of mass. You’ll probably need to update your prior now as well since you have a total of 4 parameters running around. Sample from the posterior distribution with around 2000 length chains. Starting values around r_0 = 2\times10^{19}\,\text{m}, \rho_0 = 1\times10^{-18}\,\text{kg}/\text{m}^3, r_1 = 1.25\times10^{20}\,\text{m}, and \rho_1 = 2\times10^{-20}\,\text{kg}/\text{m}^3 worked well for me. This may take some time, but it shouldn’t be too bad with the velocity model I have provided you! Plot all the 2d parameter combinations (both 1d histograms and 2d histogram combinations. If in Python, using something like the corner library can greatly streamline this, or, if in R, using ggmcmc and then ggs_pairwise can help), and then sample 100 of the parameter combinations to model and plot against your data. It should look considerably better than in Part B!