Homework 13

For each of the problems below, the expectation is that you submit a standalone HTML file (any images should be embedded) back to GitHub. Please do just one problem per HTML file. All problems will either have data in the start repository, which you should accept by following the link below. Don’t forget that you are working in the groups indicated in the email! The first person to accept the assignment should form an easily identifiable group name (likely using the names of your group members). Others in the group should also accept the assignment, and then they can just join the existing group.

Accept Assignment

Problem: Structure of the Universe

The data provided in Hubble.csv contains redshift and distance information for approximately 1700 galaxies or supernova, gathered from the Pantheon+ and SH0ES observation campaigns, and represents about the “deepest” data we have with respect to the distance and redshift of extremely far away objects. The dataset expresses the distances as a distance moduli (\mu), which is the apparent magnitude minus the absolute magnitude, as per the same equation we have used in the past: \mu = m - M = 5\log_{10}(d_{pc}) - 5 This can be inverted to compute the object distances. Redshift values (z) can be converted to a velocity by multiplying them by the speed of light.

Part A: “Close” Objects

For this part you will be only looking at objects within 100 Mpc of Earth. Within this range, the changing acceleration of the universe has not yet had a chance to play a strong role, and so the relationship between redshift velocity and distance will be linear: cz = H_0 d Construct a function to model this relationship, being careful that we classically report H_0 in units of km / s / Mpc, and so your velocity units should be in terms of km/s and your distance units in terms of Mpc. Use MCMC to sample the posterior probability distribution, using Python or R libraries if you like (I’d suggest it). Show graphical evidence of how much burn-in you ended up dropping, and visualize the resulting distribution of the single parameter H_0. Approximately what value of H_0 did you find, and what is your uncertainty (standard deviation) in this value? Randomly select 100 sampled Hubble constants and plot your model results back against the nearby points on a Hubble diagram to ensure that your fit looks reasonable.

Part B: All Objects

If you plot your H_0 fit for the close objects on a plot with all the data, you’ll notice that, while it nicely captures linear data as small distances, as objects get further away they start deviating from this linear trend. This is the result of the light from those objects having traveled so long to reach us that the rate of the expansion of the universe has changed over the journey. Attempting to model this data then requires us to factor in what we know (or suspect) about the nature of the universe. The easiest way to interpret this is that the value of H has changed over the lifetime of the universe, which from our perspective means that the value of H for objects really far away (at high redshift z) is different from objects nearby. That is to say, H is no longer a constant, but instead a function of z. As such, when we go to determine the relationship between an object’s distance and its speed (or redshift), we actually need to integrate over all the “past” H values that light has traveled through: d = (1+z)c \int_0^z \frac{dz^\prime}{H(z^\prime)} where we can compute H(z) as: H(z) = H_0 \sqrt{\Omega_M(1+z)^3 + \Omega_k(1+z)^2 + \Omega_\Lambda} Here,

H_0 represents the value of the Hubble constant now. You’d expect this to come to a very similar number to what you got in part A.
\Omega_M represents the overall matter density in the universe. This includes both the matter we can see (light matter) and dark matter. Greater amounts of this result in a universe that should slow as it expands, owing to the inward pull of gravity. This is a unitless number, and should be between 0 and 1.
\Omega_k represents the overall curvature of our universe. \Omega_k values of near 0 predict a flat universe, whereas larger values indicate positive curvature and negative values indicate negative curvature. It is less an actual fractional proportion and more a “value to account for curvature”, and as such this value will lie between -1 and 1.
\Omega_\Lambda is the cosmological constant, and generally represents the amount of dark energy in our universe. Greater amounts of this result in a universe that accelerates as it expands. Like \Omega_M, this value should lie between 0 and 1.

All told, since \Omega_M, \Omega_k and \Omega_\Lambda are fractional proportions, they should all add up to 1: \Omega_M + \Omega_k + \Omega_\Lambda = 1 Please note that if using this restriction when defining your prior, you need to make it a little looser: it will generally be impossible for random steps to result in values that sum exactly to 1.

Clearly, in this situation, the model needs to change. Writing out the new expression for H is not the difficult part, but rather dealing with the integration that needs to happen. Again, a numerical integration solver to help crunch the necessary integral in your function, such as quad for Python or integrate in R, would be commonly be helpful, but the tend to be slow here. So I have also provided you with a model for this part of the problem called calc_distances that you can use. Also note that the dependence of your model needs to flip! Whereas in Part A you could look at distance as the independent variable and velocity / redshift as the dependent variable, given how H depends on z, you really need to look at z as the independent variable here and the distance as the dependent variable.

Once you have the model imported or sourced, go ahead and set up your ln_prior, ln_likelihood and ln_pdf functions, as you did for Part A. Be aware that you have more information you can feed your prior in this case owning to what you know about the different \Omega values. For your likelihood, the awkward part about switching z to the independent variable is that we now need different uncertainties (the distance is now our dependent variable). You can compute these, because we know the uncertainty in the distance modulus \mu. However, because of the non-linear expression to convert the distance modulus to a distance, doing so is a bit less straightforward. Thus, I am just giving you the expression below, which results in the uncertainty in the distance in units of parsecs. Generally it was worked out from some calculus and the chain-rule:

\sigma_d = \frac{\ln 10}{5} \cdot 10^{1+\mu/5} \cdot \sigma_\mu

Run the MCMC sampler for a variety of starting points. Choosing points near the H_0 value you got earlier and with starting values around 0.3, 0, and 0.7, respectively, for \Omega_M, \Omega_k, and \Omega_\Lambda. Gather enough samples that you can drop out the burn-out portions and still have a reasonable population to create your distributions from.

Important

Because of the fact that you have to do this numeric integral everytime you take a step, sampling posterior can take a while! I’d highly recommend using the progress flag if using emcee so that you can keep an eye on how far along you are. Otherwise, be patient, and start with shorter runs first to ensure things are working before trying longer runs.

Create distributions of each of the parameters and determine their most likely value (the median) and uncertainties. Randomly select 100 combinations of these parameters and plug them into your model, plotting them atop your original data to ensure that it fits your data far better than the linear fit from earlier. Comment on how well your determined parameters match the current day accepted values.