Homework 4

For the problem below, the expectation is that you submit a standalone HTML file (any images should be embedded) back to GitHub. I am reusing the groups from HW3 for this assignment, so you shouldn’t need to create or join groups again. Just make sure that both of you still accept the assignment.

Accept Assignment

Problem: Hertzsprung-Russell Patterns

Part A - Looking at Everything

The gaia.zip file in the data folder contains a CSV file with magnitude and parallax (in milli-arcseconds) information for stars within 200 pc of Earth, as measured by the Gaia mission. The Gaia mission uses 3 main filters for taking images: a G band, a B band, and an R band. The official image below showcases the range of wavelengths each of those bands is sensitive to:

Spectral sensitivity of Gaia’s three main bands. Note that the G band covers the greatest range of wavelengths, largely accounting for both the R and B band regions.

The main thing to keep in mind is that the G band is the most representative, covering roughly the combined range of the B and R bands. Color information, though, comes from the difference between the B and R magnitudes (in essence, asking the question: “Is this star more blue or more red?”).

Your task here is to make an HR diagram using the approximately 1.3 million stars in this dataset. Some things to keep in mind as you set about doing so:

The observer’s model of an HR diagram will be more straightforward here, since everything is reported in magnitude values. So you’ll want the difference between the B and R magnitudes on the horizontal scale, and the absolute magnitude in the G band on the vertical scale.
Computing absolute magnitude requires a distance, which you can compute from the provided parallax angle (remembering that the provided values of parallax are in units of milli-arcseconds).
HR diagrams always have brighter stars near the top, which means smaller absolute magnitudes. Thus, you will want to invert the y-axis (you can leave the x-axis unchanged if you did B - R for your difference: bluer stars will appear to the left, which is also where higher temperatures would be, as we’d expect from an HR diagram).
Making a scatter plot of 1.3 million data points is going to leave a messy blob in areas of concentrated stars. To visualize these dense regions better, explore other options such as 2D histograms, KDE plots, or similar. And you can always use both: layering, for instance, a 2D histogram on top of the scatter plot, so that you get the benefits of both.
You are plotting a lot of points here. Be patient with your computer as it works through things. It will likely take longer than normal. I suggest testing different plotting options or styles with a reduced sample of the dataset to speed up testing, and then running it with the full dataset once you are happy with everything.
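
As a rough illustration of the points above, here is a minimal Python sketch, not a required approach. It assumes NumPy and Matplotlib are available. The function names absolute_magnitude and plot_hr_diagram, and the default bin count, are my own placeholders. The distance comes from d = 1000 / parallax (parallax in milli-arcseconds, d in parsecs), and the absolute magnitude from the standard distance modulus, M = m - 5 log10(d / 10 pc).

```python
import numpy as np


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in milli-arcseconds."""
    apparent_mag = np.asarray(apparent_mag, dtype=float)
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    # Distance in parsecs: d = 1 / (parallax in arcsec) = 1000 / (parallax in mas)
    distance_pc = 1000.0 / parallax_mas
    # Distance modulus: M = m - 5 * log10(d / 10 pc)
    return apparent_mag - 5.0 * np.log10(distance_pc / 10.0)


def plot_hr_diagram(color, abs_mag, bins=300):
    """Draw a 2D-histogram HR diagram and return the Matplotlib Axes.

    `bins` is an illustrative default, not a tuned value.
    """
    # Imported here so the numeric helper above works without Matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.colors import LogNorm

    fig, ax = plt.subplots(figsize=(6, 8))
    # A log color scale keeps sparse regions visible next to the dense main sequence;
    # cmin=1 leaves empty bins blank.
    ax.hist2d(color, abs_mag, bins=bins, norm=LogNorm(), cmin=1)
    ax.invert_yaxis()  # brighter stars (smaller magnitudes) at the top
    ax.set_xlabel("B - R color (mag)")
    ax.set_ylabel("Absolute G magnitude")
    return ax
```

To use it, you would pass the G-band magnitude and parallax columns from the CSV to absolute_magnitude, and the difference of the B and R magnitude columns as color. Check the file header for the actual column names, since none are assumed here. While experimenting, slicing or randomly sampling a few tens of thousands of rows first keeps each redraw fast.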

Looking at your final diagram, are all the parts of an HR diagram that you would normally expect visible? Or, by only looking at stars within 200 pc, have we lost parts of the diagram?

Part B - Looking at Clusters

Also in the data folder you will come across two files: star_cluster1.csv and star_cluster2.csv, each of which contains stars located within a certain angular distance of the center of a star cluster. Both are again taken from the Gaia mission, and report the same 4 columns as in Part A above. Your task in this problem is relatively straightforward to describe: determine which of the two clusters is older.

Something to be aware of when looking at the datasets is that they represent all the stars within a small angular distance of the center of the cluster in the night sky. As such, they also contain stars that might be in front of the star cluster in question, or behind it and shining through. A vital step is therefore going to be filtering down the collection of stars to contain only those actually within the star cluster in question. How will you know what to filter? Both angular distances used to gather the stars were fairly small, so stars that are members of the cluster should represent the bulk of stars in the collection. And stars that are part of a cluster will not only be nearby in the night sky, but will also be similar distances away from us here on Earth. So you can look at the distribution of stellar distances in each dataset to determine the dominant distance, and then filter to only keep those stars.
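
One possible way to carry out that filtering is sketched below, as an illustration rather than the expected solution. It assumes NumPy. The function name dominant_distance_mask, the 10 pc bin width, and the 25 pc half-window are placeholder choices that you would want to adjust after looking at the actual distance histogram for each cluster.

```python
import numpy as np


def dominant_distance_mask(parallax_mas, bin_width_pc=10.0, half_window_pc=25.0):
    """Return (mask, peak_center): a boolean mask of stars near the most common
    distance, and that distance in parsecs.

    bin_width_pc and half_window_pc are illustrative values, not tuned ones.
    """
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    # Zero or negative parallaxes give no meaningful distance, so skip them.
    valid = parallax_mas > 0
    d = 1000.0 / parallax_mas[valid]  # distance in parsecs

    # Histogram the distances and locate the most populated bin.
    edges = np.arange(d.min(), d.max() + 2 * bin_width_pc, bin_width_pc)
    counts, edges = np.histogram(d, bins=edges)
    peak = np.argmax(counts)
    peak_center = 0.5 * (edges[peak] + edges[peak + 1])

    # Keep only stars within the chosen window around the peak distance.
    mask = np.zeros(parallax_mas.shape, dtype=bool)
    mask[valid] = np.abs(d - peak_center) <= half_window_pc
    return mask, peak_center
```

The returned mask can be used to index the rows of the cluster table before building its HR diagram. Plotting the distance histogram itself is a good check that the peak is real and that the window is neither too wide nor too narrow.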

Ensure that your computational essay provides evidence for how you came up with which cluster is older, and includes visuals as appropriate or useful.