---
title: "Exo-Techniques"
author: Jed Rembold
date: March 6, 2025
slideNumber: true
theme: tokyo-night-light
highlightjs-theme: tokyo-night-light
width: 1920
height: 1080
transition: slide
---


## Announcements
- Reminder: Homework 3 deadline is next Tuesday at midnight
  - Debriefing poll will **not be available this weekend**; instead it will be open from Tuesday night through Thursday night
- Homework 4 and new partners will be announced Tuesday
- Also on Tuesday we'll start into our new unit on galaxies

## Recap
  - By utilizing Kepler's 3rd law and the definition of center of mass, we can extract the mass of the exoplanet (assuming the star's mass is much larger)
  - If a planet orbits such that it happens to pass between its host star and Earth, we will detect a dip in the brightness observed. This is called a _transit_.
  - The depth of the transit dip equals the square of the ratio of the planet's radius to the star's radius (the ratio of their disk areas):
    $$\text{fraction of light lost} = \frac{R_{planet}^2}{R_{star}^2}$$
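
Inverting the transit-depth relation gives the radius ratio directly. Here is a minimal Python sketch. It assumes the depth is measured as a fraction of the out-of-transit brightness (not a percent), that the planet's whole disk sits in front of the star, and that limb darkening can be ignored. The function name `radius_ratio` is just an illustrative choice.

```python
import math

def radius_ratio(depth):
    """Return R_planet / R_star from a fractional transit depth.

    Assumes depth = (R_planet / R_star)**2: full transit, no limb darkening.
    """
    if not 0 <= depth <= 1:
        raise ValueError("depth must be a fraction between 0 and 1")
    return math.sqrt(depth)

# A 1% dip implies a planet roughly one tenth the size of its star
print(radius_ratio(0.01))
```

For scale, Jupiter's radius is about a tenth of the Sun's, so a Jupiter-sized planet crossing a Sun-like star dims it by roughly 1%.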


## Discussing Today
- Quiz Discussion
- Rolling Averages
- Rebinning
- HW3 work time

<!--
## A Transit Example
- The data [here](../demos/transit_averaging.csv) contains some transit information on a simulated exoplanet. How does the exoplanet's radius compare to its parent star's?
-->

# Quiz Discussion
## Aggregate Results
::::::cols
::::col
\begin{tikzpicture}%%width=100%
  \begin{axis}[
      width=8cm,
      xlabel= Percent,
      ylabel= Number of Students,
      yticklabels={,,},
      ymajorticks=false,
      ylabel near ticks,
      ]
      \addplot [hist={bins=7, data min=40, data max=100}, fill=violet!70!blue!80] table[y=perc, col sep=comma] {../../images/quiz_data/quiz1_results.csv};
  \end{axis}
\end{tikzpicture}
::::

::::col
- Pre-curve:
  - Max: 95%
  - Mean: 76.5%
  - Median: 78.75%
  - St Dev: 16.2%
::::
::::::

## Quiz Talk
- Be aware that even if you split HW problems across partners, you are still responsible for knowing and understanding the content from all questions
- I think many could have benefitted from a bit more studying, or perhaps just looking at the study materials a little more
- Quizzes can be a bit more punishing owing to limited points, plus this was the first one in this class, so perhaps you didn't know what to expect
  - As such, I've added 1 point (5%) to your total score in the gradebook
- Grade reports will be coming out as soon as I finish scoring HW2


# Rolling Away
## Dealing with Noise
- Noise will often detract from or obscure a signal
- In astronomy, this commonly comes from thermal sensor noise or atmospheric effects
  - It can show up in any signal, though, wherever measurement-to-measurement variations obscure a longer-term pattern
- One method to reduce this noise is with Fourier transforms
  - Set every frequency component to 0 except those of your main signal, then inverse transform back to the time domain
  - We can't do that (easily) for non-uniform measurements though
- What other options might we have?
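
For uniformly sampled data, the Fourier approach above can be sketched with NumPy's real FFT. This is a minimal illustration, assuming evenly spaced samples and a signal dominated by a few strong frequencies. `fourier_filter` and `n_keep` are hypothetical names. Keeping the largest-amplitude components is just one simple way to decide what counts as "your main signal."

```python
import numpy as np

def fourier_filter(signal, n_keep=1):
    """Keep only the n_keep largest-amplitude frequency components.

    Assumes `signal` is evenly sampled. All other components are zeroed
    before transforming back to the time domain.
    """
    coeffs = np.fft.rfft(signal)
    # Indices of the n_keep strongest components, largest first
    strongest = np.argsort(-np.abs(coeffs))[:n_keep]
    filtered = np.zeros_like(coeffs)
    filtered[strongest] = coeffs[strongest]
    # n=len(signal) makes odd-length inputs round-trip correctly
    return np.fft.irfft(filtered, n=len(signal))

# Made-up demo: a 0.5 Hz sine wave buried in Gaussian noise
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 0.5 * t)
noisy = clean + rng.normal(0, 0.5, t.size)
smoothed = fourier_filter(noisy, n_keep=1)
```

If the data has a large constant offset, the zero-frequency term will likely be the strongest component, so `n_keep` would need to grow to keep the oscillation as well.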

## Rolling Averages
- A _rolling average_ computes an average for **each** point in a data series, usually taking into account a certain number of observations on either side of the point in question
- This is equivalent to convolving the noisy signal with a tophat (boxcar) function of a given width, normalized so that it sums to 1
- The width of the tophat or "window" directly affects the amount of smoothing that will be seen: bigger windows smooth things more
- Algorithms vary, but the basic version only looks at each point's position within the series, so it does **not** account for the spacing between data points
  - Data ordering matters!
  - This may still be fine for some non-uniformly sampled data, provided the window is only marginally larger than the average spacing
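
The tophat equivalence can be sketched directly in NumPy. This is an illustration, not a replacement for library routines. It assumes an odd window size so each average is centered on a single point, and `rolling_mean` is a hypothetical helper name. The data is sorted by time first because the window only counts positions, not spacing.

```python
import numpy as np

def rolling_mean(ts, signal, size):
    """Centered rolling average via convolution with a normalized tophat.

    Assumes an odd window `size` no longer than the data. Returns the
    time at the center of each full window and the smoothed values;
    points near the edges without a full window are dropped.
    """
    if size % 2 == 0 or size > len(signal):
        raise ValueError("size must be odd and no longer than the data")
    ts = np.asarray(ts, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # Order by time first: the window only counts array positions
    order = np.argsort(ts)
    ts_sorted, sig_sorted = ts[order], signal[order]
    # Tophat of width `size`, scaled to sum to 1 so it averages
    window = np.ones(size) / size
    smoothed = np.convolve(sig_sorted, window, mode="valid")
    half = size // 2
    centers = ts_sorted[half:half + smoothed.size]
    return centers, smoothed

# Made-up, unordered example points
centers, smoothed = rolling_mean([3, 1, 2, 5, 4], [6, 2, 4, 10, 8], size=3)
```

`mode="valid"` keeps only windows that fully overlap the data, which is why the output is `size - 1` points shorter than the input.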

<!--
## In Python: Direct Convolution
- Doing the convolution directly means constructing the square wave in Numpy and then using Numpy's convolution function
  ```python
  window = np.ones(size) / size # necessary to scale!
  out = np.convolve(signal, window)
  ```
  - The output by default has length $L_{signal} + L_{window} -1$, so if you want to plot it against your original times, you need to mask off the last parts
- You **really** need to ensure your signal points are ordered by your x-axis for this to work! The easiest way to achieve this is with `np.argsort`:
  ```python
  sorted_idxs = np.argsort(ts)
  sorted_times = ts[sorted_idxs]
  sorted_signal = signal[sorted_idxs]
  ```
-->

## In Python: With Pandas
- Assuming you already have your data in a dataframe, you **still need to ensure it is ordered!**
  ```python
  df = df.sort_values('ts')
  ```
  where `ts` is whatever column you want to sort by
  - This reorders the rows but keeps their original index labels, so be careful if you pull out a single column and rely on its index (passing `ignore_index=True` renumbers the rows instead)
- Computing the rolling average is then straightforward:
  ```python
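  df['rolling'] = df.sig.rolling(wsize, center=True).mean()
  ```
  where `sig` is whatever column you are computing the average over, and `wsize` is the size of the window you want. `center=True` centers each window on its own point; without it, Pandas uses a trailing window that ends at each point.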
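
Putting the sorting and averaging steps together on a small, made-up, unordered dataset gives a runnable sketch. The column names `ts` and `sig` match the placeholders above. Windows are centered, and the first and last `wsize // 2` rows come out as `NaN` because they lack a full window.

```python
import numpy as np
import pandas as pd

# Made-up, deliberately unordered example data
df = pd.DataFrame({
    "ts": [0.3, 0.1, 0.5, 0.2, 0.4],
    "sig": [3.0, 1.0, 5.0, 2.0, 4.0],
})

wsize = 3
# Sort by time; ignore_index=True renumbers the rows 0..n-1
df = df.sort_values("ts", ignore_index=True)
# Centered rolling mean over wsize consecutive rows
df["rolling"] = df.sig.rolling(wsize, center=True).mean()
print(df)
```

Read the new column back with `df["rolling"]`, not `df.rolling`. The attribute form refers to the DataFrame's `rolling` method.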


## In R: With Zoo
- You still want to ensure your data is ordered! Can use `arrange` if using Tidyverse
  ```R
  df <- arrange(df, colname)
  ```
- Easiest to use `rollmean` from the `zoo` library for the rolling averages
  ```R
  library(zoo)
  df <- df %>% 
    mutate(
      rolling = rollmean(colname, k=wsize, fill=NA)
      )
  ```

# The Bin Man
## Rebinning
- Sometimes you need to take non-uniformly sampled data and resample it onto an evenly spaced grid
- If the data is noisy, this can also help cut down on the noise, since each bin averages several measurements together
- The idea is to define new, evenly spaced bins and then compute the mean (or median) of all the points that fall within each bin
  - Some bins might not contain any points, and that is ok! They can just get assigned null values.
- You can then proceed to do anything you would do with uniformly sampled data: Fourier transforms, further rolling averages, etc.
  - Do be careful about null values with Fourier transforms, though: a single `NaN` input turns every FFT output coefficient into `NaN`, so you will probably need to interpolate over empty bins first.
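
The rebinning idea can be sketched with NumPy alone. This minimal version makes a few assumptions. Bins are half-open `[left, right)` intervals. Empty bins get `NaN`. Linear interpolation (`np.interp`) fills those gaps before any Fourier analysis. `rebin` and `fill_empty` are hypothetical helper names, and the data is made up.

```python
import numpy as np

def rebin(ts, signal, edges):
    """Average `signal` within the bins defined by `edges`.

    Returns (bin_starts, means); bins with no points get NaN.
    Points outside [edges[0], edges[-1]) are ignored.
    """
    ts = np.asarray(ts, dtype=float)
    signal = np.asarray(signal, dtype=float)
    edges = np.asarray(edges, dtype=float)
    nbins = len(edges) - 1
    # Bin index for each point: 0 for the first bin, nbins - 1 for the last
    idx = np.digitize(ts, edges) - 1
    inside = (idx >= 0) & (idx < nbins)
    sums = np.bincount(idx[inside], weights=signal[inside], minlength=nbins)
    counts = np.bincount(idx[inside], minlength=nbins)
    means = np.full(nbins, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return edges[:-1], means

def fill_empty(x, y):
    """Linearly interpolate over NaN entries (e.g. before an FFT)."""
    good = ~np.isnan(y)
    return np.interp(x, x[good], y[good])

# Made-up, unevenly sampled data
ts = [0.05, 0.12, 0.18, 0.41, 0.44]
sig = [1.0, 2.0, 4.0, 5.0, 7.0]
starts, means = rebin(ts, sig, edges=np.linspace(0, 0.5, 6))
gap_filled = fill_empty(starts, means)
```

Swapping the mean for a median would need a per-bin loop or a grouping tool, since `np.bincount` only accumulates sums.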


## Rebinning in Pandas
- You can either specify the bins with a list, or have Pandas compute the bins for you given a desired number of bins
  - I find the former better when I have specific data that I'm trying to rebin
- Key function in Pandas is `pd.cut`, which takes several arguments:
  - The array that you are trying to bin by
  - The bins or number of bins that you desire
  - How you want to handle the bin labels
  ```python
  # 101 edges give 100 bins of width 0.1 covering 0 to 10
  bin_labels = pd.cut(df.ts, bins=np.linspace(0, 10, 101),
                      labels=False)
  ```
    - This gives labels as **indexes**. I prefer to know the start of the bin, so I multiply by the bin step size
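
One way to finish the rebinning from these labels is to group by the bin index and average. This sketch uses made-up data and evenly spaced bins starting at 0 with a step of 0.1. `reindex` makes empty bins still appear (as `NaN`), and multiplying the index by the step recovers each bin's starting edge.

```python
import numpy as np
import pandas as pd

# Made-up, unevenly sampled data
df = pd.DataFrame({
    "ts": [0.05, 0.12, 0.18, 0.41, 0.44],
    "sig": [1.0, 2.0, 4.0, 5.0, 7.0],
})

step = 0.1
edges = np.linspace(0, 0.5, 6)  # 5 bins: (0, 0.1], (0.1, 0.2], ...
nbins = len(edges) - 1

# Integer bin index per row (pd.cut bins are right-closed by default)
df["bin"] = pd.cut(df.ts, bins=edges, labels=False)

# Mean signal per bin; reindex so empty bins show up as NaN
rebinned = df.groupby("bin")["sig"].mean().reindex(range(nbins))
# Convert each bin index to that bin's starting edge
rebinned.index = edges[0] + rebinned.index * step
```

Because the bins are right-closed, a point sitting exactly on the first edge would be dropped; `pd.cut` accepts `include_lowest=True` for that case.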


## Rebinning in R
- Similar to Pandas, you can use the `cut` or `ntile` functions to compute a binning:
  ```R
  df %>% mutate(
    bins = cut(colname, breaks=seq(0,10,0.1))
    )
  ```
  gives your bins as intervals, whereas
  ```R
  df %>% mutate(bins = ntile(colname, n=100))
  ```
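  gives the bins as integers. Be aware that `ntile` splits the data into groups containing equal numbers of points, not bins of equal width; for integer labels on evenly spaced bins, use `cut(colname, breaks=..., labels=FALSE)`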

# Work Time!
## HW Work Time
- The rest of the class is yours to work with your partner on HW3