Identify when chi-square test techniques should be implemented.
Accurately apply the formulas for calculating the chi-square statistic.
Identify when validity conditions are satisfied for a theory-based chi-square test.
Find theory-based p-values for a chi-square test.
Draw appropriate conclusions from a chi-square test.
As usual, we start by loading our two packages: mosaic and ggformula. To load a package, you use the library() function. I’ve put the code to load one package into the chunk below. Add the other package you need.
library(ggformula)
# put in the other package that you need here
Recall the study investigating whether there is a relationship between the use of night lights in a child’s room before age 2 and the child’s eyesight a few years later. In Chapter 4, we presented a two-way table of counts from the study, which examined whether a child’s eyesight was associated with the light level in the room while sleeping. We will recreate that two-way table of counts.
We’ll load the example data, LightSightData1.csv, from this URL: https://raw.githubusercontent.com/IJohnson-math/Math138/main/LightSightData1.csv
Since this is a CSV file, we will use the read.csv() function.
Inspect the data using the glimpse command and by looking in the data file after it is loaded.
#load the data here
#NightlightData <-
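As a rough illustration of how read.csv() behaves, the sketch below parses a tiny inline CSV rather than the real file. The rows and the column names Light and Sight are hypothetical stand-ins; check the actual names in LightSightData1.csv after loading it. Base R's str() is used here as an approximate counterpart to glimpse.

```r
# Hypothetical miniature CSV standing in for LightSightData1.csv.
# The column names Light and Sight are assumptions, not taken from the real file.
csv_text <- "Light,Sight
Dark,NotNearsighted
Dark,Nearsighted
NightLight,NotNearsighted
NightLight,Nearsighted
RoomLight,Nearsighted
RoomLight,NotNearsighted"

# read.csv() can parse a character string through its text argument
toy_data <- read.csv(text = csv_text)

# Show the variables and their types
str(toy_data)

# For the real data, pass the URL string instead of text =, for example:
# NightlightData <- read.csv("https://raw.githubusercontent.com/IJohnson-math/Math138/main/LightSightData1.csv")
```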
The explanatory variable and variable type: light level while sleeping (dark/night light/room light), a categorical variable
The response variable and variable type: sight (nearsighted/not nearsighted), a binary categorical variable
\(\pi_{D}\)
\(\pi_{NL}\)
\(\pi_{RL}\)
\[H_0:\textrm{ The proportion of nearsightedness is the same in each of the three groups } \]
\[H_a:\textrm{ At least one of the population proportions is different }\]
Equivalently, our hypotheses in notation are
\[ H_0: \pi_{D} = \pi_{NL} = \pi_{RL}\] \[ H_a: \textrm{ not all of } \pi_{D}, \pi_{NL}, \pi_{RL} \textrm{ are equal}\]
Create a two-way table of counts and a two-way table of proportions for the data. Remember to use the correct order for the explanatory and response variables in your code.
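One possible shape for such tables, sketched in base R with made-up counts (these are NOT the study's numbers). The explanatory variable goes in the columns and the response in the rows, so the proportions are computed within each light group.

```r
# Hypothetical counts for illustration only, not the night-light study data.
# matrix() fills column by column: each pair is one light group.
counts <- matrix(
  c(20, 80,   # Dark
    30, 70,   # NightLight
    25, 25),  # RoomLight
  nrow = 2,
  dimnames = list(
    Sight = c("Nearsighted", "NotNearsighted"),
    Light = c("Dark", "NightLight", "RoomLight")
  )
)

# Two-way table of counts
counts

# Two-way table of conditional proportions: margin = 2 divides each
# column (explanatory group) by its own total
col_props <- prop.table(counts, margin = 2)
col_props
```

Dividing by column totals is what makes the groups comparable even when their sample sizes differ.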
Are the validity conditions for a chi-square test met? Explain why or why not, and what you are checking.
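A minimal sketch of such a check, assuming the condition in use is "at least 10 observations in each cell of the two-way table" (confirm the exact wording against your course text). The counts are hypothetical.

```r
# Hypothetical counts for illustration only, not the night-light study data
counts <- matrix(
  c(20, 80, 30, 70, 25, 25),
  nrow = 2,
  dimnames = list(
    Sight = c("Nearsighted", "NotNearsighted"),
    Light = c("Dark", "NightLight", "RoomLight")
  )
)

# Assumed validity condition: every cell count is at least 10
min_count <- min(counts)
conditions_met <- all(counts >= 10)

min_count
conditions_met
```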
Create a segmented bar graph for the data. Use contrasting colors in your graph that will display nicely even if the document is printed in black and white. Give your graph a title and label your axes.
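For orientation, here is a segmented bar graph drawn with base R's barplot() and hypothetical counts. This is a stand-in technique, not the ggformula approach the lab uses; the point is the ingredients: proportions within each group, a dark and a light shade that survive grayscale printing, a title, and axis labels.

```r
# Hypothetical counts for illustration only, not the night-light study data
counts <- matrix(
  c(20, 80, 30, 70, 25, 25),
  nrow = 2,
  dimnames = list(
    Sight = c("Nearsighted", "NotNearsighted"),
    Light = c("Dark", "NightLight", "RoomLight")
  )
)

# Proportions within each light group, so every bar has height 1
col_props <- prop.table(counts, margin = 2)

# Stacked (segmented) bars; grey20 and grey85 stay distinct in black and white
bar_midpoints <- barplot(
  col_props,
  col = c("grey20", "grey85"),
  legend.text = TRUE,
  main = "Eyesight by light level while sleeping (hypothetical counts)",
  xlab = "Light level while sleeping",
  ylab = "Proportion of children"
)
```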
Recall that one way to calculate the chi-square statistic, \(\chi^2\), is to sum the squared standardized group proportions:
\[\displaystyle{\chi^2 = \sum_{\textrm{groups } i} \left( \frac{\hat{p}_i - \hat{p}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_i}}}\right)^2 }\]
where \(n_i\) is the sample size of group \(i\), with sample proportion of successes \(\hat{p}_i\). The pooled proportion of successes is denoted by \(\hat{p}\).
#total sample size for each group
# the pooled proportion of nearsighted children
# proportions of nearsighted children
# standardize proportion darkness
# standardize proportion night light
# standardize proportion room light
#chi-square statistic
#chiSq <-
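The steps outlined above can be sketched as follows with hypothetical counts (not the study's numbers). The variable names are illustrative; the arithmetic follows the standardized-proportion formula term by term.

```r
# Hypothetical counts for illustration only, not the night-light study data
nearsighted <- c(Dark = 20, NightLight = 30, RoomLight = 25)

# Total sample size for each group
n <- c(Dark = 100, NightLight = 100, RoomLight = 50)

# Pooled proportion of nearsighted children across all groups
p_hat_pooled <- sum(nearsighted) / sum(n)

# Proportion of nearsighted children within each group
p_hat <- nearsighted / n

# Standardize each group proportion against the pooled proportion
z <- (p_hat - p_hat_pooled) / sqrt(p_hat_pooled * (1 - p_hat_pooled) / n)

# Chi-square statistic: the sum of the squared standardized proportions
chi_sq <- sum(z^2)
chi_sq
```

Because R arithmetic is vectorized, the three standardizations happen in one line; writing them out group by group gives the same result.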
simulation-based p-value:
Use the chisq.test function to calculate a theory-based p-value. Write your executable code in the code chunk below. Record your theory-based p-value below. Does your value of the chi-squared statistic match the values calculated in the applet and with the chisq.test function?
#theory-based p-value from a chi-square test
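A small sketch of calling chisq.test() on a two-way table of hypothetical counts (not the study's data). The returned object stores the statistic, the degrees of freedom, and the theory-based p-value.

```r
# Hypothetical counts for illustration only, not the night-light study data
counts <- matrix(
  c(20, 80, 30, 70, 25, 25),
  nrow = 2,
  dimnames = list(
    Sight = c("Nearsighted", "NotNearsighted"),
    Light = c("Dark", "NightLight", "RoomLight")
  )
)

# Theory-based chi-square test on the table of counts
result <- chisq.test(counts)

# Pull out the pieces of the result
chi_sq <- unname(result$statistic)  # chi-square statistic
df <- unname(result$parameter)      # (rows - 1) * (columns - 1)
p_value <- result$p.value           # theory-based p-value

chi_sq
df
p_value
```

With real data, the table of counts built from the data frame can be passed to chisq.test() in the same way.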
theory-based p-value:
\(\chi^2 = \)
Significance with context:
Causation:
Recall, there is another more general formula for calculating the chi-square statistic.
\[ \chi^2 = \sum_{\textrm{all cells}} \frac{(\textrm{observed count } - \textrm{ expected count})^2}{\textrm{expected count}}\]
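The general formula can be sketched in a few lines, again with hypothetical counts (not the study's data). Each expected count is (row total × column total) / grand total, which is what the table would look like if the null hypothesis of equal proportions held exactly.

```r
# Hypothetical counts for illustration only, not the night-light study data
counts <- matrix(
  c(20, 80, 30, 70, 25, 25),
  nrow = 2,
  dimnames = list(
    Sight = c("Nearsighted", "NotNearsighted"),
    Light = c("Dark", "NightLight", "RoomLight")
  )
)

# Expected count in each cell: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Sum (observed - expected)^2 / expected over all cells
chi_sq <- sum((counts - expected)^2 / expected)

expected
chi_sq
```

For a table with a binary response, this value agrees with the sum of squared standardized proportions computed from the same counts.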
Calculating Chi-squared