Lab 6A: Chi-squared test

Goals for this lab.

Identify when chi-square test techniques should be implemented.
Accurately apply the formulas for calculating the chi-square statistic.
Identify when valid conditions are satisfied for a theory-based chi-square test.
Find theory-based p-values for chi-square test.
Draw appropriate conclusions from chi-square test.

Setup and packages

As usual, we start by loading our two packages: mosaic and ggformula. To load a package, you use the library() function. I’ve put the code to load one package into the chunk below. Add the other package you need.

library(ggformula)


# put in the other package that you need here

Nearsightedness and Nightlights Revisited

Recall the study investigating whether there is a relationship between the use of night lights in a child’s room before age 2 and the child’s eyesight condition a few years later. In Chapter 4, we presented a two-way table of counts from the study examining whether the child’s eyesight was associated with the light in the room while sleeping. We will recreate that two-way table of counts.

Load and explore the data

We’ll load the example data, LightSightData1.csv, from this URL: https://raw.githubusercontent.com/IJohnson-math/Math138/main/LightSightData1.csv. Since this is a CSV file, we will use the read.csv() function.

Load the data, then look at it using the glimpse command and by opening the data file after it is loaded.

#load the data here
#NightlightData <-
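As a rough sketch of the loading pattern (not the lab's answer), the block below reads a tiny made-up CSV that is typed inline so it runs without a network connection. The object names `csv_text` and `ToyData` and the column names `Light` and `Sight` are assumptions for illustration; the real file may use different names.

```r
# Hypothetical miniature CSV typed inline; the column names are assumptions
# and the rows are made up, NOT taken from LightSightData1.csv.
csv_text <- "Light,Sight
Dark,NotNearsighted
NightLight,Nearsighted
RoomLight,Nearsighted
Dark,NotNearsighted"

# read.csv() normally takes a file path or URL as its first argument;
# text = passes the CSV contents directly instead.
ToyData <- read.csv(text = csv_text)

# str() is a base R way to get a compact overview, one line per variable.
str(ToyData)
head(ToyData)
```

For the lab itself, the same call would take the URL in quotes as its first argument instead of `text =`.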

The explanatory variable and variable type: light level while sleeping (dark / night light / room light), a categorical variable.

The response variable and variable type: sight (nearsighted / not nearsighted), a binary categorical variable.

To study a potential association between nearsightedness and bedroom light levels, we use the parameters $\pi_{D}$, $\pi_{NL}$, and $\pi_{RL}$. Explain the meaning of these parameters in the space provided below.

$\pi_{D}$

$\pi_{NL}$

$\pi_{RL}$

Our hypotheses in words are:

\[H_0:\textrm{ The proportion of nearsightedness is the same in each of the three groups } \] \[H_a:\textrm{ At least one of the population proportions is different }\]

Equivalently, our hypotheses in notation are

\[ H_0: \pi_{D} = \pi_{NL} = \pi_{RL}\] \[ H_a: \textrm{ not all of } \pi_{D}, \pi_{NL}, \pi_{RL} \textrm{ are equal}\]

Create a two-way table of counts and a two-way table of proportions for the data. Remember to use the correct order for the explanatory and response variables in your code.
Are the validity conditions for a chi-square test met? Explain why or why not, and what you are checking.
Create a segmented bar graph for the data. Use contrasting colors in your graph that will display nicely even if the document is printed in black and white. Give your graph a title and label your axes.
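The table and validity steps above might look something like the following sketch in base R. The counts are made up for illustration and are not the lab's data; the names `toy`, `counts`, and `props` are hypothetical. The check assumes the course's rule of thumb is at least 10 observations in every cell of the two-way table.

```r
# Made-up counts for illustration only; these are NOT the lab's data.
toy <- data.frame(
  Light = rep(c("Dark", "NightLight", "RoomLight"), times = c(100, 100, 50)),
  Sight = c(rep(c("Nearsighted", "NotNearsighted"), times = c(20, 80)),
            rep(c("Nearsighted", "NotNearsighted"), times = c(30, 70)),
            rep(c("Nearsighted", "NotNearsighted"), times = c(25, 25)))
)

# Response variable in the rows, explanatory variable in the columns.
counts <- table(toy$Sight, toy$Light)
counts

# Conditional proportions within each light level (each column sums to 1).
props <- prop.table(counts, margin = 2)
round(props, 3)

# Assumed validity rule of thumb: at least 10 observations in every cell.
all(counts >= 10)
```

Putting the explanatory variable in the columns means each column of `props` describes one light-level group, which is the comparison the hypotheses are about.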

Recall, one way to calculate the chi-square statistic, $\chi^2$, is to

standardize our sample proportions,
square these standardized values, and
add them up.

\[\displaystyle{\chi^2 = \stackrel{\Large \Sigma}{\small \textrm{i groups}} \left( \frac{\hat{p}_i - \hat{p}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_i}}}\right)^2 }\]

where $n_i$ is the sample size of group $i$ with sample proportion of success $\hat{p}_i$. The pooled proportion of success is denoted by $\hat{p}$.

Find: the overall pooled proportion of children who are near-sighted. Find, name, and enter into RStudio: the sample size of each group and the proportion of nearsighted children in each group. Calculate the standardized z-statistics and, from those values, the chi-square statistic. Make sure to write your code so that the value of $\chi^2$ is displayed.

#total sample size for each group



# the pooled proportion of nearsighted children



# proportions of nearsighted children



# standardize proportion darkness



# standardize proportion night light



# standardize proportion room light



#chi-square statistic
#chiSq <-
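To illustrate the standardize, square, and add recipe, here is a minimal sketch using made-up counts (not the lab's data). The names `n`, `near`, `p_hat_i`, `p_hat`, `z`, and `chiSq_toy` are hypothetical and chosen only for this example.

```r
# Made-up counts for illustration only; these are NOT the lab's data.
n    <- c(Dark = 100, NightLight = 100, RoomLight = 50)  # group sample sizes
near <- c(Dark = 20,  NightLight = 30,  RoomLight = 25)  # nearsighted per group

p_hat_i <- near / n            # sample proportion in each group
p_hat   <- sum(near) / sum(n)  # pooled proportion across all groups

# Standardize each group proportion against the pooled proportion.
z <- (p_hat_i - p_hat) / sqrt(p_hat * (1 - p_hat) / n)
z

# Square the standardized values and add them up.
chiSq_toy <- sum(z^2)
chiSq_toy
```

Because R works on whole vectors at once, the three standardized values come from a single line; computing them one group at a time, as the chunk above is laid out, gives the same result.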

Go to the ISI Applets and use the chi-square statistic to calculate a simulation-based p-value for the NightLight1 data. Note the data is pre-loaded in the applet. Report your p-value below.

simulation-based p-value:
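For intuition about what a simulation-based p-value measures, the sketch below shuffles the response labels many times and recomputes the chi-square statistic each time. It assumes the applet's simulation works by shuffling in a similar way, uses made-up data rather than the lab's data, and the helper `chisq_stat()` is a hypothetical name defined here.

```r
set.seed(138)  # arbitrary seed so the sketch is reproducible

# Made-up data for illustration only; these are NOT the lab's data.
light <- rep(c("Dark", "NightLight", "RoomLight"), times = c(100, 100, 50))
sight <- c(rep(c("Near", "NotNear"), times = c(20, 80)),
           rep(c("Near", "NotNear"), times = c(30, 70)),
           rep(c("Near", "NotNear"), times = c(25, 25)))

# Hypothetical helper: chi-square statistic for one pairing of the variables.
chisq_stat <- function(x, y) {
  unname(suppressWarnings(chisq.test(table(y, x), correct = FALSE)$statistic))
}

observed <- chisq_stat(light, sight)

# Shuffle the response labels to break any association, then recompute.
shuffled <- replicate(1000, chisq_stat(light, sample(sight)))

# Proportion of shuffles with a statistic at least as large as the observed one.
p_sim <- mean(shuffled >= observed)
p_sim
```

The p-value changes slightly from run to run unless the seed is fixed, which is also why the applet's answer will not match a theory-based p-value exactly.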

Use the chisq.test function to calculate a theory-based p-value. Write your executable code in the code chunk below. Record your theory-based p-value below. Does your value of the chi-squared statistic match the values calculated in the applet and with the chisq.test function?

#theory-based p-value from a chi-square test

theory-based p-value:

$\chi^2 = $
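As a hedged illustration of how `chisq.test()` reports its results, the sketch below runs it on a made-up 2 x 3 table of counts (not the lab's data). The names `toy_counts` and `result` are hypothetical; `correct = FALSE` is included for clarity, although the continuity correction only applies to 2 x 2 tables.

```r
# Made-up 2 x 3 table of counts for illustration only; NOT the lab's data.
# matrix() fills column by column, so each pair of numbers is one light level.
toy_counts <- matrix(
  c(20, 80, 30, 70, 25, 25), nrow = 2,
  dimnames = list(Sight = c("Nearsighted", "NotNearsighted"),
                  Light = c("Dark", "NightLight", "RoomLight"))
)

result <- chisq.test(toy_counts, correct = FALSE)

result$statistic  # chi-square statistic
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value    # theory-based p-value
result$expected   # expected counts under the null hypothesis
```

Printing `result` on its own shows the statistic, degrees of freedom, and p-value together; the `$` components are convenient when a single value needs to be recorded.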

Write your conclusions below. Does this study suggest that the use of night lights and room lights causes an increase in the chance that a child is nearsighted? Why or why not? Do you have any concerns regarding the conclusion of this study? Explain.

Significance with context:

Causation:

Recall, there is another more general formula for calculating the chi-square statistic.

\[ \chi^2 = \ \stackrel{ \Large \Sigma}{\small \textrm{all cells}} \frac{(\textrm{observed count } - \textrm{ expected count})^2}{\textrm{expected count}}\]
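A minimal sketch of the observed-versus-expected formula above, using a made-up table of counts (not the lab's data). The names `observed`, `expected`, `contributions`, and `chiSq_general` are hypothetical. Each expected count is computed as row total times column total divided by the grand total.

```r
# Made-up 2 x 3 table of counts for illustration only; NOT the lab's data.
observed <- matrix(
  c(20, 80, 30, 70, 25, 25), nrow = 2,
  dimnames = list(Sight = c("Nearsighted", "NotNearsighted"),
                  Light = c("Dark", "NightLight", "RoomLight"))
)

# Expected count in each cell: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Contribution of each cell: (observed - expected)^2 / expected.
contributions <- (observed - expected)^2 / expected
round(contributions, 3)

# The chi-square statistic is the sum over all cells.
chiSq_general <- sum(contributions)
chiSq_general
```

Looking at `contributions` cell by cell shows which groups drive the statistic, which is something the single summed value hides.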

Use the worksheet from class to calculate $\chi^2$ again using the more general formula. Include an image of your completed worksheet, as shown below.
