SAS Problem 8

Please copy and paste all the SAS code, log, and output, and fill out the table.

About the project: the objective is to determine whether there is any difference between students who took this course online this semester and those who took it in class.

  • The Excel data sets EPI626 Online and EPI626 InClass contain selected data collected from students enrolled in this course. Please use these data to fill out the table below.
  • Table to be filled out*:
  • General Grading of Final Assignment (required steps).

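| Student Characteristics              | All    | Online | In Class | p-value |
|--------------------------------------|--------|--------|----------|---------|
|                                      | N = () | N = () | N = ()   |         |
| Age                                  |        |        |          |         |
| Gender                               |        |        |          |         |
|   Female                             |        |        |          |         |
|   Male                               |        |        |          |         |
| Degree program                       |        |        |          |         |
|   MPH                                |        |        |          |         |
|   MSPH                               |        |        |          |         |
|   Other                              |        |        |          |         |
| Number of Languages                  |        |        |          |         |
|   1                                  |        |        |          |         |
|   2                                  |        |        |          |         |
|   3 or more                          |        |        |          |         |
| Hold Breath for 45 seconds or longer |        |        |          |         |
|   Yes                                |        |        |          |         |
|   No                                 |        |        |          |         |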

*For continuous variables, report Mean (SD) or Median (IQR) as appropriate and specify which one you are reporting in the table. For categorical variables, report n and % of total N for each column (All, Online, In Class at top of table).
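
As an illustration of the two reporting styles in the footnote, the sketch below produces Mean (SD) and Median (Q1, Q3) for a continuous variable and n with column % for a categorical variable, overall and by course format. The data set work.demo and the variable names course_mode, age, and gender are invented so that the example runs on its own; they are not the EPI626 data and the real variable names may differ.

```sas
/* Toy data, invented purely so the sketch runs; not the EPI626 data. */
data work.demo;
    input course_mode $ age gender $;
    datalines;
Online 24 Female
Online 31 Male
Online 27 Female
InClass 23 Male
InClass 26 Female
InClass 35 Female
;
run;

/* Continuous variable, "All" column: mean, SD, median, and quartiles. */
proc means data=work.demo n mean std median q1 q3 maxdec=1;
    var age;
run;

/* Same statistics within each course format (Online, In Class columns). */
proc means data=work.demo n mean std median q1 q3 maxdec=1;
    class course_mode;
    var age;
run;

/* Look at the shape of the distribution before choosing between
   Mean (SD) and Median (IQR). */
proc univariate data=work.demo normal;
    var age;
    histogram age / normal;
run;

/* Categorical variable: n and % overall, then n and column %
   (percent of each course format's N). */
proc freq data=work.demo;
    tables gender;
    tables gender*course_mode / norow nopercent;
run;
```

With course_mode as the column variable, the column percentages from PROC FREQ are percentages of each group's N, which matches the "% of total N for each column" wording.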

  • Import data sets
    • If you use the Import Wizard, please be sure to request and save the relevant SAS program
    • Do NOT modify data in the Excel sheets by hand prior to import!
  • Combine data sets
    • Make sure that in the combined data set, there is a variable which allows you to classify a student as enrolled in the online vs in class course.
  • Preparing and cleaning combined data set
    • Please check variable names (and types!)
    • Please use Dec 31 of this year as the reference date for age calculations for each respective group.
  • Saving the cleaned data set as a permanent data set
    • Please exclude anyone from the final cleaned data set if the person has a missing value for one or more of the final variables.
  • Producing descriptive statistics (cover all characteristics)
  • Producing p-values (cover all characteristics)
  • Final report. The final report should address the following:
    • Describe the process you went through to clean/recode the data. Essentially, your report should allow someone else to replicate your work, based on information provided in the report, and come to the same findings/conclusions.
      • For each variable:
        • How many observations have missing values?
        • How many observations have implausible values (e.g., age of 101)?
        • How many observations record the same value inconsistently (e.g., f and F for female)?
        • What did you do to handle the above situations and why?
        • What is the impact of the approaches you used to clean the data (e.g., observation excluded from analysis)?
      • What are the attributes of the cleaned data set that you saved permanently?
        • How many total included observations?
        • Number of observations having missing values for one or more variables
        • Distribution of each variable
      • What procedures did you use to obtain the descriptive statistics and the p-values?
        • Why?
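
The required steps above (import, combine, clean, save, test) could be sketched in SAS roughly as follows. This is only an outline under stated assumptions, not a model answer. The folder paths, the library name final, the .xlsx file extensions, and every variable name (dob, gender, degree, n_lang, hold45, course_mode, lang_cat) are hypothetical and must be replaced after inspecting the actual files. It also assumes that the XLSX import engine is available, that dob is read as a SAS date, that gender is character, and that the program is run during the course year.

```sas
/* Sketch only: paths, library name, and all variable names are hypothetical. */
libname final "C:\EPI626\final";    /* assumed existing folder */

/* 1. Import each workbook as is, without editing it by hand. */
proc import datafile="C:\EPI626\EPI626 Online.xlsx"
        out=work.online dbms=xlsx replace;
    getnames=yes;
run;

proc import datafile="C:\EPI626\EPI626 InClass.xlsx"
        out=work.inclass dbms=xlsx replace;
    getnames=yes;
run;

/* Compare variable names and types before stacking: SET stops with an
   error if a variable is character in one data set and numeric in the other. */
proc contents data=work.online;  run;
proc contents data=work.inclass; run;

/* 2. Combine, flagging which course format each row came from. */
data work.combined;
    length course_mode $8;
    set work.online (in=from_online) work.inclass;
    if from_online then course_mode = "Online";
    else course_mode = "InClass";
run;

/* 3. Clean: harmonize codings, compute age, group languages. */
data work.cleaned;
    set work.combined;
    length lang_cat $9;
    gender = upcase(strip(gender));           /* f and F both become F */
    ref_date = mdy(12, 31, year(today()));    /* Dec 31 of the current year */
    age = floor(yrdif(dob, ref_date, 'AGE')); /* age in completed years */
    if n_lang >= 3 then lang_cat = "3 or more";
    else if n_lang in (1, 2) then lang_cat = put(n_lang, 1.);
    /* Any other n_lang value leaves lang_cat blank (treated as missing). */
    format dob ref_date date9.;
run;

/* 4. Keep only complete cases and save as a permanent data set. */
data final.epi626_clean;
    set work.cleaned;
    if cmiss(age, gender, degree, lang_cat, hold45) = 0;
run;

/* 5. p-values: two-sample t-test or Wilcoxon rank-sum for age,
      chi-square (or Fisher's exact when cells are sparse) for categories. */
proc ttest data=final.epi626_clean;
    class course_mode;
    var age;
run;

proc npar1way data=final.epi626_clean wilcoxon;
    class course_mode;
    var age;
run;

proc freq data=final.epi626_clean;
    tables (gender degree lang_cat hold45)*course_mode / chisq fisher;
run;
```

Counting the rows in work.combined, work.cleaned, and final.epi626_clean (for example, from the log notes) gives the numbers of observations excluded at each step, which the final report asks for. Which test to report for age depends on the distribution checked during cleaning.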