# R program that uses the framingham.xlsx data to do the following

1. Read the data into an R data frame.
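
A minimal sketch of this step, assuming the `readxl` package is installed and the workbook sits in the working directory. The helper name `read_framingham` and the choice of the first sheet are illustrative assumptions; the assignment does not say which sheet holds the data.

```r
# Hypothetical helper: reads one sheet of the workbook into a plain data frame.
# Assumptions: 'readxl' is installed; the data are on the first sheet.
read_framingham <- function(path = "framingham.xlsx", sheet = 1) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Package 'readxl' is required: install.packages('readxl')")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # read_excel() returns a tibble; convert it to a base data frame
  as.data.frame(readxl::read_excel(path, sheet = sheet))
}

# Usage (uncomment once the file is available):
# framingham <- read_framingham("framingham.xlsx")
# str(framingham)
```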

2. Create a pie chart that shows the frequency of the education level of the study participants. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense.
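
One way to approach this with base R graphics is sketched here. The `education` vector is simulated stand-in data, not the real column; the 1 to 4 coding and the labels follow the data description, and the real column name in the workbook may differ.

```r
# Simulated stand-in for the Education column (codes 1-4); replace with real data.
set.seed(1)
education <- sample(1:4, size = 200, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))

# Convert to a factor so the slices follow the natural order of education levels.
edu_labels <- c("High School or less", "Trade School",
                "Bachelor's Degree", "Graduate Degree")
education_f <- factor(education, levels = 1:4, labels = edu_labels)

# Frequencies and slice labels that include the counts.
edu_counts <- table(education_f)
slice_labels <- paste0(names(edu_counts), " (n = ", as.vector(edu_counts), ")")

# clockwise = TRUE keeps the slices in reading order; cex sets the label size.
pie(edu_counts, labels = slice_labels, clockwise = TRUE,
    main = "Education Level of Study Participants", cex = 0.9)
```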

3. Create a horizontal barchart that shows the frequency of the different weight statuses among the study participants. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense. Display the frequency of each category on top of the corresponding bar.
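
A possible base R sketch for a horizontal bar chart with counts at the end of each bar. The `weight_status` vector is simulated for illustration; only the four category names come from the data description.

```r
# Simulated stand-in for the WeightStatus column; replace with real data.
set.seed(2)
status_levels <- c("Underweight", "Normal", "Overweight", "Obese")
weight_status <- factor(
  sample(status_levels, 300, replace = TRUE, prob = c(0.05, 0.40, 0.40, 0.15)),
  levels = status_levels  # fixes the category order (first level is drawn at the bottom)
)
counts <- table(weight_status)

old_par <- par(mar = c(5, 8, 4, 2))  # wider left margin for the category names
mids <- barplot(counts, horiz = TRUE, las = 1,
                xlim = c(0, max(counts) * 1.15),  # leave room for the count labels
                main = "Weight Status of Study Participants",
                xlab = "Number of participants",
                cex.names = 1, cex.axis = 1, cex.lab = 1.2, cex.main = 1.4)
# barplot() returns the bar midpoints; use them to place each count beside its bar.
text(x = as.vector(counts), y = mids, labels = as.vector(counts), pos = 4)
par(old_par)
```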

4. Create a visual display that shows the frequency of the weight statuses within each smoking status. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense.
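
A clustered bar chart is one reasonable display for this. The sketch uses simulated data; the smoking labels "Non-smoker" and "Current smoker" are assumed names, since the description does not say how the Current Smoker variable is coded.

```r
# Simulated stand-in data; replace with the real columns.
set.seed(3)
n <- 300
status_levels <- c("Underweight", "Normal", "Overweight", "Obese")
weight_status <- factor(
  sample(status_levels, n, replace = TRUE, prob = c(0.05, 0.40, 0.40, 0.15)),
  levels = status_levels
)
smoke_levels <- c("Non-smoker", "Current smoker")  # assumed labels
smoker <- factor(sample(smoke_levels, n, replace = TRUE), levels = smoke_levels)

# Rows are weight statuses, columns are smoking statuses.
tab <- table(weight_status, smoker)

# beside = TRUE draws one cluster of bars per smoking status.
barplot(tab, beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topright", title = "Weight status", cex = 0.9),
        ylim = c(0, max(tab) * 1.3),  # headroom so the legend does not cover bars
        main = "Weight Status within Each Smoking Status",
        xlab = "Smoking status", ylab = "Number of participants",
        cex.names = 1.1, cex.lab = 1.2, cex.main = 1.4)
```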

5. Perform an appropriate statistical analysis to determine if the smoking status is related to the weight status of a person. What do you conclude based on your analysis?
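
Because both variables are categorical, a chi-square test of independence is a common choice here. The following is a sketch on simulated data, so its output says nothing about the real study; the smoking labels are assumed.

```r
# Simulated stand-in data; replace with the real columns.
set.seed(4)
n <- 400
status_levels <- c("Underweight", "Normal", "Overweight", "Obese")
weight_status <- factor(
  sample(status_levels, n, replace = TRUE, prob = c(0.10, 0.40, 0.35, 0.15)),
  levels = status_levels
)
smoker <- factor(sample(c("Non-smoker", "Current smoker"), n, replace = TRUE))

tab <- table(smoker, weight_status)
test <- chisq.test(tab)

test$expected  # rule of thumb: expected counts should be at least about 5
test           # a small p-value suggests the two variables are associated

# If expected counts are small, alternatives include fisher.test(tab) or
# chisq.test(tab, simulate.p.value = TRUE).
```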

6. Find the min, max, mean, median, sd, and IQR of the cholesterol levels of the study participants. Make sure to present the numbers rounded to two decimal places.
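
These summaries can be collected in one small function. The name `summarise_numeric` and the sample values are made up for illustration; `na.rm = TRUE` is an assumption that missing values should be ignored.

```r
# Hypothetical helper: six summary statistics, rounded to two decimals.
summarise_numeric <- function(x) {
  stats <- c(
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    IQR    = IQR(x, na.rm = TRUE)
  )
  round(stats, 2)
}

chol <- c(195, 250, 245, 225, 285, NA, 205, 313)  # made-up cholesterol values
chol_summary <- summarise_numeric(chol)
chol_summary

# format() with nsmall = 2 keeps trailing zeros when printing.
format(chol_summary, nsmall = 2)
```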

7. Create an appropriate graph to display the Glucose levels of the study participants. Make sure the graph has appropriate title, labels, and font size, and make sure to use appropriate class endpoints.
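
For a single numeric variable a histogram is a natural choice, with class endpoints set explicitly through `breaks`. The sketch uses simulated values, and the class width of 10 is an assumption that should be adjusted to the real data range.

```r
# Simulated stand-in for the Glucose column; replace with real data.
set.seed(5)
glucose <- round(rnorm(300, mean = 82, sd = 15))
glucose <- glucose[!is.na(glucose)]  # drop missing values before binning

# Class endpoints at multiples of 10 that cover the whole data range.
width <- 10
breaks <- seq(floor(min(glucose) / width) * width,
              ceiling(max(glucose) / width) * width,
              by = width)

# right = FALSE makes each class include its lower endpoint: [a, b).
h <- hist(glucose, breaks = breaks, right = FALSE,
          main = "Glucose Levels of Study Participants",
          xlab = "Glucose level", ylab = "Number of participants",
          col = "lightblue", cex.lab = 1.2, cex.main = 1.4)
```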

8. Create a graphical display that compares the heart rate of those who had a previous stroke with those who did not. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense. Does the graph imply that there is a difference in the heart rate between the two groups?
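
Side-by-side boxplots are one way to compare a numeric variable across two groups. The data here are simulated, and the 0/1 coding of the stroke variable is an assumption, since the description only says "whether or not".

```r
# Simulated stand-in data; replace with the real columns.
set.seed(6)
n <- 200
stroke_code <- sample(c(0, 1), n, replace = TRUE, prob = c(0.9, 0.1))
heart_rate <- round(rnorm(n, mean = 75, sd = 12))

# Assumed coding: 0 = no previous stroke, 1 = previous stroke.
stroke <- factor(stroke_code, levels = c(0, 1),
                 labels = c("No previous stroke", "Previous stroke"))

boxplot(heart_rate ~ stroke,
        main = "Heart Rate by Previous Stroke Status",
        xlab = "Stroke history", ylab = "Heart rate",
        col = c("lightblue", "lightcoral"),
        cex.axis = 1, cex.lab = 1.2, cex.main = 1.4)
```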

9. Find the mean and standard deviation of the heart rate of each of the two groups defined by whether they had a previous stroke.

10. Use an appropriate statistical procedure to check if there is a difference in the heart rate of those who had a previous stroke and those who did not.
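
Group-wise summaries can be computed with `tapply()`. The helper name `group_mean_sd` and the tiny example vectors are hypothetical.

```r
# Hypothetical helper: mean and standard deviation of `values` within each group.
group_mean_sd <- function(values, groups) {
  g <- factor(groups)  # also drops unused levels
  data.frame(
    group = levels(g),
    mean  = round(as.vector(tapply(values, g, mean, na.rm = TRUE)), 2),
    sd    = round(as.vector(tapply(values, g, sd, na.rm = TRUE)), 2)
  )
}

# Made-up heart rates and stroke labels for illustration.
heart_rate <- c(60, 70, 80, 90)
stroke <- c("No", "No", "Yes", "Yes")
hr_by_stroke <- group_mean_sd(heart_rate, stroke)
hr_by_stroke
```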

(a) Provide appropriate graphical displays to check the assumptions of the method you used. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense.

(b) Based on the graphs, clearly justify your choice of the method you opted to perform.

(c) What do you conclude based on your analysis?
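
A sketch of one possible workflow for item 10, on simulated data: normal Q-Q plots per group to judge the normality assumption, then a Welch two-sample t-test, with the Wilcoxon rank-sum test as a nonparametric alternative. Which test is appropriate depends on what the plots of the real data show.

```r
# Simulated stand-in data; replace with the real columns.
set.seed(7)
heart_rate <- c(rnorm(150, mean = 75, sd = 12), rnorm(30, mean = 78, sd = 13))
stroke <- factor(rep(c("No previous stroke", "Previous stroke"), times = c(150, 30)))

# (a) One normal Q-Q plot per group to check approximate normality.
old_par <- par(mfrow = c(1, 2))
for (g in levels(stroke)) {
  qqnorm(heart_rate[stroke == g], main = paste("Normal Q-Q Plot:", g),
         cex.main = 1, cex.lab = 1.1)
  qqline(heart_rate[stroke == g])
}
par(old_par)

# Welch t-test: compares the two means without assuming equal variances.
welch <- t.test(heart_rate ~ stroke)
welch

# Nonparametric alternative if the Q-Q plots show clear departures from normality.
rank_test <- wilcox.test(heart_rate ~ stroke)
rank_test
```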

11. Create a graphical display that compares the cholesterol levels of the groups formed by the weight status of the study participants. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense. Does the graph imply that there is a difference in the cholesterol levels among the groups?

12. Find the mean and standard deviation of the heart rate of each group defined by the weight status of the study participants.

13. Use an appropriate statistical procedure to check if there is a difference in the heart rate of those who had a previous stroke and those who did not. Clearly justify your choice of the method you opted to perform.

(a) Provide appropriate graphical displays to check the assumptions of the method you used. Make sure the graph has appropriate title, labels, and font size, and make sure the order of the categories in the graph makes sense.

(b) What do you conclude based on your analysis?
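
Where a comparison involves the four weight-status groups (as in items 11 and 12) rather than two groups, a one-way ANOVA with residual diagnostics is a common option. This is a sketch on simulated data; the variable name `outcome` is a placeholder for whichever numeric variable is being compared.

```r
# Simulated stand-in data; replace with the real columns.
set.seed(8)
status_levels <- c("Underweight", "Normal", "Overweight", "Obese")
weight_status <- factor(rep(status_levels, times = c(20, 120, 110, 50)),
                        levels = status_levels)
outcome <- rnorm(300, mean = 235, sd = 40)  # placeholder numeric variable

# One-way ANOVA: tests whether the group means are all equal.
fit <- aov(outcome ~ weight_status)
summary(fit)

# Diagnostic plots for the ANOVA assumptions.
old_par <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted: checks equal spread across groups
plot(fit, which = 2)  # normal Q-Q plot of the residuals
par(old_par)

# Nonparametric alternative if the assumptions look doubtful.
kw <- kruskal.test(outcome ~ weight_status)
kw
```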

14. Create a visual display that helps you figure out if there is a relationship between the systolic and diastolic pressure of a person. Use different symbols/colors to distinguish the hypertension status of a person. Make sure the graph has appropriate title, labels, and font size.
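
A scatterplot with colour and symbol mapped to hypertension status is one way to do this. The data are simulated, and the rule used to generate the `hyp` labels exists only to produce toy data; in the real analysis the status comes from the Prevalent Hyp column.

```r
# Simulated stand-in data; replace with the real columns.
set.seed(9)
n <- 200
sbp <- rnorm(n, mean = 132, sd = 20)
dbp <- 0.5 * sbp + rnorm(n, mean = 17, sd = 8)
# Toy rule for generating labels only; not taken from the dataset.
hyp <- factor(ifelse(sbp >= 140 | dbp >= 90, "Hypertensive", "Not hypertensive"),
              levels = c("Not hypertensive", "Hypertensive"))

cols <- c("steelblue", "firebrick")  # one colour per level
pchs <- c(1, 17)                     # one plotting symbol per level
idx <- as.integer(hyp)               # 1 or 2, matching the level order

plot(sbp, dbp, col = cols[idx], pch = pchs[idx],
     main = "Systolic vs. Diastolic Blood Pressure",
     xlab = "Systolic blood pressure", ylab = "Diastolic blood pressure",
     cex.lab = 1.2, cex.main = 1.4)
legend("topleft", legend = levels(hyp), col = cols, pch = pchs,
       title = "Hypertension status", cex = 0.9)
```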

"framingham.xlsx" Description:

You can also find the data description in the Excel file under the "Description" tab.

Background:

The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease as well as predict the overall risk using logistic regression.

Data Preparation:

Source: The dataset is publicly available on the Kaggle website, and it is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD).

Data Description:

The dataset provides the patients' information. It includes over 4,000 records and 15 attributes (variables). Each attribute is a potential risk factor. There are demographic, behavioral, and medical risk factors.

Demographic:

– Gender: male (1) or female (0)

– Age: age of the patient truncated to whole numbers

– Education: 1 = High School or less, 2 = Trade School, 3 = Bachelor's Degree, 4 = Graduate Degree

Behavioral:

– Current Smoker: whether or not the patient is a current smoker

– Cigs Per Day: the number of cigarettes that the person smoked on average in one day

Medical (history):

– BP Meds: whether or not the patient was on blood pressure medication

– Prevalent Stroke: whether or not the patient had previously had a stroke

– Prevalent Hyp: whether or not the patient was hypertensive

– Diabetes: whether or not the patient had diabetes

Medical (current):

– Tot Chol: total cholesterol level

– SBP: systolic blood pressure

– DBP: diastolic blood pressure

– BMI: Body Mass Index

– Heart Rate: heart rate

– Glucose: glucose level

– TenYearCHD: 10-year risk of coronary heart disease (CHD) (binary: "1" means "Yes", "0" means "No")

– WeightStatus: Underweight (BMI < 18.5), Normal (BMI 18.5 – 24.9), Overweight (BMI 25.0 – 29.9), Obese (BMI ≥ 30.0)
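
The WeightStatus categories can be reproduced from BMI with `cut()`, which also yields a factor whose levels are already in a sensible order. The function name `bmi_to_weight_status` is hypothetical. The handling of boundary values is an assumption: BMI between 24.9 and 25.0 is treated as Normal, between 29.9 and 30.0 as Overweight, and exactly 30.0 as Obese.

```r
# Hypothetical helper: maps BMI values to the four weight-status categories.
# right = FALSE gives left-closed intervals: [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf).
bmi_to_weight_status <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("Underweight", "Normal", "Overweight", "Obese"),
      right = FALSE)
}

bmi <- c(17, 18.5, 24.9, 25, 29.9, 30, 35)  # made-up BMI values
status <- bmi_to_weight_status(bmi)
table(status)  # counts appear in the order Underweight, Normal, Overweight, Obese
```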