 MMI 409 - Introduction to Biostatistics

## Course Description:

This course covers classic statistical inference. Applications and interpretation of data are emphasized. Mathematical proofs and derivations are not covered; however, theory is addressed conceptually. Readings are intended to be theoretical. Synch sessions, homework and exams will focus on applying statistical procedures using SPSS and interpreting data.

## Learning Objectives:

• Define frequency distributions and percentiles
• Define central tendency and variability
• Define z-scores
• Define normal distributions
• Define inferential statistics
• Define probability
• Define population variances
• Define analysis of variance
• Define one way regression
• Define multiple regression
• Define correlation
• Define Chi-square
• Define factor analysis
• Define logistic regression

## What I Learned:

In all honesty, I was a bit nervous about this class. While I like math and took a lot of statistics for my undergraduate degree, it is not something you retain and can recall easily unless you use it frequently. For these reasons, I was apprehensive about taking the class. I went into it about as eager as one is to rip off a Band-Aid, and I enrolled with similar intent: get it over with, and fast! Knowing this class had a proctored final did not help with the anxieties that I and most of the students had going into it. On the first night of class, I learned that many in my cohort felt the same way, and I was relieved. One aspect that excited me was that I would be in the class with friends I had worked with before, and I was encouraged that we could learn together.

It turns out that my fears were all for nothing. The class was not founded on memorizing mathematical algorithms and formulas, or even on doing much math at all. The focus was mainly on understanding how to see meaning in the numbers and interpret the results. We did need a solid understanding of the statistical methods used and the theories behind them, but the approach to doing the math matched how one would do it in the real world: we got hands-on experience using IBM's SPSS software package. Dr. John Gatta was our professor, and it is an understatement to say how wonderful and gifted he is at explaining a complicated subject like biostatistics in simple terms that everyone can understand. To start the class, he had hoped to try an asynchronous teaching method to pilot a direction the program would be taking. He quickly saw that it wasn't going to work and adjusted accordingly.

Each week, we focused on different statistical methods to gradually build our knowledge and proficiency in biostatistics, with a corresponding assignment to reinforce the methodology being taught. We started off easy to become familiar with SPSS and to understand sample populations, frequency distributions and percentiles. Understanding normal distributions, variability and central tendency was key to moving on to the other topics and assignments. In teaching us about hypothesis testing, Dr. Gatta laid out the standard approach that we would rely on for the remainder of the course. The method, taken from one of Dr. Gatta's presentations, involves the following steps and the questions to ask at each stage.

Step 1 - Check the Assumptions
• What do we know about the population?
• The Mean? The standard deviation?
• Is the population normally distributed or the sample size large enough?
• Is the data a result of simple random sampling?
• What type of data do we have?
• Interval or Ratio Scale?
• Continuous?
Step 2 - Generate the Null and Alternative Hypotheses
• The null hypothesis states that there is no change or difference between groups
• For parameter estimation, the null hypothesis states that the parameter is a specific value. When embarking on statistical analysis, start out with the assumption that the experimental variable has had no effect.

H0:  Ma = Mb
H1:  Ma ≠  Mb

Step 3 - Select our test statistic
• What statistical test should we use given the data and parameters?

Step 4 - Set the significance level and establish the decision rule

Step 5 - Compute statistics

Step 6 - Apply decision rule and draw inference

• ONLY IF we reject the null hypothesis do we really draw a conclusion.
• You CANNOT "prove" the null hypothesis.
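
To make the six steps concrete for myself, here is a minimal sketch in Python. In class we did all of this in SPSS, so this is not Dr. Gatta's code, and the function name `two_sample_z_test` and the data are made up for illustration. It assumes Step 1 has already been checked (simple random samples, continuous data, and known population standard deviations), which is why a z-test is the test statistic chosen in Step 3. The significance level `alpha` and the rule "reject H0 if p < alpha" stand in for Step 4.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_test(group_a, group_b, sigma_a, sigma_b, alpha=0.05):
    """Two-sided z-test of H0: Ma = Mb against H1: Ma != Mb.

    Assumes (Step 1) simple random samples, continuous data, and known
    population standard deviations sigma_a and sigma_b.
    """
    # Step 5: compute the test statistic and its two-sided p-value.
    standard_error = sqrt(sigma_a**2 / len(group_a) + sigma_b**2 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 6: apply the decision rule set in Step 4 (reject H0 if p < alpha).
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Made-up illustrative data, not from the course.
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2]
control = [4.8, 5.0, 4.7, 5.1, 4.9, 5.2, 4.6, 5.0]
z, p_value, reject_null = two_sample_z_test(treated, control, sigma_a=0.4, sigma_b=0.4)
if reject_null:
    print(f"z = {z:.2f}, p = {p_value:.4f}: reject H0; the group means differ")
else:
    print(f"z = {z:.2f}, p = {p_value:.4f}: fail to reject H0; no conclusion drawn")
```

Note that the "fail to reject" branch deliberately does not claim the means are equal, in keeping with the rule that the null hypothesis cannot be proven.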

The rest of the class, and every statistical test we used, relied on understanding this approach. There were plenty of problems available to test our understanding of the hypothesis testing approach, and we refined our ability to use it throughout the course. We learned how to do simple T-Tests, One and Two Sample Inference, Analysis of Variance (ANOVA) and the General Linear Model, Regression, Multiple Regression, Categorical Data Analysis and Logistic Regression. There were no written assignments or papers. Instead, our retention of the principles taught was tested through a Midterm and a proctored Final Exam.

Another great tool that Dr. Gatta gave us, to help us know when to apply which statistical test, was the table below.

| Dependent Variable | Independent Variables | Statistical Method (General Linear Model) |
|---|---|---|
| Continuous | 1 categorical variable with 2 levels | T-Test |
| Continuous | 1 categorical variable with more than 2 levels, or multiple categorical variables | Analysis of Variance (ANOVA) |
| Continuous | Continuous | Linear Regression |
| Continuous | Continuous and categorical | Analysis of Covariance (ANCOVA) |
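
The table above can also be read as a small decision rule. The sketch below is my own Python rendering of it, not something from the course materials. The function name `choose_method` and the `(kind, levels)` tuple format are hypothetical, and it assumes the dependent variable is continuous, as in every row of the table.

```python
def choose_method(predictors):
    """Pick a General Linear Model method for a continuous dependent variable.

    `predictors` is a list of (kind, levels) tuples, where kind is
    "categorical" or "continuous" and levels is the number of categories
    (ignored for continuous predictors).
    """
    if not predictors:
        raise ValueError("at least one independent variable is required")
    categorical = [levels for kind, levels in predictors if kind == "categorical"]
    continuous = [kind for kind, _ in predictors if kind == "continuous"]
    if len(categorical) + len(continuous) != len(predictors):
        raise ValueError("kind must be 'categorical' or 'continuous'")
    if any(levels < 2 for levels in categorical):
        raise ValueError("a categorical variable needs at least 2 levels")

    # Mixed continuous and categorical predictors.
    if categorical and continuous:
        return "Analysis of Covariance (ANCOVA)"
    # Continuous predictors only.
    if continuous:
        return "Linear Regression"
    # Exactly one categorical predictor with exactly two levels.
    if len(categorical) == 1 and categorical[0] == 2:
        return "T-Test"
    # One categorical predictor with more than 2 levels, or several of them.
    return "Analysis of Variance (ANOVA)"


# Example: comparing a continuous outcome across three treatment groups.
print(choose_method([("categorical", 3)]))
```
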

I will say that, of all the professors I had in the program, Dr. Gatta was probably the best at teaching such complex material. I left the class confident that I could use the procedure of hypothesis testing to make sense of statistical study data in Medical Informatics. I felt comfortable with IBM SPSS and could use it again in my own research and study. I also had a better understanding of predictive modeling and an appreciation for the science involved in making predictions about a population.