Read the lecture and respond to the discussion questions with reference
Discovering Relationships and Building Models
Additional tools useful for evaluation of quantitative data include correlation and regression. Both of these techniques investigate the relationship(s) between two or more quantitative variables. Similarities and differences between these two techniques will be discussed. The focus for this lecture will be on relationships of just two variables at a time.
Correlation is used to determine the type and extent of the relationship between two variables. The correlation coefficient r is used to describe the relationship. This coefficient can be positive or negative, and can take on any value between -1 and 1, inclusive. A positive correlation is obtained when the scatter of data points representing the two variables slopes upward to the right; as one variable increases, so does the other. A negative correlation shows data points sloping downward to the right; as one variable increases, the other decreases. An r-value of 1 indicates a perfectly straight line of points sloping upward. An r-value of -1 indicates a perfectly straight line of points sloping downward. A value of zero indicates no linear relationship. This could be a “shotgun” scatter of points, or a curved pattern that a straight line cannot capture. For a perfectly horizontal or perfectly vertical line of points, one variable does not vary at all, so r cannot be calculated; such data likewise show no linear association. A key point about correlation is that it indicates only relationships, not cause and effect.
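Correlation is calculated through Pearson’s correlation coefficient. In its computational form, the formula is as follows:
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]
where x and y are the variables whose relationship is in question and n is the number of (x, y) data pairs. Remember that r is a measure of the strength of a linear association between two variables (Brase & Brase, 2010).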
Examples of these calculations are shown in the Visual Learner: Statistics.
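As a supplement to those examples, the calculation of r can also be sketched in a few lines of Python. This is only an illustrative sketch: the function name pearson_r and the data (hours studied and quiz scores) are hypothetical and do not come from the textbook. It uses the standard computational form of Pearson's coefficient, which works from the sums of x, y, xy, x squared, and y squared.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the sums of x, y, xy, x^2, and y^2 (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 pairs)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One variable never changes, so r cannot be calculated.
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator

# Made-up illustration data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.775, a fairly strong positive linear association
```

Even with an r this large, the result describes only how closely the points follow a line; by itself it says nothing about whether studying causes the higher scores.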
Regression, on the other hand, presumes a causative relationship; the regression calculation itself does not prove one. Previous experiments, or other evidence, may indicate that the two variables of interest have the same relationship in a variety of settings and conditions, or in carefully controlled conditions. In this case, one variable is explanatory, meaning that it explains changes in the other variable. The other variable is a response variable; it responds to changes in the explanatory variable. Typically, the response variable is designated by y and the explanatory variable by x. The equation of the regression line is as follows:
\[ \hat{y} = b_0 + b_1 x \]
where ŷ is the predicted value of the response variable, b0 is the y-intercept, and b1 is the slope of the line (Brase & Brase, 2010). An example of the calculation is shown in the Visual Learner: Statistics.
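For readers who want to try the calculation themselves, the following Python sketch finds the intercept b0 and the slope b1 using the standard least-squares formulas: b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b0 = ȳ − b1·x̄. The function name least_squares_line and the data are hypothetical and are used only for illustration.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 pairs)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("the explanatory variable must take at least two values")
    b1 = (n * sum_xy - sum_x * sum_y) / spread_x   # slope
    b0 = sum_y / n - b1 * (sum_x / n)              # intercept: y-bar - b1 * x-bar
    return b0, b1

# Made-up illustration data: x is the explanatory variable, y the response.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x_values, y_values)
print(round(b0, 3), round(b1, 3))   # 2.2 0.6

# Predicted response for a new x value of 6.
prediction = b0 + b1 * 6
print(round(prediction, 2))         # 5.8
```

Predictions are most trustworthy for x values inside the range of the original data; the x value of 6 used here lies just outside that range and is shown only to illustrate how the equation is used.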
Parametric tests are used on data that are assumed to fit a normal distribution and follow the assumptions associated with normal distributions. Nonparametric tests do not assume a normal distribution. The main distinction for the purpose of this class, however, is that nonparametric tests are typically used for categorical and qualitative data. The main nonparametric test discussed in this course is the chi-square test.
Chi-square tests do not assume a normal distribution. The chi-square distribution is right-skewed, and the farther to the right (i.e., farther along the tail) the calculated chi-square value falls, the more likely the result is to be declared significant. The chi-square test for independence is used to determine whether two categorical variables are independent. This test is a hypothesis test and utilizes the seven steps of hypothesis testing. An example of the chi-square test for independence is in the Visual Learner: Statistics.
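The arithmetic behind the test for independence can be sketched in Python as well. In this sketch, each expected count is (row total × column total) / grand total, the test statistic is the sum of (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1)(columns − 1). The function name chi_square_independence and the 2 × 2 table of counts are hypothetical; the critical value 3.841 is the standard table value for 1 degree of freedom at the 0.05 level of significance.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for each cell if the two variables were independent.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Made-up counts: rows are two groups, columns are "prefers A" / "prefers B".
observed_counts = [
    [30, 20],
    [20, 30],
]

chi2, df, expected_counts = chi_square_independence(observed_counts)
print(chi2, df)  # 4.0 1

# Critical value for df = 1 at the 0.05 level, taken from a chi-square table.
CRITICAL_VALUE = 3.841
reject_independence = chi2 > CRITICAL_VALUE
print(reject_independence)  # True: the statistic falls in the right tail
```

With other table sizes the degrees of freedom change, so the critical value must be looked up again in a chi-square table for the new df and significance level.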
Correlation and regression are simple but powerful tools used to describe, simplify, and analyze quantitative data. They yield information as to the relationship of variables and, in the case of regression, describe how a response variable changes with an explanatory variable when other evidence supports cause and effect. Chi-square tests are useful for categorical/qualitative data and provide a means to compare selection preferences. They can also be used to compare the selection preferences of different populations in the test for independence. When deciding on the appropriate test to use, it is necessary to determine the data type and the goal of the analysis.
Brase, C., & Brase, C. (2010). Understanding basic statistics (5th ed.). Belmont, CA: Cengage Learning.
Describe the error in the conclusion. Given: There is a linear correlation between the number of cigarettes smoked and the pulse rate. As the number of cigarettes increases, the pulse rate increases. Conclusion: Cigarettes cause the pulse rate to increase.
Now that you are familiar with the basic concepts of statistics, what are some examples of when you have seen or heard statistics used inappropriately?