Chapter 11 Analysis of Quantitative Data
1) Explain how a researcher codes, enters, and cleans data so that it can be used for statistical analysis. What procedures are involved in each of these steps?
Answer:
• Coding the data: Data coding means systematically reorganizing raw numerical data into a format that is easy to analyze using computers; rules are developed whereby certain numbers are assigned to the attributes of each variable.
• Entering data: Data are entered in a grid format where each row represents a respondent, subject, or case, and the column or a set of columns represents specific variables. Researchers can enter data into a computer by way of a code sheet, the direct-entry method, an optical scan, or bar code.
• Cleaning the data: The researcher verifies the accuracy of coding after data are entered into a computer in two ways: possible code cleaning involves checking the categories of all variables for impossible codes; contingency cleaning (or consistency checking) involves cross-classifying two variables and looking for logically impossible combinations.
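The two cleaning checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the codebook, the variable names (`sex`, `gave_birth`), and the four records are all hypothetical.

```python
# Hypothetical coded survey records: one dict per respondent (one row of the data grid).
# Assumed codebook: sex 1 = male, 2 = female; gave_birth 1 = yes, 2 = no.
VALID_CODES = {"sex": {1, 2}, "gave_birth": {1, 2}}

records = [
    {"id": 1, "sex": 1, "gave_birth": 2},
    {"id": 2, "sex": 2, "gave_birth": 1},
    {"id": 3, "sex": 4, "gave_birth": 2},  # impossible code: 4 is not a valid sex code
    {"id": 4, "sex": 1, "gave_birth": 1},  # impossible combination under this codebook
]

def possible_code_errors(rows, valid_codes):
    """Possible code cleaning: list (id, variable, value) for values outside the codebook."""
    return [
        (row["id"], var, row[var])
        for row in rows
        for var, allowed in valid_codes.items()
        if row[var] not in allowed
    ]

def contingency_errors(rows):
    """Contingency cleaning: list ids with the impossible pair sex=1 and gave_birth=1."""
    return [row["id"] for row in rows if row["sex"] == 1 and row["gave_birth"] == 1]

print(possible_code_errors(records, VALID_CODES))  # [(3, 'sex', 4)]
print(contingency_errors(records))                 # [4]
```

Flagged cases would then be checked against the original questionnaires or code sheets rather than corrected automatically.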
Diff: 5 Type: ES Page Ref: 238–240
Learning Objective: 1. Explain what is meant by coding data.
Skill: 04. Expresses familiarity with the range of acceptable techniques/methods in social research
2) Describe three ways a researcher can display information about univariate statistics.
Answer:
• Frequency distribution: A table that shows the distribution of cases into the categories of one variable (i.e., the number or percent of cases in each category)
• Bar chart: A display of quantitative data for one variable in the form of rectangles where longer rectangles indicate more cases in a variable category
• Pie chart: A display of numerical information on one variable that divides a circle into fractions by lines representing the proportion of cases in the variable’s attributes
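As a rough illustration of the first two displays, the sketch below builds a frequency distribution (counts and percentages) and prints a crude text bar chart. The `responses` data and the function name `frequency_distribution` are hypothetical.

```python
from collections import Counter

# Hypothetical responses for one nominal variable (marital status).
responses = ["married", "single", "married", "divorced", "single", "married",
             "widowed", "married", "single", "married"]

def frequency_distribution(values):
    """Return [(category, count, percent)] sorted by descending count."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

table = frequency_distribution(responses)
for category, n, pct in table:
    bar = "#" * n  # text bar chart: a longer bar means more cases in the category
    print(f"{category:<10}{n:>3}{pct:>7.1f}%  {bar}")
```

The same percentages are what a pie chart would show as slices of the circle.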
Diff: 3 Type: ES Page Ref: 240–241
Learning Objective: 2. Define and give examples of univariate analysis.
Skill: 40. Able to calculate, read, and correctly interpret univariate statistics
3) Describe each of the three measures of central tendency. What are the main differences between them? How are they affected by a normal versus a skewed distribution of data?
Answer:
• Mean: A measure of central tendency for one variable that indicates the arithmetic average (i.e., the sum of all scores divided by the total number of scores)
• Median: A measure of central tendency for one variable indicating the point or score at which half the cases are higher and half are lower
• Mode: A measure of central tendency for one variable that indicates the most frequent or common score
• Skewed distribution: If the frequency distribution forms a “normal” or bell-shaped curve (normal distribution), the three measures of central tendency equal each other. If the distribution is a skewed distribution (i.e., more cases are in the upper or lower scores), then the three will not be equal. If most cases have lower scores with a few extreme high scores, the mean will be the highest, the median in the middle, and the mode the lowest. If most cases have higher scores with a few extremely low scores, the mean will be the lowest, the median in the middle, and the mode the highest.
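The ordering of the three measures can be checked on small made-up data sets. The sketch below uses Python's standard `statistics` module; both lists are hypothetical and chosen only to make the pattern easy to see.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in $1,000s): most cases are low, one is extremely high.
right_skewed = [20, 20, 20, 25, 30, 35, 200]
# Hypothetical symmetric scores: the three measures should coincide.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for name, scores in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(f"{name}: mean={mean(scores)}, median={median(scores)}, mode={mode(scores)}")
```

For the right-skewed list the mean (50) is pulled above the median (25), which in turn lies above the mode (20); for the symmetric list all three equal 3.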
Diff: 4 Type: ES Page Ref: 241–242
Learning Objective: 2. Define and give examples of univariate analysis.
Skill: 40. Able to calculate, read, and correctly interpret univariate statistics
4) Why is knowing the variability or dispersion of a variable as important as knowing its central tendency? How is variation measured?
Answer:
• Two distributions can have identical measures of central tendency but differ in their spread about the centre.
• Variability has important social implications. For example, in city X, the median and mean family income is $35,600 per year, and it has zero variation. Zero variation means that every family has an income of exactly $35,600. City Y has the same median and mean family income, but 95 percent of its families have incomes of $12,000 per year and 5 percent have incomes of $300,000 per year. City X has perfect income equality, whereas there is great inequality in city Y. A researcher who does not know the variability of income in the two cities misses very important information.
• Researchers measure variation in three ways: range (the distance between the largest and smallest scores), percentile (the score at a specific place within the distribution), and standard deviation (the “average distance” between all scores and the mean).
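A small sketch of the three measures of variation, using Python's standard `statistics` module. The two income lists are hypothetical (they are not the city X and city Y figures above) and are built so that the mean and median match while the spread differs. The population formula (`pstdev`) is an assumption; a sample formula would divide by n - 1 instead.

```python
from statistics import mean, median, pstdev, quantiles

# Hypothetical family incomes (in $1,000s) for two places with the same centre.
place_a = [40, 40, 40, 40, 40]   # zero variation
place_b = [10, 20, 40, 60, 70]   # same mean and median, much wider spread

def describe(scores):
    """Summarize centre and spread for one variable."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),      # distance between extremes
        "quartiles": quantiles(scores, n=4),     # 25th, 50th, 75th percentiles
        "std_dev": pstdev(scores),               # population standard deviation
    }

summary_a = describe(place_a)
summary_b = describe(place_b)
print(summary_a)
print(summary_b)
```

Both summaries report a mean and median of 40, but only the range and standard deviation reveal that the second list is far more unequal.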
Diff: 4 Type: ES Page Ref: 243–244
Learning Objective: 2. Define and give examples of univariate analysis.
Skill: 40. Able to calculate, read, and correctly interpret univariate statistics
5) Describe each of the three techniques researchers use when deciding whether a relationship exists between two variables.
Answer:
• Scattergram: A diagram to display the statistical relationship between two variables based on plotting each case’s values for both of the variables
• Cross-tabulation: Placing data for two variables in a contingency table to show the number or percentage of cases at the intersection of categories of the two variables
• Measures of association: A single number that expresses the strength, and often the direction, of a relationship. It condenses information about a bivariate relationship into a single number.
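To make the cross-tabulation idea concrete, the sketch below counts cases at each intersection of two categorical variables and converts the counts to percentages within each category of the first variable. The variables (`age group`, `attitude`) and all eight cases are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (age_group, attitude) pairs for two categorical variables.
cases = [
    ("under 30", "agree"), ("under 30", "agree"), ("under 30", "agree"),
    ("under 30", "disagree"),
    ("30 and over", "agree"),
    ("30 and over", "disagree"), ("30 and over", "disagree"),
    ("30 and over", "disagree"),
]

def crosstab(pairs):
    """Count the cases at each intersection of the two variables' categories."""
    return Counter(pairs)

def percent_within_group(table):
    """Convert counts to percentages within each category of the first variable."""
    totals = Counter()
    for (group, _), n in table.items():
        totals[group] += n
    return {(group, answer): 100 * n / totals[group]
            for (group, answer), n in table.items()}

counts = crosstab(cases)
percents = percent_within_group(counts)
for cell, pct in sorted(percents.items()):
    print(cell, counts[cell], f"{pct:.0f}%")
```

Comparing the percentages across groups (75 percent agreement versus 25 percent) is what suggests a relationship between the two variables in this made-up table.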
Diff: 5 Type: ES Page Ref: 247–253
Learning Objective: 3. Explain the techniques of bivariate analysis.
Skill: 41. Able to calculate, read, and correctly interpret simple bivariate statistics
6) What are two ways in which statistical relationships can be described? Provide an example for each one.
Answer:
• Correlation: To be correlated means to vary together whereby cases with certain values on one variable are likely to have certain values on the other one (e.g., people with higher values on the income variable are likely to have higher values on the life expectancy variable).
• Independence: There is no association (i.e., no relationship) between variables (e.g., there is likely no relationship between the two variables “number of siblings one has” and “life expectancy”).
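The contrast between correlation and independence can be illustrated with a hand-written Pearson correlation coefficient. The function below follows the standard formula; the five cases and their values are hypothetical and were chosen so that one pair of variables varies together and the other does not.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

# Hypothetical data for five cases.
income = [20, 30, 40, 50, 60]            # in $1,000s
life_expectancy = [70, 72, 75, 77, 81]   # in years
siblings = [2, 0, 3, 2, 1]               # number of siblings

r_income = pearson_r(income, life_expectancy)
r_siblings = pearson_r(siblings, life_expectancy)
print(f"income vs. life expectancy:   r = {r_income:.2f}")
print(f"siblings vs. life expectancy: r = {r_siblings:.2f}")
```

In this made-up data the first coefficient is close to +1 (the variables vary together) and the second is essentially 0 (independence).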
Diff: 4 Type: ES Page Ref: 247
Learning Objective: 3. Explain the techniques of bivariate analysis.
Skill: 41. Able to calculate, read, and correctly interpret simple bivariate statistics
7) What are five measures of association that are useful when interpreting bivariate statistics? Describe each one and also specify which level of data each one is applicable to.
Answer:
• Lambda: Used for nominal-level data. It is based on the reduction in prediction errors achieved by using the mode, and it ranges from 0 (independence) to 1.0 (perfect prediction, or the strongest possible relationship).
• Gamma: Used for ordinal-level data and is based on comparing pairs of variable categories and seeing whether a case has the same rank on each.
• Tau, or Kendall’s tau: Used for ordinal-level data where tau ranges from -1.0 to +1.0 with 0 meaning no association.
• Rho, or Pearson’s product moment correlation coefficient: Used only for data measured at the interval or ratio level and tells how far cases are from a relationship (or regression) line in a scatterplot.
• Chi-squared: Used for nominal and ordinal data; it has an upper limit of infinity and a lower limit of zero, meaning no association.
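Three of these measures can be computed directly from a contingency table. The sketch below uses the standard textbook formulas (Goodman and Kruskal's lambda and gamma, and Pearson's chi-squared statistic), which are not spelled out in the answer above. The 2 x 2 table is hypothetical, with rows as categories of the independent variable and columns as categories of the dependent variable, both in rank order.

```python
# Hypothetical contingency table: rows = independent variable, columns = dependent.
table = [[30, 10],
         [10, 30]]

def lambda_coefficient(t):
    """Proportional reduction in errors when predicting the column variable by the mode."""
    n = sum(map(sum, t))
    col_totals = [sum(col) for col in zip(*t)]
    errors_without = n - max(col_totals)                 # guess the overall mode
    errors_with = sum(sum(row) - max(row) for row in t)  # guess the mode within each row
    return (errors_without - errors_with) / errors_without

def gamma(t):
    """(concordant - discordant) / (concordant + discordant) over pairs of cases."""
    concordant = discordant = 0
    for i, row in enumerate(t):
        for j, count in enumerate(row):
            for k in range(i + 1, len(t)):
                for m, other in enumerate(t[k]):
                    if m > j:
                        concordant += count * other  # higher on both variables
                    elif m < j:
                        discordant += count * other  # higher on one, lower on the other
    return (concordant - discordant) / (concordant + discordant)

def chi_squared(t):
    """Sum of (observed - expected)^2 / expected over all cells."""
    n = sum(map(sum, t))
    row_totals = [sum(row) for row in t]
    col_totals = [sum(col) for col in zip(*t)]
    return sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for row, rt in zip(t, row_totals)
        for obs, ct in zip(row, col_totals)
    )

print(lambda_coefficient(table), gamma(table), chi_squared(table))
```

For this table lambda is 0.5, gamma is 0.8, and chi-squared is 20.0. The functions assume a table with some variation in the dependent variable and at least one untied pair; otherwise the denominators are zero.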
Diff: 8 Type: ES Page Ref: 252–253
Learning Objective: 3. Explain the techniques of bivariate analysis.
Skill: 42. Able to explain and correctly interpret statistical significance
8) Discuss the concepts of control variables and trivariate tables. What are three limitations of trivariate tables?
Answer:
• In order to meet all the conditions needed for causality, researchers want to “control for” or see whether an alternative explanation explains away a causal relationship. If an alternative explanation explains a relationship, then the bivariate relationship is spurious. Alternative explanations are operationalized as third variables, which are called control variables because they control for alternative explanations.
• A trivariate table has a bivariate table of the independent and dependent variable for each category of the control variable. These new tables are called partials. The number of partials depends on the number of categories in the control variable. Partial tables look like bivariate tables, but they use a subset of the cases. Only cases with a specific value on the control variable are in the partial. Thus, it is possible to break apart a bivariate table to form partials, or combine the partials to restore the initial bivariate table.
• Trivariate tables have three limitations. First, they are difficult to interpret if a control variable has numerous categories. Second, control variables can be at any level of measurement, but interval or ratio control variables must be grouped (i.e., converted to an ordinal level), and how cases are grouped can affect the interpretation of effects. Finally, the total number of cases is a limiting factor because the cases are divided among cells in partials.
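The relationship between a bivariate table and its partials can be sketched as follows. The seven cases, the variable values, and the function names are hypothetical; the point is only that each partial holds the cases with one value of the control variable, and that adding the partials back together restores the original bivariate table.

```python
from collections import Counter

# Hypothetical cases: (independent, dependent, control) values.
cases = [
    ("low", "no", "urban"), ("low", "no", "urban"), ("high", "yes", "urban"),
    ("low", "no", "rural"), ("high", "yes", "rural"), ("high", "yes", "rural"),
    ("high", "no", "rural"),
]

def bivariate_table(rows):
    """Cross-tabulate the independent and dependent variables, ignoring the control."""
    return Counter((x, y) for x, y, _ in rows)

def partial_tables(rows):
    """Build one bivariate table per category of the control variable."""
    partials = {}
    for x, y, z in rows:
        partials.setdefault(z, Counter())[(x, y)] += 1
    return partials

partials = partial_tables(cases)
recombined = sum(partials.values(), Counter())  # add the partials back together

for control_value, partial in partials.items():
    print(control_value, dict(partial))
print("recombined equals bivariate table:", recombined == bivariate_table(cases))
```

With only seven cases split across two partials, several cells hold one case or none, which is the third limitation noted above.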
Diff: 8 Type: ES Page Ref: 253–255
Learning Objective: 4. Describe the purpose of multivariate analysis.
Skill: 43. Able to interpret multivariate statistical relationships
9) Is a Type I or Type II error more likely if a 0.05 level is used? Explain.
Answer:
• Type I error: Falsely rejecting the null hypothesis, that is, indicating a relationship when in fact no causal relationship exists (random factors actually caused the results); it is more likely at a less demanding level such as 0.10.
• Type II error: Falsely accepting the null hypothesis when in fact there is a causal relationship; it is more likely at a more demanding level such as 0.01.
• 0.05 level is a compromise between Type I and Type II errors.
Diff: 5 Type: ES Page Ref: 258–259
Learning Objective: 5. Describe the relationship between inferential statistics, levels of significance, and Type I and Type II errors.
Skill: 43. Able to interpret multivariate statistical relationships
10) Describe, as simply as possible, what is meant by the statement “It is statistically significant at the 0.05 level.”
Answer:
• The level of statistical significance (usually 0.05) is a way of talking about the likelihood that results are due to chance factors; that is, that a relationship appears in the sample when there is none in the population.
• If a researcher says that results are significant at the 0.05 level, it means that results like these would be produced by chance factors alone only 5 times in 100, so one can be 95 percent confident that they reflect a real relationship in the population, not chance factors.
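One way to see what "due to chance factors" means is an exact permutation test, sketched below. This technique is not named in the answer above and is swapped in purely for illustration; the two groups and their scores are hypothetical. The function regroups the eight cases in every possible way and reports how often chance regrouping alone produces a difference in means at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores for two groups of four cases each.
group_a = [12, 14, 15, 17]
group_b = [8, 9, 10, 11]

def permutation_p_value(a, b):
    """Share of all regroupings with a mean difference at least as large as observed."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = [pooled[i] for i in idx]
        rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(mean(chosen) - mean(rest)) >= observed:
            extreme += 1
    return extreme / total

p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}; significant at the 0.05 level: {p_value < 0.05}")
```

Here only 2 of the 70 possible regroupings match the observed difference, so p is about 0.029, which is below 0.05.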
Diff: 5 Type: ES Page Ref: 257–258
Learning Objective: 5. Describe the relationship between inferential statistics, levels of significance, and Type I and Type II errors.
Skill: 43. Able to interpret multivariate statistical relationships