Compiled by Mai Youa Miksic

Former Equity Research Fellow, CIEP

**Comparison group**: In a non-experimental research design, the group of individuals not receiving the treatment or intervention, or receiving an alternative treatment or intervention, is called a comparison group. Comparison groups are often used when random assignment cannot be done, and should not be confused with control groups.

**Control group**: In an experiment, the group of individuals who do not receive the treatment or intervention is called the control group. A true control group only exists if random assignment was done properly. If no random assignment was done, then the group is called a comparison group.

**Control variable**: In a regression, control variables are used to limit the effect of omitted variable bias. When trying to isolate the effect of the independent variable on the dependent variable, it is important to include all other variables that may also influence the dependent variable (the outcome of interest). These alternative possible causes of the dependent variable are then factored out of the regression equation.

**Counterfactual**: A counterfactual refers to what would have happened in the absence of an event. For example, the counterfactual of a car accident happening is the car accident not happening. A true effect, therefore, is the difference between the event and the absence of the event. A counterfactual is never actually observed; research designs and statistics strive to imitate a true counterfactual in order to estimate the effect of a treatment or intervention. In an experiment, random assignment is designed to create a statistically equivalent group, known as a control group, which acts as the counterfactual to the treatment group.

**Dependent variable**: The outcome(s) of interest in a research study.

**Difference-in-differences (DID)**: Often used by economists, DID examines the causal effects of policy reforms; it tries statistically to mimic the setup of an experimental design.
DID is used when there are two groups being compared, one of which is exposed to the new policy while the other is not, and when the outcome variable has been measured in two or more time periods. Using a regression method, the comparison group's average change over time is subtracted from the treatment group's average change over time; the difference that remains is the estimated effect of the policy.

**Effect**: In statistics, the correct use of the word "effect" implies a cause-and-effect mechanism. Cause and effect can generally only be inferred through an experimental design, although other advanced research designs and statistical methods attempt to mimic the setup of experiments. To say that X has an effect on Y means that a causal relationship between the two has been established. To establish that X has an effect on Y, three conditions must be met: there must be a relationship between X and Y; X must precede Y; and Y must not be caused by some third outside factor. "Effect" is interchangeable with the term "impact."

**Effect size**: Effect sizes measure the strength of a relationship between two groups, usually after a treatment or intervention has been implemented. Generally, an effect size can be determined by taking the difference of the means of the two groups and dividing it by the pooled (or shared) standard deviation of the two groups. Common standardized effect sizes of this kind include Cohen's *d* and Hedges' *g*. Traditionally, an effect size of less than 0.2 is considered small to negligible, 0.5 is moderate, and anything greater than 0.8 is large (Cohen, 1988). Effect sizes tend to be smaller in education research than in other research fields.

**Experiment**: This research design is considered the "gold standard" in education research. It is characterized primarily by random assignment, which ensures the creation of a treatment group and a control group whose properties are statistically the same.
In an experiment, the treatment group receives the intervention (typically a program or policy being implemented), and its results are then compared to the results of the control group, which did not receive the intervention. An experiment is the only way to confidently determine the cause and effect of an intervention.

**Intraclass correlation (ICC)**: This term relates to hierarchical linear modeling, where ICC is a measure of how similar individuals are in the context of nesting (or hierarchies). Humans are naturally nested, or placed in hierarchies, within larger groups: children are nested within families, workers within companies, and so on. When individuals are nested within the same grouping, they are likely to share similar experiences that can affect their outcomes. For example, students nested within classrooms share a teacher, classroom environment, curriculum, and schedule. As a result, these students are likely to be more similar to each other than to students in a different classroom. ICC is a measure of the degree of dependence among these students, or the degree of homogeneity in the group. When measuring outcomes, such as test scores, for all students across a school (or a school district, for that matter), ICC must be taken into account statistically in order to factor out the influence of these similarities; not doing so would lead to biased estimates of effects. Researchers measure ICC before deciding whether or not to use certain statistical techniques, such as hierarchical linear modeling (HLM).

**Independent variable**: The variable of interest that is being investigated as a cause of an outcome variable (or dependent variable).

**Hierarchical linear model (HLM)**: Also known as multilevel modeling, this statistical technique is used to adjust for nesting structures. Humans are naturally nested, or placed in hierarchies, within larger groups: children are nested within families, workers within companies, students within classrooms, etc.
When individuals are nested within the same group, they are likely to share similar experiences that can affect their outcomes. For example, students nested within classrooms share a teacher, classroom environment, curriculum, schedule, and more. As a result, these students are likely to be more similar to each other than to students in a different classroom, and the test scores of students in one classroom will theoretically be more similar to one another than to the test scores of another classroom. Without HLM, estimates of the differences in test scores between two classrooms would be biased, because they would not take these similarities and differences into account. Methodologically, HLM is a complex form of regression.

**Omitted variable bias**: This occurs when a regression leaves out causal explanations for the dependent variable. A perfect regression would include all possible variables that could cause the outcome of interest. However, it is nearly impossible to include every variable, or the right combination of variables, that could explain a phenomenon. This is why regressions can never prove that one variable causes another. We use this term when we suspect that a regression equation is missing important variables.

**Propensity score matching**: This technique minimizes selection bias by using matched pairs to control for background characteristics. The matching process creates a sort of "counterfactual" by comparing one individual in a treatment group to another in a comparison group who is so similar that he or she could represent that person. In this way, the method approximates the random assignment process that occurs in an experimental design. While matching can be done manually, the process is cumbersome, if not impossible, when many characteristics are involved. Propensity score matching makes the process manageable by assigning each individual a numeric score (a "propensity score") based on a selected set of personal characteristics.
The score represents the probability, or propensity, of the individual to select into the treatment group. Researchers then match individuals based on their propensity scores.

**Random assignment**: The cornerstone of an experimental design, random assignment ensures that the treatment and control groups are statistical equivalents of one another. Not to be confused with random selection/sampling, random assignment occurs when individuals are blindly assigned to either a treatment or a control group.

**Random selection/sampling**: This technique is used to create the sample for a study. Random selection/sampling, not to be confused with random assignment, is used to ensure the representativeness of the sample, which is necessary in order to draw conclusions about the general population based on a smaller sample. Random sampling occurs when individuals are chosen from a larger population in such a way that the probability of being chosen is the same for all individuals. A common way of doing random sampling is to assign every individual in the population a number and then use a computer program to draw numbers at random; if an individual's number is drawn, he or she is put into the sample.

**Regression**: Regressions are an extension of correlation that allow researchers to include more than one independent variable. Regressions are useful in that they can estimate the strength and direction (positive or negative) of the relationship between variables, but they are limited in that they cannot prove that one variable causes another.

**Selection bias**: This occurs when individuals select into a specific group based on predetermined characteristics. When selection bias is present, estimates are inaccurate because the impact of the intervention cannot be isolated from the characteristics that caused the individual to be in that group in the first place.
For example, it is difficult to estimate the effect of Catholic schools on students because families self-select into Catholic schools; it is therefore hard to determine whether students' academic outcomes are a result of the school or of the family characteristics that led the student to be enrolled there in the first place. Selection bias can be curbed with the use of random selection/sampling and random assignment.

**Standard deviation**: This descriptive term refers to the spread of data around a mean (average). When looking at data, researchers often want one number that can represent, in general, what the data look like; often this number is a mean (also known as an average). However, this number alone cannot fully represent the data. A standard deviation tells us how representative the mean is of the rest of the data. A low standard deviation means the data points are close to the mean, whereas a high standard deviation indicates the data points are spread far from the mean and from each other. For example, if the average test score is 50 and the standard deviation is low, most students scored relatively close to 50; but if the standard deviation is high, then test scores in general were far from 50. A high standard deviation is usually an indicator that the mean is not a good representation of the data. A standard deviation is used either on its own to describe data or in statistical formulas to calculate other measures; for example, standard deviation is used to calculate an effect size.

**Type I error**: In hypothesis testing, this type of error occurs when a researcher finds an effect where there is no actual effect (also known as a false positive). Of course, for any single test there is no way to determine whether this has happened, since the true state of affairs is unobserved.
Type I error is associated with the term alpha (α), which is the probability of making this type of error. Alpha is a threshold that researchers set for themselves, preferably before they conduct the study. Most often in social science and education research, alpha is set at .05, meaning there is a 5% chance of making the error; over the long run, one out of every twenty tests of a true null hypothesis will incur a type I error.

**Type II error**: In hypothesis testing, this type of error occurs when the researcher fails to detect an effect that is actually present (also known as a false negative). As with type I errors, there is no way to determine whether a type II error has actually occurred in any single test. Type II error is associated with the term beta (β), which is the probability of making this type of error. The power, or sensitivity, of a test in detecting an effect is found by subtracting beta from 1 (1 − β). Researchers in social science and education research often aim for power of .80, which corresponds to beta being set at .20.
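
A few of the calculations described in the entries above can be illustrated with short, self-contained sketches. First, the difference-in-differences (DID) subtraction: the comparison group's change over time is subtracted from the treatment group's change. The districts, time points, and score averages below are entirely hypothetical, invented for illustration only.

```python
# Hypothetical average test scores for a district that adopted a new policy
# (treatment) and a similar district that did not (comparison), measured
# before and after the policy took effect.
treatment_pre, treatment_post = 62.0, 70.0
comparison_pre, comparison_post = 61.0, 64.0

# Each group's average change (gain) over time.
treatment_gain = treatment_post - treatment_pre    # 8.0
comparison_gain = comparison_post - comparison_pre  # 3.0

# The DID estimate: the comparison group's gain stands in for what would
# have happened to the treatment group without the policy (the counterfactual
# trend), so it is subtracted from the treatment group's gain.
did_estimate = treatment_gain - comparison_gain  # 8.0 - 3.0 = 5.0
```

Here the estimated effect of the policy is 5 points: both districts improved, but the treatment district improved by 5 points more than the comparison district's trend would predict.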
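
The effect size definition above (the difference of the two group means divided by their pooled standard deviation, i.e., Cohen's *d*) can be sketched as follows. The two lists of scores are hypothetical.

```python
import math
import statistics

# Hypothetical test scores for a treatment group and a control group.
treatment = [78, 85, 90, 72, 88, 81, 79, 86]
control = [70, 75, 80, 68, 74, 77, 72, 76]

def pooled_sd(a, b):
    """Pooled (shared) standard deviation of two groups."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))

def cohens_d(a, b):
    """Difference of the group means divided by the pooled standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)

d = cohens_d(treatment, control)
```

Because the resulting *d* exceeds 0.8, these hypothetical data would count as a large effect by Cohen's (1988) benchmarks; note that the standard deviation entry above is doing the work in the denominator.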
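
The numbering-and-drawing procedure described under random selection/sampling can be sketched with Python's standard library. The population size and sample size here are arbitrary.

```python
import random

# Hypothetical sampling frame: 500 individuals, each assigned a number.
population = list(range(1, 501))

random.seed(42)  # fixed seed so the draw is reproducible
# random.sample draws without replacement, giving every individual
# the same probability of being chosen.
sample = random.sample(population, k=50)
```

Each run with the same seed produces the same 50 distinct numbers; dropping the seed gives a fresh random sample each time.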
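
Finally, a small simulation illustrates the long-run interpretation of alpha given under Type I error: when the null hypothesis is true, a test run at alpha = .05 produces a false positive in roughly 5% of repeated tests. This sketch uses a simple two-sided z-test on simulated data with a known population standard deviation of 1; all settings are illustrative.

```python
import math
import random

random.seed(0)
trials = 10_000  # number of repeated studies to simulate
n = 30           # sample size in each study
rejections = 0

for _ in range(trials):
    # Draw a sample from a population where the null hypothesis is TRUE
    # (the mean really is 0), then run a z-test anyway.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * math.sqrt(n)  # z-statistic with sigma known to be 1
    if abs(z) > 1.96:  # the usual two-sided cutoff for alpha = .05
        rejections += 1

# Over many tests, the false-positive rate hovers near alpha = .05,
# i.e., about one rejection in every twenty tests of a true null.
false_positive_rate = rejections / trials
```

Every rejection counted here is a type I error by construction, since the simulated null hypothesis is true in every trial.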

**Further Reading**

Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). Hillsdale, NJ: Erlbaum.

Hedges, L.V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. *Journal of Educational Statistics, 6*(2), 107–128.

Kreft, I., & De Leeuw, J. (1998). *Introduction to multilevel modeling*. London: Sage Publications.

Shadish, W.R., Cook, T.D., & Campbell, D.T. (2002). *Experimental and quasi-experimental designs for generalized causal inference*. Boston, MA: Houghton Mifflin Company.