Multilevel Modeling for Educational Research with applications in R
2020-10-14
1 Preface
Multilevel modeling (MLM) is a powerful and flexible statistical framework for analyzing complex nested relationships across a wide variety of social science research problems. In education, we may be interested in factors that affect student achievement. Broadly, we may theorize factors associated with the school (school social groups, principal leadership, school size), the teachers (effectiveness of the teacher, specific expertise of the teacher, relationship of the teacher with the student), and the students themselves (motivation, previous achievement, general intelligence). Each of these factors associated with student achievement could be conceptualized as different “levels” of nesting. Students (at Level 1) are nested within classrooms (at Level 2), which are nested within schools (at Level 3), and each level potentially impacts student achievement. MLMs allow researchers to investigate these nested relationships and either parse them out (i.e., control for higher-level factors to examine the unique effect of a specific variable) or examine the impact of variables at the higher levels (e.g., the effect of attending public versus private school on student achievement). In other contexts, MLM may be used to investigate other nested relationships, including, for example, how employee interactions differ by the type of organization the employees belong to (e.g., corporate compared to local). In this book we will focus exclusively on the application of MLMs to educational research, but it’s important to keep in mind that the principles of multilevel modeling extend to a much wider variety of problems, including (as we will see) situations where the nesting is less clear.
Consider, for example, neighborhoods and schools. The nesting becomes less clear here because students from the same neighborhood may attend different schools, while students from different neighborhoods may attend the same school. In this case, students would be nested in neighborhoods, and students would be nested in schools, but schools are not nested in neighborhoods or vice versa. Such structures are referred to as non-nested or cross-classified, and MLMs can still readily handle them.
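As a rough illustration of the neighborhood and school example, the sketch below simulates students who each belong to one school and one neighborhood, with neither grouping nested in the other, and fits a model with a separate random intercept for each. It assumes the `lme4` package is installed; the data, the variable names (`school`, `neighborhood`, `achievement`), and the effect sizes are all invented for illustration and are not taken from the book.

```r
# Sketch only: lme4 and every variable name here are assumptions.
library(lme4)

set.seed(1)
n_students <- 600
n_schools  <- 20
n_nbhds    <- 15

# Each student gets a school and a neighborhood independently,
# so schools are not nested in neighborhoods (or vice versa).
school_id <- sample(seq_len(n_schools), n_students, replace = TRUE)
nbhd_id   <- sample(seq_len(n_nbhds),   n_students, replace = TRUE)

# Hypothetical school and neighborhood effects on achievement
school_eff <- rnorm(n_schools, mean = 0, sd = 3)
nbhd_eff   <- rnorm(n_nbhds,   mean = 0, sd = 2)

d <- data.frame(
  school       = factor(school_id),
  neighborhood = factor(nbhd_id),
  achievement  = 50 + school_eff[school_id] + nbhd_eff[nbhd_id] +
                 rnorm(n_students, mean = 0, sd = 5)
)

# Cross-classified model: one random intercept per grouping factor
m_crossed <- lmer(achievement ~ 1 + (1 | school) + (1 | neighborhood),
                  data = d)

# Estimated variance components for schools, neighborhoods, and residual
VarCorr(m_crossed)
```

By contrast, a strictly nested structure such as classrooms within schools can be written in `lme4` as `(1 | school/classroom)`, which expands to a random intercept for schools plus one for classrooms within schools.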
MLM is also particularly well suited for evaluating changes in data through growth models applied to longitudinal data. Growth models can be used to evaluate how individuals are changing over time on an outcome, and how specific variables at any level relate to where they started and/or the rate at which they change. For example, we could examine data from a cohort of students as they moved from kindergarten through grade 5 in urban and rural schools. We could then test whether the achievement of students attending urban schools differed significantly in kindergarten from that of students attending rural schools, and whether these same students differed in the rate at which they progressed during the six years of the study. We could also examine other characteristics of the students. For example, we could examine the rate at which students with one specific disability (e.g., autism spectrum disorder) progressed compared with students with a different disability (e.g., a learning disability), and whether this observed relationship held for students in both urban and rural schools.
The purpose of this book is to introduce readers to the core concepts of MLMs as applied to cross-sectional and longitudinal data. MLM is a complex topic, and few assumptions are made about readers’ familiarity with the topic outside of a basic understanding of regression. The book focuses on applied examples, including important decisions that analysts must make when building complex models: how to handle missing data in the predictors or the outcome, how estimated relations in longitudinal data carry forward (or not), which predictor variables vary randomly at higher levels, etc.
In the first chapter, Basic Concepts, we begin with a brief explanation of nested data structures and some of the problems they can pose if not modeled correctly. The primary components of a two-level model with cross-sectional data are then introduced and important elements of consideration discussed. The basic notation for models is also introduced, along with model building techniques and important statistics accompanying MLMs, including indicators of the overall model fit, the intraclass correlation coefficient (ICC), and the pseudo \(R^2\) statistic. All the basic concepts of MLMs are introduced in this section, which is concluded with an illustrated example using real data.
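To give a flavor of one of these statistics, the ICC for a two-level random-intercept model is commonly written as \(\rho = \tau_{00} / (\tau_{00} + \sigma^2)\), where \(\tau_{00}\) is the between-group (e.g., between-school) variance and \(\sigma^2\) is the within-group residual variance. These symbols follow a common convention and may differ from the notation used later in the book. The sketch below estimates the ICC from an intercept-only model fit to simulated data; it assumes the `lme4` package, and the data and variable names are hypothetical.

```r
# Sketch only: lme4, the simulated data, and all names are assumptions.
library(lme4)

set.seed(42)
n_schools <- 30
n_per     <- 25
school_id <- rep(seq_len(n_schools), each = n_per)

# Simulate scores with a between-school SD of 4 and a within-school SD of 8,
# so the ICC built into the simulation is 16 / (16 + 64) = 0.2.
u0 <- rnorm(n_schools, mean = 0, sd = 4)
d <- data.frame(
  school = factor(school_id),
  score  = 100 + u0[school_id] + rnorm(n_schools * n_per, mean = 0, sd = 8)
)

# Unconditional (intercept-only) two-level model
m0 <- lmer(score ~ 1 + (1 | school), data = d)

# Pull the variance components and form the ICC
vc     <- as.data.frame(VarCorr(m0))
tau00  <- vc$vcov[vc$grp == "school"]    # between-school variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # within-school variance
icc    <- tau00 / (tau00 + sigma2)
icc
```

The estimate will not match the simulated value exactly, because it is computed from a finite sample of schools and students.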
In Chapter 3, the principles introduced for cross-sectional data are extended to illustrate how the concept of nesting can be used to measure growth by treating time as being nested within a student. In other words, just as many students may be nested within a school in a model with cross-sectional data, so too can multiple test scores be nested within an individual with longitudinal data. A two-level growth model is first introduced. It is then shown how the notation and model-building strategies can be expanded to three levels. Considerable time is taken in Chapter 3 to reflect on specific issues related to growth modeling, including the linearity of the slope, the coding of time, and covariates that may vary by time.
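The idea of test scores nested within students can be sketched with data in "long" format, one row per student per measurement occasion. The example below is a minimal illustration, not the book's own analysis: it assumes the `lme4` package, simulates six waves (kindergarten through grade 5) for hypothetical students, and fits a two-level linear growth model in which both the starting point and the growth rate vary across students. Time is coded so that 0 corresponds to kindergarten, which makes the intercept interpretable as kindergarten status; this coding is one choice among several.

```r
# Sketch only: lme4, the simulated data, and all names are assumptions.
library(lme4)

set.seed(7)
n_students <- 200
waves      <- 0:5   # 0 = kindergarten, ..., 5 = grade 5

# Long format: one row per student per wave
student_id <- rep(seq_len(n_students), each = length(waves))
time       <- rep(waves, times = n_students)

# Hypothetical student-specific starting points and growth rates
b0 <- rnorm(n_students, mean = 40, sd = 6)
b1 <- rnorm(n_students, mean = 5,  sd = 1.5)

long <- data.frame(
  student = factor(student_id),
  time    = time,
  score   = b0[student_id] + b1[student_id] * time +
            rnorm(length(time), mean = 0, sd = 3)
)

# Two-level growth model: occasions (Level 1) nested in students (Level 2),
# with a random intercept and a random slope for time
m_growth <- lmer(score ~ time + (1 + time | student), data = long)

# Average starting point and average yearly growth
fixef(m_growth)
```

A third level (for example, students nested within schools) would add another grouping term to the random-effects part of the formula.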
MLM notation is discussed in considerable detail given that it is critical to understanding the specific model that has been applied. As stated earlier, it is also assumed that readers have a basic understanding of regression.
For readers not yet familiar with regression, I recommend reading the first two sections of the structural equation modeling manuscript from this series (Anderson, Patarapichayatham, & Nese, 2013).