La lecture à portée de main
Vous pourrez modifier la taille du texte de cet ouvrage
SAS Institute  Walter W. Stroup, George A. Milliken, Elizabeth A. Claassen, Russell D. Wolfinger
Vous pourrez modifier la taille du texte de cet ouvrage
Description
Discover the power of mixed models with SAS.
Mixed models—now the mainstream vehicle for analyzing most research data—are part of the core curriculum in most master’s degree programs in statistics and data science. In a single volume, this book updates both SAS® for Linear Models, Fourth Edition, and SAS® for Mixed Models, Second Edition, covering the latest capabilities for a variety of applications featuring the SAS GLIMMIX and MIXED procedures. Written for instructors of statistics, graduate students, scientists, statisticians in business or government, and other decision makers, SAS® for Mixed Models is the perfect entry for those with a background in twoway analysis of variance, regression, and intermediatelevel use of SAS.
Sujets
Informations
Publié par  SAS Institute 
Date de parution  12 décembre 2018 
Nombre de lectures  1 
EAN13  9781635261523 
Langue  English 
Poids de l'ouvrage  19 Mo 
Informations légales : prix de location à la page 0,0197€. Cette information est donnée uniquement à titre indicatif conformément à la législation en vigueur.
Exrait
The correct bibliographic citation for this manual is as follows: Stroup, Walter W., George A. Milliken, Elizabeth A. Claassen, and Russell D. Wolfinger . 2018. SAS for Mixed Models: Introduction and Basic Applications . Cary, NC: SAS Institute Inc.
SAS for Mixed Models: Introduction and Basic Applications
Copyright 2018, SAS Institute Inc., Cary, NC, USA
9781635261356 (Hardcopy) 9781635261547 (Web PDF) 9781635261523 (epub) 9781635261530 (mobi)
All Rights Reserved. Produced in the United States of America.
For a hardcopy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or ebook: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.72021(a), DFAR 227.72023(a), and DFAR 227.72024, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.22719 (DEC 2007). If FAR 52.22719 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government s rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 275132414
December 2018
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain thirdparty software, including but not limited to opensource software, which is licensed under its applicable thirdparty software license agreement. For license information about thirdparty software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses .
Contents
About This Book
Dedication and Acknowledgments
Chapter 1: Mixed Model Basics
1.1 Introduction
1.2 Statistical Models
1.3 Forms of Linear Predictors
1.4 Fixed and Random Effects
1.5 Mixed Models
1.6 Typical Studies and Modeling Issues That Arise
1.7 A Typology for Mixed Models
1.8 Flowcharts to Select SAS Software to Run Various Mixed Models
Chapter 2: Design Structure I: Single Random Effect
2.1 Introduction
2.2 Mixed Model for a Randomized Block Design
2.3 The MIXED and GLIMMIX Procedures to Analyze RCBD Data
2.4 Unbalanced TwoWay Mixed Model: Examples with Incomplete Block Design
2.5 Analysis with a Negative Block Variance Estimate: An Example
2.6 Introduction to Mixed Model Theory
2.7 Summary
Chapter 3: Mean Comparisons for Fixed Effects
3.1 Introduction
3.2 Comparison of Two Treatments
3.3 Comparison of Several Means: Analysis of Variance
3.4 Comparison of Quantitative Factors: Polynomial Regression
3.5 Mean Comparisons in Factorial Designs
3.6 Summary
Chapter 4: Power, Precision, and Sample Size I: Basic Concepts
4.1 Introduction
4.2 Understanding Essential Background for Mixed Model Power and Precision
4.3 Computing Precision and Power for CRD: An Example
4.4 Comparing Competing Designs ICRD versus RCBD: An Example
4.5 Comparing Competing Designs IIComplete versus Incomplete Block Designs: An Example
4.6 Using Simulation for Precision and Power
4.7 Summary
Chapter 5: Design Structure II: Models with Multiple Random Effects
5.1 Introduction
5.2 Treatment and Experiment Structure and Associated Models
5.3 Inference with Factorial Treatment Designs with Various Mixed Models
5.4 A SplitPlot Semiconductor Experiment: An Example
5.5 A Brief Comment about PROC GLM
5.6 Type Dose Response: An Example
5.7 Variance Component Estimates Equal to Zero: An Example
5.8 A Note on PROC GLM Compared to PROC GLIMMIX and PROC MIXED: Incomplete Blocks, Missing Data, and Spurious NonEstimability
5.9 Summary
Chapter 6: Random Effects Models
6.1 Introduction: Descriptions of Random Effects Models
6.2 OneWay Random Effects Treatment Structure: Influent Example
6.3 A Simple Conditional Hierarchical Linear Model: An Example
6.4 ThreeLevel Nested Design Structure: An Example
6.5 A TwoWay Random Effects Treatment Structure to Estimate Heritability: An Example
6.6 Modern ANOVA with Variance Components
6.7 Summary
Chapter 7: Analysis of Covariance
7.1 Introduction
7.2 OneWay Fixed Effects Treatment Structure with Simple Linear Regression Models
7.3 OneWay Treatment Structure in an RCB Design StructureEqual Slopes Model: An Example
7.4 OneWay Treatment Structure in an Incomplete Block Design Structure: An Example
7.5 OneWay Treatment Structure in a BIB Design Structure: An Example
7.6 OneWay Treatment Structure in an Unbalanced Incomplete Block Design Structure: An Example
7.7 Multilevel or SplitPlot Design with the Covariate Measured on the LargeSize Experimental Unit or Whole Plot: An Example
7.8 Summary
Chapter 8: Analysis of Repeated Measures Data
8.1 Introduction
8.2 Mixed Model Analysis of Data from Basic Repeated Measures Design: An Example
8.3 Covariance Structures
8.4 PROC GLIMMIX Analysis of FEV1 Data
8.5 Unequally Spaced Repeated Measures: An Example
8.6 Summary
Chapter 9: Best Linear Unbiased Prediction (BLUP) and Inference on Random Effects
9.1 Introduction
9.2 Examples Motivating BLUP
9.3 Obtainment of BLUPs in the Breeding Random Effects Model
9.4 MachineOperator TwoFactor Mixed Model
9.6 Matrix Notation for BLUP
9.7 Summary
Chapter 10: Random Coefficient Models
10.1 Introduction
10.2 OneWay Random Effects Treatment Structure in a Completely Randomized Design Structure: An Example
10.3 Random Student Effects: An Example
10.4 Repeated Measures Growth Study: An Example
10.5 Prediction of the Shelf Life of a Product
10.6 Summary
Chapter 11: Generalized Linear Mixed Models for Binomial Data
11.1 Introduction
11.2 Three Examples of Generalized Linear Mixed Models for Binomial Data
11.3 Example 1: Binomial ORing Data
11.4 Generalized Linear Model Background
11.5 Example 2: Binomial Data in a Multicenter Clinical Trial
11.6 Example 3: Binary Data from a Dairy Cattle Breeding Trial
11.7 Summary
Chapter 12: Generalized Linear Mixed Models for Count Data
12.1 Introduction
12.2 Three Examples Illustrating Generalized Linear Mixed Models with Count Data
12.3 Overview of Modeling Considerations for Count Data
12.4 Example 1: Completely Random Design with Count Data
12.5 Example 2: Count Data from an Incomplete Block Design
12.6 Example 3: Linear Regression with a Discrete Count Dependent Variable
12.7 Blocked Design Revisited: What to Do When Block Variance Estimate is Negative
12.8 Summary
Chapter 13: Generalized Linear Mixed Models for Multilevel and Repeated Measures Experiments
13.1 Introduction
13.2 Two Examples Illustrating Generalized Linear Mixed Models with Complex Data
13.3 Example 1: SplitPlot Experiment with Count Data
13.4 Example 2: Repeated Measures Experiment with Binomial Data
Chapter 14: Power, Precision, and Sample Size II: General Approaches
14.1 Introduction
14.2 Split Plot Example Suggesting the Need for a FollowUp Study
14.3 Precision and Power Analysis for Planning a SplitPlot Experiment
14.4 Use of Mixed Model Methods to Compare Two Proposed Designs
14.5 Precision and Power Analysis: A Repeated Measures Example
14.6 Precision and Power Analysis for NonGaussian Data: A Binomial Example
14.7 Precision and Power: Example with Incomplete Blocks and Count Data
14.8 Summary
Chapter 15: Mixed Model Troubleshooting and Diagnostics
15.1 Introduction
15.2 Troubleshooting
15.3 Residuals
15.4 Influence Diagnostics
15.5 Two Diagnostic Plots Useful for NonGaussian Data
15.5 Summary
Appendix A: Linear Mixed Model Theory
A.1 Introduction
A.2 Matrix Notation
A.3 Formulation of the Mixed Model
A.4 Estimating Parameters, Predicting Random Effects
A.5 Statistical Properties
A.6 Model Selection
A.7 Inference and Test Statistics
Appendix B: Generalized Linear Mixed Model Theory
B.1 Introduction
B.2 Formulation of the Generalized Linear Model
B.3 Formulation of the Generalized Linear Mixed Model
B.4 Conditional versus Marginal Models and Inference Space
B.5 Integral Approximation
References
Index
About This Book
What Does This Book Cover?
During the past 25 years, mixed models have become an integral part of statistical methodology. Nearly all areas of application that use statistics use mixed models in some form. Mixed models are taught in graduatelevel statistics courses, as well as disciplines outside traditional statistics. Mixed models are familiar to most statisticians. Nonetheless, many persons who are engaged in analyzing mixed model data have questions about the appropriate implementation of the methodology. In addition, given the rapid growth of degree programs in data science, as well as statistics, those who are new to the discipline and ready to extend their knowledge of statistical methods to mixed models need a place to start. Being an area of active research and methodological development, mixed models have everincreasing new applications capabilities available. Those who studied the topic several years ago may not be aware of these developments and need a resource that makes these advances accessible. This edition is intended to address the needs of this diverse audience.
Like the first two editions of SAS for Mixed Models , this third publication presents mixed model methodology in a setting that is driven by applications. The scope is both broad and deep. Examples represent numerous areas of application and range from introductory examples to technically advanced case studies. The book is intended to be useful to as diverse an audience as possible, although persons with some knowledge of analysis of variance and regression analysis will benefit most.
The first chapter provides important definitions and categorizations and delineates mixed models from other classes of statistical models. Chapters 2 through 10 cover specific forms of mixed models and the situations in which they arise. Randomized block designs ( Chapter 2 ) give rise to models with fixed treatment and random block effectsamong the simplest mixed models. These enable us to introduce elementary mixed model concepts and operations, and to demonstrate the use of SAS mixed model procedures in this simple setting. An overview of mean comparison procedures for various treatment designs is presented in Chapter 3 . The topic of power and sample size often means doing a power calculation for a designated design at the end of the planning process. However, power involves more than sample sizedifferent designs with the same sample size can yield very different power characteristics. Mixed models provide a powerful methodology for comprehensive assessment of competing plausible designs. Mixed model power and precision analysis is introduced in Chapter 4 . Studies with multiple levels, such as splitplot and hierarchical designs, are common in many areas of application. These give rise to models with multiple random effects. The analysis of the associated models is discussed in Chapter 5 . Chapter 6 considers models in which all effects are random, and it covers variance component estimation and inference on random effects. Chapter 7 covers analysis of covariance in the mixed model setting. Repeated measures in time or space and longitudinal data give rise to mixed models in which the serial dependency among observations can be modeled directly; this is the topic of Chapter 8 . Chapter 9 continues with inference on random effects, a topic begun in Chapter 6 . Chapter 9 is devoted to statistical inference based on best linear unbiased prediction of random effects. This naturally leads us to random coefficient and multilevel linear models ( Chapter 10 ).
The second edition of SAS for Mixed Models was published when the earliest version of the GLIMMIX procedure had just been released. Since then, new releases of PROC GLIMMIX have greatly expanded SAS capability to handle generalized linear mixed models (GLMMs), mixed models for nonGaussian data. Although the first two editions of SAS for Mixed Models devoted a single chapter to GLMMs, this edition devotes three. The GLMM is introduced in Chapter 11 with binomial data. Chapter 12 introduces GLMMs for count data. Chapter 13 covers multilevel and repeated measures designs in a GLMM context.
In Chapter 14 we revisit power and precision. Chapter 4 concerns simple designs and Gaussian data only, whereas Chapter 14 considers more complex designs and nonGaussian data. Chapter 14 showcases the full potential of GLMMbased precision and power analysis. Chapter 15 covers mixed model diagnostics based on residuals and influence analysis, as well as some troubleshooting strategies.
Good statistical applications require a certain amount of theoretical knowledge. The more advanced the application, the more an understanding of mixed models theoretical foundations will help. Although this book focuses on applications, theoretical developments are presented as well. Appendix A covers linear mixed model theory. Appendix B covers generalized linear mixed model theory. These appendices describe how mixed model methodology works, provide essential detail about the algorithms used by SAS mixed model software, and cover the assumptions underlying mixed model analysis. In addition to describing how mixed models work, these appendices should help readers understand why things are not working in cases (hopefully few) where problems arise.
Topics included in SAS for Mixed Models, Second Edition , but not appearing in this volume are as follows:
Bayesian analysis
spatial variability
heterogeneous variance models
the NLMIXED procedure
additional case studies
The authors have reserved these topics for a planned subsequent publication.
What s New in This Edition?
SAS for Mixed Models, Second Edition , has been the goto book for practitioners, students, researchers and instructors on mixed model methodology for more than a decade. PROC GLIMMIX is the most comprehensive and sophisticated mixed model software on the market. The current version of PROC GLIMMIX was released in 2008, two years after the publication of the second edition. This publication will be a worthy update incorporating developments over the past decade, building on the SAS for Mixed Models goto status and fully taking advantage of PROC GLIMMIX capabilities.
Some topics have been rearranged to provide a more logical flow, and new examples are introduced to broaden the scope of application areas. Nearly all examples have been updated to use PROC GLIMMIX as the onestop shop for linear modeling, whether fixed effect only, linear mixed models, or generalized linear mixed models. The chapters on GLMMs greatly expand on SAS for Mixed Models, Second Edition, as knowledge and software capability have both improved over the past decade. Expanded power and precision chapters enhance the researcher s ability to plan experiments for optimal outcomes. Statistical graphics now utilize the modern SGPLOT procedures.
Is This Book for You?
SAS for Mixed Models: Introduction and Basic Applications is useful to anyone wanting to use SAS for analysis of mixed model data. It is meant to be a comprehensive reference book for data analysts working with mixed models. It is a good supplementary text for a statistics course in mixed models, or a course in hierarchical modeling or applied Bayesian statistics. Mixed model applications have their roots in agricultural research, the behavioral sciences, and medical researchaspects of mixed model methodology arose somewhat independently in these three areas. But the same or similar methodology has proven to be useful in other subject areas, such as the pharmaceutical, natural resource, engineering, educational, and social science disciplines. We assert that almost all data sets have features of mixed models.
Not everyone will want to read the book from cover to cover. Readers who have little or no exposure to mixed models will be interested in the early chapters and can progress through later chapters as their needs require. Readers with good basic skills may want to jump into the chapters on topics of specific interest and refer to earlier material to clarify basic concepts.
To gain the most benefit from this book, ideally readers will have intermediate knowledge of SAS. More importantly, knowledge of some statistical ideas, such as multiple regression, analysis of variance, and experimental design, will ensure that the reader gains the most value from the book.
What Should You Know about the Examples?
This book includes examples for you to follow to gain handson experience with SAS.
Software Used to Develop the Book's Content
The software products used to develop the content for this book are as follows:
Base SAS 9.4 SAS/STAT 14.3 SAS/GRAPH 9.4
Example Code and Data
You can access the example code and data for this book by linking to its author pages at https://support.sas.com/authors .
Output and Figures
The tabular and graphical output in this book was generated with a SAS Output Delivery System style customized for optimal book print quality; therefore, your output will differ in appearance. Color versions of Figures 3.11 and 3.13 are included with the example code and data: https://support.sas.com/authors .
SAS University Edition
This book is compatible with SAS University Edition. If you are using SAS University Edition, then begin here: https://support.sas.com/uedata .
SAS Press Wants to Hear from You
Do you have questions about a SAS Press book that you are reading? Contact us at saspress@sas.com .
SAS Press books are written by SAS Users for SAS Users. Please visit sas.com/books to sign up to request information on how to become a SAS Press author.
We welcome your participation in the development of new books and your feedback on SAS Press books that you are using. Please visit sas.com/books to sign up to review a book
Learn about new books and exclusive discounts. Sign up for our new books mailing list today at https://support.sas.com/en/books/subscribebooks.html .
Learn more about these authors by visiting their author pages, where you can download free book excerpts, access example code and data, read the latest reviews, get updates, and more:
https://support.sas.com/en/books/authors/walterstroup.html
https://support.sas.com/en/books/authors/georgemilliken.html
https://support.sas.com/en/books/authors/elizabethclaassen.html
https://support.sas.com/en/books/authors/russellwolfinger.html
Dedication and Acknowledgments
This book is dedicated in memory of William L. Sanders (19422017).
Dr. Sanders was a professor of statistics at the University of TennesseeKnoxville, where he created the Agricultural Experiment Station Statistical Consulting unit in 1972 and served as its director until 2000. He then became Director of the Educational ValueAdded Assessment System (EVAAS) at SAS Institute. Bill served as Tennessee s representative to University Statisticians of Southern Experiment Stations (USSES), a multistate group formed in the 1960s.
USSES was instrumental in the original development of SAS software. In 1983, USSES initiated a project that practically revealed the power and usefulness of mixed model methodology and its widespread applicability. Bill was the project s public face and most effective advocate. The project completed its initial primary publication, Applications of Mixed Models in Agriculture and Related Disciplines, Southern Cooperative Series Bulletin No. 343 (1988, pp. 3948) just as USSES had its annual meeting that year at SAS Institute headquarters in Cary, North Carolina.
The pivotal event at that meeting was a dinner at which Bill sequestered SAS CEO Dr. Jim Goodnight, keeping him there into the wee hours of morning. At breakfast the next morning, Bill looked tired but very pleased. He had persuaded Dr. Goodnight to push well beyond the GLM procedure and the VARCOMP procedure to invest seriously in developing fullfledged mixed model capabilities in SAS/STAT. A direct result was the MIXED procedure, which then branched into the suite of mixed model procedures:
PROC NLMIXED
PROC HPMIXED
PROC HPLMIXED
PROC GLIMMIX
In addition, mixed model capabilities in JMP software and the MCMC procedure were developed. His foresight and persistence came full circle when the EVAAS operation began using the computationally intensive mixed model algorithms incorporated directly into the HPMIXED procedure.
This book would have been impossible without Bill s contributions to the USSES mixed model project, and his tireless advocacy at SAS Institute. Indeed, Bill s efforts figure prominently in the development of the comprehensive mixed model capability we see in SAS today. Bill was also an important mentor to many of us involved in this project as it matured. We enjoyed countless pleasant hours in his company, and he even taught us a lesson or two on the golf course, a game at which he was quite skilled. Acknowledging our indebtedness to Bill, we dedicate this edition to him. Rest in peace, Dr. Sanders.
We extend a special thanks to the editorial staff at SAS Press. We thank Julie McAlpine Palmieri, editorinchief, whose persistent prodding got this project off the ground. Our editor, Jenny Jennings Foerst, had the patience of a saint, as life events kept pushing deadlines back. The attention to detail our copyeditor, John West, has shown improved the quality of this book immeasurably.
We also especially thank Oliver Schabenberger for his tireless and immense effort on the second edition of SAS for Mixed Models , as well as on PROC GLIMMIX, on which much of the content here is based. His vast energy and acumen have carried him into a role of exceptionally high responsibility at SAS, which has prevented him from joining as an official coauthor. Nonetheless, we know he is with us in spirit and retains a love for mixed models.
We acknowledge Ramon Littell. As a coauthor on the first two editions of the SAS for Mixed Models series, and the SAS for Linear Models editions that set the stage for the mixed model series, Dr. Littell has been a prominent figure in the development of linear and mixed model methodology and application since the inception of SAS.
We acknowledge Dallas E. Johnson for longterm help on complex designs and resulting mixed models analysis, as well as for helping work with PROC GLM (even before PROC MIXED) to provide appropriate analyses of difficult and messy splitplot and blocked designs. These analyses have provided key groundwork for our interpretation of the results from PROC MIXED and PROC GLIMMIX.
We acknowledge Robert A. McLean, who, as a colleague of Bill Sanders at the University of Tennessee and a key contributor to the USSES regional mixed model project, played a crucial role in the events that ultimately led to the development of PROC MIXED.
Thanks to those reviewers whose feedback on draft chapters helped refine and clarify the text. A special shoutout goes to our colleagues from SASPhil Gibbs, Kathleen Kiernan, and Jill Taofor promoting, supporting, and clarifying these methods on the front line for so many years. You guys are the best. In SAS Research and Development, John Sall, Dave DeLong, Bob Rodriguez, Randy Tobias, Maura Stokes, Chris Gotwalt, Min Zhu, John Castelloe, and Tianlin Wang have all provided wise, instrumental guidance and code on which we were able to constantly draw.
Special thanks goes also to reviewers outside SAS: Billy Bridges Jr., Alumni Distinguished Professor of Mathematical Sciences, Clemson University, and Julia L. Sharp, Associate Professor and Director of the Franklin A Graybill Statistical Laboratory, Department of Statistics, Colorado State University. These two reviewers generously contributed much painstaking commentary that helped us improve the manuscript significantly.
We could not have written this book without the support, input, and energy of many individuals, groups, and organizations. Foremost, we need to thank our families for their patience, understanding, and support. Thanks to our respective employersthe University of Nebraska, Kansas State University, and SASfor giving us freedom to undertake this project. Thanks to mixed model researchers and statistical colleagues everywhere for shaping our thinking through their work. Much appreciation to the countless scientists, analysts, engineers, and researchers who shared their data sets and stories and allowed us to pass them along to our readers. The widespread use and popularity of the two books that this one updates have been overwhelming and humbling. Our hearts are full of gratitude to all who have effectively leveraged and utilized these methods in ways we never dreamed possible and then provided feedback. We celebrate your successes and wish you many more as we all continue to experiment and learn. Your strong interest and demand prompted this new, expanded edition.
Finally, Elizabeth thanks the veteran members of the author team for welcoming her enthusiastically and guiding her through her first foray into book authorship. In turn, we thank her for countless hours of work pulling together many loose ends into a coherent sequence and keeping the whole author team organized and on track.
Chapter 1: Mixed Model Basics
1.1 Introduction . 1
1.2 Statistical Models . 2
1.2.1 Simple Example with Two Treatments . 2
1.2.2 Model Characteristics . 2
1.2.3 Models with Subsampling . 3
1.2.4 Experimental Units . 4
1.2.5 Simple Hierarchical Design . 5
1.3 Forms of Linear Predictors . 6
1.4 Fixed and Random Effects . 8
1.5 Mixed Models . 9
1.6 Typical Studies and Modeling Issues That Arise . 10
1.6.1 Random Effects Model 10
1.6.2 Multilocation Example . 11
1.6.3 Repeated Measures and SplitPlot Experiments . 12
1.6.4 Fixed Treatment, Random Block, Nonnormal (Binomial) Data Example . 13
1.6.5 Repeated Measures with Nonnormal (Count) Data . 13
1.6.6 Repeated Measures and Split Plots with Nonlinear Effects . 13
1.7 A Typology for Mixed Models . 15
1.8 Flowcharts to Select SAS Software to Run Various Mixed Models . 16
1.1 Introduction
There is someone or something collecting data on everything that moves, grows, thinks or changes as time marches on. It is very important to have the appropriate tools to analyze the resulting data sets. But it is very important to identify the structure or structures embedded in a data set so analysts can select the appropriate tool or tools needed to extract useful information from the data set. This chapter provides guidelines to help identify data structures that require a generalized linear mixed model to extract necessary information. Types of data structures and types of models are discussed in the following sections.
Data sets presented in this book come from three different situations: (1) designed experiments, (2) sample surveys, and (3) observational studies. Virtually all data sets are produced by one of these three sources. The primary objectives of a study are influenced by the study s construction:
In designed experiments , the primary objective might be to compare two or more drug formulations in their ability to control high blood pressure in humans. The process is to apply treatments or formulations to experimental units (persons) and then observe the response (blood pressure). In a human clinical trial , the experimental units are volunteer patients who meet the criteria for participating in the study. The various drug formulations are randomly assigned to patients, their responses are subsequently observed, and the formulations are compared.
In sample surveys , data are collected according to a plan called a survey design , but treatments are not applied to units. Instead, the units, typically people, already possess certain attributes such as age or occupation. It is often of interest to measure the effect of the attributes on, or their association with, other attributes, as described by the primary objectives of the study.
In observational studies , data are collected on units that are available, rather than on units chosen according to a plan. An example is a study at a veterinary clinic in which dogs entering the clinic are diagnosed according to their skin condition, and blood samples are drawn for measurement of trace elements depending on the primary objectives. Alternatively, observational studies may use data that have already been collected. For example, an ecological study may use data collected on animal populations over the past several decades in order to better understand the impact of factors such as decreasing wetland area or suburban development on animal species of interest. All studies evolve from the primary objectives the researchers want to study.
The objectives of a project, the types of resources that are available, and the constraints on what kind of data collection is possible all dictate your choice of whether to run a designed experiment, a sample survey, or an observational study. Even though the three have striking differences in the way they are carried out, they all have common features or structures leading to a common set of statistical analysis tools.
For example, the terms factor , level , and effect are used alike in designed experiments, sample surveys, and observational studies. In designed experiments, the treatment condition under study (e.g., from examples you decide to use) is the factor, and the specific treatments are the levels . In the observational study, the dogs diagnosis is the factor and the specific skin conditions are the levels. In all three types of studies, each level has an effect; that is, applying a different treatment in a designed experiment has an effect on the mean blood pressure response. The different skin conditions show differences in their respective mean blood trace amounts. These concepts are defined more precisely in subsequent sections.
In this book, the term study refers to whatever type of project is relevant: designed experiment, sample survey, or observational study.
1.2 Statistical Models
Statistical models are mathematical descriptions of how the data conceivably can be produced. Models consist of at least two parts: (1) a formula relating the response to all explanatory variables (e.g., effects), and (2) a description of the probability distribution, or distributions, assumed to characterize random variation affecting the observed response. In addition to providing a description of how the data arose, statistical models serve as a template for how the data will be analyzed.
Although much of the focus of this book is on the template for analysis aspect of modeling, readers should note that when things go wrong, when implementation of a model goes off track, more often than not it is because not enough attention has been paid to how did the data arise? or this aspect of modeling has been disregarded altogether. Ideally, you should be able to simulate your data using your model in conjunction with random number generators. If you can t simulate a data set like the one you have or will have is a possible red flag indicating that the modeling assumptions may have gone bad.
Writing the model as a narrative of how the data arose in terms of a formula and probability distribution requires you to translate the study design into a plausible statistical model. The clear majority of modeling issues are really faulty designtomodel translation issues, and they often have their roots in inadequate understanding of basic design principles. Accordingly, our discussion of statistical models begins with a review of design concepts and vocabulary and a strategy for identifying models that are reasonable given the study design that produced the data we want to analyze.
1.2.1 Simple Example with Two Treatments
To illustrate, consider the paired comparison, a design introduced in all first courses in statistics. In a designed experiment, the paired comparison takes the form of a blocked design with two treatments. The pairs are referred to as blocks . Each block has two observations, one per treatment. In survey designs, the pairs take the form of strata or clusters. Observations in each stratum or cluster are taken on two levels of an attribute that plays a role analogous to a treatment. In observational studies, paired comparisons often take the form of matchedpair or casecontrol studies. That is, observations are matched retrospectively according to criteria deemed relevant to the study so that a member of each pair represents the two treatments. For example, a pair may be as alike as possible except that one is a smoker and the other is not.
1.2.2 Model Characteristics
From a modeling perspective, these different versions of the paired comparison share the same structure. Each has three sources of variation: the pair, the treatment, and anything unique to the unit within a pair not accounted for by pair or treatment. These sources of variation hold the key to translating a study design into a plausible model. First, however, a brief excursion into what not to do, and an approach many readers may find that they will need to unlearn.
Historically, these sources of variation give rise to the model equation, where residual is understood to mean anything unique to the unit within a pair not accounted for by the pair or by the treatment. :
Observation = Overall Mean + Treatment Effect + Pair Effect + Residual
This equation is a simple form of a linear statistical model and is a special case of what has become known as the general linear model . This approach to writing the model, called the model equation approach , works well when the observations can be assumed to be normally distributed and the design structure is simple. The observation=model effects + residual equationbased approach to modeling does not adapt well to modeling data from study designs with even a moderate amount of complexity, nor does it adapt well to modeling data from any design when normality cannot be assumed. The general linear model is not at all general by modern standards. This approach is illsuited for some of the study designs, models, and analyses that are considered in this book.
The linear mixed model extends the linear statistical model to allow appropriate analyses of data from study designs more typical of modern research, analytics and databased decision making. To specify a plausible model, start with the sources of variation, but take a different approach:
1. In the first step, you identify the unit at which the observations are taken. In study design terminology, this is called the unit of observation.
2. Next, write a plausible probability distribution to assume for the observations. For example, if you can assume normality, denote the observation on treatment i and pair j as y i j and write the distribution as y i j ~ N ( μ i j , σ 2 ) . On the other hand, if the observations are discrete counts, it may be more plausible to assume a Poisson or negative binomial distribution, such as y i j ~ Poisson ( λ i j ) .
3. Once you identify the distribution of the observations, the next step is to write an equation describing how the sources of variation in the study design affect the mean of the distribution of the observations. If you can assume normality, μ i j = μ + τ i + p j , where μ denotes the overall mean, τ i denotes the treatment effect and p j denotes the pair effect, is commonly used to describe how treatments and pairs affect the means. The righthand side of this equation is called the linear predictor . This strategy for writing the statistical model is called the probability distribution approach . Notice that the required elements of the model are (1) the probability distribution at the unit of observation level and (2) the linear predictor. Often, there are additional required elements, depending on the study design, the distribution of the observations, and the nature of the model effects.
Once you have identified the probability distribution and the sources of variation to include in the linear predictor, there are three issues that you need to resolve before a model is ready for implementation. These are:
What form should the linear predictor use to address the primary objectives of the study? The simplest version of this issue involves deciding if the source of variation is best modeled as a discrete effect, as in the τ i specification above, or as a continuous effect, as in regression models. See Section 1.3 for an introduction to regression models .
Which model effects are fixed and which are random? In other words, which effects arise from sampling a population and therefore must be treated as random variables with probability distributions, and which effects do not. The existence of random model effects is what distinguishes the general linear model from mixed models. Study designs with even a modest amount of complexity  e.g. the paired comparison  have effects that should be considered random. See Section 1.4 for an introduction to the fixed vs. random effect issue.
What should be on the lefthand side of the equal sign in the linear predictor? For normally distributed observations, μ i j is the obvious choice. For nonnormally distributed observations, there are usually better choices. The nonnormally distributed observations are discussed in Sections 1.3 and 1.6 and Chapters 11 through 13 on generalized linear models and generalized linear mixed models for a full presentation.
Before considering these issues, complete our study design and translation to model review to make sure you have the necessary tools for model construction and implementation. To complete this section, consider two common design formats. The first is a design with subsampling. The second is a hierarchical design, also called a multilevel design or a split plot design .
1.2.3 Models with Subsampling
First consider subsampling. The distinguishing feature of this type of design is that the unit of observation is a subset from the unit that receives the treatment. For example, suppose that a school evaluates a new curriculum by assigning one group of teachers to use the current curriculum and another group of teachers to use the new curriculum. The treatment is assigned at the teacher, or classroom, level. Suppose that the observations are test scores. These will be obtained from individual students in the class. Thus, the student is the unit of observation. In the language of design, the classroom is the experimental unit and the student is the sampling unit .
Variations on the design with subsampling include the following:
In a clinical trial, treatments may be assigned to clinics, and measurements taken on individual patients. In this case, clinic is the experimental unit; patient is the sampling unit.
In a retrospective study, a community may have been exposed to a toxic environmental condition or not. Measurements may come from individuals in the community. If they do, community is the experimental unit and individuals are the sampling units.
1.2.4 Experimental Units
Formally, experimental unit is defined as the smallest entity to which treatment levels are independently assigned. Emphasis on the word independently . Each classroom in theory can be assigned to use a different curriculum. While it is true that students are assigned a curriculum, individual students in a classroom cannot receive different curricula. Assignment to classrooms is independent; assignment to students is not. You can also think of the experimental unit as the unit of randomization or the unit of treatment assignment (see Chapters 4 and 5 of Milliken and Johnson, 2009, for a more detailed discussion).
While the term experimental unit appears to be specific to designed experiments, it is not. Survey designs and observational studies all have analogues to the experimental unit, such as the communities in the toxic exposure study. Correctly identifying the experimental unit is perhaps the single most important precondition to constructing a plausible model . With this in mind, consider the second format, the hierarchical or split plot design.
Suppose that in addition to a new curriculum, schools are also trying a new professional development program. The district assigns one group of schools to participate in the professional development program and another group of schools to serve as a control. Within each school, certain classrooms are assigned to use the current curriculum and other classrooms are assigned to use the new curriculum. Students in each classroom are given pre and posttests and their scores are used to measure the impact of curriculum and the professional development program. Now there are two types, or levels, of experimental unit. The unit of assignment for the professional development program is the school. The unit of assignment for curriculum is the classroom within a school. In other words, there are two sizes of experimental units; (1) the school and (2) the classroom. The sampling unit is the student within a classroom.
How do you translate these study designs into plausible models? One strategy, suggested by Stroup (2013), is to build on a suggestion due to Fisher in comments following the Yates (1935) presentation entitled Complex Experiments . Stroup (2013) called this strategy What Would Fisher Do? (WWFD ), but you can also think of it as the ANOVA table repurposed . The strategy consists of listing the components of the study design other than treatments, then separately listing the components of the study design that are the treatments (or analogues to treatments, as is the case in many surveys and observational studies), and finally combining the two. Fisher called the components of the design other than treatments the topographical aspect of the design Federer (1955) and Milliken and Johnson (2009) refer to them as the experiment design or design structure. Fisher called the treatment components the treatment aspect. Federer and Milliken and Johnson refer to them as the treatment design or treatment structure.
For the subsampling design, suppose that there are 10 schools that participate in the study and two classrooms per school, one assigned to the current curriculum, the other to the new. For simplicity, suppose there are 20 students per classroom. The ANOVA strategy leads to Table 1.1.
For the Experiment Design, there are 10 schools, so there are 10 1 or 9 degrees of freedom for schools. There are 2 classrooms per school so there is 2 1 = 1 df for classrooms within each school, thus there are 10 1 = 10 df for classrooms nested within schools. There are 20 students per classroom so there are 20  1 = 19 df for variability of students within each school and a total of 19 2 10 df for variability of students within classrooms pooled or nested across schools.
For the Treatment Design, there are 2 curriculums and thus 1 df for comparing curriculums. The parallels consist of all df that are not contributed to the treatments.
Next, combine the experiment and treatment designs where school and curriculum have the same df . The classroom(school) term df are reduced by 1 as the curriculum effect is a betweenclassroom withinschool effect. This effect is denoted as Classroom(School)Curriculum. The variability of student within classrooms pooled across classrooms within schools has 380 df , which is unchanged from the Experiment Design Column.
Table 1.1: Three Steps Demonstrating the Analyses of the Experiment Design, the Choice of the Treatment Design, and the Process of Combining the Two Designs into the Final ANOVA Table
Experiment Design
Treatment Design
Combined Experiment and
Treatment Designs
Source
df
Source
df
Source
df
School
9


School
9


Curriculum
1
Curriculum
1
Classroom(School)
10


Classroom(School  Curriculum
10  1 = 9
Student(Classroom)
380
Parallels
398
Student (Classroom)
380
Total
399
399
399
The experimental design components of the study design are school , classroom , and student . These are put in the lefthand side of Table 1.1 under Experiment Design. The treatment design component is curriculum. Fisher used the term parallels mainly as a placeholder for everything but treatment in the treatment design column. The righthand side of Table 1.1 combines the Experiment Design columns with the Treatment Design column to provide the basis for the statistical model. Notice that the Experimental Design factor, schools, and the treatment design factor, curriculum, move intact from their columns to the combined column. Also notice the placement of curriculum in the table, in the line immediately above classroom(school). This is important, because it signifies that classroom is the experimental unit with respect to the treatment factor curriculum.
In the combined column of Table 1.1, the classroom source of variation appears as Classroom(School)  curriculum. Read this as classroom nested within school given (or after accounting for the effects of ) curriculum. The degrees of freedom for this effect results from subtracting the curriculum degrees of freedom from the original classroom degrees of freedom in the experiment design column. The degrees of freedom columns within each of the three major columns are not strictly necessary, but they do help individuals new to data analysis, and the degrees of freedom in the combined column provide useful checks on the degree of freedom algorithms used by mixed models software, particularly SAS.
In traditional ANOVA, it is convention to relabel the last line as residual. This is a good habit to break. Here it is misleading, because student is not the experimental unit with respect to treatment, and hence the last line of the ANOVA does not measure experimental error for estimating or testing curriculum effects. For nonnormal data, the word residual has no meaning: relabeling the last line residual is not only misleading, it is nonsense.
Following the steps to construct the model for the schoolcurriculum example, first identify the unit of observation. In this case, it is Student(Classroom). Next, write the assumed distribution of the observations. If you can assume normality, this will be
y i j k ~ N ( μ i j , σ 2 )
Models for which normality cannot be assumed will be previewed in Section 1.6 and dealt with in full in Chapters 11  13 . The next step is to write the linear predictor. Assuming normality, an obvious candidate for the linear predictor is
μ i j = μ + s i + τ j + s τ i j
where s i denotes school effects, τ j denotes curriculum effects, and s τ i j denotes classroom(school)curriculum effects.
Note that in this design, knowing the school and curriculum treatment uniquely identifies the classroom. The final step is to decide which effects should be considered as random. Any effect in the linear predictor that is the experimental unit with respect to a treatment factor must be random, so you know s τ i j must be a random effect.
For further discussion about the school and curriculum effects, see Section 1.4.
1.2.5 Simple Hierarchical Design
The final example of designtomodel translation is the hierarchical design described above. As with the subsampling example, suppose there are 10 schools, 2 classrooms per school and 20 students per classroom. In addition, five schools are assigned to participate in the professional development program and the other five serve as the control. The resulting ANOVA table is Table 1.2.
Table 1.2: Experiment Design, Treatment Design, and Combination of the Experiment and Treatment Designs for the Hierarchical Design for Professional Development (PD) and Curriculum (C) Study
Experiment Design
Treatment Design
Combined Experiment and Treatment Designs
Source
df
Source
df
Source
df


Professional Development
1
Professional Development
1
Schools
9


SchoolPD
9  1 8


Curriculum
1
Curriculum
1


C × PD
1
C × PD
1
Classroom(School)
10


Classroom(school)  C, PD
10  2 8
Student (Classroom)
380
Parallels
396
Student(Classroom)  School. C, PD
380
Total
399
Total
399
Total
399
The experiment design components are school, classroom, and student. The treatment design components are the two levels of professional developments crossed with the two levels of curriculums, which form a 2 × 2 factorial treatment structure. Place professional development of the Treatment Design in the line above school, because school is the experimental unit with respect to professional development. Place curriculum of the Treatment Design in the line above classroom, because classroom is the experimental unit with respect to curriculum. The curriculum professional development interaction effect also goes immediately above the line for classroom, because classroom is the experimental unit for the interaction.
At this point, follow the steps as before. The unit of observation is student(classroom). Denote the observation as y ijk and write its assumed distribution. Then write the linear predictor using the other sources of variation. A plausible candidate is
μ + ρ i + s ( ρ ) i j + τ k + ρ τ i k + s ρ τ i j k
where ρ i denotes professional development effects, and
s ( ρ ) i j
denotes school after accounting for professional development also known as school nested within professional development effects τ k denotes curriculum effects, ρ τ i k denotes professional development curriculum interaction effect, and s ρ τ i j k denotes classroom(school) after accounting for professional development and curriculum effects (because classroom is uniquely identified by school, professional development and curriculum).
The effects s ( ρ ) i j and s ρ τ i j k must be random effects, because they are experimental units with respect to professional development and curriculum, respectively.
In principle, as Fisher implied in his comments that motivated Stroup to develop the WWFD ANOVA , if you follow this process, it should be clear how to proceed with modeling any study design of arbitrary complexity. Chapters 4 and 5 of Milliken and Johnson (2009) present an equally effective alternative using a graphical approach to demonstrate a process of constructing complex models.
1.3 Forms of Linear Predictors
As a broad overview, the linear predictor has two components and serves two purposes. The first component deals with the treatment design. The treatment component follows the nature of the treatment design and the study s objectives. Examples that follow in this section clarify how this is done. The second component of the linear predictor deals with the experiment design. In general, components of the experiment design should be considered random effects. A more detailed discussion of this issue appears in Section 1.4.
We begin our discussion of the treatment component of the linear predictor with a oneway treatment design. Consider an experiment with five drugs (say, A, B, C, D, and E) applied to subjects to control blood pressure where the primary objective is to determine if there is one drug that controls blood pressure better than the other drugs. Let μ A denote the mean blood pressure for subjects treated with drug A, and define μ B , μ C , μ D , and μ E similarly for the other drugs. Suppose that the experiment design is completely randomized and that we can assume that variation in blood pressure is normally distributed with mean μ i for the i th drug.
The simplest linear predictor to describe how treatments affect the mean is simply to use μ A through μ E . That is, for the j th observation on the i th drug,
y i j ~ N ( μ i , σ 2 )
and the linear predictor is μ i . This is called a means model because the only term in the linear predictor is the treatment mean.
The mean can be further modeled in various ways. You can define the effect of drug A as α A such that μ A = μ + α A , where μ is defined as the intercept. This form of the linear predictor is called an effects model (Milliken and Johnson 2009, Chapter 6 ). If the distribution of y i j is Gaussian, as given above, this is equivalent to the oneway analysis of variance (ANOVA) model y A j = μ + α A + e A j .
Note that the effects model has more parameters (in this case 6, μ and the 5 α i ) than factor levels (in this case 5). Such models are said to be overparameterized because there are more parameters to estimate than there are unique items of information. Such models require either placing a constraint on the solution to estimate the parameters, or using a generalized inverse to solve the estimating equations. The SAS linear model procedures discussed in this book all use a generalized inverse that is equivalent to constraining the last factor level, in this case α E , to zero. An alternative, called the sumtozero or settozero constraints (Milliken and Johnson (2009) Chapter 6 ) involves defining μ as the overall mean implying α A = μ A − μ and thus
∑ i = A E α i = 0
Its advantage is that if the number of observations per treatment is equal, it is easy to interpret. However, for designs with unequal observations per treatment, the sumtozero constraint becomes unwieldy, if not completely intractable, whereas alternative constraints are more generally applicable. In general, for effects models, the estimate of the mean μ A = μ + α A is unique and interpretable, but the individual components μ and the α i may not be.
Another approach to defining the linear predictor, which would be appropriate if levels A through E represented amounts, for example, doses of a drug given to patients, is to use linear regression. Specifically, let X A be the drug dose corresponding to treatment A, X B be the drug dose corresponding to treatment B, and so forth. Then the linear predictor μ i = β 0 + β 1 X i could be used to describe a linear increase (or decrease) in the mean blood pressure as a function of changing dose. Assuming normality, this form of the linear predictor is equivalent to the linear regression model y i = β 0 + β 1 X i + e .
One important extension beyond linear statistical models involves cases in which the response variable does not have a normal distribution. For example, suppose in the drug experiment that c i clinics are assigned at random to each drug, n i j subjects are observed at the j th clinic assigned to drug i , and each subject is classified according to whether a medical event such as a stroke or heart attack has occurred or not. The resulting response variable y i j can be defined as the number of subjects having the event of interest at the i j t h clinic, and y i j ~ Binomial ( n i j , π i ) where π i denotes the probability of a subject showing improvement when treated with drug i .
While it is technically possible to fit a linear model such as p i j = μ i + e i j , where p i j = y i j / n i j is the sample proportion and μ i = π i , there are conceptual problems with doing so. First, there is no guarantee that the estimate of π i will be between 0 and 1. Regression models are especially prone to producing nonsense estimators. Second, any model with a residual term e i j implicitly requires you to estimate the variance, σ 2 , as a separate act from estimating the mean. However, the variance of the binomial response variable in this example is n i j π i ( 1 − π i ) . Once you estimate the mean, you know the variancethere is no separate variance to estimate.
The residual term e i j is superfluous. A better model might be as follows: π i = 1 / ( 1 + e − η i ) and either η i = η + α i or η i = β 0 + β 1 X i , depending on whether the effectsmodel or regression framework discussed above is more appropriate. Notice that for nonnormal data, you replace μ by η in the linear predictor, because the linear predictor is not an estimate of the mean. In other contexts, modeling π i = Φ ( η i ) , where Φ ( • ) is the standard normal distribution , η i = η + α i or η i = β 0 + β 1 X i , may be preferable, because, for example, interpretation is better connected to the subject matter under investigation. The former are simple versions of logistic ANOVA and logistic regression models, and the latter are simple versions of probit ANOVA and regression. Both are important examples of generalized linear models .
Generalized linear models use a general function of a linear model to describe the expected value of the observations. Specifically, i is called the link function and the function that related the expected value to the link function, e.g. π i = 1 / ( 1 + e − η i ) , is called the inverse link . The linear predictor you equate to the link function is suggested by the design and the nature of the explanatory variables, similar to the rationale for ANOVA or regression models. The form of the link and inverse link are suggested by the probability distribution of the response variable. Note that the link function can be the linear model itself and the distribution can be normal; thus, standard ANOVA and regression models are in fact special cases of generalized linear models. Chapters 11 and 12 discuss mixed model forms of generalized linear models .
In addition to generalized linear models, another important extension involves nonlinear statistical models. These occur when the relationship between the expected value of the random variable and the treatment, explanatory, or predictor variables is nonlinear. Generalized linear models are a special case, but they require a linear model embedded within a nonlinear function of the mean. Nonlinear models may use any function, and may occur when the response variable has a normal distribution. For example, increasing amounts of fertilizer nitrogen (N) are applied to a crop. The observed yield can be modeled using a normal distributionthat is, y i j ~ N ( μ i , σ 2 ) . The expected value of y i j in turn is modeled by μ i = α i exp ( − exp ( β i − γ i X i ) ) , where X i is the i th level or amount of fertilizer N, α i is the asymptote for the i th level of N, γ i is the slope, and β i / γ i is the inflection point. This is a Gompertz function that models a nonlinear increase in yield as a function of N: the response is small to low N, then increases rapidly at higher N, then reaches a point of diminishing returns and finally an asymptote at even higher N. Mixed model forms of nonlinear models are not discussed in this book.
1.4 Fixed and Random Effects
The previous section considered models of the mean involving only an assumed distribution of the response variable and a function of the mean involving only factor effects that are treated as unknown constants. These are called fixed effects models . An effect is called fixed if the levels in the study represent all possible levels of the factor, or at least all levels about which inference is to be made. Notably, this includes regression models where the observed values of the explanatory variable cover the entire region of interest.
In the blood pressure drug experiment, the effects of the drugs are fixed if the five specific drugs are the only candidates for use and if conclusions about the experiment are restricted to those five drugs. You can examine the differences among the drugs to see which are essentially equivalent and which are better or worse than others. In terms of the model y i j = μ + α i + e i j , the effects α A through α E represent the effects of a particular drug relative to the intercept μ .
The drugs means, μ A = μ + α A , μ B = μ + α B , , μ E = μ + α E , and differences among drug means, for example, α A − α B , represent fixed, unknown quantities. Data from the study provide estimates about the five drug means and differences among them. For example, the sample mean from drug A, y ¯ A · is an estimate of the population mean μ A .
Notation Note : When data values are summed over a subscript, that subscript is replaced by a period. For example, y A · stands for y A 1 + y A 2 + ... + y A n . A bar over the summed value denotes the sample average. For example, y ¯ A · = n − 1 y A · .
The difference between two sample means, such as y ¯ A · − y ¯ B · , is an estimate of the difference between two population means μ A − μ B . The variance of the estimate y ¯ A · is n − 1 σ 2 and the variance of the estimate y ¯ A · − y ¯ B · is 2 σ 2 / n . In reality, σ 2 is unknown and must be estimated. Denote the sample variance for drug A by s A 2 , the sample variance for drug B by s B 2 , and similarly for drugs C, D, and E. Each of these sample variances is an estimate of σ 2 with n − 1 degrees of freedom. Therefore, the average of the sample variances, s 2 = ( s A 2 + s B 2 + ... + s E 2 ) / 5 , (called the pooled estimate of the variance ) is also an estimate of σ 2 with 5 ( n − 1 ) degrees of freedom. You can use this estimate to calculate standard errors of the drug sample means, which can in turn be used to make inferences about the drug population means. For example, the standard error of the estimate y ¯ A · − y ¯ B · is as follows:
2 s 2 / n
The confidence interval is as follows, where t α is the level, twosided critical value of the tdistribution with 5 ( n − 1 ) degrees of freedom:
( y ¯ A · − y ¯ B · ) ± t α 2 s 2 / n
Factor effects are random if they are used in the study to represent a sample (ideally, a random sample ) of a larger set of potential levels. The factor effects corresponding to the larger set of levels constitute a population with a probability distribution. The last statement bears repeating because it goes to the heart of a great deal of confusion about the difference between fixed and random effects: a factor is considered random if its levels plausibly represent a larger population with a probability distribution . In the blood pressure drug experiment, the drugs would be considered random if there are actually a large number of such drugs and only five were sampled to represent the population for the study. Note that this is different from a regression or response surface design, where doses or amounts are selected deliberately to optimize the estimation of fixed regression parameters of the experimental region. Random effects represent true sampling and are assumed to have probability distributions.
Deciding whether a factor is random or fixed is not always easy and can be controversial. Blocking factors and locations illustrate this point. In agricultural experiments blocking often reflects variation in a field, such as on a slope with one block in a strip at the top of the slope, one block on a strip below it, and so forth, to the bottom of the slope. One might argue that there is nothing random about these blocks. However, an additional feature of random effects is exchangeability . Are the blocks used in this experiment the only blocks that could have been used, or could any representative set of blocks from the target population be substituted? Treatment levels are not exchangeable: you cannot estimate the effects of drugs A through E unless you observe drugs A through E . But you could observe them on any valid subset of the target population. Similar arguments can be made with respect to locations. Chapter 2 considers the issue of random versus fixed blocks in greater detail. Chapter 6 considers the multilocation problem.
When the effect is random, we typically assume that the distribution of the random effect has mean zero and variance σ a 2 , where the subscript a refers to the variance of the distribution of treatment effects. If the drugs are randomly selected from a population of similar drugs then the drug effects are random, where σ a 2 denotes th e variance among drug effects in the population of drugs. The linear statistical model can be written in model equation form as y i j = μ + α i + e i j , where μ represents the mean of all drugs in the population, not just those observed in the study. Alternatively, you could write the model in probability distribution form by giving the conditional distribution of the observations y i j  a i ~ N ( μ + a i , σ 2 ) and the distribution of the drug effects, a i ~ NI ( 0 , σ a 2 ) . As noted earlier, the probability distribution form is preferred because it can be adapted for nonnormal data, whereas the model equation form cannot. Note that the drug effect is denoted a i rather than α i as in the previous model. This book follows a frequently used convention, denoting fixed effects with Greek letters and random effects with Latin letters. Because the drugs in this study are a sample, the effects a i are random variables with mean 0 and variance σ a 2 . The variance of y i j is Var [ y i j ] = Var [ μ + a i + e i j ] = σ a 2 + σ 2 .
1.5 Mixed Models
Fixed and random effects were described in the preceding section. Models that contain both fixed and random effects are called mixed models . Consider the blood pressure drug experiment from the previous sections, but suppose that we are given new information about how the experiment was conducted. The n subjects assigned to each drug treatment were actually identified for the study in carefully matched groups of five. They were matched for criteria such that they would be expected to have similar blood pressure history and response. Within each group of five, drugs were assigned so that each of the drugs A, B, C, D, and E was randomly assigned to exactly one subject. Further assume that the n groups of five matched subjects each was drawn from a larger population of subjects who potentially could have been selected for the experiment. This process of grouping experimental units is a form of blocking . The resulting study is a randomized complete block design (RCBD) with fixed treatment effects and random block effects.
In model equation form, the model is y i j = μ + α i + b j + e i j , where μ , α A , ..., α E represent unknown fixed parametersintercept and the five drug treatment effects, respectivelyand the b j and e i j are mutually independent random variables representing blocks (matched groups of five) and error, respectively. In the preferred probability distribution form, the required elements of the model are y i j  b j ~ N ( μ i j , σ 2 ) and μ i j = μ + α i + b j . Assume that the random block effects b j are independently and identically distributed with mean zero and variance σ b 2 . Additionally, for the model equation form, assume that the residual effects, e i j , are independently and identically distributed with mean zero and variance σ 2 . The variance of y i j , the observation of the randomly chosen matched set j assigned to drug treatment i , is Var [ y i j ] = σ b 2 + σ 2 . The difference between two drug treatment means (say, drugs A and B) within the same matched group is y A j − y B j . It is noteworthy that the difference expressed in terms of the model equation is
y A j − y B j = α A − α B + e A j − e B j
which contains no block or matched group effect. The term b j drops out of the equation. Thus, the variance of this difference is 2 σ 2 / n . The difference between drug treatments can be estimated free from matched group effects. On the other hand, the mean of a single drug treatment, y ¯ A • has variance ( σ b 2 + σ 2 ) / n , which does involve the variance among matched groups.
The randomized block design is just the beginning with mixed models. Numerous other experimental and survey designs and observational study protocols produce data for which mixed models are appropriate. Examples of mixed models include nested (or hierarchical) designs, clustered designs, splitplot experiments and repeated measures (also called longitudinal) studies. Each of these designs has its own model structure depending on how treatments or explanatory factors are associated with experimental or observational units and how the data are recorded. In nested and splitplot designs there are typically two or more sizes of experimental units. Variances and differences between means must be correctly assessed in order to make valid inferences.
Modeling variation is arguably the most powerful and important single feature of mixed models, and what sets it apart from conventional linear models. This extends beyond variance structure to include correlation among observations and, for nonnormal data, the impact of distribution on how variance and covariance are characterized. In repeated measures designs, discussed in Chapter 8 , measurements taken on the same unit close together in time are often more highly correlated than measurements taken further apart in time. The same principle occurs in two dimensions with spatial data. Care must be taken to build an appropriate covariance structure into the model. Otherwise, tests of hypotheses, confidence intervals, and possibly even the estimates of treatment means themselves may not be valid. The next section surveys typical mixed model issues that are addressed in this book.
1.6 Typical Studies and Modeling Issues That Arise
Mixed model issues are best illustrated by way of examples of studies in which they arise. This section previews six examples of studies that call for increasingly complex models.
1.6.1 Random Effects Model
In the first example, 20 packages of ground beef are sampled from a larger population. Three samples are taken at random from within each package. From each sample, two microbial counts are taken. Suppose you can reasonably assume that the log microbial counts follow a normal distribution. Then you can describe the data with the following linear statistical model:
y i j k = μ + p i + s ( p ) i j + e i j k
Here, y i j k denotes the k th log microbial count for the j th sample of the i th package. Because packages represent a larger population with a plausible probability distribution, you can reasonably assume that package effects, p i , are random. Similarly, sample within package effects, s ( p ) i j , and count, or error, effects, e i j k , are assumed random. Thus, the p i , s ( p ) i j , and e i j k effects are all random variables with means equal to zero and variances σ p 2 , σ s 2 , and σ 2 , respectively. This is an example of a random effects model . Note that only the overall mean is a fixed effects parameter; all other model effects are random.
The modeling issues are as follows:
How should you estimate the variance components σ p 2 , σ s 2 , and σ 2 ?
How should you estimate the standard error of the estimated overall mean, μ ^ ?
How should you estimate or, putting it more correctly predict random model effects p i , or s ( p ) i j if these are needed?
Mixed model methods primarily use three approaches for variance component estimation: (1) procedures based on expected mean squares from the analysis of variance (ANOVA); (2) maximum likelihood (ML); and (3) residual maximum likelihood (REML), also known as restricted maximum likelihood. Of these, ML is usually discouraged, because the variance component estimates are biased downward, and hence so are the standard errors computed from them (Stroup, 2013, Chapter 4 ). This results in excessively narrow confidence intervals whose coverage rates are below the nominal 1 level, and upwardly biased test statistics whose Type I error rates tend to be well above the nominal level. The REML procedure is the most versatile, but there are situations for which ANOVA procedures are preferable. PROC MIXED and GLIMMIX in SAS use the REML approach by default for normally distributed data. PROC MIXED also provides optional use of ANOVA and other methods when needed. Chapter 2 presents examples in which you would want to use ANOVA rather than REML estimation.
The estimate of the overall mean in the random effects model for packages, samples, and counts is
μ ^ = y ¯ · · · = ∑ y i j k / I J K
where I denotes the number of packages (20), J is the number of samples per package (3), and K is the number of counts per sample (2). Substituting the model equations yields
∑ ( μ + p i + s ( p ) i j + e i j k ) / I J K
and taking the variance yields the following:
Var [ μ ^ ] = V a r [ ∑ ( p i + s ( p ) i j + e i j k ) ] / ( I J K ) 2 = ( J K σ p 2 + K σ s 2 + σ 2 ) / I J K
If you write out the ANOVA table for this model, you can show that you can estimate Var [ μ ^ ] by MS(package) / ( I J K ) . Using this, you can compute the standard error of μ ^ by the following:
MS(package) / ( I J K )
Hence, the confidence interval for m becomes the following, where 1 is the confidence level and
t α / 2 , d f ( package )
is the twosided critical value from the t distribution and df ( package ) are the degrees of freedom associated with the package source of variation in the ANOVA table.
y ¯ · · · ± t α / 2 , d f ( package ) MS(package) / ( I J K )
The critical value can be computed using TINV(1 (alpha/2),dendf,0) in SAS.
If you regard package effects as fixed, you would estimate its effect as p ^ i = y ¯ i · · − y ¯ · · · . However, because the package effects are random variables, the best linear unbiased predictor ( BLUP ) is more efficient:
E [ p i  y ] = E [ p i ] + Cov [ p i , y ¯ i · · ] ( V a r [ y ¯ i · · ] ) − 1 ( y ¯ i · · − y ¯ · · · )
This leads to the BLUP:
p ^ i = ( σ p 2 ( J K σ p 2 + K σ s 2 + σ 2 ) / J K ) ( y ¯ i · · − y ¯ · · · )
When estimates of the variance components are used, the above is not a true BLUP, but an estimated BLUP, often called an EBLUP . Best linear unbiased predictors are used extensively in mixed models and are discussed in detail in Chapter 9 .
1.6.2 Multilocation Example
The second example appeared in Output 3.7 of SAS System for Linear Models, Fourth Edition (Littell et al. 2002). The example is a designed experiment with three treatments observed at each of eight locations. At the various locations, each treatment is assigned to between three and 12 randomized complete blocks. A possible linear statistical model is as follows, where L i is the i th location effect, b ( L ) i j is the ij th block within location effect, is the k th treatment effect, and is the ik th location by treatment interaction effect:
y i j k = μ + L i + b ( L ) i j + τ k + ( τ L ) i k + e i j k
The modeling issues are as follows:
Should location be a random or fixed effect?
Depending on issue 1, the F test for treatment depends on MS(error) if location effects are fixed or MS(location treatment) if location effects are random.
Also depending on issue 1, the standard error of treatment means and differences are affected.
The primary issue is one of inference space that is, the population to which the inference applies. If location effects are fixed, then inference applies only to those locations actually involved in the study. If location effects are random, then inference applies to the population represented by the observed locations . Another way to look at this is to consider issues 2 and 3. The expected mean square for error is s 2 , whereas the expected mean square for location treatment is 2 + k TL 2 , where TL 2 is the variance of the location treatment effects and k is a constant determined by a somewhat complicated function of the number of blocks at each location. The variance of a treatment mean is 2 / (number of observations per treatment) if location effects are fixed, but it is [ 2 + K ( TL 2 + L 2 )] / (obs/trt) if location effects are random. The inference space question, then, depends on what sources you believe contribute to uncertainty. If you believe all uncertainty comes from variation among blocks and experimental units within locations, you believe locations are fixed. If, on the other hand, you believe that variation among locations contributes additional uncertainty, then you believe locations are random. Issues of this sort first appear in Chapter 2 , and reappear in various forms throughout the rest of the book.
1.6.3 Repeated Measures and SplitPlot Experiments
Because repeated measures and splitplot experiments share some characteristics or structures, they have some modeling issues in common. Suppose that three drug treatments are randomly assigned to subjects, to the i th treatment. Each subject is observed at 1, 2, ..., 7, and 8 hours posttreatment. A possible model for this study is as follows, where represents treatment effects, t represents time (or hour) effects, and s( a ) represent the random subject within treatment effects:
y i j k = μ + α i + s ( α ) i j + τ k + ( α τ ) i k + e i j k
The main modeling issues here are as follows:
The experimental unit for the treatment effect (subject) and for time and time treatment effects (subject time) are different sizes, and hence these effects require different error terms for statistical inference. This is a feature common to splitplot and repeated measures experiments.
The errors, e i j k , are correlated within each subject. How best to model correlation and estimate the relevant variance and covariance parameters? This is usually a question specific to repeated measures experiments.
How are the degrees of freedom for confidence intervals and hypothesis tests affected?
How are standard errors affected when estimated variance and covariance components are used?
Chapter 8 discusses the various forms of splitplot experiments and appropriate analysis. You can conduct the analysis using either PROC GLIMMIX or PROC MIXED. When both PROC GLIMMIX and PROC MIXED can be used, examples in this edition use PROC GLIMMIX because of its greater versatility.
Repeated measures with normally distributed data use strategies similar to splitplots for comparing means. Chapter 8 builds on Chapter 5 by adding material specific to repeated measures data. Chapter 8 discusses procedures for identifying and estimating appropriate covariance matrices. Degree of freedom issues are first discussed in Chapter 2 and appear throughout the book. Repeated measures, and correlated error models in general, present special problems to obtain unbiased standard errors and test statistics. These issues are discussed in detail in Chapter 8 . Spatial models are also correlated error models and require similar procedures.
1.6.4 Fixed Treatment, Random Block, Nonnormal (Binomial) Data Example
The fourth example is a clinical trial with two treatments conducted at eight locations. At each location, subjects are assigned at random to the two treatments; n i j subjects are assigned to treatment i at location j . Subjects are observed to have either favorable or unfavorable reactions to the treatments. For the ij th treatmentlocation combination, y i j subjects have favorable reactions, or, in other words, p i j = y i j / n i j is the proportion of favorable reactions to treatment i at location j .
This study raises the following modeling issues:
Clinic effects may be random or fixed, raising inference space questions similar to those just discussed.
The response variable has a binomial, not normal, distribution.
Because of issue 2, the response may not be linear in the parameters, and the errors may not be additive, casting doubt on the appropriateness of a linear statistical model.
Also as a consequence of issue 2, variance is a function of the mean, and is therefore not homogeneous by treatment. In addition, residual in the traditional linear statistical model has no meaning. How one characterizes variability at the unit of observation level must be rethought.
A possible model for this study is a generalized linear mixed model. Denote the probability of favorable reaction to treatment i at location j by π i j . The generalized linear model is as follows:
log ( π i j 1 − π i j ) = η + τ i + c j + c t i j
Or alternatively it is as follows, where c i are random clinic effects, τ j are fixed treatment effects, and c τ i j are random clinic treatment interaction effects:
π i j = e η + τ i + c j + c t i j 1 + e η + τ i + c j + c t i j = 1 1 + e − ( η + τ i + c j + c t i j )
Generalized linear mixed models are discussed in Chapters 11 through 13 .
1.6.5 Repeated Measures with Nonnormal (Count) Data
The fifth example appears in Output 10.39 of SAS System for Linear Models, Fourth Edition (Littell et al. 2002). Two treatments are assigned at random to subjects. Each subject is then observed at four times. In addition, there is a baseline measurement and the subject s age. At each time of measurement, the number of epileptic seizures is counted. The modeling issues here are as follows:
Counts are not normally distributed.
Repeated measures raise correlated observation issues similar to those discussed previously, but with additional complications. Specifically, because residual has no meaning for distributions appropriate for count data, you cannot use the residual covariance structure, as you would with normally distributed data, to account for correlated repeated measurements. You must use an alternative approach.
The model involves both factor effects (treatments) and covariates (regression) in the same model, essentially, called analysis of covariance.
Chapter 7 introduces analysis of covariance in mixed models. Count data in conjunction with repeated measures is an important topic in generalized linear mixed models, and is discussed in Chapter 13 .
1.6.6 Repeated Measures and Split Plots with Nonlinear Effects
The final example involves five treatments observed in a randomized block experiment with 4 blocks. Each experimental unit is observed at several times over the growing season and percent emergence is recorded. Figure 1.1 shows a plot of the percent emergence by treatment over the growing season. Like the example in section 1.6.3, this is a repeated measures experiment, but the structure and model equation are similar to splitplot experiments, so similar principles apply to mixed model analysis of these data.
Figure 1.1: Treatment Means of Sawfly Data over Time
The modeling issues are as follows:
The usual mixed model and repeated measures issues discussed in previous examples; plus
The obvious nonlinear function required to describe percent emergence as a function of date.
A possible model for this experiment is as follows, where μ i j is the ij th treatment date mean, w i k is the random wholeplot error effect, and e i j k are the repeated measures errors, possibly correlated:
y i j k = μ i j + w i k + e i j k
The Gompertz model described earlier is a suitable candidate to model μ i j as a function of date j for treatment i. The model described here is an example of a nonlinear mixed model.
Alternatively, you could model the response by using a generalized linear mixed model, assuming that
y i j k  w i j k ~ Beta ( μ i j , ϕ )
and
w i k ~ N ( 0 , σ w 2 )
and
η i j k = log ( μ i j / ( 1 − μ i j ) ) = η + τ i + δ j + τ δ i j
where denotes the Beta scale parameter, denotes treatment effects and denotes date effects. The advantage of this model is that beta distribution is wellsuited to continuous proportion data. The disadvantage is that accounting for repeated measures correlation with generalized linear mixed models with a Beta distribution is difficult given the current state of the art.
1.7 A Typology for Mixed Models
From the examples in the previous section, you can see that contemporary mixed models cover a very wide range of possibilities. In fact, models that many tend to think of as distinct are, in reality, variations on a unified theme. Indeed, the model that only a generation ago was universally referred to as the general linear model fixed effects only, normal and independent errors, homogeneous varianceis now understood to be one of the more restrictive special cases among commonly used statistical models. This section provides a framework to view the unifying themes, as well as the distinctive features, of the various modeling options under the general heading of mixed models that can be implemented with procedures in SAS.
As seen in the previous example, the two main features of a statistical model are (1) a characterization of the mean , or expected value of the observations, as a function of model parameters and constants that describe the study design, and (2) a characterization of the probability distribution of the observations. The simplest example is a onefactor means model where the expected value of the observations on treatment i is μ i and the distribution is N ( μ i , σ 2 ) , which leads to the linear statistical model y i j = μ i + e i j . The fifth example of Section 1.5 provides a more complex example: the mean model is as follows:
π i j = 1 1 + e − ( η + τ i + c j + c t i j )
and the distribution has two partsthat of the random effects c j and ( c τ ) i j , and that of the observations given the random effects, essentially y j  c j , ( c τ ) i j ~ Binomial ( n i j , π i j ) . But each model follows from the same general framework.
Appendix A provides a more detailed presentation of mixed model theory. In what follows we present an admittedly simplistic overview that uses matrix notation, which is developed more fully at appropriate points throughout the book and in the appendix.
Models have two sets of random variables whose distributions we need to characterize: Y , the vector of observations, and u , the vector of random model effects. The models considered in this book assume that the random effects in the model follow a normal distribution, so that in general we assume u ~ MVN( 0 , G) that is, u has a multivariate normal distribution with mean zero variancecovariance matrix G . In a simple variance components model, such as the randomized block model given in Section 1.3, G = b 2 I .
By mean of the observations we can refer to one of two concepts: either the unconditional mean , E[ Y ] or the conditional mean of the observations given the random model effects, E[ Y  u ]. In a fixed effects model, the distinction does not matter, but for mixed models it clearly does. Mixed models are mathematical descriptions of the conditional mean in terms of fixed effect parameters, random model effects, and various constants that describe the study design. The general notation is as follows:
is the vector of fixed effect parameters.
X is the matrix of constants that describe the structure of the study with respect to the fixed effects. This includes the treatment design, regression explanatory or predictor variables, and the like.
Z is the matrix of constants that describe the study s structure with regard to random effects. This includes the blocking design, explanatory variables in random coefficient designs (see Chapter 10 ), etc.
The mixed model introduced in Section 1.4, where observations are normally distributed, models the conditional mean as E[ Y  u ] = X + Zu , and assumes that the conditional distribution of the observations given the random effects is Y  u ~ MVN( X + Zu, R ) , where R is the variancecovariance matrix of the errors. In simple linear models where errors are independent with homogeneous variances, R = 2 I . However, in heterogeneous error models (presented in Chapter 9 ) and correlated error models such as repeated measures or spatial models, the structure of R becomes very important. PROC GLIMMIX and PROC MIXED enable you to specify the structures of the G and R matrices.
The class of generalized linear mixed models (GLMM) has a linear model embedded within a nonlinear functionthat is, g (E[ Y  u ]) is modeled by X + Zu . The GLMM assumes normally distributed random model effects, but not necessarily normally distributed observations. For the general class accommodated by the GLMM variance of Yu may be characterized by a function of the mean, a scale parameter, or both. The generic expression for the variance is
Var ( Y  u ) = V μ 1 2 A V μ 1 2
where V μ = diag [ v ( μ ) ] , v ( μ ) is the variance function, and A denotes the scale matrix. For example, for the Poisson distribution, denoting the mean as E ( Y  u ) = λ , the variance function is v ( λ i ) = λ i and the scale matrix is A = I . Hence Var ( Y  u ) = diag ( λ i ) . For the normal distribution, the scale matrix is A=R and, because the variance does not depend on the mean, V μ = I . Hence Var ( Y  u ) = R . In certain GLMMs for repeated measures and spatially correlated data, A takes the form of a working correlation matrix. This is one, but not the only, GLMM approach for the analysis of such data.
In the most general mixed model included in SAS, the nonlinear mixed model (NLMM), the conditional mean is modeled as a function of X , Z , , and u with no restrictions; in essence, h ( X , β , Z , u ) models E[ Y  u ]. Each successive model is more restrictive. In NLMMs and GLMMs, the observations are not necessarily assumed to be normally distributed. The GLMM assumes a linear predictor, although it may be embedded in a nonlinear function of the meanin essence, h ( X , β , Z , u ) = h ( X β + Z u ) . The linear mixed model (LMM) does assume normally distributed observations and models the conditional mean directlythat is, you assume E[ Y  u ] = X + Zu . Each mixed model has a fixed effects model analog, which means that there are no random model effects and hence Z and u no longer appear in the model, and the model now applies to E[ Y ]. The term mixed model is often associated with the LMMit is the standard mixed model that is implemented in PROC MIXED. However, the LMM is a special case of a GLMM. The next section presents a flowchart to associate the various models with appropriate SAS software.
This text focuses on mixed models with linear predictors, specifically LMMs and GLMMs. Table 1.3 shows the various models and their features in terms of the model equation used for the conditional mean and the assumed distribution of the observations. Although nonlinear models are not covered in this text, they are included in Table 1.3 for completeness in describing types of mixed models and their fixedeffectonly analogs.
Table 1.3: Summary of Models, Characteristics, and Related Book Chapters
Type of Model
Model of Mean
Distribution
Chapter
GLMM
h ( X + Zu )
Yu general, u normal
1114
LMM*
X + Zu
Yu, u normal
210, 15
GLM
h ( X )
Y general
11 and 12
LM*
X
Y normal
24
NLMM
h ( X , β , Z , u )
Yu general, u normal
NA
NLM
h ( X , β )
Y general
NA
Note: Models with linear predictor X β and a nontrivial covariance structure that requires a RANDOM or REPEATED statement are considered mixed models.
1.8 Flowcharts to Select SAS Software to Run Various Mixed Models
SAS offers several procedures (PROCs) designed to implement the various mixed models introduced in the previous sections. PROC GLIMMIX is the flagship of the linear model procedures. It is capable of implementing a comprehensive array of GLMMs, and, because all other linear models are special cases of the GLMM, PROC GLIMMIX can compute analyses for LMM, GLM, and LM. PROC MIXED, introduced in 1992, was designed to implement LMMs. It remains a wellknown and oftused procedure, and has certain LMMspecific specialpurpose features not available in PROC GLIMMIX. SAS has several fixed effects model procedures: PROC GLM implements LMs, PROC NLIN implements NLMs, and PROC GENMOD implements GLMs.
There are also several procedures, e.g., PROC LOGISTIC and PROC LIFEREG, that implement special types of GLMs; PROC REG, which implements special types of LMs; and so forth. These specialpurpose procedures are not discussed in this book, but they are discussed in detail in other SAS publications as noted throughout this book. Note that PROC GLM was named before generalized linear models appeared, and it was named for general linear models ; these are now understood not to be general at all, but the most restrictive special case among the models described in Section 1.7, and are now known simply as linear models (LM).
For NLMMs, SAS offers PROC NLMIXED and the %NLINMIX macro. The %NLINMIX macro continues to have uses that are distinct and supplementary to the NLMIXED procedure.
Figures 1.2 and 1.3 provide flowcharts to help you select the appropriate model and software for your mixed model project. The basic questions that you need to ask are as follows:
Can you assume a normal distribution for your observations? If the model contains random effects, then this question refers to the conditional distribution of the data, given the random effects.
Can you assume that the mean or a transformation of the mean is linearly related to the model effects? Note that linear relation does not mean the absence of curvature. A quadratic (in X ) regression model β 0 + β 1 X + β 2 X 2 is a linear model in the s because all the terms in the model are additive. The linear component is termed the linear predictor. Generalized linear (mixed) models imply such linearity on a certain scale, i.e. the link function g ( · ) . On the other hand, the Gompertz regression equation (see Sections 1.3 and 1.6) is a nonlinear equation.
Are all effects (except errors) fixed? Or, are there random model effects?
Can you assume that the errors are independent? Or, as in repeated measures or spatial data, are errors possibly correlated?
A corollary to the previous question is, are the variances among the errors homogeneous? If the answer is no, then the same modeling strategies for correlated errors are also needed for heterogeneous errors.
Once you answer these questions, you can follow the flowchart to see what kind of model you have and what SAS procedure is appropriate. Then you can refer to the relevant chapter in this book for more information about the model and procedures.
Figure 1.2: Flowchart Indicating Tools in SAS/STAT for Normal Distributed Response
Figure 1.3: Flowchart Indicating Tools in SAS/STAT for Nonnormal Distributed Response
Chapter 2: Design Structure I: Single Random Effect
2.1 Introduction . 19
2.2 Mixed Model for a Randomized Block Design . 20
2.2.1 Means and Variances from Randomized Block Design . 21
2.2.2 The Traditional Method: Analysis of Variance . 21
2.2.3 Expected Mean Squares . 21
2.2.4 Example: A Randomized Complete Block Design . 22
2.3 The MIXED and GLIMMIX Procedures to Analyze RCBD Data . 23
2.3.1 PROC MIXED Analysis Based on Sums of Squares . 23
2.3.2 Basic PROC MIXED Analysis Based on Likelihood . 26
2.3.3 Basic PROC GLIMMIX Analysis . 28
2.3.4 Confidence Intervals for Variance Components . 29
2.3.5 Comments on Using PROC GLM for the Analysis of Mixed Models . 31
2.4 Unbalanced TwoWay Mixed Model: Examples with Incomplete Block Design . 32
2.4.1 Intrablock Analysis of PBIB Data . 33
2.4.2 Combined Intra and Interblock PBIB Data Analysis with PROC GLIMMIX .. 37
2.5 Analysis with a Negative Block Variance Estimate: An Example . 41
2.5.1 Illustration of the Problem ... 41
2.5.2 Use of NOBOUND to Avoid Loss of Type I Error Control 42
2.5.3 Reparameterization of the Model as Compound Symmetry . 43
2.6 Introduction to Mixed Model Theory . 44
2.6.1 Review of Regression Model in Matrix Notation . 44
2.6.2 The RCBD Model in Matrix Notation . 45
2.6.3 Inference Basics for the Randomized Block Mixed Model 46
2.7 Summary . 47
2.1 Introduction
The simplest design structures that raise mixed model issues are those with blocking. Blocking is a research technique used to diminish the effects of variation among experimental units. The units can be people, plants, animals, manufactured mechanical parts, or numerous other objects that are used in experimentation. Blocks are groups of units that are formed so that units within the blocks are as nearly homogeneous as possible. Examples of blocking criteria include batches of manufactured items, plots or benches containing plants, matched pairs of people, day on which an assay is performed, etc. In a designed experiment, the levels of the factor being investigated, called treatments , are randomly assigned to units within the blocks . However, as noted in Chapter 1 , blocking can more generally be understood as a grouping method used in survey sampling (e.g. strata or clustering), observational studies (e.g. matched pairs), and the like.
An experiment conducted using blocking is called a randomized block design . While the methods discussed in this chapter are presented in the context of randomized block designs, you can easily adapt these methods to survey or observational study contexts. Usually, the primary objectives are to estimate and compare the means of treatments (i.e. treatments as broadly defined). In most cases, the treatment effects are considered fixed because the treatments in the study are the only ones to which inference is to be made. That is, no conclusions will be drawn about treatments that were not used in the experiment. Block effects are usually considered random because the blocks in the study constitute only a small subset of a larger set of blocks over which inferences about treatment means are to be made. In other words, the investigator wants to estimate and compare treatment means with statements of precision (confidence intervals) and levels of statistical significance (from tests of hypotheses) that are valid in reference to the entire population of blocks, not just those blocks of experimental units in the experiment. To do so requires proper specification of random effects in model equations. In turn, computations for statistical methods must properly accommodate the random effects. The model for data from a randomized block design usually contains fixed effects for treatment contributions or factors and random effects for blocking factors contributions, making it a mixed model.
The issue of whether blocks effects are considered fixed or random becomes especially important in blocked designs with missing data, or incomplete block designs. Analysis with random block effects enables recovery of interblock information, and the resulting analysis is called combined inter and intrablock analysis. For estimation and testing treatment differences, analysis with or without recovery of interblock information is identical only in the case of a complete block design with no missing data. Otherwise, except where noted in this chapter, interblock information adds efficiency and accuracy to the analysis.
Section 2.2 presents the randomized block model as it is usually found in basic statistical methods textbooks. The traditional analysis of variance (ANOVA) methods are given, followed by an example to illustrate the ANOVA methods. Section 2.3 illustrates mixed model analysis using the GLIMMIX and MIXED procedures to obtain the results for the example. Section 2.4 presents an analysis of data from an incomplete block design to illustrate similarities and differences between analyses with and without recovery of interblock information with unbalanced data. Finally, Section 2.5 presents an example of an analysis with a negative block variance estimate. This presents a common dilemma for data analysts: does one allow the variance estimate to remain negative or does one set it to zero. This section presents the pros and cons of each alternative, as well as a general recommendation. Then, basic mixed model theory for the randomized block design is given in Section 2.6, including a presentation of the model in matrix notation.
2.2 Mixed Model for a Randomized Block Design
A design that groups experimental units into blocks is called a randomized block design . These have two forms: complete block and incomplete block. Complete block designs are generally referred to as randomized complete block designs , or by the acronym RCBD. In an RCBD, each treatment is applied to an experimental unit in each block. In incomplete block designs, only a subset of the treatments is assigned to experimental units in any given block. The balanced incomplete block and partially balanced incomplete block (acronym BIB and PBIB, respectively) are two common examples of this type of design. Blocked designs with missing data share modeling issues and approaches with incomplete block designs. In mostbut not allcases, each treatment is assigned to at most one experimental unit in a given block. See Section 2.4, Milliken and Johnson (2009) and Mead, et al. (2012) for complete discussions of block design strategy and structure.
Whether complete or incomplete, all randomized block designs share a common model. Assume that there are t treatments and r blocks, and that there is one observation per experimental unit. Once the treatments are selected to appear in a given block, each selected treatment is randomly assigned to one experimental unit in that block. In general, there will be N total experimental units. For complete block designs, because each of the t treatments is assigned to one experimental unit in each of the r blocks, there are N = tr experimental units altogether. For incomplete block designs with the same number of experimental units in each block, there are N = rk experimental units, where k denotes the number of experimental units per block.
The conventional assumptions for a randomized block model are as follows:
Letting y i j denote the response from the experimental unit that received treatment i in block j , the equation for the model is as follows:
y i j = μ + τ i + b j + e i j (2.1)
where the terms are defined as follows:
i = 1, 2, ..., t
j = 1, 2, ..., r
and i are fixed parameters such that the mean for the i th treatment is μ i = μ + τ i
b j is the random effect associated with the j th block
e ij is the random error associated with the experimental unit in block j that received treatment i
Assumptions for random effects are as follows:
Block effects are distributed normally and independently with mean 0 and variance σ b 2 ; that is, the b j ( j = 1,2, , r ) are distributed iid N ( 0 , σ b 2 ) .
Errors e ij are distributed normally and independently with mean 0 and variance 2 ; that is, the e ij ( i = 1,2, , t ; j = 1,2, , r ) are distributed iid N ( 0 , σ 2 ) . The e ij are also distributed independently of the b j .
2.2.1 Means and Variances from Randomized Block Design
The usual objectives of a randomized block design are to estimate and compare treatment means using statistical inference. Mathematical expressions are needed for the variances of means and differences between means in order to construct confidence intervals and conduct tests of hypotheses. The following results apply to complete block designs. Once these results are in place, you can adapt them for incomplete blocks, as shown below.
For the RCBD, it follows from Equation 2.1 that a treatment mean, such as y ¯ 1 · , can be written as follows:
y ¯ 1 · = μ 1 + b ¯ · + e ¯ 1 ·
Likewise, the difference between two means, such as y ¯ 1 · − y ¯ 2 · , can be written as follows:
y ¯ 1 · − y ¯ 2 · = μ 1 − μ 2 + e ¯ 1 · − e ¯ 2 ·
From these expressions, the variances of y ¯ 1 · and y ¯ 1 · − y ¯ 2 · are
Var [ y ¯ 1 · ] = ( σ 2 + σ b 2 ) / r
and
Var [ y ¯ 1 · − y ¯ 2 · ] = 2 σ 2 / r
Notice that the variance of a treatment mean Var [ y ¯ 1 · ] contains the block variance component σ b 2 , but the variance of the difference between two means Var [ y ¯ 1 · − y ¯ 2 · ] does not involve σ b 2 . This is the manifestation of the RCBD controlling block variation; the variances of differences between treatments are estimated free of block variation.
2.2.2 The Traditional Method: Analysis of Variance
Almost all statistical methods textbooks present analysis of variance (ANOVA) as a key component in analysis of data from a randomized block design. We assume that readers are familiar with fundamental concepts for analysis of variance, such as degrees of freedom, sums of squares (SS), mean squares (MS), and expected mean squares (E[MS]). Readers needing more information about analysis of variance may consult Littell, Stroup, and Freund (2002) or Milliken and Johnson (2009). Table 2.1 shows a standard ANOVA table for the RCBD, containing sources of variation, degrees of freedom, mean squares, and expected mean squares.
Table 2.1: ANOVA for Randomized Complete Blocks Design
Source of Variation
df
MS
E[MS]
Blocks
r  1
MS(Blk)
σ 2 + t σ b 2
Treatments
t  1
MS(Trt)
σ 2 + r ϕ 2
Error
(r  1)( t  1)
MS(Error)
σ 2
2.2.3 Expected Mean Squares
As the term implies, expected mean squares are the expectations of means squares. They are the quantities estimated by mean squares in an analysis of variance. The expected mean squares can be used to motivate test statistics, and to provide a way to estimate the variance components. For test statistics, the basic idea is to examine the expected mean square for a factor and see how it differs under null and alternative hypotheses. For example, the expected mean square for treatments, E[MS(Trt)] = σ 2 + r ϕ 2 , can be used to determine how to set up a test statistic for treatment differences. The null hypothesis is H 0 : 1 = 2 = = t . The expression 2 in E[MS(Trt)] is
ϕ 2 = ( t − 1 ) − 1 ∑ i = 1 t ( μ i − μ ¯ · ) 2
where μ ¯ is the mean of the i . Thus, 2 = 0 is equivalent to 1 = 2 = = t . So, if the null hypothesis is true, MS(Trt) simply estimates 2 . On the other hand, if H 0 : 1 = 2 = = t . is false, then E[MS(Trt)] estimates a quantity larger than 2 . Now, MS(Error) estimates 2 regardless of whether H 0 is true or false. Therefore, MS(Trt) and MS(Error) tend to be approximately equal if H 0 is true, and MS(Trt) tends to be larger than MS(Error) if H 0 : 1 = 2 = = t is false. So a comparison of MS(Trt) with MS(Error) is an indicator of whether H 0 : 1 = 2 = = t is true or false. In this way the expected mean squares show that a valid test statistic is the ratio F = MS(Trt)/MS(Error).
Expected mean squares can also be used to estimate variance components, variances of treatment means, and variances of differences between treatment means. Equating the observed mean squares to the expected mean squares provides the following system of equations:
MS ( Blk ) = σ ^ 2 + t σ ^ b 2 MS ( Error ) = σ ^ 2
The solution for the variance components is
σ ^ 2 = MS ( Error )
and
σ ^ b 2 = 1 t [ MS ( Blk ) − MS ( Error ) ]
These are called analysis of variance estimates of the variance components. Using these estimates of the variance components, it follows that estimates of Var [ y ¯ 1 · ] and Var [ y ¯ 1 · − y ¯ 2 · ] are
V a ^ r [ y ¯ 1 · ] = ( σ ^ 2 + σ ^ b 2 ) / r = 1 r t MS ( Blk ) + t − 1 r t MS ( Error )
and
V a ^ r [ y ¯ 1 · − y ¯ 2 · ] = 2 r MS(Error)
The expression for
Var [ y ¯ 1 · ]
illustrates a common misconception that the estimate of the variance of a treatment mean from a randomized block design is simply MS(Error)/ r . This misconception prevails in some textbooks and results in incorrect calculation of standard errors by some computer software packages, as well as incorrect reporting in refereed journal articles
2.2.4 Example: A Randomized Complete Block Design
An example from Mendenhall, Wackerly, and Scheaffer (1996, p. 601) is used to illustrate analysis of data from a randomized block design.
Data for this example are presented as Data Set Bond . There are seven blocks and three treatments. Each block is an ingot of a composition material. The treatments are metals (nickel, iron, or copper). Pieces of material from the same ingot are bonded using one of the metals as a bonding agent. The response is the amount of pressure required to break a bond of two pieces of material that used one of the metals as the bonding agent. Table 2.2 contains the analysis of variance table for the BOND data where the ingots form the blocks.
Table 2.2: ANOVA Table for BOND Data
Source of Variation
df
SS
MS
F
p value
Ingots
6
268.29
44.72
4.31
0.0151
Metal
2
131.90
65.95
6.36
0.0131
Error
12
124.46
10.37
The ANOVA table and the metal means provide the essential computations for statistical inference about the population means.
The ANOVA F = 6.36 for metal provides a statistic to test the null hypothesis H 0 : c = i = n . The significance probability for the F test is p = 0.0131, indicating strong evidence that the metal means are different. Estimates of the variance components are σ ^ 2 = 10.37 and σ ^ b 2 = ( 44.72 − 10.37 ) / 3 = 11.45. Thus, an estimate of the variance of a metal mean is ( σ ^ 2 + σ ^ b 2 ) / 7 = 3.11 , and the estimated standard error is 3.11 = 1.77. An estimate of the variance of a difference between two metal means is 2 σ ^ 2 / 7 = 2 × 10.37 / 7 = 2.96 , and the standard error is 2.96 = 1.72.
2.3 The MIXED and GLIMMIX Procedures to Analyze RCBD Data
PROC GLIMMIX and PROC MIXED are procedures with several capabilities for different methods of analyzing mixed models. PROC MIXED can be used for linear mixed models (LMMs), i.e., when you can assume that the response variable has a Gaussian distribution. PROC MIXED enables you to estimate the variance components using sums of squares and expected mean squares, as described in the previous section or by using likelihood methods. PROC GLIMMIX can be used for LMMs and generalized linear mixed models (GLMMs; i.e., for both Gaussian and nonGaussian response variables). PROC GLIMMIX uses only likelihoodbased methods.
For the randomized block examples presented in this chapter, and for more complex LMM applications presented in Chapters 5 through 10 , analyses obtained using PROC MIXED or PROC GLIMMIX are essentially identical. For certain advanced LMMs, not presented in this volume, PROC MIXED offers specialized capabilities that are not available in PROC GLIMMIX. On the other hand, for GLMMs with nonGaussian data, discussed in Chapters 11 through 13 , and for inference on variance components, presented in Chapter 6 , PROC GLIMMIX provides capabilities that are not available in PROC MIXED. For this reason, in this section, analyses of an RCBD are shown using both procedures, but all subsequent examples in this volume use PROC GLIMMIX.
In both PROC MIXED and PROC GLIMMIX, many of the estimation and inferential methods are implemented on the basis of the likelihood function and associated principles and theory (see Appendix A , Linear Mixed Model Theory, for details). Readers may be more familiar with the analysis of variance approach described in the previous section; those results are obtained and presented in Section 2.3.1. The likelihood method results are presented in Section 2.3.2. Output from both PROC MIXED and PROC GLIMMIX are presented so readers can see that the results are the same, but the presentation format is slightly different. The results of the analysis of variance and likelihood methods are compared and are shown to duplicate many of the results of the previous section.
There are extensive postprocessing options for mean comparison estimation, testing, and plotting available with both procedures. Presentation of these options, focusing on the more extensive options available with PROC GLIMMIX, are deferred to Chapter 3 .
2.3.1 PROC MIXED Analysis Based on Sums of Squares
This section contains the code to provide the analysis of the RCBD with PROC MIXED using the sums of squares approach as described in Section 2.2.4. The METHOD=TYPE3 option is used to request that Type 3 sums of squares be computed along with their expected mean squares. Those mean squares and expected mean squares are used to provide estimates of the variance components and estimates of the standard errors associated with the means and comparisons of the means.
Program
Program 2.1 shows the basic PROC MIXED statements for the RCBD data analysis.
Program 2.1
proc mixed data=bond cl method=type3;
class ingot metal;
model pres = metal;
random ingot;
lsmeans metal;
run;
The PROC MIXED statement calls the procedure. The METHOD=TYPE3 option requests that the Type 3 sums of squares method be used in estimating the variance components. You can request Type 1, 2, or 3 sums of squares. See Milliken and Johnson (2009) or Littell and Stroup (2002) for additional detail. The CLASS statement specifies that INGOT and METAL are classification variables, not continuous variables.
The MODEL statement is an equation whose lefthand side contains the name of the response variable to be analyzed, in this case PRES. The righthand side of the MODEL statement contains a list of the fixed effect variables, in this case the variable METAL. In terms of the statistical model, this specifies the i parameters. (The intercept parameter is implicitly contained in all models unless otherwise declared by using the NOINT option.)
The RANDOM statement contains a list of the random effects, in this case the blocking factor INGOT, and represents the b j terms in the statistical model.
The MODEL and RANDOM statements are the core essential statements for many mixed model applications, and the terms in the MODEL statement do not appear in the RANDOM statement, and vice versa.
Results
Results from the MODEL and RANDOM statements about the methods used appear in Output 2.1.
Output 2.1: Results of RCBD Data Analysis from PROC MIXED Using Type 3 Sums of Squares
Model Information
Data Set
WORK.BOND
Dependent Variable
pres
Covariance Structure
Variance Components
Estimation Method
Type 3
Residual Variance Method
Factor
Fixed Effects SE Method
ModelBased
Degrees of Freedom Method
Containment
Class Level Information
Class
Levels
Values
ingot
7
1 2 3 4 5 6 7
metal
3
c i n
Dimensions
Covariance Parameters
2
Columns in X
4
Columns in Z
7
Subjects
1
Max Obs per Subject
21
Number of Observations
Number of Observations Read
21
Number of Observations Used
21
Number of Observations Not Used
0
Interpretation
The Model Information table contains the model specifications for the data set being used, the response variable, the methods used to estimate the variance components, the approximate degrees of freedom, and the standard errors for the fixed effects.
The Class Level Information table lists the levels for each of the variables declared in the class statement. You should be sure that these levels are specified consistently with how the study was conducted.
The Dimensions table shows how many columns are in the fixed effects matrix ( X ) and in the random effects matrix ( Z ) parts of the model, where the linear predictor is X + Zu (see Section 1.7). For this study there are three levels of the treatment factor (metal) plus an intercept, which accounts for four columns in the X matrix. There are seven ingots (blocks), thus there are seven columns in the Z matrix. The inclusion of the RANDOM statement means that there is one variance component for the ingot effects, plus the residual variance, providing two parameters in the covariance structure of the model. There is no SUBJECT= option used in this RANDOM statement, so PROC MIXED assumes that all observations are from the same subject, a quantity that can be ignored here.
The Number of Observations table indicates how many observations are in the data set and how many of those observations had valid data values for all variables used in the analysis. The difference between the number in the data set and the number used is the number of observations not used in the analysis. The information in these dimension specifications must match the information that is expected from the design being analyzed. Checking these values can help determine if there are data errors that need to be addressed, because they can cause the analysis to fail.
Results
Statistical results from the MODEL and RANDOM statements appear in Output 2.2.
Output 2.2: Results of the RCBD Data Analysis from PROC MIXED Using Type 3 Sums of Squares to Estimate the Variance Components
Type 3 Analysis of Variance
Source
DF
Sum of Squares
Mean Square
Expected Mean Square
Error Term
Error DF
F Value
Pr F
metal
2
131.900952
65.950476
Var(Residual) + Q(metal)
MS(Residual)
12
6.36
0.0131
ingot
6
268.289524
44.714921
Var(Residual) + 3 Var(ingot)
MS(Residual)
12
4.31
0.0151
Residual
12
124.459048
10.371587
Var(Residual)
.
.
.
.
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
Z Value
Pr Z
Alpha
Lower
Upper
ingot
11.4478
8.7204
1.31
0.1893
0.05
5.6438
28.5394
Residual
10.3716
4.2342
2.45
0.0072
0.05
5.3332
28.2618
Type 3 Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
metal
2
12
6.36
0.0131
Interpretation
The Type 3 Analysis of Variance table is the usual analysis of variance table with degrees of freedom, sums of squares, mean squares, expected mean squares, error terms for effects other than the residual, F tests, and significance levels for these tests. The terms Var(Residual) and Var(ingot) denote the variance components 2 and σ b 2 , respectively. See the discussion of the Tests of Fixed Effects table for more detail.
The Covariance Parameter Estimates table gives estimates of the variance component parameters obtained by solving the set of equations from equating the observed mean squares to the expected mean squares. The estimate of σ b 2 , the block variance component, is 11.4478 (labeled ingot ), and the estimate of 2 , the error variance component, is 10.3716 (labeled Residual ). The confidence intervals for the variance components are Wald confidence intervals.
The Tests of Fixed Effects table is like an abbreviated ANOVA table, showing a line of computations for each term in the MODEL statement. In this example, only METAL is included in the MODEL statement. The F statistic is used to test the null hypothesis H 0 : c = i = n vs. H a (not H 0 ). With 2 numerator and 12 denominator degrees of freedom, the F value of 6.36 is significant at the 5% level ( p value is 0.0131). If the true METAL means are equal, then an F value as large as 6.36 would occur less than 131 times in 10,000 by chance. This is the same F test that was obtained from the analysis of variance.
In summary, these basic PROC MIXED computations are based on sums of squares and provide the same statistical computations obtained from analysis of variance methods for a balanced data set.
2.3.2 Basic PROC MIXED Analysis Based on Likelihood
Both PROC MIXED and PROC GLIMMIX, by default, provide maximum likelihood estimates (acronym MLE) of model effects, and REML estimates of variance components. REML stands for REsidual (or REstricted) Maximum Likelihood (Patterson and Thompson 1971). A fundamental strength of likelihoodbased methodology is its adaptability. For randomized block models, analysis of variance and likelihoodbased methods produce identical results, but analysis of variance methods cannot be applied to most cases that are even slightly more complex than the randomized block, whereas likelihoodbased methods can be applied to arbitrarily complex models.
When comprehensive mixed model software first became widely availablein the early 1990ssome questioned the use of REML as the default variance estimation method. Specifically, why not maximum likelihood (ML) estimates of the variance? While not shown here, you can obtain ML variance estimates by using METHOD=ML in PROC MIXED or METHOD=MSPL in PROC GLIMMIX. The resulting variance estimates will be less that the corresponding REML estimates, and the resulting confidence intervals will be narrower and test statistics will be greater. This reflects the wellknown fact that ML estimates of variance components are biased downward. For example, in the onesample case, when y 1 , y 2 , ... , y n is a random sample from N ( μ , σ 2 ) , the ML estimate of the variance is as follows:
∑ i ( y i − y ¯ ) 2 / n
whereas the sample variancewhich is the simplest REML variance estimateis as follows:
∑ i ( y i − y ¯ ) 2 / ( n − 1 )
We know that the latter is unbiased and universally regarded as the preferred variance estimate. One can easily show that the use of ML variance estimates results in upwardly biased type I error rates (rejection rates as high as 25% for a nominal α = 0.05 ), and inadequate confidence interval coverage.
Program
Program 2.2 uses the default REML method for estimating the variance components. One could exclude METHOD=REML in the PROC MIXED statement, and achieve the same results. The assumptions of normality of the various terms in the model Equation 2.1 are required in order to construct the appropriate likelihood function that is maximized. The code to provide the likelihoodbased analysis is identical to that of the sums of squares method, except for the method specification.
Program 2.2
proc mixed data=bond method=reml;
class ingot metal;
model pres=metal;
random ingot;
run;
The PROC MIXED statement invokes the procedure for the default method of estimation, REML. The CLASS, MODEL, and RANDOM statements are identical to those in Section 2.3.1.
Results
The results of Program 2.2 appear in Output 2.3.
Output 2.3: Results of RCBD Data Analysis from PROC MIXED METHOD =REML
Model Information
Data Set
WORK.BOND
Dependent Variable
pres
Covariance Structure
Variance Components
Estimation Method
REML
Residual Variance Method
Profile
Fixed Effects SE Method
ModelBased
Degrees of Freedom Method
Containment
Class Level Information
Class
Levels
Values
ingot
7
1 2 3 4 5 6 7
metal
3
c i n
Dimensions
Covariance Parameters
2
Columns in X
4
Columns in Z
7
Subjects
1
Max Obs per Subject
21
Number of Observations
Number of Observations Read
21
Number of Observations Used
21
Number of Observations Not Used
0
Iteration History
Iteration
Evaluations
2 Res Log Like
Criterion
0
1
112.40987952
1
1
107.79020201
0.00000000
Convergence criteria met.
Covariance Parameter Estimates
Cov Parm
Estimate
ingot
11.4478
Residual
10.3716
Type 3 Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
metal
2
12
6.36
0.0131
Differences between results in Output 2.3 and Output 2.2 include the following:
The Model Information table shows that REML is the specified method of estimating the variance components.
The Iteration History table shows the sequence of evaluations to obtain (restricted) maximum likelihood estimates of the variance components. This portion of the output is not critical to most applications, such as the present RCBD analysis.
The Covariance Parameter Estimates table gives estimates of the variance component parameters. The REML estimate of σ b 2 , the block variance component, is 11.4478 (labeled ingot ), and the estimate of 2 , the error variance component, is 10.3716 (labeled Residual ). For this example of a balanced data set, these variance component estimates are identical to the estimates obtained from the analysis of variance method.
Notice that the essential output you would report; that is, the variance component estimates and the test statistics for the null hypothesis of no treatment effectin essence, the F value, 6.36, and p value, 0.0131are identical to the results using analysis of variance.
In summary, the default PROC MIXED computations are based on likelihood principles, but many of the results are the same as those obtained from analysis of variance methods for the RCBD.
2.3.3 Basic PROC GLIMMIX Analysis
You can use PROC GLIMMIX to compute the same analysis as PROC MIXED METHOD=REML. Because PROC GLIMMIX is the most general of the SAS mixed model procedures, most examples from this point forward use PROC GLIMMIX. The RCBD data set is shown using both procedures to enable you to see the similarities, as well as some minor differences in the format of the results.
Program
Program 2.3 shows the PROC GLIMMIX program corresponding to PROC MIXED Program 2.2 in Section 2.3.2.
Program 2.3
proc glimmix data=bond method=rspl;
class ingot metal;
model pres=metal;
random ingot;
run;
The only difference is that RSPL replaces REML in the METHOD option. RSPL (Residual Subjectspecific Pseudo Likelihood) is a generalized form of the REML algorithm that can be used for generalized linear mixed models (GLMMs), essentially mixed models with nonGaussian response variable. The more general algorithm is required to enable PROC GLIMMIX to accommodate nonGaussian data. Chapters 11 , 12 and 13 cover GLMMs. Details of the RSPL algorithm are given in Appendix B . The distinction between RSPL and REML is only relevant in those chapters. With Gaussian response variablesin essence, when the data are assumed to have a normal distributionthe RSPL algorithm reduces to REML. For Gaussian data, RSPL and REML are one and the same.
Results
Output 2.4 shows selected results.
Output 2.4: Results of RCBD Data Analysis from PROC GLIMMIX
Model Information
Data Set
WORK.BOND
Response Variable
pres
Response Distribution
Gaussian
Link Function
Identity
Variance Function
Default
Variance Matrix
Not blocked
Estimation Technique
Restricted Maximum Likelihood
Degrees of Freedom Method
Containment
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
ingot
11.4478
8.7204
Residual
10.3716
4.2342
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
metal
2
12
6.36
0.0131
Default output from PROC GLIMMIX is similar to default REML output from PROC MIXED. They differ in that the PROC GLIMMIX table of Covariance Parameter Estimates, includes a column for the standard error whereas PROC MIXED does not. For small data sets, the standard error of the variance component estimate is not too useful, because it is based on too few degrees of freedom. Confidence intervals for variance components based on a Satterthwaite approximation or the profile likelihood are useful when Wald type confidence intervals are not. Satterthwaite and profile likelihood confidence intervals are discussed in the next section.
2.3.4 Confidence Intervals for Variance Components
Confidence intervals can be used when it is of interest to access the uncertainty about the variance components in the model. A (1 ) 100% confidence interval about 2 can be constructed by using the chisquare distribution, as
( b − 1 ) ( t − 1 ) σ ^ 2 χ ( 1 − α / 2 ) , ( b − 1 ) ( t − 1 ) 2 ≤ σ 2 ≤ ( b − 1 ) ( t − 1 ) σ ^ 2 χ ( α / 2 ) , ( b − 1 ) ( t − 1 ) 2
where
χ ( 1 − α / 2 ) , ( b − 1 ) ( t − 1 ) 2
and
χ α / 2 , ( b − 1 ) ( t − 1 ) 2
are the lower and upper α / 2 percentage points of a central chisquare distribution with ( b 1) ( t 1) degrees of freedom, respectively. When the estimate of σ b 2 is positive, an approximate (1 ) 100% confidence interval about σ b 2 can be constructed using a Satterthwaite (1946) approximation. The estimate of σ b 2 is a linear combination of mean squares, which in general can be expressed as
σ ^ b 2 = ∑ i = 1 s q i M S i
where the i th mean square is based on f i degrees of freedom and q i is the constant by which the i t h mean square is multiplied to obtain σ ^ b 2 . The approximate number of Satterthwaite degrees of freedom associated with σ ^ 2 b is as follows:
v = ( σ ^ b 2 ) 2 ∑ i = 1 s [ ( q i M S i ) 2 ] / f i
For the randomized complete block, the expression is the following:
σ ^ b 2 = 1 t ( MS ( Blk ) − MS ( Error ) )
The approximate number of degrees of freedom is as follows:
v = ( σ ^ b 2 ) 2 ( t − 1 MS ( Blk ) ) 2 b − 1 + ( t − 1 MS ( Error ) ) 2 ( b − 1 ) ( t − 1 )
A (1 ) 100% confidence interval about σ b 2 can be constructed using the chisquare distribution, as the following, where χ ( 1 − α / 2 ) , v 2 and χ α / 2 , v 2 are the lower and upper /2 percentage points with degrees of freedom, respectively:
v σ ^ b 2 χ ( 1 − α / 2 ) , v 2 ≤ σ b 2 ≤ v σ ^ b 2 χ α / 2 , v 2
Program to Obtain Satterthwaite Approximation Confidence Intervals
You can use either PROC GLIMMIX or PROC MIXED to obtain Satterthwaite approximation confidence intervals about σ b 2 and σ 2 . With PROC MIXED, use the COVTEST and CL options in the PROC statement. With PROC GLIMMIX, use the COVTEST statement with the CL option. Program 2.4 shows the PROC GLIMMIX statements.
Program 2.4
proc glimmix data=bond;
class ingot metal;
model pres=metal;
random ingot;
covtest / cl;
run;
Results
The results of computing the estimate of the variance components and using the Satterthwaite approximation to construct the confidence interval about σ b 2 are given in Output 2.5.
Output 2.5: Wald Confidence Intervals for Block and Residual Variance from the PROC GLIMMIX COVTEST CL Option
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
Wald 95% Confidence Bounds
ingot
11.4478
8.7204
3.8811
121.55
Residual
10.3716
4.2342
5.3332
28.2618
The 95% confidence interval for the block (INGOT) variance is (3.88, 121.55) and for the residual variance is (5.33, 28.26). The confidence intervals denoted as Wald confidence intervals are in fact Satterthwaite approximate confidence intervals. The Satterthwaite degrees of freedom are computed as df = 2 * Z 2 , where Z = Estimate/(Standard Error). The confidence interval is as follows:
d f * Estimate χ 1 − α / 2 , d f 2 ≤ σ 2 ≤ d f * Estimate χ α / 2 , d f 2
For the Ingot variance component, terms are as follows:
Z = 11.4478 / 8.7204 = 1.313
d f = 2 * ( 1.313 2 ) = 3.45
χ 0.025 , 3.45 2 = 0.3246
χ 0.975 , 3.45 2 = 10.17
The 95% confidence for the Ingot variance is as follows:
3.45 * 11.4478 10.17 ≤ σ Ingot 2 ≤ 3.45 * 11.4478 0.3246
or
3 .881 ≤ σ Ingot 2 ≤ 121.55
which is the same as shown in Output 2.5.
For all but very large data sets, the Satterthwaite confidence bounds are more accurate than Wald confidence bounds and therefore recommended. You can obtain Satterthwaite bounds using either PROC GLIMMIX or PROC MIXED. An alternative procedure, available only with PROC GLIMMIX, uses the likelihood ratio. Let σ denote the vector of covariance parameters, and log L ( σ ^ ) denote the restricted log likelihood given the REML estimates of the parameters of σ . For the ingot example,
σ ′ = [ σ b 2 σ 2 ]
Let
log L ( σ ^  σ ˜ b 2 )
denote the restricted log likelihood for a given valuenot necessarily the REML or ML estimateof the block variance, denoted σ ˜ b 2 and the estimate of the other variance components, in this case σ 2 , given σ ˜ b 2 . We know that − 2 log ( Λ ) where Λ denotes the likelihood ratio, can be written as follows:
2 { log L ( σ ^ ) − log L ( σ ^ 2  σ ˜ b 2 ) }
And we know that it has an approximate χ 2 distribution. Just as the Satterthwaite approximation confidence interval contains all variance component values such that the test statistic, v σ ^ b 2 / σ b 2 , is between upper and lower quantiles of the χ ν 2 distribution, you can form a 95% confidence interval for σ b 2 from the set of all σ ˜ b 2 such that the likelihood ratio test statistic,
2 { log L ( σ ^ ) − log L ( σ ^ 2  σ ˜ b 2 ) } < χ 2
You can obtain profile likelihood confidence intervals for a variance component in two ways. The profile likelihood ratio (PLR) reestimates all the other covariance parameters for each new value of the parameter for which the confidence interval is being determined. The empirical likelihood ratio (ELR) uses the REML estimate of σ 2 to calculate the likelihood ratio for all values of σ b 2 being evaluated. The latter is computationally simpler and is adequate for blocked designs. You can obtain empirical profile likelihood confidence intervals using the following modification to the COVTEST statement.
covtest / cl(type=elr);
Output 2.6 shows the result.
Output 2.6: Estimated Profile Likelihood Confidence Intervals for Block and Residual Variance
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
Estimated Likelihood 95% Confidence Bounds
Lower
Upper
Bound
Pr Chisq
Bound
Pr Chisq
ingot
11.4478
8.7204
2.2907
0.0500
56.4772
0.0500
Residual
10.3716
4.2342
5.1386
0.0500
25.2825
0.0500
Notice that the confidence bounds are noticeably more precise, especially for the block variance.
2.3.5 Comments on Using PROC GLM for the Analysis of Mixed Models
Prior to the advent of mixed model methodsPROC MIXED was introduced in the early 1990sPROC GLM was the principal SAS procedure for analyzing mixed models, even though the basic computations of PROC GLM are for fixed effects models. Statistical methods textbooks continued to present the analysis of blocked designs using PROC GLM well into the 2000s. For the complete block designs with no missing data, the GLM procedure produces results similar to the PROC MIXED analysis of variance output shown in Section 2.3.1. However, this is not true for incomplete blocks designs or any blocked designs (complete or incomplete) with missing data. PROC GLM was not designed to solve mixed model estimating equations or to compute mixed model inferential statistics. Specifically, the RANDOM statement in PROC GLM does not modify estimation or inference as do RANDOM statements in PROC GLIMMIX and PROC MIXED. The RANDOM statement in PROC GLM merely assigns sums of squares to be used to construct F values and standard errors. The sums of squares, however, are computed as if all effects are fixed.
As a result, you cannot use PROC GLM to implement any of the mixed models analyses shown subsequently in this book. In many cases, PROC GLM does implement an analysis that would have been considered state of the art in the 1970s. However, these analyses are known to be less accurate than the corresponding mixed model analyses. In many cases, the standard errors and test statistics obtained by PROC GLM do not correctly account for random effects. PROC GLM is an excellent tool when used for what it was intended (fixedeffectsonly models), but we emphatically discourage its use for all applications beyond the RCBD, beginning with incomplete block designs or RCBDs with missing data. Refer to Littell, Stroup, and Freund (2002) for more detailed PROC GLM coverage.
2.4 Unbalanced TwoWay Mixed Model: Examples with Incomplete Block Design
In some applications of blocking there are not enough experimental units in each block to accommodate all treatments. Incomplete block designs are designs in which only a subset of the treatments is applied in each block. The treatments that go into each block should be selected in order to provide the most information relative to the objectives of the experiment.
Three types of incomplete block designs are balanced incomplete block designs (BIBD), partially balanced incomplete block design (PBIBD), and unbalanced incomplete block design. The word balanced has a specific meaning for incomplete block designs. In design theory, the meaning of balanced for BIB and PBIB designs results in all treatment mean estimates having the same variances (and hence the same standard error). Also, the variances of estimated treatment mean differences are the same for all pairs of treatments with BIBDs and for sets of treatments with PBIBDs. As you may suspect, it is not possible to construct BIB or PBIB designs for all possible numbers of treatments and blocks. Discovery of numbers of blocks and treatments for which BIBDs and PBIBDs can be constructed was once an active area of statistical research. With the advent of fast computers and good statistical software, the existence of BIBDs and PBIBDs for given numbers of blocks and treatments has become a less important problem. For example, you can use PROC OPTEX or the optimal design software in JMP to construct approximately balanced incomplete block designs. These designs are commonly used in many fields of research. Mead et al. (2011) have an excellent discussion of this issue.
This section presents the two most commonly used analyses for incomplete block designs. In one, called intrablock analysis, block effects are assumed to be fixed. In the premixedmodel era of statistical software, intrablock analysis was the only available method. In the other method, called combined inter and intrablock analysis, block effects are assumed to be random. In most cases, using information provided by the block variance, called recovery of interblock information , improves the accuracy and precision of the resulting analysis. However, the intrablock analysis is useful for introducing the distinction between Least Squares treatment means, also called adjusted means , and unadjusted arithmetic means, and the associated distinction between Type I tests of hypotheses and Type III tests.
You can use PROC GLM, PROC GLIMMIX or PROC MIXED to implement intrablock (fixed block effect) analysis. To do combined inter and intrablock (random block effect) analysis, you must use either PROC GLIMMIX or PROC MIXED. PROC GLM was not designed to perform the required computations for recovery of interblock information.
For consistency, both types of analyses are demonstrated using PROC GLIMMIX. Data from a PBIBD is used to illustrate the similarities and differences between intrablock and combined inter and intrablock analyses. Note that the intrablock analysis shown in Section 2.5.1 is identical to the analysis that you would get if you use PROC GLM or PROC MIXED (assuming block effects fixed). The combined inter and intrablock analysis in Section 2.5.2 is identical to the results using PROC MIXED (assuming random block effects). Finally, although the example is a PBIBD, data analysis methods in this section apply to incomplete block designs in general.
As noted above, models for an incomplete block design are the same as for an RCBD. That is, the model equation is
y i j = μ + τ i + b j + e i j
where μ i = μ + τ i denotes the treatment mean, b j denotes the block effects b j and the residual, or experimental error effects e ij are assumed iid N(0, 2 ). An analysis of variance table for an incomplete block design is shown in Table 2.3.
Table 2.3: Type III Analysis of Variance Table for Incomplete Blocks Design
Source of Variation
df
F
Blocks (adjusted for treatments)
r  1
Treatments (adjusted for blocks)
t  1
MS(Trts adj.) / MS(Residual)
Residual
N  r  t + 1
In the table, r is the number of blocks, t is the number of treatments, and N is the total number of observations. Notice that the treatments source of variation is adjusted for blocks (Littell and Stroup 2002). The treatments cannot be compared simply on the basis of the usual sum of squared differences between treatment means, because this would contain effects of blocks as well as treatment differences. Instead, a sum of squared differences must be computed between treatment means that have been adjusted to remove the block effects. The difference between the adjusted and unadjusted analyses is illustrated in Section 2.4.1.
Most statistics textbooks that cover BIBD and PBIBD present intrablock analyses. A few also present combined intra and interblock analysis. In older textbooks, combined inter and intrablock analysis appears needlessly daunting. This is especially true of textbooks written before mixed model software was available, i.e., before PROC MIXED was introduced in the early 1990s, and it was not recognized that recovery of interblock information is simply random block effect mixed model analysis. Textbooks that do cover both types of analysis are all over the map regarding advice about when to use which method of analysis. In Section 2.4.3, we address this question.
2.4.1 Intrablock Analysis of PBIB Data
Data Set PBIB contains data from Cochran and Cox (1957, p. 456). The design is a PBIBD with fifteen blocks, fifteen treatments, and four treatments per block. Data are pounds of seed cotton per plot. The block size is the number of treatments per block. This PBIBD has a block size of four. Each treatment appears in four blocks. Some pairs of treatments appear together within a block (e.g., treatments 1 and 2), and other treatments do not appear together in the same blocks (e.g., treatments 1 and 6).
The data appear in multivariate form; that is, with one data line per block, and the four treatment identifiers and responses given as separate variables. To arrange the data in the univariate form in which each observation has a single data line, as required for SAS mixed model procedures, use the following DATA step:
data pbib;
input blk @@;
do eu=1 to 4;
input treat y @@;
output;
end;
datalines;
Program for Intrablock Analysis
An intrablock analysis of the PBIBD data is obtained from Program 2.5.
Program 2.5
proc glimmix data=pbib;
class treat blk;
model y=treat blk;
lsmeans treat;
run;
Results
Selected results from this PROC GLIMMIX run appear in Output 2.7.
Output 2.7: Incomplete Block Design: PROC GLIMMIX Output for Intrablock Analysis
Fit Statistics
2 Res Log Likelihood
46.33
AIC (smaller is better)
106.33
AICC (smaller is better)
1966.33
BIC (smaller is better)
149.35
CAIC (smaller is better)
179.35
HQIC (smaller is better)
120.36
Pearson ChiSquare
2.67
Pearson ChiSquare / DF
0.09
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
treat
14
31
1.23
0.3012
blk
14
31
2.76
0.0090
treat Least Squares Means
treat
Estimate
Standard Error
DF
t Value
Pr t
1
2.8456
0.1634
31
17.41
.0001
2
2.4128
0.1634
31
14.76
.0001
3
2.4517
0.1634
31
15.00
.0001
4
2.6833
0.1634
31
16.42
.0001
5
2.8067
0.1634
31
17.17
.0001
6
2.9039
0.1634
31
17.77
.0001
7
2.7711
0.1634
31
16.96
.0001
8
2.8100
0.1634
31
17.19
.0001
9
2.9333
0.1634
31
17.95
.0001
10
2.5150
0.1634
31
15.39
.0001
11
2.8539
0.1634
31
17.46
.0001
12
3.0128
0.1634
31
18.44
.0001
13
2.6683
0.1634
31
16.33
.0001
14
2.5333
0.1634
31
15.50
.0001
15
2.8483
0.1634
31
17.43
.0001
Interpretation
As with the Fit Statistics output for the RCBD, only the last two lines are relevant to interpreting these results. The Pearson ChiSquare is equivalent to the residual sum of squares in an ANOVA table, and hence the Pearson Chisquare/DF gives the MS(residual) and is thus the estimated residual variance, σ ^ 2 = 8.62 . The F value for differences between (adjusted) treatment differences is given in the Type III Tests of Fixed Effects: F = 1.23 and its associated p value is 0.3012.
The leastsquares means, obtained from the LSMEANS statement, are usually called adjusted means in standard textbooks. In complete block designs, the LSMEANS and the usual arithmetic means that you would calculate by hand are the same. This is not true for incomplete blocks designs, or for complete block designs with missing data. Both are examples of unbalanced designs in the standard design sense as defined above. Data for a given treatment in a block design with unbalance come from only a subset of the blocks. Each treatment is observed on a potentially unique subset of blocks. For example, in the PBIB example treatment 1 is observed in blocks 1, 2, 3 and 6, whereas treatment 2 is observed in blocks 3, 4, 9, and 12. If you compared unadjusted sample means of these two treatments, they would be confounded with blocks. In other words, if the sample means differ, you could not say whether it was a treatment 1 versus 2 difference, or a blocks 1, 2 and 6 versus blocks 4, 9, and 12 difference. A least squares mean adjusts for the fact that each treatment is observed on a different subset of blocks by taking the estimates of the intercept, treatment effect, and the average of all block effects. In other words, it is an estimate of what the treatment mean would have been if it had been observed in all blocks. For the PBIB, the LSMEAN for the i th treatment is defined as the estimate of μ + τ i + ( 1 / 15 ) ∑ j b j .
Program to Compare Unadjusted and Adjusted Sample Means
Program 2.6 enables you to see the difference between unadjusted sample means and adjusted, or least squares, means and the inference associated with them.
Program 2.6
proc glimmix data=pbib;
class treat blk;
model y=treat blk/htype=1,3;
lsmeans treat / e;
lsmeans treat / bylevel e;
run;
The first LSMEANS statement causes PROC GLIMMIX to compute adjusted means. The E option enables you to see which linear combination of model parameters is being used to calculate these means. The BYLEVEL option in the second LSMEANS statement causes PROC GLIMMIX to compute unadjusted sample means, and the associated E option enables you to see how these means are calculated. The HTYPE=1,3 statement obtains TYPE I and TYPE III tests of treatment effects. If you put TREAT first in the MODEL statement, the Type I tests for treatment are not adjusted for blocks, whereas the TYPE III tests are.
Results
Selected results appear in Output 2.8.
Output 2.8: Adjusted versus Unadjusted Means with Intrablock Analysis
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
treat
14
31
2.48
0.0172
blk
14
31
2.76
0.0090
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
treat
14
31
1.23
0.3012
blk
14
31
2.76
0.0090
Obs
Effect
treat
adj_mean
adj_stderr
unadj_mean
unadj_stderr
1
treat
1
2.84556
0.16343
2.775
0.14676
2
treat
2
2.41278
0.16343
2.400
0.14676
3
treat
3
2.45167
0.16343
2.450
0.14676
4
treat
4
2.68333
0.16343
2.950
0.14676
5
treat
5
2.80667
0.16343
2.800
0.14676
6
treat
6
2.90389
0.16343
2.925
0.14676
7
treat
7
2.77111
0.16343
2.825
0.14676
8
treat
8
2.81000
0.16343
2.725
0.14676
9
treat
9
2.93333
0.16343
2.825
0.14676
10
treat
10
2.51500
0.16343
2.450
0.14676
11
treat
11
2.85389
0.16343
2.975
0.14676
12
treat
12
3.01278
0.16343
3.125
0.14676
13
treat
13
2.66833
0.16343
2.525
0.14676
14
treat
14
2.53333
0.16343
2.425
0.14676
15
treat
15
2.84833
0.16343
2.875
0.14676
Obs
Effect
treat
blk
adj_coef4
adj_coef5
unadj_coef4
unadj_coef5
1
Intercept
_
_
1.00000
1.00000
1.00
1.00
2
treat
1
_
0.00000
0.00000
0.00
0.00
3
treat
2
_
0.00000
0.00000
0.00
0.00
4
treat
3
_
0.00000
0.00000
0.00
0.00
5
treat
4
_
1.00000
0.00000
1.00
0.00
6
treat
5
_
0.00000
1.00000
0.00
1.00
7
treat
6
_
0.00000
0.00000
0.00
0.00
8
treat
7
_
0.00000
0.00000
0.00
0.00
9
treat
8
_
0.00000
0.00000
0.00
0.00
10
treat
9
_
0.00000
0.00000
0.00
0.00
11
treat
10
_
0.00000
0.00000
0.00
0.00
12
treat
11
_
0.00000
0.00000
0.00
0.00
13
treat
12
_
0.00000
0.00000
0.00
0.00
14
treat
13
_
0.00000
0.00000
0.00
0.00
15
treat
14
_
0.00000
0.00000
0.00
0.00
16
treat
15
_
0.00000
0.00000
0.00
0.00
17
blk
_
1
0.06667
0.06667
0.00
0.00
18
blk
_
2
0.06667
0.06667
0.00
0.25
19
blk
_
3
0.06667
0.06667
0.00
0.00
20
blk
_
4
0.06667
0.06667
0.00
0.00
21
blk
_
5
0.06667
0.06667
0.25
0.00
22
blk
_
6
0.06667
0.06667
0.25
0.00
23
blk
_
7
0.06667
0.06667
0.00
0.00
24
blk
_
8
0.06667
0.06667
0.00
0.25
25
blk
_
9
0.06667
0.06667
0.25
0.25
26
blk
_
10
0.06667
0.06667
0.00
0.00
27
blk
_
11
0.06667
0.06667
0.00
0.00
28
blk
_
12
0.06667
0.06667
0.00
0.00
29
blk
_
13
0.06667
0.06667
0.00
0.25
30
blk
_
14
0.06667
0.06667
0.00
0.00
31
blk
_
15
0.06667
0.06667
0.25
0.00
The first two tables show the unadjusted (Type I) and adjusted (Type III) test of overall treatment effect. Notice that the p values are noticeably different. Based on the adjusted test, with p = 0.3012, you would conclude that the treatment effect is not statistically significant; based on the Type I p = 0.0172, you find a statistically significant difference among the treatments. However, a careful examination of the next two tables reveals that the Type I test is confounded with block effects.
The third table shows the adjusted and unadjusted means and their respective standard errors. Notice that these means are not the same. In particular, consider treatments 4 and 5. The unadjusted means are 2.95 and 2.80, respectively, whereas the adjusted means are 2.68 and 2.81, respectively. With the unadjusted analysis, you would conclude that there is a treatment effect and that the mean of treatment 4 is greater than the mean of treatment 5. With the adjusted means, you would conclude that the mean of treatment 5 is greater, but there is insufficient evidence to conclude that a treatment effect exists. The fourth table clarifies the problem with the unadjusted means. This table shows the results of the E option for both sets of means; in the interest of space, only the coefficients of treatments 4 and 5 are given. In the usual PROC GLIMMIX output, these variables are named ROW4 and ROW5here they are renamed ADJ_COEF4, UNADJ_COEF4, etc. The values in each column give the coefficients of the model effects used to compute the respective mean. For example, the adjusted mean for treatment 4 is computed as follows:
μ + τ 4 + 0.06667 ( ∑ j = 1 15 b j )
The unadjusted mean is computed as μ + τ 4 + 0.25 ( b 5 + b 6 + b 9 + b 16 ) . These tell you what each mean estimates. You can see that if you take the difference between the adjusted means, you estimate τ 4 − τ 5 , whereas if you take the difference between the unadjusted means you estimate τ 4 − τ 5 + 0.25 ( b 5 + b 6 + b 15 − b 2 − b 8 − b 13 ) . With the latter, you have no way of knowing if treatments 4 and 5 are different, or if blocks 5, 6, and 15 differ from blocks 2, 8, and 13. This is why you use treatments results adjusted for blocksin essence, Type III tests of fixed effects for treatment and default LSMEANSand not Type I tests or handcalculated sample means.
The means and their standard errors in intrablock analysis stem from the ordinary least squares (OLS) estimation. Thus, they do not take into account the fact that blocks are random. The adjustment of treatment means to remove block effects is a computation that treats blocks simply as another fixed effect. The intrablock analysis does not use all available information about the treatment effects, and thus it is suboptimal compared to the combined intra and interblock estimators provided by PROC GLIMMIX and PROC MIXED.
2.4.2 Combined Intra and Interblock PBIB Data Analysis with PROC GLIMMIX
When blocks are really treated as random, the result is the combined intra and interblock analysis. You can obtain this analysis with either the GLIMMIX or MIXED procedure.
Program
The PROC GLIMMIX statements are given in Program 2.7.
Program 2.7
proc glimmix data=pbib;
class blk treat;
model response=treat;
random blk;
lsmeans treat/diff;
run;
The primary difference between these statements and those for intrablock analysis is that BLK appears in the RANDOM statement instead of the MODEL statement. You could add the HTYPE=1,3 option to the MODEL statement, and a second LSMEANS statement using the BYLEVEL option, as shown in the previous section. You will find that the Type I and III tests, and the default and BYLEVEL means are identical with block effects assumed to be random. This is because with random block effects, the estimable function to LSMEANS is μ + τ i and does not require coefficients for the b j terms. The resulting LSMEANS are adjusted, but the adjustment occurs differently than it does in intrablock analysis. This is explained in more detail below. The DIFF option causes all possible pairwise differencesthere are ( 15 × 14 ) / 2 = 105 of themto be computed. These are computed in this section to illustrate the role of the standard error of the difference in defining what partially balanced means in a PBIBD. Other mean comparison options are presented in detail in Chapter 3 .
Results
Selected PROC GLIMMIX results appear in Output 2.9 for the combined intra and interblock analysis.
Output 2.9: Incomplete Block Design: PROC GLIMMIX Analysis
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
blk
0.04652
0.02795
Residual
0.08556
0.02158
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
treat
14
31
1.53
0.1576
treat Least Squares Means
treat
Estimate
Standard Error
DF
t Value
Pr t
1
2.8175
0.1664
31
16.93
.0001
2
2.4053
0.1664
31
14.45
.0001
3
2.4549
0.1664
31
14.75
.0001
4
2.7838
0.1664
31
16.73
.0001
5
2.8049
0.1664
31
16.86
.0001
6
2.9107
0.1664
31
17.49
.0001
7
2.7890
0.1664
31
16.76
.0001
8
2.7816
0.1664
31
16.72
.0001
9
2.8913
0.1664
31
17.37
.0001
10
2.4911
0.1664
31
14.97
.0001
11
2.8987
0.1664
31
17.42
.0001
12
3.0528
0.1664
31
18.34
.0001
13
2.6178
0.1664
31
15.73
.0001
14
2.4913
0.1664
31
14.97
.0001
15
2.8592
0.1664
31
17.18
.0001
Interpretation
Information about the effect of blocks moves from the test of fixed effects output to the Covariance Parameter Estimates. The estimated block variance is σ b 2 = 0.04562 . The REML estimate of the residual variance component is 0.08556, compared to 0.086154 from the intrablock analysis (Output 2.9). Although PROC GLIMMIX output gives standard errors of variance component estimates, these are asymptotic standard errors.
The F statistic in the Type 3 Tests of Fixed Effects table is 1.53 with a p value of 0.1576. Compare this to the results from the intrablock analysis (Output 2.8, F = 1.23, p = 0.3012). This smaller p value in the mixed model analysis is the result of increased power associated with the combined intra and interblock estimates of the treatment effects.
The Least Squares Mean estimates of the treatment means are similar, but not identical, to the adjusted means in the intrablock analysis in Section 2.4.1. For example, the estimate of the treatment 1 mean is 2.817, compared with the intrablock estimate of 2.846. The latter is an ordinary least squares (OLS) estimate, whereas the former is a mixed model estimate, equivalent to (estimated) generalized least squares (GLS). Theoretically, the GLS estimate is superior, because it accounts for BLK being random and computes the estimate of the best linear unbiased estimate (EBLUE) accordingly, substituting estimates of the variance components for block and residual. Likewise, the standard errors in the combined inter and intrablock analysis are different from those in Section 2.4.1. The standard error of the OLS estimate is 0.163 whereas the GLS estimate is 0.166. The former is not a valid estimate of the true standard error, for the same reason that the fixedblockeffect analysis did not compute a valid standard error estimate for a treatment mean for the RCBD data in Section 2.2.1: the random effects of blocks were ignored.
Program
In the combined inter and intrablock PROC GLIMMIX run (Program 2.7), the differences of the leastsquares means were saved to a data set with the ODS OUTPUT statement. We now want to carry out additional processing on these differences. Program 2.8 shows how. First, a data set (PAIRS) is created that contains the pairs of observations that occur together in a block in this partially balanced incomplete block design.
Program 2.8
data pairs;
set pbib_mv;
array tx{4} trt1trt4;
array yy{4} y1y4;
do i=1 to 3; do j=(i+1) to 4;
treat = min(tx{i},tx{j});
_treat = max(tx{i},tx{j});
output;
end; end;
keep blk treat _treat;
run;
proc sort data=pairs nodupkey; by treat _treat; run;
proc print data=pairs(obs=23); run;
The PAIRS data set is created from the original data in multivariate format. The variables TREAT and _TREAT are set up to match the variables by the same name in the DIFMIX data set that was created in the PROC GLIMMIX call.
Results
Output 2.10 shows the first 23 observations of the PAIRS data set. These observations correspond to the pairings of treatments within a block that involve the first two treatments.
Output 2.10: Pairs within a Block Involving Treatments 1 and 2
Obs
blk
treat
_treat
1
3
1
2
2
6
1
3
3
6
1
4
4
2
1
5
5
2
1
7
6
2
1
8
7
1
1
9
8
3
1
10
9
6
1
12
10
1
1
13
11
3
1
14
12
1
1
15
13
4
2
3
14
9
2
4
15
9
2
5
16
12
2
6
17
12
2
8
18
12
2
9
19
3
2
10
20
4
2
11
21
9
2
13
22
3
2
14
23
4
2
15
Interpretation
Treatment 1 occurs with all other treatments somewhere in a block, except for treatments 6 and 11. Similarly, treatment 2 appears with all but treatments 7 and 12. Pairs of treatments that never appear in the same block are called disconnected pairs.
Next, the output data set of treatment mean differences was sorted by StdErr. This reveals that there are two values of standard errors. Output 2.11 shows all pairs with the greater standard error value. Output 2.12 shows a subset of the standard errors of differences with the lower standard error, specifically differences between treatment 1 or 2 and all other treatments for which the standard error is at the lower level.
Output 2.11: LeastSquares Means Differences for Disconnected Pairs
treat
_treat
Estimate
StdErr
Probt
1
6
0.09317
0.2272
0.6846
1
11
0.08118
0.2272
0.7233
2
7
0.3837
0.2272
0.1013
2
12
0.6475
0.2272
0.0077
3
8
0.3267
0.2272
0.1605
3
13
0.1628
0.2272
0.4789
4
9
0.1075
0.2272
0.6395
4
14
0.2925
0.2272
0.2075
5
10
0.3138
0.2272
0.1771
5
15
0.05434
0.2272
0.8126
6
11
0.01199
0.2272
0.9582
7
12
0.2638
0.2272
0.2544
8
13
0.1638
0.2272
0.4762
9
14
0.4000
0.2272
0.0882
10
15
0.3682
0.2272
0.1153
Output 2.12: LeastSquares Means Differences for Connected Pairs Involving Treatments 1 and 2
treat
_treat
Estimate
StdErr
Probt
1
2
0.4122
0.2221
0.0729
1
3
0.3626
0.2221
0.1126
1
4
0.03369
0.2221
0.8804
1
5
0.01262
0.2221
0.9550
1
7
0.02854
0.2221
0.8986
1
8
0.03592
0.2221
0.8726
1
9
0.07379
0.2221
0.7419
1
10
0.3265
0.2221
0.1516
1
12
0.2353
0.2221
0.2975
1
13
0.1998
0.2221
0.3753
1
14
0.3262
0.2221
0.1519
1
15
0.04171
0.2221
0.8522
2
3
0.04963
0.2221
0.8246
2
4
0.3785
0.2221
0.0983
2
5
0.3996
0.2221
0.0817
2
6
0.5054
0.2221
0.0299
2
8
0.3763
0.2221
0.1002
2
9
0.4860
0.2221
0.0363
2
10
0.08575
0.2221
0.7020
2
11
0.4934
0.2221
0.0337
2
13
0.2125
0.2221
0.3461
2
14
0.08600
0.2221
0.7012
2
15
0.4539
0.2221
0.0495
In Output 2.11, all of the standard errors of differences between disconnected pairs are 0.2272, whereas in Output 2.12, all standard errors for differences between connected pairs are 0.2221. Although only the results for the connected pairs of treatments 1 and 2 are shown in Output 2.12, similar results are obtained for the other treatments. These standard errors differ because the treatment pairs in Output 2.11 were observed together in the same block a different number of timesin this case zerothan the pairs in Output 2.12. This is a defining characteristic of a PBIBDthere are exactly two levels of standard error of the difference. In a BIBD, there is only one levelif there is more than one level, the design is not a BIBD. The more times that treatment pairs appear together in the same block in a given design, the lower the standard error will be. Although in this example the difference is small, it is an important difference, because it reflects the decreased precision that is the result of disconnected treatment pairs. Contrasts involving treatments that do not appear in the same block are not estimated with the same precision as contrasts involving treatments that do appear in the same block. Chapter 4 covers procedures that include these principles in the design of experiments.
As noted above, textbooks that do include sections on combined inter and intrablock analysis, i.e., assuming random block effects, often include cautionary warnings about using this analysis. Textbooks vary in the nature and extent of these warnings. Some appear to dismiss mixed model analysis altogether, while some warn against mixed model analysis when the number of blocks is small. The definition of small varies. Stroup (2015) reported a simulation study on the behavior of mixed model analysis of incomplete block designs. In most cases, even when the assumptions of the randomized block mixed model are violated (including cases where the block effect distribution is bimodal or beta with most of the probability density at zero or one, the performance of the combined inter and intrablock analysis was still equal to or superior to intrablock analysis, even with incomplete block designs with 4 blocks. The only case for which any caution seems valid is if the number of blocks is small ( ≤ 6 ) and the block variance is small relative to the residual variance ( σ b 2 / σ 2 < 0.5 ). Otherwise, the warnings appear to be more of a holdover from an era before PROC MIXED appeared when recovery of interblock information was difficult, and research comparing intrablock analysis to random block effect analysis focused more on is it worth the trouble? rather than because it is easy with mixedmodel software, is there any harm in using it? Taking the latter perspectivethe more relevant perspective given easy access to mixed model softwareour answer is as follows: with the one exception noted above, no, there is no harm: you are never worse off, and you are usually better off, if you use the mixed model, random block effect approach.
2.5 Analysis with a Negative Block Variance Estimate: An Example
This section focuses on cases when the default algorithms in PROC GLIMMIX and PROC MIXED set the block variance estimate to zero. This is accompanied by a warning in the SAS LOG, Estimated G matrix is not positive definite. Users ask if this is a problem and the answer is, Yes, it is. This section covers an example, discusses why this occurs, why it is a problem, and what to do about it.
2.5.1 Illustration of the Problem
With this example we demonstrate a case in which the estimate of the block variance has been set to zero.
Program
The data set titled RCBD with Negative Variance Estimate contains data analyzed using the statements in Program 2.9.
Program 2.9
proc glimmix data=zero_v_ex ;
class blk trt;
model y=trt;
random blk;
run;
Results
This produces the result shown in Output 2.13.
Output 2.13: Variance Components Estimates Illustrating SettoZero Default
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
0
.
Residual
8.4792
2.4477
Notice that the block variance estimate has been set to zero. Rerunning the analysis using PROC MIXED with option METHOD=TYPE3 produces an insight into why this happens.
Output 2.14: ANOVA Table Generating Negative Block Variance Estimate
Type 3 Analysis of Variance
Source
DF
Sum of Squares
Mean Square
Expected Mean Square
Error Term
Error DF
F Value
Pr F
trt
5
59.184969
11.836994
Var(Residual) + Q(trt)
MS(Residual)
20
1.18
0.3542
block
4
2.663237
0.665809
Var(Residual) + 6 Var(block)
MS(Residual)
20
0.07
0.9913
Residual
20
200.838496
10.041925
Var(Residual)
.
.
.
.
The MS(blk) is less than MS(residual). Recalling the ANOVA estimate of the block variance from Section 2.3.1, you can see that this results in a negative estimate. Because variance cannot be negative, the traditional approach is to set the variance estimate to zero, the lowest number within the block variance s parameter space.
Unfortunately, while setting the estimate to zero solves the problem of trying to explain how a variance can be negative (it can t), doing so has undesirable consequences. Specifically, for tests of hypotheses about treatment effects, it raises the Type I error rate relative to the nominal α level, and it reduces the accuracy of confidence interval coverage. Using a simulation study, Littell and Stroup (2003) document the consequences of the settozero default.
The reason these problems occur can be seen from the discrepancy between the residual variance estimate in the default output and the MS(residual). We know that the MS(residual) is an unbiased estimate of the residual variance. When the settozero default is invoked, MS(blk) is pooled with MS(residual), as are the degrees of freedom for block and residual. The result is a downward bias in the residual variance estimate and an upward bias in the t and F statistics used to test hypotheses. The default F statistic for TRT is 1.40 versus 1.18 using the ANOVA MS(residual).
There are two ways to avoid this problem without abandoning the benefits of mixed model analysis. These are discussed in the next two sections.
First, some comments about negative variance estimates that have more to do with design than with analysis. A zero estimate may suggest a number of things. It may indicate that variation associated with the criterion used to block is relatively small compared to background noise. If so, the likelihood of data producing a MS(blk) less than MS(residual) is rather high. In such cases, the options given in the next two sections are strongly recommended. On the other hand, a zero variance estimate may suggest a flawed design. Often it means that blocking was not done in a manner consistent with the blocking criterion. See the discussion in Chapter 4 on effective versus ineffective blocking strategies. If flawed blocking is the case, all bets are off. Before proceeding, you should always do a retrospective and be willing to ask hard questions about how the design was implemented and how the data were collected.
To conclude, one common practice that we strongly discourage is pooling block and error sources of variation. This is equivalent to the settozero approach, and, as noted above, is a recipe for inflated Type I error rates and poor confidence interval coverage. If the design used blocking, the data analyst must respect the design.
2.5.2 Use of NOBOUND to Avoid Loss of Type I Error Control
In this approach, you override the settozero default using the option NOBOUND in the PROC statement. You can use NOBOUND in either PROC GLIMMIX or PROC MIXED.
Program
The PROC GLIMMIX statements are shown in Program 2.10.
Program 2.10
proc glimmix data=zero_v_ex nobound;
class block trt;
model y=trt;
random block;
run;
Results
Output 2.15: Variance Estimates Obtained Using the PROC GLIMMIX NOBOUND Option
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
1.5627
0.5350
Residual
10.0419
3.1755
Notice that the variance estimate for BLK corresponds to ( 1 / t ) [ MS ( b l k ) − MS ( r e s i d u a l ) ] = ( 1 / 6 ) [ 0.66 − 10.04 ] from the ANOVA table above. The residual variance estimate is now equal to the MS( residual ). NOBOUND results in the following F statistic for treatment:
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
trt
5
20
1.18
0.3542
As with the variance estimates, this result agrees with the ANOVA table. The main problem with the NOBOUND option is that it makes interpreting the block variance awkward.
2.5.3 Reparameterization of the Model as Compound Symmetry
As an alternative to NOBOUND, you can reparameterize the randomized block model as follows to avoid the need to report a negative variance. The reparameterized model is called a compound symmetry model, which is an important tool for mixed model analysis.
Rewrite the model from Equation 2.1 as y i j = μ + τ i + w i j , instead of y i j = μ + τ i + b j + e i j . That is, let w i j = b j + e i j . You can easily show that Var ( w i j ) = σ b 2 + σ 2 and Cov ( w i j , w i ' j ) = σ b 2 . It follows that the correlation between any two observations on different treatments in the same block is ρ = σ b 2 / ( σ b 2 + σ 2 ) . This is called the intraclass correlation . The model equation y i j = μ + τ i + w i j , with Cov ( w i j , w i ' j ) redefined as the intraclass covariance and denoted σ w , is the simplest version of the compound symmetry covariance model. If σ w ≥ 0 , the compound symmetry and randomized block model, with b j defined as a random effect, are equivalent. However, unlike σ b 2 in the randomized block model, σ w , being a covariance, is not required to be nonnegative. The compound symmetry model enables you to interpret an apparently negative variance as a covariance. In fact, in many experiments, there are competition effects among experimental units within blocks, making negative covariance an unsurprising result.
Program
You can implement the compound symmetry model for randomized block designs using Program 2.11.
Program 2.11
proc glimmix;
class blk trt;
model y=trt; random trt / subject=blk type=cs residual;
run;
Read the RANDOM statement beginning with SUBJECT=BLK. This signifies that the residuals are assumed to be correlated within each block. TYPE=CS signifies that the correlation structure is compound symmetry. RANDOM TRT does not mean that TRT effects are randomit merely signifies that TRT identifies each observation within a block, and that the number of treatment levels determines the dimension of the covariance structure within each subject level, i.e., block in this case. The word RESIDUAL signifies that this covariance is part of the residual variance structure, not a random model effect. An equivalent way to write the RANDOM statement is as follows:
random _residual_ / subject=blk type=cs;
The equivalent PROC MIXED statements are:
proc mixed ;
class blk trt;
model y=trt;
repeated / subject=blk type=cs;
run;
Results
The results from PROC GLIMMIX appear in Output 2.15.
Output 2.16: Compound Symmetry Covariance Estimates Obtained Using PROC GLIMMIX
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
CS
block
1.5627
0.5350
Residual
10.0419
3.1755
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
trt
5
20
1.18
0.3542
Notice that the variance estimates are identical to those obtained using the NOBOUND option, as is the test statistic for TRT. The only difference is that the block variance has been relabeled as the CS covariance associated with block.
The compound symmetry model is the simplest form of a marginal model. Mixed models in which all sources of variation over and above residual appear as random effects in the linear predictor are called conditional mixed models. The name conditional is somewhat misleading: the best way to understand the distinction is that conditional mixed models enable you to compute predictions (called Best Linear Unbiased Predictors ) from linear combinations of fixed and random effects, whereas marginal models have only fixed effects in the linear predictor and account for all random variation through the residual covariance structure.
Note that the above comments are primarily applicable in cases involving designed experiments or observational studies with a clearly defined design structure (e.g. matchedpairs). In such cases, it is important to respect the design in the sense that sources of variation that are part of the design structure must be accounted for by the model in some form. On the other hand, in observational studies in which a term is included in the model because it is suspected to be a source of variation, but not known to be a source of variation, dropping the effect if it produces a settozero variance estimate is preferable to the NOBOUND or compound symmetry approach.
2.6 Introduction to Mixed Model Theory
The randomized complete block design presents one of the simplest applications of mixed models. It has one fixed effect (treatments) and one random effect (blocks). In this section, we use the RCBD to introduce essential theory that underlies the mixed model. Refer to Appendix A , Linear Mixed Model Theory, for the general setting and for additional details.
2.6.1 Review of Regression Model in Matrix Notation
The standard equation for the linear regression model is as follows:
y = β 0 + β 1 x 1 + ... + β k x k + e
In an application there would be n observed values of y and corresponding values of the predictor variables x 1 , , x k . Often the values of y are considered to be independent realizations with equal variance. These can be represented in matrix notation as
Y = X β + e (2.2)
where the terms are defined as follows:
Y is the n vector of observations.
X is an n ( k + 1) matrix comprising a column of 1s and columns of observed values of x 1 , , x k ,.
β = [ β 0 β 1 β 2 … β k ] ′ is the n vector of regression coefficients.
e is a vector of realizations of the errors e .
At this point we assume only that the error vector e has mean 0 and covariance matrix 2 I , denoted as e ~ ( 0 , 2 I ). This covariance matrix reflects the assumption of uncorrelated errors. In linear mixed models, we add the assumptions that the errors are normally distributed, denoted as e ~ N(0, 2 I ). Note that when the normality assumption is added, lack of correlation among the errors is tantamount to independence of the errors. Refer to the expression Y = X + e as the model equation form of the fixed effects linear model.
Alternatively, you can rewrite Equation 2.2 as follows:
Y ~ N ( X β , σ 2 I ) . (2.3)
Refer to this as the probability distribution form of the fixed effects linear model. There are two advantages to the probability distribution form. First, Equation 2.3 makes it clear that X β models E ( Y ) . Second, the probability distribution form can be generalized to describe linear models for nonGaussian data, whereas the model equation form cannot. The model equation form is useful for theoretical development of the mixed model with Gaussian (normally distributed) data, so we will continue to use it when appropriate in this book. However, it general, Equation 2.3 is the preferred form.
2.6.2 The RCBD Model in Matrix Notation
The RCBD model in Equation 2.1 can be written in matrix notation. In explicit detail, the model equation is as follows:
[ y 11 . . . y t 1 . . . y 1 r . . . y t r ] = [ 1 1 . . . 0 . . . . . . . . . . . . 1 0 1 . . . . . . . . . 1 1 . . . 0 . . . . . . . . . . . . 1 0 1 ] [ μ τ 1 . . . τ t ] + [ 1. . . . 0 . . . . . . . . . 1 . . . . 0 . . . . . . . . . 0 . . . 1 . . . . . . . . . 0 . . . 1 ] [ b 1 . . . b r ] + [ e 11 . . . e t 1 . . . e 1 r . . . e t r ]
The terms are defined as in Equation 2.1. In more compact matrix notation the equation is as follows:
Y = X β + Z u + e (2.4)
The definitions are as follows:
Y is the vector of observations
X is the treatment design matrix
is the vector of treatment fixed effect parameters
Z is the block design matrix
u is the vector of random block effects
e is the vector of residuals
The model Equation 2.4 states that the vector Y of observations can be expressed as a sum of fixed treatment effects X , random block effects Zu , and random experimental errors e . The X portion is defined by the MODEL statement, and the Zu portion is defined by the RANDOM statement. It is not necessary in this example to define the residuals e .
For the RCBD model in matrix notation, the random vector u has a multivariate normal distribution with mean vector 0 and covariance matrix σ b 2 I t , u ~ N ( 0 , σ b 2 I t ) , and the random vector e is distributed N( 0 , 2 I tr ).
As with the fixed effects linear model, you can express Equation 2.4 in probability distribution form as
Y  u ~ N ( X β + Z u , R ) ; u ~ N ( 0 , G ) (2.5)
For the randomized block design as presented in this chapter, R = σ 2 I , and G = σ b 2 I t . As with Equation 2.3, Equation 2.5 can be adapted for mixed models with nonGaussian data. Notice that with a mixed model, the distribution of the observation vector, Y , as conditional on the random model effects, and the mixed model linear predictor is used to estimate the conditional expectation, E ( Y  u ) .
You can also write the marginal distribution of Y in model form as follows:
Y ~ N ( X β , V )
where V = Z G Z ′ + R . The specific form of V for the randomized block design is
V = [ σ b 2 J t 0 t × t ⋯ 0 t × t 0 t × t σ b 2 J t ⋯ 0 t × t 0 t × t 0 t × t ⋱ ⋮ 0 t × t 0 t × t ⋯ σ b 2 J t ] + σ 2 I t r = [ σ b 2 J t + σ 2 I t 0 t × t ⋯ 0 t × t 0 t × t σ b 2 J t + σ 2 I t ⋯ 0 t × t 0 t × t 0 t × t ⋱ ⋮ 0 t × t 0 t × t ⋯ σ b 2 J t + σ 2 I t ]
where σ b 2 J t + σ 2 I t is the covariance matrix of the observations in a particular block, 0 t t is a t t matrix of zeros, and J t is a t t matrix of ones.
Alternatively, you can redefine σ b 2 as the compound symmetry covariance, denoted as σ c s or you can denote Var ( y i j ) = σ y 2 , define the intraclass correlation as ρ = σ c s / σ y 2 and write Var ( Y ) as the compound symmetry covariance matrix:
V = σ y 2 [ ρ J t + ( 1 − ρ ) I t 0 t × t ... 0 t × t 0 t × t ρ J t + ( 1 − ρ ) I t ... 0 t × t 0 t × t 0 t × t ... ... 0 t × t 0 t × t ... ρ J t + ( 1 − ρ ) I t ]
Note that in theory, σ b 2 = σ c s , σ y 2 = σ b 2 + σ 2 and ρ = σ b 2 / ( σ b 2 + σ 2 ) . You see the main advantage of the compound symmetry form in Section 2.5: it provides a useful way to deal with the problem of negative block variance estimates.
2.6.3 Inference Basics for the Randomized Block Mixed Model
If you want to estimate and perform inference on the fixed effects onlythe treatment effectsyou can use the fact that the estimate of from the mixed model equations is equivalent to the solution from the generalized least squares (GLS) estimating equation V  1 X β = X V  1 y . The matrix X is not of full column rank and so X ' V 1 X is singular. You must use a generalized inverse to obtain a GLS solution of the fixed effect parameter vector . But the treatment means, differences between treatment means, and contrasts are estimable. Thus, no matter what generalized inverse is used, there will be a vector K for which K ' is equal to the mean, difference or contrast of interest. For example, choosing K ' = [1,1,0,...,0] gives K ' = + 1 = 1 . Then the general theory gives
Var [ K ' β ^ ] = ( σ b 2 + σ 2 ) / r
where β ^ is the generalized leastsquares estimate. Likewise, K ' = [0,1,1,0...,0] gives K ' = 1  2 , and Var [ K ' β ^ ] = 2 σ 2 / r . These are the expressions presented in Section 2.2.1.
In the case of a relatively simple, balanced design such as an RCBD, the variance expressions can be derived directly from the model. This was the approach in Section 2.2.1. But more complicated, unbalanced situations require you to use the general theoretical result, Var [ K ' β ^ ] = K ′ ( X ′ V − 1 X ) ¯ K . Given that the variance components are generally unknown and must be estimated, in practice you use the estimated variance, denoted as K ′ ( X ′ V ^ − 1 X ) ¯ K , for the following inferential statistics.
If k is a vector (e.g. for estimating a treatment difference or a contrast), then you can use a t statistic,
t = k ′ β ^ / k ′ ( X ′ V ^ − 1 X ) ¯ k
Also, if k is a vector, you can obtain a confidence interval for K ′ β with k ′ β ^ ± t ν , α / 2 k ′ ( X V ^ − 1 X ) ¯ k , where ν denotes the degrees of freedom that are associated with the estimate of k ′ ( X V ^ − 1 X ) ¯ k .
If K is a matrix, which it is for testing any hypothesis with more than 1 numerator degree of freedom, square the t ratio to get the estimated Wald statistic, ( K ′ β ^ ) [ K ′ ( X ′ V ^ − 1 X ) ¯ K ] − 1 K ′ β ^ . For complete block designs, when K ′ β defines treatment difference or differences, the estimated Wald statistic reduces to ( K ′ β ^ ) [ K ′ ( X ′ X ) ¯ K ] − 1 K ′ β ^ / σ ^ 2 , which is equal to SS ( K ′ β ) / MSE . Dividing by rank ( K ) gives you MS ( K ′ β ) / MSE , an F statistic.
These are the general theoretical results in the RCBD setting. They are provided to assist readers with a matrix and mathematical statistics background to better understand the methodology used in this chapter.
2.7 Summary
Chapter 2 begins with an example of a randomized block design with fixed treatments and random blocks. The importance of accounting for random effects in such a basic situation, to correctly compute the variance of a treatment mean, is demonstrated. The use of PROC GLIMMIX and PROC MIXED is introduced with explanations of how to set up the MODEL and RANDOM statements. Then, PROC GLM is briefly discussed, with emphasis on the fact that GLM is intended for fixed effect only models. We emphasize the basic applications that are handled appropriately by PROC MIXED and PROC GLIMMIX but not by PROC GLM. Then, an incomplete block design is used in Section 2.4 to illustrate some of the issues associated with unbalanced mixed model data. These include recovery of interblock information, the difference between intrablock (fixed block) and combined inter/intrablock (random block) analysis, and the difference between the arithmetic, or sample, mean and the least squares mean. The issue of negative estimates of variance components, why they matter and what to do about them, is discussed in Section 2.5. The chapter concludes with a section intended for readers with a matrix and mathematical statistics background, introducing mixed model theory relevant to estimation and inference for blocked designs.
Chapter 3: Mean Comparisons for Fixed Effects
3.1 Introduction . 49
3.2 Comparison of Two Treatments . 50
3.2.1 Example Using PROC TTEST . 50
3.2.2 Analysis Using PROC GLIMMIX .. 51
3.3 Comparison of Several Means: Analysis of Variance . 52
3.3.1 Relationship between LSMEANS and Fixed Effects Solution . 52
3.3.2 Standard Errors for Mean Comparisons . 54
3.3.3 Multiple Comparisons and Planned Comparisons . 57
3.4 Comparison of Quantitative Factors: Polynomial Regression . 58
3.4.1 Analysis of a Regression Experiment 58
3.4.2 Type I versus Type III Sums of Squares . 61
3.4.3 Direct Fit of Polynomial 62
3.4.4 Orthogonal Polynomial Contrasts . 65
3.5 Mean Comparisons in Factorial Designs . 67
3.5.1 Factorial: The Basics . 67
3.5.2 A More Complex TwoWay Factorial Experiment 72
3.5.3 Multiple Comparisons for a Factorial Experiment 74
3.5.4 Multiple Comparisons of METHOD Means by VARIETY Using SLICE and SLICEDIFF . 74
3.5.5 Planned Comparisons in a TwoWay Factorial Experiment: CONTRAST, ESTIMATE, and LSMESTIMATE .. 77
3.5.6 Simple Effect ComparisonsUsing LSMESTIMATE .. 78
3.5.7 Simple Effect ComparisonsUsing Model Parameters: How ESTIMATE and LSMESTIMATE Differ . 79
3.5.8 Simultaneous Contrasts in TwoWay Classifications . 81
3.5.9 Comparison of Levels of One Factor within Subgroups of Levels of Another Factor . 82
3.5.10 Quantitative by Qualitative Factorial: An Example . 84
3.5.11 Example: Factorial with Two Continuous Factors . 92
3.6 Summary . 99
3.1 Introduction
In Chapter 1 the difference between the experiment design (blocking, experimental units, etc.) and treatment design (the factors and levels applied to the experimental units) were introduced. Chapter 2 focused primarily upon obtaining estimates of variance components, estimates of fixed effects, and overall tests of model effects for simple blocked experiment designs. However, in most cases, the overall F tests for fixed model effects represent a starting point, not a final product. Typically, you not only want to know whether there is an overall treatment difference. If there are more than two treatments, you want to know which treatments are different. How big are the treatment differences? Are they large enough to matter? Which treatment is better ?
This chapter extends the examples from Chapter 2 to demonstrate various mean comparison procedures. This chapter also covers and builds upon some fundamental inferential concepts of linear models, which have a very rich history in statistical practice and were a core motivation for the formation of SAS Institute and PROC GLM in the 1970s. For additional discussion with lots of great examples, refer to Littell et al. (2002) and Milliken and Johnson volumes 13 (2009, 1989, and 2001). We focus here on core methods and examples that form the basis for treatment comparisons in a modern mixed model framework. While some of the treatment designs may be analyzed using designspecific procedures when the experiment design is a completely randomized design, we focus on using mixed model procedures (PROC GLIMMIX, in particular) even with the simple examples as it is necessary when factorial treatment designs are combined with complex experiment designs. We begin with a simple comparison of two treatments in Section 3.2. Section 3.3 presents approaches for comparing more than two treatments. Section 3.4 illustrates methods appropriate for treatment designs whose factor levels are quantitative. Sections 3.5 introduces methods for factorial treatment structures and complex quantitative by qualitative factorial designs. The examples in this chapter demonstrate the use of the LSMEANS, ESTIMATE, CONTRAST and LSMESTIMATE statements and their options to address the various goals of comparative experiments.
3.2 Comparison of Two Treatments
One of the most basic analyses is a comparison of two treatments. This typically occurs in one of two forms: independent samples (i.e. a completely randomized design with two treatments) and paired samples. A randomized block experiment with two treatments is an example of paired samples. Paired sampling gives rise to the simplest mixed model intended for comparing treatments. It is vitally important to recognize the difference between paired or blocked and independent samples, because a failure to accommodate pairing can dramatically influence conclusions about your treatments. The following example uses variation of the paired sample experiment: before and aftertreatment responses on the same experimental unit.
A combination stimulantrelaxant drug is administered to 15 animals whose pulse rates are measured before (PRE) and after (POST) administration of the drug. The purpose of the experiment is to determine whether there is a change in the pulse rate as a result of the drug. The data are in the PULSE data set in the sample library. The simplest method of analyzing a paired sample experiment is a t test. However, the paired comparison t test can be viewed as a special case of mixed model methodology. For completeness, and to establish the connection to mixed models, analyses of this example using PROC TTEST and PROC GLIMMIX are presented.
3.2.1 Example Using PROC TTEST
The appropriate t statistic is t = d ¯ / s d ¯ , where
d ¯ = ∑ i d i / n i
with d i being the difference between the PRE and POST measurement for the i th animal, and
s d = ∑ i ( d i − d ¯ ) / ( n − 1 )
The t test is one of the most popular of all statistics and can be computed using PROC TTEST, one of the very first statistical procedures in all of SAS. The TTEST procedure computes twosample t tests for the paired case and independent case.
Program
For the paired test, use Program 3.1.
Program 3.1
proc ttest;
paired pre*post;
run;
The statement PAIRED PRE*POST causes the test to be computed for the paired difference PREPOST.
Results
The results appear in Output 3.1.
Output 3.1 PairedDifference Analysis Using PROC TTEST with the PAIRED Option
The TTEST Procedure
Difference: pre  post
N
Mean
Std Dev
Std Err
Minimum
Maximum
15
1.4667
2.3258
0.6005
2.0000
7.0000
Mean
95% CL Mean
Std Dev
95% CL Std Dev
1.4667
0.1787
2.7547
2.3258
1.7028
3.6681
DF
t Value
Pr t
14
2.44
0.0285
Interpretation
The estimated mean difference of PREPOST, 1.4667, appears in the column labeled MEAN. The lower and upper 95% confidence limits about the mean appear in the column labeled 95% CL Mean. This mean difference has a t value of 2.44 with p = 0.0285 indicating a statistically significant change in mean pulse rate. Since the mean is positive, the drug evidently decreases pulse rate.
3.2.2 Analysis Using PROC GLIMMIX
The data in the PULSE data set were entered with one line per animal. To analyze these data as a randomized block design using PROC GLIMMIX, the data must be stacked with separate data lines for the PRE and POST responses. Doing so also requires creating a variable to identify the observation s treatment, that is, PRE or POST. You can obtain the stacked data using the following DATA step.
data pulsestack; set pulse;s
y=pre; trt="pre "; output;
y=post; trt="post"; output;
run;
Now each experimental unit has two observations, one for each of the PRE and POST treatment levels. ANIMAL is the block in this setup.
Program
Using the stacked data, Program 3.2 shows the PROC GLIMMIX statements to analyze the experiment as a randomized block design.
Program 3.2
proc glimmix data=pulsestack;
class trt animal;
model y = trt;
random animal;
lsmeans trt/diff;
run;
Results
Output 3.2 shows the results of this analysis.
Output 3.2 PairedDifference Analysis Using PROC GLIMMIX with random blocks
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
animal
4.9333
2.4301
Residual
2.7048
1.0223
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
trt
1
14
5.96
0.0285
trt Least Squares Means
trt
Estimate
Standard Error
DF
t Value
Pr t
post
62.2667
0.7136
14
87.26
.0001
pre
63.7333
0.7136
14
89.31
.0001
Differences of trt Least Squares Means
trt
_trt
Estimate
Standard Error
DF
t Value
Pr t
post
pre
1.4667
0.6005
14
2.44
0.0285
Interpretation
The Covariance Parameter Estimates table shows the estimated variance between the experimental units (EUs), 4.9333, which is nearly twice the residual or within experimental unit variance, 2.7048. The intraclass correlation is 4.933/(4.933 + 2.705) = 0.65, signifying a strong correlation between measurements on the same EU. The Type III Tests of Fixed Effects has the F test for the difference between treatments. Here, the F value, 5.96, is the square of the t value obtained in the t test (Output 3.1), 2.44, and has the same p value, 0.0285, indicating a significant effect of the drug on the animal s pulse rate. The LSMEANS statement requests the estimates of the treatment means and their difference be provided. Because the SAS procedures process names in alphanumeric order (as default, but can be changed by using the ORDER=data option), the difference is POSTPRE and therefore is estimated as 1.4667, the opposite sign as from PROC TTEST.
In the next section, we extend the analysis from two treatments to more than two, as most experiments include multiple treatments for efficiency.
3.3 Comparison of Several Means: Analysis of Variance
In Chapter 2 , Section 2.3, we began with an example of analysis of variance using the BOND data. Three different metals were used to bond two pieces of an ingot together. The amount of pressure required to break the bond was measured as the response. The seven ingots serve as complete blocks, so there are seven observations for each metal. The initial analysis results in a p value for the Type III Tests of Fixed Effects of .0131indicating there is a significant difference in the amount of pressure required to break the bond between the three metals. Although this tells us that there is evidence that at least one treatment mean is different, it does not say which mean is different, nor does it tell us how different. For this, we need mean comparison procedures, which in mixed models are implemented using functions of the model parameters called least squares means , defined below in Section 3.3.1. When comparing more than two treatment levels, a natural starting point is to consider all pairwise comparisons.
3.3.1 Relationship between LSMEANS and Fixed Effects Solution
You can obtain treatment means from the LSMEANS statement. Linear combinations of means can be estimated and compared in a variety of ways, including the DIFF and SLICE options and the LSMESTIMATE, ESTIMATE, and CONTRAST statements. These statements all compute linear combinations of estimates of parameters defined in the MODEL statement with coefficients in the order of the terms in the solution.
The statement
model pres=metal;
specifies that the expected value of y ij in equation (2.1) for a given metal can be represented as follows:
E [ y i j ] = μ + τ i = μ i
The least squares means are modelbased estimates of these means. In other words, they are calculated from estimates of the model fixed effect parameters in the expressions: lsmean for metal i = μ ^ + τ ^ i The general interpretation of least squares means are as estimates of the treatment effects under a balanced design. The following program provides a simple example.
Program
Program 3.3 shows the PROC GLIMMIX statements to obtain treatment mean estimates (LSMEANS), treatment mean differences (DIFF option), confidence limits for both (CL option), ESTIMATE of the least squares mean of the nickel treatment ( μ ^ + τ ^ 3 ), and ESTIMATE and CONTRAST for the difference between the copper and iron means ( τ ^ 1 − τ ^ 2 ). Note that all SAS linear model procedures read levels of CLASS variables in alphabetical order (see the output in the Results below). Hence, τ 1 denotes the effect of the first treatment in alphabetical order (copper), τ 2 denotes the effect of the second treatment (iron) and τ 3 denotes the effect of the third (nickel). Note the correspondence of 1, 1 and 0 after METAL in the ESTIMATE and CONTRAST statements. These specify the coefficients for the respective τ i in each statement. INTERCEPT 1 means μ is included in the terms to be estimated; absence of INTERCEPT means μ is not included. The SOLUTION option in the MODEL statement (you can abbreviate it as S) presents estimates of the fixed effect parameters.
Program 3.3
proc glimmix data=bond plots= (meanplot diffplot) ;
class ingot metal;
model pres = metal / solution;
random ingot;
lsmeans metal / diff cl;
estimate 'nickel mean' intercept 1 metal 0 0 1;
estimate 'copper vs iron' metal 1 1 0;
contrast 'copper vs iron' metal 1 1 0;
run ;
Results
Results from the SOLUTION option and the LSMEANS statement appear in Output 3.3. The ESTIMATE and CONTRAST statements appear in Output 3.4. The plots generated are in Figures 3.1 and 3.2, displayed in the next section.
Output 3.3: Results from the SOLUTION Option and LSMEANS Statements
Solutions for Fixed Effects
Effect
metal
Estimate
Standard Error
DF
t Value
Pr t
Intercept
71.1000
1.7655
6
40.27
.0001
metal
c
0.9143
1.7214
12
0.53
0.6050
metal
i
4.8000
1.7214
12
2.79
0.0164
metal
n
0
.
.
.
.
metal Least Squares Means
metal
Estimate
Standard Error
DF
t Value
Pr t
Alpha
Lower
Upper
c
70.1857
1.7655
12
39.75
.0001
0.05
66.3390
74.0324
i
75.9000
1.7655
12
42.99
.0001
0.05
72.0533
79.7467
n
71.1000
1.7655
12
40.27
.0001
0.05
67.2533
74.9467
Differences of metal Least Squares Means
metal
_metal
Estimate
Standard Error
DF
t Value
Pr t
Alpha
Lower
Upper
c
i
5.7143
1.7214
12
3.32
0.0061
0.05
9.4650
1.9636
c
n
0.9143
1.7214
12
0.53
0.6050
0.05
4.6650
2.8364
i
n
4.8000
1.7214
12
2.79
0.0164
0.05
1.0493
8.5507
Interpretation
The Solution for Fixed Effects table contains estimates of the fixed effects parameters. Because the X matrix is not of full rank, a consequence of having a classification variable in the MODEL statement and of including an intercept column in X , the estimates labeled Intercept, Metal c, Metal i, and Metal n are not unbiased estimates of , 1 , 2 , and 3 . The problem of a singular X matrix is solved by setting to zero the fixed effects solution corresponding to the last column of the classification effect METAL in the MODEL statement. The last level in this example n is the reference level. The other solutions are unbiased estimates of deviations from the reference level. The row labeled Intercept estimates + 3 , the row labeled Metal c estimates 1  3 , the row labeled Metal i estimates 2  3 , and the row labeled Metal n estimates 3  3 = 0, yielding the following:
μ ^ + τ ^ 3 = 71.100 τ ^ 1 − τ ^ 3 = − 0.9143 τ ^ 2 − τ ^ 3 = 4.800
Because there is no unique way of resolving the singularity in X , there is no unique way of deriving solutions. For example, changing the names of the metals in such a way that their order changes affects the choice of the reference level. However, certain linear combinations of the estimates are unique. These linear combinations correspond to estimable functions (See Littell, Stroup, and Freund, 2002 and Milliken and Johnson, 2009). Least squares means are such estimable functions of the fixed effects parameters.
The Least Squares Means table provides the modelbased estimates of the least squares means. Since, for example, 1 = + 1 = + 3 + ( 1  3 ), the least squares means in this example can be related easily to the estimates displayed in the Solution for Fixed Effects table:
μ ^ 1 = μ ^ + τ ^ 1 = μ ^ + τ ^ 3 + ( τ ^ 1 − τ ^ 3 ) = 71.1 + ( − 0.9142 ) = 70.1857 μ ^ 2 = μ ^ + τ ^ 2 = μ ^ + τ ^ 3 + ( τ ^ 2 − τ ^ 3 ) = 71.1 + ( 4.8 ) = 75.9 μ ^ 3 = μ ^ + τ ^ 3 = μ ^ + τ ^ 3 + ( τ ^ 3 − τ ^ 3 ) = 71.1 + ( 0.0 ) = 71.1
Because this design is balanced, the least squares means are simply the arithmetic averages of the data values from each of the treatments. This will generally not be the case for unbalanced designs.
Note: Many textbooks refer to μ as the overall mean. This is true only if we make assumptions about the model effects, in this case τ i , that are unnecessary, and not used by any SAS linear model procedures (see Appendix A for details). It is correct to call μ the intercept, but in general it is not correct to call it the overall mean. Look at the Solutions for Fixed Effects table in Output 3.3 above: You can clearly see why this is so.
3.3.2 Standard Errors for Mean Comparisons
Along with the least squares mean and mean difference estimates, the PROC GLIMMIX listing includes the standard errors. These arise as follows. First obtain the theoretical variance of the estimator. By definition, the standard error is the square root of the estimated variance of the estimator.
Variance of a least square mean: For a paired sample with no missing data, the least square mean for treatment i is equal to the sample mean, i.e.
μ ^ + τ ^ i = ( 1 / b ) ∑ j y i j
You can express this in model terms as ( 1 / b ) ∑ j ( μ + τ i + b j + e i j ) The variance is thus
Var ( ( 1 / b ) ∑ j ( μ + τ i + b j + e i j ) ) = ( 1 / b ) 2 ( ∑ j Var ( μ + τ i + b j + e i j ) ) = ( 1 / b ) 2 ( b Var ( b j ) + b Var ( e i j ) ) = ( σ b 2 + σ 2 ) / b
where b is the number of pairs, or blocks  in this case, b = 7. Thus, the estimated standard error of a least squares mean in this example is
( σ ^ 2 + σ ^ b 2 ) / 7 = 1.7655
A similar derivation gives the standard error of the difference. Begin with the difference between the twotreatment means, express the difference in terms of the model, and compute the variance:
( 1 / b ) ∑ j y 1 j − ( 1 / b ) ∑ j y 2 j = ( 1 / b ) ∑ j [ ( μ + τ 1 + b j + e 1 j ) − ( μ + τ 2 + b j + e 2 j ) ] = ( 1 / b ) ∑ j ( τ 1 − τ 2 + e 1 j − e 2 j )
Computing the variance gives ( 1 / b ) 2 ∑ j Var ( e 1 j − e 2 j ) = 2 σ 2 / b .
Thus, the estimated standard error of a difference is 2 σ ^ 2 / 7 = 1.7214 .
Important Note : In this example, (and all blocked designs) the standard error of the mean depends on the block and the residual variances, whereas the standard error of the difference depends only on the residual variance. Many scientific journals make the mistake of providing only the standard error of the mean. However, the most important inference in a comparative experiment concerns treatment mean differences ; the standard error of the estimated difference should be regarded as an essential piece of information in such publications. Except for a completely random design, you cannot determine the standard error of a difference if all you are given is the standard error of the mean.
The Differences of Least Squares Means table in Output 3.3 shows all possible pairwise comparisons among the three treatments. The interpretation of the results is that each row is a comparison of the mean in the column with heading metal minus the mean in the column with heading _metal. The estimated standard errors of the differences are all equal as the design is a randomized complete block with no missing data.
T values are computed to test that the differences between the respective means are zero. The results provide evidence that the means of both copper and nickel are different from iron but the copper and nickel means are not different from each other.
You can use the ESTIMATE statement for linear combinations of model effects. For illustration, consider estimating the linear combination equal to the nickel mean, n . First, express the nickel mean as the estimable function 3 = + 3 . More explicitly, 3 = 1 + 0 1 + 0 2 + 1 3 . These coefficients are used to construct the ESTIMATE statement:
estimate 'nickel mean' intercept 1 metal 0 0 1;
Similarly, the difference between the means for copper and iron is 1  2 = 1  2 . The ESTIMATE and CONTRAST statements for this comparison are as follows:
estimate 'copper vs iron' metal 1 1 0;
contrast 'copper vs iron' metal 1 1 0;
Output 3.4 displays the results of these estimate and contrast statements.
Output 3.4: Inference about Linear Combinations of Means
Estimates
Label
Estimate
Standard Error
DF
t Value
Pr t
nickel mean
71.1000
1.7655
12
40.27
.0001
copper vs iron
5.7143
1.7214
12
3.32
0.0061
Contrasts
Label
Num DF
Den DF
F Value
Pr F
copper vs iron
1
12
11.02
0.0061
The Estimates table collects the results of the ESTIMATE statements. The nickel mean and the differences between the copper and iron means are the same as those obtained from the LSMEANS statement with the DIFF option in Output 3.4. Notice that the estimate of the nickel mean, 71.1, is equal to the solution for the Intercept in the solution table. This is due to the set to zero constraint. Therefore, the Intercept solution is not equal to the overall mean.
The Contrasts table collects the results of CONTRAST statements. The contrast of copper vs iron tests the null hypothesis H 0 : 1  2 = 0. It is associated with an F statistic of 11.02. This equals the square of the t value from the corresponding ESTIMATE statement (3.32 2 ).
The ESTIMATE and CONTRAST statements are used to estimate and test hypotheses about linear combinations of terms in the mixed model, including random effects. The ESTIMATE statements in the GLIMMIX procedure allow multiplerow estimates and can adjust the corresponding t tests for multiplicity. The CONTRAST statement can consist of several linear combinations where the resulting F statistic provides a simultaneous test of the hypotheses that the set of linear combinations of the parameters are equal to zero.
The LSMEANS statement produces estimates and tests of treatment means and mean differences. The DIFF option, as the name implies, obtains mean differences. The PLOT options provide different ways of visualizing treatment means and differences. MEANPLOT shows the treatment means (Figure 3.1); the CL option places 95% confidence limits above and below each mean estimate. The DIFFPLOT (Figure 3.2) shows 95% confidence limits for all pairwise differences and plots them so that statistically significant and nonsignificant differences can readily be seen (once you get some practice looking at the plot!).
Important Note : Notice that the overlapping 95% confidence bars for the treatment means in Figure 3.1 must not be interpreted as the corresponding means are not significantly different. For example, the confidence bars for the copper and iron means overlap, but the p value for the copper versus iron treatment mean difference is 0.0061. For an appropriate display of differences, see Figure 3.2.
Figure 3.1: Plot of the Least Squared Means
Figure 3.2: Plot of the Differences of the Least Squares Means
In the plot in Figure 3.2, the group means themselves are represented by both horizontal and vertical lines, labeled appropriately. The differences between means are represented by where the horizontal and vertical lines representing different means cross. The trick to this plot is that the (invisible) axis for these differences is at a 45 angle counterclockwise from the main Y axis. Confidence intervals for the differences are shown against this alternate axis, as is the reference line for difference = 0. Thus, a difference is insignificant when the confidence interval for a difference crosses this difference = 0 line.
In this section, we have presented estimation of treatment differences and inferential statistics available with SAS mixed model procedures that control for comparisonwise error rate, that is, p values apply to each test without regard to other tests also being made. Typically, these are used when a Type I error is not a paramount concern or avoidance of Type II error is an even bigger concern. For designs with three or more treatments, experimentwise error rate, that is, the probability of making at least one Type I error, is often the overriding concern. In statistical practice, the topic dealing with error control when testing more than two treatments is called multiplicity . The next section introduces the basic issues and tools available. Subsequent sections use examples that illustrate the major treatment designs and provide more detail for those designs.
3.3.3 Multiple Comparisons and Planned Comparisons
The F test for a factor in an analysis of variance tests the null hypothesis that all the factor means are equal. However, the conclusion of such a test is seldom a satisfactory end to the analysis. You usually want to know more about the differences among the means (for example, which means are different from which other means or if any groups of means have common values).
Multiple comparisons of the means are commonly used to answer these questions. There are numerous methods for making multiple comparisons, many of which are available in PROC GLIMMIX. In this chapter, only a few of the methods are illustrated.
One method of multiple comparisons is to conduct t tests between all possible pairs of means; this is known as the least significant difference (LSD) test. Refer to Steel and Torrie (1980) for examples.
The LSD test applies the α level applies to each comparison, referred to as the comparisonwise error rate . However, if you have t treatments, there are ( t ( t − 1 ) ) / 2 possible comparisons, and in many cases it is more appropriate to be concerned with the probability of at least one Type I error. This is called the experimentwise error rate . If you have c comparisons, the experimentwise error rate is ≥ 1 − ( 1 − α ) c , with strict equality only if all comparisons are independent  which they cannot be in an LSD test. In statistics, this is called the multiplicity problem.
A traditional method of dealing with multiplicity in multiple comparisons is Duncan's multiplerange test. With this test, the means are first ranked from largest to smallest. Then the equality of two means is tested by referring the difference to tabled critical points, the values of which depend on the range between the ranks of the two means tested. The larger the range of the ranks, the larger the tabled critical point (Duncan 1955).
The LSD method and, to a lesser extent, Duncan's method are frequently criticized for failing to control the Type I error rate, specifically the experimentwise error rate. In other words, the overall probability of falsely declaring some pair of means different, when in fact they are equal, is substantially larger than the stated level. In fact, many journals explicitly discourage the use of Duncan s multiple ranges test. PROC GLIMMIX provides several multiplicity options with varying levels of experimentwise error control. In addition to the LSD, i.e., no multiplicity adjustment, PROC GLIMMIX has options that enable the use of methods proposed by Bonferroni, Nelson, Scheffe, Sidak, and Tukey. Duncan s adjustment, specifically intended for comparisons between a control and other treatments, in also available. In addition to tablebased adjustments, you can also use the SIMULATE option to obtain simulationbased inference to adjust for multiplicity.
You can request the various multiple comparison tests with the ADJUST option in the LSMEANS, ESTIMATE, and LSMESTIMATE statements in the GLIMMIX procedure.
Multiple comparison procedures, as described in the previous paragraphs, are useful when there are no preidentified comparisons of special interest. However, in many situations there are specific comparisons known to be of particular interest. These are called preplanned comparisons because you can decide to make these comparisons prior to collecting data. Specific preplanned comparisons can be tested using the CONTRAST, ESTIMATE, LSMESTIMATE, or LSMEANS statements in PROC GLIMMIX, as discussed in Section 3.5.5, "Planned Comparisons in a TwoWay Factorial Experiment," later in this chapter. In addition, ESTIMATE, LSMESTIMATE, or LSMEANS statements enable you to estimate the size of the difference and, if desired, compute a confidence interval, as well as merely test whether the difference is zero.
3.4 Comparison of Quantitative Factors: Polynomial Regression
In many research areas, the explanatory factor is not a group of categories but rather chosen amounts of a particular treatment. These could be amounts of fertilizer or water applied to plants, or in industrial settings the temperature or pressure under which components are made. When the levels of the factor are quantitative, researchers are often interested in describing the change in the response over the change in the explanatory factor. Therefore, we want to create a regression model that adequately describes the data. One such method is to use polynomial regression. Often times the relationship between the response variable and the independent variables is a known function, which may be a nonlinear model instead of a linear model. Some such models are logistic functions that are used in the case of doseresponse experiments and others follow a nonlinear function, that is, a function that cannot be expressed as a linear combination of its unknown parameters. Such cases are typically not suitable for polynomial regression unless the nonlinear curve can be adequately approximated by a lowdegree polynomial. More straightforward is a direct nonlinear regression, which we do not cover here.
3.4.1 Analy s is of a Regression Experiment
In many studies, the treatment design consists of quantitative levels, or amounts. Cooking time, drug dosage, irrigation volume, and storage temperature are all examples. Regression is a useful mean comparison tool in conjunction with mixed models. Polynomial regression is especially useful with linear mixed models. The following example illustrates the essential ideas.
In this example, a horticulturist is interested in the effect of added nitrogen to the final dry weight of a particular house plant. The greenhouse has six benches with room for six plants each. To account for a known gradient within the greenhouse resulting from the angle of the sun and the circulation fans, the researcher blocks by bench. The added nitrogen ranges from 0 to 200 mg/kg soil in 40 mg increments. The goal of the study is to estimate a regression equation to predict plant weight for a given amount of nitrogen applied.
In polynomial regression the number of levels of the factor determines the maximum degree polynomial we can fit. If there are T levels, you can fit a polynomial of order T − 1 . However, it is common practice to include additional levels beyond the minimum needed to fit the polynomial that we believe will be sufficient. This enables us to test lack of fit, that is, to confirm that the polynomial that we have chosen is adequate to describe the data (or tell us that our initial belief was incorrect). For example, Figure 3.3 shows a plot of means of the 5 levels of a quantitative treatment factor. If only the two extreme low and high levels, X = 1 and X = 5 were included in the study, it would only be possible to estimate a straight line. The curvilinear nature of the regression model would be missed. If the midlevel X = 3 was also observed, it would be possible to diagnose the lack of fit from linear, or you could fit a quadratic regression. However, with only three levels, you would not be able to see that the actual relationship between treatment level and response.
Figure 3.3: Possible Regression Estimates as a Function of Treatment Design
The point is that, given the number of levels of the regression factor, there is a polynomial degree that can be fit, and also one that can be confirmed by testing some hypotheses. Table 3.1 shows what is possible for given treatment designs.
Table 3.1: Degree Polynomial Possible
Number of levels
Fit degree
Maximum Confirm degree
2
Linear
3
Quadratic
Linear
4
Cubic
Quadratic
5
Quartic
Cubic
6
Quintic
Quartic
This experiment has 6 levels of nitrogen (0, 40, 80, 120, 160, and 200), so we could fit a fifthdegree polynomial, though we can only confirm a fourthdegree polynomial. To develop some intuition as to which degree might best fit, we can begin the analysis by treating nitrogen as a classification variable as in ANOVA and obtain the LS Means plot.
Program
Use Program 3.4 to create a mean plot and overall effect test.
Program 3.4
proc glimmix data=plants;
class bench n;
model plant_wt=n;
random intercept / subject=bench;
lsmeans n / plot=meanplot(join);
run;
The join option causes the means to be connected, enabling you to visualize the possible polynomial fit.
Results
The mean plot is shown in Figure 3.4, and the overall test of nitrogen effect is shown in Output 3.5.
Output 3.5: Variance Estimates and Overall Test of Nitrogen Effect for Plant Weight Data
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
bench
1.3723
1.1312
Residual
2.4419
0.6907
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
N
5
25
15.01
.0001
The table of covariance parameter estimates in Output 3.5 gives the variance among benches ( σ ^ b e n c h 2 = 1.37 ) and the variance among experimental units within benches ( σ ^ r e s i d u a l 2 = 2.44 ) . Interpret the bench variance as a measure of the variability attributable to the greenhouse bench gradient. Note that for small experiments the standard error is a misleading statistic. That is, the standard errors of a variance estimate is useful only when you have thousands of observations. If you want to test the bench variance, use the COVTEST statement to obtain a likelihood ratio test, as shown in Chapter 6 .
The F value for overall test of no nitrogen (N) effect is 15.01 with p value 0.0001. Figure 3.4 enables you to visualize the nitrogen effect.
Figure 3.4: Least Squares Mean Plot for Nitrogen Experiment
In this case we see an Scurve, common in doseresponse experiments. In polynomial regression, this suggests a cubic function. Although a nonlinear regression is an alternative, we will continue with the polynomial for illustrative purposes. We have three methods that we can use to choose an adequate model to fit the data: (1) directly fitting the maximum polynomial using the Type I hypotheses, (2) creating a lackoffit variable and fitting the degree polynomial we believe is sufficient with the lackoffit testing sufficiency, and (3) using orthogonal polynomial contrasts. Notice in the ensuing examples that the model must account for all five degrees of freedom associated with nitrogen. We do this to ensure that all tests use the bench and residual variance estimates as shown in Output 3.5, that is, the appropriate variance estimates associated with the experiment design of this study.
Before we use one of the three methods to fit the regression, it is instructive to fit the model using the PROC GLIMMIX defaults. Instead of making nitrogen a classification variable as in ANOVA, we omit it from the class statement and include all of the polynomial terms in the model statement. With six levels we could fit up to a fifth degree polynomial, so we will need all five terms in the initial model statement.
Program
The full polynomial fit is implemented in the model statement in Program 3.5.
Program 3.5
proc glimmix data=plants;
class bench;
model plant_wt = n n*n n*n*n n*n*n*n n*n*n*n*n;
random intercept / subject=bench;
run;
Results
Results are in Output 3.6.
Output 3.6: Selected Output from Direct Fit
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
N
1
26
0.30
0.5860
N*N
1
26
0.30
0.5862
N*N*N
1
26
0.29
0.5947
N*N*N*N
1
26
0.21
0.6503
N*N*N*N*N
1
26
0.13
0.7173
What is striking about the Type III Tests of Fixed Effects table is that none of our five terms are statistically significant at any reasonable level, despite the highly significant overall N effect (Output 3.5), and the plot showing an obvious visual effect of nitrogen. This obvious discrepancy results from the nature of the Type III Tests themselves. The Type III tests test the hypothesis that we can informally but instructively state as, Do you need this variable in the model given the variation of the other variables is already removed? Here this means we can drop any one of the terms in the polynomial and the others will pick up the slack in the fit. One reasonable next step would be to sequentially drop the highest order terms in the model until a parsimonious fit is found. Useful in this regard are Type I Sums of Squares, and the next section discusses the difference between Type I and Type III Sums of Squares.
3.4.2 Type I versus Type III Sums of Squares
Before illustrating the recommended approach to direct regression, we first digress into the different hypothesis testing options that are available within the GLIMMIX procedure. By default, PROC GLIMMIX (and PROC MIXED) use Type III Tests of Fixed Effects. The reason for this default is illustrated in Chapter 2 , Section 2.4.1. However, Type III tests are not the only type of test available. In fact, there are three types of tests, Type I, Type II and Type III, available in SAS mixed model procedures. The section focuses on Type I and Type III. It is important to understand the difference, and how it affects inference and interpretation of the analysis.
Type III sums of squares are one type of partial sums of squares. Type III sums of squares measure the effect of a factor after accounting for the effects of all other terms in the model. For example, if we have two factors, The Type III Sums of squares for A and B are as follows:
SS(A  B), read as sum of squares for A after accounting for B
SS(B  A)
In other words, a Type III test of A tests the effect of A after accounting for the effect of B, even if A appears first in the MODEL statement. Because each test is dependent on all of the other terms in the model, the order in which we specify the terms in the model statement does not matter. As we saw in Section 2.4.1, with blocked designs the Type III tests enable you to test treatment effect adjusted for the impact of missing data or an incomplete block structure. However, in the case of regression, if you let A correspond to the linear effect and B to the quadratic, then the Type III test of linear is actually a result of SS(linear  quadratic), i.e. testing linear after fitting quadratic may not be useful.
Type I tests, on the other hand, are based on sequential sums of squares. That is, each test is the effect of that factor after accounting for the prior effects specified in the model. If we specify our model as
model y = a b;
then the Type I sums of squares are as follows:
SS(A) for factor A;
SS(BA) for factor B.
However, if we specify the model as
model y = b a;
then the type I sums of squares are as follows:
SS(B) for factor B;
SS(AB) for factor A;
The order of the effects in the model affects the results of the tests from the Type I sums of squares. In Section 2.4.1 we saw that if you fit
model y = trt block / htype=1;
you get a test of the unadjusted means, i.e. tests that are confounded with the structure of missing data. In general, when testing CLASS effects, this is an inappropriate test. However, with regression, if you fit
model y = n n*n / htype=1;
the tests correspond to the following:
SS(N)
SS(N*N  N)
i.e. linear regression over and above a simple intercept model, followed by quadratic over and above linear. In other words, it answers the question: Is there a linear effect in addition to not accounting for N at all, and is there a quadratic effect in addition to the linear effect? The Type I sums of squares clearly provides the answer.
Which tests are used depends upon the goal of the analysis. In most cases when we are dealing with class variables, the Type III sums of squares are preferred because they account for the design structure and possible effects of missing data. Type III tests are also handy for large designs, as they immediately provide indication of the contribution of each effect beyond the others. However, for standard polynomial regression, where order is obvious  i.e. linear first, then quadratic, etc.  the sequential Type I tests make sense. For general regression, where the standard polynomial approach is not adequate and order is not obvious, both Type 1 and Type 3 tests can be useful.
3.4.3 Direct Fit of Polynomial
The direct fit method utilizes the Type I sequential sums of squares tests to determine the degree which best fits the data.
Program
Writing out all of the terms as we did in Section 3.4.1 can be tedious. Instead, you can use a shortcut using the vertical bar symbol,  as shown in the MODEL statement of the PROC GLIMMIX statements in Program 3.6.
Program 3.6
proc glimmix data=plants;
class bench;
model plant_wt = nnnnn / htype=1;
random intercept / subject=bench;
run;
The HTYPE=1 option gives you Type I tests, overriding the TYPE III test default. The vertical bar notation generates a linear predictor identical to the MODEL statement with N, N*N, etc. explicitly written as in Section 3.4.1. Specifying the MODEL statement either way will produce identical results. We will make use of this shortcut symbol frequently when analyzing factorial designs.
Results
Selected results of this program appear in Output 3.7.
Output 3.7: Selected Output from Direct Fit Using htype=1
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
bench
1.3723
1.1312
Residual
2.4419
0.6907
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
N
1
26
67.14
.0001
N*N
1
26
1.42
0.2442
N*N*N
1
26
4.38
0.0462
N*N*N*N
1
26
1.97
0.1718
N*N*N*N*N
1
26
0.13
0.7173
Interpretation
The covariance parameter estimates table in Output 3.7 shows the same estimates that we obtained from the default fit. The hypothesis type for the fixed effects does not affect the fit of the model itself.
Because the Type I tests of fixed effects are sequential tests, we begin at the bottom of the table to determine whether that degree term is significant in the model. We eliminate higherorder terms until we reach one that is significant. At that point we stop and retain that term and all lowerorder terms in the model (whether the lowerorder terms are significant or not). In this case we find that neither the fifth nor fourth degree terms add significantly to the model after the lowerorder terms are included ( p values .7173 and .1718 respectively). The cubic term is significant at the = 0.05 level. This is consistent with our inspection of the mean plot that a thirddegree polynomial describes the nitrogen effect for these data.
Knowing the cubic polynomial is the best solution, we refit the data using only those terms and request the solution. We can use the solution formula to predict plant weights for nitrogen levels not observed but within the range of the experiment.
Program
Program 3.7 obtains the solution.
Program 3.7
proc glimmix data=plants;
class bench;
model plant_wt = nnn / htype=1 solution;
random intercept / subject=bench;
run;
Output 3.8: Solution Output from Fitting the Cubic Polynomial
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
bench
1.3707
1.1308
Residual
2.4517
0.6673
Solutions for Fixed Effects
Effect
Estimate
Standard Error
DF
t Value
Pr t
Intercept
6.1791
0.7879
5
7.84
0.0005
N
0.04113
0.03042
27
1.35
0.1875
N*N
0.000855
0.000378
27
2.26
0.0319
N*N*N
2.59E6
1.241E6
27
2.09
0.0463
Interpretation
Notice that the variance component estimates change somewhat from those obtained from the full model. This is because the higher order polynomial terms that were in the model have now been pooled with the random effects. In the Solutions for Fixed Effects table, our primary interest is in the regression coefficient ESTIMATES and their standard error. Hypotheses about them have already been tested and are the motivation for fitting this reduced model. Therefore, the test statistics and associated p values in this table are of little interest. Our final prediction formula is
Plant Weight ^ = 6.1791 − 0.04133 n + 0.000855 n 2 − 0.00000259 n 3
where n is the amount of nitrogen in mg/kg soil.
With levels 0 through 200, the resulting regression coefficients are quite small. You might consider rescaling the N levels, e.g. dividing them by 10 and fitting the model with N=0, 4, 8, 16 and 20. The resulting regression equation would then be 6.1791 − 0.4133 x + 0.08554 x 2 − 0.00259 x 3 , where x = n / 10 .
Direct Fit of Polynomial with Lack of Fit
With only six levels to our explanatory factor, nit r ogen, it is not difficult to write the model to directly fit the full polynomial. However, it is often useful to include only the polynomial terms believed to explain the treatment effect and pool the remaining higher order terms into a single term, used to test lack of fit of the proposed polynomial model; hence, the term is called lack of fit in regression theory. For example, in the plant weight example, we believe a cubic regression will be adequate, so the lack of fit term pools the quartic and quintic effects.
In the plants data set, we currently only have the regression variable, n , for the level of nitrogen. In order the compute a lack of fit term, we create a new variable  call it cn (for the class version)  in the following DATA step.
data plants;
set plants;
cn = n;
run;
Now with these two versions of the nitrogen variable, we can directly test our thirdorder polynomial for lack of fit. We again make use of the Type I hypothesis tests being sequential tests, so that including the classification variable at the end of the model becomes a test of the remaining degrees of freedom for nitrogen after fitting the cubic polynomial.
Program
Program 3.8 shows the needed GLIMMIX statements.
Program 3.8
proc glimmix data=plants;
class bench cn;
model plant_wt = nnn cn / htype=1;
random intercept / subject=bench;
run;
The variable n defines the regression (in this case nnn specifies cubic regression) and must not be defined as a class variable. On the other hand, the cn variable does appear in the class statement. Including cn in the model statement after nnn in conjunction with the HTYPE=1 option causes cn to act as a lack of fit term.
Results
Output 3.9 shows the results.
Output 3.9: Fixed Effect Tests Using a Lack of Fit variable.
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
N
1
25
67.14
.0001
N*N
1
25
1.42
0.2446
N*N*N
1
25
4.38
0.0466
cn
2
25
1.05
0.3635
Interpretation
Our interest is in whether the lack of fit test enables us to conclude that the cubic polynomial is sufficient to describe the effect of nitrogen on plant weight. We see that the test for cn , the lack of fit, is a two degree of freedom test because it combines the remaining two degrees of freedom for nitrogen after we have fit the cubic polynomial. This is the combined test for the fourth and fifth order terms that we included directly in the previous section. Here we have that there is no significant lack of fit, p value 0.363. We can conclude that the cubic polynomial provides an adequate fit for these data. We would then continue as in the previous section.
3.4.4 Orthogonal Polynomial Contrasts
Using a direct fitting method as presented in the previous sections exploits the capabilities of contemporary linear model software. An alternative, developed in the precomputer era, uses orthogonal polynomial contrastscontrasts specifically defined for regression analysisto partition the treatment sums of squares.
Orthogonal polynomial contrasts are contrasts defined to correspond to each order polynomial regression. In the plant weight example, this means five contrasts, one to test linear effect, one for quadratic, one for cubic, etc. By definition, coefficients in contrast statements must sum to zero. To make a set of contrasts orthogonal , the sum of the multiplied coefficients of any two contrasts in the set must sum to zero.
To illustrate, suppose we have a treatment design with three quantitative levels. Therefore, we have two degrees of freedom and can write two contrastsone to test the linear trend and one the quadraticalternatively interpreted as lack of fit from linear regression. Bearing in mind that the null hypothesis for a contrast is no effect, we determine coefficients for a linear combination of means that demonstrate no linear effect of the factor. − 1 μ 1 + 0 μ 2 + 1 μ 3 is a possible combination. The coefficients are along a straight line on a yaxis (where the means are ordered along an xaxis). A linear slope equal to zero is equivalent to μ 1 = μ 3 , whereas μ 1 ≠ μ 3 indicates a straight line with a nonzero slope, i.e. a linear effect.
For the quadratic effect test, we can picture our same axes with a quadratic curve instead of a straight line. Because contrast coefficients must sum to 0, either 1 μ 1 − 2 μ 2 + 1 μ 3 or inversely, − 1 μ 1 + 2 μ 2 − 1 μ 3 would work for the test of a quadratic effect (and there are others). That is, departure from linear is equivalent to μ 2 ≠ ( μ 1 + μ 3 ) / 2 . We can also see that either of these contrasts is orthogonal to the linear contrast, i.e. ( − 1 ) ∗ 1 + 0 ∗ ( − 2 ) + 1 ∗ 1 = 0 and ( − 1 ) ∗ ( − 1 ) + 0 ∗ 2 + 1 ∗ ( − 1 ) = 0 .
Tables of orthogonal polynomial contrast coefficients for varying numbers of levels of the regression factor are available in textbooks and online. These tables presume equally spaced levels. If your chosen levels are not equally spaced, you can obtain the coefficients using the ORPOL function in PROC IML. This procedure is explained in SAS for Linear Models, 4 th Edition in Chapter 7 . Chapter 5 , section 5.7 of this book also shows an example that uses orthogonal polynomials for logarithmically rather than equal spaced levels.
In the plant weight example, there are six, equally spaced levels of nitrogen, meaning we can construct five mutually orthogonal contrasts to fully test the model. To implement contrasts in SAS linear mixed model software, use the basic form of the CONTRAST statement:
CONTRAST label' effectname effectcoefficients;
Contrasts use the least squares means, so nitrogen must be treated as a class variable in this program. Program 3.9 shows PROC GLIMMIX statements to implement regression analysis using orthogonal polynomial contrasts.
Program 3.9
proc glimmix data=plants;
class bench n;
model plant_wt = n;
random intercept / subject=bench;
contrast 'Linear' n 5 3 1 1 3 5;
contrast 'Quadratic' n 5 1 4 4 1 5;
contrast 'Cubic' n 5 7 4 4 7 5;
contrast 'Lack of fit' n 1 3 2 2 3 1,
n 1 5 10 10 5 1;
run;
The program includes separate CONTRAST statements to test linear, quadratic and cubic effects. Instead of separate quartic and quintic statements, this program shows how to combine quartic and quintic contrasts into a single twodegreeoffreedom test labeled lack of fit. Output 3.10 shows the results.
Output 3.10: Results of Orthogonal Polynomial Contrasts
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
bench
1.3723
1.1312
Residual
2.4419
0.6907
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
N
5
25
15.01
.0001
Contrasts
Label
Num DF
Den DF
F Value
Pr F
Linear
1
25
67.14
.0001
Quadratic
1
25
1.42
0.2446
Cubic
1
25
4.38
0.0466
Lack of fit
2
25
1.05
0.3635
The covariance parameter estimates and Type III test of overall N effect are identical to those in Output 3.5, as one would expect given that the CLASS and MODEL statements are the same. The contrasts results break the N effect into its regression components. Notice that the F and p values for linear, quadratic, cubic and lack of fit are identical to those in Output 3.9, as is their interpretation.
It is possible to use the estimates from the contrasts to obtain the coefficients for the polynomial regression line (see Snedecor & Cochran and Steel & Torrie). However, in most cases, it is easier to get the estimated regression equation using the direct method.
3.5 Mean Comparisons in Factorial Designs
Two basic aspects of the design of experiments are treatment structure and error control. Choosing between randomization schemes, such as completely randomized, randomized blocks, and so on, is part of error control. This aspect is sometimes called the experiment design . On the other hand, the structure of the treatments consisting of the factor(s) and factor levels that are to be observed is called the treatment design . The factorial treatment design is one of the most important and widely used treatment structures. The factorial treatment design can be used with any randomization scheme, or experiment design. This section introduces the analysis of variance and mean comparison procedures used with factorial experiments.
A factorial treatment design consists of two or more treatment factors. The generic notation for factorial experiments is as follows. Factors are denoted by capital letters, beginning with A. A twofactor factorial has factors A and B. The third factor of a threefactor factorial is called factor C. Levels of each factor are denoted by lowercase letters. Factor A has a levels, factor B has b levels, and so forth. The number of levels are used to describe the design. A twofactor factorial is called an a × b factorial treatment design. Note that all factors have at least two levels. If all factors have the same number of levels, the design is referred to as a l M factorial, where l denotes the number of levels per factor and M denotes the number of factors. A twofactor experiment with a = b = 2 is called a 2 2 factorial treatment design.
A complete factorial experiment consists of all possible combinations of levels of each factor. Levels can refer to numeric quantities of variables, such as pounds of fertilizer or temperature in degrees, as well as qualitative categories, such as names of plant varieties or drug types. An example of a factorial experiment is a study using nitrogen, phosphorus and potassium, each at three levels. Following the naming convention described above, such an experiment is called a 3 3 factorial, and has 3 3 = 27 treatment combinations.
Factorial experiments can be used to investigate several types of treatment effects. These are as follows:
simple effects (i.e. difference in mean response of levels of one factor holding the other factor or factors constant at a given level)
interactions ( i.e. difference in simple effects of one factor that result from changing levels of another factor; that is, do the simple effects remain constant no interaction or do they change, meaning interaction ? )
main effects (i.e. differences in mean response to levels of one factor averaged over all the levels of the other factor)
The simplest factorial treatment design is the 2 2 or 2 2 factorial. It is a good place to start to establish the essentials of understanding and working with factorial experiments.
3.5.1 Factorial: The Basics
The basics for factorial experiments include the following:
formal notation for the factorial treatment mean
formal definitions of simple effects, interaction and main effects in terms of the treatment means
formal definition of the statistical model for the factorial treatment design
introduction to contrasts that are specifically useful for factorial experiments and their connection with terms of the model
SAS commands that are specific to factorial structure
Refer to the mean for a factorial experiment as the A B mean or, more precisely, the A i × B j mean, where i = 1 , 2 references the levels of factor A and j = 1 , 2 references the levels for factor B. For modeling and effect definition purposes, denote the A i × B j mean as μ i j .
Simple effects
The simple effect of A holding B constant at level B 1 is defined as μ 11 − μ 21 , the difference between the means of the two levels of A. The symbolic language A  B 1 , read the simple effect of A given B 1 is often used. Similarly, A  B 2 , simple effect of A given B 2, often shortened to A given B 2 is defined as μ 12 − μ 22 ; B  A 1 , B given A 1 is defined as μ 11 − μ 12 ; and, B given A 2 is defined as μ 21 − μ 22 .
Interaction
Interaction occurs when the simple effect given one level of the other factor differs from the simple effect given the other level of the other factor. That is, interaction occurs if A  B 1 ≠ A  B 2 or, equivalently, B  A 1 ≠ B  A 2 . The measure of interaction is thus formally defined by the difference between simple effects, for example, the difference A  B 1 − A  B 2 = μ 11 − μ 21 − ( μ 12 − μ 22 ) , or, alternatively, B  A 1 − B  A 2 = μ 11 − μ 12 − ( μ 21 − μ 22 ) . Both expressions can be rewritten μ 11 − μ 12 − μ 21 + μ 22 . Interaction exists when simple effects are not equal; that is, when μ 11 − μ 12 − μ 21 + μ 22 ≠ 0 . Simple effects equal, that is μ 11 − μ 12 − μ 21 + μ 22 = 0 , defines the condition of no interaction .
Notice that μ 11 − μ 12 − μ 21 + μ 22 is a contrast. It is the simplest form of an interaction contrast , an important tool for the analysis of factorial experiments. We will use interaction contrasts in subsequent examples.
The interaction plot is an essential visual aid for identifying and interpreting interaction. The following three plots are representative of typical relationships among A B means. Another interpretation of a basic twoway interaction is as a difference of the differences, or the difference of the simple effects. When the secondorder difference is zero, it indicates that the firstorder differences remain constant across the levels of the two factors.
Figure 3.5: Mean Plot Indicating no Interaction
Figure 3.6: Mean Plot Indicating Interaction
Note: A  B 0 ≈ 0 and A  B 1 ≫ 0.
Figure 3.7: Mean Plot Indicating Interaction
Note: A  B 0 < 0 and A  B 1 > 0.
Main effects
Main effects means, also called marginal means, are means for a given level of one factor averaged over the levels of the other factor. For example, the main effect mean for A 1 is
μ ¯ 1 · = ( 1 / b ) ∑ j = 1 b μ 1 j
where b denotes the number of levels of factor B. For a 2 2 factorial, b = 2 , so the main effect means reduces to
μ ¯ 1 · = ( 1 / 2 ) ∑ j = 1 2 μ 1 j
The dotnotation is standard shorthand for summing over all levels of the subscript replaced by a dot ( ). μ j · means sum over all j , that is all levels of factor B; μ · j means sum over all i , that is all levels of factor A. μ · · denotes the grand total, summing over everything. Dotnotation with a bar , μ ¯ i · means divide the total by the number of levels of j , that is the number of levels of factor B, to obtain the mean.
You can visualize the main effect means using the interaction plot. For example, plot the main effect means for A 0 and A 1 on Figure 3.5 to provide Figure 3.8: A 0 main effect mean.
Figure 3.8: Plot in Figure 3.5 with Main Effect Means Added
A main effect is defined as the difference between main effect means for a given factor. For a 2 2 factorial, the main effect of A is μ ¯ 1 · − μ ¯ 2 · , and the main effect of B is μ ¯ · 1 − μ ¯ · 2 .
Notice that main effects convey useful information about a given factor s effect only if the interaction is negligible. In the example plots above, the main effects are meaningful in Figure 3.5 but not in Figures 3.6 and 3.7. Reporting main effects in the presence of a nonnegligible interaction is at best misleading and should not be attempted. For this reason, always start the analysis of a factorial by assessing the interaction. If the interaction is negligible, proceed to the main effects. If the interaction is not negligible, focus on simple effects only . Interpreting main effects in the presence of a nonnegligible interaction is typically inappropriate.
Main effects, like interaction, define a contrast over the A B treatment means. You can visualize this with Table 3.2.
Table 3.2: Contrast over the A B Treatment Means
Factor A Levels
1
2
Factor B Levels
1
2
1
2
A main effect
1
1
1
1
B main effect
1
1
1
1
A B Interaction
1
1
1
1
The coefficients for the three contrasts appear in the unshaded part of the table. Notice two things:
The contrasts are mutually orthogonal and form a complete set: therefore, they partition the three treatment degrees of freedom into three singledegreeoffreedom components.
The interaction contrast consists of the crossproducts of the coefficients of the main effects contrasts. When there are multiple main effect degrees of freedom, you can use this result to form interaction contrasts from all possible crossproducts of maineffect contrasts defined on A and B.
Model for the Factorial Treatment Design
The above contrasts are important for the construction of the model for factorial experiments. For a completely randomized experiment design, the model can be expressed in one of two ways.
Cell means model is as follows:
y i j k = μ i j + e i j k
where y i j k denotes the observation on the k th experimental unit assigned to treatment A i × B j , and e i j k ~ NI ( 0 , σ 2 ) . Equivalently, you can give this model in probability distribution form as y i j k ~ N ( μ i j , σ 2 ) .
Effects model is as follows:
y i j k = μ + α i + β j + α β i j + e i j k
where denotes the intercept, i denotes the main effect of A, j denotes the main effect of B, and ij denotes the A B interaction.
In PROC GLIMMIX, you specify each model as follows. Both use the following:
proc glimmix;
class a b;
The MODEL for the cell means model is
model y = a*b/noint;
The NOINT option suppresses the intercept and A * B specifies that the only fixed effect in the model is the A i × B j effect. If you do not include the NOINT option, PROC GLIMMIX fits the model y i j k = μ + α β i j + e i j k .
The MODEL for the effect means model is the following:
model y = a b a*b;
The terms A and B specify the main effects α i and β j , respectively, and A * B specifies the interaction effect α β i j . Alternatively, you can use the following syntax:
model y = ab;
The vertical bar () tells PROC GLIMMIX to obtain all main effects and all possible products of the effects listed.
The following statements are useful for factorial experiments.
Create an interaction plot.
lsmeans a*b/plots=meanplot(sliceby=a join cl);
The option PLOTS=MEANPLOT produces a plot of the A × B means. SLICEBY=A places the levels of B on the Xaxis. JOIN connects the means for each level of A. SLICEBY=B places the levels of A on the Xaxis; JOIN would then connect the B means. The SLICEBY option is essential for an interaction plot. The JOIN command is optional  connecting the means often helps you visualize important features of the treatment factor effects, especially (but not exclusively) the A × B interaction. CL provides error bars defined in terms of a 95% confidence interval on the estimated means. There is an ALPHA option, not shown here, that enables you to change the confidence level of the interval.
Obtain main effect means and differences.
lsmeans a b / diff;
The preceding LSMEANS statement provides main effect means and their estimated differences. As mentioned above, you generally want to use this statement only if the A × B interaction is negligible, i.e. nonsignificant or too small to be considered scientifically important. Otherwise, use the following statement for simple effects
lsmeans a*b/diff slice=(a b) slicediff=(a b);
The preceding LSMEANS statement provides A B means. DIFF calls for all possible pairwise differences. SLICE computes an F test for the hypothesis that all B means are equal one level of A at a time (SLICE=A) or all A means are equal one level of B at a time (SLICE=B). SLICEDIFF computes all possible pairwise simple effects, either among pairs of B one level of A at a time or vice versa. You can define SLICE and SLICEDIFF on A only, B only or both (the latter is shown above).
The other PROC GLIMMIX statement useful for factorial experiments is the LSMESTIMATE statement. LSMESTIMATE has relatively little value for 2 2 factorials but comes into its own when you have more levels or more factors.
Examples of PROC GLIMMIX for 2 2 factorials with illustrations of each of these statements are given in the SAS data sets intro1, intro2, and intro3, which were used to create Figures 3.5, 3.6, and 3.7. The use and interpretation are straightforward when you have only two levels per factor. The statements become a bit more involved when you have more than two levels. The next section illustrates their use. The next example also introduces the LSMESTIMATE statement in section 3.5.6.
3.5.2 A More Complex TwoWay Factorial Experiment
Now consider a factorial with more than two levels per factor. The primary complication results from the fact that all interactions, main effects and simple effects have multiple degrees of freedom. This means that the overall tests defined by the usual effects modelA B interaction, A and B main effectsyield inherently vague information. To work with these designs, you often must break the interaction into meaningful onedegreeoffreedom components, use these to determine nonnegligible aspects of the interaction, and proceed as appropriatei.e. with main effects or simple effects depending on the interaction results. Again, these effects should be broken into single degreeoffreedom terms that are informative and interpretable with respect to the study s objectives.
As an example, suppose three seed growthpromoting methods (METHOD) are applied to seed from each of five varieties (VARIETY) of turf grass. Six pots are planted with seed from each METHOD × VARIETY combination. The resulting 90 pots are randomly placed in a uniform growth chamber, and the dry matter yields (YIELD) are measured after clipping at the end of four weeks. Because the concern in this experiment is specifically about these five varieties and three growth methods, VARIETY and METHOD are regarded as fixed effects. A complete description of the experiment, e.g. for a scientific article, includes the treatment design, a 3 5 factorial, and the randomization scheme, a completely randomized design.
Data are recorded in a SAS data set called GRASSES. For convenience, the six replicate measurements are recorded in multivariate form, i.e. as Y1Y6 in the same data line. However, SAS mixed model software requires that data be in univariate form, that is each replicate observation on YIELD must appear in a separate line. Therefore, the data set GRASSES must be rearranged to permit analysis. This data manipulation would not be necessary if the values of YIELD had originally been recorded using one data line per replication.
Program
Use Program 3.10 to convert the data from multivariate to univariate form.
Program 3.10
data factorial;
set grasses;
drop y1y6;
yield=y1; output;
yield=y2; output;
yield=y3; output;
yield=y4; output;
yield=y5; output;
yield=y6; output;
run;
This DATA step creates a new data set, named FACTORIAL, containing the rearranged data.
It is often helpful to begin the analysis with a visual inspection of the interaction plot. This is particularly true of complex factorial experiments. You can use the PLOT option in PROC GLIMMIX to construct the interaction plot. Use the following statements:
proc glimmix data=factorial;
class method variety;
model yield = methodvariety;
lsmeans method*variety / plots=meanplot(sliceby=method join);
run;
The LSMEANS statement instructs SAS to compute the means and standard errors of the means of each METHOD×VARIETY combination. The MEANPLOT option creates an interaction plot of the LS means. SLICEBY=METHOD and JOIN work together to connect the VARIETY means for each METHOD enabling you to visualize METHOD and VARIETY effects.
Results
This plot appears in Figure 3.9.
Figure 3.9: Plot of Cell Means for Factorial Experiment
You could alternatively slice by VARIETY and draw a complementary interaction plot joining variety means with lines to show their profiles over methods. Which one you draw (or both) is a matter of personal preference and how you like to think about the interactions.
The interaction plot suggests that the differences between METHOD means depend on which VARIETY is used. This should be formally tested, however, since the graph only shows treatment means without their underlying variation.
Program
Run Program 3.11 to compute the analysis.
Program 3.11
proc glimmix data=factorial;
class method variety;
model yield=method variety method*variety;
run;
Both treatment factors, METHOD and VARIETY are classification variables and thus appear in the CLASS statement. The MODEL statement specifies that the analysis of YIELD is to contain sources of variation METHOD, VARIETY, and METHOD*VARIETY.
Results
Output 3.11 shows the results.
Output 3.11: Analysis of Variance for Factorial Experiment
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
method
2
75
24.25
.0001
variety
4
75
0.14
0.9648
method*variety
8
75
2.38
0.0241
Interpretation
Note that the METHOD*VARIETY effect is significant at the p = .0241 level, confirming the apparent interaction observed by visual inspection in Figure 3.9. Notice, however, that this test does not give any further information about the interaction. In fact, closer inspection of Figure 3.9 suggests that the interaction has many facets. For example, limiting attention to varieties 1 and 2 only, there is little visual evidence of interaction. On the other hand, limiting attention to varieties 3, 4 and 5, there is strong visual evidence of interaction, and the nature of the interaction is quite different for varieties 3 and 4 than it is for varieties 4 and 5. The next sections show how to isolate these aspects of the interaction and their associated simple effects.
3.5.3 Multiple Comparisons for a Factorial Experiment
As a general rule, if the interaction is negligible , you can perform multiple comparisons on the main effect means using the LSMEANS statement. In this case, you would use the following statement:
lsmeans method variety;
along with options shown earlier in this chapter for oneway treatment designs. On the other hand, if the interaction is not negligible, as is the case in this example, you must focus on breaking the interaction into its component parts, and then on associated simple effects that follow from the interaction components. The following sections show examples of what one might do. These examples are not intended to be exhaustive, but are intended to illustrate how to use the CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements to obtain various inferential statistics likely to be of interest when analyzing a factorial experiment.
Before proceeding, two comments about what not to do . These are practices that either have been common in the past when statistical software was less sophisticated, or result from inadequate understanding of appropriate statistical practice for factorial treatment designs.
First comment: In the past, it was common practice to rerun the analysis with a BY statement, resulting in one analysis of variance table per level of the BY variable. However, this is very inefficient, because the error df for each analysis can be quite small. In essence, you are throwing out most of the data for each analysis. For example, if you do a separate analysis BY VARIETY you get ANOVAs with 2 df for METHOD and 10 df for error. This seriously reduces the power of the resulting tests. Options in PROC GLIMMIX specifically for simple effect analysis enable you to avoid this problem. The BYgroup approach also directly estimates distinct residual variance components for each level of VARIETY. Such heterogeneous variances may be warranted for some data sets, and it turns out that you can also accommodate this kind of model in a single analysis on all of the data, which is the recommended approach. See later chapters on how to set up and run such heterogeneous variance modelsthey effectively weight observations according to the inverse of their estimated variance. In general, use BY groups only when you are sure data from distinct BY groups are independent, or when you have a very large number of groups, for example, in a genomics experiment.
Second comment: The LSMEANS statement has several options for multiple comparison tests. LSMEANS computes both main effect means and factorial treatment combination means such as METHOD*VARIETY. It will also compute multiple comparison tests for these means but with the following caveat. Multiple comparisons for testing all possible pairwise differences among treatment combination means in a factorial experiment are generally considered to be inappropriate. Several authors have written articles critical of the frequent misuse of such procedures. See, for example, Chew (1976) and Little (1978). The main point of these objections is that with factorial treatment designs, the main focus should be on interactions first, then simple effects or main effects (but not both) depending on whether the interaction is negligible or not. Multiple comparisons, when applied indiscriminately to all possible differences, tend to obscure essential information contained in the data, and make interpretation needlessly complicated and confusing. Instead, you should proceed using one of more of the following approaches.
3.5.4 Multiple Comparisons of METHOD Means by VARIETY Using SLICE and SLICEDIFF
One approach to breaking a nonnegligible interaction into its component parts involves testing and estimating simple effects. The PROC GLIMMIX LSMEANS statement has two options for doing this. SLICE enables you to test overall simple effects of one factor at a specific level of the other factor. SLICEDIFF enables you to estimate pairwise simple effect differences. In this section, we illustrate each in turn. First, the SLICE option.
Suppose you want to test the overall simple effects of METHOD for each variety, i.e. test H 0 : μ A j = μ B j = μ C j for each variety j = 1 , 2 , 3 , 4 , 5 . Do this with the SLICE option by including the following statement after the MODEL in the PROC GLIMMIX program given above:
lsmeans method*variety/slice=variety;
The SLICE statements produce F tests for simple effects. For example, SLICE=VARIETY causes a separate F statistic to be computed for the METHOD effect at each VARIETY. Note that you can have multiple slices in the LSMEANS statement. For example, you could use either of the following statements if you wanted to test the overall simple effect of VARIETY for each METHOD as well as the overall effect of METHOD for each VARIETY:
lsmeans method*variety/slice=variety slice=method;
lsmeans method*variety/slice=(variety method);
Results
Only the results for SLICE=VARIETY are shown here. They appear in Output 3.12.
Output 3.12: SLICE Option to Test Simple Effect of METHOD at Each VARIETY
Tests of Effect Slices for method*variety Sliced By variety
variety
Num DF
Den DF
F Value
Pr F
1
2
75
3.41
0.0383
2
2
75
3.53
0.0341
3
2
75
4.90
0.0100
4
2
75
14.31
.0001
5
2
75
7.63
0.0010
Interpretation
You can see that the magnitudes of the METHOD effects vary among the VARIETIES. You can also see that there is a statistically significant METHOD effect for every VARIETY. Unfortunately, the SLICE option does not reveal any further detail about the simple effects. To do this, additional mean comparisons are required.
Program
To do this, you use the SLICEDIFF option as shown with Program 3.12.
Program 3.12
proc glimmix data=factorial;
class variety method;
model yield = methodvariety;
lsmeans method*variety/slice=variety slicediff=variety cl adjust=tukey;
run;
The SLICEDIFF option provides all possible pairwise differences between levels of one factorin this case METHODfor each level of the factor named in the SLICEDIFF= option. Notice that you can include SLICE and SLICEDIFF options in the same statement. However, you can run SLICEDIFF without SLICE and vice versa. The CL option causes PROC GLIMMIX to compute 95% confidence limits for each simple effect difference. You can use multiplicity adjustments if desired. There are three METHODs and hence three possible pairwise differences among method means. Therefore, there are three simple pairwise simple effects for METHOD for each VARIETY. If you use a multiplicity adjustment, SLICEDIFF applies it on a per VARIETY basis. That is, the adjustment is for the three comparisons within each VARIETY, not for the 15 comparisons total for all varieties. Output 3.13 shows the SLICEDIFF output.
Results
For space reasons, the results are broken into Output 3.13 with no multiplicity adjustment and Output 3.14 with the TUKEY adjustment.
Output 3.13: Confidence Limits for Simple Effect Differences between METHOD by VARIETY Unadjusted
Simple Effect Comparisons of method*variety Least Squares Means By variety
Simple Effect Level
method
_method
Estimate
Standard Error
DF
t Value
Pr t
Alpha
Lower
Upper
variety 1
a
b
6.6833
2.5593
75
2.61
0.0109
0.05
1.5849
11.7817
variety 1
a
c
3.3500
2.5593
75
1.31
0.1945
0.05
1.7484
8.4484
variety 1
b
c
3.3333
2.5593
75
1.30
0.1968
0.05
8.4317
1.7651
variety 2
a
b
6.6167
2.5593
75
2.59
0.0117
0.05
1.5183
11.7151
variety 2
a
c
1.9333
2.5593
75
0.76
0.4524
0.05
3.1651
7.0317
variety 2
b
c
4.6833
2.5593
75
1.83
0.0712
0.05
9.7817
0.4151
variety 3
a
b
7.6833
2.5593
75
3.00
0.0036
0.05
2.5849
12.7817
variety 3
a
c
5.8167
2.5593
75
2.27
0.0259
0.05
0.7183
10.9151
variety 3
b
c
1.8667
2.5593
75
0.73
0.4681
0.05
6.9651
3.2317
variety 4
a
b
12.4667
2.5593
75
4.87
.0001
0.05
7.3683
17.5651
variety 4
a
c
11.1333
2.5593
75
4.35
.0001
0.05
6.0349
16.2317
variety 4
b
c
1.3333
2.5593
75
0.52
0.6039
0.05
6.4317
3.7651
variety 5
a
b
3.1167
2.5593
75
1.22
0.2271
0.05
1.9817
8.2151
variety 5
a
c
9.7833
2.5593
75
3.82
0.0003
0.05
4.6849
14.8817
variety 5
b
c
6.6667
2.5593
75
2.60
0.0111
0.05
1.5683
11.7651
Output 3.14: Confidence Limits for Simple Effect Differences between METHOD by VARIETY Adjusted
Simple Effect Comparisons of method*variety Least Squares Means By variety Adjustment for Multiple Comparisons: Tukey
Simple Effect Level
method
_method
Estimate
Standard Error
DF
t Value
Pr t
Adj P
Alpha
Adj Lower
Adj Upper
variety 1
a
b
6.6833
2.5593
75
2.61
0.0109
0.0290
0.05
0.5637
12.8029
variety 1
a
c
3.3500
2.5593
75
1.31
0.1945
0.3947
0.05
2.7696
9.4696
variety 1
b
c
3.3333
2.5593
75
1.30
0.1968
0.3983
0.05
9.4529
2.7863
variety 2
a
b
6.6167
2.5593
75
2.59
0.0117
0.0310
0.05
0.4971
12.7363
variety 2
a
c
1.9333
2.5593
75
0.76
0.4524
0.7313
0.05
4.1863
8.0529
variety 2
b
c
4.6833
2.5593
75
1.83
0.0712
0.1669
0.05
10.803
1.4363
variety 3
a
b
7.6833
2.5593
75
3.00
0.0036
0.0101
0.05
1.5637
13.8029
variety 3
a
c
5.8167
2.5593
75
2.27
0.0259
0.0659
0.05
0.3029
11.9363
variety 3
b
c
1.8667
2.5593
75
0.73
0.4681
0.7469
0.05
7.9863
4.2529
variety 4
a
b
12.4667
2.5593
75
4.87
.0001
.0001
0.05
6.3471
18.5863
variety 4
a
c
11.1333
2.5593
75
4.35
.0001
0.0001
0.05
5.0137
17.2529
variety 4
b
c
1.3333
2.5593
75
0.52
0.6039
0.8614
0.05
7.4529
4.7863
variety 5
a
b
3.1167
2.5593
75
1.22
0.2271
0.4465
0.05
3.0029
9.2363
variety 5
a
c
9.7833
2.5593
75
3.82
0.0003
0.0008
0.05
3.6637
15.9029
variety 5
b
c
6.6667
2.5593
75
2.60
0.0111
0.0295
0.05
0.5471
12.7863
Interpretation
Outputs 3.13 and 3.14 give the estimated differences between the treatment combinations (Estimate), the standard error of the difference, and the DF, t statistic, and pvalue for the comparison. By definition, the latter are the test statistics for the LSD mean comparison test. The Adj P gives the Tukey adjusted pvalue for the comparison. Output 3.13 and 3.14 also include the lower and upper limits of the confidence interval, both the standard CI and a Tukey adjusted interval. Add the statement
ods output slices=s slicediffs=sd;
to create output SAS data sets of these tables. These tables can be helpful for sorting results or creating custom plots.
3.5.5 Planned Comparisons in a TwoWay Factorial Experiment: CONTRAST, ESTIMATE, and LSMESTIMATE
You can use the CONTRAST and ESTIMATE statements to make planned comparisons among means in a twoway classification just like you did in the oneway classification. In addition, you can use LSMESTIMATE statements, introduced in this subsection. The LSMESTIMATE statement is similar to the ESTIMATE statement, but is often easier to work with when you have factorial treatment structure.
In the previous section, METHODs were compared separately for each VARIETY using a multiple comparison procedure. The comparisons were made separately for each variety because of the significant METHOD*VARIETY interaction. Using a multiple comparison procedure implicitly means no knowledge of the METHODs was assumed that might suggest specific comparisons among the METHOD means. Now suppose that you know something about the METHODs that suggests a specific comparison. For example, assume that METHOD A is a new technique, and METHODs B and C are industry standard techniques, assumed to be similar. You might want to compare a mean for METHOD A with the average of means for METHODs B and C, referred to here as A vs B, C. In general terms, assume you want to estimate the difference
μ A − 1 2 ( μ B + μ C )
There are several ways to make this comparison:
Compare A with B, C separately for each VARIETY (simple effect).
Compare A with B, C averaged across all VARIETY levels (main effect).
Compare A with B, C averaged across subsets of VARIETY (see text below).
Which comparison is appropriate depends on how the comparison interacts with VARIETY and whether the B and C means are sufficiently similar to justify comparing their average to the mean of A. The first comparison (simple effect) would be appropriate if the comparisons were very different from one VARIETY to the next. The second comparison (main effect) would be appropriate only if the comparison did not interact with VARIETY, that is, if the comparison had essentially the same value (aside from random noise) for all the varieties. The third comparison would be appropriate if there were subsets of varieties such that the comparison did not interact with VARIETY within the subsets.
The following sections illustrate how to use CONTRAST, ESTIMATE, and LSMESTIMATE statements to decide which of the above comparisons is appropriate and then make the comparison. For estimating comparisons, examples will initially focus on LSMESTIMATE because, as mentioned above, it is generally easier to use with factorial experiments. In subsequent sections, equivalent implementation using ESTIMATE statements will be shown. ESTIMATE and LSMESTIMATE statements differ in that ESTIMATE statements are defined in terms of the effects model whereas LSMESTIMATE statements are defined in terms of treatment LSMEANS. The ESTIMATE statement is more flexible, and is required for certain specialized applications, but the LSMESTIMATE statement is easier to use for the applications discussed here.
Writing CONTRAST and ESTIMATE statements can be tricky, especially in multiway classifications. You must use the relationship between the means and model parameters to construct CONTRAST and ESTIMATE statements. Following is a threestep process that always works:
1. Write the linear combination that you want to test or estimate in terms of means.
2. Convert means into model parameters: μ i j = μ + τ i + ν j + τ ν i j .
3. Gather like terms.
For example, suppose you want to compare A versus B and C for variety 1.
1. The comparison in terms of treatment means is μ A 1 − ( 1 / 2 ) ( μ B 1 + μ C 1 )
2. Convert to model parameters:
μ + τ A + ν 1 + τ ν A 1 − ( 1 / 2 ) [ ( μ + τ B + ν 1 + τ ν B 1 ) + ( μ + τ C + ν 1 + τ ν C 1 ) ]
3. Gather like terms:
τ A + τ ν A 1 − ( 1 / 2 ) [ ( τ B + τ ν B 1 ) + ( τ C + τ ν C 1 ) ] = τ A − ( 1 / 2 ) ( τ B + τ C ) + τ ν A 1 − ( 1 / 2 ) ( τ ν B 1 + τ ν C 1 )
The resulting expression has the coefficients for model parameters that you can directly insert into a CONTRAST or ESTIMATE statement.
3.5.6 Simple Effect ComparisonsUsing LSMESTIMATE
To set up a comparison of the first type (a comparison of A versus B, C in VARIETY 1) use the basic relationship between means. This is a simple effect comparison because you are comparing METHOD means within a particular VARIETY. Use an LSMESTIMATE statement to estimate A versus B, C in VARIETY 1 (B, C in Variety 1 means the mean of B and C in variety 1).
Writing the linear comparison in terms of cell means gives μ A 1 − 0.5 ( μ B 1 + μ C 1 ) . Before you proceed with the LSMESTIMATE statement, check the order of the LS means from the previous SAS output. The ordering of the METHOD*VARIETY coefficients is determined by the CLASS statement. In this CLASS statement, METHOD comes before VARIETY. For this reason, VARIETY levels change within METHOD levels. This can be seen in the LSMeans table in Output 3.15.
Output 3.15: LSMeans Table Showing VARIETY Changing Within METHOD
method*variety Least Squares Means
method
variety
Estimate
Standard Error
DF
t Value
Pr t
a
1
21.7667
1.8097
75
12.03
.0001
a
2
21.8500
1.8097
75
12.07
.0001
a
3
23.1333
1.8097
75
12.78
.0001
a
4
25.9667
1.8097
75
14.35
.0001
a
5
22.3333
1.8097
75
12.34
.0001
b
1
15.0833
1.8097
75
8.33
.0001
b
2
15.2333
1.8097
75
8.42
.0001
b
3
15.4500
1.8097
75
8.54
.0001
b
4
13.5000
1.8097
75
7.46
.0001
b
5
19.2167
1.8097
75
10.62
.0001
c
1
18.4167
1.8097
75
10.18
.0001
c
2
19.9167
1.8097
75
11.01
.0001
c
3
17.3167
1.8097
75
9.57
.0001
c
4
14.8333
1.8097
75
8.20
.0001
c
5
12.5500
1.8097
75
6.93
.0001
Program
Now you have the information that you need to set up the LSMESTIMATE statement to go with the PROC GLIMMIX model. The required statements are
proc glimmix;
class method variety;
model yield = method variety method*variety;
lsmestimate method*variety A vs B,C in V1
1 0 0 0 0 .5 0 0 0 0 .5 0 0 0 0;
Equivalently, you can write the LSMESTIMATE statement as
lsmestimate method*variety A vs B,C in V1
2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 / divisor=2;
Rather than examine output for the single LSMESTIMATE statement, make the comparison for all five varieties. You would probably want to estimate the comparison A vs B, C separately for each VARIETY if the comparison interacts with VARIETY, that is, if the value of the comparison differs from one VARIETY to the next. The next section shows how to formally test which VARIETY levels interact with the A vs B, C comparison.
As an exercise, see whether you can go through the process to get the coefficients for estimates of A versus B, C in each of VARIETY 2, 3, and 4. The A versus B, C comparison for VARIETY 5 is not shown here because there is a statistically significant difference between the B and C means given VARIETY 5 (see Output 3.13). Recall that comparing the A mean with the average of B and C only makes sense if there is no evidence that the B and C means are different.
Program 3.13 is the complete PROC GLIMMIX step with the correct LSMESTIMATE statements for A versus B, C with the four varieties for which this comparison is legitimate:
Program 3.13
proc glimmix;
class method variety;
model yield = method variety method*variety;
lsmestimate method*variety
A vs B,C  V1 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0,
A vs B,C  V2 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0,
A vs B,C  V3 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0,
A vs B,C  V4 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 /
divisor=2;
run;
Results
The results appear in Output 3.16.
Output 3.16: Estimates of Method Differences by Variety
Least Squares Means Estimates
Effect
Label
Estimate
Standard Error
DF
t Value
Pr t
method*variety
A vs B,C  V1
5.0167
2.2164
75
2.26
0.0265
method*variety
A vs B,C  V2
4.2750
2.2164
75
1.93
0.0575
method*variety
A vs B,C  V3
6.7500
2.2164
75
3.05
0.0032
method*variety
A vs B,C  V4
11.8000
2.2164
75
5.32
.0001
Interpretation
Notice that the estimates differ among VARIETY levels, possibly an indication of interaction between the comparison A versus B, C and VARIETY. This is no surprise, because there was interaction between METHOD and VARIETY in the analysis of variance table (Output 3.11) in Section 3.5.2, A More Complex TwoWay Factorial Experiment, earlier in this chapter. However, it is possible that VARIETY could interact with METHOD in general, but not interact with the comparison A versus B, C. In section 3.5.8 Simultaneous Contrasts in TwoWay Classification, you will see how to use CONTRAST statements to test for the statistical significance of the interaction between the comparison A vs B, C and VARIETYs.
3.5.7 Simple Effect ComparisonsUsing Model Parameters: How ESTIMATE and LSMESTIMATE Differ
The comparison of A with B, C in VARIETY 1 shown in the previous section uses the basic relationship between means and model parameters: μ A 1 − 0.5 ( μ B 1 + μ C 1 ) . The LSMESTIMATE statement uses the coefficients of these A B cell means directly, because LSMESTIMATE is an estimate statement defined in terms of least squares means (LSM). The ESTIMATE statement, on the other hand, is defined in terms of the effects of the factorial statistical model, μ + α i + β j + α β i j . You can convert the LSMESTIMATE statement to an ESTIMATE statement as follows:
1. State the A versus B, C comparison in terms of the cell means model:
μ A 1 − 0.5 ( μ B 1 + μ C 1 )
2. Restate Step 1 in terms of the effect model parameters:
μ + τ A + ν 1 + τ ν A 1 − 0.5 [ ( μ + τ B + ν 1 + τ ν B 1 ) + ( μ + τ C + ν 1 + τ ν C 1 ) ]
3. Gathering like terms gives the following:
τ A + τ ν A 1 − ( 1 / 2 ) [ ( τ B + τ ν B 1 ) + ( τ C + τ ν C 1 ) ] = τ A − ( 1 / 2 ) ( τ B + τ C ) + τ ν A 1 − ( 1 / 2 ) ( τ ν B 1 + τ ν C 1 )
Now you have the information that you need to set up the ESTIMATE statement to go with the PROC GLIMMIX model. The required statements are as follows:
proc glimmix;
class method variety;
model yield = method variety method*variety;
estimate A vs B,C in V1 method 2 1 1
method*variety 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 /
divisor=2;
run;
Notice the following:
The and parameters disappeared from the expression, so you don t need INTERCEPT or VARIETY terms in the ESTIMATE statement. Leaving them out is equivalent to setting their coefficients equal to 0.
The ordering of the METHOD*VARIETY coefficients is determined by the CLASS statement. In this CLASS statement, METHOD comes before VARIETY. For this reason, VARIETY levels change within METHOD levels.
Unlike the LSMESTIMATE statement, you must include the coefficients for METHOD in the ESTIMATE statement. If you leave them out, the SAS log will print an error message that the ESTIMATE is nonestimable and the SAS listing for this ESTIMATE will be blank (try it to see what it looks like).
If you only wanted a test of the hypothesis H 0 : μ A 1 − 0.5 ( μ B 1 − μ C 1 ) = 0 and do not want the estimate or standard error, you could replace the ESTIMATE statement with a CONTRAST statement containing the same coefficients:
contrast A vs B,C in V1 method 1 .5 .5
method*variety 1 0 0 0 0 .5 0 0 0 0 .5 0 0 0 0;
Rather than examine output for the single ESTIMATE statement, you make the comparison for all varieties for which the A versus B, C comparison is appropriate. Guidelines for doing this are identical to those presented earlier in Section 3.5.6.
Program
As an exercise, see whether you can go through the threestep process to get the coefficients for estimates of A vs B, C in each of VARIETY 2, 3, and 4. Here is a complete PROC GLIMMIX step with the correct ESTIMATE statements for A vs B, C given VARIETY levels 1 through 4. Notice that these comparisons can be listed in a single ESTIMATE statement. You could also add and ADJUST= option to control multiplicity since you are evaluating more than one comparison.
Program 3.14
proc glimmix;
class method variety;
model yield = methodvariety;
estimate
A vs B,C in V1 method 2 1 1
method*variety 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0,
A vs B,C in V2 method 2 1 1
method*variety 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0,
A vs B,C in V3 method 2 1 1
method*variety 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0,
A vs B,C in V4 method 2 1 1
method*variety 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 /
divisor=2;
run;
Results
The results appear in Output 3.17.
Output 3.17: Estimate of Method Differences by Variety
Estimates
Label
Estimate
Standard Error
DF
t Value
Pr t
A vs B,C in V1
5.0167
2.2164
75
2.26
0.0265
A vs B,C in V2
4.2750
2.2164
75
1.93
0.0575
A vs B,C in V3
6.7500
2.2164
75
3.05
0.0032
A vs B,C in V4
11.8000
2.2164
75
5.32
.0001
Notice that these estimates are identical to those computed by the LSMESTIMATE statement.
3.5.8 Simultaneous Contrasts in TwoWay Classifications
This section illustrates setting up simultaneous contrasts in a twoway classification by constructing a test for significance of interaction between the comparison A versus B, C and VARIETY. The hypothesis of no interaction between A versus B, C and VARIETY is
H 0 : [ μ A 1 − 0.5 ( μ B 1 + μ C 1 ) ] = … = [ μ A 5 − 0.5 ( μ B 5 + μ C 5 ) ] .
This hypothesis is actually a set of four equations, which can be written in different but equivalent ways. One way to express the equality of all the comparisons is to specify that each is equal to the last. This gives the hypothesis (H 0 ) in the following equations:
[ μ A 1 − 0.5 ( μ B 1 + μ C 1 ) ] = [ μ A 5 − 0.5 ( μ B 5 + μ C 5 ) ]
[ μ A 2 − 0.5 ( μ B 2 + μ C 2 ) ] = [ μ A 5 − 0.5 ( μ B 5 + μ C 5 ) ]
[ μ A 3 − 0.5 ( μ B 3 + μ C 3 ) ] = [ μ A 5 − 0.5 ( μ B 5 + μ C 5 ) ]
[ μ A 4 − 0.5 ( μ B 4 + μ C 4 ) ] = [ μ A 5 − 0.5 ( μ B 5 + μ C 5 ) ]
Going through the threestep process for each of these equations results in the following CONTRAST statement:
contrast A vs BC & Varieties
method*variety 1 0 0 0 1 .5 0 0 0 .5 .5 0 0 0 .5,
method*variety 0 1 0 0 1 0 .5 0 0 .5 0 .5 0 0 .5,
method*variety 0 0 1 0 1 0 0 .5 0 .5 0 0 .5 0 .5,
method*variety 0 0 0 1 1 0 0 0 .5 .5 0 0 0 .5 .5;
As mentioned in Section 3.3.3 Preplanned Comparison for OneWay Classification, concerning the CONTRAST statement for simultaneous comparisons in the oneway classification, there are several ways to specify a set of four equations that would be equivalent to the null hypothesis that the comparison A vs B, C is the same in all five VARIETYs. No matter how you set up the four equations, a CONTRAST statement derived from those equations will produce the results in Output 3.18.
Output 3.18: Test for A versus BC * Varieties Interaction
Contrasts
Label
Num DF
Den DF
F Value
Pr F
A vs BC & Varieties
4
75
1.76
0.1450
The F tests for the A vs B, C * Varieties interaction in Output 3.18 is significant at the =0.145 level. In many hypothesis testing situations, you might not consider this significant. However, like the overall METHOD*VARIETY test shown in Output 3.11, is a multipledegree of freedom test. The overall F test had 8 numerator df and the test in Output 3.18 has 4 numerator df . This means that the F test shown in Output 3.18 is vulnerable to the same problem as the overall F test as there may be certain varieties for which A vs B, C comparison differs in a scientifically important way from the A vs B, C difference in other varieties. Because the 4 df for the A vs B,C*Varieties contrast is an average of four 1 df contrasts, if the A vs B, C interaction is concentrated in a one of single degree of freedom tests, the other 3 tests may mask it.
For this reason the test in Output 3.18 must be regarded as a preliminary test in the modelbuilding phase. The decision should be based on a rather liberal cutoff level of significance, such as .2 or .25. You want to relax the Type I error rate in order to decrease the Type II error rate. It might be a serious mistake to declare that there is no interaction when in fact there is interaction (Type II error); you would then report main effects when you should report simple effects. The estimated main effect might not be a good representation of any of the simple effects. It is usually a less serious mistake to declare there is interaction when in fact there is not (a Type I error) and you would then report simple effects when you should report main effects. In this event, you still have unbiased estimates of simple effects, but you lose precision.
The next section illustrates partitioning the 4 df term above into single df contrasts. This will either confirm that there is no A versus B, C VARIETY interaction or identify the specific component(s) for which an interaction does exist. Such tradeoffs between Type I and Type II errors are ubiquitous throughout statistical modeling and should be kept closely in mind and related directly to the problem at hand when setting thresholds.
3.5.9 Comparison of Levels of One Factor within Subgroups of Levels of Another Factor
There are sometimes good reasons to report simple effects averaged across subgroups of levels of another factor (or factors). This is especially desirable when there are a large number of levels of the second factor. For example, if there were twenty varieties in the example instead of five, it would be tedious to report a separate comparison of methods for each of the twenty varieties, assuming it was even feasible to do so. You might want to consider trying to find subgroups of varieties such that the method comparison does not interact with the varieties within subgroups. It would be legitimate to report the method comparison averaged across the varieties within the subgroups. You should search for the subgroups with caution, however. Identification of potential subgroups should be on the basis of some prior knowledge of the varieties, such as subgroups that have some property in common.
In this example, you have already seen that for VARIETY 5 you cannot legitimately report a mean difference between METHOD A and the average of VARIETY B and C because given VARIETY 5, the B and C means differ significantly. Comparing the average of B and C to another mean makes no sense. However, this is not true for VARIETY 1 through 4. You can legitimately compare the mean of A with the average of B and C individually for each VARIETY except 5. Might there be opportunities to combine these means even further?
Suppose that VARIETY 1 and VARIETY 2 have similar genetic background, and that VARIETY 3 and VARIETY 4 have similar genetic background (but different than varieties 1 and 2). This presents a natural basis for forming subgroups. You might want to group VARIETY 1 and VARIETY 2 together and report a single result for the comparison A versus B, C averaged across these two varieties, and do the same thing for VARIETY 3 and VARIETY 4. The validity of these groupings, however, is contingent upon there being no interaction between the comparison A versus B, C and VARIETY within the groups. You could proceed using the following strategy
1. Test the interaction between A versus B,C by VARIETY within the variety 1 and 2 subgroup, and within the variety 3 and 4 subgroup.
2. If both tests in step 1 suggest negligible interaction, average over the two varieties in each group and test the interaction between A versus B,C by VARIETY across subgroups, i.e. A versus B,C VARIETY 1&2 versus 3&4.
3. If both interactions in step 1 are negligible, you can estimate the simple effects of A versus B,C averaged over varieties 1 and 2 and, separately, over varieties 3 and 4.
4. If the interaction tested in step 2 is also negligible, you can estimate the simple effect of A versus B,C averaged over all four varieties, i.e. variety 1 through 4.
The formal hypotheses and associated contrast statements for step 1 are described here. The null hypothesis of no interaction between the comparison A vs B, C and VARIETY 1 and VARIETY 2 is
H 0 : [ μ A 1 − 0.5 ( μ B 1 + μ C 1 ) ] = [ μ A 2 − 0.5 ( μ B 2 + μ C 2 ) ]
It follows that the CONTRAST statement to test this hypothesis is as follows:
contrast A vs B,C*V1,V2
method*variety 1 1 0 0 0 .5 .5 0 0 0 .5 .5 0 0 0;
Equivalently, it can be expressed as follows:
contrast A vs B,C*V1,V2
method*variety 2 2 0 0 0 1 1 0 0 0 1 1 0 0 0;
Likewise, the null hypothesis of no interaction between A versus B, C and VARIETY 3 and VARIETY 4 is H 0 : [ μ A 3 − 0.5 ( μ B 3 + μ C 3 ) ] = [ μ A 4 − 0.5 ( μ B 4 + μ C 4 ) ] and the associated CONTRAST statement is
contrast A vs B,C*V3,V4
method*variety 0 0 2 2 0 0 0 1 1 0 0 0 1 1 0;
For step 2, the formal statement of the null hypothesis and associated contrast statements are as follows:
H 0 : { [ μ A 1 − 0.5 ( μ B 1 + μ C 1 ) ] + [ μ A 2 − 0.5 ( μ B 2 + μ C 2 ) ] } − { [ μ A 3 − 0.5 ( μ B 3 + μ C 3 ) ] + [ μ A 4 − 0.5 ( μ B 4 + μ C 4 ) ] } = 0
Collecting terms yields the CONTRAST statement:
contrast A vs B,C*V1,V2 vs V3,V4
method*variety 2 2 2 2 0 1 1 1 1 0 1 1 1 1 0;
Results of these CONTRAST statements appear in Output 3.19.
Output 3.19: Interaction between A vs B, C and VARIETY Subsets
Contrasts
Label
Num DF
Den DF
F Value
Pr F
A vs B,C*V1,V2
1
75
0.06
0.8136
A vs B,C*V3,V4
1
75
2.60
0.1114
A vs B,C*V1,V2 vs V3,V4
1
75
4.36
0.0401
This is almost a complete partition of the 4 df A vs B,C*VARIETY test from Section 3.5.8 (Output 3.18). These three contrasts are mutually orthogonal. The remaining degree of freedom would come from an A versus B, C*V1,2,3,4 versus V5 contrast. You would not include V5 since the B and C means are significantly different for V5 and not significantly different for V1, V2, V3, and V4.
You can see that the F test for the interaction between A versus B, C and VARIETY 1 and VARIETY 2 has a pvalue of only 0.8136, which is nonsignificant. Assume that this interaction is negligible, and average the comparison across VARIETY 1 and VARIETY 2. On the other hand, the F test for interaction between A vs B, C and VARIETY 3 and VARIETY 4 has a p value of 0.1114, which is a more ambiguous result. Certainly, there is no rigorous evidence of interaction. However, statisticians tend to be more cautious about whether to require separate estimates of A vs B, C in each of VARIETY 3 and VARIETY 4 or to combine them. Some statisticians say pool unless the 1 df interaction test has p 0.05 ; others say don t combine unless p 0.20 . Assume, for the sake of demonstration, that you do combine the A versus B, C comparison over VARIETYS 1 and 2 and another over VARIETY 3 and 4. Clearly, you cannot combine over all four varieties because the A versus B, C * V1, 2 versus V3, 4 contrast has a pvalue of 0.0401. This comparison is clearly significant, meaning the A versus B, C comparison differs for the two genetic groups, VARIETY 1, 2 and VARIETY 3, 4.
To combine over the two varieties in each group, you want an estimate of
0.5 { [ μ A 1 − 0.5 ( μ B 1 + μ C 1 ) ] + [ μ A 2 − 0.5 ( μ B 2 + μ C 2 ) ] }
for A versus B, C given V1, V2 and
0.5 { [ μ A 3 − 0.5 ( μ B 3 + μ C 3 ) ] + [ μ A 4 − 0.5 ( μ B 4 + μ C 4 ) ] }
for A versus B, C given V3, V4. These yield the following LSMESTIMATE statement:
proc glimmix data=factorial;
class method variety;
model yield = methodvariety;
lsmestimate method*variety
'a vs b,c  v1,V2' 2 2 0 0 0 1 1 0 0 0 1 1 0 0 0,
'a vs b,c  v3,V4' 0 0 2 2 0 0 0 1 1 0 0 0 1 1 0 /
divisor=4,4;
run;
Output 3.20 shows the results.
Output 3.20: Estimate of the A vs B, C Averaged over VARIETY 1 and VARIETY 2 and over VARIETY 3 and VARIETY 4
Least Squares Means Estimates
Effect
Label
Estimate
Standard Error
DF
t Value
Pr t
method*variety
a vs b,c  v1,2
4.6458
1.5673
75
2.96
0.0041
method*variety
a vs b,c  v3,4
9.2750
1.5673
75
5.92
.0001
method*variety
(a vs b,c  v1,2) vs (a vs b,c  v 3,4)
4.6292
2.2164
75
2.09
0.0401
For clarity, you could add the A and averaged B, C means for the two groups using the following LSMESTIMATE statement:
lsmestimate method*variety;
'a  v1,V2' 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0,
'b,c  v1,V2' 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0,
'a  v3,V4' 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0,
'b,c  v3,V4' 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 /
divisor=2,4,2,4;
Output 3.21 shows the results.
Output 3.21: Mean of A Minus the Mean of B, C Means for Combined VARIETY Groups
Least Squares Means Estimates
Effect
Label
Estimate
Standard Error
DF
t Value
Pr t
method*variety
a  v1,2
21.8083
1.2797
75
17.04
.0001
method*variety
b,c  v1,2
17.1625
0.9049
75
18.97
.0001
method*variety
a  v3,4
24.5500
1.2797
75
19.18
.0001
method*variety
b,c  v3,4
15.2750
0.9049
75
16.88
.0001
The estimate 4.64 in the first line of Output 3.20 is the average of the two estimates 5.02 for VARIETY 1 and 4.27 for VARIETY 2 in Output 3.17. The advantage of averaging is the smaller standard error of 1.57 for the combined estimate compared with 2.21 (see Output 3.17) for the individual estimates.
3.5.10 Quantitative by Qualitative Factorial: An Example
In the previous sections, we investigated a twoway classification design. As with a single factor design, factorial experiments are not limited to qualitative factors. In this section, we will look at the analysis of a design with one qualitative factor and one quantitative factor. In the following section, we investigate treatment design with two quantitative factors, also known as a response surface design.
This example uses data from a study with a randomized complete block experiment design and eight treatments comprising a 2 4 factorial treatment design. The identities of treatment factors and response variable are blinded, so they will be discussed in generic terms. Factor A is qualitative, e.g. two types of management methods. Factor B is quantitative, e.g. amounts of an additive, with levels equally spaced and amounts multiples of 0, 1, 2 and 3. The response variable is denoted Y.
The generic model for these data is
y i j k = μ + α i + β j + α β i j + r k + e i j k
where terms are defined as follows:
α i denotes the main effect of factor A.
β j denotes the main effect of factor B.
α β i j denotes the A B interaction effect.
r k denotes the block effect, assumed i.i.d. N ( 0 , σ r 2 ) .
e i j k denotes the residual effect, assumed i.i.d. N ( 0 , σ e 2 ) .
A typical starting place for the analysis of a qualitative quantitative experiment is to fit the classification model as described above, using in to create an interaction plot as well as do a preliminary assessment of model effects.
Program
Use Program 3.15 to implement this step.
Program 3.15
proc glimmix data= qual_by_quant;
class a b block;
model y = ab;
random block;
lsmeans a*b / plot=meanplot(sliceby=a join);
run;
Results
Output 3.22 shows the results.
Output 3.22: Preliminary Analysis of Qualitative Quantitative Data
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
block
0
.
Residual
10.9161
2.7290
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
A
1
32
7.95
0.0082
B
3
32
1.44
0.2482
A*B
3
32
1.47
0.2413
Figure 3.10: A B Interaction Plot
Interpretation
The first thing to notice is the Covariance Parameter Estimates table: the block variance estimate has been set to zero. This occurs because the solution to the REML estimating equations for the block variance is negative. As noted in Section 2.5, we must override the settozero default using either the NOBOUND option or the compound symmetry (CS) reparameterization of the variance structure. Failing to do so risks inflating type I error rate. Use Program 3.16 to implement the NOBOUND option. This will affect the table of Type III Tests of Fixed Effects but it will not change the interaction plot, shown above.
You can see by inspection of the interaction plot in Figure 3.10 that there is visual evidence of a linear increase in mean response with increasing levels of factor B applied in conjunction with level 1 of factor A, whereas the effect of B in conjunction with level 0 of factor A appears to be negligible.
Program
Program 3.16 implements the NOBOUND option to override the settozero default.
Program 3.16
proc glimmix data = qual_by_quant nobound;
class a b block;
model y = ab;
random block;
lsmeans a*b / plot=meanplot(sliceby=a join);
run;
Results
Output 3.23 shows the covariance parameter estimates and model effect tests as amended by NOBOUND.
Output 3.23: NOBOUNDcorrected Variance Estimates and Model Effect Tests for Qualitative Quantitative Data
Covariance Parameter Estimates
Cov Parm
Subject
Estimate
Standard Error
Intercept
block
1.2340
0.4531
Residual
12.1502
3.2473
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
A
1
28
7.14
0.0124
B
3
28
1.30
0.2948
A*B
3
28
1.32
0.2875
Interpretation
Using NOBOUND overrides the settozero default, resulting in a more accurate estimate of residual variance (12.15 versus 10.92 without NOBOUND) and tests with better type I error control. In this data set, the results are not appreciably affected (with or without NOBOUND, the main effect of A is statistically significant and the B and A B effects are not significant), but they could have been, as illustrated by the change in p value for the main effect of A ( p = 0.008 without NOBOUND, p = 0.012 with NOBOUND).
The striking result of the Type III tests is what is not significant. Despite visual evidence of an interaction  strong linear effect of B in conjunction with A level 1, negligible B effect with A level 0  the overall test of the A B effect has p value 0.287  not even close to statistical significance. The apparent disconnect between visual evidence and formal statistics is an artifact of the numerator degrees of freedom of the F value for A B: multiple numerator degree of freedom tests are notoriously underpowered, and must be partitioned into single degree of freedom components.
There are two methods for partitioning the A B effect when B is a quantitative factor. You can either use orthogonal polynomial contrasts, or reexpress the model using direct regression. Note that these are extensions of the options for single factor regression shown in Section 3.4. We begin with orthogonal polynomials.
Partition of A B Effect Using Orthogonal Polynomials
Given that there are four levels of factor B, you can partition the B effect into linear, quadratic and cubic components. Use the partition of the B effect to obtain the needed A B contrasts. Table 3.3 shows the strategy.
Table 3.3: Construction of Orthogonal Polynomial Interaction Contrasts by Treatment Combinations
0
1
0
1
2
3
0
1
2
3
A
1
1
B linear
3
1
1
3
3
1
1
3
B quadratic
1
1
1
1
1
1
1
1
B cubic
1
3
3
1
1
3
3
1
A B linear
3
1
1
3
3
1
1
3
A B quad
1
1
1
1
1
1
1
1
A B cubic
1
3
3
1
1
3
3
1
First, construct main effect contrasts for A and B. Factor A only has two levels, so the only possible contrast is A level 0 versus level 1, i.e. coefficients 1 and 1. The orthogonal polynomial contrasts for factor B are linear {3,1,1,3}, quadratic {1,1,1,1} and cubic {1,3,3,1}. Construct the interaction contrasts by taking crossproducts of the A and each B contrast. For example, for A B linear, multiply the A level 0 coefficient (1) by each B linear coefficient, then multiply the A level 1 coefficient by each B linear coefficient.
Program
Use Program 3.17 to implement the partition.
Program 3.17
proc glimmix data = qual_by_quant nobound;
class a b block;
model y = ab / ddfm=kr2;
random intercept / subject=block;
contrast 'a' a 1 1;
contrast 'B linear' b 3 1 1 3;
contrast 'B quadratic' b 1 1 1 1;
contrast 'B cubic' b 1 3 3 1;
contrast 'A x B linear' a*b 3 1 1 3 3 1 1 3;
contrast 'A x B quadratic' a*b 1 1 1 1 1 1 1 1;
contrast 'A x B cubic' a*b 1 3 3 1 1 3 3 1;
run;
Results
Output 3.24 shows the results.
Output 3.24: Partition of Model Effects Using Orthogonal Polynomial Contrasts
Contrasts
Label
Num DF
Den DF
F Value
Pr F
a
1
28
7.14
0.0124
B linear
1
28
3.57
0.0693
B quadratic
1
28
0.30
0.5874
B cubic
1
28
0.02
0.8770
A x B linear
1
28
3.93
0.0574
A x B quadratic
1
28
0.01
0.9391
A x B cubic
1
28
0.03
0.8707
Interpretation
The A B linear contrast shows a marginally significant interaction with the linear term, p = 0.0574. Therefore, we would conclude that the effect of A is contingent on what level of B is used, and the linear effect of B is different depending on which level of factor A is used.
Looking at the A B quadratic and A B cubic results, you can also see why the overall A B effect was nonsignificant. Assuming no missing data, the A B effect is an average of the three orthogonal contrast effects, that is (3.93 + 0.01 + 0.03)/3 = 1.32, the F value for the overall A B effect.
Partition of the A B Effect Using Direct Regression
Instead of partitioning the standard factorial effects using contrasts, you can rewrite the model in polynomial regression form. The model is as follows:
y i j k = μ + α i + ( β 1 + δ 1 i ) B j + ( β 2 + δ 2 i ) B j 2 + ( β 3 + δ 3 i ) B j 3 + r k + e i j k
where terms are as follows:
μ , α i , r k and e i j k are defined as before.
β 1 , β 2 and β 3 are linear, quadratic and cubic regression coefficients, respectively.
δ 1 i , δ 2 i and δ 3 i are treatment effects on the linear, quadratic and cubic regression coefficients, respectivelythat is, they are A B linear, A B quadratic and A B cubic interaction terms.
B j is the j t h level of factor Bnote that B j is a direct variable: If the levels of factor B are 0, 1, 2 and 3, then B 1 = 0 , B 2 = 1 , and so on.
Program
Use Program 3.18 to implement this model.
Program 3.18
proc glimmix data= qual_by_quant nobound;
class a block;
model y = abbb / htype=1,3 ddfm=kr2;
random block;
run;
The righthand terms in the MODEL statement
abbb
are SAS shorthand for the following;
a b a*b b*b a*b*b b*b*b a*b*b*b
The term a here fits the α i effects; b, b*b and b*b*b fit the regression main effects β 1 , β 2 and β 3 , and a*b , a*b*b and a*b*b*b fit the interaction effects δ 1 i , δ 2 i and δ 3 i .
Results
Results appear in Output 3.25.
Output 3.25: Variance Estimates and Type I and III Tests of Regression Model Effects
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
1.2340
0.4531
Residual
12.1502
3.2473
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
A
1
28
7.14
0.0124
B
1
28
3.57
0.0693
B*A
1
28
3.93
0.0574
B*B
1
28
0.30
0.5874
B*B*A
1
28
0.01
0.9391
B*B*B
1
28
0.02
0.8770
B*B*B*A
1
28
0.03
0.8707
Type III Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
A
1
28
0.01
0.9355
B
1
28
0.33
0.5683
B*A
1
28
0.00
0.9598
B*B
1
28
0.06
0.8156
B*B*A
1
28
0.03
0.8632
B*B*B
1
28
0.02
0.8770
B*B*B*A
1
28
0.03
0.8707
Interpretation
The Covariance Parameter Estimates are identical to those computed using the factorial effects model. The Type I Tests of Fixed Effects results are identical to the contrast results from the factorial effects model. As we saw in the singlefactor regression model in Section 3.4, the Type III test results are not useful in the context of these data.
Lack of Fit Analysis of Qualitative by Quantitative Factorial
Outputs 3.24 and 3.25 show the analysis of these data using models that account for all seven available treatment degrees of freedomone for A, three for the main effects of B, and three for the A B interaction. Suppose, however, that in planning the study there is good reason to believe that the effect of factor B will be linear; the four levels of factor B are observed to allow a lackoffit test in case the research planners are wrong about the anticipated B effect. You can extend the lackoffit method shown in Section 3.4.3 for use with factorial experiments.
Program
Program 3.19 fits the linear terms from the regression model but replaces the quadratic and cubic terms with a catchall lackoffit term, denoted α β i j in the resulting model:
y i j k = μ + α i + ( β 1 + δ 1 i ) B j + α β i j + r k + e i j k
Program 3.19
proc glimmix data= qual_by_quant nobound;
class a cb block;
model y=ab cb*a / htype=1 ddfm=kr2;
random block;
run;
The term
cb*a
in the MODEL statement corresponds to the lackoffit effect α β i j . You must create the CB variable by using the program statement
cb=b;
in the DATA step.
Results
Output 3.26 shows the results.
Output 3.26: Results of Lack of Fit Analysis
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
1.2340
0.4531
Residual
12.1502
3.2473
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
A
1
28
7.14
0.0124
B
1
28
3.57
0.0693
B*A
1
28
3.93
0.0574
A*CB
4
28
0.09
0.9849
Interpretation
The Covariance Parameter Estimates and the Type I tests of A main effect, B linear main effect, and A B linear interaction effect are identical to those in Outputs 3.24 and 3.25. The F value for the lackoffit term, A*CB, is 0.09. Although it is a fourdegreeoffreedom effect, the largest possible F value for any single degree of freedom component of the lackoffit term is 4 0.09 = 0.36. Therefore, you can conclude that the linear regression, with different slopes for each level of A, is sufficient to account for the treatment effects in these data. The next step is to estimate the coefficients of the regression equation for each level of A.
Model for Fitting Separate Regression Equations for Each Level of Factor A
The reduced model to predict Y, with linear B terms only, is
E [ y i j k ] = μ + α i + ( β 1 + δ 1 i ) B j
The disadvantage of this model is that it is not full rank, and working with the resulting solution can be inconvenient. A more useful, full rank form of the model, is
E [ y i j k ] = β 0 i + β 1 i B j
where β 0 i denotes the intercept for the level of factor A, and β 1 i denotes the slope for the level of factor A. The advantage of the full rank model is that the resulting parameters estimates have a clear interpretation.
Program
Use Program 3.20 to fit the full rank model.
Program 3.20
proc glimmix data = qual_by_quant nobound;
class a block;
model y=a a*b / noint ddfm=kr2 solution;
random block;
run;
Results
The solution for this model appears in Output 3.27.
Output 3.27: Solution to the Qualitative by Quantitative Factorial
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
1.0612
0.3921
Residual
10.7675
2.6919
Solutions for Fixed Effects
Effect
A
Estimate
Standard Error
DF
t Value
Pr t
A
0
25.5040
1.1381
32
22.41
.0001
A
1
25.5180
1.1381
32
22.42
.0001
B*A
0
0.04600
0.6563
32
0.07
0.9446
B*A
1
1.9080
0.6563
32
2.91
0.0066
Interpretation
The block and residual variance estimates are different (1.06 and 10.77 versus 1.23 and 12.15 for block and residual, reduced and full model, respectively) because the quadratic and cubic terms have been pooled with block and residual variability. The implicit assumption is that by dropping them from the model, they are now understood to be noise rather than signal. The regression equation estimates are as follows:
for level 0 of A: 25.504  0.046 B
for level 1 of A: 25.518 + 1.908 B
One criticism of this model is that the slope coefficient for level 0 of A is not significantly different from zero (the p value for it is 0.94); there is no evidence that B, in conjunction with level 0 of factor A, affects response. A more parsimonious, but adequate and arguably more accurate model, is the following:
E [ y i j k ] = { β 00 if A=0 β 01 + β 11 B j if A=1
Program
Use the DATA step and subsequent PROC GLIMMIX statements in Program 3.21 to fit this model.
Program 3.21
data reg;
set qual_by_quant;
xb = (a=1)*b;
run;
proc glimmix data=reg nobound;
class a block;
model y=a a*xb / noint solution;
random block;
run;
The indicator function multiplied by B in the line
xb = (a=1)*b;
creates a regression variable for B that only has value for level 1 of factor A. Thus, a linear regression over B is only fit for level 1 of A.
Results
Output 3.28 shows the results.
Output 3.28: Parameter Estimates for Parsimonious Linear Regression Model
Covariance Parameter Estimates
Cov Parm
Estimate
Standard Error
block
1.0206
0.3792
Residual
10.4429
2.5709
Solutions for Fixed Effects
Effect
A
Estimate
Standard Error
DF
t Value
Pr t
A
0
25.4350
0.5639
33
45.10
.0001
A
1
25.5180
1.1216
33
22.75
.0001
xb*A
0
0
.
.
.
.
xb*A
1
1.9080
0.6463
33
2.95
0.0058
Interpretation
The regression equations are now as follows:
for level 0 of A: 25.435
for level 1 of A: 25.518 + 1.908 B
Notice that for level 1 of A the regression equation is unchanged, but for level 0, the slope is dropped. The expected response for level 0 of A is 25.435 regardless of what level of B is applied.
3.5.11 Example: Factorial with Two Continuous Factors
Factorial experiments in which all treatment factors are quantitative are also called response surface experiments. Although SAS has procedures specifically intended for response surface analysis, notably PROC RSREG, they fit fixedeffectonly models. However, many response surface experiments are blocked in various ways, including incomplete block structures discussed in Chapter 2 , and splitplot structures discussed in Chapter 5 . In addition, response surface experiments often feature nonGaussian data, and are most appropriately analyzed using generalized linear models or generalized linear mixed models (depending on the design structure). GLMs and GLMMs are discussed in Chapters 11 , 12 and 13 . The purpose of this section is to introduce PROC GLIMMIXbased methods useful for analyzing factorial experiments with all quantitative factors, also known as response surface experiments. The example presented in this section uses a fixedeffects only model in order to keep the focus on model statements and plot options. You can augment these with needed RANDOM statements and DISTRIBUTION options consistent with the experiment design and type of response variable with which you are working.
The experiment in this example is a study of how the surface finish of a metal part is affected by the feed rate of the metal through the machine and the depth of the cut. There are three feed rates (0.2, 0.25 and 0.3 in/min) and fourth depths of cut (0.15, 0.18, 0.20 and 0.25mm). Therefore, feed rate could have a linear or quadratic effect, while the depth of cut could have a linear, quadratic or cubic effect. However, in response surface analysis we typically assume that a second order polynomial will be sufficient to model the data. If it is not, then it is more likely that a nonlinear surface should be fit. To test whether the second order polynomial is sufficient, you use the same lackoffit test discussed in Section 3.4.
Program
Program 3.22 reads in the data and begin the analysis with a plot to visualize the surface. We include both a contour plot and a 3dimensional plot as some readers may find one more intuitive to understand the response surface than the other.
Program 3.22
data finish;
do f=.2,.25,.3;
do d=.15,.18,.20,.25;
do rep=1 to 3;
input surface @;
cf = f;
cd = d; *class copies of f and d;
output;
end;
end;
end;
datalines;
74 64 60 79 68 73 82 88 92 99 104 96
92 86 88 98 104 88 99 108 95 104 110 99
99 98 102 104 99 95 108 110 99 114 111 107
; run;
proc glimmix data=finish;
model surface=ffddd;
store gmxfit;
run;
proc plm restore=gmxfit;
effectplot;
run;
proc glimmix data=finish;
class f d;
model surface = f*d;
lsmeans f*d;
ods output lsmeans=lsm;
run;
proc template;
define statgraph surfaceplotparm;
begingraph;
layout overlay3d / cube=fals rotate=20 axisopts=(griddisplay=on);
surfaceplotparm x=f y=d z=estimate;
endlayout;
endgraph;
end;
run;
ods graphics / antialiasmax=5700;
proc sgrender data=lsm template=surfaceplotparm;
run;
Results
Figure 3.11 shows the contour plot. Figure 3.12 shows the 3dimensional plot.
Figure 3.11: Contour Plot of Feed Rate by Cut Depth
Figure 3.12: 3D Plot of Feed Rate by Cut Depth LSMeans
Interpretation
Both plots depict the same data, and it is a matter of preference which plot enables a better visualization. You can change the rotation of the 3D plot using the rotate option within the layout statement of PROC TEMPLATE. It appears that as the feed rate increases, the response increases linearly or possibly quadratically. A similar response pattern can be seen across depth of cut. There also might be a linear by linear interaction due to the slight twist to the surface. You can verify the visual inspection with the statements in Program 3.23.
Program
You can verify the visual inspection with the statements in Program 3.23.
Program 3.23
proc glimmix data=finish;
class cf cd;
model surface=ffdd@2 cf*cd / htype=1;
run;
Results
The output of this analysis is in Output 3.29.
Output 3.29: Analysis of Full Response Surface Model Including LackofFit
Type I Tests of Fixed Effects
Effect
Num DF
Den DF
F Value
Pr F
f
1
24
103.42
.0001
f*f
1
24
6.62
0.0167
d
1
24
71.10
.0001
f*d
1
24
14.39
0.0009
d*d
1
24
0.63
0.4367
cf*cd
6
24
1.21
0.3349
Interpretation
As with a single quantitative factor, you begin the analysis at the bottom of the Type I Tests table to determine whether there is significant lack of fit after fitting the secondorder response surface. In this case, with a p value of .3349, you would fail to reject the null hypothesis of no lack of fit and conclude that the secondorder polynomial is sufficient. Because the secondorder polynomial model is sufficient, you now look for further model reduction opportunities by eliminating any other higher order nonsignificant terms. In this experiment, you can eliminate the d*d term as it has a pvalue of .4367. All other terms remain in the model due to significant pvalues. However, even if a lower order term was not significant, it is often included in the model if a higher order term including it is significant.
Program
Estimate the final fitted response surface using Program 3.24.
Program 3.24
proc glimmix data=finish;
model surface = f f*f d f*d / solution;
store gmxfit2;
run;
Results
The output from this model is in Output 3.30.
Output 3.30: Parameter Estimates for the Fitted Response Surface
Parameter Estimates
Effect
Estimate
Standard Error
DF
t Value
Pr t
Intercept
231.41
55.9055
31
4.14
0.0002
f
1642.08
402.95
31
4.08
0.0003
f*f
1950.00
768.74
31
2.54
0.0164
d
776.89
154.43
31
5.03
.0001
f*d
2279.87
609.65
31
3.74
0.0007
Scale
29.5481
7.5052
.
.
.
Interpretation
Using these estimates, you have the final fitted model. This model can be used to generate and plot predicted values to visualize the surface. It can also be used to identify the feed rate and cut depth that maximized (or minimizes, depending on the objective) the surface finish.
Program
You can use the statements in Program 3.25 to visualize the final fitted surface, either a contour plot or a 3dimensional plot as best suits your visual understanding.
Program 3.25
proc plm restore=gmxfit2;
effectplot;
run;
data plot_rs;
do f=.20 to .30 by .01;
do d=.15 to .25 by .01;
y_hat = 231.41+1642.08*f1950*f*f+776.89*d2279.87*f*d;
output;
end;
end;
run;
proc template;
define statgraph surfaceplotparm;
begingraph;
layout overlay3d / cube=false rotate=20 axisopts=(griddisplay=on);
surfaceplotparm x=f y=d z=y_hat;
endlayout;
endgraph;
end;
run ;
ods graphics / antialiasmax=5700;
proc sgrender data=plot_rs template=surfaceplotparm;
run;
Results
The contour plot from these statements is shown in Figure 3.13. The 3D plot is shown in Figure 3.14.
Figure 3.13: Contour Plot of Fitted Response Surface
Figure 3.14: 3Dimensional Fitted Response Surface
An alternative way to c reate the points for the plot without having to handcode the coefficients is to append them to the original data with missing values for SURFACE. Then add the statements,
output out=rsp(where=(surface=.)) pred=y_hat resid=y_resid; id f d;
to create a SAS data set named RSP containing the response surface predictions and residuals.
An Old Fashioned but Useful ThreeDimensional Plot
For users who may find the three dimensional plot more useful in visualizing a response surface but who also may not require a publishready plot, the older procedure PROC G3D can provide a quick, less programming intensive look at the 3D surface.
Program
As an alternative to the PROC TEMPLATE and PROC SGRENDER program, we present an alternate program for producing plots similar to Figure 3.12 and Figure 3.14 in Program 3.26.
Program 3.26
proc g3d data=lsm;
plot f*d=estimate / rotate=30;
run;
proc g3d data=plot_rs;
plot f*d=y_hat / rotate=30;
run;
Results
The plots from these statements are shown in Figure 3.15.
Figure 3.15: ThreeDimensional Response Surfaces using PROC G3D
3.6 Summary
This chapter presents an extensive set of material required for postprocessing , that is, pursuing the analysis beyond ANOVAstyle tests of fixed effects. It begins with a simple twotreatment comparison, and continues with to factorial experiments with either quantitative or qualitative factors (or both).
Section 3.2 introduces the basic twotreatment comparison. This design is expanded to multiple levels in section 3.3. Section 3.4 shows postprocessing methods when the factor levels are quantitative instead of qualitative. Section 3.5 presents methods for the many forms of factorial treatment structures: qualitative by qualitative, quantitative by quantitative, and quantitative by qualitative factorials of varying degrees of complexity.
Although this chapter is long, it is also introductory. Subsequent chapters expand the use of these treatment designs in conjunction with more complex experiment design structures. Specifically, Chapter 5 includes factorial treatment designs used in conjunction with multiple random effects such as split plots. Chapter 7 addresses analysis of covariance, which is conceptually similar to a qualitative by quantitative factorial.
Chapter 4: Power, Precision, and Sample Size I: Basic Concepts
4.1 Introduction . 101
4.2 Understanding Essential Background for Mixed Model Power and Precision . 102
4.3 Computing Precision and Power for CRD: An Example . 104
4.3.1 Creation of an Exemplary Data Set and Use of PROC GLIMMIX .. 104
4.3.2 Completion of the Power Calculation . 105
4.4 Comparing Competing Designs ICRD versus RCBD: An Example . 108
4.4.1 Revision of PROC GLIMMIX Power and Calculation Steps . 108
4.4.2 Precision of Treatment Differences with Blocking . 108
4.4.3 Pilot Data to Anticipate Variance Components for Precision and Power Analysis . 109
4.4.4 Precision and Power for Changing Sample Size with Blocked Design . 111
4.5 Comparing Competing Designs IIComplete versus Incomplete Block Designs: An Example . 112
4.6 Using Simulation for Precision and Power 117
4.6.1 Simulation to Characterize Possible Outcomes . 117
4.6.2 Simulation as an Alternative Way to Compute Precision or Power . 120
4.6.3 Simulation to Approximate Change in Variance Components . 124
4.7 Summary . 129
4.1 Introduction
Chapters 2, 3, and most of the subsequent chapters in this book focus on the analysis of data using mixed model methods. In this chapter, we turn our attention to analysis before data collection begins, during the planning phase of a study. Specifically, our focus in this chapter is on power, precision, sample size, and comparison of alternative designs under consideration for a prospective study.
First, a review of terminology used in this chapter.
Precision refers to the standard error, and hence the width of the confidence interval for a treatment comparison or contrast of interest. Precision analysis as discussed in this chapter refers to the expected standard error or expected confidence interval width for a proposed design. Precision analysis is especially useful when you want to compare two or more designs under consideration, because the impact of your choice of design on the expected standard error of contrasts of interest (or the expected width of their confidence intervals) gives you a highly informative way to evaluate the pros and cons of each prospective design.
The power of a statistical test is the probability of rejecting the null hypothesis when the alternative hypothesis is in fact true. Power equals one minus the probability of a Type II error and is also known as sensitivity or the true positive rate. For a comparative experiment, the alternative hypothesis is equivalent to the statement a treatment difference exists. Many textbooks refer to the alternative hypothesis as the research hypothesis . The way hypothesis testing is most commonly done, the null and alternative hypotheses have even narrower meaning: the null says no treatment difference exists, and the alternative says a nonzero difference exists. Employed in this manner, power is the probability of declaring a nonzero difference to be statistically significant at a specified level. Note that nonzero difference and scientifically relevant difference are not equivalent. As we shall see in this chapter, being able to articulate the latter is essential for implementing power analysis. However, when you reject the null hypothesis in a conventional hypothesis test, you are merely concluding that there is evidence of a nonzero difference, not necessarily a scientifically relevant difference.
Sample size refers to the number of legitimate replications that are required to attain a desired level of power or precision. As we will see in this chapter, sample size is not exclusively a matter of number of observations. Design matters.
Important Note : For a given sample size, different designs can have very different power and precision profiles, depending on the source, pattern, and magnitude of variance among experimental units. Mixed modelbased power and precision analysis provides a powerful tool for assessing alternative designs.
The last two sentences are important, because they motivate what this chapter is and is not about. Researchers often restrict their attention to the final step of power analysis: determining the required sample size for the design that they plan to use. Typically, SAS procedures such as PROC POWER or PROC GLMPOWER are used for this step. This chapter is intentionally not about this step or these procedures. Although useful, PROC POWER and PROC GLMPOWER are limited to cases where there is a single source of variance, e.g. experimental unit variance in a completely randomized design or randomized complete block design. The POWER and GLMPOWER procedures are not designed to account for multiple random effects, recovery of interblock information, or to assess multilevel designs. You cannot use them to compare, for example, the precision and power characteristics of a randomized block versus an incomplete block versus a splitplot design when these are three designs under consideration. Our point here is that statistical thinking about design should begin much earlier than the compute the sample size step. The purpose of this chapter is to present mixed modelbased tools that enable you to do so. The material in this chapter is introductory. The calculations in the early sections of this chapter could be implemented with PROC POWER, but that is not the point. The point is to set the stage for examples later in this chapter, and more advanced applications in Chapter 14 , that engage statistical thinking early in the planning process and that require mixed modelbased methods.
Our focus is on prospective power and precision calculations, referring to the power of statistical hypothesis tests, and precision of contrast estimates for new experiments that are yet to be conducted. Such calculations can be critical in determining the size and structure of a new experimental design and in optimizing information gain from experimental units. Along with Lenth (2001) and Hoenig and Heisey (2001), we recommend against retrospective power calculations, in which power statistics are used to embellish analysis of a data set in hand. Conclusions from such calculations are misleading, and in fact, the power estimates themselves are generally just a function of the corresponding p values. However, power calculations on current data sets can be useful from a pilot study perspective, in the sense that reasonable estimates for required parameters can be obtained from existing data, providing information needed to perform an appropriate prospective power calculation.
There is welldeveloped literature for power, precision and sample size calculations for a variety of simple statistical models, and for general linear models in particular. Please refer to Castelloe and O Brien (2000), Verbeke and Molenberghs (2000), Muller and Fetterman (2002), SAS Institute Inc. (2004), as well as the documentation for the POWER and GLMPOWER procedures in SAS/STAT. As noted above, the purpose of this chapter is to move beyond this literature, to enable you to use mixed modelbased statistical thinking earlier in the planning process, and to enable you to plan for designs whose complexity exceeds the capabilities of procedures such as PROC POWER and PROC GLMPOWER.
4.2 Understanding Essential Background for Mixed Model Power and Precision
Power and precision calculations for mixed models are more involved because of their more complex covariance structure (Helms 1992; Stroup 2002; Tempelman 2006; Rosa, Steibel, and Tempelman 2006; Stroup 2013), but you can implement them with either PROC GLIMMIX or PROC MIXED using theory and methodology basics, given as follows:
The treatment effect(s) or contrast(s), defined as a linear combination K ′ β .
The design matrix, X . The number of rows corresponds to the sample size and the columns of X are determined by the designusually the treatment design in particular.
The variancecovariance matrix, V . This matrix specifies the sources of random variation and their magnitude. In a mixed model, V = Z G Z ′ + R . G gives the covariance of the random model effects, which typically correspond to the elements of the experiment design. The columns of Z specify the sample size and structure of the random model effects. R gives the variance among the experimental units, and specifies the correlation, if any, among experimental units or subsets of experimental units.
The primary result from linear mixed model theory that enables power and precision analysis is that the estimate of an estimable function, denoted k ′ β ^ , is distributed approximately N ( k ′ β , k ′ [ X ′ V  1 X ]  k ) . From this we have the following:
for a single degreeoffreedom contrast, defined by k ′ β , where k is a vector, the standard error of the contrast estimate is S E ( k ′ β ^ ) = k ′ ( X ′ V  1 X )  k .
a confidence interval for k ′ β is thus k ′ β ^ ± t ( α , ν ) k ′ ( X ′ V  1 X )  k , where t is the value from the t distribution for the (1 )100% level of confidence and degrees of freedom associated with the estimate of the standard error.
for matrix K the test statistic for H 0 : K ′ β = 0 is F = ( K ′ β ^ ) ′ [ K ′ ( X ′ V  1 X ) K ]  ( K ′ β ^ ) / rank ( K ) .
The F statis t ic, referred to in mixed model theory as the approximate F , has an approximate noncentral F distribution,
F ( rank ( K ) , ν , ( K ′ β ) ′ [ K ′ ( X ′ V  1 X ) ]  ( K ′ β ) )
The numerator degrees of freedom equal the rank of K . The denominator degrees of freedom, , follow from the design and sample size. The term ( K ′ β ) ′ [ K ′ ( X ′ V  1 X ) ]  ( K ′ β ) , often denoted , is the noncentrality parameter.
The design determines the X matrix and the structure of V . These, along with the anticipated values for the variance and covariance components, are required for precision analysis. Using these values, you can compute the anticipated standard error and confidence interval width of treatment differences or contrasts of potential interest.
For power analysis, in addition to the information needed for precision analysis, you also need to specify the scientifically relevant value(s) of each treatment effect of interest. For example, if you want to compare two treatment means, specifying the minimum difference that would be considered important translates to a value of the treatment effect k ′ β , which in this case equals τ 1 − τ 2 . This enables you to calculate the noncentrality parameter for the approximate F statistic, and to determine what proportion of the resulting noncentral F distribution lies to the right of the critical value for a test at a specified level. Figure 4.1 illustrates the essential conceptual basis of power analysis.
Figure 4.1: Noncentral F for Various Noncentrality Parameters
Central F
Critical value
Noncentral F , φ > 0
Noncentral F , φ ≫ 0
Under the null hypothesis, H 0 , the treatment effect K ′ β = 0 . Hence, the noncentrality parameter is 0, and the F statistic has an approximate central F distribution. The critical value is the value of the central F such that the area under the curve to the right is , the probability of a type I error, i.e. probability of rejecting H 0 when H 0 is true. The critical value is depicted by the vertical line. When the treatment effect is nonzero, the noncentrality parameter is greater than zero, shifting the distribution of the F statistic to the right. The greater the noncentrality parameter (i.e. the greater the treatment effect size) the more the distribution shifts to the right. Noting that you reject H 0 if the F statistic is greater than the critical value, the area under the curve to the right of the critical value gives the probability of rejecting H 0 when H 0 is false. That is, the area to the right of the critical value for noncentral F is the power for the corresponding treatment effect size.
You can compute precision and power analyses for mixed models using PROC GLIMMIX or PROC MIXED in the following steps:
1. Create a data set identical to the one that you will eventually use to analyze the data when it is collected, except that instead of actual datawhich you do not yet haveuse anticipated expected values for each treatment or treatment combination. Following O Brien, we refer to this as the exemplary data set . The means that you use in the exemplary data set are not intrinsically important, but it is important that differences between means of treatments that you intend to compare reflect the differences that you consider to be scientifically relevant.
2. Compute the anticipated standard errors and test statistics using PROC GLIMMIX or PROC MIXED. In this step, you must provide variance and covariance values for the random model and residual effects.
3. For precision analysis , the standard errors from the PROC GLIMMIX or PROC MIXED step provide the needed information.
4. For power analysis , use the F value from the PROC GLIMMIX or PROC MIXED step. You can see from the definitions of the approximate F statistic and the noncentrality parameter given above, that multiplying the PROC GLIMMIX or PROC MIXED F value by the corresponding numerator degrees of freedom gives you the noncentrality parameter for the treatment differences given in the exemplary data set. You then use SAS probability functions to compute power.
In the following section, three examples of precision and power analysis are presented. Section 4.3 shows the basic procedure setting up the exemplary data set, implementing the PROC GLIMMIX step, and using the probability functions. Section 4.4 shows how to extend this procedure to compare competing designs, whose structure requires the use of mixed model analysis. Section 4.5 shows a more complex example that requires comparison of complete and incomplete block designs, and the use of recovery of interblock information in design planning. Finally, Section 4.6 introduces the use of simulation as an additional tool to planning studies that will require mixed model analysis.
4.3 Computing Precision and Power for CRD: An Example
The example in the section introduces the essential steps that are required to implement precision and power analyses.
Suppose that a researcher wants to compare three treatments using a completely randomized design with 4 experimental units per treatment. One of treatments (referred to below as treatment C) is a reference treatment. From past experience, the mean response of treatment C is known to be approximately 50, with a variance among experimental units of σ 2 = 25 . The other two treatments (referred to below as treatments A and B) have the potential to increase mean response by as much as 10. In addition, the research team agrees that the minimum increase in mean response of any practical importance is 5. To begin planning an experiment to evaluate treatments A and B relative to the control, the researchers follow the steps outlined in Section 4.2 to assess the anticipated precision and power of their proposed experiment.
4.3.1 Creation of an Exemplary Data Set and Use of PROC GLIMMIX
The programs in this section demonstrate the first two steps in the precision analysis implementation.
Programs
The first step is to create the exemplary data set as shown in Program 4.