Test developers face two issues: (a) what to measure, and (b) how to measure it (Lindquist, 1936). For most large-scale testing programs, test blueprints are developed that specify content and cognitive demands in terms of "what to measure." Regarding "how to measure," one dilemma facing designers is the choice of item format. The issue is significant in a number of ways. First, interpretations vary according to item format. Second, for policymakers, the cost of scoring open-ended items can be enormous compared with multiple-choice items. Third, the consequences of using any given format may affect instruction in ways that foster or hinder the development of the cognitive skills being measured by tests, an effect related to systemic validity (Frederiksen & Collins, 1989). Everyone involved in these discussions points to the centrality of validity concerns. Whether our attention is to systemic validity, a unitary construct validity orientation (Messick, 1989), or a focus on consequential validity (see Mehrens, chap. 7, this volume; Messick, 1994), meaning and inference are our concerns.
Original language: English (US)
Title of host publication: Large-Scale Assessment Programs for All Students
Subtitle of host publication: Validity, Technical Adequacy, and Implementation
Publisher: Taylor and Francis
Number of pages: 16
ISBN (Electronic): 1410605116, 9781135653897
State: Published - Jan 1 2012