Context. - Peer assessments have traditionally been used to judge the quality of care, but a major drawback has been poor interrater reliability.
Objectives. - To compare the interrater reliability of outcome and process assessments in a population of frail older adults and to identify systematic sources of variability that contribute to poor reliability.
Setting. - Eight sites participating in a managed care program that integrates acute and long-term care for frail older adults.
Patients. - A total of 313 frail older adults.
Design. - Retrospective review of the medical record. A total of 180 charts were randomly assigned to 2 reviewers (2 geriatricians, 2 geriatric nurse practitioners, or 1 geriatrician and 1 geriatric nurse practitioner), and 133 charts were randomly assigned to a single reviewer (either a geriatrician or a geriatric nurse practitioner).
Main Outcome Measures. - Interrater reliabilities for structured implicit judgments about process and outcomes for overall care and for care of each of 8 tracer conditions (eg, arthritis).
Results. - Outcome measures had higher interrater reliability than process measures. Five outcome measures achieved fair to good reliability (more than 0.40), whereas none of the process measures exceeded 0.40. Three factors contributed to the lower reliability of process measures: (1) an inability of reviewers to differentiate among cases with respect to the quality of management, (2) systematic bias from individual reviewers, and (3) systematic bias related to the professional training of the reviewer (ie, physician or nurse practitioner).
Conclusions. - Peer assessments can play an important role in characterizing the quality of care for complex patients with multiple interrelated chronic conditions, but their reliability can be poor. Strategies to achieve adequate reliability for these assessments should therefore be applied.
These strategies include emphasizing outcome measurement, providing more structured assessments to identify true differences in patient management, adjusting for systematic bias resulting from the individual reviewer and his or her professional background, and averaging scores from multiple reviewers. Future research on the reliability of peer assessments should focus on improving the ability of process measures to differentiate among cases with respect to the quality of management and on identifying additional sources of systematic bias for both process and outcome measures. Explicit recognition of the factors influencing reliability will strengthen efforts to develop sound measures for quality assurance.
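The benefit of averaging scores from multiple reviewers can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that is not stated here and is not necessarily the method used in this study. It predicts the reliability of the mean of k ratings from the reliability r of a single rating as R_k = k r / (1 + (k - 1) r). The formula assumes interchangeable (parallel) raters with independent errors, so it does not model systematic reviewer bias. The sketch below is illustrative only; the single-reviewer reliability of 0.30 is a hypothetical value, not a result from this study.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the average of n_raters ratings.

    Classical Spearman-Brown prophecy formula. Assumes parallel raters
    (equal reliability, independent errors), so systematic reviewer bias
    is NOT modeled here.
    """
    r = single_rater_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if n_raters < 1:
        raise ValueError("need at least one rater")
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-reviewer reliability for a process measure (illustrative only).
r_single = 0.30
for k in range(1, 5):
    print(f"{k} reviewer(s): predicted reliability {spearman_brown(r_single, k):.3f}")
```

Under these assumptions, averaging two such reviewers raises the predicted reliability from 0.30 to about 0.46, which is above the 0.40 cutoff for fair reliability. Correlated biases, such as those shared by reviewers with the same professional training, would make the real gain smaller.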