Mining Health Claims Data for Assessing Patient Risk
Abstract
As countries everywhere struggle with rising medical costs and increasing demand for services, there is an enormous need and opportunity to mine claims and encounter data to predict risk. This chapter discusses the identification and modeling of health risk, a topic of central importance to health systems and other payers. We begin with a definition of health risk that focuses on the frequency and severity of the events that cause patients to use healthcare services. The distribution of risk among members of a population is highly skewed: a few members use a disproportionate share of resources, while the large majority use far more moderate amounts. An important modeling challenge for health analysts and actuaries is to predict which members of the population will, through their experience, fall into the low-frequency, high-severity tail of the distribution. Actuaries have traditionally predicted resource use from age and sex, together with other factors such as geography and employer industry. We review typical actuarial models and then evaluate the potential for increasing the relevance and accuracy of risk prediction using medical condition-based models. We discuss the types of medical and drug claims data frequently available to analysts in health systems, and how those data should be interpreted. We also develop a simple grouper model to illustrate the principle of “grouping” diagnosis codes for analysis. We then examine in more depth the process of developing algorithms that identify the medical condition(s) present in a population as the basis for predicting risk, and conclude with a discussion of some of the commercially available grouper models used for this purpose in the U.S. and other countries.