There is no doubt that health systems are swimming in data, sometimes easily accessible but occasionally “locked away” in silos. With the Medicare Access and CHIP Reauthorization Act taking effect in January, systems are under increased pressure to support the Centers for Medicare & Medicaid Services' Advanced Alternative Payment Models and Merit-based Incentive Payment System.

Many health systems are taking stock of their data-use maturity for population health management. Harnessing all their data in a meaningful way can be a challenge, but mastering data is critical for stratifying patients by risk, a core competency for PHM.

Defining the data types

Determining what data is needed, what data is available, and each source’s strengths and weaknesses are an organization’s first steps.

Administrative data

Administrative data is generated through routine health care operations such as registering patients (obtaining demographic and insurance information) and professional and facility billing (e.g., 837I and 837P claims). Occasionally, health systems have integrated registration and billing systems, as well as administrative processes tethered to their ambulatory care, acute care and post-acute care systems.

Demographic information is necessary — though not sufficient — for accurate patient matching. It is also required for data analysis involving age, gender or place of residence and for patient engagement.
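To make the matching point concrete, here is a minimal sketch of a deterministic match key built from demographic fields. The field choices, the normalization rules and the names normalize_name and match_key are assumptions made for illustration, not a description of any particular master patient index.

```python
import unicodedata
from datetime import date


def normalize_name(name: str) -> str:
    """Strip accents, punctuation and spacing, then uppercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def match_key(first: str, last: str, birth_date: date, zip_code: str) -> tuple:
    """Build a hypothetical deterministic key from demographic fields."""
    return (
        normalize_name(first),
        normalize_name(last),
        birth_date.isoformat(),
        zip_code.strip()[:5],  # compare on the five-digit ZIP only
    )


# Two registrations of the same fictional person, entered differently.
a = match_key("José", "O'Brien", date(1950, 3, 2), "44106-1234")
b = match_key("jose", "OBrien ", date(1950, 3, 2), "44106")
print(a == b)
```

Equal keys only suggest a match. Two different people can share a name, birth date and ZIP code, and one person can move or change names, which is why demographics alone are not sufficient.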

Because billing data includes coded diagnoses (ICD-9 and ICD-10) and coded procedures (CPT and CPT II) to justify the appropriateness of services, some population health capabilities are feasible. Systems can use aggregated billing data to determine quality measures, create disease registries, calculate basic risk scores, and identify and reach out to patients with gaps in care.
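As a rough illustration of how coded billing data can feed a disease registry, the sketch below groups patients by ICD-10-CM code prefix. The sample rows, the registry definitions and the function name build_registries are hypothetical; a production registry would rely on curated value sets rather than two hand-picked prefixes.

```python
from collections import defaultdict

# Hypothetical billing rows: (patient_id, ICD-10-CM diagnosis code).
BILLING_ROWS = [
    ("p1", "E11.9"),    # type 2 diabetes without complications
    ("p1", "I10"),      # essential hypertension
    ("p2", "I10"),
    ("p3", "E11.65"),   # type 2 diabetes with hyperglycemia
    ("p3", "J45.909"),  # unspecified asthma
]

# Assumed registry definitions: registry name -> ICD-10-CM code prefixes.
REGISTRY_PREFIXES = {
    "diabetes_type_2": ("E11",),
    "hypertension": ("I10",),
}


def build_registries(rows, definitions):
    """Return {registry name: sorted patient ids} from coded billing rows."""
    registries = defaultdict(set)
    for patient_id, code in rows:
        for name, prefixes in definitions.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if code.startswith(prefixes):
                registries[name].add(patient_id)
    return {name: sorted(ids) for name, ids in registries.items()}


registries = build_registries(BILLING_ROWS, REGISTRY_PREFIXES)
print(registries)
```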

Challenges using administrative data: Administrative data alone has inherent limitations. It can’t be used to identify certain subpopulations, such as patients with particular clinical findings (e.g., elevated blood pressure or abnormally high HbA1c values). In addition, not all clinical activity is billable, and even when it is, physicians do not always submit a code for each service they perform. Poor charge capture affects revenue and can decrease the value of administrative data in population health management.

Adjudicated claims data

Administrative billing data submitted to a health plan for reimbursement undergoes cycles of “edits” to identify errors, monitor appropriateness, check for duplicates generated during the care process (numerous revised bills), group services and make other revisions. The result is the plan’s adjudicated claim. Complete adjudicated claims data includes paid claims for all the providers and hospitals the patient used, creating a broad view of care. In addition to filling in care gaps occurring outside a specific health system, this adjudicated claims data can be used to determine a more accurate total cost of care.
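One small piece of that process, collapsing revised bills into a single final record, can be sketched as follows. The record layout (claim_id, version, paid) is an assumption for illustration; actual adjudication involves many more edits than keeping the latest revision.

```python
from datetime import date

# Hypothetical submissions: a claim may be revised and resubmitted,
# with a higher version number on each revision.
SUBMISSIONS = [
    {"claim_id": "c1", "version": 1, "service_date": date(2017, 1, 5), "paid": 0.0},
    {"claim_id": "c1", "version": 2, "service_date": date(2017, 1, 5), "paid": 95.0},
    {"claim_id": "c2", "version": 1, "service_date": date(2017, 1, 9), "paid": 40.0},
]


def latest_versions(submissions):
    """Keep only the highest-numbered version of each claim."""
    latest = {}
    for row in submissions:
        current = latest.get(row["claim_id"])
        if current is None or row["version"] > current["version"]:
            latest[row["claim_id"]] = row
    return sorted(latest.values(), key=lambda r: r["claim_id"])


final_claims = latest_versions(SUBMISSIONS)
total_paid = sum(row["paid"] for row in final_claims)
print(len(final_claims), total_paid)
```

Summing paid amounts before removing superseded versions would double-count services, which is one reason total cost of care is more reliable when computed from adjudicated claims.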

Adjudicated claims are typically available from several sources:

  • Self-insured health systems may find that their third-party administrators are willing to provide the paid claims data for employee patients to improve their care. They need to pay careful attention to appropriate use of employee health data by a health system (the employer).
  • CMS provides claims data periodically to accountable care organizations that participate in the Medicare Shared Savings Program to better manage their at-risk lives. Submission of a data use agreement is necessary.
  • Commercial insurers collaborating with a particular health care organization may provide their claims data to the health system. If the organization has contracts with health plans that require taking on risk for its PHM activities, the health plans are more likely to share claims data.

By informing clinicians about the care that a patient has received outside the clinician's enterprise, claims data can improve medical decision-making, fill treatment gaps and avoid redundant testing. Similarly, when a patient seeks care outside his or her network in an accountable care organization, claims data may identify issues concerning access to in-network services, physician referral patterns or patient convenience. Finally, organizations may benefit from examining historical claims data to identify cost and quality issues before contracting to share risk on a patient population.

Challenges using claims data: Like administrative billing data, claims data has significant lag, making some real-time care decisions impractical. Typically, commercial claims data is not fully adjudicated until 30 days after the date of service. The time lag for Medicare varies, but it may be up to three months for an adjudicated claim.
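The practical effect of that lag can be shown with simple date arithmetic. The sketch below assumes the figures cited above, 30 days for commercial claims and three months (approximated here as 90 days) for Medicare; actual lag varies by payer, and the names ASSUMED_LAG_DAYS, likely_adjudicated and complete_through are hypothetical.

```python
from datetime import date, timedelta

# Assumed lag windows in days; real values vary by payer and contract.
ASSUMED_LAG_DAYS = {"commercial": 30, "medicare": 90}


def likely_adjudicated(service_date: date, payer_type: str, as_of: date) -> bool:
    """True if the assumed lag window has fully elapsed by the as_of date."""
    return as_of >= service_date + timedelta(days=ASSUMED_LAG_DAYS[payer_type])


def complete_through(payer_type: str, as_of: date) -> date:
    """Latest service date for which claims are assumed to be complete."""
    return as_of - timedelta(days=ASSUMED_LAG_DAYS[payer_type])


today = date(2017, 3, 31)
print(complete_through("commercial", today))
print(complete_through("medicare", today))
```

An analyst running a report on a given day would, under these assumptions, treat anything after the complete_through date as provisional.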

Clinical data

Clinical data arising from electronic health records are often much more granular and clinically useful than administrative data. Clinical data, such as laboratory test results and vital signs, also tend to be more timely than claims data because they are generated in near-real time through the care process. Recently, it has become possible to exchange summaries of care between providers with disparate EHRs. When that happens, some limited, coded clinical data from outside an organization may be added to the enterprise EHR. Health information exchanges may also provide additional clinical data that can be useful in patient care.

Challenges using clinical data: Because of the lack of true semantic interoperability between disparate EHRs, access to rich EHR data is generally limited to the organization where it is collected. If a patient receives a service outside that organization or network, that service will usually be documented in a different EHR or practice management system rather than in the enterprise EHR system. Since data on that care event are missing, they cannot be used in clinical decision support and are not part of the database used in registries or for analytic purposes, limiting PHM activities that can be supported by enterprise EHR systems.

Clinical EHR data also contain artifacts. “Garbage in, garbage out” is a real issue when busy clinicians are a key source of EHR data. Keeping problem-list diagnoses and medication lists accurate and reconciling conflicting clinical histories from various providers all challenge those who need to use this data. Thus, systems need to be careful in validating and choosing which EHR data is usable in the setting of PHM. Moreover, standardization of EHR workflows, improvements in the ability to capture critical data and ongoing validation of back-end EHR data are all important steps for any organization wishing to use its EHR data.

EHR data may be accessible through special extraction tools on proprietary vendor databases or may be copied to reporting databases. In some cases, standard exchange mechanisms such as HL7, C-CDA or FHIR could be used to extract coded clinical information for specific patients from EHRs, but this may not be suitable for large registries or analytics in PHM.
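For a sense of what coded clinical information looks like when exchanged for a single patient, here is a minimal sketch that reads one lab value from a FHIR Observation resource in JSON. The sample resource is fabricated for illustration and the helper extract_lab_value is hypothetical; the field names (code.coding, valueQuantity, subject.reference, effectiveDateTime) follow the FHIR Observation structure, and LOINC 4548-4 is the code for hemoglobin A1c.

```python
import json

# Fabricated example of a FHIR Observation for an HbA1c result.
OBSERVATION_JSON = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "4548-4",
       "display": "Hemoglobin A1c/Hemoglobin.total in Blood"}
    ]
  },
  "subject": {"reference": "Patient/example-123"},
  "effectiveDateTime": "2017-01-15",
  "valueQuantity": {"value": 9.4, "unit": "%"}
}
"""


def extract_lab_value(resource: dict, system: str, code: str):
    """Return patient, date, value and unit if the Observation matches the code."""
    if resource.get("resourceType") != "Observation":
        return None
    codings = resource.get("code", {}).get("coding", [])
    if not any(c.get("system") == system and c.get("code") == code for c in codings):
        return None
    quantity = resource.get("valueQuantity")
    if quantity is None:
        return None
    return {
        "patient": resource.get("subject", {}).get("reference"),
        "date": resource.get("effectiveDateTime"),
        "value": quantity.get("value"),
        "unit": quantity.get("unit"),
    }


result = extract_lab_value(json.loads(OBSERVATION_JSON), "http://loinc.org", "4548-4")
print(result)
```

Parsing resources one at a time like this works for an individual patient but, as noted above, may not scale to the bulk extracts that registries and analytics require.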

Patient-generated data

Patient-generated data include many different kinds of information, ranging from health risk assessments and online medical histories to functional-status surveys and remote monitoring data. Predicting a patient’s health risk is a basic building block of population health management. Online medical histories can increase the efficiency of provider documentation and provide additional information about a patient. Functional status surveys are one of the best sources of outcomes data.

Challenges using patient-generated data: Although this kind of data has not typically been an important source of information on the health of a patient, that is rapidly changing. And as mobile health apps and wearable sensors proliferate, clinicians are growing more interested in remote patient monitoring. Health care providers will be tasked with finding technology that can incorporate patient-generated data into care planning and with adjusting their workflow accordingly.

Unstructured data

Roughly 80 percent of EHR data is unstructured, trapped in free text, dictated documents, imaging studies and some test results. Some of this unstructured data is information that was not entered in the appropriate fields of the EHR, but much of it addresses psychosocial factors, living situations, habits, environment and patient outcomes where discrete fields are not consistently available.

Challenges using unstructured data: This type of information is often unavailable for use in analysis or clinical decision support in PHM programs. However, some organizations are using natural language processing tools to extract structured data from unstructured text and make some of the data available.
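The idea of turning free text into a structured field can be illustrated with a deliberately simple, rule-based sketch. This is a toy stand-in for real natural language processing: the patterns, the labels and the function smoking_status are assumptions for illustration and would miss many phrasings that a clinical NLP tool is built to handle.

```python
import re

# Toy patterns, checked in order so that negated or historical mentions
# are not mistaken for current smoking.
NON_SMOKER = re.compile(r"\b(?:denies|never|non-?)\s*(?:smok\w*|tobacco)", re.IGNORECASE)
FORMER_SMOKER = re.compile(r"\b(?:former|quit|ex-?)\s*(?:smok\w*|tobacco)", re.IGNORECASE)
CURRENT_SMOKER = re.compile(r"\b(?:smokes|smoker|currently smoking)\b", re.IGNORECASE)


def smoking_status(note: str):
    """Map a free-text note to a coarse smoking status, or None if unknown."""
    if NON_SMOKER.search(note):
        return "non_smoker"
    if FORMER_SMOKER.search(note):
        return "former_smoker"
    if CURRENT_SMOKER.search(note):
        return "current_smoker"
    return None


# Fabricated note fragments.
NOTES = [
    "Patient denies smoking. Lives alone with two cats.",
    "Former smoker, quit in 2010.",
    "Smokes one pack per day.",
    "Presents with cough for two weeks.",
]
statuses = [smoking_status(note) for note in NOTES]
print(statuses)
```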

Optimal data strategy for PHM

No one data source is going to provide a “gold standard” for PHM. Claims data alone has often been used for PHM, but it is not timely, making it less useful for care management. In addition, because administrative and adjudicated claims data are not as rich as clinical data, both fail to provide a granular picture of clinical situations. Therefore, it is important to reconcile administrative, adjudicated claims and clinical data to optimally risk-stratify patients.
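A minimal sketch of that reconciliation might look like the following: one input per data source, merged into a single patient view and then assigned a risk tier. The fields, weights and thresholds are invented for illustration only and have no clinical validity; the names PatientView, reconcile and risk_tier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientView:
    patient_id: str
    chronic_conditions: int = 0         # from administrative billing codes
    outside_er_visits: int = 0          # from adjudicated claims
    last_hba1c: Optional[float] = None  # from clinical EHR data


def reconcile(admin: dict, claims: dict, clinical: dict) -> list:
    """Merge three per-patient sources into one view per patient."""
    ids = sorted(set(admin) | set(claims) | set(clinical))
    return [
        PatientView(pid, admin.get(pid, 0), claims.get(pid, 0), clinical.get(pid))
        for pid in ids
    ]


def risk_tier(view: PatientView) -> str:
    """Toy additive score; weights and cutoffs are illustrative assumptions."""
    score = view.chronic_conditions + 2 * view.outside_er_visits
    if view.last_hba1c is not None and view.last_hba1c > 9.0:
        score += 2
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


# Fabricated inputs keyed by patient id.
admin = {"p1": 3, "p2": 2}
claims = {"p1": 1, "p3": 0}
clinical = {"p1": 9.4, "p2": 6.8}

tiers = {view.patient_id: risk_tier(view) for view in reconcile(admin, claims, clinical)}
print(tiers)
```

In this sketch, patient p3 appears only in claims data and p2 has no claims record at all. Neither would be scored on complete information if a single source were used alone.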

Administrative, clinical, adjudicated claims and patient-generated data all have advantages and disadvantages. The disadvantages are especially obvious when analysts are attempting to use a data source for some purpose other than its original intent. Here are three key points to remember when building your data strategy for PHM:

  • Start with an inventory of data sources and analysts who have expertise in accessing, extracting and curating the various data. This advice holds true whether you are developing your data strategy in house or working with a partner.
  • Although it helps to have as much data as possible and to be able to combine different data types as needed into a common data platform, keep in mind that data-use agreements, privacy and security policies such as the Health Insurance Portability and Accountability Act and other constraints may limit your projects.
  • Finally, iterate through your data projects with a few data types and incrementally layer on additional types as needed.

Editor’s note: This article is the second in a three-part series on common barriers to population health management. In Part 1, “Interoperability: A Common Challenge for Population Health,” Dr. Jain examines how coordinated care requires organizations’ technologies to be connected. In Part 3, “How Hospitals Should Build a Data Infrastructure,” Jain addresses the challenges of building the right infrastructure.

Anil Jain, M.D., is a vice president of IBM Watson Health and former senior executive director of information technology at the Cleveland Clinic. He continues to practice and teach internal medicine at the clinic.

The opinions expressed by the author do not necessarily reflect the policy of the American Hospital Association.