Big data generates a mixture of great promise, fantastic and delusional claims, impressive misunderstanding, and several early examples that are beginning to shape our understanding of it. The Internet, open systems and mobile technologies are other, earlier examples of this kind of information technology phenomenon.

Big data conferences and articles are everywhere, and big data initiatives can be found in all industries. Online retailers like Amazon are masters at unlocking the true power of big data analytics. By monitoring our every click, they are able to infer who we are and what types of merchandise we're interested in to better identify sales opportunities and suggest other items we might want to buy. Thus, they create value by optimizing their product mix and personalizing our shopping experience. Customer loyalty and satisfaction tend to trend well, and profits typically follow suit.

Health care has also begun to pursue big data. The growing volume of health-related data, including data from electronic health records, diagnostic imaging equipment, aggregated pharmaceutical research and personal devices such as Fitbits and other wearable technologies, presents exciting new opportunities to obtain medical insights and improve patient care. Big data is often proffered as the means to unlock this value.

The Scope of Big Data

Understanding several characteristics of big data can help health care providers to harness this phenomenon.

What makes big data big? Data's "bigness" is due to what is referred to as the four Vs:

The sheer volume of data that can be obtained, aggregated and analyzed is often several orders of magnitude more than it was 10 years ago.

The velocity of data has accelerated greatly; i.e., more data are available in real time. For example, retailers can monitor purchases as they are made.

The variety of data has become quite broad. For example, a health care provider can bring together EHR, imaging, molecular medicine, patient search behavior and environmental data for predictive analysis.

The veracity of data can be much improved. For example, a retailer can observe what customers really buy as distinct from what they say they buy in a consumer survey.

Big data is broadening the range of data that may be important in caring for patients. For instance, in the case of Alzheimer's and other chronic diseases such as diabetes and cancer, online social sites like PatientsLikeMe not only provide a support community for like-minded patients, but also contain knowledge that can be mined for public health research, medication use monitoring and other health-related activities. Moreover, popular social networks like Facebook and Twitter can be used to engage the public and monitor public perception and response during flu epidemics and other public health threats.

Perhaps it is the analysis of these less traditional forms of health-related data that will yield the most potent new analytics. For instance, can online search queries yield useful information for spotting potential epidemics?

It likely took only one fatal case of enterovirus D68 in a particular community before concerned mothers rushed to Google to check its symptoms. While such search query data are no substitute for local hospital surveillance, researchers believe they can serve as an additional input to the estimation models used to improve detection of and response to infectious disease outbreaks.

Big data is more than data; it is also analytics. As important as the data, and perhaps more important, are the novel analytics being developed to analyze them. In health care, we see an impressive range of such analytics, including:

  • post-market surveillance of medication and device safety;
  • comparative effectiveness research;
  • assignment of risk, e.g., readmissions;
  • novel diagnostic and therapeutic algorithms in fields such as oncology;
  • real-time status and process surveillance to determine, for example, abnormal test follow-up performance and patient compliance with treatment regimens;
  • identification of patterns in the data; e.g., determining if a patient is following the treatment regimen by looking at medication compliance, grocery purchases, sensor and activity data;
  • machine correction of data quality problems.
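
The medication compliance signal mentioned in the list above is often summarized with a measure such as the proportion of days covered (PDC). The following is a minimal sketch, assuming pharmacy fill records are available as (fill date, days supplied) pairs; the function name, record format and sample data are illustrative assumptions, not drawn from any particular system.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, period_start, period_end):
    """Share of days in [period_start, period_end] with medication on hand.

    fills: iterable of (fill_date, days_supply) tuples (hypothetical format).
    Simplification: overlapping fills are counted once per calendar day
    rather than shifted forward.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)
    total_days = (period_end - period_start).days + 1
    return len(covered) / total_days

# Illustrative data: two 30-day fills with a gap between them.
fills = [(date(2024, 1, 1), 30), (date(2024, 2, 10), 30)]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 3, 31))
print(round(pdc, 2))  # 60 covered days out of 91 -> 0.66
```

A score like this could then be combined with the other signals in the list, such as activity or sensor data, before anyone concludes that a patient is off regimen.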

At times, big data involves a combination of novel analytics and novel uses of data. For example, a team at Baylor University, using IBM's Watson, identified 10 kinases that might play an important role in combating cancer by mining more than 100,000 articles. This identification subsequently was confirmed through traditional bench research efforts.

Big data is a category. Big data is applied to a wide range of uses in a wide range of industries and efforts. There is no single big data product or application or technology. In this way, big data is similar to transportation or retail, both of which encompass a wide range of activities.

Despite the hype, big data's impact may be quite profound. Overall, McKinsey & Co. estimates that big data initiatives could account for $300 billion to $450 billion in reduced health care spending, or 12–17 percent of the $2.6 trillion baseline in U.S. health care costs.
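
The percentage range can be checked against the dollar figures quoted above. This quick calculation uses only the numbers in the text:

```python
baseline = 2.6e12          # U.S. health care cost baseline quoted above
low, high = 300e9, 450e9   # estimated range of reduced spending

low_pct = low / baseline * 100
high_pct = high / baseline * 100
print(f"{low_pct:.1f}% to {high_pct:.1f}%")  # 11.5% to 17.3%
```

Rounded to whole numbers, these match the 12 to 17 percent range cited.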

There are several early examples of possibly profound impact. An analysis of the cumulative sum of monthly hospitalizations due to myocardial infarction, among other clinical and cost data, led to the discovery of arthritis drug Vioxx's adverse effects and its subsequent withdrawal from the market in 2004.
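
To give a sense of how a cumulative sum can surface a sustained shift in monthly counts, here is a sketch of a generic one-sided CUSUM control chart. It is not a reconstruction of the Vioxx analysis; the monthly counts are synthetic, and the `target`, `slack` and `threshold` parameters are assumptions chosen for illustration.

```python
def cusum_upper(values, target, slack, threshold):
    """One-sided (upper) CUSUM.

    Returns the running statistic for each period and the index of the
    first period in which the statistic exceeds the threshold (or None).
    """
    stats = []
    s = 0.0
    first_alarm = None
    for i, x in enumerate(values):
        # Accumulate only the excess over target + slack; never drop below 0.
        s = max(0.0, s + (x - target - slack))
        stats.append(s)
        if first_alarm is None and s > threshold:
            first_alarm = i
    return stats, first_alarm

# Synthetic monthly hospitalization counts: stable at first, then elevated.
monthly_counts = [50, 52, 49, 51, 50, 56, 58, 57, 60, 59]
stats, alarm = cusum_upper(monthly_counts, target=50, slack=2, threshold=15)
print(stats)   # [0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 10.0, 15.0, 23.0, 30.0]
print(alarm)   # 8
```

No single month here looks dramatic on its own, but the accumulated excess crosses the threshold in the ninth month. That is the appeal of cumulative methods for safety surveillance.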

Today, Alzheimer's disease research is benefiting greatly from a range of big data approaches, from whole genome sequencing to complex analyses of the huge volumes of data associated with functional magnetic resonance imaging. Analysis of large data streams — genomic, behavioral, clinical, epigenetic and environmental — from multiple observation points can lead to a more sophisticated understanding of the causes and most effective treatments for Alzheimer's.

Much of what we need to do in health care today requires that we focus on "bigger data." As we begin to manage populations and care continuums we have to bring together data from hospitals, physician practices, long-term care facilities, the patient and so forth. These data needs are bigger than the data needs we had when we focused only on inpatient care.

The expanse of quality measures that health systems need to capture, report and use is bigger than the quality measure set of five years ago.

The range of analyses is bigger. Payer mix analysis and productivity analysis are still germane, but we have extended the analysis arsenal to include predictive algorithms, assessments of practice conformance to the evidence, and patient adoption of health behaviors.

Determining whether the increase in "bigness" is big enough to constitute big data is irrelevant. What is relevant is that providers have to tackle this increase in bigness, and this tackling is not trivial.

Significant challenges exist in acquiring and managing such large volumes of data, reconciling inconsistent data arriving from disparate sources, protecting patient privacy and effectively analyzing big data to create value.

Capitalizing on big and bigger data requires the elevation of organizational data and analytics competency. Delivering actionable data that create value requires significant analytics capabilities. A recent McKinsey & Co. report steps us through a range of big data capabilities, from reporting on what happened to predicting what will happen, that today's providers ought to have or should be acquiring:

Reporting: At its simplest, this is looking at data to determine what happened, to evaluate performance or both. For example, today's business intelligence tools, which involve little technological complexity, enable managers to track and report on key operational metrics such as Healthcare Effectiveness Data and Information Set (HEDIS) measures, referral patterns, census analysis and so forth.

Monitoring: Using recent and near real-time data, monitoring provides insight into what is happening now. One might imagine patient safety-oriented workflows that flag potentially inappropriate medication administration.

Data mining and evaluation: Both hypothesis-based and machine-based data mining can be used to determine why a specific event happened. This investigation of cause-and-effect relationships can help to inform evaluations of drug efficacy; deliver proof of the best, most cost-effective care protocols; and identify patients with potential diseases, to name just a few uses.

Prediction/simulation: Using sophisticated analytics to crunch massive amounts of data to predict which chronic disease sufferers are most likely to be readmitted or to determine the predictive indicators of relapse, for example, is fast becoming the holy grail of this new era of accountable health care and the shift to population health management.
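
As a rough illustration of the prediction capability described above, readmission risk might be scored with a logistic model and used to rank patients for follow-up. The sketch below is hypothetical throughout: the feature names, coefficients and patients are invented for illustration, and a real model would be fit to and validated against actual data.

```python
import math

# Hypothetical coefficients for illustration only; not from any fitted model.
INTERCEPT = -2.0
WEIGHTS = {"prior_admissions": 0.5, "has_heart_failure": 1.0}

def readmission_risk(features):
    """Logistic score: probability-like value between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Invented patients.
patients = {
    "A": {"prior_admissions": 0, "has_heart_failure": 0},
    "B": {"prior_admissions": 2, "has_heart_failure": 1},
}

# Rank patients from highest to lowest estimated risk.
ranked = sorted(patients, key=lambda pid: readmission_risk(patients[pid]), reverse=True)
for pid in ranked:
    print(pid, round(readmission_risk(patients[pid]), 2))
```

In practice the ranking, rather than the exact probability, is often what drives action, for example deciding which discharged patients receive a follow-up call first.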

These capabilities will be needed to address several aspects of analysis that will prove crucial as providers are held more accountable for the care delivered to a patient and a population. A recent Deloitte report identified five aspects:

Population management analytics: Produce a variety of clinical indicator/quality measure dashboards and reports to help improve the health of a whole community, as well as help to identify and manage at-risk populations.

Provider profiling/physician performance analytics: Normalize (through severity- and case mix-adjusted profiling), evaluate and report the performance of individual providers (primary care physicians and specialists) compared with established measures and goals.

Point-of-care health gap analytics: Identify patient-specific health care gaps and issue a specific set of actionable recommendations and notifications either to physicians at the point of care or to patients via a patient portal or personal health record.

Disease management: Define best practice care protocols over multiple care settings, enhance the coordination of care and monitor/improve adherence to best practice care protocols.

Cost modeling/performance risk management/comparative effectiveness: Manage aggregated costs and performance risk, integrating clinical information and clinical quality measures.
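
The severity and case mix adjustment mentioned under provider profiling above is commonly expressed as an observed-to-expected ratio. The following is a minimal sketch, assuming each case already carries an expected probability of the outcome from some risk model; the data format and the sample panels are hypothetical.

```python
def observed_to_expected(cases):
    """cases: list of (outcome, expected_probability) pairs for one provider.

    outcome is 1 if the event (e.g., a readmission) occurred, else 0.
    A ratio above 1 means more events than the case mix would predict.
    """
    observed = sum(outcome for outcome, _ in cases)
    expected = sum(p for _, p in cases)
    return observed / expected

# Hypothetical panels: both providers have the same raw event rate (2 of 4),
# but provider A treats a sicker population.
provider_a = [(1, 0.5), (0, 0.25), (1, 0.75), (0, 0.5)]
provider_b = [(1, 0.25), (0, 0.125), (0, 0.125), (1, 0.5)]

ratio_a = observed_to_expected(provider_a)
ratio_b = observed_to_expected(provider_b)
print(ratio_a, ratio_b)  # 1.0 2.0
```

The two providers look identical on raw rates, yet after adjustment one performs as expected and the other has twice the expected events. This is why unadjusted comparisons can mislead.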

The adoption of big data into management and clinical practice will shift the intent of analytics. Noted analytics expert Thomas H. Davenport points out that analytics typically have focused on monitoring the performance of the organization. The data used originated from such transaction systems as the revenue cycle and were largely retrospective.

Many of the examples cited previously shift the organizational intent to a more proactive and predictive orientation. Can we predict readmissions? Can the data identify more effective treatment approaches? The analysis uses transaction systems, but has grown to encompass nontraditional sources of data such as medical literature and patient recording of health status.

An emerging intention is embedding the analytics into medical equipment, buildings and services. This category of analytics could range from a workflow-based EHR that drives and monitors process orchestration across the health care continuum, to the construction of smart assisted-living facilities. Such facilities could feature technologies that monitor and track residents' activity levels, vital signs, sleep cycles and compliance with medication regimens to help physicians understand variation in patient outcomes.

Final Thoughts

Health care is one of the most data-intensive and data-driven industries in the world. Vast amounts of data are generated from health care providers, public and private payers, ancillary service providers such as labs and pharmacies, and health care consumers alike. The challenge is not just in storage and access, but also in making these data usable.

As we strive to deliver a more personalized experience in health care, the actions of doctors and nurses and the personal connection they have with their patients will always remain the centerpiece of providing high-quality care.

However, the opportunity to more positively impact care outcomes will be broadened through ongoing investments in the solutions, infrastructure and expert knowledge needed to support and advance big data undertakings. Such initiatives will cause us to look at data that have not been studied before or simply weren't available, thereby opening up a whole new set of analysis opportunities — opportunities to dramatically transform the practice of medicine.