
eBriefing

Big Data in Healthcare: Opportunities and Challenges

Reported by
Ann Griswold

Posted February 11, 2019

Presented By

NYU School of Medicine

The New York Academy of Sciences

Overview

Data analysis for medical research is no longer restricted to doctor’s offices or traditional clinical trials. Health outcomes can be predicted by information from wearable devices, at-home genetic testing kits, insurance databases, web searches—even a cell phone’s GPS. As a result, business models in healthcare are experiencing a fundamental period of disruption and new players are entering the space from tech, e-commerce, and beyond. With this integration of big data comes a fresh set of ethical concerns.

On October 24, 2018, The New York Academy of Sciences and NYU School of Medicine presented Healthcare in the Era of Big Data, a two-day conference exploring the ethical risks and rewards of incorporating big data into the healthcare landscape. Healthcare professionals discussed how to navigate recruitment and consent; privacy protection and data ownership; corporate responsibility and compliance; and more in this new era of medicine and research.

Speakers

Jacqueline Corrigan-Curay, JD, MD

United States Food and Drug Administration

Brett Davis

Deloitte

Vivian Lee, MD, PhD, MBA

Verily

Patrick Ryan, PhD

Janssen, Columbia University

Rainu Kaushal, MD, MPH

Weill Cornell Medicine and New York Presbyterian Hospital

Barbara Evans, PhD, JD, LLM

The University of Houston Law Center

Jamie Holloway, PhD

Georgetown University

Harlan Krumholz, MD, SM

Yale School of Medicine

Solomon Iyasu, MD, MPH

Merck

Sally Okun, RN, MMHS

PatientsLikeMe

Eric Perakslis, PhD

Harvard Medical School

Arti K. Rai, JD

Duke University School of Law

Mark Barnes, JD, LLM

Ropes & Gray

Nadav Zafrir, LLB, MBA

Team8

Consuelo H. Wilkins, MD, MSCI

Vanderbilt University Medical Center

Deborah Kilpatrick, PhD

Evidation Health

Mitchell Lunn, MD, MAS

University of California, San Francisco

Misti Ault Anderson, MS

U.S. Department of Health and Human Services

Craig Lipset, MBA

Pfizer

Jennifer E. Miller, PhD

Yale School of Medicine; Bioethics International

Craig Konnoth, JD, MPhil

University of Colorado, Boulder

Amy Abernethy, MD, PhD

Flatiron Health

Kate Black, JD

23andMe

Patricia Furlong

Parent Project Muscular Dystrophy

James Lu, MD, PhD

Helix

Tara Montgomery

Civic Health Partners

Joanne Waldstreicher, MD

Johnson & Johnson

Thomas Donaldson, PhD

The Wharton School, The University of Pennsylvania

Bray Patrick-Lake, MFS

Duke Clinical Research Institute

Robert M. Califf, MD

Duke University; Verily

Arthur Caplan, PhD

NYU School of Medicine

This symposium was made possible with support from

Introduction

The fourth industrial revolution is underway, and it’s powered by big data. The mass accumulation of real-world information—including parameters tracked by surveillance tools such as Fitbit, evidence logged by parole officers, details mined from electronic health records, genomic sequences, and more—has the potential to refine the healthcare system.

To realize the potential of big data, however, it must be handled responsibly and with strict privacy protections. This conference addressed strategies for reaping the benefits of big data while mitigating the potential risks. Experts discussed topics ranging from analytical power requirements to the need for data standardization. Sessions focused on strategies to promote compatibility between datasets drawn from disparate sources.

The issue of data stewardship is paramount now that anonymization of patient data is a pipe dream of the past. Genetic databases such as 23andMe’s can easily identify an individual, so the critical question becomes: Who is responsible for safeguarding big data, and how? What penalties should be applied when big data is used in a way that harms individuals? How might profits from big data be equitably distributed? Bioethics experts tackled these questions and more in panel discussions before an audience of fellow industry experts.

Session 1. Incorporating Big Data into the Healthcare Landscape: Applications and Open Questions

Panelists

Jacqueline Corrigan-Curay

United States Food and Drug Administration

Brett Davis

Deloitte

Vivian Lee

Verily

Patrick Ryan

Janssen, Columbia University

Highlights

  • Three features have historically characterized big data: volume, variety, and velocity—with a fourth "V," veracity, becoming increasingly important in healthcare.
  • There is a critical need for data standardization so that information drawn from multiple sources can be analyzed in a useful way.
  • A two-way feedback system is needed to connect clinical trial data with healthcare data (e.g., electronic health records).
  • Unique challenges related to trust and privacy must be overcome before consumer data collected via smart sensors can be integrated with healthcare data.

What is Big Data?

According to Brett Davis, of Deloitte, three defining features characterize big data: volume, variety, and velocity, with a fourth "V," veracity, becoming increasingly important as real-world data are used for regulatory decision making. Electronic health records are a common source of high-volume and highly variable data, as are clinical imaging studies. Machine learning algorithms can use big data from these sources to streamline disease screening and personalize treatment. Vivian Lee, of Verily, noted that machine learning streamlines diabetic retinopathy screening in India, where ophthalmologists are in short supply. There, machine learning translates image pixels into meaningful information, allowing imaging studies to be interpreted automatically.

Big data is also rich in variety, Davis noted. Large data sets can be provided in various formats and structures, with diverse semantics. Thus, there is a critical need for data standardization.

To address veracity—the bias, noise, and abnormality in data—the research community must understand variations in data sets and the influences of confounding variables such as fraud or medication adherence. Jacqueline Corrigan-Curay, of the U.S. Food and Drug Administration, shared insights from the airline industry, where big data drives operations. Airline data are highly standardized, however, whereas health data are subject to variations in patient conditions, quality of medical records, and level of detail. The healthcare industry needs a way to standardize data.

Challenges of Big Data

“Algorithm aversion,” in which people trust their own judgment over the output of an algorithm, may pose a barrier to the use of big data. Movement towards the creation of an “internet of bodies,” in which health information is shared in real time, has created additional challenges related to trust and cybersecurity. Finally, distinguishing a high-quality data analysis from a poor-quality one may prove challenging and create issues of misinformation.

Observational data sets pose a unique set of challenges because they are not randomized and therefore do not meet traditional standards for data quality. It may therefore be helpful to shift the focus from data quality to a broader consideration of evidence quality, according to Patrick Ryan, of Janssen and Columbia University. Massive data sets can reveal useful information about medical conditions even when the data are messy, he noted. For example, only 40 head-to-head clinical trials of hypertension drugs have been performed, but numerous electronic health records and insurance claims contain valuable information that can link clinical data to patient outcomes for millions of patients around the world.

Ryan also noted opportunities for using observational data to establish collective goals and objectives. He proposed moving towards a shared management system: a repository of shared datasets, questions, and sets of analyses. The healthcare community should band together to build the physical infrastructure needed to manage big data.

Lee added that community engagement is another key challenge. If properly engaged in the research mission, patients and families can collect important information from wearable sensors. This information can be combined with electronic health records and insurance claims data to advance healthcare outcomes. However, this level of community engagement requires strong patient trust and a shift from traditional healthcare business models to a more patient-empowering system.

Big Data in Action

Ryan described three potential uses for big data: clinical characterization of patient experiences and behaviors based on medical histories; population-level effect estimation for safety surveillance and comparative effectiveness; and patient-level prediction for precision medicine to complement knowledge from clinical trials, ultimately allowing physicians to predict which patients will experience which treatment benefits and side effects.

One example is the Observational Health Data Sciences and Informatics (OHDSI) collaborative, an open science community with participation from academia, industry, and governments around the world. This partnership explores how observational data can be amassed to generate reliable evidence for interventions. Recognizing that the medical community needs data standards, the partnership uses open source software and methodological tools to analyze the risks and benefits of medical products.
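As a toy illustration of the kind of standardization OHDSI promotes, records from different sources can be mapped into one shared schema before analysis. The field names below are hypothetical and far simpler than a real common data model such as OMOP:

```python
# Toy sketch: normalize heterogeneous source records into one common
# schema so they can be analyzed together. Field names are hypothetical;
# real common data models (e.g., OMOP) are far richer than this.

def to_common(record: dict, source: str) -> dict:
    """Map a source-specific record into the shared schema."""
    if source == "ehr":
        return {
            "person_id": record["patient_id"],
            "condition": record["dx_code"],
            "start_date": record["visit_date"],
        }
    if source == "claims":
        return {
            "person_id": record["member_no"],
            "condition": record["icd10"],
            "start_date": record["service_date"],
        }
    raise ValueError(f"unknown source: {source}")

ehr_row = {"patient_id": "P1", "dx_code": "I10", "visit_date": "2018-10-24"}
claim_row = {"member_no": "P1", "icd10": "I10", "service_date": "2018-10-24"}

# After normalization, records from both sources are directly comparable.
assert to_common(ehr_row, "ehr") == to_common(claim_row, "claims")
```

Once every source feeds the same schema, a single analysis can run unchanged across EHR, claims, and other datasets—the premise behind OHDSI's shared analytic tools.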

Smartphone applications for people with diabetes are another example of big data in action. These apps recognize the foods in photos of patients’ meals and estimate the total glucose value. The apps also track glucose levels and allow patients to share the information with friends as well as their care providers.

Panel Discussion

Panel: Incorporating Big Data into the Healthcare Landscape: Applications and Open Questions


Moderator: Mark Sheehan (The Ethox Centre, University of Oxford)

Session 2. Ethical Risks and Rewards for Big Data in Healthcare

Panelists and Keynote Speaker

Barbara Evans

The University of Houston Law Center

Jamie Holloway

Georgetown University

Solomon Iyasu

Merck

Sally Okun

PatientsLikeMe

Highlights

  • Shared ownership of big data is preferable to exclusive ownership by physicians or researchers.
  • Strong dialogue between research and consumer communities can help ensure that consumers’ concerns and goals are addressed.
  • Guidelines are needed to ensure that big data does not jeopardize a person’s medical or life insurance by exposing pre-existing conditions.

Data Ownership and Stewardship

Data ownership comes with a bundle of rights and duties, said Barbara Evans, of The University of Houston Law Center. People tend to think of data ownership in terms of property ownership (e.g., housing). But while someone might own a house, she pointed out, the house can potentially be taken away by eminent domain. Because of this, shared ownership of big data through the formation of a trusted data partnership might be beneficial, said Sally Okun, of PatientsLikeMe. This would allow physicians to access the data for some purposes, while the public reserves the right to access their data and be informed of how their data are shared.

Solomon Iyasu, of Merck, brought up the issue of corporate responsibility. He noted that the European General Data Protection Regulation (GDPR) gives patients ownership of data. The GDPR is a framework for protecting European data in the context of cloud computing or, more generically, IT outsourcing. Principles within this framework include: the rights of all parties to know the purposes of data collection and sharing at all times, including when the purpose changes; reliable and transparent data management; and an understanding that regulatory frameworks have not kept pace with technology and grey areas exist.

Concerns about Data Sharing

There are serious concerns about the potential for genomic data to jeopardize a person’s medical and life insurance by exposing pre-existing health conditions. A risk-benefit analysis is needed to determine how, when, and why genetic information should be used and shared by physicians.

Jamie Holloway, of Georgetown University, noted that trust and partnership are even more critical when working with terminally ill patients and vulnerable populations. Patients in a health crisis are typically searching for clinical trials and solutions, and willing to share any data that might help.

Public Dissemination of Knowledge

Public trust is earned when researchers demonstrate that they use big data in effective and socially responsible ways, Okun noted. Trust is built between the research and consumer communities by using language appropriate for the lay public, setting reasonable expectations about an intervention’s effectiveness, and addressing a population’s specific concerns and goals, Holloway said. For example, patients advocated for studies on the potential use of lithium to treat ALS, and researchers completed the study with patients’ input in mind.

There is a need to define the social benefits that may arise from big data, Evans said. These benefits may be financial (e.g., if public data reveal strategies for lowering drug prices) or medical (e.g., if big data help home in on geographical areas with low vaccination rates, so that public health interventions can be targeted to those regions). The Patient Innovation Council at Merck, for example, is working with patients to identify particularly compelling social benefits.

Keynote Lecture: AI in Medicine — Navigating the Risks, Seizing the Opportunities

As yet, however, the healthcare industry is unprepared for the challenges and opportunities of this new era, keynote speaker Harlan Krumholz, of the Yale School of Medicine, warned. If the industry remains stuck in old rules and standards, the vast potential of big data may go unrealized. Krumholz urged the healthcare community to abandon outdated models of clinical research and transition to a model of greater connectedness—a centralized platform for data integration where collective wisdom can be built.

Applications of Artificial Intelligence in Healthcare


The power of big data can be harnessed by asking the right questions, Krumholz noted, and by learning from every interaction. Electronic health records currently contain binary data, which are shallow and do not capture real-world information about patients. These records could be enriched by asking better, more in-depth questions. This, in turn, could pave the way for artificial intelligence to recognize patterns and read x-rays, pathology slides, and angiograms.

Yale’s centralized platform for data integration, TrialChain, demonstrates how this can be achieved, Krumholz explained. TrialChain is a blockchain-based platform that can be used to validate data integrity from large biomedical research studies. Yale has developed phenotypes that represent patients, and the team has linked servers to create a supercomputer that uses open-source programming. “We have a lot of things in medicine that require pattern recognition,” said Krumholz. “It turns out that humans are terrible at this. This is where I think expert systems might help.” The system learns from a predictive model and returns data to the electronic health record in real time. With this system, the team at Yale can handle queries in hours rather than weeks. A combined private/public blockchain platform allows for public validation of results while maintaining additional security and lowering the cost of blockchain transactions.
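The core integrity idea behind a platform like TrialChain can be sketched as a simple hash chain: each entry’s hash commits to both its record and the previous entry’s hash, so tampering with any record invalidates every subsequent hash. (A toy illustration only, not TrialChain’s actual implementation.)

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def chain(records):
    """Build a tamper-evident hash chain over a list of records."""
    prev = GENESIS
    entries = []
    for rec in records:
        payload = json.dumps(rec, sort_keys=True) + prev
        prev = hashlib.sha256(payload.encode()).hexdigest()
        entries.append({"record": rec, "hash": prev})
    return entries

def verify(entries):
    """Recompute every hash; any altered record breaks the chain."""
    prev = GENESIS
    for e in entries:
        payload = json.dumps(e["record"], sort_keys=True) + prev
        if hashlib.sha256(payload.encode()).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True

ledger = chain([{"subject": 1, "value": 7.2}, {"subject": 2, "value": 6.9}])
assert verify(ledger)
ledger[0]["record"]["value"] = 9.9  # tamper with an early record
assert not verify(ledger)
```

Because each hash depends on everything before it, publishing only the final hash is enough to let outsiders later verify that no study record was silently changed.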

Schematic of TrialChain, a blockchain-based platform that can be used to validate data integrity from large, biomedical research studies


Panel Discussion

Panel: Ethical Risks and Rewards for Big Data in Healthcare


Moderator: Rainu Kaushal (Weill Cornell Medicine; New York Presbyterian Hospital)

Keynote Lecture

AI in Medicine: Navigating the Risks, Seizing the Opportunities


Harlan Krumholz (Yale School of Medicine)

Further Readings

Krumholz

Warraich HJ, Califf RM, Krumholz HM.

npj Digital Medicine. 2018, 1:1-3.

Dai H, Young HP, Durant TJS, et al.

Distributed, Parallel and Cluster Computing. 2018. arXiv:1807.03662

Session 3. Privacy Protection and Data Ownership

Panelists

Eric Perakslis

Harvard Medical School

Arti K. Rai

Duke University School of Law

Mark Barnes

Ropes & Gray

Nadav Zafrir

Team8

Highlights

  • Ownership, security, and privacy of big data are among the most pressing issues to address.
  • Consumers want access to their data, but trade secrets may limit data sharing.
  • Tradeoffs exist between government regulation and self-regulation.
  • Next-level encryption is the most promising strategy for enhancing data security.

What are health data and why are they so interesting?

Eric Perakslis, of Harvard Medical School, noted that traditional discussions about personally identifiable information (PII) and personal health records (PHR) are outdated. The Undiagnosed Diseases Network, for instance, allows parents to tweet photos of their children to crowdsource possible diagnoses. With the rise of real-time data, the discussion now revolves around how much data should be shared and who should have access.

Meaning of Ownership

Data ownership becomes especially controversial in the face of trade secrecy. Many patients would prefer access to open data, but this information could expose a company’s trade secrets. Arti Rai, of the Duke University School of Law, said that United States law does not accord property rights of health data to individuals, but rather focuses on consent or liability requirements. Regulations such as the General Data Protection Regulation (GDPR) tend to benefit big data companies such as Google, Rai said, more than individuals. The GDPR does, however, grant the public the right to understand any automated algorithm that is used to make decisions affecting them. The owner of the algorithm is then required to give the public information about the algorithm’s methodology. It is possible for companies to release the algorithm information in a way that meets the explainability requirements without revealing source code.

Addressing Security and Privacy

It’s important to remember that the technologies on our wrists and in our pockets are surveillance devices that can uniquely identify and locate each of us, Perakslis said. For this reason, there is an urgent need for legislation to ensure that all members of the public, including the elderly, understand how and why health sensor technologies are being used. In addition, Perakslis noted, healthcare systems should take stronger steps to protect privacy and avoid data breaches by creating a strong technology framework. Hospitals rarely know how to address security risks after their technology systems are hacked. He described how technology in the United Kingdom’s National Health Service failed because the system could not afford adequate information technology infrastructure.

Legislation should address how healthcare data are used and who uses them, and should set limits on their use. Mark Barnes, of Ropes & Gray, said that penalties should be created and enforced to discourage the misuse of data. For example, misuse of HIPAA-protected data for commercial purposes is a felony, but penalties are rarely enforced. Implementing transparency requirements and using de-identified data could further prevent abuse.
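The de-identification Barnes mentioned can be sketched as a simple field-stripping step, in the spirit of the HIPAA Safe Harbor method, which enumerates 18 categories of identifiers to remove. The field names here are hypothetical and the list is abbreviated:

```python
# Hypothetical record fields. The real HIPAA Safe Harbor method lists 18
# identifier categories (names, geographic detail, dates, contact info,
# device and record numbers, biometrics, etc.); this set is abbreviated.
IDENTIFIERS = {"name", "address", "phone", "email", "ssn", "birth_date"}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers, keeping only non-identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

row = {"name": "Jane Doe", "ssn": "000-00-0000",
       "diagnosis": "E11.9", "age": 54}
assert deidentify(row) == {"diagnosis": "E11.9", "age": 54}
```

As the report's earlier discussion of re-identification notes, stripping fields like this is necessary but not always sufficient; rich quasi-identifiers (genomes, locations) can still single out individuals.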

Nadav Zafrir, of Team8, discussed ways to enhance privacy in data sharing. For example, Zafrir and his colleagues are working on homomorphic encryption, which would allow patients to provide their data to companies and companies to compute on that data without ever decrypting it. Next-level encryption methods could therefore help resolve dilemmas surrounding public trust and privacy in data sharing, because personally identifiable information would never be revealed. He also argued that while regulation can set standards, the technology industry has to lead the way, because not every nation will follow the same standards. It would be helpful to make cyber-based interactions an integral part of big data, rather than trying to avoid these types of interactions.
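As a concrete illustration of additive homomorphic encryption (a textbook Paillier scheme, not Team8's actual technology), multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a third party can aggregate values it cannot read. The primes below are insecurely small, for demonstration only:

```python
# Toy Paillier cryptosystem (additively homomorphic). Illustrative only:
# the primes are far too small for real security.
import math
import random

def keygen(p=499, q=547):
    """Generate a Paillier key pair from two (toy-sized) primes."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)  # modular inverse mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n

pub, priv = keygen()
c1 = encrypt(pub, 120)  # e.g., one patient's glucose reading
c2 = encrypt(pub, 135)  # another reading
c_sum = c1 * c2 % (pub[0] ** 2)  # "addition" performed on ciphertexts
assert decrypt(pub, priv, c_sum) == 255
```

The party holding only the public key can compute the encrypted sum without learning either reading; only the private key holder (here, standing in for the patient side) can decrypt the aggregate.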

Panel Discussion

Panel: Privacy Protection and Data Ownership


Moderator: Bernd Stahl (De Montfort University)

Session 4. Recruitment and Consent

Panelists

Consuelo H. Wilkins

Vanderbilt University Medical Center

Deborah Kilpatrick

Evidation Health

Mitchell Lunn

University of California, San Francisco

Misti Ault Anderson

U.S. Department of Health and Human Services

Craig Lipset

Pfizer

Highlights

  • Involving consumers in the early stages of clinical trial design could help mitigate public concerns about data use.
  • There are widespread concerns about the ethics of using social media to recruit trial participants and the financial gains that private companies derive from consumer data.

Moving Beyond Recruitment to Engagement

Consuelo Wilkins, of the Vanderbilt University Medical Center, noted that patients are most engaged and trusting when people affected by the studies have influenced the design, study questions, and other components of the work.

Similarly, Mitchell Lunn, of the University of California, San Francisco, said engagement is particularly important when working with communities that have already been stigmatized. For example, Lunn said the PRIDE study, which reached 13,000 people in the LGBTQ community, achieved high levels of participant engagement through the use of online visuals showing people of all colors, body shapes, and abilities.

Ethics of Tailored and Online Recruitment Strategies

On average, only 2% of the lay public participates in research studies. There can be many reasons for this, but one simple reason is geographic access. People in rural communities may find it especially challenging to participate in clinical trials, especially when tertiary care centers can be many miles away, said Deborah Kilpatrick of Evidation Health.

Clinical researchers are often frustrated by the need to de-identify data because this makes it difficult to track down previous study participants for help with ongoing research. This forces researchers to start the recruitment process from scratch, including reaching out to physicians for help in identifying study participants. But physicians receive an average of 77 notifications for new clinical trials every day, which fosters fatigue and disengagement, especially since physicians have no incentive to assist in this process (e.g., they are not listed as authors on the study).

It is not necessary to rely on the help of physicians, however. Researchers can identify potential study participants via opt-in systems, via screening tools linked to electronic health records, or by directly approaching patients with user-friendly information about the study. Screening tools that rely on health records could use minimal identifiers to find appropriate participants, then dive deeper if a participant is deemed eligible. Facebook ads for clinical trials are designed this way. As long as the company is transparent about the study, and encourages patients to engage their doctors in a fuller discussion, this could be an effective and ethical approach.

Wilkins noted that tailored recruitment materials and IRB-approved language are not necessarily coercive, but could be considered as such when they use language to encourage people to behave differently (e.g., images that trigger specific emotions and may encourage participants to enroll in a study). Kilpatrick pointed to the possibility of giving this power back to participants by focusing on data streamed from patient-controlled devices, such as smartphone apps, wearables, and other connected sensors, which can be turned on or off at any time.

Misti Ault Anderson, of the U.S. Department of Health and Human Services, described recent changes in the regulatory requirements surrounding participant recruitment and eligibility. Existing rules require a researcher to work with the IRB and obtain a waiver of consent. The new rules don’t require an official waiver as long as researchers collect data by interacting with a patient or by testing existing records or specimens. However, there are gaps in the regulations, as evidenced by the data collected by Facebook, and Anderson said the HHS has heard from a number of patients who didn’t know their data were being used in research.

Lunn said that online recruitment and meeting patients where they live, work, and play can attract a larger and more diverse study population. He has used Facebook ads to target LGBTQ participants, who are easier to find online than at medical centers. His team uses images and language tailored to target communities, and tracks engagement and consent from the first click to the completion of study activities. In another example, a recent study by Kilpatrick's company used an online study platform to enroll 1,156 participants within a five-day recruitment window. In the first hour, the enrollment rate was 6.6 participants per minute (394 per hour).

Complications arise when the patient’s information is entered into a database and used in additional studies, especially when the data are de-identified or when the patient waived consent. There is not always a requirement to stop using this data after a patient leaves the study, so participants should be counseled about the informed consent process and the potential future uses of their data, particularly if data are not individually permissioned.

Dealing with Data Monetization

When participants discover that their data have been sold, trust is lost. Many participants are not aware that when they click a button agreeing to enter their data in an electronic health records system, their data could be monetized, said Craig Lipset, of Pfizer. The HHS now advises researchers to include monetization and profit-sharing statements when they solicit data from the public. Lipset expects the healthcare data model to become similar to the consumer product model, where consumers (or buyers of data) can decide which data to purchase based on how ethically it was acquired and is being used.

Panel Discussion

Panel: Recruitment and Consent


Moderator: Adrian F. Hernandez (Duke University School of Medicine)

Session 5. Third-party Data Access and Genomic Databases

Panelists and Keynote Speaker

Amy Abernethy

Flatiron Health

Jennifer E. Miller

Yale School of Medicine; Bioethics International

Craig Konnoth

University of Colorado, Boulder

Kate Black

23andMe

Patricia Furlong

Parent Project Muscular Dystrophy

James Lu

Helix

Highlights

  • Genetic data and data shared with third parties are among the least regulated and protected forms of health data.
  • Clear and appropriate communication, opportunities to consent or opt out at multiple points in the process, and frequent check-ins with patients are critical practices that build public trust.
  • There is a need for a shared decision-making model that encompasses all big data, and has special protections for genetic data.

Keynote Lecture: Research at Scale — Rethinking Evidence Generation for the 21st Century

Today, patients benefit from increasingly targeted and informed treatment, noted Amy Abernethy, of Flatiron Health, in her keynote lecture. Abernethy turned to an example from lung cancer literature to illustrate how a clinical research finding can be scaled up to allow for more effective and tailored use of pharmaceuticals.

PD-L1 expression testing


Thanks to the discovery of the PD-L1 biomarker, the standard of care for lung cancer patients has rapidly improved. Today, nearly all patients with lung cancer are tested for the PD-L1 biomarker, which predicts who will benefit from certain immuno-oncology agents. Working with the U.S. Food and Drug Administration, Flatiron used datasets of every patient who had received lung cancer drugs in the previous year to explore how clinical trials compared with real-world cases. Evaluation of this large dataset revealed that age did not affect outcomes, even though clinical trial patients were in different age groups than people using the treatment. From this example, it is easy to imagine how real-world evidence—including prospective and retrospective data—might be analyzed to predict how pharmaceuticals will perform in the market.

Real-world evidence can be used to bridge the gap between the rapidly changing results seen in long-term data, an expanding research and development pipeline that yields more information on how drugs can be used, and an increasing level of personalization. Real-world evidence can also help monitor uptake of drug use in the market, which is generally unobserved. This work is being driven by the 21st Century Cures Act, which asks the community how real-world evidence can be used with confidence.

“Everybody thinks that since Flatiron was originally a Google-backed company, we must have some of the best NLP and machine learning people in the world solving this,” said Abernethy. But that's not the case. "We build software that lets human experts be better at pulling those data out of the charts,” she continued. “We don’t use machine learning until we are confident that any machine learning algorithm can do it with the same level of quality as a real person.”

The Unique Challenges of Genetic Data

For genomic data, the issues of privacy and ethics are especially complicated because data sharing transcends generations and families. Craig Konnoth, of the University of Colorado, Boulder, noted that genomic data are underregulated under HIPAA. Increased protection is needed in three areas: sample collection, use, and disclosure. Protections against unwanted collection of an individual’s genetic material are minimal at the state and federal levels, though some states and the United Kingdom prohibit this practice. Use of this information is more restricted and is controlled by federal laws such as the Genetic Information Nondiscrimination Act (GINA), which relates to health insurance and employers. Privacy policy is covered by the FTC’s rules preventing unfair or deceptive trade practices.

Role of Third-Party Data Providers

Privacy standards for genetic data are customer driven, noted Kate Black of 23andMe. While the FDA reviews some data components for security and privacy, third-party data providers that work with 23andMe are overseen by an external IRB, data are de-identified according to HIPAA standards, and data are used in aggregate once in the research database. Black said that 23andMe has turned down providers that aren’t compatible with this model. She also noted that the site uses transparent data practices, educates customers on ongoing studies, and requires customer permission before data are shared. There are a number of opt-in components that participants must actively select to share their genetic data via 23andMe, including ‘research’, ‘biobanking’, ‘discovering relatives,’ and ‘sharing information with others.’

One of the world’s largest clinical sequencing labs is housed at Helix, where James Lu has observed varying uses of data over time. Helix stores data, but the owner of the data (i.e., the patient) retains the right to decide where and how their data are used. The company also allows individuals to remove data from third-party applications. Helix links to a number of providers, so individuals can decide, for example, if they want to learn more about their genetics from the Mayo Clinic; be involved in genetics research through Health Nevada; or learn about their ancestry at National Geographic.

Patricia Furlong, of Parent Project Muscular Dystrophy, noted that when patients or their caregivers are given a catastrophic diagnosis, they tend to share any information that might prove helpful. The organization works closely with parents, as a trusted partner, to provide free genetic testing, identify key questions, and process scientific information. Information can also be shared with pharmaceutical companies to see if experimental compounds target affected protein pathways.

Using Real World Evidence as a Control Arm

How Much Do Patients Understand When They Provide Genetic Data?

Jennifer E. Miller, of Yale School of Medicine and Bioethics International, mentioned that there are three common ways of collecting genetic data: routine genetic tests, over-the-counter kits, and clinical trials. Each collection method provides a different level of information to the consumer. The amount of information people understand is inversely related to the length of the consent document, Lu added. Helix attempts to overcome this by limiting information to three or four bullet points. Still, most people are unaware that data from routine lab tests can be used in various ways, and that once data are de-identified, they lose legal rights to them, Miller explained.

Black noted that in surveys, 23andMe found that 90%–95% of customers still remember five years later that they signed consent forms, are enrolled in research studies, and are aware that scientists will use their data. During any form of customer engagement, Black said that the company requires users to opt out or click an acknowledgement stating that their responses will be used for research purposes.

Keynote Presentation

Research at Scale: Rethinking Evidence Generation for the 21st Century


Amy Abernethy (Flatiron Health)

Panel Discussion

Panel: Third-party Data Access and Genomic Databases


Moderator: Erika Fry (Fortune)

Further Readings

Abernethy

McPadden J, Durant TJS, Bunch DR, et al. Distributed, Parallel and Cluster Computing. Preprint. August 2018. arXiv:1808.04849v1.

Miksad RA, Abernethy AP. Clinical Pharmacology & Therapeutics. 2018;103(2):202-205. doi:10.1002/cpt.946.

Session 6. Corporate Responsibility: Working for Social Good

Panelists

Tara Montgomery

Civic Health Partners

Joanne Waldstreicher

Johnson & Johnson

Thomas Donaldson

The Wharton School, The University of Pennsylvania

Bray Patrick-Lake

Duke Clinical Research Institute

Highlights

  • Interindustry agreements can build greater trust and may be more effective than the efforts of individual corporations.
  • Efforts to promote social good should span all levels of corporate governance, from staff to upper management and Boards of Directors.
  • Antitrust laws create a barrier to open discussion and limit open data sharing.
  • Researchers need mechanisms for contacting former clinical trial participants who might be at risk of future health issues, or who might benefit from future trials.

Main Principles of Corporate Social Responsibility

As a society, we tend to ascribe the same property and privacy rights to corporations as we do to humans, said Thomas Donaldson, of The Wharton School at The University of Pennsylvania. But corporate responsibility is unique: it is shaped by the business’s purpose in society and by the implicit or explicit agreements it enters into, such as industry-wide agreements.

A business’s purpose in society is influenced by a set of collective values such as health, knowledge, fairness, and privacy. This is especially true for the healthcare industry, where businesses are configured differently than commodity businesses. There is often a disconnect between these broad values and an organization’s actual behaviors. However, interindustry agreements could help organizations align values with behaviors and build greater public trust.

Two principles are key to corporate social responsibility, said Joanne Waldstreicher, of Johnson & Johnson: protecting the privacy of individuals who participate in research and listening to those individuals. Waldstreicher said that Johnson & Johnson is collaborating with Yale’s Open Data Access group to develop responsible mechanisms for sharing clinical trial data. This involves de-identifying data and asking researchers to sign a confidentiality agreement. The team also uses a secure website for data access and doesn’t allow researchers to combine the trial data with outside datasets. Lastly, they gave decision rights to an academic group at Yale, which independently reviews all proposals to determine appropriate data access.

Waldstreicher also shared that Johnson & Johnson has developed a credo to help ensure that all levels of the corporation, including leadership and the board, are working to advance social good. This credo lays out the company’s responsibilities to people who use products, employees, and shareholders.

What Should Corporations Do to Promote Social Good?

Montgomery suggested that the most effective approach for corporations is to shift power to the public, share insights about data with communities, and ask communities to define their goals and values. Bray Patrick-Lake, of the Duke Clinical Research Institute, noted that recent privacy breaches have decreased the public’s trust in data sharing, but that the Good Pharma Scorecard could help by tracking consistency, accountability, and transparency for stakeholders.

Waldstreicher and Donaldson seconded the need for open discussions with the public, but noted that antitrust laws make it difficult to have meaningful conversations without worrying that the responses might inspire litigation. Donaldson said that a large and powerful ‘champion’ should take the lead on this process and encourage smaller firms to follow suit. He described the Responsible Care initiative in the chemical industry, in which small firms needed assurance from larger firms that they would not be shortchanged.

Tara Montgomery, of Civic Health Partners, said that corporate social responsibility must balance trust with profit, rather than the more widespread model of causing harm and then taking steps to offset that harm. For example, certain data should not be shared because it could undermine the greater good, even if sharing that data might generate a large profit.

Is Transparency Always Good?

Transparency can be a double-edged sword, the panel noted. To determine the appropriate level of transparency for a given situation, corporations should have patient advocates or involve patients in early decision-making processes, as is the case with the National Health Council.

If a company identifies a serious risk based on genetic data, for example, should they reach out to potentially affected communities? Montgomery suggested that the company should tell the community that important results were found but let individuals decide if they would like to know the full details.

Waldstreicher cautioned against transparency in all situations unless the community has already been engaged. For example, the HIV community has advocated for increased data sharing, but revealing a person’s HIV status can have severe consequences.

Transparency can be challenging during clinical trials because companies don’t have direct contact with participants. Rather, communications are funneled through study coordinators. This presents an issue when predictive modeling is involved: Companies can purchase and analyze datasets to predict which patients might benefit most or have the highest risk of side effects, but then the company cannot reach out to warn these participants. The creation of participant alumni networks could help address this challenge.

Panel Discussion

Panel: Corporate Responsibility: Working for Social Good


Moderator: Richard Moscicki (PhRMA)

Session 7. Big Data, Big Future? A Debate

Speakers

Robert M. Califf

Duke University; Verily

Arthur Caplan

NYU School of Medicine

The Future of Big Data

Big data may be a salvation for the healthcare catastrophe slowly engulfing the United States, said Robert Califf, of Duke University and Verily. The U.S. is in its third consecutive year of declining life expectancy and, among 18 high-income countries, has the worst health outcomes. By 2040, U.S. life expectancy is expected to rank 28th in the world, behind many middle-income countries. Instead of becoming healthier, Americans are experiencing rising blood pressure, lipid and glucose levels, and obesity rates.

Califf explained that this decline is not the same across all people and places. Social determinants of health—including sex, race, income, and ethnicity—explain as much as 60%–85% of the variation in life expectancy. Interestingly, however, the geospatial determinants have shifted. Big metropolitan areas and university cities are experiencing increased longevity while rapidly deteriorating health outcomes are seen in white males, ages 25–60, from predominantly rural regions such as Oklahoma and West Virginia. These populations are most affected by opioid use, suicide, and cardiovascular issues.

While the medical community believes data and facts are enough to make a change, said Arthur Caplan, of the NYU School of Medicine, a far more important shift is needed. Access to healthcare in the U.S. should be treated as a right, not merely a privilege. Similarly, people must learn to care about each other; otherwise, the vast amounts of data researchers generate will only benefit those with the most resources. Caplan suggested that corporations invest a percentage of their profits from big data back into rural health programs or the NIH. Additionally, since the idea of privacy has been lost, public trust could be restored through the enforcement of penalties against companies that violate ethics policies.

Furthermore, there is a pressing need to overcome misinformation. When scientists share information, they are obligated to discuss uncertainty and caveats, but this can inadvertently lower the public’s confidence. Meanwhile, the internet is full of conflicting or inaccurate information about health issues; a high proportion of anti-vaccination and anti-Affordable Care Act messaging originates from the Russian government, Caplan said. People tend to believe this misinformation, even though the U.S. Government Accountability Office (GAO) recently warned that states that do not expand their Medicaid programs will face the largest declines in life expectancy and health outcomes.

In the future, Califf said, individual patients will have more control of their data and will be able to arrive at solutions faster than they currently can. We are nearly there: Big data can be organized and curated, bringing together genomic and immune profiles, lab data, electronic health records, and imaging studies. A big shift will occur when researchers begin to track the behavioral determinants of health. Once these capabilities are scaled to the population level, the healthcare industry will have real-time data for individuals on every street, in every neighborhood and county. Decision makers will then be able to predict, with increasing accuracy, the outcomes of healthcare policies.

Discussion

Big Data, Big Future? A Debate


Robert M. Califf (Duke University; Verily) and Arthur Caplan (NYU School of Medicine)
Open Questions

What policy shifts would best promote the effective and ethical use of big data?

Which ethics framework is best and who (e.g., a company-led coalition) should implement it? Within this framework, what models, systems and standards might best protect under-represented populations?

How might a regulatory framework integrate data from real-world users?

What tangible societal benefits are most desired from the use of big data?

How do behavioral data track with traditional health data?

How might it be possible to build a two-way feedback system between clinical trial data and healthcare data?

As automation becomes more common, what role do humans play in stewarding the data?

How can researchers ensure that consumers understand how data are being used? How can communication be improved in an unbiased manner, while reducing the potential for miscommunication?

How can researchers improve the privacy of data storage?

How early can researchers involve the public in data collection, given antitrust and related laws?

What legal issues preclude contacting former trial participants with new information and how might these issues be managed to enhance the societal benefits of big data?