Leveraging Big Data and Predictive Knowledge to Fight Disease

Reported by Don Monroe

Posted September 29, 2015


Big data is transforming science, as high-throughput measurements and powerful computing allow researchers to interrogate vast quantities of data. But computer-assisted analysis can outrun our ability to make intuitive sense of data, and researchers must devise new procedures to manage it. At the same time, transactions conducted by computers let companies assemble vast—if imperfect—logs of human behavior.

These trends collide in health care. Computerization of health records, accelerated by federal stimulus funding, provides opportunities to mine data for public health and private profit. On July 28, 2015, the Academy's Biochemical Pharmacology Discussion Group presented a symposium, Leveraging Big Data and Predictive Knowledge to Fight Disease, to explore the topic.

Presentations available from:
Arjun Krishnan, PhD (Princeton University)
Sisi Ma, PhD (New York University School of Medicine)
Michael Matheny, MD, MS, MPH (Vanderbilt University)
Jason H. Moore, PhD, MS (University of Pennsylvania)
Niven R. Narain, MD (Berg Pharma)
Nicholas Tatonetti, PhD (Columbia University Medical Center)
Craig P. Webb, PhD (NuMedii Inc.)
Chunhua Weng, PhD (Columbia University)
Diane Wuest, PhD (GNS Healthcare)

The Biochemical Pharmacology Discussion Group is proudly supported by:

  • American Chemical Society
  • Boehringer Ingelheim
  • Merck
  • WilmerHale

Premiere Supporter

  • Pfizer

How to cite this eBriefing

The New York Academy of Sciences. Leveraging Big Data and Predictive Knowledge to Fight Disease. Academy eBriefings. 2015. Available at:

Journal Articles

Ayvaz S, Horn J, Hassanzadeh O, Zhu Q, et al. Toward a complete dataset of drug–drug interaction information from publicly available sources. J Biomed Inform. 2015;55:206-17.

Boland MR, Miotto R, Weng C. A method for probing disease relatedness using common clinical eligibility criteria. Stud Health Technol Inform. 2013;192:481-5.

Cronin RM, VanHouten JP, Siew ED, et al. National Veterans Health Administration inpatient risk stratification models for hospital-acquired acute kidney injury. J Am Med Inform Assoc. 2015;22(5):1054-71.

Cui JJ, Tran-Dubé M, Shen H, et al. Structure based drug design of crizotinib (PF-02341066), a potent and selective dual inhibitor of mesenchymal-epithelial transition factor (c-MET) kinase and anaplastic lymphoma kinase (ALK). J Med Chem. 2011;54:6342-63.

Greene CS, Krishnan A, Wong AK, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet. 2015;47:569-76.

He Z, Carini S, Sim I, Weng C. Visual aggregate analysis of eligibility features of clinical trials. J Biomed Inform. 2015;54:241-55.

Hu T, Darabos C, Cricco ME, et al. Genome-wide genetic interaction analysis of glaucoma using expert knowledge derived from human phenotype networks. Pac Symp Biocomput. 2015:207-18.

Johnson TW, Richardson PF, Bailey S, et al. Discovery of (10R)-7-amino-12-fluoro-2,10,16-trimethyl-15-oxo-10,15,16,17-tetrahydro-2H-8,4-(metheno)pyrazolo[4,3-h][2,5,11]-benzoxadiazacyclotetradecine-3-carbonitrile (PF-06463922), a macrocyclic inhibitor of anaplastic lymphoma kinase (ALK) and c-ros oncogene 1 (ROS1) with preclinical brain exposure and broad-spectrum potency against ALK-resistant mutations. J Med Chem. 2014;57:4720-44.

Kwak EL, Bang YJ, Camidge DR, et al. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N Engl J Med. 2010;363:1693-703.

Matheny ME, Peterson JF, Eden SK, et al. Laboratory test surveillance following acute kidney injury. PLoS One. 2014;9:e103746.

Meeker D, Jiang X, Matheny ME, et al. A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research. J Am Med Inform Assoc. 2015. [Epub ahead of print]

Miotto R, Weng C. Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials. J Am Med Inform Assoc. 2015;22:e141-50.

Miotto R, Weng C. Unsupervised mining of frequent tags for clinical eligibility text indexing. J Biomed Inform. 2013;46:1145-51.

Monks NR, Cherba DM, Kamerling SG, et al. A multi-site feasibility study for personalized medicine in canines with osteosarcoma. J Transl Med. 2013;11:158.

Moore JH, Hill DP. Epistasis analysis using artificial intelligence. Methods Mol Biol. 2015;1253:327-46.

Ohno-Machado L, Agha Z, Bell DS, et al. pSCANNER: patient-centered Scalable National Network for Effectiveness Research. J Am Med Inform Assoc. 2014;21:621-6.

Resnic FS, Gross TP, Marinac-Dabic D, et al. Automated surveillance to detect postprocedure safety signals of approved cardiovascular devices. JAMA. 2010;304:2019-27.

Saulnier Sholler GL, Ferguson W, et al. A pilot trial testing the feasibility of using molecular-guided therapy in patients with recurrent neuroblastoma. J Cancer Ther. 2012;3:602-12.

Sirota M, Dudley JT, Kim J, et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med. 2011;3(96):96ra77.

Tatonetti NP, Denny JC, Murphy SN, et al. Detecting drug interactions from adverse-event reports: interaction between paroxetine and pravastatin increases blood glucose levels. Clin Pharmacol Ther. 2011;90:133-42.

Weng C, Li Y, Ryan P, et al. A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records. Appl Clin Inform. 2014;5:463-79.

Xing H, McDonagh PD, Bienkowska J, et al. Causal modeling using network ensemble simulations of genetic and gene expression data predicts genes involved in rheumatoid arthritis. PLoS Comput Biol. 2011;7:e1001105.

Zou HY, Friboulet L, Kodack DP, et al. PF-06463922, an ALK/ROS1 inhibitor, overcomes resistance to first and second generation ALK inhibitors in preclinical models. Cancer Cell. 2015;28:70-81.

Zou HY, Li Q, Engstrom LD, et al. PF-06463922 is a potent and selective next-generation ROS1/ALK inhibitor capable of blocking crizotinib-resistant ROS1 mutations. Proc Natl Acad Sci U S A. 2015;112:3493-8.

ClinicalTrials.gov
A registry and results database of publicly and privately supported clinical studies of human participants; a service of the U.S. National Institutes of Health.

Connectivity Map
A research effort at the Broad Institute that aims to generate a detailed map that links gene patterns associated with disease to corresponding patterns produced by drug candidates and a variety of genetic manipulations.

Genome-scale Integrated Analysis of gene Networks in Tissues (GIANT)
A Princeton University initiative.

LINCS Consortium
A U.S. National Institutes of Health program that aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents, and by using computational tools to integrate this diverse information into a comprehensive view of normal and disease states that can be applied for the development of new biomarkers and therapeutics.

Longitudinal and Biomarker Study in Parkinson's Disease (LABS-PD)
A study to better understand the natural course of Parkinson's motor and non-motor symptoms, from the earliest to the latest stages of the disease, and to provide a database for the development of biomarkers.

Observational Health Data Sciences and Informatics
A multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics.

Patient-Centered Outcomes Research Institute
A program authorized by Congress in 2010 to improve the quality and relevance of evidence available to help patients, caregivers, clinicians, employers, insurers, and policy makers make informed health decisions.


Walter Jessen, PhD

Covance Inc.

Robert Martone

St. Jude Children's Research Hospital

Sonya Dougal, PhD

The New York Academy of Sciences

Keynote Speaker

Niven R. Narain, MD

Berg Pharma

Niven R. Narain is cofounder, president, and CTO of Berg, a Boston-based biopharma company with integrated discovery, clinical, analytics, and diagnostics divisions. Narain created the company's flagship Interrogative Biology® platform, an artificial intelligence tool to derive drug targets, biomarkers, and health analytic information. He has also overseen the development of Berg's drug pipeline, led by BPM 31510, an anticancer technology that targets cancer metabolism and is under development for solid tumors and skin cancer. Other areas of focus are diabetes and CNS diseases. Narain collaborated with the U.S. Department of Defense to develop novel biomarkers, currently in clinical trials, for the diagnosis and prognosis of prostate cancer. Narain was previously director of cutaneous oncology and therapeutics research at the University of Miami Miller School of Medicine. He serves as senior biopharma advisor to Ocean Tomo in Chicago and serves on the steering committee for NASA's Gene Lab/Mars Initiative. Narain received his PhD in cancer biology and clinical dermatology at the Miller School of Medicine.


Marc D. Chioda, PharmD


Marc Chioda is an associate medical director on the Lung Cancer Team at Pfizer Oncology. He is the U.S. Medical Affairs Lead for crizotinib (Xalkori) and also supports pipeline compounds in development, such as PF-06463922, Pfizer's next-generation ALK/ROS inhibitor. With over a decade of pharmaceutical industry experience, Chioda has worked in pre-clinical research, clinical research, and health economics and outcomes research. He has practiced in hospital, community, and specialty pharmacy settings. Chioda is a coinventor on 5 pharmaceutical patents. He joined Pfizer in 2013 after serving as adjunct clinical faculty at Rutgers University, where he also earned his doctor of pharmacy degree.

Leonard James, MD, PhD


Lee James is a senior director in clinical development at Pfizer and the global clinical lead for PF-06463922, Pfizer's next-generation ALK/ROS inhibitor. He previously worked with multiple teams in clinical development and medical affairs involving both Xalkori (crizotinib) and Sutent (sunitinib). James received his MD and a PhD in molecular and cellular biology from the University of Washington in Seattle. He researched his PhD thesis on Myc target genes at the Fred Hutchinson Cancer Research Center. He completed residency training in internal medicine at the University of Chicago Hospitals and postgraduate training in the Oncology/Hematology Fellowship Program at Memorial Sloan-Kettering Cancer Center, with a focus on lung cancer. Before joining Pfizer he was a medical oncologist in private practice, led the clinical implementation of a new EMR system, and served as principal investigator on industry-sponsored clinical trials.

Michael Matheny, MD, MS, MPH

Vanderbilt University

Michael Matheny is director of the Vanderbilt Center for Population Health Informatics, associate director of the TVHS Veterans Affairs Biomedical Informatics Fellowship, and assistant professor of bioinformatics, medicine, and biostatistics at Vanderbilt University. He received an MD from the University of Kentucky, an MS in biomedical informatics from Massachusetts Institute of Technology, and an MPH from Harvard University. He is a fellow of the American College of Physicians with board certifications in internal medicine and clinical informatics. He has expertise developing and adapting methods for postmarketing medical device surveillance as well as developing and evaluating NLP tools, predictive analytics, and automated surveillance applications in large observational cohorts. He currently has funding from Veterans Affairs Health Services Research and Development, the Patient-Centered Outcomes Research Institute, the National Human Genome Research Institute, the U.S. Food and Drug Administration, and AstraZeneca.

Jason H. Moore, PhD, MS

University of Pennsylvania

Jason Moore holds an MS in applied statistics and a PhD in human genetics from the University of Michigan. He served as an assistant and associate professor of molecular physiology and biophysics at Vanderbilt University, where he held the Ingram Chair in Cancer Research and served as director of the Advanced Computing Center for Research and Education. He then moved to the Geisel School of Medicine at Dartmouth College, where he was the Frank Lane Research Scholar in Computational Genetics and later the Third Century Professor of Genetics and founding director of the Institute for Quantitative Biomedical Sciences. Moore is now the Edward Rose Professor of Informatics and chief of the Division of Informatics in the Department of Biostatistics and Epidemiology at the Perelman School of Medicine at the University of Pennsylvania. He serves as founding director of the Penn Institute for Biomedical Informatics and senior associate dean for informatics. Moore is an elected fellow of the American Association for the Advancement of Science and was a Kavli Fellow of the National Academy of Sciences.

Nicholas Tatonetti, PhD

Columbia University Medical Center

Nicholas Tatonetti is an assistant professor of biomedical informatics in the Departments of Biomedical Informatics, Systems Biology, and Medicine and the director of clinical informatics at the Herbert Irving Comprehensive Cancer Center at Columbia University. He received his PhD from Stanford University, where he focused on developing novel statistical and computational methods for observational data mining. He applied these methods to drug safety surveillance, discovering and validating new drug effects and interactions. His lab at Columbia is focused on continuing this work to detect, explain, and validate drug effects and drug interactions from large-scale observational data. Tatonetti is interested in the integration of hospital data (stored in electronic health records) and high-dimensional biological data (captured using next-generation sequencing, high-throughput screening, and other 'omics technologies). He has been featured by the New York Times, GenomeWeb, Science Careers, and other media outlets.

Craig P. Webb, PhD

NuMedii Inc.

Craig Webb is chief scientific officer at NuMedii, where he is focused on precision medicine and drug repositioning. He leads technology development and the transition of drug indication hypotheses to the clinic. He spent 13 years at the Van Andel Research Institute (VARI) in Michigan, where he directed translational science and precision oncology. Under his leadership VARI formed new companies in the areas of translational informatics (TransMed Systems and Intervention Insights), molecular diagnostics (The Center for Molecular Medicine), and clinical trial operations (ClinXus). Webb holds a PhD in cell biology and completed a postdoctoral fellowship in molecular oncology.

Chunhua Weng, PhD

Columbia University

Chunhua Weng is an associate professor of biomedical informatics and co-director for the Biomedical Informatics Core of the Clinical and Translational Science Award at Columbia University. She holds a Master's degree in information and computer science from the University of California, Irvine, and a PhD in biomedical and health informatics from the University of Washington in Seattle. Weng's primary research interests are (1) designing and applying text knowledge engineering methods to improve the computability of clinical research designs and to support knowledge management and reuse for clinical research, (2) designing data-driven methods to improve the efficiency and patient-centeredness of clinical research, and (3) designing socio-technical solutions to integrate patient care and clinical research workflows to build a rapid-learning health system.

Diane Wuest, PhD

GNS Healthcare

Short Talk Presenters

Arjun Krishnan, PhD

Princeton University

Arjun Krishnan is a senior researcher at the Lewis-Sigler Institute for Integrative Genomics at Princeton University. He holds a PhD in genetics, bioinformatics, and computational biology. His research focuses on tissue- and cell-specific gene expression, function, and interaction; functional and evolutionary relationships between cells and tissues; and tissue specificity and its role in disease manifestation and drug response. He integrates large-scale functional genomics data to build computational models of gene interactions in specific biological contexts. These models are designed to capture scarce expert biomedical knowledge and make systematic genome-wide predictions about gene function and disease–gene association and about the effects of genetic perturbation.

Sisi Ma, PhD

New York University School of Medicine

Sisi Ma is a research scientist at the Center for Health Informatics and Bioinformatics at New York University Langone Medical Center. Her research interest is the application of statistical modeling, machine learning, and causal analysis methods to biology and medicine. She is focused on devising and implementing causal discovery methods tailored to biomedical data; benchmarking novel and existing causal discovery and predictive modeling methods to evaluate their efficacy for biomedical data; and using multimodal, high-dimensional data to understand pathology, to develop diagnostic technologies, and to identify treatment targets. Ma received her PhD in psychology and her MS in computer science from Rutgers University.

Don Monroe

Don Monroe is a science writer based in Boston, Massachusetts. After getting a PhD in physics from MIT, he spent more than fifteen years doing research in physics and electronics technology at Bell Labs. He writes on physics, technology, and biology.

