Presented by the Systems Biology Discussion Group
The Reactivity of the Cellular Transcriptome to Xenobiotic Compound Perturbation
Posted March 16, 2012
Over the past 15 years, scientists' ability to perform very large-scale screens of the transcriptomic and proteomic states of a range of organisms—from bacteria to humans—has grown tremendously. With the advent of the microarray, next-generation sequencing technologies, and mass spectrometry tools, researchers are well positioned to evaluate potential drug targets quickly. The Systems Biology Discussion Group of the New York Academy of Sciences met on October 24, 2011, for The Reactivity of the Cellular Transcriptome to Xenobiotic Compound Perturbation symposium to discuss the pharmacological uses and the discovery of small, nonessential molecules that are capable of affecting biological processes in diverse and interesting ways. In particular, these molecules affect biological processes by altering the transcriptome, the set of all RNA molecules, coding and non-coding, that are transcribed in a given cell or in a particular population of cells.
Modern genome-wide expression array (microarray) technology allows researchers to screen potential drug candidates for their effects on the cellular transcriptome and to cross-reference these effects against known transcriptomal manifestations of particular diseases. A principal challenge of drug discovery, however, is how to grapple with the massive combinatorial space that arises when researchers want to screen a library of compounds against one or more of the 40,000 human transcripts—any one of which may be differentially expressed under various cellular conditions and at various times. In addition, microarray technology, used to profile gene expression, is expensive and generally limits the range of compounds that can be tested at once. Though precise, the process of correlating entire transcriptomes with disease states is extremely time- and resource-intensive. New methods are needed to reduce costs, increase throughput, and improve the informational power of current and emerging platforms.
One approach to reducing the combinatorial burden of drug candidate testing is to narrow the field of transcripts of interest, so that every transcript need not be compared for every candidate. Aravind Subramanian, of the Broad Institute of MIT and Harvard, discussed how this approach might be applied as part of the experimental design prior to a drug screen without sacrificing gene expression information on the full transcriptome. Subramanian and colleagues have been developing a "Connectivity Map" (also called "CMap"), a map that correlates the expression patterns of disease states (genome-wide disease signatures) with a growing list of drug candidates and with the effects of genetic perturbations (e.g., RNA interference or overexpression of genes). Instead of correlating entire transcriptomes with each perturbation, however, Subramanian and colleagues have identified 1000 "landmark genes" (L1000), characteristic genes drawn from expression clusters, that can be used to infer large portions of the rest of the transcriptome. Because the expression of many genes is correlated, this reduced set more accurately captures the "effective dimensionality" of the transcriptome. The system offers a highly representative expression signature from a reduced transcript set, and it is a low-cost, high-throughput platform for discovering the "functional connections between drugs, genes and diseases."
These expression clusters were identified by principal component analysis (PCA) of a large expression compendium of Affymetrix (microarray) data from the Gene Expression Omnibus (GEO, a public functional genomics data repository). Subramanian's team used linear regression models to find genes that were widely expressed and displayed predictive power for dependent genes, the remaining transcripts in the cluster. While Subramanian acknowledged that this method is dependent upon the training data, he believes L1000 will perform well for a number of applications. Using these 1000 landmark genes, the system was able to recapitulate clustering results similar to those recorded when using the full transcriptome.
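The idea of inferring a dependent gene from landmark genes can be sketched with ordinary least squares. The example below is a minimal illustration, not the L1000 training procedure: the two landmark genes, the five training samples, and the helper names (`fit_dependent_gene`, `infer_dependent_gene`) are all hypothetical, and the dependent gene is constructed without noise so that the recovered weights are easy to check.

```python
# Illustrative sketch only: toy data and hypothetical helper names,
# not the procedure used to build L1000.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_dependent_gene(landmark_profiles, dependent_values):
    """Least-squares weights (intercept first) predicting one dependent gene."""
    rows = [[1.0] + list(p) for p in landmark_profiles]  # intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, dependent_values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def infer_dependent_gene(weights, landmark_profile):
    """Predict the dependent gene in a new sample from its landmark levels."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], landmark_profile))


# Hypothetical training compendium: each row holds two landmark genes' levels.
training_landmarks = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
# Toy dependent gene built as 0.5 + 2*L1 - 1*L2 with no noise.
training_dependent = [0.5 + 2 * a - b for a, b in training_landmarks]

weights = fit_dependent_gene(training_landmarks, training_dependent)
predicted = infer_dependent_gene(weights, [6.0, 2.0])
print([round(w, 3) for w in weights], round(predicted, 3))
```

In practice the quality of such inference depends on the training compendium, which is the caveat Subramanian raised about the method.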
Landmark gene expression levels were measured with ligation-mediated amplification in the Luminex-bead system. The current system uses 384-well format with a cost of about $5 per profile and throughput of 20 plates per week. Of course, the method is not limited to drug compounds but can be extended to any molecules amenable to high-throughput dispensation, such as microRNA molecules. Subramanian's group has created a preliminary dataset of 1800 compounds, approximately 1100 RNA interference knockdowns, and 550 ORF overexpression experiments in 10 cell lines to generate a connectivity map that describes the effects of small molecules and genetic perturbations. They have been able to perform early validation of the L1000 system's ability to determine context-specific connections. For example, the team screened the anti-diabetes drug Rosiglitazone, a PPARγ (peroxisome proliferator-activated receptor γ) agonist, for its interaction with prostate cancer cells (PC3) and breast adenocarcinoma cells (MCF7). The query revealed associations with drugs known to be similar, and, more importantly, only showed associations with the PC3 cells, which—unlike MCF7—express the PPARγ receptor gene. The results from an L1000 screen can be used to quickly generate new hypotheses because when any landmark gene is implicated as a target, many of its dependent genes are also associated. Thus, this method successfully aids in rapidly reducing the number of targets that need to be screened in more detail.
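A connectivity query of this kind can be pictured as scoring every reference profile against a query signature and ranking the results. The sketch below uses a deliberately simplified score (mean change of the query's up-regulated genes minus mean change of its down-regulated genes); it is not the statistic used by the Connectivity Map. Gene names, compound names, cell-line names, and all values are invented for illustration.

```python
# Simplified stand-in for a connectivity score; all names and values are invented.

def connectivity_score(profile, up_genes, down_genes):
    """Mean change of query-up genes minus mean change of query-down genes."""
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        return 0.0  # signature not measurable in this profile
    return sum(up) / len(up) - sum(down) / len(down)


def rank_connections(reference_profiles, up_genes, down_genes):
    """Sort reference profiles from most similar to most opposite."""
    scored = [
        (key, connectivity_score(profile, up_genes, down_genes))
        for key, profile in reference_profiles.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Query signature: genes up- and down-regulated by the perturbation of interest.
query_up = ["G1", "G2"]
query_down = ["G3", "G4"]

# Reference profiles keyed by (compound, cell line): differential expression values.
reference_profiles = {
    ("compound_A", "cell_line_1"): {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.0},
    ("compound_A", "cell_line_2"): {"G1": 0.1, "G2": -0.1, "G3": 0.0, "G4": 0.1},
    ("compound_B", "cell_line_1"): {"G1": -1.5, "G2": -2.0, "G3": 1.0, "G4": 1.5},
}

ranked = rank_connections(reference_profiles, query_up, query_down)
for key, score in ranked:
    print(key, round(score, 2))
```

Keying the profiles by both compound and cell line mirrors the context-specific behavior described above: the same compound can connect strongly in one cell line and not at all in another.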
As important as generating a large database of pharmacological profiles is, understanding the dose response of drugs in an animal system is critical to characterizing overall efficacy. Rui-Ru Ji, of Bristol-Myers Squibb, has been developing algorithms to better understand this dose-response relationship. Many compounds hit multiple targets; for example, a kinase inhibitor can affect multiple kinases due to domain homology. Furthermore, these targets can respond to varying levels of compounds, suggesting there is no single "correct dose." For example, Dasatinib, a small-molecule inhibitor used to treat chronic myelogenous leukemia, targets different kinases at particular concentrations ranging from 0.1 nM to 10,000 nM.
To date, most methods that aim to correlate the transcript signatures associated with particular compounds do not account sufficiently for the dimensionality of microarray data (i.e., the number of microarray chips needed for each experiment) or cannot provide quantitative dose-response information. Ji presented an experimental design and computational algorithm, along with unique visualization tools, that address both challenges. The researchers set up dose-response experiments over a 6-log range with 12 doses in cell-based assays. This economical design requires only 12 arrays per compound and provides a statistically comprehensive data set without the need for point analysis. Since most drug dose responses follow a sigmoidal curve, Ji implemented a novel curve-fitting algorithm, Sigmoidal Dose-Response Search (SDRS), to be applied in genomic data sets. SDRS is a completely automated method that identifies transcripts that follow a sigmoidal dose curve over a parameter space limited to ensure realistic values. Unique to SDRS, an F-score is produced for each dose to measure the search's success in identifying probe sets, or genes, that fit the sigmoidal dose-response model.
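The general idea of searching a bounded parameter space for a sigmoidal fit can be sketched as follows. This is not the SDRS implementation: the grid values, the half-log dose spacing, the four-parameter curve form, and the F statistic (sigmoid fit versus a flat line) are assumptions chosen for illustration, and the two "transcripts" are synthetic.

```python
import math

# Hypothetical sketch of a bounded grid search for a sigmoidal dose response.

def sigmoid(dose, ec50, hill):
    """Fraction of maximal response at a given dose."""
    return 1.0 / (1.0 + (ec50 / dose) ** hill)


def fit_sigmoid(doses, responses, ec50_grid, hill_grid):
    """Grid-search EC50 and Hill slope; solve low/high by simple regression."""
    n = len(doses)
    mean_y = sum(responses) / n
    ss_total = sum((y - mean_y) ** 2 for y in responses)
    best = None
    for ec50 in ec50_grid:
        for hill in hill_grid:
            s = [sigmoid(d, ec50, hill) for d in doses]
            mean_s = sum(s) / n
            var_s = sum((v - mean_s) ** 2 for v in s)
            if var_s == 0.0:
                continue  # curve is flat over the tested doses
            # response = low + (high - low) * s is linear in s.
            slope = sum((v - mean_s) * (y - mean_y) for v, y in zip(s, responses)) / var_s
            low = mean_y - slope * mean_s
            ss_res = sum((y - (low + slope * v)) ** 2 for v, y in zip(s, responses))
            if best is None or ss_res < best["ss_res"]:
                best = {"ec50": ec50, "hill": hill, "low": low,
                        "high": low + slope, "ss_res": ss_res}
    if best is None:
        raise ValueError("no usable parameter combination in the grid")
    # F statistic: 4-parameter sigmoid against a 1-parameter flat model.
    df_model, df_resid = 3, n - 4
    explained = (ss_total - best["ss_res"]) / df_model
    best["f_score"] = explained / max(best["ss_res"] / df_resid, 1e-12)
    return best


# Twelve doses at half-log spacing (an assumed layout), in nM.
doses = [10 ** (-2 + 0.5 * i) for i in range(12)]
ec50_grid = [10 ** (-2 + 0.25 * i) for i in range(23)]  # bounded to the tested range
hill_grid = [0.5, 1.0, 1.5, 2.0, 3.0]

# Synthetic transcripts: one responsive, one flat, each with small alternating noise.
noise = [0.02 if i % 2 == 0 else -0.02 for i in range(12)]
responsive = [1.0 + 4.0 * sigmoid(d, 10.0, 1.0) + e for d, e in zip(doses, noise)]
flat = [2.0 + e for e in noise]

fit = fit_sigmoid(doses, responsive, ec50_grid, hill_grid)
flat_fit = fit_sigmoid(doses, flat, ec50_grid, hill_grid)
print(fit["ec50"], fit["hill"], fit["f_score"] > flat_fit["f_score"])
```

Restricting the grid to the tested dose range is one simple way to keep fitted values realistic, in the spirit of the limited parameter space described above.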
With SDRS, the researchers were able to compare the differences in potency of two compounds over the same sets of responsive genes. The F-score was then used to generate a novel visualization approach, a False Discovery Rate (FDR) heatmap in which each pixel is a dose-response–regulated gene list. Using Fisher's exact test for statistical significance, they compared gene lists found at different doses and/or with different compounds to determine shared common targets or mechanisms across doses and drugs. Ji presented data that clearly showed differential dose response by cell lines (for the same drug), and the team was able to find specific dose ranges and the biological pathways that the compounds may be affecting together or independently. SDRS provides an automated pipeline to better understand the effects of a drug or combination of drugs on its targets over a large dose range, thereby improving the comparison of pre-clinical data to gene expression array data.
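Comparing two gene lists for a larger-than-chance overlap can be sketched with a one-sided Fisher exact test, which is equivalent to a hypergeometric tail probability. The function name, gene identifiers, and universe size below are hypothetical; this only illustrates the kind of test described, not the group's pipeline.

```python
from math import comb

# One-sided Fisher exact test for gene-list overlap; names and data are illustrative.

def overlap_p_value(genes_a, genes_b, n_universe):
    """P(overlap >= observed) if list B were drawn at random from the universe."""
    a, b = set(genes_a), set(genes_b)
    observed = len(a & b)
    largest_possible = min(len(a), len(b))
    total = comb(n_universe, len(b))
    tail = sum(
        comb(len(a), i) * comb(n_universe - len(a), len(b) - i)
        for i in range(observed, largest_possible + 1)
    )
    return tail / total


# Gene lists responsive at two hypothetical doses, from a universe of 10 genes.
dose_low = ["G1", "G2", "G3"]
dose_high = ["G1", "G2", "G3"]
unrelated = ["G7", "G8", "G9"]

p_shared = overlap_p_value(dose_low, dose_high, n_universe=10)
p_disjoint = overlap_p_value(dose_low, unrelated, n_universe=10)
print(p_shared, p_disjoint)
```

A small p-value suggests the two lists share targets or mechanisms. With many doses and compounds, the resulting p-values would need multiple-testing correction, which is where an FDR-based view comes in.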
One of the big challenges in drug discovery for cancer is the complexity of most cancers that results from the heterogeneity of their cell types. Duane C. Hassane, from Weill Cornell Medical College, spoke of the limitations of targeting only specific, primitive attributes of cancer, such as metabolic rates, when cancer cells vary significantly in these attributes. He suggested a chemical transcriptomic approach, which not only looks at individually responsive transcripts as drug targets but also identifies sets of responsive genes known to have roles in particular biological processes and determines if combinations of drugs can alter the entire program to return the system to a non-disease state.
Current therapies, which target growing and dividing cells, tend to spare cancer stem cells, which have reduced metabolism. The survival of these cells increases the likelihood of relapse. Hassane noted that more effective treatments would target a shared property of stem cells and bulk cancer cells. One such shared property is NF-κB activity, which is associated with processes mediating cell survival and which is not a property of normal hematopoietic cells. Hassane's group looked for inhibitors of NF-κB and discovered Parthenolide (PTL), an inhibitor now in phase 1 clinical trials in the United Kingdom. The group further studied the transcripts associated with PTL activity to find interactors, genes whose products affect the same processes. Using CMap, they discovered that mTOR (mammalian target of rapamycin) inhibitors had transcriptional effects opposing those of PTL, yet when mTOR inhibitors were tested in conjunction with PTL, they enhanced the NF-κB inhibitor's effects. Hassane presented evidence that PTL in combination with Temsirolimus, an mTOR inhibitor, was an effective treatment in mice that were irradiated and injected with 2 million primary human acute myeloid leukemia cells. He subsequently found cytoprotection genes among the set of PTL-responsive genes. This result led the group to ask whether a single drug could affect both NF-κB and cytoprotection. By using chemical transcriptomics, Hassane and colleagues were able to discover AR-42, a drug that inhibits NF-κB transcription and does not activate transcription of Nrf2, a transcription factor that promotes cytoprotection.
Chemical transcriptomics allows for the rational selection of combination therapies by identifying drugs based on a checklist of requirements. Using novel combinations of existing drugs has the added benefit of bypassing stages of the long drug testing process. Hassane's group further developed their approach by isolating normal mouse hematopoietic stem cells, transducing them with cancer genes to produce malignancy, and transplanting them into a recipient mouse. Cancer cells were then exposed to a variety of compounds before being cell-sorted. Gene signatures were obtained from microarrays. With this system, Hassane was able to find genes that are up- and down-regulated in response to different compounds and to identify combinations of compounds that normalize relevant gene expression profiles, assuming a linear combination of factors. Using this system, it is then possible to find "if/then" conditions for drug combinations and to screen for potential cancer treatments more effectively.
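The linear-combination assumption can be made concrete with a small sketch: if drug effects add, the expression state after a combination is the disease signature plus the sum of the individual drug signatures, and the best pair is the one that leaves the smallest residual. The gene names, drug names, and values below are invented, and the Euclidean residual is an assumed scoring choice rather than the group's method.

```python
from itertools import combinations
from math import sqrt

# Toy illustration of additive drug signatures; all names and values are invented.

def residual_after_combination(disease, drug_effects, drugs):
    """Distance from the normal state (all zeros) after adding drug effects."""
    remaining = [
        disease[gene] + sum(drug_effects[d].get(gene, 0.0) for d in drugs)
        for gene in disease
    ]
    return sqrt(sum(v * v for v in remaining))


def best_pair(disease, drug_effects):
    """Drug pair whose summed effect best cancels the disease signature."""
    pairs = combinations(sorted(drug_effects), 2)
    return min(pairs, key=lambda p: residual_after_combination(disease, drug_effects, p))


# Disease signature: expression change relative to the normal state.
disease_signature = {"G1": 2.0, "G2": -1.0, "G3": 1.5}

# Per-drug expression changes, assumed to combine additively.
drug_effects = {
    "drug_A": {"G1": -2.0, "G2": 0.0, "G3": 0.0},
    "drug_B": {"G1": 0.0, "G2": 1.0, "G3": -1.5},
    "drug_C": {"G1": 1.0, "G2": -0.5, "G3": 0.5},
}

chosen = best_pair(disease_signature, drug_effects)
chosen_residual = residual_after_combination(disease_signature, drug_effects, chosen)
print(chosen, chosen_residual)
```

Real drug combinations often interact non-additively, as the PTL and mTOR inhibitor example shows, so a linear score like this is best read as a first-pass filter for candidates to test.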
Presentations available from:
Introduction: Manuel Duval, PhD (Network Therapeutics Inc.)
Duane C. Hassane, PhD (Weill Cornell Medical College)
Rui-Ru Ji, PhD (Bristol-Myers Squibb)
Aravind Subramanian, PhD (Broad Institute of MIT and Harvard)
Andrea Califano, PhD
Andrea Califano's doctoral thesis in physics, at the University of Florence, was on the behavior of high-dimensional dynamical systems. From 1986 to 1990, as a Research Staff Member in the Exploratory Computer Vision Group at the IBM TJ Watson Research Center, he worked on several algorithms for machine learning, more specifically for the interpretation of 2D and 3D visual scenes. In 1990 Califano started his activities in computational biology and, in 1997, became the program director of the IBM Computational Biology Center, a worldwide organization active in several research areas related to bioinformatics, chemoinformatics, complex biological system modeling and simulation, microarray analysis, protein structure prediction, and molecular dynamics. In 2000 he co-founded First Genetic Trust, Inc. to pursue translational genomics research and related infrastructure activities in the context of large-scale patient studies with a genetic component. In 2003 he joined Columbia University, where he is currently Professor of Systems Biology, Director of the Columbia Initiative in Systems Biology, Director of the JP Sulzberger Columbia Genome Center, and Associate Director for Bioinformatics of the Herbert Irving Comprehensive Cancer Center. Califano serves on numerous scientific advisory boards, including the Board of Scientific Advisors of the National Cancer Institute.
Manuel Duval, PhD
Manuel Duval is a French American life scientist, trained in both France and the US, with professional experience on both continents at Rhône-Poulenc and Pfizer. Duval earned his PhD in Biochemistry in 1996 at the University Joseph Fourier in Grenoble, France, and completed his postdoctoral work at Texas A&M with computer science training. For the past 10 years, he has been a practicing computational biologist in an outstanding centenarian drug R&D organization headquartered in New York City and founded by German entrepreneur chemists Charles Erhardt and Charles Pfizer. He co-founded a Drug Discovery 2.0 organization called Network Therapeutics.
Aris Economides, PhD
Aris N. Economides joined Regeneron Pharmaceuticals Inc in 1992 and he currently holds the position of Sr. Director, leading two groups: Genome Engineering Technologies, and Skeletal Diseases TFA. Economides is a co-inventor of the Cytokine Trap technology that led to the development of the IL-1 trap, a currently approved biologic drug (ARCALYST). He is also a co-inventor of the VelociGene® technology, which has led to the development of VelocImmune®, a method for the generation of all-human antibodies in mice. More recently, he has been spearheading the development of new methods for the generation of transgenic mice using BAC as transgene vectors, and has also pioneered a new method for generating conditional alleles.
Gustavo Stolovitzky, PhD
Gustavo Stolovitzky is manager of the Functional Genomics and Systems Biology Group at the IBM Computational Biology Center in IBM Research. The Functional Genomics and Systems Biology group is involved in several projects, including DNA chip analysis and gene expression data mining, the reverse engineering of metabolic and gene regulatory networks, modeling cardiac muscle, describing emergent properties of the myofilament, modeling P53 signaling pathways, and performing massively parallel signature sequencing analysis.
Stolovitzky received his MSc in Physics from the University of Buenos Aires in 1987 and his PhD in mechanical engineering from Yale University. He then worked at The Rockefeller University and at the NEC Research Institute before coming to IBM. He has served as Joliot Invited Professor at Laboratoire de Mecanique de Fluides in Paris and as visiting scholar at the physics department of The Chinese University of Hong Kong. Stolovitzky is a member of the steering committee of the Systems Biology Discussion Group of the New York Academy of Sciences. In addition, he is a Fellow of the American Physical Society, a Fellow of the American Association for the Advancement of Science, and an adjunct Associate Professor at Columbia University.
Jennifer Henry, PhD
The New York Academy of Sciences
Jennifer Henry received her PhD in plant molecular biology from the University of Melbourne, Australia, with Paul Taylor at the University of Melbourne and Phil Larkin at CSIRO Plant Industry in Canberra, specializing in the genetic engineering of transgenic crops. She was then appointed as Associate Editor, then Editor, of Functional Plant Biology at CSIRO Publishing. She moved to New York for her appointment as a Publishing Manager in the Academic Journals division at Nature Publishing Group, where she was responsible for the publication of biomedical journals in nephrology, clinical pharmacology, hypertension, dermatology, and oncology. Henry joined the Academy in 2009 as Director of Life Sciences and organizes 35–40 seminars each year. She is responsible for developing scientific content in coordination with the various life sciences Discussion Group steering committees, under the auspices of the Academy's Frontiers of Science program. She also generates alliances with outside organizations interested in the programmatic content.
Duane C. Hassane, PhD
Duane Hassane is an Assistant Professor of Pathology and Laboratory Medicine at the Institute for Computational Biomedicine of Weill Cornell Medical College in New York City. Hassane received his BS in 1995 at the University of Rochester. In 2002, he received his PhD in Microbiology, Immunology, and Molecular Genetics at the University of Kentucky, subsequently returning to the University of Rochester, where he trained as a postdoctoral fellow in the laboratory of Craig Jordan, focusing on leukemia stem cell targeting. Currently, at Weill Cornell Medical College, Hassane's laboratory focuses on the use of chemical genomic strategies to accelerate the translation of pre-clinical discoveries to the clinic through development of novel treatment modalities and therapeutic combinations.
Rui-Ru Ji, PhD
Rui-Ru Ji graduated from the University of Science and Technology of China with a BS in Biology. After college Rui-Ru came to the United States and obtained a PhD in Molecular Biology and an MS in Computer Science from Purdue University. Rui-Ru joined Celera Genomics in 2000 and was part of the team decoding the human and mouse genomes. In 2002 Rui-Ru moved to New Jersey and has since been working in the pharmaceutical industry. Rui-Ru first worked at Purdue Pharma LP and led their bioinformatics effort to support target identification and validation. In 2005 Rui-Ru joined Bristol-Myers Squibb and has been working on various Oncology and Immunology discovery and development programs.
Aravind Subramanian, PhD
Aravind Subramanian is a Research Scientist in the Cancer Program at the Broad Institute of MIT and Harvard. He leads a team of molecular biologists, computational biologists, and software engineers whose focus is on developing new technologies and algorithms for large-scale mRNA profiling and analysis. As a graduate student in the Whitehead Institute Center for Genome Research, Subramanian helped develop Gene Set Enrichment Analysis (GSEA), a widely cited knowledge-based algorithm for the interpretation of high-dimensionality genomic datasets. In collaboration with colleagues in the RNAi platform at the Broad Institute, he developed computational methods to analyze genome-scale pooled shRNA screens for the identification of essential genes in cancer cells (RIGER). He is currently collaborating with members of the Todd Golub laboratory to implement a high-throughput, medium-density, low-cost gene expression-profiling platform. The current focus of the group is to use this technology to massively scale-up the Connectivity Map database to include over 1M perturbational profiles.
Kahn Rhrissorrakrai received a BS in molecular biology from Emory University in Atlanta, GA. After graduating in 2002, he worked at the Centers for Disease Control and Prevention in Atlanta, GA, where he studied Congenital Rubella Syndrome and worked to develop a quantitative diagnostic assay. Kahn is now a PhD student at New York University where he also received an MS in computational biology. He is currently studying the dynamics of gene and functional module usage in animal development using both graph-theoretic and probabilistic models.