Computational Biology and Bioinformatics Discussion Group (2)

Thursday, January 26, 2006

The New York Academy of Sciences

Presented by the Computational Bio & Bioinformatics Discussion Group


Organizer: Olga Troyanskaya, Princeton University

The Bioinformatics and Computational Biology Discussion Group brings together diverse institutions and communities to share new and relevant information at the frontiers of the interrelated fields of bioinformatics and computational biology. Recent topics have included "Benchmarking and Improving the Accuracy of Comparative Modeling of Protein Structures," "Integrated Statistical Modeling of Gene Expression Data" and "Estimating SNP Haplotype Frequencies from DNA Pools."


5:00 - 7:00 Presentations

Andrey Rzhetsky, Columbia University, "Of Truth and Pathways (and Text-Mining)?"

Mona Singh, Princeton University, "Analyzing and Interrogating Protein Interaction Maps Using Network Schemas."

Mark Gerstein, Yale University, "Human Genome Annotation."


"Of Truth and Pathways (and Text-Mining)?"
Andrey Rzhetsky

The information overload in molecular biology is just one example of a condition common to all fields of current science and culture: an ever-strengthening avalanche of novel data and ideas overwhelms specialists and non-specialists alike, unavoidably fragments knowledge, and makes enormous chunks of knowledge invisible or inaccessible to those who desperately need them. Help in relieving the information overload may come from text-miners, who can automatically extract and catalogue facts described in books and journals. In my talk I will try to touch on (some of) the following questions: What is text-mining? In what ways is text-mining useful? What can large-scale analyses of scientific literature tell us about both active and forgotten knowledge? What can such analyses tell us about the scientific community itself? How does modeling help us to differentiate true and false statements in the literature? How will text-mining help us to find cures for human and non-human maladies?

"Human Genome Annotation"
Mark Gerstein

A central problem for 21st-century science will be the analysis and understanding of the human genome. My talk will be concerned with topics within this area, in particular annotating pseudogenes (protein fossils) in the genome. I will discuss a comprehensive pseudogene identification pipeline and storage database we have built. This has enabled us to identify more than 10,000 pseudogenes in the human and mouse genomes and to analyze their distribution with respect to age, protein family, and chromosomal location. One interesting finding is the large number of ribosomal pseudogenes in the human genome, with 80 functional ribosomal proteins giving rise to ~2,000 ribosomal protein pseudogenes. I will try to inter-relate our studies on pseudogenes with those on tiling arrays, which enable one to comprehensively probe the activity of intergenic regions. At the end I will bring these together, trying to assess the transcriptional activity of pseudogenes. Throughout I will try to introduce some of the computational algorithms and approaches that are required for genome annotation and tiling arrays -- i.e., constructing annotation pipelines, developing algorithms for optimal tiling, and refining approaches for scoring microarrays.