eBriefing

Computing Research: Qatar Foundation Annual Research Forum 2011

Reported by
Don Monroe, PhD

Posted February 15, 2012


Overview

The Qatar Foundation Annual Research Forum convened for the second time from November 20 – 22, 2011, in Doha, to discuss progress and challenges in transforming Qatar from a resource-based to a knowledge-based economy and in creating a more sustainable future. The Foundation recruited eminent scientists and leaders from Qatar and around the world to share their insights on how to build a robust R&D infrastructure, encourage regional and worldwide collaborations, and foster entrepreneurship in Qatar. One day of the forum was devoted to a series of research presentations in five areas: energy, environmental, biomedical, computing, and arts and humanities research.

This eBriefing looks at the research presented in the computing track, which focused on the areas in which the newly founded Qatar Computing Research Institute is concentrating: Arabic language technologies, social computing, scientific computing, cloud computing, and data analytics. A panel of distinguished experts in computing research challenged the presenters to consider new ways of thinking about their experimental designs and their results. In addition to the research presentations, a workshop met to discuss computing challenges in data, from content understanding to scalable analytics.


Presentations available from Workshop 6 — Computing Challenges in Data:
Sihem Amer-Yahia, PhD (Qatar Computing Research Institute)
Ihab Francis Ilyas Kaldas, PhD (Qatar Computing Research Institute)
Christopher Ré, PhD (University of Wisconsin, USA)
Lew Tucker, PhD (Cisco Systems, Inc.)
Stephan Vogel, PhD (Qatar Computing Research Institute)
Q&A Session

A report and multimedia presentations from the forum-wide sessions can be found in the Building a Knowledge-based Economy in Qatar eBriefing.

Reports on the individual research tracks can be found at:
Arts, Humanities, Social Sciences, and Islamic Studies Research eBriefing
Biomedical Research eBriefing
Energy Research eBriefing
Environmental Research eBriefing

Speaker abstracts appear in the Annual Research Forum Proceedings.
Speaker biographies appear in the Annual Research Forum Program book.


Presented by

  • Qatar Foundation

Distinguished Research Award Sponsors

  • ExxonMobil
  • Total
  • Shell

Other Sponsors

  • Chevron
  • Carnegie Mellon University in Qatar

Scientific Publication Partner

  • Bloomsbury Qatar Foundation Journals

Why Do We Urgently Need a Science of the Social Web?


Sihem Amer-Yahia (Qatar Computing Research Institute)
  • 00:01
    1. Introduction
  • 00:50
    2. The importance of using the social web
  • 02:30
    3. End users' point of view
  • 05:20
    4. Dilemma with reviews and ratings
  • 08:11
    5. Meaningful difference mining
  • 12:25
    6. How to find the most popular item
  • 14:31
    7. Creation of inverted list
  • 16:48
    8. What the social web offers

Workshop 6: Introduction


Ihab Francis Ilyas Kaldas (Qatar Computing Research Institute)

Workshop 6: Q&A Session


Moderator: Ihab Francis Ilyas Kaldas (Qatar Computing Research Institute)

Hazy: Deeply Analyzing Data from a Wide Variety of Sources


Christopher Ré (University of Wisconsin)
  • 00:01
    1. Introduction
  • 00:25
    2. Hazy data management
  • 01:30
    3. Deeper understanding of data
  • 03:49
    4. Web extraction and classification of products
  • 05:27
    5. Infrastructure FELIX
  • 08:58
    6. TAC-KBP challenge
  • 09:56
    7. The ICECUBE collaboration
  • 13:42
    8. Application takeaways
  • 15:31
    9. Future direction

Cloud Computing: The Time Is Now


Lew Tucker (Cisco Systems, Inc.)
  • 00:01
    1. Introduction
  • 01:52
    2. Shift from traditional data centers
  • 03:31
    3. What is cloud computing?
  • 04:30
    4. Marketplace commodity for computing
  • 06:25
    5. Operating traditional data center
  • 07:57
    6. Cloud based data center
  • 12:41
    7. Large scale computing
  • 15:51
    8. Web approach vs. enterprise approach
  • 17:30
    9. Open source software framework

Language Technologies: Enabling Communication in a Global Village


Stephan Vogel (Qatar Computing Research Institute)
  • 00:01
    1. Introduction
  • 00:25
    2. Commonalities in villages to globalization
  • 01:31
    3. How social networks link communities
  • 03:54
    4. Developing machine/statistical translation
  • 07:19
    5. Extraction of valuable knowledge from data
  • 12:42
    6. Translation of new sentences
  • 14:55
    7. Scaling up of sentences using data
  • 17:38
    8. Using pictures as data
  • 19:48
    9. The benefit of data

Arabic Language Technologies

Websites

Arabic Language Technologies at the Qatar Computing Research Institute
Qatar Computing Research Institute is dedicated to promoting the Arabic language in the information age by conducting world-class research in all aspects of Arabic language technologies. In addition, the Institute actively participates in national, regional, and international standards bodies involved in defining the Unicode bi-directional algorithm for multi-lingual documents as well as the proposed multi-lingual Internet Domain Name Service.

Roboceptionist 1.0
Introduction to the Hala roboceptionist project. Hala is a robot receptionist (or roboceptionist) that can be found at the front desk of Carnegie Mellon, Qatar. She understands written English/Arabic and can speak back to you in either language.

Shafallah Center
The Shafallah Center was established in 1999 at the behest of Her Highness Sheikha Mozah bint Nasser al Missned so that Qatari society could provide comprehensive services to children with disabilities. The Shafallah Center is the first facility of its kind in the world.

Publications

Al-Sabbagh R, Girju R, Hasegawa-Johnson M, et al. Using web-mining techniques to build a multi-dialect lexicon of Arabic. Presented at: Linguistics in the Gulf III; March 6-7, 2011; Doha, Qatar.

Ekström J, 2011. Mahalanobis' distance beyond normal distributions. [preprint online.]

Elmahdy M, Gruhn R, Abdennadher S, Minker W. Rapid phonetic transcription using everyday life natural chat alphabet orthography for dialectal Arabic speech recognition. Presented at: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 22-27, 2011; Prague, Czech Republic.

Jaoua A, Lebda W, Al-Jaam J. Automatic structuring of search results of Arabic and English meta-search engines by formal concepts extraction. International Journal of Computer Science and Engineering in Arabic, Vol. 3, 2009, Philips Publishing Company.

Jaoua A, Al-Saidi MA, Othman A, Abdullah F, Mohsen I. Automatic Arabic text structuring by extracting optimal concepts and its utilization for browsing. International Journal of Computer Science and Engineering in Arabic, Vol. 2, 2008, Philips Publishing Company.

Makatchev M, Lee MK, Simmons R. Relating initial turns of human-robot dialogues to discourse. International Conference on Human-Robot Interaction, March 2009.

Mohit B, Liberato F, Hwa R. Language model adaptation for difficult-to-translate phrases. In: Proceedings of the 13th Annual Conference of the European Association for Machine Translation (EAMT-09) May 14-15, 2009, Barcelona, Spain.


Scientific Computing

Websites

Introduction to Parallel Programming and MapReduce
This tutorial from Google Code University covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant programming experience with a language such as C++ or Java, and data structures & algorithms.

Parallel Programming and Computing Platform | CUDA | NVIDIA
NVIDIA resource page for parallel computing with graphics processors.

Qatar Computing Research Institute
Qatar Computing Research Institute conducts world-class, multidisciplinary computing research that is relevant to the needs of Qatar, the wider Arab region, and the world, leveraging Qatar's unique historical, linguistic, and cultural heritage. QCRI disseminates the results of this research through community outreach and technology transfer activities. Research Centers include: Arabic Language Technologies, Social Computing, Scientific Computing, Cloud Computing, and Data Analytics.

Publications

Haoudi A, Bensmail H. Bioinformatics and data mining in proteomics. Expert Rev. Proteomics 2006;3(3):333-343.

Hunter-Zinck H, Musharoff S, Salit J, et al. Population genetic structure of the people of Qatar. Am. J. Hum. Gen. 2010;87(1):17-25.

Suhre K, Wallaschofski H, Raffler J, et al. A genome-wide association study of metabolic traits in human urine. Nat. Genet. 2011;43(6):565-569.

Suhre K, Meisinger C, Döring A, et al. Metabolic footprint of diabetes: a multiplatform metabolomics study in an epidemiological setting. PLoS ONE 2010;5(11):e13953.


Video

Publications

Hsu C, Hefeeda M. 2011. Flexible broadcasting of scalable video streams to heterogeneous mobile devices. IEEE Transactions on Mobile Computing 10, 406.

Kolar V, Bharath K, Abu-Ghazaleh NB, Riihijarvi J. 2009. Contention in multi-hop wireless networks: model and fairness analysis. ACM-IEEE International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM).

Kolar V, Razak S, Abu-Ghazaleh NB. 2010. Interaction engineering: taming of the CSMA. ACM-IEEE International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM).

Liu Y, Hefeeda M. 2010. Video streaming over cooperative wireless networks. MMSys '10 Proceedings of the first annual ACM SIGMM conference on Multimedia systems.


Networks

Journal Articles

Mtibaa A, Harras K. Social-based trust mechanisms in mobile opportunistic networks. Paper presented at: The First ICCCN Workshop on Social Interactive Media Networking and Applications (SIMNA), July 31 – August 4, 2011, Maui, Hawaii, USA.

Mtibaa A, May M, Ammar M. 2010. On the relevance of social information to opportunistic forwarding. Paper presented at: The 18th Annual Meeting of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, August 17-19, 2010, Miami Beach, Florida, USA.


Data Analytics

Websites

HAZY: An Operator-based Approach to Statistical Data Analysis
HAZY is a relational operator-based approach to statistical data analysis. The hypothesis behind Hazy is that a large fraction of a diverse set of statistical applications can be captured using a small handful of primitives. To understand this hypothesis, Professor Ré's group at the University of Wisconsin is building several applications to exercise Hazy's primitives.

Qatar Computing Research Institute
Qatar Computing Research Institute conducts world-class, multidisciplinary computing research that is relevant to the needs of Qatar, the wider Arab region, and the world, leveraging Qatar's unique historical, linguistic, and cultural heritage. QCRI disseminates the results of this research through community outreach and technology transfer activities. Research Centers include: Arabic Language Technologies, Social Computing, Scientific Computing, Cloud Computing, and Data Analytics.

Publications

Ambati V, Vogel S. 2010. Can crowds build parallel corpora for machine translation systems? In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.

Ambati V, Vogel S, Carbonell J. 2010. Active learning and crowd-sourcing for machine translation. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10).

Amer-Yahia S, Das M, Das G, Yu C. MRI: Meaningful interpretations of collaborative ratings. In: Proceedings of the International Conference on Very Large Databases (VLDB), 2011;4(11):1063-1074.

Carolan J, Gaede S, Baty J, et al. 2009. Introduction to cloud computing architecture. [White Paper] Sun Microsystems.

Chen F, Feng X, Ré C, Wang M. 2012. Optimizing statistical information extraction programs over evolving text. Paper to be presented at: IEEE International Conference on Data Engineering (ICDE); April 1-5, 2012, Washington, DC, USA.

Niu F, Zhang C, Ré C, Shavlik J. 2011. Felix: scaling inference for Markov logic with an operator-based approach. [In submission] ArXiv e-print

Roy SB, Amer-Yahia S, Chawla A, Das G, Yu C. Space efficiency in group recommendation. The VLDB Journal—The International Journal on Very Large Data Bases, 2010;19(6):877-900.

Speakers

Computing Research Panelists

Richard DeMillo, PhD

Georgia Institute of Technology, USA

Karem Sakallah, PhD

University of Michigan, USA

Lew Tucker, PhD

Cisco Systems, USA

Speakers

Sihem Amer-Yahia, PhD

Qatar Computing Research Institute

Halima Bensmail, PhD

Qatar Computing Research Institute

Othmane Bouhali, PhD

Texas A&M University at Qatar

Mohamed Elmahdy, PhD

Qatar University

Mohamed Hefeeda, PhD

Qatar Computing Research Institute

Bahattin Karakaya, PhD

Qatar University

Mansour Karkoub, PhD

Texas A&M University at Qatar

Vinay Kolar, PhD

Carnegie Mellon University in Qatar

Abderrahmen Mtibaa, PhD

Carnegie Mellon University in Qatar

Behrang Mohit, PhD

Carnegie Mellon University in Qatar

Marwa Khalid Qaraqe

Texas A&M University at Qatar

Christopher Ré, PhD

University of Wisconsin, USA

Moutaz Saleh, PhD

Qatar University

Ali Sheharyar

Texas A&M University at Qatar

Lew Tucker, PhD

Cisco Systems

Stephan Vogel, PhD

Qatar Computing Research Institute

Serhan Yarkan, PhD

Texas A&M University at Qatar

Student Speakers

Amna Alzeyara

Carnegie Mellon University in Qatar

Sheikha Ali Karam

Qatar University

Rawan AlSaad

Qatar University


Don Monroe

Don Monroe has nearly 20 years of physics and technology research experience and 8 years of experience as a freelance science and technology writer. He has a PhD in physics from MIT and worked at Bell Labs for many years. He also has a master's degree in journalism from New York University's Science and Environmental Reporting Program. He has written for New Scientist, Scientific American, PLoS Biology, The Scientist, Communications of the ACM, Technology Review, Physical Review Focus, and SciDAC Review, and has been a writer for the New York Academy of Sciences eBriefings program for many years.

Computing research panelists:
Richard DeMillo, Georgia Institute of Technology, USA
Karem Sakallah, University of Michigan, USA
Lew Tucker, Cisco Systems, Inc.

Introduction

To support Qatar's transition to a knowledge-based economy, the Qatar Foundation is pursuing research in biomedicine, energy, the environment, computing science, and arts, social sciences, humanities, and Islamic studies. The foundation's Annual Research Forum, held in Doha on November 20–22, 2011, included sessions highlighting each of these focus areas.

One important facet of the Foundation's computing science activities is the establishment in 2010 of the Qatar Computing Research Institute (QCRI). QCRI joins the Qatar Biomedical Research Institute and the Qatar Environment and Energy Research Institute as part of a framework for intensive exploration of critical issues, to be conducted in parallel with other research supported by the Qatar Foundation and its Qatar National Research Fund. Currently, the QCRI programs are organized into five specific areas: Arabic language technologies, social computing, scientific computing, cloud computing, and data analytics.

In addition to sessions about the overall goals of the foundation and about Qatar's strategic priorities, the research forum included several sessions specifically related to computing sciences. One workshop included several distinguished researchers discussing the challenges of managing the growing flood of data on several levels. For example, the business model in which companies buy computing services online can help them adapt to changing conditions and focus on their core business, said Lew Tucker, vice president and chief technology officer of Cloud Computing at Cisco. Christopher Ré of the University of Wisconsin described how many data-intensive problems can be managed by designing reconfigurable tools for statistical analysis.

Two researchers from the Qatar Computing Research Institute described other fields that are being transformed by massive amounts of data. Sihem Amer-Yahia is looking at approaches that use news sources for social data management and mining by exploring algorithms, methods, tools, and infrastructure to extract value and identify emergent phenomena from aggregate social network data. In particular, the MAQSA project with MIT and Al Jazeera aims to better organize and analyze the opinions or ratings that users express online.

Large amounts of translated text help Stephan Vogel build translation software, although he stressed that better language models are also critical. The meeting also comprised focused sessions featuring local computing researchers. These oral presentations, including several by student researchers, covered a wide variety of topics, and poster sessions addressed even more.

The computer analysis of written and spoken Arabic language combines regional needs with cutting-edge computing research. One presentation addressed the processing of various Arabic dialects that differ substantially from the more widely studied Modern Standard Arabic. Other presentations described procedures for annotating text resources and covered the feature-based classification of Arabic documents. Even when processing tools exist in other languages, it is important to tune them for Arabic speakers. Presentations about such projects included a description of an animated robot and of educational tools for special-needs children.

Computational research can also directly benefit other research areas that require extensive scientific computation. One group of researchers used the graphics chips designed for computer games to perform powerful parallel computations of fluid flow. Two other talks highlighted opportunities in bioinformatics, an area of growing interest both internationally and at the Qatar Computing Research Institute. One presentation, recognized by the annual Qatar Foundation research award, described the work of teams of scientists that are dissecting the cellular profiles of proteins or metabolites to study obesity and diabetes. Another project uses the growing power of cloud computing, distributing computing tasks among multiple remote servers, to perform the computationally intensive job of genetic sequence alignment.

Technology is also providing new and challenging sources of data from video imaging. Talks on this topic described how computer processing can unravel warped images from a simple pipeline inspection system and how images from multiple cameras can be combined to reduce the amount of data transmitted from a network of cameras. Another presentation discussed the identification of copied versions of three-dimensional videos, while another described systems that would allow computerized vision systems to determine facial orientation or body position, providing input for computer controllers or assistive devices.

The increasing power of everyday devices is making it possible to imagine networks dynamically established between peers, rather than dependent on fixed infrastructure. In designing protocols for such networks, useful nodes must not be forced to bear undue burdens, or their users may quit participating. Nonetheless, the adaptable nature of a peer-to-peer network can compensate for the limitations of poor links, such as those in underwater acoustic networks for pipeline maintenance. Traditional wireless networks also need techniques to identify interference, and ways to allow lower-priority signals to share surplus bandwidth without degrading service for the primary users.

 

 

Speakers:
Mohamed Elmahdy, Qatar University
Behrang Mohit, Carnegie Mellon University in Qatar
Moutaz Saleh, Qatar University

Student speakers:
Amna Alzeyara, Carnegie Mellon University in Qatar
Sheikha Ali Karam, Qatar University

Highlights

  • Automated analysis of Arabic speech should address not only Modern Standard Arabic but also the various regional dialects, which are essentially different languages.
  • With proper training, undergraduates can reproducibly annotate Arabic text for vocabulary, meaning, and, to some degree, syntax.
  • An animated on-screen face should employ established principles of facial dynamics as well as facial motions that are specific to Arabic speech.
  • Customizable Arabic-language educational software can help children with special needs when trained instructors are unavailable.
  • Accurate classification of Arabic news documents can be achieved by categorizing them according to many isolated features.

Introduction

Computer processing of language, both in text and in speech, is an area of active international research that can provide huge benefits in productivity and information management. Most research, however, has centered on processing of English and other European languages. The Qatar Foundation and the Qatar Computing Research Institute have both recognized Arabic-language processing as an opportunity for high-quality computing research with tremendous regional relevance.

Although Arabic is very widely used, its more than 250 million speakers employ many dialects. These dialects differ enough from the formal Modern Standard Arabic (MSA) in pronunciation, vocabulary, and syntax that they are "usually considered as completely different languages," said Mohamed Elmahdy of Qatar University.

These differences pose a major challenge for practical machine processing of Arabic, such as speech recognition and speech-to-text translation. For example, Elmahdy explained that statistical machine translation algorithms, such as the commonly available translation tools for websites, typically use Bayesian classifiers. This approach compares large databases of words in two different languages in order to determine the most likely translations from one language to the other. But as Elmahdy pointed out, "to train these models, you need large, parallel annotated corpora" of speech in both languages. Such annotated data sets are not available for dialectal Arabic.
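The idea of learning likely word translations from parallel text can be illustrated with a small sketch. The Python below is not Elmahdy's system. It assumes the textbook IBM Model 1 expectation-maximization procedure, and the three-sentence corpus of transliterated Arabic tokens (with the article "al" split off as its own token) is a hypothetical toy.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(target_word, source_word)] from sentence pairs with EM."""
    target_vocab = {w for _, target in pairs for w in target}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words it appears with.
        for source, target in pairs:
            for f in target:
                norm = sum(t[(f, e)] for e in source)
                for e in source:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: (source tokens, target tokens).
corpus = [
    (["al", "bayt"], ["the", "house"]),
    (["al", "kitab"], ["the", "book"]),
    (["kitab", "kabir"], ["big", "book"]),
]
model = train_ibm_model1(corpus)
```

Even on three sentence pairs the procedure settles on sensible pairings, but only because every word recurs in a different context. With real vocabulary, that requirement is what makes large parallel corpora necessary.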

Adapted Egyptian colloquial Arabic (ECA) acoustic modeling, based on incorporation of the Modern Standard Arabic (MSA) acoustic modeling, results in a 41.8% reduction in word error rate (WER). (Image courtesy of Mohamed Elmahdy)

There is a large corpus for MSA, however, and Elmahdy and his colleagues hope to exploit it for processing dialectal Arabic. That's because most speakers of Arabic dialects also speak MSA, but with distinctive accents. "We assume that MSA is always a second language," he said, so MSA samples from many speakers should include pronunciation specific to all the dialects. Using this approach to adapt MSA acoustic models, they reduced the word error rate for Egyptian Arabic from 25% to about 7%. "Improvements are observed in both phonemic and graphemic modeling," he noted. Phonemes are the basic sounds that make up a spoken language, while graphemes are the elements of the written language.
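Word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The Python sketch below computes it with a standard edit-distance table. The function name and the sample strings are illustrative and do not come from Elmahdy's evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the hypothesis
            insertion = cur[j - 1] + 1  # extra word in the hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
rate = word_error_rate("a b c d", "a x c")
```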

Creating new annotated Arabic resources

"Natural language processing usually uses a lot of human-evaluated data for training, tuning, and evaluation," said Behrang Mohit of Carnegie Mellon University in Qatar. That's because machine-based algorithms miss many nuances of translation, unless they're tuned with sample texts that have been annotated by people. Although news reports are often used for this purpose, these "may not generalize well," Mohit said. To include data from diverse domains, he and his colleagues have been training students to identify the roles of various words in entries from Arabic Wikipedia, including seven articles each in history, science, sports, and technology.

The students were trained in three layers of annotation. The most straightforward annotation was flagging named entities. The second layer concerned semantics, which "requires a deep understanding of the language and the topic," Mohit said. In the third layer of annotation, "we want our annotators to highlight the syntactic structure of the sentence and the role of each word," he said. This effective sentence diagramming poses "a lot of challenges."

The researchers found that undergraduate students agreed surprisingly well in their annotations. The team also developed a process to allow their annotators to introduce new classification terms. "One asset we have in Qatar is that we have a multi-dialectical Arab society," Mohit said. "This is a great resource for us."
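Agreement between annotators is usually reported with a chance-corrected statistic. The report does not say which measure Mohit's team used, so the Python sketch below assumes Cohen's kappa for two annotators. The entity labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical named-entity tags from two annotators for four tokens.
annotator_1 = ["PER", "LOC", "O", "O"]
annotator_2 = ["PER", "O", "O", "O"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so raw percent agreement (75% in this example) overstates how well the annotators match.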

Animating Arabic

Some Qatar investigators have tackled the visual side of Arabic communication. Computer-generated human-like images that fall just short of perfection can be more unsettling to users than an obvious cartoon. To avoid this "uncanny valley," Amna Alzeyara of Carnegie Mellon University in Qatar said in her student presentation, "Expressions of the robot and lip movements must be consistent with the culture." This work won the Qatar Foundation award for best student research in computing sciences.

She and her colleagues have been adapting an animated on-screen robot called Hala for more natural Arabic communication. Since the Hala program was designed for English, Arabic phonemes had been modeled with English approximations. The team used a facial action coding system, like that used in computer-generated movies, to capture lip movements and other facial motions. They adapted existing research on elementary "visemes" of facial motion corresponding to 28 Arabic phonemes, modeled using underlying muscle primitives to generate natural-looking face dynamics.

"It's not perfect yet, but we are working on making her look perfect," Alzeyara said, in particular making the facial movements for entire sequences of words look realistic. "Stitching together visemes doesn't necessarily work," she said. "We need a viseme-to-viseme transition."

Applications of computation

There is a growing recognition that children with disabilities need special help, both in Qatar and around the world. Unfortunately, modern specialized institutions such as Qatar's Shafallah Center can only handle a small fraction of the need, due to a lack of trained staff and teachers, said Moutaz Saleh of Qatar University. To address that challenge, he and his colleagues are creating a series of Arabic-language tools to help special-needs students.

The system includes games and puzzles, and leads the students through a series of tasks involving math, science, religion, and daily-life skills. The tools also let parents personalize the content, for example by inserting photographs of the child in the appropriate places in the software.

One important task in automated natural-language processing is classifying documents into particular categories, for example to identify significant financial events. In her student talk, Sheikha Ali Karam of Qatar University described a system for classification based on a support vector machine. Such software tools are examples of "classifiers," which place each data point in one of two sets. They are trained on labeled examples during an initial "learning" period, after which their reliability should increase.

A method for classifying documents based on domain ontology. (Image courtesy of Sheikha Ali Karam and Ali Mohamed Jaoua)

Karam's system achieved 80% accuracy in classifying financial documents by determining which category is chosen most often when each category is compared one-on-one against each other category. In choosing categories, Karam and her colleagues found that this collective classification is most effective when the identifying features are isolated labels that do not overlap one another, rather than overlapping concepts that aim to precisely capture the desired idea.
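The one-on-one comparison scheme can be sketched in Python as a vote among pairwise classifiers. In Karam's system each pairwise decision would come from a trained support vector machine. The keyword-counting rule used here is a hypothetical stand-in, as are the category names and keyword lists.

```python
from collections import Counter
from itertools import combinations

# Hypothetical categories and identifying keywords (not from Karam's ontology).
KEYWORDS = {
    "merger": {"acquire", "merger"},
    "earnings": {"profit", "quarter"},
    "dividend": {"dividend", "payout"},
}


def make_pairwise(a, b):
    """Stand-in for a trained two-class SVM that separates category a from b."""
    def decide(document):
        words = set(document.lower().split())
        return a if len(words & KEYWORDS[a]) >= len(words & KEYWORDS[b]) else b
    return decide


def one_vs_one_predict(document, categories, pairwise):
    """Return the category that wins the most head-to-head comparisons."""
    votes = Counter()
    for a, b in combinations(categories, 2):
        votes[pairwise[(a, b)](document)] += 1
    return votes.most_common(1)[0][0]


categories = ["merger", "earnings", "dividend"]
pairwise = {(a, b): make_pairwise(a, b) for a, b in combinations(categories, 2)}
label = one_vs_one_predict("Quarter profit rose sharply", categories, pairwise)
```

With k categories the scheme needs k(k-1)/2 two-class decisions per document, which is how a method that only separates two sets can be extended to many categories.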

Speaker:
Ali Sheharyar, Texas A&M University at Qatar

Student speakers:
Rawan AlSaad, Qatar University
Halima Bensmail, Qatar Computing Research Institute

Highlights

  • The graphics processing units in gaming systems contain powerful parallel processors and high memory bandwidth that are useful for large-scale scientific calculations.
  • Analyzing large biological data sets can reveal groups of molecules that react similarly and that can be used as signatures for biological or genetic differences.
  • Cloud computing could be more flexible than traditional implementations for large computational problems such as DNA sequence alignment.

Simulating fluid flow on graphics processors

Massive computations are increasingly central to scientific research, both for simulating complex systems whose behavior cannot be calculated and for analyzing the ever-larger data sets coming from modern experiments. Extending the capabilities of scientific computing can therefore benefit both computing science and specific scientific areas, notably bioinformatics.

The graphics processing units (GPUs) used for computer games combine immense computational power and parallelism with high memory bandwidth, said Ali Sheharyar of Texas A&M University at Qatar. He and his colleagues have been applying these powerful devices to simulations of fluid flow. In particular, they have used GPUs from NVIDIA, whose 512 parallel cores can be programmed in a high-level style through the company's Compute Unified Device Architecture (CUDA) platform.

The team aims both to simulate and to visualize the motions of up to 100,000 particles, in order to build models of the flow patterns in microfluidic devices. These so-called "lab on a chip" devices move and mix experimental samples and reagents through a series of microscopic channels and wells, often conducting complex chemical and medical tests faster and more efficiently than traditional laboratory techniques. Improving these devices will require a better understanding of their complex fluid dynamics. Sheharyar's team has found that performing both the rendering and the simulation for the model on GPUs allows the researchers to modify the simulation rapidly as they learn more about its behavior.

The GPU was no faster than an Intel central processing unit for a smaller simulation of 1,000 particles, Sheharyar said, because of the time spent in transferring the data. "But when we increase the number of particles, we get a dramatic performance improvement, up to a factor of 18."
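The crossover Sheharyar described can be pictured with a simple cost model, sketched below in Python. The model assumes a fixed overhead for moving data to the GPU plus a per-particle cost on each processor. Every constant is an assumption chosen only to reproduce the shape of the reported result (no gain near 1,000 particles, a ceiling near 18); none is a measurement from the talk.

```python
def gpu_speedup(n_particles, cpu_cost, gpu_cost, fixed_overhead):
    """Ratio of CPU time to GPU time under a fixed-overhead cost model."""
    cpu_time = n_particles * cpu_cost
    gpu_time = fixed_overhead + n_particles * gpu_cost
    return cpu_time / gpu_time


# Hypothetical costs in arbitrary time units (assumptions, not measured values).
CPU_COST = 18.0
GPU_COST = 1.0
OVERHEAD = 17_000.0

small = gpu_speedup(1_000, CPU_COST, GPU_COST, OVERHEAD)
large = gpu_speedup(100_000, CPU_COST, GPU_COST, OVERHEAD)
```

In this model the speedup approaches the ratio of the per-particle costs as the particle count grows, because the fixed overhead is spread over more work.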

Computational tools for proteomics and metabolomics

Over the past decade, new experimental tools have begun to provide vast amounts of quantitative biological data. Analyzing this flood is a major challenge and an opportunity for researchers with skills that go beyond traditional biology. That analysis is one of the foci of the Qatar Computing Research Institute (QCRI). In a talk that received the Qatar Foundation award for best computing science research, Halima Bensmail of QCRI described two pilot projects focusing on the molecular basis of obesity and diabetes.

In one project, Bensmail and her colleagues are analyzing proteomic data, which comes from experiments using mass spectrometry, to identify the entire set of proteins in a tissue sample. They characterized human fat-cell cultures, one of which was chemically induced to resemble brown adipose tissue (BAT), and one of which resembled white adipose tissue. "BAT is good fat," Bensmail said, and it correlates inversely with body weight. White fat correlates directly with body weight. By comparing the two cell types, the researchers hope to identify—and perhaps find ways to modify—the molecular signatures that lead to a healthier fat profile.

The researchers used a novel algorithm to find 36 proteins that differ significantly in abundance between the two culture types. Some of the computational challenges, Bensmail said, are the very large data sets and the fact that the variations in the proteins' concentrations do not follow a normal distribution.

In a second project, Bensmail and her colleagues are analyzing measurements of chemicals produced by metabolism. She noted that genetic variants cause the levels of these chemicals to vary in different individuals. Determining the particular set of molecules that arise from a specific genetic variation is a complicated technical challenge, but the researchers have developed a promising preliminary method that produces not just a classification but a confidence estimate for it. They can now define the metabolic changes that are most likely to be associated with specific genetic variants.

Searching for sequences with cloud computing

Another computationally intensive task in modern biology is to search for occurrences of specific sequences of DNA bases in an entire genome. The comparison itself is not terribly complicated, but efficiently examining millions of possible locations is time-consuming. In a student presentation, Rawan AlSaad of Qatar University described a procedure for performing this computation using cloud computing resources.

AlSaad exploited the established MapReduce framework for tapping the power of the cloud. She adapted an existing sequence-alignment tool, known as BFAST, to this framework. Although creating this flexible tool was a lot of work, the resulting software should be easier to deploy and manage and easier to scale to larger calculations.

A framework for efficient sequence alignment using MapReduce on the cloud (Image courtesy of Rawan AlSaad)

Currently, AlSaad noted, the need to upload large amounts of data to the remote computers may limit the gains from the efficient calculation, although "the effort of moving such data into the cloud could be worth it." In the future, she said, "sequencing data will go straight into the cloud."
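The map-and-reduce pattern behind this kind of search can be illustrated with a toy example. The sketch below is plain Python, not BFAST and not a Hadoop job. It does exact string matching rather than true alignment, and all function names are hypothetical. The genome is split into overlapping chunks so that matches spanning a chunk boundary are not lost. Each chunk could in principle be handled by a separate worker.

```python
from collections import defaultdict


def split_genome(genome, chunk_size, overlap):
    """Yield (offset, chunk) pairs; chunks overlap so boundary matches survive."""
    for start in range(0, len(genome), chunk_size):
        yield start, genome[start:start + chunk_size + overlap]


def map_chunk(offset, chunk, queries):
    """Map step: emit (query, absolute_position) for every exact hit in a chunk."""
    for q in queries:
        pos = chunk.find(q)
        while pos != -1:
            yield q, offset + pos
            pos = chunk.find(q, pos + 1)


def reduce_hits(pairs):
    """Reduce step: group positions by query, dropping duplicates from overlaps."""
    grouped = defaultdict(set)
    for q, pos in pairs:
        grouped[q].add(pos)
    return {q: sorted(positions) for q, positions in grouped.items()}


def search(genome, queries, chunk_size=8):
    """Run the map step over every chunk, then reduce the combined output."""
    overlap = max(len(q) for q in queries) - 1
    pairs = []
    for offset, chunk in split_genome(genome, chunk_size, overlap):
        pairs.extend(map_chunk(offset, chunk, queries))
    return reduce_hits(pairs)


print(search("ACGTACGTTTACGA", ["ACG", "TT"], chunk_size=4))
```

Because the map step needs only its own chunk and the query list, adding workers scales the search without changing the logic. The cost of shipping the chunks to those workers is the upload bottleneck noted above.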

Speakers:
Othmane Bouhali, Texas A&M University at Qatar
Vinay Kolar, Carnegie Mellon University in Qatar
Mohamed Hefeeda, Qatar Computing Research Institute
Mansour Karkoub, Texas A&M University at Qatar

Highlights

  • Software engineers can compensate for the distortions of a simple imaging system to inspect oil and gas pipelines.
  • Stitching together the images from different video cameras could reduce their bandwidth demands on wireless networks.
  • Recording how digital signatures change from frame to frame can efficiently identify copies of videos.
  • Video analysis of facial or body motions could be a rapid and intuitive way to control wheelchairs or other devices for physically challenged people.

Inspecting pipelines

Digital cameras are becoming ubiquitous, but their potential is far from being fully realized. Many computing science projects aim to exploit the enormous computational power available in different systems to process, analyze, and extract value from video inputs.

Many of the interesting projects sponsored by the Qatar Foundation cross the boundaries between disciplines. Othmane Bouhali of Texas A&M University at Qatar described one such project that applies computer science principles to the inspection of pipelines. "Oil and gas companies spend millions of dollars on inspecting pipelines, using sophisticated, expensive equipment," he noted.

Bouhali and his colleagues are exploring ways to automate the processing of panoramic videos provided by catadioptric vision systems. In these systems, the optics are relatively simple, consisting of a convex mirror, ideally hyperbolic or conical, and a simple camera. The resulting image is severely distorted, but the researchers showed that much of this distortion can be corrected by pre-calibrating the optical system. "The most important thing is the unwarping," Bouhali said. Simulations show that these corrections could be applied in real time to images from a robot moving steadily along a pipeline, in order to identify potential defects. In future work, the researchers plan to build a prototype inspection system.
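The geometric core of unwarping can be sketched as a polar-to-rectangular resampling. The code below is a simplified illustration, not the researchers' method. It assumes the mirror center and the inner and outer radii of the useful ring are already known from calibration. It uses nearest-pixel sampling and ignores the nonlinear radial profile of a hyperbolic or conical mirror. The function name `unwarp` and its parameters are hypothetical.

```python
import math


def unwarp(image, center, r_min, r_max, out_height, out_width):
    """Map a ring-shaped mirror image to a rectangular panorama (nearest pixel)."""
    cy, cx = center
    rows, cols = len(image), len(image[0])
    panorama = []
    for i in range(out_height):
        # Radius grows linearly from r_min to r_max across the output rows.
        r = r_min + (r_max - r_min) * i / max(out_height - 1, 1)
        row = []
        for j in range(out_width):
            # Each output column corresponds to one viewing angle around the mirror.
            theta = 2.0 * math.pi * j / out_width
            y = min(max(round(cy + r * math.sin(theta)), 0), rows - 1)
            x = min(max(round(cx + r * math.cos(theta)), 0), cols - 1)
            row.append(image[y][x])
        panorama.append(row)
    return panorama
```

Because the source coordinates depend only on the calibration and the output size, they could be computed once and reused as a lookup table for every frame, which is one reason such a correction is plausible in real time.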

Aggregating, identifying, and deploying video

As digital video cameras have become cheaper and more widespread, combining and navigating their collective output is a critical challenge. In addition, the bandwidth requirements of video make it difficult to assure quality of service, especially over wireless links.

Some of the challenges, said Vinay Kolar of Carnegie Mellon University in Qatar, are designing an efficient network, ensuring that the cameras are looking at distinct views, and efficiently archiving the videos for later review or analysis. He and his colleagues are exploring methods to aggregate video streams at intermediate points in the network, rather than relaying the full streams to a central location. "Different cameras carry overlapping information," he said, so they propose that "intermediate routers stitch together images from different cameras."

To implement the stitching in their experimental system, the researchers employ background detection and feature-point extraction to get salient points of the videos to match different viewpoints. As long as the cameras remain fixed, the parameters need not be re-evaluated. Nonetheless, Kolar said, "stitching is computationally difficult, and we couldn't do it in real time."

Producers of video, especially 3D video, would like to know when someone is re-using their content to be sure they are receiving appropriate compensation. Mohamed Hefeeda of Qatar Computing Research Institute described a system for identifying copies of 3D videos.

Copied videos can be altered in many ways that make it difficult for simple algorithms to identify them at the bit level. For example, they can be changed in size or orientation or encoded in a different format. For 3D videos, there are additional possibilities such as generating new views or different 3D formats.

Hefeeda and his colleagues developed a way to generate robust electronic signatures for videos. On its own, he said, "depth is fragile," meaning it changes with simple manipulations, so the researchers' signatures combined depth with visual features from object textures. However, the number of descriptors in each frame and across many frames was too large, so the signatures record "only the number of descriptors in each frame, and their changes across frames." The system they implemented had perfect precision and recall for unaltered parts of videos, and was still quite accurate for videos that had undergone several transformations.

The Spider system can identify copies of 3D videos. (Image courtesy of Mohamed Hefeeda)
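The idea of recording only descriptor counts and their frame-to-frame changes can be sketched as follows. This is an illustration under stated assumptions, not the published system: the per-frame descriptor counts are taken as given, with no feature or depth extraction, and the function names and sliding-window comparison are hypothetical.

```python
def signature(descriptor_counts):
    """Return the per-frame counts together with their frame-to-frame changes."""
    deltas = [b - a for a, b in zip(descriptor_counts, descriptor_counts[1:])]
    return descriptor_counts, deltas


def best_match(query_counts, reference_counts):
    """Slide the query over the reference; return (offset, mean abs delta error)."""
    _, q = signature(query_counts)
    _, ref = signature(reference_counts)
    if not q or len(q) > len(ref):
        return None
    best = None
    for off in range(len(ref) - len(q) + 1):
        window = ref[off:off + len(q)]
        err = sum(abs(a - b) for a, b in zip(q, window)) / len(q)
        if best is None or err < best[1]:
            best = (off, err)
    return best
```

Comparing changes rather than raw counts means that a transformation which shifts every frame's count by the same amount still matches. A small error at some offset then suggests that the query is a copy of that part of the reference.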

Automated evaluation of body or facial gestures would be a huge boon for disabled people, allowing them to dispense with clumsy input devices such as joysticks. Mansour Karkoub of Texas A&M University at Qatar described two systems for analyzing videos of people, with the ultimate goal of providing input for controlling computer actions. The first system analyzes a video of a human face to determine its orientation, while the second determines limb motion with the help of colored tags attached to the body. "The motion is intuitive," Karkoub said, and the calculations can be completed in about 0.1 seconds, so they should be useful for controlling wheelchairs or other devices. "We can use this to help the physically challenged."

Speakers:
Abderrahmen Mtibaa, Carnegie Mellon University in Qatar
Bahattin Karakaya, Qatar University
Serhan Yarkan, Texas A&M University at Qatar

Student speaker:
Marwa Khalid Qaraqe, Texas A&M University at Qatar

Highlights

  • Networks can be dynamically assembled from cooperating mobile nodes, but the protocols must account for fairness, trust, and scalability.
  • A cooperative network can partially overcome the poor quality of underwater acoustic communication.
  • The statistical characteristics of radio-frequency signals can be combined with knowledge of propagation environments to identify interference sources.
  • Interfering sources can share radio spectrum, if lower-priority signals adapt their transmissions to avoid interfering with the primary network users.

Mobile opportunistic networks

The steadily increasing processing capabilities of consumer-grade electronic devices are creating new opportunities to use dynamically configured networks for communication and sensing. Allocating resources in such networks will require new algorithms and analysis techniques to ensure that they operate as expected and to evaluate their performance.

Traditionally, networks have carried traffic over fixed infrastructure; the wired telephone system is a prime example. Today, though, the increasing communication and processing power of individual mobile devices raises the possibility of ad-hoc networks that bypass the infrastructure. Such opportunistic networks can be made more efficient by providing locally relevant information, said Abderrahmen Mtibaa of Carnegie Mellon University in Qatar, and can continue working even when the traditional backbone is overloaded or shut down.

But users will need incentives to open up their devices to store, carry, and forward messages for strangers. "There are three main challenges," Mtibaa said: "fairness, trust, and scalability." Fairness is critical. To avoid overwhelming the most useful nodes, the protocol must ensure that traffic is widely shared, even at the cost of individual efficiency. There must also be procedures to establish trust, perhaps through distributed ratings in the network. Finally, networks need ways to find nearby nodes that provide a path to more distant subnetworks. Mtibaa and his colleagues are now trying to develop opportunistic networks that meet these goals, and they are simulating them using real-world interconnection data.

Optimizing radio frequency communication

Oil and gas pipelines, an important feature of Qatar's infrastructure, need frequent monitoring to ensure their security and integrity. For underwater pipelines, acoustic communications can carry pipeline monitoring information to stations onshore, said Bahattin Karakaya of Qatar University. But this transmission mode is often unreliable and limited in range and data rate, combining "the poor link quality of radio and the large latency [tens or hundreds of milliseconds] of satellite communication."

In this project, Karakaya and colleagues propose "cooperative communication" as an enabling technology to meet the challenging demands in underwater acoustic communication (UWAC). Specifically, they consider a multi-carrier and multi-relay UWAC system and investigate relay (partner) selection rules in a cooperation scenario. An extensive literature already exists on relay selection for terrestrial wireless RF systems. However, there are only sporadic results reported for UWAC applications. The team's simulations found that such "cooperative communication outperforms the direct link," Karakaya said, when they included both large-scale path loss and small-scale fading of the signals. The optimal assignment of carrier frequencies and relay nodes is "strikingly different from terrestrial [radio frequency] selection," Karakaya noted, and "relay placement is an important issue."

For everyday communications, interference from extraneous radio frequency sources is likely to be a growing problem as wireless networks become more widespread. The industrial, scientific, and medical (ISM) band at 2.4 GHz, for example, hosts microwave ovens as well as Wi-Fi computer networks, Bluetooth, cordless phones, and existing and planned cellular phone communication protocols. Reliably identifying sources of interference is critical to ensuring communication quality, said Serhan Yarkan of Texas A&M University at Qatar.

Handling interference requires identifying and locating its source, Yarkan said. "Identification is one of the most critical steps." He and his colleagues have proposed a method for identifying interference and its features, such as whether it comes in bursts or whether it hops from one frequency to another. The method accounts for the ways the environment can affect radio propagation, as well as for properties such as signal loss over long paths, signal "shadowing" by objects along the communication path, and local signal fading.

The researchers apply a special filter to identify signals on the basis of their statistical properties. The background in each environment determines a threshold above which a signal is declared to be interference. By accounting for the environment's effects on signal propagation, the team can determine the characteristics of the interference. "The interfering signal must also obey these propagation mechanisms," Yarkan said.
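The thresholding step can be sketched in a few lines. The talk did not specify the filter or how the threshold is derived, so the rule used here (background mean plus a multiple of the background standard deviation) is an assumption chosen for illustration, and all names are hypothetical.

```python
import statistics


def detection_threshold(background_power, k=3.0):
    """Threshold from background-only samples: mean plus k standard deviations."""
    mean = statistics.fmean(background_power)
    std = statistics.pstdev(background_power)
    return mean + k * std


def flag_interference(samples, threshold):
    """Indices of samples whose power exceeds the environment's threshold."""
    return [i for i, p in enumerate(samples) if p > threshold]


def is_bursty(flagged, max_gap=1):
    """True if the flagged indices form more than one run separated by quiet gaps."""
    runs = 1 if flagged else 0
    for a, b in zip(flagged, flagged[1:]):
        if b - a > max_gap:
            runs += 1
    return runs > 1
```

A noisier environment raises the threshold automatically, so the same signal might be flagged in a quiet setting and ignored in a cluttered one.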

In many cases, sources of interference with wireless signals cannot be eliminated, and must instead be managed. In a student presentation, Marwa Khalid Qaraqe of Texas A&M University at Qatar proposed two different strategies for transmitters to avoid competition for the same frequency, emphasizing either switching efficiency or bandwidth efficiency. Put another way, the transmitters can either move their signals to another frequency, or make them take up less space so they can share the frequency with other signals.

Both proposals envision a set of primary networks, which have priority in accessing the spectrum, coexisting with secondary networks that can use spectrum only if they do not interfere with the primary networks. This accommodation will often be possible, Qaraqe said, because currently the "spectrum is largely underutilized in most bands." The secondary networks adjust their transmission power, perhaps distributing it among multiple antennas, while checking that their own error rate is acceptable but that the primary network is not degraded. If no acceptable configuration is found, the secondary network will hold onto its data and wait before trying again.
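The adapt-or-wait logic for a secondary transmitter can be sketched as a search over power levels. This is a deliberately simplified model, not the proposed scheme. It assumes a single antenna and known channel gains, and it uses signal-to-noise ratio as a stand-in for the error-rate check. All names and parameters are hypothetical.

```python
def choose_power(levels, gain_to_primary, gain_to_receiver, noise,
                 interference_limit, min_snr):
    """Return the highest usable power level, or None to hold the data and wait."""
    for power in sorted(levels, reverse=True):
        # Interference the primary receiver would see at this power.
        interference = power * gain_to_primary
        # Link quality at the secondary network's own receiver.
        snr = power * gain_to_receiver / noise
        if interference <= interference_limit and snr >= min_snr:
            return power
    return None
```

Returning `None` corresponds to the case in which no acceptable configuration exists and the secondary network defers its transmission.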

Panel Members:
Sihem Amer-Yahia, Qatar Computing Research Institute
Christopher Ré, University of Wisconsin
Lew Tucker, Cisco Systems, Inc.
Stephan Vogel, Qatar Computing Research Institute

Highlights

  • Cloud computing, a new economic model for providing flexible, scalable access to computing resources, is changing the way companies manage information technology.
  • Statistical analysis can extract information in many contexts, but tools for it should be easily reconfigurable for different formats and goals.
  • Combining statistical analysis of large bodies of human-translated text and other sources with traditional grammar-based translation can improve machine translation.
  • Social networks offer an enormous amount of data, but there is no consensus on how to extract and analyze this information efficiently.

Cloud computing

Many of the challenges of computing sciences in the coming decades involve processing and analyzing the enormous amounts of data available in our increasingly digital world. The Qatar Computing Research Institute sponsored a session at which distinguished external and internal speakers addressed challenges of data acquisition and analysis that span the Institute's five main areas: Arabic language technologies, cloud computing, social computing, data analytics, and scientific computing.

Although there are many definitions of cloud computing, Lew Tucker, Vice President and Chief Technology Officer for Cloud Computing at Cisco, said it is "not just access to applications, but access to basic computing resources over the internet. That last part is new."

In the case shown above, the services organization provides a cloud infrastructure service, on top of which applications are deployed fully configured with their own operating system and configuration. (Image courtesy of Lew Tucker)

"We are at the beginning of a major shift," Tucker said, analogous to the early days of electrical power, when "every business had to have its own generation capabilities," and had to employ its own electrical engineers. With improved technology and distribution, companies could better focus on their real businesses, and a similar transition is happening in information technology. "Most businesses would rather not own and run their own data centers," he said.

"Of course it is overhyped," Tucker said. But "just because there wasn't some new invention," he added, "don't be fooled: this is real, fundamental change." Competition is turning computing into an inexpensive commodity, with a current price of about ten cents an hour per virtual machine. In addition, "you're only paying for what you use," Tucker stressed. Just as important, the computing resources can be scaled up much more rapidly than the hardware, for example during the potentially explosive early growth of a startup. Tucker sees that flexibility as a major advantage of cloud computing.

Designing software for this new environment requires new strategies, Tucker said, because cobbling together servers and clients over Internet connections means that "failures are a fact of life." The growing flood of data from unreliable sensors will make robust design even more important. "There's data everywhere," he said. "Whatever platforms have the most users, the most data, and the most developers are the ones that will win."

Statistical tools

"Today, data are available in an unprecedented number of formats," said Christopher Ré of the University of Wisconsin, but there is "a continuing arms race to more deeply understand the data." Ré and his colleagues have turned to statistical processing to address these issues. Their goal is not to build complex tools tuned to specific applications but to create building blocks that can be reused for a wide range of purposes. As Ré puts it, he wants to find "common themes in these individual, diverse-looking applications," and "find common abstractions to build systems that are easy to deploy and maintain over time." He added that "the next great breakthrough in data analysis isn't in any individual algorithm ... the key is to marry simple, robust, statistical tools with very simple abilities to filter and transform data, and to be able to scale to terabytes."

Ré described how his team's implementation of the formal mathematical framework of convex programming addressed two very different challenges. They quickly developed a program that performed well in the Text Analysis Conference Knowledge Base Population (TAC-KBP) challenge, a competitive test of language-processing algorithms. This showed that "simple tools, rapidly combined, can yield very, very high quality," Ré said. The researchers also collaborated successfully with detector physicists to identify candidate neutrino-detection events. Even though "physicists have a different repertoire than data miners," Ré noted, "statistical processing enables a huge number of applications."

Many ways to translate

"Social networks go back to the original village in a funny way," said Stephan Vogel of Qatar Computing Research Institute, adding that modern social networking sites tend to create closely linked communities of a few hundred people each. But unlike the village, people in social networks live in different environments and have different cultural backgrounds, history, and values, and they speak different languages. "Information exchange and communication in a multilingual society requires learning foreign languages, or hiring a translator or interpreter, or machine translation," Vogel said, adding that "actually we should use all of these."

Traditionally, translation software has been rule-based, adopting formal grammatical structures like those taught in schools. "In recent years, a different approach has developed, which learns from data, which is called statistical translation," Vogel noted. But rather than seeing these approaches as competing, he said, "I think they are two different dimensions." Indeed, the ideal translation software would combine deep versions of both approaches.

The statistical approaches, exemplified by Google's automatic translations, rely on training with sentences or documents that have previously been translated by humans. These data are used to extract a "translation model" for how one language maps into another. In addition, Vogel said, "we use a language model for what is typical structure in a given language: which sentences make sense, which don't." He added, "The goal is to translate sentences that we have not seen before."
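How a translation model and a language model combine can be shown with a toy scorer. The sketch below is not any production system. It translates word by word with no reordering or phrase tables, and the probabilities are invented for illustration. The function name `translate` and the table layouts are hypothetical.

```python
import itertools
import math


def translate(source_words, options, tm_prob, bigram_prob, unseen=1e-6):
    """Pick the word-by-word translation with the best combined model score."""
    best, best_score = None, -math.inf
    for choice in itertools.product(*(options[w] for w in source_words)):
        # Translation model: how well each target word explains its source word.
        tm = sum(math.log(tm_prob[(s, t)]) for s, t in zip(source_words, choice))
        # Language model: how plausible the target word sequence is.
        words = ("<s>",) + choice
        lm = sum(math.log(bigram_prob.get(pair, unseen))
                 for pair in zip(words, words[1:]))
        if tm + lm > best_score:
            best, best_score = list(choice), tm + lm
    return best


# Invented toy probabilities for a two-word Spanish phrase.
options = {"el": ["the"], "banco": ["bank", "bench"]}
tm_prob = {("el", "the"): 1.0, ("banco", "bank"): 0.5, ("banco", "bench"): 0.5}
bigram_prob = {("<s>", "the"): 0.5, ("the", "bank"): 0.2, ("the", "bench"): 0.05}

print(translate(["el", "banco"], options, tm_prob, bigram_prob))
```

In this example the translation model alone cannot choose between the two renderings of the ambiguous word, so the language model's preference for the more typical word sequence decides the outcome.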

"Sometimes you hear people say that to use statistical approaches you need a lot of data," Vogel said, citing a common figure of a million sentence pairs to develop a translation algorithm. "I think that is not a reasonable statement ... you do a little bit, you get a little bit. You do more, you get more," he said. The typical improvement is logarithmically related to the amount of data. "More data helps ... but also better modeling helps. If we want to improve our systems, it's not sufficient to only throw more data at it," he concluded.

Exploring the social web

"On the social web, there is a lot of data," said Sihem Amer-Yahia of the Qatar Computing Research Institute, referring to Facebook, Twitter, and other social websites. What do we have to do to understand these data? Most current techniques provide only "very coarse aggregates," Amer-Yahia noted, such as the average number of "stars" awarded by users. More useful summaries would recognize which reviews are more likely to be meaningful for a given user, she said, adding that "understanding data in the context of the social web has to be somewhat social; we need to be told more than just 'this many people liked this article.'"

Some sites are beginning to provide these fine-grained breakdowns of responses. But "one argument that goes against this is that these are predefined demographics," Amer-Yahia noted, such as age or gender. "Who tells me that these demographics are the most appropriate for this dataset?" she asked, querying whether these categories correlate well with other features of the responses. To address that question, Amer-Yahia and her colleagues have been exploring a more neutral, and potentially more powerful, approach: "to use structure that is either hidden in the data or provided by the data."

The MAQSA project aims to analyze user-generated content on news sites. (Image courtesy of Sihem Amer-Yahia)

By comparing the ratings of many different users, the researchers hope to find groups who consistently give similar ratings to items, or who consistently disagree in their ratings. Compared to collecting data on pre-defined groups, Amer-Yahia warned, "this is computationally very intensive." But if many users share a common network profile, "potentially they could be clustered into a single user pool for which we maintain a single index," she said, adding that "there's a lot of potential of using shared behavior at every single layer of the social data-management stack."
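A small example shows how such groups might be found. The sketch below is an illustration rather than the researchers' algorithm. It assumes ratings are stored per user as item-to-score dictionaries and uses Pearson correlation over co-rated items as the measure of agreement. The function names and the 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations


def rating_similarity(a, b):
    """Pearson correlation over items both users rated; None if undefined."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return None
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def agreeing_and_disagreeing(ratings, threshold=0.8):
    """Split user pairs into those who consistently agree and those who disagree."""
    agree, disagree = [], []
    for u, v in combinations(sorted(ratings), 2):
        s = rating_similarity(ratings[u], ratings[v])
        if s is None:
            continue
        if s >= threshold:
            agree.append((u, v))
        elif s <= -threshold:
            disagree.append((u, v))
    return agree, disagree
```

The loop visits every pair of users, so the work grows with the square of the number of users. That growth gives a sense of why discovering groups from the data is far more expensive than counting responses within predefined demographics.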

"We need a science of the social web," Amer-Yahia said. "We need to stop building little tools that do one or two things."

What new principles are needed to adapt systems for processing written and spoken English for use with Arabic?

Can metabolic profiles associated with metabolic disorders be manipulated to change health outcomes?

Will the capabilities of wireless video networks ultimately be limited by computation or bandwidth?

Will ad-hoc peer-to-peer networks have the power and robustness to replace fixed networks?

Will dedicated, company-owned computing become extinct in favor of cloud computing?

How can information mined from social networks be protected from intentional data manipulation?