Data as Design
Using The New York Times as a data source, Mark Hansen creates a unique exhibit at the intersection of algorithms and art.
Mark Hansen, professor of statistics at the University of California, Los Angeles, is the co-creator of an exhibit that hundreds of people have passed every day for the past five years. He speaks with The New York Academy of Sciences Magazine about Moveable Type, a fixture in the lobby of The New York Times Building in New York City since 2007.
The exhibit is made up of 560 vacuum-fluorescent screens that display phrases, words, and numbers from the newspaper, the Times archives, and the activity of visitors on NYTimes.com. The content presented on the screens is governed by statistical methods and carefully programmed natural-language processing algorithms, organized into a dozen or so "scenes," each presenting data on a different theme (quotes from the day's paper or questions posed in recent articles) or from a different section of the paper (weddings, obituaries, crosswords, and soon, recipes).
NYAS: How did the Moveable Type project get started? What was the inspiration for it?
Hansen: It was an outgrowth of a previous work. My collaborator, artist Ben Rubin, and I had a piece exhibited in the Whitney Museum of American Art that pulled text from web-based chat rooms and bulletin boards; a group from The New York Times saw it and thought it would be interesting to do something similar in the lobby of their new building.
NYAS: How is the screen content generated?
Hansen: We get feeds of articles, blog posts, and user comments, which are automatically passed through statistical natural language processing algorithms. We parse each sentence and create a tree that represents its grammatical structure. Our scenes are really just filters on these trees. In addition to all this language, we also get a feed of the web access logs from NYTimes.com, as well as a sample of their search engine logs.
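Editor's illustration: the idea of scenes as "filters on these trees" can be sketched in a few lines of Python. This is a minimal sketch, not the installation's actual code. The nested-tuple tree format, the Penn Treebank-style labels (S, NP, SQ, SBARQ), the function names, and the two hand-built sample trees are all assumptions made for illustration; in a real pipeline a statistical parser would supply the trees.

```python
# A parse tree is a (label, child, child, ...) tuple; a leaf is a plain string.
# Labels follow Penn Treebank conventions (S = statement, SQ/SBARQ = question).
# Both trees below are hand-built stand-ins for real parser output.

def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def subtrees(tree, label):
    """Yield every subtree (including nested ones) carrying the given label."""
    if isinstance(tree, str):
        return
    if tree[0] == label:
        yield tree
    for child in tree[1:]:
        yield from subtrees(child, label)


def question_scene(trees):
    """Hypothetical scene: keep only sentences whose root marks a question."""
    return [" ".join(leaves(t)) for t in trees if t[0] in ("SQ", "SBARQ")]


def noun_phrase_scene(trees):
    """Hypothetical scene: pull every noun phrase out of every sentence."""
    return [" ".join(leaves(np)) for t in trees for np in subtrees(t, "NP")]


statement = ("S",
             ("NP", ("DT", "The"), ("NN", "exhibit")),
             ("VP", ("VBZ", "fills"),
              ("NP", ("DT", "the"), ("NN", "lobby"))))
question = ("SBARQ",
            ("WHNP", ("WP", "Who")),
            ("SQ", ("VP", ("VBD", "built"),
                    ("NP", ("DT", "the"), ("NN", "wall")))))

feed = [statement, question]
print(question_scene(feed))
print(noun_phrase_scene(feed))
```

Each scene here is just a function from a list of trees to a list of display strings, so adding a scene means adding one more filter rather than changing the parsing step.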
Setting up the software that fetches the data and parses it was pretty quick. In an age of APIs [application programming interfaces], this kind of work is painless. But the process of figuring out what to compute from the text and how the results of those computations should move across the displays in the lobby, that was a long, interactive process. We started working on this in the early 2000s in Ben's studio with a test wall of six screens. Over time, we built up a grid of about 50 screens, which let us experiment with presentation ideas, at least in a limited sense. We had to rethink much of this work, however, when the full grid of 560 screens was installed in the lobby at the Times. What looked interesting in the studio could fall flat in a busy midtown lobby. We spent about three months in the Times lobby doing programming for Moveable Type, looking at pacing for the work and seeing how people reacted to it.
Depending on what words or phrases we're displaying, sometimes we want large, bold text, sometimes we want small, tight text. By adjusting the text's size and how it moves down the lobby, we can control to some extent how the public interacts with the work. The crossword puzzle scene, for example, starts with a graphical representation of the crosswords (black and white squares) from the last 80 weeks. The puzzles are arranged so that all the Monday puzzles are in one row, all the Tuesdays in the next and so on. You can see the puzzles getting harder as the week goes on! After a few moments with the graphical representation of the puzzles, the squares fade out, and the puzzles start to play themselves, with answers appearing to the sound of a pencil on paper—560 puzzles playing at one time. This scene is a pretty direct representation of the data, but it illustrates how moving text and sound are used in the piece.
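Editor's illustration: the crossword layout described above comes down to simple arithmetic, since 80 weeks of daily puzzles is 80 × 7 = 560 puzzles, one per screen. The sketch below assumes a grid of 7 rows (Monday through Sunday) by 80 columns (one per week); the function name, the grid shape, and the start date are hypothetical and not taken from the installation itself.

```python
from datetime import date

WEEKS = 80          # weeks of puzzles shown in the scene
DAYS_PER_WEEK = 7   # one row per weekday, Monday first


def grid_position(puzzle_date, first_monday):
    """Map a puzzle's date to (row, column) in the assumed 7 x 80 grid.

    row 0 is Monday and row 6 is Sunday; column 0 is the oldest week.
    """
    offset = (puzzle_date - first_monday).days
    if not 0 <= offset < WEEKS * DAYS_PER_WEEK:
        raise ValueError("puzzle date falls outside the 80-week window")
    column, row = divmod(offset, DAYS_PER_WEEK)
    return row, column


# Arbitrary example start date; 3 January 2011 was a Monday.
start = date(2011, 1, 3)
print(grid_position(date(2011, 1, 3), start))   # first Monday
print(grid_position(date(2011, 1, 9), start))   # first Sunday
print(grid_position(date(2011, 1, 10), start))  # second Monday
```

With every Monday in row 0 and every Sunday in row 6, the puzzles for one weekday line up along a single row, which is what makes the week's pattern visible at a glance.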
The dozen or so different scenes run on a cycle. Some are short, some are long, and the frequency varies too (some run once an hour, some run three times an hour). Oh, and we're still adding new themes. The Times has been very generous in that way.
NYAS: Can seemingly random data, when presented in a thoughtful way, tell us something about ourselves?
Hansen: All data come from somewhere; it's an act of recording something about the physical or virtual world or even a state of mind. Whether it's air quality or your blood pressure or mood over time, all data have context. Often, the public is conditioned to believe that data and data interpretation belong to someone else, some authority with specialized expertise like a statistician or a scientist. This has been the case for centuries, with data providing us surprising, often shocking views of our world. So much data collection now, however, is happening in the personal realm, on mobile phones, say, and these data are finding their way into research on health, urban planning, and social science. But the public knows quite a lot about the context around these data and can (and should!) have a role in interpreting them. The scale of modern data is creating new relationships around its collection and analysis. It's an exciting time to be a statistician.
Artworks like Moveable Type and similar pieces (there is so much good work out there) are another signal that our relationship to data is changing. In addition to applications in science, data and data processing are supporting a raft of creative practices. Artists and designers are making beautiful data visualizations.
In Moveable Type, as with some—maybe all—data visualization, we're balancing two components: recognition and surprise. People might recognize that it is content from the Times, but the pieces might be juxtaposed in a way that's new. They might see a new or surprising pattern. If it's all recognition, people won't look because they would just be reading the paper in a busy lobby. If it's all surprise, people won't connect to it.
NYAS: With the enormous amount of data being generated today, is it easier or harder to see trends and draw inferences about our collective consciousness?
Hansen: I think we've always felt overwhelmed by data, though granted, the absolute scale of data in the past was vastly different from what it is now. What also seems different now is our access to data. With organizations believing that data publication is a form of transparency, and with the open data movement, the public is being invited to analyze information in a new way. This might make it harder for us to make sense of the world. With each new data set we have to judge the motivations and incentives of the data creators, and now there are more of them.
In fact, there's a new set of skills that constitutes an expanded view of quantitative reasoning. Virtual data have real effects in the physical world, and understanding how that works and how we can participate has become almost an exercise of citizenship. For academics like myself, it means we have to do a better job of teaching people to "read" data and its effects.
NYAS: Why do you choose to work at the intersection of data and art?
Hansen: I've learned so much about my own practice by doing these projects. Some of the artists I've worked with are better statisticians than I am! Through these projects, I've also become a better teacher—I've learned so much about how to effectively present data and concepts around data.
NYAS: What are some of the most pressing issues in data collection and analysis that we might hear about over the next few years?
Hansen: I think some of the most important developments will be around interactions with the public. I've already alluded to the ways in which data are refiguring certain social relationships, for example. I also think that you will see a major shift in the academy. Data will be seen as deserving a scientific discipline of its own, a field that combines the expertise of statisticians, computer scientists, and mathematicians, as well as practices from some unexpected places like design, architecture, urban planning, and the humanities. So many university disciplines are starting to grapple with data, and as they do, are creating new and interesting approaches to data collection and analysis. I predict we'll see departments of data science on university campuses, programs that integrate these experiences in one place.