Date: March 6, 2012

Title: Astroinformatics

Podcasters: Pauline Barmby

Organization: Western University Canada


Description: How do astronomers deal with the massive amounts of data that come from hundred-megapixel cameras? How do they find out about the properties of thousands of celestial objects at once? The answer is “astroinformatics”, the new science of astronomical data.

Bio: Pauline Barmby is an assistant professor in the Department of Physics & Astronomy at Western University in London, Ontario, Canada. She studies stars and star clusters in nearby galaxies, using any telescope she can find. When not doing astronomy or teaching, she can be found knitting, curling, listening to podcasts, and reading science fiction.

Sponsor: This episode of the “365 Days of Astronomy” podcast is sponsored by — NO ONE. We still need sponsors for many days in 2012, so please consider sponsoring a day or two. Just click on the “Donate” button on the lower left side of this webpage, or contact us at


Hello. My name is Pauline Barmby and I’m an assistant professor in the Department of Physics and Astronomy at Western University in London, Ontario, Canada.

Today, I want to tell you a little bit about “astroinformatics”. Astroinformatics is the sub-field of astrophysics that involves organizing, describing, classifying, visualizing, and mining astronomical data. It’s similar to bioinformatics in biology, which arose out of the need to deal with the large volume of data produced by molecular biology (e.g., genome sequencing).

While astrophysicists have often had to deal with large quantities of data, the idea of astroinformatics as a separate research area is relatively new. Like bioinformatics some years ago, astroinformatics is currently in something of a boundary area between astrophysics and computer science, and the methods of doing things are still being sorted out. The reason that astroinformatics has emerged as its own research area is that we are experiencing a major change in the kind of observational data that’s become available.

When I talk to people about what an astronomer does, I often find that they imagine me working every night, staring at the stars through an eyepiece attached to an enormous telescope. Astronomers haven’t worked this way for decades. Since large, public observatories became available, observational astrophysicists have competed for a few hours of telescope time per year, spending the rest of their research time analyzing data with sophisticated computer programs. Data in astrophysics are precious: they are obtained with expensive facilities that often have limited lifetimes, and they represent snapshots of the universe at only one particular moment in time. Astrophysics is an observational science where we can’t do controlled experiments, so the best way to get a statistically significant result is to start with a well-defined sample of objects, carefully controlled for selection biases, and then observe them in a uniform way. This tends to mean that we study a few objects at a time, very carefully.

In the past 15 years, though, a new model has arisen: the public sky survey. In this model, pioneered by the Sloan Digital Sky Survey and the Two Micron All Sky Survey (SDSS and 2MASS), a group of astrophysicists obtains a large quantity of data, develops the algorithms for processing it automatically, and releases the resulting database and/or processed data to the astronomical community. In contrast to the traditional methods, the survey model enables statistical studies involving thousands or even millions of celestial objects. Papers related to the SDSS were the most-cited in astrophysics for several years between 2000 and 2009, and many more public surveys have followed the SDSS and 2MASS. These have resulted in the discovery of some of the most distant quasars in the universe and some of the smallest, most distant objects in our own solar system. Even larger surveys will be coming online in the future: for example, the Large Synoptic Survey Telescope will scan the entire sky about once a week and make full-sky time-domain astrophysics possible for the first time. (For more information about the LSST, you can listen to the 365daysofastronomy podcasts for November 26, 2009 and June 17, 2011.) The Virtual Observatory is an international project designed to streamline and standardize the distribution of these future massive datasets. You can hear more about Virtual Observatory-related work at the Chandra X-ray Center on the 365daysofastronomy podcast for September 24, 2010.

The field of astroinformatics involves figuring out how to use these massive datasets to make discoveries in astronomy. This might involve combing through massive databases to find very rare objects, developing new ways to process data on variable stars, or figuring out what piece of the sky is visible in any image, taken with a professional telescope or not (check out to learn more about this last idea). In my own research work I’ve recently been using several online databases to compile the most complete data on the stars in nearby dwarf galaxies, to help me find the rare stars that are losing matter into the interstellar medium at the ends of their lives.

Hopefully I’ve helped you understand why professional astronomers might care about astroinformatics. But why would anyone else care? Well, if you are an amateur astronomer, you might be interested in knowing what databases are already out there for you to explore. There are basically two kinds of astronomical databases: those that contain observational data, such as the previously mentioned Sloan Digital Sky Survey or the Hubble Space Telescope Archive, and those that collect information (physical measurements) on individual objects. The two best-known examples of the latter are the NASA/IPAC Extragalactic Database, or NED, and the Set of Identifications, Measurements, and Bibliography for Astronomical Data, or SIMBAD. Links to these sites will be in the show transcript, and you can go to either one, type in a name, for example “Messier 104”, and get a huge amount of data about this well-known galaxy.

The other reason that you might be interested in astroinformatics is that we hope that the techniques that astronomers develop in dealing with massive datasets will be transferable to other fields. Astronomers have had a long tradition of applying their skills in the wider world, and training students in dealing with big datasets is one contribution we can make. For one example of a cool real-world application of big datasets, check out There are many others, including commercial data mining, security, and medical research.

From CCDs to lasers, astrophysics has always made use of the latest technologies, and astroinformatics is just the newest example. I expect you’ll be hearing more about this emerging field in the years to come. Thanks for joining me today.

End of podcast:

365 Days of Astronomy
The 365 Days of Astronomy Podcast is produced by the Astrosphere New Media Association. Audio post-production by Preston Gibson. Bandwidth donated by and wizzard media. Web design by Clockwork Active Media Systems. You may reproduce and distribute this audio for non-commercial purposes. Please consider supporting the podcast with a few dollars (or Euros!). Visit us on the web at or email us at Until tomorrow…goodbye.