Professional Documents
Culture Documents
A Data Tsunami
The LSST is currently under construction near La
Serena, Chile. When completed, it will share a ridge
on Cerro Pachon with the Gemini South and Southern
Astrophysical Research telescopes. Its chief funders are
the U.S. National Science Foundation (for its facility and
telescope), the U.S. Department of Energy (its camera),
and various philanthropists (its mirror). This year alone
LSST will consume about $140 million, with an esti
mated $1 billion over its lifetime.
The LSST will have three components: an 8-meter
How will astronomers cope with wide-field telescope, a 3.2-billion-pixel camera, and an
automated data-processing system. The scopes field of
the tsunamis of raw data soon to view will be much larger than that of any other 8-meter
telescope: 3.5 degrees, or about seven times the diame
pour in from wide-field surveys? ter of the full Moon. Every 40 seconds or so the telescope
will point to a new area of the sky, and its camera will
Not long ago Henry Trae Winter, an astrophysicist at take two 15-second exposures (to efficiently reject cosmic
the Harvard-Smithsonian Center for Astrophysics, gave rays). The camera will record using six different visual
a talk entitled Big Data to Big Art. Winter works on and near-infrared filters, and over its anticipated 10-year
NASAs Solar Dynamics Observatory mission, and he run it will capture some 40 billion objects in an unprec
wanted to give his audience an idea of just how much edentedly large volume of the universe.
image data the SDOs telescope generated each day. The Besides being wide, fast, and deep, LSST will add
scopes four cameras take pictures of our star every 12 the time domain. After those 10 years, this will result
seconds in eight different wavelengths, which results in in a sort of stop-motion movie of much of the celestial
3 terabytes worth of images each day. sphere. All told, the camera will image about 10,000
How much is that exactly? Winter made an analogy square degrees of sky every three clear nights. LSST will
with Blu-ray discs. A standard, dual-layer Blu-ray holds not stop for follow-up observations but will simply keep
about 50 gigabytes of data, so in one day SDO generates harvesting night after night.
60 Blu-rays worth of data. The mission has been run LSST leaders expect astronomers to use the output
ning for six years 131,400 Blu-rays and the team from this technological juggernaut to explore many
hopes to observe the Suns entire 11-year cycle. Imagine key science areas. Four in particular will be counting
trying to watch all those discs, deleted scenes and all. asteroids and other moving objects in our solar system,
The same thing is happening all over astronomy. And mapping the structure and evolution of the Milky Way,
todays big data pales next to whats coming, both with exploring transient phenomena such as supernovae and
surveys already underway, such as the Panoramic Survey other variable stars, and constraining the nature of dark
Telescope and Rapid Response System (Pan-STARRS) energy and dark matter. But the science will be up to the
and with those in development, including the Large Syn scientific community. LSST will simply collect the data
optic Survey Telescope (LSST) and the Square Kilometer a continuous tsunami of it.
Array. These and other visual and radio telescopes will The project expects to accumulate 15 terabytes of raw
blow SDO out of the sky in terms of data output. data each night five times what SDO garners daily.
To explore what big data means in the astronomy con Fifteen terabytes a night is a lot, but within a few years,
text, and to look at the challenges it presents, lets take a the cumulative data will become petabytes in size. (A
close look at the principal U.S. project being developed petabyte is 1,000 terabytes; see the table above right.)
in ground-based astronomy for the 2020s: the LSST. Thats a scale were not typically used to, says Andrew
V
1 Kilobyte (KB) 1,000 bytes (103)
S k y a n d T e le s c o p e .c o m September 2016 15
i" BW1 w v
N A S A S D O / A IA / H E N R Y " T R A E W IN T E R (C F A )
example is the Sloan Digital Sky Survey. Its 2.5-meter Every science agency around the world, every obser
telescope in New Mexico conducted a thorough visible- vatory manager, saw this and said, 5,000 papers for only
light survey of one-third of the observable sky. All told, $100 million? says astronomer Eric Feigelson (Penn
SDSS recorded position and brightness for a billion sylvania State University), citing the estimated cost of
stars, galaxies, and quasars, along with spectra of a mil SDSS over about 10 years. It was probably the cheapest
lion objects. And that was just SDSSs first phase; follow science-per-dollar ever achieved. And so everyone said,
up survey work has continued ever since. Lets do it, too.
There had been other wide-held sky surveys most Today an alphabet soup of wide-held surveys is either
notably, the photographic Palomar Sky Surveys dur already or soon to be in operation (some of the largest are
ing the 1950s and 1980s but SDSS pulled astronomy shown in the diagram above right). There are also big
straight into the big-data era. The project resulted in a gish-data instruments planned for space. Traditionally,
spectacular amount of groundbreaking science, much of NASA hasnt designed its missions to transmit telemetry
it unforeseen by the ventures designers. Among other in megabytes per second, so previously any heavy data-
findings, SDSS enabled major discoveries regarding processing was done aboard Kepler, Chandra, and other
active galactic nuclei, the substructure of the Milky Way, space-based instruments. But besides the SDO, well
and baryon acoustic oscillations (S&T: Apr. 2016, p. 22). soon have NASAs Wide Field Infrared Survey Telescope
Since routine operations began in 2000, SDSS data have (WFIRST) and the European Space Agencys Euclid.
been used in more than 5,800 peer-reviewed publica Both are wide-held-survey space telescopes designed to
tions in astronomy and other sciences, and those papers help answer questions about dark energy, exoplanets,
have been cited nearly 250,000 times. and other hot topics in astronomy and astrophysics.
|
500,000
But LSST leaders are confident that such advance
efforts will not present an obstacle, nor will storage. As
the project explains on its website, While LSST is mak
ing a novel use of advanced information technology, it is
not taking the risk of pushing the expected technology
to the limit. LSST will have two redundant, 40-gigabyte-
per-second optical-fiber links from La Serena to Urbana.
100,000
Such dedicated long-haul networks mean that even SDSS S kyM apper
40 TB 500 TB
transferring that amount of information is not expected 10,000
S&T: GREGG DINDERMAN, SOURCE: Y. ZHANG S, Y ZHAO / DATA SCIENCE JOURNAL 2015
A E R IA L E D IF IC E S Below: The Large Synoptic Survey Telescope under construction in Chile, March 2016. Above: New Yorks 1,776-foot
Freedom Tower is dw arfed by the stacks o f Blu-ray discs, each about 0.05 inch (1.2 mm) thick, th a t would be needed to store the to ta l
data volum e fro m th e three largest w ide-field survey projects. Note the com paratively m inuscule stack required fo r the SDSS data set.
Algorithm as Instrument
For each object that ends up in the LSST catalog, the team
will typically measure tens of parameters: its position,
shape, brightness, color, and so on. Over its planned
10-year run, LSST will make about 1,000 observations
of every object, giving astronomers information about
stars, galaxies, and other entities as a function of time
that stop-motion movie. So, in addition to astronomi
cal amounts of data, the phase space that the data occupy
will have thousands of dimensions. How are astronomers
expected to deal with that? One word: software.
In the old days, astronomers went to the observatory
to make discoveries. Now more often they go to the data
base in fact, to extremely large databases, or XLDBs. N O C T U R N A L EYE An a r tis ts v ie w o f th e Large S y n o p tic S ur
The XLDBs arising from LSST will be trillions of lines vey T elescope a g a in st a s im u la te d b a ckg ro u n d o f stars. T he LSST
long, Kahn says. Searching them in a linear way, row w ill g a th e r 15 te ra b yte s o f raw data every n ig h t fo r a decade.
S k y a n d T e le s c o p e .c o m S e p te m b e r 2016 !9
| Cosm ic Flood
standing of statistics will be crucial to sifting treasures where they fly me out to give tutorials in astrostatistics.
from the information overload. As Feigelson and his He and colleagues have given instruction to about 10%
longtime Penn State collaborator, statistician Jogesh of the worlds astronomers, he estimates. Which means
Babu, write in one of their papers, Scientific insights weve had some inroads, but not enough to really mod
simply cannot be extracted from massive data sets with ernize the methodology used by the entire field.
out statistical analysis. All the issues raised here really come down to one
Astronomers will just have to get comfortable with question: how well prepared will astronomers be when
statistical techniques such as advanced regression and the goods begin gushing forth from the LSST, the
Bayesian inference and multivariate classification. The Square Kilometer Array, and other such projects? You
same goes for tools in informatics. All the experts inter dont fail by not being ready, Kahn says. You can still
viewed for this article agreed that collaborations among do something. The question is, are you fully exploiting
astronomers, statisticians, and information scientists the data? Thats really the challenge.
must be greatly expanded. Some astronomers are on top Incidentally, what is 200 petabytes of data Kahns
of this, such as those associated with the Department of estimate for the entire LSST data set, raw and processed,
Energys SLAC National Accelerator Laboratory, which is after 10 years in stacked Blu-rays? Its about 15,750
building the LSSTs camera. But many others are not. feet, or roughly the height of Mont Blanc, the highest
To that end, Feigelson spends a lot of time training mountain in the Alps.
astronomers in astrostatistics. Ive been in an airplane
maybe 20 times in the last two or three years, he says, Peter Tyson is editor in chief of Sky & Telescope.