Archive for the Sloan Digital Sky Survey
Our group is building the Science Archive for the
SDSS. The SDSS project will collect over 40 Terabytes of raw data, consisting
of 5-color digital images of the Northern sky. After processing, over 200
million objects will be identified in the images, and their well-calibrated
properties will be stored in a large digital archive, to be used for science.
The SDSS Science Archive is built on top of Objectivity DB, using a distributed,
scalable architecture. We designed and built our own spatial indices for locating
objects both on the sky and in multicolor space.
with Jim Gray of Microsoft
We have been collaborating over the last year on various
aspects of the SDSS Science Archive. Jim has been actively helping us with
our system design and with creating meaningful benchmarks. We have
also been thinking about how to build a next-generation system with over
1 GByte/sec I/O speed, using various scalable designs. The results of this
work are in a recent paper:
Designing and Mining Multi-Terabyte Astronomy Archives:
The Sloan Digital Sky Survey, by Alexander S. Szalay, Peter Kunszt,
Ani Thakar, Jim Gray. (Microsoft
Technical Report MS-TR-99-30).
for Education 2000 grant
we received substantial equipment support for research
in scalable server architectures, related to the Sloan Digital Sky Survey
Archive. Our goal is to build a distributed server with parallel I/O, running
at hundreds of MByte/sec.
Hierarchical Triangular Mesh
This is an advanced spatial index
over the sphere of the sky that does not have singularities at the poles.
It is the sky index used by the SDSS archive, now shared by several other
large astronomical archives (GSC2, GALEX, 2MASS, POSS2; planned for GAIA).
The scheme is based on a hierarchical subdivision of spherical triangles,
starting from an octahedron. The hierarchy is represented as a quadtree,
except for the top level. We create the tree dynamically, down to 14 levels,
corresponding to "pixels" of about 250 square arc sec area. We use a bit-interleaved
hash code describing the object's pixel (generated by our indexing scheme)
to create a 32-bit number. These numbers are then used in localized searches, like
cross-identifications. Given a set of constraints (ranges in various spherical
coordinates), we have a fast intersection algorithm that determines
which parts of the tree are inside the constraints.
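The subdivision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the archive's actual code: the octahedron vertex order, the root numbering (8 to 15), the child ordering, and the function names `htm_id` and `pixel_area_arcsec2` are all choices made for this example. It shows how each level appends two bits to the code, so that 14 levels below the 8 root triangles give a 32-bit number.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a unit vector on the sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def midpoint(a, b):
    """Midpoint of the great-circle arc between a and b, renormalized to the sphere."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)

def inside(p, tri, eps=1e-12):
    """True if p lies inside the counterclockwise spherical triangle tri."""
    a, b, c = tri
    return (dot(cross(a, b), p) >= -eps and
            dot(cross(b, c), p) >= -eps and
            dot(cross(c, a), p) >= -eps)

# Octahedron vertices and the 8 root triangles (numbering assumed for this sketch).
V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
ROOTS = [
    (8,  (V[1], V[5], V[2])), (9,  (V[2], V[5], V[3])),
    (10, (V[3], V[5], V[4])), (11, (V[4], V[5], V[1])),
    (12, (V[1], V[0], V[4])), (13, (V[4], V[0], V[3])),
    (14, (V[3], V[0], V[2])), (15, (V[2], V[0], V[1])),
]

def htm_id(ra_deg, dec_deg, depth=14):
    """Descend the triangle hierarchy, appending two bits per level."""
    p = unit_vector(ra_deg, dec_deg)
    for hid, tri in ROOTS:
        if inside(p, tri):
            break
    else:
        raise ValueError("point not found in any root triangle")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, child in enumerate(children):
            if inside(p, child):
                hid, tri = (hid << 2) | k, child
                break
        else:
            raise ValueError("point not found in any child triangle")
    return hid

def pixel_area_arcsec2(depth=14):
    """Mean triangle area in square arcseconds: whole sky / (8 * 4**depth)."""
    sky = 4 * math.pi * (math.degrees(1) * 3600) ** 2
    return sky / (8 * 4 ** depth)
```

Because the code of a triangle is a prefix of the codes of all its descendants, objects that are close on the sky tend to share leading bits. That is what makes such a number useful for localized searches. With 14 levels, `pixel_area_arcsec2` gives a mean area of roughly 250 square arcseconds, in line with the figure quoted above.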
Spatial Indexing for Multicolor
We have developed a similar
index to handle the stars and galaxies in multicolor space. It is based
on a k-d tree, but we also store the bounding box of each node, in a fashion similar
to the R-tree. Istvan Csabai has developed an elegant heuristic, which
avoids the emergence of `skinny' cells.
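A minimal sketch of a k-d tree that also keeps a tight bounding box per node is given below. It is a generic illustration, not the archive's index: the names `Node`, `build` and `range_query` are hypothetical, and the split rule used here (cut the widest side of the bounding box at the median) is a common textbook way to keep cells from becoming elongated, substituted for Csabai's heuristic, which is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    lo: Point                             # lower corner of the tight bounding box
    hi: Point                             # upper corner of the tight bounding box
    points: Optional[List[Point]] = None  # set only on leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: Sequence[Point], leaf_size: int = 8) -> Node:
    """Build the tree from a non-empty point list (leaf_size must be >= 1)."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))
    # Assumed split rule: cut the widest dimension of the box at the median.
    axis = max(dims, key=lambda d: hi[d] - lo[d])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(lo, hi,
                left=build(ordered[:mid], leaf_size),
                right=build(ordered[mid:], leaf_size))

def range_query(node: Node, qlo: Point, qhi: Point) -> List[Point]:
    """Return all points with qlo[d] <= p[d] <= qhi[d] in every dimension d."""
    dims = range(len(qlo))
    # Prune: the node's bounding box does not overlap the query box.
    if any(node.hi[d] < qlo[d] or node.lo[d] > qhi[d] for d in dims):
        return []
    if node.points is not None:
        return [p for p in node.points
                if all(qlo[d] <= p[d] <= qhi[d] for d in dims)]
    return range_query(node.left, qlo, qhi) + range_query(node.right, qlo, qhi)
```

Storing the tight box rather than only the splitting plane lets a query discard a subtree as soon as its box misses the query region, even when the k-d cell itself would still overlap it. For objects in a 5-dimensional color space, each point would simply be a 5-tuple of magnitudes or colors.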
Collaboration with Harvey Newman
and Julian Bunn, Caltech
We have submitted a joint proposal to the NSF KDI Initiative, called
"Large Distributed Archives in Particle Physics and Astronomy".
Particle physics has to deal with Petabytes of distributed data. We plan on merging our distributed,
autonomous query agent technology with the Caltech group's experience in
running distributed databases in a Wide Area Network environment. The collaborators
are: Alex Szalay, Michael Goodrich, Aihud Pevsner, Ethan Vishniac (JHU),
Harvey Newman, Julian Bunn, Chris Martin (Caltech) and Jim Gray (Microsoft).
The database system used by the
SDSS Science Archive is Objectivity/DB, an OODBMS. We have an ongoing collaborative
agreement with Objectivity. We presented several talks at the various Objectivity
National Virtual Observatory
In the next 5 years there will
be several large astronomical archives, which will contain a digital view
of the sky in a total of 13 different wavelengths. Each of these archives
will be several Terabytes in size, stored at different locations. Their
seamless integration, coupled with novel tools for visualization and discovery,
will result in qualitatively different science. A small group is currently
thinking about this concept and trying to promote the idea within NASA,
the NSF and the Astronomy and Astrophysics Committee of the NAS (Decadal
Committee). Our current group of advocates consists of Alex Szalay, Tom
Prince, Charles Alcock, Steve Strom, Bob Hanisch, George Lake and many
Digital Sky Project
There is an ongoing collaboration with the Digital Sky
Project at Caltech/IPAC.