Collaborative Research on Large Databases

Alex Szalay

The Science Archive for the Sloan Digital Sky Survey
     Our group is building the Science Archive for the SDSS. The SDSS project will collect over 40 Terabytes of raw data, consisting of five-color digital images of the Northern sky. After processing, over 200 million objects will be identified in the images, and their well-calibrated properties will be stored in a large digital archive for scientific use. The SDSS Science Archive is built on top of Objectivity/DB, using a distributed, scalable architecture. We designed and built our own spatial indices for locating objects both on the sky and in multicolor space.
Collaboration with Jim Gray of Microsoft
    We have been collaborating over the last year on various aspects of the SDSS Science Archive. Jim has been actively helping us with our system design and with creating meaningful benchmarks. We have also been thinking about how to build a next-generation system with over 1 GByte/sec I/O speed, using various scalable designs. The results are in a recent paper:

    Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey, by Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray (Microsoft Technical Report MS-TR-99-30).


Intel Technology for Education 2000 grant
    We received substantial equipment support for research in scalable server architectures, related to the Sloan Digital Sky Survey Archive. Our goal is to build a distributed server with parallel I/O, running at hundreds of MByte/sec.

Hierarchical Triangular Mesh

    This is an advanced spatial index over the sphere of the sky that has no singularities at the poles. It is the sky index used by the SDSS archive, and it is now shared by several other large astronomical archives (GSC2, GALEX, 2MASS, POSS2; planned for GAIA). The scheme is based on a hierarchical subdivision of spherical triangles, starting from an octahedron. The hierarchy is represented as a quadtree, except for the top level. We create the tree dynamically, down to 14 levels, corresponding to "pixels" of about 250 square arc sec area. We use a bit-interleaved hash code describing the object's pixel (generated by our indexing scheme) to create a 32-bit number. These numbers are then used in localized searches, such as cross-identifications. Given a set of constraints (ranges in various spherical coordinates), we have a fast intersection algorithm that determines which parts of the tree lie inside the constraints.
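
    The subdivision described above can be illustrated with a minimal Python sketch. It is not the SDSS implementation: the octahedron vertex ordering, the root triangle numbers (8 to 15), the child ordering, and all function names are assumptions chosen for illustration. The ID is formed here by appending two bits per level to the root number, which yields a 32-bit value at depth 14; the archive's actual encoding may differ in detail.

```python
import math

def _cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def _dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def _midpoint(a, b):
    # Midpoint of two unit vectors, pushed back onto the unit sphere.
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(s, s))
    return (s[0]/n, s[1]/n, s[2]/n)

def _inside(p, tri, eps=1e-12):
    # p lies in a counter-clockwise spherical triangle if it is on the
    # inner side of all three great-circle edges.
    a, b, c = tri
    return (_dot(_cross(a, b), p) >= -eps and
            _dot(_cross(b, c), p) >= -eps and
            _dot(_cross(c, a), p) >= -eps)

# Octahedron vertices and the eight root triangles (assumed numbering).
_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
_ROOTS = [
    (8,  (_V[1], _V[5], _V[2])), (9,  (_V[2], _V[5], _V[3])),
    (10, (_V[3], _V[5], _V[4])), (11, (_V[4], _V[5], _V[1])),
    (12, (_V[1], _V[0], _V[4])), (13, (_V[4], _V[0], _V[3])),
    (14, (_V[3], _V[0], _V[2])), (15, (_V[2], _V[0], _V[1])),
]

def radec_to_xyz(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec)*math.cos(ra), math.cos(dec)*math.sin(ra), math.sin(dec))

def htm_id(ra_deg, dec_deg, depth=14):
    """Return the ID of the depth-`depth` triangle containing the point."""
    p = radec_to_xyz(ra_deg, dec_deg)
    for hid, tri in _ROOTS:
        if _inside(p, tri):
            break
    else:
        raise ValueError("point not found in any root triangle")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        corners = ((v0, w2, w1), (v1, w0, w2), (v2, w1, w0))
        for k, child in enumerate(corners):
            if _inside(p, child):
                hid, tri = (hid << 2) | k, child
                break
        else:
            # Not in any corner triangle, so it is in the central one.
            hid, tri = (hid << 2) | 3, (w0, w1, w2)
    return hid

def pixel_area_arcsec2(depth):
    # Mean triangle area: whole sky divided among 8 * 4**depth triangles.
    sky = 4 * math.pi * (math.degrees(1) * 3600) ** 2
    return sky / (8 * 4 ** depth)
```

    Because each level appends two bits, the ID of a coarse triangle is a prefix of the IDs of all triangles inside it, so a localized search reduces to a range scan over IDs. With this sketch, pixel_area_arcsec2(14) evaluates to roughly 249 square arc sec, consistent with the figure quoted above.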
Spatial Indexing for Multicolor Fluxes
    We have developed a similar index to handle the stars and galaxies in multicolor space. It is based on a k-d tree, but we also store the bounding box of each node, in a fashion similar to the R-tree. Istvan Csabai has developed an elegant heuristic that avoids the emergence of "skinny" cells.
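
    A minimal Python sketch of a k-d tree whose nodes also carry tight bounding boxes is given below. It is an illustration under stated assumptions, not the archive's code: the names Node, build and range_query are hypothetical, and the split rule used here (cut the widest dimension of the bounding box at the median) is a simple stand-in for keeping cells from becoming elongated, not the heuristic developed by Istvan Csabai.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    lo: Point                            # lower corner of the tight bounding box
    hi: Point                            # upper corner of the tight bounding box
    points: Optional[List[Point]] = None # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: Sequence[Point], leaf_size: int = 8) -> Node:
    """Build a k-d tree over a non-empty point set (leaf_size >= 1)."""
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))
    # Stand-in split rule: cut the widest extent of the box at the median.
    axis = max(range(dims), key=lambda d: hi[d] - lo[d])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(lo, hi,
                left=build(ordered[:mid], leaf_size),
                right=build(ordered[mid:], leaf_size))

def range_query(node: Node, qlo: Point, qhi: Point) -> List[Point]:
    """Return all points p with qlo[d] <= p[d] <= qhi[d] in every dimension."""
    # Prune the whole subtree if its bounding box misses the query box.
    if any(h < ql or l > qh for l, h, ql, qh in zip(node.lo, node.hi, qlo, qhi)):
        return []
    if node.points is not None:
        return [p for p in node.points
                if all(ql <= x <= qh for x, ql, qh in zip(p, qlo, qhi))]
    return range_query(node.left, qlo, qhi) + range_query(node.right, qlo, qhi)
```

    Storing the tight box at each node lets a query discard a subtree even when the k-d splitting planes alone would not exclude it, which matters when the data occupy only a small part of the five-dimensional flux space.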
Collaboration with Harvey Newman and Julian Bunn, Caltech
    We have submitted a joint proposal to the NSF KDI Initiative, called "Large Distributed Archives in Particle Physics and Astronomy". Particle physics has to deal with Petabytes of distributed data. We plan to merge our distributed, autonomous query agent technology with the Caltech group's experience in running distributed databases in a Wide Area Network environment. The collaborators are: Alex Szalay, Michael Goodrich, Aihud Pevsner, Ethan Vishniac (JHU), Harvey Newman, Julian Bunn, Chris Martin (Caltech) and Jim Gray (Microsoft).
Objectivity/DB
    The database system used by the SDSS Science Archive is Objectivity/DB, an OODBMS. We have an ongoing collaborative agreement with Objectivity, and we have presented several talks at the Objectivity Worldview Conferences.
National Virtual Observatory (NVO)
    In the next five years there will be several large astronomical archives, which together will contain a digital view of the sky in a total of 13 different wavelengths. Each of these archives will be several Terabytes in size, and they will be stored at different locations. Their seamless integration, coupled with novel tools for visualization and discovery, will result in qualitatively different science. A small group is currently thinking about this concept and trying to promote the idea within NASA, the NSF and the Astronomy and Astrophysics Committee of the NAS (Decadal Committee). Our current group of advocates consists of Alex Szalay, Tom Prince, Charles Alcock, Steve Strom, Bob Hanisch, George Lake and many others.
GALEX Project
 
Digital Sky Project
    There is an ongoing collaboration with the Digital Sky Project at Caltech/IPAC.
Alex Szalay: Department of Physics & Astronomy, The Johns Hopkins University
Last Modified: June 15, 1999.