SkyServer: SDSS Data Management


The Johns Hopkins University (JHU) has managed the catalog data for the Sloan Digital Sky Survey (SDSS) since the survey began. All measured parameters for every star and galaxy seen by the survey are stored in an archive designed and maintained by scientists and programmers at JHU.

The Science Archive

In the beginning, the archive was known as The Science Archive (abbreviated as "SX"). During this phase, which lasted from 1997 through 2003, SDSS catalog data was stored in a commercial object-oriented database management system (ODBMS). The JHU team developed a layer of middleware that provided web interfaces and SQL-based access to the database, and also optimized the queries submitted by users. The team also built a query engine called the SX Server, which handled multiple user queries in parallel and executed them on the ODBMS data. This system was used for the SDSS's Early Data Release (EDR) and Data Release 1 (DR1) datasets.

SkyServer's Image List tool shows galaxies that match a user query

CAS and SkyServer

By 2003, however, performance and technical problems with the ODBMS platform meant that the Science Archive could no longer keep up with the growing demands of the SDSS data. That year, the SDSS Collaboration decided on a new approach: storing its data in a commercial relational database management system (RDBMS), Microsoft's SQL Server.

Since 2000, the SDSS Data Management team at JHU had been working with ACM Turing Award winner Jim Gray of Microsoft Research. By the time the overall SDSS collaboration decided to make the switch, the team had already designed and built a SQL Server-based data archive for SDSS Early Data Release (EDR) data. The data archive that JHU built became known as the Catalog Archive Server (CAS), and the primary web interface to the archive became known as SkyServer. Today, the two terms are often used interchangeably.

A few of the nearly one million spectra available through SkyServer

Technical Articles

The concept and design of the CAS archive and the SkyServer web interface are described in two Microsoft Technical Reports:

The usage and traffic data analysis for the first 5 years (2001-2006) of the SkyServer is described in another Microsoft Technical Report from 2006:

We are currently preparing a more extensive analysis of the SkyServer traffic from the first 10 years (2001-2011) for publication.

The migration of the SDSS data from the ODBMS to the RDBMS is described in a 2003 article in Computing in Science and Engineering (CiSE), a journal jointly published by the IEEE Computer Society and the American Institute of Physics.

The design of the CAS and its components is documented in a series of articles in a special 2008 issue of CiSE dedicated to the SDSS (I and II) Science Archive, including:

There is also an article on lessons learned from the SDSS CAS in a later issue of CiSE that same year.

Ani Thakar
Last Modified: August 3, 2011