20 Astronomical queries and their SQL

We developed a set of 20 queries that we think characterize the kinds of questions Astronomers are likely to ask the SDSS archives.

We are in the process of translating these queries into SQL statements and evaluating their performance on a relational system. Here follow the queries and a narrative description of how we believe they will be evaluated.

Q1: Find all galaxies without unsaturated pixels within 1 arcsecond of a given point in the sky (right ascension and declination).

This is a classic spatial lookup. We expect to have a quad-tree spherical triangle index with object type (star, galaxy, ) as the first key and then the spatial attributes. So this will be a lookup in that quad-tree. Selecting those galaxies that are within one arcsecond of the specified point.

Q2: Find all galaxies with blue surface brightness between and 30 and 40, and -10<super galactic latitude (sgb) <10, and declination less than zero.

This searches for all galaxies in a certain region of the sky with a specified brightness in the blue spectral band. The query uses a different coordinate system, which ismust first be converted to the hier-archical triangles of Figure 3 and section 3.5. It is then a set of disjoint table scans, each having a compound simple predicate representing the spatial boundary conditions and surface brightness test.

Q3: Find all galaxies brighter than magnitude 22, where the local extinction is >0.75.

The local extinction is a map of the sky telling how much dust is in that direction, and hence how much light is masked by that dust. The extinction grid is stored as a table with 1 square arcminute resolution - about half a billion cells. The query is either a spatial join of bright galaxies with the extinction grid table, or the extinction is stored as an attribute of each object so that this is just a scan of the galaxies in the Photo table.

Q4: Find galaxies with a surface brightness greater than 24 with a major axis 30"<d<1', in the red-band, and with an ellipticity>0.5.

Each of the 5 color bands of a galaxy will have been pre-processed into a bitmap and then the bitmap will be broken into 15 concentric rings. The rings are then divided into octants. The intensity of the light in each ring is analyzed and recorded as a 5x15 array. The array is stored as an object (SQL blob in our type impoverished case). The concentric rings are pre-processed to compute surface bright-ness, ellipticity, major axis, and other attributes. Conse-quently, this query is a scan of the galaxies with predicates on precomputed properties.

Q5: Find all galaxies with a deVaucouleours profile (r^< falloff of intensity on disk) and the photometric colors consis-tent with an elliptical galaxy.

The deVaucouleours profile information is precomputed from the concentric rings as dis-cussed in Q4. This query is a scan of galaxies in the Photo table with predicates on the intensity profile and color limits.

Q6: Find galaxies that are blended with a star, output the deblended magnitudes.

Preprocessing separates objects that overlap or are related (a binary star for example). This process is called deblending and produces a tree of objects; each with its own 'deblended' attributes such as color and intensity. The parent child rela-tionships are represented in SQL as foreign keys. The query is a join of the deblended galaxies in the photo table, with their siblings. If one of the siblings is a star, the galaxy's identity and magnitude is added to the answer set.

Q7: Provide a list of star-like objects that are 1% rare for the 5-color attributes.

This involves classification of the attribute set and then a scan to find objects with attributes close to that of a star that occur in rare categories.

Q8: Find all objects with spectra unclassified.

This is a sequential scan returning all objects with a certain precomputed flag set

Q9: Find quasars with a line width >2000 km/s and 2.5<redshift<2.7.

This is a sequential scan of quasars in the Spectra table with a predicate on the redshift and line width. The Spectra tale has about 1.5 million objects have a known spectra but there are only 100,00 known quasars..

Q10: Find galaxies with spectra that have an equivalent width in Ha >40A (Ha is the main hydrogen spectral line.)

This is a join of the galaxies in the Spectra table and their lines in the Lines table.

Q11: Find all elliptical galaxies with spectra that have an anomalous emission line.

This is a sequential scan of galax-ies (they are indexed) that have ellipticity above .7 (a pre-computed value) with emission lines that have been flagged as strange (again a precomputed value).

Q12: Create a grided count of galaxies with u-g>1 and r<21.5 over 60<declination<70, and 200<right ascen-sion<210, on a grid of 2', and create a map of masks over the same grid.

Scan the table for galaxies and group them in cells 2 arc-minutes on a side. Provide predicates for the color restrictions on u-g and r and to limit the search to the portion of the sky defined by the right ascension and declination conditions. Return the count of qualifying galaxies in each cell. Run another query with the same grouping, but with a predicate to include only objects such as satellites, planets, and airplanes that obscure the cell. The second query returns a list of cell coordinates that serve as a mask for the first query. The mask may be stored in a temporary table and joined with the first query.

Q13: Create a count of galaxies for each of the HTM trian-gles (hierarchal triangular mesh) which satisfy a certain color cut, like 0.7u-0.5g-0.2and i-magr<1.25 and r-mag<21.75, output it in a form adequate for visualization.

This query is a sequential scan of galaxies with predicates for the color magnitude. It group the results by a specified level in the HTM hierarchy (obtained by shifting the HTM key) and return a count of galaxies in each triangle together with the key of the triangle.

Q14: Provide a list of stars with multiple epoch measure-ments, which have light variations >0.1 magnitude.

Scan for stars that have a secondary object (observed at a different time) with a predicate for the light variations.

Q15: Provide a list of moving objects consistent with an as-teroid.

Objects are classified as moving and indeed have 5 successive observations from the 5 color bands. So this is a select of the form: select moving object where sqrt((deltax5-deltax1)2 + (deltay5-deltay1)2) < 2 arc seconds.

Q16: Find all star-like objects within DetaMagnitde of 0.2 of the colors of a quasar at 5.5<redshift<6.5.

Scan all objects with a predicate to identify star-like objects and another predicate to specify a region in color space within 'distance' 0.2 of the colors of the indicated quasar (the quasar colors are known).

Q17: Find binary stars where at least one of them has the colors of a white dwarf.

Scan the Photo table for stars with white dwarf colors that are a child of a binary star. Return a list of unique binary star identifiers.

Q18: Find all objects within 1' of one another other that have very similar colors: that is with the color ratios u-g, g-r, r-I are less than 0.05m. (Magnitudes are logarithms so these are ratios.)

This is a gravitational lens query. Scan for objects in the Photo table and compare them to all objects within one arcminute of the object. If the color ratios match, this is a candidate object. We may precomputed the five nearest neighbors of each object to speed queries like this.

Q19: Find quasars with a broad absorption line in their spectra and at least one galaxy within 10". Return both the quasars and the galaxies.

Scan for quasars with a predicate for a broad absorption line and use them in a spatial join with galaxies that are within 10 arc-seconds. The nearest neighbors may be precomputed which makes this a regular join.

Q20: For a galaxy in the BCG data set (brightest color gal-axy), in 160<right ascension<170, 25<declination<35, give a count of galaxies within 30" which have a photoz within 0.05 of the BCG.

First form the BCG (brightest galaxy in a cluster) table. Then scan for galaxies in clusters (the cluster is their parent object) with a predicate to limit the region of the sky. For each galaxy, test with a sub-query that no other galaxy in the same cluster is brighter. Then do a spatial join of this table with the galaxies to return the desired counts.

Peter Z. Kunszt, Alex S. Szalay, Ani R. Thakar, Jim Gray, Don Slutz
Last Modified: February 24, 2000.