We developed a set of 20 queries that we think characterize the kinds of questions Astronomers are likely to ask the SDSS archives.
We are in the process of translating these queries into SQL
statements and evaluating their performance on a relational system.
Here follow the queries and a narrative description of how we believe
they will be evaluated.
Q1: Find all galaxies without unsaturated pixels within 1 arcsecond of a given point in the sky (right ascension and declination).
This is a classic spatial lookup. We expect to have a quad-tree
spherical triangle index with object type (star, galaxy, ) as the
first key and then the spatial attributes. So this will be a lookup
in that quad-tree. Selecting those galaxies that are within one
arcsecond of the specified point.
Q2: Find all galaxies with blue surface brightness between and 30 and 40, and -10<super galactic latitude (sgb) <10, and declination less than zero.
This searches for all galaxies in a certain region of
the sky with a specified brightness in the blue spectral band. The
query uses a different coordinate system, which ismust first be
converted to the hier-archical triangles of Figure 3 and section
3.5. It is then a set of disjoint table scans, each having a compound
simple predicate representing the spatial boundary conditions and
surface brightness test.
Q3: Find all galaxies brighter than magnitude 22, where the local extinction is >0.75.
The local extinction is a map of the sky telling
how much dust is in that direction, and hence how much light is masked
by that dust. The extinction grid is stored as a table with 1 square
arcminute resolution - about half a billion cells. The query is either
a spatial join of bright galaxies with the extinction grid table, or
the extinction is stored as an attribute of each object so that this
is just a scan of the galaxies in the Photo table.
Q4: Find galaxies with a surface brightness greater than 24 with a major axis 30"<d<1', in the red-band, and with an ellipticity>0.5.
Each of the 5 color bands of a galaxy will have
been pre-processed into a bitmap and then the bitmap will be broken
into 15 concentric rings. The rings are then divided into octants.
The intensity of the light in each ring is analyzed and recorded as a
5x15 array. The array is stored as an object (SQL blob in our type
impoverished case). The concentric rings are pre-processed to compute
surface bright-ness, ellipticity, major axis, and other
attributes. Conse-quently, this query is a scan of the galaxies with
predicates on precomputed properties.
Q5: Find all galaxies with a deVaucouleours profile (r^< falloff of intensity on disk) and the photometric colors consis-tent with an elliptical galaxy.
The deVaucouleours profile information is
precomputed from the concentric rings as dis-cussed in Q4. This query
is a scan of galaxies in the Photo table with predicates on the
intensity profile and color limits.
Q6: Find galaxies that are blended with a star, output the deblended magnitudes.
Preprocessing separates objects that overlap or
are related (a binary star for example). This process is called
deblending and produces a tree of objects; each with its own
'deblended' attributes such as color and intensity. The parent child
rela-tionships are represented in SQL as foreign keys. The query is a
join of the deblended galaxies in the photo table, with their
siblings. If one of the siblings is a star, the galaxy's identity and
magnitude is added to the answer set.
Q7: Provide a list of star-like objects that are 1% rare for the 5-color attributes.
This involves classification of the attribute set
and then a scan to find objects with attributes close to that of a
star that occur in rare categories.
Q8: Find all objects with spectra unclassified.
This is a
sequential scan returning all objects with a certain precomputed flag
set
Q9: Find quasars with a line width >2000 km/s and 2.5<redshift<2.7.
This is a sequential scan of quasars in the Spectra table
with a predicate on the redshift and line width. The Spectra tale has
about 1.5 million objects have a known spectra but there are only
100,00 known quasars..
Q10: Find galaxies with spectra that have an equivalent width in Ha >40A (Ha is the main hydrogen spectral line.)
This is a join of the galaxies in the Spectra table and their
lines in the Lines table.
Q11: Find all elliptical galaxies with spectra that have an anomalous emission line.
This is a sequential scan of galax-ies (they
are indexed) that have ellipticity above .7 (a pre-computed value)
with emission lines that have been flagged as strange (again a
precomputed value).
Q12: Create a grided count of galaxies with u-g>1 and r<21.5 over 60<declination<70, and 200<right ascen-sion<210, on a grid of 2', and create a map of masks over the same grid.
Scan the table for galaxies
and group them in cells 2 arc-minutes on a side. Provide predicates
for the color restrictions on u-g and r and to limit the search to the
portion of the sky defined by the right ascension and declination
conditions. Return the count of qualifying galaxies in each cell. Run
another query with the same grouping, but with a predicate to include
only objects such as satellites, planets, and airplanes that obscure
the cell. The second query returns a list of cell coordinates that
serve as a mask for the first query. The mask may be stored in a
temporary table and joined with the first query.
Q13: Create a count of galaxies for each of the HTM trian-gles (hierarchal triangular mesh) which satisfy a certain color cut, like 0.7u-0.5g-0.2and i-magr<1.25 and r-mag<21.75, output it in a form adequate for visualization.
This query is a sequential scan of
galaxies with predicates for the color magnitude. It group the results
by a specified level in the HTM hierarchy (obtained by shifting the
HTM key) and return a count of galaxies in each triangle together with
the key of the triangle.
Q14: Provide a list of stars with multiple epoch measure-ments, which have light variations >0.1 magnitude.
Scan for stars that have
a secondary object (observed at a different time) with a predicate for
the light variations.
Q15: Provide a list of moving objects consistent with an as-teroid.
Objects are classified as moving and indeed have 5 successive
observations from the 5 color bands. So this is a select of the form:
select moving object where sqrt((deltax5-deltax1)2 +
(deltay5-deltay1)2) < 2 arc seconds.
Q16: Find all star-like objects within DetaMagnitde of 0.2 of the colors of a quasar at 5.5<redshift<6.5.
Scan all objects with a
predicate to identify star-like objects and another predicate to
specify a region in color space within 'distance' 0.2 of the colors of
the indicated quasar (the quasar colors are known).
Q17: Find binary stars where at least one of them has the colors of a white dwarf.
Scan the Photo table for stars with white dwarf colors
that are a child of a binary star. Return a list of unique binary star
identifiers.
Q18: Find all objects within 1' of one another other that have very similar colors: that is with the color ratios u-g, g-r, r-I are less than 0.05m. (Magnitudes are logarithms so these are ratios.)
This is
a gravitational lens query. Scan for objects in the Photo table and
compare them to all objects within one arcminute of the object. If
the color ratios match, this is a candidate object. We may
precomputed the five nearest neighbors of each object to speed
queries like this.
Q19: Find quasars with a broad absorption line in their spectra and at least one galaxy within 10". Return both the quasars and the galaxies.
Scan for quasars with a predicate for a broad absorption
line and use them in a spatial join with galaxies that are within 10
arc-seconds. The nearest neighbors may be precomputed which makes this
a regular join.
Q20: For a galaxy in the BCG data set (brightest color gal-axy), in 160<right ascension<170, 25<declination<35, give a count of galaxies within 30" which have a photoz within 0.05 of the BCG.
First form the BCG (brightest galaxy in a cluster) table. Then scan for galaxies in clusters (the cluster is their parent object) with a predicate to limit the region of the sky. For each galaxy, test with a sub-query that no other galaxy in the same cluster is brighter. Then do a spatial join of this table with the galaxies to return the desired counts.