CSV Filenames, Formats and Directory Structure
This document describes the directory structure, file-naming
convention, and formats for the CSV files that serve as inputs
to the loader robot. The robot queues the loading directives
for the data to be loaded into the SQL Server tables for the various
databases. This structure does not yet include the tiling changes for
the spectro and tiling CSV files.
The contents of each CSV file are described in a separate
document, csvFiles.htm.
Directory Structure
The directory structure for the CSV files is as follows (see Figure 1
below):
- CSV/ directory - the top of the directory tree for CSV files.
- phCSV/ directory - contains the imaging (photo) CSV
files. The tables that will be loaded from here are the Chunk,
Segment, Field, Mask, ObjMask, PhotoObj, Profile, First, Rosat,
and USNO tables. There is a different subdirectory for each
skyVersion of the data to be loaded:
- target/ directory - contains CSV files for the TARGET (skyVersion
= 0) data.
- <skyVersion#>-<stripe#>-<last_3_digits_of_startMu>-<last_3_digits_of_endMu>,
e.g. 0-42-191-786 : This is the chunk/stripe level directory that contains
the individual runs. This level is necessary to allow for the possibility
that a given run may overlap more than one chunk.
- <run#>/: There is one subdirectory for each run to be
loaded. The subdirectory name is the run-number (run#). The
files in each run# directory will be named <tablename>-target-<run#>-<rerun#>[_<sequence#>].csv
(the sequence number is used if necessary when files get very large), and
the following files will be deposited here:
- csv_ready: The csv_ready file is a semaphore file
indicating that the CSV files for this run are ready to be loaded.
This is necessary in case new runs are added while the robot is queueing
the existing runs to be loaded, and also for the future when we load runs
in parallel.
- sqlField-target-<run#>-<rerun#>.csv, e.g. sqlField-target-752-8.csv
- sqlFieldProfile-target-<run#>-<rerun#>.csv, e.g.
sqlFieldProfile-target-752-8.csv
- sqlFirst-target-<run#>-<rerun#>.csv, e.g. sqlFirst-target-752-8.csv
- sqlMask-target-<run#>-<rerun#>.csv, e.g. sqlMask-target-752-8.csv
- sqlObjMask-target-<run#>-<rerun#>.csv, e.g. sqlObjMask-target-752-8.csv
- sqlPhotoObjAll-target-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoObjAll-target-752-8_0.csv
- sqlPhotoProfile-target-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoProfile-target-752-8_1.csv
- sqlRosat-target-<run#>-<rerun#>.csv, e.g. sqlRosat-target-752-8.csv
- sqlSegment-target-<run#>-<rerun#>.csv, e.g.,
sqlSegment-target-752-8.csv
- sqlUSNO-target-<run#>-<rerun#>.csv, e.g. sqlUSNO-target-752-8.csv
- zoom/: This directory will contain the jpeg zoom files under
the subdirectories 1, 2, 3, 4, 5, 6 for each camCol.
- csv_ready: The csv_ready file is a semaphore file
that tells the robot that the TARGET database is ready to load, i.e. the
CSV files are ready to be queued for loading.
- sqlChunk-target.csv
- sqlTarget-target.csv
- sqlTargetInfo-target.csv
- best/ directory - contains CSV files for the BEST (skyVersion
= 1) data.
- <skyVersion#>-<stripe#>-<last_3_digits_of_startMu>-<last_3_digits_of_endMu>,
e.g. 1-42-191-169 : This is the chunk/stripe level directory that contains
the individual runs. This level is necessary to allow for the possibility
that a given run may overlap more than one chunk.
- <run#>/: As above, there will be one subdirectory for
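each run. The files under this directory will be named <tablename>-best-<run#>-<rerun#>[_<sequence#>].csv,
and the following files will be deposited here: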
- csv_ready: The csv_ready file is a semaphore file
indicating that the CSV files for this run are ready to be loaded.
This is necessary in case new runs are added while the robot is queueing
the existing runs to be loaded, and also for the future when we load runs
in parallel.
- sqlField-best-<run#>-<rerun#>.csv, e.g. sqlField-best-752-7.csv
- sqlFieldProfile-best-<run#>-<rerun#>.csv, e.g.
sqlFieldProfile-best-752-7.csv
- sqlFirst-best-<run#>-<rerun#>.csv, e.g. sqlFirst-best-752-7.csv
- sqlMask-best-<run#>-<rerun#>.csv, e.g. sqlMask-best-752-7.csv
- sqlObjMask-best-<run#>-<rerun#>.csv, e.g. sqlObjMask-best-752-7.csv
- sqlPhotoObjAll-best-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoObjAll-best-752-7_0.csv
- sqlPhotoProfile-best-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoProfile-best-752-7_1.csv
- sqlRosat-best-<run#>-<rerun#>.csv, e.g. sqlRosat-best-752-7.csv
- sqlSegment-best-<run#>-<rerun#>.csv, e.g.
sqlSegment-best-752-7.csv
- sqlUSNO-best-<run#>-<rerun#>.csv, e.g. sqlUSNO-best-752-7.csv
- zoom/: This directory will contain the jpeg zoom files under
the subdirectories 1, 2, 3, 4, 5, 6 for each camCol.
- csv_ready: The csv_ready file is a semaphore file that
tells the robot that the BEST database is ready to load, i.e. the CSV files
are ready to be queued for loading.
- sqlChunk-best.csv, containing the chunk data.
- runs/ directory - contains CSV files for the RUNS (skyVersion
= 15) data.
- <run#>-<rerun#>/: This will be the chunk (stripe)
level subdirectory. The individual run will be under this directory.
This is for consistency with the directory structure for target and best.
- <run#>/: There will be one subdirectory for each run
to be loaded. There will be no masks for the RUNS skyVersion.
The files in this directory will be:
- csv_ready: This is a semaphore file indicating that the CSV
files for this run are ready to be queued for loading. This is to prevent
premature loading of a run before the CSV generation is finished.
- sqlField-runs-<run#>-<rerun#>.csv, e.g. sqlField-runs-1336-16.csv
- sqlFieldProfile-runs-<run#>-<rerun#>.csv, e.g.
sqlFieldProfile-runs-1336-16.csv
- sqlFirst-runs-<run#>-<rerun#>.csv, e.g. sqlFirst-runs-1336-16.csv
- sqlObjMask-runs-<run#>-<rerun#>.csv, e.g. sqlObjMask-runs-1336-16.csv
- sqlPhotoObjAll-runs-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoObjAll-runs-1336-16_0.csv
- sqlPhotoProfile-runs-<run#>-<rerun#>_<seq#>.csv,
e.g. sqlPhotoProfile-runs-1336-16_1.csv
- sqlRosat-runs-<run#>-<rerun#>.csv, e.g. sqlRosat-runs-1336-16.csv
- sqlSegment-runs-<run#>-<rerun#>.csv, e.g.
sqlSegment-runs-1336-16.csv
- sqlUSNO-runs-<run#>-<rerun#>.csv, e.g. sqlUSNO-runs-1336-16.csv
- zoom/: This directory will contain the jpeg zoom files under
the subdirectories 1, 2, 3, 4, 5, 6 for each camCol.
- csv_ready: The csv_ready file is a semaphore file
that tells the robot that the RUNS database is ready to load, i.e. the CSV
files are ready to be queued for loading.
- sqlChunk-runs.csv, containing the chunk data if applicable.
- log/: The log directory - this is
the default location for the photo log and error files.
- spCSV/ directory - contains the spectro CSV files.
The subdirectories below this level are:
- plates/: The data directory containing
the subdirectories for individual spectro runs:
- <year>-<month>-<day>-<hour>,
e.g. 2002-11-19-1300 : Each such directory contains an individual spectro
run. This level is necessary to allow for the possibility that there
may be more than one run. This is the data directory containing the
actual CSV files:
- csv_ready: This is a semaphore file indicating that the CSV
files for this plate run are ready to be queued for loading. This is
to prevent premature loading of a run before the CSV generation is finished.
- sqlPlateX.csv - the data for the PlateX table.
- sqlSpecObjAll.csv - the data for the SpecObjAll table.
- sqlSpecLineAll.csv - the data for the SpecLineAll
table.
- sqlSpecLineIndex.csv - the data for the SpecLineIndex
table.
- sqlELRedshift.csv - the data for the ELRedshift (emission-line
redshifts) table.
- sqlHoleObj.csv - the data for the HoleObj table.
- sqlXCRedshift.csv - the data for the XCRedshift
(cross-correlation redshifts) table.
- csv_ready: The csv_ready file is a semaphore file that
tells the robot that the spectro data is ready to load, i.e. the CSV files
are ready to be queued for loading.
- log/: The log directory - this is
the default location for the spectro log and error files.
- tiCSV/ directory - contains the tiling CSV files.
The subdirectories below this level are:
- tiles/: The data directory containing the
subdirectories for individual tiling runs:
- <tileRun#>-<year>-<month>-<day>-<hour>,
e.g. 10-2003-01-8-1800 : Each such directory contains an individual tiling
run. This level is necessary to allow for the possibility that there
may be more than one run. The files in each such directory are:
- csv_ready: This is a semaphore file indicating that
the CSV files for this tiling run are ready to be queued for loading. This
is to prevent premature loading of a run before the CSV generation is finished.
- sqlTile-<tileRun#>.csv - the data for the Tile
table.
- sqlTiledTarget-<tileRun#>.csv - the data for
the TiledTarget table.
- sqlTilingGeometry-<tileRun#>.csv
- the data for the TilingGeometry table.
- sqlTilingInfo-<tileRun#>.csv
- the data for the TilingInfo table.
- sqlTilingRegion-<tileRun#>.csv
- the data for the TilingRegion table.
- csv_ready: The csv_ready file is a semaphore file that
tells the robot that the tiling data is ready to load, i.e. the CSV files
are ready to be queued for loading.
- log/: The log directory - this is
the default location for the tiling log and error files.
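The photo file-naming convention listed above, <tablename>-<target|best|runs>-<run#>-<rerun#>[_<sequence#>].csv, can be checked mechanically. The following Python sketch is illustrative only and is not part of the loader. The names PHOTO_CSV_RE, PhotoCsvName, and parse_photo_csv_name are hypothetical, and the sketch assumes that table names consist of "sql" followed by letters and that the run, rerun, and sequence numbers are plain decimal integers.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for <tablename>-<dataset>-<run#>-<rerun#>[_<sequence#>].csv.
# The "sql" + letters form of the table name is an assumption drawn from
# the examples in this document.
PHOTO_CSV_RE = re.compile(
    r"^(?P<table>sql[A-Za-z]+)"
    r"-(?P<dataset>target|best|runs)"
    r"-(?P<run>\d+)"
    r"-(?P<rerun>\d+)"
    r"(?:_(?P<seq>\d+))?"
    r"\.csv$"
)


class PhotoCsvName(NamedTuple):
    """Components of a per-run photo CSV filename (hypothetical helper type)."""
    table: str
    dataset: str
    run: int
    rerun: int
    seq: Optional[int]  # None when the file has no sequence suffix


def parse_photo_csv_name(filename: str) -> Optional[PhotoCsvName]:
    """Split a per-run photo CSV filename into its parts.

    Returns None for names that do not follow the per-run convention,
    such as csv_ready or the dataset-level sqlChunk-target.csv.
    """
    match = PHOTO_CSV_RE.match(filename)
    if match is None:
        return None
    seq = match.group("seq")
    return PhotoCsvName(
        table=match.group("table"),
        dataset=match.group("dataset"),
        run=int(match.group("run")),
        rerun=int(match.group("rerun")),
        seq=int(seq) if seq is not None else None,
    )
```

For example, parse_photo_csv_name("sqlPhotoObjAll-target-752-8_0.csv") yields table "sqlPhotoObjAll", dataset "target", run 752, rerun 8, and sequence 0, while "sqlField-best-752-7.csv" parses with no sequence number.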
Figure 1. The directory structure layout for the CSV files.
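Every loadable directory in this layout is marked by a csv_ready semaphore file, so a directory should be treated as ready only when that file is present. The Python sketch below shows one way such a check could be written. It is not the loader robot's actual implementation; the function name ready_dirs and the constant SEMAPHORE are hypothetical, and the sketch assumes the whole tree is reachable on a local filesystem.

```python
from pathlib import Path
from typing import Iterator

SEMAPHORE = "csv_ready"  # semaphore filename described in this document


def ready_dirs(root: Path) -> Iterator[Path]:
    """Yield each directory at or below root that contains a csv_ready file.

    Directories without the semaphore (for example, runs whose CSV
    generation is still in progress) are skipped.
    """
    # rglob searches root itself and all of its subdirectories.
    for semaphore in sorted(root.rglob(SEMAPHORE)):
        if semaphore.is_file():
            yield semaphore.parent
```

Calling list(ready_dirs(Path("CSV/phCSV/target"))) would return the run directories, and the target directory itself, that currently hold a csv_ready file. A real loader would also need to decide how the dataset-level and run-level semaphores interact, which this sketch does not address.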
Ani R. Thakar
Last Modified: August 07, 2008.