collect.py usage hints for converting time series observation data
collect.py is a program to read EPIC format netcdf(3) files [grouped by the experiment in which they were collected] and convert them to CF-1.6 compliant (discrete samples) netcdf(4) files. These output CF-1.6 files will be incorporated into the portal, and be harvested into the IOOS database.
By default, the following experiment/sensor/file types are ignored and not converted:
- Raw hourly data (files with a1h, A1H, A1h, or a1H), e.g. 8543sc-a1h.nc
- lp data (files with alp), e.g. 3971-alp.nc
- Burst variances (files with *var-), e.g. 8545advbvar-cal.nc
- b-cal data (files with b-cal.nc), e.g. 8545advb-cal.nc
- Files that don't follow any of the patterns:
  - *-a*
  - *-A*
  - *s-cal*
  - *d-cal*
  - *tide-cal*
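The default skip rules listed above can be sketched as a small filter. This is only an illustration of the rules as written here, not the code collect.py actually uses: the function name is_ignored is hypothetical, and treating *var- as "contains var-" is an assumption.

```python
from fnmatch import fnmatchcase

# A file must match at least one of these to be considered (from the list above).
KEEP_PATTERNS = ["*-a*", "*-A*", "*s-cal*", "*d-cal*", "*tide-cal*"]


def is_ignored(filename):
    """Return True if the default rules described above would skip this file.

    Hypothetical helper: it mirrors the documented rules only.
    """
    if "a1h" in filename.lower():          # raw hourly data, any capitalization
        return True
    if "alp" in filename:                  # lp data
        return True
    if fnmatchcase(filename, "*var-*"):    # burst variances (assumed "contains var-")
        return True
    if filename.endswith("b-cal.nc"):      # b-cal data
        return True
    # Anything that matches none of the accepted patterns is also skipped.
    return not any(fnmatchcase(filename, p) for p in KEEP_PATTERNS)


if __name__ == "__main__":
    for name in ["8543sc-a1h.nc", "3971-alp.nc", "8545advbvar-cal.nc", "8545advb-cal.nc"]:
        print(name, "ignored" if is_ignored(name) else "converted")
```

fnmatchcase is used rather than fnmatch so that matching stays case sensitive on every platform.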
collect.py requires a file that provides experiment-level metadata. The default name of this file is project_metadata.csv, and it should be in the same directory as collect.py. The file must contain the following columns:
- Experiment Name [project_name] - string must match the name of the directory where the data files are (case sensitive)
- Scientist Name [contributor_name] - name of the PI conducting the research
- Title [project_title] - longer version of the Experiment Name
- Abstract [project_summary] - experiment summary: what was collected and why?
- Default server location [catalog_xml] - where to download the data from

A different file name may be specified using the -c option.
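Before running a conversion, it can help to confirm that the metadata file has every required column. The sketch below assumes the bracketed names in the list above are the literal CSV header names; missing_columns is a hypothetical helper and is not part of collect.py.

```python
import csv
import io

# Header names taken from the bracketed names in the column list above
# (assumed to be the literal CSV headers).
REQUIRED_COLUMNS = [
    "project_name",
    "contributor_name",
    "project_title",
    "project_summary",
    "catalog_xml",
]


def missing_columns(csv_file):
    """Return the required column names absent from an open CSV file."""
    reader = csv.DictReader(csv_file)
    present = reader.fieldnames or []   # fieldnames is None for an empty file
    return [name for name in REQUIRED_COLUMNS if name not in present]


if __name__ == "__main__":
    # Made-up header row that lacks the last two columns.
    sample = io.StringIO("project_name,contributor_name,project_title\n")
    print(missing_columns(sample))
```

To check a real file, pass an open handle instead, for example open("project_metadata.csv", newline="").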
To see the help for collect.py enter:
python collect.py -h
If you don't specify any projects with the -p option, collect.py will try to process every experiment it finds in the .csv file. The default command to do this is (it's always a good idea to git pull first):
python collect.py --download --output=../../CF-1.6new/
When this is complete, it will have put ALL the files into a directory called download under the cwd. I usually cd to ...emontgomery/stellwagen/usgs-cmg-portal/woods_hole_obs_data before working, so the download directory is created there. In theory it will convert all the experiments after downloading and put the results into sub-directories under the directory given as the output. Should it fail and you need to re-run one or more experiments, use a command like this to re-do just DIAMONDSHOALS:
python collect.py -p DIAMONDSHOALS --folder download -o ../../CF-1.6new
In some instances (say a new experiment being added), you want to just download from one directory and convert that. The command below downloads and converts the HURRIRENE_BB files. The last column of the project_metadata.csv has the URL from which to get the data for the experiment directory listed in column 1.
python collect.py --projects HURRIRENE_BB --download --output=../../CF-1.6/
The default location for the downloaded files is the download directory under the cwd. If we use it, every file we've ever collected ends up in that one directory. To retain our directory structure, a command like this puts the EPIC data in ../../../tmp/HURRIRENE_BB (a location not in the datasetScan path), and the CF output in ../../CF-1.6 (where it will make a subdirectory named after the project_name).
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB --download -o ../../CF-1.6
The output directory CF-1.6 is in the datasetScan path. If you write to a different output location, you need to copy or move the files into CF-1.6. It's best to save the originals when a major revision is done.
You MUST do a --download on each dataset initially because it adds an id global attribute, containing the string in --projects and filename_root, to each file. If that attribute isn't there, subsequent runs of collect.py using the --folder option will fail. Therefore, if you have a local set of files that you want to convert, collect.py won't work until you either a) put the data on a TDS and use the --download option or b) add an id attribute to each file.
In the case above, since we've already downloaded HURRIRENE_BB, if we wanted to update the CF-1.6 files by running a more recent version of collect.py, we can skip the --download and use the local folder we created in the previous step.
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB -o ../../CF-1.6