collect.py usage hints for converting time series observation data
collect.py is a program that reads EPIC-format netcdf(3) files [grouped by the experiment in which they were collected] and converts them to CF-1.6 compliant (discrete samples) netcdf(4) files. These output CF-1.6 files will be incorporated into the portal and harvested into the IOOS database.
collect.py must have a file that provides experiment-level metadata. The default name of this file is project_metadata.csv, and it should be in the same location as collect.py. The columns must contain:

1. Experiment Name [project_name] - string; must match the directory name where the data files are (case sensitive)
2. Scientist Name [contributor_name] - name of the PI conducting the research
3. Title [project_title] - longer version of the Experiment Name
4. Abstract [project_summary] - experiment summary: what was collected and why
5. Default server location [catalog_xml] - where to download the data from

A different file name may be specified using the -c option.
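For reference, a minimal project_metadata.csv might look like the row below. The bracketed names above are assumed to be the actual column headers, and the scientist name, title, summary, and catalog URL shown here are placeholders, not real values:

```csv
project_name,contributor_name,project_title,project_summary,catalog_xml
HURRIRENE_BB,Jane Smith,Example Hurricane Experiment Title,"Example summary: what time series data were collected and why",https://example.com/thredds/catalog/HURRIRENE_BB/catalog.xml
```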
If you don't specify any projects with the -p option, collect.py will try to process all the experiments it finds in the .csv file. To see the help for collect.py, enter:
python collect.py -h
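The fallback behavior described above (no -p means "do everything in the CSV") can be sketched like this. This is a hedged sketch, not collect.py's actual code: the function name `select_projects` is made up, and it assumes the metadata file uses a project_name column header:

```python
import csv

def select_projects(metadata_path, requested=None):
    """Return the experiments to convert: the ones named with -p/--projects,
    or every project_name found in the metadata CSV when none are given."""
    with open(metadata_path, newline="") as f:
        available = [row["project_name"] for row in csv.DictReader(f)]
    if not requested:  # no -p option: convert every experiment in the CSV
        return available
    # -p given: keep the CSV's order, but only the requested experiments
    return [name for name in available if name in requested]
```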
The command below converts the HURRIRENE_BB files, since it has --download specified. The last column of the project_metadata.csv has the URL from which to get the data for the experiment directory listed in column 1.
python collect.py --projects HURRIRENE_BB --download --output=../../CF-1.6/
The default location for downloaded files is the /download directory in the cwd. If we used this, we'd end up with every file we've ever collected in that one directory. To retain our directory structure, a command like the one below puts the EPIC data in ../../../tmp/HURRIRENE_BB (a location not in the datasetScan path) and the CF output in ../../CF-1.6 (where it will make a subdirectory named after the project_name). The output directory is in the datasetScan path.
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB --download -o ../../CF-1.6
You MUST do a --download on each dataset initially, because it adds a project_name global attribute (containing the string given to --projects) to each file. If that attribute isn't there, subsequent runs of collect.py using the --folder option will fail. Therefore, if you have a local set of files you want to convert, collect.py won't work until you either a) put the data on a TDS and use the --download option, or b) add a project_name attribute to each file.
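For option b, the global attribute can be stamped on with a short script. Since the EPIC inputs are netcdf(3), scipy's netcdf_file can edit them in place; this is a sketch (the function name is made up, and the value you pass must match the experiment directory name exactly, since it is case sensitive):

```python
import glob
import os
from scipy.io import netcdf_file  # handles the netCDF3 format the EPIC files use

def tag_project_name(folder, project_name):
    """Add a project_name global attribute to every .nc file in folder, so that
    collect.py's --folder mode can identify which experiment the files belong to."""
    for path in glob.glob(os.path.join(folder, "*.nc")):
        nc = netcdf_file(path, "a")              # append mode: edit the file in place
        nc.project_name = project_name.encode()  # written as a global attribute on close
        nc.close()
```

For example, `tag_project_name("../../../tmp/HURRIRENE_BB", "HURRIRENE_BB")` would prepare locally held HURRIRENE_BB files for a --folder run.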
Since we've already downloaded HURRIRENE_BB, if we want to update the CF-1.6 output by running a more recent version of collect.py, we can skip --download and use the local folder we created in the previous step.
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB -o ../../CF-1.6