collect.py usage hints for converting time series observation data
collect.py is a program that reads EPIC-format netcdf(3) files [grouped by the experiment in which they were collected] and converts them to CF-1.6 compliant (discrete samples) netcdf(4) files. These output CF-1.6 files will be incorporated into the portal and harvested into the IOOS database.
collect.py must have a file that provides experiment-level metadata. The default name of this file is project_metadata.csv, and it should be in the same location as collect.py. The columns must contain:

1. Experiment Name [project_name] - string; must match the directory name where the data files are (case sensitive)
2. Scientist Name [contributor_name] - name of the PI conducting the research
3. Title [project_title] - longer version of the Experiment Name
4. Abstract [project_summary] - experiment summary: what was collected and why
5. Default server location [catalog_xml] - where to download the data from

A different file name may be specified using the -c option.
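For reference, a minimal project_metadata.csv might look like the row below. The bracketed names above are assumed to be the actual column headers, and the scientist name, title, summary, and catalog URL shown here are placeholders, not real values:

```csv
project_name,contributor_name,project_title,project_summary,catalog_xml
HURRIRENE_BB,Jane Smith,Example Hurricane Experiment Title,"Example summary: what time series data were collected and why",https://example.com/thredds/catalog/HURRIRENE_BB/catalog.xml
```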
If you don't specify any projects with the -p option, collect.py will try to process all the experiments it finds in the .csv file. To see the help for collect.py, enter:
python collect.py -h
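The fallback behavior described above (no -p means "do everything in the CSV") can be sketched like this. This is a hedged sketch, not collect.py's actual code: the function name `select_projects` is made up, and it assumes the metadata file uses a project_name column header:

```python
import csv

def select_projects(metadata_path, requested=None):
    """Return the experiments to convert: the ones named with -p/--projects,
    or every project_name found in the metadata CSV when none are given."""
    with open(metadata_path, newline="") as f:
        available = [row["project_name"] for row in csv.DictReader(f)]
    if not requested:  # no -p option: convert every experiment in the CSV
        return available
    # -p given: keep the CSV's order, but only the requested experiments
    return [name for name in available if name in requested]
```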
The command below converts the HURRIRENE_BB files, since it has --download specified. The last column of the project_metadata.csv has the URL from which to get the data for the experiment directory listed in column 1.
python collect.py --projects HURRIRENE_BB --download --output=../../CF-1.6/
The default location for downloaded files is the /download directory in the cwd. If we used this, we'd end up with every file we've ever collected in that one directory. To retain our directory structure, a command like the one below puts the EPIC data in ../../../tmp/HURRIRENE_BB (a location not in the datasetScan path) and the CF output in ../../CF-1.6 (where it will make a subdirectory named after the project_name). The output directory is in the datasetScan path.
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB --download -o ../../CF-1.6
You MUST do a --download on each dataset initially, because it adds a project_name global attribute (containing the string given to --projects) to each file. If that attribute isn't there, subsequent runs of collect.py using the --folder option will fail. Therefore, if you have a local set of files you want to convert, collect.py won't work until you either a) put the data on a TDS and use the --download option, or b) add a project_name attribute to each file.
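For option b, the global attribute can be stamped on with a short script. Since the EPIC inputs are netcdf(3), scipy's netcdf_file can edit them in place; this is a sketch (the function name is made up, and the value you pass must match the experiment directory name exactly, since it is case sensitive):

```python
import glob
import os
from scipy.io import netcdf_file  # handles the netCDF3 format the EPIC files use

def tag_project_name(folder, project_name):
    """Add a project_name global attribute to every .nc file in folder, so that
    collect.py's --folder mode can identify which experiment the files belong to."""
    for path in glob.glob(os.path.join(folder, "*.nc")):
        nc = netcdf_file(path, "a")              # append mode: edit the file in place
        nc.project_name = project_name.encode()  # written as a global attribute on close
        nc.close()
```

For example, `tag_project_name("../../../tmp/HURRIRENE_BB", "HURRIRENE_BB")` would prepare locally held HURRIRENE_BB files for a --folder run.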
Since we've already downloaded HURRIRENE_BB, if we want to update the CF-1.6 output by running a more recent version of collect.py, we can skip --download and use the local folder we created in the previous step.
python collect.py -p HURRIRENE_BB --folder ../../../tmp/HURRIRENE_BB -o ../../CF-1.6