This repo provides a single command (at least for now) called pds3to4, which converts PDS3 label files to the new PDS4 format.
You can get very basic help at the command line.
$ pds3to4 -h
Usage: pds3to4 [-dhnprV] [-f=FORMAT] [-l=TYPE] [-o=FILE] [-t=TEMPLATE]
[PDS3-LABEL|DIR]
Convert PDS3 label files to PDS4
[PDS3-LABEL|DIR] path to label file
-d, --delete-report remove validation reports for successful validations
-f, --format=FORMAT path to fmt file
-h, --help Show this help message and exit.
-l, --label-type=TYPE label type (rda, re1 or min)
-n, --no-validate skip NASA PDS validation
-o, --output=FILE path to output file
-p, --print-report print validation reports to standard output instead
of to file
-r, --recursive process each .lbl file in a directory
-t, --template=TEMPLATE path to template file
-V, --version Print version information and exit.
To run the conversion on a single file:
$ pds3to4 "00-basic/cma_404470826rda00790050104ch11503p1.lbl"
This will generate a PDS4 XML-formatted label and a plain-text report from the PDS Validate tool, both in the same directory as the label file.
╭───┬──────────────────────────────────────────────────────────┬──────┬─────────╮
│ # │ name │ type │ size │
├───┼──────────────────────────────────────────────────────────┼──────┼─────────┤
│ 0 │ 00-basic/cma_404470826rda00790050104ch11503p1.csv │ file │ 11.1 KB │
│ 1 │ 00-basic/cma_404470826rda00790050104ch11503p1.lbl │ file │ 3.9 KB │
│ 2 │ 00-basic/cma_404470826rda00790050104ch11503p1.xml │ file │ 11.2 KB │
│ 3 │ 00-basic/cma_404470826rda00790050104ch11503p1.xml.report │ file │ 1.1 KB │
╰───┴──────────────────────────────────────────────────────────┴──────┴─────────╯
To delete the .report file automatically when the conversion succeeds, provide the -d flag.
$ pds3to4 -d "00-basic/cma_404470826rda00790050104ch11503p1.lbl"
╭───┬──────────────────────────────────────────────────────────┬──────┬─────────╮
│ # │ name │ type │ size │
├───┼──────────────────────────────────────────────────────────┼──────┼─────────┤
│ 0 │ 00-basic/cma_404470826rda00790050104ch11503p1.csv │ file │ 11.1 KB │
│ 1 │ 00-basic/cma_404470826rda00790050104ch11503p1.lbl │ file │ 3.9 KB │
│ 2 │ 00-basic/cma_404470826rda00790050104ch11503p1.xml │ file │ 11.2 KB │
╰───┴──────────────────────────────────────────────────────────┴──────┴─────────╯
The command can also be run on a directory, rather than a single file, to convert every .lbl file in it. To do that, you'll need to provide the -r flag.
$ pds3to4 -r -d "01-delivery-32/"
╭────┬─────────────────────────────────────────────────────────┬──────┬─────────╮
│ # │ name │ type │ size │
├────┼─────────────────────────────────────────────────────────┼──────┼─────────┤
│ 0 │ 01-delivery-32/CMB_718398059MIN36140971734CH00111P1.lbl │ file │ 3.4 KB │
│ 1 │ 01-delivery-32/CMB_718398059MIN36140971734CH00111P1.xml │ file │ 8.0 KB │
│ 2 │ 01-delivery-32/CMB_718398059RDA36140971734CH00111P1.lbl │ file │ 3.1 KB │
│ 3 │ 01-delivery-32/CMB_718398059RDA36140971734CH00111P1.xml │ file │ 8.6 KB │
│ 4 │ 01-delivery-32/CMB_718398430RE136140971734CH00111P1.lbl │ file │ 4.4 KB │
│ 5 │ 01-delivery-32/CMB_718398430RE136140971734CH00111P1.xml │ file │ 11.6 KB │
│ 6 │ 01-delivery-32/CMB_720868365RE136420980270CH00111P1.lbl │ file │ 2.9 KB │
│ 7 │ 01-delivery-32/CMB_720868365RE136420980270CH00111P1.xml │ file │ 7.5 KB │
│ 8 │ 01-delivery-32/cmb_718398059min36140971734ch00111p1.csv │ file │ 207 B │
│ 9 │ 01-delivery-32/cmb_718398059rda36140971734ch00111p1.csv │ file │ 10.3 KB │
│ 10 │ 01-delivery-32/cmb_718398430re136140971734ch00111p1.csv │ file │ 20.0 KB │
│ 11 │ 01-delivery-32/cmb_720868365re136420980270ch00111p1.csv │ file │ 18.9 KB │
╰────┴─────────────────────────────────────────────────────────┴──────┴─────────╯
If you forget the -r flag, you'll get an error that looks something like this:
$ pds3to4 "01-delivery-32/"
ERROR: 01-delivery-32/ failed to process
01-delivery-32 (Is a directory)
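One way to avoid this mistake in scripts is to add -r only when the target is a directory. The following is a minimal Python sketch, not part of this repo: build_pds3to4_command is a hypothetical helper name, and it relies only on the -r, -d and -n flags listed in the help output.

```python
import os
import tempfile


def build_pds3to4_command(target, delete_report=False, no_validate=False):
    """Return an argv list for pds3to4 (hypothetical helper, not part of the tool).

    -r is added only when `target` is an existing directory, so a forgotten
    flag cannot trigger the "Is a directory" error.
    """
    command = ["pds3to4"]
    if os.path.isdir(target):
        command.append("-r")
    if delete_report:
        command.append("-d")
    if no_validate:
        command.append("-n")
    command.append(target)
    return command


if __name__ == "__main__":
    # Demonstrate with a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as directory:
        print(build_pds3to4_command(directory, delete_report=True))
```

The returned list can be handed to subprocess.run(command, check=True) once pds3to4 is on your PATH.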
The utility makes use of a fairly strict PDS3 label parser. If it fails to read the file, then you'll get an error that looks something like this:
$ pds3to4 -r "02-parsing-error/"
Failed to parse file:
02-parsing-error/cma_404470826rda00790050104ch11503p1.lbl:5:77 missing ')' at '\r'
ERROR: 02-parsing-error/cma_404470826rda00790050104ch11503p1.lbl failed to process
failed to parse "02-parsing-error/cma_404470826rda00790050104ch11503p1.lbl"
This tells us that there's a problem in the 02-parsing-error/cma_404470826rda00790050104ch11503p1.lbl file at line 5, around character 77. Taking a look at the file, we see that the ) at the end of the 5th line is missing.
3 │ RECORD_BYTES = 12288
4 │ FILE_RECORDS = 981
5 │ ^HEADER = ("CMA_404470826RDA00790050104CH11503P1.CSV",1
6 │ ^SPREADSHEET = ("CMA_404470826RDA00790050104CH11503P1.CSV",2)
7 │ DATA_SET_ID = "MSL-M-CHEMIN-4-RDR-V1.0"
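If you process many labels, it can help to pull the file, line and column out of the parser message automatically. This is a minimal Python sketch that assumes the message always has the shape shown in the sample, "<path>:<line>:<column> <detail>". The pattern is inferred from that one example and parse_error_location is a hypothetical name.

```python
import re

# Assumed shape, inferred from the sample message: "<path>:<line>:<column> <detail>"
LOCATION = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+)\s+(?P<detail>.*)$"
)


def parse_error_location(message):
    """Return (path, line, column, detail), or None if the line has no location."""
    match = LOCATION.match(message.strip())
    if match is None:
        return None
    return (
        match.group("path"),
        int(match.group("line")),
        int(match.group("column")),
        match.group("detail"),
    )


if __name__ == "__main__":
    sample = (
        r"02-parsing-error/cma_404470826rda00790050104ch11503p1.lbl:5:77 "
        r"missing ')' at '\r'"
    )
    print(parse_error_location(sample))
```

Lines without a location, such as "Failed to parse file:", return None, so the function can be applied to every line of the tool's output.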
The utility uses the PDS Validate tool to automatically check the generated PDS4 XML file. If the validation fails, you'll see an error that looks something like this:
$ pds3to4 -r -d "03-validation-error/"
ERROR: 03-validation-error/cma_404470826rda00790050104ch11503p1.lbl failed to process
generated file failed to validate
The resulting XML file will be renamed to *.xml.err and the report will be retained (even if the -d flag is provided).
╭───┬─────────────────────────────────────────────────────────────────────┬──────┬─────────╮
│ # │ name │ type │ size │
├───┼─────────────────────────────────────────────────────────────────────┼──────┼─────────┤
│ 0 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.csv │ file │ 11.1 KB │
│ 1 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.lbl │ file │ 3.9 KB │
│ 2 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.xml.err │ file │ 11.2 KB │
│ 3 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.xml.report │ file │ 1.6 KB │
╰───┴─────────────────────────────────────────────────────────────────────┴──────┴─────────╯
Taking a look at the report
...
Product Level Validation Results
FAIL: file:03-validation-error/cma_404470826rda00790050104ch11503p1.xml
Begin Content Validation: file:03-validation-error/cma_404470826rda00790050104ch11503p1.csv
ERROR [error.table.records_mismatch] data object 2: Number of records read is not equal to the defined number of records in the label (expected 1000, got 980).
End Content Validation: file:03-validation-error/cma_404470826rda00790050104ch11503p1.csv
1 product validation(s) completed
...
tells us that the associated CSV file has 980 rows in it, but the label file says there should be 1000.
If you are sure that the validation tool is wrong, you can simply remove the report file and rename the *.xml.err file to *.xml.
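The manual cleanup can be scripted. Below is a minimal Python sketch, assuming the naming shown in the listing (name.xml.err next to name.xml.report). accept_unvalidated is a hypothetical helper, not something the tool provides.

```python
import tempfile
from pathlib import Path


def accept_unvalidated(err_path):
    """Rename name.xml.err to name.xml and delete name.xml.report if present."""
    err_path = Path(err_path)
    if err_path.suffix != ".err":
        raise ValueError(f"expected a *.xml.err file, got {err_path.name}")
    xml_path = err_path.with_suffix("")  # drops only the trailing ".err"
    if xml_path.exists():
        raise FileExistsError(f"{xml_path} already exists")
    report_path = xml_path.with_name(xml_path.name + ".report")
    if report_path.exists():
        report_path.unlink()
    err_path.rename(xml_path)
    return xml_path


if __name__ == "__main__":
    # Exercise the helper on throwaway files instead of real products.
    with tempfile.TemporaryDirectory() as directory:
        base = Path(directory)
        (base / "example.xml.err").write_text("<Product_Observational/>")
        (base / "example.xml.report").write_text("FAIL")
        print(accept_unvalidated(base / "example.xml.err").name)
```

The helper refuses to overwrite an existing .xml file, so an earlier successful conversion is not lost by accident.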
If you would like to run the conversion without using the validation tool, you can provide the -n flag at the command line.
$ pds3to4 -r -n "03-validation-error/"
╭───┬──────────────────────────────────────────────────────────────┬──────┬─────────┬─────────────╮
│ # │ name │ type │ size │ modified │
├───┼──────────────────────────────────────────────────────────────┼──────┼─────────┼─────────────┤
│ 0 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.csv │ file │ 11.1 KB │ an hour ago │
│ 1 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.lbl │ file │ 3.9 KB │ an hour ago │
│ 2 │ 03-validation-error/cma_404470826rda00790050104ch11503p1.xml │ file │ 11.2 KB │ now │
╰───┴──────────────────────────────────────────────────────────────┴──────┴─────────┴─────────────╯
By default, the utility generates the PDS4 output from Velocity templates that are bundled with it. You can override this entirely by providing the -t <template> option at the command line.
NOTE: Since the example template won't validate, we also provide the -n flag.
$ pds3to4 -r -n -t "04-custom-template/template.vm" "04-custom-template/"
The example template looks like
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Product_Observational>
  <Identification_Area>
    <logical_identifier>
      $label.LogicalIdentifier
    </logical_identifier>
  </Identification_Area>
</Product_Observational>
and the resulting file looks like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Product_Observational>
  <Identification_Area>
    <logical_identifier>
      cma_404470826rda00790050104ch11503p1
    </logical_identifier>
  </Identification_Area>
</Product_Observational>
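Because -n skips validation, it can be worth confirming that a custom template still produces well-formed XML with the identifier filled in. The following minimal Python sketch uses only the standard library (Python 3.8+ for the {*} namespace wildcard). logical_identifier is a hypothetical helper, and the element path is taken from the example template rather than from the full PDS4 schema.

```python
import xml.etree.ElementTree as ET


def logical_identifier(xml_text):
    """Return the stripped <logical_identifier> text, or None if it is absent.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    """
    data = xml_text.encode("utf-8") if isinstance(xml_text, str) else xml_text
    root = ET.fromstring(data)
    # "{*}" matches the tag in any namespace or in none (Python 3.8+).
    element = root.find("./{*}Identification_Area/{*}logical_identifier")
    if element is None or element.text is None:
        return None
    return element.text.strip()


if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Product_Observational>
  <Identification_Area>
    <logical_identifier>
      cma_404470826rda00790050104ch11503p1
    </logical_identifier>
  </Identification_Area>
</Product_Observational>
"""
    print(logical_identifier(sample))
```

This only checks well-formedness and one element. It is not a substitute for the PDS Validate tool.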
By default, the utility uses slightly modified PDS3-style FMT format files, bundled with the tool, to determine what the columns of the CSV data file look like. For the RDA products, the format is
OBJECT = FIELD
  NAME = "2-THETA"
  DATA_TYPE = ASCII_REAL
  UNIT = "DEGREES"
  BYTES = 6
  FORMAT = "F6.2"
  DESCRIPTION = "2-theta"
END_OBJECT = FIELD
OBJECT = FIELD
  NAME = "INTENSITY"
  DATA_TYPE = ASCII_REAL
  UNIT = "COUNTS"
  BYTES = 7
  FORMAT = "F7.0"
  DESCRIPTION = "The intensity of the diffraction for each 2-theta value in column 1"
END_OBJECT = FIELD
The DESCRIPTION property is something we've added to allow you to specify the description of the column in the XML label. The resulting output is
<Record_Delimited>
  <fields>2</fields>
  <groups>0</groups>
  <Field_Delimited>
    <name>2-THETA</name>
    <field_number>1</field_number>
    <data_type>ASCII_Real</data_type>
    <maximum_field_length unit="byte">6</maximum_field_length>
    <unit>degrees</unit>
    <description>2-theta</description>
  </Field_Delimited>
  <Field_Delimited>
    <name>INTENSITY</name>
    <field_number>2</field_number>
    <data_type>ASCII_Real</data_type>
    <maximum_field_length unit="byte">7</maximum_field_length>
    <unit>counts</unit>
    <description>The intensity of the diffraction for each 2-theta value in column 1</description>
  </Field_Delimited>
</Record_Delimited>
If the FMT file does not have a DESCRIPTION property, then the NAME is used for the <description></description> in the output.
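The fallback rule can be illustrated with a small Python sketch. It is not the tool's parser: parse_fields and description_for are hypothetical names, and the reader handles only flat, single-line KEY = VALUE pairs inside OBJECT = FIELD blocks like the ones shown here.

```python
import re

# One "KEY = VALUE" pair per line; surrounding double quotes are dropped.
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"?(.*?)"?\s*$')


def parse_fields(fmt_text):
    """Collect each OBJECT = FIELD ... END_OBJECT = FIELD block as a dict."""
    fields, current = [], None
    for line in fmt_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match is None:
            continue  # blank lines and anything that is not KEY = VALUE
        key, value = match.groups()
        if key == "OBJECT" and value == "FIELD":
            current = {}
        elif key == "END_OBJECT" and value == "FIELD":
            if current is not None:
                fields.append(current)
            current = None
        elif current is not None:
            current[key] = value
    return fields


def description_for(field):
    """Use DESCRIPTION when present and non-empty, otherwise fall back to NAME."""
    return field.get("DESCRIPTION") or field["NAME"]


if __name__ == "__main__":
    sample = """OBJECT = FIELD
  NAME = "INTENSITY"
  DATA_TYPE = ASCII_REAL
  BYTES = 7
END_OBJECT = FIELD
"""
    for field in parse_fields(sample):
        print(field["NAME"], "->", description_for(field))
```

With no DESCRIPTION in the sample, the description falls back to the NAME value, INTENSITY.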
You can provide your own format, if you'd like, using the -f <format> option.
NOTE: Since the result will not validate, we provide the -n flag.
$ pds3to4 -r -d -n -f "05-custom-format/format.fmt" "05-custom-format/"
If 05-custom-format/format.fmt has the following content (which it does),
OBJECT = FIELD
  NAME = "TWO-THETA"
  DATA_TYPE = ASCII_REAL
  UNIT = "DEGREES"
  BYTES = 6
  FORMAT = "F6.2"
  DESCRIPTION = "2-theta"
END_OBJECT = FIELD
then the resulting record properties will be
<Record_Delimited>
  <fields>1</fields>
  <groups>0</groups>
  <Field_Delimited>
    <name>TWO-THETA</name>
    <field_number>1</field_number>
    <data_type>ASCII_Real</data_type>
    <maximum_field_length unit="byte">6</maximum_field_length>
    <unit>degrees</unit>
    <description>2-theta</description>
  </Field_Delimited>
</Record_Delimited>