CSV format definition files describe how information is organised in the delimited text files that supply station coordinates and observations to SNAP.
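Although they are termed CSV (comma separated value) files, the format is in fact generic, and other separators can be used. SNAP accepts two basic delimited text file formats: files delimited by whitespace, and files delimited by a specified delimiter character, with optional quote and escape characters.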
Usually the first line of the data file contains the names of each data field (column) in the file. However, this is not necessary, as the format definition can specify the column names. If neither defines the names then the fields are named "col1", "col2", ...
SNAP "normalises" the names of the columns in the data file by replacing any characters that are not letters or numbers with an underscore character. Consecutive non-alphanumeric characters are replaced with a single underscore. SNAP also ignores the case of characters. So the column names "FROM", " From ", and "from" are all equivalent, as are "!!!! from*" and "_from_".
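The normalisation rule can be sketched in Python as follows. This is an illustration only, not SNAP's implementation: the function name normalise_column_name is hypothetical, and trimming leading and trailing blanks before substitution is an assumption inferred from the " From " example above.

```python
import re


def normalise_column_name(name):
    """Hypothetical helper illustrating the normalisation rule."""
    # Assumption: surrounding blanks are trimmed first, so " From " matches "from".
    trimmed = name.strip()
    # Each run of characters other than ASCII letters and digits becomes one underscore.
    collapsed = re.sub(r"[^A-Za-z0-9]+", "_", trimmed)
    # Case is ignored, so names are compared in lower case.
    return collapsed.lower()


if __name__ == "__main__":
    for raw in ["FROM", " From ", "from", "!!!! from*", "_from_"]:
        print(repr(raw), "->", normalise_column_name(raw))
```

Under these assumptions the first three names map to one key and the last two map to another, matching the two groups of equivalent names given above.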
The CSV format definition file specifies the structure of the CSV file - the delimiters, column names and so on, and also the organisation of station or coordinate information in the file - that is which fields represent station codes, coordinates, and so on, and what values to use where this information is not defined in the data file.
The format definition file is itself formatted as a series of definition commands, one per line. Blank lines, and lines in which the first non-blank character is "!", are ignored. The first word on the line is a command, and this is followed by text defining the corresponding value. For example the "FORMAT" command could be entered as:
FORMAT DELIMITER=| HEADER=Y
The commands are case insensitive, as are references to column names in the CSV file.
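The line structure described above can be sketched as a small Python reader. The function name parse_definition_lines is hypothetical, and the sketch only splits each line into a command and its value text. It does not interpret multi-line blocks such as lookups or observations.

```python
def parse_definition_lines(text):
    """Split a format definition into (COMMAND, value_text) pairs.

    Illustrative sketch only: it handles the one-command-per-line
    structure and nothing else.
    """
    commands = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines whose first non-blank character is "!" are ignored.
        if not stripped or stripped.startswith("!"):
            continue
        parts = stripped.split(None, 1)
        # Commands are case insensitive, so store them in upper case.
        command = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else ""
        commands.append((command, value))
    return commands


if __name__ == "__main__":
    sample = "! pipe delimited stations\n\nformat DELIMITER=| HEADER=Y\nCODE @code\n"
    for command, value in parse_definition_lines(sample):
        print(command, "|", value)
```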
There are three categories of commands - generic commands that are not specific to stations or observations, station commands, which only apply to station file format definitions, and observation commands, which only apply to observation file format definitions.
Most of the station and observation commands specify how values for components of the station or observation definition are derived from the columns in the data file. Usually this will be simply the name of the column containing the information, but it may be a more complex value concatenating several columns, or using constant values or options specified in the SNAP coordinate_file or data_file commands. The specification of these values is described below.
It may be useful to look at the example station and observation format definition files.
FORMAT_NAME description
A brief descriptive name for the format
FORMAT CSV [HEADER=Y|N]
FORMAT WHITESPACE [HEADER=Y|N]
FORMAT DELIMITER=d QUOTE=q ESCAPE=e [HEADER=Y|N]
Defines how the file is delimited, either by whitespace, or by a delimiter, quote, and escape character. The default delimiter is a comma. By default there is no quote or escape character. Use "tab" to represent the tab character, and "space" to represent a blank. The CSV format is equivalent to DELIMITER=, QUOTE=" ESCAPE=". If HEADER=Y (or the HEADER option is not specified) then column names are read from the first line in the data file, otherwise the default column names are "col1", "col2", and so on.
COLUMNS name1 name2 name3 ...
Specifies names for the columns in the data file. These names will override names read from the data file if HEADER=Y.
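The effect of the FORMAT and COLUMNS settings on how records are read can be illustrated with Python's csv module. This is a sketch under stated assumptions rather than a description of SNAP's reader: read_records and its parameters are hypothetical names, and a quote character that is escaped by doubling itself (as in the CSV format) is mapped onto the csv module's doublequote option.

```python
import csv


def read_records(text, delimiter=",", quote=None, whitespace=False,
                 header=True, columns=None):
    """Read delimited text into a list of dicts keyed by column name."""
    # Simplification: skips blank lines and does not support quoted
    # fields that contain line breaks.
    lines = [line for line in text.splitlines() if line.strip()]
    if whitespace:
        rows = [line.split() for line in lines]
    elif quote:
        rows = list(csv.reader(lines, delimiter=delimiter,
                               quotechar=quote, doublequote=True))
    else:
        rows = list(csv.reader(lines, delimiter=delimiter,
                               quoting=csv.QUOTE_NONE))
    names = None
    if header and rows:
        names, rows = rows[0], rows[1:]
    if columns:
        # Names given by COLUMNS override names read from the header line.
        names = list(columns)
    if names is None:
        # With neither a header nor COLUMNS the fields are col1, col2, ...
        width = len(rows[0]) if rows else 0
        names = ["col%d" % (i + 1) for i in range(width)]
    return [dict(zip(names, row)) for row in rows]


if __name__ == "__main__":
    print(read_records("code|name\nA|Alpha\n", delimiter="|"))
    print(read_records("A Alpha\n", whitespace=True, header=False))
```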
REQUIRED_COLUMNS name1 name2 name3 ...
Specifies names of columns that must be in the data file for it to be loaded. If this is not specified then every column name used to define values in the format definition is assumed to be required, unless a default value is given where it is used or it is defined as optional (below).
OPTIONAL_COLUMNS name1 name2 name3 ...
Specifies names of columns that are optional in the data file. Used if REQUIRED_COLUMNS is not specified to identify columns that are not required though they may be used.
REQUIRED_CONFIGURATION name1 name2 name3 ...
Specifies configuration items required in the SNAP coordinate_file or data_file command.
SKIP_LINES nlines
Specifies a number of lines that will be ignored at the head of the data file before the header line or data lines are read.
LOOKUP lookup_name
name value
name value
name value
...
default value
END_LOOKUP
Defines a lookup table that may be used to translate values read from the data file. Each lookup table has a lookup_name. This is used in the specification of data values.
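A lookup table behaves much like a dictionary with a fallback entry. The sketch below shows that idea in Python. The function names and the table contents (gps, edm) are hypothetical. Matching names exactly and returning a blank string when there is no "default" entry are both assumptions, since the text above does not say how SNAP handles those cases.

```python
def parse_lookup(body_lines):
    """Build a dict from the 'name value' lines inside a LOOKUP block."""
    table = {}
    for line in body_lines:
        parts = line.split(None, 1)
        if not parts:
            continue
        table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def lookup(table, key):
    """Return the value for key, falling back to the 'default' entry."""
    # Assumption: an unmatched key with no default yields a blank string.
    return table.get(key, table.get("default", ""))


if __name__ == "__main__":
    # Hypothetical table translating a survey method to an error in metres.
    method_error = parse_lookup(["gps 0.005", "edm 0.010", "default 0.020"])
    print(lookup(method_error, "edm"))
    print(lookup(method_error, "tape"))
```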
COORDINATE_SYSTEM value
CODE value
NAME value
(LONGITUDE|EASTING|X) value
(LATITUDE|NORTHING|Y) value
(HEIGHT|Z) value
GEOID_UNDULATION value
DEFLECTION_EAST value
DEFLECTION_NORTH value
HEIGHT_TYPE value
The values used to define the station. The COORDINATE_SYSTEM value should match one of the codes defined in the coordinate system definition file. It must be the same for every mark in the data file. The ordinates LONGITUDE and EASTING are equivalent, as are LATITUDE and NORTHING. Either can be used. The HEIGHT_TYPE must be either "ellipsoidal" or "orthometric". The default is "orthometric". The values are calculated as described below.
CLASSIFICATION name value
Specifies a classification that will be assigned to the station. The value is calculated as described below.
CLASSIFICATION_COLUMNS col1 col2 ...
Specifies columns that will be used as classifications for each station. Each column becomes a classification with the same name as the column. This command supports simple wild cards - prefix* will match any column with a name starting prefix, and prefix** will match the same columns, but not include the prefix as part of the name of the classification.
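The two wild card forms can be illustrated with a short Python sketch. The function name classification_names and the column names used are hypothetical, and lower-casing names before matching is an assumption based on column names being case insensitive.

```python
def classification_names(patterns, columns):
    """Map each matching column name to the classification name it produces."""
    result = {}
    for pattern in patterns:
        pattern = pattern.lower()
        for column in columns:
            name = column.lower()
            # Test "**" before "*", since both patterns end with "*".
            if pattern.endswith("**"):
                prefix = pattern[:-2]
                if name.startswith(prefix):
                    # prefix** drops the prefix from the classification name.
                    result[name] = name[len(prefix):]
            elif pattern.endswith("*"):
                if name.startswith(pattern[:-1]):
                    # prefix* keeps the full column name.
                    result[name] = name
            elif name == pattern:
                result[name] = name
    return result


if __name__ == "__main__":
    columns = ["code", "cls_order", "cls_type", "name"]
    print(classification_names(["cls_*"], columns))
    print(classification_names(["cls_**"], columns))
```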
ANGLE_FORMAT format_type
Defines how longitude and latitude coordinate values are expressed. This can be one of "deg" ("degrees"), "dms" ("dms_angles"), or "hp" ("hp_angles").
Each record can contain one or more observations. Each observation is defined as:
OBSERVATION
observation commands
observation commands
...
END_OBSERVATION
The observation commands within each observation block are:
TYPE value
SET_ID value
INSTRUMENT_STATION value
INSTRUMENT_HEIGHT value
TARGET_STATION value
TARGET_HEIGHT value
VALUE value
ERROR value
ERROR_FACTOR value
DATETIME value
PROJECTION value
ID value
NOTE value
REJECTED value
Specifies the attributes of the observation.
The TYPE value should be one of the SNAP observation data types.
The SET_ID is used to group observations into sets (eg rounds of horizontal angles). Consecutive observations with the same non-blank set id and the same instrument station form a set of observations.
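The grouping rule for sets can be sketched with itertools.groupby, which groups only consecutive items. The function name group_into_sets and the dictionary keys are hypothetical. Treating each observation with a blank set id as standing on its own is an assumption.

```python
from itertools import groupby


def group_into_sets(observations):
    """Group consecutive observations sharing a set id and instrument station."""
    def key(obs):
        return (obs["set_id"].strip(), obs["instrument_station"])

    sets = []
    for (set_id, _station), members in groupby(observations, key=key):
        members = list(members)
        if set_id:
            # Same non-blank set id and same instrument station: one set.
            sets.append(members)
        else:
            # Assumption: observations with a blank set id are not grouped.
            sets.extend([member] for member in members)
    return sets


if __name__ == "__main__":
    observations = [
        {"set_id": "1", "instrument_station": "A", "target_station": "B"},
        {"set_id": "1", "instrument_station": "A", "target_station": "C"},
        {"set_id": "", "instrument_station": "A", "target_station": "D"},
        {"set_id": "1", "instrument_station": "B", "target_station": "A"},
    ]
    for obs_set in group_into_sets(observations):
        print([obs["target_station"] for obs in obs_set])
```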
The observation is rejected in SNAP if the REJECTED value is "Y".
The observation VALUE and ERROR are in metres for distances and degrees for angles (except for "calculated" distance and vector errors - see comments below). For vector observations the VALUE and ERROR should include all components of the observation separated by whitespace (Note that the value may be compiled from several columns in the data file - see below).
The ERROR_FACTOR is a factor by which the errors of the observation are multiplied. The default is 1.
The format of the DATETIME value is described below under the DATETIME_FORMAT command.
The ID is an integer id associated with the observation. It is not used by SNAP.
REFERENCE_FRAME value
REFRACTION_COEFFICIENT value
BEARING_ORIENTATION_ERROR value
DISTANCE_SCALE_FACTOR value
These fields are the equivalent of classifications named "ref_frame_code", "refraction_coef_code", "bearing_error_code", and "distance_scale_code" respectively.
CLASSIFICATION name value
Specifies a classification that will be assigned to the observation. The value is calculated as described below.
CLASSIFICATION_COLUMNS col1 col2 ...
Specifies columns that will be used as classifications for the observation. Each column becomes a classification with the same name as the column. This command supports simple wild cards - prefix* will match any column with a name starting prefix, and prefix** will match the same columns, but not include the prefix as part of the name of the classification. Note this command specifies the column names, not the column values, so the column names are not preceded by @.
VECTOR_ERROR_TYPE error_type
Defines the structure of error information for vector data. Here error_type can be one of:
| enu | Se Sn Su |
| enu_correlation | Se Sn Su Ren Reu Rnu |
| diagonal | Sx Sy Sz |
| correlation | Sx Sy Sz Rxy Rxz Ryz |
| full | Cxx Cxy Cyy Cxz Cyz Czz |
| calculated | SCe SCn SCu mm SPe SPn SPu ppm |
where x, y, z, e, n, u represent the X, Y, Z, east, north, and up components of the vector, and C, S, R, SC, and SP represent the covariance, standard error, correlation, constant component of standard error and proportional component of standard error respectively. The calculated option calculates the error based on the length of the vector as the root sum of squares of the constant and proportional components.
DISTANCE_ERROR_TYPE error_type
Defines how the error for distance observations is derived. This can be one of:
| value | The error is a value in metres |
| calculated | The error is a string formatted as "SCd mm SPd ppm" where SCd is the constant component of the standard error and SPd is the proportional component. These are used to calculate the error of the line from its length as the root sum of squares of the two components. |
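The "calculated" distance error can be worked through in Python as follows. This is a sketch of the arithmetic only: the function name is hypothetical, and the example string "3 mm 1 ppm" is an assumed instance of the "SCd mm SPd ppm" pattern.

```python
import math


def calculated_distance_error(spec, length_m):
    """Return the distance error in metres for a spec such as '3 mm 1 ppm'."""
    tokens = spec.split()
    if len(tokens) != 4 or tokens[1].lower() != "mm" or tokens[3].lower() != "ppm":
        raise ValueError("expected 'SCd mm SPd ppm', got %r" % spec)
    # Constant component: millimetres converted to metres.
    constant_m = float(tokens[0]) / 1000.0
    # Proportional component: parts per million of the line length.
    proportional_m = float(tokens[2]) * 1.0e-6 * length_m
    # Total error is the root sum of squares of the two components.
    return math.hypot(constant_m, proportional_m)


if __name__ == "__main__":
    # 3 mm constant and 1 ppm of 2000 m (2 mm) combine to about 3.6 mm.
    print(round(calculated_distance_error("3 mm 1 ppm", 2000.0), 6))
```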
ANGLE_FORMAT format_type
Defines how angle values are expressed in the data file. This can be one of "deg" ("degrees"), "dms" ("dms_angles"), or "hp" ("hp_angles").
ANGLE_ERROR_UNITS error_units
Defines the units in which the error for angle observations (horizontal angles, azimuth and bearing observations) is expressed. This can be one of:
| default | The error is a value in degrees if the angle format is "degrees", or in seconds if the angle format is "dms" or "hp". |
| degrees | The error is a value in degrees. |
| seconds | The error is a value in seconds. |
ANGLE_ERROR_TYPE error_type
Defines how the error for angle observations (horizontal angles, azimuth and bearing observations) is derived. This can be one of:
| value | The error is a value in degrees or seconds depending on the angle format and angle error units. |
| calculated | The error is a string formatted as "SCd sec SPd mm" where SCd is the constant component of the standard error in seconds and SPd is the component due to plumbing errors in mm which is converted to an angle error based on the length of the line. The root sum of squares of these components is the total angle error. Note that SNAP calculates this based on the coordinates when the observations are loaded - they do not get updated when the coordinates are changed at each iteration. You need to run SNAP again with the new coordinates to recalculate the errors. |
ZENITH_DISTANCE_ERROR_TYPE error_type
Defines how the error for zenith distance angle observations is derived. This can be one of:
| value | The error is a value in degrees or seconds depending on the angle format and angle error units. |
| calculated | The error is a string formatted as "SCd sec SPh mmh SPv mmv" where SCd is the constant component of the standard error in seconds, SPh is the component due to plumbing errors in mm, and SPv is the component due to instrument heighting errors in mm. The root sum of squares of these components is the total angle error. Note that SNAP calculates the total error using the distance between the endpoint station coordinates when the observations are loaded. The errors are not updated when the coordinates are changed at each iteration. You need to run SNAP again with the new coordinates to recalculate the errors. |
HEIGHT_DIFFERENCE_ERROR_TYPE error_type
Defines how the error for height difference observations (levelling observations) is derived. This can be one of:
| value | The error is a value in metres |
| calculated | The error is a string formatted as "SCd mm SPh mmrkm sqrt rlen" where SCd is the constant component of the standard error in millimetres, SPh is a component proportional to the square root of the length of the levelling run in millimetres per root kilometre, and rlen is the length of the levelling run in metres (this will usually be derived from a column, eg "2.0 mmrkm sqrt " @RUNLEN). The root sum of squares of these components is the total height difference error. |
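The "calculated" height difference error can be sketched the same way. The function name is hypothetical and the sample string is an assumed instance of the "SCd mm SPh mmrkm sqrt rlen" pattern. The run length is converted from metres to kilometres before taking the square root, following the units given above.

```python
import math


def calculated_height_difference_error(spec):
    """Return the error in metres for a spec such as '1.0 mm 2.0 mmrkm sqrt 4000'."""
    tokens = spec.split()
    if (len(tokens) != 6 or tokens[1].lower() != "mm"
            or tokens[3].lower() != "mmrkm" or tokens[4].lower() != "sqrt"):
        raise ValueError("expected 'SCd mm SPh mmrkm sqrt rlen', got %r" % spec)
    constant_mm = float(tokens[0])
    per_root_km_mm = float(tokens[2])
    run_length_km = float(tokens[5]) / 1000.0
    # Component proportional to the square root of the run length in km.
    proportional_mm = per_root_km_mm * math.sqrt(run_length_km)
    # Root sum of squares of the two components, converted to metres.
    return math.hypot(constant_mm, proportional_mm) / 1000.0


if __name__ == "__main__":
    # 1 mm constant with 2 mm per root km over a 4 km run (4 mm): about 4.1 mm.
    print(round(calculated_height_difference_error("1.0 mm 2.0 mmrkm sqrt 4000"), 6))
```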
DATETIME_FORMAT format_string
Defines the format of date/time values. Date time values should contain the year, month, day, and optionally hour, minute and second of the date/time. The month can be either a month name or a number. The year must be a full 4 digit year. Each component of the date is separated by one or more non alphanumeric characters. SNAP accepts a variation in which the month and day are replaced by a day of the year (1 January = 1, etc). The order of the components in the string is defined by the format_string argument which contains the characters Y, M, D, h, m, s, and N representing year, month, day, hour, minute, second, and day of year.
For example, if the format is "DMY hm" then any of the following
date/time values are valid.
25 dec 2011 7 15
25/12/2011 07:15
25 december 2011 7 15
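The way a format string such as "DMY hm" drives the parsing can be sketched in Python. This is not SNAP's parser: the function name is hypothetical, month names are matched on their first three letters (an assumption), and missing trailing time components default to zero.

```python
import re
from datetime import datetime, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_datetime(text, format_string):
    """Parse text using a format made of the letters Y, M, D, h, m, s and N."""
    letters = [c for c in format_string if c in "YMDhmsN"]
    # Components are separated by one or more non-alphanumeric characters.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if len(tokens) > len(letters):
        raise ValueError("too many components in %r" % text)
    parts = dict(zip(letters, tokens))
    if len(parts.get("Y", "")) != 4:
        raise ValueError("a full 4 digit year is required")
    year = int(parts["Y"])
    hour = int(parts.get("h", 0))
    minute = int(parts.get("m", 0))
    second = int(parts.get("s", 0))
    if "N" in parts:
        # Day of year variation: 1 January is day 1.
        date = datetime(year, 1, 1) + timedelta(days=int(parts["N"]) - 1)
        month, day = date.month, date.day
    else:
        month_text = parts["M"]
        if month_text.isdigit():
            month = int(month_text)
        else:
            month = MONTHS.index(month_text[:3].lower()) + 1
        day = int(parts["D"])
    return datetime(year, month, day, hour, minute, second)


if __name__ == "__main__":
    for sample in ["25 dec 2011 7 15", "25/12/2011 07:15", "25 december 2011 7 15"]:
        print(parse_datetime(sample, "DMY hm"))
```

All three sample strings resolve to the same date and time, 25 December 2011 at 07:15.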
IGNORE_MISSING_OBSERVATIONS
If this option is specified, SNAP will silently ignore observations for which the value field is blank.
Note: The error_type option "value" was previously called "error". The old name "error" is still supported but is deprecated.
Each of the commands above that takes a value can define it using a combination of column values, configuration values (from the SNAP coordinate_file or data_file command), literal text, and values calculated from a lookup table.
These are entered using the following syntax:
| Format | Description | Example |
| @colname | The name of a column from which to take the value. | @CODE |
| $config | The name of a configuration setting from which to take the value. | $coordsys |
| "literal text" number | Literal text or a number, used as entered. | "DS" 0.003 |
| lookupname(@colname) lookupname($config) | The result of looking up the column value or configuration value in the named lookup. | methoderror(@method) |
The value definition can consist of as many of these as are required to construct the value. For example a vector observation error may be formed by combining fields ERR_EAST, ERR_NORTH, and ERR_UP. This could be defined in the definition file as:
VECTOR_ERROR_TYPE ENU
ERROR @ERR_EAST " " @ERR_NORTH " " @ERR_UP
Note the literal blank strings here are required to separate the components in the resulting value.
The definition can also specify a default to use when the value evaluates to a blank string. The default is specified using a syntax:
value default value
For example a coordinate system may be defined as
COORDINATE_SYSTEM @CRDSYS default $CRDSYS default "NZGD2000"
meaning that the value will be taken from the CRDSYS column in the station file, or if that is blank or not defined, from the CRDSYS configuration item in the SNAP command file, and if that is also not defined it will be NZGD2000.
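The evaluation of a value definition, including the chain of defaults, can be sketched in Python. Everything here is illustrative: the function name evaluate_value, the tokenising rules, and the use of lower-case dictionary keys for column and configuration names are assumptions, not SNAP's implementation.

```python
import re

TOKEN = re.compile(r'"[^"]*"|\S+')
LOOKUP_CALL = re.compile(r"^(\w+)\(([@$]\w+)\)$")


def evaluate_value(definition, row, config, lookups=None):
    """Evaluate a value definition against one data row and the configuration."""
    lookups = lookups or {}

    def term(token):
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            return token[1:-1]                          # literal text
        if token.startswith("@"):
            return row.get(token[1:].lower(), "")       # column value
        if token.startswith("$"):
            return config.get(token[1:].lower(), "")    # configuration value
        match = LOOKUP_CALL.match(token)
        if match:
            table = lookups[match.group(1).lower()]
            key = term(match.group(2))
            return table.get(key, table.get("default", ""))
        return token                                    # bare number

    # Split the definition into alternatives separated by the word "default".
    alternatives = [[]]
    for token in TOKEN.findall(definition):
        if token.lower() == "default":
            alternatives.append([])
        else:
            alternatives[-1].append(token)
    # The first alternative that does not evaluate to a blank string wins.
    for tokens in alternatives:
        value = "".join(term(token) for token in tokens)
        if value.strip():
            return value
    return ""


if __name__ == "__main__":
    definition = '@CRDSYS default $CRDSYS default "NZGD2000"'
    print(evaluate_value(definition, {}, {}))
    print(evaluate_value(definition, {}, {"crdsys": "NZTM"}))
    errors = {"err_east": "0.01", "err_north": "0.01", "err_up": "0.02"}
    print(evaluate_value('@ERR_EAST " " @ERR_NORTH " " @ERR_UP', errors, {}))
```

The configuration value NZTM in the usage example is a made-up code chosen for illustration.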
The format_string, error_units and error_type values can be defined in a similar way except that they apply to the entire data file, so they can include literal strings, configuration items, and lookup values but not column data values (@colname). The classification columns col1 ... can only be literal strings.