OPALS Generic Format

The OPALS Generic Format defines custom file formats for point coordinates and (optionally) corresponding attributes in an OPALS Format Definition. Data may be imported and exported either in text or binary representations. All data belonging to a point record is located either on a separate line (text) or in a consecutive block of memory (binary). Which attributes are used, the way they are formatted, and in which order they appear within a record, is up to the user.

Due to the ability of the OPALS datamanager (ODM) to store arbitrary attributes along with vector data, it is possible to import and export almost any line-based data format into/from an ODM.

OPALS Generic Format Definition for Import and Export

Generic Format Files are OFD files in which the XML element that specifies the format is either <text> or <binary>. This element may begin with an optional <header> element that describes the start of a file. After it, one or more of the following elements must follow:

  • <column> (text), or <segment> (binary)
  • <skip>

The sequence of these elements defines the format for a single point record.

Each <column> or <segment> element defines either a coordinate or an attribute to be accessed. Only the element's name attribute is always required. Further XML attributes may be supported or even required, depending on the name, on whether the format definition is text or binary, and on whether the OFD is used only for import or also for export:

  • format defines the format used to print numeric values to text files: width.precision
  • invalidValue defines the value to be exported if a point does not feature the attribute.
  • typeFile defines the coordinate's or attribute's binary representation on file. For possible values, see Supported Data Types
  • type defines the internal data type to be used for a custom attribute. For possible values, see Supported Data Types

For numeric attributes, invalidValue may be set to one of the following identifiers instead of a literal value:

  • Max: meaningful for all numeric types.
  • Min: meaningful only for signed integers and floating-point numbers. For unsigned integers: zero
  • NaN: supported for floating-point numbers only: quiet (non-signaling) "Not-a-Number"
  • Inf: supported for floating-point numbers only: positive infinity

If invalidValue is not given, 0 is used for numeric and character types, and an empty string for strings.
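
As a minimal sketch of how these identifiers might be used for export, the following excerpt writes NaN for points that lack a floating-point attribute and the largest representable value for points that lack an unsigned integer attribute. The attribute names _Quality and _UserData are hypothetical, and the type names float and uint32 are assumed to be valid entries of Supported Data Types:

<text>
<column name='x' format='12.3' />
<column name='y' format='12.3' />
<column name='z' format='9.3' />
<!-- hypothetical user-defined attributes, for illustration only -->
<column name='_Quality' type='float' format='9.3' invalidValue='NaN' />
<column name='_UserData' type='uint32' invalidValue='Max' />
</text>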

In text format definitions, <skip> allows one or more consecutive columns to be ignored on import. In binary format definitions, <skip> causes the specified number of bytes to be ignored when reading the file. Like any OFD file, Generic Format Files must comply with the OFD Schema file.

As an example, consider the following simple text file:

#This is a simple xyz file
-31499.980 209747.690 2159.780
-31499.860 209747.170 2160.710
-31499.740 209746.740 2161.780
[...]

For a successful import, the first line must be skipped. The rest may be interpreted as lines containing X, Y, and Z coordinates each, in that order.

The following block shows an excerpt of the corresponding text OFD file:

<text>
<header skipLines='1' />
<column name='x' />
<column name='y' />
<column name='z' />
</text>

The file given above may readily be used for import and export. For export, however, one may choose to document the file content with a header text. Furthermore, the width and precision of exported numbers may need to be adapted. The following excerpt does both and additionally writes the Amplitude attribute:

<text>
<header text='#This line could provide information about the file content' />

<column name='x' format='12.3' />
<column name='y' format='12.3' />
<column name='z' format='9.3' />
<column name='Amplitude' format='9.3' />
</text>

For the complete OFD file, see $OPALS_ROOT/addons/formatdef/simpleAscii.xml.

Handling arbitrary attributes

The ODM supports predefined and user-defined attributes. User-defined attributes are indicated by a leading "_" (underscore) character in the attribute name. User-defined attributes require type to be set to one of the XML representations listed under Supported Data Types, whereas predefined attributes have predefined data types, and hence type must not be given for them. Consider the following text file that contains points and corresponding attributes:

# a more complex file
# x y z GPSTime EchoWidth UserData
24820.774 311160.141 322.452 314358.431470 2.606 50
24820.035 311161.159 319.200 314358.431470 1.736 43
24820.599 311160.576 322.863 314358.431485 3.541 28
[...]

The semantics of columns 4 and 5 match the predefined attributes GPSTime and EchoWidth. Column 6 has no predefined meaning in OPALS and may therefore be accessed using a user-defined attribute. An appropriate OFD file could look like this:

[...]
<text>
<header skipLines="2" />
<column name='x' />
<column name='y' />
<column name='z' />
<column name='GPSTime' />
<column name='EchoWidth' />
<column name='_UserData' type='uint32' />
</text>
[...]

If a text file to be imported contains more columns than defined by the OFD file, then the trailing columns are ignored.

So far, only examples of text OFD files have been given, but the definition of coordinate columns and attributes is similar for binary files. However, the Generic Format concept also supports features that are only valid for text or for binary file definitions. Those are described in the following sections, starting with special text features.

Definition of Text Formats

OPALS supports four additional attributes that may be defined within a <text> tag:

  1. <decimalSeparator> (default ".")
  2. <columnSeparators> (default: horizontal tabs and blanks)
  3. <commentInitiator> (default: empty)
  4. <skipWhiteSpace> (default: "true")

Since OPALS implements appropriate defaults for all of them, their definition is optional, as declared in the schema file.

The <decimalSeparator> attribute is useful for text files that were generated by localised programs (e.g. using "," instead of "." as decimal separator).

<columnSeparators> specifies one or more characters. Any of them is considered as separating coordinates or attributes in text files. If an OFD file does not define <columnSeparators>, then OPALS uses white space as column separator. In case <columnSeparators> defines more than one character, Module Export uses the first character for separation.

The <commentInitiator> attribute tells OPALS to ignore lines that start with the given character or string. Consider a text point file that contains a few outliers. One could remove those points by deleting their lines from the file. This, however, requires an additional documentation step, or else the outlier information is lost (or at least difficult to reproduce). Using the <commentInitiator> attribute, the corresponding points can simply be "commented out" for import. Although text formats often use the hash character ("#") for comments, OPALS does not implement a <commentInitiator> default.

Unless <skipWhiteSpace> is set to false, white space is skipped during the import of text files.
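
To illustrate, a file written by a localised program with "," as decimal separator, ";" as column separator, and "#" marking commented-out lines might be described as sketched below. The sketch assumes that the settings are written as XML attributes of <text>, as the wording above suggests. The OFD Schema file remains authoritative for the exact syntax:

<text decimalSeparator=',' columnSeparators=';' commentInitiator='#'>
<!-- assumption: the settings are XML attributes of <text> -->
<column name='x' />
<column name='y' />
<column name='z' />
</text>

With such a definition, a line like "#-31499,860;209747,170;2160,710" would be ignored on import, while the same line without the leading "#" would be read as a point.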

Definition of Binary Formats

When accessing binary files, it is most important to specify the correct <typeFile> using one of the XML representations listed under Supported Data Types: one needs to know beforehand whether, e.g., point coordinates are stored on file as single-precision (float) or double-precision (double) real numbers. By default, OPALS assumes <typeFile> to be identical to <type>.

The binary representation of numeric data types is generally not portable across different processor architectures. As an advanced feature, OPALS therefore allows for the specification of the byte representation to be used on file (<endian>), while using native endianness by default.
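
As a hedged sketch, a binary file that stores the coordinates as double-precision numbers and the amplitude as a single-precision number might be described as follows. The values float and double are taken from the description above and are assumed to be the exact XML representations. The <endian> setting is left out, so native endianness applies:

<binary>
<!-- assumption: 'double' and 'float' are valid typeFile values -->
<segment name='x' typeFile='double' />
<segment name='y' typeFile='double' />
<segment name='z' typeFile='double' />
<segment name='Amplitude' typeFile='float' />
</binary>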

Examples

Example OFD file 'simpleAscii.xml'

The file can be found in the $OPALS_ROOT/addons/formatdef/ directory

<opalsFormatDefinition>
<description>Text Format: X, Y, Z, Amplitude</description>
<text>
<header text='#This line could provide information about the file content' />
<column name='x' format='12.3' />
<column name='y' format='12.3' />
<column name='z' format='9.3' />
<column name='Amplitude' format='9.3' />
</text>
</opalsFormatDefinition>

Example OFD file 'simpleBinary.xml'

The file can be found in the $OPALS_ROOT/addons/formatdef/ directory

<opalsFormatDefinition>
<description>Binary Format: X, Y, Z, Amplitude</description>
<binary>
<segment name='x' />
<segment name='y' />
<segment name='z' />
<segment name='Amplitude' />
</binary>
</opalsFormatDefinition>

Author: jo, wk
Date: 06.07.2011