OPALS Generic Format

The OPALS Generic Format allows custom file formats for point coordinates and (optionally) corresponding attributes to be defined in an OPALS Format Definition (OFD). Data may be imported and exported in either text (ASCII) or binary representation. All data belonging to a point record is located either on a line of its own (ASCII) or in a contiguous block of bytes (binary). Which attributes are used, how they are formatted, and in which order they appear within a record is up to the user.

Because the OPALS Datamanager (ODM) can store arbitrary attributes along with vector data, almost any line-based data format can be imported into and exported from an ODM.

OPALS Generic Format Definition for Import and Export

Generic Format Files are OFD files whose format-specifying XML element is either <ascii> or <binary>. This element contains two principal child elements:

  • <header> (optional)
  • <data>

The <header> element is optional, and may be used to describe the file content. The <data> element is compulsory. It defines the format for a single point record using a sequence of child elements:

  • <entry>
  • <skipColumns> (text), or <skipBytes> (binary)

Each <entry> element defines either a coordinate or an attribute to be accessed. While the entry's name (XML attribute val) must always be specified, further XML attributes may be supported or even required, depending on the entry name, on whether the definition is for a text or a binary format, and on whether the OFD is used only for import or also for export:

  • format defines the format used to print numeric values to text files, given as width.precision (e.g. 12.3).
  • invalidValue defines the value to be exported if a point does not feature the attribute.
  • externalType defines the binary representation of the coordinate or attribute on file. For possible values, see Supported Data Types.
  • internalType defines the internal data type to be used for a user-defined attribute. For possible values, see Supported Data Types.

For numeric attributes, invalidValue may also be given as one of the following identifiers instead of an explicit value:

  • Max: the maximum value of the data type; meaningful for all numeric types.
  • Min: the minimum value of the data type; meaningful only for signed integers and floating-point numbers (for unsigned integers, it equals zero).
  • NaN: a quiet (non-signaling) "Not-a-Number"; supported for floating-point numbers only.
  • Inf: positive infinity; supported for floating-point numbers only.

If invalidValue is not given, 0 is used for numeric and character types, and an empty string for strings.
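To illustrate how these identifiers and the defaults might resolve to concrete values, here is a minimal Python sketch. The type names (int8 to uint64, float, double) are assumptions modelled on the uint32, float and double names used in this document, and treating Min for floating-point types as the most negative finite value is also an assumption; Supported Data Types remains the authoritative reference.

```python
import math
import struct
import sys

# Assumed integer type names: (number of bits, signed?)
INT_TYPES = {
    "int8": (8, True), "uint8": (8, False),
    "int16": (16, True), "uint16": (16, False),
    "int32": (32, True), "uint32": (32, False),
    "int64": (64, True), "uint64": (64, False),
}

# Largest finite values; single precision has the bit pattern 0x7F7FFFFF.
FLOAT_MAX = {
    "float": struct.unpack(">f", bytes.fromhex("7f7fffff"))[0],
    "double": sys.float_info.max,
}


def resolve_invalid_value(spec, type_name):
    """Map an invalidValue specification (identifier, literal or None) to a value."""
    if type_name in INT_TYPES:
        bits, signed = INT_TYPES[type_name]
        low = -(2 ** (bits - 1)) if signed else 0
        high = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
        if spec is None:
            return 0  # default for numeric types
        if spec == "Max":
            return high
        if spec == "Min":
            return low  # zero for unsigned integers
        if spec in ("NaN", "Inf"):
            raise ValueError(f"{spec} is supported for floating-point types only")
        return int(spec)
    if type_name in FLOAT_MAX:
        special = {
            "Max": FLOAT_MAX[type_name],
            "Min": -FLOAT_MAX[type_name],  # assumption: most negative finite value
            "NaN": math.nan,               # quiet NaN
            "Inf": math.inf,               # positive infinity
        }
        if spec is None:
            return 0.0
        if spec in special:
            return special[spec]
        return float(spec)
    raise ValueError(f"unknown type: {type_name}")
```

For example, resolve_invalid_value("Max", "uint32") yields 4294967295, while resolve_invalid_value("NaN", "int32") is rejected because NaN only exists for floating-point types.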

<skipColumns> ignores one or more consecutive columns when importing text files, while <skipBytes> skips the specified number of bytes when reading binary files. Like any OFD file, Generic Format Files must comply with the OFD Schema file.

As an example, consider the following simple ASCII file:

#This is a simple xyz file
-31499.980 209747.690 2159.780
-31499.860 209747.170 2160.710
-31499.740 209746.740 2161.780
[...]

For a successful import, the first line must be skipped. The rest may be interpreted as lines containing X, Y, and Z coordinates each, in that order.

The following XML block shows an excerpt of the corresponding ASCII OFD file:

<ascii>
  <header>
    <skipLines val="1" />
  </header>
  <data>
    <entry val="x" />
    <entry val="y" />
    <entry val="z" />
  </data>
</ascii>
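The interpretation of this OFD can be sketched in a few lines of Python. This is not the OPALS implementation: the function name read_generic_ascii is hypothetical, and ignoring empty lines is an assumption. It only shows the described behaviour of skipping the header line, splitting on the default white-space separators, and assigning the columns to the <entry> names in order.

```python
def read_generic_ascii(lines, entry_names, skip_lines=0):
    """Map text lines to point records according to a list of entry names."""
    records = []
    for line in lines[skip_lines:]:
        columns = line.split()  # default column separators: tabs and blanks
        if not columns:
            continue  # assumption: empty lines are ignored
        records.append({name: float(value)
                        for name, value in zip(entry_names, columns)})
    return records


sample = """#This is a simple xyz file
-31499.980 209747.690 2159.780
-31499.860 209747.170 2160.710
-31499.740 209746.740 2161.780""".splitlines()

points = read_generic_ascii(sample, ["x", "y", "z"], skip_lines=1)
print(points[0])  # {'x': -31499.98, 'y': 209747.69, 'z': 2159.78}
```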

The OFD above may be used for both import and export. For export, however, one may want to document the file content with a header text, and the width and precision of exported numbers may need to be adapted. The following excerpt does both and additionally exports the Amplitude attribute:

<ascii>
  <header>
    <text val="#This line could provide information about the file content" />
  </header>
  <data>
    <entry val="x" format="12.3" />
    <entry val="y" format="12.3" />
    <entry val="z" format="9.3" />
    <entry val="Amplitude" format="9.3" />
  </data>
</ascii>

For the complete OFD file, see $OPALS_ROOT/addons/formatdef/simpleAscii.xml.
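The effect of format="width.precision" on export can be sketched in Python, assuming printf-style fixed-point formatting and a single blank between columns (both are assumptions, not confirmed OPALS behaviour). A point without Amplitude receives the default invalidValue 0, as described earlier; the helper format_record is hypothetical.

```python
def format_record(point, entries, invalid_value=0):
    """Format one point as a text line using (name, "width.precision") entries."""
    fields = []
    for name, fmt in entries:
        width, precision = (int(part) for part in fmt.split("."))
        value = point.get(name, invalid_value)  # missing attribute -> invalidValue
        fields.append(f"{value:{width}.{precision}f}")
    return " ".join(fields)  # assumption: one blank as column separator


entries = [("x", "12.3"), ("y", "12.3"), ("z", "9.3"), ("Amplitude", "9.3")]
header = "#This line could provide information about the file content"
points = [
    {"x": -31499.98, "y": 209747.69, "z": 2159.78, "Amplitude": 12.5},
    {"x": -31499.86, "y": 209747.17, "z": 2160.71},  # no Amplitude
]
print("\n".join([header] + [format_record(p, entries) for p in points]))
```

Because the widths are fixed, every exported line has the same length as long as the values fit, which keeps the columns aligned.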

Handling arbitrary attributes

The ODM supports predefined and user-defined attributes. User-defined attributes are indicated by a leading "_" (underscore) in the attribute name. They require internalType to be set to one of the XML representations listed in Supported Data Types. Predefined attributes, in contrast, have fixed internal data types, and hence internalType must not be given for them. Consider the following text file that contains points and corresponding attributes:

# a more complex file
# x y z GPSTime EchoWidth UserData
24820.774 311160.141 322.452 314358.431470 2.606 50
24820.035 311161.159 319.200 314358.431470 1.736 43
24820.599 311160.576 322.863 314358.431485 3.541 28
[...]

The semantics of columns 4 and 5 match the predefined attributes GPSTime and EchoWidth. Column 6 has no predefined meaning in OPALS, so it is imported as a user-defined attribute. An appropriate OFD file could look like this:

[...]
<ascii>
  <header>
    <skipLines val="2" />
  </header>
  <data>
    <entry val="x" />
    <entry val="y" />
    <entry val="z" />
    <entry val="GPSTime" />
    <entry val="EchoWidth" />
    <entry val="_UserData" internalType="uint32" />
  </data>
</ascii>
[...]

If a text file to be imported contains more columns than defined by the OFD file, then the trailing columns are ignored.
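The rules for predefined and user-defined attributes, together with the handling of surplus columns, can be sketched in Python. The table of predefined attributes and the two internal types are hypothetical excerpts (the real ones are defined by the ODM and Supported Data Types), and the trailing "extra" column is added to the sample line only to show that it is ignored.

```python
# Hypothetical excerpt of predefined attributes and their converters.
PREDEFINED = {"x": float, "y": float, "z": float,
              "GPSTime": float, "EchoWidth": float}


def to_uint32(text):
    """Convert a text column to an unsigned 32-bit integer."""
    value = int(text)
    if not 0 <= value < 2 ** 32:
        raise ValueError(f"{value} does not fit into uint32")
    return value


# Hypothetical subset of internal types for user-defined attributes.
INTERNAL_TYPES = {"uint32": to_uint32, "double": float}


def converter_for(name, internal_type=None):
    """Apply the internalType rules for predefined and user-defined attributes."""
    if name.startswith("_"):  # user-defined attribute
        if internal_type is None:
            raise ValueError(f"{name}: internalType is required")
        return INTERNAL_TYPES[internal_type]
    if internal_type is not None:
        raise ValueError(f"{name}: internalType must not be given")
    return PREDEFINED[name]


def parse_record(line, entries):
    """Parse one text line; columns beyond the defined entries are ignored."""
    converters = [(name, converter_for(name, itype)) for name, itype in entries]
    columns = line.split()
    # zip() stops at the shorter sequence, which drops trailing columns.
    return {name: conv(col) for (name, conv), col in zip(converters, columns)}


entries = [("x", None), ("y", None), ("z", None),
           ("GPSTime", None), ("EchoWidth", None), ("_UserData", "uint32")]
record = parse_record(
    "24820.774 311160.141 322.452 314358.431470 2.606 50 extra", entries)
```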

So far, only examples of ASCII OFD files have been given, but the definition of coordinate columns and attributes is identical for binary files. However, the Generic Format concept also supports features that are only valid for ASCII or for binary file definitions. Those are described in the following sections, starting with special ASCII features.

Definition of ASCII Formats

OPALS supports four additional elements that may be defined within an <ascii> element:

  1. <decimalSeparator> (default: ".")
  2. <columnSeparators> (default: tabs and blanks)
  3. <commentInitiator> (default: empty)
  4. <skipWhiteSpace> (default: "True")

Since OPALS implements appropriate defaults for all elements, their definition is optional as declared in the schema file.

The <decimalSeparator> element is useful for ASCII files that were generated by localised programs (e.g. using "," instead of "." as decimal separator).

<columnSeparators> specifies one or more characters, each of which is treated as a separator between coordinates or attributes in text files. If an OFD file does not define <columnSeparators>, OPALS uses white space as column separator. If <columnSeparators> defines more than one character, Module Export uses only the first one for separating columns.
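The combined effect of <decimalSeparator> and <columnSeparators> can be sketched in Python for a localised file that uses ";" and blanks between columns and "," as decimal separator. The helper names and the precision parameter are hypothetical, and treating runs of consecutive separators as a single one is an assumption.

```python
import re


def split_columns(line, separators, decimal_separator="."):
    """Import: split on any separator character, then parse localised numbers."""
    pattern = "[" + re.escape(separators) + "]+"  # assumption: runs collapse
    columns = [c for c in re.split(pattern, line.strip()) if c]
    return [float(c.replace(decimal_separator, ".")) for c in columns]


def join_columns(values, separators, decimal_separator=".", precision=3):
    """Export: only the first separator character is written."""
    first = separators[0]
    return first.join(f"{v:.{precision}f}".replace(".", decimal_separator)
                      for v in values)


print(split_columns("24820,774;311160,141 322,452", "; ", ","))
print(join_columns([24820.774, 311160.141], "; ", ","))  # 24820,774;311160,141
```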

The <commentInitiator> element tells OPALS to ignore lines that start with the given character (string). Suppose an ASCII point file contains a few outliers. One could remove those points by deleting the corresponding lines; this, however, requires an additional documentation step, or else the outlier information is lost (or at least difficult to reproduce). Using the <commentInitiator> element, the corresponding points can simply be "commented out" for import. Although ASCII formats often use the hash character ("#") for comments, OPALS does not define a default <commentInitiator>.
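The comment-out workflow can be sketched in Python: a line starting with the comment initiator is skipped, and an empty initiator (the OPALS default) disables comment handling altogether. The helper data_lines is hypothetical, and marking the middle point of the earlier xyz sample as an outlier is purely illustrative.

```python
def data_lines(lines, comment_initiator=""):
    """Yield the lines that are not commented out."""
    for line in lines:
        # An empty initiator (the default) disables comment handling.
        if comment_initiator and line.startswith(comment_initiator):
            continue
        yield line


sample = [
    "-31499.980 209747.690 2159.780",
    "#-31499.860 209747.170 2160.710",  # outlier, commented out for import
    "-31499.740 209746.740 2161.780",
]
kept = list(data_lines(sample, "#"))
print(len(kept))  # 2
```

The outlier stays in the file, so the decision to exclude it remains documented and can be reverted by removing a single character.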

Unless <skipWhiteSpace> is set to false, white space is skipped during the import of ASCII files.

Definition of Binary Formats

When accessing binary files, it is essential to specify the correct externalType using one of the XML representations listed in Supported Data Types: one needs to know beforehand whether, e.g., point coordinates are stored as single (float) or double precision (double) real numbers. By default, OPALS assumes externalType to be identical to the internal type.
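Why the external type matters can be shown with Python's struct module. This sketch lays out one x, y, z record either as three doubles or as three floats followed by 4 ignored bytes (standing in for a <skipBytes> element that skips 4 bytes). The layout is an illustrative assumption, not a format OPALS prescribes.

```python
import struct

# "<" selects little-endian byte order with standard sizes and no padding.
AS_DOUBLE = "<3d"        # externalType double for x, y, z: 3 x 8 bytes
AS_FLOAT_SKIP = "<3f4x"  # externalType float for x, y, z, then 4 skipped bytes

point = (-31499.98, 209747.69, 2159.78)
record_double = struct.pack(AS_DOUBLE, *point)
record_float = struct.pack(AS_FLOAT_SKIP, *point)

print(struct.calcsize(AS_DOUBLE), struct.calcsize(AS_FLOAT_SKIP))  # 24 16

# Decoding with the correct external type recovers the coordinates, but
# single precision cannot represent 209747.69 exactly.
y_from_float = struct.unpack(AS_FLOAT_SKIP, record_float)[1]

# Decoding double-precision bytes as floats yields meaningless numbers.
garbage = struct.unpack("<3f", record_double[:12])
```

Reading with the wrong record size also shifts every subsequent record, so a wrong externalType usually corrupts the whole file, not just one value.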

The binary representation of numeric data types is generally not portable across processor architectures. As an advanced feature, OPALS therefore allows the external endianness to be specified (<endian>); by default, native endianness is used.
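The endianness issue can be sketched with Python's struct byte-order prefixes. The names "little", "big" and "native" in BYTE_ORDER are hypothetical and need not match the values accepted by the <endian> element; the point is only that the same bytes decode correctly only under the byte order they were written with.

```python
import struct
import sys

# Hypothetical endianness names mapped to struct byte-order prefixes.
BYTE_ORDER = {"little": "<", "big": ">", "native": "="}


def read_doubles(data, count, endian="native"):
    """Decode `count` doubles from `data` using the given byte order."""
    return struct.unpack(BYTE_ORDER[endian] + f"{count}d", data)


value = 2159.78
big_endian_file = struct.pack(">d", value)  # e.g. written on a big-endian system

print(sys.byteorder)  # native byte order of the machine running this sketch
print(read_doubles(big_endian_file, 1, "big"))  # (2159.78,)
```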

Examples

Example OFD file 'simpleAscii.xml'

The file can be found in the $OPALS_ROOT/addons/formatdef/ directory.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Exemplary ASCII format definition that imports/exports point coordinates, followed by amplitudes -->
<opalsFormatDefinition>
  <description>X,Y,Z,Amplitude ASCII Format</description>
  <ascii>
    <header>
      <text val="#This line could provide information about the file content" />
    </header>
    <data>
      <entry val="x" format="12.3" />
      <entry val="y" format="12.3" />
      <entry val="z" format="9.3" />
      <entry val="Amplitude" format="9.3" />
    </data>
  </ascii>
</opalsFormatDefinition>

Example OFD file 'simpleBinary.xml'

The file can be found in the $OPALS_ROOT/addons/formatdef/ directory.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Exemplary binary format definition that imports/exports point coordinates, followed by amplitudes -->
<opalsFormatDefinition>
  <description>X,Y,Z,Amplitude Binary Format</description>
  <binary>
    <data>
      <entry val="x" />
      <entry val="y" />
      <entry val="z" />
      <entry val="Amplitude" />
    </data>
  </binary>
</opalsFormatDefinition>

Author: jo, wk
Date: 06.07.2011