The Python binding (pyDM) of the C++ OPALS Datamanager Library exposes its functionality to Python. Except for some minor naming differences and a few differences caused by the language itself (mainly due to the Python iterator concept), the Python binding matches the C++ interface almost completely.
Before using the DM library, you should at least read the section The C++ OPALS Datamanager Library, since it describes the central implementation concepts of the library. Then have a look at the examples section, which should give you a good starting point for your development. Each example is given in C++ and Python, which makes the high similarity of the two interfaces easy to see.
There are three major differences, which are described in detail below:
The static 'New' functions (construction of objects) in C++ were translated to the native Python object constructors ('__init__'). The following example shows the creation of a DM point (coordinates set to 0/0/0) in C++
and in Python
Although Python and C++ both support iterators, there are significant differences in their usage. The following code snippet shows how to iterate over all points of an ODM in C++
and in Python
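To illustrate the Python side of the iterator difference in general terms, here is a minimal sketch of the Python iterator protocol. It deliberately does not use pyDM: PointStore is a hypothetical stand-in class invented for this illustration, and its constructor and contents are assumptions, not the pyDM API.

```python
# Hypothetical stand-in for a point container; NOT the pyDM API.
class PointStore:
    def __init__(self, coords):
        # coords: iterable of (x, y, z) tuples
        self._coords = list(coords)

    def __iter__(self):
        # Python's iterator protocol: hand out one element at a time.
        # No explicit begin()/end() pair or manual increment is needed.
        for xyz in self._coords:
            yield xyz


store = PointStore([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])

# The for statement calls iter(store) and next() behind the scenes.
z_values = [z for (_x, _y, z) in store]
print(z_values)

# Equivalent explicit form, closer in spirit to a C++ iterator loop.
it = iter(store)
z_explicit = []
while True:
    try:
        _x, _y, z = next(it)
    except StopIteration:
        break  # plays the role of the "it != end" test in C++
    z_explicit.append(z)
```

In Python the end of the sequence is signalled by the StopIteration exception rather than by comparing against an end iterator, which is why a plain for loop is usually all that is required.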
C++ class name | Python class name | Description |
---|---|---|
DM::IPoint | pyDM.Point | 3d point object |
DM::IBox | pyDM.Box | 3d box object |
DM::IDatamanager | pyDM.Datamanager | ODM object |
Python is a powerful scripting language and therefore an efficient tool for prototyping. Thanks to the huge number of extensions (many optimised C++ libraries provide a Python binding, e.g. SciPy, NumPy, ...), Python has turned into a programming language for real-life software projects. In terms of performance, however, Python cannot compete with C++ or similar languages. This is of minor concern in 'low performance' applications or if time-critical sections are computed by optimised libraries 'outside' of Python. The Python bindings of the OPALS modules represent the latter case. The Python binding of the DM provides low level access to the objects within an ODM and allows manipulating or processing ODMs in a similar way as existing OPALS modules do. Consequently, processing huge point clouds on a per-point basis within Python is not recommended. The bindings are useful for testing new processing strategies on small test sets, but for huge data sets (billions of points) it is recommended to switch to C++ or to use high level functionality only (e.g. Datamanager.getHistogramSet, RConverter, etc.). Depending on the task, single point processing in Python is between 3 and 10 times slower than in C++ (without any optimisation in C++).
As mentioned before, the pyDM interface provides low level access to points and their attributes, which can be quite slow. For certain operations it is much faster to retrieve points and attributes as NumPy arrays and to use optimised NumPy functions for processing. For example, OPALS uses NumPy arrays to pass attribute information (as files) to R within the tree based classification. To support different attribute data types, pyDM works with dictionaries of one dimensional NumPy arrays (similar to the pandas DataFrame class).
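The difference between per-point processing and array processing can be sketched with plain NumPy. The dictionary below is built by hand as a stand-in for data retrieved from an ODM; its keys and value ranges are assumptions made for this illustration, and no pyDM call is involved.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hand-built stand-in for a dictionary of one dimensional NumPy arrays.
# Keys and value ranges are illustrative assumptions only.
points = {
    "x": rng.uniform(0.0, 100.0, size=10_000),
    "y": rng.uniform(0.0, 100.0, size=10_000),
    "z": rng.uniform(200.0, 250.0, size=10_000),
}


def mean_z_loop(d):
    """Per-point processing in pure Python (slow for large arrays)."""
    total = 0.0
    for z in d["z"]:
        total += float(z)
    return total / len(d["z"])


def mean_z_vectorised(d):
    """Same quantity computed by an optimised NumPy reduction."""
    return float(d["z"].mean())


print(mean_z_loop(points), mean_z_vectorised(points))
```

Both functions return the same mean height up to floating-point rounding. The vectorised version keeps the loop inside NumPy's compiled code, which is the effect that makes array-based retrieval attractive.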
As shown in the example DM_numpy_spatial_query.py, the results of spatial queries can be retrieved as a dictionary of NumPy arrays (subsequently referred to as NumPy dicts) using the pyDM.NumpyConverter class. Attributes of interest need to be defined in the corresponding query layout. The x, y and z coordinates are also added to the dict as NumPy arrays if withCoordinates is activated. The following code snippet
which will print
As can be seen, the array dtype matches the attribute type in the ODM (see also ODM as a database table).
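The shape of such a NumPy dict can be pictured with the following sketch. The dictionary is constructed by hand rather than returned by pyDM.NumpyConverter, and the attribute names and dtypes are assumptions chosen for illustration only.

```python
import numpy as np

# Hand-built stand-in for a NumPy dict; keys and dtypes are hypothetical.
numpy_dict = {
    "x": np.array([10.0, 11.5, 12.25], dtype=np.float64),
    "Amplitude": np.array([35.5, 40.25, 18.0], dtype=np.float32),
    "EchoNumber": np.array([1, 2, 1], dtype=np.uint8),
}

# Every column is a one dimensional array with its own dtype,
# while all columns share the same length (one entry per point).
for name, column in numpy_dict.items():
    print(name, column.dtype, column.shape)

dtypes = {name: column.dtype.name for name, column in numpy_dict.items()}
```

Keeping one array per attribute is what allows each column to carry its own data type, which a single two dimensional array could not do.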
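A central feature of the ODM is that it supports null values for attributes. For this purpose, the ODM uses separate null flags (rather than a specific no data value). When querying NumPy dicts, one therefore needs to consider how null values should be translated to NumPy arrays. pyDM supports three different ways: raising an exception as soon as a null value is encountered (the behaviour when no noDataObj is given), replacing null values with a no data value (either one value for all attributes or a list of values matching the number of layout columns), and returning masked NumPy arrays.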
The null value conversion method is controlled by the noDataObj parameter. The DM_numpy_null_value.py example demonstrates all three methods of null value handling by creating a few random points with two attributes. Whereas _attr1 is set for all points, _attr2 is filled for only half of the points. Hence, choosing the correct null value handling method is essential when querying _attr2.
As shown in the following code snippet, only _attr1 (referred to as shortLayout) can be converted without specifying the noDataObj parameter. Trying to retrieve both attributes (without noDataObj) will lead to an exception.
To overcome this exception, one needs to set the noDataObj parameter. As mentioned above, it is possible to provide one value for all attributes or a list of values matching the number of layout columns. The code example below uses 'min' in the first call and '[0, numpy.nan]' in the second call. 'min' results in the lowest possible value of the corresponding attribute type, which is 0 for _attr1 (type: uint16) and -3.40282347e+38 for _attr2 (type: float). In the second call, 0 and NaN (Not a Number; only possible for floating-point data types) are used as no data values. Attention: pyDM doesn't check whether any attribute values are equal to the no data values. If this is the case, subsequent processing might interpret valid attribute values as null values.
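The values involved, and the pitfall described in the warning, can be reproduced with plain NumPy. The arrays below are hand-made stand-ins for _attr1 and _attr2 (their contents and the separate null flag array are assumptions for this sketch), and no pyDM call is made.

```python
import numpy as np

# Lowest representable values of the two attribute types.
uint16_min = np.iinfo(np.uint16).min      # 0
float32_min = np.finfo(np.float32).min    # about -3.4028235e+38

# Stand-in data: attr1 genuinely contains a 0, attr2 has two null entries.
attr1 = np.array([0, 7, 12, 3], dtype=np.uint16)
attr2 = np.array([1.5, 0.0, 2.5, 0.0], dtype=np.float32)
attr2_is_null = np.array([False, True, False, True])

# Replacing nulls by NaN works for floating-point types only;
# the null entries can be recovered afterwards with isnan.
attr2_nan = np.where(attr2_is_null, np.float32(np.nan), attr2)
recovered_nulls = np.isnan(attr2_nan)

# Pitfall: with 0 as no data value, the genuine 0 in attr1
# cannot be told apart from a null entry.
false_nulls = int(np.count_nonzero(attr1 == uint16_min))
print(uint16_min, float32_min, recovered_nulls, false_nulls)
```

NaN is a safe choice for floating-point attributes because it never equals a valid measurement. Integer attributes have no such reserved value, so any no data value chosen for them has to lie outside the range that actually occurs in the data.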
To overcome the aforementioned problem, one can retrieve attributes as masked NumPy arrays. This doesn't require the definition of appropriate no data values, since the null flag status is stored in a separate mask array. These MaskedArray objects are natively supported by NumPy, which makes them both flexible and efficient to use. The only downside of masked arrays is the higher memory consumption due to the additional mask array. As shown below, use 'mask' in the noDataObj parameter to retrieve the corresponding objects.
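How a masked array behaves once it has been retrieved can be sketched with NumPy alone. The array below is built by hand from assumed values and null flags; it stands in for what a query would return and does not involve pyDM.

```python
import numpy as np

values = np.array([1.5, 0.0, 2.5, 0.0], dtype=np.float32)
is_null = np.array([False, True, False, True])

# mask=True marks an entry as invalid (here: a null attribute).
attr2 = np.ma.MaskedArray(values, mask=is_null)

valid_count = int(attr2.count())   # number of unmasked entries
valid_mean = float(attr2.mean())   # masked entries are ignored
filled = attr2.filled(np.nan)      # plain ndarray with NaN for nulls

# Memory: the data buffer plus one boolean per element for the mask.
total_bytes = attr2.data.nbytes + np.ma.getmaskarray(attr2).nbytes
print(valid_count, valid_mean, filled, total_bytes)
```

Reductions such as mean skip masked entries automatically, so no sentinel value can leak into the statistics. The cost is the additional mask, here one byte per element on top of the four bytes of each float32 value.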
In some situations it might be beneficial to retrieve the NumPy arrays with a certain value type, instead of the ODM attribute type. Although NumPy arrays can be easily converted (see astype functionality for details), it's an extra step that requires additional memory. Therefore, pyDM provides the valueType parameter when retrieving NumPy arrays. This allows setting one specific value type for all returned objects (setting different value types for different attributes is not possible).
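The extra step mentioned above can be made visible with a short NumPy sketch. The array is a hand-made stand-in for a uint16 attribute column; only standard NumPy calls are used.

```python
import numpy as np

attr1 = np.array([0, 7, 12, 3], dtype=np.uint16)

# astype allocates a new array; the uint16 original stays alive as well
# until it is released.
attr1_f64 = attr1.astype(np.float64)

shares = np.shares_memory(attr1, attr1_f64)  # False: two separate buffers
bytes_original = attr1.nbytes                # 4 elements * 2 bytes
bytes_converted = attr1_f64.nbytes           # 4 elements * 8 bytes
print(shares, bytes_original, bytes_converted)
```

During the conversion both buffers exist side by side, which is the memory overhead that requesting the desired value type directly at retrieval time is meant to avoid.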
As shown below, valueType can be set as NumPy dtype or as pyDM.ColumnType:
Due to the high similarity of the C++ and the Python DM API, the C++ documentation of the DM is 'copied' to Python. The Python documentation may therefore contain links to the C++ documentation. If a function is not documented, it is well worth having a look at the C++ documentation, where it may be described.
Please note that the Python documentation is derived in an external process, which is why the Python DM module only partly contains the corresponding doc strings. Always consult the external documentation as well.
Examples demonstrating the usage of the Python DM API can be found here