Feedback/visual inspection can be carried out after the first tile has been processed
Errors in processing do not stop the whole workflow, and the sources of errors (e.g. invalid geometries) are easier to locate
Smaller files allow for easier handling e.g. in interactive editing environments
Corrupt files only affect single tiles, which can easily be reprocessed
Processing can be parallelized over multiple threads or even multiple computers

The OPALS tileManager provides a Python class that handles the tasks of

1) creating the appropriate tiling based on different options (see Options for tiling)

2) cutting the input file(s) into the respective tiles, including optional buffering

3) providing the input file(s) per tile to the processing script via Python's iterator protocol

4) merging the processed files to create an output mosaic

Note that the tileManager is not meant to be called from the command line, but rather from user-provided Python scripts.

Input data and metadata format

The tileManager supports the definition of different input data sets, e.g. a raster model and a point cloud, that both have to be tiled and supplied to the processing script simultaneously. An application example (also see Example scripts) is the normalization of a point cloud by an external digital elevation model. Each of these input data sets may come in its own tiling, and these tilings need to be combined into a new, comprehensive tiling.

In the tileManager, this information is provided to the class upon initialization in the form of a list of python dicts.

def __init__(self, logger, inFileDicts, tilingStr, namingStr, tempdir, buffer=0, aoi=None, resultFileNames=None, skipIfExists=0, shapefileExport=False):

...

Here, inFileDicts tells the tileManager (1) what data type to expect, (2) where to look for the input files and (3) whether the input type is required, i.e. whether a tile can be processed when there is no data of this type. A single entry in this list may thus look like this:

{'files': [r'strip1.odm',
           r'strip2.odm',
           r'strip3.odm',
           r'strip4.odm'],  # list of files (relative or absolute paths, no wildcards * or ?)
 'type': 'odm', # either odm or raster
 'name': 'points', # any name that can be used as a dict key and a (windows) filename
 'required': True}

The files key points to a list of input files, which may overlap: overlapping point clouds are merged, and for overlapping raster files the value from the first file in the list with a valid pixel is used. The type key is either odm (for point clouds) or raster (for any GDAL-supported raster file). The name is an identifier that is used both as the dictionary key under which the tiles are provided to the processing script and as the file name of the individual tiled files. Finally, required is a boolean value. If a tile does not contain any information (points or valid pixels) in any of the data types marked as required, the tile is skipped (i.e., it is not provided to the processing script). Data types that are not required may be empty for a given tile, and it is up to the processing script to deal with the missing information.
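The conventions above can be checked before the list is handed to the tileManager. The following is a minimal sketch; validate_in_file_dict is a hypothetical helper, not part of OPALS, and its file-name check (rejecting characters that Windows does not allow in file names) is an assumption about what "usable as a (windows) filename" means.

```python
# Hypothetical helper (not part of OPALS): checks one inFileDicts entry
# against the conventions described above.
VALID_TYPES = {"odm", "raster"}
REQUIRED_KEYS = {"files", "type", "name", "required"}
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')  # characters not allowed in Windows file names


def validate_in_file_dict(entry):
    """Return a list of problems found in one entry (an empty list means OK)."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))

    files = entry.get("files")
    if not isinstance(files, list) or not files:
        problems.append("'files' must be a non-empty list")
    else:
        for f in files:
            if "*" in f or "?" in f:  # wildcards must be expanded beforehand
                problems.append(f"wildcard in file name: {f}")

    if entry.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")

    if not isinstance(entry.get("required"), bool):
        problems.append("'required' must be True or False")

    name = entry.get("name")
    if not isinstance(name, str) or not name or set(name) & WINDOWS_FORBIDDEN:
        problems.append(f"name is not usable as a file name: {name!r}")
    return problems
```

A script could, for instance, refuse to start unless validate_in_file_dict returns an empty list for every entry.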

Multiple inputs of the same data type may be defined, e.g. there may be two sets of rasters (say, a DTM and a DSM) that need to be processed together (to create a normalized DSM). In this case, inFileDicts would be a list with two dicts:

inFileDicts = [
    {'files': [r'DSM\tile1.tif',
               r'DSM\tile2.tif',
               r'DSM\tile3.tif',
               r'DSM\tile4.tif'],
     'type': 'raster',
     'name': 'dsm',
     'required': True},
    {'files': [r'DTM\tile1.tif',
               r'DTM\tile2.tif',
               r'DTM\tile3.tif',
               r'DTM\tile4.tif'],
     'type': 'raster',
     'name': 'dtm',
     'required': True}
]
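Since the files lists must not contain wildcards, a script will usually expand file patterns itself, as the glob.glob(options.i) call in the example under "Running tileManager from python" does. A minimal sketch of a hypothetical helper (not part of OPALS) that builds one entry from a pattern:

```python
import glob


def make_input_dict(pattern, data_type, name, required=True):
    """Build one inFileDicts entry from a glob pattern (hypothetical helper)."""
    files = sorted(glob.glob(pattern))  # expand * and ? here; sorted for a stable order
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return {'files': files, 'type': data_type, 'name': name, 'required': required}
```

With it, the DSM/DTM list above could be written as [make_input_dict(r'DSM\*.tif', 'raster', 'dsm'), make_input_dict(r'DTM\*.tif', 'raster', 'dtm')]. The order of the list matters for rasters, because where raster inputs overlap the first file with a valid pixel is used; sorting makes that order reproducible.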

Customization options

Options for tiling

The tileManager allows the user to provide different options on how tiling should be carried out:

empty string, False or None: Merge all input files into one tile covering the whole area of interest
shapefile: Path to a polygon shapefile. The polygons in this shapefile will be used to create tiles
rows and columns: a string containing numbers of rows and columns, e.g. 5r2c, meaning 5 rows and 2 columns, which would result in 10 tiles. The order may be switched as long as the indicators remain: 2c5r is the same as 5r2c.
size: a string with one or two numbers specifying the size of a tile (in units of the coordinate system of the input data). If only one value is provided, square tiles are assumed: '5000' creates tiles covering 5000 x 5000 units, while '200 300' creates tiles of 200 units in x- and 300 units in y-direction.
name of an input type: this copies the tiling from an input source, e.g. dtm in the example above. All other datasets will be tiled to match the files of dtm. Note that this assumes axis-aligned tiles.
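The tiling strings above follow a small grammar. As an illustration only, the sketch below classifies a tiling string and splits an extent into rows by columns boxes. parse_tiling and rows_cols_bboxes are hypothetical helpers; detecting shapefiles by their .shp suffix, treating any other string as the name of an input type, and ordering tiles row by row from the lower-left corner are assumptions, not documented tileManager behavior.

```python
import re


def parse_tiling(tiling):
    """Classify a tiling string (hypothetical re-implementation of the rules above)."""
    if tiling in ("", False, None):
        return ("single", None)  # one tile covering the whole area of interest
    text = str(tiling).strip()
    m = re.fullmatch(r"(\d+)r(\d+)c|(\d+)c(\d+)r", text)  # '5r2c' or '2c5r'
    if m:
        if m.group(1):
            rows, cols = int(m.group(1)), int(m.group(2))
        else:
            cols, rows = int(m.group(3)), int(m.group(4))
        return ("rowscols", (rows, cols))
    try:
        sizes = [float(p) for p in text.split()]
    except ValueError:
        sizes = []
    if len(sizes) == 1:
        return ("size", (sizes[0], sizes[0]))  # one value: square tiles
    if len(sizes) == 2:
        return ("size", (sizes[0], sizes[1]))  # x size, y size
    if text.lower().endswith(".shp"):  # assumption: shapefiles recognized by suffix
        return ("shapefile", text)
    return ("input", text)  # assumption: anything else names an input type, e.g. 'dtm'


def rows_cols_bboxes(xmin, ymin, xmax, ymax, rows, cols):
    """Split an extent into rows x cols equal, axis-aligned (xmin, ymin, xmax, ymax) boxes."""
    dx = (xmax - xmin) / cols
    dy = (ymax - ymin) / rows
    return [(xmin + c * dx, ymin + r * dy, xmin + (c + 1) * dx, ymin + (r + 1) * dy)
            for r in range(rows) for c in range(cols)]
```

For '5r2c' over an extent of 100 x 50 units, this yields ten boxes of 50 x 10 units each.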

Options for naming

Similarly, there are different options for naming the tiles, depending on the tiling concept used:

shp.ATTR works when using shapefile tiling and takes the name from the ATTR column of the shapefile
inp.TYPE works when tiling by the name of an input type and copies the tile names from the input files
corner coordinates works with the size and rows and columns tiling concepts. Here, the reference point (one of LL, UL, LR, UR), a delimiter character and the number of digits have to be provided. For example, UL+4 refers to the upper left coordinate of the tile, using + as the delimiter and 4 digits. LL_6 refers to the lower left corner, using _ as the delimiter and 6 digits. Please ensure that the number of digits matches the size of your tiles, so that no two tiles get the same name.
index or numbered also works with the size and rows and columns tiling concepts. Using INDEX, a two-dimensional index is created (starting with 0000_0000); using NUMBERED, a one-dimensional index is created (starting with 000001).
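The naming strings can be decomposed in the same spirit. The sketch below is illustrative only: parse_corner_naming, corner_point, index_name and numbered_name are hypothetical helpers, the padding widths are inferred from the documented start values 0000_0000 and 000001, and the order of the two INDEX components is an assumption. How the selected corner coordinate is finally turned into a string of the given number of digits is not specified above, so the sketch stops at selecting the reference corner.

```python
import re


def parse_corner_naming(naming):
    """Split a corner-coordinate naming string such as 'UL+4' into its parts."""
    m = re.fullmatch(r"(LL|UL|LR|UR)(\D)(\d+)", naming)
    if m is None:
        raise ValueError(f"not a corner-coordinate naming string: {naming!r}")
    return m.group(1), m.group(2), int(m.group(3))  # corner, delimiter, digits


def corner_point(bbox, naming):
    """Return the reference corner (x, y) of bbox = (xmin, ymin, xmax, ymax)."""
    corner, _, _ = parse_corner_naming(naming)
    xmin, ymin, xmax, ymax = bbox
    y = ymin if corner[0] == "L" else ymax  # first letter: Lower / Upper
    x = xmin if corner[1] == "L" else xmax  # second letter: Left / Right
    return x, y


def index_name(ix, iy):
    """INDEX naming: 0-based two-dimensional index (assumed order: x, then y)."""
    return f"{ix:04d}_{iy:04d}"


def numbered_name(i):
    """NUMBERED naming for the i-th tile (0-based i), starting at 000001."""
    return f"{i + 1:06d}"
```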

Running tileManager from python

Since the tileManager is not a stand-alone script but is meant to be run from Python, the tileManager class implements the __iter__ and next methods required to iterate over the dataset. Assuming a processing script has a function run_tile(...) that accepts a dictionary as input, the tileManager may be called like this:

import glob

# tileManager, logger, options (parsed command-line arguments, options.i being
# a file pattern) and run_tile are assumed to be provided by the surrounding script.
inp_files = [{'files': glob.glob(options.i),
              'type': 'odm',
              'name': 'pointcloud',
              'required': True}]
tiling_concept = '2r2c'  # two rows, two columns
naming_concept = 'LL_6'  # lower left coordinate, 6 digits, '_'-separated
tempdir = 'temp'

tm = tileManager.preTileManager(logger, inp_files, tiling_concept, naming_concept, tempdir)
for tile in tm:
    run_tile(tile)
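In the loop above, the first exception raised by run_tile ends the whole run. To keep the benefit that errors do not stop the whole workflow, a script may catch errors per tile instead. A minimal sketch; process_all is a hypothetical helper, not part of OPALS:

```python
import traceback


def process_all(tiles, run_tile):
    """Call run_tile for every tile; failures are recorded instead of aborting the loop."""
    done, failed = [], {}
    for tile in tiles:
        name = tile.get("tile_name", "<unnamed>")
        try:
            run_tile(tile)
            done.append(name)
        except Exception:  # keep going; store the traceback for later inspection
            failed[name] = traceback.format_exc()
    return done, failed
```

Used as done, failed = process_all(tm, run_tile), the failed tiles can afterwards be inspected and reprocessed individually.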

After instantiating the tileManager, the list of tiles may also be retrieved from the tileManager object:

tile_list = list(tm)

Note that this already cuts the input files to the appropriate tiles, which may take some time depending on the input datasets. To skip the cutting and just list the tiles, refer to the next section.
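Because list(tm) returns tiles that have already been cut, the per-tile work can then be distributed, for example over a thread pool from Python's concurrent.futures. This is a sketch under assumptions: process_parallel is a hypothetical helper, and whether threads actually speed things up depends on run_tile spending most of its time outside the Python interpreter (e.g. inside OPALS modules), which is not guaranteed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_parallel(tiles, run_tile, max_workers=4):
    """Run run_tile on every tile in a thread pool.

    Returns a dict mapping tile_name to run_tile's return value, or to the
    exception it raised, so one failing tile does not affect the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_tile, tile): tile.get("tile_name", "<unnamed>")
                   for tile in tiles}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep the exception as the tile's result
                results[name] = exc
    return results
```

A call would look like results = process_parallel(list(tm), run_tile).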

Both the list items and the iterated items (tile in the example above) are again Python dicts. These tile dicts carry information on the tile and on the files belonging to it. An example tile dict looks like this:

{'bbox': <-2047.2530518174171, 338947.34375, -1860.2530518174171, 339043.84375>,
 'buffered_region_filter': 'Region[-2047.2530518174171 338947.34375 -2047.2530518174171 339043.84375 -1860.2530518174171 339043.84375 -1860.2530518174171 338947.34375 -2047.2530518174171 338947.34375]',
 'cut_status': True,
 'pointcloud': 'temp\\000001\\tempV_pointcloud.odm',
 'region_filter': 'Region[-2047.25305182 338947.34375 -2047.25305182 339043.84375 -1860.25305182 339043.84375 -1860.25305182 338947.34375]',
 'tile_id': 3,
 'tile_name': '000001',
 'type': 'bbox'}

Note that in this case bbox, buffered_region_filter and region_filter all describe the same rectangle, because a tiling based on rows and columns was used and the optional buffer was not set (so it defaults to 0). The pointcloud entry refers to the input dataset with 'name': 'pointcloud' and tells the script where to find the respective input file for this tile.
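To check such a tile programmatically, the Region[...] strings shown above can be parsed into vertices. A minimal sketch; parse_region_filter and polygon_bbox are hypothetical helpers that only handle the simple Region[x1 y1 x2 y2 ...] form from the example, not general OPALS filter syntax.

```python
import re


def parse_region_filter(text):
    """Return the (x, y) vertices of a filter string of the form 'Region[x1 y1 x2 y2 ...]'."""
    m = re.fullmatch(r"\s*Region\[([^\]]*)\]\s*", text)
    if m is None:
        raise ValueError(f"not a simple Region filter: {text!r}")
    values = [float(v) for v in m.group(1).split()]
    if len(values) % 2:
        raise ValueError("odd number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def polygon_bbox(vertices):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a list of vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)
```

Applied to the example tile, region_filter yields four vertices and buffered_region_filter five (its first vertex is repeated to close the ring), while both bounding boxes agree up to the rounding of the printed coordinates.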

Delayed cutting

For distributed processing it may be useful to delay the cutting of the tiles and only process the meta-information, i.e. create the folders and the tiling concept. For this, the tileManager class can be instantiated with the do_cut=False keyword. Internally, when the tileManager is iterated over for the first time, a tiling function and a cutting function are called, which rely on the preTiling and preCutting scripts in the $OPALS_ROOT/opals/workflows folder. With do_cut=False, no cutting takes place in these scripts, and the cut_status item of the tile dict is set to False, indicating that the file has not been cut yet. The processing script may then either (a) use the list of input files it provided, together with the buffered_region_filter, to carry out its own cutting, or (b) run tm.cut(do_cut=True) to repeat the cutting step, this time actually creating the tiles.
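Option (a) amounts to a small decision per input type. The sketch below assumes that the per-tile file path is only meaningful when cut_status is True; tile_inputs is a hypothetical helper, and how the returned filter string is then passed to the modules that read the original files depends on the processing script.

```python
def tile_inputs(tile, in_file_dict):
    """Return (files, region_filter) to read for one input type of a tile dict.

    If the tile has been cut, the per-tile file is used and no extra filter is
    needed. Otherwise the original input files are returned together with the
    buffered region filter that the script has to apply when cutting itself.
    """
    if tile.get("cut_status"):
        return [tile[in_file_dict["name"]]], None
    return list(in_file_dict["files"]), tile["buffered_region_filter"]
```

The same processing script can then run unchanged whether the tileManager was created with do_cut=False or with cutting enabled.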

Example scripts

Two example scripts are provided in the OPALS demo directory. The first script, tileManagerDemo1.py, creates a 2-by-2 tiling without overlap and calculates a DSM per tile using Module DSM:

opalsImport -inf G111.las
opalsImport -inf G112.las
opalsImport -inf G113.las
opals tileManagerDemo1.py -i G11?.odm

The second script, tileManagerDemo2.py, is a bit more elaborate and implements a run_tile function. It takes a point cloud and a raster as input. First, a smoothed version of the raster is calculated by running Module StatFilter; then the point cloud is normalized by the raster using Module Algebra. Since StatFilter uses a search radius, the raster has to be buffered first; the normalized point cloud is then calculated and finally cut to the net size of the tile before it can be merged.
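The buffer-then-crop pattern can be illustrated without OPALS. The sketch assumes the buffer is chosen at least as large as the StatFilter search radius, so that every cell of the net tile sees its complete neighborhood. buffered_bbox and crop_to_net are hypothetical helpers working on plain (x, y) tuples, and the half-open crop is a design choice of this sketch, not necessarily what tileManagerDemo2.py does.

```python
def buffered_bbox(bbox, buffer):
    """Grow bbox = (xmin, ymin, xmax, ymax) by `buffer` on every side."""
    xmin, ymin, xmax, ymax = bbox
    return xmin - buffer, ymin - buffer, xmax + buffer, ymax + buffer


def crop_to_net(points, bbox):
    """Keep only points inside the net (unbuffered) tile.

    The interval is half-open (min inclusive, max exclusive), so a point on a
    shared tile edge ends up in exactly one tile and is not duplicated by the merge.
    """
    xmin, ymin, xmax, ymax = bbox
    return [p for p in points if xmin <= p[0] < xmax and ymin <= p[1] < ymax]
```

Processing then works on buffered_bbox(net_bbox, radius), and only crop_to_net(result, net_bbox) is kept for merging, so results near tile edges do not depend on where the tile boundaries lie.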

opals _import.py -i strip??.laz -o .

opals tileManagerDemo2.py -i strip??.odm -r strips_dtm.tif

Author: lwiniwar

Date: 19.04.2021
