When working with large (i.e., country-wide) datasets, workflows are usually carried out on a per-tile level. The tiles are processed individually and merged once all of them have been completed. This has several advantages over processing the full point cloud at once.
The OPALS tileManager provides a python class to handle the task of
1) creating the appropriate tiling based on different options (see Options for tiling)
2) cutting the input file(s) into the respective tiles, including optional buffering
3) providing the input file(s) per tile to the processing script via python iterator functionality
4) merging the processed files to create an output mosaic
Note that tileManager is not meant to be called from the command-line, but rather from user-provided python scripts.
The tileManager supports the definition of different input data sets, e.g. a raster model and a point cloud, that both have to be tiled and supplied to the processing script simultaneously. An application example (also see Example scripts) for this could be the normalization of a point cloud by an external digital elevation model. Each of these input data sets may be provided in its own tiling; these tilings are then combined to create a new, comprehensive tiling.
In the tileManager, this information is provided to the class upon initialization in the form of a list of python dicts.
Here, inFileDicts tells the tileManager (1) what data type to expect, (2) where to look for the input files, and (3) whether the input type is required, i.e. whether a tile can still be processed if there is no data of this type. A single entry in this list may thus look like this:
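A minimal sketch of one such entry; the key names follow the description below, but the file paths are invented placeholders:

```python
# Hypothetical single inFileDicts entry; paths are placeholders.
entry = {
    "files": ["strip1.laz", "strip2.laz"],  # input files, may overlap
    "type": "odm",                          # "odm" (point cloud) or "raster"
    "name": "pointcloud",                   # identifier for tile dicts and file names
    "required": True,                       # skip tiles lacking data of this type
}
```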
The `files` key points to a list of input files, which may be overlapping (in which case point clouds are merged, and for raster files the first file providing a valid pixel value is used). `type` can be either `odm` (for point clouds) or `raster` (for any GDAL-supported raster file). `name` is an identifier that is used both as a dictionary key for the provision of the tiles to the processing script and as the file name for the individual tiled files. Finally, `required` is a boolean value. If a tile does not contain any information (point or valid pixel) in one of the data types marked as required, the tile is skipped (i.e., not provided to the processing script). Otherwise, the data type may be empty, and it is up to the processing script to deal with the missing information.
Multiple entries of the same data type may be defined, e.g. there may be two sets of rasters (say, a DTM and a DSM) that need to be processed together (to create a normalized DSM). In this case, inFileDicts would be a list with two dicts:
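Under the same assumptions about the key names, such a two-dataset configuration might be sketched as follows (the file names are placeholders):

```python
# Hypothetical inFileDicts with two raster datasets (DTM and DSM).
inFileDicts = [
    {"files": ["dtm.tif"], "type": "raster", "name": "dtm", "required": True},
    {"files": ["dsm.tif"], "type": "raster", "name": "dsm", "required": True},
]
```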
The tileManager allows the user to provide different options on how tiling should be carried out:
- `False` or `None`: merge all input files into one tile covering the whole area of interest.
- A row/column specification, e.g. `5r2c`, meaning 5 rows and 2 columns, which would result in 10 tiles. The order may be switched as long as the indicators remain: `2c5r` is the same as `5r2c`.
- A tile size: `5000` will create tiles covering 5000x5000 units, `200 300` will create tiles of 200 units in x- and 300 units in y-direction.
- The name of one of the input data sets, e.g. `dtm` in the example above. All other datasets will be tiled to match the files of `dtm`. Note that this assumes axis-aligned tiles.

Similarly, there are different options for naming the tiles, depending on the tiling concept used:
- If the tiling is based on a shapefile, the tile names may be taken from an attribute (`ATTR`) column of the shapefile.
- Alternatively, one of the corner coordinates (`LL`, `UL`, `LR`, `UR`) has to be provided, along with a character for delimitation and the number of digits. For example, `UL+4` refers to the upper left coordinate of the tile, using `+` as a delimiter and 4 digits; `LL_6` refers to the lower left corner, uses `_` as a delimiter and 6 digits. Please ensure that the number of digits matches the size of your tiles, so that no two tiles get the same name.
- Using `INDEX`, a two-dimensional index is created (starting with `0000_0000`); using `NUMBERED`, a one-dimensional index is created (starting with `000001`).

Since tileManager is not a stand-alone script, but meant to be run from python, the tileManager class implements the `__iter__` and `next` methods required to iterate over the dataset. Assuming a processing script has a function `run_tile(...)` which accepts a dictionary as an input, tileManager may be called like this:
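The iteration pattern can be sketched with a minimal stand-in class, since the real tileManager requires an OPALS installation; `TileManagerStub` and its contents are purely illustrative, and only the iterator protocol mirrors the behaviour described in the text:

```python
class TileManagerStub:
    """Hypothetical stand-in mimicking tileManager's iterator protocol."""
    def __init__(self, tiles):
        self._tiles = tiles           # list of tile dicts

    def __iter__(self):
        return iter(self._tiles)      # yields one tile dict per iteration

def run_tile(tile):
    """User-provided processing function; here it just records the tile name."""
    processed.append(tile["name"])

processed = []
tm = TileManagerStub([{"name": "0000_0000"}, {"name": "0000_0001"}])
for tile in tm:                       # as with the real tileManager
    run_tile(tile)
print(processed)                      # -> ['0000_0000', '0000_0001']
```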
After instantiating the tileManager, the list of tiles may also be retrieved from the tileManager object:
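Relying only on the iterator protocol, one way to materialize that list is shown below; the stub stands in for an instantiated tileManager (the real class may also offer a dedicated accessor):

```python
class TileManagerStub:
    """Hypothetical stand-in for an instantiated tileManager."""
    def __iter__(self):
        return iter([{"name": "000001"}, {"name": "000002"}])

tm = TileManagerStub()
tiles = list(tm)    # with the real tileManager this triggers the cutting step
print(len(tiles))   # -> 2
```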
Note that this already cuts the input files to the appropriate tiles, which may take some time depending on the input datasets. To skip the cutting and just list the tiles, refer to the next section.
Both the list items and the iterated items (tile in the example above) are again python dicts. These tile dicts carry information on the tile and the files that concern it. An example tile dict looks like this:
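A hypothetical example of such a tile dict, matching the description in the following paragraph; the key names follow the text, but the value shapes, coordinates, and paths are assumptions:

```python
# Hypothetical tile dict; value shapes, coordinates and paths are assumptions.
tile = {
    "name": "0000_0000",
    "bbox": (0.0, 0.0, 5000.0, 5000.0),                     # net extent of the tile
    "region_filter": (0.0, 0.0, 5000.0, 5000.0),            # same box for row/column tiling
    "buffered_region_filter": (0.0, 0.0, 5000.0, 5000.0),   # identical here: buffer is 0
    "cut_status": True,                                     # input files have been cut
    "pointcloud": "tiles/0000_0000/pointcloud.odm",         # per-dataset tiled file
}
```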
Note that in this case, bbox, buffered_region_filter and region_filter all refer to the same rectangular box, because a tiling based on rows and columns was used. Furthermore, the optional buffer was not set, so it is 0. The pointcloud entry refers to the input dataset with `'name': 'pointcloud'` and tells the script where to find the respective input file for this tile.
For distributed processing, it may be of interest to delay the cutting of the tiles and only process the meta information, creating folders and tiling concepts. For this, the tileManager class can be instantiated with the do_cut=False keyword. When iterating over the tileManager for the first time, a tile- and a cut-function are called internally, which rely on the preTiling and preCutting scripts in the $OPALS_ROOT/opals/workflows folder. With do_cut=False, no cutting takes place in these scripts. As a result, the cut_status item of the tile dict is set to False, indicating that the files have not been cut yet. The processing script may then either (a) use the list of input files provided along with the buffered_region_filter to carry out its own cutting, or (b) run tm.cut(do_cut=True) to repeat the cutting step and actually create the tiles.
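The resulting branching in a processing script might be sketched as follows; the cut_status key follows the text, while `handle_tile` and the returned labels are purely illustrative:

```python
def handle_tile(tile):
    """Illustrative dispatch on the cut_status flag of a tile dict."""
    if tile["cut_status"]:
        return "process"     # files already cut: process the tile directly
    # Not cut yet: either cut locally using buffered_region_filter,
    # or call tm.cut(do_cut=True) to repeat the cutting step for all tiles.
    return "cut-first"

print(handle_tile({"cut_status": True}))   # -> process
print(handle_tile({"cut_status": False}))  # -> cut-first
```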
Two example scripts are provided in the OPALS demo directory. The first script, tileManagerDemo1.py, creates a 2-by-2 tiling with no overlap and calculates a per-tile DSM using Module DSM.
The second script, tileManagerDemo2.py, is a bit more elaborate and implements a run_tile function. This script takes a point cloud and a raster. First, a smoothed version of the raster is calculated by running Module StatFilter; then the point cloud is normalized by the raster using Module Algebra. Since StatFilter uses a search radius, the raster must first be buffered; the normalized point cloud is then calculated and cut to the net size of the tile before it can be merged.