When working with large (i.e., country-wide) datasets, workflows are usually carried out on a per-tile level. The tiles are processed individually and merged once all of them have been processed. This has advantages over processing the full point cloud at once.
The OPALS tileManager provides a Python class to handle the tasks of
1) creating the appropriate tiling based on different options (see Options for tiling)
2) cutting the input file(s) into the respective tiles, including optional buffering
3) providing the input file(s) per tile to the processing script via Python iterator functionality
4) merging the processed files to create an output mosaic
Note that tileManager is not meant to be called from the command line, but rather from user-provided Python scripts.
The tileManager supports the definition of different input data sets, e.g. a raster model and a point cloud, that both have to be tiled and supplied to the processing script simultaneously. An application example (also see Example scripts) for this could be the normalization of a point cloud by an external digital elevation model. Each of these input data sets may be provided in its own tiling and needs to be collected to create a new, comprehensive tiling.
In the tileManager, this information is provided to the class upon initialization in the form of a list of Python dicts, referred to here as `inFileDicts`. Each entry tells the tileManager (1) what data type to expect, (2) where to look for the input files and (3) whether the input type is required, i.e. whether a tile can be processed if there is no data of this type. A single entry in this list may thus look like this:
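A minimal sketch of such an entry, using the four keys `files`, `type`, `name` and `required` described in this section. The file paths are hypothetical placeholders, not files shipped with OPALS.

```python
# Hypothetical entry describing a point cloud input; the paths are placeholders.
pointcloud_entry = {
    "files": ["strip1.odm", "strip2.odm"],  # input files, may overlap
    "type": "odm",                          # "odm" (point cloud) or "raster"
    "name": "pointcloud",                   # identifier used as dict key and in file names
    "required": True,                       # tiles without any points are skipped
}

# The tileManager expects a list of such dicts, even for a single input type.
inFileDicts = [pointcloud_entry]
```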
The `files` key points to a list of input files, which may overlap (in which case point clouds are merged and, for raster files, the first one provided with a valid pixel value is used). `type` can be either `odm` (for point clouds) or `raster` (for any GDAL-supported raster file). `name` is an identifier that is used both as a dictionary key for the provision of the tiles to the processing script and in the file names of the individual tiled files. Finally, `required` is a boolean value. If a tile does not contain any information (point or valid pixel) in any of the data types marked as required, the tile is skipped (i.e., not provided to the processing script). In other cases, the data type may be empty, and it is up to the processing script to deal with the missing information.
Multiple input data sets of the same data type may be defined, e.g. there may be two sets of rasters (say, a DTM and a DSM) that need to be processed together (to create a normalized DSM). In this case, `inFileDicts` would be a list with two dicts:
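As a hedged sketch of the DTM/DSM case, the list could be written as follows. The file names are hypothetical, and marking both data sets as required is an assumption based on the fact that a normalized DSM needs both inputs.

```python
# Two raster data sets that are tiled and handed to the processing script together.
# File names are hypothetical placeholders.
inFileDicts = [
    {
        "files": ["dtm_north.tif", "dtm_south.tif"],
        "type": "raster",
        "name": "dtm",
        "required": True,
    },
    {
        "files": ["dsm_north.tif", "dsm_south.tif"],
        "type": "raster",
        "name": "dsm",
        "required": True,
    },
]

# Each name serves as a dictionary key later on, so the names should be unique.
names = [entry["name"] for entry in inFileDicts]
```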
The tileManager allows the user to provide different options on how tiling should be carried out:
- `False` or `None`: merge all input files into one tile covering the whole area of interest.
- A number of rows and columns, e.g. `5r2c`, meaning 5 rows and 2 columns, which results in 10 tiles. The order may be switched as long as the indicators remain: `2c5r` is the same as `5r2c`.
- A tile size: `5000` creates tiles covering 5000x5000 units, `200 300` creates tiles of 200 units in x- and 300 units in y-direction.
- The name of an input data set, e.g. `dtm` in the example above. All other datasets are tiled to match the files of `dtm`. Note that this assumes axis-aligned tiles.

Similarly, there are different options for naming the tiles, depending on the tiling concept used:
- The `ATTR` column of the shapefile.
- A tile corner (`LL`, `UL`, `LR`, `UR`) and a number of digits have to be provided, along with a delimiter character. For example, `UL+4` refers to the upper left coordinate of the tile, using `+` as a delimiter and 4 digits. `LL_6` refers to the lower left corner, using `_` as a delimiter and 6 digits. Please ensure that the number of digits matches the size of your tiles, so that no two tiles have the same name.
- Using `INDEX`, a two-dimensional index is created (starting with `0000_0000`); using `NUMBERED`, a one-dimensional index is created (starting with `000001`).

Since tileManager is not a stand-alone script but is meant to be run from Python, the `tileManager` class implements the `__iter__` and `next` methods required to iterate over the dataset. Assuming a processing script has a function `run_tile(...)` which accepts a dictionary as input, tileManager may be called like this:
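A minimal sketch of this calling pattern. Since the constructor arguments of the real class are not covered here, the sketch uses a hypothetical stand-in class (`StandInTileManager`) that only mimics the iteration behaviour; the tile contents and the body of `run_tile` are likewise assumptions.

```python
class StandInTileManager:
    """Hypothetical stand-in: only mimics iteration, performs no tiling."""

    def __init__(self, tiles):
        self._tiles = list(tiles)
        self._pos = 0

    def __iter__(self):
        self._pos = 0
        return self

    def __next__(self):
        if self._pos >= len(self._tiles):
            raise StopIteration
        tile = self._tiles[self._pos]
        self._pos += 1
        return tile

    next = __next__  # Python 2 spelling of the same method


def run_tile(tile):
    """Placeholder processing function; receives one tile dict."""
    return "processing " + tile["pointcloud"]


# With the real class, tm would be the instantiated tileManager.
tm = StandInTileManager([
    {"pointcloud": "tile_a.odm"},
    {"pointcloud": "tile_b.odm"},
])

messages = []
for tile in tm:
    messages.append(run_tile(tile))
```

The loop body stays the same regardless of how the tiling was defined, because each iteration simply receives one tile dict.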
After instantiating the `tileManager`, the list of tiles may also be retrieved from the `tileManager` object.
Note that this already cuts the input files to the appropriate tiles, which may take some time depending on the input datasets. To skip the cutting and just list the tiles, refer to the next section.
Both the list items and the iterated items (`tile` in the example above) are again Python dicts. These tile dicts carry information on the tile and the files that concern this tile. An example tile dict looks like this:
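The following is a hypothetical illustration only. The keys `bbox`, `region_filter`, `buffered_region_filter`, `cut_status` and `pointcloud` are taken from this section; the key name `buffer`, all value formats (tuple layout, filter string syntax) and the file path are assumptions and may differ from what tileManager actually produces.

```python
# Hypothetical tile dict; value formats are assumptions, not tileManager output.
tile = {
    "bbox": (0.0, 0.0, 500.0, 500.0),                 # assumed layout: xmin, ymin, xmax, ymax
    "region_filter": "Region[0 0 500 500]",           # assumed filter string format
    "buffered_region_filter": "Region[0 0 500 500]",  # identical, since the buffer is 0
    "buffer": 0,                                      # assumed key name for the buffer size
    "cut_status": True,                               # the tile file has already been cut
    "pointcloud": "tiles/pointcloud_0000_0000.odm",   # hypothetical path to the tiled file
}
```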
Note that in this case, `bbox`, `buffered_region_filter` and `region_filter` all refer to the same rectangular box, because a tiling based on rows and columns was used. Furthermore, the optional buffer was not set, so it is `0`. The `pointcloud` entry refers to the input dataset with `'name': 'pointcloud'` and tells the script where to find the respective input file for this tile.
For distributed processing it may be of interest to delay the cutting of the tiles and only process the meta information, creating folders and tiling concepts. For this, the `tileManager` class can be instantiated with the `do_cut=False` keyword. Internally, when iterating over the `tileManager` for the first time, a `tile` and a `cut` function are called, which rely on the `preTiling` and `preCutting` scripts in the `$OPALS_ROOT/opals/workflows` folder. When using `do_cut=False`, no cutting takes place in these scripts. This results in the `cut_status` item of the tile dict being set to `False`, indicating that the file has not been cut yet. The processing script may then either (a) use the list of input files it provided along with the `buffered_region_filter` to carry out its own cutting, or (b) run `tm.cut(do_cut=True)` to repeat the cutting step, this time actually creating the tiles.
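Option (a) can be sketched as a small helper inside the processing script. The helper name `choose_input`, its return structure and the sample values are hypothetical; only the keys `cut_status`, `buffered_region_filter` and `pointcloud` come from this section.

```python
def choose_input(tile, original_files):
    """Decide which files to read for a tile and whether a filter is needed.

    Hypothetical helper: if the tile has been cut, read the tile file as is;
    otherwise read the original inputs restricted by the buffered region filter.
    """
    if tile.get("cut_status"):
        return {"files": [tile["pointcloud"]], "filter": None}
    return {"files": list(original_files), "filter": tile["buffered_region_filter"]}


original_files = ["strip1.odm", "strip2.odm"]  # hypothetical input files

uncut_tile = {
    "cut_status": False,
    "buffered_region_filter": "Region[0 0 500 500]",  # assumed filter string format
    "pointcloud": "tiles/pointcloud_0000_0000.odm",   # not written yet
}

selection = choose_input(uncut_tile, original_files)
```

With this pattern, the same processing script works for both cut and uncut tiles, which may be convenient when the cutting is deferred to worker nodes.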
Two example scripts are provided in the OPALS `demo` directory. The first script, `tileManagerDemo1.py`, creates a 2-by-2 tiling with no overlap and calculates a per-tile DSM using Module DSM. The second script, `tileManagerDemo2.py`, is a bit more elaborate and implements a `run_tile` function. This script takes a point cloud and a raster. First, a smoothed version of the raster is calculated by running Module StatFilter, then the point cloud is normalized by the raster using Module Algebra. Since the StatFilter uses a search radius, it is necessary to first buffer the raster, then calculate the normalized point cloud, and then cut the point cloud to the net size of the tile before it can be merged.
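The need for buffering can be illustrated with a pure-Python analogy that does not use any OPALS module: a one-dimensional moving mean stands in for the neighbourhood filter, and a list slice stands in for a tile. All names and values are hypothetical. The sketch suggests that a tile buffered by the search radius reproduces the result of processing the full dataset, whereas an unbuffered tile differs at its borders.

```python
def moving_mean(values, radius):
    """Mean over a window of +/- radius samples, truncated at the data edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def process_tile(full, start, stop, radius, buffer):
    """Cut a buffered tile, filter it, then crop back to the net tile extent."""
    b_start = max(0, start - buffer)
    b_stop = min(len(full), stop + buffer)
    smoothed = moving_mean(full[b_start:b_stop], radius)
    return smoothed[start - b_start:stop - b_start]


full = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0, 9.0, 0.0]
reference = moving_mean(full, radius=2)  # result without any tiling

tile_buffered = process_tile(full, 3, 7, radius=2, buffer=2)
tile_unbuffered = process_tile(full, 3, 7, radius=2, buffer=0)

print(tile_buffered == reference[3:7])
print(tile_unbuffered == reference[3:7])
```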