Parallel Processing

Most modules support multi-threaded processing to better utilize modern multi-core CPUs. However, multi-threading within a single module typically does not scale well on CPUs with 8 or more cores. In this case, it is beneficial to spread the workload over multiple processes rather than running a single process with a high number of threads. OPALS supports such strategies by using the Python multiprocessing framework.

Introduction

There are a few key differences between multiprocessing and multithreading. A process can be thought of as an execution unit of a whole program, while threads represent execution units within a process. Processes are independent of each other and each has its own memory space; threads, on the other hand, share the memory of their parent process.
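This difference in memory behaviour can be illustrated with a few lines of plain Python (a minimal sketch using only the standard library, not OPALS-specific): threads see each other's changes to a shared list, while worker processes operate on their own copies.

import threading
import multiprocessing

data = []

def append_item(item):
    data.append(item)  # mutates the module-level list

if __name__ == '__main__':
    # Threads share the parent's memory: all appends land in the same list.
    threads = [threading.Thread(target=append_item, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(data)  # [0, 1, 2]

    # Processes have their own memory space: each child appends to its own
    # copy of the list, so the parent's list is not affected.
    procs = [multiprocessing.Process(target=append_item, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(data)  # still [0, 1, 2] - the children's appends are invisible here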

Multithreading and multiprocessing have different applications in the opals framework. A classic example of multiprocessing would be the use of Module Import to import the data of multiple strips into ODM files. As each of the Module Import instances can run independently, using multiple processes reduces the overall runtime. An example of multithreading, on the other hand, would be a "divide and conquer" approach to writing a raster file, which can be divided into smaller tiles. These are then handled by different threads and later combined into the finished raster file.

Though the theoretical speedup of a parallelised program is given by Amdahl's Law, in practice there are additional factors that limit the achievable speedup. In opals scripts such as the package opalsQuality, for example, certain parts of the script must finish before others can start, which increases the overall runtime. The Python multiprocessing framework also introduces some overhead that cannot be avoided and has to be taken into account.
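For reference, Amdahl's Law gives the theoretical speedup S that N processes can achieve when a fraction p of the program can be parallelised (a standard formulation, not specific to opals):

    S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

With p = 0.9 and N = 6, for example, S = 1/(0.1 + 0.15) = 4; even with arbitrarily many processes, the speedup is capped at 1/(1 - p) = 10. These numbers are illustrative, not measurements.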

Parallel Processing in the opals framework

In opals, multiprocessing is realised using the -distribute parameter. If you want to run the script _grid.py using 6 processes, for example, you can do so by setting the -distribute parameter to 6, e.g. for the demo data:

strip11.laz
strip12.laz
strip21.laz
strip22.laz
strip31.laz
strip32.laz
_grid.py -inFile strip??.laz -distribute 6

This particular call runs about 3.5 times faster than with the default distribute value of 1. The speedup can of course vary, depending on the hardware you are using.

Though you can choose the value of -distribute freely, it does not make sense to choose a value higher than the number of cores of your computer. In most cases, such a value will even increase the runtime. For most opals scripts, a good value for the number of processes is the number of input files (again, keeping your number of cores in mind). Running the above example of _grid.py with more than 6 processes does not increase the speedup, since this script offers only n tasks that can run at once, n being the number of input files.
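This rule of thumb can be condensed into a few lines of Python (a hypothetical helper, not part of opals; note that os.cpu_count() reports logical cores, which may be higher than the number of physical cores):

import glob
import os

# Hypothetical helper illustrating the rule of thumb from the text:
# use one process per input file, but never more than the machine has cores.
def suggest_distribute(input_pattern):
    n_files = len(glob.glob(input_pattern))
    n_cores = os.cpu_count() or 1  # logical cores; may be None on some systems
    return max(1, min(n_files, n_cores))

print(suggest_distribute("strip??.laz"))  # 6 for the demo data above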

A schematic representation of a distribution of module runs using different numbers of workers/processes can be seen in Fig. 1.

Fig. 1: Possible distribution of module runs using different values for -distribute

Example of a possible speedup using Parallel Processing

As mentioned in the opening paragraph, not only the number of processes can be chosen, but also the number of threads a program uses. The number of threads is controlled by the common parameter nbThreads and can be set in the .cfg file as follows (excerpt from the cfg file of opalsQuality):

#############################################################################
########################### common parameters ###############################
#############################################################################
# number of concurrent threads (example value)
nbThreads=4
#############################################################################
####################### parameters for opalsGrid ###########################
#############################################################################
gridSize=1
interpolation=movingPlanes
neighbours=8
. . .
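The same parameter can also be set directly on a module instance in Python. The following is a hedged sketch: it assumes that, like module-specific parameters, the common parameter nbThreads can be set as an attribute of the module instance, and that string values are accepted for enum parameters; the concrete values merely mirror the cfg excerpt above.

from opals import Grid

# Run Module Grid with an explicit thread count instead of the cfg default.
grd = Grid.Grid()
grd.inFile = "strip11.laz"      # one of the demo strips from above
grd.gridSize = 1
grd.interpolation = "movingPlanes"
grd.neighbours = 8
grd.nbThreads = 4               # assumption: common parameters are exposed this way
grd.run()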

To give an idea of the magnitude of the achievable speedup, different combinations of numbers of processes and threads were tested with the script _grid.py; the results are depicted below in Fig. 2. The demo dataset Loosdorf (to be integrated as a use case example), containing 12 strips, was used as input. The tested numbers of processes (i.e. values of the parameter -distribute) were [1, 2, 4, 8, 12]. For each of these configurations, _grid.py was run with each of the following numbers of threads: [1, 2, 4, 8, 16, 32]. The computer used to run these tests has 16 cores and 32 threads.

Fig. 2: Speedup of _grid.py, varying number of processes/threads

In order to interpret these results, it is important to note that two runtimes of one and the same configuration can differ by 10-20 seconds. That said, for this dataset, increasing the number of processes reduced the runtime in all configurations. For the number of threads, no such general statement can be made: increasing the number of threads up to 4 reduced the runtime in every case, but with 8 or more threads the picture is less clear. While the general trend indicates that more threads yield more speedup, there are cases where using more threads actually slows the program down. This is due to the shared memory space between threads described above, which does not exist between separate processes: the threads (although each has less workload to handle) can get in each other's way and perform worse than a smaller number of threads would.

Which number of processes/threads to use

The main takeaway of the table in Fig. 2 is that increasing the number of processes is in most cases a good way to speed up your program (again, keeping the specs of your hardware in mind). One should, on the other hand, be more careful about increasing the number of threads. The speedup may also be limited by the type of data drive you are using: an HDD (Hard Disk Drive) or an SSD (Solid State Drive), the latter operating much faster. Additional processes are typically more demanding on the physical memory (RAM) of your computer than additional threads, so the amount of RAM available is something to keep in mind, as it can also become a bottleneck when trying to reduce the runtime of your program.

Example script showing how the multiprocessor object works

The instantiation of a multiprocessor object is normally done by helper scripts analyzing the input argument -distribute. However, it can easily be instantiated and used in your own scripts. This example showcases how the multiprocessor in the opals framework works and displays the most important functions used by scripts such as opalsQuality. Another purpose of the script is to demonstrate that using multiprocessing does not prevent querying modules after they have finished, just as you can after running a module with .run().

from opals.tools import processor
from opals import Info
import os

if __name__ == '__main__':
    # First the input files are specified
    infPath = r"D:\fhacksto\opals\distro\demo"  # path to your opals installation needs to be set
    fileNames = ["strip11.laz", "strip21.laz", "strip31.laz"]
    inFiles = [infPath + os.sep + inFile for inFile in fileNames]
    # A multiprocessor object is instantiated using 3 workers. This is normally done during the analysis of the input
    # files using the scripts opalsargparse.py and analyzeInput.py, controlled by the -distribute parameter. To make
    # this work as a stand-alone script, however, we need to instantiate it ourselves.
    proc = processor.multiprocess(3)
    # We then create an instance of opalsInfo, setting exactComputation to 1 as we want to extract some statistics later.
    mod = Info.Info(exactComputation=1)
    # We can now loop over our .laz files and set each of them as the input parameter for opalsInfo
    for inf in inFiles:
        print(f"***Starting processing of {inf}...***")
        mod.inFile = inf
        # Using the multiprocessor function "submit" we can hand our module to the processor. Using the parameter
        # "finishedText" we can set a message that is printed when the module has finished running. As soon as the
        # module instance is submitted, it is assigned to a worker and executed in a separate process while the loop
        # continues running.
        proc.submit(mod, finishedText=f'***Finished processing of {inf} ***')
    # The object function "join" waits until all processes are finished so we can start analyzing the outputs.
    proc.join()
    # With indexing "[i]" (just like you would with e.g. a list) you can access the instances of the modules after
    # they finish. The order is determined by their submission time, NOT the finishing time. I.e. the position of a
    # certain file in the input list corresponds to the index of the module in the processor which used said file as
    # the input. The next two lines of code showcase how you can query the crs of the first submitted file.
    fileCRS = proc[0].statistic.getCoordRefSys()
    print(f"The first processed file has the wkt-string {fileCRS}")
    # The next snippet of code shows how you can iterate over the processor (also like you would with e.g. a list) to
    # determine the file with the highest point density of your dataset.
    maxPDens = 0
    maxPDensFile = None  # initialised so the name is defined even if no file qualifies
    for mod in proc:
        filePDens = mod.statistic.getPointDensity()
        if filePDens > maxPDens:
            maxPDens = filePDens
            maxPDensFile = mod.inFile
    print(f"The file {maxPDensFile} has the highest point density ({maxPDens}) of the dataset.")

For more information about the member functions of the multiprocess class please visit the multiprocess Class Reference.
