Most modules support multi-threaded processing to better utilize modern multi-core CPUs. However, the modules do not scale well on CPUs with 8 or more cores. In this case, it is beneficial to spread the workload over multiple processes rather than running a single process with a high number of threads. OPALS supports such strategies by using the Python multiprocessing framework.
There are a few key differences between multiprocessing and multithreading. A process can be thought of as an execution unit of a whole program, while threads are execution units within a process. Processes are independent of each other and each has its own memory space. Threads, on the other hand, share the memory of their process.
Multithreading and multiprocessing have different applications in the opals framework. A classic example of multiprocessing is the use of Module Import to import the data of multiple strips into ODM files. Since each Module Import instance can be run independently, using multiple processes increases the computational speed. An example of multithreading, on the other hand, is a "divide and conquer" approach to writing a raster file, which can be divided into smaller tiles. These are handled by different threads and later combined into the finished raster file.
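As a minimal sketch of the multiprocessing idea, the snippet below distributes one Module Import run per strip over a small pool of worker processes. The file names are placeholders, and the binding style (from opals import Import; Import.Import(...).run()) is an assumption about the usual opals Python interface; real opals scripts use the -distribute mechanism described below instead.

    import multiprocessing

    def import_strip(strip):
        # Each worker process imports one strip into its own ODM file.
        from opals import Import                    # assumed opals Python binding
        odm = strip.replace(".laz", ".odm")
        Import.Import(inFile=strip, outFile=odm).run()
        return odm

    if __name__ == "__main__":
        strips = ["strip1.laz", "strip2.laz", "strip3.laz"]   # placeholder names
        with multiprocessing.Pool(processes=3) as pool:       # one process per strip
            odm_files = pool.map(import_strip, strips)
        print(odm_files)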
While the theoretical speedup of a parallelized program is given by Amdahl's Law, in practice other factors limit the achievable speedup. In opals scripts such as the package opalsQuality, for example, certain parts of the script must finish before others can start, which increases the runtime. There is also some unavoidable overhead of the Python multiprocessing framework that has to be taken into account.
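For reference, Amdahl's Law gives the theoretical speedup S for a program in which a fraction p of the work can be parallelized and N processes are used:

    S(N) = \frac{1}{(1 - p) + p / N}

so even with arbitrarily many processes the speedup is bounded by 1 / (1 - p); the serial parts of a script therefore dominate sooner or later.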
In opals, multiprocessing is realised using the -distribute parameter. If you want to run the script _grid.py using 6 processes for example, you can do so by setting the -distribute parameter to 6. E.g. for the demo data:
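(The exact call depends on your installation and data; the script location and the wildcard input specification below are placeholders.)

    python _grid.py -inFile strip*.laz -distribute 6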
The runtime of this particular call will be about 3.5 times faster than when using the default -distribute value of 1. The speedup can of course vary, depending on the hardware you are using.
Though you can choose the value of -distribute freely, it does not make sense to choose a value higher than the number of cores your computer has; in most cases this will even increase the runtime. For most opals scripts, a good value for the number of processes is the number of input files (again, keeping your number of cores in mind). Running the above example of _grid.py with more than 6 processes does not increase the speedup, since for this script there are only n tasks that can be performed at once, n being the number of input files.
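As a minimal sketch of this rule of thumb (plain Python, not an opals function), a sensible number of processes could be derived like this:

    import os

    def suggested_distribute(input_files):
        # One process per input file, but never more than the number of
        # cores reported by the operating system.
        return max(1, min(len(input_files), os.cpu_count() or 1))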
A schematic representation of a distribution of module runs using different numbers of workers/processes can be seen in Fig.1.
As mentioned in the opening paragraph, not only the number of processes can be chosen but also the number of threads that a program uses. The number of threads is controlled by the common parameter nbThreads and can be set in the .cfg file like so (excerpt of the cfg file of opalsQuality):
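An illustrative entry is shown below; only the nbThreads line matters here, and the exact layout and remaining entries of the opalsQuality cfg file are not reproduced:

    nbThreads = 4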
To give an idea of the magnitude of the achievable speedup, different combinations of numbers of processes and threads were tested on the script _grid.py; the results are depicted below in Fig.2. The demo dataset Loosdorf (to be integrated as a use case example), containing 12 strips, was used as input. The numbers of processes (i.e. the values of the parameter -distribute) were [1, 2, 4, 8, 12]. For each of these, _grid.py was run with [1, 2, 4, 8, 16, 32] threads. The computer used for these tests has 16 cores and 32 logical threads.
When interpreting these results it is important to note that two runs of the very same configuration can differ by 10-20 seconds. That said, for this dataset increasing the number of processes reduces the runtime in all configurations. For the number of threads no such general statement can be made: increasing the number of threads up to 4 consistently reduces the runtime, but with 8 or more threads the picture is less clear. While the general trend indicates that more threads bring more speedup, there are cases where using more threads actually slows the program down. This is related to the shared memory space of threads described above, which separate processes do not have: even though each thread handles less work, the threads can get in each other's way and perform worse than a smaller number of threads would.
The main takeaway of the table in Fig.2 is that increasing the number of processes is in most cases a good way to speed up your program (again keeping the specs of your hardware in mind), whereas one should be more careful about increasing the number of threads. The speedup may also be limited by the type of data drive you are using, an HDD (Hard Disk Drive) or a much faster SSD (Solid State Drive). Additional processes are also more demanding on the main memory (RAM) of your computer than additional threads, so the amount of available RAM is something to keep in mind, as it can become a bottleneck as well.
The instantiation of a multiprocessor object is normally done by helper scripts analyzing the input argument -distribute. It can, however, easily be instantiated and used in your own scripts. This example is meant to showcase how the multiprocessor in the opals framework works and to display the most important functions used by scripts like opalsQuality. Another purpose of the script is to demonstrate that modules run via the multiprocessor can still be queried after they have finished, just as when running a module directly with .run().
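As a generic illustration of the last point (results remain queryable after the parallel runs have finished), the sketch below uses only Python's standard library. It does not use the opals multiprocess class itself (see the class reference below for its actual member functions), and the Grid call is an assumption about the usual opals Python binding style that may need adaptation.

    from concurrent.futures import ProcessPoolExecutor

    def run_grid(odm_file):
        # Assumed opals Python binding style; adapt to your installation.
        from opals import Grid
        grd = Grid.Grid(inFile=odm_file, gridSize=1.0, interpolation="movingPlanes")
        grd.run()
        # The finished module can still be queried inside the worker; plain
        # values (not the module object itself) are returned to the parent.
        return {"inFile": odm_file, "outFile": grd.outFile}

    if __name__ == "__main__":
        odm_files = ["strip1.odm", "strip2.odm", "strip3.odm"]   # placeholder names
        with ProcessPoolExecutor(max_workers=3) as pool:
            results = list(pool.map(run_grid, odm_files))
        for res in results:
            # The queried values are still available after all processes finished.
            print(res["inFile"], "->", res["outFile"])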
For more information about the member functions of the multiprocess class please visit the multiprocess Class Reference.