 See also
 opals::IHisto
Aim of module
Derives histograms and descriptive statistics (min, max, mean, r.m.s, etc.) for ODM or grid/raster data sets and stores the results graphically (SVG) and numerically (XML).
General description
Data analysis is of crucial importance for ALS data processing. Besides 3D views and 2D maps, histograms of specific data attributes (including standard statistical parameters) are useful tools for analyzing certain data characteristics (e.g. distribution of heights, amplitudes, return numbers, gradients...) and for checking the data quality (e.g. strip differences considered to be normally distributed with expectation value = 0). Whereas 2D maps visualize the spatial distribution of certain data attributes, histograms condense the entire information in a bar plot and the descriptive statistical measures summarize and characterize the whole data sample. Thus, decisions upon the necessity of additional processing steps (e.g. strip adjustment) no longer rely on a pure visual data inspection but they are supported (or even advised) by quantified statistical measures. The following statistical parameters are provided:

CountData: overall number of available data samples

CountUsed: number \(n\) of data samples ( \(x_{i}\)) used for histogram/statistics calculation (c.f. below for details)

Min: minimal sample value, \(x_{min}=min(x_{i}), i=1..n\)

Max: maximum sample value, \(x_{max}=max(x_{i}), i=1..n\)

Mean: mean value, \(\mu=\frac{\sum\limits_{i=1}^n(x_{i})}{n}\)

Median: approximation of median (50% quantile)

Std: standard deviation, \(\sigma=\sqrt{\frac{\sum\limits_{i=1}^n{(x_{i}\mu)^2}}{n99}}\)

Rms: root mean square, \(rms=\sqrt{\frac{\sum\limits_{i=1}^n{x_{i}^2}}{n}}\)

Sigma(mad): median of absolute deviation to median * 1.4826. In case of normal distribution this is a robust estimator for the standard deviation.

Skewness: describing the (a)symmetry of the sample (negative: left skewed, positive: right skewed)

Quantiles: for userdefined probabilites; default: p=0.01, 0.05, 0.25, 0.75, 0.95, 0.99
Module Histo operates on either vector data sets (OPALS Data Manager, ODM) or regular grids/rasters in GDAL supported data formats. In both cases a single or multiple input files (of the same type) can be specified (parameter inFile). By default, the histograms and statistics are calculated for the heights (z) of the ODM point cloud or the first band of the grid/raster dataset, respectively. However, any other attribute stored in the ODM (as additional info) or raster band (zerobased band index or band name) can as well be used as basis (parameter attribute). In case of multiple attributes, the module derives seperate histograms and statistics for each attribute or raster band. The respective data samples are sorted into bins (classes) of equidistant width. The width can be specified either by the desired number of different bins (parameter nBins, default: 50) or by a specific bin width (parameter binWith). In general, the histogram automatically covers the entire range of sample values from \(x_{min}\) to \(x_{max}\). Since the majority of data are often clustered together in a limited value range, with only a few number of outliers beyond, the histogram can be confined to a certain range (parameter sampleRange). The range is defined by the lower bound of the first bin and the upper bound of the last bin. All samples outside the valid range are collected in additional overflow and underflow bins.
Unless otherwise specified, all available data are considered for the calculations. However, in some situation it might be advantageous to restrict the input data. This can be achieved by specifying a data window (parameter limit) and/or a selection condition (parameter filter). The filter string must correspond to the OPALS Filters syntax. Please note, that filters based on certain point attributes (e.g.: Echo, Class, ...) are not are practicable for grid/raster datasets as the 3D grid points (x,y, grid value) do not contain additional point attributes. It is advisable to use the limit parameter (e.g.: limit xmin ymin xmax ymax) to specify a specific data window rather than specifying a region filter (e.g.: parameter filter "Region[xmin ymin xmax ymax]"). In the latter case the region filter is applied to the entire dataset (all ODM data points or entire grid) whereas in the first case a spatial subselection is applied beforehand resulting in better program performance. This is especially important when statistics are to be derived for a series of small patches based on the same data set using Module Histo repetitively. It is even possible to combine limits and filters in which case, first, the window query is applied and, subsequently, all points within the window area are checked w.r.t. the (potentially complex) region filter polygon.
Finally, the results (histogram and statistics) are stored as a complex object sperately each attribute. For the Python and C++ class implementations the results are directly accessible for further use (e.g. to decide on further processing steps based on statistical measures). Beyond that, graphical output (parameter plotFile) is provided as Scalable Vector Graphics which can be displayed in standard web browsers (Firefox, Opera, IE...). If an output parameter file (parameter outParamFile) is specified, the (numerical) results are additionally written to an XML file.
Parameter description
inFileinput files (ODM or GDAL grid/raster)
attributeattribute(s) for histogram calculation
Type:
opals::Vector<
opals::String>
Remarks: default=z
Specifies the attribute(s) for which the histogram is to be derived. Accepts ODM attribute names (e.g. z(default), Amplitude, EchoWidth etc.) or raster band number(s) and name(s) (e.g. 0 (default), 1...), respectively.
histogramgeneric histogram output object
Type:
opals::Vector<
opals::HistoStats>
Remarks: output
The results of the histogram calculation are provided as list of generic objects containing statistical information like: min, max, mean, median..., and the histogram classes each containing its relative frequency. For each attribute a separate histogram is provided. Specify outParamFile in order to store these results to an XML file.
nBinsnumber of histogram bins (classes)
Type: uint32
Remarks: default=50
The (equidistant) bin width is calculated automatically, so that the specified number of bins fit into the sample range. Ignored if 'binWidth' is specified.
binWidthwidth of a single histogram bin (class)
Type: float
Remarks: optional
If specified, the histogram is divided into equidistant classes (bins) of the given interval
sampleRangehistogram sample range (min max)
Type:
opals::Array<float, 2>
Remarks: optional
Range (min max) of sample (attribute) values considered for histogram calculation denoting the outermost bounds (i.e.: min=lower bound of first class, max=upper bound of last class. Additionally, under/overflow classes are provided to collect all samples outside the valid range. If not specified, the entire sample range of the specified ODM attribute or grid model is used.
densityRangehistogram density range [%] (min max)
Type:
opals::Array<float, 2>
Remarks: optional
By default, the yscale of the histogram, i.e. density [%], is derived automatically based on the maximum density class. A fixed density range can be forced by specifying denityRange with a lower and upper bound (min,max). This is especially convenient for a visual comparison of multiple histogram plots.
limitdata limit window: left lower right upper
Type:
opals::Array<double, 4>
Remarks: optional
If specified, only data within the given window area (left lower right upper) are considered for histogram calculations.
filtermodify the input using a (tree of) filter(s)
Type:
opals::String
Remarks: optional
A filter string in EBNF syntax as described in Section 'Filter syntax' can be passed to restrict the set of input points (e.g. to consider a boundary polygon, etc. ). For raster/grid inputs the filter is evaluated for each valid grid post (3Dpoint: x, y, grid value). Thus, only certain filters (Affine, Region, GenericZ filters) are applicable to raster/grid inputs.
plotFileplot file
Type:
opals::Vector<
opals::Path>
Remarks: estimable
The histogram is additionally plotted as a Scalable Vector Graphics file. SVG files can be displayed by web browsers like Firefox, IE, Opera.... The creation of a plot file can be suppressed by specifying: plot file off Estimation rule: ODM: current directory and Name of inFile + '_<attribute>_histo.svg', Grids/Rasters: Current directory and Name of inFile + '_histo.svg'
probabilitiesprobabilities for quantile calculation
Type:
opals::Vector<double>
Remarks: default=0.01 0.05 0.25 0.75 0.9 0.95
A list (vector) of userdefined probabilities (0 <= p <= 1) values can be specified. The respective quantiles are calculated by the module.
procModerational or integer processing mode
Type:
opals::HistoMode
Remarks: default=automatic
Possible values:
automatic ... derived mode from attribute type
real ........ allow arbitrary bin width values
integer ..... limits bin width to natural numbers
TO COME
exactComputationuse exact median/quantile computation
Type: bool
Remarks: default=0
In exact computation mode all data values have to fit into memory
Examples
The data used in the following examples are located in the $OPALS_ROOT/demo/
directory. Example 1 shows several histogram variants based on an ALS point cloud (fullwave.odm) whereas example 2 features histograms of regular grids (stip19/20.tif).
Example 1:
As a prerequisite for the following example, the ALS point cloud data must be imported into the ODM. To achieve that, change to the demo
directory and type:
Now, we are ready to derive histograms based on the resulting ODM featuring the 3Dcoordinates (x, y, z) and additional attributes (x, y, z, gps time, amplitude, echo width, echo nr, echo qualifier) for each point. The following example demonstrates how to to analyze different point attributes in the form of histograms:
The following SVG plot files are created:
fullwave_z_histo.svg
fullwave_EchoNumber_histo.svg
fullwave_Amplitude_histo.svg
fullwave_EchoWidth_histo.svg
Please note, that the names of the respective SVG plot files (not specified explicitly in the above examples) have been estimated from the input file and attribute names. The results are shown in Figure 1.
Fig. 1: Histograms of ALS point attributes: z (upper left), Echo number (upper right), Amplitude (lower left) and EchoWidth (lower right)
A numerical representation of the histogram and statistics can be obtained by specifying an output parameter file as shown in the first example (parameter outParamFile). The resulting XML file histo.xml
contains the following output (excerpt):
[...]
<Module>
<Version>1.0.0.0</Version>
<License>[...]
<Parameters>
<Parameter Name='inFile'><Val>fullwave.odm</Val></Parameter>
<Parameter Name='attribute'><Val>z</Val></Parameter>
<Parameter Name='histogram'><Val>Histogram[Bins[
(274.517,0.000) (275.535,0.003) (276.553,0.004) (277.570,0.005) (278.588,0.007)
(279.605,0.009) (280.623,0.012) (281.640,0.015) (282.658,0.016) (283.675,0.017)
(284.693,0.019) (285.710,0.019) (286.728,0.019) (287.746,0.021) (288.763,0.022)
(289.781,0.022) (290.798,0.023) (291.816,0.025) (292.833,0.027) (293.851,0.026)
(294.868,0.026) (295.886,0.029) (296.903,0.029) (297.921,0.029) (298.939,0.029)
(299.956,0.031) (300.974,0.030) (301.991,0.029) (303.009,0.030) (304.026,0.027)
(305.044,0.027) (306.061,0.026) (307.079,0.025) (308.097,0.025) (309.114,0.025)
(310.132,0.024) (311.149,0.025) (312.167,0.025) (313.184,0.025) (314.202,0.025)
(315.219,0.023) (316.237,0.022) (317.254,0.021) (318.272,0.019) (319.290,0.019)
(320.307,0.015) (321.325,0.010) (322.342,0.008) (323.360,0.006) (324.377,0.004)
(325.395,0.002) (326.412,0.000) (327.430,0.000) (328.447,0.000) (329.465,0.000) ]
CountData[67413]
CountUsed[67413]
Min[275.535]
Max[328.500]
Mean[301.512]
Median[302.010]
Std[11.685]
Rms[301.738]
Skewness[0.044]
Quantiles[
(0.01,278.430) (0.05,282.236) (0.25,292.185) (0.50,301.345) (0.75,310.977)
(0.95,320.419) (0.99,324.242) ]] Val></Parameter>
<Parameter Name='binWidth'><Val>1</Val></Parameter>
<Parameter Name='nBins'><Val>50</Val></Parameter>
<Parameter Name='plotFile'><Val>fullwave_z_histo.svg</Val></Parameter>
</Parameters>
</Module>
Example 2:
Example 2 shows how to derive histograms of grid datasets. To generate the underlying grids, please perform the following preprocessing steps:
opalsDiff inFile strip11.tif strip21.tif outf diff_11_21.tif
opalsAlgebra inFile strip11_sigmaZ.tif strip11_excen.tif outf strip11_mask.tif 
formula "r[0] && r[1] && r[0]<0.1 && r[1]<0.8"
opalsAlgebra inFile strip21_sigmaZ.tif strip21_excen.tif outf strip21_mask.tif 
formula "r[0] && r[1] && r[0]<0.1 && r[1]<0.8"
opalsAlgebra inFile strip11_mask.tif strip21_mask.tif outf diff_11_21_mask.tif 
formula "r[0] && r[1] ? r[0]*r[1] : invalid"
opalsAlgebra inFile diff_11_21.tif diff_11_21_mask.tif outf diff_11_21_masked.tif 
formula "r[0] && r[1] ? r[0]*r[1] : invalid"
This procedure, first, imports the data of strips 19 and 20 (strip11/21.las
) into separate ODM files, and generates surface models as well as additional "sigmaZ" (=smoothness) and "excenter" (=extrapolation/occlusion) layers for each strip (strip11.tif
, strip11_sigmaZ.tif
, strip11_excen.tif
). Subsequently, a strip difference model (diff_11_21.tif
) and a respective grid mask (diff_11_21_mask.tif
) are derived. The examples below (results c.f. Figure 2) show the distribution of the grid heights and excenter layer of the surface model of strip 11. Furthermore, the height differences between strip 11 and 21 are analyzed in two variants, one using all available height differences in the strip overlap area, an the other one considering a grid mask to exclude all rough and occluded pixels.
opalsHisto inFile strip11_excen.tif binwidth 0.1 sampleRang 0 1
Please further note, that wild cards are supported in case multiple input datasets.
Fig. 2: Histograms of grid datasets: DSM heights (upper left), excenter layer (upper right), Unmasked strip differences (lower left), Masked strip differences (lower right)
Example 3:
The main benefit of running Module Histo in Python is that the histogram as well as all standard statistics (min, mean, median, ...) are directly accessible after running the module via the Python API. This is exemplified in the sample script $OPALS_ROOT/demo/histoDemo.py
:
2 from opals
import Histo, Import
4 imp=Import.Import(inFile=
"fullwave.fwf").run()
6 his.commons.screenLogLevel = opals.Types.LogLevel.error
7 his.inFile = [
"fullwave.odm"]
8 his.probabilities = [0.01,0.05,0.10,0.90,0.95,0.99]
9 his.sampleRange=[280,320]
12 stats = his.histogram[0]
13 print(
"Selected height statistics of dataset: {}".format(his.inFile[0]))
14 print(
"Min : {:0.2f}".format(stats.getMin()))
15 print(
"Median: {:0.2f}".format(stats.getMedian()))
16 print(
"Max : {:0.2f}".format(stats.getMax()))
17 print(
"q0.95 : {:0.2f}".format(stats.getQuantiles()[4][1]))
18 print(
"\nHistogram: lower height bound [m]  abs. #samples : rel. #samples [%]")
19 bAbs, bRel = stats.getAbsBins(),stats.getBins()
20 for i
in range(len(bAbs)):
21 print(
" {:0.1f}  {:0.1f}: {:0.2f}".format(bAbs[i][0], bAbs[i][1], bRel[i][1]*100.))
To run the script, type:
The script queries the output parameter histogram provided by Module Histo as a complex Python type (class) and uses its access functions to print the min, median, and 95% height quantile of dataset <fullwave.odm>. The following output is printed to the screen:
Selected
height statistics of dataset: fullwave.odm
Min : 275.53
Median: 301.46
Max : 328.50
q0.95 : 320.36
Histogram: lower
height bound [m]  abs. #samples : rel. #samples [%]
275.0  1498 : 2.22
280.0  4811 : 7.14
285.0  6643 : 9.85
290.0  8248 : 12.24
295.0  9535 : 14.14
300.0  9737 : 14.44
305.0  8464 : 12.56
310.0  8134 : 12.07
315.0  6983 : 10.36
320.0  3360 : 4.98
References
Ressl, C., Kager, H. and Mandlburger, G., 2008. Quality checking of ALS projects using statistics of strip differences. In: IAPRS, XXXVII, pp. 25399860.