Aim of module

Derives histograms and descriptive statistics (min, max, mean, r.m.s, etc.) for ODM or grid/raster data sets and stores the results graphically (SVG) and numerically (XML).

General description

Data analysis is of crucial importance for ALS data processing. Besides 3D views and 2D maps, histograms of specific data attributes (including standard statistical parameters) are useful tools for analyzing certain data characteristics (e.g. distribution of heights, amplitudes, return numbers, gradients...) and for checking the data quality (e.g. strip differences considered to be normally distributed with expectation value = 0). Whereas 2D maps visualize the spatial distribution of certain data attributes, histograms condense the entire information in a bar plot and the descriptive statistical measures summarize and characterize the whole data sample. Thus, decisions upon the necessity of additional processing steps (e.g. strip adjustment) no longer rely on a pure visual data inspection but they are supported (or even advised) by quantified statistical measures. The following statistical parameters are provided:

#Data: overall number of available data samples
#Used: number $n$ of data samples ( $x_{i}$) used for histogram/statistics calculation (c.f. below for details)
Min: minimal sample value, $x_{min}=min(x_{i}), i=1..n$
Max: maximum sample value, $x_{max}=max(x_{i}), i=1..n$
Mean: mean value, $\mu=\frac{\sum\limits_{i=1}^n(x_{i})}{n}$
Median: approximate/exact median (0.5 quantile or 50-th percentile)
Mode: sample value that apprears most often
StdDev: standard deviation, $\sigma=\sqrt{\frac{\sum\limits_{i=1}^n{(x_{i}-\mu)^2}}{n-1}}$
Rms: root mean square, $rms=\sqrt{\frac{\sum\limits_{i=1}^n{x_{i}^2}}{n}}$
StdDevMAD: median of absolute deviations to median * 1.4826. In case of normal distribution this is a robust estimator for the standard deviation.
Skewness: describing the (a)symmetry of the sample (negative: left skewed, positive: right skewed)
Quantiles: for user-defined probabilites; default: p=0.01, 0.05, 0.25, 0.75, 0.95, 0.99

Module Histo operates on either vector data sets (OPALS Data Manager, ODM) or regular grids/rasters in GDAL supported data formats. In both cases a single or multiple input files (of the same type) can be specified (parameter inFile). By default, the histograms and statistics are calculated for the heights (z) of the ODM point cloud or the first band of the grid/raster dataset, respectively. However, any other attribute stored in the ODM (as additional info) or raster band (zero-based band index or band name) can as well be used as basis (parameter attribute). In case of several user-selected attributes, the module derives seperate histograms and statistics for each attribute or raster band. The respective data samples are sorted into bins (classes) of equidistant width. The width can be specified either explicitly by a specific bin width (parameter binWith), or implicitly by the desired number of different bins (parameter nBins, default: 20). By default, the histogram is limited to the 0.02 and 0.98 quantile moving (possible) outliers to the underflow and overflow bin. This behavior can be changed with the sampleRange parameter, by specifying relative or absolute sample range values. Relative values can be either specified in percentage (e.g. 5%) or in quantile (e.g. q:0.05) notation. Please note that the quantiles are by default only approximated, which may result in slightly incorrect histogram limits, as shown in Figure 1. If exact limits are required the exactComputation mode has to be activated. If the histogram should cover the full sample range from $x_{min}$ to $x_{max}$ (without underflow and overflow bins), one can either specify quantile 0 and 1 (-sampleRange q:0 q:1) or use the min max labels (-sampleRange min max). See example 3 for further details.

For processing integer attributes (EchoNumber, Classification, etc.) the module provides a specific integer processing mode (see procMode) where the bin width is constraint to integer values. Furthermore, the bin borders are shifted by half of the bin width, so that the bin centers correspond to the integer values (see EchoNumber histogram in Figure 1). By default the module automatically switches to real or integer processing mode based on the type of the ODM attribute or the raster band type. Nevertheless, this behavior can be overruled by the procMode parameter. For non-continuous attributes like classification values, it might be relevant to skip empty bins (see parameter skipEmptyBins) within the histogram (see example 5).

To limit the memory consumption, the module uses several approximation strategies that avoids storing all values in a sorted vector or list. Using the extended $P^2$ algorithm (Jain and Chlamtac, 1985) for estimating quantiles and three data passes the median and sigma(mad) can be computed with a decent precision. Nevertheless, in certain siutation the exact quantiles, median and sigma(mad) values are required. This can be achieved by activating the exact computation mode (parameter exactComputation). There, the complete data series needs to be stored in a sorted vector, which might not be possible for huge data sets. On the other hand, the exact computation mode only requires one data pass which is typically faster than the non-exact mode. In case the module cannot allocate the vector for the full data series, it automatically drops back to the non-exact mode and outputs a warning.

Unless otherwise specified, all available data are considered for the calculations. However, in some situation it might be advantageous to restrict the input data. This can be achieved by specifying a data window (parameter limit) and/or a selection condition (parameter filter). The filter string must correspond to the OPALS Filters syntax. Please note, that filters based on certain point attributes (e.g.: Echo, Class, ...) are not practicable for grid/raster datasets as the 3D grid points (x,y, grid value) do not contain additional point attributes. It is advisable to use the limit parameter (e.g.: limit xmin ymin xmax ymax) to specify a specific data window rather than specifying a region filter (e.g.: parameter filter "Region[xmin ymin xmax ymax]"). In the latter case the region filter is applied to the entire dataset (all ODM data points or entire grid) whereas in the first case a spatial sub-selection is applied beforehand resulting in better program performance. This is especially important when statistics are to be derived for a series of small patches based on the same data set using Module Histo repetitively. It is even possible to combine limits and filters in which case, first, the window query is applied and, subsequently, all points within the window area are checked w.r.t. a (potentially more complex) region filter polygon.

Finally, the results (histogram and statistics) are stored as a complex object separately for each attribute. For the Python and C++ class implementations the results are directly accessible for further use (e.g. to decide on further processing steps based on statistical measures). Beyond that, graphical output (parameter plotFile) is provided as Scalable Vector Graphics which can be displayed in standard web browsers (Firefox, Opera, IE...). If an output parameter file (parameter outParamFile) is specified, the (numerical) results are additionally written to an XML file.

Parameter description

-inFileinput files (ODM or GDAL grid/raster)

Type: opals::Vector<opals::OdmOrRasterFile>
Remarks: mandatory
Specifies the data source(s) for the histogram calculation. Either OPALS Datamanager (ODM) files or a grid/raster files in in GDAL supported format are accepted.
OdmOrRasterFile: File path of an ODM or raster dataset.
See OPALS Datamanager and Supported grid/raster formats

-attributeattribute(s) for histogram calculation

Type: opals::Vector<opals::String>
Remarks: default=z
Specifies the attribute(s) for which the histogram is to be derived. Accepts ODM attribute names (e.g. z(default), Amplitude, EchoWidth etc.) or raster band number(s) and name(s) (e.g. 0 (default), 1...), respectively.

-histogramgeneric histogram output object

Type: opals::Vector<opals::HistoStats>
Remarks: output
The results of the histogram calculation are provided as list of generic objects containing statistical information like: min, max, mean, median..., and the histogram classes each containing its relative frequency. For each attribute a separate histogram is provided. Specify -outParamFile in order to store these results to an XML file.

-nBinsnumber of histogram bins (classes)

Type: uint32
Remarks: default=20
The (equidistant) bin width is calculated automatically, so that the specified number of bins fit into the sample range. Ignored if 'binWidth' is specified.

-binWidthwidth of a single histogram bin (class)

Type: float
Remarks: optional
If specified, the histogram is divided into equidistant classes (bins) of the given interval

-sampleRangehistogram sample range (min max)

Type: opals::Array<opals::AbsValueOrQuantile, 2>
Remarks: default=q:0.02 q:0.98
Range (min max) of sample (attribute) values considered for histogram calculation denoting the outermost bounds (i.e.: min=lower bound of first class, max=upper bound of last class. Additionally, under/overflow classes are provided to collect all samples outside the valid range. If not specified, the entire sample range of the specified ODM attribute or grid model is used.
AbsValueOrQuantile: Defines an absolute value or a relative value as quantile (between 0 and 1) or percentage (0-100). Quantiles are specified with a leading 'q:' and percentages with a trailing '' character. Furthermore, 'min' and 'max' as equivalent to 'q:0' and 'q:1' are also supported.
Syntax: <absValue> | q:<quantile-value> | <percentage>% | min | max

-densityRangehistogram density range [%] (min max)

Type: opals::Array<float, 2>
Remarks: optional
By default, the y-scale of the histogram, i.e. density [%], is derived automatically based on the maximum density class. A fixed density range can be forced by specifying -denityRange with a lower and upper bound (min,max). This is especially convenient for a visual comparison of multiple histogram plots.

-limitdata limit window: left lower right upper

Type: opals::Array<double, 4>
Remarks: optional
If specified, only data within the given window area (left lower right upper) are considered for histogram calculations.

-filtermodify the input using a (tree of) filter(s)

Type: opals::String
Remarks: optional
A filter string in EBNF syntax as described in Section 'Filter syntax' can be passed to restrict the set of input points (e.g. to consider a boundary polygon, etc. ). For raster/grid inputs the filter is evaluated for each valid grid post (3D-point: x, y, grid value). Thus, only certain filters (Affine, Region, Generic-Z filters) are applicable to raster/grid inputs.

-plotFileplot file

Type: opals::Vector<opals::Path>
Remarks: mandatory
The histogram is additionally plotted as a Scalable Vector Graphics file. SVG files can be displayed by web browsers like Firefox, IE, Opera.... The creation of a plot file can be suppressed by specifying: -plot file off Estimation rule: ODM: current directory and Name of inFile + '_<attribute>_histo.svg', Grids/Rasters: Current directory and Name of inFile + '_histo.svg'

-probabilitiesprobabilities for quantile calculation

Type: opals::Vector<double>
Remarks: default=0.01 0.05 0.25 0.75 0.9 0.95
A list (vector) of user-defined probabilities (0 <= p <= 1) values can be specified. The respective quantiles are calculated by the module.

-procModereal or integer processing mode

Type: opals::HistoMode
Remarks: default=automatic
For features aggregated into bins like minority and majority, the bin center is used as representative value. The same applies to procMode=integer, where the bin limits are shifted by half the bin size so that the corresponding integer value is located in the bin center. With procMode=real the bin limits are mutiples of the bin sizes, which is suitable for processing features of floating point (real) type. In automatic mode the module selects real or integer procMode based on the attribute type.

-exactComputationuse exact median/quantile computation

Type: bool
Remarks: default=0
In exact computation mode all data values have to fit into memory

-skipEmptyBinsskip empty bin entries

Type: bool
Remarks: default=0
If enabled, empty bin entries are skipped and therefore not stored or visualised in the result. This can be appropriate for discreet, non-continuous attributes like e.g. classification values.

-distributionFuncplot reference distribution

Type: opals::DistributionDescription
Remarks: optional
For visual comparison the defined reference distribution is plotted into the histogram and cumulative histogram if activated. Depending on the distribution one or two additional parameters are required for definition. Beside numeric values, one can use statistical literals for parameters which are computed during histrogram computation, e.g. 'normal(mean,stddev)', 'normal(median,stdDevMAD)', etc. In particular, the parameters 'mean' and 'stddev' are automatically added for normal/gaussian distribution, if no parameters were specified.
DistributionDescription: Description on a normal(=gaussian), chiSquare, exponential, fisher, poisson, rayleigh or students distribution.
Syntax: normal|gaussian|chiSquare|exponential|fisher|poisson|rayleigh|students ( <value1>|max|mean|median|mode|stdDev|stdDevMAD|rms|skewness, [<value2>|...] )

-cumulativeHistogramappend cumulative histogram

Type: bool
Remarks: default=0
When activated the module also outputs the cumulative histogram.

Examples

The data used in the following examples are located in the $OPALS_ROOT/demo/ directory. Example 1 shows several histogram variants based on an ALS point cloud (fullwave.odm) whereas example 2 features histograms of regular grids (stip19/20.tif).

Example 1: ODM attribute histograms

As a prerequisite for the following example, the ALS point cloud data must be imported into the ODM. To achieve that, change to the demo directory and type:

opalsImport -inFile fullwave.fwf -iFormat FWF

Now, we are ready to derive histograms based on the resulting ODM featuring the 3D-coordinates (x, y, z) and additional attributes (x, y, z, gps time, amplitude, echo width, echo nr, echo qualifier) for each point. The following example demonstrates how to to analyze different point attributes in the form of histograms:

opalsHisto -inFile fullwave.odm -binWidth 1 -outParamFile histo.xml
opalsHisto -inFile fullwave.odm -attribute EchoNumber -sampleRange 1 5 -binWidth 1
opalsHisto -inFile fullwave.odm -attribute Amplitude -sampleRange 0 200
opalsHisto -inFile fullwave.odm -attribute EchoWidth -sampleRange 1 8

The following SVG plot files are created:

fullwave_z_histo.svg
fullwave_EchoNumber_histo.svg
fullwave_Amplitude_histo.svg
fullwave_EchoWidth_histo.svg

Please note, that the names of the respective SVG plot files (not specified explicitly in the above examples) have been estimated from the input file and attribute names. The results are shown in Figure 1.

Fig. 1: Histograms of ALS point attributes: z (upper left), Echo number (upper right), Amplitude (lower left) and EchoWidth (lower right)

A numerical representation of the histogram and statistics can be obtained by specifying an output parameter file as shown in the first example (parameter outParamFile). The resulting XML file histo.xml contains the following output (excerpt):

[...]
<Module>
    <Name>opalsHisto</Name>
    <Version>1.0.0.0</Version>
    <License>[...]
    <Parameters>
        <Parameter Name='inFile'><Val>fullwave.odm</Val></Parameter>
        <Parameter Name='attribute'><Val>z</Val></Parameter>
        <Parameter Name='histogram'><Val>Histogram[Label[z]
BinWidth[1.000]
Bins[(280.989,2984) (281.989,1023) (282.989,1131) (283.989,1157) (284.989,1215) (285.989,1275)
(286.989,1324) (287.989,1394) (288.989,1426) (289.989,1502) (290.989,1510) (291.989,1695)
(292.989,1819) (293.989,1721) (294.989,1791) (295.989,1933) (296.989,1896) (297.989,1956)
(298.989,1958) (299.989,2059) (300.989,1958) (301.989,1901) (302.989,1985) (303.989,1836)
(304.989,1799) (305.989,1740) (306.989,1639) (307.989,1652) (308.989,1636) (309.989,1609)
(310.989,1648) (311.989,1650) (312.989,1624) (313.989,1604) (314.989,1540) (315.989,1506)
(316.989,1421) (317.989,1286) (318.989,1239) (319.989,1156) (320.989,671) (321.989,1544) ]
CountData[67413]
CountUsed[67413]
Min[275.535]
Max[328.500]
Mean[301.512]
Median[301.464]
Mode[300.489]
StdDev[11.685]
StdDevMAD[13.669]
Rms[301.738]
Skewness[-0.044]
CalcMode[real]
ExcateComputation[0]
Quantiles[(0.010,278.430) (0.050,282.238) (0.250,292.189) (0.750,312.619)
(0.900,318.009) (0.950,320.754) ]]<Val></Parameter>
        <Parameter Name='binWidth'><Val>1</Val></Parameter>
        <Parameter Name='nBins'><Val>50</Val></Parameter>
        <Parameter Name='plotFile'><Val>fullwave_z_histo.svg</Val></Parameter>
    </Parameters>
</Module>

Example 2: Raster dataset histogram

Example 2 shows how to derive histograms of grid datasets. To generate the underlying grids, please perform the following preprocessing steps:

opalsImport -inFile strip11.laz
opalsImport -inFile strip21.laz
opalsGrid -inFile strip11.odm -outFile strip11.tif -gridSize 1 -interpolation movingPlanes -feature sigmaZ excen
opalsGrid -inFile strip21.odm -outFile strip21.tif -gridSize 1 -interpolation movingPlanes -feature sigmaZ excen
opalsDiff -inFile strip11.tif strip21.tif -outf diff_11_21.tif
opalsAlgebra -inFile strip11_sigmaZ.tif strip11_excen.tif -outf strip11_mask.tif -formula "r[0] && r[1] && r[0]<0.1 && r[1]<0.8"
opalsAlgebra -inFile strip21_sigmaZ.tif strip21_excen.tif -outf strip21_mask.tif -formula "r[0] && r[1] && r[0]<0.1 && r[1]<0.8"
opalsAlgebra -inFile strip11_mask.tif strip21_mask.tif  -outf diff_11_21_mask.tif    -formula "r[0] && r[1] ? r[0]*r[1] : invalid"
opalsAlgebra -inFile diff_11_21.tif diff_11_21_mask.tif -outf diff_11_21_masked.tif  -formula "r[0] && r[1] ? r[0]*r[1] : invalid"

This procedure, first, imports the data of strips 11 and 21 (strip11/21.las) into separate ODM files, and generates surface models as well as additional "sigmaZ" (=smoothness) and "excenter" (=extrapolation/occlusion) layers for each strip (strip11.tif, strip11_sigmaZ.tif, strip11_excen.tif). Subsequently, a strip difference model (diff_11_21.tif) and a respective grid mask (diff_11_21_mask.tif) are derived. The examples below (results c.f. Figure 2) show the distribution of the grid heights and excenter layer of the surface model of strip 11. Furthermore, the height differences between strip 11 and 21 are analyzed in two variants, one using all available height differences in the strip overlap area, an the other one considering a grid mask to exclude all rough and occluded pixels.

opalsHisto -inFile strip11.tif -nBins 75
opalsHisto -inFile strip11_excen.tif -binwidth 0.1 -sampleRang 0 1 -plotfile strip11_excen_histo.svg
opalsHisto -inFile diff_11_21.tif -sampleRange -0.5 0.5 -plotFile diff_11_21_unmasked.svg
opalsHisto -inFile diff_11_21_masked.tif -sampleRange -0.5 0.5 -plotFile diff_11_21_masked.svg

Please further note, that wild cards are supported in case of multiple input datasets.

Fig. 2: Histograms of grid datasets: DSM heights (upper left), excenter layer (upper right), Unmasked strip differences (lower left), Masked strip differences (lower right)

Example 3: Standard sample range vs. full sample range

By default the module uses the 2% and 98% quantile to limit the histogram. This typically moves outliers to the underflow and overflow bins and presents the relevant data with appropriate resolution. The follow example shows the effect on the z and amplitude attribute for demo dataset strip11.

opalsHisto -inf strip11.odm -attr z amplitude -plotfile std_z_histo.svg std_Amplitude_histo.svg

opalsHisto -inf strip11.odm -attr z amplitude -samplerange min max -plotfile full_z_histo.svg full_Amplitude_histo.svg

Since the dataset contains long ranges and high amplitude values, the full range histograms are not very meaningful (upper images of figure 3) whereas the distribution of the attributes is nicely visable using the standard sample range (lower images of figure 3)

Fig. 3: Full range histograms (upper images) and standard sample range using the 2% and 98% quantiles (lower images)

Example 4: Histogram access via python

The main benefit of running Module Histo in Python is that the histogram as well as all standard statistics (min, mean, median, ...) are directly accessible after running the module via the Python API. This is exemplified in the sample script $OPALS_ROOT/demo/histoDemo.py:

 import opals
 from opals import Histo, Import
  
 imp=Import.Import(inFile="fullwave.fwf").run()
 his=Histo.Histo()
 his.commons.screenLogLevel = opals.Types.LogLevel.error
 his.inFile = ["fullwave.odm"]
 his.probabilities = [0.01,0.05,0.10,0.90,0.95,0.99]
 his.sampleRange=[280,320]
 his.binWidth=5
 his.plotFile="off"  # do not create a svg file
 his.run()
 stats = his.histogram[0]
 print("Selected height statistics of dataset: {}".format(his.inFile[0]))
 print("Min   : {:0.2f}".format(stats.getMin()))
 print("Median: {:0.2f}".format(stats.getMedian()))
 print("Max   : {:0.2f}".format(stats.getMax()))
 print("q0.95 : {:0.2f}".format(stats.getQuantiles()[4][1]))
 print("\nHistogram: lower height bound [m] - abs. #samples : rel. #samples [%]")
 bAbs, bRel = stats.getAbsBins(),stats.getBins()
 for i in range(len(bAbs)):
     print("  {:0.1f} - {:d}: {:0.2f}".format(bAbs[i][0], bAbs[i][1], bRel[i][1]*100.))

To run the script, type:

python histoDemo.py

The script queries the output parameter histogram provided by Module Histo as a complex Python type (class) and uses its access functions to print the min, median, and 95% height quantile of the dataset <fullwave.odm>. The following output is printed to the screen:

Selected height statistics of dataset: fullwave.odm
 
Min   :  275.53
Median:  301.46
Max   :  328.50
q0.95 :  320.36
 
Histogram: lower height bound [m] - abs. #samples : rel. #samples [%]
 
   275.0  -  1498 : 2.22
   280.0  -  4811 : 7.14
   285.0  -  6643 : 9.85
   290.0  -  8248 : 12.24
   295.0  -  9535 : 14.14
   300.0  -  9737 : 14.44
   305.0  -  8464 : 12.56
   310.0  -  8134 : 12.07
   315.0  -  6983 : 10.36
   320.0  -  3360 : 4.98

Example 5: Non-continuous histograms

Example 5 demonstrates the skipEmptyBins parameter and its effect on the histogram. For non-continuous (integer) attributes like classification values or the like, a continuous histogram that covers the full data range might be inappropriate since the magnitude of those values is typically irrelevant. Activating the skipEmptyBins parameter as shown below, only plots occupied bins next to each other despite of the bin borders. To make non-continuous histogram easily recognizable, a uniform gab between all bars are inserted.

opalsImport -inFile classify/niederrhein.laz -outFile niederrhein.odm
opalsHisto -inFile niederrhein.odm -attr classification -sampleRange min max -plotFile continuous_histo.svg
opalsHisto -inFile niederrhein.odm -attr classification -sampleRange min max -skipEmptyBins 1 -plotFile non-continuous_histo.svg

Fig. 4: Histogram of classification values in standard, continuous mode (left) and skipEmptyBins, non-continuous mode (right)

The skipEmptyBins parameter not only influences the visual presentation of the histogram but also the actual bin table within the OPALS log as well as the python histogram object.

Fig. 5: Histogram bins of classification values in standard, continuous mode (left) and skipEmptyBins, non-continuous mode (right)

Example 6: Generate a quantile-quantile plot

With a few lines of python code it is straightforward to generate a Q-Q plot. Plotting quantiles of a theoretical distribution against computed quantiles of a data set, the equality of the two probability distributions can be visually analyzed. In this example the height differences of strip11 and strip21 (as computed above and written to file diff_11_21.tif) are compared against the normal distribution. Run the Q-Q-plotDemo.py demo script with the following parameters

python Q-Q-plotDemo.py diff_11_21.tif 1

to see figure 6 on the screen. When generating Q-Q plots it is recommended to compute exact quantiles by activating the exactComputation parameter (see General description).

Fig. 6: Q-Q plot of strip difference (strip11 and strip21) compared to normal distribution

Example 7: Histogram with superimposed plot of theoretical density function

Sometimes it is of interest to see if the values of a given sample follow a certain theoretical distribution like the normal one. In this example, we will check if the difference of the strips strip11 and strip21 follow the normal distribution. For this, we will limit the analysis to smooth objects, like streets or roofs, because the differences at rough objects, like (high) vegetation, will not obey the very same distribution. We assume grid cells to be smooth if sigma0 of their moving planes interpolation is below 10 cm, which is realised using a mask.

opalsImport -inf strip11.laz
opalsImport -inf strip21.laz
opalsGrid -inf strip11.odm -inter movingPlanes  -feature sigma0
opalsGrid -inf strip21.odm -inter movingPlanes  -feature sigma0
opalsAlgebra -inFile strip21_z_sigma0.tif strip11_z_sigma0.tif -outFile mask.tif -form "r[0] < 0.1 && r[1] < 0.1 ? 255 : 0"  -data_type_grid uint8 -nodata none
opalsAlgebra -inFile strip11_z.tif  strip21_z.tif  mask.tif  -outFile diff_11_21_masked.tif  -form "r[2] > 0 ? r[0]-r[1] : invalid"
 
opalsZColor -inf diff_11_21_masked.tif -palFile differencePal.xml -sca 0.02

Figure 7 below shows the color coded masked strip difference, limited to smooth areas.

Fig. 7: Color coded masked strip difference, limited to smooth areas; units are in cm.

With the following command the histogram of these differences is plotted together with the theoretical normal distribution based on the mean and the standard deviation derived from these differences, which are -0.009 m and 0.309 m, respectively.

opalsHisto -inf diff_11_21_masked.tif -sampleRange -1 1 -distributionFunc normal

Figure 8 (left) shows the result and one can clearly see, that the density curve of the normal distribution based on mean and the standard deviation from sample do not fit together at all. The reason for this is visible in figure 7. Some smooth cells have very large differences, which obviously do not belong to the same distribution as most of the other smooth cells. These outlier cells mostly appear at moving objects on the streets where e.g. in one strip the street surface is measured whereas in the other strip at the very same location the roof of a car was measured. In order to exclude these outliers from deriving the defining parameters of the normal distribution, one either has to fine-tune the mask (e.g. by manual inspection and correction) or one simply uses robust statistics. For the normal distribution, the median is a robust estimate of the mean and the MAD-based standard deviation is a robust estimate of the standard deviation. Using these values, which are -0.050 m and 0.011 m, as defining parameters for the theoretical normal distribution in the opalsHisto call is done in the following way:

opalsHisto -inf diff_11_21_masked.tif -sampleRange -0.1 0.1 -distributionFunc "normal(median,stdDevMAD)"

Figure 8 (right) has the result, which clearly shows that the theoretical normal distribution based on these robustly estimated values represents the strip differsence very well.

Fig. 8: Histogram with superimposed normal distribution with defining parameters Mean and StdDev (left), and Median and StdDevMAD (right).

Example 8: Histogram combined with the cumulative version

The cumulative histogram can be useful to (roughly) estimate quantiles of a given sample. In this example we'll compute the point density (based on last echoes) for the strip11.

opalsImport -inf strip11.laz

opalsCell -inf strip11.odm -feature pdens -cell 2 -filter "Echo[Last]"

Which point density is provided by the data of strip11? In order to answer that question, one has to derive a certain statistic. For this the histogram can be useful:

opalsHisto -inf strip11_z_pdens.tif -sampleRange 0 10 -cumulativeHistogram 1

The resulting histogram with the cumulative version is shown in figure 9.

Fig. 9: Ordinary (left) and cumulative histogram (right) of the point density of strip11. The meaning of the red and green lines is explained in the main text.

In an heuristic approach one could use the derived median (6.75 points/ $m^2$) and say that the point density of strip11 is roughly 7 points/ $m^2$.

The density palette for color coding is motivated by the traffic light paradigm: green indicates ok, yellow means attention, and red indicates stop. Consequently, the color coding of the point density of strip11 using 7 points/ $m^2$ as reference value

opalsZColor -inf strip11_z_pdens.tif -palFile densityPal.xml -sca 7

would look like:

Fig. 10: Color coding of the point density using 7 points/m^2 as reference value.

Judging by the large areas colored in yellow, orange and red, this data, obviously, does not provide 7 points/ $m^2$ everywhere. Actually, since the color coding is based on the median value, only 50% of the area will appear in green tones.

Figure 10 clearly shows, that the point density in this section of this strip (which was flown roughly in south-east direction) is not homogenous. One can see scan shadows close to the buildings and two larger regions with low density (in yellow and orange color, respectively) orthogonal to the flight direction. The latter low density areas are caused by aircraft motions (e.g. wind-induced pitch variations).

Because of these density variations, one cannot expect the same point density everywhere. Consequently, one must be tolerant and willing to accept a certain portion of the area to be less dense. And this is where quantiles are helpful. If we are asking for the point density that is guaranteed in e.g. 95% of the area, i.e. we are willing to accept 5% of low density, then we ask for the 0.05 quantile (or 5-th percentile). Using the cumulative histogram in figure 9, these quantiles can be roughly estimated: Following the red line the 0.05 quantile is roughly 3 points/ $m^2$. If we would be willing to accept 10% of low density, then the 0.10 quantile would be around 4 points/ $m^2$ (indicated by the green line in figure 9).

In case of an accepted low density portion of 5% the color coding would look like:

opalsZColor -inf strip11_z_pdens.tif -palFile densityPal.xml -sca 3

Fig. 11: Color coding of the point density using 3 points/m^2 as reference value.

The prevailing green tones in this color coding reflect our choice that 95% of the area guarantees 3 points/ $m^2$.

By the shape of the cumulative histogram we also get an idea of how critical our choice of the acceptable low density portion is.

Addendum: If we know right from the beginning, that we will only tolerate a low density portion of e.g. 3%, then we can call opalsHisto with the probabilities parameter:

opalsHisto -inf strip11_z_pdens.tif -sampleRange 0 10 -cumulativeHistogram 1 -exactComputation 1 -probabilities 0.01 0.03 0.05 0.1

In this case it might by good to initiate the exactComputation mode instead of the default approximate computation. The differences in this example are small:

Quantiles (exact computation)
q     0.010 0.030 0.050 0.100
pdens 1.500 2.500 3.000 4.000
 
Quantiles (non-exact computation)
q     0.010 0.030 0.050 0.100
pdens 1.489 2.475 3.045 4.233

References

Ressl, C., Kager, H. and Mandlburger, G., 2008. Quality checking of ALS projects using statistics of strip differences. In: IAPRS, XXXVII, pp. 25399860.

R. Jain and I. Chlamtac, The $P^2$ algorithm for dynamic calculation of quantiles and histograms without storing observations, Communications of the ACM, Volume 28 (October), Number 10, 1985, p. 1076-1085.

Table of Contents