Generic filter

Generic filter is one of the Filters provided by OPALS to filter vector data. Among them, generic filter is a particularly flexible one: it evaluates arbitrarily complex expressions composed of any of the supported attribute identifiers (see ODM predefined attributes), coordinate identifiers, raster identifiers, neighbor geometry identifiers, constants, scalar-valued unary and binary functions, and a range of arithmetic, relational, logical, conditional, and assignment operators.

For each object to filter, generic filter (generally) replaces the attribute identifiers contained in the expression with the object's corresponding attribute values, and the coordinate identifiers with the object's coordinates. If the result of the subsequent evaluation converts to true, the object passes through the generic filter.

The expression syntax not only provides means to use attribute and coordinate values, but also to test the presence of attributes and the validity of coordinates.

Note that filtering by coordinates is applicable to point data only, i.e. lines never pass such a filter. Depending on the OPALS module, the set of admissible operators may be limited.

In the following, the expression syntax is described twice: once informally, including some exemplary expressions (see Informal syntax definition with examples), and once formally (see Formal grammar).

# Informal syntax definition with examples

## Attributes, Coordinates, and Constants

Generic filter expressions constitute rooted binary trees, where the leaves are either attribute identifiers, coordinate identifiers, or constants. If the root node is an attribute identifier, then the evaluation returns the presence of the attribute, not its value. Likewise, if the root node is a coordinate identifier, then the evaluation returns the validity of the coordinate (and not its value). Thus,
Generic[SigmaZ]
is a valid generic filter consisting of a leaf only: objects that feature the specified attribute pass through, regardless of the attribute's value.
Generic[Z]
is a valid generic filter, too. Objects featuring valid z-coordinates pass through.

## Arithmetic Expressions

Leaves may be combined into arithmetic expressions using the respective operators and functions (see the table below), observing the usual operator precedence:
Generic[0.1 + 2*SigmaZ]
is a generic filter consisting of 3 leaves and 2 binary operators: as the attribute identifier is not the root node, the corresponding attribute value is evaluated. Objects that feature the specified attribute and for whose attribute value 0.1 + 2*SigmaZ converts to true (i.e. the result is different from zero) pass through. As one may expect, the attribute presence and value are evaluated first, then the multiplication is performed, followed by the addition, and finally, the resulting real number is converted to a boolean value.
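The difference between a leaf at the root and a leaf inside an arithmetic expression can be sketched in Python. This is an illustration only, not OPALS code: modelling an object as a dict of attribute values, as well as both function names, are assumptions made for this example.

```python
def root_leaf_passes(attributes: dict, name: str) -> bool:
    """Generic[SigmaZ]: a leaf at the root tests presence, not value."""
    return name in attributes


def arithmetic_passes(attributes: dict) -> bool:
    """Generic[0.1 + 2*SigmaZ]: the attribute value is used here."""
    sigma_z = attributes.get("SigmaZ")
    if sigma_z is None:
        # The attribute is missing, so the object cannot pass.
        return False
    # Multiplication first, then addition, then conversion to boolean:
    # any result different from zero converts to true.
    return (0.1 + 2 * sigma_z) != 0.0
```

An object with SigmaZ equal to 0 passes both filters in this sketch: the attribute is present, and 0.1 + 2*0 is different from zero.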

## Comparative Expressions

Leaves and/or arithmetic expressions may further be combined into comparative expressions using the respective operators:
Generic[SigmaZ < 0.1]
Generic[2*SigmaX < SigmaY]
Generic[PointLabel == "Vienna"]
Generic[Z > 100]

## Logical Expressions

Leaves, arithmetic expressions, and comparative expressions may be combined into logical expressions using the operators already mentioned in Filters:
Generic[SigmaZ < 0.1 AND PointLabel == "Vienna"]
lets data pass through that feature both the attributes SigmaZ and PointLabel, and for which both comparisons evaluate to true.
Generic[SigmaZ < 0.1 OR PointLabel == "Vienna"]
lets data pass through that either feature the attribute SigmaZ, and whose value is smaller than 0.1, or data that feature the attribute PointLabel, and whose value is Vienna, or data that meet both conditions.
Generic[Z >= 0 AND Z <= 100]
lets (point) data pass through that feature valid z-coordinates within the (inclusive) interval [0, 100].

Like the root node, logical operators show a special behaviour concerning the evaluation of attribute-leaves if applied directly as operands:
Generic[SigmaX AND SigmaY]
does not evaluate any attribute value, but combines the presence of the 2 attributes, i.e. data pass that provide both attributes, regardless of their values. In
Generic[SigmaX OR SigmaY]
objects that feature one or both of the specified attributes pass through, while in
Generic[SigmaX OR SigmaY < 0.1]
objects pass through that either provide the attribute corresponding to SigmaX, or hold SigmaY, with a value smaller than 0.1. Data that meet both conditions are also passed through.
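The mixed case of the last example may be easier to follow as a Python sketch. It assumes, for illustration only, that an object is a dict of attribute values; the function name is hypothetical and not part of OPALS.

```python
def sigma_x_or_sigma_y_small(attributes: dict) -> bool:
    """Sketch of Generic[SigmaX OR SigmaY < 0.1]."""
    # SigmaX is a direct operand of OR: only its presence is tested.
    has_sigma_x = "SigmaX" in attributes
    # SigmaY is an operand of '<': its value is needed, so it must exist.
    sigma_y = attributes.get("SigmaY")
    sigma_y_small = sigma_y is not None and sigma_y < 0.1
    return has_sigma_x or sigma_y_small
```

An object holding only SigmaX passes regardless of its value, while an object holding only SigmaY passes just if that value is smaller than 0.1.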

## Conditional Expressions

Logical expressions may serve as a condition for the ternary conditional operator. This operator evaluates to its 2nd argument if the condition (1st argument) evaluates to true, and to its 3rd argument otherwise:
Generic[x >= 0 ? y : z ]
lets data pass through whose x-coordinates are larger than or equal to zero and that have valid y-coordinates; otherwise, it lets those pass that have valid z-coordinates.
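As a rough Python sketch of this example: coordinates are modelled as floats, and a coordinate counts as valid if it is finite. That an invalid x-coordinate makes the object fail is an assumption of this sketch and not stated above; the function names are hypothetical.

```python
import math


def is_valid(coordinate) -> bool:
    """A coordinate is treated as valid if it is a finite number."""
    return coordinate is not None and math.isfinite(coordinate)


def conditional_passes(x, y, z) -> bool:
    """Sketch of Generic[x >= 0 ? y : z]."""
    if not is_valid(x):
        # Assumption: the condition cannot be evaluated, so nothing passes.
        return False
    # y and z are leaves directly below the operator: validity is tested.
    return is_valid(y) if x >= 0 else is_valid(z)
```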

Kindly note that currently, it is not possible to use parentheses within the condition.

## Assignments

Finally, the assignment operator is supported, which may e.g. serve for setting attributes during import/export:
Generic[amplitude = 100*_myIntensity]
The assignment operator always assigns to its 1st argument its 2nd argument, regardless of the 2nd argument's validity, and it lets all data pass through. Thus, in the above example, for objects that feature the user-defined attribute _myIntensity, this user-defined attribute times 100 is assigned to the (predefined) attribute amplitude. amplitude results as invalid for objects that do not feature _myIntensity, no matter if amplitude was valid before or not. Regardless of the resulting validity of amplitude, all objects pass through the filter.
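The assignment semantics described above can be sketched in Python as follows. Modelling an object as a dict and an invalid attribute as a missing key are assumptions of this sketch, and the function name is hypothetical.

```python
def assign_amplitude(attributes: dict) -> bool:
    """Sketch of Generic[amplitude = 100*_myIntensity]."""
    intensity = attributes.get("_myIntensity")
    if intensity is None:
        # The right-hand side is invalid, so amplitude becomes invalid,
        # even if it held a valid value before.
        attributes.pop("amplitude", None)
    else:
        attributes["amplitude"] = 100 * intensity
    # An assignment lets every object pass through.
    return True
```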

Multiple assignments may be specified, separated by a semicolon.

For user-defined attributes, their data type may explicitly be given enclosed in parentheses, following the user-defined attribute identifier:
Generic[_myQuality(float) = pow( SigmaX*SigmaY*SigmaZ, 1/3 ) ]
For objects that feature the predefined attributes SigmaX, SigmaY, and SigmaZ, this filter assigns to the user-defined attribute _myQuality of data type float a valid value. For all other objects, it renders the attribute _myQuality as invalid.

## Statistical operations

Generic filters support a set of statistical operators via the general syntax:
operator(.)
where operator is one of the identifiers that belong to rule Stat: min, max, mean, sum, stdDev, ... . With this syntax, statistical operators are applied to the valid items among the supplied ones. These items can either be the

• supplied rasters, denoted by r, or the
• supplied neighbor geometries, denoted by n. In this case, either a coordinate or an attribute identifier to be evaluated must follow, separated by a dot. The exception to this rule is count(n), which counts the number of supplied neighbor geometries, independent of their coordinates and attributes.

As mentioned, only valid items among the supplied ones are considered in the statistical operations. Thus, e.g.
Generic[ _myStat(float) = max(r) ]
determines the maximum value of all valid rasters (i.e. not NODATA) and assigns that maximum to the user-defined attribute _myStat of type float, while
Generic[ _myStat(float) = max(n.z) ]
does the same for the valid z-coordinates (i.e. of finite value) of all neighbor points.
count(n) considers all supplied neighbor geometries as valid, and hence, the following assigns the number of supplied geometries to the user-defined attribute _myStat:
Generic[ _myStat(float) = count(n) ]
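The following Python sketch illustrates how statistical operators skip invalid items while count(n) does not. Invalid values are modelled as None or non-finite floats, and an empty set of valid items yields None; both choices, as well as the function names, are assumptions of this sketch.

```python
import math


def valid_items(values):
    """Keep only the items that are present and finite."""
    return [v for v in values if v is not None and math.isfinite(v)]


def stat_max(values):
    """Maximum of the valid items, or None if there are none (assumption)."""
    valid = valid_items(values)
    return max(valid) if valid else None


# z-coordinates of three neighbor points, one of them invalid
neighbor_z = [12.5, float("nan"), 14.0]

max_z = stat_max(neighbor_z)   # like max(n.z): ignores the invalid value
count_n = len(neighbor_z)      # like count(n): counts every neighbor
```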
If only a subrange of the supplied rasters shall be evaluated, then use the following syntax, which follows the indexing syntax of Python sequence types:
operator(r[beg:end])
where beg is the zero-based index of the first supplied raster to be evaluated, and end is the index of one after the last supplied raster to be evaluated. Both beg and end are optional and default to 0 and the number of supplied rasters, respectively. Negative indices are added to the number of supplied rasters, and hence refer to the end of the array of supplied rasters.
Generic[ _myStat(float) = max(r[0:2]) ]
thus determines the maximum value of the valid rasters among the first 2 of all supplied rasters and assigns that maximum to the user-defined attribute _myStat of type float. The same indexing syntax can be used to select a subrange of neighbor geometries, e.g.:
Generic[ _myStat(float) = max(n[0:-3].SigmaX) ]
assigns the maximum value of the valid attributes SigmaX among the supplied neighbor geometries, excluding the last 3 neighbor geometries.
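Since the index ranges follow Python's slicing rules, the selection can be tried out directly in Python. The raster and attribute values below are made up for illustration, with None standing for an invalid item; the helper name is hypothetical.

```python
def stat_max(values):
    """Maximum of the valid (not None) items; None if there are none."""
    valid = [v for v in values if v is not None]
    return max(valid) if valid else None


rasters = [3.0, None, 7.5, 1.0, 9.0]      # five supplied rasters
first_two = rasters[0:2]                  # like r[0:2]
max_first_two = stat_max(first_two)       # only 3.0 is valid here

sigma_x = [0.2, 0.5, 0.1, 0.9, 0.3]       # SigmaX of five neighbors
without_last_three = sigma_x[0:-3]        # like n[0:-3].SigmaX
max_sigma_x = stat_max(without_last_three)
```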

## Random number generators

Random numbers may be generated using the general syntax:
random( distribution, arg1 [, arg2] )
where distribution is one of the distribution names listed in column "Meaning" for rule Random: uniform_int, normal, ... . The distribution name must be followed by the distribution parameters, given in column "Symbol(s)", separated by commas. Hence,
random( normal, 0, 1 )
will generate real numbers distributed according to the standard normal distribution (with mean 0 and standard deviation 1). Note that combining the uniform integer distribution with an equality comparison provides a simple way of random subsampling. To e.g. select (on average) 1 out of 10 points, use:
random( uniform_int, 1, 10 ) == 1
For a different way to do subsampling, see Serial number generator.
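The effect of this kind of random subsampling can be simulated in Python. The sketch assumes that both bounds of uniform_int are inclusive, which matches Python's randint; the function name and the seed are arbitrary choices for this example.

```python
import random


def keep_point(rng: random.Random) -> bool:
    """Mimics random( uniform_int, 1, 10 ) == 1 for a single point."""
    # randint includes both bounds, so each of 1..10 is equally likely.
    return rng.randint(1, 10) == 1


rng = random.Random(42)          # fixed seed for a repeatable run
n_points = 10_000
n_kept = sum(keep_point(rng) for _ in range(n_points))
fraction_kept = n_kept / n_points
```

The fraction of kept points is only approximately 1/10, because every point is decided independently.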

## Serial number generator

The enumeration of non-negative integer numbers in ascending order (0, 1, 2, ...) is gained with:
serial()
This offers a way to subsample data based on its order of evaluation. To e.g. import only every 4th geometry, one may call Module Import like this:
opalsImport -infile data.las -filter "Generic[fmod(serial(), 4) == 0]"
For a different subsampling method, see Random number generators.
When using this nullary function in a more complex filter tree, mind that each serial number generator holds its own position in the sequence of non-negative integers as state, and this position is advanced only when queried. This has implications when using

• multiple serial number generators in the same filter tree,
• Conditional Expressions , and

serial() + serial()
generates the sequence 0, 2, 4, ...

x >= 0 ? serial() : -1
returns -1 for points with negative x-coordinates. For other points, it enumerates the sequence 0, 1, 2, ..., irrespective of how many points with negative x-coordinates are evaluated in between them.
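Both effects can be reproduced with a small Python model in which every occurrence of serial() is a separate counter that advances only when it is called. The class name Serial and the sample x-coordinates are assumptions made for this sketch.

```python
from itertools import count


class Serial:
    """One serial number generator with its own position as state."""

    def __init__(self):
        self._counter = count()      # yields 0, 1, 2, ...

    def __call__(self) -> int:
        return next(self._counter)   # advances only when queried


# serial() + serial(): two generators, both queried for every object
left, right = Serial(), Serial()
sums = [left() + right() for _ in range(3)]

# x >= 0 ? serial() : -1: the generator is queried only if x >= 0
serial = Serial()
xs = [1.0, -2.0, 3.0, -4.0, 5.0]
labels = [serial() if x >= 0 else -1 for x in xs]

# fmod(serial(), 4) == 0: keeps every 4th object
every_fourth = Serial()
kept = [i for i in range(10) if every_fourth() % 4 == 0]
```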

## Restrictions on the composition of generic filters

Generic filter divides the attribute data types supported by the ODM into 2 groups:

1. numeric types
2. character strings

Internally, generic filter converts all numeric values to real numbers, and the full range of operators shown in the table below is supported for them. String-type attributes support the addition and relational operators only. Identifiers of numeric and character string attributes must not be mixed within the same arithmetic or relational expression, but they may be combined using logical operators. Hence,
Generic[SigmaX AND PointLabel]
is a valid expression, while
Generic[SigmaX + PointLabel]
is not.

## Generic filter leaves, operators, and functions

The following table comprises all tokens understood by generic filters. The precedence defines the order of evaluation: the lower the precedence number, the sooner the evaluation. Operators and functions of the same precedence are evaluated from left to right. Kindly note that all trigonometric functions use radians as angular measure.

OPALS generic filter leaves, operators, and functions

| Group | Prec. | Rule | Symbol(s) | Meaning |
| --- | --- | --- | --- | --- |
| arithmetic | 1 | Attribute | <predefined attribute identifier> or <user-defined attribute identifier> | depending on the parent node, evaluates to either the presence of the corresponding data attribute, or the attribute value. User-defined attribute identifiers may be single- or double-quoted. If quoted, they may contain white space. |
| | | Coordinate | x, y, z | depending on the parent node, evaluates to either the validity or the value of the corresponding data coordinate. Returns invalid values for non-point data. |
| | | Raster | r[i] | depending on the parent node, evaluates to either the validity or the value of the corresponding raster element. Indexing starts at 0. |
| | | Neighbor | n[i].attribute | depending on the parent node, evaluates to either the validity or the value of the neighbor coordinate/attribute. Indexing starts at 0. |
| | | Self | s | Used only as argument to rule NeighborBinary, to select the current geometry. This is the same as n[0] unless different processing and neighbor filters are used, as is possible with e.g. Module Normals. |
| | | Real | <string literal convertible to a real number> | leaf that evaluates to the passed number |
| | | String | <single or double quoted string literal> | leaf that evaluates to the passed string (unquoted) |
| | | Factor | - | unary minus |
| | | | + | unary plus, evaluates to its argument |
| | | | (...) | grouping operator |
| | | Constant | true | leaf that evaluates to the boolean value 'true' |
| | | | false | leaf that evaluates to the boolean value 'false' |
| | | | pi | leaf that evaluates to $$\pi$$, the ratio of a circle's area to the square of its radius |
| | | | invalid | leaf that evaluates to an invalid value |
| | | Nullary | serial() | evaluates to the enumeration of non-negative integers in ascending order: 0, 1, 2, ... See Serial number generator |
| | | Unary | abs(...) | unary function that evaluates to the absolute value of its argument |
| | | | acos(...) | unary function that evaluates to the arc cosine of its argument |
| | | | asin(...) | unary function that evaluates to the arc sine of its argument |
| | | | atan(...) | unary function that evaluates to the arc tangent of its argument, return values are in the range $$[-\pi/2,+\pi/2]$$ |
| | | | ceil(...) | unary function that evaluates to the smallest integer not less than its argument |
| | | | cos(...) | unary function that evaluates to the cosine of its argument |
| | | | cosh(...) | unary function that evaluates to the hyperbolic cosine of its argument |
| | | | deg2rad(...) | unary function that converts its argument from degrees to radians, i.e. it evaluates to $$\pi/180$$ times its argument |
| | | | exp(...) | the exponential function $$e^x$$, with the argument as x, i.e. Euler's number raised to the argument-th power |
| | | | floor(...) | unary function that evaluates to the largest integer not greater than its argument |
| | | | grad2rad(...) | unary function that converts its argument from gradians (gons) to radians, i.e. it evaluates to $$\pi/200$$ times its argument |
| | | | log(...) | unary function that evaluates to the natural (base $$e$$) logarithm of its argument |
| | | | log10(...) | unary function that evaluates to the base 10 logarithm of its argument |
| | | | rad2deg(...) | unary function that converts its argument from radians to degrees, i.e. it evaluates to $$180/\pi$$ times its argument |
| | | | rad2grad(...) | unary function that converts its argument from radians to gradians (gons), i.e. it evaluates to $$200/\pi$$ times its argument |
| | | | round(...) | unary function that evaluates to the integer closest to its argument. round(0.5) == 1, round(-0.5) == 0 |
| | | | sin(...) | unary function that evaluates to the sine of its argument |
| | | | sinh(...) | unary function that evaluates to the hyperbolic sine of its argument |
| | | | sqrt(...) | unary function that evaluates to the square root of its argument |
| | | | tan(...) | unary function that evaluates to the tangent of its argument |
| | | | tanh(...) | unary function that evaluates to the hyperbolic tangent of its argument |
| | | Binary | atan2(..., ...) | atan2(y,x) evaluates to the arc tangent of y/x, using the signs of the arguments to compute the quadrant of the return value that is in the range $$[-\pi,+\pi]$$ |
| | | | fmod(..., ...) | fmod(x,y) evaluates to the remainder of x/y |
| | | | ldexp(..., ...) | ldexp(num,exp) evaluates to $$num \cdot 2^{exp}$$ |
| | | | pow(..., ...) | pow(base,exp) evaluates to base raised to the exp-th power |
| | | NeighborBinary | SqrDist2D(n0, n1) | evaluates to the Euclidean distance squared, considering x- and y-coordinates only |
| | | | SqrDist3D(n0, n1) | evaluates to the Euclidean distance squared, considering x-, y- and z-coordinates |
| | | | Dist2D(n0, n1) | evaluates to the Euclidean distance, considering x- and y-coordinates only |
| | | | Dist3D(n0, n1) | evaluates to the Euclidean distance, considering x-, y- and z-coordinates |
| | | | Azimuth(n0, n1) | evaluates to the angle with the positive y-axis of the difference vector of 2 neighbors projected onto the x/y plane, counted clockwise, i.e. $$atan2( n1.x - n0.x, n1.y - n0.y)$$ |
| | | | ZenithDist(n0, n1) | evaluates to the angle with the positive z-axis of the difference vector of 2 neighbors, i.e. $$atan2( sqrt( pow(n1.x - n0.x, 2) + pow(n1.y - n0.y, 2) ), n1.z - n0.z )$$ |
| | | | Quadrant(n0, n1) | evaluates to the quadrant number (1-4) of n1 with n0 as coordinate system origin. The coordinate axes are included in the quadrant regions in a counterclockwise manner, i.e. the positive x-axis is part of quadrant 1, the positive y-axis is part of quadrant 2, etc. Furthermore, the origin is included in quadrant 1. |
| | | | Octant(n0, n1) | evaluates to the octant number (1-8) of n1 with n0 as coordinate system origin. The octant definition is based on the quadrant definition above. If n1.z < n0.z, then the quadrant value is increased by 4. |
| | | Stat | count( . ) | returns the number of valid items |
| | | | countUnique( . ) | returns the number of distinct, valid items |
| | | | first( . ) | returns the first valid item |
| | | | min( . ) | returns the minimum of valid items |
| | | | max( . ) | returns the maximum of valid items |
| | | | sum( . ) | returns the sum of valid items |
| | | | mean( . ) | returns the arithmetic mean of valid items |
| | | | median( . ) | returns the median of valid items |
| | | | rms( . ) | returns the root mean square of valid items as $$\sqrt{\sum_{i=1}^{N} r_i^2 / N}$$ with $$N$$ being the count of valid items |
| | | | stdDev( . ) | returns the standard deviation of valid items as $$\sqrt{\frac{1}{N-1} \sum_{i=1}^{N}(r_i - \bar{r})^2 }$$ with $$\bar{r}=\sum_{i=1}^{N}r_i/N$$ and $$N$$ being the count of valid items |
| | | | stdDevMAD( . ) | returns a robust estimation of the standard deviation of valid items: the median of the absolute values of the deviations from the sample median, scaled to make it a consistent estimator of the standard deviation of normally distributed data. It is computed as: $$1.4826 \cdot median_i(\lvert r_i - median_j(r_j)\rvert)$$ with $$i,j=1..N$$ and $$N$$ being the count of valid items |
| | | | minAbs( . ) | returns the minimum absolute value of all valid items: minAbs(-5,4,-1)=1 |
| | | | maxAbs( . ) | returns the maximum absolute value of all valid items: maxAbs(-5,4,-1)=5 |
| | | | meanAbs( . ) | returns the arithmetic mean of the absolute values of all valid items |
| | | | minAbsSigned( . ) | returns the signed minimum absolute value of all valid items: minAbsSigned(-5,4,-1)=-1 |
| | | | maxAbsSigned( . ) | returns the signed maximum absolute value of all valid items: maxAbsSigned(-5,4,-1)=-5 |
| | | Random | a, b | uniform_int |
| | | | a, b | uniform_real |
| | | | p | bernoulli |
| | | | t, p | binomial |
| | | | t, p | negative_binomial |
| | | | p | geometric |
| | | | $$\mu$$ | poisson |
| | | | $$\lambda$$ | exponential |
| | | | $$\alpha$$, $$\beta$$ | gamma |
| | | | a, b | weibull |
| | | | a, b | extreme_value |
| | | | $$\mu$$, $$\sigma$$ | normal |
| | | | m, s | lognormal |
| | | | n | chi_squared |
| | | | a, b | cauchy |
| | | | m, n | fisher_f |
| | | | n | student_t |
| | 2 | Term | * | binary operator that multiplies its operands |
| | | | / | binary operator that divides its left operand by its right operand |
| | 3 | Expression | + | binary operator that adds its operands |
| | | | - | binary operator that subtracts its right operand from its left operand |
| relational | 4 | Less | <= | binary operator that evaluates to true, iff its left operand is smaller than or equal to its right operand |
| | | | < | binary operator that evaluates to true, iff its left operand is smaller than its right operand |
| | | | >= | binary operator that evaluates to true, iff its left operand is greater than or equal to its right operand |
| | | | > | binary operator that evaluates to true, iff its left operand is greater than its right operand |
| | 5 | Equal | == | binary operator that evaluates to true, iff its left operand is equal to its right operand |
| | | | != | binary operator that evaluates to true, iff its left operand is not equal to its right operand |
| logical | 6 | Inverted | !, not | unary operator that evaluates to true, iff its operand evaluates to false |
| | | | (...) | logical grouping operator |
| | 7 | And | &&, and | binary operator that evaluates to true, iff both of its operands evaluate to true |
| | 8 | Or | \|\|, or | binary operator that evaluates to true, iff one or both of its operands evaluate to true |
| conditional | 9 | TernaryConditional | ? : | a ? b : c evaluates to b if a evaluates to true, otherwise evaluates to c |
| assignment | 10 | Assignment | = | assignment operator. Multiple assignments must be separated by ';'. To specify the data type of user-defined attributes to be assigned to, append the data type in parentheses. |
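
Some of the formulas listed in the table can be cross-checked with the following Python sketch. It is not the OPALS implementation: items are assumed to be already reduced to the valid ones, neighbors are modelled as (x, y, z) tuples, and all function names are chosen for this example. The treatment of the negative axes in quadrant() is an interpretation of the "counterclockwise" rule stated in the table.

```python
import math
from statistics import median


def rms(items):
    return math.sqrt(sum(v * v for v in items) / len(items))


def std_dev(items):
    n = len(items)
    mean = sum(items) / n
    return math.sqrt(sum((v - mean) ** 2 for v in items) / (n - 1))


def std_dev_mad(items):
    med = median(items)
    return 1.4826 * median([abs(v - med) for v in items])


def min_abs_signed(items):
    return min(items, key=abs)   # item with the smallest absolute value


def max_abs_signed(items):
    return max(items, key=abs)   # item with the largest absolute value


def azimuth(n0, n1):
    # clockwise angle from the positive y-axis, in radians
    return math.atan2(n1[0] - n0[0], n1[1] - n0[1])


def zenith_dist(n0, n1):
    dx, dy, dz = (b - a for a, b in zip(n0, n1))
    return math.atan2(math.hypot(dx, dy), dz)


def quadrant(n0, n1):
    dx, dy = n1[0] - n0[0], n1[1] - n0[1]
    if dx > 0 and dy >= 0:
        return 1                 # includes the positive x-axis
    if dx <= 0 and dy > 0:
        return 2                 # includes the positive y-axis
    if dx < 0 and dy <= 0:
        return 3                 # includes the negative x-axis
    if dx >= 0 and dy < 0:
        return 4                 # includes the negative y-axis
    return 1                     # n1 coincides with n0 in x and y


def octant(n0, n1):
    return quadrant(n0, n1) + (4 if n1[2] < n0[2] else 0)
```

Note how std_dev_mad is hardly affected by a single outlier, in contrast to std_dev.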


# Formal grammar

In the following, a formal definition of the generic filter syntax is given. For possible tokens denoting predefined attributes ( "PreDefinedAttribute" in rule Attribute), see ODM predefined attributes. For the production rule for real numbers ( "Real" in rule Constant), see Filter string syntax.

## Graphical representation

The generic filter syntax can be represented as railroad diagrams, one per rule. For an explanation of the symbols, see Railroad diagrams.

[Railroad diagrams not reproduced here. There is one diagram for each of the rules GenericFilter, Statement, Assignment, TernaryConditional, Or, And, Inverted, Equal, Less, Expression, Term, Factor, Attribute, UserDefinedAttribute, Coordinate, Constant, Nullary, Unary, Binary, Raster, Index, Neighbor, NeighborGeometry, SelfGeometry, Stat, IndexRange, NeighborBinary, and String. The diagrams were generated by Railroad Diagram Generator.]

## Representation in EBNF syntax

For the formal definition of the generic filter syntax in EBNF, the same notation as in Filter string syntax is used.

GenericFilter ::= Statement
Statement ::= Assignment ( ";" (Assignment)? )*
Assignment ::= TernaryConditional ( "=" Assignment )?
TernaryConditional ::= Or ( "?" TernaryConditional ":" TernaryConditional )?
Or ::= And ( ( "Or" | "||" ) And )*
And ::= Inverted ( ( "And" | "&&" ) Inverted )*
Inverted ::=
Equal
| "(" Or ")"
| ( "Not" | "!" ) Inverted
Equal ::=
Less
(
( "==" | "!=" ) Less
)?
Less ::=
Expression
(
( "<=" | "<" | ">=" | ">" ) Expression
)?
Expression ::=
Term
(
( "+" | "-" ) Term
)*
Term ::=
Factor
(
( "*" | "/" ) Factor
)*
Factor ::=
Attribute
| Coordinate
| Constant
| Nullary
| Unary
| Binary
| Raster
| Neighbor
| Stat
| NeighborBinary
| "(" Expression ")"
| ( "-" | "+" ) Factor
Attribute ::=
PreDefinedAttribute |
UserDefinedAttribute
UserDefinedAttribute ::=
"_" [a-zA-Z0-9]+ ( [._]+ [a-zA-Z0-9]+ )*
Coordinate ::=
"x"
| "y"
| "z"
Constant ::=
Real
| String
| "true"
| "false"
| "pi"
| "invalid"
Nullary ::=
"serial"
Unary ::=
"abs"
| "acos"
| "asin"
| "atan"
| "ceil"
| "cos"
| "cosh"
| "exp"
| "floor"
| "log"
| "log10"
| "round"
| "sin"
| "sinh"
| "sqrt"
| "tan"
| "tanh"
Binary ::=
"atan2"
| "fmod"
| "ldexp"
| "pow"
Raster ::=
"r" Index
Index ::=
"[" [0-9]+ "]"
Neighbor ::=
NeighborGeometry "." ( Coordinate | Attribute )
NeighborGeometry ::=
"n" Index
SelfGeometry ::=
"s"
Stat ::=
"count" "(" "n" IndexRange? ")"
| ( "count" | "countUnique" | "first" | "min" | "max" | "minAbs" | "maxAbs" | "minAbsSigned" | "maxAbsSigned" | "sum" | "mean" | "median" | "rms" | "stdDev" | "stdDevMAD" )
"(" ( ( "r" ( IndexRange )? )
| ( "n" ( IndexRange )? "." ( Coordinate | Attribute ) )
) ")"
IndexRange ::=
"[" ( [+-]? [0-9]+ )? ":" ( [+-]? [0-9]+ )? "]"
NeighborBinary ::=
( "SqrDist2D" | "SqrDist3D" | "Dist2D" | "Dist3D" | "Azimuth" | "ZenithDist" | "Quadrant" | "Octant")
"(" ( NeighborGeometry | SelfGeometry ) "," ( NeighborGeometry | SelfGeometry ) ")"
String ::=
"'" ([^#x27#x5C#xA#xD])* "'"
| '"' ([^#x22#x5C#xA#xD])* '"'
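
How the rules Expression, Term, and Factor encode operator precedence can be illustrated with a small recursive-descent evaluator in Python. This is a sketch of a subset of the grammar only (real numbers, identifiers looked up in a dict, the four arithmetic operators, unary signs, and parentheses); it is not the parser used by OPALS, and all names in it are assumptions made for this example.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens, values):
        self.tokens, self.pos, self.values = tokens, 0, values

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):   # Expression ::= Term ( ( "+" | "-" ) Term )*
        result = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                result += self.term()
            else:
                result -= self.term()
        return result

    def term(self):         # Term ::= Factor ( ( "*" | "/" ) Factor )*
        result = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                result *= self.factor()
            else:
                result /= self.factor()
        return result

    def factor(self):       # subset of rule Factor
        token = self.take()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "(":
            result = self.expression()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if token == "-":
            return -self.factor()
        if token == "+":
            return self.factor()
        if token in self.values:
            return self.values[token]   # attribute or coordinate value
        return float(token)             # real number literal


def evaluate(text, values=None):
    parser = Parser(tokenize(text), values or {})
    result = parser.expression()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return result
```

Because term() is called from within expression(), multiplication and division bind more tightly than addition and subtraction, and the loops make operators of equal precedence evaluate from left to right.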
Date
03.05.2013