OutlierDetection.jl

Score Helpers

OutlierDetection.jl provides many useful helper functions to work with outlier scores. The goal of these helpers is to normalize, combine and classify raw outlier scores. The main design philosophy behind all of these functions is that they transform a tuple of train/test scores into some different train/test tuple representation, e.g. train/test classes.

Transformers

In order to normalize scores or classify them, both the training and testing scores are necessary. We thus return a tuple of training and test scores in all transform calls. Transformers can make use of one or more such train/test tuples to convert them into normalized scores, probabilities or classes.
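The train/test tuple flow described above can be sketched in plain Julia. All names below (`sketch_transformer`, `normalize_by_train_max`, `combine_by_mean`) are hypothetical stand-ins for illustration, not part of OutlierDetection.jl. The sketch assumes each detector contributes one `(train, test)` tuple and that the result is again a single `(train, test)` tuple.

```julia
# Hypothetical sketch: none of these names are part of OutlierDetection.jl.

# Build a function that normalizes every (train, test) tuple and then
# combines the normalized tuples into a single (train, test) tuple.
sketch_transformer(; normalize, combine) =
    (tuples...) -> combine(map(normalize, tuples)...)

# Stand-in normalizer: divide by the largest training score, so the test
# scores are scaled with statistics taken from the training data only.
normalize_by_train_max((train, test)) =
    (train ./ maximum(train), test ./ maximum(train))

# Stand-in combiner: elementwise mean across detectors, computed
# separately for the train vectors and the test vectors.
combine_by_mean(tuples...) =
    (sum(first.(tuples)) ./ length(tuples), sum(last.(tuples)) ./ length(tuples))

transform = sketch_transformer(normalize = normalize_by_train_max,
                               combine = combine_by_mean)

detector_a = ([1.0, 2.0, 4.0], [2.0, 8.0])    # (train scores, test scores)
detector_b = ([10.0, 5.0, 5.0], [5.0, 10.0])

train, test = transform(detector_a, detector_b)
# train == [0.625, 0.5, 0.75], test == [0.5, 1.5]
```

The combined test scores can leave the range seen during training (here 1.5), because the scaling is fixed by the training scores. This is why the training scores have to travel alongside the test scores.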

`ScoreTransformer`

# OutlierDetection.ScoreTransformer — Type.

ScoreTransformer(combine = combine,
                 normalize = normalize)

Transform the results of a single or multiple outlier detection models to combined and normalized scores.

Parameters

normalize::Function

A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.

combine::Function

A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.

`ProbabilisticTransformer`

# OutlierDetection.ProbabilisticTransformer — Type.

ProbabilisticTransformer(combine = combine,
                         normalize = normalize)

Transform the results of a single or multiple outlier detection models to combined univariate finite distributions.

Parameters

normalize::Function

A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.

combine::Function

A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.

`DeterministicTransformer`

# OutlierDetection.DeterministicTransformer — Type.

DeterministicTransformer(combine = combine,
                         normalize = normalize,
                         classify = classify_quantile(DEFAULT_THRESHOLD))

Transform the results of a single or multiple outlier detection models to combined categorical values.

Parameters

normalize::Function

A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.

combine::Function

A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.

classify::Function

A function to transform a tuple of train/test scores into a tuple of train/test classes. See classify_quantile for a specific implementation.

Surrogate models can be used to transform learning networks into MLJ-models.

`@surrogate`

# OutlierDetection.@surrogate — Macro.

@surrogate(fn, name)

Create a surrogate model from a learning network, implicitly defining a composite struct using name and a prefit function using fn.

Parameters

fn::Function

The function used to define the prefit function of the surrogate model, i.e. the function that builds the learning network.

name::Symbol

The name of the resulting composite model (the surrogate model).

Wrappers

Wrappers take one or more detectors and transform the (combined) raw scores to probabilities (ProbabilisticDetector) or classes (DeterministicDetector). Using wrappers, you can easily evaluate outlier detection models with MLJ.

`CompositeDetector`

# OutlierDetection.CompositeDetector — Function.

CompositeDetector(unnamed_detectors...;
                  normalize,
                  combine,
                  named_detectors...)

Transform one or more raw detectors into a single composite detector (that returns raw outlier scores).

`ProbabilisticDetector`

# OutlierDetection.ProbabilisticDetector — Function.

ProbabilisticDetector(unnamed_detectors...;
                      normalize,
                      combine,
                      named_detectors...)

Transform one or more raw detectors into a single probabilistic detector (that returns outlier probabilities).

`DeterministicDetector`

# OutlierDetection.DeterministicDetector — Function.

DeterministicDetector(unnamed_detectors...;
                      normalize,
                      combine,
                      named_detectors...)

Transform one or more raw detectors into a single deterministic detector (that returns inlier and outlier classes).

Normalization

These functions may be used as input for the normalize keyword argument present in wrappers and transformers. They transform a tuple of train/test scores into a tuple of normalized train/test scores.

`scale_minmax`

# OutlierDetection.scale_minmax — Function.

scale_minmax(scores)

Transform a tuple of train/test scores into the range [0, 1] using min-max scaling.

Parameters

scores::Tuple{Scores, Scores}

A tuple consisting of two vectors representing training and test scores.

Returns

normalized_scores::Tuple{Scores, Scores}

The normalized train and test scores.

Examples

scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_minmax((scores_train, scores_test)) # ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.0, 0.0])
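
The output in the example above is consistent with scaling by the minimum and maximum of the training scores and clipping test scores that fall outside that range. The following is a minimal sketch under that assumption. `sketch_minmax` is a hypothetical re-implementation for illustration, not the library's source.

```julia
# Hypothetical re-implementation for illustration; not the library's source.
function sketch_minmax((train, test))
    lo, hi = extrema(train)   # the range is learned from the training scores only
    # Scale into [0, 1] and clip values that fall outside the training range.
    scale(s) = clamp.((s .- lo) ./ (hi - lo), 0.0, 1.0)
    return scale(train), scale(test)
end

minmax_train, minmax_test = sketch_minmax(([1, 2, 3], [4, 3, 2, 1, 0]))
# minmax_train == [0.0, 0.5, 1.0]
# minmax_test  == [1.0, 1.0, 0.5, 0.0, 0.0]
```

The sketch does not guard against constant training scores, where `hi - lo` is zero.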

`scale_unify`

# OutlierDetection.scale_unify — Function.

scale_unify(scores)

Transform a tuple of train/test scores into the range [0, 1] using unifying scores as described in [1].

Parameters

scores::Tuple{Scores, Scores}

A tuple consisting of two vectors representing training and test scores.

Returns

unified_scores::Tuple{Scores, Scores}

The unified train and test scores.

Examples

scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_unify((scores_train, scores_test)) # ([0.0, 0.0, 0.68..], [0.95.., 0.68.., 0.0, 0.0, 0.0])
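
The example values above are consistent with the Gaussian scaling from the cited paper, applied with the mean and sample standard deviation of the training scores. Assuming that is what the function computes, each score s is mapped as follows:

```latex
\[
\operatorname{unify}(s) = \max\!\left(0,\ \operatorname{erf}\!\left(\frac{s - \mu_{\text{train}}}{\sigma_{\text{train}}\sqrt{2}}\right)\right)
\]
```

For the training scores [1, 2, 3], the mean is 2 and the sample standard deviation is 1. A score of 3 maps to erf(1/√2) ≈ 0.68, a score of 4 maps to erf(√2) ≈ 0.95, and every score at or below the training mean maps to 0.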

References

[1] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur (2011): Interpreting and Unifying Outlier Scores.

Combination

These functions may be used as input for the combine keyword argument present in wrappers and transformers. The input for the combine functions is one or more train/test score tuples or alternatively a matrix where the first column represents train scores and the second column test scores.

`combine_mean`

# OutlierDetection.combine_mean — Function.

combine_mean(scores_mat)

Combination method to merge outlier scores from multiple detectors using the mean value of scores.

Parameters

scores_mat::AbstractMatrix{T}

A matrix, with each row representing the scores for a specific instance and each column representing a detector.

Returns

combined_scores::AbstractVector{T}

The combined scores, i.e. the row-wise mean across detectors.

Examples

scores = [1 2; 3 4; 5 6]
combine_mean(scores) # [1.5, 3.5, 5.5]

`combine_median`

# OutlierDetection.combine_median — Function.

combine_median(scores_mat)

Combination method to merge outlier scores from multiple detectors using the median value of scores.

Parameters

scores_mat::AbstractMatrix{T}

A matrix, with each row representing the scores for a specific instance and each column representing a detector.

Returns

combined_scores::AbstractVector{T}

The combined scores, i.e. the row-wise median across detectors.

Examples

scores = [1 2; 3 4; 5 6]
combine_median(scores) # [1.5, 3.5, 5.5]

`combine_max`

# OutlierDetection.combine_max — Function.

combine_max(scores_mat)

Combination method to merge outlier scores from multiple detectors using the maximum value of scores.

Parameters

scores_mat::AbstractMatrix{T}

A matrix, with each row representing the scores for a specific instance and each column representing a detector.

Returns

combined_scores::AbstractVector{T}

The combined scores, i.e. the row-wise maximum across detectors.

Examples

scores = [1 2; 3 4; 5 6]
combine_max(scores) # [2, 4, 6]
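
The three combination examples reduce each row (one instance) across its columns (one per detector). As a rough illustration that does not use the library itself, the same results can be reproduced with the Statistics standard library:

```julia
using Statistics

scores = [1 2; 3 4; 5 6]   # 3 instances (rows) scored by 2 detectors (columns)

# dims = 2 reduces across the columns, leaving one value per row (instance);
# vec drops the singleton dimension so the result is a plain vector.
row_mean   = vec(mean(scores, dims = 2))      # [1.5, 3.5, 5.5]
row_median = vec(median(scores, dims = 2))    # [1.5, 3.5, 5.5]
row_max    = vec(maximum(scores, dims = 2))   # [2, 4, 6]
```

With only two detectors the mean and median coincide. They differ once three or more detectors are combined and one of them produces an extreme score.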

Classification

These functions may be used as input for the classify keyword argument present in wrappers and transformers. They transform a tuple of train/test scores into a tuple of train/test classes.

`classify_quantile`

# OutlierDetection.classify_quantile — Function.

classify_quantile(threshold)

Create a percentile-based classification function that converts scores_train::Scores and scores_test::Scores to an array of classes with "normal" indicating normal data and "outlier" indicating outliers. The conversion is based on percentiles of the training data, i.e. all datapoints above the threshold percentile are considered outliers.

Parameters

threshold::Real

The quantile of the training scores (a number between 0 and 1) used as the threshold to classify the samples into inliers and outliers.

scores::Tuple{Scores, Scores}

A tuple consisting of two vectors representing training and test scores.

Returns

classes::Tuple{Vector{String}, Vector{String}}

The train and test classes, each a vector consisting of "outlier" and "normal" elements.

Examples

classify = classify_quantile(0.9)
scores_train, scores_test = ([1, 2, 3], [4, 3, 2])
classify(scores_train, scores_train) # ["normal", "normal", "outlier"]
classify(scores_train, scores_test) # ["outlier", "outlier", "normal"]
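
The quantile-based rule can be sketched as a closure that captures the threshold. `sketch_classify_quantile` is a hypothetical stand-in, not the library's source. It assumes the cutoff is the `threshold` quantile of the training scores and that scores strictly above the cutoff are labeled "outlier".

```julia
using Statistics

# Hypothetical stand-in for illustration; not the library's source.
function sketch_classify_quantile(threshold)
    function classify(scores)
        train, test = scores
        cutoff = quantile(train, threshold)   # learned from the training scores only
        label(s) = ifelse.(s .> cutoff, "outlier", "normal")
        return label(train), label(test)
    end
    return classify
end

classify_sketch = sketch_classify_quantile(0.9)
train_classes, test_classes = classify_sketch(([1, 2, 3], [4, 3, 2]))
# train_classes == ["normal", "normal", "outlier"]
# test_classes  == ["outlier", "outlier", "normal"]
```

For the training scores [1, 2, 3], the 0.9 quantile is about 2.8, so only scores of 3 or more are labeled as outliers.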

Output helpers

`to_univariate_finite`

# OutlierDetection.to_univariate_finite — Function.

to_univariate_finite(scores::Scores)

Convert normalized scores to a vector of univariate finite distributions.

Parameters

scores::Scores

Raw vector of scores.

Returns

scores::UnivariateFiniteVector{OrderedFactor{2}}

Univariate finite vector of scores.

`to_categorical`

# OutlierDetection.to_categorical — Function.

to_categorical(classes::AbstractVector{String})

Convert a vector of classes (with possible missing values) to a categorical vector.

Parameters

classes::Labels

A vector of classes.

Returns

classes::CategoricalVector{Union{Missing,String}, UInt32}

A categorical vector of classes.

`from_univariate_finite`

# OutlierDetection.from_univariate_finite — Function.

from_univariate_finite(scores)

Extract the raw scores from a vector of univariate finite distributions.

Parameters

scores::MLJ.UnivariateFiniteVector

A vector of univariate finite distributions.

Returns

scores::Scores

A vector of raw scores.

`from_categorical`

# OutlierDetection.from_categorical — Function.

from_categorical(classes)

Extract the raw classes from categorical arrays.

Parameters

classes::MLJ.CategoricalVector

A vector of categorical values.

Returns

classes::Labels

A vector of raw classes.

Label helpers

`normal_fraction`

# OutlierDetection.normal_fraction — Function.

normal_fraction(y)

Determine the fraction of normals in a given vector.

Parameters

y::Labels

An array containing "normal" and "outlier" classes.

Returns

normal_fraction::Float64

The fraction of normals.

`outlier_fraction`

# OutlierDetection.outlier_fraction — Function.

outlier_fraction(y)

Determine the fraction of outliers in a given vector.

Parameters

y::Labels

An array containing "normal" and "outlier" classes.

Returns

outlier_fraction::Float64

The fraction of outliers.

`normal_count`

# OutlierDetection.normal_count — Function.

normal_count(y)

Determine the count of normals in a given vector.

Parameters

y::Labels

An array containing "normal" and "outlier" classes.

Returns

normal_count::Int64

The count of normals.

`outlier_count`

# OutlierDetection.outlier_count — Function.

outlier_count(y)

Determine the count of outliers in a given vector.

Parameters

y::Labels

An array containing "normal" and "outlier" classes.

Returns

outlier_count::Int64

The count of outliers.
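
As a rough illustration of what the label helpers compute, the counts and fractions can be reproduced with Base Julia on a plain vector of strings. The variable names below are illustrative only, and the sketch assumes labels are exactly "normal" or "outlier" with no missing values.

```julia
y = ["normal", "outlier", "normal", "normal"]

n_outlier = count(==("outlier"), y)    # number of "outlier" labels
n_normal  = count(==("normal"), y)     # number of "normal" labels

frac_outlier = n_outlier / length(y)   # 0.25
frac_normal  = n_normal / length(y)    # 0.75
```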