Score Helpers
OutlierDetection.jl provides many useful helper functions to work with outlier scores. The goal of these helpers is to normalize, combine and classify raw outlier scores. The main design philosophy behind all of these functions is that they transform a tuple of train/test scores into a different train/test tuple representation, e.g. train/test classes.
Transformers
In order to normalize or classify scores, both the training and the test scores are necessary. We thus return a tuple of training and test scores in all transform calls. Transformers can use one or more such train/test tuples and convert them into normalized scores, probabilities or classes.
ScoreTransformer
OutlierDetection.ScoreTransformer
— Type.
ScoreTransformer(combine = combine,
normalize = normalize)
Transform the results of one or more outlier detection models into combined and normalized scores.
Parameters
normalize::Function
A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.
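The following is a minimal sketch of what a score transformer does with several train/test tuples, assuming that each detector's tuple is normalized first and that the normalized scores are then combined per instance, separately for train and test. The helpers minmax_tuple, combine_rows and transform_scores are hypothetical and not part of the OutlierDetection.jl API.

```julia
using Statistics

# Hypothetical min-max normalizer on a (train, test) tuple: the range is learned
# from the training scores and all results are clamped to [0, 1].
# (A constant training vector would divide by zero; not handled in this sketch.)
function minmax_tuple(scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}})
    train, test = scores
    lo, hi = extrema(train)
    scale(s) = clamp((s - lo) / (hi - lo), 0.0, 1.0)
    return (scale.(train), scale.(test))
end

# Hypothetical combiner: one column per detector, mean of each row (instance).
combine_rows(mat::AbstractMatrix{<:Real}) = vec(mean(mat; dims=2))

# Normalize every detector's (train, test) tuple, then combine train and test separately.
function transform_scores(normalize, combine, scores::Tuple...)
    normalized = map(normalize, scores)
    train_mat = reduce(hcat, [first(s) for s in normalized])
    test_mat = reduce(hcat, [last(s) for s in normalized])
    return (combine(train_mat), combine(test_mat))
end

det_a = ([1.0, 2.0, 3.0], [4.0, 2.0])
det_b = ([10.0, 30.0, 20.0], [0.0, 20.0])
train, test = transform_scores(minmax_tuple, combine_rows, det_a, det_b)
# train == [0.0, 0.75, 0.75], test == [0.5, 0.5]
```

Normalizing before combining keeps detectors with very different score ranges (here 1 to 3 versus 10 to 30) from dominating the combined score.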
ProbabilisticTransformer
OutlierDetection.ProbabilisticTransformer
— Type.
ProbabilisticTransformer(combine = combine,
normalize = normalize)
Transform the results of one or more outlier detection models into combined univariate finite distributions.
Parameters
normalize::Function
A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.
DeterministicTransformer
OutlierDetection.DeterministicTransformer
— Type.
DeterministicTransformer(combine = combine,
normalize = normalize,
classify = classify_quantile(DEFAULT_THRESHOLD))
Transform the results of one or more outlier detection models into combined categorical values.
Parameters
normalize::Function
A function to transform a tuple of train/test scores into a tuple of normalized train/test scores. See scale_minmax for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean for a specific implementation.
classify::Function
A function to transform a tuple of train/test scores into a tuple of train/test classes. See classify_quantile for a specific implementation.
Surrogate models can be used to transform learning networks into MLJ models.
@surrogate
OutlierDetection.@surrogate
— Macro.
@surrogate(fn, name)
Create a surrogate model from a learning network, implicitly defining a composite struct using name and a prefit function using fn.
Parameters
fn::Function
The function used to define the prefit function of the composite model, i.e. the function that builds the learning network.
name::Symbol
The name of the resulting composite model (the surrogate model).
Wrappers
Wrappers take one or more detectors and transform the (combined) raw scores to probabilities (ProbabilisticDetector) or classes (DeterministicDetector). Using wrappers, you can easily evaluate outlier detection models with MLJ.
CompositeDetector
OutlierDetection.CompositeDetector
— Function.
CompositeDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single composite detector (that returns raw outlier scores).
ProbabilisticDetector
OutlierDetection.ProbabilisticDetector
— Function.
ProbabilisticDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single probabilistic detector (that returns outlier probabilities).
DeterministicDetector
OutlierDetection.DeterministicDetector
— Function.
DeterministicDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single deterministic detector (that returns inlier and outlier classes).
Normalization
These functions may be used as input for the normalize keyword argument of wrappers and transformers. They transform a tuple of train/test scores into a tuple of normalized train/test scores.
scale_minmax
OutlierDetection.scale_minmax
— Function.
scale_minmax(scores)
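Transform an array of scores into the range [0, 1] using min-max scaling. The minimum and maximum are taken from the training scores; test scores outside this range are clipped to 0 or 1.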
Parameters
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
normalized_scores::Tuple{Scores, Scores}
The normalized train and test scores.
Examples
scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_minmax((scores_train, scores_test)) # ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.0, 0.0])
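The example output above follows from learning the minimum and maximum on the training scores only and clipping test scores that fall outside that range. The sketch below reproduces it under that assumption; minmax_sketch is a hypothetical name, not the package implementation.

```julia
# Hypothetical re-implementation of min-max scaling on a (train, test) tuple.
function minmax_sketch(scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}})
    train, test = scores
    lo, hi = extrema(train)                 # range learned from training scores only
    span = hi - lo
    span == 0 && throw(ArgumentError("training scores are constant"))
    scale(s) = clamp((s - lo) / span, 0.0, 1.0)   # clip unseen test values to [0, 1]
    return (scale.(train), scale.(test))
end

minmax_sketch(([1, 2, 3], [4, 3, 2, 1, 0]))
# ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.0, 0.0])
```

Without clipping, the test score 4 would map to 1.5 and 0 to -0.5, i.e. outside [0, 1].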
scale_unify
OutlierDetection.scale_unify
— Function.
scale_unify(scores)
Transform an array of scores into the range [0, 1] using unified scores as described in [1].
Parameters
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
unified_scores::Tuple{Scores, Scores}
The unified train and test scores.
Examples
scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_unify((scores_train, scores_test)) # ([0.0, 0.0, 0.68..], [0.95.., 0.68.., 0.0, 0.0, 0.0])
References
[1] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur (2011): Interpreting and Unifying Outlier Scores.
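One of the unification methods in [1] is Gaussian scaling, max(0, erf((s - mu) / (sigma * sqrt(2)))), with mu and sigma estimated on the training scores. The sketch below assumes this variant (its output matches the scale_unify example values) and uses a hypothetical name, unify_sketch. Because erf is not in Julia Base (it lives in SpecialFunctions.jl), the sketch substitutes the Abramowitz and Stegun 7.1.26 polynomial approximation.

```julia
using Statistics

# erf approximation (Abramowitz & Stegun 7.1.26), absolute error below 1.5e-7.
function erf_approx(x::Real)
    t = 1 / (1 + 0.3275911 * abs(x))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    y = 1 - poly * exp(-x^2)
    return x < 0 ? -y : y                  # erf is an odd function
end

# Hypothetical Gaussian scaling: mean and standard deviation come from the
# training scores; values below the training mean are cut off at 0.
function unify_sketch(scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}})
    train, test = scores
    mu, sigma = mean(train), std(train)    # std uses the corrected (n - 1) estimator
    scale(s) = max(0.0, erf_approx((s - mu) / (sigma * sqrt(2))))
    return (scale.(train), scale.(test))
end

train, test = unify_sketch(([1, 2, 3], [4, 3, 2, 1, 0]))
round.(test; digits=2)   # [0.95, 0.68, 0.0, 0.0, 0.0]
```

The approximation returns about 1e-9 instead of exactly 0 at the mean, which is why the values are rounded for display.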
Combination
These functions may be used as input for the combine keyword argument of wrappers and transformers. The input to a combine function is one or more train/test score tuples or, alternatively, a matrix where the first column represents train scores and the second column test scores.
combine_mean
OutlierDetection.combine_mean
— Function.
combine_mean(scores_mat)
Combination method to merge outlier scores from multiple detectors using the mean value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the row-wise mean (the mean over all detectors for each instance).
Examples
scores = [1 2; 3 4; 5 6]
combine_mean(scores) # [1.5, 3.5, 5.5]
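Because each row is an instance and each column a detector, the combination reduces across the columns of every row, which is dims=2 in Julia. A minimal sketch with the Statistics standard library; combine_mean_sketch is a hypothetical name.

```julia
using Statistics

# Reduce across detectors (columns) to get one combined score per instance (row).
combine_mean_sketch(scores_mat::AbstractMatrix{<:Real}) = vec(mean(scores_mat; dims=2))

scores = [1 2; 3 4; 5 6]
combine_mean_sketch(scores)    # [1.5, 3.5, 5.5]
vec(mean(scores; dims=1))      # [3.0, 4.0]: per-detector means, not a combination
```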
combine_median
OutlierDetection.combine_median
— Function.
combine_median(scores_mat)
Combination method to merge outlier scores from multiple detectors using the median value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the row-wise median (the median over all detectors for each instance).
Examples
scores = [1 2; 3 4; 5 6]
combine_median(scores) # [1.5, 3.5, 5.5]
combine_max
OutlierDetection.combine_max
— Function.
combine_max(scores_mat)
Combination method to merge outlier scores from multiple detectors using the maximum value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the row-wise maximum (the maximum over all detectors for each instance).
Examples
scores = [1 2; 3 4; 5 6]
combine_max(scores) # [2, 4, 6]
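When detectors deliver train/test tuples instead of a matrix, one way to apply any of the reducers above is to stack the train scores and the test scores into two matrices (one column per detector) and reduce each row. The sketch below assumes that layout; row_median, row_max and combine_tuples are hypothetical helpers, not part of the package.

```julia
using Statistics

# Row-wise reducers: one combined score per instance.
row_median(mat::AbstractMatrix{<:Real}) = vec(median(mat; dims=2))
row_max(mat::AbstractMatrix{<:Real}) = vec(maximum(mat; dims=2))

# Stack one column per detector for train and test, then reduce each row.
function combine_tuples(reducer, scores::Tuple...)
    train_mat = reduce(hcat, [first(s) for s in scores])
    test_mat = reduce(hcat, [last(s) for s in scores])
    return (reducer(train_mat), reducer(test_mat))
end

det_a = ([1, 3, 5], [0, 7])
det_b = ([2, 4, 6], [1, 9])
combine_tuples(row_max, det_a, det_b)     # ([2, 4, 6], [1, 9])
combine_tuples(row_median, det_a, det_b)  # ([1.5, 3.5, 5.5], [0.5, 8.0])
```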
Classification
These functions may be used as input for the classify keyword argument of wrappers and transformers. They transform a tuple of train/test scores into a tuple of train/test classes.
classify_quantile
OutlierDetection.classify_quantile
— Function.
classify_quantile(threshold)
Create a percentile-based classification function that converts scores_train::Scores and scores_test::Scores to an array of classes, with "normal" indicating normal data and "outlier" indicating outliers. The conversion is based on percentiles of the training data, i.e. all data points above the threshold percentile are considered outliers.
Parameters
threshold::Real
The quantile (a number between 0 and 1) of the training scores above which samples are classified as outliers.
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
classes::Tuple{Vector{String}, Vector{String}}
The train and test vectors of classes, consisting of "normal" and "outlier" elements.
Examples
classify = classify_quantile(0.9)
scores_train, scores_test = ([1, 2, 3], [4, 3, 2])
classify((scores_train, scores_test)) # (["normal", "normal", "outlier"], ["outlier", "outlier", "normal"])
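A minimal sketch of a quantile-based classifier, assuming the cut-off is computed with Statistics.quantile (linear interpolation by default) on the training scores and that scores strictly above it are labeled "outlier". classify_quantile_sketch is a hypothetical name, not the package implementation.

```julia
using Statistics

# Hypothetical factory: returns a closure over `threshold` that maps a
# (train, test) tuple of scores to a (train, test) tuple of class labels.
function classify_quantile_sketch(threshold::Real)
    0 <= threshold <= 1 || throw(ArgumentError("threshold must lie in [0, 1]"))
    return function (scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}})
        train, test = scores
        cutoff = quantile(train, threshold)    # e.g. 2.8 for [1, 2, 3] at 0.9
        label(s) = s > cutoff ? "outlier" : "normal"
        return (label.(train), label.(test))
    end
end

classify = classify_quantile_sketch(0.9)
classify(([1, 2, 3], [4, 3, 2]))
# (["normal", "normal", "outlier"], ["outlier", "outlier", "normal"])
```

Returning a closure mirrors the documented usage, where classify_quantile(threshold) produces the function that is later passed as the classify keyword argument.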
Output helpers
to_univariate_finite
OutlierDetection.to_univariate_finite
— Function.
to_univariate_finite(scores::Scores)
Convert normalized scores to a vector of univariate finite distributions.
Parameters
scores::Scores
A vector of normalized scores.
Returns
scores::UnivariateFiniteVector{OrderedFactor{2}}
Univariate finite vector of scores.
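Conceptually, each normalized score becomes a two-class distribution over "normal" and "outlier". The sketch below assumes that a score s in [0, 1] is read as the outlier probability, with 1 - s assigned to "normal"; it uses plain named tuples rather than MLJ's UnivariateFinite type, and to_two_class and from_two_class are hypothetical helpers.

```julia
# Hypothetical stand-in for a vector of two-class finite distributions:
# each normalized score s is read as P(outlier) = s and P(normal) = 1 - s.
function to_two_class(scores::AbstractVector{<:Real})
    all(s -> 0 <= s <= 1, scores) ||
        throw(ArgumentError("scores must be normalized to [0, 1]"))
    return [(normal = 1 - float(s), outlier = float(s)) for s in scores]
end

# Reverse direction: recover the scores as the outlier probabilities.
from_two_class(dists) = [d.outlier for d in dists]

dists = to_two_class([0.0, 0.25, 1.0])
dists[2]               # (normal = 0.75, outlier = 0.25)
from_two_class(dists)  # [0.0, 0.25, 1.0]
```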
to_categorical
OutlierDetection.to_categorical
— Function.
to_categorical(classes::AbstractVector{String})
Convert a vector of classes (with possible missing values) to a categorical vector.
Parameters
classes::Labels
A vector of classes.
Returns
classes::CategoricalVector{Union{Missing,String}, UInt32}
A categorical vector of classes.
from_univariate_finite
OutlierDetection.from_univariate_finite
— Function.
from_univariate_finite(scores)
Extract the raw scores from a vector of univariate finite distributions.
Parameters
scores::MLJ.UnivariateFiniteVector
A vector of univariate finite distributions.
Returns
scores::Scores
A vector of raw scores.
from_categorical
OutlierDetection.from_categorical
— Function.
from_categorical(classes)
Extract the raw classes from categorical arrays.
Parameters
classes::MLJ.CategoricalVector
A vector of categorical values.
Returns
classes::Labels
A vector of raw classes.
Label helpers
normal_fraction
OutlierDetection.normal_fraction
— Function.
normal_fraction(y)
Determine the fraction of normals in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
normal_fraction::Float64
The fraction of normals.
outlier_fraction
OutlierDetection.outlier_fraction
— Function.
outlier_fraction(y)
Determine the fraction of outliers in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
outlier_fraction::Float64
The fraction of outliers.
normal_count
OutlierDetection.normal_count
— Function.
normal_count(y)
Determine the count of normals in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
normal_count::Int64
The count of normals.
outlier_count
OutlierDetection.outlier_count
— Function.
outlier_count(y)
Determine the count of outliers in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
outlier_count::Int64
The count of outliers.
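The four label helpers reduce to counting string labels. A minimal sketch, assuming plain vectors of "normal"/"outlier" strings without missing values; the *_sketch names are hypothetical and not the package functions.

```julia
# Hypothetical re-implementations of the label helpers for String labels.
normal_count_sketch(y::AbstractVector{<:AbstractString}) = count(==("normal"), y)
outlier_count_sketch(y::AbstractVector{<:AbstractString}) = count(==("outlier"), y)

# Fractions are taken relative to the total number of labels.
normal_fraction_sketch(y::AbstractVector{<:AbstractString}) = normal_count_sketch(y) / length(y)
outlier_fraction_sketch(y::AbstractVector{<:AbstractString}) = outlier_count_sketch(y) / length(y)

y = ["normal", "outlier", "normal", "normal"]
outlier_count_sketch(y)     # 1
outlier_fraction_sketch(y)  # 0.25
```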