Score Helpers
OutlierDetection.jl
provides helper functions for working with outlier scores. These helpers normalize, combine and classify raw outlier scores. All of them follow the same design: they transform a tuple of train/test scores into a different train/test tuple representation, e.g. train/test classes.
Transformers
In order to normalize scores or classify them, both the training and testing scores are necessary. We thus provide a helper function called augmented_transform
that returns a tuple of training and test scores. Transformers can make use of one or more such train/test tuples to convert them into normalized scores, probabilities or classes.
augmented_transform
OutlierDetection.augmented_transform
— Function.
augmented_transform(mach; rows=:)
Extends transform
by additionally returning the training scores from detectors as a train/test score tuple.
Parameters
mach::MLJ.Machine{<:OD.Detector}
A fitted machine with a detector model.
rows
Test data specified as rows of the machine-bound data (as in transform); alternatively, new test data X can be provided.
Returns
augmented_scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}}
A tuple of raw training and test scores.
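The train/test tuple convention can be illustrated without the library. The following is a minimal sketch with a hypothetical toy detector (`ToyModel`, `toy_fit`, `toy_score` and `toy_augmented_transform` are made-up names, not part of OutlierDetection.jl); it only shows the shape of the result, a tuple of training scores and test scores.

```julia
using Statistics

# Hypothetical toy "detector": the score of a point is its absolute distance
# from the mean of the training data. This is NOT an OutlierDetection.jl model.
struct ToyModel
    center::Float64
end

toy_fit(train::AbstractVector{<:Real}) = ToyModel(mean(train))
toy_score(model::ToyModel, x::AbstractVector{<:Real}) = abs.(x .- model.center)

# Mimics the idea of an augmented transform: return the training scores
# together with the test scores as a (train, test) tuple.
function toy_augmented_transform(model::ToyModel, train, test)
    return toy_score(model, train), toy_score(model, test)
end

train = [1.0, 2.0, 3.0]
test = [2.0, 10.0]
model = toy_fit(train)
scores_train, scores_test = toy_augmented_transform(model, train, test)
# scores_train == [1.0, 0.0, 1.0], scores_test == [0.0, 8.0]
```

Keeping the training scores next to the test scores is what lets later steps (normalization, classification) derive their parameters from the training data only.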
ScoreTransformer
OutlierDetection.ScoreTransformer
— Type.
ScoreTransformer(combine = combine,
normalize = normalize)
Transform the results of one or more outlier detection models to combined and normalized scores.
Parameters
normalize::Function
A function to normalize a tuple of train/test scores, returning a tuple of normalized train/test scores. See scale_minmax
for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean
for a specific implementation.
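How the two arguments could fit together is sketched below. This is an illustration only: `normalize_sketch`, `combine_sketch` and `transform_sketch` are hypothetical names, and the order (normalize each detector's tuple first, then combine) is an assumption, not something this page confirms.

```julia
using Statistics

# Hypothetical stand-in for `normalize`: min-max scale a (train, test) tuple
# using the training range, clamped to [0, 1]. Assumes the training scores
# are not all identical.
function normalize_sketch((train, test))
    lo, hi = extrema(train)
    scale(x) = clamp.((x .- lo) ./ (hi - lo), 0.0, 1.0)
    return scale(train), scale(test)
end

# Hypothetical stand-in for `combine`: reduce an instances-by-detectors
# matrix to one score per instance (mean over each row).
combine_sketch(mat::AbstractMatrix) = vec(mean(mat, dims=2))

# Assumed flow: normalize every detector's tuple, then combine the train
# and test parts separately.
function transform_sketch(tuples...)
    normalized = map(normalize_sketch, tuples)
    train = combine_sketch(hcat(first.(normalized)...))
    test = combine_sketch(hcat(last.(normalized)...))
    return train, test
end

detector_a = ([1.0, 2.0, 3.0], [3.0, 1.0])
detector_b = ([10.0, 30.0, 20.0], [40.0, 10.0])
train, test = transform_sketch(detector_a, detector_b)
# train ≈ [0.0, 0.75, 0.75], test ≈ [1.0, 0.0]
```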
ProbabilisticTransformer
OutlierDetection.ProbabilisticTransformer
— Type.
ProbabilisticTransformer(combine = combine,
normalize = normalize)
Transform the results of one or more outlier detection models to combined univariate finite distributions.
Parameters
normalize::Function
A function to normalize a tuple of train/test scores, returning a tuple of normalized train/test scores. See scale_minmax
for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean
for a specific implementation.
DeterministicTransformer
OutlierDetection.DeterministicTransformer
— Type.
DeterministicTransformer(combine = combine,
normalize = normalize,
classify = classify_quantile(DEFAULT_THRESHOLD))
Transform the results of one or more outlier detection models to combined categorical values.
Parameters
normalize::Function
A function to normalize a tuple of train/test scores, returning a tuple of normalized train/test scores. See scale_minmax
for a specific implementation.
combine::Function
A function to reduce a matrix, where each row represents an instance and each column represents the score of a specific detector, to a vector of scores for each instance. See combine_mean
for a specific implementation.
classify::Function
A function to convert a tuple of train/test scores into a tuple of train/test classes. See classify_quantile
for a specific implementation.
Wrappers
Wrappers take one or more detectors and transform the (combined) raw scores to probabilities (ProbabilisticDetector
) or classes (DeterministicDetector
). Using wrappers, you can easily evaluate outlier detection models with MLJ.
CompositeDetector
OutlierDetection.CompositeDetector
— Function.
CompositeDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single composite detector (that returns raw outlier scores).
ProbabilisticDetector
OutlierDetection.ProbabilisticDetector
— Function.
ProbabilisticDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single probabilistic detector (that returns outlier probabilities).
DeterministicDetector
OutlierDetection.DeterministicDetector
— Function.
DeterministicDetector(unnamed_detectors...;
normalize,
combine,
named_detectors...)
Transform one or more raw detectors into a single deterministic detector (that returns inlier and outlier classes).
Normalization
These functions may be used as an input for the normalize
keyword argument present in wrappers and transformers. They transform a tuple of train/test scores into a tuple of normalized train/test scores.
scale_minmax
OutlierDetection.scale_minmax
— Function.
scale_minmax(scores)
Transform a tuple of train/test scores into the range [0, 1] using min-max scaling.
Parameters
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
normalized_scores::Tuple{Scores, Scores}
The normalized train and test scores.
Examples
scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_minmax((scores_train, scores_test)) # ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.0, 0.0])
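The example output suggests that the minimum and maximum are taken from the training scores and that test scores outside that range are clipped to [0, 1]. The following is a minimal sketch under that assumption; `minmax_sketch` is a hypothetical re-implementation for illustration, not the library's source.

```julia
# Hypothetical re-implementation for illustration only.
# Assumes the training scores are not all identical (otherwise hi == lo).
function minmax_sketch(scores)
    train, test = scores
    lo, hi = extrema(train)  # the range comes from the training scores only
    scale(x) = clamp.((x .- lo) ./ (hi - lo), 0.0, 1.0)
    return scale(train), scale(test)
end

result = minmax_sketch(([1, 2, 3], [4, 3, 2, 1, 0]))
# ([0.0, 0.5, 1.0], [1.0, 1.0, 0.5, 0.0, 0.0])
```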
scale_unify
OutlierDetection.scale_unify
— Function.
scale_unify(scores)
Transform a tuple of train/test scores into the range [0, 1] using unified scores as described in [1].
Parameters
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
unified_scores::Tuple{Scores, Scores}
The unified train and test scores.
Examples
scores_train, scores_test = ([1, 2, 3], [4, 3, 2, 1, 0])
scale_unify((scores_train, scores_test)) # ([0.0, 0.0, 0.68..], [0.95.., 0.68.., 0.0, 0.0, 0.0])
References
[1] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur (2011): Interpreting and Unifying Outlier Scores.
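The example values above are consistent with the Gaussian scaling described in [1], assuming that the mean and the (sample) standard deviation are estimated from the training scores. Under that assumption, a score s is mapped as follows:

```latex
\mathrm{unify}(s) = \max\left(0,\ \operatorname{erf}\left(\frac{s - \mu_{\text{train}}}{\sigma_{\text{train}}\sqrt{2}}\right)\right)
```

For the training scores [1, 2, 3] this gives a mean of 2 and a standard deviation of 1, so a score of 3 maps to erf(1/√2) ≈ 0.68 and a score of 4 maps to erf(√2) ≈ 0.95, while scores at or below the mean map to 0.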
Combination
These functions may be used as an input for the combine
keyword argument present in wrappers and transformers. The input for the combine functions is one or more train/test score tuples, or alternatively a matrix where the first column represents train scores and the second column represents test scores.
combine_mean
OutlierDetection.combine_mean
— Function.
combine_mean(scores_mat)
Combination method to merge outlier scores from multiple detectors using the mean value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the mean over all detectors (columns) for each instance (row).
Examples
scores = [1 2; 3 4; 5 6]
combine_mean(scores) # [1.5, 3.5, 5.5]
combine_median
OutlierDetection.combine_median
— Function.
combine_median(scores_mat)
Combination method to merge outlier scores from multiple detectors using the median value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the median over all detectors (columns) for each instance (row).
Examples
scores = [1 2; 3 4; 5 6]
combine_median(scores) # [1.5, 3.5, 5.5]
combine_max
OutlierDetection.combine_max
— Function.
combine_max(scores_mat)
Combination method to merge outlier scores from multiple detectors using the maximum value of scores.
Parameters
scores_mat::AbstractMatrix{T}
A matrix, with each row representing the scores for a specific instance and each column representing a detector.
Returns
combined_scores::AbstractVector{T}
The combined scores, i.e. the maximum over all detectors (columns) for each instance (row).
Examples
scores = [1 2; 3 4; 5 6]
combine_max(scores) # [2, 4, 6]
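The three combination methods above reduce each row of the score matrix to a single value. The following sketch reproduces the example results using only Base and the Statistics standard library; the `*_sketch` names are hypothetical and are not the library's implementation.

```julia
using Statistics

# Hypothetical equivalents of the combine functions.
# `dims=2` reduces across columns (detectors), leaving one value per row (instance).
mean_sketch(mat::AbstractMatrix) = vec(mean(mat, dims=2))
median_sketch(mat::AbstractMatrix) = vec(median(mat, dims=2))
max_sketch(mat::AbstractMatrix) = vec(maximum(mat, dims=2))

scores = [1 2; 3 4; 5 6]
mean_sketch(scores)    # [1.5, 3.5, 5.5]
median_sketch(scores)  # [1.5, 3.5, 5.5]
max_sketch(scores)     # [2, 4, 6]
```

With only two detectors the median equals the mean, which is why both examples return the same values.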
Classification
These functions may be used as an input for the classify
keyword argument present in wrappers and transformers. They transform a tuple of train/test scores into a tuple of train/test classes.
classify_quantile
OutlierDetection.classify_quantile
— Function.
classify_quantile(threshold)
Create a percentile-based classification function that converts scores_train::Scores
and scores_test::Scores
to an array of classes with "normal"
indicating normal data and "outlier"
indicating outliers. The conversion is based on percentiles of the training data, i.e. all datapoints above the threshold
percentile are considered outliers.
Parameters
threshold::Real
The score threshold (number between 0 and 1) used to classify the samples into inliers and outliers.
scores::Tuple{Scores, Scores}
A tuple consisting of two vectors representing training and test scores.
Returns
classes::Tuple{Vector{String}, Vector{String}}
The train and test class vectors, each consisting of "outlier"
and "normal"
elements.
Examples
classify = classify_quantile(0.9)
scores_train, scores_test = ([1, 2, 3], [4, 3, 2])
classify(scores_train, scores_train) # ["normal", "normal", "outlier"]
classify(scores_train, scores_test) # ["outlier", "outlier", "normal"]
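The thresholding logic can be sketched with the Statistics standard library. `quantile_classifier_sketch` is a hypothetical name, and the sketch assumes that a score is an outlier when it is strictly greater than the given quantile of the training scores.

```julia
using Statistics

# Hypothetical sketch: returns a closure that labels scores relative to a
# quantile of the training scores, using the "normal" / "outlier" labels.
function quantile_classifier_sketch(threshold::Real)
    return function (scores_train, scores_test)
        cutoff = quantile(scores_train, threshold)
        return [s > cutoff ? "outlier" : "normal" for s in scores_test]
    end
end

classify_sketch = quantile_classifier_sketch(0.9)
scores_train, scores_test = ([1, 2, 3], [4, 3, 2])
classify_sketch(scores_train, scores_train)  # ["normal", "normal", "outlier"]
classify_sketch(scores_train, scores_test)   # ["outlier", "outlier", "normal"]
```

Here the 0.9 quantile of the training scores [1, 2, 3] is 2.8, so only scores above 2.8 are labeled as outliers.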
Output helpers
to_univariate_finite
OutlierDetection.to_univariate_finite
— Function.
to_univariate_finite(scores::Scores)
Convert normalized scores to a vector of univariate finite distributions.
Parameters
scores::Scores
Raw vector of scores.
Returns
scores::UnivariateFiniteVector{OrderedFactor{2}}
Univariate finite vector of scores.
to_categorical
OutlierDetection.to_categorical
— Function.
to_categorical(classes::AbstractVector{String})
Convert a vector of classes (with possible missing values) to a categorical vector.
Parameters
classes::Labels
A vector of classes.
Returns
classes::CategoricalVector{Union{Missing,String}, UInt32}
A categorical vector of classes.
from_univariate_finite
OutlierDetection.from_univariate_finite
— Function.
from_univariate_finite(scores)
Extract the raw scores from a vector of univariate finite distributions.
Parameters
scores::MLJ.UnivariateFiniteVector
A vector of univariate finite distributions.
Returns
scores::Scores
A vector of raw scores.
from_categorical
OutlierDetection.from_categorical
— Function.
from_categorical(classes)
Extract the raw classes from categorical arrays.
Parameters
classes::MLJ.CategoricalVector
A vector of categorical values.
Returns
classes::Labels
A vector of raw classes.
Label helpers
normal_fraction
OutlierDetection.normal_fraction
— Function.
normal_fraction(y)
Determine the fraction of normals in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
normal_fraction::Float64
The fraction of normals.
outlier_fraction
OutlierDetection.outlier_fraction
— Function.
outlier_fraction(y)
Determine the fraction of outliers in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
outlier_fraction::Float64
The fraction of outliers.
n_normal
OutlierDetection.n_normal
— Function.
n_normal(y)
Determine the count of normals in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
n_normal::Int64
The count of normals.
n_outlier
OutlierDetection.n_outlier
— Function.
n_outlier(y)
Determine the count of outliers in a given vector.
Parameters
y::Labels
An array containing "normal" and "outlier" classes.
Returns
n_outlier::Int64
The count of outliers.
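The counts and fractions described above amount to simple label counting. A minimal sketch, assuming the labels are the plain strings "normal" and "outlier" (`count_label` and `fraction_label` are hypothetical names, not library functions):

```julia
# Hypothetical stand-ins for the label helpers.
count_label(y, label) = count(==(label), y)
fraction_label(y, label) = count_label(y, label) / length(y)

y = ["normal", "outlier", "normal", "normal"]
count_label(y, "normal")      # 3
count_label(y, "outlier")     # 1
fraction_label(y, "normal")   # 0.75
fraction_label(y, "outlier")  # 0.25
```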