
There and Back Again: Outlier Detection Between Statistical Reasoning And Data Mining Algorithms

by WIREs Authors | Sep 13, 2018

Data mining and statistics: tracing the roots and the development of statistical outlier detection and of database-related data mining methods for outlier detection.


Credit card companies observe the financial transactions of their customers in order to be able to alert the customer or deny a transaction if it looks strange. Scientists working with measurements from lab experiments or sensor data in the wild get alerted if some measurements are considerably and unexpectedly different from the previous observations. Analysis of sports statistics can lead to the discovery of suspicious activities. Administrators get notified about unusual behavior on their web server, which could indicate technical problems or malicious attacks.

All these examples relate technically to the detection of so-called outliers or anomalies: observations that do not fit well with the remainder of the given observations. In light of the common metaphor that data mining is like mining for nuggets of information, outlier detection is not merely about removing noise but also about finding interesting database objects whose behavior deviates considerably from the majority and which, as such, provide new insights. Indeed, both aspects of outlier detection are two sides of the same coin, as one person’s noise may be another person’s signal. The scenarios above highlight different interests in outliers: measurement errors in scientific data should possibly just be removed, whereas a case of credit card abuse is the sole interesting fact among a wealth of ‘just usual’ data (which, in turn, could of course be interesting itself as well, e.g., for modeling a customer’s interests and behavior, once the outliers have been removed).
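To make the idea of an observation that does not fit the remainder concrete, here is a minimal sketch in Python. It is not taken from the article: it assumes one simple, robust rule (a modified z-score based on the median and the median absolute deviation, with the conventional cutoff of 3.5), and the function name `flag_outliers` and the transaction amounts are hypothetical, chosen only for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Assumption: the modified z-score 0.6745 * |x - median| / MAD is used
    as the outlier score; this is one common rule among many, not the
    method discussed in the article.
    """
    center = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half of the values are identical; this rule cannot score them.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical card transaction amounts with one conspicuous value.
amounts = [12.5, 9.9, 14.2, 11.0, 13.1, 10.4, 950.0]
print(flag_outliers(amounts))  # prints [950.0]
```

Whether the flagged value is noise to be discarded or the one signal worth investigating depends on the application, which is exactly the two-sided view described above.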

Outlier detection is a field that has been studied in statistics and in data mining. While data mining techniques are of course based on or motivated by statistical reasoning, the development of techniques in the scientific data mining literature became detached from the statistical intuition, as the interest in data mining is the algorithmic handling of “big data” and the focus is often more on efficiency. Likewise, while statisticians nowadays also develop algorithms and programs to analyze data automatically, the algorithmic developments in data mining have not often been considered in the statistical literature, as the two communities do not strongly overlap.

In their WIREs Data Mining and Knowledge Discovery article, ‘There and Back Again: Outlier Detection Between Statistical Reasoning and Data Mining Algorithms,’ authors Arthur Zimek and Peter Filzmoser bridge the gap between the data mining literature and the statistics literature, relating concepts to each other, and discussing what it means to get an ‘outlier’ alert from some method.

Kindly contributed by the Authors.
