Classification

Classification is usually understood to mean the allocation of objects to certain pre-existing classes or categories. This distinguishes it from the earlier step in which the classes themselves are established, often through clustering in which similar objects are grouped together.^[1] Examples include a pregnancy test and identifying spam emails.

Classification is a part of many different kinds of activities and is studied from many different points of view, including philosophy, law, anthropology, biology, taxonomy, cognition, communications, knowledge organization, psychology, statistics, machine learning, librarianship and mathematics.

As well as 'category', synonyms or near-synonyms for 'class' include 'type', 'species', 'order', 'concept', 'taxon', 'group' and 'division'. Equally, the meaning of the word 'classification' (and its synonyms) may in day-to-day usage take on one of several related meanings: it may encompass both classification and the creation of classes, as for example in 'the task of categorizing pages in Wikipedia' (the activity of taxonomy); or it may refer to the underlying scheme of classes (a taxonomy); or it may refer to the label given to an object by the classifier.

Binary vs multi-class classification

Methodological work is commonly divided between cases where there are exactly two classes (binary classification) and cases where there are three or more classes (multiclass classification).
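As a rough illustration of the distinction, the sketch below contrasts a two-class decision with a choice among several classes. The score-and-threshold approach, the function names and the class labels are all assumptions made for this example; the text above does not prescribe any particular method.

```python
def classify_binary(score, threshold=0.5):
    """Binary case: exactly two classes, here decided by one threshold.

    `score` and `threshold` are hypothetical; any rule that yields one of
    two labels would serve equally well as an illustration.
    """
    return "spam" if score >= threshold else "not spam"


def classify_multiclass(scores):
    """Multiclass case: three or more classes, here decided by highest score.

    `scores` is a hypothetical mapping from class label to a numeric score.
    """
    return max(scores, key=scores.get)


# Example use with made-up scores.
binary_label = classify_binary(0.9)
multi_label = classify_multiclass({"cat": 0.2, "dog": 0.7, "bird": 0.1})
```

In this sketch the binary classifier needs only a single score, whereas the multiclass classifier compares one score per class.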

Evaluation of accuracy

Unlike in decision theory, a classifier is assumed to repeat the classification task over and over. Unlike a lottery, each classification is assumed to be either right or wrong; in the theory of measurement, classification is understood as measurement against a nominal scale. It is therefore possible to try to measure the accuracy of a classifier.

Measuring the accuracy of a classifier allows a choice to be made between two alternative classifiers. This is important both when developing a classifier and when choosing which classifier to deploy. There are, however, many different methods for evaluating the accuracy of a classifier and no general method for determining which one should be used in which circumstances. Different fields have taken different approaches, even in binary classification. In pattern recognition, error rate is popular. The Gini coefficient and KS statistic are widely used in the credit scoring industry. Sensitivity and specificity are widely used in epidemiology and medicine. Precision and recall are widely used in information retrieval.^[2]
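Several of the binary measures named above can be computed from the same four counts: true positives, false positives, true negatives and false negatives. The following is a minimal sketch using the standard textbook definitions; the function names, the `Counts` type and the sample labels are hypothetical, and the Gini coefficient and KS statistic are left out because they require ranked scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def confusion_counts(actual, predicted, positive=1):
    """Tally the four outcome counts for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return Counts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c):
    """Compute common binary measures from the four counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "error_rate": _ratio(c.fp + c.fn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # same quantity as recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
    }


# Made-up labels: 1 marks the positive class, 0 the negative class.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(confusion_counts(actual, predicted))
```

Sensitivity and recall are two names for the same ratio, so the choice between these families of measures is partly a matter of which second measure (specificity or precision) a field pairs it with.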

Classifier accuracy depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems (a phenomenon that may be explained by the no-free-lunch theorem).

References

[1] https://www.theclassificationsociety.org/about/
[2] David Hand (2012). "Assessing the Performance of Classification Methods". International Statistical Review. 80 (3): 400–414.

External links

Parrochia, Daniel (2016). "Classification". In The Internet Encyclopedia of Philosophy, eds. James Fieser and Bradley Dowden.
