==Generative models vs. discriminative models==
There are two broad classes of methods for determining the parameters of a linear classifier <math>\vec w</math>: [[Generative model|generative]] and [[Discriminative model|discriminative]] models.<ref>T. Mitchell, [https://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression.] Draft Version, 2005</ref><ref>A. Y. Ng and M. I. Jordan. [http://www.cs.berkeley.edu/~jordan/papers/ng-jordan-nips01.ps On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes.] in NIPS 14, 2002.</ref> Methods of the former model the [[joint probability distribution]] <math>P(\vec x, {\rm class})</math>, whereas methods of the latter model the [[Conditional probability distribution|conditional density function]] <math>P({\rm class}|\vec x)</math> directly. Examples of generative algorithms include:
* [[linear discriminant analysis|Linear Discriminant Analysis]] (LDA)—assumes [[normal distribution|Gaussian]] class-conditional density models
* [[Naive Bayes classifier]] with multinomial or multivariate Bernoulli event models.

The second set of methods includes [[discriminative model]]s, which attempt to maximize the quality of the output on a [[training set]]. Additional terms in the training cost function can easily achieve [[regularization (machine learning)|regularization]] of the final model. Examples of discriminative training of linear classifiers include:
* [[Logistic regression]]—maximum likelihood estimation of <math>\vec w</math>, assuming that the observed training set was generated by a binomial model that depends on the output of the classifier.
* [[Perceptron]]—an algorithm that attempts to fix all errors encountered in the training set.
* Fisher's Linear Discriminant Analysis—an algorithm (distinct from the "LDA" above) that maximizes the ratio of between-class scatter to within-class scatter, without any other assumptions. It is in essence a method of [[dimensionality reduction]] for binary classification.<ref>R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", Wiley, (2001). {{ISBN|0-471-05669-3}}</ref>
* [[Support vector machine]]—an algorithm that maximizes the [[Margin (machine learning)|margin]] between the decision hyperplane and the examples in the training set.

'''Note:''' Despite its name, LDA does not belong to the class of discriminative models in this taxonomy. However, its name makes sense when we compare LDA to the other main linear [[dimensionality reduction]] algorithm: [[principal components analysis]] (PCA). LDA is a [[supervised learning]] algorithm that utilizes the labels of the data, while PCA is an [[unsupervised learning]] algorithm that ignores the labels. To summarize, the name is a historical artifact.<ref>{{Cite book |last1=Duda |first1=Richard O. |title=Pattern classification |last2=Hart |first2=Peter E. |last3=Stork |first3=David G. |date=2001 |publisher=John Wiley & Sons, Inc |isbn=978-0-471-05669-0 |edition=Second |series=A Wiley-Interscience publication |location=New York Chichester Weinheim Brisbane Singapore Toronto |pages=117}}</ref>

Discriminative training often yields higher accuracy than modeling the class-conditional density functions{{Citation needed|date=October 2017}}. However, handling missing data is often easier with class-conditional density models{{Citation needed|date=October 2017}}.
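The distinction can be illustrated in code. The following sketch, which assumes the scikit-learn library and synthetic data purely for illustration, fits one generative (LDA) and one discriminative (logistic regression) linear classifier to the same data; both end up with a linear decision rule of the form <math>\operatorname{sgn}(\vec w \cdot \vec x + b)</math>, but they arrive at <math>\vec w</math> by different routes.

<syntaxhighlight lang="python">
# Illustrative sketch: a generative (LDA) and a discriminative (logistic
# regression) linear classifier fitted to the same synthetic data.
# scikit-learn and the synthetic dataset are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Two-class data with two informative features.
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Generative: LDA fits Gaussian class-conditional densities with a shared
# covariance matrix, which yields a linear decision boundary.
lda = LinearDiscriminantAnalysis().fit(X, y)

# Discriminative: logistic regression models P(class | x) directly by
# maximum likelihood.
logreg = LogisticRegression().fit(X, y)

# Both learn a linear rule sign(w . x + b); compare the fitted parameters.
print("LDA    w =", lda.coef_[0], "b =", lda.intercept_[0])
print("LogReg w =", logreg.coef_[0], "b =", logreg.intercept_[0])
</syntaxhighlight>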
All of the linear classifier algorithms listed above can be converted into non-linear algorithms operating on a different input space <math>\varphi(\vec x)</math>, using the [[kernel trick]].

===Discriminative training===
Discriminative training of linear classifiers usually proceeds in a [[supervised learning|supervised]] way, by means of an [[optimization algorithm]] that is given a training set with desired outputs and a [[loss function]] that measures the discrepancy between the classifier's outputs and the desired outputs. Thus, the learning algorithm solves an optimization problem of the form<ref name="ieee">{{cite journal |author1=Guo-Xun Yuan |author2=Chia-Hua Ho |author3=Chih-Jen Lin |title=Recent Advances of Large-Scale Linear Classification |journal=Proc. IEEE |volume=100 |issue=9 |year=2012 |url=http://dmkd.cs.vt.edu/TUTORIAL/Bigdata/Papers/IEEE12.pdf |archive-url=https://web.archive.org/web/20170610105707/http://dmkd.cs.vt.edu/TUTORIAL/Bigdata/Papers/IEEE12.pdf |archive-date=2017-06-10 |url-status=live}}</ref>

:<math>\underset{\mathbf{w}}{\arg\min} \;R(\mathbf{w}) + C \sum_{i=1}^N L(y_i, \mathbf{w}^\mathsf{T} \mathbf{x}_i)</math>

where
* {{math|'''w'''}} is a vector of classifier parameters,
* {{math|''L''(''y<sub>i</sub>'', '''w'''<sup>T</sup>'''x'''<sub>''i''</sub>)}} is a loss function that measures the discrepancy between the classifier's prediction and the true output {{mvar|y<sub>i</sub>}} for the {{mvar|i}}'th training example,
* {{math|''R''('''w''')}} is a [[Regularization (mathematics)|regularization]] function that prevents the parameters from getting too large (causing [[overfitting]]), and
* {{mvar|C}} is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularization and the loss function.

Popular loss functions include the [[hinge loss]] (for linear SVMs) and the [[log loss]] (for linear logistic regression). If the regularization function {{mvar|R}} is [[convex function|convex]], then the above is a [[convex optimization|convex problem]].{{r|ieee}} Many algorithms exist for solving such problems; popular ones for linear classification include ([[Stochastic gradient descent|stochastic]]) [[gradient descent]], [[L-BFGS]], [[coordinate descent]] and [[Newton method]]s.
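A minimal sketch of this optimization problem, taking <math>R(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|^2</math> and the log loss with labels <math>y_i \in \{-1, +1\}</math>, can be solved by batch [[gradient descent]] in a few lines of Python. The step size, iteration count and toy data below are illustrative assumptions, not prescribed by the references.

<syntaxhighlight lang="python">
# Sketch of:  argmin_w  R(w) + C * sum_i L(y_i, w^T x_i)
# with R(w) = 0.5 * ||w||^2 and log loss L(y, s) = log(1 + exp(-y*s)),
# labels y in {-1, +1}. Hyperparameters are illustrative assumptions.
import numpy as np

def train_linear_classifier(X, y, C=1.0, lr=0.01, n_iters=1000):
    """Batch gradient descent on the L2-regularized log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        margins = y * (X @ w)              # y_i * w^T x_i
        # sigma(-margin) weights each example's contribution to the gradient.
        sigma = 1.0 / (1.0 + np.exp(margins))
        grad_loss = -(X * (y * sigma)[:, None]).sum(axis=0)
        grad = w + C * grad_loss           # grad of R(w) is w itself
        w -= lr * grad
    return w

# Toy usage: two Gaussian blobs, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(+1.0, 1.0, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
w = train_linear_classifier(X, y)
print("w =", w, "training accuracy =", np.mean(np.sign(X @ w) == y))
</syntaxhighlight>

Replacing the full-batch gradient with a per-example update gives stochastic gradient descent, and substituting the hinge loss for the log loss recovers a linear SVM objective, as noted above.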