== Variants ==
The pocket algorithm with ratchet (Gallant, 1990) solves the stability problem of perceptron learning by keeping the best solution seen so far "in its pocket". The pocket algorithm then returns the solution in the pocket, rather than the last solution. It can also be used for non-separable data sets, where the aim is to find a perceptron with a small number of misclassifications. However, these solutions appear purely stochastically, and hence the pocket algorithm neither approaches them gradually in the course of learning, nor are they guaranteed to show up within a given number of learning steps.

The Maxover algorithm (Wendemuth, 1995) is [[Robustness (computer science)|"robust"]] in the sense that it will converge regardless of (prior) knowledge of linear separability of the data set.<ref>{{cite journal |first=A. |last=Wendemuth |title=Learning the Unlearnable |journal=Journal of Physics A: Mathematical and General |volume=28 |issue=18 |pages=5423–5436 |year=1995 |doi=10.1088/0305-4470/28/18/030 |bibcode=1995JPhA...28.5423W }}</ref> In the linearly separable case, it will solve the training problem – if desired, even with optimal stability ([[Hyperplane separation theorem|maximum margin]] between the classes). For non-separable data sets, it will return a solution with a small, computable number of misclassifications.<ref>{{cite journal |first=A. |last=Wendemuth |title=Performance of robust training algorithms for neural networks |journal=Journal of Physics A: Mathematical and General |volume=28 |issue=19 |pages=5485–5493 |year=1995 |doi=10.1088/0305-4470/28/19/006 |bibcode=1995JPhA...28.5485W }}</ref> In all cases, the algorithm gradually approaches the solution in the course of learning, without memorizing previous states and without stochastic jumps. Convergence is to global optimality for separable data sets and to local optimality for non-separable data sets.

The Voted Perceptron (Freund and Schapire, 1999) is a variant using multiple weighted perceptrons. The algorithm starts a new perceptron every time an example is wrongly classified, initializing the weights vector with the final weights of the last perceptron. Each perceptron is also given another weight corresponding to how many examples it correctly classifies before wrongly classifying one, and at the end the output is a weighted vote over all perceptrons.

In separable problems, perceptron training can also aim at finding the largest separating margin between the classes. The so-called perceptron of optimal stability can be determined by means of iterative training and optimization schemes, such as the Min-Over algorithm (Krauth and Mezard, 1987)<ref name="KrauthMezard87" /> or the AdaTron (Anlauf and Biehl, 1989).<ref>{{cite journal |first1=J. K. |last1=Anlauf |first2=M. |last2=Biehl |title=The AdaTron: an Adaptive Perceptron algorithm |journal=Europhysics Letters |volume=10 |issue=7 |pages=687–692 |year=1989 |doi=10.1209/0295-5075/10/7/014 |bibcode=1989EL.....10..687A |s2cid=250773895 }}</ref> AdaTron uses the fact that the corresponding quadratic optimization problem is convex. The perceptron of optimal stability, together with the [[kernel trick]], are the conceptual foundations of the [[support-vector machine]].
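As a concrete illustration of the first of these variants, the following is a simplified Python sketch of the pocket idea. It assumes labels in {−1, +1} and a bias folded into the input vectors, and it recomputes the full training error after each update, capturing the ratchet (the pocketed error never increases) in simplified form rather than reproducing Gallant's exact procedure; all function and variable names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def pocket_train(X, y, epochs=50, seed=0):
    """Pocket algorithm sketch: run ordinary perceptron updates, but keep
    ("pocket") the weight vector with the fewest training errors seen so far."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)            # current perceptron weights
    pocket_w = w.copy()        # best weights found so far
    pocket_errors = n + 1      # training errors made by pocket_w

    def errors(v):
        preds = np.where(X @ v > 0, 1, -1)
        return int(np.sum(preds != y))

    for _ in range(epochs):
        for i in rng.permutation(n):
            pred = 1 if X[i] @ w > 0 else -1
            if pred != y[i]:
                w = w + y[i] * X[i]        # standard perceptron update
                e = errors(w)
                if e < pocket_errors:      # ratchet: swap only when strictly better
                    pocket_errors, pocket_w = e, w.copy()
    return pocket_w
</syntaxhighlight>

Because the returned weights are those with the lowest observed training error rather than the last weights visited, the sketch also yields a usable classifier on non-separable data.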
The <math>\alpha</math>-perceptron further used a pre-processing layer of fixed random weights, with thresholded output units. This enabled the perceptron to classify [[:wiktionary:analogue|analogue]] patterns, by projecting them into a [[Binary Space Partition|binary space]]. In fact, for a projection space of sufficiently high dimension, patterns can become linearly separable.

Another way to solve nonlinear problems without using multiple layers is to use higher-order networks (sigma-pi units). In this type of network, each element in the input vector is extended with each pairwise combination of multiplied inputs (second order). This can be extended to an ''n''-order network.

It should be kept in mind, however, that the best classifier is not necessarily the one that classifies all the training data perfectly. Indeed, if we had the prior constraint that the data come from equi-variant Gaussian distributions, the linear separation in the input space is optimal, and the nonlinear solution is [[overfitting|overfitted]].

Other linear classification algorithms include [[Winnow (algorithm)|Winnow]], [[support-vector machine]], and [[logistic regression]].

=== Multiclass perceptron ===
Like most other techniques for training linear classifiers, the perceptron generalizes naturally to [[multiclass classification]]. Here, the input <math>x</math> and the output <math>y</math> are drawn from arbitrary sets. A feature representation function <math>f(x,y)</math> maps each possible input/output pair to a finite-dimensional real-valued feature vector. As before, the feature vector is multiplied by a weight vector <math>w</math>, but now the resulting score is used to choose among many possible outputs:
:<math>\hat y = \operatorname{argmax}_y f(x,y) \cdot w.</math>
Learning again iterates over the examples, predicting an output for each, leaving the weights unchanged when the predicted output matches the target, and changing them when it does not. The update becomes:
:<math> w_{t+1} = w_t + f(x, y) - f(x,\hat y).</math>
This multiclass feedback formulation reduces to the original perceptron when <math>x</math> is a real-valued vector, <math>y</math> is chosen from <math>\{0,1\}</math>, and <math>f(x,y) = y x</math>.

For certain problems, input/output representations and features can be chosen so that <math>\operatorname{argmax}_y f(x,y) \cdot w</math> can be found efficiently even though <math>y</math> is chosen from a very large or even infinite set.

Since 2002, perceptron training has become popular in the field of [[natural language processing]] for such tasks as [[part-of-speech tagging]] and [[syntactic parsing]] (Collins, 2002). It has also been applied to large-scale machine learning problems in a [[distributed computing]] setting.<ref>{{cite book |last1=McDonald |first1=R. |last2=Hall |first2=K. |last3=Mann |first3=G. |year=2010 |chapter=Distributed Training Strategies for the Structured Perceptron |title=Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL |pages=456–464 |publisher=Association for Computational Linguistics |chapter-url=https://www.aclweb.org/anthology/N10-1069.pdf }}</ref>
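The following is a minimal Python sketch of this multiclass update, assuming a small finite label set and one common choice of joint feature map <math>f(x,y)</math> that copies <math>x</math> into a block of the weight vector reserved for class <math>y</math>; the function names are illustrative and the update works for any such feature map.

<syntaxhighlight lang="python">
import numpy as np

def feature(x, y, n_classes):
    """Joint feature map f(x, y): a copy of x placed in the block for class y."""
    d = x.shape[0]
    phi = np.zeros(n_classes * d)
    phi[y * d:(y + 1) * d] = x
    return phi

def train_multiclass(X, Y, n_classes, epochs=10):
    d = X.shape[1]
    w = np.zeros(n_classes * d)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # y_hat = argmax_y f(x, y) . w
            scores = [feature(x, c, n_classes) @ w for c in range(n_classes)]
            y_hat = int(np.argmax(scores))
            if y_hat != y:
                # w <- w + f(x, y) - f(x, y_hat)
                w += feature(x, y, n_classes) - feature(x, y_hat, n_classes)
    return w

def predict(x, w, n_classes):
    return int(np.argmax([feature(x, c, n_classes) @ w
                          for c in range(n_classes)]))
</syntaxhighlight>

With this block feature map the procedure is equivalent to keeping one weight vector per class and, on each mistake, adding <math>x</math> to the weights of the correct class while subtracting it from the weights of the wrongly predicted class.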