== Implementation ==
The parameters of the maximum-margin hyperplane are derived by solving the optimization problem. There exist several specialized algorithms for quickly solving the [[quadratic programming]] (QP) problem that arises from SVMs, mostly relying on heuristics for breaking the problem down into smaller, more manageable chunks.

Another approach is to use an [[interior-point method]] that uses [[Newton's method|Newton]]-like iterations to find a solution of the [[Karush–Kuhn–Tucker conditions]] of the primal and dual problems.<ref>{{Cite journal |doi=10.1137/S1052623400374379 |title=Interior-Point Methods for Massive Support Vector Machines |journal=SIAM Journal on Optimization |volume=13 |issue=3 |pages=783–804 |year=2002 |last1=Ferris |first1=Michael C. |last2=Munson |first2=Todd S. |url=http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |url-status=live |archive-url=https://web.archive.org/web/20081204224416/http://www.cs.wisc.edu/~ferris/papers/siopt-svm.pdf |archive-date=2008-12-04 |citeseerx=10.1.1.216.6893 |s2cid=13563302 }}</ref> Instead of solving a sequence of broken-down problems, this approach directly solves the problem as a whole. To avoid solving a linear system involving the large kernel matrix, a low-rank approximation to the matrix is often used in the kernel trick.

Another common method is Platt's [[sequential minimal optimization]] (SMO) algorithm, which breaks the problem down into two-dimensional sub-problems that are solved analytically, eliminating the need for a numerical optimization algorithm and matrix storage. This algorithm is conceptually simple, easy to implement, generally faster, and has better scaling properties for difficult SVM problems.<ref>{{cite conference |first=John C. |last=Platt |title=Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines |conference=NIPS |year=1998 |url=http://research.microsoft.com/pubs/69644/tr-98-14.pdf |url-status=live |archive-url=https://web.archive.org/web/20150702075055/http://research.microsoft.com/pubs/69644/tr-98-14.pdf |archive-date=2015-07-02 }}</ref>

The special case of linear support vector machines can be solved more efficiently by the same kind of algorithms used to optimize its close cousin, [[logistic regression]]; this class of algorithms includes [[Stochastic gradient descent|sub-gradient descent]] (e.g., PEGASOS<ref>{{cite conference |first1=Shai |last1=Shalev-Shwartz |first2=Yoram |last2=Singer |first3=Nathan |last3=Srebro |title=Pegasos: Primal Estimated sub-GrAdient SOlver for SVM |conference=ICML |year=2007 |url=http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |url-status=live |archive-url=https://web.archive.org/web/20131215051700/http://ttic.uchicago.edu/~shai/papers/ShalevSiSr07.pdf |archive-date=2013-12-15 }}</ref>) and [[coordinate descent]] (e.g., LIBLINEAR<ref>{{cite journal |first1=Rong-En |last1=Fan |first2=Kai-Wei |last2=Chang |first3=Cho-Jui |last3=Hsieh |first4=Xiang-Rui |last4=Wang |first5=Chih-Jen |last5=Lin |title=LIBLINEAR: A library for large linear classification |url=https://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf |journal=[[Journal of Machine Learning Research]] |volume=9 |pages=1871–1874 |year=2008 }}</ref>). LIBLINEAR has some attractive training-time properties: each iteration takes time linear in the time needed to read the training data, and the iterations have a [[Rate of convergence|Q-linear convergence]] property, making the algorithm extremely fast.
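The sub-gradient approach can be illustrated with a short sketch. The following Python code implements the core PEGASOS update for a linear SVM without a bias term; the function name, iteration count, and regularization constant are illustrative choices rather than part of any particular library.

<syntaxhighlight lang="python">
import numpy as np

def pegasos_train(X, y, lam=0.01, n_iters=10000, seed=0):
    """Primal sub-gradient solver (PEGASOS-style) for the linear SVM objective
    (lam/2)*||w||^2 + mean hinge loss; labels y must be in {-1, +1}.
    Illustrative sketch only: no bias term, no projection step."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for t in range(1, n_iters + 1):
        i = rng.integers(n_samples)          # pick one random training example
        eta = 1.0 / (lam * t)                # decreasing step size 1/(lambda * t)
        if y[i] * np.dot(w, X[i]) < 1:       # margin violated: hinge sub-gradient is -y_i * x_i
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                # margin satisfied: only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Tiny usage example on synthetic, roughly linearly separable data
X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])
w = pegasos_train(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
</syntaxhighlight>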
General kernel SVMs can also be solved more efficiently using [[Stochastic gradient descent|sub-gradient descent]] (e.g., P-packSVM<ref>{{cite conference |first1=Zeyuan |last1=Allen Zhu |first2=Weizhu |last2=Chen |first3=Gang |last3=Wang |first4=Chenguang |last4=Zhu |first5=Zheng |last5=Chen |title=P-packSVM: Parallel Primal grAdient desCent Kernel SVM |conference=ICDM |year=2009 |url=http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |url-status=live |archive-url=https://web.archive.org/web/20140407095709/http://people.csail.mit.edu/zeyuan/paper/2009-ICDM-Parallel.pdf |archive-date=2014-04-07 }}</ref>), especially when [[parallelization]] is allowed. Kernel SVMs are available in many machine-learning toolkits, including [[LIBSVM]], [[MATLAB]], [http://support.sas.com/documentation/cdl/en/whatsnew/64209/HTML/default/viewer.htm#emdocwhatsnew71.htm SAS], SVMlight, [https://cran.r-project.org/package=kernlab kernlab], [[scikit-learn]], [[Shogun (toolbox)|Shogun]], [[Weka (machine learning)|Weka]], [http://image.diku.dk/shark/ Shark], [https://mloss.org/software/view/409/ JKernelMachines], [[OpenCV]] and others.

Preprocessing of the data (standardization) is highly recommended to enhance classification accuracy.<ref>{{Cite journal| volume = 9| issue = Aug| pages = 1871–1874| last1 = Fan| first1 = Rong-En| last2 = Chang| first2 = Kai-Wei| last3 = Hsieh| first3 = Cho-Jui| last4 = Wang| first4 = Xiang-Rui| last5 = Lin| first5 = Chih-Jen| title = LIBLINEAR: A library for large linear classification| journal = Journal of Machine Learning Research| date = 2008}}</ref> There are several methods of standardization, such as min-max scaling, normalization by decimal scaling, and Z-score standardization.<ref>{{Cite journal| doi = 10.19026/rjaset.6.3638| volume = 6| pages = 3299–3303| last1 = Mohamad| first1 = Ismail| last2 = Usman| first2 = Dauda| title = Standardization and Its Effects on K-Means Clustering Algorithm| journal = Research Journal of Applied Sciences, Engineering and Technology| date = 2013-09-01| issue = 17| doi-access = free}}</ref> Subtracting each feature's mean and dividing by its variance is typically used for SVMs.<ref>{{Cite journal| doi = 10.1140/epjds/s13688-019-0201-0| volume = 8| last1 = Fennell| first1 = Peter| last2 = Zuo| first2 = Zhiya| last3 = Lerman| first3 = Kristina|author3-link=Kristina Lerman| title = Predicting and explaining behavioral data with structured feature space decomposition| journal = EPJ Data Science| date = 2019-12-01| doi-access = free| arxiv = 1810.09841}}</ref>
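As a sketch of the recommended preprocessing, the following Python example standardizes the features (subtracting each feature's mean and dividing by its standard deviation) before fitting a kernel SVM with the scikit-learn toolkit mentioned above; the synthetic dataset and parameter values are illustrative only.

<syntaxhighlight lang="python">
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative synthetic data; any real dataset would be used in the same way
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler subtracts each feature's mean and divides by its standard deviation,
# then the standardized features are passed to an RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
</syntaxhighlight>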