Neural network (machine learning)
=== Deep learning breakthroughs in the 1960s and 1970s ===
Fundamental research was conducted on ANNs in the 1960s and 1970s. The first working deep learning algorithm was the [[Group method of data handling]], a method to train arbitrarily deep neural networks, published by [[Alexey Ivakhnenko]] and Lapa in the [[Soviet Union]] (1965). They regarded it as a form of polynomial regression,<ref name="ivak1965">{{cite book|first1=A. G. |last1=Ivakhnenko |first2=V. G. |last2=Lapa |title=Cybernetics and Forecasting Techniques|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|year=1967|publisher=American Elsevier Publishing Co.|isbn=978-0-444-00020-0}}</ref> or a generalization of Rosenblatt's perceptron.<ref>{{Cite journal |last=Ivakhnenko |first=A.G. |date=March 1970 |title=Heuristic self-organization in problems of engineering cybernetics |url=https://linkinghub.elsevier.com/retrieve/pii/0005109870900920 |journal=Automatica |language=en |volume=6 |issue=2 |pages=207–219 |doi=10.1016/0005-1098(70)90092-0 |archive-date=12 August 2024 |access-date=7 August 2024 |archive-url=https://web.archive.org/web/20240812123448/https://linkinghub.elsevier.com/retrieve/pii/0005109870900920 |url-status=live }}</ref> A 1971 paper described a deep network with eight layers trained by this method,<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=http://gmdh.net/articles/history/polynomial.pdf|journal=IEEE Transactions on Systems, Man, and Cybernetics|pages=364–378|doi=10.1109/TSMC.1971.4308320|volume=SMC-1|issue=4|access-date=5 November 2019|archive-date=29 August 2017|archive-url=https://web.archive.org/web/20170829230621/http://www.gmdh.net/articles/history/polynomial.pdf|url-status=live}}</ref> which is based on layer-by-layer training through regression analysis. Superfluous hidden units are pruned using a separate validation set. 
Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or "gates."<ref name="DLhistory">{{cite arXiv |eprint=2212.11279 |class=cs.NE |first=Jürgen |last=Schmidhuber |author-link=Jürgen Schmidhuber |title=Annotated History of Modern AI and Deep Learning |date=2022}}</ref>

The first deep learning [[multilayer perceptron]] trained by [[stochastic gradient descent]]<ref name="robbins1951">{{Cite journal | last1 = Robbins | first1 = H. | author-link = Herbert Robbins| last2 = Monro | first2 = S. | doi = 10.1214/aoms/1177729586 | title = A Stochastic Approximation Method | journal = The Annals of Mathematical Statistics | volume = 22 | issue = 3 | pages = 400 | year = 1951 | doi-access = free }}</ref> was published in 1967 by [[Shun'ichi Amari]].<ref name="Amari1967">{{cite journal |last1=Amari |first1=Shun'ichi |author-link=Shun'ichi Amari|title=A theory of adaptive pattern classifier|journal= IEEE Transactions |date=1967 |volume=EC |issue=16 |pages=279–307}}</ref> In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned [[Knowledge representation|internal representations]] to classify non-linearly separable pattern classes.<ref name="DLhistory"/> Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique.

In 1969, [[Kunihiko Fukushima]] introduced the [[rectifier (neural networks)|ReLU]] (rectified linear unit) activation function.<ref name="DLhistory" /><ref name="Fukushima1969">{{cite journal |last1=Fukushima |first1=K. 
|date=1969 |title=Visual feature extraction by a multilayered network of analog threshold elements |journal=IEEE Transactions on Systems Science and Cybernetics |volume=5 |issue=4 |pages=322–333 |doi=10.1109/TSSC.1969.300225}}</ref><ref name=sonoda17>{{cite journal | last1 = Sonoda | first1 = Sho | last2=Murata | first2=Noboru | s2cid = 12149203 | year = 2017 | title = Neural network with unbounded activation functions is universal approximator | journal = Applied and Computational Harmonic Analysis | volume = 43 | issue = 2 | pages = 233–268 | doi = 10.1016/j.acha.2015.12.005| arxiv = 1505.03654 }}</ref> The rectifier has become the most popular activation function for deep learning.<ref>{{cite arXiv |eprint=1710.05941 |class=cs.NE |first1=Prajit |last1=Ramachandran |first2=Barret |last2=Zoph |title=Searching for Activation Functions |date=16 October 2017 |last3=Le |first3=Quoc V.}}</ref>

Nevertheless, research stagnated in the United States following the work of [[Marvin Minsky|Minsky]] and [[Seymour Papert|Papert]] (1969),<ref name=":132">{{cite book |last1=Minsky |first1=Marvin |url={{google books |plainurl=y |id=Ow1OAQAAIAAJ}} |title=Perceptrons: An Introduction to Computational Geometry |last2=Papert |first2=Seymour |publisher=MIT Press |year=1969 |isbn=978-0-262-63022-1}}</ref> who emphasized that basic perceptrons were incapable of processing the exclusive-or circuit. This insight was irrelevant for the deep networks of Ivakhnenko (1965) and Amari (1967).

In 1976, transfer learning was introduced in neural network learning.<ref>Bozinovski S. and Fulgosi A. (1976). "The influence of pattern similarity and transfer learning on the base perceptron training" (original in Croatian) Proceedings of Symposium Informatica 3-121-5, Bled.</ref><ref>Bozinovski S. (2020) "Reminder of the first paper on transfer learning in neural networks, 1976". 
Informatica 44: 291–302.</ref>

Deep learning architectures for [[convolutional neural network]]s (CNNs) with convolutional layers, downsampling layers, and weight replication began with the [[Neocognitron]] introduced by Kunihiko Fukushima in 1979, though not trained by backpropagation.<ref name="FUKU1979">{{cite journal |last1=Fukushima |first1=K. |year=1979 |title=Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron |journal=Trans. IECE (In Japanese)|volume= J62-A |issue=10 |pages=658–665 |doi=10.1007/bf00344251 |pmid=7370364 |s2cid=206775608}}</ref><ref name="FUKU1980">{{cite journal |last1=Fukushima |first1=K. |year=1980 |title=Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position |journal=Biol. Cybern. |volume=36 |issue=4 |pages=193–202 |doi=10.1007/bf00344251 |pmid=7370364 |s2cid=206775608}}</ref><ref name="SCHIDHUB4"/>
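
The layer-by-layer regression training with validation-based pruning described earlier in this section for the Group method of data handling can be illustrated with a minimal sketch. It is not Ivakhnenko's original procedure: it assumes two-input quadratic polynomial units, ordinary least squares via NumPy, a fixed number of surviving units per layer, at least two input features, and a simple stopping rule. The names <code>quad_features</code>, <code>fit_layer</code>, <code>apply_layer</code>, <code>gmdh_fit</code>, <code>gmdh_predict</code> and the parameters <code>keep</code> and <code>max_layers</code> are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np


def quad_features(a, b):
    """Terms of a two-input quadratic Kolmogorov-Gabor polynomial."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])


def fit_layer(X_tr, y_tr, X_va, y_va, keep):
    """Fit one candidate unit per input pair; keep the `keep` best units."""
    units = []
    for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
        # Least-squares regression on the training split only.
        F_tr = quad_features(X_tr[:, i], X_tr[:, j])
        coef, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)
        # Rank the unit by its error on the separate validation split.
        F_va = quad_features(X_va[:, i], X_va[:, j])
        err = np.mean((F_va @ coef - y_va) ** 2)
        units.append((err, i, j, coef))
    units.sort(key=lambda u: u[0])
    return units[:keep]  # pruning: the remaining units are discarded


def apply_layer(units, X):
    """Compute the outputs of the surviving units, best unit first."""
    cols = [quad_features(X[:, i], X[:, j]) @ coef for _, i, j, coef in units]
    return np.column_stack(cols)


def gmdh_fit(X_tr, y_tr, X_va, y_va, keep=4, max_layers=8):
    """Add layers while the best validation error keeps improving."""
    layers, best = [], np.inf
    for _ in range(max_layers):
        units = fit_layer(X_tr, y_tr, X_va, y_va, keep)
        if units[0][0] >= best:
            break  # a deeper layer no longer helps on validation data
        best = units[0][0]
        layers.append(units)
        X_tr, X_va = apply_layer(units, X_tr), apply_layer(units, X_va)
        if X_tr.shape[1] < 2:
            break  # fewer than two units left, so no pairs to combine
    return layers, best


def gmdh_predict(layers, X):
    """Propagate inputs through all layers; return the best final unit."""
    for units in layers:
        X = apply_layer(units, X)
    return X[:, 0]
```

Because each unit contains the product term of its two inputs, a target such as the product of two input variables, a continuous relative of the exclusive-or problem, lies within reach of a single unit in this sketch, which is one way to read the remark above about multiplicative units.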