====Deep feedforward and recurrent neural networks====
{{Main|Deep learning}}
Deep neural networks and denoising [[autoencoder]]s<ref>{{Cite book |last1=Maas |first1=Andrew L. |title=Proceedings of Interspeech 2012 |last2=Le |first2=Quoc V. |last3=O'Neil |first3=Tyler M. |last4=Vinyals |first4=Oriol |last5=Nguyen |first5=Patrick |last6=Ng |first6=Andrew Y. |author-link6=Andrew Ng |year=2012 |chapter=Recurrent Neural Networks for Noise Reduction in Robust ASR}}</ref> are also under investigation. A deep feedforward neural network (DNN) is an [[artificial neural network]] with multiple hidden layers of units between the input and output layers.<ref name=HintonDengYu2012/> Like shallow neural networks, DNNs can model complex non-linear relationships. DNN architectures generate compositional models, in which each extra layer composes features from the layers below, giving them enormous learning capacity and thus the potential to model complex patterns of speech data.<ref name=BOOK2014/>

DNNs achieved their first major success in large-vocabulary speech recognition in 2010, when industrial researchers, in collaboration with academic researchers, adopted large DNN output layers based on context-dependent HMM states constructed by decision trees.<ref name="Roles2010">{{Cite journal |last1=Yu |first1=D. |last2=Deng |first2=L. |last3=Dahl |first3=G. |date=2010 |title=Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition |url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/dbn4asr-nips2010.pdf |journal=NIPS Workshop on Deep Learning and Unsupervised Feature Learning}}</ref><ref name="ref27">{{Cite journal |last1=Dahl |first1=George E. |last2=Yu |first2=Dong |last3=Deng |first3=Li |last4=Acero |first4=Alex |date=2012 |title=Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition |journal=IEEE Transactions on Audio, Speech, and Language Processing |volume=20 |issue=1 |pages=30–42 |doi=10.1109/TASL.2011.2134090 |s2cid=14862572}}</ref><ref name="ICASSP2013">Deng L., Li, J., Huang, J., Yao, K., Yu, D., Seide, F. et al. [https://pdfs.semanticscholar.org/6bdc/cfe195bc49d218acc5be750aa49e41f408e4.pdf Recent Advances in Deep Learning for Speech Research at Microsoft] {{Webarchive|url=https://web.archive.org/web/20240909052236/https://pdfs.semanticscholar.org/6bdc/cfe195bc49d218acc5be750aa49e41f408e4.pdf |date=9 September 2024 }}. ICASSP, 2013.</ref> Comprehensive reviews of this development, and of the state of the art as of October 2014, appear in a Springer book from Microsoft Research.<ref name="ReferenceA" /> See also the related background of automatic speech recognition and the impact of various machine learning paradigms, notably including [[deep learning]], in overview articles.<ref>{{Cite journal |last1=Deng |first1=L. |last2=Li |first2=Xiao |date=2013 |title=Machine Learning Paradigms for Speech Recognition: An Overview |url=http://cvsp.cs.ntua.gr/courses/patrec/slides_material2018/slides-2018/DengLi_MLParadigms-SpeechRecogn-AnOverview_TALSP13.pdf |journal=IEEE Transactions on Audio, Speech, and Language Processing |volume=21 |issue=5 |pages=1060–1089 |doi=10.1109/TASL.2013.2244083 |s2cid=16585863 |access-date=9 September 2024 |archive-date=9 September 2024 |archive-url=https://web.archive.org/web/20240909052239/http://cvsp.cs.ntua.gr/courses/patrec/slides_material2018/slides-2018/DengLi_MLParadigms-SpeechRecogn-AnOverview_TALSP13.pdf |url-status=live }}</ref><ref name="scholarpedia2015">{{Cite journal |last=Schmidhuber |first=Jürgen |author-link=Jürgen Schmidhuber |year=2015 |title=Deep Learning |journal=Scholarpedia |volume=10 |issue=11 |page=32832 |bibcode=2015SchpJ..1032832S |doi=10.4249/scholarpedia.32832 |doi-access=free}}</ref>

One fundamental principle of [[deep learning]] is to do away with hand-crafted [[feature engineering]] and to use raw features. This principle was first explored successfully in the architecture of a deep autoencoder applied to "raw" spectrogram or linear filter-bank features,<ref name="interspeech2010">L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton (2010) [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf Binary Coding of Speech Spectrograms Using a Deep Auto-encoder]. Interspeech.</ref> showing its superiority over Mel-cepstral features, which involve a few stages of fixed transformation from spectrograms. The true "raw" features of speech, waveforms, have more recently been shown to produce excellent larger-scale speech recognition results.<ref name="interspeech2014">{{Cite book |last1=Tüske |first1=Zoltán |title=Interspeech 2014 |last2=Golik |first2=Pavel |last3=Schlüter |first3=Ralf |last4=Ney |first4=Hermann |year=2014 |chapter=Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR |chapter-url=https://www-i6.informatik.rwth-aachen.de/publications/download/937/T%7Bu%7DskeZolt%7Ba%7DnGolikPavelSchl%7Bu%7DterRalfNeyHermann--AcousticModelingwithDeepNeuralNetworksUsingRawTimeSignalfor%7BLVCSR%7D--2014.pdf |archive-url=https://web.archive.org/web/20161221174753/https://www-i6.informatik.rwth-aachen.de/publications/download/937/T%7Bu%7DskeZolt%7Ba%7DnGolikPavelSchl%7Bu%7DterRalfNeyHermann--AcousticModelingwithDeepNeuralNetworksUsingRawTimeSignalfor%7BLVCSR%7D--2014.pdf |archive-date=21 December 2016 |url-status=live |df=dmy-all}}</ref>
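The layer-wise composition described above can be sketched as a minimal feedforward pass: stacked hidden layers map one acoustic feature frame to a softmax over context-dependent HMM states. This is an illustrative sketch only; the layer sizes, the 40-dimensional filter-bank input, and the 1000-state output are hypothetical values, not taken from any cited system, and real systems learn the weights by backpropagation rather than using random ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_dnn(layer_sizes):
    """Random weights and zero biases for each layer (illustration only;
    a trained acoustic model would learn these by backpropagation)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, frame):
    """Propagate one feature frame through the hidden layers; each layer
    composes features from the layer below, and the final softmax gives a
    posterior distribution over (hypothetical) context-dependent HMM states."""
    h = frame
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)

# Hypothetical shape: 40-dim filter-bank frame -> three 256-unit hidden
# layers -> 1000 tied HMM-state output units.
params = init_dnn([40, 256, 256, 256, 1000])
posteriors = forward(params, rng.normal(size=40))
```

The output vector sums to one, so a decoder could interpret it as state posteriors; in a hybrid DNN-HMM system these posteriors are divided by state priors to obtain scaled likelihoods for Viterbi decoding.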