Naive Bayes classifier
===Gaussian naive Bayes===
When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a [[Normal distribution|normal]] (or Gaussian) distribution. For example, suppose the training data contains a continuous attribute, '''<math>x</math>'''. The data is first segmented by class, and then the mean and [[Variance#Estimating the variance|variance]] of <math>x</math> are computed in each class. Let <math>\mu_k</math> be the mean of the values in <math>x</math> associated with class <math>C_k</math>, and let <math>\sigma^2_k</math> be the [[Bessel's correction|Bessel-corrected variance]] of the values in <math>x</math> associated with class <math>C_k</math>. Suppose one has collected some observation value <math>v</math>. Then, the probability ''density'' of <math>v</math> given a class <math>C_k</math>, i.e., <math>p(x=v \mid C_k)</math>, can be computed by plugging <math>v</math> into the equation for a [[normal distribution]] parameterized by <math>\mu_k</math> and <math>\sigma^2_k</math>. Formally, <math display="block"> p(x=v \mid C_k) = \frac{1}{\sqrt{2\pi\sigma^2_k}}\,e^{ -\frac{(v-\mu_k)^2}{2\sigma^2_k} } </math> Another common technique for handling continuous values is to use binning to [[Discretization of continuous features|discretize]] the feature values and obtain a new set of Bernoulli-distributed features. Some literature suggests that this is required in order to use naive Bayes, but it is not: the discretization may [[Discretization error|throw away discriminative information]].<ref name="idiots"/> Sometimes the distribution of class-conditional marginal densities is far from normal. In these cases, [[kernel density estimation]] can be used for a more realistic estimate of the marginal densities of each class.
This method, which was introduced by John and Langley,<ref name="john95"/> can boost the accuracy of the classifier considerably.<ref name="piryonesi2020">{{Cite journal |last1=Piryonesi |first1=S. Madeh |last2=El-Diraby |first2=Tamer E. |date=2020-06-01 |title=Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems |journal=Journal of Transportation Engineering, Part B: Pavements |volume=146 |issue=2 |pages=04020022 |doi=10.1061/JPEODX.0000175 |s2cid=216485629}}</ref><ref name="hastie01">{{Cite book |last=Hastie, Trevor. |title=The elements of statistical learning : data mining, inference, and prediction : with 200 full-color illustrations |date=2001 |publisher=Springer |others=Tibshirani, Robert., Friedman, J. H. (Jerome H.) |isbn=0-387-95284-5 |location=New York |oclc=46809224}}</ref>
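The per-class estimation and density evaluation described above can be sketched in a few lines of Python. This is a minimal single-feature illustration, not a production implementation; the function names, class labels, and toy data are all hypothetical:

```python
import math

def gaussian_pdf(v, mu, var):
    # Normal density: (1 / sqrt(2*pi*var)) * exp(-(v - mu)^2 / (2*var))
    return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(values, labels):
    """Per-class mean, Bessel-corrected variance, and prior for one continuous feature."""
    params = {}
    for c in set(labels):
        vals = [x for x, y in zip(values, labels) if y == c]
        n = len(vals)
        mu = sum(vals) / n
        var = sum((x - mu) ** 2 for x in vals) / (n - 1)  # Bessel's correction (n - 1)
        params[c] = (mu, var, n / len(values))            # (mean, variance, prior)
    return params

def predict(v, params):
    # Pick the class maximizing prior * class-conditional density
    return max(params, key=lambda c: params[c][2] * gaussian_pdf(v, params[c][0], params[c][1]))

# Hypothetical toy data: one continuous feature, two classes
xs = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]
ys = ["a", "a", "a", "b", "b", "b"]
model = fit(xs, ys)
print(predict(1.1, model))  # a value near the class-"a" cluster
```

With more than one feature, the naive independence assumption means each feature contributes its own Gaussian density factor to the product (in practice one sums log-densities to avoid underflow).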