Friday 17 April 2015

Performance Measure of Naive Bayes Classifier

Performance Measure
A Naive Bayes classifier works well when the data set is small because of its low variance. Training and prediction come down to a simple algorithm that is essentially counting feature and class frequencies. An NB classifier gives good results quickly when the conditional independence assumption holds, and even when it doesn't, it still performs better than expected more often than not. A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. It is also a good choice when some kind of semi-supervised learning is needed.
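
To make the "counting" idea concrete, here is a minimal sketch of a multinomial Naive Bayes trainer and predictor written from scratch; the function names, the smoothing constant alpha, and the data layout (documents as lists of words) are my own assumptions for illustration, not part of the original post:

from collections import Counter, defaultdict
import math

def train_nb(docs, labels, alpha=1.0):
    # Count how often each class occurs and how often each word occurs per class.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    # Log prior for each class, plus Laplace-smoothed log likelihood for each word.
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) /
                                      (total + alpha * len(vocab)))
                          for w in vocab}
    return priors, likelihoods, vocab

def predict_nb(words, priors, likelihoods, vocab):
    # Score each class as log prior plus the sum of log likelihoods of known words.
    scores = {c: priors[c] + sum(likelihoods[c][w] for w in words if w in vocab)
              for c in priors}
    return max(scores, key=scores.get)

Everything here is counting and a few logarithms, which is why training is fast and works even with little data.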

Examples and Test Cases
A Naive Bayes classifier works well for the text classification problem of email spam filtering: classifying email messages as spam or non-spam. Since a document is usually represented as a bag of words, such text classifiers do not need any deep representation of language. This is an extremely simple representation: it only records which words appear in the document and how often, and discards the word order.
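
As a small sketch of this use case with scikit-learn, the following builds a bag-of-words representation and fits a multinomial Naive Bayes spam filter; the tiny corpus and labels are made up purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: messages and labels invented for this example.
emails = [
    "win a free prize now",
    "limited offer claim your free money",
    "meeting agenda for monday",
    "lunch with the project team tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag of words: only word counts are kept, word order is discarded.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

clf = MultinomialNB()
clf.fit(X, labels)

test = vectorizer.transform(["claim your free prize"])
print(clf.predict(test))  # expected output: ['spam']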


The strong feature independence assumption makes it less suitable for tasks such as speech recognition. Consider a model that uses the average sentence length as one feature among others. If we now add features modeling the syntactic complexity of sentences in a text, those features may add new cues to the model, but syntactic complexity is also correlated with sentence length. In such situations naive Bayes models may fail, since they treat all features as independent.
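
A small sketch with made-up numbers shows how correlated features hurt: a perfectly correlated duplicate of an existing feature adds no new information, yet naive Bayes multiplies its likelihood in again and becomes overconfident. The probabilities below are assumptions chosen only to make the effect visible:

# Assume P(long_sentences | class A) = 0.8, P(long_sentences | class B) = 0.2,
# with equal priors. A "syntactic complexity" feature that is perfectly
# correlated with sentence length carries no new evidence.
p_a, p_b = 0.5, 0.5
like_a, like_b = 0.8, 0.2

# Posterior for class A using the single feature.
post_one = (p_a * like_a) / (p_a * like_a + p_b * like_b)

# Posterior when the correlated feature is treated as independent:
# the same evidence is counted twice.
post_two = (p_a * like_a**2) / (p_a * like_a**2 + p_b * like_b**2)

print(round(post_one, 3))  # 0.8
print(round(post_two, 3))  # ~0.941, overconfident despite no new information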
