# Naive Bayes Classifier in Haskell

June 7th 2012

I just started learning Haskell. So far, this language has consistently blown my mind at every turn; every new feature feels like a revelation. But damn, is it hard. Until now, I’ve kept to trying simple exercises from the book I’m using, Real World Haskell (a great book). Yesterday and today, I tried my first not-totally-trivial program: a Naive Bayes classifier. It comes in two parts: a module for the classifier and a command-line interface. The main classifier is only 34 lines, including comments! Anyway, I thought I’d put it online so that anyone can give me feedback or comments, and if it can help anyone else, all the better. DISCLAIMER: I’m just learning Haskell, so don’t consider this a tutorial, or assume it uses any ‘best practices’, or is even remotely idiomatic.

To learn more about Naive Bayes classifiers, see the Wikipedia article, read this great tutorial in Ruby, learn the basics of Bayes’ theorem, and, if you try making a classifier and keep getting zero probabilities, see the Laplacian Correction section of this PDF.
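To make that last point concrete, here’s a tiny standalone sketch (the counts are made up, not taken from the classifier below): naive Bayes multiplies per-word probabilities together, so one word that never appeared in a category contributes a probability of 0 and zeroes out the whole product. Adding 1 to the counts sidesteps this.

```haskell
-- Laplace-smoothed estimate of P(word | category).
-- seen = documents in the category containing the word,
-- total = documents in the category (both invented for illustration).
smoothed :: Int -> Int -> Double
smoothed seen total = fromIntegral (seen + 1) / fromIntegral (total + 1)

main :: IO ()
main = do
    print (smoothed 0 10) -- unseen word: small, but not zero
    print (smoothed 4 10) -- seen word
```

Without the `+ 1`, `smoothed 0 10` would be exactly 0, and no amount of evidence from the other words could rescue that category.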

Here’s My Code:

 
```haskell
module BayesClassifier where
-- Text Classifier Using Bayes' Formula
import Data.List
import Data.Char

type Category = String

newtype Classifier = Classifier { training :: [(Category, [String])] }
    deriving (Eq, Show)

-- Get a new classifier with no training
classifier :: Classifier
classifier = Classifier []

-- Classifier probabilities
-- Adding + 1 for Laplacian Correction
probabilityOfWordInCategory :: Classifier -> String -> Category -> Double
probabilityOfWordInCategory (Classifier training) word category =
    let allInCategory = filter (\(cat, _) -> cat == category) training
        allInCategoryContainingWord = filter (\(_, text) -> word `elem` text) allInCategory
    in (fromIntegral $ length allInCategoryContainingWord + 1) / (fromIntegral $ length allInCategory + 1)

probabilityOfCategory :: Classifier -> Category -> Double
probabilityOfCategory (Classifier training) category =
    let allInCategory = filter (\(cat, _) -> cat == category) training
    in (fromIntegral $ length allInCategory) / (fromIntegral $ length training)

-- Train a classifier
train :: Classifier -> String -> Category -> Classifier
train (Classifier training) text category = Classifier $ (category, cleanInput text) : training

-- Categorize text with a classifier
classify :: Classifier -> String -> Category
classify classifier text =
    fst $ head $ sortBy (\(_, a) (_, b) -> b `compare` a) $ probabilities classifier text

-- Get probability for each category
probabilities :: Classifier -> String -> [(Category, Double)]
probabilities classifier@(Classifier training) text =
    map (\cat -> (cat, probabilityForCategory classifier text cat)) $ nub $ map (\(cat, _) -> cat) training

-- Get probability for a passage in a certain category
probabilityForCategory :: Classifier -> String -> Category -> Double
probabilityForCategory classifier text category =
    (+) (log $ probabilityOfCategory classifier category)
        (sum $ map (\word -> log $ probabilityOfWordInCategory classifier word category) $ cleanInput text)

-- Lowercase, remove punctuation, drop stop words
cleanInput :: String -> [String]
cleanInput text =
    filter (\w -> not (w `elem` stopWords)) $ words $ filter (`elem` ' ' : ['a'..'z']) $ map toLower text

-- Common low-signal words ignored by cleanInput (a minimal placeholder list)
stopWords :: [String]
stopWords = ["a", "an", "the", "of", "in", "to", "and", "is", "it"]
```
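One detail worth calling out in `probabilityForCategory`: it sums log probabilities rather than multiplying the raw ones. With enough words, the raw product underflows a `Double` to zero for every category, making them indistinguishable; the log sum stays finite and comparable. A self-contained sketch:

```haskell
main :: IO ()
main = do
    -- 400 words, each with probability 0.01 in some category
    let probs = replicate 400 (0.01 :: Double)
    print (product probs)       -- 1e-800 underflows to 0.0
    print (sum (map log probs)) -- finite, still usable for comparison
```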
 

And here’s the command-line interface:

```haskell
import System.IO
import BayesClassifier

interactionLoop :: Classifier -> String -> IO ()
interactionLoop myClassifier function = case function of
    "start" -> do
        putStrLn "Enter an action [train|classify]"
        action <- getLine
        interactionLoop myClassifier action
    "train" -> do
        putStr "Category: "
        category <- getLine
        putStr "Material: "
        material <- getLine
        interactionLoop (train myClassifier material category) "start"
    "classify" -> do
        putStr "Material: "
        material <- getLine
        putStrLn $ classify myClassifier material
        putStrLn . show $ probabilities myClassifier material
        putStrLn "\n\n\n\n"
        interactionLoop myClassifier "start"
    _ ->
        interactionLoop myClassifier "start"

main :: IO ()
main = do
    hSetBuffering stdout NoBuffering
    hSetBuffering stdin NoBuffering
    interactionLoop classifier "start"
```

