Recently I registered on [exercism.io](http://exercism.io) and solved the Haskell exercises there. I recommend the site to both beginners and advanced users: beginners can learn from the community reviews, and advanced users can review and comment on other people's solutions themselves. This post is not particularly deep; it doesn't contain any thorough analysis or especially interesting notes.

There were two interesting tasks that can be solved either with or without hash tables, and I wanted to benchmark how that choice affects the performance of those tasks. If you are planning to do the exercises yourself, be warned: spoilers ahead.

The first task is Anagram:

{-# LANGUAGE TupleSections #-}
import qualified Data.HashMap.Strict as HM
import qualified Data.Map.Strict as M
import Data.Char
import Data.List ( sort )
import Criterion.Main

Simple version:

-- by me
anagramsFor :: String -> [String] -> [String]
anagramsFor x = filter ((== xn) . normalize) . filter (/=x)
  where
     -- anagrams of the same word normalize to the same sorted string
     normalize = sort . map toLower
     xn = normalize x
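A quick check in GHCi (the inputs are the same ones I use in the benchmark below):

ghci> anagramsFor "Orchestra" ["cashregister", "Carthorse", "radishes"]
["Carthorse"]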

HashMap version:

-- by [@ethercrow](http://exercism.io/ethercrow)
anagramsFor1 :: String -> [String] -> [String]
anagramsFor1 term
   = filter (\candidate -> candidate /= term
               && letterCount candidate == termLetterCount)
 where termLetterCount = letterCount term

letterCount :: String -> HM.HashMap Char Int
letterCount = HM.fromListWith (+) . fmap ((, 1) . toLower)
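It gives the same answer on that input; letterCount simply maps every lower-cased character to its number of occurrences (I only look up a single key here, since a HashMap's printing order is unspecified):

ghci> anagramsFor1 "Orchestra" ["cashregister", "Carthorse", "radishes"]
["Carthorse"]
ghci> HM.lookup 'r' (letterCount "Orchestra")
Just 2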

The second task is Scrabble score. Here the letter scores live in a Data.Map for the simple version and in a Data.HashMap for the other one:

scoresM :: M.Map Char Int
scoresM = M.fromList scores

scoresH :: HM.HashMap Char Int
scoresH = HM.fromList scores

scores :: [(Char,Int)]
scores =  [ (l,c) |
    (ls,c) <- [ ("AEIOULNRST",1)
              , ("DG",2)
              , ("BCMP",3)
              , ("FHVWY",4)
              , ("K",5)
              , ("JX",8)
              , ("QZ",10)],
    l <- ls
  ]
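The comprehension just flattens each group into individual (letter, score) pairs, so lookups behave as you would expect:

ghci> lookup 'D' scores
Just 2
ghci> M.lookup 'Q' scoresM
Just 10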


scoreWord :: String -> Int
scoreWord = sum . map scoreLetter
  where
    scoreLetter :: Char -> Int
    scoreLetter = flip (M.findWithDefault 0) scoresM . toUpper


scoreWord1 :: String -> Int
scoreWord1 = sum . map scoreLetter
   where
      scoreLetter :: Char -> Int
      scoreLetter = flip (findWithDefault 0) scoresH . toUpper
      -- HashMap counterpart of M.findWithDefault
      -- (HM.lookupDefault would do the same job)
      findWithDefault v k m = case k `HM.lookup` m of
        Nothing -> v
        Just x  -> x
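Both scorers agree, of course; by my count the benchmark word scores 18:

ghci> scoreWord "cashregister"
18
ghci> scoreWord1 "cashregister"
18

And the benchmark itself: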
main :: IO ()
main = defaultMain
 [ bgroup "anagram"
    [ bench "simple"  $ nf (anagramsFor "Orchestra") ["cashregister", "Carthorse", "radishes"]
    , bench "hashmap" $ nf (anagramsFor1 "Orchestra") ["cashregister", "Carthorse", "radishes"]
    ]
 , bgroup "scrabble" 
    [ bench "simple"  $ nf scoreWord "cashregister"
    , bench "hashmap" $ nf scoreWord1 "cashregister"
    ]
 ]
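To get these numbers I compiled with optimizations; something along these lines should work, assuming criterion is installed (the file name here is my own, not part of the exercise):

$ ghc -O2 Bench.hs -o bench
$ ./bench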

Results:

warming up
estimating clock resolution...
mean is 1.892643 us (320001 iterations)
found 58769 outliers among 319999 samples (18.4%)
  54086 (16.9%) low severe
  4683 (1.5%) high severe
estimating cost of a clock call...
mean is 50.56212 ns (12 iterations)
found 1 outliers among 12 samples (8.3%)
  1 (8.3%) high mild

benchmarking anagram/simple
mean: 5.985313 us, lb 5.973008 us, ub 6.000866 us, ci 0.950
std dev: 70.55760 ns, lb 57.56464 ns, ub 89.65218 ns, ci 0.950

benchmarking anagram/hashmap
mean: 8.926868 us, lb 8.689560 us, ub 9.300685 us, ci 0.950
std dev: 1.494973 us, lb 1.058908 us, ub 1.992894 us, ci 0.950
found 15 outliers among 100 samples (15.0%)
  3 (3.0%) high mild
  12 (12.0%) high severe
variance introduced by outliers: 91.528%
variance is severely inflated by outliers

benchmarking scrabble/simple
mean: 1.323841 us, lb 1.321476 us, ub 1.326522 us, ci 0.950
std dev: 12.91355 ns, lb 11.27313 ns, ub 15.02281 ns, ci 0.950

benchmarking scrabble/hashmap
mean: 1.276473 us, lb 1.254422 us, ub 1.320440 us, ci 0.950
std dev: 151.2879 ns, lb 81.85053 ns, ub 237.6315 ns, ci 0.950
found 11 outliers among 100 samples (11.0%)
  11 (11.0%) high severe
variance introduced by outliers: 84.196%
variance is severely inflated by outliers

Interestingly enough, the results with GHC 7.8 are slightly worse.



