February 10, 2013
Hashmaps usage
Keywords:
Recently I’ve registered on the (exercism.io)[http://exercism.io] site and solved haskell tasks there. I’m recommending that site for both beginers and advanced users, as beginers may learn from the community review and advanced users may review and comment themselve. This post is not very interesting as it doesn’t contain any deep analisys or interesting notes.
There were to interesting tasks that may be solved with or without hashtables and I wanted to benchmark how hashtables may impact performance of that tasks. If you are going to participate in exercism here will be spoilers
The first task is Anagram:
{-# LANGUAGE TupleSections #-}
import qualified Data.HashMap.Strict as HM
import qualified Data.Map.Strict as M
import Data.Char
import Data.List ( sort )
import Criterion.Main
Simple version
-- by me
anagramsFor :: String -> [String] -> [String]
anagramsFor x = filter ((== xn) . normalize) . filter (/=x)
where
normalize = sort . map toLower
xn = normalize x
HashMap version:
-- by (@ethercrow)[http://exercism.io/ethercrow]
anagramsFor1 :: String -> [String] -> [String]
anagramsFor1 term
= filter (\candidate -> candidate /= term
&& letterCount candidate == termLetterCount)
where termLetterCount = letterCount term
letterCount :: String -> HM.HashMap Char Int
letterCount = HM.fromListWith (+) . fmap ((, 1) . toLower)
And anagrams:
scoresM :: M.Map Char Int
scoresM = M.fromList scores
scoresH :: HM.HashMap Char Int
scoresH = HM.fromList scores
scores :: [(Char,Int)]
scores = [ (l,c) |
(ls,c) <- [ ("AEIOULNRST",1)
, ("DG",2)
, ("BCMP",3)
, ("FHVWY",4)
, ("K",5)
, ("JX",8)
, ("QZ",10)],
l <- ls
]
scoreWord :: String -> Int
scoreWord = sum . map scoreLetter
where
scoreLetter :: Char -> Int
scoreLetter = flip (M.findWithDefault 0) scoresM . toUpper
scoreWord1 :: String -> Int
scoreWord1 = sum . map scoreLetter
where
scoreLetter :: Char -> Int
scoreLetter = flip (findWithDefault 0) scoresH . toUpper
findWithDefault v k m = case k `HM.lookup` m of
Nothing -> v
Just x -> x
main = defaultMain
[ bgroup "anagram"
[ bench "simple" $ nf (anagramsFor "Orchestra") ["cashregister", "Carthorse", "radishes"]
, bench "hashmap" $ nf (anagramsFor1 "Orchestra") ["cashregister", "Carthorse", "radishes"]
]
, bgroup "scrabble"
[ bench "simple" $ nf scoreWord "cashregister"
, bench "hashmap" $ nf scoreWord1 "cashregister"
]
]
Results:
warming up
estimating clock resolution...
mean is 1.892643 us (320001 iterations)
found 58769 outliers among 319999 samples (18.4%)
54086 (16.9%) low severe
4683 (1.5%) high severe
estimating cost of a clock call...
mean is 50.56212 ns (12 iterations)
found 1 outliers among 12 samples (8.3%)
1 (8.3%) high mild
benchmarking anagram/simple
mean: 5.985313 us, lb 5.973008 us, ub 6.000866 us, ci 0.950
std dev: 70.55760 ns, lb 57.56464 ns, ub 89.65218 ns, ci 0.950
benchmarking anagram/hashmap
mean: 8.926868 us, lb 8.689560 us, ub 9.300685 us, ci 0.950
std dev: 1.494973 us, lb 1.058908 us, ub 1.992894 us, ci 0.950
found 15 outliers among 100 samples (15.0%)
3 (3.0%) high mild
12 (12.0%) high severe
variance introduced by outliers: 91.528%
variance is severely inflated by outliers
benchmarking scrabble/simple
mean: 1.323841 us, lb 1.321476 us, ub 1.326522 us, ci 0.950
std dev: 12.91355 ns, lb 11.27313 ns, ub 15.02281 ns, ci 0.950
benchmarking scrabble/hashmap
mean: 1.276473 us, lb 1.254422 us, ub 1.320440 us, ci 0.950
std dev: 151.2879 ns, lb 81.85053 ns, ub 237.6315 ns, ci 0.950
found 11 outliers among 100 samples (11.0%)
11 (11.0%) high severe
variance introduced by outliers: 84.196%
variance is severely inflated by outliers
Interesing enough that ghc-7.8 results are code slighly worse.
comments powered by Disqus