Qnikst blog - Marrying Haskell and Hyper-Threading (Post Scriptum)

After writing the previous blog post, I got some interesting feedback from our work chat and r/haskell. I want to highlight some of it explicitly. The order of the feedback is arbitrary.

The first topic is explicit pinning of capabilities to cores. This is possible with the +RTS -qa flag, as nh2 mentioned on Reddit. As I noted in the previous blog post, my approach will not work correctly with this option (for some reason I wrote -xm instead of -qa in that post, I’m sorry), and I would need to redefine more functions. In general, though, pinning capabilities to cores may work on all possible CPU layouts. I have not looked deeply into this because in most of our cases the -qa flag gave me worse performance, so your program needs some special properties to benefit from hard pinning. I think it’s possible to use /proc/cpuinfo to make a best effort when pinning capabilities.

The entire thread is very entertaining, and if you are interested in the topic I recommend checking out the other comments as well.
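
As a minimal sketch of the /proc/cpuinfo idea mentioned above (this is not the code from wrapper.c): on x86 Linux every logical CPU entry lists a `physical id` and a `core id`, and hyper-thread siblings share both values, so counting the distinct pairs gives the number of physical cores. The function name `count_physical_cores`, the helper `remember`, and the `MAX_CORES` limit are my own assumptions for illustration. On architectures whose /proc/cpuinfo lacks these fields the function returns 0, and the caller would have to fall back to something else.

```c
#include <stdio.h>

#define MAX_CORES 1024  /* hypothetical upper bound, chosen for this sketch */

/* Store the (package, core) pair unless it has been seen already. */
static void remember(int pkgs[], int cores[], int *n, int pkg, int core) {
    for (int i = 0; i < *n; i++)
        if (pkgs[i] == pkg && cores[i] == core)
            return;
    if (*n < MAX_CORES) {
        pkgs[*n] = pkg;
        cores[*n] = core;
        (*n)++;
    }
}

/* Count distinct (physical id, core id) pairs in a /proc/cpuinfo-style
 * stream. Entries are separated by blank lines. Returns 0 when the
 * fields are absent. */
int count_physical_cores(FILE *f) {
    int pkgs[MAX_CORES], cores[MAX_CORES];
    int n = 0, pkg = -1, core = -1, v;
    char line[4096];

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "physical id : %d", &v) == 1) {
            pkg = v;
        } else if (sscanf(line, "core id : %d", &v) == 1) {
            core = v;
        } else if (line[0] == '\n') {
            /* A blank line ends the current logical CPU entry. */
            if (pkg >= 0 && core >= 0)
                remember(pkgs, cores, &n, pkg, core);
            pkg = core = -1;
        }
    }
    /* Flush the last entry if the stream did not end with a blank line. */
    if (pkg >= 0 && core >= 0)
        remember(pkgs, cores, &n, pkg, core);
    return n;
}
```

One could open /proc/cpuinfo with `fopen`, pass the stream to this function, and use the result either as the -N value or as input for deciding which logical CPUs the capabilities get pinned to.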

Secondly, there was a question of whether my reasoning was incorrect and whether it is enough to leave one thread off and still get better performance. We have used this approach in some projects; however, for one particular case the results with N-1 threads were very depressing:

        Cumulative quantiles per tag (N7)
            99%     98%     95%     90%     85%     80%     75%     50%
Overall  4600ms  4380ms  3980ms  3540ms  3400ms  3280ms  3210ms  1105ms
get      4600ms  4390ms  3980ms  3550ms  3410ms  3290ms  3210ms  1145ms
put      4600ms  4380ms  3980ms  3540ms  3400ms  3280ms  3210ms  1100ms

        Cumulative quantiles per tag (N4)
            99%     98%     95%     90%     85%     80%     75%     50%
Overall   139ms   105ms    37ms    17ms    12ms     8ms     6ms     2ms
get       139ms   104ms    37ms    18ms    12ms     9ms     7ms     2ms
put       139ms   105ms    37ms    17ms    12ms     8ms     6ms     2ms

The response times differ by 1 to 3 orders of magnitude (roughly 33x at the 99th percentile and roughly 550x at the median). Without going deeper, I have decided to stick with -N4 for now.

Third, @TerrorJack advised me to improve the teardown procedure in wrapper.c, as it should check whether the RTS was stopped and report its status. So I rechecked the sources and introduced a few updates that report the status of the running Haskell command (the same way the RTS does) and that do not require the FFI extension in the Haskell code.
