
Haskell version should be more idiomatic, use proper tools & threading, use benchmarking #26

Merged: 13 commits into atemerev:master, Feb 15, 2016

Conversation

bitemyapp
Contributor

Fixes #17, #24, #25

@bitemyapp
Contributor Author

New version (Based on @bartavelle's solution) takes 51ms instead of 2.5 seconds on my machine.

@bitemyapp
Contributor Author

@atemerev is there something blocking this PR?

@codygman

@atemerev I'm looking forward to seeing this Haskell version in the benchmark, if there's anything blocking this PR I can help as well!

@bitemyapp
Contributor Author

I added more results from running these on Ubuntu and tried to get at least basic intervals for the non-Haskell implementations (I don't know of criterion equivalents for the others), ideally so people can get a sense of the upper and lower bounds.

@ChristopherKing42
Contributor

The current version of Parallel.hs is wrong. It outputs 500000500000 instead of 499999500000. This is fixed in https://github.com/bitemyapp/skynet/pull/1.

@Gabriella439

I like this version because it shows the performance trade-offs at various points on the design spectrum.

@Rydgel

Rydgel commented Feb 15, 2016

I really want to see this merged as well.

cabal-version: >=1.10

executable skynet
ghc-options: -O2 -threaded -rtsopts

💥

@ghost

ghost commented Feb 15, 2016

+1. Go gets an unfair advantage in the current master, since it's using channels while Haskell simply uses waits. The proper comparison should be between languages using the same paradigm (and since Haskell can use multiple paradigms, that offers it an advantage here).
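To make the paradigm difference concrete, here is a minimal sketch (my own illustration, not code from this repo) of what a channel-based Haskell version might look like, using Control.Concurrent.Chan so it mirrors the Go goroutine-and-channel approach:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Each "agent" either reports its number (leaf) or spawns 10
-- children on a fresh channel and sums their replies.
skynet :: Chan Int -> Int -> Int -> IO ()
skynet out num size
  | size == 1 = writeChan out num
  | otherwise = do
      c <- newChan
      let childSize = size `div` 10
      mapM_ (\i -> forkIO (skynet c (num + i * childSize) childSize)) [0 .. 9]
      total <- sum <$> sequence (replicate 10 (readChan c))
      writeChan out total

main :: IO ()
main = do
  c <- newChan
  _ <- forkIO (skynet c 0 1000000)
  readChan c >>= print
```

Built with -O2 -threaded and run with +RTS -N, every level's children report back over a fresh channel, just as the Go version does; the MVar/wait-based version skips that bookkeeping.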

version: 0.1.0.0
synopsis: Simple project template from stack
description: Please see README.md
homepage: http://github.com/bitemyapp/skynet#readme

Shouldn't this be @atemerev's repo? (I'm not sure what the conventions for GitHub and cabal are.)


All of these fields are optional and can be removed. Only the name field is mandatory IIRC

Contributor Author


It's from the stack template, an oversight. I can remove it if it's an issue. I didn't expect it would get uploaded to Hackage so shouldn't have any relevance.

@ChristopherKing42
Contributor

@wizzard0 is there anything else that needs to be done with this pull request?

atemerev added a commit that referenced this pull request Feb 15, 2016
Haskell version should be more idiomatic, use proper tools & threading, use benchmarking
@atemerev atemerev merged commit 82f318a into atemerev:master Feb 15, 2016
@bitemyapp
Contributor Author

@atemerev Thank you :)

@oblitum

oblitum commented Feb 16, 2016

The chan version of this is returning a wrong result.

After analysing the parallel version, I'm contemplating whether a C++ pure compile-time version would be doable and fair.

@oblitum

oblitum commented Feb 16, 2016

OK, this is essentially the same algorithm used in the new Haskell "parallel" version that is being unfairly compared to all the other, non-parallel schemes; it's just ported to C++ and, hey, it's a compile-time version, isn't that also fair? I'm not bothering to construct a PR for this, except in case the owner is interested in merging it for a side-by-side comparison, just like Haskell is doing, because, hey, it's an idiomatic language feature! Why not 😄

#include <chrono>
#include <cstdint>
#include <iostream>

constexpr int64_t skynet(int64_t levels, int64_t children, int64_t position = 0) {
    if (levels == 0)
        return position;
    int64_t sum = 0;
    for(int64_t i = 0; i < children; ++i)
        sum += skynet(levels - 1, children, position*children + i);
    return sum;
}

int main() {
    using namespace std;
    using namespace std::chrono;

    auto start = high_resolution_clock::now();
    constexpr auto result = skynet(6, 10);  // remove constexpr to eval at runtime
    auto elapsed = high_resolution_clock::now() - start;

    cout << "Result: " << result << '\n';
    cout << "time    " << duration_cast<milliseconds>(elapsed).count() << " ms" << endl;
}

With clang, you should compile this with:

  • clang++ -fconstexpr-steps=10000000 -std=c++14 -O3 skynet.cpp -o skynet

with gcc, do this:

  • g++ -std=c++14 -O3 skynet.cpp -o skynet

Clang is smarter and compiles this in quite a short time compared to GCC; compilation time with clang should be shorter than what Haskell/stack takes to compile its stuff.

Time to run: 0 ms

Switching to runtime version is easy, as I've commented in the code:

Time for runtime version: 10 ms

In any case, this one has evolved like the Haskell version: it has deprecated not only concurrency/channels/CSP/actors/whatever, but parallelism as well! As they say: "less is exponentially more".

@oblitum

oblitum commented Feb 16, 2016

cc #2, #11.

@bartavelle

What makes you think the Haskell version is not parallel, or is not computing its result at runtime? The version I wrote, which is very close to what has been merged, does both!

@bartavelle

Here is a profiling run without the benchmarking:

        Tue Feb 16 08:29 2016 Time and Allocation Profiling Report  (Final)

           skynet +RTS -p -RTS

        total time  =        0.12 secs   (125 ticks @ 1000 us, 1 processor)
        total alloc = 182,304,880 bytes  (excludes profiling overheads)

COST CENTRE  MODULE   %time %alloc

skynet.sky   Parallel  57.6   70.7
skynet.sky.\ Parallel  42.4   29.3


                                                               individual     inherited
COST CENTRE          MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN                 MAIN                     60           0    0.0    0.0   100.0  100.0
 run                 Parallel                122           0    0.0    0.0     0.0    0.0
  skynet             Parallel                123           1    0.0    0.0     0.0    0.0
 CAF:main1           Main                    118           0    0.0    0.0     0.0    0.0
  main               Main                    120           1    0.0    0.0     0.0    0.0
 CAF:run1            Parallel                110           0    0.0    0.0     0.0    0.0
  run                Parallel                121           1    0.0    0.0     0.0    0.0
 CAF:run4            Parallel                109           0    0.0    0.0   100.0  100.0
  run                Parallel                124           0    0.0    0.0   100.0  100.0
   skynet            Parallel                125           0    0.0    0.0   100.0  100.0
    skynet.sky       Parallel                126     1111111   57.6   70.7   100.0  100.0
     skynet.sky.\    Parallel                130     1111110   42.4   29.3    42.4   29.3
 CAF:childnums_r4kW  Parallel                108           0    0.0    0.0     0.0    0.0
  run                Parallel                127           0    0.0    0.0     0.0    0.0
   skynet            Parallel                128           0    0.0    0.0     0.0    0.0
    skynet.childnums Parallel                129           1    0.0    0.0     0.0    0.0
 CAF:run2            Parallel                107           0    0.0    0.0     0.0    0.0
  run                Parallel                132           0    0.0    0.0     0.0    0.0
 CAF:run3            Parallel                106           0    0.0    0.0     0.0    0.0
  run                Parallel                131           0    0.0    0.0     0.0    0.0
 CAF                 Data.Time.Clock.UTC     105           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Handle.FD        100           0    0.0    0.0     0.0    0.0
 CAF                 GHC.Event.Thread         98           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Encoding          95           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Handle.Text       92           0    0.0    0.0     0.0    0.0
 CAF                 GHC.Conc.Signal          88           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Encoding.Iconv    85           0    0.0    0.0     0.0    0.0

You can see that the sky function is executed the correct number of times. Also:

$ time .stack-work/install/x86_64-linux/lts-5.1/7.10.3/bin/skynet +RTS -N8 -p
Result: 499999500000 in 0.128098s
 +RTS -N8 -p  0,93s user 0,05s system 735% cpu 0,134 total

It is using all cores on my laptop.
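For readers following along: the merged Parallel.hs is never reproduced in this thread. Judging from the cost centres in the profile above (skynet.sky, skynet.childnums), its shape is roughly like the following sketch built on the parallel package's evaluation strategies. This is a reconstruction under those assumptions, not the actual merged code:

```haskell
import Control.Parallel.Strategies (parList, rseq, using)

-- Pure recursion over the tree; parList rseq sparks the evaluation
-- of the ten subtree sums at each level so the runtime can steal
-- them across capabilities.
skynet :: Int -> Int -> Int
skynet levels children = sky levels 0
  where
    childnums = [0 .. children - 1]
    sky 0 position = position
    sky l position =
      sum ([ sky (l - 1) (position * children + cn) | cn <- childnums ]
             `using` parList rseq)

main :: IO ()
main = print (skynet 6 10)
```

Built with ghc -O2 -threaded (plus the parallel package) and run with +RTS -N, the subtrees are evaluated by work-stealing sparks rather than communicated over channels, which is exactly the paradigm difference being debated here.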

@oblitum

oblitum commented Feb 16, 2016

@bartavelle I didn't state it's not parallel; I stated it is, meaning that this turn in the implementation has dramatically changed what's being compared, whatever this bench is trying to compare. Comparing Go using channels and goroutines with this Haskell version, which is just trivial parallel code without any channels, makes as much sense as comparing the Haskell parallel version to the C++ compile-time/runtime version (I can't even say to what extent the compiler would unroll or vectorize the linear C++ runtime code, effectively making it parallel to some degree). Despite this, maybe it's still possible to shoehorn something like Boost.Coroutine2 into it, just for the sake of claiming to use coroutines.

@bartavelle

Ah sorry, I didn't get your point. I think you are right, but this is a benchmark game ;)

@oblitum

oblitum commented Feb 16, 2016

@bartavelle so, is there anyone left to beat 0? ;-)

@bitemyapp
Contributor Author

@oblitum I already rejected a PR to calculate it at compile-time in Haskell on my fork. There's your "0". I wouldn't accept anything not parallel/concurrent either.

@oblitum

oblitum commented Feb 16, 2016

@bitemyapp you mean compile-time?

@bitemyapp
Contributor Author

@oblitum It's 0142 in my timezone and not far off in yours. Slow down son.

@oblitum

oblitum commented Feb 16, 2016

@bitemyapp I'm not into this bench anymore; I'd just like to point out that after this pull the README has been left inconsistent/misleading. The implementation has been massaged for parallelism, and numbers were published for it, while the claim that it is "coroutine/channel"-based has been left untouched. And, as said before, the internal chan version is still returning wrong results.

@ChristopherKing42
Contributor

@oblitum See #54

@jb55

jb55 commented Feb 17, 2016

👎 How is this in the spirit of the competition at all?

The main benchmark should be changed to use concurrency (forkIO), not tight-loop parallelism.

@Gabriella439

So I think the number to report should be the one from the version using concurrency (i.e. forkIO), but I believe the other versions should still be retained to show better ways to solve the same problem.

@oblitum

oblitum commented May 29, 2016

@ghost

ghost commented May 29, 2016

@oblitum I disagree.

Just to be clear, Haskell lists are not iterators. Consider the following sequential Haskell version:

main :: IO ()
main = print $ sum [0..999999]

$ ghc -O2 slow.hs
$ time ./slow
499999500000

real 0m0.040s
user 0m0.038s
sys  0m0.004s

Let's compare that with a sequential Rust version:

fn main() {
    let sum: i64 = (0..1000000).fold(0, |sum, x| sum+x);
    println!("{}",sum);
}

$ rustc -O test.rs
$ time ./test
499999500000

real 0m0.003s
user 0m0.002s
sys  0m0.001s

You might be tempted to think that Haskell is slower. But it's not! Lists are not loops. Consider a fairer comparison:

summation :: Int -> Int -> Int
summation accum end
    | end == 0  = accum
    | otherwise = summation (accum+end) (end-1)

main :: IO ()
main = print $ summation 0 999999

$ ghc -O2 fast.hs
$ time ./fast
499999500000

real 0m0.003s
user 0m0.002s
sys  0m0.001s

The sequential Haskell version is pretty close to the sequential Rust version, despite the fact that the Haskell version also needed to spawn a garbage collector. You might get slightly better performance in a language without a garbage collector, such as Rust, but it's not going to be a huge difference, provided you properly optimize the code.

I do agree that rpar does work stealing, and it is slightly unfair to compare it to other languages which are not doing the same thing. However, I don't think the Rust version you wrote is equivalent, because it's using iterators instead of lists.

@oblitum

oblitum commented May 29, 2016

@siddhanathan 😄 thanks for your analysis. Sorry for not being fair with Haskell now, I guess we are even then?

To tell the truth, I just cared to use Haskell's parallel version as-is because it's what is present here in the repo; whether lists are not loops, are lazily evaluated, etc., seems a bit far from the rpar topic, and a Haskell-specific optimization.

IIRC, Haskell offers mechanisms for non-lazily-evaluated lists, so I wonder whether just using one of those would lead to the same improvement you got. Even then it would still be different from Rust's iterator-over-range version, which should be reduced to a loop automatically by the compiler (I'm unsure when using Rayon, but it's a common expectation of the Rust compiler). Anyway, in Rust I'm still working with the high-level concept of a list (a range and an iterator are not conceptually the same as a loop) while knowing the compiler is smart enough to turn it into a loop in machine code.

@ghost

ghost commented May 30, 2016

@oblitum I doubt the strictness (strict vs lazy) or type of data structure (linked list vs arrays) would yield that sort of speedup. There's stream fusion, but that's a whole new topic.
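As an illustrative aside (a sketch added for clarity, not code from this thread): with -O2, GHC's foldr/build fusion can often turn a strict left fold over an enumeration into a tight accumulator loop, which is the effect stream fusion generalizes:

```haskell
import Data.List (foldl')

-- foldl' is strict in the accumulator; with -O2 GHC typically fuses
-- the [0 .. 999999] enumeration away, leaving a plain counting loop
-- with no intermediate list allocated.
main :: IO ()
main = print (foldl' (+) 0 [0 .. 999999 :: Int])
```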

I played around with the MVar code a little, and managed to reduce the time it spent in garbage collection:

{-# LANGUAGE BangPatterns #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.Time.Clock (getCurrentTime, diffUTCTime)

loop :: Int -> Int -> Int -> Int -> Int -> IO Int
loop !accum num size div !i
    | i == 0 = return accum
    | otherwise = do
        c <- newEmptyMVar
        forkIO (skynet c subNum sizeDiv div)
        s <- takeMVar c
        loop (accum+s) num size div (i-1)
      where
        !subNum = num + (i-1) * sizeDiv
        !sizeDiv = size `quot` div

skynet :: MVar Int -> Int -> Int -> Int -> IO ()
skynet c num size div
    | size == 1 = putMVar c num
    | otherwise = do
        result <- loop 0 num size div div
        putMVar c result

main :: IO ()
main = do
    start <- getCurrentTime
    c <- newEmptyMVar
    forkIO (skynet c 0 1000000 10)
    result <- takeMVar c
    end <- getCurrentTime
    putStrLn $ concat [ "Result: "
                      , show result
                      , " in "
                      , show (diffUTCTime end start) ]

Definitely not the prettiest code, but it yields performance similar to the Go version:

$ go run skynet.go 
Result: 499999500000 in 1221 ms.
$ ghc -O2 -threaded -rtsopts mvar.hs
$ ./mvar +RTS -N4
Result: 499999500000 in 1.343052s

It's still spending way too much time in garbage collection. I'm sure the numbers can definitely improve further.

As @jb55 mentioned earlier, perhaps it's best to replace the numbers for the current Haskell benchmark with these numbers.

@oblitum

oblitum commented May 30, 2016

@siddhanathan Yes, de facto. For example, the following Rust version, which creates a vector instead of using a range, is an order of magnitude slower, on par with the current Haskell version:

extern crate time;
extern crate rayon;

use time::PreciseTime;
use rayon::prelude::*;

type T = usize;

fn skynet(levels: T, children: T) -> T {
    fn sky(levels: T, children: T, position: T) -> T {
        let childnums: Vec<_> = (0..children).collect();
        match levels {
            0 => position,
            _ => childnums.par_iter()
                 .map(|cn| sky(levels - 1, children, position * children + cn)).sum()
        }
    }
    sky(levels, children, 0)
}

fn main() {
    let start = PreciseTime::now();
    let result = skynet(6, 10);
    let end = PreciseTime::now();
    println!("Result: {} in {} ms", result, start.to(end).num_milliseconds());
}

I'd like a short way of creating a stack-allocated, initialized array instead of using a vector; still, I guess it wouldn't change things much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants