If conditions for logic? #151

polarathene · 2017-06-14T06:44:32Z

There are methods such as all_true, iszero, etc. that return an array with a single element of 0 or 1. We should not transfer that to the host just to do a true/false comparison, right? Should I instead create a constant array with [1,1,1,1] dims containing a single 0 (and another containing a single 1) for these comparisons? Would that keep the conditional in the JIT kernel instead of briefly switching to the CPU?

In my program I call eq(), and if any of the resulting values are 1/true I want to do something with them, such as store the value in an array (row/col extraction with a join to a results array?), or, once I have the number of results I want, stop and transfer the results to the host.

I'm not sure what the C++ API looks like for this. I think its conditionals look like native logic, so perhaps the Rust wrapper could benefit from these two boolean array primitives? Maybe this already exists but is just hard to discover? I could add usage examples to the docs for the methods I mentioned that return these results.

    let results = eq(&array_af, &match_value, true);
    let is_match = any_true(&results, 0);

    // No primitive for this, so do I need to make one?
    let true_af = constant(1, Dim4::new(&[1, 1, 1, 1]));

    // Will fail, so perhaps that idea was pointless :P
    if &is_match == &true_af {
        println!("Success!"); // Transfer data to host
    }

    // Can the JIT not handle the above as a boolean if statement? Should it take the array and a closure?
    // Or could it report back a boolean result somehow? (Not sure whether that would still work with the JIT.)
    if is_match.is_true() { // hypothetical method
        println!("Success!"); // Transfer data to host
    }

Answered by 9prady9 · Jun 16, 2017

ghost · 2017-06-14T14:55:23Z

Not sure I have really understood your problem, but sum_all may help you.

polarathene (Author) · 2017-06-14T15:01:43Z

@UltimaRatio I wanted to use an if statement to decide what to do based on the result of a method that returns an array whose only value is 0 or 1 (all_true(), any_true(), iszero(), etc.). Since the result was an array, I could not use it in an if condition against true/false; the same goes for other methods such as count_zeroes().

For now I've solved it with locate(): if its elements() count is non-zero, the result is true. sum_all() might work too :)
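
To make the comparison concrete without depending on the arrayfire crate, here is a minimal plain-Rust sketch over a host-side `Vec<bool>` standing in for the eq() result. `locate_like` and `sum_all_like` are hypothetical helpers that only mimic the semantics of locate() and sum_all(); they are not the wrapper's API. All three checks give the same answer, but the locate-style check builds a whole index list just to test whether it is empty.

```rust
/// Hypothetical analogue of locate(): collects the index of every true element.
fn locate_like(data: &[bool]) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter_map(|(i, &v)| if v { Some(i) } else { None })
        .collect()
}

/// Hypothetical analogue of sum_all() on a boolean array: counts the true elements.
fn sum_all_like(data: &[bool]) -> usize {
    data.iter().filter(|&&v| v).count()
}

fn main() {
    // Stand-in for the element-wise eq() result.
    let results = vec![false, false, true, false];

    let via_locate = !locate_like(&results).is_empty(); // allocates an index list
    let via_sum = sum_all_like(&results) > 0; // visits every element
    let via_any = results.iter().any(|&v| v); // stops at the first true value

    assert!(via_locate == via_sum && via_sum == via_any);
    println!("match found: {}", via_any);
}
```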

9prady9 (Maintainer) · 2017-06-16T04:47:55Z

@UltimaRatio @polarathene

In ArrayFire, there are always two versions of reduction-based functions:

1. Reduce along a specific dimension: functions such as all_true, any_true, etc. Note that these functions take an additional integer argument that specifies the dimension along which the reduction takes place.
2. Reduce the whole array: functions such as all_true_all, any_true_all, etc.

When you reduce the whole array, i.e. case (2), you can use the result directly in a conditional statement, because such a function returns a native type. For case (1), however, there is no guarantee that the output array will always have only one element, hence the Array return type.

In any case, I believe what you are looking for in the code snippet you shared above is any_true_all.

An Array object can't be used as the condition of a conditional statement; that holds true even in the ArrayFire C++ API.

The general rule of thumb is NOT to use indexing (index_gen, lookup, locate, etc.) to access individual elements from GPU memory. These are very expensive operations and should be avoided.
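
The difference between the two versions can be sketched in plain Rust, without the arrayfire crate. The matrix below is stored column-major, as ArrayFire stores its data; `any_true_dim0` and `any_true_all_like` are hypothetical stand-ins that only mimic the semantics, not the wrapper's functions. Reducing along dimension 0 leaves one value per column, so the result is still an array, while reducing the whole array yields a single native `bool` that can drive an `if` directly.

```rust
/// A minimal column-major boolean matrix: element (r, c) lives at index c * rows + r.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<bool>,
}

/// Hypothetical analogue of a per-dimension reduction along dimension 0:
/// one "is any element true?" answer per column, so the output is still an array.
fn any_true_dim0(m: &Matrix) -> Vec<bool> {
    (0..m.cols)
        .map(|c| m.data[c * m.rows..(c + 1) * m.rows].iter().any(|&v| v))
        .collect()
}

/// Hypothetical analogue of a whole-array reduction: a single native bool.
fn any_true_all_like(m: &Matrix) -> bool {
    m.data.iter().any(|&v| v)
}

fn main() {
    // 3 rows x 2 columns; only column 1 contains a true value.
    let m = Matrix {
        rows: 3,
        cols: 2,
        data: vec![false, false, false, false, true, false],
    };

    println!("along dim 0: {:?}", any_true_dim0(&m)); // prints "along dim 0: [false, true]"

    // Only the whole-array result can be used as the condition.
    if any_true_all_like(&m) {
        println!("at least one element is true");
    }
}
```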

polarathene (Author) · 2017-06-16T05:14:18Z

@9prady9 Thanks for that. Any reason it returns f64? It only seems to return 0/1, so shouldn't the tuple be (bool, bool)? I wasn't aware that the dimension-specific methods could return multiple values; in my code test I only had one element as the result of any_true. I thought the 0/1 value was the true/false answer to whether any of the elements along that dimension were true; I guess we can only get that across all dimensions.

I was passing in the result of eq(), whose elements were already true/false values, so if the reduction just returns the same result as my eq(), that seems a bit odd. I guess there is a use for it behaving like that which I'm not aware of.
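
Assuming the eq() output is an N x 1 column (later in the thread the data is described as millions of u64 values stored as row elements), the output shape of a reduction may explain both observations. The helper below is a hypothetical sketch of the shape rule only, not the wrapper's API: reducing along a dimension collapses that dimension to 1 and leaves the others untouched.

```rust
/// Hypothetical helper: output dims of a reduction along `dim`, in Dim4-style [d0, d1, d2, d3] order.
fn reduced_dims(dims: [u64; 4], dim: usize) -> [u64; 4] {
    let mut out = dims;
    out[dim] = 1; // the reduced dimension collapses to a single element
    out
}

fn main() {
    let n: u64 = 8; // stands in for the millions of rows in the real data
    let eq_output = [n, 1, 1, 1];

    // Along dim 0 the N rows collapse to one value: a single 0/1 element.
    assert_eq!(reduced_dims(eq_output, 0), [1, 1, 1, 1]);
    // Along dim 1 there is only one column, so every row reduces to itself
    // and the shape matches the eq() output.
    assert_eq!(reduced_dims(eq_output, 1), [n, 1, 1, 1]);

    println!("{:?} -> {:?}", eq_output, reduced_dims(eq_output, 0));
}
```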

> General thumb rule is NOT to use indexing(index_gen, lookup, locate etc. ) to access individual elements from GPU memory. They are very expensive operations and should be avoided.

That's good to be aware of; a list of these (I guess set_intersect is another) would be good to know. I'm going to submit the code for my current project in an issue on the upstream repo soonish, as under a certain workload ArrayFire causes a whine/squeal (not that high-pitched) from my machine. The two operations that seem to cause it are locate and some heavy (at least compared to everything else in my program) arithmetic, including bitshift and xor.

On larger arrays the noise is not present. On smaller ones, which I guess are processed at a faster rate (or allocated/deallocated more often on the GPU?), the two function calls add to the noise, locate being the louder one. It is not my fan or CPU, and the GPU wasn't under heavy load or using much memory for that workload, so it must have something to do with those methods being called more frequently.

9prady9 (Maintainer) · 2017-06-16T05:33:17Z

@polarathene That's true, the Rust API can be improved in that respect. I will try to fix this in the Rust wrapper's 3.4.4 crate release.

This was initially written with f64 to follow the C API, which looks like af_sum_all(double *real, double *imag, af_array arr) rather than af_sum_all(void* out, af_array arr). Upstream will fix this in the next major release of ArrayFire, as it involves an API change.
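
The API shape being discussed can be illustrated with a plain-Rust sketch (no arrayfire crate involved). Both functions are hypothetical: `any_true_all_f64` mirrors a C-style result that always comes back as a (real, imaginary) pair of f64 values, as described above, while `any_true_all_bool` shows the typed alternative suggested in this thread.

```rust
/// Hypothetical C-style shape: the result is always a (real, imaginary) f64 pair,
/// whatever the element type of the reduced array.
fn any_true_all_f64(data: &[bool]) -> (f64, f64) {
    let any = data.iter().any(|&v| v);
    (if any { 1.0 } else { 0.0 }, 0.0)
}

/// Hypothetical typed shape: a boolean reduction returns a bool directly.
fn any_true_all_bool(data: &[bool]) -> bool {
    data.iter().any(|&v| v)
}

fn main() {
    let results = [false, true, false];

    // With the f64 pair, the caller compares against 0.0 and ignores the imaginary part...
    let (real, _imag) = any_true_all_f64(&results);
    if real > 0.0 {
        println!("match (f64 pair)");
    }

    // ...while a typed result can be used as the condition as-is.
    if any_true_all_bool(&results) {
        println!("match (bool)");
    }
}
```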

9prady9 (Maintainer) · 2017-06-16T10:23:38Z

@polarathene I have created a separate issue for this change; please follow its progress there: #154

polarathene (Author) · 2017-06-20T04:27:31Z

@9prady9 I've replaced locate with any_true_all like so:

    let (is_match, _) = af::any_true_all(&results);

    if is_match > 0.0 {
        println!("Success! {:?}", is_match);
    }

I've noticed that locate added about 20 seconds on my test data, in a program that otherwise takes 20 seconds without the is_match check, for a total of 40 seconds. any_true_all helps, but it still seems quite expensive, adding 12 seconds.

Any idea how to reduce this? The eq() operation seems to be fast, but having some way to know whether I got a true result for a conditional is expensive (due to the device-to-host transfer, maybe?). This runs 26^8 permutations through an algorithm, checking each time for a true value (the test data contains only one). I tile an array of permutations and modify it so that I only run the algorithm 1,352 times, each on a different portion of that permutation set.
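
The workload figures in this thread are consistent with one another, which a quick check in Rust confirms: 26^8 permutations split over 1,352 runs (two chunks of 676, per the logs later in the thread) gives exactly 154,457,888 elements per array, and half of 26^8 per chunk. Nothing here touches ArrayFire; it is only the arithmetic.

```rust
fn main() {
    // 26^8 permutations, as stated above.
    let total: u64 = 26u64.pow(8);
    // 1,352 runs of the algorithm: two chunks of 676 permutations each.
    let runs: u64 = 2 * 676;
    let per_run = total / runs;

    assert_eq!(total, 208_827_064_576);
    assert_eq!(total % runs, 0); // the permutation set splits evenly across runs
    assert_eq!(per_run, 154_457_888); // u64 elements per array passed to compute_fn()
    assert_eq!(total / 2, 104_413_532_288); // u64 elements per chunk

    println!("{} runs x {} elements = {} permutations", runs, per_run, total);
}
```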

Is there a better way to handle this, perhaps on the GPU, to know when to stop and transfer? The current approach sends minimal data from device to host for the conditional, but still seems quite expensive to run.

9prady9 (Maintainer) · 2017-06-20T05:35:14Z

Can you please share the code? A smaller code snippet that reproduces the same behaviour on our side would be even more helpful.

polarathene (Author) · 2017-06-20T09:00:40Z

@9prady9 Yeah, I intend to as soon as the code is in a more organized/digestible state :) I'm still learning Rust, and moving the code into a struct with smaller functions has been a frustrating process due to the borrow checker.

Hopefully I'll put the project on GitHub in the next few days, where you can run it yourself and see this issue, and possibly the other one I've experienced with noise (coil whine, I think) at a specific array size. I think the CUDA backend doesn't work at certain sizes that work fine with OpenCL; I've noticed the related issue and PR upstream, which might stop the CUDA backend from crashing with larger array sizes.

I would provide a smaller snippet, but I'm not sure how to go about that: this specific issue with conditional logic needs enough work to keep processing for a while, like my current test data, which takes 20 seconds. The only addition is passing the array to locate() or any_true_all() after each computation; the array holds millions of u64 values as row elements only. Unless the eq() statement is optimized out by the JIT when the locate()/any_true_all() call is commented out. I'm getting the timing from my terminal, which reports how long the process took to run until it exited.

9prady9 (Maintainer) · 2017-06-20T09:20:04Z

@polarathene A lot of fixes have been going into devel, and we will soon push for the v3.5 release; I will update the Rust wrapper the week after v3.5 is released. Hopefully most of the errors you see now will be fixed by then.

I will try to add a timing mechanism to the Rust wrapper if possible. I don't think timing the entire program's run is a good representation of your algorithm's average run time.

polarathene (Author) · 2017-06-20T10:08:58Z

@9prady9 No, it's not the best way :) I just know that timing the ArrayFire logic isn't very reliable, right? Plus, it isn't clear to me when the first JIT compilation is done; all I know, from what's been said here, is that things are slower until it finishes. In the results below, either it has been cached or the JIT part takes so little time that it doesn't matter.

This is my usual timing macro:

    macro_rules! before_after {
        ($label:expr, $($thing:tt)*) => {
            // Fully qualified path, so callers don't need to import Instant themselves.
            let start = ::std::time::Instant::now();

            $($thing)*

            // Read the clock once so seconds and milliseconds come from the same measurement.
            let elapsed = start.elapsed();
            println!("{} took: {} sec, {} ms", $label, elapsed.as_secs(), elapsed.subsec_nanos() as f64 / 1_000_000f64);
        };
    }

    // Usage (inside a method of the permutator struct)
    before_after!("test",
        for _ in 0..total_permutations {
            self.next_permutation(&mut permutations_af, batch_cols); // permutes the af::Array
            self.compute_fn(&permutations_af, length_in_bytes); // processes and optionally does the conditional check
        }
    );

Results:
Each chunk processed 104_413_532_288 u64 rows/elements, i.e. 154_457_888 per array passed to compute_fn().

With any_true_all() enabled:

    ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 8e5a00d0)
    [0] NVIDIA  : GeForce GTX 1070, 8105 MB

    length_in_bytes: 8
    permutations: 676
    test took: 14 sec, 709.831802 ms
    finished a chunk

    length_in_bytes: 8
    permutations: 676
    Success! 1
    test took: 14 sec, 723.172401 ms
    finished a chunk

    done

    /storage/projects/rust/af_permutator 30s

With any_true_all() and the if condition that uses it commented out (the function still contains some constant() and eq() calls, but without any_true_all() it performs the same as commenting out the whole function call):

    ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 8e5a00d0)
    [0] NVIDIA  : GeForce GTX 1070, 8105 MB

    length_in_bytes: 8
    permutations: 676
    test took: 8 sec, 528.203455 ms
    finished a chunk

    length_in_bytes: 8
    permutations: 676
    test took: 0 sec, 135.180271 ms
    finished a chunk

    done

    /storage/projects/rust/af_permutator 18s

The second chunk's time is obviously fishy. Adding af::sync(0) before the println!() in the macro gives more accurate timing:

    ArrayFire v3.4.2 (OpenCL, 64-bit Linux, build 8e5a00d0)
    [0] NVIDIA  : GeForce GTX 1070, 8105 MB

    length_in_bytes: 8
    permutations: 676
    test took: 8 sec, 569.368971 ms
    finished a chunk

    length_in_bytes: 8
    permutations: 676
    test took: 8 sec, 520.276805 ms
    finished a chunk

    done

    /storage/projects/rust/af_permutator 18s

The results with the any_true_all() condition are the same with this macro as shown earlier without af::sync(0). This seems accurate enough, but timing may get more difficult when multiple devices are in play.
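
Why the second chunk looked so fast without af::sync(0) can be mimicked with plain Rust threads. This is an analogy, not ArrayFire code: `enqueue_work` is a hypothetical stand-in for asynchronously queued device work, and `join()` plays the role of af::sync(0). Reading the clock before synchronizing captures only the cost of queuing the work.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for asynchronously queued device work: the call returns
/// immediately and the work finishes on another thread.
fn enqueue_work(ms: u64) -> thread::JoinHandle<()> {
    thread::spawn(move || thread::sleep(Duration::from_millis(ms)))
}

fn main() {
    let start = Instant::now();
    let handle = enqueue_work(50);

    // Measured before synchronizing: only the enqueue cost is captured.
    let without_sync = start.elapsed();

    // join() plays the role of af::sync(0): wait until the queued work is done.
    handle.join().unwrap();
    let with_sync = start.elapsed();

    println!("without sync: {:?}, with sync: {:?}", without_sync, with_sync);
    assert!(with_sync >= Duration::from_millis(50));
    assert!(without_sync <= with_sync);
}
```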

9prady9 (Maintainer) · 2017-06-20T10:21:33Z

@polarathene You should probably look at the https://github.com/arrayfire/arrayfire-rust/blob/devel/examples/pi.rs example, which times a PI computation. Since the PI computation code calls sum_all, which copies the final reduction result to the host, it doesn't need an explicit af::sync call before start.elapsed() is called.

If there are multiple devices in play, it is probably better to move each device's computation into a separate thread and sync the corresponding device before start.elapsed() is called on that thread. v3.4.2 is not thread-safe yet, but the soon-to-be-released ArrayFire v3.5 will have threading support.
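
A minimal sketch of that per-device pattern using std threads, assuming a thread-safe ArrayFire build (v3.5 or later, per the comment above). `run_on_device` is a hypothetical placeholder that just sleeps; in real code it would select the device (assumed here to be done with the wrapper's set_device), run the computation, and call af::sync(device_id) before returning, so that each thread's elapsed time covers the finished work.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-device workload. A real version would select the device,
/// run the ArrayFire computation, and call af::sync(device_id) before returning;
/// here a sleep stands in for that work.
fn run_on_device(device_id: u64) {
    thread::sleep(Duration::from_millis(10 * (device_id + 1)));
}

fn main() {
    // One thread per device, each timing only its own device's work.
    let handles: Vec<_> = (0..2u64)
        .map(|id| {
            thread::spawn(move || {
                let start = Instant::now();
                run_on_device(id);
                (id, start.elapsed()) // read only after the work has finished
            })
        })
        .collect();

    for handle in handles {
        let (id, elapsed) = handle.join().unwrap();
        println!("device {} took {:?}", id, elapsed);
    }
}
```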

polarathene (Author) · 2017-06-20T10:39:49Z

@9prady9 That would explain why the any_true_all() results were the same across chunks, whereas without it af::sync(0) was required to get proper timing information.

v3.5 sounds good :) I'll let you know when the source is on GitHub. Hopefully there is a way to know when to stop processing early and return the results without the condition causing such a performance impact. At 18s it's competitive with the hashcat equivalent (20s with result-compare logic). Only taking twice as long as hashcat is still good, I guess, as they have optimized their code very well; the JIT may not be able to get that close in performance.

9prady9 (Maintainer) · 2017-06-20T10:43:33Z

Once you upload your code to GitHub, maybe we can suggest some improvements that can further speed it up.
