If conditions for logic? #151
-
There are methods like all_true, is_zero, etc. that return an array with a single element of 0 or 1. We should not transfer that to the host just to do a true/false comparison, right? Instead, do I just create a constant or array with [1,1,1,1] dims containing one element? In my program I will do an … I'm not sure what the C++ API is like for this; I think their conditionals look like native logic, so perhaps the Rust wrapper could benefit from these two boolean array primitives? Maybe they already exist but are not easy to discover? I could add example usage to the docs for the methods I mentioned that return these results.

let results = eq(&array_af, &match_value, true);
let is_match = any_true(&results, 0);
// No primitive so I need to make this?
let true_af = constant(1, Dim4::new(&[1,1,1,1]));
// Will fail, so perhaps that idea was pointless :P
if &is_match == &true_af {
println!("Success!"); // Transfer data to host
}
// JIT can't work with above for boolean if statement? should it take the array and a closure?
// Or can it report back a boolean result somehow?(not sure if it'd work with JIT still?)
if &is_match.is_true() {
println!("Success!"); // Transfer data to host
}
Replies: 14 comments
-
Not sure I have really understood your problem, but sum_all may help you.
-
@UltimaRatio I wanted to use an if statement to decide what to do based on the result of a method that returns an array with 0 or 1 as its only value (all_true(), any_true(), iszero(), etc.). Since the result is an array, I could not write an if condition against true/false; the same applies to other methods like count_zeroes(). I solved it for now with locate(): if its elements() is not empty, then it is true. sum_all() might work too :)
-
There are always two versions of the reduction-based functions in ArrayFire:
1. Reduction along a given dimension, e.g. any_true(&array, dim), which returns an Array.
2. Reduction of the whole array, e.g. any_true_all(&array), which returns native values.
When you reduce the whole array, i.e. case (2), you can use the result in conditional statements because the return value is of a native type. For case (1), however, there is no guarantee that the output array will always have only one element, hence the Array return type. In any case, I believe what you are looking for in the code snippet you shared above is any_true_all. An Array object can't be used as a condition in conditional statements; that holds true even in the ArrayFire C++ API. As a general rule of thumb, do NOT use indexing (index_gen, lookup, locate, etc.) to access individual elements from GPU memory. These are very expensive operations and should be avoided.
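To make the distinction between the two cases concrete, here is a minimal plain-Rust sketch. It does not use the ArrayFire crate: `Flags`, `any_true_per_column`, and this `any_true_all` are hypothetical stand-ins that only model the shape of the results (an array for a per-dimension reduction, a single native value for a whole-array reduction).

```rust
/// Plain-Rust stand-in for a 2-D device array of 0/1 flags (row-major).
/// Hypothetical type, used only to illustrate the two reduction styles.
struct Flags {
    rows: usize,
    cols: usize,
    data: Vec<u8>,
}

impl Flags {
    /// Case (1): reduce along the rows, producing one result per column.
    /// The result is still an array, so it cannot drive an `if` directly.
    fn any_true_per_column(&self) -> Vec<bool> {
        (0..self.cols)
            .map(|c| (0..self.rows).any(|r| self.data[r * self.cols + c] != 0))
            .collect()
    }

    /// Case (2): reduce the whole array to a single native value,
    /// which can be used as an `if` condition.
    fn any_true_all(&self) -> bool {
        self.data.iter().any(|&v| v != 0)
    }
}
```

With a 2x3 input whose only nonzero flag is in the last column, the per-column reduction yields three values while the whole-array reduction yields one `bool`, which is the only one of the two that fits in `if flags.any_true_all() { ... }`.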
-
@9prady9 Thanks for that. Any reason it returns f64? It only seems to return 0/1, so shouldn't the tuple be (bool, bool)? I wasn't aware that the dim-specific methods could return multiple values; in my code test I only had 1 element as the result of any_true. I thought the 0/1 value was the equivalent of a true/false answer to whether any of the elements were true along that dim; I guess we can only get that via all dims. I was providing the result of eq(), which was already true/false element values, so if that would return the same result as my eq() it seems a bit weird? I guess there is some use for it behaving like that which I am not aware of.
That's good to be aware of; a list of these (I guess set_intersect is another) would be good to have. I'm going to submit the code for my current project to an upstream repo issue soonish, as under a certain workload ArrayFire is causing a whine/squeal (not that high pitched) sound from my machine. The two methods that seem to cause it are locate and some heavy (at least in my program, compared to everything else) arithmetic (includes bitshift and xor). On larger arrays the noise is not present. On the smaller ones, which I guess could be due to processing them at a faster rate (or allocating/deallocating on the GPU?), the two function calls each add to the noise, locate being the louder one. It is not my fan or CPU, and the GPU wasn't under heavy load or using much memory for that workload, so it must have something to do with those methods being called more frequently.
-
@polarathene That's true, the Rust API can be improved in that respect. I will try to fix this in the Rust wrapper's 3.4.4 crate release. This was initially written with …
-
@polarathene I have created a separate issue for this change; please follow its progress over there. #154
-
@9prady9 I've replaced …

let (is_match, _) = af::any_true_all(&results);
if is_match > 0.0 {
    println!("Success! {:?}", is_match);
}

I've noticed that locate added about 20 seconds to my test data in my program, which would otherwise take 20 secs without the … Any idea how to reduce this? The eq() operation seems to be fast, but having some way to know whether I had a true result for a conditional is expensive (due to the device-to-host transfer, maybe?). This is running 26^8 permutations through an algorithm, then checking each time for a true value (of which the test only involves one). I tile an array of permutations and modify it so that I am only running the algorithm 1,352 times, each on a different portion of that permutation set. Is there a better way to handle this, perhaps on the GPU, to know when to stop and transfer? The current approach, while sending minimal data from device to host for the conditional, still seems quite expensive to run.
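One possible way to make the per-iteration check cheaper, not something this thread confirms for ArrayFire, is to combine the match flag across several batches and only bring it to the host once per window. The sketch below is a plain-Rust model of that idea under stated assumptions: `find_match_amortized`, `check_every`, and `batch_has_match` are hypothetical names, and the `pending` boolean stands in for a device-side flag that would be OR-ed together across batches. Whether this helps in practice depends on where the cost actually comes from.

```rust
/// Runs up to `total_batches` batches and checks for a match on the "host"
/// only once every `check_every` batches (and once more at the very end).
///
/// Returns the index of the batch at which a match was detected on the host
/// (the end of the window containing it), plus the number of host checks made.
fn find_match_amortized<F>(
    total_batches: usize,
    check_every: usize,
    mut batch_has_match: F,
) -> (Option<usize>, usize)
where
    F: FnMut(usize) -> bool,
{
    assert!(check_every > 0, "check_every must be at least 1");
    let mut host_checks = 0;
    // Stands in for a device-side flag accumulated across batches.
    let mut pending = false;
    for batch in 0..total_batches {
        pending |= batch_has_match(batch);
        let window_end = (batch + 1) % check_every == 0 || batch + 1 == total_batches;
        if window_end {
            // The only point where a device-to-host transfer would happen.
            host_checks += 1;
            if pending {
                return (Some(batch), host_checks);
            }
        }
    }
    (None, host_checks)
}
```

The trade-off is that a match is noticed up to `check_every - 1` batches late, and the window that contained it has to be searched again to find the exact element.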
-
Can you please share the code? A smaller code snippet that reproduces the same behaviour on our side would be more helpful.
-
@9prady9 Yeah, I intend to as soon as I've got the code in a more organized/digestible state :) I'm still learning Rust, and moving it into a struct with smaller functions has been a frustrating process due to the borrow checker. I'll put the project on GitHub in the next few days, hopefully, where you can run it for yourself and see this issue and possibly the other one I've experienced with noise (coil whine, I think) when set to a specific array size. The CUDA backend doesn't work at certain sizes that work fine with OpenCL, I think; I've noticed the issue and PR upstream, and those might prevent crashing on the CUDA backend with larger array sizes. I would provide a smaller snippet but I'm not sure how to go about that: this specific issue with conditional logic needs enough work to process for a while, like I currently have with my test data that takes 20 seconds. The only addition to this is passing the array to locate() or any_true_all() each time after computation; it's millions of u64 values as only row elements. Unless the eq() statement is optimized out during JIT when the locate()/any_true_all() call is commented out. I'm getting the time result from my terminal, as how long the process took to run until it exits.
-
@polarathene A lot of fixes have been going into devel, and we will soon push for the v3.5 release. I will update the Rust wrapper the week following the v3.5 release. Hopefully, most of the errors you see now will be fixed then. I will try to add some timing mechanism to the Rust wrapper if possible. I don't think timing the entire program's run is a good representation of the average run time of your algorithm.
-
@9prady9 It's not the best way, no :) I just know that timing the ArrayFire logic isn't too reliable, right? Plus, it isn't clear to me when the first JIT is done; all I know, from what is said on here, is that things will be slower until the JIT part is done. In the results below it has either been cached or the JIT part takes so little time that it doesn't matter. This is my usual timing macro:

use std::time::Instant;

macro_rules! before_after {
    ($label:expr, $($thing:tt)*) => (
        let start = Instant::now();
        $($thing)*
        // Read the elapsed time once so the seconds and milliseconds parts agree.
        let elapsed = start.elapsed();
        println!("{} took: {} sec, {} ms", $label, elapsed.as_secs(), (elapsed.subsec_nanos() as f64 / 1_000_000f64));
    )
}
// Usage
before_after!("test",
    for _ in 0..total_permutations {
        self.next_permutation(&mut permutations_af, batch_cols); // permutes the af::Array
        self.compute_fn(&permutations_af, length_in_bytes); // processes and optionally does the conditional check
    };
);

Results: With any_true_all() enabled
With any_true_all() and the if condition that uses it commented out (the function still contains some constant and eq() calls, but without any_true_all() involved the performance is the same as commenting out the function call):
Obviously a bit fishy with the 2nd chunk there. Adding af::sync(0) before the println!() in the macro gives more accurate timing:
Results with the any_true_all() condition are the same with this macro as shown earlier without af::sync(0). Seems accurate enough, but when you have multiple devices in play perhaps that would make timing more difficult.
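The effect of adding a sync before reading the clock can be sketched as a small function-based helper. This is a minimal sketch under assumptions, not part of the ArrayFire wrapper: `time_with_sync` is a hypothetical name, and its `sync` parameter is a placeholder closure where a device barrier such as `af::sync(0)` would go. It uses only the standard library.

```rust
use std::time::{Duration, Instant};

/// Times `work`, calling `sync` before the clock is read so that any
/// asynchronously queued work is included in the measurement.
/// `sync` is a placeholder for a device barrier such as `af::sync(0)`.
fn time_with_sync<T>(
    label: &str,
    sync: impl Fn(),
    work: impl FnOnce() -> T,
) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    // Wait for outstanding device work before stopping the clock.
    sync();
    // Read the elapsed time once so every printed part comes from one sample.
    let elapsed = start.elapsed();
    println!("{} took: {:.3} ms", label, elapsed.as_secs_f64() * 1000.0);
    (result, elapsed)
}
```

Without the `sync` call, the measurement only covers the time needed to queue the work, which would be consistent with the suspiciously short timings seen when nothing forces a device-to-host transfer.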
-
@polarathene You should probably look at the https://github.com/arrayfire/arrayfire-rust/blob/devel/examples/pi.rs example, which times the PI computation. Since the PI computation code calls … If there are multiple devices in play, it is probably better to move the computation for each device into a separate thread and sync the corresponding device before start.elapsed is called on that thread. v3.4.2 is not thread safe yet, but the soon-to-be-released ArrayFire v3.5 is going to have threading support.
-
@9prady9 That would explain why the any_true_all() results were the same across chunks, whereas without it af::sync(0) was required to get proper timing information. v3.5 sounds good :) I'll let you know when I have the source on GitHub; hopefully there is a way to know when to stop processing early and return the results without the condition causing such a perf impact. At 18s it's competitive against the hashcat equivalent (20s with result compare logic). Only taking twice as long as hashcat is still good, I guess, as they have optimized their code very well; JIT may not be able to get that close in performance.
-
Once you upload your code to GitHub, maybe we can suggest some improvements that can further speed up the code.