Q: Most performant way to update an entire column or row to a scalar value? #145
Replies: 8 comments
-
Can you provide a simple input and output? Instead of 300 million rows, maybe choose 10? I am trying to understand whether there are existing functions to do this efficiently.
-
If I understand this correctly, there may be a really fast way to do this. I don't know how you would do it in Rust, but the fastest (and least memory-hungry) way would be the following in C++.
If the following issue goes through, you can achieve the same behavior using But if you don't want to use an additional output buffer, raise an issue upstream to add this behavior to
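Why an in-place update can be cheap comes down to layout: ArrayFire stores arrays in column-major order, so one column of a [rows, cols] array is a single contiguous run of `rows` elements. The sketch below is a plain-Rust CPU model of that layout, not ArrayFire API; the `ColMajor` type and its `fill_col` method are hypothetical names. It shows that overwriting one column with a scalar touches exactly `rows` elements and needs no extra buffer.

```rust
/// CPU model of a column-major 2-D array (ArrayFire's storage order):
/// element (row, col) lives at index `col * rows + row`.
/// Hypothetical type for illustration only.
struct ColMajor<T> {
    data: Vec<T>,
    rows: usize,
    cols: usize,
}

impl<T: Copy> ColMajor<T> {
    fn new(rows: usize, cols: usize, init: T) -> Self {
        ColMajor { data: vec![init; rows * cols], rows, cols }
    }

    fn get(&self, row: usize, col: usize) -> T {
        self.data[col * self.rows + row]
    }

    /// Overwrite every element of column `col` with `value`, in place.
    /// The column is one contiguous run of `rows` elements, so this is a
    /// single slice fill and allocates nothing.
    fn fill_col(&mut self, col: usize, value: T) {
        assert!(col < self.cols, "column index out of range");
        let start = col * self.rows;
        self.data[start..start + self.rows].fill(value);
    }
}
```

Filling a whole row, by contrast, would be a strided write (one element every `rows` positions), which is one reason to keep the batch-varying characters in a column rather than a row.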
-
@pavanky Here is an example based on my existing code; sorry about the wait. Starting from the result at the end, I would add a column with the same number of rows (join just adds it along dimension 1). But to batch, instead of creating all 3 tiles of permutations for 3^4, I'd only update that new column (the 1st column after the join) to a (97), process the batch, then change it to b (98), and so forth. If the characters to permute were not "abc" but "qa7", this isn't as simple as adding 1.

```rust
use arrayfire::{af_print, flat, join, tile, Array, Dim4};

fn get_permutations_col(input_cols: &Array<u8>, new_col: &Array<u8>) -> Array<u8> {
    let rows_a = input_cols.dims().get()[0];
    let rows_b = new_col.elements() as u64;
    // Repeat the existing columns' rows once per new character, and repeat the
    // [1, n] charset row to match the existing rows (then flatten into one column).
    // e.g. with charset "abc" (97, 98, 99): an input of 3^2 rows and 2 columns
    // plus 1 more column gives 3^3 rows in the final dims.
    let right_cols = &tile(input_cols, Dim4::new(&[rows_b, 1, 1, 1]));
    let left_col = &flat(&tile(new_col, Dim4::new(&[rows_a, 1, 1, 1])));
    // Join along dim 1 so the new column is the first column on the left
    join(1, left_col, right_cols)
}

fn generate_permutations_abc() {
    let range_a: Vec<u8> = (b'a'..=b'c').collect();
    let range_b: Vec<u8> = (b'a'..=b'c').collect();
    let dims_1 = Dim4::new(&[3, 1, 1, 1]); // first column
    let dims_n = Dim4::new(&[1, 3, 1, 1]); // charset for each additional column (a row, so flat() groups it)
    let mut range_a_af = Array::new(&range_a, dims_1);
    let range_b_af = Array::new(&range_b, dims_n);
    af_print!("range_a_af:", range_a_af);
    // [3 1 1 1]
    // 97
    // 98
    // 99
    af_print!("range_b_af:", range_b_af);
    // [1 3 1 1]
    // 97 98 99
    range_a_af = get_permutations_col(&range_a_af, &range_b_af);
    range_a_af = get_permutations_col(&range_a_af, &range_b_af);
    af_print!("3^3 == 27 permutations aaa(97,97,97) -> ccc(99,99,99): ", range_a_af);
    // [27 3 1 1]
    // 97 97 97
    // 97 97 98
    // 97 97 99
    // 97 98 97
    // 97 98 98
    // 97 98 99
    // 97 99 97
    // 97 99 98
    // 97 99 99
    // 98 97 97
    // 98 97 98
    // 98 97 99
    // 98 98 97
    // 98 98 98
    // 98 98 99
    // 98 99 97
    // 98 99 98
    // 98 99 99
    // 99 97 97
    // 99 97 98
    // 99 97 99
    // 99 98 97
    // 99 98 98
    // 99 98 99
    // 99 99 97
    // 99 99 98
    // 99 99 99
}

fn main() {
    generate_permutations_abc();
}
```
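To check the row ordering without a GPU, here is a CPU model of what `get_permutations_col` computes, written against column-major `Vec<u8>` buffers (ArrayFire's storage order). The names `permutations_col` and `row_string` are hypothetical; this is a sketch for reasoning about the layout, not a replacement for the ArrayFire version.

```rust
/// CPU model of `get_permutations_col` on column-major data.
/// `input` has `rows_a` rows and `cols_a` columns; `charset` plays the role
/// of `new_col`. Returns (data, rows, cols) of the joined result.
fn permutations_col(input: &[u8], rows_a: usize, cols_a: usize, charset: &[u8]) -> (Vec<u8>, usize, usize) {
    let rows_b = charset.len();
    let rows = rows_a * rows_b;
    let mut out = Vec::with_capacity(rows * (cols_a + 1));
    // left_col = flat(tile(new_col, [rows_a, 1])): each character repeated rows_a times
    for &c in charset {
        out.extend(std::iter::repeat(c).take(rows_a));
    }
    // right_cols = tile(input, [rows_b, 1]): each input column stacked rows_b times
    for col in 0..cols_a {
        let column = &input[col * rows_a..(col + 1) * rows_a];
        for _ in 0..rows_b {
            out.extend_from_slice(column);
        }
    }
    // join(1, left, right): the left column's data comes first in column-major order
    (out, rows, cols_a + 1)
}

/// Read row `r` of a column-major [rows, cols] byte array as a String.
fn row_string(data: &[u8], rows: usize, cols: usize, r: usize) -> String {
    (0..cols).map(|c| data[c * rows + r] as char).collect()
}
```

Calling it twice starting from the "abc" column reproduces the [27 3 1 1] output listed in the code comments, from "aaa" in row 0 to "ccc" in row 26. Passing a different charset on each call would give each column its own character set.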
-
@pavanky I am using pretty much the same code as above to generate the strings, and the rest of my ArrayFire logic then uses this large array for computation. I've tiled it to increase the row count to 300 million from 26^5 ("a" to "z" across 5 columns gives 11,881,376, about 11.9 million rows). At around 300 million rows, GPU memory usage is around 6-7 GB; some of that may be due to other ArrayFire logic scaling with the array size. I also have an array generated on the host that holds a range of index values, which I multiply by the eq() result. Removing the 0's then gives me the indices I'm interested in (though this process is slow on such a large array with current AF methods). I will convert the host-created array containing index values to the AF
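One pitfall with the multiply-then-remove-zeros approach: index 0 multiplied by a true flag is still 0, so a match in row 0 is discarded along with the non-matches. ArrayFire's Rust binding has `locate()`, which returns the linear indices of the non-zero elements of an array directly. The plain-Rust sketch below (hypothetical function names, CPU only) models both approaches on a boolean mask to show the difference.

```rust
/// Model of "multiply the eq() mask by an index range, then drop zeros".
/// A match at index 0 yields 0 and is wrongly removed.
fn indices_by_multiply(mask: &[bool]) -> Vec<u32> {
    mask.iter()
        .enumerate()
        .map(|(i, &m)| (i as u32) * (m as u32))
        .filter(|&v| v != 0)
        .collect()
}

/// Model of collecting the positions of true entries directly,
/// which is what `locate()` computes on the device.
fn indices_direct(mask: &[bool]) -> Vec<u32> {
    mask.iter()
        .enumerate()
        .filter(|&(_, &m)| m)
        .map(|(i, _)| i as u32)
        .collect()
}
```

Besides the index-0 bug, the direct form also skips building the host-side index range and the full-size product array.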
-
@pavanky When you say tile doesn't allocate memory, does that make it a better option than constant? At least in this case, for creating the boolean cond array that targets a single column.
-
I want an example that shows exactly where you are replacing the values. The example doesn't show that.
-
@pavanky With the example set, replace any column with a new value? Middle column to
All permutations for this keyspace (3^5) have been completed/batched. If processing an array batch finds all matches before the end, the full set of permutations does not need to finish. Would you like a code example of this with
-
Closing due to inactivity. Please reopen if the question is still pertinent. |
-
TL;DR: How can I efficiently update a large column or row with a single value many times, without creating similarly sized arrays that use up GPU memory?
Example size and two approaches I can think of
If I have an array with 300 million rows and 5 columns, and I want to update all elements in a column to a shared value multiple times, what is the best way to go about this?
1. `set_col()` with a constant array of matching row count for each value I want to replace the column with. This sounds like it would waste VRAM if I have a lot of these.
2. `replace_scalar()`, which would work but requires a cond array of [300mil, 5, 1, 1] dims where one column is 0 and the rest 1 (requiring either 5 of these, one to target each column, or updating that cond array with `set_col()` and two [300mil, 1, 1, 1] constant arrays with values 0 or 1).

How are you trying to apply this?
I am building up string permutations with `tile()` and `flat()` up to the size that GPU memory allows. This has proven much faster than on the CPU now that I understand how to build it effectively on the GPU with ArrayFire :) When tiling needs to stop, I have all permutations for that string length. To handle longer lengths, I then batch the permutations, processing one batch at a time.
All permutations of "aaa" to "zzz": 26^3 (17,576) rows, dims of [17576, 3, 1, 1] (I'm not sure if I should prefer columns or rows for this)
Array contains all values ranging from "aaa" to "zzz".
If the batching were to start at length 4, I would then add a column with 26^3 rows all set to "a" (97, i.e. 0x61 as an ASCII byte). I would then update all values in this column to "b" and so forth, processing each array "batch" with the rest of my ArrayFire logic.
Array contains all values ranging from "aaaa" to "azzz" (then "baaa" to "bzzz", and so on).
Why not just add x to the column values?
In this example you could say the column could just be incremented by 1, which makes more sense and should be more performant than the two approaches above. This might work best by extracting the column from the array, adding a constant array of the same size with value 1, then using set_col to write it back, and repeating the last two steps until 122/"z" (0x7A). But does this still work when the values don't increment by 1?
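For comparison with the `replace_scalar()` route from the question: ArrayFire's replace keeps `a[i]` where `cond[i]` is true and writes the scalar where it is false, so targeting one column needs a cond array with as many entries as the data itself. At 300 million × 5 with one-byte `b8` booleans, that is roughly 1.5 GB of mask, versus 300 million one-byte writes for a direct column fill. The CPU model below uses hypothetical helper names and plain Rust to show both the semantics and the size of that mask.

```rust
/// CPU model of replace-with-scalar semantics:
/// keep `a[i]` where `cond[i]` is true, otherwise write `value`.
fn replace_scalar_model(a: &mut [u8], cond: &[bool], value: u8) {
    assert_eq!(a.len(), cond.len(), "cond must match a element-for-element");
    for (x, &keep) in a.iter_mut().zip(cond) {
        if !keep {
            *x = value;
        }
    }
}

/// Build the cond array that targets one column of a column-major
/// [rows, cols] array: false in column `col`, true everywhere else.
/// It has rows * cols entries, the same element count as the data.
fn single_col_cond(rows: usize, cols: usize, col: usize) -> Vec<bool> {
    (0..rows * cols).map(|i| i / rows != col).collect()
}
```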
I want to support custom charsets (which might apply only to specific columns, not all), not just "a" to "z". I've not seen a way in ArrayFire to set all values of an array's column to a single value, or to a value taken by index from another array (so that I could just loop through another AF array's or a CPU array's indices for the values).
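With a custom charset, the value written into the batch column can be read by index from a charset slice, so nothing relies on the characters being consecutive. The CPU sketch below assumes the layout from the example code (new column first, column-major storage) and uses hypothetical names (`search_batches`, `targets`) with a plain string comparison standing in for the real ArrayFire processing. It reuses one buffer, rewrites only column 0 per batch, and stops early once every target has been found.

```rust
use std::collections::HashSet;

/// Search all strings of length `base_cols + 1` for `targets`, batch by batch.
/// `base` holds every permutation of the remaining characters in column-major
/// order with `base_rows` rows and `base_cols` columns. Column 0 of one reused
/// buffer is overwritten for each character of `first_charset`, and the loop
/// stops once all targets are found. Returns (found, batches_processed).
fn search_batches(
    base: &[u8],
    base_rows: usize,
    base_cols: usize,
    first_charset: &[u8],
    targets: &[&str],
) -> (HashSet<String>, usize) {
    assert_eq!(base.len(), base_rows * base_cols, "base has wrong size");
    let cols = base_cols + 1;
    // One reusable buffer: column 0 is the batch column, the rest is `base`.
    let mut batch = vec![0u8; base_rows * cols];
    batch[base_rows..].copy_from_slice(base);
    let wanted: HashSet<&str> = targets.iter().copied().collect();
    let mut found = HashSet::new();
    let mut batches = 0;
    for &c in first_charset {
        // Overwrite the first column with the next charset value (no allocation).
        batch[..base_rows].fill(c);
        batches += 1;
        // Stand-in for the real per-batch processing: compare every row.
        for r in 0..base_rows {
            let s: String = (0..cols).map(|col| batch[col * base_rows + r] as char).collect();
            if wanted.contains(s.as_str()) {
                found.insert(s);
            }
        }
        if found.len() == wanted.len() {
            break; // all matches found: skip the remaining batches
        }
    }
    (found, batches)
}
```

With the 9-row "abc" base (column 0 = "aaabbbccc", column 1 = "abcabcabc") and first-column charset "qa7", searching for "qcb" and "abc" finishes after 2 of the 3 batches.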
Should I raise feature request on main repo?
If there is currently no good way to approach this with AF, should I go to the main repo and raise a feature request? I will be contributing this part of my project as an example to the repo in the future, once it is in a good working state :)