Another attempt at supporting non-contiguous arrays #172

Open
wants to merge 35 commits into
base: main
5b2e0f6
Commit only DeferredSourceModule support without changing calling beh…
Feb 19, 2018
a1525de
Smarter _new_like_me that handles discontiguous input. Have copy() u…
Feb 19, 2018
8522e7f
Allow existing kernel calls to use non-contiguous arrays by sending a…
Feb 21, 2018
028c1a6
Allow setting scalars.
Feb 21, 2018
97c6b8a
Fix kernel name.
Feb 28, 2018
aa83f43
Fix _array_like_helper to work with non-contiguous arrays.
Feb 28, 2018
d50fa84
Minor, but important fixes to index calculations!
Mar 15, 2018
a0b62fa
Let _memcpy_discontig fall back to generic assignment kernel.
Mar 15, 2018
5a9d7d2
Fix arange for complex arguments.
Mar 15, 2018
cf24d72
Prettify/rearrange kernel source.
Mar 15, 2018
65562c2
Fix contiguity-match test.
Mar 15, 2018
83a95aa
Make "slow" memcpy work for host arrays too.
Mar 22, 2018
0300f31
Just in case there are any strides == 0.
Mar 22, 2018
42804f4
Fix memcpy_discontig_slow for host-side src.
Mar 23, 2018
2cb1a92
Fix some instances of incompatible strides.
Mar 23, 2018
540516e
Compile regexes once per source object.
May 29, 2018
ca6434e
Faster _flip_negative_strides (and earlier short-circuit), plus a var…
May 31, 2018
0ead8e2
Be smarter/faster calculating keys for deferred source.
May 31, 2018
eed20f8
Fix prototype.
May 31, 2018
6f09266
Add tests for non-contiguous arrays.
May 31, 2018
a4cfd36
Don't need to transfer src to host.
May 31, 2018
ba984f4
Ignore zero-strides in singleton dimensions (which will be culled lat…
May 31, 2018
084a15d
"precalc" performance improvements.
Jun 25, 2018
6effa4d
Don't delete a variable that is used later.
Jun 25, 2018
f4bdd52
Allow DeferredSource to take a string in constructor.
Jul 5, 2018
49aa8e0
Many updates including:
Jul 5, 2018
c437bcf
Fix string interpolation bug.
Jul 5, 2018
637939f
Small performance improvement.
Jul 5, 2018
6442bb3
Fix comment.
Jul 9, 2018
88f197c
Interpolation fixes.
Jul 31, 2018
39d950a
Make sure casting works for both built-in types and classes with cons…
Jul 31, 2018
07022cd
Clean up DeferredVal implementation (_eval becomes _get, _evalbase be…
Aug 3, 2018
0411e5d
Fix contig test when only one non-contiguous argument.
Aug 27, 2018
886de8a
For elementwise arrays, always treat singleton dimensions as stride =…
Aug 27, 2018
c2413ef
Rename array_arg_inds -> elwise_arg_inds, and require that all array …
Aug 27, 2018
10 changes: 5 additions & 5 deletions pycuda/cumath.py
@@ -42,7 +42,7 @@ def f(array, stream_or_out=None, **kwargs):

 func = elementwise.get_unary_func_kernel(func_name, array.dtype)
 func.prepared_async_call(array._grid, array._block, stream,
-array.gpudata, out.gpudata, array.mem_size)
+array, out, array.mem_size)

 return out
 return f
@@ -77,7 +77,7 @@ def fmod(arg, mod, stream=None):

 func = elementwise.get_fmod_kernel()
 func.prepared_async_call(arg._grid, arg._block, stream,
-arg.gpudata, mod.gpudata, result.gpudata, arg.mem_size)
+arg, mod, result, arg.mem_size)

 return result

@@ -94,7 +94,7 @@ def frexp(arg, stream=None):

 func = elementwise.get_frexp_kernel()
 func.prepared_async_call(arg._grid, arg._block, stream,
-arg.gpudata, sig.gpudata, expt.gpudata, arg.mem_size)
+arg, sig, expt, arg.mem_size)

 return sig, expt

@@ -111,7 +111,7 @@ def ldexp(significand, exponent, stream=None):

 func = elementwise.get_ldexp_kernel()
 func.prepared_async_call(significand._grid, significand._block, stream,
-significand.gpudata, exponent.gpudata, result.gpudata,
+significand, exponent, result,
 significand.mem_size)

 return result
@@ -129,7 +129,7 @@ def modf(arg, stream=None):

 func = elementwise.get_modf_kernel()
 func.prepared_async_call(arg._grid, arg._block, stream,
-arg.gpudata, intpart.gpudata, fracpart.gpudata,
+arg, intpart, fracpart,
 arg.mem_size)

 return fracpart, intpart
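
Each hunk above passes the array object itself where the raw `.gpudata` pointer was passed before, presumably so that the call can take each array's strides into account. The following host-side sketch, in pure Python, illustrates why a base pointer plus an element count cannot describe a non-contiguous view. `strided_offset` is a hypothetical helper written for this illustration, and the shapes and strides are made-up values; none of it is taken from PyCUDA or from this PR.

```python
def strided_offset(flat_index, shape, strides):
    """Map a C-order flat element index to a byte offset via per-dimension strides.

    Hypothetical helper for illustration only; not part of PyCUDA.
    """
    offset = 0
    # Peel off one index per dimension, from the fastest-varying (last) to the first.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        flat_index, idx = divmod(flat_index, dim)
        offset += idx * stride
    return offset


# Assumed example: a 4x6 float32 buffer (row stride 24 bytes) viewed as
# every second column, i.e. shape (4, 3) with strides (24, 8).
shape = (4, 3)
view_strides = (24, 8)
view_offsets = [strided_offset(i, shape, view_strides) for i in range(12)]

# A contiguous (4, 3) float32 array has strides (12, 4), so element i
# sits at byte offset i * 4, which is what a pointer-plus-size kernel assumes.
contig_offsets = [strided_offset(i, shape, (12, 4)) for i in range(12)]
```

For the contiguous layout the offsets are simply `i * 4`, so a pointer and a size suffice. For the strided view the offsets skip bytes, which is why a kernel that receives only a pointer would read the wrong elements.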
2 changes: 1 addition & 1 deletion pycuda/curandom.py
@@ -240,7 +240,7 @@ def rand(shape, dtype=np.float32, stream=None):
 raise NotImplementedError;

 func.prepared_async_call(result._grid, result._block, stream,
-result.gpudata, np.random.randint(2**31-1), result.size)
+result, np.random.randint(2**31-1), result.size)

 return result