
Add local memory CUDA support to Kernel #126

Merged

merged 3 commits into pypr:master on Oct 30, 2018

Conversation

@adityapb (Contributor)

No description provided.

@pep8speaks commented Oct 17, 2018

Hello @adityapb! Thanks for updating the PR.

Comment last updated on October 24, 2018 at 10:47 UTC

@adityapb adityapb closed this Oct 17, 2018
@adityapb adityapb reopened this Oct 17, 2018
@prabhuramachandran (Contributor) left a comment


Looks like we should discuss this a bit. In the meantime, can you address the comments I have made?

@@ -255,14 +259,17 @@ def get_struct_from_instance(self, obj):
        helper = CStructHelper(obj)
        return helper.get_code() + '\n'

-    def parse(self, obj):
+    def parse(self, obj, local_decl=None):

I don't like this too much, and I am not sure why you need to do this. Can you explain why the signature of the object method does not have the necessary information?


Also, if this is going to be added, we should have a proper docstring, as the context here is totally non-obvious. For the case of parse itself, obj is somewhat obvious, but local_decl is not, as it seems very CUDA-specific.

else:
raise TypeError('Unsupported type to wrap: %s' % obj_type)
self._local_decl = None

Shouldn't this be set even when the type error is raised?

@@ -532,6 +539,26 @@ def visit_FunctionDef(self, node):
args = self._get_function_args(node)
body = '\n'.join(self._indent_block(self.visit(item))
for item in self._remove_docstring(node.body))
local_decl = ''
if self._local_decl:
decls = ['extern LOCAL_MEM float shared_buff[];']

This is very strange, why is it declared as a float?


I guess I am missing something about the way you define and use local memory in CUDA, but I think this needs to be properly explained.

if self._local_decl:
decls = ['extern LOCAL_MEM float shared_buff[];']
for arg, dtype in self._local_decl.items():
if len(decls) == 1:

But len(decls) will always be 1 according to above!

'&%(prev_arg)s[size_%(prev_arg)s];')
local_decl = local_decl % {'dtype': dtype, 'arg': arg,
'prev_arg': prev_arg}
decls.append(local_decl)

This whole block of code is not at all clear, and it is very difficult to see what you are trying to do. I think it really needs to be refactored.
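For what it's worth, piecing the hunks in this review together, the loop appears to carve typed slices out of a single dynamic shared buffer, which would also explain the float: the element type of the extern buffer is arbitrary, since every argument is accessed through a cast pointer. A minimal self-contained sketch of that logic, assuming `local_decl` maps argument names to C type names (the standalone function `generate_local_decls` is my own framing; only the emitted strings come from the diff):

```python
def generate_local_decls(local_decl):
    """Sketch of the declaration generation seen in the diff.

    One untyped dynamic shared buffer is declared, then typed pointers
    are carved out of it, each starting where the previous argument's
    slice ends (`size_<arg>` is assumed to be defined in the kernel).
    """
    decls = ['extern LOCAL_MEM float shared_buff[];']
    prev_arg = None
    for arg, dtype in local_decl.items():
        if prev_arg is None:
            # The first argument aliases the start of the shared buffer.
            decls.append('%(dtype)s* %(arg)s = (%(dtype)s*) shared_buff;'
                         % {'dtype': dtype, 'arg': arg})
        else:
            # Each later argument starts just past the previous slice.
            decls.append(
                '%(dtype)s* %(arg)s = (%(dtype)s*) '
                '&%(prev_arg)s[size_%(prev_arg)s];'
                % {'dtype': dtype, 'arg': arg, 'prev_arg': prev_arg}
            )
        prev_arg = arg
    return decls
```

With `{'xc': 'double', 'yc': 'int'}` this yields the buffer declaration followed by `double* xc = (double*) shared_buff;` and `int* yc = (int*) &xc[size_xc];`.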

@@ -693,3 +694,141 @@ def __init__(self, detect_type=ocl_detect_type, known_types=None):

def _get_self_type(self):
return KnownType('GLOBAL_MEM %s*' % self._class_name)


class CUDAConverter(CConverter):

Please try to reuse as much code as possible; derive this from OpenCLConverter so we don't have to update things in two places.

def _get_self_type(self):
return KnownType('GLOBAL_MEM %s*' % self._class_name)

def _get_function_args(self, node):

Again, please try to refactor to minimize cut/pasting.
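The suggested reuse could look roughly like the sketch below. Everything here is hypothetical: the stub `OpenCLConverter` and the `_get_local_decls` hook stand in for the real translator classes, and only the idea of deriving `CUDAConverter` from `OpenCLConverter` comes from the review.

```python
class OpenCLConverter:
    """Stand-in stub; the real OpenCLConverter lives in the translator module."""

    def __init__(self, known_types=None):
        self._known_types = known_types or {}

    def _get_local_decls(self):
        # OpenCL receives local memory as kernel arguments, so no extra
        # declarations are emitted here.
        return []


class CUDAConverter(OpenCLConverter):
    """Derive from OpenCLConverter; override only the CUDA-specific parts."""

    def __init__(self, known_types=None):
        super().__init__(known_types=known_types)
        self._local_decl = None

    def _get_local_decls(self):
        # CUDA carves typed slices out of one dynamic shared buffer.
        if not self._local_decl:
            return []
        return ['extern LOCAL_MEM float shared_buff[];']
```

This keeps the shared code-generation logic in one place, with CUDA differing only in the overridden hook.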


@prabhuramachandran (Contributor) left a comment


I like that this is cleaner, but I want no cut/pasting of code, as it makes things very hard to maintain. Please make the suggested changes and, if possible, squash your unnecessary commits.

return code

def visit_FunctionDef(self, node):
assert node.args.vararg is None, \

Again, there is a lot of similar code that is cut/pasted. Any cut/pasting makes it very hard to update things later. I don't mind adding an empty method to the base class that is only used for CUDA, but you need to think about reducing cut/pasting and try to avoid sending PRs with any of it. :) Also, please measure coverage for the translator alone and ensure that it has almost 100% coverage.

@adityapb force-pushed the cpy-jit branch 2 times, most recently from 8c554ab to 21e79e1, on October 24, 2018 at 11:04

@prabhuramachandran (Contributor) left a comment


Looks much better. I am sorry to ask for more changes, but the alignment can be a cause of subtle issues and errors, so I think it should be handled cleanly. You do have the local size information, so would it be possible to compute a 4-byte-aligned quantity and use that for the size? At the very least, this should be clearly documented.

prev_arg = arg
else:
local_decl = ('%(dtype)s* %(arg)s = (%(dtype)s*) '
'&%(prev_arg)s[size_%(prev_arg)s];')

There is one big problem here if the sizes are not aligned to 4 bytes, as discussed in the CUDA programming guide. So this should perhaps be carefully documented, or a warning given if there is an alignment problem.
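One hedged way to address this is to round each slice's byte size up to the required alignment before computing the next slice's offset. The helper names below are hypothetical, not from the PR, and a wider alignment (e.g. 8 bytes) would be needed where double slices follow smaller types:

```python
def aligned_nbytes(nbytes, alignment=4):
    """Round a byte count up to the next multiple of `alignment`."""
    return (nbytes + alignment - 1) // alignment * alignment


def shared_mem_offsets(args, alignment=4):
    """Compute aligned byte offsets into a shared buffer.

    `args` maps argument name -> (element count, element size in bytes).
    Each slice starts at an offset rounded up to `alignment`, so a
    following slice of a wider type does not begin at a misaligned
    address, which is the problem discussed in the CUDA programming
    guide.
    """
    offsets, offset = {}, 0
    for name, (count, itemsize) in args.items():
        offsets[name] = offset
        offset += aligned_nbytes(count * itemsize, alignment)
    return offsets
```

For example, a slice of 3 floats (12 bytes) followed by doubles would, with `alignment=8`, give the double slice an offset of 16 rather than a misaligned 12.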

@prabhuramachandran (Contributor)

I've filed issue #130 for you to address. I will merge this for now. Thanks!

@prabhuramachandran prabhuramachandran merged commit cf63c32 into pypr:master Oct 30, 2018