Makes prediction work on GPUs #149
base: master
To help those who Google, the error message you get ends with
Though it does seem there are other problems too when you run on GPU.
I fixed all the other problems I am aware of at this point. On my machine, it runs about 8x faster on one GPU.
@@ -20,6 +20,13 @@ import array
from libc.stdint cimport uint16_t, uint32_t, uint64_t, uintptr_t, int32_t

import numpy
try:
    import cupy
    to_numpy = cupy.asnumpy
Maybe we can avoid this dependency check by relying on Thinc for that.
How do you feel about this?
def to_numpy(a):
    if thinc.neural.util.is_cupy_array(a):
        import cupy
        return cupy.asnumpy(a)
    else:
        return a
That way there isn't a conditional import, but we still have to import cupy. I'm not that familiar with thinc, but the thinc source does not use cupy.asnumpy() anywhere, so there probably isn't a good wrapper.
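For readers following this thread, here is a minimal sketch of the same idea without the thinc dependency. It is only an illustration, not the code in this PR: the local is_cupy_array helper is hypothetical and assumes that cupy array types live in a module whose top-level package is named cupy. cupy itself is imported lazily, so the snippet also runs on machines without it.

```python
import numpy


def is_cupy_array(a):
    # Hypothetical stand-in for thinc's check: assumes cupy arrays are
    # defined in a module whose top-level package is "cupy".
    return type(a).__module__.split(".")[0] == "cupy"


def to_numpy(a):
    """Copy `a` to host memory if it is a cupy array, else return it unchanged."""
    if is_cupy_array(a):
        import cupy  # imported only when a cupy array actually shows up
        return cupy.asnumpy(a)
    return a


host_array = numpy.ones(3, dtype="float32")
print(to_numpy(host_array) is host_array)  # numpy input passes through untouched
```

The trade-off is the one discussed above: there is no conditional import at module level, but cupy is still named in the function body.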
cdef int n = 0
embed_arr = numpy.zeros(self.static_vectors.shape[1], dtype='float32')
for token in span:
    if token.lower not in PUNCTS:
Why did you remove these?
This is all about removing the call to numpy.zeros(). Once I had replaced it with sum(), the code collapsed into just those two lines. Other than the location of the output vector, it should perform exactly the same way.
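A rough sketch of why sum() helps, assuming the goal is simply to average a list of vectors: starting the sum from the first vector, rather than from a numpy.zeros() buffer, means the result keeps whatever array type the inputs have. The function name average_vectors is made up for illustration and this is not the exact code from the diff. It is shown with numpy arrays so it runs anywhere; cupy arrays should behave the same way because they support + and /.

```python
import numpy


def average_vectors(vectors):
    """Average a non-empty sequence of same-shaped arrays.

    No numpy.zeros() accumulator is allocated, so the result has the
    same array type (numpy or cupy) as the inputs.
    """
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    # Use the first vector as the start value; sum() builds new arrays
    # with + and does not modify the inputs in place.
    total = sum(vectors[1:], vectors[0])
    return total / len(vectors)


vecs = [numpy.array([1.0, 3.0]), numpy.array([3.0, 5.0])]
print(average_vectors(vecs))  # [2. 4.]
```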
@thomwolf Can you take a look at this PR? I am trying to use |
I could get it to work on GPU using this fix, thanks @dirkgr!
Hi guys, this is awesome work. Is it possible to merge the PR?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Looks like the answer is "no". |
Sorry for the late response to this. The PR closed automatically but I think this is valuable work so I reopened. We're probably going to work on a closer integration with |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Any updates on this? |
Would love some updates on this! |
I want to keep this PR open as the code may be useful for those who want to build from source and try this out. However, moving forward, once spaCy v3 is released, we'll update this code significantly to be compatible with Thinc 8. At that point, GPU support will be automatic...
For those trying to make the pipeline work with GPU support on spaCy > 2.1, here's the additional step for patching prior to installing from source (after activating your venv):
git clone https://github.com/huggingface/neuralcoref.git
cd neuralcoref
git fetch origin pull/149/head:gpufix
git checkout gpufix
pip install -r requirements.txt
pip install -e .
Any progress on spaCy v.3 integration?
You mean the v3 that was released yesterday? ;-) It's definitely on our roadmap, but it's not the only thing we're working on ;-)
@svlandeg Friendly ping :)
Hi! Please refer to #295 (comment) for more info :-)
When you use numpy.zeros to create the embed_arr, you can't later add it to embed_vector, because embed_vector might not be a numpy array. This re-works the code such that the array type of embed_vector is preserved all the way through.
I stole this approach from explosion/spaCy#3362. Thanks, @danielkingai2!