Teeny-tiny performance improvement #11

Open
wants to merge 1 commit into
base: master

Andrei-Aksionov
Contributor

Instead of indexing the positional embeddings, we can slice them. This has a couple of benefits:

  1. It looks cleaner.
  2. Indexing returns a new tensor (and torch.arange creates another new tensor on every call). Slicing, in contrast, returns a view of the tensor that shares the same underlying data.
import torch

# Address of the storage that backs a tensor
ptr = lambda x: x.storage().data_ptr()

x = torch.nn.Embedding(128, 256)
out_slicing = x.weight[:3]         # view of the first 3 rows of the weight
out_indexing = x(torch.arange(3))  # lookup that returns a new tensor

print(x, "", ptr(x.weight))
print(out_slicing.shape, ptr(out_slicing))
print(out_indexing.shape, ptr(out_indexing))
print(f"out_slicing equals out_indexing: {torch.equal(out_slicing, out_indexing)}")
--------------------------------------------------------------------------------------
(example output):
>> Embedding(128, 256)  140351087181824
>> torch.Size([3, 256]) 140351087181824
>> torch.Size([3, 256]) 140350970082304
>> out_slicing equals out_indexing: True

As you can see, the tensor returned by indexing has a different underlying data storage, whereas the tensor returned by slicing shares the storage of the embedding weight.
"Slicing creates a view of the tensor, which shares the underlying data but contains information about the memory offsets used for the visible data. This avoids having to copy the data frequently, which makes a lot of operations much more efficient"[1]
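
To make the comparison concrete for a positional-embedding lookup, here is a minimal sketch. The names `wpe`, `block_size`, `n_embd` and `t` are hypothetical stand-ins chosen for illustration, not identifiers taken from this repository. It checks that both approaches give the same values, that only the slice shares memory with the embedding table, and that gradients reach the same rows either way.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes and names, chosen only for illustration.
block_size, n_embd, t = 128, 256, 3
wpe = torch.nn.Embedding(block_size, n_embd)  # positional embedding table

# Indexing: build a position tensor, then gather rows into a new tensor.
pos = torch.arange(t)
pos_emb_indexed = wpe(pos)

# Slicing: take the first t rows of the weight as a view (no copy).
pos_emb_sliced = wpe.weight[:t]

same_values = torch.equal(pos_emb_indexed, pos_emb_sliced)
shares_memory = pos_emb_sliced.data_ptr() == wpe.weight.data_ptr()
copies_memory = pos_emb_indexed.data_ptr() != wpe.weight.data_ptr()

# Gradients flow back to the same rows of the table in both cases.
pos_emb_indexed.sum().backward()
grad_indexed = wpe.weight.grad.clone()
wpe.weight.grad = None
pos_emb_sliced.sum().backward()
grad_sliced = wpe.weight.grad.clone()
grads_match = torch.equal(grad_indexed, grad_sliced)

print(same_values, shares_memory, copies_memory, grads_match)
```

One caveat: slicing `weight` bypasses the `Embedding` module's forward pass, so options such as `padding_idx` or `max_norm` would not be applied. That should not matter for a plain positional table like the one assumed here.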
