Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

word-wise dma copy #175

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

word-wise dma copy #175

wants to merge 2 commits into from

Conversation

ricrpi
Copy link
Contributor

@ricrpi ricrpi commented Aug 9, 2016

If DMA transfers are word aligned then a more efficient copy can be done. I have not seen a dma transfer that is not word aligned however I have not extensively checked all roms so have left the safer byte-wise implementation in.

}
if (((dram_address | rom_address | longueur) & 3) != 0)
{
for(i = 0; i < longueur; ++i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I suggest to add a tab in from of the for loop to keep code readable?

@Narann
Copy link
Member

Narann commented Aug 9, 2016

I have not extensively checked all roms so have left the safer byte-wise implementation in.

From the manual:

All DMA transactions in the Nintendo 64 must use 64 bit aligned for data in RDRAM. DMA transactions for data in ROM must use 16 bit aligned addresses. )

So, to keep code clean, I suggest to remove old code and assume DMA are always 16 bits aligned.

What do you think?

@wareya
Copy link

wareya commented Aug 9, 2016

Depends on what the console actually does. Manuals are known to be "best practice for stability/speed" instead of "the hardware requires this".

I would keep the former behavior until someone pops in with a hardware test.

@Narann
Copy link
Member

Narann commented Aug 9, 2016

Thinking about it, you're surely right, if it's technically possible we can't assume it will not happen...

@richard42
Copy link
Member

I'm a little hesitant to merge this because it increases the code complexity a fair amount, for a performance gain on the raspberry pi / arm architectures. Do you have any quantitative data on the performance improvement from using this aligned memcpy in real world scenarios? If it's just moving a few hundred or thousand bytes per VI then it's probably not worth it. But if these loops are run for millions of iterations per VI then it would be.

@ricrpi
Copy link
Contributor Author

ricrpi commented Aug 16, 2016

The performance increase is 1 to 2 % faster (timing to frame 501), depending on ROM on a Pi B.
It isn't much of an improvement but given Zelda OOT intro is only running at ~70% full speed it is significant.

It would be interesting to know if any ROMS require byte-wise copy as if none do then the word-wise only implementation would be simpler than the current byte-wise copy.

@loganmc10
Copy link
Member

There is merit to this. RSP DMA transfers must be 8-byte aligned. The N64 ignores the lower 3 bits of the source and destination address. It also ignores the lower 3 bits of the length (or rather, it assumes the lower 3 bits are all 1's, then it adds 1).

As long as we properly mask the source and destination addresses, the transfers should always be 8-byte aligned.

@ricrpi I don't know if you're still around. I'm far less confident in the alignment of the PI DMA's, but I am very confident in the 8-byte alignment of the RSP DMA's. If you update this PR to just to 8-byte aligned RSP DMA's (with no fallback) I'd be willing to do a fair bit of testing on it

@ricrpi
Copy link
Contributor Author

ricrpi commented Oct 21, 2017

I am trying to keep up with all the commits/issues but have very limited time to do any coding at the moment :-(

I will try to take a look at this though as it shouldn't take too long to patch in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants