Permission needs to be elevated during eviction #150

Merged: 1 commit, Sep 15, 2024

HanJinChi (Member)

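This PR attempts to fix the following problem:
Consider a three-level cache hierarchy with private caches L1-A, L1-B, L2-A and L2-B plus a last-level cache (LLC). Address X is present in L2-B and in the LLC. Thread B issues an acquire for address Y from L1-B to L2-B; in L2-B it obtains control of the cache set that X maps to, takes the cache lock for X, and asks for X to be evicted. At the same time thread A, at the LLC, issues a probe for X to L2-B. The following sequence can then occur:
(1) In L2-B, thread B sends a probe for X to L1-B, and unlocks the meta for X in L2-B when it issues the probe.
(2) In L2-B, thread A obtains the meta lock for X and also sends a probe for X to L1-B, so it too unlocks the meta for X in L2-B. Two threads are now probing L1-B at the same time.
(3) Thread B finishes its probe of X in L1-B first, returns to the L2 cache, evicts X, and changes the address held by the meta to Y (which completes the acquire). When thread A returns to L2-B it invalidates the cache block, which means it invalidates Y: Y is now present in L1-B but no longer in L2-B.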
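The interleaving in steps (1) to (3) can be replayed deterministically. The sketch below is only an illustration under stated assumptions: `Meta`, `replay_race` and the field names are hypothetical and are not taken from the project's code, and the two threads are modelled as a fixed sequence of steps rather than as real threads.

```python
from dataclasses import dataclass


@dataclass
class Meta:
    """Hypothetical L2-B metadata entry for one cache way."""
    addr: str
    valid: bool = True
    locked: bool = False


def replay_race():
    """Replay steps (1)-(3) in order; return the log, the final meta, and a stale flag."""
    log = []
    meta = Meta(addr="X")

    # Thread B: the acquire for Y picks X as the victim and locks its meta.
    meta.locked = True
    # (1) B probes L1-B for X and unlocks the meta while the probe is in flight.
    meta.locked = False
    log.append("B: probe X -> L1-B, meta unlocked")

    # (2) A: the probe from the LLC finds the meta unlocked, locks it,
    # records the address it is probing, then also unlocks to probe L1-B.
    assert not meta.locked
    meta.locked = True
    probed_by_a = meta.addr
    meta.locked = False
    log.append("A: probe X -> L1-B, meta unlocked")

    # (3) B returns first, evicts X and reuses the way for Y.
    meta.addr = "Y"
    meta.valid = True
    log.append("B: evicted X, meta now holds Y")

    # A returns and invalidates the way it still believes holds X.
    stale = meta.addr != probed_by_a
    meta.valid = False
    log.append(f"A: invalidated way holding {meta.addr}")
    return log, meta, stale
```

Calling `replay_race()` returns `stale == True` with the meta holding Y and marked invalid, which is the inconsistent state described in step (3).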

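I think the root cause is that when an acquire needs to evict an address, the probe issued by the eviction may unlock the meta, so two threads can end up trying to evict the same address at once. My fix is to elevate the permission level held on the set whenever an address needs to be evicted (see the changed files for details). Uncached caches and L1 caches never probe upward, so their sets do not need to be elevated.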
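A minimal sketch of the idea behind the fix, assuming the set's permission is modelled as a numeric priority and that a request may enter a set only when it outranks the current holder. The class `CacheSet`, the function `probe_can_cut_in` and the level values are hypothetical and chosen for illustration; the actual mechanism is in the files changed by this PR.

```python
ACQUIRE, PROBE, EVICT = 1, 2, 3  # hypothetical levels; a higher value outranks a lower one


class CacheSet:
    """Hypothetical set-level arbiter: a request enters only if it outranks the holder."""

    def __init__(self):
        self.holder_prio = 0  # 0 means the set is free

    def hold(self, prio):
        """Record the priority at which the set is currently held."""
        self.holder_prio = prio

    def can_enter(self, prio):
        """Return True if a request at this priority may enter the set."""
        return prio > self.holder_prio


def probe_can_cut_in(elevate):
    """Return True if A's probe may enter the set while B's eviction probe is in flight."""
    s = CacheSet()
    s.hold(ACQUIRE)  # B's acquire for Y owns the set
    if elevate:
        s.hold(EVICT)  # the fix: raise the set priority before probing L1-B
    # The meta lock for X is released while the eviction probe is in flight,
    # so only the set-level priority can keep A's probe out.
    return s.can_enter(PROBE)
```

In this model `probe_can_cut_in(False)` is true, which corresponds to the race, while `probe_can_cut_in(True)` is false: the probe has to wait until the eviction has finished and the set is released.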

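The problem can be reproduced on the debug-evict branch:

  • First, build:
    MODE=debug make regression -j
  • Then run the following program repeatedly until an assertion fails (this takes about 1 minute):
    bash run.sh regression/multi-l3-msi
    The failing assertion is in probe_resp: after returning from its probe of L1-B, thread A finds that the address held by the meta is no longer X but Y.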

@wsong83 wsong83 added this pull request to the merge queue Sep 15, 2024
wsong83 (Member) commented Sep 15, 2024

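What happens should be this: the acquire takes the lock first, issues its probe and unlocks, and then the probe cuts in and issues a probe of its own. That said, the approach to fixing it is indeed the right one. Good job!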

Merged via the queue into master with commit 6a640b9 Sep 15, 2024
1 check passed
@wsong83 wsong83 deleted the fix-evict branch September 15, 2024 09:02
@HanJinChi HanJinChi mentioned this pull request Sep 20, 2024