Sometimes -trace-vis doesn't work with -unified-gpus and -use-unified-memory #99

Open
JunjoFor opened this issue Oct 16, 2024 · 2 comments
Labels
bug Something isn't working


@JunjoFor

To Reproduce
MGPUSim version of commit ID: 1596eea

Command that recreates the problem
./fft -unified-gpus=1,2,3,4 -timing -use-unified-memory -trace-vis

Current behavior
The simulation crashes with a Panic: UNIQUE constraint failed: trace.task_id.

Full error:

Trace is Collected in Database: akita_trace_cs7orts4va1ev4r88to0.sqlite3
Monitoring simulation with http://localhost:37431
{msg_7823990_e2e 7823990_req_out msg_e2e msg_e2e PCIe.EndPoint[1] 0.0005163 0.00051645 [] 0xc03b8e9a40 <nil>}
2024/10/16 11:59:06 /home/mgpusim/driver/driver.go:117: Panic: UNIQUE constraint failed: trace.task_id
goroutine 176 [running]:
runtime/debug.Stack()
        /usr/lib/go-1.18/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/lib/go-1.18/src/runtime/debug/stack.go:16 +0x19
github.com/sarchlab/mgpusim/v3/driver.(*Driver).runEngine.func1()
        /home/mgpusim/driver/driver.go:118 +0x58
panic({0xa748c0, 0xc042005800})
        /usr/lib/go-1.18/src/runtime/panic.go:838 +0x207
github.com/sarchlab/akita/v3/tracing.(*SQLiteTraceWriter).Flush(0xc00007fac0)
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/sqlite.go:73 +0x399
github.com/sarchlab/akita/v3/tracing.(*SQLiteTraceWriter).Write(0xc00007fac0, {{0xc03733d9d0, 0xf}, {0xc03739c2d0, 0x24}, {0xab5275, 0x7}, {0x9deac4, 0x12}, {0xc00011a5e0, ...}, ...})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/sqlite.go:47 +0xe5
github.com/sarchlab/akita/v3/tracing.(*DBTracer).EndTask(0xc00007fd00, {{0xc03767e6a0, 0xf}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/dbtracer.go:83 +0x182
github.com/sarchlab/akita/v3/tracing.(*traceHook).Func(0xc000100000?, {{0x7f9b196f54b0, 0xc00027b860}, 0xfdb0a0, {0xa8dbe0, 0xc03768dcb0}, {0x0, 0x0}})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/tracehook.go:39 +0x23e
github.com/sarchlab/akita/v3/sim.(*HookableBase).InvokeHook(0xa8dbe0?, {{0x7f9b196f54b0, 0xc00027b860}, 0xfdb0a0, {0xa8dbe0, 0xc03768dcb0}, {0x0, 0x0}})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/sim/hook.go:81 +0xc4
github.com/sarchlab/akita/v3/tracing.EndTask({0xc03767e6a0, 0xf}, {0xcc9b80, 0xc00027b860})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/api.go:151 +0x155
github.com/sarchlab/akita/v3/tracing.TraceReqFinalize({0xcc4a80?, 0xc03738e2a0?}, {0xcc9b80, 0xc00027b860})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/tracing/api.go:208 +0x5a
github.com/sarchlab/akita/v3/mem/vm/addresstranslator.(*AddressTranslator).parseTranslation(0xc00027b860, 0x3f4123c0f1c4a050)
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/mem/vm/addresstranslator/addresstranslator.go:221 +0x45a
github.com/sarchlab/akita/v3/mem/vm/addresstranslator.(*AddressTranslator).runPipeline(0xc00027b860, 0xcc4c80?)
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/mem/vm/addresstranslator/addresstranslator.go:83 +0x69
github.com/sarchlab/akita/v3/mem/vm/addresstranslator.(*AddressTranslator).Tick(0xc00027b860, 0xc0003088e0?)
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/mem/vm/addresstranslator/addresstranslator.go:67 +0x32
github.com/sarchlab/akita/v3/sim.(*TickingComponent).Handle(0xc0003088e0, {0xcc8c90?, 0xc041e8b080?})
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/sim/ticker.go:140 +0x45
github.com/sarchlab/akita/v3/sim.(*SerialEngine).Run(0xc0000ea480)
        /home/go/pkg/mod/github.com/sarchlab/akita/[email protected]/sim/serialengine.go:96 +0x367
github.com/sarchlab/mgpusim/v3/driver.(*Driver).runEngine(0xc0000b31e0)
        /home/mgpusim/driver/driver.go:125 +0xaa
created by github.com/sarchlab/mgpusim/v3/driver.(*Driver).runAsync
        /home/mgpusim/driver/driver.go:108 +0x18a
{7687625 7687522@GPU[3].SA[15].L1ICache cache_transaction read GPU[3].SA[15].L1ICache.Local 0.00051437 0.000514374 [] <nil> <nil>}
error: atexit handler error: UNIQUE constraint failed: trace.task_id
{7687625 7687522@GPU[3].SA[15].L1ICache cache_transaction read GPU[3].SA[15].L1ICache.Local 0.00051437 0.000514374 [] <nil> <nil>}
error: atexit handler error: UNIQUE constraint failed: trace.task_id

Expected behavior
The simulation completes without crashing.

Additional context
The same crash also happens when I run fir with the same options and an extended length.

@JunjoFor JunjoFor added the bug Something isn't working label Oct 16, 2024
@syifan
Contributor

syifan commented Oct 21, 2024

@JunjoFor The current implementation of unified memory is buggy. We are working on a project that will entirely revamp it. Do you need this problem solved urgently? If so, I can give you some suggestions on how to avoid it.

@JunjoFor
Author

JunjoFor commented Oct 21, 2024

The visualisation of the tasks is cool, but not a must for me. I reported the issue to let you know. It is good to hear that you are revising the unified memory implementation. In the near future, I would like to implement an L3 TLB shared by all GPUs, as in https://arxiv.org/pdf/2404.18361, with the idea of merging it into the main repository one day. I don't know how the revamp might interfere with that work.
