- Added
- Model an arbitrary fixed latency between the LLC and the memory controller
- Changed
- For Ramulator and DRAMsim3, each memory access request is split into `MEM_BUS_WIDTH`-sized parts, and the latency of each part is queried
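
The request splitting described in the entry above can be sketched roughly as follows. This is an illustrative sketch only, not the simulator's code: the 8-byte bus width, the function names, and the `query_latency` callback are all assumptions, and how the per-part latencies are combined afterwards is left open because the entry does not say.

```python
# Hypothetical bus width in bytes; the real MEM_BUS_WIDTH value is not given here.
MEM_BUS_WIDTH = 8


def split_request(addr, size, bus_width=MEM_BUS_WIDTH):
    """Split the byte range [addr, addr + size) into consecutive parts
    of at most bus_width bytes each. Returns (part_addr, part_size) tuples."""
    if size <= 0 or bus_width <= 0:
        raise ValueError("size and bus_width must be positive")
    return [
        (addr + offset, min(bus_width, size - offset))
        for offset in range(0, size, bus_width)
    ]


def query_part_latencies(addr, size, query_latency, bus_width=MEM_BUS_WIDTH):
    """Ask the DRAM backend (modelled by the hypothetical query_latency
    callback) for the latency of each part, one query per part."""
    return [query_latency(a, n) for a, n in split_request(addr, size, bus_width)]
```

For example, a 64-byte cache line with an 8-byte bus yields eight queries, while a 20-byte request yields two full parts and one 4-byte remainder.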
- Fixed
- Rounding mode (`rm`) is recalculated before executing an FP instruction during simulation
- Added
- Comprehensive logging support
- Command-line option `-sim-file-path` to specify a top-level directory to store statistics and log files
- Command-line option `-sim-file-prefix` to specify the prefix added to all simulator-generated files
- Command-line option `-sim-emulate-after-icount` to specify the number of instructions to simulate after starting simulation mode
- DRAMsim3 support
- Ramulator support
- Sample MARSS-RISCV configuration files for 64-bit RISC-V in-order and out-of-order SoCs in the `configs` folder
- More performance counters to count different types of load instructions (byte, half-word, word, double-word)
- Timestamp on all the statistics files generated by the simulator
- Specify latency in CPU cycles for RISC-V `SYSTEM` class instructions in the config file
- Counter to track the number of CPU pipeline flushes
- Counters to track each type of software exception and hardware interrupt processed during simulation
- Parallel build support for Makefile
- During the simulation, `mtime` is calculated using simulation clock cycles
- Specify frequency for CPU and RTC device via the config file
- Add option `flush_on_context_switch` in the config file to enable/disable flushing of BPU on a context switch
- Start fetching the target from the next cycle on branch misprediction
- Loads of non-word quantities (byte and half-word) take one extra cycle on a cache hit
- Add function to invalidate entries in `mem_request_queue` on the mis-speculated path
- Changed
- Refactor and modularize the simulator codebase
- STORE-type instructions submit a write request to the L1 data cache and exit the memory stage in a single cycle
- Delay for reading/writing page-table entries is now simulated via the L1 data cache
- Print IPC for all the RISC-V CPU modes to the console and log file after the simulation completes
- In-order core doesn't support parallel execution in multiple functional units
- Replace hot-cold LRU eviction policy with bit-PLRU eviction policy for BTB and caches
- Improve the format of TinyEMU config file
- Update MARSS-RISCV Docs
- Update README.md
- Page walk delays are simulated via L1 D$
- Removed DRAMsim2 support
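
The bit-PLRU eviction policy named in the list above can be illustrated with the following minimal sketch of the textbook scheme for a single set. It is not taken from the simulator's source: the class and method names are hypothetical, and choosing the lowest-indexed way as the victim is one common convention, assumed here.

```python
class BitPLRUSet:
    """One cache (or BTB) set using bit-PLRU: one MRU bit per way."""

    def __init__(self, num_ways):
        if num_ways < 2:
            raise ValueError("bit-PLRU needs at least two ways")
        self.mru = [False] * num_ways

    def touch(self, way):
        """Mark a way as recently used. If that would set every bit,
        clear all the others so a victim candidate always exists."""
        self.mru[way] = True
        if all(self.mru):
            self.mru = [False] * len(self.mru)
            self.mru[way] = True

    def victim(self):
        """Return the lowest-indexed way whose MRU bit is clear
        (assumed tie-break rule)."""
        return self.mru.index(False)
```

Compared with true LRU, this needs only one bit of state per way, at the cost of an approximate ordering.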
- Fixed
- Memory leaks
- Don't start simulating DRAM access delay until cache lookup delay is simulated
- A branch entry is added to the BTB only after the branch is resolved
- Added
- Print TLB stats to the terminal after the simulation completes
- Specify latency for each FPU ALU instruction (`fadd`, `fsub`, `fmul`, `fdiv`, `fmin`, `fmax`, `fcvt`, `cvt`, `fle`, `flt`, `feq`, `fsgnj`, `fsqrt`, `fmv`, `fclass`) via TinyEMU config file
- Figure showing the high-level overview of MARSS-RISCV in README.md
- Changed
- Simplify the base DRAM model
  - All memory accesses simulate a fixed latency `mem_access_latency`
  - Any subsequent access to the same physical page incurs a lower delay, roughly 60 percent of the fixed `mem_access_latency`
  - More info here
- Parallel operation of functional units can be enabled or disabled in the in-order core via TinyEMU config file
- Clean exception handling code
- Simulate page table entry read/write delays directly via memory controller using a configurable fixed latency `pte_rw_latency`
- Don't stall the pipeline stage for the write request to complete on the memory controller
- Make FPU-ALU non-pipelined
- Rename `dram_dispatch_queue` to `mem_request_queue`
- Update MARSS-RISCV Docs
- Update README.md
- Update TinyEMU config file here
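
The simplified base DRAM model described in the Changed list above can be sketched as follows. This is a hedged illustration rather than the simulator's implementation. It assumes a 4 KiB physical page, assumes that "subsequent access" means an access to the most recently accessed page, and uses integer rounding for the 60 percent figure. Only `mem_access_latency` comes from the changelog; every other name is hypothetical.

```python
PAGE_SIZE = 4096  # assumed physical page size in bytes


class BaseDramModel:
    """Fixed-latency DRAM model with a cheaper same-page access."""

    def __init__(self, mem_access_latency, same_page_percent=60):
        self.mem_access_latency = mem_access_latency
        # Roughly 60 percent of the fixed latency, never below one cycle.
        self.same_page_latency = max(
            1, mem_access_latency * same_page_percent // 100
        )
        self.last_page = None  # page number of the previous access

    def access(self, paddr):
        """Return the simulated latency in cycles for a physical address."""
        page = paddr // PAGE_SIZE
        if page == self.last_page:
            latency = self.same_page_latency
        else:
            latency = self.mem_access_latency
        self.last_page = page
        return latency
```

With `mem_access_latency` set to 50, a first access to a page costs 50 cycles, an immediately following access to the same page costs 30, and moving to a different page costs 50 again.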
- Fixed
- Memory leaks
- Added
- Support for separate RISC-V Bios and Kernel
- Command-line option `flush-sim-mem` to flush simulator memory hierarchy on every fresh simulation run
- Command-line option `sim-trace` to generate instruction commit trace during simulation
- Distinct configurable read-hit and write-hit latency for all the caches
- Return address stack (RAS)
- Branch prediction and speculative execution support for the out-of-order core
- Print performance counters on terminal when the simulation completes
- More performance counters:
  - Instruction types
  - ecall
  - page walks for loads, stores and instructions
  - memory controller delay for data and instructions
  - hardware interrupts
- Changed
- Port to TinyEMU version `2019-12-21`
- For bimodal branch predictor, store prediction bits in a separate Branch history table (BHT)
- For in-order core, non-memory instructions can forward their result from MEM stage in addition to EX stage
- For in-order core, relaxed interlocking on WAW data hazard
- Simplified out-of-order core design: ROB slots are now used as physical registers, along with a single rename table and a single global issue queue
- Fixed
- Correctly calculate the rounding mode when decoding floating-point instructions
- Converted `c.addiw` result buffer into `int32_t` on 64-bit simulation
- Set the data type to `uint64_t` for 64-bit simulation, for the buffer which holds the memory address for atomic instructions
- Issue #13 and #14 (thanks to Okhotnikov Grigory)
- Added
- DRAMSim2 support
- Changed
- Flush all the CPU caches and DRAM models for every new simulation run
- Fixed
- Issue #8: useless cleaning of local variables
- Added
- Add 16550A UART support (thanks to Marc Gauthier)
- Add a timestamp suffix to the stats file
- Changed
- Reworked the DRAM latency parameters to match the SiFive HiFive U540 board
- Increased the DRAM dispatch queue size from 32 to 64
- Fixed
- Calculation of hardware page walk latency
- Miscalculation in page fault counters
- Issue #2: memory leaks in copy_file
- Issue #3: 'log' used instead of 'log2'