Tage #443

Open: wants to merge 73 commits into base: dev

Showing changes from 59 of 73 commits:
ff665cf
Rebasing to dev
ABenC377 Nov 1, 2024
e81673f
Rebasing to dev
ABenC377 Nov 1, 2024
40e7709
Rebasing to dev
ABenC377 Nov 1, 2024
6fa281d
Addressing superficial comments on PR
ABenC377 Mar 5, 2024
31c871e
Clang format
ABenC377 Mar 11, 2024
110c1c6
Adding more detail to virtual flush and update functions re order of …
ABenC377 Apr 30, 2024
4b3617c
Moving buffer branch flush functionality from core.cc to PipelineBuff…
ABenC377 Apr 30, 2024
a55e292
Rebasing to dev
ABenC377 Nov 1, 2024
17c9baf
Rebasing to dev
ABenC377 Nov 1, 2024
508a2f4
Rebasing to dev
ABenC377 Nov 1, 2024
e0f8121
Rebasing to dev
ABenC377 Nov 1, 2024
af8d1a0
Rebasing to dev
ABenC377 Nov 1, 2024
f9089e0
Rebasing to dev
ABenC377 Nov 1, 2024
3e5b507
Rebasing to dev
ABenC377 Nov 1, 2024
0445478
Rebasing to dev
ABenC377 Nov 1, 2024
f49e538
Rebasing to dev
ABenC377 Nov 1, 2024
e688a05
Rebasing to dev
ABenC377 Nov 1, 2024
1f925ea
Rebasing to dev
ABenC377 Nov 1, 2024
52f9688
clang format
ABenC377 May 7, 2024
6a286d3
Rebasing to dev
ABenC377 Nov 1, 2024
d0cc56a
undoing last push
ABenC377 May 13, 2024
1c1b6ce
Updating haeders and comments
ABenC377 May 24, 2024
1b800c1
Rebasing to dev
ABenC377 Nov 1, 2024
3aa7ca0
replacing = with ==
ABenC377 Jun 4, 2024
e525016
Rebasing to dev
ABenC377 Nov 1, 2024
673fe87
clang format
ABenC377 Jul 18, 2024
ace2d59
Rebasing to dev
ABenC377 Nov 1, 2024
416cc20
Rebasing to dev
ABenC377 Nov 1, 2024
92e67a8
Rebasing to dev
ABenC377 Nov 1, 2024
f051277
Rebasing to dev
ABenC377 Nov 1, 2024
2a957cb
Rebasing to dev
ABenC377 Nov 1, 2024
0980811
Rebasing to dev
ABenC377 Nov 1, 2024
e5f52eb
undoing last push
ABenC377 May 13, 2024
37a1c34
Updating haeders and comments
ABenC377 May 24, 2024
e297e68
Rebasing to dev
ABenC377 Nov 1, 2024
6dfa36c
clang format
ABenC377 Jul 18, 2024
4155ffc
Rebasing to dev
ABenC377 Nov 1, 2024
601178b
Rebasing to dev
ABenC377 Nov 1, 2024
5031d15
Rebasing to dev
ABenC377 Nov 1, 2024
4ad630c
Rebasing to dev
ABenC377 Nov 1, 2024
d2a651b
Rebasing to dev
ABenC377 Nov 5, 2024
111a48f
Rebasing
ABenC377 Nov 14, 2024
c661294
Adding Tage config file
ABenC377 Nov 14, 2024
e8683c1
Merging in dev
ABenC377 Dec 5, 2024
00d198a
Changes to config to allow parameterisation
ABenC377 Dec 5, 2024
9f144e1
Making TAGE paramterisable
ABenC377 Dec 9, 2024
6f73f4e
Specifying size of constant 1 throughout
ABenC377 Dec 9, 2024
03d807d
Updating default A64FX config branch predictor
ABenC377 Dec 9, 2024
9afa83c
Cleaning up BranchHistory.hh comments
ABenC377 Dec 9, 2024
79a5c7f
Adding to documentation
ABenC377 Dec 9, 2024
e11742d
Adding to documentation
ABenC377 Dec 9, 2024
729de42
Adding tests
ABenC377 Dec 9, 2024
922ba4b
Actually adding the test file
ABenC377 Dec 9, 2024
dd3053a
Adjusting comments
ABenC377 Dec 9, 2024
4e1717d
clang format
ABenC377 Dec 9, 2024
bff5fa8
Finessing
ABenC377 Dec 9, 2024
22756e6
Adding include to BranchHistory.hh
ABenC377 Dec 10, 2024
eba5447
Turning around Finn's comments
ABenC377 Dec 10, 2024
768db53
Capitalising a comment
ABenC377 Dec 14, 2024
14789a6
Turning vectors for indices and tags in the ftq into shared_ptrs of a…
ABenC377 Dec 17, 2024
7dcbb16
predTable from uint8_t to int8_t
ABenC377 Dec 17, 2024
c6dc5b5
updating how predTable is handled so that btb is -1, rather than 0 (i…
ABenC377 Dec 17, 2024
f9da602
Correcting tests after optimisation
ABenC377 Dec 17, 2024
0ecdd6b
TAGE->Tage in the documentation
ABenC377 Dec 17, 2024
57f3575
Adding Tage to TX2 config file
ABenC377 Dec 17, 2024
674672f
Correcting typos in comments
ABenC377 Dec 17, 2024
800ce6f
Merge branch 'dev' into TAGE
ABenC377 Dec 17, 2024
b23429f
Adding Tage to a64fx_SME.yaml
ABenC377 Dec 17, 2024
ac0d2cf
Adjusting comments
ABenC377 Dec 17, 2024
f9ecd29
Clang format
ABenC377 Dec 17, 2024
f11661d
Merge branch 'dev' into TAGE
ABenC377 Dec 18, 2024
e00ec65
Merge branch 'dev' into TAGE
ABenC377 Dec 20, 2024
d87cc5d
Updating comments and docs in response to PR comments
ABenC377 Dec 30, 2024
7 changes: 6 additions & 1 deletion configs/a64fx.yaml
@@ -29,10 +29,15 @@ Queue-Sizes:
Load: 40
Store: 24
Branch-Predictor:
Contributor

Some TX2 diagrams note its use of a multi-history branch predictor. I assume this is TAGE-like, so maybe apply this config update to the TX2 YAML as well?

Contributor Author

Yes, that sounds like it would be. I've updated the TX2 config as well.

- Type: "Perceptron"
+ Type: "Tage"
BTB-Tag-Bits: 11
Saturating-Count-Bits: 2
Global-History-Length: 19
RAS-entries: 8
Fallback-Static-Predictor: "Always-Taken"
Tage-Table-Bits: 12
Num-Tage-Tables: 6
Tag-Length: 8
L1-Data-Memory:
Interface-Type: Fixed
L1-Instruction-Memory:
19 changes: 17 additions & 2 deletions docs/sphinx/developer/components/branchPred.rst
@@ -21,7 +21,6 @@ The state of the branch predictor when ``predict`` is called on a branch is stor

Generic Predictor
-----------------

The algorithm(s) held within a ``BranchPredictor`` class instance can be model-specific, however, SimEng provides a ``GenericPredictor`` which contains the following logic.

Global History
@@ -53,4 +52,20 @@ Branch Target Buffer (BTB)
If the supplied branch type is ``Unconditional``, then the predicted direction is overridden to be taken. If the supplied branch type is ``Conditional`` and the predicted direction is not taken, then the predicted target is overridden to be the next sequential instruction.

Return Address Stack (RAS)
- Identified through the supplied branch type, Return instructions pop values off of the RAS to get their branch target whilst Branch-and-Link instructions push values onto the RAS, for later use by the Branch-and-Link instruction's corresponding Return instruction.
+ Identified through the supplied branch type, Return instructions pop values off of the RAS to get their branch target whilst Branch-and-Link instructions push values onto the RAS, for later use by the Branch-and-Link instruction's corresponding Return instruction.

TAGE Predictor
--------------------
The ``TagePredictor`` is a TAGE predictor of the type described in https://inria.hal.science/hal-03408381/document. Unlike ``GenericPredictor`` and ``PerceptronPredictor``, this predictor uses a series of prediction tables, each of which uses an increasing global history size. E.g., the default prediction table will be indexed by the address itself, then the following tables will use global histories of length 2, 4, 8, 16, ....

Tagged prediction tables
The prediction returned from this branch predictor will be that determined by the table with the largest global history that has an entry corresponding to the given branch. To determine whether or not a table entry corresponds to the present branch or not, a hash is made from the branch's address and the global history. Each table entry has a usefulness counter which is updated when the prediction differs from the next-best prediction. On incorrect prediction, if possible, replace a non-useful entry in a table with more global history.

Default prediction table
In addition to the tagged tables, there is a non-tagged default prediction table that is used as a fall-back in the event that none of the tagged tables have an entry corresponding to a given branch. This table is much like the BTB in the ``GenericPredictor``, except that the index is determined from the truncated address only (i.e., it does not depend on the global history at all).

Global History
To accomodate larger numbers of tagged tables, global histories of greater than 64 bits are needed. Therefore, ``TagePredictor`` incorporates a new ``BranchHistory`` structure that allows global histories of unlimited size to be kept and accessed.

Return Address Stack (RAS)
Identified through the supplied branch type, Return instructions pop values off of the RAS to get their branch target whilst Branch-and-Link instructions push values onto the RAS, for later use by the Branch-and-Link instruction's corresponding Return instruction.
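
The lookup rule described in the added documentation (use the matching tagged table with the longest history, otherwise fall back to the default table) can be sketched as below. This is an illustration only and is not taken from the PR: `TaggedEntry`, `Lookup`, `historyLength`, and `selectProvider` are hypothetical names, the container layout is assumed, and the history lengths simply follow the 2, 4, 8, 16, ... series quoted in the docs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical entry in a tagged table (illustrative, not a SimEng type).
struct TaggedEntry {
  uint64_t tag = 0;
  bool valid = false;
  bool taken = false;
};

// Result of hashing a branch for one tagged table: slot index and tag.
struct Lookup {
  uint64_t index;
  uint64_t tag;
};

// History length used by tagged table `table` (0-based), assuming the
// 2, 4, 8, 16, ... series mentioned in the documentation.
inline uint64_t historyLength(uint8_t table) { return 1ull << (table + 1); }

// Returns the number of the matching table with the longest history, or
// std::nullopt when no tagged table matches and the default (untagged)
// table should supply the prediction.
inline std::optional<uint8_t> selectProvider(
    const std::vector<std::vector<TaggedEntry>>& tables,
    const std::vector<Lookup>& lookups) {
  // Walk from the longest-history table down to the shortest.
  for (std::size_t i = tables.size(); i-- > 0;) {
    const TaggedEntry& entry = tables[i][lookups[i].index];
    if (entry.valid && entry.tag == lookups[i].tag) {
      return static_cast<uint8_t>(i);
    }
  }
  return std::nullopt;
}
```

Returning an empty optional keeps the fall-back to the default table explicit at the call site, rather than encoding it as a magic table number.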
15 changes: 12 additions & 3 deletions docs/sphinx/user/configuring_simeng.rst
@@ -149,13 +149,13 @@ The Branch-Prediction section contains those options to parameterise the branch
The current options include:

Type
- The type of branch predictor that is used, the options are ``Generic``, and ``Perceptron``. Both types of predictor use a branch target buffer with each entry containing a direction prediction mechanism and a target address. The direction predictor used in ``Generic`` is a saturating counter, and in ``Perceptron`` it is a perceptron.
+ The type of branch predictor that is used, the options are ``Generic``, ``Perceptron``, and ``Tage``. Each of these types of predictor use prediction tables with each entry containing a direction prediction mechanism and a target address. The direction predictor used in ``Generic`` and ``TAGE`` is a saturating counter, and in ``Perceptron`` it is a perceptron. ``TAGE`` also uses a series of further, tagged prediction tables to provide predictions informed by greater branch histories.
Contributor

Is there a good reason behind using both Tage and TAGE?

Contributor Author

There is not. I've updated to Tage throughout, as this is the capitalisation used in the config YAML.

Contributor

Seems like the creator uses all forms of capitalisation


BTB-Tag-Bits
The number of bits used to index the entries in the Branch Target Buffer (BTB). The number of entries in the BTB is obtained from the calculation: 1 << ``bits``. For example, a ``bits`` value of 12 would result in a BTB with 4096 entries.

Saturating-Count-Bits
- Only needed for a ``Generic`` predictor. The number of bits used in the saturating counter value.
+ Only needed for ``Generic`` and ``Tage`` predictors. The number of bits used in the saturating counter value.

Global-History-Length
The number of bits used to record the global history of branch directions. Each bit represents one branch direction. For ``PerceptronPredictor``, this dictates the size of the perceptrons (with each perceptron having Global-History-Length + 1 weights).
@@ -164,7 +164,16 @@ RAS-entries
The number of entries in the Return Address Stack (RAS).

Fallback-Static-Predictor
- Only needed for a ``Generic`` predictor. The static predictor used when no dynamic prediction is available. The options are either ``"Always-Taken"`` or ``"Always-Not-Taken"``.
+ Only needed for ``Generic`` and ``Tage`` predictors. The static predictor used when no dynamic prediction is available. The options are either ``"Always-Taken"`` or ``"Always-Not-Taken"``.

Tage-Table-Bits
Only needed for a ``Tage`` predictor. The number of bits used to index entries in the tagged tables. The number of entries in each of the tagged tables is obtained from the calculation: 1 << ``bits``. For examples, a ``bits`` value of 12 would result in tagged tables with 4096 entries.

Num-Tage-Tables
Only needed for a ``Tage`` predictor. The number of tagged tables used by the predictor, in addition to a default prediction table (i.e., the BTB). Therefore, a value of 3 for ``Num-Tage-Tables`` would result in four total prediction tables: one BTB and three tagged tables. If no tagged tables are desired, it is recommended to use the ``GenericPredictor`` instead.

Tage-Length
Only needed for a ``Tage`` predictor. The number of bits used to tage the entries of the tagged tables.
FinnWilkinson marked this conversation as resolved.
Contributor

Is the "tage" in the latter sentence meant to be that, or rather "tag"?


.. _l1dcnf:
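
As a worked example of the sizing rules in the option descriptions above (entries per table = 1 << ``bits``, total tables = ``Num-Tage-Tables`` + 1), the arithmetic can be sketched as follows. `TageOptions` and the helper functions are hypothetical names used only for this illustration; they are not SimEng types, and the values in the usage note are the ones from the `configs/a64fx.yaml` hunk in this PR.

```cpp
#include <cstdint>

// Hypothetical holder for the sizing-related options (not a SimEng type).
struct TageOptions {
  uint8_t btbTagBits;     // BTB-Tag-Bits
  uint8_t tageTableBits;  // Tage-Table-Bits
  uint8_t numTageTables;  // Num-Tage-Tables
};

// Entries in the default (untagged) table, i.e. the BTB: 1 << bits.
inline uint64_t btbEntries(const TageOptions& o) {
  return 1ull << o.btbTagBits;
}

// Entries in each tagged table: 1 << bits.
inline uint64_t entriesPerTaggedTable(const TageOptions& o) {
  return 1ull << o.tageTableBits;
}

// Tagged tables plus the one default table.
inline uint64_t totalTables(const TageOptions& o) {
  return o.numTageTables + 1ull;
}

// Total prediction-table entries across the BTB and all tagged tables.
inline uint64_t totalEntries(const TageOptions& o) {
  return btbEntries(o) + o.numTageTables * entriesPerTaggedTable(o);
}
```

With the a64fx values (`BTB-Tag-Bits: 11`, `Tage-Table-Bits: 12`, `Num-Tage-Tables: 6`) this gives a 2048-entry BTB, six tagged tables of 4096 entries each, and seven tables in total.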

1 change: 1 addition & 0 deletions src/include/simeng/CoreInstance.hh
@@ -11,6 +11,7 @@
#include "simeng/branchpredictors/AlwaysNotTakenPredictor.hh"
#include "simeng/branchpredictors/GenericPredictor.hh"
#include "simeng/branchpredictors/PerceptronPredictor.hh"
#include "simeng/branchpredictors/TagePredictor.hh"
#include "simeng/config/SimInfo.hh"
#include "simeng/kernel/Linux.hh"
#include "simeng/memory/FixedLatencyMemoryInterface.hh"
2 changes: 1 addition & 1 deletion src/include/simeng/Instruction.hh
@@ -29,7 +29,7 @@ struct ExecutionInfo {
* Each supported ISA should provide a derived implementation of this class. */
class Instruction {
public:
- virtual ~Instruction(){};
+ virtual ~Instruction() {};

/** Retrieve the source registers this instruction reads. */
virtual const span<Register> getSourceRegisters() const = 0;
2 changes: 1 addition & 1 deletion src/include/simeng/arch/ArchInfo.hh
@@ -12,7 +12,7 @@ namespace arch {
/** A class to hold and generate architecture specific configuration options. */
class ArchInfo {
public:
- virtual ~ArchInfo(){};
+ virtual ~ArchInfo() {};

/** Get the set of system register enums currently supported. */
virtual const std::vector<uint64_t>& getSysRegEnums() const = 0;
4 changes: 2 additions & 2 deletions src/include/simeng/arch/Architecture.hh
@@ -30,7 +30,7 @@ struct ExceptionResult {
* cycle until complete. */
class ExceptionHandler {
public:
- virtual ~ExceptionHandler(){};
+ virtual ~ExceptionHandler() {};
/** Tick the exception handler to progress handling of the exception. Should
* return `false` if the exception requires further handling, or `true` once
* complete. */
@@ -46,7 +46,7 @@ class Architecture {
public:
Architecture(kernel::Linux& kernel) : linux_(kernel) {}

- virtual ~Architecture(){};
+ virtual ~Architecture() {};

/** Attempt to pre-decode from `bytesAvailable` bytes of instruction memory.
* Writes into the supplied macro-op vector, and returns the number of bytes
@@ -27,4 +27,4 @@ class AlwaysNotTakenPredictor : public BranchPredictor {
private:
};

- } // namespace simeng
+ } // namespace simeng
114 changes: 114 additions & 0 deletions src/include/simeng/branchpredictors/BranchHistory.hh
@@ -0,0 +1,114 @@
#pragma once

#include <cstdint>
#include <memory>

namespace simeng {
/** A class for storing a branch history. Needed for cases where a branch
* history of more than 64 bits is required. This class makes it easier to
* access and manipulate large branch histories, as are needed in
* sophisticated branch predictors.
*
* The bits of the branch history are stored in a vector of uint64_t values,
Contributor

"vector" should be "array"

* and their access/manipulation is facilitated by the public functions. */

class BranchHistory {
public:
BranchHistory(uint64_t size) : size_(size) {
history_ = std::make_unique<uint64_t[]>(size_);
}

~BranchHistory() {};
FinnWilkinson marked this conversation as resolved.

/** Returns the 'numBits' most recent bits of the branch history. Maximum
* number of bits returnable is 64 to allow it to be provided in a 64-bit
* integer. */
uint64_t getHistory(uint8_t numBits) {
assert(numBits <= 64 && "Cannot get more than 64 bits without rolling");
assert(numBits <= size_ &&
"Cannot get more bits of branch history than "
"the size of the history");
return (history_[0] & ((1ull << numBits) - 1));
}

/** Returns 'numBits' of the global history folded over on itself to get a
* value of size 'length'. The global history is folded by taking an
* XOR hash with the overflowing bits to get an output of 'length' bits. */
uint64_t getFolded(uint8_t numBits, uint8_t length) {
assert(numBits <= size_ &&
"Cannot get more bits of branch history than "
"the size of the history");
uint64_t output = 0;

uint64_t startIndex = 0;
uint64_t endIndex = numBits - 1;

while (startIndex <= numBits) {
output ^= ((history_[startIndex / 64] >> startIndex) &
((1ull << (numBits - startIndex)) - 1));

// Check to see if a second uint64_t value will need to be accessed
if ((startIndex / 64) == (endIndex / 64)) {
uint8_t leftOverBits = endIndex % 64;
output ^= (history_[endIndex / 64] << (numBits - leftOverBits));
}
startIndex += length;
endIndex += length;
}

// Trim the output to the desired size
output &= (1 << length) - 1;
return output;
}

/** Adds a branch outcome ('isTaken') to the global history */
void addHistory(bool isTaken) {
for (int8_t i = size_ / 64; i >= 0; i--) {
history_[i] <<= 1;
if (i == 0) {
history_[i] |= ((isTaken) ? 1 : 0);
} else {
history_[i] |= (((history_[i - 1] & (1ull << 63)) > 0) ? 1 : 0);
Contributor

Does this need the conditional statement? After doing the AND you could shift right by 63 to get your 0 or 1. Would be slightly fewer cycles and more understandable/readable in my eyes (you may disagree)

Contributor Author

ABenC377 Dec 30, 2024

I think the conditional is needed here. What's being loaded into each uint64 depends on where it is in the vector. All but the least-significant uint64 get the MSB of the next uint64 added as the LSB, but the least-significant uint64 gets isTaken added as the LSB. However, if I'm misunderstanding your question, let me know.

}
}
}

/** Updates the state of a branch that has already been added to the global
* history at 'position', where 'position' is 0-indexed and starts from the
* most recent history. I.e., to update the most recently added branch
* outcome, 'position' would be 0.
* */
void updateHistory(bool isTaken, uint64_t position) {
if (position < size_) {
Contributor

Should we assert position being < size_ as above, or are there cases where this could "validly" be greater? For instance, if you are trying to update an entry that has been lost from the history because there have been too many branches in the meantime?

Contributor Author

Exactly as you say, I don't think that this should be an assert as the core may validly try to update a history that is no longer being tracked. The reason that we should allow this is to allow the pipeline not to need to know the size of the branch history. We're already ensuring that this doesn't cause problems with our if statement on 82.

uint8_t vectIndex = position / 64;
uint8_t bitIndex = position % 64;
bool currentlyTaken = ((history_[vectIndex] & (1ull << bitIndex)) != 0);
if (currentlyTaken != isTaken) {
history_[vectIndex] ^= (1ull << bitIndex);
}
}
}

/** Removes the most recently added branch from the history */
void rollBack() {
for (uint8_t i = 0; i <= (size_ / 64); i++) {
history_[i] >>= 1;
if (i < (size_ / 64)) {
history_[i] |= (((history_[i + 1] & 1) > 0) ? (1ull << 63) : 0);
}
}
}

private:
/** The number of bits of branch history stored in this branch history */
uint64_t size_;

/** An array containing the bits of the branch history. The bits are
* arranged such that the most recent branches are stored in uint64_t at
* index 0 of the vector, then the next most recent at index 1 and so forth.
* Within each uint64_t, the most recent branches are recorded in the
* least-significant bits. */
std::unique_ptr<uint64_t[]> history_;
};

} // namespace simeng
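
For readers following the review thread on `addHistory`, the sketch below shows one way a multi-word history can shift, roll back, and fold. It is not the PR's implementation: `WideHistory` and its method names are hypothetical, it uses `std::vector` rather than `std::unique_ptr<uint64_t[]>`, it allocates `(bits + 63) / 64` words (an assumption of this sketch), and the fold is a plain bit-by-bit XOR written for clarity rather than speed. The carry between words uses the `>> 63` shift suggested in review, while word 0 is still handled separately because it receives the new outcome.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical multi-word branch history (illustrative only).
// Word 0 holds the most recent outcomes, with the newest in its LSB.
class WideHistory {
 public:
  explicit WideHistory(uint64_t bits)
      : bits_(bits), words_((bits + 63) / 64, 0) {
    assert(bits > 0 && "History must hold at least one bit");
  }

  // Shift every word left by one, carrying each word's MSB into the LSB
  // of the next (older) word, then record the new outcome in word 0.
  void push(bool taken) {
    for (std::size_t i = words_.size() - 1; i > 0; i--) {
      words_[i] = (words_[i] << 1) | (words_[i - 1] >> 63);
    }
    words_[0] = (words_[0] << 1) | (taken ? 1ull : 0ull);
  }

  // Undo the most recent push by shifting everything right by one.
  void rollBack() {
    for (std::size_t i = 0; i < words_.size(); i++) {
      words_[i] >>= 1;
      if (i + 1 < words_.size()) words_[i] |= (words_[i + 1] & 1ull) << 63;
    }
  }

  // The `n` most recent outcomes (n <= 64), newest in the LSB.
  uint64_t recent(uint8_t n) const {
    assert(n <= 64 && n <= bits_);
    return (n == 64) ? words_[0] : (words_[0] & ((1ull << n) - 1));
  }

  // XOR-fold the `numBits` most recent outcomes into `length` bits.
  uint64_t fold(uint64_t numBits, uint8_t length) const {
    assert(length >= 1 && length <= 64 && numBits <= bits_);
    uint64_t out = 0;
    for (uint64_t i = 0; i < numBits; i++) {
      uint64_t bit = (words_[i / 64] >> (i % 64)) & 1ull;
      out ^= bit << (i % length);
    }
    return out;
  }

 private:
  uint64_t bits_;
  std::vector<uint64_t> words_;
};
```

Walking the words from oldest to newest in `push` means each carry is read before the word that supplies it is shifted, and `rollBack` walks in the opposite direction for the same reason.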
2 changes: 1 addition & 1 deletion src/include/simeng/branchpredictors/BranchPredictor.hh
@@ -12,7 +12,7 @@ namespace simeng {
/** An abstract branch predictor interface. */
class BranchPredictor {
public:
- virtual ~BranchPredictor(){};
+ virtual ~BranchPredictor() {};

/** Generate a branch prediction for the supplied instruction address, a
* branch type, and a known branch offset. Returns a branch direction and
2 changes: 1 addition & 1 deletion src/include/simeng/branchpredictors/GenericPredictor.hh
@@ -86,4 +86,4 @@ class GenericPredictor : public BranchPredictor {
uint16_t rasSize_;
};

- } // namespace simeng
+ } // namespace simeng
2 changes: 1 addition & 1 deletion src/include/simeng/branchpredictors/PerceptronPredictor.hh
@@ -102,4 +102,4 @@ class PerceptronPredictor : public BranchPredictor {
uint64_t rasSize_;
};

- } // namespace simeng
+ } // namespace simeng