[AArch64] NEON, SVE2 and SME2 instruction support with tests #439

Open
wants to merge 71 commits into
base: dev
71 commits
51ade58
Fixed execution logic for UMINP and UMAXP neon instructions.
FinnWilkinson Oct 7, 2024
6a11d7d
Implemented ldrsb (32-bit, Post) instruction with test.
FinnWilkinson Oct 7, 2024
520324c
Fixed implementation of NEON CMHS instruction.
FinnWilkinson Oct 8, 2024
2b4a886
Implemented UCVTF (fixed-point to float) instruction with test.
FinnWilkinson Oct 9, 2024
e43ada7
Implemented UCVTF (fixed-point to float) helper function.
FinnWilkinson Oct 9, 2024
4773af8
Implemented UDOT (by element) NEON instructions with tests.
FinnWilkinson Oct 10, 2024
50a8a20
Implemented LD1 (NEON 8h x2, post index) instruction with tests.
FinnWilkinson Oct 10, 2024
6696d5f
Implemented NEON UMLAL (32 to 64 bit) instruction with tests.
FinnWilkinson Oct 10, 2024
bb5096a
Implemented NEON UMLAL2 (32 to 64 bit) instruction with tests.
FinnWilkinson Oct 10, 2024
09d6506
Implemented NEON ST1 (single vector, post index) instruction with tests.
FinnWilkinson Oct 11, 2024
f6e7c03
Implemented NEON LD1 (single vector, post index, 8b) instruction with…
FinnWilkinson Oct 11, 2024
74e9b47
Implemented SVE LD1RQB (imm offset) instruction with tests.
FinnWilkinson Oct 14, 2024
4daf705
Implemented SVE LD1RQB (reg offset) instruction with tests.
FinnWilkinson Oct 14, 2024
810a324
Implemented SVE UDOT (4-way, indexed) instruction and tests.
FinnWilkinson Oct 14, 2024
2db08ae
Implemented SVE ZIP1+2 (byte) instructions and tests.
FinnWilkinson Oct 14, 2024
7ac89e8
Implemented SVE faddv (float and double) instructions and tests.
FinnWilkinson Oct 14, 2024
bb73761
Implemented SVE PTRUE (as counter) instructions with tests.
FinnWilkinson Oct 14, 2024
9febab0
Added paciasp and autiasp empty execution logic.
FinnWilkinson Oct 24, 2024
b45d8d7
Implemented NEON UMULL (uint16 to uint32) instruction and tests.
FinnWilkinson Oct 25, 2024
6383d98
Implemented RDSVL and tests.
FinnWilkinson Oct 25, 2024
6416237
Implemented ZERO {zt0} instruction with test.
FinnWilkinson Oct 25, 2024
9a3dc35
Implemented ld1d (4 consec vecs, uint64) SVE instruction with tests, …
FinnWilkinson Oct 25, 2024
4873657
Implemented ld1d (2 consec vecs, uint64) SVE instruction with tests.
FinnWilkinson Oct 25, 2024
b82ec90
Implemented SME mova (tile to vec, 4 regs, 8-bit) instruction with te…
FinnWilkinson Oct 28, 2024
89d7501
Implemented pred-as-counter to pred_as_mask function, and added unit …
FinnWilkinson Oct 28, 2024
ad5bd87
Implemented st1d (2 consec vecs, uint64) SVE2 instruction with tests.
FinnWilkinson Oct 28, 2024
9bf115a
Implemented st1d (2 consec vecs, uint64, scalar offset) SVE2 instruct…
FinnWilkinson Oct 28, 2024
ff8bb58
Implemented LD1W (2 vec and 4 vec, imm offset) SVE2 instructions with…
FinnWilkinson Oct 29, 2024
c40e9f4
Implemented LD1W (2 vec, scalar offset) SVE2 instruction with tests.
FinnWilkinson Oct 29, 2024
5f4fd1c
Implemented ST1W (2 vec, imm and scalar offset) SVE2 instructions wit…
FinnWilkinson Oct 29, 2024
7a717e1
Implemented LD1B (2 vec, imm and scalar offset) SVE2 instructions wit…
FinnWilkinson Oct 29, 2024
6dca410
Implemented UMOPA (8-bit to 32-bit widening uint) SME instruction wit…
FinnWilkinson Oct 29, 2024
8b1f9e7
Implemented LD1B (4 vec, imm offset) SVE2 instruction with tests.
FinnWilkinson Oct 29, 2024
5325d3f
Implemented UDOT (4-way, VGx4 8-bit to 32-bit widening, indexed vecto…
FinnWilkinson Oct 30, 2024
7125a40
Implemented MOVA (array to vecs, 4 registers) SME instruction with te…
FinnWilkinson Oct 30, 2024
c6da568
Implemented ST1W (4 vec, imm offset) SVE2 instructions with tests.
FinnWilkinson Oct 30, 2024
7e2f9a4
Fixed SVE udot execution logic.
FinnWilkinson Oct 30, 2024
6772b66
Fixed issue with LD1B SVE2 (4 vec) instruction.
FinnWilkinson Oct 30, 2024
ab80ba7
Implemented FMLA (float, double, VGx4, indexed) SME instruction with …
FinnWilkinson Oct 31, 2024
9e762b8
Implemented st1d (4 consec vecs, uint64, imm offset) SVE2 instruction…
FinnWilkinson Oct 31, 2024
7de0082
Added NEON bf16 UDOT (by element) instruction execution logic and BF1…
FinnWilkinson Oct 31, 2024
14a79d8
Implemented ld1b (4 strided vectors, imm and reg offset) instructions…
FinnWilkinson Nov 1, 2024
2db03bc
Implemented UVDOT (VGx4 8-bit to 32-bit widening, indexed vector) SME…
FinnWilkinson Nov 1, 2024
68038b7
Implemented ST4W (imm offset) SVE instruction with tests.
FinnWilkinson Nov 1, 2024
4a8f3f6
Implemented LD1W (4 vec, scalar offset) SVE2 instruction with tests.
FinnWilkinson Nov 1, 2024
3d5b288
Implemented FMLA (float, VGx4) SME instruction with tests.
FinnWilkinson Nov 1, 2024
b9dcabe
Implemented MOVA (array to vecs, 2 registers) SME instruction with te…
FinnWilkinson Nov 4, 2024
b988e01
Implemented FADD (float, vgx2) SME instruction with tests.
FinnWilkinson Nov 4, 2024
4f75ffe
Implemented LD1D (4 vec, scalar offset) SVE2 instruction with tests.
FinnWilkinson Nov 4, 2024
f35472b
Implemented FMLA (double, VGx4) SME instruction with tests.
FinnWilkinson Nov 4, 2024
1bf3306
Implemented FADD (double, vgx2) SME instruction with tests.
FinnWilkinson Nov 4, 2024
4effde4
Implemented LD1H (Single vec, imm offset) SVE instruction with tests.
FinnWilkinson Nov 4, 2024
40bba12
Added SVE bf16 DOT (indexed) instruction execution logic.
FinnWilkinson Nov 4, 2024
3932360
Implemented LD1H (two vec, imm and scalar offset) SVE instruction wit…
FinnWilkinson Nov 4, 2024
5aad523
Implemented BFMOPA (widening) SME instruction.
FinnWilkinson Nov 4, 2024
430c775
Minor UMAXP fix.
FinnWilkinson Nov 4, 2024
a01c2fc
Fixed function comment.
FinnWilkinson Nov 4, 2024
9790c6e
Updated BF16 comment.
FinnWilkinson Nov 5, 2024
5bc9330
Implemented NEON UDOT (by vector) instruction with tests.
FinnWilkinson Nov 6, 2024
1fd130c
Implemented SVE UDOT (by vector, 4-way) instruction with tests.
FinnWilkinson Nov 6, 2024
81ddba7
Implemented SVE ST4W (scalar offset) instruction with tests, and chan…
FinnWilkinson Nov 6, 2024
4c99a0f
Implemented LD1B (4 vec, scalar offset) SVE2 instruction with tests.
FinnWilkinson Nov 6, 2024
0d74234
Implemented UDOT (4-way, VGx4 8-bit to 32-bit widening) SME instructi…
FinnWilkinson Nov 7, 2024
40a0fa4
Implemented ADD (uint32, vgx2, vectors and ZA), SME instruction with …
FinnWilkinson Nov 7, 2024
950de41
Implemented ZIP (4 vectors) SVE2 instruction with tests.
FinnWilkinson Nov 7, 2024
03a95e7
Attended PR comments.
FinnWilkinson Dec 10, 2024
6729363
Minor bug fixes.
FinnWilkinson Dec 13, 2024
850b741
Attended PR comments.
FinnWilkinson Dec 16, 2024
1d04096
Updated multi-vector load logic.
FinnWilkinson Dec 18, 2024
246d39a
CI CD fixes.
FinnWilkinson Dec 20, 2024
0ec0b8d
CI CD fixes pt2.
FinnWilkinson Dec 20, 2024
8 changes: 4 additions & 4 deletions CMakeLists.txt
@@ -116,6 +116,7 @@ option(SIMENG_SANITIZE "Enable compiler sanitizers" OFF)
option(SIMENG_OPTIMIZE "Enable Extra Compiler Optimizations" OFF)
option(SIMENG_ENABLE_SST "Compile SimEng SST Wrapper" OFF)
option(SIMENG_ENABLE_SST_TESTS "Enable testing for SST" OFF)
option(SIMENG_ENABLE_BF16 "Enable __bf16 instruction execution logic" OFF)

# Set CXX flag for Apple Mac so that `binary_function` and `unary_function` types that are used in SST can be recognised.
# They were deprecated in C++11 and removed in C++17, and Apple Clang v15 no longer supports these types without the following flag
@@ -155,10 +156,9 @@ if(SIMENG_ENABLE_TESTS)

# Print message containing if the full test suite will run
if (${LLVM_PACKAGE_VERSION} VERSION_LESS "14.0")
message(STATUS "LLVM version does not support AArch64 extensions SME or SVE2. These test suites will be skipped.")
endif()
if (${LLVM_PACKAGE_VERSION} VERSION_LESS "18.0")
message(STATUS "LLVM version does not support AArch64 extensions SME2. These test suites will be skipped.")
message(STATUS "LLVM version does not support AArch64 extensions SVE2, SVE2.1, SME, or SME2. Related tests will fail.")
ABenC377 marked this conversation as resolved.

Contributor:
Why can't we place preprocessor directives around the SME tests? I thought it was just an SVE vs SVE2 problem?

Contributor Author:
There is a similar problem with SME and SME2.

elseif (${LLVM_PACKAGE_VERSION} VERSION_LESS "18.0")
message(STATUS "LLVM version does not support AArch64 extensions SME2 or SVE2.1. Related tests will fail.")
endif()

else()
3 changes: 2 additions & 1 deletion src/include/simeng/arch/aarch64/ArchInfo.hh
@@ -18,7 +18,8 @@ class ArchInfo : public simeng::arch::ArchInfo {
aarch64_sysreg::AARCH64_SYSREG_MIDR_EL1,
aarch64_sysreg::AARCH64_SYSREG_CNTVCT_EL0,
aarch64_sysreg::AARCH64_SYSREG_PMCCNTR_EL0,
aarch64_sysreg::AARCH64_SYSREG_SVCR}),
aarch64_sysreg::AARCH64_SYSREG_SVCR,
aarch64_sysreg::AARCH64_SYSREG_TPIDR2_EL0}),
zaSize_(config["Core"]["Streaming-Vector-Length"].as<uint16_t>() / 8) {
// Generate the architecture-defined architectural register structure
archRegStruct_ = {
34 changes: 34 additions & 0 deletions src/include/simeng/arch/aarch64/Instruction.hh
@@ -283,6 +283,40 @@ enum class InsnType : uint32_t {
isBranch = 1 << 14
};

/** Convert Predicate-as-Counter to Predicate-as-Masks.
* T represents the element type (i.e. for pg.s, T = uint32_t).
* V represents the number of vectors the predicate-as-counter is being used
* for. */
template <typename T, int V>
std::vector<std::array<uint64_t, 4>> predAsCounterToMasks(
const uint64_t predAsCounter, const uint16_t VL_bits) {
std::vector<std::array<uint64_t, 4>> out(V, {0, 0, 0, 0});

const uint16_t elemsPerVec = VL_bits / (sizeof(T) * 8);
// Get predicate-as-counter information
const bool invert = (predAsCounter & 0b1000000000000000) != 0;
const uint64_t predElemCount =
(predAsCounter & static_cast<uint64_t>(0b0111111111111111)) >>
static_cast<uint8_t>(std::log2f(sizeof(T)) + 1);

for (int r = 0; r < V; r++) {
for (uint16_t i = 0; i < elemsPerVec; i++) {
// Move bit to next position based on element type
uint64_t shifted_active = 1ull << ((i % (64 / sizeof(T))) * sizeof(T));
// If invert = True (invert bit = 1), predElemCount dictates number of
// initial inactive elements.
// If invert = False (invert bit = 0), it indicates the number of initial
// active elements.
if (static_cast<uint64_t>(r * elemsPerVec) + i < predElemCount) {
out[r][i / (64 / sizeof(T))] |= (invert) ? 0 : shifted_active;
} else {
out[r][i / (64 / sizeof(T))] |= (invert) ? shifted_active : 0;
}
}
}
return out;
}

/** A basic Armv9.2-a implementation of the `Instruction` interface. */
class Instruction : public simeng::Instruction {
public:
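
A minimal standalone sketch of the predicate-as-counter expansion described in the `predAsCounterToMasks` comment, for readers who want to try the encoding outside SimEng. The name `counterToMasks`, the 16-bit `counter` parameter and the assertion on `vlBits` are assumptions made for illustration and are not part of the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone helper, not SimEng's API.
// T is the element type; V is the number of vectors the counter governs.
template <typename T, int V>
std::vector<std::array<uint64_t, 4>> counterToMasks(uint16_t counter,
                                                    uint16_t vlBits) {
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
                    sizeof(T) == 8,
                "T must be a 1, 2, 4 or 8 byte element type");
  // Assumption: vector lengths are multiples of 128 bits, up to 2048 bits.
  assert(vlBits >= 128 && vlBits <= 2048 && vlBits % 128 == 0);

  std::vector<std::array<uint64_t, 4>> out(
      V, std::array<uint64_t, 4>{0, 0, 0, 0});
  const std::size_t elemsPerVec = vlBits / (sizeof(T) * 8);
  const std::size_t elemsPerWord = 64 / sizeof(T);

  // The count field starts one bit above the element-size marker bit.
  unsigned shift = 1;
  for (std::size_t s = sizeof(T); s > 1; s >>= 1) shift++;
  const bool invert = (counter & 0x8000u) != 0;
  const std::size_t count = (counter & 0x7FFFu) >> shift;

  for (int r = 0; r < V; r++) {
    for (std::size_t i = 0; i < elemsPerVec; i++) {
      const bool leading = (r * elemsPerVec + i) < count;
      // Leading elements are active unless the invert bit flips the meaning.
      if (leading != invert) {
        out[r][i / elemsPerWord] |= 1ull << ((i % elemsPerWord) * sizeof(T));
      }
    }
  }
  return out;
}
```

For example, with 32-bit elements, two vectors and a 128-bit vector length, a counter of `(5 << 3) | 0b100` marks all four elements of the first vector and only the first element of the second as active.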
17 changes: 17 additions & 0 deletions src/include/simeng/arch/aarch64/helpers/float.hh
@@ -194,6 +194,23 @@ D fcvtzu_integer(srcValContainer& sourceValues) {
return result;
}

/** Helper function for SCALAR/FP instructions with the format `ucvtf rd, rn,
* #fbits`.
* D represents the destination register type (e.g. for Sd, D = float).
* N represents the source register type (e.g. for Xn, N = uint64_t).
* Returns single value of type D. */
template <typename D, typename N>
D ucvtf_fixedToFloat(
srcValContainer& sourceValues,
const simeng::arch::aarch64::InstructionMetadata& metadata) {
// Convert Fixed-Point to FP
// Using algorithm from
// https://embeddedartistry.com/blog/2018/07/12/simple-fixed-point-conversion-in-c/
const N xn = sourceValues[0].get<N>();
const N fbits = static_cast<N>(metadata.operands[2].imm);
return (static_cast<D>(xn) / static_cast<D>(1ull << fbits));
}

} // namespace aarch64
} // namespace arch
} // namespace simeng
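
A small sketch of the fixed-point to floating-point conversion used by `ucvtf_fixedToFloat`, written as a free function so it can be tried in isolation. The name `fixedToFloat` and the use of `std::ldexp` are illustrative choices rather than the PR's code; `std::ldexp` is swapped in here because, as I read it, `fbits` may equal the source width (64 for an X register), in which case `1ull << fbits` would be undefined behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: interprets `value` as an unsigned fixed-point number
// with `fbits` fractional bits and converts it to floating-point type D.
template <typename D, typename N>
D fixedToFloat(N value, unsigned fbits) {
  // Assumption: fbits lies in [1, width of N], as for UCVTF (fixed-point).
  assert(fbits >= 1 && fbits <= sizeof(N) * 8);
  // Scaling by a power of two via ldexp avoids shifting by the full width.
  return std::ldexp(static_cast<D>(value), -static_cast<int>(fbits));
}
```

For instance, `fixedToFloat<float, uint32_t>(24u, 4)` treats 24 as 24/16 and yields 1.5f.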
71 changes: 69 additions & 2 deletions src/include/simeng/arch/aarch64/helpers/neon.hh
@@ -568,9 +568,14 @@ RegisterValue vecUMaxP(srcValContainer& sourceValues) {
const T* n = sourceValues[0].getAsVector<T>();
const T* m = sourceValues[1].getAsVector<T>();

// Concatenate the vectors
T temp[2 * I];
memcpy(temp, n, sizeof(T) * I);
// temp is a T*, so the offset is in elements, not bytes
memcpy(temp + I, m, sizeof(T) * I);
// Compare each adjacent pair of elements
T out[I];
for (int i = 0; i < I; i++) {
out[i] = std::max(n[i], m[i]);
out[i] = std::max(temp[2 * i], temp[2 * i + 1]);
}
return {out, 256};
}
@@ -585,9 +590,14 @@ RegisterValue vecUMinP(srcValContainer& sourceValues) {
const T* n = sourceValues[0].getAsVector<T>();
const T* m = sourceValues[1].getAsVector<T>();

// Concatenate the vectors (n supplies the low half, matching vecUMaxP)
T temp[2 * I];
memcpy(temp, n, sizeof(T) * I);
// temp is a T*, so the offset is in elements, not bytes
memcpy(temp + I, m, sizeof(T) * I);

T out[I];
for (int i = 0; i < I; i++) {
out[i] = std::min(n[i], m[i]);
out[i] = std::min(temp[2 * i], temp[2 * i + 1]);
}
return {out, 256};
}
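
A standalone sketch of the pairwise reduction that `vecUMaxP` and `vecUMinP` perform: the two source vectors are concatenated with `n` in the low half, and each adjacent pair of the concatenation yields one output element. The helper names and the use of `std::vector` are assumptions for illustration, not SimEng's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: applies `op` to each adjacent pair of n ++ m.
template <typename T, typename Op>
std::vector<T> pairwise(const std::vector<T>& n, const std::vector<T>& m,
                        Op op) {
  assert(n.size() == m.size() && n.size() % 2 == 0);
  std::vector<T> cat(n);                       // low half comes from n
  cat.insert(cat.end(), m.begin(), m.end());   // high half comes from m
  std::vector<T> out(n.size());
  for (std::size_t i = 0; i < out.size(); i++) {
    out[i] = op(cat[2 * i], cat[2 * i + 1]);
  }
  return out;
}

template <typename T>
std::vector<T> pairwiseMax(const std::vector<T>& n, const std::vector<T>& m) {
  return pairwise(n, m, [](T a, T b) { return std::max(a, b); });
}

template <typename T>
std::vector<T> pairwiseMin(const std::vector<T>& n, const std::vector<T>& m) {
  return pairwise(n, m, [](T a, T b) { return std::min(a, b); });
}
```

With `n = {1, 5, 3, 2}` and `m = {9, 4, 0, 7}`, the pairwise maximum is `{5, 3, 9, 7}` and the pairwise minimum is `{1, 2, 4, 0}`.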
@@ -941,6 +951,63 @@ RegisterValue vecUzp(srcValContainer& sourceValues, bool isUzp1) {
return {out, 256};
}

/** Helper function for NEON instructions with the format `udot vd.s, vn.b,
* vm.b`. D represents the number of elements in the output vector to be updated
* (i.e. for vd.2s D = 2). Only 2 or 4 are valid. Returns correctly formatted
* RegisterValue. */
template <int D>
RegisterValue vecUdot(
srcValContainer& sourceValues,
const simeng::arch::aarch64::InstructionMetadata& metadata) {
// Check D is a valid value
static_assert(D == 2 || D == 4,
"D must be either 2 or 4 to align with vd.2s or vd.4s.");

const uint32_t* vd = sourceValues[0].getAsVector<uint32_t>();
const uint8_t* vn = sourceValues[1].getAsVector<uint8_t>();
const uint8_t* vm = sourceValues[2].getAsVector<uint8_t>();

uint32_t out[D] = {0};
for (int i = 0; i < D; i++) {
out[i] = vd[i];
for (int j = 0; j < 4; j++) {
out[i] += (static_cast<uint32_t>(vn[(4 * i) + j]) *
static_cast<uint32_t>(vm[(4 * i) + j]));
}
}
return {out, 256};
}

/** Helper function for NEON instructions with the format `udot vd.s, vn.b,
* vm.4b[index]`.
* D represents the number of elements in the output vector to be updated (i.e.
* for vd.2s D = 2). Only 2 or 4 are valid.
* Returns correctly formatted RegisterValue. */
template <int D>
RegisterValue vecUdot_byElement(
srcValContainer& sourceValues,
const simeng::arch::aarch64::InstructionMetadata& metadata) {
// Check D is a valid value
static_assert(D == 2 || D == 4,
"D must be either 2 or 4 to align with vd.2s or vd.4s.");

const uint32_t* vd = sourceValues[0].getAsVector<uint32_t>();
const uint8_t* vn = sourceValues[1].getAsVector<uint8_t>();
const uint8_t* vm = sourceValues[2].getAsVector<uint8_t>();
const int index = metadata.operands[2].vector_index;

uint32_t out[D] = {0};
for (int i = 0; i < D; i++) {
uint32_t acc = vd[i];
for (int j = 0; j < 4; j++) {
acc += (static_cast<uint32_t>(vn[(4 * i) + j]) *
static_cast<uint32_t>(vm[(4 * index) + j]));
}
out[i] = acc;
}
return {out, 256};
}

/** Helper function for NEON instructions with the format `zip<1,2> vd.T,
* vn.T, vm.T`.
* T represents the type of sourceValues (e.g. for vn.d, T = uint64_t).
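
A compact sketch of the two NEON UDOT forms added in this file, assuming plain `std::vector` inputs in place of SimEng's `srcValContainer`. The function names are hypothetical. In the by-element form, `index` selects one group of four bytes from `m`, and that group is reused for every output lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of `udot vd.s, vn.b, vm.b` (by vector).
inline std::vector<uint32_t> udot(const std::vector<uint32_t>& acc,
                                  const std::vector<uint8_t>& n,
                                  const std::vector<uint8_t>& m) {
  assert(n.size() == 4 * acc.size() && m.size() == n.size());
  std::vector<uint32_t> out(acc);
  for (std::size_t i = 0; i < out.size(); i++) {
    for (std::size_t j = 0; j < 4; j++) {
      out[i] += static_cast<uint32_t>(n[4 * i + j]) *
                static_cast<uint32_t>(m[4 * i + j]);
    }
  }
  return out;
}

// Hypothetical sketch of `udot vd.s, vn.b, vm.4b[index]` (by element).
// m holds a full 16-byte register; index picks one of its four byte groups.
inline std::vector<uint32_t> udotByElement(const std::vector<uint32_t>& acc,
                                           const std::vector<uint8_t>& n,
                                           const std::vector<uint8_t>& m,
                                           std::size_t index) {
  assert(n.size() == 4 * acc.size() && m.size() == 16 && index < 4);
  std::vector<uint32_t> out(acc);
  for (std::size_t i = 0; i < out.size(); i++) {
    for (std::size_t j = 0; j < 4; j++) {
      out[i] += static_cast<uint32_t>(n[4 * i + j]) *
                static_cast<uint32_t>(m[4 * index + j]);
    }
  }
  return out;
}
```

Widening each byte to `uint32_t` before multiplying keeps the products unsigned and matches the 8-bit to 32-bit widening the instruction describes.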
118 changes: 118 additions & 0 deletions src/include/simeng/arch/aarch64/helpers/sve.hh
@@ -626,6 +626,27 @@ std::enable_if_t<std::is_floating_point_v<T>, RegisterValue> sveFDivPredicated(
return {out, 256};
}

/** Helper function for SVE instructions with the format `faddv rd, pg, zn`.
* D represents the source vector element type and the destination scalar
* register type (i.e. for zn.s and sd, D = float).
* Returns correctly formatted RegisterValue. */
template <typename D>
RegisterValue sveFaddv_predicated(srcValContainer& sourceValues,
const uint16_t VL_bits) {
const uint64_t* p = sourceValues[0].getAsVector<uint64_t>();
const D* zn = sourceValues[1].getAsVector<D>();

const uint16_t partition_num = VL_bits / (8 * sizeof(D));
D out[256 / sizeof(D)] = {0};
for (int i = 0; i < partition_num; i++) {
uint64_t shifted_active = 1ull << ((i % (64 / sizeof(D))) * sizeof(D));
if (p[i / (64 / sizeof(D))] & shifted_active) {
out[0] += zn[i];
}
}
return {out, 256};
}

/** Helper function for SVE instructions with the format `fmad zd, pg/m, zn,
* zm`.
* T represents the type of sourceValues (e.g. for zn.d, T = double).
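
A self-contained sketch of the predicated reduction in `sveFaddv_predicated`, assuming the predicate is passed as four 64-bit words with one predicate bit per byte of vector data (so element `i` of type `T` is governed by bit `i * sizeof(T)`). The function name and container types are illustrative. Like the helper in the diff, this sums sequentially; my understanding is that the architecture describes FADDV as a recursive pairwise reduction, so rounding may differ in corner cases.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: sums the active elements of zn under predicate pred.
template <typename T>
T faddvPredicated(const std::array<uint64_t, 4>& pred,
                  const std::vector<T>& zn) {
  static_assert(std::is_floating_point<T>::value,
                "T must be float or double");
  // Four predicate words cover at most 256 bytes of vector data.
  assert(zn.size() * sizeof(T) <= 256);
  const std::size_t elemsPerWord = 64 / sizeof(T);
  T sum = 0;
  for (std::size_t i = 0; i < zn.size(); i++) {
    const uint64_t bit = 1ull << ((i % elemsPerWord) * sizeof(T));
    if (pred[i / elemsPerWord] & bit) sum += zn[i];
  }
  return sum;
}
```

For four floats `{1.5, 2, 3, 4}` and a predicate word of `0x101` (elements 0 and 2 active), the result is 4.5.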
@@ -1319,6 +1340,40 @@ std::array<uint64_t, 4> svePtrue(
return out;
}

/** Helper function for SVE instructions with the format `ptrue pnd`.
* T represents the type of sourceValues (e.g. for pnd.d, T = uint64_t).
* Returns an array of 4 uint64_t elements. */
template <typename T>
std::array<uint64_t, 4> svePtrue_counter(const uint16_t VL_bits) {
// Predicate as counter is 16-bits and has the following encoding:
// - Bits 0 -> LSZ encode the element size; the only set bit is at position
// LSZ (0b1, 0b10, 0b100, 0b1000 for b, h, s, d respectively)
// - Bits LSZ+1 -> 14 hold an unsigned count of the consecutive elements,
// starting from element 0, that are active / inactive
// - If invert bit = 0 it is the number of active elements
// - If invert bit = 1 it is the number of inactive elements
// - Bit 15 is the invert bit
std::array<uint64_t, 4> out = {0, 0, 0, 0};

// Set invert bit to 1 and count to 0
// (The first 0 elements are FALSE)
out[0] |= 0b1000000000000000;

// Set Element size field
if (sizeof(T) == 1) {
out[0] |= 0b1;
} else if (sizeof(T) == 2) {
out[0] |= 0b10;
} else if (sizeof(T) == 4) {
out[0] |= 0b100;
} else if (sizeof(T) == 8) {
out[0] |= 0b1000;
}

return out;
}

/** Helper function for SVE instructions with the format `punpk<hi,lo> pd.h,
* pn.b`.
* If `isHI` = false, then PUNPKLO is performed.
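
To make the predicate-as-counter layout from the `svePtrue_counter` comment concrete, here is a small encode/decode sketch. `CounterFields`, `ptrueCounter` and `decodeCounter` are hypothetical names, and the decoder assumes a well-formed value whose element-size marker sits in the low four bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the three fields packed into a 16-bit counter.
struct CounterFields {
  bool invert;         // bit 15
  unsigned elemBytes;  // 1, 2, 4 or 8, from the element-size marker bit
  unsigned count;      // number of leading active (or inactive) elements
};

// "All true" as a counter: invert bit set with a count of zero, i.e. the
// first zero elements are inactive. sizeof(T) doubles as the marker value
// (0b1, 0b10, 0b100, 0b1000).
template <typename T>
uint16_t ptrueCounter() {
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
                    sizeof(T) == 8,
                "T must be a 1, 2, 4 or 8 byte element type");
  return static_cast<uint16_t>(0x8000u | sizeof(T));
}

inline CounterFields decodeCounter(uint16_t value) {
  // Assumption: the value carries a valid element-size marker.
  assert((value & 0xFu) != 0);
  unsigned lsz = 0;
  while (((value >> lsz) & 1u) == 0) lsz++;  // position of the marker bit
  return {(value & 0x8000u) != 0, 1u << lsz, (value & 0x7FFFu) >> (lsz + 1)};
}
```

Decoding `ptrueCounter<uint32_t>()` (0x8004) gives an inverted counter over 4-byte elements with a count of zero, which reads as "no leading inactive elements".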
@@ -1563,6 +1618,69 @@ RegisterValue sveTrn2_3vecs(srcValContainer& sourceValues,
return {out, 256};
}

/** Helper function for SVE instructions with the format `udot zd, zn, zm`.
* D represents the element type of the destination register (i.e. for zd.s,
* D = uint32_t).
* N represents the element type of the source registers (i.e. for zn.b, N =
* uint8_t).
* W represents how many source elements are multiplied to form an output
* element (i.e. for 4-way, W = 4).
* Returns correctly formatted RegisterValue. */
template <typename D, typename N, int W>
RegisterValue sveUdot(
srcValContainer& sourceValues,
const simeng::arch::aarch64::InstructionMetadata& metadata,
const uint16_t VL_bits) {
const D* zd = sourceValues[0].getAsVector<D>();
const N* zn = sourceValues[1].getAsVector<N>();
const N* zm = sourceValues[2].getAsVector<N>();

D out[256 / sizeof(D)] = {0};
for (size_t i = 0; i < (VL_bits / (sizeof(D) * 8)); i++) {
out[i] = zd[i];
for (int j = 0; j < W; j++) {
out[i] +=
(static_cast<D>(zn[(W * i) + j]) * static_cast<D>(zm[(W * i) + j]));
}
}
return {out, 256};
}

/** Helper function for SVE instructions with the format `udot zd, zn,
* zm[index]`.
* D represents the element type of the destination register (i.e. for zd.s,
* D = uint32_t).
* N represents the element type of the source registers (i.e. for zn.b, N =
* uint8_t).
* W represents how many source elements are multiplied to form an output
* element (i.e. for 4-way, W = 4).
* Returns correctly formatted RegisterValue. */
template <typename D, typename N, int W>
RegisterValue sveUdot_indexed(
srcValContainer& sourceValues,
const simeng::arch::aarch64::InstructionMetadata& metadata,
const uint16_t VL_bits) {
const D* zd = sourceValues[0].getAsVector<D>();
const N* zn = sourceValues[1].getAsVector<N>();
const N* zm = sourceValues[2].getAsVector<N>();
const int index = metadata.operands[2].vector_index;

D out[256 / sizeof(D)] = {0};
for (size_t i = 0; i < (VL_bits / (sizeof(D) * 8)); i++) {
D acc = zd[i];
// Index into zm selects which D-type element within each 128-bit vector
// segment to use
int base = i - (i % (128 / (sizeof(D) * 8)));
int zmIndex = base + index;
for (int j = 0; j < W; j++) {
acc += (static_cast<D>(zn[(W * i) + j]) *
static_cast<D>(zm[(W * zmIndex) + j]));
}
out[i] = acc;
}
return {out, 256};
}

/** Helper function for SVE instructions with the format `<s,u>unpk<hi,lo> zd,
* zn`.
* D represents the type of the destination register (e.g. <u>int32_t for
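
A standalone sketch of the indexed, 4-way SVE UDOT (8-bit sources, 32-bit accumulators), showing how the index is applied per 128-bit segment as in `sveUdot_indexed`. The function name and `std::vector` parameters are assumptions; the vector length is implied by the size of `zd`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of `udot zd.s, zn.b, zm.b[index]`.
inline std::vector<uint32_t> sveUdotIndexed(const std::vector<uint32_t>& zd,
                                            const std::vector<uint8_t>& zn,
                                            const std::vector<uint8_t>& zm,
                                            std::size_t index) {
  // A 128-bit segment holds four 32-bit accumulators, so index is 0..3.
  assert(zd.size() % 4 == 0 && zn.size() == 4 * zd.size() &&
         zm.size() == zn.size() && index < 4);
  std::vector<uint32_t> out(zd);
  for (std::size_t i = 0; i < out.size(); i++) {
    // First accumulator of the 128-bit segment that contains element i.
    const std::size_t segBase = i - (i % 4);
    // The same 4-byte group of zm is reused by every lane in the segment.
    const std::size_t group = segBase + index;
    for (std::size_t j = 0; j < 4; j++) {
      out[i] += static_cast<uint32_t>(zn[4 * i + j]) *
                static_cast<uint32_t>(zm[4 * group + j]);
    }
  }
  return out;
}
```

With a 256-bit vector there are two segments, so an index of 2 picks bytes 8..11 of `zm` for the first four lanes and bytes 24..27 for the last four.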
2 changes: 1 addition & 1 deletion src/include/simeng/arch/aarch64/operandContainer.hh
@@ -10,7 +10,7 @@ namespace arch {
namespace aarch64 {

/** The maximum number of source registers a non-SME instruction can have. */
const uint8_t MAX_SOURCE_REGISTERS = 6;
const uint8_t MAX_SOURCE_REGISTERS = 7;

/** The maximum number of destination registers a non-SME instruction can have.
*/
1 change: 1 addition & 0 deletions src/include/simeng/version.hh.in
@@ -9,5 +9,6 @@
#define SIMENG_LLVM_VERSION @SIMENG_LLVM_VERSION@
#define SIMENG_ENABLE_TESTS "${SIMENG_ENABLE_TESTS}"
#define SIMENG_BUILD_DIR "${CMAKE_BINARY_DIR}"
#define SIMENG_ENABLE_BF16 ${SIMENG_ENABLE_BF16}

#endif
5 changes: 2 additions & 3 deletions src/lib/arch/aarch64/ExceptionHandler.cc
@@ -626,8 +626,7 @@ bool ExceptionHandler::init() {

break;
}
case 293: // rseq
{
case 293: { // rseq
stateChange = {ChangeType::REPLACEMENT, {R0}, {0ull}};
break;
}
@@ -818,7 +817,7 @@ void ExceptionHandler::readLinkAt(span<char> path) {
for (size_t i = 0; i < bytesCopied; i += 256) {
// A full chunk is 256 bytes, which does not fit in a uint8_t
uint16_t size = std::min<uint64_t>(bytesCopied - i, 256ul);
stateChange.memoryAddresses.push_back({bufAddress + i, size});
stateChange.memoryAddressValues.push_back(RegisterValue(bufPtr, size));
stateChange.memoryAddressValues.push_back(RegisterValue(bufPtr + i, size));
}

concludeSyscall(stateChange);
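
The `readLinkAt` change makes each chunk start at `bufPtr + i` rather than re-sending the start of the buffer. A minimal sketch of that chunking pattern follows, using hypothetical `Chunk` and `chunkBuffer` names and plain standard-library types in place of SimEng's `RegisterValue` and state-change structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one (address, value) pair of a state change.
struct Chunk {
  uint64_t address;
  std::vector<char> bytes;
};

// Splits `length` bytes at `buf` into chunks of at most `chunkSize` bytes,
// pairing each chunk with its destination address.
inline std::vector<Chunk> chunkBuffer(const char* buf, std::size_t length,
                                      uint64_t baseAddress,
                                      std::size_t chunkSize = 256) {
  assert(buf != nullptr && chunkSize > 0);
  std::vector<Chunk> chunks;
  for (std::size_t i = 0; i < length; i += chunkSize) {
    const std::size_t size = std::min(length - i, chunkSize);
    // Both the address and the source pointer advance by i.
    chunks.push_back(
        {baseAddress + i, std::vector<char>(buf + i, buf + i + size)});
  }
  return chunks;
}
```

Keeping the chunk size in a `std::size_t` also sidesteps the narrowing that occurs if a full 256-byte chunk length is stored in an 8-bit variable.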
2 changes: 1 addition & 1 deletion src/lib/arch/aarch64/InstructionMetadata.cc
@@ -244,7 +244,7 @@ InstructionMetadata::InstructionMetadata(const cs_insn& insn)
if (isAlias) {
exceptionString_ =
"This instruction is an alias. The printed mnemonic and operand string "
"differ from what is expected of the Capstone opcode.";
"may differ from the underlying opcode.";
}
}
