
[HMSDK-2.0] Memory tiering issue with Hynix CXL devices #2

Open · j0807s opened this issue Feb 19, 2024 · 12 comments
@j0807s commented Feb 19, 2024

Hello,

We are currently testing HMSDK-2.0 with Hynix CXL devices. However, we have encountered an issue where all the memory devices, including the expanders, have the same memory tier (i.e., memory_tier), which may hinder automatic promotion and demotion.

How can we create a new memory tier for the CXL devices and utilize them as second-tiered memory?

Our environment is set up as follows:

OS (kernel): Ubuntu 22.04.3 LTS (Linux 6.6.0-hmsdk2.0+)
CPU: Intel Xeon 4410Y (Sapphire Rapids) @ 2.0 GHz, 12 cores
Memory (Sockets 0, 1): 32 GB DDR5-4000 MT/s, 128 GB total
CXL Expander: PCIe 5.0, each with 96 GB
Motherboard: Super X13DAI-T (supports CXL 1.1, CXL Type 3 Legacy enabled)

Thanks.

@JongminKim-KU

We found that the memory in all NUMA nodes is in the same tier:
$ ls /sys/devices/virtual/memory_tiering
memory_tier4 power uevent
$ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-3
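
The CXL expanders are expected to show up as CPU-less nodes, which can be cross-checked with generic sysfs queries (not HMSDK-specific):

$ cat /sys/devices/system/node/has_cpu
$ cat /sys/devices/system/node/has_memory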

@hyeongtakji (Collaborator)

Hello Junsu and Jongmin,

Thank you for reporting this issue.

> However, we have encountered an issue where all the memory devices, including the expanders, have the same memory tier (i.e., memory_tier), which may hinder automatic promotion and demotion.

If all NUMA nodes are on the same memory tier, promotion and demotion won't happen.

> How can we create a new memory tier for the CXL devices and utilize them as second-tiered memory?

As far as I know, there is no way to change the tier of NUMA nodes other than applying custom patches when building your Linux kernel. Maybe we can share the simple patch that we've used for tests. @honggyukim, would that be okay?

Also, we are currently working on RFC v2 for LKML and it will include patches that enable users to set destination nodes for migrations regardless of the memory tier of the system. However, I'm not sure when we will post it. Still, I'll update you when it's available.

@j0807s (Author) commented Feb 19, 2024

Thank you for your explanation and support!

@honggyukim (Member)

Hi Junsu and Jongmin,

Thanks for the report. As mentioned by @hyeongtakji, the current HMSDK 2.0 won't work unless your system has a tiered memory setup.

> We found that the memory in all NUMA nodes is in the same tier:
>
> $ ls /sys/devices/virtual/memory_tiering
> memory_tier4 power uevent
> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> 0-3

If you want to make NUMA nodes 0 and 1 the first tier and nodes 2 and 3 the second tier, you can use the following workaround change.

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 437441cdf78f..13f82b5d67e8 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -18,6 +18,7 @@
  * the same memory tier.
  */
 #define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
+#define MEMTIER_ADISTANCE_CXL  (MEMTIER_ADISTANCE_DRAM * 5)

 struct memory_tier;
 struct memory_dev_type {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 37a4f59d9585..3fdbc3c9bfa9 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -37,6 +37,7 @@ static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 static struct memory_dev_type *default_dram_type;
+static struct memory_dev_type *default_cxl_type;

 static struct bus_type memory_tier_subsys = {
        .name = "memory_tiering",
@@ -484,7 +485,10 @@ static struct memory_tier *set_node_memory_tier(int node)
        if (!node_state(node, N_MEMORY))
                return ERR_PTR(-EINVAL);

-       __init_node_memory_type(node, default_dram_type);
+       if (node < 2)
+               __init_node_memory_type(node, default_dram_type);
+       else
+               __init_node_memory_type(node, default_cxl_type);

        memtype = node_memory_types[node].memtype;
        node_set(node, memtype->nodes);
@@ -646,6 +650,9 @@ static int __init memory_tier_init(void)
        default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
        if (IS_ERR(default_dram_type))
                panic("%s() failed to allocate default DRAM tier\n", __func__);
+       default_cxl_type = alloc_memory_type(MEMTIER_ADISTANCE_CXL);
+       if (IS_ERR(default_cxl_type))
+               panic("%s() failed to allocate default CXL tier\n", __func__);

        /*
         * Look at all the existing N_MEMORY nodes and add them to

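After rebuilding with this change, the split can be verified from the same sysfs path. Assuming nodes 2 and 3 are the CXL nodes, the output should look roughly like below; the exact name of the second tier directory depends on the abstract distance value (with MEMTIER_ADISTANCE_DRAM * 5 it comes out as memory_tier22):

$ ls /sys/devices/virtual/memory_tiering
memory_tier4  memory_tier22  power  uevent
$ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-1
$ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
2-3
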
> Also, we are currently working on RFC v2 for LKML and it will include patches that enable users to set destination nodes for migrations regardless of the memory tier of the system. However, I'm not sure when we will post it. Still, I'll update you when it's available.

I'm preparing this now. Hopefully I can post it by next week. I will share the patch here when it's updated.

Thanks.

@honggyukim (Member)

> CXL Expander: PCIe 5.0, each with 96 GB

I'm a bit worried about the case where you use two CXL expander cards. With the current kernel change, the kernel might not be able to find a proper promotion target for the second CXL node. This is due to the inaccuracy of node distances in the upstream kernel, so we need to find a better way to handle this problem. Once DAMON has an explicit destination setting, this can be handled properly.
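
For reference, the node distances the kernel uses can be inspected with generic tools; node numbers 2 and 3 below are only assumed from the nodelist output above. If both CXL nodes report similar distances to the DRAM nodes, picking a promotion target becomes ambiguous:

$ numactl --hardware    # the node distance matrix is printed at the bottom
$ cat /sys/devices/system/node/node2/distance
$ cat /sys/devices/system/node/node3/distance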

@honggyukim (Member)

For now, I would recommend testing your workload with a single CXL expander. More importantly, please make sure your evaluation environment has enough cold memory that can be demoted to CXL memory. Demoting that cold memory frees up enough DRAM space for CXL-to-DRAM promotion.

In other words, if your working set is larger than your DRAM capacity, you won't be able to see a benefit. For our evaluation, we created a large amount of cold memory with an mmap program; you can think of that mmapped cold memory as the idle VMs in a data center.

Please see our evaluation environment for more explanation.
https://github.com/skhynix/hmsdk/wiki/HMSDK-v2.0-Performance-Results
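
If it helps, one way to confirm that demotion and promotion are actually happening during a run is to watch the migration counters and the DAMOS scheme statistics. The counter names below are from the upstream kernel and the sysfs paths assume kdamond/context/scheme index 0, so the exact set that moves may differ depending on how the migration path is wired up:

$ grep -E 'pgdemote|pgpromote' /proc/vmstat
$ cat /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/stats/nr_applied
$ cat /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/stats/sz_applied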

@JongminKim-KU

Thank you for sharing the modification and the experiment setup details.

We will modify the source code right away, ahead of the updated patches, and rebuild the kernel with a single CXL expander.

@honggyukim (Member)

Please let us know if you run into issues again. Thanks!

@j0807s (Author) commented Feb 20, 2024

We have patched the kernel and observed that promotion and demotion work during our experiments!

We sincerely appreciate your help!

@honggyukim (Member)

I'm glad to hear that it's working in your environment. Please don't hesitate to reach out if you have more issues later. Thanks.

@honggyukim (Member) commented Mar 2, 2024

> Also, we are currently working on RFC v2 for LKML and it will include patches that enable users to set destination nodes for migrations regardless of the memory tier of the system. However, I'm not sure when we will post it. Still, I'll update you when it's available.
>
> I'm preparing this now. Hopefully I can post it by next week. I will share the patch here when it's updated.

The RFC v2 patches are posted at https://lore.kernel.org/linux-mm/[email protected]. In this patch series, /sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>/target_nid is created to set the demotion/promotion target node ID explicitly. If it isn't set, memory tiering is used as a fallback.
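
As a rough usage sketch (not copied from the patch series; the scheme index 0 and the node number are just for illustration), setting an explicit migration target for the first scheme would look something like:

$ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0
$ cat target_nid      # shows the current target node (unset by default)
$ echo 2 > target_nid # e.g. demote pages matched by this scheme to node 2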

If you're okay with the workaround patch above, then you don't need the v2 patches; I'm just sharing the recent update.

@j0807s (Author) commented Mar 6, 2024

It seems the RFC v2 patches would provide much more flexibility for constructing a tiered memory system with multiple CXL devices, especially when considering NUMA topology (e.g., the 1st tier for nodes 0, 1, and 2 and the 2nd tier for node 3).

Thank you for sharing the helpful information!
