{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":93854532,"defaultBranch":"master","name":"linux","ownerLogin":"davidhildenbrand","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2017-06-09T11:59:50.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1547205?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1727365006.0","currentOid":""},"activityList":{"items":[{"before":"553a25698e0c69f38698139e3cc8c37266957f26","after":"08768b8622fcca7725ef035ab6052f7546abc932","ref":"refs/heads/copy_huge_pmd_pfn","pushedAt":"2024-09-26T15:44:04.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm/huge_memory: check pmd_special() only after pmd_present()\n\nWe should only check for pmd_special() after we made sure that we\nhave a present PMD. For example, if we have a migration PMD,\npmd_special() might indicate that we have a special PMD although we\nreally don't.\n\nThis fixes confusing migration entries as PFN mappings, and not\ndoing what we are supposed to do in the \"is_swap_pmd()\" case further\ndown in the function -- including messing up COW, page table handling\nand accounting.\n\nReported-by: syzbot+bf2c35fa302ebe3c7471@syzkaller.appspotmail.com\nCloses: https://lore.kernel.org/lkml/66f15c8d.050a0220.c23dd.000f.GAE@google.com/\nFixes: bc02afbd4d73 (\"mm/fork: accept huge pfnmap entries\")\nCc: Peter Xu \nCc: Andrew Morton \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm/huge_memory: check pmd_special() only after pmd_present()"}},{"before":null,"after":"553a25698e0c69f38698139e3cc8c37266957f26","ref":"refs/heads/copy_huge_pmd_pfn","pushedAt":"2024-09-26T15:36:46.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm/huge_memory: check pmd_special() only after pmd_present()\n\nWe should only check for pmd_special() after we made sure that we\nhave a present PMD. For example, if we have a migration PMD,\npmd_special() might indicate that we have a special PMD although we\nreally don't.\n\nThis fixes confusing migration entries as PFN mappings, and not\ndoing what we are supposed to do in the \"is_swap_pmd()\" case further\ndown in the function.\n\nReported-by: syzbot+bf2c35fa302ebe3c7471@syzkaller.appspotmail.com\nCloses: https://lore.kernel.org/lkml/66f15c8d.050a0220.c23dd.000f.GAE@google.com/\nFixes: bc02afbd4d73 (\"mm/fork: accept huge pfnmap entries\")\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm/huge_memory: check pmd_special() only after pmd_present()"}},{"before":"466a563d1d3aaedc289516230c6f8b98d22f90a1","after":"34fb0af01e2823ee895201ba0d452f881f4b074c","ref":"refs/heads/hugetlb_fault_after_madv","pushedAt":"2024-09-26T15:23:18.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"selftests/mm: hugetlb_fault_after_madv: improve test output\n\nLet's improve the test output. For example, print the proper test\nresult. 
Install a SIGBUS handler to catch any SIGBUS instead of\ncrashing the test on failure.\n\nWith unsuitable hugetlb page count:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n # [INFO] detected default hugetlb page size: 2048 KiB\n ok 2 # SKIP This test needs one and only one page to execute. Got 0\n # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0\n\nOn a failure:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n not ok 1 SIGBUS behavior\n Bail out! 1 out of 1 tests failed\n\nOn success:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n # [INFO] detected default hugetlb page size: 2048 KiB\n ok 1 SIGBUS behavior\n # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"selftests/mm: hugetlb_fault_after_madv: improve test output"}},{"before":"5cc2f07d899f5646d7e2012115bf6370f1bed749","after":"466a563d1d3aaedc289516230c6f8b98d22f90a1","ref":"refs/heads/hugetlb_fault_after_madv","pushedAt":"2024-09-26T15:07:20.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"selftests/mm: hugetlb_fault_after_madv: improve test output\n\nLet's improve the test output. For example, print the proper test\nresult. Install a SIGBUS handler to catch any SIGBUS instead of\ncrashing the test on failure.\n\nWith wring hugetlb count:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n # [INFO] detected default hugetlb page size: 2048 KiB\n ok 2 # SKIP This test needs one and only one page to execute. Got 0\n # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0\n\nOn a failure:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n not ok 1 SIGBUS behavior\n Bail out! 1 out of 1 tests failed\n\nOn success:\n $ ./hugetlb_fault_after_madv\n TAP version 13\n 1..1\n # [INFO] detected default hugetlb page size: 2048 KiB\n ok 1 SIGBUS behavior\n # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"selftests/mm: hugetlb_fault_after_madv: improve test output"}},{"before":"97f57b8069758dcc62402bc7287d1f26803febf7","after":"5cc2f07d899f5646d7e2012115bf6370f1bed749","ref":"refs/heads/hugetlb_fault_after_madv","pushedAt":"2024-09-26T15:04:11.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"selftests/mm: hugetlb_fault_after_madv: improve test output\n\nLet's improve the test output. For example, print the proper test\nresult.\n\nWith wring hugetlb count:\n TAP version 13\n 1..1\n # [INFO] detected default hugetlb page size: 2048 KiB\n ok 2 # SKIP This test needs one and only one page to execute. Got 0\n # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0\n\nOn a failure:\n TAP version 13\n 1..1\n not ok 1 SIGBUS behavior\n Bail out! 
1 out of 1 tests failed\n\nOn Success:\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"selftests/mm: hugetlb_fault_after_madv: improve test output"}},{"before":null,"after":"97f57b8069758dcc62402bc7287d1f26803febf7","ref":"refs/heads/hugetlb_fault_after_madv","pushedAt":"2024-09-26T13:43:35.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"selftests/mm: use default hguetlb page size in hugetlb_fault_after_madv()\n\nWe currently assume that the hugetlb page size is 2 MiB, which is\nwhy we mmap() a 2 MiB range.\n\nIs the default hugetlb size is larger, mmap() will fail because the\nrange is not suitable. If the default hugetlb size is smaller (e.g.,\ns390x), mmap() will fail because we would need more than one hugetlb\npage, but just asserted that we have exactly one.\n\nSo let's simply use the default hugetlb page size instead of hard-coded\n2 MiB, so the test isn't unconditionally skipped on architectures like\ns390x.\n\nReported-by: Mario Casquero \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"selftests/mm: use default hguetlb page size in hugetlb_fault_after_ma…"}},{"before":"fb523ce74c78d49f591fea92be637033bdabc7e8","after":"05248d7495350c4eda82d30e63bb738ad77ec0c5","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-24T14:50:21.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nLikely the reason was that we'd expect a storage increment size of\n256 MiB under z/VM back then. As we didn't support memory blocks spanning\nmultiple memory sections, we would have had to handle having multiple\nmemory blocks for a single storage increment, which complicates things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages switched to 128 MiB as well: it's just big enough on these\narchitectures to allows for using a huge page (2 MiB) in the vmemmap in\nsane setups with sizeof(struct page) == 64 bytes and a huge page mapping\nin the direct mapping, while still allowing for small hot(un)plug\ngranularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. 
Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":null,"after":"06e0fc4651a486ac1ce333cccc89b394222431c8","ref":"refs/heads/nr_cpus","pushedAt":"2024-09-24T08:03:00.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"fixup\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"fixup"}},{"before":"b6d5b8720b4d9045d245d3115d4d42fcdf226808","after":"fb523ce74c78d49f591fea92be637033bdabc7e8","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-10T10:55:00.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nLikely the reason was that we'd expect a storage increment size of\n256 MiB under z/VM back then. As we didn't support memory blocks spanning\nmultiple memory sections, we would have had to handle having multiple\nmemory blocks for a single storage increment, which complicates things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages switched to 128 MiB as well: it's just big enough on these\narchitectures to allows for using a huge page (2 MiB) in the vmemmap in\nsane setups with sizeof(struct page) == 64 bytes and a huge page mapping\nin the direct mapping, while still allowing for small hot(un)plug\ngranularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. 
Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"24adb7dbb37f3f010d4f2a12b36048c05201a885","after":"b6d5b8720b4d9045d245d3115d4d42fcdf226808","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-10T09:42:51.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nLikely the reason was that we'd expect a storage increment size of\n256 MiB under z/VM back then. As we didn't support memory blocks spanning\nmultiple memory sections, we would have had to handle having multiple\nmemory blocks for a single storage increment, which complicates things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages switched to 128 MiB as well: it's just big enough on these\narchitectures to allows for using a huge page (2 MiB) in the vmemmap in\nsane setups with sizeof(struct page) == 64 bytes and a huge page mapping\nin the direct mapping, while still allowing for small hot(un)plug\ngranularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"3b0f320eb61c8dfa2f5a87a81aa6dba54811e8dc","after":"24adb7dbb37f3f010d4f2a12b36048c05201a885","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-10T09:39:50.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nLikely the reason was that we'd expect a storage increment size of\n256 MiB under z/VM back then. 
As we didn't support memory blocks spanning\nmultiple memory sections, we would have had to handle having multiple\nmemory blocks for a single storage increment, which complicates things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages switched to 128 MiB as well: it's just big enough on these\narchitectures to allows for using a huge page (2 MiB) in the vmemmap in\nsane setups with sizeof(struct page) == 64 bytes and a huge page mapping\nin the direct mapping, while still allowing for small hot(un)plug\ngranularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"6d13d08b5022b8b34ea2bb69a267b6f6f84a3eb7","after":"3b0f320eb61c8dfa2f5a87a81aa6dba54811e8dc","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-10T09:18:23.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nLikely the reason was that we'd expect a storage increment size of\n256 MiB under z/VM back then. As we didn't support memory blocks spanning\nmultiple memory sections, we would have had to handle having multiple\nmemory blocks for a single storage increment, which complicates things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. 
arm64 with 4k\nbase pages switched to 128 MiB as well: it's just big enough on these\narchitectures to allows for using a huge page (2 MiB) in the vmemmap in\nsane setups with sizeof(struct page) == 64 bytes and a huge page mapping\nin the direct mapping, while still allowing for small hot(un)plug\ngranularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"a9649234d14615f71794596e42882d2e7be3057b","after":"6d13d08b5022b8b34ea2bb69a267b6f6f84a3eb7","ref":"refs/heads/virtio-mem-s390x","pushedAt":"2024-09-10T07:57:37.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"tmp\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"tmp"}},{"before":"9c0abb9441a379fd1a7d07e63b3f87c8f8c06dbb","after":null,"ref":"refs/heads/virtio-mem-suspend","pushedAt":"2024-09-10T07:57:26.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"}},{"before":"22302461039e12befd39dfe4305e679f68aa63ea","after":null,"ref":"refs/heads/virtio-mem-s390x-new","pushedAt":"2024-09-10T07:57:16.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"}},{"before":"dda90880271374fbbfd23bfc21221542c1453ba5","after":"22302461039e12befd39dfe4305e679f68aa63ea","ref":"refs/heads/virtio-mem-s390x-new","pushedAt":"2024-09-06T11:47:32.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nI can only assume that the reason was that we'd expect a storage increment\nsize of 256 MiB under z/VM back then. 
As we didn't support memory blocks\nspanning multiple memory sections, we would have had to handle having\nmultiple memory blocks for a single memory section, complicating things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages just recently switched to 128 MiB as well: it's just big\nenough on these architectures to allows for using a huge page (2 MiB) in\nthe vmemmap in sane setups with sizeof(struct page) == 64 bytes and a\nhuge page mapping in the direct mapping, while still allowing for small\nhot(un)plug granularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nCc: Gerald Schaefer \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"0b9d32ebea44c31a19943e76ec43425ca3987405","after":"dda90880271374fbbfd23bfc21221542c1453ba5","ref":"refs/heads/virtio-mem-s390x-new","pushedAt":"2024-09-06T10:48:14.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nI can only assume that the reason was that we'd expect a storage increment\nsize of 256 MiB under z/VM back then. As we didn't support memory blocks\nspanning multiple memory sections, we would have had to handle having\nmultiple memory blocks for a single memory section, complicating things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. 
arm64 with 4k\nbase pages just recently switched to 128 MiB as well: it's just big\nenough on these architectures to allows for using a huge page (2 MiB) in\nthe vmemmap in sane setups with sizeof(struct page) == 64 bytes and a\nhuge page mapping in the direct mapping, while still allowing for small\nhot(un)plug granularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nCc: Gerald Schaefer \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":"20ec4366bab8de03f08ac131659b8aaaf97fb222","after":"0b9d32ebea44c31a19943e76ec43425ca3987405","ref":"refs/heads/virtio-mem-s390x-new","pushedAt":"2024-09-06T09:03:17.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nI can only assume that the reason was that we'd expect a storage increment\nsize of 256 MiB under z/VM back then. As we didn't support memory blocks\nspanning multiple memory sections, we would have had to handle having\nmultiple memory blocks for a single memory section, complicating things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages just recently switched to 128 MiB as well: it's just big\nenough on these architectures to allows for using a huge page (2 MiB) in\nthe vmemmap in sane setups with sizeof(struct page) == 64 bytes and a\nhuge page mapping in the direct mapping, while still allowing for small\nhot(un)plug granularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. 
Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nCc: Gerald Schaefer \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":null,"after":"20ec4366bab8de03f08ac131659b8aaaf97fb222","ref":"refs/heads/virtio-mem-s390x-new","pushedAt":"2024-09-04T15:46:41.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"s390/sparsemem: reduce section size to 128 MiB\n\nEver since commit 421c175c4d609 (\"[S390] Add support for memory hot-add.\")\nwe've been using a section size of 256 MiB on s390x and 32 MiB on s390.\nBefore that, we were using a section size of 32 MiB on both\narchitectures.\n\nI can only assume that the reason was that we'd expect a storage increment\nsize of 256 MiB under z/VM back then. As we didn't support memory blocks\nspanning multiple memory sections, we would have had to handle having\nmultiple memory blocks for a single memory section, complicating things.\nAlthough that issue reappeared with even bigger storage increment sizes\nlater, nowadays we have memory blocks that can span multiple memory\nsections and we avoid any such issue completely.\n\nNow that we have a new mechanism to expose additional memory to a VM --\nvirtio-mem -- reduce the section size to 128 MiB to allow for more\nflexibility and reduce the metadata overhead when dealing with hot(un)plug\ngranularity smaller than 256 MiB.\n\n128 MiB has been used by x86-64 since the very beginning. arm64 with 4k\nbase pages just recently switched to 128 MiB as well: it's just big\nenough on these architectures to allows for using a huge page (2 MiB) in\nthe vmemmap in sane setups with sizeof(struct page) == 64 bytes and a\nhuge page mapping in the direct mapping, while still allowing for small\nhot(un)plug granularity.\n\nFor s390x, we could even switch to a 64 MiB section size, as our huge page\nsize is 1 MiB: but the smaller the section size, the more sections we'll\nhave to manage especially on bigger machines. Making it consistent with\nx86-64 and arm64 feels like te right thing for now.\n\nNote that the smallest memory hot(un)plug granularity is also limited by\nthe memory block size, determined by extracting the memory increment\nsize from SCLP. 
Under QEMU/KVM, implementing virtio-mem, we expose 0;\ntherefore, we'll end up with a memory block size of 128 MiB with a\n128 MiB section size.\n\nCc: Gerald Schaefer \nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"s390/sparsemem: reduce section size to 128 MiB"}},{"before":null,"after":"8dabc06448d534cfa588184f76bef6ecbc06458a","ref":"refs/heads/virtio-mem-logically-offline","pushedAt":"2024-09-04T12:17:50.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: unexport alloc_contig_range() and free_contig_range()\n\nNow that virtio-mem no longer uses these directly, we can unexport them.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: unexport alloc_contig_range() and free_contig_range()"}},{"before":"6c79fe3636d0998604c457f91236d1bfd84aa0ac","after":"0764f7297d410cbe40e8400f74702af003159b38","ref":"refs/heads/mm_id","pushedAt":"2024-08-29T16:36:30.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. 
If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers should check for that for \"mapped shared\"\n anon folios, and flag them for deferred-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdefery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"0cccf34ccd94c0df4ae8d5157608f619a0c66a51","after":"6c79fe3636d0998604c457f91236d1bfd84aa0ac","ref":"refs/heads/mm_id","pushedAt":"2024-08-29T15:56:38.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. 
But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers should check for that for \"mapped shared\"\n anon folios, and flag them for deferred-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. 
We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdefery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"18a24232b98993e020ef6dfefc10f16ff608c693","after":"0cccf34ccd94c0df4ae8d5157608f619a0c66a51","ref":"refs/heads/mm_id","pushedAt":"2024-08-29T07:47:23.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. 
If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers should check for that for \"mapped shared\"\n anon folios, and flag them for deferred-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdefery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"af1ed1b5d68ab3685fbd0f61e49a81613d78fe15","after":"18a24232b98993e020ef6dfefc10f16ff608c693","ref":"refs/heads/mm_id","pushedAt":"2024-08-28T20:02:27.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. 
But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers schould check for that for \"mapped shared\"\n anon folios, and flag them for deferred-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. 
We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"45df8b4cfa9dc1faa788c50dbb94186455b3cf93","after":"af1ed1b5d68ab3685fbd0f61e49a81613d78fe15","ref":"refs/heads/mm_id","pushedAt":"2024-08-27T17:36:05.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. 
If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers chould check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"c0728dc76f9c7e17330c68200dea6fdd181f6c7e","after":"45df8b4cfa9dc1faa788c50dbb94186455b3cf93","ref":"refs/heads/mm_id","pushedAt":"2024-08-27T15:15:30.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\"\n (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\n\n ... can now appear higher than before. 
But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth nothing that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers chould check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decoupled type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. 
We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"02675d3b8e6ab46791b0cdf81d71780bddce63d8","after":"c0728dc76f9c7e17330c68200dea6fdd181f6c7e","ref":"refs/heads/mm_id","pushedAt":"2024-08-27T14:55:11.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)\n\nEverything is in place to stop using the per-page mapcounts in large\nfolios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of tail pages will always\nbe logically 0 (-1 value), just like it currently is for hugetlb folios\nalready, and the page mapcount of the head page is either 0 (-1 value)\nor contains a page type (e.g., hugetlb).\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\" (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\"\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth noting that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. 
If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers could check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decouple type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO…"}},{"before":"8c2cd314af19044e9d5ddf659e6114beb49ab0d2","after":"02675d3b8e6ab46791b0cdf81d71780bddce63d8","ref":"refs/heads/mm_id","pushedAt":"2024-08-26T14:29:50.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios with CONFIG_NO_PAGE_MAPCOUNT\n\nEverything is in place to stop using the per-page mapcounts of tail\npages in large folios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of\ntail pages will always be logically 0 (-1 value), just like it currently\nis for hugetlb folios already.\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\" (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\"\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth noting that other accounting in the kernel (esp. 
cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers could check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decouple type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. 
We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios with CONFI…"}},{"before":"43511801f2217f70e9264f78382cea8915232f05","after":"8c2cd314af19044e9d5ddf659e6114beb49ab0d2","ref":"refs/heads/mm_id","pushedAt":"2024-08-22T19:21:34.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios with CONFIG_NO_PAGE_MAPCOUNT\n\nEverything is in place to stop using the per-page mapcounts of tail\npages in large folios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of\ntail pages will always be logically 0 (-1 value), just like it currently\nis for hugetlb folios already.\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\" (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n now account the complete folio as mapped. Once the last page is\n unmapped -- !folio_mapped() -- we account the complete folio as\n unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\"\n\n ... can now appear higher than before. But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth noting that other accounting in the kernel (esp. cgroup\n charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called \"rss\" in cgroup v1]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. 
If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n However, once the child processes quit we would detect the partial\n mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers could check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decouple type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios with CONFI…"}},{"before":"f481cc9055eddf5eecaa0f07121824f1da9d65cc","after":"43511801f2217f70e9264f78382cea8915232f05","ref":"refs/heads/mm_id","pushedAt":"2024-08-22T15:55:11.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"davidhildenbrand","name":"David Hildenbrand","path":"/davidhildenbrand","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1547205?s=80&v=4"},"commit":{"message":"mm: stop maintaining the per-page mapcount of large folios with CONFIG_NO_PAGE_MAPCOUNT\n\nEverything is in place to stop using the per-page mapcounts of tail\npages in large folios with CONFIG_NO_PAGE_MAPCOUNT: the mapcount of\ntail pages will always be logically 0 (-1 value), just like it currently\nis for hugetlb folios already.\n\nMaintaining _nr_pages_mapped without per-page mapcounts is impossible,\nso that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.\n\nThere are two remaining implications:\n\n(1) Per-node, per-cgroup and per-lruvec stats of \"NR_ANON_MAPPED\"\n (\"mapped anonymous memory\") and \"NR_FILE_MAPPED\" (\"mapped file memory\"):\n\n As soon as any page of the folio is mapped -- folio_mapped() -- we\n account all folio pages as mapped. Once the last page is unmapped --\n !folio_mapped() -- we account all folio pages as unmapped.\n\n This implies that ...\n\n * \"AnonPages\" and \"Mapped\" in /proc/meminfo and\n /sys/devices/system/node/*/meminfo\n * cgroup v2: \"anon\" and \"file_mapped\" in \"memory.stat\" and\n \"memory.numa_stat\"\n * cgroup v1: \"rss\" and \"mapped_file\" in \"memory.stat\" and\n \"memory.numa_stat\"\n\n ... can now appear higher than before. 
But note that these folios do\n consume that memory, simply not all pages are actually currently\n mapped.\n\n It's worth noting that other accounting in the kernel (actual RSS,\n cgroup charging on allocation) is not affected by this change.\n\n [why oh why is \"anon\" called rss in cgroup v1; what an absolute mess]\n\n (2) Detecting partial mappings\n\n Detecting whether anon THP are partially mapped gets a bit more\n unreliable. As long as a single MM maps such a large folio\n (\"exclusively mapped\"), we can reliably detect it. Especially before\n fork() / after a short-lived child process quit, we will detect\n partial mappings reliably, which is the common case.\n\n In essence, if the average per-page mapcount in an anon THP is < 1,\n we know for sure that we have a partial mapping.\n\n However, as soon as multiple MMs are involved, we might miss detecting\n partial mappings: this might be relevant with long-lived child\n processes. If we have a fully-mapped anon folio before fork(), once\n our child processes and our parent all unmap (zap/COW) the same pages\n (but not the complete folio), we might not detect the partial mapping.\n Once the child processes quit, we would detect the partial mapping.\n\n How relevant this case is in practice remains to be seen.\n Swapout/migration will likely mitigate this.\n\n In the future, RMAP walkers could check for that for \"mapped shared\"\n anon folios, and flag them for partial-splitting.\n\nThere are a couple of remaining per-page mapcount users we won't\ntouch for now:\n\n (1) __dump_folio(): we'll tackle that separately later. For now, it\n will always read effective mapcount of \"0\" for pages in large folios.\n\n (2) include/trace/events/page_ref.h: we should rework the whole\n handling to be folio-aware and simply trace folio_mapcount(). Let's\n leave it around for now, might still be helpful to trace the raw\n page mapcount value (e.g., including the page type).\n\n (3) mm/mm_init.c: to initialize the mapcount/type field to -1. Will be\n required until we decouple type+mapcount (e.g., moving it into\n \"struct folio\"), and until we initialize the type+mapcount when\n allocating a folio.\n\n (4) mm/page_alloc.c: to sanity-check that the mapcount/type field is -1\n when a page gets freed. We could probably remove at least the tail\n page mapcount check in non-debug environments.\n\nSome added ifdef'ery seems unavoidable for now: at least it's mostly\nlimited to the rmap add/remove core primitives.\n\nExtend documentation.\n\nSigned-off-by: David Hildenbrand ","shortMessageHtmlLink":"mm: stop maintaining the per-page mapcount of large folios with CONFI…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yNlQxNTo0NDowNC4wMDAwMDBazwAAAATBr3SR","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0yMlQxNTo1NToxMS4wMDAwMDBazwAAAAShez4q"}},"title":"Activity · davidhildenbrand/linux"}