Support copying GC #18

Closed
14 tasks done
wks opened this issue Nov 2, 2022 · 2 comments

Comments

@wks
Collaborator

wks commented Nov 2, 2022

Challenges for supporting copying GC include

  • Some fields cannot be updated, and have to pin the objects they point to.
  • Some data structures depend on the address of objects, and need to be updated.
    • ID-to-address and address-to-ID map
    • address-to-gen_ivtbl map
    • finalizer_table

Handling of roots:

Upstream changes needed for copying GC

Correctness goals

Performance goals

  • Performance on par with, or exceeding, CRuby's vanilla GC
    • Currently, when running the Liquid benchmark, MMTk-Ruby is close to vanilla Ruby at 2x the minimum heap size, and can outperform vanilla Ruby at 3x the minimum heap size or greater.
    • There is still room for improvement in STW time. See: GC performance issues for MMTk Ruby #25

Un-update-able references

One challenge of supporting copying GC in Ruby is that some object references cannot be updated. Specifically,

  1. Due to conservative stack scanning, local variables (in C functions, not in Ruby functions) cannot be updated.
  2. Some fields in global data structures are marked with rb_gc_mark during gc_mark_roots. Those fields cannot be updated.
  3. Some objects have fields that cannot be updated.

Because object references held in those places cannot be updated, the objects those references point to must be pinned. In other words, if an object has type T_DATA, T_IMEMO, or T_HASH, or has the EXIVAR flag, the object itself can move, but it pins its children.

Recording "potential pinning parents"

The Ruby binding shall maintain a runtime list of "potential pinning parents" (PPP for short). That includes all the object types described in (3) in the previous section. Specifically,

  • When T_DATA objects, and those listed T_IMEMO objects, are instantiated, we add them to the PPP list.
  • When a T_HASH becomes compare_by_identity or when an object gets the EXIVAR flag, we add it to the PPP list.

Note that some PPPs don't always pin their children. Some T_DATA objects can actually let their children move because they are modern: their developers used rb_gc_mark_movable and provided the dcompact function. For other PPPs, the pinning fields may just be nil at the moment of the GC. But we have to add them to the PPP list conservatively because we don't know if a T_DATA is modern enough or if any field is nil.
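The conservative membership rule above can be sketched as a small predicate. This is only an illustration: the `obj_info` struct, the `TYPE_*` enum and the `imemo_is_listed` field are hypothetical stand-ins, not CRuby's or mmtk-ruby's real types.

```c
#include <stdbool.h>

/* Hypothetical, simplified object model; these are NOT CRuby's real headers. */
enum obj_type { TYPE_OBJECT, TYPE_STRING, TYPE_HASH, TYPE_DATA, TYPE_IMEMO };

struct obj_info {
    enum obj_type type;
    bool compare_by_identity; /* meaningful for TYPE_HASH only */
    bool has_exivar;          /* stands in for the EXIVAR flag */
    bool imemo_is_listed;     /* stands in for "those listed T_IMEMO" kinds */
};

/* Conservative test: answers "might this object pin its children?",
 * not "does it pin any child right now?". */
bool is_potential_pinning_parent(const struct obj_info *o)
{
    if (o->has_exivar)
        return true;
    switch (o->type) {
    case TYPE_DATA:
        /* We cannot tell whether this T_DATA marks with rb_gc_mark_movable. */
        return true;
    case TYPE_IMEMO:
        return o->imemo_is_listed;
    case TYPE_HASH:
        return o->compare_by_identity;
    default:
        return false;
    }
}
```

Because the test runs when an object is created or changes state, it cannot take into account whether a pinning field currently holds nil; that is why the list is only a list of potential pinning parents.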

We visit all PPPs before GC and pin their children (via pinning fields only), so when GC starts, those children won't move. Note that

  • those children are not kept alive, and
  • the PPPs' children's children are not recursively pinned.

After GC, re-visit the PPP list, and remove all dead objects from it. Unpin live objects in the PPP list.
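The pre-GC and post-GC passes described above can be sketched as follows. This is a minimal simulation, not the mmtk-ruby implementation: `struct obj`, `struct ppp_state` and the `ppp_*` functions are hypothetical names, the `live` flag stands in for the GC's reachability result, and the sketch assumes that "unpinning" means releasing the pins that were placed on children before the GC.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PIN_FIELDS 4
#define PPP_CAP 64

/* Hypothetical, simplified object. */
struct obj {
    bool live;                              /* set by the (simulated) GC trace */
    bool pinned;
    struct obj *pin_fields[MAX_PIN_FIELDS]; /* NULL plays the role of nil */
};

struct ppp_state {
    struct obj *ppps[PPP_CAP];                    /* the PPP list */
    size_t n_ppps;
    struct obj *pinned[PPP_CAP * MAX_PIN_FIELDS]; /* children pinned for this GC */
    size_t n_pinned;
};

/* Called when a potential pinning parent is created. Returns false if full. */
bool ppp_register(struct ppp_state *st, struct obj *o)
{
    if (st->n_ppps == PPP_CAP)
        return false;
    st->ppps[st->n_ppps++] = o;
    return true;
}

/* Before GC: pin the direct children held in pinning fields.
 * This neither marks the children live nor recurses into their children. */
void ppp_pin_children(struct ppp_state *st)
{
    for (size_t i = 0; i < st->n_ppps; i++) {
        for (size_t f = 0; f < MAX_PIN_FIELDS; f++) {
            struct obj *child = st->ppps[i]->pin_fields[f];
            if (child == NULL || child->pinned)
                continue;
            child->pinned = true;
            st->pinned[st->n_pinned++] = child;
        }
    }
}

/* After GC: release the pins on surviving children, then drop dead PPPs. */
void ppp_after_gc(struct ppp_state *st)
{
    for (size_t i = 0; i < st->n_pinned; i++) {
        if (st->pinned[i]->live)
            st->pinned[i]->pinned = false;
    }
    st->n_pinned = 0;

    size_t kept = 0;
    for (size_t i = 0; i < st->n_ppps; i++) {
        if (st->ppps[i]->live)
            st->ppps[kept++] = st->ppps[i];
    }
    st->n_ppps = kept;
}
```

Recording the pinned children in a side array means the post-GC pass never has to read the fields of a PPP that has already died.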

More language-neutral discussions on this topic are here: mmtk/mmtk-core#690

Ways to reduce the number of potential pinning parents

  • Introduce declarative marking. T_DATA objects that support declarative marking are not considered PPPs.
  • Whitelist T_DATA types in Ruby core/stdlib that are known not to pin children
  • Fix T_DATA types in Ruby core/stdlib, replace their rb_gc_mark with rb_gc_mark_movable, and introduce compaction functions for them.

Address-aware data structures

Global address-to-ID table

In Ruby, objects may optionally have an ID. Once the ID of an object has been observed, it never changes as long as the object is alive. Vanilla Ruby maintains two tables:

typedef struct rb_objspace {
    // ...
    st_table *id_to_obj_tbl;
    st_table *obj_to_id_tbl;
    // ...
} rb_objspace_t;

Those tables are updated when objects are moved (in gc_move), and entries are removed when objects die (in obj_free).

In MMTk, we should consider them weak maps, and use our existing weak reference processing framework to handle them. Effectively, those table entries are things that get garbage-collected when their owners (the objects) die. We can treat id_to_obj_tbl and obj_to_id_tbl together as one single bi-directional weak map.

Interestingly, this is exactly what WeakReferences are intended for. In Java documentation:

Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings.
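As a rough illustration of treating the two ID tables as one weak map, the sketch below processes a table after a copying GC: entries whose object died are dropped, and entries whose object moved get their address updated while the ID stays the same. Everything here is hypothetical (`id_table`, `forward_fn`, `lookup_forwarding`, and the `gc_result` stand-in for the collector's answer); it does not use `st_table` or any real MMTk API, and it uses a flat array with linear lookup so that no re-hashing is needed when a key's address changes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry serves both directions: object address <-> ID. */
struct id_entry { uintptr_t obj; uint64_t id; };
struct id_table { struct id_entry *entries; size_t len; };

/* Hypothetical GC query: returns the object's address after GC
 * (possibly unchanged), or 0 if the object is dead. */
typedef uintptr_t (*forward_fn)(uintptr_t old_addr, void *ctx);

/* Weak processing: drop entries of dead objects, update moved ones.
 * Returns the number of entries removed. */
size_t id_table_process_weak(struct id_table *t, forward_fn forward, void *ctx)
{
    size_t kept = 0;
    for (size_t i = 0; i < t->len; i++) {
        uintptr_t new_addr = forward(t->entries[i].obj, ctx);
        if (new_addr == 0)
            continue;                       /* owner died: entry is reclaimed */
        t->entries[kept].obj = new_addr;    /* the address may have changed */
        t->entries[kept].id = t->entries[i].id; /* the ID never changes */
        kept++;
    }
    size_t removed = t->len - kept;
    t->len = kept;
    return removed;
}

/* ID -> address; returns 0 if the ID is unknown. */
uintptr_t id_to_obj(const struct id_table *t, uint64_t id)
{
    for (size_t i = 0; i < t->len; i++)
        if (t->entries[i].id == id)
            return t->entries[i].obj;
    return 0;
}

/* Address -> ID; returns false if the object has no ID. */
bool obj_to_id(const struct id_table *t, uintptr_t obj, uint64_t *out)
{
    for (size_t i = 0; i < t->len; i++) {
        if (t->entries[i].obj == obj) {
            *out = t->entries[i].id;
            return true;
        }
    }
    return false;
}

/* Simulated GC result used for demonstration: to == 0 means "died". */
struct move_record { uintptr_t from; uintptr_t to; };
struct gc_result { const struct move_record *moves; size_t len; };

uintptr_t lookup_forwarding(uintptr_t old_addr, void *ctx)
{
    const struct gc_result *r = ctx;
    for (size_t i = 0; i < r->len; i++)
        if (r->moves[i].from == old_addr)
            return r->moves[i].to;
    return old_addr; /* not listed: survived in place */
}
```

With real hash tables, a moved object would also need to be re-inserted under its new address, since its hash depends on the address.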

Global address-to-gen_ivtbl map

If an object is not a T_OBJECT, its instance variables (@foo, @bar, ... in the Ruby language) are stored in an external generic instance variable table (gen_ivtbl), which is associated with the object via a global hash map (generic_iv_tbl_), with the object address as the key and the gen_ivtbl as the value.

This global map (generic_iv_tbl_ in variable.c) needs to be updated whenever an object is moved. In vanilla Ruby, this is done in gc_move, by calling rb_mv_generic_ivar.

Like the address-to-ID table, this is also a "canonical map", and should be treated as a weak map with weak keys.

@wks
Collaborator Author

wks commented Feb 15, 2023

Copying GC support has been merged to the main branches of mmtk-ruby and ruby.

It can now use the Immix plan and pass make btest.

PPPs are handled. Roots are currently pinned using the object-pinning API instead of black roots. We can use the black roots mechanism instead, and we can enable the movement of some non-pinning roots, too.

@wks
Collaborator Author

wks commented Nov 3, 2023

Issues about pinning roots (red/black roots) have been resolved. We probably don't need moving global roots now because very few global roots can be updated. MMTk-Ruby is already able to outperform vanilla Ruby on the Liquid benchmark at certain heap sizes after we started tuning performance. I am closing this issue.

@wks wks closed this as completed Nov 3, 2023