Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact object data in heap snapshots, with option to opt-out #55326

Merged
merged 8 commits into from
Aug 25, 2024

Conversation

kpamnany
Copy link
Contributor

The contents of strings can contain user data which may be proprietary and emitting them in the heap snapshot makes the heap snapshot a potential vulnerability rather than a useful debugging artifact.

There are likely other tweaks necessary to make heap snapshots "safe", but this is one less.

@IanButterworth
Copy link
Sponsor Member

Seems like a good default, but might be worth putting behind a switch as the string content might be helpful in understanding what it is?

@gbaraldi
Copy link
Member

Right. I agree with Ian. And i've used the string content before to try and localize where it came from. Maybe we add a redacted=true option to the snapshot?

@oscardssmith
Copy link
Member

we could go further then this and zero out all bitstypes

@kpamnany
Copy link
Contributor Author

kpamnany commented Aug 1, 2024

How do folks feel about adding a global option into jl_options that would be used in this case, and also in other places in the runtime such as exception printing?

@IanButterworth
Copy link
Sponsor Member

I'd want the ability to decide to not redact when collecting a heap snapshot within a session, not require restarting.

@NHDaly
Copy link
Member

NHDaly commented Aug 8, 2024

yeah, +1 to ian's comment. i think this should go in the API request, rather than in the whole runtime. I'll leave a comment in the code

src/gc-heap-snapshot.cpp Outdated Show resolved Hide resolved
@NHDaly
Copy link
Member

NHDaly commented Aug 8, 2024

we could go further then this and zero out all bitstypes

@oscardssmith: Agreed. If there are any other sources of data in the snapshot, then yeah we should not emit them.

However, when I made the same point internally, the team pointed out that strings are the only primitive data values we actually copy out from memory into the snapshot. For everything else, we only copy object sizes, types, and their outbound pointers.

So there shouldn't be any other isbits data in these snapshots as far as I know. @IanButterworth and @gbaraldi can you confirm that?

@NHDaly
Copy link
Member

NHDaly commented Aug 8, 2024

Thanks for the PR, Kiran! :)

@kpamnany kpamnany requested a review from NHDaly August 8, 2024 21:00
@kpamnany
Copy link
Contributor Author

kpamnany commented Aug 8, 2024

Please take another look @IanButterworth, @gbaraldi and @oscardssmith?

@kpamnany kpamnany changed the title Do not emit string content in heap snapshots Add an option to redact object data in heap snapshots Aug 8, 2024
@IanButterworth IanButterworth added needs tests Unit tests are required for this change needs news A NEWS entry is required for this change profiler labels Aug 12, 2024
Comment on lines +284 to +301
sshot = read(fname, String)
@test sshot != ""
@test !contains(sshot, "redact_this")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add another test that if you set redact_data=false, it does have the string data? 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks again, kiran, this is a nice change.

I left a couple tiny suggestions, should be easy to just apply, and then one more request for another test. After that, LGTM feel free to merge 😎

@IanButterworth IanButterworth removed the needs tests Unit tests are required for this change label Aug 16, 2024
@IanButterworth
Copy link
Sponsor Member

Just needs a news entry.

@IanButterworth IanButterworth changed the title Add an option to redact object data in heap snapshots Redact object data in heap snapshots, with option to opt-out Aug 18, 2024
NEWS.md Outdated Show resolved Hide resolved
@IanButterworth IanButterworth added status:merge me PR is reviewed. Merge when all tests are passing and removed needs news A NEWS entry is required for this change labels Aug 20, 2024
@kpamnany
Copy link
Contributor Author

Not sure what's up with CI. Hope you can get this merged Ian!

@DilumAluthge DilumAluthge merged commit 383c8ef into master Aug 25, 2024
7 checks passed
@DilumAluthge DilumAluthge deleted the kp/safe-heap-snapshot branch August 25, 2024 16:02
@DilumAluthge DilumAluthge removed the status:merge me PR is reviewed. Merge when all tests are passing label Aug 25, 2024
kpamnany added a commit to RelationalAI/julia that referenced this pull request Aug 26, 2024
…ng#55326)

The contents of strings can contain user data which may be proprietary
and emitting them in the heap snapshot makes the heap snapshot a
potential vulnerability rather than a useful debugging artifact.

There are likely other tweaks necessary to make heap snapshots "safe",
but this is one less.

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
kpamnany added a commit to RelationalAI/julia that referenced this pull request Aug 26, 2024
…ng#55326) (#174)

The contents of strings can contain user data which may be proprietary
and emitting them in the heap snapshot makes the heap snapshot a
potential vulnerability rather than a useful debugging artifact.

There are likely other tweaks necessary to make heap snapshots "safe",
but this is one less.

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
KristofferC pushed a commit that referenced this pull request Sep 12, 2024
The contents of strings can contain user data which may be proprietary
and emitting them in the heap snapshot makes the heap snapshot a
potential vulnerability rather than a useful debugging artifact.

There are likely other tweaks necessary to make heap snapshots "safe",
but this is one less.

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants