-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Support large json in image #48569
[Enhancement] Support large json in image #48569
Conversation
9570084
to
2e6c0fa
Compare
Path checksumPath = Path.of(imageDir, "v2", Storage.CHECKSUM + "." + imageJournalId); | ||
File checksumFile = new File(checksumPath.toUri()); | ||
if (!checksumFile.exists()) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should treat checksum seriously, this shouldn't happend, we should throw an exception explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
long expectedCheckSum = Long.parseLong(Files.readString(checksumPath)); | ||
try (CheckedInputStream in = new CheckedInputStream(new FileInputStream(imageFile), new CRC32())) { | ||
byte[] bytes = new byte[8192]; | ||
while (in.read(bytes) != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add log to check how long does this take
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use the checkedInputStream to judge the checksum at the end. So no pre-check required.
while (in.read(bytes) != -1) { | ||
} | ||
|
||
long realCheckSum = in.getChecksum().getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, if we use checkedInputStream to read data from v2 image file at the beginning, we don't need to checksum again? just like SRMetaBlockReader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I use the checkedInputStream to judge the checksum at the end.
@Override | ||
public void writeJson(Object object) throws IOException, SRMetaBlockException { | ||
if (isBasicType(object)) { | ||
throw new SRMetaBlockException("can not write primitive type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two adjacent primitive types will be considered as one when reading.
d326c0c
to
e833dcc
Compare
Signed-off-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]>
1829033
to
5a5cf45
Compare
Signed-off-by: gengjun-git <[email protected]>
Signed-off-by: gengjun-git <[email protected]>
Quality Gate passedIssues Measures |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]❌ fail : 309 / 484 (63.84%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
@Mergifyio backport branch-3.3 |
✅ Backports have been created
|
Signed-off-by: gengjun-git <[email protected]> (cherry picked from commit 78715a7) # Conflicts: # fe/fe-core/src/main/java/com/starrocks/authentication/AuthenticationMgr.java # fe/fe-core/src/main/java/com/starrocks/http/meta/MetaService.java # fe/fe-core/src/main/java/com/starrocks/leader/Checkpoint.java # fe/fe-core/src/main/java/com/starrocks/server/LocalMetastore.java
Signed-off-by: gengjun-git <[email protected]> Co-authored-by: gengjun-git <[email protected]>
## Why I'm doing: Backport the stream reading of #48569 Signed-off-by: gengjun-git <[email protected]>
Why I'm doing:
Currently, JSON serialization supports a maximum of 2G, because serialization converts the object into a JSON string first, and then writes the string into the image file, and the maximum size of a string in Java is only 2G.
What I'm doing:
To support large json string, we can use the stream api of GSON:
JsonWriter
andJsonReader
.Since we cannot know the length of the string in advance, so a new image file format is introduced: a json string must begin with "{" or "[", so primitive type is not supported, and there is nothing between the json string. To support rollback, we use a double write solution. The dir format is as follows
meta/image/image.111
is the old format image file, andmeta/image/v2/image.111
is the new format image file. Since the JsonReader will cache some chars in advance, so we cannot calculate the checksum of the specific meta-block, and the checksum is put into another filechecksum.111
;What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: