This repository has been archived by the owner on Jul 25, 2020. It is now read-only.

the chinese string length incorrect #14

Open
1036202457 opened this issue Oct 18, 2016 · 5 comments

Comments

@1036202457

Here is my code:
String result = Pherialize.serialize("中文字长度");
Here is the result:
s:5:"中文字长度";
In PHP 5.6.9, the serialized result is:
s:15:"中文字长度";
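The two results differ because they count different things: Java's String.length() counts UTF-16 code units, while the length in PHP's serialized form is a byte count. A minimal sketch of the difference, assuming the PHP side stores the string as UTF-8 (the class and helper names are only for illustration):

```java
import java.nio.charset.StandardCharsets;

class LengthMismatch {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "中文字长度";
        System.out.println(text.length());    // 5 UTF-16 code units
        System.out.println(utf8Length(text)); // 15 bytes, three per character
    }
}
```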

@ghost

ghost commented Oct 21, 2016

Yeah, any string containing characters outside the ASCII range gets the wrong length. Unserialization has the same problem: it reads characters beyond the end of the string. The length should be counted in bytes instead of characters.

Serialization is easy to fix: just use getBytes().length on the string. The problem is the unserialization.
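The serialization side could look roughly like the sketch below. It is not Pherialize's actual code; serializeString is a hypothetical helper, and it assumes the PHP consumer expects UTF-8. The charset is passed explicitly because the no-argument getBytes() uses the JVM's default charset, which may not be UTF-8.

```java
import java.nio.charset.StandardCharsets;

class StringSerializer {
    // Hypothetical helper: formats a string in PHP's s:N:"..."; form,
    // using the UTF-8 byte count (not the char count) as N.
    static String serializeString(String value) {
        int byteLength = value.getBytes(StandardCharsets.UTF_8).length;
        return "s:" + byteLength + ":\"" + value + "\";";
    }

    public static void main(String[] args) {
        System.out.println(serializeString("中文字长度")); // s:15:"中文字长度";
        System.out.println(serializeString("abc"));        // s:3:"abc";
    }
}
```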

EDIT: I think for unserialization the best bet is

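// Needs java.nio.charset.StandardCharsets and java.util.Arrays.
int index, length; // index: char offset where the string starts; length: the declared byte length

String aux = input.substring(index);
byte[] bytes = aux.getBytes(StandardCharsets.UTF_8); // assumes the data is UTF-8
String result = new String(Arrays.copyOfRange(bytes, 0, length), StandardCharsets.UTF_8);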
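The same idea as a runnable sketch, assuming UTF-8 data. ByteSlice and readString are hypothetical names, and the offset 6 is simply where the payload starts in this particular input:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteSlice {
    // Hypothetical helper: returns the payload that starts at char offset
    // `index` in `input` and spans `length` UTF-8 bytes.
    static String readString(String input, int index, int length) {
        byte[] bytes = input.substring(index).getBytes(StandardCharsets.UTF_8);
        return new String(Arrays.copyOfRange(bytes, 0, length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "s:15:\"中文字长度\";";
        // The payload starts right after the opening quote, at char offset 6.
        String result = readString(input, 6, 15);
        System.out.println(result);          // 中文字长度
        System.out.println(result.length()); // 5
    }
}
```

A parser built this way would then have to advance its position by result.length() chars rather than by the declared byte length.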

@killer0217

I rewrote the unserializeString function, and it now works on my side:

private Mixed unserializeString()
{
    int pos, length;

    pos = this.data.indexOf(':', this.pos + 2);
    length = Integer.parseInt(this.data.substring(this.pos + 2, pos));

    // The length in the serialized data counts bytes, but substring() and
    // charAt() index Java chars, and one Chinese character takes more than
    // one byte. Walk forward from the opening quote until `length` bytes are
    // consumed, counting each code point by its UTF-8 size (this assumes the
    // PHP side produced UTF-8).
    int startPos = pos + 2;
    int endPos = startPos;
    int consumed = 0;
    while (consumed < length) {
        int codePoint = this.data.codePointAt(endPos);
        consumed += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
        endPos += Character.charCount(codePoint);
    }
    String actual = this.data.substring(startPos, endPos);

    // Skip the closing quote and the semicolon.
    this.pos = endPos + 2;
    return new Mixed(actual);
}
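Another way to avoid converting between byte and char offsets is to keep the whole input as UTF-8 bytes and decode only the payload. A minimal sketch under that assumption; ByteReader and its methods are hypothetical and not part of Pherialize:

```java
import java.nio.charset.StandardCharsets;

class ByteReader {
    private final byte[] data;
    private int pos = 0;

    ByteReader(String serialized) {
        this.data = serialized.getBytes(StandardCharsets.UTF_8);
    }

    // Reads one s:N:"..."; token starting at the current byte position.
    String readString() {
        int colon = pos + 2;                 // skip "s:"
        while (data[colon] != ':') {
            colon++;
        }
        int length = Integer.parseInt(
            new String(data, pos + 2, colon - (pos + 2), StandardCharsets.US_ASCII));
        int start = colon + 2;               // skip ':' and the opening quote
        String value = new String(data, start, length, StandardCharsets.UTF_8);
        pos = start + length + 2;            // skip the closing quote and ';'
        return value;
    }

    public static void main(String[] args) {
        // Two adjacent non-ASCII strings, so the second read depends on the
        // position left behind by the first.
        ByteReader reader = new ByteReader("s:6:\"中文\";s:9:\"字长度\";");
        System.out.println(reader.readString()); // 中文
        System.out.println(reader.readString()); // 字长度
    }
}
```

Because every offset here is a byte offset, the declared length can be used directly and no correction for multi-byte characters is needed.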

@renbaolin

Is Chinese serialization and deserialization still not resolved? Strings serialized with Pherialize.serialize() cannot be parsed by PHP's unserialize().

@marcospassos

It looks like this repository is abandoned. I ran into the same issues, so I wrote this library:
https://github.com/marcospassos/java-php-serializer

@jontro

jontro commented Jun 25, 2018

@marcospassos From what I can read, your library does not support deserialization. This issue is not about serialization.
