What is the correct way to handle binary attributes like jpegPhoto? #82
I've narrowed this down a little bit. It looks like the data is already UTF-8 encoded by the time DumpTransaction writes this entry:
That's the log as viewed in an Emacs buffer, so the representation of non-printable characters may be unfamiliar. This later log entry confirms it in a more conventional format:
I'm using MyVD 1.0.6. I tracked the problem down to net.sourceforge.myvd.inserts.jdbc.JdbcEntrySet.hasMore(). It assumes that values coming from the database are strings:
Later, when it converts the values to LDAP form, it calls LDAPAttribute.add(String). It's probably somewhere inside the
I was able to test this by special-casing "jpegPhoto" in the above code. However, I don't know the proper way to decide which attributes ought to be binary and which ought to be strings (or numbers or whatever else LDAP allows). I'm guessing that the LDAP schema info read by MyVD tells what the target type should be, while the JDBC ResultSetMetaData would give the DB type. A complete and generic implementation would have a matrix of mappings from various DB types to various LDAP types, but a good heuristic might be to just use the DB type to decide binary vs. string. The LDAPAttribute class only provides for byte[] and String attribute values anyhow.
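A minimal sketch of the DB-type heuristic suggested above, assuming that a column whose JDBC type code is one of the binary `java.sql.Types` values should become a byte[] attribute value and everything else a String. The class and method names are hypothetical, not MyVD code; only `ResultSetMetaData.getColumnType()` and the `Types` constants are real JDBC API.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper, not part of MyVD: decides binary vs. string
// purely from the JDBC type code the driver reports for a column.
final class BinaryColumnHeuristic {

    // True for the java.sql.Types codes that denote raw byte data.
    static boolean isBinarySqlType(int sqlType) {
        switch (sqlType) {
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
            case Types.BLOB:
                return true;
            default:
                return false;
        }
    }

    // Applies the heuristic to a (1-based) column of a result set.
    static boolean columnIsBinary(ResultSetMetaData md, int column) throws SQLException {
        return isBinarySqlType(md.getColumnType(column));
    }

    public static void main(String[] args) {
        System.out.println(isBinarySqlType(Types.VARBINARY)); // true
        System.out.println(isBinarySqlType(Types.VARCHAR));   // false
    }
}
```

Drivers differ in which code they report for a given column (a MEDIUMBLOB is not necessarily reported as `Types.BLOB`), which is why the sketch accepts all four binary codes rather than just one.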
After a little more poking around: since JdbcInsert seems to read all DB values using getString(), I think it's up to the JDBC driver to decide how to produce that String. In my case, it's MySQL. After some limited looking at the MySQL driver source, I believe it tries to use the collation character encoding for the column. The binary datatypes have no collation character encoding, so it falls back to the JVM default charset, which in my case is UTF-8.
In other words, I think the JDBC driver is taking the contents of my binary column and converting it to a Java String on the assumption that it's UTF-8 encoded. Of course, it's not UTF-8 encoded, so the interpretation is faulty and may even include whatever a UTF-8 decoder substitutes when it sees bytes it doesn't understand. The result of that first round of UTF-8 decoding is then converted back to a byte[] when the string is handed to the LDAPAttribute class. If my original binary data were actually UTF-8 encoded, I think that decode/encode would probably give back the correct binary data. That wouldn't help, though, since the attribute value would still go on the wire UTF-8 encoded. I haven't yet been able to reproduce in standalone Java the series of decoding and encoding that I theorize is going on, but even if I did, the situation seems a bit fragile.
My plan now is to switch to storing the jpegPhoto data in my database as something safe (base64 or hex strings or something) that can be carried around as an ASCII-compatible string. I'll then use an insert to convert that ASCII-compatible string to the binary value that I need for the LDAP clients. I'm pretty sure that will work (Insert.postSearchEntry() seems to behave the way I want it to). If all that works out, the JdbcInsert handling of binary data still looks like a bug to me, but one that will no longer matter for my use case.
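A small standalone Java check of the decode/encode theory above, assuming the driver decodes the column bytes as UTF-8 (the hypothesis in this comment, not confirmed MyVD or Connector/J behavior). Java replaces bytes that are not valid UTF-8 with U+FFFD during decoding, so re-encoding cannot recover them; valid UTF-8 input, by contrast, survives the round trip. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Standalone check of the theory: decode bytes as UTF-8, re-encode as UTF-8.
final class Utf8RoundTrip {

    // Bytes that are not valid UTF-8 become U+FFFD when decoded,
    // so the original values are lost before re-encoding.
    static byte[] utf8RoundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "FF D8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Start of a JPEG: SOI marker (FF D8) followed by an APP0 marker (FF E0).
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("binary in:  " + hex(jpegStart));
        System.out.println("binary out: " + hex(utf8RoundTrip(jpegStart)));

        // "é" encoded as UTF-8 (C3 A9) is valid input and survives the round trip.
        byte[] utf8Text = {(byte) 0xC3, (byte) 0xA9};
        System.out.println("text in:    " + hex(utf8Text));
        System.out.println("text out:   " + hex(utf8RoundTrip(utf8Text)));
    }
}
```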
Thanks for the great details. We need to add a check against the binary attributes list in the JDBC insert and load the value as a blob when needed.
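A minimal sketch of the check described above, assuming a configurable list of LDAP attribute names that must be binary (the list, the class name, and how it would be configured in MyVD are all hypothetical). Columns mapped to those attributes are read with `ResultSet.getBytes()` instead of `getString()`, so the raw bytes never pass through a charset decoder.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not MyVD code: read columns mapped to "binary"
// LDAP attributes with getBytes() and everything else with getString().
final class BinaryAwareReader {

    // Stored lower-cased because LDAP attribute names are case-insensitive.
    private final Set<String> binaryAttributes;

    BinaryAwareReader(Collection<String> names) {
        Set<String> lower = new HashSet<>();
        for (String name : names) {
            lower.add(name.toLowerCase(Locale.ROOT));
        }
        this.binaryAttributes = Collections.unmodifiableSet(lower);
    }

    boolean isBinary(String ldapAttribute) {
        return binaryAttributes.contains(ldapAttribute.toLowerCase(Locale.ROOT));
    }

    // Returns byte[] for binary attributes (no charset decoding), String otherwise.
    Object readValue(ResultSet rs, String column, String ldapAttribute) throws SQLException {
        if (isBinary(ldapAttribute)) {
            return rs.getBytes(column);
        }
        return rs.getString(column);
    }

    public static void main(String[] args) {
        BinaryAwareReader reader =
                new BinaryAwareReader(Arrays.asList("jpegPhoto", "userCertificate"));
        System.out.println(reader.isBinary("JPEGPHOTO")); // true
        System.out.println(reader.isBinary("cn"));        // false
    }
}
```

The byte[] returned for a binary attribute would then be added to the LDAPAttribute as a byte[] value rather than a String one.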
The technique of storing the JPEGs as base64 and converting the values with an insert worked out well. The new insert is in the collection of inserts mentioned in issue #78.
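A sketch of the conversion step this workaround relies on, assuming the column holds standard (RFC 4648) base64 text, possibly with line breaks. Only the decoding is shown; how it plugs into a MyVD insert (for example from postSearchEntry()) is not, and the class name is hypothetical.

```java
import java.util.Base64;

// Hypothetical sketch of the conversion an insert could perform: the
// database stores base64 text, which is decoded to raw JPEG bytes.
final class Base64ToBinary {

    // Decodes standard (RFC 4648) base64, ignoring whitespace such as line breaks.
    static byte[] decode(String stored) {
        String compact = stored.replaceAll("\\s+", "");
        return Base64.getDecoder().decode(compact);
    }

    public static void main(String[] args) {
        byte[] bytes = decode("/9j/\n4A==");
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim()); // FF D8 FF E0
    }
}
```

Because base64 text is plain ASCII, it passes through getString() and any charset conversion unchanged, which is what makes this approach robust regardless of how the driver handles binary columns.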
I'm having problems supplying jpegPhoto attributes in LDAP results. I'd be happy to find out I was just doing something wrong.
The JPEG bytes are stored in a column of a MySQL table. I've tried VARBINARY and MEDIUMBLOB with similar (bad) results. If I save the column values to a file, various applications are happy to display them as JPEGs.
When I return the results as the "jpegPhoto" attribute in an inetOrgPerson, the values the client receives are not correct (I'm looking at the bytes on the wire via wireshark). It looks kind of like MyVD is taking that binary blob and delivering it UTF-8 encoded. Here is an example of the beginning of one of those JPEGs:
Database:
Wireshark:
The bytes with high bits look like they are translated to UTF-8 encodings.
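One mechanism that produces exactly the pattern described here, offered as an assumption rather than a diagnosis: if each raw byte becomes one char (as ISO-8859-1 decoding does) and the String is later encoded as UTF-8, every byte at or above 0x80 turns into a two-byte sequence while ASCII bytes pass through unchanged. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not MyVD or driver code: one byte per char
// (ISO-8859-1 decoding) followed by UTF-8 encoding expands high-bit bytes.
final class HighBitExpansion {

    static byte[] latin1ThenUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1); // byte 0xNN -> char U+00NN
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding and re-encoding with the same single-byte charset is lossless.
    static byte[] latin1RoundTrip(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Start of a typical JFIF header: FF D8 FF E0 00 10 "JFIF".
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0,
                            0x00, 0x10, 'J', 'F', 'I', 'F'};
        StringBuilder sb = new StringBuilder();
        for (byte b : latin1ThenUtf8(jpegStart)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        // C3 BF C3 98 C3 BF C3 A0 00 10 4A 46 49 46
        System.out.println(sb.toString().trim());
    }
}
```

The latin1RoundTrip method is there for contrast: the bytes are only corrupted when the decode and encode steps use different charsets, which is why keeping binary values as byte[] end to end avoids the problem entirely.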
Any ideas for things I should do or try?