Bugs in text splitting for DXF output #986

vagran · 2024-06-18T11:16:23Z

An invalid DXF file is produced when converting DWG to DXF with dwg2dxf. Long text (usually in group 1) is split incorrectly: the continuation line has no preceding group code. Additionally, Unicode code points are split across lines. Here is an example fragment of such output (sorry, I cannot share the full source file, it is proprietary):

XRECORD
  5
E331
102
{ACAD_REACTORS
330
E2CB
102
}
330
E2CB
100
AcDbXrecord
280
     1
 40
401321167
  1
{\fCalibri|b1|i0|c0|p34;\L3\fCalibri|b1|i0|c161|p34;.     \fCalibri|b1|i0|c0|p34;ΟΡΟΙ ΔΟΜΗΣΗΣ\P\fCalibri|b0|i0|c0|p34;\l \P  \fCalibri|b1|i0|c0|p34; 3.1 ΞΕΝΟΔΟΧΕΙΑΚΕΣ ΕΓΚΑΤΑΣΤΑΣΕΙΣ - ΕΚΤΟΣ ΣΧΕΔΙΟΥ\fCalibri|
b1|i0|c161|p34;\L\P\pxi-20.507,l23,t5.4931,2  # <<<< continuation without tag!
  1
3,24;\fCalibri|b0|i0|c161|p34;\l1.	ΘΕΣΗ ΓΗΠΕΔΟΥ		\fCalibri|b1|i0|c161|p34;\L\P\fCalibri|b0|i0|c161|p34;\l2.	ΝΟΜΟΙ & ΔΙΑΤΑΓΜΑΤΑ          \P\pi0,l0,tz;\P\P\P\P\P\P\pi-20.507,l23,t5.4931,23,24;3.	ΑΡΤΙΟΤΗΤΑ\P4.	ΣΥΝΤΕΛ?
?ΣΤΗΣ ΔΟΜΗΣΗΣ \fCalibri|b0|i0|c0|p34; # <<<< continuation without tag! unicode symbol broken, part is left on the  previous line.

Looking into the code, I suspect several problems:

libredwg/src/out_dxf.c

Line 1270 in 07c078a

while (len > 0)

1. The continuation group code is written only if the remaining length is greater than 255. This is probably a typo; it should check for greater than 0, like the block above it.
2. Unicode is not handled in any way. A code point may be split at an arbitrary byte.
3. According to the DXF specification, the text length limit is 250 characters, not 255.
4. According to the DXF specification, if a text field (group 1) is split, all partial fragments (starting from the first one) should have group 3, and the sequence should be terminated by the last fragment with group 1. There seems to be no place for such logic in the current implementation.
5. This 1024-byte limit looks very bad. Shouldn't it be increased with dynamic buffer allocation?
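The splitting rules listed above can be sketched roughly as follows. This is only an illustration, not the libredwg code: `utf8_safe_cut`, `write_split_text` and `DXF_CHUNK_MAX` are hypothetical names, the limit is counted in bytes here for simplicity, and the input is assumed to be valid NUL-terminated UTF-8.

```c
#include <stdio.h>
#include <string.h>

#define DXF_CHUNK_MAX 250 /* per-fragment limit cited in the issue */

/* Hypothetical helper: largest cut position n <= max such that s[0..n)
   does not end in the middle of a UTF-8 sequence. */
static size_t
utf8_safe_cut (const char *s, size_t len, size_t max)
{
  size_t n = len < max ? len : max;
  if (n == len)
    return n; /* everything fits, no cut needed */
  /* Back up while the byte at the cut is a continuation byte (10xxxxxx). */
  while (n > 0 && ((unsigned char)s[n] & 0xC0) == 0x80)
    n--;
  /* Malformed input (only continuation bytes): cut at max to make progress. */
  return n > 0 ? n : max;
}

/* Hypothetical writer: every partial fragment gets group 3,
   the last (or only) fragment gets group 1. */
static void
write_split_text (FILE *fh, const char *text)
{
  size_t len = strlen (text);
  while (len > DXF_CHUNK_MAX)
    {
      size_t n = utf8_safe_cut (text, len, DXF_CHUNK_MAX);
      fprintf (fh, "%3d\n%.*s\n", 3, (int)n, text);
      text += n;
      len -= n;
    }
  fprintf (fh, "%3d\n%s\n", 1, text);
}
```

Because the string is walked in place and written fragment by fragment, this sketch needs no fixed-size intermediate buffer, which sidesteps the 1024-byte concern for this particular path.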

rurban · 2024-06-20T06:09:03Z

Yep, you nailed it. The splitter is very naive.

rurban · 2024-10-04T18:22:45Z

Fixed 1, 3, 4 and 5 so far. Proper Unicode rune splitting, it seems, has to be implemented by converting to UCS-2 and then back to UTF-8.

Fixes most parts of GH #986 (thanks to @vagran/Artyom Lebedev). What remains is proper UTF-8 length splitting: not at 250 bytes but at 250 runes. This needs to be done by converting overlong strings to UCS-2, splitting them at 250, and then outputting them as UTF-8.
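To illustrate the difference between a 250-byte and a 250-rune limit, here is a minimal sketch that counts code points directly in the UTF-8 string. It is an alternative to the UCS-2 round trip described above, not the fix that was committed; `utf8_prefix_bytes` is a hypothetical helper and the input is assumed to be valid NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the longest prefix of s that
   contains at most max_runes code points. */
static size_t
utf8_prefix_bytes (const char *s, size_t max_runes)
{
  size_t bytes = 0, runes = 0;
  while (s[bytes] != '\0')
    {
      /* Any byte that is not a continuation byte starts a new code point. */
      if (((unsigned char)s[bytes] & 0xC0) != 0x80)
        {
          if (runes == max_runes)
            break; /* the next code point would exceed the limit */
          runes++;
        }
      bytes++;
    }
  return bytes;
}
```

Calling `utf8_prefix_bytes (text, 250)` would give the byte count of the first fragment. For Greek text like the sample in this issue, where each letter takes two bytes, that fragment can be up to 500 bytes long while still holding only 250 runes.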

rurban self-assigned this Jul 12, 2024

rurban added the bug and blocking labels Jul 12, 2024

rurban added a commit that referenced this issue Oct 4, 2024

dxf: fix dxf_fixup_string logic

b6b2b9e

Fixes most parts of GH #986 Remaining is proper utf8-len splitting. not 250 bytes but runes.

rurban added a commit that referenced this issue Oct 6, 2024

dxf: don't split utf-8 sequences, cquote all

d8828d5

Fixes rest of GH #986

rurban added a commit that referenced this issue Oct 14, 2024

dxf: don't split utf-8 sequences, cquote all

cdef1ca

Fixes rest of GH #986

rurban added a commit that referenced this issue Oct 15, 2024

dxf: don't split utf-8 sequences, cquote all

a5026a4

Fixes rest of GH #986
