Support literals encoding conversions according to the literal type. #162

AhmedMaher309 · 2024-08-03T20:29:01Z

Adds support for encoding conversion of literals: national literals are stored in UTF-16, alphanumeric literals in ISO-8859-15, and UTF-8 literals in UTF-8.

Changes in detail:

- Handling of single-byte and UTF-8 encoded source files when constructing literals
- Internal conversions between ISO-8859-15, UTF-8 and UTF-16 according to the literal type (alphanumeric, national, and UTF-8 literals)
- National literals stored as UTF-16, alphanumeric literals stored as ISO-8859-15
- New function for building UTF-8 literals, which are stored as UTF-8
- Command-line argument for specifying the source file encoding; if none is given, the encoding of the current locale is used
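The PR performs these conversions with iconv. To show what each literal type ends up storing without depending on which iconv modules are installed, the following sketch hand-rolls the mapping instead of calling iconv. It assumes UTF-8 source text and big-endian UTF-16 for national literals (the byte order is an assumption; the PR only says UTF-16), and `enum lit_kind`, `convert_literal` and the other names are hypothetical, not GnuCOBOL functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical literal categories, mirroring the three kinds in the PR. */
enum lit_kind { LIT_ALPHANUMERIC, LIT_NATIONAL, LIT_UTF8 };

/* Decode one UTF-8 sequence at s (n >= 1 bytes available) into *cp.
   Returns the sequence length, or 0 for malformed/overlong input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return *cp >= 0x80 ? 2 : 0;
	}
	if (n >= 3 && (s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
	    && (s[2] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x0F) << 12)
		    | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return (*cp >= 0x800 && (*cp < 0xD800 || *cp > 0xDFFF)) ? 3 : 0;
	}
	if (n >= 4 && (s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
	    && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
		    | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
		return (*cp >= 0x10000 && *cp <= 0x10FFFF) ? 4 : 0;
	}
	return 0;
}

/* Map a code point to ISO-8859-15, or -1 if it has no code there.
   Latin-9 equals Latin-1 except for eight positions (0xA4 is the euro sign). */
static int latin9_encode(uint32_t cp)
{
	static const uint32_t at[8] = { 0x20AC, 0x0160, 0x0161, 0x017D,
	                                0x017E, 0x0152, 0x0153, 0x0178 };
	static const unsigned char pos[8] = { 0xA4, 0xA6, 0xA8, 0xB4,
	                                      0xB8, 0xBC, 0xBD, 0xBE };
	int k;

	for (k = 0; k < 8; k++) {
		if (cp == at[k])
			return pos[k];
		if (cp == pos[k])
			return -1;	/* Latin-1 character replaced in Latin-9 */
	}
	return cp < 0x100 ? (int)cp : -1;	/* e.g. U+2500 box drawing: no code */
}

/* Convert a UTF-8 literal body to the storage encoding of its kind.
   Returns bytes written, or -1 on malformed input, unmappable
   characters, or a too-small output buffer. */
static long convert_literal(enum lit_kind kind, const char *src,
                            unsigned char *out, size_t outsize)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t n = strlen(src), i = 0, o = 0, len;
	uint32_t cp;

	while (i < n) {
		len = utf8_decode(s + i, n - i, &cp);
		if (len == 0)
			return -1;
		if (kind == LIT_UTF8) {			/* validated, stored unchanged */
			if (o + len > outsize)
				return -1;
			memcpy(out + o, s + i, len);
			o += len;
		} else if (kind == LIT_NATIONAL) {	/* UTF-16BE */
			if (cp >= 0x10000) {		/* surrogate pair */
				uint32_t v = cp - 0x10000;
				if (o + 4 > outsize)
					return -1;
				out[o++] = (unsigned char)(0xD8 | (v >> 18));
				out[o++] = (unsigned char)((v >> 10) & 0xFF);
				out[o++] = (unsigned char)(0xDC | ((v >> 8) & 0x03));
				out[o++] = (unsigned char)(v & 0xFF);
			} else {
				if (o + 2 > outsize)
					return -1;
				out[o++] = (unsigned char)(cp >> 8);
				out[o++] = (unsigned char)(cp & 0xFF);
			}
		} else {				/* alphanumeric: ISO-8859-15 */
			int b = latin9_encode(cp);
			if (b < 0 || o + 1 > outsize)
				return -1;
			out[o++] = (unsigned char)b;
		}
		i += len;
	}
	return (long)o;
}
```

With this sketch, the UTF-8 text "Aé" as a national literal becomes the four bytes `00 41 00 E9`, and "€" as an alphanumeric literal becomes the single byte `0xA4`. A box-drawing character such as U+2500 is rejected for alphanumeric literals, because ISO-8859-15 has no code for it.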

GitMensch

While the overall approach is fine (documentation and tests for UTF-8 are missing, but both should come after the new command-line argument), there are some things to adjust, as detailed below. Please try to address those by Tuesday and push when finished.

cobc/typeck.c

tests/testsuite.src/syn_definition.at

cobc/typeck.c

cobc/tree.h

cobc/scanner.l

cobc/tree.c

cobc/typeck.c

GitMensch

Nice next commits; a testcase for that feature and ChangeLog entries for the last commits seem to be missing.

cobc/cobc.c

cobc/flag.def

GitMensch · 2024-08-10T14:37:18Z

In this case (after my push), opening the file will produce an error if that really was the encoding, and iconv will return an error on first use if the given encoding does not match the input.
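A minimal sketch of that behaviour, combined with the locale fallback from the PR description: an explicit encoding wins, otherwise `nl_langinfo(CODESET)` of the current locale is used. Because `iconv_open()` only sees the encoding names, a mismatch between the declared encoding and the actual bytes surfaces only when `iconv()` fails with `EILSEQ`. The helper names `pick_source_encoding` and `check_source_bytes` are hypothetical, not the functions in `cobc.c`.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* An explicit command-line value wins; otherwise use the locale's codeset.
   Note: this changes the process-wide LC_CTYPE setting. */
static const char *pick_source_encoding(const char *cmdline_value)
{
	if (cmdline_value != NULL && *cmdline_value != '\0')
		return cmdline_value;
	setlocale(LC_CTYPE, "");
	return nl_langinfo(CODESET);
}

/* Returns 0 if `text` converts cleanly from `from_enc`, 1 if iconv rejects
   a byte sequence (EILSEQ: declared encoding does not match the input),
   2 if iconv_open fails (e.g. unknown encoding name), 3 for other errors
   such as a truncated sequence at the end (EINVAL). */
static int check_source_bytes(const char *from_enc, const char *text, size_t len)
{
	char outbuf[256];
	char *in = (char *)text;	/* POSIX iconv() takes char ** input */
	char *out = outbuf;
	size_t inleft = len, outleft = sizeof outbuf;
	int rc = 0;
	iconv_t cd = iconv_open("UTF-8", from_enc);

	if (cd == (iconv_t)-1)
		return 2;
	while (inleft > 0) {
		if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1) {
			if (errno == E2BIG) {	/* output full: reuse the buffer */
				out = outbuf;
				outleft = sizeof outbuf;
				continue;
			}
			rc = (errno == EILSEQ) ? 1 : 3;
			break;
		}
	}
	iconv_close(cd);
	return rc;
}
```

For example, Latin-9 bytes such as `caf\xE9 au lait` declared as UTF-8 open fine but fail on the first conversion with `EILSEQ`.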

GitMensch · 2024-08-10T21:36:31Z

cobc/ChangeLog

+2024-07-06 Ahmed Maher <[email protected]>
+
+	* tree.c: added encoding conversion support to cb_build_national_literal and
+	cb_build_alphanumeric_literal


indent by two spaces

GitMensch · 2024-08-12T19:21:14Z

@AhmedMaher309: I've fixed the merge conflict in the ChangeLog so the CI can run here. Note that I've also applied the ChangeLog format to your entries; please have a look to get an idea of how this is done (quite commonly, especially in GNU projects). You may also want to check the GNU Coding Standards on this for a better understanding.

I'm looking forward to your documentation entry on the encoding topic, even though we can't consider it for the final evaluation because the deadline is now over. [Note: I plan to do the evaluation part by Thursday; it's just been quite a busy week so far.]

GitMensch · 2024-08-12T19:23:40Z

For the failing CI build, you may want to use { } around that :-)
For the UTF-8 test, let's inspect that later this week.
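The comment does not say what "that" refers to. Assuming the CI failure was one of the C89 errors later fixed in `cobc.c`, `codegen.c` and `tree.c` (for example a declaration placed after statements or directly after a `case` label), the usual remedy is to open a new block with { } so the declaration is its first item. The function below is a hypothetical illustration, not code from the PR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): length of s, optionally without
   trailing spaces. Written so that it compiles as C89. */
static size_t literal_length(const char *s, int trim)
{
	size_t len;		/* C89: declarations before the first statement */

	len = strlen(s);
	switch (trim) {
	case 0:
		break;
	default:
	{
		/* A declaration may not directly follow a case label (before C23),
		   and C89 forbids it after statements; the braces start a new
		   block in which it is the first item. */
		const char *end = s + len;

		while (end > s && end[-1] == ' ')
			end--;
		len = (size_t)(end - s);
		break;
	}
	}
	return len;
}
```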

GitMensch · 2024-08-13T08:01:24Z

I know that's a very late thought (past the GSoC project timeline)...

But we definitely should add an option to set cb_iconv.alphanumeric via the command line. Otherwise we force it to ISO-8859-15, whereas previously people used CP850 on Windows to draw TUI symbols; that won't work any more and will return iconv errors, as these symbols are not part of ISO-8859-1 at all.
... and of course this may also have an effect at runtime, so we should at least add this to the cob_module structure as char *alphanumeric_encoding and set it if it isn't ISO-8859-15. A later addition in libcob would then need to handle a very specific case: encodings that have digits at a non-native place (0x30-0x39 vs. 0xF0-0xF9). For ease, this should likely be a separate flag, and the conversion of digits won't need iconv but only a simple lookup table.
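A sketch of the digit-only mapping suggested above. The review proposes a `char *alphanumeric_encoding` member in `cob_module`; the struct below is a hypothetical stand-in, and the `digits_at_f0` flag and the function names are illustrative only. A 256-entry table maps 0xF0-0xF9 to 0x30-0x39 (or the reverse) without involving iconv.

```c
#include <stddef.h>

/* Hypothetical per-module encoding info: alphanumeric_encoding stays NULL
   when the default ISO-8859-15 applies; digits_at_f0 flags encodings whose
   digits live at 0xF0-0xF9 instead of 0x30-0x39. */
struct module_encoding_info {
	const char *alphanumeric_encoding;
	int digits_at_f0;
};

/* Translate digits between 0x30-0x39 and 0xF0-0xF9 via a lookup table;
   all other bytes pass through unchanged. Only meaningful for data in an
   encoding flagged with digits_at_f0. */
static void translate_digits(unsigned char *buf, size_t len, int to_f0)
{
	unsigned char table[256];
	size_t i;

	for (i = 0; i < 256; i++)
		table[i] = (unsigned char)i;
	for (i = 0; i < 10; i++) {
		if (to_f0)
			table[0x30 + i] = (unsigned char)(0xF0 + i);
		else
			table[0xF0 + i] = (unsigned char)(0x30 + i);
	}
	for (i = 0; i < len; i++)
		buf[i] = table[buf[i]];
}

/* Apply the mapping only when the module flag asks for it. */
static void module_digits_to_native(const struct module_encoding_info *info,
                                    unsigned char *buf, size_t len)
{
	if (info->digits_at_f0)
		translate_digits(buf, len, 0);
}
```

Keeping this as a separate flag means the common case (digits already at 0x30-0x39) costs a single test of `digits_at_f0`.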

GitMensch · 2024-08-21T10:27:51Z

You don't seem to have pushed your last changes, have you?


cobc:
* cobc.c, codegen.c, tree.c: fixed C89 errors
* typeck.c (trimmed_size, is_blank): minor refactorings
* typeck.c: provide the original buffer if encoding cannot be applied

run_functions.at: testsuite drafts (currently failing) for encoding related issues

testsuite environment update
tests:
* atlocal.in (set_utf8_locale): new function for the testsuite enabling tests to either run with UTF8 locale or skip
* atlocal.in: use configure-setup for the grep binary
* atlocal_win: updated to current atlocal.in [to be used for encoding and possibly screenio tests]


GitMensch requested changes Aug 4, 2024

GitMensch reviewed Aug 5, 2024

cobc/typeck.c (outdated, resolved)

GitMensch requested changes Aug 9, 2024

cobc/cobc.c (outdated, resolved)

cobc/cobc.c (outdated, resolved)

cobc/flag.def (resolved)

GitMensch reviewed Aug 10, 2024

AhmedMaher309 marked this pull request as ready for review August 11, 2024 13:02

AhmedMaher309 added 20 commits September 29, 2024 23:32

install cobc

b4ded0a

update tree.c/build_literal

91bb975

modifing build_literal function to put the first version of literals encoding handling

refactor the build_literal function

484803e

Delete .local directory

1cf7e8e

remove configs

eacc744

added switch cases on category to build_literal fucntion

c1366f0

Handling the case of N'text' in National literals, but not space padding encoding conversion

49b78b0

handling N'text' case in National literals but no space padding conversion yer

f7574e1

added the parts of handling the padding in the literals

cf73509

National literals space padding is now handled

61c854c

removed the build directory

29a8580

added a debug line to build_literal function

5692732

Revert "added a debug line to build_literal function"

e49844a

This reverts commit 0fc258f.

made the encoding conversions in the two function that calls build_literal instead of modifying build_literal itself

94899e4

modify the code to make iconv_open(3) only twice when cobc starts

9d77577

Refactor the valid_move function

8c1c59a

refactor validate_move function

8c92a4a

refactored scan_x function

57ed3f5

fixed indentation

964877f

Fix indentation

5d96092

AhmedMaher309 and others added 24 commits September 30, 2024 00:03

Revert "fixed the testcase in syn_defintion and the refactored typeck.c/validate_alphabet"

0819ba3

This reverts commit da5a954.

code refactoring

a307e44

adding the test for utf8

fe0c8a4

refactoring tree.c/cb_build_UTF8_literal and cobc.c/process_command_line

e2fd1a6

refactor tree.c/cb_build_national_literal

fe46b03

added the option to use the locale if the source encoding is not specified in by the command line

505125d

replaced spaces indentation with tabs

affce1a

fixed cobc.c/initialize_cb_iconv

e47d0d7

fix cobc.c/initialize_cb_iconv

184aa1b

updated cobc/Changelog

a0c2ecf

updated NEWS file

2a1cf8f

last commit for gsoc

add temporal (per file->UTF8 BOM) override

41c3f81

refactored command line option for source encoding

edfb788

added test for source encoding command line argument

75e3992

updated Changelog

72f57f3

added a chapter for encoding handling

23df41d

modified the chapter for encoding handling in doc/gnucobol.texi

516a717

updated doc/gnucobol.texi

6e25262

fixed doc/gnucobol.texi

82ae4d8

added new accepted encoding for source file in cobc.c/process_command_line

3dde1ab

update gnucobol.texi file

edb5bda

minor adjustments to character encoding code

713bbd6

cobc: * cobc.c, codegen.c, tree.c: fixed C89 errors * typeck.c (trimmed_size, is_blank): minor refactorings

added command line option for the alphanumeric literals encoding

7efa05c

AhmedMaher309 force-pushed the gcos4gnucobol-3.x branch from 5a2a864 to 7efa05c September 29, 2024 21:14

AhmedMaher309 added 5 commits September 30, 2024 00:25

added command line option for the alphanumeric literals encoding

abf83c6

Merge branch 'gcos4gnucobol-3.x' of https://github.com/AhmedMaher309/gnucobol into gcos4gnucobol-3.x

45d0951

fixing build and test issues

0096d02

fixing typeck/validate_alphabet

83e1272

fixed build errors

baaef94
