Support literals encoding conversions according to the literal type. #162

Open
wants to merge 94 commits into
base: gcos4gnucobol-3.x

Conversation

AhmedMaher309

@AhmedMaher309 AhmedMaher309 commented Aug 3, 2024

Added support for encoding conversion of literals according to their type: national literals are stored as UTF-16, alphanumeric literals as ISO-8859-15, and UTF-8 literals as UTF-8.

Changes in detail:

  1. Handling of single-byte and UTF-8 encoded source files when constructing literals
  2. Implementation of internal conversions between ISO-8859-15, UTF-8 and UTF-16 according to the literal type (alphanumeric, national, and UTF-8 literals)
  3. Enhanced support for national literals stored as UTF-16 and alphanumeric literals stored as ISO-8859-15
  4. New function for building UTF-8 literals, which are stored as UTF-8
  5. Command-line argument for specifying the source file encoding; the locale is used if no encoding is specified on the command line
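
As a rough illustration of items 2 to 4, the per-literal-type conversion could be sketched with POSIX `iconv` as below. This is not the code from the PR: the `literal_kind` enum, the `target_encoding` and `convert_literal` names, and the choice of big-endian UTF-16 (to avoid a byte-order mark) are all assumptions made for this example.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical literal categories; names are illustrative only. */
enum literal_kind { LIT_ALPHANUMERIC, LIT_NATIONAL, LIT_UTF8 };

/* Target encoding per literal kind, following the PR description.
   The explicit big-endian byte order is an assumption of this sketch. */
static const char *target_encoding(enum literal_kind kind)
{
    switch (kind) {
    case LIT_NATIONAL:
        return "UTF-16BE";
    case LIT_UTF8:
        return "UTF-8";
    default:
        return "ISO-8859-15";
    }
}

/* Convert src_len bytes of src (encoded as source_enc) into dst.
   Returns the number of bytes written, or (size_t)-1 on failure,
   in which case a caller could fall back to the original bytes. */
static size_t convert_literal(enum literal_kind kind, const char *source_enc,
                              const char *src, size_t src_len,
                              char *dst, size_t dst_size)
{
    iconv_t cd = iconv_open(target_encoding(kind), source_enc);
    char *in = (char *)src;   /* iconv takes a non-const pointer */
    char *out = dst;
    size_t in_left = src_len;
    size_t out_left = dst_size;
    size_t rc;

    if (cd == (iconv_t)-1) {
        return (size_t)-1;    /* unsupported encoding pair */
    }
    rc = iconv(cd, &in, &in_left, &out, &out_left);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &out, &out_left);
    }
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return (size_t)-1;    /* invalid input or output buffer too small */
    }
    return dst_size - out_left;
}
```

With a UTF-8 source file, a call such as `convert_literal(LIT_NATIONAL, "UTF-8", text, len, buf, sizeof buf)` would fill `buf` with two bytes per BMP character, while `LIT_ALPHANUMERIC` would produce one byte per character that exists in ISO-8859-15.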

Collaborator

@GitMensch GitMensch left a comment

While the overall approach is fine (documentation and tests for UTF-8 are missing, but both should come after the new command-line argument), there are some things to adjust, as detailed below. Please try to get to those by Tuesday and push when finished.

Collaborator

@GitMensch GitMensch left a comment

Nice next commits; a test case for that feature and the Changelog entries for the last commits seem to be missing.

@GitMensch
Collaborator

GitMensch commented Aug 10, 2024 via email

2024-07-06 Ahmed Maher <[email protected]>

* tree.c: added encoding conversion support to cb_build_national_literal and
cb_build_alphanumeric_literal
Collaborator

indent by two spaces

@AhmedMaher309 AhmedMaher309 marked this pull request as ready for review August 11, 2024 13:02
@GitMensch
Collaborator

@AhmedMaher309: I've fixed the merge conflict for the Changelog so the CI can run here. Note that I've also applied the Changelog format to your entries; please have a look to get an idea of how this is done (quite commonly, especially in the GNU space). You may also want to check the GNU coding guide on this for a better understanding.

I'm looking forward to your documentation entry on the encoding topic, even though we can't consider it for the final evaluation (because the deadline has now passed). [Note: I plan to do the evaluation part by Thursday; it has just been quite a busy week so far.]

@GitMensch
Collaborator

For the failing CI build - you may want to use { } around that :-)
For the UTF8 test - let's inspect that later this week.

@GitMensch
Collaborator

I know that's a very late thought (post GSOC project timeline)...

But we definitely should add the option to set cb_iconv.alphanumeric via the command line. Otherwise we force it to ISO-8859-15, whereas people previously used CP850 on Windows to draw TUI symbols; that won't work any more and will return iconv errors, as these symbols are not part of ISO-8859-15 at all.
... and of course this may also have an effect at runtime, so we should at least add this to the cob_module structure as char *alphanumeric_encoding and set it if it isn't ISO-8859-15. A later addition in libcob would then need to handle a very specific case: encodings that have digits at a non-native position (0x30-0x39 vs. 0xF0-0xF9). For ease, this should likely be a separate flag, and the conversion of digits won't need iconv, just a simple lookup table.
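
To make the lookup-table idea concrete, here is a minimal sketch of remapping digits between the 0x30-0x39 and 0xF0-0xF9 ranges without iconv. The function names `init_digit_table` and `translate_digits` are hypothetical and nothing like them is confirmed to exist in libcob; leaving every non-digit byte unchanged is a simplification of this example.

```c
#include <stddef.h>

/* Fill a 256-entry table that maps the ten digits starting at from_zero
   to the ten bytes starting at to_zero; every other byte maps to itself.
   Both start values must be at most 0xF6 so the ranges fit in a byte. */
static void init_digit_table(unsigned char table[256],
                             unsigned char from_zero,
                             unsigned char to_zero)
{
    int i;

    for (i = 0; i < 256; i++) {
        table[i] = (unsigned char)i;
    }
    for (i = 0; i < 10; i++) {
        table[from_zero + i] = (unsigned char)(to_zero + i);
    }
}

/* Translate a buffer in place, one table lookup per byte. */
static void translate_digits(const unsigned char table[256],
                             unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        buf[i] = table[buf[i]];
    }
}
```

Calling `init_digit_table(table, 0x30, 0xF0)` once and then `translate_digits` per field would move digits from the ASCII positions to the 0xF0-0xF9 positions; swapping the two arguments builds the reverse table.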

@GitMensch
Collaborator

You don't seem to have pushed your last changes, have you?

AhmedMaher309 and others added 24 commits September 30, 2024 00:03
last commit for gsoc
cobc:
* cobc.c, codegen.c, tree.c: fixed C89 errors
* typeck.c (trimmed_size, is_blank): minor refactorings
* typeck.c: provide the original buffer if encoding cannot be applied

run_functions.at: testsuite drafts (currently failing) for encoding related issues

testsuite environment update

tests:
* atlocal.in (set_utf8_locale): new function for the testsuite enabling tests to either run with UTF8 locale or skip
* atlocal.in: use configure-setup for the grep binary
* atlocal_win: updated to current atlocal.in

[to be used for encoding and possibly screenio tests]