Support for charsets and unify internal character storage type #3

mplucinski · 2014-06-20T23:22:55Z

This pull request introduces two changes:

2e6add4: wherever text characters are processed, replace the char type with the YY_CHAR macro. This draws a distinction between a character as a byte of a file ("physical") and a character as the smallest unit of text ("logical"). Logical characters should use the YY_CHAR macro. For now both represent the same value, but in the future YY_CHAR may be replaced by a wider type. This is a prerequisite for supporting extended character sets (any set with more than 256 code points).

To introduce YY_CHAR consistently, some tests no longer use the str* family of standard C functions and instead use versions that operate on YY_CHAR values. The new functions live in the tests/strutils.h header, and the test case for that header is located in the tests/test-tests_strutils directory.

f1456f9, 0d3107e: add support for character sets. The first commit adds the test case, the second the actual support. This provides a way to convert between "physical" bytes and "logical" characters with a user-provided function, so input data can be encoded in a different character set than the one used to write the input ".l" file.
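
To illustrate the first change, here is a minimal sketch of string helpers that operate on YY_CHAR values instead of char. The names yy_strlen and yy_strcmp are hypothetical and are not taken from tests/strutils.h. Defining YY_CHAR as char is also an assumption that mirrors the "same value for now" statement above.

```c
#include <stddef.h>

/* Assumption: YY_CHAR is currently the same as char. A wider type could
 * be substituted here later without touching the helpers below. */
#ifndef YY_CHAR
#define YY_CHAR char
#endif

/* Hypothetical helper: length of a zero-terminated YY_CHAR string. */
static size_t yy_strlen(const YY_CHAR *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: compare two zero-terminated YY_CHAR strings.
 * Returns -1, 0 or 1 according to the ordering of YY_CHAR values. */
static int yy_strcmp(const YY_CHAR *a, const YY_CHAR *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    if (*a == *b)
        return 0;
    return (*a < *b) ? -1 : 1;
}
```

Because the helpers only touch YY_CHAR values, a test written against them would keep compiling if YY_CHAR later became a wider type, which is not true of code that calls strlen or strcmp directly.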

Charset support is completely optional and is activated by %option charset. When it is active, two new variables become available: yycharset and yycharset_handler. The first should be set to the character set currently in use before calling yylex(). The second should be set to the function that will be called to convert incoming bytes into characters. Additionally, %option charset-source="ENCODING" should be set to the name of the encoding used to write the ".l" file; it is used as the scanner's internal character encoding. When incoming data is in the same encoding as the internal one, the conversion function is not used.
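
The conversion step described above might look roughly like the following sketch. The actual types of yycharset and yycharset_handler are not shown in this description, so the handler signature (charset_handler_fn), the fill_buffer function and the parameter names are all hypothetical stand-ins. ISO-8859-1 and UTF-8 are used only as example encodings.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: YY_CHAR is currently the same as char. */
#ifndef YY_CHAR
#define YY_CHAR char
#endif

/* Hypothetical handler type: converts in_len input bytes into at most
 * out_cap internal characters and returns how many were written. */
typedef size_t (*charset_handler_fn)(const char *in, size_t in_len,
                                     YY_CHAR *out, size_t out_cap);

/* Example handler: ISO-8859-1 input bytes to UTF-8 code units. */
static size_t latin1_to_utf8(const char *in, size_t in_len,
                             YY_CHAR *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char b = (unsigned char)in[i];
        if (b < 0x80) {
            if (n + 1 > out_cap)
                break;
            out[n++] = (YY_CHAR)b;
        } else {
            /* Latin-1 bytes 0x80..0xFF become two UTF-8 code units. */
            if (n + 2 > out_cap)
                break;
            out[n++] = (YY_CHAR)(0xC0 | (b >> 6));
            out[n++] = (YY_CHAR)(0x80 | (b & 0x3F));
        }
    }
    return n;
}

/* Hypothetical buffer fill: the handler is bypassed when the input
 * charset matches the scanner's internal (source) charset. */
static size_t fill_buffer(const char *input_charset,
                          const char *source_charset,
                          charset_handler_fn handler,
                          const char *in, size_t in_len,
                          YY_CHAR *out, size_t out_cap)
{
    if (handler == NULL || strcmp(input_charset, source_charset) == 0) {
        size_t n = in_len < out_cap ? in_len : out_cap;
        for (size_t i = 0; i < n; i++)
            out[i] = (YY_CHAR)in[i];
        return n;
    }
    return handler(in, in_len, out, out_cap);
}
```

In this sketch, input_charset plays the role of yycharset, source_charset the role of the charset-source option, and handler the role of yycharset_handler. Comparing charset names with strcmp is a simplification; a real implementation would probably need to normalize aliases and letter case.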

mplucinski · 2014-06-20T23:36:44Z

Forgot to mention: the first commit also introduces a "char header file". The idea is to put the generated YY_CHAR definition into a separate file, so that the macro is available in places where the general scanner header cannot be included, e.g. Bison files.
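
As a rough sketch of the idea, the block below shows what a standalone char header could contain, followed by parser-side code that needs only YY_CHAR. The include-guard name, the token_text struct and make_token_text are hypothetical; the actual generated file is not shown here. Both parts are placed in one listing for brevity, whereas in practice the first part would be its own file.

```c
#include <stddef.h>

/* ---- hypothetical contents of the generated char header ---- */
#ifndef YY_CHAR_HEADER_INCLUDED
#define YY_CHAR_HEADER_INCLUDED
/* Assumption: YY_CHAR is currently the same as char. */
#ifndef YY_CHAR
#define YY_CHAR char
#endif
#endif /* YY_CHAR_HEADER_INCLUDED */

/* ---- parser-side code that needs YY_CHAR but not the scanner header ---- */

/* Hypothetical semantic value: a slice of matched text. */
struct token_text {
    const YY_CHAR *start;
    size_t length;
};

static struct token_text make_token_text(const YY_CHAR *start, size_t length)
{
    struct token_text t;
    t.start = start;
    t.length = length;
    return t;
}
```

A parser could include such a header to declare semantic values in terms of YY_CHAR without pulling in the rest of the scanner's declarations.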

westes · 2014-07-25T14:06:12Z

Could you rebase this onto the current tip of master?

The usual pull methods are generating conflicts.

mplucinski · 2014-07-25T14:19:04Z

Sure, I've just submitted it as request #8.

mplucinski added 3 commits June 21, 2014 00:44

Use YY_CHAR macro instead of char type in places where processed text is stored

2e6add4

Add test cases for charset support

f1456f9

Add implementation for charset support

0d3107e

mplucinski mentioned this pull request Jul 25, 2014

feature/charset #8

Open

mplucinski closed this Jul 25, 2014

triaxx mentioned this pull request Sep 20, 2017

flex-2.6.4: segfault during build on Linux #265

Closed

fooofei mentioned this pull request Sep 29, 2017

flex %option noline outputs a #line directive #268

Closed

westes mentioned this pull request Apr 16, 2018

Memory leaks in xstrdup #340

Closed

eric-s-raymond referenced this pull request in eric-s-raymond/flex Sep 21, 2020

Create a method table for the C back end, #3 in the retargeting patch series.

f1bfe65

db48x added a commit to db48x/flex that referenced this pull request Oct 26, 2020

Fix Next link in the Overview of the Retargeting section of the manual (westes#3)

ac97741
