Japanese Input Method #1387

tyama501 · 2022-08-01T14:21:11Z

tyama501
Aug 1, 2022

Hello,

I was searching for a Japanese input method (IME) and found an open source project.

Anthy-unicode
https://github.com/fujiwarat/anthy-unicode/blob/main/doc/GUIDE.english

Anthy was used on Linux, and this is the Unicode version.

There is a long way to go before ELKS can display and input languages other than English, but I am posting this just for information.

ghaerr · 2022-08-01T16:06:17Z

ghaerr
Aug 1, 2022
Maintainer

Hello @tyama501,

I have been thinking of Unicode for ELKS also, but there is one main problem on the IBM PC, which could also result in a similar problem on PC-98. That is, the text-mode character ROM only has 128 (additional) non-ASCII characters that can be displayed on the screen. Thus, any fancy input processing that resulted in a non-displayable character might not actually work very well for arbitrary input.

That said, with the recent work done to allow applications to run outside/off the console (i.e. via the serial terminal or a network telnet session), this problem has been solved by converting any Unicode character to UTF-8, which displays properly on most modern terminal emulators (e.g. all of those on Linux and macOS). This is how the IBM PC ROM character set is easily displayable outside the console: there is a CP 437 -> Unicode conversion table which readily allows for converting IBM PC character ROM output to Unicode for display on terminal emulators.
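
A minimal sketch of that conversion path, assuming a table lookup followed by UTF-8 encoding. The function names are hypothetical and only a handful of CP 437 entries are shown (a real table has 128 entries for bytes 0x80..0xFF); this is not the table or code that ELKS actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Excerpt of a CP 437 -> Unicode mapping (illustrative only).
 * Bytes below 0x80 are passed through; unlisted high bytes map to U+FFFD. */
uint16_t cp437_to_unicode(uint8_t b)
{
    if (b < 0x80)
        return b;
    switch (b) {
    case 0x80: return 0x00C7;   /* C with cedilla */
    case 0x81: return 0x00FC;   /* u with diaeresis */
    case 0x82: return 0x00E9;   /* e with acute */
    case 0xB0: return 0x2591;   /* light shade */
    case 0xC4: return 0x2500;   /* box drawing light horizontal */
    case 0xDB: return 0x2588;   /* full block */
    default:   return 0xFFFD;   /* replacement character */
    }
}

/* Encode a BMP code point as UTF-8; returns the number of bytes written (1..3). */
size_t utf8_encode(uint16_t cp, uint8_t out[3])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
}
```

With this sketch, the CP 437 full block (0xDB) becomes U+2588 and is sent to the terminal as the three bytes E2 96 88.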

For PC-98, I am aware that the basic (text mode?) code page glyphs are different. Thus, one approach would be to use a different Japanese PC-98 to Unicode conversion table, which, when also including the CP 437 table, should allow for Japanese and IBM PC applications to run on terminal emulators.

I haven't started yet, but if we (respectively) wrote conversion routines from Unicode input to the IBM PC and PC-98 code pages, that would allow for Unicode input to be displayed on the native console. However, as noted above, this allows for only 128 additional characters, and is potentially a lot of work for little gain, since a much simpler Latin-1 conversion table could be used while keeping the input path at 8 bits (for IBM PC).

Can the PC-98 display arbitrary bitmapped glyphs in text mode?

What are your thoughts on all this?

Thank you!

tyama501 · 2022-08-01T16:32:40Z

tyama501
Aug 1, 2022
Author

Hello @ghaerr ,

PC-98 has a kanji ROM, and the text console can display double-byte JIS kanji, but that encoding is different from Shift-JIS, which has been used since early Windows and DOS, and it is also different from Unicode. So it needs conversion. I am thinking of some nano-X application that can use additional fonts.

Your idea is also interesting, if the serial console works.

Japanese input requires converting typed alphabetic characters into the appropriate hiragana, katakana, or kanji characters, so an input method engine is also needed.

ghaerr · 2022-08-01T16:39:36Z

ghaerr
Aug 1, 2022
Maintainer

> I am thinking of some nano-X application that can use additional fonts.

If an additional font is created in one of the formats allowed in nano-X (.fnt, .bdf/.pcf), then it should be no problem to display any glyph from a nano-X application, using an arbitrary character index. However, there are severe memory limitations on this 16-bit version of nano-X, so the font code may have to use disk-caching (not written yet), depending on the size of the font file.

> Your idea is also interesting, if the serial console works.

The idea there is that it will be a lot easier to get applications to work with unicode input and libraries, than to enhance the kernel and/or console code. As said earlier, we could likely add UTF-8 processing to consoles, but they would be quite limited in the characters that can be displayed on console. In addition, we have a compatibility issue with all characters > 127, as their high bit interferes with UTF-8 processing.
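
To illustrate the compatibility issue, here is a small sketch (the names are hypothetical, not from the ELKS source) that classifies a single byte the way a UTF-8 decoder would. Every byte above 127 lands in one of the UTF-8 categories, so an 8-bit code page character cannot be told apart from part of a multi-byte sequence.

```c
#include <stdint.h>

enum byte_kind { BYTE_ASCII, BYTE_UTF8_CONT, BYTE_UTF8_LEAD, BYTE_INVALID };

/* Classify one byte as a UTF-8 decoder sees it. */
enum byte_kind classify_utf8_byte(uint8_t b)
{
    if (b < 0x80) return BYTE_ASCII;       /* 0xxxxxxx */
    if (b < 0xC0) return BYTE_UTF8_CONT;   /* 10xxxxxx: continuation byte */
    if (b < 0xC2) return BYTE_INVALID;     /* 0xC0, 0xC1: overlong encodings */
    if (b < 0xF5) return BYTE_UTF8_LEAD;   /* start of a 2..4 byte sequence */
    return BYTE_INVALID;                   /* 0xF5..0xFF never appear in UTF-8 */
}
```

For example, the CP 437 full block 0xDB would be read as the lead byte of a two-byte sequence, and the light shade 0xB0 as a stray continuation byte.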

1 reply

tyama501 Aug 1, 2022
Author

Thank you @ghaerr ,

I will think about implementing the serial console after the nano-X driver work.

tyama501 · 2022-08-01T18:10:31Z

tyama501
Aug 1, 2022
Author

I don't know the size, but these seem to be BDF fonts.
https://github.com/akahuku/ufo

ghaerr · 2022-08-02T03:41:24Z

ghaerr
Aug 2, 2022
Maintainer

Hello @tyama501,

> I don't know the size, but these seem to be BDF fonts.

That BDF file is huge: 1.4M! Also, I didn't realize that the ELKS version of nano-X does not have support for reading .pcf files. I have a number of font conversion tools, which we could use to translate font formats, except ELKS nano-X will require all fonts to be compiled into the data segment (64K max)!

Thank you!

tyama501 · 2022-08-02T17:22:01Z

tyama501
Aug 2, 2022
Author

Hello @ghaerr ,

This is a very small 8x8 font, but even so its size is several hundred KB, so it seems impossible to compile it into 64KB.
https://littlelimit.net/misaki.htm

tyama501 · 2022-08-02T18:32:14Z

tyama501
Aug 2, 2022
Author

I downloaded some of the files from the site and found that the MISAKI.FNT binary in misaki_fontx_2012-06-03 is 55KB.
So the font might not be that large if converted to binary.
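
A rough size check, as a sketch: a monochrome glyph needs one bit per pixel, with each row padded to whole bytes. Assuming (this is an assumption, not something checked against the file) that the font covers the 6,879 graphic characters of JIS X 0208, the raw 8x8 bitmaps come to about 55KB, which is in line with the file size above. A BDF file stores the same bitmaps as hexadecimal text plus per-glyph metadata, which would explain why it is so much larger.

```c
#include <stdint.h>

/* Bytes needed for one 1-bit-per-pixel glyph; each row is padded to whole bytes. */
uint32_t glyph_bytes(unsigned width, unsigned height)
{
    return (uint32_t)((width + 7) / 8) * height;
}

/* Raw bitmap size of a font with the given number of glyphs (no header or index). */
uint32_t font_bitmap_bytes(uint32_t glyphs, unsigned width, unsigned height)
{
    return glyphs * glyph_bytes(width, height);
}
```

With these helpers, `font_bitmap_bytes(6879, 8, 8)` is 55032 bytes, under the 65536-byte segment limit, though a data segment also has to hold everything else the program needs.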

ghaerr · 2022-08-02T18:43:13Z

ghaerr
Aug 2, 2022
Maintainer

> found that the MISAKI.FNT binary in misaki_fontx_2012-06-03 is 55KB.

I recently added the fmemalloc system call to ELKS, which allows for allocating "far" memory from the main memory region, outside the current process. This could be used with (yet unwritten) code to interpret the MISAKI.FNT file for the VGA blit routines.
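
As a starting point for such an interpreter, here is a hedged sketch of a header parser. It assumes MISAKI.FNT follows the commonly documented FONTX2 layout (6-byte "FONTX2" signature, 8-byte name, then width, height, a code-type flag, and for double-byte fonts a count of 4-byte code blocks); that layout should be verified against the actual file. The struct and function names are hypothetical, and the far-memory loading itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fontx2_info {
    uint8_t  width, height;   /* glyph size in pixels */
    uint8_t  is_dbcs;         /* 0: single-byte font, 1: Shift-JIS double-byte font */
    uint8_t  nblocks;         /* number of code blocks (double-byte fonts only) */
    uint32_t data_offset;     /* file offset of the first glyph bitmap */
};

/* Parse an assumed FONTX2 header. Returns 0 on success, -1 on a bad or short header. */
int fontx2_parse_header(const uint8_t *buf, size_t len, struct fontx2_info *fi)
{
    if (len < 17 || memcmp(buf, "FONTX2", 6) != 0)
        return -1;
    fi->width   = buf[14];
    fi->height  = buf[15];
    fi->is_dbcs = buf[16];
    if (!fi->is_dbcs) {
        fi->nblocks = 0;
        fi->data_offset = 17;          /* bitmaps follow the 17-byte header */
        return 0;
    }
    if (len < 18)
        return -1;
    fi->nblocks = buf[17];
    /* each code block is a 2-byte start code and a 2-byte end code */
    fi->data_offset = 18 + 4u * fi->nblocks;
    return 0;
}
```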

So it looks like we could solve this problem :)

24 replies

tyama501 Jan 26, 2024
Author

Hello @ghaerr ,

Yes, GdHideCursor() worked to hide cursor, so I will hide it for my application until multiple windows are supported :)

On the other hand, it seems GdShowCursor() after the hiding does not work.
I am not sure why, but the cursor never shows again.

Thank you.

ghaerr Jan 26, 2024
Maintainer

> it seems GdShowCursor() after the hiding does not work.

Hmmm, you might try calling it twice; there is some logic in engine/devmouse.c which keeps track of curvisible that might be the cause of this, as the routines are also used to hide and restore the cursor while drawing underneath it.
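
To show how a visibility counter can produce this behavior, here is a purely hypothetical model. The names and logic below are not taken from engine/devmouse.c; the point is only that if hide and show calls nest, one extra hide must be balanced by one extra show before the cursor is drawn again.

```c
/* Hypothetical model of a nesting cursor-visibility counter.
 * The cursor is drawn only while the counter is above zero. */
static int cur_visible = 1;

int model_hide_cursor(void)
{
    int prev = cur_visible;
    cur_visible--;              /* each hide lowers the count */
    return prev;
}

int model_show_cursor(void)
{
    int prev = cur_visible;
    cur_visible++;              /* each show raises it again */
    return prev;
}

int model_cursor_is_drawn(void)
{
    return cur_visible > 0;
}
```

In this model, two hides followed by one show leave the counter at zero, so the cursor stays hidden until show is called a second time.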

tyama501 Jan 27, 2024
Author

Yes, it works if it is called twice.

tyama501 Jan 27, 2024
Author

Hello @ghaerr ,

BTW, I noticed that "mv" uses up some clusters: the free count differs before and after the move.
I won't dig into this now, but it might be related to the "vi" issue we have seen before.

Before the move: Free 88; after the move: Free 86

ghaerr Jan 27, 2024
Maintainer

In order to debug this, what would be nice would be to use mv to move a single file in a root directory of a floppy with almost nothing else on it. We could then dump and compare the root directory entry(s) and FAT table before and afterwards to trace it down.
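
For the comparison step, a small sketch that decodes a raw FAT and counts free clusters could help. It assumes the floppy uses FAT12 (12-bit entries packed two per three bytes) and that the FAT has already been read into memory; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 12-bit FAT entry for cluster n from a raw FAT12 table. */
uint16_t fat12_entry(const uint8_t *fat, unsigned n)
{
    size_t off = n + n / 2;     /* 1.5 bytes per entry */
    uint16_t v = (uint16_t)(fat[off] | (fat[off + 1] << 8));
    return (n & 1) ? (uint16_t)(v >> 4) : (uint16_t)(v & 0x0FFF);
}

/* Count free data clusters (entry value 0). Cluster numbering starts at 2. */
unsigned fat12_count_free(const uint8_t *fat, unsigned nclusters)
{
    unsigned n, free_count = 0;

    for (n = 2; n < nclusters + 2; n++)
        if (fat12_entry(fat, n) == 0)
            free_count++;
    return free_count;
}
```

Running the count on dumps taken before and after the mv, and printing the entries that differ, would show which clusters were allocated and whether they belong to any chain.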

I am aware of the ongoing vi bug; I think that is related to creating a temp file and then deleting it, possibly before closing it. I am assuming that this mv problem is something different.

tyama501 · 2022-12-31T17:48:42Z

tyama501
Dec 31, 2022
Author

Happy New Year @ghaerr ,

I was able to display a Japanese UTF-8 file using the serial terminal!
This is cool :)

Thank you!

tyama501 · 2024-01-27T19:51:53Z

tyama501
Jan 27, 2024
Author

I have released the first alpha version of a simple UTF-8 Japanese text file viewer using Nano-X for ELKS.

https://github.com/tyama501/nxjtxtv
Release
https://github.com/tyama501/nxjtxtv/releases/tag/alpha_0_1

Thank you!

7 replies

tyama501 Jan 28, 2024
Author

The font files are now 116KB in total and nxjtxtv is 38.6KB, which is not so bad, although it exceeds the space currently available in FD1232-pc98.img.

tyama501 Jan 30, 2024
Author

I forgot to mention that the above release is the PC-98 version.

I will build for IBM soon.
Maybe I can increase rows for VGA.

tyama501 Feb 3, 2024
Author

I have released the viewer for IBM-compatible PCs.
https://github.com/tyama501/nxjtxtv/releases/tag/alpha_0_2i

The number of lines per page is now obtained from the window size.
VGA can display 10 more lines per page than PC-98.
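
A sketch of the calculation, with a hypothetical function name. The numbers in the note below assume a 640x480 VGA mode, a 640x400 PC-98 mode, and an 8-pixel line height for the 8x8 font; the actual window sizes and line spacing used by nxjtxtv may differ.

```c
/* Number of whole text lines that fit in a window of the given pixel height. */
unsigned rows_per_page(unsigned window_height, unsigned line_height)
{
    return line_height ? window_height / line_height : 0;
}
```

Under those assumptions, `rows_per_page(480, 8)` is 60 and `rows_per_page(400, 8)` is 50, which is consistent with the 10-line difference.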

Thank you!

ghaerr Feb 3, 2024
Maintainer

Very nice! Approximately how fast is the page display? Are PC-98 and IBM about the same speed?

tyama501 Feb 3, 2024
Author

Thank you.
Well, actually I cannot compare them, because QEMU for the IBM PC is very fast.
It is slow on a PC-98 with a 10MHz V30...
(I think a 486 would not be that slow.)

tyama501 · 2024-03-19T19:15:36Z

tyama501
Mar 19, 2024
Author

I found a Shift-JIS to UTF-8 converter with C source that might be portable if far memory and a separate table file are used.
https://github.com/dolphilia/sjis-to-utf8

10 replies

ghaerr Mar 24, 2024
Maintainer

> I have thought about modifying it to streaming

I think the entire program could be made much smaller using the streaming route. It isn't that complicated to decode ShiftJIS nor to encode UTF8. ELKS already has some UTF8 conversion routines I have used for the fm and matrix programs, see elkscmd/tui/runes.c for an example where the entire process takes two functions.
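
A sketch of what the streaming Shift-JIS side could look like: a one-byte state machine that converts each double-byte character to its JIS X 0208 row/cell (ku/ten) pair by arithmetic. The names are hypothetical, and this is not the code in runes.c or in the linked converter. Mapping ku/ten to a Unicode code point still needs a lookup table, which is not shown, and vendor extension lead bytes (0xF0 and above) are treated as single bytes here.

```c
#include <stdint.h>

struct sjis_decoder { uint8_t lead; };  /* pending lead byte, 0 if none */

enum sjis_result { SJIS_NEED_MORE, SJIS_SINGLE, SJIS_DOUBLE, SJIS_ERROR };

int sjis_is_lead(uint8_t b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

/* Feed one input byte.
 * SJIS_SINGLE: *ten holds the byte itself (ASCII or half-width katakana).
 * SJIS_DOUBLE: *ku and *ten hold the 1-based JIS X 0208 row and cell. */
enum sjis_result sjis_feed(struct sjis_decoder *d, uint8_t b,
                           uint8_t *ku, uint8_t *ten)
{
    uint8_t s1;

    if (d->lead == 0) {
        if (sjis_is_lead(b)) {
            d->lead = b;                /* wait for the trail byte */
            return SJIS_NEED_MORE;
        }
        *ku = 0;
        *ten = b;
        return SJIS_SINGLE;
    }
    s1 = d->lead;
    d->lead = 0;
    if (b < 0x40 || b == 0x7F || b > 0xFC)
        return SJIS_ERROR;              /* not a valid trail byte */
    /* Fold the two lead ranges into one index 0..46; each covers two rows. */
    s1 = (uint8_t)(s1 - (s1 >= 0xE0 ? 0xC1 : 0x81));
    *ku = (uint8_t)(s1 * 2 + 1);
    if (b >= 0x9F) {                    /* second row of the pair */
        *ku = (uint8_t)(*ku + 1);
        *ten = (uint8_t)(b - 0x9F + 1);
    } else {                            /* first row; 0x7F is skipped */
        *ten = (uint8_t)(b - 0x40 + 1 - (b > 0x7F));
    }
    return SJIS_DOUBLE;
}
```

For example, the bytes 0x82 0xA0 (hiragana "a") decode to ku 4, ten 2. Because only one byte of state is kept, the input can be processed as it is read, without holding the whole file in memory.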

tyama501 Apr 10, 2024
Author

I may make a streaming version, but for now I increment the segment, so a file larger than 64KB can be handled (if memory is available).
tyama501/sjis-to-utf8_elks@7e4241a
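
For readers unfamiliar with the technique, here is an illustration of the real-mode address arithmetic involved. It is not the code from the commit above, and the names are hypothetical. A 16-bit offset wraps at 64KB, so to walk a larger buffer the overflow has to be carried into the segment, where one segment unit equals 16 bytes.

```c
#include <stdint.h>

struct farptr { uint16_t seg, off; };

/* Advance a real-mode seg:off pointer by n bytes and normalize it,
 * so that the offset is always below 16 and cannot wrap. */
struct farptr far_advance(struct farptr p, uint32_t n)
{
    uint32_t linear = ((uint32_t)p.seg << 4) + p.off + n;

    p.seg = (uint16_t)(linear >> 4);
    p.off = (uint16_t)(linear & 0x0F);
    return p;
}
```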

A 110KB conversion takes about 1 minute with an i486SX 25MHz emulator setting.

I enhanced nxjtxtv to support stdin, so a small file can be viewed through a pipe, but this one seems to run out of memory, so I displayed it after converting it.
Here is the video.
https://github.com/ghaerr/elks/assets/61556504/769dc2f9-7c12-4984-a73d-3f9fd5d3959a

I think I will release these soon.
(Maybe after fixing the issue where nxjtxtv and the system crash when out of memory.)

Thank you.

ghaerr Apr 10, 2024
Maintainer

There might be a better way to display lots of graphics text without clearing the screen and redrawing, which is hard on the eyes. The idea for that would be to always display the foreground and background bits of the characters (including spaces). Then, there is no need for a screen clear but the entire screen including spaces needs to be drawn. The drawback of this approach is that the drawing is slower since the background pixels also have to be drawn. A hardware EGA scroll might also work, but I'm not sure how that is done.
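
A sketch of that opaque drawing approach, assuming for simplicity a one-byte-per-pixel frame buffer and an 8x8 one-bit-per-pixel glyph. Real EGA/VGA planar memory is laid out differently, and the function name is hypothetical.

```c
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* Draw one glyph opaquely: every pixel of the cell is written, using fg for
 * set bits and bg for clear bits, so no separate clear pass is needed. */
void draw_glyph_opaque(uint8_t *fb, unsigned pitch, unsigned x, unsigned y,
                       const uint8_t glyph[GLYPH_H], uint8_t fg, uint8_t bg)
{
    unsigned row, col;

    for (row = 0; row < GLYPH_H; row++) {
        uint8_t bits = glyph[row];
        uint8_t *dst = fb + (y + row) * pitch + x;

        for (col = 0; col < GLYPH_W; col++)
            dst[col] = (bits & (0x80 >> col)) ? fg : bg;   /* MSB is leftmost */
    }
}
```

Drawing a space is then just a glyph with all bits clear, which overwrites whatever was in that cell before.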

tyama501 Apr 28, 2024
Author

Hello @ghaerr ,

I added clearing to the end of the line at each line feed and stopped clearing the whole screen.
tyama501/nxjtxtv@972af1b
I think it is better than before.
https://github.com/ghaerr/elks/assets/61556504/d51a637f-6b4c-47ae-878a-131fe3c24804

Thank you.

ghaerr Apr 29, 2024
Maintainer

Hello @tyama501,

That's a good idea, clearing to end of line rather than screen clear. Looks good!

Thank you!

ghaerr · 2024-03-24T16:05:49Z

ghaerr
Mar 24, 2024
Maintainer

Hello @tyama501,

After looking at your screenshots above, I came up with an idea that I thought you might find interesting: creating a Japanese console that runs in graphics mode for ELKS! :)

Instead of running a conversion program to display Japanese text, the new console could run (using low-level drawing code pulled over from Nano-X) directly interpreting ShiftJIS characters and displaying them on the console in graphics mode. Since scrolling graphics text will be quite slow, the console could just erase and start at the top instead.

If you're interested in this, I would suggest starting with getting an ELKS graphics console running (initially for PC-98, but later I could help for IBMPC) that just displays ASCII text. After that, the MISAKI.FNT file could be loaded (or compiled into the kernel) and the conversion routines implemented.

A graphics console would be a lot slower, but it could still be interesting as a way to display native UTF-8. I suppose it would mostly be a fun project, rather than being seriously useful, depending on the speed of the machine.

Thank you!

3 replies

tyama501 Mar 24, 2024
Author

Thank you for the idea!

This is the console that works right after boot and it is different from nxterm, correct?

If it can handle escape sequences, I think it is better to create one with UTF-8 first, since Shift-JIS has characters that can be detected as control characters.

Maybe it would be nice if we could extend it to other languages easily :)

ghaerr Mar 25, 2024
Maintainer

> This is the console that works right after boot and it is different from nxterm, correct?

Right. It would basically be console-direct-pc98.c rewritten using graphics routines (clear screen and text output, later scroll up/down) for low level output, rather than writing character/attribute combinations to segment 0xB800 (or similar; I can't remember how PC-98 works).

> If it can handle escape sequences

We would keep the ANSI emulation as part of the console, so yes.

> I think it is better to create one with UTF-8

Good point. Your Japanese font could be one-time converted to use UTF-8 indices into a glyph array, rather than Shift-JIS. All console output would use UTF-8 as (sparse) indexes into a loaded UTF-8 mono bitmap font. We could even start with an ASCII or IBMPC compatible 256-character font in the beginning.
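
One possible shape for the sparse index, as a sketch with hypothetical names: the UTF-8 stream is first decoded to Unicode code points, and a table sorted by code point is binary-searched to find each glyph's position in the bitmap array. Only BMP code points are handled in this sketch.

```c
#include <stddef.h>
#include <stdint.h>

struct glyph_entry {
    uint16_t codepoint;     /* Unicode code point (BMP only in this sketch) */
    uint16_t glyph_index;   /* position of the glyph in the bitmap array */
};

/* Binary search in a table sorted by ascending code point.
 * Returns the glyph index, or -1 if the font has no glyph for cp. */
long glyph_lookup(const struct glyph_entry *table, size_t count, uint16_t cp)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].codepoint == cp)
            return table[mid].glyph_index;
        if (table[mid].codepoint < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

A font with about 7,000 glyphs needs at most 13 comparisons per character this way, and the table costs 4 bytes per glyph on top of the bitmaps.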

> Maybe it would be nice if we could extend it to other languages easily

Yes, any language would be supported provided that the "loaded" font had glyphs that matched the UTF-8 code points being sent to the console.

> since Shift-JIS has characters that can be detected as control characters.

Oh I see. That could be a problem with ANSI emulation, depending on exactly which C0 or C1 ASCII characters we're talking about.
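
To make the overlap concrete, here is a small sketch with hypothetical helper names. The C1 control range is 0x80..0x9F, and the first Shift-JIS lead-byte range is 0x81..0x9F, so a lead byte such as 0x9B is also the 8-bit CSI. Shift-JIS trail bytes can additionally be ASCII values in 0x40..0x7E, such as '[' (0x5B) or the backslash (0x5C).

```c
#include <stdint.h>

/* C1 control characters occupy 0x80..0x9F. */
int is_c1_control(uint8_t b)
{
    return b >= 0x80 && b <= 0x9F;
}

/* First byte of a Shift-JIS double-byte character. */
int is_sjis_lead_byte(uint8_t b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

/* Second byte of a Shift-JIS double-byte character. */
int is_sjis_trail_byte(uint8_t b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}
```

The C0 escape character 0x1B is outside both Shift-JIS ranges, so 7-bit ESC sequences themselves would not be mistaken for text, as long as the console tracks whether it is inside a double-byte character.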

Having said all this, there are still two major problems that need to be considered: 1) a graphics console will be very, very slow on some systems, possibly not usable for any scrolling, and 2) changing ELKS to UTF-8 streamed output will break a lot of programs, because probably very few to none (like vi, matrix, etc.) properly handle UTF-8 now. They all likely just expect code page 437 (IBM PC). That's why I was initially thinking of this as a Japanese display enhancement. I suppose a way to enter/exit graphics vs text mode is also needed. Lots of work, could be more of an experimental project :)

tyama501 Mar 25, 2024
Author

PC-98 has a hardware console with ROM kanji characters that can display and scroll very fast, and PC-98 DOS uses it.
But its character code is neither Shift-JIS nor UTF (Unicode).
I am more interested in a graphics console that can be ported to other platforms and other fonts. (Japanese IBM-compatible DOS uses a graphics console.)

Well, I think I have many remaining tasks, like fsck-dos, serial IP, and enhancements for BASIC. I will do these first :)

Thank you.

tyama501 · 2024-05-11T11:14:15Z

tyama501
May 11, 2024
Author

Release & Updates

First Porting of SJIS to UTF8 for ELKS
https://github.com/tyama501/sjis-to-utf8_elks/releases/tag/sjisutf8_elks_0_1

Simple UTF-8 Japanese text file viewer
https://github.com/tyama501/nxjtxtv/releases/tag/alpha_0_3
For PC-98 and IBM Compatible. Both included.

tyama501 · 2024-12-15T15:26:49Z

tyama501
Dec 15, 2024
Author

Release & Updates

Added a double height and width character size mode.

Simple UTF-8 Japanese text file viewer
https://github.com/tyama501/nxjtxtv/releases/tag/alpha_0_4
For PC-98 and IBM Compatible. Both included.

[Screenshots: with option -d, and normal]
