MSWord moves word to upper line when correcting space error #50

duomdaamaendra · 2022-02-08T18:04:07Z (edited by unhammer)

B. Moske (s.25) «Mun in jáme/mu luondu dušše rievdá»
Paltto (s.37) «mánát sturrot/mun ieš boarásmuvan» ??
B.Moske (s.39) «Nu jođánit moai rávásmuvaime» … (s.47) /Rumaš goldná dađistaga»

This happens when correcting "B.Moske" to "B. Moske": the corrected word is moved up to the line above.

The problem occurs because CR (and CRLF) line breaks are not escaped in the various tools.
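A minimal sketch of what escaping the CR could look like. It assumes the blank-line convention that comes up later in this thread: whitespace between tokens is written on one line that starts with a colon, with line breaks written as backslash escapes. The helper escape_blank and its escape table are hypothetical and not taken from the actual tools.

```python
# Hypothetical sketch: escape raw inter-token whitespace into one
# CG-stream blank line (":" + escaped text). The escape set below is an
# assumption, not taken from hfst-tokenise or divvun-blanktag.
ESCAPES = {
    "\\": "\\\\",  # a literal backslash must not be read as an escape
    "\n": "\\n",   # LF
    "\r": "\\r",   # CR: left raw, it splits the blank over two lines
}


def escape_blank(raw: str) -> str:
    """Return ':' followed by raw with line-break characters escaped."""
    return ":" + "".join(ESCAPES.get(ch, ch) for ch in raw)


if __name__ == "__main__":
    print(escape_blank("\r\n"))  # prints :\r\n (five characters, one line)
```

The design point is that a blank can never span two physical lines, so a downstream tool cannot misread the second half of it as a separate line.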

duomdaamaendra · 2022-02-08T18:07:38Z

This does not happen in Google Docs.

lynnda-hill · 2022-10-13T12:42:39Z

When fixing "??" to "? ?", a new suggestion appears: "?B." can be fixed to "? B.". However, there is a newline after "?", which the program seems to ignore.

snomos · 2023-11-16T09:57:05Z

It seems that the problem is that we haven't considered CARRIAGE RETURN (U+000D, \r) in our processing. I assume it should be added to our whitespace analyser.
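To illustrate how an unlisted CR breaks whitespace matching: a character class that lists only space, tab and LF stops at the CR, while one that also lists \r covers the whole CRLF. These are illustrative Python patterns only; the real whitespace analyser is an HFST transducer, not a regex, and blank_after is a hypothetical helper.

```python
import re

# Illustrative patterns only; the real whitespace analyser is a compiled
# transducer, not a Python regex.
WITHOUT_CR = re.compile(r"[ \t\n]+")
WITH_CR = re.compile(r"[ \t\r\n]+")


def blank_after(text: str, pattern: re.Pattern, start: int) -> str:
    """Return the run of whitespace the pattern accepts at start ('' if none)."""
    m = pattern.match(text, start)
    return m.group(0) if m else ""


if __name__ == "__main__":
    text = "boarásmuvan» ??\r\nB.Moske\r\n"
    start = text.index("??") + 2  # position right after the question marks
    print(repr(blank_after(text, WITHOUT_CR, start)))  # '' (stops at the CR)
    print(repr(blank_after(text, WITH_CR, start)))     # '\r\n'
```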

snomos · 2023-11-16T16:55:32Z

Something very strange happens that looks like a bug. With the following minimal test text:

boarásmuvan» ??
B.Moske

(Copy it to MS Word, paste it into a new document, and copy it back out of the Word file if the CR gets lost.) In UnicodeChecker I get the following:

CR (U+000D) is clearly located directly after the two question marks, and before the newline.
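The UnicodeChecker inspection can also be scripted. This Python sketch writes the minimal test text with explicit CRLF line endings to test.txt (the file name is an assumption) and lists the byte offset of every CR and LF; line_break_offsets is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Minimal test text with the CRLF line endings that came back from Word.
TEST_TEXT = "boarásmuvan» ??\r\nB.Moske\r\n"


def line_break_offsets(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, 'CR' or 'LF') for every line-break byte."""
    names = {0x0D: "CR", 0x0A: "LF"}
    return [(i, names[b]) for i, b in enumerate(data) if b in names]


if __name__ == "__main__":
    path = Path("test.txt")
    # Write bytes directly so no newline translation can touch the CR.
    path.write_bytes(TEST_TEXT.encode("utf-8"))
    data = path.read_bytes()
    for offset, name in line_break_offsets(data):
        # Show the two bytes before each break to see what it follows.
        print(offset, name, data[max(0, offset - 2):offset])
```

For this text the first CR is at byte 17 (0x11), directly after the question marks at bytes 15 and 16, which matches flammie's xxd dump of the same text.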

Now store the test text (with the CR char) in a test file, and run it through the grammar checker:

cat test.txt | ./tools/grammarcheckers/modes/smegramrelease.mode

The result is this:

"<boarásmuvan>"
	"boarásmuvvat" Err/Orth-a-á <mv> V IV Ind Prs Sg1 <W:0.0> <firstCohort> @+FMAINV &LINK &punct-aistton-right ID:1
punct-aistton-right
	"boarásmuvvat" v1 <mv> V IV Ind Prs Sg1 <W:0.0> <firstCohort> @+FMAINV &LINK &punct-aistton-right ID:1
punct-aistton-right
"<»>"
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide> &punct-aistton-right &space-before-punct-mark &LINK ID:2 R:LEFT:1
punct-aistton-right
space-before-punct-mark
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide> "boarásmuvan”"S &punct-aistton-right &SUGGESTWF ID:2 R:LEFT:1
punct-aistton-right
	"”" PUNCT RIGHT Err/Orth <W:0.0> <SpaceOnRightSide> &LINK &space-before-punct-mark ID:2 R:LEFT:1
space-before-punct-mark
:
\n
: 
"<?>"
	"?" CLB <W:0.0> <SpaceBeforePunctMark>

"<?>"
	"?" CLB <W:0.0> <NoSpaceAfterPunctMark> &no-space-after-punct-mark ID:5 R:RIGHT:7
no-space-after-punct-mark
	"?" CLB <W:0.0> <NoSpaceAfterPunctMark> "? B."S &no-space-after-punct-mark &SUGGESTWF ID:5 R:RIGHT:7
no-space-after-punct-mark

"<B.>"
	"B" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <NoSpaceAfterPunctMark> @HNOUN &no-space-after-punct-mark &LINK ID:7
no-space-after-punct-mark
	"Balphabet" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <NoSpaceAfterPunctMark> @HNOUN &no-space-after-punct-mark &LINK ID:7
no-space-after-punct-mark
"<Moske>"
	"Moske" N Prop Sem/Plc Sg Nom <W:0.0> <LastCohort> @HNOUN

Suddenly the CR character (and the newline) is placed before the two question marks.

That is, the character stream has been changed somewhere in the processing. That should not happen.

snomos · 2023-11-16T16:59:03Z

The tokeniser/analyser is fine:

cat test.txt | ./tools/grammarcheckers/modes/smegramrelease0-morph.mode
"<boarásmuvan>"
	"boarásmuvvat" Err/Orth-a-á V IV Ind Prs Sg1 <W:0.0>
	"boarásmuvvat" v1 V IV Ind Prs Sg1 <W:0.0>
"<»>"
	"»" PUNCT RIGHT <W:0.0>
	"”" PUNCT RIGHT Err/Orth <W:0.0>
: 
"<?>"
	"?" CLB <W:0.0>
"<?>"
	"?" CLB <W:0.0>
:
\n
"<B.>"
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0>
	"." CLB <W:0.0> "<.>"
		"b" Adv Sem/Time ABBR Gram/TNumAbbr Attr <W:0.0> "<B>"
	"." CLB <W:0.0> "<.>"
		"b" Adv Sem/Time ABBR Gram/TNumAbbr <W:0.0> "<B>"
"<Moske>"
	"Moske" N Prop Sem/Plc Attr <W:0.0>
	"Moske" N Prop Sem/Plc Sg Nom <W:0.0>

snomos · 2023-11-16T17:01:37Z

The first whitespace analyser moves the characters one position earlier, so the CR and newline now sit between the two question marks:

cat test.txt | ./tools/grammarcheckers/modes/smegramrelease1-blanktag.mode
"<boarásmuvan>"
	"boarásmuvvat" Err/Orth-a-á V IV Ind Prs Sg1 <W:0.0> <firstCohort>
	"boarásmuvvat" v1 V IV Ind Prs Sg1 <W:0.0> <firstCohort>
"<»>"
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide>
	"”" PUNCT RIGHT Err/Orth <W:0.0> <SpaceOnRightSide>
: 
"<?>"
	"?" CLB <W:0.0>
:
\n
"<?>"
	"?" CLB <W:0.0>
"<B.>"
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0>
	"." CLB <W:0.0> "<.>"
		"B" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> "<B>"
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0>
	"." CLB <W:0.0> "<.>"
		"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> "<B>"
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0>
	"." CLB <W:0.0> "<.>"
		"b" Adv Sem/Time ABBR Gram/TNumAbbr Attr <W:0.0> "<B>"
	"." CLB <W:0.0> "<.>"
		"b" Adv Sem/Time ABBR Gram/TNumAbbr <W:0.0> "<B>"
"<Moske>"
	"Moske" N Prop Sem/Plc Attr <W:0.0> <LastCohort>
	"Moske" N Prop Sem/Plc Sg Nom <W:0.0> <LastCohort>

snomos · 2023-11-16T17:02:58Z

And then the second whitespace analyser moves them once more, to before both question marks:

"<boarásmuvan>"
	"boarásmuvvat" Err/Orth-a-á V IV Ind Prs Sg1 <W:0.0> <firstCohort>
	"boarásmuvvat" v1 V IV Ind Prs Sg1 <W:0.0> <firstCohort>
"<»>"
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide>
	"”" PUNCT RIGHT Err/Orth <W:0.0> <SpaceOnRightSide>
:
\n
: 
"<?>"
	"?" CLB <W:0.0> <NoSpaceAfterPunctMark> <SpaceBeforePunctMark>
"<?>"
	"?" CLB <W:0.0> <NoSpaceAfterPunctMark>
"<B.>"
	"B" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> <NoSpaceAfterPunctMark>
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> <NoSpaceAfterPunctMark>
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> <NoSpaceAfterPunctMark>
	"B" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <NoSpaceAfterPunctMark>
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Attr <W:0.0> <NoSpaceAfterPunctMark>
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Acc <W:0.0> <NoSpaceAfterPunctMark>
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Gen <W:0.0> <NoSpaceAfterPunctMark>
	"Balphabet" N Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <NoSpaceAfterPunctMark>

"<Moske>"
	"Moske" N Prop Sem/Plc Attr <W:0.0> <LastCohort>
	"Moske" N Prop Sem/Plc Sg Nom <W:0.0> <LastCohort>

So something is clearly wrong in the whitespace analysers.
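One way to quantify the drift is to count how many wordform cohorts precede the escaped newline in each stage's output. A rough Python sketch: newline_after_cohort is a hypothetical helper, and the three streams are trimmed copies of the morph, first blanktag and second blanktag outputs above (most reading lines dropped).

```python
# Rough diagnostic (hypothetical helper): count the wordform cohorts that
# precede the escaped newline in a stage's output.


def newline_after_cohort(stream: str) -> int:
    """Return the number of "<...>" cohorts before the first non-reading,
    non-wordform line containing the two-character escape backslash-n,
    or -1 if there is none."""
    cohorts = 0
    for line in stream.split("\n"):
        if line.startswith('"<') and line.endswith('>"'):
            cohorts += 1
        elif "\\n" in line and not line.startswith(("\t", '"')):
            return cohorts
    return -1


# Trimmed copies of the three outputs; r"\n" is the literal escape line.
MORPH = "\n".join(['"<boarásmuvan>"', '\t"boarásmuvvat" V IV Ind Prs Sg1 <W:0.0>',
                   '"<»>"', ": ", '"<?>"', '"<?>"', ":", r"\n", '"<B.>"'])
BLANKTAG = "\n".join(['"<boarásmuvan>"', '"<»>"', ": ", '"<?>"', ":", r"\n",
                      '"<?>"', '"<B.>"'])
SECOND_BLANKTAG = "\n".join(['"<boarásmuvan>"', '"<»>"', ":", r"\n", ": ",
                             '"<?>"', '"<?>"', '"<B.>"'])

if __name__ == "__main__":
    for name, stream in [("morph", MORPH), ("blanktag", BLANKTAG),
                         ("second blanktag", SECOND_BLANKTAG)]:
        print(name, newline_after_cohort(stream))
```

The count goes 4, 3, 2: each whitespace analyser shifts the newline back by one cohort.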

snomos · 2023-11-16T17:06:11Z

I tried fixing the regex to allow CR in d1bae3e, but that did not help. Could you have a look, @unhammer?

unhammer · 2023-11-18T19:11:42Z

:
\n

This is not fine. That should probably be

:\n

which would mean that a newline occurred. Every line with unanalysed data should start with a colon; anything without an initial colon, tab or quote is ignored by divvun-suggest.
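The rule about initial characters can be modelled as a classifier on the first character of each line. This is a simplified Python model, not divvun-suggest's actual code, and classify_line is a hypothetical name. It shows why the newline is lost in the broken output: the \n half of the split blank has no initial colon, so it is ignored.

```python
def classify_line(line: str) -> str:
    """Classify one stream line by its first character (simplified model):
    colon = blank (unanalysed data), tab = reading, quote = wordform,
    anything else = ignored."""
    if line.startswith(":"):
        return "blank"
    if line.startswith("\t"):
        return "reading"
    if line.startswith('"'):
        return "wordform"
    return "ignored"


if __name__ == "__main__":
    broken = [":", r"\n", '"<B.>"']  # raw CR split the blank over two lines
    fixed = [r":\r\n", '"<B.>"']     # CR and LF escaped on one blank line
    for name, lines in (("broken", broken), ("fixed", fixed)):
        print(name, [classify_line(ln) for ln in lines])
```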

This has to be fixed in hfst-tokenise (hfst/hfst#575) and divvun-suggest (divvun/libdivvun#65).

flammie · 2024-08-05T14:36:26Z

Does this work correctly now? I get:

$ cat ~/github/divvun/libdivvun/foo \
  | ~/github/hfst/hfst/tools/src/hfst-tokenize -g tools/grammarcheckers/tokeniser-gramcheck-gt-desc.pmhfst \
  | divvun-blanktag tools/grammarcheckers/analyser-gt-whitespace.hfst \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/valency.bin' \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/mwe-dis.bin' \
  | cg-mwesplit \
  | divvun-blanktag '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/analyser-gt-errorwhitespace.hfst' \
  | divvun-cgspell -n 10 -b 15.000000 -w 5000.000000 -u 0.400000 -l '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/acceptor.default.hfst' -m '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/errmodel.default.hfst' \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/valency-postspell.bin' \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/grc-disambiguator.bin' \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/spellchecker.bin' \
  | vislcg3 -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/grammarchecker-release.bin' \
  | divvun-suggest -g '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol' -m '/home/flammie/github/giellalt/lang-sme/tools/grammarcheckers/errors.xml' -l se
"<boarásmuvan>"
	"boarásmuvvat" v1 <mv> V IV Ind Prs Sg1 <W:0.0> <firstCohort> @+FMAINV &LINK &punct-aistton-right ID:1
punct-aistton-right
"<»>"
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide> &punct-aistton-right &space-before-punct-mark &LINK ID:2 R:LEFT:1
punct-aistton-right
space-before-punct-mark
	"»" PUNCT RIGHT <W:0.0> <SpaceOnRightSide> "boarásmuvan”"S &punct-aistton-right &SUGGESTWF ID:2 R:LEFT:1
punct-aistton-right
	"”" PUNCT RIGHT Err/Orth <W:0.0> <SpaceOnRightSide> &LINK &space-before-punct-mark ID:2 R:LEFT:1
space-before-punct-mark
: 
"<?>"
	"?" CLB <W:0.0> <SpaceBeforePunctMark>

"<?>"
	"?" CLB <W:0.0> <LastCohortOfParagraph>
:\r\n

"<B.>"
	"B" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <firstCohortOfParagraph> <NoSpaceAfterPunctMark> @HNOUN &no-space-after-punct-mark ID:7 R:RIGHT:8
no-space-after-punct-mark
	"B" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <firstCohortOfParagraph> <NoSpaceAfterPunctMark> @HNOUN "B. Moske"S &no-space-after-punct-mark &SUGGESTWF ID:7 R:RIGHT:8
no-space-after-punct-mark
	"Balphabet" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <firstCohortOfParagraph> <NoSpaceAfterPunctMark> @HNOUN &no-space-after-punct-mark ID:7 R:RIGHT:8
no-space-after-punct-mark
	"Balphabet" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> <firstCohortOfParagraph> <NoSpaceAfterPunctMark> @HNOUN "B. Moske"S &no-space-after-punct-mark &SUGGESTWF ID:7 R:RIGHT:8
no-space-after-punct-mark
"<Moske>"
	"Moske" N Prop Sem/Plc Sg Nom <W:0.0> <LastCohort> @HNOUN &LINK &no-space-after-punct-mark ID:8
no-space-after-punct-mark
:\r\n
$ xxd ~/github/divvun/libdivvun/foo 
00000000: 626f 6172 c3a1 736d 7576 616e c2bb 203f  boar..smuvan.. ?
00000010: 3f0d 0a42 2e4d 6f73 6b65 0d0a            ?..B.Moske..
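A round-trip check on the fixed output: concatenating the wordforms and the unescaped blanks should give back the input bytes. A Python sketch, assuming blanks only use the backslash escapes \n, \r, \t and a doubled backslash, and that cohorts with no blank line between them are adjacent in the text. The helpers reconstruct and unescape_blank are hypothetical, and FIXED_OUTPUT is a trimmed copy of the output above.

```python
# Assumed escape set for blank lines (not taken from the tools' source).
UNESCAPE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape_blank(body: str) -> str:
    """Undo backslash escapes in the text after a blank line's colon."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in UNESCAPE:
            out.append(UNESCAPE[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def reconstruct(stream: str) -> str:
    """Concatenate wordforms and unescaped blanks; skip everything else."""
    parts = []
    for line in stream.split("\n"):
        if line.startswith('"<') and line.endswith('>"'):
            parts.append(line[2:-2])
        elif line.startswith(":"):
            parts.append(unescape_blank(line[1:]))
    return "".join(parts)


# Trimmed copy of the fixed divvun-suggest output above.
FIXED_OUTPUT = "\n".join([
    '"<boarásmuvan>"',
    '\t"boarásmuvvat" v1 <mv> V IV Ind Prs Sg1 <W:0.0> <firstCohort>',
    '"<»>"',
    'punct-aistton-right',
    ': ',
    '"<?>"',
    '"<?>"',
    r':\r\n',
    '"<B.>"',
    '"<Moske>"',
    r':\r\n',
])

if __name__ == "__main__":
    print(reconstruct(FIXED_OUTPUT).encode("utf-8").hex(" "))
```

For the trimmed output the result encodes to exactly the 28 bytes shown by xxd, so CR and LF are back in their original position after the question marks.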

snomos changed the title ~~MSWord moves word to upper line when correction space error~~ MSWord moves word to upper line when correcting space error Feb 8, 2022

snomos added the gramcheck Issues restricted to the grammar checker label Mar 16, 2022

snomos assigned unhammer Nov 16, 2023

snomos added the bug Something isn't working label Nov 16, 2023

albbas mentioned this issue Sep 27, 2007

-headdjiid and -heddjiid ( #170

Closed
