Dictionary formats #2

Open
nikita-moor opened this issue May 22, 2019 · 8 comments

@nikita-moor
Owner

nikita-moor commented May 22, 2019

The main dictionary format of the project is XDXF. GoldenDict is a good choice for desktop users, but mobile users have no good application.

Therefore, I would like to know which dictionary applications (Android or iOS) you use or prefer, and which supplementary formats would be useful to produce for mobile devices.

Related discussion on XDXF project.

Multi-format dictionary shells:

StarDict ABBYY Dictd
Alpus ✓¹
GoldenDict Mobile
EBPocket Free ✓¹
Dictan ✓¹
Linguae (desktop)

¹ only source files (DSL)

  • Alpus (Android, iOS, desktop) [commercial]. Additional formats: XDXF, TSV/Plain dictionaries, ZIM files, MediaWiki dumps. Hunspell search is absent (was removed in v9.0).
  • GoldenDict Mobile (Android) [commercial]. Additional formats: Lingoes, Babylon. Hunspell search works very well.
  • EBPocket (Android, iOS, desktop) [commercial]. Additional formats: Mdict, EPWING, PDIC, LogoVista.
  • Dictan (Android). Additional formats: ZD, FDB.
  • Linguae (Windows, Linux). Additional formats: XDXF. Last release in 2011.

Dictionary formats

  • DICT - do clients support content types other than "text/plain"? Yes.
  • Sdictionary
  • Lingoes (LD2, LDF) - last release in 2014; no compiler.
  • Babylon (BGL) - Glossary Builder; probably supports images.
  • SIL Multi-Dictionary Formatter (MDF) - discontinued.

ToDo

@nikita-moor
Owner Author

nikita-moor commented May 26, 2019

Embedding images: StarDict

Support of images in different variants of StarDict:

r h x
Goldendict (desktop)
Goldendict Mobile (Android) ✗*
Alpus (Android, iOS, desktop)
ColorDict (Android)
Twinkle Star Dictionary (Android)
EBPocket Free (Android, iOS, desktop)
Dicty (iOS)
QDict (Android) abandoned
Dict Box (Android, iOS)
Stardict Dictionary for PC (Android)
WordMateX (Android)
DusalDict (Android)

Formats:

  • r/resource - tested both 'res' folder and 'res.rdic' pack
  • h/html - tested tag <img> with different filename referencing
  • x/xdxf - tested tags <img> and <rref> (both variants: ed. 32 and 33)

Notes:

  • Goldendict Mobile - supports images encoded in base64: <img src="data:image/…" />
  • Twinkle Star Dictionary - only uncompressed res/ folder
  • EBPocket Free - accepts XDXF old "rref": <rref>1.png</rref>
  • Alpus - images must be packed into a ZIP archive; in the x format, only <img> is recognized
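
The base64 variant mentioned for Goldendict Mobile can be produced with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the MIME type is guessed from the file extension, and the example bytes are a placeholder rather than a real image.

```python
import base64
import mimetypes


def image_to_data_uri(data: bytes, filename: str) -> str:
    """Return a data: URI for raw image bytes; the MIME type is guessed from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_img_tag(data: bytes, filename: str) -> str:
    """Build an <img> tag that carries the image inside the article itself."""
    return f'<img src="{image_to_data_uri(data, filename)}" />'


# Placeholder bytes only; in practice, read the image file in binary mode.
tag = inline_img_tag(b"\x89PNG", "1.png")
```

Inlining makes the article self-contained, at the cost of roughly a third more bytes per image, so it is probably suitable only for small pictures.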

Test bundle: stardict-test-img.zip

  • updated 2019-07-12: added resource database file ".rifo"
  • updated 2019-07-13: added base64 image

Could import StarDict files:

@nikita-moor
Owner Author

nikita-moor commented May 26, 2019

Embedding images: ABBYY DSL

This format is popular mostly in Russia(?), so not many applications support it. The official mobile client, ABBYY Lingvo Dictionaries, does not allow adding custom dictionaries. However, it contains a free Latin-Russian dictionary…

application images
Goldendict Mobile
Alpus

Notes:

  • Alpus - images should be compressed into a zip-archive
  • Home Dictionary - not tested; declares DSL support. Commercial and free demo versions are available.

@nikita-moor
Owner Author

nikita-moor commented May 27, 2019

Embedding images: general

Even when applications can show images referenced in dictionary articles, all of them do so directly in the application window. As a result, full-page scans are either too small to read or too big to fit the screen; no application provides comfortable zoom or navigation.

How well this works depends on the device size, so what is good for a tablet may be inconvenient for a smartphone. I would prefer an option to switch between the full image and an icon-sized preview, with the image opening in an external image viewer. For illustration (here Alpus did not recognize two pages in TIFF CCITT G4 format):

icon-preview

full-preview

@nikita-moor
Owner Author

nikita-moor commented May 28, 2019

Format MDict

File format v2.0; images are stored in MDD file and referenced as <img src="picture.png"/>.

application images
GoldenDict (desktop)
MDict (Android, iOS, desktop)
Eudic/欧路词典 (Android, iOS, desktop)
BlueDict (Android)
Plain Dictionary (Android)
EBPocket Free (Android, iOS, desktop)
Medict (desktop) ?
SkyDic (Windows Phone) ?

Formatting: manual compilation, python-writemdict.

Per-app shortcomings:

  • Plain Dictionary - if several articles match the keyword, the user can view them only one at a time. Part of the code is open.
  • Eudic - image loading is slow.
  • EBPocket Free - incomplete support of MDict v2.0? It does not recognize the references @@@LINK=keyword.
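
For shells that do not follow @@@LINK=keyword references, one workaround is to resolve them while preparing the source, before compilation. The sketch below assumes the dictionary is held as a plain Python dict mapping keywords to article text; the function name and the sample entries are hypothetical.

```python
LINK_PREFIX = "@@@LINK="


def resolve_entry(entries: dict, keyword: str, max_hops: int = 10) -> str:
    """Follow @@@LINK= redirects starting at keyword and return the final article text."""
    seen = set()
    current = keyword
    for _ in range(max_hops):
        if current in seen:
            raise ValueError(f"redirect loop at {current!r}")
        seen.add(current)
        body = entries[current]
        if not body.startswith(LINK_PREFIX):
            return body
        # The target keyword may be followed by line endings or a NUL byte.
        current = body[len(LINK_PREFIX):].strip(" \t\r\n\x00")
    raise ValueError("too many redirects")


# Hypothetical sample data: one real article and one redirect to it.
entries = {
    "color": "<b>color</b>: sample article",
    "colour": "@@@LINK=color\r\n",
}
article = resolve_entry(entries, "colour")
```

Replacing each redirect with the resolved article duplicates content, so this trades file size for compatibility.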

Comments

The MDict format is very pleasant; the ability to include custom CSS styles and JS libraries is unique and very powerful. The dictionary applications are alive and actively developed. Morphology search is supported in MDict (Hunspell) and BlueDict (separate dictionary; probably applicable to other shells).

MDict is a commercial, closed format. Wang Xiaoqiang and @zhansliu analyzed versions 1.2 and 2.0; most of the libraries for Python, Java, JavaScript, etc. are based on their description of the format. Do third-party dictionary shells support the current version 4.0?

@nikita-moor
Owner Author

nikita-moor commented Jun 12, 2019

Format: Slob

Slob is another promising format. It supports including images, CSS styles and JavaScript code in one file.

The dictionary shell Aard 2 for Android is open source and imposes no usage limits (such as a cap of 5 dictionaries in a free version); GoldenDict also supports the Slob format. There are extensive Python libraries and tools.

Disadvantages

  1. Slob is a container; the encoding of text content is not standardized and is expected to be plain text or HTML.
  2. Slob is supported only by Aard 2 (mobile) and GoldenDict (desktop). Users who have dictionaries in other formats (StarDict, DSL, MDict) would have to work with two shells simultaneously.
  3. Binary format.
  4. Inter-dictionary links are not supported.

Issues

  • Aard 2 does not recognize absolute paths (/res/styles.css) of the embedded resources; use relative filenames (res/styles.css) instead:
    • article HTML content: <link href="res/styles.css" rel="stylesheet" type="text/css">
    • embedding files: slob.add_dir(slb, prefix='res/', topdir="../sources")
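
If the article HTML already contains absolute references, they can be rewritten before the content is added to the Slob file. The following is a minimal sketch using a regular expression; the function name is hypothetical, and a real converter might prefer a proper HTML parser.

```python
import re

# Matches href="/..." or src="/..." but not protocol-relative URLs such as "//host/x".
_ABSOLUTE_REF = re.compile(r'''\b(href|src)=(["'])/(?!/)''')


def make_refs_relative(html: str) -> str:
    """Drop the leading slash from href/src values so that /res/x becomes res/x."""
    return _ABSOLUTE_REF.sub(r"\1=\2", html)


fixed = make_refs_relative('<link href="/res/styles.css" rel="stylesheet" type="text/css">')
```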

ToDo

  • Is there morphology search in Aard 2?

Conclusion

Slob has all features I like in MDict. It is an open format, but MDict is more popular and better supported by dictionary shells.

@nikita-moor nikita-moor changed the title Additional formats Dictionary formats Jul 26, 2019
@michaelbeijer

Do third-party dictionary shells support the current version 4.0?

See my wiki: http://beijer.wiki/mdxbuilder-manual_eng.txt
No, not that I know of. To convert MDict files (in .mdx format) for use in GoldenDict, you need to use MDXBuilder. The latest version of MDXBuilder (available on MDict website) does not (yet) generate files that can be handled by GoldenDict. For this to work you need an older version of MDXBuilder. I managed to find one online (from hi-pda.com), and have uploaded it to http://beijer.wiki/storage/MdxBuilder-(downloaded-from-hi-pda.com).zip in case you're looking for a copy.

Michael Beijer (technical translator, beijer.uk/beijer.wiki)

@nikita-moor
Owner Author

nikita-moor commented Nov 28, 2021

Use discussions, please, for further talks. This page is intended to be a place for documentation.

Repository owner deleted a comment from soshial Dec 4, 2021
Repository owner deleted a comment from ilius Dec 4, 2021
Repository owner deleted a comment from ilius Dec 4, 2021
@nikita-moor
Owner Author

nikita-moor commented Dec 4, 2021

Format: DICT

DICT is a dictionary network protocol created in 1997 (it can also work locally on the user's computer). Articles can be provided as plain text or HTML (or any other format with an appropriate MIME header). All servers, particularly Dictd and Dico, support the MIME option. Also, some dictionary shells, such as GoldenDict, can read files in DICT format directly.
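
To illustrate how a client sees plain-text and HTML articles, here is a rough sketch of parsing a DEFINE reply as described in RFC 2229. It assumes the client sent OPTION MIME beforehand, so each definition starts with MIME headers followed by a blank line. The function name is hypothetical and the sample reply is made up for illustration, not captured from a real server.

```python
import email
import re

_DEFINITION_LINE = re.compile(r'151 "([^"]*)" (\S+)')


def parse_define_response(raw: str, mime: bool = True):
    """Return (headword, database, content_type, body) for each definition in a DEFINE reply."""
    results = []
    lines = raw.replace("\r\n", "\n").split("\n")
    i = 0
    while i < len(lines):
        match = _DEFINITION_LINE.match(lines[i])
        i += 1
        if match is None:
            continue  # skip status lines such as 150 and 250
        text = []
        while i < len(lines) and lines[i] != ".":
            line = lines[i]
            # A leading dot is doubled on the wire; undo that here.
            text.append(line[1:] if line.startswith("..") else line)
            i += 1
        i += 1  # skip the terminating "." line
        joined = "\n".join(text)
        if mime:
            message = email.message_from_string(joined)
            content_type, body = message.get_content_type(), message.get_payload()
        else:
            content_type, body = "text/plain", joined
        results.append((match.group(1), match.group(2), content_type, body))
    return results


# Made-up reply for illustration only.
sample = (
    '150 1 definitions retrieved\r\n'
    '151 "salve" latin "Hypothetical database"\r\n'
    'Content-Type: text/html\r\n'
    '\r\n'
    '<b>salve</b>\r\n'
    '..dotted line\r\n'
    '.\r\n'
    '250 ok\r\n'
)
definitions = parse_define_response(sample)
```

A client that cannot render the reported content type could fall back to showing the body as plain text.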

Clients

Plain text HTML
GoldenDict desktop ✓¹
Lingoes ? ?
GNOME Dictionary ✗²
xfce4-dict
GoldenDict Mobile ✗¹
Fora Dictionary / Alpus ✗¹

¹ can read DICT files directly
² not implemented

Example

Conclusion

There were many clients in the past, but now we have only GoldenDict. However, DICT can still be a good way to publish an online dictionary.
