The current codebase of Tipograph is old and I have some ideas to make it better.
Tools
ES2015 JavaScript with Flow type annotations.
Interface
The Node.js interface should follow Node conventions more closely: no capital letters in exported names.
// ES2015 modules
import { replace, languages } from 'tipograph';

replace('text'); // replace.all()
replace.quotes('text');

// CommonJS
const { replace, languages } = require('tipograph');
Engine
Regexes are quick and a reasonably clear way to express rules. But a regex-based solution has drawbacks: it has to scan the whole source for every rule, the input has to be plain text (HTML support is currently quite hacky), and I think it can choke on very large inputs.
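To make the drawbacks concrete, here is a minimal sketch of a regex-driven engine. The `rules` table and the `applyRules` function are hypothetical names made up for illustration; they are not Tipograph's actual rules or API.

```javascript
// Hypothetical rule table: [pattern, replacement] pairs (not Tipograph's real rules).
const rules = [
  [/\.{3}/g, '\u2026'],               // three dots -> ellipsis
  [/ - /g, ' \u2013 '],               // spaced hyphen -> en dash
  [/"([^"]*)"/g, '\u201C$1\u201D'],   // straight double quotes -> curly quotes
];

// Every rule scans the whole string, so the work grows with
// (number of rules) x (input length).
function applyRules(text) {
  return rules.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text
  );
}

console.log(applyRules('"Wait..." - he said'));
console.log(applyRules('<a href="x">hi</a>'));
```

The second call shows the plain-text limitation: the quote rule cannot tell markup from content, so it also rewrites the quotes around the attribute value inside the tag.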
I am thinking about a different approach based on the theory of finite state transducers. This would be the data pipeline in the new architecture:
1. Parse the input file, which can be in any format, and emit two types of tokens: format and content. Format tokens are passed along the pipeline unchanged and copied to the output. Content tokens are subject to further analysis. The advantage is that Tipograph can eventually support any input format without changes to the core engine.
2. Tokenize the content tokens into smaller units that are meaningful for typographic analysis. This needs further thought, but I am thinking of units such as word, space, number, quote and so on.
3. Feed these units into a finite state transducer, which emits typographically correct tokens. The transducer is driven by its state, so I believe this architecture can handle behavior such as quote substitution. The challenge will be supporting customizable rules and turning individual substitutions on and off.
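The pipeline above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the proposed implementation: the function names (`parseFormat`, `tokenize`, `createTransducer`, `typeset`) are hypothetical, the format parser only recognizes HTML-like tags, and the transducer has a single two-state rule for double quotes.

```javascript
// Stage 1 (assumed HTML-like input): tags become format tokens, text becomes content tokens.
function parseFormat(input) {
  const tokens = [];
  const tagPattern = /<[^>]*>/g;
  let last = 0;
  let match;
  while ((match = tagPattern.exec(input)) !== null) {
    if (match.index > last) {
      tokens.push({ type: 'content', value: input.slice(last, match.index) });
    }
    tokens.push({ type: 'format', value: match[0] });
    last = tagPattern.lastIndex;
  }
  if (last < input.length) {
    tokens.push({ type: 'content', value: input.slice(last) });
  }
  return tokens;
}

// Stage 2: split content into typographic units (space, number, quote, word).
function tokenize(text) {
  const unitPattern = /(\s+)|(\d+)|(")|([^\s\d"]+)/g;
  const kinds = ['space', 'number', 'quote', 'word'];
  const units = [];
  let match;
  while ((match = unitPattern.exec(text)) !== null) {
    // The index of the capture group that matched selects the unit type.
    const group = match.slice(1).findIndex((g) => g !== undefined);
    units.push({ type: kinds[group], value: match[0] });
  }
  return units;
}

// Stage 3: a two-state transducer (outside or inside a quotation).
// Each unit is read once; quote units are rewritten according to the current state.
function createTransducer() {
  let insideQuotes = false;
  return function step(unit) {
    if (unit.type !== 'quote') return unit;
    insideQuotes = !insideQuotes;
    return { type: 'quote', value: insideQuotes ? '\u201C' : '\u201D' };
  };
}

// Wire the stages together: format tokens are copied through untouched.
function typeset(input) {
  const step = createTransducer();
  return parseFormat(input)
    .map((token) =>
      token.type === 'format'
        ? token.value
        : tokenize(token.value).map((unit) => step(unit).value).join('')
    )
    .join('');
}

console.log(typeset('<p>"Hello <b>world</b>"</p>'));
console.log(typeset('<a href="x">"hi"</a>'));
```

Because the transducer keeps its state across content tokens, a quotation that spans a nested tag is still closed correctly, and quotes inside tags are never touched since format tokens bypass the transducer. Customizable rules would presumably mean swapping or composing `step` functions, which this sketch does not attempt.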
This is going to be a long road and I do not have much time for it now. But hopefully, in the future I (we?) will make Tipograph a much better tool. If you have any comments, feel free to post them here.
A lot of information in this proposal is outdated since the rewrite for 0.4.0 (namely, the external interface and format/content tokens are already implemented).
I am not sure this reimplementation would improve performance enough to be worth its drawbacks, but it could be quite a challenge and a fun experiment.