Currently, sentsplit is configured via dicts in config.py.
Customizing the configuration is done as follows:
from copy import deepcopy

from sentsplit.config import ko_config
from sentsplit.segment import SentSplit

my_config = deepcopy(ko_config)
my_config['segment_regexes'].append({'name': 'tilde_ending', 'regex': r'(?<=[다요])~+(?= )', 'at': 'end'})
sent_splitter = SentSplit('ko', **my_config)
sent_splitter.segment('안녕하세요~ 만나서 정말 반갑습니다~~ 잘 부탁드립니다!')
# results with the regex: ['안녕하세요~', ' 만나서 정말 반갑습니다~~', ' 잘 부탁드립니다!']
# results without the regex: ['안녕하세요~ 만나서 정말 반갑습니다~~ 잘 부탁드립니다!']
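To make clear what the tilde_ending pattern matches, here is a minimal sketch that uses only the standard re module. The helper split_after_matches is hypothetical and only imitates the effect of 'at': 'end' for this single pattern; it is not how sentsplit segments text internally.

```python
import re

# The pattern from the example above: one or more tildes that directly
# follow a sentence-final 다 or 요 and are directly followed by a space.
TILDE_ENDING = re.compile(r'(?<=[다요])~+(?= )')


def split_after_matches(text, pattern):
    """Hypothetical helper: cut the text right after each match of the pattern."""
    pieces, start = [], 0
    for match in pattern.finditer(text):
        pieces.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        # Keep whatever follows the last match as the final piece.
        pieces.append(text[start:])
    return pieces


text = '안녕하세요~ 만나서 정말 반갑습니다~~ 잘 부탁드립니다!'
pieces = split_after_matches(text, TILDE_ENDING)
print(pieces)
```

The lookbehind and lookahead keep the match limited to the tildes themselves, so the split point lands after the tildes and the following space stays with the next piece.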
Improve the config structure so that it is easier to customize and maintain.
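One possible direction, sketched under assumptions: typed, immutable config objects instead of nested dicts. The names SegmentRegex, SplitConfig, with_segment_regex, and to_kwargs are hypothetical and do not exist in sentsplit; only the segment_regexes key and its name/regex/at fields come from the example above, and a real config would carry more keys.

```python
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SegmentRegex:
    """Hypothetical typed stand-in for one entry of 'segment_regexes'."""
    name: str
    regex: str
    at: str = 'end'


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical immutable config; only the key shown in the issue is modeled."""
    segment_regexes: Tuple[SegmentRegex, ...] = ()

    def with_segment_regex(self, rule):
        # Return a new config rather than mutating a shared default,
        # so no deepcopy is needed before customizing.
        return replace(self, segment_regexes=self.segment_regexes + (rule,))

    def to_kwargs(self):
        # Convert back to the plain-dict shape used in the example above.
        return {'segment_regexes': [asdict(rule) for rule in self.segment_regexes]}


base_config = SplitConfig()
my_config = base_config.with_segment_regex(
    SegmentRegex(name='tilde_ending', regex=r'(?<=[다요])~+(?= )')
)
print(my_config.to_kwargs())
```

With a structure like this, a misspelled field name fails at construction time instead of being silently ignored, and the base config cannot be modified by accident.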