Phonetic transcription #395

springlaughing · 2024-02-08T20:40:53Z

refinery

Tested by creator on refinery
Tested by reviewer on refinery
Ensured that output of brick conforms with refinery structure (to be checked by reviewer)

API

Tested by creator on localhost:8000/docs
Tested by reviewer on localhost:8000/docs

common code

Common code tested in notebook/script by creator
Common code tested in notebook/script by reviewer
Common code contains docstrings and type hints

additional points:

Docstring and README exist
Import statements (in __init__.py)
(If necessary) Added dependency to requirements.txt
(If necessary) Added dependency to issue for refinery env here
Published brick to Strapi CMS (locally)

Testing procedure:
When testing in refinery, please ensure that the output of the brick conforms with the structure of refinery.
For extraction bricks, this would be a tuple like ("label", span_start, span_end).
For classification bricks, this would be a string representing a label.
For generator bricks, this would be a float, integer, string, boolean or list, depending on the situation.
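
The output shapes listed above can be checked with a few small helpers. This is only a sketch: the function names are hypothetical and not part of bricks or refinery, and the rule that span positions are non-negative with span_start <= span_end is an assumption rather than something the template states.

```python
from typing import Any


def _is_plain_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_extraction_output(value: Any) -> bool:
    """True for a tuple like ("label", span_start, span_end)."""
    if not (isinstance(value, tuple) and len(value) == 3):
        return False
    label, start, end = value
    if not (isinstance(label, str) and _is_plain_int(start) and _is_plain_int(end)):
        return False
    # Assumption: spans are non-negative and ordered.
    return 0 <= start <= end


def is_classification_output(value: Any) -> bool:
    """True for a string representing a label."""
    return isinstance(value, str)


def is_generator_output(value: Any) -> bool:
    """True for a float, integer, string, boolean or list."""
    return isinstance(value, (float, int, str, bool, list))
```

A reviewer could run these over the brick's results for several different texts, for example `all(is_generator_output(r) for r in results)`, before testing in refinery itself.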

When testing the bricks, avoid relying on a single source of data: use not only the clickbait sample
project, but also different texts with longer or more complex strings.

A small refinery example project with a variety of texts, bricks-test-data-project.zip, can be found in the bricks repository.

springlaughing · 2024-02-08T20:50:02Z

This one implements issue #278.

Hello, I'm trying to make another brick, this time a phonetic transcriptor.
There are some things to note about this one:

Linux or WSL is generally required (at least for English, because of Flite)
A CEDICT .txt file is required for Chinese

Here are the steps to set up the environment to run the package:

Install epitran: pip install epitran
Install jieba: pip install jieba

Get Flite for English:
git clone http://github.com/festvox/flite
cd flite
./configure
make
sudo make install
cd testsuite
make lex_lookup
sudo cp lex_lookup /usr/local/bin

Get CEDICT for Chinese:
Download and unpack the dictionary from https://www.mdbg.net/chinese/dictionary?page=cedict, then pass the path of the unpacked .txt file as cedict_path to the phonetic_transcriptor function.
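
Once the steps above are done, a quick check can confirm that everything is in place before running the brick. The following is a minimal sketch using only the standard library. The helper name check_prerequisites is hypothetical and not part of the PR, and it assumes lex_lookup was copied to a directory on PATH, as in the Flite steps.

```python
import importlib.util
import os
import shutil
from typing import Dict, Optional


def check_prerequisites(cedict_path: Optional[str] = None) -> Dict[str, bool]:
    """Report which of the brick's prerequisites can be found."""
    status = {
        # Python packages installed with pip.
        "epitran": importlib.util.find_spec("epitran") is not None,
        "jieba": importlib.util.find_spec("jieba") is not None,
        # Flite's lex_lookup binary, needed for English.
        "lex_lookup": shutil.which("lex_lookup") is not None,
    }
    if cedict_path is not None:
        # The unpacked CEDICT .txt file, only needed for Chinese.
        status["cedict"] = os.path.isfile(cedict_path)
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Passing a path, for example `check_prerequisites("path/to/cedict.txt")`, adds a "cedict" entry to the result.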

LeonardPuettmannKern · 2024-02-14T14:01:53Z

Hi @springlaughing, thank you for the contribution! Code looks good so far, will test more thoroughly, though. As this brick will require some dependencies to be installed, we will most likely wait until the next release to merge this, as our dev team can then also add the requirements to our tool refinery for the bricks integration. Do you know if flite is definitely needed, or if only epitran or jieba are needed for this? :)

springlaughing · 2024-02-15T01:06:33Z

> Hi @springlaughing, thank you for the contribution! Code looks good so far, will test more thoroughly, though. As this brick will require some dependencies to be installed, we will most likely wait until the next release to merge this, as our dev team can then also add the requirements to our tool refinery for the bricks integration. Do you know if flite is definitely needed, or if only epitran or jieba are needed for this? :)

Yes, Flite is needed for epitran to produce phonetic transcriptions for English, as stated on the epitran GitHub page (https://github.com/dmort27/epitran).

Chinese is similar: CEDICT is needed for epitran to produce phonetic transcriptions for Chinese, as also mentioned on the epitran page.

Additionally, I have used jieba as the tokenizer for Chinese, but that shouldn't be a problem, as it is a simple dependency to install and is MIT-licensed.
:)
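
To make the dependency split concrete, here is a rough sketch of how the pieces could fit together. It is not the code from this PR: the function names, the default language code, and the choice to join per-token transcriptions with spaces are all assumptions. The calls used are epitran.Epitran(code, cedict_file=...), Epitran.transliterate and jieba.cut.

```python
from typing import Dict, Optional


def build_epitran_kwargs(language_code: str, cedict_path: Optional[str] = None) -> Dict[str, str]:
    """Extra keyword arguments for epitran.Epitran (hypothetical helper)."""
    if language_code.startswith("cmn-"):
        # Chinese needs a CEDICT file; fail early if no path was given.
        if not cedict_path:
            raise ValueError("cedict_path is required for Chinese")
        return {"cedict_file": cedict_path}
    return {}


def transcribe(text: str, language_code: str = "eng-Latn", cedict_path: Optional[str] = None) -> str:
    """Return a phonetic transcription of text (illustrative only)."""
    kwargs = build_epitran_kwargs(language_code, cedict_path)

    # Imported lazily so the module loads without the dependencies installed.
    import epitran

    epi = epitran.Epitran(language_code, **kwargs)
    if language_code.startswith("cmn-"):
        import jieba

        # Assumption: tokenize with jieba, transcribe each token, join with spaces.
        return " ".join(epi.transliterate(token) for token in jieba.cut(text))
    # English ("eng-Latn") relies on Flite's lex_lookup being installed.
    return epi.transliterate(text)
```

With this layout, jieba is only imported for Chinese, and Flite is only needed when an English transcription is requested.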

springlaughing added 4 commits February 8, 2024 03:47

483b83c New directory in generators added, containing files for new phonetic_transcription brick
b5ac385 Added phonetic_transcriptor, init, backup, common, refinery, config, readme files. This will be bricks issue 278.
3ef8879 Tested with FastAPI.
bf2c4a3 Added requirements.
