tags: Common Voice,CC0-Corpus

台語 Common Voice 計畫共筆

Shared notes for planning Taiwanese in Common Voice

Language Code: nan-tw

Specific to "Minnan in Taiwan" (Taiwanese)

  • ISO 639-3 defines nan as Minnan (Hokkien). The code is shared by Taiwanese Hokkien, Hokkien in southeastern China, and Hokkien in Southeast Asia.
  • Because of differences in vocabulary, politically sensitive sentences (for China), and the different Latin/phonetic systems used in Taiwan and China, we suggest appending tw to nan to specify that the locale is for Taiwanese.
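
The proposed code pairs the ISO 639-3 language subtag `nan` with the region subtag `tw`. A minimal sketch of how such a tag decomposes, assuming only the simple language-region shape (full BCP 47 tags can also carry script, variant and extension subtags, which this ignores); the helper names `split_locale` and `same_locale` are hypothetical:

```python
from typing import Optional, Tuple


def split_locale(tag: str) -> Tuple[str, Optional[str]]:
    """Split a simple tag like "nan-tw" into (language, region).

    Handles only the language[-region] shape; scripts, variants and
    extensions from full BCP 47 are deliberately ignored in this sketch.
    """
    language, _, region = tag.partition("-")
    # BCP 47 tags are case-insensitive; by convention the language subtag
    # is written lowercase and the region subtag uppercase.
    return language.lower(), (region.upper() or None)


def same_locale(a: str, b: str) -> bool:
    """Compare two simple tags case-insensitively."""
    return split_locale(a) == split_locale(b)


print(split_locale("nan-tw"))           # ('nan', 'TW')
print(same_locale("nan-tw", "nan-TW"))  # True
print(same_locale("nan-tw", "nan"))     # False
```

Because BCP 47 tags compare case-insensitively, `nan-tw` and `nan-TW` name the same locale, while bare `nan` stays the broader, region-less Minnan code.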

網站介面 UI translation on Pontoon

options

  1. (preferred) Translate the UI into Taiwanese
    • language code: nan-tw
    • Han characters (漢字) with the Tâi-lô (TL, 台羅) phonetic system
      • Fork zh-tw into nan-tw at the start; the Taiwanese community will then translate the Taiwan Mandarin strings into Taiwanese one by one
        • Translation progress does not need to be a blocker.
  2. Use the zh-tw UI directly, without a new Pontoon locale
    • This wouldn't be a problem for website users, because more than 99% of potential recorders should be able to read Chinese
      • (Irvin) I believe there won't be any users who can read only Taiwanese
    • People from the Taiwanese language community who were present strongly disagree with this option, because it implies that Taiwanese is a sub-language of Taiwan Mandarin.
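
One reading of the fork-first plan in option 1: every nan-tw string starts as its zh-tw copy and is replaced as the community translates it, so the locale is usable immediately and progress is something to track rather than a gate. A minimal sketch of that workflow, using plain dictionaries with hypothetical message IDs and placeholder strings; the functions `fork_catalog`, `translate` and `progress` are illustrative names, and this does not model how Pontoon or the Common Voice site actually store or load translations:

```python
from typing import Dict, Set


def fork_catalog(source: Dict[str, str]) -> Dict[str, str]:
    """Start the new locale as a complete copy of the source locale."""
    return dict(source)


def translate(catalog: Dict[str, str], done: Set[str],
              message_id: str, text: str) -> None:
    """Replace one forked string with its translation and record it."""
    catalog[message_id] = text
    done.add(message_id)


def progress(catalog: Dict[str, str], done: Set[str]) -> float:
    """Fraction of strings translated so far (for display, not a gate)."""
    return len(done) / len(catalog) if catalog else 1.0


# Hypothetical message IDs and placeholder strings.
zh_tw = {"record": "[zh-tw] record", "listen": "[zh-tw] listen"}
nan_tw = fork_catalog(zh_tw)
done: Set[str] = set()
translate(nan_tw, done, "record", "[nan-tw] record")

print(nan_tw["listen"])        # [zh-tw] listen  (untranslated, still usable)
print(progress(nan_tw, done))  # 0.5
```

Copying rather than falling back at lookup time keeps the forked locale self-contained; the trade-off is that later changes to zh-tw strings are not picked up automatically.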

句庫 Text Corpus

Writing system options

  1. All Han characters
    • Follow the MOE (Ministry of Education) standard
  2. All Latin alphabet (Lô-má-jī)
    • Problem: when multiple accents are possible, which one should be spelled?
    • Should we list all the accents as different sentences and ask people to record according to the spelling?
      • No, we shouldn't ask people to pronounce words in non-native accents.
  3. (preferred) Mainly Han characters, supplemented by Lô-má-jī
    • Lô-má-jī is only used when the word
      • is not in the MOE dictionary
      • has an ambiguous pronunciation when written in Han characters alone
      • has no corresponding Han characters
      • cannot be displayed on mobile or older devices (...tbd)
      • has a variety of readings and must be marked
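
Two of the criteria above can be checked mechanically. A minimal sketch, assuming a hypothetical `moe_words` set standing in for the MOE dictionary word list, and assuming (since the device-display criterion is still tbd) that characters outside the Unicode Basic Multilingual Plane are the ones most likely to be missing from older fonts; ambiguity, missing Han characters and multiple readings are left to human reviewers. The function name `romanization_reasons` is illustrative:

```python
from typing import List, Set


def romanization_reasons(word: str, moe_words: Set[str]) -> List[str]:
    """Return the mechanically checkable reasons to add Lô-má-jī to a word.

    Only two criteria are checked here; ambiguity, missing Han characters
    and multiple readings still need a human reviewer.
    """
    reasons = []
    if word not in moe_words:
        reasons.append("not in MOE dictionary")
    # Assumption: characters above U+FFFF (CJK Extension B and later)
    # are the ones most likely to be missing from fonts on older devices.
    if any(ord(ch) > 0xFFFF for ch in word):
        reasons.append("outside BMP, may not display")
    return reasons


moe_words = {"𬦰", "樹"}  # hypothetical stand-in for the MOE word list
print(romanization_reasons("𬦰", moe_words))  # ['outside BMP, may not display']
print(romanization_reasons("樹", moe_words))  # []
```

For instance, 𬦰 (peh, used in the examples below) is a supplementary-plane character, so this heuristic flags it even when it is in the word list.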

examples

    <!-- some Taiwanese proverbs-->
    - 上愛食番仔番薯
    - 嫁著做田翁,無法梳頭鬃
    - 人肉鹹鹹,袂食得
    - 媽祖宮起毋著面,痟的出袂盡
    - 一千銀,毋值四兩
    - 七月頓頓飽,八月攏無巧(khá)
    - 三个錢尪仔,栽四个錢喙鬚
    - 了錢生理無人做,刣頭生理有人做
    - 五支指頭仔咬起來逐支嘛疼
    - 二更更,三暝暝
    - 四算錢,五燒香,六拜年
    - 七七四十九(sù/sìr-si̍p-kiú)
    - 問娘(mn̄g/muī)何月(hô gue̍h/ge̍h/ge̍rh)有(iú)
    - 除卻(tû/tîr/tî-khioh)母(bó)生年(senn/sinn-nî)
    - 再添(tsài thiam)一十九(it-si̍p-kiú)
    - 交陪醫生腹肚做藥櫥,交陪牛販仔駛瘦牛
    - 人無橫財袂富,馬無野草袂肥
    - 偷食袂瞞得喙齒,討翁袂瞞得鄉里
    - 棚頂做甲流汗,棚跤嫌甲流瀾
    - 兄弟若手足,某囝若衫褲
    - 勸人𬦰(peh)上樹,樓梯夯咧走
    - 南斗註生,北斗註死
    - 呂洞賓葫蘆內的藥,醫別人無醫家己
    - 和好人行,有布通經;和歹人行,有囝通生
    - 善的掠來縛,惡的放伊去
    - 你這款病,是腹肚內有應聲蟲咧作怪
    - 這馬症頭已經誠嚴重矣
    - 若閣拖落去毋趕緊共治予好,早慢會穢著你的某囝
    - 你提轉去了後,就一項一項共伊讀出來
    - 若拄著應聲蟲毋敢應的,你就用彼帖藥仔來治伊
    - 我散赤閣頇顢,萬項代誌都袂曉
    - 只會當靠這來趁食過日
    - 論真講,想欲對付伊是真簡單
    - 沓沓仔觀察伊的反應,就知影伊有啥物臭空矣
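
The examples write Lô-má-jī in parentheses right after the Han characters they gloss, with "/" separating accent variants, e.g. 問娘(mn̄g/muī). A minimal sketch of reading that notation back out, assuming conventions inferred from the examples rather than a stated spec: (a) syllables in a reading are separated by spaces or hyphens, (b) "/" separates alternatives for a single syllable, and (c) the reading covers the last N characters before the parenthesis, where N is the syllable count (so in 七七四十九(sù/sìr-si̍p-kiú) it covers 四十九). `parse_annotations` and `Annotation` are hypothetical names:

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    han: str                   # Han characters the reading applies to
    readings: List[List[str]]  # per syllable, its alternative readings


# A run of text without parentheses, followed by "(romanization)".
ANNOTATED = re.compile(r"([^()]*)\(([^()]*)\)")


def parse_annotations(sentence: str) -> List[Annotation]:
    """Extract (Han characters, readings) pairs from an annotated sentence."""
    result = []
    for prefix, roman in ANNOTATED.findall(sentence):
        # Syllables are separated by spaces or hyphens; "/" separates
        # alternative (accent) readings of one syllable.
        syllables = [s for s in re.split(r"[\s-]+", roman) if s]
        readings = [s.split("/") for s in syllables]
        # Assumption: the reading glosses the last N characters before "(".
        han = prefix[-len(syllables):] if syllables else ""
        result.append(Annotation(han, readings))
    return result


for ann in parse_annotations("問娘(mn̄g/muī)何月(hô gue̍h/ge̍h/ge̍rh)有(iú)"):
    print(ann.han, ann.readings)
```

Keeping alternatives per syllable, rather than per annotation, matches examples such as 除卻(tû/tîr/tî-khioh), where only the first syllable varies by accent.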