Romaji phonetic annotation #15

Open
yanyin1986 opened this issue Jul 23, 2019 · 0 comments
Labels
iOS issue for iOS client type: enhancement New feature or request

Comments

@yanyin1986
Member

import Foundation

let text = "8年前、東京電力福島第一原発で事故がありました。事故のあと、福島県では、放射線を出す物質で汚れた土や草、木などを取る作業をしています。"
print(text)

// The locale is the same for every token, so create it once outside the loop.
let locale = CFLocaleCreate(kCFAllocatorDefault,
                            CFLocaleIdentifier("japanese" as CFString))

for t in Tokenizer.tokenize(text: text) {
    // CFRange is measured in UTF-16 code units, not in Swift Characters.
    let range = CFRangeMake(0, t.utf16.count)
    guard let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault,
                                                  t as CFString,
                                                  range,
                                                  kCFStringTokenizerUnitWord,
                                                  locale) else { continue }
    // An empty token type means there are no more tokens.
    while !CFStringTokenizerAdvanceToNextToken(tokenizer).isEmpty {
        // Latin (romaji) transcription of the current token; stop if there is none.
        guard let romaji = CFStringTokenizerCopyCurrentTokenAttribute(
            tokenizer,
            kCFStringTokenizerAttributeLatinTranscription) as? String else { break }
        // Convert the romaji to hiragana; fall back to the romaji if the transform fails.
        let hiragana = romaji.applyingTransform(.latinToHiragana, reverse: false) ?? romaji
        print("\(t) => \(romaji) => \(hiragana)")
    }
}
struct Tokenizer {

    // MARK: - Publics

    /// Splits `text` into word-level substrings via `.byWords` enumeration.
    static func tokenize(text: String) -> [String] {
        var tokens: [String] = []
        text.enumerateSubstrings(in: text.startIndex ..< text.endIndex, options: .byWords) { substring, _, _, _ in
            if let substring = substring {
                tokens.append(substring)
            }
        }
        return tokens
    }
}
@yanyin1986 yanyin1986 added iOS issue for iOS client type: enhancement New feature or request labels Jul 23, 2019