Skip to content

yanboyang713/Metamorphic-Testing-of-Machine-Translation-Software

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Automatic Evaluation of Machine Translation Software

Quick Start Guide

To run the program you will need to install Python 3 and PIP3. A file setup.sh has been provided to allow for the installation of dependencies, however if using a non-linux system the user will need to manually install dependencies.

To execute the program:

  • Run the main.py file.
  • If you wish to generate test data, that is lines to be translated, enter ‘y’ at the prompt. Then enter the number of lines to generate.
  • Enter the starting line for the excel file
  • Enter the final line for the excel file
  • Wait until the analysis has been completed
  • You will need to terminate the program manually with ctrl-c as there is a background job to refresh a token with a translating service.
  • The results of the anlysis will be located in the file local_test_results.xlxs

Should the security keys for any of the translation services become out of date or invalidated you will need to update them before executing the file. Instructions which can be found included for each translator.

Please note that the results from our analysis have been left in the local_test_results.xlxs file for the sake of completeness, new results from any further execution of the program will be appended to the existing file.

Overview

The task of assessing the quality of machine translation software is naturally difficult, due to the lack of test oracle. This is known as the oracle problem. To address the inability in addressing translations, metamorphic testing can be applied to generate a suitable test case such that the metamorphic relations of translations are not violated. In this study, we will focus on the machine translation services: Google, Bing, and Youdao, and will attempt to use a metamorphic approach to quantify the consistency of each of these services as well as rank their general performance on a select number of focus languages. In this study we consider English, Chinese, Japanese, Korean, French, Russian, Portuguese, Spanish, and This project we are apply metamorphic testing to the evaluation of translation software. We are using round-trip translation metamorphic relation.

  • Firstly, we have get test case from Wikipedia.
  • Secondly, we have evaluation three translation tools, Youdao translation, bing translation and google translation
  • Thirdly, we have using
  • BLEU (Bilingual Evaluation Understudy) — One of the most popular forms of evaluation as it correlates highly with human judgements of quality.
  • Levenshtein Distance — The number of deletions, insertions or substitutions required to transform one string into another.
  • Cosine Similarity

algorithm for evaluating the quality of translated text which has been using machine-translate.

Test case

In our project, the of test cases are atomatically captured from the internet. We are using a dictionaries as key word put in Wikipedia search engine for get single sentence from Wikipedia.

Dictionary

Tushar, Shantanu (2013, pp. 219 - 220) show that WordList(Dictionary) is a standard file on all Unix and Unix-like operating systems. The wordList file is usually stored in /usr/share/dict/words or /usr/dict/words.[1] The wordList story more than 60,000 meanful words.

In this project, an online website is applied to obatin the dictionary which has generated the wordlist programmed by stndards Unix tools.[2] Due to the result to be catched in python will be a list-structure ASCII, the reuslt needs to be post-edited with spliting into individual words for using.

Wikipedia API

Wikipedia API is a Python library for accessing and parsing data from Wikipedia. It is a powerful and convenient tool to obatin text, links and images form wikipidia. Due to some literal contents are needed as the test samples in the project, the API are applied to search the sentences to build the test datasets. Library address: https://pypi.python.org/pypi/wikipedia

Get test case

To be make to result valid and accurate, the size test-sample is supposed to be large enough. Considered about the size of our group and the tools we have, the quantity of the testing sample is decided to be 1000 sentences. In the dictionary there are near 70 thousands words. In the wordlist same meaningful word may have different form.

./img/wordlist.png

We want to reduce probability have the same sentences, so we decide generate each random number from interval range for avoid get same sentences. The interval size is (size of dictionary /size of test sample) / 2, divide by 2 for let rest part to solve WiKi raise Exception or return empty string situation. We also need a lower bound variable for record start point for generate random number. And upper bound = lower bound + interval size. As a result, each time when we get random number in each interval. We only need generate random number from lower bound to upper bound. After give upper bound value to lower bound and calculate new upper bound. When the program starts running, it will keep counting until 1000 valid sentences has been written to excel file. The ratio of seeds is about 1.43% in the wordlist except WiKi raise Exception or return empty string situation. The real radio of seeds will a little bit high than 1.43%. If the radio of seeds is equal to 100% mean must be have same sentences. In our 1000 test data, we have got very lower repetitive rate. The specific steps are:

  • Using above method get the seed word
  • Uing seed word query Wikipedia search engine(we have using Wikipedia APi)
  • get first sentence from summary part(return by Wikipedia).
  • If Wikipedia raise Exception or return empty string, We will ignore it, traverse the next interval (The specific method will discuss later).
  • If have valid sentence return, we will direct written to excel file.

Merge the transaltion result

The test data sample has been divided to several parts for transalting in order to increase the efficiency of the work. After the translation, different translated parts will be merged together into one excel file row by row with the identical order of the orginal test data sample. It is an atomatic procedure through a program.

Vadliate the translation result

The ultimate prodeuced excel file includes the orginal English senteces and the translation results of other eight laugnages(zh-CHS’, ‘ja’, ‘ko’, ‘fr’, ‘ru’, ‘pt’, ‘es’, ‘sv’). One sentence is for a row and one language is for a column. For each sentence, the transaltion will be regards as invalid only if the transaltion result of every lauguage is invalid(which means empty in contents). In the way the program will make a judegment for each row so that the ones that has successful translation for at least one language will be kept. Ohterwise, the whole row will be set as empty.

translate tool

youdao translate

Youdao (有道) is a search engine released by Chinese internet company NetEase (網易). This search engine not only support English and Chinese. Also can support lots of others language. language spuuort list:

languagecode
Chinesezh-CHS
Japaneseja
EnglishEN
Koreanko
Frenchfr
Russionru
Portuguesept
Spanishes

Create account, get application key and password

When you want to using youdao api for translation, First, you must creat a account in YOUDAO ZHI YUN. THis is link http://ai.youdao.com . I have choose using my wechat to login ZHI YUN. Because, each time when I log in. I only need scan QR code in my wechat for convenience. After you need do some set up for get appKey and key, both are inportance for you send POST requie. There is step by step

  • go to application management
  • click my application
  • creat a new application, filed info and create
  • create a translation instance and bind with you application, which is you before you have created.

When you finish all of step you can start using YOUDAO API. :)

Using YOUDAO API guide

This youdao translate API, we can using http or https POST to send our sample data(sentence and paragraph) to youdao and get translated data return by JSON.

youdao api http address: http://openapi.youdao.com/api youdao api https address: https://openapi.youdao.com/api

This is a exmple for translate good(English) word to chinese’s POST URL. http://openapi.youdao.com/api?q=good&from=EN&to=zh_CHS&appKey=ff889495-4b45-46d9-8f48-946554334f2a&salt=2&sign=1995882C5064805BC30A39829B779D7B

Field NametypemeanMust filedComment
qtextwant translate textTruemust be UTF-8
fromtextfrom which languageTruemust in language support list(you also can set to auto)
totexttarget languageTruemust in language support list(you also can set to auto)
appKeytextapplication keyTrueyou can find in application management in youdao ZHI YUN
salttextrandom numberTrue
signtextsignitureTrueMD5(appKey + q + salt + key) key you can find in application management in YOUDAO ZHI YUN

You can get a JSON file back. In JSON file only have two colum is importance in our system, one is errorCode, and another one is translation If errorCode is 0 mean no error. and translation is our most inerest part is our translate result. This is a example { “errorCode”: “0”, “translation”: [“大丈夫です”] } All of code for youdao, please have a look youdao.py in code folder ./img/youdaoZhiYun.png

bing translate

Bing translate(Microsoft Translate) is a multilingual machine translation cloud service provided by Microsoft. Bing translator API include Text translation, Speech translation and Text to speech. However, I am only using text translation in this project.

Create account, get subscribe ID, get Key 1 and Key 2

This is frist step for using bing translator API.

  1. sign into Azure. link https://azure.microsoft.com/en-gb/account/
    • click MY ACCOUNT
    • click AZURE portal
    • I am using my [email protected] to login, I need to choose Work or school account
    • go to the Cognitive Service section
    • under API type select the Text and fill out the rest of the form and creat subscribe
    • get authentication key
      • In menu All Resources
      • click on your subscription, you can find subscription if in overview and Key 1 and Key 2 in resource management keys

./img/azure.png ./img/subscription.png

Obtaining A Google Cloud Translate API Key

  1. Go to the resource manager and create a new project: https://console.cloud.google.com/cloud-resource-manager
  2. Select and enable billing from the left hand panel on the following page: https://console.cloud.google.com/home/dashboard
  3. Return to your new project and search for translate in the search panel at the top.
  4. Select and enable the Google Cloud Translation API.
  5. Select credentials fomr the left hand menu and then select Create Credentials
  6. Choose a service account key, create a new service account and then download the JSON file.
  7. Rename the downloaded file ‘Google Credentials.json’ and save it in the root directory of the project.

Further instructions can be found in Google’s Quickstart Guide: https://cloud.google.com/translate/docs/getting-started

Reference List

[1] Tushar, Shantanu (2013). Linux Shell Scripting Cookbook. Birmingham, UK.: Packt Publishing. pp. 219–220. ISBN 978-1-78216-275-9. [2] An English Word List. 2017. An English Word List. [ONLINE] Available at: http://www-personal.umich.edu/~jlawler/wordlist.html. [Accessed 05 October 2017]. [3]

About

Automatic Metamorphic Testing of Machine Translation Software

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published