To run the program you will need to install Python 3 and PIP3. A file setup.sh has been provided to allow for the installation of dependencies, however if using a non-linux system the user will need to manually install dependencies.
To execute the program:
- Run the main.py file.
- If you wish to generate test data, that is lines to be translated, enter ‘y’ at the prompt. Then enter the number of lines to generate.
- Enter the starting line for the excel file
- Enter the final line for the excel file
- Wait until the analysis has been completed
- You will need to terminate the program manually with ctrl-c as there is a background job to refresh a token with a translating service.
- The results of the anlysis will be located in the file local_test_results.xlxs
Should the security keys for any of the translation services become out of date or invalidated you will need to update them before executing the file. Instructions which can be found included for each translator.
Please note that the results from our analysis have been left in the local_test_results.xlxs file for the sake of completeness, new results from any further execution of the program will be appended to the existing file.
The task of assessing the quality of machine translation software is naturally difficult, due to the lack of test oracle. This is known as the oracle problem. To address the inability in addressing translations, metamorphic testing can be applied to generate a suitable test case such that the metamorphic relations of translations are not violated. In this study, we will focus on the machine translation services: Google, Bing, and Youdao, and will attempt to use a metamorphic approach to quantify the consistency of each of these services as well as rank their general performance on a select number of focus languages. In this study we consider English, Chinese, Japanese, Korean, French, Russian, Portuguese, Spanish, and This project we are apply metamorphic testing to the evaluation of translation software. We are using round-trip translation metamorphic relation.
- Firstly, we have get test case from Wikipedia.
- Secondly, we have evaluation three translation tools, Youdao translation, bing translation and google translation
- Thirdly, we have using
- BLEU (Bilingual Evaluation Understudy) — One of the most popular forms of evaluation as it correlates highly with human judgements of quality.
- Levenshtein Distance — The number of deletions, insertions or substitutions required to transform one string into another.
- Cosine Similarity
algorithm for evaluating the quality of translated text which has been using machine-translate.
In our project, the of test cases are atomatically captured from the internet. We are using a dictionaries as key word put in Wikipedia search engine for get single sentence from Wikipedia.
Tushar, Shantanu (2013, pp. 219 - 220) show that WordList(Dictionary) is a standard file on all Unix and Unix-like operating systems. The wordList file is usually stored in /usr/share/dict/words or /usr/dict/words.[1] The wordList story more than 60,000 meanful words.
In this project, an online website is applied to obatin the dictionary which has generated the wordlist programmed by stndards Unix tools.[2] Due to the result to be catched in python will be a list-structure ASCII, the reuslt needs to be post-edited with spliting into individual words for using.
Wikipedia API is a Python library for accessing and parsing data from Wikipedia. It is a powerful and convenient tool to obatin text, links and images form wikipidia. Due to some literal contents are needed as the test samples in the project, the API are applied to search the sentences to build the test datasets. Library address: https://pypi.python.org/pypi/wikipedia
To be make to result valid and accurate, the size test-sample is supposed to be large enough. Considered about the size of our group and the tools we have, the quantity of the testing sample is decided to be 1000 sentences. In the dictionary there are near 70 thousands words. In the wordlist same meaningful word may have different form.
We want to reduce probability have the same sentences, so we decide generate each random number from interval range for avoid get same sentences. The interval size is (size of dictionary /size of test sample) / 2, divide by 2 for let rest part to solve WiKi raise Exception or return empty string situation. We also need a lower bound variable for record start point for generate random number. And upper bound = lower bound + interval size. As a result, each time when we get random number in each interval. We only need generate random number from lower bound to upper bound. After give upper bound value to lower bound and calculate new upper bound. When the program starts running, it will keep counting until 1000 valid sentences has been written to excel file. The ratio of seeds is about 1.43% in the wordlist except WiKi raise Exception or return empty string situation. The real radio of seeds will a little bit high than 1.43%. If the radio of seeds is equal to 100% mean must be have same sentences. In our 1000 test data, we have got very lower repetitive rate. The specific steps are:
- Using above method get the seed word
- Uing seed word query Wikipedia search engine(we have using Wikipedia APi)
- get first sentence from summary part(return by Wikipedia).
- If Wikipedia raise Exception or return empty string, We will ignore it, traverse the next interval (The specific method will discuss later).
- If have valid sentence return, we will direct written to excel file.
The test data sample has been divided to several parts for transalting in order to increase the efficiency of the work. After the translation, different translated parts will be merged together into one excel file row by row with the identical order of the orginal test data sample. It is an atomatic procedure through a program.
The ultimate prodeuced excel file includes the orginal English senteces and the translation results of other eight laugnages(zh-CHS’, ‘ja’, ‘ko’, ‘fr’, ‘ru’, ‘pt’, ‘es’, ‘sv’). One sentence is for a row and one language is for a column. For each sentence, the transaltion will be regards as invalid only if the transaltion result of every lauguage is invalid(which means empty in contents). In the way the program will make a judegment for each row so that the ones that has successful translation for at least one language will be kept. Ohterwise, the whole row will be set as empty.
Youdao (有道) is a search engine released by Chinese internet company NetEase (網易). This search engine not only support English and Chinese. Also can support lots of others language. language spuuort list:
language | code |
---|---|
Chinese | zh-CHS |
Japanese | ja |
English | EN |
Korean | ko |
French | fr |
Russion | ru |
Portuguese | pt |
Spanish | es |
When you want to using youdao api for translation, First, you must creat a account in YOUDAO ZHI YUN. THis is link http://ai.youdao.com . I have choose using my wechat to login ZHI YUN. Because, each time when I log in. I only need scan QR code in my wechat for convenience. After you need do some set up for get appKey and key, both are inportance for you send POST requie. There is step by step
- go to application management
- click my application
- creat a new application, filed info and create
- create a translation instance and bind with you application, which is you before you have created.
When you finish all of step you can start using YOUDAO API. :)
This youdao translate API, we can using http or https POST to send our sample data(sentence and paragraph) to youdao and get translated data return by JSON.
youdao api http address: http://openapi.youdao.com/api youdao api https address: https://openapi.youdao.com/api
This is a exmple for translate good(English) word to chinese’s POST URL. http://openapi.youdao.com/api?q=good&from=EN&to=zh_CHS&appKey=ff889495-4b45-46d9-8f48-946554334f2a&salt=2&sign=1995882C5064805BC30A39829B779D7B
Field Name | type | mean | Must filed | Comment |
---|---|---|---|---|
q | text | want translate text | True | must be UTF-8 |
from | text | from which language | True | must in language support list(you also can set to auto) |
to | text | target language | True | must in language support list(you also can set to auto) |
appKey | text | application key | True | you can find in application management in youdao ZHI YUN |
salt | text | random number | True | |
sign | text | signiture | True | MD5(appKey + q + salt + key) key you can find in application management in YOUDAO ZHI YUN |
You can get a JSON file back. In JSON file only have two colum is importance in our system, one is errorCode, and another one is translation If errorCode is 0 mean no error. and translation is our most inerest part is our translate result. This is a example { “errorCode”: “0”, “translation”: [“大丈夫です”] } All of code for youdao, please have a look youdao.py in code folder
Bing translate(Microsoft Translate) is a multilingual machine translation cloud service provided by Microsoft. Bing translator API include Text translation, Speech translation and Text to speech. However, I am only using text translation in this project.
This is frist step for using bing translator API.
- sign into Azure. link https://azure.microsoft.com/en-gb/account/
- click MY ACCOUNT
- click AZURE portal
- I am using my [email protected] to login, I need to choose Work or school account
- go to the Cognitive Service section
- under API type select the Text and fill out the rest of the form and creat subscribe
- get authentication key
- In menu All Resources
- click on your subscription, you can find subscription if in overview and Key 1 and Key 2 in resource management keys
- Go to the resource manager and create a new project: https://console.cloud.google.com/cloud-resource-manager
- Select and enable billing from the left hand panel on the following page: https://console.cloud.google.com/home/dashboard
- Return to your new project and search for translate in the search panel at the top.
- Select and enable the Google Cloud Translation API.
- Select credentials fomr the left hand menu and then select Create Credentials
- Choose a service account key, create a new service account and then download the JSON file.
- Rename the downloaded file ‘Google Credentials.json’ and save it in the root directory of the project.
Further instructions can be found in Google’s Quickstart Guide: https://cloud.google.com/translate/docs/getting-started
[1] Tushar, Shantanu (2013). Linux Shell Scripting Cookbook. Birmingham, UK.: Packt Publishing. pp. 219–220. ISBN 978-1-78216-275-9. [2] An English Word List. 2017. An English Word List. [ONLINE] Available at: http://www-personal.umich.edu/~jlawler/wordlist.html. [Accessed 05 October 2017]. [3]