State-of-the-art English to Persian Statistical Machine Translation system
| Published in | 2012 16th CSI International Symposium on Artificial Intelligence and Signal Processing, pp. 174-179 |
|---|---|
| Main Authors | , | 
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.05.2012 |
| Subjects | |
| ISBN | 9781467314787 1467314781  | 
| DOI | 10.1109/AISP.2012.6313739 | 
| Summary | This paper reports a comparison of several kinds of English-Persian Statistical Machine Translation (SMT) systems. A large parallel corpus containing about 6 million tokens on each side was developed for training the proposed SMT system. During development of the parallel corpus, a noise-filtering system based on a MaxEnt classifier was devised to distinguish between correct and incorrect sentence pairs. Using the resulting parallel corpus, a variety of English-to-Persian SMT systems were developed. Several variations on SMT, such as hybrid MT and statistical post-editing MT, are proposed. All systems were tested on two different types of test set: one extracted randomly from the parallel corpus, and the other containing formal English sentences extracted from an English learning book. The results show that a hybrid system, in which SMT is augmented by rule-based detection of English phrasal verbs and Persian compound verbs, improves on the baseline significantly. State-of-the-art results on English-Persian translation, as measured by BLEU, are obtained by the Verb-aware SMT system. |