Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment
Mohaghegh, Mahsa; Sarrafzadeh, Hossein; Mohammadi, Mehdi
Citation: Mohaghegh, M., Sarrafzadeh, A., and Mohammadi, M. (2014). Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment. The 13th International Conference on Machine Learning and Applications (ICMLA'14), Detroit, Michigan, USA.
Permanent link to Research Bank record: http://hdl.handle.net/10652/2969
Statistical word alignment models need large amounts of training data and perform poorly on small corpora. This paper proposes an unsupervised hybrid word alignment technique based on ensemble learning. The algorithm runs three base alignment models over several rounds to generate alignments. It uses a weighted scheme for resampling the training data and a voting score to aggregate the resulting alignments. The base alignment models used in this study are IBM Model 1, IBM Model 2, and a heuristic method based on the Dice measure. Our experimental results show that this approach can improve the alignment error rate of the base alignment models by at least 15%.
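As a rough illustration of two ingredients named in the abstract, the sketch below computes a Dice-based association score from sentence-level co-occurrence counts, links each target word to its best-scoring source word, and combines several alignments by simple vote counting. The abstract does not specify the paper's Dice heuristic, weighting scheme, or voting score, so the function names (`dice_scores`, `align`, `vote`), the greedy linking rule, and the `min_votes` threshold are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every co-occurring (source, target) word pair.

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Counts are per sentence pair (word types, not tokens); this is an
    assumption, since the abstract does not define the counting scheme.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_types, t_types = set(src), set(tgt)
        src_count.update(s_types)
        tgt_count.update(t_types)
        joint.update(product(s_types, t_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def align(src, tgt, scores):
    """Greedy heuristic (assumed): link each target word to its best source word."""
    links = set()
    for j, t in enumerate(tgt):
        best = max(range(len(src)), key=lambda i: scores.get((src[i], t), 0.0))
        links.add((best, j))
    return links


def vote(alignments, min_votes=2):
    """Keep links proposed by at least `min_votes` alignments (hypothetical rule)."""
    votes = Counter(link for a in alignments for link in a)
    return {link for link, n in votes.items() if n >= min_votes}


# Toy English-German corpus, invented for illustration only.
bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
scores = dice_scores(bitext)
links = align(["the", "house"], ["das", "haus"], scores)
```

In a full ensemble, the alignments passed to `vote` would come from different base models or different resampling rounds; here the function only shows the aggregation step in its simplest unweighted form.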