An unsupervised alignment algorithm for text simplification corpus construction


Physical description

7:00 PM

Abstract

We present a method for the sentence-level alignment of short simplified texts to the original texts from which they were adapted. Our goal is to align a medium-sized parallel corpus consisting of short news texts in Spanish and their simplified counterparts. No training data is available for this task, so we have to rely on unsupervised learning. In contrast to bilingual sentence alignment, in this task we can exploit the fact that the probability of sentence correspondence can be estimated from the lexical similarity between sentences. We show that the algorithm employed performs better than a baseline that approaches the problem with a TF*IDF sentence similarity metric. The alignment algorithm is being used to create a corpus for the study of text simplification in Spanish.

Abstract taken from the report
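
The abstract names a TF*IDF sentence similarity metric as the baseline but gives no details. The following is a minimal sketch of what such a baseline could look like, not the authors' implementation or their unsupervised algorithm. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `align`), the whitespace tokenizer, the smoothed IDF formula, and the rule of picking the single most similar original sentence are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?¿¡\"'()"


def tokenize(sentence):
    """Lowercase whitespace tokenizer that strips surrounding punctuation (assumed)."""
    tokens = (w.strip(PUNCT).lower() for w in sentence.split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    """Build one sparse TF*IDF vector (a dict) per sentence.

    Each sentence is treated as a document. The IDF is smoothed so that
    terms occurring in every sentence still get a positive weight.
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(original, simplified):
    """Pair each simplified sentence with its most similar original sentence.

    Returns a list of (simplified_index, original_index, score) tuples.
    """
    if not original:
        return []
    vectors = tfidf_vectors(list(original) + list(simplified))
    orig_vecs = vectors[: len(original)]
    simp_vecs = vectors[len(original):]
    result = []
    for j, sv in enumerate(simp_vecs):
        scores = [cosine(ov, sv) for ov in orig_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        result.append((j, best, scores[best]))
    return result
```

A baseline of this kind scores every sentence pair independently, so it ignores sentence order and cannot represent a simplified sentence that merges or splits original sentences.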
