2013-01-27

Typesetting Word-By-Word Translations

Recently I was asked how to typeset documents with a word-by-word translation like the following:

Welche Farbe hat der gelbe Bus?
Which color has the yellow bus?

As I have learned, linguists have a fancy term for this: interlinear gloss. Several LaTeX packages are available for this purpose. Among them is gb4e, which I decided to use.

For simplicity, it is assumed that the text to be 'glossed' is provided as a plain text file with sentences delimited by '.  ', '?  ' or '!  ' (punctuation followed by 2 spaces) and words separated by single spaces. The dictionary needs to be provided as a .csv file. Under these assumptions, implementing a small script that creates a document with nicely aligned words is straightforward.
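To make the idea concrete, here is a minimal sketch in Python of how such a script could work. It is not the downloadable script itself. The function names, the two-column CSV layout (source word, translation) and the '??' placeholder for unknown words are all assumptions made for illustration. The generated output uses the \gll command of gb4e, which aligns two lines word by word.

```python
import csv
import io
import re


def load_dictionary(csv_file):
    """Read 'source,translation' rows (assumed layout) into a dict.

    Keys are lowercased so that lookups ignore capitalisation.
    """
    return {row[0].strip().lower(): row[1].strip()
            for row in csv.reader(csv_file) if len(row) >= 2}


def split_sentences(text):
    """Split after '.', '?' or '!' when followed by two spaces."""
    parts = re.split(r"(?<=[.?!]) {2}", text.strip())
    return [p for p in parts if p]


def gloss_word(word, dictionary):
    """Translate one word, keeping its trailing punctuation."""
    core = word.rstrip(".,;:?!")
    trailing = word[len(core):]
    # Unknown words get a visible placeholder for manual fixing later.
    translation = dictionary.get(core.lower(), "??") + trailing
    # A gloss containing spaces must be braced to stay in one column.
    if " " in translation:
        translation = "{" + translation + "}"
    return translation


def gloss_sentence(sentence, dictionary):
    """Build one gb4e example: source line, then the aligned gloss line."""
    words = sentence.split()
    glosses = [gloss_word(w, dictionary) for w in words]
    return ("\\ex\n\\gll " + " ".join(words) + " \\\\\n"
            + " ".join(glosses) + " \\\\\n")


def make_document(text, dictionary):
    """Wrap all glossed sentences in a minimal LaTeX document."""
    body = "\n".join(gloss_sentence(s, dictionary)
                     for s in split_sentences(text))
    return ("\\documentclass{article}\n\\usepackage{gb4e}\n"
            "\\begin{document}\n\\begin{exe}\n"
            + body + "\\end{exe}\n\\end{document}\n")


if __name__ == "__main__":
    sample_csv = "welche,which\nfarbe,color\nhat,has\nder,the\ngelbe,yellow\nbus,bus\n"
    dictionary = load_dictionary(io.StringIO(sample_csv))
    print(make_document("Welche Farbe hat der gelbe Bus?", dictionary))
```

The sketch does not escape LaTeX special characters such as % or &, and it does not restore capitalisation in the gloss line, so its output still needs a manual pass.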

Unfortunately, the task cannot be fully automated. Breaking text into sentences requires some knowledge about the specific language. So does breaking sentences into words. Ideally the dictionary should also be able to recognize inflected forms, etc. The script therefore just generates a LaTeX file that can be modified manually.

The script can be downloaded here.
