Transliteration

I wanted to give Alban and others pointers to my resources on transliteration, but I can’t find my transliteration documents any more! Anyway, my experience is that transliteration is a tough problem. After thinking about it for a while, we decided not to automate the transliteration of individual names but to let people enter their names in whatever more or less transliterated form they are used to. It would be great if you managed to automate transliteration (maybe with the help of a virtual Unicode keyboard?).

The main advantage of standardized transliteration is that it is supposed to give you a standardized representation of a person’s name. You could then rely on these standardized name elements to build a unique identifier. The problem is that transliteration is not standardized for many languages, and the standards that do exist keep changing. A Greek colleague of mine told me that his name had been transliterated many times, with many different results (creating problems at the airport, as you can imagine). Transliteration definitely remains a problem for strong identity management. For the moment, you should just try to work around it until transliteration standards are more robust and widely adopted…
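To make the identifier problem concrete, here is a minimal sketch in Python. The two mapping tables are simplified and hypothetical (they are not taken from any official standard), and the helper names `transliterate` and `naive_identifier` are mine, just for illustration. The point is only that two plausible romanizations of the same Greek name give two different identifiers.

```python
import unicodedata

# Two simplified, HYPOTHETICAL romanization tables (not official standards).
# They differ only in how they render the letters chi and eta.
TABLE_A = {"Χ": "Ch", "ρ": "r", "η": "i", "σ": "s", "τ": "t", "ο": "o", "ς": "s"}
TABLE_B = {**TABLE_A, "Χ": "H", "η": "e"}


def transliterate(name, table):
    """Map each base letter through the table, dropping accent marks."""
    # NFD splits an accented letter into base letter + combining mark,
    # so the tables only need entries for unaccented letters.
    decomposed = unicodedata.normalize("NFD", name)
    base_letters = (ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in base_letters)


def naive_identifier(latin_name):
    """Hypothetical identifier scheme: just the lower-cased Latin name."""
    return latin_name.lower()


name = "Χρήστος"
id_a = naive_identifier(transliterate(name, TABLE_A))
id_b = naive_identifier(transliterate(name, TABLE_B))

print(id_a, id_b, id_a == id_b)
```

The same person ends up with two identifiers that do not match, which is exactly the kind of trouble my colleague ran into. Storing the name as the person typed it avoids depending on either table.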

Anyway, here are some pointers on how to process non-Latin documents and maybe transliterate them: