
Check the interactive version here: http://rpubs.com/natalevs/Jesus_forms_v1
While the so-called “developed” world is rolling back to the Middle Ages, reading the New Testament can be useful. First, one can find out that Jesus actually did not support guns, forced birth and “traditional values”. Second, the Bible can move science forward – at least when it comes to language. Michael Cysouw and Thomas Mayer, whom I was lucky to work with in Marburg, have created a huge collection of parallel Bible translations. It now contains about 2,000 texts in languages from all over the world.
For quite a while, I have been working on correlations between different properties of languages. For example, if a language has different case forms for Subject and Object, does it also have rigid word order? Does it prefer a verb-final order? Is it more restrictive with regard to the semantics of its arguments? In this paper, I say “yes” to all these questions. But my findings were based on corpus data from a small sample of 30 languages. Obviously, that’s not much. So how can this be scaled up?
The Bible corpus comes to the rescue. We can infer a lot of information about grammar from it. Let us look at case marking. I found all verses with “Jesus” as a transitive subject (Iesus) and as an object (Iesum) in the Latin Vulgate translation. You might ask, why Jesus? There are two reasons. First, he is the person most frequently spoken about in the New Testament, so there are many verses with sentences where he is Subject or Object. Second, proper names look relatively similar across languages, so it is easy to identify them in a language you know nothing about. Michael Cysouw has used common proper names in the New Testament (like Jesus and Jerusalem) for many cool case studies.
Next, I used some simple statistics to infer translation candidates for Iesus and Iesum from the same verses in different languages. It was also necessary to check bigrams in order to capture pre- and postpositions, as in Spanish Veo a Jesús “I see Jesus”.
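To give a rough idea of what such a candidate search can look like, here is a minimal Python sketch. It is only an illustration, not the actual script behind the map: the Dice coefficient stands in for the “simple statistics”, tokenization is a naive whitespace split, and the function names, verse IDs and the tiny Spanish sample are all made up for the example.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def best_candidates(target_verses, source_verse_ids, top=3):
    """Rank target-language unigrams and bigrams by how well the set of
    verses they occur in matches the verses containing the Latin form.

    target_verses: dict mapping verse ID -> verse text in the target language
    source_verse_ids: set of verse IDs in which the Latin form (e.g. Iesum) occurs
    The Dice coefficient used here is an assumption made for this sketch.
    """
    # Collect, for every unigram and bigram, the verses it appears in.
    verses_with = defaultdict(set)
    for verse_id, text in target_verses.items():
        tokens = text.lower().split()
        for item in set(tokens + ngrams(tokens, 2)):
            verses_with[item].add(verse_id)

    # Dice coefficient between each item's verse set and the Latin verse set.
    scores = {}
    for item, ids in verses_with.items():
        overlap = len(ids & source_verse_ids)
        scores[item] = 2 * overlap / (len(ids) + len(source_verse_ids))

    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical toy data: five invented Spanish "verses".
toy_verses = {
    "v1": "veo a jesús",
    "v2": "jesús ve a pedro",
    "v3": "ellos buscan a jesús",
    "v4": "jesús dijo esto",
    "v5": "pedro ve a juan",
}

# Suppose the Latin object form Iesum occurs in verses v1 and v3.
print(best_candidates(toy_verses, {"v1", "v3"}, top=1))
# → [('a jesús', 1.0)]
```

In this toy sample the bigram “a jesús” beats the bare “jesús”, because the bare name also turns up in the verses where Jesus is the subject. That is the reason for looking at bigrams and not only at single words.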
The map above, which was made with the easy-to-use R package lingtypology, shows the first results for almost 1,400 languages. The blue dots represent languages in which the subject and object forms are different, as in Latin. The orange dots are those languages in which the forms are the same, as in English.
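The two values on the map boil down to a simple comparison of the best subject and object candidates. A minimal Python sketch, with a hypothetical function name and hand-picked example forms, could look like this. Treating Spanish “a Jesús” as a separate object form is an assumption about how the bigram step feeds into the map.

```python
def classify_marking(subject_form, object_form):
    """Return "Same" if the two forms are identical, otherwise "Different".

    Forms are compared case-insensitively after collapsing whitespace,
    so a bigram such as "a Jesús" counts as a form of its own.
    """
    def normalize(form):
        return " ".join(form.lower().split())

    if normalize(subject_form) == normalize(object_form):
        return "Same"
    return "Different"


# Hand-picked (subject, object) pairs for illustration only.
examples = {
    "Latin": ("Iesus", "Iesum"),
    "English": ("Jesus", "Jesus"),
    "Spanish": ("Jesús", "a Jesús"),  # assumption: the preposition counts as marking
}

for language, (subj, obj) in examples.items():
    print(language, classify_marking(subj, obj))
```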
If you would like to explore the map interactively, as in the World Atlas of Language Structures, you can do so here:
http://rpubs.com/natalevs/Jesus_forms_v1
This is a quick and dirty approach, so I’ll be very grateful for any corrections from language experts! By the way, Klingon is not shown on the map, due to the lack of coordinates on this planet. But its value is “Same”. I wonder what the experts would say about that!