Algorithm reconstructs ancient parent languages in record time
Services like Google Translate can translate coherently between a large number of languages, but many more remain a mystery. Now, researchers at the University of British Columbia and UC Berkeley have created a computer system that can automatically reconstruct parent languages, or protolanguages: the ancestral languages from which modern-day languages are theorized to have descended.
In the study, published in the Proceedings of the National Academy of Sciences, the research team set its sights on Proto-Austronesian, the ancestor of daughter languages spoken in Australasia and Southeast Asia, among other regions. Working from a pool of more than 140,000 words drawn from over 600 modern languages, the team’s system reconstructed the ancestral forms those words descend from. The system used a Markov chain Monte Carlo sampler, a statistical sampling algorithm, to analyze the history and origin of sets of cognates (words in different languages that share a common origin), then work out which ancestral forms most likely gave rise to each set. Just as important, the system's output agreed with about 85% of what linguists had reconstructed manually, and it got there in much less time. Reconstructing a language by hand can take years, whereas the system needs a few hours or days.
Beyond reconstructing languages, the system is built on a statistical model that could, in theory, help researchers understand how languages might change in the future. Throughout the process, the system kept a candidate reconstruction for each cognate and language and revised it over and over, applying a set of predefined sound-change rules, until it arrived at a prediction that was more refined, and thus more likely.
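For readers curious what "revising over and over" looks like in practice, here is a minimal Python sketch of a Metropolis sampler, one simple member of the Markov chain Monte Carlo family. It is not the researchers' model. The word list is invented, the edit-distance score stands in for their sound-change rules, and the names `sample_ancestor`, `beta` and `length` are assumptions made up for this illustration.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def cost(candidate, cognates):
    """Toy score (an assumption): total edit distance to all daughter words."""
    return sum(edit_distance(candidate, word) for word in cognates)


def sample_ancestor(cognates, length, steps=5000, beta=2.0, seed=0):
    """Metropolis sampler over fixed-length strings.

    Targets p(x) proportional to exp(-beta * cost(x)) and returns the
    lowest-cost string visited, together with its cost.
    """
    rng = random.Random(seed)
    alphabet = sorted(set("".join(cognates)))
    current = [rng.choice(alphabet) for _ in range(length)]
    current_cost = cost("".join(current), cognates)
    best, best_cost = "".join(current), current_cost

    for _ in range(steps):
        # Symmetric proposal: redraw one uniformly chosen position.
        proposal = current.copy()
        proposal[rng.randrange(length)] = rng.choice(alphabet)
        proposal_cost = cost("".join(proposal), cognates)

        # Metropolis rule: always accept improvements, sometimes accept
        # worse guesses so the chain can escape poor early choices.
        accept_prob = min(1.0, math.exp(-beta * (proposal_cost - current_cost)))
        if rng.random() < accept_prob:
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = "".join(current), current_cost

    return best, best_cost


if __name__ == "__main__":
    # Invented toy "cognates", not real language data.
    toy_cognates = ["tala", "tara", "tala", "dala"]
    print(sample_ancestor(toy_cognates, length=4))
```

The sketch starts from a random guess, changes one sound at a time, and keeps changes that bring the guess closer to the daughter words, while occasionally accepting a worse guess. That mirrors the repeated revision described above, minus the linguistic knowledge a real system would need.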
With the new system, languages can be reconstructed and studied closely far faster than before. We just hope the predictions help us cut off annoying trends like “yolo” and “hella” before they can wreak linguistic havoc.