The origins of Proto-Indo-European, the ancestral tongue that diverged into hundreds of languages spoken by more than three billion people worldwide, have long been a matter of debate.

According to the leading scenario, the language was first spoken by pastoralists who roamed the Steppe in Ukraine and southern Russia on horseback about 6,000 years ago and whose descendants pushed outward into Europe and the Indian subcontinent.

But some have argued for an older origin, as much as 9,000 years ago, among Neolithic farmers who lived in a part of Turkey that sits at the northern end of the Fertile Crescent.

Now, based on a new analysis of words that are shared across many Indo-European languages, an international team of researchers says the linguistic evidence supports neither of those theories on its own but rather a combination of the two.

The team’s findings, published Thursday in the journal Science, amount to a reorganization of the multiple branchings of the Indo-European language family over time. The result supports the picture of an early origin in Turkey but also points to the Steppe becoming a “second homeland” from which Indo-European speakers carried the forerunners of Italic, Celtic and Germanic languages westward into Europe.

“I think it’s a big step forward,” said Paul Heggarty, lead author of the study, which was conducted while he was a senior researcher in linguistics at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. “We’re in a much better place for working out what actually happened.”

While Indo-European languages account for only 5 per cent of all human languages, they are spoken by 46 per cent of the world’s population, with the most prevalent being English, Hindi-Urdu, Spanish, Bengali, French, Russian, Portuguese, German and Punjabi.

Dr. Heggarty, who is now affiliated with the Pontifical Catholic University in Lima, Peru, said a key goal of the study was to use linguistic relationships to arrive at a more reliable chronology for the emergence of various Indo-European languages. That chronology can now be compared with archeological evidence and with the emerging picture of ancient population movements based on sequencing human DNA.

At the heart of the analysis is a list of 170 “reference meanings” for words the researchers say can be traced to a common forerunner in Proto-Indo-European. The words are generally part of each daughter language’s core vocabulary. The list includes words for body parts (eye, hair), common geographic features (mountain, river) and animals that were found everywhere Indo-European language speakers lived (ant, snake).

In an exercise that took years to accomplish, language experts on the team worked to rid the list of borrowed words that jump from one language to another. A typical example is “planet,” an English word that was taken directly from Greek rather than tracing back to an older root inherited by both languages. Borrowed words tend to make two languages seem more recently related in time than they really are.

“We did everything we could to make sure we weren’t assuming similar words were similar through inheritance,” said Erik Anonby, a professor of linguistics at Carleton University in Ottawa who was involved in the portion of the study that considered relationships between Indo-European languages in Iran.

Based on the word list, the researchers then conducted a statistical analysis to show which languages were most closely related and projected backward to their likely time of divergence – usually the point at which two population groups sharing a common tongue become geographically separated. The method is similar to a phylogenetic analysis that evolutionary biologists might use to determine how individual species in a group relate to one another.

Exactly when the first branches diverged is difficult to discern, Dr. Heggarty said, but the linguistic analysis points to a “fairly early splintering” at some point within a span of about two millennia, with a midpoint around 8,210 years ago.

By 7,000 years ago, the analysis shows, the forerunners of Greek, Albanian and Tocharian – an extinct language group in western China – had already split off from the rest. This predates the period associated with the theory of a Steppe origin for the language family.

“The Steppe is certainly doing a lot but it’s not doing everything,” Dr. Heggarty said.

Instead, he said, the language that appeared on the Steppe could have been seeded by farmers who carried their version of Indo-European northward, around or through the Caucasus mountains, after which they became increasingly reliant on animals instead of crops as a food source. From there, he said, there is a more clear-cut branching of European language groups after about 6,000 years ago that matches the Steppe theory. Before then, the Indo-Iranian languages were already on a separate trek eastward, but the analysis does not determine whether that trek began before or after the shift into the Steppe.

The study may serve to intensify debate over the early chapters of the Indo-European story, with some experts raising early objections to the results.

“The resulting hybrid model at important points is at odds with the linguistic and genetic facts,” said Guus Kroonen, a linguistics researcher at the University of Copenhagen. “Future genetic studies will have to show whether the Steppe hypothesis truly can be rejected.”

Andrew Garrett, a linguist at the University of California, Berkeley, and co-author of a 2015 paper that supports a Steppe origin for Proto-Indo-European, said the most exciting aspect of the new analysis is the “well-documented and reliably analyzed” database of words and meanings the authors created.

“It will be a significant resource for linguists for years to come,” he said.

As to the conclusions of the study, Dr. Garrett added, “the new analysis in the paper is unconvincing, but that’s less important than the value of the database used.”

The Globe and Mail, July 27, 2023