Decoding National Economies through Open Source Software: A Q&A on Digital Complexity

How can we measure a nation's productive knowledge when an increasing share of it is locked in software? Four researchers turned to the GitHub Innovation Graph to answer this question. Their study, published in Research Policy, introduces a new way to gauge the “digital complexity” of countries by analyzing the programming languages used by developers. In this Q&A, they discuss their findings, methodology, and implications for understanding economic growth, inequality, and environmental impact. The following conversation has been edited for clarity and length.

What is the central idea behind your research paper?

Sándor: For about 15 years, economists have measured national economic complexity by examining physical exports, patents, and research papers. These measures are surprisingly good at predicting which countries will grow, which have high inequality, and many other macroeconomic features. But they all have a massive oversight: software. Code doesn’t go through customs. It crosses borders through “git push”, cloud services, and package managers. So all that productive knowledge was essentially invisible—what some colleagues have called the “digital dark matter” of the economy. We decided to fix that using the GitHub Innovation Graph, which tracks how many developers in each economy push code in each programming language, based on IP addresses. We applied the Economic Complexity Index (ECI) to this data. The bottom line is that software ECI successfully reveals a country’s digital complexity and predicts economic outcomes in ways traditional data miss.

Source: github.blog

Why is it important to measure the complexity of software production?

Johannes: Software is increasingly the backbone of modern economies, yet traditional metrics ignore it entirely. By excluding software, we get an incomplete picture of a country’s productive knowledge. For instance, a nation that develops advanced AI algorithms but exports mostly agricultural goods would appear less complex under standard measures. Our work shows that digital complexity is a crucial missing piece of the puzzle. It helps explain variance in GDP, income inequality, and even carbon emissions that conventional economic indicators fail to capture. This matters because policymakers need accurate data to design effective industrial strategies, education programs, and innovation policies. If you’re not measuring software, you’re essentially flying blind in the 21st-century economy.

How did you use the GitHub Innovation Graph to measure digital complexity?

Jermain: The GitHub Innovation Graph provides granular, anonymized data on developer activity per country and per programming language. We aggregated this data to construct what we call the Software Economic Complexity Index. The methodology is similar to the classic ECI: we look at how “diverse” each country is (how many languages its developers use) and how “ubiquitous” each language is (how many countries use it). Countries whose developers use rare, specialized languages alongside many common ones are considered more complex. For example, a country where developers actively use Rust, Julia, and Kotlin alongside Python and JavaScript scores higher than one using only PHP and JavaScript. This mirrors how traditional ECI works with products like electronics versus raw materials. We validated our index against existing economic data and found strong correlations with national income, patenting rates, and R&D spending.
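
To make the calculation concrete, here is a minimal Python sketch of the standard ECI recipe applied to a country-by-language table. It is an illustration under stated assumptions, not the authors' code: it assumes the usual revealed comparative advantage (RCA) cutoff of 1 to decide whether a country "specializes" in a language, and the function names (binarize_rca, standardize, eci_from_binary) and the toy matrix are hypothetical. The study's actual filters and thresholds may differ.

```python
from math import sqrt


def binarize_rca(counts, threshold=1.0):
    """Turn a country x language table of developer counts into a 0/1 matrix.

    A cell is 1 when the country's share of developers in that language is at
    least `threshold` times the language's worldwide share (RCA >= threshold).
    Assumes every country and every language has a nonzero total.
    """
    country_totals = [sum(row) for row in counts]
    language_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(country_totals)
    return [
        [
            1 if (x / c_tot) / (l_tot / grand_total) >= threshold else 0
            for x, l_tot in zip(row, language_totals)
        ]
        for row, c_tot in zip(counts, country_totals)
    ]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("all values are equal; ECI is undefined")
    return [(v - mean) / std for v in values]


def eci_from_binary(M, iterations=200):
    """Compute ECI for each country (row) of a 0/1 specialization matrix.

    Assumes every country has at least one language (nonzero diversity).
    """
    diversity = [sum(row) for row in M]        # languages per country
    ubiquity = [sum(col) for col in zip(*M)]   # countries per language

    # Country-to-country matrix: how much two countries overlap in languages,
    # down-weighting ubiquitous languages. Each row sums to 1.
    M_tilde = [
        [
            sum(a * b / u for a, b, u in zip(row_c, row_d, ubiquity) if u) / k_c
            for row_d in M
        ]
        for row_c, k_c in zip(M, diversity)
    ]

    # Power iteration with re-centering. Because rows of M_tilde sum to 1,
    # its leading eigenvector is constant; removing the mean at every step
    # leaves the eigenvector of the second-largest eigenvalue, which is ECI.
    start = standardize(diversity)
    x = start
    for _ in range(iterations):
        x = standardize([sum(w * v for w, v in zip(row, x)) for row in M_tilde])

    # Common sign convention: ECI correlates positively with diversity.
    if sum(a * b for a, b in zip(x, start)) < 0:
        x = [-v for v in x]
    return x


if __name__ == "__main__":
    # Hypothetical toy data: rows are countries, columns are languages,
    # ordered from most common to rarest.
    toy = [
        [1, 1, 1],  # uses all three languages, including the rare one
        [1, 1, 0],
        [1, 0, 0],  # uses only the most common language
    ]
    print(eci_from_binary(toy))
```

On this toy matrix, the country that uses all three languages scores about 1.22, the middle country about 0, and the single-language country about -1.22. Real inputs would first pass through binarize_rca, so that a country counts as using a language only when its developers are over-represented in it relative to the world.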

What were the main findings of your study?

César: Our Software ECI proved to be a robust predictor of several macroeconomic outcomes. First, it correlates positively with GDP per capita: countries with higher digital complexity tend to be wealthier. Second, it relates to income inequality—nations that produce a more diverse and sophisticated set of software also show lower levels of inequality, suggesting that digital skills are widely distributed. Third, we found a connection to carbon emissions: more digitally complex economies tend to have lower emissions per capita, likely because they can substitute physical goods with services and optimize resource use. Importantly, these effects hold even after controlling for traditional complexity measures, meaning software complexity adds unique explanatory power. This suggests that open source software activity on GitHub offers a valuable, real-time window into a nation’s underlying productive capabilities.

How does software complexity compare to traditional complexity measures?

Johannes: They complement each other. Traditional measures based on exports of physical products tell us about a country’s manufacturing and resource extraction strengths. Software complexity captures the digital “know-how” that is increasingly central to innovation and services. In our dataset, we found that software ECI correlates moderately with classic ECI, but it also identifies countries that are digitally advanced yet underperforming in traditional measures. For example, Estonia and Israel score high on software complexity relative to their classic ECI, reflecting strong tech ecosystems. Meanwhile, some resource-rich nations score lower in software than in physical complexity. The two metrics together give a fuller picture of a nation’s overall complexity—think of them as different lenses viewing the same economy.

What are the practical implications of your research for policymakers and businesses?

Jermain: For policymakers, knowing a country’s digital complexity can guide investment in education, infrastructure, and startup ecosystems. If your software ECI is low, you might focus on training developers in high-complexity languages or attracting tech companies that use specialized skills. For businesses, especially multinationals, understanding where digital talent clusters can inform decisions on R&D locations. The GitHub Innovation Graph offers near real-time data, unlike traditional economic statistics that lag by years. This timeliness is crucial in fast-moving tech sectors. Additionally, our methodology can help track the impact of policies like coding bootcamps or open source incentives. We hope this work encourages more data-driven conversations about innovation policy.

What are the limitations of using GitHub data, and what future research do you envision?

Sándor: GitHub data has some caveats. It underrepresents developer activity in countries with lower internet penetration or where corporate code is hosted on private platforms. Also, IP-based location is imperfect. However, the GitHub Innovation Graph is the best publicly available source with global coverage. Future research could include other open source platforms like GitLab or SourceForge. Another direction is to explore how digital complexity evolves over time and what drives changes—do specific policy interventions boost complexity? We’d also like to connect software complexity to more granular outcomes like firm-level innovation or regional inequality. Finally, the relationship with emissions deserves deeper study: could promoting software complexity be a climate action lever? These are exciting avenues we hope other researchers will pursue.
