Apache’s migration to GitHub: another patch in the centralization wall


Mayeul Berger

Here at CybelAngel, we’ve spent a significant amount of R&D effort focused on protecting our customers from compromises relating to data or credential leaks found on GitHub.  As GitHub’s adoption has continued its meteoric rise, the number of security incidents stemming from credentials, open file paths, and other sensitive data being left accessible in public GitHub repositories has also continued to climb.  

The combination of mass corporate adoption of GitHub, personal GitHub accounts, and the use of various open-source projects hosted on GitHub has created a significant threat landscape for all enterprises. A recent example that has implications for nearly all organizations is Apache’s full transition to GitHub.

On May 1st, the Apache Software Foundation (ASF) finalized the last step of one of the most important projects in its history: the migration of all its projects onto the code-sharing platform GitHub. The full integration, started after Microsoft’s purchase of GitHub at the end of 2018, was completed in February, and the ASF decommissioned the two internal hosting services it used to maintain, Git and Subversion, though the Git repositories remain as backup mirrors only.

The ASF started using the GitHub environment back in 2016, when it first enabled some projects to use GitHub tools. The development community appears to have asked for tighter integration since then, for obvious reasons: practicality, ease of development and sharing, performance, robustness (GitHub survived the biggest DDoS attack ever recorded last year), and all the other elements that make GitHub the new must-use platform in the open-source community.

Another crucial reason was cost. ASF’s 2018 five-year strategic plan mentioned that “Increasingly, project communities have infrastructure requirements that strain the capabilities of the ASF.” Indeed, overall infrastructure costs amounted to $818K in 2018, or ~80% of the total expense budget. Worse still, Git appears to scale poorly because of its reliance on many interactive protocols. Hence, moving to an external hosting solution was not only the best strategic option; it is almost surprising that such a migration had not occurred earlier. The resources saved can now be reallocated to “creating software for the public good”, which is the core mission of the ASF, according to Greg Stein (current ASF Infrastructure Administrator).

As expected, the migration announcement drew many reactions from the tech community, especially regarding the fact that being hosted on GitHub actually means being hosted by Microsoft. The Bill Gates empire remains, for many, one of the symbolic archenemies of open-source culture (despite the fact that Microsoft has become, over the last ten years, one of the world’s largest open-source contributors). Feelings are therefore divided between the fear of seeing the ASF, one of the godfathers of open-source software, potentially depend on Microsoft, and the hope of accelerating the development and deployment of ASF’s projects thanks to greater participation and a more robust infrastructure.

Another lingering issue seems much more sensitive, although less commented on. Perhaps Microsoft’s strategy, now pro-open-source, is problematic not because “it’s Microsoft” (or “Amazon”, or “Google”, or whatever tech giant), but rather because it pools actors and talent. While one should of course appreciate that more and more resources are devoted to open-source software this way, the migration remains a further step toward internet centralization. Open-source development ecosystems are slowly but surely becoming less diverse and more standardized. They become, in turn, subject to the problems of limited choice, access, and freedom, and more vulnerable to single points of failure. This ASF migration could be another small weight that tilts the balance of power toward the tech giants and ISPs. Looking at the big picture, this step toward centralization will probably be good for the actors involved, but more problematic for internet users at large.

We continue to monitor GitHub (both literally and figuratively) as this centralization continues, to ensure our customers are protected.  If you’re interested in doing a scan of GitHub (and/or the many other sources our platform can detect your leaked data on), please reach out here or chat with CybelBot below!
