Intro: Breaching to the Choir

When a sensitive data breach at an important company like Equifax makes headline news, millions of consumers become immediately aware that they’re now victims. The story is always about stolen data and the drama around the company’s attempt to cover up the breach. But compromised data is a consequence of a security failure. What are the actual causes of such a severe security breach?

Oftentimes, sensitive data is compromised through insecure source code. That’s the story within the story, the compelling story rarely told. Such failures occur frequently, even within the biggest and most familiar companies like Uber. Security failures even occur inside companies specializing in security, like OneLogin, but we rarely hear why these failures happened.

This is true in part because source code vulnerability is highly technical. As we will see, this is one issue we cannot afford to avoid because it is “too technical.” But there is a more nefarious reason we don’t hear about the cause of security failures: if companies revealed that their software lacked important security features, then they would quickly lose customer confidence. For this reason, Yahoo and other companies have intentionally concealed security breaches for years.

We will find that the source of these security breaches is the source code itself. In this article, we will explore the security vulnerabilities of source code repositories such as Git and Apache Subversion. We will also discuss a variety of solutions including source code scanning. Let’s start with a look at some recent security failures and uncover the common denominator.

Hacks in the News

In late 2016, a total of 57 million Uber users and drivers fell victim to one of the largest data breaches in recent history. Two hackers made off with personal information, including phone numbers, names, and email addresses. What’s worse, however, is that the company covered up the breach for more than a year’s time. Settling claims associated with the cover-up will cost Uber $148 million.

This story is all too common. Switch “Uber” for any organization. Stories such as this constantly remind us to increase our own web-based security precautions. Companies seek to promote confidence by enforcing strong password creation; some even use strong password generators. More draconian measures include the use of two-factor and even multi-factor authentication. This may involve a dynamic PIN code sent to your email address or via SMS, in addition to the traditional hardware token or key fob. In view of such heightened security, why then do we continue to hear that millions of consumers’ privacy was compromised at Uber and other companies?

The absurd but true answer is that bad actors often don’t hack anything. The most damaging “hacks” involve little more ingenuity than the accidental discovery of passwords in an unencrypted file, hosted in an indexable web directory or an Amazon S3 bucket! This is what happened at Uber, and we’ll be taking a more in-depth look at how attackers get access to login credentials through vulnerabilities in versioning platforms and code repositories like Git and Subversion.

Take for example another high-profile breach that impacted more than 148 million consumers: the breach of Equifax, a consumer credit-reporting agency. The massive attack on Equifax exploited another popular developer tool for distributed computing apps called Apache Struts. Apache is well known for producing high-quality and widely popular open source developer tools; however, Apache Struts has a long list of known security issues. One developer forum currently maintains a post of 72 known security issues! Far from moderate in scope, these security vulnerabilities span denial of service and remote code execution attacks, which can cripple an enterprise ecommerce platform.

This prompts the most natural question: if Apache Struts has known security issues, why was Equifax using it? It’s widely stated that Equifax knew about those security issues long before they were exploited, but Equifax did not take any deliberate action to resolve the issues. Why? To understand the problem with handling known bugs and attack vectors in such commonly used software, we must explore the technical side of the issue.

Even companies whose sole focus is the security of data are not immune. For example, OneLogin, a company whose whole raison d’être is web security, was attacked. And this particular story is most germane to our purposes here: theft of API secret keys and OAuth tokens. Although OneLogin did not reveal details of the attack, its recommendations to customers on protecting against further breaches indirectly revealed the potential points of ingress. The login credentials that developers use for automated logins (API keys and OAuth tokens) are often stored unencrypted in shell scripts, or may even show up in log files.

But companies face an extraordinary challenge in solving this particular problem: the continuous deployment pipeline that devops and agile teams crave to streamline is much trickier to automate when authentication credentials must be fully secured. It can absolutely be done, but it will require complete attention and continuous review. Let’s start with how and why these security flaws exist.

It Starts with the Code

In a recent study, the US Department of Homeland Security noted that 90% of security breaches happen because of vulnerabilities in the code. That simple yet impactful statistic should be enough to prompt everyone from developers to CISOs to start thinking about assessing their own security practices pertaining to code. The first and most important place to look is within code repositories. Source code repositories (“repos”) are slowly becoming more understood as adoption and use increase, but historically, they’ve primarily been occupied only by developers who work on large-scale, enterprise-level software applications. These are shared developer resources. Because only developers spend each day working within repos, source code committed is exempt from the scrutiny exerted upon other areas of development, testing, and QA. But this is exactly where security vulnerabilities begin to sprout.

Another place to look is within version control as a whole. In a complex web application where multiple developers may work on a single module, leaders need the convenience of being able to quickly test and roll back emerging versions of code. This functionality is provided by versioning apps such as Apache Subversion. Both the repo and the versioning app are prone to vulnerability.

Ordinary security issues, such as SQL injection or cross-site scripting (XSS), are two-dimensional and recognizable by nontechnical staff through straightforward testing methods or even the most basic vulnerability scanning tools. Testers can run a regression test suite scripted on Cucumber and then report issues without knowing anything about the contents of automated build scripts that trigger the test from a repo, some of which actually contain secret access keys and tokens. The diverse ways such credentials are used in scripted CI pipelines are limited only by the creativity of the developer. Usually continuous integration and deployment (CI/CD) involves simplifying and automating as much as possible, but scanning the code, the integration, and the deployment process itself is no straightforward task for mere mortals.

Scripted builds must instead be scanned for vulnerabilities, not by humans, but by machines. And they must be sufficiently intelligent to start before the first CI trigger. Jenkins is a popular developer tool that detects when a coder submits (“commits”) a change to an app. Jenkins then triggers a test suite that, if successful, will proceed with an automatic deployment of the successful version of the application. These are some of the steps in CI/CD, which is now a wildly popular way of distributing new software to users. Why are CI and CD the great abyss of security failures?

To achieve a truly automated development pipeline, a developer must script a sequence of events from code change to app deployment. This means that any change made to the web app triggers a new set of test suites to verify that the code change did not break another part of the app. As you can imagine, to automate a web app test, a virtual user must come into being, log in to an account, and do something a normal user would do, perhaps purchasing a TV using a new account and address. How can a bot log in to an account when we have all the security features previously described, such as dynamic PIN authentication? It’s often as simple as pasting cleartext credentials into a file, and that might terrify even the most seasoned systems administrators and devops engineers.

When a developer (oftentimes a QA engineer) scripts an automated CI pipeline, they quite literally enter the login credentials for the virtual users into scripts. These scripts are not compiled or encrypted; they are plaintext scripts executed on both a browser and a server. Automation scripts are usually saved to repositories like Apache Subversion and Git, or in the case of the flippantly naughty, to indexable repositories on wide-open web servers or on public GitHub itself. They are triggered during rollback and release through deployment pipelines. These security vulnerabilities must be detected by a source code scanner before a commit triggers a new build. Only this will prevent hackers from getting their eyes on the contents of these scripts. Lax processes on frequently accessed protocols, components, and libraries result in source code vulnerabilities.

Avoid lax processes and ensure that your source code is scanned for security vulnerabilities by a source code scanner. Remember our Uber example? They, like other companies, “frequently accidentally keep credentials in source code that is uploaded to GitHub.”

But this is not accidental—sometimes it’s routine, in the name of convenience or speed of development. And the only way to get in front of a bad habit or reckless automation is to automate scanning it up front.
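As a rough illustration of what scanning up front can mean, the following Python sketch checks the text of a script for a few well-known credential shapes. The rule names, the sample script, and the choice of patterns are assumptions made for this example; production scanners rely on much larger rule sets and will also report false positives.

```python
import re

# Illustrative rules only; real scanners ship far larger rule sets.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded-password": re.compile(
        r"(?i)(?:password|passwd|pwd)\w*\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def find_hardcoded_secrets(text):
    """Return (line_number, rule_name) for every rule that matches a line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


# A made-up test script of the kind described above (all values are fake).
script = '''\
driver.get("https://shop.example.test/login")
username = "qa-bot"
password = "hunter2"
aws_key = "AKIAABCDEFGHIJKLMNOP"
'''

for line_number, rule_name in find_hardcoded_secrets(script):
    print(f"line {line_number}: {rule_name}")
```

Run against the sample script, this flags the password assignment on line 3 and the key-shaped string on line 4. A real deployment would run such a check before the commit reaches the build server.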

The Human Component

Surprisingly, security failures arising in distributed computing platforms are not bugs. They are oversights that inadvertently provide access to unknown parties. The mistake of the developer in this failed process is one of anticipation. Developers traditionally think that bugs happen in the running app, not the scripts on the repos. Insecure coding practices stem from behaviors and habits during the dev process that might lead to vulnerabilities in the code.

As mentioned, repos and versioning platforms are strictly the domain of coders. Likewise, testers and QA engineers don’t usually think of build scripts as a part of the web app under development. This mind-set must change, but there must also be an improved platform that enforces security.

The Battle of Two Heroes: Security vs. Efficiency

Devops and agile methodologies have placed such a severe emphasis on breakneck speed and efficiency that developers come under pressure to script shortcuts into software builds. This of course leads to unforeseen consequences. If two or more developers work on a branch of code, they are often tempted to share credentials to make visibility easier. Although it is impractical to reduce the pressure for speed to delivery, it is very possible to implement a repo and versioning system that enforces developer code security practices.

Unfortunately, developer tools with security built in are rare. Today, security is “bolted on” rather than integrated. Since devs want to go fast and aren’t interested in being slowed down by additional security steps, they find the idea of remaining security-conscious off-putting in an environment of fast innovation and high demand.

Irrefutably, the substance of a company is its source code. New automated delivery pipelines now expose a new form of security vulnerability. With the advent of any new technology, a new set of threats arises. Enterprises whose core product is delivered by a web app must recognize source code security as the utmost priority. Static analysis of source code can identify vulnerabilities in both proprietary and open source code before an application is deployed to production or before a build is packaged and delivered. A security-based versioning and repository platform that detects vulnerabilities can then interrupt the automatic deployment of a new version until the issue is corrected or an authoritative team leader authenticates and approves the release. The prevention of security failures like those that led to the catastrophic Equifax and Uber breaches is well worth a momentary lapse in delivery to production. Security-based developer tools must necessarily figure in the next generation of coding standards.
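The gating rule described here (hold the release until the finding is fixed or an authorized lead signs off) can be sketched in a few lines of Python. The `Finding` type, the `RELEASE_APPROVERS` set, and the email addresses are hypothetical names for illustration, not part of any particular platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One issue reported by a scan (fields are illustrative)."""
    path: str
    line_number: int
    rule: str


# Hypothetical set of people allowed to override a blocked release.
RELEASE_APPROVERS = {"team-lead@example.test"}


def may_deploy(findings, approved_by: Optional[str] = None) -> bool:
    """Allow deployment only if the scan is clean or an authorized lead signed off."""
    if not findings:
        return True
    return approved_by in RELEASE_APPROVERS


findings = [Finding("ci/login_test.py", 12, "hardcoded-password")]
print(may_deploy([]))                                  # True: clean scan
print(may_deploy(findings))                            # False: blocked
print(may_deploy(findings, "team-lead@example.test"))  # True: approved override
```

The point of the design is that the default answer is "no": a finding blocks the pipeline unless someone with authority explicitly takes responsibility for the release.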

This step alone will thwart nearly all malicious attempts to alter source code and will mitigate unauthorized access before it has a chance to occur. Among the essential best security practices is the constant monitoring of build scripts for potential security breaches. An automated inventory of source code can accomplish this. Source code scanners are needed with the capability to track all new branches in a versioning system and to present alerts when keys and tokens are entered. Some companies today are taking interest in publicly scanning for these keys and then disabling them or notifying users. Some even go so far as to empower their users with tools to do this, as is the case with Amazon Web Services’ AWSLabs GitHub account.
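One way such branch monitoring could work is to inspect only the lines a change adds. The Python sketch below parses a unified diff, tracks line numbers in the new file, and produces an alert when an added line looks like a credential assignment. The pattern, the branch name, and the sample diff are assumptions for illustration, and the parser is deliberately simplified (for instance, it does not handle removed lines whose content itself begins with dashes).

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")
# Illustrative pattern: a secret-sounding name assigned a quoted literal.
SECRET_ASSIGNMENT = re.compile(
    r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
)


def added_lines(diff_text):
    """Yield (new_line_number, text) for each line a unified diff adds."""
    new_line_number = 0
    for line in diff_text.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line_number = int(header.group(1))
        elif line.startswith("+++") or line.startswith("---"):
            continue  # file headers, not content
        elif line.startswith("+"):
            yield new_line_number, line[1:]
            new_line_number += 1
        elif line.startswith(" "):
            new_line_number += 1  # context line also exists in the new file
        # lines starting with "-" were removed and do not advance the counter


def alerts_for_branch(branch, diff_text):
    """Return one alert string per added line that looks like a credential."""
    return [
        f"{branch}: possible credential added at new line {number}"
        for number, text in added_lines(diff_text)
        if SECRET_ASSIGNMENT.search(text)
    ]


diff_text = """\
--- a/ci/deploy.sh
+++ b/ci/deploy.sh
@@ -1,3 +1,4 @@
 #!/bin/sh
+API_TOKEN="abcd1234efgh5678"
 ./run_tests.sh
 ./deploy.sh
"""

for alert in alerts_for_branch("feature/login-tests", diff_text):
    print(alert)
```

Scanning only added lines keeps alerts tied to the change that introduced the problem, which makes it easier to route them to the right developer or team leader.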

Solving the Security Conundrum

1. Code Scanning

Indeed, a critical security feature also needed in Git repos today is the capability to intelligently scan all source code and identify access keys and passwords added by developers. An estimated 75% of security breaches are enabled when developers code secret access keys and passwords into source code. What’s needed is an automatic system that detects key and password entry in scripts and alerts team leaders for review.

A method of bolted-on, afterthought security that developers currently use is to separate login credentials into an “Include” file in order to remove passwords for OAuth and other secret keys and tokens from login scripts. But this method depends on the habits of individual developers and defeats the concept of collaborative accountability. We have already shown that devops practices that encourage speed also inspire coders to shortcut security. A platform that implements automated vulnerability scanning and analysis is the next generation of standard developer security tools to be included in improved repository and version control platforms. Automated vulnerability scanning democratizes code security across an entire devops or agile team.
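For reference, keeping credentials out of the login script might look like the Python sketch below. It uses environment variables rather than an include file, which is a swapped-in variation on the approach described above; `TEST_USER` and `TEST_PASSWORD` are hypothetical variable names. As the paragraph notes, nothing here stops a hurried developer from pasting the values back into the script, which is why it does not replace automated scanning.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not set in the environment."""


def load_test_credentials(environ=None):
    """Read the virtual user's login from environment variables.

    TEST_USER and TEST_PASSWORD are hypothetical names chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    required = ("TEST_USER", "TEST_PASSWORD")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise MissingCredentialError("missing: " + ", ".join(missing))
    return environ["TEST_USER"], environ["TEST_PASSWORD"]


# In CI the values would be injected by the build server, not typed here;
# a plain dict stands in for the environment so the example is repeatable.
user, password = load_test_credentials(
    {"TEST_USER": "qa-bot", "TEST_PASSWORD": "example-only"}
)
print(user)
```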

A security-based distributed app development platform must implement policy enforcement automatically. In other words, in the same way that Jenkins detects a code change, a source code security system should detect the entry of sensitive credentials or private data. Such a system will critically control and audit who is authorized to access and update source code. The system will provide protected branches and user reports to team leaders for urgent response. The system should have keyword traps and work like an antivirus program to catch entries of credentials before submission and prior to any commit that triggers a build.

2. Compliance

Parallel with safe developer practices is the adoption of enterprise-level standards. The certification of standards compliance is essential to the success of web app development today. A cloud-based distributed software development platform that provides repository and versioning should comply with a Service Organization Control (SOC) 2 audit. Such compliance amounts to additional assurance of robust source code security. Ideally, the security protocols of a security-based developer platform with version control should meet comprehensive audit and certification standards to verify compliance as an important layer of quality assurance and customer privacy protection.

The Privacy Shield Framework was designed by the U.S. Department of Commerce and the European Commission to offer companies a way in which to comply with data protection requirements when transferring personal data from the European Union to the United States in support of transatlantic commerce.

Compliance with SOC 2 audits assures customers and partners of the strength of a company’s information security measures and maintains today’s specific cloud security requirements. Source code security and data privacy are integral to SOC 2 audit compliance.

Corequisite with SOC 2 certification are several additional compliance standards to guarantee privacy, and all should be considered essential to verification of source code security and cloud data privacy overall. PCI Level 3 and Privacy Shield for privacy practices are most essential in cloud-based e-commerce payment and data security networks.

The Cloud Security Alliance STAR self-assessment is likewise an assurance to all affiliates and customers that a web-based enterprise maintains the highest standards of privacy and source code security.

Data localization is also important in many jurisdictions in that it keeps data nearest to the organization for performance and control. Due to the rise of compliance regulations, specifically GDPR, the EU cloud will be especially beneficial to international customers concerned with data privacy, data protection, and the rise of preferences for data localization.

3. Securing Open Source Packages with Package Management

Packages are collections of code and scripts that programmatically include each other in builds for specific apps. Ultimately this is how a modern web app is constructed; it is an assembly of packages. Packages are often downloaded by customers and integrated into new software products. Such code packages often contain security issues. An enterprise cannot manually scan all the code in every package its developers use, so a significant benefit must ensue from an automatic vulnerability scanning and auditing system. Such a scanner would stand as a sentinel to protect the enterprise from a particular variety of creative but dangerous innovation: the scripting of authentication credentials.
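A minimal sketch of automated package auditing, assuming an advisory list that maps each package to the first version in which a known issue was fixed, could look like this in Python. The package names and versions are invented for the example, and the version parser handles only purely numeric dotted versions; a real scanner would draw on a maintained vulnerability feed and a proper version-comparison library.

```python
# Hypothetical advisory data: package name -> first fixed version.
# A real scanner would pull this from a maintained vulnerability feed.
ADVISORIES = {
    "example-web-framework": "2.5.13",
    "example-json-parser": "1.4.0",
}


def parse_version(version):
    """Turn '2.3.31' into (2, 3, 31); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def vulnerable_packages(installed):
    """Return names of installed packages older than the first fixed version."""
    flagged = []
    for name, version in sorted(installed.items()):
        fixed_in = ADVISORIES.get(name)
        if fixed_in and parse_version(version) < parse_version(fixed_in):
            flagged.append(name)
    return flagged


installed = {
    "example-web-framework": "2.3.31",
    "example-json-parser": "1.4.2",
    "example-logger": "0.9.0",
}
print(vulnerable_packages(installed))  # ['example-web-framework']
```

Running a check like this on every build is what turns a published advisory into an actionable alert instead of a forgotten forum post.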

MyGet Security
With MyGet’s vulnerability scanner, packages from all major languages can be scanned for security flaws.

For example, the list of common coding languages in use today is substantial and includes Java, .NET Framework languages like C#, client-side languages including JavaScript, and the bewildering array of JS libraries such as React, Vue, AngularJS, and jQuery. Server-side languages and runtimes include Node.js, Ruby, Python, Perl, and of course PHP.

Security-based developer versioning and repo platforms must contain the capability to identify and provide alerts for security vulnerabilities in all of these languages. For each one, there is also a myriad of competing developer IDEs and studios. These now evolve as low-code platforms and automatic programming clients, which will inevitably open the field to additional complexity. Code analysis and static scanning is the first line of defense.

New Frontier of Security-Focused Developer Tools

Developers today are still scripting Selenium to automate authentication for testing a new version of an app on Docker containers and VMs with virtual users. In the simplest case, a password is scripted right into an unencrypted file for a new test and release build. This constitutes a frequent and important security vulnerability.

We have just learned that an unsecured Docker image registry exposed the entire source code of Aeroflot’s core web application. As we have seen, such exclusively developer-inhabited realms as containerization now vehemently demand scrutiny to avoid catastrophic data loss.

Developers under pressure may forget: there might be 40 scripts running in a batch contributing to a single build. The batch may upload all at once to a repo triggered by Jenkins, for example. Anyone with access to that repo can get the scripts containing authentication credentials! This was once a developer-only zone.

Developers share posts to get an idea on how to do something in an unfamiliar case, very often looking for shortcuts. One post illustrates an easy way to script a login for a test case with Selenium. Another post admonishes developers for doing just that! It’s a free-for-all zone, one riddled with risk. It’s a frontier nearly without security.

The wild new frontier of automated pipelines for continuous integration and delivery evolves rapidly, with emphasis on speed and innovation. In such a creative atmosphere, even more emphasis is needed on security in version management and repositories. Security-focused developer platforms now exist that implement all the versioning and repo tools but with this crucially needed security. Features such as automated vulnerability source code scanning must now become standard fare in enterprise app development.

About the Author

Michael Hicklen is a DevOps leader with over 10 years of experience at companies like Amazon Web Services and Rackspace. He is the conductor of day-to-day technical operations at Assembla. From monitoring infrastructure to improving current internal processes, Michael is deeply immersed in ensuring the operational uptime of Assembla and its end users.

Michael Hicklen