Issues Magazine

Internet Security Flaws in the Age of Big Data

By Robert Merkel

Lecturer in Software Engineering, Monash University

The past year has seen some spectacular internet security breaches due to poor oversight of programming code, raising valid questions about the security of our online identities and transactions.

In 2014 we’ve learned one important thing about IT security. If you want to get a security issue into the mainstream media, give the issue a catchy name. “CVE-2014-0160” and “CVE-2014-6271” might have gone unknown outside the world of exasperated IT system administrators, but as “Heartbleed” and “Shellshock” they received worldwide attention, some of it more than a little sensationalist.

Despite some unjustified hype, these two incidents are yet further illustrations of how vulnerable our IT systems are to unauthorised access by attackers varying in capability and motivation, from teenage thrillseekers to superpower intelligence agencies.

An Old Mistake in New Code: How Did It Slip Through?

The first of these incidents, the Heartbleed vulnerability, left me dumbfounded as an IT educator. A classic mistake, in a shoddily written piece of code, made it into one of the most important pieces of security software in the world.

Since the earliest days of computers, those tasked to write software for them have made use of the efforts of others. CSIRAC, Australia’s first computer and the world’s fifth, was designed to support “library routines” – small snippets of software that perform subtasks required in many applications, such as calculating the sum of a list of numbers.
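To make the idea concrete, here is what such a routine might look like today: a few lines of C (standing in, anachronistically, for CSIRAC's machine code) that sum a list of numbers, written once and then callable from any program that needs them.

```c
#include <stddef.h>

/* A reusable "library routine": sum a list of numbers.
   Written once, then called from any number of applications. */
double sum(const double *values, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```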

Over the years, the scope of facilities in libraries and other ways of sharing software functionality have grown enormously. The teenage genius who sells a mobile app for millions, and the multinational conglomerate that reduces its inventory cost with a smarter logistics management system, rely on a distillation of the accumulated efforts of generations of programmers before them. Underneath a website, a desktop application or a game there are tens of millions of lines of accumulated code from a variety of sources, some of it decades old.

OpenSSL is one of these libraries, and it serves a particularly important and sensitive function. Its job is to take information that is to be sent over the internet, scramble it up in such a way that only the intended recipient can read it, and unscramble it again at the other end. It also performs the job of authentication, establishing that a particular website is indeed operated by your bank rather than an imposter website run by a scam artist. Many of the world’s best-known websites, millions of Android smartphones and many other systems use OpenSSL for this job. Millions of credit card transactions between consumers and multinational companies are secured with OpenSSL every day.
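For the programmer, using the library for both jobs is meant to be simple. Below is a minimal sketch of an OpenSSL client connection, using function names from the OpenSSL 1.1 interface (older versions spell some of these differently), with error handling pared back and a placeholder hostname:

```c
#include <openssl/ssl.h>
#include <openssl/bio.h>

int fetch_over_tls(void)
{
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());

    /* Authentication: insist the server present a certificate that
       chains to a trusted authority, so an imposter without a valid
       certificate is rejected during the handshake. */
    SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);
    SSL_CTX_set_default_verify_paths(ctx);

    BIO *bio = BIO_new_ssl_connect(ctx);
    BIO_set_conn_hostname(bio, "www.example.com:443");
    if (BIO_do_connect(bio) <= 0)
        return -1;  /* connection, handshake or verification failed */

    /* Encryption: everything written here is scrambled on the wire
       and unscrambled by the server at the other end. */
    BIO_puts(bio, "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n");

    char buf[1024];
    int n = BIO_read(bio, buf, sizeof buf);

    BIO_free_all(bio);
    SSL_CTX_free(ctx);
    return n;
}
```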

Despite its ubiquity and the importance of its job, remarkably little money was spent on the development of OpenSSL. Its development was led by volunteer contributors around the world and coordinated over internet mailing lists, with one full-time contributor supported by meagre donations.

One of these volunteers, Robin Seggelmann, spent part of the 2011 holiday season adding a new feature to the library – in essence, a way to keep a secure connection going while no data is being transmitted. Unfortunately, in the process he made a classic rookie error: his code failed to check that reads from memory stayed within the appropriate boundaries, allowing an attacker to read parts of the computer’s memory that they should never have been able to see. We teach people about this type of mistake in introductory programming courses, and it has been notorious as a source of security problems since the “Morris Worm” wreaked havoc on the early internet in 1988. But not only did Seggelmann make the mistake; the code was also reviewed by the overworked, underpaid project leader, approved, incorporated into the production version of OpenSSL, and distributed to millions of computers.
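Published analyses of Heartbleed describe exactly this pattern: the code trusted a length field supplied by the remote peer. The sketch below shows the shape of the mistake, simplified and with illustrative names – it is not OpenSSL’s actual code:

```c
#include <string.h>

/* Simplified sketch of the Heartbleed-style mistake. The peer sends
   a message containing a payload plus a field *claiming* how long
   that payload is. */
void handle_heartbeat(const unsigned char *msg, size_t msg_len,
                      unsigned char *reply)
{
    /* the first two bytes are the peer's claimed payload length */
    size_t claimed = (msg[0] << 8) | msg[1];

    /* BUG: echoes back 'claimed' bytes without checking it against
       msg_len, the number of bytes that actually arrived. A peer
       claiming 64 kB while sending almost nothing gets up to 64 kB
       of whatever lies in adjacent server memory echoed back. */
    memcpy(reply, msg + 2, claimed);

    /* The fix is the bounds check taught in introductory courses:
       if (claimed > msg_len - 2) return;   -- reject the request */
}
```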

Luckily, it does not seem that the flaw – which, among other things, potentially allowed all the scrambled internet traffic from a system running the faulty version of OpenSSL to be read – was widely exploited by criminals or other attackers. Indeed, the United States’ primary electronic espionage agency, the National Security Agency, took the unusual step of denying that it had discovered the issue before it came to public attention.

But once it was discovered, system administrators the world over had to install updated versions of OpenSSL on millions of computers, among other actions. While no detailed costings were done, it is likely that the costs of securing systems ran into the hundreds of millions of dollars.

Shellshock: A Relic of the Past Leaves Systems Vulnerable Again

The second well-publicised security issue of 2014 involves another low-profile but very widely used tool. Bash, the Bourne-Again Shell, dates back to 1989, and has ancestors going all the way back to the late 1960s. Its original purpose was something akin to the Start Menu on a Windows PC – a way for users to start other software on the system. This was done by typing commands, usually the name of the next program to be started, at a “command line”.

Over time, however, it began to be used for another purpose – as an intermediary when one program starts another. Figuring out exactly which program to run, and passing configuration information across, is a complex process. Using Bash to assist can simplify things substantially.

Unfortunately, a flaw dating from the very first versions of Bash meant that a specially constructed configuration string could be interpreted as a program or programs to execute, rather than as configuration information. That’s not a big deal when Bash is used in its “Start Menu” role, as any user who could type in the configuration message could just issue the commands directly at the command line.

The problem arises when a web server or other network service takes information from an anonymous user (potentially, an attacker) on the internet, and passes that information to Bash as a configuration message in its intermediary role. The configuration message is then interpreted as commands that are executed on the web server. In essence, an attacker could, by sending the right request to a web server with this vulnerability, cause the web server to do anything the attacker wants it to do.
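In slightly more concrete terms, the configuration information in question arrives in environment variables, and vulnerable versions of Bash treated any variable whose value began like a function definition, “() { ... }”, as code, executing whatever trailing commands followed the closing brace. The sketch below imitates what a vulnerable CGI-style web server effectively did; the variable name and payload are illustrative, and the name itself didn’t matter to the bug:

```c
#include <stdlib.h>

int main(void)
{
    /* A CGI-style web server copies attacker-supplied request data,
       such as the User-Agent header of an HTTP request, into an
       environment variable... */
    setenv("HTTP_USER_AGENT",
           "() { :; }; echo attacker commands run here", 1);

    /* ...and then invokes Bash as an intermediary. A vulnerable Bash
       parses the variable as a function definition and executes the
       trailing command before doing anything else; a patched Bash
       treats the value as ordinary, inert data. */
    return system("/bin/bash -c 'echo handling the request normally'");
}
```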

Again, once the problem was identified, a fix for Bash had to be developed and deployed on millions of computers, including every Apple Mac running OS X. There is no direct evidence that the vulnerability was exploited by criminals or other attackers prior to the reporting of the issue in September 2014. It’s also worth noting that the majority of high-profile websites had long since ceased using Bash as an intermediary program in creating web pages, reducing the risk that the issue could be exploited by attackers.

However, for 25 years the majority of devices connected to the internet have had a serious, undetected security flaw.

Private Information Regularly Stolen, but Who Bears the Costs?

While Heartbleed and Shellshock have not yet been blamed for a large-scale data theft, such thefts are a regular occurrence. While nude photos of a small number of famous actors such as Jennifer Lawrence, stolen from Apple’s iCloud file storage service, sent the prurient into a frenzy, personal data thefts have occurred on a vastly larger scale. In the United States, the personal details (though thankfully not the actual medical records) of 4.5 million customers of a health care network were stolen in April 2014. The credit card details of 56 million customers of the retailer Home Depot were stolen in September.

Such large-scale thefts represent one high-profile part of “the hacker market”, which a RAND Corporation report says was “once a varied landscape of discrete, ad hoc networks of individuals initially motivated by little more than ego and notoriety” but “has emerged as a playground of financially driven, highly organized, and sophisticated groups”. In other words, there is money in stealing your personal information, and there is a global marketplace dedicated to facilitating and exploiting that theft.

The costs to Australians of identity fraud – of which this kind of systematic, industrial computer crime makes up a substantial part – are significant. An Australian Institute of Criminology study indicates that roughly 5% of Australians suffer financial loss due to identity fraud each year.

Like any crime, blame ultimately lies with those who attack systems for money. But given the profits to be made and the global nature of the problem, it is naive to imagine that moral suasion or international law enforcement will substantially reduce the threat to data stored on the computers of companies large and small. So why aren’t companies more effectively protecting their customers’ data from hackers?

Part of the reason is that it’s damnably difficult. Both Heartbleed and Shellshock involved flaws in very widely used software whose source code is available to anyone, and which was known to be particularly consequential should a security flaw be found – yet neither problem was caught for years. In the case of Bash, the problem dated back at least 20 years, and nobody spotted it. It’s no wonder that security problems creep into the myriad lower-profile pieces of software infrastructure, and the custom applications built by and for businesses, which don’t benefit from even that level of scrutiny.

Securing systems may be hard, but it’s also worth observing that the costs of large-scale data breaches largely accrue to parties other than the businesses whose systems are attacked. Home Depot estimates that its data breach will cost it US$62 million, which sounds like a lot of money. But for a company whose annual revenue is US$79 billion, it’s small change: less than one-tenth of 1% of a single year’s takings.

Most of the cost of the data loss is likely to be borne by banks, as fraudulent credit card transaction costs usually are, and by the consumers whose personal information was stolen. Even the bank losses ultimately diffuse to consumers as higher transaction fees on credit cards. It’s a classic instance of moral hazard, defined by Wikipedia as occurring when “a party is more likely to take risks because the costs that could result will not be borne by the party taking the risk”.

What about Governments?

The same techniques are deployed with even greater skill and resources by the intelligence agencies of governments, for information gathering rather than profit. The issues surrounding mass government collection and analysis of digital information, as illustrated by Edward Snowden’s leaks, are beyond the scope of this essay. But it’s worth noting that the chances of keeping governments out of private-sector data in which they are sufficiently interested are minimal, and government interests – notably those of the US government – have become extraordinarily wide.

The Problem in Perspective

IT systems are insecure. Few organisations are prepared to wear the direct and indirect financial costs of protecting their systems properly, and criminals exploit this for substantial profit. But as a society we choose to tolerate the losses. Why? Partly because they’re largely not worn by those in the best position to reduce them – the companies holding the data they fail to keep secure.

One relatively simple way that companies could improve their data security is to collect and retain less of it. But that seems unlikely, with the latest buzzword in the IT industry being “big data”. Companies and government departments are rushing to employ sophisticated analysis techniques to wring what they hope are valuable insights from the enormous amounts of data they now keep on their clients. In this environment it seems that more, rather than less, data will be kept by companies, and it is a safe bet that hackers will find ways to steal and exploit those ever-expanding databases.