What is buffer overflow?
A buffer overflow is an anomaly that occurs when software writing data to a buffer exceeds the buffer’s capacity, causing adjacent memory locations to be overwritten. In other words, too much information is passed into a container that does not have enough space, and the excess ends up replacing data in adjacent containers.
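The following C sketch shows, on a small scale, how an oversized write overwrites adjacent memory. The `struct record` type and the `unchecked_copy` function are hypothetical names used only for illustration. To keep the sketch within defined C behavior, the copy is capped at the size of the whole struct. Excess bytes therefore spill only into the field that sits next to the buffer. A real overflow has no such cap and can reach much further.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a small buffer with another field right after it. */
struct record {
    char name[8];   /* the buffer: room for 8 bytes */
    int  is_admin;  /* adjacent data that should never be changed by a name copy */
};

/* Copies `len` bytes of `input` starting at the first byte of r->name
 * WITHOUT checking len against sizeof r->name. The copy is only capped at
 * the size of the whole struct (a sketch-only guard that keeps this example
 * within defined behavior), so bytes beyond name[] land in is_admin. */
void unchecked_copy(struct record *r, const unsigned char *input, size_t len)
{
    assert(r != NULL && input != NULL);
    if (len > sizeof *r) {
        len = sizeof *r;
    }
    memcpy(r, input, len);
}
```

Copying exactly `sizeof r.name` bytes leaves `is_admin` untouched. Copying a few bytes more also rewrites `is_admin`, even though the caller only meant to set a name.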
Buffer overflows can be exploited by attackers with the goal of modifying a computer’s memory in order to undermine or take control of program execution.
What’s a buffer?
A buffer, or data buffer, is an area of physical memory used to temporarily store data while it is being moved from one place to another. These buffers typically live in RAM. Computers frequently use buffers to improve performance. Most modern hard drives use buffering to access data efficiently, and many online services also rely on buffers. For example, buffers are frequently used in online video streaming to prevent interruptions. When a video is streamed, the video player downloads and stores perhaps 20% of the video at a time in a buffer and then plays from that buffer. This way, minor drops in connection speed or brief service disruptions won’t affect playback.
Buffers are designed to hold a specific amount of data. Unless the program using the buffer has built-in checks that discard or reject data when too much is sent to the buffer, the excess will be written into memory adjacent to the buffer.
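The built-in instructions to discard data can be as simple as comparing the incoming length with the space that is left. Below is a minimal C sketch. The `struct byte_buffer` type, its 16-byte capacity and the `buffer_append` function are hypothetical and were chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity byte buffer. */
struct byte_buffer {
    unsigned char data[16];
    size_t        used;  /* bytes currently stored */
};

/* Appends up to `len` bytes from `src`. Whatever does not fit is discarded
 * instead of being written past the end of data[].
 * Returns the number of bytes actually stored. */
size_t buffer_append(struct byte_buffer *buf, const unsigned char *src, size_t len)
{
    assert(buf != NULL && buf->used <= sizeof buf->data);
    size_t room = sizeof buf->data - buf->used;
    size_t n = len < room ? len : room;  /* never more than the free space */
    if (n > 0) {
        memcpy(buf->data + buf->used, src, n);
    }
    buf->used += n;
    return n;
}
```

Discarding is only one policy. A program could instead reject the whole write or grow the buffer. What matters is that the length is checked before any bytes are copied.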
Buffer overflows can be exploited by attackers to corrupt software. Despite being well understood, buffer overflow attacks remain a major security problem for cybersecurity teams. In 2014, a vulnerability known as Heartbleed exposed hundreds of millions of users to attack. Strictly speaking, it was a closely related bug, a buffer over-read (reading past the end of a buffer rather than writing past it), in the OpenSSL library’s TLS heartbeat code.
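Heartbleed was, strictly speaking, a buffer over-read. The vulnerable code trusted a length field sent by the peer and copied that many bytes back, even when the request actually carried far fewer. The reply therefore leaked whatever memory sat next to the payload. The C sketch below is a simplified, hypothetical heartbeat handler and is not OpenSSL’s code. `struct heartbeat_request` and `build_heartbeat_reply` are illustrative names. The sketch shows the missing check: before anything is copied, the claimed length is compared with the number of bytes that actually arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical heartbeat request: the peer states how long its payload is. */
struct heartbeat_request {
    size_t               claimed_len; /* length field supplied by the peer */
    const unsigned char *payload;     /* bytes that actually arrived */
    size_t               actual_len;  /* how many bytes actually arrived */
};

/* Copies the payload into `reply` so it can be echoed back.
 * The first test below is the step a Heartbleed-style bug omits: without it,
 * memcpy would read past the end of `payload` and echo adjacent memory.
 * Returns the number of bytes copied (0 if the request is rejected). */
size_t build_heartbeat_reply(const struct heartbeat_request *req,
                             unsigned char *reply, size_t reply_cap)
{
    assert(req != NULL && reply != NULL);
    if (req->claimed_len > req->actual_len || req->claimed_len > reply_cap) {
        return 0; /* malformed or oversized request: drop it */
    }
    memcpy(reply, req->payload, req->claimed_len);
    return req->claimed_len;
}
```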
How do attackers exploit buffer overflows?
An attacker can deliberately feed a carefully crafted input into a program, causing the program to try to store that input in a buffer that isn’t large enough and overwriting portions of memory adjacent to the buffer. If the program’s memory layout is known, the attacker can deliberately overwrite areas known to contain executable code. The attacker can then replace this code with their own executable code, which can drastically change how the program works.
For example, if the overwritten memory contains a pointer (a value that holds the address of another place in memory), the attacker could replace it with a pointer to an exploit payload. This can transfer control of the whole program to the attacker’s code.
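The C simulation below makes the pointer example concrete. `struct handler`, `normal_action`, `payload_action`, `craft_input`, `set_name_unchecked` and `run_handler` are all hypothetical names. `payload_action` is an ordinary function that stands in for attacker code. The overflow is simulated by copying into the whole struct, which keeps the sketch itself within defined behavior. It still shows how bytes that spill past `name` can replace the function pointer stored next to it. Real exploits must also contend with mitigations such as those described below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int normal_action(void)  { return 1; } /* what the program intends to run */
static int payload_action(void) { return 2; } /* stands in for an attacker's payload */

/* Hypothetical object: a 16-byte buffer followed by a function pointer. */
struct handler {
    char name[16];
    int (*callback)(void);
};

/* Builds the kind of oversized input an attacker would craft: filler for
 * name[] (and any padding), then the bytes of a pointer to payload_action
 * placed exactly where `callback` lives. Returns the input length. */
size_t craft_input(unsigned char *out, size_t cap)
{
    size_t off = offsetof(struct handler, callback);
    int (*target)(void) = payload_action;
    assert(out != NULL && cap >= off + sizeof target);
    memset(out, 'A', off);                     /* fills name[] and padding */
    memcpy(out + off, &target, sizeof target); /* lands on callback */
    return off + sizeof target;
}

/* Simulated unchecked copy into `name`: len is never compared with
 * sizeof h->name, only capped at the struct size so the sketch stays defined. */
void set_name_unchecked(struct handler *h, const unsigned char *input, size_t len)
{
    assert(h != NULL && input != NULL && len <= sizeof *h);
    memcpy(h, input, len);
}

/* Calls whatever function the handler currently points to. */
int run_handler(const struct handler *h)
{
    return h->callback();
}
```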
Who is vulnerable to buffer overflow attacks?
Certain programming languages are more susceptible to buffer overflows than others. C and C++ are two popular languages that are highly vulnerable, because they have no built-in protection against accessing or overwriting data past the end of a buffer: array and pointer accesses are not checked against the buffer’s bounds. Windows, macOS, and Linux all contain code written in one or both of these languages.
More modern languages like Java, Perl, and C# have built-in features that help reduce the chances of buffer overflow, but they cannot prevent it altogether.
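The difference is easy to see with array indexing. Java checks every array index at run time and throws an `ArrayIndexOutOfBoundsException` when the index is out of range. C performs no such check, so the programmer has to write it by hand. Here is a minimal sketch in which `checked_store` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores `value` at array[index] only if index is in
 * bounds. This mirrors the check Java performs automatically before every
 * array store (Java would throw an exception instead of returning false). */
bool checked_store(int *array, size_t length, size_t index, int value)
{
    assert(array != NULL || length == 0);
    if (index >= length) {
        return false; /* out of bounds: refuse instead of overwriting memory */
    }
    array[index] = value;
    return true;
}
```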
How to protect against buffer overflow attacks
Luckily, modern operating systems have runtime protections that help mitigate buffer overflow attacks. Let’s explore two common protections that help reduce the risk of exploitation:
- Address space layout randomization (ASLR) - Randomly rearranges the locations of key data areas in a process’s address space. Buffer overflow attacks generally rely on knowing the exact location of important executable code, and randomizing the address space makes that location much harder to predict.
- Data execution prevention - Marks areas of memory as either executable or non-executable, preventing an exploit from running code placed in a non-executable area.
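One way to watch address space randomization at work is to print where a program’s stack, heap and static data land. The C sketch below uses hypothetical helpers (`struct layout`, `capture_layout`, `print_layout`), not a standard API. Assume ASLR is enabled and the program is built as a position-independent executable, which is the default on many current Linux distributions. Under those assumptions, the printed addresses will typically differ from one run to the next. With randomization disabled, they generally stay the same.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int global_counter; /* static storage: lives in the data/BSS segment */

/* Hypothetical snapshot of where a few kinds of memory ended up. */
struct layout {
    uintptr_t stack;  /* address of a local variable */
    uintptr_t heap;   /* address returned by malloc */
    uintptr_t global; /* address of a static variable */
};

/* Records this run's addresses as integers (only the numbers are kept, so
 * nothing dangles after return). Returns 0 on success, -1 if malloc fails. */
int capture_layout(struct layout *out)
{
    assert(out != NULL);
    int on_stack = 0;
    void *on_heap = malloc(16);
    if (on_heap == NULL) {
        return -1;
    }
    out->stack  = (uintptr_t)(void *)&on_stack;
    out->heap   = (uintptr_t)on_heap;
    out->global = (uintptr_t)(void *)&global_counter;
    free(on_heap);
    return 0;
}

void print_layout(const struct layout *l)
{
    printf("stack:  0x%jx\n", (uintmax_t)l->stack);
    printf("heap:   0x%jx\n", (uintmax_t)l->heap);
    printf("global: 0x%jx\n", (uintmax_t)l->global);
}
```

Running the program a few times and comparing the output is the whole experiment. Data execution prevention is not shown here, because observing it requires deliberately trying to execute data.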
Software developers can also take precautions against buffer overflow vulnerabilities. They can write in languages that have built-in protections, or use secure coding practices such as checking the length of every input against the size of the buffer it is copied into.
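A common precaution in C is to replace unbounded copies such as `strcpy` with code that knows the destination’s capacity. The helper below is a minimal sketch, and `copy_string_bounded` is a hypothetical name rather than a standard function. It truncates instead of overflowing and reports whether truncation happened, so the caller can reject the input if a shortened value would be unsafe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies the C string `src` into `dst`, whose capacity is `cap` bytes
 * including the terminating NUL. Unlike strcpy, it never writes more than
 * `cap` bytes. Returns true if the whole string fit, false if truncated. */
bool copy_string_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t len = strlen(src);
    size_t n = len < cap ? len : cap - 1; /* leave room for the NUL */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}
```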
Despite precautions, new buffer overflow vulnerabilities continue to be discovered by developers, sometimes in the wake of a successful exploitation. When new vulnerabilities are discovered, engineers need to patch the affected software and ensure that users of the software get access to the patch.
What are the different types of buffer overflow attacks?
There are a number of different buffer overflow attacks which employ different strategies and target different pieces of code. Below are a few of the most well-known.
- Stack overflow attack - This is the most common type of buffer overflow attack and involves overflowing a buffer on the call stack*.
- Heap overflow attack - This type of attack targets data in the open memory pool known as the heap*.
- Integer overflow attack - In an integer overflow, an arithmetic operation produces an integer (whole number) that is too large for the integer type meant to store it, so the value wraps around or is truncated. If that value is then used as a buffer size or index, it can result in a buffer overflow.
- Unicode overflow - A Unicode overflow creates a buffer overflow by inserting Unicode characters into an input that expects ASCII characters. (ASCII and Unicode are encoding standards that let computers represent text. For example, the letter ‘a’ is represented by the number 97 in ASCII. While ASCII only covers characters from Western languages, Unicode can represent characters from almost every written language on earth. Because Unicode has so many more characters, many of them need more than one byte to encode, whereas every ASCII character fits in a single byte. A buffer sized for one byte per character can therefore overflow.)
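The integer overflow case is easiest to see in a size calculation. In this C sketch, `mul_size_checked` and `alloc_array` are hypothetical helpers. With unsigned arithmetic, `count * elem_size` silently wraps modulo SIZE_MAX + 1. A naive `malloc(count * elem_size)` could therefore return a tiny block that later writes of `count` elements overrun. Checking the multiplication before allocating avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stores a * b in *out, returning false instead of wrapping around when the
 * true product does not fit in size_t. */
bool mul_size_checked(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (b != 0 && a > SIZE_MAX / b) {
        return false;
    }
    *out = a * b;
    return true;
}

/* Allocates room for `count` elements of `elem_size` bytes, or returns NULL
 * if the total size would overflow (rather than allocating a too-small block). */
void *alloc_array(size_t count, size_t elem_size)
{
    size_t total;
    if (!mul_size_checked(count, elem_size, &total)) {
        return NULL;
    }
    return malloc(total);
}
```

For example, with `count` equal to SIZE_MAX / 2 + 2 and an element size of 2, the naive product wraps around to just 2 bytes. `alloc_array` refuses the request instead.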
*Computers rely on two different memory allocation models, known as the stack and the heap; both live in the computer’s RAM. The stack is neatly organized and holds data in a Last-In, First-Out model: whatever piece of data was most recently placed on the stack will be the first to come out, much like the last bullet inserted into an ammunition magazine is the first to be fired. The heap is a disorganized pool of extra memory; data does not enter or leave the heap in any particular order. Since allocating memory on the stack is much faster than allocating it on the heap, the heap is generally reserved for larger pieces of data or data that a programmer wants to manage explicitly.
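The call stack is managed by the compiler and the CPU rather than by ordinary program code. Its Last-In, First-Out discipline can still be illustrated with a small fixed-capacity stack in C. `struct int_stack`, `stack_push` and `stack_pop` are hypothetical names, and the capacity of 4 is arbitrary. Note that `stack_push` refuses to write once the array is full. That is the same kind of bounds check whose absence causes buffer overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 4 /* arbitrary size chosen for illustration */

struct int_stack {
    int    items[STACK_CAPACITY];
    size_t top; /* number of items currently stored */
};

/* Pushes `value`; returns false when the stack is full instead of writing
 * past the end of items[]. */
bool stack_push(struct int_stack *s, int value)
{
    assert(s != NULL);
    if (s->top == STACK_CAPACITY) {
        return false;
    }
    s->items[s->top++] = value;
    return true;
}

/* Pops the most recently pushed value into *out (last in, first out).
 * Returns false if the stack is empty. */
bool stack_pop(struct int_stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->top == 0) {
        return false;
    }
    *out = s->items[--s->top];
    return true;
}
```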