SQL injections are a subtype of the larger category of command reparse vulnerabilities. These attacks work because there is an intermediate language between different components of the system, more specifically between the frontend (which is tipically a webserver giving access to the whole world) and the backend (which is hidden behind a firewall / NAT and supposed to be protected by the access checks in the frontend).
To better understand the matter, take the following scenario: you have a website which allows listing of certain directories on the computer. Lets suppose that it works the following way:
- User goes to a webpage
- User enters in a form the directory s/he would like to view the contents of
- The server side script takes the directory, executes the ls (or dir, if it's a windows system) command, takes the output and sends it back to the user
This is usually how we think about the execution flow, however the devil is in the details as they say: the last step actually includes two very important substeps:
- The server side script takes the director and
- Constructs a command line to be executed and passes it to the shell (for example bash or cmd.exe)
- The command interpreter parses the command and executes it
- The output is sent back to the user
The types of vulnerabilities are created because of a mismatch between how the commands / queries are constructed and how they are parsed. For example let's say that the command is created by appending the directory to the string
ls (so if the directory is
~/, the resulting command would be
ls ~/). Now if somebody would to type in
~/ && cat /etc/passwd", the resulting command would be
ls ~/ && cat /etc/passwd which would print out not only the directory listing but also password data about the users. The fundamental problem here is that at the point of the construction we are not aware that some characters / character combinations (like the && sign) have special meaning for the command interpreter (in this case it means: execute the first command and then execute the second command).
How can these types of attacks be mitigated generally (later on we will cover the special case of SQL injections)?
- First of all try to avoid going through the additional encoding / parsing step if possible. This means that before calling out to an other system, check if you can't do the actions with built-in functions (all server-side scripting languages have for example functions to obtain directory listings). Using built-in functions you get both a performance gain (because creating a new process takes time as does the construction of the command line / query string and the parsing of it) and security gain (because you can use a much clearer - from the computers point of view - language when communicating with built-in functions: parameter lists, where it is known in advance whichg parameter represents what and there is no possibility of confusing parameters and commands. Additionally there is very little chance of executing one method and ending up doing a different action - like in the example where we wanted to get a directory listing and ended up typing out a file - unless there is a serious vulnerability in the scripting language and there aren't many of those)
- Second of all: if you have to call out to other systems, make sure that you
escapethe special characters in your query. Escaping means marking them for the parser at the other end so that the parser knows that it's meant to interpret the given characters literally rather that with their special meaning (for example in the previous case if we would have issues the command as
ls ~/\ \&\&\ cat\ \/etc\/passwd- where we marked with a backslash that the characters space and double ampersand are meant to be understood as literal characters rather than argument separators or command concatenators). Do not use homebrew methods for doing the escaping, because the parsing rules are complicated and chances are that you will miss some exceptions (and you will waste a lot of time writing the code). Use the built in functions of your language and use the specific functions rather than the general ones and use the ones provided by the libraries created to access the specific resource because chances are that they know best the syntax of the language (for example in PHP use escapeshellarg for command line parameters and mysql_escape_string when accessing a MySQL database rather than the generic addslashes. Chances are that there are additional syntax rules for the given type of strings and while they might be similar to some generic group - like the C style strings - the are not identical to it)
- Restrict the privileges of the called command by external means (like running it under a restricted user account). This way even if some things evade your other security measures, they won't be able to do much damage.
- Filter your input and use whitelisting (specifically look for and only allow known good elements rather than trying to eliminate all the bad things - because you have a greater chance of missing something when trying to enumerate all the bad things, given how they are many of them, than when trying to enumerate the good things). If you are using regular expressions, look out for this gotcha.
Now lets look how this all applies to SQL injection: frontends use SQL to communicate with the backend database. SQL is a nice abstraction because rather than having to learn a different set of methods for each database, you have a set of common commands and you have to learn only the differences (of which there are plenty, but they only manifest themselves in complex operations like creating tables and the syntax for basic operations like selecting, inserting, updating and deleting data is pretty much the same across the board). One (very bad) way of creating SQL commands (also called SQL statements) is the one discussed earlier where we treat the command as one big string and insert the parameters in it with concatenation. To give a concrete example, lets say that we have a login form where the user supplies the username and password. A query might look like this:
SELECT user_id FROM users WHERE name='$username' AND password='$password'
In the above examples
$password are placeholders where the strings supplied by the user are inserted. Now lets suppose that the user enters the following strings for username and password:
' OR '4'<'5 and
' OR '6'>'2. The resulting query would look like this:
SELECT user_id FROM users WHERE name='' OR '4'<'5' AND password='' OR '6'>'2'
This would allow the user to log in even when s/he doesn't have a valid user account. Even worse, let's suppose that the user entered the following strings in the textboxes:
'; DELETE * FROM users; --. This would result in the following query:
SELECT user_id FROM users WHERE name=''; DELETE * FROM users; --' AND password=''
Now we just convinced the database to delete all the entries in the users table (-- means in SQL comment, so everything which comes after it is ignored. This is a convenient way to eliminate the rest of the query so that the attacker doesn't have to make sure that the rest of the query is syntactically correct).
Now that you know what SQL injection is, how can we protect against it?
The best solution is to use prepared statements. A prepared statement is an SQL query where we place a marker in the places where user data will be (usually the question mark) and let the parser pre-parse it. In our case this would look like the following:
SELECT user_id FROM users WHERE name=? AND password=?
At the second step we
bind values to those locations, but because the parser already knows that what we are supplying is user data, it won't mistaken it for SQL commands.
If you don't want to rewrite you whole database access, make sure that inputs are properly filtered (if they are supposed to be integers, make sure they are integers and so on - make sure that you check out the tip mentioned above if you are using regular expression for this) and escaped before they are fed to the database. However you should gradually migrate to prepared statements because they are more secure and also they make your code easier to read.
A third very important step would be to connect to the database with a user which has only the privileges which it needs (eg. not the root user). For example it can select, insert, update and delete elements but it can not access other database, create tables, lock tables drop tables, create other users, etc. Even better, create different users for different use cases (for example a normal user may only select data, a registered user may select, insert and update data and an administrative user may select, insert, update and delete data) and use them to connect in the different scenarios. You may also implement logging (very easy to implement if your RDBMS supports triggers) in a table to which none of the users have access to. While later suggestions are complicated and hard to retrofit to existing systems, the first one is a definite must!
Also some database systems (particularly recent versions of MySQL) don't allow multiple SQL statements in one query. This means for example that the above example (the one deleting all the user data) won't work, however you still have serious information disclosure vulnerabilities and all it takes is a legitimate DELETE statement to remove all the data.
Finally a very important thing: don't store more data than you have to. Read about hash functions and learn to use them. The main idea of hashes are that it is very easy (computationally speaking) to calculate the hash of a given text but it is very hard to find the original text given its hash. This means that they are very good in verifying equality of data, without actually storing the data. As an added bonus they have a fixed width for any text, so that you can very precisely plan the space requirements of your database while simultaneously allow for your users to enter arbitrary length data. For example the following method of storing passwords is very secure: instead of storing the password store, store the hash of the following string [a long - meaning 32 characters minimum - random string which is hardcoded in your application] + [username] + [the password supplied by the user] (where
+ stands for string concatenation). Now when you want to authenticate a certain user, you just calculate the hash of the string [random data] + [username] + [supplied password] and check if it's the same as the one stored in the database. The advantages of this method are:
- If someone gets access to your user data, s/he cant directly find out the passwords (revealing the passwords is very risky because most users use the same password at multiple sites and one such compromise might lead to a chain of compromised accounts). This is true even for internal people also - like system administrators or database administrators.
- It eliminitates the possibility of guessing the password using rainbow tables, which are basically huge tables storing entires like [text], [hash of text] for fast lookup (meaning that if you know the hash you can get the text simply by doing a lookup in the database). However such tables grow very big very fast, so the biggest out there is under 16 characters as far as I know. However because you prefixed each string with a 32 character string before hashing them, you can guarantee that even the weakest password is at least 32 characters long, removing it from such tables.
- If the attacker would to generate such tables for your database (assuming that s/he has both the database and the random string), s/he would have to generate a different table for each user because their username is also hashed in.
The downsides are: you can't send out the password to the user if s/he forgets it, however there are alternative solutions, like sending a link to the user from where s/he can reset the password. Of course, these links should time out after a reasonable amount of time (6-12 hours!). Also weak passwords are still vulnerable to bruteforce attacks, so you should encourage your users to chose strong passwords.
Update: you can play with SQL injection attacks (to get a better feel for them) on the examples provided by Foundstone (now owned by McAfee) - click on the SASS tools - or listen to the corresponding episode of the Mighty Seek podcast.