
Timothy
A lot has been written in the last few years about application security. One of the most popular sub-topics in this arena has been a scheme called SQL injection. SQL Injection is a scheme whereby a hacker uses knowledge of database technology combined with flaws in the program design to allow them a level of access to data that is beyond what was intended in the applications design.
A database is a program that manages a repository of information.
A database, more properly a relational database management system, is a process that runs on a computer. It stores data in tables which are comprised of rows and columns, similar to a spreadsheet. Each row is referred to as a “record” and each column is referred to as a “field”.
Modern database management systems provide many data functions for manipulating both the data and the table structure.
A database is a program that manages a repository of information. The database can answer questions and requests for information.
The principal function that most users are involved with is the storing and retrieval of records. In order to retrieve a record or records, a request command, known as a “query”, is sent to the database. This request specifies what table the records is stored in, and what the identifying characteristics of the desired record are. This is similar to asking someone to go to the kitchen, look in the pantry, and bring you the box which has rice in it.

Image credit: jscreationzs
A database is a program that manages a repository of information. The database can answer questions and requests for information. Questions are asked using the Structured Query Language (SQL).
The SQL language was developed in the early 1970s by IBM but was not used commercially until Relational Systems (now Oracle) incorporated it in the initial release of the Oracle database system. A query in the SQL language takes the form of several clauses, most commonly the SELECT clause the FROM clause and the filter, or WHERE clause. Assume a table exists that is called “books” and contains fields named “title” (which stores the title of the book), “publication_date” (which stores the date the book was published), and “author” (which stores the name of the author of the book). In this case a SQL query that was designed to return the title and author of all books written in the year 2003 would look something like this:
SELECT title, author
FROM books
WHERE publication_date = 2003;
There are other clauses which can be included in our query to do things such as sort the data. In many programs and more importantly in the processing that occurs behind many interactive functions on web sites, there is a database that is doing some of the work. A extremely simple example of this would be a web page that allows a user to enter a book title and returns the name of the author and the date of publication. In order to do this the web page must provide the database with the information the user entered. This is done by having the web page generate a SQL statement which, assuming the user entered “Ogden Nash” as the input, would look something like this:
SELECT title, publication_date
FROM books
WHERE author = ‘Ogden Nash’
ORDER BY publication_date;
A database is a program that manages a repository of information. The database can answer questions and requests for information. Questions are asked using the Structured Query Language (SQL). The database will answer any question asked of it.
It’s that last one that’s a problem. Presuming that our user has a knowledge of the SQL language and a heart full of evil intentions, rather than an author’s name, they might enter something like “: SELECT * from users;: –“. The web page will assume that the input is an authors name and place it in the SQL statement making a final SQL statement that looks like this:
SELECT title, publication_date
FROM books
WHERE author = ‘: SELECT * from users;: –‘
ORDER BY publication_date;
What has actually happened here is that the first character in the users input (the colon) is how individual requests in a SQL statement are separated from each other. The SQL statement now contains three queries for the database to answer. The first query:
SELECT title, publication_date
FROM books
WHERE author = ‘:
is an incomplete statement and can be anticipated to cause an error message to be returned from the database. Just for an example let’s say that message is “SQL Syntax Error (#71421)”. The third query:
–‘ ORDER BY publication_date;
will be ignored because the two dashes that begin the statement identify it as a comment that is not intended for execution. It is the second query:
SELECT * from users;
that contains the threat. In this example, assuming the existence of a table named users in the database and contains fields for the name and password of each user of the system the web page that was intending to display a list of books by a certain author would actually display something like:
SQL Syntax Error (#71421)
Joe MyPassword
Elizabeth HerPassword
Dan TopSecret
If there is no table named users, “…well, maybe they called it ‘passwords’ instead”, thinks our hacker and tries again. He will ultimately be able to access any data in the database that the application he is using as a conduit can access.
How can I prevent this stuff?
There are three rules of thumb to make your database more difficult to access through a SQL Injection type of attack.
First, minimize your exposure. The attack described here is deliberately simplified in order to clearly show the problem. The technique of assembling a SQL statement from a combination of static text and user input is called “dynamic SQL”. In many situations this dynamic SQL can be replaced with a database function called a stored procedure. In a stored procedure user input is supplied in the form of parameters that are referenced by the function but are not included in the actual SQL statements that are executed. Thus an attacker will not have a way to submit SQL queries of their own design to the database for processing.
Second, never, ever trust user input. Include routines that examine the input looking for characters (such as colons) that are not expected. Very few authors have colons in their name. Other phrases and punctuation should also be looked at with a grain of salt. Input that is anticipated to be numeric should be checked to verify that it doesn’t contain any text. This will strengthen the application and prevent input errors caused by innocent typos on the part of the user.

Image credit: Salvatore Vuono
Third, don’t be too predictable. Avoid table and field names such as “password” or “SocialSecurityNumber” or table names such as “CreditCards” in your database. Such names simply shout “I contain sensitive information” to an attacker.
SQL Injection, and secure application design are large subjects. My next blog will take a more in-depth look at each of these techniques and discuss where and how to use them to protect your applications in production.