SQL and Indexes
- Details
- Category: SQL Tips
- Published: Thursday, 28 April 2016 21:55
- Written by Super User
- Hits: 1159
SQL (Structured Query Language) is the language that almost every DBMS system knows and has its functionality built around with some variations. The language has been around since 1970 and became a DBMS Query Standard (ANSI SQL) in 1986 and since then used in almost every DBMS system available.
An index is a data structure, often a tree structure, that improves the searching and retrieval speed of a data in a table. The index is in fact a copy of some part or all ( called full index) of the tables data. Indexes are created using one or more columns of a table and organized to make random access lookups and sequential accees lookups as efficient as possible.
The sql query that is built do retrieve data is often not designed the optimal way, either by the user or query-builiding tools, or tables that is queried lacks good and vital indexes to gain optimal peformance.
When parsing and executing sql queries, a DBMS system and it's optmizer use all available resources to define the quickest access path to the data involved. Those resources are the tables that are being queried, it's indexes and it's structure and statistics.
A typical and simple sql statement looks like this
select b.bookid, c.name categoryname, b.title booktitle, a.firstname || ' ' a.lastname authorname
from books b, author a, category c
where b.authorid = a.authorid
and b.categoryid = c.categoryid
order by a.lastname, a.firstname,c.name,b.title
Here the sql is joining the tables book, author and category together to make a list of books, it's author and in which category it is in.
So lets look at the indexes here that will make this query as fast and efficient as possible. The optimal indexes on this query would be
book_ix (categoryid, authorid, bookid, title)
author_ix (authorid, lastname, firstname)
category_ix (categoryid, name)
As the sql query is joining the book table to both the author and category table on the columns authorid and categoryid the best index on the book table (book_ix) is as described above. And the order of the columns are because the number of distinct categories are usually smaller than distinct number of authors in the tables and therefore the index would be faster to search. The other indexes have the authorid and categoryid at the first index-level for the dbms to be able join and match the values in the tables using only indexes. The other columns that are not predicates in the where clause but listed in the select are then put lower levels of the index to make the query index only query.
Putting good and well designed index(-es) is crucial to both simple and complex queries and their performance, especially on large tables. My experience is that execution time can go from several hours down to several minutes or seconds if well designed indexes are applied to the tables involved. Execution time as well as CPU benefits from it and if you are paying monthly fee dependent on CPU then you will see your montly budget grow as your CPU plunges down.