A database has two important jobs: One is to store data for a long period of time and the other is to provide access to the data for interrogation. These aspects are related to each other and influence each other. A good database is storing data efficient by using as little space as possible on disk. Second a database must provide fast access to the data and allow quick search and update operations.
Performance is a quality of databases to respond fast to update commands and queries. This is necessary because database can serve multiple users and many requests per second.
The efficiency is not the same as performance. The performance is coming with a cost. This cost is depending on how many resources are used to resolve a request.
- processor time,
- number of threads,
- memory size,
- disk access operations,
- temporary disk space used.
The more resource you use the less efficient your program is. Therefore when we evaluate performance we must consider the time of execution as well as the efficiency of execution.
Server resources are shared between several users or applications. Therefore if one user is taking over the resources, other user can have performance problems. Balancing the resources allocated for a job is the server responsibility but also depend heavily by the design of the database.
There are two important categories of SQL that you must distinguish when you consider performance of a database:
- Small query = a query or update that is referring to one or several records
- Large query = a query or update that is referring to large number of records
These two types of query can access data using different methods. Sometimes the proper access method exists if the database is correctly designed or can be missing when the database do not have a good design.
Design for performance:
Design of a database is depending on the type of application and the type of SQL used most of the time in the application.
A database that use small query most of the time has an OLTP design. OLTP (On-line Transaction Processing). This database is good for supporting interactive applications.
A database that is using most of the time large query it has an OLAP design. OLAP (On-line Analytical Processing) deals with historical data or archival data.
For ETL (Extract Transform Load) applications you must avoid using triggers, indexes and constraints. This include all references that are also constraints. For this purpose you can work with “deferred” constraints.
Data Access Methods
An SQL is resolved by database engine using an algorithm. This algorithm is based on several data access methods. SQL can use one or other data access method depending on the query type and database design.
To analyze an SQL performance a designer can create an “execution plan” using a tool provided by the database engine. This plan can be investigated by developers or designers to improve performance.
The execution plan show what data access methods are used and how much cost is involved for each intermediary step that is composing a query execution plan.
For example Oracle can use following access methods:
- Direct access
- Index access
- Bit-map index access
These methods are used to create an “access path”. I will enumerate the most significant:
- Full table scan
- Access by RowID
- Index unique scan
- Index range scan
- Index full scan
- Bitmap index single value
- Bitmap index range scan
See also: Oracle Documentation
For a large SQL a full table scan may be the festers method to access all the data in one table. For a small SQL a direct access by RowID may be the fastest method you can use.
Data in a table is identified by a unique key. In Oracle this key is called ROWID. User can also create a key to identify a row into a table. This is called a primary key. The primary key is optional in Oracle but may be mandatory in some other databases. Think to an index like a small table having two columns: One is the ROWID and the other is column value. Index is sorted by column value.When we search data from a column if this column have an index we can use the index for fast search. This is possible because the index is sorted by data column value.
In the picture below we have created an index with name: INDEX_PERSON_AGE for a table named PERSONS. Index name can use a convention to show that belong to a particular table but the index name can be something totally random. The index contains column AGE, sorted and the column ROWID.
Index creation SQL
To create this index we can use SQL:
CREATE INDEX INDEX_PERSON_AGE ON PERSONS (AGE);
Once an index is created it is used automatically by the database. We do not use the index directly. The SQL engine is using the index if the query can be optimized to use the index. Sometimes the index is so important that the database is creating the index for you. For example oracle creates indexes for primary key automatically.
An index can be active or inactive. When index is active it is automatically maintained by the database engine. If the index is inactive for a while it may get out of sync and become unusable.
Now if the index exists we can fast search by AGE or we can order the query result using ORDER BY and database may use the index to get the result faster.
Note: Into a sorted table you could not insert new data without a huge cost. Therefore in reality an index is not a sorted table but is a binary tree structure or other kind of data structure optimized for search and able to support fast update or insert operations.
Practical design rules:
- Create a primary key for each table;
- Create an index for the primary key;
- Create an index for any foreign key;
- Create an index for alternative keys;
- Create an index for frequent search criteria: NAME, CODE
Some mistakes to avoid:
- Do not create index for each columns but only significant ones.
- Too many indexes will slow down the insert, update and delete operations.
- Bitmap indexes can cause insert/update bottlenecks
See also: Oracle Create Index