When we design a data-centric application we need to select a database engine that we are going to use. If we start a project with the wrong database we can have higher cost if you wish to change the database latter. In the next video I will present classification of databases by features.

Database Features

What is data?

In computer science data can be in many forms:

  • simple small number: 1, 2, 3
  • a very large number:  123456789012345678901234567890
  • a simple character: A, B, C … a, b, c
  • a string of characters: “This is a string”
  • an string of Unicode symbols like: “∑∏∫±√×≤≥≠≈”
  • a very large strig or text: “I can’t give you an example but you can imagine one”
  • a date: 30-05-2016
  • a time: 11:30 AM
  • a JSON document
  • an XML document
  • an Object
  • an Image
  • a video stream
  • a group of numbers representing spatial coordinates (GIS)
  • a bunch of text that represents something for somebody
  • a bunch of bits that represents something for somebody.

Depending on how many of these data formats are supported and how data is organized a database have more or less usability. Usually there is a trade-off between performance and convenience of use.

Object Oriented

Some databases can store only native simple types: numbers and characters, while others can store complex data types like images, video streams or objects. These databases are called object oriented because they borrow something from OOP (Object Oriented Programming).

Relational Database

In this kind of databases data is stored in tables. These are database objects used to organize data in rows and columns. Relational databases RDBMS is based on relational algebra. When we design a relational database we deal with normalization rules. Relational databases use SQL query language for data definition and data manipulation.

Document Oriented

Databases that can store only JSON objects are called document oriented databases. JSON is encapsulating a complex structure that have information about data and the data stored together.

JSON is easy to parse and light weight. It comes from JavaScript Object Notation. I uses curly brackets and coma saparated pairs. It is human readable better then XML.

JSON Example:

{ID: 10,
 Name: "John Doe",
 Age: 45,
 Children: ["Ana", "Maria", "Miky"]
}

Document oriented databases are sometimes called big-data or no-sql. These databases are fast due to 2 important techniques:

  1. Distribute data over several computers;
  2. Store related data together so there is no need for a join operation.

No SQL

No SQL databases are usually very good at data replication and are using distributed storage. That means a database is replicated on multiple nodes and do not have a central server. This technique is sometimes called Map-Reduce or big data.

Graph Databases

There are few databases that store data into a graph. These databases are very good in representing relations between data elements. Neo4j is the most representative in this category. One of the most significant graph example is a network of roads. I have seen once a demonstration on Nokia heckaton about this. Very promising technology!

In-memory Databases

Most of the databases store data on disk (HDD) very efficient for update/delete/insert operations. However there is a very small number of databases that have data primary organized in memory and very few disk access operations.

In memory databases are very fast (real time fast) but limited in size due to reduce RAM capacity. However in some cases the database is small and has to be ultra fast. Redis database is the most representative in this category: http://redis.io/

Hybrid Databases

This kind of databases can store multiple data types and have support for complex types. Oracle and PostgreSQL are two of most representative hybrid databases.

These databases can organize data on rows or columns and can use a combination of local and cloud storage to improve performance for local applications versus remote storage.

By using a hybrid design one can create data centric applications that can respond faster to data retrieval and can improve overall performance of a system between 50% up to 100%. The secret is to store complex objects and avoid SQL JOIN operations.

Database Features:

SQL

This is a declarative computer language specific to databases. It has 2 important dialects: DML=Data Manipulation Language and DDL = Data Definition Language.

JSON

This is a data format that is inspired from JavaScript object representation. It is a text format for data and is better then XML. Check out the definition: json.org

NoSQL databases are using JSON for storing data but there are few exceptions that use other kind of storage.

Triggers and Constraints

Most of databases have rules that can verify database integrity when data is modified or first time introduced into the database. These rules are small programs or expressions that are hosted by the database server and started by a particular event. For example when data is updated, deleted or inserted.

Triggers can slow down a database input significantly but in general do not have a negative effect over the data search or data output. Sometimes triggers are disabled to improve batch processing performance.

Procedures

Older databases in general SQL databases are using a customized language for data manipulation. This language can be used to store a program in the database. Therefore these procedures are called also “stored procedures”.

In latest design new databases can use a general purpose language like Java, JavaScript or even Python or C++ to create native programs that are working much faster on server side applications instead of stored procedures. These programs are hosted by an application server as separated services.

Spatial

This feature is one of the most difficult to implement and show the power and the dedication of the company that supports the database.

The spatial feature permits storing x,y and z geographic coordinates or points is space. The storage permits also spatial relations between the points to form geometrical shapes like circle, triangle square, simple lines or curved lines.

Multi-User

Usually a database is a service application or daemon. This is a program that stay in memory after computer is started and most of the time start automatically with the server host operating system.

Most of databases are multi-user. However SQLite is a single user database and this database do not have a service. It is an embedded database, and this means user program can use the database compiled into first program.

Replication

Data replication is a distinguishing feature of professional databases. This is used to copy data from one main server to one or more backup servers. When the main server is not operational the backup server is used. Sometimes replication is used to improve performance of applications by using distributed computation. Replication can be uni-directional or bi-directional and can have a master server or can be master-less.

See also: Top 12 database engines