Sage-Code Laboratory
index<--

Database Systems

A database is a special software that run on servers as daemon or service. A database is the key ingredient to create: dynamic websites, single page APPs, digital maps or distributed APPs. Database developers are highly appreciated in any software team.

Video Lecture

If you have time and bandwith you can watch this video to learn about data centric architecture. This video is about 20 minutes long, and is very informative for beginners. It was filmed in Chicago by Elucian in Jul, 26, 2016. Please press thumb up button if you like it. Thanks!

Database Applications

Database applications

A database applications is a large software stystem in which a database play the most important role. For building such an application we use two tire or three tire architecture. First tire is the database itself, second is the application server and third is the Internet browser for web-app or native app.

To design a data centric application the architect will design the database first. Once the database has a design model, one or more software developers will create scripts and programs for the database.

Design elements include tables, relations between tables, validation rules, domains, triggers and stored procedure requirements. This design is usually done using ER diagrams with a program that can be provided by the database vendor.

Data Manipulation

Data centric applications can use a relational database or a document oriented database. If the programs for the databases are written as stored procedures there is a native manipulation of the data inside the database. When data manipulation is performed using an external program then there is a data transfer between the database and the program.

Data Oriented Languages

Most of computer languages are general purpose language. However for data centric applications there are specific dedicated languages. Every relational database has a different language.

Other data scientific languages:

Impedance Mismatch

In computer science there are Object Oriented Languages like Java, C++, Python and other languages that can have a data model created in memory using classes. Relational databases can have a different data model using relational tables. The two model are not matching perfectly and there is an "impedance mismatch" that need an adapter to function. This adapter is called Obect Relational Mapping (ORM) application.

See also: Object-relational mapping

For most popular computer languages there are available several ORM frameworks available for architects to choose from. For example Java has Hibernate, TopLink and Athena Framework.

See also: ORM Software

The ORM software is a key in data centric applications. Many applications have a bottleneck that is the ORM. Therefore is a good ideea to avoid impedance mismatch and use a disruptive technology to do so.

To Avoid Impedance Mismatch we can use 3 different techniques.

  1. Store data as complex objects using an Object Oriented database
  2. Store data as documents using JSON notation into document oriented databases
  3. Use data – oriented computer languages that are not object – oriented.

Data communication

A data-centric application must communicate with the client application. Depending on the client a data-centric application can use a native data protocol or micro-services. Some databases have tools that can create micro-services automatically and expose functions to read and write data into databases.

The most common approach is to use a database driver for example ODBC or JDBC driver. Some databases provide native drivers for C/C++ and other computer languages. In this case the application will send SQL and will receive data packages.

Most of the time an architect must design the interface for the database. This include the protocol and the data format for data exchange. Data protocol for communication can be XML based or JSON based for services.

Examples:

There are diverse types of applications that can be connected to a database.

All these applications are named OLTP applications. (On-line Transnational Processing). Sometimes a data-centric application has a role of backup or history. In this case the database in an OLAP database or on-line Analytical Processing Applications. These kind of applications are becoming obsolete and are replaced by Big Data Analytic applications.

See also: Data-centric Architecture

Database Features

Next we will describe database features for several database engine. To know what features of databases will help you to make a better decision when you have to select a database. Some databases sacrifice some of the features looking for better performance or better usability.

Depending on the features implemented, a programmer have an easy time programming or not. The end user do not care about the features of your database. End users care about his experience. To be fast, secure and free. So what you do as a developer? Here we tell you. Let's get started with data types:

Data Types

Hold on a minute. What is data in Computer Science? Everything your computer does is to manipulate data. Once you understand what data actually is you will next understand more easy how to create programs that can manipulate your data.

most frequent data types

Video Lecture

Next video is 10 min long and is very informative for beginners. If you think you do not have the video bandwidth, you can watch this then read the text below, or skip to the text. We will explain everything in plain text.

Database Applications

Query Language (SQL)

This is a declarative computer language specific to databases. It has 2 most important dialects: DML=Data Manipulation Language and DDL = Data Definition Language. You can study SQL as a programming language in next section:

Jump to: SQL Tutorial

Using SQL and PL/SQL we can create code stored with database. This helps to improve code reliability and maintenance. By linking the code with database, we can detect incompatibilities when we update the database structure. Stored code can become invalid and unusable. This is a signal that we need ot update the code.

JSON = Java Script Object Notation

This is a data format that is inspired from JavaScript object representation. It is a text format for data and is better then XML. Check out the definition: json.org. NoSQL databases are using JSON for storing data but there are few exceptions that use other kind of storage.

Document oriented databases can store only JSON objects. This is encapsulating a complex structure that is metadata information and the data itself, stored together as a document.

JSON is easy to parse and light weight. It comes from JavaScript Object Notation. I uses curly brackets and coma saparated pairs. It is human readable better then XML.

JSON Example:

{
    ID: 10, 
    Name: "John Doe", 
    Age: 45, 
    Children: ["Ana", "Maria", "Miky"] 
}

Document oriented databases are sometimes called big-data or no-sql. These databases are fast due to 2 important techniques:

No SQL databases are usually very good at data replication and are using distributed storage. That means a database is replicated on multiple nodes and do not have a central server. This technique is sometimes called Map-Reduce or big data.

Triggers (TRG)

Most of databases have rules that can verify database integrity when data is modified or first time introduced into the database. These rules are small programs or expressions that are hosted by the database server and started by a particular event. For example when data is updated, deleted or inserted.

Rules are implemented as triggers and constraints. These can slow down a database input significantly but in general do not have a negative effect over the data search or data output. Sometimes rules are disabled to improve batch processing performance.

Stored procedures (SP)

Older databases in general SQL databases are using a customized language for data manipulation. This language can be used to store a program in the database. Therefore these procedures are called also "stored procedures".

In latest design new databases can use a general purpose language like Java, JavaScript or even Python or C++ to create native programs that are working much faster on server side applications instead of stored procedures. These programs are hosted by an application server as separated services.

Spatial layer (SL)

This feature is one of the most difficult to implement and show the power and the dedication of the company that supports the database.

The spatial feature permits storing x,y and z geographic coordinates or points is space. The storage permits also spatial relations between the points to form geometrical shapes like circle, triangle square, simple lines or curved lines.

Multi-User (MU)

Usually a database is a service application or daemon. This is a program that stay in memory after computer is started and most of the time start automatically with the server host operating system.

Most of databases are multi-user. However SQLite is a single user database and this database do not have a service. It is an embedded database, and this means user program can use the database compiled into first program.

Replication (RPL)

Data replication is a distinguishing feature of professional databases. This is used to copy data from one main server to one or more backup servers. When the main server is not operational the backup server is used.

Sometimes replication is used to improve performance of applications by using distributed computation. Replication can be uni-directional or bi-directional and can have a master server or can be master-less.

Database engines

A database engine is a program that is installed on a server and run automatically when server is started. This is known as a service or daemon. Once loaded in the memory it will perform input-output operations on a set of data files and will listen to requests on a specific TCP-IP port. Applications are connecting to this port and request data using SQL or send data modifications using DML.

Next is list of databases and the most important features. In my opinion these are the top 12 database engines. One database engine can support: SQL Language, JSON Data, Triggers, Stoded Procedures, Spatial Functionality, Multi-User access or Replication.

# Feature SQL JSON TRG SP SPA MU RPL
1 Oracle Y Y Y Y Y Y Y
2 PostgreSQL Y Y Y Y Y Y Y
3 SQL Server Y L Y Y Y Y Y
4 SQLite Y Y          
5 MySQL Y Y Y Y Y Y Y
6 MongoDB Y Y Y Y Y    
7 Casandra Y Y Y Y Y    
8 CouchDB Y Y Y Y Y  
9 Neo4j L L Y Y      
10 RethinkDB Y L Y Y Y    
11 IBM DB2 Y Y Y Y Y Y Y
12 Redis Y Y          

Relational Databases

A relational database is a collection of data items organized as a set of tables from which data can be accessed or reassembled in many different ways using SQL language. A relational database is usually normalized.

In this kind of databases data is stored in tables. These are database objects used to organize data in rows and columns. Relational databases RDBMS is based on relational algebra. When we design a relational database we deal with normalization rules. Relational databases use SQL query language for data definition and data manipulation.

Most popular relational databases:

RDBMS databases are preferred for most applications. Architects prefer to use quality components in the application architecture to create a robust system. Relational databases are robust have good performance and are secure. Since the database can be a single point of failure for the entire system a reliable database is the key for having the system function properly with no interruption.

Hybrid databases:

Hybrid databases use both approaches. Can use Object-Oriented design or NoSQL design. Developers can design relational tables but can also create complex objects and document oriented tables.

Using a hybrid database or an object oriented database your application can have an improved performance of up to 100 or even 1000 times faster then a classic design using R-DBMS database.

File based

The most primitive databases are file based. You store data in files. Usually each file contain one table that has columns and rows. The most popular file based database are DBase and FoxPro. These are considered obsolete now and there are virtual no companies that are looking for these programmers. I used to be a FoxPro developer but Microsoft has bought this product and has closed it in favor of Microsoft Access and SQL Server.

Object Oriented

This is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented.

These databases can store complex data types like images, video streams or objects. These databases are called object oriented because they borrow something from OOP (Object Oriented Programming).

NoSQL Databases

Also known as Document Oriented or Big Data. These databases are storing information into a JSON format. This is unstructured data or document oriented data. Each record can have it's own structure. Data and structure are stored into a collection of documents.

Graph Databases

There are few databases that store data into a graph. These databases are very good in representing relations between data elements. Neo4j is the most representative in this category. One of the most significant graph example is a network of roads. I have seen once a demonstration on Nokia heckaton about this. Very promising technology!

In-memory Databases

Most of the databases store data on disk (HDD) very efficient for update/delete/insert operations. However there is a very small number of databases that have data primary organized in memory and very few disk access operations.

In memory databases are very fast (real time fast) but limited in size due to reduce RAM capacity. However in some cases the database is small and has to be ultra fast. Redis database is the most representative in this category: http://redis.io/

Hybrid Databases

This kind of databases can store multiple data types and have support for complex types. Oracle and PostgreSQL are two of most representative hybrid databases. These databases can organize data on rows or columns and can use a combination of local and cloud storage to improve performance for local applications versus remote storage.

By using a hybrid design one can create data centric applications that can respond faster to data retrieval and can improve overall performance of a system between 50% up to 100%. The secret is to store complex objects and avoid SQL JOIN operations.


Read next: Database Design