What is data?
In computer science data can be in many forms:
- simple small number: 1, 2, 3
- a very large number: 123456789012345678901234567890
- a simple character: A, B, C … a, b, c
- a string of characters: “This is a string”
- an string of Unicode symbols like: “∑∏∫±√×≤≥≠≈”
- a very large strig or text: “I can’t give you an example but you can imagine one”
- a date: 30-05-2016
- a time: 11:30 AM
- a JSON document
- an XML document
- an Object
- an Image
- a video stream
- a group of numbers representing spatial coordinates (GIS)
- a bunch of text that represents something for somebody
- a bunch of bits that represents something for somebody.
Depending on how many of these data formats are supported and how data is organized a database have more or less usability. Usually there is a trade-off between performance and convenience of use.
The most primitive databases are file based. You store data in files. Usually each file contain one table that has columns and rows. The most popular file based database are DBase and FoxPro. These are considered obsolete now and there are virtual no companies that are looking for these programmers. I used to be a FoxPro developer but Microsoft has bought this product and has closed it in favor of Microsoft Access and SQL Server.
Some databases can store only native simple types: numbers and characters, while others can store complex data types like images, video streams or objects. These databases are called object oriented because they borrow something from OOP (Object Oriented Programming).
In this kind of databases data is stored in tables. These are database objects used to organize data in rows and columns. Relational databases RDBMS is based on relational algebra. When we design a relational database we deal with normalization rules. Relational databases use SQL query language for data definition and data manipulation.
Databases that can store only JSON objects are called document oriented databases. JSON is encapsulating a complex structure that have information about data and the data stored together.
Name: "John Doe",
Children: ["Ana", "Maria", "Miky"]
Document oriented databases are sometimes called big-data or no-sql. These databases are fast due to 2 important techniques:
- Distribute data over several computers;
- Store related data together so there is no need for a join operation.
No SQL databases are usually very good at data replication and are using distributed storage. That means a database is replicated on multiple nodes and do not have a central server. This technique is sometimes called Map-Reduce or big data.
There are few databases that store data into a graph. These databases are very good in representing relations between data elements. Neo4j is the most representative in this category. One of the most significant graph example is a network of roads. I have seen once a demonstration on Nokia heckaton about this. Very promising technology!
Most of the databases store data on disk (HDD) very efficient for update/delete/insert operations. However there is a very small number of databases that have data primary organized in memory and very few disk access operations.
In memory databases are very fast (real time fast) but limited in size due to reduce RAM capacity. However in some cases the database is small and has to be ultra fast. Redis database is the most representative in this category: http://redis.io/
This kind of databases can store multiple data types and have support for complex types. Oracle and PostgreSQL are two of most representative hybrid databases.
These databases can organize data on rows or columns and can use a combination of local and cloud storage to improve performance for local applications versus remote storage.
By using a hybrid design one can create data centric applications that can respond faster to data retrieval and can improve overall performance of a system between 50% up to 100%. The secret is to store complex objects and avoid SQL JOIN operations.
This is a declarative computer language specific to databases. It has 2 important dialects: DML=Data Manipulation Language and DDL = Data Definition Language.
NoSQL databases are using JSON for storing data but there are few exceptions that use other kind of storage.
Triggers and Constraints
Most of databases have rules that can verify database integrity when data is modified or first time introduced into the database. These rules are small programs or expressions that are hosted by the database server and started by a particular event. For example when data is updated, deleted or inserted.
Triggers can slow down a database input significantly but in general do not have a negative effect over the data search or data output. Sometimes triggers are disabled to improve batch processing performance.
Older databases in general SQL databases are using a customized language for data manipulation. This language can be used to store a program in the database. Therefore these procedures are called also “stored procedures”.
This feature is one of the most difficult to implement and show the power and the dedication of the company that supports the database.
The spatial feature permits storing x,y and z geographic coordinates or points is space. The storage permits also spatial relations between the points to form geometrical shapes like circle, triangle square, simple lines or curved lines.
Usually a database is a service application or daemon. This is a program that stay in memory after computer is started and most of the time start automatically with the server host operating system.
Most of databases are multi-user. However SQLite is a single user database and this database do not have a service. It is an embedded database, and this means user program can use the database compiled into first program.
Data replication is a distinguishing feature of professional databases. This is used to copy data from one main server to one or more backup servers. When the main server is not operational the backup server is used. Sometimes replication is used to improve performance of applications by using distributed computation. Replication can be uni-directional or bi-directional and can have a master server or can be master-less.
See also: Top 12 database engines