How to choose database
By Gints Plivna
I'll use term "database" in this article speaking actually about a database management system, provided by a vendor.
So you have a new project and you can at least partially force or take part in decision process for the database for the new project. What are the main criteria you should take into account to spend as less as possible $$ or €€ in your project? This article tries to give at least some basic introduction in this process and ideas to consider.
The article has two major parts - first one enumerates and explains what criteria you should take into account in database decision process. In the first part I'll try to be as impartial as I can consider my previous experience. Speaking about criteria I won't use any reference to any database :)
The second part however lists some popular databases along with links to their WebPages. Here of course I won't be as impartial and add a bit opinion on some of them.
The first isn't a criterion. The first one is premise you should understand in all this process - each database is as different as cars produced by different vendors. Cars are very different starting from Zaporozhec until Ferrari, Maybach and following monsters. The same situation is with databases. You won't be able to carry a big truck load into the Ferrari. If you'd try to use the same positions for automatic differential gear as for manual either you'll break it or gear box. The same with databases - they are produced by different vendors, they have different architectures and you cannot use the same best practices for DBMS X as you've used for DBMS Y.
This is the main criterion. You have to understand your functional and nonfunctional requirements and choose the best database that is most suited according these requirements. The main requirements should be at least as follows:
1) Data amount and type of data (text, binary, spatial or other specific data types);
2) Number of simultaneous users (simultaneous requests to database);
3) Availability i.e. how much outage you can allow for your db;
4) Scalability - what you'll do when data and user count in your database will grow;
5) Security - how much you'll need such features like secure data, data encryption, user management, data access according to different privileges.
6) Manageability and administration - how easy and user friendly you'd like to manage database;
7) What other cool features you'd like to get out from your db.
It is reasonable to create your requirements according to your resources and real necessities. Of course it is advisable to look a bit into the future and modify requirements accordingly, but that should be done with care.
Some examples. You have always to think whether you really need 99.99% availability (actually that means less than hour downtime in a year!) if your system at nights are used by a pair of users and they can easily sometimes wait till the morning. Security is very important thing however you have to remember that there aren't real cure against stupidity of the people. High availability of course is very complex affair and for example your database is running doesn't mean customers can access it in case of network failure. You have to remember also about such things like backup air-conditioners otherwise in a hot summer day after broken air-conditioner none of the database mirroring, clustering and standby databases, diesel generator or God only knows what else would help :)
Take into account your previous experience and skills working with data bases. As I've said, they are different. If you'd choose unknown or semi-unknown data base, you'd need time to get acquainted with it, to understand its supported tools, to understand its features and strengths, to understand its weaknesses, which usually are not pointed out on the top page ;) Probably you'll have to send your database administrators and developers to extra courses costing extra money and absence of workers.
There is a big difference whether your project is the first one in some area or it should take into consideration many other systems it should collaborate with. In the first case you have much bigger freedom of choices. In second case you have to deal with the fact that usually databases from one vendor talk better with the same vendor's databases than his rival DBMSes. In case of already existing databases and different DB for your new project, you and/or your customer has to deal with different supports thus introducing bigger complexity not only on technical level, but also on administrative level.
Wherever you are usually there is different situation with different DB developers and administrators. Make sure that you will be able to find a knowledgeable person in case of your db failure. Either he is your employee or specialist from Consultancy Company; you should have at least one but better more than one person you trust. It will be very unpleasing to pay big money per day to somebody restoring your db but it will be much more unpleasing to realize that there is no one who can do that.
The same story is about usual working process. Administrators and developers of different DBMSes have different salaries. This might vary from time to time and place to place but definitely there are differences. Available labor market for a chosen DB will directly influence salaries, it will directly influence whether you can get a new developer in case the previous one performs worse than you expected.
This is the most common thing everybody speaks about in db selection process. One can find numerous articles and claims that DB X is cheaper than DB Y comparing only pure license costs. Actually cost of the license is only one of the factors that directly influence cost, not to speak about indirect cost components. Speaking about license costs one has to remember:
- DBMS usually have different licensing models, for example, based on number of users, CPU count and RAM amount.
- Official license cost usually is only the starting point. Probably you can get some discounts especially if you are a big customer.
- Biggest DBMS usually have different editions, like enterprise and standard having important differences in license costs. Do you really always need enterprise edition, although this usually is the first bet? Again remember the first point - databases are different - therefore never compare the same edition prices for different DBMSes mechanically. Look inside. Look inside what each edition actually contains, what is enterprise for database X might be just standard for database Y.
- Explore DB licensing schemes, explore discounts, explore additional features and options. Don't take DB sales person called final cost for granted, ask him to explain breakdown of it.
Understand how much support costs and what does it offer? Usually there are different support levels with different costs and offered features.
6.3. Additional features
Some databases have additional options even to enterprise edition. For additional price of course. So you'll pay both for your enterprise/standard edition as well as additional features. This is the case when you should carefully analyze whether you really need additional option as well as you can ask for more discount from sales rep ;)
6.4. Operating system requirements
Some databases require specific operating systems. Operating systems as well as databases are different and require different administration and maintenance skills. In process of evaluating costs for operating system one can use similar approach as defined in this article for databases.
What is the probability that your chosen database would fail to respond? Who would recover it? What is the predicted time to recover it? How much money you or your customer would loose during failure and recovery?
There are some specific industries (for example geographic information systems) requiring special databases or at least databases having support for that specific industry. Of course if you'd like you can be the pioneer and start to adapt your chosen database for the specific industry however that might be really resource and time consuming task.
If you have chosen a specific product working with different databases then the first thing is to assure it hasn't only negative consequences of such applications. If you are definitely sure about your decision then clarify which is the preferred database. If all databases are equally good then you can be sure they are equally bad as well ;) I definitely don't wish you to be the first user on a new database for your chosen product ;)
Unless you aren't born as a pioneer it is worth to understand what are the existing user community resources - forums, e-mail lists, informal user groups, conferences and seminars. Also clarify what articles, whitepapers, manuals and other online resources which will make your life easier in case of emergency. Most probably you won't like to be the first one to solve too many problems; it is enough to be the first one to solve a few problems.
Understand what are the supported hardware platforms and minimal hardware parameters. Of course it depends on the planned amount of data, simultaneous user count and predicted load on your database.
To be fair I don't hope you'll always follow decision process and decision criteria explained above ;) Even more - I'm quite sure many times when you are going to create a small system for 5 users you don't care about high availability, security etc. BUT there is one most important thing I'd like to highlight - the cost of database isn't the same as the cost of its license! No! The bigger your system grows the less impact on overall expenses license cost has.
If you decide to use TCO method to choose your db then above mentioned criteria can be used as some of input criteria for TCO calculation.
There is an article by Lewis Cunningham "A Complete Newbie's Guide to Choosing a Database" explaining some similar thoughts about the same topic.
According to Gartner Relational Database Market Share there are three main players in commercial database market. These DBMSes are Oracle, DB2 and SQL Server by Oracle, IBM and Microsoft accordingly. Competition usually is quite good thing, there is endless rivalry among them who is the best (usually without precise definition what does it mean „best") and also we all are able to use express editions (no license cost) of their products.
From the open source products I'd like firstly to mention two of them MySQL and PostgreSQL. Motto of the first one is „The world's most popular open source database", but for the second „The world's most advanced open source database". As you most probably have already figured out they have their fight on their own, as well as of course trying to get a piece of cake from big three. There is a short article explaining some differences between above mentioned two as well as some other DBMSes.
One has to remember that both lists aren't exhaustive list of all relational DBMSes. Definitely there are other ones probably even more popular than some of listed here.
Informix - historically Informix was one of the big three, but now it is bought by IBM and most probably that means slow death.
Sybase Adaptive Server Enterprise - historical ancestor of MS SQL Server.
Teradata - usually used for data warehouses. About data warehouses and most popular solutions there one can read in another Gartner article Magic Quadrant for Data Warehouse Database Management Systems, 2007.
It is interesting to note that Oracle actually owns at least following different databases:
With the process of choosing a database all other development process with its associated problems and nightmares is only starting. To minimize these you can follow many best practices and guidelines including some on this very site (Do you have data waste or data base? and Naming conventions of tables, columns, indexes, keys, sequences in Oracle).
If you like you can discuss this article here.
Gints Plivna firstname.lastname@example.org is system analyst in Rix Technologies Ltd. (www.rixtech.lv). He has experience in working with Oracle since 1997 and his interests mostly have been connected with analyzing system requirements, design, development and SQL tuning.
e-mail - email@example.com
website - http://www.gplivna.eu/
is licensed under the Creative Commons Attribution-ShareAlike 2.5 License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/
or send a letter to
First edition: 2007-11-26
Added link to Lewis Cunningham article: 2008-05-13