How much compatibility do the DB engines have at the SQL level?

Let's say I wanted to have an application that could easily switch the DB at the back-end.
I'm mostly thinking of SQL Server as the primary back-end, but with the flexibility to move to another DB engine. Firebird and PostgreSQL seem to have (from my brief Wikipedia excursion) the most in common with SQL Server (plus they are free).
How similar would the DB setup, access, queries, etc. be for Firebird, PostgreSQL and MS SQL Server?

Unfortunately, SQL varies widely across providers. It's almost impossible to write anything but the most trivial SQL that runs on a number of RDBMSs - and then you're into lowest-common-denominator territory. Far better to use an abstraction layer to handle at least the connection to the database (including access and sending queries), and either an ORM to handle the SQL itself or per-provider SQL.
If you want to see how they vary, good examples are auto-incrementing IDs and obtaining the ID of the last inserted record.
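As a quick sketch of that second example (table and column names here are invented for illustration), each engine has its own idiom for getting back a generated key:

    -- SQL Server: IDENTITY column, then SCOPE_IDENTITY()
    INSERT INTO customers (name) VALUES ('Alice');
    SELECT SCOPE_IDENTITY();

    -- PostgreSQL: SERIAL/sequence column; RETURNING hands back the new id directly
    INSERT INTO customers (name) VALUES ('Alice') RETURNING id;

    -- Firebird: generator (sequence), typically fetched before the insert
    SELECT GEN_ID(gen_customer_id, 1) FROM RDB$DATABASE;
    -- (Firebird 2+ also supports INSERT ... RETURNING id)

    -- MySQL: AUTO_INCREMENT column, then LAST_INSERT_ID()
    INSERT INTO customers (name) VALUES ('Alice');
    SELECT LAST_INSERT_ID();

An abstraction layer or ORM hides exactly this kind of difference from you.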

I worked on one project where it was an absolute requirement to support many databases, including at least Access, SQL Server and Oracle.
So I know that it can be done. Most DML (SELECT, UPDATE, INSERT, ...) is the same, and we certainly didn't have huge problems making it work across all of the databases - just occasional annoyances. MySQL was the exception at that time, as it simply wasn't capable enough.
We found most differences in the DDL, but with the right architecture (which we had), it wasn't difficult to fix this.
The only thing that caused us a problem was generating unique IDs - autoincrement is non-standard. Fortunately, in a database of around 40 tables there were only a few places where unique IDs were needed (good DB design). In the end we generated the unique IDs in code and handled any clashes (everything in transactions).
It helped that we had avoided using autoincrement for ID fields; it's harder to come up with unique keys that way, but better in the long run.
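One portable way to generate those IDs in code (a sketch of the general idea, not necessarily the exact scheme used on that project; names are invented) is a plain key table bumped inside the same transaction as the insert:

    -- Portable key table: one row per table that needs generated IDs
    CREATE TABLE id_source (
        table_name VARCHAR(64) NOT NULL PRIMARY KEY,
        next_id    INTEGER     NOT NULL
    );
    INSERT INTO id_source (table_name, next_id) VALUES ('orders', 1);

    -- In the application, inside one transaction:
    -- 1. bump the counter (on engines with row-level locking this also locks the row)
    UPDATE id_source SET next_id = next_id + 1 WHERE table_name = 'orders';
    -- 2. read the value just claimed
    SELECT next_id FROM id_source WHERE table_name = 'orders';
    -- 3. insert the new record using that value, then COMMIT

Because the UPDATE and the SELECT run in one transaction, two sessions shouldn't be able to claim the same ID on most mainstream engines.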

Well, CRUD stuff should be the same everywhere, but if you build anything complex, you'll probably want to use triggers and stored procedures and that's where compatibility becomes low. Writing a DBMS-agnostic application usually means moving most of the business logic outside of database, so having a 3-tier application is, IMHO, a must in such case.
Alternatively, you could use a wrapper library that works as an abstraction layer, but I've yet to see one that is able to do the job correctly over a range of DBMSes. Of course, that also depends on the programming language you use.

As pointed out by the other answers, DBMS vary wildly once you go beyond basic SELECT/INSERT stuff (and sometimes even there).
We also have to maintain compatibility across several DBMSs. In my opinion the best approach is usually to use some kind of compatibility layer. We have an in-house DB abstraction library, but there are several tools available.
In particular, it might pay to look at popular ORMs (Hibernate, nHibernate etc.). They usually offer DB-independence as a kind of side effect. At least Hibernate also has a special query language that will automatically be translated to SQL for the DBMS you are using.

Related

What do I need to know about databases?

In general, I think I do alright when it comes to coding in programming languages, but I think I'm missing something huge when it comes to databases.
I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
You see, like so many new programmers, I tend to treat my databases as a dumping ground for data. Most of what I do comes down to relatively simple SQL (INSERT this, SELECT that, DELETE this_other_thing), which is mostly independent of the engine I'm using (with minor exceptions, of course, mostly minor tweaks for syntax).
Could someone explain some common use cases for databases where the specific platform comes into play?
I'm sure things like stored procedures are a big one, but (a) these are mostly written in a specific language (T-SQL, etc) which would be a different job ad requirement than the specific RDBMS itself, and (b) I've heard from various sources that stored procedures are on their way out and that in a lot of cases they shouldn't be used now anyway. I believe Jeff Atwood is a member of this camp.
Thanks.
The above concepts do not vary much for MySQL, SQL Server, Oracle, etc.
With this question, I'm mostly trying to determine the important difference between these. I.e. why would a job ad demand n years experience with MySQL when most common use cases are relatively stable across RDBMS platforms.
CRUD statements, joins, indexes.. all of these are relatively straightforward within the confines of a certain engine. The concepts are easily transferable if you know a different RDBMS.
What I'm looking for are the specifics which would cause an employer to specify a specific engine rather than "experience using common database engines."
I believe that the essential knowledge about databases should be:
What databases are for
Basic CRUD Operations
SELECT queries with JOINs
Normalization
Basic Indexing
Referential Integrity with Foreign Key Constraints
Basic Check Constraints
The above concepts do not vary much between MySQL, SQL Server, Oracle, Postgres, and other relational database systems. However you'd find a different set of concepts for the now-popular NoSQL databases, such as CouchDB, MongoDB, SimpleDB, Cassandra, Bigtable, and many others.
After the CRUD statements, to be an effective DB programmer I think some of the most important things to understand are JOIN statements. Understand the difference between LEFT and RIGHT, OUTER and INNER joins, and know when to use each. Most importantly, know what the database is actually constructing when it performs a JOIN.
For me, the Wikipedia article was very helpful.
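A minimal illustration of the difference, using hypothetical customers and orders tables:

    -- INNER JOIN: only customers that have at least one order
    SELECT c.name, o.order_date
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id;

    -- LEFT JOIN: every customer; order columns are NULL where no order exists
    SELECT c.name, o.order_date
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id;

A RIGHT JOIN is simply the mirror image of the LEFT JOIN above.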
Also, indexing is very important - this is how relational databases can perform fast queries. Understand how to use them and what happens under the hood.
Wikipedia article on DB indexing.
You should also know how to construct a many-to-one relationship (using foreign keys) and a many-to-many relationship (using join tables).
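A rough sketch of both shapes (table names are invented; exact DDL details vary slightly by engine):

    -- Many-to-one: each order belongs to exactly one customer
    CREATE TABLE orders (
        id          INTEGER NOT NULL PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (id)
    );

    -- Many-to-many: students and courses linked through a join table
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id),
        FOREIGN KEY (student_id) REFERENCES students (id),
        FOREIGN KEY (course_id)  REFERENCES courses (id)
    );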
I know that in your question you're asking about specific DB implementations, but if you're to be taken literally and you only know about SELECT, INSERT, UPDATE, and DELETE, then the above concepts will be far more valuable than learning the intricacies of a particular implementation.
It's not just stored procs and functions. Each database has fundamental differences and quirks that are important to understand even though SQL works more or less the same.
Examples:
Oracle and MySQL handle locking differently, in different situations.
Oracle doesn't have autoincrementing primary keys like MySQL and SQL Server (see the sketch after this list).
Subtle vendor-specific behavior, like the way Oracle sorts VARCHARs differently depending on locale.
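To illustrate the auto-increment point: where MySQL and SQL Server declare it on the column, classic Oracle uses a separate sequence object (a sketch; Oracle 12c later added identity columns):

    -- MySQL
    CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100));

    -- SQL Server
    CREATE TABLE users (id INT IDENTITY(1,1) PRIMARY KEY, name NVARCHAR(100));

    -- Oracle (pre-12c): a sequence, fetched explicitly in the INSERT
    CREATE SEQUENCE users_seq;
    CREATE TABLE users (id NUMBER PRIMARY KEY, name VARCHAR2(100));
    INSERT INTO users (id, name) VALUES (users_seq.NEXTVAL, 'Alice');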
If you really want to improve your applications, you eventually have to become familiar with the details about how your specific database works. Most of the time it doesn't make a lot of difference, but when it does matter, it usually makes a big difference, especially when it comes to performance.
Some things which seem to come up when talking with my Database-keen colleagues:
Row vs page vs table locking escalation when doing multiple complex joins sometimes means doing very different things on different vendors' DBs. This is where the theory really hits the tarmac, and it is often non-intuitive.
Differences between how cursors are best used on different vendor db implementations
Odd stuff in the stored proc language variants, like how best to handle failure cases
Differences in how temporary tables and views are best used depending on the underlying implementations.
All of these kinds of things don't really matter until you are trying to solve something that has to:
- Run very fast
- Contain lots and lots of data
- Get very big and complex (i.e. multiple queries hitting the same tables simultaneously)
These are the kinds of things that DBAs should be helping with, so it depends on whether you are aiming to be a DBA or a programmer. None of the above have really hurt me yet, because I've not worked on DB-intensive systems, but I've worked near a few, and the programmers on those end up knowing a lot about the internals, restrictions, and good features of the specific database they are using.
Best way to get knowledge like that (other than on the job) is to read the manuals or hang out with people that already know and ask them about it.
Don't forget relational schemas, primary and foreign keys, and how they are related. To start with databases, I would use MySQL and MSSQL, as these are the most common in the market. I consider Oracle a more advanced and complex DB.
As for the level of difference between vendors, it exists because SQL is a standard (http://en.wikipedia.org/wiki/SQL#Standardization) and vendors implement that standard differently.
Each of these vendors tries to offer extras to win the crowd over to their side... that's why you see functions available in one and not the other. But sometimes a function makes its way into the standard, so it's not always a bad thing.
As for stored procedures, I would agree: ORMs and today's practices tend toward a greater separation of concerns by removing business logic from the database and treating it as "only" a repository.
My 2 cents
I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
I'm what's called a SQL Developer. You won't see the differences much when you are doing run-of-the-mill database work (CRUD). However, the differences become quite apparent when you are dealing with the database's own brand of SQL.
When talking SQL outside of the standards, there are four distinct categories of commands. These are:
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)
Transactional Control Language (TCL)
The biggest differences come in the last two, DCL and TCL. Those have a LOT of database-specific, non-standard SQL commands. The first two, DML and DDL, are very similar across any database that uses the relational model.
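Even something as basic as starting a transaction shows the divergence (a small sketch, not exhaustive):

    -- SQL Server
    BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- MySQL / PostgreSQL
    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- Oracle: the transaction starts implicitly with the first statement
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;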
Also, the bigger database vendors have named their procedural SQL dialects. Here's a short sample:
SQL Server : T-SQL (Transact-SQL)
Oracle : PL/SQL
PostgreSQL : PL/pgSQL
Firebird : PSQL
MySQL : no separate brand name (its stored-program language follows the SQL/PSM standard)
The list goes on, but you get the point. Wikipedia has good articles on the different command acronyms.
I have found that most employers won't be able to articulate this, because most use non-technical managers and/or HR to do the hiring. They are basically told by the tech managers that the new hires need to know technology X. On top of that, many are too lazy to hire for intelligence and instead fall back on the "We have X, so darn it, we need to hire somebody that knows X!" meme. The differences are actually not that hard to learn, at least for the people who frequent StackOverflow. I'm confident that anybody here can learn them fairly fast.
Even something as simple as an auto-incrementing primary key can be very different in Oracle, MySQL, and SQL Server.
Some other important differences:
SQL Server makes a distinction between clustering key and primary key; other database do not. This choice comes with major performance implications.
SQL Server allows the SET @Total = Total = @Total + Amount syntax for fast computation of things like running totals (see the sketch after this list). MySQL lets you use a user variable in a similar way (I think). In other databases you'd probably have to use a correlated subquery. Huge difference in performance.
SQL Server can generate "sequential GUIDs" with NEWSEQUENTIALID(). I'm not sure which other databases have this feature, but as with the above two points, there are significant performance implications to using a traditional GUID as opposed to a sequential or COMB GUID.
Oracle's CONNECT BY is a very useful and pretty unique syntax. Common Table Expressions in SQL Server and MySQL are similar but not exactly the same.
Support for ranking/ordering functions varies vastly across different databases. I'm constantly posting answers here invoking ROW_NUMBER. A lot of queries are much harder to write without this - but at the same time, abusing it can hurt performance.
XML support is all over the map. Most databases have reasonably good support for it now, but both syntax and semantics are completely different on every platform.
Date/time handling can be quite different. Oracle has several different date/time-related types, some including time zone information. In general, Oracle is way better than other databases at managing temporal data, and has several features that you will miss if you switch. Until recently, Microsoft didn't have the date and time types, just datetime, which was much harder to normalize.
Spatial types are different and/or nonexistent in different databases. MySQL exposes an entire OpenGIS model; Microsoft's support is a bit more basic but still competent. Oracle has it, but it's a little hard to find information on, and it's some sort of optional add-on. I think DB2 is starting to get it, but support is still a little spotty.
MySQL actually lets you choose how to store an index (i.e. B-tree or hash). This is also an important performance consideration.
SQL Server allows you to INCLUDE columns in an index - very important for performance.
Oracle allows you to create function-based indexes, bitmap indexes, and so on. These can be pretty difficult to wrap your head around.
Oracle can perform index "skip scans" in very specific situations, something that I don't believe is supported in other databases (yet). This might factor into how you order index columns.
SQL Server has CLR types/functions/aggregates. Obviously not supported in any other database product.
Trigger support varies significantly. SQL Server has AFTER and INSTEAD OF. MySQL has BEFORE and AFTER. Oracle has all of those and more. These all behave quite differently.
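To illustrate the running-total point above, here is the vendor-specific shortcut next to the portable alternative (a sketch with an invented transactions table that has a running_total column to fill in):

    -- SQL Server "quirky update": fast, but relies on vendor-specific behavior
    DECLARE @Total money;
    SET @Total = 0;
    UPDATE transactions
    SET @Total = running_total = @Total + amount;

    -- Portable, but much slower on big tables: a correlated subquery
    SELECT t.id, t.amount,
           (SELECT SUM(t2.amount) FROM transactions t2 WHERE t2.id <= t.id) AS running_total
    FROM transactions t;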
I'm sure that there are many, many more differences, but that should give you at least a basic idea of why 5 years of experience with Oracle is completely different from 5 years of experience with SQL Server.
That databases are encoded collections of assertions of fact.
That the logical structure of the tables corresponds to the syntactical structure of those "assertions of fact".
That normalization theory helps you find the optimal logical structure of the database by minimizing redundancy, i.e. minimizing the possibility for contradictions to occur among said assertions of fact.
That database constraints are really nothing else than business rules, expressed in a formal way and in terms of the components of the database.
That really every and any business rule can be expressed as a database constraint.
That therefore, it is possible for the DBMS to enforce any and every business rule you can imagine.
That there is a very important difference between logical design and physical design.
That SQL and SQL systems are, eurhm, not really helpful (and that's putting it mildly), in supporting developers to recognise this important distinction.
That SQL and SQL systems are, eurhm, significantly deficient (and that's putting it mildly), in their support for database constraints.
That these latter two examples are a very good illustration of the importance of the difference between a model (Codd's RM) and its implementation (some particular SQL system). As far as relational database technology is concerned, the latter deviates ever more preposterously from the former.
And whatever else I forgot to remember.

What database to use for big data storage and manipulation?

I have to make a decision about which database server to use for my next project, but the simple decision to use MySQL like almost all the projects I did is harder now, because I expect a very large number of records.
The database will store a user list, some other irrelevant tables, and the last one, some user-collected data. Let's say I have 6,000 users responding to a quiz about each other. Simple math shows that if each one completes the quiz about everyone else (and in my project it is 99% sure that will happen), I'll end up with 35.99 million records (they exclude themselves, so the operation is 6000 * 5999). Unfortunately 6,000 may be a small number, the real one growing day by day.
What to choose? MySQL and maybe if things go well and the project grows to expand it in a cluster? PostgreSQL, MSSQL? Oracle?
I've read about all of them; each one has its pros and cons, but I still don't know what to choose. The advantage of MySQL and PostgreSQL is, of course, the starting price of $0, which is pretty nice for a typical self-funded startup.
Any opinions, pieces of advice? If you encountered this situation in your experience as developers, I'd love to hear from you.
These days, free isn't something that differentiates databases any more. Both Oracle and SQL Server have free versions, but the limitation is resources: a 4 GB database cap, limited RAM, and single-CPU utilization. Millions of records are not a concern - what matters more is which datatypes you're using.
I saw the OP's comment about not liking MS software - that's your prerogative, but the free versions of either Oracle or SQL Server do benefit from a seamless transition to the upscale versions of the respective database.
Personally, my choice would be either Oracle or SQL Server because of, IMHO, real feature considerations like hierarchical query support, subquery factoring/CTEs, packages (long before I get concerned with functions/procedures), full-text search, XML support, etc.
MySQL will handle 35 million records no problem. Worry about scalability when you get there. You can easily add RAID hard disks backing your database tables, and if you really start getting big you can get a Compellent SAN that will scream... Don't worry about the DB engine as much as the underlying hardware. MySQL rocks for us with millions of records.
I've had no problems handling tables as large as 36,000,000 rows on MySQL and Oracle.
Just be sure that you index the proper columns, run EXPLAINs for your queries, and maintain proper design principles.
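For example (a sketch with an invented table; the exact EXPLAIN mechanics differ between MySQL and Oracle):

    -- Index the column you filter on
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- MySQL: check that the query actually uses the index
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- Oracle: the equivalent is EXPLAIN PLAN plus DBMS_XPLAN
    EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 42;
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);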
Most of the truly large scale web properties use a distributed key-value store. That said, 35 million is large, but not that large. With most modern databases, your main two scaling worries should be throughput and what happens when no single box can contain your entire database anymore. And both of these problems can be solved to some degree for any database you choose to use. (Caching, replication, sharding, etc.)
Use MySQL until you can't anymore. At that point, you ought to be rolling in dough anyways and you now have a very desirable problem.
Use MySQL as it's free and you have experience with it.
Besides in my opinion it matters more on how you design the tables than which database you use.
35 million records can be easily handled by MS SQL Server (assuming proper database design, indices, etc.). You can start with the free SQL Server Express edition and later, if you need, you can upgrade to the full version which supports clustering, etc.
SQL Server Express does have some limitations - single CPU, 1 GB memory, max 4 GB database size and a few other things. I'm not sure how quickly these limitations will become a problem but you can always move to the full version when you run into them.
MySQL & PostgreSQL
$0 cost
Large community
Many tutorials
Well documented
MSSQL
You can get "money" from MS if you promote that you are using MSSQL (secret information from some companies I worked for)
MS tools work very well
Complete tool set, from the C# IDE through the .NET libraries to Windows Server 2003
Oracle
Professional and commercial provider
Used by many large companies (I also heard about Blizzard (World of Warcraft) using Oracle)
- expensive
The final decision depends on the specific requirements of your project.
Make yourself a quick list of things that ARE IMPORTANT for your project (e.g. fast queries) and look up which database's pros best match your requirements.
Everything is about design. SQL databases are a bit like cars: you just have to know which component goes where.
Make a clear design and you won't struggle with any of them.
Maybe you can test Firebird.
Blog post about a big Firebird database here.
The MySQL licence is here (not always free).
PostgreSQL and Firebird are free.
First of all, don't think about performance. Premature optimization being the root of all evil and all that. You can always throw more hardware and/or tuning at it later.
All of those mentioned should perform nicely if tuned and maintained correctly. I'd focus on manageability and familiarity. IMHO, open source databases excel at manageability (perhaps not the best GUIs, but the CLI has been my home for a long, long time).
And if the database becomes the bottleneck, why limit yourself to those choices? How about a distributed key-value store? Or perhaps serializing data directly to disk? Storing data outside of an RDBMS, while often frowned upon, might be the correct path. Or simply take the common route of denormalization.
Always remember not to optimize prematurely.
As far as opinions go (since you specifically asked for it) I favor open source databases, specifically PostgreSQL. It's rock solid, fast and very well-featured. And even with (relatively) large datasets it has performed superbly on mediocre hardware (some tuning involved, of course, but you can't skip that step no matter which db you end up choosing).

Swapping out databases?

It seems like the goal of a lot of ORM tools and custom data access layers (DAO pattern, etc.) is to abstract the database to the point where you could supposedly swap out the entire database system with minimal work.
Following the common DAL patterns is usually a good idea in code, but it seems like it would never be minimal work to swap out a database. (Cost, training, data migration, etc.)
Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code? Is it worth it to worry about abstracting the actual database from your code?
Question 1: Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code?
Yes, we tried it. Our customer is using a large MS Access-based Delphi client/server application. After about five years we considered switching to SQL Server. We analyzed the problem and concluded that swapping the database would be very costly and provide only a few advantages. The customer decided not to swap the database. The application is still running fine and the customer is still happy.
Note that:
MS Access is only being used for data storage and report generation.
The server application ensures that MS Access is only accessed on the server. Normal multi-user MS Access applications will transfer large chunks of the Access database over the network, resulting in slow and unreliable database functionality. This is not the case for this application: Client <> Server <> MS Access. Only the server application communicates with the MS Access database; in fact, the server has exclusive access to it, and no other computer can open the MS Access database. Conclusion: MS Access is being used as a true RDBMS, Relational DataBase Management System - please no flaming about MS Access being inferior and unstable - it has been running fine for more than 10 years.
The most important issues you will have to consider:
SQL statements (SELECT, UPDATE, DELETE, INSERT, CREATE TABLE): make sure they would be compatible with the target SQL database. It's amazing how much the RDBMSs differ in the details (date formats, number formats, search formats, string formats, join syntax, CREATE TABLE syntax, stored procedures, user-defined functions, (auto) primary keys, etc.) - see the sketch after this list.
Report generation: Depending on your database you might be using a different reporting tool. Our customer has over 200 complex reports. Converting all these reports is very time consuming.
Performance: all RDBMSs perform differently in different environments. Normally, performance optimizations are very much RDBMS-dependent.
Costs: the costs of tools, developers, servers and user licenses vary greatly. They range from free to very expensive. Free does not mean cheap, and expensive does not always equate to good. A cost/value comparison will have to be made.
Experience: making the best use of your RDBMS requires experience. If you have to develop for an "unknown" RDBMS your productivity will suffer.
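A few of those details side by side, for the engines discussed in this thread (a rough sketch, far from exhaustive):

    -- Limiting result rows
    SELECT TOP 10 * FROM orders;                     -- SQL Server
    SELECT * FROM orders LIMIT 10;                   -- MySQL, PostgreSQL
    SELECT * FROM orders WHERE ROWNUM <= 10;         -- Oracle (pre-12c)

    -- String concatenation
    SELECT first_name + ' ' + last_name FROM people;        -- SQL Server
    SELECT first_name || ' ' || last_name FROM people;      -- Oracle, PostgreSQL (standard)
    SELECT CONCAT(first_name, ' ', last_name) FROM people;  -- MySQL

    -- Current date/time
    SELECT GETDATE();          -- SQL Server
    SELECT NOW();              -- MySQL, PostgreSQL
    SELECT SYSDATE FROM dual;  -- Oracle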
Question 2: Is it worth it to worry about abstracting the actual database from your code?
Yes. In an ideal world, swapping a database would just be adjusting the data connection string. In the real world this is not possible because all databases are different. They all have tables and SQL support but the differences are in the details. If you can keep the differences of the databases shielded through abstraction - please do so. Make a list of the databases you need to support. Check the selected database systems for the differences. Provide centralized code to handle the differences. Support one RDBMS and provide stubs for future support of other RDBMS.
I disagree that the purpose is to be able to swap out databases, and I think you are correct in showing some suspicion about ORMs leading towards that goal.
However, I would still use an ORM, as it abstracts away the details of data access. Isn't this the goal of object oriented programming? Keep your concerns separated.
I think the primary use case for database abstraction (via ORM tools) is to be able to ship a product that works with multiple database brands. I believe it's a rarer occurrence for a company to switch between database vendors, but that's still one of the use cases.
I've worked jobs where we started out using MySQL for monetary reasons (think a startup) and, once we started making money, wanted to switch to Oracle. We didn't end up making the switch, but it was nice to have the option.
Still, ORM tools are not completely leak-free abstractions, and I know our migration would still have been painful and costly. It totally depends on what you are building, but it has been my experience that - for performance reasons, usually - you end up either working around your ORM solution or exploiting vendor-specific features at some point.
The only time I've seen a database switch was from HSQL during early development to Oracle as the project progressed. The ORM made this easy.
I often use the DAO pattern to swap out data services (from a database to web service or to swap a web service to a test stub).
For ORMs, I don't think the goal is to enable you to switch databases - it is to hide the complexities of different database implementations from you and remove the need to worry about the fine details of translating from relational to object representations of your data.
By having someone smart write an ORM that handles caching, only updates fields that have changed, groups updates, etc I don't need to. Although in the cases where I need something special I can still revert to SQL if I want.

Implementing functionality/code directly in database system

RDBMS packages today offer a tremendous amount of functionality beyond standard data storage and retrieval. SQL Server for example can send emails, expose web service methods, and execute CLR code amongst other capabilities. However, I have always tried to limit the amount of processing my database server does to just data storage and retrieval as much as possible, for the following reasons:
A database server is harder to scale than web servers
In a lot of projects I've worked on, the DB server is a lot busier than the web servers, and thus has less spare capacity
It potentially exposes your database server to a security attack (web services for example)
My question is, how do you decide how much functionality or code should be implemented directly on your database server versus other servers in your architecture? What recommendations do you have for people starting new projects?
I know Microsoft SQL Server and Oracle really push using stored procedures for everything, which helps to encapsulate the relational architecture and creates a more procedural interface for the software developers, who typically aren't as facile writing SQL queries.
But then half your application logic is written in PL/SQL (or T-SQL or whatever) and the other half is written in your application language, Java or PHP or C#, etc. The DBA is typically responsible for coding the procedures, and the developers are responsible for everything else. No one has visibility and access to the full application logic. This tends to slow down development, testing, and future revisions to the project.
Also software development tools tend to be poor for stored procedures. Tools and best practices for debugging, source control, and testing all seem to be about 10-15 years behind the state of the art for application languages.
So I tend to stay away from stored procedures and triggers if at all possible. Except in certain cases when a well-placed stored procedure can make a complex SQL operation happen entirely in the server instead of shuffling data back and forth. This can be very effective at eliminating performance bottlenecks.
It's possible to go too far in the other direction as well. People who prefer to have the application manage data versus metadata, and who employ designs like Entity-Attribute-Value or Polymorphic Associations, get themselves into trouble. Let the database manage that. Use referential integrity constraints (foreign keys). Use transactions.
The vendors have one set of best practices. You, however, voice concerns with that.
Years ago I supported a Major Software Product. Major.
They said "The database is relational storage. Nothing more." Every user conference people would ask about stored procedure, triggers, and all that malarkey.
Their architect was firm. As soon as you get away from plain-old-SQL, you've got a support and maintenance nightmare. They did object-relational mapping from the DB into their product, and everything else was in their product.
This scales well. Multiple application servers can easily share a single database server.

When is it time to change database backends?

Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Should I consider the number of hits per day, the number of rows of data, or other metrics when choosing?
My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.
You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.
If you think you are going to ever need one of the heavyweights (SqlServer, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.
I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.
But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.
Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.
In other words, the size of your database is far from the only consideration, and maybe not the most important one.
There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (I suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL or Postgres if you are looking for free options), is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think of WHAT you will do with the data rather than HOW you will be getting it in/out.
It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres due to the native geo-location and timezone extensions which existed because it allowed us to manage our data using polygonal datatypes. For us, we needed to do that, and we also wanted to use stored procedures, views and the like.
Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.
SQL Server's free edition, for a long time, had a 4 GB database limit (and the older MSDE 2000 was capped at 2 GB), but despite that limitation it remains a very stable platform for small to medium applications from which old data is purged.
Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the creme of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).
Other things to consider:
Language integration (ASP.NET session storage, role management, etc.)
Query types (SELECT, UPDATE, DELETE) [although this is more of a schema design issue than a DBMS issue]
Data storage requirements
Your application's utilization of the database is the most critical factor - mainly, which queries are used most often (SELECT, INSERT or UPDATE)?
Say you use SQLite: it is geared toward smaller applications, but for a "web" application you might want a bigger one like MySQL or SQL Server.
The way you write scripts and your web application platform also matter. If you're developing on a Microsoft platform, then SQL Server is a better alternative.
Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.
I almost never use flat files though.
There is more to choosing an RDBMS solution than just "back-end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction, is one reason.
Unless you are building a mega-transaction-rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.
That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:
SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).
If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.
Also: go back and check the SO podcasts. I believe they talked about this briefly.
This question depends on your situation really.
If you have control over the server you're deploying to and you can install whatever services you need, then the effort of installing a MySQL or MSSQL Express server and coding against an existing database framework, versus coding against a flat file structure, is not even worth considering.
What about Firebird? Where would that fit into this list?
And let's not forget the requirements that the "customer" of your solution must also have in place. If you're writing a commercial application for small companies, then Oracle might not be a good choice... but if you're writing a customized solution for a large enterprise which must share data among multiple campuses, and which has a good-sized IT department, then the decision of Oracle vs SQL Server would come down to what the customer most likely already has deployed.
Data migration nowadays isn't that bad, since we have those great tools from Embarcadero, so I would instead let the customer's needs drive the decision.
If you have the option, SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions and the database backup facilities are totally reliable. Wrapping up as much of your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance - indeed, there's a good argument to be made for always using procedures for insert/update logic, as these help protect you against injection attacks.
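A minimal T-SQL sketch of that pattern (procedure and table names are invented): callers pass parameters rather than concatenated SQL strings.

    CREATE PROCEDURE dbo.AddCustomer
        @Name  NVARCHAR(100),
        @Email NVARCHAR(255)
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.Customers (Name, Email)
        VALUES (@Name, @Email);
    END;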
If I have the choice, the only time I'd consider MySQL in preference is for a large, fairly simple database used predominantly for read access. This isn't to decry MySQL, which has improved markedly of late and which I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.
I think your list is subjective but I will play your game.
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
Teradata
