What do I need to know about databases?

In general, I think I do alright when it comes to coding in programming languages, but I think I'm missing something huge when it comes to databases.
I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
You see, like so many new programmers, I tend to treat my databases as a dumping ground for data. Most of what I do comes down to relatively simple SQL (INSERT this, SELECT that, DELETE this_other_thing), which is mostly independent of the engine I'm using (with minor exceptions, of course, mostly minor tweaks for syntax).
Could someone explain some common use cases for databases where the specific platform comes into play?
I'm sure things like stored procedures are a big one, but (a) these are mostly written in a specific language (T-SQL, etc.), which would be a different job-ad requirement than the specific RDBMS itself, and (b) I've heard from various sources that stored procedures are on their way out and that in a lot of cases they shouldn't be used now anyway. I believe Jeff Atwood is a member of this camp.
Thanks.
To respond to the answer below which says "the above concepts do not vary much for MySQL, SQL Server, Oracle, etc.": with this question, I'm mostly trying to determine the important differences between these. That is, why would a job ad demand n years of experience with MySQL when most common use cases are relatively stable across RDBMS platforms?
CRUD statements, joins, indexes... all of these are relatively straightforward within the confines of a certain engine, and the concepts are easily transferable if you know a different RDBMS.
What I'm looking for are the specifics that would cause an employer to demand a particular engine rather than "experience using common database engines."

I believe that the essential knowledge about databases should be:
What databases are for
Basic CRUD Operations
SELECT queries with JOINs
Normalization
Basic Indexing
Referential Integrity with Foreign Key Constraints
Basic Check Constraints
The above concepts do not vary much between MySQL, SQL Server, Oracle, Postgres, and other relational database systems. However, you'd find a different set of concepts for the now-popular NoSQL databases, such as CouchDB, MongoDB, SimpleDB, Cassandra, Bigtable, and many others.
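A minimal sketch tying several of these concepts together (hypothetical customers/orders tables; the syntax below is broadly portable ANSI SQL):

    -- Referential integrity: orders.customer_id must reference an existing customer.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      DECIMAL(10, 2) NOT NULL,
        -- A basic check constraint: a business rule enforced by the database itself.
        CONSTRAINT chk_amount_positive CHECK (amount > 0)
    );

    -- A SELECT with a JOIN: total order value per customer.
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name;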

After the CRUD statements, to be an effective DB programmer I think some of the most important things to understand are JOIN statements. Understand the difference between LEFT and RIGHT, OUTER and INNER joins, and know when to use each. Most importantly, know what the database is actually constructing when it performs a JOIN.
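For example (hypothetical customers/orders tables), the difference between INNER and LEFT shows up with customers who have no orders:

    -- INNER JOIN: only customers with at least one order appear in the result.
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id;

    -- LEFT (OUTER) JOIN: every customer appears; the order columns are NULL
    -- for customers with no matching orders.
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id;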
For me, the Wikipedia article was very helpful.
Also, indexing is very important - this is how relational databases can perform fast queries. Understand how to use them and what happens under the hood.
Wikipedia article on DB indexing.
You should also know how to construct a many-to-one relationship (using foreign keys) and a many-to-many relationship (using join tables).
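As a sketch, a many-to-many relationship between hypothetical students and courses is typically modeled with a join table holding two foreign keys:

    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       VARCHAR(100) NOT NULL
    );

    CREATE TABLE courses (
        course_id INTEGER PRIMARY KEY,
        title     VARCHAR(100) NOT NULL
    );

    -- The join table: each row links one student to one course, and the
    -- composite primary key prevents duplicate enrollments.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students (student_id),
        course_id  INTEGER NOT NULL REFERENCES courses (course_id),
        PRIMARY KEY (student_id, course_id)
    );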
I know that in your question you're asking about specific DB implementations, but if you're to be taken literally and you only know about SELECT, INSERT, UPDATE, and DELETE, then the above concepts will be far more valuable than learning the intricacies of a particular implementation.

It's not just stored procs and functions. Each database has fundamental differences and quirks that are important to understand even though SQL works more or less the same.
Examples:
Oracle and MySQL handle locking differently, in different situations.
Oracle doesn't have autoincrementing primary keys like MySQL and SQL Server do; you use sequences (and often triggers) instead.
Subtle vendor-specific behavior, like the way Oracle does sorting for VARCHARs differently depending on locale.
If you really want to improve your applications, you eventually have to become familiar with the details about how your specific database works. Most of the time it doesn't make a lot of difference, but when it does matter, it usually makes a big difference, especially when it comes to performance.

Some things which seem to come up when talking with my Database-keen colleagues:
Row vs. page vs. table locking escalation when doing multiple complex joins sometimes means doing very different things on different vendors' DBs. This is where theory really hits the tarmac, and it is often non-intuitive.
Differences between how cursors are best used on different vendor db implementations
Odd stuff in the stored proc language variants, like how best to handle failure cases
Differences in how temporary tables and views are best used depending on the underlying implementations.
All of these kinds of things don't really matter until you are trying to solve something that has to:
- Run very fast
- Contain lots and lots of data
- Get very big and complex (i.e. multiple queries hitting the same tables simultaneously)
These are the kinds of things that DBAs should be helping with, so it depends on whether you are aiming to be a DBA or a programmer. None of the above have really hurt me yet, because I've not worked on DB-intensive systems, but I've worked near a few, and the programmers on those end up knowing a lot about the internals, restrictions, and good features of the specific database they are using.
The best way to get knowledge like that (other than on the job) is to read the manuals or hang out with people who already know and ask them about it.

Don't forget relational schemas, primary and foreign keys, and how they are related. To start with databases, I would use MySQL and MSSQL, as these are the most common in the market. I consider Oracle a more advanced and complex DB.

As for the differences between vendors: SQL is a standard (http://en.wikipedia.org/wiki/SQL#Standardization) and vendors implement that standard differently.
Each of these vendors tries to offer extras to win the crowd to their side... that's why you see functions available in one and not the other. But sometimes a function makes its way into the standard, so it's not always a bad thing.
As for stored procedures, I would agree: ORMs and today's practices tend toward a greater separation of concerns, removing business logic from the database and treating it as "only" a repository.
My 2 cents

I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
I'm what's called a SQL Developer. You won't see the differences much when you are doing run-of-the-mill database work (CRUD). However, the differences become quite apparent when you are dealing with the database's own brand of SQL.
When talking SQL outside of the standards, there are four distinct categories of commands. These are:
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)
Transactional Control Language (TCL)
The biggest differences come in the last two, DCL and TCL. Those have a LOT of database-specific, non-standard SQL commands. The first two, DML and DDL, are very similar across any database that uses the relational model.
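To give a feel for the TCL differences, here is the same explicit transaction sketched in three dialects (hypothetical accounts table):

    -- SQL Server (T-SQL)
    BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT TRANSACTION;

    -- MySQL
    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;

    -- Oracle: a transaction begins implicitly with the first DML statement.
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;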
Also, the bigger database vendors have their own names for their SQL implementations. Here's a short sample:
SQL Server : T-SQL (Transact-SQL)
Oracle : PL/SQL
PostgreSQL : PL/pgSQL
Firebird : PSQL
MySQL : a stored-procedure language based on the SQL/PSM standard
The list goes on, but you get the point. Wikipedia has good articles on the different command acronyms.
I have found that most employers won't be able to articulate this, because most use non-technical managers and/or HR to do the hiring. They are basically told by the tech managers that new hires need to know X technology. On top of that, many are too lazy to hire for intelligence; instead they fall back on the "We have X, so darn it, we need to hire somebody who knows X!" meme. The differences are actually not that hard to learn, at least for the people who frequent StackOverflow. I'm confident that anybody here can learn them fairly fast.

Even something as simple as an auto-incrementing primary key can be very different in Oracle, MySQL, and SQL Server.
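A sketch of what I mean (hypothetical table; the Oracle version shows the classic sequence approach, since for a long time Oracle had no identity columns):

    -- MySQL
    CREATE TABLE users (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100)
    );

    -- SQL Server
    CREATE TABLE users (
        id   INT IDENTITY(1, 1) PRIMARY KEY,
        name VARCHAR(100)
    );

    -- Oracle: a sequence, with NEXTVAL supplied explicitly (or via a trigger).
    CREATE SEQUENCE users_seq;
    CREATE TABLE users (
        id   NUMBER PRIMARY KEY,
        name VARCHAR2(100)
    );
    INSERT INTO users (id, name) VALUES (users_seq.NEXTVAL, 'alice');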
Some other important differences:
SQL Server makes a distinction between the clustering key and the primary key; other databases do not. This choice comes with major performance implications.
SQL Server allows the SET @Total = Total = @Total + Amount syntax for fast computation of things like running totals. MySQL lets you use a user variable in a similar way (I think). In other databases you'd probably have to use a correlated subquery. Huge difference in performance.
SQL Server can generate "sequential GUIDs" with NEWSEQUENTIALID(). I'm not sure which other databases have this feature, but as with the above two points, there are significant performance implications to using a traditional GUID as opposed to a sequential or comb GUID.
Oracle's CONNECT BY is a very useful and pretty unique syntax. Common Table Expressions in SQL Server and MySQL are similar but not exactly the same.
Support for ranking/ordering functions varies vastly across different databases. I'm constantly posting answers here invoking ROW_NUMBER (see the sketch after this list). A lot of queries are much harder to write without it - but at the same time, abusing it can hurt performance.
XML support is all over the map. Most databases have reasonably good support for it now, but both syntax and semantics are completely different on every platform.
Date/time handling can be quite different. Oracle has several different date/time-related types, some including time zone information. In general, Oracle is way better than other databases at managing temporal data, and has several features that you will miss if you switch. Until recently, Microsoft didn't have separate DATE and TIME types, just DATETIME, which was much harder to normalize.
Spatial types are different and/or nonexistent in different databases. MySQL exposes an entire OpenGIS model; Microsoft's support is a bit more basic but still competent. Oracle has it, but it's a little hard to find information on, and it's some sort of optional add-on. I think DB2 is starting to get it, but support is still a little spotty.
MySQL actually lets you choose how an index is stored (i.e. BTREE or HASH). This is also an important performance consideration.
SQL Server allows you to INCLUDE columns in an index - very important for performance.
Oracle allows you to create function-based indexes, bitmap indexes, and so on. These can be pretty difficult to wrap your head around.
Oracle can perform "index skip scans" in very specific situations, something that I don't believe is supported in other databases (yet). This might factor into how you order index columns.
SQL Server has CLR types/functions/aggregates. Obviously not supported in any other database product.
Trigger support varies significantly. SQL Server has AFTER and INSTEAD OF. MySQL has BEFORE and AFTER. Oracle has all of those and more. These all behave quite differently.
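To make the ranking-functions point above concrete, here is a typical latest-row-per-group query using ROW_NUMBER (hypothetical orders table; this syntax works on SQL Server, Oracle, and PostgreSQL):

    -- Latest order per customer: rank each customer's orders by date,
    -- then keep only the first-ranked row.
    SELECT customer_id, order_id, order_date
    FROM (
        SELECT customer_id, order_id, order_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM orders
    ) ranked
    WHERE rn = 1;

On a database without ranking functions you would be stuck with a correlated subquery or a self-join, which is both harder to read and often slower.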
I'm sure that there are many, many more differences, but that should give you at least a basic idea of why 5 years of experience with Oracle is completely different from 5 years of experience with SQL Server.

That databases are encoded collections of assertions of fact.
That the logical structure of the tables corresponds to the syntactical structure of those "assertions of fact".
That Normalization theory helps you find the optimal logical structure of the database, by minimizing redundancy, i.e. minimizing the possibility for contradictions in said assertions of fact to occur.
That database constraints are really nothing else than business rules, expressed in a formal way and in terms of the components of the database.
That really every and any business rule can be expressed as a database constraint.
That therefore, it is possible for the DBMS to enforce any and every business rule you can imagine.
That there is a very important difference between logical design and physical design.
That SQL and SQL systems are, eurhm, not really helpful (and that's putting it mildly), in supporting developers to recognise this important distinction.
That SQL and SQL systems are, eurhm, significantly deficient (and that's putting it mildly), in their support for database constraints.
That these latter two examples are a very good illustration of the importance of the difference between a model (Codd's RM) and its implementation (some particular SQL system). As far as relational database technology is concerned, the latter deviates ever more preposterously from the former.
And whatever else I forgot to remember.

Related

MS SQL Server: SQL queries compared to general programming

I'm just starting out with MS SQL Server (2008 Express) and spent the last hour or so trying out some basic queries. How does SQL scripting (correct term?) compare to general computer programming? Is writing SQL queries as involved, as lengthy to learn, and in need of as much practice as software development? Is writing advanced SQL queries comparable to software development, at least to some degree, and if so, could you explain how?
Also, I found a couple of tutorial and reference sites for learning SQL, but could you also recommend some other sites?
Also (2), I was playing around with the SQL query designer in MS SQL and it seems like a good tool for learning to write SQL commands, but is there a way to use the designer to INSERT data? It seems that it's only for SELECTing data.
Any other general comments for learning and better understanding SQL would be appreciated. Thanks.
First of all, SQL is more about databases and less about programming, in the sense that you cannot succeed just by "writing good queries": you must also know how to structure your data, make optimized tables, choose appropriate types, etc. You can spend a day thinking about how your data will be stored without really writing any queries. SQL is not a way to solve an abstract problem, but a way to store and retrieve data efficiently and safely. For example, making maintenance and backup plans is purely a DBA job, and has nothing to do with SQL queries.
Is it lengthy to learn? Well, here it is quite similar to general development. Basic SQL syntax is pretty simple, so after reading the first page of an SQL book, you will probably be able to insert, retrieve and remove data. But to master SQL and database stuff, you must be ready to spend years and years of practice. Just like CSS: writing CSS is easy. Mastering it is hard.
Some advice:
Take security into account.
You communicate with SQL Server by sending strings, and the server must interpret them. The big mistake is to let the end user participate in building your queries: it leads to security holes, giving an attacker the ability to do whatever he wants with your data (it's called SQL Injection). It's just like letting everyone write any source code they want and execute it on your machine. In practice, nobody lets a third party write arbitrary code on their machine, but plenty of developers forget to sanitize user input before querying the database.
Use parametrized queries and stored procedures.
You may want to consider as soon as possible using parametrized queries. It will increase security, optimize performance and force you somehow to write better queries (even if it is debatable).
After learning SQL for a few weeks or months, you may also want to learn what stored procedures are and how to use them. They have their strong points, but don't make the error I made when I started learning SQL: do not decide to use stored procedures everywhere.
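You can see the idea even at the pure SQL level. In T-SQL, for instance, sp_executesql binds values as parameters rather than splicing them into the statement text (hypothetical users table):

    -- The value of @name is passed as data, never concatenated into the SQL,
    -- so input like '; DROP TABLE users; -- cannot change the statement.
    EXEC sp_executesql
        N'SELECT user_id, user_name FROM users WHERE user_name = @name',
        N'@name NVARCHAR(50)',
        @name = N'alice';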
Use frameworks.
If you are a .NET developer, learn to use Linq-to-SQL. If you already used Linq on .NET objects (lists, collections, etc.), it can be very helpful to figure out how to do some queries. By the way, remember you can use Linq queries and see how Linq transforms them into SQL queries.
Keep in mind that using a framework or an abstraction layer will not make you a database guru. Use it as a helpful tool, but sometimes do SQL stuff yourself. Yes, it can free you from writing SQL queries by hand, even on large-scale projects (for example, StackOverflow uses Linq-to-SQL). But sooner or later, you will either need to work on a project which does not use Linq, or you will run into some of the limitations of Linq versus plain SQL.
Borrow a book.
Seriously, if you want to learn stuff, buy or borrow a book. Tutorials will explain how to do a specific thing, but you will lose the opportunity to learn something you never thought about. For example, database partitioning or mirroring is something you must know if you want to work as a DBA. Any book about databases will talk about partitioning; on the other hand, there are few tutorials which will lead you to this subject by themselves.
Test, evaluate, profile.
SQL is about optimized queries. Anybody can write a SELECT statement, but many people will write it in a non-optimized form.
If you are dealing with a few-kilobyte database holding at most a hundred records, all your queries will perform well; but when things scale up, you will notice that a simple SELECT takes three seconds on a database with a few billion rows instead of a few milliseconds.
To learn how to write optimized queries and create optimized databases, try to work on large sets. The AdventureWorks demo database from Microsoft is a good starting point, but you may also sometimes need to fill the database with random stuff just to have enough data to measure performance correctly.
Use Microsoft SQL Profiler (included in SQL Server 2008 Enterprise). It will help you to know what the server is really doing and how fast, and to find bottlenecks and poorly-written queries.
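As a simple starting exercise, compare a query's plan before and after adding an index (EXPLAIN as in MySQL; SQL Server shows the same information through execution plans; hypothetical orders table):

    -- Without a supporting index, this query is a full table scan.
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    CREATE INDEX idx_orders_customer ON orders (customer_id);

    -- With the index in place, the same EXPLAIN should report an index lookup.
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;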
Learn from others.
Reading a book is a good point to start, but is not enough. Read the stuff on StackOverflow, especially the questions related to developers doing DBA work. For example, see the Database Development Mistakes Made by App Developers question, and come back to the answers from time to time while learning SQL.
If you have any precise question (a query which does not produce what you expected, a strange performance issue, etc.), feel free to ask it on StackOverflow. The community is great, and there are plenty of people who know the stuff extremely well.
Sometimes, talking to the DBA in your company (if there is one) can also be an opportunity to learn things.
is there a way to use the designer to INSERT data? It seems that it's only for SELECTing data
If I remember correctly, the query designer in Visual Studio lets you build INSERT statements too. But maybe I'm wrong. In any case, you can use Microsoft SQL Server Management Studio (included with Microsoft SQL 2008 Enterprise), which lets you see how to build some cool queries (right-click an element in Object Explorer, then use the "Script database as..." menu).
I think you'll find that the key issue is that SQL is declarative, unlike most computing languages you're likely familiar with. This is fundamental. Grab any computer science textbook and start there.
SQL is no more or less difficult than anything else in my view. Historically it was an area which people would tend to specialize in, but that was a consequence of the technology available at the time. It's now more accessible and the tools are significantly better, so expertise is generally spread more widely now.
It is different: SQL programming is quite restricted. When writing complex logic you might find it cumbersome, with its limited programming options, unclear code (as there is no modular programming), and bad implementations of things like cursors.
I read somewhere on SO that the database is not for coding; it's only for storing and querying data. That's well said, in some sense.
What I believe is important to learn in this area is, first, knowing all the features available in the DB so that you can use it efficiently, and second, improving your querying/analytical skills.
Basic SQL features can be learnt from w3schools (joins, grouping, etc.).
Advanced DB features can be learnt from your DBMS certification exam book (the most basic certification exam, be it Oracle or SQL Server).
Analytical skills and some fun - puzzles by Joe Celko

What should I keep in mind if I wish to merge many DBs into one DB?

I am working with a half dozen DBs. The DBs all have the same schemas, the same SPs, etc. Speaking to the person who originally designed the DBs, a big part of the motivation for using many DBs was efficiency; the alternative would be to add a column to pretty much every table and SP in the database indicating which set of data was being worked in, resulting in one giant (and thus slower) DB instead of several small DBs. In place of having a column to indicate which set of data is being queried, the connection string is used to select which database is being hit.
The only reason I really dislike this organization is that it involves a lot of code duplication and thus hurts maintenance. For example, every time I wish to change a stored procedure, I need to run the alter statement on every database.
One solution I have considered is to combine all of the data into one big database, adding an extra column all over the place to indicate which database the data would be in if I had not combined it. Then, I could partition all of the tables by this column's value. In theory, the result of all of this is that the underlying representation of all of the data itself will be morally the same as it is now, but without the redundancies in the indexes, schemas, SPs, etc.
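To make the idea concrete, what I have in mind looks roughly like this (SQL Server syntax; all names are made up):

    -- Partition function/scheme keyed on the new "source database" column.
    CREATE PARTITION FUNCTION pf_source (INT)
        AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5);

    CREATE PARTITION SCHEME ps_source
        AS PARTITION pf_source ALL TO ([PRIMARY]);

    CREATE TABLE orders (
        source_db INT   NOT NULL,  -- which of the original DBs this row came from
        order_id  INT   NOT NULL,
        amount    MONEY NOT NULL,
        PRIMARY KEY (source_db, order_id)
    ) ON ps_source (source_db);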
My questions are these:
Is this a good idea? Is there a better way to accomplish this?
Are there any gotchas in doing this?
Will this have any impact on performance?
Everyone will deal with this at some point. My own personal opinion is that multiple databases are a pain in the backside and are not faster. They are a pain because of the maintenance headaches. Adding an extra column in each table as necessary will not slow your process down that much, if indexing is set properly. And your maintenance will be much easier. Plus, doing transactions across multiple DBs can be a hassle and involve MSDTC.
BTW, using a single database is often called a multi-tenant database. You might want to research this a bit. But I would avoid multiple DB's like this if possible.
I'm of a different mind than Randy.
The multi-tenant model has its advantages.
For one, maintenance is not really much different whether you have 5 databases or 500. At some point you stop looking at maintenance of individual databases and look at the set. Yes, you must serialize backups, and you can't perform index reorg/rebuild across all databases at once.
But for code changes across multiple more-or-less identical databases, there are easy ways to script a lot of things to be done to multiple databases without really lifting an extra finger. I use a tool called SQLFarms Combine (now sold by JNetDirect), but there are other offerings such as RedGate MultiScript that I haven't played with.
What I like most about the multi-tenant model is that when you grow and scale and suddenly need a new database server, it is very easy to move one of the tenants (say, the busiest or fastest growing) to the new server. If everybody is jammed into the same database, this extraction of only their data becomes quite difficult, especially if there is to be minimized downtime. In the multi-tenant model, you can set up mirroring for just their database, and then switch the primary when you're ready.
I'd be in favor of combining these databases. There are other facilities built into SQL Server to account for the potential performance downfalls of a very large database, like additional indexing on a second physical disk, partitioning, clustering, etc. The headache and overhead involved in deploying schema updates to that many different databases can be time consuming when it's easily handled in a single database. I think SQL Server scales really well in cases like this - let the database server do what it's designed to do and provide responsive access to your data. You can focus on application design and leave the storage model to SQL Server.
Also, though this isn't mentioned above, I'd suspect that there's some level of dynamic SQL involved in the applications that use this "many database" model because you've got to switch between databases based on something you know, so it can't be hard coded into the application or in a configuration file, meaning that either connection strings or actual SQL statements have to be generated on the fly, and that can be a really big security risk (read about "SQL Injection" if you're unfamiliar with the potential risks of dynamic SQL).

What database to use for big data storage and manipulation?

I have to make a decision about which database server to use for my next project, but the simple decision to use MySQL like almost all the projects I did is harder now, because I expect a very large number of records.
The database will store a user list, some other irrelevant tables, and the last one, some user-collected data. Let's say I have 6000 users responding to a quiz about each other. Simple math shows that if each one completes the quiz about everyone else (and in my project it is 99% sure that will happen), I'll end up with 35.99 million records (users exclude themselves, so in this particular situation the operation is 6000 * 5999). Unfortunately 6000 may be a small number, the real one growing day by day.
What to choose? MySQL and maybe if things go well and the project grows to expand it in a cluster? PostgreSQL, MSSQL? Oracle?
I've read about all of them, and each one has its pros and cons, but I still don't know what to choose. The advantage of MySQL and PostgreSQL is, of course, the starting price of $0, which is pretty nice in a usual self-funded startup.
Any opinions, pieces of advice? If you encountered this situation in your experience as developers, I'd love to hear from you.
These days, free isn't something that differentiates between databases any more. Both Oracle and SQL Server have free versions, but the limitations are on resources - 4 GB database size, RAM, and single-CPU utilization. Millions of records is not a concern - it's what datatypes you're using.
I saw the OP's comment about not liking MS software - that's your prerogative, but the free versions of either Oracle or SQL Server do benefit from a seamless transition to the upscale versions of the respective database.
Personally, my choice would be either Oracle or SQL Server because of, IMHO, real feature considerations like hierarchical query support, subquery factoring/CTEs, packages (long before I get concerned with functions/procedures), full-text searching, XML support, etc.
MySQL will handle 35 million records no problem. Worry about scalability when you get there. You can easily add RAID hard disks backing your database tables, and if you really start getting big you can get a Compellent SAN that will scream... Don't worry about the DB engine as much as the underlying hardware... MySQL rocks for us with millions of records.
I've had no problems handling tables as large as 36,000,000 rows on MySQL and Oracle.
Just be sure that you index the proper columns, run EXPLAINs for your queries, and maintain proper design principles.
Most of the truly large scale web properties use a distributed key-value store. That said, 35 million is large, but not that large. With most modern databases, your main two scaling worries should be throughput and what happens when no single box can contain your entire database anymore. And both of these problems can be solved to some degree for any database you choose to use. (Caching, replication, sharding, etc.)
Use MySQL until you can't anymore. At that point, you ought to be rolling in dough anyways and you now have a very desirable problem.
Use MySQL as it's free and you have experience with it.
Besides, in my opinion it matters more how you design the tables than which database you use.
35 million records can be easily handled by MS SQL Server (assuming proper database design, indices, etc.). You can start with the free SQL Server Express edition and later, if you need, you can upgrade to the full version which supports clustering, etc.
SQL Server Express does have some limitations - single CPU, 1 GB memory, max 4 GB database size and a few other things. I'm not sure how quickly these limitations will become a problem but you can always move to the full version when you run into them.
MySQL & PostgreSQL
$0 cost
Large community
Many tutorials
Well documented
MSSQL
You can get "money" from MS if you promote that you are using MSSQL (secret information from some companies I worked for)
MS tools work very well
Complete tool set, from the C# IDE through the .NET libraries to Windows Server 2003
Oracle
Professional and commercial provider
Used by many large companies (I also heard about Blizzard (World of Warcraft) using Oracle)
- expensive
The final decision depends on the specific requirements of your project.
Make yourself a quick list of things that ARE IMPORTANT for your project (e.g. quickly performed queries) and look up which database's pros best match your requirements.
Everything is about design. SQL databases are like cars: you just have to know which component has to be placed here and which there.
Make a clear design and you won't struggle with any of them.
Maybe you can test Firebird.
There's a blog post about a big Firebird database here.
The MySQL license is here (not always free).
PostgreSQL and Firebird are free.
First of all, don't think about performance. Premature optimization being the root of all evil and all that. You can always throw more hardware and/or tuning at it later.
All of the mentioned should perform nicely if tuned/maintained correctly. I'd focus on manageability and familiarity. IMHO open source databases excel at manageability (perhaps not the best GUIs, but the CLI has been my home for a long long time).
And if the database becomes the bottleneck, why limit yourself to those choices? How about a key-value distributed database? Or perhaps serialize data directly to disk? Storing data outside of a RDBMS, while often frowned upon, might be the correct path. Or simply use the common route of denormalization.
Always remember not to optimize prematurely.
As far as opinions go (since you specifically asked for it) I favor open source databases, specifically PostgreSQL. It's rock solid, fast and very well-featured. And even with (relatively) large datasets it has performed superbly on mediocre hardware (some tuning involved, of course, but you can't skip that step no matter which db you end up choosing).

Swapping out databases?

It seems like the goal of a lot of ORM tools and custom data access layers (DAO pattern, etc.) is to abstract the database to the point where you could supposedly swap out the entire database system with minimal work.
Following the common DAL patterns is usually a good idea in code, but it seems like it would never be minimal work to swap out a database. (Cost, training, data migration, etc.)
Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code? Is it worth it to worry about abstracting the actual database from your code?
Question 1: Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code?
Yes, we tried it. Our customer is using a large MS Access-based Delphi client/server application. After about five years we considered switching to SQL Server. We analyzed the problem and concluded that swapping the database would be very costly and provide only a few advantages. The customer decided not to swap the database. The application is still running fine and the customer is still happy.
Note that:
MS Access is only being used for data storage and report generation.
The server application ensures that MS Access is only being accessed on the server. Normal multi-user MS Access applications will transfer large chunks of the Access database over the network - resulting in slow and unreliable database functionality. This is not the case for this application: Client <> Server <> MS Access. Only the server application communicates with the MS Access database. Actually, the server has exclusive access to the MS Access database; no other computer can open the MS Access database. Conclusion: MS Access is being used as a true RDBMS, Relational DataBase Management System - please no flaming about MS Access being inferior and unstable - it has been running fine for more than 10 years.
The most important issues you will have to consider:
SQL statements: check that your SELECT, UPDATE, DELETE, INSERT, and CREATE TABLE statements would be compatible with the target SQL database. It's amazing how much the RDBMSs differ in the details (date formats, number formats, search formats, string formats, join syntax, create table syntax, stored procedures, user-defined functions, (auto) primary keys, etc.)
Report generation: Depending on your database you might be using a different reporting tool. Our customer has over 200 complex reports. Converting all these reports is very time consuming.
Performance: all RDBMSs have different performance in different environments. Normally, performance optimizations are very much RDBMS dependent.
Costs: the costs of tools, developers, servers, and user licenses vary greatly. They range from free to very expensive. Free does not mean cheap and expensive does not always equate to good. A cost/value comparison will have to be made.
Experience: making the best use of your RDBMS requires experience. If you have to develop for an "unknown" RDBMS your productivity will suffer.
Question 2: Is it worth it to worry about abstracting the actual database from your code?
Yes. In an ideal world, swapping a database would just be adjusting the data connection string. In the real world this is not possible because all databases are different. They all have tables and SQL support but the differences are in the details. If you can keep the differences of the databases shielded through abstraction - please do so. Make a list of the databases you need to support. Check the selected database systems for the differences. Provide centralized code to handle the differences. Support one RDBMS and provide stubs for future support of other RDBMS.
I disagree that the purpose is to be able to swap out databases, and I think you are correct in showing some suspicion about ORMs leading towards that goal.
However, I would still use an ORM, as it abstracts away the details of data access. Isn't this the goal of object oriented programming? Keep your concerns separated.
I think the primary use case for database abstraction (via ORM tools) is to be able to ship a product that works with multiple database brands. I believe it's a rarer occurrence for a company to switch between database vendors, but that's still one of the use cases.
I've worked jobs where we started out using MySQL for monetary reasons (think a startup) and, once we started making money, wanted to switch to Oracle. We didn't end up making the switch, but it was nice to have the option.
Still, ORM tools are not completely leak-free abstractions, and I know our migration still would have been painful and costly. It totally depends on what you are building, but it has been my experience that - for performance reasons, usually - you end up either working around your ORM solution or exploiting vendor-specific features at some point.
The only time I've seen a database switch was from HSQL during early development to Oracle as the project progressed. The ORM made this easy.
I often use the DAO pattern to swap out data services (from a database to web service or to swap a web service to a test stub).
For ORM, I don't think the goal is to enable you to switch databases - it is to hide the complexities of different database implementations from you and remove the need to worry about the fine details of translating from relational to object representations of your data.
By having someone smart write an ORM that handles caching, only updates fields that have changed, groups updates, etc., I don't need to. Although in the cases where I need something special, I can still revert to SQL if I want.

How much compatibility do the DB engines have at the SQL level?

Let's say I wanted to have an application that could easily switch the DB at the back-end.
I'm mostly thinking of SQL Server as the primary back-end, but with the flexibility to go to another DB engine. Firebird and PostgreSQL seem to have (from my brief Wikipedia excursion) the most in common with SQL Server (plus they are free).
How similar would the DB setup, access, queries, etc. be for Firebird, PostgreSQL and MS SQL Server?
Unfortunately, SQL varies widely across providers. It's almost impossible to write all but the most trivial SQL to run on a number of RDBMS - and then you're into lowest common denominator territory. Far better to use an abstraction layer to handle at least the connection to the database (inc. access, sending queries), and either an ORM to handle the SQL itself or per-provider SQL.
If you want to look into how they vary - good examples are auto-incrementing ids and obtaining the ID of the last inserted record.
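For instance, retrieving the generated key after an INSERT is different on each platform (hypothetical users table with an auto-generated id):

    -- MySQL
    INSERT INTO users (name) VALUES ('alice');
    SELECT LAST_INSERT_ID();

    -- SQL Server
    INSERT INTO users (name) VALUES ('alice');
    SELECT SCOPE_IDENTITY();

    -- PostgreSQL
    INSERT INTO users (name) VALUES ('alice') RETURNING id;

    -- Oracle (returning into a host/bind variable)
    INSERT INTO users (name) VALUES ('alice') RETURNING id INTO :new_id;

An abstraction layer typically hides exactly this kind of difference behind a single "get generated key" call.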
I worked on one project where it was an absolute requirement to support many databases, including at least Access, SQL Server and Oracle.
So I know that it can be done. Mostly DML (SELECT,UPDATE,INSERT...) is the same and certainly we didn't have huge problems making it work across all of the databases - just occasional annoyances. MySQL was the exception at that time as it simply wasn't capable enough.
We found most differences in the DDL, but with the right architecture (which we had), it wasn't difficult to fix this.
The only thing that caused us a problem was generating unique IDs - autoincrement is non-standard. Fortunately, in a database of around 40 tables there were only a few places where unique IDs were needed (good DB design). In the end we generated the unique IDs in code and handled any clashes (everything in transactions).
It did make things easier that we had avoided using autoincrement for ID fields; it's harder to think of unique keys, but better in the long run.
Well, CRUD stuff should be the same everywhere, but if you build anything complex, you'll probably want to use triggers and stored procedures and that's where compatibility becomes low. Writing a DBMS-agnostic application usually means moving most of the business logic outside of database, so having a 3-tier application is, IMHO, a must in such case.
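For example, the "same" row-touching trigger has to be written quite differently per vendor (hypothetical accounts table; MySQL vs. T-SQL):

    -- MySQL: a row-level BEFORE trigger; NEW refers to the incoming row.
    CREATE TRIGGER trg_touch BEFORE UPDATE ON accounts
    FOR EACH ROW SET NEW.updated_at = NOW();

    -- SQL Server: a statement-level AFTER trigger; the changed rows come
    -- from the "inserted" pseudo-table.
    CREATE TRIGGER trg_touch ON accounts AFTER UPDATE AS
    BEGIN
        UPDATE a SET updated_at = GETDATE()
        FROM accounts AS a
        JOIN inserted AS i ON i.account_id = a.account_id;
    END;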
Alternatively, you could use some wrapper library that works like abstraction layer, but I'm yet to see one that is able to do the job correctly over a range of DBMS-es. Of course, that is also dependent on the programming language you use.
As pointed out by the other answers, DBMS vary wildly once you go beyond basic SELECT/INSERT stuff (and sometimes even there).
We also have to maintain compatibility across several DBMSs. In my opinion the best approach is usually to use some kind of compatibility layer. We have an in-house DB abstraction library, but there are several tools available.
In particular, it might pay to look at popular ORMs (Hibernate, nHibernate etc.). They usually offer DB-independence as a kind of side effect. At least Hibernate also has a special query language that will automatically be translated to SQL for the DBMS you are using.
