Better languages than SQL for stored procedures [closed] - database

I'm getting increasingly frustrated with the limitations and verbosity required to actually commit some business logic to stored procedures, using languages such as Transact-SQL or PL/SQL. I would love to convert some current databases to Oracle and take advantage of its support for Java stored procedures, but that option is not available at the moment.
What alternatives would you recommend in the way of databases that support stored procedures in other languages?

There are some architectural obstacles to having more clever query languages in a database manager. The principal one is the query optimiser. One of the design constraints on SQL is that it can only use constructs that are accessible to the query optimiser. This means that the language and its capabilities are quite tightly coupled to the capabilities of the query execution engine and query plan optimiser.
The other major design constraint is the mechanical nature of the database system - database programming is almost unique in that it has a mechanical component. Query performance is limited by the mechanical constraints of disk head seeks and rotational latency (the wait time before the data you want arrives under the heads).
This effectively precludes many clever abstractions that might make SQL more powerful or easier to work with. Many database management systems supplement SQL with procedural alternatives that can be used for scripting. However, they interact with the DBMS by
executing a series of SQL queries that are processed by the optimiser individually. Some languages of this type that are shipped with various DBMS platforms are:
Oracle's PL/SQL and embedded Java. PL/SQL is actually based on Ada - it is quite 'old school' by modern standards and has a legacy code base with which it must remain backwardly compatible. It's not necessarily the most pleasant programming environment, but it does have constructs for facilities such as parallelism and a reasonably flexible type system. One of the major criticisms of Java stored procedures on Oracle is that you are paying for Oracle's capacity-based licensing on the CPUs you are running the JVMs on.
SQL Server CLR integration. Somewhat similar to Oracle's Java stored procedures, this allows CLR modules compiled from C# (or any .NET language) to be loaded into a SQL Server instance and executed in much the same way as stored procedures. SQL Server also has PostgreSQL-style APIs for making custom aggregate functions through CLR integration, and other hooks for mixed SQL/CLR code bases.
PostgreSQL is actually the system where back-end language integration was originally developed. The system exports a native C API with facilities for custom aggregate functions, storage engines, procedural extensions and other functionality. The language interfaces are built on this API and include PL/pgSQL (a bespoke language similar to PL/SQL), Python, Perl and Tcl. This work made it into the mainstream through Illustra, a commercialised version of Postgres, which was then bought out by Informix (which was subsequently bought out by IBM). The key features were incorporated into Informix On-Line, which is still sold by IBM.
One key limitation of these languages is their limited interaction with the query optimiser (although the C API for PostgreSQL does have support for this). Participating in a query plan as a first-class citizen requires that the query optimiser can work out a sensible view of the resources your action will take. In practice, this type of interaction with the query optimiser is mainly useful for implementing storage engines.
This level of digging into the storage engine is (a) somewhat esoteric if the functionality is available at all (so most people won't have the skill to do it) and (b) probably considerably more trouble than just writing the query in SQL. The limitations of the query optimiser mean that you will probably never get the level of abstraction out of SQL that you might get from (say) Python or even C# or Java.
The path of least resistance for efficient queries is likely to be
writing the query in SQL with some procedural glue in one of the other languages. In some cases a computation really does lend itself to a procedural approach.
This can become a hassle and lead to large bodies of boilerplate SQL code. The only real options for this are hand coded SQL or code generation systems. A trivial example of code generation is the CRUD functionality provided by frameworks where this SQL is generated from metadata. A more complex example can be seen in ETL tools such as Oracle Warehouse Builder or Wherescape Red which work by generating great screeds of stored procedure code from the model.
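As a trivial, hedged illustration of metadata-driven code generation (a sketch only - whatever tables it finds come from the database you run it against), the following T-SQL emits a SELECT statement for every base table in a SQL Server database:
-- Generates one SELECT statement per base table from the catalog metadata
SELECT 'SELECT * FROM ' + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) + ';'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
Real code generators work from richer metadata (columns, keys, relationships), but the principle is the same: the SQL text is produced by a program rather than typed by hand.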
I find myself building code generation systems of one sort or another on a semi-regular basis for precisely this reason. Any templating system will do for this - I've had fairly good mileage from CherryTemplate but there are many such items around. Code Generation in Action is quite a good book on this subject - the author uses a ruby-based system whose name escapes me.
Edit: If you look at a 'Show Estimated Execution Plan' for a block of procedural code you will notice that each statement has its own query plan. The query optimisation algorithm can only work on a single SQL statement, so a procedure will have a forest of query plans. Because procedural code can have 'side-effects' you cannot use the type of algorithms used in query optimisation to reason about the code. This means that a query optimiser cannot globally optimise a block of procedural code. It can only optimise individual SQL statements.
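As a rough illustration (the table and column names here are made up), consider a small T-SQL batch. Requesting the estimated execution plan gives one plan per statement, and the optimiser cannot reason across them:
DECLARE @cutoff date = DATEADD(YEAR, -1, GETDATE());
-- Plan 1: aggregate recent orders into a temp table
SELECT CustomerID, SUM(Amount) AS Total
INTO #recent
FROM dbo.Orders
WHERE OrderDate >= @cutoff
GROUP BY CustomerID;
-- Plan 2: flag active customers using the intermediate result
UPDATE c SET c.IsActive = 1
FROM dbo.Customers AS c
JOIN #recent AS r ON r.CustomerID = c.CustomerID;
A single equivalent query joining Orders to Customers could be optimised as one unit; the two-step procedural version cannot.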

PostgreSQL has support for stored procedures in many scripting languages. Officially Perl, Python, and Tcl; as add-ons, PHP, Ruby, Java and probably many others (just Google for pl<languagename>), which may or may not be in working condition as of now.
Oh, and also SQL Server 2005 onwards has support for CLR stored procedures, where you can use .NET languages.

Oracle, HSQLDB and Derby allow you to write stored procedures in Java.

Oracle does support CLR stored procedures so you can write stored procs in any .NET language like C#, VB.NET or IronPython. This only works when the database server runs on a Windows machine. You can't do it when the database runs on Linux or Unix.

DB2 for z/OS is the database that supports the most languages, as far as I know. It supports COBOL, C/C++ and Java for stored procedures, and of course it also supports SQL procedures.

There is also some support for writing Oracle stored procedures in Perl.

Because Oracle has a built-in JVM you can develop stored procs in Java, but also in non-Java languages that run on the JVM, such as Jacl, Jython, Scheme and Groovy. See here: http://db360.blogspot.com/2006/08/oracle-database-programming-using-java_01.html and http://en.wikipedia.org/wiki/List_of_JVM_languages .

Related

What is the purpose and risks of enabling SQL CLR?

We are looking to implement unit tests using the tSQLt test framework. It has a prerequisite that SQL CLR must be enabled using this command:
EXEC sp_configure 'clr enabled', 1; RECONFIGURE;
I am curious to know: what is the purpose of SQL CLR, and what are the risks of enabling it in a production environment?
PURPOSE
SQLCLR allows one to do things that either:
can't be done in T-SQL, or
can't be done as efficiently in T-SQL
There are plenty of things that can be done in both, and that T-SQL is actually much better at. In those cases it is an inappropriate use of SQLCLR, so it is best to research first to make sure that the operation cannot be done in T-SQL, or would definitely be slower there.
For an example of the performance aspect: T-SQL scalar UDFs prevent parallel execution plans, but SQLCLR scalar UDFs, as long as they do no data access and are marked IsDeterministic=true, do not prevent parallel execution plans.
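As an illustrative sketch only (the assembly path, namespace, class and method names below are hypothetical), a CLR scalar UDF is exposed to T-SQL roughly like this:
-- Load the compiled .NET assembly into the database
CREATE ASSEMBLY MyClrLib
FROM 'C:\clr\MyClrLib.dll'
WITH PERMISSION_SET = SAFE;
GO
-- Map a static .NET method to a T-SQL scalar function
CREATE FUNCTION dbo.Initials (@name NVARCHAR(200))
RETURNS NVARCHAR(20)
AS EXTERNAL NAME MyClrLib.[MyNamespace.StringFunctions].Initials;
Once registered, the function is called from T-SQL like any other scalar UDF.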
For more details on what SQLCLR is and is not, please see the first article in the Stairway to SQLCLR series that I am writing for SQL Server Central:
Stairway to SQLCLR Level 1: What is SQLCLR?
Or, to get a sense of what can be done in SQLCLR, please see my SQL# project, which is a library of over 320 stored procedures and functions, many of which are in the Free version, and many of which work in SAFE mode: SQLsharp.com.
RISKS
The risks vary based on the PERMISSION_SET (i.e. SAFE, EXTERNAL_ACCESS, or UNSAFE) that the Assembly is marked as, and what is being done. It is possible to do things in an UNSAFE Assembly that cannot be done in regular T-SQL (except that many of those dangerous things can already be done via some extended stored procedures, xp_cmdshell, and the OLE Automation procedures -- sp_OA* ). An Assembly marked as SAFE cannot reach outside of the database, so it is generally quite safe, BUT you can still lock up the system via a Regular Expression that triggers "catastrophic backtracking" (of course, this can be mitigated starting in .NET Framework 4.5 - so SQL Server 2012 and newer - by setting a max time limit on the RegEx operation). An Assembly marked as UNSAFE can write to static variables, which, in the context of the shared App Domain model used by SQLCLR, allows for shared memory between sessions. This can allow for caching but, when not used properly, easily leads to race conditions.
TESTING
As for tSQLt, I do not believe that you are required to use the SQLCLR component. I thought I saw that it just enabled some extended functionality. Either way, the source code is available on GitHub so you can check it out to see what it is doing. It has been a while since I looked at it, but from what I remember, it should not present much of a risk for the little that it is doing (especially in a Dev / QA environment).
Another option that doesn't use SQLCLR is DbFit. I have always preferred DbFit as it is completely external to the DB. It is based on the FitNesse framework, written in Java, and you manage the tests via wiki-style pages. By default it wraps the tests in a transaction and rolls everything back when the test is finished (i.e. clean-up). It is worth taking a look at.
Download: DbFit project on GitHub
Tutorial: Using the DbFit Framework for Data Warehouse Regression Testing
SQLCLR allows you to create .NET assemblies and run code inside them from within SQL Server.
Depending on the permissions on the assembly, the risks vary. They are roughly as follows:
Permission Set: Risk
SAFE: You cannot do anything more than what you can in T-SQL, so it is fairly safe.
EXTERNAL_ACCESS: You can call code in .NET assemblies approved by Microsoft, such as ADO.NET. Fairly safe, but still a risk.
UNSAFE: You can do almost anything that the .NET Framework allows you to do. In reality, you can shoot yourself in the head unless you know what you are doing.

MS SQL Server: SQL queries compared to general programming

I'm just starting out with MS SQL Server (2008 Express) and have spent the last hour or so trying out some basic queries. How does SQL scripting (correct term?) compare to general computer programming? Is writing SQL queries as involved, as lengthy to learn, and in need of as much practice as software development? Is writing advanced SQL queries comparable to software development, at least to some degree, and if so could you explain that?
Also, I found a couple of tutorial and reference sites for learning SQL, but could you also recommend some other sites?
Also (2), I was playing around with the SQL query designer in MS SQL and it seems like a good tool for learning to write SQL commands, but is there a way to use the designer to INSERT data? It seems that it's only for SELECTing data.
Any other general comments for learning, better understanding, etc. SQL would be appreciated. Thanks.
First of all, SQL is more about databases and less about programming, in the sense that you cannot succeed just by "writing good queries": you must also know how to structure your data, design optimized tables, choose appropriate types, etc. You can spend a day thinking about how your data will be stored without really writing any queries. SQL is not a way to solve an abstract problem, but a way to store and retrieve data efficiently and safely. For example, making maintenance and backup plans is purely a DBA job, and has nothing to do with SQL queries.
Is it lengthy to learn? Well, here, it is quite similar to general development. Basic SQL syntax is pretty simple, so after reading the first page of SQL book, you will probably be able to insert, retrieve and remove data. But to master SQL and database stuff, you must be ready to spend years and years of practice. Just like CSS: writing CSS is easy. Mastering it is hard.
Some advice:
Take security into account.
You communicate with SQL Server by sending strings, and the server must interpret them. The big mistake is to let the end user participate in building your queries: it leads to security holes, giving an attacker the ability to do whatever they want with your data (it's called SQL injection). It's just like letting anyone write any source code they want and execute it on your machine. In practice, nobody lets a third party run arbitrary code on their machine, but plenty of developers forget to sanitize user input before querying the database.
Use parametrized queries and stored procedures.
You may want to consider using parametrized queries as soon as possible. They will increase security, improve performance, and somewhat force you to write better queries (even if that point is debatable).
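For example, in T-SQL a parametrized ad-hoc query can be run through sp_executesql (the table, column and parameter names below are just for illustration); the user-supplied value never becomes part of the SQL text:
DECLARE @customerName nvarchar(100) = N'O''Brien';  -- imagine this came from user input
EXEC sp_executesql
    N'SELECT CustomerID, Name FROM dbo.Customers WHERE Name = @name',
    N'@name nvarchar(100)',
    @name = @customerName;
From application code (ADO.NET and the like) you get the same effect by adding parameters to the command object instead of concatenating strings.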
After learning SQL for a few weeks or months, you may also want to learn what stored procedures are and how to use them. They have their strong points, but don't make the mistake I made when I started learning SQL: do not decide to use stored procedures everywhere.
Use frameworks.
If you are a .NET developer, learn to use Linq-to-SQL. If you have already used Linq on .NET objects (lists, collections, etc.), it can be very helpful for figuring out how to do some queries. By the way, remember you can write Linq queries and see how Linq transforms them into SQL queries.
Keep in mind that using a framework or an abstraction layer will not make you a database guru. Use it as a helpful tool, but sometimes do the SQL stuff yourself. Yes, it can free you from writing SQL queries by hand, even on large-scale projects (for example, StackOverflow uses Linq-to-SQL). But sooner or later you will either need to work on a project which does not use Linq, or you will run into limitations of Linq versus plain SQL.
Borrow a book.
Seriously, if you want to learn this stuff, buy or borrow a book. Tutorials will explain how to do one precise thing, but you will lose the opportunity to learn something you never thought about. For example, database partitioning or mirroring is something you must know if you want to work as a DBA. Any book about databases will talk about partitioning; on the other hand, few tutorials will lead you to this subject by themselves.
Test, evaluate, profile.
SQL is about optimized queries. Anybody can write a select statement, but many people will write it in a non-optimized form.
If you are dealing with a database of a few kilobytes that has at most a hundred records, all your queries will perform well; but when things scale up, you will notice that a simple SELECT query takes three seconds on a database with a few billion rows instead of a few milliseconds.
To learn how to write optimized queries and create optimized databases, try to work on large sets. The AdventureWorks demo database from Microsoft is a good starting point, but you may also sometimes need to fill the database with random data just to have enough of it to measure performance correctly.
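For instance, here is one hedged way (the dbo.Orders table and its columns are hypothetical) to pump a million rows of random-ish data into a table on SQL Server so that query timings become meaningful:
-- Build a million-row number sequence by cross-joining two catalog views,
-- then derive pseudo-random dates and amounts from the row number.
;WITH n AS (
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.Orders (OrderDate, Amount)
SELECT DATEADD(DAY, -(i % 3650), CAST(GETDATE() AS date)),
       (i % 10000) / 100.0
FROM n;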
Use SQL Server Profiler (included in SQL Server 2008 Enterprise). It will help you see what the server is really doing and how fast, and will help you find bottlenecks and poorly-written queries.
Learn from others.
Reading a book is a good starting point, but it is not enough. Read the stuff on StackOverflow, especially the questions related to developers doing DBA work. For example, see the Database Development Mistakes Made by App Developers question, and come back to its answers from time to time while learning SQL.
If you have any precise question (a query which does not produce what you expected, a strange performance issue, etc.), feel free to ask it on StackOverflow. The community is great, and there are plenty of people who know this stuff extremely well.
Sometimes, talking to the DBA in your company (if there is one) can also be an opportunity to learn things.
is there a way to use the designer to INSERT data. it seems that it's only for SELECT ing data
If I remember correctly, the query designer in Visual Studio lets you build INSERT statements too, but maybe I'm wrong. In any case, you can use Microsoft SQL Server Management Studio (included with Microsoft SQL 2008 Enterprise), which lets you see how to build some cool queries (right-click on an element in Object Explorer, then use the "Script database as..." menu).
I think you'll find that the key issue is that SQL is declarative, unlike most computing languages you're likely familiar with. This is fundamental. Grab any computer science textbook and start there.
SQL is no more or less difficult than anything else in my view. Historically it was an area which people would tend to specialize in, but that was a consequence of the technology available at the time. It's now more accessible and the tools are significantly better, so expertise is generally spread more widely now.
It is different. SQL programming is quite restricted: when writing complex logic you might find it cumbersome, with its limited programming options, unclear code (as there is no modular programming), and poor implementations of features like cursors.
I read somewhere on SO that the database is not for coding; it is only for storing and querying data. That is well said, in some sense.
What I believe is important to learn in this area is, firstly, knowing all the features available in the DB so that you can use it efficiently; secondly, improving your querying/analytical skills.
Basic SQL features can be learnt from w3schools (joins, grouping, etc.).
Advanced DB features can be learnt from your DBMS certification exam book (the most basic certification exam, be it Oracle or SQL Server).
Analytical skills and some fun - puzzles by Joe Celko

What do I need to know about databases?

In general, I think I do alright when it comes to coding in programming languages, but I think I'm missing something huge when it comes to databases.
I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
You see, like so many new programmers, I tend to treat my databases as a dumping ground for data. Most of what I do comes down to relatively simple SQL (INSERT this, SELECT that, DELETE this_other_thing), which is mostly independent of the engine I'm using (with minor exceptions, of course, mostly minor tweaks for syntax).
Could someone explain some common use cases for databases where the specific platform comes into play?
I'm sure things like stored procedures are a big one, but (a) these are mostly written in a specific language (T-SQL, etc) which would be a different job ad requirement than the specific RDBMS itself, and (b) I've heard from various sources that stored procedures are on their way out and that in a lot of cases they shouldn't be used now anyway. I believe Jeff Atwood is a member of this camp.
Thanks.
The above concepts do not vary much for MySQL, SQL Server, Oracle, etc.
With this question, I'm mostly trying to determine the important difference between these. I.e. why would a job ad demand n years experience with MySQL when most common use cases are relatively stable across RDBMS platforms.
CRUD statements, joins, indexes.. all of these are relatively straightforward within the confines of a certain engine. The concepts are easily transferable if you know a different RDBMS.
What I'm looking for are the specifics which would cause an employer to specify a specific engine rather than "experience using common database engines."
I believe that the essential knowledge about databases should be:
What databases are for
Basic CRUD Operations
SELECT queries with JOINs
Normalization
Basic Indexing
Referential Integrity with Foreign Key Constraints
Basic Check Constraints
The above concepts do not vary much between MySQL, SQL Server, Oracle, Postgres, and other relational database systems. However you'd find a different set of concepts for the now-popular NoSQL databases, such as CouchDB, MongoDB, SimpleDB, Cassandra, Bigtable, and many others.
After the CRUD statements, to be an effective DB programmer I think some of the most important things to understand are JOIN statements. Understand the difference between LEFT and RIGHT, OUTER and INNER joins, and know when to use each. Most importantly, know what the database is actually constructing when it performs a JOIN.
For me, the Wikipedia article was very helpful.
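A small example of the practical difference (the Customers and Orders tables here are hypothetical): an INNER JOIN drops customers with no orders, while a LEFT JOIN keeps them with NULLs on the order side.
-- Only customers that have at least one order
SELECT c.Name, o.OrderID
FROM Customers AS c
INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID;
-- All customers; OrderID is NULL for those with no orders
SELECT c.Name, o.OrderID
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerID = c.CustomerID;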
Also, indexing is very important - this is how relational databases can perform fast queries. Understand how to use them and what happens under the hood.
Wikipedia article on DB indexing.
You should also know how to construct a many-to-one relationship (using foreign keys) and a many-to-many relationship (using join tables).
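A sketch in generic SQL (all names invented for illustration): a many-to-one relationship is a plain foreign key, while a many-to-many relationship goes through a join table whose primary key is the pair of foreign keys.
-- Many-to-one: each Order belongs to exactly one Customer
CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT NOT NULL REFERENCES Customers (CustomerID)
);
-- Many-to-many: a Student can take many Courses and vice versa
CREATE TABLE Enrollment (
    StudentID INT NOT NULL REFERENCES Students (StudentID),
    CourseID  INT NOT NULL REFERENCES Courses (CourseID),
    PRIMARY KEY (StudentID, CourseID)
);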
I know that in your question you're asking about specific DB implementations, but if you're to be taken literally and you only know about SELECT, INSERT, UPDATE, and DELETE, then the above concepts will be far more valuable than learning the intricacies of a particular implementation.
It's not just stored procs and functions. Each database has fundamental differences and quirks that are important to understand even though SQL works more or less the same.
Examples:
Oracle and MySQL handle locking differently, in different situations.
Oracle doesn't have autoincrementing primary keys like MySQL and SQL Server.
Subtle vendor-specific behavior, like the way Oracle does sorting for VARCHARs differently depending on locale.
If you really want to improve your applications, you eventually have to become familiar with the details about how your specific database works. Most of the time it doesn't make a lot of difference, but when it does matter, it usually makes a big difference, especially when it comes to performance.
Some things which seem to come up when talking with my Database-keen colleagues:
Row vs page vs table locking escalation when doing multiple complex joins sometimes implies doing very different things on different vendors' databases. This is where the theory really hits the tarmac, and it is often non-intuitive.
Differences between how cursors are best used on different vendor db implementations
Odd stuff in the stored proc language variants, like how best to handle failure cases
Differences in how temporary tables and views are best used depending on the underlying implementations.
All of these kinds of things don't really matter until you are trying to solve something that has to:
- Run very fast
- Contain lots and lots of data
- Get very big and complex (i.e. multiple queries hitting the same tables simultaneously)
These are the kinds of things that DBAs should be helping with, so it depends on whether you are aiming to be a DBA or a programmer. None of the above have really hurt me yet, because I've not worked on DB-intensive systems, but I've worked near a few, and the programmers on those end up knowing a lot about the internals, restrictions, and good features of the specific database they are using.
Best way to get knowledge like that (other than on the job) is to read the manuals or hang out with people that already know and ask them about it.
Don't forget relational schemas, primary and foreign keys, and how they are related. To start with databases, I would use MySQL and MSSQL, as these are the most common in the market. I regard Oracle as a more advanced and complex DB.
As for the level of difference between vendors, it exists because SQL is a standard (http://en.wikipedia.org/wiki/SQL#Standardization) and vendors implement that standard differently.
Each vendor tries to offer extras to win the crowd over to their side... that's why you see functions available in one and not the other. But sometimes such a function makes its way into the standard, so it's not always a bad thing.
As for stored procedures, I would agree, since today's ORMs and practices tend towards a greater separation of concerns by removing business logic from the database and treating it as "only" a repository.
My 2 cents
I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.
I'm what's called a SQL Developer. You won't see the differences much when you are doing run-of-the-mill database work (CRUD). However, the differences become quite apparent when you are dealing with the database's own brand of SQL.
When talking SQL outside of the standards, there are four distinct types of commands. These are:
Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)
Transactional Control Language (TCL)
The biggest differences come in the last two, DCL and TCL. Those have a LOT of database specific non-standard SQL commands. The first two, DML and DDL, are very similar across any database that use the relational model.
Also, the bigger database vendors have their own names for their SQL implementations. Here's a short sample:
SQL Server: T-SQL (Transact-SQL)
Oracle: PL/SQL
PostgreSQL: PL/pgSQL
Firebird: PSQL
MySQL: stored procedures based on the SQL/PSM standard (no separate brand name)
The list goes on, but you get the point. Wikipedia has good articles on the different command acronyms.
I have found that most employers won't be able to articulate this, because most use non-technical managers and/or HR to do the hiring. They are basically told by the tech managers that new hires need to know technology X. That, and because the majority are too lazy to hire for intelligence, they fall back on the "We have X, so darn it, we need to hire somebody that knows X!" meme. The differences are actually not that hard to learn, at least for the people who frequent StackOverflow. I'm confident that anybody here can learn them fairly fast.
Even something as simple as an auto-incrementing primary key can be very different in Oracle, MySQL, and SQL Server.
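For instance (sketches only, table names invented): SQL Server uses IDENTITY, MySQL uses AUTO_INCREMENT, and Oracle (before 12c) makes you combine a sequence with either a trigger or an explicit NEXTVAL in your inserts.
-- SQL Server
CREATE TABLE Customer (Id INT IDENTITY(1,1) PRIMARY KEY, Name NVARCHAR(100));
-- MySQL
CREATE TABLE Customer (Id INT AUTO_INCREMENT PRIMARY KEY, Name VARCHAR(100));
-- Oracle (pre-12c): sequence plus explicit NEXTVAL (or a BEFORE INSERT trigger)
CREATE SEQUENCE customer_seq;
CREATE TABLE customer (id NUMBER PRIMARY KEY, name VARCHAR2(100));
INSERT INTO customer (id, name) VALUES (customer_seq.NEXTVAL, 'Ada');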
Some other important differences:
SQL Server makes a distinction between the clustering key and the primary key; other databases do not. This choice comes with major performance implications.
SQL Server allows the SET @Total = Total = @Total + Amount syntax (the "quirky update") for fast computation of things like running totals. MySQL lets you use a user variable in a similar way (I think). In other databases you'd probably have to use a correlated subquery. Huge difference in performance.
SQL Server can generate "sequential GUIDs" with NEWSEQUENTIALID(). I'm not sure which other databases have this feature, but as with the above two points, there are significant performance implications to using a traditional GUID as opposed to a sequential or comb GUID.
Oracle's CONNECT BY is a very useful and pretty unique syntax. Common Table Expressions in SQL Server and MySQL are similar but not exactly the same.
Support for ranking/ordering functions varies vastly across different databases. I'm constantly posting answers here invoking ROW_NUMBER (a short sketch follows this list). A lot of queries are much harder to write without it - but at the same time, abusing it can hurt performance.
XML support is all over the map. Most databases have reasonably good support for it now, but both syntax and semantics are completely different on every platform.
Date/time handling can be quite different. Oracle has several different date/time-related types, some including time zone information. In general, Oracle is way better than other databases at managing temporal data, and has several features that you will miss if you switch. Until recently, Microsoft didn't have the date and time types, just datetime, which was much harder to normalize.
Spatial types are different and/or nonexistent in different databases. MySQL exposes an entire OpenGIS model; Microsoft's support is a bit more basic but still competent. Oracle has it, but it's a little hard to find information on, and it's some sort of optional add-on. I think DB2 is starting to get it, but support is still a little spotty.
MySQL actually lets you choose how an index is stored (i.e. B-tree or hash). This is also an important performance consideration.
SQL Server allows you to INCLUDE columns in an index - very important for performance.
Oracle allows you to create function-based indexes, bitmap indexes, and so on. These can be pretty difficult to wrap your head around.
Oracle can perform "skip seeks" in very specific situations, something that I don't believe is supported in other databases (yet). This might factor into how you order index columns.
SQL Server has CLR types/functions/aggregates. Obviously not supported in any other database product.
Trigger support varies significantly. SQL Server has AFTER and INSTEAD OF. MySQL has BEFORE and AFTER. Oracle has all of those and more. These all behave quite differently.
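As mentioned in the ranking-function point above, here is a hedged ROW_NUMBER sketch (hypothetical Orders table) of a query that is awkward to write without it - picking the latest order per customer:
WITH ranked AS (
    SELECT CustomerID, OrderID, Amount,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rn
    FROM Orders
)
SELECT CustomerID, OrderID, Amount
FROM ranked
WHERE rn = 1;   -- most recent order per customer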
I'm sure that there are many, many more differences, but that should give you at least a basic idea of why 5 years of experience with Oracle is completely different from 5 years of experience with SQL Server.
That databases are encoded collections of assertions of fact.
That the logical structure of the tables corresponds to the syntactical structure of those "assertions of fact".
That normalization theory helps you find the optimal logical structure of the database, by minimizing redundancy, i.e. minimizing the possibility for contradictions to occur in said assertions of fact.
That database constraints are really nothing else than business rules, expressed in a formal way and in terms of the components of the database.
That really every and any business rule can be expressed as a database constraint.
That therefore, it is possible for the DBMS to enforce any and every business rule you can imagine.
That there is a very important difference between logical design and physical design.
That SQL and SQL systems are, eurhm, not really helpful (and that's putting it mildly), in supporting developers to recognise this important distinction.
That SQL and SQL systems are, eurhm, significantly deficient (and that's putting it mildly), in their support for database constraints.
That these latter two points are a very good illustration of the importance of the difference between a model (Codd's RM) and its implementation (some particular SQL system). As far as relational database technology is concerned, the latter deviates ever more preposterously from the former.
And whatever else I forgot to remember.

Swapping out databases?

It seems like the goal of a lot of ORM tools and custom data access layers (DAO pattern, etc.) is to abstract the database to the point where you could supposedly swap out the entire database system with minimal work.
Following the common DAL patterns is usually a good idea in code, but it seems like it would never be minimal work to swap out a database. (Cost, training, data migration, etc.)
Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code? Is it worth it to worry about abstracting the actual database from your code?
Question 1: Does anyone have any experience with
swapping out one database for another
in a large system, and dealing with
the implications in code?
Yes, we tried it. Our customer is using a large MS Access-based Delphi client/server application. After about five years we considered switching to SQL Server. We analyzed the problem and concluded that swapping the database would be very costly and provide only a few advantages. The customer decided not to swap the database. The application is still running fine and the customer is still happy.
Note that:
MS Access is only being used for data storage and report generation.
The server application ensures that MS Access is only accessed on the server. Normal multi-user MS Access applications transfer large chunks of the Access database over the network, resulting in slow and unreliable database functionality. This is not the case for this application: Client <> Server <> MS Access. Only the server application communicates with the MS Access database; in fact, the server has exclusive access to it, and no other computer can open the MS Access database. Conclusion: MS Access is being used as a true RDBMS (Relational DataBase Management System) - please, no flaming about MS Access being inferior and unstable; it has been running fine for more than 10 years.
The most important issues you will have to consider:
SQL statements (SELECT, UPDATE, DELETE, INSERT, CREATE TABLE): make sure they are compatible with the target SQL database. It's amazing how much the various RDBMSs differ in the details (date formats, number formats, search formats, string formats, join syntax, create table syntax, stored procedures, user-defined functions, (auto) primary keys, etc.).
Report generation: depending on your database you might be using a different reporting tool. Our customer has over 200 complex reports. Converting all these reports is very time-consuming.
Performance: each RDBMS performs differently in different environments. Performance optimisations are normally very much RDBMS-dependent.
Costs: the costs of tools, developers, server and user licenses vary greatly, ranging from free to very expensive. Free does not mean cheap, and expensive does not always equate to good. A cost/value comparison will have to be made.
Experience: making the best use of your RDBMS requires experience. If you have to develop for an "unknown" RDBMS, your productivity will suffer.
Question 2: Is it worth it to worry about
abstracting the actual database from
your code?
Yes. In an ideal world, swapping a database would just be adjusting the data connection string. In the real world this is not possible because all databases are different. They all have tables and SQL support but the differences are in the details. If you can keep the differences of the databases shielded through abstraction - please do so. Make a list of the databases you need to support. Check the selected database systems for the differences. Provide centralized code to handle the differences. Support one RDBMS and provide stubs for future support of other RDBMS.
I disagree that the purpose is to be able to swap out databases, and I think you are correct in showing some suspicion about ORMs leading towards that goal.
However, I would still use an ORM, as it abstracts away the details of data access. Isn't this the goal of object oriented programming? Keep your concerns separated.
I think the primary use case for database abstraction (via ORM tools) is to be able to ship a product that works with multiple database brands. I believe it's a rarer occurrence for a company to switch between database vendors, but that's still one of the use cases.
I've worked jobs where we started out using MySQL for monetary reasons (think a startup) and, once we started making money, wanted to switch to Oracle. We didn't end up making the switch, but it was nice to have the option.
Still, ORM tools are not completely leak-free abstractions, and I know our migration would still have been painful and costly. It totally depends on what you are building, but it has been my experience that - usually for performance reasons - you end up either working around your ORM solution or exploiting vendor-specific features at some point.
The only time I've seen a database switch was from HSQL during early development to Oracle as the project progressed. The ORM made this easy.
I often use the DAO pattern to swap out data services (from a database to web service or to swap a web service to a test stub).
For ORM, I don't think the goal is to enable you to switch databases - it is to hide the complexities of different database implementations from you and remove the need to worry about the fine details of translating from relational to object representations of your data.
By having someone smart write an ORM that handles caching, only updates fields that have changed, groups updates, etc., I don't need to. Although in the cases where I need something special, I can still revert to SQL if I want.

Implementing functionality/code directly in database system

RDBMS packages today offer a tremendous amount of functionality beyond standard data storage and retrieval. SQL Server for example can send emails, expose web service methods, and execute CLR code amongst other capabilities. However, I have always tried to limit the amount of processing my database server does to just data storage and retrieval as much as possible, for the following reasons:
A database server is harder to scale than web servers
In a lot of projects I've worked on, the DB server is a lot busier than the web servers, and thus has less spare capacity
It potentially exposes your database server to a security attack (web services for example)
My question is, how do you decide how much functionality or code should be implemented directly on your database server versus other servers in your architecture? What recommendations do you have for people starting new projects?
I know Microsoft SQL Server and Oracle really push using stored procedures for everything, which helps to encapsulate the relational architecture and creates a more procedural interface for the software developers, who typically aren't as facile at writing SQL queries.
But then half your application logic is written in PL/SQL (or T-SQL or whatever) and the other half is written in your application language, Java or PHP or C#, etc. The DBA is typically responsible for coding the procedures, and the developers are responsible for everything else. No one has visibility and access to the full application logic. This tends to slow down development, testing, and future revisions to the project.
Also software development tools tend to be poor for stored procedures. Tools and best practices for debugging, source control, and testing all seem to be about 10-15 years behind the state of the art for application languages.
So I tend to stay away from stored procedures and triggers if at all possible. Except in certain cases when a well-placed stored procedure can make a complex SQL operation happen entirely in the server instead of shuffling data back and forth. This can be very effective at eliminating performance bottlenecks.
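A hedged example of the kind of well-placed stored procedure meant here (all object names invented): archiving old rows entirely inside the server, instead of pulling them over the wire and writing them back.
CREATE PROCEDURE dbo.ArchiveOldOrders @cutoff date
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
    -- Copy old rows into the archive table
    INSERT INTO dbo.OrdersArchive (OrderID, CustomerID, OrderDate, Amount)
    SELECT OrderID, CustomerID, OrderDate, Amount
    FROM dbo.Orders
    WHERE OrderDate < @cutoff;
    -- Remove them from the live table in the same transaction
    DELETE FROM dbo.Orders
    WHERE OrderDate < @cutoff;
    COMMIT TRANSACTION;
END;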
It's possible to go too far in the other direction as well. People who prefer to have the application manage the data and its metadata, employing designs like Entity-Attribute-Value or Polymorphic Associations, get themselves into trouble. Let the database manage that. Use referential integrity constraints (foreign keys). Use transactions.
The vendors have one set of best practices. You, however, voice concerns with that.
Years ago I supported a Major Software Product. Major.
They said "The database is relational storage. Nothing more." Every user conference people would ask about stored procedure, triggers, and all that malarkey.
Their architect was firm. As soon as you get away from plain-old-SQL, you've got a support and maintenance nightmare. They did object-relational mapping from the DB into their product, and everything else was in their product.
This scales well. Multiple application servers can easily share a single database server.

Resources