Database Agnostic Application

Database Agnostic Application - database

The database for one the application that I am working on is not confirmed yet by the business.
Best guess is Oracle and DB2.
What I've heard is initially the project will go live with DB2 V9 and then to Oracle 11g.
We are using Spring 3.0.5, Hibernate 3.5, JPA2 and JBoss 5 for this project
So what are the best practices here going into the build phase and test phase?
Shall I build using DB2 first and worry about Oracle later (this
doesn't sound right)?
Or, shall I write using JPA (Hibernate) and
then generate the database schema?
Or something else?
PS: I've no control over the choice of the DB, what and when, as these are strategic decision made by people sitting in nice rooms getting fat cheques and big bonuses.
Thanks,
Adi

Obviously you are loosing the access to specific features of the database if you are writing your application database agnostic. The database is, except for automatic optimizations done by JPA and Hibernate, reduced to common features. You have to set some things to automatic and trust JPA/Hibernate to do it right that you could set specifically if you knew the database (e.g. id generator strategies).
But it seems that the specific developer features of the database are not relevant for the decision so they can't be relevant to the application. What other reasons may influence the decision (like price, money, cash, personal relations, management tools, hardware requirements, existing knowledge and personell) can only be speculated about.
So IMHO you don't have a choice. Strictly avoid anything database specific. That includes letting the JPA/Hibernate generate the schema (your point #2). In this project setup you shouldn't tinker with the database manually.
Well... sadly there ARE some hidden traps in JPA/Hibernate developement that make it database dependent (e.g. logarithmic functions are not mapped consistenly). So you should run all your tests against all possible databases from day one. As you write "Best guess is..." you should just grab any database available and test against it. Should be easly setup with the given stack.
And you should try to accelerate the decision about the database used, if possible.

Just "write using JPA (Hibernate)" develop it to be de database agnostic. Put all you business logic in java code not stored procedures.
If you are using spring you don't need jboss you could use just tomcat, about a quarter of the foot print, and much simpler imho.
Spring vs Jboss and jboss represents all that is bad, while spring represents all that is good in Java enterprise development

We have add this issue and had to migrate late in the project, leading to a lot of extra works, frustrations and delays.
My advise is to define an abstract layer. Go to the point you may have a data model without any database, say with tables or text files.
Then when you have to switch to some database, you can optimize for it, while staying free to continue application development on any already developped model. So you don't delay the developpers on the app while one is tuning the DB2 layer. When everything is duly validated, the team can switch on it.

I will disagree with the currently accepted answer suggesting avoiding database specific things. From a performance perspective, that would be a pity, and it's definitely doable.
JPA/Hibernate and also jOOQ can abstract over a lot of things and if you're using the query builder APIs of either technology (criteria query in JPA, or jOOQ for more advanced SQL), you can get very far in a vendor agnostic way without removing all the vendor specific stuff. For example, you can easily create a vendor specific predicate like this:
.where(oracle ? oracleCondition() : db2Condition())
What you should do from the very beginning of such a project, once you know you'll have to support both dialects is to run integration tests on both database products. For this, I recommend testcontainers, which makes running such tests quite simple. If you have to add support for another dialect, and if you're using one of the above abstractions, you can simply add another testcontainers configuration, check if your application still works, tweak 2-3 things, and you're set.
Disclaimer: I work for the company behind jOOQ.

Related

advice on hyperfile db

At my work, my co-workers are considering using hyperfile as a database server for a windev project. I don't even know that kind of database, it's from PCSOFT, the company that develops windev.
Since windev can also work with microsoft sql server, I'm looking for advice on that kind of database (performance, stability, etc) from people who already used it.
Regards!

It depends on the size of your project. Actually, Windev works well with HyperFileSQL. It has been designed for it ! By using another DBMS, you cut yourself some feature such as direct-reading/modifying/deleting in your tables.
Your performances will decrease significantly as soon as you have a nice amount of records in a table (> 100'000). Your database management will become a nightmare since you can't execute several SQL requests at the same time. In example, i'm using another tool developped by a french guy to manage my databases and execute some updates.
Despite of this, it's stable and provides an easy way to interact with Windev's fields.
In my opinion, Hyperfile SQL should be used with small applications with a small amount of features and datas.

Adding upon what Samuël Tremblay already wrote, I would say that after 2+ years of using Windev with HFSQL (old name is HyperFile SQL), here are my conclusions (I have used Windev versions 20 and 22):
PROS:
replication of a database to another server is rather easy to setup. You can choose to replicate a whole database or a selection of tables. But DBMS like PostgreSQL are actually offering advanced replication setups (https://www.2ndquadrant.com/en/resources/pglogical/).
easy export to a Microsoft Excel file of a query/table
create and change the schema/structure of your database through a graphical user interface (GUI)
CONS:
When you use the database server provided by Windev (i.e. HFSQL), you must use Windev (that is imposed upon you). You cannot interact with your database with another language/framework other than Windev, you are forced to use Windev to query a HFSQL database. If you use instead a DBMS like PostgreSQL, mySQL/MariaDB, etc. you can (and will be able) to query the database with some other language : C++, Java, JavaScript, etc.
Say that you wanted now to open your data to customers through a web app, you will actually need to use their other software Webdev from their software suite (and buy it actually).
Or say, some day, you want to develop a simple app for smartphone with Qt or else. Well, if your database runs on HFSQL, then you will not be able to query your database unless you use Windev (actually Windev Mobile that you also need to purchase).
UNIQUE constraints are not working with the presence of NULL (two rows containing NULL would be considered to violate the UNIQUE constraint).
(almost) every time you update your "analyse/analysis" (basically the database schema), you will also need to update your binary executable. You will need to recompile your software and distribute it again to the users. For example, say you modify a table by adding a column, or modifying the type of a column, then you need to recompile. The executable that the users have will not run, it will say that the version of the "analysis" (schema) on the database is not the same as the one in the executable, and will stop. BAM !
the HyperFile SQL (HFSQL) server is not so stable, it will crash (often) when executing slightly advanced queries with not so many rows...
You cannot create scripts to query the HFSQL database : you must create a binary executable (a new project) with Windev. Say you want to quickly modify something --> you need to recompile (and have a Windev IDE with you).
Say you are on the go, on some trip, and you forgot to bring your computer with the Windev Dongle key (a license cryptographic USB key : it you don't have it, you cannot run Windev), and you need to do some work on the database. PCSoft provides a software called HFSQL Control Center (a GUI software) that can interact with the database, but unfortunately it cannot be downloaded from the internet. You actually obtain it when you buy Windev, and you are allowed to distribute it to whom you want, but it cannot be downloaded from PCSOFT website.
Whereas if your database engine is another one, say PostgreSQL or MariaDB, you can simply download PGAdmin or the equivalent, and boom you can interact with your data.
It seems to me that HFSQL is not a real/genuine DBMS, let me explain myself : the constraints you can set in the analysis (UNIQUE for example), are not always respected. For example, after adding a UNIQUE constraint in the schema (analysis) and compiling the program, I have seen that if I would insert some data in a table from the executable, it would detect the violation of the UNIQUE constraint when it should happen. However if I would insert the same set of data through the HFSQL Control Center, the constraint would not be enforced and the duplicates would be insterted.
There would be more to say ...
Bottom line: From my own experience, I would strongly encourage anybody, who wants to develop a reliable and dependable software that "must" be developed with Windev (and that needs data persistence), not to use their database HFSQL. You would be much better off using a RDBMS such as PostgreSQL or MariaDB. We are actually going to port our databases from HFSQL to PostgreSQL this Summer.

You should carefully consider what sql functions you will use. For example deg2rad, rad2deg, ... not working correctly.
Also if you want to use it on a mobile device (Windev Mobile for iOS or Android) you should use SQLLite. Because HyperFile uses a lot of memory and it will be a problem on mobile.

If you want a free database, use PostgreSQL, the Windev connector for PostgreSQL is free to download and install on your windev as a replacement for HFSQL, it will be way more powerful while using the usual hFunctions like you would with HFSQL, plus you will find a ton of docs on the web to do powerful stuff.
HFSQL is in fact the same as an old ISAM DBASE database so it requires re-indexations and things like that of those older DB systems era.
PostgreSQL is like having a free Oracle DB with all the powerful features and reliability, we dropped HFSQL for this and performance has increased tenfold plus all the other benefits while keeping our code pretty much the same, every day feels like we discover freebies and gifts from ProsgreSQL since our migration :)
Free VS Free... You gotta go with power and sheer size of web documentation and poeple available to help .

In WinDev Mobile 18 and up, you can use Hyperfile on the device. And it is recommended from me, because it is faster and SQLLite restrict blob-size to 1MB!!
#Spek memory usage of HyperFile on the phone? Can you give me any values? I think if you want to make a full feature APP you cannot ignore the benefits of HyperFile...

FYI: New in Windev version 19: Hyperfile SQL is ACID.

Simplest way to develop an app that can use multiple types of databases?

I have a project for a class which requires that if a database is used, options exist for the user to pick a database to use which could be of a different type. So while I can use e.g. MySQL for development, in the final version of the project, the user must be able to choose a database (Oracle, MySQL, SQLite, etc.) upon installation. What's the easiest way to go about this, if there is an easy way?
The language used is up to me as long as it's supported by the department's Linux machines, so it could be Java, PHP, Perl, etc. I've been researching and found info on ODBC, JDBC, and SQLJ (such as this) but I'm quite new to databases so I'm having a hard time figuring out what would be best for my needs. It's also possible there may not be a simple enough way to do this; the professor admitted he's not a database guy either and he seemed to think it would be easy without having a clear idea of what it would take.
This is for a web app, but it ought to be fairly straight forward, using for example HTML and Javascript on the client side and Java with a MySQL database on the server side. No mention has been made of frameworks so I think they're too much. I have the option of using Tomcat or Apache if necessary but the overall idea is to keep things simple, and everything used should be able to be installed/changed/configured with just user level access. So something like having to recompile PHP to use ODBC would be out, I think.
Within these limitations, what would be the best way (if any) to be able to interact with an arbitrary database?

The issue I think you will have here is that SQL is not truely standard. What I mean is that vendors (Oracle, MySQL etc) have included types and features that are not SQL standard in order to "tie you in" to their DB, such as Oracle's VARCHAR2 and so on.
When I was at university, my final year project was to create an application that allowed users to create relational databases using JDBC with a Java front-end.
The use of JDBC was very simple but the issue was finding enough SQL features/types that all the vendors have in common. So they could switch between them without any issues. A way round this is to implement modules to deal with vendor specific issues and write ways to translate between them. So for example you may develop a database for MySQL with lots of MySQL specific code in that, but then you may want to use Oracle and then there are issues, which you would need to resolve.
I would spend some time looking at what core SQL standard all the vendors implement and then code for these features. But I think the technology you use wouldn't be the issue but rather the SQL you create.
Hope this helps, apologies if its not helpful!

Well, you can go two ways (in Java):
You can develop your own classes to work with different databases and load their drivers in JDBC. This way you will create a data access layer for yourself, which takes some time.
You can use Hibernate (or other ORMs). This way Hibernate will take care of things for you and you only have to know how to use Hibernate. Learning Hibernate may take some time, but when you get used to it, it can be very useful for your future projects.

If you want to stick Java there Hibernate (which wouldn't require a framework). Hibernate is fairly easy to use. You write HQL which gets translated to the SQL needed for the database you're using.

Maybe use an object relational mapper (ORM) or database abstraction layer (DAL). They are designed to provide a standard API to multiple database backends, making it possible to switch between different backends with minimal or no changes to your code. In Python, for example, a popular ORM is SQLAlchemy, and an excellent DAL is the web2py DAL (it's part of the web2py framework but can be used as a standalone DAL outside the framework as well). There are many other options in other languages as well.

use a framework with database abstraction layer and orm . try symfony or rails

There are a lot of Object relational database frameworks, unless you prefer jdbc. For simple/small applications this should work fine.

Does it make sense to use an OR-Mapper?

Does it make sense to use an OR-mapper?
I am putting this question of there on stack overflow because this is the best place I know of to find smart developers willing to give their assistance and opinions.
My reasoning is as follows:
1.) Where does the SQL belong?
a.) In every professional project I have worked on, security of the data has been a key requirement. Stored Procedures provide a natural gateway for controlling access and auditing.
b.) Issues with Applications in production can often be resolved between the tables and stored procedures without putting out new builds.
2.) How do I control the SQL that is generated? I am trusting parse trees to generate efficient SQL.
I have quite a bit of experience optimizing SQL in SQL-Server and Oracle, but would not feel cheated if I never had to do it again. :)
3.) What is the point of using an OR-Mapper if I am getting my data from stored procedures?
I have used the repository pattern with a homegrown generic data access layer.
If a collection needed to be cached, I cache it. I also have experience using EF on a small CRUD application and experience helping tuning an NHibernate application that was experiencing performance issues. So I am a little biased, but willing to learn.
For the past several years we have all been hearing a lot of respectable developers advocating the use of specific OR-Mappers (Entity-Framework, NHibernate, etc...).
Can anyone tell me why someone should move to an ORM for mainstream development on a major project?
edit: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html seems to have a strong discussion on this topic but it is out of date.
Yet another edit:
Everyone seems to agree that Stored Procedures are to be used for heavy-duty enterprise applications, due to their performance advantage and their ability to add programming logic nearer to the data.
I am seeing that the strongest argument in favor of OR mappers is developer productivity.
I suspect a large motivator for the ORM movement is developer preference towards remaining persistence-agnostic (don’t care if the data is in memory [unless caching] or on the database).
ORMs seem to be outstanding time-savers for local and small web applications.
Maybe the best advice I am seeing is from client09: to use an ORM setup, but use Stored Procedures for the database intensive stuff (AKA when the ORM appears to be insufficient).

I was a pro SP for many, many years and thought it was the ONLY right way to do DB development, but the last 3-4 projects I have done I completed in EF4.0 w/out SP's and the improvements in my productivity have been truly awe-inspiring - I can do things in a few lines of code now that would have taken me a day before.
I still think SP's are important for some things, (there are times when you can significantly improve performance with a well chosen SP), but for the general CRUD operations, I can't imagine ever going back.
So the short answer for me is, developer productivity is the reason to use the ORM - once you get over the learning curve anyway.

A different approach... With the raise of No SQL movement now, you might want to try object / document database instead to store your data. In this way, you basically will avoid the hell that is OR Mapping. Store the data as your application use them and do transformation behind the scene in a worker process to move it into a more relational / OLAP format for further analysis and reporting.

Stored procedures are great for encapsulating database logic in one place. I've worked on a project that used only Oracle stored procedures, and am currently on one that uses Hibernate. We found that it is very easy to develop redundant procedures, as our Java developers weren't versed in PL/SQL package dependencies.
As the DBA for the project I find that the Java developers prefer to keep everything in the Java code. You run into the occassional, "Why don't I just loop through all the Objects that just returned?" This caused a number of "Why isn't the index taking care of this?" issues.
With Hibernate your entities can contain not only their linked database properties, but can also contain any actions taken upon them.
For example, we have a Task Entity. One could Add or Modify a Task among other things. This can be modeled in the Hibernate Entity in Named Queries.
So I would say go with an ORM setup, but use procedures for the database intensive stuff.
A downside of keeping your SQL in Java is that you run the risk of developers using non-parameterized queries leaving your app open to a SQL Injection.

The following is just my private opinion, so it's rather subjective.
1.) I think that one needs to differentiate between local applications and enterprise applications. For local and some web applications, direct access to the DB is okay. For enterprise applications, I feel that the better encapsulation and rights management makes stored procedures the better choice in the end.
2.) This is one of the big issues with ORMs. They are usually optimized for specific query patterns, and as long as you use those the generated SQL is typically of good quality. However, for complex operations which need to be performed close to the data to remain efficient, my feeling is that using manual SQL code is stilol the way to go, and in this case the code goes into SPs.
3.) Dealing with objects as data entities is also beneficial compared to direct access to "loose" datasets (even if those are typed). Deserializing a result set into an object graph is very useful, no matter whether the result set was returned by a SP or from a dynamic SQL query.
If you're using SQL Server, I invite you to have a look at my open-source bsn ModuleStore project, it's a framework for DB schema versioning and using SPs via some lightweight ORM concept (serialization and deserialization of objects when calling SPs).

Database independence

We are in the early stages of design of a big business application that will have multiple modules. One of the requirements is that the application should be database independent, it should support SQL Server, Oracle, MySQL and DB2.
From what I have read on the web, database independence is a very bad idea: it would result in a hard-to-maintain code, database design with the least-common features in all supported DBMSs, bad performance and bad scalability. My personal gut feeling is that the complexity of this feature, more than any other feature, could increase the development cost and time exponentially. The code will be dreadful.
But I cannot persuade anybody to ignore this feature. The problem is that most data on this issue are empirical data, lacking numbers to support the case. If anyone can share any numbers-supported data on the issue I would appreciate it.
One of the possible design options is to use Entity framework for the database tier with provider for each DBMS. My personal feeling is that writing SQL statements manually without any ORM would be a "must" since you have no control on the SQL generated by the entity framework, and a database-independent scenario will need some SQL tweaking based on the DBMS the code is targeting, and I think that third-party entity framework providers will have a significant amount of bugs that only appear in the complex scenarios that the application will have. I would like to hear from anyone who has had an experience with using entity framework for database-independent scenario before.
Also, one of the possibilities discussed by the team is to support one DBMS (SQL Server, for example) in the first iteration and then add support for other DBMSs in successive iterations. I think that since we will need a database design with the least common features, this development strategy is bad, since we need to know all the features of all databases before we start writing code for the first DBMS. I need to hear from you about this possibility, too.

Have you looked at Comparison of different SQL implementations ?
This is an interesting comparison, I believe it is reasonably current.
Designing a good relational data model for your application should be database agnostic, for the simple reason that all RDBMSs are designed to support the features of relational data models.
On the other hand, implementation of the model is normally influenced by the personal preferences of the people specifying the implementation. Everybody has their own slant on doing things, for instance you mention autoincremented identity in a comment above. These personal preferences for implementation are the hurdles that can limit portability.
Reading between the lines, the requirement for database independence has been handed down from above, with the instruction to make it so. It also seems likely that the application is intended for sale rather than in-house use. In context, the database preference of potential clients is unkown at this stage.
Given such requirements, then the practical questions include:
who will champion each specific database for design and development ? This is important, inasmuch as the personal preferences for implementation of each of these people need to be reconciled to achieve a database-neutral solution. If a specific database has no champion, chances are that implementing the application on this database will be poorly done, if at all.
who has the depth of database experience to act as moderator for the champions ? This person will have to make some hard decisions at times, but horsetrading is part of the fun.
will the programming team be productive without all of their personal favourite features ? Stored procedures, triggers etc. are the least transportable features between RDBMs.
The specification of the application itself will also need to include a clear distinction between database-agnostic and database specific design elements/chapters/modules/whatever. Amongst other things, this allows implementation with one DBMS first, with a defined effort required to implement for each subsequent DBMS.
Database-agnostic parts should include all of the DML, or ORM if you use one.
Database-specific parts should be more-or-less limited to installation and drivers.
Believe it or not, vanilla-flavoured sql is still a very powerful programming language, and personally I find it unlikely that you cannot create a performant application without database-specific features, if you wish to.
In summary, designing database-agnostic applications is an extension of a simple precept:
Encapsulate what varies

I work with Hibernate which gives me the benefits of the ORM plus the database independence. Database specific features are out of the question and this usually improves my design. Everything (domain model, business logic and data access methods) are testable so development is not painful.

مرحبا , Muhammed!
Database independence is neither "good" nor "bad". It is a design decision; it is a trade-off.
Let's talk about the choices:
It would result in a hard to maintain code
This is the choice of your programmers. If you make your code database-independent, then you should use a layer between your code and the database. The best kind of layer is one that someone else has written.
...Database design with the least common features in all supported DBMSs
This is, by definition, true. Luckily, the common features in all supported databases are fairly broad; they should all implement the SQL-99 standard.
...bad performance and bad scalability
This should not be true. The layer should add minimal cost to the database.
...this is the most feature ever that could increase the development cost and time exponentially with complexity. The code will be dreadful.
Again, I recommend that you use a layer between your code and the database.
You didn't specify which language or platform you're writing for. Luckily, many languages have already abstracted out databases:
Java has JDBC drivers
Python has the Python Database API
.NET has ADO.NET
Good luck.

Database independence is an overrated application feature. In reality, it is very rare for a large business application to be moved onto a new database platform after it's built and deployed. You can also miss out on DBMS specific features and optimisations.
That said, if you really want to include database independence, you might be best to write all your database access code against interfaces or abstract classes, like those used in the .NET System.Data.Common namespace (DbConnection, DbCommand, etc.) or use an O/RM library that supports multiple databases like NHibernate.

Using a common database for collaborative development

Some of the people in my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't and each developer having his own database (with periodic updated data dumps) is the best. Am I right or wrong? Have you encountered any problems in any of these approaches?

Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.

I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.

We share a single database amongst all our developer (20-odd) but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table-prefix it uses anyway so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery and so on).
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).

Having a single point of Failure is not a good thing isn't it?

I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.

If you are working with Hibernate or any hibernate-based platform you can configure your database to be created when you start your server (create-drop option). This is very useful when you are adding new attributes to your classes. If this is the case each developer must have his own copy of the DB.
If you are not changing the DB structure at all then you can use a single shared DB.
In this second case is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand remember that some queries can take a lot of time and this will affect your whole team if you are sharing a DB.