I'm embarking on an adventure in JPA and want to stay database agnostic as much as possible. What orm.xml features should I avoid in order to stay database agnostic?
For example, if I use strategy="AUTO" in orm.xml as follows:
<id name="id">
<generated-value strategy="AUTO" />
</id>
...then MySQL shows it as an AUTO_INCREMENT column which may (I'm not yet sure) cause problems if I needed to deploy to Oracle.
JPA features
You can use all JPA features. At worst you will need to change the annotations or orm.xml (e.g. if you want to use a special sequence in Oracle), but every feature is supported one way or another without impacting the code. That's what's nice about an ORM -- you have an extra layer of abstraction.
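For example (a minimal sketch in annotation form; the entity name and Oracle sequence name are made up, and the same mapping can equally be expressed in orm.xml), switching from AUTO to an explicit sequence touches only the mapping, not the code that persists the entity:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Customer {

    // AUTO would let the provider pick IDENTITY, SEQUENCE or TABLE per database.
    // Pinning an Oracle sequence instead only changes this mapping; the rest of
    // the code keeps calling em.persist(customer) exactly as before.
    @Id
    @SequenceGenerator(name = "customer_seq", sequenceName = "CUSTOMER_SEQ", allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_seq")
    private Long id;

    private String name;

    public Long getId() { return id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}
```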
Reserved keywords
Don't use reserved words to name your tables and columns
Keep close to SQL-92 standard
The way the queries (especially the native ones) are translated is loose. This is great in some cases, but it can sometimes lead to problems (see the sketch after this list):
Don't use AS in native queries
Never use SELECT * in native queries
Use = for equality, not ==
Use only the SQL-92 standard functions
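As a rough illustration (the orders table and its columns are hypothetical), a native query written against those rules stays portable:

```java
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.Query;

public class OrderDao {

    private final EntityManager em;

    public OrderDao(EntityManager em) {
        this.em = em;
    }

    @SuppressWarnings("unchecked")
    public List<Object[]> findPaidOrders(long customerId) {
        // Columns listed explicitly (no SELECT *), table alias without the AS
        // keyword, a single = for equality, and only SQL-92 constructs.
        Query q = em.createNativeQuery(
            "SELECT o.id, o.total_amount "
          + "FROM orders o "
          + "WHERE o.customer_id = ?1 AND o.status = 'PAID'");
        q.setParameter(1, customerId);
        return q.getResultList();
    }
}
```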
I am not familiar with JPA, but in general a reasonable ORM should be database agnostic (for the major databases) for all of its mappings.
An "AUTO" increment strategy, especially, should work out of the box.
When switching the database, you have to deal with migration issues for the existing data.
In general, MySQL's "AUTO_INCREMENT" should be used when a value generation strategy of "IDENTITY" is selected; on Sybase that's SERIAL, on DB2 ..., etc. Some RDBMSs don't have an equivalent.
Value generation of "AUTO" leaves the implementation to choose what is best for that datastore. Yes, on MySQL it may choose AUTO_INCREMENT, on Sybase SERIAL, and on Oracle SEQUENCE, etc., but from the user code's point of view it will (should) work on any spec-compliant implementation. Obviously you cannot then switch JPA implementations and expect the exact same mechanism to be used, since JPA implementation #1 may choose AUTO_INCREMENT on MySQL while JPA implementation #2 may choose some internal mechanism, and so on.
Related
A new project we are starting requires multi-tenancy. At the storage level this can be done in several ways (separate database / separate schema / shared schema).
To keep operational costs down we believe that "shared schema - shared tables" is the best way to continue, so all the tenants will share the same tables in the same database/schema.
However, a constraint is to provide good tenant isolation and security. For this we could use encryption. If we are able to provide each tenant with its own keypair, then we get good security and good isolation. Each tenant can only read its own data, and we don't have to add a discriminator field to each table either.
How can we implement this technically? If you query a table you will get a lot of data you are not able to decrypt (data from other tenants). Joins etc. will also carry a higher load because of the other tenants' records in the database.
I've already read a couple of articles on MSDN and watched some presentations, but they keep it very high level and abstract. Any thoughts on this?
Is something like the above possible? I thought you could do something with Amazon RDS? Is it possible to provide an example, e.g. on GitHub?
Based on what you've shared, and with some reading between the lines, I am wary of this approach. By itself, shared schema is a very reasonable design for multi-tenancy; where I see problems is with the suggested use of encryption.
While PostgreSQL does support encryption, it's done via functions in the pgcrypto module. RDS, as a managed service for PostgreSQL, adds the ability to easily provision encrypted volumes as well, but to a database user/developer, it's going to look pretty much the same.
The docs suggest using pgcrypto if you only need to encrypt small subsets of your data that you don't need to filter or join on - but it's not clear how much of the data you are looking to encrypt. If it's only a handful of columns and you don't need to filter on them, this may work. Otherwise, reconsider - extensive use of the pgcrypto functions will render almost all standard database operations impossibly inefficient. A WHERE clause would require decrypting the column, in turn requiring scanning/decrypting the full table; there would be zero use of indexes. Your performance will slow to a crawl very quickly.
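To make that concrete, here is a rough JDBC sketch (the tenant_data table, payload column and key handling are hypothetical, and it assumes the pgcrypto extension is installed):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EncryptedLookup {

    /**
     * Looks up rows whose encrypted payload matches a value. Because the
     * pgp_sym_decrypt() call sits in the WHERE clause, PostgreSQL must decrypt
     * the column for every row before it can compare -- no index on "payload"
     * can be used, so the query degrades to a full table scan.
     */
    public void findByDecryptedValue(Connection conn, String tenantKey, String needle)
            throws SQLException {
        String sql =
            "SELECT id, pgp_sym_decrypt(payload, ?) AS payload "
          + "FROM tenant_data "
          + "WHERE pgp_sym_decrypt(payload, ?) = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, tenantKey);
            ps.setString(2, tenantKey);
            ps.setString(3, needle);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " -> " + rs.getString("payload"));
                }
            }
        }
    }
}
```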
A major consideration you haven't provided is how you are providing access - for example, a web application, where you completely mediate access with a single, trusted account? Or allowing the customers to connect directly to the database? In the former case, your code would be managing all access anyway, and would always need access to all the keys; why incur the overhead? In the latter case, you'd probably render the database unusable to the customer, because all of the standard query tools would be difficult to use.
More broadly, in my experience, a schema-per-tenant approach can offer a good balance between isolation, efficiency, and development overhead. And with judicious use of roles in PostgreSQL, you can enforce reasonable access controls for direct access (you can do the same with rows, though in my view that would require more overhead to administer correctly).
Take a look at some of the commonly used application frameworks to learn more: Rails offers the Apartment gem (https://github.com/influitive/apartment); Django has the django-tenants library (http://django-tenants.readthedocs.io/en/latest/); Hibernate has a pluggable tenant framework (e.g., https://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html)
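To give a rough idea of what Hibernate's hooks look like, here is a sketch against the 4.x/5.x API; TenantContext is a made-up holder class, and a complete setup would also need a MultiTenantConnectionProvider that switches the schema on each connection:

```java
import org.hibernate.context.spi.CurrentTenantIdentifierResolver;

/**
 * Sketch of a Hibernate (4.x/5.x style) tenant resolver for a
 * schema-per-tenant setup. TenantContext is a hypothetical holder that the
 * request-handling code would populate, e.g. from the logged-in user.
 */
public class SchemaPerTenantResolver implements CurrentTenantIdentifierResolver {

    private static final String DEFAULT_TENANT = "public";

    @Override
    public String resolveCurrentTenantIdentifier() {
        String tenant = TenantContext.getCurrentTenant();
        return tenant != null ? tenant : DEFAULT_TENANT;
    }

    @Override
    public boolean validateExistingCurrentSessions() {
        // Ask Hibernate to check that reused sessions still belong to the same tenant.
        return true;
    }
}

/** Hypothetical per-request tenant holder backed by a ThreadLocal. */
class TenantContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void setCurrentTenant(String tenant) { CURRENT.set(tenant); }

    static String getCurrentTenant() { return CURRENT.get(); }

    static void clear() { CURRENT.remove(); }
}
```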
Hope this helps.
When should one use MapDB vs. a regular database through an ORM? Other than having a direct mapping to java.util.Map, which could be implemented with an ORM as well.
Jan's answer is highly biased, since he is the author of MapDb.
MapDB is superb for "internal storage" and when there is a single entity with 'values' associated to it. Its interface is very straightforward, and you can either serialize in your own format (recommended) or rely on the highly compact internal serialization format in MapDB.
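For a sense of how little ceremony that involves, here is a minimal sketch against the MapDB 3.x API (the file name and map contents are arbitrary):

```java
import java.util.concurrent.ConcurrentMap;

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

public class MapDbExample {

    public static void main(String[] args) {
        // File-backed store with transactions enabled; "users.db" is just an example name.
        DB db = DBMaker.fileDB("users.db").transactionEnable().make();

        // A persistent map with the same java.util.Map-style interface described above.
        ConcurrentMap<Long, String> users = db
            .hashMap("users", Serializer.LONG, Serializer.STRING)
            .createOrOpen();

        users.put(1L, "alice");
        users.put(2L, "bob");

        db.commit(); // flush the changes to disk
        db.close();
    }
}
```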
ORMs are most valuable when the stored data is under some type of "external control". This could be that there are storage policies in the company, pre-defined RDBMS schemas, or perhaps that the data must be queryable by some reporting engine that is made for SQL.
Then there are a multitude of situations where opinion and personal preference make all the difference. Personally, I am in Jan's corner and think that ORMs quickly become incredibly hard to deal with, and if you take 'data migration' into account, I think MapDB (and many other NoSQL alternatives) wins out more often than not. For the case of external query engines, I would send data modification events from the primary application to a secondary system that interprets them and updates the "view" needed by such SQL-only systems.
I would use MapDB if you need extra performance and flexibility. Otherwise use a regular ORM with a DB.
It's not really a problem, but it surprises me:
When I use Grails with different DBs, I get different counter increments:
with the out-of-the-box HSQLDB, every table gets its own counter, which is always increased by 1
with an Oracle DB, it seems that all tables use the same global counter
now that I am using JavaDB/Derby, the generated ids are huge!
Where can I find more information about this behaviour, and which one is best?
HSQL seems to keep the counters small.
With Oracle, I get a globally unique id - also a nice feature.
But what about the Derby behaviour?
It really depends on the default id generation strategy in the specific dialect. Grails allows you to customize the generation strategy with the mapping closure.
The 'safest' generation strategy (i.e. the one supported by every RDBMS) is TABLE, and this is the preferred choice of many JPA implementations. This is probably what you get in HSQLDB. However, Oracle supports sequences, and these objects are generally better optimized for handling key generation -- hence the dialect for Oracle seems to use one global sequence. I'm not familiar with Derby, but there is probably identity column support there, and what you get is some sort of UUID.
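In plain JPA terms (Grails GORM sits on top of Hibernate, and the same choice is exposed through the mapping closure), pinning the strategy explicitly takes the decision away from the dialect so every database behaves the same way. The entity and generator table names below are made up:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Entity
public class Book {

    // A TABLE generator keeps one counter row per entity in an ordinary table,
    // so every database produces the same small, per-entity ids instead of
    // whatever the dialect's default (identity column, global sequence, ...) is.
    @Id
    @TableGenerator(name = "book_gen", table = "id_gen",
                    pkColumnName = "gen_name", valueColumnName = "gen_value",
                    allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "book_gen")
    private Long id;

    private String title;
}
```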
I am using ORM to automatically create tables from model classes. I am naming the classes and their fields in a way that is natural for the application. The ORM then uses those same names for the tables and columns, and automatically generates names of other objects like constraints and sequences which are completely abstracted by the ORM.
I am not declaring how the tables, columns, etc. should be named. I leave it to the ORM to decide. I see this as good sense from the application's point of view.
The DBA for my team does not like this one bit. The DBA says that if column "B" in table "A" has a foreign key constraint to field "Y" in table "X", that the name must be "A.X_Y" instead of "A.B". The DBA says this is the "correct" way to name foreign keys and that the ORM is therefore naming them incorrectly. Both of the ORM engines I am familiar with allow class/field names to be explicitly mapped to table/column names, so I am aware it is possible to accommodate the DBA without changing the classes.
My question is, in practice, does doing this (explicitly mapping the names) necessarily introduce an extra entity (i.e. new configuration files/sections, new code) or coupling (i.e. imports, decorators, annotations) in the application that might not otherwise exist? ORM designs vary, so I would think it is possible that the answer is different for different engines.
If the answer to this is nearly universally yes, I would consider that a good argument that the DBA should yield to the ORM in the interest of productivity, on the logic that the database exists to serve the needs of the application, not the other way around. Commentary on this point is welcome also.
Please note, I am NOT asking for specific recommendations on how classes/fields/tables/columns should be named.
ORM designs vary, so I would think it is possible that the answer is different for different engines.
You are right here. Depending on the ORM, explicitly specifying the field->column mapping might (or might not) introduce extra entities in the form of configuration entries, annotations, attributes etc. Having said that, this is the bread and butter (or should I say the very basics) for an ORM engine and hence handling these mappings usually does not add a noticeable overhead.
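To give a feel for the scale of that overhead, here is a sketch in JPA annotations using the A/B/X/Y names from the question; each override is a single annotation attribute:

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

// Table "A" from the question; without @JoinColumn the ORM would derive its own
// foreign-key column name (typically from the field name "product").
@Entity
@Table(name = "A")
public class OrderLine {

    @Id
    @GeneratedValue
    private Long id;

    // One extra annotation renames the FK column to the DBA's "X_Y" convention.
    @ManyToOne
    @JoinColumn(name = "X_Y")
    private Product product;

    @Column(name = "QUANTITY") // explicit column name, if the DBA wants one here too
    private int quantity;
}

// Table "X" from the question, whose primary key column is "Y".
@Entity
@Table(name = "X")
class Product {

    @Id
    @GeneratedValue
    @Column(name = "Y")
    private Long id;
}
```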
I feel you definitely cannot use this as an argument against explicitly naming the tables, columns, etc. More often than not you will have another application that uses the same database, and the default names generated by your application might make absolutely no sense for that application. I would let the DBA decide the right nomenclature for the database artifacts and tweak the mapping to my domain model in my app. After all, that's where the ORM adds value.
No, explicitly mapping the names does not create an extra entity or coupling; it defines a policy and process which your DBA has decided to be appropriate for their database(s). I'd contend, as well, that your logic is a bit backwards; the application exists to serve the needs of the database. The software which ANY company puts in place exists to serve / populate / present the data which exists in the database, not the other way around.
Consider this: the software that does whatever your company does will change as requirements and technology change, and will get refactored as standards and complexity evolve. The data stored in the database, however, will not; even as the schema changes and evolves, the underlying data will change very little, if at all.
Coding standards exist for a reason - if I were the DBA, I might suggest that the productivity benefit of using the ORM without modification should yield in the name of maintainability.
It's a question of trade-offs: if the additional effort to name the foreign key column according to the corporate standard is minimal (and in most ORMs I've worked with, it is), I'd just do it.
If your ORM makes it super hard, I'd explain to the DBA that this is a different architecture than the one he's used to - the use of an ORM means there will rarely if ever be any manual SQL writing - it's all automagically created by the ORM, so it makes sense to relax the coding standards.
I know it is rather a heated question, but anyway I'd like to hear the opinions of those on Stack Overflow. Given that XML support is quite good in SQL Server 2005/2008, and there's no concern about database independence, why does one need LINQ to SQL, Entity Framework, NHibernate and the like, which are quite complex and awkward in advanced use cases, if by using POCOs, XmlSerializer, and stored procedures that process XML, one can achieve a much less complex middle tier? For reference, see the link: http://weblogs.asp.net/jezell/archive/2007/04/13/who-needs-orm-i-ve-got-sql-2005.aspx
The "less complex middle tier" is what worries me... the point of the ORM is to ensure that most of the complexity relates to your actual domain (whether that is order-processing, feed-reading, or whatever). That complexity has to go somewhere. And the last place you want that complexity is in the DB - your least scalable commodity (you generally scale the db server up (which is expensive), where-as you scale the app-servers out (much cheaper)).
There may be a case for using document databases instead of relational databases, but RDBMS are not going anywhere. Generally I would suggest: limit your xml usage at the db to sensible amounts. It can be a very effective tool - but be careful you aren't creating an inner-platform. The relational database (by whichever vendor) is exceptional at its job, with sophisticated indexing, ACID, referential integrity etc... leverage that power.
XmlSerialization and XML columns in databases can be difficult to work with, but I'm sure you can make it work. You'll still have to overcome standard ORM challenges like circular references. SQL queries can be quite awkward against XML database columns.
I would argue it is not ORMs that are complex in advanced use cases, it's the object-relational mismatch that is complex in advanced use cases. I don't see how XML and stored procedures really address those advanced use cases in a better way.
XML is not relational or object-oriented in nature and you are adding an additional mismatch to the mix.