Java EE/JPA: Improve query performance - store UUID as binary - database

I want to use UUIDs as primary keys because they are globally unique, which makes it easy (for example) to integrate data from a production environment into a running debug environment.
According to the following article: http://iops.io/blog/storing-billions-uuid-fields-mysql-innodb, SELECT/INSERT of millions of records on a table with the UUID encoded as BINARY(16) is significantly faster than with a plain CHAR(36).
Using the Hibernate @GenericGenerator annotation, I could use the built-in UUID generator for a UUID primary key:
@Id
@GeneratedValue(generator = "system-uuid")
@GenericGenerator(name = "system-uuid", strategy = "uuid")
private String id;
On the other hand, I could define a binary UUID as the primary key as follows:
@Id
@Getter
@Column(columnDefinition = "BINARY(16)", length = 16, updatable = false, nullable = false)
private byte[] id;
... and use @PrePersist to generate new UUIDs:
@PrePersist
private void prePersist() {
    if (this.id == null) {
        this.generateUUID();
    }
}
The problem with this solution is that filters in (native/named) queries have to work with the binary representation:
SELECT * from object o WHERE o.id=:id
What I really need is a way to store the UUID field as BINARY in the database, as described above, while representing the value as a plain UUID string.
Is there any way to do that? Is there any alternative?

Why don't you just use the special uuid type for the column?
@Type(type = "pg-uuid")
However, I also ran into the same problem with native queries when doing it that way.
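One way to keep BINARY(16) in the database while working with UUID strings in application code is to convert at the boundary. The following is a minimal sketch using only the JDK; the class name UuidBytes and its method names are made up for illustration and are not part of Hibernate or JPA. It assumes the 16 bytes are stored in the standard big-endian order (most significant bits first).

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper: converts between UUIDs, their string form, and 16-byte arrays. */
final class UuidBytes {
    private UuidBytes() {}

    /** Packs the two 64-bit halves of the UUID into 16 bytes, most significant half first. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from a 16-byte array produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    /** Parses the canonical 36-character string form into the binary form. */
    static byte[] parse(String text) {
        return toBytes(UUID.fromString(text));
    }

    /** Formats the binary form as the canonical 36-character string. */
    static String toText(byte[] bytes) {
        return fromBytes(bytes).toString();
    }
}
```

With a helper like this, a query filter could be bound as q.setParameter("id", UuidBytes.parse(idString)), so callers only ever see the string form. If the provider supports JPA 2.1, the same two conversions could probably be placed inside an AttributeConverter<UUID, byte[]> so the entity field can be declared as UUID; whether that works for an @Id attribute depends on the provider and should be verified.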

Related

How do I find the name of the ID generation sequence/table in Hibernate?

I have defined an entity as follows:
public class Chair {
    @GenericGenerator(name = "sequencePerEntityGenerator", strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator", parameters = {
        @Parameter(name = "prefer_sequence_per_entity", value = "true"),
        @Parameter(name = "sequence_per_entity_suffix", value = "_seq"),
        @Parameter(name = "initial_value", value = "5000000"),
        @Parameter(name = SequenceStyleGenerator.INCREMENT_PARAM, value = "1") })
    @GeneratedValue(strategy = GenerationType.AUTO, generator = "sequencePerEntityGenerator")
    @Id
    int id;
}
But I would like to know the names of the sequences that are created (for Chair.class), so that I can extract the names, use the dialect to build a nextval call of my own, and obtain a new ID without any .persist() call being made. Is this possible? If so, how? If not, what else could I do?
My final aim is to query multiple IDs (up to millions) using a single SQL statement, as provided in this Stack Overflow question, in an existing application. (Other options for the same in MySQL and Postgres exist as separate answers to other questions.)
PS: One may recommend using the PooledOptimizer, the HiLo optimizer, or any of the client-side optimizers already provided by Hibernate to optimize ID generation. However, given the heavy load my application is under due to other processes, it cannot allocate enough CPU time to the sequence optimizers, and the synchronized generate methods of these optimizers block threads (asynchronous persistence). Incidentally, using optimizers slows down multi-threaded persist calls on the same entity class and is slower than NoopOptimizer (no optimizer).
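As a rough illustration of the "many IDs in one statement" idea, the sketch below builds a PostgreSQL-specific statement that draws several values from one sequence via generate_series. It rests on two assumptions that should be checked against the actual schema: that with prefer_sequence_per_entity enabled the sequence is named after the entity name plus the configured suffix (here "Chair" + "_seq"), and that the sequence increment is 1 as in the mapping above. The class SequenceSql and its methods are hypothetical helpers, not Hibernate API.

```java
/** Hypothetical helper that builds a PostgreSQL statement fetching many sequence values at once. */
final class SequenceSql {
    private SequenceSql() {}

    /** Assumed naming rule: entity name followed by the per-entity suffix. Verify against the real schema. */
    static String sequenceName(String entityName, String suffix) {
        return entityName + suffix;
    }

    /** Returns a statement that yields 'count' rows, each holding one freshly drawn sequence value. */
    static String batchNextvalSql(String sequenceName, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        // Only plain identifiers are accepted, because the name is concatenated into the SQL text.
        if (!sequenceName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected sequence name: " + sequenceName);
        }
        return "SELECT nextval('" + sequenceName + "') FROM generate_series(1, " + count + ")";
    }
}
```

The resulting string could be run as a native query, for example em.createNativeQuery(SequenceSql.batchNextvalSql("Chair_seq", 1000)).getResultList(). To confirm which sequences actually exist, querying information_schema.sequences in PostgreSQL is one option.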

Database : Table and mappings for a Matrix style table

I am working on a Spring MVC application using Postgres, in which I am building a report generation form. For this, I have to save the form data. The report has a matrix-like part, which I don't know how to model. I can get it working, but I want something optimized.
As you can see from the image, the fields are listed on the left side, and each field has several values to be entered, as indicated.
So far I have only come up with one table, Parts; its class is shown below. But as each variable in the class will have 6 values, I would need to create 6 tables and some mapping between them. I want to avoid that. What can I do?
@Entity
@Table(name = "containment")
public class Containment {
    @Id
    @Column(name = "containment_id")
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "containment_gen")
    @SequenceGenerator(name = "containment_gen", sequenceName = "containment_seq")
    private Long containmentId;
    @Column(name = "parts_at_plant")
    private String partsAtPlant;
    @Column(name = "parts_at_logistics")
    private String partsAtLogistics;
}
I am creating classes rather than writing database tables directly. If someone wants to see the above as SQL, I am more than happy to write it. Thank you.

Google app engine JPA ancestor queries

I was wondering whether there is any cost/performance difference in using ancestor queries.
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", KeyFactory.createKey("User", 2343334443334L));
q.setParameter("xx", 233);
// File class with ancestors
@Entity
class File {
    @Id
    @....
    public Key ID;
    @Extension(vendorName = "datanucleus", key = "gae.parent-pk", value = "true")
    public Key parentID;
}
OR
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", 2343334443334L);
q.setParameter("xx", 233);
// File class without ancestors
@Entity
class File {
    @Id
    @....
    public Key ID;
    public long parentID;
}
While testing, I noticed that with the ancestor query my index doesn't include parentID (it says "with ancestors"), whereas with the non-ancestor version it does.
Is there a difference in index/datastore read/write cost?
The writing costs might be slightly lower (one fewer indexed property), but the storage costs might be slightly higher (a key for each child entity includes all of its ancestors).
In either case, the differences are insignificant unless you have a billion records. You will face more serious performance/cost differences depending on your data access patterns (i.e. how you access the data most of the time).

Most efficient way to do this select in JPA 2?

I have an Entity that looks like this:
@Entity
public class Relationship
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Key key;
    @Basic
    private UUID from;
    @Basic
    private UUID to;
}
Now I can have arbitrary levels of indirection here like so:
final Relationship r0 = new Relationship(a,b);
final Relationship r1 = new Relationship(b,c);
final Relationship r2 = new Relationship(c,d);
final Relationship rN = new Relationship(d,e);
What I want to find out, as efficiently as possible, is: given a, give me back e, where rN is N levels deep.
If I were writing regular SQL, I would do something like the following pseudocode:
SELECT r.to
FROM relationship r
WHERE r.from = 'a' AND
r.to NOT IN ( SELECT r.from FROM relationship r)
The only thing I can find online is references to passing a List as a parameter to CriteriaBuilder.In, but I don't have the list; I need to use a sub-select as the list.
Also, this is using the Datastore in Google App Engine, which restricts some of what it supports via JPA 2.
Am I going to have to resort to the low level Datastore API?
In the datastore, there's no way to issue a single query to get 'e' from 'a'. In fact, the only way to get e is to query each Relationship individually, one after another, so you'll need to do four queries.
You can pass in a list as a parameter, but that's only for an IN query. NOT IN queries are not available, and neither are JOINs.
(Aside: you could use a combination of the from and to properties to create a key, in which case you could just fetch the entity instead of querying.)
Usually, the GAE datastore way of doing things is to denormalize, i.e. write extra data that will enable your queries. (This is a pain, because it also means that when you update an entity, you need to be careful to update the denormalized data as well, and it can be hard to keep the two in sync. It's designed for web-type traffic where reads occur much more frequently than writes.)
This is a potential solution:
@Entity
public class Relationship
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Key key;
    @Basic
    private UUID from;
    @Basic
    private UUID to;
    @ElementCollection
    private Collection<UUID> reachable;
}
In this case you would simply query
WHERE from = 'a' and reachable = 'e'
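To make the denormalization idea concrete, here is a small in-memory sketch of how the reachable collection might be computed at write time. It assumes each node has at most one outgoing link, as in the a, b, c, d, e chain above; the class ReachableIndex and the plain Map standing in for the stored relationships are hypothetical and do not use the datastore API.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: derives the denormalized "reachable" set for one relationship. */
final class ReachableIndex {
    private ReachableIndex() {}

    /**
     * Follows the from -> to links starting at 'from' and collects every node reached,
     * in the order it is encountered. Stops at the end of the chain or when a node repeats.
     */
    static <T> Set<T> reachableFrom(T from, Map<T, T> links) {
        Set<T> reachable = new LinkedHashSet<>();
        T current = links.get(from);
        // Set.add returns false for a node seen before, which ends the walk on a cycle.
        while (current != null && !current.equals(from) && reachable.add(current)) {
            current = links.get(current);
        }
        return reachable;
    }
}
```

For the chain above, the set computed for a would contain b, c, d and e, which is what the reachable = 'e' filter relies on. The cost is that every new link at the end of a chain requires updating the sets of all relationships upstream of it.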
Solution
Surprisingly enough this recursive method doesn't error out with a StackOverflow even with 1000 levels of indirection, at least not on my local development server.
public UUID resolve(@Nonnull final UUID uuid)
{
    final EntityManager em = EMF.TRANSACTIONS_OPTIONAL.createEntityManager();
    try
    {
        final String qs = String.format("SELECT FROM %s a WHERE a.from = :from ", Alias.class.getName());
        final TypedQuery<Alias> q = em.createQuery(qs, Alias.class);
        q.setParameter("from", uuid);
        final Alias r;
        try
        {
            r = q.getSingleResult();
            final Key tok = KeyFactory.createKey(Alias.class.getSimpleName(), r.getTo().toString());
            if (em.find(Alias.class, tok) == null)
            {
                return r.getTo();
            }
            else
            {
                return this.resolve(r.getTo());
            }
        }
        catch (final NoResultException e)
        {
            /* this is expected when there are no more aliases */
            return uuid;
        }
    }
    finally
    {
        em.close();
    }
}
The stress test code I had times out on the actual GAE service, but I am not worried about it: in practice I won't be creating more than one level of indirection at a time, there won't be more than a handful of indirections, and it will all get hoisted up into Memcache in the final version anyway.
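If stack depth ever did become a concern, the same walk can be written as a loop. The sketch below is an in-memory stand-in rather than the datastore code: a Map plays the role of the Alias lookups, and the class name AliasResolver is made up for illustration. It also adds a guard against alias cycles, which the recursive version does not have.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory version of the alias walk, written as a loop instead of recursion. */
final class AliasResolver {
    private AliasResolver() {}

    /** Follows aliases from 'start' until a value with no further alias is found. */
    static <T> T resolve(T start, Map<T, T> aliases) {
        Set<T> seen = new HashSet<>();
        T current = start;
        // seen.add returns false when a value repeats, which means the aliases form a cycle.
        while (seen.add(current)) {
            T next = aliases.get(current);
            if (next == null) {
                return current; // no more aliases, this is the final target
            }
            current = next;
        }
        throw new IllegalStateException("alias cycle detected at " + current);
    }
}
```

Because the loop keeps no call frames, the chain length is limited only by the memory used for the seen set. In the real code each aliases.get call would correspond to one datastore query, so the number of round trips stays the same as in the recursive version.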

Store strings of arbitrary length in PostgreSQL

I have a Spring application which uses JPA (Hibernate), initially created with Spring Roo. I need to store Strings of arbitrary length, so I've annotated the field with @Lob:
public class MyEntity {
    @NotNull
    @Size(min = 2)
    @Lob
    private String message;
    ...
}
The application works fine on localhost, but after deploying it to an external server a problem with encoding has appeared. For that reason I'd like to check whether the data stored in the PostgreSQL database is correct. The application creates/updates the tables automatically, and for that field (message) it has created a column of type:
text NOT NULL
The problem is that after storing data, if I browse the table or just do a SELECT on that column, I see numbers instead of the text. Those numbers seem to be identifiers pointing to "somewhere" else where that information is stored.
Can anyone tell me exactly what these identifiers are, and whether there is any way to see the data stored in a @Lob column from pgAdmin or a SELECT statement?
Is there any better way to store Strings of arbitrary length in JPA?
Thanks.
I would recommend skipping the '@Lob' annotation and using columnDefinition like this:
@Column(columnDefinition="TEXT")
See if that helps you view the data while browsing the database itself.
Use the @Lob annotation; it is correct. The column stores an OID that refers to the pg_largeobject table (in pgAdmin: Catalogs -> PostgreSQL -> Tables -> pg_largeobject).
The binary data is stored here efficiently and JPA will correctly get the data out and store it for you with this as an implementation detail.
Old question, but here is what I found when I encountered this:
http://www.solewing.org/blog/2015/08/hibernate-postgresql-and-lob-string/
Relevant parts below.
@Entity
@Table(name = "note")
@Access(AccessType.FIELD)
class NoteEntity {
    @Id
    private Long id;
    @Lob
    @Column(name = "note_text")
    private String noteText;
    public NoteEntity() { }
    public NoteEntity(String noteText) { this.noteText = noteText; }
}
The Hibernate PostgreSQL9Dialect stores @Lob String attribute values by explicitly creating a large object instance and then storing the OID of the object in the column associated with the attribute.
Obviously, the text of our notes isn't really in the column. So where is it? The answer is that Hibernate explicitly created a large object for each note and stored the OID of the object in the column. If we use some PostgreSQL large object functions, we can retrieve the text itself.
Use this to query:
SELECT id,
convert_from(loread(
lo_open(note_text::int, x'40000'::int), x'40000'::int), 'UTF-8')
AS note_text
FROM note

Resources