Most efficient way to do this select in JPA 2? - google-app-engine

I have an Entity that looks like this:
@Entity
public class Relationship
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Key key;
    @Basic
    private UUID from;
    @Basic
    private UUID to;
}
Now I can have arbitrary levels of indirection here like so:
final Relationship r0 = new Relationship(a,b);
final Relationship r1 = new Relationship(b,c);
final Relationship r2 = new Relationship(c,d);
final Relationship rN = new Relationship(d,e);
What I want is, given a, to get back e as efficiently as possible, where the chain is N levels deep.
If I were writing regular SQL I would do something like the following pseudocode:
SELECT r.to
FROM relationship r
WHERE r.from = 'a' AND
r.to NOT IN ( SELECT r.from FROM relationship r)
The only thing I can find online is references to passing a List as a parameter to CriteriaBuilder.In, but I don't have the list. Can I use a sub-select as the list?
This runs on the Google App Engine Datastore, which supports only a restricted subset of JPA 2.
Am I going to have to resort to the low-level Datastore API?

In the datastore, there's no way to issue a single query to get 'e' from 'a'. The only way to get e is to query each Relationship in turn, following the chain one link at a time, so in this example you'll need four queries.
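The link-by-link walk described above can be sketched as a loop. This is a minimal illustration, not datastore code: the `ChainWalker` class and its in-memory map are hypothetical stand-ins, where each `get` on the map plays the role of one query of the form "find the Relationship whose from equals this id".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for the datastore: one map lookup models one query.
final class ChainWalker {
    private final Map<UUID, UUID> toByFrom = new HashMap<>();

    // Records a Relationship(from, to).
    void add(UUID from, UUID to) {
        toByFrom.put(from, to);
    }

    // Follows from -> to links one hop at a time until no further link exists.
    UUID resolve(UUID start) {
        Set<UUID> seen = new HashSet<>();
        UUID current = start;
        while (seen.add(current)) {            // add() returns false once an id repeats
            UUID next = toByFrom.get(current); // one "query" per hop
            if (next == null) {
                return current;                // end of the chain
            }
            current = next;
        }
        throw new IllegalStateException("cycle detected at " + current);
    }
}
```

With the chain a -> b -> c -> d -> e loaded, `resolve(a)` performs four successful lookups plus one miss before returning e. The cycle guard is an addition of this sketch; the question does not say whether cycles can occur.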
You can pass in a list as a parameter, but that's only for an IN query. NOT IN queries are not available, and neither are JOINs.
(Aside: you could use a combination of the from and to properties to create a key, in which case you could fetch the entity directly instead of querying.)
Usually, the GAE datastore way of doing things is to denormalize, i.e. write extra data that will enable your queries. (This is a pain, because when you update an entity you also need to update the denormalized data, and keeping the two in sync can be hard. The datastore is designed for web-type traffic, where reads occur much more frequently than writes.)
This is a potential solution:
@Entity
public class Relationship
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Key key;
    @Basic
    private UUID from;
    @Basic
    private UUID to;
    @ElementCollection
    private Collection<UUID> reachable;
}
In this case you would simply query
WHERE from = 'a' and reachable = 'e'
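The cost of this approach is on the write side: every new link has to update the reachable sets of the rows that can already reach its starting point. A rough in-memory sketch of that bookkeeping follows. The `ReachabilityIndex` class and its method names are hypothetical, and on the datastore each modified set would be a separate entity write.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory model of the denormalized "reachable" collection.
final class ReachabilityIndex {
    // from -> every id reachable from it
    private final Map<UUID, Set<UUID>> reachable = new HashMap<>();

    // Adds the link from -> to and propagates what becomes reachable.
    void link(UUID from, UUID to) {
        // Everything gained by this link: "to" plus whatever "to" already reaches.
        Set<UUID> gained = new HashSet<>();
        gained.add(to);
        gained.addAll(reachable.getOrDefault(to, Collections.emptySet()));

        // The row for "from" itself.
        reachable.computeIfAbsent(from, k -> new HashSet<>()).addAll(gained);

        // Every existing row that can already reach "from" gains the same ids.
        for (Set<UUID> set : reachable.values()) {
            if (set.contains(from)) {
                set.addAll(gained);
            }
        }
    }

    // Models the query: WHERE from = :from AND reachable = :target
    boolean canReach(UUID from, UUID target) {
        return reachable.getOrDefault(from, Collections.emptySet()).contains(target);
    }
}
```

Links can arrive in any order; the propagation step keeps earlier rows consistent when a later link joins two chains.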

Solution
Surprisingly enough this recursive method doesn't error out with a StackOverflow even with 1000 levels of indirection, at least not on my local development server.
public UUID resolve(@Nonnull final UUID uuid)
{
    final EntityManager em = EMF.TRANSACTIONS_OPTIONAL.createEntityManager();
    try
    {
        final String qs = String.format("SELECT FROM %s a WHERE a.from = :from ", Alias.class.getName());
        final TypedQuery<Alias> q = em.createQuery(qs, Alias.class);
        q.setParameter("from", uuid);
        final Alias r;
        try
        {
            r = q.getSingleResult();
            final Key tok = KeyFactory.createKey(Alias.class.getSimpleName(), r.getTo().toString());
            if (em.find(Alias.class, tok) == null)
            {
                return r.getTo();
            }
            else
            {
                return this.resolve(r.getTo());
            }
        }
        catch (final NoResultException e)
        {
            /* this is expected when there are no more aliases */
            return uuid;
        }
    }
    finally
    {
        em.close();
    }
}
The stress test code I had times out on the actual GAE service, but I am not worried about it: in practice I won't be creating more than one level of indirection at a time, there won't be more than a handful of indirections in total, and it will all get hoisted up into Memcache in the final version anyway.
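The caching idea mentioned above could look roughly like the sketch below. It is an assumption-laden illustration: `CachedResolver` is a hypothetical name, the `ConcurrentHashMap` merely stands in for Memcache, and the wrapped function represents whatever datastore-backed `resolve` is in use.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

// Hypothetical read-through cache in front of a slow resolve function.
final class CachedResolver {
    private final Map<UUID, UUID> cache = new ConcurrentHashMap<>(); // stand-in for Memcache
    private final UnaryOperator<UUID> slowResolve;                   // e.g. the datastore walk
    private final AtomicInteger misses = new AtomicInteger();

    CachedResolver(UnaryOperator<UUID> slowResolve) {
        this.slowResolve = slowResolve;
    }

    // Returns the cached answer if present, otherwise resolves and remembers it.
    UUID resolve(UUID id) {
        UUID hit = cache.get(id);
        if (hit != null) {
            return hit;
        }
        misses.incrementAndGet();
        UUID resolved = slowResolve.apply(id);
        cache.put(id, resolved);
        return resolved;
    }

    // Must be called whenever an alias is added or changed, or stale answers survive.
    void invalidate() {
        cache.clear();
    }

    int misses() {
        return misses.get();
    }
}
```

Clearing the whole cache on every write is the bluntest possible invalidation; it is only reasonable here because the post expects writes to be rare.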

Related

Database : Table and mappings for a Matrix style table

I am working on a Spring MVC application using Postgres, in which I am trying to build a report generation form. For this I have to save the form's data, but the report has a matrix-like part that I don't know how to model. I can get something working, but I want it to be optimized.
As you can see from the image, the fields are on the left side, and each field has several values to be inserted, as indicated.
So far I have come up with only one table, Parts, whose class is shown below. But as each variable in the class will have 6 values, this would require me to create 6 tables with some mapping between them. I want to avoid that. What can I do?
@Entity
@Table(name = "containment")
public class Containment {
    @Id
    @Column(name = "containment_id")
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "containment_gen")
    @SequenceGenerator(name = "containment_gen", sequenceName = "containment_seq")
    private Long containmentId;
    @Column(name = "parts_at_plant")
    private String partsAtPlant;
    @Column(name = "parts_at_logistics")
    private String partsAtLogistics;
}
I am creating classes, not writing database tables directly. If someone wants to see the above as SQL, I am more than happy to write it. Thank you.

Java EE/JPA: Improve query performance - store UUID as binary

I want to use UUIDs as primary keys because they are globally unique, which makes it easy (for example) to integrate data from a production environment into a running debug environment.
According to the following article: http://iops.io/blog/storing-billions-uuid-fields-mysql-innodb SELECT/INSERT of millions of records into a table using a UUID stored as BINARY(16) is significantly faster than using a plain CHAR(36).
Now, using Hibernate's @GenericGenerator annotation, I could use its built-in UUID generator for a String primary key:
@Id
@GeneratedValue(generator = "system-uuid")
@GenericGenerator(name = "system-uuid", strategy = "uuid")
private String id;
On the other hand, I could define a binary UUID as the primary key as follows:
@Id
@Getter
@Column(columnDefinition = "BINARY(16)", length = 16, updatable = false, nullable = false)
private byte[] id;
... and use @PrePersist to generate new UUIDs:
@PrePersist
private void prePersist() {
    if (this.id == null) {
        this.generateUUID();
    }
}
The problem with this solution is that filters in (native/named) queries then have to use the binary representation:
SELECT * from object o WHERE o.id=:id
What I really need is the ability to store the UUID field as BINARY in the database, as described above, while representing the value as a plain UUID string.
Is there any way to do that? Is there any alternative?
Why don't you just use the special uuid type for the column?
@Type(type = "pg-uuid")
But I also have the same problem with native queries when doing it that way.
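One way to approach the native-query problem is to keep the UUID as a string in application code and convert it to its 16-byte form only at the query boundary. The helper below is a sketch using only the JDK; the class name `UuidBytes` is hypothetical, and it assumes the column stores the most significant 8 bytes first, which is the same order as the textual form.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: converts between java.util.UUID and a BINARY(16) value.
final class UuidBytes {
    private UuidBytes() {
    }

    // Packs the UUID big-endian: most significant long first, then least significant.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes; rejects anything that is not exactly 16 bytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

A filter could then be bound with something like `query.setParameter("id", UuidBytes.toBytes(UUID.fromString(idString)))`, where `idString` is the string form held by the caller.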

Google app engine JPA ancestor queries

I was wondering whether there is any cost/performance difference in using ancestor queries.
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", KeyFactory.createKey("User", 2343334443334L));
q.setParameter("xx",233);
// File class with ancestors
@Entity
class File {
    @Id
    @....
    public Key ID;
    @Extension(vendorName = "datanucleus", key = "gae.parent-pk", value = "true")
    public Key parentID;
}
OR
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", 2343334443334L);
q.setParameter("xx",233);
// File class without ancestors
@Entity
class File {
    @Id
    @....
    public Key ID;
    public long parentID;
}
I was testing this, and with the ancestor version my index doesn't include parentID (it says "with ancestors"), whereas with the non-ancestor version it does.
Is there a difference in index/datastore read/write cost?
The writing costs might be slightly lower (one fewer indexed property), but the storage costs might be slightly higher (a key for each child entity includes all of its ancestors).
In either case, the differences are insignificant unless you have a billion records. You will face more serious performance/cost differences depending on your data access patterns (i.e. how you access the data most of the time).

Query Google DataStore

I have the following Objectify entity to store data in the Google Datastore.
public class Record implements Serializable {
    private static final long serialVersionUID = 201203171843L;
    @Id
    private Long id;
    private String name; // John Smith
    private Integer age; // 43
    private String gender; // Male/Female
    private String eventName; // Name of the marathon/event
    private String eventCityName; // City of the event
    private String eventStateName; // State of the event
    private Date eventDate; // event date
    // Getters & Setters
}
Now, my question is: how can I query my database to get the count of Records for a given eventName, or for a given event City+State? Or get a list of all City+Name?
On App Engine counting is very expensive: you basically need to query with a certain condition (eventName = something), then count all the results. The cost is that of a keys-only query (1 read + 1 small operation) and increases with the number of entities counted. For example, counting 1 million entities would cost $0.8.
What is normally done is to keep the count as a property inside a dedicated entity: increase the property value when the count goes up (entity added) and decrease it when the count goes down (entity deleted).
If you plan to do this on a larger scale, be aware that there is a write/update limit of about 5 writes/s per entity (per entity group, actually). See sharded counters for how to work around this.
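The sharded-counter idea can be illustrated without the datastore: the count is split across several independent slots, each write touches one randomly chosen slot, and a read sums them all. The `ShardedCounter` class below is a hypothetical in-memory sketch; in a real application each slot would be its own entity, so that the per-entity write limit applies to each shard separately.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical in-memory model of a sharded counter.
final class ShardedCounter {
    private final AtomicLongArray shards; // one slot per shard (one entity each on the datastore)

    ShardedCounter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shards = new AtomicLongArray(shardCount);
    }

    // A write picks one shard at random, spreading contention across shards.
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(index);
    }

    // Mirror of increment() for when a counted entity is deleted.
    void decrement() {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.decrementAndGet(index);
    }

    // A read has to visit every shard and add the partial counts together.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

More shards raise the sustainable write rate but make each read proportionally more expensive, so the shard count is a trade-off between the two.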

Select from table using XML column

I am creating a task-scheduler on SQL Server 2008.
I have a table that I use to store tasks. Each task is a task name (e.g. ImportFile) and arguments. I store arguments in XML column, since different tasks have different signatures.
Table is as follows:
Id:integer(PK) | operation:nvarchar | Arguments:xml
Before queuing a task, I often need to verify that given task hasn't been scheduled yet. The lookup is done based on both operation and args.
Question: Using Linq-to-Sql how can I check if given operation+args is present in the queue already?
I am looking for something like:
var isTaskScheduled = db.Tasks.Any(t =>
    t.Operation == task.Operation &&
    t.Arguments == task.ArgumentsAsXElement);
(which doesn't work because SQL Server can't compare XML type)
Any alternative implementation suggestions?
You might want to surface e.g. a string property that encapsulates your Arguments, or maybe it would be sufficient to have e.g. the length and a CRC of your Arguments as extra properties on your class:
public partial class Task
{
    public int ArgumentLength
    { .... }
    public int ArgumentCRC
    { .... }
}
That way, if the length (of your XML) and the CRC both match, it is fairly safe to assume the two XML documents are identical. Your check would then be something like:
var isTaskScheduled =
db.Tasks.Any(t => t.Operation == task.Operation &&
t.ArgumentLength == task.ArgumentLength &&
t.ArgumentCRC == task.ArgumentCRC);
or something like that.
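To make the length-plus-CRC idea concrete, here is a small sketch of how such a fingerprint could be computed. It is written in Java to match the other examples on this page; any CRC-32 routine available in .NET would play the same role in the C# class above. The `ArgumentFingerprint` name is hypothetical, and the sketch assumes the XML is compared as UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical fingerprint of a serialized argument document: byte length plus CRC-32.
final class ArgumentFingerprint {
    final int length;
    final long crc;

    ArgumentFingerprint(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        CRC32 crc32 = new CRC32();
        crc32.update(bytes);
        this.length = bytes.length;
        this.crc = crc32.getValue();
    }

    // Two fingerprints match only when both the length and the checksum agree.
    boolean matches(ArgumentFingerprint other) {
        return this.length == other.length && this.crc == other.crc;
    }
}
```

Two caveats apply. A match is very likely but not guaranteed to mean identical content, so a final exact comparison in memory is a reasonable safeguard. And the fingerprint is textual: XML that is equivalent but differs in whitespace or attribute order produces a different value, so arguments would need to be serialized the same way every time.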
This may be a stretch, but you could use a "Hashcode" when saving the data to the database, then query on the hashcode value at a later date / time.
This assumes that you have a class that represents your task entity and that you have overridden the GetHashCode method of said class.
Now, when you go to query the database to see if the task is in the scheduled queue, you simply query on the hashcode, thus avoiding the need to do any xml poking at query time.
var t1 = new Task{Operation="Run", Arguments="someXElement.value"};
var t2 = new Task{Operation="Run", Arguments="someXElement.value"};
In the code above, t1 and t2 produce the same hash code because you are overriding GetHashCode and computing the hash from Operation + Arguments.Value. If you store the hash code in the DB, you can easily tell whether the DB already holds an object whose hash code equals the one you are checking for.
This may be similar to what marc_s was talking about.
You can write a class which implements IComparable:
public class XMLArgument : IComparable
{
    public XMLArgument(string argument)
    {
    }
    public int CompareTo(object obj)
    {
        ...
    }
}
var isTaskScheduled = db.Tasks.Any(t =>
    t.Operation == task.Operation &&
    (new XMLArgument(t.Arguments)).CompareTo(new XMLArgument(task.ArgumentsAsXElement)) == 0);
