I was wondering whether there is any cost or performance difference when using ancestor queries.
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", KeyFactory.createKey("User", 2343334443334L));
q.setParameter("xx",233);
//File class with ancestors
@Entity
class File{
@Id
@....
public Key ID;
@Extension(vendorName = "datanucleus", key = "gae.parent-pk", value ="true")
public Key parentID;
};
OR
Query q = em.createQuery("SELECT FROM File f WHERE f.parentID = :parentID AND f.someOtherNumber > :xx");
q.setParameter("parentID", 2343334443334L);
q.setParameter("xx",233);
//File class without ancestors
@Entity
class File{
@Id
@....
public Key ID;
public long parentID;
};
I was testing this, and with the ancestor query my index does not include parentID (it is marked "with ancestors"), whereas the index for the non-ancestor version does include it.
Is there a difference in index/datastore read/write cost?
The writing costs might be slightly lower (one fewer indexed property), but the storage costs might be slightly higher (a key for each child entity includes all of its ancestors).
In either case, the differences are insignificant unless you have a billion records. You will face more serious performance/cost differences depending on your data access patterns (i.e. how you access the data most of the time).
Related
I have an Entity as
@Entity
public class Book{
...
List<Key<Page>> pages;
...
}
So to get a book I do
Book book = ofy().load().type(Book.class).id(id).now();
Having obtained the book, I want to get the pages; hence my question: can I query by keys or must I query by ids? If I had the ids I could do
List<Page> pages = ofy().load().type(Page.class).ids(ids);
But what I need is
List<Page> pages = ofy().load().type(Page.class).keys(keys);
otherwise I have to do linear work to iterate through the keys to extract the ids or the names, which I am not even sure will work because the keys actually have parents so that a key for a page is constructed as
Key pageKey = KeyFactory.createKey(bookKey, Page.class.getSimpleName(),someString);
So what is my final answer in this case?
You can easily load entities by Keys. From ofy's Concepts page (https://code.google.com/p/objectify-appengine/wiki/Concepts)
Map<Key<Object>, Object> lotsOfThings = ofy().load().keys(carKey, airplaneKey, chairKey, personKey, yourMamaKey);
I am using GAE with Objectify and have entities as below:
@Entity
class LevelOne {
@Id
Long id;
@Index
@Load
Ref<LevelTwo> two;
}
@Entity
class LevelTwo {
@Id
Long id;
@Index
List<Ref<LevelThree>> threes;
}
@Entity
class LevelThree {
@Id
Long id;
}
I want to find all LevelOnes that have a LevelTwo which contains a given LevelThree.
I use this query:
ofy().load().type(LevelOne.class).filter("two.threes", keyOfThree).list();
But I get no results. From the documentation on the wiki I understand that I should expect results if I embed the complete entities instead of using Refs, but the redundancy would get scary!
Google App Engine's datastore does not perform joins. Ref<?>s are key references to foreign entities. You can't filter across references.
You can, however, create a synthetic index field in your LevelOne object and fill it (perhaps in an @OnSave method) with whatever data you wish, including data from other entities. However, as with denormalization in general, you'll have to be careful about how data is updated.
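The synthetic-field idea can be sketched in plain Java. This is a minimal illustration, not Objectify code: the class and field names (`LevelOneSketch`, `LevelTwoSketch`, `indexedThreeIds`, `refreshSyntheticIndex`) are hypothetical, the annotations are omitted so the snippet runs without the library, and `matches` merely stands in for an equality filter on an indexed list property.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LevelTwo: holds the ids of its LevelThree children.
final class LevelTwoSketch {
    final List<Long> threeIds = new ArrayList<>();
}

// Hypothetical stand-in for LevelOne with a denormalized ("synthetic") field.
final class LevelOneSketch {
    LevelTwoSketch two;

    // Copy of the LevelThree ids; in a real entity this would be the indexed field.
    List<Long> indexedThreeIds = new ArrayList<>();

    // Would be called from a save hook; copies the data that the query needs.
    void refreshSyntheticIndex() {
        indexedThreeIds = (two == null)
                ? new ArrayList<>()
                : new ArrayList<>(two.threeIds);
    }

    // Stand-in for an equality filter on the indexed list property.
    boolean matches(long threeId) {
        return indexedThreeIds.contains(threeId);
    }
}
```

Note the caveat from the answer: the copy goes stale as soon as the LevelTwo changes, so every LevelOne that refers to it has to be refreshed and saved again.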
I want to use UUIDs as primary keys because they are globally unique, which makes it easy (for example) to integrate data from a production environment into a running debug environment.
According to the following article: http://iops.io/blog/storing-billions-uuid-fields-mysql-innodb SELECT/INSERT of millions of records into a table using a UUID stored as BINARY(16) is significantly faster than using a simple CHAR(36).
Now, using Hibernate's @GenericGenerator annotation, I could use this native UUID generator for a String primary key:
@Id
@GeneratedValue(generator = "system-uuid")
@GenericGenerator(name = "system-uuid", strategy = "uuid")
private String id;
On the other hand I could define a binary UUID as primary as follows:
@Id
@Getter
@Column(columnDefinition = "BINARY(16)", length = 16, updatable=false, nullable=false)
private byte[] id;
... and use @PrePersist to generate new UUIDs:
@PrePersist
private void prePersist() {
if (this.id == null) {
this.generateUUID();
}
}
The problem with this solution is the representation as binary for filters within (native/named) queries:
SELECT * from object o WHERE o.id=:id
What I would really need is the ability to store the UUID field as BINARY in the database, as mentioned above, while representing the value as a simple UUID string.
Is there any way to do that? Is there any alternative?
Why don't you just use the special uuid type for the column?
@Type(type = "pg-uuid")
But I also have the same problem with native queries when doing it like that.
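One way to bridge the two representations is to convert at the application boundary: keep 16 raw bytes in the column and translate to and from `java.util.UUID` (and hence its string form) in code. The sketch below uses only the JDK. The class name `UuidBytes` is hypothetical, and how it is wired into Hibernate or into query parameter binding is left open.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: converts between java.util.UUID and a 16-byte array.
final class UuidBytes {
    private UuidBytes() {}

    // Big-endian layout: most significant 8 bytes first, then least significant.
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }
}
```

With a helper like this, a filter value supplied as a UUID string could be bound as `UuidBytes.toBytes(UUID.fromString(idString))`, assuming the column really holds the bytes in this order.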
I have an Entity that looks like this:
@Entity
public class Relationship
{
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
@Basic
private UUID from;
@Basic
private UUID to;
}
Now I can have arbitrary levels of indirection here like so:
final Relationship r0 = new Relationship(a,b);
final Relationship r1 = new Relationship(b,c);
final Relationship r2 = new Relationship(c,d);
final Relationship rN = new Relationship(d,e);
Now what I want to find out, as efficiently as possible, is: given a, give me back e, where rN is N levels deep.
If I were writing regular SQL I would do something like the following pseudocode:
SELECT r.to
FROM relationship r
WHERE r.from = 'a' AND
r.to NOT IN ( SELECT r.from FROM relationship r)
The only thing I can find online is references to passing in a List as a parameter to a Criteria.Builder.In but I don't have the list, I need to use a sub-select as the list?
Also this is using the Datastore in Google App Engine, and it is restricted on some things that it supports via JPA 2.
Am I going to have to resort to the low level Datastore API?
In the datastore, there's no way to issue a single query to get 'e' from 'a'. In fact, the only way to get e is to query each Relationship in turn, following the chain, so you'll need four queries.
You can pass in a list as a parameter, but that's only for an IN query. NOT IN queries are not available, and neither are JOINs.
(Aside: you could use a combination of the from and to properties to create a key, in which case you could just fetch the entity instead of query).
Usually, the GAE datastore way of doing things is to denormalize, i.e. write extra data that will enable your queries. (This is a pain, because it also means that when you update an entity, you need to be careful to update the denormalized data as well, and it can be hard to keep the two in sync. The datastore is designed for web-type traffic where reads occur much more frequently than writes.)
This is a potential solution:
@Entity
public class Relationship
{
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
@Basic
private UUID from;
@Basic
private UUID to;
@ElementCollection
private Collection<UUID> reachable;
}
In this case you would simply query
WHERE from = 'a' and reachable = 'e'
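What would go into `reachable` can be sketched as follows. This is a minimal in-memory illustration, assuming each node has at most one outgoing relationship as in the a, b, c, d, e chain above. The class name `ReachableSketch` and the `Map` standing in for stored Relationship entities are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes the values to denormalize into "reachable".
final class ReachableSketch {
    private ReachableSketch() {}

    // edges maps each "from" to its single "to"; returns every node downstream of from.
    static <T> Set<T> reachableFrom(Map<T, T> edges, T from) {
        Set<T> reachable = new LinkedHashSet<>();
        T next = edges.get(from);
        // Stop at the end of the chain, or if the chain loops back on itself.
        while (next != null && !next.equals(from) && reachable.add(next)) {
            next = edges.get(next);
        }
        return reachable;
    }
}
```

The cost moves to write time: appending a relationship at the end of a chain means every upstream entity's `reachable` collection has to be updated too.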
Solution
Surprisingly enough this recursive method doesn't error out with a StackOverflow even with 1000 levels of indirection, at least not on my local development server.
public UUID resolve(@Nonnull final UUID uuid)
{
final EntityManager em = EMF.TRANSACTIONS_OPTIONAL.createEntityManager();
try
{
final String qs = String.format("SELECT FROM %s a WHERE a.from = :from ", Alias.class.getName());
final TypedQuery<Alias> q = em.createQuery(qs, Alias.class);
q.setParameter("from", uuid);
final Alias r;
try
{
r = q.getSingleResult();
final Key tok = KeyFactory.createKey(Alias.class.getSimpleName(), r.getTo().toString());
if (em.find(Alias.class, tok) == null)
{
return r.getTo();
}
else
{
return this.resolve(r.getTo());
}
}
catch (final NoResultException e)
{
/* this is expected when there are no more aliases */
return uuid;
}
}
finally
{
em.close();
}
}
The stress test code I had is timing out on the actual GAE Service, but I am not worried about it, I won't be creating more than one level of indirection at a time in practice. And there won't be more than a handful of indirections either, and it will all get hoisted up into Memcache in the final version anyway.
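If recursion depth ever became a concern, the same walk could be written as a loop. The sketch below is not the code above rewritten against the datastore: a `Map` stands in for the per-step alias lookup, and the class name `AliasChain` is hypothetical. It also guards against cycles, which the recursive version does not.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper: follows alias links iteratively instead of recursively.
final class AliasChain {
    private AliasChain() {}

    // aliases maps "from" to "to"; returns the last UUID in the chain starting at start.
    static UUID resolve(Map<UUID, UUID> aliases, UUID start) {
        Set<UUID> seen = new HashSet<>();
        UUID current = start;
        // seen.add returns false when a UUID is visited twice, i.e. on a cycle.
        while (seen.add(current)) {
            UUID next = aliases.get(current);
            if (next == null) {
                return current; // no further alias: this is the final target
            }
            current = next;
        }
        throw new IllegalStateException("alias cycle detected at " + current);
    }
}
```

Each loop iteration corresponds to one lookup, so the number of datastore round trips would stay the same as in the recursive version; only the stack usage changes.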
I have the following Objectify entity to store data in the Google Datastore.
public class Record implements Serializable {
private static final long serialVersionUID = 201203171843L;
@Id
private Long id;
private String name; // John Smith
private Integer age; // 43
private String gender; // Male/Female
private String eventName; // Name of the marathon/event
private String eventCityName; // City of the event
private String eventStateName; // State of the event
private Date eventDate; // event date
//Getters & Setters
}
Now, my question is: how can I query my database to get the count of Records for a given eventName or event City+State? Or get a list of all City+Name?
On App Engine counting is very expensive: you basically need to query with a certain condition (eventName = something), then count all the results. The cost is that of a keys-only query (1 read + 1 small operation) and increases with the number of entities counted. For example, counting 1 million entities would cost $0.8.
What is normally done is to keep the count as a property inside a dedicated entity: increment the property value when the count goes up (entity added) and decrement it when the count goes down (entity deleted).
If you plan to do this on a larger scale, be aware that there is a write/update limit of about 5 writes/s per entity (per entity group, actually). See sharded counters for how to work around this.
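The sharded-counter idea can be sketched in memory as follows. This is only an illustration of the technique, assuming that each slot of the array corresponds to a separate counter entity in the datastore. The class name `ShardedCounter` and its methods are hypothetical and no datastore API is used.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical in-memory stand-in for a sharded counter.
final class ShardedCounter {
    // Each slot plays the role of one shard entity.
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.shards = new AtomicLongArray(numShards);
    }

    void increment() { add(1); }

    void decrement() { add(-1); }

    // Writes go to a randomly chosen shard, spreading the write load.
    private void add(long delta) {
        int index = ThreadLocalRandom.current().nextInt(shards.length());
        shards.addAndGet(index, delta);
    }

    // Reading the count means summing all shards.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is that reads become slightly more expensive (one fetch per shard) in exchange for a write throughput that scales with the number of shards. One such counter per eventName or per City+State would answer the counting questions above.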