Can anyone recommend the better approach for querying entities by multiple IDs from the GAE HRD datastore?
1.
mgr = getEntityManager();
Query dbQuery = mgr.createQuery("SELECT FROM CustomEntity as CustomEntity WHERE id IN (:ids)");
dbQuery.setParameter("ids", results.getIds());
return (List<CustomEntity>) dbQuery.getResultList();
2.
// Build one low-level datastore key per id.
List<Key> customEntityKeys = new ArrayList<Key>();
for (String id : results.getIds()) {
    customEntityKeys.add(KeyFactory.createKey("CustomEntity", id));
}
mgr = getEntityManager();
JPADatastoreBridge jpaBridge = new JPADatastoreBridge();
List<CustomEntity> customEntities = new ArrayList<CustomEntity>();
// Fetch all entities in a single batch get.
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Map<Key, Entity> customEntityMap = datastore.get(customEntityKeys);
// Convert each low-level Entity back into a managed JPA object.
for (Entity customEntityEntity : customEntityMap.values()) {
    customEntities.add((CustomEntity) jpaBridge.getJPAFromEntity(customEntityEntity, mgr, CustomEntity.class));
}
return customEntities;
In "better approach" i mean mainly performance wise. also, if there is another way I'll be happy to hear about it.
Thanks.
p.s.
Im using JPA as my persistance method. Not sure if this really matters.
I don't know about most of the code, but if you're asking whether you should use Query("SELECT.... WHERE ID IN....") or datastore.get(...), the second is much, much better. Gets are significantly more efficient than queries with the App Engine datastore - not to mention the fact that they are always strongly consistent, whereas non-ancestor queries are only eventually consistent.
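In other words, a minimal sketch of the recommended path (assuming ids holds the string ids from the question; this is essentially option 2 minus the JPA bridge):
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
List<Key> keys = new ArrayList<Key>();
for (String id : ids) {
    keys.add(KeyFactory.createKey("CustomEntity", id));
}
// A single batch get: strongly consistent and cheaper than a query.
Map<Key, Entity> entities = datastore.get(keys);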
When you use NHibernate to "fetch" a mapped object, it outputs a SELECT query to the database. It outputs this using parameters; so if I query a list of cars based on tenant ID and name, I get:
select Name, Location from Car where tenantID=@p0 and Name=@p1
This has the nice benefit of our database creating (and caching) a query plan based on this query and the result, so when it is run again, the query is much faster as it can load the plan from the cache.
The problem with this is that we are a multi-tenant database, and almost all of our indexes are partition aligned. Our tenants have vastly different data sets; one tenant could have 5 cars, while another could have 50,000. And so because NHibernate does this, it has the net effect of our database creating and caching a plan for the FIRST tenant that runs it. This plan is likely not efficient for subsequent tenants who run the query.
What I WANT to do is force NHibernate NOT to parameterize certain parameters; namely, the tenant ID. So I'd want the query to read:
select Name, Location from Car where tenantID=55 and Name=@p0
I can't figure out how to do this in the HBM.XML mapping. How can I dictate to NHibernate how to use parameters? Or can I just turn parameters off altogether?
OK everyone, I figured it out.
The way I did it was overriding the SqlClientDriver with my own custom driver that looks like this:
using System.Data;
using System.Text.RegularExpressions;
using NHibernate.Driver;

public class CustomSqlClientDriver : SqlClientDriver
{
    // Matches the parameterized tenant id predicate, e.g. ".TenantID=@p0".
    private static Regex _tenantIdReplacer = new Regex(@"\.TenantID=(@p0)", RegexOptions.Compiled);

    public override void AdjustCommand(IDbCommand command)
    {
        var m = _tenantIdReplacer.Match(command.CommandText);
        if (!m.Success)
            return;
        // The captured group is the name of the parameter holding the tenant id.
        var parameterName = m.Groups[1].Value;
        // Find the parameter's value.
        var tenantId = (IDbDataParameter)command.Parameters[parameterName];
        var valueOfTenantId = tenantId.Value;
        // Now inline the literal value into the command text.
        command.CommandText = _tenantIdReplacer.Replace(command.CommandText, ".TenantID=" + valueOfTenantId);
    }
}
I override the AdjustCommand method and use a Regex to replace the tenantID. This works; not sure if there's a better way, but I really didn't want to have to open up NHibernate and start messing with core code.
You'll have to register this custom driver in the connection.driver_class property of the SessionFactory upon initialization.
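For example, with XML configuration that property looks something like the following (the namespace and assembly names are placeholders for wherever you put the driver):
<property name="connection.driver_class">MyApp.Data.CustomSqlClientDriver, MyApp.Data</property>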
Hope this helps somebody!
My datastore has one kind, EFlow, with 7000 entities. The first 1000 entities have these fields:
(ID/Name, appliedBy, approved, childEflowName, completed, completedApprovers, created_on, dueDate, eflowDispName, eflowName, isResubmitted, modified_on, nextApprover, parentEflowName, ruleEmailReceivers, ruleNames, upComingApprovers, workFlowName, workFlowVersion, approvalStateValues)
and the remaining 6000 entities have these fields:
(ID/Name, appliedBy, approvalStateValues, approved, childEflowName, completed, completedApprovers, created_on, draft, dueDate, dynamicApprovalStates, eflowApprovers, eflowDispName, eflowName, fieldValues, isResubmitted, modified_on, nextApprover, parentEflowName, ruleEmailReceivers, ruleNames, upComingApprovers, workFlowName, workFlowVersion)
I have added draft, dynamicApprovalStates, eflowApprovers, and fieldValues as the new fields.
My problem is that when I retrieve data from the datastore, I only get the first 1000 entities.
How do I retrieve all records?
My query is:
List<EFlow> lst = this.entityManager.createQuery("select from " + this.clazz.getName() + " i where i.completed = false and i.approved = false").getResultList();
First, it looks like you are using JPA. From our docs:
Warning: We think most developers will have a better experience using the low-level Datastore API, or one of the open-source APIs developed specifically for Datastore, such as Objectify. JPA was designed for use with traditional relational databases, and so has no way to explicitly represent some of the aspects of Datastore that make it different from relational databases, such as entity groups and ancestor queries. This can lead to subtle issues that are difficult to understand and fix.
However, if you need to keep using JPA:
As the number of results can be large, you need to handle pagination with your query.
The best way to achieve this is with cursors.
import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.datanucleus.query.JPACursorHelper;
...
Query query = this.entityManager.createQuery(
        "select from " + this.clazz.getName() + " i where i.completed = false and i.approved = false");
query.setMaxResults(1000);
Cursor cursor = null;
List<EFlow> lst;
do {
    if (cursor != null) {
        // Resume the query where the previous page left off.
        query.setHint(JPACursorHelper.CURSOR_HINT, cursor);
    }
    lst = query.getResultList();
    // ... Do stuff on lst here ... //
    // Grab the cursor so the next iteration fetches the next page.
    cursor = JPACursorHelper.getCursor(lst);
} while (lst.size() == 1000); // a short page means we've reached the end
The documentation for the IN query operation states that those queries are implemented as a big OR'ed equality query:
qry = Article.query(Article.tags.IN(['python', 'ruby', 'php']))
is equivalent to:
qry = Article.query(ndb.OR(Article.tags == 'python',
Article.tags == 'ruby',
Article.tags == 'php'))
I am currently modelling some entities for a GAE project and plan on using these membership queries with a lot of possible values:
qry = Player.query(Player.facebook_id.IN(list_of_facebook_ids))
where list_of_facebook_ids could have thousands of items.
Will this type of query perform well with thousands of possible values in the list? If not, what would be the recommended approach for modelling this?
This won't work with thousands of values (in fact, I bet it starts degrading with more than 10 values). The only alternative I can think of is some form of precomputation. You'll have to change your schema.
One way you can do it is to create a new model called FacebookPlayer which is an index. This would be keyed by facebook_id. You would update it whenever you add a new player. It looks something like this:
class FacebookPlayer(ndb.Model):
    player = ndb.KeyProperty(kind='Player', required=True)
Now you can avoid queries altogether. You can do this:
# Build keys from facebook ids.
facebook_id_keys = []
for facebook_id in list_of_facebook_ids:
    facebook_id_keys.append(ndb.Key('FacebookPlayer', facebook_id))
# Batch-get the index entities, then collect the player keys they point to.
keysOfUsersMatchedByFacebookId = []
for facebook_player in ndb.get_multi(facebook_id_keys):
    if facebook_player:
        keysOfUsersMatchedByFacebookId.append(facebook_player.player)
usersMatchedByFacebookId = ndb.get_multi(keysOfUsersMatchedByFacebookId)
If list_of_facebook_ids is thousands of items, you should do this in batches.
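A rough sketch of that batching (the 500-item chunk size and the generator shape are my own choices here, not prescribed by the answer above):
from google.appengine.ext import ndb

CHUNK = 500  # assumed chunk size; tune for your entity sizes

def players_by_facebook_ids(list_of_facebook_ids):
    """Yields Player entities matched by the given facebook ids, chunk by chunk."""
    for start in xrange(0, len(list_of_facebook_ids), CHUNK):
        chunk = list_of_facebook_ids[start:start + CHUNK]
        # Batch-get the FacebookPlayer index entities for this chunk.
        index_keys = [ndb.Key('FacebookPlayer', fid) for fid in chunk]
        player_keys = [fp.player for fp in ndb.get_multi(index_keys) if fp]
        # Batch-get the players themselves.
        for player in ndb.get_multi(player_keys):
            if player:
                yield player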
I have a simple question.
The Objectify documentation says that "Only get(), put(), and delete() interact with the cache. query() is not cached":
http://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Global_Cache
What I'm wondering is: if you have one root entity (I did not use @Parent due to all the scalability issues it seems to have) that all the other entities hold a Key to, and you do a query such as
ofy.query(ChildEntity.class).filter("rootEntity", rootEntity).list()
is this completely bypassing the cache?
If this is the case, is there an efficient way to cache a query on conditions? For that matter, can you cache a query with a parent, where you would have to make an actual ancestor query like the following?
Key<Parent> rootKey = ObjectifyService.factory().getKey(root);
ofy.query(ChildEntity.class).ancestor(rootKey)
Thank you
In response to one of the comments below, I've added an edit.
Sample DAO (ignore the validate method; it just does some null and quantity checks):
This is a sample findAll method inside a delegate, called from the DAO that the RequestFactory ServiceLocator is using:
public List<EquipmentCheckin> findAll(Subject subject, Objectify ofy, Event event) {
    final Business business = (Business) subject.getSession().getAttribute(BUSINESS_ATTRIBUTE);
    final List<EquipmentCheckin> checkins = ofy.query(EquipmentCheckin.class)
            .filter(BUSINESS_ATTRIBUTE, business)
            .filter(EVENT_CONDITION, event)
            .list();
    return validate(ofy, checkins);
}
Now, when this is executed, I find that the following method is actually being called in my AbstractDAO.
/**
 * Finds an entity by its id.
 *
 * @param id the id of the entity to load
 * @return the entity, or null if none exists
 */
public T find(Long id) {
    System.out.println("finding " + clazz.getSimpleName() + " id = " + id);
    return ObjectifyService.begin().find(clazz, id);
}
Yes, all queries bypass Objectify's integrated memcache and fetch results directly from the datastore. The datastore provides the (increasingly sophisticated) query engine that understands how to return results; determining cache invalidation for query results is pretty much impossible from the client side.
On the other hand, Objectify4 does offer a hybrid query cache whereby queries are automagically converted to a keys-only query followed by a batch get. The keys-only query still requires the datastore, but any entity instances are pulled from (and populate on miss) memcache. It might save you money.
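If you want that behavior before Objectify 4, you can approximate the hybrid by hand; a sketch, assuming Objectify 3's fetchKeys() and batch get() and reusing the names from the question:
// Keys-only query: cheap, but still goes to the datastore.
Iterable<Key<ChildEntity>> keys =
        ofy.query(ChildEntity.class).filter("rootEntity", rootEntity).fetchKeys();
// Batch get by key: this path does use the memcache on cache hits.
Map<Key<ChildEntity>, ChildEntity> children = ofy.get(keys);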
Let's say I've got a SQL 2008 database table with lots of records associated with two different customers, Customer A and Customer B.
I would like to build a fat client application that fetches all of the records that are specific to either Customer A or Customer B based on the credentials of the requesting user, then stores the fetched records in a temporary local table.
Thinking I might use the MS Sync Framework to accomplish this, I started reading about row filtering when I came across this little chestnut:
Do not rely on filtering for security. The ability to filter data from the server based on a client or user ID is not a security feature. In other words, this approach cannot be used to prevent one client from reading data that belongs to another client. This type of filtering is useful only for partitioning data and reducing the amount of data that is brought down to the client database.
So, is this telling me that the MS Sync Framework is only a good option when you want to replicate an entire table between point A and point B?
Doesn't that seem to be an extremely limiting characteristic of the framework? Or am I just interpreting this statement incorrectly? Or is there some other way to use the framework to achieve my purposes?
Ideas anyone?
Thanks!
No, it is only a security warning.
We use filtering extensively in our semi-connected app.
Here is some code to get you started:
// helper (assumed to live in your DbServerSyncProvider subclass,
// which supplies SyncAdapters and this.Connection)
void PrepareFilter(string tablename, string filter)
{
    // Replace the default adapter with one that carries our filter clause.
    SyncAdapters.Remove(tablename);
    var ab = new SqlSyncAdapterBuilder(this.Connection as SqlConnection);
    ab.TableName = "dbo." + tablename;
    ab.ChangeTrackingType = ChangeTrackingType.SqlServerChangeTracking;
    ab.FilterClause = filter;
    // The filter parameter gets its real value at sync time.
    var cpar = new SqlParameter("@filterid", SqlDbType.UniqueIdentifier);
    cpar.IsNullable = true;
    cpar.Value = DBNull.Value;
    ab.FilterParameters.Add(cpar);
    var nsa = ab.ToSyncAdapter();
    nsa.TableName = tablename;
    SyncAdapters.Add(nsa);
}
// usage
void SetupFooBar()
{
    var tablename = "FooBar";
    var filter = "FooId IN (SELECT BarId FROM dbo.GetAllFooBars(@filterid))";
    PrepareFilter(tablename, filter);
}
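At sync time you then supply the actual value for @filterid; with these offline-scenario providers that is typically done through the agent's SyncParameters collection (a sketch, with currentTenantId standing in for however you identify the tenant):
// Give the filter parameter its per-tenant value before synchronizing.
agent.Configuration.SyncParameters.Add(
    new SyncParameter("@filterid", currentTenantId));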