https://graph.microsoft.com/v1.0/groups/delta?$filter=id eq 'id' [https://learn.microsoft.com/en-us/graph/delta-query-groups]
https://graph.microsoft.com/v1.0/groups/id/members [https://learn.microsoft.com/en-us/graph/api/group-list-members?view=graph-rest-1.0&tabs=http]
What is the difference between the two queries above? The first returns the membership of a group with details such as '@removed' users plus a delta link, and the second returns the membership of a group without details such as '@removed' users. Is that the only difference? What is the cost of running these two queries in terms of performance and compute time?

Delta query enables applications to discover newly created, updated, or deleted entities without performing a full read of the target resource with every request. It helps you track changes.
In contrast, https://graph.microsoft.com/v1.0/groups/id/members returns the data for all members every time.
Please read the docs for more info: https://learn.microsoft.com/en-us/graph/delta-query-overview
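To illustrate the difference, here is a minimal sketch of how a client might apply one delta page to a locally cached member list instead of re-reading the full membership. The `members@delta` and `@removed` field names follow the delta query documentation linked above as I understand it; the function name, the sample page and its IDs are hypothetical, so verify the shape against a real response.

```python
from typing import Any, Dict, Set


def apply_group_delta(members: Set[str], page: Dict[str, Any]) -> Set[str]:
    """Return a new member-id set after applying one delta response page."""
    updated = set(members)
    for group in page.get("value", []):
        for change in group.get("members@delta", []):
            if "@removed" in change:
                updated.discard(change["id"])  # member left the group
            else:
                updated.add(change["id"])      # member was added
    return updated


# Hypothetical page, shaped like a groups/delta response body (assumption).
sample_page = {
    "value": [{
        "id": "group-1",
        "members@delta": [
            {"@odata.type": "#microsoft.graph.user", "id": "user-c"},
            {"@odata.type": "#microsoft.graph.user", "id": "user-a",
             "@removed": {"reason": "deleted"}},
        ],
    }],
    "@odata.deltaLink": "<opaque link returned by the service>",
}

current = apply_group_delta({"user-a", "user-b"}, sample_page)
```

The /members call would instead return the full list each time, and the client would have to diff it against its cache to find removals.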
Ancestor relation in datastore

I have three entities: user, post and comment. A user may have multiple posts and a post may have multiple comments.
I know I can add ancestor relations like this:
user(Grand Parent) post(parent) comment(child)
I'm a little bit confused about ancestors. From the documentation and from searching, I read that ancestors are used for transactions, that all entities sharing an ancestor are in the same entity group, and that an entity group is stored on the same datastore node, which makes it less scalable. Is this right?
Is creating user as parent of posts and post as parent of comments a good thing?
Rather than this we can add one extra property in the post entity like user_id as shown in example and filter by it.
Which is better/more scalable: filter posts by ancestors or add an extra property user_id in the post Entity and filter by it?
I know both approaches can get the same results but I want to know which one is better in performance and scalability?
Update 11/4/2017
A large number of users use this app, so it is quite possible that more than one post is created per second. A single user cannot create more than one post per second, but multiple users together may. The documentation describes a maximum entity group write rate of 1/s. Is it still possible to use ancestors?
The same goes for comments. Multiple users can add comments in the same entity group, so more than one comment per second is quite possible.
Are ancestor queries faster?
I have read in many places that ancestor queries are much faster than other queries.
As far as I know, the reason is that an ancestor relation creates an entity group and stores the related data on the same node, so it takes less time to get data from a single node than from multiple nodes.
For example: if a post is stored on an Asia node and its comments are stored on a Europe node, and I want to get the post and its comments, the datastore API needs to contact two nodes to complete the request, which makes it slow. If I create an ancestor relation instead, the entity group gives better performance.
But what if I don't need the post and comment data at the same time, for example if posts are shown on one web page and comments on another? In that scenario the datastore API only needs to contact one node at a time, so it does not matter whether the data is saved on a single node or on multiple nodes. Can ancestors still make queries faster in this case?
Yes, you are correct: all ancestry-related entities are in the same entity group, which raises 2 scalability issues: data contention and a maximum entity group write rate of 1/s. See the somewhat related question "Is there an Entity Group Max Size?"
There are advantages of using ancestries and some may be willing to sacrifice scalability for them (see What would be the purpose of putting all datastore entities in a single group?), but IMHO not for your kind of app: I think you'll agree that it's not really critical to see every new user/post/comment in random searches immediately after it is created (i.e. strong consistency) - the fact that it eventually appears is IMHO good enough.
Simply having no ancestry at all and adding additional model properties (entity keys or even just entity key IDs for entities which never have ancestors) to allow cross-referencing entities is the more scalable approach and IMHO fits well with your app.
I think the question to ask is: Are you expecting:
User to create Posts more than once per second (I doubt it :)
People to comment on a Post more than once per second (could happen)
If not, then ancestor queries will be faster than normal queries. So it depends on your use case. I'd go for query speed unless you know you will have thousands of comments on posts.
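One way to reason about the 1/s limit discussed in this thread is to count writes per entity group per second. The sketch below is a plain-Python model, not datastore code: the function name and the sample keys are hypothetical. It relies on the point that with user -> post -> comment ancestry, the entity group is identified by the root key (the post's author), so every comment on any of that user's posts counts against the same group.

```python
from collections import Counter


def busiest_group_rate(writes):
    """writes: iterable of (second, root_key) pairs.

    Returns the largest number of writes any single entity group
    receives within one second.
    """
    per_group_second = Counter((root, second) for second, root in writes)
    return max(per_group_second.values(), default=0)


# Three comments on alice's posts and one post by bob, all in second 0.
# With ancestry, the root of each comment key is the post's author.
with_ancestors = [
    (0, "user:alice"), (0, "user:alice"), (0, "user:alice"), (0, "user:bob"),
]
# Without ancestry, each entity is its own root (its own entity group).
without_ancestors = [
    (0, "comment:1"), (0, "comment:2"), (0, "comment:3"), (0, "post:9"),
]

rate_with = busiest_group_rate(with_ancestors)
rate_without = busiest_group_rate(without_ancestors)
```

In this model the ancestor layout sees 3 writes in one second on alice's group, above the documented 1/s rate, while the flat layout never exceeds 1.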

google app engine query optimization

I am trying to do my reads and writes for GAE as efficiently as possible and I was wondering which is the best of the following two options.
I have a website where users are able to post different things and right now whenever I want to show all posts by that user I do a query for all posts with that user's user ID and then I display them. Would it be better to store all of the post IDs in the user entity and do a get_by_id(post_ID_list) to return all of the posts? Or would that extra space being used up not be worth it?
Is there anywhere I can find more information like this to optimize my web app?
The main reason you would want to store the list of IDs would be so that you can get each entity separately for better consistency - entity gets by id are consistent with the latest version in the datastore, while queries are eventually consistent.
Check datastore costs and optimize for cost:
Getting entities by key wouldn't be any cheaper than querying all the posts. The query makes use of an index.
If you use projection queries, you can reduce your costs quite a bit.
There are several cases.
First, you keep track of all the IDs of a user's posts. You must use an entity group for consistency, which means the write speed to the datastore would be ~1 entity per second. The cost is 1 read for the object holding the IDs and 1 read per entity.
Second, you just use a query. This does not need consistency. The cost is 1 read + 1 read per entity retrieved.
Third, you query only the keys and fetch afterwards. The cost is 1 read + 1 small operation per key retrieved. See: Keys-Only Queries. This costs the same as a projection query.
If you have many results and use pagination, you need to use Query Cursors. That prevents wasted datastore usage.
The most economical solution is the third case. See: Batch Operations.
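The three cases can be put side by side as simple arithmetic. This is only a sketch that restates the per-operation figures quoted in this answer; the function name is hypothetical and the figures should be checked against current datastore pricing. Note that the keys-only figure covers the query itself; entities fetched afterwards would presumably be billed as additional reads.

```python
def read_costs(n):
    """Operation counts for retrieving n posts, per the figures in this answer."""
    return {
        # 1 read for the entity holding the ID list + 1 read per post
        "get_by_id_list": {"read": 1 + n, "small": 0},
        # 1 read for the query + 1 read per entity retrieved
        "query": {"read": 1 + n, "small": 0},
        # 1 read for the query + 1 small operation per key retrieved
        "keys_only": {"read": 1, "small": n},
    }


costs = read_costs(20)
```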
In case you have a list of IDs because they are stored with your entity, a call to ndb.get_multi (if you are using NDB, but it would be similar with any other framework that uses memcache to cache single entities) would save you further datastore calls if all (or most) of the entities corresponding to the keys are already in memcache.
So in the best possible case (everything is in the memcache), the datastore wouldn't be touched at all, while using a query would.
See this issue for a discussion and caveats: http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=118.
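The cache-first behaviour described in this answer can be sketched with plain dictionaries. This is a model of the idea only, not NDB's implementation: `get_multi_cached`, the dict-based cache and the dict-based datastore are all hypothetical stand-ins.

```python
def get_multi_cached(keys, cache, datastore):
    """Return (entities, datastore_lookups) for keys, consulting the cache first."""
    found = {k: cache[k] for k in keys if k in cache}
    missing = [k for k in keys if k not in found]
    for k in missing:                    # only cache misses reach the datastore
        if k in datastore:
            found[k] = datastore[k]
            cache[k] = datastore[k]      # populate the cache for next time
    return [found.get(k) for k in keys], len(missing)


datastore = {"post:1": {"title": "first"}, "post:2": {"title": "second"}}
cache = {"post:1": {"title": "first"}}

first_result, first_lookups = get_multi_cached(["post:1", "post:2"], cache, datastore)
second_result, second_lookups = get_multi_cached(["post:1", "post:2"], cache, datastore)
```

On the second call every key is served from the cache, which is the best case mentioned above; a query has no equivalent per-entity shortcut in this model.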

what is the meaning of recursive in cakephp?

Well, I have this line of code in the tutorial I am following. However, it did not give me a clear explanation of recursive. I am a newbie in CakePHP and have searched for this "recursive". I hope somebody can provide a layman's explanation of this code:
$this->Author->recursive = 1;
First result on Google is a clear explanation from the reference of Cakephp itself:
It is needed to set the depth of the retrieval of records associated with a model data so that you can limit how much data is fetched from the query when there are many levels of associations between your models.
I would recommend that you check the documentation first.
Recursive defines the amount of data that will be fetched from the database. By default, CakePHP gets the data of the Model/Table that you're querying and the data of the Models/Tables that are linked to the main Model/Table (hasMany, belongsTo, etc.).
By setting recursive, you tell CakePHP to fetch only a certain amount of data. It can be more or less, depending on how deep the associations between the models/tables go and on the number specified in recursive.
Setting recursive to -1 will only get the data of the model that you're querying; setting it higher asks CakePHP to fetch deeper associations.
Lets say that in our app we have authors that sell books and they get commented by readers.
Author 1 <> * Book 1 <> * Comment
If we don't set recursive while fetching the list of authors, CakePHP will get the list of authors and their books (the default level is 1); with recursive set to 2 it will also get the comments.
$authors = $this->Author->find('all');
The problem is that for each list display, CakePHP and the database deal with a lot of unnecessary data, which in turn hurts the performance of your HTTP and database servers.
Imagine that the list is shown 10 times per second and each list shows 20 authors (an author can have from 1 to many books; let's say 10 books on average for this example, with 5 comments each). Do the math and you will see that the servers process a lot of unnecessary data which won't be used in the end.
The user only wants to see the list of authors, so there's no need to fetch all the books and comments unless you're going to process them in the controller or display them in the views. We can do so by setting recursive to -1.
$this->Author->recursive = -1;
$authors = $this->Author->find('all');
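To put rough numbers on the example above, here is the arithmetic spelled out. It assumes the association depth is high enough to pull in both books and comments; the variable names are only for illustration.

```python
lists_per_second = 10
authors_per_list = 20
books_per_author = 10
comments_per_book = 5

books = authors_per_list * books_per_author          # books per list display
comments = books * comments_per_book                 # comments per list display
rows_with_associations = authors_per_list + books + comments

rows_per_second_full = rows_with_associations * lists_per_second
rows_per_second_authors_only = authors_per_list * lists_per_second
```

Under these assumptions each list display touches 1,220 rows instead of 20, so 12,200 rows per second instead of 200.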
You may also want to optimize your queries so that they fetch only the fields you're going to use. It will boost overall performance, but that's another subject.
Sometimes you will want to do the reverse. Let's say the app updates the Auth session variable whenever the user logs in (IP, browser info, OAuth token, group info, etc.) and uses all of the user's related info to adapt the user experience. For example, if the user belongs to a certain group, it shows info and options relevant to that group; if the user has allowed the app to access certain account info from a third-party provider (Google?), it shows services that use that kind of data, such as a Google+ feed.
It would be a lot easier to fetch all of the user's related info once they have logged in and store it in the session, where the views can use it to adapt the user experience. One way would be to fetch the related data piece by piece and store it in the session. Another is simply to set recursive to 2 and store the result in the session, which fetches all the related data of the user model.
Old response
recursive allows you to define the amount of data to get from the database. Let's say that the Author has many publications.
If you specify -1 for recursive before getting a certain author from the database, like so:
$this->Author->recursive = -1;
$author = $this->Author->findByName('Someone');
you get information only from the authors table and none from related tables such as publications.
You can see this for yourself with this code:
// only author info
$this->Author->recursive = -1;
$author = $this->Author->findByName('Someone');
// display the result
// get the author and related publications info
$this->Author->recursive = 1;
$authorAndPublications = $this->Author->findByName('Someone');
// display the result
The recursive property thus specifies how much information you want from your database.
Where should I use it?
Let's suppose each author has at least 10 publications and you want to query the database to find the authors. If you don't specify the recursive property, CakePHP will get all the authors and their publications too! So with, say, 50 authors * 10 publications, you get the picture: you are querying for a ton of unnecessary data.
It matters a lot on a high-traffic site, since each time the authors list is displayed you query for 500 unnecessary publication records (that won't be used) just to display some information about the 50 authors in a list/table.
By setting recursive = -1; before querying for the authors, you ease the strain on the database, which results in better responsiveness and performance.
From the documentation v1.3, v2.0:
The recursive property defines how deep CakePHP should go to fetch associated model data via find(), findAll() and read() methods.
Imagine your application features Groups which belong to a domain and have many Users which in turn have many Articles. You can set $recursive to different values based on the amount of data you want back from a $this->Group->find() call:
...documentation of the levels omitted...
Set it no higher than you need. Having CakePHP fetch data you aren’t going to use slows your app unnecessarily. Also note that the default recursive level is 1.

Managing rank of records via apex in salesforce.com

I have a requirement I'd like to get some input on. I need to have an "account rank" field that will not include all accounts and I will need to be able to add to the pool, remove from the pool, and change rank. My problem is that each time I remove a record from the pool or move it to a new position, all records after (which could be as many as 10,000) will need to be shifted up or down. Salesforce has limits on individual updates of 200 at a time, or you can split it up into batches of up to a million. My concern with batches is I won't be able to guarantee that people won't update more than 5 records in a short time, therefore reaching past the salesforce limits on total # of batches allowed.
Has anyone dealt with these issues and do you have any suggestions for a best approach?
I can't think of a good way to model this in the way you describe without resorting to some pretty custom Apex using @future or batch, or having your own integration that does this recalculation through the Salesforce API.
What determines account rank? Can you calculate it inside of a formula field?
This is a tough one... Can you rethink this into something more SFDC-like? What if you were to generate the ranks dynamically? You could, for example, create a voting system or some grading system and have SFDC calculate the ranking for you.
Let's say you have one or more fields where you give a grade then have a SOQL query like this:
[SELECT ID FROM Account WHERE _ ORDER BY Vote1, Vote2, ]
If you change your Data Model it should make this quite easy e.g.
Create a ranking object Ranking__c and associate this with the Account object.
For each rank type or number, create a record with the details.
Associate the appropriate Account records with the corresponding Ranking__c record.
When it's time to update an Account's rank, just change the Ranking__c records instead of the Account records. This should massively reduce the number of records you'd need to run over.

Can I do transactions and locks in CouchDB?

I need to do transactions (begin, commit or rollback), locks (select for update).
How can I do it in a document model db?
The case is this:
I want to run an auctions site.
I'm also thinking about how to handle direct purchases.
In a direct purchase I have to decrement the quantity field in the item record, but only if the quantity is greater than zero. That is why I need locks and transactions.
I don't know how to address that without locks and/or transactions.
Can I solve this with CouchDB?
No. CouchDB uses an "optimistic concurrency" model. In the simplest terms, this just means that you send a document version along with your update, and CouchDB rejects the change if the current document version doesn't match what you've sent.
It's deceptively simple, really. You can reframe many normal transaction based scenarios for CouchDB. You do need to sort of throw out your RDBMS domain knowledge when learning CouchDB, though. It's helpful to approach problems from a higher level, rather than attempting to mold Couch to a SQL based world.
Keeping track of inventory
The problem you outlined is primarily an inventory issue. If you have a document describing an item, and it includes a field for "quantity available", you can handle concurrency issues like this:
Retrieve the document, take note of the _rev property that CouchDB sends along
Decrement the quantity field, if it's greater than zero
Send the updated document back, using the _rev property
If the _rev matches the currently stored number, be done!
If there's a conflict (when _rev doesn't match), retrieve the newest document version
In this instance, there are two possible failure scenarios to think about. If the most recent document version has a quantity of 0, you handle it just like you would in a RDBMS and alert the user that they can't actually buy what they wanted to purchase. If the most recent document version has a quantity greater than 0, you simply repeat the operation with the updated data, and start back at the beginning. This forces you to do a bit more work than an RDBMS would, and could get a little annoying if there are frequent, conflicting updates.
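The read, decrement, write, retry cycle described above can be sketched against an in-memory stand-in. This is not CouchDB client code: `FakeCouch`, `Conflict` and `purchase` are hypothetical names, and integer revisions are a simplification of CouchDB's `_rev` strings. It only shows the control flow for both failure scenarios.

```python
class Conflict(Exception):
    """Raised when the supplied revision is not the stored one."""


class FakeCouch:
    """In-memory stand-in for a document store with revision checks."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])      # copy, like a fetched JSON body

    def put(self, doc):
        stored = self.docs.get(doc["_id"])
        if stored is not None and stored["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])      # caller holds a stale revision
        new_doc = dict(doc)
        new_doc["_rev"] = (stored["_rev"] if stored else 0) + 1
        self.docs[doc["_id"]] = new_doc


def purchase(db, item_id, max_attempts=5):
    """Decrement quantity if it is positive; retry on revision conflicts."""
    for _ in range(max_attempts):
        doc = db.get(item_id)
        if doc["quantity"] <= 0:
            return False                    # sold out: tell the user
        doc["quantity"] -= 1
        try:
            db.put(doc)
            return True
        except Conflict:
            continue                        # someone else won; re-read and retry
    return False


db = FakeCouch()
db.put({"_id": "hammer", "quantity": 1})
first = purchase(db, "hammer")
second = purchase(db, "hammer")
```

The retry cap is a design choice for the sketch; under frequent conflicting updates a real client would need some limit or backoff as well.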
Now, the answer I just gave presupposes that you're going to do things in CouchDB in much the same way that you would in an RDBMS. I might approach this problem a bit differently:
I'd start with a "master product" document that includes all the descriptor data (name, picture, description, price, etc). Then I'd add an "inventory ticket" document for each specific instance, with fields for product_key and claimed_by. If you're selling a model of hammer, and have 20 of them to sell, you might have documents with keys like hammer-1, hammer-2, etc, to represent each available hammer.
Then, I'd create a view that gives me a list of available hammers, with a reduce function that lets me see a "total". These are completely off the cuff, but should give you an idea of what a working view would look like.
function (doc) {
  if (doc.type == 'inventory_ticket' && doc.claimed_by == null)
    emit(doc.product_key, { 'inventory_ticket': doc._id, '_rev': doc._rev });
}
This gives me a list of available "tickets", by product key. I could grab a group of these when someone wants to buy a hammer, then iterate through sending updates (using the id and _rev) until I successfully claim one (previously claimed tickets will result in an update error).
function (keys, values, rereduce) {
  // on rereduce, values holds partial counts, so they must be summed
  return rereduce ? sum(values) : values.length;
}
This reduce function simply returns the total number of unclaimed inventory_ticket items, so you can tell how many "hammers" are available for purchase.
This solution represents roughly 3.5 minutes of total thinking for the particular problem you've presented. There may be better ways of doing this! That said, it does substantially reduce conflicting updates, and cuts down on the need to respond to a conflict with a new update. Under this model, you won't have multiple users attempting to change data in primary product entry. At the very worst, you'll have multiple users attempting to claim a single ticket, and if you've grabbed several of those from your view, you simply move on to the next ticket and try again.
Reference: https://wiki.apache.org/couchdb/Frequently_asked_questions#How_do_I_use_transactions_with_CouchDB.3F
Expanding on MrKurt's answer. For lots of scenarios you don't need to have stock tickets redeemed in order. Instead of selecting the first ticket, you can select randomly from the remaining tickets. Given a large number tickets and a large number of concurrent requests, you will get much reduced contention on those tickets, versus everyone trying to get the first ticket.
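A minimal sketch of the random-selection idea, using a plain dictionary in place of the view results. `claim_ticket` and the ticket layout are hypothetical, and the in-memory `claimed_by` check merely stands in for the `_rev` conflict check a real update would perform.

```python
import random


def claim_ticket(tickets, buyer, rng=random):
    """Try unclaimed tickets in random order; return the claimed id or None."""
    candidates = [tid for tid, t in tickets.items() if t["claimed_by"] is None]
    rng.shuffle(candidates)                 # spread contention across tickets
    for tid in candidates:
        if tickets[tid]["claimed_by"] is None:   # stands in for the _rev check
            tickets[tid]["claimed_by"] = buyer
            return tid
    return None                             # nothing left to claim


tickets = {"hammer-%d" % i: {"claimed_by": None} for i in range(1, 4)}
claimed = [claim_ticket(tickets, "buyer-%d" % i) for i in range(4)]
```

With three tickets and four buyers, the first three claims each get a distinct ticket and the fourth gets `None`.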
A design pattern for RESTful transactions is to create a "tension" in the system. For the popular example use case of a bank account transaction, you must make sure to update the totals of both accounts involved:
Create a transaction document "transfer USD 10 from account 11223 to account 88733". This creates the tension in the system.
To resolve any tension scan for all transaction documents and
If the source account is not updated yet update the source account (-10 USD)
If the source account was updated but the transaction document does not show this then update the transaction document (e.g. set flag "sourcedone" in the document)
If the target account is not updated yet update the target account (+10 USD)
If the target account was updated but the transaction document does not show this then update the transaction document
If both accounts have been updated you can delete the transaction document or keep it for auditing.
The scanning for tension should be done in a backend process for all "tension documents" to keep the periods of tension in the system short. In the above example there will be a short, anticipated period of inconsistency when the first account has been updated but the second has not been updated yet. This must be taken into account in the same way you deal with eventual consistency if your CouchDB is distributed.
Another possible implementation avoids the need for transactions completely: just store the tension documents and evaluate the state of your system by evaluating every involved tension document. In the example above this would mean that the total for a account is only determined as the sum values in the transaction documents where this account is involved. In Couchdb you can model this very nicely as a map/reduce view.
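The second variant, deriving a balance purely from the transaction documents, can be sketched as follows. This is plain Python rather than a CouchDB view; the field names `source`, `target` and `amount` and the sample transfers are hypothetical.

```python
def balance(account, transfers, opening=0):
    """Sum every transfer document that involves the given account."""
    total = opening
    for t in transfers:
        if t["source"] == account:
            total -= t["amount"]            # money leaving the account
        if t["target"] == account:
            total += t["amount"]            # money arriving in the account
    return total


transfers = [
    {"source": "11223", "target": "88733", "amount": 10},
    {"source": "88733", "target": "11223", "amount": 4},
]

balance_11223 = balance("11223", transfers)
balance_88733 = balance("88733", transfers)
```

Because each transfer is a single document, both sides of it become visible together, and the two balances always sum to the combined opening balances.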
No, CouchDB is not generally suitable for transactional applications because it doesn't support atomic operations in a clustered/replicated environment.
CouchDB sacrificed transactional capability in favor of scalability. In order to have atomic operations you need a central coordination system, which limits your scalability.
If you can guarantee you only have one CouchDB instance or that everyone modifying a particular document connects to the same CouchDB instance then you could use the conflict detection system to create a sort of atomicity using methods described above but if you later scale up to a cluster or use a hosted service like Cloudant it will break down and you'll have to redo that part of the system.
So, my suggestion would be to use something other than CouchDB for your account balances, it will be much easier that way.
As a response to the OP's problem, Couch is probably not the best choice here. Using views is a great way to keep track of inventory, but clamping to 0 is more or less impossible. The problem being the race condition when you read the result of a view, decide you're ok to use a "hammer-1" item, and then write a doc to use it. The problem is that there's no atomic way to only write the doc to use the hammer if the result of the view is that there are > 0 hammer-1's. If 100 users all query the view at the same time and see 1 hammer-1, they can all write a doc to use a hammer 1, resulting in -99 hammer-1's. In practice, the race condition will be fairly small - really small if your DB is running localhost. But once you scale, and have an off site DB server or cluster, the problem will get much more noticeable. Regardless, it's unacceptable to have a race condition of that sort in a critical - money related system.
An update to MrKurt's response (it may just be dated, or he may have been unaware of some CouchDB features)
A view is a good way to handle things like balances / inventories in CouchDB.
You don't need to emit the docid and rev in a view. You get both of those for free when you retrieve view results. Emitting them - especially in a verbose format like a dictionary - will just grow your view unnecessarily large.
A simple view for tracking inventory balances should look more like this (also off the top of my head)
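function( doc ) {
  if( doc.InventoryChange != undefined ) {
    for( var product_key in doc.InventoryChange ) {
      // emit the change itself so the reduce can sum it into a balance
      emit( product_key, doc.InventoryChange[product_key] );
    }
  }
}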
And the reduce function is even simpler:
_sum
This uses a built-in reduce function that just sums the values of all rows with matching keys.
In this view, any doc can have a member "InventoryChange" that maps product_key's to a change in the total inventory of them. ie.
{
  "_id": "abc123",
  "InventoryChange": {
    "hammer_1234": 10,
    "saw_4321": 25
  }
}
Would add 10 hammer_1234's and 25 saw_4321's.
{
  "_id": "def456",
  "InventoryChange": {
    "hammer_1234": -5
  }
}
Would burn 5 hammers from the inventory.
With this model, you're never updating any data, only appending. This means there's no opportunity for update conflicts. All the transactional issues of updating data go away :)
Another nice thing about this model is that ANY document in the DB can both add and subtract items from the inventory. These documents can have all kinds of other data in them. You might have a "Shipment" document with a bunch of data about the date and time received, warehouse, receiving employee etc. and as long as that doc defines an InventoryChange, it'll update the inventory. As could a "Sale" doc, and a "DamagedItem" doc etc. Looking at each document, they read very clearly. And the view handles all the hard work.
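To check the arithmetic of this append-only model outside CouchDB, here is a small Python equivalent of the map step plus a summing reduce. The function name and the third sample document are hypothetical; the first two documents are the ones shown above.

```python
from collections import defaultdict


def inventory_totals(docs):
    """Sum every InventoryChange entry per product key across all documents."""
    totals = defaultdict(int)
    for doc in docs:
        # documents without an InventoryChange member contribute nothing
        for product_key, change in doc.get("InventoryChange", {}).items():
            totals[product_key] += change
    return dict(totals)


docs = [
    {"_id": "abc123", "InventoryChange": {"hammer_1234": 10, "saw_4321": 25}},
    {"_id": "def456", "InventoryChange": {"hammer_1234": -5}},
    {"_id": "note-1", "text": "no inventory effect"},
]

totals = inventory_totals(docs)
```

The two sample documents leave 5 of hammer_1234 and 25 of saw_4321, and no existing document is ever modified to get there.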
Actually, you can in a way. Have a look at the HTTP Document API and scroll down to the heading "Modify Multiple Documents With a Single Request".
Basically you can create/update/delete a bunch of documents in a single post request to URI /{dbname}/_bulk_docs and they will either all succeed or all fail. The document does caution that this behaviour may change in the future, though.
EDIT: As predicted, from version 0.9 the bulk docs no longer works this way.
Just use a lightweight solution such as SQLite for the transactions. When a transaction completes successfully, replicate it and mark it as replicated in SQLite.
SQLite table:
txn_id | txn_attribute1 | txn_attribute2 | ... | txn_status
dhwdhwu$sg1 | x | y | ... | added/replicated
You can also delete the transactions that have been replicated successfully.
