How does google appengine measure datastore put operations

With the App Engine pricing changes, we've been paying attention to our datastore puts. According to the pricing comparison chart we're making 2.18 million puts a day. This seems a lot higher than expected. We receive about 0.6 queries per second, which means that each request is making about 60 puts!!
Using the sample code for db profiling (http://code.google.com/appengine/articles/hooks.html), we measured this for a day and the most we counted was ~14,000, which seems more reasonable. Does anyone have experience with something similar on their site?

The discrepancy you're seeing is because every index write is counted separately. When you do a datastore put, you're charged for the number of rows that have to be modified, so if you modified a single indexed field, you'd expect to be charged for:
One write for the entity itself
Two writes for the ascending index for the modified property
Two writes for the descending index for the modified property
For a total of 5 writes. As you can see, setting properties to indexed=False can have a big impact on your quota usage here.
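For example, with the Python db API (the model and property names here are purely illustrative), marking properties you never query on as unindexed skips their default index rows on every put:

    from google.appengine.ext import db

    class Article(db.Model):
        # Filtered/sorted on, so keep the default ascending and
        # descending indexes for this property.
        title = db.StringProperty()
        # Never queried, so skip its index writes on every put.
        view_count = db.IntegerProperty(indexed=False)
        # Text properties are never indexed at all.
        body = db.TextProperty()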

Related

understanding app engine memcache statistics

Under Compute > Memcache we have some statistics:
HitRatio, Items In Cache, Oldest Item Age, Total Cache Size, etc.
Then we can also see stats for the 20 most commonly used keys, ordered by either Operation Count or Memcache compute units.
My question is: is it possible to figure out how many times per second a key has been read (or read + written) from memcache using just the memcache stats?
For example, if I have 1 million hits, the oldest item is 1 day old, and my memcache key accounts for 5% of the traffic,
could I go (1 million hits * 5% = 50,000 hits) / 24 hours ≈ 0.57 hits per second?
Really I have no idea what the statistics in the memcache viewer actually mean - for example, the statistics don't even reset if memcache is flushed.
Cheers.
I am pretty sure counting this way won't return what you want. As explained in the Python memcache statistics documentation, an item's age resets when it is read, so the oldest item being 1 day old just means it has been in memcache for a day since it was last read.
To figure out how many times per second a key has been read, you might want to use a sharded counter (see the sketch below), or some other form of logging, then retrieve the logged data with the Logs API to interpret it. It can't really be done directly from the memcache statistics (it might be an interesting feature to request on Google's public issue tracker, though).
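A minimal sketch of such a sharded counter using the Python ndb API; the model name, shard count, and key scheme are all illustrative assumptions, not anything App Engine provides out of the box:

    import random
    from google.appengine.ext import ndb

    NUM_SHARDS = 20  # illustrative; tune to the expected write rate

    class ReadCounterShard(ndb.Model):
        # One entity per (memcache key, shard index); summing the
        # shards gives the total number of recorded reads.
        count = ndb.IntegerProperty(default=0, indexed=False)

    @ndb.transactional
    def mark_read(cache_key):
        # Record one memcache read against a randomly chosen shard.
        shard_id = '%s-%d' % (cache_key, random.randint(0, NUM_SHARDS - 1))
        shard = ReadCounterShard.get_by_id(shard_id)
        if shard is None:
            shard = ReadCounterShard(id=shard_id)
        shard.count += 1
        shard.put()

    def total_reads(cache_key):
        # Sum all shards for a key; divide by the window you logged
        # over to get reads per second.
        keys = [ndb.Key(ReadCounterShard, '%s-%d' % (cache_key, i))
                for i in range(NUM_SHARDS)]
        return sum(s.count for s in ndb.get_multi(keys) if s is not None)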

NDB: What happens when 1/s write is exceeded?

I am evaluating how to use GAE + NDB for a new project, and got concerned about the limit of 1 write per second for writes within a single entity group. I might be missing information, so I'm happy to ask for help.
Say several users work with orders. If all new "order" entities have the same ancestor, what would happen if, say, 5 users each create a new order and all 5 hit "save" at the same time?
Do you know what the consequences could be?
Thanks!
In your use case, nothing bad would happen - all of your writes will succeed. Some of them may be retried internally by App Engine, but you should not worry about that. You should only get concerned when you expect this rate to be exceeded for a substantial period of time. Then retries would come on top of previous retries and commits may start failing. Given your example, you would probably need a few million people working on those orders like crazy before it becomes an issue.
From the documentation:
The first type of timeout occurs when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second; a good guideline is that you should consider rearchitecting if you expect an entity group to have to sustain more than one update per second for an extended period.
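To make the scenario concrete, here is a minimal sketch in Python ndb of what "all orders share one ancestor" looks like; the Order model and the root key are hypothetical, purely for illustration:

    from google.appengine.ext import ndb

    class Order(ndb.Model):
        customer = ndb.StringProperty()
        total = ndb.FloatProperty(indexed=False)

    # Every order gets the same parent key, so all orders live in one
    # entity group and their commits are serialized by the datastore.
    ORDERS_ROOT = ndb.Key('OrderRoot', 'all-orders')

    def save_order(customer, total):
        # If 5 users call this simultaneously, the commits queue up;
        # some may be retried internally, but all of them succeed as
        # long as the sustained rate stays around the ~1/s guideline.
        order = Order(parent=ORDERS_ROOT, customer=customer, total=total)
        return order.put()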

GAE — Performance of queries on indexed properties

If I had an entity with an indexed property, say "name," what would the performance of == queries on that property be like?
Of course, I understand that no exact answers are possible, but how does the performance correlate with the total number of entities for which name == x for some x, the total number of entities in the datastore, etc.?
How much slower would a query on name == x be if I had 1000 entities with name equalling x, versus 100 entities? Has any sort of benchmarking been done on this?
Some not very strenuous testing on my part indicated response times increased roughly linearly with the number of results returned. Note that even if you have 1000 entities, if you add a limit=100 to your query, it'll perform the same as if you only had 100 entities.
This is in line with the documentation which indicates that perf varies with the number of entities returned.
When I say not very strenuous, I mean that the response times were all over the place, and it was a very very rough estimate to draw a line through. I'd often see an order of magnitude difference in perf on the same request.
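As an illustration of the limit point above, here is roughly what such a query looks like with the Python db API (the model and property names are just placeholders):

    from google.appengine.ext import db

    class Person(db.Model):
        name = db.StringProperty()  # indexed by default

    # The query scans the "name" index to the first matching row and
    # reads results in order; with limit=100 it costs about the same
    # whether 100 or 1000 entities actually match.
    matches = Person.all().filter('name =', 'x').fetch(limit=100)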
App Engine does queries in a very optimized way, so it is virtually irrelevant from a performance standpoint whether you do a query on the name property vs. just doing a batch get with the keys only. Either will be linear in the number of entities returned: 1000 entities returned will be pretty much exactly 10 times slower than 100 entities returned. The total number of entities stored in your database does not make a difference. What does make a tiny difference, though, is the number of different values of "name" that occur in your database.
The way this is done is via the indices (or indexes as preferred) stored along with your data. An index for the "name" property consists of a table that has all names sorted in alphabetical order (and a second one sorted in reverse alphabetical order, if you use descending order in any of your queries) and a query will then simply find the first occurrence of the name you are querying in the table and start returning results in order. This is called a "scan".
This video is a bit technical, but it explains in detail how all this works and if you're concerned about coding for maximum performance, might be a good time investment:
Google I/O 2008: Under the Covers of the Google App Engine Datastore
(the video quality is fairly bad, but they also have the slides online (see link above video))

Datastore Write Operations limit exceeding

I am using the Java SDK of App Engine.
I am using the master/slave datastore.
I have only two tables, each having 30 columns, and none of them is larger than 20 bytes.
After entering 300 rows in each table, it shows Datastore write operations at 0.03 million out of 0.05 million.
I have checked the tables. They contain 300 entries only. There is no infinite-loop kind of bug in my code.
Would someone please help point out where I might be going wrong?
Thanks,
Amrish.
As noted in the previous answer, those write totals include your index writes.
All entity properties have associated default indexes (unless the property is configured to be unindexed), even if you have not defined any custom indexes.
See http://code.google.com/appengine/articles/indexselection.html for more detail, and http://code.google.com/appengine/docs/billing.html#Billable_Resource_Unit_Cost for more specifics on write costs.
For example, a new entity 'put' is:
2 Writes + 2 Writes per indexed property value (these are for the default indexes for that property) + 1 Write per composite index value (for any relevant custom indexes that you have defined).
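As a rough sanity check (assuming all 30 properties in each entity are indexed single values and there are no custom indexes): each new entity costs 2 + 2 * 30 = 62 writes, so 300 entities per table is 300 * 62 = 18,600 writes, or 37,200 for both tables - about 0.04 million, which is in the same ballpark as the 0.03 million you are seeing.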
Datastore write operations include index updates. Make sure you don't have any exploding indexes. Keep in mind also that by default all fields have a built-in index; make any fields that you're not using unindexed to save quota.
Also, for better reliability and availability, consider switching to the high-reliability datastore (this doesn't directly fix your problem though).
I think the problem is due to the size of list_flightinfo. Also, this code might have been called several times per second.
The key of the entity is:
src+"_"+dest
which does not change in the loop, so the same entity is getting overwritten again and again.

Maximum number of records for a custom object in salesforce.com

What is the maximum number of records within a single custom object in salesforce.com?
There does not seem to be a limit indicated in https://login.salesforce.com/help/doc/en/limits.htm
But of course, there has to be a limit of some kind. E.g., could 250 million records be stored in a single salesforce.com custom object?
As far as I'm aware, the only limit is your data storage; you can see what you've used by going to Setup -> Administration Setup -> Data Management -> Storage Usage.
In one of the Orgs I work with I can see one object has almost 2GB of data for just under a million records, and this accounts for a little over a third of the storage available. Your storage space depends on your Salesforce Edition and number of users. See here for details.
I've seen the performance issue as well, though after about 1-2M records the performance hit appears magically to plateau, or at least it didn't appear to significantly slow down between 1M and 10M. I wonder if orgs are tier-tuned based on volume... :/
But regardless of this, there are other challenges which make it less than ideal for big data. Even though they've increased the SOQL governor limit to permit up to 50 million records to be retrieved in one call, you're still strapped with a 200,000 line execution limit in Apex and a 10K DML limit (per execution thread). These can be bypassed through Batch Apex, yet this has limitations as well. You can only execute 250K batches in 24 hours and only have 5 batches running at any given time.
So... the moral of the story seems to be that even if you managed to get a billion records into a custom object, you really can't do much with the data at that scale anyway. Therefore, it's effectively not the right tool for that job in its current state.
2-cents
LaceySnr is correct. However, there is an inverse relationship between the number of records for an object and performance. Any part of the system that filters on that object will be impacted, such as views, reports, SOQL queries, etc.
It's hard to talk specific numbers since Salesforce has upwards of a dozen server clusters, each with their own performance characteristics. And there's probably a lot of dynamic performance management that occurs regularly. But in the past I've seen performance issues start to creep in around 2M records. One possible remedy is to ask Salesforce to index the fields that you plan to filter on.
