I am trying to get the emails of a bunch of users on our service. I first get a list of messages, and if a message is not in the Datastore, we fetch it. I'm using the deferred library to avoid a DeadlineExceededError. The current algorithm is:
Put each user task on a queue
For each user, get the list of messages
For every 10 messages in this list, enqueue a task that fetches those 10 messages.
However, I realized that this still exceeds the rate limit, since I could be doing more than 10 queries/sec. When I tried fetching only 1 message at a time instead of 10, and included getting the list of messages in the same task (one network request per page of emails), I got an error saying I was using too much memory and my process was shut down.
What is the best algorithm to ensure I always stay under 10 qps to Gmail and yet don't run out of memory?
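For reference, the enqueueing step currently looks roughly like this (a simplified sketch; list_new_message_ids and fetch_messages are stand-ins for my actual helpers):

from google.appengine.ext import deferred

BATCH_SIZE = 10  # messages fetched per enqueued task

def process_user(user_id):
    # One Gmail request per page of message ids.
    new_ids = list_new_message_ids(user_id)  # stand-in: ids not already in the Datastore
    for i in range(0, len(new_ids), BATCH_SIZE):
        deferred.defer(fetch_messages, user_id, new_ids[i:i + BATCH_SIZE])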
I don't think hitting the rate limit is a big deal; just make sure you handle the error and slow down a little when it happens. Fetching messages in batches of 10 seems fine.
If you run out of memory in the scenario that you described, that means you have a memory leak or an infinite loop in your code. 10 queries can be easily processed on the smallest instance possible.
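A minimal sketch of "handle the error and slow down", assuming the google-api-python-client; the helper name, the status codes checked, and the retry constants are my assumptions, not something from your code:

import random
import time

from googleapiclient.errors import HttpError

def fetch_batch(service, user_id, msg_ids, max_attempts=5):
    # Fetch a batch of messages, backing off whenever Gmail rate-limits us.
    messages = []
    for msg_id in msg_ids:
        for attempt in range(max_attempts):
            try:
                messages.append(service.users().messages()
                                .get(userId=user_id, id=msg_id).execute())
                break
            except HttpError as e:
                # 403/429 are the statuses Gmail uses for rate-limit errors
                if e.resp.status not in (403, 429) or attempt == max_attempts - 1:
                    raise
                time.sleep(random.uniform(0, 2 ** attempt))  # slow down a little
    return messages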
How can we keep a fixed number of active concurrent users/requests at a time for a scenario?
I have a unique testing problem: I am required to performance-test services with a fixed number of in-flight requests at any given moment, for a given time period such as 10 minutes, 30 minutes, or 1 hour.
I am not looking for a per-second rate. What I want is to start with N requests and, as any of the N requests completes, add one more, so that at any given moment there are exactly N concurrent requests.
Things I have tried: rampUsers(100) over 10 seconds, but sometimes I see more than 50 users at a given instant.
constantUsersPerSec(20) during (1 minute) also pushed the number of concurrent requests to 50+ for some time.
atOnceUsers(20) seems related, but I don't see any way to keep it running for a given number of seconds while adding new requests as previous ones complete.
Thank you in advance; I'm hoping for some direction from the community.
There is a throttling mechanism (https://gatling.io/docs/3.0/general/simulation_setup/#throttling) which lets you cap the number of requests per second, but remember that users are injected into the simulation independently of the throttle: you must inject enough users to produce that maximum request rate, otherwise you will end up with a lower req/s. Also, users that are injected but cannot send requests because of throttling will wait in a queue for their turn. This can result in a huge load just after the throttle ends, or it may extend your simulation, so it is better to make the throttle duration longer than the injection duration and to add a maxDuration() option to the simulation setup.
You should also keep in mind that a throttled simulation is far from how users naturally behave. Real users never wait for another user to finish before opening a page or taking an action, so in real life you will always end up with a variable number of requests per second.
Use the closed workload model injection supported by Gatling 3.0. In your case, to simulate and maintain 20 active users/requests for a minute, you can use an injection like:
setUp(<yourScenario>.inject(constantConcurrentUsers(20) during (60 seconds)))
I have the following code that I run every week through a cron job to clear older db entries. After 3-4 minutes I get "Exceeded soft private memory limit of 128 MB with 189 MB after servicing 1006 requests total."
Then there is also this message: "While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application." Below is the cleanup code.
def clean_user_older_stories(user):
    stories = Story.query(Story.user == user.key).order(-Story.created_time).fetch(offset=200, limit=500, keys_only=True)
    print 'stories len ' + str(len(stories))
    ndb.delete_multi(stories)

def clean_older_stories():
    for user in User.query():
        clean_user_older_stories(user)
I guess there is a better way to deal with this. How do I handle this?
It's because of the in-context cache.
When executing long-running queries in background tasks, it's possible for the in-context cache to consume large amounts of memory. This is because the cache keeps a copy of every entity that is retrieved or stored in the current context.
Try disabling the cache.
To avoid memory exceptions in long-running tasks, you can disable the cache or set a policy that excludes whichever entities are consuming the most memory.
ctx = ndb.get_context()  # note the parentheses: get_context is a function, call it
ctx.set_cache_policy(False)
ctx.set_memcache_policy(False)
Have you tried making your User query a keys_only query? You are not using any User properties besides the key, so this would help cut down on memory usage.
You should page through large queries by setting a page_size and using a Cursor.
Your handler can invoke itself through the task queue with the next cursor until the end of the result set is reached. Optionally, you can use the deferred API to cut down on boilerplate code for this kind of task.
That being said, the 'join' you are doing between User and Story could make this challenging. I would page through Users first, since from what you have described Users will grow over time while the number of Stories per User is limited.
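A minimal sketch of that pattern, assuming the deferred API and a keys_only User query (PAGE_SIZE and the key-based variant of clean_user_older_stories are illustrative, not your actual code):

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import deferred

PAGE_SIZE = 100  # illustrative page size

def clean_older_stories(cursor_urlsafe=None):
    cursor = Cursor(urlsafe=cursor_urlsafe) if cursor_urlsafe else None
    # keys_only keeps memory per request low; we never need User properties
    user_keys, next_cursor, more = User.query().fetch_page(
        PAGE_SIZE, start_cursor=cursor, keys_only=True)
    for user_key in user_keys:
        deferred.defer(clean_user_older_stories, user_key)  # variant taking a key
    if more and next_cursor:
        # hand off to a fresh task instead of looping in one request
        deferred.defer(clean_older_stories, next_cursor.urlsafe())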
I have a bit of a strange problem. I have a module running on GAE that puts a whole lot of little tasks on the default task queue. The tasks access the same ndb data; each task reads a bunch of data from a few different tables and then calls put.
The first few tasks work fine, but as time goes on I start getting these on the final put:
suspended generator _put_tasklet(context.py:358) raised TransactionFailedError(too much contention on these datastore entities. please try again.)
So I wrapped the put in a try with a randomised timeout so it retries a couple of times. This mitigated the problem a little; it just happens later on.
Here is some pseudocode for my task:
def my_task(request):
    stuff = get_ndb_instances()  # this accesses a few things from different tables
    better_stuff = process(stuff)  # pretty much just a summation
    try_put(better_stuff)
    return {'status': 'Groovy'}

def try_put(oInstance, iCountdown=10):
    if iCountdown < 1:
        return oInstance.put()  # out of retries: let any error propagate
    try:
        return oInstance.put()
    except:  # ideally catch TransactionFailedError specifically
        import time
        import random
        logger.info("sleeping")
        time.sleep(random.random() * 20)
        return try_put(oInstance, iCountdown - 1)
Without try_put the queue gets about 30% of the way through before it stops working. With try_put it gets further, to about 60%.
Could it be that a task is somehow holding onto ndb connections after it has completed? I'm not making explicit use of transactions.
EDIT:
There seems to be some confusion about what I'm asking. The question is: why does ndb contention get worse as time goes on? I have a lot of tasks running simultaneously, and they access ndb in a way that can cause contention. If contention is detected, a randomly timed retry happens, and this eliminates the contention perfectly well, for a little while. Tasks keep running and completing, and the more that successfully return, the more contention happens, even though the processes using the contended-upon data should be finished. Is something holding onto datastore handles that shouldn't be? What's going on?
EDIT2:
Here is a little bit about the key structures in play:
My ndb models sit in a hierarchy like this (the direction of the arrows indicates parent-child relationships, i.e. a Type has a bunch of child Instances, etc.):
Type->Instance->Position
The ids of the Positions are limited to a few different names; there are many thousands of Instances and not many Types.
I calculate a bunch of Positions and then do a try_put_multi (similar to try_put in the obvious way) and get contention. I'm going to run the code again soon and get a full traceback to include here.
Contention will get worse over time if you continually exceed 1 write/transaction per entity group per second. The answer lies in how Megastore/Paxos works and how Cloud Datastore handles contention in the backend.
When 2 writes are attempted at the same time on different nodes in Megastore, one transaction will win and the other will fail. Cloud Datastore detects this contention and will retry the failed transaction several times. Usually this results in the transaction succeeding without any errors being raised to the client.
If sustained writes above the recommended limit are being attempted, the chance that a transaction needs to be retried multiple times increases. The number of transactions in an internal retry state also increases. Eventually, transactions will start reaching our internal retry limit and will return a contention error to the client.
A randomized sleep is an incorrect way to handle error responses. You should instead look into exponential back-off with jitter (example).
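A minimal sketch of that, assuming the failure surfaces as TransactionFailedError (the helper name and the constants are illustrative):

import random
import time

from google.appengine.api.datastore_errors import TransactionFailedError

def put_with_backoff(entity, max_attempts=6, base=0.05, cap=10.0):
    # entity.put() with exponential back-off plus full jitter
    for attempt in range(max_attempts):
        try:
            return entity.put()
        except TransactionFailedError:
            if attempt == max_attempts - 1:
                raise
            # sleep a random amount up to a doubling, capped ceiling
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))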
Similarly, the core of your problem is a high write rate into a single entity group. You should look into whether the explicit parenting is required (remove it if not), or whether you should shard the entity group in some manner that makes sense for your queries and consistency requirements.
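If sharding fits your queries and consistency needs, one possible shape (InstanceShard and N_SHARDS are hypothetical names, not from your model) is to spread Positions across several root entity groups instead of a single Instance parent:

import random

from google.appengine.ext import ndb

N_SHARDS = 8  # hypothetical; pick based on your sustained write rate

def position_parent(instance_key):
    # Writes for one Instance now land in one of N_SHARDS entity groups;
    # reads that need every Position for an Instance query across the shards.
    shard = random.randint(0, N_SHARDS - 1)
    return ndb.Key('InstanceShard', '%s-%d' % (instance_key.id(), shard))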
I'm running into major performance issues when trying to just get a list of the ten most recent threads in a user's inbox:
threads = gmail_client.users().threads().list(userId='me', maxResults=10, pageToken='', q='-in:chats ', labelIds=['INBOX']).execute()
This one query is consistently taking 5-6 seconds. Any idea what's going on here, or how I can speed this up?
Try:
threads = gmail_client.users().threads().list(userId='me', maxResults=10, labelIds=['INBOX']).execute()
There is no reason to send an empty pageToken; just omit the attribute. Also, chat messages aren't in the inbox, so there is no need to exclude them in the query.
Also, confirm that performance is the same across mailboxes; a busy mailbox is expected to be slower.
I have an application where you can select entries in a table to be updated. You can select 50, hit 'send it', and it will send all 50 as individual ajax calls, each of which calls controller X, which updates database table Y. The user said they selected 25, but I see 12 in the logs and no trace of the other 13. I was wondering if maybe I hit some limit on the number of requests or the length of the queue. Any ideas?
(Running this locally, it had no problem with 50-100 being sent at once; it just took around 2 seconds for all of them to finally call back.)
This question cannot really be answered definitively, as there is no hard limit that would cause your ajax calls to be lost. However, overloading the server (if you only have one) can cause requests to queue up and/or take a long time, which in turn may cause requests to time out (depending on your settings). Chances are, if you are making 50 ajax calls instead of just one, there is some room for optimization there, and I'm not sure that would classify as premature optimization.
In the end, I think the best advice you will get is to profile and load-test your code and hardware to be sure; otherwise we're all just guessing.