How long can the map call last? - google-app-engine

I want to do some heavy processing in the map() call of the mapper.
I was going through the source file MapReduceServlet.java:
// Amount of time to spend on actual map() calls per task execution.
public static final int PROCESSING_TIME_PER_TASK_MS = 10000;
Does this mean the map call can last only 10 seconds? What happens after 10 seconds?
Can I increase this to a larger value, such as 1 minute or 10 minutes?
-Aswath

MapReduce operations are executed in tasks using push queues and, as stated in the documentation, the task deadline is currently 10 minutes (the limit after which you get a DeadlineExceededException).
If a task fails, App Engine by default retries it until it succeeds. If you need a deadline longer than 10 minutes, you can use a Backend to execute your tasks.
Looking at the actual usage of PROCESSING_TIME_PER_TASK_MS in Worker.java, this value is used to limit the number of map calls made in a single task.
After each map call, if more than 10 seconds have elapsed since the beginning of the task, the worker spawns a new task to handle the rest of the map calls.
1. Worker.scheduleWorker spawns a new task for each given shard.
2. Each task calls Worker.processMapper.
3. processMapper executes one map call.
4. If less than PROCESSING_TIME_PER_TASK_MS has elapsed since step 2, go back to step 3.
5. Otherwise, if processing is not finished, a new worker task is scheduled.
In the worst case, the default task request deadline (10 minutes) should apply to each individual map call.
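The scheduling loop described above can be sketched in Python. This is not the library's actual Java code: `process_slice`, `budget_ms` and the injectable `clock` are hypothetical names used only for illustration. The point it shows is that the time check runs only between map calls, so a slow call is never cut short by the 10-second budget.

```python
import time


def process_slice(items, map_fn, budget_ms=10000, clock=time.monotonic):
    """Run map_fn over items until the time budget is spent.

    Returns the items left over for a follow-up task. The budget is
    checked only after each call, so one slow call can overrun it.
    """
    start = clock()
    for index, item in enumerate(items):
        map_fn(item)
        elapsed_ms = (clock() - start) * 1000
        if elapsed_ms > budget_ms:
            # Hand the rest of the shard to a newly scheduled task.
            return items[index + 1:]
    return []  # shard finished within this task


# Fake clock for the demo: every map call "takes" 4 seconds.
now = [0.0]


def slow_map(item):
    now[0] += 4.0


remaining = process_slice([1, 2, 3, 4, 5], slow_map, clock=lambda: now[0])
```

With the fake clock, the third call pushes the elapsed time past the budget, so the last two items are left for the next task.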


How to fix System.LimitException: Apex CPU time limit exceeded caused by workflows with email alert?

I'm trying to execute a batch test method on 100 records and I get the CPU runtime limit error.
I placed the Limits.getCpuTime() method in my code and noticed that my code without the workflow segment takes 3148 ms to complete. However, when I activate two workflows that each send an email to one user, I get the CPU runtime limit error. In total, my process takes around 10 seconds to complete without those two workflows and around 20 seconds with them activated.
@IsTest
static void returnIncClientAddress(){
    //Select Required Records
    User incidentClient = [SELECT Id FROM User WHERE Username = 'bbaggins@shire.qa.com' LIMIT 1];
    BMCServiceDesk__Category__c category = [SELECT Id FROM BMCServiceDesk__Category__c WHERE Name = 'TestCategory'];
    BMCServiceDesk__BMC_BaseElement__c service = [SELECT ID FROM BMCServiceDesk__BMC_BaseElement__c WHERE Name = 'TestService'];
    BMCServiceDesk__BMC_BaseElement__c serviceOffering = [SELECT ID FROM BMCServiceDesk__BMC_BaseElement__c WHERE Name = 'TestServiceOffering'];
    //Create Incidents
    List<BMCServiceDesk__Incident__c> incidents = new List<BMCServiceDesk__Incident__c>();
    for(integer i = 0; i < 100; i++){
        BMCServiceDesk__Incident__c incident = new BMCServiceDesk__Incident__c(
            BMCServiceDesk__FKClient__c = incidentClient.ID,
            BMCServiceDesk__FKCategory__c = category.ID,
            BMCServiceDesk__FKServiceOffering__c = serviceOffering.ID,
            BMCServiceDesk__FKBusinessService__c = service.ID,
            BMCServiceDesk__FKStatus__c = awaiting_for_handling
        );
        incidents.add(incident);
    }
    test.startTest();
    insert incidents;
    test.stopTest();
}
I expected the email workflows and alerts to be processed in batch and sent without being so expensive in CPU time, but it seems that Salesforce takes a lot of time both checking the workflows rules and executing on them when needed. The majority of the process' time seems to be spent on sending the workflows' emails (which it doesn't actually do because it's a test method).
There's not much you can do to control the execution time of Workflow Rules. You could try converting them into Apex and benchmarking to see whether that results in improvement in time consumed, but I suspect the real solution is that you're going to have to dial down your bulk test.
The CPU limit for a transaction is 10 seconds. If your unit test code is already taking approximately 10 seconds to complete without Workflows (I'm not sure exactly what bounds your 3148 ms and 10 s refer to), you've really got only two choices:
Make the sum total of automation running on insert of this object faster;
Reduce the quantity of data you're processing in this unit test.
It's not clear what you're actually testing here, but if it's an Apex trigger, you should make sure that it's properly bulkified and does not consume unnecessary CPU time, including through trigger recursion. Reviewing the call stack in your logs (or simply adding System.debug() statements) may help with that.
Lastly - make sure you write assertions in your test method. Test methods without assertions are close to worthless.
Are there triggers on BMCServiceDesk__Incident__c or on objects modified by the workflows? Triggers on updates could possibly cause the code to execute multiple times in the same execution context, causing you to hit the CPU limit. Consider preventing reentry into triggers, or adding checks so that triggers run only if specific criteria are met.
Otherwise, consider refactoring the code so that work is done within a single loop where possible, since loops, especially nested loops, drive up CPU usage. Workflows on their own usually don't drive up CPU time unless triggers are executed because of workflow updates.

Constant Timer delays the thread more than is set

I have a test plan with 10 requests. The requests alone, without a Constant Timer, take about 18 seconds. When I add one Constant Timer with a 1000-millisecond delay after the third request, the run takes about 28 seconds.
Is this a problem with JMeter, or am I doing something wrong?
I'm running on Ubuntu - ElementaryOS with JMeter v. 2.11 r1554548.
I'm testing another server, not my laptop.
In the JMeter test plan I'm using a Cache Manager, a Cookie Manager and Request Defaults at the beginning. One request uses the POST action. A Summary Report, Graph Results, View Results in Table and a Simple Data Writer are at the end of the test plan.
Everything runs in one thread.
The position of a timer in the test plan has no impact: a timer does not execute at the point where it is placed.
Instead, it applies to every request in its scope, that is, every child request of the timer's parent.
Read this:
http://jmeter.apache.org/usermanual/test_plan.html
4.10 Scoping Rules
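The numbers in the question are consistent with that scoping rule. A minimal sketch of the arithmetic, assuming the timer sits at the same level as all 10 requests and the plan runs once in a single thread (`expected_duration_s` is a made-up helper name, not a JMeter API):

```python
def expected_duration_s(base_s, samplers_in_scope, delay_ms):
    """Total run time when a timer delays every sampler in its scope."""
    return base_s + samplers_in_scope * delay_ms / 1000.0


# 18 s of requests plus a 1000 ms delay before each of the 10 requests.
total = expected_duration_s(base_s=18, samplers_in_scope=10, delay_ms=1000)
```

To delay only one request, add the timer as a child of that request so that its scope contains just that one sampler.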

Task Parallel Library and SQL Connections

I'm hoping someone can confirm what is actually happening here with TPL and SQL connections.
Basically, I have a large application which, in essence, reads a table from SQL Server, and then processes each row - serially. The processing of each row can take quite some time. So, I thought to change this to use the Task Parallel Library, with a "Parallel.ForEach" across the rows in the datatable. This seems to work for a little while (minutes), then it all goes pear-shaped with...
"The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."
Now, I surmised the following (which may of course be entirely wrong).
The "ForEach" creates tasks for each row, up to some limit based on the number of cores (or whatever). Let's say 4 for want of a better idea. Each of the four tasks gets a row, and goes off to process it. TPL waits until the machine is not too busy, and fires up some more. I'm expecting a max of four.
But that's not what I observe - and not what I think is happening.
So... I wrote a quick test (see below):
Sub Main()
    Dim tbl As New DataTable()
    FillTable(tbl)
    Parallel.ForEach(tbl.AsEnumerable(), AddressOf ProcessRow)
End Sub
Private n As Integer = 0
Sub ProcessRow(row As DataRow, state As ParallelLoopState)
    n += 1 ' I know... not thread safe
    Console.WriteLine("Starting thread {0}({1})", n, Thread.CurrentThread.ManagedThreadId)
    Using cnx As SqlConnection = New SqlConnection(My.Settings.ConnectionString)
        cnx.Open()
        Thread.Sleep(TimeSpan.FromMinutes(5))
        cnx.Close()
    End Using
    Console.WriteLine("Closing thread {0}({1})", n, Thread.CurrentThread.ManagedThreadId)
    n -= 1
End Sub
This creates way more than my guess at the number of tasks. So, I surmise that TPL fires up tasks to the limit it thinks will keep my machine busy, but hey, what's this, we're not very busy here, so let's start some more. Still not very busy, so... etc. (seems like one new task a second - roughly).
This is reasonable-ish, but I expect it to go pop 30 seconds (SQL connection timeout) after when and if it gets 100 open SQL connections - the default connection pool size - which it doesn't.
So, to scale it back a bit, I change my connection string to limit the max pool size.
Sub Main()
    Dim tbl As New DataTable()
    Dim csb As New SqlConnectionStringBuilder(My.Settings.ConnectionString)
    csb.MaxPoolSize = 10
    csb.ApplicationName = "Test 1"
    My.Settings("ConnectionString") = csb.ToString()
    FillTable(tbl)
    Parallel.ForEach(tbl.AsEnumerable(), AddressOf ProcessRow)
End Sub
I count the real number of connections to the SQL server, and as expected, it's 10. But my application has fired up 26 tasks - and then hangs. So, setting the max pool size for SQL somehow limited the number of tasks to 26, but why not 27, and especially, why doesn't it fall over at 11 because the pool is full?
Obviously, somewhere along the line I'm asking for more work than my machine can do, and I can add "MaxDegreeOfParallelism" to the ForEach, but I'm interested in what's actually going on here.
PS.
Actually, after sitting with 26 tasks for (I'm guessing) 5 minutes, it does fall over with the original (max pool size reached) error. Huh ?
Thanks.
Edit 1:
Actually, what I now think happens in the tasks (my "ProcessRow" method) is that after 10 successful connections/tasks, the 11th does block for the connection timeout, and then does get the original exception - as do any subsequent tasks.
So... I conclude that the TPL is creating tasks at about 1 a second, and it gets enough time to create about 26/27 before task 11 throws an exception. All subsequent tasks then also throw exceptions (about a second apart) and the TPL stops creating new tasks (because it gets unhandled exceptions in one or more tasks ?)
For some reason (as yet undetermined), the ForEach then hangs for a while. If I modify my ProcessRow method to use the state to say "stop", it appears to have no effect.
Sub ProcessRow(row As DataRow, state As ParallelLoopState)
    n += 1
    Console.WriteLine("Starting thread {0}({1})", n, Thread.CurrentThread.ManagedThreadId)
    Try
        Using cnx As SqlConnection = fnNewConnection()
            Thread.Sleep(TimeSpan.FromMinutes(5))
        End Using
    Catch ex As Exception
        Console.WriteLine("Exception on thread {0}", Thread.CurrentThread.ManagedThreadId)
        state.Stop()
        Throw
    End Try
    Console.WriteLine("Closing thread {0}({1})", n, Thread.CurrentThread.ManagedThreadId)
    n -= 1
End Sub
Edit 2:
Dur... The reason for the long delay is that, while tasks 11 onwards all crash and burn, tasks 1 to 10 don't, and all sit there sleeping for 5 minutes. The TPL has stopped creating new tasks (because of the unhandled exception in one or more of the tasks it has created), and then waits for the un-crashed tasks to complete.
The edits to the original question add more detail and, eventually, the answer becomes apparent.
TPL creates tasks repeatedly because the tasks it has created are (basically) idle. This is fine until the connection pool is exhausted, at which point the tasks which want a new connection wait for one to become available, and timeout. In the meantime, the TPL is still creating more tasks, all doomed to fail. After the connection timeout, the tasks start failing, and the ensuing exception(s) cause the TPL to stop creating new tasks. The TPL then waits for the tasks that did get connections to complete, before an AggregateException is thrown.
The TPL is not made for IO-bound work. It has heuristics which it uses to steer the number of active threads. These heuristics fail for long-running and/or IO-bound tasks, causing it to inject more and more threads without a practical limit.
Use PLINQ to set a fixed number of threads using WithDegreeOfParallelism. You should probably test different values. I have written much more about this topic on SO, but I can't find it at the moment.
I have no idea why you are seeing exactly 26 threads in your example. Note that when the pool is depleted, a request to take a connection only fails after a timeout. This entire system is very non-deterministic and I'd consider any number of threads plausible.
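The idea of fixing the degree of parallelism below the pool size is not specific to .NET. Here is a rough Python analogue, not PLINQ and not the poster's code: `POOL_SIZE`, `MAX_WORKERS`, `process_row` and the semaphore standing in for the connection pool are all assumptions made for illustration. With a fixed number of workers, no more than that many "connections" are ever held at once, so the pool cannot be exhausted.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 10    # stand-in for the connection pool's max size
MAX_WORKERS = 4   # fixed degree of parallelism, kept <= POOL_SIZE

pool_slots = threading.BoundedSemaphore(POOL_SIZE)
lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def process_row(row):
    """Hold one simulated connection while doing slow, I/O-bound work."""
    # Fail after a timeout instead of waiting forever, like a real pool.
    if not pool_slots.acquire(timeout=1):
        raise RuntimeError("connection pool exhausted")
    try:
        with lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.01)  # simulated work while holding the connection
        with lock:
            stats["active"] -= 1
        return row * 2
    finally:
        pool_slots.release()


# The executor never runs more than MAX_WORKERS rows at the same time.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    results = list(executor.map(process_row, range(50)))
```

The right worker count still has to be found by measurement; the only hard constraint in this sketch is that it stays at or below the pool size.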

Which NDB query function is more efficient to iterate through a big set of query results?

I use NDB for my app and use iter() with a limit and a start cursor to iterate through 20,000 query results in a task. I often run into a timeout error:
Timeout: The datastore operation timed out, or the data was temporarily unavailable.
The way I make the call is like this:
results = query.iter(limit=20000, start_cursor=cursor, produce_cursors=True)
for item in results:
    process(item)
save_cursor_for_next_time(results.cursor_after().urlsafe())
I can reduce the limit but I thought a task can run as long as 10 mins. 10 mins should be more than enough time to go through 20000 results. In fact, on a good run, the task can complete in just about a minute.
If I switched to fetch() or fetch_page(), would they be more efficient and less likely to run into the timeout error? I suspect there's a lot of overhead in iter() that causes the timeout error.
Thanks.
fetch() is not really any more efficient; they all use the same mechanism. The exception is when you know up front how many entities you want: then fetch() can be more efficient because you end up with just one round trip.
You can increase the batch size for iter(), which can improve things. See https://developers.google.com/appengine/docs/python/ndb/queryclass#kwdargs_options
According to the docs the default batch size is 20, which for 20,000 entities means a lot of batches.
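To put a number on "a lot of batches", here is a small sketch of the arithmetic. It assumes roughly one datastore round trip per batch, and the batch size of 500 is only an illustrative value, not a recommendation; `round_trips` is a made-up helper name.

```python
def round_trips(total_entities, batch_size):
    """Number of batches needed to read total_entities (ceiling division)."""
    return (total_entities + batch_size - 1) // batch_size


default_batches = round_trips(20000, 20)   # default batch size of 20
larger_batches = round_trips(20000, 500)   # hypothetical larger batch size
```

Going from 1000 batches to 40 means far fewer round trips during which a transient datastore timeout can hit the task.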
Other things can help too. Consider using map() and/or map_async() for the processing, rather than explicitly calling process(entity); have a read of https://developers.google.com/appengine/docs/python/ndb/queries#map. Introducing async into your processing can also improve concurrency.
Having said all of that, you should profile so you understand where the time is spent. For instance, the delays could be in your process() function because of what you are doing there.
There are other things to consider with ndb, such as context caching, which you need to disable. I also used the iter method for these, and I made an ndb version of the mapper API from the old db.
Here is my ndb mapper API, which should solve the timeout and ndb caching problems and makes this kind of job easy to create:
http://blog.altlimit.com/2013/05/simple-mapper-class-for-ndb-on-app.html
With this mapper API you can create a job like this, or improve on it:
class NameYourJob(Mapper):
    def init(self):
        self.KIND = YourItemModel
        self.FILTERS = [YourItemModel.send_email == True]
    def map(self, item):
        # here is your process(item)
        # process here
        item.send_email = False
        self.update(item)
# Then run it like this
from google.appengine.ext import deferred
deferred.defer(NameYourJob().run, 50, # <-- this is your batch
    _target='backend_name_if_you_want', _name='a_name_to_avoid_dups')
For potentially long query iterations, we use a time check to ensure slow processing can be handled. Given the disparities in GAE infrastructure performance, you will likely never find an optimal processing number. The code excerpt below is from an on-line maintenance handler we use which generally runs within ten seconds. If not, we get a return code saying it needs to be run again thanks to our timer check. In your case, you would likely break the process after passing the cursor to your next queue task. Here is some sample code which is edited down to hopefully give you a good idea of our logic. One other note: you may choose to break this up into smaller bites and then fan out the smaller tasks by re-enqueueing the task until it completes. Doing 20k things at once seems very aggressive in GAE's highly variable environment. HTH -stevep
import datetime
def over_dt_limit(start, milliseconds):
    # True once more than `milliseconds` have elapsed since `start`
    dt = datetime.datetime.now() - start
    mt = float(dt.seconds * 1000) + (float(dt.microseconds)/float(1000))
    return mt > float(milliseconds)
# set a start time
start = datetime.datetime.now()
# handle a timeout issue inside your query iteration
for item in query.iter():
    # do your loop logic
    if over_dt_limit(start, 9000):
        # your specific time-out logic here
        break

Google App Engine Added task goes missing

When I add a task to the task queue, sometimes the task goes missing. I don't get any errors, but I just don't find the tasks in my logs. Suppose I add n tasks. The computation cannot go forward without these n tasks finishing. However, I find that one or more of these n tasks went missing after being added, and my whole algorithm stops in the middle.
What could be the reason?
I keep a variable w to count the number of times a task was added. I observe w = n even though some tasks were not created.
def addtask_whx(index,user,seqlen,vp_compress,iseq_compress):
    global w
    while True :
        timeout_ms = 100
        taskq_name = 'whx'+'--'+str(index[0])+'-'+str(index[1])+'-'+str(index[2])+'-'+str(index[3])+'-'+str(index[5]) + '--' + user
        try :
            taskqueue.add(name=taskq_name+str(timeout_ms),queue_name='whx',url='/whx', params={'m': index[0],'n': index[1],'o': index[2],'p': index[3],'q':0,'r':index[5],'user': user,'seqlen':seqlen,'vp':vp_compress,'iseq':iseq_compress})
            w = w+1
            break
        except DeadlineExceededError:
            taskq_name = taskq_name + str(timeout_ms)
            time.sleep(float(timeout_ms)/1000)
            timeout_ms = timeout_ms*4
            logging.error("WHX Task Queue Add Timeout Retrying")
        except TransientError:
            taskq_name = taskq_name + str(timeout_ms)
            time.sleep(float(timeout_ms)/1000)
            timeout_ms = timeout_ms*4
            logging.error("WHX Task Queue Add Transient Error Retrying")
        except TombstonedTaskError:
            logging.error("WHX Task Queue Tombstoned Error")
            break
Disclaimer: this is not the answer you're looking for, but I hope it will help you nonetheless.
"The computation cannot go forward without these n tasks finishing"
It sounds like you are using the task queue for something it was not designed to do. You should read: http://code.google.com/appengine/docs/java/taskqueue/overview.html#Queue_Concepts
Tasks are not guaranteed to be executed in the order they arrive, and they are not guaranteed to be executed exactly once. In some cases, a single task may be executed more than once or not at all. Further, a task can be cancelled and re-queued at the discretion of the App Engine based on available resources. For example, your timeout_ms = 100 is very low; if a new JVM has to be started, which could take several seconds, tasks n+1 and n+2 may be executed before task n.
In short, the task queue is not a reliable mechanism for performing strictly sequential computation. You've been warned.
-tjw
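Since a task may run more than once or not at all, one common way to cope is to make each task idempotent and to track completion explicitly instead of counting how many tasks were added. A minimal in-memory sketch of that idea follows; `JobTracker` and its methods are hypothetical names, and a real App Engine app would keep this state in the datastore with the check-and-mark step done transactionally.

```python
class JobTracker:
    """Tracks which of the expected tasks have actually completed."""

    def __init__(self, expected):
        self.expected = set(expected)
        self.done = set()
        self.runs = 0  # how many times real work was performed

    def handle(self, task_id, work):
        """Run work for task_id unless it has already completed."""
        if task_id in self.done:
            return False  # duplicate delivery: skip the work
        work(task_id)
        self.runs += 1
        self.done.add(task_id)
        return True

    def missing(self):
        """Tasks that never completed and may need to be re-enqueued."""
        return sorted(self.expected - self.done)


tracker = JobTracker(expected=["t1", "t2", "t3"])
# t1 and t2 are delivered twice, t3 is never delivered.
for delivered in ["t1", "t2", "t2", "t1"]:
    tracker.handle(delivered, work=lambda task_id: None)
```

Checking `missing()` before moving on tells the algorithm which tasks to add again, rather than stalling on a counter such as w.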
