I am calling an API to fetch codes and create a file of codes. I can call the API and get the response, but when I process the response I get an "Apex CPU time limit exceeded" error.
The reason is that the response data is too big.
I am still unsure which solution is suitable for this.
I am making a bus prediction web application for a college project. The application will use GTFS-R data, which is essentially a transit delay API that is updated regularly. In my application, I plan to use a cron job and a Python script to make regular GET requests and write the response to a JSON file, essentially creating a feed of transit updates. I have also set up a GET request where the user inputs trip data, which is searched against the feed to determine whether there are transit delays associated with their specific trip.
My question is - if the user sends a request at the same time as the JSON file is being updated, could this lead to issues?
One solution I was thinking of is having an intermediary JSON file which, once fully written, replaces the file used by the search function.
I am not sure if this is a good solution or if it is even needed. I am also not sure of the terminology needed to search for solutions to similar problems, so pointers in the right direction would be useful.
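For what it's worth, here is a minimal sketch of that write-then-swap idea in Python (the file names are hypothetical, and it assumes Python 3 and that both files live on the same filesystem):

    import json
    import os

    FEED_PATH = "feed.json"      # file the search function reads (hypothetical)
    TEMP_PATH = "feed.json.tmp"  # intermediary file written by the cron job

    def update_feed(updates):
        # Write the full response to the intermediary file first.
        with open(TEMP_PATH, "w") as f:
            json.dump(updates, f)
        # Atomically swap it into place: readers see either the old
        # file or the new one, never a half-written file.
        os.replace(TEMP_PATH, FEED_PATH)

The usual search terms for this pattern are "atomic file replacement" and "read/write race condition".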
I am operating in a production environment with a number of different applications using the Amazon API. Of these, some are our own home-grown apps, and others are 3rd party shipping applications.
I have a situation where I am hitting an hourly throttle for the Reports API 'GetReport' request, and I am trying to determine what is causing us to be throttled. By my count, we shouldn't be exceeding ~60 calls per hour at the absolute maximum. (Just a note: while the API documentation says this call is throttled at 60 requests per hour, the exception I received back indicated a cap of 120 requests per hour. Maybe the exception is wrong, and I'm actually hitting a 60-request cap?)
Is there either an API call to determine current call usage, or a way to access this information via Amazon Seller Central / the Developer Program? I've done some searching around, but everything I can find describes how the throttling works, which isn't my problem.
I am currently using the C# Amazon MWS libraries for all function calls, although that information is a bit superfluous. Any insight into the proper API call to use, or how to gain access to this information, would be greatly appreciated.
In the response to most calls, you get back headers something like the following:

    "x-mws-quota-max" => "60.0",
    "x-mws-quota-remaining" => "51.0",
    "x-mws-quota-resetsOn" => "2016-03-25T16:00:00.000Z"
You should be able to use these to figure out what is causing you to hit the limit faster than expected. Perhaps log each call together with the quota data above?
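A minimal sketch of that logging idea, written in Python for illustration (the header names come from the response above; how you get at the headers depends on your client library):

    import logging

    def log_quota(operation, headers):
        # Log each MWS call together with its quota headers so you can
        # see which operation is burning through the hourly allowance.
        logging.info(
            "%s: quota max=%s remaining=%s resets=%s",
            operation,
            headers.get("x-mws-quota-max"),
            headers.get("x-mws-quota-remaining"),
            headers.get("x-mws-quota-resetsOn"),
        )

Grepping a log like this for the minutes before a throttling exception should show which caller is responsible.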
Contact MWS Support here and ask for clarification on your issue. They certainly track your usage in order to be able to cap it. I met with the MWS team a few months ago in Detroit, and they said to ask them any time you have a technical question. They've been really helpful to me.
I'm working in Java and was able to kick off a MapReduce job. The job made it through the ShardedJob stage, but is now stuck on the ExamineStatusAndReturnResult stage. In the task queue I see a number of jobs like /mapreduce/workerCallback/map-hex-string, all of which are being re-queued because the return code is 429 Too Many Requests (https://www.rfc-editor.org/rfc/rfc6585#section-4). I feel as though I'm hitting some sort of quota limit, but I cannot figure out where or why.
How can I tell why these tasks are receiving a 429 response code?
The mapreduce library tries to avoid OOM conditions by doing its own estimated-memory bookkeeping (this can be tuned by overriding the Worker/InputReader/OutputWriter estimateMemoryRequirement methods, and it works best when MR jobs run in their own instances [module, backend, version]). Upon receiving an MR request from the task queue, the mapreduce library checks the request's estimated memory requirement, and if that is more than what is currently available, the request is rejected with HTTP error code 429. To minimize such cases you should either increase the amount of available resources (instance type, number of instances) and/or decrease the parallel load (number of concurrent jobs, shards per job), and avoid any other type of load on the same instances.
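To illustrate that admission check (this is illustrative Python, not the library's actual implementation):

    # Illustration only: not the mapreduce library's code.
    def admit(estimated_memory_mb, available_memory_mb):
        """Return the HTTP status the worker callback would produce."""
        if estimated_memory_mb > available_memory_mb:
            return 429  # Too Many Requests; the task queue retries later
        return 200      # enough headroom, so the shard request is processed

    print(admit(estimated_memory_mb=256, available_memory_mb=128))  # prints 429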
We are sending a query from a GWT client by RPC to the GAE Java server. The response is a fairly complex object tree. The RPC implementation on the server takes 900 ms from start to finish. The total HTTP request is taking 4-5 seconds. We have checked that the actual transmission time and ping time are negligible. (An RPC with a void response takes 300 ms, and the actual transmission time is small.)
I thought maybe the serialization of the response could be taking time, but when we call that explicitly on the server using RPC.encodeResponseForSuccess it takes just 50 ms.
So we have 3-4 seconds of overhead completely unaccounted for, and I'm at a loss as to how to debug it. We even tried sending the serialized RPC response back using a servlet instead of RPC, and sure enough the very same response took ~1 s instead of 5!
Thanks!
You're forgetting the client-side serialization time for the request data and the deserialization time for the response data.
Compile your app with -style PRETTY and run it through the Chrome Dev Tools profiler, the IE Dev Tools profiler, dynaTrace Ajax (http://ajax.dynatrace.com/ajax/en/), or a similar JavaScript profiling tool to see where your time goes.
Very large responses take a long time to deserialize. If you use a lot of BigDecimal values in your response, it will take even longer because of the really complex nature of the emulation code (this one is a killer).
My app needs to do many datastore operations on each request. I'd like to run them in parallel to get better response times.
For datastore updates I'm doing batch puts so they all happen asynchronously, which saves many milliseconds. App Engine allows up to 500 entities to be updated in parallel.
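(For illustration, a batch put in the Python runtime looks like this; the model is hypothetical:)

    from google.appengine.ext import db

    class Record(db.Model):  # hypothetical model
        value = db.StringProperty()

    # A single batch put writes up to 500 entities in one call.
    entities = [Record(value=str(i)) for i in range(500)]
    db.put(entities)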
But I haven't found a built-in function that allows datastore fetches of different kinds to execute in parallel.
Since App Engine does allow urlfetch calls to run asynchronously, I created a getter URL for each kind which returns the query results as JSON-formatted text. Now my app can do async urlfetch calls to these URLs which could parallelize the datastore fetches.
This technique works well with small numbers of parallel requests, but App Engine throws errors when attempting to run more than 5 or 10 of these urlfetch calls at the same time.
I'm only testing now, so each urlfetch is the identical query; since they work fine in small volumes but start failing with more than a handful of simultaneous requests, I'm thinking it must have something to do with the async urlfetch calls.
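For illustration, the pattern looks roughly like this (the getter URL and kind names are stand-ins):

    from google.appengine.api import urlfetch

    KINDS = ["KindA", "KindB", "KindC"]  # hypothetical kinds

    # Kick off one asynchronous fetch per kind.
    rpcs = []
    for kind in KINDS:
        rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(rpc, "http://myapp.appspot.com/getter/" + kind)
        rpcs.append(rpc)

    # Collect the results; get_result() blocks until each fetch finishes.
    results = [rpc.get_result().content for rpc in rpcs]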
My questions are:
Is there a limit to the number of urlfetch.create_rpc() calls that can run asynchronously?
The synchronous urlfetch.fetch() function has a 'deadline' parameter that allows the function to wait up to 10 seconds for a response before failing. Is there any way to tell urlfetch.create_rpc() how long to wait for a response?
What do the errors shown below mean?
Is there a better server-side technique to run datastore fetches of different kinds in parallel?
      File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
        return self.__get_result_hook(self)
      File "/base/python_lib/versions/1/google/appengine/api/urlfetch.py", line 331, in _get_fetch_result
        raise DownloadError(str(err))
    InterruptedError: ('The Wait() request was interrupted by an exception from another callback:', DownloadError('ApplicationError: 5 ',))
Since App Engine allows async urlfetch calls but does not allow async datastore gets, I was trying to use urlfetch RPCs to retrieve from the datastore in parallel.
The lack of async datastore gets is an acknowledged issue:
http://code.google.com/p/googleappengine/issues/detail?id=1889
And there's now a third-party tool that allows async queries:
http://code.google.com/p/asynctools/
"asynctools is a library allowing you to execute Google App Engine API calls in parallel. API calls can be mixed together and queued up and then all are kicked off in parallel."
This is exactly what I was looking for.
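From memory of the project's documentation, usage looks something like the sketch below; treat the exact class and method names as assumptions and verify them against the linked page:

    # NOTE: API names here are from memory of the asynctools docs and
    # may be inexact; check the project page before relying on them.
    from google.appengine.ext import db
    from asynctools import AsyncMultiTask, QueryTask

    class KindA(db.Model):  # hypothetical models
        pass

    class KindB(db.Model):
        pass

    # Queue up queries of different kinds, then run them in parallel.
    runner = AsyncMultiTask()
    runner.append(QueryTask(KindA.all(), limit=20, client_state="a"))
    runner.append(QueryTask(KindB.all(), limit=20, client_state="b"))
    runner.run()

    for task in runner:
        results = task.get_result()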
While I'm afraid I can't directly answer any of the questions you pose, I think I ought to tell you that all of your research along these lines may not lead you to a working solution for your problem.
The problem is that datastore writes take much longer than reads, so if you find a way to max out the number of reads that can happen, your code will very likely run out of time long before it is able to make the corresponding writes to all of the entities you have read.
I would seriously consider rethinking the design of your datastore classes to reduce the number of reads and writes that need to happen, as this will quickly become a bottleneck for your application.
Have you considered using TaskQueues to do the work of queuing the requests to be executed later?
If a task returns a 4xx status it is considered failed and will be retried later, so you could pass the error back up and have the task queue handle retrying the requests until they succeed. Also, with some experimentation with bucket sizes and rates, you can probably have the task queue slow down the requests enough that you don't max out the datastore.
There's also a nice wrapper (deferred.defer) which makes things even simpler - you can make a deferred call to (almost) any function in your app.
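A minimal sketch of the deferred approach (the function and queue name are hypothetical):

    from google.appengine.ext import deferred

    def fetch_kind(kind):
        # Do the datastore work for one kind here. Raising an exception
        # marks the task as failed, so the task queue retries it.
        pass

    # Queue one task per kind; the queue's rate and bucket_size settings
    # in queue.yaml control how fast these actually run.
    for kind in ["KindA", "KindB", "KindC"]:
        deferred.defer(fetch_kind, kind, _queue="fetch-queue")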