Asynchronous requestAction - CakePHP

Pleeeeease help me... :- )
"You are my only hope".
I need to execute an action asynchronously several thousand times. The action fetches email content from an external API and lives in a different controller, so I use requestAction to get it. Once I have all the results, these 1000 email contents are sent as 1000 emails during one request, using another API.
Unfortunately, when run sequentially this takes a lot of time, so I need to run these requests asynchronously.
My question is:
Can I execute
$this->requestAction($myUrl)
...in parallel? For example, 100 requests at a time? I've seen a few asynchronous examples in PHP, but they all used static files, and I need to preserve the CakePHP structure to be able to use requestAction.
Thanks to all who can help!
EDIT: By the way, when I tried to run the requests via fopen($url, 'r'); and then stream_get_contents(), the performance was abysmal, though maybe it could be improved. I don't know, but requestAction definitely seems to be the better option (I think).

Is there any reason why you wouldn't use a shell for this?
Please take a look at: CakePHP Shells
"Deferred Execution" is probably what you are after here, basically you want to send one command not have the user waiting around for it? If so then you can use a Message Queue to handle this pretty easily.
We use CakeResque and Redis to send 1000's of emails and perform other API calls
CakeResque - Deferred Processing for CakePHP
There are other message queues available, but this one is really simple to get working and probably won't need many changes to your code.
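
As a rough sketch (the 'default' queue name and the EmailSenderShell/send task are made-up names for illustration, not from your code), the controller would only enqueue jobs and return immediately, and a background worker would do the slow API work:

// In the controller: queue one job per email instead of sending it inline.
foreach ($recipients as $recipient) {
    CakeResque::enqueue('default', 'EmailSenderShell', array('send', $recipient['id']));
}

// Console/Command/EmailSenderShell.php - a worker started with
// `Console/cake CakeResque.CakeResque start` picks the jobs up later.
class EmailSenderShell extends AppShell {
    public function send() {
        // How the job arguments arrive depends on the CakeResque version
        // (see its docs); fetch the email content from the external API
        // and send it here, with no user waiting on the request.
    }
}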

In the end, I wasn't able to find a solution that uses requestAction, but I was able to extract the code into a standalone PHP file, which is then pulled in with include.
The most interesting part, the asynchronous requests, was done using a great library called cURL-easy, with the help of a utility function of mine. You can read about it (how to install it and how to use it) here:
Is making asynchronous HTTP requests possible with PHP?
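
For reference, the same parallel fetching can also be done with PHP's built-in curl_multi_* functions, without a wrapper library. This is only a sketch; $emailUrls and the batch size of 100 are placeholders:

// Fetch a batch of URLs in parallel with curl_multi.
function fetchParallel(array $urls) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Run all handles until every transfer has finished.
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);

    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}

// e.g. process the 1000 URLs in chunks of 100 parallel requests
foreach (array_chunk($emailUrls, 100, true) as $batch) {
    $contents = fetchParallel($batch);
    // ...hand $contents to the email-sending API
}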


Will Gatling actually perform the operation or will it only check the URLs' response time?

I have a Gatling test for an application that answers a survey. Upon answering the survey, the application identifies possible answers that may pose a risk and creates what we call riskareas. These riskareas are normally created in the background as soon as the survey answering is finished. My Gatling test has ten users who answer the survey and log out; I used the recorder to record the test. After these ten users are finished, I do not see any riskareas being created in the application. Am I missing something? Should the survey really be answered by the Gatling user (like it is with Selenium), or is it just the URLs that the Gatling test will touch?
I am new to Gatling, please help.
Gatling should be indistinguishable from a user in a web browser (or Selenium) as far as the server is concerned, so the end result should be exactly the same as if you'd gone through the process yourself. However, writing a Gatling script is a little more work than writing a Selenium script.
For performance reasons, Gatling operates at a lower level than Selenium. Gatling works with the actual data that is sent to and received from the server (i.e., the actual GETs and POSTs), rather than with user-level interactions (such as clicking links and filling in forms).
The recorder will generally produce a relatively "dumb" script. It records the exact data that was sent to the server, and makes no attempt to account for things that may change from run to run. For example, the web application you are testing might have hidden form fields that contain session information, or the link addresses might contain a unique identifier or a session id.
This means that your script may not be doing what you think it's doing.
To debug the script, the first thing to do is to add checks on each of the requests, to validate that you are getting the response you expect (for example, check that when you submit page 1 of the survey, you are taken to page 2 - check for something that you'd only expect to find on page 2, like a specific question).
Once you know which requests are failing, look at what data was sent with the request, and try to figure out where it came from. You will probably find that there are session ids, view state, or similar, that must be extracted from the previous page.
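
To make that concrete, here is a rough sketch in the Gatling DSL; the URLs, form fields, and page text are invented, so adapt them to your survey (and note that older Gatling versions use baseURL instead of baseUrl):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SurveySimulation extends Simulation {

  val httpProtocol = http.baseUrl("http://example.com")

  val scn = scenario("Answer survey")
    // Load page 1 and capture a hidden session/view-state field for reuse
    .exec(
      http("Survey page 1")
        .get("/survey/page/1")
        .check(css("input[name='viewState']", "value").saveAs("viewState"))
    )
    // Submit page 1 and verify we really landed on page 2
    .exec(
      http("Submit page 1")
        .post("/survey/page/1")
        .formParam("viewState", "${viewState}")
        .formParam("question1", "yes")
        .check(substring("Question 2")) // text that should only appear on page 2
    )

  setUp(scn.inject(atOnceUsers(10)).protocols(httpProtocol))
}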
It will help to enable request and response logging, as per the documentation.
To simplify testing of web apps, we wrote some helper functions to allow tests to be written in a more Selenium-like way. Once you understand what your application is doing, you may find that it simplifies scripting for you too. However, understanding why your current script doesn't work the way you expect should be your first step.

How to create separate results with a single Nagios check?

I've written a check for our Nagios setup to find offline access points (APs). This one check tests all 400 APs in one run, which is very efficient, but there is one drawback: if some of the APs are offline for a while and I know about it, there is no way to get rid of the critical error in Nagios. If I acknowledge the service until the time I expect the APs to work again, I will not see other APs fail.
Now I wonder if there is a way to check all APs in one run but create separate check results in Nagios, so I could ACK only the ones I know are out of order for a while. I don't think a separate check for each AP is a solution here.
A check for each AP is the easiest and cleanest way to do this. Then you can have a metacheck that alerts on the conditions you give to it.
Another approach is to use a temp file to store the current status and a config file where you pseudo-ack the APs. The meta-check would then only need to compare those two files.
In our Icinga (a Nagios fork) installation we have something along these lines. I wrote a PHP frontend for it; the action-note link brings me to a page where I can graphically edit the parameters for the checks.
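
A minimal sketch of that temp-file/ack-file comparison (the paths and the one-AP-name-per-line format are assumptions): the big check writes the offline APs to a status file, and this meta-check alerts only on APs that are offline but not listed in the ack file:

#!/usr/bin/env python
# Hypothetical meta-check: compare the list of offline APs (written by the
# big all-APs check) with a list of APs we have "pseudo-acked" by hand.
import sys

OFFLINE_FILE = "/var/tmp/ap_offline.txt"   # one AP name per line, assumed format
ACK_FILE = "/etc/nagios/ap_acked.txt"      # APs we already know are down

def read_list(path):
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except IOError:
        return set()

offline = read_list(OFFLINE_FILE)
acked = read_list(ACK_FILE)
unexpected = sorted(offline - acked)

if unexpected:
    print("CRITICAL - %d unacked APs offline: %s" % (len(unexpected), ", ".join(unexpected)))
    sys.exit(2)

print("OK - %d APs offline, all acknowledged" % len(offline))
sys.exit(0)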

Backbone.sync and RPC

I like Backbone very much, but I don't use REST; I use RPC over socket.io, so I need to customize the Backbone.sync logic somehow, not to send RESTful requests but to call my client-side RPC library methods.
I found such example of Backbone.sync customization:
http://jsfiddle.net/nikoshr/4ArmM/
But not everything is clear to me. At the end, Backbone.sync.call() is executed - what is that?
How does it really work? Does it just perform some GET request here, so I can simply omit it (I do not have to make any requests, as I am using a socket), or does it do something important?
My idea is to take this example and just insert some RPC calls into it. Is that the right way?
Rather than start with an arbitrary fiddle, why not take a look at the Backbone source code. It's very easy to read and very well documented. Scroll down to the Backbone.Sync section and you'll find that it isn't very hard to override.
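
To give a feel for what an override can look like, here is a sketch only; the socket connection, the rpcName property, and the event-name convention are assumptions, not part of Backbone. The key point is that Backbone hands you (method, model, options) and you call options.success / options.error yourself, so no HTTP request is made at all:

// Assumed: a socket.io connection and models that define an `rpcName`.
var socket = io.connect('/rpc');

Backbone.sync = function (method, model, options) {
  options = options || {};
  var payload = (method === 'read') ? null : model.toJSON();

  // Backbone passes method as 'create', 'read', 'update' or 'delete'.
  socket.emit(model.rpcName + ':' + method, payload, function (err, response) {
    if (err) {
      if (options.error) options.error(err);
    } else {
      // options.success feeds the server's reply back into the model,
      // just as the default AJAX transport would.
      if (options.success) options.success(response);
    }
  });
};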

How can I profile Python functions line-by-line in Google App Engine?

I'd like to use line_profiler to profile a single large method, line by line, in my Google App Engine application.
Unfortunately GAE doesn't seem to let you import .so libraries, even on a local dev server.
How could I go about achieving my goal?
I'd be happy to use a Python-only solution, if there's one out there, or to take suggestions on how to write my own.
Use gae_mini_profiler.
It can either keep track of all function calls and their timings (instrumented) or can periodically examine the call stack to figure out in which functions time is being spent during a request (sampling). You can see an example of it in action here - http://mini-profiler.appspot.com/
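
Setup is roughly as follows (from memory of the gae_mini_profiler README, so double-check the exact names there): copy the gae_mini_profiler package into your project, route /gae_mini_profiler/.* to its handler in app.yaml, and wrap your WSGI app in appengine_config.py:

# appengine_config.py - wrap every WSGI application with the profiler middleware
def webapp_add_wsgi_middleware(app):
    from gae_mini_profiler import profiler
    return profiler.ProfilerWSGIMiddleware(app)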

How to ensure that a bot/scraper does not get blocked

I coded a simple scraper whose job is to visit several different pages of a site, do some parsing, call some URLs that are otherwise called via AJAX, and store the data in a database.
The trouble is that sometimes my IP gets blocked after my scraper runs. What steps can I take so that my IP does not get blocked? Are there any recommended practices? I have added a 5-second gap between requests, to almost no effect. The site is medium-big (I need to scrape several URLs) and my internet connection is slow, so the script runs for over an hour. Would being on a faster connection (like on a hosting service) help?
Basically I want to code a well behaved bot.
Lastly, I am not POSTing or spamming.
Edit: I think I'll break my script into 4-5 parts and run them at different times of the day.
You could use rotating proxies, but that wouldn't be a very well behaved bot. Have you looked at the site's robots.txt?
Write your bot so that it is more polite, i.e. don't sequentially fetch everything, but add delays in strategic places.
Following the guidelines set in robots.txt is a good first step. There are tools such as import.io and morph.io. There are also packages/plugins for servers, for example x-ray, a node.js package which has options to assist in quickly writing responsible scrapers, e.g. throttling, delays, max connections, etc.
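
A minimal sketch of the "well behaved bot" basics in Python (the base URL, User-Agent string, and delay are placeholders): honor robots.txt, identify yourself, and sleep between requests:

import time
import urllib.request
import urllib.robotparser

BASE = "http://example.com"
USER_AGENT = "MyScraperBot/1.0 (contact: me@example.com)"
DELAY = 5  # seconds between requests

# Read the site's robots.txt once and respect its Disallow rules.
robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
robots.read()

def polite_get(url):
    if not robots.can_fetch(USER_AGENT, url):
        return None  # the site asked bots to stay away from this URL
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    time.sleep(DELAY)  # be gentle: pause between requests
    return body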
