cakephp offline tasks - cakephp-2.0

I have a site that runs quite an intensive job from the admin panel.
I would like to run it somehow without interfering with the output it produces.
To make it clearer: on submitting a form I must unzip a big zip file and, for every image I find, insert it into my database. This process takes more than 3-4 minutes, so I would like to start the job in the background and just output that it is being processed.
I don't know if I can notify the user afterwards (maybe through an AJAX call), but I don't mind that.
Is it possible?
thanks

It depends on what you want the user to do while you're processing the data.
Do you want them to wait? Then use an AJAX call and display a progress bar (or whatever you like) until the task is complete. When the task is complete, inform the user (via AJAX again) and proceed with whatever functionality you've got in store.
If you want the file to be processed while the user is doing something else, you can call a (CakePHP) shell task in the background and let it do its magic. If you want the user to be able to see the status of an import, you can create an imports table that stores information about all imports. By using a "status" field, the user can see, for example, that the task is still running.
When the shell task completes, you can change the "status" of the import and do whatever else is needed. This has to happen in the task itself, but that's not a problem because you can use models in shell tasks the same way you can use them in shells.
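As a rough sketch of what such a shell could look like in CakePHP 2.x (the shell name, the Import and Image models, the command-line arguments and the paths below are assumptions for illustration, not part of your app):

// app/Console/Command/ImportShell.php
App::uses('AppShell', 'Console/Command');

class ImportShell extends AppShell {

    public $uses = array('Import', 'Image');

    public function main() {
        $importId = $this->args[0];
        $zipPath  = $this->args[1];

        // unzip into a temporary folder
        $target = TMP . 'import' . DS . $importId . DS;
        $zip = new ZipArchive();
        if ($zip->open($zipPath) === true) {
            $zip->extractTo($target);
            $zip->close();
        }

        // insert a row for every image found in the archive
        foreach (glob($target . '*.{jpg,jpeg,png,gif}', GLOB_BRACE) as $file) {
            $this->Image->create();
            $this->Image->save(array('Image' => array(
                'import_id' => $importId,
                'path'      => $file,
            )));
        }

        // flip the status so the admin panel (or an AJAX poll) can report completion
        $this->Import->id = $importId;
        $this->Import->saveField('status', 'done');
    }
}

From the controller action that receives the form you could then start it without blocking the response, for example with exec('nohup ' . APP . 'Console' . DS . 'cake import ' . escapeshellarg($importId) . ' ' . escapeshellarg($zipPath) . ' > /dev/null 2>&1 &'); (assuming a Unix-like host).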
Hope this helps, comment if you need any further clarification.

Related

What's the right way to do long-running processes on app engine?

I'm working with Go on App Engine and I'm trying to build an API that needs to perform a long-running background task - in this case it needs to parse and chunk a big file out to task queues. I'd like it to return a 200 and close the user connection immediately and let the process keep running in the background until it's complete (this could take 5-10 minutes). Task queues alone don't really work for my use case because parsing the initial file can take more than the time limit for an API request.
At first I tried a goroutine as a solution to this problem. This failed because my App Engine context expired as soon as the parent function closed the user connection. (I suppose I could try writing a goroutine that doesn't require a context, but then I'd lose logging and I'd need to fetch the entire remote file and pass it to the goroutine.)
Looking through the docs, it looks like App Engine used to have functionality to support exactly what I want to do: runtime.RunInBackground, but that functionality is now deprecated and the replacement isn't obvious.
Is there a "right" or recommended way to do background processing now?
I suppose I could put a link to my big file into a task queue, but if I understand correctly, even functions called through task queues have to complete execution within a specified amount of time (is it 90 seconds?), and I need to be able to run longer than that.
Thanks for any help.
Try using:
appengine.BackgroundContext()
It returns a long-lived context, but it only works on the App Engine flexible environment (GAE Flex).
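A minimal sketch of how that could be used (the handler name and the "file" form parameter are made up for illustration; this assumes the flexible environment):

package app

import (
	"net/http"

	"google.golang.org/appengine"
	"google.golang.org/appengine/log"
)

func handleUpload(w http.ResponseWriter, r *http.Request) {
	fileURL := r.FormValue("file") // hypothetical parameter pointing at the big file

	// A context that is not tied to this request, so it keeps working
	// after the response is sent (flexible environment only).
	ctx := appengine.BackgroundContext()

	go func() {
		log.Infof(ctx, "started background processing of %s", fileURL)
		// parse the file and fan the chunks out to task queues here
	}()

	// answer immediately; the goroutine above keeps running
	w.WriteHeader(http.StatusOK)
}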

will gatling actually perform the operation or will it check only the urls' response time?

I have a Gatling test for an application that answers a survey; upon answering the survey, the application identifies possible answers that may pose a risk and creates what we call riskareas. These riskareas are normally created in the background as soon as the survey answering is finished. My question: I have a Gatling test with ten users who answer the survey and log out (I used the Recorder to record the test); after these ten users are finished, I do not see any riskareas being created in the application. Am I missing something? Should the survey really be answered by the Gatling user (like it is with Selenium), or is it just the URLs that the Gatling test will touch?
I am new to Gatling; please help.
Gatling should be indistinguishable from a user in a web browser (or Selenium) as far as the server is concerned, so the end result should be exactly the same as if you'd gone through the process yourself. However, writing a Gatling script is a little more work than writing a Selenium script.
For performance reasons, Gatling operates at a lower level than Selenium. Gatling works with the actual data that is sent to and received from the server (i.e., the actual GETs and POSTs), rather than with user-level interactions (such as clicking links and filling in forms).
The recorder will generally produce a relatively "dumb" script. It records the exact data that was sent to the server and makes no attempt to account for things that may change from run to run. For example, the web application you are testing might have hidden form fields that contain session information, or the link addresses might contain a unique identifier or a session id.
This means that your script may not be doing what you think it's doing.
To debug the script, the first thing to do is to add checks to each request to validate that you are getting the response you expect. For example, check that when you submit page 1 of the survey you are taken to page 2, by checking for something you'd only expect to find on page 2, such as a specific question.
Once you know which requests are failing, look at what data was sent with the request, and try to figure out where it came from. You will probably find that there are session ids, view state, or similar, that must be extracted from the previous page.
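For example, a sketch of a check plus an extraction in Gatling's Scala DSL (the URLs, the csrf_token field and the page text are assumptions about your survey app):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

val answerSurvey = scenario("Answer survey")
  .exec(
    http("open page 1")
      .get("/survey/page1")
      // fail the request if we did not really land on page 1
      .check(substring("Question 1"))
      // capture the hidden session/CSRF field instead of replaying the recorded value
      .check(css("input[name='csrf_token']", "value").saveAs("csrfToken"))
  )
  .exec(
    http("submit page 1")
      .post("/survey/page1")
      .formParam("csrf_token", "${csrfToken}")
      .formParam("q1", "yes")
      // this text should only exist on page 2, so it proves the submit was accepted
      .check(substring("Question 2"))
  )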
It will help to enable request and response logging, as per the documentation.
To simplify testing of web apps, we wrote some helper functions to allow tests to be written in a more Selenium-like way. Once you understand what your application is doing, you may find that it simplifies scripting for you too. However, understanding why your current script doesn't work the way you expect should be your first step.

how to index data in solr from database automatically

I have a MySQL database for my application. I implemented Solr search and used the DataImportHandler (DIH) to index data from the database into Solr. My question is: is there any way for my Solr index to be updated automatically whenever new data is added to the database, so that I don't need to run the indexing process manually every time the database tables change? If yes, please tell me how I can achieve this.
I don't think Solr has anything built in that indexes the data whenever an update happens in the DB.
But there are workarounds: for example, MySQL triggers make it possible to run an external application.
Write a PHP script that reads from the DB and indexes the data in Solr, and a cron job that triggers it. Alternatively, write a database trigger for CRUD operations that calls the same script, so that whenever something happens in the DB the trigger runs the script and indexing happens.
Please see:
Invoking a PHP script from a MySQL trigger
Automatic Scheduling:
Please see the post How can I Schedule data imports in Solr for more information on scheduling. The second answer explains how to import using cron.
Since you used a DataImportHandler to initially load your data into Solr, you could create a delta import handler that is executed using curl from a cron job to periodically add database changes to the index. Also, if you need more real-time updates, as @Rakesh suggested, you could use a trigger in your database and have it kick off the curl call to the delta DIH.
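A sketch of the cron + curl part (the core name, port and ten-minute interval are assumptions; the delta-import itself also needs deltaQuery/deltaImportQuery entries in your DIH data-config.xml that select the rows changed since the last run):

# crontab entry: ask Solr's DataImportHandler for a delta-import every 10 minutes
*/10 * * * * curl -s 'http://localhost:8983/solr/collection1/dataimport?command=delta-import&commit=true&clean=false' > /dev/null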
You can also run the import through a browser using the Task Scheduler.
Do the following steps on a Windows server:
Go to Administrative Tools => Task Scheduler.
Click "Create Task".
A Create Task screen opens with the tabs General, Triggers, Actions, Conditions, and Settings.
On the General tab, enter the task name "Solrdataimport" and the description "Import mysql data".
On the Triggers tab, click New; under Settings check Daily, and under Advanced settings check "Repeat task every" and set whatever interval you want, then click OK.
On the Actions tab, click New; for Program/Script enter "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" (the installation path of the Chrome browser), and in Add arguments enter http://localhost:8983/solr/#/collection1/dataimport//dataimport?command=full-import&clean=true, then click OK.
With all of the above in place, the data import will run automatically. To stop the import process, follow the same steps but put "taskkill" in Program/Script instead of the Chrome path, and enter "/f /im chrome.exe" in the arguments under the Actions tab.
Set the trigger timing according to your requirements.
What you're looking for is a "delta-import", and a lot of the other answers here already cover it. I created a Windows WPF application and service to issue commands to Solr on a recurring schedule, since using cron jobs and Task Scheduler is a bit difficult to maintain if you have a lot of cores/environments.
https://github.com/systemidx/SolrScheduler
You basically just drop in a JSON file in a specified folder and it will use a REST client to issue the commands to Solr.

Saving data instantly, onClose, save button? Which is better in this situation

In my GWT application, I have been saving everything that the user does instantly to the datastore in the background whenever they make changes. So far this has been fine because the things that the user can change aren't being changed a whole lot.
But now I have added a series of check boxes that the user can check & uncheck:
Would it be proper to save everything instantly to the database every time the user checks/unchecks a box? What's on my mind is reducing the number of times my web application has to go to the server to save data. Facebook, Google (and many, many others) use a "Save" button whenever a user makes changes to a large number of fields (say, to their user information).
I am trying to stay away from having a Save button, so the thought came to mind of saving these values whenever the user closes or refreshes the page. I don't know if that's proper either (what if there is a loss of power and their system shuts down!), but I know that I could use it like this:
public void onClose(CloseEvent<Window> event) {
    // save changes to the datastore
}
I'm torn between the three methods and don't know which path to take! Any information will be helpful.
Thank you!
Your current system follows a really excellent design pattern which a lot of apps (web- and otherwise) are picking up lately: Eliminate the manual operation of 'saving'. I think you should stick with it.
That said, if you want to reduce the number of round-trips and server load, you can do a couple of things: You could restrict the number of saves-in-progress to one, so that if the user makes a change while you're waiting for a 'save' request to complete, you wait until that request completes before sending a new one. Or, you could start a timer when the user makes a change, and commit any changes when the timer expires - this is pretty much how GMail's auto-save of draft messages works.
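A sketch of the timer variant using GWT's com.google.gwt.user.client.Timer (the two-second delay and the doSave() method are assumptions):

import com.google.gwt.user.client.Timer;

// a restartable countdown: commit once the user has stopped clicking for a moment
private final Timer saveTimer = new Timer() {
    @Override
    public void run() {
        doSave(); // hypothetical method that sends the pending changes to the datastore
    }
};

// call this from every CheckBox ValueChangeHandler
private void scheduleSave() {
    saveTimer.cancel();       // a new change restarts the countdown
    saveTimer.schedule(2000); // commit 2 seconds after the last change
}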
Either way, don't rely on a close event to trigger sending state to the server: If the user's browser crashes, the close event won't fire and they'll lose all their changes.

What should you do if a required asset fails to load?

My program is in Flex but it doesn't really matter for the question I am asking. OK say I need to load an XML file for the application to work at all. If I capture an IOError while the xml file is loading, what logically should I do with that? The application needs it or the app is useless, so should I just keep re-trying on error, or should I notify the user to try again later? What would you do?
Thanks.
Ask the user what to do - Retry or Fail, with Fail meaning the app will close. If it makes sense, give the user a chance to browse to the resource.
It really depends on the nature of the file. If you know the file will exist at some point, it may make sense to wait for the file's creation (although this seems like a poor man's networking model). However, in situations where an application is useless without a resource, I would fail unrecoverably and give meaningful error messages to the user, as well as log some debugging information to a file that the user could later submit for developer debugging.
Like GMail - do both. Notify the user when an error happens or a timeout is hit, and keep trying in the meantime.
Loading is taking longer than expected, retrying - please wait...
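A minimal ActionScript sketch of that approach, assuming the file lives at "config.xml" and that the UI has a statusLabel to show the message:

import flash.net.URLLoader;
import flash.net.URLRequest;
import flash.events.Event;
import flash.events.IOErrorEvent;
import flash.utils.setTimeout;

private var configLoader:URLLoader = new URLLoader();

private function loadConfig():void {
    configLoader.addEventListener(Event.COMPLETE, onConfigLoaded);
    configLoader.addEventListener(IOErrorEvent.IO_ERROR, onConfigError);
    configLoader.load(new URLRequest("config.xml"));
}

private function onConfigError(e:IOErrorEvent):void {
    // tell the user, but keep trying in the background
    statusLabel.text = "Loading is taking longer than expected, retrying - please wait...";
    setTimeout(function():void {
        configLoader.load(new URLRequest("config.xml"));
    }, 5000); // retry after 5 seconds
}

private function onConfigLoaded(e:Event):void {
    // parse configLoader.data (the XML) and start the application
}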
Notify the user that the XML is not available, and offer the user the possibility to retry loading the XML, locate another XML or quit the application.
I don't think you should retry so many times that the page has noticeable lag. It really depends on whether the file being accessed is controlled by a third party or not, and whether it usually fails for large chunks of time or just, say, a second.
