What should you do if a required asset fails to load? - theory

My program is in Flex, but that doesn't really matter for the question I am asking. Say I need to load an XML file for the application to work at all. If I catch an IOError while the XML file is loading, what logically should I do with that? The application needs the file; without it the app is useless. So should I just keep retrying on error, or should I notify the user to try again later? What would you do?
Thanks.

Ask the user what to do - Retry or Fail, with Fail meaning the app will close. If it makes sense, give the user a chance to browse to the resource.

It really depends on the nature of the file. If you know the file will exist at some point, it may make sense to wait for the file's creation (although this seems like a poor man's network model). However, in situations where an application is useless without a resource, I would fail unrecoverably and give meaningful error messages to the user, as well as log debugging information to a file that the user could later submit to the developers.

Like GMail - do both. Notify the user when an error happens or a timeout is hit, and keep trying in the meantime.
Loading is taking longer than expected, retrying - please wait...
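The question is about Flex, but the pattern is language-agnostic. Here is a minimal sketch of "keep retrying, keep the user informed, give up eventually", written in Scala for concreteness; the attempt cap and linear backoff are arbitrary choices, not anything the question prescribes:
import scala.util.{Try, Success, Failure}

def loadWithRetry[A](maxAttempts: Int)(load: () => A)(notify: String => Unit): Option[A] = {
  var attempt = 0
  var result: Option[A] = None
  while (result.isEmpty && attempt < maxAttempts) {
    attempt += 1
    Try(load()) match {
      case Success(value) => result = Some(value)
      case Failure(_) =>
        notify(s"Loading is taking longer than expected (attempt $attempt of $maxAttempts), retrying...")
        Thread.sleep(1000L * attempt) // simple linear backoff between attempts
    }
  }
  if (result.isEmpty) notify("A required resource could not be loaded; please try again later.")
  result
}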

Notify the user that the XML is not available, and offer the user the possibility to retry loading the XML, locate another XML or quit the application.

I don't think you should retry so many times that the page has noticeable lag. It really depends on whether the file being accessed is controlled by a third party, and whether it usually fails for long stretches of time or just for a second or so.

App Engine silently fails on some requests

Some requests silently fail in my python app, intermittently and unpredictably. The hallmarks of the failure are:
Request returns a 200, so the client doesn't know there's a problem.
Request does NOT successfully execute on the server.
No logging statements are recorded for the request.
Below is an example from my logs of a bunch of requests, each of which is supposed to write an entity to the datastore. You can see that for the lower, successful request, a blue 'i' is present, indicating that info-level logs were recorded. When I examine the datastore, an entity was successfully written for this request.
However, for the failed request, you can see there is just a white box, with no logging statements present at all. While the server returned a 200, no entity was written to the datastore for this request.
Has anyone encountered something like this before on App Engine? Any ideas on how to debug it? I've seen it in multiple different apps myself, but I've never been able to figure it out.
EDIT
To clarify, the main problem here is that code doesn't execute, as measured by the failure to write an entity. The spurious 200 and lack of logging is an associated symptom.
From a comment originally, but seems to be the resolution path for this issue:
Given that there are no log statements at all for the request, and you appear to unpack the arguments and log them as soon as you enter the handler, this starts to look like an infrastructure/platform issue.
In such a case, it's best to open a public issue tracker issue, with "Type-Production" as a tag, including your app's app id and a timeframe, and as much information about your app and request handler involved as possible, and platform support will pick up the issue in the course of triage.
That said, it's worth examining the handler to make absolutely sure there's no way you could be exiting from the handler and sending a 200 without logging anything or seeing an exception. It all depends on what the code handling the request is capable of, what stack of libraries it's built upon, etc.
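For what it's worth, the shape being recommended looks roughly like this. The original app is Python on App Engine; this is only a generic sketch with Request and Response as stand-ins, showing "log unconditionally on entry, and make sure no failure path can fall through to a 200":
import java.util.logging.Logger

// Stand-in types, not a real framework API.
final case class Request(args: Map[String, String])
final case class Response(status: Int, body: String)

object Handler {
  private val log = Logger.getLogger("handler")

  // Placeholder for the real work (e.g. the datastore write).
  def doWork(request: Request): Response = Response(200, "ok")

  def handle(request: Request): Response = {
    log.info(s"entered handler, args=${request.args}") // first statement, before anything can fail
    try doWork(request)
    catch {
      case e: Exception =>
        log.severe(s"handler failed: ${e.getMessage}")
        Response(500, "internal error") // a failure must never surface as a 200
    }
  }
}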

Will Gatling actually perform the operation or will it only check the URLs' response time?

I have a Gatling test for an application that answers a survey; upon answering the survey, the application identifies possible answers that may pose a risk and creates what we call riskareas. These riskareas are normally created in the background as soon as the survey answering is finished. My question: I have a Gatling test with ten users who go and answer the survey and log out, and I used the recorder to record the test. Now, after these ten users are finished, I do not see any riskareas being created in the application. Am I missing something? Should the survey really be answered by the Gatling user (like it is in Selenium), or is it just the URLs that the Gatling test will touch?
I am new to Gatling, please help.
Gatling should be indistinguishable from a user in a web browser (or Selenium) as far as the server is concerned, so the end result should be exactly the same as if you'd gone through the process yourself. However, writing a Gatling script is a little more work than writing a Selenium script.
For performance reasons, Gatling operates at a lower level than Selenium. Gatling works with the actual data that is sent to and received from the server (i.e., the actual GETs and POSTs), rather than with user-level interactions (such as clicking links and filling forms).
The recorder will generally produce a relatively "dumb" script. It records the exact data that was sent to the server, and makes no attempt to account for things that may change from run to run. For example, the web application you are testing might have hidden form fields that contain session information, or the link addresses might contain a unique identifier or a session id.
This means that your script may not be doing what you think it's doing.
To debug the script, the first thing to do is to add checks on each of the requests, to validate that you are getting the response you expect (for example, check that when you submit page 1 of the survey, you are taken to page 2 - check for something that you'd only expect to find on page 2, like a specific question).
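For illustration, here is roughly what such a check looks like in the Gatling Scala DSL. The path, form field, and marker text are made up; adjust them to the real survey:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Fail the request unless the response contains something
// that should only appear on page 2 of the survey.
val submitPage1 = http("Submit survey page 1")
  .post("/survey/page1")
  .formParam("answer1", "yes")
  .check(substring("Question 2"))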
Once you know which requests are failing, look at what data was sent with the request, and try to figure out where it came from. You will probably find that there are session ids, view state, or similar, that must be extracted from the previous page.
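A sketch of that extraction step, again with made-up names; the VIEWSTATE field stands in for whatever token your application actually embeds:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Grab the hidden token from page 1, then replay it when submitting.
val loadForm = http("Load survey page 1")
  .get("/survey/page1")
  .check(css("input[name='VIEWSTATE']", "value").saveAs("viewState"))

val submitForm = http("Submit survey page 1")
  .post("/survey/page1")
  .formParam("VIEWSTATE", "${viewState}")
  .formParam("answer1", "yes")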
It will help to enable request and response logging, as per the documentation.
To simplify testing of web apps, we wrote some helper functions to allow tests to be written in a more Selenium-like way. Once you understand what your application is doing, you may find that it simplifies scripting for you too. However, understanding why your current script doesn't work the way you expect should be your first step.

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus and train) have good websites providing their respective information, and both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules. The client wants to give users the ability to search for a beginning and end location and determine, using the external websites' information, how best to get there, providing a route with schedule times for the chosen modes of transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external site's server (via an API or some other means) and retain it in a database, which can be queried as needed. Our first thought was to contact the respective authorities to determine how/if this can be done, but that has proven problematic, mainly due to the language barrier.
My client suggested what is basically "screen scraping", but that sounds like it would be complicated at best: downloading the web page(s) and filtering through the HTML for the relevant/necessary data to put into the database. My worry is that the info on these sites is so static that the data isn't even kept in a database to build the pages, and the web pages themselves are updated (hard-coded) when something changes.
I could really use some help and suggestions here. Thanks!
Screen scraping is always problematic IMO, as you are at the mercy of the person who wrote the page. If the content is static, then I think it would be easier to copy the data manually into your database. If you wanted to keep up to date with changes, you could snapshot the page when you transcribe the info and run a job to periodically check whether the page has changed from the snapshot. When it does, it sends an email for you to update it.
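A minimal sketch of that snapshot-and-compare job, in Scala; the URL and state file are placeholders, and the email notification is left as a hook. Hash the page on each run (from cron, say) and alert when the hash changes:
import java.security.MessageDigest
import java.nio.file.{Files, Paths}
import scala.io.Source

object PageChangeWatcher {
  // Placeholders: point these at the real schedule page and a writable state file.
  val url = "http://www.example.com/train-schedule"
  val hashFile = Paths.get("schedule-page.md5")

  def md5(s: String): String =
    MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def main(args: Array[String]): Unit = {
    val page = Source.fromURL(url, "UTF-8").mkString
    val current = md5(page)
    val previous =
      if (Files.exists(hashFile)) new String(Files.readAllBytes(hashFile), "UTF-8") else ""
    if (current != previous) {
      Files.write(hashFile, current.getBytes("UTF-8"))
      // Hook your notification in here (e.g. JavaMail); a println stands in for it.
      println("Schedule page changed since last run; review it and update the database.")
    }
  }
}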
The above method could also be used in conjunction with some sort of screen scraper, which could fall back to a manual process if the page changes too drastically.
Ultimately, it is a case of how much effort (cost) your client is willing to bear for accuracy.
I have done this for the following site: http://www.buscatchers.com/ so it's definitely more than doable! A key feature of a web scraping solution for travel sites is that it must send you emails if anything goes wrong during the scraping process. On that site, I use a two-day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for some examples. There is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
I can tell that the data is dynamic; it's too well structured. It's not hard for someone who is familiar with XPath to scrape this site.

Inaccessible database

I'm working on a rather small application (C#, WinForms) that is a kind of front-end to an MS Access database file stored on a shared drive. Since it is possible that the drive could be down, I check the connection while loading the main form.
I would like to know your opinions on how to deal with this problem.
I came up with ideas like:
- The application shows only a MessageBox with an error message and closes itself (before actually showing up), as it won't be useful at all.
- The application loads itself and then displays an error message, so users aren't confused (in case they dismiss the warning before reading the explanation).
What are your best practices?
I think it's rather irrelevant whether the application is shown or not, because in the end you display the message box with the error anyway; the user clicks OK and you close your application.
However, to me it's a bit nicer without the application showing in the background, mainly because it makes no sense to fire it up when the database isn't available. Save yourself (and the PC) the time it takes to show it ;)
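The same startup flow, sketched generically: probe the database before building any UI. The original app is C#/WinForms over Access; this Scala/JDBC version uses the UCanAccess driver URL purely as an assumed example:
import java.sql.DriverManager

object StartupCheck {
  // Placeholder JDBC URL; UCanAccess is one driver that can open Access files,
  // but treat this whole object as an illustration, not the original C# code.
  val url = "jdbc:ucanaccess:///shared/drive/app.accdb"

  def databaseReachable: Boolean =
    try {
      val conn = DriverManager.getConnection(url)
      try conn.isValid(5) finally conn.close()
    } catch { case _: Exception => false }

  def main(args: Array[String]): Unit =
    if (!databaseReachable) {
      System.err.println("The shared database is currently unavailable; please try again later.")
      sys.exit(1) // bail out before any window is created
    }
    // else: build and show the main form here
}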

How to control a web application through email? Or how to run php script by sending an email?

I want to run a web application in PHP and MySQL, using the CakePHP framework. To keep the threshold for using the site very low, I don't want to use the standard login with username/password. (And I don't want to hassle my users with something like OpenID either - that's down to the type of user I'm targeting.)
So I'm thinking that users should be able to log in by sending an email to login@domain.com, with no subject or content required. In reply they will get an email with a link that logs them in (it will contain a hash). I will also let users perform some actions without visiting the site at all: just send an email to command@domain.com and the command will be carried out. I will assume that the users and their email providers take care of their email account security, and as such there is no need for it on my site.
Now, how do I go from an email being sent to an account that is not read by humans, to some script being fired off (basically a "dummy browser client" calls a URL and CakePHP takes care of the rest)?
I have never used a cron job before, but I do think I understand their purpose and how they generally work. I cannot have the script be called by random people visiting the site, as that solution won't work for several reasons. I would like to hear more about the possibility of having the script run in response to an email coming in, if anyone has any input on that. If it's run as a cron job it would only check every X minutes, and users would see a lag in the response (if I understand it correctly).
Since there will be different email addresses for different commands, like login@domain.com, and I know what to do and how to do it based on the sender's email address, I don't even need the content, subject or any other headers from the email.
There has been a lot of worry about the security of this application. I understand the issues, but without giving away my concept, I don't think it is a big issue for what I am doing. As for usability, there really isn't an issue: it's just going to be logging in to make changes to a user's profile if/when they need that, plus one other command. The login address is the main one; it is very easy to remember, and it is the starting point of this whole concept.
I have used the pop3 php class with great success (there is also a Pear POP3 module).
Using the pop3 class looks something like this:
require('pop3.php');

$pop3 = new pop3_class();
$pop3->hostname = MAILHOST;
$pop3->Open();
$pop3->Login('myemailaddress@mydomain.com', 'mypassword');

// Walk the mailbox; headers and body are filled in by reference.
foreach ($pop3->ListMessages("", "") as $msgidx => $msgsize) {
    $headers = "";
    $body = "";
    $pop3->RetrieveMessage($msgidx, $headers, $body, -1);
}
I use it to monitor a POP3 mailbox which feeds into a database.
It gets called by a cron job which uses wget to fetch the URL of my PHP script.
*/5 * * * * wget -q -O /dev/null --http-user=me --http-passwd=pass 'http://mydomain.com/mail.php' >> /dev/null 2>&1
Edit
I've been thinking about your need to have users send certain site commands by email.
Wouldn't it be easier to have a single address that multiple commands can be sent to rather than having multiple addresses?
I think the security concerns are pretty valid too. Unless the commands are non-destructive or aren't doing anything user-specific, the system will be wide open to anyone who knows how to spoof an email address (which would be everyone :) ).
You'll need some sort of cron job/timer service that checks the mailbox regularly and then acts on it. Alternatively, check whether the mail server can run a script when a mail arrives (i.e. see if it's possible to put in a spam-filter script and "abuse" that functionality to call your script instead).
With pure PHP, you're mostly out of luck, as something needs to trigger the script. On a page with a lot of traffic, you could have your index.php or whatever do the check, but when no one visits your site for quite some time, the mail will not be processed, and you have to be careful of race conditions when multiple people access the script at the same time.
Edit: Just keep one usability flaw in mind: people with multiple PCs and without an e-mail client on every one. For example, I use 4 PCs, but only 1 (my main one) has a mail client installed, and I use webmail to check the others. Now, logging in and sending a mail through webmail is not the greatest usability: in order to use YOUR site, I first have to log in to ANOTHER site, compose a mail through the crappy interface most webmail tools have, and wait for the answer. Might as well use OpenID there :-)
If your server allows it you can use a .forward file or Procmail to start a process (php or anything) when a mail arrives to a certain address.
You don't want to hassle users with OpenID, but you want them to deal with this email scheme? Firstly, email can take a long time to go through. There isn't any guaranteed delivery time, and it's not even guaranteed that the email will arrive at all. I know things are usually quick, but it's not uncommon for a round trip to take up to 10 minutes. Also, unless you're encrypting the email, the link you send back is sent in the open, which means anybody can use that link to log in. Depending on how secure you want to be, this may or may not be an issue, but it's definitely something to think about. Using a non-standard login method like this is going to be a lot more work than it is probably worth, and I can't really see any advantages to the whole process.
I was also thinking of using procmail to start some script. There is also formail, which might come in handy for changing or extracting headers. If you have admin access to the mail server, you could also use /etc/aliases and just pipe to your script.
Besides usability issues, you should really think about security - it's actually quite simple to send email with a fake sender address, so I would not rely on it for anything critical.
I agree with all the security concerns. Your assumption that "the users and their email providers take care of their email account security" is not correct when it comes to the sender's e-mail address.
But since you specifically asked "how do I go from an email is sent to an account that is not read by humans to there being fired off some script", I recommend using procmail to deliver the incoming e-mail to a script you write.
I would not call a URL. I would have the script perform the work by reading the message sent in on stdin. That way, the script is not accessible to anyone on the web site.
To set this up, the e-mail address you provide to your users will have to be associated with a real user on the system. In that user's home directory, create a file called ".procmailrc".
In that file, add these two lines:
:0 hb:
| /path/to/program
Where /path/to/program is the full path to the script or program for handling the incoming message. Then create the script with code something like this:
#!/usr/bin/php
<?php
// The full message (headers and body) arrives on stdin; read it line by line.
$fp = fopen('php://stdin', 'r');
while ($line = fgets($fp)) {
    // do something with each $line of input here
}
fclose($fp);
?>
The e-mail message will not remain in the mailbox, so if you want to save or log it, have the script do it.
--
Bruce
I would seriously reconsider this approach. E-mail doesn't have very high reliability: there are all kinds of spam filters that might intercept e-mails with links, leaving the "command" half-finished, not to mention the security risks.
It's very easy to spoof the sender-address on an e-mail. You are basically opening up your system to anyone.
Also, instead of a username/password combination, you're suddenly requiring users to remember a list of commands to put in front of an email address. It would be better to provide them with a username/password and then give them access to a help page.
In other words, the usability and security of this scheme score very low.
I can't really find any advantages to this approach that even comes close to outweighing the massive disadvantages.
One solution to prevent spam: make sure the first line, last line, or a specific line contains a certain string, almost like a password, but a full sentence is better.
Only you have the word or words, so it's reasonably secure; just remember to delete the mails after use, along with those that do not contain the secret line.
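A minimal sketch of that filter (in Scala; the sentence is a placeholder): take the first non-empty body line and compare it against the shared passphrase.
// Placeholder passphrase; share it only with your users.
val passphrase = "the quick brown fox jumps over the lazy dog"

// Accept a mail only if its first non-empty body line is exactly the passphrase.
def isAuthorized(body: String): Boolean =
  body.linesIterator.map(_.trim).find(_.nonEmpty).contains(passphrase)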
Apart from the security and usability issues, email delivery can be another problem. Depending on the user's email provider, delivery can be delayed from a few minutes to a few hours.
There is a really nice educational story on thedailywtf.com about designing software. The posed question should be solved by proper design, not by techno-woopla.
Alexander, please read the linked story and think gloves, not email-driven webpage browsing.
PHP is not a hammer.
