When exactly does a site use its database for retrieval?

My knowledge of how a database works is close to zero, and I'm trying to understand when exactly a site uses its database to retrieve information. For example, does the site retrieve all the information the moment I load it (so that when I choose, say, "funny pictures", it doesn't have to retrieve anything from the database), or does it retrieve information only when I make a specific choice? I hope you understand my question; I'm sorry for my bad English.

It depends on how it has been implemented and with which technology.
Most of the time, it loads only the specific set of information relevant to the current context.

It depends on the software technology used, the settings, and the site's code. Normally it is set up to fetch something only when needed, but it is possible to change this behaviour.
There is one more possibility you may not have noticed, but it is often used: say you have a set of pictures. You see all of their small thumbnails on the page, but the full picture only when you click a thumbnail. Then, while you are looking at the thumbnails or at one of the pictures, the other pictures are downloaded in the background so they can be shown quickly when needed. Sometimes a primitive prediction system even tries to guess what you'll look at next.
And all these behaviours apply not only to data from databases, but to all data on the pages.
Again, the main point: it is up to the site's developer to choose.
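To make the "only when needed" case concrete, here is a minimal sketch in Python using the built-in sqlite3 module; the database file, table, and columns are invented for illustration. Nothing is read from the database at page-load time; the query runs only at the moment the visitor picks a category such as "funny pictures".
import sqlite3
def get_pictures(category):
    # The database is queried only when the visitor actually makes this choice.
    conn = sqlite3.connect("site.db")  # hypothetical database file
    try:
        rows = conn.execute(
            "SELECT title, url FROM pictures WHERE category = ?",  # hypothetical table/columns
            (category,),
        ).fetchall()
    finally:
        conn.close()
    return rows
# Runs only in response to a specific choice, not at page load:
funny = get_pictures("funny pictures")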

Related

Any recent changes in how contacts identify themselves to the Mirror API?

I'm concerned: when I take a picture, I am usually (i.e., as of last week) able to share the image to my app.
Now, however, only Google+ contacts appear as share targets. For example, if I turn off sharing to G+, I get no Share options at all, only a greyed-out Share dialog that says "Visit google.com/myglass to add friends".
However, when I go to that address I clearly see my app and a number of contacts (who aren't in G+) who also usually show up.
Has something changed to cause this behavior? For example, is the code listed in the starter-project no longer sufficient to register a share target for photos?
For example, I could imagine that suddenly the acceptTypes[] parameter was now mandatory. But I'd love to hear someone closer to the API weigh in, if possible.
Thanks!
AKA
I solved this by following the advice in Alain's comment.
It's very easy to think that the "Contacts" page you see at https://glass.google.com/myglass is all there is.
But if you want your app to receive shared stuff, you have to go here: https://glass.google.com/myglass/share

CakePHP beginner: advice needed. Everything on a single view, or multi-part forms?

Thanks in advance for any help offered and patience for my current web-coding experience.
Background:
I'm currently attempting to develop a web-based application for my family's business. There is a current version of this system that I developed in C#; however, I want to make the system web-based and, in the process, learn CakePHP and the MVC pattern.
Current problem:
I'm currently stuck on a controller that's supposed to take care of a PurchaseTicket. This ticket will have an associated customer, line items, totals, etc. I've been trying to develop a basic 'add()' function for the controller, but I'm having trouble with the following:
I'm creating a view with everything on it: a button for searching for a customer, a button to add line items, and a save button. Since I'm used to developing desktop applications, I think I might be trying to transfer the same logic to the web. Is this something that would be recommended or doable?
I'm running into basic problems like searching for a customer. From the New Ticket page I'm redirecting to the customer controller, searching, and then putting the result in a session variable or posting it back, but as I continue the process with the rest of the required information, I'm ending up with a bit of "spaghetti" code. Should I use a multi-part form? If I do, I break the visual design of the application.
Right now I've ended up instantiating my PurchaseTicket model and putting it in a session variable. I did this to save intermediate data, but I'm not sure whether instantiating a model like this conforms to CakePHP conventions or the MVC pattern.
I apologize for the length, this is my first post as a member.
Thanks!
Welcome to Stack Overflow!
So it sounds like there are a few questions here, all with pretty open-ended answers. I don't know if this will end up being an answer as such, but it's more information than I could fit in a comment, so here I go:
First and foremost, if you haven't already, I'd recommend doing the CakePHP Blog Tutorial to get familiar with Cake, before diving straight into a conversion of your existing desktop app.
Second, get familiar with CakePHP's bake console. It will save you a LOT of time if you use it to get started on the web version of your app.
I can't stress how important it is to get a decent grasp of MVC and CakePHP on a small project before trying to tackle something substantial.
Third, the UI for web apps is definitely different to desktop apps. In the case of CakePHP, nothing is 'running' permanently on the server. The entire CakePHP framework gets instantiated, and dies, with every single page request to the server. That can be a tricky concept when transitioning from desktop apps, where everything is stored in memory, and instances of objects can exist for as long as you want them to. With desktop apps, it's easier to have a user go and do another task (like searching for a customer), and then send the result back to the calling object, the instance of which will still exist. As you've found out, if you try and mimic this functionality in a web app by storing too much information in sessions, you'll quickly end up with spaghetti code.
You can use AJAX (google it if you don't already know about it) to update only parts of a page and get a more streamlined UI, which sounds like something you'll need to do. To get a general idea of the possibilities, you might want to take a look at Bamboo Invoice. It's not built with CakePHP, but with CodeIgniter, which is another open-source PHP MVC framework. It sounds like Bamboo Invoice has quite a few functionalities similar to what you're describing (an invoice has line items, totals, a customer, etc.), so it might help you get an idea of how you should structure your interface - and if you want to dig into the source code, how you can achieve some of the things you want to do.
Bamboo Invoice uses Ajax to give the app a feel of 'one view with everything on it', which it sounds like you want.
Fourth, regarding the specific case of your customer search, storing things in a session variable probably isn't the way to go. You may well want to use an autocomplete field, which sends an Ajax request to the server each time a character is entered in the field and displays the list of suggestions / matching customers that the server sends back. See an example here: http://jqueryui.com/autocomplete/. Implementing an autocomplete isn't totally straightforward, but there should be plenty of examples and tutorials all over the web.
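If it helps, here is a rough sketch (in Python rather than PHP, with a made-up customers table and database file) of what the server side of such an autocomplete endpoint does: take whatever the user has typed so far, run a prefix query, and return the matches as JSON for the widget to display.
import json
import sqlite3
def customer_suggestions(typed_so_far, limit=10):
    # Called for each Ajax request the autocomplete field fires while the user types.
    conn = sqlite3.connect("business.db")  # hypothetical database file
    try:
        rows = conn.execute(
            "SELECT id, name FROM customers WHERE name LIKE ? ORDER BY name LIMIT ?",
            (typed_so_far + "%", limit),  # prefix match on what was typed so far
        ).fetchall()
    finally:
        conn.close()
    # Returned as JSON so the client-side widget (e.g. jQuery UI autocomplete) can render it.
    return json.dumps([{"id": r[0], "label": r[1]} for r in rows])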
Lastly, I obviously don't know what your business does, but have you looked into existing software that might work for you, before building your own? There's a lot of great, flexible web-based solutions, at very reasonable prices, for a LOT of the common tasks that businesses have. There might be something that gives you great results for much less time and money than it costs to build your own solution.
Either way, good luck, and enjoy CakePHP!

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus & train) have good websites for providing their respective information, and both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules. The client wants to give users the ability to search for a beginning and end location and determine, using the external websites' information, how they can best get there, receiving a route with schedule times for the chosen modes of transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external sites' servers (via an API or some other means) and retain the info in a database, which can be queried as needed. Our first thought was to contact the respective authorities to determine how/if this can be done, but this has proven problematic, mainly due to the language barrier.
My client suggested what is basically "screen scraping", but that sounds like it would be complicated at best: downloading the web page(s) and filtering through the HTML for the relevant/necessary data to put into the database. My worry is that the info on these mainly static sites is so static that the data isn't even kept in a database to build the page, and that the web page itself is updated (hard-coded) when something changes.
I could really use some help and suggestions here. Thanks!
Screen scraping is always problematic IMO, as you are at the mercy of the person who wrote the page. If the content is static, then I think it would be easier to copy the data manually into your database. If you wanted to keep up to date with changes, you could then snapshot the page when you transcribe the info and run a job to periodically check whether the page has changed from the snapshot. When it does, the job sends you an email so you can update the data.
The above method could also be used in conjunction with some sort of screen scraper, falling back to a manual process if the page changes too drastically.
Ultimately, it is a case of how much effort (cost) your client is willing to bear for accuracy.
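For what it's worth, the "snapshot and periodically compare" idea can be sketched in a few lines of Python using only the standard library; the URL is a placeholder and the actual email notification is left as a comment.
import hashlib
import urllib.request
SNAPSHOT_FILE = "schedule_page.sha256"  # hash saved when the data was last transcribed
PAGE_URL = "http://example.com/train-schedule"  # placeholder URL
def page_fingerprint(url):
    html = urllib.request.urlopen(url).read()
    return hashlib.sha256(html).hexdigest()
def page_has_changed():
    current = page_fingerprint(PAGE_URL)
    try:
        with open(SNAPSHOT_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = None
    if current != previous:
        # Send yourself an email here, then re-transcribe the data and save the new hash.
        with open(SNAPSHOT_FILE, "w") as f:
            f.write(current)
        return True
    return False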
I have done this for the following site: http://www.buscatchers.com/ so it's definitely more than doable! A key feature of a web scraping solution for travel sites is that it must send you emails if anything goes wrong during the scraping process. On the site, I use a two-day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for examples: there is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
I can tell that the data is dynamic; it's too well structured. It's not hard for someone who is familiar with XPath to scrape this site.
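As a rough illustration of the XPath approach in Python (using the third-party lxml library), something like the following would pull the rows out of a schedule table; the URL and the XPath expressions are placeholders that depend entirely on the page's real markup.
import urllib.request
from lxml import html  # third-party: pip install lxml
PAGE_URL = "http://example.com/train-schedule"  # placeholder URL
def scrape_schedule_rows():
    page = urllib.request.urlopen(PAGE_URL).read()
    tree = html.fromstring(page)
    rows = []
    # Placeholder XPath: every row of a table with id="schedule".
    for tr in tree.xpath('//table[@id="schedule"]//tr'):
        cells = [td.text_content().strip() for td in tr.xpath('./td')]
        if cells:
            rows.append(cells)
    return rows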

Optimisation: use local files or databases for HTML

This follows on from this question where I was getting a few answers assuming I was using files for storing my HTML templates.
Until recently I had always saved 'compiled' templates as HTML files in a directory (above the web root). My templates usually have two types of variable: 'static' variables, which are not replaced on every usage but are used across the site (basically for ease of maintenance, if I decide to change the site name, for example), and dynamic variables that change on every page load.
I always used to save these as files on the server - but my friend pointed out something I'd overlooked: why have 5-10 filesystem calls when you can have one database call?
What I want to know is: which is more efficient? Reading several HTML files from the filesystem, or fetching several rows of templates from the database (in one query/call)?
Don't store editable HTML in the database.
Seriously, because the maintenance overhead for mere changes becomes exhausting once you realise you can no longer just pop open a text editor.
I worked on many projects which had HTML content in the database, and it was a constant nightmare of "find the row that content is on", and I really would have liked to shoot the person who did it.
Also, DON'T PREMATURELY OPTIMISE. If you find it is a problem that's slowing the project down, then change it; making the code exhaustively less maintainable just to save a millisecond isn't worth it. But design the code well enough that, should you need to change where the content comes from later, it is easy to do.
Surely that can be resolved by having a suitable web interface for editing the templates?
Erm, really not, unless you're only trying to compete with Notepad. Without syntax highlighting and the whole host of other features you get in a standard editor, your developers will be miserable when they find themselves editing web pages by hacking at an undersized textarea with awful white-on-black text (not to mention the extra fun of having to worry about entity encoding; for instance, try editing HTML in a textarea when the HTML content itself contains a textarea element!).
On file IO: while file IO can be a bottleneck, keep in mind that if you have a proper Linux install and plenty of memory, a handy thing known as the disk cache takes effect, which in effect keeps recently used files in memory, so file IO becomes a mere memcpy.
On the contrary, in real stress tests on the code I have used, the biggest slowdowns have been in the database: primarily the nice slow connect, query parse time, and the extra PHP<->MySQL interactions. You're not really looking at gaining anything. A filesystem lookup is close to a database index lookup, and you don't have any unknowns other than "you need to stream it from a disk", and no table-locking stuff to worry about!
You should probably try something like a caching library instead; X-Cache comes highly recommended, and that's more likely to give you visible performance gains.
I pretty much agree with the gist of Kent Fredric's answer. But if you really want to know which is more efficient/faster, you cannot reasonably expect to get the answer here. If you want that answer, there is only one way to get it: profile the application both ways.
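If you do want to profile it, a crude Python micro-benchmark along these lines compares reading a template from disk with fetching the same content from a SQLite table; the file and table names are invented, and real numbers will depend on your actual stack (PHP and MySQL in the question), the OS disk cache, and connection overhead.
import sqlite3
import timeit
def read_from_file():
    with open("templates/header.html") as f:  # hypothetical template file
        return f.read()
def read_from_db():
    conn = sqlite3.connect("templates.db")  # hypothetical template database
    try:
        return conn.execute(
            "SELECT body FROM templates WHERE name = ?", ("header",)
        ).fetchone()[0]
    finally:
        conn.close()
# Each variant runs 1000 times; the per-call connect cost is deliberately included.
print("file:", timeit.timeit(read_from_file, number=1000))
print("db:  ", timeit.timeit(read_from_db, number=1000))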

How do screen scrapers work? [closed]

I hear people writing these programs all the time and I know what they do, but how do they actually do it? I'm looking for general concepts.
Technically, screen scraping is any program that grabs the display data of another program and ingests it for its own use.
Quite often, screen scraping refers to a web client that parses the HTML pages of a targeted website to extract formatted data. This is done when a website does not offer an RSS feed or a REST API for accessing the data in a programmatic way.
One example of a library used for this purpose is Hpricot for Ruby, which is one of the better-architected HTML parsers used for screen scraping.
Lots of accurate answers here.
What nobody's said is don't do it!
Screen scraping is what you do when nobody's provided you with a reasonable machine-readable interface. It's hard to write, and brittle.
As an example, consider an RSS aggregator, then consider code that gets the same information by working through a normal human-oriented blog interface. Which one breaks when the blogger decides to change their layout?
Of course, sometimes you have no choice :(
In general, a screen scraper is a program that captures output from a server program by mimicking the actions of a person sitting in front of a workstation using a browser or terminal-access program. At certain key points the program interprets the output and then takes an action or extracts certain pieces of information from it.
Originally this was done with character/terminal output from mainframes, for extracting data from, or updating, systems that were archaic or not directly accessible to the end user. In modern terms it usually means parsing the output of an HTTP request to extract data or to take some other action. With the advent of web services this sort of thing should have died away, but not all apps provide a nice API to interact with.
A screen scraper downloads the HTML page and pulls out the data of interest, either by searching for known tokens or by parsing it as XML or some such.
In the early days of PCs, screen scrapers would emulate a terminal (e.g. IBM 3270) and pretend to be a user in order to interactively extract or update information on the mainframe. In more recent times, the concept is applied to any application that provides an interface via web pages.
With the emergence of SOA, screen scraping is a convenient way to service-enable applications that aren't. In those cases, scraping the web page is the more common approach taken.
Here's a tiny bit of screen scraping implemented in Javascript, using jQuery (not a common choice, mind you, since scraping is usually a client-server activity):
//Show My SO Reputation Score
var repval = $('span.reputation-score:first');
alert('StackOverflow User "' + repval.prev().attr('href').split('/').pop() + '" has (' + repval.html() + ') Reputation Points.');
If you run Firebug, copy the above code and paste it into the Console and see it in action right here on this Question page.
If SO changes the DOM structure / element class names / URI path conventions, all bets are off and it may not work any longer - that's the usual risk in screen scraping endeavors where there is no contract/understanding between parties (the scraper and the scrapee [yes I just invented a word]).
Typically you have an HTML page that contains some data you want. What you do is write a program that will fetch that web page and attempt to extract that data. This can be done with XML parsers, but for simple applications I prefer to use regular expressions to match a specific spot in the HTML and extract the necessary data. Sometimes it can be tricky to create a good regular expression, though, because the surrounding HTML appears multiple times in the document. You always want to match a unique item as close as you can to the data you need.
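A bare-bones sketch of that fetch-then-regex approach in Python, using only the standard library; the URL and the pattern are illustrative and would have to be tailored to the real page, anchored on markup that uniquely surrounds the value you want.
import re
import urllib.request
URL = "http://example.com/page-with-data"  # placeholder URL
html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")
# Illustrative pattern: a value wrapped in <span class="price">...</span>.
match = re.search(r'<span class="price">\s*([^<]+?)\s*</span>', html)
if match:
    print("extracted:", match.group(1))
else:
    print("pattern not found - the page layout may have changed")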
Screen scraping is what you do when nobody's provided you with a reasonable machine-readable interface. It's hard to write, and brittle.
Not quite true. I don't think I'm exaggerating when I say that most developers do not have enough experience to write decent APIs. I've worked with screen scraping companies, and often the APIs are so problematic (ranging from cryptic errors to bad results), and so often fail to expose the full functionality that the website provides, that it can be better to screen scrape (web scrape, if you will). The extranet/website portals are used by more customers/brokers than the API clients and are thus better supported. In big companies, changes to extranet portals etc. are infrequent, usually because they were originally outsourced and are now just maintained. I'm referring more to screen scraping where the output is tailored, e.g. a flight on a particular route and time, an insurance quote, a shipping quote, etc.
In terms of doing it, it can be as simple as using a web client to pull the page contents into a string and running a series of regular expressions over it to extract the information you want.
string pageContents = new WebClient().DownloadString("http://www.stackoverflow.com");
int numberOfPosts = // regex match
Obviously in a large scale environment you'd be writing more robust code than the above.
A screen scraper downloads the HTML page and pulls out the data of interest, either by searching for known tokens or by parsing it as XML or some such.
That is a cleaner approach than regexes... in theory. In practice, however, it's not quite as easy, given that most documents need to be normalized to XHTML before you can XPath through them; in the end we found that fine-tuned regular expressions were more practical.

Resources