Is it possible to find the source of website data? - database

When a website is constantly updating its information from a data source, is it possible to find out what or where that data source is?
An example of what I'm talking about is stock prices. I'm curious to learn for educational purposes.

It depends on how the site has implemented the solution. If the data is assembled server-side, you probably won't be able to find out. If it's fetched in JavaScript, you can often find out by looking at the page source and the network requests, e.g., in Chrome via right-click -> Inspect and the Network tab. But you'll need to know your way around.
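For example, if the stock prices arrive via an XHR/fetch request that you can see in the Network tab, you can often replay that request yourself. A minimal sketch in Python; the endpoint URL and response fields here are made up for illustration:

```python
import json
import urllib.request

# Hypothetical JSON endpoint spotted in the DevTools Network tab.
URL = "https://example.com/api/quotes?symbol=ACME"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# The response shape is an assumption; adapt it to what the real endpoint returns.
print(data.get("symbol"), data.get("price"))
```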

Related

How are handwritten notes from apps stored in databases?

At the beginning I'd like to say it's not an emergency :D
I was thinking about project ideas recently: projects I could try to build to learn something more, something new, or just to leave my comfort zone. I've picked a notes-app project that supports handwritten notes. And here's the first problem: with my limited knowledge I can't come up with an idea of how to store these handwritten notes in a database.
I haven't picked a database or other technologies yet, so this isn't a "How do I store it in MySQL?" question and so on... I'm just thinking theoretically about how it could be done. I searched Google and here on Stack Overflow but found nothing similar, just some questions about how to verify or recognize handwritten notes.
Does anybody have an idea or a lead I could follow?
Here I am assuming your "handwritten notes" are images. A simple solution might be to upload the images somewhere (e.g., Amazon S3, but there are countless options out there) and then store a reference to each image's URL in your database. In your code you can then download the images using the URL and process them as you see fit.
Note: I am making many assumptions here but I hope this helps.
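As a rough sketch of that approach in Python, assuming boto3 and SQLite (the bucket name and the notes table are hypothetical):

```python
import sqlite3
import boto3  # AWS SDK for Python; assumes credentials are configured

s3 = boto3.client("s3")
BUCKET = "my-notes-bucket"  # hypothetical bucket name

def save_note(note_id: str, image_path: str, db: sqlite3.Connection) -> str:
    """Upload a handwritten-note image to S3 and store its URL in the DB."""
    key = f"notes/{note_id}.png"
    s3.upload_file(image_path, BUCKET, key)
    url = f"https://{BUCKET}.s3.amazonaws.com/{key}"
    db.execute("INSERT INTO notes (id, image_url) VALUES (?, ?)", (note_id, url))
    db.commit()
    return url
```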

Do you have to host your data with MapQuest?

From what I've read so far, it seems like the only way for me to map custom data points from my own dataset is to host that data with MapQuest. Am I correct in that, or have I just not read deeply enough?
And if it's possible, does anyone have a link to more information about how to go about it? Their API documentation is subpar.
Thanks :)
Disclaimer: I work at MapQuest
While the MapQuest Data Manager makes it easy to store custom data with MapQuest so that you can query it through the Search API, you don't have to store data with us in order to show custom points on a map.
Are you trying to do something along the lines of storing data in MySQL or PostgreSQL, using something like PHP to query your own database, looping through the results, and then showing them on a MapQuest map using the JavaScript API? Unfortunately I don't have any easy/quick examples that show how to do that, but it is possible.
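As a rough sketch of the server side of that flow (in Python with Flask rather than PHP; the points table and database file are hypothetical, and the JavaScript/MapQuest plotting code is omitted):

```python
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/points")
def points():
    """Return custom map points as JSON for the front end to plot on the map."""
    db = sqlite3.connect("mydata.db")  # hypothetical database with a 'points' table
    rows = db.execute("SELECT name, lat, lng FROM points").fetchall()
    db.close()
    return jsonify([{"name": n, "lat": la, "lng": lo} for n, la, lo in rows])

if __name__ == "__main__":
    app.run()
```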
The forums on the Developer Network are also good place to look to see if others have had issues similar to the one that you are facing.
Also, let me know exactly which MapQuest APIs/tools you are using and I will do my best to provide more information depending on what you need.

Where can I find a good website to obtain UAProfs?

I'm trying to find a good UAProf website, but the only one I can find is http://www.uaprof.com/. I need UAProfs for HTC devices, and on that site a lot of links are broken. Do you know another website or resource where I can find those UAProfs?
Thank you in advance for your attention.
The Open Mobile Alliance validator keeps links to validated profiles. However, manufacturers do remove old profiles, and in some cases the manufacturers no longer exist, which is probably why you are seeing broken links at uaprof.com.
Google is another good source of profiles: search for "uaprof filetype:xml" and "uaprof filetype:rdf".
Alternatively, check out WURFL. It is not UAProf, but its data is derived from UAProf; it's XML, so it is easier to process, and it is curated, so it will not have the same problems.
There are more UAProf resources at the DELI website.

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus & train) have good websites providing their respective information, and both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules. The client wants users to be able to search for a beginning and end location and, using the external websites' information, be shown how they can best get there: a route with schedule times for the chosen modes of transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external site's server (via an API or some other means) and keep it in a database, which can be queried as needed. Our first thought was to contact the respective authorities to find out how/if this can be done, but that has proven problematic, mainly due to the language barrier.
My client suggested what is basically "screen scraping": downloading the web page(s) and filtering through the HTML for the relevant data to put into the database. That sounds complicated at best. My worry is that the info on these mostly static sites is so static that the data isn't even kept in a database to build the page; the page itself may simply be updated (hard-coded) when something changes.
I could really use some help and suggestions here. Thanks!
Screen scraping is always problematic IMO, as you are at the mercy of the person who wrote the page. If the content is static, I think it would be easier to copy the data into your database manually. If you want to keep up with changes, you could snapshot the page when you transcribe the info and run a job that periodically checks whether the page has changed from the snapshot; when it has, it sends you an email so you can update the data.
The above method could also be used in conjunction with some sort of screen scraper, falling back to a manual process if the page changes too drastically.
Ultimately, it is a question of how much effort (cost) your client is willing to bear for accuracy.
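A minimal sketch of that snapshot-and-notify idea in Python; the URL, snapshot file, and email addresses are placeholders, and it assumes a local mail relay:

```python
import hashlib
import smtplib
import urllib.request
from email.message import EmailMessage
from pathlib import Path

URL = "https://example.com/train-schedule"   # placeholder schedule page
SNAPSHOT = Path("schedule.sha256")           # stored hash of the last-known page

def check_for_change() -> None:
    """Hash the page and email a notification if it differs from the snapshot."""
    html = urllib.request.urlopen(URL).read()
    digest = hashlib.sha256(html).hexdigest()
    if SNAPSHOT.exists() and SNAPSHOT.read_text() == digest:
        return  # unchanged since the last transcription
    SNAPSHOT.write_text(digest)
    msg = EmailMessage()
    msg["Subject"] = "Schedule page changed"
    msg["From"] = "bot@example.com"
    msg["To"] = "you@example.com"
    msg.set_content(f"{URL} changed; please re-transcribe the data.")
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    check_for_change()  # run this from cron, e.g. daily
```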
I have done this for the following site: http://www.buscatchers.com/, so it's definitely more than doable! A key feature of a web-scraping solution for travel sites is that it must email you if anything goes wrong during the scraping process. On the site, I use a two-day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for some examples. There is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
I can tell that the data is dynamic; it's too well structured. It's not hard for someone who is familiar with XPath to scrape this site.
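For instance, a bare-bones XPath scrape with lxml might look like this (the URL and the table structure are hypothetical):

```python
import urllib.request
from lxml import html  # third-party: pip install lxml

URL = "https://example.com/train-schedule"  # placeholder, not the real schedule site

page = html.fromstring(urllib.request.urlopen(URL).read())

# Hypothetical table layout: one row per departure with origin,
# destination, and departure-time cells.
for row in page.xpath("//table[@id='schedule']//tr[td]"):
    origin, dest, departs = [td.text_content().strip() for td in row.xpath("td")[:3]]
    print(origin, dest, departs)
```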

Best Way to automatically find links to your content?

So, here is the task I've found myself thinking of. Pretend for a moment that I have a large body of content. I want to see what websites are linking to my content. I know that I could look into TrackBack or PingBack, but what about sites that aren't using tools capable of dealing with those?
It would seem that some form of web crawler that looks for pages linking to the original document might be useful. My question to the greater community is: what would be the best way to get started here? Do TrackBack and PingBack do more than I assume? Are there services or tools out there that already do what I'm thinking of?
Google is your friend!
Use the link: prefix:
link:whatsite.com
And yes, trackbacks do more.
If you have HTTP referers set up in your logs, you can mine them.
You can even discover linking pages that Google does not know about.
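A quick sketch of mining referers from a combined-format access log in Python (the log path is a placeholder):

```python
import re
from collections import Counter

LOG = "/var/log/apache2/access.log"  # placeholder path to a combined-format log

# In the combined log format, the referer is the first of the two
# trailing quoted fields (referer, then user agent).
referer_re = re.compile(r'"([^"]*)" "[^"]*"$')

counts = Counter()
with open(LOG) as f:
    for line in f:
        m = referer_re.search(line.strip())
        if m and m.group(1) not in ("-", ""):
            counts[m.group(1)] += 1

# The most frequent external referers are the pages linking to you.
for url, n in counts.most_common(20):
    print(n, url)
```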
Otherwise, there is the paid Linkscape from SEOmoz, or the free MajesticSEO (if you confirm ownership of the domain).
MajesticSEO has a bigger backlink index and an API (login required).
