Retrieve DBpedia page information automatically - database

I currently have a huge list of DBpedia pages from which I need to extract corresponding data (mainly dct:subject). For example, for the Android_TV page I need the following data (under dct:subject) to be returned as text:
dbc:Smart_TV
dbc:2014_software
dbc:Google_software
dbc:Android_(operating_system)
dbc:Natural_language_processing_software
Now I know how to query this manually, of course, but since I have a huge number of these pages to query, I need a way to do this automatically and store all the extracted information somehow. I have done a lot of searching on Google as well as Google Scholar, but I'm quite stuck.
Could anyone point me in the right direction? Any useful papers / websites / explanations you might know of are very welcome! Thanks in advance! :)
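For this kind of batch job, one approach is to hit the public DBpedia SPARQL endpoint from a script and write the results to a file. Here is a minimal sketch in Python, assuming the SPARQLWrapper package and the https://dbpedia.org/sparql endpoint; the page names in the list are just placeholders for your own list:

```python
# Minimal sketch: batch-query dct:subject for a list of DBpedia resources
# and write the results to a CSV file.
import csv
from SPARQLWrapper import SPARQLWrapper, JSON

pages = ["Android_TV", "Google_TV"]  # placeholder: your list of DBpedia page names

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

with open("subjects.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["page", "subject"])
    for page in pages:
        sparql.setQuery(f"""
            PREFIX dct: <http://purl.org/dc/terms/>
            SELECT ?subject WHERE {{
                <http://dbpedia.org/resource/{page}> dct:subject ?subject .
            }}
        """)
        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            writer.writerow([page, row["subject"]["value"]])
```

If the list is very large, another option is to download the DBpedia dumps and query them locally rather than hammering the public endpoint.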

Related

Matching article text against pre-existing list of categories

I'm new to Azure Cognitive Services, and while I'm pretty sure it can help me solve my problem, I don't quite understand which part of it to use for it...
Here's what I want to do:
We have blog posts, say ~1k, and those blog posts all have categories and tags (multiple each). What I want to do is "guess" the right categories/tags for each article based on the content, and then present that to the editor as suggestions at the time of input ("looks like this article is about: health, well-being, ..."). The ~1k articles we already have in the system are currently correctly tagged/categorized, so I'd like to use these as a data source for this "guessing".
I've used Azure Search before, and it seems like some combination of EntityRecognition and KeyPhraseExtraction might be a step in the right direction? Azure Cognitive Services also seems to have an API that supports TextAnalytics that would do something similar. I'm a bit confused about why these are two different things (or are they not?).
This also seems like an entirely common problem (matching text against pre-defined categories based on other text that is categorized), so I'm wondering if I'm just missing an obvious solution here?
Thanks in advance.
I think the Azure Cognitive Text Analytics API is your best bet as you are looking for real-time analysis prior to tagging/categorizing for storage.
Text Analytics could return a list of named entities that you could map to your available tags/categories and present to the user.
Azure Cognitive Search requires an indexer and skillset to process target text with an end result of storing the processed results to an index specifically for searching.
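As a rough illustration of that idea, here is a hedged sketch using the azure-ai-textanalytics Python SDK: extract key phrases from an article and match them against your existing tag list. The endpoint, key, and tag set are placeholders, and the matching is deliberately naive:

```python
# Hedged sketch: suggest tags for a blog post via Text Analytics key phrases,
# then match them against the tags already used in your system.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                      # placeholder
)

existing_tags = {"health", "well-being", "nutrition"}  # placeholder tag list

def suggest_tags(article_text):
    result = client.extract_key_phrases([article_text])[0]
    if result.is_error:
        return set()
    phrases = {p.lower() for p in result.key_phrases}
    # naive matching: suggest a tag if it shows up among the key phrases
    return {tag for tag in existing_tags if tag in phrases}
```

You could do the same with recognize_entities instead of (or in addition to) key phrases, depending on how your tags map onto named entities.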

Airports, points of interest and cities Auto complete

I'm currently working on a travel booking application.
I have two questions related to the same topic.
I need to know where sites like Priceline, Expedia or CheapOair get their autocomplete search data from, such as airports, points of interest, and cities/states. Do these sites use the Google Places API for their search autocomplete?
I was thinking about getting this data using Google Places autocomplete. Would this be a wise way to go about it? Or would I be better off finding a JSON file with all this data, storing it on my own server, and querying the JSON file directly?
Did you try these out?
- https://community.algolia.com/places/
- https://demos.algolia.com/geo-search-demo/ [Search for airports]
- and check the guide that goes with the demo https://www.algolia.com/doc/guides/geo-search/geo-search-overview
Having your own database would give you more flexibility. Also, users would be querying your data, so there would probably be fewer searches with no results (with Google search, users could type queries that are not related to any of your content).
Does that make sense?
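If you do go the own-database route, the core of the autocomplete can stay very small. A rough sketch, assuming you have stored a JSON file of airports/cities (the file name and fields below are made up; use whatever your dataset provides):

```python
# Rough sketch: serve autocomplete suggestions from your own JSON data
# instead of the Google Places API.
import json

with open("airports.json") as f:
    # assumed shape: [{"name": "Los Angeles International", "city": "Los Angeles", "code": "LAX"}, ...]
    places = json.load(f)

def autocomplete(prefix, limit=10):
    prefix = prefix.lower()
    matches = [
        p for p in places
        if p["name"].lower().startswith(prefix)
        or p["city"].lower().startswith(prefix)
        or p["code"].lower().startswith(prefix)
    ]
    return matches[:limit]

print(autocomplete("los"))
```

A hosted search service like Algolia essentially does this for you with better ranking and typo tolerance, so the trade-off is flexibility and cost versus build effort.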

Need help stringing together database processes

I need some help from those with more knowledge than I possess. I am currently trying to figure out how to get real-time data from a database.
I need to be able to find the company info from the most recent licensees. So the search parameter I'm using is 2016-05-10T00:00:00.000
The full URL, combining the API endpoint and the search parameter, can be found directly at this link:
https://www.hurl.it/?method=GET&url=https%3A%2F%2Fdata.wa.gov%2Fresource%2Fv8vv-gqqs.json&headers=%7B%22X-App-Token%22%3A[%22bjp8KrRvAPtuf809u1UXnI0Z8%22]%7D&args=%7B%22licenseeffectivedate%22%3A[%222004-07-14T00%3A00%3A00.000%22]%7D
So I'm looking to retrieve the most recently added accounts in order to verify that (1) the license is active and (2) the license number the contractor gives matches what the website says. I would like to figure out how to automate this so that when the newest licenses are added I'll know, and they will be extracted/downloaded into Excel.
If anyone can help with this I would appreciate it very much. I also have more questions about using databases if any of you are experts in the field.
Once again, thank you!
Clay
Since your goal is to get this data into Excel, have you considered using something like our OData support instead? You could structure your query in Excel or Power BI and it would automatically refresh the data.
Another option would be to use our CSV output type with an Excel web query. I use the IMPORTDATA(...) function in Google Sheets, which is very similar.
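If you'd rather script it, the same Socrata endpoint from your link also accepts SoQL query parameters, so a small scheduled script can pull the newest licensees and write a CSV that Excel opens directly. A hedged sketch (the app token is a placeholder; the field name comes from the URL in the question):

```python
# Sketch: pull the most recently effective licensees from the data.wa.gov
# Socrata endpoint and write them to a CSV for Excel.
import csv
import requests

url = "https://data.wa.gov/resource/v8vv-gqqs.json"
headers = {"X-App-Token": "<your-app-token>"}  # placeholder
params = {
    "$where": "licenseeffectivedate >= '2016-05-10T00:00:00.000'",
    "$order": "licenseeffectivedate DESC",
    "$limit": 1000,
}

rows = requests.get(url, headers=headers, params=params).json()

if rows:
    fieldnames = sorted({key for row in rows for key in row})
    with open("licensees.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Run it on a schedule (cron or Task Scheduler) and you effectively get the "tell me when new licenses appear" behavior you described.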

How do websites use information on a database to create pages?

Sorry about the broad question. I'm just curious if someone could point me in the right direction.
Say there's a database of contact information, and there's a site where you can input a person's name and it brings you to a page with all of their information from that database. How does this happen exactly? The server would have to dynamically create this page, but does it have a generic format that it just fills in with the information? And how does that happen?
Like you said, this is an extremely broad question. It could be either way. The server could generate the entire contents dynamically, or it could be "filling in the blanks" in a preformatted layout.
Google some basic PHP tutorials. That should give you a good idea about how this "dynamism" works. Sorry, but your question is too broad to elaborate on further.
The server would dynamically create the page using PHP and SQL. There is a quick tutorial at http://www.mysqltutorial.org/php-querying-data-from-mysql-table/ that shows how it would be set up.
If I understood your question right, you are asking how a page like this one is created, for example, in which case it can be as simple as a basic PHP and SQL combination. You can check the "Try it Yourself" example on the w3schools website.
There would be special placeholders for the data, and a query would extract the data and put it into those placeholders. Note that you can also use loops to add things like tables, fetch multiple rows, and so on.
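The answers here use PHP, but the placeholder-filling idea is the same in any language. A minimal sketch of the pattern in Python with sqlite3, where the database file, table name, and columns are assumptions purely for illustration:

```python
# Sketch: a query fills placeholders in a page template, which is what a
# PHP page backed by MySQL does conceptually as well.
import sqlite3

TEMPLATE = """
<html><body>
  <h1>{name}</h1>
  <p>Email: {email}</p>
  <p>Phone: {phone}</p>
</body></html>
"""

def contact_page(name):
    conn = sqlite3.connect("contacts.db")  # assumed schema: contacts(name, email, phone)
    row = conn.execute(
        "SELECT name, email, phone FROM contacts WHERE name = ?", (name,)
    ).fetchone()
    conn.close()
    if row is None:
        return "<html><body><p>Not found</p></body></html>"
    return TEMPLATE.format(name=row[0], email=row[1], phone=row[2])
```

A web framework then maps a URL like /contact/Jane to a call such as contact_page("Jane") and returns the generated HTML to the browser.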

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus & train) have good websites providing their respective information, and both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules. The client wants to provide users the ability to search for a beginning and end location and determine, using the external website's information, how they can best get there, being provided a route with schedule times for the different modes of chosen transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external site's server (via API or some other means) and retain the info in a database, which can be queried as needed. Our first thought was to contact the respective authorities to determine how/if this can be done, but this has proven to be problematic, mainly due to the language barrier.
My client suggested what is basically "screen scraping", but that sounds like it would be complicated at best: downloading the web page(s) and filtering through the HTML for relevant/necessary data to put into the database. My worry is that the info on these mainly static sites is so static that the data isn't even kept in a database to build the page, and the web page itself is updated (hard-coded) when something changes.
I could really use some help and suggestions here. Thanks!
Screen scraping is always problematic IMO as you are at the mercy of the person who wrote the page. If the content is static, then I think it would be easier to copy the data manually to your database. If you wanted to keep up to date with changes, you could then snapshot the page when you transcribe the info and run a job to periodically check whether the page has changed from the snapshot. When it does, it sends an email for you to update it.
The above method could also be used in conjunction with some sort of screen scraper which could fall back to a manual process if the page changes too drastically.
Ultimately, it is a case of how much effort (cost) your client is willing to bear for accuracy.
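The snapshot-and-recheck idea above can be a very small script. A sketch, with a placeholder URL and the email step left as a print statement where you would notify yourself:

```python
# Sketch: hash the schedule page, compare with the last saved snapshot,
# and flag it for manual re-transcription when the hash changes.
import hashlib
import requests

URL = "http://example.com/train-schedule"  # placeholder for the real schedule page
SNAPSHOT_FILE = "schedule.sha256"

def page_hash():
    html = requests.get(URL, timeout=30).text
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def check_for_changes():
    current = page_hash()
    try:
        with open(SNAPSHOT_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = None
    if current != previous:
        with open(SNAPSHOT_FILE, "w") as f:
            f.write(current)
        print("Page changed - time to re-check the schedule")  # or send yourself an email here

check_for_changes()  # run from cron/Task Scheduler on whatever interval you like
```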
I have done this for the following site: http://www.buscatchers.com/ so it's definitely more than doable! A key feature of a web scraping solution for travel sites is that it must send you emails if anything goes wrong during the scraping process. On the site, I use a two-day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for some examples. There is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
I can tell that the data is dynamic; it's too well structured. It's not hard for someone who is familiar with XPath to scrape this site.
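For the scraping itself, a rough XPath sketch with requests and lxml along those lines; the URL and the XPath expressions are placeholders you would adjust to the real page's markup:

```python
# Sketch: pull schedule rows out of an HTML table with XPath.
import requests
from lxml import html

page = requests.get("http://example.com/train-schedule", timeout=30)  # placeholder URL
tree = html.fromstring(page.content)

# assumed markup: the schedule lives in <table class="schedule">
for row in tree.xpath('//table[@class="schedule"]//tr'):
    cells = [cell.text_content().strip() for cell in row.xpath("./td")]
    if cells:
        print(cells)  # e.g. [departure, arrival, train number, ...] depending on the table
```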
