Is there any endpoint or API to query Wikipedia, DBpedia, or something similar to get the Wikipedia pages of all countries and their respective administrative divisions?
I've tried to find a way in DBpedia to get the administrative subdivisions of a given region, but I couldn't manage it.
I would instead query against Wikidata and then convert the results to Wikipedia pages.
You can use Sweden as an example for how a country and its subdivisions are structured.
For the querying you can use either
https://query.wikidata.org (SPARQL) or
http://wdq.wmflabs.org
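For example, here is a minimal sketch (Python, against the SPARQL endpoint) that lists Sweden's (Q34) first-level subdivisions via the property P150 ("contains the administrative territorial entity") and the linked English Wikipedia article where one exists:

```python
import requests

# Sweden (Q34) -> first-level administrative divisions via P150,
# plus the linked English Wikipedia article where one exists.
QUERY = """
SELECT ?division ?divisionLabel ?article WHERE {
  wd:Q34 wdt:P150 ?division .
  OPTIONAL {
    ?article schema:about ?division ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    label = row["divisionLabel"]["value"]
    article = row.get("article", {}).get("value", "no enwiki article")
    print(label, "->", article)
```

Swap Q34 for any other country's item ID to walk its subdivisions the same way.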
You can try looking at Despegar. You can retrieve administrative divisions using its RESTful APIs. Here's a link to the method you need.
I'm new to Azure Cognitive Services, and while I'm pretty sure it can help me solve my problem, I don't quite understand which part of it to use...
Here's what I want to do:
We have blog posts, say ~1k, and those blog posts all have categories and tags (multiple each). What I want to do is to "guess" the right categories/tags for each article based on the content, and then present them to the editor as suggestions at the time of input ("looks like this article is about: health, well-being, ..."). The ~1k articles we already have in the system are currently correctly tagged/categorized, so I'd like to use these as a data source for this "guessing".
I've used Azure Search before, and it seems like some combination of EntityRecognition and KeyPhraseExtraction might be a step in the right direction? Azure Cognitive Services also seems to have a Text Analytics API that would do something similar. I'm a bit confused about why these are two different things (or are they not?).
This also seems like a fairly common problem (matching text against pre-defined categories based on other text that is already categorized), so I'm wondering if I'm just missing an obvious solution here?
Thanks in advance.
I think the Azure Cognitive Text Analytics API is your best bet as you are looking for real-time analysis prior to tagging/categorizing for storage.
Text Analytics could return a list of named entities that you could map to your available tags/categories and present to the user.
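As a rough sketch of that flow, assuming a Text Analytics resource (the endpoint and key below are placeholders), you could call the v3.0 entity-recognition and key-phrase operations and then map the results onto your own tag vocabulary:

```python
import requests

# Placeholders: substitute your own Text Analytics resource and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-subscription-key>"

def analyze(operation, text):
    # POST a single document to a v3.0 Text Analytics operation.
    resp = requests.post(
        f"{ENDPOINT}/text/analytics/v3.0/{operation}",
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"documents": [{"id": "1", "language": "en", "text": text}]},
    )
    resp.raise_for_status()
    return resp.json()["documents"][0]

body = "Regular exercise and a balanced diet improve long-term well-being."
entities = analyze("entities/recognition/general", body)["entities"]
phrases = analyze("keyPhrases", body)["keyPhrases"]

# Map these candidates onto your existing tag/category list before
# presenting them to the editor as suggestions.
candidates = {e["text"] for e in entities} | set(phrases)
print(candidates)
```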
Azure Cognitive Search requires an indexer and skillset to process the target text, with the end result of storing the processed results in an index built specifically for searching.
I'm currently working on a travel booking application.
I have two questions related to the same topic.
I need to know where sites like Priceline, Expedia, or CheapOair get their autocomplete search data from, such as airports, points of interest, and cities/states. Do these sites use the Google Places API for their search autocomplete?
I was thinking about getting this data using Google Places Autocomplete. Would this be a wise way to go about it? Or would I be better off finding a JSON file with all this data, storing it on my own server, and querying the JSON file directly?
Did you try these out?
- https://community.algolia.com/places/
- https://demos.algolia.com/geo-search-demo/ [Search for airports]
- and check the guide that goes with the demo https://www.algolia.com/doc/guides/geo-search/geo-search-overview
Having your own database would give you more flexibility. Also, users would query your data, so there would probably be fewer searches with no results (with Google search, users could type queries that are not related to any of your content).
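For the self-hosted route, here is a minimal sketch in Python; the airports.json file and its schema (code/name/city) are assumptions, purely for illustration:

```python
import json

# Assumed file and schema, e.g.:
# [{"code": "ARN", "name": "Stockholm Arlanda Airport", "city": "Stockholm"}, ...]
with open("airports.json") as f:
    AIRPORTS = json.load(f)

def autocomplete(prefix, limit=10):
    # Simple case-insensitive prefix match on code, name, and city.
    p = prefix.lower()
    hits = [
        a for a in AIRPORTS
        if a["code"].lower().startswith(p)
        or a["name"].lower().startswith(p)
        or a["city"].lower().startswith(p)
    ]
    return hits[:limit]

print(autocomplete("sto"))
```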
Does that make sense?
How can I find out (in any language, but preferably Python) when Google indexed a specific HTML page?
Ideally I would have a list of URLs to check for.
I have already tried the Wayback Machine, but it doesn't have the majority of the pages I need. Also, can anyone suggest an API to extract dates in multiple languages from text?
You can use this pattern to access the cached version of your webpage.
http://webcache.googleusercontent.com/search?q=cache:<URL>
For example, you can see the cached version of my blog datafireball.com this way; as you can see, it was indexed on 2014-10-20 at 23:33:30. Appending strip=1 will avoid loading JavaScript, CSS, etc. To extract the time when the page was indexed, you can fetch the cache page with a browser automation tool like Selenium or PhantomJS and parse the timestamp from the banner Google prepends.
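A rough sketch of the whole idea in Python; note that Google may block non-browser clients (hence the Selenium/PhantomJS fallback above), and the banner wording the regex matches is an assumption that may change:

```python
import re
import requests

def google_cache_date(url):
    # Fetch the Google cache copy and parse the "as it appeared on ..."
    # banner text; the exact wording is an assumption and may change.
    cache_url = f"http://webcache.googleusercontent.com/search?q=cache:{url}"
    headers = {"User-Agent": "Mozilla/5.0"}  # plain clients are often blocked
    html = requests.get(cache_url, headers=headers).text
    match = re.search(r"as it appeared on (.+?)\.", html)
    return match.group(1) if match else None

for url in ["datafireball.com"]:  # your list of URLs to check
    print(url, "->", google_cache_date(url))
```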
I am planning to create a recommender system using apache Mahout.
I searched the internet and found that it uses the following format for the dataset file:
userId, itemId, preference
What I want to use as a dataset has a structure like this:
Id, rating, location, skills, fee
Is there any way I can do this? Or do I have to use Weka? Weka provides the option of creating a custom dataset, but reviews suggest that it is not as good an option as Mahout for recommender systems.
Are you planning to do collaborative filtering? Usually with CF you take in lots of user preferences about items. Then for a given user you recommend items. You don't seem to have user preferences.
In any case, you will need to preprocess your data into the required form; that is all CF will use anyway.
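As a sketch of that preprocessing, here is one way to flatten your schema into Mahout's userID,itemID,preference triples. Since your data has no explicit user/item split, the mapping below (Id as the user, each location/skills combination as an item) is purely an assumption for illustration:

```python
import csv

# Assumed, purely for illustration: "Id" identifies the user, and each
# (location, skills) combination is treated as one item.
item_ids = {}

with open("raw_data.csv") as src, open("mahout_input.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)  # columns: Id, rating, location, skills, fee
    writer = csv.writer(dst)
    for row in reader:
        key = (row["location"], row["skills"])
        item_id = item_ids.setdefault(key, len(item_ids) + 1)
        # Mahout Taste expects: userID,itemID,preference
        writer.writerow([row["Id"], item_id, row["rating"]])
```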
Try to understand this example:
https://github.com/apache/mahout/tree/master/examples/src/main/java/org/apache/mahout/cf/taste/example/bookcrossing
I hope it will help you.
I need a database of interests for a coming project.
By saying "interests" I mean like:
Sports - Football, Basketball and so on...
Does anybody know of something like this?
I just don't want to start writing thousands (or even millions) of interests.
Try googling "interests list" or "hobbies list" and then write a simple parser to extract them all (based on NekoHTML, for example; a sketch follows the links below). Examples of useful links:
http://www.notsoboringlife.com/list-of-hobbies/
http://www.buzzle.com/articles/list-types-of-hobbies/
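A small sketch of that parsing step, using Python's BeautifulSoup instead of the NekoHTML (Java) parser mentioned above; the assumption that hobbies sit in <li> elements is specific to whichever page you target, so inspect the markup and adjust the selector:

```python
import requests
from bs4 import BeautifulSoup

# Fetch one of the hobby-list pages and collect the text of every <li>.
html = requests.get("http://www.notsoboringlife.com/list-of-hobbies/").text
soup = BeautifulSoup(html, "html.parser")

hobbies = {li.get_text(strip=True) for li in soup.find_all("li")}
print(sorted(hobbies)[:20])
```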
It is also possible to ask the admins of a dating site to run a simple DB query to get a relevant list.
There is an API from Google called Freebase. It allows you to suggest to the user what input they should give, the same way Google Search suggests what to look for.