How can I optimize my recursive SPARQL query?

I'm trying to extract buildings from Wikidata using a recursive SPARQL query but I keep getting query timeouts. Is there a way to circumvent this?
This is my current query, selecting all buildings with either a Freebase ID or a Google Knowledge Graph ID, and a Dutch label:
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
p:P31/ps:P31/wdt:P279* wd:Q41176;
rdfs:label ?buildingLabel .
FILTER(LANG(?buildingLabel) = 'nl') .
FILTER (?building != ?buildingLabel) .
}
I've tried manually looking a few layers deep instead but, for some reason, I get no results for three or more layers deep even though those definitely exist. I've tried this using:
SELECT ?building
WHERE {
?building p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 wd:Q41176]].
}
and using
SELECT ?building
WHERE {
?parent2 p:P31/ps:P31/wdt:P279 wd:Q41176.
?parent1 p:P31/ps:P31/wdt:P279 ?parent2.
?building p:P31/ps:P31/wdt:P279 ?parent1.
}
There are about 2.24 million buildings and about 18 million entities with either a Freebase ID or a Google Knowledge Graph ID on Wikidata. I've looked at this guide but couldn't quite figure out how to apply it to my query. I've also read the answer to this question but, unfortunately, using multiple queries isn't really an option for me.

If your intention is to use the "recursive" property path to find things whose type is building or any subclass of building, your first query using wdt:P279* is right. The later attempts repeat the full p:P31/ps:P31/wdt:P279 pattern at every level, so each level asks for an instance of the next one rather than a subclass of it, and that won't match any data.
By simplifying the first query a bit I was able to get this to run (returning 96,297 results in 39s):
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
wdt:P31/wdt:P279* wd:Q41176 .
?building rdfs:label ?buildingLabel .
FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))
}
Two notable changes:
p:P31/ps:P31 is replaced by wdt:P31, removing one join from the query.
The second FILTER is unnecessary, as ?building (a URI) and ?buildingLabel (a string) are necessarily going to be unequal.
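To make the difference between the two path shapes concrete, here is a small sketch in plain Python. The data is a made-up toy graph (none of these names are real Wikidata items), and the two functions only imitate what the property paths ask for; they are not how the query service evaluates them.

```python
from collections import deque

# Hypothetical toy data, not real Wikidata content.
# instance_of plays the role of wdt:P31, subclass_of the role of wdt:P279.
instance_of = {
    "church_1": {"church"},
    "tower_1": {"tower"},
    "person_1": {"human"},
}
subclass_of = {
    "church": {"religious_building"},
    "religious_building": {"building"},
    "tower": {"building"},
}


def superclasses(cls):
    """Return cls plus everything reachable through zero or more P279 hops."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def instances_of(target):
    """Imitate ?item wdt:P31/wdt:P279* target."""
    return {
        item
        for item, classes in instance_of.items()
        if any(target in superclasses(c) for c in classes)
    }


def repeated_instance_pattern(target):
    """Imitate ?item P31/P279 [ P31/P279 target ], with P31 repeated per level."""
    matches = set()
    for item, classes in instance_of.items():
        for cls in classes:
            for middle in subclass_of.get(cls, ()):
                # The repeated pattern needs the class itself to be an instance
                # of something, which ordinary class hierarchies do not provide.
                for middle_cls in instance_of.get(middle, ()):
                    if target in subclass_of.get(middle_cls, ()):
                        matches.add(item)
    return matches


print(sorted(instances_of("building")))
print(sorted(repeated_instance_pattern("building")))
```

In this toy graph the first function finds both buildings, while the second finds nothing, because no class is itself an instance of anything.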

Related

Concat values to use as subject in SPARQL Queries (MarkLogic)

I am attempting to do a SPARQL query in MarkLogic by concatenating my subject, predicate and object to use as a new "subject" node. I have attempted to do so with the query below:
SELECT *
WHERE {
?subject </in/relationship/with> ?object .
BIND(concat(?subject, "/in/relationship/with", ?object) AS ?relationship)
?relationship </current/status> ?status
}
However, this query does not work: ?relationship now contains a string for each row, so the output of the query is completely empty. Is there a way to do this, i.e. to convert a string into an object that SPARQL can query with?
Stanislav is correct, you need to wrap the string in IRI(). Here is a code snippet that runs directly in Query Console (QC). Run it against an empty database so it doesn't pollute your other data:
xdmp:document-insert('/triples.xml', <triples>{
sem:triple(sem:iri("http://my/subject1"), sem:iri("/in/relationship/with"), sem:iri("http://my/subject2")),
sem:triple(sem:iri("http://my/subject1/in/relationship/with/http://my/subject2"), sem:iri("/current/status"), "My status")
}</triples>)
;
sem:sparql('
SELECT *
WHERE {
?subject </in/relationship/with> ?object.
BIND(IRI(CONCAT(?subject, "/in/relationship/with/", ?object)) AS ?relationship)
?relationship </current/status> ?status.
}
')
Whether this is a sensible approach might depend. Keep in mind that MarkLogic is particularly strong at keeping associated data together in documents. You can embed triples in documents, or use TDE to project triples out of them, which lets you combine the strengths of document search and of keeping related data together while still reasoning over facts with SPARQL.
HTH!
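For anyone wondering why the plain CONCAT result matches nothing, here is a rough sketch in plain Python. The IRI and Literal classes and the objects() helper are made up for illustration and have nothing to do with MarkLogic's internals; the only point is that a string and an IRI with the same characters are different terms.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the two kinds of RDF term involved here.
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


# The same two triples as in the snippet above.
triples = {
    (IRI("http://my/subject1"), IRI("/in/relationship/with"), IRI("http://my/subject2")),
    (
        IRI("http://my/subject1/in/relationship/with/http://my/subject2"),
        IRI("/current/status"),
        Literal("My status"),
    ),
}


def objects(subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


joined = "http://my/subject1" + "/in/relationship/with/" + "http://my/subject2"

# Looking up the concatenated text as a string term finds nothing ...
as_literal = objects(Literal(joined), IRI("/current/status"))
# ... while the same text wrapped as an IRI matches the stored subject.
as_iri = objects(IRI(joined), IRI("/current/status"))

print(as_literal)
print(as_iri)
```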

SPARQL Multiple Dataset with Blank nodes

I've been learning about SPARQL lately and am confused about blank nodes. Can blank nodes be used to link data from multiple datasets, or are they only used within one dataset? And what is the specific purpose of blank nodes?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0-1/>
SELECT DISTINCT ?class
WHERE {[] a ?class}
Does the query above already use a different dataset, or multiple datasets?
Blank nodes are basically pronouns.
They're used when you know an entity exists, and you can say some things about it, but you don't know its absolute identifier, its URI, its name, so you use a pronoun to reference it.
In your example query, you're not really using a blank node, as the [] is just taking the place of the common ?s or any other variable. A better example of a blank node would be here --
:Fred :hasThing [ :hasColor :Blue ]
We don't know anything else about the "Thing", so we refer to it obliquely.
Added --
Also note that in your query, the PREFIX declarations are pointless, as the declared prefixes appear nowhere in your query. They do not cause inclusion of listed datasets (because they're not lists of datasets, in this context; they're just syntactic sugar to make other URIs in the query easier to write as prefixed-URIs, like foaf:Person, rather than fully-qualified URIs, like <http://xmlns.com/foaf/0.1/Person>), nor exclusion of others.
(Tangentially -- your foaf: prefix is incorrect, as it has a hyphen, "-", where it should have a dot, ".".)
This query is identical to yours --
SELECT DISTINCT ?class
WHERE { ?s a ?class }
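The "syntactic sugar" point about prefixes can be sketched in a few lines of Python. The expand() helper below is hypothetical, written only for this illustration; it simply glues the declared namespace onto the local name, which is all a PREFIX declaration amounts to.

```python
def expand(prefixes, name):
    """Expand a prefixed name such as foaf:Person into a full IRI string."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


# The prefix as declared in the question (with the hyphen) ...
declared = {"foaf": "http://xmlns.com/foaf/0-1/"}
# ... and the FOAF namespace as it should be written (with the dot).
corrected = {"foaf": "http://xmlns.com/foaf/0.1/"}

print(expand(declared, "foaf:Person"))
print(expand(corrected, "foaf:Person"))
```

The two results differ by one character, so a query using the hyphenated prefix would be asking about a different IRI from the one the data uses.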

Selecting direct children with Postgres ltree using Elixir

I'm trying to select any children that are a single level below with ltree.
For example, if I had Car.Ford, the query would grab any child with a path such as Car.Ford.Fiesta, Car.Ford.Fusion, Car.Ford.Mustang.
How can I build this query using ltree, if possible, specifically using Elixir?
Right now I'm using
from c in query, where: fragment("path <@ ?", c.path)
But it returns all entries with the path in it.
Figured this out.
The Postgres documentation states that the {} in an lquery limits the number of labels it will match; the documentation from the developers clarifies that this actually limits the number of levels to search.
'My.Example.*{1}'
That will match anything exactly one level below a path starting with My.Example.
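As a rough illustration of what that pattern selects, here is a sketch in plain Python. Both helper names are made up for this example: the first only builds the lquery text (which would then be matched against the path column with ltree's ~ operator), and the second imitates the "exactly one level below" rule on dotted strings so it can be checked without a database.

```python
def direct_children_lquery(parent_path):
    """Build an lquery pattern matching paths exactly one label below parent_path."""
    return parent_path + ".*{1}"


def is_direct_child(path, parent_path):
    """Imitate the pattern on plain strings: same prefix, exactly one extra label."""
    parent_labels = parent_path.split(".")
    labels = path.split(".")
    return (
        len(labels) == len(parent_labels) + 1
        and labels[: len(parent_labels)] == parent_labels
    )


paths = ["Car.Ford", "Car.Ford.Fiesta", "Car.Ford.Mustang.GT", "Car.Fiat.Panda"]
direct = [p for p in paths if is_direct_child(p, "Car.Ford")]

print(direct_children_lquery("Car.Ford"))
print(direct)
```

The parent itself, deeper descendants, and paths under other parents are all left out; only paths with one extra label remain.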

Rails 3, ActiveRecord, PostgreSQL - ".uniq" command doesn't work?

I have following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I change the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
the error is gone. In MySQL .uniq works, but in PostgreSQL it does not. Is there an alternative?
As the error states, for SELECT DISTINCT, ORDER BY expressions must appear in the select list.
Therefore, you must explicitly select the expression you are ordering by.
Here is an example; it is similar to your case but generalized a bit.
Article.select('articles.*, RANDOM()')
.joins(:users)
.where(:column => 'whatever')
.order('Random()')
.uniq
.limit(15)
So, explicitly include your ORDER BY clause (in this case RANDOM()) using .select(). As shown above, in order for your query to return the Article attributes, you must explicitly select them also.
I hope this helps; good luck
Just to enrich the thread with more examples, in case you have nested relations in the query, you can try with the following statement.
Person.find(params[:id]).cars.select('cars.*, lower(cars.name)').order("lower(cars.name) ASC")
In the given example, you're asking for all the cars of a given person, ordered by name (Audi, Ferrari, Porsche).
I don't think this is a better way, but it may help to approach this kind of situation in terms of objects and collections instead of in a relational (database) way.
Thanks!
I assume that the .uniq method is translated to a DISTINCT clause in the SQL. PostgreSQL is picky (pickier than MySQL) -- when using DISTINCT, every expression in the ORDER BY clause must also appear in the select list, which Random() does not here (it is similarly strict about GROUP BY).
It's a little unclear what you are attempting to do (a random ordering?). In addition to posting the full SQL sent, if you could explain your objective, that might be helpful in finding an alternative.
I just upgraded my 100% working and tested application from 3.1.1 to 3.2.7 and now have this same PG::Error.
I am using Cancan...
@users = User.accessible_by(current_ability).order('lname asc').uniq
Removing the .uniq solves the problem and it was not necessary anyway for this simple query.
Still looking through the change notes between 3.1.1 and 3.2.7 to see what caused this to break.

Datastore Query filtering on list

I want to select all records whose ID is not in a given list.
How can I do something like this:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
There's no way to do this efficiently in App Engine. You should simply select everything without that filter, and filter out any matching entities in your code.
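A minimal sketch of the "filter in your code" approach, in plain Python. The Story class here is a plain stand-in rather than a Datastore model, and the list of stories is assumed to have been fetched already by an unfiltered query.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a fetched entity; not a real Datastore model.
@dataclass
class Story:
    id: int
    title: str


def exclude_ids(stories, excluded_ids):
    """Drop every story whose id appears in excluded_ids."""
    excluded = set(excluded_ids)  # set membership keeps each check cheap
    return [story for story in stories if story.id not in excluded]


fetched = [Story(100, "first"), Story(150, "second"), Story(200, "third")]
remaining = exclude_ids(fetched, [100, 200])

print([story.id for story in remaining])
```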
This is now supported via GQL query
The 'IN' and '!=' operators in the Python runtime are actually
implemented in the SDK and translate to multiple queries 'under the
hood'.
For example, the query "SELECT * FROM People WHERE name IN ('Bob',
'Jane')" gets translated into two queries, equivalent to running
"SELECT * FROM People WHERE name = 'Bob'" and "SELECT * FROM People
WHERE name = 'Jane'" and merging the results. Combining multiple
disjunctions multiplies the number of queries needed, so the query
"SELECT * FROM People WHERE name IN ('Bob', 'Jane') AND age != 25"
generates a total of four queries, for each of the possible conditions
(age less than or greater than 25, and name is 'Bob' or 'Jane'), then
merges them together into a single result set.
source: appengine blog
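The fan-out described in that quote can be sketched in plain Python. The function below is hypothetical and only mimics the description (one sub-query per IN value, two per != condition); it is not the SDK's actual code.

```python
from itertools import product


def expand_subqueries(in_values, not_equal_value):
    """List the simple queries that an IN plus a != filter would fan out into."""
    # IN becomes one equality filter per listed value.
    name_filters = ["name = %r" % value for value in in_values]
    # != becomes a "less than" branch and a "greater than" branch.
    age_filters = ["age < %r" % not_equal_value, "age > %r" % not_equal_value]
    return [
        "SELECT * FROM People WHERE %s AND %s" % (name_filter, age_filter)
        for name_filter, age_filter in product(name_filters, age_filters)
    ]


subqueries = expand_subqueries(["Bob", "Jane"], 25)
for query in subqueries:
    print(query)
```

With two names and one != condition this gives the four queries mentioned in the quote; adding a third name would give six.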
This is an old question, so I'm not sure if the ID is a non-key property. But in order to answer this:
query = Story.all()
query.filter('ID **NOT IN** =', [100,200,..,..])
...With ndb models, you can definitely query for items that are in a list. For example, see the docs here for IN and !=. Here's the IN form of that filter:
query = Story.query().filter(Story.id.IN([100,200,..,..]))
We can even query for items that are in a list of repeated keys:
def all(user_id):
    # See if my user_id is associated with any Group.
    groups_belonged_to = Group.query().filter(Group.members == user_id)
    print([group.to_dict() for group in groups_belonged_to])
Some caveats:
There are docs out there mentioning that, to perform these types of queries, Datastore runs multiple queries behind the scenes, which (1) might take a while to execute, (2) takes longer if you are searching in repeated properties, and (3) will increase your costs through more operations.

Resources