Recursive queries in SPARQL to browse collections of collections - database

I am trying to create an RDF graph from a Mulgara RDF store, using a SPARQL query to return results. I'm just beginning to get comfortable with simple queries, effectively asking, "which objects are members of a particular collection?"
My question, and I would greatly appreciate any advice, is whether I can take the results from this simple query and feed them back in as the object of the query.
For example, I have this sparql query:
SELECT ?x WHERE {?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey>}
With these results:
"x"
info:fedora/ramsey:ThelifeandadventuresofRobinsonCrusoe
info:fedora/ramsey:Jackanapes
info:fedora/ramsey:SundayJournalvol01no0219951126
info:fedora/ramsey:Ideologyandchange
info:fedora/ramsey:theshepherdofthepyrenees
info:fedora/ramsey:ScenesinAmerica
...
My goal is to then take these unique identifiers, substitute each one for the object, <info:fedora/collection:ramsey>, in the original query, and run the query again.
I'm imagining a scenario where I would identify a root element in the initial query, have the results return all member objects, then return all those objects' member objects, ad infinitum...
Is this possible with SPARQL queries? Specifically, I believe I'm querying a Mulgara RDF database. Any thoughts, even if it's not doable, are greatly appreciated.

Let's assume you have to stick to SPARQL 1.0. I believe that Mulgara has limited support for SPARQL 1.1, if any.
With SPARQL 1.0, if you know in advance how many levels you want to query, you can do things like:
SELECT ?y WHERE {
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?y <fedora-rels-ext:isMemberOfCollection> ?x
}
Here ?y will be bound to the second-level elements below your root. With UNION you can query multiple levels in one query. An example covering one and two levels from the root:
SELECT ?x WHERE {
{
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
} UNION {
?zz <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?x <fedora-rels-ext:isMemberOfCollection> ?zz .
}
}
The problem with this is that you do not know at which level ?x was bound, so you cannot draw a tree from this type of query. In SPARQL 1.1 this is solved with BIND(... AS ...):
SELECT ?x ?level WHERE {
{
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
BIND (1 AS ?level)
} UNION {
?zz <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?x <fedora-rels-ext:isMemberOfCollection> ?zz .
BIND (2 AS ?level)
}
}
This second query also returns the level at which ?x is bound. You can imagine a programmatically generated query with many UNION branches that tries to reach the maximum depth of the tree. If you want full support for SPARQL 1.1, you could try Jena/ARQ. In Jena you can also use property paths, with something like the following:
SELECT ?x WHERE {
?x <fedora-rels-ext:isMemberOfCollection>+ <info:fedora/collection:ramsey> .
}
This binds ?x to every node from which <info:fedora/collection:ramsey> can be reached by following <fedora-rels-ext:isMemberOfCollection> one or more times, that is, all direct and indirect members of the collection.
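The programmatically generated UNION query mentioned above could be built roughly as follows. This is only a sketch in Python: the function name build_union_query and the intermediate variable names ?n1, ?n2, ... are made up for illustration, and the output uses BIND, so it assumes a SPARQL 1.1 endpoint.

```python
def build_union_query(root, predicate, max_depth):
    """Build a SPARQL 1.1 query with one UNION branch per level 1..max_depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    branches = []
    for level in range(1, max_depth + 1):
        # nodes[0] is the root, nodes[-1] is ?x, and anything in between
        # is a hypothetical intermediate variable (?n1, ?n2, ...).
        nodes = [f"<{root}>"] + [f"?n{i}" for i in range(1, level)] + ["?x"]
        patterns = [
            f"    {child} <{predicate}> {parent} ."
            for parent, child in zip(nodes, nodes[1:])
        ]
        # Record which branch (depth) produced the binding for ?x.
        patterns.append(f"    BIND ({level} AS ?level)")
        branches.append("  {\n" + "\n".join(patterns) + "\n  }")
    body = "\n  UNION\n".join(branches)
    return "SELECT ?x ?level WHERE {\n" + body + "\n}"


if __name__ == "__main__":
    print(build_union_query(
        "info:fedora/collection:ramsey",
        "fedora-rels-ext:isMemberOfCollection",
        3,
    ))
```

The depth still has to be chosen up front, so this only helps when a reasonable upper bound on the nesting is known; where property paths are available, the + form is simpler.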

Related

How can I optimize my recursive SPARQL query?

I'm trying to extract buildings from Wikidata using a recursive SPARQL query but I keep getting query timeouts. Is there a way to circumvent this?
This is my current query, selecting all buildings with either a Freebase ID or a Google Knowledge Graph ID, and a Dutch label:
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
p:P31/ps:P31/wdt:P279* wd:Q41176;
rdfs:label ?buildingLabel .
FILTER(LANG(?buildingLabel) = 'nl') .
FILTER (?building != ?buildingLabel) .
}
I've tried manually looking a few layers deep instead but, for some reason, I get no results for three or more layers deep even though those definitely exist. I've tried this using:
SELECT ?building
WHERE {
?building p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 wd:Q41176]].
}
and using
SELECT ?building
WHERE {
?parent2 p:P31/ps:P31/wdt:P279 wd:Q41176.
?parent1 p:P31/ps:P31/wdt:P279 ?parent2.
?building p:P31/ps:P31/wdt:P279 ?parent1.
}
There are about 2.24 million buildings and about 18 million entities with either a Freebase ID or a Google Knowledge Graph ID on Wikidata. I've looked at this guide but couldn't quite figure out how to apply it to my query. I've also read the answer to this question but, unfortunately, using multiple queries isn't really an option for me.
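If your intention is to use the "recursive" property path to find things of type building and also of types that are subclasses of building, your first query using wdt:P279* is right. The later attempts that repeat the full p:P31/ps:P31/wdt:P279 pattern won't match any data: the class hierarchy is linked by wdt:P279 (subclass of) alone, and a class is generally not also an instance (P31) of something in that chain, so requiring an instance-of step at every hop finds nothing.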
By simplifying the first query a bit I was able to get this to run (returning 96,297 results in 39s):
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
wdt:P31/wdt:P279* wd:Q41176 .
?building rdfs:label ?buildingLabel .
FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))
}
Two notable changes:
p:P31/ps:P31 is replaced by wdt:P31, removing one join from the query.
The second FILTER is unnecessary, as ?building (a URI) and ?buildingLabel (a string) are necessarily going to be unequal.
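To see why "instance of, then zero or more subclass-of steps" finds results while repeating the whole instance-of/subclass-of pair does not, here is a small Python sketch over toy data. All identifiers are invented for illustration (they are not real Wikidata IDs), and the two dictionaries only stand in for P31 and P279.

```python
# Toy data with made-up identifiers (not real Wikidata items).
INSTANCE_OF = {  # item -> class, plays the role of P31
    "townhall_1": "town_hall",
    "chapel_1": "chapel",
    "tower_1": "building",
    "ministry_1": "government_building",
}
SUBCLASS_OF = {  # class -> parent class, plays the role of P279
    "town_hall": "government_building",
    "government_building": "building",
    "chapel": "church",
    "church": "religious_building",
    "religious_building": "building",
}


def superclasses_or_self(cls):
    """Follow subclass-of zero or more times, like the P279* path."""
    seen = []
    while cls is not None and cls not in seen:
        seen.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return seen


def instances_of(target):
    """Items matching the pattern instance-of / subclass-of*."""
    return sorted(
        item for item, cls in INSTANCE_OF.items()
        if target in superclasses_or_self(cls)
    )


def chained(target, hops):
    """Items matching (instance-of / subclass-of) repeated `hops` times."""
    frontier = {target}
    for _ in range(hops):
        # Each hop needs the current node to be reachable through an
        # instance-of step, which classes in this toy data never are.
        frontier = {
            item for item, cls in INSTANCE_OF.items()
            if SUBCLASS_OF.get(cls) in frontier
        }
    return sorted(frontier)


if __name__ == "__main__":
    print(instances_of("building"))
    print(chained("building", 1))
    print(chained("building", 3))
```

In this toy graph the path-style lookup returns all four items, one chained hop returns only ministry_1, and three chained hops return nothing, which mirrors the empty results described in the question.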

ONTOP plugin protege - reasoning and query answering

I am new to using ONTOP in Protégé and I have a question about its use.
I have a small ontology that contains a class called OnConnectedState and an individual of that type, "on_connected_state_1", which carries the following axiom: "satisfies some OnConnectedPrecondition".
When I use the HermiT reasoner and create a DL query, I can query "satisfies some OnConnectedPrecondition" and the query will return my individual "on_connected_state_1".
Using ONTOP, I want to be able to query this information. So I ran the ONTOP reasoner, which reported no errors, and wrote the following SPARQL query:
PREFIX proc: <http://www.semanticweb.org/procedure-ontology#>
SELECT ?s ?y
WHERE {
?s proc:satisfies ?y
}
However, the result comes up blank and there is no SQL translation.
What am I missing here? I am unclear why my "on_connected_state_1" individual does not appear.

Concat values to use as subject in SPARQL Queries (MarkLogic)

I am attempting to run a SPARQL query in MarkLogic that concatenates my subject, predicate and object and uses the result as a new "subject" node. I have attempted to do so with the query below:
SELECT *
WHERE {
?subject </in/relationship/with> ?object .
BIND(concat(?subject, "/in/relationship/with", ?object) AS ?relationship)
?relationship </current/status> ?status
}
However, this query does not work, because ?relationship now contains a string for each row, so the query returns nothing at all. Therefore, I am wondering whether this can be done, and whether it is possible to convert a string into an object that SPARQL can query with.
Stanislav is correct, you need to wrap the string in IRI(). Here is a code snippet that runs directly in Query Console (QC). Run it against an empty database so you don't pollute your other data:
xdmp:document-insert('/triples.xml', <triples>{
sem:triple(sem:iri("http://my/subject1"), sem:iri("/in/relationship/with"), sem:iri("http://my/subject2")),
sem:triple(sem:iri("http://my/subject1/in/relationship/with/http://my/subject2"), sem:iri("/current/status"), "My status")
}</triples>)
;
sem:sparql('
SELECT *
WHERE {
?subject </in/relationship/with> ?object.
BIND(IRI(CONCAT(?subject, "/in/relationship/with/", ?object)) AS ?relationship)
?relationship </current/status> ?status.
}
')
Whether this is a sensible approach depends on your use case. Keep in mind that MarkLogic is particularly strong at keeping associated data together in documents. You can embed triples in documents, or use TDE to project triples out of them, which lets you combine the strengths of document search and co-located data while still reasoning over facts with SPARQL.
HTH!
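Why the IRI() wrapper matters can be illustrated with a toy model in Python. This is not MarkLogic's API: the Term type and the iri, lit and statuses helpers are invented here purely to show that a string literal and an IRI with the same characters are different terms, so only the IRI matches the stored subject.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str   # "iri" or "literal"
    value: str


def iri(value):
    return Term("iri", value)


def lit(value):
    return Term("literal", value)


# Same two triples as in the snippet above, modelled as tuples of terms.
TRIPLES = [
    (iri("http://my/subject1"), iri("/in/relationship/with"), iri("http://my/subject2")),
    (iri("http://my/subject1/in/relationship/with/http://my/subject2"),
     iri("/current/status"), lit("My status")),
]


def statuses(relationship):
    """Return the status values whose subject equals the given term."""
    return [
        o.value for s, p, o in TRIPLES
        if s == relationship and p == iri("/current/status")
    ]


subject, _, obj = TRIPLES[0]
joined = subject.value + "/in/relationship/with/" + obj.value

print(statuses(lit(joined)))  # plain string: matches nothing
print(statuses(iri(joined)))  # wrapped as an IRI: finds the status
```

Note that the concatenated text must match the stored subject exactly, including the trailing slash after "with".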

SPARQL Multiple Dataset with Blank nodes

I've been learning about SPARQL lately and am confused about blank nodes. Can blank nodes be used to link data from multiple datasets, or are they only used within one dataset? And what is the specific purpose of blank nodes?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0-1/>
SELECT DISTINCT ?class
WHERE {[] a ?class}
Does the query above already use different or multiple datasets?
Blank nodes are basically pronouns.
They're used when you know an entity exists, and you can say some things about it, but you don't know its absolute identifier, its URI, its name, so you use a pronoun to reference it.
In your example query, the [] is syntactically a blank node, but inside a query pattern it behaves like an unnamed variable, so it is just taking the place of the common ?s or any other variable. A better example of a blank node would be here --
:Fred :hasThing [ :hasColor :Blue ]
We don't know anything else about the "Thing", so we refer to it obliquely.
Added --
Also note that in your query, the PREFIX declarations are pointless, as the declared prefixes appear nowhere in your query. They do not cause inclusion of listed datasets (because they're not lists of datasets, in this context; they're just syntactic sugar to make other URIs in the query easier to write as prefixed-URIs, like foaf:Person, rather than fully-qualified URIs, like <http://xmlns.com/foaf/0.1/Person>), nor exclusion of others.
(Tangentially -- your foaf: prefix is incorrect, as it has a hyphen, "-", where it should have a dot, ".".)
This query is equivalent to yours --
SELECT DISTINCT ?class
WHERE { ?s a ?class }

Most common datatypes in DBpedia

I need to know which datatypes are the most common in DBpedia. So I am sending a query to Virtuoso like this:
SELECT datatype(?d) (COUNT(?d) as ?dCount)
WHERE
{
?s ?p ?d
}
GROUP BY ?d
ORDER BY DESC(?dCount)
I am not sure if the query is correct and, above all, the transaction timed out. How can I get my answer, or reduce my search space to "something relevant"? Or, for example, still get a partial result when the query times out?
The query is not correct.
You must group by the datatype, not the literal value:
SELECT (datatype(?d) as ?dt) (COUNT(?d) as ?dCount)
WHERE
{
?s ?p ?d
FILTER(isLiteral(?d))
}
GROUP BY datatype(?d)
ORDER BY DESC(?dCount)
The query might still time out.
You could restrict it to data properties of DBpedia, i.e.
SELECT (datatype(?d) as ?dt) (COUNT(*) as ?dCount)
WHERE
{
?p a owl:DatatypeProperty .
?s ?p ?d
}
GROUP BY datatype(?d)
ORDER BY DESC(?dCount)
but you would miss the triples whose properties are in the http://dbpedia.org/property/ namespace.
Alternatives:
load the data into a local more powerful server
simply use the DBpedia ontology although this probably doesn't contain all datatypes used in the instance data
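The difference between grouping by the literal value and grouping by its datatype can be seen on a handful of made-up literals. The sample below is a Python sketch with invented data, not output from DBpedia.

```python
from collections import Counter

XSD = "http://www.w3.org/2001/XMLSchema#"

# (lexical value, datatype IRI) pairs; an invented sample.
LITERALS = [
    ("42", XSD + "integer"),
    ("7", XSD + "integer"),
    ("42", XSD + "integer"),
    ("1999-01-01", XSD + "date"),
    ("Berlin", XSD + "string"),
]

# Like GROUP BY ?d: one group per distinct literal, so the counts
# describe repeated values rather than datatypes.
by_value = Counter(LITERALS)

# Like GROUP BY datatype(?d): one group per datatype.
by_datatype = Counter(datatype for _, datatype in LITERALS)

for datatype, count in by_datatype.most_common():
    print(datatype, count)
```

Grouping by value yields four groups here, while grouping by datatype yields three, with the integer datatype on top with a count of 3.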
