Most common datatypes in DBpedia - database

I need to know which datatypes are the most common in DBpedia, so I am running a query like this against Virtuoso:
SELECT datatype(?d) (COUNT(?d) as ?dCount)
WHERE
{
?s ?p ?d
}
GROUP BY ?d
ORDER BY DESC(?dCount)
I am not sure whether the query is correct and, above all, the transaction timed out. How can I get my answer, or reduce my search space to "something relevant"? Or, for example, still get a result when the query times out?

The query is not correct.
You must group by the datatype, not the literal value:
SELECT (datatype(?d) as ?dt) (COUNT(?d) as ?dCount)
WHERE
{
?s ?p ?d
FILTER(isLiteral(?d))
}
GROUP BY datatype(?d)
ORDER BY DESC(?dCount)
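The difference between grouping by the literal and grouping by its datatype is easy to see on a handful of triples. The sketch below is a toy in-memory model, not Virtuoso. It assumes literals are represented as (lexical form, datatype IRI) tuples and IRIs as plain strings; the `ex:` data is made up.

```python
from collections import Counter

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical toy data. Literal objects are modelled as
# (lexical_form, datatype_iri) tuples; IRIs are plain strings.
TRIPLES = [
    ("ex:a", "ex:height", ("1.80", XSD + "double")),
    ("ex:b", "ex:height", ("1.80", XSD + "double")),
    ("ex:c", "ex:height", ("1.75", XSD + "double")),
    ("ex:c", "ex:born", ("1980-01-01", XSD + "date")),
    ("ex:d", "ex:population", ("500", XSD + "integer")),
    ("ex:a", "ex:knows", "ex:b"),  # IRI object: dropped by FILTER(isLiteral(?d))
]


def is_literal(term):
    """Stand-in for SPARQL isLiteral(): literals are tuples in this model."""
    return isinstance(term, tuple)


def datatype(term):
    """Stand-in for SPARQL datatype() applied to a literal."""
    return term[1]


def count_by_value(triples):
    """Like GROUP BY ?d (literals only here): one group per distinct literal."""
    return Counter(o for _, _, o in triples if is_literal(o))


def count_by_datatype(triples):
    """Like FILTER(isLiteral(?d)) ... GROUP BY datatype(?d)."""
    return Counter(datatype(o) for _, _, o in triples if is_literal(o))


if __name__ == "__main__":
    # most_common() plays the role of ORDER BY DESC(?dCount).
    for dt, n in count_by_datatype(TRIPLES).most_common():
        print(dt, n)
```

Grouping by value gives four groups here: "1.80" and "1.75" land in different groups even though both are xsd:double. Grouping by datatype gives the three rows you actually want.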
The query might still time out.
You could restrict it to the datatype properties of the DBpedia ontology, like this:
SELECT (datatype(?d) as ?dt) (COUNT(*) as ?dCount)
WHERE
{
?p a owl:DatatypeProperty .
?s ?p ?d
}
GROUP BY datatype(?d)
ORDER BY DESC(?dCount)
but you would miss the triples whose properties are in the http://dbpedia.org/property/ namespace.
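The join on ?p a owl:DatatypeProperty is what excludes those triples. A toy model, assuming (as in this made-up data) that the dbp: property carries no owl:DatatypeProperty declaration:

```python
# Hypothetical toy data: dbo:height is declared an owl:DatatypeProperty,
# while dbp:weight (an infobox property) has no such declaration here.
TRIPLES = [
    ("dbo:height", "rdf:type", "owl:DatatypeProperty"),
    ("dbr:A", "dbo:height", ("1.8", "xsd:double")),
    ("dbr:A", "dbp:weight", ("70", "xsd:integer")),
]


def datatype_counts_restricted(triples):
    """Like { ?p a owl:DatatypeProperty . ?s ?p ?d } GROUP BY datatype(?d)."""
    # First triple pattern: collect every ?p declared as a datatype property.
    data_props = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:DatatypeProperty"
    }
    counts = {}
    # Second triple pattern, joined on ?p; literals are (lexical, datatype) tuples.
    for _, p, o in triples:
        if p in data_props and isinstance(o, tuple):
            counts[o[1]] = counts.get(o[1], 0) + 1
    return counts
```

Here the xsd:integer value attached through dbp:weight never shows up in the counts.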
Alternatives:
load the data into a more powerful local server
simply use the DBpedia ontology, although it probably doesn't contain all the datatypes used in the instance data

Related

How can I optimize my recursive SPARQL query?

I'm trying to extract buildings from Wikidata using a recursive SPARQL query but I keep getting query timeouts. Is there a way to circumvent this?
This is my current query, selecting all buildings with either a Freebase ID or a Google Knowledge Graph ID, and a Dutch label:
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
p:P31/ps:P31/wdt:P279* wd:Q41176;
rdfs:label ?buildingLabel .
FILTER(LANG(?buildingLabel) = 'nl') .
FILTER (?building != ?buildingLabel) .
}
I've tried manually looking a few layers deep instead but, for some reason, I get no results for three or more layers deep even though those definitely exist. I've tried this using:
SELECT ?building
WHERE {
?building p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 wd:Q41176]].
}
and using
SELECT ?building
WHERE {
?parent2 p:P31/ps:P31/wdt:P279 wd:Q41176.
?parent1 p:P31/ps:P31/wdt:P279 ?parent2.
?building p:P31/ps:P31/wdt:P279 ?parent1.
}
There are about 2.24 million buildings and about 18 million entities with either a Freebase ID or a Google Knowledge Graph ID on Wikidata. I've looked at this guide but couldn't quite figure out how to apply it to my query. I've also read the answer to this question but, unfortunately, using multiple queries isn't really an option for me.
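If your intention is to use the "recursive" property path to find things of type building and also things whose type is a subclass of building, your first query using wdt:P279* is right. The later attempts, which repeat the full p:P31/ps:P31/wdt:P279 pattern, won't match any data. Each repetition requires the intermediate node to have its own P31 (instance of) statement, but the intermediate nodes are classes. Classes are normally linked to their superclasses through P279 (subclass of), not P31.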
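A toy model makes the difference between the two path shapes concrete. The graph below is made up: only wd:Q41176 is a real Wikidata ID, and wd:B1, wd:Church and wd:ReligiousBuilding are illustrative names. It compares P31/P279* (one instance-of hop, then zero or more subclass-of hops) with the nested pattern that repeats P31/P279.

```python
# Hypothetical toy graph of truthy wdt:P31 (instance of) and wdt:P279
# (subclass of) edges.
INSTANCE_OF = {
    "wd:B1": {"wd:Church"},
}
SUBCLASS_OF = {
    "wd:Church": {"wd:ReligiousBuilding"},
    "wd:ReligiousBuilding": {"wd:Q41176"},
}
BUILDING = "wd:Q41176"


def subclass_closure(cls):
    """All classes reachable from cls via wdt:P279* (zero or more hops)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches_star_path(item, target):
    """item wdt:P31/wdt:P279* target"""
    return any(target in subclass_closure(c) for c in INSTANCE_OF.get(item, ()))


def one_step(item):
    """Objects of item P31/P279: exactly one instance-of hop, then one subclass-of hop."""
    return {p for c in INSTANCE_OF.get(item, ()) for p in SUBCLASS_OF.get(c, ())}


def matches_repeated_path(item, target, depth):
    """The nested pattern: (P31/P279) repeated `depth` times must end at target."""
    frontier = {item}
    for _ in range(depth):
        frontier = {nxt for node in frontier for nxt in one_step(node)}
    return target in frontier
```

The star path reaches wd:Q41176 from wd:B1. The repeated pattern dies after the first step, because wd:ReligiousBuilding has no P31 of its own to continue from.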
By simplifying the first query a bit I was able to get this to run (returning 96,297 results in 39s):
SELECT DISTINCT ?building ?buildingLabel
WHERE {
?building p:P2671|p:P646 ?id;
wdt:P31/wdt:P279* wd:Q41176 .
?building rdfs:label ?buildingLabel .
FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))
}
Two notable changes:
p:P31/ps:P31 is replaced by wdt:P31, removing one join from the query (wdt:P31 only follows best-rank, "truthy" statements, which is usually what you want here).
The second FILTER is unnecessary, as ?building (a URI) and ?buildingLabel (a string) are necessarily unequal.

Concat values to use as subject in SPARQL Queries (MarkLogic)

I am attempting to write a SPARQL query in MarkLogic that concatenates my subject, predicate and object into a new "subject" node. I have attempted to do so with the query below:
SELECT *
WHERE {
?subject </in/relationship/with> ?object .
BIND(concat(?subject, "/in/relationship/with", ?object) AS ?relationship)
?relationship </current/status> ?status
}
However, this query does not work: ?relationship now contains a string in each row, so the next triple pattern matches nothing and the output of the query is completely empty. I am therefore wondering whether this can be done at all, and whether it is possible to convert a string into an object (an IRI) that SPARQL can query with.
Stanislav is correct: you need to wrap the string in IRI(). Here is a code snippet that runs directly in Query Console (QC). Run it against an empty database to avoid polluting your other data:
xdmp:document-insert('/triples.xml', <triples>{
sem:triple(sem:iri("http://my/subject1"), sem:iri("/in/relationship/with"), sem:iri("http://my/subject2")),
sem:triple(sem:iri("http://my/subject1/in/relationship/with/http://my/subject2"), sem:iri("/current/status"), "My status")
}</triples>)
;
sem:sparql('
SELECT *
WHERE {
?subject </in/relationship/with> ?object.
BIND(IRI(CONCAT(STR(?subject), "/in/relationship/with/", STR(?object))) AS ?relationship)
?relationship </current/status> ?status.
}
')
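Why the unwrapped version returns nothing can be modelled in a few lines. This is a sketch in plain Python, not MarkLogic. It assumes RDF terms are modelled as two distinct classes, so a string literal never compares equal to an IRI with the same text; the store holds the same two triples the XQuery above inserts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


WITH = IRI("/in/relationship/with")
STATUS = IRI("/current/status")

# Toy store holding the same two triples the XQuery above inserts.
STORE = {
    (IRI("http://my/subject1"), WITH, IRI("http://my/subject2")),
    (IRI("http://my/subject1/in/relationship/with/http://my/subject2"),
     STATUS, Literal("My status")),
}


def concat(*parts):
    """Like SPARQL CONCAT over string forms: the result is always a literal."""
    return Literal("".join(parts))


def run_query(wrap_in_iri):
    """Evaluate the three-step query, optionally applying IRI() in the BIND."""
    results = []
    for s, p, o in STORE:
        if p != WITH:  # ?subject </in/relationship/with> ?object
            continue
        rel = concat(s.value, "/in/relationship/with/", o.value)
        if wrap_in_iri:
            rel = IRI(rel.value)  # what IRI(...) does to the string
        # ?relationship </current/status> ?status
        results.extend(st for rs, rp, st in STORE if rs == rel and rp == STATUS)
    return results
```

run_query(False) finds nothing, because the literal never equals the IRI subject stored in the second triple. run_query(True) returns the "My status" literal.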
Whether this is a sensible approach depends on your use case. Keep in mind that MarkLogic is particularly strong at keeping associated data together in documents. You can embed triples in documents, or use TDE to project triples out of them. That lets you combine the strengths of document search and keep related data together, while still being able to reason over facts with SPARQL.
HTH!

SPARQL Multiple Dataset with Blank nodes

I've been learning about SPARQL lately and am confused about blank nodes. Can blank nodes be used to link data from multiple datasets, or are they only used within one dataset? And what is the specific use of blank nodes?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0-1/>
SELECT DISTINCT ?class
WHERE {[] a ?class}
Does the query above already use a different dataset, or multiple datasets?
Blank nodes are basically pronouns.
They're used when you know an entity exists, and you can say some things about it, but you don't know its absolute identifier, its URI, its name, so you use a pronoun to reference it.
In your example query, you're not really using a blank node, as the [] is just taking the place of the common ?s or any other variable. A better example of a blank node would be here --
:Fred :hasThing [ :hasColor :Blue ]
We don't know anything else about the "Thing", so we refer to it obliquely.
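In Turtle, the [ ... ] syntax is shorthand for a fresh blank node that gets the listed properties. The sketch below expands that shorthand into plain triples. It assumes a Python dict stands in for the bracketed property list, and the _:bN labels are generated for illustration. Such a label only means something inside the one document or result that contains it, which is why blank nodes are a poor fit for linking across datasets.

```python
import itertools

_labels = itertools.count()


def fresh_bnode():
    """Mint a new blank node label such as _:b0 (local to this output only)."""
    return f"_:b{next(_labels)}"


def expand(subject, predicate, obj, out=None):
    """Turn `subject predicate obj` into plain triples. A dict stands in for a
    Turtle [ ... ] property list and becomes a fresh blank node."""
    if out is None:
        out = []
    if isinstance(obj, dict):
        node = fresh_bnode()
        out.append((subject, predicate, node))
        for p, o in obj.items():
            expand(node, p, o, out)  # nested [ ... ] lists recurse
    else:
        out.append((subject, predicate, obj))
    return out
```

For example, expand(":Fred", ":hasThing", {":hasColor": ":Blue"}) yields two triples that share one generated blank node label.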
Added --
Also note that in your query, the PREFIX declarations are pointless, as the declared prefixes appear nowhere in your query. They do not cause inclusion of listed datasets (because they're not lists of datasets, in this context; they're just syntactic sugar to make other URIs in the query easier to write as prefixed-URIs, like foaf:Person, rather than fully-qualified URIs, like <http://xmlns.com/foaf/0.1/Person>), nor exclusion of others.
(Tangentially -- your foaf: prefix is incorrect, as it has a hyphen, "-", where it should have a dot, ".".)
This query is identical to yours --
SELECT DISTINCT ?class
WHERE { ?s a ?class }

Recursive queries in SPARQL to browse collections of collections

I am trying to create an RDF graph from a Mulgara RDF store, using a SPARQL query to return results. I'm just beginning to get comfortable with simple queries, effectively asking, "which objects are Members of a particular collection?"
My question (and I would greatly appreciate any advice) is whether I can take the results from this simple query and feed them back in as the object of the query.
For example, I have this sparql query:
SELECT ?x WHERE {?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey>}
With these results:
"x"
info:fedora/ramsey:ThelifeandadventuresofRobinsonCrusoe
info:fedora/ramsey:Jackanapes
info:fedora/ramsey:SundayJournalvol01no0219951126
info:fedora/ramsey:Ideologyandchange
info:fedora/ramsey:theshepherdofthepyrenees
info:fedora/ramsey:ScenesinAmerica
...
My goal is to then take these unique identifiers, put each one in place of the object <info:fedora/collection:ramsey> in the original query, and run the query again.
I'm imagining a scenario where I would identify a root element in the initial query, have the results return all member objects, then return all those objects' member objects, ad infinitum...
Is this possible with SPARQL queries? Specifically, I believe I'm querying a Mulgara RDF database. Any thoughts, even if it's not doable, are greatly appreciated.
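The "feed the results back in" loop described above is a breadth-first traversal. Here is a client-side sketch against an in-memory list of triples that stands in for the Mulgara store. direct_members is a hypothetical helper playing the role of one run of the single-level query, so each call would be one round trip in practice. The visited check keeps a cycle in the membership data from looping forever.

```python
from collections import deque

MEMBER_OF = "fedora-rels-ext:isMemberOfCollection"


def direct_members(triples, collection):
    """Stand-in for: SELECT ?x WHERE { ?x <MEMBER_OF> <collection> }"""
    return [s for s, p, o in triples if p == MEMBER_OF and o == collection]


def all_members(triples, root):
    """Feed each result back in as the new object, breadth-first, and record
    the level at which each member was first found."""
    levels = {}
    queue = deque([(root, 0)])
    while queue:
        collection, depth = queue.popleft()
        for member in direct_members(triples, collection):
            if member != root and member not in levels:  # guards against cycles
                levels[member] = depth + 1
                queue.append((member, depth + 1))
    return levels
```

Running all_members on the ramsey collection would return every transitive member together with its depth, which is enough to draw the tree.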
Let's assume you have to stick to SPARQL 1.0; I believe Mulgara has limited support for SPARQL 1.1, if any.
With SPARQL 1.0, if you know how many levels you want to query, you can do things like:
SELECT ?y WHERE {
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?y <fedora-rels-ext:isMemberOfCollection> ?x
}
Here ?y will be bound to the 2nd-level elements below your root. With UNION you can query multiple levels in one query. An example for one and two levels from the root in one query:
SELECT ?x WHERE {
{
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
} UNION {
?zz <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?x <fedora-rels-ext:isMemberOfCollection> ?zz .
}
}
The problem with this is that you do not really know at what level ?x is bound, so you cannot draw a tree from this type of query. In SPARQL 1.1 this can be solved with BIND (... AS ?level):
SELECT ?x ?level WHERE {
{
?x <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
BIND (1 AS ?level)
} UNION {
?zz <fedora-rels-ext:isMemberOfCollection> <info:fedora/collection:ramsey> .
?x <fedora-rels-ext:isMemberOfCollection> ?zz .
BIND (2 AS ?level)
}
}
This second query returns the level at which ?x is bound. You can imagine a programmatically generated query with lots of UNIONs trying to reach the maximum depth of the tree. If you want full SPARQL 1.1 support, you can try Jena/ARQ. In Jena you can also use property paths, with something like the following:
SELECT ?x WHERE {
?x <fedora-rels-ext:isMemberOfCollection>+ <info:fedora/collection:ramsey> .
}
This binds ?x to every node that is a member of <info:fedora/collection:ramsey>, either directly or through a chain of one or more <fedora-rels-ext:isMemberOfCollection> links.
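The programmatically generated UNION query mentioned above could be built like this. This is a sketch: level_union_query is a hypothetical helper, and it assumes the root is passed in already wrapped in angle brackets. Each branch chains `level` membership hops through fresh ?v1, ?v2, ... variables and binds the level number, following the same shape as the two-level BIND example.

```python
MEMBER_OF = "<fedora-rels-ext:isMemberOfCollection>"


def level_union_query(root, max_depth):
    """Build a SPARQL 1.1 query with one UNION branch per level (1..max_depth);
    each branch chains `level` membership hops and BINDs the level number."""
    if max_depth < 1:
        raise ValueError("max_depth must be >= 1")
    branches = []
    for level in range(1, max_depth + 1):
        # ?v1 .. ?v{level-1} are the intermediate collections between root and ?x.
        nodes = [root] + [f"?v{i}" for i in range(1, level)] + ["?x"]
        patterns = [
            f"{child} {MEMBER_OF} {parent} ."
            for parent, child in zip(nodes, nodes[1:])
        ]
        body = "\n    ".join(patterns + [f"BIND ({level} AS ?level)"])
        branches.append("{\n    " + body + "\n  }")
    return "SELECT ?x ?level WHERE {\n  " + "\n  UNION\n  ".join(branches) + "\n}"
```

For example, level_union_query("<info:fedora/collection:ramsey>", 2) produces the same two branches as the hand-written BIND query. Raising max_depth adds one branch per extra level, so the query grows quadratically in triple patterns. That is one reason property paths are preferable where the engine supports them.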

Summing up multiple MDX queries in SSAS

I need to SUM the results of multiple queries.
The challenge I have is that each query has defined members (to calculate a date range).
I need to be able to combine/sum those members across multiple MDX queries:
WITH Member [M1] AS Sum(DateRange, Measure)
SELECT [M1]
FROM [Cube]
WHERE {[x].&[y]}
WITH Member [M1] AS Sum(Different DateRange, Measure)
SELECT [M1]
FROM [Cube]
WHERE {[z].&[q]}
Each query selects the same members based on different criteria.
The only way I can think of doing this is a UNION and then SUM([M1]), but I have no idea how that is possible in MDX.
UPDATE - in reply to icCube's question, here is why I need a separate WHERE clause for each query:
I need separate WHERE sections for each query because I need to aggregate the results of different slices, and my slices are defined by n dimensions. I emit the MDX query for each slice dynamically, based on user configuration, and construct the WHERE clause dynamically to filter by user preferences. Users are allowed to configure overlapping slices (these are the ones I need to sum up together). Then I need to combine these slice row counts into a report. I currently do this by passing a string containing an MDX query to the report. But I can't think of a way to get multiple queries into one executable string, and I don't know how many queries there will be. So this approach is no longer possible, unless there is some way to union/sum them.
The only way I can think of accomplishing this for now is an additional batching step. It would iterate through all the queries and process them (using Adomd.net) into a staging table, and then I could aggregate them into a report using SQL SUM(...). The biggest disadvantages of this approach are an additional system to maintain and a greater chance that the data in the report will be stale.
Not sure if this is what you're looking for:
WITH Member [M1] AS Sum(Different DateRange, ([z].&[q],Measure) ) +
Sum(DateRange, ([x].&[y],Measure))
SELECT [M1]
FROM [Cube]
or
WITH Member [M1] AS Sum(Different DateRange * {[z].&[q]}, Measure ) +
Sum(DateRange * {[x].&[y]}, Measure)
SELECT [M1]
FROM [Cube]
I don't know any way adding the result of two selects in MDX...
I believe you need Aggregate() not Sum.
You could implement the UNION behavior in MDX using subcubes, this way:
Select
{...} On Columns,
{...} On Rows
From (
Select
{
{Dimension1.Level.Members * Dimension2.&[1] * Dimension3.&[2]},
{Dimension1.&[X] * Dimension2.Members * Dimension3.&[5]}
} On Columns
From [Cube]
)
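When slices overlap, "adding one total per slice" and "aggregating once over the union of the slices" give different numbers. That distinction is worth settling before choosing between the summed-member answer and the subcube answer. The toy model below is plain Python, not MDX. The cube cells, dates, regions and products are made up, and how SSAS treats tuples that appear in more than one branch of a subselect is an assumption to verify against your own cube.

```python
from datetime import date

# Hypothetical toy cube: (day, region, product) -> additive measure value.
CELLS = {
    (date(2024, 1, 1), "A", "P1"): 10,
    (date(2024, 1, 2), "A", "P2"): 5,
    (date(2024, 1, 3), "B", "P2"): 7,
}


def slice_keys(start, end, region=None, product=None):
    """Cells whose day lies in [start, end] and that match the optional filters
    (a stand-in for one date range plus one WHERE slice)."""
    return {
        k for k in CELLS
        if start <= k[0] <= end
        and (region is None or k[1] == region)
        and (product is None or k[2] == product)
    }


def sum_of_slices(slices):
    """Add one total per slice: a cell that is in two slices is counted twice."""
    return sum(CELLS[k] for keys in slices for k in keys)


def total_over_union(slices):
    """Aggregate once over the union of the slices: each cell is counted once."""
    return sum(CELLS[k] for k in set().union(*slices))
```

With one slice for region A over Jan 1-2 and another for product P2 over Jan 2-3, the Jan 2 cell belongs to both. sum_of_slices counts it twice and total_over_union counts it once.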
