SPARQL multiple datasets with blank nodes

I've been learning about SPARQL lately and I'm confused about blank nodes. Can blank nodes be used to link data from multiple datasets, or are they only used within one dataset? And what is the specific purpose of blank nodes?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0-1/>
SELECT DISTINCT ?class
WHERE {[] a ?class}
Does the query above already use a different dataset, or multiple datasets?

Blank nodes are basically pronouns.
They're used when you know an entity exists, and you can say some things about it, but you don't know its absolute identifier, its URI, its name, so you use a pronoun to reference it.
In your example query, you're not really using a blank node, as the [] is just taking the place of the common ?s or any other variable. A better example of a blank node would be here --
:Fred :hasThing [ :hasColor :Blue ]
We don't know anything else about the "Thing", so we refer to it obliquely.
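The [ ... ] syntax is just shorthand: the same data could be written with an explicit blank node label (the _:thing1 label below is arbitrary and local to the document):
:Fred :hasThing _:thing1 .
_:thing1 :hasColor :Blue .
That locality is the key point for your question: a blank node label has no meaning outside the dataset it appears in, so blank nodes cannot link data across multiple datasets. For that you need URIs.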
Added --
Also note that in your query, the PREFIX declarations are pointless, as the declared prefixes appear nowhere in the query body. They do not cause inclusion of listed datasets, nor exclusion of others; in this context they are not lists of datasets at all, just syntactic sugar that lets you write URIs in the query as prefixed names, like foaf:Person, rather than fully-qualified URIs, like <http://xmlns.com/foaf/0.1/Person>.
(Tangentially -- your foaf: prefix is incorrect, as it has a hyphen, "-", where it should have a dot, ".".)
This query is identical to yours --
SELECT DISTINCT ?class
WHERE { ?s a ?class }
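If your underlying question is how a query targets multiple datasets, that is done with FROM and FROM NAMED clauses (or by the endpoint's configured default dataset), not with prefixes or blank nodes. A sketch, using hypothetical graph URIs:
SELECT DISTINCT ?class
FROM <http://example.org/dataset1>
FROM <http://example.org/dataset2>
WHERE { ?s a ?class }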

Related

How can I optimize my recursive SPARQL query?

I'm trying to extract buildings from Wikidata using a recursive SPARQL query but I keep getting query timeouts. Is there a way to circumvent this?
This is my current query, selecting all buildings with either a Freebase ID or a Google Knowledge Graph ID, and a Dutch label:
SELECT DISTINCT ?building ?buildingLabel
WHERE {
  ?building p:P2671|p:P646 ?id;
            p:P31/ps:P31/wdt:P279* wd:Q41176;
            rdfs:label ?buildingLabel .
  FILTER(LANG(?buildingLabel) = 'nl') .
  FILTER (?building != ?buildingLabel) .
}
I've tried manually going a few layers deep instead but, for some reason, I get no results for three or more layers even though those buildings definitely exist. I've tried this using:
SELECT ?building
WHERE {
  ?building p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 [p:P31/ps:P31/wdt:P279 wd:Q41176]].
}
and using
SELECT ?building
WHERE {
  ?parent2 p:P31/ps:P31/wdt:P279 wd:Q41176.
  ?parent1 p:P31/ps:P31/wdt:P279 ?parent2.
  ?building p:P31/ps:P31/wdt:P279 ?parent1.
}
There are about 2.24 million buildings and about 18 million entities with either a Freebase ID or a Google Knowledge Graph ID on Wikidata. I've looked at this guide but couldn't quite figure out how to apply it to my query. I've also read the answer to this question but, unfortunately, using multiple queries isn't really an option for me.
If your intention is to use the "recursive" property path to find things of type building and also types that are subclasses of buildings, your first query using wdt:P279* is right, while the later attempts at repeating the full p:P31/ps:P31/wdt:P279 pattern won't match any data: each repetition re-applies the P31 (instance of) step to what is by then a class, when only the wdt:P279 (subclass of) step should repeat.
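If you did want to unroll the path manually, each extra level repeats only the wdt:P279 hop. For example, buildings whose class sits exactly two subclass hops below wd:Q41176 would be matched by:
SELECT ?building
WHERE {
  ?building wdt:P31/wdt:P279/wdt:P279 wd:Q41176 .
}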
By simplifying the first query a bit I was able to get this to run (returning 96,297 results in 39s):
SELECT DISTINCT ?building ?buildingLabel
WHERE {
  ?building p:P2671|p:P646 ?id;
            wdt:P31/wdt:P279* wd:Q41176 .
  ?building rdfs:label ?buildingLabel .
  FILTER(LANGMATCHES(LANG(?buildingLabel), "nl"))
}
Two notable changes:
p:P31/ps:P31 is replaced by wdt:P31, removing one join from the query.
The second FILTER is unnecessary, as ?building (a URI) and ?buildingLabel (a string) are necessarily going to be unequal.

Cypher query in neo4j to find specific node with most paths matching pattern

I have a neo4j database with statistical information on water and waste. In this database, data points are linked with the relevant facts, including mappings to internal definitions. The attached screenshot shows an example of a data point and its related metadata. The node in the center is the value, and the immediate nodes linked by "HAS_DIMENSION" are the dimensions that came with the data provider. These are not fixed and change depending on the provider. Each dimension of interest is mapped to an internal definition. Currently this is my query:
MATCH (o:Observation {uq_id:'e__ABS_AGR_AQ__FSW__MIO_M3__BG__1970____9f07c7a629625e5ae00e35838fcd4f824a3593dd'})-[:HAS_DIMENSION]->()
MATCH (o)-[:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(v:Variable)<-[:HAS_UNIT]-(u:Unit)
MATCH (o)-[vl0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(l:Location)
MATCH (o)-[vc0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(c:Country)
MATCH (o)-[vy0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(y:Year)
MATCH (o)-[:HAS_DIMENSION]->(unk0)
MATCH (o)-[sr0:CAME_FROM_FILE]->(ds0)-[sr1:BELONGS_TO]->(s0)
OPTIONAL MATCH (o)-[dtr0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(d:DataType)
RETURN *
The issue I have is exemplified by the pink circles. I want only one pink circle (which is a node with label Variable) in the query; in particular, I want the variable matched as follows:
MATCH (v:Variable)<-[:MAPS_TO]-()<-[:HAS_DIMENSION]-(o:Observation)
By this I want to force it to observe a pattern where it identifies the single Variable that matches the pattern above through the greatest number of intermediate nodes. So the "Fresh surface water abstracted" variable would match this pattern, since it has two paths that match, but "Fresh groundwater abstracted" would not, since it only has one. How could I accomplish this?
It sounds like you want to return the Variable node with the greatest number of paths leading to it. Would something like this roughly return the results you are after? You will need to adapt it according to your matching statements.
MATCH p=(o:Observation {uq_id:'<your_id>'})-[:HAS_DIMENSION]->()<-[:MAPS_TO]-(v:Variable)
RETURN v.name, COUNT(p) AS paths ORDER BY paths DESC LIMIT 1
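For example, if the mapping in your data actually runs through the HAS_SYNONYM_FROM / WITH_TARGET_DEF chain from your original query rather than MAPS_TO, the same counting idea would look roughly like this:
MATCH p=(o:Observation {uq_id:'<your_id>'})-[:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(v:Variable)
RETURN v.name, COUNT(p) AS paths
ORDER BY paths DESC LIMIT 1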

Concat values to use as subject in SPARQL Queries (MarkLogic)

I am trying to write a SPARQL query in MarkLogic that concatenates my subject, predicate, and object to use as a new "subject" node. I have attempted to do so with the query below:
SELECT *
WHERE {
  ?subject </in/relationship/with> ?object .
  BIND(concat(?subject, "/in/relationship/with", ?object) AS ?relationship)
  ?relationship </current/status> ?status
}
However, this query does not work: ?relationship now contains a plain string for each row, so the output of the query is completely empty. Therefore, I am wondering if this can be done, and whether it is possible to convert a string into an object that SPARQL can query with.
Stanislav is correct: you need to wrap the string in IRI(). Here is a code snippet that runs directly in QC. Run it against an empty database to avoid polluting your other data:
xdmp:document-insert('/triples.xml', <triples>{
  sem:triple(sem:iri("http://my/subject1"), sem:iri("/in/relationship/with"), sem:iri("http://my/subject2")),
  sem:triple(sem:iri("http://my/subject1/in/relationship/with/http://my/subject2"), sem:iri("/current/status"), "My status")
}</triples>)
;
sem:sparql('
  SELECT *
  WHERE {
    ?subject </in/relationship/with> ?object.
    BIND(IRI(CONCAT(?subject, "/in/relationship/with/", ?object)) AS ?relationship)
    ?relationship </current/status> ?status.
  }
')
Whether this is a sensible approach depends on your use case. Keep in mind that MarkLogic is particularly strong at keeping associated data together in documents: you can embed triples in documents, or use TDE to project triples out of them, which lets you combine the strengths of document search and data locality while still being able to reason over facts with SPARQL.
HTH!

LIKE query on elements of flat jsonb array

I have a Postgres table posts with a column of type jsonb which is basically a flat array of tags.
What I need to do is somehow run a LIKE query on the elements of that tags column, so that I can find posts which have tags beginning with some partial string.
Is such a thing possible in Postgres? I keep finding super complex examples and no one ever describes such a basic and simple scenario.
My current code works fine for checking if there are posts having specific tags:
select * from posts where tags #> '"TAG"'
and I'm looking for a way to run something along the lines of
select * from posts where tags #> '"%TAG%"'
SELECT *
FROM posts p
WHERE EXISTS (
  SELECT FROM jsonb_array_elements_text(p.tags) tag
  WHERE tag LIKE '%TAG%'
);
Related, with explanation:
Search a JSON array for an object containing a value matching a pattern
Or simpler with the @? operator since Postgres 12 implemented SQL/JSON:
SELECT *
-- optional to show the matching item:
-- , jsonb_path_query_first(tags, '$[*] ? (@ like_regex "^tag" flag "i")')
FROM posts
WHERE tags @? '$[*] ? (@ like_regex "TAG")';
The operator @? is just a wrapper around the function jsonb_path_exists(). So this is equivalent:
...
WHERE jsonb_path_exists(tags, '$[*] ? (@ like_regex "TAG")');
Neither has index support. (It may be added for the @? operator later, but it's not there in Postgres 13 yet.) So those queries are slow for big tables. A normalized design, like Laurenz already suggested, would be superior, with a trigram index; see the sketch after the related link below:
PostgreSQL LIKE query performance variations
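A minimal sketch of that normalized approach, with a hypothetical post_tags table (all names made up for illustration):
-- one row per (post, tag) pair instead of a jsonb array
CREATE TABLE post_tags (
  post_id int  REFERENCES posts(id),
  tag     text NOT NULL
);
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX post_tags_tag_trgm_idx ON post_tags USING gin (tag gin_trgm_ops);
-- arbitrary LIKE patterns can now use the trigram index
SELECT DISTINCT p.*
FROM posts p
JOIN post_tags t ON t.post_id = p.id
WHERE t.tag LIKE '%TAG%';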
For just prefix matching (LIKE 'TAG%', no leading wildcard), you could make it work with a full text index:
CREATE INDEX posts_tags_fts_gin_idx ON posts USING GIN (to_tsvector('simple', tags));
And a matching query:
SELECT *
FROM posts p
WHERE to_tsvector('simple', tags) @@ 'TAG:*'::tsquery
Or use the english dictionary instead of simple (or whatever fits your case) if you want stemming for natural English language.
to_tsvector(json(b)) requires Postgres 10 or later.
Related:
Get partial match from GIN indexed TSVECTOR column
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Projection query with new fields/properties ignores entries that haven't set those properties yet

I have an Article type structured like this:
type Article struct {
    Title   string
    Content string `datastore:",noindex"`
}
In an administrative portion of my site, I list all of my Articles. The only property I need in order to display this list is Title; grabbing the content of the article seems wasteful. So I use a projection query:
q := datastore.NewQuery("Article").Project("Title")
Everything works as expected so far. Now I decide I'd like to add two fields to Article so that some articles can be unlisted in the public article list and/or unviewable when access is attempted. Understanding the datastore to be schema-less, I think this might be very simple. I add the two new fields to Article:
type Article struct {
    Title      string
    Content    string `datastore:",noindex"`
    Unlisted   bool
    Unviewable bool
}
I also add them to the projection query, since I want to indicate in the administrative article list when an article is publicly unlisted and/or unviewable:
q := datastore.NewQuery("Article").Project("Title", "Unlisted", "Unviewable")
Unfortunately, this only returns entries that have explicitly included Unlisted and Unviewable when Put into the datastore.
My workaround for now is to simply stop using a projection query:
q := datastore.NewQuery("Article")
All entries are returned, and the entries that never set Unlisted or Unviewable have them set to their zero value as expected. The downside is that the article content is being passed around needlessly.
In this case, that compromise isn't terrible, but I expect similar situations to arise in the future, and it could be a big deal not being able to use projection queries. Projection queries and adding new properties to datastore entries don't seem to fit together well. I want to make sure I'm not misunderstanding something or missing the correct way to do things.
It's not clear to me from the documentation that projection queries should behave this way (ignoring entries that don't have the projected properties rather than including them with zero values). Is this the intended behavior?
Are the only options in scenarios like this (adding new fields to structs / properties to entries) to either forgo projection queries or run some kind of "schema migration", Getting all entries and then Putting them back so they now have zero-valued properties and can be projected?
Projection queries source the data for fields from the indexes, not from the entity itself. When you add new properties, pre-existing records do not appear in the indexes you are performing the projection query on; they will need to be re-indexed.
You are asking for those specific properties, and they don't exist, hence the current behaviour.
You should probably think of a projection query as a request for entities with a value in a requested index in addition to any filter you place on a query.
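For completeness, the "schema migration" mentioned in the question is just reading every entity and writing it back, so the new properties get stored and indexed with their zero values. A rough sketch, assuming the App Engine datastore package used in the question:
// reindexArticles re-saves every Article so the new Unlisted and
// Unviewable properties are written (and indexed) with their zero
// values for pre-existing entities.
func reindexArticles(ctx context.Context) error {
    var articles []Article
    keys, err := datastore.NewQuery("Article").GetAll(ctx, &articles)
    if err != nil {
        return err
    }
    for i, key := range keys {
        // Put rewrites all fields, including the zero-valued new ones.
        if _, err := datastore.Put(ctx, key, &articles[i]); err != nil {
            return err
        }
    }
    return nil
}
For a large number of entities you would want to page through the results with cursors and batch the Puts, but the idea is the same.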
