JSON-LD frame for mixed type definitions: #type and rdf:type in the same graph - json-ld

Sometimes constructing graphs using sparql creates JSON-LD documents that have mixed type definitions instead of #type (both rdf:type and #type in same graph). See this gist in JSON-LD Playground.
Example graph with mixed type definitions:
{
"#context": {
"label": "http://www.w3.org/2000/01/rdf-schema#label",
"ex": "http://example.org/ex#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#"
},
"#graph": [
{
"#id": "ex:Test1",
"rdf:type": "ex:ExampleClass",
"label": "Test 1"
},
{
"#id": "ex:Test2",
"#type": "ex:AnotherExampleClass",
"label": "Test 2"
}
]
}
Is there a way to use frame to transform all the instances to use #type instead of rdf:type?

This happens when you have bad data like rdf:type pointing to strings. Didnt find a way to fix this with frame but fixing the data works.

Related

How do I restrict medication annotations to a specific document section via IBM Watson Annotator for Clinical Data (ACD)

I’m using the IBM Watson Annotator for Clinical Data (ACD) API hosted in IBM Cloud to detect medication mentions within discharge summary clinic notes. I’m using the out-of-the-box medication annotator provided with ACD.
I’m able to detect and extract medication mentions, but I ONLY want medications mentioned within “DISCHARGE MEDICATIONS” or “DISCHARGE INSTRUCTIONS” sections.
Is there a way I can restrict ACD to only return medication mentions that appear within those two sections? I’m only interested in discharge medications.
For example, given the following contrived (non-PHI) text:
“Patient was previously prescribed cisplatin.DISCHARGE MEDICATIONS: 1. Aspirin 81 mg orally once daily.”
I get two medication mentions: one over “cisplatin” and another over “aspirin” - I only want the latter, since it appears within the “DISCHARGE MEDICATIONS” section.
Since the ACD medication annotator captures the section headings as part of the mention annotations that appear within a section, you can define an inclusive filter that checks for (1) the desired normalized section heading as well as (2) a filter that checks for the existence of the section heading fields in general, should a mention appear outside of any section and not include section header fields as part of the annotation. This will filter out any medication mentions from the ACD response that don't appear within a "DISCHARGE MEDICATIONS" section. I added a couple other related normalized section headings so you can see how that's done. Feel free to modify the sample below to meet your needs.
Here's a sample flow you can persist via POST /flows and then reference on the analyze call as POST /analyze/{flow_id} - e.g. POST /analyze/discharge_med_flow
{
"id": "discharge_med_flow",
"name": "Disharge Medications Flow",
"description": "Detect medication mentions within DISCHARGE MEDICATIONS sections",
"annotatorFlows": [
{
"flow": {
"elements": [
{
"annotator": {
"name": "medication",
"configurations": [
{
"filter": {
"target": "unstructured.data.MedicationInd",
"condition": {
"type": "all",
"conditions": [
{
"type": "all",
"conditions": [
{
"type": "match",
"field": "sectionNormalizedName",
"values": [
"Discharge medication",
"Discharge instructions",
"Medications on discharge"
],
"not": false,
"caseInsensitive": true,
"operator": "equals"
},
{
"type": "match",
"field": "sectionNormalizedName",
"operator": "fieldExists"
}
]
}
]
}
}
}
]
}
}
],
"async": false
}
}
]
}
See the IBM Watson Annotator for Clinical Data filtering docs for additional details.
Thanks

How to represent internationalized strings in Google-friendly Schema.org

Google's Structured Data Testing Tool doesn't seem to like JSON-LD's #language in value object approach to string internationalization. For example:
{
"#context": "https://schema.org/",
"#type": "Person",
"name": [{"#language": "ar", "#value": "أياس"},
{"#language": "en", "#value": "Eyas"}]
}
or
{
"#context": "https://schema.org/",
"#type": "Person",
"name": {"#language": "ar", "#value": "أياس"}
}
don't seem to work. I also tried adding "#type": "Text" but that doesn't seem to make it happy either.
Is there an accepted way of specifying multiple language representations of the same thing in Schema.org JSON-LD that is respected by search engines?
I know there's "inLanguage" for certain types, but that is not general enough to, e.g., work with a Person.

Merge two properties using JSON-LD framing

I'm trying to standartize a property in a json-ld document. A simple example:
json-ld
{
"#context": {
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dcterms": "http://purl.org/dc/terms/"
},
"#graph": [
{
"#id": "1",
"rdfs:label": "A title"
},
{
"#id": "2",
"dcterms:title": "Another title"
}
]
}
frame (failing attempt)
{
"type": "array",
"items": {
"title": ["rdfs:label", "dcterms:title"]
}
}
This produces an empty graph, instead of this:
desired output
[{
"title": "A title"
},
{
"title": "Another title"
}]
The documentation at https://json-ld.org/primer/latest/#framing seems to be work in progress and there is really not a lot of examples or tutorials covering json-ld framing.
Playground example
Framing is used to shape the data in a JSON-LD document, using an example frame document which is used to both match the flattened data and show an example of how the resulting data should be shaped
https://json-ld.org/spec/latest/json-ld-framing/#framing
This beeing said, re-shaping data does not mean you can change the semantics. rdfs:label and dcterms:title are different things in the source data and will be different things in the result, you can not merge them to a "title" property that expands to only one URI (which one?). If that were the case, the result would have different semantics than the source, but framing is only meant to change the structure.

Localisation of values in JSON-LD

I'm trying to work out the best way to handle localisation in JSON-LD. The spec has information on String Internationalization that allows you to specify different translations for string values:
{
"#context":
{
...
"occupation": { "#id": "ex:occupation", "#container": "#language" }
},
"name": "Yagyū Muneyoshi",
"occupation":
{
"ja": "忍者",
"en": "Ninja",
"cs": "Nindža"
}
...
}
This covers translation but not internationalization where the content changes depending on locale.
E.g.
{
"#context":
{
"#id": "http://example.org/carousel#mycarousel",
"#language": "ja"
...
},
"slides": ["http://example.org/japan.jpg"]
}
{
"#context":
{
"#id": "http://example.org/carousel#mycarousel",
"#language": "es"
...
},
"slides": ["http://example.org/spain.jpg"]
}
Does anyone know if the above is invalid in the JSON-LD spec, i.e. having different field values depending on the #language while there #ids are the same? If not is there an alternate approach that could work?
Yes, the above is invalid. #language is only used to annotate strings with their language. What you are looking for is higher-level information. As such, you need to use some vocabulary. Schema.org for instance has http://schema.org/inLanguage for this. There exist various others as well. Which one you want to use, depends on the specific use case.

fuzzy search in elasticsearch different than fuzziness match boolean

i'm trying to figure out why the following queries produce vastly different results. i'm told a fuzzy query is almost never a good idea per this document Found-fuzzy so i'm trying to use a match query with a fuzziness parameter. they produce extremely different results. i'm not sure what's the best way of doing this.
my example is a movie title containing 'batman'. the user, however, types 'bat man' (with a space). this would make sense that a fuzzy query should find batman. it should also find other variations like spider man, but for now that's ok i guess. (not really, but...)
so the fuzzy search is actually returning more relevant results than the match one below. any ideas?
--fuzzy:
{
"query":{
"bool":{
"should": [
{
"fuzzy": {
"title": {
"value": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
--match:
{
"query":{
"bool":{
"should": [
{
"match": {
"title": {
"query": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
EDIT
i'm adding examples of what gets returned.
first, nothing gets returned using the match query, even with a high fuzziness value added (fuzziness: 5)
but i do get several 'batman' related titles using the fuzzy query such as 'batman' or 'batman returns'.
this gets even stranger when i do multiple fuzzy searches on 'bat man' using the fuzzy search... if i search my 'starring' field, in addition to the title field, (starring contains lists of actors), i get 'jason bateman' as well as the title 'batman'.
{
"_index": "store24",
"_type": "searchdata",
"_id": "081227987909",
"_score": 4.600759,
"fields": {
"title": [
"Batman"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929053353",
"_score": 4.1418676,
"fields": {
"title": [
"Batman Forever"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929331789",
"_score": 3.5298011,
"fields": {
"title": [
"Batman Returns"
]
}
}
BEST SO FAR (STILL NOT GREAT)
what i've found that works best so far is to combine both queries. this seems redundant, but i can't as yet make one work like the other. so, this seems to be better:
"should": [
{
"fuzzy": {
"title": {
"boost": 6.0,
"min_similarity": 1.0,
"value": "batman"
}
}
},
{
"match": {
"title": {
"query": "batman",
"boost": 6.0
,"fuzziness": 1
}
}
}
]
Elastic Search analyzes docs and converts them into terms, which are what is actually searched (not the docs themselves). The key difference between the two query types is that the match query does not analyze the query text before sending the query. So consider the example below:
The search of 'bat man' in a fuzzy search would first tokenize the term, then search. So what it really looks for is 'btmn,' which might not turn up the same matches. A good example of this is how Jason Bateman showed up because the last name was tokenized to btmn or a similar form.
More detailed information on the Analyzing of text fields when searching can be read http://exploringelasticsearch.com/searching_data.html#sec-searching-analysis
When a search is performed on an analyzed field, the query itself is
analyzed, matching it up to the documents which are analyzed when
added to the database. Reducing words to these short tokens normalizes
the text allowing for fast efficient lookups. Whether you’re searching
for "rollerblading" in any form, internally we’re just looking for
"rollerblad".

Resources