Merge two properties using JSON-LD framing - json-ld

I'm trying to standardize a property in a JSON-LD document. A simple example:
json-ld
{
"@context": {
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dcterms": "http://purl.org/dc/terms/"
},
"@graph": [
{
"@id": "1",
"rdfs:label": "A title"
},
{
"@id": "2",
"dcterms:title": "Another title"
}
]
}
frame (failing attempt)
{
"type": "array",
"items": {
"title": ["rdfs:label", "dcterms:title"]
}
}
This produces an empty graph, instead of this:
desired output
[{
"title": "A title"
},
{
"title": "Another title"
}]
The documentation at https://json-ld.org/primer/latest/#framing seems to be a work in progress, and there are not many examples or tutorials covering JSON-LD framing.

Framing is used to shape the data in a JSON-LD document, using an example frame document which is used to both match the flattened data and show an example of how the resulting data should be shaped.
https://json-ld.org/spec/latest/json-ld-framing/#framing
That being said, reshaping data does not mean you can change its semantics. rdfs:label and dcterms:title are different properties in the source data and will remain different properties in the result. You cannot merge them into a single "title" property that expands to only one URI (which one would it be?). If you could, the result would have different semantics than the source, but framing is only meant to change the structure.
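If your application decides that losing this distinction is acceptable, the merge has to happen outside framing, in ordinary code. A minimal plain-Python sketch, assuming the document is already loaded as a dict and that @context only contains simple string prefix definitions; merge_titles and expand are hypothetical helper names, not part of any JSON-LD library, and the output deliberately discards the difference between the two properties:

```python
import json

# Full IRIs behind the two compact IRIs used in the question.
TITLE_IRIS = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://purl.org/dc/terms/title",
}


def expand(term, context):
    """Expand a compact IRI such as 'rdfs:label' using simple prefix entries."""
    prefix, sep, suffix = term.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return term  # keywords like '@id' or unknown prefixes are left unchanged


def merge_titles(doc):
    """Collapse rdfs:label / dcterms:title into a plain 'title' key.

    This is post-processing, not JSON-LD framing: the result no longer
    says which of the two properties each value came from.
    """
    context = doc.get("@context", {})
    merged = []
    for node in doc.get("@graph", []):
        for key, value in node.items():
            if expand(key, context) in TITLE_IRIS:
                merged.append({"title": value})
                break  # keep at most one title per node
    return merged


doc = json.loads("""
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {"@id": "1", "rdfs:label": "A title"},
    {"@id": "2", "dcterms:title": "Another title"}
  ]
}
""")
print(merge_titles(doc))
```

The result is plain JSON rather than JSON-LD, since "title" is not mapped to any IRI.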

Related

Solr: using the labelled relationship for nested documents throws unknown field error

Using the example document from the Solr documentation:
{
"ID": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"comments": [{
"ID": "2",
"content": "SolrCloud supports it too!"
},
{
"ID": "3",
"content": "New filter syntax"
}
]
}
When I try to index this JSON, I get this error: "ERROR: [doc=1] unknown field 'comments.ID'", even though the field ID is defined in the schema (comments.ID, of course, is not).
I am trying to use the labelled relationship rather than the anonymous relationship with _childDocuments_, because that is what the docs recommend. What am I missing?
If you're trying to send this to the /update/json/docs convenience path, it will likely fail with a nested document.
Instead, try sending your document to the /update path, using the JSON command structure shown here: https://solr.apache.org/guide/8_11/uploading-data-with-index-handlers.html#sending-json-update-commands
Basically, send to /update and wrap your document in an add command:
{
"add": {
"doc": {<your document here>}
}
}
Be sure to also set the Content-Type header to application/json.
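A minimal Python sketch of that request, using only the standard library. The URL (localhost:8983 and a core named mycore) is an assumption, and build_add_request is a hypothetical helper; the sketch only builds the request, and the commented-out lines show how you might send it to a running Solr instance:

```python
import json
import urllib.request

# Assumed Solr location; replace host, port, and core/collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_add_request(doc, url=SOLR_UPDATE_URL):
    """Wrap a (possibly nested) document in an 'add' command for /update."""
    payload = {"add": {"doc": doc}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "ID": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "comments": [
        {"ID": "2", "content": "SolrCloud supports it too!"},
        {"ID": "3", "content": "New filter syntax"},
    ],
}

req = build_add_request(doc)
print(req.get_method(), req.full_url)

# Sending requires a running Solr instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```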

json-ld: Good way to model custom values

I'm trying to write JSON-LD that combines the schema.org/Product definition with some custom elements.
I'm coming from an XSD background, and extensibility in JSON-LD seems very difficult to achieve.
I started from the template markup for Products found at Google (https://developers.google.com/search/docs/guides/search-gallery) and tried to extend it (I would like to add something like mydomain:tags), but I'm not sure how to do this.
<script type="application/ld+json">
{
"@context": ["http://schema.org/",
{"mydomain": "http://mystuff.com/"}],
"@type": "Product",
"name": "Executive Anvil",
"image": "http://www.example.com/anvil_executive.jpg",
"description": "Sleeker than ACME's Classic Anvil, the Executive Anvil is perfect for the business traveler looking for something to drop from a height.",
"mpn": "925872",
"brand": {
"@type": "Thing",
"name": "ACME"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.4",
"reviewCount": "89"
},
"offers": {
"@type": "Offer",
"priceCurrency": "USD",
"price": "119.99",
"priceValidUntil": "2020-11-05",
"itemCondition": "http://schema.org/UsedCondition",
"availability": "http://schema.org/InStock",
"seller": {
"@type": "Organization",
"name": "Executive Objects"
}
},
"mydomain:tags" : {}
}
</script>
Any clue on what I'm doing wrong here would be much appreciated.
It's probably something silly...
Your JSON-LD seems to be correct. You are using a combination of example 19 (Compact IRIs) and example 29 (Advanced Context Usage).
Google’s Structured Data Testing Tool is not a general JSON-LD validator. The errors it reports are primarily for their search result features. Their error ("The property http://mystuff.com/tags is not recognized by Google for an object of type Product.") just says that it’s not one of the properties Google knows, which is, of course, correct.
If you want to validate your JSON-LD without getting errors for Google-specific features, you could use http://json-ld.org/playground/, for example.
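Google's message reports the property as http://mystuff.com/tags because mydomain:tags is a compact IRI: the part before the colon is looked up in @context and replaced by its value. A simplified sketch of that expansion, assuming only local string term definitions matter; it skips remote contexts such as http://schema.org/ rather than fetching them, so it is not a full implementation of the JSON-LD expansion algorithm (expand_compact_iri and collect_prefixes are hypothetical names):

```python
def collect_prefixes(context):
    """Gather string-valued term definitions from a (possibly array) @context.

    Remote contexts given as strings (e.g. "http://schema.org/") are skipped,
    not fetched.
    """
    entries = context if isinstance(context, list) else [context]
    prefixes = {}
    for entry in entries:
        if isinstance(entry, dict):
            for term, value in entry.items():
                if isinstance(value, str):
                    prefixes[term] = value  # later entries override earlier ones
    return prefixes


def expand_compact_iri(term, context):
    """Expand 'prefix:suffix' if the prefix is defined locally in @context."""
    prefix, sep, suffix = term.partition(":")
    prefixes = collect_prefixes(context)
    # "http://..." is an absolute IRI, not a compact one, so '//' suffixes are left alone.
    if sep and not suffix.startswith("//") and prefix in prefixes:
        return prefixes[prefix] + suffix
    return term  # not a compact IRI this sketch can resolve


context = ["http://schema.org/", {"mydomain": "http://mystuff.com/"}]
print(expand_compact_iri("mydomain:tags", context))  # http://mystuff.com/tags
```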
If you want to use JSON-LD in your Django ListView and DetailView, you don't need to write it by hand for every list item added from the admin side. You only need to add JsonLdListView to your ListView class and JsonLdDetailView to your DetailView class, plus one property on the model.
Step-1
In models.py, add this property to the model used by your ListView and DetailView:
@property
def sd(self):
    return {
        "@type": "Organization",
        "description": self.description,
        "name": self.name,
    }
Here, name and description are fields of the same model.
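Step-2
In views.py, import the mixins and add JsonLdListView to your ListView:
from django.views.generic import DetailView, ListView
from django_json_ld.views import JsonLdDetailView, JsonLdListView

class PortfolioListView(JsonLdListView, ListView):
    pass  # set model/queryset here as you normally would
Step-3
Add JsonLdDetailView to your DetailView:
class PortfolioDetailView(JsonLdDetailView, DetailView):
    def get_structured_data(self):
        # Start from the mixin's default structured data; customize sd here if needed
        sd = super().get_structured_data()
        return sd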

How can you retrieve a full nested document in Solr?

In my Solr 4.10.3 instance, I would like to index JSON documents with a nested structure.
Example:
{
"id": "myDoc",
"title": "myTitle",
"nestedDoc": {
"name": "test name",
"nestedAttribute": {
"attr1": "attr1Val"
}
}
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
"id": "myDoc",
"title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried "fl=[child parentFilter=title:myTitle]", but it doesn't work (ChildDocTransformerFactory, from https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it, or is there another way?
I'm using Solr 4.10.3.
To get the whole nested structure returned, you do indeed need to use ChildDocTransformerFactory. However, you first need to index your documents properly.
If you just pass your structure as it is, Solr will index the parts as separate documents and won't know that they're connected. To query nested documents correctly, you'll have to pre-process your data structure as described in this post, or try using (and modifying as needed) a pre-processing script. Unfortunately, up to and including the latest Solr 6.0, there's no clean and smooth solution for indexing and returning nested document structures, so everything is done through workarounds.
In your particular case, you'll need to transform your document structure into this:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name",
"_childDocuments_" :[
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}]
}
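This transformation can be automated. A minimal Python sketch, assuming every dict-valued field should become a child document whose type is the field name (the convention used in the structure above); to_solr_block is a hypothetical helper, lists of nested objects are not handled, and depending on your schema each child document may also need its own unique id:

```python
import json


def to_solr_block(doc, doc_type):
    """Convert nested dict fields into Solr's _childDocuments_ form.

    Each dict-valued field becomes a child document whose "type" is the
    field name; all other fields stay on the current document.
    """
    flat = {"type": doc_type}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            children.append(to_solr_block(value, key))  # recurse into nested objects
        else:
            flat[key] = value
    if children:
        flat["_childDocuments_"] = children
    return flat


nested = {
    "id": "myDoc",
    "title": "myTitle",
    "nestedDoc": {
        "name": "test name",
        "nestedAttribute": {"attr1": "attr1Val"},
    },
}
block = to_solr_block(nested, "parentDoc")
print(json.dumps(block, indent=2))
```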
Then, the following ChildDocTransformerFactory query will return all subdocuments (by the way, although the documentation says it has been available since Solr 4.9, I've only actually seen it in Solr 5.3, so you need to test it):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
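The [child ...] transformer contains spaces, brackets, and commas, so it must be URL-encoded when sent over HTTP. A short sketch using the standard library; the /select URL and core name are assumptions, and wt=json simply asks Solr for a JSON response:

```python
from urllib.parse import parse_qs, urlencode

# Assumed Solr location and core name; adjust for your setup.
SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "title:myTitle",
    # All parent fields plus up to 50 child documents of parents
    # matching type:parentDoc (the transformer syntax from the answer).
    "fl": "*,[child parentFilter=type:parentDoc limit=50]",
    "wt": "json",
}
url = SELECT_URL + "?" + urlencode(params)
print(url)

# Round-trip check: decoding the query string gives back the original fl value.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["fl"])
```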
Note that although it returns all nested documents, the returned structure will be flattened (alas!), i.e., you'll get:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name"
},
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}
This is probably not what you expected, but it is Solr's unfortunate behavior, which should be fixed in an upcoming release.
You can set
q={!parent which=}
and set the fl parameter to "fl=*,[child parentFilter=title:myTitle]".
This will return all parent fields and child fields of title:myTitle.

Text inside entities in Draft.js

I've been playing with the Entity system in Draft.js. One limitation I see is that entities have to correspond with a range of text in the content they are inserted into. I was hoping I could make a zero-length entity which would have a display based on the data in the entity rather than the text-content in the block. Is this possible?
This is possible when the entity spans a whole block. In the code example, the serialized blockMap contains a block with no text, but its characterList has one entry with an entity attached to it. There is also some ongoing discussion about adding metadata to a block; see https://github.com/facebook/draft-js/issues/129
"blockMap": {
"80sam": {
"key": "80sam",
"type": "sticker",
"text": "",
"characterList": [
{
"style": [],
"entity": "1"
}
],
"depth": 0
},
},

Fuzzy search in Elasticsearch different than fuzziness match boolean

I'm trying to figure out why the following queries produce vastly different results. I was told a fuzzy query is almost never a good idea (per this article, Found-fuzzy), so I'm trying to use a match query with a fuzziness parameter instead. The two produce extremely different results, and I'm not sure what the best approach is.
My example is a movie title containing 'batman'. The user, however, types 'bat man' (with a space). It makes sense that a fuzzy query should find batman. It would also find other variations like spider man, but for now that's OK, I guess (not really, but...).
So the fuzzy search is actually returning more relevant results than the match query below. Any ideas?
--fuzzy:
{
"query":{
"bool":{
"should": [
{
"fuzzy": {
"title": {
"value": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
--match:
{
"query":{
"bool":{
"should": [
{
"match": {
"title": {
"query": "bat man",
"boost": 4
}
}
}
], "minimum_number_should_match": 1
}
}
}
EDIT
I'm adding examples of what gets returned.
First, nothing is returned by the match query, even with a high fuzziness value (fuzziness: 5).
But I do get several Batman-related titles with the fuzzy query, such as 'batman' or 'batman returns'.
This gets even stranger when I run fuzzy searches for 'bat man' on multiple fields: if I search my 'starring' field (which contains lists of actors) in addition to the title field, I get 'jason bateman' as well as the title 'batman'.
{
"_index": "store24",
"_type": "searchdata",
"_id": "081227987909",
"_score": 4.600759,
"fields": {
"title": [
"Batman"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929053353",
"_score": 4.1418676,
"fields": {
"title": [
"Batman Forever"
]
}
},
{
"_index": "store24",
"_type": "searchdata",
"_id": "883929331789",
"_score": 3.5298011,
"fields": {
"title": [
"Batman Returns"
]
}
}
BEST SO FAR (STILL NOT GREAT)
What I've found works best so far is to combine both queries. This seems redundant, but I can't yet make one work like the other, so this seems to be better:
"should": [
{
"fuzzy": {
"title": {
"boost": 6.0,
"min_similarity": 1.0,
"value": "batman"
}
}
},
{
"match": {
"title": {
"query": "batman",
"boost": 6.0,
"fuzziness": 1
}
}
}
]
Elasticsearch analyzes documents and converts them into terms, which are what is actually searched (not the documents themselves). The key difference between the two query types is that the match query analyzes the query text before searching, while the fuzzy query does not.
A match query for 'bat man' is first tokenized into the terms 'bat' and 'man', and each term is matched separately. Neither term is within the maximum fuzziness of 2 edits from 'batman', so nothing is returned even with a high fuzziness value. A fuzzy query, on the other hand, uses 'bat man' as a single unanalyzed term, space included, and looks for indexed terms within the allowed edit distance. 'batman' is one edit away (delete the space), and so is 'bateman' (replace the space with 'e'), which is why Jason Bateman showed up.
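These edit distances can be checked directly. Fuzzy matching in Elasticsearch is based on Levenshtein edit distance, with at most 2 edits allowed. The sketch below is a plain Python implementation of classic Levenshtein distance (Elasticsearch's fuzzy query also counts transpositions of adjacent characters by default, which does not affect these examples), applied to the strings from the question:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


pairs = [
    ("bat man", "batman"),   # raw fuzzy query string vs. indexed title term
    ("bat man", "bateman"),  # raw fuzzy query string vs. indexed actor term
    ("bat", "batman"),       # analyzed match-query token vs. title term
    ("man", "batman"),
]
for query, term in pairs:
    print(f"{query!r} -> {term!r}: {levenshtein(query, term)}")
```

Both 'batman' and 'bateman' are within one edit of the raw string 'bat man', while the analyzed tokens 'bat' and 'man' are each three edits from 'batman', beyond the limit of 2.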
More detailed information on how text fields are analyzed at search time is available at http://exploringelasticsearch.com/searching_data.html#sec-searching-analysis:
When a search is performed on an analyzed field, the query itself is
analyzed, matching it up to the documents which are analyzed when
added to the database. Reducing words to these short tokens normalizes
the text allowing for fast efficient lookups. Whether you’re searching
for "rollerblading" in any form, internally we’re just looking for
"rollerblad".
