How to include imported fields in the search results? - vespa

I'm using document references to import parent fields into a child document. While searches against the parent fields work, the parent fields themselves do not seem to be included in the search results, only child fields.
To use the example in the documentation, salesperson_name does not appear in the fields entry for id:test:ad::1 when using query=John, or indeed when retrieving id:test:ad::1 via GET directly.
Here's a simplified configuration for my document model:
search definitions
person.sd - the parent
search person {
document person {
field name type string {
indexing: summary | attribute
}
}
fieldset default {
fields: name
}
}
event.sd - the child
search event {
document event {
field code type string {
indexing: summary | attribute
}
field speaker type reference<person> {
indexing: summary | attribute
}
}
import field speaker.name as name {}
fieldset default {
fields: code
}
}
documents
p1 - person
{
"fields": {
"name": "p1"
}
}
e1 - event
{
"fields": {
"code": "e1",
"speaker": "id:n1:person::1"
}
}
query result
curl -s "http://localhost:8080/search/?yql=select%20*%20from%20sources%20*%20where%20name%20contains%20%22p1%22%3B" | python -m json.tool
This returns both e1 and p1, as you would expect, given that name is present in both. But the fields of e1 do not include the name.
{
"root": {
"children": [
{
"fields": {
"documentid": "id:n1:person::1",
"name": "p1",
"sddocname": "person"
},
"id": "id:n1:person::1",
"relevance": 0.0017429193899782135,
"source": "music"
},
{
"fields": {
"code": "e1",
"documentid": "id:n1:event::1",
"sddocname": "event",
"speaker": "id:n1:person::1"
},
"id": "id:n1:event::1",
"relevance": 0.0017429193899782135,
"source": "music"
}
],
...
"fields": {
"totalCount": 2
},
}
}

Currently you'll need to add the imported 'name' field to the default summary explicitly:
import field speaker.name as name {}
document-summary default {
summary name type string{}
}
See http://docs.vespa.ai/documentation/document-summaries.html for more about explicit document summaries.
Your query will then return:
"children": [
{
"fields": {
"documentid": "id:n1:person::1",
"name": "p1",
"sddocname": "person"
},
"id": "id:n1:person::1",
"relevance": 0.0017429193899782135,
"source": "stuff"
},
{
"fields": {
"code": "e1",
"documentid": "id:n1:event::1",
"name": "p1",
"sddocname": "event",
"speaker": "id:n1:person::1"
},
"id": "id:n1:event::1",
"relevance": 0.0017429193899782135,
"source": "stuff"
}
],
We'll improve the documentation on this. Thanks for the very detailed write-up.

Add "summary" to the indexing statement of the imported field in the parent document type.
For example, in the documentation example, change the "name" field in the "salesperson" document type to say "indexing: attribute | summary".

Related

How to add required to sub array in a json schema?

I'm creating a JSON schema to define the necessary data and its data types. Some of the data needs to be marked as required, but I couldn't find how to do that in the documentation.
For this json schema:
{
"type": "object",
"required": [
"version",
"categories"
],
"properties": {
"version": {
"type": "string",
"minLength": 1,
"maxLength": 1
},
"categories": {
"type": "array",
"items": [
{
"title": {
"type": "string",
"minLength": 1
},
"body": {
"type": "string",
"minLength": 1
}
}
]
}
}
}
and JSON like:
{
"version":"1",
"categories":[
{
"title":"First",
"body":"Good"
},
{
"title":"Second",
"body":"Bad"
}
]
}
I want to make title required, too, but it sits inside the items of an array. How do I set that in the JSON schema?
There are a few things wrong with your schema. I'm going to assume you're using JSON Schema draft 2019-09.
First, you want items to be an object, not an array, as you want it to apply to every item in the array.
If "items" is a schema, validation succeeds if all elements in the
array successfully validate against that schema.
If "items" is an array of schemas, validation succeeds if each
element of the instance validates against the schema at the same
position, if any.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-02#section-9.3.1.1
Second, since the value of items should be a schema, you need to treat it like a schema in its own right.
If we take the item from your items array as a schema, it doesn't actually do anything, because title and body are not keywords. You need to nest them in a properties keyword:
{
"properties": {
"title": {
"type": "string",
"minLength": 1
},
"body": {
"type": "string",
"minLength": 1
}
}
}
Finally, now that your items value is a schema (a subschema), you can add any keyword you would normally use, such as required, just as you did at the top level.
{
"required": [
"title"
],
"properties": {
...
}
}
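Putting the three changes together, the whole schema might look like the sketch below. It is written as a JavaScript object so a tiny check can sit next to it. The "type": "object" on the item subschema is an addition not mentioned above, and missingRequired is a hypothetical helper that only looks at the required keyword of items; it is not a real JSON Schema validator.

```javascript
// Assembled schema: "items" is a single subschema with its own
// "properties" and "required" keywords.
const schema = {
  type: "object",
  required: ["version", "categories"],
  properties: {
    version: { type: "string", minLength: 1, maxLength: 1 },
    categories: {
      type: "array",
      items: {
        type: "object", // assumption: each array element is an object
        required: ["title"],
        properties: {
          title: { type: "string", minLength: 1 },
          body: { type: "string", minLength: 1 }
        }
      }
    }
  }
};

// Hypothetical helper (not a real validator): lists array items that lack
// a property named in items.required.
function missingRequired(instance) {
  const required = schema.properties.categories.items.required;
  const problems = [];
  instance.categories.forEach((item, index) => {
    for (const name of required) {
      if (!Object.prototype.hasOwnProperty.call(item, name)) {
        problems.push(`categories[${index}] is missing "${name}"`);
      }
    }
  });
  return problems;
}
```

For real validation, pass the same schema object to whichever JSON Schema library you already use; the helper is only there to show what required on the item subschema is meant to catch.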

JSON schema for an unnamed array?

I need to create a JSON schema for data that comes as an array directly within the root object, unnamed. An MWE for this kind of JSON would be:
{
[
{
"veggieName": "potato",
"veggieLike": true
},
{
"veggieName": "broccoli",
"veggieLike": false
}
]
}
I have seen examples for schemas which validate such an array which is not nested in an object. I have also seen examples which work when the array is named, for example
{
vegetables : [
{
"veggieName": "potato",
"veggieLike": true
},
{
"veggieName": "broccoli",
"veggieLike": false
}
]
}
This second example can be validated by the schema
{
"$id": "https://example.com/arrays.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "A representation of a person, company, organization, or place",
"type": "object",
"properties": {
"vegetables": {
"type": "array",
"items": { "$ref": "#/definitions/veggie" }
}
},
"definitions": {
"veggie": {
"type": "object",
"required": [ "veggieName", "veggieLike" ],
"properties": {
"veggieName": {
"type": "string",
"description": "The name of the vegetable."
},
"veggieLike": {
"type": "boolean",
"description": "Do I like this vegetable?"
}
}
}
}
}
But the problem is, as soon as the name "vegetables" is removed, I was not able to find a way to define a valid schema. How do I properly represent my data structure in a schema?
(MWEs derived from http://json-schema.org/learn/miscellaneous-examples.html).
The schema you are looking for is the following:
{
"$id":"https://example.com/arrays.schema.json",
"$schema":"http://json-schema.org/draft-07/schema#",
"description":"A representation of a person, company, organization, or place",
"type":"array",
"items":{
"type":"object",
"required":[
"veggieName",
"veggieLike"
],
"properties":{
"veggieName":{
"type":"string",
"description":"The name of the vegetable."
},
"veggieLike":{
"type":"boolean",
"description":"Do I like this vegetable?"
}
}
}
}
You also need to change your instance: the original one (the "unnamed" array wrapped in braces) is not valid JSON. It should be:
[
{
"veggieName":"potato",
"veggieLike":true
},
{
"veggieName":"broccoli",
"veggieLike":false
}
]
Unlike XML, where a document can have only a single root element, a JSON document can have either an object or an array as its root value.
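A quick way to see the difference between the two instances is to run both through a JSON parser. The sketch below uses only the built-in JSON.parse; tryParse is a hypothetical wrapper added for illustration.

```javascript
// The array wrapped in braces without a property name, as in the question.
const wrapped = '{ [ { "veggieName": "potato", "veggieLike": true } ] }';

// The same data as a bare top-level array.
const bare =
  '[ { "veggieName": "potato", "veggieLike": true },' +
  '  { "veggieName": "broccoli", "veggieLike": false } ]';

// Hypothetical wrapper: reports whether the text parses instead of throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const wrappedResult = tryParse(wrapped);
const bareResult = tryParse(bare);

console.log(wrappedResult.ok, wrappedResult.error);
console.log(bareResult.ok, Array.isArray(bareResult.value));
```

Only the bare array parses, which is why the schema's top-level type is "array" rather than "object".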

Is it possible to apply a solr document int field value as boost value if a specific field is matched?

Ex.
"docs": [
{
"id": "f37914",
"index_id": "some_index",
"field_1": [
{
"Some value",
"boost": 20.
}
]
},
]
If 'field_1' is matched, then boost by corresponding 'boost' field.
Boost what? The document? The specific field? You can do either.
Anyway, the way to do it is to use Function Queries:
https://lucene.apache.org/solr/guide/6_6/function-queries.html#FunctionQueries-AvailableFunctions
For example, if you want to boost the document (and assuming the score should be 0 when the value doesn't match), you can do something like this:
q=_val_:"if(query($q1), field(boost), 0)"&q1=field_1:"Some Value"
_val_ is just a hook into Solr's function queries; query yields a non-zero score when q1 matches, field is a simple function that returns the value of the field itself, and if lets us combine the two.
So what I ended up doing was using Lucene payloads and Solr 6.6's new DelimitedPayloadTokenFilter feature.
First I created a terms field with the following configuration:
{
"add-field-type": {
"name": "terms",
"stored": "true",
"class": "solr.TextField",
"positionIncrementGap": "100",
"indexAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.DelimitedPayloadTokenFilterFactory",
"encoder": "float",
"delimiter": "|"
}
]
},
"queryAnalyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.SynonymGraphFilterFactory",
"ignoreCase": "true",
"expand": "false",
"tokenizerFactory": "solr.KeywordTokenizerFactory",
"synonyms": "synonyms.txt"
}
]
}
},
"add-field" : {
"name":"terms",
"type":"terms",
"stored": "true",
"multiValued": "true"
}
}
I indexed my documents like so:
[
{
"id" : "1",
"terms" : [
"some term|10.0",
"another term|60.0"
]
}
,
{
"id" : "2",
"terms" : [
"some term|11.0",
"another term|21.0"
]
}
]
I used Solr's function query support to query for a match on terms, grab the attached boost payload, and apply it to the relevancy score:
/solr/payloads/select?indent=on&wt=json&q={!payload_score%20f=terms%20v=$payload_term%20func=max}&fl=id,score&payload_term=some+term
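Hand-encoding that URL is error-prone, so here is a minimal sketch that builds it with the standard URLSearchParams class. It assumes the payload field is the terms field defined above and that Solr is reachable at localhost:8983; both are assumptions, as is the core name payloads taken from the request path.

```javascript
// Query parameters for the payload_score parser; URLSearchParams takes care
// of escaping the local-params syntax and the space in the term.
const params = new URLSearchParams({
  indent: "on",
  wt: "json",
  q: "{!payload_score f=terms v=$payload_term func=max}", // assumes field "terms"
  fl: "id,score",
  payload_term: "some term"
});

// Assumed host and port; "payloads" is the core name from the request above.
const url = "http://localhost:8983/solr/payloads/select?" + params.toString();

console.log(url);
```

The term is kept in its own payload_term parameter and referenced as $payload_term, so only that one value changes between searches.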

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
"index": {
"fields": [
{
"name": "sensorid",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
}
]
},
"type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
"selector": {
"timestamp": {
"$gt": "2015-10-13 16:00:00"
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"_id": "asc"
}
]
}
What have I done wrong?
It seems to me that Cloudant Query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like this:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!

CouchDB View With OR Condition

I have two kinds of documents in CouchDB with the following JSON structure:
1.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Vehicle": {
"type": "STRING",
"name": "Vehicle",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
2.
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"Equipment": {
"type": "STRING",
"name": "Equipment",
"value": "12345"
},
"Start": {
"type": "DATE",
"name": "Start",
"value": "2014-09-10T11:19:00.000Z"
}
}
I want to make one view that finds all documents where doc.Vehicle.value=12345 OR doc.Equipment.value=12345.
How can I write a view that returns both kinds of documents?
Thanks in advance.
Just emit both values from your view (yes, a map function may emit multiple times, with different key-value pairs, for the same doc):
function(doc){
if (doc.Equipment) {
emit(doc.Equipment.value, null)
}
if (doc.Vehicle) {
emit(doc.Vehicle.value, null)
}
}
And request them by the same key:
http://localhost:5984/db/_design/ddoc/_view/by_equip_value?key="12345"
See also the Guide to Views for more info about CouchDB views.
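To see why one key lookup returns both kinds of documents, the map function can be run outside CouchDB against a few sample docs. This is only a sketch: runMap is a hypothetical stand-in for the view engine, emit is passed in as a parameter instead of being a global, and the document ids are made up.

```javascript
// Hypothetical stand-in for CouchDB's view engine: collects emitted rows.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Same logic as the map function above, with emit passed in explicitly.
function mapByValue(doc, emit) {
  if (doc.Equipment) {
    emit(doc.Equipment.value, null);
  }
  if (doc.Vehicle) {
    emit(doc.Vehicle.value, null);
  }
}

// Made-up sample documents in the shape shown in the question.
const docs = [
  { _id: "doc-vehicle", Vehicle: { type: "STRING", name: "Vehicle", value: "12345" } },
  { _id: "doc-equipment", Equipment: { type: "STRING", name: "Equipment", value: "12345" } },
  { _id: "doc-other", Vehicle: { type: "STRING", name: "Vehicle", value: "99999" } }
];

const rows = runMap(mapByValue, docs);

// Equivalent of requesting the view with key="12345".
const matches = rows.filter((row) => row.key === "12345").map((row) => row.id);

console.log(matches);
```

Both the vehicle and the equipment document land under the key "12345", so the OR condition comes from the index itself rather than from the query.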
With Kxepal's version, you cannot tell which type a result row belongs to ("12345" can be either a Vehicle or an Equipment). You can only find out by using "include_docs=true" and looking inside the doc, or by making a second query with the ids of the results.
If you want to see the type (or query by type), you need to extend the view:
..
if(doc.Equipment) {
emit (doc.Equipment.value,doc.Equipment.name);
}
if(doc.Vehicle) {
emit(doc.Vehicle.value,doc.Vehicle.name);
}
Here, the name is the value of the result rows.
But you can also select the type in the query if you put the name as the first item of the key:
if(doc.Equipment) {
emit([doc.Equipment.name,doc.Equipment.value],null);
}
if(doc.Vehicle) {
emit ([doc.Vehicle.name,doc.Vehicle.value],null);
}
Your query for vehicles:
/viewname?startkey=["Vehicle"]&endkey=["Vehicle",{}]
And for equipment:
/viewname?startkey=["Equipment"]&endkey=["Equipment",{}]
Here, the name is the first item of the key array in the result rows.
Maybe this will help : http://de.slideshare.net/okurow/couchdb-mapreduce-13321353
BTW, a better solution would be to restructure the documents:
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Vehicle",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
{
"_id": "4a91f3e8-616a-431d-8199-ace00055763d",
"_rev": "2-9105188217acd506251c98cd4566e788",
"type": "Equipment",
"value":"12345",
"Start": {
"type": "DATE",
"name": "Start", // ? maybe also obsolete, because already inside "Start" Element
"value": "2014-09-10T11:19:00.000Z"
}
}
In this case you need only one emit:
emit([doc.type,doc.value],null)
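The single-emit variant can be tried outside CouchDB in the same spirit. In this sketch emit is passed in as a parameter, the document ids are made up, and filtering on the first key element is a simplification standing in for a startkey=["Vehicle"]&endkey=["Vehicle",{}] range request.

```javascript
// Map function for the restructured documents: one composite key per doc.
function mapByTypeAndValue(doc, emit) {
  emit([doc.type, doc.value], null);
}

// Made-up sample documents using the flattened type/value layout.
const flatDocs = [
  { _id: "v1", type: "Vehicle", value: "12345" },
  { _id: "e1", type: "Equipment", value: "12345" },
  { _id: "v2", type: "Vehicle", value: "99999" }
];

// Collect the rows the view would contain.
const viewRows = [];
for (const doc of flatDocs) {
  mapByTypeAndValue(doc, (key, value) => viewRows.push({ id: doc._id, key, value }));
}

// Simplified stand-in for a key range on the first key element.
function rowsOfType(type) {
  return viewRows.filter((row) => row.key[0] === type);
}

// Exact lookup on the full composite key.
function rowsForKey(type, value) {
  return viewRows.filter((row) => row.key[0] === type && row.key[1] === value);
}

console.log(rowsOfType("Vehicle").map((row) => row.id));
console.log(rowsForKey("Equipment", "12345").map((row) => row.id));
```

With the type as the first key element, the same view answers both "all vehicles" and "this exact vehicle" without a second emit.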
