Vespa: Can we aggregate on nested fields?


In a search definition, fields inside a struct cannot have "attribute" indexing.
http://docs.vespa.ai/documentation/reference/search-definitions-reference.html#field_types
Also, structs and maps are not attributes by default.
The resulting search definition would look something like this:
struct nlp {
    field token type string {
        match: text // can't add indexing here
    }
}
field n type nlp {
    indexing: summary // can't add attribute here
}
How can we write the search definition so that we can group by "n.token"? Is it possible to add attribute or index indexing to struct fields? Or to group by fields that are not attributes?

A struct field cannot be an attribute, and being an attribute is a necessary prerequisite if you want to run grouping with indexed search; see http://docs.vespa.ai/documentation/reference/search-definitions-reference.html#struct
With mode=index, the only thing you can really do with struct fields is include them in the summary (the response). You could add a custom searcher that performs the aggregation over the struct field by analyzing the top K retrieved hits. See http://docs.vespa.ai/documentation/searcher-development.html
With mode=streaming you can run grouping over struct fields; more about streaming here: http://docs.vespa.ai/documentation/streaming-search.html
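The answer suggests aggregating over the top K hits in a custom searcher (which in Vespa would be written in Java). As a language-neutral illustration of that aggregation logic only, here is a minimal Go sketch that does the same counting on the client side. It assumes the default JSON result layout root.children[].fields and assumes the struct field n is rendered as an object with a token member; the type and function names are hypothetical. Note that, unlike real grouping, this only sees the hits that were actually returned.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// vespaResult mirrors only root.children[].fields from a Vespa JSON result.
// Rendering "n" as {"token": "..."} is an assumption for this sketch.
type vespaResult struct {
	Root struct {
		Children []struct {
			Fields struct {
				N struct {
					Token string `json:"token"`
				} `json:"n"`
			} `json:"fields"`
		} `json:"children"`
	} `json:"root"`
}

// tokenCount is one bucket of the client-side "group by n.token".
type tokenCount struct {
	Token string
	Count int
}

// groupByToken counts n.token over the hits in one result page (the top K).
// Buckets are ordered by descending count, then by token for stable output.
func groupByToken(body []byte) ([]tokenCount, error) {
	var res vespaResult
	if err := json.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	for _, hit := range res.Root.Children {
		if t := hit.Fields.N.Token; t != "" {
			counts[t]++
		}
	}
	out := make([]tokenCount, 0, len(counts))
	for t, c := range counts {
		out = append(out, tokenCount{t, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Token < out[j].Token
	})
	return out, nil
}

func main() {
	body := []byte(`{"root":{"children":[
		{"fields":{"n":{"token":"vespa"}}},
		{"fields":{"n":{"token":"search"}}},
		{"fields":{"n":{"token":"vespa"}}}]}}`)
	buckets, err := groupByToken(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(buckets) // [{vespa 2} {search 1}]
}
```

Because the counts come from a single page of hits, raising the number of hits requested trades latency for a more representative aggregate; streaming mode avoids that trade-off by grouping server-side.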

I can't comment yet, hence posting as an answer.
@jkb, does this amount to running a search on some field that is not in a struct, returning a lot of documents, and then synthesizing them in a searcher (chaining a searcher would essentially do something similar)?
Is it also correct to say that nested fields in a document cannot be indexed, and in turn cannot be effectively searched (I don't want to use streaming search), so the structure must always be flat for indexed search to work?
I can achieve a flat structure in most cases, but what about fields holding an array of objects, where I'd want to search on some attribute of such an object?
"x" : [
    object: { attributes },
    object: { attributes }
]

Related

Is there any way to sort on a nested value in Azure Cognitive Search?

My use case is that I have a database of songs that are associated with dances that one can dance to that song. Users can vote on the danceability of a dance to a song, so there is a numeric vote tally for each song/dance combination. A core part of the search functionality is to be able to run an arbitrary search and sort the results by the popularity of a particular dance.
I am currently modeling this by creating a new top level field with a decorated name (e.g. DNC_Salsa or DNC_Waltz) for each dance. This works. But aside from being clumsy, I can't associate other information with a dance. In addition, I have to dynamically add the dance fields, so I have to use the generic SearchDocument type in the C# library rather than using a POCO type.
I'd much prefer to model this with the dance fields as an array of subdocuments where the subdocuments contain a dance name, a vote count and the other information I'd like to associate with a dance.
A simplified example record would look something like this:
{
    "title": "Baby, It's Cold Outside",
    "artist": "Seth MacFarlane",
    "tempo": 119.1,
    "dances": [
        { "name": "cha cha", "votes": 1 },
        { "name": "foxtrot", "votes": 4 }
    ]
}
I gave this a try and received:
{"error":{"code":"OperationNotAllowed","message":"The request is invalid.","details":[{"code":"CannotEnableFieldForSorting","message":"The field 'Votes' cannot be enabled for sorting because it is directly or indirectly contained in a collection, which makes it a multi-valued field. Sorting is not allowed on multi-valued fields. Parameters: definition"}]}}
It looks like Elasticsearch will do what I want:
Sort search results | Elasticsearch Guide [7.17] | Elastic
If I'm reading the Elasticsearch documentation correctly, you can basically say that you'd like to sort on the dances subdocument by first filtering for name == "cha cha" and then sorting on the votes field.
Is there anything like this in Azure Cognitive Search? Or even something more restrictive? I don't need to do arbitrary sorting on anything in the subdocument. I would be happy to only ever sort on the vote count (although I'd have to be able to do that for any dance name).

It's not clear to me what your records or data model look like. However, from the error message you provided, it's clear that you are trying to sort on a multi-valued property. That is logically impossible.
Imagine a property Color that can contain colors like 'Red' or 'Blue'. If you sort by Color, you would get your red values before the blues. If you instead had 'Colors' that can contain multiple values like both 'Red' and 'Blue', how would you sort it? You can't.
So, if you actually want to sort by a property, that property has to contain a single value.
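To make the single-value requirement concrete, here is a minimal in-memory Go sketch (it does not use Azure Cognitive Search at all) of the semantics the question describes: reduce the multi-valued dances collection to one number per song, the vote count for one chosen dance, and sort on that. The types follow the simplified record from the question; "Song B", "Song C", and the helper names are hypothetical, and treating a missing dance as 0 votes is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Dance and Song follow the simplified record from the question.
type Dance struct {
	Name  string
	Votes int
}

type Song struct {
	Title  string
	Dances []Dance
}

// votesFor reduces the multi-valued Dances field to a single value: the vote
// count for the named dance, or 0 if the song has no entry for it.
func votesFor(s Song, dance string) int {
	for _, d := range s.Dances {
		if d.Name == dance {
			return d.Votes
		}
	}
	return 0
}

// sortByDance orders songs by descending votes for one dance. Because the
// sort key is now single-valued, the ordering is well defined.
func sortByDance(songs []Song, dance string) {
	sort.SliceStable(songs, func(i, j int) bool {
		return votesFor(songs[i], dance) > votesFor(songs[j], dance)
	})
}

func main() {
	songs := []Song{
		{"Baby, It's Cold Outside", []Dance{{"cha cha", 1}, {"foxtrot", 4}}},
		{"Song B", []Dance{{"cha cha", 7}}}, // hypothetical record
		{"Song C", nil},                     // no votes for any dance
	}
	sortByDance(songs, "cha cha")
	for _, s := range songs {
		fmt.Println(s.Title, votesFor(s, "cha cha"))
	}
}
```

The key point is that the dance name has to be fixed before sorting; with a name chosen, each song contributes exactly one value, which is what a sortable field needs.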
That said, I have a feeling you are really asking about ranking/boosting, not sorting. Have a look at the examples with boosting and scoring profiles for different genres of music; I believe the use case in these examples could help you solve yours.
https://learn.microsoft.com/en-us/azure/search/index-add-scoring-profiles#extended-example

How can I query Solr to get a list with all field-names prefixed by a string?

I would like to create an output based on the field-names of my Solr index objects.
I have objects like this, e.g.:
{
    "Id": "ID12345678",
    "GroupKey": "Beta",
    "PricePackage": 5796.0,
    "PriceCoupon": 5316.0,
    "PriceMin": 5316.0
}
The Price* fields may vary from object to object: some objects have more than three of them, some fewer, but they are always prefixed with Price.
How can I query Solr to get a list with all field-names prefixed by Price?
I've looked into filters and facets but could not find any clue on how to do this, as all the examples (e.g. regex facets) concern the field value, not the field name itself. Or at least I could not adapt them to that.

You can get a comma-separated list of all existing field names if you query for 0 documents and use the CSV response writer (wt parameter) to generate the field-name list.
For example, if you request /solr/collection/select?q=*:*&rows=0&wt=csv you get a list of all fields. If you only want fields prefixed with Price, you can also add the field list parameter (fl) to limit the fields.
So the request /solr/collection/select?q=*:*&rows=0&wt=csv&fl=Price* should return the following response:
PricePackage,PriceCoupon,PriceMin
This returns all existing fields, including dynamic fields.

Storing an array in MongoDB

I'm new to MongoDB 3.2 and I'm wondering what the accepted method is to store an array.
I have a safe-words array that just contains... words.
I want to store it in MongoDB for fast comparison against a queried word.
Should I just create a document and add objects to it, each with one property called 'word' that contains a word?
Should I create a document with one object that contains a words property holding an array of words?
Maybe something else?
Any ideas?

MongoDB supports a dynamic schema, i.e. the documents stored in the database can have varying sets of fields, with different types for each field.
The following example demonstrates storing an array in a MongoDB document:
{
    _id: ObjectId("56bc838924f5ca3e0d3c9871"),
    "tags": ["mongodb", "NoSQL"]
}
In the above example, tags is an array field that stores all the tags.

golang datastore struct: keeping field unique and required

I'm wondering how best to guarantee that a field is unique, and that the entity isn't saved to the datastore if it isn't. The field should also be required. I am using this field as the stringID and need it to be unique. I know that I can simply try to get an entity by this field, see if it exists, and build logic around it. But is there a simpler way, like declaring in the struct that the field should be unique and/or required? Like the mockup below.
type Car struct {
    Regnr string "required" "unique"
}
Thanks!

From the Datastore API:
By default, for struct pointers, all properties are potentially indexed, and the property name is the same as the field name (and hence must start with an upper case letter). Fields may have a datastore:"name,options" tag. The tag name is the property name, which must be one or more valid Go identifiers joined by ".", but may start with a lower case letter. An empty tag name means to just use the field name. A "-" tag name means that the datastore will ignore that field. If options is "noindex" then the field will not be indexed. If the options is "" then the comma may be omitted. There are no other recognized options.
It is not possible to set those types of tags with Datastore.

Index map values

I have data in which the fields have the following Java data types.
What would be the best way to index this kind of data?
Thanks,
field_a map<string,string>
field_b map<string,array<string>>
How do I define schema.xml for it?

Currently Solr doesn't support a map field type, so you cannot query on a particular key inside the map and retrieve its value. I don't know whether it'll be helpful to you, but I can suggest a way to keep this in Solr.
You can store the map in a field as a JSON-formatted string. Say document1 has map1 in field_a and document2 has map2 in field_a. Now, store some distinct data related to each map in separate fields of the corresponding documents. When you want to query, query on those fields instead of the maps. Then, when you retrieve the JSON-formatted string in the search results, parse it in your application and get the values.
Hope this helps.
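The JSON-string approach amounts to a serialize-on-write, parse-on-read round trip in the application. A minimal Go sketch, assuming the map fields arrive as Go maps (map[string]string for field_a, map[string][]string for field_b); the helper names are hypothetical, and the separate searchable fields mentioned in the answer are not shown, since the keys inside the stored string remain unsearchable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeMapField serializes a map-typed value (field_a or field_b) into a
// JSON string that can be stored in a plain stored string field in Solr.
func encodeMapField[V any](m map[string]V) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// decodeMapField parses the stored string back in the application after the
// document is returned in the search results.
func decodeMapField[V any](s string) (map[string]V, error) {
	var m map[string]V
	err := json.Unmarshal([]byte(s), &m)
	return m, err
}

func main() {
	fieldA := map[string]string{"brand": "acme"}
	fieldB := map[string][]string{"colors": {"red", "blue"}, "sizes": {"M"}}

	a, err := encodeMapField(fieldA)
	if err != nil {
		panic(err)
	}
	b, err := encodeMapField(fieldB)
	if err != nil {
		panic(err)
	}
	fmt.Println(a) // {"brand":"acme"}
	fmt.Println(b) // {"colors":["red","blue"],"sizes":["M"]}

	back, err := decodeMapField[[]string](b)
	if err != nil {
		panic(err)
	}
	fmt.Println(back["colors"]) // [red blue]
}
```

encoding/json writes map keys in sorted order, so the stored string is deterministic for a given map, which keeps re-indexing the same document from producing spurious differences.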
