I'm wondering how best to guarantee that a field is unique, and that the entity isn't saved to the datastore if it isn't. The field should also be required. I am using this field as the stringID, so it needs to be unique. I know that I can simply try to get an entity by this field, see whether it exists, and build logic around that. But is there a simpler way, such as declaring in the struct that the field should be unique and/or required? Something like the mockup below.
type Car struct {
Regnr string "required" "unique"
}
Thanks!
From the Datastore API:
By default, for struct pointers, all properties are potentially
indexed, and the property name is the same as the field name (and
hence must start with an upper case letter). Fields may have a
datastore:"name,options" tag. The tag name is the property name,
which must be one or more valid Go identifiers joined by ".", but may
start with a lower case letter. An empty tag name means to just use
the field name. A "-" tag name means that the datastore will ignore
that field. If options is "noindex" then the field will not be
indexed. If the options is "" then the comma may be omitted. There are
no other recognized options.
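So it is not possible to set those kinds of tags with Datastore: the only recognized option is "noindex", and uniqueness or required checks have to be done in your own code.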
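Since the tags cannot express this, the "get it first and build logic around it" approach from the question is what remains. Here is a minimal, language-agnostic sketch of that logic in Python, using a plain dict as a stand-in for the datastore. InMemoryStore and put_unique are hypothetical names and not part of any Datastore API. In a real datastore the lookup and the write would need to run inside one transaction, otherwise two concurrent writers could both pass the check.

```python
class InMemoryStore:
    """Hypothetical stand-in for a datastore keyed by a string ID."""

    def __init__(self):
        self._entities = {}

    def put_unique(self, regnr, entity):
        # "required": reject an empty key before touching the store.
        if not regnr:
            raise ValueError("Regnr is required")
        # "unique": the field doubles as the key, so an existing key is a duplicate.
        if regnr in self._entities:
            raise KeyError("Regnr %r already exists" % regnr)
        self._entities[regnr] = entity


store = InMemoryStore()
store.put_unique("ABC123", {"Regnr": "ABC123"})
```

A second put_unique call with "ABC123" raises KeyError instead of overwriting the stored entity, and an empty Regnr raises ValueError.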
I have a field containing short texts (a few tokens). I index it as Text rather than String because I need to search within the text.
However, I also need String-style search (matching the entire field).
For example, suppose a field contains Google Search Engine. I currently find the row by searching for "search engine". While preserving this behavior, I need another option that matches the row only if the search term is "google search engine".
I believe this is possible with a regex, but it would be slow.
I wonder whether there is a standard way to do this, or whether I need to add another field with the same content but of the String type.
Use multiple fields - the definition of the second field will differ based on whether you want the search to be case sensitive or not. If you're OK with having a case sensitive field (i.e. "Google" and "google" are different terms), then string is the correct choice.
If you want the field to be case insensitive, use a TextField with a KeywordTokenizer (which keeps the input as a single, large token) with a LowercaseFilter attached (which lowercases the content).
You can then search both fields by using qf (query fields) with the edismax/dismax query parsers and score them differently. If you only need explicit searching (you choose yourself whether to match the whole string or just words in it), using the field name in the regular way would work.
Use a copyField instruction to index the same content into both fields without changing your indexing pipeline. You'll need to reindex your core / collection for the new field to get any values.
And no, you can't do this with a regex, since the regex is applied against the tokens. You already have the tokens split up into smaller parts, so /foo bar/ doesn't have a foo bar token to match against, just foo and bar, and neither of those matches the regex.
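The difference between the two fields can be illustrated with a small Python simulation. This is only a sketch of the idea, not how Solr is implemented: tokenize_text and tokenize_keyword are hypothetical stand-ins for a whitespace-tokenized text field and a KeywordTokenizer plus lowercase filter, and the regex is assumed to have to match an entire token.

```python
import re


def tokenize_text(value):
    # Rough stand-in for a tokenized text field: lowercase, split on whitespace.
    return value.lower().split()


def tokenize_keyword(value):
    # Rough stand-in for KeywordTokenizer + lowercase filter: one single token.
    return [value.lower()]


def regex_matches(tokens, pattern):
    # The pattern is tested against each token, never against the original string.
    return any(re.fullmatch(pattern, token) for token in tokens)


text_tokens = tokenize_text("Google Search Engine")
keyword_tokens = tokenize_keyword("Google Search Engine")

phrase_on_text = regex_matches(text_tokens, "search engine")
full_on_keyword = regex_matches(keyword_tokens, "google search engine")
partial_on_keyword = regex_matches(keyword_tokens, "search engine")
```

In this sketch phrase_on_text is False because no single token contains a space, full_on_keyword is True, and partial_on_keyword is False, which is the whole-field behavior the question asks for.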
In a search definition, the fields inside a struct cannot have "attribute" indexing.
http://docs.vespa.ai/documentation/reference/search-definitions-reference.html#field_types
Also, structs and maps are not attributes by default.
Resulting search definition would look something like this:
struct nlp {
field token type string {
match: text //can't add indexing here
}
}
field n type nlp {
indexing: summary //can't add attribute here
}
How can I write the search definition so that we can group by "n.token"? Is it possible to add attribute or indexing to struct fields? Or to group on fields that are not attributes?
A struct field cannot be an attribute, which is a necessary prerequisite if you want to run grouping with indexed search; see http://docs.vespa.ai/documentation/reference/search-definitions-reference.html#struct
The only thing you can really do with struct fields with mode=index is to have them as part of the summary (response). You could add a custom searcher that aggregates over the struct field by analyzing the top K retrieved hits. See http://docs.vespa.ai/documentation/searcher-development.html
With mode=streaming you can run grouping over struct fields; more about streaming here: http://docs.vespa.ai/documentation/streaming-search.html
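The aggregation a custom searcher would perform over the top K hits could look roughly like the following. Vespa searchers are written in Java, so this Python is only a sketch of the counting logic. The hit layout (a dict with an "n" struct holding a "token") and the name group_by_token are assumptions made for illustration.

```python
from collections import Counter


def group_by_token(hits, k):
    """Count occurrences of n.token among the first k hits."""
    counts = Counter()
    for hit in hits[:k]:
        # A hit may lack the struct field entirely; skip it in that case.
        token = hit.get("n", {}).get("token")
        if token is not None:
            counts[token] += 1
    return counts


# Hypothetical summary data for four retrieved hits.
hits = [
    {"n": {"token": "a"}},
    {"n": {"token": "b"}},
    {"n": {"token": "a"}},
    {"n": {"token": "c"}},
]
groups = group_by_token(hits, 3)
```

Note that the counts only cover the K hits that were actually retrieved, so they are not the same as grouping over every matching document.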
I can't comment yet, hence posting this as an answer.
@jkb does this amount to us running a search on some field not in a struct, returning a lot of documents, and then synthesising them in a searcher (chaining a searcher would essentially do similar stuff)?
Is it also correct to say that nested fields in a document cannot be indexed and, in turn, effectively searched (I don't want to use streaming search), and hence that the structure must always be flat for index search to work?
I can achieve a flat structure in most cases, but what about fields holding an array of objects, where I'd want to search on some attribute of such an object?
"x" : [
object: { attributes},
object: {attributes}
]
So I have the following multi-valued fields in my Solr docs (just an example).
"type": [
"Single",
"Series",
"Other association"
],
"name": [
"Name associated with first type",
"Name associated with second type",
"Name associated with third type"
]
I'm able to query both fields with:
name:"Name associated with third type" AND type:"Series"
But I need to query the value of name and also match the type value at the same array index. So the above example should not hit, because "Name associated with third type" is the third name and "Series" is the second type.
A query that should produce a hit would be this, because both values are in the same array index.
name:"Name associated with second type" AND type:"Series"
But as it stands, both queries match. I hope this was explained properly. I don't even know if this is possible.
No, it is not possible to match across two multi-valued fields at the same index. You will need to fetch all the docs that match and deal with them on the client side.
However, if you will always have the full values for both type and name, then you can alter your schema and make it work like this.
Define a new field called type_names, which is a multi-valued string field. Modify your indexer to set values in that field like this:
type_names: [Single#Name associated with first type,
Series#Name associated with second type,
Other association#Name associated with third type]
I am using # but you can use any delimiter string that works for you.
(You may want to lower case everything in your index if you want case-insensitive matching and also lower-case your query values.)
Then query this field like:
type_names:(Series#Name associated with second type)
If it so happens that you know the full value for type but only one word in name (say second), then you can use regex match:
type_names:/Series#.*second.*/
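The indexer change described above amounts to zipping the two parallel lists into one list of delimited strings. Here is a minimal Python sketch, where build_type_names and has_pair are hypothetical helper names and the matching is a local simulation rather than a Solr query:

```python
import re


def build_type_names(types, names, delimiter="#"):
    # Pair values by array position; the lists must line up one to one.
    if len(types) != len(names):
        raise ValueError("type and name must have the same number of values")
    return [t + delimiter + n for t, n in zip(types, names)]


def has_pair(values, type_, name, delimiter="#"):
    # Exact match on the combined value, like type_names:(Series#Name ...).
    return (type_ + delimiter + name) in values


types = ["Single", "Series", "Other association"]
names = [
    "Name associated with first type",
    "Name associated with second type",
    "Name associated with third type",
]
type_names = build_type_names(types, names)

same_index = has_pair(type_names, "Series", "Name associated with second type")
different_index = has_pair(type_names, "Series", "Name associated with third type")
# Simulates the regex form, assuming the pattern must match a whole value.
regex_hit = any(re.fullmatch(r"Series#.*second.*", v) for v in type_names)
```

Here same_index and regex_hit are True while different_index is False, because "Series" and the third name never share a combined value. The delimiter must be a string that cannot occur inside a type or name value.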
For fields with a unique name, you can just do doc.field('prop').value, but when you try to do that with a field with a name that occurs multiple times, you get
ValueError: Must have exactly one field with name prop, but found 3.
I've looked through the source of the Document class, but I couldn't find a method that returns a list of values associated with fields of a given name. Is there a good way to do that?
To be clear, I'm trying to do this with a Document, not a ScoredDocument.
When looking through the source code, I ignored any methods that started with underscores, because most of them were either private or methods like __eq__. I clearly shouldn't have done that, because the Document class implements __getitem__, so if a document has multiple fields named 'prop', you can do the following to get all its values:
values = [field.value for field in doc['prop']]
What are the advantages of the multivalued field option in Solr?
I have a field with comma-separated keywords.
I can do two things:
make a non-multivalued text field
make a multivalued text field which contains each keyword
I can still query in both cases. So what are the advantages of multivalued over non-multivalued?
Advantages of multivalued: you don't need to change the document design. If a document contains multiple values in one field, Solr/Lucene can handle that field.
Another advantage: multiple values can describe a document more exactly (think of the tags of a blog post, for example).
Advantages of non-multivalued: you can use specific features that require a single term (word) in one field, such as spell checking. It is also a benefit for clustering (Carrot) or grouping, which mostly work better on non-multivalued fields.
Querying the multivalued field will return what you want.
Example: doc1 has the keyword 'abc', and doc2 has the keyword 'abcd'. If you query by keyword 'abc', only doc1 should be matched.
With the non-multivalued approach both documents will match, because you'll be using LIKE-style syntax.
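The 'abc' versus 'abcd' example can be simulated in a few lines of Python. This is a sketch that assumes untokenized string values; like_match and exact_value_match are hypothetical names standing in for a substring (LIKE-style) query on the single comma-separated field and an exact value query on the multivalued field.

```python
# The same keywords stored two ways: one comma-separated string per document,
# or one list of separate values per document.
docs_single = {"doc1": "abc,xyz", "doc2": "abcd,xyz"}
docs_multi = {"doc1": ["abc", "xyz"], "doc2": ["abcd", "xyz"]}


def like_match(docs, term):
    # Substring search over the whole string, as a LIKE-style query would do.
    return sorted(doc_id for doc_id, value in docs.items() if term in value)


def exact_value_match(docs, term):
    # Each stored value is compared as a whole against the term.
    return sorted(doc_id for doc_id, values in docs.items() if term in values)


single_hits = like_match(docs_single, "abc")
multi_hits = exact_value_match(docs_multi, "abc")
```

In this simulation single_hits contains both doc1 and doc2, while multi_hits contains only doc1.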
Multivalued fields can be very handy. Say you have many fields and you wish to search several of them but not all. You can create a multivalued field that includes all the fields you want to search, and search in it.
For example, say you have fields whose values may be strings or numbers, and you wish to search all the string values found in the document. You can create a multivalued field for all the string values and search in it.