So I have the following multi-valued fields in my Solr docs (just an example).
"type": [
"Single",
"Series",
"Other association"
],
"name": [
"Name associated with first type",
"Name associated with second type",
"Name associated with third type"
]
I'm able to query both fields with:
name:"Name associated with third type" AND type:"Series"
But I need the queried name value and the type value to match at the same array index. So the above example should not hit, because "Name associated with third type" is the third name while "Series" is the second type.
A query that should produce a hit would be this, because both values are in the same array index.
name:"Name associated with second type" AND type:"Series"
But as things stand, any combination matches. I hope I've explained this properly; I don't even know if this is possible.
No, it's not possible to match across two multi-valued fields at the same index position. You will need to get all the docs that match and deal with them on the client side.
However, if you will always have the full values for both type and name then you can alter your schema and make it work like this.
Define a new field called type_names which is a multi-valued string field. Modify your indexer to set values in that field like
type_names: ["Single#Name associated with first type",
 "Series#Name associated with second type",
 "Other association#Name associated with third type"]
I am using # but you can use any delimiter string that works for you.
(You may want to lower case everything in your index if you want case-insensitive matching and also lower-case your query values.)
Then query this field like:
type_names:(Series#Name associated with second type)
If it so happens that you know the full value for type but only one word in name (say second), then you can use regex match:
type_names:/Series#.*second.*/
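For illustration, the indexing-side transformation could be sketched like this in Python (the variable names are illustrative; this only shows how the combined values are built, not how they are sent to Solr):

```python
# Sketch: build the combined type_names values before indexing.
# The "#" delimiter and the lower-casing mirror the approach described
# above; nothing here is Solr-specific API.
types = ["Single", "Series", "Other association"]
names = [
    "Name associated with first type",
    "Name associated with second type",
    "Name associated with third type",
]

# Pair the values by array index and join each pair with the delimiter.
type_names = [f"{t}#{n}".lower() for t, n in zip(types, names)]
# type_names[1] is now "series#name associated with second type", which
# a lower-cased query for Series#Name associated with second type matches.
```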
Related
I have a CSV string field, say "field1", in Solr which can have a value similar to 1,5,7
Now, I want to get this record if I pass values:
1,5,6,7
OR
1,5,7,10
OR
1,5,7
Basically, any of these inputs should return this record from Solr.
Is there any way to achieve this? I am open to a schema change if it helps.
The Standard Tokenizer (used in text fields like text_general) will not split on commas if there is no space in between characters.
That means that "1,2,3" will be indexed as a single token ("1,2,3") but it will index "1, 2, 3" as three tokens ("1", "2", "3").
If you can ensure there is a space after each comma, both in the value you index and in the value you use in your search query, you might be able to achieve what you want by indexing your field as text_general.
You can use the Analysis Screen in Solr to see how your value will be indexed and searched and see if any of the built-in field types gives you what you want.
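If your source data doesn't already have spaces after the commas, that normalization could be done client-side before indexing and before querying - a minimal sketch (the function name is illustrative):

```python
def normalize_csv(value: str) -> str:
    """Ensure a space follows each comma so the Standard Tokenizer
    splits the value into separate tokens ("1", "5", "7") instead of
    indexing it as a single token ("1,5,7")."""
    return ", ".join(part.strip() for part in value.split(","))

# Apply the same normalization to the query values before searching.
```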
I have an index field called: texts
The field contains values like: 12/1
And also: 1/12
The problem is when I query: texts:"1/*"
It also finds 12/1; it's as if the slash doesn't have any meaning.
How can I limit the results so the order matters?
(I've tried texts:"1\/*" and it's not working)
The type of the field:
<fieldType class="org.apache.solr.schema.TextField" name="TextField">
The problem is that you're using the TextField type, which tokenizes your text and then applies additional filtering (lower-casing, etc.). In your case the index doesn't contain the value 12/1; it contains two tokens, 12 and 1, for both the first and second values. Your search for 1/* therefore matches both records, because the search is performed against the token 1 that was generated when your input was tokenized.
To keep the string from being tokenized you can:
either use the StrField type instead - but in this case the string will be indexed as-is, without lower-casing, etc.
or, if you want lower-casing etc., define a new field type that uses solr.KeywordTokenizerFactory as its tokenizer, and add the corresponding filters.
You can read more in the DataStax documentation. Also note that, starting with version 6, the default type for text data is StrField, and you need to explicitly define TextField if you need tokenization, etc.
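For reference, such a field type definition might look like this in schema.xml (a sketch; the type name lowercase_keyword is an assumption, and you can add or drop filters as needed):

```xml
<fieldType name="lowercase_keyword" class="solr.TextField">
  <analyzer>
    <!-- Keeps the whole value as a single token instead of splitting it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lower-cases the token for case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```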
I have the following data in my index:
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where lists has Pike in it.
Though a full search query works with any, I couldn't get contains to work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only Pike.
$filter=lists/any(t: t eq 'Pike')
I guess eq does a full text search; is there any way, with the given data structure, to make this query work?
Currently the field lists has no searchable property, only the filterable property.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.
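For reference, the corresponding field definition in the index schema might look like this (a sketch; the filterable/searchable attributes shown are assumptions - set them to whatever your queries need):

```json
{
  "name": "lists",
  "type": "Collection(Edm.ComplexType)",
  "fields": [
    { "name": "id", "type": "Edm.String", "filterable": true },
    { "name": "name", "type": "Edm.String", "filterable": true, "searchable": true }
  ]
}
```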
I'm wondering how best to guarantee that a field is unique, and that the entity isn't saved to the datastore if it isn't. The field should also be required. I am using this field as a string ID and need it to be unique. I know that I can simply try to get an entity by this field, see if it exists, and build logic around that. But is there a simpler way, like declaring in your struct that the field should be unique and/or required? Like the mockup below.
type Car struct {
Regnr string "required" "unique"
}
Thanks!
From the Datastore API:
By default, for struct pointers, all properties are potentially indexed, and the property name is the same as the field name (and hence must start with an upper case letter). Fields may have a datastore:"name,options" tag. The tag name is the property name, which must be one or more valid Go identifiers joined by ".", but may start with a lower case letter. An empty tag name means to just use the field name. A "-" tag name means that the datastore will ignore that field. If options is "noindex" then the field will not be indexed. If the options is "" then the comma may be omitted. There are no other recognized options.
It's not possible to set those kinds of tags with Datastore.
I'm attempting to index around 30,000 database records in a single collection and per my requirements I need to be able to include a list of items in a single custom field - and use that in my search.
Here's an example of my index:
<cfindex collection = "myCollection"
action = "refresh"
type = "custom"
query = "Local.myQuery"
key = "ID"
title="Title"
applications_s="A_Comma_Separated_List"
body = "a_field,a_nother_field">
In this example, applications_s is a dynamic custom field (introduced in CF10) containing a list of application IDs.
An example of content for this field would be:
T1,T2,B4,G1
This all indexes splendidly, however I've been unable to figure out how to search, using a single item in the applications list as criteria.
So, I'd like to be able to do this:
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="test AND applications_s:T1">
This should return all records that contain the word 'test' in the body and also contain 'T1' in the applications field. However, I can't find a criteria syntax that treats the contents of the custom field as a comma-separated list; it seems to work only as a string. Therefore my example record wouldn't be returned unless I include a wildcard - which could cause problems with extra records being returned by mistake.
Is there any way to explicitly specify that my custom field is a list and should contain my specified value?
I managed to get the following to work on CF 9.0.1. Although the MYCUSTOMNAME_TYPE fields (e.g. applications_s) are CF10-only, I was able to use the custom1 field and specify it as a "string" type by editing the collection's schema.xml and restarting Solr. You shouldn't have to do this on CF10.
1) In the query you're indexing, add TWO commas to the beginning of the application list column, and ONE at the end, so an example row would look like:
,,T1,T1B,T2,B4,G1,
You could do this either in your SQL using concatenation (preferable), or by post-processing the query result with Query-of-Queries, or QueryNew() and looping over the query to build a copy.
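For example, the padding could be done per row when post-processing (a sketch; the variable name is illustrative):

```cfml
<!--- Illustrative: build the padded value for one row, per step 1 --->
<cfset paddedList = ",," & applicationList & ",">
<!--- paddedList is now something like ",,T1,T1B,T2,B4,G1," --->
```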
2) Index the query with cfindex as in your question, using applications_s to ensure the field is a string type, not text. We don't want the list to be "tokenised" as words. The commas are critical and we don't want them to be ignored.
3) In your cfsearch pad the criteria as follows:
<cfset searchString= "test">
<cfset applicationFilter = "T1">
<cfsearch name="Local.qSearch"
collection="myCollection"
criteria="#searchString# AND applications_s:,*,#applicationFilter#,*">
Note there are 3 commas and 2 wildcard asterisks altogether. The first comma is there because you cannot start a Solr query with a wildcard. The second and third commas ensure that the wildcard search for T1 does not match T1B.