How to get all solr field names except for multivalued fields? - solr

I'm new to Solr and I'm trying to query for field names, excluding fields with multiValued=true.
So far I have
select?q=*:*&wt=csv&rows=0&facet
which returns all the fields.
Is there a way to modify the query to check if a field is multivalued?

You can retrieve information about all the defined fields through the Schema API. Each field entry in the response contains a multiValued property set to true if the field is defined as multivalued:
v1 API:
http://localhost:8983/solr/techproducts/schema/fields
v2 API:
http://localhost:8983/api/collections/techproducts/schema/fields
{
"fields": [
{
"indexed": true,
"name": "_version_",
"stored": true,
"type": "long"
},
{
"indexed": true,
"multiValued": true, <----
"name": "cat",
"stored": true,
"type": "string"
},
],
"responseHeader": {
"QTime": 1,
"status": 0
}
}
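As a rough sketch of how a client could use this response, the Python below filters the parsed fields list down to the names that are not flagged as multivalued. The helper name single_valued_field_names is hypothetical, and the sample payload simply mirrors the response above. One assumption to keep in mind: a field is treated as single-valued when it carries no explicit multiValued: true, so a field that inherits the flag from its field type would not be caught unless the response includes inherited properties.

```python
def single_valued_field_names(schema_response):
    """Return names of fields not explicitly marked multiValued.

    Assumes `schema_response` is the parsed JSON from the Schema API
    fields endpoint (hypothetical helper, not part of Solr).
    """
    return [
        field["name"]
        for field in schema_response.get("fields", [])
        if not field.get("multiValued", False)
    ]


# Sample payload mirroring the response shown above.
sample = {
    "fields": [
        {"indexed": True, "name": "_version_", "stored": True, "type": "long"},
        {"indexed": True, "multiValued": True, "name": "cat",
         "stored": True, "type": "string"},
    ],
    "responseHeader": {"QTime": 1, "status": 0},
}

print(single_valued_field_names(sample))
```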

Related

Azure Cognitive Search Complex object filtering

I have an index in Azure Cognitive Search but can't seem to find the right syntax to query it for what I need.
I have documents that look like the ones below, and I want to pass in a search for "black denim shirt" and have it matched against each item object in the document rather than against the whole document.
I need the match to be confined to a single object because I don't want the "black" and "denim" from the "black denim shirt" query to match a "black denim jeans" item. The match (or the higher-ranked result) should therefore be Document 2.
Document 1:
{
"id": "Style1",
"itemKeyWords": [
{
"productKeyWords": "shirt,oversized shirt,denim",
"attributeKeyWords": "blue"
},
{
"productKeyWords": "Skinny, denim, jeans",
"attributeKeyWords": "black"
}
]
}
Document 2:
{
"id": "Style2",
"itemKeyWords": [
{
"productKeyWords": "shirt,oversized shirt,denim",
"attributeKeyWords": "black"
},
{
"productKeyWords": "Skinny, denim, jeans",
"attributeKeyWords": "blue"
}
]
}
I have itemKeyWords set up in the index as follows:
{
"name": "itemKeyWords",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "productKeyWords",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": false,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "en.lucene",
"normalizer": null,
"synonymMaps": []
},
{
"name": "attributeKeyWords",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": false,
"facetable": true,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "en.lucene",
"normalizer": null,
"synonymMaps": []
}
]
}
I have made various attempts using this as a guide but can't seem to get the syntax right:
https://learn.microsoft.com/en-gb/azure/search/search-howto-complex-data-types?tabs=portal
Unfortunately, as of today, it is not possible to make "search" requests (queries that rely on the tokenized content) that enforce the requirement to have all matches within a specific entry of a complex object collection. This is only supported for filters right now (as long as the filter does not rely on the search.in function).
I can think of two (less than ideal) workarounds:
1. Index each entry of the collection as a separate document.
2. Flatten the sub-fields into a single field:
AggregateField: "Skinny, denim, jeans. black"
And then issue a query that uses proximity search (to make sure all terms are within a certain distance of each other):
queryType=full&search="black denim jeans"~5
If it's important to keep the structured version of the content in the document (attributes and keywords separately), you can index it alongside the aggregated field for retrieval purposes. By using select and searchFields, you can match against one field while returning different fields in the response:
queryType=full&search="black denim jeans"~3&searchFields=aggregatedFields&select=productKeyWords, attributeKeyWords
or
queryType=full&search=aggregatedFields:"black denim jeans"~3&select=productKeyWords,attributeKeyWords
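The two workarounds can be combined, and the reshaping step might look like the Python sketch below: each entry of itemKeyWords becomes its own document with a flattened aggregatedFields value. The function name, the parentId field, and the "parent id plus position" key scheme are all hypothetical choices made for illustration; only the sub-field names and aggregatedFields come from the text above.

```python
def split_into_item_documents(document):
    """Turn one document with an itemKeyWords collection into one
    document per item, each with a flattened aggregate field."""
    item_docs = []
    for position, item in enumerate(document["itemKeyWords"]):
        product = item["productKeyWords"]
        attribute = item["attributeKeyWords"]
        item_docs.append({
            # Hypothetical key scheme: parent id plus item position.
            "id": f"{document['id']}-{position}",
            # Hypothetical field linking back to the original document.
            "parentId": document["id"],
            "productKeyWords": product,
            "attributeKeyWords": attribute,
            # Flattened field intended only for matching.
            "aggregatedFields": f"{product}. {attribute}",
        })
    return item_docs


style2 = {
    "id": "Style2",
    "itemKeyWords": [
        {"productKeyWords": "shirt,oversized shirt,denim",
         "attributeKeyWords": "black"},
        {"productKeyWords": "Skinny, denim, jeans",
         "attributeKeyWords": "blue"},
    ],
}

item_documents = split_into_item_documents(style2)
for doc in item_documents:
    print(doc["id"], "->", doc["aggregatedFields"])
```

Keeping the structured sub-fields next to the aggregate means the query can search one field and return the others.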

Azure Cognitive Search prefix searching as single token

I'm trying to create an Azure Search index with a searchable Name field that should not be tokenized and be treated as a single string.
So if I have two values:
"Total Insurance"
"Invoice Total"
With a search term like this: search=Total*, then only "Total Insurance" should be returned because it starts with "Total".
My assumption was that the 'keyword' analyzer is to be used for this type of search
https://learn.microsoft.com/en-us/azure/search/index-add-custom-analyzers#built-in-analyzers
But it doesn't seem to work like that: search=Total* doesn't return any results.
Is there a different setup for this type of search?
Something like this is required:
{
"name":"myIndex",
"fields": [
{
"name":"Name",
"type":"Edm.String",
"searchable":true,
"filterable": true,
"retrievable": true,
"sortable": true,
"searchAnalyzer":"keyword",
"indexAnalyzer":"prefixAnalyzer"
}
],
"analyzers": [
{
"name":"prefixAnalyzer",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"keyword_v2",
"tokenFilters":[ "lowercase", "my_edgeNGram" ]
}
],
"tokenFilters": [
{
"name":"my_edgeNGram",
"@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram":3,
"maxGram":7
}
]
}
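To see why this setup restricts matches to values that start with the search term, here is a small Python simulation of the index-time analysis. It is not Azure's implementation, only a sketch of the same three steps (whole value as one token, lowercase, front edge n-grams of 3 to 7 characters). The function names are hypothetical, and the query is assumed to arrive as a single, already lowercased token of 3 to 7 characters.

```python
def index_tokens(value, min_gram=3, max_gram=7):
    """Simulate keyword tokenizer + lowercase + front edge n-grams."""
    token = value.lower()  # the whole field value as one lowercase token
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def prefix_matches(query, value):
    """True if the query, taken as one token, equals an indexed n-gram."""
    return query in index_tokens(value)


names = ["Total Insurance", "Invoice Total"]
print(index_tokens("Total Insurance"))
print([name for name in names if prefix_matches("total", name)])
```

Because every indexed token begins at the first character of the value, "Invoice Total" never produces a token equal to "total".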

Manipulate field value of copy-field in Apache Solr

I have a simple string "PART_NUMBER" value as a field in Solr. I would like to add an additional field that places that value in a URL. To do this, I created a new field type, a field, and a copy field:
"add-field-type": {
"name": "endpoint_url",
"class": "solr.TextField",
"positionIncrementGap": "100",
"analyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.PatternReplaceFilterFactory",
"pattern": "([\\s\\S]*)",
"replacement": "http://myurl/$1.jpg"
}
]
}
},
"add-field": {
"name": "URL",
"type": "endpoint_url",
"stored": true,
"indexed": true
},
"add-copy-field":{ "source":"PART_NUMBER", "dest":"URL" }
As some of you probably guessed, my query output looks like
{
"id": "1",
"PART_NUMBER": "ABCD1234",
"URL": "ABCD1234",
"_version_": 1645658574812086272
}
This is because the endpoint_url field type only modifies the indexed value, not the stored value that is returned. Indeed, when I run the analysis on the field, I get
http://myurl/ABCD1234.jpg
My question: Is there any way to apply a tokenizer or filter and feed it back in to the field value? I would prefer this output when returning the result:
{
"id": "1",
"PART_NUMBER": "ABCD1234",
"URL": "http://myurl/ABCD1234.jpg",
"_version_": 1645658574812086272
}
Is this possible to do in Solr?
The solution was posted in this question:
Custom Solr analyzers not being used during indexing
I need to use an update request processor to change the field value before analysis. The process is described here:
https://lucene.apache.org/solr/guide/8_1/update-request-processors.html
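The transformation such a processor needs to perform can be sketched in Python: copy PART_NUMBER into URL and rewrite the value before it is stored. This illustrates the logic only and is not Solr configuration; the function name add_url_field is hypothetical. It uses an anchored pattern instead of the ([\s\S]*) pattern from the question, an assumption made so that the value is replaced exactly once.

```python
import re


def add_url_field(doc, source="PART_NUMBER", dest="URL"):
    """Copy `source` into `dest`, rewriting the value into a URL.

    Mirrors a clone-then-regex-replace step applied to the field
    value before it is stored (illustrative helper only).
    """
    updated = dict(doc)  # leave the caller's document untouched
    # Anchored pattern so the whole value is replaced exactly once.
    updated[dest] = re.sub(
        r"^(.+)$", r"http://myurl/\g<1>.jpg", str(doc[source])
    )
    return updated


original = {"id": "1", "PART_NUMBER": "ABCD1234"}
result = add_url_field(original)
print(result)
```

The same function could also be applied client-side before documents are sent to Solr, which is a different technique from configuring an update processor chain on the server.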

"Unknown Error: mango_idx :: {no_usable_index,missing_sort_index}"}

I have the following index:
{'type': 'text',
'name': 'album-rating-text',
'index': {'fields': [
{'type': 'string', 'name': 'user_id'},
{'type': 'string', 'name': 'album_id'},
{'type': 'number', 'name': 'timestamp'}
]}}
Here is the query:
{'sort': [
{'user_id': 'desc'},
{'album_id': 'desc'},
{'timestamp': 'desc'}
],
'limit': 1,
'fields': ['user_id', 'album_id', 'timestamp'],
'selector': {
'$and': [
{'user_id': {'$eq': 'a@a.com'}},
{'album_id': {'$in': ['bf129f0d', '380e3a05'
]
}}]}}
The error:
{
"error":"unknown_error",
"reason":"Unknown Error: mango_idx :: {no_usable_index,missing_sort_index}"
}
I've seen a similar question; however, all the fields that I'm indexing on are in my sort list.
Update:
As a workaround, I attempted to simplify by dropping the timestamp field:
{"type": "text",
"name": "album-rating-text",
"index": {"fields": [
{"type": "string", "name": "user_id"},
{"type": "string", "name": "album_id"}
]}}
And queried it like so:
{"selector": {"$and": [
{"user_id": {"$eq": "a@a.com"}},
{"album_id": {"$in": ["bf129f0d", "380e3a05"]}
}]},
"fields": ["user_id", "album_id"]}
I get the following warning and no documents:
{"warning":"no matching index found, create an index to optimize query time",
"docs":[
]}
To sort on a custom field, that field must first be added to a query index manually.
Cloudant doesn't do this automatically, because it is resource-intensive:
"The example in the editor shows how to index the field "foo" using
the json type index. You can automatically index all the fields in all
of your documents using a text type index with the syntax '{ "index":
{}, "type": "text" }', Note that indexing all fields can be resource
consuming on large data sets."
You can do this using the Cloudant dashboard. Go to your database and look for "Queryable indexes". Click Edit.
Add your field to the default template:
{
"index": {
"fields": [
"user_id"
]
},
"type": "json"
}
Press "Create index".
The field "user_id" is now queryable, and you can sort on it.
Every field has to be added manually, or you can index all fields at once with:
{ "index": {}, "type": "text" }
Video instructions for creating Query-index:
https://www.youtube.com/watch?v=B3ZkxSFau8U
Try using a JSON index instead of the text index:
{
"type": "json",
"name": "album-rating-text",
"index": {
"fields": ["user_id", "album_id", "timestamp"]
}
}
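For illustration, the request bodies for this approach could be assembled as in the Python sketch below. The helper names are hypothetical, and the user id is a placeholder. The sketch assumes the first body is posted to the database's _index endpoint and the second to its _find endpoint; it only builds the payloads and does not contact a server, so whether the sort is accepted still depends on the database.

```python
def build_index_body(name, fields):
    """Body for creating a JSON-type query index."""
    return {"type": "json", "name": name, "index": {"fields": list(fields)}}


def build_find_body(user_id, album_ids, sort_fields, limit=1):
    """Body for the query, sorting descending on every indexed field."""
    return {
        "selector": {
            "$and": [
                {"user_id": {"$eq": user_id}},
                {"album_id": {"$in": list(album_ids)}},
            ]
        },
        "fields": list(sort_fields),
        "sort": [{field: "desc"} for field in sort_fields],
        "limit": limit,
    }


fields = ["user_id", "album_id", "timestamp"]
index_body = build_index_body("album-rating-text", fields)
# "user@example.com" is a placeholder value.
find_body = build_find_body("user@example.com", ["bf129f0d", "380e3a05"], fields)
print(index_body)
print(find_body)
```

Deriving the sort list from the same field list as the index keeps the two in step.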
If I remember correctly, my query requirements changed and I chose to use a standard Cloudant Search index instead of a Mango index.

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
"index": {
"fields": [
{
"name": "sensorid",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
}
]
},
"type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
"selector": {
"timestamp": {
"$gt": "2015-10-13 16:00:00"
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"_id": "asc"
}
]
}
What have I done wrong?
It seems to me that Cloudant Query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!
