Partial update in Azure Cognitive Search document

Is there a way for patch/partial update in Azure cognitive search?
I need to update just one field in the search document.
We have 100 fields (including complex types) in the search document. If I want to update a single field that is of a complex type (a list of objects), is there an option to update just that one property?
Or do we always need to read the complete existing document, update the required field, and merge?

If you are trying to update your index definition, you may have to drop and rebuild it unless you are only making any of the following changes:
Add new fields to a fields collection
Add newly created fields to a suggester
Add or change scoring profiles
Add or change encryption keys
Add new custom analyzers
Change CORS options
Change existing fields with any of these three modifications:
Change retrievable (values are true or false)
Change searchAnalyzer (used at query time)
Add or change synonymMaps (used at query time)
See https://learn.microsoft.com/en-us/rest/api/searchservice/update-index for more information.
If you want to update documents, you can use the "mergeOrUpload" search action to update any documents in your index.
See https://learn.microsoft.com/en-us/rest/api/searchservice/addupdate-or-delete-documents for more information.
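As a sketch, a request body for the mergeOrUpload action could be built like this. The key field name ("id") and the patched field ("tags") are hypothetical placeholders and must match your own index definition:

```python
import json

# Sketch of a request body for the Azure Cognitive Search
# "Add, Update or Delete Documents" REST API using the mergeOrUpload action.
# The key field ("id") and patched field are hypothetical placeholders.
def build_merge_payload(doc_key, field_name, field_value):
    body = {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "id": doc_key,            # the index's key field
                field_name: field_value,  # only fields listed here change
            }
        ]
    }
    return json.dumps(body)

payload = build_merge_payload("42", "tags", ["updated"])
```

One caveat for the complex-collection case in the question: merge replaces the entire value of a collection field, so to change one object inside a list of objects you still have to send the whole updated list for that field.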

Related

SOLR indexing arbitrary data

Let's say you have a simple forms automation application, and you want to index every submitted form in a Solr collection. Let's also say that form content is open-ended so that the user can create custom fields on the form and so forth.
Since users can define custom forms, you can't really predefine fields to Solr, so we've been using Solr's "schema-less" or managed schema mode. It works well, except for one problem.
Let's say a form comes through with a field called "ID" and a value of "9". If this is the first time Solr has seen a field called "ID", it dutifully updates its schema, and since the value of this field is numeric, Solr assigns it one of its numeric data types (we see "plong" a lot).
Now, let's say that the next day, someone submits another instance of this same form, but in the ID field they type their name instead of entering a number. Solr rejects this record and won't index it, because the schema says ID should be numeric, but on this record it's not.
The way we've been dealing with this so far is to trap the exception we get when a field's data type disagrees with the schema, and then we use the Solr API to alter the schema, making the field in question a text or string instead of a numeric.
Of course, when we do this, we need to reindex the entire collection since the schema changed, and so we need to persist all the original data just in case we need to re-index everything after one of these schema data-type collisions. We're big Solr fans, but at the same time, we wonder whether the benefits of using the search engine outweigh all this extra work that gets triggered if a user simply enters character data in a previously numeric field.
Is there a way to just have Solr always assign something like "text_general" for every field, or is there some other better way?
I would say that you might need to handle the ID values at your application end.
It would be good to add validation for the ID, deciding up front whether it should be a string or numeric.
This would resolve your issue permanently: once the type is decided, you don't have to do anything on the Solr side.
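A minimal sketch of that application-side validation, assuming you decide IDs are always strings (the field name is hypothetical):

```python
def normalize_id(value):
    """Coerce a submitted ID to a string before sending it to Solr,
    so "9" and "Alice" both land in the same string-typed field."""
    return str(value).strip()
```

With this in place, the first submitted value can never lock the Solr field into a numeric type.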
The alternative approach would be to have a fixed schema.xml.
In it, add a field Id with a fixed fieldType.
I would suggest going with string as the fieldType for Id if you don't want to tokenize the data and want exact matches in search.
If you would like flexibility when searching the Id field, you can use the text_general field type instead.
You can also create your own fieldType, with a tokenizer and filters of your choice, according to your requirements for the Id field.
Also, don't use schemaless mode in production. You can also map your field names to a dynamic field definition. Create a dynamic field such as *_t for the text fields. All your fields ending with _t will be mapped to it.
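A minimal sketch of such a rule in a fixed schema.xml (assuming the stock text_general type is defined elsewhere in the schema):

```xml
<!-- schema.xml: any field whose name ends in _t is indexed as general text -->
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
```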

Best Practices to update/add/remove fields for an Azure Search Index

I was wondering if there are any good resources on best practices for dealing with changes (adding/removing fields from a search index) without taking your Azure search service and index down.
Do we need to create a completely new index and indexer to do that? I discovered that the Azure portal currently lets you add new fields to your index, but what about updating/deleting fields from your search index?
Thanks!
If you add a field, there is no strict requirement to rebuild. Existing indexed documents are given a null value for the new field. On a future re-index, values from source data are added to the documents.
While you can't directly delete a field from an Azure Search index, you can achieve the same effect without rebuilding the index by having your application simply ignore the "deleted" field. If you use this approach, a deleted field isn't used, but physically the field definition and contents remain in the index until the next time you rebuild your index.
Changing a field definition requires you to rebuild your index, with the exception of changing these index attributes: Retrievable, SearchAnalyzer, SynonymMaps. You can add the Retrievable, SearchAnalyzer, and SynonymMaps attributes to an existing field or change their values without having to rebuild the index.
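As a sketch, a field definition sent in a PUT to the Update Index API might look like this for a field that was created with separate index/search analyzers (the field, analyzer, and synonym map names here are hypothetical):

```json
{
  "name": "description",
  "type": "Edm.String",
  "searchable": true,
  "retrievable": false,
  "indexAnalyzer": "standard.lucene",
  "searchAnalyzer": "en.lucene",
  "synonymMaps": [ "my-synonym-map" ]
}
```

Everything other than retrievable, searchAnalyzer, and synonymMaps has to stay exactly as the field was created; changing any other attribute forces a rebuild.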

Google App Engine Datastore returns no rows if I have an Order clause

I have a 'kind' in datastore like so:
type CompanyDS struct {
    Name string
}
If I query it with the Order clause below, it returns no rows (but doesn't give any error):
var companiesDS []CompanyDS
datastore.NewQuery("Company").Order("Name").GetAll(c, &companiesDS)
However, if I remove the Order("Name") call, it returns all the rows just fine.
I had to edit my entities in the Google Cloud Platform console and tick the "Index this property" box on the Name field.
Since without Order() you can query all entities, that means they do exist with name "Company" and property "Name".
Indices for single properties are automatically created, so you don't need to specify explicit index for them.
But if you can't list them using a single property ordering like Order("Name"), that means that your existing entities are not indexed with the Name property. Note that every single entity may be indexed differently. When you save (put) an entity into the Datastore, you have the ability to specify which properties are to be indexed and which are not.
You can confirm this on the Google Cloud Platform Datastore console: execute the query
select * from Company
Then click on any of the results (its ID), then you will see the details of that entity, listing which property is indexed and which is not.
Fix:
You may edit the entities on the console: click on the "Name" property, and before saving, check the "Index this property". This will re-save this entity, making its Name indexed, and thus it will show up in the next query (ordered by Name).
You don't need to do this manually for all entities. Use your Go query code (without Order()), query all entities, then re-save all without modification, and so the Name will get indexed as a result of this (because your CompanyDS does not turn off indexing for the Name property). Make sure your struct contains all properties, else you would lose them when re-saving.
Note: You should ensure that the code that saves Company entities saves them with Name indexed.
In Go for example a struct tag with value ",noindex" will disable indexing for a single property like in this example:
type CompanyDS struct {
    Name string `datastore:",noindex"`
}

How to create multiple index on two fields in google Objectify which should not be included in update and create request?

I want to create an index on two fields (ID and creation date) in an entity.
There is one condition: I don't want these fields to be indexed on create and update of that object.
I am using Google Objectify.
I will use this index in my search query.
Please help!
Objectify has a feature called partial indexes, which lets you define conditions that a certain property has to meet in order to get indexed.
You could hack that so that those fields are only indexed if a given attribute (e.g. lastOperation) is not "create" or "update".
Bear in mind that tampering with index updates might lead to invalid query results, as the index records (used for search) won't match the actual entity values.

Solr schema modifications that do not affect existing Documents

I am trying to figure out whether I need to re-index a [very large] document base in Solr in the following scenarios:
I want to add a few new fields to the schema: none of the old Documents need to be updated to add values for these fields, only new documents that I will be adding after the schema update will have these fields. Do I still need to re-index Solr?
I want to remove a couple of unused fields from the schema (they were added prematurely ...): none of the existing documents has any of these fields. Do I still need to re-index Solr after the schema update?
I saw many recommendations for updating existing documents when adding/modifying fields, but this is not the case for me - I only want to update the schema, not the existing documents.
Thanks!
Marina
Answer 1: You are correct: you can add a new field, and you do not need to re-index if you only want new documents going forward to have values for that new field.
Answer 2: Yes, you can remove a field without rebuilding the index if none of the documents have a value for that field. You can make sure by looking at that field under:
http://localhost:8080/admin/schema.jsp
If any document has a value for the field you want to remove, you have to rebuild the index; otherwise it will give errors.
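To double-check that no document has a value for the field, you can also run a range query over it; a sketch of the query parameters (the field name old_field is hypothetical):

```python
from urllib.parse import urlencode

def field_usage_query(field):
    # field:[* TO *] matches every document that has ANY value in `field`;
    # rows=0 because only the hit count (numFound) in the response matters.
    return urlencode({"q": f"{field}:[* TO *]", "rows": 0})
```

Append the result to your collection's /select handler; if numFound is 0, the field is safe to remove without a rebuild.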
