Azure Search: Blob Metadata Field value not appearing in Indexed data - azure-cognitive-search

We have set metadata on a block blob and can verify that the key/value pairs are correctly stamped on the blob.
Of the many fields defined in the index, only one field value (PromotionId) is not appearing in the indexed data, as confirmed by running a search in the Search explorer.
This field has been mapped to the key "ID" in the indexer.
It has also been defined in the index.
Why is the value of this specific field not appearing in the index? All the other metadata fields appear in the index as expected.

The field mapping is not working because "ID" is specified as the sourceFieldName, but there is no ID property on the source blob; it only exists in the index you have defined.
This may be a bit confusing, since it behaves as if there were an "ID" property, because the "ID" field is being populated without a field mapping. However, that happens because Azure Search automatically maps "metadata_storage_path" to whichever field is the document key when no field mapping for the document key is specified. This behavior is documented here.
If you want PromotionId to hold the document path like the ID field does, change the sourceFieldName to "metadata_storage_path" in the PromotionId field mapping. If you also want it base64-encoded, add a mapping function to the field mapping as well.
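A minimal sketch of the relevant part of the indexer definition (the target field name PromotionId comes from your own index; the base64Encode mapping function is only needed if you want the encoded form):
"fieldMappings": [
  {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "PromotionId",
    "mappingFunction": { "name": "base64Encode" }
  }
]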

Related

SOLR indexing arbitrary data

Let's say you have a simple forms automation application, and you want to index every submitted form in a Solr collection. Let's also say that form content is open-ended so that the user can create custom fields on the form and so forth.
Since users can define custom forms, you can't really predefine fields to Solr, so we've been using Solr's "schema-less" or managed schema mode. It works well, except for one problem.
Let's say a form comes through with a field called "ID" and a value of "9". If this is the first time Solr has seen a field called "ID", it dutifully updates its schema, and since the value of this field is numeric, Solr assigns it one of its numeric data types (we see "plong" a lot).
Now, let's say that the next day someone submits another instance of this same form, but in the ID field they type their name instead of entering a number. Solr rejects this and won't index the record, because the schema says ID should be numeric but on this record it isn't.
The way we've been dealing with this so far is to trap the exception we get when a field's data type disagrees with the schema, and then we use the Solr API to alter the schema, making the field in question a text or string instead of a numeric.
Of course, when we do this, we need to reindex the entire collection since the schema changed, and so we need to persist all the original data just in case we need to re-index everything after one of these schema data-type collisions. We're big Solr fans, but at the same time, we wonder whether the benefits of using the search engine outweigh all this extra work that gets triggered if a user simply enters character data in a previously numeric field.
Is there a way to just have Solr always assign something like "text_general" for every field, or is there some other better way?
I would say that you might need to handle the Id values on your application end.
It would be good to add validation for Id, so that Id is always either a string or a number.
This would resolve your issue permanently: once the type is decided, you don't have to do anything on the Solr side.
The alternative approach would be to use a fixed schema.xml.
In it, add a field Id with a fixed fieldType.
I would suggest going with string as the fieldType for Id if you don't want Solr to tokenize the data and you want exact matches in search.
If you would like more flexibility when searching on the Id field, you can use the text_general field type instead.
You can also create your own fieldType, with a tokenizer and filters chosen according to your requirements for the Id field.
Also, don't use schemaless mode in production. You can also map your field names to a dynamic field definition: create a dynamic field such as *_t for text fields, and all your fields ending in _t will be mapped to it.
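A sketch of what those schema.xml entries could look like (the exact field and dynamic-field names are up to you; string and text_general are standard Solr field types):
<field name="ID" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>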

Google App Engine Datastore returns no rows if I have an Order clause

I have a 'kind' in datastore like so:
type CompanyDS struct {
    Name string
}
If I query it with the Order clause below, it returns no rows (but doesn't give any error):
var companiesDS []CompanyDS
datastore.NewQuery("Company").Order("Name").GetAll(c, &companiesDS)
However, if I remove the Order("Name") call, it returns all the rows just fine.
I had to edit my entities in the Google Cloud Platform console and tick the "Index this property" box for the Name field.
Since you can query all the entities without Order(), they do exist with the kind name "Company" and a "Name" property.
Indexes for single properties are created automatically, so you don't need to specify an explicit index for them.
But if you can't list the entities using a single-property ordering like Order("Name"), that means your existing entities are not indexed with the Name property. Note that every single entity may be indexed differently: when you save (put) an entity into the Datastore, you can specify which properties are to be indexed and which are not.
You can confirm this on the Google Cloud Platform Datastore console: execute the query
select * from Company
then click on any of the results (its ID) and you will see the details of that entity, listing which properties are indexed and which are not.
Fix:
You may edit the entities on the console: click on the "Name" property and, before saving, check "Index this property". This will re-save the entity with Name indexed, and it will then show up in the next query ordered by Name.
You don't need to do this manually for all entities. Use your Go query code (without Order()) to fetch all entities, then re-save them without modification; Name will get indexed as a result, because your CompanyDS does not turn off indexing for the Name property. Make sure your struct contains all the properties, or you would lose the missing ones when re-saving.
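A minimal sketch of that re-save pass, assuming the google.golang.org/appengine/datastore package and that CompanyDS lists every stored property:
var companies []CompanyDS
// Query without Order() so entities whose Name is not indexed are returned too.
keys, err := datastore.NewQuery("Company").GetAll(c, &companies)
if err != nil {
    // handle the error
}
// Re-save the entities unchanged; Name gets indexed because CompanyDS does not use ",noindex".
if _, err := datastore.PutMulti(c, keys, companies); err != nil {
    // handle the error
}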
Note: You should ensure that the code that saves Company entities saves them with Name indexed.
In Go, for example, a struct tag with the value ",noindex" disables indexing for a single property, as in this example:
type CompanyDS struct {
    Name string `datastore:",noindex"`
}

Solr query searching on non-indexed fields

Solr version 6.1.0
I created a schema with some fields set to indexed=true, which are the only fields I want the main query q to search.
I also added more fields that I just want to select, so I marked them as stored=true and indexed=false.
The issue now is that the main query q=India is matching on non-indexed fields like country, which I have specified in the image.
See the result below.
It matches on the non-indexed field only when I specify the full value of that field.
See the result for q=Indi.
How can I restrict solr from searching on non-index fields?
According to the screenshot above, you're copying the content sent to the country field into the field _text_. When you don't give Solr a specific field to search (i.e. you're not using one of the dismax handlers with qf, and you're not prefixing your term with a field name as field:value), it falls back to the default search field, which is _text_. That field is indexed, and since you're copying the content from country into _text_, the values from country will produce a hit.
If you don't want this to happen, don't copy the content from country into _text_, or give Solr the actual field you want to search.
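A sketch of both options, assuming your schema.xml contains a copyField rule like the one below (the core name and the indexed field name in the query are placeholders):
<!-- schema.xml: this rule is what copies country into _text_; remove it and re-index -->
<copyField source="country" dest="_text_"/>
<!-- or search only the fields you intend, e.g. a fielded query against an indexed field -->
http://localhost:8983/solr/mycore/select?q=name:India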

Elasticsearch Unique field

I want to store URLs in an index, but I want each URL to be unique.
I'm making POST requests to store my documents, but I want to avoid duplicate documents based on the url field.
Is there a way to specify a unique constraint on the url field?
I have around 5 million documents, so I don't want to use the url as the document ID, as it will slow down my search queries.
No, _id is the only field that can have a uniqueness restriction. You probably know this, but a new document with an existing ID will override the existing document with the same ID. You can use op_type=create or the /my_index/my_type/ID/_create endpoint to get back an error if a document with the same ID already exists.
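A minimal sketch of both variants (index, type, and document body are placeholders; the ID would be whatever you choose as the unique key, e.g. the URL or a hash of it). Each returns a 409 conflict instead of overwriting when a document with that ID already exists:
PUT /my_index/my_type/1?op_type=create
{ "url": "http://example.com/some-page" }
PUT /my_index/my_type/1/_create
{ "url": "http://example.com/some-page" }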

Solr return file name

I have indexed a couple of documents using Solr. Now, when I perform a search using the admin interface, it returns search results in XML format.
I am trying to figure out how I can associate a document that I have indexed (for example test.pdf) with the results that I receive, and then serve that document to my user.
Will Solr return a unique ID for each document that I index, so that after indexing a document I can store it along with that UID in my database somewhere, and then, when the user performs a search, Solr returns the unique IDs of the documents that match the search criteria and I serve them from the database?
You will need to add the filename as a stored field. Look at your schema.xml and make sure you declare a field of type string with the stored attribute set to true. Setting stored=true ensures that Solr can return the field in results.
See this page for more information: http://wiki.apache.org/solr/SchemaXml
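A sketch of such a declaration in schema.xml (the field name filename is just an example; you would populate it with the file's name or path at index time):
<field name="filename" type="string" indexed="true" stored="true"/>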
