How does Solr behave for empty index fields?

I have a number of index fields in my Solr schema. Some of these are always filled when content is indexed into Solr. Others are filled conditionally: for example, if a field such as name is available in the source, it is indexed into the corresponding index field; otherwise the field is left empty.
How does Solr behave in these scenarios? Do I get all fields, with or without values, in the Solr index, or will I see only those index fields that are non-empty? I think the latter scenario should hold true.
Regards.

If the field is not marked as required, it will simply be missing from the document, and queries on that field will not match the document in question.
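
To find the documents that lack an optional field, a negated open-ended range query is the usual idiom. Here is a minimal sketch, assuming a hypothetical optional field called name; the helper functions are made up for illustration and only build the query string, they do not talk to a Solr server.

```python
from urllib.parse import urlencode


def missing_field_query(field: str) -> str:
    """Build a query that matches documents with no value for `field`."""
    # `field:[* TO *]` matches every document that has some value in `field`;
    # subtracting it from `*:*` (all documents) leaves the ones without it.
    return f"*:* -{field}:[* TO *]"


def select_params(query: str, rows: int = 10) -> str:
    """Encode the query as a URL query string for a /select request."""
    return urlencode({"q": query, "rows": rows})


# `name` is a hypothetical optional field.
print(missing_field_query("name"))
print(select_params(missing_field_query("name")))
```

Dropping the leading `*:* ` and the minus sign gives the opposite query, i.e. only the documents that do have a value in the field.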

Related

SOLR indexing arbitrary data

Let's say you have a simple forms automation application, and you want to index every submitted form in a Solr collection. Let's also say that form content is open-ended so that the user can create custom fields on the form and so forth.
Since users can define custom forms, you can't really predefine fields to Solr, so we've been using Solr's "schema-less" or managed schema mode. It works well, except for one problem.
Let's say a form comes through with a field called "ID" and a value of "9". If this is the first time Solr has seen a field called "ID", it dutifully updates its schema, and since the value of this field is numeric, Solr assigns it one of its numeric data types (we see "plong" a lot).
Now, let's say that the next day, someone submits another instance of this same form, but in the ID field, they type their name instead of entering a number. Solr spits this out and won't index this record because the schema says ID should be numeric, but on this record, it's not.
The way we've been dealing with this so far is to trap the exception we get when a field's data type disagrees with the schema, and then we use the Solr API to alter the schema, making the field in question a text or string instead of a numeric.
Of course, when we do this, we need to reindex the entire collection since the schema changed, and so we need to persist all the original data just in case we need to re-index everything after one of these schema data-type collisions. We're big Solr fans, but at the same time, we wonder whether the benefits of using the search engine outweigh all this extra work that gets triggered if a user simply enters character data in a previously numeric field.
Is there a way to just have Solr always assign something like "text_general" for every field, or is there some other better way?

I would say that you need to handle the ID values in your application.
Add validation so that ID is always a string or always a number.
That resolves the issue permanently: once the type is decided, you don't have to do anything on the Solr side.
The alternative approach is to use a fixed schema.xml and define the ID field in it with a fixed fieldType.
I would suggest string as the fieldType for ID if you don't want the data to be tokenized and you want exact matches in search.
If you would like more flexibility when searching the ID field, you can use the text_general field type instead.
You can also create your own fieldType for the ID field, with a tokenizer and filters chosen to suit your requirements.
Also, don't use schemaless mode in production. You can map your field names to a dynamic field definition instead: create a dynamic field such as *_t for text fields, and every field whose name ends with _t will be mapped to it.
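
As a rough sketch of the dynamic field idea: the first function builds the JSON body for a Schema API add-dynamic-field command, and the second renames user-defined form fields so they match the *_t pattern. The function names, the id unique key and the choice to store every value as text are assumptions made for illustration, not something taken from the question.

```python
import json


def add_dynamic_field_command(pattern: str = "*_t",
                              field_type: str = "text_general") -> str:
    """Build the JSON body of a Schema API request that adds a dynamic field."""
    command = {
        "add-dynamic-field": {
            "name": pattern,
            "type": field_type,
            "indexed": True,
            "stored": True,
        }
    }
    return json.dumps(command)


def to_dynamic_fields(form: dict, suffix: str = "_t",
                      keep: tuple = ("id",)) -> dict:
    """Rename user-defined form fields so they match the dynamic field."""
    doc = {}
    for name, value in form.items():
        if name in keep:
            # Assumed unique key of the Solr document; left untouched.
            doc[name] = value
        else:
            # Send every value as text, so "9" and "Alice" are indexed
            # the same way and no type collision can occur.
            doc[f"{name}{suffix}"] = str(value)
    return doc


print(add_dynamic_field_command())
print(to_dynamic_fields({"id": "form-1", "ID": 9}))
```

With this mapping a form field called ID ends up in ID_t whether the user typed a number or a name, at the cost of losing numeric sorting and range queries on that field.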

MongoDB with Apache Solr: Should you index entire collection in Solr? If not then how to get the complete document based on solr index search query

I am using Apache Solr for field-based search and for full-text search as well.
Should I index the entire MongoDB collection in Solr?
If I decide to index only selected fields of each document in the Mongo collection, will I still be able to get the complete document back from a search query?

Two of the properties a field can have in Solr are indexed and stored. Put simply, indexed means the field's content is processed and searchable, while stored means the original content is saved as is and can be retrieved. So, for example, you can put the entire MongoDB document into a stored Solr field, then index various parts of the document into indexed fields. You can then search on the indexed fields and get the entire document back from the stored field in the result.
Note: a field can be both indexed and stored.
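
A minimal sketch of that layout, assuming a hypothetical raw_json field that is declared stored but not indexed in the schema, and a few searchable fields that are indexed. The field and function names are invented for illustration.

```python
import json


def to_solr_doc(mongo_doc: dict, search_fields: list) -> dict:
    """Build a Solr document from a MongoDB document.

    Only `search_fields` are copied into their own (indexed) fields; the
    whole document is serialized into `raw_json` (stored, not indexed).
    """
    doc = {"id": str(mongo_doc["_id"])}
    for name in search_fields:
        if name in mongo_doc:
            doc[name] = mongo_doc[name]
    # default=str turns values json cannot encode (e.g. an ObjectId or a
    # datetime) into strings instead of raising an error.
    doc["raw_json"] = json.dumps(mongo_doc, default=str)
    return doc


def from_solr_hit(hit: dict) -> dict:
    """Recover the complete original document from a search hit."""
    return json.loads(hit["raw_json"])


mongo_doc = {"_id": "abc123", "title": "Solr notes", "body": "long text"}
solr_doc = to_solr_doc(mongo_doc, ["title"])
print(solr_doc)
print(from_solr_hit(solr_doc))
```

Another common design is to store only the MongoDB _id in Solr and fetch the full document from MongoDB after the search, which keeps the index smaller.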

Solr query searching on non-indexed fields

Solr version 6.1.0
I created a schema with some fields marked indexed=true, which are the ones I specifically want the Solr main query q to search.
I also added more fields that I only want to select, so I marked them stored=true and indexed=false.
The issue now is that the main query q=India is searching on non-indexed fields like country, which I have specified in the image.
See the result below.
It selects the non-indexed field only when I specify the full value of the non-indexed field.
See the result for q=Indi.
How can I restrict Solr from searching on non-indexed fields?

According to the screenshot above you're copying the content sent to the field country into the field _text_. When you're not giving Solr a specific field to search (i.e. you're not using one of the dismax handlers with qf or not prefixing your term with the field name field:value), it falls back to the default search field. This is set to _text_ by default. This field is indexed, and since you're copying the content from your country field into the _text_ field, the values from country will give a hit.
If you don't want this to happen, don't copy the content from country into _text_, or give Solr the actual field you want to search.
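
The three ways of naming the field to search can be sketched as request parameters. The field names title and description are hypothetical stand-ins for the indexed fields in the question, and the helpers only build query strings.

```python
from urllib.parse import urlencode


def fielded_query(field: str, term: str) -> str:
    """Prefix the term with the field name, e.g. title:India."""
    return f"{field}:{term}"


def params_with_default_field(term: str, default_field: str) -> str:
    """Override the default search field for this request with `df`."""
    return urlencode({"q": term, "df": default_field})


def edismax_params(term: str, fields: list) -> str:
    """Search only the listed fields using the edismax parser and `qf`."""
    return urlencode({"q": term, "defType": "edismax",
                      "qf": " ".join(fields)})


# `title` and `description` are hypothetical indexed fields.
print(fielded_query("title", "India"))
print(params_with_default_field("India", "title"))
print(edismax_params("India", ["title", "description"]))
```

The sketch does not escape query syntax characters in the term, so user input would need escaping before being used this way.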

changing solr id from string to uuid

I am very new to solr.
Initially the "id" in my solr schema was of type string.
I have 30,000 documents, but now I want to use uuid instead of a string.
I tried simply changing the id type to uuid, following the instructions at http://wiki.apache.org/solr/UniqueKey
It did not work, because Solr tried to read the existing string ids as UUIDs and failed.
My question is: how do I change my id to uuid without deleting any data?
Any info on this will be helpful.

I assume your id field is declared as the uniqueKey in schema.xml. That means every document in your Solr instance must contain the id field. When you change the type of a field in the schema, the index previously created for that field no longer matches the schema. You can no longer query on that field, even though the values are still present in your Solr instance.
What good is data that you indexed in order to query it, if you cannot query it? There is no point in keeping the old documents in Solr if you can't query them. And in this case you have modified the uniqueKey field, so you must re-index. If you had modified the type of some field other than the uniqueKey, an atomic (partial) update would have been a solution.
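
One possible way to prepare the documents for that re-index is sketched below, assuming the original documents are available outside Solr or can be read back from stored fields. Deriving the UUID from the old string id with uuid5 and keeping the old value in a legacy_id_s field are illustrative choices, not something the UniqueKey wiki page prescribes.

```python
import uuid


def with_uuid_id(doc: dict, keep_old_id_as: str = "legacy_id_s") -> dict:
    """Return a copy of `doc` whose id is a UUID derived from the old id."""
    new_doc = dict(doc)
    old_id = str(new_doc["id"])
    # uuid5 is deterministic: the same old id always yields the same UUID,
    # so the conversion can be re-run without creating duplicates.
    new_doc["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, old_id))
    # Keep the old id searchable; `legacy_id_s` is a hypothetical field
    # that would have to exist in the schema (e.g. via a *_s dynamic field).
    new_doc[keep_old_id_as] = old_id
    # Drop Solr's internal version field if the document was read back
    # from Solr, so it is added as a new document.
    new_doc.pop("_version_", None)
    return new_doc


old_docs = [{"id": "doc-1", "title": "first"},
            {"id": "doc-2", "title": "second"}]
for doc in old_docs:
    print(with_uuid_id(doc))
```

The converted documents would then be sent to a fresh index (or after clearing the old one) whose schema already declares id as a uuid type.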

Is there any way to convert a solr multifield value to single field for sort?

I have records that have multiple values, so I put those values in a multivalued field of the record's Solr document. The issue is that I also need to return an ordered list of these values. I have far too many records to pull all document values and sort them myself. I tried creating separate Solr documents to store just these values with the needed information, but managing this has become a nightmare. Attempting to keep comments low and managing memory has not been ideal for this solution.
Is there any way to copy these multivalued field values into single-valued fields on the same document and sort on them in Solr?
Thanks for any help.

Doesn't faceting help you? You won't need a copyField between multivalued and single-valued fields. Just store the values in a multivalued field, facet on it, and set the sort criterion for the facet (the default is the number of occurrences of each value).
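
A small sketch of the request parameters for such a facet query, assuming a hypothetical multivalued field named tags. facet.sort accepts count (most frequent first) or index (ordered by the indexed term), and a facet.limit of -1 asks for all values.

```python
from urllib.parse import urlencode


def facet_params(field: str, sort: str = "index", limit: int = -1) -> str:
    """Build /select parameters that return the values of `field` as facets."""
    if sort not in ("count", "index"):
        raise ValueError("facet.sort must be 'count' or 'index'")
    return urlencode({
        "q": "*:*",
        "rows": 0,            # only the facet counts are needed, no documents
        "facet": "true",
        "facet.field": field,
        "facet.sort": sort,   # "index" orders the values by indexed term
        "facet.limit": limit, # -1 means no limit on the number of values
    })


# `tags` is a hypothetical multivalued field.
print(facet_params("tags"))
print(facet_params("tags", sort="count", limit=20))
```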
