Changing Solr id from string to uuid

I am very new to Solr.
Initially, the "id" field in my Solr schema was of type string.
I have 30,000 documents, but now I want to use a UUID instead of a string.
I tried simply changing the id type to uuid, following the instructions at http://wiki.apache.org/solr/UniqueKey
It did not work, because Solr tried to parse the existing string ids as UUIDs and failed.
My question is: how do I change my id to a UUID without deleting any data?
Any info on this will be helpful.

I assume your id field is declared as the uniqueKey in schema.xml. That means every document in your Solr instance must contain the id field. When you change the type of a field in the schema, the index previously built for that field no longer matches the schema. You can no longer query on that field, even though the old values are still present in your Solr instance.
There is little point in keeping old documents in Solr if you cannot query them. And in this case you have modified the uniqueKey field, so you must re-index. If you had changed the type of a field other than the uniqueKey, an atomic (partial) update would have been an option.
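As a rough illustration of the re-index step, here is a minimal Python sketch that prepares documents for re-indexing by giving each one a fresh UUID. It assumes the original documents are available as plain dictionaries (for example, exported from the source data). The `legacy_id_s` field name is hypothetical and only there so the old string id is not lost. Sending the result to Solr is left out.

```python
import uuid

def assign_uuid_ids(docs, legacy_field="legacy_id_s"):
    """Return copies of docs with a new UUID as id.

    The old string id is kept in legacy_field (a hypothetical field name)
    so existing references to it can still be resolved.
    """
    converted = []
    for doc in docs:
        new_doc = dict(doc)                 # do not mutate the input
        new_doc[legacy_field] = doc["id"]   # preserve the old string id
        new_doc["id"] = str(uuid.uuid4())   # random UUID in canonical form
        converted.append(new_doc)
    return converted

old_docs = [{"id": "doc-1", "title": "first"}, {"id": "doc-2", "title": "second"}]
new_docs = assign_uuid_ids(old_docs)
```

The converted documents would then be indexed into a collection whose schema already declares id with the uuid type.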

Related

solr don't send the id field with the result

I am pretty new to Solr, and I don't know what the best practice is for the id column.
Currently I want to exclude the internal "id" field from Solr search results (I am using my own custom user_id field).
I know I can use fl=field1,field2, but this means specifying all my fields there. I don't have deep knowledge of Solr, and I fear this will hurt performance.
Another question: is it recommended to add a separate user_id field, or to overwrite the default id field?
Thank you very much.
If the value you have in your user_id field is unique, index that into your id column or define the user_id field as your unique key instead and don't use the id field.
The important thing is that there's a unique field in your document so that Solr knows when a document should be updated compared to when a new document should be added instead.
If the id field is not relevant / secret, I'm not sure why you'd be worried about including it.
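If you do decide to leave the id out, the fl approach mentioned in the question could be sketched like this in Python. The field names (`user_id`, `name`) and the helper name are made up for illustration; only the `q` and `fl` parameters come from Solr itself.

```python
from urllib.parse import urlencode, parse_qs

def build_select_params(query, fields, exclude=("id",)):
    """Build a URL-encoded query string whose fl lists every field except the excluded ones."""
    wanted = [f for f in fields if f not in exclude]
    return urlencode({"q": query, "fl": ",".join(wanted)})

# Hypothetical field list for the example.
params = build_select_params("*:*", ["id", "user_id", "name"])
```

The resulting string can be appended to the select URL of your core.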

SOLR indexing arbitrary data

Let's say you have a simple forms automation application, and you want to index every submitted form in a Solr collection. Let's also say that form content is open-ended so that the user can create custom fields on the form and so forth.
Since users can define custom forms, you can't really predefine fields to Solr, so we've been using Solr's "schema-less" or managed schema mode. It works well, except for one problem.
Let's say a form comes through with a field called "ID" and a value of "9". If this is the first time Solr has seen a field called "ID", it dutifully updates its schema, and since the value of this field is numeric, Solr assigns it one of its numeric data types (we see "plong" a lot).
Now, let's say that the next day, someone submits another instance of this same form, but in the ID field, they type their name instead of entering a number. Solr rejects this record and won't index it, because the schema says ID should be numeric, but on this record, it's not.
The way we've been dealing with this so far is to trap the exception we get when a field's data type disagrees with the schema, and then we use the Solr API to alter the schema, making the field in question a text or string instead of a numeric.
Of course, when we do this, we need to reindex the entire collection since the schema changed, and so we need to persist all the original data just in case we need to re-index everything after one of these schema data-type collisions. We're big Solr fans, but at the same time, we wonder whether the benefits of using the search engine outweigh all this extra work that gets triggered if a user simply enters character data in a previously numeric field.
Is there a way to just have Solr always assign something like "text_general" for every field, or is there some other better way?
I would say that you need to handle the Id values on the application side.
It would be good to add validation for Id, so that it is always either a string or a number.
This would resolve your issue permanently. Once the type is decided, you don't have to do anything on the Solr side.
The alternative approach would be to have a fixed schema.xml.
In it, add a field Id with a fixed fieldType.
I would suggest string as the fieldType for Id if you don't want the data to be tokenized and you want exact matches in search.
If you would like more flexibility when searching the Id field, you can use the text_general field type for it.
You can also create your own fieldType, with a tokenizer and filters chosen according to your requirements for the Id field.
Also, don't use schemaless mode in production. You can instead map your field names to a dynamic field definition. Create a dynamic field such as *_t for text fields; all fields whose names end with _t will be mapped to it.
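The dynamic-field idea can be sketched on the application side as follows. This is only an illustration in Python: it assumes a `*_t` dynamic field is defined in the schema as a text type, and that the document's unique key lives in a field called `id` (both are assumptions, not something stated in the question).

```python
def to_dynamic_text_fields(form, id_field="id", suffix="_t"):
    """Rename user-defined form fields so they match a *_t dynamic field.

    Every value is converted to a string, so "9" and "Alice" in the same
    form field no longer collide on data type. The unique key is left as-is.
    """
    doc = {}
    for name, value in form.items():
        if name == id_field:
            doc[name] = value
        else:
            doc[name + suffix] = str(value)
    return doc

# Field names are case-sensitive, so the form's "ID" is distinct from "id".
doc = to_dynamic_text_fields({"id": "form-42", "ID": 9, "Name": "Ann"})
print(doc)  # {'id': 'form-42', 'ID_t': '9', 'Name_t': 'Ann'}
```

The trade-off is that numeric values indexed this way are treated as text, so numeric range queries and numeric sorting on them are not available.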

Solr schema modifications that do not affect existing Documents

I am trying to figure out whether I need to re-index a [very large] document base in Solr in the following scenarios:
I want to add a few new fields to the schema: none of the old Documents need to be updated to add values for these fields, only new documents that I will be adding after the schema update will have these fields. Do I still need to re-index Solr?
I want to remove a couple of unused fields from the schema (they were added prematurely ...): none of the existing documents has any of these fields. Do I still need to re-index Solr after the schema update?
I saw many recommendations for updating existing documents when adding/modifying fields, but this is not the case for me - I only want to update the schema, not the existing documents.
Thanks!
Marina
Answer 1: You are correct. You can add a new field without re-indexing if only documents added from now on need a value for it.
Answer 2: Yes, you can remove a field without rebuilding the index if none of the documents has a value for that field. You can check this by looking at the field under:
http://localhost:8080/admin/schema.jsp
If any document has a value for the field you want to remove, you have to rebuild the index; otherwise Solr will give an error.

DIH delta import for multiple fields index from mysql

What is pk in a Solr DIH delta import? I am trying to delta-index multiple fields in Solr.
I believe it is whatever field you specify in your schema.xml file as the id field.
It is the name of the Solr field that serves as a unique key for that record. You define your mapping from the source to that Solr column, and then, after mapping, Solr checks its presence and values based on the pk field you specified.
It is different from primaryKey because you may be generating the primaryKey, or it may not be suitable for some reason. But it could be the same. I think the clearest wiki explanation may be in the example for HttpDataSource.
I believe you may also be able to define a compound pk for when you are flattening inner source entries into one Solr entry.
I think the problem is in your delta query for the child entity. You have given:
deltaQuery="select id from cc_gadget_lang where '${cc_gadget.last_modified_date}' > '${dataimporter.last_index_time}'"
I think the where condition in the above query always evaluates to TRUE, so it serves no purpose.
The solution I would suggest is to have a separate "last_modified_date" column in the "cc_gadget_lang" table in your database, and to use that in the delta query of your child entity.
I also believe there is no need to have the "pk" of the child entity in your schema file, because it is only stored and used temporarily during delta imports and does not need to be stored permanently in the index.

Obtaining record from Solr using single Key

I am using Solr and have looked through the documentation, but I couldn't find a way to get a single record from Solr by its key.
If I know the key value of the record, what query do I need to pass to Solr to obtain this record?
Thanks.
I'm not sure what you mean by key, but guessing from context, you mean a field defined in your schema. If that is the case, you could issue the following:
// Assumes Id is a schema field
// If via solr admin
q=Id:1
// Properly escaped
q=Id%3A1
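If you build the request from code, the escaping can be left to a URL library. A small Python sketch, assuming the same Id field as above (the helper name is made up for illustration):

```python
from urllib.parse import urlencode

def lookup_by_key_params(field, value):
    """Build the URL-encoded q parameter for a field:value lookup."""
    return urlencode({"q": f"{field}:{value}"})

params = lookup_by_key_params("Id", "1")
print(params)  # q=Id%3A1
```

Note that this only handles URL encoding. Values containing characters that are special in Solr's query syntax would need separate escaping.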
