I am trying to figure out whether I need to re-index a [very large] document base in Solr in the following scenarios:
I want to add a few new fields to the schema: none of the old Documents need to be updated to add values for these fields, only new documents that I will be adding after the schema update will have these fields. Do I still need to re-index Solr?
I want to remove couple of not-used fields from the schema (they were added prematurely ...): none of the existing documents has any of these fields. Do I still need to re-index the Solr after the schema update?
I saw many recommendations for updating existing documents when adding/modifying fields, but this is not the case for me - I only want to update the schema, not the existing documents.
Thanks!
Marina
Answer 1: You are correct, you can add new field, you do not need to reindex if you want only new documents going forward to have value for that new field.
Answer 2: Yes, you can remove field without rebuilding index if none of documents have value for that field. You can make sure by looking at that field under:
http://localhost:8080/admin/schema.jsp
If one of documents has value for field you want to remove, you have to rebuild index, else it will give error.
Related
I have some documents, I want to update some fields by one field (indexed, but not id), it seems Solr could not support it, I know Solr can update by id. Anyone could give me answers?
There is no similar way to SQLs UPDATE collection SET field = 'foo' WHERE field = 'bar';, no. You'd have to implement this yourself by fetching the documents, changing the value and then reindexing the documents.
Unfortunately this is not supported. See Solr documentation about updating parts of documents.
I was wondering if there any good resources for best practices to deal with changes (Add/remove fields from search index) to your search index without taking your Azure search service and index down.
Do we need to create a completely new index and indexer to do that? I discovered that the Azure portal currently lets you add new fields to your index but what about updating/deleting fields from your search index.
Thanks!
If you add a field there is no strict requirement on rebuild. Existing indexed documents are given a null value for the new field. On a future re-index, values from source data are added to documents.
While you can't directly delete a field from an Azure Search index, you can achieve the same effect without rebuilding the index by having your application simply ignore the "deleted" field. If you use this approach, a deleted field isn't used, but physically the field definition and contents remain in the index until the next time you rebuild your index.
Changing a field definition requires you to rebuild your index, with the exception of changing these index attributes: Retrievable, SearchAnalyzer, SynonymMaps. You can add the Retrievable, SearchAnalyzer, and SynonymMaps attributes to an existing field or change their values without having to rebuild the index.
I'm building a Java app using a relational database and I wish to map it's primary data to a Solr index/es. However, I'm not sure how to map the components of a database. At the momement I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml(as far as I know it has to be defined "before" creating indexes). Also dynamic fields doesn't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped as a Field. But, how can I add the column names as fields into schema.xml (when I don't know the columns a table has)? Also, I would need the info to be queried as if it was SQL. I.e, search for all rows of a column in a table, etc, etc.
With my current "solution" I can do that kind of queries but I'm worried with performance as I'm new to Solr and I don't know the implications it may have.
So, what do you say about my "solution"? Is there another way map a database to a Solr index concerning the schema.xml fields should be set before indexing? I've also "heard" that a table is usually mapped to a index: how could I achieve that?
Maybe I'm just being noob but by the research I did I don't see how I can map a database Row to a Solr Document without messing with schema.xml Fields.
I would appreciate any thoughts :) Regards.
You can specify your table columns in the schema before hand or use dynamic fields and then use the solr DIH to import the data into solr from the database. Select your dynamic fields name in the queries for DIH.
Please go through Solr DIH for database integration
I'm working with solr and indexing data from DB.
When I import the data using SQL query, I got some rows with the same key.
I need a way that solr will generate a new field with unique key.
How can I do that?
Thanks
I am not sure if this is possible or not, but maybe you need to re-consider your logic here...
Indexing operation into Solr should be Re-Runable. So, imagine that you come one day and decide to change the schema of your core.
If you generate a new key everytime you import a document, you will end up creating duplicate items when you re-run your data import.
Maybe you need to revisit your DB design to have a unique key, or maybe in the select query, you can create a derived or calculated column value that is calculated based on multiple columns. But I am sure that pushing this problem to solr is not the solution.
ideally the unique key should come from the db (are you sure you cannot get one, by composing some columns etc?).
But, if you cannot, Solr supports UUID generation for this, look here to see how it works depending on your solr version
I am very new to solr.
Initially the "id" in my solr schema was of type string.
I have 30,000 documents, but now I want to use uuid instead of a string.
Simply changing the id to uuid and following instructions from http://wiki.apache.org/solr/UniqueKey
It did not work because it tried to string id as uuid and it failed.
My question is how do i change my id to uuid without deleting any data ?
Any info on this will be helpful.
Hope your id field is be mentioned as uniqueKey in the schema.xml. That means every solr document in your Solr instance must contain the id field. When you modify the type of any field in the schema, the previously created index for those fields get messed up. Now you can't query on those field, though they are still present in your Solr instance.
What good is that if you can not query on the data, you indexed to query? So, there is no good keeping the old document in your Solr, on which you can't query. And this time you have modified the uniqueKey field. So, you must re-index. If you would have modified the type of other field except uniqueKey, then Atomic update or partial update would have been a solution.