Solr search engine - updating documents automatically

I'm using the Solr search engine and I'm new to it. I want the index to update automatically whenever my database is updated or new data is created in its tables. I tried delta import and full import, but with those methods I have to run the import manually whenever I need an update.
Which is the best way to update Solr documents?
How can I make it happen automatically?
Thanks for your help.

There isn't a built-in way to do this in Solr. I wouldn't recommend running a full or delta import when just updating one row in a table. What most Solr deployments do with a database is update the corresponding Solr document when a row is updated. This is application specific, but it is the most efficient and standard way of dealing with this issue.
Full or delta imports are typically something you would run nightly or every few hours.
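For example, here is a minimal sketch of that application-side pattern, assuming a core named "products" at localhost:8983 and the third-party requests library (all names and fields are illustrative):

    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update"

    def on_row_updated(row):
        """Call this from the application code that writes the database row."""
        doc = {
            "id": row["id"],                      # must match the schema's uniqueKey
            "name": row["name"],
            "inventory_count": row["inventory_count"],
        }
        # Re-send only the changed document and ask Solr to commit within 10 seconds.
        resp = requests.post(SOLR_UPDATE_URL,
                             params={"commitWithin": 10000},
                             json=[doc])
        resp.raise_for_status()

The key point is that the write path of the application triggers a single-document update, instead of a scheduled import scanning the whole table.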

So basically you want to process documents before adding them to Solr.
This can be achieved by adding a new update processor to the update processor chain; see Solr split joined dates to multivalue field.
There they split the data in a field and saved it as a multi-valued field.

Related

How to revert an update on a single MongoDB document

I just updated a single document in Mongo, my transaction went wrong, and I lost the previous data. Is there any way to get the data as it was before the update?
If you wrote the document you are trying to restore recently, and you are using a replica set, you should be able to extract the previous version of the document from the oplog. Start here.
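For instance, a minimal sketch of digging through the oplog with pymongo, assuming a replica set at localhost:27017 and a hypothetical mydb.mycollection namespace and _id:

    from bson import ObjectId
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    oplog = client.local["oplog.rs"]

    # Insert entries (op == "i") carry the full document in the "o" field, so a
    # recently inserted version of the document can be recovered from there.
    doc_id = ObjectId("5f0000000000000000000000")  # hypothetical _id
    for entry in oplog.find({"ns": "mydb.mycollection", "o._id": doc_id}).sort("$natural", -1):
        print(entry["ts"], entry["op"], entry["o"])

Note that the oplog is a capped collection, so older entries roll off; this only helps if the original write is still in it.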
Atlas provides a point in time restore feature.

Updating SOLR schema using version numbers

I am designing a SOLR schema for my project, and will create the fields using the Schema-API.
I will likely need to add new fields to the schema in the future.
With SQL databases, I usually store a schema version number in a well-known table. Then when my app starts up, it checks to make sure the database schema is current. If not, I execute all of the needed updates (which are numbered) to bring it up to date.
How can I achieve this with SOLR using the schema-api? Specifically, how/where could I store and retrieve a version number?
My current workaround/solution is to store the version in the name of a SOLR field, for example by creating a field called "schema_version_2". When my app starts up, I retrieve the list of fields using the schema-api and iterate over them, looking for a field called "schema_version_XX".
Then I can determine whether I need to apply any upgrades to the SOLR schema for my app. If necessary, my app upgrades to the latest schema version (typically adding/modifying fields). At the end, I increment the version, for example by deleting the "schema_version_2" field and creating a new field called "schema_version_3".
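A minimal sketch of that workaround, assuming a collection named "mycollection" at localhost:8983 and the requests library (the marker field naming is just the scheme described above):

    import re
    import requests

    SCHEMA_URL = "http://localhost:8983/solr/mycollection/schema"

    def current_schema_version():
        fields = requests.get(SCHEMA_URL + "/fields", params={"wt": "json"}).json()["fields"]
        for f in fields:
            m = re.match(r"schema_version_(\d+)$", f["name"])
            if m:
                return int(m.group(1))
        return 0  # no marker field yet

    def bump_schema_version(old, new):
        # Add the new marker field and drop the old one in a single Schema API call.
        commands = {"add-field": {"name": "schema_version_%d" % new,
                                  "type": "string", "indexed": False, "stored": False}}
        if old:
            commands["delete-field"] = {"name": "schema_version_%d" % old}
        requests.post(SCHEMA_URL, json=commands).raise_for_status()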
I would still like to know what pattern and solution developers with more SOLR experience use to solve this problem.

Solr 4 - Delete fields from FieldsInfo file

I am running Solr 4. I created a million fields in Solr using a script. I saw GC activity go very high after adding these fields, because every time a searcher is opened these fields are loaded.
Now I want to go back to the state my Solr cluster was in before adding those fields. Even though I delete the documents that use those fields, the cluster does not come back to what it was, because the fields are not being deleted from the fieldsInfo file.
Is there a way to explicitly tell Solr to delete the fields from the fieldsInfo file?
There is a documented Schema API call that can delete a field. However, I don't know whether it is already available in Solr 4; you should try it and see if it works.
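For reference, in recent Solr versions with a managed schema the call looks roughly like this (a sketch using Python's requests, with a hypothetical core and field name; again, it may simply not be supported on Solr 4):

    import requests

    requests.post(
        "http://localhost:8983/solr/mycore/schema",
        json={"delete-field": {"name": "generated_field_000001"}},
    ).raise_for_status()

If the Schema API route doesn't work on your version, the usual fallback is to reindex into a fresh collection with a clean schema.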

Handling constantly updating fields in solr and preventing constantly reloading index?

I'm running solr 6.3.0 with zookeeper/cloud.
I have a 'product' core with a few integer/boolean fields that are updated constantly; a specific example would be inventory count. I currently have these working with external file fields. This setup accomplishes my goal of not having Solr constantly reindexing, but I don't like having to generate these files; I would rather have the values come directly from my database.
It seems like atomic/partial updates are what I am looking for, but I'm having trouble understanding the efficiency ramifications of these updates versus external file fields, or perhaps there is a better approach altogether.
I would appreciate any guidance you could provide.
ExternalFileField is useful when you want to change the values of a field in many/all of the docs at the same time. If that is not the case, I would just use a normal field and update it as needed (partial updates are possible). With the external file you don't need to reindex, but you do need to reload the file...
Regarding efficiency: for sure, the less frequently you update your docs the better; you have to find a balance.
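A minimal sketch of an atomic (partial) update, assuming the 'product' core at localhost:8983, a uniqueKey field called id, and the requests library:

    import requests

    def set_inventory(product_id, count):
        # Only the field given with a "set" operation changes; Solr preserves the
        # other stored fields when it rewrites the document internally.
        doc = {"id": product_id, "inventory_count": {"set": count}}
        requests.post("http://localhost:8983/solr/product/update",
                      params={"commitWithin": 5000},
                      json=[doc]).raise_for_status()

    set_inventory("sku-12345", 42)

Keep in mind that an atomic update still rewrites the whole document inside Solr (and requires the other fields to be stored or have docValues), so it is cheaper for you to send but not free for the index; that is part of the balance mentioned above.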

SOLR 3.1 Indexing Issue

We are facing some issues with SOLR search.
We are using SOLR 3.1 with Jetty. We have set up the schema according to our requirements and configured data-config.xml to import records into the collection (core) from our database (SQL Server 2005).
There are 320,000 records in the database which we need to import.
After the import finished, when I try to search all the records through the SOLR admin
http://localhost:8983/solr/Collection_201/admin/
it shows a total of 290,000 found, so 30,000 records are missing.
Now the following questions are on my mind:
How can I know which records were not properly indexed, or which records are missing? To find out, I tried a trick: I thought I would add a field to the database to mark which records had been imported into the SOLR collection and which had not. But the big question was how to update that database field during the import from data-config.xml, because the entity query tags only allow select queries, i.e. something that returns rows. So I had another idea: I created a stored procedure in the database containing an update query that sets the flag, followed by a select query that simply returns one record to satisfy the handler. But when I ran DIH with that, it returned an "Index Failed. Rollback all the changes" error and nothing was imported. When I commented out the update query in the stored procedure, it worked, so DIH would not let me run an update query even from a stored procedure. I tried really hard to find a way to update the database from DIH, but failed, and gave up on that idea.
I cleared the index and started the import again. This time I manually ran the SOLR admin import page in batches of 5,000 records per run. At the end, some records were still missing.
Is it possible the data is not being committed properly? I read in the documentation that the import page (http://localhost:8983/solr/Collection_201/dataimport?command=full-import&clean=false) automatically commits the imported data, but I have personally noticed that sometimes it does and sometimes it doesn't, and it is really driving me crazy.
Now I am completely frustrated and starting to wonder whether the way I am using SOLR is right at all. If it is right, is it reliable? If I am wrong, please point out my mistake.
Please guide me on how we can easily keep the collection in sync with our database and be sure it is 100% synced.
What field are you using for your IDs in Solr and in the database? The id field needs to be unique, so if 30,000 records share an ID with some other 30,000 records, the later data will overwrite the earlier records.
Also, when you run the data import handler, you can query it for status (?command=status), and that should tell you the total number of records imported in the last run.
The first thing I would do is check for non-unique IDs in your database with respect to the Solr id field.
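A minimal sketch of those checks, using the collection URL from the question and the requests library; the SQL and table/column names are hypothetical:

    import requests

    BASE = "http://localhost:8983/solr/Collection_201"

    # 1. Ask the DataImportHandler what the last run reported.
    status = requests.get(BASE + "/dataimport",
                          params={"command": "status", "wt": "json"}).json()
    print(status.get("statusMessages", {}))

    # 2. Compare with the number of documents actually visible in the index.
    result = requests.get(BASE + "/select",
                          params={"q": "*:*", "rows": 0, "wt": "json"}).json()
    print(result["response"]["numFound"])

    # 3. In the database, look for duplicate values in the column mapped to
    #    Solr's uniqueKey, e.g. against SQL Server:
    #    SELECT id, COUNT(*) FROM records GROUP BY id HAVING COUNT(*) > 1;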
Also be aware that when one record in a batch is wrong, the whole batch gets rolled back. So if that happened 3 times, and you are indexing 10K docs per batch, that would explain it.
At the time, I solved it with this: https://github.com/romanchyla/montysolr/blob/master/contrib/invenio/src/java/org/apache/solr/handler/dataimport/NoRollbackDataImporter.java
But there should be a better/more elegant solution than that. I don't know how to get the missing records in your case, but if you have indexed the IDs, you can compare the indexed IDs with the external source and find the gaps.
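For example, a rough sketch of that gap check, with the collection URL from the question and a hypothetical pyodbc connection and table (for a larger index you would page through the results instead of fetching them all at once):

    import pyodbc
    import requests

    BASE = "http://localhost:8983/solr/Collection_201"

    # Pull every indexed id from Solr.
    resp = requests.get(BASE + "/select",
                        params={"q": "*:*", "fl": "id", "rows": 1000000, "wt": "json"}).json()
    solr_ids = {doc["id"] for doc in resp["response"]["docs"]}

    # Pull every id from the source database (hypothetical DSN and table).
    conn = pyodbc.connect("DSN=mydb")
    db_ids = {str(row[0]) for row in conn.execute("SELECT id FROM records")}

    print("In the database but missing from Solr:", sorted(db_ids - solr_ids)[:50])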
