reindexing json object into solr by adding only unique elements

I have indexed a JSON object into Solr using HttpClient,
but when I index it again, duplicate records are created.
How can I update the existing records instead? Every time I index, I want the records to be updated rather than duplicated.
Thanks in advance.

Include an id field in your JSON object and make sure its value is unique per document, for example a number such as 65746. When you index a document again, Solr checks its id. If a document with the same id already exists, Solr replaces it with the new version instead of adding a second copy. To declare the unique field, open the schema.xml or managed-schema file in your core's configuration and define it like this: <uniqueKey>id</uniqueKey>. Solr will then treat the id coming from your JSON as the unique key, so re-indexing updates the existing documents and there are no duplicate records. Let me know if that helped you :)
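As a rough sketch of the above in Python (the question used HttpClient, so treat this as an illustration only): the host, the core name "mycore", and the field "title" are assumptions, not taken from the question. The function only builds the request for Solr's JSON update handler and refuses documents that have no id.

```python
import json
import urllib.request

# Assumed values for illustration; replace with your own host and core.
SOLR_URL = "http://localhost:8983/solr"
CORE = "mycore"  # hypothetical core name


def build_update_request(docs, solr_url=SOLR_URL, core=CORE):
    """Build a POST request that sends docs to Solr's JSON update handler."""
    for doc in docs:
        # Without an id, Solr has nothing to match an existing document against.
        if "id" not in doc:
            raise ValueError("every document needs an 'id' (the uniqueKey)")
    url = f"{solr_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running Solr instance and return the HTTP status."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling send(build_update_request([{"id": "65746", "title": "second version"}])) twice against a running core should leave a single document with id 65746, provided id is the uniqueKey in the schema.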

Related

how to extract two json indexed documents separately using solrj or QueryResponse

I have indexed two JSON documents into Solr, and when I query I receive both documents in the response. How can I differentiate the two documents and store them separately?

You need to define a (unique) key when indexing the JSON documents; the key can be mandatory or optional. This is done in schema.xml or managed-schema, if it is not already there. You then search on this key in the query to fetch the document you want.
This is comparable to querying by a unique primary key in SQL and traditional databases. A tuple or record, uniquely identified by its primary key, is the equivalent of a JSON document in this scenario.
Assuming two documents with unique ids 1 and 2, you can fetch document 1 by searching for q=id:1 in the Solr Admin UI. I'm afraid I don't know how to do this in SolrJ or with QueryResponse.
Controlling where documents are stored inside Solr is not supported; it is more or less a black box. That should not be a problem in your situation, as long as you specify the query correctly.
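Not SolrJ, but as a small Python sketch of the same idea: one helper builds the q=id:... query URL and another separates returned documents by their id. The host and the core name "mycore" are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed host
CORE = "mycore"                          # hypothetical core name


def build_select_url(doc_id, solr_url=SOLR_URL, core=CORE):
    """Return a /select URL that asks Solr for the document with this id."""
    # Quote the value and escape backslashes and quotes so that ids
    # containing special characters are still matched literally.
    escaped = str(doc_id).replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'id:"{escaped}"', "wt": "json"}
    return f"{solr_url}/{core}/select?{urlencode(params)}"


def split_by_id(docs):
    """Map each returned document to its id so documents can be handled separately."""
    return {doc["id"]: doc for doc in docs}
```

With both documents in one response, split_by_id(docs)["1"] and split_by_id(docs)["2"] give you the two documents individually.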

This article explains how to use Solr 6 as a JDBC data source: https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/ . Solr 6 is the better choice if you want to use Solr more as a data source than as a search index, because it has enhanced SQL-level features. Let me know if that helps you :)

Solr Indexing duplicate documents

I am using Solr to store file paths as my 'id' (uniqueKey) and to index the file contents. When I change a file's contents and re-index it, Solr replaces the file's contents in the index. Is there any way to retain the old version of the file under the same id? I tried adding the overwrite=false parameter with no luck. I am using Solr 6.1.0.

I don't think you can do that under the same id, because id is the uniqueKey.
It is not possible in an RDBMS either.
It can be achieved by giving the changed document a new id (treat the changed content as a new document) and maintaining a relation between the new id and the old id.
You can use a similar approach in Solr, but every document then needs two fields, such as id and older_id.
The older_id field holds the id of the document that is the previous version and contains the old content.
This way your older documents are not deleted from Solr: the new document has its own id, and its older_id points to the previous document's id.
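A minimal sketch of that id / older_id scheme in Python. The field names "path" and "content" and the choice of a random UUID for the new id are my assumptions for illustration; only id and older_id come from the answer above.

```python
import uuid


def make_new_version(old_doc, new_content):
    """Return a new document that points back at old_doc through older_id."""
    return {
        # A fresh id, so indexing this document does not overwrite the old one.
        "id": str(uuid.uuid4()),
        # Link to the previous version, which stays in the index untouched.
        "older_id": old_doc["id"],
        "path": old_doc["path"],   # hypothetical field holding the file path
        "content": new_content,    # hypothetical field holding the file text
    }
```

Note that this means the file path can no longer be the uniqueKey itself; it has to live in its own field so that several versions can share it.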

Elasticsearch Unique field

I want to store URLs in an index, but each URL should be unique.
I'm making POST requests to store my documents, but I want to avoid duplicate documents based on the url field.
Is there a way to specify a unique constraint on the url field?
I have around 5 million documents, so I don't want to use the URL as the document ID, as it would slow down my search queries.

No, the _id is the only field that can have the uniqueness restriction. You probably know this, but a new document with an existing id overwrites the existing document with the same id. You can use op_type=create or /my_index/my_type/ID/_create in order to get back an error if a document with the same id already exists.
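For illustration, a Python sketch that builds the _create request described above. The host is an assumption, and my_index / my_type are just the placeholder names from the answer. It only constructs the request; sending it to a cluster where the id already exists should come back as an error rather than an overwrite.

```python
import json
import urllib.request
from urllib.parse import quote


def build_create_request(doc_id, doc, es_url="http://localhost:9200",
                         index="my_index", doc_type="my_type"):
    """Build a PUT request that only succeeds if doc_id does not exist yet."""
    # Percent-encode the id so characters such as '/' do not change the path.
    safe_id = quote(str(doc_id), safe="")
    url = f"{es_url}/{index}/{doc_type}/{safe_id}/_create"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```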

changing solr id from string to uuid

I am very new to Solr.
Initially the "id" in my Solr schema was of type string.
I have 30,000 documents, but now I want to use uuid instead of string.
I tried simply changing the id type to uuid, following the instructions from http://wiki.apache.org/solr/UniqueKey
It did not work, because Solr tried to read the existing string ids as UUIDs and failed.
My question is: how do I change my id to uuid without deleting any data?
Any info on this will be helpful.

I assume your id field is declared as the uniqueKey in schema.xml. That means every document in your Solr instance must contain the id field. When you change the type of a field in the schema, the index previously built for that field becomes unusable: you can no longer query on the field, even though the documents are still present in your Solr instance.
There is no point in keeping old documents in Solr if you cannot query them. And in this case you have modified the uniqueKey field, so you must re-index. If you had changed the type of any field other than the uniqueKey, an atomic (partial) update could have been a solution.
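If you still have the source data (or can export the stored fields), preparing the documents for the re-index could look roughly like this in Python. The "legacy_id" field is hypothetical and would have to exist in your schema, and deriving the UUID from the old id with uuid5 is my own choice so that repeated runs produce the same ids.

```python
import uuid


def to_uuid_doc(old_doc):
    """Return a copy of old_doc whose id is a UUID derived from the old string id."""
    # Drop Solr's internal _version_ field; re-sending it can make the update fail.
    new_doc = {k: v for k, v in old_doc.items() if k != "_version_"}
    # Keep the old string id in a separate (hypothetical) field for reference.
    new_doc["legacy_id"] = old_doc["id"]
    # uuid5 is deterministic: the same old id always maps to the same UUID.
    new_doc["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, str(old_doc["id"])))
    return new_doc
```

The converted documents would then be indexed into a core whose schema already declares id with the uuid type.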

Solr return file name

I have indexed a couple of documents using Solr. When I perform a search using the admin interface, it returns the search results in XML format.
I am trying to figure out how to associate a document that I have indexed (for example, test.pdf) with the results I receive, and then serve that document to my user.
Will Solr return a unique ID for each document that I index? That way, after indexing a document I could store it along with that ID in my database, and when the user performs a search, Solr would return the unique IDs of the matching documents and I would serve them from the database.

You will need to add the filename as a stored field. Look at your schema.xml and make sure you declare a field of type string and set the stored attribute to true. By setting stored=true you will ensure that Solr can return the field back in results.
See this page for more information: http://wiki.apache.org/solr/SchemaXml
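Once the filename is stored, picking it out of a result could look like this Python sketch. It assumes you request the JSON response format (wt=json) rather than XML and that the stored field is called "filename"; both are assumptions on my part.

```python
def filenames_by_id(response_json, field="filename"):
    """Map each matching document's id to its stored filename (None if missing)."""
    # Solr's JSON response keeps the matching documents under response.docs.
    docs = response_json.get("response", {}).get("docs", [])
    return {doc["id"]: doc.get(field) for doc in docs}
```

The resulting id-to-filename mapping is what you would use to look up the actual file in your database and serve it to the user.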
