I have a bespoke CMS that needs to be searchable in Solr. Currently, I am using Nutch to crawl the pages based on a seed list generated from the CMS itself.
I need to be able to add metadata stored in the CMS database to the document indexed in Solr. So the thought is that the page text (HTML generated by the CMS) is crawled via Nutch, and the metadata is added to the Solr document whose unique ID (in this instance, the URL) matches.
As such, the metadata from the DB can be used for facets / filtering etc while full-text search and ranking is handled via the document added by Nutch.
Is this pattern possible? Is there any way to update the fields coming from the CMS DB after Nutch has added the document to Solr?
Solr has the ability to partially update a document, provided that all your document fields are stored; see the Solr documentation on atomic updates. This way, you can define several fields for your document that are not originally filled by Nutch, and after the document is added to Solr by Nutch, you can update those fields with your database values.
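As a rough sketch, assuming a core named `cms` and that Nutch uses the page URL as the unique key (field names here are placeholders), an atomic update from the database side could look like this:

```python
import requests

# Sketch: push CMS metadata onto a document Nutch already indexed,
# using Solr's atomic-update syntax ({"set": ...}). The core name "cms",
# the field names, and the URL-as-id convention are assumptions.
SOLR_UPDATE = "http://localhost:8983/solr/cms/update?commit=true"

doc = {
    "id": "http://example.com/page-1",   # same unique key Nutch used
    "category": {"set": "news"},         # "set" replaces the field value
    "author": {"set": "Jane Doe"},
}

resp = requests.post(SOLR_UPDATE, json=[doc])
resp.raise_for_status()
```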
Beyond this, I think there is one major problem to be solved. Whenever Nutch recrawls a page, it replaces the entire document in Solr, so your updated fields are lost. Even the first time around, you must be sure that Nutch adds the document first and the fields are updated afterwards. To solve this, I think you need to write a plugin for Nutch or a special request handler for Solr so you know when updates are happening.
Hi, I am new to Solr and I'm trying to get my bearings.
Using Solr in my case might not be the best idea or might be a bit overkill but this is just for testing to see how to use it.
I would like to create a database which handles users, posts, and pages. In MongoDB I would have created a collection for users, a collection for posts, and a collection for pages, each of which would obviously contain the individual documents.
I don't know how I would be able to replicate that in Solr. I have created a core for users, which I thought is like a collection in MongoDB. To add posts and pages, do I then create a new core for each, or is there another way to separate the data?
Thank you for the advice
Yes, you can have separate collections in Solr as well.
With the latest versions of Solr you can use SolrCloud and create multiple collections.
Each collection can handle a separate entity.
Please refer to the links below for more details:
Solr Collection API
Solr Collection Management
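For illustration, here is a minimal sketch of creating one collection per entity through the Collections API. It assumes a SolrCloud node on localhost using the default configset; the collection names and shard/replica counts are assumptions mirroring the MongoDB setup above:

```python
import requests

# Minimal sketch, assuming SolrCloud mode with the default configset.
BASE = "http://localhost:8983/solr/admin/collections"

for name in ("users", "posts", "pages"):
    resp = requests.get(BASE, params={
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
    })
    resp.raise_for_status()
```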
I am a new learner of Solr. I want to make my own schema.xml, so I added some fields, then stopped Solr and restarted it. In the Solr admin I can see the changes in the schema choice, but the content of the schema browser doesn't change, and when I try to index a document I get an error saying there is no such field as the one I just added to the schema. The content of the schema browser is not the same as the schema file.
Changing the schema of a core doesn't change the documents you already have there, which is why they look the same even after you restart the Solr service. You need to re-upload the documents with the new fields specified (if they are required fields) after you make a schema change to get these new fields for existing documents.
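As a minimal sketch, re-posting documents with the newly added field might look like this (the core name `mycore` and the field names are placeholders, not from your setup):

```python
import requests

# Sketch: after the schema change and restart, re-submit existing
# documents so the schema browser shows the new field populated.
SOLR_UPDATE = "http://localhost:8983/solr/mycore/update?commit=true"

docs = [
    {"id": "1", "title": "First document", "new_field": "some value"},
    {"id": "2", "title": "Second document", "new_field": "another value"},
]

requests.post(SOLR_UPDATE, json=docs).raise_for_status()
```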
From here, I went to the path of my core instance to make the changes:
/usr/local/Cellar/solr@7.7/7.7.3_1/server/solr/drupal
Then I was able to confirm the changes by clicking on Files and scrolling to where I made the change.
Is there any way Solr can throw an exception back, either in the status or in an exception message, for an update request that has an existing unique key? Right now, Solr just sends back a good update message with status 0 while it's not adding the document. I need the ability to tell from the client side whether a document was not added because of a duplicate unique key issue.
Thanks!
If a document with the same unique id exists, Solr just updates (overwrites) the doc. It is by design and, as far as I know, there is no way to change it.
You can run a Solr query before you update/add a doc, so that you are not adding it again... but that is not really transactional (Solr is not a database). It'd work if you are the only one updating Solr and the changes are serialized, etc.
If you have this stringent requirement on not adding existing ids, you could use an intermediary database, load it, and reindex Solr from that?
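A rough, non-transactional sketch of that check-then-add approach (the core name and the assumption that `id` is the unique key field are placeholders; again, this is only safe if yours is the only writer):

```python
import requests

BASE = "http://localhost:8983/solr/mycore"

def add_if_absent(doc):
    # Query for the id first; only add if nothing comes back.
    # Not atomic: another writer could insert between the two calls.
    params = {"q": 'id:"%s"' % doc["id"], "rows": 0, "wt": "json"}
    found = requests.get(BASE + "/select", params=params).json()
    if found["response"]["numFound"] > 0:
        return False  # duplicate: surface this to the client yourself
    requests.post(BASE + "/update?commit=true", json=[doc]).raise_for_status()
    return True
```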
I am trying to index my database data with Solr and I have successfully indexed it.
What I need is:
I need to put a URL with every result.
The URL for each result item will be different.
Each result item needs to append its item_id (which is available as a field) to its URL.
I am very new to Solr configuration and the Solr query syntax, so please help me produce a better search result XML.
Thanks in advance.
You can store the URL in an additional field (stored="true" indexed="false") and then simply retrieve it when you're searching.
If you can compose the URLs yourself (if they differ only in the ID/primary key) by appending the document ID to some fixed URL, that's certainly a better way to go.
That would mean altering the page which displays your search results.
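For example, a sketch of composing the URL on the application side from each result's item_id (the base URL, core name, and field names are assumptions):

```python
import requests

BASE_URL = "http://www.example.com/items/"  # hypothetical fixed prefix

resp = requests.get("http://localhost:8983/solr/mycore/select",
                    params={"q": "*:*", "fl": "item_id,title", "wt": "json"})
for doc in resp.json()["response"]["docs"]:
    # Append the item_id to the fixed URL instead of storing a full URL.
    print(doc["title"], BASE_URL + str(doc["item_id"]))
```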
What kind of application is your Solr integrated with?
Where are those documents of yours stored? In a db? How do you get to them through your application?
I have a problem with Solr: I want to sort the search results by fields in Solr's documents and by some fields in the DB. Is there anybody who can help me? Thanks!
Add those fields you have in your database to Solr, then let Solr do the sorting.
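For instance, once those DB columns are indexed as Solr fields, sorting is just a `sort` parameter on the query (the core and field names here are placeholders):

```python
import requests

# Sketch: sort on fields copied from the database into the index.
resp = requests.get("http://localhost:8983/solr/mycore/select",
                    params={"q": "*:*",
                            "sort": "price asc, created_at desc",
                            "wt": "json"})
resp.raise_for_status()
print(resp.json()["response"]["docs"])
```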