I am trying to index my database data with Solr, and I have successfully indexed it.
What I need is:
I need to include a URL with every result.
The URL for each result item will be different.
Each result item needs to append its item_id (which is available as a field) to its URL.
I am very new to Solr configuration and the Solr query syntax, so please help me produce a better search result XML.
Thanks in advance.
You can store the URL in an additional field (stored=true, indexed=false) and then simply retrieve it when you're searching.
Even better, if you can compose the URLs yourself (i.e. they differ only in the ID/primary key), appending the document ID to a fixed base URL is certainly the better way to go.
That would involve altering the page which displays your search results.
What kind of application is your Solr integrated with?
Where are those documents of yours stored? In a database? How does your application get to them?
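A minimal sketch of both approaches, assuming a hypothetical field name (item_id) and base URL; substitute your own schema and site:

```python
# Sketch of the two approaches above. The field name "item_id" and
# BASE_URL are assumptions, not part of any standard Solr schema.
BASE_URL = "https://example.com/items/"

def url_from_id(doc):
    """Compose the URL by appending the document ID to a fixed base URL."""
    return BASE_URL + doc["item_id"]

def url_from_field(doc):
    """Alternatively, read a URL stored verbatim in its own stored-only field."""
    return doc["url"]

docs = [{"item_id": "42", "url": "https://example.com/items/42"}]
print(url_from_id(docs[0]))
```

Composing the URL in your result-rendering code keeps the index smaller and lets you change the URL scheme later without reindexing.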
Related
Hi, I am new to Solr and I'm trying to get my bearings.
Using Solr in my case might not be the best idea, or might be a bit overkill, but this is just for testing to see how to use it.
I would like to create a database which handles users, posts and pages; in MongoDB I would have created a collection for users, a collection for posts and a collection for pages, each of which would contain the individual documents.
I don't know how to replicate that in Solr. I have created a core for users, which I thought is like a collection in MongoDB. To add posts and pages, do I then create a new core for each, or is there another way to separate the data?
Thank you for the advice
Yes, you can have separate collections in Solr as well.
With recent versions of Solr you can run SolrCloud and create multiple collections.
Each collection can handle a separate entity.
Please refer to the links below for more details:
Solr Collection API
Solr Collection Management
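As a sketch, creating one collection per entity via the Collections API CREATE action could look like this. The host, collection names, and shard/replica counts are assumptions for illustration:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed SolrCloud address

def create_collection_url(name, num_shards=1, replication_factor=1):
    """Build a Collections API CREATE call for one entity type."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{SOLR}/admin/collections?{params}"

# One collection per entity, mirroring the MongoDB layout:
for entity in ("users", "posts", "pages"):
    print(create_collection_url(entity))
```

Each of these URLs can then be issued with any HTTP client (curl, a browser, etc.) against a running SolrCloud cluster.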
Today I don't store the URLs of my pages in the Azure Search index, and I am still weighing up the best way to solve this. So when a search result comes back from Azure Search, all I receive is the id and no URL to the page.
So, to retrieve the URL, what would be a proper way to solve this?
Store the URLs in the index and retrieve them with the search;
Do an additional query to the database with the data returned from Azure Search and retrieve the URL;
???
Thanks
You definitely want to store enough information in the search index to avoid an extra hit to a different database/store if possible.
If your URLs follow a consistent pattern and only a part changes (e.g. the document id or something like that), you can store only the variable part and construct the final URL when rendering results. If your URLs cannot be turned into a pattern, you can just store the whole URL in a field in the Azure Search index.
When storing data used for presentation (URLs, external keys, etc.) it's a good idea to ensure you disable all options related to fast search/filtering (searchable, filterable, sortable, facetable, etc.) and only leave retrievable enabled. That way you minimize the use of resources caused by the extra field, but you have the data at hand to avoid an extra roundtrip during results rendering.
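For example, a retrieval-only URL field in the index definition (REST API schema fragment; the field name is an assumption) might look like:

```json
{
  "name": "url",
  "type": "Edm.String",
  "searchable": false,
  "filterable": false,
  "sortable": false,
  "facetable": false,
  "retrievable": true
}
```

With every option except retrievable disabled, the field adds no inverted-index or sorting structures; it is only stored for returning in results.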
I have a bespoke CMS that needs to be searchable in Solr. Currently, I am using Nutch to crawl the pages based on a seed list generated from the CMS itself.
I need to be able to add metadata stored in the CMS database to the document indexed in Solr. So, the thought here is that the page text (html generated by the CMS) is crawled via Nutch and the metadata is added to the Solr document where the unique ID (in this instance, the URL) is the same.
As such, the metadata from the DB can be used for facets / filtering etc while full-text search and ranking is handled via the document added by Nutch.
Is this pattern possible? Is there any way to update the fields expected from the CMS DB after Nutch has added it to Solr?
Solr has the ability to partially update a document, provided that all your document fields are stored (see the Solr documentation on atomic updates). This way, you can define several fields for your document that are not originally filled by Nutch; after Nutch has added the document to Solr, you can update those fields with your database values.
That said, I think there is one major problem to be solved. Whenever Nutch recrawls a page, it replaces the entire document in Solr, so your updated fields are lost. Even the first time, you must make sure that Nutch adds the document first and the fields are updated afterwards. To solve this, I think you need to write a plugin for Nutch or a special request handler for Solr to know when updates are happening.
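A sketch of the atomic-update payload that would set CMS metadata after Nutch has indexed a page. The metadata field names (category, author) are assumptions; the URL serves as the unique key, as in the question:

```python
import json

def atomic_update(doc_id, metadata):
    """Build a Solr atomic-update payload: each field uses the "set"
    modifier, so only those fields change rather than the whole document."""
    doc = {"id": doc_id}
    for field, value in metadata.items():
        doc[field] = {"set": value}
    return json.dumps([doc])

payload = atomic_update(
    "https://cms.example.com/page/1",  # hypothetical CMS page URL
    {"category": "news", "author": "jdoe"},
)
print(payload)
```

This JSON would be POSTed to the collection's /update handler; note again that a Nutch recrawl of the same page will overwrite these fields unless you reapply the update afterwards.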
I have a use case where a query needs to come only from a few web sites (I am building some kind of e-commerce search and there are products from different retailer web sites) and those few web sites can be different (actually most of the time it will be different). So I am OR'ing a few sites in the filter something like this:
fq=site:"aaa.com"+OR+site:"bbb.com"+OR+site:"ccc.com"+OR+site:"ddd.com"
This is too slow. Any help would be appreciated.
I am guessing that site is a text field and the double quotes are making it a phrase query. Make site a string field. Then use:
fq=site:(aaa.com OR bbb.com OR ccc.com OR ddd.com)
If you cannot make site a string field, keep a copyField of site which is of string type and execute the above query on that field.
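A sketch of building and URL-encoding the grouped filter query (the field name site and the hosts are taken from the question):

```python
from urllib.parse import urlencode

def site_filter(sites):
    """Build a single grouped fq clause instead of quoted phrase queries."""
    return "site:(" + " OR ".join(sites) + ")"

params = urlencode({
    "q": "*:*",
    "fq": site_filter(["aaa.com", "bbb.com", "ccc.com", "ddd.com"]),
})
print(params)
```

With a long list of sites, Solr's terms query parser form ({!terms f=site}aaa.com,bbb.com,...) is another option, since it avoids parsing a large boolean query.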
I have a problem with Solr: I want to sort the search results both by fields in Solr's documents and by some fields in the database. Can anybody help me? Thanks!
Add those fields you have in your database to Solr, then let Solr do the sorting.
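Once those fields are in the index and sortable, a sketch of a query that sorts on them; the field names price and created_at are assumptions:

```python
from urllib.parse import urlencode

# Solr can sort on any sortable field; combine several with commas.
params = urlencode({
    "q": "laptop",
    "sort": "price asc, created_at desc",
})
print(params)
```

This way the sort happens inside Solr in one request, instead of merging Solr results with database rows in application code.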