I just started exploring SolrNet. Previously I have been using MSSQL full-text search.
In SQL Server, my queries perform full-text searches and also have multiple JOINs and WHERE clauses. I am also using custom paging to return only 10 rows out of millions.
I have read a few SolrNet docs and run the sample apps provided on the blogs. Everything has worked well so far. I just need to get an idea: what do I do with JOINs and WHERE clauses?
For example, if a user searches for Samsung, the database returns 100k records, but if the user searches for Samsung && City='New york' && Price >'500', they only get a couple of thousand records.
Do I add all columns in Solr and write WHERE clauses in Solr?
What do I do about SQL JOINS?
Thanks in Advance!
There are no joins in Solr. From the Solr wiki:
Solr provides one table. Storing a set of database tables in an index generally requires denormalizing some of the tables. Attempts to avoid denormalizing usually fail.
About WHERE clauses (i.e. filtering), see Querying in SolrNet, Solr query syntax, and Common Solr query parameters.
The Solr equivalent of your WHERE clauses is to map your columns to fields and run queries using the query syntax. A query like your example:
Samsung && City='New york' && Price >'500'
could be translated to something like this in Solr:
q=Samsung AND city:"new york" AND price:[500 TO *]
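A minimal Python sketch of sending that translated query, assuming the schema has `city` and `price` fields (hypothetical names). It uses one common Solr idiom rather than anything the answer prescribes: the free text stays in `q`, the WHERE-style conditions go into separate `fq` parameters, and `start`/`rows` take over the custom paging done in SQL Server. Note that `[500 TO *]` is inclusive; Solr 4 and later also accept `{500 TO *]` for a strict `>`.

```python
from urllib.parse import urlencode

def build_solr_params(keywords, city, min_price, page, page_size=10):
    """Solr select parameters for: keywords AND city AND price >= min_price.

    `city` and `price` are hypothetical schema fields.
    """
    return [
        # Free text stays in q, where it drives relevance scoring.
        ("q", keywords),
        # WHERE-style conditions become filter queries (no scoring, cached).
        ("fq", f'city:"{city}"'),
        # Inclusive range; {min TO *] would be a strict ">" on Solr 4+.
        ("fq", f"price:[{min_price} TO *]"),
        # start/rows replace the custom paging done in SQL.
        ("start", str(page * page_size)),
        ("rows", str(page_size)),
    ]

params = build_solr_params("Samsung", "new york", 500, page=0)
url = "http://localhost:8983/solr/select?" + urlencode(params)
```

Filter queries do not affect scoring and are cached independently, so a frequently reused condition such as a city filter tends to be cheap on repeated searches.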
You need to take some care when you map your database to a Solr schema; specifically, you will probably have to denormalize your data. See this page on the Solr wiki for more information. Basically, you can't really do complex JOINs in Solr: it's a "flat" index.
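To make the denormalization advice concrete, here is a minimal Python sketch that performs the JOIN once, at indexing time, producing one flat document per product. The `products`/`stores` tables and their columns are hypothetical, and actually posting the documents to Solr (for example with SolrNet) is not shown.

```python
# Hypothetical normalized rows, as they might come out of SQL Server.
stores = {1: {"city": "New York"}, 2: {"city": "Boston"}}
products = [
    {"id": 10, "name": "Samsung TV", "price": 650, "store_id": 1},
    {"id": 11, "name": "Samsung Phone", "price": 300, "store_id": 2},
]

def denormalize(products, stores):
    """Flatten product + store into one Solr document per product.

    The JOIN happens here, at indexing time, so queries only need
    simple field filters such as city:"New York".
    """
    docs = []
    for product in products:
        store = stores[product["store_id"]]
        docs.append({
            "id": str(product["id"]),
            "name": product["name"],
            "price": product["price"],
            "city": store["city"],  # copied from the joined table
        })
    return docs

docs = denormalize(products, stores)
```

The trade-off is duplication: if a store's city changes, every product document for that store has to be reindexed.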
Related
I have multiple collections with almost identical schemas. I'd like to apply certain conditions specific to each collection, while other conditions are the same across all collections, and return a combined result set. Is this possible in Solr? I'd appreciate it if you could share a sample query. I'm using Solr 5.3.0.
You're going to have issues with the "certain conditions specific to each collection", as there is no query support for anything like that. You'll probably have to do the querying and merging yourself.
Otherwise, a possible solution would be the "shard unification" strategy mentioned in Query multiple collections with different fields in solr, but scores would be computed locally within each shard, so they are not strictly comparable across collections.
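If you do the querying and merging yourself, a sketch could look like this in Python. The collection names and fields (`type`, `region`, `price`) are hypothetical; the code only builds one select URL per collection (shared filters plus that collection's own `fq`s) and merges already-fetched result lists by score. Because each collection scores its documents with its own statistics, the merged order is only an approximation.

```python
from urllib.parse import urlencode

def per_collection_urls(base_url, shared_fqs, specific_fqs, q="*:*", rows=10):
    """One select URL per collection: shared filters plus that collection's own."""
    urls = {}
    for collection, own_fqs in specific_fqs.items():
        params = [("q", q), ("rows", str(rows)), ("fl", "*,score"), ("wt", "json")]
        params += [("fq", fq) for fq in shared_fqs + own_fqs]
        urls[collection] = f"{base_url}/{collection}/select?{urlencode(params)}"
    return urls

def merge_by_score(results, rows=10):
    """Merge {collection: [doc, ...]} into one list, highest score first.

    Scores come from different collections, so the order is approximate.
    """
    merged = [dict(doc, _collection=name)
              for name, docs in results.items() for doc in docs]
    merged.sort(key=lambda doc: doc["score"], reverse=True)
    return merged[:rows]

# Hypothetical collections and fields.
urls = per_collection_urls(
    "http://localhost:8983/solr",
    shared_fqs=["type:book"],
    specific_fqs={"books_eu": ["region:eu"], "books_us": ["price:[* TO 20]"]},
)
```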
I am currently migrating a Solr app to Elasticsearch and have become stuck on a particular query. The Elasticsearch documentation is rather vague on how to achieve my desired result.
Currently I am trying to convert tagged "fq"s (filter queries) from Solr into Elasticsearch. I need Elasticsearch to return facets (now known as aggregations) based on my query and filters, but also to show aggregation counts for the other options in a search.
Although this sounds complicated, it is achieved in Solr simply by adding an "fq" parameter and tagging the filter as follows:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
According to the main Solr docs, this will filter on "doctype:pdf" but still include counts for the other doc types in the facet output. Again, this works fine for me; I am simply trying to recreate it in Elasticsearch.
So far I have tried a "post_filter", which does the job until I want to apply more than one filter (again, something Solr handles with no problems). You can see an example of how this works, and how I want to achieve it, at:
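For reference, here is a Python sketch of how the Solr side generalizes to several selected values, which is the behavior to reproduce in Elasticsearch. The field names are assumptions about the schema, and using each field name as its own tag is just a convention chosen here (the original uses `dt`).

```python
from urllib.parse import urlencode

def multi_select_params(q, base_fqs, selections, facet_fields):
    """Solr multi-select faceting: tag each facet's filter, exclude it from its own counts.

    `selections` maps a facet field to the values the user ticked; values
    within one field are OR-ed, different fields are AND-ed.
    """
    params = [("q", q)] + [("fq", fq) for fq in base_fqs] + [("facet", "on")]
    for field, values in selections.items():
        if values:
            ored = " OR ".join(f'"{v}"' for v in values)
            params.append(("fq", f"{{!tag={field}}}{field}:({ored})"))
    for field in facet_fields:
        # Each facet ignores the filter tagged with its own name.
        params.append(("facet.field", f"{{!ex={field}}}{field}"))
    return params

params = multi_select_params("mainquery", ["status:public"], {"doctype": ["pdf"]}, ["doctype"])
url = "select?" + urlencode(params)
```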
https://www.jobsinhealthcare.co.uk/search?latitude=&longitude=&title=&location=&radius=5&type=&salary=0&frequency=year&since=&jobtype=&keywords=&company=&sort=Most+recent&filter[contract_type_estr][33d5667c]=Temporary&filter[job_type_estr][5d370027]=Part+time&filter[job_type_estr][4b45bd05]=Full+time
In the filters/facets on the right of the results, you can select multiple "contract type" and/or "job type" and/or "location" values and still see the facet counts for the unselected filters. Please note that Hourly Salary, Annual Salary and Date Added do NOT have this functionality; this is by design.
Any pointers as to how I should structure my query would be greatly appreciated.
I think what you need is a global aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-global-aggregation.html). Inside that top-level aggregation, you should use a filter aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html) as a sub-aggregation to filter only "status:public".
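A minimal Python sketch of a request body following that suggestion, assuming a recent Elasticsearch query DSL (bool `filter` clauses, `keyword`-typed facet fields) and hypothetical field names (`content`, `status`, `contract_type`, `job_type`). The selected facet values go into `post_filter`, so they narrow the hits without touching the aggregations; each facet then gets a `global` aggregation wrapping a `filter` aggregation that re-applies the main query, `status:public`, and every other facet's selection, which mirrors Solr's `{!ex=...}`.

```python
import json

def build_es_body(text, selections, facet_fields, size=10):
    """Search body with Solr-style multi-select facets.

    Hypothetical fields: `content` (text), `status` and the facet fields
    (keyword). `selections` maps facet field -> list of ticked values.
    """
    main = [{"match": {"content": text}}]
    always = [{"term": {"status": "public"}}]  # the untagged Solr fq

    def selected(exclude=None):
        # Values inside one field are OR-ed by `terms`; fields are AND-ed.
        return [{"terms": {field: values}}
                for field, values in selections.items()
                if values and field != exclude]

    body = {
        "size": size,
        "query": {"bool": {"must": main, "filter": always}},
        # Narrows the hits only; aggregations ignore post_filter.
        "post_filter": {"bool": {"filter": selected()}},
        "aggs": {},
    }
    for field in facet_fields:
        body["aggs"][field] = {
            "global": {},  # start from all documents...
            "aggs": {
                "filtered": {
                    # ...then re-apply everything except this field's own selection.
                    "filter": {"bool": {"must": main,
                                        "filter": always + selected(exclude=field)}},
                    "aggs": {"values": {"terms": {"field": field}}},
                }
            },
        }
    return body

body = build_es_body(
    "nurse",
    {"contract_type": ["Temporary"], "job_type": ["Part time", "Full time"]},
    ["contract_type", "job_type"],
)
print(json.dumps(body, indent=2))
```

Because `post_filter` never affects aggregations, a plain `filter` aggregation containing only the other facets' selections (without `global`) should give the same counts; the `global` form is shown because that is what the answer proposes.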
Let's assume the following scenario.
Lucene document: ArticleDocument
Fields: {Id, text, publisherId}
A publisher can publish multiple articles.
Problem
I would like to build word clouds (most frequent words, shingles) for each publisher ID.
In my investigation, I found ways to get the most frequent terms for the entire index or for a single document, but not for a subset of documents. I found a similar question, but it targets Lucene 2.x, and I'm hoping there is an efficient way to do this in more recent Lucene versions.
Could you please guide me toward a way to do this in Lucene 4.x (preferred) or 3.x (the latest 3.x release)?
Please note that I cannot make each publisher a single document with all of its articles appended to one field.
That's because I would like the words in the cloud to be searchable, with the corresponding articles (by the same publisher ID) as the results.
I'm not sure whether maintaining two types of Lucene documents (article and publisher) is a good idea in terms of maintenance and performance.
Use pivot faceting, available in Solr 4.x releases. Pivot faceting allows you to facet within the results of the parent facet.
Generate shingled tokens for the "text" field at index time using the ShingleFilterFactory.
For faceting, add the facet=true&facet.pivot=publisherid,text parameters to your query.
Sample query:
http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&facet=true&facet.pivot=publisherid,text
The query will return the frequent shingles/words, with their counts, for each "publisherid".
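A Python sketch of turning the pivot-facet response into per-publisher word clouds, assuming `wt=json` and the Solr 4.x `facet_pivot` response layout; the sample data is made up. Keep in mind that each count is the number of documents in that publisher's bucket that contain the term, not the total number of occurrences, and that `facet.limit` should cap how many terms come back per publisher.

```python
import json

# Made-up response for facet.pivot=publisherid,text&wt=json (Solr 4.x layout).
SAMPLE = json.loads("""
{"facet_counts": {"facet_pivot": {"publisherid,text": [
  {"field": "publisherid", "value": "p1", "count": 4, "pivot": [
    {"field": "text", "value": "lucene", "count": 2},
    {"field": "text", "value": "solr search", "count": 3}]},
  {"field": "publisherid", "value": "p2", "count": 1, "pivot": [
    {"field": "text", "value": "elasticsearch", "count": 1}]}
]}}}
""")

def word_clouds(response, pivot="publisherid,text", top=20):
    """Map each publisherid to its (term, count) pairs, most frequent first.

    Counts are document counts within the publisher's bucket.
    """
    clouds = {}
    for publisher in response["facet_counts"]["facet_pivot"][pivot]:
        terms = [(t["value"], t["count"]) for t in publisher.get("pivot", [])]
        terms.sort(key=lambda tc: tc[1], reverse=True)
        clouds[publisher["value"]] = terms[:top]
    return clouds

clouds = word_clouds(SAMPLE)
```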
I'm wondering what I should use for a lookup by ID.
This thread Solr Query on Unique Integer Field seems to use a query.
But that is not what is suggested here: Search document by id very slow
And here: http://lucene.472066.n3.nabble.com/Solr-Unique-Key-Field-Should-Apply-on-q-search-or-fq-search-td4003066.html
But I won't be reusing the same query, because the lookups can be on any ID.
Elasticsearch natively provides a lookup by ID. Does anyone know what happens under the hood of the Elasticsearch lookup, so that I can use the same strategy with Solr?
Thanks
You should still use a filter query, because you may search for the same ID again later, and the cached filter will then be much faster than the equivalent q query.
fq - Provide an optional filtering query.
Results of the query are restricted to searching only those results returned by the filter query. Filtered queries are cached by Solr.
They are very useful for improving the speed of complex queries.
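Two ways to build the lookup, sketched in Python. The first follows the answer: `q=*:*` with the ID in a filter query; the `{!term f=id}` local param (assuming your uniqueKey field is `id`) matches the raw value, so special characters need no escaping. The second is Solr's real-time get handler (Solr 4.0+, assumed to be registered at `/get` as in the default configs), which is the closest analogue to Elasticsearch's native get-by-ID. The core name is hypothetical.

```python
from urllib.parse import urlencode

def id_lookup_url(base_url, core, doc_id):
    """Look up one document via a filter query on the uniqueKey (assumed `id`).

    {!term f=id} matches the raw value, so no query-syntax escaping is needed.
    """
    params = [("q", "*:*"), ("fq", f"{{!term f=id}}{doc_id}"), ("rows", "1"), ("wt", "json")]
    return f"{base_url}/{core}/select?{urlencode(params)}"

def realtime_get_url(base_url, core, doc_id):
    """Solr 4.0+ real-time get, assuming the handler is registered at /get."""
    return f"{base_url}/{core}/get?{urlencode([('id', doc_id)])}"

select_url = id_lookup_url("http://localhost:8983/solr", "mycore", "42")
get_url = realtime_get_url("http://localhost:8983/solr", "mycore", "42")
```

If lookups are spread over many different IDs, each filter query takes its own filterCache entry, so whether caching pays off depends on how often the same IDs repeat.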
I have two tables: one is news, the other is contact.
news: newsid, news_content, news_orgid
contact: contactid, contact_orgid
I indexed these two tables in Solr, so I have two cores.
But I have a use case where I need to find all contactids by news_content.
First I get a large set of orgids from the news index, approximately 1 million. I want to use them as a filter query in Solr, like:
select?q=*:*&fq=id:100+id:101+id:102+id:103+id:104
But Solr has a default limit of 1024 boolean clauses (maxBooleanClauses), so I can't send them all in one request. Is there another way to solve this?
Because I want to use Solr's facet data, I can't just fetch all the data from Solr and compare it with the IDs myself.
Appreciate your help!
Best Regards! Rick.
I solved this problem with a new Solr 4.0 feature: join. First I put contact and news into one core, as described in http://searchhub.org/2011/02/12/solr-powered-isfdb-part-4/, and then I could join on the orgids.
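A Python sketch of the join query described above, assuming news and contact documents live in one core with the field names from the question. `{!join from=news_orgid to=contact_orgid}` finds the news documents matching the inner query, collects their `news_orgid` values, and returns the documents whose `contact_orgid` matches one of them, so the million orgids never travel over the wire and faceting still works on the result. The search text, core name, and facet field are illustrative.

```python
from urllib.parse import urlencode

def contacts_by_news_params(text):
    """Params for: contacts whose contact_orgid equals news_orgid of matching news.

    Assumes news and contact documents share one core (Solr 4.0+ join).
    """
    join_q = f"{{!join from=news_orgid to=contact_orgid}}news_content:({text})"
    return [
        ("q", join_q),
        # Faceting still works because the join runs inside Solr.
        ("facet", "true"),
        ("facet.field", "contact_orgid"),
        ("fl", "contactid"),
        ("wt", "json"),
    ]

params = contacts_by_news_params("election")
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```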