Solr query parsers: difference between the standard query parser and the DisMax query parser

I have gone through the DisMax query parser and the standard query parser and found that the standard query parser handles errors differently and is therefore more prone to errors. So in which areas is one more powerful than the other, and what are the specific differences between them?

The key advantage of the standard query parser is that it supports a
robust and fairly intuitive syntax allowing you to create a variety of
structured queries. The largest disadvantage is that it’s very
intolerant of syntax errors, as compared with something like the
DisMax query parser which is designed to throw as few errors as
possible.
The Standard Query Parser is also known as the Lucene query parser, so it expects queries to follow correct syntax.
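To illustrate why raw user input trips up the standard parser, here is a minimal sketch (a hypothetical helper, not a Solr API) that backslash-escapes the Lucene special characters before the text is placed in q. SolrJ ships a comparable utility, ClientUtils.escapeQueryChars, whose exact character set may differ from this one.

```python
# Characters the standard (Lucene) query parser treats as syntax.
# Escaping single & and | also neutralizes the && and || operators.
SPECIAL_CHARS = set('\\+-&|!(){}[]^"~*?:/')


def escape_query_chars(text: str) -> str:
    """Backslash-escape Lucene query syntax so user text is parsed literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query_chars("Hotels & Food (NYC)"))
```

Note that this does not neutralize the keyword operators AND, OR and NOT; DisMax avoids the whole problem by escaping most of this syntax for you.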
The DisMax query parser is designed to process simple phrases (without
complex syntax) entered by users and to search for individual terms
across several fields using different weighting (boosts) based on the
significance of each field. Additional options enable users to
influence the score based on rules specific to each use case
(independent of user input).
In general, the DisMax query parser’s interface is more like that of
Google than the interface of the 'lucene' (aka Standard) Solr query
parser. This similarity makes DisMax the appropriate query parser for
many consumer applications. It accepts a simple syntax, and it rarely
produces error messages.
The DisMax query parser supports an extremely simplified subset of the
Lucene QueryParser syntax. As in Lucene, quotes can be used to group
phrases, and +/- can be used to denote mandatory and optional clauses.
All other Lucene query parser special characters (except AND and OR)
are escaped to simplify the user experience. The DisMax query parser
takes responsibility for building a good query from the user’s input
using Boolean clauses containing DisMax queries across fields and
boosts specified by the user. It also lets the Solr administrator
provide additional boosting queries, boosting functions, and filtering
queries to artificially affect the outcome of all searches.
For more information, see the Standard Query Parser documentation (https://lucene.apache.org/solr/guide/7_6/the-standard-query-parser.html) and the DisMax documentation (https://lucene.apache.org/solr/guide/7_6/the-dismax-query-parser.html).
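A minimal sketch of how the same user input might be sent to each parser. defType, q, qf, bq and fq are standard Solr request parameters; the fields title, content, category and inStock and the boost values are hypothetical.

```python
from urllib.parse import urlencode

user_input = 'solr "query parser" -lucene'

# Standard (lucene) parser: q must be valid Lucene syntax and name its fields.
standard_params = {
    "q": f"title:({user_input}) OR content:({user_input})",
}

# DisMax: pass the raw user input; qf lists the fields to search and their boosts.
dismax_params = {
    "defType": "dismax",
    "q": user_input,
    "qf": "title^2 content",     # hypothetical fields, title weighted twice as much
    "bq": "category:featured^5", # hypothetical admin-defined boost query
    "fq": "inStock:true",        # hypothetical filter applied to every search
}

print(urlencode(standard_params))
print(urlencode(dismax_params))
```

With DisMax, only q changes per search; the weighting, boosts and filters can stay fixed.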

Related

Apache Solr: use an entire string to search within a collection

I have managed to create a dataset using Apache Solr. I have also managed to make queries, such as in this example:
content:(test1 OR test2) OR title: test2
I would now like to search the dataset using an entire string, similar to searching on Google. Is the correct approach to keep using OR clauses on title and content for each word in the query, or is there a better way to achieve this? (I am not looking for exact matches, just the most relevant ones.)
You can use DisMax or eDisMax for this, and pass phrases along with boosting if you have them.
The DisMax query parser is designed to process simple phrases (without
complex syntax) entered by users and to search for individual terms
across several fields using different weighting (boosts) based on the
significance of each field. Additional options enable users to
influence the score based on rules specific to each use case
(independent of user input).
The detailed parameters are described on the Solr DisMax page.
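A sketch of the eDisMax request for the question above. qf and pf are real eDisMax parameters, but the boost values are made up. pf adds a boost when the query terms appear together as a phrase, which tends to push the most relevant documents up without requiring exact matches.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str) -> dict:
    """Build eDisMax request parameters for a free-text, Google-style search."""
    return {
        "defType": "edismax",
        "q": user_query,            # the whole string, no field prefixes or ORs
        "qf": "title^2 content",    # match individual terms in both fields
        "pf": "title^4 content^2",  # extra boost when the terms form a phrase
    }


print(urlencode(edismax_params("test1 test2")))
```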

Which Solr QueryParser is the fastest for a simple query?

I have a query like sku:(123 456 ... 999) with 9000 skus in it. Sku is a "string" type. Which QueryParser should I use to get maximum performance from Solr?
Solr version is 7.2
If these SKUs are defined as the uniqueKey for your document, you can use the Realtime Get endpoint and bypass almost everything in Solr. That'll probably be the most performant way of handling it, but it changes the expected behavior slightly (non-committed documents are returned as well, for example).
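Assuming sku is the uniqueKey, Realtime Get takes the IDs as a comma-separated ids parameter on the /get handler. A minimal sketch that builds such a URL; the host and the collection name products are placeholders.

```python
from typing import Iterable
from urllib.parse import urlencode


def realtime_get_url(base: str, collection: str, skus: Iterable[str]) -> str:
    """Build a Realtime Get URL that fetches documents directly by uniqueKey."""
    return f"{base}/solr/{collection}/get?" + urlencode({"ids": ",".join(skus)})


print(realtime_get_url("http://localhost:8983", "products", ["123", "456", "999"]))
# http://localhost:8983/solr/products/get?ids=123%2C456%2C999
```

This simple join assumes no SKU itself contains a comma.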
Otherwise the performance difference will probably be negligible, so go with the standard Lucene query parser. If you want to optimize it further, it's probably better to look at the query profile (e.g. if the same set of 9000 SKUs is requested every time, index a tag for those SKUs instead and query for that).
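If you stay with the standard parser, here is a sketch of building the sku:(...) query from a list. Quoting each value is an assumption about your SKU format: it keeps characters such as - or / inside a SKU from being parsed as syntax, and on a string field a quoted single value still matches as one exact term.

```python
def sku_query(skus, field="sku"):
    """Build a standard-parser query like sku:("123" "456"), quoting each value."""
    quoted = ('"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"' for s in skus)
    return f"{field}:(" + " ".join(quoted) + ")"


print(sku_query(["123", "456", "AB-9"]))
# sku:("123" "456" "AB-9")
```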
In all cases this can differ based on your document profile and your server's performance, so the strategy is usually to test it for your specific use case and get timing information for your own infrastructure.

Extract query terms from text for querying Solr server

I am using SolrJ to build queries for a Solr server.
I have some fairly short free-form texts that can contain various special characters, like Mr. John's New-Wall, "Hotels & Food".
A phrase query for text like this would not produce enough matches. So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food. (It would probably be good to consider term proximity somehow, but I have to start with something.)
The field that I am searching is the default text_general field. I started by replacing some special characters with spaces and splitting the text to extract the terms. But it feels kind of redundant.
Isn't there an easier way to extract terms from text using SolrJ and Solr? Basically I would like to extract terms from text similarly to how Solr does it when it creates its index.
I am not sure exactly what your question is, however here is a bit of info that you may find helpful:
Basically I would like to extract terms from text similarly to how it is done by Solr when it creates its index.
You can configure index-time and query-time field processing in your schema. I would suggest taking a look here. This gives you some flexibility to normalize your data.
So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food.
This is essentially what Solr does under the hood by default. I would suggest looking up the eDisMax query parser and its qf and tie parameters.
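Instead of splitting the text yourself, you can send it as-is and let the query analyzer of each qf field tokenize it the same way it does at index time. A sketch of the parameters; content comes from the question and title is a hypothetical second field. In SolrJ the same pairs can be set with SolrQuery.set(name, value).

```python
from urllib.parse import urlencode

raw_text = 'Mr. John\'s New-Wall, "Hotels & Food"'

params = {
    "defType": "edismax",
    # Hand the text to Solr unchanged; note that double quotes in the input
    # still form a phrase clause under eDisMax.
    "q": raw_text,
    "qf": "content title^2",  # 'title' is a hypothetical second field
    # Score = best field score + 0.1 * sum of the other matching fields' scores.
    "tie": "0.1",
}
print(urlencode(params))
```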
Hope it helps

Solr does not accept an unparsed query

I have added some documents to my Solr index using a requestHandler, and now I am trying to query them from the web UI. I get the correct result when my query parameter is in the format
[id]:[search-item]
but I want to search without using this format. For example, to search for cat I want to just type "cat" and get the result, rather than typing "animal:cat".
I am new to Solr, so I am not sure where I am going wrong.
Use the DisMax query parser/handler.
Extract from the DisMax documentation:
The DisMax query parser is designed to process simple phrases (without
complex syntax) entered by users and to search for individual terms
across several fields using different weighting (boosts) based on the
significance of each field. Additional options enable users to
influence the score based on rules specific to each use case
(independent of user input).
In general, the DisMax query parser's interface is more like that of
Google than the interface of the 'standard' Solr request handler. This
similarity makes DisMax the appropriate query parser for many consumer
applications. It accepts a simple syntax, and it rarely produces error
messages.
Also see DisMax and the full documentation of the DisMax query parser here.
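A sketch of a /select URL that lets the user type just cat. The field animal comes from the question; name is a hypothetical second field, and the host and core name are placeholders. The same defType and qf values can also be configured as defaults on the request handler in solrconfig.xml, so clients only need to send q.

```python
from urllib.parse import urlencode


def simple_search_url(base: str, core: str, text: str) -> str:
    """Build a /select URL where the user types plain text, e.g. 'cat'."""
    params = {
        "defType": "dismax",
        "q": text,            # no 'field:' prefix needed
        "qf": "animal name",  # fields DisMax searches on the user's behalf
    }
    return f"{base}/solr/{core}/select?" + urlencode(params)


print(simple_search_url("http://localhost:8983", "mycore", "cat"))
# http://localhost:8983/solr/mycore/select?defType=dismax&q=cat&qf=animal+name
```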

Complex queries with Solr 4

I would like to fire complex queries in Solr 4. If I am using Lucene, I can search using XML Query parser and get the results I need. However, I am not able to see how to use the XML Query Parser in Solr.
I need to be able to execute queries with proximity searches, booleans, wildcards, span OR queries, and phrases (although phrases can be handled by proximity searches).
Pointers to material on how to proceed are also welcome.
Regards
Puneet
As far as I know it's still a work in progress. More info can be found at their Jira. You can of course use the normal query language, it's also capable of doing pretty complex things, for example:
"a proximity search"~2 AND *wildcards* OR "a phrase"
As you can see, you can use phrases, boolean operators (AND, OR, ...), spans, proximity and wildcards. For more information about the query syntax, look at the Lucene documentation. Solr also adds some extra features on top of the Lucene query parser; more information about those can be found on the Solr wiki.
Solr 4.8 now has the "complexphrase" query parser built in that can construct all sorts of complex proximity queries (i.e. phrase queries with embedded boolean logic and wildcards).
You can use a query URL such as:
http://xx.xxx.xx.xx:8983/solr/collectionname/select?indent=on&q={!complexphrase%20inOrder=true}"good*"&wt=json&fl=Category,keywords,ImageID
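When building that URL in code, the braces, quotes and wildcard in the local-params query are safer URL-encoded. A minimal sketch using the parameters from the URL above; the host is a placeholder.

```python
from urllib.parse import urlencode

params = {
    "indent": "on",
    "q": '{!complexphrase inOrder=true}"good*"',  # local params + wildcard phrase
    "wt": "json",
    "fl": "Category,keywords,ImageID",
}
url = "http://localhost:8983/solr/collectionname/select?" + urlencode(params)
print(url)
```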
