I have a question regarding Solr queries.
My query basically contains thousands of OR conditions for authors (author:name1 OR author:name2 OR author:name3 OR author:name4 ...).
The execution time on my index is huge (around 15 sec). When I tag all the associated documents with a custom field and value like authorlist:1, and then change my query to just search for authorlist:1, it executes in 78 ms. How come there is such a big difference in execution time?
Can somebody please explain why there is such a difference (maybe the query parser?) and whether there is a way to speed this up?
Thanks for the help.
We are using Solr on Windows with multiple collections. Each collection has multiple stored and indexed fields and approx. 200k documents. The use case is search for an e-commerce website. The index size is approx. 200 MB.
While a normal search takes only a few ms, queries that need to find all data for multiple categories take somewhere around 1100 ms to 1200 ms. The query includes approx. 400 categories joined with OR, something like:
Category:(5 OR 33 OR 312 OR 1192 OR 1193 OR 1196 OR .....)
I have increased the heap size to 4 GB and configured larger Solr cache sizes, which reduced the query time from 2000 ms to 1100 ms, but we are looking for more improvement.
I also found the following in the Solr UI:
lockFactory=org.apache.lucene.store.NativeFSLockFactory#56761b2a; maxCacheMB=48.0 maxMergeSizeMB=4.0
But I'm not sure whether that has an impact. And if yes, how do I change it?
Can you advise what else we can do? Let me know if you need more details.
Thanks in advance.
You should add your full request, as that would make it easier to give advice. But from your sentence "The query includes appx. 400 categories with OR something like.." I understand you are putting your huge clause in the q param? That is not the right approach.
Instead, use q=*:* and put your clause in fq. This way it will be cached, and if you get a good cache hit rate, subsequent queries will be significantly faster.
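To make the idea concrete, here is a minimal sketch of how such a request could be assembled. The category IDs below are a stand-in for the real 400-entry list, and the parameter names follow Solr's standard query API:

```python
from urllib.parse import urlencode

# Hypothetical subset of the ~400 category ids from the real request.
category_ids = [5, 33, 312, 1192, 1193, 1196]

# Keep q as a cheap match-all and push the big clause into fq,
# so the computed document set lands in Solr's filterCache and
# can be reused across requests.
params = {
    "q": "*:*",
    "fq": "Category:({})".format(" OR ".join(str(c) for c in category_ids)),
    "rows": 10,
}
query_string = urlencode(params)
```

The resulting query string would then be appended to the collection's /select endpoint.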
A second thing you might try (but go with the above first) is transforming the big OR clause into one range clause, or a combination of them, such as:
Category:[5 TO 1190] OR Category:[1192 TO 1196]
If your field type is a tint, and you can significantly reduce the clause's size by transforming it into a combination of ranges, that might help too.
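Collapsing a sorted list of IDs into range clauses is mechanical; a small helper along these lines could build the clause automatically (a sketch, not tied to any particular Solr client library):

```python
def to_range_clauses(field, values):
    """Collapse a list of ints into OR-ed Solr range clauses.

    Consecutive runs of values become a single [lo TO hi] clause.
    """
    values = sorted(set(values))
    clauses, start, prev = [], values[0], values[0]
    for v in values[1:]:
        if v == prev + 1:
            prev = v           # extend the current consecutive run
            continue
        clauses.append(f"{field}:[{start} TO {prev}]")
        start = prev = v       # start a new run
    clauses.append(f"{field}:[{start} TO {prev}]")
    return " OR ".join(clauses)

print(to_range_clauses("Category", [5, 33, 1192, 1193, 1194, 1195, 1196]))
# Category:[5 TO 5] OR Category:[33 TO 33] OR Category:[1192 TO 1196]
```

Whether this helps depends on how clustered the IDs are: dense runs shrink the clause a lot, while scattered IDs gain nothing.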
I am working with Solr facet fields and have come across a performance problem I don't understand. Consider these two queries:
q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
The only difference is an empty facet.prefix in the first query.
The first query returns after some 20 seconds (QTime 20000 in the result) while the second one takes only 80 msec (QTime 80). Why is this?
And as a side note: facet.method=fc makes the queries run 'forever' and eventually fail with org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT.
This is with Solr 1.4.
From this doc: http://docs.lucidworks.com/display/solr/Faceting
The facet.prefix parameter limits the terms on which to facet to those starting with the given string prefix.
That means you facet over fewer terms.
Now, I'm quite sure the faceting time is included in QTime (as seems to be demonstrated by this post: http://www.mail-archive.com/solr-user@lucene.apache.org/msg39859.html).
So: fewer terms, less time.
Maybe don't facet on CONTENT, as it probably has many distinct terms and faceting on it makes little sense. Try faceting on a category field or some other field with fewer unique terms.
Have you tried executing them in the opposite order after a fresh restart of the Solr server?
Usually the first query takes more time, and if subsequent queries have more in common with any previous one, there will be cache hits and response times will be dramatically faster.
In addition, please note that 'enum' is more suitable for facet fields with a small number of unique terms.
Also, try increasing the filter cache to a really big size and check your cache-hit ratio at
SOLR_DOMAIN:PORT/solr/#/collection1/plugins/cache?entry=fieldValueCache,filterCache
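For reference, the filter cache is configured in solrconfig.xml. A sketch of what a larger cache entry might look like (the sizes below are purely illustrative; tune them against the hit ratio you observe):

```xml
<!-- solrconfig.xml: illustrative sizes only, not a recommendation -->
<filterCache class="solr.FastLRUCache"
             size="4096"
             initialSize="1024"
             autowarmCount="256"/>
```

A large cache only pays off if the same filters recur; watch the hit ratio after the change to confirm it helps.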
We are using Solr 4.3 for search functionality. We have configured 2 shards and 2 replicas.
We have a total of 132,905 Solr documents in the index.
Our search query is very long and takes around 3 seconds (from the Solr Admin Console); for example:
id:(FOLD5002861 FOLD5002890 FOLD5219963 FOLD4105003 FOLD4105005 FOLD4105006 FOLD4105007 FOLD4105008 FOLD4105009 FOLD4105010 FOLD4105011 FOLD4105012 FOLD4105013 FOLD4105014 FOLD4105018 FOLD4105019 FOLD4105020 FOLD4105021 FOLD4105022 FOLD4105023 FOLD4105024 FOLD4105025 FOLD4105026 FOLD4105027 FOLD5220166 FOLD5220168 FOLD5220169 FOLD5220170 FOLD5220171 FOLD5220172 FOLD5220173 FOLD5220174 FOLD5220175 FOLD5220176 FOLD5220177 FOLD5220178 FOLD5220179 FOLD5220180 FOLD5220181 FOLD4100876 FOLD4100877 FOLD4100878 FOLD4100879 FOLD4100880 FOLD4100881 FOLD4655426 FOLD4655428 FOLD4655429 FOLD4655430 FOLD4655431 FOLD4655432 FOLD4655433 FOLD4655434 FOLD4655435 FOLD4655436 FOLD4655437 FOLD4655438 FOLD4655439 FOLD4655483 FOLD4655487 FOLD4655523 FOLD4655874 FOLD4655884 FOLD4655856 FOLD4655858 FOLD4655859 FOLD4655860 FOLD4655861 FOLD4655862 FOLD4655863 FOLD4655864 FOLD4655865 FOLD4655866 FOLD4655867 FOLD4655868 FOLD4655869 FOLD4655870 FOLD4655871 FOLD4655872 FOLD4655882 FOLD4655892 FOLD4649510 FOLD4649512 FOLD4649513 FOLD4649514 FOLD4649515...50000 times)
We want to trace where the time is being spent. We tried the debugQuery option in the Solr Admin Console but did not get useful information.
Is there any way to improve the query? How can we track detailed timings?
If you transform this query into a filter query it will usually perform better, because the filter is applied on top of a query result (a subset of your index, which can be cached) rather than against the entire index.
The best option would be to have another piece of the query to run as the main query and then apply this clause as a filter on top of it; but if you don't have one, running a main query of field:[* TO *] should still perform better.
Have a look at this question, which explains the difference between a filter query and a normal query:
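As a sketch of the suggested restructuring (the ids below are just a stand-in for the full 50,000-entry list):

```python
# A few hypothetical ids standing in for the full 50,000-entry list.
ids = ["FOLD5002861", "FOLD5002890", "FOLD5219963"]

params = {
    # cheap main query; the real work moves into the filter below
    "q": "*:*",
    # the big id clause becomes a filter query, cached in the filterCache
    "fq": "id:({})".format(" ".join(ids)),
}
# Note: later Solr releases (4.10+) also ship a terms query parser that
# skips scoring entirely, e.g.:
#   fq={!terms f=id}FOLD5002861,FOLD5002890,FOLD5219963
```

With 50,000 ids the request will exceed URL length limits, so it would need to be sent as a POST body rather than in the query string.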
SOLR filter-query vs main-query
Take a look at some factors that affect Solr performance.
Try to optimize your index with:
curl http://<solr_host>:<port>/solr/<core_name>/update -F stream.body=' <optimize />'
I have a Solr index which contains approx. 10 million web discussion threads. Solr operates in reader-writer mode. I have another process which queries Solr with different keyword queries. Keywords can take the following forms:
A
A AND B AND C.....
A AND B AND C.... AND Z NOT AA NOT AB NOT AC......
The final Solr query looks somewhat like this:
text:( "Keyword A" OR "Keyword B" OR "Keyword C" ...) AND source: (source1 OR source2 OR source3...) AND date:[date1 TO date2]
There are around 100 such different combinations which are queried on Solr. The selection of a query combination depends on the number of results each query returns.
The queries somehow seem to take a lot of time, sometimes minutes (2 to 15 min). Making use of the cache seems difficult, as a query is very rarely picked up back to back by the scheduling thread.
How can I reduce the time taken for Solr queries?
We're running Solr 3.4 and have a relatively small index of 90,000 documents or so. These documents are split across several logical sources, so each search has an applied filter query for a particular source, e.g.:
?q=<query>&fq=source:<source>
where source is a classic string field. We're using edismax with a default search field of text.
We are currently seeing q=* taking on average 20 times longer to run than q=*:*. The difference is quite noticeable, with *:* taking 100ms and * taking up to 3500ms. A search for a common word in the document set (matching nearly 50% of all documents) will return a result in less than 200ms.
Looking at the queries with debugQuery on, we can see that * is parsed to a DisjunctionMaxQuery((text:*)), while *:* is parsed to a MatchAllDocsQuery(*:*). This makes sense, but I still don't feel like it accounts for a slowdown of this magnitude (a slowdown of 2000% over something that matches 50% of the documents).
What could be causing this? Is there anything we can tweak?
When you pass just *, you are asking Solr to check every term in the field and match it against the wildcard, and that is a lot of work. However, when you use *:*, you are asking Solr to give you everything and skip any matching.
Solr/Lucene is optimized to handle *:* fast and efficiently!