Solr Query Max Condition - solr

I am using solr 4.3.0 for my web site search. I want to do something using solr but when I query, I get an error. In my situation I have 40000 products, and I want to excludes 1500 products with query. This is the my query
-brand-slug:reebok OR -brand-slug:nike AND
-skuCode:(01-117363 01-117364 01-117552 01-119131 01-119166 01-1J622 01-1J793 01-1M4434 01-1M9691 01-1Q279 01-1T405 01-1T865 01-2109830 01-2111116 01-2111186 01-21J625 01-21J794 01-21V019 01-2M9691 01-2M9696 01-33J793 01-519075 01-M4431 01-M7652 01-M9160 01-M9165 01-M9166 01-M9613 01-M9622 01-M9697 01200CY0001N00 01211SU0141M00 01212KU0009N00 01212KU0010N00 01212KU0025N00 01212KU0027N00 01212KU0038N00 01212KW0019N00 01212KW0020N00
....thousands of skuCodes)
If I put 670 skuCodes in their that will works good, but I use 1500 skuCodes is an error like
Solr HTTP error: OK (400)
How could I solve this problem? Thanks

What a night :) I solved my problem. Actually there was 2 problems in my system. First problem is in my tomcat server. I increase their request size with change maxHttpHeaderSize="65536". ( You could change your web server buffer size I changed my nginx conf). The other problem is about solr config. I got an error like 'too many boolean clauses'. If you get this error, you could change maxBooleanClauses in solrconfig.xml. After restart my tomcat server everything was ok.

Related

Nutch to allow when the host name is having portnumber

I am using nutch to push and index data to solr. In nutch, i have added abc.com:85 to domain-urlfilter.txt and +^http://abc\.com\:85 to regex-urlfilter.txt.
The problem is that nutch is not indexing data and it is throwing this message Total number of urls rejected by filters:1
Here in the url, i need the portnumber ,this configuration is done.
Could you please let me know how to make nutch work with the port number :85 added.
The problem is the syntax: +^http://abc\.com\:85 is not correct. Please check the syntax here: Nutch regex-urlfilter syntax
Hope this helps,
Le Quoc Do

ArrayIndexOutOfBoundsException seen in SOLR on running ToParentBlockJoinQuery

On running the following query with
q=: and fq=(
{!parent which=type:scrap v='visibility:show AND (stock:1)'}
) we ran into the following exception :
java.lang.ArrayIndexOutOfBoundsException: 33554431\n\tat
org.apache.lucene.util.FixedBitSet.get(FixedBitSet.java:150)\n\tat
org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:284)\n\tat
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:177)\n\tat
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:148)\n\tat
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:592)\n\tat
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:284)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1264)\n\tat
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:951)\n\tat
org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1109)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1630)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1506)\n\tat
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:227)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat`
This used to run perfectly fine before. We increased the memory for SOLR from 3000M to 4096M and ran Optimise again and things seemed to work fine. I want to understand if this can occur again and the steps we need to take to prevent it from occurring
Suggest you raise this to the lucene / solr users mailing list. See here for more details http://lucene.apache.org/solr/resources.html

Solr times out when I try to rebuild index for django

I am trying to build my solr index for Django on ubuntu for the first time with ./manage.py rebuild_index and I get the following error:
Removing all documents from your index because you said so.
Failed to clear Solr index: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
All documents removed.
Indexing 4 dishess
Failed to add documents to Solr: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
I have access to localhost:8983/solr/ and localhost:8983/solr/admin via my web browser
You can bump up the TIMEOUT in settings.py.
For example
HAYSTACK_CONNECTIONS = {
'default': {
'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
'URL': 'http://127.0.0.1:8080/solr/default',
'INCLUDE_SPELLING': True,
'TIMEOUT': 60 * 5,
},
}
Important thing here is that you shoudn't increase default timeout, because it could possibly block all your workers as haystack works synchronously.
The best way to avoid this is to define multiple connections for reads and writes with different timeouts and define.
http://django-haystack.readthedocs.org/en/latest/settings.html#haystack-connections
And use routers for read and write separation http://django-haystack.readthedocs.org/en/v2.4.0/multiple_index.html#automatic-routing

URL returned non 200 response code

I am Indexing Documents by using SolrPhpClient library. while making POST request by Solr using extract function reply with
URL http://localhost/moodledemo/pluginfile.php/99/course/overviewfiles/pre bio- data.docx returned non 200 response code".
It happens only if the file name include space. if the filename don't have any space it goes well.
I don't conclude why it returns non 200 response with files that have space in their names. while accessing the same path works in browser.
Using rawurlencode() Solved the Issue..
may be the jetty server was responding with non 200 response code with the filename:-
How are you.doc
Using rawurlencode()
How%20are%20you.doc
Thanks :)

committed before 500 null error in solr 3.6.1

In solr 3.6.1, At some point am getting the following error when concurrent request(concurrent load test) performed against the solr server.
org.apache.solr.common.SolrException log
SEVERE: org.mortbay.jetty.EofException
Caused by: java.net.SocketException: Broken pipe
and
Committed before 500 null||org.mortbay.jetty.EofException|?at
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)|?at
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)|?at
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)|?at
org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)|?at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:353)|?at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)|?at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)|?at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)|?at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)|?at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)|?at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)|?at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)|?at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)|?at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)|?at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)|?at
org.mortbay.jetty.Server.handle(Server.java:326)|?at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)|?at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)|?at
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)|?at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)|?at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)|?at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)|?at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)|Caused
by: java.net.SocketException: Broken pipe|?at
java.net.SocketOutputStream.socketWrite0(Native Method)|?at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)|?at
java.net.SocketOutputStream.write(SocketOutputStream.java:136)|?at
org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)|?at
org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)|?at
org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)|?at
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)|?... 25
more
Kindly suggest any idea to resolve this error from solr ?
I don't think it's your solr, the broken pipe happens (happened to me, at least) because of a timeout problem with the client.
Check for your curl timeout value and try to set explicitly a keep-alive Tomcat so you can avoid this situation again.
quick update (just to give a hint, configuration may vary)
in your jetty folder, you should look for a folder named WEB-INF that should contain a file named jetty-web.xml (or web-jetty.xml)
adding these lines:
<session-config>
<session-timeout>720</session-timeout>
</session-config>
should help you (change 720 in what you like more)
there's also the option
<Set name="maxIdleTime">300000</Set>
that may do your trick. You'll have to dig into jetty's doc a lot to figure out this for your case
more about this: here and here

Resources