Solr ReplicationHandler - SnapPull failed to download files

We are continuously getting this exception during replication from master to slave.
Our index size is 9.7 GB and we are trying to replicate a slave from scratch.
30 Oct 2013 18:22:16,996 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _41c_Lucene41_0.doc completely. Downloaded 0!=107464871
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1266)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1146)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:741)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:405)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:319)
at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:220)
I read in some thread that there was a related bug in Solr 4.1, but we are using Solr 4.3 and have also tried 4.5.1.
It seems that DirectoryFileFetcher sometimes cannot download a file; the file arrives on the slave with size zero.
This is the master setup:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>
And the slave setup:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>

The problem appeared to be with httpclient.
I turned on debug logging for all libraries and saw a message "Garbage in response" coming from httpclient just before the failure.
This is a log snippet:
31 Oct 2013 18:10:40,360 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Sending request: GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "Host: solr-master.saltdev.sealdoc.com:8081[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "Connection: Keep-Alive[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> Host: solr-master.saltdev.sealdoc.com:8081
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> Connection: Keep-Alive
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response:
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "4[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: 4
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[0x0][0x0][0x0][0x0][\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: ^#^#^#^#
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "0[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: 0
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response:
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 closed
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 shut down
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 closed
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG PoolingClientConnectionManager - Connection released: [id: 0][route: {}->http://solr-master.saltdev.sealdoc.com:8081][total kept alive: 1; route allocated: 1 of 10000; total allocated: 1 of 10000]
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 2 false
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data 0 false
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
31 Oct 2013 18:10:40,427 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data 0 false
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Done with dir: CachedDir<>
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index.20131031180837277 0 true
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - looking to close /opt/watchdox/solr-slave/data/index.20131031180837277 [CachedDir<>]
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - Closing directory: /opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - Removing directory before core close: /opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Removing from cache: CachedDir<>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)
31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
So I upgraded the httpcomponents jars to their latest 4.3.x versions and the problem disappeared.
The httpcomponents jars that solrj depends on were at version 4.2.x; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1.
I have run the replication a few times since with no problems at all; it is now working as expected.
It seems that the upgrade is necessary only on the slave side, but I'm going to upgrade the master too.
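One possible reading of the wire log above, offered as an assumption rather than a confirmed diagnosis: the bytes that httpclient flags as "Garbage in response" (`4`, four zero bytes, `0`, an empty line) look like the tail of an HTTP chunked-transfer body, that is, a 4-byte chunk followed by the terminating zero-length chunk. The minimal Python sketch below decodes those bytes as a chunked body. The function name `decode_chunked` is hypothetical and is not part of Solr or httpclient.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked-transfer body (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(body)


# Bytes copied from the "wire - <<" lines in the log, minus the leading CRLF
# (assumed here to be the end of an earlier chunk).
leftover = b"4\r\n\x00\x00\x00\x00\r\n0\r\n\r\n"
payload = decode_chunked(leftover)
print(payload)  # the decoded chunk: four zero bytes
```

If this reading is right, the client was parsing leftover body data from an earlier response on the same keep-alive connection where it expected a status line, but the log alone does not prove it.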

Related

Apache2 Error AH00072 Reliably determine the server

I want to create a database, and I'm used to phpMyAdmin. I wanted to use phpMyAdmin
for it, but after I installed PHP I got this error. Any clues?
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2023-01-29 00:16:49 UTC; 6min ago
Docs: https://httpd.apache.org/docs/2.4/
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260775]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260775]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260775]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260775]: no listening sockets available, shutting down
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260775]: AH00015: Unable to open logs
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260772]: Action 'start' failed.
Jan 29 00:16:49 ubuntu-4gb-hel1-2 apachectl[260772]: The Apache error log may have more information.
Jan 29 00:16:49 ubuntu-4gb-hel1-2 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Jan 29 00:16:49 ubuntu-4gb-hel1-2 systemd[1]: apache2.service: Failed with result 'exit-code'.
Jan 29 00:16:49 ubuntu-4gb-hel1-2 systemd[1]: Failed to start The Apache HTTP Server.
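The `(98)Address already in use` lines indicate that something else was already bound to port 80 when Apache tried to start. A minimal sketch, assuming Python 3 is available on the host, that checks whether a TCP port can currently be bound. The helper name `port_is_free` is hypothetical, and binding to ports below 1024 normally requires root, so run it with the same privileges Apache would have.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # "Address already in use" and "Permission denied" both land here.
            return False
        return True


if __name__ == "__main__":
    # Port 80 is the one Apache failed to bind in the log above.
    print("port 80 free:", port_is_free(80))
```

A `False` result only says the port cannot be bound; it does not say which process holds it.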

apache2 restart problem on ubuntu 20.04 LTS

I had installed nginx, but I have now uninstalled it and am facing this problem.
When I run the command "systemctl start apache2",
it tells me to run "journalctl -xeu apache2.service" to check the error message. When I run that command, it gives the following message:
Oct 29 17:37:32 localhost apachectl[37192]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
Oct 29 17:37:32 localhost apachectl[37192]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
Oct 29 17:37:32 localhost apachectl[37192]: no listening sockets available, shutting down
Oct 29 17:37:32 localhost apachectl[37192]: AH00015: Unable to open logs
Oct 29 17:37:32 localhost apachectl[37189]: Action 'start' failed.
Oct 29 17:37:32 localhost apachectl[37189]: The Apache error log may have more information.
Oct 29 17:37:32 localhost systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
I am not able to start my apache2 server

Apache 2 configtest failed: File name too long

I run an apache2 webserver locally on a raspberry-pi with raspbian distro.
It worked without a problem until suddenly it can't be started anymore with
sudo /etc/init.d/apache2 start, which yields the following error:
[....] Starting apache2 (via systemctl): apache2.serviceJob for
apache2.service failed. See 'systemctl status apache2.service' and
'journalctl -xn' for details. failed!
Removing and reinstalling with apt-get doesn't solve it. systemctl status shows the following entries:
> Apr 22 11:27:16 raspberrypi apache2[18234]: [344B blob data]
> Apr 22 11:27:16 raspberrypi apache2[18234]: [293B blob data]
> Apr 22 11:27:16 raspberrypi apache2[18234]: [293B blob data]
> Apr 22 11:27:16 raspberrypi apache2[18234]: [293B blob data]
> Apr 22 11:27:16 raspberrypi apache2[18234]: [293B blob data]
> Apr 22 11:27:16 raspberrypi apache2[18234]: Action 'configtest' failed.
> Apr 22 11:27:16 raspberrypi apache2[18234]: The Apache error log may have more information.
> Apr 22 11:27:16 raspberrypi systemd[1]: apache2.service: control process exited, code=exited status=1
> Apr 22 11:27:16 raspberrypi systemd[1]: Failed to start LSB: Apache2 web server.
> Apr 22 11:27:16 raspberrypi systemd[1]: Unit apache2.service entered failed state.
journalctl -xn yields:
> -- Logs begin at Sun 2017-04-23 07:06:12 UTC, end at Sun 2017-04-23 12:38:48 UTC. --
> Apr 23 12:38:30 raspberrypi apache2[3292]: [2.0K blob data]
> Apr 23 12:38:30 raspberrypi apache2[3292]: [1.7K blob data]
> Apr 23 12:38:30 raspberrypi apache2[3292]: Action 'configtest' failed.
> Apr 23 12:38:30 raspberrypi apache2[3292]: The Apache error log may have more information.
> Apr 23 12:38:31 raspberrypi systemd[1]: apache2.service: control process exited, code=exited status=1
> Apr 23 12:38:31 raspberrypi systemd[1]: Failed to start LSB: Apache2 web server.
> -- Subject: Unit apache2.service has failed
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit apache2.service has failed.
> --
> -- The result is failed.
> Apr 23 12:38:31 raspberrypi systemd[1]: Unit apache2.service entered failed state.
> Apr 23 12:38:31 raspberrypi sudo[3278]: pam_unix(sudo:session): session closed for user root
> Apr 23 12:38:48 raspberrypi sudo[3345]: pi : TTY=pts/0 ; PWD=/etc/apache2/conf-enabled ; USER=root ; COMMAND=/bin/journalctl -xn
> Apr 23 12:38:48 raspberrypi sudo[3345]: pam_unix(sudo:session): session opened for user root by pi(uid=0)
Unfortunately, the Apache error.log contains no useful information whatsoever. But if I run apache2 configtest, I get a "File name too long" error.
Strangely, the error remains even after formatting my SD card, putting a fresh copy of the distribution image on it, and reinstalling apache2.
What can I do?

Google Cloud Datalab setup process was unable to deploy App Engine application

I followed the steps to get started with Google Cloud Datalab.
The setup failed at this error:
Oct 14 09:20:26 datalab-deploy-main-20151014-09-18-39 startupscript:
Oct 14 09:20:26 datalab-deploy-main-20151014-09-18-39 startupscript: env_variables:
Oct 14 09:20:26 datalab-deploy-main-20151014-09-18-39 startupscript: DATALAB_ANALYTICS_ID: UA-54894152-3
Oct 14 09:20:26 datalab-deploy-main-20151014-09-18-39 startupscript: -------End of app.yaml------
Oct 14 09:20:26 datalab-deploy-main-20151014-09-18-39 startupscript: Start deploying...
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: You are about to deploy the following modules:
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: - team-marvel-sandbox/datalab/main From: [/datalab/app.yaml]
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: ERROR: (gcloud.preview.app.deploy) Server responded with code [404]:
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: This application does not exist (app_id=u'team-marvel-sandbox').
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: ['_HTTPError__super_init', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__iter__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', 'args', 'close', 'code', 'errno', 'filename', 'fileno', 'fp', 'getcode', 'geturl', 'hdrs', 'headers', 'info', 'message', 'msg', 'next', 'read', 'readline', 'readlines', 'reason', 'strerror', 'url']
Oct 14 09:20:28 datalab-deploy-main-20151014-09-18-39 startupscript: Step deploy datalab module failed.
I tried deploying an empty skeleton application to App Engine and re-running the setup. It failed again with the same error.
Does anybody have any experience dealing with this?
This is a known issue and it happens to certain projects. While the fix is on the way, the only workaround so far is to try deploying it into another project.
When your project was created, which App Engine location (in advanced options) did you pick? Datalab depends on Managed VMs, which are supported only in the US. https://cloud.google.com/appengine/docs/managed-vms/.

glassfish + jk + large file/strange response

I use a Glassfish application server for a web application. Glassfish is connected to an Apache2 server via mod_jk.
Now there is some really strange behavior: some parts of the received files (html, css, js, ...) are missing and there are strange numbers in the files. If I access Glassfish directly, it works.
I use Glassfish 3.1.2, mod_jk 1.2.33 and Apache2. The web application uses JSF/PrimeFaces 3.2.
There are strange error messages in the mod_jk log. The messages appear only on the first access to the web page.
[Mon Mar 19 13:33:42 2012] [3763:2928831344] [error] ajp_connection_tcp_get_message::jk_ajp_common.c (1280): wrong message format 0x2020 from 127.0.0.1:9009
[Mon Mar 19 13:33:42 2012] [3763:2928831344] [error] ajp_get_reply::jk_ajp_common.c (2145): (ajp13_worker) Tomcat is down or network problems. Part of the response has already been sent to the client
[Mon Mar 19 13:33:42 2012] [3763:2928831344] [info] ajp_service::jk_ajp_common.c (2614): (ajp13_worker) sending request to tomcat failed (recoverable), because of protocol error (attempt=2)
[Mon Mar 19 13:33:42 2012] [3763:2928831344] [error] ajp_service::jk_ajp_common.c (2634): (ajp13_worker) connecting to tomcat failed.
[Mon Mar 19 13:33:42 2012] [3763:2928831344] [info] jk_handler::mod_jk.c (2788): Service error=-11 for worker=ajp13_worker
[Mon Mar 19 13:33:42 2012] ajp13_worker ores.pragma.biz 0.191397
[Mon Mar 19 13:33:42 2012] [3764:2903653232] [error] ajp_connection_tcp_get_message::jk_ajp_common.c (1280): wrong message format 0x6973 from 127.0.0.1:9009
[Mon Mar 19 13:33:42 2012] [3764:2903653232] [error] ajp_get_reply::jk_ajp_common.c (2145): (ajp13_worker) Tomcat is down or network problems. Part of the response has already been sent to the client
[Mon Mar 19 13:33:42 2012] [3764:2903653232] [info] ajp_service::jk_ajp_common.c (2614): (ajp13_worker) sending request to tomcat failed (recoverable), because of protocol error (attempt=2)
[Mon Mar 19 13:33:42 2012] [3764:2903653232] [error] ajp_service::jk_ajp_common.c (2634): (ajp13_worker) connecting to tomcat failed.
[Mon Mar 19 13:33:42 2012] [3764:2903653232] [info] jk_handler::mod_jk.c (2788): Service error=-11 for worker=ajp13_worker
I hope somebody can help me.
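For context on the `wrong message format` errors: in the AJP13 protocol, packets sent from the servlet container to the web server start with the two magic bytes `0x41 0x42` (ASCII "AB"). The values mod_jk reports instead, `0x2020` and `0x6973`, are printable ASCII, which suggests (as an interpretation, not a confirmed diagnosis) that mod_jk was reading response payload where it expected a packet header. A small Python sketch that decodes the reported values follows; the function name `describe_ajp_header` is hypothetical.

```python
AJP_CONTAINER_MAGIC = 0x4142  # "AB": prefix of container-to-server AJP13 packets


def describe_ajp_header(value: int) -> str:
    """Render a 16-bit header value from a mod_jk log line as hex and ASCII."""
    raw = value.to_bytes(2, "big")
    text = raw.decode("ascii", errors="replace")
    status = "valid magic" if value == AJP_CONTAINER_MAGIC else "not AJP magic"
    return f"0x{value:04x} = {text!r} ({status})"


# The first two values come from the mod_jk log; the third is the expected magic.
for reported in (0x2020, 0x6973, 0x4142):
    print(describe_ajp_header(reported))
```

The two logged values decode to two spaces and the letters "is", which looks like ordinary text content rather than protocol framing.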
Try installing GF 3.1.1 and see if that fixes the problem. I read about this bug last week, which might be your problem: http://java.net/jira/browse/GLASSFISH-18446 Looks like there is a patch available on that bug.
This is not exactly an answer, but if by any chance you're using apache and mod_jk only as a proxy to GF (as we did for years) - install nginx, and forget about apache. It's like day and night.
http://wiki.nginx.org/HttpProxyModule

Resources