I was reading this PDF: http://2010.lucene-eurocon.org/slides/Solr-In-The-Cloud_Mark-Miller.pdf, and there is a section that talks about CloudSolrServer. In particular, this statement is made:
It keeps a list of both live and dead servers. When a request to a server fails, that server is added to the ‘dead’ list, and another ‘live’ server is queried instead.
The ‘dead’ server list is occasionally pinged, and if a server comes back, it is moved back into the ‘live’ list.
This works fine when a Solr instance or the machine crashes, but for normal maintenance it would be undesirable because requests in progress would be lost. Typically with a normal load balancer, there's a way to shut off traffic to a box, and then normal shutdown can proceed at some interval after that.
Since it appears that CloudSolrServer is intended to replace a traditional load balancer in front of a Solr cluster, I was wondering about graceful shutdown. What is the recommended way to shut down a Solr instance without losing requests (while using CloudSolrServer)?
If you want to gracefully shut down an instance, you will need to first remove the corresponding node from ZooKeeper and then shut down the instance. You can remove the node from ZooKeeper by using the DELETEREPLICA command:
/admin/collections?action=DELETEREPLICA&collection=collection&shard=shard&replica=replica
See the Solr Collections API documentation for more details.
Once the ephemeral node is removed from ZooKeeper, CloudSolrServer will stop sending requests to it.
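As a rough illustration, the call is a single HTTP GET that can be issued from any HTTP client before stopping the instance; here is a minimal Node.js sketch, with the host, collection, shard, and replica names as placeholders:

// Minimal sketch: issue DELETEREPLICA before stopping the node (all names are placeholders)
const http = require('http');

const url = 'http://solr-host:8983/solr/admin/collections'
  + '?action=DELETEREPLICA'
  + '&collection=mycollection'
  + '&shard=shard1'
  + '&replica=core_node3';

http.get(url, function (res) {
  console.log('DELETEREPLICA returned HTTP ' + res.statusCode);
  res.resume(); // drain the response so the socket is released
});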
Related
I have a Node.js app. I used to have a standalone Solr instance, but our company decided to move to SolrCloud to provide failover.
With standalone Solr there was only one server, and all my requests looked like http://solr_server:8983/solr/mycore/select?indent=on&q=*:*&wt=json, so every request went to the same server.
Now I have 3 different instances, each running 1 ZooKeeper and 1 Solr node, and my requests look like this: http://solr_server_1:8983/solr/mycollection/select?q=*:*
And now the question: what if solr_server_1 goes down? How can I still get my results? How should I handle requests in this case?
If you're doing this manually: you'll have to catch the error when the connection fails, and then retry the next server in your list.
let servers = ['ip1:8983', 'ip2:8983', 'ip3:8983']
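A minimal sketch of that manual failover, repeating the hypothetical server list above and assuming plain HTTP GETs with Node's built-in http module:

const http = require('http');

const servers = ['ip1:8983', 'ip2:8983', 'ip3:8983'];

// Try each server in turn; give up only when every server has failed.
function queryWithFailover(path, attempt, callback) {
  if (attempt >= servers.length) {
    return callback(new Error('all Solr servers are down'));
  }
  http.get('http://' + servers[attempt] + path, function (res) {
    let body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  }).on('error', function () {
    // Connection failed: fall through to the next server in the list.
    queryWithFailover(path, attempt + 1, callback);
  });
}

queryWithFailover('/solr/mycollection/select?q=*:*&wt=json', 0, function (err, body) {
  if (err) throw err;
  console.log(body);
});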
If you're using a library that supports ZooKeeper (i.e., it connects to ZooKeeper to find out which nodes are live), you give the client a list of ZooKeeper nodes and let it figure out the rest. node-solr-smart-client is one such client.
// assumes solrSmartClient has been required from the node-solr-smart-client package
const options = {
  zkConnectionString: 'ip1:2181,ip2:2181,ip3:2181',
  // etc.
};

solrSmartClient.createClient('my_solr_collection', options, function (err, solrClient) {
  if (err) throw err;
  // solrClient is now connected to a live node resolved through ZooKeeper
});
I want to send a particular HTTP request (or otherwise communicate a message) to every (dynamic/autoscaled) instance which is currently running for a particular App Engine application.
My goal is to trigger each instance to discard some locally cached data (because I have just modified the underlying data and want them to reload it).
One possible solution is to store a value in Memcache, and have instances check this each time they handle a request to see if they should flush their cache. But this adds latency to every request.
Another possible solution would be to somehow stop all running instances. No fixed overhead, but some impact while instances are restarted.
An even less desirable solution would be to redeploy the application code in order to cause all instances to be stopped. This now adds additional delay on my end as a deployment takes some time.
You could use the management API to list instances for a given version, but I'd suggest using something like the Pub/Sub API to create a subscription on each of your App Engine instances. Since each instance has its own subscription, any message published to the shared topic will be received by all instances.
You can create the subscription at startup (the /_ah/start endpoint may be useful), and then delete it at shutdown (using the /_ah/stop endpoint).
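A rough sketch of that pattern, assuming a Node.js runtime, the @google-cloud/pubsub client, and the GAE_INSTANCE environment variable to build a per-instance subscription name; the topic name and the cache-flush hook are placeholders:

const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const topic = pubsub.topic('cache-invalidation'); // placeholder topic name; the topic must already exist
const subscriptionName = 'cache-invalidation-' + process.env.GAE_INSTANCE;

function flushLocalCache() {
  // placeholder: clear this instance's in-memory cache here
}

// Call from your /_ah/start handler: one subscription per running instance.
async function startListening() {
  const [subscription] = await topic.createSubscription(subscriptionName);
  subscription.on('message', function (message) {
    flushLocalCache();
    message.ack();
  });
}

// Call from your /_ah/stop handler so stale subscriptions don't pile up.
async function stopListening() {
  await pubsub.subscription(subscriptionName).delete();
}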
We have an Apache web server (version httpd-2.2.22-win32-x86-openssl-0.9.8t) in front of a WebLogic (version 10.3.2) cluster with 3 nodes. In our load testing, we get session timeout errors in some cases (less than 1%). This was happening even though we have -1 for the session timeout in the web.xml files of the WebLogic nodes. After days of debugging, we realized that in some cases the JSESSIONID sent by the request is not honored by the response. Fiddler traces show that the response has a Set-Cookie: JSESSIONID header whose value is different from the JSESSIONID sent in the request. We get the session expiry page immediately. As already mentioned, this happens only in some rare cases.
When using WebLogicCluster, requests have session affinity, so each request goes to the node where the initial contact was made. The issue turned out to be that under high load the nodes were not responding in time, so requests were sent to the other nodes; this is the default behavior with WebLogicCluster. Since we do not have session replication and failover enabled, any request that goes to a secondary node gives us a session timeout error.
One solution was to enable session replication and failover in WebLogic, but we did not want that because the impact was high.
These are the configuration changes that fixed the issue:
In httpd.conf
ConnectTimeoutSecs 50 (default is 10)
ConnectRetrySecs 5 (default is 2)
WLSocketTimeoutSecs 10 (default is 2)
WLIOTimeoutSecs 18000 (default is 300)
Idempotent OFF (default is ON)
The first two changes (ConnectTimeoutSecs and ConnectRetrySecs) mean that the plugin retries 10 times (50/5) instead of the default 5 (10/2).
In the WebLogic nodes:
Domain --> Environment --> Servers --> click on the required server --> Tuning --> Accept Backlog: the default value is 300; we raised it to 375.
Then restart the WebLogic nodes and Apache.
For more details, refer to
http://docs.oracle.com/cd/E13222_01/wls/docs81/plugins/plugin_params.html
http://docs.oracle.com/cd/E13222_01/wls/docs81/plugins/apache.html (see the diagram there).
I'm trying to tweak our system status check to see the state of the Solr nodes in our SolrCloud. I'm facing the following problems:
We send a query to each of the Solr nodes separately. If we get a response and the status of the response is 0, we assume the node is running. Unfortunately, we've seen cases in which the node is recovering or even down and select queries are still handled.
Hoping to prevent this, we've added a check which sends a ping request to Solr. If the status returned by this request reads 'OK', we assume the node is up. Unfortunately, even with this request, if the node is recovering or down, the check won't fail.
My question is: What is the correct way to check the status of a node in SolrCloud?
If you are using SolrCloud, it's recommended to maintain an external ZooKeeper ensemble as well, because the ZooKeeper ensemble maintains the current status of each node and each shard in the SolrCloud. This status is reflected in the SolrCloud admin window.
Go to the Admin window. Click on "Cloud".
Then click on "Tree" to get a tree view of your SolrCloud architecture.
Click /clusterstate.json to view the SolrCloud status.
This JSON file (clusterstate.json) holds the SolrCloud status information. If you are running an external ZooKeeper ensemble, the following steps retrieve the SolrCloud status directly from ZooKeeper:
Go to the path "zookeeper/installation/directory/bin"
Execute ./zkCli.sh -server ZK_IP:ZK_PORT (e.g., ./zkCli.sh -server localhost:2181)
Execute get /clusterstate.json
You'll find the SolrCloud status.
Note: ZK_IP is the host IP where ZooKeeper is running; ZK_PORT is ZooKeeper's client port.
You actually don't want /clusterstate.json, as it only covers the case where collections are already present. From ZooKeeper you need /live_nodes.
Because ZooKeeper is the authority for which Solr nodes are members of the SolrCloud cluster, it follows that you should go to it first to discover which members are accessible. This is how all SolrCloud clients work, and it is probably the best way to approach the problem.
/live_nodes contains an entry (an ephemeral znode) for each live Solr node, regardless of what collections exist or where the replicas are located.
Once you have resolved /live_nodes... you can call clusterstatus on any Solr instance, using the address and port from one of the live nodes:
http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json
clusterstatus provides a detailed overview of Solr nodes, collections, replicas, etc. Everything you would want to know.
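As a rough sketch of such a health check (hypothetical host and node names; assumes Node's built-in http module and the clusterstatus JSON shape, which exposes cluster.live_nodes):

const http = require('http');

// live_nodes entries typically look like 'host:8983_solr'; this node name is a placeholder.
const nodeToCheck = 'solr_server_1:8983_solr';

http.get('http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json', function (res) {
  let body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    const liveNodes = JSON.parse(body).cluster.live_nodes;
    console.log(liveNodes.includes(nodeToCheck)
      ? nodeToCheck + ' is live'
      : nodeToCheck + ' is NOT live');
  });
});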
As a final note, it's very wise to set SOLR_HOST in the solr.in.sh configuration (/etc/default/solr.in.sh); by default, 'localhost' is used to reference the Solr node. Setting this value to the public address you want the Solr node identified by will prevent ZooKeeper from returning the address "localhost" to clients when they attempt to reach a Solr node.
I have configured a sticky-session setup with an Apache load balancer and three app nodes running JBoss 4.2.2.
The load balancer uses mod_jk with the settings mentioned in the tutorial here:
http://community.jboss.org/wiki/UsingModjk12WithJBoss;jsessionid=1569CBFB7C3096C59C977CD3F7159A32
I have the jvmRoute set as node1, node2 and node3 for the three nodes, and my worker.list property for the load balancer is set as
node1,node2,node3
The tutorial has been followed up to the last point, but I did not configure the UseJK parameter; the value is still set to false.
The sticky sessions are holding up, but I seem to lose sessions and get this error in my mod_jk log file:
[error] ajp_get_reply::jk_ajp_common.c (1926): (node1) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems (errno=110)
I personally verified that a user logged in on node1 and was then moved to node2.
Does Apache redirect to another node when it fails to get a reply from node1? How does UseJK help in this situation?
--- edit 01 ---
I changed the UseJK value to true, but a few users still experience sudden logouts, which I know is due to a change in the server node handling the user's request.
I also wanted to know whether traffic on the nodes has any effect on sticky sessions and how to counter it. (I have been experiencing high load on all the servers for a few days.)
--- edit 02 ---
I would also like to know about:
- controlling the number of connections per worker,
- controlling the number of AJP connectors/connections,
- the relation between the number of connections on the Apache load balancer and the number of AJP connections on the JBoss worker nodes, and
- what the best configuration would be between Apache 2.2.3 and JBoss 4.2.2 worker nodes with Tomcat 5.5 connectors.
--- edit 03 ---
http://community.jboss.org/wiki/OptimalModjk12Configuration
Using the above article, I just wanted to know the best values for the Apache settings MaxClients and ThreadsPerChild.
I found the following note in this article interesting. I haven't tried it, but perhaps it could be useful for someone experiencing the same problem:
If you are using mod_jk and have turned sticky sessions on, but your sessions are failing to stick, you have probably failed to set the domain, or you have failed to set the jvmRoute, or you are using a non-standard cookie name to implement the stickyness!
I think in your worker.properties file, worker.list should contain the load-balancer worker, not node1, node2 & node3. It should be like this:
worker.list=loadmanager
# the balancer worker also needs its type set to lb
worker.loadmanager.type=lb
worker.loadmanager.balance_workers=node1,node2,node3
I hope you have these set correctly.
Also, you have to set the UseJK attribute to true for load balancing with sticky sessions combined with JvmRoute. If set to true, it will insert a JvmRouteFilter to intercept every request and replace the JvmRoute if it detects a failover:
<attribute name="UseJK">true</attribute>
in deploy/jboss-web.deployer/META-INF/jboss-service.xml