Distribute Solr Using Replication without Using SolrCloud

I want to use Solr replication without using SolrCloud.
I have three Solr servers: one master and two slaves.
How can I dispatch each search query to a Solr server that isn't busy?
Which tools should I use, and how?

You can use any load balancer. Solr talks HTTP, so any existing load-balancing technology is available: HAProxy, Varnish, nginx, etc. will all work as you expect, and you'll be able to use all the advanced features the different packages offer. It is also independent of the client, so you're not limited to the LBHttpSolrServer class from SolrJ or whatever your particular client offers. Some load balancers also offer high-throughput caching (Varnish) or dynamic, real-time failover between live nodes.
Another option we've used successfully is to replicate the core to each web node, so each web node can always send its search queries to localhost.
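The question asks for the server that "isn't busy", which is what a least-connections policy does (HAProxy calls it "balance leastconn"). As a rough illustration only, here is a plain-Java sketch of that selection logic, not any load balancer's actual code; the class name LeastBusyPicker and the slave URLs used below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "least connections": each query goes to the server that
// currently has the fewest requests in flight.
final class LeastBusyPicker {
    private final List<String> servers;
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    LeastBusyPicker(List<String> servers) {
        this.servers = List.copyOf(servers);
        for (String s : this.servers) {
            inFlight.put(s, new AtomicInteger());
        }
    }

    // Choose the least busy server (ties go to the earlier one) and count the new request.
    synchronized String acquire() {
        String best = servers.get(0);
        for (String s : servers) {
            if (inFlight.get(s).get() < inFlight.get(best).get()) {
                best = s;
            }
        }
        inFlight.get(best).incrementAndGet();
        return best;
    }

    // Call once the response from that server has arrived (or the request failed).
    void release(String server) {
        inFlight.get(server).decrementAndGet();
    }

    // Number of requests currently outstanding on the given server.
    int pending(String server) {
        return inFlight.get(server).get();
    }
}
```

With "http://slave1:8983/solr" and "http://slave2:8983/solr", two acquire() calls return slave1 and then slave2; after release("http://slave1:8983/solr"), the next acquire() returns slave1 again. A real load balancer adds health checks and timeouts on top of this choice.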

You have configured Solr in master-slave mode. I think you can use LBHttpSolrServer from the SolrJ API to query Solr; it load-balances across all the nodes you specify. Update requests, however, must be sent to the master node explicitly. In master-slave mode, the slaves are responsible for keeping themselves in sync with the master.
Do NOT use this class for indexing in master/slave scenarios since documents must be sent to the correct master; no inter-node routing is done. In SolrCloud (leader/replica) scenarios, this class may be used for updates since updates will be forwarded to the appropriate leader.
I hope this will help.
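A minimal sketch of the routing rule described above, assuming one master and two slaves and that queries go to the slaves only. MasterSlaveRouter and the URLs are hypothetical; in practice LBHttpSolrServer (LBHttpSolrClient in newer SolrJ versions) handles the query-side rotation and also takes unresponsive servers out of rotation, which this sketch does not.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: updates always go to the master, queries rotate over the slaves.
final class MasterSlaveRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    MasterSlaveRouter(String master, List<String> slaves) {
        if (slaves.isEmpty()) {
            throw new IllegalArgumentException("need at least one slave");
        }
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    // Indexing must target the master; the slaves pull changes from it via replication.
    String urlForUpdate() {
        return master;
    }

    // Round-robin over the slaves; floorMod keeps the index valid after int overflow.
    String urlForQuery() {
        return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
    }
}
```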

Apache Camel can be used as a general-purpose load balancer, like this:
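import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class LoadBalancer {
    public static void main(String[] args) throws Exception {
        CamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Accept HTTP requests on port 8080 (any path) and hand them to the
                // two backends in turn. bridgeEndpoint=true makes the http producer
                // ignore the incoming request's URI header and call the backend.
                from("jetty://http://localhost:8080?matchOnUriPrefix=true")
                    .loadBalance().roundRobin()
                        .to("http://172.28.39.138:8080?bridgeEndpoint=true",
                            "http://172.168.20.118:8080?bridgeEndpoint=true");
            }
        });
        context.start();
        // Keep the proxy running for 100 seconds, then shut it down.
        Thread.sleep(100000);
        context.stop();
    }
}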
Some other material that may be useful:
Basic Apache Camel LoadBalancer Failover Example
http://camel.apache.org/load-balancer.html
However, there seems to be no straightforward Solr-Camel integration, since the examples balance requests across Java bean components:
http://camel.apache.org/loadbalancing-mina-example.html
There is another useful example:
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/CustomLoadBalanceTest.java
You can also use Camel as a proxy between client and server:
http://camel.apache.org/how-to-use-camel-as-a-http-proxy-between-a-client-and-server.html
There is also a presentation introducing Apache Camel, its approach and its architecture:
http://www.slideshare.net/ieugen222/eip-cu-apache-camel
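Besides roundRobin(), Camel's load balancer offers a failover() policy, as in the failover example linked above, which sends the message to the next endpoint when the current one throws. The sketch below shows only that idea in plain Java, without Camel; FailoverCaller is a hypothetical name and the request function stands in for an HTTP call to a backend.

```java
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of failover: try each endpoint in order and return the first
// successful result; only if every endpoint fails is the last error rethrown.
final class FailoverCaller {
    static <T> T call(List<String> endpoints, Function<String, T> request) {
        RuntimeException last = null;
        for (String endpoint : endpoints) {
            try {
                return request.apply(endpoint);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next endpoint
            }
        }
        throw last != null ? last : new IllegalArgumentException("no endpoints given");
    }
}
```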

Related

Can we use Solr as a persistent store for Apache Ignite?

I have been working on integrating Solr and Apache Ignite. When I try to run the program, this error is shown:
class org.apache.ignite.IgniteCheckedException: Cannot enable write-behind (writer or store is not provided) for cache
CacheConfiguration textMetaConfig = new CacheConfiguration<>("textMetaCache");
textMetaConfig.setWriteThrough(true);
textMetaConfig.setReadThrough(true);
textMetaConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
textMetaConfig.setWriteBehindEnabled(true);
textMetaConfig.setWriteBehindFlushSize(40960);
textMetaConfig.setWriteBehindFlushFrequency(1);
textMetaConfig.setWriteBehindFlushThreadCount(5);
textMetaConfig.setCacheMode(CacheMode.PARTITIONED);
textMetaConfig.setIndexedTypes(String.class, TextMeta.class);
This is how I have configured the cache.
You can implement the CacheStore interface to integrate with any kind of persistent storage. Out of the box, Ignite provides a Cassandra store implementation and a JDBC store implementation, which covers most regular relational databases. For anything else you will have to create your own implementation. In any case, the store must be configured via the CacheConfiguration.setCacheStoreFactory(..) property; your configuration enables write-through and write-behind without providing a store, which is exactly what the error complains about. Please refer to this page for details: https://apacheignite.readme.io/docs/persistent-store
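In Ignite the store would be a class extending CacheStoreAdapter, registered with something like textMetaConfig.setCacheStoreFactory(FactoryBuilder.factoryOf(SolrTextMetaStore.class)), where SolrTextMetaStore is a hypothetical Solr-backed store you would write. The plain-Java sketch below is not Ignite's API: KeyValueStore and WriteBehindBuffer are hypothetical types that only illustrate why write-behind needs a store. Writes are buffered and flushed to the store in batches (flushSize plays roughly the role of setWriteBehindFlushSize), and without a store there is nowhere to flush, so the buffer refuses to start.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a cache store: where buffered writes end up (e.g. Solr).
interface KeyValueStore<K, V> {
    void writeAll(Map<K, V> batch);
}

// Hypothetical write-behind buffer: writes are kept in memory and pushed to the
// store in batches once flushSize distinct keys are pending.
final class WriteBehindBuffer<K, V> {
    private final KeyValueStore<K, V> store;
    private final int flushSize;
    private final Map<K, V> pending = new LinkedHashMap<>();

    WriteBehindBuffer(KeyValueStore<K, V> store, int flushSize) {
        if (store == null) {
            // Same situation as the Ignite error: write-behind with nothing to write to.
            throw new IllegalStateException("Cannot enable write-behind: no store provided");
        }
        this.store = store;
        this.flushSize = flushSize;
    }

    synchronized void put(K key, V value) {
        pending.put(key, value); // a later put for the same key replaces the earlier value
        if (pending.size() >= flushSize) {
            flush();
        }
    }

    synchronized void flush() {
        if (!pending.isEmpty()) {
            store.writeAll(new LinkedHashMap<>(pending)); // hand over a copy of the batch
            pending.clear();
        }
    }
}
```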

Apache Httpclient and the Cloud

I want to move a scraping service that uses Apache HttpClient to the cloud. I read that problems are possible with Google App Engine, as direct network access and thread creation are prohibited there. What about other cloud hosting providers? Does anyone have experience with Apache HttpClient in the cloud?
App Engine does have threads and direct network access (HTTP only). There is a workaround to make it work with HttpClient.
Also, if you plan to run many parse tasks in parallel, you might check out Task Queue or even MapReduce.
By the way, there is a "misfeature" in GAE: you cannot fully set a custom User-Agent header on your requests, because GAE always appends "AppEngine" to it (this breaks requests to certain sites, most notably iTunes).
It's certainly possible to create threads and access other websites from Cloud Foundry; you're just time-limited for each process. For example, http://rack-scrape.cloudfoundry.com/ is a simple Rack application that inspects the 'a' tags from Google.com:
require 'rubygems'
require 'open-uri'
require 'hpricot'
run Proc.new { |env|
  doc = Hpricot(open("http://www.google.com"))
  anchors = (doc/"a")
  [200, {"Content-Type" => "text/html"}, [anchors.inspect]]
}
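As for Apache HttpClient, I have no experience with it, but I understand the old Commons HttpClient 3.x line is no longer maintained (it has been superseded by HttpClient 4.x from Apache HttpComponents).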
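For comparison, here is a rough Java counterpart of the Rack example, using the JDK's own java.net.http client (Java 11+) rather than Apache HttpClient. AnchorScraper and the User-Agent value are hypothetical, and the regex is a naive stand-in for a real HTML parser such as Hpricot.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scraper: fetch a page and list its <a> elements.
final class AnchorScraper {
    // Naive pattern for <a ...>...</a>; fine for a demo, not a real HTML parser.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>.*?</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> anchors(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    static List<String> fetchAnchors(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "example-scraper/0.1") // hypothetical UA string
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return anchors(response.body());
    }
}
```

Calling AnchorScraper.fetchAnchors("http://www.google.com") mirrors the Ruby app; on a platform such as GAE the extra suffix on the User-Agent header mentioned above would still apply.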

A scalable bus with multiple Camel instances

My idea is to use Camel to decouple modules. To support scalability and failover, is the following architecture advisable?
I have two applications with embedded Camel, AppCamel1 and AppCamel2. Then I have standalone Camel nodes, Camel1 and Camel2.
AppCamel1 would have a route with failover/load balancing to Camel1 and Camel2. This way, if Camel1 crashes, for example, Camel2 is used for failover.
Camel1 and Camel2 would make a REST call with the http component, for example. There would also be a request-reply from AppCamel1 to Camel1 or Camel2.
Is it a valid scenario?
What should I use to interconnect the different Camel instances (AppCamel1 to Camel1 or 2)? (I would like to know if it's possible to avoid another component like a jms server in the middle)
Thank you!
Edit, following Boday's answer:
The REST calls are made from Camel1/2. I'd like to interconnect AppCamel1/2 with Camel1/2 and see whether I can avoid anything in between. I guess MINA is a possibility, or even HTTP, but in that case AppCamel1 and AppCamel2 need to know about Camel1/2, which is not so good.
This is also being discussed on the Camel mailing list, where there are also some pointers and suggestions:
http://camel.465427.n5.nabble.com/scalable-bus-with-multiple-Camel-instances-tp5606593p5606593.html
If you are trying to load balance HTTP requests to your AppCamel1/2, then you'd need a proxy server in between (apache mod_proxy, perlbal, etc). To load balance from AppCamel1/2 to Camel1/2, you can use Camel's load balancer or even JMS request/reply...
From AppCamel1/2 to Camel1/2, it sounds like you are using REST as the interface. If you need more complex communication between the instances, then I'd use JMS (via camel-activemq) for messaging and Hazelcast (via camel-hazelcast) for distributed caching/locking, etc.
If you use JMS to communicate, then you do not need a special load balancer. Just use one queue and let both Camel1/2 listen to it. They will then automatically fail over and load balance.
I would definitely go for JMS middleware. ActiveMQ is the natural choice (Camel is even considered a subproject of ActiveMQ). It is trivial to embed ActiveMQ along with your Camel instances and cluster them. ActiveMQ will then be able to handle both load balancing and failover for you.
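The "one queue, several listeners" pattern (competing consumers) can be illustrated in-process. In this sketch a JDK BlockingQueue stands in for the JMS queue and two threads stand in for Camel1 and Camel2; CompetingConsumers and the simulated crash are hypothetical. With Camel and ActiveMQ, both nodes would instead consume from the same queue endpoint (for example a hypothetical activemq:queue:work).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical simulation of two consumers sharing one queue. "Camel1" stops after
// handling camel1CrashesAfter messages to show that the other consumer takes over.
final class CompetingConsumers {
    // Returns how many messages each consumer handled.
    static Map<String, Integer> run(int messages, int camel1CrashesAfter) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }
        Map<String, Integer> handled = new ConcurrentHashMap<>();
        List<Thread> consumers = new ArrayList<>();
        for (String name : List.of("Camel1", "Camel2")) {
            Thread t = new Thread(() -> {
                int count = 0;
                while (true) {
                    if (name.equals("Camel1") && count >= camel1CrashesAfter) {
                        return; // simulated crash of Camel1
                    }
                    Integer msg = queue.poll(); // each message goes to exactly one consumer
                    if (msg == null) {
                        return; // queue drained
                    }
                    count++;
                    handled.merge(name, 1, Integer::sum);
                }
            });
            consumers.add(t);
            t.start();
        }
        for (Thread t : consumers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return handled;
    }
}
```

Because no message is addressed to a particular consumer, whichever node is free takes the next one (load balancing), and once Camel1 stops, Camel2 simply drains the rest (failover).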

Spring + Hibernate with transactions in multithreading environment

I have a simple application that fetches some data from ONE table in the database (MySQL 5.1) via Hibernate and displays the content. The main framework used is Spring 3.0. The query runs correctly in @Transactional(readOnly = true) (plus the second-level cache).
The problems appear when running some concurrent tests with 20/30 requests against the same page: some page requests return 500 instead of 200. I suppose this is because @Transactional doesn't handle multi-threaded access (please correct me if I am wrong).
In the controller I have something like this:
List<String> names = usersService.getUserNames(); // this executes the query in a @Transactional context
doSomething(names);
The logs say that doSomething throws a NullPointerException because there is no data in the list passed to it.
Is there a way to implement a multi-thread access manager with Spring+Hibernate that manages concurrent requests to db?
@Transactional works perfectly well in multithreaded applications. In fact, all web applications are multithreaded, and each Spring singleton bean instance handles all requests. So the issue is not there.
When you get a 500 error, check the log files. If nothing's there, make sure you haven't swallowed an exception.
You need to make sure that a separate database connection is allocated to every incoming request. Connections should be managed in a pool. The size of the database connection pool (indirectly) determines how many requests your application can serve simultaneously.
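To make the last point concrete, here is a toy pool showing how the pool size bounds how many requests use the database at once. TinyPool is hypothetical, not a Spring or Hibernate class, and the integer ids merely stand in for connections; a real application would configure a pooling DataSource instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy pool: each request borrows a "connection", works, and returns it in finally,
// so at most `size` requests touch the database at the same time.
final class TinyPool {
    private final BlockingQueue<Integer> idle;
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    TinyPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(i); // connection ids stand in for real connections
        }
    }

    void withConnection(Runnable work) throws InterruptedException {
        Integer conn = idle.take(); // blocks while every connection is busy
        try {
            peak.accumulateAndGet(inUse.incrementAndGet(), Math::max);
            work.run();
        } finally {
            inUse.decrementAndGet();
            idle.add(conn); // always give the connection back, even if work throws
        }
    }

    // Simulate `requests` concurrent page hits; returns the highest observed concurrency.
    static int simulate(int poolSize, int requests) {
        TinyPool pool = new TinyPool(poolSize);
        ExecutorService web = Executors.newFixedThreadPool(requests);
        for (int i = 0; i < requests; i++) {
            web.submit(() -> {
                pool.withConnection(() -> sleepQuietly(5));
                return null;
            });
        }
        web.shutdown();
        try {
            web.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.peak.get();
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With 30 simultaneous requests and a pool of 5, the peak concurrency never exceeds 5; the other requests wait for a connection instead of failing.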

Service Registry for Apache Camel Applications

A registry is a list of items with pointers for where to find the items, like the index on a database table or the card catalog for a library.
Correct me if I am wrong, but from this definition, what I'd expect from a Camel application registry is a place where a client application can (depending on the client protocol) do a lookup, select a particular service based on metadata, and use it as defined.
I am wondering if Apache Camel has anything close to this. Most of the service registry articles/implementations I have seen seem to address only SOAP.
Regards.
You can use the REST API from camel-web to look up routes and endpoints, which are the "services" in Camel.
http://camel.apache.org/web-console.html
In terms of an SOA service registry, you may look at other products that specialize in that, such as Apache ZooKeeper:
http://hadoop.apache.org/zookeeper/
You can use ManagementStrategy SPI to hook into events in Camel and track services as they are created/started/stopped etc. Then you can bridge that to your SOA service registry product of choice.
You can also use the CamelContext getEndpoints() and getEndpointMap() APIs to browse the endpoints.
See this post for some general monitoring information:
http://benoday.blogspot.com/2011/01/apache-camel-monitoring.html
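A minimal sketch of the registry side, assuming something (for example an event hook, or a periodic scan of getEndpoints()) calls register and unregister as endpoints start and stop. ServiceRegistry is hypothetical and in-memory; a production setup would keep this data in a dedicated registry product such as ZooKeeper.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

// Hypothetical in-memory registry: each entry records where a service lives (an endpoint
// URI) plus free-form metadata that clients filter on when they look it up.
final class ServiceRegistry {
    static final class Entry {
        final String name;
        final String uri;
        final Map<String, String> metadata;

        Entry(String name, String uri, Map<String, String> metadata) {
            this.name = name;
            this.uri = uri;
            this.metadata = Map.copyOf(metadata);
        }
    }

    private final List<Entry> entries = new CopyOnWriteArrayList<>();

    // Called when an endpoint starts.
    void register(String name, String uri, Map<String, String> metadata) {
        entries.add(new Entry(name, uri, metadata));
    }

    // Called when it stops, so clients are no longer pointed at it.
    void unregister(String uri) {
        entries.removeIf(e -> e.uri.equals(uri));
    }

    // Return the URIs of all entries whose metadata contains every requested key/value pair.
    List<String> lookup(Map<String, String> wanted) {
        return entries.stream()
                .filter(e -> e.metadata.entrySet().containsAll(wanted.entrySet()))
                .map(e -> e.uri)
                .collect(Collectors.toList());
    }
}
```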
