Make CloudSolrServer run dataimports only on the leader - solr

I set up a two-server Solr cluster with SolrCloud. Currently I have a leader and a replica.
I want dataimports to go to the leader, since it doesn't make any sense to run delta-imports on a replica (the updates wouldn't be distributed to the leader).
From the documentation I gather that CloudSolrServer knows the cluster state (obtained from ZooKeeper) and by default sends all updates only to the leader.
What I want is to make CloudSolrServer send all dataimport commands to the leader. I have the following code:
SolrServer solrServer = new CloudSolrServer("localhost:2181");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "delta-import");
QueryResponse response = solrServer.query(params);
But I see that the requests still go to both of my servers,
localhost:8080 and localhost:8983. Is there any way to fix this?

Just replace your Solr server initialization with the following:
SolrServer solrServer = new CloudSolrServer("zkHost1:port,zkHost2:port");
This will cause the SolrJ client to consult ZooKeeper for the SolrCloud state.
For more details, read the CloudSolrServer documentation on initializing from a ZooKeeper ensemble.

try {
    CloudSolrServer css = new CloudSolrServer("host1:2181,host2:2181");
    css.connect();
    ZkStateReader zkSR2 = css.getZkStateReader();
    String leader = zkSR2.getLeaderUrl("collection_name", "shard1", 10);
} catch (KeeperException e) {
} catch (IOException e) {
} catch (InterruptedException e) {
}
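Once you have the leader URL, one option (a sketch, assuming the URL returned by getLeaderUrl points at the core that hosts your collection) is to send the dataimport command straight to that node with a plain HttpSolrServer:

// Sketch: target the leader directly for the delta-import.
// "leader" is the URL obtained from zkSR2.getLeaderUrl(...) above.
SolrServer leaderServer = new HttpSolrServer(leader);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "delta-import");
QueryResponse response = leaderServer.query(params);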

Related

How to configure consumer-level transactional redelivery with Camel and IBM MQ

I am trying to build a transactional JMS client in Java Spring Boot using Apache Camel, which connects to IBM MQ. Furthermore, the client needs to apply exponential back-off redelivery behavior when processing of messages fails. Reason: messages from MQ need to be processed and forwarded to external systems that may be down for maintenance for many hours. Using transactions to guarantee at-least-once processing seems the appropriate solution to me.
I have researched this topic for many hours and have not been able to find a solution. I will start with what I currently have:
@Bean
UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter()
        throws IOException {
    MQConnectionFactory factory = new MQConnectionFactory();
    factory.setCCDTURL(tabFilePath);
    UserCredentialsConnectionFactoryAdapter adapter =
            new UserCredentialsConnectionFactoryAdapter();
    adapter.setTargetConnectionFactory(factory);
    adapter.setUsername(userName);
    adapter.setPassword(password);
    return adapter;
}
@Bean
PlatformTransactionManager jmsTransactionManager(@Autowired UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter) {
    JmsTransactionManager txMgr = new JmsTransactionManager(uccConnectionFactoryAdapter);
    return txMgr;
}
@Bean
CamelContextConfiguration contextConfiguration(@Autowired UserCredentialsConnectionFactoryAdapter uccConnectionFactoryAdapter,
        @Qualifier("jmsTransactionManager") @Autowired PlatformTransactionManager txMgr) {
    return new CamelContextConfiguration() {
        @Override
        public void beforeApplicationStart(CamelContext context) {
            JmsComponent jmsComponent = JmsComponent.jmsComponentTransacted(uccConnectionFactoryAdapter, txMgr);
            // required for consumer-level redelivery after rollback
            jmsComponent.setCacheLevelName("CACHE_CONSUMER");
            jmsComponent.setTransacted(true);
            jmsComponent.getConfiguration().setConcurrentConsumers(1);
            context.addComponent("jms", jmsComponent);
        }
        @Override
        public void afterApplicationStart(CamelContext camelContext) {
            // Do nothing
        }
    };
}
// in a route builder
...
from("jms:topic:INPUT_TOPIC?clientId=" + CLIENT_ID + "&subscriptionDurable=true&durableSubscriptionName="+ SUBSCRIPTION_NAME)
.transacted()
.("direct:processMessage");
...
I was able to verify the transactional behavior through integration tests. If an unhandled exception occurs during message processing, the transaction gets rolled back and retried. The problem is that it gets retried immediately, several times per second, possibly causing significant load on the IBM MQ manager and the external system.
For ActiveMQ, redelivery policies are easy to configure, with plenty of examples on the net. The ActiveMQConnectionFactory has a setRedeliveryPolicy method, meaning the ActiveMQ client library has redelivery logic built in. This is, from all I can tell, in line with the documentation of Camel's Transactional Client EIP, which states:
The redelivery in transacted mode is not handled by Camel but by the backing system (the transaction manager). In such cases you should resort to the backing system how to configure the redelivery.
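For illustration, a rough sketch of that ActiveMQ-side configuration (hypothetical broker URL and values, shown only for contrast) looks like this:

// ActiveMQ's client-side redelivery policy; IBM's MQConnectionFactory
// offers no equivalent, which is exactly the problem described below.
ActiveMQConnectionFactory amqFactory = new ActiveMQConnectionFactory("tcp://localhost:61616");
RedeliveryPolicy policy = amqFactory.getRedeliveryPolicy();
policy.setInitialRedeliveryDelay(1000);   // first retry after 1s
policy.setUseExponentialBackOff(true);
policy.setBackOffMultiplier(2.0);         // 1s, 2s, 4s, ...
policy.setMaximumRedeliveryDelay(60000);  // cap each delay at 60s
policy.setMaximumRedeliveries(-1);        // retry indefinitely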
What I absolutely can't figure out is how to achieve the same thing for IBM MQ. IBM's MQConnectionFactory does not have any support for redelivery policies. In fact, searching for redeliverypolicy in the MQ Knowledge Center brings up exactly... drumroll... 0 hits. I even looked a bit through the implementation of the MQConnectionFactory and didn't discover anything either.
Another backing system I looked into was the JmsTransactionManager. Searches for "jmstransactionmanager redelivery policy" or "jmstransactionmanager exponential backoff" did not turn up anything useful either. There was some talk about TransactionTemplate and AbstractMessageListenerContainer but 1) I didn't see any connection to redelivery policies, and 2) I could not figure out how those interact with Camel and JMS.
Sooo, does anybody have any idea how to implement exponential backoff redelivery policies with Apache Camel and IBM MQ?
Closing note: the redelivery policies Camel supports on errorHandler and onException are not the same as redelivery policies in the transaction/connection backing system. Those handlers retry at the point of failure, using the Exchange object in whatever state it is in, without rolling back and reprocessing the message from the start of the route. The transaction remains active during the entire retry period, and a rollback only occurs when the errorHandler or onException gives up. This is not what I want for retries that may go on for many hours.
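To make that contrast concrete, here is a rough sketch of the in-route redelivery I am ruling out (hypothetical values, placed inside a RouteBuilder.configure()):

// Camel's own redelivery: retries at the failure point inside the route,
// holding the transaction open the whole time. Not suitable for hours-long retries.
onException(Exception.class)
    .maximumRedeliveries(5)
    .redeliveryDelay(1000)
    .backOffMultiplier(2)
    .useExponentialBackOff();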
Looks like @JoshMc pointed me in the right direction. I managed to implement a RoutePolicy that delays redeliveries with increasing delays. I ran a test session for a few hours with several thousand redeliveries of the same message to see if there are any problems such as memory leaks or MQ connection exhaustion. I did not observe any problems: there were two stable TCP connections to the MQ manager, and the memory usage of the Java process stayed within a narrow range.
import java.util.Timer;
import java.util.TimerTask;
import javax.jms.Session;
import lombok.extern.log4j.Log4j2;
import org.apache.camel.CamelContext;
import org.apache.camel.CamelContextAware;
import org.apache.camel.Exchange;
import org.apache.camel.Message;
import org.apache.camel.Route;
import org.apache.camel.component.jms.JmsMessage;
import org.apache.camel.support.RoutePolicySupport;

@Log4j2
public class ExponentialBackoffPolicy extends RoutePolicySupport implements CamelContextAware {
    final static String JMSX_DELIVERY_COUNT = "JMSXDeliveryCount";
    private CamelContext camelContext;

    @Override
    public void setCamelContext(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @Override
    public CamelContext getCamelContext() {
        return this.camelContext;
    }

    @Override
    public void onExchangeDone(Route route, Exchange exchange) {
        try {
            // ideally we would check if the exchange is transacted, but onExchangeDone is called
            // after the transaction is already rolled back and the transaction context removed.
            if (exchange.getException() == null) {
                log.debug("No exception occurred, skipping route suspension.");
                return;
            }
            int deliveryCount = getRetryCount(exchange);
            int redeliveryDelay = getRedeliveryDelay(deliveryCount);
            log.info("Suspending route {} for {}ms after exception. Current delivery count {}.",
                    route.getId(), redeliveryDelay, deliveryCount);
            super.suspendRoute(route);
            scheduleWakeup(route, redeliveryDelay);
        } catch (Exception ex) {
            // only log the exception and let Camel continue as if this policy didn't exist.
            log.error("Exception while suspending route", ex);
        }
    }

    void scheduleWakeup(Route route, int redeliveryDelay) {
        Timer timer = new Timer();
        timer.schedule(
                new TimerTask() {
                    @Override
                    public void run() {
                        log.info("Resuming route {} after redelivery delay of {}ms.", route.getId(), redeliveryDelay);
                        try {
                            resumeRoute(route);
                        } catch (Exception ex) {
                            // only log the exception and let Camel continue as if this policy didn't exist.
                            log.error("Exception while resuming route", ex);
                        }
                        timer.cancel();
                    }
                },
                redeliveryDelay);
    }

    int getRetryCount(Exchange exchange) {
        Message msg = exchange.getIn();
        return (int) msg.getHeader(JMSX_DELIVERY_COUNT, 1);
    }

    int getRedeliveryDelay(int deliveryCount) {
        // very crude backoff strategy for now, will need to refine later
        if (deliveryCount < 10) return 1000;
        if (deliveryCount < 20) return 5000;
        if (deliveryCount < 30) return 20000;
        return 60000;
    }
}
And this is how it is used in route definitions:
from(mqConnectionString)
    .routePolicy(new ExponentialBackoffPolicy())
    .transacted()
    ...

// and if you want to distinguish between retriable and non-retriable situations, apply the following two exception handlers
onException(NonRetriableProcessingException.class)
    .handled(true)
    .log(LoggingLevel.WARN, "Non-retriable exception occurred, discard message.");
onException(Exception.class)
    .handled(false)
    .log(LoggingLevel.WARN, "Retriable exception occurred, retry message.");
One thing to note is that the JMSXDeliveryCount header comes from the MQ manager, and the redelivery delay is calculated from it. If you restart an application that uses the ExponentialBackoffPolicy while a message is permanently failing, the application will immediately attempt to reprocess that message on startup; if it fails again, the delay applied corresponds to the total number of redeliveries rather than starting over with the initial short delay.

How to index data in a specific shard using solrj

I am using SolrJ as a client to index documents into SolrCloud (using Solr 4.5).
I have a requirement to save documents based on tenant_id, so I am trying to do document routing, which is possible only if the collection is created using the numShards parameter (http://searchhub.org/2013/06/13/solr-cloud-document-routing/).
I have two instances of Solr in SolrCloud (example1/solr and example2/solr) and an external ZooKeeper running on port 2181.
Both instances contain a collection called collection1.
I created one more collection called newCollection (with two shards and two replicas) using
http://localhost:8501/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&router.field=id
So in example1/solr I have newCollection_shard1_replica1 & newCollection_shard2_replica1,
and in example2/solr I have newCollection_shard1_replica2 & newCollection_shard2_replica2.
I copied example1/solr/collection1/conf to all shards and replicas.
I restarted the ZooKeeper server as well as the Solr instances:
zookeeper -> zkServer.cmd
example1/solr -> java -Dbootstrap_confdir=./solr/newCollection_shard1_replica1/conf -Dcollection.configName=myconf -DzkHost=localhost:2181 -jar start.jar
example2/solr -> java -DzkHost=localhost:2181 -jar start.jar
(Both instances are running on different ports, one at 8081 and the other at 8051.)
I am using the SolrJ client to index documents. Here is my sample code:
String url="http://localhost:8081/solr"
ConcurrentUpdateSolrServer solrServer= new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(documents);
solrServer.commit();
But it is saving the document in collection1 with id shard1!513. Are any configuration changes required in solrconfig.xml (I am using the default solrconfig.xml that came with Solr 4.5)?
How do I save documents in my newCollection? And how do I do document routing?
Please help me out with this issue.
Thanks!
You can use CloudSolrServer and UpdateRequest:
SolrServer solrServer = new CloudSolrServer(zkHost); // zkHost is your Solr ZooKeeper host string
SolrInputDocument doc = new SolrInputDocument();
UpdateRequest add = new UpdateRequest();
add.add(doc);
add.setParam("collection", "newCollection");
add.process(solrServer);
UpdateRequest commit = new UpdateRequest();
commit.setAction(UpdateRequest.ACTION.COMMIT, true, true);
commit.setParam("collection", "newCollection");
commit.process(solrServer);
I appended the core name of the new collection to the URL, so it is working fine now.
Instead of:
String url = "http://localhost:8081/solr";
I used:
String url = "http://localhost:8081/solr/newCollection_shard1_replica1";
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();
You should use CloudSolrServer (http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html),
because in SolrCloud updates must be routed via ZooKeeper, which knows the status of the leaders in the cloud. One more thing: you need not append the collection name to the URL; just use the setDefaultCollection(collectionName) method of CloudSolrServer to send your updates to the 'collectionName' collection.
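Putting that together, a minimal sketch (reusing the ZooKeeper address and collection name from the question; adjust to your setup) might look like:

// Sketch: let CloudSolrServer route updates via ZooKeeper instead of
// pinning a replica URL; the id prefix "shard1!" drives document routing.
CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
solrServer.setDefaultCollection("newCollection");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();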

Solr : Server at http://localhost:8080//solr returned non ok status:500, message:Internal Server Error

I keep getting the error "Server at http://localhost:8080//solr returned non ok status:500, message:Internal Server Error" when trying to index text files on the Solr server using the SolrJ API.
My code is as follows:
public void IndexData(String filePath, String solrId) {
    try {
        String urlString = "http://localhost:8080//solr";
        HttpSolrServer server = new HttpSolrServer(urlString);
        ContentStreamUpdateRequest up =
                new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(filePath), "");
        up.setParam("literal.id", solrId);
        up.setParam("uprefix", "attr_");
        up.setParam("fmap.content", "attr_content");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(up);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I am able to query the Solr server using the same server object, so why am I getting this error while indexing data?
log4j:WARN No appenders could be found for logger (org.apache.solr.client.solrj.impl.HttpClientUtil).
log4j:WARN Please initialize the log4j system properly.
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://localhost:8080//solr returned non ok status:500, message:Internal Server Error
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at dataAnalysis.dataIndexer.DataIndexer.IndexData(DataIndexer.java:41)
at dataAnalysis.dataHome.DataHome.main(DataHome.java:13)
Great news: I was able to solve the issue. Two changes were required. First, I edited solrconfig.xml to remove the extra tab characters in each request handler path, and second, I copied the jars from /solr/contrib/extraction/lib to /tomcat/web-inf/lib.

404 Error When Accessing Solr From Eclipse

I have a Solr instance running and am able to access it through the browser and use the Admin UI to run queries. When I try to access it via Java code in Eclipse, however, I receive the following error:
Exception in thread "main" org.apache.solr.common.SolrException: Server at http://localhost:8983/solr returned non ok status:404, message:Not Found
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at testClass.main(testClass.java:18)
Here is the code I am running:
public static void main(String[] args) throws MalformedURLException, SolrServerException {
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("myParam", "myValue");
QueryResponse response = server.query(params);
}
It turns out that I had two errors:
1) My setup actually has a nested solr directory, so I needed to add another "solr" level to the URL.
2) I was setting the params variable incorrectly. The first argument should be "q", with the second argument being the "name:value" pairs.
Updated example, which includes passing multiple params at once:
public static void main(String[] args) throws MalformedURLException, SolrServerException {
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/solr/");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "param1:value1 AND param2:value2");
QueryResponse response = server.query(params);
System.out.println("response = " + response);
}
Shouldn't it be:
SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
See the accepted answer in the following link:
Querying Solr via Solrj: Basics

Strange problem with Google App Engine Java Mail

I'm using the MailService feature of Google App Engine in my application. It works fine in one application without any issues, but the same code doesn't work in another app. I'm not able to figure it out. Please help. Following is the piece of code that I use to send mail:
public static void sendHTMLEmail(String from, String fromName, String to, String toName, String subject, String body) {
    _logger.info("entering ...");
    Properties props = new Properties();
    Session session = Session.getDefaultInstance(props, null);
    _logger.info("got mail session ...");
    String htmlBody = body;
    try {
        Message msg = new MimeMessage(session);
        _logger.info("created mimemessage ...");
        msg.setFrom(new InternetAddress(from, fromName));
        _logger.info("from is set ...");
        msg.addRecipient(Message.RecipientType.TO, new InternetAddress(to, toName));
        _logger.info("recipient is set ...");
        msg.setSubject(subject);
        _logger.info("subject is set ...");
        Multipart mp = new MimeMultipart();
        MimeBodyPart htmlPart = new MimeBodyPart();
        htmlPart.setContent(htmlBody, "text/html");
        mp.addBodyPart(htmlPart);
        _logger.info("body part added ...");
        msg.setContent(mp);
        _logger.info("content is set ...");
        Transport.send(msg);
        _logger.info("email sent successfully.");
    } catch (AddressException e) {
        e.printStackTrace();
    } catch (MessagingException e) {
        e.printStackTrace();
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
        System.err.println(e.getMessage());
    }
}
When I look at the log (on the server admin console), it prints the statement "content is set ..." and after that there is nothing in the log. The mail is not sent. At times I get the following error after the above statement is printed (and the mail is not sent):
com.google.appengine.repackaged.com.google.common.base.internal.Finalizer getInheritableThreadLocalsField: Couldn't access Thread.inheritableThreadLocals. Reference finalizer threads will inherit thread local values.
But the mail quota usage keeps increasing.
Remember, this works fine in one application, but not in the other. I'm using the same set of email addresses in both apps (for from and to).
I'm really stuck with this. Appreciate any help.
Thank you.
Velu
Have you tried logging the exceptions? I bet one of them is being thrown - your printStackTrace will go nowhere.
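For example, a minimal sketch of that logging (assuming _logger is a java.util.logging.Logger, which the info(...) calls in the question suggest) could be:

import java.util.logging.Level;
import java.util.logging.Logger;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Transport;

class MailDebug {
    private static final Logger _logger = Logger.getLogger(MailDebug.class.getName());

    // Sketch: log the exception with the stack trace attached so it shows up
    // in the App Engine admin console instead of vanishing on stderr.
    static void send(Message msg) {
        try {
            Transport.send(msg);
            _logger.info("email sent successfully.");
        } catch (MessagingException e) {
            _logger.log(Level.SEVERE, "sending mail failed", e);
        } catch (Exception e) {
            _logger.log(Level.SEVERE, "unexpected failure while sending mail", e);
        }
    }
}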
