Pysolr add throws (HTTP 500): Task queue processing has stalled

I am using Pysolr to add data to Solr, 100 documents at a time, but I am getting the error below.
Solr responded with an error (HTTP 500): [Reason: Task queue processing has stalled for 20121 ms with 0 remaining elements to process.]
Does Solr have an internal queue, and is it filling up because of the high number of requests? Can I increase the size (i.e., the limit) of that queue?

It sounds like you can control this stall time via the solr.cloud.client.stallTime system property.
Reference: https://issues.apache.org/jira/browse/SOLR-13975?focusedCommentId=16988216&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988216
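For example (a minimal sketch, assuming a standard install that reads solr.in.sh; the 60-second value is purely illustrative), the property can be raised at Solr startup:
# solr.in.sh -- raise the stall timeout (illustrative value of 60s)
SOLR_OPTS="$SOLR_OPTS -Dsolr.cloud.client.stallTime=60000"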

Related

GAE push queue with max_concurrent_requests

I want to set up a push queue with max_concurrent_requests set to 1. So I created a queue.yaml like this:
queue:
- name: myqueue
  max_concurrent_requests: 1
When running in the dev server, I get the error:
root: WARNING: Refill rate must be specified for push-based queue. Please check queue.yaml file.
Doing a Google search for "refill rate" and queue.yaml doesn't give any relevant hits except for the taskqueue stub, which doesn't help me.
Changing queue.yaml to this:
queue:
- name: myqueue
  max_concurrent_requests: 1
  rate: 10/s
Gets rid of the error in the dev server. Can anyone confirm that this will actually create a queue with a maximum of 1 concurrent request (one that is, admittedly, also limited to 10 per second)? I'm suspicious because the queue.yaml documentation doesn't address this.
Although the documentation doesn't say so, you must specify a rate when creating a push queue. To cap the queue at 1 concurrent request, keep max_concurrent_requests: 1 and simply set a rate high enough that it never becomes the bottleneck; the rate is then effectively ignored. My tasks take about 0.25 seconds each (i.e., 4/s), so a rate of 10/s ensures the rate does not throttle task execution.

Rexster/RexPro: RexProScriptException: .. java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: PermGen space

I am using Titan 0.4.3 with Rexster 2.4 over Cassandra and Elasticsearch.
I am calling RexPro from Python. In a single Gremlin request I add 100 vertices and then commit. I can successfully add 40,000+ vertices across 400+ Gremlin requests, but after that I get this exception:
Encountered a RexProScriptException: An error occurred while processing the script for language [groovy]. All transactions across all graphs in the session have been concluded with failure: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: PermGen space
Rexster.sh [JVM heap size]
I tried increasing the heap size, but the exception is still thrown after a few more batches of vertices are inserted.
# Set Java options
if [ "$JAVA_OPTIONS" = "" ] ; then
  JAVA_OPTIONS="-Xms256m -Xmx1024m"
fi
Please advise.
Just a guess based on the information you provided, but PermGen errors usually show up in Rexster when you are not parameterizing the scripts you send: every distinct script string is compiled into a fresh Groovy class, and those classes accumulate in PermGen, whereas a parameterized script is compiled once and reused. Most of the Python libraries out there that I know of support that feature. You can read more about this issue here:
https://github.com/tinkerpop/rexster/issues/143
and in other places on the Gremlin-users mailing list if you search around. If for some reason you can't parameterize, you can alter these JVM settings:
-XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M
but I'd consider that a last resort. Parameterization should not only get rid of your problem but should also greatly speed up your data loading.
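As an illustration, here is a hypothetical sketch of parameterization using the Java RexsterClient (the host, graph name, and property values are placeholders; the Python RexPro clients expose an equivalent bindings/params argument on their execute call):
import com.tinkerpop.rexster.client.RexsterClient;
import com.tinkerpop.rexster.client.RexsterClientFactory;
import java.util.HashMap;
import java.util.Map;

public class ParameterizedLoad {
    public static void main(String[] args) throws Exception {
        RexsterClient client = RexsterClientFactory.open("localhost", "graph");
        // One script string, compiled once by the Groovy engine and then reused.
        // Per-request values travel as bindings rather than as new script text,
        // so no new class is generated per request and PermGen stays flat.
        String script = "v = g.addVertex([name: vname]); g.commit(); v";
        for (String name : new String[]{"a", "b", "c"}) {
            Map<String, Object> params = new HashMap<String, Object>();
            params.put("vname", name);
            client.execute(script, params);
        }
        client.close();
    }
}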

How to restrict the answer size in javamail search method?

I am using Javamail library to fetch emails from several servers through IMAP.
I only care about unread messages, and I want to download only the 5 most recently received of them.
To filter the messages in a folder I am using the Folder.search(FlagTerm ft) method, passing the SEEN flag with value false, as the following code shows:
FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
Message[] messages = folder.search(ft);
I need to reduce bandwidth usage, and the above method may return an arbitrarily large number of messages. I am only interested in the last 5 of them; is there a way to have the IMAP server return only a limited number of messages?
You can do the search over a subset of the messages, effectively setting an upper limit on the number of messages returned, but you might have to do multiple searches. There's no direct way to limit the number of results returned if you're searching all messages.
Note that the search results are relatively compact (effectively only the message number), so unless you're searching a huge number of messages I wouldn't think bandwidth would be an issue relative to fetching the content of the messages.
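For example, here is a hypothetical sketch of that windowed approach (folder is the already-open IMAP Folder from your code; the window size of 50 and the helper name are arbitrary choices):
import java.util.ArrayList;
import java.util.List;
import javax.mail.Flags;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.search.FlagTerm;

// Search newest-first in fixed-size windows until enough unseen messages are
// found, so the server never has to search the whole folder at once.
static List<Message> lastUnseen(Folder folder, int wanted) throws MessagingException {
    FlagTerm unseen = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
    List<Message> hits = new ArrayList<Message>();
    int window = 50; // arbitrary; tune to the folder's traffic
    for (int end = folder.getMessageCount(); end >= 1 && hits.size() < wanted; end -= window) {
        int start = Math.max(1, end - window + 1);
        Message[] found = folder.search(unseen, folder.getMessages(start, end));
        for (int i = found.length - 1; i >= 0 && hits.size() < wanted; i--) {
            hits.add(found[i]); // newest matches first
        }
    }
    return hits;
}
Calling lastUnseen(folder, 5) then returns at most the 5 most recently received unseen messages.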

Apache Camel - Make Aggregator 'flush'

I effectively want a flush, or a completionSize but for all the aggregations in the aggregator. Like a global completionSize.
Basically I want to make sure that every message that comes in a batch is aggregated and then have all the aggregations in that aggregator complete at once when the last one has been read.
e.g. 1000 messages arrive (the length is not known beforehand)
aggregate on correlation id into bins
A 300
B 400
C 300 (the size of the bins is not known beforehand)
I want the aggregator not to complete until the 1000th exchange is aggregated
thereupon I want all of the aggregations in the aggregator to complete at once
completionSize applies to each aggregation, not to the aggregator as a whole, unfortunately. So if I set completionSize(1000) it will just never finish, since each individual aggregation would have to reach 1000 before it is 'complete'.
I could get around it by building up a single Map object, but that kind of sidesteps the correlation handling in aggregator2, which I would ideally prefer to use.
So, either a global completion size or a flush: is there a way to do this cleanly?
One option is simply to add some logic that keeps a global counter and sets the Exchange.AGGREGATION_COMPLETE_ALL_GROUPS header once the expected total is reached (a sketch of this idea follows the excerpt below)...
Available as of Camel 2.9: you can manually complete all current aggregated exchanges by sending in a message containing the header Exchange.AGGREGATION_COMPLETE_ALL_GROUPS set to true. The message is considered a signal message only; the message headers/contents will not be processed otherwise.
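Here is a hypothetical sketch combining this with the counter idea above (a fragment for inside a RouteBuilder.configure(); the batch total of 1000 and MyAggregationStrategy are assumptions, and it uses the _INCLUSIVE variant mentioned below so the final message is itself still aggregated before the flush):
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.camel.Exchange;

final AtomicInteger seen = new AtomicInteger();
final int expected = 1000; // assumption: the batch size is known up front

from("direct:in")
    .process(exchange -> {
        // Mark the last message of the batch so that it joins its own group
        // and then forces every in-flight group to complete at once.
        if (seen.incrementAndGet() == expected) {
            exchange.getIn().setHeader(
                    Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE, true);
        }
    })
    .aggregate(header("correlationId"), new MyAggregationStrategy())
        .completionTimeout(60000) // safety net only; the header does the real work
    .to("direct:out");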
I suggest taking a look at the Camel aggregator EIP docs http://camel.apache.org/aggregator2 and reading about the different completion conditions, as well as the special message Ben refers to, which you can send to signal that all in-flight aggregates should complete.
If you consume from a batch consumer http://camel.apache.org/batch-consumer.html then you can use a special completion that completes when the batch is done, for example if you pick up files, or rows from a JPA database table, etc. When all messages from the batch consumer have been processed, the aggregator can signal completion for all the aggregated messages, using the completionFromBatchConsumer option (see the sketch below).
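For instance, a minimal sketch of that option (the file endpoint, correlation header, and MyAggregationStrategy are placeholders):
// Each poll of the batch consumer is one batch; once every message in the
// current batch has been aggregated, all groups complete together.
from("file:inbox")
    .aggregate(header("correlationId"), new MyAggregationStrategy())
        .completionFromBatchConsumer()
    .to("direct:out");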
Also, if you have a copy of the Camel in Action book, read chapter 8, section 8.2, as it's all about the aggregate EIP, covered in much more detail.
Using Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE worked for me:
from(endpoint)
.unmarshal(csvFormat)
.split(body())
.bean(CsvProcessor())
.choice()
// If all messages are processed,
// flush the aggregation
.`when`(simple("\${property.CamelSplitComplete}"))
.setHeader(Exchange.AGGREGATION_COMPLETE_ALL_GROUPS_INCLUSIVE, constant(true))
.end()
.aggregate(simple("\${body.trackingKey}"),
AggregationStrategies.bean(OrderAggregationStrategy()))
.completionTimeout(10000)

parameters for mapreduce

I am using the Java MapReduce module of App Engine.
I get the following info message:
Out of mapper quota. Aborting request until quota is replenished. Consider increasing mapreduce.mapper.inputprocessingrate (default 1000) if you would like your mapper job to complete faster.
Task parameters:
queue name = default
rate = 1/s
bucketsize = 1
I have about 2000 entities of the kind, and I am only doing logging in the map() call.
What MapReduce/task parameters need to be provided to get rid of that info message?
-Aswath
I believe this is a special quota implemented by the MapReduce framework itself. It's designed to limit the speed at which the framework consumes resources, so that a MapReduce job does not run through your available App Engine quota too quickly. It appears to denote the maximum overall rate of map() calls per second.
Try increasing the mapreduce.mapper.inputprocessingrate property in the configuration for your map job.
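For instance (a hypothetical sketch; the early Java appengine-mapreduce library configured jobs through Hadoop-style Configuration properties, typically declared in mapreduce.xml, and 5000 is an arbitrary value):
import org.apache.hadoop.conf.Configuration;

// Raise the framework's cap on map() calls per second above the default 1000.
// The property name is taken from the log message above; 5000 is illustrative.
Configuration conf = new Configuration();
conf.setInt("mapreduce.mapper.inputprocessingrate", 5000);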
Or, just to test, you can change the default, defined in mapreduce/AppEngineJobContext.java:
public static final int DEFAULT_MAP_INPUT_PROCESSING_RATE = 1000;
