Occasional, unpredictable exceptions with multi-threaded solrj queries - solr

I have a multi-threaded application using solrj 4. There are a maximum of 25 threads. Each thread creates a connection using HttpSolrServer, and runs one query. Most of the time this works just fine. But occasionally I get the following exception:
Jan 10, 2013 9:29:07 AM org.apache.http.impl.client.DefaultRequestDirector tryConnect
INFO: I/O exception (java.net.NoRouteToHostException) caught when connecting to the target host: Cannot assign requested address
Jan 10, 2013 9:29:07 AM org.apache.http.impl.client.DefaultRequestDirector tryConnect
INFO: Retrying connect
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
I wrote some code to retry the query, if it fails:
while(!querySuccess && queryAttempts<m_MaxQueryAttempts ){
try{
queryAttempts++;
rsp = m_Server.query( query );
querySuccess = true;
}catch(SolrServerException e){
querySuccess = false;
}
}
After one or more retries the query usually works. But sometimes it fails even after 100 retries. Either way, I'd like to understand what the cause of the problem is. Why does it work some of the time? Is it an issue with concurrent access to solr? Apart from this process, I only have one other process that it continually writing to the index using a single connection. The default server settings are below - so I don't think it's because of too many simultaneous connections.
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Any suggestions on how to diagnose this would be much appreciated.

Related

Apache Flink with Kinesis Analytics : java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0

Background :
I have been trying to setup BATCH + STREAMING in the same flink application which is deployed on kinesis analytics runtime. The STREAMING part works fine, but I'm having trouble adding support for BATCH.
Flink : Handling Keyed Streams with data older than application watermark
Apache Flink : Batch Mode failing for Datastream API's with exception `IllegalStateException: Checkpointing is not allowed with sorted inputs.`
The logic is something like this :
The logic is something like this :
streamExecutionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
streamExecutionEnvironment.fromSource(FileSource.forRecordStreamFormat(new TextLineFormat(), path).build(),
WatermarkStrategy.noWatermarks(),
"Text File")
.process(process function which transforms input)
.assignTimestampsAndWatermarks(WatermarkStrategy
.<DetectionEvent>forBoundedOutOfOrderness(orderness)
.withTimestampAssigner(
(SerializableTimestampAssigner<Event>) (event, l) -> event.getEventTime()))
.keyBy(keyFunction)
.window(TumblingEventWindows(Time.of(x days))
.process(processWindowFunction);
On doing this I'm getting the below exception :
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_90bea66de1c231edf33913ecd54406c1_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
... 10 more
Caused by: java.io.IOException: Failed to acquire shared cache resource for RocksDB
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:306)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:426)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:90)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 12 more
Caused by: java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:521)
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:302)
... 17 more
Seems like kinesis-analytics does not allow clients to define a flink-conf.yaml file to define taskmanager.memory.managed.consumer-weights. Is there any way around this ?
It's not clear to me what the underlying cause of this exception is, nor how to make batch processing work on KDA.
You can try this (but I'm not sure KDA will allow it):
Configuration conf = new Configuration();
conf.setString("taskmanager.memory.managed.consumer-weights", "put-the-value-here");
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment(conf);

Google Cloud Run pubsub pull listener app fails to start

I'm testing pubsub "pull" subscriber on Cloud Run using just listener part of this sample java code (SubscribeAsyncExample...reworked slightly to fit in my SpringBoot app):
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#java_1
It fails to startup during deploy...but while it's trying to start, it does pull items from the pubsub queue. Originally, I had an HTTP "push" receiver (a #RestController) on a different pubsub topic and that worked fine. Any suggestions? I'm new to Cloud Run. Thanks.
Deploying...
Creating Revision... Cloud Run error: Container failed to start. Failed to start and then listen on the port defined
by the PORT environment variable. Logs for this revision might contain more information....failed
Deployment failed
In logs:
2020-08-11 18:43:22.688 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4606 ms
2020-08-11T18:43:25.287759Z Listening for messages on projects/ce-cxmo-dev/subscriptions/AndySubscriptionPull:
2020-08-11T18:43:25.351650801Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x31,0x3eca02dfd974,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11T18:43:25.351770555Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x12,0x3eca02dfd97c,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11 18:43:25.680 WARN 1 --- [ault-executor-0] i.g.n.s.i.n.u.internal.MacAddressUtil : Failed to find a usable hardware address from the network interfaces; using random bytes: ae:2c:fb:e7:92:9c:2b:24
2020-08-11T18:45:36.282714Z Id: 1421389098497572
2020-08-11T18:45:36.282763Z Data: We be pub-sub'n in pull mode2!!
Nothing else after this and the app stops running.
#Component
public class AndyTopicPullRecv {
public AndyTopicPullRecv()
{
subscribeAsyncExample("ce-cxmo-dev", "AndySubscriptionPull");
}
public static void subscribeAsyncExample(String projectId, String subscriptionId) {
ProjectSubscriptionName subscriptionName =
ProjectSubscriptionName.of(projectId, subscriptionId);
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
// Handle incoming message, then ack the received message.
System.out.println("Id: " + message.getMessageId());
System.out.println("Data: " + message.getData().toStringUtf8());
consumer.ack();
};
Subscriber subscriber = null;
try {
subscriber = Subscriber.newBuilder(subscriptionName, receiver).build();
// Start the subscriber.
subscriber.startAsync().awaitRunning();
System.out.printf("Listening for messages on %s:\n", subscriptionName.toString());
// Allow the subscriber to run for 30s unless an unrecoverable error occurs.
// subscriber.awaitTerminated(30, TimeUnit.SECONDS);
subscriber.awaitTerminated();
System.out.printf("Async subscribe terminated on %s:\n", subscriptionName.toString());
// } catch (TimeoutException timeoutException) {
} catch (Exception e) {
// Shut down the subscriber after 30s. Stop receiving messages.
subscriber.stopAsync();
System.out.printf("Async subscriber exception: " + e);
}
}
}
Kolban question is very important!! With the shared code, I would like to say "No". The Cloud Run contract is clear:
Your service must answer to HTTP request. Out of request, you pay nothing and no CPU is dedicated to your instance (the instance is like a daemon when no request is processing)
Your service must be stateless (not your case here, I won't take time on this)
If you want to pull your PubSub subscription, create an endpoint in your code with a Rest controller. While you are processing this request, run your pull mechanism and process messages.
This endpoint can be called by Cloud Scheduler regularly to keep the process up.
Be careful, you have a max request processing timeout at 15 minutes (today, subject to change in a near future). So, you can't run your process more than 15 minutes. Make it resilient to fail and set your scheduler to call your service every 15 minutes

'amplify init' keeps failing

I recently got myself a new PC(Predator Helios 300) and I wanted to start using aws there but when I try to perform amplify init I get the error below even though I already did all the other steps such as configuration.
× Root stack creation failed
init failed
{ SignatureDoesNotMatch: Signature expired: 20190427T235724Z is now earlier than 20190428T094952Z (20190428T095452Z - 5 min.)
at Request.extractError (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\protocol\query.js:50:29)
at Request.callListeners (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:106:20)
at Request.emit (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:78:10)
at Request.emit (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\request.js:683:14)
at Request.transition (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\request.js:22:10)
at AcceptorStateMachine.runTo (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\state_machine.js:14:12)
at C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\state_machine.js:26:10
at Request.<anonymous> (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\request.js:38:9)
at Request.<anonymous> (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\request.js:685:12)
at Request.callListeners (C:\Users\sahve\AppData\Roaming\npm\node_modules\#aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:116:18)
message:
'Signature expired: 20190427T235724Z is now earlier than 20190428T094952Z (20190428T095452Z - 5 min.)',
code: 'SignatureDoesNotMatch',
time: 2019-04-27T23:57:24.753Z,
requestId: 'ab179ef3-699b-11e9-bfe3-4ddc7ceb66ee',
statusCode: 403,
retryable: true }
After doing some research It seems to be a verification problem. Does anyone has experience with this or knows how to resolve this issue. Thanks a lot!
Any time you see an error like "is now earlier than" around some numbers that look like timestamps (20190427T235724Z -> 2019-04-27 23:57:24 UTC), that's an indicator that the error is time related. Time matters for cryptography in order to validate certificates (so that an attacker cannot break a certificate and use it after its expiration, among other reasons) [1]. In this case, either your clock or the remote server clock is wrongly set. Since the remote server in this case is AWS, it is highly unlikely that they have any significant clock drift, leaving you as the possible outlier.
Given that you mentioned a new computer, it is even more likely that this is due to an incorrectly set system clock.
Reset/synchronize your system clock and the error should disappear.
Reference [1]: https://security.stackexchange.com/q/72866/47422

Warning message: org.apache.http.impl.client.DefaultRequestDirector tryExecute

Please find below the code i ran (Using: eclipse-java-kepler-SR2-win32-x86_64 + IE 11)
public class SampleTest {
public static void main(String[] args) {
System.setProperty("webdriver.ie.driver", "C:\\Program Files\\IEDriverServer\\IEDriverServer.exe");
WebDriver d1 = new InternetExplorerDriver();
d1.get("http://www.google.com/");
WebElement element = d1.findElement(By.name("q"));
element.sendKeys("selenium");
System.out.println("Test Selenium");
}
}
While running I got below logs
Started InternetExplorerDriver server (64-bit)
2.40.0.0
Listening on port 22795
Mar 26, 2014 7:04:27 PM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: recv failed
Mar 26, 2014 7:04:27 PM org.apache.http.impl.client.DefaultRequestDirector tryExecute
INFO: Retrying request
Why am i getting this warning messages all the time only in IE
While writing "Send Keys" string in "Search" text box its taking more than 5 secs for each character
Would appreciate any helpful note on these... :)
From a blog post that discusses this issue in great detail:
There are two answers to this question, a short one and a long one.
The short one is, "Read the log message. It's clearly tagged as
'INFO', as in an informational message, and not indicative of any
problem with the code?" I find that this question often comes from
users of Eclipse, and that the Eclipse console has colored the message
red, and people are so conditioned to see "red == bad" that they react
to the format of the message rather than the content. The content of
the message is flagged at a level that means, "Hey, nothing is wrong,
we're just telling you about it."
For the longer, more detailed explanation, see the blog post, but it boils down to a race condition in bringing up an HTTP server, and using an HTTP client to poll for when that server is available to receive commands.

Solr times out when I try to rebuild index for django

I am trying to build my solr index for Django on ubuntu for the first time with ./manage.py rebuild_index and I get the following error:
Removing all documents from your index because you said so.
Failed to clear Solr index: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
All documents removed.
Indexing 4 dishess
Failed to add documents to Solr: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
I have access to localhost:8983/solr/ and localhost:8983/solr/admin via my web browser
You can bump up the TIMEOUT in settings.py.
For example
HAYSTACK_CONNECTIONS = {
'default': {
'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
'URL': 'http://127.0.0.1:8080/solr/default',
'INCLUDE_SPELLING': True,
'TIMEOUT': 60 * 5,
},
}
Important thing here is that you shoudn't increase default timeout, because it could possibly block all your workers as haystack works synchronously.
The best way to avoid this is to define multiple connections for reads and writes with different timeouts and define.
http://django-haystack.readthedocs.org/en/latest/settings.html#haystack-connections
And use routers for read and write separation http://django-haystack.readthedocs.org/en/v2.4.0/multiple_index.html#automatic-routing

Resources