Kong/ScyllaDB intermittent timeout issue when writing routes "received only 0 responses from 2" - K8s - database

When trying to register routes via Kong /routes, we are seeing intermittent timeout issues.
Kong and ScyllaDB are set up in Docker/Kubernetes.
[error] 41#0: *119054033 [kong] api_helpers.lua:310 could not execute insertion query: [Write timeout] Operation timed out for kong.routes - received only 0 responses from 2 CL=SERIAL., client: 10.xxx.xxx.xxx, server: kong_admin, request: "POST /routes HTTP/1.1"
This only happens on the /routes registration call, and write_request_timeout_in_ms is set to 2000 ms.
Anyone seen this issue before?

Related

'amplify init' keeps failing

I recently got myself a new PC (Predator Helios 300) and wanted to start using AWS on it, but when I try to perform amplify init I get the error below, even though I have already done all the other steps such as configuration.
× Root stack creation failed
init failed
{ SignatureDoesNotMatch: Signature expired: 20190427T235724Z is now earlier than 20190428T094952Z (20190428T095452Z - 5 min.)
at Request.extractError (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\protocol\query.js:50:29)
at Request.callListeners (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:106:20)
at Request.emit (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:78:10)
at Request.emit (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\request.js:683:14)
at Request.transition (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\request.js:22:10)
at AcceptorStateMachine.runTo (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\state_machine.js:14:12)
at C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\state_machine.js:26:10
at Request.<anonymous> (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\request.js:38:9)
at Request.<anonymous> (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\request.js:685:12)
at Request.callListeners (C:\Users\sahve\AppData\Roaming\npm\node_modules\@aws-amplify\cli\node_modules\aws-sdk\lib\sequential_executor.js:116:18)
message:
'Signature expired: 20190427T235724Z is now earlier than 20190428T094952Z (20190428T095452Z - 5 min.)',
code: 'SignatureDoesNotMatch',
time: 2019-04-27T23:57:24.753Z,
requestId: 'ab179ef3-699b-11e9-bfe3-4ddc7ceb66ee',
statusCode: 403,
retryable: true }
After doing some research, it seems to be a verification problem. Does anyone have experience with this or know how to resolve this issue? Thanks a lot!
Any time you see an error like "is now earlier than" around some numbers that look like timestamps (20190427T235724Z -> 2019-04-27 23:57:24 UTC), that's an indicator that the error is time related. Time matters for cryptography in order to validate certificates (so that an attacker cannot break a certificate and use it after its expiration, among other reasons) [1]. In this case, either your clock or the remote server clock is wrongly set. Since the remote server in this case is AWS, it is highly unlikely that they have any significant clock drift, leaving you as the possible outlier.
Given that you mentioned a new computer, it is even more likely that this is due to an incorrectly set system clock.
Reset/synchronize your system clock and the error should disappear.
Reference [1]: https://security.stackexchange.com/q/72866/47422
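If you want to confirm the drift before changing anything, you can compare your system clock against an NTP server. A minimal sketch in Java (my own example, assuming the Apache Commons Net library is on the classpath; pool.ntp.org is just a placeholder server):

import java.net.InetAddress;
import org.apache.commons.net.ntp.NTPUDPClient;
import org.apache.commons.net.ntp.TimeInfo;

public class ClockDriftCheck {
    public static void main(String[] args) throws Exception {
        NTPUDPClient client = new NTPUDPClient();
        client.setDefaultTimeout(5000); // don't hang forever if NTP traffic is blocked
        TimeInfo info = client.getTime(InetAddress.getByName("pool.ntp.org"));
        info.computeDetails(); // derives offset/delay from the NTP response
        // The offset is how far the local clock is from the NTP server, in milliseconds.
        System.out.println("Local clock offset: " + info.getOffset() + " ms");
        client.close();
    }
}

An offset in the range of minutes or hours confirms that clock synchronization is the problem.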

Aggregate after exception from ftp consumer: FatalFallbackErrorHandler

My Camel route tries to pick up some files from SFTP, transfer them to a network location, and delete them from the SFTP server. If the SFTP server is unreachable after 3 attempts, I want the route to send an email warning the admin about the problem.
For this reason my sftp address has the following parameters:
maximumReconnectAttempts=2&throwExceptionOnConnectFailed=true&consumer.bridgeErrorHandler=true
In case the network location is not available, I want the route to notify the admin and not delete the files from SFTP.
For this reason I have set .handled(false) in onException.
However, when connecting to SFTP fails, the aggregation also fails and no emails are sent. I have made a minimal example below:
//configure
onException(Throwable.class)
    .retryAttemptedLogLevel(LoggingLevel.WARN)
    .redeliveryDelay(1000)
    .handled(false)
    .log(LoggingLevel.ERROR, LOG, "XXX - Error moving files")
    .to(AGGREGATEROUTE)
    .end();

from(downloadFrom)
    .to(to)
    .log(LoggingLevel.INFO, LOG, "XXX - Moving file OK")
    .to(AGGREGATEROUTE);

from(AGGREGATEROUTE)
    .log(LoggingLevel.INFO, LOG, "XXX - Starting aggregation.")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
        .completionFromBatchConsumer()
        .completionTimeout(10000)
    .log(LoggingLevel.INFO, LOG, "XXX - Aggregation completed, sending mail.");
In the logs I see:
16:02| ERROR | CamelLogger.java 156 | XXX - Error moving files
Then the logs for the Exception occurring during connection.
And then this:
16:02| ERROR | FatalFallbackErrorHandler.java 174 | Exception occurred while trying to handle previously thrown exception on exchangeId: ID-LP0641-1552662095664-0-2 using: [Pipeline[[Channel[Log(proefjes.camel_cursus.routebuilders.MoveWithPickupExceptions)[XXX - Error moving files]], Channel[sendTo(direct://aggregate)]]]].
16:02| ERROR | FatalFallbackErrorHandler.java 172 | \--> New exception on exchangeId: ID-LP0641-1552662095664-0-2
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot connect to sftp://user@mycompany.nl:22
at org.apache.camel.component.file.remote.SftpOperations.connect(SftpOperations.java:149)
I do not see "XXX - Starting aggregation." in the log, which I would expect. Does some kind of error occur before aggregation? The breakpoint on aggregate(*, *) is never reached.
First, I just want to clarify something. You write "In case the network location is not available, i want the route to notify the admin and not delete the files from sftp", but shouldn't that be obvious anyhow? I mean, if the network location is not available, wouldn't deleting the files from sftp be impossible?
It's a little confusing that your exception handler is also routing .to(AGGREGATEROUTE). Given that you want to email an admin, shouldn't that be in the exception handler, not in the happy path? Why would you and how would you "aggregate" a connection failure?
Finally, and here I think is the real problem with your implementation, you may have misunderstood what handled(false) does. Setting this to false means routing stops and the exception is propagated back to the caller. I'm not sure what the .to(AGGREGATEROUTE) would do in this case, but I'm not surprised it's not being called.
I suggest trying a few things. I don't have your full code, so I'm not sure which will work best. These are all related and any of them might work (a minimal sketch follows the references below):
Change handled(false) to handled(true).
Replace handled with continued(true).
Use a Dead Letter Channel.
Reference:
Handle and Continue Exceptions
Dead Letter Channel
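To make the difference concrete, here is a minimal sketch of those three options (my own illustration, not your actual routes; the direct: endpoint names and exception types are made up):

import org.apache.camel.LoggingLevel;
import org.apache.camel.builder.RouteBuilder;

public class ErrorHandlingSketch extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // Option 1: handled(true) - the exception is marked as dealt with, the caller
        // never sees it, and the exchange follows whatever this onException block does.
        onException(java.net.ConnectException.class)
            .maximumRedeliveries(2)
            .redeliveryDelay(1000)
            .handled(true)
            .log(LoggingLevel.ERROR, "Connect failed, notifying admin")
            .to("direct:notifyAdmin"); // stand-in for a mail route

        // Option 2: continued(true) - the exception is swallowed and the original
        // route continues from the point of failure as if nothing had happened.
        onException(java.io.IOException.class)
            .continued(true)
            .log(LoggingLevel.WARN, "Ignoring failure and continuing the route");

        // Option 3: a Dead Letter Channel as the route's error handler - exhausted
        // exchanges are sent to the dead letter endpoint instead of back to the caller.
        errorHandler(deadLetterChannel("direct:deadLetter")
            .maximumRedeliveries(2)
            .redeliveryDelay(1000));

        from("direct:start")
            .log("processing ${body}")
            .to("direct:somewhere"); // stand-in for the real work
    }
}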
Since error handling is different depending on which endpoint causes the error, I have solved this by having two different versions of onException:
//configure exception on sftp end
onException(Throwable.class)
    .maximumRedeliveries(2)
    .retryAttemptedLogLevel(LoggingLevel.WARN)
    .redeliveryDelay(1000)
    .onWhen(new hasSFTPErrorPredicate())
    // .continued(true) // tries to connect once, mails and continues to aggregation with empty exchange
    // .handled(false)  // tries to connect twice but does not reach mail
    .handled(true)      // tries to connect once, does reach mail
    // handled not defined: tries to connect twice but does not reach mail
    .log(LoggingLevel.INFO, LOG, "XXX - SFTP exception")
    .to(MAIL_ROUTE)
    .end();

// exception anywhere else
onException(Throwable.class)
    .maximumRedeliveries(2)
    .retryAttemptedLogLevel(LoggingLevel.WARN)
    .redeliveryDelay(1000)
    .log(LoggingLevel.ERROR, LOG, "XXX - Error moving file ${file:name}: ${exception}")
    .to(AGGREGATEROUTE)
    .handled(false)
    .end();
Exceptions occurring at the SFTP end are handled by the first onException, because there the hasSFTPErrorPredicate returns 'true'. All this predicate does is check whether any exception, or any of its causes, has "Cannot connect to sftp:" in the message (see the sketch below).
No rollback is required in this case because nothing has happened yet.
Any other exception is handled by the second onException.
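For reference, a predicate like that could look roughly like this (my own sketch, not the exact class used above; it just walks the exception chain looking for the connect-failure message):

import org.apache.camel.Exchange;
import org.apache.camel.Predicate;

// Matches exchanges whose caught exception (or any nested cause) is an SFTP connect failure.
public class hasSFTPErrorPredicate implements Predicate {
    @Override
    public boolean matches(Exchange exchange) {
        Throwable t = exchange.getProperty(Exchange.EXCEPTION_CAUGHT, Throwable.class);
        while (t != null) {
            String message = t.getMessage();
            if (message != null && message.contains("Cannot connect to sftp:")) {
                return true;
            }
            t = t.getCause(); // also inspect nested causes
        }
        return false;
    }
}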

Gatling active users drops to negative for a longer run after timeout exception

I'm running a simulation using Gatling: 5 users per second for 150 minutes.
After a certain exception:
15:58:19.643 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_outbound_msg' failed for user 29989
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.643 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_outbound_msg' failed for user 29989: j.n.s.SSLException: handshake timed out
15:58:19.643 [ERROR] i.g.h.c.i.DefaultHttpClient - Failed to install SslHandler
15:58:19.678 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_inbound_msg' failed for user 29984
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.678 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_inbound_msg' failed for user 29984: j.n.s.SSLException: handshake timed out
the number of active users dropped to -1, and then every time this exception happens, the number of active users drops further.
Example:
---- FacebookOutboundSimulation ------------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
---- FacebookInboundMessageSimulation ------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
================================================================================
Why does this happen, and how can I fix it?
The issue was with gatling-charts-highcharts-bundle-3.0.0-RC1; when I switched to gatling-charts-highcharts-bundle-3.0.0-RC3, released on 2 October 2018, it was resolved.
Side note: use asJson instead of asJSON.

Camel with RabbitMQ exception only occurs on second message - mis-spelt exchange name

I'm using Camel within a Spring boot application and integrate with RabbitMQ but am encountering strange behaviour.
My app has RESTful endpoints which convert the HTTP request to a RabbitMQ message and publish it to a predefined exchange. There is a separate consumer app which listens to a queue and processes the messages.
I have deliberately entered an incorrect RabbitMQ exchange name (invalidxchangename) to check that the application will fail if the exchange does not exist; however, the Camel context starts without error, and when I send in a first request it does not report any error. This message gets lost as there is no matching RabbitMQ exchange. When I submit a second request I receive the following exception, which I would have expected on route startup.
com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'invalidxchangename' in vhost
EDIT:
I've tried a more simple example to show the issue in Camel.
I've created a simple route as follows:
from("file:in?fileName=in.txt").log(LoggingLevel.DEBUG, "in here!").to("rabbitmq://localhost:5762/invalidexchange?declare=false");
where there is an existing RabbitMQ exchange called validexchange (so I have deliberately made a typo in the RabbitMQ URI). I would expect the Camel route to fail at startup since the exchange doesn't exist, or at least the first time it tries to process a new in.txt file.
What I am actually seeing in the logs is that on startup it reports no error, and only on the second invocation of the route does it report an error.
2015-03-11 16:17:04.356 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:04.360 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.073 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers: ...
2015-03-11 16:17:45.079 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.092 ERROR 9756 : Failed delivery for (MessageId: ID-SBMELW7W-06220-59960-1426051020468-0-3 on ExchangeId: ID-SBMELW7W-06220-59960-1426051020468-0-4). Exhausted after delivery attempt: 1 caught: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'customerchannel.exchang' in vhost '/', class-id=60, method-id=40)
It looks like the first request is causing an error which closes the connection and logs the reason, and when you try to use the channel the second time it's returning an AlreadyClosedException with the message that caused the channel to close in the first call.
You can test this by trying to publish the second message to a different exchange name in the same channel and checking which exchange is in the error. E.g. publish the second message to invalidxchangename2 and you should still see invalidxchangename as the exchange in the error.
To fix this, you should handle the publish result when you publish and re-establish the connection if there's an error.
If you want to be sure that a message got delivered to a RabbitMQ queue, then you have to use publisher confirms: https://www.rabbitmq.com/confirms.html
The fact that you are able to publish a message doesn't mean that the message will reach a queue. You could go to a mailbox and leave a letter inside, but between the time you left the letter there and the time a postman picked it up, many things could have happened, for example the mailbox catching fire, and so on.
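For example, with the plain RabbitMQ Java client, publisher confirms look roughly like this (a minimal sketch; the host name, exchange and routing key are placeholders):

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ConfirmedPublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder broker address
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            channel.confirmSelect(); // put the channel into publisher-confirm mode
            channel.basicPublish("customerchannel.exchange", "routing.key", null,
                    "hello".getBytes(StandardCharsets.UTF_8));
            // Blocks until the broker confirms the publish, or throws if the message is
            // nack'ed, the channel is closed (e.g. 404 for a misspelt exchange), or the
            // timeout elapses - so a bad publish surfaces here instead of on the next one.
            channel.waitForConfirmsOrDie(5000);
        }
    }
}

If I remember correctly, Camel's RabbitMQ component exposes this through its publisher-acknowledgement options; either way, the idea is the same: don't treat a publish as successful until the broker has confirmed it.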

Solr times out when I try to rebuild index for django

I am trying to build my Solr index for Django on Ubuntu for the first time with ./manage.py rebuild_index and I get the following error:
Removing all documents from your index because you said so.
Failed to clear Solr index: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
All documents removed.
Indexing 4 dishess
Failed to add documents to Solr: Connection to server 'http://localhost:8983/solr/update/?commit=true' timed out: HTTPConnectionPool(host='localhost', port=8983): Request timed out. (timeout=10)
I have access to localhost:8983/solr/ and localhost:8983/solr/admin via my web browser
You can bump up the TIMEOUT in settings.py.
For example
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8080/solr/default',
        'INCLUDE_SPELLING': True,
        'TIMEOUT': 60 * 5,
    },
}
The important thing here is that you shouldn't increase the default timeout, because it could block all your workers, as Haystack works synchronously.
The best way to avoid this is to define multiple connections for reads and writes, with different timeouts:
http://django-haystack.readthedocs.org/en/latest/settings.html#haystack-connections
And use routers for read and write separation:
http://django-haystack.readthedocs.org/en/v2.4.0/multiple_index.html#automatic-routing
