Connection timeout Feign client while consuming huge data - hystrix

I have a FeignClient which is configured to connect to 2 different services via Eureka service discovery.
It works perfectly fine for Service A but fails for Service B, which reads HBase data and takes nearly 5 minutes to complete. While calling Service B, the client throws
java.net.ConnectException: Connection timed out: connect.
While debugging, I found that Service B was called and proceeded with its processing, but the service that issued the Feign request throws a timeout exception.
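For illustration, here is a minimal sketch of the two separate timeouts involved in a call like this: one for establishing the connection and one for waiting on the response. It uses the JDK's own java.net.http client as a stand-in rather than Feign (Feign likewise distinguishes a connect timeout from a read timeout). The class name, URL and durations are hypothetical and only assume a response that takes about 5 minutes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutSketch {
    // Assumed values for illustration only; tune them to the real service.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(10);
    static final Duration RESPONSE_TIMEOUT = Duration.ofMinutes(6);

    // The connect timeout bounds how long establishing the connection may take.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // The request timeout bounds how long the client waits for the response,
    // so it has to exceed the slow service's processing time.
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(RESPONSE_TIMEOUT)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newClient();
        // Hypothetical URL; nothing is sent, the settings are only printed.
        HttpRequest request = newRequest("http://service-b.example/report");
        System.out.println("connect timeout: " + client.connectTimeout().orElseThrow());
        System.out.println("response timeout: " + request.timeout().orElseThrow());
    }
}
```

A "Connection timed out: connect" message points at the first of these two phases, so raising only the response timeout may not change that particular error.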

Related

ASP.NET Core 3.1 API on AWS - Connection Timeout Expired intermittently

I have an ASP.NET Core 3.1 API hosted in AWS (on IIS 10, with SQL Server on a separate box). More frequently than I would like, I am getting the error below (intermittently).
Error : Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=21020; handshake=0;
Here is the connection string:
"ConDB": "Server=SqlServerName;Database=Test;user id=User;password=xyz;MultipleActiveResultSets=true;Application Name=AppName;TrustServerCertificate=true;"
If I add connect timeout=240;, I don't get the error but intermittently the connection to database takes a long time to establish before I see the API call results.
I have other .NET framework based projects on the same AWS setup, they all run fine.
Any help appreciated.

Every 1 in 30 connections I get Win32Exception: Unknown location error. Azure web app to AWS SQL DB

We have a couple of .NET Core 3.0 Web Apps (UK South) that connect to a MS SQL 2016 database which is running on an Amazon Windows Server 2016 Datacenter (EC2 instance). We connect via an Azure Relay/Hybrid Connection which is installed on the SQL Server.
It has been working fine for over a year with no errors, but recently we've started getting the following error, about 1 in every 30 connections:
An unhandled exception occurred while processing the request.
Win32Exception: An existing connection was forcibly closed by the remote host.
Unknown location
SqlException: A connection was successfully established with the server, but
then an error occurred during the pre-login handshake. (provider: TCP Provider,
error: 0 - An existing connection was forcibly closed by the remote host.)
If you try again it usually works.
After reading a lot of posts on this I added transient error handling to the code/resilience using EnableRetryOnFailure() to the DB connection.
I also tried adding Trusted_Connection=False to the connection string.
After this you could see the connection retrying multiple times until it worked, sometimes taking 20 seconds or more. Still, maybe 1 in 100 connections eventually fails with the same error.
We also looked at the TLS_DHE bug https://learn.microsoft.com/en-us/troubleshoot/windows-server/identity/apps-forcibly-closed-tls-connection-errors but the TLS_DHE ciphers are not installed on the server at all.
There's nothing in the event logs on the Windows server, or in the database logs at the time of the error.
Recent changes in the infrastructure: Panda antivirus, moved web apps to a different Azure region.
I've been reading posts on this for days now, mostly really old and slightly different. I'm looking for any ideas of things to try to pinpoint the error. Thanks.
edit: I found some event logs in Microsoft/ServiceBus/Client
HybridConnectionManager Trace: Microsoft.Azure.Relay.RelayException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.WebSockets.WebSocketException: An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
--- End of inner exception stack trace ---
Well, this took three months to resolve and it involved our network support team, AWS support, and Azure support.
I've come back three times to edit this answer. The problem returned on a different server, so we tried the fixes that had worked on the first one and they didn't work!
In Azure Relay/Hybrid Connections, under the connection in question, we saw there were TWO listeners when there should only be one. Each Hybrid Connection Manager you install and connect shows up there as a listener.
So where was the second listener? Nowhere. It seemed to be a hanging orphan link from a previously deleted connection.
The only way to delete the phantom listener was to:
1. Uninstall HCM on the database server
2. Remove the connection from all Azure apps using it
3. Delete the hybrid connection completely in Azure
4. Recreate the connection in Azure afresh
5. Reconnect the apps
6. Reinstall HCM on the database server
7. Connect HCM to the new hybrid connection
After this, Azure showed only one listener under the connection, and things worked immediately.
When you have two listeners the data is load balanced between them, so in my case half the time the data was being routed to a non-existent listener and failing. This is why no logs appeared on the database server - it wasn't getting there at all!

Intermittent Read Timeouts to SQL Server DB - com.microsoft.sqlserver.jdbc.SQLServerException: Read timed out

I have a software service running via Tomcat which performs some analytics on data from my SQLServer database using the SQLServer JDBC driver.
Around 50% of the time when running SQL queries I receive the error:
com.microsoft.sqlserver.jdbc.SQLServerException: Read timed out
I have set the connectiontimeout property in the connection strings that the software uses to connect to my database. I have also set the connection timeout within the software itself in the settings. This has no effect on the error: I set the connection timeout to 120 seconds and it still throws a read timeout within 30.
The intermittent nature of the timeouts is very strange. I'll get this error for a few minutes, and then it will work for a few minutes, refreshing easily.
Is anyone familiar with the error? Are there any other settings I could add to the connection string to try to resolve the read timeout, as it seems to be a different thing from the connection timeout?
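As a hedged sketch of the distinction asked about here: the Microsoft JDBC driver accepts separate connection properties for logging in (loginTimeout, in seconds), for socket reads (socketTimeout, in milliseconds) and for statement execution (queryTimeout, in seconds). The helper below only assembles a connection URL with those properties. The class name, host, database name and values are hypothetical, and nothing in the question confirms that these properties are the source of the 30-second cutoff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SqlServerUrlSketch {
    // Builds a jdbc:sqlserver URL and appends each property as ;name=value.
    static String buildUrl(String host, int port, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
                .append(host).append(':').append(port)
                .append(";databaseName=").append(database);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    // Assumed values: allow 120 seconds for each phase.
    static Map<String, String> timeoutProps() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("loginTimeout", "120");     // seconds allowed to establish the connection
        props.put("socketTimeout", "120000"); // milliseconds allowed for a socket read
        props.put("queryTimeout", "120");     // seconds allowed for a statement to run
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical host and database names.
        System.out.println(buildUrl("dbhost", 1433, "analytics", timeoutProps()));
    }
}
```

If the application sits behind a connection pool or a framework, that layer may apply its own read or query timeout on top of the driver's, which would be worth checking as well.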

Sql Server Services not restarting - The request failed or the service did not respond in a timely fashion

I tried to establish a remote connection to my SQL Server 2014 instance on a Windows Server 2008 machine. To do this I enabled TCP/IP and added port 1433. The prompt asked me to restart my services for the changes to take effect. However, on trying to restart the services, I get the error message below:
The request failed or the service did not respond in a timely fashion. Consult the event log or other applicable error logs for details.
The steps I have tried -
1. Tried disabling VIA
2. Tried replacing Master and MastLog files from TemplateData to Data
3. Tried disabling TCP/IP
4. Tried starting the services with user as local system
I tried the above based on similar questions on Stack Overflow, and nothing seems to work.

IIS + Kerberos + SQL Server + EF Initial connection failure

I have a web server on my domain, and I'm trying to use Kerberos delegation to give it access to my SQL Server. They are all Server 2008 R2 servers with IIS 7.5 and SQL 2008 R2 (the DC is also Server 2008 R2).
Everything is working, in that I see transactions being executed on my SQL Server under the user's account. However, the first time I access the site after an extended period of time (30 mins or so) I get the following error thrown by my EF DataContext object:
Exception: The underlying provider failed on Open
at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf...
Inner Exception: A network-related or instance-specific error occurred while
establishing a connection to SQL Server. The server was not found or was not
accessible. Verify that the instance name is correct and that SQL Server is
configured to allow remote connections. (provider: Named Pipes Provider,
error: 40 - Could not open a connection to SQL Server)
Inner Inner Exception: The system cannot find the file specified
The error page takes ~20 to 30 seconds to be served. After receiving this error, if I hit refresh in my browser, I get the page with all of the data almost instantly (around 200ms)
What would be causing this initial connection to fail, but all subsequent connections to succeed?
Misc information:
EF 6.0
IIS 7.5, Windows Auth & ASP.NET Impersonation enabled, Extended Protection Off, Kernel-mode auth Off, Providers - Negotiate:Kerberos
AppPool uses service account (all SPNs are registered to that account)
If there is any more information that you need, let me know and I'll update this list!
UPDATE:
After doing several network traces, I'm seeing the following pattern:
HTTP Request 1
6 frames of KerberosV5 traffic
HTTP Response: No SQL Data
HTTP Request 2
2 frames of KerberosV5 traffic
TDS Prelogin
TDS Response
2 more frames KerberosV5 traffic (TGS MSSQLSvc request and response)
6 frames of TDS Traffic (SQL Data)
HTTP Response: Success!!
I'm thinking this is a kerberos issue...
I can't really tell what is causing your issue, but here is a tip on how you can deal with it, just in case you don't manage to find the cause:
EF CodePlex Link on Connection Resiliency
MSDN article on Connection Resiliency
This is a feature introduced with Entity Framework 6.x. By default, when EF encounters an issue like the one you've described, it throws an exception, and if you want to retry you must write quite messy code and duplicate it everywhere.
With Connection Resiliency, you can write a DbExecutionStrategy that suits you best. DbExecutionStrategy has a method you can override to decide whether the query should be executed again when a specific exception type occurs. To the executing code and the end user, this just looks like a slight delay in execution; no error appears.
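To make the idea concrete, here is a minimal, language-neutral sketch of such a strategy, written in Java rather than C# and not using EF's actual API. The class name, the predicate playing the role of the overridable "should retry" decision, and the retry count and delay are all assumptions for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetryStrategySketch {
    private final int maxRetries;
    private final long delayMillis;
    // Decides whether a given exception is worth retrying (the overridable part).
    private final Predicate<Exception> shouldRetryOn;

    public RetryStrategySketch(int maxRetries, long delayMillis, Predicate<Exception> shouldRetryOn) {
        this.maxRetries = maxRetries;
        this.delayMillis = delayMillis;
        this.shouldRetryOn = shouldRetryOn;
    }

    // Runs the operation, retrying only on exceptions the predicate accepts.
    public <T> T execute(Callable<T> operation) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up when retries are exhausted or the error is not transient.
                if (attempt >= maxRetries || !shouldRetryOn.test(e)) {
                    throw e;
                }
                attempt++;
                Thread.sleep(delayMillis); // fixed delay between attempts
            }
        }
    }

    public static void main(String[] args) throws Exception {
        RetryStrategySketch strategy =
                new RetryStrategySketch(3, 100, e -> e instanceof IOException);
        int[] calls = {0};
        // Simulates a first attempt that fails and a second one that succeeds.
        String result = strategy.execute(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("simulated first-connection failure");
            }
            return "data";
        });
        System.out.println(result + " after " + calls[0] + " call(s)");
    }
}
```

The caller sees only the final result or the final exception; the intermediate failure shows up as extra latency, which matches the "slight delay" described above.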
From my personal experience, what you're seeing can be caused by many things, including some setting at your hosting provider (if you're not hosting on premises). I'd look into the SQL logs or Event Viewer to see whether SQL Server is, for some reason, going into a state where it is unavailable.
