SQL Server Service Broker not sending messages, unable to start endpoint - sql-server

I had a perfectly working SQL Server Service Broker this morning, until I tested how it recovers from crashing.
I forced a system shutdown on the sender during a messaging session between servers over a network. I was sending binary messages of about 5mb size. There are automatic procedures for sending, replying and receiving messages and ending conversations from both sides in place and my setup uses certificates for security.
I am now unable to send any messages from the server side.
Both sides of the messaging chain have queues on and it does not seem like poison message handling would be causing this. The sender side accepts new messages but is not sending them.
The sender side transmission queue has messages with transmission_status
The Service Broker endpoint cannot listen for connections due to the following error: '10013(An attempt was made to access a socket in a way forbidden by its access permissions.)'.
Running ALTER ENDPOINT myendpoint STATE = STARTED returns the same error as above.
Running select * from sys.endpoints shows the endpoint with state_desc = STARTED anyhow..
Running select state_desc from [sender_database].sys.conversation_endpoints shows state_desc = CONVERSING for all results.
Running SELECT COUNT(*) FROM dbo.sender_queue returns 0.
There is no other traffic to the port my endpoint is using, at least not any that is visible with netstat or the TCPView tool. The ports have rules to allow traffic from the firewall and sqlagent and sqlsrvr processes also have extra rules to be allowed.
Using ssbdiagnose tool with ssbdiagnose -level info configuration from service... from the sender side shows a (not new) error
The route for service sender_service is classified as REMOTE. This will result in the message being forwarded.
along with some other errors about certificates that have always been there even when messaging was working. Ssbdiagnose with RUNTIME flag shows nothing at all.
Ssbdiagnose from the target side now says an exception occurs during connection. The target database also has a couple of reply messages stuck in the transmission queue with an empty transmission_status.
Edit: Seems that occasionally the status on the target side changes to the error 10060 connection failed...
What more can I do to diagnose the problem and fix it?
Edit: I tried changing the port the endpoint uses but the same error is thrown.
Edit: I am able to ping the servers from each other. Ssbdiagnose with RUNTIME option on target side says it cannot find the connection to the SQL Server that corresponds to the routing address of my sender endpoint/database.

The Service Broker endpoint cannot listen for connections due to the following error: '10013(An attempt was made to access a socket in a way forbidden by its access permissions.)'
WSAEACCESS (10013) is a rather unusual socket listen error. I never encountered it before. A quick search reveals KB3039044: Error 10013 (WSAEACCES) is returned when a second bind to a excluded port fails in Windows which is an acknowledged bug in Windows Server 2008R2, 2012 and 2012R2 when excluding a range of ports (netsh ... add excludedportrange ...). So my first question is, are you on one of the affected server OSes and are you actually using a network port exclusion range?
I strongly urge you to open a Microsoft support case for this issue and follow up with them, making sure networking guys are involved (again, WSAEACCESS is rather unusual symptom). This is not one of the usual issues and it is difficult to diagnose over forums discussion.

Related

What is the cause of a sporadic ORA-12599: TNS:cryptographic checksum mismatch?

In our production environment there are sometimes peaks of ORA-12599 errors the connection works otherwise.
The description is not quite clear to me.
Cause: The data received is not the same as the data sent.
Action:
Attempt the transaction again. If error persists, check (and correct)
the integrity of your physical connection.
https://www.oraexcel.com/database-oracle-11gR2-ORA-12599
Does that mean there is a problem with my network cable? Or any other Hardware?
Because our DBA said the error can be caused by a configuration issue on the client side.
Edit:
It started without any changes on the client or server side. And there are spikes. 80 errors in an hour and then no errors for 4 hours while the load is more or less constant
It can mean a mismatch in how the client and the server are wishing to communicate (in encrypted fashion). For example, trying to connect to an 11g database with 10g JDBC drivers will get this because the encryption implementation changed between these versions.
Check your local (ie, client) sqlnet.ora file - you may need to opt for a different method or change the order of them.
The error can be typically be ignored (its informational) but using the latest drivers will normally resolve the issue.
We had sporadic ORA-12599 errors and also ORA-24300 ("bad value for mode") in a Python web app using Flask and uwsgi. It turned out to be caused by multiple worker processes writing and reading to/from the same TCP connection to the DB, giving chaotic communications on the socket.
In our setup this was caused by creating the TCP connections to the DB before uwsgi forked the workers, and the file descriptors being inherited. We fixed this by using uwgsi's lazy-apps mode.
See also this question.

The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems

I have hosted my WebApp on server 1 and my database on server 2
But I'm getting following error
Communication with the underlying transaction manager has failed.
I googled and found a post which mentioned that it is the issue of DTC(Distributed Transaction)
I enabled DTC on server2(DB server) and made an exception of it in Firewall.
But still same error.
Here is the full stack trace
Message: System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02B)
at System.Transactions.Oletx.IDtcProxyShimFactory.ReceiveTransaction(UInt32 propgationTokenSize, Byte[] propgationToken, IntPtr managedIdentifier, Guid& transactionIdentifier, OletxTransactionIsolationLevel& isolationLevel, ITransactionShim& transactionShim)
at System.Transactions.TransactionInterop.GetOletxTransactionFromTransmitterPropigationToken(Byte[] propagationToken)
Kindly advice
We had the exact same situation, and more than once. Each time, it was one of the following:
The IP address in the DNS for the server is outdated (as said in error message: "two machines cannot find each other by their NetBIOS names"). You can check if this is the case by trying ping servername from one server to another in the command prompt. If the ping by name fails and ping by IP succeeds (or ping by name returns the wrong IP), than you should talk to the System Admins to take a look at DNS/DHCP.
The servers are created as an image of preconfigured server (for example, if you are working with virtual machines, and instead of doing a fresh install for each of the servers, you simply clone the image). This is a problem because DTC has an internal "Identifier" - and in case of image cloning both your installations now have same DTC ID, and won't be able to communicate with each other. The solution is to simply uninstall and install the DTC again.
Hope it helps.
Things to check:
Have you done this configuration on both servers?
Are both servers members of the same domain?
Have you checked the event log?
I had the same problem while connecting to a remote SQl Server.
The solution in my case was to add "enlist=false" to the connection string.
I was missing quite a lot of things:
No authentication (as DB server and APP server and not within same AD domain)
Rule to Windows Firewall enabling msdtc.exe
Rule to firewall between DMZ and internal zone TCP 135,1024-65535 in both directions. The link tell you how to restrict the firewall policy to few ports only.
short / long server names to hosts or a shared DNS server. Eg. 192.168.1.1 app1 as well as 192.168.1.1 app1.domain.local
On the other hand based on this link my setup doesn't require:
Allow Remote Clients
Allow Remote Administration
Enable XA Transactions (required prior Windows Server 2003 SP1)
Solved after adding remote IP\machine name to files on server:
hosts, lmhosts
in folder
C:\Windows\System32\drivers\etc
One of our servers displayed this error after the Virtual Machine (VM) controlling our Domain Controller froze. Several related communication problems also started to pop up (like failed password resets). Resetting the frozen VM fixed the issue.
Lots of helpful answers already given.
One problem for me was the presence of invalid (cyrillic) characters in the computer name.
And there is also a way to validate the connection between two servers (or between a server and a computer) using a small tool from Microsoft called DTCPing.

SYN receives RST,ACK very frequently

Hi Socket Programming experts,
I am writing a proxy server on Linux for SQL server 2005/2008 running on Windows.
The proxy is coded using bsd sockets and in C, and it is working fine with a problem described below.
When I use a database client (written in JAVA, and running on a Linux box) to fire queries (with a concurrency of 100 or more) directly to the Database server, not experiencing connection resets. But through my proxy I am experiencing many connection resets.
Digging deeper I came to know that connection from 'DB client' to 'Proxy' always succeeds
but when the 'Proxy' tries to connect to the DB server the connection fails, due to the SYN packet getting RST,ACK.
That was to give some background. The question is :
Why does sometimes SYN receives RST,ACK?
DB client(linux) to Server(windows) ----> Works fine
DB client(linux) to Proxy(Linux) to Server(windows) -----> problematic
I am aware that this can happen in "connection refused" case but this definitely is not that one. SYN flooding might be another scenario, but that does not explain fine behavior while firing to Server directly.
I am suspecting some socket option setting may be required, that the client does before connecting and my proxy does not. Please put some light on this. Any help (links or pointers) is most appreciated.
Additional info:
Wrote a C client that does concurrent connections, which takes concurrency as an argument. Here are my observations:
-> At 5000 concurrency and above, some connects failed with 'connection refused'.
-> Below 2000, it works fine.
But the actual problem is observed even at a concurrency of 100 or more.
Note: The problem is time dependent sometimes it never comes at all and sometimes it is very frequent and DB client (directly to server) works fine at all times .
SQL Server needs worker threads to accept incoming connections. If your server is worker starved (which can be easily diagnosed by a high number of entries in sys.dm_os_tasks in PENDING state) then attempting to open new connection will fail. So likely what I suspect it happens is that you're pushing to the server more workload that it can handle. You need to optimize the workload or get a beefier server.
Clients like Java client make effective use of connection pooling and, even under a high load, do not need to open new connection hence you do not see this problem, instead you see only delays in requests completion.
A listening socket keeps queue of established connections and connections in establishment process (e.g. SYN got, SYNACK replied, but not ACK from client yet). If a established queue overflows, IP stack reaction differs on OS. The most traditional approach was to ignore newcoming SYNs, waiting when userland accept()s and frees a slot in the queue. With SYN flooding attacks in mid-90s, the new method was invented named "SYN cookies" which drops need for establishment queue totally, in cost of need to support special TCP option.
OTOH I have heard that Windows stacks change their behavior - under some conditions, the reaction to queue overflow is RST response. In earlier stacks (e.g. Win95) this was the main response and client side was correspondingly changed to ignore RST response to SYN:(
That's why I guess that some proxy host feature triggers RST in Windows stack.
Another guess is that DB server closes listening socket at all under some condition (e.g. detected overload peak) which appears only with proxy.
When the SYN receives the RST response, it should not be the problem of SQL-SERVER.
Because the application is able to accept the socket only after the tcp handshake finished.
Is there any device between the Proxy and the SQL-Server machines?
Try to make sure that the RST response come from sql-server machine.
You connection count is far from SYN-FLOOD, I think.

General network error after a night of inactivity

For some time now our flagship application has been having mysterious errors. The error message is the generic
[DBNETLIB][ConnectionWrite (send()).]General network error. Check your network documentation.
This is reliably reproduced by leaving the app open for the night and resuming work in the morning. Since it's a backend server app this is a normal scenario.
The funny thing is - we've migrated from SQL Server 7 to 2000 to 2008 and the issue is present on all of them. But what seems to matter is the OS on which we run the app. On WinXP it works fine, on Vista/7 it fails. So the problem is at the client end.
The results of Google on the error message cover a very wide spectrum of different causes (since this is a very generic error) and none of the scenarios found there are similar to ours.
So perhaps someone around here will know what the problem is in our case?
You should be able to reproduce this error condition on demand by:
1. Opening a database connection (in your client application)
2. Unplugging the network cable
3. Plugging network cable back in (wait until the network connection is restored)
4. Using the previously opened connection to query the database
As far as I can tell from experience, client side ADO code is not able to consistently determine if an underlying network connection is actually valid or not. Checking if the database connection is open (in the client code) returns true. However, performing any operations on that connection results in a General network error.
The connection pool appears to be able to determine when a connection goes 'bad' so it never returns a bad connection to the application. It simply opens a new connection instead.
So, if a database connection is kept alive for a long time (used or unused) by the application, the underlying TCP/IP connectivity can get broken.
The bottom line is that database connections should be closed and returned back to the connection pool when not in use.
Edit
Also, depending on the number of clients connecting to the db, not using the connection pool can cause another issue. You may hit the maximum number of sockets open on the server side. This is from memory. Once a connection is closed on the client side, the connection on the server goes into a TIME_WAIT state. By default, the server socket takes about 4 minutes to close, so it is not available to other clients during that time. The bottom line is that there is a limited number of available sockets on the server. Keeping too many connections open can create a problem.
One project I worked on easily hit this socket limit with around 120 users. A new 'feature' was added that absolutely hammered the server, and after a few hours of using the app, things would suddenly slow to a crawl for everyone. SQL server was not closing enough sockets in time for new connection requests. Although there are 65K sockets altogether, only the first 5000 are made available to the ADO (this is a default registry setting thing, so can be changed).
The number of sockets in TIME_WAIT state would slowly build up until the OS would not allocate any more. So clients had to wait until server side sockets closed and a new connection could then be created.
Have you tried disabling SNP/TCP Chimneying?
Had a similar error. For me it was indirectly caused by mismatched calls to WSACleanup and WSAStartup.
The program called WSACleanup more times than WSAStartup. This would cause a reference counter (somewhere in the sockets library) to reach zero too early.
I think effectively from that moment on all sockets owned by the process are broken.
And this would also kill the SQL client since it uses sockets to 'talk' to the SQL server as well.

SQL Server Message Broker - External Activation

I have a Sql Server inside a restricted network. I need to somehow get data from the outside in.
I would like to harness the use of Message Broker. My thinking is the external db places a message on a queue then I require a service that sits inside of the restricted LAN to listen (poll?) for these messages and then act upon them.
I cannot have the external queue initiate the normal broker conversation into the restricted LAN.
My question is should I be looking at the broker external activator to sit inside the restricted LAN and listen for new messages and then act upon them? Has anyone got any experience with this. Documentation / examples for external activator are pretty thin on the ground and monologues are not supported in message broker yet.
Is msmq a better option?
My recommendation would be to allow Service Broker to deliver the message all the way into the SQL Server instance inside the restricted lan. That will require the restricted LAN to allow incomming connection (allow the inside server to listen and accept). MSMQ would be no different, the MSMQ port(s) would have to be open in the restricted LAN.
If you want to use a dedicated process inside the restricted LAN that 'gets' the data inside then you must ensure the transactional consistency between the external server 'get' and the internal server write: the two operation have to be enrolled into a distributed transaction and the DTC protocol itself needs to be allowed to penetrate into the restricted LAN. So some ports still need to be open in the restricted LAN.
What your LAN security designers need to understand is that Service Broker connections are not Transact-SQL connections. Service Broker uses a dedicated protocol that only allows exchange of Service Broker messages. All traffic is encrypted and secured with RC4 or AES encryption. SSB cryptography is FIPS compliant. Allowing for Service Broker traffic to the SQL Server inside is probably the most secure way of allowing data from the external server to reach the secured server. In Service Broker networking there is no concept of 'client' and 'server' and one cannot design the network allowing connections only in one dirrection (eg. unlike say HTTP, which can be designed to connect from inside to outside but not the other way). SSB networking requires both machines involved to be able to connect to each other, because response messages can come after long delays (hours, days, consider the case when a queue is backed up so it takes a long time until the message is processed and a response is sent). IS not feasable to keep connecitons open for days to expect a response, so the receiver of a message must be able to connect back to the sender to deliver a response.

Resources