DB2 Communication Error - database

We recently developed an application that runs a query against DB2 and sends an email to the corresponding recipient. It works well on our local systems and in the QA region, but in production a few queries fail. It is rare (about once a week), and when it happens the application throws the exception below.
Exception InnerDetails:
ERROR [40003] [IBM][CLI Driver] SQL30081N A communication error has
been detected. Communication protocol being used: "TCP/IP".
Communication API being used: "SOCKETS". Location where the error was
detected: "111.111.111.111". Communication function detecting the
error: "recv". Protocol specific error code(s): "10004", "", "".
SQLSTATE=08001
Since the error occurs only in production, and not very often, we are not sure whether it is a code issue or a configuration issue. Do you have any idea?

We recently discussed this issue with our IBM rep. After looking in their internal knowledge base, he suggested we add "Interrupt=0" to our connection string, based on recommendations given to other customers that had the same problem.
The default value for Interrupt was 1 before v10.5 FP2 and still is for most connections. They changed the default value to 2 for connections to z/OS (mainframe) in FP2.
We're using C#; the connection string properties for the IBM Data Server Driver for .NET are listed in the IBM documentation. I'm sure there is a similar property in their drivers for other languages.
The IBM docs also go into a bit more detail about the Interrupt setting.
We haven't seen the issue since we recently added the property, but it was always intermittent so I can't yet confidently say that the problem is fixed. Time will tell...
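As a minimal sketch of the change described above, the connection string could be assembled as follows. Only the `Interrupt=0` keyword comes from this answer; the server, database, and credential values are placeholders, and the `Server`/`Database`/`UID`/`PWD` keywords are assumed to match what your driver version expects.

```csharp
using System;
using System.Data.Common;

public static class Db2ConnectionStringSketch
{
    // Builds a connection string that includes Interrupt=0.
    // All argument values are hypothetical placeholders, not values from this thread.
    public static string Build(string server, string database, string user, string password)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Server"] = server;
        builder["Database"] = database;
        builder["UID"] = user;
        builder["PWD"] = password;
        builder["Interrupt"] = 0; // the setting suggested by the IBM rep
        return builder.ConnectionString;
    }

    public static void Main()
    {
        Console.WriteLine(Build("dbhost.example.com:50000", "SAMPLE", "appuser", "secret"));
    }
}
```

The resulting string would then be passed to the DB2 connection object in place of the existing connection string.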

That particular error (SQL30081N) is just a generic message that indicates a network issue between your DB2 client and the server. In this case, you want to look at the Protocol specific error code(s). Here, it looks like you're on Windows, and that particular code (10004) isn't given in the IBM documentation.
So, if you google "windows network error codes", you'll find this page, which says:
WSAEINTR
10004
Interrupted function call.
A blocking operation was interrupted by a call to WSACancelBlockingCall.
Which links to this page with more information on that specific function (emphasis mine):
The WSACancelBlockingCall function has been removed in compliance
with the Windows Sockets 2 specification, revision 2.2.0.
The function is not exported directly by WS2_32.DLL and Windows
Sockets 2 applications should not use this function. Windows Sockets
1.1 applications that call this function are still supported through the WINSOCK.DLL and WSOCK32.DLL.
Blocking hooks are generally used to keep a single-threaded GUI
application responsive during calls to blocking functions. Instead of
using blocking hooks, an applications should use a separate thread
(separate from the main GUI thread) for network activity.
I'm guessing that your application may be blocking for longer in production than in your other environments, and something along the way is causing the interrupt.
Hopefully this leads you down the right path...
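For logging or triage, the lookup described above can be automated. The following is a rough sketch, not part of any IBM API: it pulls the first protocol-specific code out of the SQL30081N message text and maps it to the matching `System.Net.Sockets.SocketError` name. The class and method names are made up for illustration, and the regular expression assumes the message layout shown in the question.

```csharp
using System;
using System.Net.Sockets;
using System.Text.RegularExpressions;

public static class Sql30081nSketch
{
    // Returns the first protocol-specific error code in an SQL30081N message, or null.
    public static int? FirstProtocolErrorCode(string message)
    {
        // Collapse line breaks first, since the driver wraps the message text.
        string flat = Regex.Replace(message, @"\s+", " ");
        Match m = Regex.Match(flat, @"Protocol specific error code\(s\):\s*""(\d+)""");
        return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
    }

    // Maps a Winsock code to the matching SocketError name, if one is defined.
    public static string Describe(int code)
    {
        return Enum.IsDefined(typeof(SocketError), code)
            ? ((SocketError)code).ToString()
            : "unknown (" + code + ")";
    }
}
```

With the message from the question, `FirstProtocolErrorCode` yields 10004, which `Describe` reports as `Interrupted`, the .NET name for WSAEINTR.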

I spent hours on the same problem and finally fixed it. I use a Windows executable (developed in C#/.NET) to run a SELECT query against a DB2 database, and I sometimes got this error. Eventually I realized that my problem was a timeout. The error with protocol code "10004" sometimes occurs when query execution takes longer than 30 seconds, which is the default timeout value. Maybe the interrupted call described on the "Windows Socket Error Codes" page is triggered by the timeout mechanism. I added a line to set an acceptable timeout value and got rid of this annoying error. I hope it helps others.
Here is my code fix:
...
connDb.Open();
DB2Command cmdDb = new DB2Command(QueryText, connDb);
cmdDb.CommandTimeout = 300; // Added: raise the timeout (in seconds) above the 30-second default.
using (DB2DataReader readerDb = cmdDb.ExecuteReader())
{
...

Related

How to wait for Windows TCP Network at startup?

I know this should be obvious, but I have found far too many DIFFERENT answers and the ones I've tried all fail (sometimes or all the time), so...
We are working on a service and some applications that run at startup on a Windows 10 computer that performs an automatic login. The service and applications require Windows sockets for TCP, UDP and multicast. Most of the time, our programs fail because they get errors about the network not being ready. Currently, we work around this by adding a dumb, fixed-length delay before attempting to start, but we would prefer to start as soon as the network is ready to be used.
Our most recent attempt was to wait on the LanmanWorkstation (Workstation) service, but that generally reports it is running/ready before the sockets functions will succeed. I have also seen suggestions to use LanmanServer (Server) or Netman (Network Connections) or maybe even Tcpip (TCP/IP Protocol Driver), but I cannot find anything definitive. One would think this is a common requirement, so why would Microsoft make the info so difficult to find?
Ahem. Does anyone know a definitive method for a service or application to wait until Winsock functions will succeed before using them? Short of a spin wait on a failing Winsock function, of course!

Tibco RV C client stops receiving messages

I've got a C client listening to Tibco RV (using 8.4.0). The source pumps out messages on PREFIX1.* and PREFIX2.* pretty frequently (can be several times per second).
I have six threads, each listening to a particular SUFFIX, eg PREFIX1.SUFFIX_A and PREFIX2.SUFFIX_A. So each thread has a listener and its own queue for these two messages. I've got a queue size limit of 1000, dropping the oldest 200 if we hit that (but never have more than about 40 in the queue at busy times).
After running fine for many hours, each day the program suddenly stops receiving data. The source continues to publish but I no longer dispatch events from any queue. I don't understand what can have caused this (aside from deleting the listeners).
What might have caused the listening to stop? Or alternatively, given the system is high frequency how can this be investigated? Can I tell whether a listener is still active via the C interface? I couldn't see anything in the API for that.
Thanks for any help,
-Dave
It looks like the problem was that the machine had only a partial install of RV. In particular, there was no rv daemon in the package that we had for that machine. I'm actually a bit confused how we managed to get network data at all after re-reading the docs but it seems that without a daemon we can achieve networking until a minor network problem, then nothing; with the daemon we recover from network errors.
So the fix for this case was simply to install the full package and ensure the daemon runs constantly. Now the problem appears to have disappeared.

freeipmi - ipmimonitoring_sensors returning internal ipmi error

I am executing the ipmimonitoring-sensors.c example provided in the freeipmi library.
It sometimes throws an internal error. The issue is reproducible when I execute the program back to back a couple of times; I need to wait approximately 30 seconds after the last execution for the program to run properly. Has anyone faced this issue before? If so, can you tell me how to avoid it?
This is the error: ipmi_monitoring_sensor_readings_by_record_id: internal error
Thanks
FreeIPMI maintainer here. The "internal error" indicates some logical error that the library doesn't know how to handle. Given that it's coming from ipmi_monitoring_sensor_readings_by_record_id and it occurs when you run the program back to back, I would bet there is some internal IPMI issue on your system.
Perhaps the motherboard has some issue with a high amount of IPMI traffic or a sensor has issues with a high number of requests. Many of these situations are handled more gracefully (perhaps give a BUSY error or minimally SYSTEM error), but perhaps there is some combo of error situations I haven't yet seen. (Lots of motherboards return errors that would be considered non-standard or unexpected).
If you're interested in working through that, just send something onto the FreeIPMI mailing list.
Set the driver_type = -1 (for default) and it works.

dbus: ConnectProfile method: error host is down

Actually I'm using D-Feet (D-Feet can be used to inspect D-Bus interfaces of running programs and invoke methods on those interfaces) to connect to a BLE peripheral that advertises proximity profile.
When I try the Connect() method on the remote object /org/bluez/hci0/dev_88_6B_0F_00_C4_3A, everything is fine and the connection succeeds. But when I try to connect only the proximity profile using the ConnectProfile("0x1802") method, an error occurs saying that the host is down:
g-io-error-quark: GDBus.Error:org.bluez.Error.Failed: Host is down (36)
Can anyone help me solve this problem? (I've been blocked for 2 weeks and there is still too much to deal with in the project :/)
ConnectProfile("0x1802")
ConnectProfile (and the Bluez API in general) does not deal with handles, only UUIDs. Your input argument does not look like a UUID: I suggest you find the remote service UUID that matches the handle (I'm assuming your current input argument is a handle).
I believe you can find the UUID with d-feet (after Connect() the service objects should be there) or with bluez command line tools.
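Whether "0x1802" is a handle or a 16-bit assigned number cannot be determined from the question alone. If it turns out to be a 16-bit Bluetooth SIG UUID, its full 128-bit string form is obtained by placing it into the Bluetooth Base UUID, as sketched below. The helper name is made up for illustration and is not part of BlueZ.

```csharp
using System;

public static class BluetoothUuidSketch
{
    // Expands a 16-bit Bluetooth SIG UUID into its 128-bit string form
    // using the Bluetooth Base UUID 00000000-0000-1000-8000-00805F9B34FB.
    public static string FromShortUuid(ushort shortUuid)
    {
        return string.Format("0000{0:x4}-0000-1000-8000-00805f9b34fb", shortUuid);
    }

    public static void Main()
    {
        // Prints 00001802-0000-1000-8000-00805f9b34fb
        Console.WriteLine(FromShortUuid(0x1802));
    }
}
```

A string in this form is what a UUID-based call such as ConnectProfile would be given; the actual service UUIDs exposed by the device should still be confirmed with d-feet or the BlueZ tools, as suggested above.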

Windows socket error code 10055

I've developed an app that uses sockets on Windows. It works perfectly at first, but after some time the internet connection begins to fail and finally I get this error (10055), which means that my app has run out of buffer space.
As far as I know, the code I wrote myself only uses 2 sockets, but I'm also using a third-party library and I have no idea how it is implemented.
I've read that there is a lot of literature about this problem, so I am not the only one who suffers from it, but I cannot work out how to solve it, or at least work around it. When it fails, my computer loses its internet connection. I've tried catching this error and, when it occurs, calling WSACleanup() and WSAStartup(), even though that's not best practice, but my app still gets stuck in this error.
Any advice will be pretty much appreciated.
Usually this happens when you don't close your sockets properly. Make sure you call both shutdown and closesocket when you want to close a socket (http://msdn.microsoft.com/en-us/library/windows/desktop/ms741394(v=vs.85).aspx). From MSDN: "Note To assure that all data is sent and received on a connection, an application should call shutdown before calling closesocket"
Before you bind the socket, you can set SO_REUSEADDR with setsockopt, which "Allows the socket to be bound to an address that is already in use" (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx)
Finally, look at this blog - http://blogs.technet.com/b/yongrhee/archive/2011/12/19/how-to-troubleshoot-a-handle-leak.aspx
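In .NET terms, the shutdown-then-close order recommended above might look like the sketch below. `Socket.Shutdown` and `Socket.Close` are the managed counterparts of the Winsock shutdown and closesocket calls; the helper name `CloseGracefully` is made up for illustration, and the question does not say which language the app is written in.

```csharp
using System;
using System.Net.Sockets;

public static class GracefulCloseSketch
{
    // Shuts down both directions, then releases the handle even if Shutdown throws.
    public static void CloseGracefully(Socket socket)
    {
        try
        {
            if (socket.Connected)
            {
                socket.Shutdown(SocketShutdown.Both);
            }
        }
        catch (SocketException)
        {
            // The peer may already have reset the connection; still close below.
        }
        finally
        {
            socket.Close();
        }
    }
}
```

Putting the close in a `finally` block means the handle is released on every path, which is the property that matters when hunting a socket leak.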
You have one or more resource leaks in your application.
Without the code I can only give general recommendations.
I recommend that you run Valgrind or similar tools to help you find the resource leak.
Another way is by reviewing the code.
If the leak started recently you can probably find it by reviewing just recent changes.
MSDN has an article on how to locate memory leaks using Visual Studio. (Remember to choose your version of Visual Studio on the linked page).
One cause of this error in Windows is the exhaustion of the ephemeral TCP ports pool.
It's easy to reproduce this error: just create a program that loops in binding port 0.
Very soon this error will happen.
When we pass a 0 to the bind socket function, Windows chooses an ephemeral port to use.
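A small sketch of the port 0 behaviour described above: binding to port 0 and reading back `LocalEndPoint` shows which ephemeral port the OS picked. The helper name is made up for illustration. Here the socket is disposed right away, so the port is returned to the pool; a loop that binds like this without ever closing the sockets is the kind of program that would drain it.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class EphemeralPortSketch
{
    // Binds to port 0 so the OS assigns an ephemeral port, and returns that port.
    public static int GetEphemeralPort()
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            socket.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            return ((IPEndPoint)socket.LocalEndPoint).Port;
        } // Disposing the socket releases the port again.
    }

    public static void Main()
    {
        Console.WriteLine("OS-assigned port: " + GetEphemeralPort());
    }
}
```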
