I have a Silverlight 4 application that uses the ClientHttp stack to make a WebRequest, and the server responds with a binary stream. I then read from this stream and process it. However, I have the following problem: the server buffers the data it sends, so the transfer goes send-pause-send-pause-send...
Sometimes the server pauses a little longer (around 20 seconds), at which point the connection seems to break. I don't get any exception in Silverlight; to my code it looks like the read from the web response stream finished normally (i.e. no more data). However, the server did not actually send all of its data (I can verify this with a non-Silverlight application, which receives more data after that pause). I suspect a timeout issue (which, from what I've read, can't be set explicitly in Silverlight), but it's odd that I don't get an exception indicating the timeout. Also, the pause is not that long; I would expect 20 seconds to be a reasonable wait.
I've also looked at the TCP traffic, and it looks like Silverlight sends a FIN to the server after the pause. So it seems to time out and close the connection, but it doesn't report the timeout as an exception or give me any way to avoid it.
Any ideas what's actually going on and how could I prevent it?
Thanks!
UPDATE: Found the problem. There is a registry value that controls system-wide web request timeout behavior, and some apps (e.g. Install Anywhere) set it to 10 seconds and "forget" to ever set it back. The value is this: HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ReceiveTimeout
I changed it back to a larger value and now it works fine! HTH.
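For anyone who wants to check this value from a script, here is a minimal Python sketch. It assumes the value is a DWORD expressed in milliseconds (which is how it is usually documented); the helper names `read_receive_timeout_ms` and `is_suspiciously_short` are made up for illustration, and on non-Windows systems the read simply returns None.

```python
import sys

# Registry location quoted in the post above (under HKEY_CURRENT_USER).
KEY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
VALUE_NAME = "ReceiveTimeout"


def read_receive_timeout_ms():
    """Return ReceiveTimeout (assumed to be milliseconds), or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except FileNotFoundError:
        return None  # value not present, so the system default applies


def is_suspiciously_short(timeout_ms, longest_pause_s):
    """True if a configured timeout is shorter than the longest pause the server may take."""
    return timeout_ms is not None and timeout_ms < longest_pause_s * 1000


if __name__ == "__main__":
    current = read_receive_timeout_ms()
    print("ReceiveTimeout:", current, "short for 20 s pauses:", is_suspiciously_short(current, 20))
```

With a value of 10000 (10 seconds) and a server that pauses for about 20 seconds, the check flags the setting as too short, which matches the behavior described in the question.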
I'm using VxWorks 5.4 and attempting to connect over TCP to a server that I'm going to send logs to. For some reason, at boot the connection attempt either fails or takes up to 6 seconds, and it blocks the task that made the attempt, which is obviously a big no-no.
I checked whether the problem is on the server side by writing a simple C program on Windows that connects to that server, and it takes no time at all (milliseconds).
I have "solved" the problem by creating a task that attempts connectWithTimeout every 1-2 seconds, and it does work (it connects after around 2 failed attempts, in around 20 ms), but I don't really like this approach. I would rather initiate the connection once whatever I'm missing is up and ready, instead of checking over and over whether I can connect.
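To make the workaround concrete, here is a rough Python sketch of the same retry-with-timeout idea. It is not VxWorks code: `connect_with_retry` and its parameters are invented for illustration, and `socket.create_connection` stands in for the bounded connect attempt.

```python
import socket
import time


def connect_with_retry(host, port, attempt_timeout_s=0.5, retry_delay_s=1.0, max_attempts=5):
    """Try to connect a few times, bounding each attempt so the caller never blocks for long."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Each attempt gives up after attempt_timeout_s instead of blocking indefinitely.
            return socket.create_connection((host, port), timeout=attempt_timeout_s)
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(retry_delay_s)  # wait before the next attempt
    raise ConnectionError(f"could not connect to {host}:{port}") from last_error
```

Running this in its own task or thread keeps the rest of the boot sequence from waiting on the connection, which is the part of the workaround that matters most.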
After investigating, it turned out that the problem was how the session between my system and the server gets closed.
When a client application runs on Windows or some other desktop system and you shut it down, it goes through a process that closes the session properly.
That is not the case for my system: to shut it down I essentially unplug the wire, so it never goes through a shutdown process that properly closes the session.
After the system is up again, connect cannot succeed, because my system tries to establish the same session as the "dead" one, which the server still thinks is running.
Solving the problem was easy from the server side: add keepalive functionality, so that if the client doesn't respond for a period you choose, the server closes the session.
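As an illustration of the server-side fix, this is roughly what enabling TCP keepalive on a socket looks like in Python. The post doesn't say how the server implemented keepalive, so this is only one possible approach; the helper name `enable_keepalive` and the 30 s / 10 s / 3 probes values are assumptions, and the tuning options are only set on platforms that expose them.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive so a peer that vanished without closing is eventually detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific; set each one only if this platform defines it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

A server would call this on each accepted connection. Once the probes go unanswered, reads on that socket fail and the server can drop the stale session, so the rebooted client can connect again.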
We recently developed an application that runs a query against DB2 and sends a mail to the corresponding recipient. It works well on our local systems and in the QA region. But in production, a few queries fail (rarely, about once a week) with the exception below.
Exception InnerDetails:
ERROR [40003] [IBM][CLI Driver] SQL30081N A communication error has
been detected. Communication protocol being used: "TCP/IP".
Communication API being used: "SOCKETS". Location where the error was
detected: "111.111.111.111". Communication function detecting the
error: "recv". Protocol specific error code(s): "10004", "", "".
SQLSTATE=08001
Since the error occurs only in production and not very often, we are not sure whether it is a code issue or a settings issue. Do you have any idea?
We recently discussed this issue with our IBM rep. After looking in their internal knowledge base, he suggested we add "Interrupt=0" to our connection string, based on recommendations given to other customers that had the same problem.
The default value for Interrupt was 1 before v10.5 FP2 and still is for most connections. They changed the default value to 2 for connections to z/OS (mainframe) in FP2.
We're using C# and the connection string properties for the IBM Data Server Driver for .Net can be found here. I'm sure there is a similar property for their drivers for other languages.
This page from the IBM docs goes into a bit more detail about the setting.
We haven't seen the issue since we recently added the property, but it was always intermittent so I can't yet confidently say that the problem is fixed. Time will tell...
That particular error (SQL30081N) is just a generic message that indicates a network issue between your DB2 client and the server. In this case, you want to look at the Protocol specific error code(s). Here, it looks like you're on Windows, and that particular code (10004) isn't given in the IBM documentation.
So, if you google "windows network error codes", you'll find this page, which says:
WSAEINTR
10004
Interrupted function call.
A blocking operation was interrupted by a call to WSACancelBlockingCall.
Which links to this page with more information on that specific function (emphasis mine):
The WSACancelBlockingCall function has been removed in compliance
with the Windows Sockets 2 specification, revision 2.2.0.
The function is not exported directly by WS2_32.DLL and Windows
Sockets 2 applications should not use this function. Windows Sockets
1.1 applications that call this function are still supported through the WINSOCK.DLL and WSOCK32.DLL.
Blocking hooks are generally used to keep a single-threaded GUI
application responsive during calls to blocking functions. Instead of
using blocking hooks, an applications should use a separate thread
(separate from the main GUI thread) for network activity.
I'm guessing that your application may be blocking for a longer time in your production application than your other environments, and something along the way is causing the interrupt.
Hopefully this leads you down the right path...
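If you want to pull the interesting parts out of an SQL30081N message automatically, a small Python sketch like the following works on the message quoted in the question. The function name `parse_sql30081n` and the short lookup table are illustrative only; the table lists just a handful of well-known Winsock codes, not the full set.

```python
import re

# A few well-known Winsock error codes (illustrative subset only).
WINSOCK_CODES = {
    "10004": "WSAEINTR (interrupted function call)",
    "10053": "WSAECONNABORTED (software caused connection abort)",
    "10054": "WSAECONNRESET (connection reset by peer)",
    "10060": "WSAETIMEDOUT (connection timed out)",
    "10061": "WSAECONNREFUSED (connection refused)",
}


def parse_sql30081n(message):
    """Extract the detecting function and the non-empty protocol error codes."""
    text = " ".join(message.split())  # undo the line wrapping in the logged message
    func = re.search(r'Communication function detecting the error: "([^"]*)"', text)
    codes = re.search(
        r'Protocol specific error code\(s\): "([^"]*)", "([^"]*)", "([^"]*)"', text)
    return {
        "function": func.group(1) if func else None,
        "codes": [c for c in codes.groups() if c] if codes else [],
    }


SAMPLE = '''SQL30081N A communication error has
been detected. Communication protocol being used: "TCP/IP".
Communication API being used: "SOCKETS". Location where the error was
detected: "111.111.111.111". Communication function detecting the
error: "recv". Protocol specific error code(s): "10004", "", "".'''

result = parse_sql30081n(SAMPLE)
for code in result["codes"]:
    print(result["function"], code, WINSOCK_CODES.get(code, "unknown code"))
```

Logging the decoded code next to the raw message makes it easier to tell an interrupted call (10004) apart from a reset or a refused connection when the failure is this rare.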
I spent hours on the same problem and fixed it. I use a Windows exe (developed with C#/.NET) to run a SELECT query against a DB2 database, and I sometimes got this error. Finally I realized that my problem was a timeout. The error with protocol code "10004" sometimes occurs if query execution takes longer than 30 seconds, which is the default timeout value. Maybe the interrupted call described on the "Windows Socket Error Codes" page is triggered by the timeout mechanism. I added a line to set an acceptable timeout value and got rid of this annoying error. I hope it helps others.
Here is my code fix:
...
connDb.Open();
DB2Command cmdDb = new DB2Command(QueryText, connDb);
cmdDb.CommandTimeout = 300; // I added this line (the value is in seconds).
using (DB2DataReader readerDb = cmdDb.ExecuteReader())
{
    ...
Hi everyone. It's me again, the guy porting WinPcap from an NDIS 6 protocol driver to an NDIS 6 filter driver :) I have run into a bug that has had me stuck for two days. Here it is: after I installed the npf6x.sys driver (originally named npf.sys), the service could be started with "net start npf". Then I opened Wireshark, and the network went down (an exclamation mark on the tray icon). With remote debugging, I found that the FilterReceiveNetBufferLists routine is never called, so I believe the RX path is broken here. However, FilterSendNetBufferLists is called normally. I'm sure FilterAttach was called successfully and FilterUnload has not been called, so the filter module should still be in place. It just does not work in the RX path. Then I clicked the "Start" button in Wireshark and unexpectedly found that the network had recovered. Then I stopped the capture and clicked "Interface List", and the network went down again. It is so weird.
I don't change the handler pointers while the driver is running. It seems that the driver is not blocked on any locks either. Can anyone tell me whether there is any case in which NDIS will not call a filter's FilterReceiveNetBufferLists while it is running?
Also, are there any official documents on how to port from an NDIS 6 protocol driver to an NDIS 6 filter? I only found documents for porting from NDIS 5 to NDIS 6.
Thanks.
We have no official documentation on porting from a protocol driver to an LWF, since that's not a very common transition.
It's hard to say what's caused the network to go down, since there can be many causes. The best approach is to use a kernel debugger and start analyzing things with !ndiskd.miniport. Here's a general checklist of things to look at when the network goes down:
Is the miniport in a normal state? Check that !ndiskd.miniport shows everything in the STATE area as green or normal-looking. Make sure the datapath is normal (not bypassed) and the media connect state is connected.
Is your filter driver loaded where you think it should be loaded? Check that !ndiskd.miniport's BINDINGS section shows your filter being listed. If you're using the new Windows 8.1 WDK, also check that the filter's binding isn't "declined".
Does the miniport's receive filter allow the usual set of incoming packets? Check that !ndiskd.miniport -filterdb shows the miniport has at least DIRECTED and MULTICAST traffic allowed in.
Is the miniport attempting to indicate traffic? Set a breakpoint on ndis!NdisMIndicateReceiveNetBufferLists, and verify that the breakpoint hits frequently, as the NIC is giving received packets to the OS.
Is TCPIP attempting to send traffic? If TCPIP isn't sending traffic, then there won't be any replies to receive. Set a breakpoint on ndis!NdisSendNetBufferLists to see if TCPIP is sending any traffic. If it is, set another breakpoint on the miniport's send handler (use !ndiskd.minidriver to find its MiniportSendNetBufferLists handler) and verify that the sent packets are making it down to the NIC.
Is the miniport's pool of receive packets empty? If so, the miniport won't be able to indicate any more packets, because it has run out of NBLs. Use !ndiskd.pendingnbls to see if there are any NBLs that haven't been returned yet. It's typical for it to find zero or maybe one pending NBL; if you see it find hundreds, then there's an NBL leak in your filter.
Has the miniport noticed any problems? Check the miniport statistics. In Windows 8, use Get-NetAdapterStatistics from PowerShell.
If you're new to Windows kernel network debugging, it will be difficult for you to determine whether some things look good or bad. Ideally, you'd have another working computer to debug, so you can see what "normal" looks like.
If your search still doesn't turn up anything useful, another angle of attack is to do a binary search on the code changes you've made. First, comment out all the changes you made to your filter's receive path, and restore it to exactly what is in the sample. Does that fix the problem? If so, continue. . . .
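The binary search over code changes can be pictured with a small Python sketch. It assumes the changes can be applied as an ordered list, that the network works with none of them and breaks with all of them, and that a single change is responsible; `first_breaking_change` and `network_works` are hypothetical names, and in practice `network_works` means rebuilding the driver with only those changes and testing by hand.

```python
def first_breaking_change(changes, network_works):
    """Return the first change whose inclusion breaks the network.

    changes: ordered list of change descriptions.
    network_works(applied): True if the driver works with exactly the changes in `applied`.
    Assumes it works with no changes and is broken with all of them.
    """
    if not changes:
        raise ValueError("no changes to search")
    lo, hi = 0, len(changes)  # invariant: works with changes[:lo], broken with changes[:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if network_works(changes[:mid]):
            lo = mid   # the culprit is in the later half
        else:
            hi = mid   # the culprit is in the earlier half
    return changes[hi - 1]
```

Each round halves the number of candidate changes, so even a few dozen edits to the receive path take only five or six rebuild-and-test cycles.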
I am testing my WPF application connecting to Azure Blob Storage to download a bunch of images using TPL (tasks).
In the live environment, we expect the internet connection at deployed locations to be highly transient.
I have set Retry Policy and time-out in BlobRequestOptions as below:
//Note the values here are for test purposes only
//CloudRetryPolicy is a custom method returning adequate Retry Policy
// i.e. retry 3 times, wait 2 seconds between retries
blobClient.RetryPolicy = CloudRetryPolicy(3, new TimeSpan(0, 0, 2));
BlobRequestOptions bro = new BlobRequestOptions() { Timeout = TimeSpan.FromSeconds(20) };
blob.DownloadToFile(LocalPath, bro);
The above statements are in a background task that works as expected, and I have appropriate exception handling in the background task and the continuation task.
In order to test exception handling and my recovery code, I am simulating internet disconnection by pulling out the network cable. I have hooked up a method to the System.Net.NetworkChange.NetworkAvailabilityChanged event on the UI thread, and I can detect connection/disconnection as expected and update the UI accordingly.
My problem is: if I pull the network cable while a file is being downloaded (via blob.DownloadToFile), the background thread just hangs. It does not time out, does not crash, does not throw an exception, nothing!!! As I write, I have been waiting ~30 mins and no response/processing has happened in relation to the background task.
If I pull the network cable before the download starts, execution is as expected, i.e. I can see retries happening, exceptions being raised and passed on, and so on.
Has anyone experienced similar behaviour? Any tips/suggestions to overcome this behaviour/problem?
By the way, I am aware that I can cancel the download task on detection of network connectivity loss, but I do not want to do this, as network connectivity can be restored within the time-out duration and the download can continue from where it was interrupted. I have tested this automatic resumption and it works nicely.
Below is a rough indication of my code structure (not syntactically correct, just a flow indication)
btnClick()
{
    declare background_task
    attach continuewith_task to background task
    start background task
}
background_task()
{
    try
    {
        ... connection setup ...
        blob.DownloadToFile(LocalPath, bro);
    }
    catch(exception ex)
    {
        ... exception handling ....
        // in case of connectivity loss while download is in progress
        // this block is not getting executed
        // debugger just sits idle without a current statement
    }
}
continuewith_task()
{
    check if antecedent task is faulted
    {
        ... do recovery work ...
        // this is working as expected if connectivity is lost
        // before download starts
        // this task does not get called if connectivity is lost
        // while file transfer is taking place
    }
    else
    {
        .. further processing ...
    }
}
Avkash is correct I believe. Also, to be clear, you will basically never see that network removed error so not a lot of point in testing for it. You will see a ton of connection rejected, conflicts, missing resources, read-only accounts, throttles, access denied, even DNS resolution failures depending on how you are handling storage accounts. You should test for those.
That being said, I would suggest you do not use the RetryPolicy at all with blob or table storage. For most of the errors you will actually encounter, they are not retryable to begin with (e.g. 404, 409, 403, etc.). When you have a retry policy in place, it will by default actually try it 4 more times over the next 2 minutes. There is no point in retrying bad credentials for instance.
You are far better off to simply handle the error and retry selectively yourself (timeouts and throttle are about the only thing that make sense here).
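To show what "retry selectively yourself" could look like, here is a short Python sketch. It is not the Azure client API: `StorageError`, `call_with_selective_retry`, and the set of retryable status codes are all assumptions for illustration (408 and 500 standing in for timeouts, 503 for throttling), so adjust them to whatever your client library actually reports.

```python
import time


class StorageError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed storage call."""

    def __init__(self, status_code):
        super().__init__(f"storage call failed with HTTP {status_code}")
        self.status_code = status_code


# Assumption: only timeouts and throttling are worth retrying.
RETRYABLE_STATUS = {408, 500, 503}


def call_with_selective_retry(operation, max_attempts=3, delay_s=2.0, sleep=time.sleep):
    """Run operation(); retry only on retryable statuses, re-raise everything else at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except StorageError as err:
            # 403, 404, 409 and similar will not get better on a retry, so fail fast.
            if err.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(delay_s)
```

The `sleep` parameter is only there so the delay can be swapped out in tests. The design point is that a 404 or 403 surfaces to the caller on the first attempt instead of being retried for minutes.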
Your problem is mainly caused by the fact that the Azure storage client libraries use file streaming classes underneath, which is why the API hang is not directly related to the Windows Azure Blob client library. If you call the file streaming API directly over the network, you will see exactly the same behavior when the network cable is suddenly removed; removing the network gracefully, however, produces different behavior.
If you search the internet, you will find that the streaming classes do not detect the network loss. That's why, in your code, you can handle the network disconnect event and then stop the background streaming thread.
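One generic way to avoid waiting forever on a blocking call that never notices the network loss is to put an overall deadline around it. The Python sketch below shows the idea with a worker thread; `run_with_deadline` is a made-up helper, and note the limitation: when the deadline expires the caller gets control back, but the stuck worker itself is not killed.

```python
import concurrent.futures
import time


def run_with_deadline(blocking_call, deadline_s):
    """Run blocking_call on a worker thread and stop waiting after deadline_s seconds.

    Raises concurrent.futures.TimeoutError if the call has not finished in time.
    The worker thread is not cancelled; it keeps running (or hanging) in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(blocking_call)
    try:
        return future.result(timeout=deadline_s)
    finally:
        executor.shutdown(wait=False)  # do not block here on a worker that may be stuck
```

The caller can then treat the timeout like any other failure and run its recovery path, for example after also seeing the network disconnect event.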
I am making an application that runs on every one of our PCs at random times. It works fine, but if the PC is shutting down at that moment, I can't read WMI and I get errors. So I need to determine whether a PC is currently shutting down so that I can avoid these errors. Does anyone have an idea?
Thanks!
Call GetSystemMetrics with index SM_SHUTTINGDOWN (0x2000).
Create a hidden top-level window and listen for WM_ENDSESSION messages. The value of wParam will tell you whether the entire system is going down, or whether the user is logging off.
If your app is a console app then use SetConsoleCtrlHandler to register to receive shutdown notifications.
Any attempt to detect this situation will have a race condition: the system shutdown might start immediately after you detect that it's not shutting down, but before you perform the operations that won't work during shutdown. Thus your approach to fixing the problem is wrong. Instead, you just need to handle the WMI read failures and determine whether they're caused by system shutdown, and in that case abort the operation or proceed in whatever alternate way makes sense.
It might be possible to use a sort of synchronous shutdown detection mechanism where you can actually lock/delay the shutdown for a brief interval before it proceeds, and do your processing in that interval. If so, that would also be a safe approach without race conditions.
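The "handle the failure first, then ask whether a shutdown caused it" approach could look roughly like this in Python. The SM_SHUTTINGDOWN value (0x2000) comes from the earlier answer; `is_shutting_down` and `read_with_shutdown_check` are hypothetical helper names, `read_wmi` stands for whatever WMI query the application performs, and on non-Windows systems the check simply returns False.

```python
import sys

SM_SHUTTINGDOWN = 0x2000  # GetSystemMetrics index mentioned above


def is_shutting_down():
    """Ask Windows whether the session is shutting down; always False elsewhere."""
    if sys.platform != "win32":
        return False
    import ctypes
    return ctypes.windll.user32.GetSystemMetrics(SM_SHUTTINGDOWN) != 0


def read_with_shutdown_check(read_wmi, shutting_down=is_shutting_down):
    """Run read_wmi(); if it fails during a shutdown, give up quietly instead of erroring."""
    try:
        return read_wmi()
    except Exception:
        if shutting_down():
            return None  # the failure is explained by the shutdown, so abort the operation
        raise  # a real error: let the normal error handling see it
```

Because the check runs only after a failure, there is no window in which a shutdown can start between the check and the read, which is the race the answer above warns about.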