SSL connection fails when stalled every ~5 s

I have a Perl streaming application that GETs a file over HTTPS (from RadioParadise). The TCP flow is often paused for a few seconds because the player buffers a lot. As a result, the connection very often breaks/ends early.
I've taken various TCP captures and Wireshark seems to be quite confused, complaining about Change Cipher Spec and "Ignored Unknown Record" entries. Obviously these Change Cipher Spec packets are wrong and do not correspond to a re-handshake; there is something odd in the packets. What the logs do show clearly is that, in the case of early termination, it is the server that closes the TCP connection (a proper FIN frame).
I've written a small Perl example that throttles the download using pauses of various lengths to show what happens. If the download is not paused for more than 1 s, the problem does not occur. When it is paused for 5 s, it always happens at some point during the download. The server's OpenSSL is 1.0.2k; on the client I've tried 1.1.0j and various versions from 0.9.8 to 1.0.2. I've also tried various Perl versions, with no change.
(it's just an example that actually shows the problem, not supposed to be proper code)
I've also written the same application in C to eliminate Perl as a potential problem and got exactly the same result.
I've also tried other HTTPS servers and they do not show the same problem, so it seems to be related to RP's. Maybe somebody has an idea of what I should look at for further investigation, as I currently don't know in which direction to continue. General advice on why pausing for 5 s would make any difference at the TCP or SSL level would also help (it's not a SO_KEEPALIVE issue).
NB: no packets were lost by the kernel in my tcpdump capture.
use IO::Socket::SSL;
use Net::SSLeay;
use IO::Select;

$IO::Socket::SSL::DEBUG = 3;
$Net::SSLeay::trace = 3;

my $sock = IO::Socket::SSL->new(
    Timeout            => 15,
    PeerAddr           => '23.29.117.2',
    PeerPort           => 443,
    SSL_startHandshake => 1,
    SSL_verify_mode    => Net::SSLeay::VERIFY_NONE() # SSL_VERIFY_NONE isn't recognized on some platforms?!?, and 0x00 isn't always "right"
) or do {
    print("Couldn't create socket: $!");
    return undef;
};

my $sel   = IO::Select->new($sock);
my $start = time;

use constant SLEEP => 5;

# Send a minimal HTTP/1.0 request and print the start of the response.
syswrite($sock, "GET /blocks/chan/0/4/1835801-1835807.flac HTTP/1.0\r\n\r\n");
sysread($sock, my $data, 512);
print $data;

$sock->blocking(0);

my $total = 0;
my $sum   = 0;
while (1) {
    my $bytes = sysread($sock, my $data, 8192);
    next if !defined $bytes;   # nothing available yet (non-blocking read)
    last if !$bytes;           # EOF: the server closed the connection
    $total += $bytes;
    $sum   += $bytes;
    print "$total ", int($total / (time() - $start) / 1000), " kB/s \r" if time() - $start;
    # After a chunk of data, pause to simulate the player's buffering.
    if ($sum > SLEEP * 1024 * 1024 / 8) {
        sleep(SLEEP);
        $sum = 0;
    }
}
print("\nfinished $total\n");
$sock->close();
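One way to narrow this down (and it applies equally to the C port mentioned above) is to classify the early EOF at the TLS layer. A minimal diagnostic sketch against the plain OpenSSL API, assuming ssl is the connected SSL handle and read_result is the value returned by the failing SSL_read():

/* Diagnostic sketch (plain OpenSSL API): classify why SSL_read() stopped.
 * "ssl" and "read_result" are assumed to be provided by the caller.
 */
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <stdio.h>

static void classify_eof(SSL *ssl, int read_result)
{
    switch (SSL_get_error(ssl, read_result)) {
    case SSL_ERROR_ZERO_RETURN:
        /* Peer sent close_notify: the TLS session was shut down cleanly. */
        fprintf(stderr, "clean TLS shutdown (close_notify received)\n");
        break;
    case SSL_ERROR_SYSCALL:
        /* EOF or socket error without close_notify: abrupt TCP-level close. */
        fprintf(stderr, "TCP connection closed without close_notify\n");
        break;
    case SSL_ERROR_SSL:
        /* A genuine TLS protocol error, e.g. a corrupted record. */
        fprintf(stderr, "TLS error: %s\n", ERR_error_string(ERR_get_error(), NULL));
        break;
    default:
        fprintf(stderr, "SSL_read() stopped for another reason (e.g. want-read/want-write)\n");
        break;
    }
}

If the stall always ends in the SSL_ERROR_ZERO_RETURN case, the server (or something in front of it) is deliberately shutting the TLS session down after the idle period; SSL_ERROR_SYSCALL instead points at something below TLS (an idle timeout or a middlebox) cutting the TCP connection without a close_notify.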

ZMQ Multi-part flusher: Can you get the total number of parts in a multi-part message received via ZeroMQ without reading them all?

I am implementing a simple REQ-REP pattern with ZeroMQ in C using multi-part messaging. Most of my messages have strictly 4 parts each (to and fro) with a few exceptions. To enforce the rule, I need to determine the total number of parts of a multi-part message received. Knowing if it is <= 4 is easy. Here is my receiver function:
#define BUFMAX 64 // Maximum size of text buffers
#define BUFRCV 63 // Maximum reception size of text buffers (reserve 1 space to add a terminal '\0')

char mpartstr[4][BUFMAX];

int recv_multi(void *socket, int *aremore)
// Receive up to the first 4 parts of a multipart message into mpartstr[][].
// Returns the number of parts read (up to 4) or <0 if there is an error.
// Returns -1 if there is an error with a zmq function.
// It sets aremore=1 if there are still more parts to read after the fourth
// part (or aremore=0 if not).
{
    int len, rc, rcvmore, pdx;
    size_t rcvmore_size = sizeof(rcvmore);

    pdx = 0;
    len = zmq_recv(socket, mpartstr[pdx], BUFRCV, 0);
    if (len == -1) return len;
    mpartstr[pdx][len] = '\0';
    rc = zmq_getsockopt(socket, ZMQ_RCVMORE, &rcvmore, &rcvmore_size);
    if (rc) return -1;
    pdx++;
    if (rcvmore == 0) { *aremore = 0; return pdx; }

    while (rcvmore) {
        len = zmq_recv(socket, mpartstr[pdx], BUFRCV, 0);
        if (len == -1) return len;
        mpartstr[pdx][len] = '\0';
        rc = zmq_getsockopt(socket, ZMQ_RCVMORE, &rcvmore, &rcvmore_size);
        if (rc) return -1;
        pdx++;
        if (pdx == 4) break;
    }
    *aremore = rcvmore;
    return pdx;
}
All fine. But now, in my main() function, I check whether there are more parts by looking at the value of aremore. In the cases where I am not expecting more, I will send an error message back to the sender, but I have found that ZeroMQ doesn't like it if I don't read ALL the parts of a multi-part message (it hands me the remaining parts of the old multi-part message the next time I call zmq_recv(), even after I send a message and expect a new, clean multi-part response).
So what I really need is a kind of "flush" function to clear the remaining parts of a message that contains more than 4 parts which I want to discard. So far the only way I have to do this is an ugly arbitrary brute force exhaustion function like so (aremore will have a value of 1 to begin with - it was set by the previous function):
int recv_exhaust(void *socket, int *aremore)
// Receive the remainder of a multipart message and discard the contents.
// Use this to clean out a multi-part 'inbox' from a wrongly sent message.
// Returns 0 on success
// Returns -1 on zmq function failure
// Returns -2 on failure to exhaust even after 1000 parts.
{
    int len, rc, rcvmore, pdx;
    size_t rcvmore_size = sizeof(rcvmore);

    pdx = 1;
    rcvmore = *aremore;
    while (rcvmore) {
        len = zmq_recv(socket, mpartstr[0], BUFRCV, 0);
        if (len == -1) return len;
        rc = zmq_getsockopt(socket, ZMQ_RCVMORE, &rcvmore, &rcvmore_size);
        if (rc) return -1;
        pdx++;
        if (pdx > 1000) return -2;
    }
    return 0;
}
If there is no dedicated 'flusher' API then at least I could get rid of my arbitrary 1000 message limit if I had some way of knowing in advance how many parts (in total) a given multi-part message has. Surely ZeroMQ knows this because multi-part messages are sent as a whole block. Can anyone point me to the way to find that info? Or is there a proper 'flusher' function/method out there? (for standard C please - not C++/C#, etc.). Thanks in advance.
Q : Can anyone point me to the way to find that info?
Yes.
Q : is there a proper 'flusher' function/method out there?
Yes and No :
From ZeroMQ v2.x up until v4.3.1, there has been no explicit API-call for a "flusher".
The beauty and the power of the low-latency smart messaging that the ZeroMQ design delivers are built on a wisely crafted Zen-of-Zero: always preferring performance to comfort, as the Zero-copy, Zero-warranty and other paradigms suggest.
A naive "flusher" ( and it pains me to trivialise this down to resorting to a primitive blocking recv() ... ) has to loop all the way until ZMQ_RCVMORE no longer flags any further parts beyond the last frame of the multi-part message ( or until zmq_msg_more() == 0 confirms the same ). Still, all these operations do just pointer-handling; no data gets "moved/copied/read" from RAM, just the pointer(s) get assigned, so it is indeed both fast and I/O-efficient:
int more;
size_t more_size = sizeof(more);
do {
    zmq_msg_t part; /* Create an empty ØMQ message to hold the message part */
    int rc = zmq_msg_init(&part);
    assert(rc == 0 && "MSG_INIT failed");

    rc = zmq_msg_recv(&part, socket, 0); /* Block until a message is available to be received from socket */
    assert(rc != -1 && "MSG_RECV failed");

    /* Determine if more message parts are to follow */
    rc = zmq_getsockopt(socket, ZMQ_RCVMORE, &more, &more_size);
    assert(rc == 0 && "GETSOCKOPT failed");

    zmq_msg_close(&part);
} while (more);
Given the properties documented in RFC 23/ZMTP, there are but a few (wire-level, telemetry-encoded) guarantees:
1) all messages get sent/delivered:
atomically ( either error-free and binary-identical, all frames, or none at all )
at most once ( per relevant peer )
in order
2) multi-part messages additionally carry an internal (in-band) telemetry "advice" of state:
a bit-flagged state { 1: more-frames-follow | 0: no-more-frames }
a bit-flagged size-type { 0: 8b-direct-octet | 1: 64b-"network-endian"-coded }
a size-advice { 0~255: direct size | 0~2^63-1: 64b-"network-endian"-coded size }
The documented zmq_recv() API is similarly rather explicit about this:
Multi-part messages
A ØMQ message is composed of 1 or more message parts. Each message part is an independent zmq_msg_t in its own right. ØMQ ensures atomic delivery of messages: peers shall receive either all message parts of a message or none at all. The total number of message parts is unlimited except by available memory.
An application that processes multi-part messages must use the ZMQ_RCVMORE zmq_getsockopt(3) option after calling zmq_msg_recv() to determine if there are further parts to receive.
Whatever "ugly" this may look on a first read, the worst-case that would fit in memory is a huuuuuge amount of SMALL-sized messages inside a multi-part message-frame.
The resulting time to "get-rid-of-'em" is not zero, sure, yet the benefits of compact and efficient internal ZMTP-telemetry and low-latency stream-processing is way more important goal ( and was achieved ).
If in doubt, benchmark the worst case first ( a scaled-down sketch follows the list ):
a) "produce" about 1E9 multipart-message frames, transporting Zero-sized payloads ( no data, but all the message-framing )
b) "setup" the simplest possible "topology": PUSH/PULL
c) "select" a transport-class of your choice { inproc:// | ipc:// | tipc:// | ... | vmci:// } - best the stack-less inproc:// ( I would start a stress-test with this )
d) stopwatch such a blind, mechanical "flusher" between a ReferencePoint-S: when zmq_poll( ZMQ_POLLIN ) has confirmed the presence of any readable content, and a ReferencePoint-E: when the last part of the many-part multipart message has been looped over by the blind-"flusher" loop.
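A minimal, scaled-down sketch of steps a) through d), assuming libzmq, an arbitrary inproc:// endpoint name and an NPARTS far smaller than the 1E9 suggested above so it finishes quickly; both HWMs are set to 0 ( unlimited ) so the whole multi-part message fits in the inproc pipe before the "flusher" starts, and the stopwatch simply brackets the receive loop rather than a zmq_poll() call:

/* Scaled-down worst-case "flusher" benchmark sketch: one PUSH/PULL pair over
 * inproc://, one multipart message of NPARTS zero-sized frames, stopwatch
 * around the blind receive loop. Build with: gcc bench.c -lzmq
 */
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <zmq.h>

#define NPARTS 1000000               /* scaled down from the 1E9 suggested above */

int main(void)
{
    void *ctx  = zmq_ctx_new();
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    void *pull = zmq_socket(ctx, ZMQ_PULL);
    int   hwm  = 0;                  /* 0 == no high-water-mark limit */
    int   rc;

    zmq_setsockopt(push, ZMQ_SNDHWM, &hwm, sizeof(hwm));
    zmq_setsockopt(pull, ZMQ_RCVHWM, &hwm, sizeof(hwm));
    rc = zmq_bind(push, "inproc://flush-bench");    assert(rc == 0);
    rc = zmq_connect(pull, "inproc://flush-bench"); assert(rc == 0);

    /* a) "produce" one huge multipart message of Zero-sized payloads */
    for (long i = 0; i < NPARTS - 1; i++)
        zmq_send(push, "", 0, ZMQ_SNDMORE);
    zmq_send(push, "", 0, 0);        /* last frame: no MORE flag, the message gets flushed */

    /* d) stopwatch the blind "flusher" loop ( ReferencePoint-S ... ReferencePoint-E ) */
    struct timespec s, e;
    int    more      = 1;
    size_t more_size = sizeof(more);
    long   parts     = 0;
    char   buf[1];

    clock_gettime(CLOCK_MONOTONIC, &s);
    while (more) {
        zmq_recv(pull, buf, sizeof(buf), 0);                   /* zero-sized frame */
        zmq_getsockopt(pull, ZMQ_RCVMORE, &more, &more_size);  /* any more frames? */
        parts++;
    }
    clock_gettime(CLOCK_MONOTONIC, &e);

    double ns = (e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec);
    printf("flushed %ld parts in %.0f ns ( %.1f ns/part )\n", parts, ns, ns / parts);

    zmq_close(pull);
    zmq_close(push);
    zmq_ctx_term(ctx);
    return 0;
}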
RESULT INTERPRETATION:
The nanoseconds spent between [S] and [E] count as evidence of the worst-case amount of time that gets "scapegoated" into a knowingly blind "flusher" loop. In real-world use-cases there will be additional reasons for spending even more time on the same.
Yet it is fair not to forget that sending such { knowingly-oversized | ill-formatted } multi-frame BEAST(s) is the root cause of any operational risk in dealing with this in an otherwise ultra-low-latency, almost-linearly-scalable messaging/signaling framework.
It is the art of the Zen-of-Zero that has made this possible. All thanks to Pieter HINTJENS and his team, led by Martin SÚSTRIK; we all owe them a lot for being able to build on their legacy.

XPC service array crashes

I'm using the C interface for XPC services; incidentally, my XPC service runs very nicely aside from the following problem.
The other day I tried to send a "large" array via XPC, on the order of 200,000 entries. Usually my application deals with data on the order of a couple of thousand entries and has no problems with that. For other uses an array of this size may not be special.
Here is my C++ server code for generating the array:
xpc_connection_t remote = xpc_dictionary_get_remote_connection(event);
xpc_object_t reply = xpc_dictionary_create_reply(event);

xpc_object_t times = xpc_array_create(NULL, 0);
for (unsigned int s = 0; s < data.size(); s++)
{
    xpc_object_t index = xpc_uint64_create(data[s]);
    xpc_array_append_value(times, index);
}
xpc_dictionary_set_value(reply, "times", times);

xpc_connection_send_message(remote, reply);
xpc_release(times);
xpc_release(reply);
and here is the client code:
xpc_object_t times = xpc_dictionary_get_value(reply, "times");
size_t count = xpc_array_get_count(times);
for (int c = 0; c < count; c++)
{
    long my_time = xpc_array_get_uint64(times, c);
    local_times.push_back(my_time);
}
If I try to handle a large array I get a seg fault (SIGSEGV)
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libxpc.dylib 0x00007fff90e5cc02 xpc_array_get_count + 0
When you say "extremely big array" are you speaking of something that launchd might regard as a resource-hog and kill?
XPC is only really meant for short-fast transactional runs rather than long-winded service-based runs.
If you're going to make calls that make launchd wait, then I'd suggest you try https://developer.apple.com/library/mac/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html
When the service dies, are any specific events other than SIGABRT etc. fired?
Do you get "xpc service was invalidated" (which usually means launchd killed it), or do you get "xpc service exited prematurely" (which usually indicates an error in the handler code)?
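One likely way to get a crash at xpc_array_get_count, exactly as in the backtrace above, is for "times" to come back NULL, e.g. because the reply is actually an XPC error object (service killed, connection interrupted) rather than the expected dictionary. A hedged sketch of a defensive check on the client side (the helper name is illustrative, not from the original code):

#include <stdio.h>
#include <xpc/xpc.h>

/* Returns the "times" array from the reply, or NULL after reporting why not. */
static xpc_object_t get_times_array(xpc_object_t reply)
{
    if (xpc_get_type(reply) == XPC_TYPE_ERROR) {
        const char *desc = xpc_dictionary_get_string(reply, XPC_ERROR_KEY_DESCRIPTION);
        fprintf(stderr, "XPC error reply: %s\n", desc ? desc : "(no description)");
        return NULL;
    }
    xpc_object_t times = xpc_dictionary_get_value(reply, "times");
    if (times == NULL || xpc_get_type(times) != XPC_TYPE_ARRAY) {
        fprintf(stderr, "reply carries no usable \"times\" array\n");
        return NULL;
    }
    return times;
}

Only if this returns a non-NULL array is it safe to call xpc_array_get_count() and xpc_array_get_uint64() as in the client loop above; if the service is being killed for memory or run-time reasons, the error branch is what will fire.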

Arduino Serial.println() outputs a blank line if not in loop()

I'm attempting to write a function that will pull text from different sources (Ethernet client/Serial/etc.) into a single line, then compare it against known commands and run other functions based on it. Simple enough.
And while this works, I am having issues when trying to call a simple Serial.println() from a function OTHER than loop().
So far, I have around 140 lines of code, but here's a trimmed down version of the portion that's causing me problems:
boolean fileTerm;

void setup() {
  Serial.begin(9600);   // not in the trimmed excerpt; needed for Serial I/O
  fileTerm = false;
}

void loop() {
  char character;
  String content = "";

  while (Serial.available()) {
    character = Serial.read();
    content.concat(character);
    delay(1);
  }

  if (content != "") {
    Serial.println("> " + content);
    /** Error from Serial command string.
     * 0 = No error
     * 1 = Invalid command
     */
    int err = testInput(content);
  }
}

int testInput(String content) {
  if (content == "term") {
    fileTerm = true;
    Serial.println("Starting Terminal Mode");
    return 0;
  }
  if (content == "exit" && fileTerm == true) {
    fileTerm = false;
    Serial.println("Exiting Terminal Mode");
    return 0;
  }
  return 1;
}
(full source at http://pastebin.com/prEuBaRJ)
So the point is to catch the "term" command and enter some sort of filesystem terminal mode (eventually to access and manipulate files on the SD card). The "exit" command will leave the terminal mode.
However, when I actually compile this and type these commands (along with others) into the Serial Monitor, all I see is:
> hello
> term
> test for index.html
> exit
> test
> foo
> etc...
I figure the function is catching those reserved terms and actually processing them properly, but for whatever reason, is not sending the desired responses over the Serial bus.
Just for the sake of proper syntax, I am also declaring the testInput() function in a separate header, though I would doubt this has any bearing on whether or not this particular error would occur.
Any explainable reason for this?
Thanks.
Model: Arduino Uno R3, IDE version: 1.0.4, though this behavior also happened on v1.0.5 in some instances.
It is easy to guess how you ended up putting delay(1) in your code: it was a workaround for a bug in your code, but it didn't solve it properly. What you probably saw was that your code was too eager to process the command, before you were done typing it. So you slowed it down.
But that wasn't the right fix; what you really want to do is wait until the entire command has been typed, i.e. until you press the Enter key on your keyboard.
That is the bug in your code right now: the content variable doesn't just contain "term", it also contains the character(s) generated by your terminal's Enter key, which is why you don't get a match.
So fix your code: add a test that checks whether you got the Enter key character, and only then process the command (a sketch follows).
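A sketch of that fix, reusing the names from the trimmed example above; it assumes the Serial Monitor's line ending is set to "Newline" (or "Both NL & CR"), and that fileTerm and testInput() are declared as before:

String content = "";   // accumulates one command line across loop() iterations

void loop() {
  while (Serial.available()) {
    char character = Serial.read();
    if (character == '\n') {        // Enter key: the command is complete
      content.trim();               // drop a trailing '\r' and stray spaces
      if (content != "") {
        Serial.println("> " + content);
        testInput(content);         // now "term"/"exit" match and print their responses
      }
      content = "";                 // start collecting the next command
    } else {
      content.concat(character);
    }
  }
}

With the line assembled only once '\n' arrives, the delay(1) workaround is no longer needed.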

Fast start for fdsink: first 5 MB async, following bytes synchronous

I wrote a little HTTP video streaming server using GStreamer. Essentially the client does a GET request and receives a continuous HTTP stream.
The stream should be sent synchronously, i.e. at the same speed as the bitrate. The problem is that some players (mplayer is a prominent example) don't buffer variable-bitrate content well and thus lag every other second.
I want to circumvent the buffer underruns by transmitting the first, say, 5 MB immediately, ignoring the pipeline's clock. The rest of the stream should be transmitted at the appropriate speed.
I figured setting the fdsink's sync=FALSE for the first 5 MB, and sync=TRUE from then on, should do the trick, but that does not work: the fdsink then patiently waits for the pipeline clock to catch up to the already-sent data. In my test with a very low bitrate, no data is transmitted for quite a few seconds.
My fdsink reader thread currently looks like this:
static void *readerThreadFun(void*) {
    int fastStart = TRUE;
    g_object_set(G_OBJECT(fdsink0), "sync", FALSE, NULL);
    for(uint64_t position = 0;;) {
        // (On the other side there is node.js,
        // that's why I don't do the HTTP chunking here)
        ssize_t readCount = splice(gstreamerFd, NULL, remoteFd,
                                   NULL, 1<<20, SPLICE_F_MOVE|SPLICE_F_MORE);
        if(readCount == 0) {
            break;
        } else if(readCount < 0) {
            goto error;
        }
        position += readCount;
        if(fastStart && position >= 5*1024*1024) {
            fastStart = FALSE;
            g_object_set(G_OBJECT(fdsink0), "sync", TRUE, NULL);
        }
    }
    ...
}
How can I make GStreamer "forget" the duration the wall clock has to catch up with? Is there some "reset" function? Am I misunderstanding sync? Is there another method to realize a "fast start" in GStreamer?
That's not quite the solution I was looking for, but it works as a workaround:
gst_base_sink_set_ts_offset(GST_BASE_SINK(fdsink0), -10ll*1000*1000*1000);
The sink will stream the first 10 seconds immediately: the negative ts-offset makes every buffer appear 10 seconds "late" relative to the pipeline clock, so the sink pushes data out without waiting until the clock has actually caught up.
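The same offset can also be applied through basesink's "ts-offset" property, in the g_object_set() style already used in the reader thread; a minimal sketch (the 10-second value is the same arbitrary choice as above):

/* Equivalent to the gst_base_sink_set_ts_offset() call: shift the sink's
 * notion of "on time" by -10 s, so the first 10 seconds of media are
 * considered due immediately. GST_SECOND is GStreamer's nanosecond constant.
 */
g_object_set(G_OBJECT(fdsink0),
             "ts-offset", (gint64) (-10 * GST_SECOND),
             NULL);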

How can I cause ldap_simple_bind_s to timeout?

We recently had a problem with our test LDAP server - it was hung and wouldn't respond to requests. As a result, our application hung forever* while trying to bind to it. This only happened on Unix machines - on Windows, the ldap_simple_bind_s call timed out after about 30 seconds.
* I don't know if it really was forever, but it was at least several minutes.
I added calls to ldap_set_option, trying both LDAP_OPT_TIMEOUT and LDAP_OPT_NETWORK_TIMEOUT, but the bind call still hung. Is there any way to make ldap_simple_bind_s time out after some period of time of my choosing?
There are a couple of things happening here.
Basically the LDAP SDK is broken; based on the spec, it should have timed out according to the value you set with ldap_set_option. Unfortunately it's not doing that properly. Your bind will probably time out eventually, but not until the OS returns a failure, and that will come from the TCP timeout or some multiple of it.
You can work around this by using ldap_simple_bind(), then calling ldap_result() a couple of times. If you don't get the result back within the timeout you want you can call ldap_abandon_ext() to tell the SDK to give up.
Of course since you're trying to bind this will almost certainly leave the connection in an unusable state and so you will need to unbind it immediately.
Hope this helps.
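Sketched in code, the ldap_simple_bind() / ldap_result() / ldap_abandon_ext() sequence described above might look roughly like this (OpenLDAP C API assumed; the function name and error handling are illustrative, and the deprecated ldap_simple_bind() prototype may require building with -DLDAP_DEPRECATED=1):

#include <ldap.h>

/* Bind with our own deadline: returns 0 on success, -1 on timeout or error. */
int bind_with_deadline(LDAP *ld, const char *dn, const char *pw, int seconds)
{
    int msgid = ldap_simple_bind(ld, dn, pw);        /* asynchronous bind */
    if (msgid < 0)
        return -1;

    struct timeval tv = { seconds, 0 };              /* how long we are willing to wait */
    LDAPMessage *res = NULL;
    int rc = ldap_result(ld, msgid, LDAP_MSG_ALL, &tv, &res);

    if (rc == 0) {                                   /* timed out: give up on the bind */
        ldap_abandon_ext(ld, msgid, NULL, NULL);
        ldap_unbind_ext_s(ld, NULL, NULL);           /* connection is unusable now, drop it */
        return -1;
    }
    if (rc < 0)                                      /* library/transport error */
        return -1;

    int err = 0;
    ldap_parse_result(ld, res, &err, NULL, NULL, NULL, NULL, 1 /* free res */);
    return (err == LDAP_SUCCESS) ? 0 : -1;
}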
UPDATE: the code below only works on OpenLDAP 2.4+. OpenLDAP 2.3 does not honor LDAP_OPT_TIMEOUT, without which ldap_simple_bind_s will not time out regardless of what you set. Here is the link from the OpenLDAP forum
I am using ldap_simple_bind_s in my LDAP auth service, and after setting LDAP_OPT_TIMEOUT, LDAP_OPT_TIMELIMIT, and LDAP_OPT_NETWORK_TIMEOUT, it successfully times out if the LDAP server is unavailable.
Here is a code excerpt from my LDAP connect method:
int opt_timeout     = 4;   // LDAP_OPT_TIMEOUT
int timelimit       = 4;   // LDAP_OPT_TIMELIMIT
int network_timeout = 4;   // LDAP_OPT_NETWORK_TIMEOUT
int status          = 0;

// Set LDAP operation timeout (synchronous operations)
if ( opt_timeout > 0 )
{
    struct timeval optTimeout;
    optTimeout.tv_usec = 0;
    optTimeout.tv_sec  = opt_timeout;

    status = ldap_set_option(connection, LDAP_OPT_TIMEOUT, (void *)&optTimeout);
    if ( status != LDAP_OPT_SUCCESS )
    {
        return false;
    }
}

// Set LDAP operation time limit
if ( timelimit > 0 )
{
    status = ldap_set_option(connection, LDAP_OPT_TIMELIMIT, (void *)&timelimit);
    if ( status != LDAP_OPT_SUCCESS )
    {
        return false;
    }
}

// Set LDAP network operation timeout (connection attempt)
if ( network_timeout > 0 )
{
    struct timeval networkTimeout;
    networkTimeout.tv_usec = 0;
    networkTimeout.tv_sec  = network_timeout;

    status = ldap_set_option(connection, LDAP_OPT_NETWORK_TIMEOUT, (void *)&networkTimeout);
    if ( status != LDAP_OPT_SUCCESS )
    {
        return false;
    }
}
Try specifying the option LDAP_OPT_TCP_USER_TIMEOUT, if it is available in your LDAP SDK. For OpenLDAP on Linux it works nicely: if no TCP answer arrives within this timeout, the synchronous operation is terminated.
See the man page.
