I am using ssize_t send(int sockfd, const void *buf, size_t len, int flags); from sys/socket.h. I have some doubts about it.
Doubt 1: Suppose the string I pass to send has length 10, but the length I specify in the third parameter is 15. Will send transmit only 10 chars, or will it transmit 15 chars (reading unallocated memory to get the last 5)?
What happens in the reverse case, that is, if the buffer passed as the second parameter is longer than the length given in the third parameter?
Doubt 2: Assume the length of the second parameter equals the third parameter, and the second parameter is, say, "abc\0def\0qw". Its length is 11. Will send transmit the whole string, or does \0 have some effect? I think it will send the whole string. How does send really work?
If someone knows a good source that discusses send and recv in depth, please share.
The function send will try to send exactly as much as you tell it to. If the buffer is big enough, nothing special happens. If it's not, what happens is not defined: it could send garbage, it could crash, or it could do anything else.
The function send does not care about the buffer contents; it only cares about the specified number of bytes to write.
The send function knows nothing about "strings". If you give it a pointer, and tell it to send the next 15 bytes after that pointer, then that is EXACTLY what it will try to do. (You may well encounter a Seg-Fault or similar if you give it an inappropriate len value).
There is no justification for believing that it would stop just because it finds a byte with value 0x00. After all, many network protocols are FILLED with 0x00 all over the place. You can't have send stopping every time it happens to find that value.
send doesn't know anything about NUL-terminated strings (there's a clue in the parameter types - it takes a void* rather than a char*).
It simply sends the number of bytes you give it, from the address you give it. If that means reading unallocated memory, then that's what it'll do, possibly crashing in the process.
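A minimal sketch of the point these answers make, assuming an already connected socket descriptor. The helper names send_block and send_cstring are made up for illustration. The length argument alone decides how many bytes go out, so embedded 0x00 bytes are sent like any others.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: hand exactly len bytes starting at buf to send().
 * The caller must guarantee that buf really holds len readable bytes;
 * send() has no way to check that. */
ssize_t send_block(int sockfd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return send(sockfd, buf, len, 0);
}

/* Hypothetical helper: send a NUL-terminated string without its terminator.
 * It is strlen() that stops at the first 0x00 here, not send(). */
ssize_t send_cstring(int sockfd, const char *s)
{
    assert(s != NULL);
    return send_block(sockfd, s, strlen(s));
}
```

With static const char msg[] = "abc\0def\0qw"; the call send_block(fd, msg, sizeof msg) passes all 11 bytes of the array to send, while send_cstring(fd, msg) passes only the 3 bytes before the first NUL. On a stream socket send may still accept fewer bytes than requested, so real code should check the return value.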
I'm having some doubts about the number of bytes I should write/read through a socket in C on Unix. I'm used to sending 1024 bytes, but this is really too much sometimes when I send short strings.
I read a string from a file, and I don't know how many bytes this string is. It can vary every time: it can be 10, 20 or 1000. I only know for sure that it's < 1024. So, when I write the code, I don't know the number of bytes to read on the client side (on the server I can use strlen()). Is the only solution to always read the maximum number of bytes (1024 in this case), regardless of the length of the string I read from the file?
For instance, with this code:
read(socket,stringBuff,SIZE);
wouldn't it be better if SIZE is 10 instead of 1024 if I want to read a 10 byte string?
In the code in your question, if there are only 10 bytes to be read, then it makes no difference whether SIZE is 10 bytes, 1,024 bytes, or 1,000,024 bytes - it'll still just read 10 bytes. The only difference is how much memory you set aside for it, and if it's possible for you to receive a string up to 1,024 bytes, then you're going to have to set aside that much memory anyway.
However, regardless of how many bytes you are trying to read in, you always have to be prepared for the possibility that read() will actually read a different number of them. Particularly on a network, when you can get delays in transmission, even if your server is sending a 1,024 byte string, less than that number of bytes may have arrived by the time your client calls read(), in which case you'll read less than 1,024.
So, you always have to be prepared for the need to get your input in more than one read() call. This means you need to be able to tell when you're done reading input: you can't rely solely on the fact that read() has returned to tell you that you're done. If your server might send more than one message before you've read the first one, then you obviously can't hope to rely on this.
You have three main options:
1. Always send messages which are the same size, perhaps padding smaller strings with zeros if necessary. This is usually suboptimal for a TCP stream. Just read until you've received exactly this number of bytes.
2. Have some kind of sentinel mechanism for telling you when a message is over. This might be a newline character, a CRLF, a blank line, or a single dot on a line followed by a blank line, or whatever works for your protocol. Keep reading until you have received this sentinel. To avoid making inefficient system calls of one character at a time, you need to implement some kind of buffering mechanism to make this work well. If you can be sure that your server is sending you lines terminated with a single '\n' character, then using fdopen() and the standard C I/O library may be an option.
3. Have your server tell you how big the message is (either in an initial fixed length field, or using the same kind of sentinel mechanism from point 2), and then keep reading until you've got that number of bytes.
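The third option could look roughly like the sketch below. The 4-byte big-endian length field and the names frame_encode and frame_payload_len are assumptions chosen for illustration, not part of any standard API. Only htonl, ntohl and memcpy are real library calls.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_LEN 4u  /* hypothetical: 4-byte length field, network byte order */

/* Write the length header followed by the payload into out.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t frame_encode(const void *payload, uint32_t len,
                    unsigned char *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);
    if (out_cap < FRAME_HEADER_LEN || len > out_cap - FRAME_HEADER_LEN)
        return 0;
    uint32_t be = htonl(len);             /* host order -> network order */
    memcpy(out, &be, FRAME_HEADER_LEN);
    memcpy(out + FRAME_HEADER_LEN, payload, len);
    return FRAME_HEADER_LEN + (size_t)len;
}

/* Recover the payload length from a header that has been fully received. */
uint32_t frame_payload_len(const unsigned char *header)
{
    uint32_t be;
    assert(header != NULL);
    memcpy(&be, header, FRAME_HEADER_LEN);
    return ntohl(be);                     /* network order -> host order */
}
```

The receiver would first collect exactly FRAME_HEADER_LEN bytes, call frame_payload_len, reject any value above its own limit (1,024 in the question), and only then keep reading until that many payload bytes have arrived.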
The read() system call blocks until it can read one or more bytes, or until an error occurs.
It DOESN'T guarantee that it will read the number of bytes you request! With TCP sockets, it's very common that read() returns less than you request, because it can't return bytes that are still propagating through the network.
So, you'll have to check the return value of read() and call it again to get more data if you didn't get everything you wanted, and again, and again, until you have everything.
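The "call it again, and again" loop described above can be sketched as follows. The name read_full is hypothetical, and the sketch assumes a blocking descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until want bytes have arrived.
 * Returns want on success, a smaller count if the peer closed the
 * connection early, or -1 on error (errno is left as set by read()). */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;
    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n == 0)
            break;                /* end of stream: nothing more will come */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: just retry */
            return -1;
        }
        got += (size_t)n;         /* short read: loop for the remainder */
    }
    return (ssize_t)got;
}
```

This only helps when the caller already knows how many bytes to expect, for example from a fixed message size or a length field sent ahead of the data.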
I am testing a small issue with a daemon here (written in linux). I want to know whether what is done is right or not.
The daemon loads a shared object file (.so) using the dlopen call. The shared object receives some buffers from clients over the network. It uses the following call to read the buffer:
read_buffer(something, length of buffer read, buffer contents);
The read_buffer function copies the buffer, of the length specified in the second parameter, to another location using memcpy.
On the client side, the following is done:
write_buffer(something, length of buffer, buffer contents);
The problem is that if we send an invalid length parameter from the client side (one that does not match the real length of what is passed in the third parameter), there is a segfault on the server side at the memcpy call.
I am not sure how to validate the parameters that are passed to the memcpy function.
Please help me understand what a possible solution is.
To check C/C++ code for errors in memory allocation / access, use Valgrind. The server side has no way (that I know of) to determine if the parameters passed are valid or not. That's the C/C++ credo: Know what you're doing, or you die. There is no safety net.
You can append the length of the buffer at the start of the buffer.
-------------------------------------------------------------
| FIXED LENGTH OF BUFFER - n bytes | BUFFER
-------------------------------------------------------------
Now each time you read it on the server side, first read the "n" bytes (reserved for storing the length) which contain the length. When the data arrives, you can compare the length of the buffer actually received against the value in those first n bytes for validation.
Hope this helps.
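A minimal sketch of that validation step, placed in front of the memcpy. The name copy_checked and its parameters are hypothetical; "received" stands for the number of payload bytes that actually arrived and "claimed" for the value taken from the length prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a client-supplied buffer only when the length
 * the client claimed is consistent with what actually arrived and with the
 * room available at the destination.
 * Returns 0 on success, -1 if the claimed length must be rejected. */
int copy_checked(void *dst, size_t dst_cap,
                 const void *src, size_t received, size_t claimed)
{
    assert(dst != NULL && src != NULL);
    if (claimed > received)   /* client promised more than it sent */
        return -1;
    if (claimed > dst_cap)    /* copy would overflow the destination */
        return -1;
    memcpy(dst, src, claimed);
    return 0;
}
```

The server can only compare the claimed length with quantities it measured itself, so both checks use values the client cannot influence directly.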
I'm sending a C struct over UDP
struct packet {
    int numInt;
    int *intList; //malloc'ed as (sizeof(int)*numInt)
};
It will be serialized as [numInt][intList[0]]...[intList[numInt-1]].
My understanding is that calling recvfrom on UDP will read the entire packet, even if the buffer doesn't hold that many bytes. Is using a really large buffer the only option I have?
You could pass MSG_PEEK to recvfrom to find out exactly how big the buffer needs to be. So just recvfrom a few bytes with MSG_PEEK to find numInt and then recvfrom the real thing (this time without MSG_PEEK).
The standard says something about MSG_PEEK, but kernel.org spells it better:
MSG_PEEK
This flag causes the receive operation to return data from the
beginning of the receive queue without removing that data from the
queue. Thus, a subsequent receive call will return the same data.
Obviously at some point you will start wondering if doubling the number of system calls to save memory is worth it. I think it isn't.
UDP packets are sent and received as a whole. If you receive one, the size is right. The only thing you have to do is supply a big enough buffer to read(), recv() or recvfrom(). The length field inside the payload is redundant, since the return value of read() will tell you the correct size. It is also dangerous, since it relies on the sender and receiver having the same byte order.
You could try using a small buffer, just large enough to get numInt, with the MSG_PEEK flag set. Then you can find out the size you actually need, and receive again without MSG_PEEK to get the whole thing.
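The peek-then-receive idea could be sketched like this. The function name recv_int_list, the MAX_INTS limit and the wire format (a 32-bit count followed by 32-bit integers, all in network byte order) are assumptions made for the example; the question's struct uses plain int with no stated byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_INTS 16375u  /* hypothetical cap: what fits in one UDP datagram */

/* Returns a malloc'ed array of *count integers (caller frees), or NULL. */
int32_t *recv_int_list(int sockfd, uint32_t *count)
{
    uint32_t be_count;
    assert(count != NULL);

    /* Peek at the header only; the datagram stays in the receive queue. */
    ssize_t n = recv(sockfd, &be_count, sizeof be_count, MSG_PEEK);
    if (n != (ssize_t)sizeof be_count)
        return NULL;
    uint32_t num = ntohl(be_count);
    if (num > MAX_INTS)
        return NULL;

    size_t total = sizeof be_count + (size_t)num * sizeof(int32_t);
    int32_t *raw = malloc(total);
    if (raw == NULL)
        return NULL;

    /* Second call without MSG_PEEK removes the datagram from the queue. */
    n = recv(sockfd, raw, total, 0);
    if (n != (ssize_t)total) {
        free(raw);
        return NULL;
    }
    /* Drop the header by shifting the values down, converting byte order. */
    for (uint32_t i = 0; i < num; i++)
        raw[i] = (int32_t)ntohl((uint32_t)raw[i + 1]);
    *count = num;
    return raw;
}
```

When the function returns NULL before the second recv, the datagram is still queued, so a real caller would need to read and discard it. Whether two system calls per packet are worth the saved memory is the trade-off raised in the answers.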
I'm pretty sure recvfrom will read up to as many bytes as it is told to by its 3rd argument, len. If the datagram holds fewer bytes, it will return what is there. If it holds more, it will return only the first len bytes; on a UDP socket the rest of that datagram is discarded, so additional calls return later datagrams rather than the remainder.
I've tried using GDB and Valgrind, but I can't seem to pinpoint the problem.
Interestingly, the program crashes during normal execution and under GDB, but not under Valgrind.
To help you follow along with the code, here's the basic point of the program:
Communicate with a server via sockets and UDP to transfer a file, and handle some basic packet loss.
I won't share the server's code, because I know the issue isn't there.
The point that might confuse some, is that I'm implementing packet loss myself, with a number generator. Right now it doesn't do anything really, besides make the program use another recvfrom.
To guide you through the program's output: the client tells the server what file it wants, the server tells the client how big the file it's going to send is, and then sends it in chunks (of 10 characters at a time).
The output shows what chunk is sent, how many characters were received, and what the concatenated string is.
The file transfer succeeds from what I can tell; it's just the fopen call that I use to write the received file that is giving me trouble. I'm not sure whether it has to do with my malloc call or not.
Here is the source code:
pastebin.com/Z79hvw6L
Here are the outputs from CLI execution, and Valgrind (GDB doesn't seem to give any more info):
Notice the CLI gives a malloc memory corruption error, and Valgrind doesn't.
CLI: http://pastebin.com/qdTKMCD2
VALGRIND: http://pastebin.com/8inRygnU
Thanks for any help!
Added the GDB Backtrace results
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x6b961)[0x19a961]
/lib/i386-linux-gnu/libc.so.6(+0x6e15d)[0x19d15d]
/lib/i386-linux-gnu/libc.so.6(__libc_malloc+0x63)[0x19ef53]
/lib/i386-linux-gnu/libc.so.6(+0x5c2b8)[0x18b2b8]
/lib/i386-linux-gnu/libc.so.6(fopen+0x2c)[0x18b38c]
/home/---/client[0x8048dc2]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x145e37]
/home/---/client[0x8048871]
Maybe this could give someone an insight as to what part of the program the error is in?
char chunk[10];
chunk[10] = '\0';
is wrong, chunk[10] is one past the array.
And in general, be careful with doing this
char filename[25];
scanf("%s",filename);
If you enter a long filename, you'll trash memory. Using fgets() would be better. You would also at least want to check that scanf succeeds, or else the following strlen() on filename isn't valid.
Line 93, buf[strlen(buf)-1]='\0'; is dangerous: you can't use strlen if the buffer isn't already nul terminated, and you trash memory if buf is an empty string, as you index buf[-1].
Edit.
Your other problem is strcat(fullstring,chunk); . Nothing in your loop stops it from appending to this string if you happen to receive more data than it can hold. The size is also likely off by one, as you need room for the last nul terminator. Make it at least char * fullstring = malloc(sizeof(char)*filesize + 1 ); But your loop really needs to check that it is not writing past the end of that buffer.
As for adding a nul terminator to buf , the recv call returns how many bytes you've read, so if you've checked recv for errors, do buf[numbytes] = 0 , but this will be off by one as well, as you've allocated 10 bytes for buf and you try to read 10 bytes into it as well - but in C, a string needs room for a nul terminator too. Make buf 11 bytes big. Or recv() only 9 bytes.
In fact, you're off by one in many places, so start counting how many bytes you need, and where you put stuff into them. Remember that in C, arrays start at index zero, and an array of 10 can only be indexed from 0 to 9.
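One way to keep the counting in a single place is a bounded append used instead of strcat. This is only a sketch: append_chunk is a hypothetical helper, and it takes the byte count returned by recv so the chunk itself never needs a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append n received bytes to the NUL-terminated
 * accumulator dst, whose total capacity is cap (terminator included).
 * *used holds the current string length and is updated on success.
 * Returns 0 on success, -1 if the chunk does not fit. */
int append_chunk(char *dst, size_t cap, size_t *used,
                 const char *chunk, size_t n)
{
    assert(dst != NULL && used != NULL && chunk != NULL);
    assert(*used < cap);              /* there is always room for the NUL */
    if (n > cap - *used - 1)          /* keep one byte for the terminator */
        return -1;
    memcpy(dst + *used, chunk, n);
    *used += n;
    dst[*used] = '\0';
    return 0;
}
```

With fullstring allocated as filesize + 1 bytes and its first byte set to '\0', the receive loop would call append_chunk(fullstring, filesize + 1, &used, buf, numbytes) and stop as soon as it returns -1.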
This (line 93) is suspect:
buf[strlen(buf)-1]='\0';
UPDATE: This (lines 99, 100) is also wrong:
char chunk[10];
chunk[10] = '\0';
UPDATE2: The buffer is too small
char * fullstring = malloc(sizeof(char)*filesize); // line 103
...
strcat(fullstring,chunk); // line 124
UPDATE3:
UDP is unreliable. Transmission of a packet may fail (packets may be dropped anywhere between sender and receiver) , and the packets may be received in a different order than in which you sent them.
Well, it shouldn't be a problem on modern OS:es but you don't check the returned value from malloc() for NULL. On what line does it crash and with what signal?
I'm receiving from socket A and writing that to socket B on the fly (like a proxy server might). I would like to inspect and possibly modify data passing through. My question is how to handle border cases, ie where the regular expression I'm searching for would match between two successive socket A read and socket B write iterations.
char buffer[4096];
int socket_A, socket_B;
/* Setting up the connection goes here */
for(;;) {
    ssize_t n = recv(socket_A, buffer, sizeof buffer, 0);
    if (n <= 0)
        break; /* error, or socket A was closed by the peer */
    /* Inspect, and possibly modify, the n bytes in buffer */
    send(socket_B, buffer, (size_t)n, 0);
    /* Oops, the matches I was looking for were at the end of buffer,
     * and will be at the beginning of buffer next iteration :( */
}
My suggestion: have two buffers, and rotate between them:
1. Recv buffer 1
2. Recv buffer 2
3. Process.
4. Send buffer 1
5. Recv buffer 1
6. Process, but with buffer 2 before buffer 1.
7. Send buffer 2
8. Goto 2.
Or something like that?
Assuming you know the maximum length M of the possible regular expression matches (or can live with an arbitrary value - or just use the whole buffer), you could handle it by not passing on the full buffer but keep M-1 bytes back. In the next iteration put the new received data at the end of the M-1 bytes and apply the regular expression.
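A sketch of the hold-back idea, using a fixed string of length M in place of a regular expression to keep the example short. The names scanner and scanner_feed, and the two size limits, are made up for illustration. The bytes kept in tail are the M-1 bytes a proxy would withhold from socket B until the next chunk arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAT_MAX   64    /* hypothetical limit on the pattern length M */
#define CHUNK_MAX 4096  /* hypothetical limit on bytes per call */

struct scanner {
    char   tail[PAT_MAX];  /* up to M-1 trailing bytes carried over */
    size_t tail_len;
};

/* Feed one received chunk. Returns how many occurrences of pat (length m)
 * end inside this chunk, including ones that began in an earlier chunk.
 * The same pat and m must be used for every call on a given scanner. */
size_t scanner_feed(struct scanner *s, const char *pat, size_t m,
                    const char *data, size_t n)
{
    char work[PAT_MAX + CHUNK_MAX];
    size_t total, matches = 0;

    assert(s != NULL && pat != NULL && data != NULL);
    assert(m >= 1 && m <= PAT_MAX && n <= CHUNK_MAX);

    /* Put the new data right after the bytes held back last time. */
    memcpy(work, s->tail, s->tail_len);
    memcpy(work + s->tail_len, data, n);
    total = s->tail_len + n;

    for (size_t i = 0; i + m <= total; i++)
        if (memcmp(work + i, pat, m) == 0)
            matches++;

    /* Hold back at most M-1 bytes: too short to contain a whole match,
     * so nothing is ever counted twice. */
    s->tail_len = total < m - 1 ? total : m - 1;
    memcpy(s->tail, work + total - s->tail_len, s->tail_len);
    return matches;
}
```

Starting from struct scanner s = {0}; feeding "hot do" and then "g and dog" with the pattern "dog" reports no match for the first chunk and two for the second, one of which straddles the boundary.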
If you know the format of the data transmitted (e.g. HTTP), you should be able to parse the contents to know when you have reached the end of the communication and should send out the trailing bytes you may have cached. If you do not know the format, then you'd need to implement a timeout in the recv so that you do not hold on to the end of the communication for too long. What counts as too long is something that you will have to decide on your own.
You need to know and/or say something about your regular expression.
Depending on the regular expression, you might need to buffer a lot more than you are buffering now.
A worst case scenario might be something like a regular expression which says, "find everything, starting from the beginning up until the first occurrence of the word 'dog', and replace that with something else". If you have a regular expression like that, then you need to buffer (without forwarding) everything from the beginning until the first occurrence of the word 'dog', which might never happen, i.e. there might be an infinite amount to buffer.
In the sense you're talking about (and in all senses for, say, TCP), sockets are streams. It follows from your question that you have some structure in the data. So you must do something similar to the following:
1. Buffer (hold) incoming data until a boundary is reached. The boundary might be end-of-line, end-of-record, or any other way that you know that your regex will match.
2. When a "record" is ready, process it and place the results in an output buffer.
3. Write anything accumulated in the output buffer.
That handles most cases. If you have one of the rare cases where there's really no "record" then you have to build some sort of state machine (DFA). By this I mean you must be able to accumulate data until either a) it can't possibly match your regex, or b) it's a completed match.
EDIT:
If you're matching fixed strings instead of a true regex then you should be able to use the Boyer-Moore algorithm, which can actually run in sub-linear time (by skipping characters). If you do it right, as you move over the input you can throw previously seen data to the output buffer as you go, decreasing latency and increasing throughput significantly.
Basically, the problem with your code is that the recv/send loop is operating on a lower network layer than your modifications. How you solve this problem depends on what modifications you're making, but it probably involves buffering data until all local modifications can be made.
EDIT: I don't know of any regex library that can filter a stream like that. How hard this is going to be will depend on your regex and the protocol it's filtering.
One alternative is to use a poll(2)-like strategy with non-blocking sockets. On a read event, grab a buffer from the socket, push it onto the incoming queue, and call the lexer/parser/matcher that assembles the buffers into a stream and then pushes chunks onto the output queue. On a write event, take a chunk from the output queue, if any, and write it to the socket. This sounds kind of complicated, but it's not really once you get used to the inverted control model.