I'm writing Linux client-server programs in C that communicate with each other over Unix domain sockets, passing a couple of buffers each time.
I'm using I/O vectors, but for some reason the server program only receives the first I/O vector.
Any ideas?
I've attached the relevant code snippets.
Client code:
struct iovec iov[2];
struct msghdr mh;
ssize_t n;
char str1[] = "abc";
char str2[] = "1234";
iov[0].iov_base = (caddr_t)str1;
iov[0].iov_len = sizeof(str1);
iov[1].iov_base = (caddr_t)str2;
iov[1].iov_len = sizeof(str2);
memset(&mh, 0, sizeof(mh));
mh.msg_iov = iov;
mh.msg_iovlen = 2;
n = sendmsg(sockfd, &mh, 0); /* no flags used*/
if (n > 0) {
printf("Sendmsg successfully executed\n");
}
Server code:
{
struct sockaddr_un *client_sockaddr = (struct sockaddr_un *)opq;
struct msghdr msg;
struct iovec io[2];
char buf[16];
char buf2[16];
io[0].iov_base = buf;
io[0].iov_len = sizeof(buf);
io[1].iov_base = buf2;
io[1].iov_len = sizeof(buf2);
memset(&msg, 0, sizeof(msg)); /* clear msg_name/msg_control before recvmsg() */
msg.msg_iov = io;
msg.msg_iovlen = 2;
int len = recvmsg(sock, &msg, 0);
if (len > 0) {
printf("recv: %s %zu %s %zu\n", (char *)msg.msg_iov[0].iov_base, msg.msg_iov[0].iov_len, (char *)msg.msg_iov[1].iov_base, msg.msg_iov[1].iov_len);
}
return 0;
}
The output I'm getting from the server:
recv: abc 16 16
sendmsg(), writev(), pwritev(), and pwritev2() do not operate on multiple buffers, but on one discontiguous buffer. They operate exactly as if you'd allocate a large enough temporary buffer, gather the data there, and then do the corresponding syscall on the single temporary buffer.
Their counterparts recvmsg(), readv(), preadv(), and preadv2() similarly do not operate on multiple buffers, only on one discontiguous buffer. They operate exactly as if you'd allocate a large enough temporary buffer, receive data into that buffer, then scatter the data from that buffer to the discontiguous buffer parts.
Unix domain datagram (SOCK_DGRAM) and seqpacket (SOCK_SEQPACKET) sockets preserve message boundaries, but stream sockets (SOCK_STREAM) do not. That is, using a datagram or seqpacket socket you receive each message as it was sent. With a stream socket, message boundaries are lost: two consecutively sent messages can be received as a single message, and you can (at least in theory) receive a partial message now and the rest later.
You can use the Linux-specific sendmmsg() function to send several messages in one call (using the same socket). If you use a Unix domain datagram or seqpacket socket, these will then retain their message boundaries.
Each message is described using a struct mmsghdr. It contains struct msghdr msg_hdr; and unsigned int msg_len;. msg_hdr is the same as you use when sending a single message using e.g. sendmsg(); you can use more than one iovec for each message, but the recipient will receive them concatenated into a single buffer (but can scatter that buffer using e.g. recvmsg()). msg_len will be filled in by the sendmmsg() call: the number of bytes sent for that particular message, similar to the return value of e.g. sendmsg() call when no errors occur.
The return value from the sendmmsg() call is the number of messages sent successfully (which may be fewer than requested!), or -1 if an error occurs (with errno indicating the error as usual). Thus, you'll want to write a helper function or a loop around sendmmsg() to make sure you send all the messages. For portability, I recommend a helper function, because you can then provide another based on a loop around sendmsg() for use when sendmmsg() is not available.
The only real benefit of sendmmsg() is that you need fewer syscalls to send a large number of messages: it boosts efficiency in certain situations, that's all.
I am fairly new to libpcap.
I am using the library to capture packets; the code I wrote for the capture is below.
The interface I am tapping is constantly flooded with ARP packets, so there are always packets arriving at the interface, but I am unable to capture them. The interface is UP and running.
I got no error from the pcap_open_live function.
The code is in C, and I am running it on a 32-bit FreeBSD 10 machine.
void capture_packet(char* ifname, int snaplen) {
char ebuf[PCAP_ERRBUF_SIZE];
int pflag = 0;/*promiscuous mode*/
snaplen = 100;
pcap_t* pcap = pcap_open_live(ifname, snaplen, !pflag , 0, ebuf);
if(pcap!=NULL) {
printf("pcap_open_live for %s \n" ,ifname );
}
int fd = pcap_get_selectable_fd(pcap);
pcap_setnonblock(pcap, 1, ebuf);
fd_set fds;
struct timeval tv;
FD_ZERO(&fds);
FD_SET(fd, &fds);
tv.tv_sec = 3;
tv.tv_usec = 0;
int retval = select(fd + 1, &fds, NULL, NULL, &tv);
if (retval == -1)
perror("select()");
else if (retval) {
printf("Data is available now.\n");
printf("calling pcap_dispatch \n");
pcap_dispatch(pcap , -1 , (pcap_handler) callback , NULL);
}
else
printf("No data within 3 seconds.\n");
}
void
callback(u_char *unused, const struct pcap_pkthdr *h, const u_char *packet)
{
printf("got some packet \n");
}
I always get retval as 0, which means a timeout.
I don't know what is happening under the hood; I followed a tutorial and they did exactly the same thing, so I don't know what I am missing.
I also want to understand how a packet, once received at the Ethernet layer, gets copied into the BPF device opened via pcap_open_live, and how the buffer is copied from kernel space to user space.
And for how long can we tap a packet before the kernel consumes or rejects it?
The pcap_open_live() call provided 0 as the packet buffer timeout value (the fourth argument). libpcap does not specify what a value of 0 means, because different packet capture mechanisms, on different operating systems, treat that value differently.
On systems using BPF, such as the BSDs and macOS, it means "wait until the packet buffer is completely full before providing the packets." If the packet buffer is large (it defaults to about 256K on FreeBSD), and the packets are small (60 bytes for ARP packets), it may take a significant amount of time for the buffer to fill - longer than the timeout you're handing to select().
It's probably best to have a timeout value of between 100 milliseconds and 1 second, so pass an argument of somewhere between 100 and 1000, not 0.
I read on MSDN about the send() and recv() functions, and there is one thing I'm not sure I understand.
If I send a buffer of size 256, for example, and receive the first 5 bytes, then the next time I call recv(), will it point to the 6th byte and get the data from there?
For example:
char buff[256];
memcpy(buff, "hello world", 12);
send(sockfd, buff, 100, 0); // sending 100 bytes
// server side:
char buff[256];
recv(sockfd, buff, 5, 0); // now buff contains "hello"?
recv(sockfd, buff, 5, 0); // now I overwrite the data and buff contains "world"?
Thanks!
The correct way to receive into a buffer in a loop from TCP in C is as follows:
char buffer[8192]; // or whatever you like, but best to keep it large
int count = 0;
int total = 0;
while ((count = recv(socket, &buffer[total], sizeof buffer - total, 0)) > 0)
{
total += count;
// At this point the buffer is valid from 0..total-1, if that's enough then process it and break, otherwise continue
}
if (count == -1)
{
perror("recv");
}
else if (count == 0)
{
// EOS on the socket: close it, exit the thread, etc.
}
You have missed the principal detail: what kind of socket is used and what protocol is requested. With TCP, data is octet-granulated, and, yes, if 256 bytes were sent and you have read only 5 bytes, the remaining 251 wait in the socket buffer (assuming the buffer is large enough, which is true on any non-embedded system) and you can get them on the next recv(). With UDP and without MSG_PEEK, the rest of a single datagram is lost, but if MSG_PEEK is specified, the next recv() returns the datagram from the very beginning. With SCTP or another "sequential packet" protocol you get, AFAIK, the same behavior as with UDP, but I'm unsure about the Windows implementation specifics.
I am using domain sockets (AF_UNIX) to communicate between two threads for inter-process communication. I chose this to work well with libev: I use it on the recv end of the domain socket. This works very well, except that the data I am sending is a constant 4864 bytes and I cannot afford to have it fragmented. I always thought domain sockets wouldn't fragment data, but it turns out they do. When communication between the threads is at its peak, I observe the following
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 3328 actual size = 4864
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 1536 actual size = 4864
As you can see, thread 2 received the data in fragments (3328 + 1536). This is really bad for my application. Is there any way to keep it from fragmenting? I understand that IP_DONTFRAG can only be set on the AF_INET family. Can someone suggest an alternative?
Update: sendto code
ssize_t
socket_domain_writer_dgram_send(int *domain_sd, domain_packet_t *pkt) {
struct sockaddr_un remote;
unsigned long len = 0;
ssize_t ret = 0;
memset(&remote, '\0', sizeof(struct sockaddr_un));
remote.sun_family = AF_UNIX;
strncpy(remote.sun_path, DOMAIN_SOCK_PATH, sizeof(remote.sun_path) - 1);
len = strlen(remote.sun_path) + sizeof(remote.sun_family) + 1;
ret = sendto(*domain_sd, pkt, sizeof(*pkt), 0, (struct sockaddr *)&remote, sizeof(struct sockaddr_un));
if (ret == -1) {
bps_log(BPS_LOGGER_RD, ASL_LEVEL_ERR, "Domain writer could not connect send packets", errno);
}
return ret;
}
SOCK_STREAM by definition doesn't preserve message boundaries. Try again with SOCK_DGRAM or SOCK_SEQPACKET:
http://man7.org/linux/man-pages/man7/unix.7.html
On the other hand, consider that you may be passing messages larger than your architecture's page size. For example, on amd64 a memory page is 4K. If that's a problem for any reason, it might make sense to split the packets in two.
Note, however, that packets arriving fragmented is not a real issue. It's common to have a packet assembler on the receiving end of the socket. What's wrong with implementing one?
4864 + 3328 = 8192. My guess is that you're transmitting two 4864-byte packets back to back in some cases, and it's filling an 8 KB kernel buffer somewhere. IP_DONTFRAG isn't applicable because IP is not involved here — the "fragmentation" you're seeing is happening via a completely different mechanism.
If all the data you're transmitting consists of packets, you would do well to use a datagram socket (SOCK_DGRAM) instead of a stream. This should make the send() block when the kernel buffer doesn't have sufficient space to store an entire packet, rather than allowing a partial write through, and will make each recv() return exactly one packet, so you don't need to deal with framing.
send() shall return the number of bytes sent or an error code, but all the examples I found check it only for error codes, not for the number of bytes sent.
//typical example
int cnt=send(s,query,strlen(query),0);
if (cnt < 0) return(NULL);
//Hey, what about cnt < strlen(query)?
Q: Does "send()" always return the whole buffer?
A: No, not necessarily.
From Beej's Guide:
* http://beej.us/guide/bgnet/html/multi/syscalls.html#sendrecv
send() returns the number of bytes actually sent out—this might be
less than the number you told it to send! See, sometimes you tell it
to send a whole gob of data and it just can't handle it. It'll fire
off as much of the data as it can, and trust you to send the rest
later. Remember, if the value returned by send() doesn't match the
value in len, it's up to you to send the rest of the string. The good
news is this: if the packet is small (less than 1K or so) it will
probably manage to send the whole thing all in one go. Again, -1 is
returned on error, and errno is set to the error number.
Q: Does "recv()" always read the whole buffer?
A: No, absolutely not. You should never assume the buffer you've received is "the whole message", or that what you received comes from one single message.
Here's a good, short explanation. It's for Microsoft/C#, but it's applicable to all sockets I/O, in any language:
http://blogs.msdn.com/b/joncole/archive/2006/03/20/simple-message-framing-sample-for-tcp-socket.aspx
The answer is in another section of man 2 send:
When the message does not fit into the send buffer of the socket,
send() normally blocks, unless the socket has been placed in nonblock‐
ing I/O mode. In nonblocking mode it would fail with the error EAGAIN
or EWOULDBLOCK in this case. The select(2) call may be used to deter‐
mine when it is possible to send more data.
Or, alternatively, the POSIX version (man 3p send):
If space is not available at the sending socket to hold the message to
be transmitted, and the socket file descriptor does not have O_NONBLOCK
set, send() shall block until space is available. If space is not
available at the sending socket to hold the message to be transmitted,
and the socket file descriptor does have O_NONBLOCK set, send() shall
fail. The select() and poll() functions can be used to determine when
it is possible to send more data.
So, while a read of partial data is common, a partial send in blocking mode should not happen (barring implementation details).
Nope, it doesn't.
For reference, see the man page for send:
When the message does not fit into the send buffer of the socket, send()
normally blocks, unless the socket has been placed in nonblocking I/O mode.
In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this
case. The select(2) call may be used to determine when it is possible to send
more data.
I've read through this question and other two related questions:
When a non-blocking send() only transfers partial data, can we assume it would return EWOULDBLOCK the next call?
Blocking sockets: when, exactly, does "send()" return?
I found that not all answers reach a consensus, and one or two reach opposite conclusions.
So I spent quite some time searching in books and playing with this code by @Damon that he posted in a comment on https://stackoverflow.com/a/19400029/5983841 .
I think most answers are wrong, and my conclusion is:
A call to send has these possible outcomes:
There is at least one byte available in the send buffer:
1.1 → if send is blocking (the fd is not set as non-blocking and MSG_DONTWAIT is not specified in send), send blocks until there's enough room for the whole buffer to fit, and send the whole buffer.
1.2 → if send is non-blocking (fd set as non-blocking or MSG_DONTWAIT is specified in send), send returns the number of bytes accepted (possibly fewer than you asked for).
The send buffer is completely full at the time you call send.
→ if the socket is blocking, send blocks
→ if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
An error occurred (e.g. the user pulled the network cable, connection reset by peer) → send fails with another error
#1.1 conforms to man 2 send:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.
partial recv is easy to understand, while for partial send (from The Linux Programming Interface):
61.1 Partial Reads and Writes on Stream Sockets
...
A partial write may occur if there is insufficient buffer space to transfer all of the requested bytes and one of the following is true:
A signal handler interrupted the write() call (Section 21.5) after it transferred some of the requested bytes.
The socket was operating in nonblocking mode (O_NONBLOCK), and it was possible to transfer only some of the requested bytes.
An asynchronous error occurred after only some of the requested bytes had
been transferred. By an asynchronous error, we mean an error that occurs asynchronously with respect to the application’s use of calls in the sockets API. An asynchronous error can arise, for example, because of a problem with a TCP connection, perhaps resulting from a crash by the peer application.
In all of the above cases, assuming that there was space to transfer at least 1 byte, the write() is successful, and returns the number of bytes that were transferred to the output buffer.
...
(The case of signal interruption rarely happens in practice, and I had difficulty writing code to prove a partial write in this case. I hope someone can help.)
What's not made clear enough of man 2 send :
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.
In nonblocking mode it would fail with the error EAGAIN or EWOULDBLOCK in this case.
is that in nonblocking mode it fails only if the buffer is completely full. If there is even 1 byte available in the send buffer, it won't fail but instead returns the number of bytes that were sent, a.k.a. a partial send. (The author of the book is also the maintainer of the Linux man-pages: https://www.kernel.org/doc/man-pages/maintaining.html .)
Proof in code, written by @Damon. I modified 3-5 lines so that the server never consumes any packets, in order to demonstrate.
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>
int create_socket(bool server = false)
{
addrinfo hints = {};
addrinfo* servinfo;
int sockfd = -1;
int rv;
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = server ? AI_PASSIVE : 0;
if ((rv = getaddrinfo(server ? 0 : "localhost", "12345", &hints, &servinfo)))
{
printf("getaddrinfo failed: %s\n", gai_strerror(rv));
exit(1);
}
for(auto p = servinfo; p; p = p->ai_next)
{
if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1)
{
perror("socket");
continue;
}
if(server)
{
int yes = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
if(bind(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
close(sockfd);
perror("bind");
continue;
}
}
else
{
if(connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
close(sockfd);
perror("connect");
continue;
}
else
puts("client: connected");
}
break;
}
freeaddrinfo(servinfo);
return sockfd;
}
void server()
{
int socket = create_socket(true);
if(listen(socket, 5) == -1)
{
perror("listen");
exit(1);
}
puts("server: listening");
int conn = -1;
sockaddr_storage addr;
socklen_t sizeof_addr = sizeof(addr);
for(;;)
{
if((conn = accept(socket, (sockaddr *) &addr, &sizeof_addr)) == -1)
{
perror("accept");
}
else
{
puts("server: accept");
if(!fork()) // actually not necessary, only got 1 client
{
close(socket);
// char *buf = new char[1024*1024];
// read(conn, buf, 1024*1024); // black hole
// server never reads
break;
}
}
}
}
void do_send(int socket, const char* buf, unsigned int size/*, bool nonblock = false */)
{
unsigned int sent = 0;
unsigned int count = 0;
while(sent < size)
{
int n = send(socket, &buf[sent], size - sent, 0);
// int n = send(socket, &buf[sent], size - sent, MSG_DONTWAIT);
if (n == -1)
{
if(errno == EAGAIN)
{
printf(".");
printf("\n");
}
else
{
perror("\nsend");
return;
}
}
else
{
sent += n;
printf(" --> sent a chunk of %u bytes (send no. %u, total sent = %u)\n", n, ++count, sent);
}
}
}
void client()
{
const unsigned int max_size = 64*1024*1024; // sending up to 64MiB in one call
sleep(1); // give server a second to start up
int socket = create_socket();
unsigned int send_buffer_size = 0;
socklen_t len = sizeof(send_buffer_size);
if(getsockopt(socket, SOL_SOCKET, SO_SNDBUF, &send_buffer_size, &len))
perror("getsockopt");
// Linux internally doubles the buffer size, and getsockopt reports the doubled size
printf("send buffer size = %u (doubled, actually %u)\n", send_buffer_size, send_buffer_size/2);
if(socket == -1)
{
puts("no good");
exit(1);
}
char *buf = new char[max_size]; // uninitialized contents, but who cares
for(unsigned int size = 65536; size <= max_size; size += 16384)
{
printf("attempting to send %u bytes\n", size);
do_send(socket, buf, size);
}
puts("all done");
delete[] buf;
}
int main()
{
setvbuf(stdout, NULL, _IONBF, 0);
if(fork() > 0) server(); else client();
return 0;
}
compile and run
g++ -g -Wall -o send-blocking-and-server-never-read code-of-damon.cpp
./send-blocking-and-server-never-read > log1.log 2>&1
log1.log content
server: listening
client: connectedserver: accept
send buffer size = 2626560 (doubled, actually 1313280)
attempting to send 65536 bytes
--> sent a chunk of 65536 bytes (send no. 1, total sent = 65536)
attempting to send 81920 bytes
--> sent a chunk of 81920 bytes (send no. 1, total sent = 81920)
attempting to send 98304 bytes
--> sent a chunk of 98304 bytes (send no. 1, total sent = 98304)
attempting to send 114688 bytes
--> sent a chunk of 114688 bytes (send no. 1, total sent = 114688)
attempting to send 131072 bytes
--> sent a chunk of 131072 bytes (send no. 1, total sent = 131072)
attempting to send 147456 bytes
--> sent a chunk of 147456 bytes (send no. 1, total sent = 147456)
attempting to send 163840 bytes
--> sent a chunk of 163840 bytes (send no. 1, total sent = 163840)
attempting to send 180224 bytes
--> sent a chunk of 180224 bytes (send no. 1, total sent = 180224)
attempting to send 196608 bytes
--> sent a chunk of 196608 bytes (send no. 1, total sent = 196608)
attempting to send 212992 bytes
--> sent a chunk of 212992 bytes (send no. 1, total sent = 212992)
attempting to send 229376 bytes
--> sent a chunk of 229376 bytes (send no. 1, total sent = 229376)
attempting to send 245760 bytes
--> sent a chunk of 245760 bytes (send no. 1, total sent = 245760)
attempting to send 262144 bytes
--> sent a chunk of 262144 bytes (send no. 1, total sent = 262144)
attempting to send 278528 bytes
--> sent a chunk of 278528 bytes (send no. 1, total sent = 278528)
attempting to send 294912 bytes
Then comment out int n = send(socket, &buf[sent], size - sent, 0); and uncomment int n = send(socket, &buf[sent], size - sent, MSG_DONTWAIT);
compile and run again
g++ -g -Wall -o send-nonblocking-and-server-never-read code-of-damon.cpp
./send-nonblocking-and-server-never-read > log2.log 2>&1
log2.log content
server: listening
server: accept
client: connected
send buffer size = 2626560 (doubled, actually 1313280)
attempting to send 65536 bytes
--> sent a chunk of 65536 bytes (send no. 1, total sent = 65536)
attempting to send 81920 bytes
--> sent a chunk of 81920 bytes (send no. 1, total sent = 81920)
attempting to send 98304 bytes
--> sent a chunk of 98304 bytes (send no. 1, total sent = 98304)
attempting to send 114688 bytes
--> sent a chunk of 114688 bytes (send no. 1, total sent = 114688)
attempting to send 131072 bytes
--> sent a chunk of 131072 bytes (send no. 1, total sent = 131072)
attempting to send 147456 bytes
--> sent a chunk of 147456 bytes (send no. 1, total sent = 147456)
attempting to send 163840 bytes
--> sent a chunk of 163840 bytes (send no. 1, total sent = 163840)
attempting to send 180224 bytes
--> sent a chunk of 180224 bytes (send no. 1, total sent = 180224)
attempting to send 196608 bytes
--> sent a chunk of 196608 bytes (send no. 1, total sent = 196608)
attempting to send 212992 bytes
--> sent a chunk of 212992 bytes (send no. 1, total sent = 212992)
attempting to send 229376 bytes
--> sent a chunk of 229376 bytes (send no. 1, total sent = 229376)
attempting to send 245760 bytes
--> sent a chunk of 245760 bytes (send no. 1, total sent = 245760)
attempting to send 262144 bytes
--> sent a chunk of 262144 bytes (send no. 1, total sent = 262144)
attempting to send 278528 bytes
--> sent a chunk of 278528 bytes (send no. 1, total sent = 278528)
attempting to send 294912 bytes
--> sent a chunk of 178145 bytes (send no. 1, total sent = 178145)
.
.
.
.
.
.
// endless .
Compare the last output of log1.log and log2.log and you can tell that a blocking send blocks when there is not enough buffer space to fit all 294912 bytes, while a non-blocking send performs a partial write. This conforms to conclusion #1.
Special thanks to @user207421's different opinion, which led me to do more searching.