I have a client-server program written in C. The intent is to see how fast big data can be transported over TCP. The receiving side OS (Ubuntu Linux 14.*) is tuned to improve TCP performance, following the documentation on TCP socket buffers and window scaling, as below:
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
Apart from this, I have also increased the individual socket buffer size through a setsockopt call.
But I am not seeing the program respond to these changes - the overall throughput is either flat or even reduced at times. When I took a tcpdump on the receiving side, I see a monotonous pattern of TCP packets of length 1368 arriving, in most (99%) cases.
19:26:06.531968 IP <SRC> > <DEST>: Flags [.], seq 25993:27361, ack 63, win 57, options [nop,nop,TS val 196975830 ecr 488095483], length 1368
As per the documentation, the TCP window scaling option increases the receive window size in proportion to demand and capacity - but all I see is "win 57" - very few bytes remaining in the receive buffer, which does not match my expectation.
Hence I started suspecting my assumptions about the tuning itself, and I have these questions:
Are there any specific tunables required on the sending side to improve reception on the client side? Is making sure that the program writes the whole chunk of data in one go not enough?
Are the client-side tunables mentioned above necessary and sufficient? The defaults in the system are too low, but I don't see the changes applied in /etc/sysctl.conf having any effect. Is running sysctl --system after the changes sufficient to make them take effect, or do we need to reboot the system?
If the OS is a virtual machine, will these tunables be fully effective, or are there additional steps needed on the real physical machine?
I can share the source code if that helps, but I can guarantee that it is just trivial code.
Here is the code:
# cat client.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <string.h>
#define SIZE (1024 * 1024 * 32)

int main() {
    int s;
    /* static: a 32 MB buffer on the stack would overflow it */
    static char buffer[SIZE];
    struct sockaddr_in sa;
    socklen_t addr_size;

    s = socket(PF_INET, SOCK_STREAM, 0);

    /* set SO_RCVBUF before connect(), so it can influence the
       window advertised during the handshake */
    int rbl = 1048576;
    int g = setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rbl, sizeof(rbl));

    sa.sin_family = AF_INET;
    sa.sin_port = htons(25000);
    sa.sin_addr.s_addr = inet_addr("<SERVERIP>");
    memset(sa.sin_zero, '\0', sizeof sa.sin_zero);
    addr_size = sizeof sa;
    connect(s, (struct sockaddr *) &sa, addr_size);

    while (1) {
        int ret = read(s, buffer, SIZE);
        if (ret <= 0) break;
    }
    return 0;
}
And the server code:
bash-4.1$ cat server.c
#include <sys/types.h>
#include <sys/mman.h>
#include <memory.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>   /* inet_addr */
#include <unistd.h>      /* close */

#define SIZE (32 * 1024 * 1024)

int main() {
    int fdsocket;
    struct sockaddr_in sock;

    fdsocket = socket(AF_INET, SOCK_STREAM, 0);
    int rbl = 1048576;
    int g = setsockopt(fdsocket, SOL_SOCKET, SO_SNDBUF, &rbl, sizeof(rbl));

    sock.sin_family = AF_INET;
    sock.sin_addr.s_addr = inet_addr("<SERVERIP>");
    sock.sin_port = htons(25000);
    memset(sock.sin_zero, '\0', sizeof sock.sin_zero);

    g = bind(fdsocket, (struct sockaddr *) &sock, sizeof(sock));
    if (g == -1) {
        fprintf(stderr, "bind error: %d\n", errno);
        exit(1);
    }
    int p = listen(fdsocket, 1);

    char *buffer = (char *) mmap(NULL, SIZE, PROT_WRITE | PROT_READ,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED) {   /* mmap returns MAP_FAILED, not -1 */
        fprintf(stderr, "%d\n", errno);
        exit(-1);
    }
    memset(buffer, 0xc, SIZE);

    int connfd = accept(fdsocket, (struct sockaddr *) NULL, NULL);
    rbl = 1048576;
    g = setsockopt(connfd, SOL_SOCKET, SO_SNDBUF, &rbl, sizeof(rbl));

    /* note: a single write() is not guaranteed to accept all SIZE
       bytes; a loop is needed to reliably send everything */
    int wr = write(connfd, buffer, SIZE);
    close(connfd);
    return 0;
}
There are many tunables, but whether they have an effect, and whether that effect is positive or negative, also depends on the situation. What are the defaults for the tunables? The values you set might actually be lower than the defaults on your OS, thereby decreasing performance. But larger buffers might sometimes also be detrimental, because more RAM is used and it might no longer fit into cache memory. It also depends on your network itself: is it wired or wireless, how many hops are there, what kind of routers are in between? But sending data in as large chunks as possible is usually the right thing to do.
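As a quick check of what your system actually gives a socket, here is a minimal sketch that just reads back the effective receive buffer of a fresh TCP socket. (Note that on Linux, getsockopt reports double the value passed to setsockopt, because the kernel doubles SO_RCVBUF to account for bookkeeping overhead.)

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int rcv = 0;
    socklen_t len = sizeof(rcv);
    /* with no setsockopt, this shows the default from net.ipv4.tcp_rmem */
    getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcv, &len);
    printf("effective SO_RCVBUF: %d bytes\n", rcv);
    return 0;
}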
One tunable you have missed is the congestion control algorithm, which you can tune with net.ipv4.tcp_congestion_control. Which ones are available depends on your kernel, and which one is best depends on your network and the kind of traffic you are sending.
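You can also select the algorithm per socket with the Linux-specific TCP_CONGESTION option. A minimal sketch, assuming cubic is compiled into your kernel (check /proc/sys/net/ipv4/tcp_available_congestion_control for what you actually have):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);

    const char *algo = "cubic";   /* assumption: cubic is available */
    if (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
        perror("setsockopt(TCP_CONGESTION)");

    /* read back what the socket is really using */
    char buf[64];
    socklen_t len = sizeof(buf);
    if (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == 0)
        printf("congestion control: %s\n", buf);
    return 0;
}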
Another thing is that TCP has two endpoints, and tunables on both sides are important.
The changes made with sysctl take effect immediately for new TCP connections.
The TCP parameters only have an effect on the endpoints of a TCP connection, so you don't have to change them on the VM host. But running in a guest means that the packets it sends still need to be processed by the host in some way (if only to forward them to the real physical network interface). It will always be slower to run your test from inside a virtual machine than if you ran it on a physical machine.
What I'm missing is any benchmark numbers that you can compare with the actual network speed. Is there room for improvement at all? Maybe you are already at the maximum speed that is possible? In that case no amount of tuning will help. Note that the defaults are normally very reasonable.
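To get such a number from the client above, here is a sketch of the kind of timing you could wrap around the read loop (take one clock_gettime(CLOCK_MONOTONIC, ...) timestamp before the loop and one after, and keep a running total of the bytes read):

#include <time.h>

/* total = bytes received; t0/t1 = timestamps taken just before and
   just after the read loop */
double throughput_mbit(long long total, struct timespec t0, struct timespec t1) {
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (total * 8.0) / (secs * 1e6);   /* megabits per second */
}

Comparing that figure with the link speed tells you whether there is any headroom left to tune for.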
Related
I'm required to make a 'height sensing subsystem' that reads the data sent from a moonlander using a UDP protocol. The client is already set up for me as a 64-bit Linux executable, run using ./simulator. So I need to make a UDP server on Linux to connect with the client.
The client sends readings from many subsystems in the moonlander, but I only need to read one of them: the laser altimeter reading, which corresponds to the type 0xaa01. There are other types such as 0xaa## and 0xff##, but I assume those correspond to different subsystems of the moonlander. The data sent from ./simulator is tagged with the type, which I need to decode to find whether it is the laser altimeter, and then I need to decode the values and convert them into a distance to find when the moonlander has touched down. I need to read the time first, which is a 4-byte unsigned 32-bit integer; the laser altimeter reading is 3 unsigned 16-bit integers that correspond to 3 different measurements (there are 3 different sensors on the altimeter; max height is 1000 m; convert by dividing the raw value by 65.535, i.e. UINT16_MAX / 1000, and multiplying by 100 to get cm). I then need to take those readings, convert them into a height, and acknowledge that we've landed once we've hit 40 cm from the ground.
How do I read the data from ./simulator? The problem is that when I run my ./receiver binary, it stops at the recvfrom() call in my code below. The instructions tell me to connect to port 12778, and the bind succeeds, but I'm not receiving anything.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
// Create a UDP datagram socket
int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
    {
        perror("Can't create socket");
        exit(EXIT_FAILURE);
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;            // use IPv4
    addr.sin_addr.s_addr = INADDR_ANY;    // bind to all interfaces
    addr.sin_port = htons(12778);         // the port we want to bind

    // Bind to the port specified above
    if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0)
    {
        perror("Can't bind");
        exit(EXIT_FAILURE);
    }
    // newline added: without it stdout stays buffered and "here"
    // never appears before the program blocks in recvfrom()
    printf("here\n");

    // Wait for data on our port (this is blocking; MSG_WAITALL
    // has no effect on datagram sockets)
    char buffer[4096];
    ssize_t n = recvfrom(fd, buffer, sizeof(buffer), MSG_WAITALL, NULL, NULL);
    printf("Received %zd bytes!\n", n);
    return 0;
}
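For what it's worth, here is a minimal sketch of how one such datagram might be decoded. The field order (2-byte type, then the 4-byte time, then the three 16-bit readings) and big-endian byte order are assumptions; check the assignment's framing spec:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs, ntohl */

#define TYPE_LASER_ALTIMETER 0xaa01   /* type described in the question */

/* buf/n: one received datagram; returns 1 and fills heights_cm[3]
   if this is a laser altimeter packet, 0 otherwise */
int decode_altimeter(const unsigned char *buf, size_t n, double heights_cm[3]) {
    if (n < 12)                       /* 2 + 4 + 3*2 bytes assumed */
        return 0;

    uint16_t type;
    memcpy(&type, buf, 2);
    if (ntohs(type) != TYPE_LASER_ALTIMETER)
        return 0;                     /* some other subsystem */

    uint32_t time_raw;
    memcpy(&time_raw, buf + 2, 4);
    uint32_t time = ntohl(time_raw);

    for (int i = 0; i < 3; i++) {
        uint16_t raw;
        memcpy(&raw, buf + 6 + 2 * i, 2);
        /* raw / 65.535 gives metres (1000 m full scale); * 100 -> cm */
        heights_cm[i] = ntohs(raw) / 65.535 * 100.0;
    }
    printf("t=%u height[0]=%.1f cm\n", time, heights_cm[0]);
    return 1;
}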
I am able to get the MSS value from getsockopt:
tcpmss.c:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <stdio.h>
int main()
{
    int sockfd, mss;
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        perror("sockfd");
        return 1;
    }
    socklen_t len = sizeof(mss);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
    {
        perror("getsockopt");
        return 1;
    }
    printf("maximum segment size: %d\n", mss);
    return 0;
}
output:
maximum segment size: 536
Other sources say the default MSS is 1460. But if I try to check it from a client:
client.c:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>
#define GET_CMD "GET %s HTTP/1.0\r\n\r\n"
#define SERV "80"
#define HOST "google.com"
#define HOMEPG "/"
//BUFSIZ = 8192, defined in <stdio.h>
int main()
{
    int sockfd, nbytes;
    struct addrinfo hints, *res;
    char buf[BUFSIZ];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int err = getaddrinfo(HOST, SERV, &hints, &res);
    if (err != 0)
    {
        /* getaddrinfo does not set errno, so perror would be misleading */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return 1;
    }
    if ((sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol)) < 0)
    {
        perror("socket");
        return 1;
    }
    if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0)
    {
        perror("connect");
        return 1;
    }
    freeaddrinfo(res);

    nbytes = snprintf(buf, sizeof(buf), GET_CMD, HOMEPG);
    if (write(sockfd, buf, nbytes) < 0)
    {
        perror("write");
        return 1;
    }
    while ((nbytes = read(sockfd, buf, BUFSIZ)) > 0)
    {
        printf("read %d bytes of home page of %s\n", nbytes, HOST);
    }
    if (nbytes == 0)
    {
        printf("got EOF from %s\n", HOST);
    }
    close(sockfd);
    return 0;
}
output:
read 8192 bytes of home page of google.com
read 3888 bytes of home page of google.com
read 7248 bytes of home page of google.com
read 4832 bytes of home page of google.com
read 6040 bytes of home page of google.com
read 6040 bytes of home page of google.com
read 6040 bytes of home page of google.com
read 4832 bytes of home page of google.com
read 2229 bytes of home page of google.com
got EOF from google.com
Neither of those values matches. So I am a little confused about the maximum segment size. I know that read() blocks and that the kernel keeps fetching TCP segments into its receive buffer, so I cannot see the true segment size from the read() syscall. But how do I then determine the window agreed between the peers, which should correspond to the MSS? Here on the first read() I got a full buffer (BUFSIZ == 8192), then not even half, etc.
How do I determine (all from my example):
MSS
propagated window between peers (and its relation to MSS)
how much segment size changes between each send operation (and why)
It's a relatively big question to answer, since it touches on many things.
Before getting deep into the answer, I think the most important thing to understand is that the network is a complex and long path. We divide the path into several layers, and every layer may use different protocols, which makes things more complex. So when you find an interesting behaviour in TCP, sometimes you also need to look at the lower/upper layers to see the whole path.
First, tcpdump and wireshark are the usual tools for analysing the network in detail; they can help you understand networking deeply.
As for MSS, it means Maximum Segment Size and applies to the TCP layer. There is another limit, the maximum transmission unit (MTU), which is the size of the largest protocol data unit (PDU) that can be communicated in a single network-layer transaction; the MSS always has to be a little smaller than the MTU to leave room for the headers (for IPv4 over Ethernet: 1500 - 20 bytes IP header - 20 bytes TCP header = 1460).
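This also explains the 536 in your output: TCP_MAXSEG on a not-yet-connected socket reports the protocol default (536, i.e. the 576-byte minimum IPv4 reassembly size minus 40 bytes of headers); only after connect() does it reflect the MSS negotiated with the peer. A minimal sketch (example.com:80 is just a placeholder endpoint):

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

int main(void) {
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0)
        return 1;

    int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    int mss;
    socklen_t len = sizeof(mss);
    getsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, &len);
    printf("before connect: %d\n", mss);      /* typically 536 */

    if (connect(s, res->ai_addr, res->ai_addrlen) == 0) {
        len = sizeof(mss);
        getsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, &len);
        printf("after connect:  %d\n", mss);  /* negotiated MSS */
    }
    freeaddrinfo(res);
    close(s);
    return 0;
}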
As for the TCP window, there are many different factors to consider. A window is used because TCP needs ACKs to achieve reliable transmission, and the send/receive windows track that (they also exist for other reasons). But as network traffic grew, TCP gained congestion control, so the effective window is also limited by the congestion window. MSS/MTU is used as a factor to calculate default values, but after that, many protocols and algorithms work together to adjust the window so that the TCP connection stays reliable and efficient.
For large chunks of data, TCP splits them into segments on sending and reassembles the stream on receiving; this is also why a single read() can return more bytes than one segment carries. What's more, hardware can do this too. There are many technologies like TSO (TCP Segmentation Offload), UFO (UDP Fragmentation Offload), GSO (Generic Segmentation Offload), LRO (Large Receive Offload) and GRO (Generic Receive Offload).
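If you want to check from C whether such an offload is active on an interface, one possible sketch uses the Linux-specific SIOCETHTOOL ioctl, which is roughly what ethtool -k does (eth0 is a placeholder interface name; treat this as a sketch, not a reference):

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works as handle */
    struct ethtool_value ev = { .cmd = ETHTOOL_GGSO };  /* query GSO state */
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&ev;
    if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
        printf("GSO: %s\n", ev.data ? "on" : "off");
    else
        perror("SIOCETHTOOL");
    return 0;
}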
So you see, it's really interesting and complex.
In prep for my first time coding UDP, I'm trying out some example client and server code copied and lightly modified from here. Everything seems to be working except that the value returned by recvfrom() is always the size of the buffer instead of the number of bytes read (if I change my buffer size and recompile, the reported bytes received changes to match the new buffer size although the bytes sent are the same 10 bytes in every test).
Does anyone see any error(s) in this code that would explain the problem (some error-checking removed here for conciseness)? In case it's relevant, I'm compiling and running in bash in a Terminal window on a Macbook Pro running Yosemite 10.10.5:
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#define BUFLEN 1024
#define PORT 9930
int main(void) {
    struct sockaddr_in si_me, si_other;
    int s;
    socklen_t slen = sizeof(si_other);   /* must be socklen_t, not int */
    ssize_t nrecv;
    char buf[BUFLEN];

    s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    memset((char *) &si_me, 0, sizeof(si_me));
    si_me.sin_family = AF_INET;
    si_me.sin_port = htons(PORT);
    si_me.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(s, (struct sockaddr *) &si_me, sizeof(si_me));

    while (1) {
        slen = sizeof(si_other);   /* reset before every call */
        nrecv = recvfrom(s, buf, BUFLEN, 0, (struct sockaddr *) &si_other, &slen);
        printf("Received packet from %s:%d\n%zd bytes rec'd\n\n",
               inet_ntoa(si_other.sin_addr), ntohs(si_other.sin_port), nrecv);
    }
}
recvfrom truncates datagrams to the size of your buffer when the buffer is not large enough.
The fact that recvfrom returns the buffer size implies that your buffer is not big enough; try increasing it to, say, 65535 bytes - the maximum theoretical UDP datagram size.
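On Linux you can also detect this directly: with the MSG_TRUNC flag, recvfrom() returns the datagram's real length even when it had to be truncated to fit the buffer (this is Linux behaviour; BSD/macOS treats the flag differently). A small sketch:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Receives one datagram into buf and reports if it was truncated.
   MSG_TRUNC semantics here are Linux-specific. */
ssize_t recv_checked(int s, char *buf, size_t buflen) {
    ssize_t n = recvfrom(s, buf, buflen, MSG_TRUNC, NULL, NULL);
    if (n > (ssize_t) buflen)
        fprintf(stderr, "datagram truncated: %zd bytes arrived, %zu kept\n",
                n, buflen);
    return n;
}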
I'm working on a real-time project on Debian Wheezy (with the real-time patch), and it needs strong reactivity over the TCP communication protocol.
When I send a request, the response time is too long (220 µs) and I don't understand why.
My problem is that when I send a request, my application server answers too late for my needs.
So, I decided to write a short program using TCP socket to acquire my server's response time. (see code below)
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
int main(int argc, char* argv[])
{
    char sendBuffer[] = "OK";
    char buffer[10];
    int socket1;
    int workingSocket;
    socklen_t len;
    int nodelay = 1;
    struct sockaddr_in sa1;
    struct sockaddr sa2;

    len = sizeof(sa1);
    memset(&sa1, 0, len);
    sa1.sin_addr.s_addr = htonl(INADDR_ANY);
    sa1.sin_family = AF_INET;
    sa1.sin_port = htons(12345);

    socket1 = socket(AF_INET, SOCK_STREAM, 0);
    bind(socket1, (struct sockaddr *)&sa1, len);
    listen(socket1, 10);

    len = sizeof(sa2);   /* reset: accept() needs the size of sa2 */
    workingSocket = accept(socket1, &sa2, &len);

    /* disable Nagle so the 2-byte reply is not delayed */
    setsockopt(workingSocket, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));

    // receive and send message back
    while (1)
    {
        recv(workingSocket, buffer, 5, MSG_WAITALL);
        send(workingSocket, sendBuffer, 2, 0);
    }
}
I check the response time with the following procedure:
start a wireshark session to trace the network traffic
launch my C server.
send a TCP request, for example: $ echo 'abcde' | netcat 192.168.0.1 12345
I got a response time of around 200 µs between the moment the string ("abcde") is sent and the moment I receive the response ("OK") on the socket.
This time seems very high. I ran the same experiment on VxWorks and got a response time approaching 10 µs.
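For comparison, here is a minimal client sketch that measures the same round trip at application level, taking tcpdump and netcat out of the equation (the address/port match the server above; CLOCK_MONOTONIC is assumed to be available):

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(12345);
    inet_pton(AF_INET, "192.168.0.1", &sa.sin_addr);   /* server above */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    int nodelay = 1;
    setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));
    if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return 1;

    char reply[2];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    send(s, "abcde", 5, 0);                 /* 5 bytes, as the server expects */
    recv(s, reply, 2, MSG_WAITALL);         /* wait for the 2-byte "OK" */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L
            + (t1.tv_nsec - t0.tv_nsec) / 1000;
    printf("round trip: %ld us\n", us);
    close(s);
    return 0;
}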
Is the Linux kernel really slow, or is there a trick to increase the reactivity of the system?
Thank you for your help and your advice.
I've been writing some sockets code in C. I need to modify packet headers and control how they're sent out, so I took the raw sockets approach. However, the code I wrote will not compile on BSD systems (Mac OS X/Darwin, FreeBSD, etc.).
I've done a bunch of research on this and have found that BSD systems can't handle raw sockets the way Linux (or even Windows) does. From what I've read, it seems I need to use BPF (the Berkeley Packet Filter), but I can't figure out how BPF works or how I would go about using it with raw sockets.
If someone could shed some light on this one, I'd be very excited :D
P.S. I'll even be happy with some source code showing how raw sockets are handled in a BSD environment. It doesn't have to be a guide or explanation. I just want to see how it works.
Using raw sockets isn't hard, but it's not entirely portable. For instance, both in BSD and in Linux you can send whatever you want, but in BSD you can't receive anything for which the kernel has a handler (like TCP and UDP).
Here is an example program that sends a SYN.
#include <sys/socket.h>
#include <sys/types.h>
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <sysexits.h>
int
main(int argc, char *argv[])
{
    int s;
    ssize_t rc;
    struct protoent *p;
    struct sockaddr_in sin;
    struct tcphdr tcp;

    if (argc != 2)
        errx(EX_USAGE, "%s addr", argv[0]);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = 0;

    /* Parse command line address. */
    if (inet_pton(AF_INET, argv[1], &sin.sin_addr) <= 0)
        err(EX_USAGE, "Parse address");

    /* Look up tcp although it's 6. */
    p = getprotobyname("tcp");
    if (p == NULL)
        err(EX_UNAVAILABLE, "getprotobyname");

    /* Make a new shiny (Firefly) socket. */
    s = socket(AF_INET, SOCK_RAW, p->p_proto);
    if (s < 0)
        err(EX_OSERR, "socket");

    memset(&tcp, 0, sizeof(tcp));

    /* Fill in some random stuff. */
    tcp.th_sport = htons(4567);
    tcp.th_dport = htons(80);
    tcp.th_seq = 4;             /* Chosen by fair dice roll. */
    tcp.th_ack = 0;
    tcp.th_off = 5;
    tcp.th_flags = TH_SYN;
    tcp.th_win = htons(65535);  /* th_win is 16 bits: htons, not htonl */

    /* Note: the TCP checksum is left at 0 here; the kernel does not
       fill it in on raw sockets, so most receivers will drop this
       segment. Treat this as a skeleton. */
    rc = sendto(s, &tcp, sizeof(tcp), 0, (struct sockaddr *)&sin,
        sizeof(sin));
    printf("Wrote %zd bytes\n", rc);

    return 0;
}
Of course, more BSD-specific solutions are available. For instance you could use divert(4) to intercept packets as they traverse your system and alter them.