I have a C program running on Linux that will receive data from 4 different IP addresses on the same UDP port every 250 ms (4 times a second). The data coming in on each socket is no more than 120 bytes, and uses UDP. My question is: if I use the select() Linux call, will I be able to process all the data without missing any if the data arrives on the sockets at the same time? Or would I have to use Pthreads instead?
If I do use select(), would I just have to dump the data into buffers every 250 ms and then process it after I have received all four sockets' data from select()? This assumes the processing can be completed within 250 ms; it should take only 10 ms or less.
Related
I am receiving UDP packets at a rate of 10 Mbps. Each packet is around 1109 bytes.
That works out to more than 1 packet per millisecond arriving on eth0. The recvfrom() call in C receives each packet and passes it on to Java. Java does the filtering of the packets and the necessary processing.
The bottlenecks are:
recvfrom() is too slow: fetching takes more than 10 ms, possibly because it does not get the CPU.
Passing the packet from C to Java through the interface (JNI) takes 1-2 ms.
The processing of packet in Java itself takes 0.5 to 1 second depending on if database insertion or image processing needs to be done.
So the problem is that the many delays add up, and more than half of the packets are lost.
The possible solutions could be:
Remove the need for C's recvfrom() completely and implement UDP fetching directly in Java (Note: in the first place, C's recvfrom() was implemented to receive raw packets, not UDP). This solution may reduce the JNI transfer delay.
Implement multi-threading on the UDP receive function in Java. But then an ID shall be required in the UDP packets for the sequence because in multi-threading the order of incoming packets is not guaranteed. (However, in this particular program, there is a need for packets to be ordered). This solution might be helpful in receiving all packets but the protocol which sends the data needs to be modified to add a sequence identifier. Due to multi-threading, the receiver might have higher chances to get the CPU and packets can be quickly fetched.
In Java, a blocking queue can be implemented as a huge buffer which stores the incoming packets. The Java parser can then take packets from this queue and process them. However, it is not certain whether the receiver function will be fast enough to put all the received packets in the queue without dropping any.
I would like to know which of the solutions could be optimal or a combination of the above solutions will work. Any help or suggestions would be greatly appreciated.
How long is this burst going on? Is it continuous and will go on forever? Then you need beefier hardware that can handle the load. Possibly some load-balancing where multiple servers handle the incoming data.
Does the burst only last a short while, at most a second or two? Then have the lower levels read all packets as fast as they can and put them in a queue, and let the upper levels get the messages from the queue in their own time.
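The queue between the lower and upper levels could look roughly like the fixed-size ring buffer below. This is only a sketch: the names (packet_queue, queue_push, queue_pop) and the sizes PKT_MAX and QUEUE_CAP are assumptions made for illustration, and it has no locking, so it is only safe as written when the reader and the consumer run in the same thread.

```c
#include <stddef.h>
#include <string.h>

#define PKT_MAX   1500  /* assumed maximum datagram size */
#define QUEUE_CAP 256   /* assumed capacity; size it for the longest burst */

struct packet {
    size_t len;
    unsigned char data[PKT_MAX];
};

struct packet_queue {
    struct packet slots[QUEUE_CAP];
    size_t head;   /* index of the oldest queued packet */
    size_t count;  /* number of packets currently queued */
};

void queue_init(struct packet_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Copy one packet into the queue. Returns 0, or -1 if full or oversized. */
int queue_push(struct packet_queue *q, const void *data, size_t len)
{
    if (q->count == QUEUE_CAP || len > PKT_MAX)
        return -1;
    struct packet *p = &q->slots[(q->head + q->count) % QUEUE_CAP];
    memcpy(p->data, data, len);
    p->len = len;
    q->count++;
    return 0;
}

/* Copy the oldest packet into *out. Returns 0, or -1 if the queue is empty. */
int queue_pop(struct packet_queue *q, struct packet *out)
{
    if (q->count == 0)
        return -1;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

The receive loop would call queue_push() for every datagram it reads and the slower processing code would call queue_pop() whenever it is ready. If the two sides run in different threads, the queue needs a mutex or similar protection around both calls.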
It sounds like you may be calling recvfrom() with your socket in blocking mode, in which case it will not return until the next packet arrives. If the packets are being sent at 10ms intervals, perhaps due to some delay on the sending side, then a loop of blocking recvfrom() calls would appear to take 10ms each. Set the socket to non-blocking and use something like select() to decide when to call it. Then profile everything to see where the real bottleneck lies. My bet would be on one or more of the JNI passthroughs.
Note that recvfrom() is not a "C" function, it is a system call. Java's functions just add layers on top of it.
I am building a UDP port scanner in C.
This is an outline of the code:
Create Socket
Structure raw UDP packet with port i
Send packet and wait n milliseconds for reply
I need to perform those tasks X times, depending on the number of ports to be scanned. It may be up to 65535 times.
My goal is to optimize resources, considering an i386 machine running under a 3.5.0-17-generic Linux kernel.
How many threads should be created?
How many packets should be sent inside a single thread?
Thanks for your attention.
One thread, using select, epoll or similar.
All of them. Remember to rate limit since that doesn't happen automatically with UDP.
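As a rough illustration of sending everything from one thread with a rate limit, the sketch below sends one empty datagram per port and sleeps between sends. It assumes an ordinary SOCK_DGRAM socket rather than the raw packets described in the question, and send_probes() and its gap_us parameter are hypothetical names; collecting replies with select or epoll is not shown.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Send one empty UDP datagram to every port in [first, last] on ip,
 * sleeping gap_us microseconds after each send as a crude rate limit.
 * Returns the number of successful sendto() calls, or -1 if ip is invalid.
 */
int send_probes(int fd, const char *ip, uint16_t first, uint16_t last,
                long gap_us)
{
    struct sockaddr_in dst;
    struct timespec gap = { gap_us / 1000000L, (gap_us % 1000000L) * 1000L };
    int sent = 0;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    /* A 32-bit counter avoids wrapping around when last is 65535. */
    for (uint32_t port = first; port <= last; port++) {
        dst.sin_port = htons((uint16_t)port);
        if (sendto(fd, "", 0, 0, (struct sockaddr *)&dst, sizeof dst) == 0)
            sent++;
        nanosleep(&gap, NULL);
    }
    return sent;
}
```

A fixed sleep per packet is the simplest form of rate limiting; sending in small batches with one sleep per batch is a common variation when per-packet sleeps are too coarse.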
Issue summary: AF_UNIX stable sending, bursty receiving.
I have an application B that receives data over a Unix domain datagram socket. A peer application A sends data to it. Both A and B run continuously (and are SCHED_FIFO). Application B also prints the time of reception.
The peer application A can send data at varying intervals (varying by milliseconds only). Ideally (what I expect), the spacing between sends should exactly match the spacing between receptions. For example:
A sends in time : 5ms 10ms 15ms 21ms 30ms 36ms
B should receive in time : 5+x ms 10+x ms 15+x ms 21+x ms ...
Where x is a constant delay.
But when I experimented what I observe in B is :
A sends in time : 5ms 10ms 15ms 21ms 30ms 36ms
B received in time : 5+w ms 10+x ms 15+y ms 21+z ms ...
(w, x, y and z are different delays rather than one constant.) So I cannot predict the reception time when the sending time is given.
Is it because some buffering is involved in Unix domain sockets? Please suggest a workaround so that the reception time is predictable from the send time. I need 1 millisecond accuracy.
(I am using vanilla Linux 3.0 kernel)
As you are using blocking recv(), when no datagram is available your program will be unscheduled. This is bad for your use case--you want your program to stay hot. So make your recv() non-blocking, and handle EAGAIN by simply busy waiting. This will consume 100% of one core, but I think you'll find it helps you achieve your goal.
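A minimal sketch of that busy-wait loop, assuming a datagram socket on Linux. recv_spin() is a hypothetical name, and MSG_DONTWAIT is used here to make each individual call non-blocking instead of setting O_NONBLOCK on the descriptor.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Spin until a datagram is available on fd, without ever sleeping in recv().
 * Returns the datagram length, or -1 on a real error.
 */
ssize_t recv_spin(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
        /* Nothing queued yet: try again immediately (this burns one core). */
    }
}
```

Taking the reception timestamp immediately after recv_spin() returns keeps the measurement as close as possible to the actual arrival.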
I'm using 64-bit Linux (Linux scv 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux) and have two processes using sockets which run on the same physical host.
One process (A) sends on a TCP/IP socket (would be a local socket given the host is the same) the following pieces of data:
276 bytes
16 bytes
This is done in 0.000023 seconds from process A. The data is sent by calling the send socket API twice.
Another process (B), receives the data via epoll using epoll_wait(efd, events, 10, 5). Data is received as follows (time is taken with clock_gettime(CLOCK_REALTIME, &cur_ts);, what matters is relative difference):
Read data from socket buffer at 8051.177743 (276)
Call to epoll 8051.177763 again
Read data from socket buffer 8051.216250 (16)
This makes the receiving process lag by 0.038507 seconds. Basically, although the sending process A takes less than a millisecond, epoll on the receiving side adds a lag of approximately 0.038 s before the second piece of data is received.
Is this expected? What am I doing wrong?
Or how can I improve the situation?
Thanks
Is this expected? ...
Yes. I would expect that. Here's why:
What am I doing wrong? ...
epoll was designed to be used in situations where large numbers of file descriptors need to be watched. That's what it's suitable for, and it seems to me that the situation you're using it for isn't that situation.
... how can I improve the situation?
If you want to improve the performance, use the right tool for the job. Don't use epoll for a single socket. Just use plain-old vanilla recv. If you're handling two or three sockets, consider using poll or select. If you're venturing into hundreds, then you might want to consider using epoll or kqueue.
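For the "two or three sockets" case, a poll()-based wait might look like the sketch below. poll_and_recv() is a hypothetical helper, and the limit of 8 descriptors is an arbitrary assumption chosen to keep the example short.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Wait up to timeout_ms for any of the nfds sockets to become readable,
 * then read once from the first readable one.
 * Returns the index of the socket that was read and stores the recv()
 * result in *out_len; returns -1 on timeout or error.
 */
int poll_and_recv(const int *fds, int nfds, void *buf, size_t len,
                  int timeout_ms, ssize_t *out_len)
{
    struct pollfd pfds[8];  /* assumed upper bound for this sketch */

    if (nfds < 1 || nfds > 8)
        return -1;
    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
        return -1;  /* timeout or poll() failure */

    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN) {
            *out_len = recv(fds[i], buf, len, 0);
            return i;
        }
    }
    return -1;  /* only error or hang-up events were reported */
}
```

With a single socket the same program reduces to a plain blocking recv() call, which is what the answer recommends for that case.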
I have developed a UDP server/client application in which the server has one socket on which it continuously receives data from 40 clients.
Now I want to know what happens if all 40 clients send data at the same time.
According to my understanding, data must be queued in receive buffer and next time when I call recvfrom() the data queued in the buffer is received i.e. I shall have to call recvfrom() 40 times to receive data of all the 40 Clients even if all the Clients sent data simultaneously.
Also, I want to know whether all of the data from the 40 clients will be queued in the receive buffer, or whether some of it will be discarded.
Also, what is the maximum amount of data that can be queued in the receive buffer, and beyond what limit is data dropped?
I shall have to call recvfrom() 40 times to receive data of all the 40 Clients even if all the Clients sent data simultaneously.
In other words, you're asking whether separate UDP datagrams can be combined by the network stack. The answer is: no, they'll arrive as separate datagrams requiring separate calls to recvfrom().
Also, I want to know whether all of the data from the 40 clients will be queued in the receive buffer, or whether some of it will be discarded.
UDP does not guarantee delivery. Packets can be dropped anywhere along the route: by the sending host, by any devices along the route and by the receiving host.
Also, what is the maximum amount of data that can be queued in the receive buffer, and beyond what limit is data dropped?
This is OS-dependent and is usually configurable. Bear in mind that packets might get dropped before even reaching the receiving host.
Indeed, it all depends on the size of your OS socket buffers. On Linux it's fairly easy to configure, so all you probably need to do is google how to change it on your Windows system.
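On Linux, the per-socket receive buffer can be requested with the SO_RCVBUF socket option. The sketch below uses a hypothetical helper name, set_rcvbuf(), and reads the value back because the kernel may not grant exactly what was asked for: Linux doubles the requested value for bookkeeping overhead and caps requests at the net.core.rmem_max sysctl.

```c
#include <sys/socket.h>

/*
 * Ask the kernel for a receive buffer of `bytes` bytes on fd.
 * Returns the size actually in effect, or -1 on error.
 */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t optlen = sizeof actual;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &optlen) == -1)
        return -1;
    return actual;
}
```

Once the buffer is full, further incoming datagrams are dropped until the application reads some of the queued ones, so a larger buffer only helps with short bursts.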