I have a server which needs to transfer a large amount of data (~1-2 gigs) per request. What is the recommended amount of data that each call to send should have? Does it matter?
The TCP/IP stack takes care of not sending packets larger than the PATH MTU, and you would have to work rather hard to make it send packets smaller than the MTU, in ways that wouldn't help throughput: and there is no way of dsicovering what the path PTU actually is for any given connection. So leave all consideration of the path MTU out of it.
You can and should make every send() as big as possible.
Your packet size is constraint by the Maximum Transfer Unit (MTU) of the protocol. If you send a packet bigger than the MTU then you get packet fragmentation.
You can do some math: http://www.speedguide.net/articles/mtu-what-difference-does-it-make--111
More References: http://en.wikipedia.org/wiki/Maximum_transmission_unit
But ultimately my suggestion is that if you aim for good enough and just let the os network layer do its work, unless you are programming raw sockets or passing some funky options, is pretty common that the os will have a network buffer and will try to do its best.
Considering this last option then the socket.send() is nothing else than a memory transfer from your user process memory to the kernel's private memory. The rule of thumb is to not exceed the virtual memory page size http://en.wikipedia.org/wiki/Page_(computer_memory) that is usually 4KB.
Related
When will a TCP packet be fragmented at the application layer? When a TCP packet is sent from an application, will the recipient at the application layer ever receive the packet in two or more packets? If so, what conditions cause the packet to be divided. It seems like a packet won't be fragmented until it reaches the Ethernet (at the network layer) limit of 1500 bytes. But, that fragmentation will be transparent to the recipient at the application layer since the network layer will reassemble the fragments before sending the packet up to the next layer, right?
It will be split when it hits a network device with a lower MTU than the packet's size. Most ethernet devices are 1500, but it can often be smaller, like 1492 if that ethernet is going over PPPoE (DSL) because of the extra routing information, even lower if a second layer is added like Windows Internet Connection Sharing. And dialup is normally 576!
In general though you should remember that TCP is not a packet protocol. It uses packets at the lowest level to transmit over IP, but as far as the interface for any TCP stack is concerned, it is a stream protocol and has no requirement to provide you with a 1:1 relationship to the physical packets sent or received (for example most stacks will hold messages until a certain period of time has expired, or there are enough messages to maximize the size of the IP packet for the given MTU)
As an example if you sent two "packets" (call your send function twice), the receiving program might only receive 1 "packet" (the receiving TCP stack might combine them together). If you are implimenting a message type protocol over TCP, you should include a header at the beginning of each message (or some other header/footer mechansim) so that the receiving side can split the TCP stream back into individual messages, either when a message is received in two parts, or when several messages are received as a chunk.
Fragmentation should be transparent to a TCP application. Keep in mind that TCP is a stream protocol: you get a stream of data, not packets! If you are building your application based on the idea of complete data packets then you will have problems unless you add an abstraction layer to assemble whole packets from the stream and then pass the packets up to the application.
The question makes an assumption that is not true -- TCP does not deliver packets to its endpoints, rather, it sends a stream of bytes (octets). If an application writes two strings into TCP, it may be delivered as one string on the other end; likewise, one string may be delivered as two (or more) strings on the other end.
RFC 793, Section 1.5:
"The TCP is able to transfer a
continuous stream of octets in each
direction between its users by
packaging some number of octets into
segments for transmission through the
internet system."
The key words being continuous stream of octets (bytes).
RFC 793, Section 2.8:
"There is no necessary relationship
between push functions and segment
boundaries. The data in any particular
segment may be the result of a single
SEND call, in whole or part, or of
multiple SEND calls."
The entirety of section 2.8 is relevant.
At the application layer there are any number of reasons why the whole 1500 bytes may not show up one read. Various factors in the internal operating system and TCP stack may cause the application to get some bytes in one read call, and some in the next. Yes, the TCP stack has to re-assemble the packet before sending it up, but that doesn't mean your app is going to get it all in one shot (it is LIKELY will get it in one read, but it's not GUARANTEED to get it in one read).
TCP tries to guarantee in-order delivery of bytes, with error checking, automatic re-sends, etc happening behind your back. Think of it as a pipe at the app layer and don't get too bogged down in how the stack actually sends it over the network.
This page is a good source of information about some of the issues that others have brought up, namely the need for data encapsulation on an application protocol by application protocol basis Not quite authoritative in the sense you describe but it has examples and is sourced to some pretty big names in network programming.
If a packet exceeds the maximum MTU of a network device it will be broken up into multiple packets. (Note most equipment is set to 1500 bytes, but this is not a necessity.)
The reconstruction of the packet should be entirely transparent to the applications.
Different network segments can have different MTU values. In that case fragmentation can occur. For more information see TCP Maximum segment size
This (de)fragmentation happens in the TCP layer. In the application layer there are no more packets. TCP presents a contiguous data stream to the application.
A the "application layer" a TCP packet (well, segment really; TCP at its own layer doesn't know from packets) is never fragmented, since it doesn't exist. The application layer is where you see the data as a stream of bytes, delivered reliably and in order.
If you're thinking about it otherwise, you're probably approaching something in the wrong way. However, this is not to say that there might not be a layer above this, say, a sequence of messages delivered over this reliable, in-order bytestream.
Correct - the most informative way to see this is using Wireshark, an invaluable tool. Take the time to figure it out - has saved me several times, and gives a good reality check
If a 3000 byte packet enters an Ethernet network with a default MTU size of 1500 (for ethernet), it will be fragmented into two packets of each 1500 bytes in length. That is the only time I can think of.
Wireshark is your best bet for checking this. I have been using it for a while and am totally impressed
I want to do my own very simple implementation of VPN in C on Linux. For that purpose I'm going to capture IP packets, modify them and send forward. The modification consists of encryption, authentication and other stuff like in IPSec. My question is should I process somehow the size of packets or this will be handled automatically? I know it's maximum size is 65535 - 20 (for header) but accoring to MTU it is lesser. I think its because encrypted payload "incapsulated into UDP" for NAT-T is much bigger then just "normal payload" of the IP packet.
Well, I found that there actually 2 ways to handle that problem:
1) We can send big packets by settings DF flag to tell we want fragment out packets. But in this case packet can be lost, because not all the devices/etc support packet fragmentation
2) We can automatically calculate our maximum MTU between hosts, split them and send. On another side we put all this packets together and restore them. This can be done by implementing our own "system" for this purpose.
More about IP packets fragmentation and reassembly you can read here
When using pcap_open_live to sniff from an interface, I have seen a lot of examples using various numbers as SNAPLEN value, ranging from BUFSIZ (<stdio.h>) to "magic numbers".
Wouldn't it make more sense to set as SNAPLEN the MTU of the interface we are capturing from ?
In this manner, we could fit more packets at once in PCAP buffer. Is it safe to assume that the MRU is equal to the MTU ?
Otherwise, is there a non-exotic way to set the SNAPLEN value ?
Thanks
The MTU is the largest payload size that could be handed to the link layer; it does not include any link-layer headers, so, for example, on Ethernet it would be 1500, not 1514 or 1518, and wouldn't be large enough to capture a full-sized Ethernet packet.
In addition, it doesn't include any metadata headers such as the radiotap header for 802.11 radio information.
And if the adapter is doing any form of fragmentation/segmentation/reassembly offloading, the packets handed to the adapter or received from the adapter might not yet be fragmented or segmented, or might have been reassembled, and, as such, might be much larger than the MTU.
As for fitting more packets in the PCAP buffer, that only applies to the memory-mapped TPACKET_V1 and TPACKET_V2 capture mechanisms in Linux, which have fixed-size packet slots; other capture mechanisms do not reserve a maximum-sized slot for every packet, so a shorter snapshot length won't matter. For TPACKET_V1 and TPACKET_V2, a smaller snapshot length could make a difference, although, at least for Ethernet, libpcap 1.2.1 attempts, as best it can, to choose an appropriate buffer slot size for Ethernet. (TPACKET_V3 doesn't appear to have the fixed-size per-packet slots, in which case it wouldn't have this problem, but it only appeared in officially-released kernels recently, and no support for it exists yet in libpcap.)
We have a client/server communication system over UDP setup in windows. The problem we are facing is that when the throughput grows, packets are getting dropped. We suspect that this is due to the UDP receive buffer which is continuously being polled causing the buffer to be blocked and dropping any incoming packets. Is it possible that reading this buffer will cause incoming packets to be dropped? If so, what are the options to correct this? The system is written in C. Please let me know if this is too vague and I can try to provide more info. Thanks!
The default socket buffer size in Windows sockets is 8k, or 8192 bytes. Use the setsockopt Windows function to increase the size of the buffer (refer to the SO_RCVBUF option).
But beyond that, increasing the size of your receive buffer will only delay the time until packets get dropped again if you are not reading the packets fast enough.
Typically, you want two threads for this kind of situation.
The first thread exists solely to service the socket. In other words, the thread's sole purpose is to read a packet from the socket, add it to some kind of properly-synchronized shared data structure, signal that a packet has been received, and then read the next packet.
The second thread exists to process the received packets. It sits idle until the first thread signals a packet has been received. It then pulls the packet from the properly-synchronized shared data structure and processes it. It then waits to be signaled again.
As a test, try short-circuiting the full processing of your packets and just write a message to the console (or a file) each time a packet has been received. If you can successfully do this without dropping packets, then breaking your functionality into a "receiving" thread and a "processing" thread will help.
Yes, the stack is allowed to drop packets — silently, even — when its buffers get too full. This is part of the nature of UDP, one of the bits of reliability you give up when you switch from TCP. You can either reinvent TCP — poorly — by adding retry logic, ACK packets, and such, or you can switch to something in-between like SCTP.
There are ways to increase the stack's buffer size, but that's largely missing the point. If you aren't reading fast enough to keep buffer space available already, making the buffers larger is only going to put off the time it takes you to run out of buffer space. The proper solution is to make larger buffers within your own code, and move data from the stack's buffers into your program's buffer ASAP, where it can wait to be processed for arbitrarily long times.
Is it possible that reading this buffer will cause incoming packets to be dropped?
Packets can be dropped if they're arriving faster than you read them.
If so, what are the options to correct this?
One option is to change the network protocol: use TCP, or implement some acknowledgement + 'flow control' using UDP.
Otherwise you need to see why you're not reading fast/often enough.
If the CPU is 100% utilitized then you need to do less work per packet or get a faster CPU (or use multithreading and more CPUs if you aren't already).
If the CPU is not 100%, then perhaps what's happening is:
You read a packet
You do some work, which takes x msec of real-time, some of which is spent blocked on some other I/O (so the CPU isn't busy, but it's not being used to read another packet)
During those x msec, a flood of packets arrive and some are dropped
A cure for this would be to change the threading.
Another possibility is to do several simultaneous reads from the socket (each of your reads provides a buffer into which a UDP packet can be received).
Another possibility is to see whether there's a (O/S-specific) configuration option to increase the number of received UDP packets which the network stack is willing to buffer until you try to read them.
First step, increase the receiver buffer size, Windows pretty much grants all reasonable size requests.
If that doesn't help, your consume code seems to have some fairly slow areas. I would use threading, e.g. with pthreads and utilize a producer consumer pattern to put the incoming datagram in a queue on another thread and then consume from there, so your receive calls don't block and the buffer does not run full
3rd step, modify your application level protocol, allow for batched packets and batch packets at the sender to reduce UDP header overhead from sending a lot of small packets.
4th step check your network gear, switches, etc. can give you detailed output about their traffic statistics, buffer overflows, etc. - if that is in issue get faster switches or possibly switch out a faulty one
... just fyi, I'm running UDP multicast traffic on our backend continuously at avg. ~30Mbit/sec with peaks a 70Mbit/s and my drop rate is bare nil
Not sure about this, but on windows, its not possible to poll the socket and cause a packet to drop. Windows collects the packets separately from your polling and it shouldn't cause any drops.
i am assuming your using select() to poll the socket ? As far as i know , cant cause a drop.
The packets could be lost due to an increase in unrelated network traffic anywhere along the route, or full receive buffers. To mitigate this, you could increase the receive buffer size in Winsock.
Essentially, UDP is an unreliable protocol in the sense that packet delivery is not guaranteed and no error is returned to the sender on delivery failure. If you are worried about packet loss, it would be best to implement acknowledgment packets into your communication protocol, or to port it to a more reliable protocol like TCP. There really aren't any other truly reliable ways to prevent UDP packet loss.
I am sending data from a linux application through serial port to an embedded device.
In the current implementation a byte circular buffer is used in the firmware. (Nothing but an array with a read and write pointer)
As the bytes come in, it is written to the circular bufffer.
Now the PC application appears to be sending the data too fast for the firmware to handle. Bytes are missed resulting in the firmware returning WRONG_INPUT too mant times.
I think baud rate (115200) is not the issue. A more efficient data structure at the firmware side might help. Any suggestions on choice of data structure?
A circular buffer is the best answer. It is the easiest way to model a hardware FIFO in pure software.
The real issue is likely to be either the way you are collecting bytes from the UART to put in the buffer, or overflow of that buffer.
At 115200 baud with the usual 1 start bit, 1 stop bit and 8 data bits, you can see as many as 11520 bytes per second arrive at that port. That gives you an average of just about 86.8 µs per byte to work with. In a PC, that will seem like a lot of time, but in a small microprocessor, it might not be all that many total instructions or in some cases very many I/O register accesses. If you overfill your buffer because bytes are arriving on average faster than you can consume them, then you will have errors.
Some general advice:
Don't do polled I/O.
Do use a Rx Ready interrupt.
Enable the receive FIFO, if available.
Empty the FIFO completely in the interrupt handler.
Make the ring buffer large enough.
Consider flow control.
Sizing your ring buffer large enough to hold a complete message is important. If your protocol has known limits on the message size, then you can use the higher levels of your protocol to do flow control and survive without the pains of getting XON/XOFF flow to work right in all of the edge cases, or RTS/CTS to work as expected in both ends of the wire which can be nearly as hairy.
If you can't make the ring buffer that large, then you will need some kind of flow control.
There is nothing better than a circular buffer.
You could use a slower baud rate or speed up the application in the firmware so that it can handle data coming at full speed.
If the output of the PC is in bursts it may help to make the buffer big enough to handle one burst.
The last option is to implement some form of flow control.
What do you mean by embedded device ? I think most of current DSP and processor can easily handle this kind of load. The problem is not with the circular buffer, but how do you collect bytes from the serial port.
Does your UART have a hardware fifo ? If yes, then you should enable it. If you have an interrupt per byte, you can quickly get into trouble, especially if you are working with an OS or with virtual memory, where the IRQ cost can be quit high.
If your receiving firmware is very simple (no multitasking), and you don't have an hardware fifo, polled mode can be a better solution than interrupt driven, because then your processor is doing only UART data reception, and you have no interrupt overhead.
Another problem might be with the transfer protocol. For example if you have long packet of data that you have to checksum, and you do the whole checksum at the end of the packet, then all the processing time of the packet is at the end of it, and that is why you may miss the beginning of the next packet.
So circular buffer is fine and you have to way to improve :
- The way you interact with the hardware
- The protocol (packet length, acknoledgment etc ...)
Before trying to solve the problem, first you need to establish what the problem really is. Otherwise you might waste time trying to fix something that isn't actually broken.
Without knowing more about your set-up it's hard to give more specific advice. But you should investigate further to establish what exactly the hardware and software is currently doing when the bytes come in, and then what is the weak point where they're going missing.
A circular buffer with Interrupt driven IO will work on the smallest and slowest of embedded targets.
First try it at the lowest baud rate and only then try at high speeds.
Using a circular buffer in conjunction with IRQ is an excellent suggestion. If your processor generates an interrupt each time a byte is received take that byte and store it in the buffer. How you decide to empty that buffer depends on if you are processing a stream of data or data packets. If you are processing a stream simply have your background process remove the bytes from the buffer and process them first-in-first-out. If you are processing packets then just keep filing the buffer until you have a complete packet. I've used the packet method successfully many times in the past. I would implement some type of flow control as well to signal to the PC if something went wrong like a full buffer or if packet-processing time is long to indicate to the PC when it is ready for the next packet.
You could implement something like IP datagram which contains data length, id, and checksum.
Edit:
Then you could hard-code some fixed length for the packets, for example 1024 byte or whatever that makes sense for the device. PC side would then check if the queue is full at the device every time it writes in a packet. Firmware side would run checksum to see if all data is valid, and read up till the data length.