I'd like to write a little file transfer program in C (I'm working on Linux).
I'm quite new to sockets programming, but I have already managed to write a little server and
client program.
My question:
If I have something like this:
ssize_t read_bytes = read(socket_id, buffer, 4096);
While reading from the socket, how can I get the progress of reading?
For example I want to display every 100ms how many bytes have been transferred
so far. I'm pretty sure I have to use threads or other async functions here.
Is there a function so I can get the number of bytes read and the number of bytes to read?
Update 1:
Accumulating read()
ssize_t read_bytes;
long long total_read_bytes = 0;
char buffer[4096];
do {
    read_bytes = read(socket_id, buffer, sizeof buffer);
    if (read_bytes > 0)
        total_read_bytes += read_bytes; /* don't add the -1 of a failed read */
} while (read_bytes > 0); /* Read to end or read failed */
Isn't this very, very inefficient? (For example, to transfer a 1 MiB file.)
Thank you
If you have control over the code of both the server and the client, you can let the sender tell the receiver the size of the file before actually sending the file, for example in the first 8 bytes of the message. That gives you the number of bytes to read.
By accumulating read_bytes as in your example, you get the number of bytes read so far.
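A minimal sketch of the receiving side, assuming the sender writes the file size as an 8-byte unsigned integer in big-endian (network) byte order before the data. The names read_full and receive_with_size are hypothetical. The header is read with a loop because even 8 bytes may arrive in pieces; the data loop then accumulates the read() results and prints the progress as "received / expected".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Reads exactly len bytes unless EOF or an error comes first.
   Returns the number of bytes read (less than len on EOF), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;                          /* sender closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Hypothetical receiver: an 8-byte big-endian size header, then the data.
   Prints "received / expected" after every read(). Returns the number of
   data bytes received, or -1 on error or a truncated header. */
long long receive_with_size(int fd)
{
    unsigned char hdr[8];
    if (read_full(fd, hdr, sizeof hdr) != (ssize_t)sizeof hdr)
        return -1;

    uint64_t expected = 0;
    for (int i = 0; i < 8; i++)             /* most significant byte first */
        expected = (expected << 8) | hdr[i];

    char buffer[4096];
    uint64_t total = 0;
    while (total < expected) {
        size_t want = sizeof buffer;
        if (expected - total < want)
            want = (size_t)(expected - total);  /* never read past this file */
        ssize_t n = read(fd, buffer, want);
        if (n == 0)
            break;                          /* sender stopped early */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += (uint64_t)n;
        /* ... write buffer[0..n) to the output file here ... */
        fprintf(stderr, "\r%llu / %llu bytes",
                (unsigned long long)total, (unsigned long long)expected);
    }
    fputc('\n', stderr);
    return (long long)total;
}
```

Printing after every read() is the simplest choice; to stay at one update per 100 ms, the fprintf can be guarded by a clock check.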
Each recv call will block until some more data has been read, but it won't (by default, i.e. without MSG_WAITALL) wait for the supplied buffer to be full. That means there's no need to reduce the buffer size in an effort to get more frequent updates of the total bytes read: you can trivially keep a running total of the recv() return values that updates as packets arrive (or as fast as your app can process them from the OS buffers).
As you observe, if that total is being updated 100 times a second, you probably don't want to issue 100 screen updates, preferring to limit them to your 100ms minimum interval. You could use a timer/alarm to manage that, but then you have more edge cases, such as checking whether the timer is already pending. It's probably simpler just to use two threads, where one checks periodically whether an update is needed:
thread1:
    while ((n_bytes = recv(...)) > 0)
        total_bytes += n_bytes;
    set screen-updater-termination flag

thread2:
    last_total_bytes = -1;
    while the screen-updater-termination flag's not set
        sleep 100ms
        if last_total_bytes != total_bytes
            update screen
            last_total_bytes = total_bytes
Of course you'll need to use a mutex or atomic operations to coordinate some of these actions.
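One way to turn the two-thread pseudocode into C is with POSIX threads and C11 atomics: the running total and the termination flag are atomic, so no mutex is needed. This is a sketch, not the answerer's code; the names struct transfer, receiver_thread, updater_thread and receive_with_progress are hypothetical, and the progress line goes to stderr.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* State shared by the receiving thread and the screen-updater thread. */
struct transfer {
    int fd;                      /* connected socket to drain */
    atomic_llong total_bytes;    /* running total of recv() results */
    atomic_bool finished;        /* the screen-updater-termination flag */
};

/* thread1: accumulate recv() results until EOF or error. */
static void *receiver_thread(void *arg)
{
    struct transfer *t = arg;
    char buf[4096];
    ssize_t n;
    while ((n = recv(t->fd, buf, sizeof buf, 0)) > 0)
        atomic_fetch_add(&t->total_bytes, (long long)n);
    atomic_store(&t->finished, true);
    return NULL;
}

/* thread2: every 100 ms, redraw the counter if it changed. */
static void *updater_thread(void *arg)
{
    struct transfer *t = arg;
    long long last_total = -1;
    const struct timespec interval = { .tv_sec = 0, .tv_nsec = 100000000L };
    while (!atomic_load(&t->finished)) {
        nanosleep(&interval, NULL);
        long long now = atomic_load(&t->total_bytes);
        if (now != last_total) {
            fprintf(stderr, "\r%lld bytes received", now);
            last_total = now;
        }
    }
    fprintf(stderr, "\r%lld bytes received\n", atomic_load(&t->total_bytes));
    return NULL;
}

/* Drains fd until the peer closes it; returns the byte count, or -1 if a
   thread could not be started. */
long long receive_with_progress(int fd)
{
    struct transfer t;
    t.fd = fd;
    atomic_init(&t.total_bytes, 0);
    atomic_init(&t.finished, false);

    pthread_t rx, ui;
    if (pthread_create(&rx, NULL, receiver_thread, &t) != 0)
        return -1;
    if (pthread_create(&ui, NULL, updater_thread, &t) != 0) {
        pthread_join(rx, NULL);
        return -1;
    }
    pthread_join(rx, NULL);
    pthread_join(ui, NULL);
    return atomic_load(&t.total_bytes);
}
```

On older glibc versions this needs to be linked with -pthread.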
Is there a function so I can get... the number of bytes to read?
Not from TCP itself: it doesn't have a notion of message sizes, and the API just ensures an app receives the bytes in the same order they were sent. If you want to be able to display "N bytes received so far, X more expected", then at the application protocol level the sending side should prefix the data in a logical message with its size, commonly either in a fixed-width binary format (ideally using htonl or similar to send and ntohl on the recv side to avoid endian issues), or in a textual representation that's either fixed width or separated from the data by a known sentinel character (e.g. perhaps a NUL, a space or a newline).
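For the fixed-width binary option, a 4-byte header could be produced and parsed as below. This assumes the size fits in 32 bits; encode_length and decode_length are hypothetical names. The sender writes these 4 bytes before the payload, and the receiver reads exactly 4 bytes, decodes them, then reads that many payload bytes.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Writes the 4-byte length header in network (big-endian) byte order. */
void encode_length(uint32_t len, unsigned char out[4])
{
    uint32_t be = htonl(len);
    memcpy(out, &be, sizeof be);     /* memcpy avoids unaligned stores */
}

/* Converts a received 4-byte header back to host byte order. */
uint32_t decode_length(const unsigned char in[4])
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);
}
```

Whatever the host's endianness, encode_length(258, hdr) yields the bytes 00 00 01 02 on the wire.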
Related
What are the best practices when sending (or writing) and recving (or reading) to/from a TCP socket?
Assume usual blocking I/O on sockets. From what I understand:
writing (sending) should be fine without a loop, because it will block if the write buffer of the socket is full, so something like
if ((nbytes_w = write(sock, buf, nb)) < nb)
    /* something bad happened: error or interrupted by signal */
should always be correct?
on the other hand, there is no guarantee that one will read a full message, so one should read with
while ((nbytes_r = read(sock, buf, MAX)) > 0) {
    /* do something with these bytes */
    /* break if encounter specific application protocol end of message flag
       or total number of bytes was known from previous message
       and/or application protocol header */
}
Am I correct? Or is there some "small message size" or other condition that allows reading safely outside a loop?
I am confused because I have seen examples of "naked reads", for instance in Tanenbaum-Wetherall:
read(sa, buf, BUF_SIZE); /* read file name in socket */
Yes, you must loop on the receive.
Once a week I answer a question where someone's TCP app stops working for this very reason. The real killer is that they developed the client and server on the same machine, so they get a loopback connection. Almost all the time, a loopback connection will deliver the sent messages in the same blocks as they were sent. This makes it look like the code is correct.
The really big challenge is that this means you need to know, before the loop, how big the message you are going to receive is. Possibilities:
Send a fixed-width length field first (i.e. you know it is, say, 4 bytes).
Have a recognizable end sequence (like the double CRLF at the end of an HTTP request).
Have a fixed-size message.
I would always have a 'pull the next n bytes' function.
Writing should loop too, but that's easy; it's just a matter of looping.
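A sketch of the "pull the next n bytes" function and its writing counterpart, assuming blocking descriptors. The names read_n and write_n are hypothetical. Both retry after EINTR, and read_n reports a connection closed in the middle of a message separately from an error.

```c
#include <errno.h>
#include <unistd.h>

/* "Pull the next n bytes": loops until n bytes have arrived.
   Returns n on success, 0 if the peer closed before all n bytes arrived,
   -1 on error. */
ssize_t read_n(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r == 0)
            return 0;               /* EOF in the middle of a message */
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)r;
    }
    return (ssize_t)n;
}

/* Writing loops too: write() may accept fewer bytes than asked. */
ssize_t write_n(int fd, const void *buf, size_t n)
{
    size_t sent = 0;
    while (sent < n) {
        ssize_t w = write(fd, (const char *)buf + sent, n - sent);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        sent += (size_t)w;
    }
    return (ssize_t)n;
}
```

With a length prefix, a message is then read as read_n for the header followed by read_n for the payload.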
I need to send data from the PC to my STM32F3, so I decided to use the built-in USB of the uC.
But now I have a problem: I want to send a big amount of data to the STM32 at once, I mean something like 200-500 bytes.
When I send packets of fewer than 64 characters from the PC with minicom, everything is fine: the callback CDC_Receive_FS(uint8_t* Buf, uint32_t *Len) occurs once and sets UsbRxFlag, just to inform the running program that there is data available.
static int8_t CDC_Receive_FS(uint8_t* Buf, uint32_t *Len)
{
  /* USER CODE BEGIN 6 */
  USBD_CDC_SetRxBuffer(&hUsbDeviceFS, &Buf[0]);
  USBD_CDC_ReceivePacket(&hUsbDeviceFS);
  if( (Buf[0] == 'A') & (Buf[1] == 'T') ){
    GPIOB->BSRR = (uint32_t)RX_Led_Pin;
    UsbRxFlag = 1;
  }
  return (USBD_OK);
  /* USER CODE END 6 */
}
But when I try to send more data (just a long text from minicom) to the uC, something weird happens: sometimes the uC doesn't react at all, and sometimes it ignores some of the data.
How can I handle sending more than 64 bytes to the STM32F3 over USB-CDC?
The maximum packet length for full-speed USB communication is 64 bytes. So the data will be transferred in chunks of 64 bytes and needs to be reassembled on the other end.
USB CDC is based on bulk transfer endpoints and implements a data stream (also known as a pipe), not a message stream. It's basically a stream of bytes. So if you send 200 bytes, do not expect any indication of where the 200 bytes end. Such information is not transmitted.
Your code looks a bit suspicious:
You probably meant '&&' instead of '&' as pointed out by Reinstate Monica.
Unless you change buffers, USBD_CDC_SetRxBuffer only needs to be called once at initialization.
When CDC_Receive_FS is called, a data packet has already been received. Buf will point to the buffer you have specified with USBD_CDC_SetRxBuffer. Len provides the length of the packet. So the first thing you would do is process the received data. Once the data has been processed and the buffer can be reused again, you would call USBD_CDC_ReceivePacket to indicate that you are ready to receive the next packet. So move USBD_CDC_SetRxBuffer to another function (unless you want to use several buffers) and move USBD_CDC_ReceivePacket to the end of CDC_Receive_FS.
The incorrect order of the function calls could likely have led to the received data being overwritten while you are still processing it.
But the biggest issue is likely that you expect the entire data to be received in a single piece if you sent it as a single piece, or that it at least contains an indication of the end of the piece. That's not the case. You will have to implement this yourself.
If you are using a text protocol, you could buffer all incoming data until you detect a line feed. Then you know that you have a complete command and can execute it.
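A minimal line assembler for that approach might look like this. It is a sketch: CMD_MAX_LEN, struct line_assembler and line_put are hypothetical names, and lines are assumed to end in LF or CRLF. Each byte of every received packet is fed in; a command becomes available only once its line feed has arrived, however the bytes were split across 64-byte packets.

```c
#include <stddef.h>

#define CMD_MAX_LEN 128   /* hypothetical longest command, in characters */

/* Collects bytes from successive USB packets until a line feed arrives. */
struct line_assembler {
    char   line[CMD_MAX_LEN + 1];   /* +1 for the terminating NUL */
    size_t len;
};

/* Adds one received byte. Returns 1 when c completes a line; the command is
   then in a->line as a NUL-terminated string (without "\r\n") and stays
   valid until the next call. Returns 0 otherwise. Characters beyond
   CMD_MAX_LEN are dropped. */
int line_put(struct line_assembler *a, char c)
{
    if (c == '\n') {
        if (a->len > 0 && a->line[a->len - 1] == '\r')
            a->len--;                   /* accept CRLF as well as LF */
        a->line[a->len] = '\0';
        a->len = 0;                     /* the next byte starts a new line */
        return 1;
    }
    if (a->len < CMD_MAX_LEN)
        a->line[a->len++] = c;
    return 0;
}
```

In CDC_Receive_FS this would be a loop over Buf[0] to Buf[*Len - 1], followed by USBD_CDC_ReceivePacket once the packet has been consumed. Because that callback typically runs in interrupt context, it is probably safer to copy a completed line and set UsbRxFlag there, and to execute the command from the main loop.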
The following is a general purpose implementation for reading an arbitrary number of bytes: https://github.com/philrawlings/bluepill-usb-cdc-test.
The full code is a little too long to post here, but this essentially modifies usb_cdc_if.c to create a circular buffer and exposes additional functions (CDC_GetRxBufferBytesAvailable_FS(), CDC_ReadRxBuffer_FS() and CDC_FlushRxBuffer_FS()) which can be consumed from main.c. The readme.md text shown on the main page describes all the code changes required.
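The linked repository's code is not reproduced here. As a rough idea of the circular-buffer technique it describes, here is an independent single-producer/single-consumer sketch in which the USB callback would call rb_write and the main loop would call rb_available and rb_read. All names and the 512-byte size are hypothetical, and on a multi-core target (or with aggressive compiler reordering) volatile alone may not be enough and memory barriers could be needed.

```c
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 512u   /* hypothetical capacity; must be a power of two */

/* Byte FIFO: producer = USB receive callback, consumer = main loop.
   One slot stays empty, so it holds at most RB_SIZE - 1 bytes. */
struct ring_buffer {
    uint8_t data[RB_SIZE];
    volatile size_t head;   /* next write position, advanced by producer */
    volatile size_t tail;   /* next read position, advanced by consumer */
};

size_t rb_available(const struct ring_buffer *rb)
{
    return (rb->head - rb->tail) & (RB_SIZE - 1);
}

/* Copies up to n bytes in; returns how many fit (the rest are dropped). */
size_t rb_write(struct ring_buffer *rb, const uint8_t *src, size_t n)
{
    size_t written = 0;
    while (written < n && rb_available(rb) < RB_SIZE - 1) {
        rb->data[rb->head] = src[written++];
        rb->head = (rb->head + 1) & (RB_SIZE - 1);
    }
    return written;
}

/* Copies up to n bytes out; returns how many were read. */
size_t rb_read(struct ring_buffer *rb, uint8_t *dst, size_t n)
{
    size_t got = 0;
    while (got < n && rb_available(rb) > 0) {
        dst[got++] = rb->data[rb->tail];
        rb->tail = (rb->tail + 1) & (RB_SIZE - 1);
    }
    return got;
}
```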
As mentioned by @Codo, you will need to either add termination characters to your source data, or include a "length" value (which itself would be a fixed number of bytes) at the beginning to indicate how many bytes are in the data payload.
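In the receiver, I have
recvfd = accept(sockfd, &other_side, &len);
while (1)
{
    ssize_t n = recv(recvfd, buf, MAX_BYTES - 1, 0);
    if (n <= 0)
        break;          /* peer closed the connection or an error occurred */
    buf[n] = '\0';      /* terminate after the bytes actually received */
    printf("\n Number %d contents :%s\n", counter, buf);
    counter++;
}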
In the sender, I have
send(sockfd,mesg,(size_t)length,0);
send(sockfd,mesg,(size_t)length,0);
send(sockfd,mesg,(size_t)length,0);
MAX_BYTES is 1024 and the length of mesg is 15. Currently, it calls recv only one time. I want the recv function to be called three times, once for each corresponding send. How do I achieve it?
In short: yes, it is blocking. But not in the way you think.
recv() blocks until any data is readable. But you don't know the size in advance.
In your scenario, you could do the following:
call select() and put the socket you want to read from into the read FD set
when select() returns with a positive number, your socket has data ready to be read
then, check whether you could receive length bytes from the socket:
recv(recvfd, buf, MAX_BYTES-1, MSG_PEEK); see man recv(2) for the MSG_PEEK param, or look at MSDN, they have it as well
the return value now tells you how much data is available, without removing it from the socket
if there's less than length available, return and do nothing
if there's at least length available, read length bytes and return (if there's more than length available, we'll continue with step 2, since a new read event will be signalled by select())
To send discrete messages over a byte stream protocol, you have to encode messages into some kind of framing language. The network can chop up the data stream into arbitrarily sized packets, and so the receives do not correlate with your messages in any way. The receiver has to implement a state machine which recognizes frames.
A simple framing protocol is to have some length field (say two octets: 16 bits, for a maximum frame length of 65535 bytes). The length field is followed by exactly that many bytes.
You must not even assume that the length field itself is received all at once. You might ask for two bytes, but recv could return just one. This won't happen for the very first message received from the socket, because network (or local IPC pipe, for that matter) segments are never just one byte long. But somewhere in the middle of the stream, it is possible that the first byte of the 16-bit length field could land on the last position of one network frame.
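A minimal version of such a state machine for the two-octet length framing might look like this. struct frame_parser and frame_feed are hypothetical names. Each byte advances the state, so it does not matter where recv() happens to split the stream, including between the two length octets.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAX 65535u

enum frame_state { WANT_LEN_HI, WANT_LEN_LO, WANT_BODY };

/* Receiver state for frames of the form
   [length: 2 octets, big-endian][length bytes of payload]. */
struct frame_parser {
    enum frame_state state;
    uint16_t length;       /* payload length of the current frame */
    uint16_t have;         /* payload bytes collected so far */
    uint8_t  body[FRAME_MAX];
};

/* Consumes bytes from one recv() result, stopping as soon as a frame is
   complete. Returns the number of bytes consumed; *done is 1 if a frame is
   complete (payload in p->body[0..p->length)), else 0. Call again with the
   remaining bytes, if any. */
size_t frame_feed(struct frame_parser *p, const uint8_t *data, size_t n,
                  int *done)
{
    size_t i = 0;
    *done = 0;
    while (i < n && !*done) {
        switch (p->state) {
        case WANT_LEN_HI:
            p->length = (uint16_t)(data[i++] << 8);
            p->state = WANT_LEN_LO;
            break;
        case WANT_LEN_LO:
            p->length |= data[i++];
            p->have = 0;
            if (p->length == 0) {          /* empty frame */
                *done = 1;
                p->state = WANT_LEN_HI;
            } else {
                p->state = WANT_BODY;
            }
            break;
        case WANT_BODY:
            p->body[p->have++] = data[i++];
            if (p->have == p->length) {
                *done = 1;
                p->state = WANT_LEN_HI;
            }
            break;
        }
    }
    return i;
}
```

A zero-initialized struct frame_parser starts in WANT_LEN_HI. It is large (about 64 KiB), so it is better kept static or on the heap than on the stack.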
An easy way to deal with this is to use a buffered I/O library instead of raw operating system file handles. In a POSIX environment, you can take an open socket handle, and use the fdopen function to associate it with a FILE * stream. Then you can use functions like getc and fread to simplify the input handling (somewhat).
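With a stream from fdopen(sock, "r"), the same two-octet framing takes only a few lines, because fread() keeps reading until it has the requested count, reaches EOF or fails. read_frame is a hypothetical name.

```c
#include <stdio.h>

/* Reads one frame [2-byte big-endian length][payload] from a stdio stream,
   e.g. FILE *in = fdopen(sock, "r"). Returns the payload length, or -1 on
   EOF, error, or a payload larger than cap. */
long read_frame(FILE *in, unsigned char *buf, size_t cap)
{
    int hi = getc(in);                 /* the length may arrive split...   */
    int lo = getc(in);                 /* ...but getc() hides that from us */
    if (hi == EOF || lo == EOF)
        return -1;
    size_t len = ((size_t)hi << 8) | (size_t)lo;
    if (len > cap)
        return -1;
    if (fread(buf, 1, len, in) != len)
        return -1;                     /* stream ended mid-frame */
    return (long)len;
}
```

Once the socket is wrapped, it is best read only through the FILE * (stdio may already hold buffered bytes that read() would miss) and closed with fclose(), which also closes the socket.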
If in-band framing is not acceptable, then you have to use a protocol which supports framing, namely datagram type sockets. The main disadvantage of this is that the principal datagram-based protocol used over IP is UDP, and UDP is unreliable. This brings in a lot of complexity in your application to deal with out of order and missing frames. The size of the frames is also restricted by the maximum IP datagram size which is about 64 kilobytes, including all the protocol headers.
Large UDP datagrams get fragmented, which, if there is unreliability in the network, adds up to greater unreliability: if any IP fragment is lost, the entire packet is lost. All of it must be retransmitted; there is no way to just get a repetition of the fragment that was lost. The TCP protocol performs "path MTU discovery" to adjust its segment size so that IP fragmentation is avoided, and TCP has selective retransmission to recover missing segments.
I bet you've created a TCP socket using SOCK_STREAM, which would cause the three messages to be read into your buffer during the first recv call. If you want to read the messages one by one, create a UDP socket using SOCK_DGRAM, or develop some type of message format that allows you to parse your messages as they arrive in a stream (assuming your messages will not always be fixed length).
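The following small demonstration shows that SOCK_DGRAM preserves message boundaries: three sendto() calls are read back by three recv() calls, even though the 1024-byte buffer could hold all of them. It runs on the loopback interface only. Over a real network, UDP datagrams can still be lost or reordered, as the previous answer explains. udp_message_demo is a hypothetical name, and the 1-second receive timeout is an arbitrary choice so the demo cannot hang.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `count` copies of msg as separate datagrams to a UDP socket bound
   on 127.0.0.1 and returns how many recv() calls returned one whole message,
   or -1 if the sockets cannot be set up. */
int udp_message_demo(const char *msg, size_t len, int count)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
    int received = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* let the kernel pick a port */

    if (rx < 0 || tx < 0 ||
        bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout) < 0)
        goto out;

    for (int i = 0; i < count; i++)
        if (sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof addr)
                != (ssize_t)len)
            goto out;

    received = 0;
    for (int i = 0; i < count; i++) {
        char buf[1024];
        /* One datagram per call, even though buf could hold all of them. */
        ssize_t n = recv(rx, buf, sizeof buf, 0);
        if (n != (ssize_t)len)
            break;
        received++;
    }
out:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return received;
}
```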
First send the length of the data in a fixed-size format (that is, always use the same number of bytes to transmit the length), then make recv() loop until that many bytes have been received.
Note (as other answers have already mentioned) that the size and number of chunks received do not necessarily match those sent. Only the sum of all bytes received will be the same as the sum of all bytes sent.
Read the man pages for recv and send. In particular, read the sections on what those functions RETURN.
recv will block until at least some data is available or the socket is closed; it does not wait for the entire buffer to be filled unless you pass MSG_WAITALL.
If you want to read length bytes and return, then you must only pass to recv a buffer of size length.
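You can use select (together with, on Linux, ioctl with FIONREAD) to determine
whether there are any bytes waiting to be read,
how many bytes are waiting to be read (select alone only reports readiness, not a count), then
read only those bytes.
This can keep recv from blocking.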
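On Linux, ioctl() with FIONREAD reports how many bytes are queued on a TCP socket, a Unix-domain stream socket or a pipe, so "read only those bytes" can be written as below. bytes_available and read_available are hypothetical names. Note that FIONREAD also reports 0 after the peer has closed the connection, so it cannot distinguish "nothing yet" from end-of-stream; a readable select() result combined with a 0-byte recv() can.

```c
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes queued for reading on fd, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Reads only what is already queued (at most cap bytes), so read() does
   not block. Returns the byte count, 0 if nothing is queued, -1 on error. */
ssize_t read_available(int fd, char *buf, size_t cap)
{
    int avail = bytes_available(fd);
    if (avail <= 0)
        return avail;
    size_t want = (size_t)avail < cap ? (size_t)avail : cap;
    return read(fd, buf, want);
}
```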
Edit:
After re-reading the docs, the following may be true: your three "messages" may be being read all-at-once since length + length + length < MAX_BYTES - 1.
Another possibility, if recv is never returning, is that you may need to flush your socket from the sender-side. The data may be waiting in a buffer to actually be sent to the receiver.
I'm trying to read and write a serial port in Linux (Ubuntu 12.04) where a microcontroller on the other end blasts 1 or 3 bytes whenever it finishes a certain task. I'm able to successfully read and write to the device, but the problem is my reads are a little 'dangerous' right now:
do
{
    nbytes = read(fd, buffer, sizeof(buffer));
    usleep(50000);
} while (nbytes == -1);
I.e. to simply monitor what the device is sending me, I poll the buffer every half second. If it's empty, it idles in this loop. If it receives something or errors, it kicks out. Some logic then processes the 1- or 3-byte packet and prints it to a terminal. A half second is usually a long enough window for something to fully appear in the buffer, but quick enough for a human who will eventually see it to not think it's slow.
'Usually' is the keyword. If I read the buffer in the middle of it blasting 3 bytes, I'll get a bad read: the buffer will have either 1 or 2 bytes in it, and it'll get rejected in the packet processing (if I catch the first byte of a 3-byte packet, it won't be a purposefully sent one-byte value).
Solutions I've considered/tried:
I've thought of simply reading one byte at a time and feeding in additional bytes if it's part of a 3-byte transmission. However, this creates some ugly loops (as read() only returns the number of bytes of the most recent read) that I'd like to avoid if I can.
I've tried to read 0 bytes (e.g. nbytes = read(fd, buffer, 0);) just to see how many bytes are currently in the buffer before I try to load it into my own buffer, but as I suspected it just returns 0.
It seems like a lot of my problems would be easily solved if I could peek into the contents of the port buffer before I load it into a buffer of my own. But read() is destructive up to the amount of bytes that you tell it to read.
How can I read from this buffer such that I don't do it in the middle of receiving a transmission, but do it fast enough to not appear slow to a user? My serial messenger is divided into a sender and receiver thread, so I don't have to worry about my program loop blocking somewhere and neglecting the other half.
Thanks for any help.
Fix your packet processing. I always end up using a state machine for instances like this, so that if I get a partial message, I remember (stateful) where I left off processing and can resume when the rest of the packet arrives.
Typically I have to verify a checksum at the end of the packet, before proceeding with other processing, so "where I left off processing" is always "waiting for checksum". But I store the partial packet, to be used when more data arrives.
Even though you can't peek into the driver buffer, you can load all those bytes into your own buffer (in C++ a deque is a good choice) and peek into that all you want.
You need to know how large the messages being sent are. There are a couple of ways to do that:
Prefix the message with the length of the message.
Have a message-terminator, a byte (or sequence of bytes) that can not be part of a message.
Use the "command" to calculate the length, i.e. when you read a command-byte you know how much data should follow, so read that amount.
The second method is best for cases where you can get out of sync, because then you can read until you get the message-terminator sequence and be sure that the next bytes start a new message.
You can of course combine these methods.
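Combining the "own buffer you can peek into" idea with the "command determines the length" method, a C receiver could append whatever read() returns and only remove complete packets. The rule in packet_length is HYPOTHETICAL (first byte with the top bit set starts a 3-byte packet, anything else is a 1-byte packet) and must be replaced by the device's real convention; struct rx_buffer, rx_append and rx_next_packet are hypothetical names too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL rule for illustration only: a first byte with the top bit
   set starts a 3-byte packet; any other byte is a complete 1-byte packet. */
static size_t packet_length(uint8_t first)
{
    return (first & 0x80) ? 3 : 1;
}

/* Bytes returned by read() but not yet parsed: a buffer we can peek into. */
struct rx_buffer {
    uint8_t data[256];
    size_t  len;
};

/* Appends bytes returned by read(); returns 0, or -1 if they don't fit. */
int rx_append(struct rx_buffer *rx, const uint8_t *bytes, size_t n)
{
    if (n > sizeof rx->data - rx->len)
        return -1;
    memcpy(rx->data + rx->len, bytes, n);
    rx->len += n;
    return 0;
}

/* Removes one complete packet into out[] (room for at least 3 bytes) and
   returns its length, or returns 0 if only part of a packet has arrived. */
size_t rx_next_packet(struct rx_buffer *rx, uint8_t *out)
{
    if (rx->len == 0)
        return 0;
    size_t need = packet_length(rx->data[0]);
    if (rx->len < need)
        return 0;                        /* partial: keep it for later */
    memcpy(out, rx->data, need);
    memmove(rx->data, rx->data + need, rx->len - need);
    rx->len -= need;
    return need;
}
```

After each read(), the caller would call rx_append and then rx_next_packet in a loop until it returns 0, so a packet split across two reads is simply completed by the next one.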
To poll a device, it is better to use a multiplexing syscall like poll(2), which succeeds when some data is available for reading from that device. Notice that poll is multiplexing: you can poll several file descriptors at once, and poll will succeed as soon as one (any) file descriptor is readable with POLLIN (or writable, if so asked with POLLOUT, etc.).
Once poll has succeeded for some fd that you asked about with POLLIN, you can read(2) from that fd.
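A minimal replacement for the read()+usleep() loop from the question, assuming fd is the open serial port. wait_and_read is a hypothetical name. It sleeps in the kernel until data arrives (or the timeout expires) instead of waking every few milliseconds. It does not solve the partial-packet problem: the bytes it returns still have to go through the buffering and state machine described in this answer.

```c
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms (-1 = forever) for fd to become readable, then
   reads whatever is available, at most cap bytes. Returns the byte count,
   0 on timeout or end of file, -1 on error. */
ssize_t wait_and_read(int fd, void *buf, size_t cap, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;                      /* 0: timeout, -1: error */
    if (pfd.revents & (POLLIN | POLLHUP))
        return read(fd, buf, cap);     /* may be fewer bytes than a packet */
    return -1;                         /* POLLERR or POLLNVAL */
}
```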
Of course, you need to know the conventions used by the hardware device for its messages. Notice that a single read could get several messages, or only part of one (or more). There is no way to prevent reading partial messages (or "packets"), probably because your PC serial I/O is much faster than the serial I/O inside your microcontroller. You have to deal with that by knowing the conventions that define the messages (and, if you can change the software inside the microcontroller, by defining an easy convention for that) and implementing the appropriate state machine, buffering, etc.
NB: There is also the older select(2) syscall for multiplexing, which has limitations related to the C10K problem. I recommend poll instead of select in new code.
I am a bit confused about socket programming in C.
You create a socket, bind it to an interface and an IP address and get it to listen. I found a couple of web resources on that, and understood it fine. In particular, I found an article Network programming under Unix systems to be very informative.
What confuses me is the timing of data arriving on the socket.
How can you tell when packets arrive, and how big the packet is, do you have to do all the heavy lifting yourself?
My basic assumption here is that packets can be of variable length, so once binary data starts appearing down the socket, how do you begin to construct packets from that?
Short answer is that you have to do all the heavy lifting yourself. You can be notified that there is data available to be read, but you won't know how many bytes are available. In most IP protocols that use variable length packets, there will be a header with a known fixed length prepended to the packet. This header will contain the length of the packet. You read the header, get the length of the packet, then read the packet. You repeat this pattern (read header, then read packet) until communication is complete.
When reading data from a socket, you request a certain number of bytes. The read call blocks until at least some data is available, but it can return fewer bytes than were requested. When this happens, you simply retry the read, requesting the remaining bytes.
Here's a typical C function for reading a set number of bytes from a socket:
#include <errno.h>
#include <unistd.h>

/* buffer points to a memory block at least bytesToRead bytes long */
/* socket is an open socket that is connected to a sender */
/* bytesToRead is the number of bytes expected from the sender */
/* bytesRead is a pointer to an integer variable that will hold the number of bytes */
/* actually received from the sender. */
/* The function returns either the number of bytes read, */
/* 0 if the socket was closed by the sender, and */
/* -1 if an error occurred while reading from the socket */
int readBytes(int socket, char *buffer, int bytesToRead, int *bytesRead)
{
    *bytesRead = 0;
    while (*bytesRead < bytesToRead)
    {
        ssize_t ret = read(socket, buffer + *bytesRead, bytesToRead - *bytesRead);
        if (ret < 0 && errno == EINTR)
        {
            continue; /* interrupted by a signal before any data: retry */
        }
        if (ret <= 0)
        {
            /* either the connection was closed or an error occurred */
            return (int)ret;
        }
        *bytesRead += (int)ret;
    }
    return *bytesRead;
}
So, the answer to your question depends a fair bit on whether you are using UDP or TCP as your transport.
For UDP, life gets a lot simpler, in that you can call recv/recvfrom/recvmsg with the packet size you need (you'd likely send fixed-length packets from the source anyway): data arrives as whole datagrams, and each recv* call returns exactly one of them. (I.e. you call recv* with the size of your sending side packet, and you're set.)
For TCP, life gets a bit more interesting - for the purpose of this explanation, I will assume that you already know how to use socket(), bind(), listen() and accept() - the latter being how you get the file descriptor (FD) of your newly made connection.
There are two ways of doing the I/O for a socket: blocking, in which you call read(fd, buf, N) and the read sits there and waits until at least some data is available (it may return fewer than N bytes), or non-blocking, in which you have to check (using select() or poll()) whether the FD is readable, and THEN do your read().
When dealing with TCP-based connections, the OS doesn't pay attention to the packet sizes, since it's considered a continual stream of data, not separate packet-sized chunks.
If your application uses "packets" (packed or unpacked data structures that you're passing around), you ought to be able to read an entire data structure off the socket at a time by reading with the proper size argument (looping until that many bytes have arrived, since read() may return fewer). The only caveat you have to deal with is to remember to properly byte-order any data that you're sending, in case the source and destination systems differ in endianness. This applies to both UDP and TCP.
As far as *NIX socket programming is concerned, I highly recommend W. Richard Stevens' "Unix Network Programming, Vol. 1" (UNPv1) and "Advanced Programming in the UNIX Environment" (APUE). The first is a tome on network-based programming, regardless of the transport, and the latter is a good all-around programming book as it applies to *NIX-based programming. Also, look for "TCP/IP Illustrated", Volumes 1 and 2.
When you do a read on the socket, you tell it how many maximum bytes to read, but if it doesn't have that many, it gives you however many it's got. It's up to you to design the protocol so you know whether you've got a partial packet or not. For instance, in the past when sending variable length binary data, I would put an int at the beginning that said how many bytes to expect. I'd do a read requesting a number of bytes greater than the largest possible packet in my protocol, and then I'd compare the first int against however many bytes I'd received, and either process it or try more reads until I'd gotten the full packet, depending.
Sockets operate at a higher level than raw packets: a socket is like a file you can read from and write to. Also, when you try to read from a socket, the operating system will block (put on hold) your process until it has at least some data to fulfill the request.