Retrieve unaligned netlink message size in kernel space - c

I am working on a Linux kernel module which has a bi-directional communication link with a userspace module provided by netlink.
I have an issue with an incorrect message length calculation on messages sent from userspace to kernel space. The message is sent from userspace like this:
this->sendLock.lock();
this->netlinkTxHeader->nlmsg_len = NLMSG_SPACE(len);
this->netlinkTxIov.iov_len = this->netlinkTxHeader->nlmsg_len;
memcpy(NLMSG_DATA(this->netlinkTxHeader), buf, len);
int32_t status = sendmsg(this->netlinkSock, &this->netlinkTxMsg, 0);
And is received in kernel space like this:
unsigned char* buf = (unsigned char*)NLMSG_DATA(nlh);
int len = NLMSG_PAYLOAD(nlh, 0);
However the calculated value of len always seems to be the aligned size, which I don't want. For example, I can see from debug information that the userspace process sends a message with a payload of 14 bytes (excluding netlink headers). When this is received in the kernel module, however, NLMSG_PAYLOAD returns a length of 16 bytes.
Is there any way to get the unaligned payload length (i.e. the actual payload length) back in the kernel module? I looked through the macros in netlink.h but I don't see anything which would help.
Note that the nlmsghdr object is derived using the netlink_rcv_skb() function in the kernel module.
The only other way I can see to fix this is to prefix the actual length in the payload information which I think would work, but doesn't really feel "correct".

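See man 3 netlink: you need to use NLMSG_LENGTH(len) (and not NLMSG_SPACE(len)) to calculate the nlmsg_len field of the nlmsghdr. NLMSG_SPACE() rounds the total up to the netlink alignment (4 bytes), so for a 14-byte payload behind the 16-byte header it stores 32 in nlmsg_len and NLMSG_PAYLOAD() reports 16, whereas NLMSG_LENGTH() stores 30 and NLMSG_PAYLOAD() reports 14.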
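To check the arithmetic, here is a minimal sketch. It assumes a Linux system with the userspace header `<linux/netlink.h>`; the helper names `payload_after_space` and `payload_after_length` are made up for illustration. Each helper sets nlmsg_len the way a sender might and then reads the payload size back the way the receiver in the question does.

```c
#include <linux/netlink.h>  /* struct nlmsghdr and the NLMSG_* macros (Linux only) */

/* Hypothetical helper: payload size the receiver computes when the sender
 * stored the aligned size, NLMSG_SPACE(len), in nlmsg_len. */
unsigned int payload_after_space(unsigned int len)
{
    struct nlmsghdr nlh = {0};
    nlh.nlmsg_len = NLMSG_SPACE(len);
    return NLMSG_PAYLOAD(&nlh, 0);
}

/* Hypothetical helper: the same computation when the sender stored the
 * unaligned size, NLMSG_LENGTH(len), in nlmsg_len. */
unsigned int payload_after_length(unsigned int len)
{
    struct nlmsghdr nlh = {0};
    nlh.nlmsg_len = NLMSG_LENGTH(len);
    return NLMSG_PAYLOAD(&nlh, 0);
}
```

With a 14-byte payload, the first helper should return the padded size and the second the real one. The buffer handed to sendmsg() may still be sized with NLMSG_SPACE(); only the nlmsg_len field needs the unaligned value.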

Related

Using C POSIX sockets, can you determine how many bytes a socket contains without extracting?

I'm working with POSIX sockets in C.
Given X, I have a need to verify that the socketfd contains at least X bytes before proceeding to perform an operation with it.
With that being said, I don't want to receive X bytes and store it into a buffer using recv as X has the potential of being very large.
My first idea was to use MSG_PEEK...
int X = 9999999;
char buffer[1];
int num_bytes = recv(socketfd, buffer, X, MSG_PEEK);
(num_bytes == X) ? good : bad;
...
...
...
// Do some operation
But I'm concerned that X > 1 corrupts memory. The MSG_TRUNC flag seems to resolve the memory concern, but it removes X bytes from socketfd.
There's a big difference between e.g. TCP and UDP in this regard.
UDP is packet based: each send produces one datagram and each receive returns one datagram, so message boundaries are preserved.
TCP is a streaming protocol, where data begins to stream on connection and stops at disconnection. There are no message boundaries or delimiters in TCP, other than what you add at the application layer. It's simply a stream of bytes without any meaning (in TCP's point of view).
That means there's no way to tell how much will be received with a single recv call.
You need to come up with an application-level protocol (on top of TCP) that either tells the receiver the size of the data to expect, for example with a fixed-size header that contains the size of the data that follows, or separates messages with a specific delimiter, something that can't occur in the stream of bytes.
Then you receive in a loop until you have either received all the data or received the delimiter. But note that with a delimiter you may also receive the beginning of the next message, so you need to be able to handle a partial next message after the current message has been fully received.
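The delimiter case can be sketched as a buffer helper. This is only an illustration under stated assumptions: `take_message` is a hypothetical name, the caller appends every recv() result to `buf` and tracks the fill level in `used`, and the delimiter is a single byte that never occurs inside a message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for one complete message in buf[0 .. *used).
 * On success the message (without its delimiter) is copied to out, any bytes
 * that belong to the next message are moved to the front of buf, *used is
 * updated, and the message length is returned.
 * Returns -1 if no delimiter has arrived yet, -2 if out is too small. */
long take_message(char *buf, size_t *used, char delim, char *out, size_t out_cap)
{
    char *end = memchr(buf, delim, *used);
    if (end == NULL)
        return -1;                        /* incomplete: call recv() again */

    size_t msg_len = (size_t)(end - buf);
    if (msg_len > out_cap)
        return -2;

    memcpy(out, buf, msg_len);

    /* Keep the partial beginning of the next message for later. */
    size_t consumed = msg_len + 1;        /* message plus delimiter */
    memmove(buf, buf + consumed, *used - consumed);
    *used -= consumed;
    return (long)msg_len;
}
```

A receive loop would call recv() into `buf + used`, add the result to `used`, and then call `take_message` repeatedly until it returns -1.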
int num_bytes = recv(socketfd, buffer, X, MSG_PEEK);
This will copy up to X bytes into buffer and return the count without removing the data from the socket. But your buffer is only 1 byte large, so any X > 1 can overflow it. Increase your buffer.
Have you tried this?
ssize_t available = recv(socketfd, NULL, 0, MSG_PEEK | MSG_TRUNC);
Or this?
int available = 0; /* FIONREAD stores an int, not a size_t */
ioctl(socketfd, FIONREAD, &available);

Sockets - Reading and writing [duplicate]

I'm very new to C++, but I'm trying to learn some basics of TCP socket coding. Anyway, I've been able to send and receive messages, but I want to prefix my packets with the length of the packet (like I did in C# apps I made in the past) so when my window gets the FD_READ command, I have the following code to read just the first two bytes of the packet to use as a short int.
char lengthBuffer[2];
int rec = recv(sck, lengthBuffer, sizeof(lengthBuffer), 0);
short unsigned int toRec = lengthBuffer[1] << 8 | lengthBuffer[0];
What's confusing me is that after a packet comes in the 'rec' variable, which says how many bytes were read is one, not two, and if I make the lengthBuffer three chars instead of two, it reads three bytes, but if it's four, it also reads three (only odd numbers). I can't tell if I'm making some really stupid mistake here, or fundamentally misunderstanding some part of the language or the API. I'm aware that recv doesn't guarantee any number of bytes will be read, but if it's just two, it shouldn't take multiple reads.
Because you cannot assume how much data will be available, you'll need to continuously read from the socket until you have the amount you want. Something like this should work:
size_t rec = 0; // bytes of the length prefix received so far
do {
    int result = recv(sck, &lengthBuffer[rec], sizeof(lengthBuffer) - rec, 0);
    if (result == -1) {
        // Handle error ...
        break;
    }
    else if (result == 0) {
        // Handle disconnect ...
        break;
    }
    else {
        rec += result;
    }
} while (rec < sizeof(lengthBuffer));
Streamed sockets:
Sockets are generally used in a streamed way: you'll receive all the data sent, but not necessarily all at once. You may just as well receive it in pieces.
Your approach of sending the length is hence valid: once you've received the length, you can then fill a buffer, if needed across successive reads, until you have everything you expected. So you have to loop on receives and define a strategy for handling any extra bytes received.
Datagram (packet-oriented) sockets:
If your application is really packet oriented, you may consider creating a datagram socket by passing the SOCK_DGRAM, or better SOCK_SEQPACKET, socket type to socket() on Linux or Windows.
Risk with your binary size data:
Be aware that the way you send and receive your size data appears to be asymmetric. You hence run a major risk if the sending and receiving machines have CPUs/architectures that do not use the same endianness. You can find here some hints on how to make your code platform/endian-independent.
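One common way to avoid the endianness risk is to fix the byte order on the wire and to build the value with shifts instead of copying the integer's memory. The sketch below assumes a 2-byte length prefix sent most significant byte first; `put_len16` and `get_len16` are hypothetical names, and both peers would have to agree on the order. Using unsigned char also avoids the sign extension a plain char buffer can cause when a byte is 0x80 or larger.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit length as two bytes, most significant
 * byte first ("network order"), independent of the host CPU. */
void put_len16(unsigned char out[2], uint16_t len)
{
    out[0] = (unsigned char)(len >> 8);
    out[1] = (unsigned char)(len & 0xFF);
}

/* Hypothetical helper: rebuild the length from the two received bytes. */
uint16_t get_len16(const unsigned char in[2])
{
    return (uint16_t)(((unsigned int)in[0] << 8) | in[1]);
}
```

The sender would call `put_len16` before sending the prefix, and the receiver would call `get_len16` once both bytes have arrived.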
A TCP socket is stream based, not packet based (I assume you use TCP, as sending the packet length in the data does not make any sense in UDP). The number of bytes you receive at once does not have to match the number that was sent. For example, you may send 10 bytes, but the receiver may receive 1 + 2 + 1 + 6 or any other combination. Your code has to handle that: it must be able to receive data partially and react once it has enough data (that's why you send the packet length, for example).

Get the number of bytes available in socket by 'recv' with 'MSG_PEEK' in C++

C++ has the following function to receive bytes from socket, it can check for number of bytes available with the MSG_PEEK flag. With MSG_PEEK, the returned value of 'recv' is the number of bytes available in socket:
#include <sys/socket.h>
ssize_t recv(int socket, void *buffer, size_t length, int flags);
I need to get the number of bytes available in the socket without creating buffer (without allocating memory for buffer). Is it possible and how?
What you're looking for is ioctl(fd,FIONREAD,&bytes_available), and under Windows ioctlsocket(socket,FIONREAD,&bytes_available).
Be warned though, the OS doesn't necessarily guarantee how much data it will buffer for you, so if you are waiting for very much data you are going to be better off reading in data as it comes in and storing it in your own buffer until you have everything you need to process something.
To do this, what is normally done is you simply read chunks at a time, such as
char buf[4096];
ssize_t bytes_read;
do {
    bytes_read = recv(socket, buf, sizeof(buf), 0);
    if (bytes_read > 0) {
        /* do something with buf, such as append it to a larger buffer or
         * process it */
    }
} while (bytes_read > 0);
And if you don't want to sit there waiting for data, you should look into select or epoll to determine when data is ready to be read or not, and the O_NONBLOCK flag for sockets is very handy if you want to ensure you never block on a recv.
On Windows, you can use the ioctlsocket() function with the FIONREAD flag to ask the socket how many bytes are available without needing to read/peek the actual bytes themselves. The value returned is the minimum number of bytes recv() can return without blocking. By the time you actually call recv(), more bytes may have arrived.
Be careful when using FIONREAD! The problem with using ioctl(fd, FIONREAD, &available) is that it will always return the total number of bytes available for reading in the socket buffer on some systems.
This is no problem for STREAM sockets (TCP) but misleading for DATAGRAM sockets (UDP). For datagram sockets, read requests are capped to the size of the first datagram in the buffer, and when reading less than the size of the first datagram, all unread bytes of that datagram are still discarded. So ideally you want to know only the size of the next datagram in the buffer.
E.g. on macOS/iOS it is documented that FIONREAD always returns the total amount (see comments about SO_NREAD). To only get the size of the next datagram (and total size for stream sockets), you can use the code below:
int available;
socklen_t optlen = sizeof(available);
int err = getsockopt(soc, SOL_SOCKET, SO_NREAD, &available, &optlen);
On Linux FIONREAD is documented to only return the size of the next datagram for UDP sockets.
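On Linux there is a second way to get the size of only the next datagram: per recv(2), combining MSG_PEEK with MSG_TRUNC on a datagram socket returns the real length of the datagram even if the supplied buffer is smaller, and leaves the datagram queued. The sketch below is Linux-specific, and `next_datagram_size` is a hypothetical name. On a blocking socket the call waits until a datagram arrives.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux datagram sockets): size in bytes of the next
 * queued datagram, or -1 on error. MSG_PEEK leaves the datagram in the
 * queue; MSG_TRUNC makes recv() report its full length rather than the
 * single byte copied into the probe buffer. */
ssize_t next_datagram_size(int fd)
{
    char probe;   /* scratch byte; the real read happens later */
    return recv(fd, &probe, sizeof probe, MSG_PEEK | MSG_TRUNC);
}
```

The caller can then allocate a buffer of exactly that size before the real recv(). On a TCP socket MSG_TRUNC has a different meaning, so this helper should only be used with datagram sockets.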
On Windows ioctlsocket(socket, FIONREAD, &available) is documented to always give the total size:
If the socket passed in the s parameter is message oriented (for example, type SOCK_DGRAM), FIONREAD returns the reports the total number of bytes available to read, not the size of the first datagram (message) queued on the socket.
Source: https://learn.microsoft.com/en-us/windows/win32/api/ws2spi/nc-ws2spi-lpwspioctl
I am unaware of a way to get the size of the first datagram only on Windows.
The short answer is: this cannot be done with MS-Windows WinSock2,
as I discovered over the last week of trying.
Glad to have finally found this post, which sheds some light on the issues I've been having, using latest Windows 10 Pro, version 20H2 Build 19042.867 (x86/x86_64) :
On a bound, disconnected UDP socket 'sk' (in Listening / Server mode):
1. Any attempt to use either ioctlsocket(sk, FIONREAD, &n_bytes)
OR WSAIoctl with a shifted FIONREAD argument, though they succeed
and return 0, after a call to select() returns > 0 with that
'sk' FD bit set in the read FD set,
and the ioctl call returns 0 (success), and n_bytes is > 0,
causes the socket sk to be in a state where any
subsequent call to recv(), recvfrom(), or ReadFile() returns
SOCKET_ERROR with a WSAGetLastError() of :
10045, Operation Not Supported, or ReadFile
error 87, 'Invalid Parameter'.
Moreover, even worse:
2. Any attempt to use recv or recvfrom with the 'MSG_PEEK' msg_flags
parameter returns -1 and WSAGetLastError returns :
10040 : 'A message sent on a datagram socket was larger than
the internal message buffer or some other network limit,
or the buffer used to receive a datagram into was smaller
than the datagram itself.
' .
Yet for that socket I DID successfully call:
setsockopt(s, SOL_SOCKET, SO_RCVBUF, bufsz = 4096 , sizeof(bufsz) )
and the UDP packet being received was of only 120 bytes in size.
In short, with modern windows winsock2 ( winsock2.h / Ws2_32.dll) ,
there appears to be absolutely no way to use any documented API
to determine the number of bytes received on a bound UDP socket
before calling recv() / recvfrom() in MSG_WAITALL blocking mode to
actually receive the whole packet.
If you do not call ioctlsocket() or WSAIoctl or
recv{,from}(...,MSG_PEEK,...)
before entering recv{,from}(...,MSG_WAITALL,...) ,
then the recv{,from} succeeds.
I am considering advising clients that they must install and run
a Linux instance with MS Services for Linux under their windows
installation , and developing some
API to communicate with it from Windows, so that reliable
asynchronous UDP communication can be achieved - or does anyone
know of a good open source replacement for WinSock2 ?
I need access to a "C" library TCP+UDP/IP implementation for
modern Windows 10 that conforms to its own documentation,
unlike WinSock2 - does anyone know of one ?

Reading wrong data from TCP socket

I'm trying to send data blockwise over a TCP socket. The server code does the following:
#define CHECK(n) if((r=n) <= 0) { perror("Socket error\n"); exit(-1); }
int r;
//send the number of blocks
CHECK(write(sockfd, &(storage->length), 8)); //p->length is uint64_t
for(p=storage->first; p!=NULL; p=p->next) {
//send the size of this block
CHECK(write(sockfd, &(p->blocksize), 8)); //p->blocksize is uint64_t
//send data
CHECK(write(sockfd, &(p->data), p->blocksize));
}
On the client side, I read the size and then the data (same CHECK macro):
CHECK(read(sockfd, &block_count, 8));
for(i=0; i<block_count; i++) {
uint64_t block_size;
CHECK(read(sockfd, &block_size, 8));
uint64_t read_in=0;
while(read_in < block_size) {
r = read(sockfd, data+read_in, block_size-read_in); //assume data was previously allocated as char*
read_in += r;
}
}
This works perfectly fine as long as both client and server run on the same machine, but as soon as I try this over the network, it fails at some point. In particular, the first 300-400 blocks (~587 bytes each) or so work fine, but then I get an incorrect block_size reading:
received block #372 size : 586
read_in: 586 of 586
received block #373 size : 2526107515908
And then it crashes, obviously.
I was under the impression that the TCP protocol ensures no data is lost and everything is received in correct order, but then how is this possible and what's my mistake here, considering that it already works locally?
There's no guarantee that when you read block_count and block_size that you will read all 8 bytes in one go.
I was under the impression that the TCP protocol ensures no data is
lost and everything is received in correct order
Yes, but that's all that TCP guarantees. It does not guarantee that the data is sent and received in a single packet. You need to gather the data and piece them together in a buffer until you get the block size you want before copying the data out.
Perhaps the read calls are returning without reading the full 8 bytes. I'd check what length they report they've read.
You might also find valgrind or strace informative for better understanding why your code is behaving this way. If you're getting short reads, strace will tell you what the syscalls returned, and valgrind will tell you that you're reading uninitialized bytes in your length variables.
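A small helper that keeps calling read() until the requested number of bytes has arrived covers both the 8-byte size fields and the data blocks. This is a sketch: `read_full` is a hypothetical name, and the code assumes a blocking POSIX file descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes from fd into dst.
 * Returns 0 on success, -1 on error or if the peer closes the connection
 * before n bytes have arrived. */
int read_full(int fd, void *dst, size_t n)
{
    unsigned char *p = dst;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            return -1;                    /* end of stream, data incomplete */
        p += r;                           /* short read: keep what we got */
        n -= (size_t)r;
    }
    return 0;
}
```

In the client from the question, `read_full(sockfd, &block_size, 8)` would replace the single read() of the size field. It would also replace the inner while loop, which adds r to read_in without checking for -1 or 0.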
The reason why it works on the same machine is that the block_size and block_count are sent as binary values and when they are received and interpreted by the client, they have same values.
However, if two machines communicating have different byte order for representing integers, e.g. x86 versus SPARC, or sizeof(int) is different, e.g. 64 bit versus 32 bit, then the code will not work correctly.
You need to verify that sizeof(int) and byte order of both machines is identical. On the server side, print out sizeof(int) and values of storage->length and p->blocksize. On the client side print out sizeof(int) and values of block_count and block_size.
When it doesn't work correctly, I think you will find that they are not the same. If this is true, then the contents of data are also going to be misinterpreted if they contain any binary data.

Caching packets captured from pcap

This is a follow-up question to this:
Rebuilding a packet to inject via pcap
What I want to accomplish:
functionA: Capture packets with pcap. Modify source/destination addresses. Recalculate checksums. Inject with pcap.
functionB: Create two threads. Thread 1 sends a magic packet to wake sleeping client. Thread 2 captures packets with pcap and caches the packets into an array of u_char *'s, since pcap puts the packet data serially into "u_char * packet". When both threads terminate, I then change the headers then inject each of the cached packets.
What I need help with:
functionA: I can do everything but calculate checksums. I tried to verify the original checksum by calculating it myself with a function but they never match. However, this issue is not as important because I don't need it to demo my final project. I understand that if IP checksums are incorrect, the receiving computer will discard the packet. But when I demo, so long as my client computer can be shown to have received this incorrect packet, I have proven my overall concept and will not fail. :)
functionB: I guess this is the more important problem. I don't know of an easy way to cache my captured packets. What I'm working on right now is as follows:
functionB creates a pointer to an array that stores u_char * called cachedPackets. So cachedPackets basically points to an array that stores "strings".
It'll be something like this? u_char ** cachedPackets[100], enough array elements for 100 packets.
After this, I start two threads. Thread1 to wake my sleeping client. Thread2 to open another pcap session so no data is lost while client is waking. Thread1 is easy, I've already tested my send magic packet function independently. Thread2 is where I'm screwing up.
Thread2 eventually calls int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user).
callback is the function that will be run after each packet is captured. It is where I will be caching the packet into the array.
callback takes parameters ( u_char* user,
const struct pcap_pkthdr* packet_header,
const u_char* packet_data )
user is the same string in the 4th argument of pcap_loop.
So I was thinking, I could sneakily give my callback function the pointer to the array of string by type casting it.
pcap_loop(asdf, asdf, callback, (u_char *)cachedPackets);
Since I don't know how big the incoming packets will be, I'll dynamically allocate enough space in the callback function. I will also keep track of my position in the array with a static int.
this is what the callback looks like:
void cacheCall(u_char * user, const struct pcap_pkthdr * header, const u_char * packet)
{
    static int cacheindex = 0;
    u_char ** cachethis = (u_char **)user;
    //u_char * cachething = *cachethis;
    (*cachethis)[cacheindex] = (u_char *) malloc(header->len); <--- 497
    int i = 0;
    for(i = 0; i < header->len; i++)
    {
        (*cachethis)[cacheindex][i] = packet[i]; <-------------------503
    }
    //memcpy(cachething[cacheindex], packet, header->len);
    cacheindex++;
}
but when I compile, i get
497: warning: assignment makes integer from pointer without a cast
503: error: subscripted value is neither array nor pointer
That was pretty longwinded, hopefully my knowledge of what I'm doing isn't completely misinformed. Any help would be awesome! :)
u_char ** cachethis;
cachethis is a pointer-to-pointer-to-u_char.
So:
*cachethis
is a pointer-to-u_char, and:
(*cachethis)[i]
is a plain u_char.
So line 497 tries to store a pointer into a u_char, and line 503 tries to subscript a u_char, both of which are invalid.
Looks like what you want is simply:
cachethis[i]
and
cachethis[i][j]
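Put together, the caching step could be sketched without any libpcap dependency as below. `struct packet_cache`, `cache_packet` and `CACHE_CAPACITY` are hypothetical names chosen for this example. Storing each length next to its copy is an addition the question's array of bare pointers lacks, and it is needed later to know how many bytes to inject.

```c
#include <stdlib.h>
#include <string.h>

#define CACHE_CAPACITY 100   /* assumption: room for 100 packets, as in the question */

/* Hypothetical container that would be passed to the callback via `user`. */
struct packet_cache {
    unsigned char *data[CACHE_CAPACITY];  /* one heap copy per packet */
    unsigned int   len[CACHE_CAPACITY];   /* number of bytes in each copy */
    size_t         count;                 /* packets stored so far */
};

/* Hypothetical helper: copy one packet into the cache.
 * Returns 0 on success, -1 if the cache is full or malloc() fails. */
int cache_packet(struct packet_cache *cache, const unsigned char *packet,
                 unsigned int len)
{
    if (cache->count >= CACHE_CAPACITY)
        return -1;

    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;

    memcpy(copy, packet, len);
    cache->data[cache->count] = copy;
    cache->len[cache->count] = len;
    cache->count++;
    return 0;
}
```

Inside the pcap callback this would be called as `cache_packet((struct packet_cache *)user, packet, header->caplen)`. Passing caplen rather than len is deliberate: caplen is the number of bytes actually captured, while len is the original packet length and can be larger when the snapshot length truncates the capture.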
