copy_to_user a struct that contains an array (pointer) - c

Disclosure: I'm fairly new to C. If you could explain any answers verbosely, I would appreciate it.
I am writing a linux kernel module, and in one of the functions I am writing I need to copy a structure to userspace that looks like this:
typedef struct
{
uint32_t someProperty;
uint32_t numOfFruits;
uint32_t *arrayOfFruits;
} ObjectCapabilities;
The API I'm implementing has documentation that describes the arrayOfFruits member as "an array of size numOfFruits where each element is a FRUIT_TYPE constant." I am confused how to do this, given that the arrayOfFruits is a pointer. When I copy_to_user the ObjectCapabilities structure, it will only copy the pointer arrayOfFruits to userspace.
How can userspace continuously access the elements of the array? Here is my attempt:
ObjectCapabilities caps;
caps.someProperty = 1024;
caps.numOfFruits = 3;
uint32_t localArray[] = {
FRUIT_TYPE_APPLE,
FRUIT_TYPE_ORANGE,
FRUIT_TYPE_BANANA
};
caps.arrayOfFruits = localArray;
And then for the copy... can I just do this?
copy_to_user((void *)destination, &caps, (sizeof(caps) + (sizeof(localArray) / sizeof((localArray)[0]))));

The user needs to provide enough space for all the data being copied out. Ideally he'll tell you how much space he provided, and you check that everything fits.
The copied-out data should (in general) not include any pointers, since they're "local" to a different "process" (the kernel can be viewed as a separate process, as it were, and kernel / user interactions involve process-to-process IPC, similar to sending stuff over local or even Internet-connected sockets).
Since the kernel has pretty intimate knowledge of a process, you can skirt these rules somewhat, e.g., you could compute what the user's pointer will be, and copy out a copy of the original data, with the pointer modified appropriately. But that's kind of wasteful. Or, you can copy a kernel pointer and just not use it in the user code, but now you're "leaking data" that "bad guys" can sometimes leverage in various ways. In security-people-speak you've left a wide-open "covert channel".
In the end, then, the "right" way to do this tends to be something like this:
struct user_interface_version_of_struct {
int property;
int count;
int data[]; /* of size "count" */
};
The user code mallocs (or otherwise arranges to have sufficient space for) the "user interface version" and makes some system call to the kernel (read, recv, recvmsg, ioctl, whatever, as long as it involves doing a "read"-type operation) and tells the kernel: "here's the memory holding the struct, and here's how big it is" (in bytes, or the maximum count value, or whatever: user and kernel simply need to agree on the protocol). The kernel-side code then verifies the user's values in some appropriate manner, and either does the copy-out however is most convenient, or returns an error.
"Most convenient" is sometimes two separate copy ops, or some put_user calls, e.g., if the kernel side has the data structure you showed, you might do:
/* let's say ulen is the user supplied length in bytes,
and uaddr is the user-supplied address */
struct user_interface_version_of_struct *p;
needed = sizeof(*p) + 3 * sizeof(int);
if (needed > ulen)
return -ENOMEM; /* user did not supply enough space */
p = uaddr;
error = put_user(1024, &p->property);
if (error == 0)
error = put_user(3, &p->count);
if (error == 0 && copy_to_user(&p->data, localArray, 3 * sizeof(int)))
error = -EFAULT;
You may have a situation where you must conform to some not-very-nice interface, though.
Edit: if you're adding your own system call (rather than tying in to read or ioctl for instance), you can separate the header and data, as in Adam Rosenfield's answer.
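To make the buffer-sizing protocol above concrete, here is a minimal userspace-only sketch. It is not kernel code: fake_kernel_fill is a hypothetical stand-in for the kernel side (it uses memcpy where a real module would use put_user/copy_to_user), the struct name fruit_caps is made up, and the fruit values 1, 2, 3 are placeholders for the FRUIT_TYPE constants.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as user_interface_version_of_struct above. */
struct fruit_caps {
    int property;
    int count;
    int data[];              /* flexible array member, "count" entries */
};

/* Hypothetical stand-in for the kernel side: checks the caller's
   buffer length before writing anything. */
static int fake_kernel_fill(void *uaddr, size_t ulen)
{
    static const int fruits[] = { 1, 2, 3 };   /* placeholder FRUIT_TYPE values */
    struct fruit_caps *p = uaddr;
    size_t needed = sizeof(*p) + sizeof(fruits);

    if (needed > ulen)
        return -ENOMEM;      /* caller did not supply enough space */
    p->property = 1024;
    p->count = 3;
    memcpy(p->data, fruits, sizeof(fruits));
    return 0;
}

/* Caller side: allocate the header plus room for max_count entries,
   then tell the "kernel" how many bytes are available. */
static struct fruit_caps *get_caps(int max_count)
{
    size_t len = sizeof(struct fruit_caps) + (size_t)max_count * sizeof(int);
    struct fruit_caps *p = malloc(len);

    if (p != NULL && fake_kernel_fill(p, len) != 0) {
        free(p);
        p = NULL;
    }
    return p;
}
```

The point of the design is that the struct handed across the boundary contains no pointer at all: the array simply follows the header in the same buffer, and the length check happens before the first byte is written.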

You can't copy raw pointers, since a pointer into kernel space is meaningless to userspace (and will segfault if dereferenced).
The typical way of doing something like this is to ask the userspace code to allocate the memory and pass a pointer to that memory into a system call. If the program doesn't pass in a large enough buffer, then fail with an error (e.g. EFAULT). If the program has no way of knowing in advance how much memory it will need, the usual convention is to return the amount of data needed when it passes a NULL pointer.
Example usage from userspace:
// Fixed-size data
typedef struct
{
uint32_t someProperty;
uint32_t numOfFruits;
} ObjectCapabilities;
// First query the number of fruits we need
ObjectCapabilities caps;
int r = sys_get_fruit(&caps, NULL, 0);
if (r != 0) { /* Handle error */ }
// Now allocate memory and query the fruit
uint32_t *arrayOfFruits = malloc(caps.numOfFruits * sizeof(uint32_t));
r = sys_get_fruit(&caps, arrayOfFruits, caps.numOfFruits);
if (r != 0) { /* Handle error */ }
And here's how the corresponding code would look in kernel space on the other side of the system call:
int sys_get_fruit(ObjectCapabilities __user *userCaps, uint32_t __user *userFruit, uint32_t numFruits)
{
uint32_t localArray[] = {
FRUIT_TYPE_APPLE,
FRUIT_TYPE_ORANGE,
FRUIT_TYPE_BANANA
};
ObjectCapabilities caps;
caps.someProperty = 1024;
caps.numOfFruits = 3;
// Copy out fixed-size data. copy_to_user returns the number of bytes
// it could NOT copy, so map any nonzero result to -EFAULT.
if (copy_to_user(userCaps, &caps, sizeof(caps)))
return -EFAULT;
// A NULL array pointer means the caller only wanted the count.
if (userFruit == NULL)
return 0;
// Attempt to copy variable-sized data. Check the size first.
if (numFruits < caps.numOfFruits)
return -EFAULT;
if (copy_to_user(userFruit, localArray, sizeof(localArray)))
return -EFAULT;
return 0;
}

With copy_to_user you would make two calls: one for the struct and one for the array.
//copy the struct
copy_to_user((void *)destination, &caps, sizeof(caps));
//copy the array.
copy_to_user((void *)destination->array, localArray, sizeof(localArray));

Related

Bytecopy of Float array into Byte buffer creates hard fault (C - STM32F4xx/F103)

Abstract:
I need to copy all elements of a struct containing a float array into a byte buffer, in order to send it out via UART. The next call of malloc after the copy operation always leads to a hard fault, which is a good indicator that memory gets corrupted somewhere, but I have no clue where this could happen (after 2 days of debugging ...)
Description:
I have a nested typedef that contains a float array:
#define DRV_SCALE_MAXSZ 32
#define DRV_CHANNELS 2
typedef struct {
float x1;
float step;
uint8_t aSz;
float Y[DRV_SCALE_MAXSZ];
} DRV_linscale_TDS;
typedef struct {
DRV_linscale_TDS scale;
uint32_t active;
} DRV_ChScale_TDS;
DRV_ChScale_TDS DRV_scale[DRV_CHANNELS] = {0,}; // Channel Scales
And I need to copy the whole content of either DRV_scale[0] or [1] into a byte buffer, in order to send it out via UART.
As a little extra complication I copy it element by element, with a copy function that reverses the bytes of the value if necessary:
#define TXBUFSZ 255
volatile uint8_t TxBuf[TXBUFSZ] = {0,};
void FillTxBuf(uint8_t idx, uint8_t *pBo) {
if(idx < DRV_CHANNELS) {
volatile uint8_t *pDst = TxBuf;
*pDst++ = DRV_SCALE_MAXSZ;
*pDst++ = DRV_scale[idx].active;
pDst += COM_ElementCopyU32((uint8_t*)&DRV_scale[idx].scale.x1, pDst, pBo);
pDst += COM_ElementCopyU32((uint8_t*)&DRV_scale[idx].scale.step, pDst, pBo);
*pDst++ = DRV_scale[idx].scale.aSz;
uint8_t i = *pDst;
float *pSrc = DRV_scale[idx].scale.Y;
while(i--) {
pDst += COM_ElementCopyU32((uint8_t*)pSrc, pDst, pBo);
pSrc++;
}
}
}
Note: the code above is a shortened version just for explanation. In reality TxBuf[TXBUFSZ] is a static preallocated byte buffer (declared extern in the header file, and defined in the C file).
The function COM_ElementCopyU32 looks like this:
uint8_t COM_ElementCopyU32(volatile uint8_t* pSrc, volatile uint8_t* pDst, uint8_t* ByteOrder) {
// @brief copy data from source to destination and reverse the bytes if necessary
// @param U8* pSrc: pointer to the data source buffer
// @param U8* pDst: pointer to the destination buffer
// @param U8* ByteOrder: 0 = little endian, 1 = big endian
// @return u8 number of copied bytes
if(pSrc && pDst) {
if(*ByteOrder != isBigEndian) {
pDst[0] = pSrc[3];
pDst[1] = pSrc[2];
pDst[2] = pSrc[1];
pDst[3] = pSrc[0];
} else {
pDst[0] = pSrc[0];
pDst[1] = pSrc[1];
pDst[2] = pSrc[2];
pDst[3] = pSrc[3];
}
}
return(sizeof(uint32_t));
}
The issue:
as soon as the line
pDst += COM_ElementCopyU32((uint8_t*)pSrc, pDst, pBo);
is involved, the call of FillTxBuf() leads to a hard fault at the next call of malloc(). The next malloc() comes immediately after FillTxBuf(), when the CRC32 is appended to the byte stream. The general workflow is: check the incoming request, fill the Tx buffer, append the CRC32 and send it out.
What have I tried to solve this so far?
Well, I tried a lot:
I removed the line mentioned above. As long as the copy from DRV_scale[idx].scale.Y to TxBuf[] in the while loop is disabled, everything works fine.
I replaced float *pSrc = DRV_scale[idx].scale.Y; with float *pSrc = DebugArray; where DebugArray is a "stand alone" static pre-allocated float array of the same size as DRV_scale[idx].scale.Y (32 elements), and everything works fine.
I tried to copy the elements from DRV_scale[idx].scale.Y to another float array (let's call it "DupArray"), which worked fine, but when I tried to copy "DupArray" bytewise into TxBuf[] it crashed.
I tried to copy the elements from DRV_scale[idx].scale.Y to TxBuf[] in another function, right after hardware initialisation, using the same code (copy & paste), and it worked fine.
I tried several versions of the DRV_linscale_TDS typedef, with the byte variable at the end and at the beginning, with no effect.
I checked whether there is a buffer overflow in the while loop, but as expected there is none, as the total number of copied bytes is ~100, so there are 155 bytes "free" (note: the overrun prevention code is still in the original code but left out here for better readability).
I have no clue what's going on here. Each part of the code works fine when I debug it separately. Copying the original array to another preallocated float array works fine; copying the float array to a byte array and writing it back works fine. Only when I do exactly the same thing that works everywhere else inside that particular function does it generate a hard fault.
All the testing and debugging points clearly in one direction: the hard fault only happens when I try to copy DRV_scale[idx].scale.Y into TxBuf[]; everything else works without problems.
One might say: well, then TxBuf[] gets corrupted somewhere before FillTxBuf(). But then why does everything in FillTxBuf() work flawlessly when I use a different float array than DRV_scale[idx].scale.Y?
Remarks:
One possible workaround would most probably be to split up the struct and use separate preallocated "stand alone" float arrays. The reason why I glued it together in one variable is that this variable is written to flash, and I'd really like to keep the approach FlashWrite(VariablePointer, SizeInBytes) ...
If there is no other option, I will have to separate it, but I'd really like to understand which pitfall I stumbled into ...
The Question:
Where could I search?
I have no idea about the problem, but you can use a union to send struct data as an array. Here is an example:
typedef union
{
DRV_ChScale_TDS scale; /* should be a packed struct */
uint8_t uartBuff[sizeof(DRV_ChScale_TDS)];
} unExample;
The members of a union share the same memory address, so you can send your data easily this way.
Hard faults are often caused by misaligned pointer accesses.
I usually use my own library for binary serialization in C. This library may help you: it already has examples in C and for the STM32F4, it supports endianness, and it has a configuration section for customization.
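One thing that may be worth checking in the question's FillTxBuf: uint8_t i = *pDst; is read after pDst has already been advanced past the aSz byte, so the loop count comes from whatever byte happens to sit in TxBuf at that position rather than from aSz. With stale buffer contents that could be up to 255 iterations of 4 bytes each, which would run past the 255-byte buffer. Whether that is the actual cause cannot be confirmed from the shortened code. Below is a minimal host-side sketch of a bounds-checked variant that takes the count from the struct; the names linscale_t, put32 and serialize_scale are made up for illustration, and it assumes a 4-byte float.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCALE_MAXSZ 32

typedef struct {
    float   x1;
    float   step;
    uint8_t aSz;                 /* number of valid entries in Y */
    float   Y[SCALE_MAXSZ];
} linscale_t;

/* Write one 32-bit value to dst, byte-swapped if swap is nonzero.
   memcpy avoids any unaligned or type-punned access. */
static size_t put32(uint8_t *dst, const void *src, int swap)
{
    uint8_t tmp[4];
    memcpy(tmp, src, 4);
    for (int k = 0; k < 4; k++)
        dst[k] = swap ? tmp[3 - k] : tmp[k];
    return 4;
}

/* Returns the number of bytes written, or 0 if the data would not fit. */
static size_t serialize_scale(const linscale_t *s, uint8_t *buf, size_t cap, int swap)
{
    size_t n = s->aSz;           /* count comes from the struct, not the buffer */
    size_t pos = 0;

    if (n > SCALE_MAXSZ)
        return 0;
    if (cap < 4 + 4 + 1 + 4 * n)
        return 0;                /* check the total size before writing */
    pos += put32(buf + pos, &s->x1, swap);
    pos += put32(buf + pos, &s->step, swap);
    buf[pos++] = s->aSz;
    for (size_t i = 0; i < n; i++)
        pos += put32(buf + pos, &s->Y[i], swap);
    return pos;
}
```

Checking the total size up front, instead of inside the loop, means the function either writes a complete record or nothing at all.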

Difference between vm_offset_t, (void *), and mach_vm_size_t

I'm trying to understand this code for reading virtual memory mappings but I'm having trouble understanding the different data types as I can't find any good documentation.
What is the difference between vm_offset_t, void *, and mach_vm_size_t? On my machine they all seem to be 8 bytes (64-bit) and used to navigate virtual memory. What are the differences between their purposes? What is the point of having these different types?
EDIT:
For instance, in the linked code:
unsigned char *
readProcessMemory (int pid, mach_vm_address_t addr, mach_msg_type_number_t *size)
{
// Helper function to read process memory (a la Win32 API of same name)
// To make it easier for inclusion elsewhere, it takes a pid, and
// does the task_for_pid by itself. Given that iOS invalidates task ports
// after use, it's actually a good idea, since we'd need to reget anyway
task_t t;
task_for_pid(mach_task_self(),pid, &t);
mach_msg_type_number_t dataCnt = *size;
vm_offset_t readMem;
// Use vm_read, rather than mach_vm_read, since the latter is different
// in iOS.
kern_return_t kr = vm_read(t, // vm_map_t target_task,
addr, // mach_vm_address_t address,
*size, // mach_vm_size_t size
&readMem, //vm_offset_t *data,
size); // mach_msg_type_number_t *dataCnt
if (kr) {
// DANG..
fprintf (stderr, "Unable to read target task's memory @0x%llx - kr 0x%x\n" , (unsigned long long) addr, kr);
return NULL;
}
return ( (unsigned char *) readMem);
}
According to this documentation of the vm_read function, the data_out parameter is an "Out-pointer to dynamic array of bytes returned by the read."
But in the code above they pass in &readMem (which is of type vm_offset_t *) for data_out. I'm confused about how readMem is being used here. Is it a pointer to the dynamic array of bytes returned by the read? Or does it actually contain the dynamic array of bytes? Is vm_offset_t a pointer or an address? What is its purpose?
Similarly
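On a 64-bit machine, vm_offset_t and mach_vm_size_t are both typedefs for 64-bit unsigned integer types, and void * has the same width, which is why they all appear to be 8 bytes. The separate names exist to make the code more readable and expressive: vm_offset_t holds an address or offset as an integer, mach_vm_size_t holds a size, and void * is an actual pointer type.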
vm_read returns an address in readMem, meaning that readMem will need to be cast to a pointer and dereferenced to access its value.
Also, the memory region pointed to by readMem is allocated by the kernel, so it needs to be deallocated with vm_deallocate. To avoid this, consider using vm_read_overwrite which will populate the buffer it is supplied.

C - varying size text

For my assignment I have to write a program that consists of agents and a central server daemon. It will be a distributed shell: every command issued from the server will also be performed on every agent (and the output will be sent back from every agent to the central server).
I will have to deal with command output (like ls -la /home/user/dir1), and on each agent the output may vary in size. The output of "find /" will also be BIG, but I have to take into account somehow that something like that can happen. What is the preferred way of handling varying-size outputs in C and operating on them (saving to a variable, sending over a socket)?
The way to deal with data of arbitrary size is to use dynamic allocation, i.e. the functions malloc(), realloc() and free(). You allocate and possibly grow the memory needed to store the command output.
Reading command output (assuming a Unix-like OS) is best done with popen().
Read the manuals of each of the mentioned functions for details.
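As a rough illustration of combining popen() with a growing buffer, here is a minimal sketch. The function name run_command and the starting capacity of 4096 bytes are arbitrary choices for this example, and it assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Run cmd and return its whole stdout as a malloc'd, NUL-terminated
   string; *out_len receives the length. Returns NULL on failure.
   The caller frees the result. */
static char *run_command(const char *cmd, size_t *out_len)
{
    FILE *fp = popen(cmd, "r");
    size_t cap = 4096, len = 0;
    char *buf;

    if (fp == NULL)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        pclose(fp);
        return NULL;
    }
    for (;;) {
        /* always leave one byte for the terminator */
        size_t got = fread(buf + len, 1, cap - len - 1, fp);
        len += got;
        if (got == 0)
            break;               /* end of output (or read error) */
        if (cap - len - 1 == 0) {
            char *bigger = realloc(buf, cap * 2);   /* grow geometrically */
            if (bigger == NULL) {
                free(buf);
                pclose(fp);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    pclose(fp);
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

A call such as run_command("ls -la /home/user/dir1", &n) would then hand back the complete listing regardless of its size, limited only by available memory.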
Dynamic Memory Allocation
To hold your "variable length" strings, you should use dynamic memory allocation: the malloc family of functions.
#include <stdlib.h>
void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);
So, suppose you have your data stored in a variable char *ag_str. I suggest you malloc and then realloc the size of the buffer in blocks. Calling malloc and then realloc a thousand times to readjust the block size after each character is very costly.
So, you might do something like this:
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define BLOCK_SIZE 4096
struct mem_block {
size_t current_block_size;
size_t current_str_size;
char *ag_str;
};
struct mem_block *new_chunk(void)
{
struct mem_block *p = malloc(sizeof *p);
p->ag_str = malloc(BLOCK_SIZE);
p->current_block_size = BLOCK_SIZE;
p->current_str_size = 0;
p->ag_str[0] = '\0'; // start with an empty string
return p;
}
void realloc_chunk(struct mem_block *chunk)
{
size_t ns = chunk->current_block_size + BLOCK_SIZE;
chunk->ag_str = realloc(chunk->ag_str, ns);
chunk->current_block_size = ns;
}
void cat_ag_str(struct mem_block *chunk, const char *ag_str, size_t ag_len)
{
// keep one byte spare for the terminating '\0'
if (chunk->current_str_size + ag_len + 1 > chunk->current_block_size)
realloc_chunk(chunk);
// memcpy rather than strncat: data from read() is not NUL-terminated
memcpy(chunk->ag_str + chunk->current_str_size, ag_str, ag_len);
chunk->current_str_size += ag_len;
chunk->ag_str[chunk->current_str_size] = '\0';
}
void receive_from_agent(...)
{
struct mem_block *chunk = new_chunk();
ssize_t c; // Linux read/recv return
size_t count;
char buff[BLOCK_SIZE];
while ((c = read(your_fd, buff, BLOCK_SIZE)) != 0) { // or probably recv()
if (c < 0) ...
count = (size_t)c;
cat_ag_str(chunk, buff, count);
}
(...)
}
Note that this code was not tested and is just an idea for you. (Error checking was omitted)
struct mem_block: This will keep information about your current memory block.
new_chunk: function to create a new chunk handler for you.
realloc_chunk: anytime the amount of characters that must be written exceeds the amount of characters available in the chunk, we get one more block.
cat_ag_str: this will append what you just read to the memory block you have, effectively transforming chunks of data into one coherent big buffer.
receive_from_agent: this is the entry point of your receiving loop. You may use read or recv, I don't know which you use, but both return the amount of bytes read, which you'll use to pass to cat_ag_str.
It's important to note that you're reading in the same sized blocks as you realloc. (You can read in smaller chunks too, but never bigger).
You can do roughly the same for sending, but you don't need all that workaround for memory. You can just use a fixed sized buffer and copy data from your big string to it in fixed sizes, then you send the fixed-sized buffer.
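For the sending direction, a minimal sketch of the fixed-size-chunk idea could look like the following. The name send_all and the 4096-byte chunk size are assumptions for this example; it uses write(), and a socket version would call send() in the same place.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd in chunks of at most 4096 bytes,
   retrying after short writes. Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        size_t chunk = len - sent;
        ssize_t w;

        if (chunk > 4096)
            chunk = 4096;
        w = write(fd, buf + sent, chunk);
        if (w < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, try again */
            return -1;
        }
        sent += (size_t)w;       /* a short write just advances less */
    }
    return 0;
}
```

No extra copy into a separate fixed-size buffer is needed here, because write() can read directly from an offset into the big string.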

safe structures embedded systems

I have a packet from a server which is parsed in an embedded system. I need to parse it in a very efficient way while avoiding memory issues such as overlapping buffers or corrupting memory and other variables.
The packet has this structure "String A:String B:String C".
In this example the received packet is composed of three parts separated by the separator ":". All these parts must be accessible from a structure.
Which is the most efficient and safe way to do this?
A.- Create a structure with fields (partA, partB, partC), each sized so that the corresponding part of the incoming packet can never exceed it, and attach a length to each part so that no garbage is extracted. This length would be less than or equal to the field size (e.g. 300 for part B).
typedef struct parsedPacket_struct {
char partA[2];int len_partA;
char partB[300];int len_partB;
char partC[2];int len_partC;
}parsedPacket;
The problem here is that I am wasting memory, because the packet content has to be copied into each structure. Is there a way to save only the base address of each part and still use len_partX?
How about replacing the (:) with a 0, and add a null to the end - then you have three char * to pass around. You will need to deal with 0 length strings, but that might solve it
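A minimal sketch of that in-place idea, assuming the received packet sits in a writable, NUL-terminated buffer. The names packet_view and split_packet are made up for this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char  *part[3];              /* pointers into the original buffer */
    size_t len[3];               /* length of each part, may be 0 */
} packet_view;

/* Split "A:B:C" in place by overwriting each ':' with '\0'.
   Returns 0 on success, -1 if there are not exactly three parts. */
static int split_packet(char *buf, packet_view *out)
{
    char *p = buf;

    for (int i = 0; i < 3; i++) {
        char *sep = strchr(p, ':');

        out->part[i] = p;
        if (i < 2) {
            if (sep == NULL)
                return -1;       /* too few separators */
            *sep = '\0';
            out->len[i] = (size_t)(sep - p);
            p = sep + 1;
        } else {
            if (sep != NULL)
                return -1;       /* too many separators */
            out->len[i] = strlen(p);
        }
    }
    return 0;
}
```

Nothing is copied: the struct only holds three pointers and three lengths, so the memory cost is independent of how long part B is. The trade-off is that the receive buffer cannot be reused while the view is still in use.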
To avoid corrupting memory and other variables, you generally declare large data buffers as statics and place them at file scope, then allocate a separate RAM segment for them. Having them sitting on the stack is a bad idea in any embedded system.
You need to consider whether there is an alignment requirement for the CPU and whether the code should be portable or not. The compiler is free to add any number of padding bytes anywhere in that struct, meaning you may not be able to do this:
parsedPacket pp;
memcpy(&pp, raw_data, sizeof(parsedPacket )) ;
For this reason, structs are generally a bad choice for storing data packets. The safest solution is this:
/* packet.h */
typedef struct parsedPacket_struct {
uint8_t* partA;
uint8_t* partB;
uint8_t* partC;
uint16_t len_partA;
uint16_t len_partB;
uint16_t len_partC;
}parsedPacket;
#define MAX_PART_A 2
#define MAX_PART_B 300
#define MAX_PART_C 2
void packet_receive (parsedPacket* packet);
/* packet.c */
static uint8_t partA[MAX_PART_A];
static uint8_t partB[MAX_PART_B];
static uint8_t partC[MAX_PART_C];
void packet_receive (parsedPacket* packet)
{
/* receive data from server */
...
packet->len_partA = ...;
packet->len_partB = ...;
packet->len_partC = ...;
packet->partA = partA;
packet->partB = partB;
packet->partC = partC;
memcpy(partA, A_from_server, packet->len_partA);
memcpy(partB, B_from_server, packet->len_partB);
memcpy(partC, C_from_server, packet->len_partC);
}
This can be extended to contain several static buffers if needed, ie a static array of arrays for each buffer. As you are dealing with large amounts of data in an embedded system, you can never allow the program to stack the buffers at a whim. The maximum amount of copies of a received packet must be determined during program design.
I'm not sure why you think your approach is wasting memory, but here's what I would do if I were feeling especially hacky:
typedef struct {
char *a, *b, *c;
char data[1]; // or 0 if your compiler lets you, or nothing in C99
} parsedPacket;
This is called a flexible array member. Basically, when you allocate memory for your struct, you do this:
parsedPacket *p = malloc(offsetof(parsedPacket, data[N]));
N above becomes the amount of data your array needs, i.e. how long the string you read is. This allocates the struct so that the data member has enough size for your entire string of data. Then, copy the string you receive into this member, replace ':' characters with '\0', and set a to the first string (i.e. p->a = p->data), b to the second (p->b = p->data + strlen(p->a) + 1) and c to the third. Of course, you can make this process easier by doing it all at once:
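size_t current = 0;
p->a = p->data;
p->b = p->c = NULL;
while(1)
{
int i = getc(stdin); // or whatever stream the packet arrives on
if(i == '\n' || i == EOF) break; // or whatever end conditions you expect
if(current + 1 >= N) break; // buffer full, keep room for the final '\0'
if(i == ':')
{
p->data[current] = '\0';
++current;
if(p->b == NULL) p->b = &p->data[current];
else if(p->c == NULL) p->c = &p->data[current];
else /* error */;
}
else
{
p->data[current] = i;
++current; // advance past the character just stored
}
}
p->data[current] = '\0'; // terminate the last part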
The type of each len_partN should be a type that can count up to the length of partN. E.g.:
typedef struct parsedPacket_struct {
char partA[300];unsigned short len_partA; // an unsigned short can hold values up to at least 65535
char partB[300];unsigned short len_partB;
char partC[300];unsigned short len_partC;
}parsedPacket;
This seems like a design decision. If you want the struct to be easy to create, use the above approach, but beware its drawbacks (like "what if B has more than 300 chars?").

malloc code in C

I have a code block that seems to be the code behind malloc. But as I go through the code, I get the feeling that parts of the code are missing. Does anyone know if there is a part of the function that's missing? Does malloc always combine adjacent chunks together?
int heap[10000];
void* malloc(int size) {
int sz = (size + 3) / 4;
int chunk = 0;
if(heap[chunk] > sz) {
int my_size = heap[chunk];
if (my_size < 0) {
my_size = -my_size
}
chunk = chunk + my_size + 2;
if (chunk == heap_size) {
return 0;
}
}
The code behind malloc is certainly much more complex than that. There are several strategies. One popular implementation is the dlmalloc library. A simpler one is described in K&R.
The code is obviously incomplete (not all paths return a value). But in any case this is not a "real" malloc. This is probably an attempt to implement a highly simplified "model" of 'malloc'. The approach chosen by the author of the code can't really lead to a useful practical implementation.
(And BTW, standard 'malloc's parameter has type 'size_t', not 'int').
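To show what such a simplified model might look like when completed, here is a minimal first-fit sketch over a static int array. This is a guess at the general idea, not a reconstruction of the question's code: the header convention (positive size means free, negative means in use), the names toy_malloc and toy_free, and the heap size are all assumptions, and the returned memory is only int-aligned.

```c
#include <stddef.h>

#define HEAP_WORDS 1024
static int heap[HEAP_WORDS];     /* header word + payload words, repeated */
static int heap_ready;

/* Assumed header convention: each chunk starts with one int holding
   its payload size in words; > 0 means free, < 0 means in use. */
static void *toy_malloc(size_t size)
{
    int want, chunk = 0;

    if (size > HEAP_WORDS * sizeof(int))
        return NULL;             /* can never fit */
    want = (int)((size + sizeof(int) - 1) / sizeof(int));
    if (want == 0)
        want = 1;
    if (!heap_ready) {
        heap[0] = HEAP_WORDS - 1;        /* one big free chunk */
        heap_ready = 1;
    }
    while (chunk < HEAP_WORDS) {
        int sz = heap[chunk];
        if (sz >= want) {                /* first fit */
            if (sz - want >= 2) {        /* split off the unused tail */
                heap[chunk + 1 + want] = sz - want - 1;
                sz = want;
            }
            heap[chunk] = -sz;           /* mark in use */
            return &heap[chunk + 1];
        }
        chunk += (sz < 0 ? -sz : sz) + 1;   /* step to the next header */
    }
    return NULL;                         /* out of memory */
}

static void toy_free(void *ptr)
{
    int chunk, next;

    if (ptr == NULL)
        return;
    chunk = (int)((int *)ptr - heap) - 1;
    heap[chunk] = -heap[chunk];          /* mark free */
    next = chunk + heap[chunk] + 1;
    if (next < HEAP_WORDS && heap[next] > 0)   /* merge with a free successor */
        heap[chunk] += heap[next] + 1;
}
```

On the question of combining adjacent chunks: this sketch only merges a freed chunk with a free successor. Merging with a free predecessor as well would need either a backward link or a scan from the start of the heap, which is one of the things real implementations handle more carefully.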
Well, one error in that code is that it doesn't return a pointer to the data.
I suspect the best approach to that code is [delete].
When possible, I expect that malloc will try to put different requests close to each other, as it will have a block of memory available to hand out until it has to get a new block.
But that also depends on the requirements imposed by the OS and hardware architecture. If you are only allowed to request memory in a certain minimum size, the individual allocations may not be near each other.
As others mentioned, there are problems with the code snippet.
You can find various open-source projects that have their own malloc function, and it may be best to look at one of those, in order to get an idea what is missing.
malloc is for dynamically allocated memory. And this involves sbrk, mmap, or maybe some other system functions for Windows and/or other architectures. I am not sure what your int heap[10000] is for, as the code is too incomplete.
Effo's version makes a little bit more sense, but then it introduces another black-box function, get_block, so it doesn't help much.
The code seems to be meant for a bare-metal machine; such a system normally has no virtual address mapping and uses the physical address space directly.
Here is my understanding, on a 32-bit system where sizeof(ptr) = 4 bytes:
extern block_t *block_head; // the real heap, and its address
// is >= 0x80000000, see below "my_size < 0"
extern void *get_block(int index); // get a block from the heap
// (lead by block_head)
int heap[10000]; // just the indicators, not the real heap
void* malloc(int size)
{
int sz = (size + 3) / 4; // make the size aligns with 4 bytes,
// you know, allocated size would be aligned.
int chunk = 0; // the first check point
if(heap[chunk] > sz) { // the value is either a valid free-block size
// which meets my requirement, or an
// address of an allocated block
int my_size = heap[chunk]; // verify size or address
if (my_size < 0) { // it is an address, say a 32-bit value which
// is >0x8000...., not a size.
my_size = -my_size; // the algo, convert it
}
chunk = chunk + my_size + 2; // the algo too, get available
// block index
if (chunk == heap_size) { // no free chunks left
return NULL; // Out of Memory
}
void *block = get_block(chunk);
heap[chunk] = (int)block;
return block;
}
// my blocks is too small initially, none of the blocks
// will meet the requirement
return NULL;
}
EDIT: Could somebody help explain the algorithm, that is, the conversion address -> my_size -> chunk? When reclaiming memory, say in free(void *addr), the same address -> my_size -> chunk conversion would be used to update heap[chunk] accordingly after returning the block to the heap.
Too small to be a whole malloc implementation.
Take a look at the sources of the C library of Visual Studio 6.0; there you will find the implementation of malloc, if I remember correctly.
