I'm using Sun RPC to implement a simple pseudo-distributed storage system. I have three instances of the same server, and one client on the same machine.
Server RPC implementation goes something like this:
char **
fileread64k_1_svc(char *filename, long offset, struct svc_req *rqstp)
{
    static char *readResult;

    // chunkName is a function of (filename, offset)
    FILE *chunkFile = fopen(chunkName, "r");
    readResult = malloc(CHUNKSIZE + 2);
    fread(readResult, 1, CHUNKSIZE, chunkFile);
    readResult[CHUNKSIZE] = '\0';
    fclose(chunkFile);
    return &readResult;
}
I give my client a list of files to read; the client creates three threads (one for each server instance), the threads distribute the files among themselves, and each calls the read RPC like this:
while all files are not read:
    //pthread_mutex_lock(&lock);
    char **out = fileread64k_1(fileName, offset, servers[id]);
    //char *outData = *out;
    //pthread_mutex_unlock(&lock);
But the data in out is replaced by another thread before I have a chance to process it. If I enable the commented lines (the mutex and the outData variable), I get the data in outData and I seem to be able to safely use it.
Can anyone explain why this happens and if there is a better workaround?
Because "readResult" is declared static. That means that all invocations of the method use the same space in memory for that variable, including concurrent invocations in different threads.
The problem should be taken care of if you just don't declare readResult as static -- but in that case, you won't be able to return its address, you should return the value of readResult itself.
Incidentally, which code has the responsibility of free()ing the allocated memory?
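For reference, a minimal sketch of a thread-safe variant, assuming the stubs are regenerated with rpcgen's MT-safe mode (rpcgen -M), which turns the result into an out-parameter so no static is needed; the allocated result is then released through the generated freeresult routine:
/* Sketch only: assumes `rpcgen -M` stubs, where the result is an
 * out-parameter instead of a returned pointer to a static. */
bool_t
fileread64k_1_svc(char *filename, long offset, char **result,
                  struct svc_req *rqstp)
{
    /* chunkName is a function of (filename, offset), as in the question */
    FILE *chunkFile = fopen(chunkName, "r");
    if (chunkFile == NULL)
        return FALSE;
    *result = malloc(CHUNKSIZE + 1);  /* freed later via the generated
                                         freeresult routine */
    fread(*result, 1, CHUNKSIZE, chunkFile);
    (*result)[CHUNKSIZE] = '\0';
    fclose(chunkFile);
    return TRUE;
}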
Is there a way to inject code into an ELF binary without ptrace? I can't use it, because the program I'm writing this for is being run under GDB and I don't want to stop the process while injecting. I read it's possible by using /proc/pid/mem, but I couldn't quite find anything about how to do it. I don't want to use LD_PRELOAD either, since it would require restarting the program, and I want to do this at runtime.
EDIT: I can't use ptrace since the process might already be attached to by GDB.
/proc/pid/mem behaves like an image of the process's memory. To read/write the process's memory, just open /proc/pid/mem, then lseek to the desired address and read() or write() however many bytes you want.
For instance, to overwrite the byte at address 0x12345 in the process with 0x90, you can just do
int fd = open("/proc/XXX/mem", O_RDWR);
lseek(fd, 0x12345, SEEK_SET);
unsigned char new = 0x90;
write(fd, &new, 1);
On a 32-bit system, use lseek64 instead (and add #define _LARGEFILE64_SOURCE before the standard includes).
Note that accessing /proc/XXX/mem requires the same permissions as to ptrace the process. In particular, on some systems you may need to be root.
I decided I'd use process_vm_writev, which seems to work. I don't know why it didn't want to write to /proc/pid/mem, which is odd.
#define _GNU_SOURCE
#include <sys/uio.h>   /* process_vm_writev */

/**
 * @brief write_process_memory Writes to the memory of a given process
 * @param pid Program pid
 * @param address The base memory address
 * @param buffer Buffer to write
 * @param n How many bytes to write
 * @return Returns bytes written
 */
ssize_t write_process_memory(pid_t pid, void *address, void *buffer, ssize_t n) {
    struct iovec local, remote;
    /* this might have to be changed so that if n > _SC_PAGESIZE,
     * local is split into multiple iovecs, similar to how
     * read_process_memory works; I am not sure it is necessary, though */
    remote.iov_base = address;
    remote.iov_len = n;
    local.iov_base = buffer;
    local.iov_len = n;
    ssize_t amount_written = process_vm_writev(pid, &local, 1, &remote, 1, 0);
    return amount_written;
}
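A hypothetical call site, reusing the address and byte from the /proc example above:
/* Hypothetical usage: patch one NOP (0x90) into the target process. */
unsigned char nop = 0x90;
if (write_process_memory(pid, (void *) 0x12345, &nop, 1) != 1)
    perror("process_vm_writev");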
I want to be able to write atomically to a file, so I am trying to use the write() function, since it seems to guarantee atomic writes on most Linux/Unix systems.
Since I have variable string lengths and multiple printf's, I was told to format with snprintf() and pass the result to write() in order to do this properly. After reading the documentation of that function, I did a test implementation as below:
int file = open("file.txt", O_CREAT | O_WRONLY, 0644);
if (file < 0)
    perror("Error");
char buf[200] = "";
int numbytes = snprintf(buf, sizeof(buf), "Example string %s", stringvariable);
write(file, buf, numbytes);
From my tests it seems to work, but my question is whether this is the most correct way to implement it, since I am creating a rather large buffer (one I am 100% sure will fit all my printfs) to store the output before passing it to write().
No, write() is not atomic, not even when it writes all of the data supplied in a single call.
Use advisory record locking (fcntl(fd, F_SETLKW, &lock)) in all readers and writers to achieve atomic file updates.
fcntl()-based record locks work over NFS on both Linux and BSDs; flock()-based file locks may not, depending on system and kernel version. (If NFS locking is disabled like it is on some web hosting services, no locking will be reliable.) Just initialize the struct flock with .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 to refer to the entire file.
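A minimal sketch of that pattern, assuming fd is already open with an access mode compatible with the lock type:
/* Lock the entire file for writing, blocking until the lock is granted. */
struct flock lock = {
    .l_type   = F_WRLCK,   /* F_RDLCK in readers */
    .l_whence = SEEK_SET,
    .l_start  = 0,
    .l_len    = 0,         /* zero length = the entire file */
};
if (fcntl(fd, F_SETLKW, &lock) == -1) {
    /* handle error (e.g. EINTR, EDEADLK) */
}

/* ... read or write the file ... */

lock.l_type = F_UNLCK;
fcntl(fd, F_SETLK, &lock);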
Use asprintf() to print to a dynamically allocated buffer:
char *buffer = NULL;
int length;

length = asprintf(&buffer, ...);
if (length == -1) {
    /* Out of memory */
}

/* ... Have buffer and length ... */

free(buffer);
After adding the locking, do wrap your write() in a loop:
{
    const char *p = buffer;
    const char *const q = buffer + length;
    ssize_t n;

    while (p < q) {
        n = write(fd, p, (size_t)(q - p));
        if (n > 0) {
            p += n;
        } else if (n != -1) {
            /* Write error / kernel bug! */
        } else if (errno != EINTR) {
            /* Error! Details in errno */
        }
    }
}
Although there are some local filesystems that guarantee write() does not return a short count unless you run out of storage space, not all do; especially not the networked ones. Using a loop like above lets your program work even on such filesystems. It's not too much code to add for reliable and robust operation, in my opinion.
In Linux, you can take a write lease on a file to exclude any other process opening that file for a while.
Essentially, you cannot block a file open, but you can delay it for up to /proc/sys/fs/lease-break-time seconds, typically 45 seconds. The lease is granted only when no other process has the file open, and if any other process tries to open the file, the lease owner gets a signal. (If the lease owner does not release the lease, for example by closing the file, the kernel will automagically break the lease after the lease-break-time is up.)
Unfortunately, these only work in Linux, and only on local files, so they are of limited use.
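A sketch of taking and releasing a write lease via fcntl's F_SETLEASE interface (Linux-specific, and requires _GNU_SOURCE):
/* Take a write lease; this fails with EAGAIN if any other
 * process currently has the file open. */
if (fcntl(fd, F_SETLEASE, F_WRLCK) == -1) {
    /* lease not granted */
}

/* ... update the file; a signal (SIGIO by default) arrives if
 *     another process tries to open it ... */

fcntl(fd, F_SETLEASE, F_UNLCK);   /* release the lease */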
If readers do not keep the file open, but open, read, and close it every time they read it, you can write a full replacement file (it must be on the same filesystem; I recommend using a temporary name or a lock subdirectory for this) and rename() it over the old file; note that link() cannot replace an existing name.
All readers will see either the old file or the new file, but those that keep their file open will never see any changes.
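A sketch of that replacement pattern; the file names are only examples, and the temporary must be created on the same filesystem as the target:
/* Write the complete new contents to a temporary file, then
 * atomically replace the old file with rename(). */
int tmpfd = open("data.txt.new", O_CREAT | O_EXCL | O_WRONLY, 0644);
/* ... write everything, then fsync(tmpfd) and close(tmpfd) ... */
if (rename("data.txt.new", "data.txt") == -1)
    perror("rename");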
Working on a Linux (Ubuntu) application, I need to read many files in a non-blocking fashion. Unfortunately, epoll doesn't support regular file descriptors (descriptors referring to files on disk); it supports descriptors for things like network sockets. select does work on regular file descriptors, but it has two drawbacks: 1) it's slow, going linearly through all the file descriptors that are set, and 2) it's limited, typically to no more than 1024 file descriptors.
I could make each file descriptor non-blocking and poll with non-blocking read(), but that's very expensive when there are a large number of file descriptors.
What are the options here?
Thanks.
Update 1
The use case here is to create some sort of file server, with many clients requesting files, and to serve them in a non-blocking fashion. Due to the network-side implementation (not a standard TCP/IP stack), I can't use sendfile().
You could use multiple select calls combined with either threading or forking. That would reduce the number of FD_ISSET calls per select set.
Perhaps you can provide more details about your use case. It sounds like you are using select to monitor file changes, which doesn't work as you would expect with regular files. Perhaps you are simply looking for flock.
You could use Asynchronous IO on Linux. The relevant AIO manpages (all in section 3) appear to have quite a bit of information. I think that aio_read() would probably be the most useful for you.
Here's some code that I believe you should be able to adapt for your usage:
...
#define _GNU_SOURCE
#include <aio.h>
#include <unistd.h>
typedef struct {
    struct aiocb *aio;
    connection_data *conn;
} cb_data;
void callback (union sigval u) {
    // recover file related data prior to freeing
    cb_data *data = u.sival_ptr;
    int fd = data->aio->aio_fildes;
    uint8_t *buffer = (uint8_t *) data->aio->aio_buf;
    size_t len = data->aio->aio_nbytes;
    free (data->aio);
    // recover connection data pointer then free
    connection_data *conn = data->conn;
    free (data);
    ...
    // finish handling request
    ...
    return;
}
...
int main (int argc, char **argv) {
    // initial setup
    ...

    // setup aio for optimal performance
    struct aioinit ainit = { 0 };
    // scale the number of background threads to the online cores
    ainit.aio_threads = sysconf (_SC_NPROCESSORS_ONLN) * 4;
    // fall back to the default minimum on systems with few cores
    ainit.aio_threads = (ainit.aio_threads > 20 ? ainit.aio_threads : 20);
    // set num to the maximum number of likely simultaneous requests
    ainit.aio_num = 4096;
    ainit.aio_idle_time = 5;
    aio_init (&ainit);
    ...

    // handle incoming requests
    int exit = 0;
    while (!exit) {
        ...
        // the [asynchronous] fun begins
        struct aiocb *cb = calloc (1, sizeof (struct aiocb));
        if (!cb) {
            // handle OOM error
        }
        cb->aio_fildes = file_fd;
        cb->aio_offset = 0; // assuming you want to send the entire file
        cb->aio_buf = malloc (file_len);
        if (!cb->aio_buf) {
            // handle OOM error
        }
        cb->aio_nbytes = file_len;
        // execute the callback in a separate thread
        cb->aio_sigevent.sigev_notify = SIGEV_THREAD;
        cb_data *data = malloc (sizeof (cb_data));
        if (!data) {
            // handle OOM error
        }
        data->aio = cb; // so we can free() later
        // whatever you need to finish handling the request
        data->conn = connection_data;
        cb->aio_sigevent.sigev_value.sival_ptr = data; // passed to callback
        cb->aio_sigevent.sigev_notify_function = callback;
        int err;
        if ((err = aio_read (cb))) { // and you're done!
            // handle aio error
        }
        // move on to next connection
    }
    ...
    return 0;
}
This will result in you no longer having to wait on files being read in your main thread. Of course, you can create more performant systems using AIO, but those are naturally likely to be more complex and this should work for a basic use case.
I'm trying to write a Linux kernel module that can dump the contents of other modules to a /proc file (for analysis). In principle it works, but I seem to run into some buffer limit or the like. I'm still rather new to Linux kernel development, so I would also appreciate any suggestions not concerning the particular problem.
The memory that is used to store the module is allocated in this function:
char *get_module_dump(int module_num)
{
    struct module *mod = unhiddenModules[module_num];
    char *buffer;

    buffer = kmalloc(mod->core_size, GFP_KERNEL);
    memcpy(buffer, startOf(mod), mod->core_size);
    return buffer;
}
'unhiddenModules' is an array of module structs
The buffer is then handed over to the proc file creation here:
void create_module_dump_proc(int module_number)
{
    struct proc_dir_entry *dump_module_proc;

    dump_size = unhiddenModules[module_number]->core_size;
    module_buffer = get_module_dump(module_number);
    sprintf(current_dump_file_name, "%s_dump", unhiddenModules[module_number]->name);
    dump_module_proc = proc_create_data(current_dump_file_name, 0, dump_proc_folder, &dump_fops, module_buffer);
}
The proc read function is as follows:
ssize_t dump_proc_read(struct file *filp, char *buf, size_t count, loff_t *offp)
{
    char *data;
    ssize_t ret;

    data = PDE_DATA(file_inode(filp));
    ret = copy_to_user(buf, data, dump_size);
    *offp += dump_size - ret;
    if (*offp > dump_size)
        return 0;
    else
        return dump_size;
}
Smaller modules are dumped correctly, but if the module is larger than 126,796 bytes, only the first 126,796 bytes are written, and this error is displayed when reading from the proc file:
*** Error in `cat': free(): invalid next size (fast): 0x0000000001f4a040 ***
I seem to have run into some limit, but I couldn't find anything on it. The error looks related to memory corruption, but the buffer should be large enough, so I don't see where this actually happens.
The procfs has a limit of PAGE_SIZE (one page) for read and write operations. Usually seq_file is used to iterate over the entries (modules, in your case?) and read and/or write smaller chunks. Since you are only running into problems with larger data, I suspect this is the case here.
Please have a look at the kernel's seq_file documentation if you are not familiar with seq_files.
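As a sketch of what the seq_file route could look like here, reusing dump_size and the PDE data from the question (details vary by kernel version); dump_fops would then use .open = dump_open, .read = seq_read, and .release = single_release:
/* seq_file handles chunking across read() calls for us; m->private
 * is the buffer passed to proc_create_data(). */
static int dump_show(struct seq_file *m, void *v)
{
    seq_write(m, m->private, dump_size);
    return 0;
}

static int dump_open(struct inode *inode, struct file *file)
{
    return single_open(file, dump_show, PDE_DATA(inode));
}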
A suspicious thing is that dump_proc_read does not use its "count" parameter. I would have expected copy_to_user to take "count" as its third argument instead of "dump_size" (and in the subsequent calculations too). As written, dump_size bytes are always copied to user space, regardless of the buffer size the application supplied. The bigger dump_size is, the larger the user-space area that gets corrupted.
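If you keep the raw read handler instead, a count- and offset-aware version might look like this (a sketch, keeping dump_size and PDE_DATA from the question):
ssize_t dump_proc_read(struct file *filp, char __user *buf, size_t count, loff_t *offp)
{
    char *data = PDE_DATA(file_inode(filp));

    if (*offp >= dump_size)
        return 0;                     /* EOF */
    if (count > dump_size - *offp)
        count = dump_size - *offp;    /* clamp to what remains */
    if (copy_to_user(buf, data + *offp, count))
        return -EFAULT;
    *offp += count;
    return count;
}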
I have always been told (in books and tutorials) that when copying data from kernel space to user space we should use copy_to_user(), and that using memcpy() would cause problems for the system. Recently, by mistake, I used memcpy() and it worked perfectly fine without any problems. Why is it that we should use copy_to_user() instead of memcpy()?
My test code (kernel module) is something like this:
static ssize_t test_read(struct file *file, char __user *buf,
                         size_t len, loff_t *offset)
{
    char ani[100];
    if (!*offset) {
        memset(ani, 'A', 100);
        memcpy(buf, ani, 100);   /* "wrong", but appears to work */
        *offset = 100;
        return *offset;
    }
    return 0;
}
struct file_operations test_fops = {
    .owner = THIS_MODULE,
    .read = test_read,
};

static int __init my_module_init(void)
{
    struct proc_dir_entry *entry;

    printk("We are testing now!!\n");
    entry = create_proc_entry("test", S_IFREG | S_IRUGO, NULL);
    if (!entry) {
        printk("Failed to create proc entry test\n");
        return -ENOMEM;
    }
    entry->proc_fops = &test_fops;
    return 0;
}

module_init(my_module_init);
From a user-space app, I am reading my /proc entry and everything works fine.
A look at the source code of copy_to_user() shows that it, too, is a simple memcpy() where we merely check whether the pointer is valid with access_ok() before doing the memcpy().
So my current understanding is that, if we are sure about the pointer we are passing, memcpy() can always be used in place of copy_to_user().
Please correct me if my understanding is incorrect, and also: any example where copy_to_user() works and memcpy() fails would be very useful. Thanks.
There are a couple of reasons for this.
First, security. Because the kernel can write to any address it wants, if you just take a user-supplied address and memcpy to it, an attacker could write to another process's pages, which is a huge security problem. copy_to_user checks that the target page is writable by the current process.
There are also some architecture considerations. On x86, for example, the target pages must be pinned in memory; on some architectures, you might need special instructions; and so on. The Linux kernel's goal of being very portable requires this kind of abstraction.
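For comparison, here is the question's handler with copy_to_user() in place of memcpy(); a sketch, with the rest of the module unchanged:
static ssize_t test_read(struct file *file, char __user *buf,
                         size_t len, loff_t *offset)
{
    char ani[100];
    if (!*offset) {
        memset(ani, 'A', sizeof ani);
        /* copy_to_user() returns the number of bytes NOT copied */
        if (copy_to_user(buf, ani, sizeof ani))
            return -EFAULT;
        *offset = sizeof ani;
        return *offset;
    }
    return 0;
}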
This answer may be late, but anyway: copy_to_user() and its sister copy_from_user() both do some size-limit checks on the user-passed size parameter against the kernel buffer size, so with a read method of:
char name[] = "This message is from kernel space";
ssize_t read(struct file *f, char __user *to, size_t size, loff_t *loff){
int ret = copy_to_user(to, name, size);
if(ret){
pr_info("[+] Error while copying data to user space");
return ret;
}
pr_info("[+] Finished copying data to user space");
return 0;
}
and a user-space app read such as read(ret, buffer, 10); is OK, but replace 10 with 35 or more and the kernel will emit this error:
Buffer overflow detected (34 < 35)!
and cause the copy to fail, preventing the kernel buffer from being over-read (which would leak kernel memory). The same goes for copy_from_user(), which also performs checks on the kernel buffer size.
That's why you have to use char name[] and not char *name: with a pointer (rather than an array), the size cannot be determined, and the kernel will emit this error:
BUG: unable to handle page fault for address: ffffffffc106f280
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
Hope this answer is helpful somehow.