Performance of the GNU C implementation of getcwd()

According to the GNU C Library documentation on getcwd()...
The GNU C Library version of this function also permits you to specify a null pointer for the buffer argument. Then getcwd allocates a buffer automatically, as with malloc (see Unconstrained Allocation). If the size is greater than zero, then the buffer is that large; otherwise, the buffer is as large as necessary to hold the result.
I now draw your attention to the implementation using the standard getcwd(), described in the GNU documentation:
char *gnu_getcwd ()
{
  size_t size = 100;

  while (1)
    {
      char *buffer = (char *) xmalloc (size);
      if (getcwd (buffer, size) == buffer)
        return buffer;
      free (buffer);
      if (errno != ERANGE)
        return 0;
      size *= 2;
    }
}
This seems great for portability and stability but it also looks like a clunky compromise with all that allocating and freeing memory. Is this a possible performance concern given that there may be frequent calls to the function?
It's easy to say "profile it", but profiling can't account for every possible system, present or future.

The initial size is 100, which holds a 99-character path, longer than most paths that exist on a typical system. This means that in general there is no repeated "allocating and freeing memory", and no more than 98 bytes are wasted.
The heuristic of doubling at each try means that at most a logarithmic number of spurious allocations take place. On many systems, the maximum length of a path is otherwise limited, meaning that there is a finite limit on the number of re-allocations caused.
This is about the best one can do as long as getcwd is used as a black box.

This is not a performance concern because it's the getcwd function. If that function is in your critical path then you're doing it wrong.
Joking aside, there's none of this code that could be removed. The only way you could improve this with profiling is to adjust the magic number "100" (it's a speed/space trade-off). Even then, you'd only have optimized it for your file system.
You might also think of replacing free/malloc with realloc, but that would result in an unnecessary memory copy, and with the error checking wouldn't even be less code.
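For what it's worth, here is a sketch of what a realloc-based variant might look like (not glibc's code); once the error handling is in it is no shorter, and the copy realloc performs when growing is wasted because the too-small buffer holds nothing useful yet:

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch only: a realloc-based variant for comparison, not glibc's code. */
char *gnu_getcwd_realloc (void)
{
  size_t size = 100;
  char *buffer = NULL;

  while (1)
    {
      char *bigger = realloc (buffer, size);  /* acts as malloc on the first pass */
      if (bigger == NULL)
        {
          free (buffer);
          return NULL;
        }
      buffer = bigger;
      if (getcwd (buffer, size) == buffer)
        return buffer;
      if (errno != ERANGE)
        {
          free (buffer);
          return NULL;
        }
      size *= 2;                              /* growth copies the useless old contents */
    }
}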

Thanks for the input, everyone. I have recently concluded what should have been obvious from the start: define the value ("100" in this case) and the increment formula to use (x2 in this case) to be based on the target platform. This could account for all systems, especially with the use of additional flags.
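A minimal sketch of that idea, with the starting size and growth factor exposed as platform-tunable macros (GETCWD_INITIAL_SIZE and GETCWD_GROWTH are hypothetical names invented here, not glibc macros):

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical platform-tunable knobs; override them per target platform. */
#ifndef GETCWD_INITIAL_SIZE
#define GETCWD_INITIAL_SIZE 100
#endif
#ifndef GETCWD_GROWTH
#define GETCWD_GROWTH 2
#endif

char *tuned_getcwd (void)
{
  size_t size = GETCWD_INITIAL_SIZE;

  while (1)
    {
      char *buffer = malloc (size);
      if (buffer == NULL)
        return NULL;
      if (getcwd (buffer, size) == buffer)
        return buffer;
      free (buffer);
      if (errno != ERANGE)
        return NULL;
      size *= GETCWD_GROWTH;
    }
}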

Related

Using memcpy() to move tail of buffer to its beginning? (overlap)

I have a binary-file read buffer which holds structures of variable length. Near the end of the buffer there will always be an incomplete struct. I want to move that tail of the buffer to its beginning and then read buffer_size - tail_len bytes during the next file read. Something like this:
char buf[8192];
size_t cur = 0, rcur = 0;
mystruct *mystruct_ptr;                  /* mystruct: the variable-length record type */

while (1) {
    read(fd, &buf[rcur], 8192 - rcur);   /* fd: an already-open file descriptor */
    while (cur + sizeof(mystruct) < 8192) {
        mystruct_ptr = (mystruct *) &buf[cur];
        if (mystruct_ptr->tailsize + cur >= 8192) break; // incomplete
        // do stuff
        cur += sizeof(mystruct) + mystruct_ptr->tailsize;
    }
    memcpy(buf, &buf[cur], 8192 - cur);
    rcur = 8192 - cur;
    cur = 0;
}
It should be okay if the tail is small and the buffer is big, because then memcpy most likely won't overlap the copied memory segment during a single copy iteration. However, it sounds slightly risky when the tail becomes big - bigger than 50% of the buffer.
If the buffer is really huge and the tail is also huge, it still should be okay, since there's a physical limit on how much data can be copied in a single operation, which if I remember correctly is 512 bytes for modern x86_64 CPUs using vector units. I thought about adding a condition that checks the length of the tail and, if it's too big compared to the size of the buffer, performs a naive byte-by-byte copy, but the question is:
How big is too big for such an overlapping memcpy to be considered more or less safe? tail > buffer size - 2 KB?
Per the standard, memcpy() has undefined behavior if the source and destination regions overlap. It doesn't matter how big the regions are or how much overlap there is. Undefined behavior cannot ever be considered safe.
If you are writing to a particular implementation, and that implementation defines behavior for some such copying, and you don't care about portability, then you can rely on your implementation's specific behavior in this regard. But I recommend not. That would be a nasty bug waiting to bite people who decide to use the code with some other implementation after all. Maybe even future you.
And in this particular case, having the alternative of using memmove(), which is dedicated to this exact purpose, makes gambling with memcpy() utterly reckless.
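For the buffer-tail pattern in the question, the fix is essentially a one-line change (a sketch reusing the question's buffer and indices):

/* memmove() is specified to handle overlapping source and destination,
   so the amount of overlap no longer matters. */
memmove(buf, &buf[cur], 8192 - cur);
rcur = 8192 - cur;
cur = 0;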

Is this code vulnerable to buffer overflow?

Fortify reported a buffer overflow vulnerability in the code below, citing the following reason:
In this case we are primarily concerned with the case "Depends upon properties of the data that are enforced outside of the immediate scope of the code.", because we cannot verify the safety of the operation performed by memcpy() in abc.cpp
void create_dir(const char *sys_tmp_dir, const char *base_name,
                size_t base_name_len)
{
    char *tmp_dir;
    size_t sys_tmp_dir_len;

    sys_tmp_dir_len = strlen(sys_tmp_dir);
    tmp_dir = (char*) malloc(sys_tmp_dir_len + 1 + base_name_len + 1);
    if(NULL == tmp_dir)
        return;

    memcpy(tmp_dir, sys_tmp_dir, sys_tmp_dir_len);
    tmp_dir[sys_tmp_dir_len] = FN_LIBCHAR;
    memcpy(tmp_dir + sys_tmp_dir_len + 1, base_name, base_name_len);
    tmp_dir[sys_tmp_dir_len + base_name_len + 1] = '\0';
    ..........
    ..........
}
It appears to me to be a false positive, since we get the size of the data first, allocate that much space, and then call memcpy with the size to copy.
But I am looking for good reasons to convince a fellow developer to get rid of the current implementation and use C++ strings instead. The issue has been assigned to him, and he sees it as a false positive, so he doesn't want to change anything.
Edit: I see quick, valid criticism of the current code. Hopefully I'll be able to convince him now. Otherwise, I'll hold the baton. :)
Take a look at strlen(): it takes an input string but no upper bound, so it will keep searching until it finds '\0'. That's a vulnerability, because you'll perform memcpy() trusting its result (if it doesn't crash with an access violation while searching first). Imagine:
create_dir((const char*)12345, baseDir, strlen(baseDir));
You tagged both C and C++...if you're using C++ then std::string will protect you from these issues.
It appears to me to be a false positive, since we get the size of the data first, allocate that much space
This assumption is a problem that matches the warning/error. In your code, you're assuming that malloc successfully allocated the requested memory. If your system has no memory to spare, malloc will fail and return NULL. When you try to memcpy into tmp_dir, you'd be copying to NULL which would be bad news.
You should check to guarantee that the value returned by malloc is not NULL before considering it as a valid pointer.
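A minimal sketch of the kind of defensive version the answers point toward, in plain C. It assumes POSIX strnlen() and a PATH_MAX cap from <limits.h>; the FN_LIBCHAR value and the function name are placeholders chosen here for illustration:

#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define FN_LIBCHAR '/'  /* assumed path separator; the original macro's value is not shown */

char *create_dir_path(const char *sys_tmp_dir, const char *base_name,
                      size_t base_name_len)
{
    if (sys_tmp_dir == NULL || base_name == NULL)
        return NULL;

    /* strnlen bounds the scan, so a non-terminated input cannot run away. */
    size_t sys_tmp_dir_len = strnlen(sys_tmp_dir, PATH_MAX);
    if (sys_tmp_dir_len == PATH_MAX || base_name_len > PATH_MAX)
        return NULL;  /* reject unterminated or suspiciously long input */

    char *tmp_dir = malloc(sys_tmp_dir_len + 1 + base_name_len + 1);
    if (tmp_dir == NULL)
        return NULL;  /* malloc failure is handled before any copy */

    memcpy(tmp_dir, sys_tmp_dir, sys_tmp_dir_len);
    tmp_dir[sys_tmp_dir_len] = FN_LIBCHAR;
    memcpy(tmp_dir + sys_tmp_dir_len + 1, base_name, base_name_len);
    tmp_dir[sys_tmp_dir_len + 1 + base_name_len] = '\0';
    return tmp_dir;
}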

How to prevent the compiler from optimizing memory access to benchmark read() vs mmap() performance?

I would like to benchmark read() vs mmap() performance of a C program reading 10GB of data. If I have read or mmap'ed the data to a buffer, what should be done in order to make sure the data was actually read?
At the moment I use the following function after each single read() and after the one mmap() operation to make sure data is actually in memory:
void use_data(void *data, size_t length) {
    volatile int c = 0;
    for (size_t i = 0; i < length; i++) {
        c += *((char *) data + i);
    }
}
However, I feel this might even introduce overhead? Maybe one can even distinguish between read() and mmap():
In the read() case I think no explicit data access is needed, because the read() call will copy the data to a buffer anyway. In the case of mmap(), however, I think some kind of summing up/counting needs to be performed in order to make the kernel load every page.
Any recommendations?
You don't need to access the volatile variable for each byte you process. Sum all bytes into a local. Then, write the sum into a volatile variable.
In fact you don't need a volatile variable. You can use any opaque sink that the compiler cannot prove as unneeded. Writing the sum to a temp file would be guaranteed to work as well.
Note that this is not just a hack to make the compiler cooperate. It is guaranteed to touch every byte (because every byte could influence the result), and the result feeds external I/O, so it cannot be optimized away under the standard.
If alignment allows, sum in bigger units such as 32 or 64 bits. Use unsigned types to avoid UB on overflow. You want to be memory/IO bound, not ALU bound. You can create instruction-level parallelism by summing multiple independent streams using multiple local accumulator variables.
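A sketch of that advice, assuming a plain C build (the function name is illustrative; memcpy into a local word is used here to sidestep alignment concerns):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static volatile uint64_t sink;  /* opaque sink: the compiler must assume the write is observable */

void touch_data(const void *data, size_t length) {
    const unsigned char *p = data;
    uint64_t sum = 0;
    size_t i = 0;

    /* Sum 64-bit chunks; unsigned arithmetic wraps, so overflow is not UB. */
    for (; i + sizeof(uint64_t) <= length; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, p + i, sizeof word);
        sum += word;
    }
    for (; i < length; i++)  /* remaining tail bytes */
        sum += p[i];

    sink = sum;  /* one volatile write keeps all the summing alive */
}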

Correct way to profile a memory allocator

I have written a memory allocator that is (supposedly) faster than using malloc/free.
I have written a small amount of code to test this, but I'm not sure if this is the correct way to profile a memory allocator. Can anyone give me some advice?
The output of this code is:
Mem_Alloc: 0.020000s
malloc: 3.869000s
difference: 3.849000s
Mem_Alloc is 193.449997 times faster.
This is the code:
int i;
int mem_alloc_time, malloc_time;
float mem_alloc_time_float, malloc_time_float, times_faster;
unsigned prev;

// Test Mem_Alloc
timeBeginPeriod (1);
mem_alloc_time = timeGetTime ();
for (i = 0; i < 100000; i++) {
    void *p = Mem_Alloc (100000);
    Mem_Free (p);
}
// Get the duration
mem_alloc_time = timeGetTime () - mem_alloc_time;

// Test malloc
prev = mem_alloc_time; // For getting the difference between the two times
malloc_time = timeGetTime ();
for (i = 0; i < 100000; i++) {
    void *p = malloc (100000);
    free (p);
}
// Get the duration
malloc_time = timeGetTime () - malloc_time;
timeEndPeriod (1);

// Convert both times to seconds
mem_alloc_time_float = (float)mem_alloc_time / 1000.0f;
malloc_time_float = (float)malloc_time / 1000.0f;

// Print the results
printf ("Mem_Alloc: %fs\n", mem_alloc_time_float);
printf ("malloc: %fs\n", malloc_time_float);
if (mem_alloc_time_float > malloc_time_float) {
    printf ("difference: %fs\n", mem_alloc_time_float - malloc_time_float);
} else {
    printf ("difference: %fs\n", malloc_time_float - mem_alloc_time_float);
}
times_faster = (float)max(mem_alloc_time_float, malloc_time_float) /
               (float)min(mem_alloc_time_float, malloc_time_float);
printf ("Mem_Alloc is %f times faster.\n", times_faster);
Nobody cares[*] whether your allocator is faster or slower than their allocator, at allocating and then immediately freeing a 100k block 100k times. That is not a common memory allocation pattern (and for any situation where it occurs, there are probably better ways to optimize than using your memory allocator. For example, use the stack via alloca or use a static array).
People care greatly whether or not your allocator will speed up their application.
Choose a real application. Study its performance at allocation-heavy tasks with the two different allocators, and compare that. Then study more allocation-heavy tasks.
Just for one example, you might compare the time to start up Firefox and load the StackOverflow front page. You could mock the network (or at least use a local HTTP proxy), to remove a lot of the random variation from the test. You could also use a profiler to see how much time is spent in malloc and hence whether the task is allocation-heavy or not, but beware that stuff like "overcommit" might mean that not all of the cost of memory allocation is paid in malloc.
If you wrote the allocator in order to speed up your own application, you should use your own application.
One thing to watch out for is that often what people want in an allocator is good behavior in the worst case. That is to say, it's all very well if your allocator is 99.5% faster than the default most of the time, but if it does comparatively badly when memory gets fragmented then you lose in the end, because Firefox runs for a couple of hours and then can't allocate memory any more and falls over. Then you realise why the default is taking so long over what appears to be a trivial task.
[*] This may seem harsh. Nobody cares whether it's harsh ;-)
All that the implementation you are testing against is missing is a check for whether the current request size is the same as the previously freed one:
if (size == prev_free->size)
{
    current = allocate(prev_free);
    return current;
}
It is "trivial" to make efficient malloc/free functions as long as memory is not fragmented. The challenge comes when you allocate lots of blocks of different sizes and then free some and allocate more in no particular order.
You have to check which library you tested against and what conditions that library was optimised for, for example:
efficient handling of fragmented memory
fast free, fast malloc (you can make either one O(1) ),
memory footprint
multiprocessor support
realloc
Check existing implementations and the problems they were dealing with, and try to improve on or solve the difficulties they had. Try to figure out what users expect from the library.
Base your tests on those assumptions, not just on some operation you think is important.
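As a very rough sketch of what a more representative micro-benchmark might look like, with mixed sizes and frees in no particular order (the sizes, counts, fixed seed, and use of clock() are arbitrary choices here, not recommendations):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SLOTS 4096
#define ITERS 1000000

int main(void)
{
    static void *slots[SLOTS];  /* zero-initialized: all slots start empty */
    srand(42);                  /* fixed seed so runs are comparable */

    clock_t start = clock();
    for (int i = 0; i < ITERS; i++) {
        int s = rand() % SLOTS;
        if (slots[s]) {         /* free and allocate in no particular order */
            free(slots[s]);
            slots[s] = NULL;
        } else {
            size_t size = 16u << (rand() % 10);  /* mixed sizes: 16 B .. 8 KB */
            slots[s] = malloc(size);
        }
    }
    for (int s = 0; s < SLOTS; s++)
        free(slots[s]);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("malloc/free mixed workload: %.3fs\n", secs);
    /* Repeat with Mem_Alloc/Mem_Free in place of malloc/free to compare. */
    return 0;
}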

Is there any hard-wired limit on recursion depth in C

The program under discussion attempts to compute sum-of-first-n-natural-numbers using recursion. I know this can be done using a simple formula n*(n+1)/2 but the idea here is to use recursion.
The program is as follows:
#include <stdio.h>

unsigned long int add(unsigned long int n)
{
    return (n == 0) ? 0 : n + add(n-1);
}

int main()
{
    printf("result : %lu \n", add(1000000));
    return 0;
}
The program worked well for n = 100,000 but when the value of n was increased to 1,000,000 it resulted in a Segmentation fault (core dumped)
The following was taken from the gdb message.
Program received signal SIGSEGV, Segmentation fault.
0x00000000004004cc in add (n=Cannot access memory at address 0x7fffff7feff8
) at k.c:4
My question(s):
Is there any hard-wired limit on recursion depth in C? or does the recursion depth depends on the available stack memory?
What are the possible reasons why a program would receive a SIGSEGV signal?
Generally the limit will be the size of the stack. Each time you call a function, a certain amount of stack is eaten (usually dependent on the function). The eaten amount is the stack frame, and it is recovered when the function returns. The stack size is almost always fixed when the program starts, either from being specified by the operating system (and often adjustable there), or even being hardcoded in the program.
Some implementations may have a technique where they can allocate new stack segments at run time. But in general, they don't.
Some functions will consume stack in slightly more unpredictable ways, such as when they allocate a variable-length array there.
Some functions may be compiled to use tail calls in a way that will preserve stack space. Sometimes you can rewrite your function so that all calls (such as to itself) happen as the last thing it does, and expect your compiler to optimise it.
It's not that easy to see exactly how much stack space is needed for each call to a function, and it will be subject to the optimisation level of the compiler. A cheap way to do that in your case would be to print &n each time it's called; n will likely be on the stack (especially since the program needs to take its address -- otherwise it could be in a register), and the distance between successive locations of it will indicate the size of the stack frame.
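For instance, a quick sketch of that trick (the exact addresses and the resulting frame size will vary with platform and optimisation level):

#include <stdio.h>

unsigned long int add(unsigned long int n)
{
    /* Successive &n values differ by roughly one stack frame. */
    printf("n = %lu at %p\n", n, (void *) &n);
    return (n == 0) ? 0 : n + add(n - 1);
}

int main(void)
{
    printf("result : %lu\n", add(5));  /* a small depth is enough to see the spacing */
    return 0;
}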
Stack consumption can be reduced by rewriting the function in tail-recursive form, so that the compiler can apply tail-call optimization. For example, compiled with:
gcc -O3 prog.c
#include <stdio.h>

unsigned long long int add(unsigned long int n, unsigned long long int sum)
{
    return (n == 0) ? sum : add(n-1, n+sum); // tail recursion form
}

int main()
{
    printf("result : %llu \n", add(1000000, 0)); // OK
    return 0;
}
There is no theoretical limit to recursion depth in C. The only limits are those of your implementation, generally limited stack space.
(Note that the C standard doesn't actually require a stack-based implementation. I don't know of any real-world implementations that aren't stack based, but keep that in mind.)
A SIGSEGV can be caused by any number of things, but exceeding your stack limit is a relatively common one. Dereferencing a bad pointer is another.
The C standard does not define a minimum supported depth for function calls. If it did (which would be quite hard to guarantee anyway), it would be mentioned somewhere in section 5.2.4, Environmental limits.
