Say I have two arrays allocated on the stack using C99 features. Because the compiler is free to reorder such arrays however it likes, and may pad them for out-of-bounds checking, the two arrays aren't guaranteed to be cleanly adjacent. If I just needed it to work I'd write the whole part in assembly, but I'm not that serious yet.
char a[] = {'H','e','l','l','o',',',' '};
char b[] = {'W','o','r','l','d','!','\n',0};
If this were assembly, and perhaps if I knew the language better, instead of setting b, I would just grow a by perhaps just increasing the stack pointer and then copy b over.
char* b = "World!\n";
size_t b_size = 8;// or some bothered way to get the sizeof("World!\n")
char a[] = {'H','e','l','l','o',',',' '};
predict_the_future_and_run_away_if_likely_explosion(); // Maybe... just use malloc bro
if(is_magical_failure(magically_grow_array_to_size(a, sizeof(a)+b_size)))
inform_user_of_grand_tragedy_and_recover_gracefully_or_explode_in_a_fire(
"There was not enough spacetime left available to continue.");// TODO
//... or something...
strncpy(&a[a_size], b, b_size); // This copy seems necessary
If this magically_grow_array_to_size can exist, it would, without a copy, increase the size of the stack "knowing" that a is the last thing on the stack, or complain to me that I'm stupid for trying to grow the stack for an array that is smack in the middle, and say bro, just use malloc and put it on the heap like all the other good people on stackoverflow.
If the answer is "sister, just use malloc bro", please be respectful in the comments.
I know with utmost certainty that if I go into assembly and *assume the hardware architecture, and *assume there is a stack frame, and figure out how to reference the stack limit and know if It will be reached by said operation or not, everything will work.
If the answer is: c99, as elegant as it is, the feature you're looking for only got added in like version c2x or c4x or c09x... c12x, if it's within a couple years, go ahead. Beyond that, someone might appreciate it, can't promise, and hopefully I'd have accepted an answer appropriate of my time.
"don't use alloca bro" I'm having trouble parsing the answer I think; And I feel reassured as people seem to think the stack is faster than the heap~; so if alloca is slow, that's fine but why would I then go and use malloc if I don't have to? I didn't understand the it's a free lunch thing, because does that mean I can shrink my stack usage once I figure out how much I need? I feel like if I move beyond once I received a free lunch, I'll be chugging memory. Once I move beyond combining two arrays on the stack, I'd like to know I'm not using any more memory than I have to. If I could see some code on how to free lunch, considering, oh, my predicament isn't super clear.
I really don't want to have to track the length of a anywhere other than compile time. I have determined that b and its length are available, and b is either in program memory or stored on the heap. How can I free lunch using a before I've actually checked the size of a, and if I can't... well, I should be able to... and in assembly if not in C99. I still don't understand how to grow an array encouraged to be placed at the end of the stack frame... and if the endian-ness is wonky, ensure a is moved from the middle to the beginning and add b to the end, and if the endian-ness isn't wonky, just extend the size of the stack, copy b in, and Bob's your uncle, Alice we've done it! Should be dead simple, but apparently I'm clueless at the moment; maybe I missed the answer in plain sight, or don't know the search term, but I couldn't find it.
If assembly is the best and only way to do this, if you could clear up my misconceptions about how possible or not writing cross-platform assembly is... really should be another question. I really would like a pure C99 solution if possible, if a particular flavor of assembly was perfectly defined for all C99 capable devices, or reluctantly including mostly all or vast majority of, excluding only the most obtuse of microcontrollers, and never excluding any desktop machines, I'm down for some extern asm.
Please forgive my ignorance and the wordiness of the question; if I were smart enough, it would be shorter and more concise.
With love to the comments, the following is the question: Can I merge two stack allocated arrays in place using exactly enough memory at the end but using within an order of magnitude of the memory as the below example to get there?
const char a[]={'H','e','l','l','o',',',' '};
const char b[]={'W','o','r','l','d','!','\n',0};
char c[sizeof(a)+sizeof(b)];
strncpy(c,a,sizeof(a));
strncpy(&c[sizeof(a)],b,sizeof(b));
Without assembly, I refuse to believe this is the best I can do in terms of memory usage.
I'm using twice as much stack space as strictly necessary.
If this is the best that can be done, I'll accept the closed question. It's just annoying, I'd rather not go into assembly.
I can't figure out how to allocate enough space for c before defining a and b and once a and b are defined, I can't figure out how to get rid of them after defining c, other than to just do it in assembly.
It's not life-threateningly critical, nobody is going to die, and I have an inefficient solution, but it is a serious question, I think. If nothing else, it's just irritating that I can't get this space efficient (without assembly), be it with compiler directives or anything else.
If it's closed again, I'll be asking how to do this in assembly. Because I'm lacking in comprehension.
If there is a clang solution, I'm fine with LLVM, however, cross compiler would be preferred.
If the GNU C compiler specifically has a solution, that works too, since it has as good a range of deployment targets as clang. If it's MSVC only, I'd be less favorable to it.
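For reference, here is a minimal sketch of one way to avoid the two intermediate stack copies, assuming both parts are known at compile time as string literals (as in the example above). It does not grow an array in place; it only sizes the destination exactly and copies straight from the literals, which live in static storage rather than on the stack. The names PART_A, PART_B, JOINED_SIZE and join_parts are hypothetical.

```c
#include <string.h>

/* Hypothetical names; the two parts are assumed to be compile-time literals. */
#define PART_A "Hello, "
#define PART_B "World!\n"

/* sizeof counts each literal's terminating 0, so drop one for PART_A. */
enum { JOINED_SIZE = sizeof(PART_A) - 1 + sizeof(PART_B) };

/* Copies both parts into out, which must hold at least JOINED_SIZE bytes. */
static void join_parts(char *out)
{
    /* "Hello, " without its terminating 0 */
    memcpy(out, PART_A, sizeof(PART_A) - 1);
    /* "World!\n" including its terminating 0 */
    memcpy(out + sizeof(PART_A) - 1, PART_B, sizeof(PART_B));
}
```

A caller would declare char c[JOINED_SIZE]; and then call join_parts(c);, so c is the only array with automatic storage. Whether this actually lowers peak stack usage on a given target still has to be checked in the generated assembly.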
Related
When I'm on SO I read a lot of comments claiming (especially about C):
"dynamic allocation always goes to the heap, automatic allocation on the stack"
But especially with regard to plain C I disagree with that, as ISO/IEC 9899 doesn't even mention a heap or a stack. It just mentions three storage durations (static, automatic, and allocated) and specifies how each of them has to be treated.
That would give a compiler the option to do it even the other way around if it wanted to.
So my question is:
Do the heap and the stack physically exist, so that a standardized language (even if not C) can say "... has to happen on the heap and ... on the stack"?
Or are they just a virtual system for managing memory access, so that a language can't make rules about them, since it can't even be ensured that the environment supports them?
As far as I know, only the second would make sense. But I have already read many comments like "In language XY this WILL happen on the stack/heap". If I'm right, this would have to be indeterminable unless the language is made only for systems that are guaranteed to have a stack and a heap, and all those comments would be wrong.
That's what led me to ask this question. Am I that wrong, or is there a big error in reasoning going around about that?
You are correct in that the C spec doesn't require the use of a heap or stack, as long as the implementation handles the storage durations correctly.
However, virtually every compiler will use stacks for automatic variables and heaps for allocated variables. While you could implement a compiler that doesn't use a stack or heap, it probably wouldn't perform very well and wouldn't be familiar to most devs.
So when people say "always", they really mean "virtually always".
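As an illustration of the three storage durations the question mentions, here is a small sketch. The names call_count and make_boxed_successor are hypothetical, and the comments describe what the C standard guarantees about lifetime, not where a particular compiler places each object.

```c
#include <stdlib.h>

/* Static storage duration: exists for the whole run of the program. */
static int call_count;

/* Hypothetical helper: returns a malloc'ed int holding v + 1, or NULL on failure. */
int *make_boxed_successor(int v)
{
    /* Automatic storage duration: lifetime ends when the function returns. */
    int local = v + 1;

    /* Allocated storage duration: lives until the caller passes it to free(). */
    int *boxed = malloc(sizeof *boxed);
    if (boxed != NULL) {
        *boxed = local;
    }
    call_count++;
    return boxed;
}
```

On typical implementations local ends up in a register or on a stack and the malloc'ed object on a heap, but nothing in the code above depends on that.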
I am working on some (embedded) device, recently I just started thinking maybe to use less memory, in case stack size isn't that big.
I have long functions (unfortunately).
And inside I was thinking to save space in this way.
Imagine there is code
1. void f()
2. {
3. ...
4. char someArray[300];
5. char someOtherArray[300];
6. someFunc(someArray, someOtherArray);
7. ...
8. }
Now, imagine, someArray and someOtherArray are never used in f function beyond line: 6.
Would following save some stack space??
1. void f()
2. {
3. ...
4. {//added
5. char someArray[300];
6. char someOtherArray[300];
7. someFunc(someArray, someOtherArray);
8. }//added
9. ...
10. }
nb: removed second part of the question
For the compiler proper, both are exactly the same, so it makes no difference. The preprocessor replaces all instances of TEXT1 with the string constant.
#define TEXT1 "SomeLongStringLiteral"
someFunc(TEXT1);
someOtherFunc(TEXT1);
After the preprocessor's job is done, the above snippet becomes
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
Thus it makes no difference performance or memory-wise.
Aside: The reason for writing #define TEXT1 "SomeLongStringLiteral" is to have a single place to change every use of TEXT1; that's a convenience only for the programmer and has no effect on the produced output.
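A small sketch of the same point: the two functions below (hypothetical names) are identical after preprocessing, because TEXT1 is replaced by the literal before the compiler proper ever sees the code.

```c
#include <string.h>

#define TEXT1 "SomeLongStringLiteral"

/* After preprocessing, both bodies read strlen("SomeLongStringLiteral"). */
size_t length_via_macro(void)   { return strlen(TEXT1); }
size_t length_via_literal(void) { return strlen("SomeLongStringLiteral"); }
```

Running only the preprocessor (for example with the -E option of gcc or clang) shows the expanded text directly.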
recently I just started thinking maybe to use less memory, in case stack size isn't that big.
Never micro-optimise or prematurely optimise. If the stack size isn't that big, you'll find out when you benchmark/measure it. Don't make any assumptions when you optimise; 99% of the time they'd be wrong.
I am working on some device
Really? Are you? I wouldn't have thought that.
Now, imagine, someArray and someOtherArray are never used in f function beyond line 6. Would following save some stack space?
On a good compiler, it wouldn't make a difference. By the standard, it isn't specified if it saves or not, it isn't even specified if there is a stack or not.
But on a not so good compiler, the one with the additional {} may be better. It is worth a test: compile it and look at the generated assembler code.
it seems my compiler doesn't allow me to do this (this is C), so never mind...
But it should. What happens then? Maybe you are just confusing levels of {} ...
I'll ask another one here.
Would better be a separate question...
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
vs.
someFunc(TEXT1)
someOtherFunc(TEXT1)
A #define is processed before any compilation step, so it makes absolutely no difference.
If it happens within the same compilation unit, the compiler will tie them together anyway. (At least, in this case. On an ATXmega, if you use PSTR("whatever") for having them in flash space only, each occurrence of them will be put into flash separately. But that's a completely different thing...)
Modern compilers should push stack variables before they are used, and pop them when they are no longer needed. The old thinking with { ... } marking the start and end of a stack push/pop should be rather obsolete by now.
Since 1999, C allows stack variables to be declared anywhere in a block and not just immediately after a {. C++ allowed this far earlier. Today, where the local variable is declared inside the scope has little to do with when it actually starts to exist in the machine code. And similarly, the } has little to do with when it ceases to exist.
So regarding adding extra { }, don't bother. It is premature optimization and only adds pointless clutter.
Regarding the #define it absolutely makes no difference in terms of efficiency. Macros are just text replacement.
Furthermore, from the generic point-of-view, data must always be allocated somewhere. Data used by a program cannot be allocated in thin air! That's a very common misunderstanding. For example, many people incorrectly believe that
int x = func();
if(x == something)
consumes more memory than
if(func() == something)
But both examples compile into identical machine code. The result of func must be stored somewhere, it cannot be stored in thin air. In the first example, the result is stored in a memory segment that the programmer may refer to as x.
In the second example, it is stored in the very same memory segment, taking up the same amount of space, for the same duration of program execution. The only difference is that the memory segment is anonymous and the programmer has no name for it. As far as the machine code is concerned, that doesn't matter, since no variable names exist in machine code.
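To make the comparison concrete, here is a sketch of the two forms as complete functions. The body of func and the names check_named and check_anonymous are hypothetical stand-ins. Whether the generated code is really identical depends on the compiler and its options, so it is worth comparing the assembly output (for example with -S on gcc or clang).

```c
/* Hypothetical stand-in for the func() in the text. */
static int func(void) { return 42; }

/* The result gets a name the programmer can refer to. */
int check_named(int something)
{
    int x = func();
    if (x == something) {
        return 1;
    }
    return 0;
}

/* The result is stored just the same, only without a name. */
int check_anonymous(int something)
{
    if (func() == something) {
        return 1;
    }
    return 0;
}
```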
And this would be why every professional C programmer needs to understand a certain amount of assembler. You cannot hope to ever do any kind of manual code optimization if you don't.
(Please don't ask two questions in one, this is really annoying since you get two types of answer for your two different questions.)
For your first question. Probably putting {} around the use of a variable will not help. The lifetime of automatic variables that are not VLA (see below) is not bound to the scope in which it is declared. So compilers may have a hard time in figuring out how the use of the stack may be optimized, and maybe don't do such an optimization at all. In your case this is most likely the case, since you are exporting pointers to your data to a function that is perhaps not visible, here. The compiler has no way to figure out if there is a valid use of the arrays later on in the code.
I see two ways to "force" the compiler into optimizing that space: functions or VLAs. The first, functions, is simple: instead of putting the block around the code, put it in a static function. Function calls are quite optimized on modern platforms, and here the compiler knows exactly how it may clear the stack at the end.
The second alternative in your case is a VLA, variable length array, if your compiler supports that C99 feature. Arrays that have a size that doesn't depend on a compile time constant have a special rule for their lifetime. That lifetime exactly ends at the end of the scope where they are defined. Even a const-qualified variable could be used for that:
{
size_t const len = 300;
char someArray[len];
char someOtherArray[len];
someFunc(someArray, someOtherArray);
}
At the end, on a given platform, you'd really have to inspect what assembler your compiler produces.
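The first alternative, moving the block into a static function, could be sketched as follows. The body of someFunc is a hypothetical stand-in, and use_scratch_buffers is a made-up name. Note that an optimizing compiler may inline the helper, in which case the buffers can end up in f's frame again, so this is also something to verify in the assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the question's someFunc. */
static size_t someFunc(char *first, char *second)
{
    memset(first, 'a', 300);
    memset(second, 'b', 300);
    return 600; /* number of bytes written */
}

/* The two buffers exist only while this helper is running. */
static size_t use_scratch_buffers(void)
{
    char someArray[300];
    char someOtherArray[300];
    return someFunc(someArray, someOtherArray);
}

size_t f(void)
{
    size_t written = use_scratch_buffers();
    /* ... the rest of the long function runs without the 600 bytes ... */
    return written;
}
```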
From my understanding of C it seems that you are supposed to use malloc(size) whenever you are trying to initialize, for instance, an array whose size you do not know of until runtime.
But I was wondering why the function malloc() returns a pointer to the location of the variable and why you even need that.
Basically, why doesn't C just hide it all from you, so that whenever you do something like this:
// 'n' gets stdin'ed from the user
...
int someArray[n];
for(int i = 0; i < n; i++)
someArray[i] = 5;
you can do it without ever having to call malloc() or some other function? Do other languages do it like this (by hiding the memory properties/location altogether)? I feel that, for a beginner, this whole process of dealing with the memory locations of the variables you use just confuses programmers (and since other languages don't use it, C seems to make a simple initialization process such as this overly complicated)...
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...
C lets you manage every little bit of your program. You can manage when memory gets allocated; you can manage when it gets deallocated; you can manage how to grow a small allocation, etc.
If you prefer not to manage that and let the compiler do it for you, use another language.
Actually C99 allows this (so you're not the only one thinking of it). The feature is called VLA (Variable Length Array).
It's legal to read an int and then have an array of that size:
int n;
scanf("%d", &n);
int array[n];
Of course there are limitations since malloc uses the heap and VLAs use the stack (so the VLAs can't be as big as the malloced objects).
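A slightly more defensive sketch of the same idea, written as a function so the size check is explicit. The name fill_and_sum and the cap MAX_VLA_ELEMS are hypothetical, and the cap value is arbitrary. VLAs are a C99 feature that later standards made optional, so this assumes a compiler that supports them.

```c
#include <stddef.h>

/* Arbitrary cap chosen for this sketch; a real limit depends on the target's stack. */
#define MAX_VLA_ELEMS 1024

/* Fills an n-element VLA with 5 and returns the sum, or -1 if n is rejected. */
long fill_and_sum(size_t n)
{
    /* A VLA cannot report allocation failure, so refuse bad sizes up front. */
    if (n == 0 || n > MAX_VLA_ELEMS) {
        return -1;
    }

    int array[n]; /* C99 variable length array */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        array[i] = 5;
        total += array[i];
    }
    return total;
}
```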
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore
that for now...
So we can concentrate on the flame ?
Basically, what I'm trying to ask is why malloc() is even necessary,
because why the language doesn't take care of all that for you
internally without the programmer having to be concerned about or
having to see memory.
The very point of malloc(), its raison d'être, its function, if you will, is to allocate a block of memory. The way we refer to a block of memory in C is by its starting address, which is by definition a pointer.
C is close to 40 years old, and it's not nearly as "high level" as some more modern languages. Some languages, like Java, attempt to prevent mistakes and simplify programming by hiding pointers and explicit memory management from the programmer. C is not like that. Why? Because it just isn't.
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
One of the hallmarks of C is its simplicity (C compilers are relatively easy to implement); one way of making a language simple is to force the programmer to do all his own memory management. Clearly, other languages do manage objects on the heap for you - Java and C# are modern examples, but the concept isn't new at all; Lisp implementations have been doing it for decades. But that convenience comes at a cost in both compiler complexity and runtime performance.
The Java/C# approach helps eliminate whole classes of memory-management bugs endemic to C (memory leaks, invalid pointer dereferences, etc.). By the same token, C provides a level of control over memory management that allows the programmer to achieve high levels of performance that would be difficult (not impossible) to match in other languages.
If the only purpose of dynamic allocation were to allocate variable-length arrays, then malloc() might not be necessary. (But note that malloc() was around long before variable-length arrays were added to the language.)
But the size of a VLA is fixed (at run time) when the object is created. It can't be resized, and it's deallocated only when you leave the scope in which it's declared. (And VLAs, unlike malloc(), don't have a mechanism for reporting allocation failures.)
malloc() gives you a lot more flexibility.
Consider creating a linked list. Each node is a structure, containing some data and a pointer to the next node in the list. You might know the size of each node in advance, but you don't know how many nodes to allocate. For example, you might read lines from a text file, creating and appending a new node for each line.
You can also use malloc() along with realloc() to create a buffer (say, an array of unsigned char) whose size can be changed after you created it.
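A minimal sketch of such a resizable buffer, assuming a doubling growth strategy. The type Buffer and the functions buffer_append and buffer_free are hypothetical names, and overflow of the capacity arithmetic for extremely large sizes is not handled here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; start with all fields zero. */
typedef struct {
    unsigned char *data;
    size_t len; /* bytes in use */
    size_t cap; /* bytes allocated */
} Buffer;

/* Appends n bytes from src; returns 0 on success, -1 if realloc fails. */
int buffer_append(Buffer *b, const void *src, size_t n)
{
    if (n == 0) {
        return 0;
    }
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 16;
        while (new_cap - b->len < n) {
            new_cap *= 2; /* overflow for huge sizes is ignored in this sketch */
        }
        unsigned char *grown = realloc(b->data, new_cap);
        if (grown == NULL) {
            return -1; /* the old block is still valid and unchanged */
        }
        b->data = grown;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

void buffer_free(Buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Unlike a VLA, this reports failure through its return value and can keep growing after it has been created.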
Yes, there are languages that don't expose pointers, and that handle memory management for you.
A lot of them are implemented in C.
Maybe the question should be "why do you need something like int array[n] when you can use pointers?"
After all, pointers allow you to keep an object alive beyond the scope it was created in, you can use pointer to slice and dice arrays (for example strchr() returns a pointer to a string), pointers are light-weight objects, so it's cheap to pass them to functions and return them from functions, etc.
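For instance, a tiny sketch of the slicing mentioned above, where the returned pointer refers into the original string and no characters are copied. The helper name after_comma is hypothetical.

```c
#include <string.h>

/* Returns a pointer to the text after the first comma in s, or NULL if there is none. */
const char *after_comma(const char *s)
{
    const char *comma = strchr(s, ',');
    return comma ? comma + 1 : NULL;
}
```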
But the real answer is "that's how it is". Other options are possible, and the proof is that there are other languages that do other things (and even C99 allows different things).
C is regarded as a highly developed low-level language. malloc is basically used for dynamic arrays, which are a key component of stacks and queues. Other languages that hide pointers from the developer are not as well suited to hardware-related programming.
The short answer to your question is to ponder this question: What if you also need to control exactly when the memory is de-allocated?
C is a compiled language, not an interpreted one. If you don't know n at compile time, how is the compiler supposed to produce a binary?
I'm referring to the main static languages today (C, C++, Java, C#). I've heard some contradicting answers about this, so I wanted to know:
If I have some code such as:
loop(...) {
type x = val;
...
}
('loop' is some type of loop, e.g. for, while)
Will it cause memory allocation in each iteration of the loop, or just once? Is it different from writing this:
type x;
loop(...) {
x = val;
...
}
where memory is only allocated once for x?
The strictly correct answer is that it depends on the implementation, as both are semantically correct. No language specification would require or prohibit such implementation details.
That said, any implementation worth its salt will be able to reuse the same stack slot or even CPU register (with native compilation, especially likely in presence of a JIT). Even the bytecode will likely be completely identical.
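In C, the two variants from the question might look like the sketch below, with int standing in for type and a made-up loop body. The function names are hypothetical. Both compute the same result, and a typical optimizing compiler is expected to reuse one register or stack slot for x in either form, although that is something to confirm in the generated code.

```c
/* x is declared inside the loop body. */
long sum_declared_inside(int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        int x = i * 2;
        total += x;
    }
    return total;
}

/* x is declared once, before the loop. */
long sum_declared_outside(int n)
{
    long total = 0;
    int x;
    for (int i = 0; i < n; i++) {
        x = i * 2;
        total += x;
    }
    return total;
}
```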
And finally, there's that thing with premature optimization... Unless proven otherwise, you shouldn't even bother thinking about low-level details like this (if you think knowledge and control over such issues matters, perhaps you should just program in assembler), because:
Unless you're doing a microbenchmark (or a really huge number-crunching task - but how many people freaking out about performance actually do those?), you won't even notice any difference even if it isn't optimized. If you're doing anything of interest in the loop body, it will dwarf the difference (again, if any). Especially if you're doing any I/O.
Even if there is memory allocation, it boils down to pushing and popping a few bytes on the native stack, which in turn boils down to adding an integer constant to a hardware register. All C and C++ programs use that stack for their local variables, and none of those ever complained about its performance... if you have to reserve space, you can't get faster than using the stack.
If you have to ask this kind of question, you're not someone who could do anything about it. Those people know to just (1) measure it, (2) look at the generated code and (3) look for large-scale optimizations before even thinking on this level ;)
I have a function and I'm accessing a struct's members a lot of times in it.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
int x;
int y;
};
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the objects in a function, I usually create a pointer (or am passed one as an argument) and point it to the required object. (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct.)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if I created temporary variables for the struct members (only the ones I need; I certainly don't use all the members), copied the values over, and then set the actual struct's values before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.
This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better than the opinions of a million highly experienced expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. Without that data, nothing anyone tells you is anything but an ego-based guess. Data answers the question.
Bob Pendleton
First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
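The three access styles from the question could be sketched like this, using the question's struct s and a made-up update. The function names are hypothetical. All three produce the same result; which one is cheapest on a given embedded target is exactly what the generated assembly would show.

```c
#include <stddef.h>

struct s {
    int x;
    int y;
};

/* Array indexing on every access. */
void update_indexed(struct s *items, size_t i)
{
    items[i].x += 1;
    items[i].y += items[i].x;
}

/* A pointer to the one object of interest. */
void update_pointer(struct s *item)
{
    item->x += 1;
    item->y += item->x;
}

/* Local temporaries, written back before returning. */
void update_locals(struct s *item)
{
    int x = item->x;
    int y = item->y;
    x += 1;
    y += x;
    item->x = x;
    item->y = y;
}
```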
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
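As a rough illustration, the sketch below shows a struct returned by value next to a hand-written version that takes the destination address explicitly. The struct layout and the name func_out are hypothetical. That a compiler lowers the first form to something like the second is an assumption about common calling conventions for larger structs, not something the C standard promises.

```c
struct mystruct {
    int values[8];
};

/* Returned by value, as in the text. */
struct mystruct func(void)
{
    struct mystruct result = {{0}};
    result.values[0] = 7;
    return result;
}

/* Hand-written equivalent: the caller supplies the storage (assumed lowering). */
void func_out(struct mystruct *out)
{
    for (int i = 0; i < 8; i++) {
        out->values[i] = 0;
    }
    out->values[0] = 7;
}
```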
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.
Generally, the case in which you want to make a copy of a passed struct in C is when you want to manipulate the data without modifying the original. That is to say, your changes are not reflected in the struct itself but only in the return value. As for which is more expensive, it depends on a lot of things, many of which change from implementation to implementation, so I would need more specific information to be more helpful. I would expect, though, that in an embedded environment your memory is at a greater premium than your processing power. Really this reads like needless micro optimization; your compiler should handle it.
In this case, creating a temp variable on the stack will be faster. But if your structure is much bigger, then you might be better off with dereferencing.