Array of pointers - need bigger - c

How can I create an array of pointers that can store more than 1,047,141 pointers? I calculated this number using the following code:
int main(int argc, char const *argv[]) {
long a = 0;
while(1==1){
char * str[a];
printf("%ld is good.\n", a);
a++;
//Loop ends on Segmentation fault
}
return 0;
}
I am using the array of pointers to store strings. What are the alternatives?
Edit
The code above is just a way of finding the max size of an array of pointers.
One pointer holds one string, so the max number of strings I can store is 1,047,141. I need a way of storing more than 1,047,141 strings.

Allocate the array dynamically via malloc().
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[]) {
long a = 0;
while(1==1){
char ** str = malloc(sizeof(char*) * a);
if (str != NULL){
printf("%ld is good.\n", a);
free(str);
} else {
break;
}
a++;
}
return 0;
}

You have to allocate the arrays on the heap with malloc. This code allocates an array of how_many_strings pointers and, for each pointer, a buffer of str_length bytes (which must include room for the terminating '\0').
char** str = malloc(sizeof(char*)*how_many_strings);
for(int i = 0; i < how_many_strings; i++)
{
str[i] = malloc(sizeof(char)*str_length);
}
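The total size is limited by the memory available to your process (RAM plus swap), not by the stack size. Check every malloc result for NULL.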
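A slightly fuller sketch of the same idea, with the error handling the snippet above leaves out. The helper names alloc_strings and free_strings are made up for illustration; they are not from the original answer.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates `count` buffers of `len` bytes each.
   Returns NULL (after releasing everything) if any allocation fails. */
char **alloc_strings(size_t count, size_t len)
{
    if (count == 0 || len == 0)
        return NULL;

    char **strs = calloc(count, sizeof *strs);  /* all slots start as NULL */
    if (strs == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        strs[i] = malloc(len);
        if (strs[i] == NULL) {
            /* Undo the partial allocation before reporting failure. */
            while (i > 0)
                free(strs[--i]);
            free(strs);
            return NULL;
        }
        strs[i][0] = '\0';  /* each buffer starts as an empty string */
    }
    return strs;
}

/* Hypothetical counterpart: releases everything alloc_strings returned. */
void free_strings(char **strs, size_t count)
{
    if (strs == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(strs[i]);
    free(strs);
}
```

Calling alloc_strings(1100000, 8), for example, asks for more pointers than the 1,047,141 that fit on the stack in the question; whether it succeeds depends only on how much memory the process can get.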

The OP code has undefined behavior. The array isn't used, so if you use -O2 (gcc), you are just printing a as it increments. Gcc generates:
.L2:
movq %rbx, %rdx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
addq $1, %rbx
call __printf_chk
jmp .L2
It won't segfault, but the output will be quite boring.
However, with -O0, gcc generates a much longer loop (that I don't want to paste) that creates larger and larger str buffers on the stack. At some point when running this you will run out of stack space, which can cause a segfault.

Related

How to change the local variable without its reference

Interview question : Change the local variable value without using a reference as a function argument or returning a value from the function
void func()
{
/*do some code to change the value of x*/
}
int main()
{
int x = 100;
printf("%d\n", x); // it will print 100
func(); // not return any value and reference of x also not sent
printf("%d\n", x); // it need to print 200
}
The value of x needs to be changed.
The answer is that you can’t.
The C programming language offers no way of doing this, and attempting to do so invariably causes undefined behaviour. This means that there are no guarantees about what the result will be.
Now, you might be tempted to exploit undefined behaviour to subvert C’s runtime system and change the value. However, whether and how this works entirely depends on the specific executing environment. For example, when compiling the code with a recent version of GCC and clang, and enabling optimisation, the variable x simply ceases to exist in the output code: There is no memory location corresponding to its name, so you can’t even directly modify a raw memory address.
In fact, the above code yields roughly the following assembly output:
main:
subq $8, %rsp
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
call func
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
ret
As you can see, the value 100 is a literal directly stored in the ESI register before the printf call. Even if your func attempted to modify that register, the modification would then be overwritten by the compiled printf call:
…
movl $200, %esi /* This is the inlined `func` call! */
movl $100, %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
…
However you dice it, the answer is: There is no x variable in the compiled output, so you cannot modify it, even accepting undefined behaviour. You could modify the output by overriding the printf function call, but that wasn’t the question.
By the design of the C language, and by the definition of a local variable, you cannot access it from outside without making it available in some way.
Some ways to make a local variable accessible to the outside world:
send a copy of it (the value);
send a pointer to it (don't save and use the pointer for too long, since the variable may be removed when its scope ends);
export it with extern if the variable is declared at file level (outside of all functions).
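To make the first two options concrete, here is a minimal sketch. The function names are invented for the example; the point is only that a copy leaves the caller's variable alone, while a pointer lets the callee change it.

```c
#include <stdio.h>

/* Receives a copy: the assignment never reaches the caller's variable. */
void set_copy(int x)
{
    x = 200;
    (void)x;  /* silence "set but not used" warnings */
}

/* Receives the address: the caller's variable is modified through it. */
void set_through_pointer(int *px)
{
    *px = 200;
}

/* Hypothetical driver showing both calls on the same local variable. */
int demo(void)
{
    int x = 100;

    set_copy(x);
    printf("%d\n", x);        /* x is unchanged here */

    set_through_pointer(&x);
    printf("%d\n", x);        /* x was changed through the pointer */

    return x;
}
```

This is exactly what the interview question forbids, which is why the remaining answers resort to tricks.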
Hack
Only changing code in void func(), create a define.
Akin to @chqrlie.
void func()
{
/*do some code to change the value of x*/
#define func() { x = 200; }
}
int main()
{
int x = 100;
printf("%d\n", x); // it will print 100
func(); // not return any value and reference of x also not sent
printf("%d\n", x); // it need to print 200
}
Output
100
200
The answer is that you can’t, but...
I completely agree with what @virolino and @Konrad Rudolph said, and I don't want my "solution" to this problem to be taken as best practice, but since this is a challenge of sorts, one can come up with this approach.
#include <stdio.h>
static int x;
#define int
void func() {
x = 200;
}
int main() {
int x = 100;
printf("%d\n", x); // it prints 100
func(); // not return any value and reference of x also not sent
printf("%d\n", x); // it prints 200
}
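The define makes int expand to nothing. Thus int x = 100; in main() becomes x = 100;, an assignment to the file-scope static x rather than the definition of a local variable. This compiles with a warning, since the line int main() { is now only main() {. It only compiles because of the implicit int rule for functions without a declared return type, which C99 removed, so newer compilers may reject it outright.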
This approach is hacky and fragile, but that interviewer is asking for it. So here's an example for why C and C++ are such fun languages:
// Compiler would likely inline it anyway and that's necessary, because otherwise
// the return address would get pushed onto the stack as well.
inline
void func()
{
// volatile not required here as the compiler is told to work with the
// address (see lines below).
int tmp;
// With the line above we have pushed a new variable onto the stack.
// "volatile int x" from main() was pushed onto it beforehand,
// hence we can take the address of our tmp variable and
// decrement that pointer in order to point to the variable x from main().
*(&tmp - 1) = 200;
}
int main()
{
// Make sure that the variable doesn't get stored in a register by using volatile.
volatile int x = 100;
// It prints 100.
printf("%d\n", x);
func();
// It prints 200.
printf("%d\n", x);
return 0;
}
Boring answer: I would use a straightforward, global pointer variable:
int *global_x_pointer;
void func()
{
*global_x_pointer = 200;
}
int main()
{
int x = 100;
global_x_pointer = &x;
printf("%d\n", x);
func();
printf("%d\n", x);
}
I'm not sure what "sending reference" means. If setting a global pointer counts as sending a reference, then this answer obviously violates the stated problem's curious stipulations and isn't valid.
(On the subject of "curious stipulations", I've sometimes wished SO had another tag, something like driving-screws-with-a-hammer, because that's what these "brain teasers" always make me think of. Perfectly obvious question, perfectly obvious answer, but no, gotcha, you can't use that answer, you're stuck on a desert island and your C compiler's for statement got broken in the shipwreck, so you're supposed to be McGyver and use a coconut shell and a booger instead. Occasionally these questions can demonstrate good lateral thinking skills and are interesting, but most of the time, they're just dumb.)

Divide and store quotient and remainder in different arrays

The standard div() function returns a div_t struct, for example:
/* div example */
#include <stdio.h> /* printf */
#include <stdlib.h> /* div, div_t */
int main ()
{
div_t divresult;
divresult = div (38,5);
printf ("38 div 5 => %d, remainder %d.\n", divresult.quot, divresult.rem);
return 0;
}
My case is a bit different; I have this
#define NUM_ELTS 21433
int main ()
{
unsigned int quotients[NUM_ELTS];
unsigned int remainders[NUM_ELTS];
int i;
for(i=0;i<NUM_ELTS;i++) {
divide_single_instruction(&quotients[i],&remainders[i]);
}
}
I know that the assembly-language division instruction produces both results at once, so I need to do the same here to save CPU cycles, which basically means moving the quotient from EAX and the remainder from EDX into the memory locations where my arrays are stored. How can this be done without including asm {} or SSE intrinsics in my C code? It has to be portable.
Since you're writing to the arrays in-place (replacing numerator and denominator with quotient and remainder) you should store the results to temporary variables before writing to the arrays.
void foo (unsigned *num, unsigned *den, int n) {
int i;
for(i=0;i<n;i++) {
unsigned q = num[i]/den[i], r = num[i]%den[i];
num[i] = q, den[i] = r;
}
}
produces this main loop assembly
.L5:
movl (%rdi,%rcx,4), %eax
xorl %edx, %edx
divl (%rsi,%rcx,4)
movl %eax, (%rdi,%rcx,4)
movl %edx, (%rsi,%rcx,4)
addq $1, %rcx
cmpl %ecx, %r8d
jg .L5
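If you want separate output arrays as in the question, rather than overwriting the inputs, the same pattern applies. This is only a sketch: divide_all is a made-up name, and whether / and % get merged into a single division instruction depends on your compiler and optimization level, so check the disassembly.

```c
#include <stddef.h>

/* Hypothetical helper: writes num[i] / den[i] to quot[i] and
   num[i] % den[i] to rem[i]. The caller must ensure den[i] != 0. */
void divide_all(const unsigned *num, const unsigned *den,
                unsigned *quot, unsigned *rem, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Both results are computed next to each other from the same
           operands, which makes it easy for the compiler to reuse a
           single division on targets that deliver both at once. */
        unsigned q = num[i] / den[i];
        unsigned r = num[i] % den[i];
        quot[i] = q;
        rem[i] = r;
    }
}
```

Nothing here is x86-specific, so it stays portable; on targets without a combined divide the compiler simply emits whatever is cheapest there.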
There are some more complicated cases where it helps to save the quotient and remainder when they are first used. For example in testing for primes by trial division you often see a loop like this
for (p = 3; p <= n/p; p += 2)
if (!(n % p)) return 0;
It turns out that GCC does not use the remainder from the first division and therefore it does the division instruction twice which is unnecessary. To fix this you can save the remainder when the first division is done like this:
for (p = 3, q=n/p, r=n%p; p <= q; p += 2, q = n/p, r=n%p)
if (!r) return 0;
This speeds up the result by a factor of two.
So in general GCC does a good job particularly if you save the quotient and remainder when they are first calculated.
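For completeness, here is the trial-division loop above wrapped into a full function. The name is_prime and the handling of small and even n are my additions; the loop header is the one shown above.

```c
#include <stdbool.h>

/* Trial division that keeps the quotient and remainder of each
   division, so every candidate p needs only one n/p and one n%p
   computed from the same operands. */
bool is_prime(unsigned n)
{
    if (n < 2)
        return false;
    if (n < 4)
        return true;          /* 2 and 3 */
    if (n % 2 == 0)
        return false;

    for (unsigned p = 3, q = n / p, r = n % p; p <= q;
         p += 2, q = n / p, r = n % p) {
        if (r == 0)
            return false;     /* p divides n */
    }
    return true;
}
```

The condition p <= q is the same test as p <= n/p in the original loop, just using the saved quotient.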
The general rule here is to trust your compiler to do something fast. You can always disassemble the code and check that the compiler is doing something sane. It's important to realise that a good compiler knows a lot about the machine, often more than you or me.
Also let's assume you have a good reason for needing to "count cycles".
For your example code I agree that the x86 "idiv" instruction is the obvious choice. Let's see what my compiler (MS visual C 2013) will do if I just write out the most naive code I can
#include <stdio.h>
struct divresult {
int quot;
int rem;
};
struct divresult divrem(int num, int den)
{
return (struct divresult) { num / den, num % den };
}
int main()
{
struct divresult res = divrem(5, 2);
printf("%d, %d", res.quot, res.rem);
}
And the compiler gives us:
struct divresult res = divrem(5, 2);
printf("%d, %d", res.quot, res.rem);
01121000 push 1
01121002 push 2
01121004 push 1123018h
01121009 call dword ptr ds:[1122090h] ;;; this is printf()
Wow, I was outsmarted by the compiler. Visual C knows how division works so it just precalculated the result and inserted constants. It didn't even bother to include my function in the final code. We have to read in the integers from console to force it to actually do the calculation:
int main()
{
int num, den;
scanf("%d, %d", &num, &den);
struct divresult res = divrem(num, den);
printf("%d, %d", res.quot, res.rem);
}
Now we get:
struct divresult res = divrem(num, den);
01071023 mov eax,dword ptr [num]
01071026 cdq
01071027 idiv eax,dword ptr [den]
printf("%d, %d", res.quot, res.rem);
0107102A push edx
0107102B push eax
0107102C push 1073020h
01071031 call dword ptr ds:[1072090h] ;;; printf()
So you see, the compiler (or this compiler at least) already does what you want, or something even more clever.
From this we learn to trust the compiler and only second-guess it when we know it isn't doing a good enough job already.

for(;...) or while(...) flow control? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Which one, 1 or 2, is better in any way (whatever can be considered better)? Are they exactly the same?
void method1(char **var1) {
//the last element of var1 is NULL
char **var2 = var1;
int count = 0;
//1
for (; *var2; (*var2)++, count++);
//2
while(*var2) {
(*var2)++;
count++;
}
}
You could examine the asm output at different optimization levels with your compiler, or just not worry about code that is semantically the same.
...
LBB0_1: ## =>This Inner Loop Header: Depth=1
movq -16(%rbp), %rax
cmpq $0, (%rax)
je LBB0_4
## BB#2: ## in Loop: Header=BB0_1 Depth=1
jmp LBB0_3
LBB0_3: ## in Loop: Header=BB0_1 Depth=1
movq -16(%rbp), %rax
movq (%rax), %rcx
addq $1, %rcx
movq %rcx, (%rax)
movl -20(%rbp), %edx
addl $1, %edx
movl %edx, -20(%rbp)
jmp LBB0_1
LBB0_4:
...
.subsections_via_symbols
method2:
...
LBB0_1: ## =>This Inner Loop Header: Depth=1
movq -16(%rbp), %rax
cmpq $0, (%rax)
je LBB0_3
## BB#2: ## in Loop: Header=BB0_1 Depth=1
movq -16(%rbp), %rax
movq (%rax), %rcx
addq $1, %rcx
movq %rcx, (%rax)
movl -20(%rbp), %edx
addl $1, %edx
movl %edx, -20(%rbp)
jmp LBB0_1
LBB0_3:
...
.subsections_via_symbols
Purpose of the code in question
Your code seems to be entirely wrong as it increments the target of var2 pointer, which also serves for ending the loop. You cannot expect an incrementing value to reach zero. I will assume that (1) you wanted to increment the temporary pointer to iterate over a list (technically an array) of character strings and (2) that you expect a NULL pointer as a sentinel.
Detailed explanation of the pointer incrementation issue
So what is the logic of the code we are writing? It takes an array of strings (lines in a file, list of names, etc...), counts the items, and then does whatever else you need to do. The input argument is represented by a pointer to pointer to char, which can be a bit confusing for the beginner. Pointers are used for multiple purposes in C and one is to point to the first item of a list (technically array). This is the case of the list pointer (type char **) which points to an array of pointers (type char * each) which in turn point to an array of byte/character values (type char each).
Therefore you need to increment a local char ** pointer to iterate over the items and a temporary char * pointer to iterate over characters of an item. If you just want to read data, you must never increment anything else than local (temporary) variables. Incrementing *item is nonsense and would alter the data in a bad way (the pointer would point to the second character instead of the first one), and checking the incremented pointer for being NULL is a double nonsense.
In other words, the idiom of iterating over an array using a temporary pointer requires the following actions:
Increment the temporary pointer (and nothing else) at each step.
Check the target of the pointer (and not the address it points to) for the sentinel value.
Corrected code examples
Using C99 syntax, you probably wanted to do something like:
void method1(char **list) {
size_t count = 0;
for (char **item = list; *item; item++)
count++;
...
}
The older syntax is forcing you to do:
void method1(char **list) {
char **item;
size_t count = 0;
for (item = list; *item; item++)
count++;
...
}
A more intuitive version for people not fluent in pointers:
void method1(char **list) {
size_t count = 0;
for (size_t i = 0; list[i]; i++)
count++;
...
}
Note: The count is redundant as its value is kept the same as the value of i, so you could just do for (; list[count]; count++) with an empty body or while (list[count]) count++;.
A real function to just count the items would be:
size_t get_size(char **list)
{
size_t count = 0;
for (char **item = list; *item; item++)
count++;
return count;
}
Of course it could be simplified to (borrowing from other answer):
size_t get_size(char **list)
{
size_t count = 0;
for (; *list; list++)
count++;
return count;
}
Thanks to very specific circumstances where (1) it's easy to merge the condition and the increment and (2) you're not using the current item in the body, it can be turned to:
size_t get_size(char **list)
{
size_t count = 0;
while (*list++)
count++;
return count;
}
Attempt to answer the for versus while dilemma
While technically the while and for loops are equivalent, the for loop expresses the iteration idiom way better, as it keeps the iteration logic separate from the rest of the code and thus also makes it more reusable, i.e. you can use the same for header with a different body for any other iterative action on the list.
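A small sketch of that point, with function names invented for the example: the first two functions count the items with a for and a while loop respectively, and the third reuses the very same for header with a different body.

```c
#include <stddef.h>
#include <string.h>

/* Counts the items of a NULL-terminated list with a for loop. */
size_t count_for(char **list)
{
    size_t count = 0;
    for (char **item = list; *item; item++)
        count++;
    return count;
}

/* Same result with a while loop; the iteration logic is now spread
   over three places (initialization, condition, increment). */
size_t count_while(char **list)
{
    size_t count = 0;
    char **item = list;
    while (*item) {
        count++;
        item++;
    }
    return count;
}

/* The for header from count_for reused unchanged with another body:
   sums the lengths of all strings in the list. */
size_t total_length(char **list)
{
    size_t total = 0;
    for (char **item = list; *item; item++)
        total += strlen(*item);
    return total;
}
```

Given a list such as { "ab", "cde", "f", NULL }, both counting functions return the same value, which is all the "equivalence" amounts to.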
Bad usage of the for loop in the original code
There are a number of things that should be considered discouraged:
1) Don't modify the object from the for loop header.
for (... ; ...; (*item)++)
...
Any code matching the above pattern modifies the target object instead of performing the looping logic, whenever item is a temporary pointer to the actual data.
2) Don't decouple any non-looping code from the for loop header.
char **item = list;
...
for (; *item; *item++)
count++;
The assignment before the for loop seems out of place. If you copy-pasted the header of the for loop to iterate again over all list items, the list would seem empty because of the omitted initialization.
3) Don't perform any per-item actions in the increment of the for loop header.
for (char **item = list; *item; item++, count++)
;
The count++ here doesn't help the looping at all, instead it performs an actual action (counting one item). If you copy-pasted the header of the for loop and added an actual body, the count would get modified.
4) Don't use non-descriptive for arguments, use simple names for temporary variables.
for (char **var2 = var1; *var2; var2++)
count++;
The two variables differ in their purpose, yet their names are almost the same, only distinguished by a number. How exactly you name them is a matter of context and preference.
Note: Some people also prefer explicit comparison to NULL instead of relying on boolean evaluation of pointers. I'm not one of them, though. Stack Exchange seems to highlight list as a keyword but I don't think there's such a keyword in C or C++.
I would prefer the for loop, if you initialize var2 as the first argument of the for loop, i.e.
for(char **var2 = var1; *var2; var2++)
because then all conditions (initial, terminal, increment) are located in one place
I would also prefer to make the test explicit, i.e.,
for(char **var2 = var1; *var2 != NULL; var2++)
because it makes the terminal condition more visible.
Next: I would not place count++ in the for loop, because if count is not modified inside the loop it is redundant and can be calculated from var2 - var1. If count is modified inside the loop, it should be done in a single spot.
But I assume this is a matter of taste only.
Both are probably the same; the compiler should not treat them differently.
First of all, both loops are wrong. They make no sense. I think you mean the following
int count = 0;
while ( *var1++ ) ++count;
It is the loop I would use.
Or, if you do not want var1 to be changed, then
int count = 0;
for ( char **p = var1; *p; ++p ) ++count;
Also you could write
char **p = var1;
while ( *p ) ++p;
int count = p - var1;
You had better make the loop condition stronger and more explicit to avoid bugs and infinite loops. Which one is better depends on your logic and code: a "for" loop is quicker and easier to write, but if the loop needs more logic, use a "while" loop.

What is causing SIGSEGV?

/*
Learning from all the posts, please correct me if I am wrong.
Now it makes sense: if I remember right, the stack is a fixed-size memory segment allocated at program startup, while heap memory can be sized and resized programmatically using malloc, realloc and free.
The struct pointer array
long size = 10000000;
struct foo *bar[size];
should have been allocated from the heap using malloc(), instead of on the fixed-size stack.
*/
This one gets a SIGSEGV:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
struct foo {
int x;
char s[5];
};
long size = 10000000;
struct foo *bar[size];
long i = 0;
while (i < size) {
printf("%ld \n", i);
i++;
}
}
This one works - commenting out the struct foo pointer array:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
struct foo {
int x;
char s[5];
};
long size = 10000000;
//struct foo *bar[size];
long i = 0;
while (i < size) {
printf("%ld \n", i);
i++;
}
}
This one works - commenting out the printf call inside the while loop:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
struct foo {
int x;
char s[5];
};
long size = 10000000;
struct foo *bar[size];
long i = 0;
while (i < size) {
//printf("%ld \n", i);
i++;
}
}
/* What I am really trying to achieve is this, which gets a SIGSEGV.
OK, thanks for all your replies, I really appreciate it.
I will look into stack overflow and explore using heap memory. Thanks, guys.
*/
int main(void) {
struct foo {
int x;
char s[5];
};
long size = 10000000;
struct foo *bar[size];
long i = 0;
while (i < size) {
bar[i] = (struct foo *) malloc(sizeof(struct foo));
free(bar[i]);
i++;
}
return EXIT_SUCCESS;
}
long size = 10000000;
struct foo *bar[size];
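will create a very big array on the stack: 10,000,000 pointers need 40 MB with 4-byte pointers (80 MB with 8-byte pointers), while the default stack is typically only a few megabytes. This causes a stack overflow, and your program receives SIGSEGV.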
You should create this array dynamically:
struct foo **bar = malloc(size * sizeof(struct foo *));
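Put together, the program from the question could look roughly like this once the array lives on the heap. It is a sketch under assumptions: exercise_foo_array is a made-up name, and I moved struct foo to file scope so the function can use it.

```c
#include <stdlib.h>

struct foo {
    int x;
    char s[5];
};

/* Hypothetical helper: allocates an array of `size` pointers on the
   heap, then allocates and frees one struct foo per slot, as the loop
   in the question does. Returns 0 on success, -1 on allocation failure. */
int exercise_foo_array(size_t size)
{
    if (size == 0)
        return 0;

    struct foo **bar = malloc(size * sizeof *bar);
    if (bar == NULL)
        return -1;

    int status = 0;
    for (size_t i = 0; i < size; i++) {
        bar[i] = malloc(sizeof *bar[i]);
        if (bar[i] == NULL) {
            status = -1;
            break;
        }
        bar[i]->x = 0;   /* touch the object so the allocation is used */
        free(bar[i]);
    }

    free(bar);           /* the pointer array itself must be freed too */
    return status;
}
```

With size set to 10000000 as in the question, the pointer array is a single heap block of 40 or 80 MB and main()'s stack frame stays small.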
Why does the program work normally if there is no function call in main()?
The definition of bar gives main() a very large stack frame at run time. If you do not call any function in main(), this large frame is never actually accessed (the entry code of main() only reserves that amount of memory by adjusting the stack pointer). If you do call a function in main(), the call itself has to write to addresses in main()'s stack frame; because the stack has overflowed, those addresses may not be valid, and the access causes SIGSEGV to be sent.
If you disassemble and compare the working and non-working versions of this program, this becomes obvious. You could also find it by stepping through the instructions of the non-working main() one by one.
Without function call in main():
0x00001ff0 <main+0>: push %ebp
0x00001ff1 <main+1>: mov %esp,%eax
0x00001ff3 <main+3>: mov %esp,%ebp
0x00001ff5 <main+5>: sub $0x2625a10,%esp
0x00001ffb <main+11>: mov %eax,%esp
0x00001ffd <main+13>: leave
0x00001ffe <main+14>: ret
Call exit() in main():
0x00001fe0 <main+0>: push %ebp
0x00001fe1 <main+1>: mov %esp,%ebp
0x00001fe3 <main+3>: sub $0x2625a28,%esp
0x00001fe9 <main+9>: movl $0x0,(%esp) <==== This causes segfault.
0x00001ff0 <main+16>: call 0x3000 <dyld_stub_exit>
Stack overflow is causing the SIGSEGV. There is no need for a while loop; a single printf will trigger the stack overflow.
Local variables are created on the stack. The array bar uses a huge amount of stack space. The stack is also used to store return addresses for function calls, so the two together cause a stack overflow: bar uses up almost all of the stack, and calling printf overflows it.
You should allocate on the heap using malloc.
Stack size is the problem here, as others have pointed out. Check out C/C++ maximum stack size of program for more details.
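If you want to see the actual limit on your system, POSIX offers getrlimit() with RLIMIT_STACK. The wrapper below is only a sketch with an invented name, and it assumes a POSIX system (Linux, macOS); it will not build on plain Windows.

```c
#include <sys/resource.h>

/* POSIX-only sketch. Stores the soft stack limit in *bytes and sets
   *unlimited to 1 if no limit is configured. Returns 0 on success,
   -1 if getrlimit() fails. */
int stack_soft_limit(unsigned long long *bytes, int *unlimited)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = (unsigned long long)rl.rlim_cur;
    return 0;
}
```

Comparing the reported value with size * sizeof(struct foo *) shows right away whether an array like bar can fit on the stack.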

C Buffer Overflow - Why is there a constant number of bytes that trips a segfault? (Mac OS 10.8 64-bit, clang)

I was experimenting with buffer overflow in C, and found an interesting quirk:
For any given array size, there seems to be a set number of overflow bytes that can be written to memory before a SIGABRT crash. For example, in the code below the 10 byte array can be overflowed to 26 bytes before crashing at 27. Similarly, an array of 20 chars can be overflowed to 40 chars before it aborts on the 41st.
Can anyone explain why this is? Also, is the SIGABRT the same as (or caused by) a "segmentation fault"?
Mac OS 10.8 - Xcode 4.6, clang and lldb. Thanks!
#include <stdio.h>
int main(int argc, const char * argv[])
{
char aString[ 10 ];
char aLetter = 'a';
printf("The size of one array slot sizeof( aString[0] ) is %zu\n", sizeof(aString[0]));
printf("The size of one letter sizeof( aLetter ) is %zu\n", sizeof(aLetter));
// Overflow the aString array of chars
// lldb claims aString is initialized with values \0 or NULL at all locations
// Substitute i<27 and this code will crash regularly
for (int i=0; i<26; i++) {
aString[i]= aLetter;
}
return 0;
}
EDIT - I've stepped through it in disassembly and found this protection just after the for-loop:
0x100000f27: movq 226(%rip), %rax ; (void *)0x00007fff793f24b0: __stack_chk_guard
0x100000f2e: movq (%rax), %rax
0x100000f31: movq -8(%rbp), %rcx
0x100000f35: cmpq %rcx, %rax
0x100000f38: jne 0x100000f49 ; main + 121 at main.c:26
.
.
.
0x100000f49: callq 0x100000f4e ; symbol stub for: __stack_chk_fail
That is due to the alignment of the stack on Mac OS.
This is not news; a quick search turns up the answer:
Why does the Mac ABI require 16-byte stack alignment for x86-32?
As for the signal: SIGABRT is not a segmentation fault. It is raised by __stack_chk_fail, which the check in your disassembly calls when the guard value at -8(%rbp) has been overwritten.
It is nice to see that you can actually write to the stack with no side effect in chunks smaller than 16 bytes.
If you exploit this several times, you can get into a state where all of your malicious code has been laid down and you can execute it by jumping around the stack.
