C unknown-size structure - c

I am writing an app with the following dynamic config structure:
typedef struct {
char apphash[41];
char filenames_count;
char * filename[64];
} config;
But this code is wrong, I can't figure out how to copy data from and to c->filename[0] properly; c is a pointer to config structure, allocated dynamically like
config * c = (config *) malloc( 42 + 64 * 2 ) // alloc for 2 filenames. can realloc() later.
It segfaults if I use something like strcpy(c->filename[0],"file1.txt").
Can someone please help me with this?
Currently, I'm using direct address calculation, like
strcpy(
(char*)
((unsigned long) c + 42 /* apphash + filenames_count */ +
64 * 0 /* first item */ ),
"file1.txt"
);
and it works of course.
You see, I'm more of assembly programmer than of C, but I'd like this code to be more human-readable. This code looks that bad because I'm newcomer in C.
Oh, I gave a bad description of the situation. Sorry for that :(
The real code looks like:
config * c = (config*) malloc( 42 + 64 * 2 );
// we may realloc() it later if we are going to add more filenames.
// failing example how I do copy one default filename
strcpy(c->filename[0],"file1.txt");
// working example (i386)
strcpy((char*)((unsigned long) c + 42 + 64 * 0),"file1.txt");
I am using fully static structure type because it is going to be loaded directly from a file next time. That's why I can't really use pointers inside the structure, I need real data to be placed there.
I do check all lengths, no BOFs in real code, I just omitted all that stuff here.
I still didn't find a good solution to this.
Thanks again and sorry for bad question information.

I assume you have many filenames since you have filenames_count. Try
config_obj.filename[0] = strdup("file1.txt")

In your struct you're allocating an array of pointers to chars, not an array of chars. You must also explicitly allocate the targets of those pointers, or at the very least make the struct contain the arrays of chars themselves:
char filename[64][MAX_PATH+1];
Replace MAX_PATH with the maximum length of any filename. Mind that this is not a very elegant solution, albeit a really simple one, because you're wasting lots of space.
Your direct address calculation is doing something different: it's placing the string directly in the space allocated for the pointers (and this is a Terribly Wrong Thing To Do™)
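If the intended layout really is a 42-byte header followed by 64-byte name slots, as the question's address arithmetic suggests, one way to express it without manual offsets is a C99 flexible array member. This is only a sketch under that assumption; flat_config, flat_config_new, flat_config_set and NAME_LEN are hypothetical names, not something from the question.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64  /* assumed fixed slot size, one slot per filename */

typedef struct {
    char apphash[41];
    unsigned char filenames_count;
    char filename[][NAME_LEN];  /* C99 flexible array member: slots follow the header */
} flat_config;

/* Allocate a zeroed config with room for `count` filename slots. */
flat_config *flat_config_new(unsigned char count)
{
    flat_config *c = calloc(1, sizeof *c + (size_t)count * NAME_LEN);
    if (c != NULL)
        c->filenames_count = count;
    return c;
}

/* Copy `name` into slot `i`; returns 0 on success, -1 if it does not fit. */
int flat_config_set(flat_config *c, unsigned char i, const char *name)
{
    if (i >= c->filenames_count || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(c->filename[i], name);
    return 0;
}
```

With this layout c->filename[0] is real storage inside the allocated block, so the whole block can be written to and read from a file in one piece, and growing it later is a realloc with a larger slot count.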

Right now config is a type that means the struct you have defined. You don't show us an identifier referring to an actual variable of type config.
So, first thing we need an instance of type config. You will either do
config c;
... c.filename ...
note the structure access operator is a ., or you will do something like
config *p = malloc(sizeof *p);
/* error checking */
...c->filename ...
where -> is the pointer-dereference-and-access operator. The first form is preferred unless you have a reason to want dynamic allocation (which, alas, happens a lot in C).
Then you have to figure out just what you want filename to be. As it is you have allocated space for 64 character pointers which don't point at allocated memory (except by purest accident, and then not at the memory you mean). You probably wanted{*} char filename[64] (a single filename allowed to be up to 63 characters long, to leave room for the null termination), in which case you would use
strcpy(c.filename,"file1.txt");
/* or */
strcpy(p->filename,"file1.txt");
depending on how you allocated the structure in the first place.
If you really wanted a list of filenames, then you may want char *filenames[64], but you will have to allocate a buffer for each name before you can use it
c.filenames[0] = malloc(sizeOfString);
/* error checking */
strcpy(c.filenames[0],...
or as another poster suggested
c.filenames[0] = strdup(...
The first form may be better if you are building your filenames from multiple pieces and can project the total length from the get go.
{*} Later you may want to scrap this fixed length buffer, but leave that for now.
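For the list-of-filenames case, the malloc-then-strcpy step can be wrapped up roughly as follows. This is a sketch that reuses the struct from the question; config_add_filename and config_clear are hypothetical helper names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char apphash[41];
    char filenames_count;
    char *filename[64];
} config;

/* Hypothetical helper: store a heap copy of `name` in the next free slot.
   Returns 0 on success, -1 if the table is full or malloc fails. */
int config_add_filename(config *c, const char *name)
{
    if (c->filenames_count >= 64)
        return -1;
    char *copy = malloc(strlen(name) + 1);  /* +1 for the terminating '\0' */
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    c->filename[(int)c->filenames_count] = copy;
    c->filenames_count++;
    return 0;
}

/* Release every stored name; the struct itself is left to the caller. */
void config_clear(config *c)
{
    for (int i = 0; i < c->filenames_count; i++)
        free(c->filename[i]);
    c->filenames_count = 0;
}
```

Each pointer now refers to memory the program owns, so strcpy has a valid destination. Note that a struct holding pointers like this cannot be dumped to a file as raw bytes, because the saved addresses would be meaningless on reload.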

It is currently failing because you aren't allocating memory for the filenames. Either use strdup or malloc+strcpy (I'd use strdup).
Your filename field is an array of pointers to zero terminated string. You need to allocate the memory for the string and copy the string to that memory. You save the address of the new string in one of the pointers, e.g. filename[0].
The direct memory address code doesn't work. It just doesn't crash, yet! That code just overwrites the array of pointers. Don't ever write code like that. Never ever ever!! Writing code like that is morally equivalent to eating baby unicorns.

Related

Pointer Requires More Memory Allocation Than It Theoretically Should

Pretty new to C, but I thought I had the hang of allocating and managing memory until I ran into this issue recently.
I am working on a "make" utility. (It's not homework, just my friend's old assignment that I thought I could glean valuable practice from.) As I'm sure most of you know, makefiles have various targets, and these targets have dependencies that must be attended to before the targets' commands can be executed.
In order to store data for a given target's dependencies found while parsing the makefile I made the following:
typedef struct{
char* target;
char** dependency_list;
}dependency_tracker;
In order to keep track of multiple dependency_trackers, I declared (and subsequently allocated for) the following variable. (NOTICE the "+4" after "total_number_of_targets". THE PROGRAM DOESN'T WORK WITHOUT IT, AND MY QUESTION IS WHY THAT IS.)
dependency_tracker** d_tracker_ptr = (dependency_tracker**) malloc((total_number_of_targets+4)*sizeof(dependency_tracker*));
I then sent the pointer for this to the parsing method with the following line:
parse_file(filename,&d_tracker_ptr);
Within the parse_file function, I believe these are the most important calls I make (left out string parsing calls). Note that target_counter is the number of targets parsed so far. I think everything else should be somewhat manageable to figure out:
dependency_tracker** tracker_ptr = *tracker_ptr_address; // tracker_ptr_address is the pointer I passed to the function above
// declare and allocate for the new struct we are creating
dependency_tracker* new_tracker_ptr = (dependency_tracker*) malloc(sizeof(dependency_tracker));
char* new_tracker_ptr_target = (char*) malloc((size_of_target)*sizeof(char)); // size_of_target is the string length
new_tracker_ptr->target = new_tracker_ptr_target;
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = new_tracker_ptr;
As I mentioned earlier, I have to allocate space for four more (dependency_tracker*)'s than I would have thought I needed to in order for this program to complete without a segfault.
I came to the conclusion that this was because I was overwriting the space I had allocated for the pointer I pass to parse_file.
My question is: why does this happen? Even if space for a NULL pointer is needed, that shouldn't require the space of 4 additional pointers. And the program produces a segfault if I allocate anything less than 25 additional bytes in the original call to malloc
Let me know if anything needs clarification. I know this is a bit of a novel.
This is broken:
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = new_tracker_ptr;
The pointer size is accounted for by C. You want:
tracker_ptr[target_counter] = new_tracker_ptr;
Also as I mentioned in comments, you did not allow for a null terminator in the strings.
Another comment: C does not require a cast on malloc, and using one invites trouble. Also it's safer to just dereference the pointer you're assigning to inform sizeof. So just say:
dependency_tracker *new_tracker_ptr = malloc(sizeof *new_tracker_ptr);
char *new_tracker_ptr_target = malloc((size_of_target + 1) * sizeof *new_tracker_ptr_target); /* +1 for the null terminator */
new_tracker_ptr->target = new_tracker_ptr_target;
Additionally, you may want to reconsider the vacuous words in your variable names. I'm actually a big fan of longish, explanatory identifiers, but "tracker" and "target" are so vague that they add little clarity. Similarly, embedding type information in variable names a la _ptr was a fad about 30 years ago. It's over now. If you have a function where the declaration and a variable name can't be grok'ed on the same screen, the function is too big.
*(tracker_ptr+target_counter*sizeof(dependency_tracker*)) = ...
This is the problem. Pointer arithmetic doesn't work like that. You do not have to multiply by sizeof(anything) when using properly typed (i.e. not char*) pointer arithmetic. What's better, you don't have to use pointer arithmetic at all.
tracker_ptr[target_counter] = ...
is all that's needed.
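To see the scaling the compiler already does, here is a small sketch; bytes_advanced is a hypothetical helper written only for illustration. It measures how many bytes base + n actually moves.

```c
#include <stddef.h>

typedef struct {
    char *target;
    char **dependency_list;
} dependency_tracker;

/* Bytes between `base` and `base + n`: the compiler scales n by the
   element size, so no manual multiplication by sizeof is needed. */
size_t bytes_advanced(dependency_tracker **base, size_t n)
{
    return (size_t)((char *)(base + n) - (char *)base);
}
```

For an array of dependency_tracker pointers, bytes_advanced(array, 3) equals 3 * sizeof(dependency_tracker *). Multiplying the index by sizeof a second time, as in the question, therefore writes at element target_counter * sizeof(dependency_tracker *), which quickly runs past the end of the allocated block.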

How to allocate memory to store 5 names without wasting even 1 byte

I want to store 5 names without wasting even 1 byte, so how can I allocate the memory using malloc?
That's for all practical purposes impossible, malloc will more often than not return blocks of memory bigger than requested.
#include <stdio.h>
#include<stdlib.h>
int main()
{
int n,i,c;
char *p[5];/*declare a pointer to 5 strings for the 5 names*/
for(i=0;i<5;i++)
{
n=0;
printf("please enter the name\n" );/*input name from the user*/
while((c=getchar())!='\n')
n++;/*count the total number of characters in the name*/
p[i]= (char *)malloc(sizeof(char)*n);/*allocate the required amount of memory for a name*/
scanf("%s",p[i]);
}
return 0;
}
If you know the cumulative length of the five names, let's call it length_names, you could do a
void *pNameBlock = malloc(length_names + 5);
Then you could store the names, null terminated (the +5 is for the null termination), one right after the other in the memory pointed to by pNameBlock.
char *pName1 = (char *) pNameBlock;
Store the name data starting at pName1, perhaps via a working pointer:
char *p = pName1;
You can then write byte by byte (the following is pseudo-code-ish).
*p++ = byte1;
*p++ = byte2;
etc.
End with a null termination:
*p++ = '\0';
Now set
char *pName2 = p;
and write the second name using p, as above.
Doing things this way will still waste some memory. Malloc will internally get itself more memory than you are asking for, but it will waste that memory only once, on this one operation, getting this one block, with no overhead beyond this once.
Be very careful, though, because under this way of doing things, you can't free() the char *s, such as pName1, for the names. You can only free that one pointer you got that one time, pNameBlock.
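The single-block idea can be sketched like this, assuming the names are already available as C strings. pack_names is a hypothetical function name, and memcpy stands in for the byte-by-byte writes described above.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `count` names back to back into one malloc'ed block.
   out[i] receives a pointer to the i-th copy inside that block.
   Returns the block (free it once, never free out[i]) or NULL on failure. */
char *pack_names(const char *const names[], size_t count, char *out[])
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(names[i]) + 1;   /* +1 per name for its '\0' */

    char *block = malloc(total ? total : 1);
    if (block == NULL)
        return NULL;

    char *p = block;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;
        memcpy(p, names[i], len);        /* copies the terminator too */
        out[i] = p;
        p += len;
    }
    return block;
}
```

The caller keeps the returned block pointer for the one eventual free() and uses out[0] through out[count - 1] as ordinary read-only strings.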
If you are asking this question out of interest, ok. But if you are this memory constrained, you're going to have a very very hard time. malloc does waste some memory, but not a lot. You're going to have a hard time working with C this constrained. You'd almost have to write your own super light weight memory manager (do you really want to do that?). Otherwise, you'd be better off working in assembly, if you can't afford to waste even a byte.
I have a hard time imagining what kind of super-cramped embedded system imposes this kind of limit on memory usage.
If you don't want to waste any byte to store names, you should dynamically allocate a double array (char) in C.
A double array in C can be implemented as a pointer to a list of pointers.
char **name; // Allocate space for a pointer, pointing to a pointer (the beginning of an array in C)
name = (char **) malloc (sizeof(char *) * 5); // Allocate space for the pointer array, for 5 names
name[0] = (char *) malloc (sizeof(char) * lengthOfName1); // Allocate space for the first name, same for other names
name[1] = (char *) malloc (sizeof(char) * lengthOfName2);
....
Now you can save the name to its corresponding position in the array without allocating more space, even though names might have different lengths.
You can use the double-pointer approach and write each name character by character, advancing the pointer as you go; that way all 5 names are stored with no unused space between them.
In practice, though, a programmer should not need anything that tedious: use an array of pointers to store the names and allocate the memory for each one in turn.
That covers storing a handful of names; if you are dealing with a large amount of data, consider a linked list instead.
When you malloc a block, it actually allocates a bit more memory than you asked for. This extra memory is used to store information such as the size of the allocated block.
Encode the names in binary and store them in a byte array.
What is "memory waste"? If you can define it clearly, then a solution can be found.
For example, the null in a null terminated string might be considered "wasted memory" because the null isn't printed; however, another person might not consider it memory waste because without it, you need to store a second item (string length).
When I use a byte, the byte is fully used. Only if you can show me how it might be done without that byte will I consider your claims of memory waste valid. I use the nulls at the ends of my strings. If I declare an array of strings, I use the array too. Make what you need, and then if you find that you can rearrange those items to use less memory, decide that the other way wasted some memory. Until then, you're chasing a dream which you haven't finished.
If these five "names" are assembly jump points, you don't need a full string's worth of memory to hold them. If the five "names" are block scoped variables, perhaps they won't need any more memory than the registers already provide. If they are strings, then perhaps you can combine and overlay strings; but, until you come up with a solution, and a second solution to compare the first against, you don't have a case for wasted / saved memory.

Working with Pointers and Strcpy in C

I'm fairly new to the concept of pointers in C. Let's say I have two variables:
char *arch_file_name;
char *tmp_arch_file_name;
Now, I want to copy the value of arch_file_name to tmp_arch_file_name and add the word "tmp" to the end of it. I'm looking at them as strings, so I have:
strcpy(&tmp_arch_file_name, &arch_file_name);
strcat(tmp_arch_file_name, "tmp");
However, when strcat() is called, both of the variables change and are the same. I want one of them to change and the other to stay intact. I have to use pointers because I use the names later for the fopen(), rename() and delete() functions. How can I achieve this?
What you want is:
strcpy(tmp_arch_file_name, arch_file_name);
strcat(tmp_arch_file_name, "tmp");
You are just copying the pointers (and other random bits until you hit a 0 byte) in the original code, that's why they end up the same.
As shinkou correctly notes, make sure tmp_arch_file_name points to a buffer of sufficient size (it's not clear if you're doing this in your code). Simplest way is to do something like:
char buffer[256];
char* tmp_arch_file_name = buffer;
Before you use pointers, you need to allocate memory. Assuming that arch_file_name is assigned a value already, you should calculate the length of the result string, allocate memory, do strcpy, and then strcat, like this:
char *arch_file_name = "/temp/my.arch";
// Add lengths of the two strings together; add one for the \0 terminator:
char * tmp_arch_file_name = malloc((strlen(arch_file_name)+strlen("tmp")+1)*sizeof(char));
strcpy(tmp_arch_file_name, arch_file_name);
// ^ this and this ^ are pointers already; no ampersands!
strcat(tmp_arch_file_name, "tmp");
// use tmp_arch_file_name, and then...
free(tmp_arch_file_name);
First, you need to make sure those pointers actually point to valid memory. As they are, they're either NULL pointers or arbitrary values, neither of which will work very well:
char *arch_file_name = "somestring";
char tmp_arch_file_name[100]; // or malloc
Then you cpy and cat, but with the pointers, not pointers-to-the-pointers that you currently have:
strcpy (tmp_arch_file_name, arch_file_name); // i.e., no "&" chars
strcat (tmp_arch_file_name, "tmp");
Note that there is no bounds checking going on in this code - the sample doesn't need it since it's clear that all the strings will fit in the allocated buffers.
However, unless you totally control the data, a more robust solution would check sizes before blindly copying or appending. Since it's not directly related to the question, I won't add it in here, but it's something to be aware of.
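One way such a size check might look, as a sketch rather than the answerer's own code: let snprintf do the copy and the append in one call and report truncation. copy_with_suffix is a hypothetical name.

```c
#include <stdio.h>

/* Write src followed by suffix into dst (capacity dst_size).
   Returns 0 on success, -1 if the result would not fit. */
int copy_with_suffix(char *dst, size_t dst_size, const char *src, const char *suffix)
{
    /* snprintf returns the length the full result would have had,
       so a value >= dst_size means the output was truncated. */
    int n = snprintf(dst, dst_size, "%s%s", src, suffix);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Called as copy_with_suffix(tmp_arch_file_name, sizeof tmp_arch_file_name, arch_file_name, "tmp") when the destination is a real array; for a malloc'ed buffer, pass the allocated size instead of sizeof.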
The & operator is the address-of operator, that is it returns the address of a variable. However using it on a pointer returns the address of where the pointer is stored, not what it points to.

Why does a char array need strcpy and char star doesn't - using structs in C

I have a misunderstanding regarding this code -
typedef struct _EXP{
int x;
char* name;
char lastName[40];
}XMP
...main...
XMP a;
a.name = "eaaa";
a.lastName = strcpy(a.lastName, "bbb");
Why can't I use: a.lastName = "bbbb"; and that's all?
Well consider the types here. The array has the contents of the string, while the char* merely points to the data. Consequently the array requires strcpy and friends.
Besides, if you allocated memory for the char* (on the heap, or as a buffer on the stack) and then wanted to put some content into it, you'd also have to use strcpy, because a mere assignment would just re-aim the pointer at the new string; for heap memory that loses the only reference to the block (i.e. a memory leak).
Because the location of an array is fixed, while the value of a pointer (which is itself a location) is not. You can assign new values to a pointer, but not an array.
Under the hood they are closely related: in most expressions an array name is converted to a pointer to its first element. An array is not a pointer variable, though, so you cannot reassign an array, but you can repoint a pointer.
When you write
a.name = "eaaa" ;
the compiler will allocate memory for a NUL-terminated string eaaa\0 and, because of that instruction, it will make the pointer name point to that location (i.e. the name variable will contain the address of the memory location where the first byte of the string resides).
If you have the array instead, you already have an allocated area of memory (which cannot be assigned to another memory location!), and you can only fill it with data (in this case bytes representing your string).
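The difference can be put side by side in a short sketch based on the struct from the question. xmp_init is a hypothetical helper, and name is declared const char * here as an assumption, since it points at a string literal.

```c
#include <string.h>

typedef struct {
    int x;
    const char *name;    /* a pointer: can be re-aimed at any string */
    char lastName[40];   /* an array: 40 bytes of storage inside the struct */
} XMP;

/* Fill in both members the way each type allows. */
void xmp_init(XMP *a)
{
    a->x = 0;
    a->name = "eaaa";             /* stores the literal's address */
    strcpy(a->lastName, "bbb");   /* copies the bytes into the array */
    /* a->lastName = "bbb";          would not compile: arrays are not assignable */
}
```

After the call, sizeof a.lastName is still 40 regardless of the string stored in it, while a.name holds only an address.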
This is my understanding about what might be the reason for this.
I think it's about the way the language works. C (and also C++) produce unmanaged code, which means they don't need an environment (like the JVM) to run on that manages memory, threading, etc. The code is compiled to an executable that is run by the OS directly. For that reason, the executable includes information such as how much space is to be allocated for each type (not sure about dynamic types, though), including arrays. (This is also one reason header files matter: the compiler needs to see a type's definition to know its size during compilation.)
So, when the compiler sees an array of characters, it calculates how much space is needed for it during the compilation phase and puts that information into the executable. When the program runs, it knows how much space is required and allocates that much memory. If you could change this multiple times, say in a C function, each assignment would make the previous one(s) invalid. So, IMO, that's why the compiler doesn't allow it.

Disabling NUL-termination of strings in GCC

Is it possible to globally disable NUL-terminated strings in GCC?
I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.
However, if I wanted to append 10 strings, this would mean that 10 bytes are unnecessarily allocated on the stack. With wide strings it is even worse: As for x86, there are 40 bytes wasted; and for x86_64, 80 bytes!
I defined a macro to add those stack-allocated strings to my struct:
#define AppendString(ppDest, pSource) \
AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)
Using sizeof(...) - 1 works quite well but I am wondering whether I could get rid of NUL termination in order to save a few bytes?
This is pretty awful, but you can explicitly specify the length of every character array constant:
char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);
wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
I understand you're only dealing with strings declared in your program:
....
char str1[10];
char str2[12];
....
and not with text buffers you allocate with malloc() and friends otherwise sizeof is not going to help you.
Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.
Unless you are going to rewrite every single string function for your library (sprintf, for example), are you sure you want to do it?
I can't remember the details, but when I do
char my_constant[5]
it is possible that it will reserve 8 bytes anyway, because some machines can't address the middle of a word.
It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
If you're not using any of the Standard Library functions that deal with strings you can forget about the NUL terminating byte.
No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...
Declare your "not quite C strings" with just the needed space;
struct NotQuiteCString { /* ... */ };
struct NotQuiteCString variable;
variable.data = malloc(5);
variable.data[0] = 'h'; /* ... */ variable.data[4] = 'o'; /* "hello" */
This is only worth doing if you are really low on memory; otherwise I don't recommend it.
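A fuller sketch of such a length-carrying string follows. The answer elides the struct body, so the len and data members here are an assumed layout, and nqcs_from_cstr is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout: the original answer does not show the struct body. */
struct NotQuiteCString {
    size_t len;
    char *data;   /* exactly len bytes, no terminator */
};

/* Build from a C string, dropping the trailing '\0'.
   Returns 0 on success, -1 if malloc fails. */
int nqcs_from_cstr(struct NotQuiteCString *s, const char *src)
{
    s->len = strlen(src);
    s->data = malloc(s->len ? s->len : 1);
    if (s->data == NULL)
        return -1;
    memcpy(s->data, src, s->len);   /* the terminator is not copied */
    return 0;
}
```

Such a value can still be printed with printf("%.*s", (int)s.len, s.data), because the precision caps how many bytes printf reads, but it must never be handed to strlen(), strcpy() or a plain %s.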
The most proper way to do what you are describing seems to be:
Prepare a minimal 'listing' file of the form:
string1_constant_name "str1"
string2_constant_name "str2"
...
Write a utility that processes this file and generates declarations such as
const char string1_constant[4] = "str1";
I would not recommend doing this by hand, because any change to a string would then require updating its length, which is easy to get wrong.
You now have non-terminated strings, thanks to the fixed-size auto-generated arrays, and sizeof() works for every variable. This solution seems acceptable.
The benefits are easy localization, the possibility of adding checks to make this approach less risky, and savings in the read-only data segment.
The drawback is that all such string constants must be included in every module (as an include, so that sizeof() is known). So this only makes sense if your linker merges such symbols (some don't).
Aren't these similar to Pascal-style strings, or Hollerith Strings? I think this is only useful if you actually want the String data to preserve NULLs, in which you're really pushing around arbitrary memory, not "strings" per se.
The question uses false assumptions - it assumes that storing the length (e.g. implicitly by passing it as a number to a function) incurs no overhead, but that's not true.
While one might save space by not storing the 0-byte (or wchar), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function somewhere, which almost certainly takes more space, in code. If the same string is used multiple times, the overhead is per use, not per-string.
Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.

Resources