I'm implementing a CMap in C, and part of this entails storing information in a linked-list type of structure that I manually manage the memory of. So the first 4 bytes of this struct is a pointer to the next struct, the next section is the string (key), and the final section is the value.
Say void *e = ptr defines one such linked list.
Then, ptr + 4 refers to the beginning of the string section.
I want to assign that string value to another string, and what I've done so far is:
char *string = (char *)ptr + 4;
However, I don't think this is right.
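If you want to point to the same string, your code is fine, assuming pointers are 4 bytes wide on your platform (on most 64-bit platforms they are 8, so sizeof(void *) is a safer offset than a hard-coded 4).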
If you want to copy the contents of the string use malloc and strcpy to create a new string.
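As a rough sketch of both options, assuming the node layout from the question (next pointer first, then the key bytes, then the value). The names node_key and node_key_copy are made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, following the question:
 * [ next pointer ][ key bytes ... '\0' ][ value bytes ] */

/* Points at the key stored inside the node; nothing is copied. */
char *node_key(void *node)
{
    assert(node != NULL);
    return (char *)node + sizeof(void *);   /* not a hard-coded 4 */
}

/* Returns a separate heap copy of the key, or NULL on allocation failure.
 * The caller must free() the result. */
char *node_key_copy(void *node)
{
    const char *key = node_key(node);
    char *copy = malloc(strlen(key) + 1);   /* +1 for the terminator */
    if (copy != NULL)
        strcpy(copy, key);
    return copy;
}
```

The first function aliases the bytes inside the node, so the result is only valid while the node is alive; the second gives you a string with its own lifetime.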
Just reference struct instead of calculating offsets.
// if the data is structured this way
struct struct_list_el
{
    struct struct_list_el *next;
    char *str;
    int value;
};
typedef struct struct_list_el list_el;
// then, starting from void_pointer
list_el *el;
el = (list_el *) void_pointer;
char *string;
string = el->str;
@ralu is right that you should be using a struct. But you should also be very careful when copying strings. In C there is no first-class string object like in C++, Java, Python, and well, everything else. :)
In C, character pointers (char*) are often used as strings, but they are really just pointers to null-terminated arrays of bytes in memory somewhere. Copying a character pointer is not the same as copying the underlying array of characters. To do that, you need to provide memory for the characters of the copy. This memory can be on the stack (a local array), or the heap (created with malloc), or some other buffer.
You'll need to measure the length of the string before you do anything to make sure that the target buffer can hold it. Be sure to add one to the length so that there is room for the terminating null.
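A minimal sketch of the measure-first approach, here with a caller-supplied buffer such as a local array. The helper name copy_if_fits is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
 * Returns 0 on success, -1 if dst is too small (dst is left unchanged). */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);       /* excludes the '\0' */
    if (len + 1 > dst_size)         /* +1 for the terminator */
        return -1;
    memcpy(dst, src, len + 1);      /* copies the terminator too */
    return 0;
}
```

With a 6-byte buffer, "hello" fits exactly (5 characters plus the terminator) and "hello!" is refused.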
Also note that the standard library functions (strlen, strcpy, strncpy, strcat, snprintf, strdup, etc.) are slightly incompatible with each other regarding the terminating null. For example, strlen returns the number of characters, excluding the terminating null, so buffers need to be one byte larger than what it returns to hold things. Also, strncpy does not guarantee null termination while snprintf does. Misuse of these functions and C strings in general is the cause of a significant number of security breaches (not to mention bugs) in computer systems today.
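To illustrate the strncpy versus snprintf difference, here is a small sketch of two truncating copies. The wrapper names are invented for this example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* strncpy leaves dst unterminated when src has dst_size or more characters,
 * so the terminator is written by hand. */
void copy_with_strncpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* snprintf always terminates when dst_size > 0.
 * Returns 1 if the text was truncated (or on error), 0 otherwise. */
int copy_with_snprintf(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed < 0 || (size_t)needed >= dst_size;
}
```

Both leave "hel" in a 4-byte buffer when given "hello", but only the snprintf version tells you that truncation happened.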
Unless you build or use a solid library, string and list manipulation in C is tedious and error-prone. You can see why C++ and all those other languages were invented.
I have a large string, where I want to use pieces of it but I don't want to necessarily copy them, so I figured I can make a structure that marks the beginning and length of the useful chunk from the big string, and then create a function that reads it.
struct descriptor {
int start;
int length;
};
So far so good, but when I got to writing the function I realized that I can't really return the chunk without copying into memory...
char* getSegment(char* string, struct descriptor d) {
char* chunk = malloc(d.length + 1);
strncpy(chunk, string + d.start, d.length);
chunk[d.length] = '\0';
return chunk;
}
So the questions I have are:
Is there any way that I can return the piece of string without copying it
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Answering your two questions:
No
The caller should provide a buffer for the copied string
I would personally pass a pointer to the descriptor
char* getSegment(const char* string, char *buff, struct descriptor *d)
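One way the caller-provided-buffer version could look. This is only a sketch: the name get_segment_into and the extra buff_size parameter are my assumptions, added so the function can refuse a chunk that does not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct descriptor {
    int start;
    int length;
};

/* Copies the chunk described by d into buff (capacity buff_size) and
 * terminates it. Returns buff, or NULL if the chunk does not fit.
 * Assumes the described range lies inside string. */
char *get_segment_into(const char *string, char *buff, size_t buff_size,
                       const struct descriptor *d)
{
    assert(string != NULL && buff != NULL && d != NULL);
    if (d->start < 0 || d->length < 0 || (size_t)d->length + 1 > buff_size)
        return NULL;
    memcpy(buff, string + d->start, (size_t)d->length);
    buff[d->length] = '\0';
    return buff;
}
```

Since the caller owns the buffer, there is nothing for the function to leak.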
Is there any way that I can return the piece of string without copying it
A string includes its terminating null character, so unless the piece the code wants is the tail of the original, a pointer to a "piece of string" cannot itself be a string.
how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Create temporary space with a variable-length array (available since C99, optional since C11). It is good until the end of the block, at which point the memory is released and must not be used any further.
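char* getSegment(const char* string, struct descriptor d, char *dest) {
  // form result in `dest`; needs <string.h> for memcpy
  memcpy(dest, string + d.start, d.length);
  dest[d.length] = '\0';
  return dest;
}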
Usage
char *t;
{
struct descriptor des = bar();
char *large_string = foo();
char sub[des.length + 1u]; //VLA
t = getSegment(large_string, des, sub);
puts(t); // use sub or t;
}
// do not use `t` here, invalid pointer.
Keep in mind that size is a concern, since a VLA typically lives on the stack. If the code returns large sub-strings, it is best to malloc() a buffer and oblige the calling code to free it when done.
Is there any way that I can return the piece of string without copying it
You're right that if you want to use the chunks in conjunction with any of the many C functions that expect to work with null-terminated character arrays, then you have to make copies. Otherwise, adding the terminators modifies the original string.
If you're prepared to handle the chunks as fixed-length, unterminated arrays, however, then you can represent them without copying as a combination of a pointer to the first character and a length. Some standard library functions work with user-specified string lengths, thus supporting operations on such segments without null termination. You would need to be very careful with them, however.
If you take that approach, I would recommend colocating the pointer and length in a structure. For example,
struct string_segment {
char *start;
size_t length;
};
You could declare variables of this type, pass and return objects of this type, and create compound literals of this type without any dynamic memory allocation, thus avoiding opening any avenue for memory leakage.
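A small sketch of working with such segments without copying. The helper names are hypothetical, and I have made start a const char * so that segments of string literals are safe to create:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct string_segment {
    const char *start;
    size_t length;
};

/* Builds a view of length chars starting at offset start of s. No copy. */
struct string_segment make_segment(const char *s, size_t start, size_t length)
{
    assert(s != NULL);
    return (struct string_segment){ s + start, length };
}

/* Compares a segment with an ordinary C string; returns 1 if equal. */
int segment_equals(struct string_segment seg, const char *s)
{
    return strlen(s) == seg.length && memcmp(seg.start, s, seg.length) == 0;
}

/* Prints a segment using a precision, so printf stops after length chars. */
void print_segment(struct string_segment seg)
{
    printf("%.*s\n", (int)seg.length, seg.start);
}
```

memcmp and the "%.*s" precision form are examples of the length-aware standard facilities mentioned above. The segment is only valid for as long as the big string is.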
If not, how can I deal with this memory leak, since the copy is in heap memory and I don't have control over who will call getSegment?
Returning dynamically-allocated objects does not automatically create a memory leak -- it merely confers a responsibility on the caller to free the allocated memory. It is when the caller fails to either satisfy that responsibility or pass it on to other code that a memory leak occurs. Several standard library functions indeed do return dynamically-allocated objects, and it's not so unusual in third-party libraries. The canonical example (other than malloc() itself) would probably be the POSIX-standard strdup() function.
If your function returns a pointer to a dynamically-allocated object -- whether a copied string, or a chunk definition structure -- then it should document the responsibility to free that falls on callers. You must ensure that you satisfy your obligation when you call it from your own code, but having clearly documented the function's behavior, you cannot take responsibility for errors other callers may make by failing to fulfill their obligations.
Is it possible to have strings with NULL character somewhere except the end and work with them? Like get their size, use strcat, etc?
I have some ideas:
1) Write your own function for getting length (or something else), which is going to iterate over a string. If it meets a NULL char, it is going to check the next char of the string. If it is not NULL - continue counting chars. But it may (and WILL!) eventually lead to situation when you are reading memory OUTSIDE of the char array. So it is a bad idea.
2) Use sizeof(array)/sizeof(type), eg sizeof(input)/sizeof(char). That is going to work pretty good I think.
Do you have any other ideas on how this can be done? Maybe there are some function which I am not aware of (C newbie alert :))?
The only really safe method I can think of is to use "Pascal"-type strings (that is, something that has a string header and assorted other data associated with it).
Something like this:
typedef struct {
int len, allocated;
char *data;
} my_string;
You would then have to implement pretty much every string manipulation function yourself. Keeping both the "length of the string" and "the size of the allocation" allows you to have an allocation that's larger than the current contents, this may make repeated string concatenation cheaper (allows an amortized O(1) append).
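For instance, an append with doubling growth might look roughly like this. It is a sketch only: the function name, the starting capacity of 16 and the lack of overflow checks are my assumptions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int len, allocated;
    char *data;
} my_string;

/* Appends n bytes (which may include zero bytes) to s, doubling the
 * allocation when needed. Returns 0 on success, -1 on allocation failure
 * (s is left unchanged). Assumes sizes stay well below INT_MAX. */
int my_string_append(my_string *s, const char *bytes, int n)
{
    assert(s != NULL && n >= 0);
    if (s->len + n > s->allocated) {
        int new_cap = s->allocated ? s->allocated : 16;
        while (new_cap < s->len + n)
            new_cap *= 2;
        char *p = realloc(s->data, (size_t)new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->allocated = new_cap;
    }
    memcpy(s->data + s->len, bytes, (size_t)n);   /* no terminator needed */
    s->len += n;
    return 0;
}
```

Start from a zeroed struct (`my_string s = {0, 0, NULL};`), and free(s.data) when done.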
You can have an array of char, either statically or dynamically allocated, that contains a zero byte in the middle, but only the part up to and including the zero can be considered a "string" in the standard C sense. Only that part will be recognized or considered by the standard library's string functions.
You can use a different terminator -- say two zeroes in a row -- and write your own string functions, but that just pushes off the problem. What happens when you need two zeroes in the middle of your string? In any case, you need to exercise even more care in this case than in the ordinary string case to ensure that your custom strings are properly terminated. You also have to be certain to avoid using them with the standard string functions.
If your special strings are stored in char array of known size then you can get the length of the overall array via sizeof, but that doesn't tell you what portion of the array contains meaningful data. It also doesn't help with any of the other string functions you might want to perform, and it does nothing for you if your handle on the pseudo-strings is a char *.
If you are contemplating custom string functions anyway, then you should consider string objects that have an explicit length stored with them. For example:
struct my_string {
unsigned allocated, length;
char *contents;
};
Your custom functions then handle objects of that type, being certain to do the right thing with the length member. There is no explicit terminator, so these strings can contain any char value. Also, you can be certain not to mix these up with standard strings.
As long as you store the length of the array of chars then you can have strings with nul characters or even without a terminating nul.
struct MyString
{
int length;
char* buffer;
};
And then you would have to write all your equivalent functions for managing the string.
The bstring library http://bstring.sourceforge.net and Microsoft's BSTR (uses wide chars) are existing libraries that work in this way and also offer some compatibility with C-style strings.
pros - getting the length of the string is quick
cons - the strings need to be dynamically allocated.
Hello I am new to this site, and I require some help with understanding what would be considered the "norm" while coding structures in C that require a string. Basically I am wondering which of the following ways would be considered the "industry standard" while using structures in C to keep track of ALL of the memory the structure requires:
1) Fixed Size String:
typedef struct
{
int damage;
char name[40];
} Item;
I can now get the size using sizeof(Item)
2) Character Array Pointer
typedef struct
{
int damage;
char *name;
} Item;
I know I can store the size of name using a second variable, but is there another way?
i) is there any other advantage to using the fixed size (1)
char name[40];
versus doing the following and using a pointer to a char array (2)?
char *name;
and if so, what is the advantage?
ii) Also, is the string using a pointer to a char array (2) going to be stored sequentially and immediately after the structure (immediately after the pointer to the string) or will it be stored somewhere else in memory?
iii) I wish to know how one can find the length of a char * string variable (without using a size_t, or integer value to store the length)
There are basically 3 common conventions for strings. All three are found in the wild, both for in-memory representation and storage/transmission.
Fixed size. Access is very efficient, but if the actual length varies you both waste space and need one of the below methods to determine the end of the "real" content.
Length prefixed. Extra space is included in the dynamic allocation to hold the length. From the pointer you can find both the character content and the length immediately preceding it. Example: BSTR. Sometimes the length is encoded to be more space-efficient for short strings. Example: ASN.1.
Terminated. The string extends until the first occurrence of the termination character (typically NUL), and the content cannot contain that character. Some variations use two NULs in sequence as the terminator, to allow individual NUL characters to exist in the string, which is then often treated as a packed list of strings. Other variations use an encoding such as byte stuffing (UTF-8 would also work) to guarantee that there exists some code reserved for termination that can't ever appear in the encoded version of the content.
In the third case, there's a function such as strlen to search for the terminator and find the length.
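As an illustration of the double-NUL variation, here is a sketch that walks a packed list of strings. The function names are hypothetical, and note that this format cannot hold an empty string as an element:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the strings in a packed list such as "one\0two\0three\0\0",
 * where an empty string (two NULs in a row) marks the end of the list. */
size_t packed_list_count(const char *list)
{
    assert(list != NULL);
    size_t count = 0;
    while (*list != '\0') {
        count++;
        list += strlen(list) + 1;   /* skip this string and its terminator */
    }
    return count;
}

/* Returns the index-th string in the list, or NULL if there are fewer. */
const char *packed_list_get(const char *list, size_t index)
{
    assert(list != NULL);
    while (*list != '\0') {
        if (index-- == 0)
            return list;
        list += strlen(list) + 1;
    }
    return NULL;
}
```

Each element is an ordinary terminated string, so strlen still works on the pieces; only the list as a whole needs the custom walk.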
Both cases which use pointers can point to data immediately following the fixed portion of the structure, if you carefully allocate it that way. If you want to force this, then use a flexible array on the end of your structure (no pointer needed). Like this:
typedef struct
{
int damage;
char name[]; // terminated
} Item;
or
typedef struct
{
int damage;
int length_of_name;
char name[];
} Item;
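Allocating the terminated variant could look something like this. The constructor name item_new is made up for the example:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int damage;
    char name[];    /* flexible array member, NUL-terminated */
} Item;

/* Allocates the fixed part and the name in one block, so a single free()
 * releases both. Returns NULL on allocation failure. */
Item *item_new(int damage, const char *name)
{
    assert(name != NULL);
    size_t name_size = strlen(name) + 1;            /* include the '\0' */
    Item *item = malloc(sizeof *item + name_size);  /* header + characters */
    if (item == NULL)
        return NULL;
    item->damage = damage;
    memcpy(item->name, name, name_size);
    return item;
}
```

Note that sizeof(Item) no longer tells you the full size of an item; the total is whatever was passed to malloc, so keep the length around if you need it later.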
1) is there any other advantage to using the fixed size (1)
char name[40];
versus doing the following and using a pointer to a char array (2)?
char *name;
and if so, what is the advantage?
With your array declared as char name[40]; space for name is already allocated and you are free to copy information into name from name[0] through name[39]. However, in the case of char *name;, it is simply a character pointer and can be used to point to an existing string in memory, but, on its own, cannot be used to copy information to until you allocate memory to hold that information. So say you have a 30 character string you want to copy to name declared as char *name;, you must first allocate with malloc 30 characters plus an additional character to hold the null-terminating character:
char *name;
name = malloc (sizeof (char) * (30 + 1));
Then you are free to copy information to/from name. An advantage of dynamic allocation is that you can realloc memory for name if the information you are storing in name grows beyond 30 characters. An additional requirement is that, after allocating memory for name, you are responsible for freeing it when it is no longer needed. That's a rough outline of the pros/cons/requirements for using one as opposed to the other.
If you know the maximum length of the string you need, then you can use a character array. It does mean, though, that you will typically use more memory than you would with dynamically allocated character arrays. Also, take a look at CString if you are using C++. You can find the length of the string using strlen. With a fixed-size array, the characters are stored inside the structure itself; a dynamically allocated string can be anywhere on the heap.
If I have a character pointer that contains NULL bytes is there any built in function I can use to find the length or will I just have to write my own function? Btw I'm using gcc.
EDIT:
Should have mentioned the character pointer was created using malloc().
If you have a pointer then the ONLY way to know the size is to store the size separately or have a unique value which terminates the string. (typically '\0') If you have neither of these, it simply cannot be done.
EDIT: since you have specified that you allocated the buffer using malloc then the answer is the paragraph above. You need to either remember how much you allocated with malloc or simply have a terminating value.
If you happen to have an array (like: char s[] = "hello\0world";) then you could resort to sizeof(s). But be very careful, the moment you try it with a pointer, you will get the size of the pointer, not the size of an array. (but strlen(s) would equal 5 since it counts up to the first '\0').
In addition, arrays decay to pointers when passed to functions. So if you pass the array to a function, you are back to square one.
NOTE:
void f(int *p) {}
and
void f(int p[]) {}
and
void f(int p[10]) {}
are all the same. In all 3 versions, p is a pointer, not an array.
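Because of that decay, a function that has to look past the first '\0' needs the size handed to it. A minimal sketch, with a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Inside this function buf is a pointer, so sizeof buf would only give the
 * pointer size. The real buffer size has to come in as a parameter. */
size_t count_zero_bytes(const char *buf, size_t buf_size)
{
    assert(buf != NULL);
    size_t zeros = 0;
    for (size_t i = 0; i < buf_size; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

With `char s[] = "hello\0world";` in the caller, sizeof s is 12 (both NULs included), so `count_zero_bytes(s, sizeof s)` sees the whole array and finds 2 zero bytes.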
How do you know where the string ends, if it contains NULL bytes as part of it? Certainly no built in function can work with strings like that. It'll interpret the first null byte as the end of the string.
If you want the length, you'll have to store it yourself. Keep in mind that no standard library string functions will work correctly on strings like these.
You'll need to keep track of the length yourself.
C strings are null terminated, meaning that the first null character signals the end of the string. All builtin string functions rely on this, so if you have a buffer that can contain NULLs as part of the data then you can't use them.
Since you're using malloc then you may need to keep track of two sizes: the size of your allocated buffer, and how many characters within that buffer constitute valid data.
I know that I can copy the structure member by member, instead of that can I do a memcpy on structures?
Is it advisable to do so?
In my structure, I have a string also as member which I have to copy to another structure having the same member. How do I do that?
Copying by plain assignment is best, since it's shorter, easier to read, and has a higher level of abstraction. Instead of saying (to the human reader of the code) "copy these bits from here to there", and requiring the reader to think about the size argument to the copy, you're just doing a plain assignment ("copy this value from here to here"). There can be no hesitation about whether or not the size is correct.
Also, if the structure is heavily padded, assignment might make the compiler emit something more efficient, since it doesn't have to copy the padding (and it knows where it is), but memcpy() doesn't, so it will always copy the exact number of bytes you tell it to copy.
If your string is an actual array, i.e.:
struct {
char string[32];
size_t len;
} a, b;
strcpy(a.string, "hello");
a.len = strlen(a.string);
Then you can still use plain assignment:
b = a;
To get a complete copy. For variable-length data modelled like this though, this is not the most efficient way to do the copy since the entire array will always be copied.
Beware though, that copying structs that contain pointers to heap-allocated memory can be a bit dangerous, since by doing so you're aliasing the pointer, and typically making it ambiguous who owns the pointer after the copying operation.
For these situations a "deep copy" is really the only choice, and that needs to go in a function.
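Such a deep-copy function might look like this. The struct record and the name record_copy are hypothetical, just to have something with one owned string member:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct that owns a heap-allocated string. */
struct record {
    int id;
    char *name;
};

/* Deep copy: dst gets its own copy of name. Returns 0 on success, -1 on
 * allocation failure (dst is left untouched). */
int record_copy(struct record *dst, const struct record *src)
{
    assert(dst != NULL && src != NULL);
    char *name = NULL;
    if (src->name != NULL) {
        size_t size = strlen(src->name) + 1;
        name = malloc(size);
        if (name == NULL)
            return -1;
        memcpy(name, src->name, size);
    }
    *dst = *src;        /* plain assignment copies the non-pointer members */
    dst->name = name;   /* replace the aliased pointer with the new copy */
    return 0;
}
```

After the call each record owns its own name, so each one can be freed independently.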
Since C90, you can simply use:
dest_struct = source_struct;
as long as the string is stored inside an array:
struct xxx {
char theString[100];
};
Otherwise, if it's a pointer, you'll need to copy it by hand.
struct xxx {
char* theString;
};
dest_struct = source_struct;
dest_struct.theString = malloc(strlen(source_struct.theString) + 1);
strcpy(dest_struct.theString, source_struct.theString);
If the structures are of compatible types, yes, you can, with something like:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
The only thing you need to be aware of is that this is a shallow copy. In other words, if you have a char * pointing to a specific string, both structures will point to the same string.
And changing the contents of one of those string fields (the data that the char * points to, not the char * itself) will change the other as well.
If you want an easy copy without having to manually do each field but with the added bonus of non-shallow string copies, use strdup:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
dest_struct->strptr = strdup (source_struct->strptr);
This will copy the entire contents of the structure, then deep-copy the string, effectively giving a separate string to each structure.
And, if your C implementation doesn't have a strdup (it was not part of the ISO C standard before C23, although POSIX provides it), it is easy to write one with malloc and strcpy.
You can memcpy structs, or you can just assign them like any other value.
struct {int a, b;} c, d;
c.a = c.b = 10;
d = c;
In C, memcpy is only foolishly risky. As long as you get all three parameters exactly right, none of the struct members are pointers (or, you explicitly intend to do a shallow copy) and there aren't large alignment gaps in the struct that memcpy is going to waste time looping through (or performance never matters), then by all means, memcpy. You gain nothing except code that is harder to read, fragile to future changes and has to be hand-verified in code reviews (because the compiler can't), but hey yeah sure why not.
In C++, we advance to the ludicrously risky. You may have members of types which are not safely memcpyable, like std::string, which will cause your receiving struct to become a dangerous weapon, randomly corrupting memory whenever used. You may get surprises involving virtual functions when emulating slice-copies. The optimizer, which can do wondrous things for you because it has a guarantee of full type knowledge when it compiles =, can do nothing for your memcpy call.
In C++ there's a rule of thumb - if you see memcpy or memset, something's wrong. There are rare cases when this is not true, but they do not involve structs. You use memcpy when, and only when, you have reason to blindly copy bytes.
Assignment on the other hand is simple to read, checks correctness at compile time and then intelligently moves values at runtime. There is no downside.
You can use the following solution to accomplish your goal:
#include <stdio.h>
struct student
{
    char name[20];
    char country[20];
};
int main(void)
{
    struct student S={"Wolverine","America"};
    struct student X;
    X=S;  /* plain assignment copies both arrays */
    printf("%s %s\n",X.name,X.country);
    return 0;
}
You can use a struct to read from and write to a file.
You do not need to cast it to char*.
The struct size will also be preserved.
(This point is slightly off-topic, but note that
data on disk often behaves much like data in RAM.)
To move a single string field (to or from the struct) you must use strncpy
and a temporary string buffer that is '\0'-terminated.
Somewhere you must remember the length of the record's string field.
To move other fields you can use member access notation, e.g.:
NodeB->one=intvar;
floatvar2=(NodeA->insidebisnode_subvar).myfl;
struct mynode {
int one;
int two;
char txt3[3];
struct{char txt2[6];}txt2fi;
struct insidenode{
char txt[8];
long int myl;
void * mypointer;
size_t myst;
long long myll;
} insidenode_subvar;
struct insidebisnode{
float myfl;
} insidebisnode_subvar;
} mynode_subvar;
typedef struct mynode* Node;
...(main)
Node NodeA=malloc...
Node NodeB=malloc...
You can embed each string in a struct that fits it,
to avoid the strncpy step and behave like COBOL:
NodeB->txt2fi=NodeA->txt2fi;
...but you will still need a temporary string buffer
plus one strncpy, as mentioned above, for scanf and printf;
otherwise operator input that is longer (or shorter) than the field
would not be truncated (or padded with spaces).
(NodeB->insidenode_subvar).mypointer=(NodeA->insidenode_subvar).mypointer;
will create a pointer alias.
NodeB->txt3=NodeA->txt3;
causes the compiler to reject the code:
error: incompatible types when assigning to type ‘char[3]’ from type ‘char *’
The txt2fi assignment works only because NodeB->txt2fi and NodeA->txt2fi have the same struct type!
I found a correct and simple answer to this topic at
In C, why can't I assign a string to a char array after it's declared?
"Arrays (also of chars) are second-class citizens in C"!