C - Splitting C-String into words without reallocating memory

I'm trying to split a string (const char*) into words and save the individual words in an array of char pointers (char**).
My problem is not the splitting part but that I'm not allowed to allocate any memory. I need to use the input string as my memory, but since it's a const char* I'm not able to modify it.
My first thought was to change every whitespace character into '\0' and save the position of the beginning of each word in the array, which of course is not possible since the input string is const.
The declaration of the function looks like this:
int breakIntoWords(const char *line, int maxWords, char** words);
The function returns the number of words in line and maxWords is the size of the word-array.
Everything I found either used arrays as input strings or allocated memory with malloc.

There is no solution to the problem as posed. You can obtain a pointer to the start of each word, but in order to use the source string as the storage for separate word strings you must modify it by replacing delimiters with string terminators, as you considered doing.
If the task indeed supposes that you will alter the input line to use it for storage of several separate strings, then it seems that it is inherently incorrect for the function's line parameter to be const-qualified. Such qualification is inconsistent with the job the function is supposed to perform. Moreover, if you are supposed to assign pointers into the string pointed to by line into words, then the fact that words is not const-qualified also presents a conflict.
The only plausible solution I see to the problem described is to remove the const qualifier from your line parameter.
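For illustration only, here is a minimal sketch of what the function could look like once const is dropped. It assumes whitespace is the only delimiter and that words beyond maxWords should still be counted but not stored; neither assumption comes from the question.

```c
#include <ctype.h>

/* Sketch: `line` must be a writable buffer (char *, not const char *).
   Delimiters are overwritten with '\0', so each entry in `words`
   points at a properly terminated word inside `line` itself. */
int breakIntoWords(char *line, int maxWords, char **words)
{
    int count = 0;
    char *p = line;

    while (*p != '\0') {
        /* Skip the whitespace in front of the next word. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* p is at the first character of a word. */
        if (count < maxWords)
            words[count] = p;
        count++;

        /* Advance to the end of the word. */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;

        /* Overwrite the delimiter with a terminator, in place. */
        if (*p != '\0')
            *p++ = '\0';
    }
    return count;
}
```

The caller has to pass a modifiable array such as char line[] = "some words"; passing a string literal would be undefined behaviour, because the function writes into the buffer.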

Related

Concatenate strings without using string functions: Replacing the end of string (null character) gives seg fault

Without using the string.h functions (want to use only the std libs), I wanted to create a new string by concatenating the string provided as an argument to the program. For that, I decided to copy the argument to a new char array of larger size and then replace the end of the string by the characters I want to append.
unsigned int argsize=sizeof(argv[1]);
unsigned char *newstr=calloc(argsize+5,1);
newstr=argv[1]; //copied arg string to new string of larger size
newstr[argsize+4]=oname[ns]; //copied the end-of-string null character
newstr[argsize]='.'; //this line gives seg fault
newstr[argsize+1]='X'; //this executes without any error
I believe there must be another more secure way of concatenating string without using string functions or by copying and appending char by char into a new char array. I would really want to know such methods. Also, I'm curious to know what is the reason of this segfault.
Read here: https://stackoverflow.com/a/164258/1176315 and I guess, the compiler is making my null character memory block read only but that's only a guess. I want to know the real reason behind this.
I will appreciate all your efforts to answer the question. Thanks.
Edit: By using std libs only, I mean to say I don't want to use the strcpy(), strlen(), strcat() etc. functions.
Without using the string.h functions (want to use only the std libs)
string.h is part of the standard library.
unsigned int argsize=sizeof(argv[1]);
This is wrong. sizeof does not tell you the length of a C string, it just tells you how big the type of its argument is. argv[1] is a pointer, and sizeof will just tell you how big a pointer is on your platform (typically 4 or 8), regardless of the actual content of the string.
If you want to know how long a C string is, you have to examine its characters and count until you find a 0 character (which incidentally is what strlen does).
newstr=argv[1]; //copied arg string to new string of larger size
Nope. You just copied the pointer stored in argv[1] to the variable newstr, incidentally losing the pointer that calloc returned to you previously, so you also have a memory leak.
To copy a string from a buffer to another you have to copy its characters one by one until you find a 0 character (which incidentally is what strcpy does).
All the following lines are thus operating on argv[1], so if you are going out of its original bounds anything can happen.
I believe there must be another more secure way of concatenating string without using string functions or by copying and appending char by char into a new char array.
C strings are just arrays of characters; everything boils down to copying/reading them one at a time. If you don't want to use the provided string functions you'll end up essentially reimplementing them yourself. Mind you, it's a useful exercise, but you have to understand a bit better what C strings are and how pointers work.
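As a rough sketch of what "reimplementing them yourself" looks like, the following concatenates two strings without anything from string.h. The names my_length and my_concat are made up for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: counts characters up to the terminating 0,
   which is what strlen does. */
static size_t my_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: returns a newly allocated copy of a followed
   by b, or NULL on allocation failure. The caller frees the result. */
char *my_concat(const char *a, const char *b)
{
    size_t la = my_length(a);
    size_t lb = my_length(b);
    char *out = malloc(la + lb + 1);   /* +1 for the terminator */
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < la; i++)
        out[i] = a[i];
    for (i = 0; i < lb; i++)
        out[la + i] = b[i];
    out[la + lb] = '\0';
    return out;
}
```

With this, something like my_concat(argv[1], ".X") would build the new name in freshly allocated memory and leave argv[1] untouched.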
First of all, sizeof(argv[1]) will not return the length of the string. You need to count the number of characters in the string, either with a loop or with the standard library function strlen(). Second, if you want to copy the string you need to use the strcpy() function.
You should do it like this:
size_t argsize=strlen(argv[1]); //you can also count the characters yourself
char *newstr=calloc(argsize+5,1); //zero-filled, with room for the extra characters and the terminator
strcpy(newstr,argv[1]); //copies the characters, not just the pointer
newstr[argsize+4]=oname[ns];
newstr[argsize]='.';
newstr[argsize+1]='X';

difference between strlen(string) and strlen( *string)

Let's say I have an array of strings that are all of same size.
char strings[][MAX_LENGTH];
what would be the difference between strlen(strings) and strlen(*strings)?
I know that strings by itself would be the address of the first string in the array,
but what is *strings?
First, don't do this. C will allow you to do lots of things that are a bad idea. This doesn't mean you ought to do it. :)
While you should get a compiler warning for strlen(strings), because the argument has the wrong pointer type, in practice these two are effectively identical. The reason is that with this definition:
char strings[][MAX_LENGTH];
The allocation for this will end up being one continuous block. Within that block of memory, there are no "structures" or management devices that can be used to identify where individual strings start and stop. This creates an interesting situation.
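Effectively, *strings and strings both point to precisely the same memory location: strings decays to a pointer to the first row (type char (*)[MAX_LENGTH]), and *strings is that first row itself, which in turn decays to a char * to its first character. This means that calling strlen on either one of them will return the length of the first string in the array.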
However, I must reiterate... Don't do this.
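If you want to convince yourself, a small sketch like this compares the two addresses and takes the length through the correctly typed expression. MAX_LENGTH is given an arbitrary value here because the question does not state one, and both function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LENGTH 16   /* assumed value; the question does not give one */

/* Returns 1 if `strings` and `*strings` designate the same address. */
int same_address(char (*strings)[MAX_LENGTH])
{
    return (void *)strings == (void *)*strings;
}

/* Length of the first string, using the correctly typed expression. */
size_t first_length(char (*strings)[MAX_LENGTH])
{
    return strlen(*strings);   /* same as strlen(strings[0]) */
}
```

Writing strlen(*strings) or strlen(strings[0]) says what is meant and compiles without a diagnostic, which is why it is the form to prefer.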

C - is there a way to work with strings which have NULL character in the middle

Is it possible to have strings with NULL character somewhere except the end and work with them? Like get their size, use strcat, etc?
I have some ideas:
1) Write your own function for getting the length (or something else), which is going to iterate over a string. If it meets a NULL char, it is going to check the next char of the string. If it is not NULL, continue counting chars. But it may (and WILL!) eventually lead to a situation where you are reading memory OUTSIDE of the char array. So it is a bad idea.
2) Use sizeof(array)/sizeof(type), eg sizeof(input)/sizeof(char). That is going to work pretty good I think.
Do you have any other ideas on how this can be done? Maybe there are some function which I am not aware of (C newbie alert :))?
The only really safe method I can think of is to use "Pascal"-type strings (that is, something that has a string header and assorted other data associated with it).
Something like this:
typedef struct {
int len, allocated;
char *data;
} my_string;
You would then have to implement pretty much every string manipulation function yourself. Keeping both the "length of the string" and "the size of the allocation" allows you to have an allocation that's larger than the current contents, this may make repeated string concatenation cheaper (allows an amortized O(1) append).
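To give an idea of what one of those functions might look like, here is a sketch of an append operation on the struct above. The function name, the starting capacity of 16 and the doubling policy are my own choices for the example, and integer overflow of the int fields is not handled.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int len, allocated;
    char *data;
} my_string;

/* Hypothetical helper: appends n bytes, which may include zero bytes.
   Returns 0 on success, -1 if memory could not be obtained. */
int my_string_append(my_string *s, const char *bytes, int n)
{
    if (n <= 0)
        return 0;
    if (s->len + n > s->allocated) {
        int cap = s->allocated > 0 ? s->allocated : 16;
        char *p;

        while (cap < s->len + n)
            cap *= 2;           /* doubling keeps appends amortized O(1) */
        p = realloc(s->data, (size_t)cap);
        if (p == NULL)
            return -1;          /* the old buffer is still valid */
        s->data = p;
        s->allocated = cap;
    }
    memcpy(s->data + s->len, bytes, (size_t)n);
    s->len += n;
    return 0;
}
```

Start from my_string s = {0, 0, NULL}; and release s.data with free when done. Because the length is passed explicitly, a call such as my_string_append(&s, "ab\0cd", 5) stores all five bytes.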
You can have an array of char, either statically or dynamically allocated, that contains a zero byte in the middle, but only the part up to and including the zero can be considered a "string" in the standard C sense. Only that part will be recognized or considered by the standard library's string functions.
You can use a different terminator -- say two zeroes in a row -- and write your own string functions, but that just pushes off the problem. What happens when you need two zeroes in the middle of your string? In any case, you need to exercise even more care in this case than in the ordinary string case to ensure that your custom strings are properly terminated. You also have to be certain to avoid using them with the standard string functions.
If your special strings are stored in char array of known size then you can get the length of the overall array via sizeof, but that doesn't tell you what portion of the array contains meaningful data. It also doesn't help with any of the other string functions you might want to perform, and it does nothing for you if your handle on the pseudo-strings is a char *.
If you are contemplating custom string functions anyway, then you should consider string objects that have an explicit length stored with them. For example:
struct my_string {
unsigned allocated, length;
char *contents;
};
Your custom functions then handle objects of that type, being certain to do the right thing with the length member. There is no explicit terminator, so these strings can contain any char value. Also, you can be certain not to mix these up with standard strings.
As long as you store the length of the array of chars then you can have strings with nul characters or even without a terminating nul.
struct MyString
{
int length;
char* buffer;
};
And then you would have to write all your equivalent functions for managing the string.
The bstring library http://bstring.sourceforge.net and Microsoft's BSTR (uses wide chars) are existing implementations that work in this way and also offer some compatibility with C-style strings.
pros - getting the length of the string is quick
cons - the strings need to be dynamically allocated.

C Programming: Find Length of a Char* with Null Bytes

If I have a character pointer that contains NULL bytes is there any built in function I can use to find the length or will I just have to write my own function? Btw I'm using gcc.
EDIT:
Should have mentioned the character pointer was created using malloc().
If you have a pointer then the ONLY way to know the size is to store the size separately or have a unique value which terminates the string (typically '\0'). If you have neither of these, it simply cannot be done.
EDIT: since you have specified that you allocated the buffer using malloc then the answer is the paragraph above. You need to either remember how much you allocated with malloc or simply have a terminating value.
If you happen to have an array (like: char s[] = "hello\0world";) then you could resort to sizeof(s). But be very careful, the moment you try it with a pointer, you will get the size of the pointer, not the size of an array. (but strlen(s) would equal 5 since it counts up to the first '\0').
In addition, arrays decay to pointers when passed to functions. So if you pass the array to a function, you are back to square one.
NOTE:
void f(int *p) {}
and
void f(int p[]) {}
and
void f(int p[10]) {}
are all the same. In all 3 versions, p is a pointer, not an array.
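A tiny sketch makes the decay visible. The function name is invented for the example.

```c
#include <stddef.h>

/* Inside the function only a pointer is visible, so sizeof yields the
   size of a pointer and not the size of the caller's array. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof s;   /* sizeof(const char *) */
}
```

For char s[] = "hello\0world"; the caller sees sizeof(s) == 12 (eleven characters plus the final terminator) and strlen(s) == 5, while size_seen_by_callee(s) reports only the pointer size.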
How do you know where the string ends, if it contains NULL bytes as part of it? Certainly no built in function can work with strings like that. It'll interpret the first null byte as the end of the string.
If you want the length, you'll have to store it yourself. Keep in mind that no standard library string functions will work correctly on strings like these.
You'll need to keep track of the length yourself.
C strings are null terminated, meaning that the first null character signals the end of the string. All builtin string functions rely on this, so if you have a buffer that can contain NULLs as part of the data then you can't use them.
Since you're using malloc then you may need to keep track of two sizes: the size of your allocated buffer, and how many characters within that buffer constitute valid data.

string manipulation without alloc mem in c

I'm wondering if there is another way of getting a sub string without allocating memory. To be more specific, I have a string as:
const char *str = "9|0\" 940 Hello";
Currently I'm getting the 940, which is the sub-string I want as,
char *a = strstr(str,"9|0\" ");
char *b = substr(a+5, 0, 3); // gives me the 940
Where substr is my sub string procedure. The thing is that I don't want to allocate memory for this by calling the sub string procedure.
Is there a much easier way, perhaps by doing some string manipulation without allocating memory?
I'll appreciate any feedback.
No, it can't be done. At least, not without modifying the original string and not without departing from the usual C concept of what a string is.
In C, a string is a sequence of characters terminated by a NUL (a \0 character). In order to obtain from "9|0\" 940 Hello" the substring "940", there would have to be a sequence of characters 9, 4, 0, \0 somewhere in memory. Since that sequence of characters does not exist anywhere in your original string, you would have to modify the original string.
The other option would just be to use a pointer into the original string at the place where your desired substring starts, and then also remember how long your substring is supposed to be in lieu of having the terminating \0 character. However, all C standard library functions that work on strings (and pretty much all third party C libraries that work with strings) expect strings to be NUL-terminated, and so won't accept this pointer-and-count format.
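A minimal sketch of that pointer-and-count idea, assuming you are free to define your own small type for it (struct str_view and view_after are names invented here):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view type: a pointer into someone else's string plus a
   length. Nothing is copied and nothing is terminated. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Returns a view of the `len` characters that follow the first
   occurrence of `marker` in `s`. ptr is NULL if the marker is missing
   or fewer than `len` characters follow it. */
struct str_view view_after(const char *s, const char *marker, size_t len)
{
    struct str_view v = { NULL, 0 };
    const char *hit = strstr(s, marker);

    if (hit != NULL) {
        const char *start = hit + strlen(marker);
        if (strlen(start) >= len) {
            v.ptr = start;
            v.len = len;
        }
    }
    return v;
}
```

printf is one of the few standard functions that can consume such a view directly, via a precision argument: printf("%.*s\n", (int)v.len, v.ptr);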
Try this:
char *mysubstr(char *dst, const char *src, const char *substr, size_t maxdst) {
... do substr logic, but stick result in dst respecting maxdst ...
}
Basically, punt and let the caller allocate space on the stack via:
char s[100];
Or something.
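One way this could be fleshed out is sketched below. It does not use the exact signature above; copy_substr and its offset/length parameters are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies up to `len` characters of `src`, starting
   at offset `start`, into the caller's buffer `dst` of size `maxdst`.
   The result is always terminated. Returns dst, or NULL if maxdst is 0
   or start lies beyond the end of src. */
char *copy_substr(char *dst, size_t maxdst, const char *src,
                  size_t start, size_t len)
{
    size_t srclen = strlen(src);

    if (maxdst == 0 || start > srclen)
        return NULL;
    if (len > srclen - start)
        len = srclen - start;   /* do not read past the end of src */
    if (len > maxdst - 1)
        len = maxdst - 1;       /* leave room for the terminator */
    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return dst;
}
```

With char s[100]; on the stack, copy_substr(s, sizeof s, a, 5, 3) would fill s with the three characters starting at a[5], and no heap allocation is involved.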
A C string is simply an array of chars in memory. If you want to access the substring without allocating a copy of the characters, you can simply access it directly:
char *b = &a[5];
The problem with this approach is that b will not be null-terminated to the appropriate length. It would essentially be a pointer to the string: "940 Hello".
If that doesn't matter to the code that uses b, then you are good to go. Keep in mind, however, that this would probably surprise other programmers later on in the product lifetime (including yourself)!
As xyld suggested, you could let the caller allocate the memory and pass your substr function a buffer to fill; though, strictly speaking, that still involves "allocating memory".
Without allocating any memory at all, the only way you'd be able to do this would be by modifying the original string by changing the character after the substring to a '\0', but of course then your function couldn't take a const char * anymore, and you're modifying the original string, which may not be desirable.
If you don't require a \0 terminated string you can make a substring finding function that just tells you where in the full string (haystack) your partial string (needle) is. This would be considered a hot-copy or alias as the data could be changed by changes to the full string (haystack).
I was writing up a long thing on how to allocate memory using alloca and implement a macro (because it wouldn't work as a function) that would do what you want, but just happened to run across strndupa which is like strndup except allocates the memory on the stack rather than from the heap. It's a GNU extension, so it might not be available for you.
Writing your own macro that looks like a function is awkward, because it needs to return a value but also work on the memory, but it is possible.
