Copying a string with nulls inside - c

I want to copy a string in C (Windows) that contains nulls in it. I need a function to which I will pass buffer length so that the NULL characters will be meaningless. I found StringCbCopy function but it still stops at the first NULL character.

Since you know the length, use memcpy().

Here is a quick bit of code that may help:
char array1[5] = "test", array2[5];
int length = 5;
memcpy(array2, array1, length*sizeof(char));
//the sizeof() is redundant in this because each char is a byte long
//but it is useful if you are working with other datatypes
memcpy probably will become your best friend for situations like this.
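To make the memcpy suggestion concrete for the original question, here is a minimal sketch that copies a buffer with embedded nulls into a destination of known size. The name copy_bytes and its return convention are made up for illustration; this is not a Windows or standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies exactly src_len bytes from src to dst,
   embedded '\0' bytes included.
   Returns 0 on success, or -1 if dst is too small to hold src_len bytes.
   The two buffers must not overlap (a memcpy requirement). */
int copy_bytes(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    if (src_len > dst_size)
        return -1;
    memcpy(dst, src, src_len);
    return 0;
}
```

For example, with const char src[] = "ab\0cd"; and char dst[8];, calling copy_bytes(dst, sizeof dst, src, sizeof src) copies all six bytes, including both nulls.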

It should be very easy to write your own function to do this. If you know the length of the string, just create a char[] or char* with the specified length, and copy characters one by one.
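A hand-written version could look roughly like this. The function name copy_n is an assumption for the example; the point is only that the loop is driven by the length and never looks at the byte values.

```c
#include <stddef.h>

/* Hypothetical helper: copies len bytes one at a time.
   The loop never inspects the values, so '\0' bytes are copied like any other.
   dst must have room for at least len bytes. */
void copy_n(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```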

Related

Concatenate strings without using string functions: Replacing the end of string (null character) gives seg fault

Without using the string.h functions (want to use only the std libs), I wanted to create a new string by concatenating the string provided as an argument to the program. For that, I decided to copy the argument to a new char array of larger size and then replace the end of the string by the characters I want to append.
unsigned int argsize=sizeof(argv[1]);
unsigned char *newstr=calloc(argsize+5,1);
newstr=argv[1]; //copied arg string to new string of larger size
newstr[argsize+4]=oname[ns]; //copied the end-of-string null character
newstr[argsize]='.'; //this line gives seg fault
newstr[argsize+1]='X'; //this executes without any error
I believe there must be another more secure way of concatenating string without using string functions or by copying and appending char by char into a new char array. I would really want to know such methods. Also, I'm curious to know what is the reason of this segfault.
Read here: https://stackoverflow.com/a/164258/1176315 and I guess, the compiler is making my null character memory block read only but that's only a guess. I want to know the real reason behind this.
I will appreciate all your efforts to answer the question. Thanks.
Edit: By using std libs only, I mean to say I don't want to use the strcpy(), strlen(), strcat() etc. functions.
Without using the string.h functions (want to use only the std libs)
string.h is part of the standard library.
unsigned int argsize=sizeof(argv[1]);
This is wrong. sizeof does not tell you the length of a C string; it just tells you how big the type of its argument is. argv[1] is a pointer, so sizeof will just tell you how big a pointer is on your platform (typically 4 or 8 bytes), regardless of the actual content of the string.
If you want to know how long a C string is, you have to examine its characters and count until you find a 0 character (which incidentally is what strlen does).
newstr=argv[1]; //copied arg string to new string of larger size
Nope. You just copied the pointer stored in argv[1] to the variable newstr, incidentally losing the pointer that calloc returned to you previously, so you also have a memory leak.
To copy a string from a buffer to another you have to copy its characters one by one until you find a 0 character (which incidentally is what strcpy does).
All the following lines are thus operating on argv[1], so if you are going out of its original bounds anything can happen.
I believe there must be another more secure way of concatenating string without using string functions or by copying and appending char by char into a new char array.
C strings are just arrays of characters; everything boils down to copying or reading them one at a time. If you don't want to use the provided string functions, you'll end up essentially reimplementing them yourself. Mind you, it's a useful exercise, but you have to understand a bit better what C strings are and how pointers work.
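As an illustration of what "reimplementing them yourself" amounts to, here is a rough sketch of concatenation without anything from string.h. The names my_strlen and my_concat are invented for this example, and stdlib.h is still needed for malloc.

```c
#include <stdlib.h>

/* Counts characters up to, but not including, the terminating '\0'. */
static size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Returns a newly allocated string holding a followed by b,
   or NULL if the allocation fails. The caller must free() the result. */
char *my_concat(const char *a, const char *b)
{
    size_t la = my_strlen(a);
    size_t lb = my_strlen(b);
    char *out = malloc(la + lb + 1);   /* +1 for the terminating '\0' */

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < la; i++)
        out[i] = a[i];
    for (size_t i = 0; i <= lb; i++)   /* <= also copies b's '\0' */
        out[la + i] = b[i];
    return out;
}
```

With this, my_concat(argv[1], ".X") would give a new string with the suffix appended, leaving argv[1] itself untouched.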
First of all, sizeof(argv[1]) will not return the length of the string. You need to count the number of characters in the string, either with a loop or with the standard library function strlen(). Second, if you want to copy the string, you need to use the strcpy() function.
You are supposed to do it like this:
unsigned int argsize=strlen(argv[1]); //you can also count the number of characters yourself
unsigned char *newstr=calloc((argsize+5),1);
strcpy(newstr,argv[1]);
newstr[argsize+4]=oname[ns];
newstr[argsize]='.';
newstr[argsize+1]='X';

Nicest syntax to initialise a char array with a string, ignoring the null terminator

I often want to do something like this:
unsigned char urlValid[66] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
...or:
unsigned char listOfChar[4] = "abcd";
...that is, initialize a character array from a string literal while ignoring the null terminator from that literal. It is very convenient, plus I can do things like sizeof urlValid and get the right answer.
But unfortunately it gives the error initializer-string for array of chars is too long.
Is there a way to either:
Turn off errors and warnings for this specific occurrence (ie, if there's no room for null terminator when initialising a char array)
Do it better, maintaining convenience and readability?
You tagged your question as both C and C++. In reality in C language you would not receive this error. Instead, the terminating zero simply would not be included into the array. I.e. in C it works exactly as you want it to work.
In C++ you will indeed get the error. In C++ you'd probably have to accommodate the terminating zero and remember to subtract 1 from the result of sizeof.
Note also that, as @Daniel Fischer suggested in the comments, you can "decouple" definition from initialization:
char urlValid[66];
memcpy(urlValid, "ab...", sizeof urlValid);
thus effectively simulating the behavior of C language in C++.
Well, in C++ you should always use std::string. It's convenient and not prone to memory leaks etc.
You can, however, initialize an array without specifying the size:
char urlValid[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
This works since the compiler can deduce the correct size from the string literal. Another advantage is that you don't have to change the size if the literal changes.
Edit: You should not use unsigned char for strings.
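If you let the compiler deduce the size, sizeof includes the terminating null, so the usable length is one less. A small sketch of how that might be wrapped up follows; the macro URL_VALID_LEN and the function is_url_char are names assumed for this example.

```c
#include <stddef.h>

static const char url_valid[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

/* sizeof counts the terminating '\0' as well, so subtract 1
   to get the number of usable characters. */
#define URL_VALID_LEN (sizeof url_valid - 1)

/* Hypothetical helper: returns 1 if c is one of the listed characters, else 0. */
int is_url_char(char c)
{
    for (size_t i = 0; i < URL_VALID_LEN; i++)
        if (url_valid[i] == c)
            return 1;
    return 0;
}
```

Because the loop is bounded by URL_VALID_LEN, the terminator is never treated as a valid character.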
Initialise with an actual array of chars?
char urlValid[] = {'a','b','c','d','e','f',...};
There are two simple solutions.
You can either add an extra element into the array like this:
unsigned char urlValid[67] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
or leave out the size of the array altogether:
unsigned char urlValid[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

C String Passing

If I am reading a C string such as:
char myData[100];
and I want to process this data and produce a copy out of it, so my code looks like:
char myData[100], processedData[50];
loop
fill myData from file...
setProcessedData(myData, processedData);
store processedData to file...
where setProcessedData is a function that produces a processed string. Let's say, for simplicity, that it returns a substring:
void setProcessedData (char *myData, char *processedData) {
memcpy( processedData, myData, 5);
}
Is what I am doing something wrong? Like creating extra objects/strings? is there a better way to do it?
Let's say I read a string from the file which contains *:
I am* A T*est String* But Ho*w to Process*.
I want to get the substring that contains the first 3 *s. So my processedData would be:
I am* A T*est String*
and I want to do this for all lines of a file as efficient as possible.
Thanks
The problem is that your function is inherently unsafe, because it makes assumptions about how much memory is allocated behind the pointers you pass to it.
If someone calls setProcessedData with a string that is shorter than 5 bytes, or with a destination smaller than 5 bytes, then bad things will happen.
In addition, you are copying memory with memcpy using a raw count. Writing sizeof(char)*5 is more explicit, even if it is pedantic here, since sizeof(char) is 1 by definition.
Best thing you can do, though, is to follow the same approach used by safer functions of standard library like strcpy vs strncpy: you pass a third parameter which is the maximum length that should be copied, eg:
void processData(const char *data, char *processedData, unsigned int length) {
memcpy(processedData,data,length*sizeof(char));
}
I think you can improve your code in two ways:
Make the input string pointer const (i.e. const char* myData), to mark that myData is an input string and that its content is not modified by the function.
Pass the size of the destination buffer, so that your function can do proper checks and avoid buffer overruns (a security enemy).
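Putting both suggestions together for the "first 3 *s" case from the question, a sketch might look like the following. The function name copy_through_nth, its parameters, and its return convention are assumptions made for this example, not something from the question.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst up to and including the n-th
   occurrence of delim, and always '\0'-terminates dst.
   Returns the number of characters copied, or -1 (with dst set to the
   empty string) if dst is too small or src has fewer than n delimiters. */
int copy_through_nth(char *dst, size_t dst_size,
                     const char *src, char delim, int n)
{
    size_t i = 0;
    int seen = 0;

    if (dst_size == 0 || n <= 0)
        return -1;
    while (src[i] != '\0' && seen < n) {
        if (i + 1 >= dst_size) {       /* keep one byte for the '\0' */
            dst[0] = '\0';
            return -1;
        }
        dst[i] = src[i];
        if (src[i] == delim)
            seen++;
        i++;
    }
    if (seen < n) {                    /* ran out of input first */
        dst[0] = '\0';
        return -1;
    }
    dst[i] = '\0';
    return (int)i;
}
```

Called as copy_through_nth(processedData, sizeof processedData, myData, '*', 3) from the loop in the question, it makes a single pass over each line and needs no extra allocation.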

Assigning a string to a 2D array

char in[100], *temp[10],var[10][10];
int i, n = 0,
double val[10];
var[0][]="ANS";
I want to assign the string 'ANS' to var[0][0,1,2], but it does not work and I cannot figure out where I am going wrong.
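Perhaps use strncpy instead, passing the size of the row so that the rest of it is zero-filled and the string ends up terminated:
strncpy(var[0], "ANS", sizeof var[0]);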
You have sort of answered your own question. You want to assign var[0][0,1,2,3] to "ANS" right? Well "ANS" is an array of characters, ans[0,1,2,3] (don't forget the null terminator). So you have to assign each one individually. In C strings aren't a data type, they are just an array of other variables (chars to be exact). What you can do instead is:
strcpy(var[0], "ANS");
Which will do the byte-by-byte copy for you.
There are some pitfalls to strcpy, however. First, the destination char array (var[0] in this case) must be large enough to contain the string, including its terminator. strcpy will not check this for you (it can't, actually), so if you are not careful you can cause a buffer overflow. Also, the source must be null-terminated.
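When you write
var[0][] = "ANS"
the compiler rejects it: empty brackets are not valid in an expression, and an array such as var[0] cannot be assigned to with = once it has been defined.
Therefore, you should use the strcpy function. strcpy will copy char by char, including the terminating null.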
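If you want the size check that strcpy cannot do, one option is to go through snprintf, which never writes past the size it is given. Here is a sketch for the char var[10][10] layout from the question; the helper name set_row is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: stores text in one row of a table whose rows are
   10 chars wide. Returns 0 on success, or -1 if text (plus its '\0')
   does not fit in the row. */
int set_row(char table[][10], size_t row, const char *text)
{
    /* table[row] has type char[10], so sizeof gives the row width. */
    int n = snprintf(table[row], sizeof table[row], "%s", text);

    return (n >= 0 && (size_t)n < sizeof table[row]) ? 0 : -1;
}
```

With the declarations from the question, set_row(var, 0, "ANS") leaves var[0] holding 'A', 'N', 'S' and a terminating null.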

string manipulation without alloc mem in c

I'm wondering if there is another way of getting a sub string without allocating memory. To be more specific, I have a string as:
const char *str = "9|0\" 940 Hello";
Currently I'm getting the 940, which is the sub-string I want as,
char *a = strstr(str,"9|0\" ");
char *b = substr(a+5, 0, 3); // gives me the 940
Where substr is my sub string procedure. The thing is that I don't want to allocate memory for this by calling the sub string procedure.
Is there a much easier way, perhaps by doing some string manipulation without allocating memory?
I'll appreciate any feedback.
No, it can't be done. At least, not without modifying the original string and not without departing from the usual C concept of what a string is.
In C, a string is a sequence of characters terminated by a NUL (a \0 character). In order to obtain from "9|0\" 940 Hello" the substring "940", there would have to be a sequence of characters 9, 4, 0, \0 somewhere in memory. Since that sequence of characters does not exist anywhere in your original string, you would have to modify the original string.
The other option would just be to use a pointer into the original string at the place where your desired substring starts, and then also remember how long your substring is supposed to be in lieu of having the terminating \0 character. However, all C standard library functions that work on strings (and pretty much all third party C libraries that work with strings) expect strings to be NUL-terminated, and so won't accept this pointer-and-count format.
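A pointer-and-count substring could be sketched like this. The struct str_view and the function view_after are hypothetical names for this example; nothing is copied and nothing is allocated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a non-owning view into someone else's string.
   It is only valid for as long as the original string is. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Returns a view of at most len characters that follow the first
   occurrence of prefix in s. ptr is NULL if prefix is not found. */
struct str_view view_after(const char *s, const char *prefix, size_t len)
{
    struct str_view v = { NULL, 0 };
    const char *hit = strstr(s, prefix);

    if (hit != NULL) {
        v.ptr = hit + strlen(prefix);
        size_t avail = strlen(v.ptr);  /* never run past the end of s */
        v.len = len < avail ? len : avail;
    }
    return v;
}
```

The printf family can print such a view directly by passing the length as a precision: printf("%.*s\n", (int)v.len, v.ptr);. Functions that expect a null-terminated string still cannot take it.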
Try this:
char *mysubstr(char *dst, const char *src, const char *substr, size_t maxdst) {
... do substr logic, but stick result in dst respecting maxdst ...
}
Basically, punt and let the caller allocate space on the stack via:
char s[100];
Or something.
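One possible shape for such a caller-supplied-buffer function is sketched below. The name substr_into and its start/length parameters are assumptions for this example; they differ from the mysubstr signature above, whose exact semantics are left open.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies up to len characters of src, starting at
   index start, into the caller's buffer dst of dst_size bytes.
   The result is truncated if needed and always '\0'-terminated.
   Returns dst, or NULL if dst_size is 0. */
char *substr_into(char *dst, size_t dst_size,
                  const char *src, size_t start, size_t len)
{
    size_t src_len = strlen(src);

    if (dst_size == 0)
        return NULL;
    if (start > src_len)
        start = src_len;               /* nothing left to copy */
    if (len > src_len - start)
        len = src_len - start;         /* do not read past the end of src */
    if (len > dst_size - 1)
        len = dst_size - 1;            /* leave room for the '\0' */
    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return dst;
}
```

With char s[100]; on the caller's stack, substr_into(s, sizeof s, a + 5, 0, 3) would fill s with "940" for the string in the question.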
A C string is simply an array of chars in memory. If you want to access the substring without allocating a copy of the characters, you can simply access it directly:
char *b = a + 5;
The problem with this approach is that b will not be null-terminated at the appropriate length. It would essentially be a pointer to the string "940 Hello".
If that doesn't matter to the code that uses b, then you are good to go. Keep in mind, however, that this would probably surprise other programmers later on in the product lifetime (including yourself)!
As xyld suggested, you could let the caller allocate the memory and pass your substr function a buffer to fill; though, strictly speaking, that still involves "allocating memory".
Without allocating any memory at all, the only way you'd be able to do this would be by modifying the original string by changing the character after the substring to a '\0', but of course then your function couldn't take a const char * anymore, and you're modifying the original string, which may not be desirable.
If you don't require a \0 terminated string you can make a substring finding function that just tells you where in the full string (haystack) your partial string (needle) is. This would be considered a hot-copy or alias as the data could be changed by changes to the full string (haystack).
I was writing up a long thing on how to allocate memory using alloca and implement a macro (because it wouldn't work as a function) that would do what you want, but just happened to run across strndupa which is like strndup except allocates the memory on the stack rather than from the heap. It's a GNU extension, so it might not be available for you.
Writing your own macro is awkward, because it has to look like a function and return a value while also working on the caller's stack memory, but it is possible.

Resources