Is it possible to have a string ignore null terminate chars - c

I have a function that gets passed an array of chars, or a string. I use this array as a data array, and thus it has a lot of random characters in it, including NULL chars. My problem comes when I try to retrieve this data: the compiler sees the NULL char and thinks the string ends, thereby effectively throwing out all the data after that. Is there an option where I can somehow make an array that is not ended by a NULL char?

A C string and an array of char are not the same thing. The first is implemented by means of the second, with the additional convention that the string ends where the array has its first 0 element.
So what you need is just an unsigned char[something] and you'd have to keep track of the length that you want to have separately. Then also you shouldn't use strcpy or similar functions but memcpy etc.

The null ('\0') character is treated as the string terminator in C. So you need to tell the functions you call exactly how much data to read. Why don't you maintain a separate count for the size of the data and then use functions which use that size to operate on the data?

Firstly, a string in C language is not some sort of "black box" object that can somehow choose to ignore or not ignore something out of its own will. It is based on a mere raw array of chars, of which you have full unrestricted control. This means that it is really you who chooses how to process the data stored in that array of chars. Not the compiler, not the string itself, but you and only you.
Secondly, a string in C language is defined as a sequence of characters ending with zero character. This immediately means that if you attempt using string-specific functions with your array, they will always stop at zeros. If you want your data to contain embedded zeros, then you should not call it "strings" and you should not use any string-specific functions with it. So, forget about strcmp, strcpy and such. Again, it is something you are responsible for, not the compiler.
Thirdly, the functions you would use with such data would typically be functions like memcpy for copying, memcmp for comparison and so on. Anything that's missing you'll have to implement yourself. And since you no longer have any terminating characters in your data, it is your responsibility to know where the data begins and where it ends.
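As a minimal sketch of the advice above: the buffer below carries its own length and is only touched with memcpy and memcmp, so embedded zero bytes survive. The names byte_buf, byte_buf_set and byte_buf_equal are hypothetical and not from any library, and the fixed capacity of 64 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity byte buffer: the length is stored
   explicitly, so a zero byte is just another data byte. */
typedef struct {
    unsigned char data[64];
    size_t len;
} byte_buf;

/* Copy n raw bytes into the buffer; returns false if they do not fit. */
bool byte_buf_set(byte_buf *b, const void *src, size_t n)
{
    assert(b != NULL && (src != NULL || n == 0));
    if (n > sizeof b->data)
        return false;
    if (n > 0)
        memcpy(b->data, src, n);   /* memcpy does not stop at zero bytes */
    b->len = n;
    return true;
}

/* Compare two buffers byte for byte, including embedded zeros. */
bool byte_buf_equal(const byte_buf *a, const byte_buf *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

A caller would fill the buffer with something like byte_buf_set(&buf, raw, sizeof raw) and from then on rely on buf.len rather than strlen.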

Related

How to read input line at the end character or at '\n' character?

How can I read a line from the console in C without declaring a fixed-size array up front, like int buf[30];? I need to allocate it once, with exactly the required length, i.e. I would like to know the number of input characters before reading them...
Is it possible in C?
There is no way to know the number of characters available in standard input before reading them. You can, however, use getline(3), which will read until \n and then return the data in a dynamically-allocated buffer holding the data (along with the size of that buffer). You must free the buffer when you're done with it.
You should be aware that this routine will block until it reads a newline. It's also difficult to use this routine safely, as malformed inputs are not handled well. (What if the input has no newline?) This is one of the reasons many applications read a fixed-length input.
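A minimal sketch of how getline might be wrapped, assuming a POSIX system (getline is not part of ISO C). The helper name read_line is hypothetical. It strips the trailing newline if there is one and reports the length, so a final line without a newline is still returned.

```c
#define _POSIX_C_SOURCE 200809L  /* getline is POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read one line from `in` into a malloc'ed buffer.
   Returns NULL at end of input or on error; the caller must free(). */
char *read_line(FILE *in, size_t *len_out)
{
    assert(in != NULL && len_out != NULL);
    char *line = NULL;   /* getline allocates when given NULL and capacity 0 */
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, in);
    if (n < 0) {
        free(line);      /* getline may have allocated even on failure */
        return NULL;
    }
    if (n > 0 && line[n - 1] == '\n')
        line[--n] = '\0';   /* strip the newline, if the input had one */
    *len_out = (size_t)n;
    return line;
}
```

Typical use would be char *s = read_line(stdin, &len); followed by free(s) once the line has been processed.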
I suspect that what you are requesting is about dynamic memory.
With dynamic memory we can create arrays with dynamic capacity, so the number of slots inside can be varied at run time. That way, you don't need to decide the size of a particular array when writing the code.
To generate that kind of dynamic array you will need to create a pointer referring to a space in memory.
int *array;
Once we have that connection between memory and a variable, we need to set how much memory we want (how many slots inside the array).
array = (int *)malloc(sizeof(int) * numberOfSlots);
The function malloc is part of the C standard library and is declared in the header stdlib.h.
It asks the system for a block of memory. The size of that block is given inside the parentheses (): there you set the number of bytes you want to request. It returns NULL if the request cannot be satisfied.
If what we want is an array of integers, we multiply the size of an integer with the slots we need.
To access or modify data inside an array, you can keep it simple by using [], like this:
array[0] = 1;
Important note: Never access or modify data inside an array without requesting memory before!
To read the number of chars in a line, you can simply use a loop and read character by character until you find the '\n' character.
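The loop described above could look roughly like this in standard C. It is only a sketch: the name read_line_dynamic, the starting capacity of 16 and the doubling strategy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read characters until '\n' or end of input, growing the buffer as needed.
   Returns a malloc'ed, null-terminated string (without the newline), or
   NULL on allocation failure or if the input is already exhausted. */
char *read_line_dynamic(FILE *in)
{
    assert(in != NULL);
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep one slot for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The caller owns the returned buffer and must free() it.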

Difference between methods to create a character array

I am curious about the different methods to create a character array in C. Let's say we want to create a character array holding the string "John Smith". We could either initialize the array by supplying the number of elements explicitly, i.e.
char entireName[11] = "John Smith";
where there are four spaces for characters J-o-h-n, one for the space, five for S-m-i-t-h, and one for the string terminator \0.
You could also do the above by simply typing
char entireName[] = "John Smith";
Will there be a large difference in how these two character arrays are compiled? Is the same amount of memory allocated for the two expressions, and are they executed at the same speed?
What really is the difference?
Both are same, but the second one is advisable.
In case you're leaving out the size of the array during definition and initialization, the compiler will allocate proper size required. This is less error prone, compared to the definition with a fixed size as sometimes
we may forget to reserve the space for null-terminator \0.
we may supply an initializer string more than that of the size specified.
The fact remains that, with proper warnings enabled, you'll get a warning if you do the above, but with the second approach these scenarios will not arise, so there is less to worry about.
EDIT:
FWIW, in the second scenario, the array length will be decided based on the supplied initializer string length. As we know, arrays whose size is fixed at compile time cannot be resized at runtime, so that's the only possible limitation of the second approach. If, at a later point, you want the array to hold something bigger than the supplied initializer string, the second approach is not suitable.
The two versions are basically identical as given in the question.
However, as the array is not const, you apparently intend to change it, so the string literal is just to initialize it. In this case, giving the maximum size for the array should strongly be considered.
The size allocated is the same for both cases of this example (the compiler calculates the size from the string literal and appends '\0').
However, if you intend to store a longer string into the array later, doing so with the version char entireName[] = "John Smith"; will result in undefined behaviour (UB: anything can happen). This is because the compiler only allocates the size required by the string literal (plus '\0') and does not know you need more during execution. In these cases, always use the form with an explicit size.
Warning: If the length of the string literal exactly matches the given size of the array, you might not be warned (tested with gcc 4.8.2 -Wall -Wextra: no warning) that the implicit '\0' cannot be stored. So use that with caution! I suspect there are legacy reasons for this being legal (it was in pre-ANSI K&R C, actually), possibly to conserve RAM or for packing. However, if the string literal as given does not fit, gcc does warn if you enable most warnings (for gcc, see above).
For a const array always use the second version, as that is easier and states even more explicitly that you want the size of the given string literal. Without being able to change the value later on, nothing is gained by giving an explicit size, but (see above) some safety is lost.
There's no difference between the two as you specify the same size that the compiler would allocate otherwise.
However, if you explicitly specify the size and it is less than the size of the string literal that you intend to copy, for example,
char entireName[8] = "John Smith";
then the initializer is too long for the array. C compilers that accept this (gcc does, with a warning) copy only the first 8 chars and discard the rest, and there won't be a 0 terminator either. This is not what you would want in most cases. For this reason, it's usually better to let the compiler work out the size.
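To make the size discussion concrete, here is a small sketch. The function names and the capacity of 32 are made up for illustration. It shows the size the compiler picks for the [] form, an explicitly larger array with room to grow, and a bounded way to store a new value later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Size chosen by the compiler: 10 characters plus the terminator. */
size_t implicit_size(void)
{
    char name[] = "John Smith";
    return sizeof name;
}

/* Explicit size with room to grow; unused elements are zero-initialized. */
size_t roomy_size(void)
{
    char name[32] = "John Smith";
    return sizeof name;
}

/* Replace the contents of a caller-owned array without overflowing it.
   Returns 1 if new_name fit completely, 0 if it had to be truncated. */
int rename_person(char *name, size_t size, const char *new_name)
{
    assert(name != NULL && size > 0 && new_name != NULL);
    int n = snprintf(name, size, "%s", new_name);
    return n >= 0 && (size_t)n < size;
}
```

With an 11-byte array, rename_person reports truncation for a longer name instead of writing past the end. With the 32-byte array the longer name fits.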

Char array size when using certain library functions

When using some library functions (e.g. strftime(), strcpy(), MultiByteToWideChar()) that deal with character arrays (instead of std::string's) one has 2 options:
use a fixed size array (e.g. char buffer[256];) which is obviously bad because of the string length limit
use new to allocate required size which is also bad when one wants to create a utility function like this:
char * fun(void)
{
char * array = new char[exact_required_size];
some_function(array);
return array;
}
because the user of such function has to delete the array.
And the 2nd option isn't even always possible if one can't know the exact array size/length before using the problematic function (when one can't predict how long a string the function will return).
The perfect way would be to use std::string since it has variable length and its destructor takes care of deallocating memory but many library functions just don't support std::string (whether they should is another question).
Ok, so what's the problem? Well - how should I use these functions? Use a fixed size array or use new and make the user of my function worry about deallocating memory? Or maybe there actually is a smooth solution I didn't think of?
You can use std::string's data() method to get a pointer to a character array with the same sequence of characters currently contained in the string object. The character pointer returned points to a constant, non-modifiable character array located somewhere in internal memory. You don't need to worry about deallocating the memory referenced by this pointer as the string object's destructor will do so automatically.
But as to your original question: depends on how you want the function to work. If you're modifying a character array that you create within the function, it sounds like you'll need to allocate memory on the heap and return a pointer to it. The user would have to deallocate the memory themselves - there are plenty of standard library functions that work this way.
Alternatively, you could force the user to pass in character pointer as a parameter, which would ensure they've already created the array and know that they will need to deallocate the memory themselves. That method is used even more often and is probably preferable.
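A sketch in plain C of the caller-supplied-buffer approach, using the snprintf convention of returning the length the full result would need. The functions format_point and format_point_alloc are hypothetical examples, not part of any library. The second one shows how the required size can be measured first when it is not known in advance.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical formatter following the snprintf convention: writes at most
   `size` bytes (always terminated when size > 0) and returns the length the
   full result needs, excluding the '\0'. */
int format_point(char *buf, size_t size, int x, int y)
{
    assert(buf != NULL || size == 0);
    return snprintf(buf, size, "(%d, %d)", x, y);
}

/* Convenience wrapper: measure, allocate, format. Caller must free(). */
char *format_point_alloc(int x, int y)
{
    int need = format_point(NULL, 0, x, y);   /* first pass only measures */
    if (need < 0)
        return NULL;
    char *buf = malloc((size_t)need + 1);
    if (buf != NULL)
        format_point(buf, (size_t)need + 1, x, y);
    return buf;
}
```

The first form leaves ownership entirely with the caller. The second is convenient but reintroduces the obligation to free the result.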

Is it more secure to add a length specifier to a printf call

My question is from a security perspective. I'm in the process of cleaning up some code and I've found the current code has a uncontrolled format string vulnerability; it's passing a string to printf() something like:
void print_it(char * str)
{
    printf(str);
}
which is obviously considered insecure programming, even gcc will typically ding you with at least some sort of warning:
warning: format not a string literal and no format arguments
Now to "fix" the issue we can make sure what you're getting is treated as a string
printf("%s", str);
But I was wondering if there's any... additional security in using a length specifier (a precision) as well. Something like:
printf("%.*s", (int)sizeof(str), str);
I can't think of any reason why that would be more secure, but it wouldn't surprise me if I was missing something obvious here.
Sort of, but not to the extent that modern C shops routinely guard their printf calls this way. A precision like this is used when you are handling non-null-terminated strings, which is very common when interacting with Fortran code.
Strictly speaking, it would be a security gain as a guard against runaway reads: if the null terminator is missing, printf could otherwise go on to print sensitive data that follows the string.
But
printf("%.*s", (int)sizeof(str), str);
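is not an improvement: the precision only caps the number of bytes printed (printf still stops at an earlier null character), and here sizeof(str) is the size of a pointer, not of the data. A precision only helps when you know the real length of a buffer that is not null-terminated, for example a string that is space-padded all the way to the end of its memory, which is likely the case if the string came from Fortran.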
This however is extremely important:
printf("%s", str);
as printf(str) is a major security flaw. Read about printf attacks using the %n specifier, which writes to memory.
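As a small illustration of the point about "%s", the sketch below formats into a buffer so the effect can be checked. The helper format_untrusted is a made-up name. The print_it shown is one possible corrected version of the function from the question.

```c
#include <assert.h>
#include <stdio.h>

/* Treat `untrusted` strictly as data: conversion specifiers inside it
   are copied literally instead of being interpreted. */
int format_untrusted(char *out, size_t size, const char *untrusted)
{
    assert(out != NULL && size > 0 && untrusted != NULL);
    return snprintf(out, size, "%s", untrusted);
}

/* One possible fixed version of print_it from the question. */
void print_it(const char *str)
{
    assert(str != NULL);
    fputs(str, stdout);   /* same output as printf("%s", str) */
}
```

Passing a string such as "%x %n %s" to either function simply reproduces those characters. No arguments are read from the stack and nothing is written through %n.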
There's some additional security in
printf("%.*s", (int)sizeof(str), str);
since it will print at most sizeof(char*) bytes - usually four or eight - so it won't go and read much of the memory if str points to a char array that is not 0-terminated.
But more typically, it will cut the output short without a good reason to do so.
If you meant
printf("%.*s", (int)strlen(str), str);
that is entirely pointless, since in the cases where a precision for the printf would be necessary, the strlen call will do the same invalid memory accesses.
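Where a precision does earn its keep is when the length is known independently of the data, for example a fixed-width, space-padded field with no terminator. A minimal sketch, with format_field as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Format a field that is NOT null-terminated, e.g. a space-padded record
   whose length is known separately. The precision caps how many bytes
   are read; an earlier null character still ends the output. */
int format_field(char *out, size_t size, const char *field, size_t field_len)
{
    assert(out != NULL && size > 0 && field != NULL);
    return snprintf(out, size, "%.*s", (int)field_len, field);
}
```

Here the length comes from the record layout rather than from strlen, so no read past the end of the field is needed to find it.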
This idea won't work when the array is passed to a function, as arrays decay into a pointer, or for any malloc'ed pointer for that matter, because sizeof(var) is going to give the size of the pointer, not the array. So it can't be used in the printf() to specify the length.
This is only applicable where the array itself, not a pointer to it, is in scope, e.g. automatic (stack allocated) arrays. So when can this be useful then? I can think of two cases:
1. when you write into array more than the size of the array.
In this case, you have already caused undefined behaviour by writing somewhere that doesn't belong to you. End of story.
2. when you don't have a null-byte at the end of the string (but not crossed the boundary of the array).
In this case, the array has some valid content but not the null byte.. Two possibilities here:
2.a. Using the length specifier is going to print the whole content of the array. So if you access uninitialized bytes in the array (even within its size), it's still going to cause undefined behaviour. Otherwise you have to track the length of the valid content in order to use %s in the printf() along with the length specifier to avoid UB. In this case, you already know the length of the valid content. Hence you can simply null-terminate it yourself rather than telling printf() to print only the valid content.
2.b. Let's say you have initialized the whole array at the beginning with zeros. In that case, the array is going to be a valid string and hence there is no need for the length specifier.
So I'd say it's not of much use.
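The decay described in this answer is easy to observe. In the sketch below (both function names are invented for the illustration), the same kind of 100-byte array yields two different sizeof results depending on whether the array or only a pointer to it is in scope.

```c
#include <assert.h>
#include <stddef.h>

/* Inside the callee, `str` is a pointer: sizeof gives the pointer size. */
size_t size_in_callee(const char *str)
{
    assert(str != NULL);
    return sizeof(str);
}

/* Where the array itself is in scope, sizeof gives the array size. */
size_t size_in_scope(void)
{
    char buf[100] = "short";
    return sizeof(buf);
}
```

This is why a function such as print_it cannot recover the buffer size from its parameter. The size has to be passed in separately.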

What do I have to pay attention to when copying strings?

Which factors do I have to pay attention to when copying strings in C? And what can go wrong?
I can think of two things: I need to reserve sufficient memory before copying a string, and I have to make sure I have permission to write to that memory (to avoid General Protection Faults). Is there anything else I have to pay attention to when copying strings? Are there any additional potential errors?
Make sure that you have sufficient buffer space in the destination (i.e. know ahead how many bytes you're copying). This also means making sure that the source string is properly terminated (by a null character, i.e. 0 = '\0').
Make sure that the destination string gets properly terminated.
If your application is character-set aware, bear in mind that some character sets can have embedded null characters, while some can have variable-length characters.
C strings are conventionally arrays of non-zero bytes ended by a zero byte. Routines dealing with them get a pointer to a byte (that is, to the array starting at that byte). You should take care that the data fits in the reserved space. Learn about strcpy and strcat and their bounded counterparts strncpy and strncat. Also about strdup. And don't forget to set the terminating byte to zero (in particular, when reaching the bound with strncpy etc.). Some libraries (e.g. GLib from GTK) provide nice utility functions (e.g. g_strdup_printf) for building strings.
Read the documentation of all the functions I mentioned.
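The strncpy caveat mentioned above can be sketched as follows. The wrapper name copy_bounded and its return convention are assumptions for the example, not a standard function.

```c
#include <assert.h>
#include <string.h>

/* Bounded copy that always terminates dst.
   Returns 1 if src fit completely, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && dst_size > 0 && src != NULL);
    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';         /* ...so terminate explicitly */
    return strlen(src) < dst_size;
}
```

Without the explicit assignment, a source of dst_size - 1 or more characters would leave dst without a terminator.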
You need to:
allocate sufficient memory for destination (strlen(source) + 1)
make sure source is not NULL
deal with unicode issues (string length might be less than length in bytes)
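The first two items of this checklist could be combined along these lines. This is a sketch only: dup_string is a hypothetical name, and on systems that provide strdup it does much the same job apart from the NULL check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed copy of src, or NULL if src is NULL or the
   allocation fails. The caller must free() the result. */
char *dup_string(const char *src)
{
    if (src == NULL)
        return NULL;
    size_t n = strlen(src) + 1;      /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);        /* copies the terminator too */
    assert(copy == NULL || copy[n - 1] == '\0');
    return copy;
}
```

Note that strlen counts bytes, so for multi-byte encodings such as UTF-8 the allocation is still correct even though the number of characters may be smaller.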
You have to pay attention to the definition of a string in C: they are a sequence of characters ended with a null terminating character ('\0').
All the functions in string.h will work with this assumption. Work with those functions and you should be fine.
