Can I store NULL in a string? - c

I want to perform some lengthy operations on a large file, which will involve lots and lots of seeking. (The current version of the program takes 5 hours and uses fseek at least 15,057,456 times.) As a result, I am hoping to load the file into RAM and use char* instead of FILE*. Can I load null characters from the file into the char* array if I:
Malloc the char array, and store its length separately, and
Only use character operations on the array (i.e. newchar = *(pointertothearray+offset) ), avoiding operations like strcpy or strstr?

You can load the whole file into a dynamic char array (malloc'ed on the heap) even if there are null characters in it: a null character is a valid char.
But you cannot call it a string. By the language specification, a C string is a null-terminated char array.
So as long as you only use offsets and mem... functions, and no str... functions, there is no problem having null characters in a char array.

You can load the entire file's contents into memory. Essentially this buffer will be a byte stream and not a string.
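As a rough sketch of the approach described above: the helper names load_file and count_zero_bytes are made up for illustration, and the code assumes a regular, seekable file that fits in memory and whose size ftell reports correctly in binary mode. The length is kept separately from the buffer, and only a mem... function is used on the data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'ed
 * buffer and stores its size in *len_out. Returns NULL on failure.
 * The caller frees the buffer. */
unsigned char *load_file(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */
    if (fp == NULL)
        return NULL;

    /* Assumption: a regular, seekable file whose size fits in a long. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    /* Allocate at least one byte so an empty file does not give malloc(0). */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }

    *len_out = got;                 /* the length lives outside the buffer */
    return buf;
}

/* Counts zero bytes with memchr, which takes an explicit length and so
 * does not stop at the first '\0' the way the str... functions do. */
size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    const unsigned char *p = buf;
    const unsigned char *end = buf + len;

    while (p < end && (p = memchr(p, 0, (size_t)(end - p))) != NULL) {
        count++;
        p++;                        /* continue after the zero byte found */
    }
    return count;
}
```

Individual bytes can then be read as buf[offset] for any offset below the stored length, whether or not earlier bytes are zero.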

Related

Properties of strcpy()

I have a global definition as following:
#define globalstring "example1"
typedef struct
{
char key[100];
char trail[10][100];
bson_value_t value;
} ObjectInfo;
typedef struct
{
ObjectInfo CurrentOrderInfoSet[5];
} DataPackage;
DataPackage GlobalDataPackage[10];
And I would like to use the strcpy() function in some of my functions as following:
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, "example2");
char string[100] = "example3";
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string);
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, globalstring);
First question: Are the global defined strings all initiated with 100 times '\0'?
Second question: I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is or does it fully delete any content of the destination string prior to that?
Third question: All my strings are fixed length of 100. If I use the 3 examples of strcpy() above, with my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it? Meaning do I run into problems when using functions like strlen(), printf() later?
Fourth question: What happens when I strcpy() empty strings?
I plan to overwrite these strings in loops various times and would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Thx.
Are the global defined strings all initiated with 100 times '\0'?
Yes. Global char arrays have static storage duration, so they are initialized to all zeros.
I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is ...?
Exactly. It copies the characters up until and including '\0' and does not care about the rest.
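To make that concrete, here is a small sketch (the function name byte_after_two_copies is invented for this example). It copies a long string and then a shorter one into the same 100-byte buffer and lets you look at any byte afterwards.

```c
#include <assert.h>
#include <string.h>

/* Copies `first` and then `second` into a 100-byte buffer and returns the
 * byte found at `index` afterwards, so the part of the first string that
 * the second copy did not touch can be inspected.
 * Both strings must be at most 99 characters long. */
char byte_after_two_copies(const char *first, const char *second, size_t index)
{
    char key[100] = {0};            /* zero-filled, like a global array */

    assert(strlen(first) < sizeof key && strlen(second) < sizeof key);
    assert(index < sizeof key);

    strcpy(key, first);
    strcpy(key, second);            /* writes only strlen(second) + 1 bytes */
    return key[index];
}
```

With first = "example3" and second = "ab", index 2 holds the new terminator, while index 3 still holds the 'm' left over from "example3". Functions such as strlen() and printf("%s") stop at index 2, so the leftover bytes are harmless.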
If I use ... my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it?
Yes, but NULL is a null pointer constant; the string is terminated with a zero byte, sometimes called NUL. You might want to see "What is the difference between NUL and NULL?".
Meaning do I run into problems when using functions like strlen(), printf() later?
Not if your string lengths are less than or equal to 99.
What happens when I strcpy() empty strings?
It just copies one zero byte.
would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Safety is a broad concept. If safety means whether the program will execute properly, there is no point in caring about anything after the zero byte, so just strcpy it.
But you should check that your strings are at most 99 characters long and decide what to do if they are longer. You might be interested in strnlen, but the interface is confusing; I recommend using memcpy and explicitly setting the zero byte yourself.
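A minimal sketch of that check-then-memcpy idea follows. The name copy_checked and its return convention are assumptions made for this example; rejecting an over-long source is only one possible policy (truncating would be another).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the C string `src` into `dst`, which has room
 * for `cap` bytes. Returns 0 on success, or -1 (leaving dst untouched) if
 * src plus its terminating zero byte does not fit. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= cap)
        return -1;                  /* no room for the characters plus '\0' */

    memcpy(dst, src, len);
    dst[len] = '\0';                /* set the zero byte explicitly */
    return 0;
}
```

For the structs in the question, a call could look like copy_checked(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, sizeof GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string).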

Will assigning a large value for length of char string be an issue?

I am reading a line from a file and I do not know the length it is going to be. I know there are ways to do this with pointers but I am specifically asking for just a plain char string. For example, if I initialize the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
If the string in the file is larger than 300 bytes and you try to "stick" that string in that buffer without accounting for the max length of your array, you will get undefined behaviour because you write past the end of the array.
If you are just asking whether 300 bytes is too much to allocate, then no, it is not a big deal unless you are on some very restricted device. E.g. in Visual Studio the default stack size (where that array would be stored) is 1 MB, if I am not wrong. The benefit of doing so is understandable: you don't need to concern yourself with freeing it, etc.
PS. So if you are sure the buffer size you specify is enough, this can be a fine approach, as you free yourself from the memory management issues that you get with pointers and dynamic memory.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate a buffer large enough to hold a line of the maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it.
2) If there are no limitations on the length of the input line, the only thing you can do to ensure your program's robustness is to allocate buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when buffer exhaustion is detected (this is how the getline function works, although it is not specified in the C standard, only in POSIX.1-2008).
In either case, never use gets to read the line. Call fgets instead.
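Option 2 could be sketched roughly as follows, using only standard C (fgets plus realloc) rather than POSIX getline. The name read_line_dynamic, the starting capacity of 64 and the doubling strategy are choices made for this example. It also assumes text input without embedded zero bytes, since it relies on strlen.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp into a
 * malloc'ed buffer, growing it with realloc as needed. The trailing '\n'
 * is kept if present. Returns NULL at end-of-file, on a read error before
 * any character was read, or when memory runs out. The caller frees it. */
char *read_line_dynamic(FILE *fp)
{
    assert(fp != NULL);

    size_t cap = 64;                /* arbitrary starting capacity */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    for (;;) {
        /* fgets takes an int, so clamp the free space offered to it. */
        size_t room = cap - len;
        int chunk = room > INT_MAX ? INT_MAX : (int)room;

        if (fgets(line + len, chunk, fp) == NULL)
            break;                  /* end-of-file or read error */

        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;            /* a complete line was read */

        if (len + 1 >= cap) {       /* buffer is full: double it */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) { free(line); return NULL; }
            line = bigger;
            cap *= 2;
        }
    }

    if (len == 0) { free(line); return NULL; }
    return line;                    /* last line, not newline-terminated */
}
```

On POSIX systems getline already does this bookkeeping for you, so the sketch is mainly useful where only the C standard library is available.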
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means that this will read 299 characters from the file at most. This will cause only a logical error (because you might not get all the data you need) that won't cause any undefined behavior.
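If you want to notice that logical error instead of silently continuing with half a line, one possibility is to check whether fgets stored a newline. The sketch below does that; the name read_line_fixed, the truncated flag and the decision to discard the rest of an over-long line are all assumptions of this example, not something fgets does for you.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (cap bytes) and strips the
 * newline. Returns 1 if a line was read, 0 at end-of-file or on error.
 * *truncated is set to 1 when the line did not fit; in that case the rest
 * of the line is read and thrown away so the next call starts cleanly. */
int read_line_fixed(FILE *fp, char *buf, int cap, int *truncated)
{
    assert(fp != NULL && buf != NULL && truncated != NULL && cap >= 2);

    *truncated = 0;
    if (fgets(buf, cap, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* whole line fitted: drop the newline */
        return 1;
    }

    /* No newline stored: either the file ended, or the line was too long.
     * Consume what is left of the line and remember if anything was lost. */
    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n')
        *truncated = 1;
    return 1;
}
```

A caller could then use char string[300]; int cut; and treat cut != 0 after read_line_fixed(fp, string, sizeof string, &cut) as "input line too long".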
But, if you do:
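char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
//Error checking omitted
do {
    string[i] = fgetc(fp); //no bounds check: i can run past index 299
    i++;
} while (string[i - 1] != '\n'); //test the character that was just stored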
This is undefined behavior and will typically cause a segmentation fault on lines that do not fit in the 300-byte array, because the loop writes past the end of it.
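For comparison, a bounded version of the same character-by-character loop might look like the sketch below (read_line_fgetc is a name invented here). It stops one byte before the end of the array to leave room for the terminator, and it keeps the result of fgetc in an int so that EOF can be told apart from a real character.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads characters up to and including '\n' into buf,
 * storing at most cap - 1 of them followed by a terminating zero byte.
 * Returns the number of characters stored (0 at end-of-file). */
size_t read_line_fgetc(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);

    size_t i = 0;
    int c;                          /* int, so EOF differs from any char */

    while (i + 1 < cap && (c = fgetc(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\n')
            break;                  /* end of line reached */
    }
    buf[i] = '\0';
    return i;
}
```

A line longer than the buffer is simply returned in pieces over several calls, much like fgets does.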

How to create array of fixed-length "strings" in C?

I am trying to create an array of fixed-length "strings" in C, but have been having a little trouble. The problem I am having is that I am getting a segmentation fault.
Here is the objective of my program: I would like to set the array's strings by index using data read from a text file. Here is the gist of my current code (I apologize that I couldn't add my entire code, but it is quite lengthy, and would likely just cause confusion):
//"n" is set at run time, and 256 is the length I would like the individual strings to be
char (*stringArray[n])[256];
char currentString[256];
//"inputFile" is a pointer to a FILE object (a .txt file)
fread(&currentString, 256, 1, inputFile);
//I would like to set the string at index 0 to the data that was just read in from the inputFile
strcpy(stringArray[i], &currentString);
Note that if your string can be 256 characters long, you need its container to be 257 bytes long, in order to add the final \0 null character.
typedef char FixedLengthString[257];
FixedLengthString stringArray[N];
FixedLengthString currentString;
The rest of the code should behave the same, although some casting might be necessary to please functions expecting char* or const char* instead of FixedLengthString (which can be considered a different type depending on compiler flags).
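Since n is only known at run time in the question, one way to combine the typedef with that is to allocate the array on the heap. The sketch below is only an illustration: read_records and RECORD_LEN are names made up here, and it assumes each record is a raw block of up to 256 bytes read with fread, which never adds a terminator itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define RECORD_LEN 256              /* characters per string, as in the question */

typedef char FixedLengthString[RECORD_LEN + 1];   /* +1 for the final '\0' */

/* Hypothetical helper: reads up to n records of RECORD_LEN bytes from fp
 * into a heap-allocated array of n strings. Stores the number of records
 * actually read in *count_out. Returns NULL if the allocation fails.
 * The caller frees the array with a single free(). */
FixedLengthString *read_records(FILE *fp, size_t n, size_t *count_out)
{
    assert(fp != NULL && count_out != NULL && n > 0);

    *count_out = 0;
    FixedLengthString *records = calloc(n, sizeof *records);
    if (records == NULL)
        return NULL;

    size_t i;
    for (i = 0; i < n; i++) {
        size_t got = fread(records[i], 1, RECORD_LEN, fp);
        if (got == 0)
            break;                  /* nothing more to read */
        records[i][got] = '\0';     /* fread does not terminate the data */
    }
    *count_out = i;
    return records;
}
```

Here records[i] decays to char *, so it can be handed to functions such as printf("%s") or strlen directly.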

What does write() write if null terminator is already reached?

For write(fd[1], string, size) - what would happen if string is shorter than size?
I looked up the man page but it doesn't clearly specify that situation. I know that for read, it would simply stop there and read whatever string is, but it's certainly not the case for write. So what is write doing? The return value is still size so is it appending null terminator? Why doesn't it just stop like read.
When you call write(), the system assumes you are writing generic data to some file - it doesn't care that you have a string. A null-terminated string is seen as a bunch of non-zero bytes followed by a zero byte - the system will keep writing out until it's written size bytes.
Thus, specifying a size which is longer than your string could be dangerous. It's likely that the system reads data beyond the end of the string and writes it out to your file, which then ends up containing garbage data.
write will write size bytes of data starting at string. If you define string to be an array shorter than size, it will have undefined behaviour. But in your previous question the char *line = "apple"; contains 6 characters (i.e. a, p, p, l, e and the null character).
So it is best to call write with size set to the correct value.
write(int fildes, const void *buf, size_t nbyte) does not write null terminated strings. It writes the content of a buffer. If there are any null characters in the buffer they will be written as well.
read(int fildes, void *buf, size_t nbyte) also pays no attention to null characters. It reads a number of bytes into the given buffer, up to a maximum of nbyte. It does not add any null terminating bytes.
These are low level routines, designed for reading and writing arbitrary data.
The write call outputs a buffer of the given size. It does not attempt to interpret the data in the buffer. That is, you give it a pointer to a memory location and a number of bytes to write (the length) then, as long as those memory locations exist in a legal portion of your program's data, it will copy those bytes to the output file descriptor.
Unlike the string manipulation routines, write (and read, for that matter) gives no special meaning to null bytes, that is, bytes with the value zero. read does pay attention to end-of-file and, on certain devices, will only read the amount of data available at the time, perhaps returning less data than requested, but both operate on raw bytes without interpreting them as "strings".
If you attempt to write more data than the buffer contains, it may or may not appear to work, depending on what lies after the buffer in memory. The behavior is undefined: at best you write garbage, at worst you get a segmentation fault and your program crashes.
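Putting the answers above together, a small wrapper that passes the string's real length to write could look like this. It is a sketch for POSIX systems; write_string is a name chosen for the example, it loops because write may transfer fewer bytes than requested, and it does not retry after an interrupted call (EINTR).

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly the characters of the C string s
 * (without its terminating zero byte) to the file descriptor fd.
 * Returns 0 on success, -1 if write reports an error. */
int write_string(int fd, const char *s)
{
    assert(s != NULL);

    size_t left = strlen(s);        /* the length comes from the string */
    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n < 0)
            return -1;
        s += n;                     /* skip what was already written */
        left -= (size_t)n;
    }
    return 0;
}
```

With char *line = "apple"; this writes 5 bytes. If the reader on the other end needs the terminator as well, pass strlen(line) + 1 to write yourself.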

Scanning a file and allocating correct space to hold the file

I am currently using fscanf to get space delimited words. I establish a char[] with a fixed size to hold each of the extracted words. How would I create a char[] with the correct number of spaces to hold the correct number of characters from a word?
Thanks.
Edit: If I do a strdup on a char[1000] and the char[1000] actually only holds 3 characters, will the strdup reserve space on the heap for 1000 or 4 (for the terminating char)?
Here is a solution involving only two allocations and no realloc:
Determine the size of the file by seeking to the end and using ftell.
Allocate a block of memory this size and read the whole file into it using fread.
Count the number of words in this block.
Allocate an array of char * able to hold pointers to this many words.
Loop through the block of text again, assigning to each pointer the address of the beginning of a word, and replacing the word delimiter at the end of the word with 0 (the null character).
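The last three steps can be sketched as below, assuming the file has already been read into a buffer with one spare byte after the text. The name split_words is invented for the example, and "word delimiter" is taken to mean whatever isspace reports as whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: splits the text in buf into whitespace-separated
 * words in place. buf must hold len bytes of text plus one spare byte at
 * buf[len]. Returns a malloc'ed array of pointers into buf and stores the
 * number of words in *count_out. Returns NULL if the allocation fails. */
char **split_words(char *buf, size_t len, size_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    *count_out = 0;
    buf[len] = '\0';                /* terminates the last word */

    /* Pass 1: count the words by counting where each one starts. */
    size_t count = 0;
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        int space = isspace((unsigned char)buf[i]);
        if (!space && !in_word)
            count++;
        in_word = !space;
    }

    /* Second (and last) allocation: one pointer per word. */
    char **words = malloc((count > 0 ? count : 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Pass 2: record word starts and overwrite delimiters with '\0'. */
    size_t w = 0;
    in_word = 0;
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)buf[i])) {
            buf[i] = '\0';
            in_word = 0;
        } else if (!in_word) {
            words[w++] = &buf[i];
            in_word = 1;
        }
    }

    *count_out = count;
    return words;
}
```

The words stay valid only as long as the big buffer does, so free the pointer array and the buffer together when you are done.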
Also, a slightly philosophical matter: If you think this approach of inserting string terminators in-place and breaking up one gigantic string to use it as many small strings is ugly, hackish, etc. then you should probably forget about programming in C and use Python or some other higher-level language. The ability to do radically-more-efficient data manipulation operations like this while minimizing the potential points of failure is pretty much the only reason anyone should be using C for this kind of computation. If you want to go and allocate each word separately, you're just making life a living hell for yourself by doing it in C; other languages will happily hide this inefficiency (and abundance of possible failure points) behind friendly string operators.
There's no one-and-only way. The idea is to just allocate a string large enough to hold the largest possible string. After you've read it, you can then allocate a buffer of exactly the right size and copy it if needed.
In addition, you can also specify a width in your fscanf format string to limit the number of characters read, to ensure your buffer will never overflow.
But if you allocated a buffer of, say, 250 characters, it's hard to imagine a single word not fitting in that buffer.
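Here is a rough sketch of that combination: a width-limited fscanf into a fixed buffer, followed by a copy of exactly the right size. The name read_word_exact is made up, and the 250-byte buffer is the figure from the paragraph above. The copy takes strlen(word) + 1 bytes, which is also what strdup allocates where it is available, so a 3-character word costs 4 bytes on the heap no matter how big the source array is.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the next whitespace-delimited word from fp
 * and returns a malloc'ed copy that is exactly strlen(word) + 1 bytes.
 * Returns NULL at end-of-file or if the allocation fails. */
char *read_word_exact(FILE *fp)
{
    assert(fp != NULL);

    char word[250];
    /* The width 249 leaves room for the '\0' that %s appends. */
    if (fscanf(fp, "%249s", word) != 1)
        return NULL;

    size_t len = strlen(word);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, word, len + 1);    /* copies the terminator as well */
    return copy;
}
```

A word longer than 249 characters is not lost with this approach; it just comes back in several pieces over successive calls.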
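char *ptr;
ptr = malloc(size_of_string + 1); /* +1 for the terminating '\0' */
if (ptr == NULL) { /* handle the allocation failure */ }
/* copy the word into ptr before reading from it */
char first = ptr[0];
/* etc. */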
