What is the best way to add a '\r' to my newline sequence? I am writing a string to a file and doing various parsing on it, but according to my assignment spec a newline is considered to be '\r\n'.
Right now I only have a newline at the end. I was thinking of a for loop and/or using memmove, but I'm not sure exactly how to make it work:
for (int x = 0;x < strlen(string);x++)
{
    if (string[x] == '\n')
    {
        ..............
    }
}
The algorithm is something along the lines of:
1) Check that the last two characters in string aren't already "\r\n"; if they are, return string.
2) Check whether the last character in string is either '\r' or '\n'; if so, set a flag.
3) Allocate strlen(string) + 2 bytes to hold the new string if the flag is set, otherwise allocate strlen(string) + 3 bytes.
4) Calculate the number of bytes to copy as strlen(string) - 1 if the flag is set, otherwise strlen(string).
5) Copy that number of bytes from string to the allocated storage.
6) Append "\r\n\0" to the end of the bytes copied above.
7) Return the string just created in allocated storage.
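A minimal C sketch of those seven steps, using a hypothetical function name ensure_crlf. One deliberate deviation from step 1: when the input already ends in "\r\n", the sketch returns a freshly allocated copy instead of string itself, so the caller can always free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s that ends in "\r\n".
   A lone trailing '\r' or '\n' is replaced; a string already ending in "\r\n"
   is copied unchanged. Returns NULL if malloc fails; the caller must free(). */
char *ensure_crlf(const char *s)
{
    size_t len = strlen(s);

    /* Step 1: already ends in "\r\n"? Just return a copy. */
    if (len >= 2 && s[len - 2] == '\r' && s[len - 1] == '\n') {
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, s, len + 1);
        return copy;
    }

    /* Step 2: flag a lone trailing '\r' or '\n' that will be replaced. */
    int flag = len >= 1 && (s[len - 1] == '\r' || s[len - 1] == '\n');

    /* Steps 3-4: bytes to keep, and room for '\r', '\n' and '\0'
       (len + 2 when the flag is set, len + 3 otherwise). */
    size_t keep = flag ? len - 1 : len;
    char *out = malloc(keep + 3);
    if (out == NULL)
        return NULL;

    /* Steps 5-6: copy the kept bytes, then append "\r\n" and its '\0'. */
    memcpy(out, s, keep);
    memcpy(out + keep, "\r\n", 3);
    return out;   /* Step 7 */
}
```

For example, ensure_crlf("abc\n") and ensure_crlf("abc") both yield "abc\r\n".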
EDIT: Your mileage may vary if you can make assumptions about the state of string. If it resides in a buffer large enough to add characters to the end of it, no new allocation would be required. If you are using a character type wider than char (such as wchar_t; char itself is always one byte in C), you would need to adjust the byte counts accordingly.
I'm making a simple program in C that checks the length of a char array and, if it's less than 8, fills a new array with zeroes and adds it to the former array. Here comes the problem: I don't know why the last values printed are some strange symbols.
char* hexadecimalno = decToHex(decimal,hexadecimal);
printf("Hexadecimal: %s\n", hexadecimalno);
char zeroes [8 - strlen(hexadecimalno)];
if(strlen(hexadecimalno) < 8){
    for(i = 0; i < (8-strlen(hexadecimalno)); i++){
        zeroes[i]='0';
    }
}
printf("zeroes: %s\n",zeroes);
strcat(zeroes,hexadecimalno);
printf("zeroes: %s\n",zeroes);
In C, strings (which are, as you are aware, arrays of characters) do not have any special metadata that tells you their length. Instead, the convention is that the string stops at the first character whose char value is 0. This is called "null-termination". The way your code is initializing zeroes does not put any null character at the end of the array. (Do not confuse the '0' characters you are putting in with NUL characters -- they have char value 48, not 0.)
All of the string manipulation functions assume this convention, so when you call strcat, it is looking for that 0 character to decide the point at which to start adding the hexadecimal values.
C also does not automatically allocate memory for you. It assumes you know exactly what you are doing. Your code uses a C99 feature (a variable-length array) to allocate an array zeroes with exactly as many elements as the '0' characters you need to prepend. You aren't allocating an extra byte for a terminating NUL character, and strcat also assumes that you have allocated space for the contents of hexadecimalno, which you have not. In C, this does not trigger a bounds-check error. It just writes over memory that you shouldn't write over. So, you need to be very careful to allocate enough memory, and to write only to memory you have actually allocated.
In this case, you want hexadecimalno to always be 8 digits long, left-padding it with zeroes. That means you need an array with 8 char values, plus one for the NUL terminator. So, zeroes needs to be a char[9].
After your loop that sets zeroes[i] = '0' for the correct number of zeroes, you need to set the next element to char value 0. The fact that you are zero-padding confuses things, but again, remember that '0' and 0 are two different things.
Provided you allocate enough space (at least 9 characters, assuming that hexadecimalno will never be longer than 8 characters), and then that you null terminate the array when putting the zeroes into it for padding, you should get the expected result.
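A minimal sketch of that fix, wrapped in a hypothetical helper pad_hex8 that writes into a caller-supplied char[9]. Like the answer, it assumes hex is never longer than 8 characters.

```c
#include <string.h>

/* Hypothetical helper: left-pads hex with '0' characters to 8 digits.
   Assumes strlen(hex) <= 8; out must have room for 9 chars. */
void pad_hex8(const char *hex, char out[9])
{
    size_t len = strlen(hex);
    size_t i;

    for (i = 0; i < 8 - len; i++)
        out[i] = '0';      /* the character '0' (value 48) */
    out[i] = '\0';         /* the NUL terminator (value 0) that strcat looks for */

    strcat(out, hex);      /* 8 digits in total, plus the terminator: fits in 9 */
}
```

In the question's code this would be used as char padded[9]; pad_hex8(hexadecimalno, padded);.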
I am supposed to load a list of names from a file, then find those names in a second file and load them into a structure with some other data (for simplicity, I will load them into another array called "test").
The first part is just fine: I am opening a file and loading all the names into a two-dimensional array called namesArr.
The second part is where unexpected characters occur, and I can't understand why. Here is the code of the function:
void loadStructure(void){
    char line[MAX_PL_LENGTH], *found;
    int i, j=0;
    char test[20][20];
    FILE *plotPtr=fopen(PLOT_FILE_PATH, "r");
    if (plotPtr==NULL){perror("Error 05:\nError opening a file in loadStructure function. Check the file path"); exit(-5);}
    while(fgets(line, MAX_PL_LENGTH, plotPtr)!=NULL){ // This will load each line from a file to an array "line" until it reaches the end of file.
        for(i=0; i<numOfNames; i++){ // Looping through the "namesArr" array, which contains the list of 20 character names.
            if((found=strstr(line, namesArr[i]))!=NULL){ // I use strstr() to find if any of those names appear in the particular line.
                printf("** %s", found); // Used for debugging.
                strncpy(test[j], found, strlen(namesArr[i])); j++; // Copying the newly found name to test[j] (copying only the name, by defining its length, which is calculated by strlen function).
            }
        }
    }
    fclose(plotPtr);
    printf("%s\n", test[0]);
    printf("%s\n", test[1]);
    printf("%s\n", test[2]);
}
This is the output I get:
...20 names were loaded from the "../Les-Mis-Names-20.txt".
** Leblanc, casting
** Fabantou seems to me to be better," went on M. Leblanc, casting
** Jondrette woman, as she stood
Leblanct╕&q
Fabantou
Jondretteⁿ └
Process returned 0 (0x0) execution time : 0.005 s
Press any key to continue.
The question is, why am I getting characters like "╕&q" and "ⁿ └" in the newly created array? And also, is there any other more efficient way to achieve what I am trying to do?
The problem is that strncpy does not store a null in the target array if the source string is at least as long as the length specified (as is always the case here). So whatever garbage happened to be in the test array remains there.
You can fix this specific problem by zeroing the test array, either when you declare it:
char test[20][20] = { { 0 } };
or as you use it:
memset(test[j], 0, 20);
strncpy(test[j], found, strlen(namesArr[i]));
but in general, it is best to avoid strncpy for this reason.
The length limitation for strncpy should be based on the target size, not the source length: that's the point of using it over strcpy, which uses only the source length. In your code
strncpy(test[j], found, strlen(namesArr[i]));
the length parameter is from the source array, which defeats the purpose of using strncpy. In addition, the nul terminator will not be present if the function copies the full limit of bytes, so the code should be
strncpy(test[j], found, 19); // limit to target size, leaving room for terminator
test[j][19] = '\0'; // add terminator (if copy did not complete)
Whether you loaded namesArr[] from file correctly is another potential issue, since you do not show the code.
Edited:
Slight modification to a previous answer:
1) Since you are working with C strings, make sure that you null-terminate the buffer, since strncpy(...) does not always do it for you.
2) When using strncpy the length argument should represent the target string byte capacity - 1 (space for null terminator), not the source string length.
...
size_t len = strlen(found);
memset(test[j], 0, 20);
strncpy(test[j], found, 19); // maximum length (19) is the size of the target string test[j] minus 1
if(len > 19) len = 19;       // in case found is longer than the target string
test[j][len] = '\0';         // the terminator goes at index len, never past index 19
...
In addition to what Chris Dodd said, here is a quote from man strncpy:
The strncpy() function is similar [to the strcpy() function], except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
Since the size argument in your strncpy call is the length of the name, the first n bytes copied do not include a null byte, so your destination string is not null-terminated by this call.
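Combining the points from these answers: copy exactly the matched name (its length is known from namesArr[i]), but never more than fits in test[j], and always write the terminator. copy_name and NAME_BUF are hypothetical names, with NAME_BUF assuming the question's test[20][20] layout. Because the length is known up front, this sketch uses memcpy instead of strncpy.

```c
#include <string.h>

#define NAME_BUF 20   /* assumed to match the inner size of test[20][20] */

/* Hypothetical helper: copies the first name_len characters of found
   (the matched name) into dst, truncating to NAME_BUF - 1 characters,
   and always NUL-terminates. */
void copy_name(char dst[NAME_BUF], const char *found, size_t name_len)
{
    if (name_len > NAME_BUF - 1)
        name_len = NAME_BUF - 1;   /* leave room for the terminator */
    memcpy(dst, found, name_len);
    dst[name_len] = '\0';
}
```

Inside loadStructure the call would be copy_name(test[j], found, strlen(namesArr[i])); j++;. It is also worth checking that j < 20 before writing to test[j].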
I'd like to write a function like this:
int validate_file_name(char *filename)
{
    //...
}
which will:
return 1 if there was no \0 character in the filename,
0 otherwise.
I thought it might be achieved using a simple for(size_t i = 0; i < strlen(filename); i++), but I don't know how to determine how many characters I have to check.
I can't use strlen() because it will terminate on the first occurrence of a \0 character.
How should I approach this problem?
Clarification:
I am trying to apply these guidelines to a filename I receive. If you should avoid putting a \0 in a filename, how could you validate this if you've got no size parameter?
Moreover, there are strings with multiple \0 characters, like here: http://www.gnu.org/software/libc/manual/html_mono/libc.html#Argz-and-Envz-Vectors. Still, I had no idea that it is impossible to determine their length if it is not explicitly provided.
Conclusion:
There is no way to determine the length of a string that is not null-terminated, unless you already know the length, of course, or you resort to some dirty hacks (see "Checking if a pointer is allocated memory or not").
You are trying to solve a problem that does not need to be solved.
A file name is a string. In C, a "string" is by definition "a contiguous sequence of characters terminated by and including the first null character".
It is impossible to have a string or a file name with a null character embedded in it.
It's possible to have a sequence of characters with an embedded null character. For example:
char buf[] = "foo\0bar.txt";
buf is an array of 12 characters; the characters at positions 3 and 11 are both null characters. If you treat buf as a string, for example by calling
fopen(buf, "r")
it will be treated as a string with a length of 3 (the length of a string does not include the terminating null character).
If you're working with character arrays that may or may not contain strings, then it makes sense to do what you're asking. You would need to keep track of the size of the buffer separately from the address of the initial character, either by passing an additional argument or by wrapping the pointer and the length in a structure.
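A minimal sketch of the first option (passing the size as an extra argument). validate_buffer is a hypothetical name; it relies on the standard memchr, which, unlike strlen, stops after a given number of bytes instead of at the first '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if none of the first size bytes of buf
   is '\0', 0 otherwise. The caller must supply size, because the array's
   length cannot be recovered from the pointer alone. */
int validate_buffer(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) == NULL;
}
```

With the buf example above, validate_buffer(buf, sizeof buf - 1) returns 0 because of the null character at position 3.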
But if you're dealing with file names, it's almost certainly best just to deal with strings and assume that whatever char* value is passed to your function points to a valid string. If it doesn't (if there is no null character anywhere in the array), that's the caller's fault, and not something you can reasonably check.
(Incidentally, Unix/Linux file systems explicitly forbid null characters in file names. The / character is also forbidden, because it's used as a directory name delimiter. Windows file systems have even stricter rules.)
One last point: NULL is (a macro that expands to) a null pointer constant. Please don't use the term NULL to refer to the null character '\0'.
The answer is that you can't write a function that does that if you don't know the length of the string.
To determine the length of the string, strlen() searches for the '\0' character; if none is present, the behavior is undefined.
If you knew the length of the string, then
for (size_t i = 0 ; i < length ; ++i)
{
    if (string[i] == '\0')
        return 0; /* found an embedded '\0' */
}
return 1; /* no '\0' among the first length characters */
would work. If you don't know the length of the string, then the condition would have to be
for (int i = 0 ; string[i] != '\0' ; ++i)
which makes searching for the '\0' pointless, because its presence is exactly what makes all the other string-related functions work properly.
If the string is not null-terminated, what else is it terminated by? And if you don't know that, what is its length? If you know the answers to these questions, you know the answer to your question.
I haven't used C for a long time, and now I have to modify a little piece of code. There is one thing I can't understand:
char filename[20];
filename[0] = '\0';
for (j=0; j < SHA_DIGEST_LENGTH; j++){
    sprintf(filename + strlen(filename),"%02x",result[j]);
}
In the first line a string of 20 characters is declared.
In the second line the first char is set to '\0', so it is an empty string, I suppose.
In the for loop I don't understand the "sum" between filename and its length... The first parameter of sprintf should be a buffer into which the formatted string on the right is copied. What is the result of that sum? It seems to me like I'm trying to add an array and an integer...
What am I missing?
It's pointer arithmetic. strlen returns the number of characters before the NUL terminator. The result of the addition will point to this terminator. E.g. if the current string is "AA" (followed by a NUL), strlen is 2. filename + 2 points to the NUL. It will write the next hex characters (e.g. BB) over the NUL and the next character. It will then NUL-terminate it again (at filename + 4). So then you'll have "AABB" (then NUL).
It doesn't really make sense, though. It wastes a lot of time looking for those NULs; specifically, it's a quadratic algorithm. On the first iteration strlen examines 1 character, then 3, 5, 7, ..., up to 2 * SHA_DIGEST_LENGTH - 1. It could just be:
sprintf(filename + 2 * j,"%02x",result[j]);
There's another problem. A hexadecimal representation of a SHA-1 sum takes 40 characters, since a byte requires two characters. Then, you have a final NUL terminator, so there should be 41. Otherwise, there's a buffer overflow.
Why don't you declare
char filename[SHA_DIGEST_LENGTH * 2 + 1]; /* two hex digits per byte, +1 for the terminating null character */
This is because the SHA-1 digest is 20 bytes long. If you only wanted to store the raw digest, you would not need the extra memory, but since you want a hexadecimal string of the digest, you can use the declaration above.
strlen returns the length of a string up to (not including) the terminating null character.
So basically, when you do the following:
sprintf(filename + strlen(filename),"%02x",result[j]);
In the first iteration, filename receives the two-character hexadecimal representation of the first byte of the SHA-1 digest. E.g., say that is AA; now you need to move your pointer two places to write the next byte.
After the second iteration it becomes AABB.
After the 20th iteration you have the entire string AABBCC......AA (40 characters), plus the terminating '\0', which sprintf always writes and which therefore also needs room.
First iteration, when j = 0, you will write 3 chars (yes, including the '\0' terminating the string) onto the beginning of filename, since strlen() then returns 0.
Next round, strlen() returns 2, and it will continue writing after the first two chars.
Be careful for stepping outside the 20 char space allocated. Common mistake is to forget the space required for the string terminator.
EDIT: make sure that SHA_DIGEST_LENGTH is not greater than 9.
You are adding strlen(filename) only to concatenate result[j].
Each iteration appends the current result[j] to the end of filename, so each time you need to know the offset within filename where the concatenation should take place.
Replace the code with:
char filename[SHA_DIGEST_LENGTH*2+1];
for (j=0; j < SHA_DIGEST_LENGTH; j++){
    sprintf(filename + 2*j,"%02x",result[j]);
}
Faster, simpler, and the bugs are gone.
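The same linear approach as a self-contained function, in case it helps to see it outside the original loop. digest_to_hex is a hypothetical name, and it takes the digest length as a parameter instead of relying on SHA_DIGEST_LENGTH; the caller must provide at least 2 * n + 1 chars.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the n-byte digest as 2*n lowercase hex digits
   plus a '\0' into out (at least 2*n + 1 chars). Byte i lands at offset 2*i,
   so no strlen() call is needed and the loop stays linear. */
void digest_to_hex(const unsigned char *digest, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", (unsigned)digest[i]);   /* 2 digits + '\0' */
    out[2 * n] = '\0';   /* also covers n == 0, where the loop never runs */
}
```

For a SHA-1 digest this means a char[41] output buffer, matching the 40 + 1 count given above.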
I want to load a txt file into an array like file() does in PHP. I want to be able to access different lines like array[N] (which should contain the entire line N from the file); then I would need to remove each array element after using it, so the array will decrease in size until reaching 0 and the program will finish. I know how to read the file, but I have no idea how to fill a string array to be used as I described. I am using gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) to compile.
How can I achieve this?
Proposed algorithm:
1) Use fseek, ftell, fseek to seek to the end, determine the file length, and seek back to the beginning.
2) malloc a buffer big enough for the whole file plus null-termination.
3) Use fread to read the whole file into the buffer, then write a 0 byte at the end.
4) Loop through the buffer byte-by-byte and count newlines.
5) Use malloc to allocate that number + 1 char * pointers.
6) Loop through the buffer again, assigning the first pointer to point to the beginning of the buffer, and successive pointers to point to the byte after a newline. Replace the newline bytes themselves with 0 (null) bytes in the process.
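The six steps above might look roughly like this in C. split_lines is a hypothetical name, and the sketch takes an already-open FILE * rather than a path. Only '\n' is replaced, so lines ending in "\r\n" keep their '\r'.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads all of f into one NUL-terminated buffer (*bufp),
   replaces each '\n' with '\0', and returns an array of *nlines pointers into
   that buffer (newline count + 1, so a trailing newline gives a final empty
   line). Returns NULL on failure. Free with free(lines) and free(*bufp). */
char **split_lines(FILE *f, size_t *nlines, char **bufp)
{
    /* Step 1: seek to the end, get the length, seek back. */
    if (fseek(f, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)
        return NULL;

    /* Steps 2-3: one buffer for the whole file, plus a terminating 0 byte. */
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;
    size_t got = fread(buf, 1, (size_t)size, f);
    buf[got] = '\0';

    /* Step 4: count newlines. */
    size_t count = 1;
    for (size_t i = 0; i < got; i++)
        if (buf[i] == '\n')
            count++;

    /* Step 5: newline count + 1 pointers. */
    char **lines = malloc(count * sizeof *lines);
    if (lines == NULL) {
        free(buf);
        return NULL;
    }

    /* Step 6: point at each line start and terminate each line in place. */
    size_t n = 0;
    lines[n++] = buf;
    for (size_t i = 0; i < got; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            lines[n++] = &buf[i + 1];
        }
    }

    *nlines = count;
    *bufp = buf;
    return lines;
}
```

Since every line points into the single buffer, cleanup is just free(lines) followed by free(buf); individual lines cannot be freed on their own.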
One optimization: if you don't need random access to the lines (indexing them by line number), do away with the pointer array and just replace all the newlines with 0 bytes. Then s+=strlen(s)+1; advances to the next line. You'll need to add some check to make sure you don't advance past the end (or beginning if you're doing this in reverse) of the buffer.
Either way, this method is very efficient (no memory fragmentation) but has a couple of drawbacks:
You can't individually free lines; you can only free the whole buffer once you finish.
You have to overwrite the newlines. Some people prefer to have them kept in the in-memory structure.
If the file ended with a newline, the last "line" in your pointer array will be zero-length. IMO this is the sane interpretation of text files, but some people prefer considering the empty string after the last newline a non-line and considering the last proper line "incomplete" if it doesn't end with a newline.
I suggest you read your file into an array of pointers to strings which would allow you to index and delete the lines as you have specified. There are efficiency tradeoffs to consider with this approach as to whether you count the number of lines ahead of time or allocate/extend the array as you read each line. I would opt for the former.
1) Read the file, counting the number of line terminators you see (either \n or \r\n).
2) Allocate an array of char * of that size.
3) Re-read the file line by line, using malloc() to allocate a buffer for each line and storing its pointer at the next array index.
For your operations:
Indexing is just array[N]
Deleting is just freeing the buffer indexed by array[N] and setting the array[N] entry to NULL
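A rough C sketch of this count-then-copy approach plus the delete operation. read_lines, delete_line, and the MAX_LINE limit are hypothetical; the sketch assumes every line fits in MAX_LINE chars, and it counts lines with fgets rather than counting terminators by hand. The array gets one extra NULL slot at the end as a sentinel.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed maximum line length, including "\n" and '\0' */

/* Pass 1 counts lines; pass 2 re-reads the file and malloc()s one buffer per
   line. Lines longer than MAX_LINE - 1 chars would be split by fgets; this
   sketch does not handle that. Returns NULL if the array can't be allocated. */
char **read_lines(FILE *f, size_t *nlines)
{
    char buf[MAX_LINE];
    size_t count = 0;

    rewind(f);
    while (fgets(buf, sizeof buf, f) != NULL)   /* pass 1: count lines */
        count++;
    rewind(f);

    char **lines = malloc((count + 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t n = 0;
    while (n < count && fgets(buf, sizeof buf, f) != NULL) {   /* pass 2 */
        buf[strcspn(buf, "\r\n")] = '\0';       /* drop "\n" or "\r\n" */
        size_t len = strlen(buf);
        lines[n] = malloc(len + 1);
        if (lines[n] == NULL)
            break;                              /* out of memory: keep what we have */
        memcpy(lines[n], buf, len + 1);
        n++;
    }
    lines[n] = NULL;                            /* sentinel after the last line */
    *nlines = n;
    return lines;
}

/* "Deleting" line i: free its buffer and mark the slot as empty. */
void delete_line(char **lines, size_t i)
{
    free(lines[i]);
    lines[i] = NULL;
}
```

Because deleted slots are NULL too, iterate using the returned count rather than stopping at the first NULL.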
UPDATE:
The more memory-efficient approach suggested by @R.. and @marc-van-kempen is a good optimization over malloc()ing one line at a time: slurp the file into a single buffer and replace all the line terminators with '\0'.
Assuming you've done that and you have a big buffer as char *filebuf and the number of lines is int num_lines then you can allocate your indexing array something like this:
char **lines = malloc((num_lines + 1) * sizeof *lines); // Allocates array of pointers to strings (needs <stdlib.h>)
lines[num_lines] = NULL; // Terminate the array as another way to stop you running off the end
char *p = filebuf; // I'm assuming the first char of the file is the start of the first line
int n;
for (n = 0; n < num_lines; n++) {
    lines[n] = p;
    while (*p++ != '\0') ; // Seek past the end of this line
    if (n < num_lines - 1) {
        while (*p == '\0') p++; // Skip extra '\0's (e.g. from "\r\n") to the start of the next line; note this also skips empty lines
    }
}
With a single-buffer approach, "deleting" a line is merely a case of setting lines[n] to NULL. There is no per-line free(); the whole buffer (and the lines array) is freed once at the end.
There are two slightly different ways to achieve this: one is more memory-friendly, the other more CPU-friendly.
I. Memory-friendly
1) Open the file and get its size (use fstat() and friends) ==> size
2) Allocate a buffer of that size plus one byte for a terminator, and read the whole file into it ==> char *buf = malloc(size + 1);
3) Scan through the buffer counting the '\n' (or "\r\n" == DOS, or '\r' == old Mac) ==> N
4) Allocate an array: char *lines[N]
5) Scan through the buffer again and point lines[0] to &buf[0], scan for the first '\n' or '\r' and set it to '\0' (delimiting the string), set lines[1] to the first character after that that is not '\n' or '\r', etc.
II. CPU-friendly
1) Create a linked list structure (if you don't know how to do this or don't want to, have a look at 'glib' (not glibc!), a utility companion of GTK).
2) Open the file and start reading the lines using fgets(), malloc'ing each line as you go along.
3) Keep a linked list of lines ==> list, and count the total number of lines ==> N
4) Allocate an array: char *lines[N];
5) Go through the linked list and assign the pointer to each element to its corresponding array element.
6) Free the linked list (not its elements!)
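A sketch of the CPU-friendly variant, using a small hand-rolled list instead of glib. The struct node type, the read_lines_list name, and the MAX_LINE limit are assumptions for illustration. Nodes are pushed at the head of the list, so the array is filled from the back to restore file order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed maximum line length, including "\n" and '\0' */

struct node {           /* hypothetical list node holding one malloc()ed line */
    char *line;
    struct node *next;
};

/* One pass over f: each line is malloc()ed and pushed onto a list while
   counting. Then the pointers move into an array and only the nodes are
   freed, not the lines. Returns NULL if the array can't be allocated. */
char **read_lines_list(FILE *f, size_t *nlines)
{
    char buf[MAX_LINE];
    struct node *head = NULL;
    size_t count = 0;

    while (fgets(buf, sizeof buf, f) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';       /* drop "\n" or "\r\n" */
        struct node *nd = malloc(sizeof *nd);
        char *copy = malloc(strlen(buf) + 1);
        if (nd == NULL || copy == NULL) {       /* out of memory: stop reading */
            free(nd);
            free(copy);
            break;
        }
        strcpy(copy, buf);
        nd->line = copy;
        nd->next = head;                        /* push at head: list is reversed */
        head = nd;
        count++;
    }

    char **lines = malloc((count + 1) * sizeof *lines);
    size_t i = count;
    while (head != NULL) {                      /* walk the list, filling from the back */
        struct node *next = head->next;
        if (lines != NULL)
            lines[--i] = head->line;            /* the array takes ownership of the line */
        else
            free(head->line);                   /* no array: nothing can own it */
        free(head);                             /* free the node, not its line */
        head = next;
    }
    if (lines == NULL)
        return NULL;
    lines[count] = NULL;
    *nlines = count;
    return lines;
}
```

Each lines[k] is then an independent allocation, so it can be freed and set to NULL on its own, as the question asks.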