C dealing with variable-length strings

I'm new to C, taking a university course.
In one of my assignments I deal with strings. I take strings either entered by the user or parsed from a file, and then run a function on them to produce an answer (if a specific property holds).
The strings can be of variable length, but it is acceptable to assume that their maximum length is 80 characters.
I wrote the program using a
char s[81];
and then filled the same array with a different string each time.
Since the string has to be null-terminated, I just added a '\0' at index 80:
s[80] = '\0';
But then I got all kinds of weird behavior: unrelated characters at the end of the string I entered. I assumed this was because the space between the end of the 'real' characters and the '\0' character was filled with garbage(?).
So I created a function:
void clean_string(char s[], int string_size) {
    int index = 0;
    while (index < string_size) {
        s[index++] = '\0';
    }
}
What I call "cleaning" is just filling a string with zero characters. I do this every time I am done with a string and ready to accept a new one. Then I fill the string again character by character, and wherever I stop, the following character is sure to be a '\0'.
To avoid magic numbers in the code (81 each time I call clean_string), I used the following:
#define STRING_LENGTH 81
That works for me. The strings show no strange behavior. But I wondered if this is considered bad practice. Are there problems with this approach?
Just emphasizing, I'm not asking for help in the assignment itself, but tips on how to approach these kind of situations better.
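For reference, the zero-filling that clean_string performs is exactly what the standard memset() does. The sketch below is a drop-in version; taking the size as size_t instead of int is an assumption, not part of the original.

```c
#include <string.h>

#define STRING_LENGTH 81

/* Same effect as the question's clean_string(): memset() writes
   string_size zero bytes starting at s. Using size_t for the size is
   an assumption; the original took an int. */
void clean_string(char s[], size_t string_size)
{
    memset(s, '\0', string_size);
}
```

If the goal is only to start from an all-zero array, declaring it as char s[STRING_LENGTH] = ""; sets every element to '\0' once, at initialization.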

Rather than prefilling the entire array with zeros, it should be simple to just add a single zero after you've read all relevant characters.
For example:
char s[STRING_LENGTH];
int c;
int idx = 0;
while (((c = getchar()) != '\n') && (idx < STRING_LENGTH - 1) && (c != EOF)) {
    s[idx++] = c;
}
s[idx] = 0;
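The standard library can do the same job: fgets() reads at most size - 1 characters and always null-terminates what it stored, so neither pre-clearing nor a manual '\0' is needed. A sketch, where read_line is a hypothetical helper that also strips the newline that fgets keeps:

```c
#include <stdio.h>
#include <string.h>

#define STRING_LENGTH 81

/* Hypothetical helper built on fgets(): stores at most size - 1
   characters plus '\0'. Returns the length of the line without its
   '\n', or -1 on end of file or read error. */
int read_line(FILE *fp, char s[], int size)
{
    if (fgets(s, size, fp) == NULL)
        return -1;
    s[strcspn(s, "\n")] = '\0';   /* drop the newline, if present */
    return (int)strlen(s);
}
```

One design difference from the getchar() loop: if a line is longer than size - 1 characters, fgets leaves the rest in the stream, and the next call returns it.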

Related

CamelCase to snake_case in C without tolower

I want to write a function that converts CamelCase to snake_case without using tolower.
Example: helloWorld -> hello_world
This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';.
I get hell_world. I don't know how to get it to work.
void snake_case(char *string)
{
    int i = strlen(string);
    while (i != 0)
    {
        if (string[i] >= 65 && string[i] <= 90)
        {
            string[i] = string[i] + 32;
            string[i-1] = '_';
        }
        i--;
    }
}
Besides converting a character from uppercase to lowercase, this conversion requires inserting a character into the string. This is one way to do it:
iterate from left to right,
if an uppercase character is found, use memmove to shift all characters from this position to the end of the string one position to the right, and then assign the to-be-inserted value to the current character,
stop when the null terminator (\0) has been reached, indicating the end of the string.
Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.
A basic implementation may look like this:
#include <stdio.h>
#include <string.h>

void snake_case(char *string)
{
    for ( ; *string != '\0'; ++string)
    {
        /* 'A'..'Z' is 65..90 in ASCII */
        if (*string >= 'A' && *string <= 'Z')
        {
            *string += 'a' - 'A';   /* to lowercase (+32 in ASCII) */
            /* shift the rest, including '\0', one position right */
            memmove(string + 1U, string, strlen(string) + 1U);
            *string = '_';
        }
    }
}

int main(void)
{
    char string[64] = "helloWorldAbcDEFgHIj";
    snake_case(string);
    printf("%s\n", string);
    return 0;
}
Output: hello_world_abc_d_e_fg_h_ij
Note that:
The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
The latter is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:
1. First determine how long the resulting string will be, reallocate the array, and only then modify the string. Requires two passes.
2. Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
3. Same as 2, but whenever the array is too small, reallocate it to twice its current size. Requires a single pass and less frequent (but larger) reallocations. Finally, reallocate the array to the length of the string it actually contains.
In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known, and the algorithm can be split into two distinct parts: find the new length, and modify the string. I can add it to the answer on request.
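A sketch of option 1, assuming it is acceptable to return a new heap-allocated string instead of resizing the caller's array (which may not come from malloc at all). The name snake_case_alloc is hypothetical, and like the code above it assumes the letters 'A' to 'Z' are contiguous, as in ASCII.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass CamelCase -> snake_case conversion.
   Returns a newly malloc'd string (caller must free it), or NULL if
   the allocation fails. */
char *snake_case_alloc(const char *src)
{
    /* Pass 1: each uppercase letter needs one extra byte for '_'. */
    size_t len = strlen(src);
    size_t extra = 0;
    for (const char *p = src; *p != '\0'; ++p)
        if (*p >= 'A' && *p <= 'Z')
            ++extra;

    char *dst = malloc(len + extra + 1);   /* +1 for '\0' */
    if (dst == NULL)
        return NULL;

    /* Pass 2: copy, writing '_' before each lowercased uppercase letter. */
    char *out = dst;
    for (const char *p = src; *p != '\0'; ++p) {
        if (*p >= 'A' && *p <= 'Z') {
            *out++ = '_';
            *out++ = (char)(*p + ('a' - 'A'));
        } else {
            *out++ = *p;
        }
    }
    *out = '\0';
    return dst;
}
```

Because the exact size is known before writing, the result never needs a reallocation.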

Is a function that reads a line onto s and returns its length good practice?

The K&R code in section 1.9 for saving the longest line of the input has this function:
int getline(char s[], int lim)
{
    int c, i;

    for (i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';   /* terminate the string */
    return i;
}
Yet, as a best practice, I've learned that a function should do only one thing. I believe this function copies the line from its input into the char array s AND returns the length. Isn't that two things? Am I correct in assuming that this is bad practice?
To elaborate, we do use the return value of getline, but in a very non-intuitive way:
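#include <stdio.h>

#define MAXLINE 1000   /* maximum input line size */

/* Note: POSIX systems also declare a getline() in <stdio.h>; if the
   compiler reports conflicting types, rename this one (e.g. get_line). */
int getline(char s[], int lim);
void copy(char to[], char from[]);

int main(void)
{
    int len;               /* current line length */
    int max;               /* current max line length seen so far */
    char line[MAXLINE];    /* current input line */
    char longest[MAXLINE]; /* longest line saved here */

    max = 0;
    while ((len = getline(line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            copy(longest, line);
        }
    if (max > 0)           /* there was a line */
        printf("%s", longest);
    return 0;
}

/* FUNCTION GETLINE TAKEN OUT */

/* copy: copy 'from' into 'to'; assume 'to' is big enough */
void copy(char to[], char from[])
{
    int i;

    i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
}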
No, for two reasons.
First, one could argue that the purpose of getline (as the name suggests) is to read a line from input. The fact that it also returns the number of characters read is explained by the way C strings work: without the count, the function could not be used to read data containing null bytes.
Second, the function does not contain any additional code to calculate the length; the length is a byproduct of reading the string. The function would otherwise return void, so there are really no drawbacks to returning the length of the string.
Also, coding guidelines are not an end in themselves; they should help produce good code. I do not see how this code could be improved by omitting the return statement and writing a separate O(n) function to retrieve the length.
The getline() function is not doing two separate things. It's doing two very closely related things, and it's definitely not bad practice to have it return the string's length.
Do not try to fit functions, algorithms and other similar things into schemas blindly. What getline() does is conceptually correct, since a string is an object. It has a contents buffer and a length. Both "properties" belong to the string object, and in fact, I would consider it bad practice to separate them.
Also, it would be unnecessarily complicated (and inefficient) to have yet another function that computes the string length. In C, strings are 0-terminated and thus such a function has to walk the entire string in order to find its length.
(Not to mention that there already is such a function in the C standard library, it's called strlen().)
Often for strings, and in general arrays, this is not an exception but actually very useful behavior. In a sense, arrays have an implicit length property. Sometimes this is known at compile time, sometimes it is known at runtime and stored in a variable and sometimes it can only be determined by virtue of being a null-terminated array.
In any case, since one cannot return an array by value, returning the length of the array (which is often very handy to know) is a very useful property of functions that write an array into a buffer. I might even argue it is idiomatic in C when the written array's length cannot be known by the caller.
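To illustrate, here is a variant of K&R's copy() from the question (renamed copy_len, a hypothetical name) that returns the number of characters copied. The count costs nothing extra, and the caller can use it, for example, to append another string without calling strlen().

```c
#include <stddef.h>

/* Hypothetical variant of K&R's copy(): copies 'from' into 'to'
   (assumed big enough) and returns the number of characters copied,
   not counting the terminating '\0'. */
size_t copy_len(char to[], const char from[])
{
    size_t i = 0;
    while ((to[i] = from[i]) != '\0')
        ++i;
    return i;
}
```

For example, n = copy_len(buf, "abc"); n += copy_len(buf + n, "de"); leaves "abcde" in buf and its length in n.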
ADDITION:
The above answer applies to functions that do not allocate memory but only write into a provided buffer. It's sometimes useful to return a simple struct such as struct { size_t size; valtype *vals; } if knowing the allocated array size is always useful to the caller and you don't want to iterate over the array later. Drawing the parallel with your question, you can see why, in a way, getline isn't really doing two things; it's just giving you a more complete result.
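A minimal sketch of that struct-returning style, assuming a line reader that allocates its own buffer (the names struct line and read_line_alloc are hypothetical). It grows the buffer by doubling and returns the contents and the length together.

```c
#include <stdio.h>
#include <stdlib.h>

struct line {
    size_t size;   /* number of characters, excluding '\0' */
    char *vals;    /* NUL-terminated contents, or NULL on failure */
};

/* Hypothetical: reads one line of any length from fp (the '\n' is not
   stored) into a malloc'd buffer. The caller must free(result.vals). */
struct line read_line_alloc(FILE *fp)
{
    struct line result = {0, NULL};
    size_t cap = 16;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return result;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (result.size + 1 >= cap) {      /* keep room for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                result.size = 0;
                return result;             /* vals stays NULL */
            }
            buf = bigger;
            cap *= 2;
        }
        buf[result.size++] = (char)c;
    }
    buf[result.size] = '\0';
    result.vals = buf;
    return result;
}
```

The caller gets the length without a strlen() pass, and the struct keeps the two pieces of information from being separated.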

Where does C store '\0'?

The code below reads a line and returns the line length. lim is the length of the array s[].
When the input line length is lim, then s[lim] = '\0'. But the array s[] is only lim elements long, from s[0] to s[lim-1]. Will this cause a buffer overflow? I tested it many times, and the code seemed to work just fine.
int getline(char s[], int lim)
{
    int c, i;
    for (i = 0; i < lim-1 && (c = getchar()) != EOF && c != '\n'; i++)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
The '\0' is just another character. It is stored right after the last character of the string.
Often, you can "get away" with writing off the end of a buffer with no obvious harm, but don't do it. It's a bug.
I once had to debug a program that contained an error like this. The program was writing a single byte past the end of one buffer. In the debug build, there was enough extra stuff on the stack that the single byte extra caused no harm; the crash only occurred in the release build, but the debugger didn't really work since it was the non-debug build. This is an example of why it is good to test your code both in a "debug" build and in a release build (compiled the way you would give it to your users).
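Since a program that appears to work proves little, one way to check where the function actually writes is to run it against a buffer followed by a guard byte. The sketch below is the same algorithm reading from a FILE * (so it can be tested with tmpfile()); the name get_line is hypothetical and avoids a clash with POSIX's getline().

```c
#include <stdio.h>

/* K&R getline algorithm reading from fp. With lim = size of s, it
   stores at most lim - 1 characters (a '\n' included) and then '\0',
   so the highest index it ever writes is lim - 1. */
int get_line(FILE *fp, char s[], int lim)
{
    int c = EOF, i;   /* initialized so lim <= 1 never reads garbage */

    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
```

Calling it with lim = 4 on a longer line writes indices 0 to 3 only, leaving a guard byte at index 4 untouched.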
This is a good example of why an interface (its inputs and its return value) should be defined clearly:
" int getline(char s[], int lim) "
One possible definition of lim is the maximum number of characters to be copied into s[], excluding the terminating null character '\0'.
Example:
char arr[] = "hello";
getline(arr, strlen(arr));
The other definition of lim is the maximum number of characters to be copied into s[], including the terminating null character.
Example:
char arr[] = "hello";
getline(arr, sizeof(arr));
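You seem to be assuming the second definition of lim.
This is a function straight out of "The C Programming Language" by K&R, from chapter one. It works because it is correct: the loop condition i < lim-1 stores at most lim-1 characters (counting a '\n'), so the '\0' is written at index lim-1 at most.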
Consider "cat". This is a four character array {'c','a','t','\0'}. The length of the string is 3.
If s[]="cat" then s[0]='c', s[3]='\0'. Eh?
The string length returned by strlen (or similar) is the number of characters excluding the terminating '\0', that is, one less than the number of array elements the string occupies. The array is allocated to hold all 4 characters, and the '\0' is at the end of the array.
No, it won't cause a buffer overflow. The null character '\0' (not to be confused with the null pointer NULL) marks the end of a string. When you walk a string from its beginning, the position containing '\0' is never treated as valid data.
You can go over the whole array by using while (index < size) as the condition, or over the string by using while (array[position] != '\0').
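The two loop conditions behave differently when the array is larger than the string it holds. In this sketch (hypothetical helper names), the size-bounded loop also sees the zero padding after the string, while the terminator-bounded loop stops at the first '\0'.

```c
#include <stddef.h>

/* Walks exactly size elements: while (index < size). */
size_t count_in_array(const char a[], size_t size, char ch)
{
    size_t n = 0;
    for (size_t i = 0; i < size; ++i)
        if (a[i] == ch)
            ++n;
    return n;
}

/* Walks until the terminator: while (s[i] != '\0'). */
size_t count_in_string(const char s[], char ch)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; ++i)
        if (s[i] == ch)
            ++n;
    return n;
}
```

With char a[8] = "aba";, the first function counts five '\0' elements (indices 3 to 7), while the second never reaches any '\0' as data.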

C—Infinite loop, I think?

I'm having an issue with a program in C and I think that a for loop is the culprit, but I'm not certain. The function is meant to take a char[] that has already been passed through a reverse function, and write it into a different char[] with all trailing white space characters removed. That is to say, any ' ' or '\t' characters that lie between a '\n' and any other character shouldn't be part of the output.
It works perfectly if there are no trailing white space characters, as in re-writing an exact duplicate of the input char[]. However, if there are any, there is no output at all.
The program is as follows:
#include<stdio.h>
#define MAXLINE 1000

void trim(char output[], char input[], int len);
void reverse(char output[], char input[], int len);

main()
{
    int i, c;
    int len;
    char block[MAXLINE];
    char blockrev[MAXLINE];
    char blockout[MAXLINE];
    char blockprint[MAXLINE];
    i = 0;
    while ((c = getchar()) != EOF)
    {
        block[i] = c;
        ++i;
    }
    printf("%s", block); // for debugging purposes
    reverse(blockrev, block, i); // reverses block for trim function
    trim(blockout, blockrev, i);
    reverse(blockprint, blockout, i); // reverses it back to normal
    // i also have a sneaking suspicion that i don't need this many arrays?
    printf("%s", blockprint);
}

void trim(char output[], char input[], int len)
{
    int i, j;
    i = 0;
    j = 0;
    while (i <= len)
    {
        if (input[i] == ' ' || input[i] == '\t')
        {
            if (i > 0 && input[i-1] == '\n')
                for (; input[i] == ' ' || input[i] == '\t'; ++i)
                {
                }
            else
            {
                output[j] = input[i];
                ++i;
                ++j;
            }
        }
        else
        {
            output[j] = input[i];
            ++i;
            ++j;
        }
    }
}

void reverse(char output[], char input[], int len)
{
    int i;
    for (i = 0; len - i >= 0; ++i)
    {
        output[i] = input[len - i];
    }
}
I should note that this is a class assignment that doesn't allow the use of string functions, hence why it's so roundabout.
Change
for (i; input[i] == ' ' || input[i] == '\t'; ++i);
to
for (; i <= len && (input[i] == ' ' || input[i] == '\t'); ++i);
With the first version, if the whitespace is at the end of the data, the loop keeps going past the end of the array until it happens to hit a byte that is neither a space nor a tab. You may not get any error at all: C does not check array bounds, and reading past the end of an array is undefined behavior.
Edit: As Arkku pointed out in the comments, make sure your character array is still NUL-terminated (the '\0' character); you can then check for that case instead. Make sure you're not trimming the NUL character from the end either.
Declaring your main() function simply as main() is an obsolete style that should not be used. The function must be declared either as int main(void) or as int main(int argc, char *argv[]).
Your input process does not null-terminate your input. This means that what you're working with is not a "string", because a C string, by definition, is an array of char whose last element is a null character ('\0'). Instead, what you have are plain arrays of char. This wouldn't be a problem as long as you expect that, and indeed your code passes array lengths around, but you're also trying to print them with printf(), which requires C strings, not plain arrays of char.
Your reverse() function has an off-by-one error, because you aren't accounting for the fact that C arrays are zero-indexed, so what you're reversing is always one byte longer than your actual input.
What this means is that if you call reverse(output, input, 10), your code will start by assigning the value at input[10] to output[0], but input[10] is one past the end of your data, and since you didn't initialize your arrays before starting to fill them, that's an indeterminate value. In my testing, that indeterminate value happens, coincidentally, to have zero values much of the time, which means that output[0] gets filled with a null ('\0').
You need to subtract one more from the index into the input than you currently do. To compensate, the loop-termination condition in reverse() also has to change: it should be len - i > 0, not len - i >= 0.
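A minimal sketch of reverse() with both changes applied (index len - 1 - i, condition len - i > 0). Apart from marking input as const, it keeps the question's signature, and like the original it does not write a '\0'; the caller still has to terminate the output before printing it.

```c
/* Reverses the first len characters of input into output.
   Reads input[len - 1] down to input[0], so input[len] is never read.
   Does NOT add a terminating '\0'. */
void reverse(char output[], const char input[], int len)
{
    int i;
    for (i = 0; len - i > 0; ++i)
        output[i] = input[len - 1 - i];
}
```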
Your trim() function is unnecessarily complex. Additionally, it too has an incorrect loop condition to compensate for the off-by-one error in reverse(). The loop should be while ( i < len ), not while ( i <= len ).
Additionally, the trim() function has the ability to reduce the size of your data, but you don't provide a way to retain that information. (I see in the comments of Arkku's answer that you've corrected for this already. Good.)
Once you've fixed the issue with not keeping track of your data's size changes, and the off-by-one error which is copying indeterminate data (which happens, coincidentally, to be a null) from the end of the blockout array to the beginning of the blockprint array when you do the second reverse(), and you fix the incorrect <= condition in trim() and the incorrect >= condition in reverse(), and you null-terminate your byte array before passing it to printf(), your program will work.
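As an illustration of how much simpler trimming can be, here is a sketch that swaps in a different technique: it works on the text in its original order, so no reversing is needed. It assumes "trailing whitespace" means spaces and tabs just before a '\n' or at the very end of the data, and it uses no string library functions, in keeping with the assignment's restriction. The name trim_trailing is hypothetical.

```c
/* Copies len characters from input to output, dropping spaces and tabs
   that sit right before a '\n' or at the end of the data. Returns the
   new length and NUL-terminates output, which therefore needs room for
   len + 1 characters. */
int trim_trailing(char output[], const char input[], int len)
{
    int i, j = 0;

    for (i = 0; i < len; ++i) {
        if (input[i] == '\n') {
            /* back up over whitespace already copied on this line */
            while (j > 0 && (output[j - 1] == ' ' || output[j - 1] == '\t'))
                --j;
        }
        output[j++] = input[i];
    }
    /* trailing whitespace with no final '\n' */
    while (j > 0 && (output[j - 1] == ' ' || output[j - 1] == '\t'))
        --j;
    output[j] = '\0';
    return j;
}
```

Returning the new length gives the caller a way to keep track of the size change, which is one of the problems listed above.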
(Moved from comments to an answer)
My guess is that the problem is outside this function, and is caused by the fact that in the described problem cases the output is shorter than the input. Since you are passing the length of the string as an argument, you need to calculate the length of the string after trim, because it may have changed...
For instance, passing an incorrect length to reverse can cause the terminating NUL character (and possibly some leftover whitespace) to end up at the beginning of the string, thus making the output appear empty.
edit: After seeing the edited question with the code of reverse included, in addition to the above problem, your reverse puts the terminating NUL as the first character of the reversed string, which causes it to be the empty string (in some cases your second reverse puts it back, so you don't see it without printing the output of the first reverse). Note that input[len] contains the '\0', not the last character of the string itself.
edit 2: Furthermore, you are not actually terminating the string in block before using it. The uninitialized array may often happen to contain zeros that terminate the string, but for the program to be correct you absolutely need to terminate it with block[i] = '\0'; immediately after the input loop. Similarly, you need to ensure NUL termination of the outputs of reverse and trim (in the case of trim, it seems to me that this already happens as a side effect of the loop condition i <= len instead of i < len, but it's not a sign of good code that this is hard to tell).

Simple C If Statement

I made a very simple C program that is supposed to count how many characters and words are in a string (I count words by counting the spaces in the text and adding one). The current code is the following (without the printf calls, to keep it short):
int main(int argc, char *argv[])
{
    int character;
    int words, characters = 0;
    while ((character = getchar()) != '\n') {
        characters = ++characters;
        if ((character == ' ') || (character == '\d')) {
            words = ++words;
        }
    }
    return 0;
}
My problem is that counting words does not work. I get an accurate count for characters, but words always gives me 2293576, and I cannot for the life of me figure out why.
Can someone solve this mystery for me?
Thank you for all your answers; I really appreciate the help.
Sorry if my primitive skills made some of your heads hurt. I am a beginner, but hopefully I will improve fast.
You haven't initialized words. Uninitialized local variables in C have an indeterminate value; they are not automatically initialized to zero.
The statement
int x, y = 0;
Is not the same as
int x = 0, y = 0;
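You don't initialize words to 0. Also, change this:
characters = ++characters;
to just:
characters++;
(and likewise for words). Modifying characters twice in one statement, as the original line does, is undefined behavior in C.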
Also, what is the '\d' character (besides a plain old d)?
You fail to initialize words. In the statement:
int words, characters = 0;
characters is set to 0, but words is left uninitialized, so it could contain any value. The rest of your code then modifies words in its uninitialized state: instead of starting at 0 and counting up, words starts at something like 2293576 and counts up from there. To fix your code, set words to 0 as well as characters before using them in the while loop:
int words = 0, characters = 0;
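Putting the answers together, here is a sketch of the counting logic as a function over a string. The name count_text is hypothetical, and taking a string instead of reading getchar() is an assumption made so the logic can be tested on its own. Both counters start at zero. It also swaps in a slightly different word rule than the question's "spaces + 1": a word is counted each time a non-blank character follows a blank or the start of the text, so repeated spaces and an empty line do not inflate the count.

```c
#include <stddef.h>

/* Counts characters and words up to the first '\n' or '\0'.
   Blanks are ' ' and '\t'. */
void count_text(const char *s, size_t *characters, size_t *words)
{
    size_t chars = 0, w = 0;   /* both explicitly initialized */
    int in_word = 0;           /* 1 while inside a word */

    for (; *s != '\0' && *s != '\n'; ++s) {
        chars++;
        if (*s == ' ' || *s == '\t') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            w++;
        }
    }
    *characters = chars;
    *words = w;
}
```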
