Why copying a character to a char array results in weird behaviour - arrays

I have the following code, which aims to implement simple LZW-based text compression:
int compress(FILE *text_file, FILE *output) {
char input[20];
short code = 0;
short size = 0;
memset(input, '\0', 20);
char c = fgetc(text_file);
while (c != EOF) {
strncat(input, &c, 1);
code = getCodeFromInput(input);
if ((size + 1) > 1 && code == -1) {
addInput(input);
fwrite(&code, 2, 1, output);
strcpy(input, &c);
printf("%s", input);
size = 1;
code = 0;
} else {
size++;
}
c = fgetc(text_file);
}
fwrite(&input, 2, 1, output);
return 0;
}
So after finding a suitable string, I need to overwrite my string input with the last char I've received. When using strcpy() I can easily copy arrays to my input string, but when I copy a single character and print the result, I get the expected output plus some weird character; for example, copying the character 'a' gives a�. Is this expected? Why does strcpy() behave like that?

The standard defines strcpy as a function that copies a NUL-terminated string. You are copying the last input (a single character), and taking its address does not make it a NUL-terminated string, so strcpy keeps reading past c until it happens to find a zero byte:
strcpy(input, &c);
You can try using a compound literal:
strcpy(input, (char[2]){c, '\0'}); // A NUL terminated string
Also, notice that fgetc returns an int rather than a char, so that EOF can be distinguished from every valid character:
char c = fgetc(text_file);
should be
int c = fgetc(text_file);
Consequently:
strcpy(input, (char[2]){(char)c, '\0'}); // A NUL terminated string
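A minimal sketch of the same idea without a compound literal, assuming a hypothetical helper named set_to_char (it is not part of the original code): writing the character and the terminator directly avoids strcpy altogether.

```c
#include <stddef.h>

/* Hypothetical helper: make buf hold the one-character string "c".
   Returns 0 on success, -1 if buf cannot hold two bytes. */
int set_to_char(char *buf, size_t bufsize, int c)
{
    if (buf == NULL || bufsize < 2)
        return -1;
    buf[0] = (char)c;   /* the character itself */
    buf[1] = '\0';      /* explicit terminator makes it a valid C string */
    return 0;
}
```

In the compress function above, this would be called as set_to_char(input, sizeof input, c) in place of the strcpy line.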

Related

Reading arbitrary length strings in C

I've attempted to write a C program to read a string and display it back to the user. I've tested it with a lot of input and it seems to work properly. The thing is that I'm not sure whether or not the c != EOF condition is necessary inside the while expression, and since by definition, the size of a char is 1 byte, maybe I can remove the sizeof(char) expressions inside the malloc and realloc statements, but I'm not sure about this.
Here's the program, also, I manually added a null terminating character to the string:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *str = malloc(sizeof(char));
if (!str)
return 1;
char c;
char *reallocStr;
size_t len = 0;
size_t buf = 1;
printf("Enter some text: ");
while ((c = getchar()) != '\n' && c != EOF) {
if (len == buf) {
buf *= 2;
reallocStr = realloc(str, buf * sizeof(char));
if (!reallocStr)
return 1;
str = reallocStr;
}
str[len++] = c;
}
str[len] = '\0';
printf("You entered: %s\n", str);
free(str);
return 0;
}
As mentioned in the comments, you have a buffer overflow in your code (when len == buf at the end of the loop, str[len] = '\0' writes one byte past the allocation), so you would need to fix that at the very least. To answer your specific questions: sizeof(char) is guaranteed to be 1 by the C standard, so you don't need to multiply by sizeof(char). It's good practice to check for EOF so that the program still terminates if its input comes from a source that has no newline (for example, if someone ran printf %s hello | yourprogram from a bash prompt).
Problems include
Buffer overflow
See the answer by HardcoreHenry.
Incorrect type
getchar() returns an int with the values [0..UCHAR_MAX] plus the negative value EOF. These 257 different values lose their distinctiveness when saved in a char. Possible outcomes: an infinite loop or a premature end of the loop. Instead:
// char c;
int c;
Advanced: Arbitrary length
For very long lines, buf *= 2; overflows when buf is SIZE_MAX/2 + 1. As an alternative to growing in steps of 1, 2, 4, 8, 16, ..., consider 1, 3, 7, 15, .... That way the code can handle strings up to SIZE_MAX.
Advanced: Reading '\0'
Although uncommon, it is possible to read in a null character. Then printf("You entered: %s\n", str); will only print up to that null character and not to the end of the input.
To print everything, take advantage of the fact that the code knows the length.
printf("You entered: ");
fwrite(str, len, 1, stdout);
printf("\n");
To be clear, text input here is not the reading of strings, but the reading of lines. That input is saved and converted to a string by appending a null character. Reading a '\0' complicates things, but it is something robust code handles.
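The points above (int for the character, room for the terminator, growth in steps of 1, 3, 7, 15, ...) can be combined in one function. This is only a sketch; the name read_line and its interface are assumptions, not taken from the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical helper: read one line from fp into a malloc'd buffer.
   Stores the number of bytes read (excluding the terminator) in *len_out.
   Returns NULL on allocation failure or if EOF is hit before any byte. */
char *read_line(FILE *fp, size_t *len_out)
{
    size_t cap = 1;              /* capacity; one slot is always kept for '\0' */
    size_t len = 0;
    char *str = malloc(cap);
    if (!str)
        return NULL;

    int c;                       /* int, so EOF stays distinct from every char */
    while ((c = fgetc(fp)) != '\n' && c != EOF) {
        if (len + 1 == cap) {    /* only the terminator slot is left */
            if (cap > SIZE_MAX / 2) {       /* cap * 2 + 1 would overflow */
                free(str);
                return NULL;
            }
            size_t new_cap = cap * 2 + 1;   /* 1, 3, 7, 15, ... */
            char *tmp = realloc(str, new_cap);
            if (!tmp) {
                free(str);
                return NULL;
            }
            str = tmp;
            cap = new_cap;
        }
        str[len++] = (char)c;
    }
    if (c == EOF && len == 0) {  /* nothing left to read */
        free(str);
        return NULL;
    }
    str[len] = '\0';             /* always in bounds because len < cap */
    *len_out = len;
    return str;
}
```

The caller frees the returned buffer, and can pass the stored length to fwrite if the line may contain embedded null characters.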

ANSI C strcmp() function never returning 0, where am I going wrong?

C isn't a language I know well, so I'm out of my comfort zone (learning C), and I have run into an issue that I can't currently figure out.
I am trying to read from a text file one word at a time and compare it to a word that I have passed into the function as a pointer.
I am currently reading it from the file one character at a time and storing those characters in a new char array until it hits a space, then comparing that char array to the original word stored in the pointer (stored where it's pointing to, anyway).
When I do a printf to check whether both arrays are the same, they are: they both equal "Hello". At first I thought it might be because my char array doesn't have a terminator, so I tried adding one, but still nothing seems to work.
My code is below and I would appreciate any help. Again, C isn't my strong area.
If I enter "Hello" the result is > 0, by the way, so I think it's because fgets() on stdin also includes the Enter key or something of that sort. I am not sure of a better way to grab the string, though.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int partA(char*);
main()
{
// Array to store my string
char myWord[81];
// myword = pointer to my char array to store. 80 = the size (maximum). stdin = standard input from my keyboard.
fgets(myWord, 80, stdin);
partA(myWord);
}
int partA(char *word)
{
// points to file.
FILE *readFile;
fopen_s(&readFile, "readThisFile.txt", "r");
char character;
char newWord[50];
int i = 0;
while ((character = fgetc(readFile)) != EOF)
{
if (character == ' ')
{
newWord[i] = '\0';
int sameWord = strcmp(word, newWord);
printf("Word: %s", word);
printf("newWord: %s", newWord);
if (sameWord == 0)
printf(" These words are the same.");
if (sameWord > 0)
printf(" sameWord > 0.");
if (sameWord < 0)
printf(" sameWord < 0.");
printf("\n");
i = 0;
}
if (character != ' ')
{
newWord[i] = character;
i++;
}
printf("%c", character);
}
fclose(readFile);
return 1;
}
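One likely cause, consistent with the suspicion in the question: fgets() stores the newline typed at the keyboard, so word holds "Hello\n" while newWord holds "Hello", and strcmp reports a positive difference. A minimal sketch of trimming the newline before comparing, using a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n', if any. */
void strip_newline(char *s)
{
    /* strcspn returns strlen(s) when neither character is present,
       so this assignment is harmless for strings without a newline. */
    s[strcspn(s, "\r\n")] = '\0';
}
```

Calling strip_newline(myWord) right after the fgets call would be one way to apply it. Note also that the loop in partA only compares when it reads a space, so the last word of a file that does not end with a space is never compared.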

Printing a string due to a new line

Is there an efficient way (in terms of performance) to print an arbitrary string, but only up to the first newline character in it (excluding the newline character)?
Example:
char *string = "Hello\nWorld\n";
printf(foo(string + 6));
Output:
World
If you are concerned about performance this might help (untested code):
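void MyPrint(const char *str)
{
size_t len = strlen(str); // upper bound on the number of characters to copy
char *temp = alloca(len + 1); // room for the terminator; alloca is non-standard (<alloca.h> on many systems)
size_t i;
for (i = 0; i < len; i++)
{
char ch = str[i];
if (ch == '\n')
break;
temp[i] = ch;
}
temp[i] = 0; // i <= len, so this stays inside the buffer
puts(temp); // note: puts appends a newline of its own
}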
strlen is fast, alloca is fast, and copying the string up to the first \n is fast; puts is faster than printf, but it is most likely far slower than all three of the operations mentioned before put together.
size_t writetodelim(char const *in, int delim)
{
char *end = strchr(in, delim);
if (!end)
return 0;
return fwrite(in, 1, end - in, stdout);
}
This can be generalized somewhat (pass the FILE* to the function), but it's already flexible enough to terminate the output on any chosen delimiter, including '\n'.
Warning: Do not use printf without format specifier to print a variable string (or from a variable pointer). Use puts instead or "%s", string.
C strings are terminated by '\0' (NUL), not by newline. So, the functions print until the NUL terminator.
You can, however, use your own loop with putchar. Whether that carries any performance penalty would have to be measured. Normally printf does much the same in the library and might even be slower, as it has to handle additional constraints, so your own loop might very well be faster.
for ( char *sp = string + 6 ; *sp != '\0'; sp++ ) {
if ( *sp == '\n' ) break; // newline will not be printed
putchar(*sp);
}
(Move the if-line to the end of the loop if you want newline to be printed.)
An alternative would be to limit the length of the string to print, but that would require finding the next newline before calling printf.
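The length-limited alternative can be sketched with the precision field of %s, which caps how many characters printf writes. The helper name below is hypothetical, and the cast assumes the prefix length fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print str up to (not including) the first '\n'.
   Returns the number of characters printed, or a negative value on error. */
int print_until_newline(const char *str)
{
    size_t n = strcspn(str, "\n");       /* length of the prefix before '\n' */
    return printf("%.*s", (int)n, str);  /* precision limits the output length */
}
```

With the string from the question, print_until_newline(string + 6) writes World without copying or modifying the source string.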
I don't know if it is fast enough, but there is a way to build a string containing the source string up to a newline character that involves only one standard function.
char *string = "Hello\nWorld\nI love C"; // Example of your string
static char newstr[256]; // Large enough to hold the result; static storage starts out filled with '\0'
sscanf(string + 6, "%255[^\n]", newstr); // copy up to, but not including, the next newline (a plain %s would stop at any whitespace)
puts(newstr); // print the string
I guess there is no more efficient way than simply looping over your string until you find the first \n in it. As Olaf mentioned, a string in C ends with a terminating \0, so if you want to use printf to print the string you need to make sure it contains the terminating \0, or you could use putchar to print the string character by character.
If you want to provide a function creating a string up to the first found new line you could do something like that:
#include <stdio.h>
#include <string.h>
#define MAX 256
void foo(const char* string, char *ret)
{
size_t len = strlen(string);
if (len > MAX - 1)
len = MAX - 1; // leave room for the terminator in ret
size_t i;
for (i = 0; i < len; i++)
{
if (string[i] == '\n') break;
ret[i] = string[i];
}
ret[i] = '\0'; // terminate right after the last copied character
}
int main()
{
const char* string = "Hello\nWorld\n";
char ret[MAX];
foo(string, ret);
printf("%s\n", ret);
foo(string+6, ret);
printf("%s\n", ret);
}
This will print
Hello
World
Another fast way (if the new line character is truly unwanted)
Simply:
*strchr(string, '\n') = '\0';
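That one-liner only works if the string is writable and really contains a newline: strchr returns NULL when there is none, and a string literal must not be modified. A slightly more defensive sketch, with a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: truncate s at its first '\n'.
   Returns 1 if a newline was found and removed, 0 otherwise.
   s must point to writable storage, not to a string literal. */
int truncate_at_newline(char *s)
{
    char *nl = strchr(s, '\n');
    if (nl == NULL)
        return 0;    /* nothing to do; dereferencing NULL would be undefined */
    *nl = '\0';
    return 1;
}
```

To use it on the example, declare the data as an array, e.g. char string[] = "Hello\nWorld\n";, so that the bytes live in writable memory.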

How to delete a newline within a string

I want to delete a Newline '\n' within a string.
char *string ="hallo\n";
int i=0;
int length = sizeof(string);
while(i<length)
{
if(string[i+1] == '\n')
{
string[i+1] = '\0';
break;
}
i++;
}
printf("%s",string);
printf("world");
I know that I could just spawn a new array and it works like this
char *string ="hallo\n";
int i=0;
int length = sizeof(string);
int lengthNew = length -1;
char newStr[lengthNew];
while(i<length)
{
printf("Char ist %c:",string[i]);
newStr[i] = string[i];
if(string[i+1] == '\n')
break;
i++;
}
But why use the stack if I could just substitute one character in the old array?
Based on your comment, I offer a completely different, yet better solution: strftime:
time_t clock = time(NULL);
char buf[1024];
strftime(buf, sizeof buf, "%c", localtime(&clock));
printf("The date is: %s\n", buf);
The %c format is the same as is used by ctime, but strftime is more flexible.
If the newline is always the last character of the string, you could code it like you have described.
Otherwise you'd have to create a second character buffer and copy the characters to the second buffer. The reason for this is that in C the \0 character marks the end of the string.
If you have a string like this: "this \n is \n a \n test", then after your replacement the memory would look like this: "this \0 is \0 a \0 test". Most C programs will simply interpret this as the string "this " and ignore everything after the first null.
EDIT
As others have pointed out, there are also other problems with your code. sizeof() will return the size of the character pointer, not the length of the string. It is also not possible to modify a read-only string literal.
char *string = ctime(&myTimeT);
char *c = strrchr(string, '\n');
if (c != NULL)
*(c) = '\0';
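For newlines in the middle of a string, a second buffer is one option. Another is to compact the string in place, since the result is never longer than the input. The sketch below uses a hypothetical helper name and assumes the string is stored in a writable array.

```c
#include <stddef.h>

/* Hypothetical helper: delete every '\n' from s in place.
   Returns the length of the resulting string. */
size_t remove_newlines(char *s)
{
    char *dst = s;                       /* next position to write */
    for (const char *src = s; *src != '\0'; src++) {
        if (*src != '\n')
            *dst++ = *src;               /* keep everything except newlines */
    }
    *dst = '\0';                         /* terminate the shortened string */
    return (size_t)(dst - s);
}
```

Because dst never gets ahead of src, no character is overwritten before it has been read.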

Parsing text in C

I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.
Edit: You can use pNum - buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++, while pNum still points at the space.
int len = pNum - buf;
strncpy(newBuf, buf, len); // copy everything before the last space
newBuf[len] = '\0';
You could read the entire line into a buffer and then use:
char *pNum;
if (pNum = strrchr(buf, ' ')) {
pNum++;
}
to get a pointer to the number field.
fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
Edit
Ooops, I forgot that you had spaces between the words.
In that case, I'd do the following. (Note that it truncates the original text in 'line')
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while(*p != '\0')
{
if (*p == ' ')
lastSpace = p;
p++;
}
if (lastSpace == NULL)
return("parse error");
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
char *word = line; // the text part now ends at the NUL written above
int number = atoi(lastSpace);
You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.
Given the description, I think I'd use a variant of this (now tested) C99 code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
struct word_number
{
char word[128];
long number;
};
int read_word_number(FILE *fp, struct word_number *wnp)
{
char buffer[140];
if (fgets(buffer, sizeof(buffer), fp) == 0)
return EOF;
size_t len = strlen(buffer);
if (buffer[len-1] != '\n') // Error if line too long to fit
return EOF;
buffer[--len] = '\0';
if (len == 0) // Empty line: nothing to parse
return EOF;
char *num = &buffer[len-1];
while (num > buffer && !isspace((unsigned char)*num))
num--;
if (num == buffer) // No space in input data
return EOF;
char *end;
wnp->number = strtol(num+1, &end, 0);
if (*end != '\0') // Invalid number as last word on line
return EOF;
*num = '\0';
if (num - buffer >= sizeof(wnp->word)) // Non-number part too long
return EOF;
memcpy(wnp->word, buffer, num - buffer + 1); // +1 copies the terminating NUL as well
return(0);
}
int main(void)
{
struct word_number wn;
while (read_word_number(stdin, &wn) != EOF)
printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
return(0);
}
You could improve the error reporting by returning different values for different problems.
You could make it work with dynamically allocated memory for the word portion of the lines.
You could make it work with longer lines than I allow.
You could scan backwards over digits instead of non-spaces - but this allows the user to write "abc 0x123" and the hex value is handled correctly.
You might prefer to ensure there are no digits in the word part; this code does not care.
You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).
Assuming that the number is immediately followed by '\n', you can read each line into a char buffer, use sscanf() to extract the number, and then calculate how many characters that number occupies at the end of the text string.
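One way to apply sscanf() to a whole line is a scanset that collects everything up to the first digit. This is a sketch under the assumption stated in the question that the word part never contains digits (and that the number is non-negative); the function name and interface are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "some words 13" into words and number.
   Assumes the words contain no digits. Returns 1 on success, 0 otherwise. */
int parse_line(const char *line, char *words, size_t words_size, long *number)
{
    char tmp[128];
    long n;

    /* %127[^0-9] reads up to 127 non-digit characters, %ld reads the number */
    if (sscanf(line, "%127[^0-9]%ld", tmp, &n) != 2)
        return 0;

    size_t len = strlen(tmp);
    while (len > 0 && tmp[len - 1] == ' ')   /* drop the separator space(s) */
        tmp[--len] = '\0';

    if (len + 1 > words_size)                /* caller's buffer is too small */
        return 0;
    memcpy(words, tmp, len + 1);             /* includes the terminator */
    *number = n;
    return 1;
}
```

For the line "even more words 4" this yields the words "even more words" and the number 4; a line without a number is rejected.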
Depending on how complex your strings become you may want to use the PCRE library. At least that way you can compile a perl'ish regular expression to split your lines. It may be overkill though.
Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:
#include <stdlib.h>
#include <stdio.h>
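#include <string.h>
#define MAX_LINE_LENGTH 256 /* example limits, not from the original; adjust them to your data */
#define MAX_STR_SIZE 64
#define MAX_NUM_STRINGS 16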
/**
* Read the next line from the file, splitting the tokens into
* multiple strings and a single integer. Assumes input lines
* never exceed MAX_LINE_LENGTH and each individual string never
* exceeds MAX_STR_SIZE. Otherwise things get a little more
* interesting. Also assumes that the integer is the last
* thing on each line.
*/
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
char buffer[MAX_LINE_LENGTH];
int rval = 1;
if (fgets(buffer, sizeof buffer, in))
{
char *token = strtok(buffer, " ");
*numStrings = 0;
while (token)
{
char *chk;
*value = (int) strtol(token, &chk, 10);
if (*chk != 0 && *chk != '\n')
{
strcpy(strs[(*numStrings)++], token);
}
token = strtok(NULL, " ");
}
}
else
{
/**
* fgets() hit either EOF or error; either way return 0
*/
rval = 0;
}
return rval;
}
/**
* sample main
*/
int main(void)
{
FILE *input;
char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
int numStrings;
int value;
input = fopen("datafile.txt", "r");
if (input)
{
while (getNextLine(input, strings, &numStrings, &value))
{
/**
* Do something with strings and value here
*/
}
fclose(input);
}
return 0;
}

Resources