reading in a file and getting the string length - c

gcc 4.4.4 c89
I am using the following code to read in file using fgets. I just want to get the gender which could be either M or F.
However, the gender is always the last character in the string, so I thought I could get the character by using strlen. For some reason, though, I have to take the strlen and subtract 2. I know that strlen doesn't include the NUL terminator. However, it will include the carriage return.
The exact line of text I am reading in is this:
"Low, Lisa" 35 F
My code:
int read_char(FILE *fp)
{
#define STRING_SIZE 30
char temp[STRING_SIZE] = {0};
int len = 0;
fgets(temp, STRING_SIZE, fp);
if(temp == NULL) {
fprintf(stderr, "Text file corrupted\n");
return FALSE;
}
len = strlen(temp);
return temp[len - 2];
}
The strlen returns 17 when I feel it should return 16. String length including the carriage return. I feel I should be doing - 1 instead of - 2.
Any suggestions if you understand my question.
Thanks,
EDIT:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops
after an EOF or a newline. If a newline is read, it is stored into the buffer. A '\0' is stored after the last character in the
buffer
So the buffer will contain:
"Low, Lisa" 35 F\0\r
Which will return 17 from strlen if it is including the \r? Am I correct in thinking that?

The buffer will contain:
"Low, Lisa" 35 F\n\0
so -2 is correct: temp[len - 0] is the null terminator, temp[len - 1] is the newline, and temp[len - 2] is the letter F.
Also,
if(temp == NULL) {
temp is an array - it can never be NULL.

Instead of
if (temp == NULL)
check the return value of fgets instead; if it is NULL, that indicates failure (end of file or a read error):
if ( fgets(temp, STRING_SIZE, fp) == NULL )
Yes, strlen includes the newline.
Note that if you are on the last line of the file and there is no \n at the end of that line, you may encounter a problem if you assume there is always a \n in the string.
An alternative is to read the string as you do but check the last char: if it is not \n, use an offset of -1 instead of -2.
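A minimal sketch of that check, assuming each line fits in the local buffer. The name last_char_of_line and the buffer size of 64 are illustrative and not from the question; the function returns EOF when there is nothing to report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the last character of the next line in fp,
 * ignoring a trailing '\n' if fgets stored one, or EOF if no character
 * is available (end of file, read error, or empty line). */
int last_char_of_line(FILE *fp)
{
    char temp[64];              /* assumed to be large enough for one line */
    size_t len;

    assert(fp != NULL);
    if (fgets(temp, sizeof temp, fp) == NULL)
        return EOF;             /* end of file or read error */

    len = strlen(temp);
    if (len > 0 && temp[len - 1] == '\n')
        len--;                  /* skip the newline that fgets stored */
    if (len == 0)
        return EOF;             /* the line was empty */
    return (unsigned char)temp[len - 1];
}
```

With this shape the caller no longer needs to know whether the offset is -1 or -2: the newline is skipped only when it is actually there.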

It depends on the operating system used to save the file:
on Windows, line endings are \r\n (carriage return followed by line feed)
on Linux, they are \n
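If the file may come from either system, one way to be independent of the line ending is to cut the string at the first \r or \n. The sketch below uses the standard strcspn function; the helper name strip_line_ending is made up for illustration. Note that a C library on Windows normally translates \r\n to \n for streams opened in text mode, so a stray \r mostly shows up when a Windows-style file is read on Linux or in binary mode.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: truncates s at the first '\r' or '\n', so that
 * "F\r\n", "F\n" and "F" all become "F". */
void strip_line_ending(char *s)
{
    assert(s != NULL);
    /* strcspn returns the length of the prefix containing none of "\r\n" */
    s[strcspn(s, "\r\n")] = '\0';
}
```

After this call the last character of the line is always at index strlen(s) - 1, provided the line is not empty.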

Did you debug and check exactly what value ends up in len? If you are working in a C debugger, add a watch on len and see what it shows.

Related

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking @Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
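One possible way to do the per-line step is sketched below. It is not the only way to parse the tokens; the names substitute_line, tokens, token_count and the limits MAX_TOKENS and TOKEN_LEN are assumptions made for this example. Each $word$ is copied into a table (so it survives the next fgets call) and replaced by its bracketed number in the output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 100   /* assumed limit on the number of substitutions */
#define TOKEN_LEN  81    /* a token cannot exceed one 80-column line */

static char tokens[MAX_TOKENS][TOKEN_LEN];  /* saved copies of the words */
static int token_count = 0;

/* Copies line to out, replacing each $word$ with [k] and saving the word in
 * tokens[k - 1]. Returns 0 on success, -1 if out or the table is too small. */
int substitute_line(const char *line, char *out, size_t outsize)
{
    size_t used = 0;

    assert(line != NULL && out != NULL && outsize > 0);
    while (*line != '\0') {
        /* a token starts at '$' and runs to the next '$' on the same line */
        const char *end = (*line == '$') ? strchr(line + 1, '$') : NULL;

        if (end != NULL) {
            size_t wlen = (size_t)(end - line - 1);
            int n;

            if (token_count >= MAX_TOKENS || wlen >= TOKEN_LEN)
                return -1;
            memcpy(tokens[token_count], line + 1, wlen);
            tokens[token_count][wlen] = '\0';
            token_count++;

            n = snprintf(out + used, outsize - used, "[%d]", token_count);
            if (n < 0 || (size_t)n >= outsize - used)
                return -1;
            used += (size_t)n;
            line = end + 1;             /* continue after the closing '$' */
        } else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *line++;      /* ordinary character: copy verbatim */
        }
    }
    out[used] = '\0';
    return 0;
}
```

Called once per line read by fgets, this turns "I $like$ cats.\n" into "I [1] cats.\n". After the last line, a loop over tokens[0] to tokens[token_count - 1] can print the "[1] --> like" legend. A lone $ with no closing partner is copied through unchanged.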
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simply read one line at a time, then march down the buffer looking for words.
The following uses "%n" to track how far into the buffer scanning got.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents into a char[] buffer. Then iterate through this buffer, and whenever you find a $, use strncmp to work out which value to replace it with (keep in mind that there is a second $ at the end of the word). To replace $word$ with a number, you need to either shrink or extend the buffer at the position of the word, depending on the length of the number in ASCII form (you can look up solutions; normally memmove does the job). Then you can write the number into the gap that opens up, overwriting the $word$ as well.
Then write the buffer to the file, overwriting all its previous contents.

Why does opendir() work for one string but not another?

I would like to open a directory using opendir but am seeing something unexpected. opendir works for the string returned from getcwd but not the string from my helper function read_cwd, even though the strings appear to be equal.
If I print the strings, both print /Users/gwg/x, which is the current working directory.
Here is my code:
char real_cwd[255];
getcwd(real_cwd, sizeof(real_cwd));
/* This reads a virtual working directory from a file */
char virt_cwd[255];
read_cwd(virt_cwd);
/* This prints "1" */
printf("%d\n", strcmp(real_cwd, virt_cwd) != 0);
/* This works for real_cwd but not virt_cwd */
DIR *d = opendir(/* real_cwd | virt_cwd */);
Here is the code for read_cwd:
char *read_cwd(char *cwd_buff)
{
FILE *f = fopen(X_PATH_FILE, "r");
fgets(cwd_buff, 80, f);
printf("Read cwd %s\n", cwd_buff);
fclose(f);
return cwd_buff;
}
The function fgets includes the final newline in the buffer — so the second string is actually "/Users/gwg/x\n".
The simplest (but not necessarily the cleanest) way to solve this issue is to overwrite the newline with a '\0': add the following at the end of the function read_cwd:
size_t n = strlen(cwd_buff);
if(n > 0 && cwd_buff[n - 1] == '\n')
cwd_buff[n - 1] = '\0';
fgets() includes the newline.
Parsing stops if end-of-file occurs or a newline character is found, in which case str will contain that newline character. — http://en.cppreference.com/w/c/io/fgets
You should trim the white space on both ends of the string when reading input like this.
From the fgets man page:
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an
EOF or a newline. If a newline is read, it is stored into the buffer.
A terminating null byte ('\0') is stored after the last character in
the buffer.
You need to remove the newline character from the string you are reading in.
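For the suggestion to trim whitespace on both ends, a small in-place helper could look like the sketch below. The name trim is an assumption for illustration; it relies only on the standard isspace, strlen and memmove. Be aware that a path may legitimately begin or end with a space, so for directory names stripping only the trailing newline is the more conservative choice.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes leading and trailing whitespace from s
 * in place and returns s. */
char *trim(char *s)
{
    size_t start = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    /* drop trailing whitespace, including the '\n' stored by fgets */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    /* skip leading whitespace */
    while (start < len && isspace((unsigned char)s[start]))
        start++;
    /* shift the remaining characters to the front and terminate */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
    return s;
}
```

In the question's code this would be used as opendir(trim(virt_cwd)) after read_cwd has filled the buffer.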

Can't determine value of character at the end of a string

I'm new to C programming. I am trying to make a program that takes some simple input. However I found that on comparison of my input string to what the user "meant" to input, there is an additional character at the end. I thought this might be a '\0' or a '\r' but that seems not to be the case. This is my snippet of code:
char* getUserInput(char* command, char $MYPATH[])
{
printf("myshell$ ");
fgets(command, 200, stdin);
printf("%u\n", (unsigned)strlen(command));
if ((command[(unsigned)strlen(command) - 1] == '\0') || (command[(unsigned)strlen(command) - 1] == '\r'))
{
printf("bye\n");
}
return command;
}
The code shows that when entering, say "exit" that 5 characters are entered. However I can't seem to figure out the identity of this last one. "Bye" never prints. Does anyone know what this mystery character could be?
The magical 5th element most probably is a newline character: \n
From man fgets() (emphasis by me):
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A '\0' is
stored after the last character in the buffer.
To prove this print out each character read by doing so:
char* getUserInput(char* command, char $MYPATH[])
{
printf("myshell$ ");
fgets(command, 200, stdin);
printf("%u\n", (unsigned)strlen(command));
{
size_t i = 0, len = strlen(command);
for (;i < len; ++i)
{
fprintf(stderr, "command[%zu]='%c' (%hhd or 0x%hhx)\n", i, command[i], command[i], command[i]);
}
}
...
assumptions
array indexes in C start at 0
strlen returns the length of the string, not counting the terminator
so if you have the string "exit", the array holds 5 symbols: e, x, i, t, \0. strlen returns 4, and because you subtract 1 from it you are checking the last symbol of the string instead of the null terminator
to check the null terminator use command[strlen(command)], but this always gives you \0, so there is no point in testing it
if you want to compare strings, use the strcmp function
UPDATE: the issue with your program is that fgets stores the \n symbol at the end of the string:
A newline character makes fgets stop reading, but it is considered a
valid character by the function and included in the string copied to
str.
The reason you don't see the last char is that strlen() doesn't count '\0' in the string's length, so testing command[strlen(command) - 1] for '\0' won't succeed.
For instance, given const char* a = "abc"; strlen(a) is 3. If you want to test the terminator, you need to access it as command[strlen(command)].
The reason strlen returns 5 for "exit" is that fgets stores the '\n' character at the end of the input. You can test for it with command[strlen(command) - 1] == '\n'.
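Putting the answers together, a comparison that tolerates the newline might be sketched as follows. The helper name is_exit_command is hypothetical; it leaves the buffer untouched and only measures how many characters precede the line ending.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the line read by fgets is exactly
 * "exit", with or without a trailing line ending; 0 otherwise. */
int is_exit_command(const char *command)
{
    size_t len;

    assert(command != NULL);
    len = strcspn(command, "\r\n");     /* characters before the line ending */
    return len == 4 && strncmp(command, "exit", 4) == 0;
}
```

In getUserInput this could replace the test against '\0' and '\r', for example if (is_exit_command(command)) printf("bye\n");.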

I don't understand the behavior of fgets in this example

While I could use strings, I would like to understand why this small example I'm working on behaves in this way, and how can I fix it ?
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand printf puts a \0 null terminator ... but I still think I should be able to either read the first number, not just have it by default 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if someone wishes to compile. I still don't know how to fix
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error. Decided to simply add #ReadInput
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
# RunClient()
void RunClient() {
...
ReadInput(FALSE); // a pseudo - buffer flush. Not really but I ignore
while (...) { // line with garbage data
number = ReadInput(TRUE);
...
}
And call it a day.
fgets reads the input as well as the newline character. So when you input a number, it's like: 123\n.
atoi doesn't report errors when the conversion fails.
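Remove the newline character from the buffer, if there is one:
char buffer[5];
/* ... after fgets(buffer, sizeof buffer, stdin) succeeds ... */
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n')
buffer[length - 1] = '\0';
Then use strtol to convert the string into a number; it provides better error detection when the conversion fails.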
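As a rough illustration of what that error detection can look like, here is a sketch built on the standard strtol. The wrapper name parse_int and its return convention are assumptions for this example; it accepts the text with or without the newline that fgets leaves behind.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text such as "123\n" to an int.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return 0;       /* no digits were found at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;       /* the number does not fit in an int */
    if (*end != '\0' && *end != '\n')
        return 0;       /* trailing garbage such as "12abc" */
    *out = (int)value;
    return 1;
}
```

Unlike atoi, this lets ReadInput tell a line holding only "\n" (for instance the leftover from an earlier scanf("%s", ...)) apart from a genuine 0.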
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means that you carry \n)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only fgets accepts a stream argument, but also allows to specify the maximum size of str and includes in the string any ending newline character.
PS: Try using a larger buffer.

Does fgets() always terminate the char buffer with \0?

Does fgets() always terminate the char buffer with \0 even if EOF is already reached? It looks like it does (it certainly does in the implementation presented in the ANSI K&R book), but I thought I would ask to be sure.
I guess this question applies to other similar functions such as gets().
EDIT: I know that \0 is appended during "normal" circumstances, my question is targeted at EOF or error conditions. For example:
FILE *fp;
char b[128];
/* ... */
if (feof(fp)) {
/* is \0 appended after EACH of these calls? */
fgets(b, 128, fp);
fgets(b, 128, fp);
fgets(b, 128, fp);
}
fgets does always add a '\0' to the read buffer, it reads at most size - 1 characters from the stream (size being the second parameter) because of this.
Never use gets as you can never guarantee that it won't overflow any buffer that you give it, so while it technically does always terminate the read string this doesn't actually help.
Never use gets!!
7.19.7.2 The fgets function
Synopsis
1 #include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
2 The fgets function reads at most one less than the number of characters
specified by n from the stream pointed to by stream into the array pointed
to by s. No additional characters are read after a new-line character
(which is retained) or after end-of-file. A null character is written
immediately after the last character read into the array.
Returns
3 The fgets function returns s if successful. If end-of-file is encountered
and no characters have been read into the array, the contents of the array
remain unchanged and a null pointer is returned. If a read error occurs
during the operation, the array contents are indeterminate and a null
pointer is returned.
So, yes, when fgets() does not return NULL the destination array always has a null character.
If fgets() returns NULL, the destination array may have been changed and may not have a null character. Never rely on the array after getting NULL from fgets().
Edit example added
$ cat fgets_error.c
#include <stdio.h>
void print_buf(char *buf, size_t len) {
int k;
printf("%02X", buf[0]);
for (k=1; k<len; k++) printf(" %02X", buf[k]);
}
int main(void) {
char buf[3] = {1, 1, 1};
char *r;
printf("Enter CTRL+D: ");
fflush(stdout);
r = fgets(buf, sizeof buf, stdin);
printf("\nfgets returned %p, buf has [", (void*)r);
print_buf(buf, sizeof buf);
printf("]\n");
return 0;
}
$ ./a.out
Enter CTRL+D:
fgets returned (nil), buf has [01 01 01]
$
See? no NUL in buf :)
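In practice this means the buffer should be used only inside the branch where fgets succeeded, and the stream itself should be asked what happened afterwards. A minimal sketch of that pattern follows; the function count_lines and its buffer size are chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the lines in fp. The buffer is examined only
 * after a successful fgets. Returns the count, or -1 on a read error. */
long count_lines(FILE *fp)
{
    char buf[128];
    long lines = 0;
    int at_line_start = 1;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* fgets succeeded, so buf holds a valid null-terminated string */
        size_t len = strlen(buf);

        if (at_line_start)
            lines++;
        /* a chunk without '\n' means the same line continues next time */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    /* fgets returned NULL: do not inspect buf, query the stream instead */
    if (ferror(fp))
        return -1;
    return lines;
}
```

Once the loop ends, feof and ferror distinguish a normal end of file from a read error; the contents of buf are not consulted again.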
man fgets:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a new‐line is read, it is stored into the buffer. A '\0' is stored after the last character in the buffer.
If you opened the file in binary mode ("rb") and want to read text line by line using fgets, you can use the following code to protect your software from losing text if the input mistakenly contains a '\0' byte.
But as the others mentioned, you should normally not use fgets if the stream contains '\0'.
size_t filepos=ftell(stream);
fgets(buffer, buffersize, stream);
len=strlen(buffer);
/* now check for > len+1 since no problem if the
last byte is 0 */
if(ftell(stream)-filepos > len+1)
{
if(!len) filepos++;
if(!fseek(stream, filepos, SEEK_SET) && len)
{
fread(buffer, 1, len, stream);
buffer[len]='\0';
}
}
Yes it does. From CPlusPlus.com
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the End-of-File is reached, whichever comes first.
A newline character makes fgets stop reading, but it is considered a valid character and therefore it is included in the string copied to str.
A null character is automatically appended in str after the characters read to signal the end of the C string.
