FILE *in_file = fopen("batchfile", "r"); // read only
while( fgets (buf, MAX_BUFFER, in_file)!=NULL )
{
buf[strcspn(buf, "\n")] = 0;
printf("%s ", buf);
}
When I run this code without the buf[strcspn(buf, "\n")] = 0; line, it runs properly and prints the 3 lines of text in my batchfile. When I add that line to strip the newline character, it prints only the last line of my file. Does anyone know why this is happening?
Windows represents line breaks as the two-character sequence CR, LF (carriage return, line feed). Most other systems represent line breaks as a simple LF. In C, a CR is '\r' and a LF is '\n'.
When you open a text file on Windows (fopen without the b flag), the CR-LF sequence is converted into a simple '\n'. When you open a text file on Unix-like systems (including Linux, macOS, and some ports of Unix development tools to Windows), a CR is just an ordinary character. So if you read a Windows text file on Unix and you want to remove the line breaks, you need to remove both the '\n' and the preceding '\r'.
Many terminals interpret LF as “go to the beginning of the next line” and CR as “go to the beginning of the current line”. So if you read and print a Windows text file, the CR characters are effectively invisible. If you read and print a Unix text file but remove the LF characters, the lines come out concatenated together without breaks. If you read and print a Windows text file, and you remove the LF characters but you keep the CR characters, each line overwrites the previous line. You can watch that happening stepping through your program in a debugger, or by adding fflush(stdout); sleep(1); after each printf call.
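To make the overwriting concrete, here is a minimal sketch that simulates how a simple terminal would lay out one row of output. The function name render_row and its interface are made up for this illustration, and it assumes the terminal treats CR as "return to column 0" and that the text contains no LF:

```c
#include <stddef.h>

/* Hypothetical helper: simulate one terminal row.
   '\r' moves the cursor back to column 0; every other character
   overwrites the cell under the cursor and advances it.
   Writes at most cap bytes (including the terminator) to out and
   returns the number of visible characters. */
size_t render_row(const char *in, char *out, size_t cap)
{
    size_t col = 0;    /* current cursor column */
    size_t width = 0;  /* rightmost column written so far */

    if (cap == 0)
        return 0;
    for (; *in != '\0'; in++) {
        if (*in == '\r') {
            col = 0;
            continue;
        }
        if (col + 1 >= cap)
            break;     /* keep room for the terminator */
        out[col++] = *in;
        if (col > width)
            width = col;
    }
    out[width] = '\0';
    return width;
}
```

If the file holds line1, line2 and line3 with CR-LF endings and only the LF is removed, printf("%s ", buf) emits "line1\r line2\r line3\r ". Feeding that string to render_row leaves " line3" in out, which matches the "only the last line is printed" symptom.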
To support both Unix and Windows line endings, check if there's a '\r' before the '\n' and remove it.
size_t len = strlen(buf);
if (len > 0 && buf[len-1] == '\n') buf[--len] = 0;
if (len > 0 && buf[len-1] == '\r') buf[--len] = 0;
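If you need this in more than one place, the same three lines can be wrapped in a small helper. This is only a sketch; strip_eol is a name chosen here, not a library function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove a trailing '\n' if there is one,
   then a trailing '\r' if there is one. Modifies buf in place
   and returns the new length. */
size_t strip_eol(char *buf)
{
    size_t len = strlen(buf);

    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    if (len > 0 && buf[len - 1] == '\r')
        buf[--len] = '\0';
    return len;
}
```

Call it right after each successful fgets. The len > 0 checks keep it safe on an empty string.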
This function is passed the path of a text file (mapper_path) which contains the path of another text file on each line. I am supposed to open the mapper_path.txt file, then open and evaluate each of the paths within it (example in output).
fopen succeeds on the mapper_path file but fails on the paths it contains.
In the failure condition, it prints the EXACT path I'm trying to open.
I'm working in C on Windows and running commands on the Ubuntu subsystem.
How can I properly read and store the sub-path into a variable to open it?
SOLVED with Rici's suggestion!
int processText(char * mapper_path, tuple * letters[])
{
char line[LINE_SIZE];
char txt_path[MAX_PATH];
FILE * mapper_fp = fopen(mapper_path, "r");
if(!mapper_fp)
{
printf("Failed to open mapper path: %s \n", mapper_path);
return -1;
}
//!!! PROBLEM IS HERE !!!
while(fgets(txt_path, MAX_PATH, mapper_fp))
{
//remove newline character from end
txt_path[strlen(txt_path)-1] = 0;
//open each txt file path, return -1 if it fails
FILE* fp = fopen(txt_path, "r");
if(!fp)
{
printf("Failed to open file path:%s\n", txt_path);
return -1;
}
//...more unimportant code
prints:
Failed to open filepath:
/mnt/c/users/adam/documents/csci_4061/projects/blackbeards/testtext.txt
This is the exact path of the file I am trying to open.
I suspect that the problem is related to this:
I'm working in C on Windows and running commands on the Ubuntu subsystem.
Presumably, you created the mapper.txt file using Windows tools, so it has Windows line endings. However, I think the Ubuntu subsystem does not know about Windows line endings, and so even though you open the file in mode 'r', it does not translate CR-LF into a single \n. When you then remove the \n at the end of the input, you still leave the \r.
That \r won't be visible when you print out the line, since all it does is move the cursor to the beginning of the line and the next character output is a \n. It's usually a good idea to surround strings with other text when you print debugging messages, since that can give you a clue about this sort of problem. If you'd used:
printf("Failed to open file path: '%s'\n", txt_path);
you might have seen the error:
'ailed to open file path: '/mnt/c/users/adam/documents/csci_4061/projects/blackbeards/testtext.txt
Here, the hint that there is a \r at the end of the string is the overwriting of the first character of the message with the trailing apostrophe.
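Another way to make a stray \r show up in debug output is to escape control characters before printing. The following is a rough sketch; escape_controls is a hypothetical helper written for this answer, and it only handles \r, \n and \t:

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, replacing '\r', '\n' and
   '\t' with the two-character sequences \r, \n and \t so they are
   visible when printed. Output is truncated to fit in cap bytes,
   terminator included. Returns the length written. */
size_t escape_controls(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    for (; *src != '\0'; src++) {
        char esc = 0;
        size_t need;

        if (*src == '\r')      esc = 'r';
        else if (*src == '\n') esc = 'n';
        else if (*src == '\t') esc = 't';

        need = esc ? 2 : 1;
        if (n + need >= cap)
            break;             /* keep room for the terminator */
        if (esc) {
            dst[n++] = '\\';
            dst[n++] = esc;
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Printing the escaped copy of txt_path in the failure message would then show the path followed by a literal \r instead of a silently moved cursor.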
It's not quite accurate to say that fgets "adds a \n character to the end [of the line read]." It's more accurate to say that it doesn't remove that character, if it is present. It is quite possible that there isn't a newline at the end of the line. The line may be the last line in a text file which doesn't end with a newline character, for example. Or the fgets might have been terminated by reaching the character limit you supplied, rather than by finding a newline character.
So you are certainly better off using the POSIX getline interface, which has two advantages: (a) it allocates storage for the line itself, so you don't need to guess a maximum length in advance, and (b) it tells you exactly how many characters it read, so you don't have to count them.
Using that information, you can then remove a \n which happens to be at the end of the line, if there is one, and then remove the preceding \r, if there is one:
char* line = NULL;
size_t n_line = 0;
for (;;) {
ssize_t n_read = getline(&line, &n_line, mapper_fp);
if (n_read < 0) break; /* EOF or some kind of read error */
if (n_read > 0 && line[n_read - 1] == '\n')
line[--n_read] = 0;
if (n_read > 0 && line[n_read - 1] == '\r')
line[--n_read] = 0;
if (n_read == 0) continue; /* blank line */
/* Handle the line read */
}
if (ferror(mapper_fp))
perror("Error reading mapper file");
free(line);
I am using GCC 6.4 and my printf statement is as follows:
printf("Initial Colour: RGB(%s,%s,%s)\n",userdata[k],userdata[k+1],userdata[k+2]);
It prints )nitial Colour: RGB(1,0.0 which is wrong. Note where the second bracket is printed.
It should be Initial Colour: RGB(1,0.0) as expected.
If I use GCC 4.5 with the same printf, it prints as expected.
What should my printf look like?
You are reading a data file that was created on a Windows system, so its lines are terminated by \r\n (carriage return, line feed). Either you are processing this file on that same Windows system but opening it in binary ("rb") mode, or you are transferring the file to a Unix or Linux (or Mac) system and processing it there, having transferred it in a "binary" mode that preserves the CRLF without converting it to the Unix single-newline ('\n') convention.
Then, you are reading lines of text, perhaps with fgets. You are discarding the newline ('\n'), but you are not discarding the carriage return ('\r'). So, each line ends with \r.
Then, you are splitting up the line into fields userdata[0], userdata[1], userdata[2], ... . I'm not sure if you're splitting it up at commas or at whitespace, but in any case, the \r is remaining attached to the last field.
Finally, when you print out that last field userdata[k+2], that carriage return at the end of it is causing the cursor to return to the beginning of the line before the final ) is printed.
You can fix this in several ways:
1. Don't create the file with \r\n in the first place.
2. If processing the file on a Windows system, open it in text ("r" or maybe "rt") mode, not binary.
3. If transferring files from a Windows to a non-Windows system, use a "text" transfer mode that converts line endings.
4. When reading lines from the file, strip off trailing \r as well as \n characters.
5. If splitting fields on whitespace, include '\r' in the set of whitespace characters to split on. For example, if you are calling strtok, with separators " " or " \t", change to " \r" or " \t\r".
Now that you've posted code, I can be more specific.
To achieve #4, add the line
size_t n = strlen(buffer); /* guard against an empty line */
if (n > 0 && buffer[n - 1] == '\r') buffer[n - 1] = '\0';
after the line where you strip off the \n.
To achieve #5, change your two strtok calls to
data = strtok(buffer, " \r");
and
data = strtok(NULL, " \r");
As a matter of fact, you could also change those two lines to
data = strtok(buffer, " \r\n");
and
data = strtok(NULL, " \r\n");
and then you wouldn't need the newline-stripping step at all.
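Put together, the splitting step might look something like this. It is a sketch only: split_fields is a name invented here, and it also treats tabs as separators, which your code may or may not want:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place on spaces, tabs, CR and
   LF. Stores up to max_fields token pointers in fields and returns
   how many were stored. The tokens point into line, which strtok
   modifies. */
size_t split_fields(char *line, char *fields[], size_t max_fields)
{
    static const char seps[] = " \t\r\n";
    size_t n = 0;
    char *tok = strtok(line, seps);

    while (tok != NULL && n < max_fields) {
        fields[n++] = tok;
        tok = strtok(NULL, seps);
    }
    return n;
}
```

Because '\r' and '\n' are in the separator set, the last field of a line such as "1 0 0.0\r\n" comes out as plain "0.0", with nothing left to drag the cursor back when it is printed.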
One more thing: your usage of feof is wrong. See Why is while(!feof (fp)) always wrong?.
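The short version is that feof only becomes true after a read has already hit end of file, so a loop controlled by !feof(fp) typically runs its body one extra time. Testing the return value of the read call avoids that. Here is a minimal sketch; count_lines is a made-up example function, not something from your program:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: count the lines in fp, driving the loop
   from the return value of fgets rather than from feof.
   A final line without a trailing newline is still counted.
   Returns -1 if a read error occurred. */
long count_lines(FILE *fp)
{
    char buf[256];
    long lines = 0;
    int partial = 0;   /* nonzero while inside an unfinished line */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL) {
            lines++;
            partial = 0;
        } else {
            partial = 1;   /* long line or last line without '\n' */
        }
    }
    if (ferror(fp))
        return -1;
    if (partial)
        lines++;
    return lines;
}
```

After the loop ends, feof and ferror are still useful for telling apart "reached the end" from "something went wrong".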
:~$ wc -l bitmap.bmp
12931 bitmap.bmp
I would guess a binary file is like a stream, with no lines on it. So what does it mean when you talk about lines in a binary file?
(note: "wc -l" counts the lines in a file)
Alex Taylor pointed out below, as I suspected, that wc is counting the number of \n chars in the file.
So the question becomes:
Are the '\n' characters that wc finds there randomly, produced when it translates binary to text, or do they actually exist in the binary file, as something like b'\n' in Python? And if so, why would someone use the newline char in a binary file?
It's the number of new line characters ('\n') in the data.
Looking at the source code for macOS's wc, we see the following code:
if (doline) {
while ((len = read(fd, buf, buf_size))) {
if (len == -1) {
warn("%s: read", file);
(void)close(fd);
return (1);
}
charct += len;
for (p = buf; len--; ++p)
if (*p == '\n')
++linect;
}
It does a buffered read of the file, then loops through the data, incrementing a counter if it finds a '\n'.
The GNU version of wc contains similar code:
/* Increase character and, if necessary, line counters */
#define COUNT(c) \
ccount++; \
if ((c) == '\n') \
lcount++;
As to why a binary file has newline characters in it, they are just another byte value (0x0A on the most common OSes). There is nothing special about the character unless the file is being interpreted as a text file. Likewise, tabs, digits and all the other 'text' characters will also appear in a binary file. This is why using cat on a binary file can cause a terminal to beep wildly: it's trying to display the BEL character (0x07). Text is only text by convention.
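Stripped of the file handling, the counting both implementations do boils down to something like the following. This is an illustrative sketch, not code taken from either wc; count_newlines is a name made up here:

```c
#include <stddef.h>

/* Hypothetical helper: count how many bytes in data have the value
   0x0A, regardless of whether the data is "text" or not. */
size_t count_newlines(const unsigned char *data, size_t len)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (data[i] == 0x0A)
            count++;
    }
    return count;
}
```

A buffer holding arbitrary binary data will contain 0x0A wherever that value happens to occur, for example inside a pixel value or a length field, and each occurrence adds one to the "line" count.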
I'm reading a normal text file and writing all the words as numbers to another text file. When a line finishes, the program looks for a newline character (\n) and continues from the new line. In Ubuntu it executes perfectly, but in Windows (DevC++) it doesn't work. My problem is that the text I read in Windows doesn't seem to have newline characters. Even when I insert new lines by hand, my program cannot see them. When I print the character at the end of the line, it says it is a space (ASCII 32), and I am sure that I am at the end of the line. Here is my end-of-line check; what can I do to fix it? I also read about a character called carriage return (\r), but handling it doesn't fix my problem either.
c = fgetc(fp);
printf("%d", c);
fseek(fp, -1, SEEK_SET);
if(c == '\n' || c == '\r')
fprintf(fp3, "%c%c", '\r', '\n');
If you are opening a text file and want newline conversions to take place, open the file in "r" mode instead of "rb":
FILE *fp = fopen(fname, "r");
This opens the file in text mode instead of binary mode, which is what you want for text files. On Linux there won't appear to be a difference, but on Windows, \r\n will be translated to \n.
A possible solution, it seems, is to read the numbers out of your file into an int variable directly,
int n;
fscanf(fp, "%d", &n);
unless the newline means something significant to you.
There are a couple of questions here
What is the difference between windows text newline and unix text newline?
UNIX newline is LF only. ASCII code 0x0a.
Windows newline is CR + LF. ASCII code 0x0d and 0x0a
Does your file have LF or CR?
Use a hex editor to see the contents of the file. I use xxd on Linux.
$ xxd unix.txt
0000000: 0a0a
$ xxd windows.txt
0000000: 0d0a
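If you would rather check from inside a program, something along these lines can classify a buffer. It is a sketch under a few assumptions: the names eol_style and detect_eol are invented here, and on Windows the data must have been read from a file opened in binary mode ("rb"), otherwise the CRs have already been translated away:

```c
#include <stddef.h>

/* Hypothetical classification of the line endings in a buffer. */
enum eol_style { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Hypothetical helper: scan len bytes of data and report which
   kind of line ending it contains. A CR immediately followed by
   LF counts as one CRLF; any other CR counts as a lone CR. */
enum eol_style detect_eol(const char *data, size_t len)
{
    size_t lf = 0, crlf = 0, cr = 0;
    size_t i;
    int kinds;

    for (i = 0; i < len; i++) {
        if (data[i] == '\r') {
            if (i + 1 < len && data[i + 1] == '\n') {
                crlf++;
                i++;           /* skip the LF of this pair */
            } else {
                cr++;
            }
        } else if (data[i] == '\n') {
            lf++;
        }
    }
    kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0) return EOL_NONE;
    if (kinds > 1)  return EOL_MIXED;
    if (lf > 0)     return EOL_LF;
    if (crlf > 0)   return EOL_CRLF;
    return EOL_CR;
}
```

The two xxd dumps above would come out as EOL_LF for unix.txt and EOL_CRLF for windows.txt.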
I have C code that reads one line at a time from a file opened in text mode, using
fgets(buf,200,fin);
The input file that fgets() reads lines from is a command-line argument to the program.
Now fgets leaves the newline character included in the string copied to buf.
Somewhere down the line in the code I check
length = strlen(buf);
For some input files, which I guess were edited in a *nix environment, the newline is just '\n'.
But some other test-case input files (which I guess were edited or created under Windows) have two characters indicating a newline: '\r' '\n'.
I want to remove the newline character(s) and put a '\0' as the string terminator. So I have to do either:
if(len == (N+1))
{
if(buf[length-1] == '\n')
{
buf[length-2] = '\0'; //for a `\r\n` newline
}
}
or
if(len == (N))
{
if(buf[length-1] == '\n')
{
buf[length-1] = '\0'; //for a `\n` newline
}
}
Since the text files are passed as command-line arguments to the program, I have no control over how they are edited or composed, and hence cannot filter them with some tool to make the newlines consistent.
How can I handle this situation?
Is there any fgets-equivalent function in the standard C library (no extensions) that can handle these inconsistent newline characters and return a string without them?
I like to update length at the same time
if (length > 0 && buf[length - 1] == '\n') buf[--length] = 0;
if (length > 0 && buf[length - 1] == '\r') buf[--length] = 0;
or, to remove all trailing whitespace
/* remember to #include <ctype.h> */
while ((length > 0) && isspace((unsigned char)buf[length - 1])) {
buf[--length] = 0;
}
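If you only want standard library calls, strcspn can do the job in one step by finding the first '\r' or '\n' and cutting the string there. A minimal sketch follows; chomp is just a name picked for this example, and it assumes a line never contains a CR or LF that you want to keep:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: terminate buf at the first '\r' or '\n'.
   If neither is present, strcspn returns the string length and the
   assignment just rewrites the existing terminator.
   Returns the new length. */
size_t chomp(char *buf)
{
    size_t len = strcspn(buf, "\r\n");

    buf[len] = '\0';
    return len;
}
```

Used as length = chomp(buf); right after fgets, this gives the same result for "\n" and "\r\n" endings, and it is safe on an empty string.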
I think your best (and easiest) option is to write your own strlen function:
size_t zstrlen(char *line)
{
char *s = line;
while (*s && *s != '\r' && *s != '\n') s++;
*s = '\0';
return (s - line);
}
Now, to calculate the length of the string excluding the newline character(s) and eliminating it(/them) you simply do:
fgets(buf,200,fin);
length = zstrlen(buf);
It works for Unix style ('\n'), Windows style ('\r\n') and old Mac style ('\r').
Note that there are faster (but non-portable) implementations of strlen that you can adapt to your needs.
Hope it helps,
RD:
If you are troubled by the different line endings (\n and \r\n) on different machines, one way to neutralize them is the dos2unix command (assuming you are working on Linux and have files edited in a Windows environment). That command replaces all Windows-style line endings with Linux-style line endings. The reverse, unix2dos, also exists. You can call these utilities from within the C program (with system, maybe) and then process each line as you currently do. This would reduce the burden on your program.
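If shelling out is not an option, roughly the same transformation can be applied to a buffer in memory. This is only a sketch of the idea, not how dos2unix itself is implemented; crlf_to_lf is a hypothetical name:

```c
#include <stddef.h>

/* Hypothetical helper: rewrite s in place so that every "\r\n"
   pair becomes a single "\n". A '\r' that is not followed by '\n'
   is left untouched. Returns the new length. */
size_t crlf_to_lf(char *s)
{
    size_t r = 0;  /* read position */
    size_t w = 0;  /* write position, never ahead of r */

    while (s[r] != '\0') {
        if (s[r] == '\r' && s[r + 1] == '\n') {
            r++;   /* drop the CR; the LF is copied next time round */
            continue;
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

Since the output is never longer than the input, the conversion can safely reuse the same buffer.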