End of line character / Carriage return - c

I'm reading a normal text file and writing all of its words as numbers to another text file. When a line finishes, the program looks for a newline character (\n) and continues from the next line. On Ubuntu it runs perfectly, but on Windows (DevC++) it does not work. My problem is that the text file I read on Windows does not seem to contain newline characters. Even if I insert line breaks by hand, my program cannot see them. When I print the character at the end of the line, it reports a space (ASCII 32), and I am sure that I am at the end of the line. Here is my end-of-line check; what can I do to fix it? I also read about a character called carriage return (\r), but checking for it doesn't fix my problem either.
c = fgetc(fp);
printf("%d", c);
fseek(fp, -1, SEEK_SET);
if(c == '\n' || c == '\r')
fprintf(fp3, "%c%c", '\r', '\n');

If you are opening a text file and want newline conversions to take place, open the file in "r" mode instead of "rb":
FILE *fp = fopen(fname, "r");
This opens the file in text mode instead of binary mode, which is what you want for text files. On Linux there is no visible difference, but on Windows each \r\n is translated to \n as you read.
Another possible solution is to read the numbers from your file directly into an int variable:
int n;
fscanf(fp, "%d", &n);
unless the newline means something significant to you.
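If the program has to cope with files from either system no matter how the stream was opened, one option is to treat "\r\n", a lone '\r', and a lone '\n' all as a single line break while copying. The following is only a minimal sketch of that idea; the function name copy_normalizing_eol is made up for illustration and does not come from the question.

```c
#include <stdio.h>

/* Copy `in` to `out`, writing one '\n' for every "\r\n", lone '\r',
 * or lone '\n' found in the input. Returns the number of line breaks.
 * Hypothetical helper for illustration. */
long copy_normalizing_eol(FILE *in, FILE *out)
{
    long breaks = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            /* Swallow the '\n' of a "\r\n" pair; otherwise put the
             * character back so the next iteration handles it. */
            if (next != '\n' && next != EOF)
                ungetc(next, in);
            c = '\n';
        }
        if (c == '\n')
            breaks++;
        fputc(c, out);
    }
    return breaks;
}
```

If `out` is a text-mode stream on Windows, each '\n' written here should come out as "\r\n" in the file, so there is no need to write '\r' explicitly.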

There are a couple of questions here.
What is the difference between a Windows text newline and a Unix text newline?
A Unix newline is LF only: ASCII code 0x0a.
A Windows newline is CR + LF: ASCII codes 0x0d and 0x0a.
Does your file have LF or CR?
Use a hex editor to see the contents of the file. I use xxd on Linux.
$ xxd unix.txt
0000000: 0a0a
$ xxd windows.txt
0000000: 0d0a
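If no hex editor is at hand, roughly the same check can be done from C by opening the file in binary mode ("rb") and classifying the bytes. This is a sketch under that assumption; the struct eol_counts and the function count_eols are names invented here.

```c
#include <stdio.h>

/* Tallies of the line-ending styles seen in a stream (illustrative type). */
struct eol_counts {
    long crlf;    /* "\r\n" pairs (Windows style) */
    long lf;      /* '\n' not preceded by '\r' (Unix style) */
    long lone_cr; /* '\r' not followed by '\n' */
};

/* Scan a stream that was opened in binary mode and classify every
 * line ending. Hypothetical helper for illustration. */
struct eol_counts count_eols(FILE *fp)
{
    struct eol_counts n = {0, 0, 0};
    int prev = 0, c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r')
                n.crlf++;
            else
                n.lf++;
        } else if (prev == '\r') {
            n.lone_cr++;
        }
        prev = c;
    }
    if (prev == '\r')
        n.lone_cr++; /* stream ends with a bare '\r' */
    return n;
}
```

A file where only crlf is non-zero was most likely written on Windows; one where only lf is non-zero uses Unix line endings.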

Related

Stripping newline character causes problems reading batchfile in c

FILE *in_file = fopen("batchfile", "r"); // read only
while( fgets (buf, MAX_BUFFER, in_file)!=NULL )
{
    buf[strcspn(buf, "\n")] = 0;
    printf("%s ", buf);
}
When I run this code without the buf[strcspn(buf, "\n")] = 0; line, it runs properly and prints out the 3 lines of text in my batchfile. When I add this line to strip the newline character, it only prints the last line of my file. Does anyone know why this is happening?
Windows represents line breaks as the two-character sequence CR, LF (carriage return, line feed). Most other systems represent line breaks as a simple LF. In C, a CR is '\r' and a LF is '\n'.
When you open a text file on Windows (fopen without the b flag), the CR-LF sequence is converted into a simple '\n'. When you open a text file on Unix-like systems (including Linux, macOS, and some ports of Unix development tools to Windows), a CR is just an ordinary character. So if you read a Windows text file on Unix and you want to remove the line breaks, you need to remove both the '\n' and the preceding '\r'.
Many terminals interpret LF as “go to the beginning of the next line” and CR as “go to the beginning of the current line”. So if you read and print a Windows text file, the CR characters are effectively invisible. If you read and print a Unix text file but remove the LF characters, the lines come out concatenated together without breaks. If you read and print a Windows text file, and you remove the LF characters but you keep the CR characters, each line overwrites the previous line. You can watch that happening stepping through your program in a debugger, or by adding fflush(stdout); sleep(1); after each printf call.
To support both Unix and Windows line endings, check if there's a '\r' before the '\n' and remove it.
size_t len = strlen(buf);
if (len > 0 && buf[len-1] == '\n') buf[--len] = 0;
if (len > 0 && buf[len-1] == '\r') buf[--len] = 0;
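An alternative that stays close to the strcspn call in the question is to cut the string at the first '\r' or '\n', whichever comes first. A minimal sketch follows; the function name chomp is hypothetical. Note that, unlike the length-based version above, it also truncates at a stray '\r' in the middle of a line.

```c
#include <string.h>

/* Truncate `s` in place at the first '\r' or '\n' and return `s`.
 * If neither is present, strcspn returns strlen(s) and the existing
 * terminator is simply overwritten with itself. Hypothetical helper. */
char *chomp(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
    return s;
}
```

In the loop from the question this would replace the existing strcspn line, for example chomp(buf); followed by the printf.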

using brackets in printf not working correctly

I am using GCC 6.4 and my printf statement is as follows:
printf("Initial Colour: RGB(%s,%s,%s)\n",userdata[k],userdata[k+1],userdata[k+2]);
It prints )nitial Colour: RGB(1,0.0 which is wrong. Note where the second bracket is printed.
It should be Initial Colour: RGB(1,0.0) as expected.
If I use GCC 4.5 with the same printf, it prints as expected.
What should my printf look like?
You are reading a data file that was created on a Windows system, where lines are terminated by \r\n (carriage return, line feed). Either you are processing this file on that same Windows system but opening it in binary ("rb") mode, or you are transferring it to a Unix or Linux (or Mac) system in a "binary" mode that preserves the CRLF instead of converting it to the Unix single-newline ('\n') convention, and processing it there.
Then, you are reading lines of text, perhaps with fgets. You are discarding the newline ('\n'), but you are not discarding the carriage return ('\r'). So, each line ends with \r.
Then, you are splitting up the line into fields userdata[0], userdata[1], userdata[2], ... . I'm not sure if you're splitting it up at commas or at whitespace, but in any case, the \r is remaining attached to the last field.
Finally, when you print out that last field userdata[k+2], that carriage return at the end of it is causing the cursor to return to the beginning of the line before the final ) is printed.
You can fix this in several ways:
1. Don't create the file with \r\n in the first place.
2. If processing the file on a Windows system, open it in text ("r" or maybe "rt") mode, not binary.
3. If transferring files from a Windows to a non-Windows system, use a "text" transfer mode that converts line endings.
4. When reading lines from the file, strip off trailing \r as well as \n characters.
5. If splitting fields on whitespace, include '\r' in the set of whitespace characters to split on. For example, if you are calling strtok, with separators " " or " \t", change to " \r" or " \t\r".
Now that you've posted code, I can be more specific.
To achieve #4, add the line
if (buffer[0] != '\0' && buffer[strlen(buffer) - 1] == '\r') buffer[strlen(buffer) - 1] = '\0';
after the line where you strip off the \n.
To achieve #5, change your two strtok calls to
data = strtok(buffer, " \r");
and
data = strtok(NULL, " \r");
As a matter of fact, you could also change those two lines to
data = strtok(buffer, " \r\n");
and
data = strtok(NULL, " \r\n");
and then you wouldn't need the newline-stripping step at all.
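To illustrate the strtok approach on its own, here is a small sketch that splits one line into fields using spaces, tabs, CR and LF as separators. The poster's actual parsing code is not reproduced in this answer, so the function name split_fields and its parameters are assumptions made for the example.

```c
#include <string.h>

/* Split `line` in place on spaces, tabs, CR and LF, storing up to
 * `max` field pointers in `fields`. Returns the number of fields
 * stored. Hypothetical helper for illustration. */
size_t split_fields(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max) {
        fields[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Given a line such as "1 0 0.0\r\n", the last field comes out as "0.0" with no '\r' attached, so the closing bracket in the printf ends up where it belongs.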
One more thing: your usage of feof is wrong. See Why is while(!feof (fp)) always wrong?.

C program to get first word of each line from a .txt file and print that word onto another .txt file: Kind of works but also prints random letters

So we have this file called dictionary1.txt, and it has words with their pronunciations right next to them. What I want to do is get the first word from each line and print it to another txt file that the program creates from scratch. My code does that, but it also prints random Chinese letters in between the English words, and I don't know why.
Here's what the output file looks like: https://imgur.com/a/pZthP
(Pronunciations are separated from the actual words in each line by a blank space in dictionary1.txt)
My code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    char line[100];
    int i = 0;
    FILE* fp1 = fopen("dictionary1.txt", "r");
    FILE* fp2 = fopen("dictionary2.txt", "w");
    if (fp1 == NULL || fp2 == NULL){
        printf("ERROR");
        return -1;
    }
    while (fgets(line, 100, fp1) != NULL){
        while (line[i] != ' '){
            fputc(line[i], fp2);
            i++;
        }
        i=0;
        fputc('\0', fp2);
    }
    return 0;
}
I tried fputc('\n', fp2) as well, but no matter what I did I couldn't get onto the next line in the file I created from scratch. I also can't get rid of all the random Chinese letters.
EDIT: I figured it out. The .txt file I was working on was saved in Unicode formatting, which didn't work well with my program. I turned it into ANSI and now it works like a charm.
\n is not the right line separator on all operating systems and all editors.
If you are editing your txt files in Notepad, try fputs("\r\n", fp2); where \r means carriage return (the cursor returns to the first character of the line) and \n means new line.
Generally speaking, Windows uses "\r\n" as the line separator, and a lone '\n' is not displayed as a line break, at least in older versions of Notepad. Linux and macOS use a single '\n'. You may also want to try fprintf(fp2, "\n");
Check this out
\n and \r seem to work everywhere. Why is line.separator more portable?
If you don't mind using C++, you could try to create an output stream os and write os << endl
Note that on some platforms (Windows in particular) the C library converts '\n' into the operating system's line-ending sequence when writing to a file opened in text mode, whereas on others it is written unchanged.
One more thing: change the while loop condition to line[i] != ' ' && line[i] != '\0' and close the file fp2 using fclose.
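Putting those suggestions together, the copying loop could look roughly like the sketch below. It assumes the input is plain single-byte text (not the Unicode encoding mentioned in the question's edit) and that each line fits in the 100-byte buffer; the function name copy_first_words is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first whitespace-delimited word of every line of `in` to
 * `out`, one word per line. Returns the number of words written.
 * Hypothetical helper; a line longer than the buffer is read in
 * several pieces and each piece is treated as its own line. */
long copy_first_words(FILE *in, FILE *out)
{
    char line[100];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* The word ends at the first space, tab, CR, LF, or at the
         * terminating '\0' if none of those is present. */
        size_t len = strcspn(line, " \t\r\n");
        if (len == 0)
            continue; /* blank line or leading space: nothing to copy */
        fwrite(line, 1, len, out);
        fputc('\n', out);
        count++;
    }
    return count;
}
```

Because the word boundary is found with strcspn, the loop cannot run past the end of a line that contains no space, and a trailing '\r' from a Windows file is never copied into the output.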
.txt file was saved using Unicode formatting. I turned it into ANSI and everything was suddenly fixed.

Reading \n as really Feed Line character from text file in C

I'm trying to read a text file with C. The text file is a simple language file used on an embedded device, and EACH LINE of the file corresponds to an ENUM value on the code side. Here is a small part of my file:
SAMPLE FROM TEXT FILE :
OPERATION SUCCESS!
OPERATION FAILED!\nRETRY COUNT : %d
ENUM :
typedef enum
{
...
MESSAGE_VALID_OP,
MESSAGE_INVALID_OP_WITH_RETRY_COUNT
...
}
Load Strings :
typedef struct
{
char *str;
} Message;
int iTotalMessageCount = 1012;
void vLoadLanguageStrings()
{
    FILE *xStringList;
    char * tmp_line_message[256];
    size_t len = 0;
    ssize_t read;
    int message_index = 0;
    xStringList = fopen("/home/change/strings.bin", "r");
    if (xStringList == NULL)
        exit(EXIT_FAILURE);
    mMessages = (Message *) malloc(iTotalMessageCount * sizeof(Message));
    while ((read = fgets(tmp_line_message, 256, xStringList)) != -1 && message_index < iTotalMessageCount)
    {
        mMessages[message_index].str = (char *) malloc(strlen(tmp_line_message));
        memcpy(mMessages[message_index].str, tmp_line_message, strlen(tmp_line_message) -1);
        message_index++;
    }
    fclose(xStringList);
}
As you can see in the sample from the text file, I have to use the \n line feed character on some of my lines. I can read the file successfully, but when I display a text that contains \n, the line feed is just printed on the device screen as the two characters \ and n.
I have already tried the getline(...) function. How can I handle the \n character without raising the complexity, while still reading the file line by line?
"As you can see in the sample from the text file, I have to use the \n line feed character on some of my lines."
No, I don't see that. Or at least, I don't see you doing that. The two-character sequence \n is significant primarily to the C compiler; it has no inherent special significance in data files, whether those files are consumed by a C program or not.
Indeed, if the system recognizes line feeds as line terminators, then by definition, it is impossible to embed a literal line feed in a physical line. What it looks like you are trying to do is to encode line feeds as the "\n" character sequence. That's fine, but it's quite a different thing from embedding a line feed character itself.
"I can read the file successfully, but when I display a text that contains \n, the line feed is just printed on the device screen as the two characters \ and n."
Of course. Those are the characters you read in (not a line feed), so if you write them back out then you reproduce them. If you are encoding line feeds via that character sequence, then your program must decode that sequence if you want it to output literal line feeds in its place.
"I have already tried the getline(...) function. How can I handle the \n character without raising the complexity, while still reading the file line by line?"
You need to process each line you read to decode the \n sequences in it. I would write a function for that. Either way, however, your program will be more complex, because the current version simply doesn't do all the things it needs to do.
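Such a decoding function might look like the sketch below. It only handles the backslash + n sequence (no other escapes), works in place because the decoded text is never longer than the input, and uses a made-up name, decode_newlines.

```c
/* Replace each two-character sequence backslash + 'n' in `s` with a
 * single line feed, in place, and return `s`. A backslash followed by
 * anything else is copied unchanged. Hypothetical helper name. */
char *decode_newlines(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'n') {
            *dst++ = '\n';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```

It would be called on each line after it has been read and copied, for example on mMessages[message_index].str once that string is properly terminated.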

What does a line count of a binary file mean?

:~$ wc -l bitmap.bmp
12931 bitmap.bmp
I would guess a binary file is like a stream, with no lines on it. So what does it mean when you talk about lines in a binary file?
(note: "wc -l" counts the lines in a file)
Alex Taylor pointed out below, as I suspected, that wc is counting the number of \n chars in the file.
So the question becomes:
Are the '\n' characters that wc finds produced at random when it translates binary to text, or do they actually exist in the binary file, as something like b'\n' (in Python)? And if so, why would someone use the newline character in a binary file?
It's the number of newline characters ('\n') in the data.
Looking at the source code for macOS's wc, we see the following code:
if (doline) {
    while ((len = read(fd, buf, buf_size))) {
        if (len == -1) {
            warn("%s: read", file);
            (void)close(fd);
            return (1);
        }
        charct += len;
        for (p = buf; len--; ++p)
            if (*p == '\n')
                ++linect;
    }
It does a buffered read of the file, then loops through the data, incrementing a counter if it finds a '\n'.
The GNU version of wc contains similar code:
/* Increase character and, if necessary, line counters */
#define COUNT(c)       \
    ccount++;          \
    if ((c) == '\n')   \
        lcount++;
As to why a binary file has newline characters in it, they are just another byte value (0x0A on most common operating systems). There is nothing special about the character unless the file is being interpreted as a text file. Likewise, tabs, digits and all the other 'text' characters will also appear in a binary file. This is why using cat on a binary file can cause a terminal to beep wildly: it is trying to display the BEL character (0x07). Text is only text by convention.
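To make that concrete, the same test can be applied to any block of bytes, whether or not it came from a text file. This is a small sketch, not code from either wc implementation; the function name count_newline_bytes is invented here.

```c
#include <stddef.h>

/* Count the bytes equal to '\n' (0x0A in ASCII) in an arbitrary
 * buffer, which is the same comparison the wc excerpts above make.
 * Hypothetical helper for illustration. */
size_t count_newline_bytes(const unsigned char *data, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;
    return count;
}
```

Run over pixel data, every byte that happens to hold the value 0x0A is counted as a "line", which is all that the number reported for bitmap.bmp means.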

Resources