What does a line count of a binary file mean? - file

:~$ wc -l bitmap.bmp
12931 bitmap.bmp
I would guess a binary file is like a stream, with no lines on it. So what does it mean when you talk about lines in a binary file?
(note: "wc -l" counts the lines in a file)
Alex Taylor pointed out below, as I suspected, that wc is counting the number of \n chars in the file.
So the question becomes:
Are the '\n' characters that wc finds produced randomly when it translates binary to text, or do they actually exist in the binary file, as something like b'\n' (in Python)? And if so, why would someone use the newline char in a binary file?

It's the number of new line characters ('\n') in the data.
Looking at the source code for macOS's wc, we see the following code:
if (doline) {
    while ((len = read(fd, buf, buf_size))) {
        if (len == -1) {
            warn("%s: read", file);
            (void)close(fd);
            return (1);
        }
        charct += len;
        for (p = buf; len--; ++p)
            if (*p == '\n')
                ++linect;
    }
It does a buffered read of the file, then loops through the data, incrementing a counter if it finds a '\n'.
The GNU version of wc contains similar code:
/* Increase character and, if necessary, line counters */
#define COUNT(c)       \
    ccount++;          \
    if ((c) == '\n')   \
        lcount++;
As to why a binary file has new line characters in it, they are just another byte value (0x0A on most common operating systems). There is nothing special about the character unless the file is being interpreted as a text file. Likewise, tabs, digits and all the other 'text' characters will also appear in a binary file. This is why using cat on a binary file can cause a terminal to beep wildly: it's trying to display the BEL character (0x07). Text is only text by convention.
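To make the "just another value" point concrete, here is a minimal sketch of the same counting logic applied to an in-memory buffer. The function name count_newline_bytes is made up for illustration; it is not taken from either wc implementation.

```c
#include <stddef.h>

/* Count the bytes equal to '\n' (0x0A on ASCII-based systems) in an
   arbitrary buffer. This is the same test wc -l applies; the data does
   not have to be text. Hypothetical helper, for illustration only. */
size_t count_newline_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            count++;
    }
    return count;
}
```

Called on a few bytes of pixel data that happen to include 0x0A twice, it reports two "lines", even though nothing in the data was ever meant as a line break.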

Related

fopen failing on variable filepath

This function is passed the path of a text file(mapper_path) which contains paths to other text files on each line. I am supposed to open the mapper_path.txt file, then open and evaluate each of the paths within it (example in output).
fopen succeeds on the mapper_path file but fails on the paths which it contains.
In the failure condition, it prints the EXACT path I'm trying to open.
I'm working in C on windows and running commands on Ubuntu subsystem.
How can I properly read and store the sub-path into a variable to open it?
SOLVED with Rici's suggestion!
int processText(char * mapper_path, tuple * letters[])
{
    char line[LINE_SIZE];
    char txt_path[MAX_PATH];
    FILE * mapper_fp = fopen(mapper_path, "r");
    if(!mapper_fp)
    {
        printf("Failed to open mapper path: %s \n", mapper_path);
        return -1;
    }
    //!!! PROBLEM IS HERE !!!
    while(fgets(txt_path, MAX_PATH, mapper_fp))
    {
        //remove newline character from end
        txt_path[strlen(txt_path)-1] = 0;
        //open each txt file path, return -1 if it fails
        FILE* fp = fopen(txt_path, "r");
        if(!fp)
        {
            printf("Failed to open file path:%s\n", txt_path);
            return -1;
        }
        //...more unimportant code
prints:
Failed to open filepath:
/mnt/c/users/adam/documents/csci_4061/projects/blackbeards/testtext.txt
This is the exact path of the file i am trying to open.
I suspect that the problem is related to this:
I'm working in C on windows and running commands on Ubuntu subsystem.
Presumably, you created the mapper.txt file using Windows tools, so it has Windows line endings. However, I think the Ubuntu subsystem does not know about Windows line endings, and so even though you open the file in mode 'r', it does not translate CR-LF into a single \n. When you then remove the \n at the end of the input, you still leave the \r.
That \r won't be visible when you print out the line, since all it does is move the cursor to the beginning of the line and the next character output is a \n. It's usually a good idea to surround strings with other text when you print debugging messages, since that can give you a clue about this sort of problem. If you'd used:
printf("Failed to open file path: '%s'\n", txt_path);
you might have seen the error:
'ailed to open file path: '/mnt/c/users/adam/documents/csci_4061/projects/blackbeards/testtext.txt
Here, the hint that there is a \r at the end of the string is the overwriting of the first character of the message with the trailing apostrophe.
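If quoting the string is not enough, another option is to print it with control characters made visible. The following is a minimal sketch; print_visible is a hypothetical helper, not part of any standard library.

```c
#include <stdio.h>
#include <ctype.h>

/* Write s to out with non-printable bytes shown as \r, \n, \t or \xHH.
   Returns the number of bytes that had to be escaped.
   Hypothetical debugging helper, for illustration only. */
int print_visible(FILE *out, const char *s)
{
    int escaped = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '\r') {
            fputs("\\r", out);
            escaped++;
        } else if (c == '\n') {
            fputs("\\n", out);
            escaped++;
        } else if (c == '\t') {
            fputs("\\t", out);
            escaped++;
        } else if (isprint(c)) {
            fputc(c, out);
        } else {
            fprintf(out, "\\x%02X", c);
            escaped++;
        }
    }
    return escaped;
}
```

With this, a path that still carries a stray carriage return is printed with a literal \r at the end instead of silently moving the cursor.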
It's not quite accurate to say that fgets "adds a \n character to the end [of the line read]." It's more accurate to say that it doesn't remove that character, if it is present. It is quite possible that there isn't a newline at the end of the line. The line may be the last line in a text file which doesn't end with a newline character, for example. Or the fgets might have been terminated by reaching the character limit you supplied, rather than by finding a newline character.
So you are certainly better off using the getline interface, which has two advantages: (a) it allocates storage for the line itself, so you don't need to guess a maximum length in advance, and (b) it tells you exactly how many characters it read, so you don't have to count them.
Using that information, you can then remove a \n which happens to be at the end of the line, if there is one, and then remove the preceding \r, if there is one:
char* line = NULL;
size_t n_line = 0;
for (;;) {
    ssize_t n_read = getline(&line, &n_line, mapper_fp);
    if (n_read < 0) break; /* EOF or some kind of read error */
    if (n_read > 0 && line[n_read - 1] == '\n')
        line[--n_read] = 0;
    if (n_read > 0 && line[n_read - 1] == '\r')
        line[--n_read] = 0;
    if (n_read == 0) continue; /* blank line */
    /* Handle the line read */
}
if (ferror(mapper_fp))
    perror("Error reading mapper file");
free(line);
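If you have to stay with fgets (getline is POSIX, not standard C), a common idiom is to cut the buffer at the first line-ending character with strcspn. This is only a sketch: chomp is a made-up name, and it assumes a '\r' never appears in the middle of a line you want to keep.

```c
#include <string.h>

/* Truncate s at the first '\r' or '\n', whichever comes first.
   Safe when there is no line ending at all: strcspn then returns
   strlen(s) and the existing terminator is simply rewritten.
   Returns the resulting length. Hypothetical helper name. */
size_t chomp(char *s)
{
    size_t len = strcspn(s, "\r\n");
    s[len] = '\0';
    return len;
}
```

Unlike txt_path[strlen(txt_path)-1] = 0, this does not chop a real character off a final line that has no newline, and it does not index out of bounds on an empty string.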

y with umlaut in file

I'm working on an example problem where I have to reverse the text in a text file using fseek() and ftell(). I was successful, but printing the same output to a file, I had some weird results.
The text file I input was the following:
redivider
racecar
kayak
civic
level
refer
These are all palindromes
The result in the command line works great. In the text file that I create however, I get the following:
ÿsemordnilap lla era esehTT
referr
levell
civicc
kayakk
racecarr
redivide
I am aware from the answer to this question that ÿ corresponds to the text file version of EOF in C. I'm just confused as to why the command line and text file outputs are different.
#include <stdio.h>
#include <stdlib.h>
/**********************************
This program is designed to read in a text file and then reverse the order
of the text.
The reversed text then gets output to a new file.
The new file is then opened and read.
**********************************/
int main()
{
    //Open our files and check for NULL
    FILE *fp = NULL;
    fp = fopen("mainText.txt","r");
    if (!fp)
        return -1;
    FILE *fnew = NULL;
    fnew = fopen("reversedText.txt","w+");
    if (!fnew)
        return -2;
    //Go to the end of the file so we can reverse it
    int i = 1;
    fseek(fp, 0, SEEK_END);
    int endNum = ftell(fp);
    while(i < endNum+1)
    {
        fseek(fp,-i,SEEK_END);
        printf("%c",fgetc(fp));
        fputc(fgetc(fp),fnew);
        i++;
    }
    fclose(fp);
    fclose(fnew);
    fp = NULL;
    fnew = NULL;
    return 0;
}
No errors, I just want identical outputs.
The outputs are different because your loop reads two characters from fp per iteration.
For example, in the first iteration i is 1 and so fseek sets the current file position of fp just before the last byte:
...
These are all palindromes
^
Then printf("%c",fgetc(fp)); reads a byte (s) and prints it to the console. Having read the s, the file position is now
...
These are all palindromes
^
i.e. we're at the end of the file.
Then fputc(fgetc(fp),fnew); attempts to read another byte from fp. This fails and fgetc returns EOF (a negative value, usually -1) instead. However, your code is not prepared for this and blindly treats -1 as a character code. Converted to a byte, -1 corresponds to 255, which is the character code for ÿ in the ISO-8859-1 encoding. This byte is written to your file.
In the next iteration of the loop we seek back to the e:
...
These are all palindromes
^
Again the loop reads two characters: e is written to the console, and s is written to the file.
This continues backwards until we reach the beginning of the input file:
redivider
^
Yet again the loop reads two characters: r is written to the console, and e is written to the file.
This ends the loop. The end result is that your output file contains one character that doesn't exist (from the attempt to read past the end of the input file) and never sees the first character.
The fix is to only call fgetc once per loop:
while(i < endNum+1)
{
    fseek(fp,-i,SEEK_END);
    int c = fgetc(fp);
    if (c == EOF) {
        perror("error reading from mainText.txt");
        exit(EXIT_FAILURE);
    }
    printf("%c", c);
    fputc(c, fnew);
    i++;
}
In addition to @melpomene's correction about using only one fgetc() per loop iteration, other issues exist.
fseek(questionable_offset)
fopen("mainText.txt","r"); opens the file in text mode and not binary mode. Thus using fseek(various_values) with arbitrary offsets into the file is prone to trouble. Usually not a problem on *nix systems.
I do not have a simple alternative.
ftell() return type
ftell() returns long. Use long instead of int for i and endNum. (Not a concern with small files.)
Check return values
ftell() and fseek() can fail. Test for error returns.
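Putting these points together, a reversal routine might look like the sketch below. It assumes the input was opened in binary mode ("rb") on a platform where seeking to SEEK_END works for binary streams (true on POSIX and Windows); reverse_stream is a name invented for this example.

```c
#include <stdio.h>

/* Copy the bytes of in to out in reverse order.
   in should be opened in binary mode ("rb") so that byte offsets are
   meaningful. Returns 0 on success, -1 on any seek, tell, read or
   write failure. Hypothetical helper, for illustration only. */
int reverse_stream(FILE *in, FILE *out)
{
    if (fseek(in, 0L, SEEK_END) != 0)
        return -1;

    long size = ftell(in);          /* ftell returns long, not int */
    if (size < 0)
        return -1;

    for (long pos = size - 1; pos >= 0; pos--) {
        if (fseek(in, pos, SEEK_SET) != 0)
            return -1;
        int c = fgetc(in);          /* exactly one read per iteration */
        if (c == EOF)
            return -1;
        if (fputc(c, out) == EOF)
            return -1;
    }
    return 0;
}
```

Note that on Windows a file with CR LF line endings reversed this way ends up with LF CR pairs, so binary mode trades one problem for another if the result must be a valid text file.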

Reading \n as really Feed Line character from text file in C

I'm trying to read a text file with C. The text file is a simple language file used on an embedded device, and EACH LINE of the file has an ENUM on the code side. Here is a simple part of my file:
SAMPLE FROM TEXT FILE :
OPERATION SUCCESS!
OPERATION FAILED!\nRETRY COUNT : %d
ENUM :
typedef enum
{
...
MESSAGE_VALID_OP,
MESSAGE_INVALID_OP_WITH_RETRY_COUNT
...
}
Load Strings :
typedef struct
{
    char *str;
} Message;
int iTotalMessageCount = 1012;
void vLoadLanguageStrings()
{
    FILE *xStringList;
    char tmp_line_message[256]; /* a char buffer, not an array of pointers */
    int message_index = 0;
    xStringList = fopen("/home/change/strings.bin", "r");
    if (xStringList == NULL)
        exit(EXIT_FAILURE);
    mMessages = (Message *) malloc(iTotalMessageCount * sizeof(Message));
    /* fgets returns NULL (not -1) at end of file or on error */
    while (message_index < iTotalMessageCount && fgets(tmp_line_message, 256, xStringList) != NULL)
    {
        size_t len = strcspn(tmp_line_message, "\n"); /* length without the newline */
        mMessages[message_index].str = (char *) malloc(len + 1);
        memcpy(mMessages[message_index].str, tmp_line_message, len);
        mMessages[message_index].str[len] = '\0'; /* memcpy does not terminate the string */
        message_index++;
    }
    fclose(xStringList);
}
As you can see in the sample from the text file, I have to use the \n line feed character on some of my lines. After all, I read the file successfully. But if I try to display text which contains \n, the line feed is just printed on the device screen as the \ and n characters.
I already tried the getline(...) method. How can I handle the \n character without raising the complexity, while still reading the file line by line?
As you can see in the sample from the text file, I have to use the \n line feed
character on some of my lines.
No, I don't see that. Or at least, I don't see you doing that. The two-character sequence \n is significant primarily to the C compiler; it has no inherent special significance in data files, whether those files are consumed by a C program or not.
Indeed, if the system recognizes line feeds as line terminators, then by definition, it is impossible to embed a literal line feed in a physical line. What it looks like you are trying to do is to encode line feeds as the "\n" character sequence. That's fine, but it's quite a different thing from embedding a line feed character itself.
After all, I read the file successfully.
But if I try to display text which contains \n, the line feed
is just printed on the device screen as the \ and n characters.
Of course. Those are the characters you read in (not a line feed), so if you write them back out then you reproduce them. If you are encoding line feeds via that character sequence, then your program must decode that sequence if you want it to output literal line feeds in its place.
I already tried the getline(...) method. How can I handle the \n character
without raising the complexity, while still reading the file line by line?
You need to process each line read to decode the \n sequences in it. I would write a function for that. Any way around, however, your program will be more complex, because the current version simply doesn't do all the things it needs to do.
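Such a decoding function could be as small as the following sketch, which rewrites the string in place. The name decode_escapes is made up, and treating \\ as an escaped backslash is my own assumption about the file format; drop that branch if the format has no such rule.

```c
#include <stddef.h>

/* Rewrite s in place: the two characters '\' 'n' become one line feed,
   and '\' '\' becomes a single backslash (an assumed convention).
   All other characters are copied unchanged. The result is never
   longer than the input, so no extra storage is needed.
   Returns the new length. Hypothetical helper, for illustration only. */
size_t decode_escapes(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'n') {
            *dst++ = '\n';
            src += 2;
        } else if (src[0] == '\\' && src[1] == '\\') {
            *dst++ = '\\';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Calling it on each line right after reading it, before storing the copy, keeps the rest of the loader unchanged.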

End of line character / Carriage return

I'm reading a normal text file and writing all the words as numbers to another text file. When a line finishes, the program looks for a newline character (\n) and continues from the next line. On Ubuntu it runs perfectly, but on Windows (Dev-C++) it does not work. My problem is that the text I read on Windows doesn't seem to have newline characters. Even if I insert new lines by hand, my program cannot see them. When I print the character at the end of the line, it reports a space (ASCII 32), and I am sure that I am at the end of the line. Here is my end-of-line check; what can I do to fix it? I also read about a character called carriage return (\r), but handling it doesn't fix my problem either.
c = fgetc(fp);
printf("%d", c);
fseek(fp, -1, SEEK_SET);
if(c == '\n' || c == '\r')
fprintf(fp3, "%c%c", '\r', '\n');
If you are opening a text file and want newline conversions to take place, open the file in "r" mode instead of "rb"
FILE *fp = fopen(fname, "r");
This will open the file in text mode instead of binary mode, which is what you want for text files. On Linux there won't appear to be a difference, but on Windows, \r\n will be translated to \n.
A possible solution, it seems, is to read the numbers out of your file into an int variable directly.
int n;
fscanf(fp, "%d", &n);
unless the newline means something significant to you.
There are a couple of questions here
What is the difference between windows text newline and unix text newline?
UNIX newline is LF only. ASCII code 0x0a.
Windows newline is CR + LF. ASCII code 0x0d and 0x0a
Does your file have LF or CR?
Use a hex editor to see the contents of the file. I use xxd on Linux.
$ xxd unix.txt
0000000: 0a0a
$ xxd windows.txt
0000000: 0d0a
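The same check can be done from C on a buffer read in binary mode. This is only a sketch; the struct and function names are invented for the example.

```c
#include <stddef.h>

/* Tally of line-ending styles found in a buffer. Hypothetical type. */
struct eol_counts {
    size_t crlf;  /* CR LF pairs (0x0D 0x0A), Windows style */
    size_t lf;    /* bare LF (0x0A), UNIX style */
    size_t cr;    /* bare CR (0x0D) not followed by LF */
};

/* Count CR LF pairs, bare LF and bare CR bytes in buf.
   The data must come from a stream opened in binary mode ("rb"),
   otherwise Windows has already translated CR LF to LF. */
struct eol_counts count_line_endings(const unsigned char *buf, size_t len)
{
    struct eol_counts n = {0, 0, 0};

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0D) {
            if (i + 1 < len && buf[i + 1] == 0x0A) {
                n.crlf++;
                i++;            /* skip the LF of this pair */
            } else {
                n.cr++;
            }
        } else if (buf[i] == 0x0A) {
            n.lf++;
        }
    }
    return n;
}
```

A file with only crlf counts was written with Windows line endings; one with only lf counts uses UNIX line endings.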

Issue with fread when read from stdin for System generated and Manual Input

The application reads its input from stdin; the function is as follows:
filepos = ftell(stdin);
if (filepos < 0 && errno != 0)
{
    perror("ftell");
    return 1;
}
if ((n = fread(input_data, sizeof(char), 2, stdin)) != 2)
{
    if (n == 1)
    {
        if (*input_data == '\n')
            fprintf(stderr, "Unexpected NL character read\n");
        else if (*input_data== '\r')
            fprintf(stderr, "Unexpected CR character read\n");
        else
            fprintf(stderr, "Unexpected character read <%c>\n", *input_data);
    }
    else if (n != 0 && errno != 0)
    {
        perror("fread");
    }
    return 1;
}
... process data ....
When I run this on system-generated input, the application processes it correctly, but when I run it on the same data created manually, I get the error message "Unexpected NL character read".
$ convertInput < input.system > out
$
$ convertInput < input.manual > out
Unexpected NL character read
$
In both cases the output is correct.
When I did a diff between the two input files, it showed the message as below.
$ diff input.manual input.system
1c1
< INPUT
---
> INPUT
\ No newline at end of file
I have checked the manual input file, and there is no newline after the input either. I am not sure whether the fread itself should be replaced with fgets or something else to fix this.
gdb showed that fread returned 0 after the end of INPUT for input.system, whereas for input.manual it read a "\n" after the end of INPUT.
The manual file was created by running "vim input", pasting the data, removing all characters after the end of the data (including "\n"), and then saving and quitting the editor.
Any suggestions or thoughts on fixing this are appreciated.
Thanks,
The problem depends on the way you create the "manual" file. If you use a text editor (vim), it is normal for it to put a \n at the very end, since it has to complete the last "text" line. I would rather use a binary editor to do the job. As far as I remember, recent versions of vim have a "binary mode". Another way to create a file without the final \n is "echo -en '...your data...' > file". The -n option omits the trailing \n, and -e makes echo interpret backslash sequences (e.g. "\n", "\r"). I hope this helps.
By the way, "ftell" on stdin may return useless values if stdin is not redirected to a real file.
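If you cannot control how the input file is produced, another option is to make the reader tolerate a single trailing newline. The sketch below assumes 2-byte records, as in the fread call in the question; read_record is a hypothetical name, and accepting the lone '\n' as end of input is a policy choice on my part, not something the original program does.

```c
#include <stdio.h>

/* Read one 2-byte record from in.
   Returns  1 if a full record was read,
            0 on clean end of input, including a lone trailing '\n'
              such as the one a text editor appends,
           -1 on a read error or any other leftover byte.
   Hypothetical helper, for illustration only. */
int read_record(FILE *in, char rec[2])
{
    size_t n = fread(rec, 1, 2, in);

    if (n == 2)
        return 1;
    if (ferror(in))
        return -1;
    if (n == 0)
        return 0;                   /* plain end of file */

    /* n == 1: fread stopped at end of file after a single byte */
    return (rec[0] == '\n') ? 0 : -1;
}
```

Checking ferror() rather than errno avoids acting on a stale errno value left over from an earlier, unrelated call.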
