print binary file - c

Say you have a file dog.txt
The
quick
brown
fox
jumps
over
the
lazy
dog
You can print the lines like this
#include <stdio.h>
int
main (void)
{
char buf[10];
FILE *fp = fopen ("dog.txt", "r");
while (fgets (buf, sizeof buf, fp))
printf ("%s", buf);
return 0;
}
But what if each "line" was separated by a null character (\0), instead of a newline (\n)? How would you print each "line" ?

The difference between "text" file handling and any other file handling is that the "text" functions assume certain things (for example, that \n is the separator). If that is not the case for your file, you cannot rely on the "text" functions such as fgets. Read the data with fread and parse the content yourself.

If the file is not a text file (that is, if it contains non-printable ASCII characters), treat it as binary.
Rather than reading a "line" at a time (which is a text file concept), read in a buffer at a time (e.g. 1024 characters at a time).
Output each character that you read one at a time, unless you encounter whatever line delimiter the file uses (e.g. the "null" character in your question). When that character is encountered, output a newline instead.
You open a file in binary mode by including the "b" flag, e.g.
FILE *fp = fopen("dog.txt", "rb");
Use fread to read data one buffer at a time.
n = fread(buffer, sizeof(char), BUFFER_SIZE, source);

This is a trimmed-down version of WhozCraig's deleted answer.
If all you want to do is dump data from the input file to stdout, replacing any embedded null chars (0) with newlines, then just do that. Read-ahead buffering and such is honestly overkill for the simplicity of this problem, and besides, the fopen/fread/etc. family already buffers for you.
Note: this assumes exactly what the OP specified, that this is otherwise a regular "text" file save for the oddity that embedded null chars (0) should be treated as newlines in the output stream:
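#include <stdio.h>
#include <stdlib.h>  /* EXIT_SUCCESS, EXIT_FAILURE */
#include <stdbool.h> /* true */
int main(int argc, char* argv[])
{
    if (argc < 2)
    { // a file name is required
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL)
    { // report why the file could not be opened
        perror(argv[1]);
        return EXIT_FAILURE;
    }
    do
    { // pull next char, break on EOF, subst '\n' on 0.
        int ch = fgetc(fp);
        if (EOF == ch)
            break;
        if (0 == ch)
            ch = '\n';
        fputc(ch, stdout);
    } while (true);
    fclose(fp);
    return EXIT_SUCCESS;
}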

Related

how can i delete a whole line containing a string provided by the user in a file

I need to remove the whole line containing a string the user entered. I'm having a hard time figuring out how to delete this line. Here is what I tried so far.
For example, if u.txt contains:
1 hello world
2 where are you
* user input: you *
u.txt should then contain:
1 hello world
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *tr1, *tr2;
char *p;
char chr;
char scanner [10];
int t = 1;
int sc;
tr1 = fopen("u.txt", "r");
chr = getc(tr1);
while (chr != EOF)
{
printf("%c", chr);
chr = getc(tr1);
}
printf("input your word to search:");
scanf("%s",scanner);
rewind(tr1);
tr2 = fopen("replica.txt", "w");
chr = 'A';
while (chr != EOF)
{
//expect the line to be deleted
sc= strstr(chr,scanner);
if (!sc)
{
//copy all lines in file replica.txt
putc(chr, tr2);
}
fclose(tr1);
fclose(tr2);
remove("u.txt");
rename("replica.txt", "u.txt");
tr1 = fopen("u.txt", "r");
chr = getc(tr1);
while (chr != EOF)
{
printf("%c", chr);
chr = getc(tr1);
}
fclose(tr1);
return 0;
}
You are making things difficult on yourself by using character-oriented input with fgetc() with operations that call for line-oriented input and output. Moreover, in your approach, you would delete every line in the file containing words where you is a lesser-included substring like:
"you", "your", "you're", "yourself", "you've", etc...
For a simplistic way of ensuring you don't remove lines unless they contain the whole word "you", you can simply check that the characters before and after the word are whitespace. You can use the isspace() macro provided in ctype.h to simplify the checks.
When you are faced with a line-oriented problem use line-oriented input and output functions, such as fgets() or POSIX getline() for input and puts() or fputs() for output. The only additional caveat is that line-oriented input functions read and include the trailing '\n' in the buffers they fill. Just remove where necessary by overwriting the trailing '\n' with the nul-terminating character '\0' (or simply 0).
Using a line-oriented approach will greatly simplify your task and provide a convenient way to write the lines you keep to your output file in a single call rather than looping character-by-character. Using a character-oriented approach is not wrong, and it's not inherently slower than a line-oriented approach -- it's just less convenient. Match the tool to the job.
The following example reads "u.txt" and writes "v.txt" with the unwanted lines removed, while preserving lines where the word being sought appears only as a lesser-included substring of a larger word. (The rename() and remove() are left to you, as there is nothing wrong with that part of your existing code.)
Simply using a fixed buffer of sufficient size is the easiest way to do line-oriented input. You will generally know what the largest anticipated line is, just use a buffer that is more than sufficient to handle it (don't skimp!). Most lines are no more than 80-120 characters long. A fixed array of 1024-characters is more than sufficient. Just declare an array to use as a buffer to hold each line read, e.g.:
#define MAXW 32 /* max chars in word to find */
#define MAXC 1024 /* max chars for input buffer */
int main (int argc, char **argv) {
char buf[MAXC], /* buffer to hold each line */
word[MAXW]; /* user input to find/delete line */
size_t wordlen = 0; /* length of user input */
FILE *ifp, *ofp; /* infile & outfile pointers */
You can validate that sufficient arguments were provided on the command line for the input and output filenames and then open and validate each of your files, e.g.
if (argc < 3) { /* validate at least 2 program arguments given */
printf ("usage: %s infile outfile\n", argv[0]); /* strrchr() may return NULL when argv[0] has no '/' */
return 1;
}
if (!(ifp = fopen (argv[1], "r"))) { /* open/validate open for reading */
perror ("fopen-ifp");
return 1;
}
if (!(ofp = fopen (argv[2], "w"))) { /* open/validate open for writing */
perror ("fopen-ofp");
return 1;
}
Now just take the input from the user and remove the trailing '\n'. Using strcspn you can both get the length of the user input and overwrite the trailing '\n' in a single statement. See: man 3 strspn, e.g.
fputs ("enter word to find: ", stdout); /* prompt for input */
fflush (stdout); /* optional (but recommended) */
if (!fgets (word, MAXW, stdin)) { /* read/validate word to find */
fputs ("(user canceled input)\n", stdout);
return 1;
}
/* get wordlen, trim trailing \n from find */
word[(wordlen = strcspn (word, "\n"))] = 0;
Now simply read a line at a time, search for word within the line, and validate that it is a whole word and not part of a larger word. Write all lines that don't contain word alone to your output file:
while (fgets (buf, MAXC, ifp)) { /* read each line */
char *p = strstr (buf, word); /* search for word */
if (!p || /* word not in line */
(p != buf && !isspace ((unsigned char)*(p - 1))) || /* text before */
(!isspace ((unsigned char)p[wordlen]) && p[wordlen] != 0)) /* text after */
fputs (buf, ofp); /* write line to output file */
}
(note: the isspace() macro for checking whether a character is whitespace is provided in the header ctype.h)
Close both files and aside from your remove() and rename() you are done. But, note you should always validate fclose() after a write to ensure you catch any errors associated with flushing the file stream on close that would not otherwise be caught by validating each write individually, e.g.
if (fclose (ofp) == EOF) { /* validate EVERY close-after-write */
perror ("fclose-ofp");
remove (argv[2]);
}
fclose (ifp); /* close input file */
}
Add the required header files and you have a working example. You can put the example to use as follows:
Example Input File
$ cat dat/u.txt
hello world
how are you
show yourself
you're missing
Example Use
$ ./bin/filermline dat/u.txt dat/v.txt
enter word to find: you
Resulting Output File
$ cat dat/v.txt
hello world
show yourself
you're missing
Look things over and compare the line-oriented approach to your use of the character-oriented use of fgetc(). Let me know if you have further questions.
You're not giving chr any meaningful value; it is always 'A', at least for the first loop iteration. Are any lines of code missing? I suggest reading characters until you find an end of line and saving them in a string. Then call strstr() on that string, and if the result is NULL, simply copy the line into another file.
If keeping it in the same file is necessary, it will be more difficult, because you have to save a pointer to the start of each line. If the line you're examining is one that has to be erased, you have to recopy each following line starting from this pointer. It is much harder this way.
Good luck!
PS: Maybe this can help, not sure: How do I delete a specific line from text file in C?

how to read and process utf-8 characters in one char in c from the file

How can I read UTF-8 characters from a file in C and process each one as a single char?
This is my code:
FILE *file = fopen(fileName, "rb");
char *code;
size_t n = 0;
if (file == NULL) return NULL;
fseek(file, 0, SEEK_END);
long f_size = ftell(file);
fseek(file, 0, SEEK_SET);
code = malloc(f_size);
char a,b;
while (!feof(file)) {
fscanf(file, "%c", &a);
code[n++] = a;
// i want to modify "a" (current char) in here
}
code[n] = '\0';
This is the file content:
~”م‘‎iاk·¶;R0ثp9´
-پ‘“گAéI‚sہئzOU,HدلKŒ©َض†ُ­ ت6‘گA=…¢¢³qد4â9àr}hw O‍Uجy.4a³‎M;£´`د$r(q¸Œçً£F 6pG|ںJr(TîsشR
A char commonly holds only 256 different values (1 byte): enough for the ASCII table, or an 8-bit extended table if you treat it as unsigned, but not for every character that UTF-8 can encode, since a single character may take up to 4 bytes. For handling UTF-8 characters I would recommend using another type like wchar_t (if a wide character in your compiler is large enough to hold any Unicode code point), otherwise char32_t if you're using C11 or C++11, or a library such as ICU to deal with your data.
Edit
This example code shows how to deal with UTF-8 in C. Note that you have to make sure that wchar_t in your compiler can store any character your file contains.
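#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#define MAXCH 100000 /* capacity of sentence, including the terminator */
int main(void) {
    wchar_t sentence[MAXCH];
    wint_t ch; /* wint_t, so WEOF can be told apart from every character */
    size_t n=0;
    char*loc = setlocale(LC_ALL, ""); /* set the locale before any wide I/O */
    /* messages go to stderr so that stdout is only used for wide output */
    fprintf(stderr, "Locale set to: %s\n", loc ? loc : "(unchanged)");
    /* ccs=UTF-8 is an extension to the standard mode string */
    FILE *file=fopen("Testing.txt", "r, ccs=UTF-8");
    if(file==NULL){
        fprintf(stderr, "Error processing file\n");
        return EXIT_FAILURE;
    }
    /* stop at end of file (WEOF) or when the buffer is full */
    while(n < MAXCH-1 && (ch = fgetwc(file)) != WEOF){
        /* wprintf(L"%lc", ch); */
        sentence[n]=ch+1; /*Example modification*/
        n++;
    }
    sentence[n]=L'\0'; /* fputws and %ls need a terminated string */
    fclose(file);
    file=fopen("Testing.txt", "w, ccs=UTF-8");
    if(file==NULL){
        fprintf(stderr, "Error processing file\n");
        return EXIT_FAILURE;
    }
    fputws(sentence, file);
    wprintf(L"%ls", sentence);
    fclose(file);
    return 0;
}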
Your system locale
The char*loc = setlocale(LC_ALL, ""); line sets the program's locale from the environment and lets you see which one is in use. Make sure it is a UTF-8 locale if you're using Linux; if you're using Windows then you'll have to stick to one language. This is not a problem if you don't want to print the characters.
How to open the file
First, I opened it for reading as a text file instead of as a binary file. I also have to open the file with UTF-8 formatting (I think on Linux it follows your locale, so the ccs=UTF-8 won't be necessary). Even though on Windows we're stuck with one language, the file still has to be read as UTF-8.
Using compatible functions with the characters
For this we'll use the functions declared in the wchar.h header (like wprintf and fgetwc). The problem with the other functions is that they are limited to the range of a char, giving the wrong value.
I used as an example this:
¿khñà?
hello
~”م‘‎iاk·¶;R0ثp9´ -پ‘“گAéI‚sہئzOU,HدلKŒ©َض†ُ­ ت6‘گA=…¢¢³qد4â9àr}hw O‍Uجy.4a³‎M;£´`د$r(q¸Œçً£F 6pG|ںJr(TîsشR
In the last part of the program it overwrites the file with the accumulated modified string.
You could try changing sentence[n]=ch+1; to sentence[n]=ch; to check in your original file if it reads and outputs the file correctly (and uncomment the wprintf to check the output).

C fopen and fgets returning weird characters instead of file contents

I am doing a coding exercise and I need to open a data file that contains lots of data. It's a .raw file. Before I build my app I open the 'card.raw' file in a texteditor and in a hexeditor. If you open it in textEdit you will see 'bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ' as the first line. (The url points to Rick Roll as a joke by the professor.)
So I start building my app to open the same 'card.raw' file. I'm doing initial checks to see whether the app prints to the console the same "stuff" as when I open it with TextEdit. Instead of printing what I see when I open it with TextEdit (see the text above), it starts and continues printing out text that looks like this:
\377\304 'u\204\206\226\262\302\3227\205\246\266\342GSc\224\225\245\265\305\306\325\326Wgs\244\346(w\345\362\366\207\264\304ǃ\223\227\2678H\247\250\343\344\365\377\304
Now I have no idea what the '\' and numbers are called (what do I search for to read more?), why it's printing that instead of the characters (unicode?) I see when I open in TextEdit, or if I can convert this output to hex or unicode.
My code is:
#include <stdio.h>
#include <string.h>
#include <limits.h>
int main(int argc, const char * argv[]) {
FILE* file;
file = fopen("/Users/jamesgoldstein/CS50/CS50Week4/CS50Recovery/CS50Recovery/CS50Recovery/card.raw", "r");
char output[LINE_MAX];
if (file != NULL)
{
for (int i = 1; fgets(output, LINE_MAX, file) != NULL; i++)
{
printf("%s\n", output);
}
}
fclose(file);
return 0;
}
UPDATED & SIMPLIFIED CODE USING fread()
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[]) {
FILE* fp = fopen("/Users/jamesgoldstein/CS50/CS50Week4/CS50Recovery/CS50Recovery/CS50Recovery/card.raw", "rb");
char output[256];
if (fp == NULL)
{
printf("Bad input\n");
return 1;
}
for (int i = 1; fread(output, sizeof(output), 1, fp) != NULL; i++)
{
printf("%s\n", output);
}
fclose(fp);
return 0;
}
Output is partially correct (here's a snippet of the beginning):
bit.ly/18gECvy
\377\330\377\340
\221\241\26145\301\321\341 "#&23DE\3616BFRTUe\202CVbdfrtv\222\242
'u\204\206\226\262\302\3227\205\246\266\342GSc\224\225\245\265\305\306\325\326Wgs\244\346(w\345\362\366\207\264\304ǃ\223\227\2678H\247\250\343\344\365\377\304
=\311\345\264\352\354 7\222\315\306\324+\342\364\273\274\205$z\262\313g-\343wl\306\375My:}\242o\210\377
3(\266l\356\307T饢"2\377
\267\212ǑP\2218 \344
Actual card.raw file snippet of beginning
bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ
ˇƒÖ
!1AQa$%qÅë°±45¡—· "#&23DEÒ6BFRTUeÇCVbdfrtví¢
I think you should open the .raw file in the mode "rb".
Then use fread()
From the presence of the string "JFIF" in the first line of the file card.raw ("bit.ly/18gECvy ˇÿˇ‡JFIFHHˇ€Cˇ€Cˇ¿Vˇƒ") it seems like card.raw is a JPEG image format file that had the bit.ly URL inserted at its beginning.
You are going to see weird/special characters in this case because it is not a usual text file at all.
Also, as davmac pointed out, the way you are using fgets isn't appropriate even if you were dealing with an actual text file. When dealing with plain text files in C, a simple approach is to read the entire file at once instead of line by line, assuming sufficient memory is available:
size_t f_actualread;
long f_len; /* ftell returns long, -1 on failure */
char *buffer = NULL;
fseek(file, 0, SEEK_END);
f_len = ftell(file);
rewind(file);
if(f_len < 0)
{
    puts("ftell failed");
    return 1;
}
buffer = malloc((size_t)f_len + 1); /* +1 for the terminating 0 */
if(buffer == NULL)
{
    puts("malloc failed");
    return 1;
}
f_actualread = fread(buffer, 1, (size_t)f_len, file);
buffer[f_actualread] = 0; /* terminate so it can be printed as a string */
printf("%s\n", buffer);
free(buffer);
buffer = NULL;
This way, you don't need to worry about line lengths or anything like that.
You should probably use fread rather than fgets, since the latter is really designed for reading text files, and this is clearly not a text file.
Your updated code in fact does have the very problem I originally wrote about (but have since retracted), since you are now using fread rather than fgets:
for (int i = 1; fread(output, sizeof(output), 1, fp) != NULL; i++)
{
printf("%s\n", output);
}
I.e. you are printing the output buffer as if it were a null-terminated string, when in fact it is not. Better to use fwrite to write the bytes to stdout.
However, I think the essence of the problem here is trying to display arbitrary bytes (which don't actually represent a character string) to the terminal. The terminal may interpret some byte sequences as commands which affect what you see. Also, TextEdit may determine that the file is in some character encoding and decode characters accordingly.
Now I have no idea what the '\' and numbers are called (what do I search for to read more?)
They look like octal escape sequences to me.
why it's printing that instead of the characters (unicode?)
It's nothing to do with unicode. Maybe it's your terminal emulator deciding that those characters are unprintable, and so replacing them with an escape sequence.
In short, I think that your method (comparing visually what you see in a text editor with what you see on the terminal) is flawed. The code you have to read from the file looks correct; I'd suggest proceeding with the exercise and checking results then, or if you really want to be sure, look at the file using a hex editor, and have your program output the byte values it reads (as numbers) - and compare those with what you see in the hex editor.

How to skip white lines while reading text file?

I want to read a text file line by line, but I'm not interested in the white lines. What nice way is there of skipping the blank lines? I know I could read a line, check if it's blank and free it if it is, and so on until I reach a good line, but I'm wondering if there's some other way to do it.
I think your method is good enough. Technically you should even check if it's only spaces :-) Note that if you are using fscanf (quite used in homework problems), white line skipping is "Included in the price" :-) AND you don't have to fight against "this line is bigger than my buffer, what should I do?"
The general concept is fine ... you read in line by line and check to see if it has a non-whitespace character. A fairly efficient way of checking for it is to use strspn ... for example:
#include <stdio.h>
#include <string.h>
int is_blank_line(const char *line) {
const char accept[]=" \t\r\n"; /* white space characters (fgets stores \n) */
return (strspn(line, accept) == strlen(line));
}
int main(int argc, char *argv[]) {
char line[256]; /* assuming no line is longer than 256 bytes */
FILE *fp;
if ( argc < 2 ) {
fprintf(stderr, "Need a file name\n");
return -1;
}
fp = fopen(argv[1], "r");
if ( !fp ) {
perror(argv[1]);
return -1;
}
while (fgets(line, sizeof(line), fp)) { /* loop ends at EOF or on a read error */
if (is_blank_line(line)) {
continue;
}
printf("%s", line);
}
fclose(fp);
return 0;
}
If reading line by line, a simple check for a leading '\n' is enough (in text mode the C library takes care of it even if your real OS newline is \r\n).
If using fread to read the whole file, use strtok or strtok_r to split lines with sep="\n"; empty lines will be chopped out automatically.

C file read by line up to a custom delimiter

Is there a function in C to read a file up to a custom delimiter like '\n'?
For example, I have the following file
(I wrote \n to show where the file contains LF (line feed, '\n', 0x0A)):
this is the firstline\n this is the second line\n
I'd like to read the file in parts and split it into two strings:
this is the firstline\n
this is the second line\n
I know that with fgets I can read up to a number of characters, but not up to a pattern. In C++ I know there is a method, but how do I do it in C?
I'll show another example:
I'm reading a file ABC.txt
abc\n
def\n
ghi\n
With the following code:
FILE* fp = fopen("ABC.txt", "rt");
const int lineSz = 300;
char line[lineSz];
char* res = fgets(line, lineSz, fp); // the res is filled with abc\ndef\nghi\n
fclose(fp);
I expected fgets to stop at abc\n
But res is filled with: abc\ndef\nghi\n
SOLVED: The problem was that Notepad++ on Windows XP (the one I used; I don't know whether it happens on other Windows versions) saved the file with a different encoding. For fgets, the newline needs to be CRLF, not just the CR written when you press Enter in Notepad++. I opened the file in Windows Notepad instead and it worked: fgets reads the string up to abc\n in the second example.
fgets() will read one line at a time, and does include the newline character in the line output buffer. Here's an example of the common usage.
#include <stdio.h>
#include <string.h>
int main()
{
char buf[1024];
while ( fgets(buf,1024,stdin) )
printf("read a line %zu characters long:\n %s", strlen(buf), buf); /* %zu matches size_t */
return 0;
}
But since you asked about using a "custom" delimiter... getdelim() allows you to specify a different end-of-line delimiter.
