Read a text file line by line and save each line in the buffer irrespective of data type and length of each line - c

I want to read one line of the text file, save it to a buffer, send the buffer over a UDP socket, and then go on to read the second line, and so on.
So far, since I knew the data type of the text to be read from the text file, I had been using
fscanf()
to read each line from the text file. But now I don't know the data types, so I can no longer use this function. Is there any other way to read a text file line by line?
Note: The length of each line may vary.

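Here is some handy code I found for reading data as binary:
FILE *fp = fopen("c:\\test.bin", "rb"); // "rb" opens the file in binary mode
char x[10]; // a real array, so sizeof(x) is 10 rather than the size of a pointer
//size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
if (fp != NULL) {
    size_t n = fread(x, sizeof(x[0]), sizeof(x)/sizeof(x[0]), fp); // n = elements actually read
    fclose(fp);
}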

Without knowing the data type, you can never know what you are going to read into your variables. You mention that the length of each line may vary, so we can assume that your text file contains text. That is, the number 128 would not be stored as a single integer but as the three chars '1', '2' and '8', which you would read and then parse into an integer.
That said, there are not many options. You can build a parser: read each line and try to guess what it is from the chars you have read (are there only digits? only digits plus a dot? only a-z characters? a mix of both?). Such guessing will not be 100% reliable. Or you can arrange to always know the data type beforehand, for example by writing a type tag as the first char of each line when the file is created.
It is a very different story if your file is not really text but binary. In that case there is nothing to do but know the data types beforehand.
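The guessing parser described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the enum, its values and the function names are hypothetical, and the standard strtol and strtod functions stand in for the hand-written character checks. Each line would first be read as plain text (for example with fgets), so the raw buffer can still be sent unchanged whatever the guess turns out to be.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical categories; the names are illustrative only. */
enum line_kind { LINE_EMPTY, LINE_INTEGER, LINE_REAL, LINE_TEXT };

/* Returns 1 if only whitespace remains from p to the end of the string. */
static int only_space_left(const char *p)
{
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    return *p == '\0';
}

/* Guess what a line holds by trying the strictest parse first. */
enum line_kind classify_line(const char *line)
{
    char *end;

    assert(line != NULL);
    if (only_space_left(line))
        return LINE_EMPTY;

    /* Whole line is a base-10 integer that fits in a long? */
    errno = 0;
    (void)strtol(line, &end, 10);
    if (end != line && errno == 0 && only_space_left(end))
        return LINE_INTEGER;

    /* Whole line is a floating-point number? */
    errno = 0;
    (void)strtod(line, &end);
    if (end != line && errno == 0 && only_space_left(end))
        return LINE_REAL;

    /* Anything else is treated as free text. */
    return LINE_TEXT;
}
```

As the answer says, this kind of guess is not fully reliable: a line such as "128" might have been meant as text rather than as a number.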

Related

How to write at the middle of a file in c

Is it possible to write in the middle of a file in C? For example, I want to insert a string at the 5th position in the 2nd line of a file.
I'm not very familiar with the C functions for handling files. If someone could help me, I would appreciate it.
I tried using fputs, but I couldn't insert characters at the desired location.
Open a new output file.
Read the input file line by line (fgets), writing each line out to the new file as you read.
When you hit the place where you want to insert, write the new line(s).
Then carry on copying the old lines to the new file.
Close the input and output files.
Rename the output file to the input file's name.
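The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the ".tmp" suffix are assumptions, and the copy goes character by character with fgetc instead of line by line with fgets, so that a column inside a line can be targeted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy `in` to `out`, inserting `text` before column `col` (0-based) of
 * line `line` (1-based). Returns 0 on success, -1 if the position was
 * never reached or a write failed.
 */
int copy_with_insert(FILE *in, FILE *out, long line, long col, const char *text)
{
    long cur_line = 1, cur_col = 0;
    int ch, inserted = 0;

    assert(in != NULL && out != NULL && text != NULL);
    while ((ch = fgetc(in)) != EOF) {
        if (!inserted && cur_line == line && cur_col == col) {
            fputs(text, out);          /* the new content goes in first */
            inserted = 1;
        }
        fputc(ch, out);                /* then the original character */
        if (ch == '\n') {
            cur_line++;
            cur_col = 0;
        } else {
            cur_col++;
        }
    }
    return (inserted && !ferror(out)) ? 0 : -1;
}

/* Rewrite `path` through a temporary file, then swap it into place. */
int insert_in_file(const char *path, long line, long col, const char *text)
{
    char tmp_path[FILENAME_MAX];
    FILE *in, *out;
    int n, rc;

    assert(path != NULL);
    n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;

    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    out = fopen(tmp_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    rc = copy_with_insert(in, out, line, col, text);
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    if (rc != 0) {
        remove(tmp_path);              /* leave the original untouched */
        return -1;
    }
    remove(path);  /* rename() onto an existing file fails on some platforms */
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```

For the question's example (5th position of the 2nd line), the call would be insert_in_file(path, 2, 4, text), since columns are counted from 0 here.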
Continuing from my comments above. Here's what I'd do:
Create two large, static char[] buffers of the same size--each large enough to store the largest file you could possibly ever need to read in (ex: 10 MiB). Ex:
#define MAX_FILE_SIZE_10_MIB (10*1024*1024)
static char buffer_file_in[MAX_FILE_SIZE_10_MIB];
static char buffer_file_out[MAX_FILE_SIZE_10_MIB];
Use fopen(filename, "r+") to open the file as read/update. See: https://cplusplus.com/reference/cstdio/fopen/. Read the chars one-by-one using fgetc() (see my file_load() function for how to use fgetc()) into the first large char buffer you created, buffer_file_in. Continue until you've read the whole file into that buffer.
Find the location of the place you'd like to do the insertion. Note: you could do this live as you read the file into buffer_file_in the first time by counting newline chars ('\n') to see what line you are on. Copy chars from buffer_file_in to buffer_file_out up to that point. Now, write your new contents into buffer_file_out at that point. Then, finish copying the rest of buffer_file_in into buffer_file_out after your inserted chars.
Seek to the beginning of the file with fseek(file_pointer, 0, SEEK_SET);
Write the buffer_file_out buffer contents into the file with fwrite().
Close the file with fclose().
There are some optimizations you could do here. For example, you could store the index where the insertion begins and, rather than copying the chars before that point into buffer_file_in, copy only the remainder of the file after that point into buffer_file_in. Then seek back to that point and write only your new contents plus the rest of the file. This avoids unnecessarily rewriting the beginning of the file before the insertion point.
(Probably preferred) You could also just copy the file and the changes you insert straight into buffer_file_out in one shot, then write that back to the file starting at the beginning of the file. This would be very similar to @pm100's approach, except using 1 file + 1 buffer rather than 2 files.
Look for other optimizations and reductions of redundancy as applicable.
My approach above uses 1 file and 1 or 2 buffers in RAM, depending on implementation. @pm100's approach uses 2 files and 0 buffers in RAM (very similar to what my 1 file and 1 buffer approach would look like), depending on implementation. Both approaches are valid.
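A rough sketch of the one-file, one-buffer variant might look like this. The function name is hypothetical, the buffer cap is an assumed 64 KiB (much smaller than the 10 MiB suggested above, to keep the example light), and the insertion point is given as a 1-based line and a 0-based column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILE_SIZE (64 * 1024)  /* assumed cap for this sketch */

static char buffer_file_out[MAX_FILE_SIZE];

/*
 * Insert `text` before column `col` (0-based) of line `line` (1-based),
 * using one open file and one in-memory buffer. Returns 0 or -1.
 */
int insert_via_buffer(const char *path, long line, long col, const char *text)
{
    size_t used = 0, text_len;
    long cur_line = 1, cur_col = 0;
    int ch, inserted = 0, ok;
    FILE *fp;

    assert(path != NULL && text != NULL);
    text_len = strlen(text);

    fp = fopen(path, "r+");            /* read/update, as suggested above */
    if (fp == NULL)
        return -1;

    /* Build the new file contents in the buffer while reading. */
    while ((ch = fgetc(fp)) != EOF) {
        if (!inserted && cur_line == line && cur_col == col) {
            if (text_len > MAX_FILE_SIZE - used) {
                fclose(fp);
                return -1;             /* inserted text does not fit */
            }
            memcpy(buffer_file_out + used, text, text_len);
            used += text_len;
            inserted = 1;
        }
        if (used == MAX_FILE_SIZE) {
            fclose(fp);
            return -1;                 /* file too large for the buffer */
        }
        buffer_file_out[used++] = (char)ch;
        if (ch == '\n') {
            cur_line++;
            cur_col = 0;
        } else {
            cur_col++;
        }
    }
    if (!inserted) {
        fclose(fp);
        return -1;                     /* position not found */
    }

    /* The file only grows, so rewriting from offset 0 covers the old bytes. */
    ok = fseek(fp, 0L, SEEK_SET) == 0
      && fwrite(buffer_file_out, 1, used, fp) == used;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Nothing is written until the whole file has been read and the insertion point found, so a failed call leaves the file as it was.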

Reading content from a file and storing it to String in C

I've written a simple http server in C and am now trying to implement HTML files.
For this I need to send a response containing the content of the HTML file.
How do I do that best?
Do I read the file line by line, and if so how do I store them in a single string?
Thanks already!
Here is an example of reading a text file by chunks which, if the file is big, would be faster than reading the file line by line.
As @tadman said in his comment, text files aren't generally big, so reading them in chunks doesn't make any real difference in speed. But web servers serve other files too, such as photos or movies, which are big. So if you are only going to read text files, reading line by line might be simpler (you could use fgets instead of fread). If you are going to read other kinds of files, reading all of them in chunks means you can handle them all the same way.
However, as @chux said in his comment, there is another difference between reading text files and binary files: text files are opened in text mode, fopen(filename,"r"), and binary files must be opened in binary mode, fopen(filename,"rb"). A web server could probably open all files in binary mode because web browsers ignore whitespace anyway, but other kinds of programs need to know what the line endings will be, so it can make a difference.
https://onlinegdb.com/HkM---r2X
#include <stdio.h>

int main(void)
{
    // we will make the buffer 200 bytes in size
    // this is big enough for the whole file
    // in reality you would probably stat the file
    // to find its size and then malloc the memory
    // or you could read the file twice:
    // - first time counting the bytes
    // - second time reading the bytes
    char buffer[200] = "", *current = buffer;
    // we will read 20 bytes at a time to show that the loop works
    // in reality you would pick something approaching the page size
    // perhaps 4096? Benchmarking might help choose a good size
    size_t bytes, chunk = 20;
    // space left in the buffer, keeping one byte for the terminating '\0'
    size_t left = sizeof(buffer) - 1;
    // open the text file in text mode
    // if it was a binary file you would need "rb" instead of "r"
    FILE *file = fopen("test.html", "r");
    if (file)
    {
        // loop through reading the bytes, never asking for more than fits
        do {
            size_t want = chunk < left ? chunk : left;
            bytes = fread(current, sizeof(char), want, file);
            current += bytes;
            left -= bytes;
        } while (bytes == chunk && left > 0);
        // close the file
        fclose(file);
        // terminate the buffer so that string functions will work
        *current = '\0';
        // print the buffer
        printf("%s", buffer);
    }
    return 0;
}
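For a file whose size is not known in advance, the fixed 200-byte buffer could be replaced by one that grows as chunks arrive. The sketch below is one possible way to do that, not the answer's own method: the function name is hypothetical, and it doubles a malloc'd buffer with realloc instead of calling stat or reading the file twice. It opens the file in binary mode so that the bytes are returned unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read all of `path` into a newly allocated, '\0'-terminated buffer.
 * The caller frees the result. On success, *out_len (if not NULL)
 * receives the number of bytes read. Returns NULL on failure.
 */
char *read_file_to_string(const char *path, size_t *out_len)
{
    size_t cap = 4096, len = 0, got;
    char *buf, *grown;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Read into the free space, always keeping one byte for the '\0'. */
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (len == cap - 1) {          /* buffer full: double it */
            grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

An HTTP server would typically use the returned length, not strlen, when sending the body, because binary files may contain '\0' bytes.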

Inserting text in a file instead of overwriting in c

How can I insert characters in a file using C instead of overwriting? I also want to write at the start of a file and at the end of a file. With the method below I can re-position, but I cannot insert; the text is overwritten.
Here is what I tried:
fword = fopen("wrote.txt", "rb+");
fseek(fword, 0, SEEK_SET);
fscanf(fword, "%c", &l);
To add text at the end, you can open the file in "a" mode (check the fopen manual). Everything you write then goes to the end of the file.
To add text at any other position, you have to read everything after that position into memory, write what you want, and then write the rest back.
Files are abstractions of byte streams, and there is no such concept as inserting into a byte stream. You can seek to a certain place and write data there; the bytes you write overwrite whatever was at that position. If the write runs past the current end of the file, the file is extended.
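The "read everything after that position, write what you want, then write the rest" recipe could be sketched like this. The function name is hypothetical and the position is a byte offset from the start of the file; an offset of 0 prepends and an offset equal to the file size appends. It assumes that seeking to the end of a file opened in binary mode reports its size, which holds on common platforms.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Insert `text` at byte `offset` by saving the tail, then rewriting it. */
int insert_at_offset(const char *path, long offset, const char *text)
{
    FILE *fp;
    long size;
    size_t tail_len, text_len;
    char *tail;
    int ok;

    assert(path != NULL && text != NULL && offset >= 0);
    text_len = strlen(text);

    fp = fopen(path, "rb+");
    if (fp == NULL)
        return -1;

    /* Find the file size and reject offsets past the end. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0 || offset > size) {
        fclose(fp);
        return -1;
    }

    tail_len = (size_t)(size - offset);
    tail = malloc(tail_len + 1);       /* +1 so an empty tail still allocates */
    if (tail == NULL) {
        fclose(fp);
        return -1;
    }

    /* Read the tail, seek back to the insertion point, then write the
       new text followed by the saved tail. */
    ok = fseek(fp, offset, SEEK_SET) == 0
      && fread(tail, 1, tail_len, fp) == tail_len
      && fseek(fp, offset, SEEK_SET) == 0
      && fwrite(text, 1, text_len, fp) == text_len
      && fwrite(tail, 1, tail_len, fp) == tail_len;

    free(tail);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The second fseek matters: on an update stream, a positioning call is required between a read and a following write.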

C program for reading doc, docx, pdf

I want to write a program in C (only C, not C++ or Java) that will read doc, docx and pdf files, and I want to make it available on GitHub for anyone who needs that code. I started with the .doc format. I found that if I open a .doc file with plain Notepad, it shows all the text, just with some extra content that can easily be trimmed. So I wrote a simple C program to read a .doc file in both 'r' and 'rb' mode, but both times it gives me only 5-9 characters of the file, and those are not readable either. I don't know why this is happening. Any comment or discussion will be very helpful.
Here is the link for github Source code. Please help me to complete all three format.
To answer your specific question, the reason your little application stops reading is because it mistakenly thinks there is an EOF character in your file.
Look at your code:
char ch;
int nol=0, not=0, nob=0, noc=0;
FILE *fp;
fp = fopen("file.doc","rb");
while(1)
{
    ch = fgetc(fp);
    if(ch==EOF)
    {
        break;
    }
You store the result of fgetc(fp) in a variable of type char, which is a single-byte variable. However, the return type of fgetc is very purposefully int, not char.
fgetc always returns a non-negative result in the range 0 to 255 (the byte converted to unsigned char), except when it reaches the end of the file, in which case it returns EOF, a negative value that is often implemented as -1.
If you read a byte of value 255 and store it in an int, everything is OK: it is stored as the value 255 and your loop can continue. If you store it in a char that is signed (as it is on most common platforms), 255 becomes -1, which compares equal to EOF, and your loop stops. A binary file such as a .doc contains plenty of such bytes.
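A corrected version of the read loop might look like the sketch below. The function name is hypothetical; the only point is that the result of fgetc is kept in an int, so a data byte of value 255 is not mistaken for EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a stream; 0xFF bytes must not end the loop early. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int ch;                            /* int, not char: holds 0..255 and EOF */

    assert(fp != NULL);
    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

With ch declared as a signed char, the same loop would stop at the first 0xFF byte in the file.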
Don't expect to get anywhere with this idea. .doc is a huge binary file format that is inhumanly complicated to parse. With that said, Cubia mentioned the offset where the text section of the document starts. I'm not familiar with the details of the format, but if the raw text is contained in one location, use fseek to get at it and stop when you reach the end. This won't be the case for the other formats because they are very different.
.docx and .pdf should be easier to parse because they are more modern formats. If you want to read anything from a docx you need to read from a zip file with a ton of xml in it and use a parser to figure out which text you want.
.pdf should be the easiest of the three because you might be able to find a library out there that can almost do what you want.
As for why you are getting strange output from your program, remember that .doc is a binary format and the vast majority of the data is garbage from your perspective. Dumping it to the terminal will yield some readable text but also a bunch of control characters that can confuse your terminal.
As a last note: don't try to read docx files directly using fread. They are compressed, so you likely won't recover the text unaltered. Take a look at libarchive. Also, expect to have to read the document specifications. docx is Microsoft's Office Open XML format (standardized as ECMA-376), which is separate from the OpenDocument format used by OpenOffice. See this and some PDF specification documents (there are multiple versions).
Think of a .doc file as a txt file with extra non-printable characters before, in the middle of, and after your content. These non-printable characters define special formatting, metadata and other information.
With this said, all .doc files follow a certain structure.
If you open two different .doc files in a hex editor, you will notice that the text content of both files starts at an offset of 0xA00 (2560 bytes) from the beginning of the file. This means that when you open your file initially, you can skip the first 2560 bytes of the file (take a look at the fseek() function).
From this point on, you can read the contents of your file until you reach '\0'.
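As a rough illustration of the seek-then-read idea, the sketch below skips to a given offset and copies bytes until it meets a '\0'. The function name and the DOC_TEXT_OFFSET macro are hypothetical, and the 0xA00 value is only the observation made above, not something the .doc format is known to guarantee, so the offset is passed in as a parameter.

```c
#include <assert.h>
#include <stdio.h>

#define DOC_TEXT_OFFSET 0xA00L  /* offset observed above; an assumption */

/*
 * Seek to `offset` and copy bytes into `out` until a '\0' byte, end of
 * file, or a full buffer. `out` is always '\0'-terminated.
 * Returns the number of bytes copied, or -1 if the seek failed.
 */
long read_text_at(FILE *fp, long offset, char *out, size_t out_size)
{
    size_t n = 0;
    int ch;

    assert(fp != NULL && out != NULL && out_size > 0);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    while (n + 1 < out_size && (ch = fgetc(fp)) != EOF && ch != '\0')
        out[n++] = (char)ch;
    out[n] = '\0';
    return (long)n;
}
```

The file should be opened in "rb" mode, and the result checked against what a hex editor shows for the same document before relying on it.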
I have not seen the implementation of a .pdf or a .docx file, but you can open both kinds of file in a hex editor and figure out what pattern you can use to isolate the important contents of the files.
Hope this helps.
EDIT : You can always find documentation on the different file formats that you want to manipulate. Here are the specifications of the PDF file type :
http://www.adobe.com/devnet/pdf/pdf_reference.html
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Using scanf to re-read a text file in c

I am currently writing a program in c that requires me to read a text file more than once. That is, I am reading the data from the first line of the text file (which is fine), but then want to go back and re-read the same data from the same first line of the text file again (my problem). The data on the text file are simple numbers spaced out such that they may be read with scanf. I am a beginner and would appreciate some help. If it is in fact not possible to do this using scanf what can I do in order to solve my problem?
You can use rewind(FILE *stream). It is equivalent to:
(void)fseek(stream, 0L, SEEK_SET)
which sets the file position indicator for the stream pointed to by stream to the beginning of the file, except that rewind also clears the stream's error indicator.
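A small sketch of re-reading the first number after a rewind follows. The function name is hypothetical, and it assumes the file starts with an integer that fscanf can read with "%d".

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the first integer in the stream twice, rewinding in between.
 * Returns 0 and stores both values on success, -1 otherwise.
 */
int read_first_twice(FILE *fp, int *first, int *again)
{
    assert(fp != NULL && first != NULL && again != NULL);
    if (fscanf(fp, "%d", first) != 1)
        return -1;
    rewind(fp);                        /* back to the beginning of the file */
    if (fscanf(fp, "%d", again) != 1)
        return -1;
    return 0;
}
```

The same pattern works after reading the whole file: rewind resets the position, so the next fscanf call starts again at the first line.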
