These calls are supposed to return the number of bytes read or written.
Currently I am trying to read and write 100 characters at a time.
When I use read(fd, buffer, 100) I get 99 characters read; when I use read(fd, buffer, 101) I get 100 characters read. What is the problem?
I am supposed to read 100 characters from a source file and write them to destination1. Then I am supposed to read 50 characters from the same source into destination2.
The reading and copying become inaccurate after the first few loops; problems start on the third iteration.
Please check it out:
[Step 2] Prcs_P2.c: Copy the contents of source.txt into destination1.txt and
destination2.txt as per the following procedure.
1. Read the next 100 characters from source.txt, and among characters read,
replace each character ’1’ with character ’A’ and all characters are then
written in destination1.txt
2. Then the next 50 characters are read from source.txt, and among characters
read, replace each character ’2’ with character ’B’ and all characters are
then written in destination2.txt
3. The previous steps are repeated until the end of file source.txt.
The last read may not have 100 or 50 characters.
-------------
It's copying characters irregularly: sometimes more than 100/50 and sometimes fewer.
int main() {
    //const int sizeBuff=100;
    char buffer[105];   // used to carry information in packets of up to 100
    int temp = 0;       // temp variable to check for errors
    int charCount = 0;
    int i = 0;
    //----------------------------------------
    //charCount = read(sourceFile, buffer, 101);
    while (charCount = read(sourceFile, buffer, 100) > 0) { // needed 101 as last arg instead of 100. Dunno why?
        i = 0;
        while (i < charCount) {
            if (buffer[i] == '1')
                buffer[i] = 'A';
            i++;
        }
        // write(...) returns the number of bytes written to destinationFile;
        // -1 indicates an error and 0 is returned upon end of file
        if (write(destinationFile, buffer, charCount) == -1) {
            printf("\nWrite fail.\n");
            perror("Error");    // prints the error encountered while writing
        }
        memset(buffer, 0, 105); // clears the buffer
        i = 0;
        charCount = read(sourceFile, buffer, 50); // reads 50 bytes at a time; still needs an error flag
        while (i < charCount) {
            if (buffer[i] == '2')
                buffer[i] = 'B';
            i++;
        }
        temp = write(destinationFile2, buffer, charCount);
        if (temp == -1) {       // -1 indicates an error in the function
            printf("\nWrite fail.\n");
            perror("Error");    // prints the error encountered while writing
        }
        memset(buffer, 0, 105);
    } // while loop ends
    close(destinationFile);
    close(destinationFile2);
    close(sourceFile);
    //------PART 1 ENDS-------------
    //------PART 2 STARTS------------
}
charCount = read(sourceFile, buffer, 100) > 0
This sets charCount to 0 or 1, because the comparison > binds more tightly than the assignment =. You want
(charCount = read(sourceFile, buffer, 100)) > 0
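For illustration, a minimal sketch of the fixed loop as a helper function (the function name and the plain copy, without the '1'-to-'A' replacement, are my own simplifications):

```c
#include <unistd.h>

/* Minimal sketch of why the parentheses matter. Without them,
 * `charCount = read(fd, buf, 100) > 0` parses as
 * `charCount = (read(fd, buf, 100) > 0)`, so charCount becomes
 * 0 or 1 instead of the byte count. */
static ssize_t copy_all(int in_fd, int out_fd)
{
    char buf[100];
    ssize_t n, total = 0;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {  /* parenthesized assignment */
        if (write(out_fd, buf, (size_t)n) != n)       /* write only the bytes read */
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Note that the write uses n, the count actually read, never sizeof buf; a short final read is normal, which is also why asking for 100 bytes can legitimately return fewer.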
I want to write my own version of the head Unix command, but my program is not working.
I am trying to print the first 10 lines of a text file, but instead the program prints all the lines. I specify the file name and number of lines to print via command-line arguments. I am only required to use Unix system calls such as read(), open() and close().
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#define BUFFERSZ 256
#define LINES 10
void fileError( char*, char* );

int main( int ac, char* args[] )
{
    char buffer[BUFFERSZ];
    int linesToRead = LINES;
    int in_fd, rd_chars;

    // check for invalid argument count
    if ( ac < 2 || ac > 3 )
    {
        printf( "usage: head FILE [n]\n" );
        exit(1);
    }

    // check for n
    if ( ac == 3 )
        linesToRead = atoi( args[2] );

    // attempt to open the file
    if ( ( in_fd = open( args[1], O_RDONLY ) ) == -1 )
        fileError( "Cannot open ", args[1] );

    int lineCount = 0;
    // count no. of lines inside file
    while ( read( in_fd, buffer, 1 ) == 1 )
    {
        if ( *buffer == '\n' )
        {
            lineCount++;
        }
    }
    lineCount = lineCount + 1;
    printf( "Linecount: %i\n", lineCount );

    int Starting = 0, xline = 0;
    // xline = totallines - requiredlines
    xline = lineCount - linesToRead;
    printf( "xline: %i \n\n", xline );
    if ( xline < 0 )
        xline = 0;

    // count of lines to print
    int printStop = lineCount - xline;
    printf( "printstop: %i \n\n", printStop );

    if ( ( in_fd = open( args[1], O_RDONLY ) ) == -1 )
        fileError( "Cannot open ", args[1] );

    // read and print till required number
    while ( Starting != printStop ) {
        read( in_fd, buffer, BUFFERSZ );
        Starting++; // increment starting
    }
    //read( in_fd, buffer, BUFFERSZ );
    printf( "%s \n", buffer );

    if ( close( in_fd ) == -1 )
        fileError( "Error closing files", "" );
    return 0;
}

void fileError( char* s1, char* s2 )
{
    fprintf( stderr, "Error: %s ", s1 );
    perror( s2 );
    exit( 1 );
}
What am I doing wrong?
It's very odd that you open the file and scan it to count the total number of lines before going on to echo the first lines. There is absolutely no need to know in advance how many lines there are altogether before you start echoing lines, and it does nothing useful for you. If you're going to do it anyway, however, then you ought to close() the file before you re-open it. For your simple program this is a matter of good form, not of correct function -- the misbehavior you observe is unrelated to that.
There are several problems in the key portion of your program:
//read and print till required number
while (Starting != printStop) {
read( in_fd, buffer, BUFFERSZ );
Starting++; //increment starting
}
//read( in_fd, buffer, BUFFERSZ );
printf("%s \n", buffer);
You do not check the return value of your read() call in this section. You must check it, because it tells you not only whether there was an error / end-of-file, but also how many bytes were actually read. You are not guaranteed to fill the buffer on any call, and only in this way can you know which elements of the buffer afterward contain valid data. (Pre-counting lines does nothing for you in this regard.)
You are performing raw read()s, and apparently assuming that each one will read exactly one line. That assumption is invalid. read() does not give any special treatment to line terminators, so you are likely to have reads that span multiple lines, and reads that read only partial lines (and maybe both in the same read). You therefore cannot count lines by counting read() calls. Instead, you must scan the valid characters in the read buffer and count the newlines among them.
You do not actually print anything inside your read loop. Instead, you wait until you've done all your reading, then print everything in the buffer after the last read. That's not going to serve your purpose when you don't get all the lines you need in the first read, because each subsequent successful read will clobber the data from the preceding one.
You pass the buffer to printf() as if it were a null-terminated string, but you do nothing to ensure that it is, in fact, terminated. read() does not do that for you.
I have trouble believing your claim that your program always prints all the lines of the designated file, but I can believe that it prints all the lines of the specific file you're testing it on. It might do that if the file is short enough that the whole thing fits into your buffer. Your program then might read the whole thing into the buffer on the first read() call (though it is not guaranteed to do so), and then read nothing on each subsequent call, returning 0 at end of file and leaving the buffer unchanged. When you finally print the buffer, it still contains the whole contents of the file.
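Putting those points together, a corrected single-pass sketch might look like this (no pre-counting pass; the helper name and the out_fd parameter are my additions, so the function isn't tied to stdout):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFFERSZ 256

/* Write the first `lines` lines of the file at `path` to `out_fd`
 * (pass STDOUT_FILENO for head-like behaviour). Scans each read
 * buffer for newlines instead of assuming one read() == one line,
 * and only ever touches the bytes that were actually read. */
static void head_fd(const char *path, int lines, int out_fd)
{
    char buffer[BUFFERSZ];
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);
        exit(1);
    }

    ssize_t n;
    int seen = 0;
    while (seen < lines && (n = read(fd, buffer, sizeof buffer)) > 0) {
        ssize_t limit = 0;
        while (limit < n && seen < lines)
            if (buffer[limit++] == '\n')
                seen++;                       /* count newlines among valid bytes */
        write(out_fd, buffer, (size_t)limit); /* print as we go, only valid bytes */
    }
    close(fd);
}
```

The return value of read() drives both the loop and the write, so partial reads and reads spanning several lines are handled without ever treating the buffer as a nul-terminated string.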
I'm trying to read a file of 1024 lines, each containing the same letter nine times, and to return if it finds a line that doesn't match these terms.
The file is as follow but with 1024 lines:
eeeeeeeee
eeeeeeeee
eeeeeeeee
Code:
fd = open(fileName, O_RDONLY);
lseek(fd, 0, SEEK_SET);
if (flock(fd, LOCK_SH) == -1)
    perror("error on file lock");
if (fd != 0) {
    read(fd, lineFromFile, (sizeof(char) * 10));
    arguments->charRead = lineFromFile[0];
    for (i = 0; i < 1024; i++) {
        var = read(fd, toReadFromFile, (sizeof(char) * 10));
        if (strncmp(toReadFromFile, lineFromFile, 10) != 0 || var < 10) {
            arguments->result = -1;
            printf("%s \n\n", toReadFromFile);
            printf("%s \n", lineFromFile);
            printf("i %d var %d \n", i, var);
            free(toReadFromFile);
            free(lineFromFile);
            return;
        }
    }
}
Output:
> eeeee
eeee
eeeee
eeee
i 954 var 6
I have 5 different files with different letters, and every single one gives this output at that specific line (954), and the line itself is correct, with the letter written 9 times and a \n at the end.
Any ideas why this could be happening? If I don't use the lseek it works fine, but I need the lseek to divide the file into several parts to be tested by different threads. I put the 0 index in the lseek for simplification, to show you guys.
Thanks.
It looks like you are looking for "eeeee\neeee" instead of "eeeeeeeee\n". Which means your file should start like this:
eeeee
eeeeeeeee
eeeeeeeee
and end like this:
eeeeeeeee
eeee
If your file ends like this:
eeeeeeeee
eeeeeeeee
Then when you get to the last line, it will fail because you will only read "eeeee\n" instead of "eeeee\neeee".
Given the new information in your comment, I believe the problem is that you should not be seeking to the middle of lines (in this case 342 and 684). You should seek to an even multiple of the expected string (like 340 and 680). Also, line 954 is not where the problem happened. It should be line 954 + X, where X is the line you seeked to.
Whatever other problems your program may have, it certainly has this: the read() function is not guaranteed to read the full number of bytes requested. It will read at least one unless it encounters an error or the end of the file, and under many circumstances it does read the full number of bytes requested, but even when there are enough bytes remaining before the end of the file, read() may read fewer bytes than requested.
The comments urging you to use a higher-level function instead are well considered, but if you are for some reason obligated to use read() then you must watch for cases where fewer bytes are read than requested, and handle them by reading additional bytes into the unused tail end of the buffer. Possibly multiple times.
In function form, that might look like this:
int read_all(int fd, char buf[], int num_to_read) {
    int total_read = 0;
    int n_read = 0;
    while (total_read < num_to_read) {
        n_read = read(fd, buf + total_read, num_to_read - total_read);
        if (n_read > 0) {
            total_read += n_read;
        } else {
            break;
        }
    }
    return (n_read < 0) ? n_read : total_read;
}
I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
I create a char array of length 1 since the read system call requires a pointer; I cannot use a regular char. The above code does not work: it does not print any characters to the terminal, as the loop is supposed to. I think my logic is correct, but perhaps a misunderstanding of what's going on behind the scenes is making this hard for me. Or maybe I missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
// only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is. If you write past the end of EverythingPast900, you can crash your program before it ever produces any output to display. That might happen on Windows, where stdout may not be line-buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
    int WWW = open("www.txt", O_RDONLY);
    if (WWW < 0)
        printf("Error opening www.txt\n");

    //Keep track of all characters past position 900 in WWW.
    int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
    printf("%d \n", Seek900InWWW);
    if (Seek900InWWW < 0)
        printf("Error seeking to position 900 in WWW.txt");

    int appropriatesize = 1000;
    char EverythingPast900[appropriatesize];
    int NextRead;
    char NextChar[1];
    int i = 0;
    while (i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
        EverythingPast900[i] = NextChar[0];
        printf("%c \n", NextChar[0]);
        i++;
    }
    return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by #dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
/* Read from a file descriptor, filling the buffer with the requested
* number of bytes. If the end-of-file is encountered, the number of
* bytes returned may be less than the requested number of bytes.
* On error, -1 is returned. See read(2) or read(3) for possible
* values of errno.
* Otherwise, the number of bytes read is returned.
*/
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
  ssize_t nread, nsum = 0;
  while (nrequested > 0
         && (nread = read (fd, readbuf, nrequested)) > 0)
    {
      nsum += nread;
      nrequested -= nread;
      readbuf += nread;
    }
  return nsum;
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
char readbuf[4096];
ssize_t n;
int fd;

fd = open ("WWW", O_RDONLY);
if (fd == -1)
  {
    perror ("unable to open WWW");
    exit (1);
  }

n = lseek (fd, 900, SEEK_SET);
if (n == -1)
  {
    fprintf (stderr,
             "warning: seek operation failed: %s\n"
             "         reading 900 bytes instead\n",
             strerror (errno));
    n = read_fill (fd, readbuf, 900);
    if (n < 900)
      {
        fprintf (stderr, "error: fewer than 900 bytes in file\n");
        close (fd);
        exit (1);
      }
  }

/* Read a file, printing its contents to the screen.
 *
 * Caveat:
 *   Not safe for UTF-8 or other variable-width/multibyte
 *   encodings since required bytes may get cut off.
 */
while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
  {
    readbuf[n] = 0;
    printf ("Read\n****\n%s\n****\n", readbuf);
  }

if (n == -1)
  {
    close (fd);
    perror ("error reading from WWW");
    exit (1);
  }

close (fd);
I could also have avoided the null termination operation and filled all 4096 bytes of the buffer, electing to use the precision part of the format specifiers of printf in this case, changing the format specification from %s to %.4096s. However, this may not be feasible with unusually large buffers (perhaps allocated by malloc to avoid stack overflow) because the buffer size may not be representable with the int type.
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Note that the unary & operator takes the address of its operand, producing a value of type pointer-to-{type of var}, so a plain char works. Either way it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done, as I've explained.
Mixing declarations and code is a no-no. Also, no, that is not a valid declaration; C should complain about it along the lines of the array being variably defined.
What you want is dynamically allocating the memory for your char buffer[]. You'll have to use pointers.
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Pseudo code:
declare a buffer of char(pointer related)
allocate memory for said buffer(dynamic memory related)
Find location of where you want to start at
point to it(pointer related)
Figure out how much you want to store(technically a part of allocating memory^^^)
Use memcpy() to store what you want in the buffer
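A hedged sketch of those steps (the file name WWW and the 900-byte offset come from the question; the helper name and the whole-file load are my assumptions):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Load the whole file at `path` into memory, then memcpy() everything
 * past byte `offset` into a fresh nul-terminated buffer.
 * Returns a malloc'd string the caller frees, or NULL on failure. */
static char *tail_after(const char *path, long offset)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    fseek(fp, 0, SEEK_END);             /* figure out how much to store */
    long size = ftell(fp);
    rewind(fp);

    if (size <= offset) {               /* nothing past the offset */
        fclose(fp);
        return NULL;
    }

    char *whole = malloc((size_t)size); /* allocate memory for the buffer */
    if (whole == NULL || fread(whole, 1, (size_t)size, fp) != (size_t)size) {
        free(whole);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    size_t tail_len = (size_t)(size - offset);
    char *tail = malloc(tail_len + 1);
    if (tail != NULL) {
        memcpy(tail, whole + offset, tail_len); /* point past `offset` and copy */
        tail[tail_len] = '\0';
    }
    free(whole);
    return tail;
}
```

For the question's case, tail_after("WWW", 900) would hand back everything past position 900 in one allocation instead of one read() per byte.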
Using C, is there a way to read only the last line of a file without looping over its entire content?
The thing is that the file contains millions of lines, each holding an integer (long long int). The file itself can be quite large, I presume even up to 1000 MB. I know for sure that the last line won't be longer than 55 digits, but it could be as short as 2 digits. It's out of the question to use any kind of database... I've considered that already.
Maybe its a silly question, but coming from PHP background I find it hard to answer. I looked everywhere but found nothing clean.
Currently I'm using:
if ((fd = fopen(filename, "r")) != NULL) // open file
{
    fseek(fd, 0, SEEK_SET); // make sure we start from 0
    while (!feof(fd))
    {
        memset(buff, 0x00, buff_len); // clean buffer
        fscanf(fd, "%[^\n]\n", buff); // read file; prefer using fscanf
    }
    printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}
This way I'm looping over the file's content, and as soon as the file gets bigger than ~500k lines things slow down pretty badly...
Thank you in advance.
maxim
Just fseek to fileSize - 55 and read forward?
If there is a maximum line length, seek to that distance before the end.
Read up to the end, and find the last end-of-line in your buffer.
If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
In your case:
/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];
/* now read that many bytes from the end of the file */
fseek(fd, -max_len, SEEK_END);
ssize_t len = read(fd, buf, max_len);
/* don't forget the nul terminator */
buf[len] = '\0';
/* and find the last newline character (there must be one, right?) */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline+1;
Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).
OK, it all worked for me, and I learned something new. The last line of a file 41 MB large and with >500k lines was read instantly. Thanks to all of you, especially 'Useless' (love the irony of your nickname, btw). I will post the code here in the hope that someone else can benefit from it in the future:
Reading ONLY the last line of the file:
the file is structured so that a newline is appended, and I am sure that every line is shorter than, in my case, 55 characters:
file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827
------------------------
notice the new line appended.
FILE *fd;                           // file pointer
char filename[] = "file.dat";       // file to read
static const long max_len = 55 + 1; // max length of the line to read
char buff[max_len + 1];             // buffer, one extra byte for the terminator

if ((fd = fopen(filename, "rb")) != NULL) { // open the file; error checks omitted
    fseek(fd, -max_len, SEEK_END);   // seek to the end of file minus the length we need;
                                     // presumably there can be more than one newline character
    fread(buff, max_len - 1, 1, fd); // read the contents starting from where fseek() positioned us
    fclose(fd);                      // close the file
    buff[max_len - 1] = '\0';        // terminate the string
    char *last_newline = strrchr(buff, '\n'); // find last occurrence of newline
    char *last_line = last_newline + 1;       // jump past it
    printf("captured: [%s]\n", last_line);    // captured: [472977827]
}
cheers!
maxim
#include <stdio.h>

int main(void)
{
    FILE *fin;
    fin = fopen("data.txt", "r");
    if (fin == NULL)
    {
        printf("can not open input file");
        return 0;
    }
    long data[2];
    while (!feof(fin))
    {
        fscanf(fin, "%ld %ld", &data[0], &data[1]);
        printf("\n%ld %ld", data[0], data[1]);
    }
    fclose(fin);
    return 0;
}
Above is my C code for reading a table from a file. The last value is printed twice!
data.txt
1 34
2 24
3 45
4 56
5 67
But I cannot get proper values with a broken table like the one below. How can I resolve it? (Where the program does not find a value it should produce a null space or zero, not the next value.)
data.txt
1 34
2
3 45
4
5 67
as well as
data.txt
1 34
57
3 45
4
5 34
Above is my C code for reading a table from a file. The last value is printed twice!
The last value is printed twice because of the structure of the file-reading loop. The end-of-file flag is not set until an attempt is made to read past the end of the file. When fscanf() reads the last two longs from the last line of the file, the flag is not yet set; the next call to fscanf() fails and sets it, but because the result of fscanf() is not checked immediately, the previously extracted longs are used again. Check the result of every read operation immediately.
A possible solution is to read a line at a time using fgets(), and then use sscanf() to extract the long value(s) from that line. If fscanf() were used, it would read past the newline character to locate the second requested long, which is not the desired behaviour.
For example:
char line[1024];
while (fgets(line, 1024, fin))
{
    /* Assign appropriate default values.
       sscanf() does not modify the arguments
       for which it has no value to assign,
       so if 'line' has a single long value
       data[1] will be zero. */
    long data[2] = { 0, 0 };

    /* You can use 'result' if you need to take particular
       action when it reads only 1, or 0, items. */
    int result = sscanf(line, "%ld %ld", &data[0], &data[1]);
    printf("\n%ld %ld", data[0], data[1]);
}
(in response to question update) To differentiate between lines where second value is missing:
2
and lines where first value is missing:
57
a valid range (or some other criteria) is required to determine which value (the first or second) was missing from the line:
int result = sscanf(line, "%ld %ld", &data[0], &data[1]);
if (1 == result)
{
    if (data[0] >= 1 && data[0] <= 9)
    {
        printf("\n%ld 0", data[0]);
    }
    else
    {
        /* The read value was the second value. */
        printf("\n%ld %ld", ++last_first_value, data[0]);
    }
}
where last_first_value is a long that stores the current value of the first value (either the last successfully read first value or computed from the last successfully read first value).
while (!feof(fin))
{
    fscanf(fin, "%ld %ld", &data[0], &data[1]);
    printf("\n%ld %ld", data[0], data[1]);
}
feof doesn't return true until after you attempt to read past the end of the file, so the loop will execute once too often. It's better to check the return value of fscanf and if it doesn't match what you expect (2 in this case), then check for EOF. Here's one possible restructuring:
int good = 1;
while (good)
{
    int itemsRead = fscanf(fin, "%ld %ld", &data[0], &data[1]);
    if (itemsRead == 2)
    {
        // process data[0] and data[1] normally
    }
    else
    {
        good = !good;
        if (feof(fin))
            printf("Hit end of file\n");
        else if (ferror(fin))
            printf("Error during read\n");
        else
            printf("Malformed input line\n");
    }
}