Below is part of my code to read data from a text file, strip out the HTML and print just the plain text. This all works well, but I am having a problem reading the whole text file. How would I read the entire file? I understand that I will probably need to use malloc, but I am unsure how to do so.
int i, nRead, fd;
int source;
char buf[1024];
int idx = 0;
int opened = 0;
if((fd = open("data.txt", O_RDONLY)) == -1)
{
printf("Cannot open the file");
}
else
{
nRead = read(fd, buf, 1024);
printf("Original String ");
for(i=0; i<nRead; i++)
{
printf("%c", buf[i]);
}
printf("\nReplaced String ");
for(i=0; i<nRead; i++)
{
if(buf[i]=='<') {
opened = 1;
} else if (buf[i] == '>') {
opened = 0;
} else if (!opened) {
buf[idx++] = buf[i];
}
//printf("%c", buf[i]);
}
}
buf[idx] = '\0';
printf("%s\n", buf);
close(fd);
If you want to read the complete file, do the following:
Open the file.
Use fstat() to get the size.
malloc the buffer, i.e. buffer = malloc(fileStats.st_size);
Read the file, i.e. read(fd, buffer, fileStats.st_size);
Close the file.
Play with the buffer to your heart's content.
You may wish to add one to the buffer size to leave room for a terminating null character.
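The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name read_whole_file and its out_len parameter are made up for the example, and it assumes a regular file whose size fstat() reports correctly.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the file,
 * or NULL on failure. The caller frees the result. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size + 1);          /* +1 for the terminating '\0' */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < size) {                 /* read() may return fewer bytes than asked */
        ssize_t n = read(fd, buf + total, size - total);
        if (n < 0) {                       /* read error */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)                        /* file shrank: keep what we got */
            break;
        total += (size_t)n;
    }
    close(fd);

    buf[total] = '\0';
    if (out_len != NULL)
        *out_len = total;
    return buf;
}
```

A caller would then do something like char *text = read_whole_file("data.txt", &len); run the tag-stripping loop over text, and free(text) when done.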
Instead of collecting all the text in a single buffer, you could put the read() call from your code in a loop and call it repeatedly to refill the buffer. Process each chunk as you read it, and print out the part you've processed so far. When you hit end-of-file (i.e., when read() returns 0), stop.
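A minimal sketch of that chunked approach, assuming the same '<' / '>' stripping rule as in the question. The function name strip_tags_fd is hypothetical, and the output goes to a second file descriptor (for example STDOUT_FILENO) instead of back into the input buffer.

```c
#include <unistd.h>

/* Hypothetical helper: copies in_fd to out_fd, dropping everything between
 * '<' and '>'. Returns 0 on success, -1 on a read or write error. */
int strip_tags_fd(int in_fd, int out_fd)
{
    char in[1024];
    char out[1024];
    int opened = 0;                 /* persists across chunks: a tag may span two reads */
    ssize_t n;

    while ((n = read(in_fd, in, sizeof in)) > 0) {
        size_t idx = 0;
        for (ssize_t i = 0; i < n; i++) {
            if (in[i] == '<')
                opened = 1;
            else if (in[i] == '>')
                opened = 0;
            else if (!opened)
                out[idx++] = in[i];
        }
        /* write() may be partial, so loop until the chunk is flushed */
        size_t done = 0;
        while (done < idx) {
            ssize_t w = write(out_fd, out + done, idx - done);
            if (w < 0)
                return -1;
            done += (size_t)w;
        }
    }
    return (n < 0) ? -1 : 0;        /* n == 0 means end of file */
}
```

Keeping the opened flag outside the read loop is the important detail: a tag that starts in one 1024-byte chunk and ends in the next is still removed.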
More efficient would be to use the mmap() call to map the file directly into memory:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
struct stat statbuf;
int fd = open("data.txt", O_RDONLY);
if (fd == -1 || fstat(fd, &statbuf) == -1) {
    /* handle the error */
}
size_t len = statbuf.st_size;
char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
    /* handle the error */
}
for (size_t i = 0; i < len; i++) {
    // do your own thing here; the mapping is read-only
}
munmap(buf, len);
close(fd);
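If the file is larger than 2GB on a 32-bit system, it cannot be mapped in one piece; map it in smaller windows instead by passing a file offset as the last argument of mmap(). That offset must be a multiple of the page size (usually 4k). The Linux mmap2() system call takes its last argument in 4096-byte units rather than bytes, but the C library's mmap() wrapper normally takes care of that for you.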
My program has a consumer and multiple producers. The producers each read a different file and write their content into a FIFO in N-sized chunks, with a leading parameter for the consumer to interpret.
The consumer is supposed to take these chunks and compose an output file where each line corresponds to one producer. The leading parameter from the chunk is used to determine the owner of the chunk and where to write it (it's a line number in the output file).
My problem is, even though it works mostly fine when there's one producer, any more make the resulting file a mess. Also there are some unexpected excessive \n but they aren't critical.
This is my expected output:
aaaaa1a aaaaaaa2a aaa3a aaaaaaaaaaa4a
bbbbbbbbbbb1b bbbbbbb2b bbbbbbbbbbbbbb3b bbbbbbb4b bbbbbbbbbb5b bb6b
cccccccccc1c cccc2c cccccccc3c ccccc4c ccccccccc5c ccccccccccccc6c
but that's what I get:
aaaaa1a aaaaaaa2a aaa3a aaaaaaaaaaa4a2 bbbbbbb43 cccccccc53 cccccccc2 bbbbbbbb2 b5b bb6b3 cccc6c2
bbbbbbbbbbb1b bbbbbbb2b bbbbbbbbbbbbbb3b
cccccccccc1c cccc2c cccccccc3c ccccc4c c
There's an unexpected cutoff in the later lines and the chunks become mixed up.
I think it's a problem with how I handle the named pipes, because I'm printing the "raw input" before further processing and I can see that I'm reading invalid data from the pipe. But AFAIK Linux has atomic writes for small chunks of data on a FIFO. Maybe the reads don't take the write boundaries into account, and that's where the problem lies?
Consumer code:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
char *filename;
size_t getFileSize(FILE *fp) {
fseek(fp, 0, SEEK_END);
size_t len = ftell(fp);
rewind(fp);
printf("length %ld \n", len);
return len;
}
int nthFunctionCall = 1;
void printFile(FILE *file) {
char *fileContent = NULL;
if(file != NULL) {
size_t size = getFileSize(file);
fileContent = malloc((size /* + 1*/) * sizeof(char));
fread(fileContent, 1, size, file);
//fileContent[size + 1] = '\0'; ?
}
printf("FILE CONTENT: \n%s\n", fileContent);
}
void writeToFile(long targetLineNumber, char *text) {
FILE *temp = fopen("temp", "w");
if(temp == NULL) {
perror("can't create temp");
exit(-1);
}
char *fileContents = NULL;
FILE *file = fopen(filename, "r");
if(file != NULL) {
size_t size = getFileSize(file);
fileContents = malloc((size + 1) * sizeof(char));
fread(fileContents, 1, size, file);
fileContents[size] = '\0'; // tbh, I don't know whether I should do this or not.
fclose(file);
}
char *fileContentsCpy = fileContents;
printf("FILE CONTENT:\n %s\n", fileContents);
printf("%d Text to save %s\n", nthFunctionCall, text);
char *currentLineFromFile;
size_t processedLineNumber;
for (processedLineNumber = 1; (currentLineFromFile = strsep(&fileContents, "\n")) != NULL; processedLineNumber++) {
printf("%d targetLineNumber %ld processedLineNumber %ld \n", nthFunctionCall, targetLineNumber, processedLineNumber);
printf("%d copy the current line into temp: %s\n", nthFunctionCall, currentLineFromFile);
fputs(currentLineFromFile, temp);
if(processedLineNumber == targetLineNumber) {
printf("%d add text to line %ld: %s\n", nthFunctionCall, processedLineNumber, text);
fputs(text, temp);
}
fputs("\n", temp);
fflush(temp);
}
printf("%d Finished loop with: targetLineNumber %ld processedLineNumber %ld \n", nthFunctionCall, targetLineNumber, processedLineNumber);
if(targetLineNumber >= processedLineNumber) {
for (int j = 0; j < (targetLineNumber - processedLineNumber); ++j) {
fputs("\n", temp);
}
printf("%d added text: %s\n", nthFunctionCall, text);
fputs(text, temp);
fflush(temp);
}
fclose(temp);
if(fileContentsCpy != NULL) free(fileContentsCpy);
nthFunctionCall++;
remove(filename);
rename("temp", filename);
printf("One iteration end\n");
}
int numberLength(size_t number) {
int len = 0;
while(number > 0) {
number /= 10;
len++;
}
return len;
}
int main(int argc, char **argv)
{
if (argc < 4) {
fprintf(stderr, "testConsument <fifo_path> <file_to_save_in> <chunk size>\n");
exit(-1);
}
char *myfifo = argv[1];
filename = argv[2];
int numberToRead = atoi(argv[3]);
int fd = open(myfifo, O_RDONLY);
perror("sdada test consument");
char *str1 = calloc(100, sizeof(char));
while (read(fd, str1, numberToRead + 3) > 0) {
long lineNumber;
printf("length: %ld raw input: %s\n", strlen(str1), str1);
sscanf(str1, "%ld", &lineNumber);
char* content = str1 + numberLength(lineNumber) + 1; // lines should be of the format "<number> <chunk-sized-word>\0"
printf("add to line %ld content : %s \n", lineNumber, content);
writeToFile(lineNumber, content);
sleep(1);
free(str1);
str1 = calloc(100, sizeof(char));
printf("#################\n");
}
free(str1);
close(fd);
FILE *res = fopen(filename, "r");
printFile(res);
fclose(res);
return 0;
}
Producer code:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
size_t getFileSize(FILE *fp) {
fseek(fp, 0, SEEK_END);
size_t len = ftell(fp);
rewind(fp);
return len;
}
int main(int argc, char **argv) {
if (argc < 5) {
fprintf(stderr, "producer <fifo_path> <line_number_to_save_in> <input_file> <chunk_size>\n");
exit(-1);
}
char *myfifo = argv[1];
char *lineNumber = argv[2];
int numberToRead = atoi(argv[4]);
mkfifo(myfifo, 0666);
int fd = open(myfifo, O_WRONLY);
char *someFilePath = argv[3];
FILE *somFile = fopen(someFilePath, "r");
char *buf = calloc(numberToRead, sizeof(char));
size_t size = 1;
while ((fread(buf, size, numberToRead, somFile) > 0)) {
char *buf2 = calloc((numberToRead + 3), sizeof(char));
strcat(buf2, lineNumber); strcat(buf2, " "); strcat(buf2, buf); strcat(buf2, "\0");
while (strstr(buf2, "\n")) {
buf2[strcspn(buf2, "\n")] = ' ';
}
printf("SENDING: %s\n", buf2);
fflush(stdout);
write(fd, buf2, numberToRead + 3);
sleep(2);
free(buf);
free(buf2);
buf = calloc(numberToRead, sizeof(char));
}
write(fd, lineNumber, 2);
close(fd);
return 0;
}
After running both the producer and the consumer the communication should start working and after some time there should be an output file. After each such execution you have to manually remove the file, because I didn't really consider the situation where it has existed before.
Example start (each line should be in a different terminal):
./producer '/tmp/fifo3' 3 'file1' 10
./producer '/tmp/fifo3' 2 'file1' 10
./producer '/tmp/fifo3' 1 'file1' 10
./testConsument '/tmp/fifo3' 'output' 10
There are a lot of debug prints, I'm not sure if they are helpful or not but I'm leaving them in.
The problem you face is that, with several producers attached to a shared resource (the FIFO), you need to control how it is accessed so that the consumer gets the data in the proper sequence. The only help you get from the kernel is at the write(2) system call level: the kernel locks the inode of the destination FIFO for the duration of the system call. So, if you are making short writes, the easiest approach is to gather all the data for one message and put it into the FIFO with a single call to write(2).
If you opt for a more complex solution, you need some kind of mutex or semaphore to control who has exclusive access to the FIFO for writing; the other processes must wait for the lock to be released before they start to write.
Also, don't use stdio with this approach. The stdio package only writes data when it flushes a buffer, and this happens differently for a terminal than for a FIFO; it depends on the actual buffer size in use, so you have no exact idea of when the write happens. This means you cannot use fprintf(3) and friends.
Finally, if you rely on the atomicity of write(2), bear in mind that a FIFO is a limited resource with a bounded buffer. If you try to write a large amount of data in a single write(2) call, you can get a partial write, and you will not be able to recover from it, because in the meantime other producers can write to the FIFO and break your message structure. POSIX only guarantees that writes of at most PIPE_BUF bytes are not interleaved (PIPE_BUF is at least 512 bytes, and 4096 on Linux), so as a rule of thumb keep each message within that limit to stay portable across different Unices.
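As an illustration of the single-write approach, here is a minimal sketch. The names send_record and RECORD_SIZE are assumptions made for the example, not part of the question's code; the idea is that every message is one fixed-size, zero-padded record written with exactly one write() call, so the consumer can always read RECORD_SIZE bytes at a time.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define RECORD_SIZE 64   /* assumed fixed message size, well below PIPE_BUF */

#if defined(PIPE_BUF) && RECORD_SIZE > PIPE_BUF
#error "RECORD_SIZE must not exceed PIPE_BUF"
#endif

/* Hypothetical helper: sends "<line> <chunk>" as one fixed-size, zero-padded
 * record using a single write(). Returns 0 on success, -1 on failure. */
int send_record(int fifo_fd, long line, const char *chunk)
{
    char record[RECORD_SIZE];
    memset(record, 0, sizeof record);

    int len = snprintf(record, sizeof record, "%ld %s", line, chunk);
    if (len < 0 || (size_t)len >= sizeof record)
        return -1;                       /* message does not fit in one record */

    /* One write() of at most PIPE_BUF bytes is not interleaved with other writers. */
    ssize_t written = write(fifo_fd, record, sizeof record);
    return (written == (ssize_t)sizeof record) ? 0 : -1;
}
```

With this layout, any extra write of a different size (such as a trailing two-byte write) would shift every later record, so all producers have to send only whole records.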
The problem lay in the line
write(fd, lineNumber, 2);
near the end in the producer program. It sent unnecessary data which wasn't meaningful in any way and wasn't interpreted properly.
After removing it the program works as intended (except for the unexpected new lines, but they aren't that bad and they have happened before).
I am trying to write a program that reads a file 10 bytes at a time using read(); however, I do not know how to go about it. How should I modify this code to read 10 bytes at a time? Thanks!
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
int main (int argc, char *argv[])
{
printf("I am here1\n");
int fd, readd = 0;
char* buf[1024];
printf("I am here2\n");
fd =open("text.txt", O_RDWR);
if (fd == -1)
{
perror("open failed");
exit(1);
}
else
{
printf("I am here3\n");
if(("text.txt",buf, 1024)<0)
printf("read error\n");
else
{
printf("I am here3\n");
/*******************************
* I suspect this should be the place I make the modification
*******************************/
if(read("text.txt",buf, 1024)<0)
printf("read error\n");
else
{
printf("I am here4\n");
printf("\nN: %c",buf);
if(write(fd,buf,readd) != readd)
printf("write error\n");
}
}
return 0;
}
The final parameter of read() is the maximum size of the data you wish to read so, to try and read ten bytes at a time, you would need:
read (fd, buf, 10)
You'll notice I've also changed the first parameter to the file descriptor rather than the file name string.
Now, you'll probably want that in a loop since you'll want to do something with the data, and you also need to check the return value since it can give you less than what you asked for.
A good example for doing this would be:
#include <fcntl.h>
#include <unistd.h>
int copyTenAtATime (char *infile, char *outfile) {
    // Buffer details (size and data); sz stays -1 if an open fails.
    ssize_t sz = -1;
    char buff[10];
    // Try open input and output (O_CREAT needs a mode argument).
    int ifd = open (infile, O_RDONLY);
    int ofd = open (outfile, O_WRONLY|O_CREAT, 0644);
    // Do nothing unless both opened okay.
    if ((ifd >= 0) && (ofd >= 0)) {
        // Read chunk, stopping on error or end of file.
        while ((sz = read (ifd, buff, sizeof (buff))) > 0) {
            // Write chunk, flagging error if not all written.
            if (write (ofd, buff, sz) != sz) {
                sz = -1;
                break;
            }
        }
    }
    // Finished or errored here, close files that were opened.
    if (ifd >= 0) close (ifd);
    if (ofd >= 0) close (ofd);
    // Return zero if all okay, otherwise error indicator.
    return (sz == 0) ? 0 : -1;
}
Change the count argument in the call to read():
read(fd,buf,10);
From the read(2) man page:
ssize_t read(int fd, void *buf, size_t count);
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
if(read("text.txt",buf, 1024)<0)// this will give you the error.
The first argument must be a file descriptor, not a file name.
I really have a problem here. I can't seem to find the best way to exit a loop when reading characters from a file. I know that every tutorial says I shouldn't use while( !feof() ), but they don't really suggest anything other than putting fgets() in the while condition, and that is not really appropriate because I want to read the whole FILE content into my variable.
while (!feof(newFile))
{
newString[i++] = fgetc(newFile);
}
newString[i] = '\0';
i = 0;
//this is the result seen with the debugger
newFile content = ABC
newString[0] = 65 (A)
newString[1] = 66 (B)
newString[2] = 67 (C)
newString[3] = 10 (\n)
newString[4] = -1
newString[5] = 0 (\0)
I am looking for a solution and some advice about how to improve my algorithm.
int c;
while ((c = fgetc(newFile)) != EOF) newString[i++] = c;
newString[i] = '\0';
For reading whole text files into memory, I suggest using mmap. This has the benefit that all buffering and reading is handled by your operating system, so you can focus your code on the task at hand. (Also, it's usually faster than buffering stuff yourself.)
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
int
main (void)
{
    int fd = open("filename", O_RDONLY);
    if (fd == -1)
        return 1; // file open failed
    struct stat sb;
    int res = fstat(fd, &sb);
    if (res == -1) {
        close(fd);
        return 1; // stat failed
    }
    size_t length = sb.st_size;
    char *data = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED)
        return 1; // mmap failed (also happens for an empty file)
    // iterate over characters
    size_t i;
    for (i = 0; i < length; ++i)
        printf("'%c'\n", data[i]);
    munmap(data, length);
    return 0;
}
"they don't really suggest anything other than putting fgets() in the while condition, and that is not really appropriate"
That is absolutely, entirely appropriate. fgets() reads the file line by line, and you can append each line onto the end of a dynamically expanding buffer.
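A minimal sketch of that fgets() approach, under the assumption that the file is plain text without embedded NUL bytes. The helper name read_all_lines and the initial sizes are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the rest of the stream into a malloc'd,
 * NUL-terminated string by appending fgets() results. Returns NULL on
 * allocation failure or read error; the caller frees the result. */
char *read_all_lines(FILE *fp)
{
    size_t cap = 256, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    buf[0] = '\0';

    char line[128];
    while (fgets(line, sizeof line, fp) != NULL) {   /* loop ends at EOF or error */
        size_t n = strlen(line);
        if (len + n + 1 > cap) {                     /* grow geometrically */
            while (len + n + 1 > cap)
                cap *= 2;
            char *tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        memcpy(buf + len, line, n + 1);              /* copies the '\0' too */
        len += n;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because the loop condition is the return value of fgets() itself, there is no extra iteration at end of file, which is exactly the problem the while( !feof() ) version has.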
However, if you don't want to use fgets(), and you just want to read the file at once: use fread().
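#include <stdio.h>
#include <stdlib.h>
FILE *f = fopen("foo.txt", "rb");
if (!f)
    abort(); // "handle" error
fseek(f, 0, SEEK_END);
long size = ftell(f); // ftell() returns long, or -1 on error
if (size < 0)
    abort();
fseek(f, 0, SEEK_SET);
size_t len = (size_t)size;
char *buf = malloc(len + 1);
if (!buf)
    abort();
if (fread(buf, 1, len, f) != len) { // counting bytes also works for an empty file
    // handle reading error
}
buf[len] = 0;
fclose(f);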
I am writing a simple program to flip all the bits in a file, but right now it only does the first 1000 bytes until I get that much working. Why does my call to read() ignore \r characters? When I run this code on a file that only contains \r\n\r\n, the read call returns 2 and the buffer contains \n\n. The \r characters are completely ignored. I'm running this on Windows (this wouldn't even be an issue on Linux machines)
Why does read(2) skip over the \r character when it finds it? Or is that what is happening?
EDIT: The conclusion is that Windows defaults to opening files in "text" mode as opposed to "binary" mode. For this reason, when calling open, we must specify O_BINARY in the flags.
Thanks, code below.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>
void invertBytes(size_t amount, char* buffer);
int main(int argv, char** argc)
{
int fileCount = 1;
char* fileName;
int fd = 0;
size_t bufSize = 1000;
size_t amountRead = 0;
char* text;
int offset = 0;
if(argv <= 1)
{
printf("Usages: encode [filenames...]\n");
return 0;
}
text = (char *)malloc(sizeof(char) * bufSize);
for(fileCount = 1; fileCount < argv; fileCount++)
{
fileName = argc[fileCount];
fd = open(fileName, O_RDWR);
printf("fd: %d\n", fd);
amountRead = read(fd, (void *)text, bufSize);
printf("Amount read: %d\n", amountRead);
invertBytes(amountRead, text);
offset = (int)lseek(fd, 0, SEEK_SET);
printf("Lseek to %d\n", offset);
offset = write(fd, text, amountRead);
printf("write returned %d\n", offset);
close(fd);
}
return 0;
}
void invertBytes(size_t amount, char* buffer)
{
int byteCount = 0;
printf("amount: %d\n", amount);
for(byteCount = 0; byteCount < amount; byteCount++)
{
printf("%x, ", buffer[byteCount]);
buffer[byteCount] = ~buffer[byteCount];
printf("%x\r\n", buffer[byteCount]);
}
printf("byteCount: %d\n", byteCount);
}
fd = open(fileName, O_RDWR);
should be
fd = open(fileName, O_RDWR | O_BINARY);
See the question "read() only reads a few bytes from file" for details.
Try opening with O_BINARY to use binary mode; text mode may be the default, and it may drop \r.
open(fileName, O_RDWR|O_BINARY);
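If the same source also has to build on systems that do not define O_BINARY, a common idiom is to define it as 0 there. The sketch below shows this under that assumption; the helper name open_binary_rw is made up for the example.

```c
#include <fcntl.h>

/* O_BINARY exists on Windows toolchains; POSIX systems have no text mode,
 * so defining it as 0 there makes the flag a no-op. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: opens a file for reading and writing without
 * newline translation. Returns the descriptor, or -1 on failure. */
int open_binary_rw(const char *path)
{
    return open(path, O_RDWR | O_BINARY);
}
```

With the file opened this way, a read() on a file containing \r\n\r\n should return all four bytes.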
I have the following bit of code (it's "example" code, so nothing fancy):
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
int main()
{
char buffer[9];
int fp = open("test.txt", O_RDONLY);
if (fp != -1) // If file opened successfully
{
off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
ssize_t count = read(fp, buffer, strlen(buffer));
if (count > 0) // No errors (-1) and at least one byte (not 0) was read
{
printf("Read test.txt %d characters from start: %s\n", offset, buffer);
}
close(fp);
}
int fp2 = open("test.txt", O_WRONLY);
if (fp2 != -1)
{
off_t offset = lseek(fp2, 2, SEEK_CUR); // Seek from current position (0) - same result as above in this case
ssize_t count = write(fp2, buffer, strlen(buffer));
if (count == strlen(buffer)) // We successfully wrote all the bytes
{
printf("Wrote to test.txt %d characters from current (0): %s\n", offset, buffer);
}
close(fp2);
}
}
This code does not return the first printout (reading) as it is, and the second printout reads: "Wrote test.txt 0 characters from current (0): " indicating that it did not seek anywhere in the file and that buffer is empty.
The odd thing is, if I comment out everything from fp2 = open("test.txt", O_WRONLY);, the first printout returns what I expected. As soon as I include the second open statement (even with nothing else) it won't write it. Does it somehow re-order the open statements or something else?
The line
ssize_t count = read(fp, buffer, strlen(buffer));
is wrong, you're taking the strlen of an uninitialized buffer. You likely want the size of the buffer like so:
ssize_t count = read(fp, buffer, sizeof buffer);
You should also make sure buffer really contains a NUL-terminated string when you print it as one.
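if (fp != -1) // If file opened successfully
{
    off_t offset = lseek(fp, 2, SEEK_SET); // Seek from start of file
    ssize_t count = read(fp, buffer, sizeof buffer - 1);
    if (count > 0) // No errors (-1) and at least one byte (not 0) was read
    {
        buffer[count] = 0; // terminate before printing with %s
        printf("Read test.txt %ld characters from start: %s\n", (long)offset, buffer);
    }
    close(fp);
}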
Are you perfectly sure you are cleaning out the file every time you run?
As written, the first time you run this, you'll only see the second printout, and the second time you might see the first one.