When I call read_from_fd more than once, the data comes back empty.
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int fd;
void read_from_fd() {
char data[2];
read(fd, data, 1);
data[1] = 0x00;
printf("data %s\n", data);
}
int main(void) {
fd = open("test.txt", O_RDWR);
read_from_fd();
read_from_fd();
read_from_fd();
}
So the first read prints data, but the second and third ones print something empty.
I guess this has something to do with the memory for the char array. Is this correct? What do I have to do to fix it?
If there is only one character in the input, then it is clear that you will get it only once. This has to do with the seek pointer (file offset) of the open file. When you open a file with the O_RDWR flag, the seek pointer is placed at the beginning of the file. Then, on every call to read, it is advanced by the number of bytes read. When the seek pointer reaches the end of the file, read will return 0 and leave your buffer unchanged, which is why the later calls print nothing useful.
If you want to read the same character over and over, you have to reset the seek pointer using the lseek function.
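For example, a minimal sketch of read_from_fd that rewinds before every read (error handling kept to a minimum):
void read_from_fd(void) {
    char data[2];
    lseek(fd, 0, SEEK_SET);          /* reset the seek pointer to the start of the file */
    if (read(fd, data, 1) != 1) {    /* nothing read: empty file or error */
        return;
    }
    data[1] = '\0';
    printf("data %s\n", data);
}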
According to the Open Group specification: "The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified."
Related
I know it's very dumb but I really don't get what the heck is happening here.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int getinput()
{
char buf[10];
int rv = read(0, buf, 1000);
printf("\nNumber of bytes read are %d\n", rv);
return 0;
}
int main()
{
getinput();
return 0;
}
I can't understand how this read() function is working:
read(0, buf, 1000)
Also, buf is only 10 bytes long, so why is it accepting 23 bytes?
Array-pointer equivalence
In C, an array like the variable buf in your example decays, in most expressions, to a pointer to its first byte; the value of that pointer is the memory address of the first allocated byte.
You can print the value of this pointer:
#include <stdio.h>
int main(void) {
char buf[10];
printf("Address of the first byte of buf: %p\n", buf);
return 0;
}
Output:
Address of the first byte of buf: 0x7ffd3699bfb6
Pointer arithmetic
When you write something into this buffer with an instruction like
buf[3] = 'Z';
It is in fact translated to
*(buf+3) = 'Z';
It means "add 3 to the value of the pointer buf and store the character 'Z' at the resulting address".
Nothing is stopping you from storing the character 'Z' at any given address. You can store it before or after the address pointed to by buf without any restriction. If the address you choose happens to be the address of another variable, it will not produce a segmentation fault (the address is valid); instead it will silently overwrite that variable.
In C, the compiler will even let you write the character 'Z' at the address 123456 if you like (though running this will almost certainly crash, since that address will not be mapped):
int main(void) {
char *address = (char *)123456;
*address = 'Z';
return 0;
}
The fact that your buffer is 10 bytes long does not change that. You cannot "fix" this at the language level, because writing anything at any memory location is a fundamental feature of the C programming language; the only language-level "fix" would be to switch to a language with bounds checking. In practice, the fix is simply to never ask read() for more bytes than your buffer can hold.
File descriptors opened at program startup
In your example, you pass the value 0 as the first argument of the function read(). This value is the file descriptor of the standard input. This file descriptor is automatically opened at program startup (normally you get such a file descriptor as the result of a call to the function open()). So, if you get 23 read bytes, it means that you typed 23 characters on your keyboard during the program execution (for instance, 22 letters and 1 newline character).
It would be better to use the macro name of the standard input file descriptor:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int getinput()
{
char buf[10];
int rv = read(STDIN_FILENO, buf, 10);
printf("\nNumber of bytes read are %d\n", rv);
return 0;
}
int main()
{
getinput();
return 0;
}
Your sample is a perfect example of a buffer overflow.
read(0, buf, 1000) will most probably corrupt memory (the stack, in your case).
read() takes the start address of your buf array and writes those 23 bytes there. If there are other structures in that memory, they will be overwritten by those extra bytes, which can lead to very unwanted behavior (maybe even crashes of your application).
C gives the responsibility for handling memory correctly to the programmer, so there is no bounds checking.
You call read() with 3 arguments:
The file handle, in your case "0".
The pointer to the array of bytes to fill with the bytes read from the file, in your case buf.
The size of this array, in your case "1000".
Apparently your input had only 23 bytes, which is less than or equal to 1000, so read() returns this value.
Note: but before that, it happily wrote all these 23 bytes into the array. Since your buffer has a capacity of just 10 bytes, the memory after it gets overwritten. This is called a "buffer overflow" and is a common error, abused for evil attacks, or possibly leading to crashes or malfunction (Thanks, ikegami!).
To fix this error, I recommend changing the read to:
read(0, buf, sizeof buf);
This way you are always giving the right size to read(). (If you declare buf as an array, of course.)
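That last caveat matters: sizeof buf only gives the buffer size while buf is still an array in the current scope. Once it has decayed to a pointer (for example, as a function parameter), sizeof gives the size of the pointer instead. A small sketch of the difference:
#include <stdio.h>

void takes_pointer(char *buf) {
    /* here buf is a pointer, so this prints the pointer size (typically 8), not 10 */
    printf("sizeof inside the function: %zu\n", sizeof buf);
}

int main(void) {
    char buf[10];
    printf("sizeof in main: %zu\n", sizeof buf);   /* prints 10, the array size */
    takes_pointer(buf);
    return 0;
}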
I'm trying to read a huge dataset of 20 million lines; each line contains a huge number (I'm storing the numbers in unsigned long long variables), for example: 1774251443, 8453058335, 19672843924, and so on.
I developed a simple function to do this, shown below:
void read(char pathToDataset[], void **arrayToFill, int arrayLength) {
FILE *dataset = fopen(pathToDataset, "r");
if (dataset == NULL ) {
printf("Error while opening the file.\n");
exit(0); // exit failure, it closes the program
}
int i = 0;
/* Prof. suggestion: do a malloc RIGHT HERE, for allocate a
* space in memory in which store the element
* to insert in the array
*/
while (i < arrayLength && fscanf(dataset, "%llu", (unsigned long long *)&arrayToFill[i]) != EOF) {
// ONLY FOR DEBUG, it will print
//printf("line: %d.\n", i); 20ML of lines!
/* Prof. suggestion: do another malloc here,
* for each element to be read.
*/
i++;
}
printf("Read %d lines", i);
fclose(dataset);
}
The parameter arrayToFill is of type void** because of the exercise goal: every function has to work on a generic type, and the array could potentially be filled with any type of data (in this example huge numbers, but it could contain huge strings, integers, and so on).
I don't understand why I have to do two malloc calls; isn't a single one enough?
For your first question, think of malloc as a request for memory to store N objects, each of size S. When you have the parameters void **arrayToFill, int arrayLength, you are saying this array will contain arrayLength pointers, each of size sizeof(void *). That is the first allocation and call to malloc.
But the members of that array are pointers, which are themselves meant to point to the memory of some other object. The first call to malloc only allocates memory to store the void * of each array member; the memory for each individual element of the array needs its own malloc() call.
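A minimal sketch of those two allocations, written as a fragment that would slot into the read() function from the question (it reuses the question's dataset and arrayLength; the unsigned long long element type is just one possibility, since the exercise allows any type):
/* 1st malloc: the array itself, arrayLength slots each holding a void * */
void **arrayToFill = malloc(arrayLength * sizeof(void *));
if (arrayToFill == NULL) { exit(EXIT_FAILURE); }

int i = 0;
while (i < arrayLength) {
    /* 2nd malloc: space for one element, done once per element read */
    unsigned long long *value = malloc(sizeof *value);
    if (value == NULL) { exit(EXIT_FAILURE); }
    if (fscanf(dataset, "%llu", value) != 1) { free(value); break; }
    arrayToFill[i++] = value;   /* the array slot stores the pointer to the element */
}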
Efficient Line Reading
For your other question: making lots of small memory allocations, and later freeing them (assuming you would do so, otherwise you would leak a lot of memory), is very slow. However, the performance cost here comes more from the sheer number of calls you make than from the total amount of memory you allocate.
Have your program read the entire file into memory, and allocate an array of unsigned long long large enough for 20 million (or however many) integers you expect to handle. Then you can parse through the file contents with the strtoull function from <stdlib.h> and copy each resulting value, one by one, into your large array.
This way, you only use 2-3 large memory allocations and deallocations.
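A rough sketch of that approach, under a few stated assumptions: the file name "dataset.txt" and the EXPECTED_COUNT constant are placeholders, and strtoull is used because the values are unsigned long long:
#include <stdio.h>
#include <stdlib.h>

#define EXPECTED_COUNT 20000000   /* placeholder: adjust to the real number of lines */

int main(void) {
    FILE *f = fopen("dataset.txt", "r");   /* placeholder file name */
    if (f == NULL) { perror("fopen"); return EXIT_FAILURE; }

    /* 1st large allocation: read the whole file into one buffer */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *text = malloc(size + 1);
    if (text == NULL || fread(text, 1, size, f) != (size_t)size) { return EXIT_FAILURE; }
    text[size] = '\0';
    fclose(f);

    /* 2nd large allocation: one array for all the numbers */
    unsigned long long *numbers = malloc(EXPECTED_COUNT * sizeof *numbers);
    if (numbers == NULL) { return EXIT_FAILURE; }

    size_t count = 0;
    char *p = text;
    while (count < EXPECTED_COUNT && *p != '\0') {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p) break;          /* no more digits to parse */
        numbers[count++] = v;
        p = end;                      /* strtoull skips leading whitespace on the next call */
    }
    printf("Parsed %zu numbers\n", count);
    free(text);
    free(numbers);
    return 0;
}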
I've come up with this POSIX solution, see if it helps
#include <unistd.h> //for read, write, lseek
#include <stdio.h> //fprintf
#include <fcntl.h> //for open
#include <string.h>
#include <stdlib.h> // for exit and define
#include <sys/types.h>
#include <sys/stat.h>
int main(int argc, char * argv[])
{
int fd; // file descriptor
char * buffer; //pointer for the malloc
if(argc < 2)
{
fprintf(stderr, "Insert the file name as parameter\n");
exit(EXIT_FAILURE);
}
if((fd = open(argv[1], O_RDONLY)) == -1)// opens the file in read-only mode
{
fprintf(stderr, "Can't open file\n");
exit(EXIT_FAILURE);
}
off_t bytes = lseek(fd, 0, SEEK_END); // looks at how many bytes the file has
lseek(fd, 0, SEEK_SET); // returns the file pointer to the start position
buffer = malloc(bytes + 1); // allocates enough memory for the file plus a terminating '\0'
if(buffer == NULL)
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
ssize_t r = read(fd, buffer, bytes); //reads the whole file
if(r == -1)
{
fprintf(stderr, "Error reading\n");
exit(EXIT_FAILURE);
}
buffer[r] = '\0'; // terminate the buffer so it can be printed as a string
fprintf(stdout, "\n%s", buffer); // prints the file
close(fd);
exit(EXIT_SUCCESS);
}
I wrote the following code in order to write some random characters to a text file:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
int main()
{
int input_f = open("./input.txt", O_CREAT | O_APPEND | O_RDWR ,0666);
int i;
for(i=0;i<50;i++)
{
int r = rand()%252;
printf("%d size of r: %d\n",i,sizeof(r));
write(input_f,r,sizeof(r));
printf("%d we just wrote %d which is %c\n",i,r,r);
}
close(input_f);
}
I looked for some solutions to this.
Maybe someone here knows how I can fix this?
write(input_f,r,sizeof(r));
should be
write(input_f, &r, sizeof(r));
The second parameter is the address of the buffer you want to write, according to the man page.
Also, you should check that the return value of the function equals sizeof r.
write(input_f,r,sizeof(r)); should be write(input_f,&r,sizeof(r)); because write takes a pointer to the data to be written, not the data directly.
Other than that, you should be checking the results of the open and write calls; they can fail.
You're calling write wrong.
If you'd included unistd.h, you would have gotten a prototype and the compiler would have corrected you.
write(input_f,&r,sizeof(r)); //the 2nd arg is a void const*
Also, size_t arguments to printf require "%zu", and you should be checking for errors.
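For instance, the size print from the question would then become:
printf("%d size of r: %zu\n", i, sizeof(r));   /* %zu matches the size_t value that sizeof yields */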
Others have already said why it doesn't work.
I just want to add that you should also write:
#include <unistd.h>
or else you'll get warnings during compilation.
The write() function does not take an int as its second parameter but a pointer (void *) to a buffer, plus its length. This means you will have to convert your int into a string in a buffer first (with sprintf() for instance) and then write that buffer to the file (with write()).
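For example, a minimal sketch of that conversion inside the question's loop, reusing its r and input_f (the buffer size 32 is an arbitrary choice, comfortably large enough for an int and a newline):
char text[32];                               /* arbitrary size for the formatted number */
int len = snprintf(text, sizeof text, "%d\n", r);
if (len > 0 && write(input_f, text, (size_t)len) != len) {
    perror("write");
}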
I'm trying to make a shell "bosh>" which takes in Unix commands, and I keep getting a bad address error. I know my code reads in the commands and parses them, but for some reason I cannot get them to execute; instead, I get a "bad address" error.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <string.h>
#include <sys/wait.h>
#define MAX_LINE 128
#define MAX_ARGS 10
int main(){
pid_t pid;
char command[MAX_LINE]; /*command line buffer*/
char *commandArgs[MAX_ARGS]; /*command line arg*/
int i;
char *sPtr=strtok(command," ");
int n=0;
printf("bosh>");
fgets(command, MAX_LINE-1,stdin);
command[strlen(command)-1]='\0';
while(strcmp(command,"quit")!=0)
{
n=0;
sPtr=strtok(command," ");
while(sPtr&&n<MAX_ARGS)
{
sPtr=strtok(NULL," ");
n++;
}
commandArgs[0]=malloc(strlen(command)+1);
strcpy(commandArgs[0],command);
if(fork()==0)
{
execvp(commandArgs[0],commandArgs);
perror("execvp failed");
exit(2);
}
pid=wait(NULL);
printf("%s",">" );
fgets(command, MAX_LINE-1,stdin);
command[strlen(command)-1]='\0';
}
printf("Command (%d) done\n", pid);
return 0;
}
These two lines are the culprit:
commandArgs[0]=malloc(strlen(command)+1);
strcpy(commandArgs[0],command);
First of all, malloc(strlen(...)) followed by strcpy is what the POSIX function strdup already does. But then, you don't need to even copy the string - it is enough to just store the pointer to the original string into commandArgs[0]:
commandArgs[0] = command;
But then, how does execvp know how many arguments the command is going to take? If you read the manuals carefully, they say something like:
The execv(), execvp(), and execvpe() functions provide an array of pointers to null-terminated strings that represent the argument list available to the new program. The first argument, by convention, should point to the filename associated with the file being executed. The array of pointers MUST be terminated by a NULL pointer.
Your argument array is not NULL-terminated. To fix it, use
commandArgs[0] = command;
commandArgs[1] = NULL; // !!!!
(Then you'd notice that you actually want to assign the arguments within the strtok parsing loop, so that you can assign all of the arguments into the commandArgs array; and compile with all warnings enabled and address those, and so forth.)
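For reference, a rough sketch of what that corrected parsing might look like inside the while loop of the question's main (pointing the commandArgs entries straight at the tokens inside command and NULL-terminating the array); it is a fragment, not a complete rewrite:
n = 0;
sPtr = strtok(command, " ");
while (sPtr != NULL && n < MAX_ARGS - 1) {
    commandArgs[n++] = sPtr;   /* each entry points at a token inside command */
    sPtr = strtok(NULL, " ");
}
commandArgs[n] = NULL;         /* execvp requires a NULL-terminated array */

if (fork() == 0) {
    execvp(commandArgs[0], commandArgs);   /* commandArgs[0] is the program name */
    perror("execvp failed");
    exit(2);
}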
You initialize sPtr in its declaration, which you do not need to do because you never use the initial value. But the initialization produces undefined behavior because it depends on the contents of the command array, which at that point are indeterminate.
The array passed as the second argument to execvp() is expected to contain a NULL pointer after the last argument. You do not ensure that yours does.
You furthermore appear to drop all arguments to the input command by failing to assign tokens to commandArgs[]. After tokenizing you do copy the first token (only) and assign the copy to the first element of commandArgs, but any other tokens are ignored.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *readLine(FILE *inFile) //Simply reads line in a text file till "\n"
{
char *line = realloc(NULL, 1);
char c;
int i=0;
while (!feof(inFile))
{
c = fgetc(inFile);
if (ferror(inFile)) printf("Error reading");
if (c == 10)
{
realloc(line,i+1);
line[i]= 10;
break;
}
realloc(line, i+1);
line[i++] = c;
}
return line;
}
int main(int argc,char **argv)
{
FILE *inFile;
inFile = fopen("testFile","r");
printf("%s",readLine(inFile));
printf("%s",readLine(inFile));
printf("%s",readLine(inFile));
return 0;
}
If the contents of testFile are:
abc
def
ghi
The three printf statements should show "abc" three times. But the output is:
abc
def
ghi
I know I am going wrong on the concept somewhere. Please help.
Usage of realloc() is incorrect.
realloc(line,i+1); // wrong
// OK
void *new_line = realloc(line,i+1);
if (!new_line)
{
free(line);
return NULL;
}
line = new_line;
Because line is passed by value, it's not changed; the actual re-allocated memory is in the return value, so line keeps pointing at the old block. (Edit: I just realized that even though this is a bug, it's not what would cause repeating lines. The other points below are still valid.)
What's worse:
You have a memory leak by losing the newly re-allocated pointer every time.
You are potentially accessing freed memory, because old line value may become invalid after reallocation, if it was reallocated in a different part of the heap.
You are re-allocating memory every character, which is potentially an expensive operation.
But I am passing the file pointer by value, so I should get the output "abc" again and again.
Ah, I understand your confusion.
A file pointer only points to the actual file structure. State such as the current offset is not part of the pointer but part of the internal structure.
Another way to think about this is that the actual object representing the file is the FILE structure. To get pass-by-reference semantics, you pass a pointer to that object. Since you are effectively passing the file by reference, each call picks up where the last one left off.
fgetc() advances the file pointer (which is "where the next character to be read is located"). That's how you're able to call it in a loop and read a whole line of characters.
After it advances past the newline character, it naturally moves on to the next character, which is the beginning of the next line.
You could modify the file pointer with the fseek() function. For example, calling fseek(inFile, 0, SEEK_SET) would reset it to the beginning of the file, causing the next fgetc() call to start over from the first character of the file.
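Putting it together, a minimal sketch of a main() that really would print the first line three times; it assumes readLine is also fixed to use the realloc return value (as shown in the other answer) and to NUL-terminate the line it returns, and the returned buffers are leaked here for brevity:
int main(void)
{
    FILE *inFile = fopen("testFile", "r");
    if (inFile == NULL) return 1;

    printf("%s", readLine(inFile));
    fseek(inFile, 0, SEEK_SET);   /* rewind to the beginning of the file */
    printf("%s", readLine(inFile));
    fseek(inFile, 0, SEEK_SET);   /* rewind again before the third read */
    printf("%s", readLine(inFile));

    fclose(inFile);
    return 0;
}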