I am trying to understand some C code I've stumbled across. For the record, the code does exactly what it's supposed to except for the print line I added. It takes all of the contents from the inputFile.txt, and replaces the J's with X's, and writes it to the outputFile.txt.
int main(void) {
int fd_to_read = open("inputFile.txt", O_RDONLY);
int fd_to_write = open("outputFile.txt", O_WRONLY | O_CREAT, 0644); /* O_CREAT requires a mode argument */
char c;
int bytes;
while ((bytes = read(fd_to_read, &c, sizeof(c))) > 0) {
if (c == 'J') {
c = 'X';
}
write(fd_to_write, &c, sizeof(c));
//I added this and it doesn't work.
printf(&c);
}
close(fd_to_read);
}
When I first saw this, I expected the while loop to print the first character from the file over and over again. I understand that read() will keep being called as long as it returns a value > 0, but I assumed that for it to change position in the file, the address of c would have to be incremented by something, possibly sizeof(c). Nothing appears to be incremented, yet it just moves on to the next letter in the file. My next assumption is that read() handles that on its own, but when I went to print the contents of &c I got close to what I expected, which was a ton of garbage, since it was essentially just printing random things from memory.
So, two questions really.
How does the &c passed to write() end up writing the correct character to outputFile.txt without the address of c being incremented?
How would I print just the individual characters from the file, without all the garbage that printf(&c) added after each char?
OK, two things.
(1) char c does not need to be incremented because it only serves as the output buffer for read(): each call overwrites it with the next byte.
(2) read() and write() automatically advance the file offset associated with their respective file descriptors.
The current position in each file is tracked through the file descriptors (fd_to_read, fd_to_write), not through char c.
Read about it in the man pages:
https://linux.die.net/man/3/read
https://linux.die.net/man/3/write
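char c; ... printf(&c); is undefined behavior (UB), though I suspect OP knows that.
The first argument to printf() must be a pointer to a null-terminated format string. &c points to a single char with no terminator, so printf() keeps reading whatever follows c in memory.
OP is hoping for favorable UB. Good luck.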
Related
I need to write a program in C, that prints out last five lines of file by using basic functions like open, read, write, close, lseek. My code so far:
int main(int argc, char *argv[]){
int fd1=open(argv[1], O_RDONLY);
char c;
int currPos = lseek(fd1,-2,SEEK_END);
while(currPos != -1){
read(fd1,&c,sizeof(c));
currPos--;
currPos=lseek(fd1,currPos,SEEK_SET);
if (c == '\n'){
}
}
return 0;
}
Can anybody help me? I think I need to store those characters in array and then print it backwards, but I don't know how.
Why not count the number of characters read while reading back to the fifth newline (call that n) and then do a read of n characters? You don't need to store the data, it's already stored in the file.
Inside the if statement you can count how many '\n' characters you encounter from the end of your file. When you encounter the 6th end-of-line, you know you are at the end of the 6th-from-the-end line (assuming that the last line also ends with an end-of-line character and that you start scanning at that character), so you just print from the following character to the end of the file.
You do not need to save the characters in an array, since they are already saved in your file.
You can just do (after your while loop):
int i=read(fd1,&c,1);
while(i > 0){
printf("%c",c);
i = read(fd1,&c,1);
}
It may not be the most efficient way to do it, but it should do the trick.
Note: There is no need to write sizeof(c), since c is a char, and chars are always 1 byte long.
Also, you should always check the return value of read: you never know when something goes wrong in your system, and your program may crash or misbehave because of a read gone wrong.
I wrote the following code:
main()
{
FILE *fp;
fp=fopen("ftest.txt","r");
char c, filestring[100];
int i=0;
while((c=getc(fp))!=EOF)
{
filestring[i]=c;
}
printf("str is %s",filestring);
fclose(fp);
}
The file ftest.txt contains the words Hello World.
The output displayed is not correct, it is either some other font or some other encoding.
What is the reason for this? And how do I solve this problem?
At the same time, this code runs well (shows output on stdout in "English"):
main()
{
FILE *fp;
fp=fopen("ftest.txt","r");
char c;
while((c=getc(fp))!=EOF)
{
printf("%c",c);
}
fclose(fp);
}
I need the first code to work, as I've to search in the text file. How to solve this?
The question is different from Output is not displying correctly in file operation as I'm able to "display" the correct output (as in second code), but I'm not able to write the contents of the file into a string.
Things I've tried:
1) Changing the mode in which the file is opened from "r" to "rb".
2) Changing the Notepad encoding to all available options: ANSI, UTF etc.
There are two parts of the answer:
You never increment i. This means you're just overwriting the same spot (the first space in the array) in the while loop. That's why the first value of the junk is a 'd' (the last character of your input).
The junk after the 'd' is because the array is never initialized, meaning that there is random junk already there that is never overwritten.
Another note: doing it the first way requires manually adding a null byte \0 to the end of the array (either by initializing the whole thing to \0s or by storing one just after the last character is read in). This is so the string is read correctly by printf.
... and there's also a third part that's wrong here:
getc() returns an int, but you're assigning it to a char before comparing it with EOF, which is a negative int (typically -1). If char is signed on your platform and getc() happens to return the character 255, the assignment turns it into (char)-1, which is then sign-extended to -1 and compares equal to EOF, so the loop stops early. If char is unsigned, c can never equal EOF and the loop never ends. Declare c as an int.
I seem to understand the program now, except the getline function is not very intuitive as it seems to copy everything getchar() returns to a character array s[] which is never really used for anything important.
int getline(char s[], int lim)
{
int c, i;
for(i=0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
s[i] = c;
if(c == '\n')
{
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
The function could just as easily leave out the line s[i] = c; because all it really seems to be doing is counting the number of characters until getchar() returns EOF or '\n'.
What I really do not understand is why the program progressed forward as the main loop is as follows:
main()
{
int len; /* current line length */
int max; /* maximum length seen so far */
char line[MAXLINE]; /* current input line */
char longest[MAXLINE]; /* longest line saved here */
max = 0;
while ((len = getline(line, MAXLINE)) > 0)
if (len > max)
{
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
return 0;
}
The only explanation would be that the getchar() function does its magic after the user has entered in a full line of text, and hits the enter key. So it would appear to work during run-time is my guess.
Is this how the program progresses? Does the program first enter the while loop, and then wait for a user to enter a line of text, and once the user hits enter, the getline function's for-loop is iterated? I feel like this would be the case, since the user can enter backspace during input.
My question is, how exactly does the program move forward at all? Is it all because of the getchar() function?
When I hit ctrl-D in the terminal, some other confusing stuff happens. If I hit ctrl-D at the start of a newline, the program will terminate. If I hit ctrl-D at the end of a line filled with some text, it does not terminate and it does not act the same way as hitting enter. If I hit ctrl-D a few times in a line with text, the program will finally end.
Is this just the way my terminal is treating the session, or is this all stuff I should not be worrying about if I just want to learn C?
The reason why I ask is that I like to trace the program to get a good understanding of it, but the getchar() function makes that tricky.
In a parameter declaration (and only in that context), char s[] really means char *s. The way the C standard describes this is that:
A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type".
So s really is a pointer, of type char*, and when the function modifies s[i] it's modifying the ith element of line.
On the call:
getline(line, MAXLINE)
line is an array, but in most contexts an array expression is implicitly converted to a pointer to the array's first element.
These two rules almost seem to be part of a conspiracy to make it look like arrays and pointers are really the same thing in C. They most definitely are not. A pointer object contains the address of some object (or a null pointer that doesn't point to any object); an array object contains an ordered sequence of elements. But most manipulation of arrays in C is done via pointers to the array's elements, with pointer arithmetic used to advance from one element to the next.
Suggested reading (I say this a lot): section 6 of the comp.lang.c FAQ.
getchar reads a character from standard input. So if that's you sitting at the terminal, it blocks the program until it receives a character you've typed, then it's done. But when standard input is interactive, the terminal normally delivers it a line at a time, so what you type isn't processed by the program until you press Enter. After that, getchar can keep reading all the characters you typed, as they're read from the buffer.
You're mistaken about the function. The array is passed to the function*, and it stores each character read by getchar (including the newline, but not EOF) in successive elements. That's the point of it: not to count the characters, but to store them in the array.
(*a pointer is actually passed, but the function here can still treat it like an array.)
The array is used for something important: it is provided by the caller and returned modified with the new content. From a reasonable point of view, filling in the array is the purpose of calling the function.
That array parameter is actually a pointer: in a parameter declaration, char s[] is the same as char *s. So the function builds its result in the caller's array, which is why it's copied later in main. There is rarely any "magic" in K&R.
I have the code below:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main () {
int fd = open("filename.dat", O_CREAT|O_WRONLY|O_TRUNC, 0600);
int result = write(fd, "abcdefghijklmnopqrstuvxz", 100);
printf("\n\nfd = %d, result = %d, errno = %d", fd, result, errno);
close(fd);
return 0;
}
I am trying to understand what happens when I try to write more bytes to a file than I have available. So I am calling write and asking the program to write 100 bytes while I have much less than that. The result: a bunch of stuff from stdout ends up on filename.dat. If instead of 100 I use strlen("abcdefghijklmnopqrstuvxz"), I get the desired result. My question then is: why is the program trying to write beyond the '\0' character on my string? Is there some undefined behavior going on here?
My question then is: why is the program trying to write beyond the '\0' character on my string?
The function write(2) doesn't care about 0-terminators. It actually doesn't care about buffer contents at all: it will try to write as many bytes as you tell it.
Is there some undefined behavior going on here?
Yes. Reading past the end of the string literal is undefined behavior, and trying to write more than you have might incur the wrath of the OS, which could decide to terminate your process if it touches inaccessible memory.
The write() function you are using does not care about the content. It just writes the number of bytes you tell it to write to the file.
So when you tell it to write 100 bytes but provide fewer than 100, the remaining bytes are whatever happens to follow your string in memory, i.e. garbage.
But when you use strlen("abcdefghijklmnopqrstuvxz"), you are asking write() to write exactly as many bytes as the string is long, so it works fine there.
Because there are two techniques to represent a string. There is the null-terminated version, and there is another where you give a pointer to the first byte together with a size. write() uses the second one. It needs a pointer to where your data begins and a length to know how much data it should copy to the file, and it pays no attention to null bytes. Sometimes such functions wrap a simple memcpy.
So when you specified a length of 100, the program wrote whatever was stored in memory after your abcdefghijklmnopqrstuvxz, which happened to be your "bunch of stdout stuff". That's why you see garbage. You were lucky, because you can easily get a segfault in these cases!
My question then is: why is the program trying to write beyond a \0? Because you asked it to write 100 chars.
Is there some undefined behavior going on here? Yes: reading past the end of the string is undefined behaviour whatever the count is. If you increase that 100 to a large number and the range reaches memory your process is not allowed to access, the call can fail or the program can crash.
I think that the basic issue here is that you're thinking of C strings as values, you think you're passing this value to the write function, and the write function is writing out your value plus extra junk.
C is lower level than that. In C, we don't really pass strings around, instead we pass pointers to strings, which are 'char *' values but with the added promise that they point to a valid block of memory that should be treated as a null-terminated string.
The write() function doesn't care about the null-terminated string convention. The parameters in the write call provide a file descriptor, a pointer to the buffer (a const void *), and a byte count.
Also, the compiler converts string constants into const char arrays. The equivalent of this happens at the top level:
static const char stringconst00001[25] = { 'a', 'b', 'c', ... 'x', 'z', '\0' };
And it does this in main():
int result = write(fd, stringconst00001, 100);
I have a simple question. I want to write a program in C that scans the lines of a specific file, and if the only phrase on the line is "Atoms", I want it to stop scanning and report which line it was on. This is what I have, and it is not compiling because apparently I'm comparing an integer to a pointer (of course "string.h" is included):
char dm;
int test;
test = fscanf(inp,"%s", &dm);
while (test != EOF) {
if (dm=="Amit") {
printf("Found \"Atoms\" on line %d", j);
break;
}
j++;
}
the file was already opened with:
inp = fopen( .. )
And checked to make sure it opens correctly...
I would like to use a different approach though, and was wondering if it could work. Instead of scanning individual strings, could I scan entire lines as such:
// char tt[200];
//
// fgets(tt, 200, inp);
and do something like:
if (tt[] == "Atoms") break;
Thanks!
Amit
Without paying too much attention to your actual code here, the most important mistake you're making is assuming that the == operator compares two strings. It does NOT.
In C, a string is an array of characters, and in an expression an array is converted to a pointer to its first element. So if ("abcde" == some_string) only compares two addresses and will never be true unless both point to the same string!
You want to use a function like strcmp(const char *a, const char *b), which returns 0 if the two strings are equal and a nonzero value if they're not. strncmp(const char *a, const char *b, size_t n) compares at most the first n characters of a and b and returns 0 if they're equal, which is good for looking at the beginning of strings (to see if a string starts with a certain set of characters).
You also should NOT be passing the address of a single character as the pointer for %s in your fscanf! This will corrupt your stack when it tries to put many characters into dm, which only has space for a single character. As James says, you want to do something like char dm[BUFSIZE], where BUFSIZE is 1 larger than you ever expect a single line to be, then do fscanf(inp, "%s", dm);
Hope that helps!
Please be aware that dm is a single char, while %s needs a pointer to a buffer of chars (a char array, or a char * that points to allocated memory).
Also, if (dm=="Amit") is wrong; change it to
if (strcmp(dm, "Amit") == 0)
In the line using fscanf, you are passing the address of a single char where %s expects a pointer to the first element of a char buffer. Declare dm as a pointer instead:
char *dm;
test = fscanf(inp,"%s", dm);
The * declares dm as a pointer to char. fscanf does not allocate anything: it stores the characters matched by %s, followed by a null byte, starting at the address dm holds, so dm must point to enough allocated memory before the call (see the edit below).
What kit said is correct too: the strcmp function should be used rather than ==, as == will just compare the addresses of the strings.
Edit: What kit says below is correct. Every pointer must point to valid memory before it is used, either memory you allocate for it or an existing buffer. You can allocate memory like this:
dm = (char*)malloc(sizeof(char) * STRING_LENGTH);
where STRING_LENGTH is the maximum length of a possible string, including the terminating null byte. This memory allocation only has to be done once.
The problem is you've declared 'dm' as a char, not a malloc'd char* or char[BUFSIZE]
http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
You'll also probably report incorrect line numbers: %s reads one whitespace-delimited word at a time, so j counts words rather than lines. You'll need to scan the read-in buffer for '\n' occurrences, and handle the case where your desired string lies across buffer boundaries.