How do I seek in stdin - c

Hello guys, I need some help. I want to read from stdin 16 bytes at a time and convert every byte into hexadecimal form. Is there a way I can use the read() function to read NOT from the beginning, but, for example, from the second byte? Also, how can I know when I have read the whole of stdin? That way I could call this function in a loop until I have read the whole of stdin.
This is a function I made:
void getHexLine()
{
int n = 16;
char buffer[n];
read(STDIN_FILENO, buffer, n);
buffer[n]='\0';
//printf("%08x", 0); hex number of first byte on line - not working yet
putchar(' ');
putchar(' ');
//converting every byte into hexadecimal
for (int i = 0;i < 16;i++ )
{
printf("%x", buffer[i]);
putchar(' ');
if (i == 7 || i == 15)
putchar(' ');
}
printf("|%s|\n", buffer);
}
The output should be like this but with an option to start from second byte for example.
[vcurda#localhost proj1]$ echo "Hello, world! This is my program." | ./proj1
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 20 54 68 |Hello, world! Th|
69 73 20 69 73 20 6d 79 20 70 72 6f 67 72 61 6d |is is my program|
This is a school project, so I can't use malloc, scanf, or <string.h>. I would be really glad to get some help, and sorry if my English is hard to understand.

stdin is generally not seekable. You can read bytes in, but you can't rewind or fast-forward. EOF (-1) means end of input on stdin just as with a regular file, but it's a looser concept when you are conducting an interactive dialogue with the user.
Interactive stdin is basically line-oriented, so at least until you get used to programming with stdin it's best to stick to this pattern: printf() a prompt, read a whole line from the user, printf() the results if applicable, then prompt again, read another whole line, and so on.
To start from the second byte then becomes easy. Read in the whole line, then start from i = 1 instead of i = 0 as you parse it.

Is there a way I can use read() function to read NOT from the
beginning, but for example from the second byte?
Most universally, you can simply ignore the read-in bytes you're not interested in.
Sometimes you will be able to lseek, e.g. if you run your program with
a regular file set to its STDIN as in:
./a.out < /etc/passwd
but lseek will fail on STDINs that are terminals, pipes, character devices, or sockets.
how can I know if I have read the whole stdin?
read will return 0 at the end of the file.
Consult the manual pages for more information.
Generally, you should check your return codes and account for short reads. Your function should probably return an int so that it has a way to communicate a possible IO error.
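The advice above (ignore the bytes you don't want, treat a return value of 0 as end of input, and account for short reads) can be sketched roughly as follows. The helper names read_full, skip_bytes, print_hex_line and hexdump_fd are made up for this illustration, and the output layout only approximates the one in the question. The sketch avoids malloc, scanf and <string.h>, and uses unsigned char so that bytes above 0x7f are not sign-extended by printf.

```c
#include <stdio.h>
#include <unistd.h>

/* Read up to n bytes from fd, retrying after short reads.
   Returns the number of bytes read (0 at end of input) or -1 on error. */
ssize_t read_full(int fd, unsigned char *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r < 0)
            return -1;              /* I/O error */
        if (r == 0)
            break;                  /* end of input */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Discard the first `skip` bytes by reading them; unlike lseek this also
   works on pipes and terminals. Returns 0 on success, -1 on error. */
int skip_bytes(int fd, size_t skip)
{
    unsigned char junk[16];
    while (skip > 0) {
        size_t want = skip < sizeof junk ? skip : sizeof junk;
        ssize_t r = read_full(fd, junk, want);
        if (r < 0)
            return -1;
        if (r == 0)
            break;                  /* input was shorter than `skip` */
        skip -= (size_t)r;
    }
    return 0;
}

/* Print one line: up to 16 hex bytes, then the printable text between bars. */
void print_hex_line(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < 16; i++) {
        if (i < len)
            printf("%02x ", buf[i]);
        else
            printf("   ");          /* pad a short final line */
        if (i == 7 || i == 15)
            putchar(' ');
    }
    putchar('|');
    for (size_t i = 0; i < len; i++)
        putchar(buf[i] >= 32 && buf[i] < 127 ? buf[i] : '.');
    puts("|");
}

/* Dump everything on fd after the first `skip` bytes.
   Returns 0 on success, -1 on an I/O error. */
int hexdump_fd(int fd, size_t skip)
{
    unsigned char buf[16];
    ssize_t n;
    if (skip_bytes(fd, skip) < 0)
        return -1;
    while ((n = read_full(fd, buf, sizeof buf)) > 0)
        print_hex_line(buf, (size_t)n);
    return n < 0 ? -1 : 0;
}
```

Calling hexdump_fd(STDIN_FILENO, 1) from main would start the dump at the second byte; the return value lets the caller report an I/O error.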

Related

Why does the file created using a text editor contain one byte more than expected?

I created a file like this
a v
bb
e
And I didn't press enter after typing e in the last line.
So there are four characters in the first line 'a',' ','v','\n'.
There are three characters in the second line 'b','b','\n'.
And there is one character in the last line 'e'.
So there are 8 characters in total in this file. But when I count the characters using the following C program:
#include<stdio.h>
/* count characters in input; 1st version */
int main()
{
long nc;
nc = 0;
while (getchar() != EOF) {
++nc;
}
printf("%ld\n", nc);
return 0;
}
It gave me 9. Even when I use wc command to count, it is still 9. Why?
There's reasoning in favor of having all lines terminated with a newline character:
Why should text files end with a newline?
And there are text editors that are set up to add a trailing newline character automatically (if not already there):
How to stop Gedit, Gvim, Vim, Nano from adding End-of-File newline char?
Probably that is why you observe the unexpected file size.
To inspect the actual content of such files, I like to use hexdump:
$ hexdump -C test
00000000 61 20 76 0a 62 62 0a 65 0a |a v.bb.e.|
00000009
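The same check can be done from C. The following is a minimal sketch (the function name count_bytes is invented here) that counts the bytes on a stream and also reports whether the last one is a newline, which is what the hexdump above shows for this file.

```c
#include <stdio.h>

/* Counts the bytes on `in` and reports whether the final byte is '\n'.
   Returns the byte count; *ends_with_newline is set to 1 or 0. */
long count_bytes(FILE *in, int *ends_with_newline)
{
    long nc = 0;
    int c, last = EOF;              /* int, so EOF stays distinguishable */
    while ((c = getc(in)) != EOF) {
        last = c;
        ++nc;
    }
    *ends_with_newline = (last == '\n');
    return nc;
}
```

Called as count_bytes(stdin, &flag), it should report 9 bytes and a trailing newline for a file saved by an editor that appends one.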

Loop over stdin in C

I am trying to loop over stdin, but since we cannot know the length of stdin, I am not sure how to create the loop or what condition to use in it.
Basically my program will be piped in some data. Each line in the data contains 10 characters of data, followed by a line break (So 11 characters per line)
In pseudocode, what I am trying to accomplish is:
while stdin has data:
read 11 characters from stdin
save 10 of those characters in an array
run some code processing the data
endwhile
Each loop of the while loop rewrites the data into the same 10 bytes of data.
So far, I have figured out that
char temp[11];
read(0,temp,10);
temp[10]='\0';
printf("%s",temp);
will take the first 10 characters from stdin and save them. The printf will later be replaced by more code that analyzes the data. But I don't know how to encapsulate this functionality in a loop that will process all my data from stdin.
I have tried
while(!feof(stdin)){
char temp[11];
read(0,temp,11);
temp[10]='\0';
printf("%s\n",temp);
}
but when this gets to the last line, it keeps repeatedly printing it out without terminating. Any guidance would be appreciated.
Since you mention line breaks, I assume your data is text. Here is one way, when you know the line lengths. fgets reads the newline too, but that is easily ignored. Instead of trying to use feof I simply check the return value from fgets.
#include <stdio.h>
int main(void) {
char str[16];
int i;
while(fgets(str, sizeof str, stdin) != NULL) { // reads newline too
i = 0;
while (str[i] >= ' ') { // shortcut to testing newline and nul
printf("%d ", str[i]); // print char value
i++;
}
printf ("\n");
str[i] = '\0'; // truncate the array
}
return 0;
}
Program session (ended by Ctrl-Z in Windows console, Ctrl-D in Linux)
qwertyuiop
113 119 101 114 116 121 117 105 111 112
asdfghjkl;
97 115 100 102 103 104 106 107 108 59
zxcvbnm,./
122 120 99 118 98 110 109 44 46 47
^Z
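The loop in the question never terminates because feof(stdin) only reflects the state of the stdio FILE stream, while read() works on the file descriptor directly and never sets that flag. If you want to stay with read(), the loop condition has to come from its return value instead. A rough sketch, assuming records of exactly 10 data bytes followed by a newline (read_record is a made-up helper name):

```c
#include <unistd.h>

/* Reads one fixed-width record (10 data bytes plus '\n') from fd into out,
   which must hold at least 11 bytes. Returns 1 on success, 0 at end of
   input (including a truncated final record), -1 on an I/O error. */
int read_record(int fd, char out[11])
{
    char line[11];
    size_t got = 0;
    while (got < sizeof line) {     /* a pipe may deliver fewer bytes per call */
        ssize_t r = read(fd, line + got, sizeof line - got);
        if (r < 0)
            return -1;
        if (r == 0)
            return 0;
        got += (size_t)r;
    }
    for (int i = 0; i < 10; i++)    /* keep the data, drop the newline */
        out[i] = line[i];
    out[10] = '\0';
    return 1;
}

/* Typical use:
       char rec[11];
       while (read_record(STDIN_FILENO, rec) == 1) {
           ... process rec ...
       }
*/
```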

Wrong fprintf output in c

So here is the code that I am attempting to get to work:
char* inFile = "input.txt";
FILE *out = fopen("output.txt", "a+");
int i = 0;
while(i < 5){
int countFound = findWord(inFile, keyWord[i]);//returns count of keywords in given file
fprintf(out, "%s: %d\n", keyWord[i], countFound);
i++;
}
fclose(out);
The output of this code is:
youshouldsee1
: 3
youshouldsee2
: 3
youshouldsee3
: 3
youshouldsee4
: 3
youshouldsee5: 1
Expected output:
youshouldsee1: 3
youshouldsee2: 3
youshouldsee3: 3
youshouldsee4: 3
youshouldsee5: 1
I don't really understand why the output is like that, shouldn't it print the string and the int then a new line? Also note that there is not a newline after the last line and there should be. I did some testing and I noticed that if I changed the fprintf statement to fprintf(out, "%s\n", keyWord[i]); the output is:
youshouldsee1
youshouldsee2
youshouldsee3
youshouldsee4
youshouldsee5
Which is formatted much better. Again note that there is not a newline after the last line and there should be.
I noticed that while doing this with just printf statements I get the exact same problem, but the output is slightly more messed up.
Does anybody know what causes this specific issue? Much appreciated.
The array keyWord[] is a double pointer, I'm not sure if that makes a difference or not, but I thought that I would mention it. It is declared like so char** keyWord;. And it was created as follows:
char *tempWord = "something";
keyWords[x] = strdup(tempWord);
That could be totally irrelevant but I thought it was best to mention it.
You likely have some carriage return characters ('\r' or 0x0D) and/or backspace characters ('\b' or 0x08) in your keyWord[i] strings.
CRs and backspaces don't print ordinarily to terminals and instead move the terminal cursor—CRs move the cursor to the beginning of the current line, and backspaces move the cursor backwards one character. When subsequent characters are printed, they overwrite what was there before. So this code
printf("foobar\rbaz\n");
results in this output
bazbar
and this code
printf("foo\bbar\n");
results in this output
fobar
If you look at the raw bytes of the output without printing it to the terminal, you should see the CRs and backspaces plain as day. I like to use the program hexdump(1) to do that. For example:
$ echo -e 'foobar\rbaz'
bazbar
$ echo -e 'foobar\rbaz' | hexdump -C
00000000 66 6f 6f 62 61 72 0d 62 61 7a 0a |foobar.baz.|
0000000b
So I'd suggest looking at the raw data in your program's output and find out where the pesky characters are, and then figure out how to get rid of them.
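If the stray characters turn out to be trailing carriage returns or newlines (for example because the keywords were read from a file with Windows line endings, which is only a guess here), one common way to get rid of them is to cut each string at the first CR or LF before storing it. A minimal sketch, with chomp as a made-up helper name:

```c
#include <string.h>

/* Truncates s at the first '\r' or '\n', which removes a trailing
   CRLF or LF. Strings without either character are left unchanged. */
void chomp(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}
```

Calling chomp() on a writable copy of each line before the strdup() would keep the terminators out of keyWord[i]. It would not remove backspaces or control characters in the middle of a string.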

fwrite fails to write value 10 (0x0A)

I was experimenting on creating BMP files from scratch, when I found a weird bug I could not explain. I isolated the bug in this minimalist program:
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows, the end-of-line marker is CR LF (0D 0A), whereas on Unix it is a single LF (0A). In text mode, your C library replaces every 0A you write with 0D 0A.
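For completeness, here is a small sketch of the binary-mode write with the return values checked. The function name write_binary is invented for the example. With "wb" the position reported by ftell should equal the number of bytes passed to fwrite on any platform.

```c
#include <stdio.h>

/* Writes len bytes to path in binary mode, so no newline translation occurs.
   Returns the final file position, or -1 on failure. */
long write_binary(const char *path, const unsigned char *data, size_t len)
{
    FILE *fp = fopen(path, "wb");   /* "b" disables CR LF translation */
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, fp);
    long pos = ftell(fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return pos;
}
```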

Why does a file read after calling lseek always return 0?

I cannot understand why a call to read after an lseek returns 0 number of bytes read.
//A function to find the next note for a given userID;
//returns -1 if at the end of file is reached;
//otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
int note_uid = -1;
unsigned char byte;
int length;
while(note_uid != user_uid) { // Loop until a note for user_uid is found.
if(read(fd, &note_uid, 4) != 4) // Read the uid data.
return -1; // If 4 bytes aren't read, return end of file code.
if(read(fd, &byte, 1) != 1) // Read the newline separator.
return -1;
byte = length = 0;
while(byte != '\n') { // Figure out how many bytes to the end of line.
if(read(fd, &byte, 1) != 1) // Read a single byte.
return -1; // If byte isn't read, return end of file code.
//printf("%x ", byte);
length++;
}
}
long cur_position = lseek(fd, length * -1, SEEK_CUR ); // Rewind file reading by length bytes.
printf("cur_position: %i\n", cur_position);
// this is debug
byte = 0;
int num_byte = read(fd, &byte, 1);
printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);
return length;
}
The value of the variable length is 34 when the code exits the outer while loop, and the above code prints cur_position 5 (so there are definitely at least 34 bytes left after the lseek call returns), but num_byte, the value returned by read, is always 0 even though there are still more bytes to read.
Does anyone know why num_byte is always 0? If it is a mistake in my code, I am not seeing what it is.
Just for information, the above code was run on the following machine
$ uname -srvpio
Linux 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 GNU/Linux
Update:
I upload the full code here
This is the content of file that I try to read
$ sudo hexdump -C /var/notes
00000000 e8 03 00 00 0a 74 68 69 73 20 69 73 20 61 20 74 |.....this is a t|
00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser|
00000020 20 6e 6f 74 65 73 0a | notes.|
00000027
$
If length is an unsigned type smaller than off_t (for instance, size_t on a 32-bit machine), then length*-1 is going to be a huge value (somewhere around 4GB perhaps). This could be the problem. Storing the result of lseek into a long (again, if it's 32-bit) will apply an implementation-defined conversion, probably truncation, that leaves you with a small value again.
I see that your machine is 64-bit, but perhaps you're running a 32-bit userspace?
In any case, why not run your program under strace to see what system calls it's making? That will almost surely clear the issue up quickly.
I finally found the issue!!! I have to add #include <unistd.h> in order to use the correct lseek(). However, I'm not sure why the code compiled without unistd.h and then behaved unexpectedly. I thought that without the prototype of a function it shouldn't compile at all.
The code comes from Hacking: The Art of Exploitation, 2nd Edition by Jon Erickson, and I have verified that the book's listing has no #include <unistd.h>.
With the initial variable length set to 34, the above code would
produce cur_position 5 (so there are definitely at least 34 bytes
after the lseek function returns)
This is not necessarily the case, as one can seek beyond the end of a file without getting any error.
See the excerpt from lseek()'s man page below:
The lseek() function allows the file offset to be set beyond the
end of the file (but this does not change the size of the file).
So lseek() can very well return a value that still points beyond the end of the file, and read()ing from that position will return 0 (as it's beyond end-of-file).
Also, I agree with R.. that taking more care to use the correct types (the types used by the functions you call) isn't a bad idea.
Update: you should also take care to include all the headers for the system functions you call. To check for this, I strongly recommend using gcc's -Wall option to switch on compiler warnings; they are free ... ;-)
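As for why it compiled at all: when no prototype is visible, older C compilers (and gcc in its older default modes, with only a warning) treat lseek as an implicitly declared function that returns int, so the 64-bit off_t argument and return value are likely passed and received incorrectly. A minimal sketch of the rewind step with the header included and the types made explicit (rewind_by is a hypothetical helper, not code from the book):

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>     /* declares lseek() with its off_t prototype */

/* Moves the offset of fd back by `length` bytes from the current position.
   Returns the new offset, or (off_t)-1 on failure. */
off_t rewind_by(int fd, int length)
{
    /* Convert to off_t before negating so the arithmetic is done in the
       type lseek expects. */
    off_t pos = lseek(fd, -(off_t)length, SEEK_CUR);
    if (pos == (off_t)-1)
        perror("lseek");
    return pos;
}
```

When printing the result, cast it to a type that matches the format, for example printf("%lld\n", (long long)pos).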

Resources