Getting random characters in terminal with C and Python

For some reason, when I open a file and read it byte by byte in Python and C and try to print the result, I get random characters/data mixed in.
For example, when I read the first 8 bytes of a PNG image, as in the following example:
/* Test file reading and see if there's random data */
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#define PNG_BYTES_TO_CHECK 8
int
main(void)
{
char fname[] = "../images/2.png";
FILE *fp = fopen(fname, "rb");
if (fp == NULL) abort();
char *buffer = (char *)malloc(PNG_BYTES_TO_CHECK);
if (fread(buffer, 1, PNG_BYTES_TO_CHECK, fp) != PNG_BYTES_TO_CHECK)
abort();
unsigned i;
for (i = 0; i < PNG_BYTES_TO_CHECK; ++i) printf("%x ", buffer[i]);
printf("\n");
free(buffer); fclose(fp);
return 1;
}
I get this garbage to stdout:
ffffff89 50 4e 47 d a 1a a
But when I open the file with a hex editor, the bytes are perfectly fine: 89 50 4E 47 0D 0A 1A 0A, a valid PNG signature.
Any ideas as to what may cause this? I don't have an example for Python, but I recall a few days ago I was getting repetitive mumbo jumbo while working with files at the byte level and printing stuff as well.

The PNG spec states that a PNG file should always start with the bytes 137 80 78 71 13 10 26 10. The maximum value for a signed byte is 127, so the first byte's value overflows and becomes -119 (if this is confusing, look up two's complement, the usual way negative numbers are represented). You are then printing it as an unsigned hexadecimal integer. To do so, the signed byte is promoted to an int. Again because of two's complement, a 4-byte integer whose value is -119 has the binary representation 11111111111111111111111110001001. %x is the format specifier for an unsigned hexadecimal value; because printf thinks the value you gave it is unsigned, it does not interpret that bit pattern as a negative number. Convert 11111111111111111111111110001001 to hex and you get ffffff89.
tl;dr: there's nothing wrong with the file. You just forgot to make your bytes unsigned.

Related

Reading 2 bytes from a file and converting to an int gives the wrong output

Basically I have a text file that contains a number. I changed the number to 0 to start and then I read 2 bytes from the file (because an int is 2 bytes) and I converted it to an int. I then print the results, however it's printing out weird results.
So when I have 0 it prints out 2608 for some reason.
I'm going off a document that says I need to read through a file where the offset of bytes 0 to 1 represents a number. So this is why I'm reading bytes instead of characters...
I imagine the issue is due to reading bytes instead of reading by characters, so if this is the case can you please explain why it would make a difference?
Here is my code:
void readFile(FILE *file) {
char buf[2];
int numRecords;
fread(buf, 1, 2, file);
numRecords = buf[0] | buf[1] << 8;
printf("numRecords = %d\n", numRecords);
}
I'm not really sure what the buf[0] | buf[1] << 8 does, but I got it from another question... So I suppose that could be the issue as well.
The number 0 in your text file is actually stored as the single byte 0x30; 0x30 is what gets loaded into buf[0]. (In the ASCII table, the character 0 is represented by 0x30.)
buf[1] then holds the next byte from the file, in this case 0x0a, the newline after the digit. (0x0a is \n in the ASCII table.)
Combining those two by buf[0] | buf[1] << 8 results in 0x0a30 which is 2608 in decimal. Note that << is the bit-wise left shift operator.
(Also, the size of int type is 4-byte in many systems. You should check that out.)
You can directly read into integer
fread(&numRecords, sizeof(numRecords), 1, file);
You need to check sizeof(int) on your system; if it's four bytes, you need to declare numRecords as short int rather than int.

How to save uint64_t bytes to file on C?

How to save uint64_t bytes to a file in plain C?
As I suppose, the output file will be 8 bytes long?
fwrite(&sixty_four_bit_var, 8, 1, file_pointer);
EDIT. Since my compiler does not have uint64_t I have shown two ways to save a 64-bit value to file, by using unsigned long long. The first example writes it in (hex) text format, the second in binary.
Note that unsigned long long may be more than 64 bits on some systems.
#include <stdio.h>
int main(void) {
FILE *fout;
unsigned long long my64 = 0x1234567887654321;
// store as text
fout = fopen("myfile.txt", "wt");
fprintf(fout, "%llX\n", my64);
fclose(fout);
// store as bytes
fout = fopen("myfile.bin", "wb");
fwrite(&my64, sizeof(my64), 1, fout);
fclose(fout);
return 0;
}
Content of myfile.txt (hex dump)
31 32 33 34 35 36 37 38 38 37 36 35 34 33 32 31 1234567887654321
0D 0A ..
Content of myfile.bin (hex dump) (little-endian)
21 43 65 87 78 56 34 12 !Ce‡xV4.
I recommend fprintf-ing it as a string. Easier to verify (cat or open in a text editor) and no hassle with endianness. Check inttypes.h for the proper format specifier (PRIu64).
Read back with fscanf, using SCNu64 as the format specifier.
That will also work if the data type is not aligned to the first position. While improbable for uint64_t, consider a char of 1 octet that does not start at offset 0 for some reason (a big-endian CPU with no 8-bit load/store, e.g.). This would be allowed by the standard.
However, if you really want to store it as 8-bit values, use the following:
uint64_t value = input_value;
for ( size_t i = 0 ; i < 8 ; i++ ) {
fputc(value & 0xFF, filep);
value >>= 8;
}
That will store the value in little-endian format. Note that this is not guaranteed to work for signed types because of the right shift (though it very likely will).
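Reading the value back is the mirror image of that loop (a sketch, assuming the same little-endian byte order and that 8 bytes are actually available in the stream):

```c
#include <stdint.h>
#include <stdio.h>

/* Reassemble a uint64_t that was written low byte first.
   Assumes 8 bytes are available; fgetc's EOF is not handled here. */
static uint64_t read_u64_le(FILE *filep)
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)fgetc(filep) << (8 * i);
    return value;
}
```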
For more complex structures, you might use a proper format like JSON with a library.

When you divide a hexadecimal, what do you get?

I'm trying to create a hash function which stores hexadecimals, but I'm not too sure what the hash function should be. I get the addresses, which are hexadecimals, from a text file and then convert them into unsigned long long int. I'm trying to create a hash table of size 1000, so what exactly do I get when I divide these long long ints? I don't exactly understand this.
The input file contains lines like this:
0x7f1a91026b00
0x7f1a91026b03
0x7f1a91027130
0x7f1a91027131
0x7f1a91027134
0x7f1a91027136
Here's my code so far (I have not created the hash table at the moment since I don't have the hash function)
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (int argc, char **argv){
if(argc!=2){
printf("error\n");
return 0;
// if there is no input then print an error
}
FILE *file = fopen(argv[1], "r"); // open file
if (!file){
printf("error\n");
return 0;
}
char linestring[BUFSIZ];
while (fgets(linestring, sizeof(linestring), file)) // reads the entire file until it hits Null
{
char *endptr;
unsigned long long key = strtoull(linestring, &endptr, 16);
printf("%s\n", linestring);
}
fclose(file);
}
Hexadecimal, Decimal and Octal are simply 3 different ways of printing to the screen the same number.
Let's look at the number 100. We could print it in decimal as 100. Similarly, we could print it in octal as 0144. And we could print it in hexadecimal as 0x64.
But all three of those represent the same number. So the result of 100 / 3, 0144 / 3, and 0x64 / 3 are all identical.
Onto your real question...
You have a number x. You'd like to restrict x to the range [0, 1000), since your table has 1000 buckets. The easiest way to do that is:
unsigned long long x;
unsigned long long y = x % 1000;
Now y will be within the range [0, 1000). Conceptually, the modulo keeps subtracting 1000 from x until the result is less than 1000.
So if you want a hash table of size 1000, you take the modulo by 0x3E8; for example, 0x7f1a91026b00 % 0x3e8 = 0x20, which represents 32 in decimal.
Hex 0x3e8 = Dec 1000.

Reading from a text file in C

I am having trouble reading a specific integer from a file and I am not sure why. First I read through the entire file to find out how big it is, and then I reset the pointer to the beginning. I then read 3 16-byte blocks of data, then 1 20-byte block, and then I would like to read 1 byte at the end as an integer. However, I had to write into the file as a character, but I do not think that should be a problem.
My issue is that when I read it out of the file, instead of the integer value 15 I get 49. I checked the ASCII table and it is not the hex or octal value of 1 or 5. I am thoroughly confused because my read statement is read(inF, pad, 1), which I believe is right. I do know that an integer variable is 4 bytes; however, there is only one byte of data left in the file, so I read only that last byte.
My code is reproduced below (it seems like a lot, but I don't think it is):
#include<math.h>
#include<stdio.h>
#include<string.h>
#include <fcntl.h>
#include <unistd.h> /* for read() */
int main(int argc, char** argv)
{
char x;
int y;
int bytes = 0;
int num = 0;
int count = 0;
num = open ("a_file", O_RDONLY);
bytes = read(num, y, 1);
printf("y %d\n", y);
return 0;
}
To sum up my question, how come when I read the byte that stores 15 from the text file, I can't view it as 15 from the integer representation?
Any help would be very appreciated.
Thanks!
You're reading a first byte of int (4 bytes), and then print it as a whole. If you want to read by one byte, you need also to use it as one byte, like this:
char temp; // one-byte signed integer
read(fd, &temp, 1); // read the integer from file
printf("%hhd\n", temp); // print one-byte signed integer
Or, you can use regular int:
int temp; // four byte signed integer
read(fd, &temp, 4); // read it from file
printf("%d\n", temp); // print four-byte signed integer
Note that this will work only on platforms with 32-bit integers, and also depends on platform's byte order.
What you're doing is:
int temp; // four byte signed integer
read(fd, &temp, 1); // read one byte from file into the integer
// now first byte of four is from the file,
// and the other three contain undefined garbage
printf("%d\n", temp); // print contents of mostly uninitialized memory
The read function system call has a declaration like:
ssize_t read(int fd, void* buf, size_t count);
So, you should pass address of the int variable in which you want to read the stuff.
i.e use
bytes = read(num, &y, 1);
You can see all the details of file I/O in C from that link
Based on the read call, I believe it is reading the file's byte into the first of the integer's 4 bytes, and depending on byte order that may not be the low-order byte. Whatever was in pad's other 3 bytes will still be there, even if you initialized it to zero (in which case they hold zeros). I would read one byte into a char and then cast it to an integer (if you need a 4-byte integer for some reason), as shown below:
/* declare at the top of the program */
char temp;
/* Note line to replace read(inF,pad,1) */
read(inF,&temp,1);
/* Added to cast the value read in to an integer high order bit may be propagated to make a negative number */
pad = (int) temp;
/* Mask off the high order bits */
pad &= 0x000000FF;
Otherwise, you could change your declaration to be an unsigned char which would take care of the other 3 bytes.

Why does C print my hex values incorrectly?

So I'm a bit of a newbie to C and I am curious to figure out why I am getting this unusual behavior.
I am reading a file 16 bits at a time and just printing them out as follows.
#include <stdio.h>
#define endian(hex) (((hex & 0x00ff) << 8) + ((hex & 0xff00) >> 8))
int main(int argc, char *argv[])
{
const int SIZE = 2;
const int NMEMB = 1;
FILE *ifp; //input file pointer
FILE *ofp; // output file pointer
int i;
short hex;
for (i = 2; i < argc; i++)
{
// Reads the header and stores the bits
ifp = fopen(argv[i], "r");
if (!ifp) return 1;
while (fread(&hex, SIZE, NMEMB, ifp))
{
printf("\n%x", hex);
printf("\n%x", endian(hex)); // this prints what I expect
printf("\n%x", hex);
hex = endian(hex);
printf("\n%x", hex);
}
}
}
The results look something like this:
ffffdeca
cade // expected
ffffdeca
ffffcade
0
0 // expected
0
0
600
6 // expected
600
6
Can anyone explain to me why the last line in each block doesn't print the same value as the second?
The placeholder %x in the format string interprets the corresponding parameter as unsigned int.
To print the parameter as short, add a length modifier h to the placeholder:
printf("%hx", hex);
http://en.wikipedia.org/wiki/Printf_format_string#Format_placeholders
This is due to integer type-promotion.
Your shorts are being implicitly promoted to int. (which is 32-bits here) So these are sign-extension promotions in this case.
Therefore, your printf() is printing out the hexadecimal digits of the full 32-bit int.
When your short value is negative, the sign-extension will fill the top 16 bits with ones, thus you get ffffcade rather than cade.
The reason why this line:
printf("\n%x", endian(hex));
seems to work is because your macro is implicitly getting rid of the upper 16-bits.
You have implicitly declared hex as a signed value (to make it unsigned, write unsigned short hex), so any value above 0x7FFF is considered negative. When printf displays it as a 32-bit int value, it is sign-extended with ones, causing the leading Fs. When you print the return value of endian before truncating it by assigning it to hex, the full 32 bits are available and printed correctly.
