Swapping bytes of unsigned short integer - c

I have a partially working function which involves writing to a file.
I have an array, arr, of type unsigned short int and each element must be written to a file in binary format.
My initial solution was:
for(i = 0; i < ROWS; i++) {
    fwrite(&arr[i], 1, sizeof(unsigned short int), source);
}
The code above works when writing unsigned short ints to the file. Also, source is a pointer to the file which is being written to in binary format. However, I need to swap the bytes, and am having trouble doing so. Essentially, what is written to the file as abcd should be cdab.
My attempt:
unsigned short int toWrite;
unsigned short int swapped;
for(i = 0; i < ROWS; i++) {
    toWrite = &arr[i];
    swapped = (toWrite >> 8) | (toWrite << 8);
    fwrite(swapped, 1, sizeof(unsigned short int), source);
}
However, I get a segmentation fault (core dump) as a result. I read and used the upvoted answer to this question - convert big endian to little endian in C [without using provided func] - but it doesn't seem to be working. Any suggestions? Thanks!

Your attempt is very wrong (the answers you copied from are okay; the problem isn't in the swapping itself).
First you're taking the address of the value to swap, then you're passing the value instead of the address to write. It should be:
unsigned short int toWrite;
unsigned short int swapped;
for(i = 0; i < ROWS; i++) {
    toWrite = arr[i];
    swapped = (toWrite >> 8) | (toWrite << 8); // that part is OK
    fwrite(&swapped, 1, sizeof(unsigned short int), source);
}
I'm positive that the compiler warned you about this. Warnings are useful.

Welcome to the world of non-portable binary formats.
Swapping the bytes is an error-prone hack because it makes the program non-portable: whether to swap the values or not depends on the endianness of the system where you compile and run the program. Your program has a very simple bug: you pass the value of swapped instead of its address. You can fix it with:
fwrite(&swapped, sizeof(swapped), 1, source);
Yet a better solution for your problem is to handle endianness explicitly in your program. This is a portable solution to write the numbers in big endian order:
/* writing 16 bit unsigned integers in big endian order */
for (i = 0; i < ROWS; i++) {
    putc(arr[i] >> 8, source);
    putc(arr[i] & 255, source);
}
This is the alternative version if you are expected to write in little endian order:
/* writing 16 bit unsigned integers in little endian order */
for (i = 0; i < ROWS; i++) {
    putc(arr[i] & 255, source);
    putc(arr[i] >> 8, source);
}
Note that it is somewhat confusing to name the stream variable for an output file "source".

Related

Parsing LUKS Headers to Read Integer Value Fields Correctly

I'm trying to parse a LUKS header by reading the raw data off a device with a LUKS volume installed on it, following the specification given here: https://gitlab.com/cryptsetup/cryptsetup/wikis/LUKS-standard/on-disk-format.pdf, specifically page 6 with the table showing the data that resides at each location, what type of data it is, and how many of that type make up a single value.
For instance, the hash-spec string resides at location 72 and contains 32 char-type bytes. Collecting this into an array and printing the result is simple. However, as detailed in the table, numerical values such as the version or the key-bytes (which is supposedly the length of the key) span multiple integers: the version has two unsigned shorts and the key-bytes has four unsigned ints to represent their values.
I'm somewhat confused by this, and how I should go about interpreting it to retrieve the correct value. I wrote a messy test script to scan through a usb stick encrypted with luks and display what's retrieved from reading these fields.
version:
256
25953
hash spec:
sha256
key bytes (length):
1073741824
3303950314
1405855026
1284286704
This is very confusing, as again the hash-spec field holds an expected value, just the string of characters itself, but how am I supposed to interpret either the version or key-bytes fields? These both seem like completely random numbers, and from what I can tell there isn't anything in the spec that explains this. I figured this might be a problem with how I'm actually writing the code to do this; below is the script used to display these values:
#include <stdio.h>
int main() {
    unsigned short data[100];
    unsigned char data2[100];
    unsigned int data3[100];
    int i;
    FILE *fp;
    fp = fopen("/dev/sdd1", "rb");
    if (fp) {
        fseek(fp, 6, SEEK_SET);
        for (i=0; i < 2; i++) {
            fread(&data[i], sizeof(short), 1, fp);
        }
        fseek(fp, 72, SEEK_SET);
        for (i=0; i < 32; i++) {
            fread(&data2[i], sizeof(char), 1, fp);
        }
        fseek(fp, 108, SEEK_SET);
        for (i=0; i < 4; i++) {
            fread(&data3[i], sizeof(int), 1, fp);
        }
        printf("version:\n");
        for (i=0; i < 2; i++) {
            printf("%u\n", data[i]);
        }
        printf("hash spec:\n");
        for (i=0; i < 32; i++) {
            printf("%c", data2[i]);
        }
        printf("\n");
        printf("key bytes (length):\n");
        for(i=0; i < 4; i++) {
            printf("%u\n", data3[i]);
        }
        fclose(fp);
    }
    else {
        printf("error\n");
    }
    return 0;
}
Any help would be appreciated, thanks.
The problem is the data you're reading is big-endian, but the computer you're running on is little-endian. For example, the bytes you're printing out as 1073741824 are 0x00, 0x00, 0x00, and 0x40, in that order. As a big-endian number, that's 0x00000040, or 64. As a little-endian number, as is usually used on x86 systems, that's 0x40000000, an absurdly long length.
Fortunately, there are functions that can convert these values for you. To convert from a 32-bit big-endian (network byte order) to your system's (host byte order) format, use ntohl, and for a 16-bit integer, use ntohs.
So when you read the data for the 16-bit integers, it would look like this:
for (i=0; i < 2; i++) {
    fread(&data[i], sizeof(short), 1, fp);
    data[i] = ntohs(data[i]);
}
As a side note, if you're going to be working with values of fixed sizes, it's a little more portable and easier to understand if you do #include <stdint.h> and then use the types uint8_t, uint16_t, and uint32_t. These will always be the right size, since the built-in types can vary between platforms.
If you're interested in reading more about endianness, Wikipedia has an article on it.

Another invert char array in C

That code will run on a payment device (POS). I have to use legacy C (not C# or C++) for that purpose.
I am trying to prepare a simple Mifare card read/write software data. Below document is my reference and I am trying to achieve what is on page 9, 8.6.2.1 Value blocks explains.
http://www.nxp.com/documents/data_sheet/MF1S50YYX_V1.pdf
I just know the very basics of C. All my searches on the Internet have failed. According to the document:
1- There is an integer variable with the value 1234567.
2- There is a char array[4] which should hold the hex of the above value, which is 0x0012D687.
3- I am supposed to invert that char array[4] and reach the value 0xFFED2978.
I need to do some other things, but I am stuck on number 3 above. What I tried last is:
int value = 1234567;
char valuebuffer[4];
char invertbuffer[4];
sprintf(valuebuffer, "%04x", value);
for(i = 0; i < sizeof(valuebuffer); i++ )
{
    invertbuffer[i] ^= valuebuffer[i];
}
When I print, I read some other value in invertbuffer and not 0xFFED2978
Seems like you're making it more complicated than it needs to be. You can do the binary inversion on the int variable rather than messing around with individual bytes.
int value = 1234567;
int inverted= ~ value;
printf("%x\n",value);
printf("%x\n",inverted);
gives you output of
12d687
ffed2978
First of all, you must use the types from stdint.h and not char, because the latter has implementation-defined signedness and is therefore overall unsuitable for holding raw binary data.
With that sorted, you can use a union for maximum flexibility:
#include <stdint.h>
#include <stdio.h>

typedef union
{
    uint32_t u32;
    uint8_t  u8 [4];
} uint32_union_t;

int main (void)
{
    uint32_union_t x;

    x.u32 = 1234567;
    for(size_t i=0; i<4; i++)
    {
        printf("%X ", x.u8[i]);
    }
    printf("\n");

    x.u32 = ~x.u32;
    for(size_t i=0; i<4; i++)
    {
        printf("%X ", x.u8[i]);
    }
    printf("\n");
}
Notably, the access order of the u8 bytes is endianness-dependent. This can actually be handy when dealing with something like RFID, where the wire format doesn't necessarily have the same endianness as your MCU.

Understanding And Getting info of Bitmap in C

I am having a hard time understanding and parsing the info data present in a bitmap image. To better understand I read the following tutorial, Raster Data.
Now, The code present there is as follows, (Greyscale 8bit color value)
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/*-------STRUCTURES---------*/
typedef struct {int rows; int cols; unsigned char* data;} sImage;

/*-------PROTOTYPES---------*/
long getImageInfo(FILE*, long, int);

int main(int argc, char* argv[])
{
    FILE *bmpInput, *rasterOutput;
    sImage originalImage;
    unsigned char someChar;
    unsigned char* pChar;
    int nColors;    /* BMP number of colors */
    long fileSize;  /* BMP file size */
    int vectorSize; /* BMP vector size */
    int r, c;       /* r = rows, c = cols */

    /* initialize pointer */
    someChar = '0';
    pChar = &someChar;

    if(argc < 2)
    {
        printf("Usage: %s bmpInput.bmp\n", argv[0]);
        //end the execution
        exit(0);
    }
    printf("Reading filename %s\n", argv[1]);

    /*--------READ INPUT FILE------------*/
    bmpInput = fopen(argv[1], "rb");
    //fseek(bmpInput, 0L, SEEK_END);

    /*--------DECLARE OUTPUT TEXT FILE--------*/
    rasterOutput = fopen("data.txt", "w");

    /*--------GET BMP DATA---------------*/
    originalImage.cols = (int)getImageInfo(bmpInput, 18, 4);
    originalImage.rows = (int)getImageInfo(bmpInput, 22, 4);
    fileSize = getImageInfo(bmpInput, 2, 4);
    nColors = getImageInfo(bmpInput, 46, 4);
    vectorSize = fileSize - (14 + 40 + 4*nColors);

    /*-------PRINT DATA TO SCREEN-------------*/
    printf("Width: %d\n", originalImage.cols);
    printf("Height: %d\n", originalImage.rows);
    printf("File size: %ld\n", fileSize);
    printf("# Colors: %d\n", nColors);
    printf("Vector size: %d\n", vectorSize);

    /*----START AT BEGINNING OF RASTER DATA-----*/
    fseek(bmpInput, (54 + 4*nColors), SEEK_SET);

    /*----------READ RASTER DATA----------*/
    for(r=0; r<=originalImage.rows - 1; r++)
    {
        for(c=0; c<=originalImage.cols - 1; c++)
        {
            /*-----read data and print in (row,column) form----*/
            fread(pChar, sizeof(char), 1, bmpInput);
            fprintf(rasterOutput, "(%d, %d) = %d\n", r, c, *pChar);
        }
    }
    fclose(bmpInput);
    fclose(rasterOutput);
}

/*----------GET IMAGE INFO SUBPROGRAM--------------*/
long getImageInfo(FILE* inputFile, long offset, int numberOfChars)
{
    unsigned char *ptrC;
    long value = 0L;
    unsigned char dummy;
    int i;

    dummy = '0';
    ptrC = &dummy;

    fseek(inputFile, offset, SEEK_SET);
    for(i=1; i<=numberOfChars; i++)
    {
        fread(ptrC, sizeof(char), 1, inputFile);
        /* calculate value based on adding bytes */
        value = (long)(value + (*ptrC)*(pow(256, (i-1))));
    }
    return(value);
} /* end of getImageInfo */
What I am not understanding:-
I am unable to understand the 'GET IMAGE INFO SUBPROGRAM' part where the code is trying to get the image info like number of rows, columns, etc. Why is this info stored over 4 bytes, and what is the use of the value = (long)(value + (*ptrC)*(pow(256, (i-1)))); instruction?
Why is unsigned char dummy = '0' created and then ptrC = &dummy assigned?
Why can't we get the number of rows in an image by reading just 1 byte of data, the way we read the greyscale value at a particular row and column?
Why are we using unsigned char to store the byte? Isn't there some other data type, like int or long, that we could use effectively here?
Please help me understand these doubts (confusions!?) I am having, and forgive me if they sound noobish.
Thank you.
I would say the tutorial is quite bad in some ways and your problems to understand it are not always due to being a beginner.
I am unable to understand the 'GET IMAGE INFO SUBPROGRAM' part where the code is trying to get the image info like number of rows, columns, etc. Why is this info stored over 4 bytes, and what is the use of the value = (long)(value + (*ptrC)*(pow(256, (i-1)))); instruction?
The reason to store over 4 bytes is to allow the image to be sized between 0 and 2^32-1 high and wide. If we used just one byte, we could only have images sized 0..255 and with 2 bytes 0..65535.
The strange value = (long)(value + (*ptrC)*(pow(256, (i-1)))); is something I've never seen before. It's used to convert bytes into a long so that it works with any endianness. The idea is to weight each byte *ptrC by a power of 256, i.e. multiplying the first byte by 1, the next by 256, the next by 65536, etc.
A much more readable way would be to use shifts, e.g. value = value + ((long)(*ptrC) << 8*(i-1));. Even better would be to read the bytes from the highest one down and use value = (value << 8) + *ptrC; (note the parentheses: << binds less tightly than +). In my eyes a lot better, but when the bytes come in a different order, it is not always so simple.
A simple rewrite to be much easier to understand would be
long getImageInfo(FILE* inputFile, long offset, int numberOfChars)
{
    unsigned char ptrC;
    long value = 0L;
    int i;

    fseek(inputFile, offset, SEEK_SET);
    for(i=0; i<numberOfChars; i++) // Start with zero to make the code simpler
    {
        fread(&ptrC, 1, 1, inputFile); // sizeof(char) is always 1, no need to use it
        value = value + ((long)ptrC << 8*i); // Shifts are a lot simpler to understand
    }
    return value; // Parentheses would make it look like a function
}
Why there unsigned char dummy ='0' is created and then ptrC =&dummy is assigned?
This is also pointless. They could've just used unsigned char ptrC and then passed &ptrC instead of ptrC to fread, and used ptrC instead of *ptrC elsewhere. That would also have made clear that it is just an ordinary local variable.
Why can't we just get the no of rows in an image by just reading 1 byte of data like getting the Greyscale value at a particular row and column.
What if the image is 3475 rows high? One byte isn't enough. So it needs more bytes. The way of reading is just a bit complicated.
Why are we using unsigned char to store the byte, isn't there some other data type or int or long we can use effectively here?
Unsigned char is exactly one byte long. Why would we use any other type for storing a byte then?
(4) The data of binary files is made up of bytes, which in C are represented by unsigned char. Because that's a long word to type, it is sometimes typedeffed to byte or uchar. A good standard-compliant way to define bytes is to use uint8_t from <stdint.h>.
(3) I'm not quite sure what you're trying to get at, but the first bytes - usually 54, but there are other BMP formats - of a BMP file make up the header, which contains information on the colour depth, width and height of an image. The bytes after byte 54 store the raw data. I haven't tested your code, but there might be an issue with padding, because the data for each row must be padded to make a raw-data size that is divisible by 4.
(2) There isn't really a point in defining an extra pointer here. You could just as well fread(&dummy, ...) directly.
(1) Ugh. This function reads a multi-byte value from the file at position offset in the file. The file is made up of bytes, but several bytes can form other data types. For example, a 4-byte unsigned word is made up of:
uint8_t raw[4];
uint32_t x;
x = raw[0] + raw[1]*256 + raw[2]*256*256 + raw[3]*256*256*256;
on a PC, which uses Little Endian data.
That example also shows where the pow(256, i) comes in. Using the pow function here is not a good idea, because it is meant to be used with floating-point numbers. Even the multiplication by 256 is not very idiomatic. Usually, we construct values by byte shifting, where a multiplication by 2 is a left-shift by 1 and hence a multiplication by 256 is a left-shift by 8. Similarly, the additions above add non-overlapping ranges and are usually represented as a bitwise OR, |:
x = raw[0] | (raw[1]<<8) | (raw[2]<<16) | ((uint32_t)raw[3]<<24);
The function accesses the file by re-positioning the file pointer (and leaving it at the new position). That's not very efficient. It would be better to read the header as a 54-byte array and access the array directly.
The code is old and clumsy. Seeing something like:
for(r=0; r<=originalImage.rows - 1; r++)
is already enough for me not to trust it. I'm sure you can find a better example of reading greyscale images from BMP. You could even write your own and start with the Wikipedia article on the BMP format.

Incorrect result for %p when implementation printf

I'm working on my own printf code and I have 2 problems that I hope you might be able to help me with.
The first one is with the %p option:
This option gives me the pointer address of a void* in hex form.
So what I'm doing is this:
void printp(void *thing)
{
dectohex((long)&thing, 1);
}
where dectohex is just a function converting a decimal to hex.
The result will always be correct, except for the last 3 characters. Always. For example:
me: 0x5903d8b8, printf: 0x5903da28.
And these characters don't change very often, whereas the other part changes at each call like its supposed to.
The other problem I have is with the %O option. I can't manage to convert a signed int to an unsigned int. printf prints huge numbers for negative ints, and no cast seems to work since I wouldn't have the space to store it anyway.
EDIT:
Thanks so much for the answers; apparently for the first problem I was just a little stupid. For the second question I'm going to try the different solutions you gave me and update you if I manage to do it.
Again, thanks so much for your time and patience, and sorry for the delay in my response; I checked the email alert for any answer but apparently it doesn't work.
REEDIT: After reading your answers to my second question more carefully, I think some of you thought I asked about %o or %0. I was really talking about %O, as in %lo I think. The man page tells me "%O: The long int argument is converted to unsigned octal". My problem is that before converting the long int to octal, I need to convert it to something unsigned.
If uintptr_t is defined (it is optional), convert the pointer to that integer type and then print.
Otherwise, if sizeof(uintmax_t) >= sizeof(void *), convert to uintmax_t. uintmax_t is a required type, but may not be sufficiently large.
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uintptr_t */
#include <stdio.h>

void printp(void *thing) {
    uintptr_t j = (uintptr_t) thing;
    char lst[(sizeof j * CHAR_BIT + 3)/ 4 + 1]; // Size needed to print in base 16
    char *p = &lst[sizeof lst] - 1;
    *p = '\0';
    do {
        p--;
        *p = "0123456789ABCDEF"[j%16];
        j /= 16;
    } while (p > lst);
    fputs(p, stdout);
}
The %O problem is likely a sign-extension issue (@mafso). Ensure the variables used are unsigned, like unsigned and unsigned long. Without seeing the code, it is difficult to know for sure.
About the first issue you're having, just to make sure, you want to print the address of thing (note that thing itself is a pointer) or the address of the origin of thing (the pointer to the pointer thing)?
You're currently printing the pointer to the pointer.
Change
dectohex((long)&thing, 1);
to
dectohex((long)thing, 1);
if that is the case.
About the %O problem, can you give a code example?
You need "unsigned long long" for your cast.
Pointers are unsigned, but long is signed.
The number of bits in any data type is implementation-dependent; however these days it is common for long and unsigned long to be 32 bits.
edit: to be more clear, you can't count on anything about the number of bits in C, C++ or Objective-C; it's always implementation-dependent. For example, it was at one time common to have nine-bit bytes and thirty-six-bit words. That's why the Internet protocols always specify "octets" - groups of eight bits - rather than "bytes".
That's one advantage of Java, in that the number of bits in each data type is strictly defined.
About your second question regarding zero-padding and negative integers, which seems entirely separate from the first question about hex output. You can handle negative numbers like this (although in 32-bit it does not work with the value -2147483648 which is 0x80000000).
#include <stdio.h>
#define MAXDIGITS 21

int printint(int value, int zeropad, int width)
{
    int i, z, len = 0;
    char strg [MAXDIGITS+1];
    strg [MAXDIGITS] = 0;

    if (value < 0) {
        value = - value;
        putchar ('-');
        len = 1;
    }
    for (i=MAXDIGITS-1; i>=0; i--) {
        strg [i] = '0' + value % 10;
        if ((value /= 10) == 0)
            break;
    }
    if (zeropad)
        for (z=MAXDIGITS-i; z<width; z++) {
            putchar ('0');
            len++;
        }
    for (; i<MAXDIGITS; i++) {
        putchar (strg [i]);
        len++;
    }
    return len;
}

int main (int argc, char *argv[])
{
    int num = 0, len;
    if (argc > 1) {
        sscanf (argv[1], "%d", &num);
        // try the equivalent of printf("%4d", num);
        len = printint (num, 0, 4);
        printf (" length %d\n", len);
        // try the equivalent of printf("%04d", num);
        len = printint (num, 1, 4);
        printf (" length %d\n", len);
    }
    return 0;
}

Why does printing an unsigned char sometimes work and sometimes not? In C

For a school project, I'm writing a blowfish encryption (just the encryption, not the decryption). I've finished the encryption itself, and I decided I would do the decryption for fun (it's easy enough in Blowfish).
I used unsigned chars to represent bytes (although I suppose uint8_t would have been more portable). The question I have comes in when I am attempting to print out the decrypted bytes. I am encrypting plain text messages. I've been able to print out the actual text message that I encrypted, but only at a very specific spot. The same exact code seems not to work anywhere else. Here it is:
int n;
for(n = 0; n < numBlocks; n++) // Blocks are 32-bit unsigned ints (unsigned long)
{
    // uchar is just a typedef for unsigned char
    // message is an array of uchars
    message[n] = (uchar) ((blocks[n]>>24));
    message[n+1] = (uchar) ((blocks[n]>>16));
    message[n+2] = (uchar) ((blocks[n]>>8));
    message[n+3] = (uchar) ((blocks[n]));

    // Printing works here; exact message comes back
    printf("%c%c%c%c", message[n], message[n+1], message[n+2], message[n+3]);
}
But when I try to use the exact same code two lines later, it doesn't work.
for(n = 0; n < numBlocks; n++)
{
    // Printing doesn't work here.
    // Actually, the first letter works, but none of the others
    printf("%c%c%c%c", message[n], message[n+1], message[n+2], message[n+3]);
}
I have tried printing out the characters in number format as well, and I can see that they have in fact changed.
What exactly is going on here? Is this undefined behavior? Does anyone have any reliable solutions? I'm not doing anything to change the value of the message array in between the two calls.
I'm running and compiling this on Sun 5.10 with a sparc processor.
for(n = 0; n < numBlocks; n++) // Blocks are 32-bit unsigned ints (unsigned long)
{
    message[n] = (uchar) ((blocks[n]>>24));
    message[n+1] = (uchar) ((blocks[n]>>16));
    message[n+2] = (uchar) ((blocks[n]>>8));
    message[n+3] = (uchar) ((blocks[n]));
}
Every time you go through this loop, you set message[n] through message[n+3], then increment n by 1. This means that your first iteration sets message[0], message[1], message[2] and message[3], then your second sets message[1], message[2], message[3] and message[4]. So basically, you overwrite all but the first char of your message on every iteration.
Most likely you need to make message 4x larger and then do:
message[n*4] = ...
message[n*4 + 1] = ...
