C pread gives different results

C pread gives different results - c

I have two systems. One is Ubuntu 14.04 64bit on a Intel CPU, the other one is Ubuntu 14.04 for ARM on a CubieTruck.
The Intel system has a data file stored on a ext4 formatted HDD. The CubieTruck has the same file on a NTFS HDD, which is mounted with NTFS-3G.
I currently have a problem with pread() on those systems. I read a bunch of bytes from a file, and print out the first 64 byte from this chunk. Later, these bytes are used to calculate some hash using Shabal.
While the data printed on the CubieTruck matches exactly what I see on a Windows system when opening the file with a Hex-editor, the output on the 64bit Ubuntu is different. It looks like it is filled with "FFFFFF", but also different in general. Even more strange is, that while output on the CubieTruck always stays the same, it changes on the 64bit Ubuntu system after a while (I haven't seen a pattern when that happens, I just check from time to time).
But the most annoying thing is, that the x64 system seems to calculate correctly, while the ARM system is wrong.
I have no idea why pread delivers different results for the same file under those systems, but I hope someone can shed some light into it.
edit, the code:
int main(int argc, char **argv) {
unsigned int readsize = 16384 * 32 * 2;
char *cache = (char*) malloc(readsize);
int fh = open("/home/user/somefile", O_RDONLY);
if (fh < 0) {
printf("can't open file");
exit(-1);
}
int bytes = 0, b;
do {
b = pread(fh, &cache[bytes], readsize - bytes, bytes);
bytes += b;
} while(bytes < readsize && b > 0);
int i = 0;
for (i=0; i < 64; i++) {
printf("%02X", cache[i]);
}
close(fh);
free(cache);
return 0;
}
both systems are opening the exact same file.
result on x64:
FFFFFF94FFFFFFF16D25FFFFFFC0FFFFFFA3367D010BFFFFFFEF1E12FFFFFF841CFFFFFFBE4C26FFFFFF92FFFFFF80FFFFFF86FFFFFFA822FFFFFF8A26FFFFFF906CFFFFFFAD05FFFFFFE7FFFFFFB124FFFFFFA8FFFFFFF77B16FFFFFFEAFFFFFFACFFFFFF9DFFFFFF9EFFFFFF81FFFFFFC7FFFFFF92FFFFFFCDFFFFFFB0FFFFFFE86270FFFFFFF974FFFFFFA8420C45FFFFFFFC04FFFFFFF9103F2E3A47FFFFFF990F
result on ARM:
94F16D25C0A3367D010BEF1E12841CBE4C26928086A8228A26906CAD05E7B124A8F77B16EAAC9D9E81C792CDB0E86270F974A8420C45FC04F9103F2E3A47990F
You can see, on x64, the result is filled with "FFFFFF", and it appears, that this is somehow needed later on. But I don't get why it's different on my systems.

One of the fun aspects of C is the amount of implementation-defined behaviour - in this case, whether char is signed or not.
The %x format specifier takes an unsigned int argument, so in the ARM case is the conversion is straightforward - char is unsigned so just gets zero-extended to unsigned int. However for x86 where it's signed, the converion can go one of two ways:
sign-extend the char to a signed int, then cast it to unsigned int
first cast to unsigned char, then zero-extend to unisgned int
It appears the char->int part of the conversion takes precedence over the signed->unsigned part* so you get the former (note how the bytes without the top bit set are unambiguous and print the same on both implementations). I imagine your calculation does a similar conversion somewhere expecting signedness, hence why it breaks on ARM.
In short, if you're dealing with char-sized values rather than characters, always specify signed char or unsigned char as appropriate, never bare char.
* I suppose I could dig out the standard to check if that's actually specified, but at this point it's merely a trivial detail

Related

What is a cross OS method of converting different sized integers to and from byte arrays in C?

I'm making a data parser/encoder that has to work on different machines of both endianness.
Metadata in the byte array dynamically declares the number of bytes used to represent each integer, and some integers (I'll know which ones) must be read in big endian, and some must be read in little endian.
I currently have the integer -> byte functions written (developing on a macOS little endian) and working on the Mac.
void longlong_to_bytes_big(long long num, unsigned char *byte_arr, unsigned char num_bytes)
{
unsigned char i;
for(i=0; i<num_bytes; i++)
byte_arr[i] = (num >> ((num_bytes - i - 1) * 8)) & 0xFF;
}
void longlong_to_bytes_little(long long num, unsigned char *byte_arr, unsigned char num_bytes)
{
unsigned char i;
for(i=0; i<num_bytes; i++)
byte_arr[i] = (num >> (i * 8)) & 0xFF;
}
But I'm worried this code actually only works for char, short and int on a little endian machine, and would give me the opposite endianness on a big endian machine.
Then for the other direction, I don't think I can combine all the different integer sizes into one function but I think each one should look something like this:
long long bytes_to_longlong_big(unsigned char *byte_arr)
{
unsigned char i, a[8];
for(i=0; i<8; i++)
a[i] = byte_arr[8-i-1];
return *(long long *)a;
}
long long bytes_to_longlong_small(unsigned char *byte_arr)
{
return *(long long *)byte_arr;
}
but again I'm pretty sure these will be backwards on a different endian machine due to the compilers implementation of (long long *).
Is there a machine endian agnostic way to accomplish this? Given the choice I'd prefer performance over simplicity.
The goal is that these byte arrays be in the same order, regardless of the compiler's endianness, but also regardless of endianness, the code needs to correctly interpret the byte array.

You can save/exchange data in "network order" and then use functions like ntohl and htonl (and friends) when reading and writing data. These function will automatically take care of endianess of the "current" system. Consequently, you don't need to write your own code.

You could be interested in textual formats such as JSON, XML, YAML. For human developers, they make debugging easier. You'll find many libraries supporting them.
You could also look into portable binary formats like XDR or ASN1
You could find some C or C++ code generators (so a metaprogramming approach) related to them (rpcgen, SWIG), and you could consider writing your own C/C++ generator with tools such as GPP or GNU m4 or your Guile or Python script.
For true network exchanges (e.g. Ethernet) - or disk IO, the bottleneck is usually the network (or the disk), not the encoding/decoding processing. That is why it usually makes sense to use textual formats.

The purpose of size_t and its relationship with implementation [duplicate]

I am getting confused with size_t in C. I know that it is returned by the sizeof operator. But what exactly is it? Is it a data type?
Let's say I have a for loop:
for(i = 0; i < some_size; i++)
Should I use int i; or size_t i;?

From Wikipedia:
According to the 1999 ISO C standard
(C99), size_t is an unsigned integer
type of at least 16 bit (see sections
7.17 and 7.18.3).
size_tis an unsigned data type
defined by several C/C++ standards,
e.g. the C99 ISO/IEC 9899 standard,
that is defined in stddef.h.1 It can
be further imported by inclusion of
stdlib.h as this file internally sub
includes stddef.h.
This type is used to represent the
size of an object. Library functions
that take or return sizes expect them
to be of type or have the return type
of size_t. Further, the most
frequently used compiler-based
operator sizeof should evaluate to a
constant value that is compatible with
size_t.
As an implication, size_t is a type guaranteed to hold any array index.

size_t is an unsigned type. So, it cannot represent any negative values(<0). You use it when you are counting something, and are sure that it cannot be negative. For example, strlen() returns a size_t because the length of a string has to be at least 0.
In your example, if your loop index is going to be always greater than 0, it might make sense to use size_t, or any other unsigned data type.
When you use a size_t object, you have to make sure that in all the contexts it is used, including arithmetic, you want non-negative values. For example, let's say you have:
size_t s1 = strlen(str1);
size_t s2 = strlen(str2);
and you want to find the difference of the lengths of str2 and str1. You cannot do:
int diff = s2 - s1; /* bad */
This is because the value assigned to diff is always going to be a positive number, even when s2 < s1, because the calculation is done with unsigned types. In this case, depending upon what your use case is, you might be better off using int (or long long) for s1 and s2.
There are some functions in C/POSIX that could/should use size_t, but don't because of historical reasons. For example, the second parameter to fgets should ideally be size_t, but is int.

size_t is a type that can hold any array index.
Depending on the implementation, it can be any of:
unsigned char
unsigned short
unsigned int
unsigned long
unsigned long long
Here's how size_t is defined in stddef.h of my machine:
typedef unsigned long size_t;

If you are the empirical type,
echo | gcc -E -xc -include 'stddef.h' - | grep size_t
Output for Ubuntu 14.04 64-bit GCC 4.8:
typedef long unsigned int size_t;
Note that stddef.h is provided by GCC and not glibc under src/gcc/ginclude/stddef.h in GCC 4.2.
Interesting C99 appearances
malloc takes size_t as an argument, so it determines the maximum size that may be allocated.
And since it is also returned by sizeof, I think it limits the maximum size of any array.
See also: What is the maximum size of an array in C?

The manpage for types.h says:
size_t shall be an unsigned integer type

To go into why size_t needed to exist and how we got here:
In pragmatic terms, size_t and ptrdiff_t are guaranteed to be 64 bits wide on a 64-bit implementation, 32 bits wide on a 32-bit implementation, and so on. They could not force any existing type to mean that, on every compiler, without breaking legacy code.
A size_t or ptrdiff_t is not necessarily the same as an intptr_t or uintptr_t. They were different on certain architectures that were still in use when size_t and ptrdiff_t were added to the Standard in the late 1980s, and becoming obsolete when C99 added many new types but not gone yet (such as 16-bit Windows). The x86 in 16-bit protected mode had a segmented memory where the largest possible array or structure could be only 65,536 bytes in size, but a far pointer needed to be 32 bits wide, wider than the registers. On those, intptr_t would have been 32 bits wide but size_t and ptrdiff_t could be 16 bits wide and fit in a register. And who knew what kind of operating system might be written in the future? In theory, the i386 architecture offers a 32-bit segmentation model with 48-bit pointers that no operating system has ever actually used.
The type of a memory offset could not be long because far too much legacy code assumes that long is exactly 32 bits wide. This assumption was even built into the UNIX and Windows APIs. Unfortunately, a lot of other legacy code also assumed that a long is wide enough to hold a pointer, a file offset, the number of seconds that have elapsed since 1970, and so on. POSIX now provides a standardized way to force the latter assumption to be true instead of the former, but neither is a portable assumption to make.
It couldn’t be int because only a tiny handful of compilers in the ’90s made int 64 bits wide. Then they really got weird by keeping long 32 bits wide. The next revision of the Standard declared it illegal for int to be wider than long, but int is still 32 bits wide on most 64-bit systems.
It couldn’t be long long int, which anyway was added later, since that was created to be at least 64 bits wide even on 32-bit systems.
So, a new type was needed. Even if it weren’t, all those other types meant something other than an offset within an array or object. And if there was one lesson from the fiasco of 32-to-64-bit migration, it was to be specific about what properties a type needed to have, and not use one that meant different things in different programs.

Since nobody has yet mentioned it, the primary linguistic significance of size_t is that the sizeof operator returns a value of that type. Likewise, the primary significance of ptrdiff_t is that subtracting one pointer from another will yield a value of that type. Library functions that accept it do so because it will allow such functions to work with objects whose size exceeds UINT_MAX on systems where such objects could exist, without forcing callers to waste code passing a value larger than "unsigned int" on systems where the larger type would suffice for all possible objects.

size_t and int are not interchangeable. For instance on 64-bit Linux size_t is 64-bit in size (i.e. sizeof(void*)) but int is 32-bit.
Also note that size_t is unsigned. If you need signed version then there is ssize_t on some platforms and it would be more relevant to your example.
As a general rule I would suggest using int for most general cases and only use size_t/ssize_t when there is a specific need for it (with mmap() for example).

size_t is an unsigned integer data type which can assign only 0 and greater than 0 integer values. It measure bytes of any object's size and is returned by sizeof operator.
const is the syntax representation of size_t, but without const you can run the program.
const size_t number;
size_t regularly used for array indexing and loop counting. If the compiler is 32-bit it would work on unsigned int. If the compiler is 64-bit it would work on unsigned long long int also. There for maximum size of size_t depending on the compiler type.
size_t already defined in the <stdio.h> header file, but it can also be defined by the
<stddef.h>, <stdlib.h>, <string.h>, <time.h>, and <wchar.h> headers.
Example (with const)
#include <stdio.h>
int main()
{
const size_t value = 200;
size_t i;
int arr[value];
for (i = 0 ; i < value ; ++i)
{
arr[i] = i;
}
size_t size = sizeof(arr);
printf("size = %zu\n", size);
}
Output: size = 800
Example (without const)
#include <stdio.h>
int main()
{
size_t value = 200;
size_t i;
int arr[value];
for (i = 0; i < value; ++i)
{
arr[i] = i;
}
size_t size = sizeof(arr);
printf("size = %zu\n", size);
}
Output: size = 800

size_t is a typedef which is used to represent the size of any object in bytes. (Typedefs are used to create an additional name/alias for another data type, but does not create a new type.)
Find it defined in stddef.h as follows:
typedef unsigned long long size_t;
size_t is also defined in the <stdio.h>.
size_t is used as the return type by the sizeof operator.
Use size_t, in conjunction with sizeof, to define the data type of the array size argument as follows:
#include <stdio.h>
void disp_ary(int *ary, size_t ary_size)
{
for (int i = 0; i < ary_size; i++)
{
printf("%d ", ary[i]);
}
}
int main(void)
{
int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
int ary_size = sizeof(arr)/sizeof(int);
disp_ary(arr, ary_size);
return 0;
}
size_t is guaranteed to be big enough to contain the size of the biggest object the host system can handle.
Note that an array's size limitation is really a factor the system's stack size limitations where this code is compiled and executed. You should be able to adjust the stack size at link time (see ld commands's --stack-size parameter).
To give you an idea of approximate stack sizes:
4K on an embedded device
1M on Win10
7.4M on Linux
Many C library functions like malloc, memcpy and strlen declare their arguments and return type as size_t.
size_t affords the programmer with the ability to deal with different types, by adding/subtracting the number of elements required instead of using the offset in bytes.
Let's get a deeper appreciate for what size_t can do for us by examining its usage in pointer arithmetic operations of a C string and an integer array:
Here's an example using a C string:
const char* reverse(char *orig)
{
size_t len = strlen(orig);
char *rev = orig + len - 1;
while (rev >= orig)
{
printf("%c", *rev);
rev = rev - 1; // <= See below
}
return rev;
}
int main() {
char *string = "123";
printf("%c", reverse(string));
}
// Output: 321
0x7ff626939004 "123" // <= orig
0x7ff626939006 "3" // <= rev - 1 of 3
0x7ff626939005 "23" // <= rev - 2 of 3
0x7ff626939004 "123" // <= rev - 3 of 3
0x7ff6aade9003 "" // <= rev is indeterminant. This can be exploited as an out of bounds bug to read memory contents that this program has no business reading.
That's not very helpful in understanding the benefits of using size_t since a character is one byte, regardless of your architecture.
When we're dealing with numerical types, size_t becomes very beneficial.
size_t type is like an integer with benefits that can hold a physical memory address; That address changes its size according to the type of platform in which it is executed.
Here's how we can leverage sizeof and size_t when passing an array of ints:
void print_reverse(int *orig, size_t ary_size)
{
int *rev = orig + ary_size - 1;
while (rev >= orig)
{
printf("%i", *rev);
rev = rev - 1;
}
}
int main()
{
int nums[] = {1, 2, 3};
print_reverse(nums, sizeof(nums)/sizeof(*nums));
return 0;
}
0x617d3ffb44 1 // <= orig
0x617d3ffb4c 3 // <= rev - 1 of 3
0x617d3ffb48 2 // <= rev - 2 of 3
0x617d3ffb44 1 // <= rev - 3 of 3
Above, we see than an int takes 4 bytes (and since there are 8 bits per byte, an int occupies 32 bits).
If we were to create an array of longs we'd discover that a long takes 64 bits on a linux64 operating system, but only 32 bits on a Win64 system. Hence, using t_size, will save a lot of coding and potential bugs, especially when running C code that performs Address Arithmetic on different architectures.
So the moral of this story is "Use size_t and let your C-compiler do the error-prone work of pointer arithmetic."

size_t is unsigned integer data type. On systems using the GNU C Library, this will be unsigned int or unsigned long int. size_t is commonly used for array indexing and loop counting.

In general, if you are starting at 0 and going upward, always use an unsigned type to avoid an overflow taking you into a negative value situation. This is critically important, because if your array bounds happens to be less than the max of your loop, but your loop max happens to be greater than the max of your type, you will wrap around negative and you may experience a segmentation fault (SIGSEGV). So, in general, never use int for a loop starting at 0 and going upwards. Use an unsigned.

size_t or any unsigned type might be seen used as loop variable as loop variables are typically greater than or equal to 0.
When we use a size_t object, we have to make sure that in all the contexts it is used, including arithmetic, we want only non-negative values. For instance, following program would definitely give the unexpected result:
// C program to demonstrate that size_t or
// any unsigned int type should be used
// carefully when used in a loop
#include<stdio.h>
int main()
{
const size_t N = 10;
int a[N];
// This is fine
for (size_t n = 0; n < N; ++n)
a[n] = n;
// But reverse cycles are tricky for unsigned
// types as can lead to infinite loop
for (size_t n = N-1; n >= 0; --n)
printf("%d ", a[n]);
}
Output
Infinite loop and then segmentation fault

This is a platform-specific typedef. For example, on a particular machine, it might be unsigned int or unsigned long. You should use this definition for more portability of your code.

From my understanding, size_t is an unsigned integer whose bit size is large enough to hold a pointer of the native architecture.
So:
sizeof(size_t) >= sizeof(void*)

XChangeProperty uses long for 32-bits

The pid variable that is passed to XChangeProperty() is not a long.
The libX11 code deferences the variable as a long and on a 64-bit sparc
this must be aligned on a 8-byte boundary.
Because it is an int, it gets aligned on a 4-byte boundary, causing a
bus error.
pid_t pid = getpid();
XChangeProperty( display, wm_window, net_wm_pid, cardinal, 32,
PropModeReplace,
(const unsigned char*) &pid, 1 );
Its only necessary casts to 'long' or check for max value?
From XGetWindowProperty(3) manual page
format
Specifies whether the data should be viewed as a list of
8-bit, 16-bit, or 32-bit quantities. Possible values are
8,
16, and 32. This information allows the X server to cor‐
rectly perform byte-swap operations as necessary. If the
format is 16-bit or 32-bit, you must explicitly cast your
data pointer to an (unsigned char *) in the call to
XChange‐
Property.
pid_t pid = getpid ();
if (pid <= 0xFFFFFFFFU) {
unsigned long xpid = pid;
XChangeProperty( display, wm_window, net_wm_pid, cardinal, 32,
PropModeReplace,
(const unsigned char*) &xpid, 1 );
}

You have to pass a long to XChangeProperty for 32 bit properties you're setting.
So you need to do
unsigned long xpid = pid;
If your pid_t is 32 bits or less, there's no need to check if it's <= 0xFFFFFFFFU, it will always be true.
If you want your code to be portable to systems that have pid_t values > 32 bits - (though I don't know of any system that has) then you need that check since the _NET_WM_PID property is defined to be 32 bits.
The underlying issue here is that libX11 will dereference the property as a long, but only extract 32 bit from it, so from what I can tell _NET_WM_PID can't be used on a system if the pid_t value is > 32 bits.
Now, the reason for all this is that X11 was made in a time that where long was 32 bits on all systems it needed to run on, and has been kept that way to not break its API and ABI, I'll quote from this post
You have to separate the C API from the underlying X11 objects. The
objects are 32-bit (or 16 or 8), and will always remain so. These
have to be mapped to the language (C) somehow. The types picked
matched up nicely for a few decades, but unfortunately not anymore.
So rather than backpeddling and saying "we didn't really mean long,
we meant whatever-type-is-closest-to-32-bit-right-now", they kept
long. Anything else would probably have meant a very painful
transition period.
That means that everything in the X11 API that deals with 32-bit
objects will use 64-bit variables with half the space wasted.

Issues with sprintf

I have been trying to use sprintf to add " to the start and end of a integer, however when i use more than 10 digits the program returns the wrong number:
int data2 = 12345678910;
char data3[2];
sprintf(data3,"\"%i\"", data2);
send(data3);
The send function outputs the integer to the screen.
The result i am getting back is :
"-108508098"
The send function works as i use it elsewhere and it does what it is suppose to.

Before your edit, your issue is not only with sprintf (which BTW you should not use, prefer snprintf), it is with integral numbers in C (they have a limited amount of bits, e.g. 64 bits at most on my Linux desktop....; read wikipages on computer number format & C data types).
Your use of sprintf is completely wrong (you've got a buffer overflow, which is an undefined behavior). You should code:
char buffer[32];
snprintf(buffer, sizeof(buffer), "%i", data2);
sendsomewhere(buffer);
Notice that on POSIX send needs 4 arguments. You should rename your function to sendsomewhere
You should read more about <stdint.h> and <limits.h>
You probably want to use bignums (or at least int64_t or perhaps long long to represent numbers like 12345678910). Don't reinvent bignums (they are difficult to implement efficiently). Use some library like gmplib
If 64 bits are enough for you (so if your numbers would always be between -263 i.e. −9223372036854775808 and 263-1 i.e. 9223372036854775807), consider using long long (or unsigned long long) numbers of C99 or C11:
long long data2 = 12345678910;
char buffer[32];
snprintf(buffer, sizeof(buffer), "%lld", data2);
sendsomewhere(buffer);
If 64 bits are not enough, you should use bigints (but some recent compilers might provide some _int128_t type for 128-bits ints)
Don't forget to enable all the warnings & debug info when compiling (e.g. with gcc -Wall -Wextra -g), then learn how to use the debugger (e.g. gdb)

data2 is overflown by the value to which you have initialized it to.
data3 should be able to hold atleast 11 bytes if data2 is of int datatype (+1 for NULL termination), for which you have allocated only 2 bytes.
Here is an example code snippet:
#include <stdio.h>
int main(void)
{
unsigned long long data2 = 12345678910;
char data3[32];
snprintf(data3, sizeof(data3), "\"%llu\"", data2);
printf("%s\n", data3);
return 0;
}

Problems with File I/o when porting old 'C' libraries from 32-bit to 64-bit

I have really old 'c' code that uses read to read a binary file. Here is a sample:
uint MyReadFunc(int _FileHandle, char *DstBuf, uint BufLen)
{
return (read( _FileHandle, DstBuf, BufLen));
}
For 64bit OS - char * will be 64 bits but the BufLen is only 32 bits and the returned value are only 32 bits.
Its not an option to change this to .NET - I have .NET versions, but I need this old library converted also.
Can someone please tell me what I need to use to do File i/o on 64 bit OS (using 'C' code)

Use size_t, not uint.

It looks like you're conflating two things: size of a pointer and the extent of the memory it points to.

I'm not sure about char* being 64-bits - the pointer itself will be 64-bit, yes, but the actual value is still a character array, unless I'm missing something? (I'm not a brilliant C programmer.)
The length argument to read() is size_t, not int, which on a 64-bit system should be 64-bit not 32. Also the return value is a ssize_t, not int, which will also be 64-bit so you should be covered if you just change your function definition to return ssize_t, and take size_t instead of the ints.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight