Use of uninitialised value of size 8 Valgrind - c

I'm getting
==56903== Use of uninitialised value of size 8
==56903== at 0x1000361D1: checkMSB (in ./UnittestCSA)
==56903== by 0x10003732A: S_derive_k1_k2 (in ./UnittestCSA)
Code is as follows:
#define BITS (sizeof(int8_t) * 8)

int32_t checkMSB(uint8_t *pKey){
    int8_t msb = 0;
    int32_t ret = 0;

    msb = 1 << (BITS - 1);
    /* Perform bitwise AND with msb and num */
    if(pKey[0] & msb){
        ret = 1;
    } else {
        ret = 0;
    }
    return ret;
}
Not sure what is causing the issue.
If this
#define BITS (sizeof(int8_t) * 8)
is changed to this
#define BITS (sizeof(int) * 8)
it doesn't complain. I have included the <stdint.h> header.
UPDATE
uint8_t localK1[BLOCKSIZE];

for(index = 0; index < inputLen; index++){
    localK1[index] = pInputText[index];
}
result = checkMSB(localK1);

Your checkMSB function declares only two local variables and one function parameter. The variables both have initializers, and the parameter (a pointer) will receive a value as a result of the function call, supposing that a correct prototype for it is in scope at the point of the call. Thus, none of these is used uninitialized.
The only other data used (not counting constants) are those pointed to by the argument, pKey. Of those, your code uses pKey[0]. That it is Valgrind reporting the issue supports the conclusion that that's the data it is complaining about: Valgrind's default memcheck tool watches dynamically allocated memory, and that's the only thing involved that might be dynamically allocated.
That the error disappears when you change the definition of BITS could be explained by the expression pKey[0] & msb being optimized away when BITS evaluates to a value larger than 8: 1 << 31 truncated to an int8_t yields 0 on typical implementations, so the test becomes constant false and pKey[0] need never be read.
As far as your update, which purports to show that the function's argument in fact points to initialized data, I'm inclined to think that you're looking in the wrong place, or else in the right place but at the wrong code. That is, there probably is either a different call to checkMSB that causes Valgrind to complain, or else the binary being tested was built from a different version of the code. I'm not prepared to believe that everything you've presented in the question is true, or at least not that it fits together the way you seem to be saying it does.
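For what it's worth, here is a minimal sketch (hypothetical code, not taken from the asker's project) of the kind of caller that provokes exactly this class of report, assuming the checkMSB from the question is linked in: heap memory that is read before ever being written.
#include <stdint.h>
#include <stdlib.h>

int32_t checkMSB(uint8_t *pKey);  /* the function from the question */

int main(void) {
    uint8_t *pKey = malloc(16);   /* allocated, but never written */
    int32_t ret = checkMSB(pKey); /* memcheck: use of uninitialised value */
    free(pKey);
    return ret;
}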

Related

C - Populate (get) data with a function. Best via reference or copy (in microcontroller environment)?

I am currently programming an 8051 µC in C (with compiler: Wickehaeuser µC/51), and I am wondering which is the best way to populate a structure. In my current case I have a time/date structure which should be populated with the current time/date from an RTC via SFRs.
So I am thinking of the best method to do this:
Get the data via return value by creating the variable inside the function (get_foo_create)
Get data via call by reference (get_foo_by_reference)
Get it via call by reference plus returning the pointer (while writing this I think it is stupid, but I am also thinking about it) (get_foo_pointer_return)
The following code is just an example (note: there is currently a failure in the last print, which does not print out the value atm)
Which is the best method?
typedef struct {
    unsigned char foo;
    unsigned char bar;
    unsigned char baz;
} data_foo;

data_foo get_foo_create(void) {
    data_foo foo;
    foo.bar = 2;
    return foo;
}

void get_foo_by_reference(data_foo *foo) {
    // Read values e.g. from SFR
    foo->bar = 42; // Just simulate SFR
}

data_foo *get_foo_pointer_return(data_foo *foo) {
    // Read values e.g. from SFR
    (*foo).bar = 11; // Just simulate SFR
    return foo;
}

/**
 * Main program
 */
void main(void) {
    data_foo struct_foo;
    data_foo *ptr_foo;

    seri_init(); // Serial Com init
    clear_screen();

    struct_foo = get_foo_create();
    printf("%d\n", struct_foo.bar);

    get_foo_by_reference(&struct_foo);
    printf("%d\n", struct_foo.bar);

    ptr_foo = get_foo_pointer_return(&struct_foo); // was &ptr_foo, which passed the
                                                   // wrong type (data_foo **) and
                                                   // caused the garbage value (39) in
                                                   // the last print
    printf("%d\n", (*ptr_foo).bar);

    SYSTEM_HALT; // Program end
}
On the 8051, you should avoid using pointers to the extent possible. Instead, it's generally best--if you can afford it--to have some global structures which will be operated upon by various functions. Having functions for "load thing from address" and "store thing to address", along with various functions that manipulate thing, can be much more efficient than trying to have functions that can operate on objects of that type "in place".
For your particular situation, I'd suggest having a global structure called "time", as well as a global union called "ldiv_acc" which combines a uint32_t, two uint16_t, and four uint8_t. I'd also suggest having an "ldivmod" function which divides the 32-bit value in ldiv_acc by an 8-bit argument, leaving the quotient in ldiv_acc and returning the remainder, as well as an "lmul" function which multiplies the 32-bit value in ldiv_acc by an 8-bit value. It's been a long time since I've programmed the 8051, so I'm not sure what help compilers need to generate good code, but 32x32 divisions and multiplies are going to be expensive compared with using a combination of 8x8 multiplies and divides.
On the 8051, code like:
uint32_t time;
uint32_t sec, min, hr;

sec = time % 60;
time /= 60;
min = time % 60;
time /= 60;
hr = time % 24;
time /= 24;
is likely to be big and slow. Using something like:
ldiv_acc.l = time;
sec = ldivmod(60);
min = ldivmod(60);
hr = ldivmod(24);
is apt to be much more compact and, if you're clever, faster. If speed is really important, you could use functions to perform divmod6, divmod10, divmod24, and divmod60, taking advantage of the fact that divmod60(h*256+l) is equal to h*4 + divmod60(h*16+l). The second addition might yield a value that no longer fits in a byte, but if it does, applying the same technique again will get the operand below 256. Dividing an unsigned char by another unsigned char is faster than divisions involving unsigned int.
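As a rough illustration of the suggested helpers, here is a plain-C sketch; the union layout and the names ldiv_acc/ldivmod follow the suggestion above, but the ldivmod body is just a portable placeholder, not the hand-optimized 8051 sequence:
#include <stdint.h>

typedef union {
    uint32_t l;     /* the full 32-bit accumulator     */
    uint16_t w[2];  /* ... viewed as two 16-bit halves */
    uint8_t  b[4];  /* ... viewed as four bytes        */
} ldiv_acc_t;

ldiv_acc_t ldiv_acc;  /* global accumulator shared by the math helpers */

/* Divide the 32-bit value in ldiv_acc by an 8-bit divisor:
   the quotient stays in ldiv_acc, the remainder is returned. */
uint8_t ldivmod(uint8_t divisor)
{
    uint8_t rem = (uint8_t)(ldiv_acc.l % divisor);
    ldiv_acc.l /= divisor;
    return rem;
}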

Why is glibc's __random_r assigning variables it immediately overwrites?

I was looking for the source for glibc's rand() function, which an answer here links to.
Following the link, I'm puzzled about the code for the __random_r() TYPE_0 branch:
int32_t val = state[0];
val = ((state[0] * 1103515245) + 12345) & 0x7fffffff;
state[0] = val;
*result = val;
What is the point of the val variable, getting assigned and then immediately overwritten? The random_data struct that holds state is nothing unusual.
As one would expect, compiling with -O2 on godbolt gives the same code if you just eliminate val. Is there a known reason for this pattern?
UPDATE: It seems this was an aberration in the version linked to from that answer; I've updated the links there to the 2.28 version. It might have been something done temporarily to aid debugging, by making the contents of state[0] easier to see in a local watchlist?
Wow, that is indeed some unbelievable garbage code.
There is no justification for this.
And not only is the initialization of val not needed: state[0] is an int32_t, and the multiplication by 1103515245 can overflow, which is undefined behaviour (signed integer overflow) on any platform with 32-bit ints (i.e. basically every one). And GCC is the compiler most often used to compile Glibc.
As noted by HostileFork, the code in more recent 2.28 reads:
int32_t val = ((state[0] * 1103515245U) + 12345U) & 0x7fffffff;
state[0] = val;
*result = val;
With this, not only is the useless initialization removed, but the U suffix makes the multiplication happen with unsigned integers, avoiding undefined behaviour. The & 0x7fffffff ensures that the resulting value fits into an int32_t and is positive.
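To make the difference concrete, here is a small side-by-side sketch (function names are mine, not glibc's): the signed version performs the multiplication in int and can overflow, which is undefined; in the unsigned version the operands are converted to unsigned int and the arithmetic wraps modulo 2^32, which is well defined, and the mask guarantees the result fits in a positive int32_t.
#include <stdint.h>

/* undefined behaviour whenever s * 1103515245 overflows a 32-bit int */
int32_t step_signed(int32_t s)
{
    return ((s * 1103515245) + 12345) & 0x7fffffff;
}

/* well defined: the arithmetic happens in unsigned int and wraps */
int32_t step_unsigned(int32_t s)
{
    return (int32_t)(((uint32_t)s * 1103515245U + 12345U) & 0x7fffffff);
}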

Error when trying to cast double to char * [closed]

My project is to scan an address space (which in my case is 0x00000000 - 0xffffffff, or 0 to 2^32 - 1) for a pattern and return in an array the locations in memory where the pattern was found (it could be found multiple times).
Since the address space is 32 bits, i is a double and max is pow(2,32) (also a double).
I want to keep the original value of i intact so that I can use that to report the location of where the pattern was found (since actually finding the pattern requires moving forward several bytes past i), so I want temp, declared as char *, to copy the value of i. Then, later in my program, I will dereference temp.
double i, max = pow(2, 32);
char *temp;

for (i = 0; i < max; i++)
{
    temp = (char *) i;
    //some code involving *temp
}
The issue I'm running into is that a double can't be cast to a char *. An int can be; however, since the address space is 32 bits (not 16), I need a double, which is exactly large enough to represent 2^32.
Is there anything I can do about this?
In C, double and float are not represented the way you think they are; this code demonstrates that:
#include <stdio.h>

typedef union _DI
{
    double d;
    int i;
} DI;

int main()
{
    DI di;
    di.d = 3.00;
    printf("%d\n", di.i);
    return 0;
}
You will not see an output of 3 in this case.
In general, even if you could read other processes' memory, your strategy is not going to work on any modern operating system because of virtual memory (the address space that one process "sees" doesn't necessarily, and in fact usually doesn't, represent the physical memory on the system).
Never use a floating point variable to store an integer. Floating point variables make approximate computations. It would happen to work in this case, because the integers are small enough, but to know that, you need intimate knowledge of how floating point works on a particular machine/compiler and what range of integers you'll be using. Plus it's harder to write the program, and the program would be slower.
C defines an integer type that's large enough to store a pointer: uintptr_t. You can cast a pointer to uintptr_t and back. On a 32-bit machine, uintptr_t will be a 32-bit type, so it's only able to store values up to 2^32 - 1. To express a loop that covers the whole range of the type including the first and last value, you can't use an ordinary for loop with a variable that's incremented, because the ending condition requires a value of the loop index that's out of range. If you naively write
uintptr_t i;
for (i = 0; i <= UINTPTR_MAX; i++) {
    unsigned char *temp = (unsigned char *)i;
    // ...
}
then you get an infinite loop, because after the iteration with i equal to UINTPTR_MAX, running i++ wraps the value of i to 0. The fact that the loop is infinite can also be seen in a simpler logical way: the condition i <= UINTPTR_MAX is always true since all values of the type are less or equal to the maximum.
You can fix this by putting the test near the end of the loop, before incrementing the variable.
i = 0;
do {
    unsigned char *temp = (unsigned char *)i;
    // ...
    if (i == UINTPTR_MAX) break;
    i++;
} while (1);
Note that exploring 4GB in this way will be extremely slow, if you can even do it. You'll get a segmentation fault whenever you try to access an address that isn't mapped. You can handle the segfault with a signal handler, but that's tricky and slow. What you're attempting may or may not be what your teacher expects, but it doesn't make any practical sense.
To explore a process's memory on Linux, read /proc/self/maps to discover its memory mappings. See my answer on Unix.SE for some sample code in Python.
Note also that if you're looking for a pattern, you need to take the length of the whole pattern into account, a byte-by-byte lookup doesn't do the whole job.
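For reference, a minimal C sketch of the /proc/self/maps approach (Linux only; error handling kept short and the actual pattern search omitted):
#include <stdio.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];

    if (!maps)
        return 1;

    /* Each line reads "start-end perms offset dev inode [path]";
       only readable mappings are worth scanning. */
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3
                && perms[0] == 'r')
            printf("scan %#lx..%#lx\n", start, end);
    }
    fclose(maps);
    return 0;
}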
Ahh, a school assignment. OK then.
uint32_t i;
for ( i = 0; i < 0xFFFFFFFF; i++ )
{
    char *x = (char *)i;
    // Do magic here.
}
// Also, the above code skips 0xFFFFFFFF itself, so magic that one address here.
// But if your pattern is longer than 1 byte, then it's not necessary
// (in fact, use something less than 0xFFFFFFFF in the above loop then)
The cast of a double to a pointer is a constraint violation - hence the error.
A floating type shall not be converted to any pointer type. C11dr §6.5.4 4
To scan the entire 32-bit address space, use a do loop with an integer type capable of the [0 ... 0xFFFFFFFF] range.
uint32_t address = 0;
do {
    char *p = (char *) address;
    foo(p);
} while (address++ < 0xFFFFFFFF);

Stack concern : Local variables vs Arithmetics

I have a question regarding whether it is good to save the results of arithmetic computations in local variables, given the stack usage that implies.
Let's say I have a recursive function like this one :
void foo (unsigned char x, unsigned char z) {
    if (!x || !z)
        return;

    // Do something

    for (unsigned char i = 0; i < 100; ++i) {
        foo(x - 1, z);
        foo(x, z - 1);
    }
}
The main things to see here are the x - 1 and z - 1 evaluated each time in the loop.
To increase performance, I would do something like this :
const unsigned char minus_x = x - 1;
const unsigned char minus_z = z - 1;

for (unsigned char i = 0; i < 100; ++i) {
    foo(minus_x, z);
    foo(x, minus_z);
}
But doing this means that on each call, minus_x and minus_z are saved on the stack. The recursive function might be called thousands of times, which means thousands of bytes used on the stack. Also, the real maths involved aren't as simple as a -1.
Is this a good idea ?
Edit: This is actually useless, since it is a pretty standard compiler optimization: loop-invariant code motion (see HansPassant's comment).
Would it be a better idea to use a static array containing the computations like :
static const char minuses[256] = {/* 0 for x = 0; x - 1 for x = 1 to 255 */};
and then do :
foo(minuses[x], z);
foo(x, minuses[z]);
This approach limits a lot the actual maths needed, but on each call it has to fetch the cell from the array instead of reading the value from a register.
I am trying to benchmark as much as I can to find the best solution, but if there is a best practice or something I am missing here, please let me know.
FWIW, I tried this with gcc, for two functions foo_1() (no extra variables) and foo_2() (extra variables).
With -O3, gcc unrolled the for loop (!); the two functions were exactly the same size, but not quite the same code. I regret I don't have time to work out how and why they differed.
With -O2, gcc generated exactly the same code for foo_1 and foo_2. As one might expect, it allocated a register to x, z, x-1, z-1 and i, and pushed/popped those to preserve the parent's values -- using 6 x 8 (64-bit machine) bytes of stack for each call (including the return address).
You report 24 bytes of stack used... is that a 32-bit machine?
With -O0, the picture was different: foo_1 did the x-1 and z-1 each time round the loop, and in both cases the variables were held in memory. foo_1 was slightly shorter, and I suspect that the subtraction makes no difference on a modern processor! In this case, foo_1 and foo_2 used the same amount of stack. This is because all the variables in foo are unsigned char, and the extra minus_x and minus_z pack together with the i, using space which is otherwise padding. If you change minus_x and minus_z to unsigned long long, you get a difference. Curiously, foo_1 used 6 x 8 bytes of stack as well. There were 16 unused bytes in the stack frame, so even taking into account aligning the RSP and the RBP to 16-byte boundaries, it appears to be using more than it needs to... I have no idea why.
I had a quick look at a static array of x - 1. For -O0 it made no difference to the stack use (for the same reason as before). For -O2, it took one look at foo(x, minuses[z]); and hoisted the minuses[z] out of the loop! Which one ought to have expected... and the stack use stayed the same (at 6 x 8).
More generally, as noted elsewhere, any effective amount of optimisation is going to hoist calculations out of loops where it can. The other thing which is going on is heavy use of registers to hold variables -- both real variables (those you have named) and "pseudo" variables (to hold the pre-calculated result of hoisted stuff). Those registers need to be saved across calls of subroutines -- either by the caller or the callee. The x86 push/pop operate on the entire register, so an unsigned char held in a register is going to need a full 8 or 4 (64-bit or 32-bit mode) bytes of stack. But, hey, that's what you pay for the optimisation !
It's not entirely clear to me whether it's the run-time or the stack-use which you are most concerned about. Either way, the message is to leave it up to the compiler, and to worry if and only if the thing is too slow, and then only about the bits which profiling shows are a problem!

Custom "non-traditional" polymorphism implementation

I've been looking for a custom polymorphism solution to improve binary compatibility. The problem is that pointer members vary in size on different platforms, so even "static" width members get pushed around, producing binary-incompatible layouts.
The way I understand it, most compilers implement v-tables in a similar way:
__________________
| | | |
|0| vtable* | data | -> ->
|_|_________|______|
E.g. the v-table is put as the first element of the object, and in cases of multiple inheritance the inherited classes are laid out sequentially with the appropriate padding for alignment.
So, my idea is the following, put all v-tables (and all varying "platform width" members) "behind":
__________________
| | | |
<- | vtable* |0| data | ->
|_________|_|______|
This way the layout to the right of the 0 (the alignment boundary for the first data member) comprises only types with explicit and portable size and alignment (so it can stay uniform), while at the same time you can still portably navigate through the v-tables and other pointer members using indices with the platform-width stride. Also, since all members to the left will have the same size and alignment, this could reduce the need for extra padding in the layout.
Naturally, this means that the "this" pointer will no longer point at the beginning of the object, but will be some offset which will vary with every class. Which means that "new" and "delete" must make adjustments in order for the whole scheme to work. Would that have a measurable negative impact, considering that one way or another, offset calculation takes place when accessing members anyway?
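In plain C, the scheme boils down to something like the following sketch (all names are hypothetical): the pointer handed to the rest of the program points at the portable data part, the pointer-width metadata lives at a negative offset, and allocation/deallocation add and subtract that offset.
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void (*vfunc)(void *self);   /* pointer-width part, lives "behind" */
} meta_t;

typedef struct {
    uint32_t data;               /* explicit-width, portable part */
} body_t;

/* "new": allocate meta + body, hand out a pointer to the body */
void *obj_new(void)
{
    char *raw = malloc(sizeof(meta_t) + sizeof(body_t));
    return raw ? raw + sizeof(meta_t) : NULL;
}

/* reach the metadata by stepping backwards from the object pointer */
meta_t *obj_meta(void *obj)
{
    return (meta_t *)((char *)obj - sizeof(meta_t));
}

/* "delete": undo the offset before freeing */
void obj_delete(void *obj)
{
    if (obj)
        free((char *)obj - sizeof(meta_t));
}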
My question is whether someone with more experience can point out potential caveats of using this approach.
Edit:
I did a quick test to determine whether the extra offset calculation will be detrimental to the performance of virtual calls (yeah yeah, I know it is C++ code inside a C question, but I don't have a nanosecond resolution timer for C, plus the whole point is to compare to the existing polymorphism implementation):
#include <QVector>
#include <QElapsedTimer>
#include <iostream>
using namespace std;

class A;
typedef void (*foo)(A*);

void bar(A*) {}

class A {
public:
    A() : a(&bar) { }
    foo a;
    virtual void b() {}
};

int main() {
    QVector<A*> c;
    int r = 60000000;
    QElapsedTimer t;
    t.start(); // missing in the original; elapsed() is meaningless before start()

    for (int i = 0; i < r; ++i) c.append(new A);
    cout << "allocated " << c.size() << " objects in " << (quint64)t.elapsed() << endl;

    for (int count = 0; count < 5; ++count) {
        t.restart();
        for (int i = 0; i < r; ++i) {
            A * ap = c[i]; // note that c[i]->a(c[i]) would
            ap->a(ap);     // actually result in a performance hit
        }
        cout << t.elapsed() << endl;

        t.restart();
        for (int i = 0; i < r; ++i) {
            c[i]->b();
        }
        cout << t.elapsed() << endl;
    }
}
After testing with 60 million objects (70 million failed to allocate on the 32-bit compiler I am currently using), it doesn't look like there is any measurable difference between calling a regular virtual function and calling through a pointer that is not the first element in the object (and therefore needs an additional offset calculation), even though in the case of the function pointer the memory address is used twice (once to find the offset of a and then again as the argument passed into a). In release mode the times for the two functions are identical (+/- 1 nsec for 60 mil calls), and in debug mode the function pointer is actually about 1% faster, consistently (maybe a function pointer requires fewer resources than a virtual function).
The overhead from adjusting the pointer when allocating and deleting also seems to be practically negligible and totally within the margin of error. Which is kind of expected, considering it should add no more than a single increment of a value that is already on a register with an immediate value, something that should take a single cycle on the platforms I intend to target.
FYI, the address of the vtable is placed in the first DWORD/QWORD of the object, not the table itself. The vtable is shared between objects of the same class/struct.
Having different vtable sizes between platforms is a non-issue BTW. Incompatible platforms can't execute native code of other platforms and for binary translation to work, the emulator needs to know the original architecture.
The main drawbacks to your solution are performance and complexity over the current implementations.
