I am trying to thoroughly understand some code I found on GitHub. I am running it in Eclipse (Version: 3.6.1,
Build id: M20100909-0800).
I want to efficiently debug these lines of code:
for (index_X = 0; index_X < nb_MCU_X; index_X++) {
    for (index_Y = 0; index_Y < nb_MCU_Y; index_Y++) {
        for (index = 0; index < SOS_section.n; index++) {
            uint32_t component_index = component_order[index];
            int nb_MCU = ((SOF_component[component_index].HV >> 4) & 0xf) * (SOF_component[component_index].HV & 0x0f);
            for (chroma_ss = 0; chroma_ss < nb_MCU; chroma_ss++) {
                unpack_block(movie, &scan_desc, index, MCU);
                iqzz_block(MCU, unZZ_MCU, DQT_table[SOF_component[component_index].q_table]);
                IDCT(unZZ_MCU, YCbCr_MCU_ds[component_index] + (64 * chroma_ss));
            }
            upsampler(YCbCr_MCU_ds[component_index], YCbCr_MCU[component_index],
                      max_ss_h / ((SOF_component[component_index].HV >> 4) & 0xf),
                      max_ss_v / (SOF_component[component_index].HV & 0xf),
                      max_ss_h, max_ss_v);
        }
        if (color && (SOF_section.n > 1)) {
            YCbCr_to_ARGB(YCbCr_MCU, RGB_MCU, max_ss_h, max_ss_v);
        } else {
            to_NB(YCbCr_MCU, RGB_MCU, max_ss_h, max_ss_v);
        }
        screen_cpyrect(index_Y * MCU_sy * max_ss_h, index_X * MCU_sx * max_ss_v,
                       MCU_sy * max_ss_h, MCU_sx * max_ss_v, RGB_MCU);
    }
}
The code above contains a number of nested loops, and stepping over every line many times is laborious (nb_MCU_X is 18 and nb_MCU_Y is 32).
I tried changing the values of index_X and index_Y in Debug mode, thinking this would take me to a point in the program where more of the code had been processed. However, only index_X and index_Y changed to the values I gave them; none of the dependent values changed with them. Consequently, the behavior of the program was distorted and it began behaving erratically.
I also tried setting a breakpoint immediately after this section of code, but that only shows me the state after the entire section has been processed. I want to know instantly what the state of the program will be when index_X or index_Y reaches any value of my choosing.
Is there a way in Eclipse to go forward in time and have more iterations processed, instead of stepping over each line of the code?
What should I do if, for example, index_Y is currently 0 but I want to jump instantly to a point in the program where index_Y is 7 and the rest of the state has changed accordingly?
This development is being done on Windows in usermode.
I have two (potentially quite large) buffers, and I would like to know the number of bytes that differ between them.
I wrote a version myself that checks byte by byte, but this resulted in a quite slow implementation. As I'm comparing on the order of hundreds of megabytes, this is undesirable. I'm aware that I could optimize it through many different means, but this seems like a common problem that probably already has optimized solutions out there, and there's no way I'm going to optimize this as effectively as optimization experts would.
Perhaps my Googling is inadequate, but I'm unable to find any C or C++ function that counts the number of differing bytes between two buffers. Is there such a function built into the C standard library, the WinAPI, or the C++ standard library that I just don't know of? Or do I need to optimize this manually?
I ended up writing the (perhaps somewhat poorly) optimized code below to do the job. I was hoping the compiler would vectorize it under the hood, but that doesn't appear to be happening, unfortunately, and I didn't feel like digging into the SIMD intrinsics to do it manually. As a result, my bit-fiddling tricks may end up making it slower, but it's still fast enough that it accounts for no more than about 4% of my code's runtime (and almost all of that is the memcmp). Whether or not it could be better, it's good enough for me.
I'll note that this is designed to be fast for my use case, where I expect differences to be rare.
inline size_t ComputeDifferenceSmall(
    _In_reads_bytes_(size) char* buf1,
    _In_reads_bytes_(size) char* buf2,
    size_t size) {
    /* size should be <= 0x1000 bytes */
    /* In my case, I expect frequent differences if any at all are present. */
    size_t res = 0;
    for (size_t i = 0; i < (size & ~0x7ULL); i += 8) {
        uint64_t diff1 = *reinterpret_cast<uint64_t*>(buf1 + i) ^
                         *reinterpret_cast<uint64_t*>(buf2 + i);
        if (!diff1) continue;
        /* Bit fiddle to make each byte 1 if they're different and 0 if the same */
        diff1 = ((diff1 & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | (diff1 & 0x0F0F0F0F0F0F0F0FULL);
        diff1 = ((diff1 & 0x0C0C0C0C0C0C0C0CULL) >> 2) | (diff1 & 0x0303030303030303ULL);
        diff1 = ((diff1 & 0x0202020202020202ULL) >> 1) | (diff1 & 0x0101010101010101ULL);
        /* Sum the bytes */
        diff1 = (diff1 >> 32) + (diff1 & 0xFFFFFFFFULL);
        diff1 = (diff1 >> 16) + (diff1 & 0xFFFFULL);
        diff1 = (diff1 >> 8) + (diff1 & 0xFFULL);
        diff1 = (diff1 >> 4) + (diff1 & 0xFULL);
        res += diff1;
    }
    for (size_t i = (size & ~0x7ULL); i < size; i++) {
        res += (buf1[i] != buf2[i]);
    }
    return res;
}
size_t ComputeDifference(
    _In_reads_bytes_(size) char* buf1,
    _In_reads_bytes_(size) char* buf2,
    size_t size) {
    size_t res = 0;
    /* I expect most pages to be identical, and both buffers should be page aligned if
     * larger than a page. memcmp has more optimizations than I'll ever come up with,
     * so I can just use that to determine if I need to check for differences
     * in the page. */
    for (size_t pn = 0; pn < (size & ~0xFFF); pn += 0x1000) {
        if (memcmp(&buf1[pn], &buf2[pn], 0x1000)) {
            res += ComputeDifferenceSmall(&buf1[pn], &buf2[pn], 0x1000);
        }
    }
    return res + ComputeDifferenceSmall(
        &buf1[size & ~0xFFF], &buf2[size & ~0xFFF], size & 0xFFF);
}
I have been struggling with a bug for hours now. Basically, I do some simple bit operations on a uint64_t array in main.c (no function calls). The code works properly with gcc (Ubuntu) and with MSVS2019 (Windows 10) in Debug, but not in Release. My target architecture is x64/Windows, so I need to get it working with MSVS2019/Release. Besides that, I'm curious what the reason for the problem is. Neither compiler shows errors or warnings.
Now, as soon as I add a totally unrelated statement to the loop (the commented-out printf()), it works properly.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
...
Initially I believed that I had messed up the stack somewhere before the for() loop, but I checked it multiple times ... all fine:
all used variables are verified to be initialized
no pointers to local variables are returned (in scope)
all array indexing (reads and writes) stays within declared bounds (in scope)
All Google/SE posts attribute such behavior to undefined behavior from one of the reasons above, but none of them applies to my code. Also, the fact that it works with MSVS2019/Debug and with gcc shows the code works.
What do I miss?
--- UPDATE (24.08.2021 12:00) ---
I'm completely stuck: the added printf() modifies the result, and MSVS/Debug works. So how can I inspect the variables?!
@Lev M There are quite a few calculations before and after the shown for() loop. That's why I skipped most of the code and just showed the snippet where I could influence the code towards working correctly. I know what the final result should be (it's just a uint64_t), and it's wrong with the Release version of MSVS. I also checked without the for() loop. It's not optimized "away". If I leave it out completely, the result is again different.
@tstanisl It's just a matter of a uint64_t number. I know that input A should output B.
@Steve Summit That's why I posted (a bit desperate). I checked in all directions, isolated as much code as I could, and yet ... no uninitialized variable or out-of-bounds array access. Driving me nuts.
@Craig Estey The code is unfortunately quite extensive. I wonder ... could the error also be in a part of the code that doesn't run?
@Eric Postpischil Agreed!
@Nate Eldredge I tested with valgrind (see below).
...
==13997== HEAP SUMMARY:
==13997== in use at exit: 0 bytes in 0 blocks
==13997== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==13997==
==13997== All heap blocks were freed -- no leaks are possible
==13997==
==13997== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--- UPDATE (24.08.2021 18:00) ---
I found the reason for the problem (after countless trial-and-errors), but no solution yet. I post more of the code.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 3) | 3;
}
...
In fact, the MSVS/Release compiler did this:
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    a[q] = (a[q] << 3) | 3;
}
...
... which is not the same. Never seen such a thing!
How can I force the compiler to keep the 2 for() loops separate?
Summary:
MSVS/Release (default solution properties) optimization will change this code ...
// Code 1
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 3) | 3;
}
...
... into the following one, which is not the same as ...
// Code 2
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    a[q] = (a[q] << 3) | 3;
}
...
The excerpt above is slightly simplified: the real loop counts are not the constant 32 but variable (% 8), which is why 64-bit constants can't be used, as a user commented.
Discoveries:
MSVS/Release - fails
MSVS/Debug - works
gcc/Release - works
gcc/Debug - works
MSVS/Release optimization merges the two for() loops (Code 1) into one for() loop (Code 2).
Fixes:
The commented-out printf() provides an artificial fix, since the compiler then sees a requirement to print an intermediate result.
An alternative fix is to use the type qualifier volatile for a[].
The root of the issue is that the MSVS optimizer doesn't account for the index q being the same in both loops, which means the first loop must finish completely before the second loop starts.
I'm trying to get some experience with OpenCL; the environment is set up and I can create and execute kernels. I am currently trying to compute pi in parallel using the Leibniz formula, but I have been getting some strange results.
The kernel is as follows:
__kernel void leibniz_cl(__global float *space, __global float *result, int chunk_size)
{
    __local float pi[THREADS_PER_WORKGROUP];
    pi[get_local_id(0)] = 0.;
    for (int i = 0; i < chunk_size; i += THREADS_PER_WORKGROUP) {
        // `idx` is the work item's `i` in the grander scheme
        int idx = (get_group_id(0) * chunk_size) + get_local_id(0) + i;
        float idx_f = 1 / ((2 * (float) idx) + 1);
        // Make the fraction negative if needed
        if (idx & 1)
            idx_f = -idx_f;
        pi[get_local_id(0)] += idx_f;
    }
    // Reduction within workgroups (in `pi[]`)
    for (int groupsize = THREADS_PER_WORKGROUP / 2; groupsize > 0; groupsize >>= 1) {
        if (get_local_id(0) < groupsize)
            pi[get_local_id(0)] += pi[get_local_id(0) + groupsize];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
If I end the function here and, for !get_global_id(0), set result to pi[get_local_id(0)] (i.e., the result of the first group's reduction), printing result prints -nan.
Remainder of kernel:
    // Reduction amongst workgroups (into `space[]`)
    if (!get_local_id(0)) {
        space[get_group_id(0)] = pi[get_local_id(0)];
        for (int groupsize = get_num_groups(0) / 2; groupsize > 0; groupsize >>= 1) {
            if (get_group_id(0) < groupsize)
                space[get_group_id(0)] += space[get_group_id(0) + groupsize];
            barrier(CLK_LOCAL_MEM_FENCE);
        }
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_global_id(0) == 0)
        *result = space[get_group_id(0)] * 4;
}
Returning space[get_group_id(0)] * 4 returns either -nan or a very large number which clearly is not an approximation of pi.
I can't decide if it is an OpenCL concept I'm missing or a parallel execution one in general. Any help is appreciated.
Links
Reduction template: OpenCL float sum reduction
Leibniz Formula: https://www.wikiwand.com/en/Leibniz_formula_for_%CF%80
Maybe these are not the most critical issues with the code, but they can be the source of the problem:
You definitely should use barrier(CLK_LOCAL_MEM_FENCE); before the local reduction. This can be avoided only if you know that the work group size is equal to or smaller than the number of threads in a wavefront running the same instruction in parallel (64 for AMD GPUs, 32 for NVidia GPUs).
The global reduction must be done in multiple launches of the kernel, because barrier() works only for work items of the same work group. A clear and 100% working way to insert a global barrier into a kernel is to split it in two at the place where the barrier is needed.
I'm using Ryyst's code from here - How do I base64 encode (decode) in C? - to base64-encode an image file and insert it into an HTML document.
It works! - except that on the second line of base64-encoded output there is a single stray "X" at the end of the line.
It's always the second line, and only the second line, no matter how large the binary file is (I've tried many).
If I remove the stray "X" manually, the encoded data exactly matches the output of the base64 utility, and the image is correctly decoded by the browser.
I've tried adding "\0" to the end of each char array to make sure they are properly terminated (made no difference). I've checked that "buffer" is always 60 bytes and that output_length is always 80 bytes (they are). I've read and re-read Ryyst's code to see if anything there could cause it (didn't see anything, but I am a C n00b). I did a rain dance. I searched for a virgin to toss down a volcano (can't find either one around here). The bug is still there.
Here are the important bits of the code -
while (cgiFormFileRead(CoverImageFile, buffer, BUFFERLEN, &got) == cgiFormSuccess)
{
    if (got > 0)
    {
        fputs(base64_encode(buffer, got, &output_length), targetfile);
        fputs("\n", targetfile);
    }
}
And the base64_encode function is -
char *base64_encode(const unsigned char *data, size_t input_length,
                    size_t *output_length)
{
    *output_length = 4 * ((input_length + 2) / 3);
    char *encoded_data = malloc(*output_length);
    if (encoded_data == NULL)
        return NULL;
    int i = 0, j = 0;
    for (i = 0, j = 0; i < input_length;)
    {
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;
        uint32_t triple = (octet_a << 0x10) + (octet_b << 0x08) + octet_c;
        encoded_data[j++] = encoding_table[(triple >> 3 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 2 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 1 * 6) & 0x3F];
        encoded_data[j++] = encoding_table[(triple >> 0 * 6) & 0x3F];
    }
    for (i = 0; i < mod_table[input_length % 3]; i++)
        encoded_data[*output_length - 1 - i] = '=';
    return encoded_data;
}
(As you can see, I'm also using the cgic library v205, but I don't think the problem comes from there, because it's giving the right number of bytes.)
(And BUFFERLEN is a constant, equal to 60.)
What am I doing wrong, guys?
(Even more frustratingly, I /did/ get Ryyst's algorithm to work flawlessly once before, so his code /does/ work.)
I'm compiling with gcc on an ARM-based Debian Linux system, if that makes any difference.
Comparing your function with the original, you've deleted:
encoded_data[j++] = encoding_table[(triple >> 0 * 6) & 0x3F];
Apart from that the function is the same, so I'm guessing that's just a copy-paste error.
The problem is that you are using BUFFERLEN rather than looking at got, which holds the amount of data actually read; the second read doesn't fill the full 60 characters, so you are encoding whatever junk is at the end of the buffer.
My Thermo professor assigned our class a computational project in which we have to calculate some thermodynamic functions. He provided us with some code to work from: a program that essentially finds the area under the curve of the function x^2 between two points. The code is said to be correct, and it looks correct to me. However, I've been having FREQUENT problems with all of my programs giving me the error "'File location'.exe is not recognized as an internal or external command, operable program or batch file." when first running a project or [mostly] when reopening projects.
I've been researching the problem for many hours. I tried adjusting the environment variables as so many other sites suggested, but I'm either not doing it right or it's not working. All I keep reading are people explaining the purpose of an .exe file and saying that I have to locate and open that file. The problem is that I cannot find ANY .exe file. There is the project I created, with the source.c file I created and wrote the program in. Everything else has lengthy extensions that I've never seen before.
I'm growing increasingly impatient with Visual Studio's inconsistent behavior lately. I've just made the switch from MATLAB, which, although an inferior programming language, is far more user-friendly and easier to program with. For those of you interested in the code I'm running, it is below:
#include <stdio.h>
#include <stdlib.h> // for system()
#include <iostream>
#include <math.h>
using namespace std;

double integration();

double integration()
{
    int num_of_intervals = 4, i;
    double final_sum = 0, lower_limit = 2, upper_limit = 3, var, y = 1, x;
    x = (upper_limit - lower_limit) / num_of_intervals; // Calculating delta x value
    if (num_of_intervals % 2 != 0) // Simpson's rule can be performed only on an even number of intervals
    {
        printf("Cannot perform integration. Number of intervals should be even");
        return 0;
    }
    for (i = 0; i < num_of_intervals; i++)
    {
        if (i != 0) // Coefficients for even and odd places: for even places it is 2, and for odd it is 4.
        {
            if (i % 2 == 0)
                y = 2;
            else
                y = 4;
        }
        var = lower_limit + (i * x); // Calculating the function variable value
        final_sum = final_sum + (pow(var, 2) * y); // Calculating the sum
    }
    final_sum = (final_sum + pow(upper_limit, 2)) * x / 3; // Final sum
    return final_sum;
}

int main()
{
    printf("The integral value of x2 between limits 2 and 3 is %lf \n", integration());
    system("PAUSE");
    return 0;
}
Thanks in advance,
Dom