Fastest way to check mass data if null in C? [duplicate]

This question already has answers here:
Faster approach to checking for an all-zero buffer in C?
(19 answers)
Closed 7 years ago.
I have a mass of data, maybe 4 MB. Now I want to check whether all bits in it are 0.
Eg:
Here is the data:
void* data = malloc(4*1024*1024);
memset(data, 0, 4*1024*1024);
Check if all bits in it are 0. Here is my solution which is not fast enough:
int dataisnull(char* data, int length)
{
    int i = 0;
    while (i < length) {
        if (data[i]) return 0;
        i++;
    }
    return 1;
}
This code leaves room for performance improvements. For example, on a 32/64-bit machine, checking 4/8 bytes at a time may be faster.
So I wonder what is the fastest way to do it?

You can handle multiple bytes at a time and unroll the loop:
int dataisnull(const void *data, size_t length) {
    /* assuming data was returned by malloc, thus is properly aligned */
    size_t i = 0, n = length / sizeof(size_t);
    const size_t *pw = data;
    const unsigned char *pb = data;
    size_t val;

#define UNROLL_FACTOR 8
#if UNROLL_FACTOR == 8
    size_t n1 = n - n % UNROLL_FACTOR;
    for (; i < n1; i += UNROLL_FACTOR) {
        val = pw[i + 0] | pw[i + 1] | pw[i + 2] | pw[i + 3] |
              pw[i + 4] | pw[i + 5] | pw[i + 6] | pw[i + 7];
        if (val)
            return 0;
    }
#endif
    val = 0;
    for (; i < n; i++) {
        val |= pw[i];
    }
    for (i = n * sizeof(size_t); i < length; i++) {
        val |= pb[i];
    }
    return val == 0;
}
Depending on your specific problem, it might be more efficient to detect non zero values early or late:
If the all-zero case is the most common, you should accumulate all bits into the val accumulator and test only at the end.
If the all zero case is rare, you should check for non zero values more often.
The unrolled version above is a compromise that tests for non zero values every 64 or 128 bytes depending on the size of size_t.
Depending on your compiler and processor, you might get better performance by unrolling less or more. You could also use intrinsic functions available for your particular architecture to take advantage of vector types, but it would be less portable.
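For illustration, a minimal sketch of such an intrinsics version for x86-64 with SSE2, assuming the buffer is 16-byte aligned and a multiple of 16 bytes long, and testing the accumulator only once at the end:

/* Sketch only: assumes SSE2, 16-byte alignment, length a multiple of 16. */
#include <emmintrin.h>
#include <stddef.h>

static int dataisnull_sse2(const void *data, size_t length)
{
    const __m128i *p = (const __m128i *)data;
    size_t n = length / 16;                 /* number of 16-byte blocks */
    __m128i acc = _mm_setzero_si128();

    for (size_t i = 0; i < n; i++) {
        /* OR each block into the accumulator */
        acc = _mm_or_si128(acc, _mm_load_si128(&p[i]));
    }
    /* Compare the accumulator against zero: all 16 bytes equal gives mask 0xFFFF */
    __m128i zero = _mm_setzero_si128();
    return _mm_movemask_epi8(_mm_cmpeq_epi8(acc, zero)) == 0xFFFF;
}

In a real program you would test the accumulator periodically (as the unrolled version above does) instead of only at the end, if non-zero data is common.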
Note that the code does not verify proper alignment for the data pointer:
it cannot be done portably.
it assumes the data was allocated via malloc or similar, hence properly aligned for any type.
As always, benchmark different solutions to see if it makes a real difference. This function might not be a bottleneck at all; writing a complex function to optimize a rare case is counterproductive, as it makes the code less readable, more likely to contain bugs, and much less maintainable. For example, the assumption about data alignment may not hold if you change the memory allocation scheme or if you use static arrays, and the function may then invoke undefined behavior.

The following checks that the first byte is the value you want, and that every byte equals the byte that follows it, using an overlapping memcmp.
#include <string.h>   /* memcmp */
#include <stdint.h>   /* uintptr_t */

int check_bytes(const char * const data, size_t length, const char val)
{
    if (length == 0) return 1;
    if (*data != val) return 0;
    return memcmp(data, data + 1, length - 1) ? 0 : 1;
}

int check_bytes64(const char * const data, size_t length, const char val)
{
    const char * const aligned64_start = (char *)((((uintptr_t)data) + 63) / 64 * 64);
    const char * const aligned64_end   = (char *)((((uintptr_t)data) + length) / 64 * 64);
    const size_t start_length     = aligned64_start - data;
    const size_t aligned64_length = aligned64_end - aligned64_start;
    const size_t end_length       = length - start_length - aligned64_length;

    if (!check_bytes(data, start_length, val)) return 0;
    if (!check_bytes(aligned64_end, end_length, val)) return 0;
    return memcmp(aligned64_start, aligned64_start + 64, aligned64_length - 64) ? 0 : 1;
}
A more elaborate version of this function should probably pass cache-line-aligned pointers to memcmp, and manually check the remaining block(s) instead of just the first byte.
Of course, you will have to profile on your specific hardware to see if there is any speed benefit of this method vs others.
If anyone doubts whether this works, there is a demo on ideone.
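A small usage sketch, assuming both functions above are defined in the same file before main:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t len = 4 * 1024 * 1024;
    char *data = calloc(len, 1);            /* zero-filled 4 MB buffer */
    if (!data) return 1;

    printf("all zero: %d\n", check_bytes64(data, len, 0));   /* expect 1 */
    data[len / 2] = 1;
    printf("all zero: %d\n", check_bytes64(data, len, 0));   /* expect 0 */

    free(data);
    return 0;
}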

I once wrote the following function for my own use. It assumes that the data to check is a multiple of a constant chunk size and aligned properly for a buffer of machine words. If this is not given in your case, it is not hard to loop for the first and last few bytes individually and only check the bulk with the optimized function. (Strictly speaking, it is undefined behavior even if the array is properly aligned but the data has been written by any type that is incompatible with unsigned long. However, I believe that you can get pretty far with this careful breaking of the rules here.)
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
bool
is_all_zero_bulk(const void *const p, const size_t n)
{
    typedef unsigned long word_type;
    const size_t word_size = sizeof(word_type);
    const size_t chunksize = 8;
    assert(n % (chunksize * word_size) == 0);
    assert((((uintptr_t) p) & 0x0f) == 0);
    const word_type *const frst = (word_type *) p;
    const word_type *const last = frst + n / word_size;
    for (const word_type * iter = frst; iter != last; iter += chunksize)
    {
        word_type acc = 0;
        // Trust the compiler to unroll this loop at its own discretion.
        for (size_t j = 0; j < chunksize; ++j)
            acc |= iter[j];
        if (acc != 0)
            return false;
    }
    return true;
}
The function itself is not very smart. The main ideas are:
Use large unsigned machine words for data comparison.
Enable loop unrolling by factoring out an inner loop with a constant iteration count.
Reduce the number of branches by ORing the words into an accumulator and only comparing it every few iterations against zero.
This should also make it easy for the compiler to generate vectorized code using SIMD instructions which you really want for code like this.
Additional non-standard tweaks would be to annotate the function with __attribute__ ((hot)) and use __builtin_expect(acc != 0, false). Of course, the most important thing is to turn on your compiler's optimizations.
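If the size or alignment preconditions are not met, a byte-wise wrapper along the lines mentioned above can take care of the head and tail. A minimal sketch (assuming the same includes as above; the name is made up):

bool
is_all_zero(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    const size_t chunk_bytes = 8 * sizeof(unsigned long);  /* bulk granularity */
    size_t head = 0;

    /* Advance byte-wise until the pointer is 16-byte aligned (the bulk
       function asserts this) or the buffer is exhausted. */
    while (head < n && (((uintptr_t) (bytes + head)) & 0x0f) != 0) {
        if (bytes[head] != 0)
            return false;
        ++head;
    }

    /* Check the aligned bulk, rounded down to a multiple of the chunk size. */
    size_t bulk = ((n - head) / chunk_bytes) * chunk_bytes;
    if (bulk != 0 && !is_all_zero_bulk(bytes + head, bulk))
        return false;

    /* Check the remaining tail byte by byte. */
    for (size_t i = head + bulk; i < n; ++i)
        if (bytes[i] != 0)
            return false;

    return true;
}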

Related

In this case, how to save data more efficiently and conveniently?

I am measuring the latency of some operations.
There are many scenarios.
The delay in each scenario is roughly distributed within a small interval. For each scenario, I need to measure 500,000 times. Finally I want to output each delay value and its corresponding number of occurrences.
My initial implementation was:
#define range 1000
int rec_array[range];

for (int i = 0; i < 500000; i++) {
    int latency = measure_latency();
    rec_array[latency]++;
}

for (int i = 0; i < range; i++) {
    printf("%d %d\n", i, rec_array[i]);
}
This approach was fine at first, but as the number of scenarios grew, it became problematic.
The delay measured in each scenario is concentrated in a small interval, so most of the entries in rec_array are 0.
Since each scenario is different, the delay values are also different. Some delays are concentrated around 500, so I need an array longer than 500; others are concentrated around 5000, so I need an array longer than 5000.
Because of the large number of scenarios, I end up creating too many arrays. For example, with ten scenarios I need to create ten rec_arrays, each with a different length.
Is there any efficient and convenient strategy? Since I am using C, containers like C++'s std::vector are not available.
I considered linked lists. However, the interval over which the delay values are distributed is unknown in advance, the number of distinct delay values is unknown, and when the same delay occurs again its counter needs to be incremented, so a list does not seem very convenient either.
I'm sorry, I just went out. Thank you for your help. I read the comments carefully. Here are some of my answers.
These data are mainly used to draw plots, for example the one below.
The comments say the data seems small. The main reason I thought about this problem is that, as the plot shows, only a few array entries are used each time and the vast majority are 0, and there are many scenarios, each needing its own array. I have referred to an open source implementation.
According to the comments, it seems that using arrays directly is a good solution, considering fast access. Thanks very much!
A linked list is probably (and almost always) the least efficient way to store things – both slow as hell, and memory inefficient, since your values use less storage than your pointers. Linked lists are very rarely a good solution for anything that actually stores significant data. The only reason they're so prevalent is that C still has no proper containers, and they're easy wheels to reinvent for every single C program you write.
#define range 1000
int rec_array[range];
So you're (probably! This depends on your compiler and where you write int rec_array[range];) storing rec_array on the stack, and it's large. (Actually, 4000 bytes is not "large" by any modern computer's standards, but still.) You should not be doing that; instead, this should be heap allocated, once, at initialization.
The solution is to allocate it:
/* SPDX-License-Identifier: LGPL-2.1+ */
/* Copyright Marcus Müller and others */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_RUNS 500000

/*
 * Call as
 * program maximum_latency
 */
unsigned int *run_benchmark(struct task_t task, unsigned int *latencies,
                            unsigned int *max_latency) {
  for (unsigned int run = 0; run < N_RUNS; ++run) {
    unsigned int latency = measure_latency();
    if (latency >= *max_latency) {
      latency = *max_latency - 1;
      /*
       * alternatively: use realloc to increase the size of `latencies`,
       * and update max_latency as well; that's basically what C++
       * std::vector does
       */
    }
    (latencies[latency])++;
  }
  return latencies;
}
void print_benchmark_result(unsigned int *array, unsigned int length);

int main(int argc, char **argv) {
  // check argument
  if (argc != 2) {
    exit(127);
  }
  int maximum_latency_raw = atoi(argv[1]);
  if (maximum_latency_raw <= 0) {
    exit(126);
  }
  unsigned int maximum_latency = maximum_latency_raw;
  /*
   * note that the length no longer has to be a constant
   * if you're using calloc/malloc.
   */
  unsigned int *latency_counters =
      (unsigned int *)calloc(maximum_latency, sizeof(unsigned int));
  for (; /* benchmark task in benchmark_tasks */;) {
    run_benchmark(task, latency_counters, &maximum_latency);
    print_benchmark_result(latency_counters, maximum_latency);
    // clear our counters after each run!
    memset(latency_counters, 0, maximum_latency * sizeof(unsigned int));
  }
}

void print_benchmark_result(unsigned int *array, unsigned int length) {
  for (unsigned int index = 0; index < length; ++index) {
    printf("%u %u\n", index, array[index]);
  }
  puts("============================\n");
}
Note especially the "alternatively: realloc" comment in the middle: realloc allows you to increase the size of your array:
unsigned int *run_benchmark(struct task_t task, unsigned int *latencies,
                            unsigned int *max_latency) {
  for (unsigned int run = 0; run < N_RUNS; ++run) {
    unsigned int latency = measure_latency();
    if (latency >= *max_latency) {
      // double the size!
      latencies = (unsigned int *)realloc(latencies, (*max_latency) * 2 *
                                                         sizeof(unsigned int));
      // realloc doesn't zero out the extension, so we need to do that
      // ourselves.
      memset(latencies + (*max_latency), 0,
             (*max_latency) * sizeof(unsigned int));
      (*max_latency) *= 2;
    }
    (latencies[latency])++;
  }
  return latencies;
}
This way, your array grows when you need it to!
How about using a hash table, so we would only save the latencies that actually occur? Maybe the keys in the hash table can be ranges, while the values of those keys hold the counts for each range.
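A minimal sketch of that idea, counting individual latency values with open addressing and linear probing (keying by a range bucket would work the same way; the names lat_table and lat_table_add are made up, the fixed capacity is assumed never to fill up):

#include <stddef.h>

#define LAT_TABLE_CAP 4096   /* must be a power of two; assumed large enough */

typedef struct {
    unsigned key[LAT_TABLE_CAP];    /* latency value (or range index) */
    unsigned count[LAT_TABLE_CAP];  /* how many times it was seen */
    unsigned char used[LAT_TABLE_CAP];
} lat_table;

/* Increment the counter for `latency`, inserting it on first sight. */
static void lat_table_add(lat_table *t, unsigned latency)
{
    size_t i = latency & (LAT_TABLE_CAP - 1);
    while (t->used[i] && t->key[i] != latency)
        i = (i + 1) & (LAT_TABLE_CAP - 1);   /* linear probing */
    t->used[i] = 1;
    t->key[i] = latency;
    t->count[i]++;
}

/* Usage: static lat_table table; lat_table_add(&table, measure_latency()); */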
Just sacrifice some precision in your latencies like 0-15, 16-31, 32-47 ... etc. Now your array will be 16x smaller.
Allocate all latency counter arrays for all scenes in one go
unsigned int *latency_div16_counter = (unsigned int *)calloc((MAX_LATENCY >> 4) * NUM_OF_SCENES, sizeof(unsigned int));
Clamp the values to the max latency, div 16 and store
for (int scene = 0; scene < NUM_OF_SCENES; scene++) {
    for (int i = 0; i < 500000; i++) {
        int latency = measure_latency();
        if (latency >= MAX_LATENCY) latency = MAX_LATENCY - 1;
        latency = latency >> 4; // int div 16
        latency_div16_counter[(scene * (MAX_LATENCY >> 4)) + latency]++;
    }
}
Adjust the data (mul 16) before displaying it
for (int scene = 0; scene < NUM_OF_SCENES; scene++) {
    for (int i = 0; i < (MAX_LATENCY >> 4); i++) {
        printf("Scene %d Latency %d Total %d\n", scene, i * 16,
               latency_div16_counter[(scene * (MAX_LATENCY >> 4)) + i]);
    }
}

What really is alignment on a longword boundary in the source code of memchr?

I've tried to understand and rewrite the memchr function but I found something strange at the beginning of the code.
We can read that:
#include "libc.h"
#include <unistd.h>
void *my_memchr(void const *s, int c_in, size_t n)
{
    unsigned const char *char_ptr;
    unsigned char c;
    /*
    ** t_longword is a typedef for unsigned long int
    */
    t_longword *longword_ptr;
    t_longword magic;
    t_longword mega_c;

    c = (unsigned char)c_in;
    for (char_ptr = (unsigned const char*)s; n > 0
        && (size_t)char_ptr % sizeof(t_longword) != 0; --n, ++char_ptr)
    {
        if (*char_ptr == c)
            return ((void*)char_ptr);
    }
    longword_ptr = (t_longword*)char_ptr;
    print_bits(*longword_ptr);
    magic = 0x101010101010100;
    mega_c = c | (c << 8);
    mega_c |= mega_c << 16;
    mega_c |= mega_c << 32;
    /*
    ** I didn't finish rewriting the entire function
    */
    return (NULL);
}
I was wondering why the first loop is mandatory. I already tried without it in my strlen and I got bugs from time to time, but I don't know why.
The optimized part of memchr() requires a pointer that is aligned to the longword size (sizeof(t_longword)). However, there is no requirement that s as passed to the function is aligned in that way.
The purpose of that first loop is to advance s if necessary just far enough that it is correctly aligned for the optimized portion. The loop is as complex as it is because it has to deal with two edge cases:
The character being searched for is in the first non-aligned few bytes, and
The case of the non-aligned starting area being so small that you reach the end of the buffer before getting the pointer aligned.
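To illustrate the prologue with a standalone sketch (not glibc's actual code), the same pattern shows up in any word-at-a-time routine: compute how many bytes must be handled one at a time before word-sized loads become safe, then switch to word loads.

#include <stddef.h>
#include <stdint.h>

/* How many leading bytes must be processed one at a time before
   p is aligned for unsigned long loads. */
static size_t bytes_until_aligned(const void *p)
{
    size_t misalign = (uintptr_t)p % sizeof(unsigned long);
    return misalign ? sizeof(unsigned long) - misalign : 0;
}

/* Example: with 8-byte longs, a pointer whose address ends in 5
   needs 3 byte-wise steps before word-wise processing can start. */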

How to Implement a Bitset in C [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
The below code, which creates and uses a bitset, is from the following tutorial "Intro to Bit Vectors". I am rewriting this code to try to learn and understand more about C structs and pointers.
#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE 32
#define BITS_WORDSIZE 5
#define MASK 0x1f

// Create a bitset
int initbv(int **bv, int val) {
    *bv = calloc(val/WORDSIZE + 1, sizeof(int));
    return *bv != NULL;
}

// Place int 'i' in the bitset
void set(int bv[], int i) {
    bv[i>>BITS_WORDSIZE] |= (1 << (i & MASK));
}

// Return true if integer 'i' is a member of the bitset
int member(int bv[], int i) {
    int boolean = bv[i>>BITS_WORDSIZE] & (1 << (i & MASK));
    return boolean;
}

int main() {
    int *bv, i;
    int s1[] = {32, 5, 0};
    int s2[] = {32, 4, 5, 0};

    initbv(&bv, 32);

    // Fill bitset with s1
    for (i = 0; s1[i]; i++) {
        set(bv, s1[i]);
    }

    // Print intersection of bitset (s1) and s2
    for (i = 0; s2[i]; i++) {
        if (member(bv, s2[i])) {
            printf("%d\n", s2[i]);
        }
    }

    free(bv);
    return 0;
}
Below is what I have rewritten to make use of structs.
#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE 32
#define BITS_WS 5
#define MASK 0x1f

struct bitset {
    int *bv;
};

/* Create bitset that can hold 'size' items */
struct bitset * bitset_new(int size) {
    struct bitset * set = malloc(sizeof(struct bitset));
    set->bv = calloc(size/WORDSIZE + 1, sizeof(int));
    return set;
}

/* Add an item to a bitset */
int bitset_add(struct bitset * this, int item) {
    return this->bv[item>>BITS_WS] |= (1 << (item & MASK));
}

/* Check if an item is in the bitset */
int bitset_lookup(struct bitset * this, int item) {
    int boolean = this->bv[item>>BITS_WS] & (1 << (item & MASK));
    printf("%d\n", boolean);
    return boolean;
}

int main() {
    struct bitset * test = bitset_new(32);
    int num = 5;

    bitset_add(test, num);
    printf("%d\n", bitset_lookup(test, num));
    return 0;
}
What I have rewritten is not working as expected. To test the implementation, in main() I try a bitset_lookup and expect a return value of 0 or 1 but I get a value of 32. I believe it must be something to do with my use of pointers, although I cannot see what I am doing wrong.
Any help appreciated!
That is not a tutorial; it is a set of misleading examples at best.
First of all, use an unsigned type. I recommend unsigned long (for various reasons, none of them critical). The <limits.h> header file defines constant CHAR_BIT, and the number of bits you can use in any unsigned integer type is always CHAR_BIT * sizeof (unsigned_type).
Second, you can make the bit map (or ordered bit set) dynamically resizable, by adding the size information to the structure.
The above boils down to
#include <stdlib.h>
#include <limits.h>

#define ULONG_BITS (CHAR_BIT * sizeof (unsigned long))

typedef struct {
    size_t ulongs;
    unsigned long *ulong;
} bitset;

#define BITSET_INIT { 0, NULL }

void bitset_init(bitset *bset)
{
    if (bset) {
        bset->ulongs = 0;
        bset->ulong = NULL;
    }
}

void bitset_free(bitset *bset)
{
    if (bset) {
        free(bset->ulong);
        bset->ulongs = 0;
        bset->ulong = NULL;
    }
}

/* Returns:  0 if successfully set
            -1 if bs is NULL
            -2 if out of memory. */
int bitset_set(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t i = bit / ULONG_BITS;

        /* Need to grow the bitset? */
        if (i >= bset->ulongs) {
            const size_t ulongs = i + 1; /* Use better strategy! */
            unsigned long *ulong;
            size_t n = bset->ulongs;

            ulong = realloc(bset->ulong, ulongs * sizeof bset->ulong[0]);
            if (!ulong)
                return -2;

            /* Update the structure to reflect the changes */
            bset->ulongs = ulongs;
            bset->ulong = ulong;

            /* Clear the newly acquired part of the ulong array */
            while (n < ulongs)
                ulong[n++] = 0UL;
        }

        bset->ulong[i] |= 1UL << (bit % ULONG_BITS);
        return 0;
    } else
        return -1;
}

/* Returns:  0 if SET
             1 if UNSET
            -1 if outside the bitset */
int bitset_get(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t i = bit / ULONG_BITS;
        if (i >= bset->ulongs)
            return -1;
        return !(bset->ulong[i] & (1UL << (bit % ULONG_BITS)));
    } else
        return -1;
}
In a bitset structure, the ulong member is a dynamically allocated array of ulongs elements of type unsigned long. Thus, it stores ulongs * ULONG_BITS bits.
BITSET_INIT is a preprocessor macro you can use to initialize an empty bitset. If you cannot or do not want to use it, you can use bitset_init() to initialize a bitset. The two are equivalent.
bitset_free() releases the dynamic memory allocated for the bitset. After the call, the bit set is gone, and the variable used is re-initialized. (Note that it is perfectly okay to call bitset_free() on an un-used but initialized bit set, because calling free(NULL) is perfectly safe and does nothing.)
Because the OS/kernel will automatically release all memory used by a program (except for certain types of shared memory segments), it is not necessary to call bitset_free() just before a program exits. But, if you use bit sets as part of some algorithm, it is obviously a good practice to release the memory no longer needed, so that the application can potentially run indefinitely without "leaking" (wasting) memory.
bitset_set() automatically grows the bit set when necessary, but only to as large as is needed. This is not necessarily a good reallocation policy: malloc()/realloc() etc. calls are relatively slow, and if you happen to call bitset_set() in increasing order (by increasing bit number), you end up calling realloc() for every ULONG_BITS. Instead, it is often a good idea to adjust the new size (ulongs) upwards -- the exact formula you use for this is your reallocation policy --, but suggesting a good policy would require practical testing with practical programs. The shown one works, and is quite robust, but may be a bit slow in some situations. (You'd need to use at least tens of thousands of bits, though.)
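As one possible policy, a small sketch (not a recommendation, just an example of adjusting the new size upwards): grow geometrically by about 50%, which keeps the number of realloc() calls logarithmic in the highest bit number set. Inside bitset_set(), instead of the line "const size_t ulongs = i + 1;", something like:

/* Grow by roughly 50%, but always at least far enough to cover bit i. */
size_t ulongs = bset->ulongs + bset->ulongs / 2;
if (ulongs < i + 1)
    ulongs = i + 1;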
The bitset_get() function return value is funky, because I wanted the function to return a similar value for both "unset" and "outside the bit set", because the two are logically similar. (That is, I consider the bit set, a set of set bits; in which case it is logical to think of all bits outside the set as unset.)
A much more traditional definition is
int bitset_get(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t i = bit / ULONG_BITS;
        if (i >= bset->ulongs)
            return 0;
        return !!(bset->ulong[i] & (1UL << (bit % ULONG_BITS)));
    } else
        return 0;
}
which returns 1 only for bits set, and 0 for bits outside the set.
Note the !!. It is just two not operators, nothing too strange; making it a not-not operator. !!x is 0 if x is zero, and 1 if x is nonzero.
(A single not operator, !x, yields 1 if x is zero, and 0 if x is nonzero. Applying not twice yields the not-not I explained above.)
To use the above, try e.g.
#include <stdio.h>

int main(void)
{
    bitset train = BITSET_INIT;

    printf("bitset_get(&train, 5) = %d\n", bitset_get(&train, 5));

    if (bitset_set(&train, 5)) {
        printf("Oops; we ran out of memory.\n");
        return EXIT_FAILURE;
    } else
        printf("Called bitset_set(&train, 5) successfully\n");

    printf("bitset_get(&train, 5) = %d\n", bitset_get(&train, 5));

    bitset_free(&train);
    return EXIT_SUCCESS;
}
Because we do not make any assumptions about the hardware or system we are running on (unless I goofed somewhere; if you notice I did, let me know in the comments so I can fix my goof!), and rely only on what the C standard guarantees, this should work on anything for which you can compile the code with a standards-compliant compiler: Windows, Linux, the BSDs, old Unix, macOS, and others.
With some changes, it can be made to work on microcontrollers, even. I'm not sure if all development libraries have realloc(); even malloc() might not be available. Aside from that, on things like 32-bit ARMs this should work as-is just fine; on 8-bit AVRs and such it would be a good idea to use unsigned char and CHAR_BIT, instead, since they tend to emulate the larger types rather than supporting them in hardware. (The code above would work, but be slower than necessary.)
The returned value from bitset_lookup should be treated as a binary truth value (i.e. yes or no), not a numerical value. If it's zero, the bit is not set; if it's not zero, the bit is set. Really that function should return boolean != 0, which would force the result to zero or one, a true boolean, instead of the current zero-or-anything (well, actually zero or (1 << (item & MASK)), which is exactly the 32 you see for item 5). C not having had a true boolean type before C99 can cause this sort of confusion.
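For completeness, a minimal fix to the OP's bitset_lookup along those lines (the debugging printf is dropped):

/* Check if an item is in the bitset; returns exactly 0 or 1 */
int bitset_lookup(struct bitset * this, int item) {
    return (this->bv[item >> BITS_WS] & (1 << (item & MASK))) != 0;
}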

how to prove the correctness of a c program with coq

I want to prove the correctness of some of my programs but I don't know where to start. Let's say I have the following program: how can I prove its correctness, or lack thereof? How can I go from the source below and plug it into a theorem prover, be it Coq or ACL2 or pretty much anything?
The code below just counts the bytes that it reads from standard input. It has 2 versions: one does a byte-by-byte count, the other reads them in unsigned-integer-sized chunks when possible. I know it's not portable or pretty; it's just an example that could get me started, with some help.
The code works and I know it's correct and I know how to write unit tests for it but I don't know how to prove anything about it.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

unsigned count_bytes1(unsigned char * bytes, unsigned len) {
    unsigned count = 0;
    unsigned i;
    for (i = 0; i < len; i++) {
        count += bytes[i];
    }
    return count;
}

unsigned count_word(unsigned word) {
    unsigned tmp = word;
    if (sizeof(unsigned) == 4) {
        tmp = (0x00FF00FFU & tmp) + (((0xFF00FF00U) & tmp) >> 8);
        tmp = (0x0000FFFFU & tmp) + (((0xFFFF0000U) & tmp) >> 16);
        return tmp;
    }
    if (sizeof(unsigned) == 8) {
        tmp = (0x00FF00FF00FF00FFU & tmp) + (((0xFF00FF00FF00FF00U) & tmp) >> 8);
        tmp = (0x0000FFFF0000FFFFU & tmp) + (((0xFFFF0000FFFF0000U) & tmp) >> 16);
        tmp = (0x00000000FFFFFFFFU & tmp) + (((0xFFFFFFFF00000000U) & tmp) >> 32);
        return tmp;
    }
    return tmp;
}

unsigned count_bytes2(unsigned char * bytes, unsigned len) {
    unsigned count = 0;
    unsigned i;
    for (i = 0; i < len;) {
        if ((unsigned long long)(bytes + i) % sizeof(unsigned) == 0) {
            unsigned * words = (unsigned *) (bytes + i);
            while (len - i >= sizeof(unsigned)) {
                count += count_word(*words);
                words++;
                i += sizeof(unsigned);
            }
        }
        if (i < len) {
            count += bytes[i];
            i++;
        }
    }
    return count;
}

int main () {
    unsigned char * bytes;
    unsigned len = 8192;
    bytes = (unsigned char *)malloc(len);
    len = read(0, bytes, len);
    printf("%u %u\n", count_bytes1(bytes, len), count_bytes2(bytes, len));
    return 0;
}
1. Know what you are proving: specification
First, decide what it is you want to prove for your function. For instance, write a contract for your function, using the ACSL specification language:
/*@ ensures \result >= x && \result >= y;
    ensures \result == x || \result == y;
*/
int max (int x, int y);
2. Verification
Then, you may prove that your implementation satisfies the specification, for instance with Frama-C's WP plug-in.
The WP plug-in will generate proof obligations, the verification of which ensures that the implementation is correct with respect to the specification. You can discharge these in Coq 8.4+ if it amuses you (but almost everyone who actually does this first applies the available fully automatic SMT provers, such as Alt-Ergo).
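To give a feel for it, a minimal annotation sketch for the OP's count_bytes1, with just enough ACSL for WP to reason about memory safety and termination (the functional postcondition is omitted, since stating that the result is the byte sum modulo UINT_MAX + 1 would require an ACSL logic function):

/*@ requires len == 0 || \valid_read(bytes + (0 .. len - 1));
    assigns \nothing;
*/
unsigned count_bytes1(unsigned char *bytes, unsigned len) {
    unsigned count = 0;
    unsigned i;
    /*@ loop invariant 0 <= i <= len;
        loop assigns i, count;
        loop variant len - i;
    */
    for (i = 0; i < len; i++) {
        count += bytes[i];
    }
    return count;
}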
PS: it appears that you are trying to prove that one C function is equivalent to another, that is, to use a simple C function as specification for an optimized one. Proving the equivalence of one with respect to the other is the approach followed in this article:
José Bacelar Almeida, Manuel Barbosa, Jorge Sousa Pinto, and Bárbara Vieira.
Verifying cryptographic software correctness with respect to reference implementations. In FMICS’09, volume 5825 of LNCS, pages 37–52, 2009.

Pointer initialization for a specific function

Alright, this one's been puzzling me for a bit.
the following function encodes a string into base 64
void Base64Enc(const unsigned char *src, int srclen, unsigned char *dest)
{
    static const unsigned char enc[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned char *cp;
    int i;

    cp = dest;
    for (i = 0; i < srclen; i += 3)
    {
        *(cp++) = enc[((src[i + 0] >> 2))];
        *(cp++) = enc[((src[i + 0] << 4) & 0x30)
                    | ((src[i + 1] >> 4) & 0x0f)];
        *(cp++) = enc[((src[i + 1] << 2) & 0x3c)
                    | ((src[i + 2] >> 6) & 0x03)];
        *(cp++) = enc[((src[i + 2]     ) & 0x3f)];
    }
    *cp = '\0';
    while (i-- > srclen)
        *(--cp) = '=';
    return;
}
Now, in the function calling Base64Enc() I have:
unsigned char *B64Encoded;
which is the argument I pass as unsigned char *dest to the Base64 encoding function.
I've tried different initializations, from mallocs to NULL to others. No matter what I do I always get an exception, and if I don't initialize it then the compiler (VS2005 C compiler) throws a warning telling me that it hasn't been initialized.
If I run this code with the uninitialized variable, sometimes it works and sometimes it doesn't.
How do I initialize that pointer and pass it to the function?
You need to allocate a buffer big enough to contain the encoded result. You can allocate it on the stack, like this:
unsigned char B64Encoded[256]; // the number here needs to be big enough to hold all possible variations of the argument
But with this approach it is easy to cause a stack buffer overflow by allocating too little space. It is better to allocate the buffer in dynamic memory:
int cbEncodedSize = srclen * 4 / 3 + 1; // cbEncodedSize is calculated from the length of the source string
unsigned char *B64Encoded = (unsigned char*)malloc(cbEncodedSize);
Don't forget to free() the allocated buffer after you're done.
It looks like you would want to use something like this:
// allocate 4 output bytes per 3 source bytes (rounded up), plus one for the null terminator
unsigned char *B64Encoded = malloc(((srclen + 2) / 3) * 4 + 1);
Base64Enc(src, srclen, B64Encoded);
It would help if you provided the error.
I can, with your function above, do this successfully:
int main() {
    unsigned char *B64Encoded;
    B64Encoded = (unsigned char *) malloc(1000);
    unsigned char src[] = "ABC";

    Base64Enc(src, 3, B64Encoded);
}
You definitely need to malloc space for the data. You also need to malloc more space than src takes (1/3 more, since Base64 emits four output bytes for every three input bytes).
A Base64-encoded string has four bytes for every three bytes of input data, so if srclen is 300 bytes (or characters), the length of the Base64-encoded string is 400.
Wikipedia has a brief but quite good article about it.
So, srclen rounded up to the nearest multiple of three, divided by three, times four (plus one byte for the terminating null that the function writes) should be exactly enough memory.
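Expressed as a small helper (matching the formula above, including the terminating null written by Base64Enc()):

#include <stddef.h>

/* Bytes needed for the Base64 output of srclen input bytes, plus '\0'. */
static size_t b64_buffer_size(size_t srclen)
{
    return ((srclen + 2) / 3) * 4 + 1;
}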
I see a problem in your code: it may read the bytes after the trailing null char of the source, for instance if the string length is one char. The behavior is then undefined and may result in a thrown exception if buffer boundary checking is activated.
This may explain the message related to accessing uninitialized memory.
You should then change your code so that you handle the trailing chars separately.
int len = (srclen / 3) * 3;

for (int i = 0; i < len; i += 3)
{
    // your current code here, it is ok with this loop condition.
}

// Handle 0 bits padding if required
if (len != srclen)
{
    // add new code here
}
...
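For the padding part, one possible sketch, assuming the same enc table, cp output pointer and len/srclen variables as in the function above:

// Handle the remaining 1 or 2 source bytes with '=' padding.
if (len != srclen)
{
    unsigned char b0 = src[len];
    unsigned char b1 = (srclen - len == 2) ? src[len + 1] : 0;

    *(cp++) = enc[b0 >> 2];
    *(cp++) = enc[((b0 << 4) & 0x30) | ((b1 >> 4) & 0x0f)];
    *(cp++) = (srclen - len == 2) ? enc[(b1 << 2) & 0x3c] : '=';
    *(cp++) = '=';
}
*cp = '\0';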
PS: Here is a wikipedia page describing Base64 encoding.

Resources