Optimization Tips

int *s;
/* s is assumed to point to storage for 100 ints, e.g. s = malloc(100 * sizeof *s); */
void func (int *a, int *b)
{
int i;
for (i = 0; i < 100; i++)
{
s[i] = a[i] ^ b[i];
}
}
Assume that this snippet is called 1000 times and is the most time-consuming operation in my code. Also assume that the addresses of a and b change on every call. 's' is a global variable that is updated with the results for different sets of values of a and b.
As far as I can tell, the main performance bottleneck is memory access, because the only other operation is XOR, which is trivial.
Could you please suggest how I can optimize this code?
The question I really wanted to ask, but which I think did not come across properly, is this: suppose the for loop contains 10 such XOR operations, the loop count is 100, and the function is called 1000 times, so the point is heavy memory access. If the code is to run on a single-core machine, what scope is there for improvement?

I've tested the proposed solutions, plus two others. I was not able to test onemasse's proposal because the result saved to s[] was not correct, and I was not able to fix it either. I had to make some changes to moonshadow's code. The unit of measurement is clock cycles, so lower is better.
Original code:
#define MAX 100
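static inline void STACKO ( struct timespec *ts, struct timespec *te ){
int i, *s, *a, *b;
/* one allocation of MAX ints per array; allocating a single int per loop
   iteration and incrementing the pointer leaks memory and leaves the
   pointers aimed past their allocations */
s = malloc (MAX * sizeof *s);
a = malloc (MAX * sizeof *a);
b = malloc (MAX * sizeof *b);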
srand ( 1024 );
for (i = 0; i < MAX; ++i){
a[i] = ( rand() % 2 );
b[i] = ( rand() % 2 );
}
rdtscb_getticks ( ts ); /* start measurement */
for (i = 0; i < MAX; i++)
s[i] = a[i] ^ b[i];
rdtscb_getticks ( te ); /* end measurement */
/*
printf("\n");
for (i = 0; i < MAX; ++i)
printf("%d", s[i]);
printf("\n");
*/
}
New proposal 1: register int
From:
int i, *s, *a, *b;
To:
register int i, *s, *a, *b;
New proposal 2: No array notation
s_end = &s[MAX];
for (s_ptr = &s[0], a_ptr = &a[0], b_ptr = &b[0]; \
s_ptr < s_end; \
++s_ptr, ++a_ptr, ++b_ptr){
*s_ptr = *a_ptr ^ *b_ptr;
}
moonshadow proposed optimization:
s_ptr = &s[0];
a_ptr = &a[0];
b_ptr = &b[0];
for (i = 0; i < (MAX/4); i++){
s_ptr[0] = a_ptr[0] ^ b_ptr[0];
s_ptr[1] = a_ptr[1] ^ b_ptr[1];
s_ptr[2] = a_ptr[2] ^ b_ptr[2];
s_ptr[3] = a_ptr[3] ^ b_ptr[3];
s_ptr+=4; a_ptr+=4; b_ptr+=4;
}
moonshadow proposed optimization + register int:
From:
int i, *s, ...
To:
register int i, *s, ...
Christoffer proposed optimization:
#pragma omp for
for (i = 0; i < MAX; i++)
{
s[i] = a[i] ^ b[i];
}
Results (clock cycles):
Original code    1036.727264
New proposal 1    611.147928
New proposal 2    450.788845
moonshadow        713.3845
moonshadow2       452.481192
Christoffer      1054.321943
There is another simple way of optimizing the resulting binary: passing -O2 to gcc enables compiler optimization. To find out exactly what -O2 does, refer to the gcc man page.
After enabling -O2 (clock cycles):
Original code     464.233031
New proposal 1    452.620255
New proposal 2    454.519383
moonshadow        428.651083
moonshadow2       419.317444
Christoffer       452.079057
Source codes available at: http://goo.gl/ud52m

Don't use the loop variable to index.
Unroll the loop.
for (i = 0; i < (100/4); i++)
{
s[0] = a[0] ^ b[0];
s[1] = a[1] ^ b[1];
s[2] = a[2] ^ b[2];
s[3] = a[3] ^ b[3];
s+=4; a+=4; b+=4;
}
Work out how to perform SIMD XOR on your platform.
Performing these XORs as an explicit step is potentially more expensive than doing them as part of another calculation: you have to read from a and b and store the result in s. If s is read again for further calculation, you would save a read and a write per iteration, plus all the function call and loop overhead, by doing the XOR there instead. Likewise, if a and b are outputs of some other functions, you do better by performing the XOR at the end of one of those functions.
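As a hedged illustration of the SIMD point: before reaching for platform intrinsics, it can be worth giving the compiler enough information to vectorize the loop on its own. The sketch below assumes C99; the function name xor_arrays is made up for this example. The restrict qualifiers are a promise by the caller (not checked by the compiler) that dst, a and b do not overlap. Whether SIMD instructions are actually emitted depends on the compiler and flags (with gcc, -O3 turns on auto-vectorization), so measure on your platform.

```c
#include <stddef.h>

/* XOR two int arrays element by element into dst.
 * restrict: the caller promises that dst, a and b do not overlap,
 * so the compiler does not have to assume a store to dst changes a or b. */
void xor_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}
```

In the question's setting this would be called as xor_arrays(s, a, b, 100), which is only valid if s never aliases a or b.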

int *s;
/* s is assumed to point to storage for 100 ints, e.g. s = malloc(100 * sizeof *s); */
void func (int *a, int *b)
{
int i;
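/* build with OpenMP enabled (gcc: -fopenmp); a bare "omp for" outside a
   parallel region runs on a single thread */
#pragma omp parallel for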
for (i = 0; i < 100; i++)
{
s[i] = a[i] ^ b[i];
}
}
Of course, for only a hundred elements you might not see any particular improvement :-)

Just a guess here. If this is a cache issue you could try this:
int *s;
/* s is assumed to point to storage for 100 ints, e.g. s = malloc(100 * sizeof *s); */
void func (int *a, int *b)
{
int i;
memcpy( s, a, 100 * sizeof *s ); /* the size is in bytes, not elements */
for (i = 0; i < 100; i++)
{
s[i] = s[i] ^ b[i];
}
}
The memcpy, although it is a function call, will often be inlined by the compiler if the size argument is a constant. Loop unrolling will probably not help here, as the compiler can do it automatically. But you shouldn't take my word for it; verify on your platform.

Related

Stack corruption when a printf() is removed

Some time ago I tried to write a merge sort. At some point I got an error that I was able to solve, but I kept the code because it does something strange that I don't understand. The code is the following:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
typedef int elem;
void mergesort(elem * arr, unsigned n){
if(n != 1){
if(n == 2){
if(arr[0] > arr[1]){
int change = arr[0];
arr[0] = arr[1];
arr[1] = change;
}
}else{
unsigned i = 0, j = 0, mit = (n+1)>>2, fin = n>>2;
elem * arr_2 = (elem *)malloc(sizeof(elem) * n), * mit_arr = arr+mit;
mergesort(arr, mit);
mergesort(mit_arr, fin);
while(i < mit || j < fin){
if(arr[i] <= mit_arr[j]){
arr_2[j+i] = arr[i];
i++;
}else{
arr_2[j+i] = mit_arr[j];
j++;
}
}
for(i=0; i<n; i++)
arr[i] = arr_2[i];
free(arr_2);
}
}
}
int main(){
unsigned a = 10;
int i;
elem * arr = (elem*)malloc(sizeof(elem) * a);
arr[0] = 12;
arr[1] = 3;
arr[2] = -3;
arr[3] = 22;
arr[4] = 12;
arr[5] = 11;
arr[6] = 4;
arr[7] = 9;
arr[8] = 10;
arr[9] = 2;
printf("something\n"); // 1
mergesort(arr, a);
printf("\n");
for(i=0; i<a; i++){
printf("%d, ", arr[i]);
}
printf("\n");
free(arr);
return 0;
}
The thing is that, although the code doesn't do what I want, it appears to run without errors. But if I comment out the line marked 1 (printf("something\n");), the error "malloc(): corrupted top size" appears. I don't understand how that is possible, so I came here to see if someone has an explanation.
I tried to debug the program with gdb and got the same error, with a bit more information:
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
but I still have no idea what happened.

There are a couple of major issues with your code. These are things you should be able to test yourself by checking whether the values being used by your function are what you expect.
Here's the first issue. You have calculated the length of the first half of the array as (n+1)>>2, and then you assume the length of the remaining array is n>>2. That is simply not true. Shifting a value to the right by 2 binary places is a division by four.
It is far better to use "normal" math instead of attempting to be clever. This reduces the chance for errors, and makes your code easier to read.
unsigned mit = n / 2, fin = n - mit;
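The other issue is your merge. Your while-loop keeps running until both i and j are out of range. That guarantees that on at least one iteration one of them is already out of range, so arr[i] or mit_arr[j] is read past the end of its half, and arr_2[j+i] may be written past the end of its allocation, which is the kind of heap corruption malloc() later reports.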
A better way to merge arrays uses three loops. The first one runs until either of the arrays has been merged in, and the remaining two loops will copy the remaining part of the other array.
unsigned x = 0, i = 0, j = 0;
while(i < mit && j < fin){
if(arr[i] <= mit_arr[j]){
arr_2[x++] = arr[i++];
}else{
arr_2[x++] = mit_arr[j++];
}
}
while(i < mit){
arr_2[x++] = arr[i++];
}
while(j < fin){
arr_2[x++] = mit_arr[j++];
}
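Putting both fixes together, a complete function might look like the sketch below. It is not the original poster's code: the name merge_sort, the int return value and the use of size_t and memcpy are choices made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Sort arr[0..n-1] in ascending order with a top-down merge sort.
 * Returns 0 on success, -1 if a temporary buffer could not be allocated. */
int merge_sort(int *arr, size_t n)
{
    if (n < 2)
        return 0;                      /* 0 or 1 element: already sorted */

    size_t mit = n / 2, fin = n - mit; /* the two halves always add up to n */
    int *right = arr + mit;

    if (merge_sort(arr, mit) != 0 || merge_sort(right, fin) != 0)
        return -1;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    size_t x = 0, i = 0, j = 0;
    while (i < mit && j < fin)         /* both halves still have elements */
        tmp[x++] = (arr[i] <= right[j]) ? arr[i++] : right[j++];
    while (i < mit)                    /* left half has leftovers */
        tmp[x++] = arr[i++];
    while (j < fin)                    /* right half has leftovers */
        tmp[x++] = right[j++];

    memcpy(arr, tmp, n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

Because every index is checked before it is used, no read or write leaves the buffers, whatever happens to be next to them on the heap.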

When to use a variable length array in C, and when dynamic allocation?

I found out about variable length arrays (VLAs) in C99, but they seem to behave almost the same as malloc + free.
The practical differences I found:
Too big array handling:
unsigned size = 4000000000;
int* ptr = malloc(size); // ptr is 0, program doesn't crash
int array[size]; // segmentation fault, program crashes
Memory leaks: only possible in dynamic array allocation:
int* ptr = malloc(size);
...
if(...)
return;
...
free(ptr);
Lifetime of the object and the possibility of returning it from a function: a dynamically allocated array lives until the memory is freed and can be returned from the function that allocated it.
Resizing: possible only with pointers to allocated memory.
My questions are:
What other differences are there (I'm interested in practical advice)?
What other problems can a programmer run into with either kind of variable-length array?
When should I choose a VLA and when dynamic allocation?
Which is faster: VLA or malloc+free?

Some practical advice:
VLAs are in practice located on the space-limited stack, while malloc() and its friends allocate on the heap, which is likely to allow bigger allocations. Moreover, you have more control over that process, as malloc() returns NULL if it fails. In other words, you have to be careful with VLAs not to blow your stack at runtime.
Not all compilers support VLAs, e.g. Visual Studio. Moreover, C11 made them an optional feature: an implementation that defines the __STDC_NO_VLA__ macro does not have to support them.
From my experience (numerical programs such as finding prime numbers with trial division, Miller-Rabin etc.) I wouldn't say that VLAs are any faster than malloc(). There is some overhead in the malloc() call, of course, but what seems to matter more is data access efficiency.
Here is a quick and dirty comparison using GNU/Linux x86-64 and the GCC compiler. Note that results may vary from one platform to another, or even between compiler versions. You might use it as a basic (though very far from complete) data-access benchmark of malloc() vs VLA.
prime-trial-gen.c:
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
bool isprime(int n);
int main(void)
{
FILE *fp = fopen("primes.txt", "w");
assert(fp);
fprintf(fp, "%d\n", 2);
for (int i = 3; i < 10000; i += 2)
if (isprime(i))
fprintf(fp, "%d\n", i);
fclose(fp);
return 0;
}
bool isprime(int n)
{
if (n % 2 == 0)
return false;
for (int i = 3; i * i <= n; i += 2)
if (n % i == 0)
return false;
return true;
}
Compile & run:
$ gcc -std=c99 -pedantic -Wall -W prime-trial-gen.c
$ ./a.out
Then here is a second program that makes use of the generated "primes dictionary":
prime-trial-test.c:
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
bool isprime(int n, int pre_prime[], int num_pre_primes);
int get_num_lines(FILE *fp);
int main(void)
{
FILE *fp = fopen("primes.txt", "r");
assert(fp);
int num_lines = get_num_lines(fp);
rewind(fp);
#if WANT_VLA
int pre_prime[num_lines];
#else
int *pre_prime = malloc(num_lines * sizeof *pre_prime);
assert(pre_prime);
#endif
for (int i = 0; i < num_lines; i++) {
int rc = fscanf(fp, "%d", pre_prime + i); /* keep the side effect outside assert() */
assert(rc == 1);
}
fclose(fp);
/* NOTE: primes.txt holds primes <= 10 000 (10**4), thus we are safe upto 10**8 */
int num_primes = 1; // 2
for (int i = 3; i < 10 * 1000 * 1000; i += 2)
if (isprime(i, pre_prime, num_lines))
++num_primes;
printf("pi(10 000 000) = %d\n", num_primes);
#if !WANT_VLA
free(pre_prime);
#endif
return 0;
}
bool isprime(int n, int pre_prime[], int num_pre_primes)
{
for (int i = 0; i < num_pre_primes && pre_prime[i] * pre_prime[i] <= n; ++i)
if (n % pre_prime[i] == 0)
return false;
return true;
}
int get_num_lines(FILE *fp)
{
int ch, c = 0;
while ((ch = fgetc(fp)) != EOF)
if (ch == '\n')
++c;
return c;
}
Compile & run (malloc version):
$ gcc -O2 -std=c99 -pedantic -Wall -W prime-trial-test.c
$ time ./a.out
pi(10 000 000) = 664579
real 0m1.930s
user 0m1.903s
sys 0m0.013s
Compile & run (VLA version):
$ gcc -DWANT_VLA=1 -O2 -std=c99 -pedantic -Wall -W prime-trial-test.c
$ time ./a.out
pi(10 000 000) = 664579
real 0m1.929s
user 0m1.907s
sys 0m0.007s
As you might check π(10**7) is indeed 664,579. Notice that both execution times are almost the same.
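The portability and stack-size points above can be combined in one function. The following is only a sketch under stated assumptions: the names sum_squares and fill_and_sum and the VLA_LIMIT budget of 1024 elements are invented for this example, and a sensible limit depends on your platform's stack size.

```c
#include <stdlib.h>

#define VLA_LIMIT 1024  /* hypothetical stack budget, in elements */

/* Fill buf with 1..n and return the sum of the squares. */
static long fill_and_sum(int *buf, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)(i + 1);
        sum += (long)buf[i] * buf[i];
    }
    return sum;
}

/* Returns the sum of squares of 1..n, 0 for n == 0, -1 if malloc fails. */
long sum_squares(size_t n)
{
    if (n == 0)
        return 0;                 /* a zero-length VLA is undefined behavior */
#ifndef __STDC_NO_VLA__
    if (n <= VLA_LIMIT) {
        int scratch[n];           /* small: automatic storage, no free() needed */
        return fill_and_sum(scratch, n);
    }
#endif
    int *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;                /* failure is detectable, unlike a stack overflow */
    long sum = fill_and_sum(scratch, n);
    free(scratch);
    return sum;
}
```

Small requests stay on the stack where VLAs are available; anything larger, and every request on a compiler without VLAs, goes through malloc(), where failure can be reported.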

One advantage of VLAs is that you can pass variably-dimensioned arrays to functions, which can be handy when dealing with (sanely sized) matrices, for example:
int n = 4;
int m = 5;
int matrix[n][m];
// …code to initialize matrix…
another_func(n, m, matrix);
// No call to free()
where:
void another_func(int n, int m, int matrix[n][m])
{
int sum = 0;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
{
// …use matrix just like normal…
sum += matrix[i][j];
}
}
// …do something with sum…
}
This is particularly valuable since the alternatives using malloc() without using VLA as well mean that you either have to do subscript calculations manually in the called function, or you have to create a vector of pointers.
Manual subscript calculations
int n = 4;
int m = 5;
int *matrix = malloc(sizeof(*matrix) * n * m);
// …code to initialize matrix…
another_func2(n, m, matrix);
free(matrix);
and:
void another_func2(int n, int m, int *matrix)
{
int sum = 0;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
{
// …do manual subscripting…
sum += matrix[i * m + j];
}
}
// …do something with sum…
}
Vector of pointers
int n = 4;
int m = 5;
int **matrix = malloc(sizeof(*matrix) * n);
for (int i = 0; i < n; i++)
matrix[i] = malloc(sizeof(*matrix[i]) * m);
// …code to initialize matrix…
another_func3(n, m, matrix);
for (int i = 0; i < n; i++)
free(matrix[i]);
free(matrix);
and:
void another_func3(int n, int m, int **matrix)
{
int sum = 0;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
{
// …use matrix 'just' like normal…
// …but there is an extra pointer indirection hidden in this notation…
sum += matrix[i][j];
}
}
// …do something with sum…
}
This form can be optimized to two allocations:
int n = 4;
int m = 5;
int **matrix = malloc(sizeof(*matrix) * n);
int *values = malloc(sizeof(*values) * n * m);
for (int i = 0; i < n; i++)
matrix[i] = &values[i * m];
// …code to initialize matrix…
another_func3(n, m, matrix);
free(values);
free(matrix);
Advantage VLA
There is less bookkeeping work to do when you use VLAs. But if you need to deal with preposterously sized arrays, malloc() still scores. You can use VLAs with malloc() et al if you're careful — see calloc() for an array of array with negative index in C for an example.
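A minimal sketch of combining the two, assuming a compiler with VLA support: the matrix lives on the heap, but it is accessed through a pointer to a row of m ints, so ordinary matrix[i][j] subscripting still works. The names sum_matrix and heap_matrix_demo are made up for this example, and n and m are assumed to be positive.

```c
#include <stdlib.h>

/* Same shape as another_func() above: the callee sees a true 2D array. */
int sum_matrix(int n, int m, int matrix[n][m])
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            sum += matrix[i][j];
    return sum;
}

/* Allocate an n x m matrix on the heap, fill it with 0, 1, 2, ...,
 * sum it and free it. Returns -1 if the allocation fails. */
int heap_matrix_demo(int n, int m)
{
    /* matrix points to rows of m ints; only the type is variable-length,
     * the storage itself comes from malloc() */
    int (*matrix)[m] = malloc(sizeof(int[n][m]));
    if (matrix == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            matrix[i][j] = i * m + j;
    int sum = sum_matrix(n, m, matrix);
    free(matrix);                      /* one allocation, one free */
    return sum;
}
```

This keeps the single contiguous allocation of the manual-subscript version without the i * m + j arithmetic in the callee.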

fill character array by blocks

A question out of curiosity.
Suppose I have:
int main(void)
{
char str[32];
for (i = 0; i < 32; i++)
str[i] = 0;
}
but I want to do it 4x faster:
int main(void)
{
char str[32];
for (i = 0; i < 32 / 4; i += 4)
str[i] = (int)0;
}
I expected the whole array to be filled with zeros, but it is not.
My questions: why is the array not filled with zeros, and how can I fill an array in int-sized blocks? I am researching a C feature: how to tell the compiler to write blocks of 4 bytes, i.e. whole integer registers. That would reduce the number of memory accesses by a factor of 4, or by a factor of 8 on x64 processors.
Thanks to all; the following works well:
int main(int argc, char *argv[])
{
char str[32];
int i;
for (i = 0; i < 32; i++)
str[i] = 12;
for (i = 0; i < 32 / sizeof(int); i++)
((int *) str)[i] = 0;
printf("%d\n", i);
for (i = 0; i < 32; i++)
printf("%d\n", str[i]);
return 0;
}

The correct and fastest way is to initialize the array to zero
char str[32] = { 0 } ;
If you want to set the array to zero afterwards, then use memset and enable compiler optimization and intrinsic functions, and the compiler will figure out the fastest way to zero the array.
memset( str , 0 , sizeof( str ) ) ;

As pointed out by @user2501, initializing with {0} or using memset is the fastest and the correct way.
If you are tempted to use something like ((int *)str)[i] = 0, don't: this can result in unaligned access.
As an alternative to memset, in C99 (and assuming that int is 4 bytes) you can use the type punning feature of unions:
#include <stdio.h>
typedef union {
char as_string[32];
int as_int[8];
} foo;
int main(void)
{
foo x;
int i;
for (i = 0; i < 8; i++)
x.as_int[i] = 0;
for (i = 0; i < 32; i++)
printf("%d", x.as_string[i]);
printf("\n");
return 0;
}
Output:
00000000000000000000000000000000
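If the goal is to write an int-sized pattern other than zero into a plain char array, another option is to copy the pattern block by block with memcpy, which has no alignment requirement. This is a sketch, not a benchmarked recommendation; fill_blocks is a hypothetical helper name, and the byte order of the pattern in memory follows the machine's endianness.

```c
#include <string.h>

/* Fill dst[0..len-1] by repeating the bytes of pattern.
 * memcpy works for any alignment of dst, and compilers commonly turn a
 * small fixed-size memcpy like this into a plain store instruction. */
void fill_blocks(char *dst, size_t len, unsigned int pattern)
{
    size_t i = 0;
    for (; i + sizeof pattern <= len; i += sizeof pattern)
        memcpy(dst + i, &pattern, sizeof pattern);   /* whole blocks */
    if (i < len)
        memcpy(dst + i, &pattern, len - i);          /* partial last block */
}
```

For the array in the question the call would be fill_blocks(str, sizeof str, 0), although for an all-zero fill memset remains the simpler choice.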

It seems that you are forced to use pointers and casts. This version uses the heap in order to allocate str at an address properly aligned for int:
#include <stdio.h>
#include <stdlib.h> /* malloc, free */
#include <stdint.h> /* intptr_t (pointer arithmetic using modulo division) */
#define MAX 32
int main(void)
{
size_t i, align;
char *ptr, *str;
align = __alignof__(int);
ptr = malloc(MAX + align); /* MAX + max align distance */
str = ptr + align - (intptr_t)ptr % align; /* now str is properly aligned */
for (i = 0; i < MAX / sizeof(int); i++)
((int *)str)[i] = 0;
for (i = 0; i < MAX; i++)
printf("%d\n", str[i]);
free(ptr);
return 0;
}
Note that __alignof__ is a gcc extension, change to __alignof if you are under Visual Studio or _Alignof if you are comfortable with C11.
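A simpler variant is possible because malloc() already returns memory suitably aligned for any object type with a fundamental alignment requirement, int included, so no manual adjustment is needed for this case. The sketch below relies on that guarantee; alloc_zeroed_by_words is a name invented for this example, and calloc() would achieve the same zeroing in a single call.

```c
#include <stdlib.h>

/* Return a heap buffer of nbytes bytes, zeroed with int-sized stores where
 * possible. The caller owns the buffer and must free() it. NULL on failure. */
char *alloc_zeroed_by_words(size_t nbytes)
{
    int *words = malloc(nbytes ? nbytes : 1);  /* malloc(0) may return NULL */
    if (words == NULL)
        return NULL;
    size_t nwords = nbytes / sizeof *words;
    for (size_t i = 0; i < nwords; i++)
        words[i] = 0;                          /* aligned int stores */
    char *bytes = (char *)words;
    for (size_t i = nwords * sizeof *words; i < nbytes; i++)
        bytes[i] = 0;                          /* leftover tail bytes */
    return bytes;
}
```

Reading the result back through a char pointer is fine, since character types may access the bytes of any object.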

You have to cast the pointer to int *, not the value to int:
int main(void)
{
char str[32];
unsigned i; /* the loop index was not declared in the original snippet */
for (i = 0; i < 32/sizeof(int); i++)
((int *) str)[i] = 0xAABBCCDD; //Be careful of Endianness
}
Alternatively:
int i;
union block_fill
{
char arr[24];
int iarr[6];
};
union block_fill block_arr;
for(i=0; i<6; i++)
block_arr.iarr[i] = 0x11223344;
for(i=0; i<24; i++)
printf("%x", block_arr.arr[i]);
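To make the endianness remark concrete, here is a small sketch that reports which byte of a value ends up at the lowest address. The helper name lowest_address_byte is hypothetical, and the comment about 0xAABBCCDD assumes a 32-bit unsigned int.

```c
#include <string.h>

/* Return the byte stored at the lowest address of value.
 * Called with 0xAABBCCDD this is 0xDD on a little-endian machine
 * and 0xAA on a big-endian one (assuming a 32-bit unsigned int). */
unsigned char lowest_address_byte(unsigned int value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);   /* inspect the object representation */
    return bytes[0];
}
```

So after a block fill with 0xAABBCCDD, printing the char array byte by byte shows DD CC BB AA on x86 rather than AA BB CC DD.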

C Allocating array of 500 and more longs

I have something like this. It is supposed to create arrays of 10, 20, 50, 100 ... up to 5000 random numbers, sort each one with insertion sort, and print how many comparisons and swaps were done. However, I get a runtime exception when I reach an array of 200 numbers: "Access violation writing location 0x00B60000." Sometimes I don't even reach 200 and it stops right after 10 numbers. I have no idea why.
long *arrayIn;
int *swap_count = (int*)malloc(sizeof(int)), *compare_count = (int*)malloc(sizeof(int));
compare_count = 0;
swap_count = 0;
int i, j;
for (j = 10; j <= 1000; j*=10) {
for (i = 1; i <= 5; i++){
if (i == 1 || i == 2 || i == 5) {
int n = i * j;
arrayIn = malloc(sizeof(long)*n);
fill_array(&arrayIn, n);
InsertionSort(&arrayIn, n, &swap_count, &compare_count);
print_array(&arrayIn, n, &swap_count, &compare_count);
compare_count = 0;
swap_count = 0;
free(arrayIn);
}
}
}
EDIT: OK, with this free(arrayIn); I get "Stack cookie instrumentation code detected a stack-based buffer overrun." and I get nowhere. Without it, it is "just" "Access violation writing location 0x00780000.", but I eventually get up to 200 numbers.
void fill_array(int *arr, int n) {
int i;
for (i = 0; i < n; i++) {
arr[i] = (RAND_MAX + 1)*rand() + rand();
}
}
void InsertionSort(int *arr, int n, int *swap_count, int *compare_count) {
int i, j, t;
for (j = 0; j < n; j++) {
(*compare_count)++;
t = arr[j];
i = j - 1;
*swap_count = *swap_count + 2;
while (i >= 0 && arr[i]>t) { // the compare_count increment is missing here
*compare_count = *compare_count + 2;
arr[i + 1] = arr[i];
(*swap_count)++;
i--;
(*swap_count)++;
}
arr[i + 1] = t;
(*swap_count)++;
}
}

I am sure your compiler told you what was wrong.
You are passing a long** to a function that expects an int* at the line
fill_array(&arrayIn, n);
function prototype is
void fill_array(int *arr, int n)
Same problem with the other function. From there, anything can happen.
Always, ALWAYS heed the warnings your compiler gives you.
MAJOR EDIT
First - yes, the name of an array is already a pointer.
Second - declare a function prototype at the start of your code; then the compiler will give you helpful messages that help you catch these errors.
Third - if you want to pass the address of a simple variable to a function, there is no need for a malloc; just use the address of the variable.
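Fourth - the rand() function returns an integer between 0 and RAND_MAX. The code
a[i] = (RAND_MAX + 1) * rand() + rand();
is not a safe way to get a bigger number: where RAND_MAX equals INT_MAX (as with glibc), RAND_MAX + 1 overflows a signed int, which is undefined behavior, and even where RAND_MAX is small the arithmetic is done in int, so the result can never exceed INT_MAX. For plain int values, just use
a[i] = rand();
If you actually wanted to be able to get a "really big" random number, you would have to do the following: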
1) make sure a is a long * (with the correct prototypes etc)
2) convert the numbers before adding / multiplying:
a[i] = (RAND_MAX + 1L) * rand() + rand();
might do it - or maybe you need to do some more casting to (long); I can never remember my order of precedence so I usually would do
a[i] = ((long)(RAND_MAX) + 1L) * (long)rand() + (long)rand();
to be 100% sure.
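One caveat on the casts above: long is only 32 bits wide on some platforms (64-bit Windows among them), in which case widening to long does not help. A sketch that sidesteps this uses long long, which is at least 64 bits; the function name big_random is made up for this example, and it assumes int is no wider than 32 bits.

```c
#include <stdlib.h>

/* Combine two rand() results into one wider non-negative value.
 * The arithmetic is done in long long (at least 64 bits), so the product
 * cannot overflow as long as RAND_MAX fits in 31 bits. */
long long big_random(void)
{
    long long range = (long long)RAND_MAX + 1;   /* widen BEFORE adding 1 */
    return range * rand() + rand();
}
```

The array elements would then need to be long long as well, with matching prototypes for the fill, sort and print functions.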
Putting these and other lessons together, here is an edited version of your code that compiles and runs (I did have to "invent" a print_array) - I have written comments where the code needed changing to work. The last point above (making long random numbers) was not taken into account in this code yet.
#include <stdio.h>
#include <stdlib.h>
// include prototypes - it helps the compiler flag errors:
void fill_array(int *arr, int n);
void InsertionSort(int *arr, int n, int *swap_count, int *compare_count);
void print_array(int *arr, int n, int *swap_count, int *compare_count);
int main(void) {
// change data type to match function
int *arrayIn;
// instead of mallocing, use a fixed location:
int swap_count, compare_count;
// often a good idea to give your pointers a _p name:
int *swap_count_p = &swap_count;
int *compare_count_p = &compare_count;
// the pointer must not be set to zero: it's the CONTENTs that you set to zero
*compare_count_p = 0;
*swap_count_p = 0;
int i, j;
for (j = 10; j <= 1000; j*=10) {
for (i = 1; i <= 5; i++){
if (i == 1 || i == 2 || i == 5) {
int n = i * j;
arrayIn = malloc(sizeof *arrayIn * n); // size by the element type actually used (int), not long
fill_array(arrayIn, n);
InsertionSort(arrayIn, n, swap_count_p, compare_count_p);
print_array(arrayIn, n, swap_count_p, compare_count_p);
swap_count = 0;
compare_count = 0;
free(arrayIn);
}
}
}
return 0;
}
void fill_array(int *arr, int n) {
int i;
for (i = 0; i < n; i++) {
// arr[i] = (RAND_MAX + 1)*rand() + rand(); // causes integer overflow
arr[i] = rand();
}
}
void InsertionSort(int *arr, int n, int *swap_count, int *compare_count) {
int i, j, t;
for (j = 0; j < n; j++) {
(*compare_count)++;
t = arr[j];
i = j - 1;
*swap_count = *swap_count + 2;
while (i >= 0 && arr[i]>t) { // the compare_count increment is missing here
*compare_count = *compare_count + 2;
arr[i + 1] = arr[i];
(*swap_count)++;
i--;
(*swap_count)++;
}
arr[i + 1] = t;
(*swap_count)++;
}
}
void print_array(int *a, int n, int* sw, int *cc) {
int ii;
for(ii = 0; ii < n; ii++) {
if(ii%20 == 0) printf("\n");
printf("%d ", a[ii]);
}
printf("\n\nThis took %d swaps and %d comparisons\n\n", *sw, *cc);
}

You are assigning the literal value 0 to some pointers. You are also mixing "pointers" with "address-of-pointers"; &swap_count gives the address of the pointer, not the address of its value.
First off, there is no need to malloc here:
int *swap_count = (int*)malloc(sizeof(int)) ..
Just make an integer:
int swap_count;
Then the assignment
swap_count = 0;
no longer sets a pointer to zero (which is what causes your errors). Doing so on a regular int variable is, of course, just fine.
(With the above fixed, &swap_count ought to work, so don't change that as well.)

As I said in the comments, you are passing the addresses of pointers, not the pointers themselves.
With the ampersand prefix (&) you pass the address of something.
You only need it when the function expects a pointer and what you have is a plain value, such as an int.
For example, a function that updates a counter takes the address of an int. But arrayIn is already a pointer, so no ampersand is needed.
What actually happens is that the function works on the memory where the pointer itself is stored, not on the memory the pointer points to. This causes various memory conflicts.
Remove the & wherever you are already passing a pointer in these lines:
fill_array(&arrayIn, n);
InsertionSort(&arrayIn, n, &swap_count, &compare_count);
print_array(&arrayIn, n, &swap_count, &compare_count);
So it becomes:
fill_array(arrayIn, n);
InsertionSort(arrayIn, n, swap_count, compare_count);
print_array(arrayIn, n, swap_count, compare_count);
I also note that you allocate memory for primitive types, which could be done much more simply:
int compare_count = 0;
int swap_count = 0;
But if you choose to use this last block of code, DO use &swap_count and &compare_count, since you are then passing plain ints, not pointers!
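To show the two calling conventions side by side, here is a sketch of an insertion sort that keeps the array as long throughout and takes its counters by address. It is not the original poster's function: the name insertion_sort, the size_t index and the counting convention (one comparison per test against a neighbour, one move per shifted element) are choices made for this example.

```c
#include <stddef.h>

/* Sort arr[0..n-1] in ascending order. The counters are ordinary ints in
 * the caller; this function receives their addresses and updates them. */
void insertion_sort(long *arr, size_t n, int *move_count, int *compare_count)
{
    for (size_t j = 1; j < n; j++) {
        long t = arr[j];
        size_t i = j;
        while (i > 0) {
            (*compare_count)++;          /* one comparison of t with arr[i-1] */
            if (arr[i - 1] <= t)
                break;
            arr[i] = arr[i - 1];         /* shift one element to the right */
            (*move_count)++;
            i--;
        }
        arr[i] = t;
    }
}
```

A caller with long *arrayIn and int moves = 0, compares = 0; would write insertion_sort(arrayIn, n, &moves, &compares): no & on the pointer, & on the plain ints.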

find number of rows in a 2D char array

I looked at "How to find number of rows in dynamic 2D char array in C?" but nothing there helped.
I tried the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int k = 97;
void foo(char **a)
{
int i = 0;
for(i=0; a[i] != NULL; ++i)
printf("i = %d\n", i);
}
void strcpyo(char* a, char*b){
int i=0;
for(i=0;b[i]!='\0';i++){
a[i]=b[i];
}
a[i]='\0';
}
void strcpym(char* a, char*b){
int i=0;
for(i=0;b[i]!='\0';i++);
memcpy(a,b,i+1);
}
void freee(char** ptr){
int i;
for(i = 0;i < k; ++i)
{
free(ptr[i] );
}
free(ptr);
}
void alloc(char ***p)
{
*p = (char **)malloc(k * sizeof(char *));
int i,j;
for(j=0;j<k;j++)
{
// for(i = 0;i < j; ++i)
{
(*p)[j] = (char *)malloc(11 * sizeof(char));
strcpy((*p)[j],"paicharan");
}
//printf("j = %d ", j);
//foo(p);
}
}
int main()
{
char **p;
alloc(&p);
#if 0
char **p = (char **)malloc(k * sizeof(char *));
int i,j;
for(j=0;j<k;j++)
{
for(i = 0;i < j; ++i)
{
p[i] = (char *)malloc(11 * sizeof(char));
strcpy(p[i],"paicharan");
}
printf("j = %d ", j);
foo(p);
}
#endif
foo(p);
freee(p);
return 0;
}
The code inside #if 0 ... #endif works perfectly, but if I create the arrays in the function alloc(char ***) it gives the wrong answer for an odd number of rows. Can anybody explain why?
That is, for odd k it gives the wrong answer, but for even k it is correct.

Your code depends on undefined behaviour to work correctly, i.e. it works only by chance. This has nothing to do with an even or odd number of elements.
In the void alloc(char ***p) function you allocate memory for k pointers to char (char *). Then you fill all k of them with new valid char * pointers, i.e. none of them is NULL. Later, in void foo(char **a), you do for(i=0; a[i] != NULL; ++i). Since every element up to a[k - 1] is non-null, it iterates over them correctly. BUT after that, a[k] may or may not be NULL; you never know what is in there. Accessing memory beyond the array you allocated is itself undefined behaviour (an out-of-bounds access).
Allocating k + 1 elements and setting the last one (index k) to NULL makes this work. Freeing stays the same: free the k strings, then the pointer array itself, which also releases the slot holding the sentinel.
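The sentinel approach can be sketched as follows. The helper names alloc_rows, count_rows and free_rows are invented for this example and are not taken from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of strings and the array itself. */
void free_rows(char **rows)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; rows[i] != NULL; i++)
        free(rows[i]);
    free(rows);
}

/* Allocate k copies of text plus one NULL sentinel slot. NULL on failure. */
char **alloc_rows(size_t k, const char *text)
{
    char **rows = malloc((k + 1) * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i <= k; i++)
        rows[i] = NULL;               /* every slot starts as a sentinel */
    for (size_t i = 0; i < k; i++) {
        rows[i] = malloc(strlen(text) + 1);
        if (rows[i] == NULL) {
            free_rows(rows);          /* stops at the first NULL slot */
            return NULL;
        }
        strcpy(rows[i], text);
    }
    return rows;                      /* rows[k] is still NULL */
}

/* Count rows by walking to the sentinel. */
size_t count_rows(char **rows)
{
    size_t n = 0;
    while (rows[n] != NULL)
        n++;
    return n;
}
```

With this layout the row count no longer depends on what happens to lie after the array, so it behaves the same for odd and even k.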
Since you said that the code wrapped inside the #if 0 block works fine, I've ignored it; I don't know whether there is UB there too. If you're doing this exercise to learn, that's fine. If you are planning some other project, try to reuse an existing C library that already provides these facilities.