Efficiency of int versus long long assignment - C

Suppose I need to assign zeros to a chunk of memory on a 32-bit architecture. Can assignment of long long (which is 8 bytes on this particular architecture) be more efficient than assignment of int (which is 4 bytes), or will it be equal to two int assignments? And will assignment of int be more efficient than assignment using char for the same chunk of memory, since with char I would need four times as many loop iterations as with int?

Why not use memset()?
http://www.elook.org/programming/c/memset.html
(from the above site)
Syntax:
#include <string.h>
void *memset( void *buffer, int ch, size_t count );
Description:
The function memset() copies ch into the first count characters of buffer, and returns buffer. memset() is useful for initializing a section of memory to some value. For example, this command:
memset( the_array, '\0', sizeof(the_array) );
is a very efficient way to set all values of the_array to zero.

To your questions: the answers would be yes and yes, if the compiler is smart and optimizes.
Interesting note: on machines that have SSE we can work with 128-bit chunks :) Still, and this is just my opinion, always try to balance readability with conciseness, so yeah... I tend to use memset. It's not always perfect, and it may not be the fastest, but it tells the person maintaining the code "hey, I'm initializing or setting this array".
Anyway, here is some test code; if it needs any corrections, let me know.
#include <time.h>
#include <emmintrin.h> /* SSE2: __m128i and _mm_setzero_si128 live here, not in xmmintrin.h */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define NUMBER_OF_VALUES 33554432

int main()
{
    int *values;
    int result = posix_memalign((void **)&values, 16, NUMBER_OF_VALUES * sizeof(int));
    if (result)
    {
        printf("Failed to mem allocate \n");
        exit(-1);
    }

    clock_t start, end;

    /* warm-up pass: touch every page up front so page faults don't skew the timings */
    int *temp = values, total = NUMBER_OF_VALUES;
    while (total--)
        *temp++ = 0;

    start = clock();
    memset(values, 0, sizeof(int) * NUMBER_OF_VALUES);
    end = clock();
    printf("memset time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        int index = 0, total = NUMBER_OF_VALUES * sizeof(int);
        char *temp = (char *)values;
        for (; index < total; index++)
            temp[index] = 0;
    }
    end = clock();
    printf("char-wise for-loop array indices time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        int index = 0, *temp = values, total = NUMBER_OF_VALUES;
        for (; index < total; index++)
            temp[index] = 0;
    }
    end = clock();
    printf("int-wise for-loop array indices time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        int index = 0, total = NUMBER_OF_VALUES / 2;
        long long int *temp = (long long int *)values;
        for (; index < total; index++)
            temp[index] = 0;
    }
    end = clock();
    printf("long-long-int-wise for-loop array indices time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        int index = 0, total = NUMBER_OF_VALUES / 4;
        __m128i zero = _mm_setzero_si128();
        __m128i *temp = (__m128i *)values;
        for (; index < total; index++)
            temp[index] = zero;
    }
    end = clock();
    printf("SSE-wise for-loop array indices time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        char *temp = (char *)values;
        int total = NUMBER_OF_VALUES * sizeof(int);
        while (total--)
            *temp++ = 0;
    }
    end = clock();
    printf("char-wise while-loop pointer arithmetic time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        int *temp = values, total = NUMBER_OF_VALUES;
        while (total--)
            *temp++ = 0;
    }
    end = clock();
    printf("int-wise while-loop pointer arithmetic time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        long long int *temp = (long long int *)values;
        int total = NUMBER_OF_VALUES / 2;
        while (total--)
            *temp++ = 0;
    }
    end = clock();
    printf("long-long-int-wise while-loop pointer arithmetic time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    start = clock();
    {
        __m128i zero = _mm_setzero_si128();
        __m128i *temp = (__m128i *)values;
        int total = NUMBER_OF_VALUES / 4;
        while (total--)
            *temp++ = zero;
    }
    end = clock();
    printf("SSE-wise while-loop pointer arithmetic time %f\n", ((double) (end - start)) / CLOCKS_PER_SEC);

    free(values);
    return 0;
}
Here are some tests:
$ gcc time.c
$ ./a.out
memset time 0.025350
char-wise for-loop array indices time 0.334508
int-wise for-loop array indices time 0.089259
long-long-int-wise for-loop array indices time 0.046997
SSE-wise for-loop array indices time 0.028812
char-wise while-loop pointer arithmetic time 0.271187
int-wise while-loop pointer arithmetic time 0.072802
long-long-int-wise while-loop pointer arithmetic time 0.039587
SSE-wise while-loop pointer arithmetic time 0.030788
$ gcc -O2 -Wall time.c
MacBookPro:~ samyvilar$ ./a.out
memset time 0.025129
char-wise for-loop array indices time 0.084930
int-wise for-loop array indices time 0.025263
long-long-int-wise for-loop array indices time 0.028245
SSE-wise for-loop array indices time 0.025909
char-wise while-loop pointer arithmetic time 0.084485
int-wise while-loop pointer arithmetic time 0.025277
long-long-int-wise while-loop pointer arithmetic time 0.028187
SSE-wise while-loop pointer arithmetic time 0.025823
my info:
$ gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ uname -a
Darwin MacBookPro 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
memset is quite optimized, probably using inline assembly, though again this varies from compiler to compiler...
gcc seems to optimize quite aggressively when given -O2; some of the timings start converging, so I guess I should take a look at the assembly.
If you are curious, just call gcc -S -msse2 -O2 -Wall time.c and the assembly will be in time.s.

Always avoid additional iterations in higher-level programming languages. Your code will be more efficient if you iterate once per int instead of looping over its four bytes.

Assignment optimizations are done on most architectures so that stores are aligned to the word size, which is 4 bytes on 32-bit x86. So for a chunk of the same total size, the element type doesn't matter (there is no difference between a memset of 1 MB worth of longs and 1 MB worth of char types).
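As a minimal illustration of that point (my own sketch, not part of the original answer): memset only ever sees a byte count, so the declared element type of the buffer makes no difference to the work it does.
#include <string.h>

/* Two 1 MiB buffers with different element types; both calls below zero the
 * same number of bytes, so they amount to the same memory operation. */
long lbuf[1048576 / sizeof(long)];
char cbuf[1048576];

void zero_both(void)
{
    memset(lbuf, 0, sizeof lbuf);   /* 1 MiB worth of longs */
    memset(cbuf, 0, sizeof cbuf);   /* 1 MiB worth of chars */
}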

1. long long (8 bytes) vs. two ints (4 bytes each): it is better to go for long long, because assigning one 8-byte element performs better than assigning two 4-byte elements.
2. int (4 bytes) vs. four chars (1 byte each): it is better to go for int here.
If you are declaring only a single variable, you can assign zero directly, like below.
long long a;
int b;
....
a = 0; b = 0;
But if you are declaring an array of n elements, then go for the memset function, like below.
long long a[10];
int b[20];
....
memset(a, 0, sizeof(a));
memset(b, 0, sizeof(b));
If you want to initialize during the declaration itself, then there is no need for memset.
long long a = 0;
int b = 0;
or
long long a[10] = {0};
int b[20] = {0};

Related

Why does OpenMP speed up a SINGLE-ITERATION loop?

I'm using the "read" benchmark from Why is writing to memory much slower than reading it?, and I added just two lines:
#pragma omp parallel for
for(unsigned dummy = 0; dummy < 1; ++dummy)
They should have no effect, because OpenMP should only parallelize the outer loop, but the code now consistently runs twice as fast.
Update: These lines aren't even necessary. Simply adding
omp_get_num_threads();
(implicitly declared) in the same place has the same effect.
Complete code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

unsigned long do_xor(const unsigned long* p, unsigned long n)
{
    unsigned long i, x = 0;
    for(i = 0; i < n; ++i)
        x ^= p[i];
    return x;
}

int main()
{
    unsigned long n, r, i;
    unsigned long *p;
    clock_t c0, c1;
    double elapsed;

    n = 1000 * 1000 * 1000; /* 1 GB */
    r = 100;                /* repeat count */

    p = calloc(n/sizeof(unsigned long), sizeof(unsigned long));

    c0 = clock();

    #pragma omp parallel for
    for(unsigned dummy = 0; dummy < 1; ++dummy)
        for(i = 0; i < r; ++i) {
            p[0] = do_xor(p, n / sizeof(unsigned long)); /* "use" the result */
            printf("%4lu/%4lu\r", i, r);
            fflush(stdout);
        }

    c1 = clock();

    elapsed = (c1 - c0) / (double)CLOCKS_PER_SEC;
    printf("Bandwidth = %6.3f GB/s (Giga = 10^9)\n", (double)n * r / elapsed / 1e9);

    free(p);
}
Compiled and executed with
gcc -O3 -Wall -fopenmp single_iteration.c && time taskset -c 0 ./a.out
The wall time reported by time is 3.4s vs 7.5s.
GCC 7.3.0 (Ubuntu)
The reason for the performance difference is not actually any difference in code, but in how memory is mapped. In the fast case you are reading from zero pages, i.e. all virtual addresses are mapped to a single physical page, so nothing has to be read from memory. In the slow case, the memory is not backed by the shared zero page, so every read actually hits memory. For details see this answer from a slightly different context.
On the other side, it is not caused by calling omp_get_num_threads or the pragma itself, but merely by linking to the OpenMP runtime library. You can confirm that by using -Wl,--no-as-needed -fopenmp. If you just specify -fopenmp but don't use it at all, the linker will omit the library.
Now, unfortunately, I am still missing the final puzzle piece: why does linking to OpenMP change the behavior of calloc regarding zeroed pages?
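One way to probe the zero-page explanation (a sketch of mine, not part of the original answer): pre-fault the buffer right after calloc so every virtual page gets its own physical page. If the fast and slow builds then report the same bandwidth, the mapping really was the cause.
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long n = 1000UL * 1000 * 1000;              /* same 1 GB as the benchmark */
    unsigned long *p = calloc(n / sizeof *p, sizeof *p);
    if (!p)
        return 1;
    memset(p, 1, n);   /* touch every page: none can alias a shared zero page now */
    memset(p, 0, n);   /* restore the all-zero contents the benchmark expects */
    /* ... run the timed do_xor() loop from the question here ... */
    free(p);
    return 0;
}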

C program crashes (segmentation fault) for a large input array. How to prevent it without using static/global/malloc?

The following program sorts a large array of random numbers using heapsort. The output of the program is the total execution time of the recursive heapSort function (in microseconds). The size of the input array is defined by the SIZE macro.
The program works fine for SIZE up to 1 million (1000000). But when I try to execute the program with SIZE 10 million (10000000), it generates a segmentation fault (core dumped).
Note: I have already tried increasing the soft and hard stack limits with the ulimit -s command on Linux (128 MB). The SEGFAULT still persists.
Please suggest any alterations to the code or any method that will overcome the existing SEGFAULT malady without declaring the array dynamically or as global/static.
/* Program to implement the Heap-Sort algorithm */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

long SIZE = 10000000; // Or #define SIZE 10000000
long heapSize;

void swap(long *p, long *q)
{
    long temp = *p;
    *p = *q;
    *q = temp;
}

void heapify(long A[], long i)
{
    long left, right, index_of_max;
    left = 2*i + 1;
    right = 2*i + 2;
    if(left < heapSize && A[left] > A[i])
        index_of_max = left;
    else
        index_of_max = i;
    if(right < heapSize && A[right] > A[index_of_max])
        index_of_max = right;
    if(index_of_max != i)
    {
        swap(&A[index_of_max], &A[i]);
        heapify(A, index_of_max);
    }
}

void buildHeap(long A[])
{
    long i;
    for(i = SIZE/2; i >= 0; i--)
        heapify(A, i);
}

void heapSort(long A[])
{
    long i;
    buildHeap(A);
    for(i = SIZE-1; i >= 1; i--)
    {
        swap(&A[i], &A[0]);
        heapSize--;
        heapify(A, 0);
    }
}

int main()
{
    long i, A[SIZE];
    heapSize = SIZE;
    struct timespec start, end;
    srand(time(NULL));
    for(i = 0; i < SIZE; i++)
        A[i] = rand() % SIZE;
    /*printf("Unsorted Array is:-\n");
    for(i = 0; i < SIZE; i++)
        printf("%li\n", A[i]);
    */
    clock_gettime(CLOCK_MONOTONIC_RAW, &start); // start timer
    heapSort(A);
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);   // end timer
    // Time taken by heapsort: difference between start and end times.
    unsigned long delta_us = (end.tv_sec - start.tv_sec) * 1000000
                             + (end.tv_nsec - start.tv_nsec) / 1000;
    /*printf("Sorted Array is:-\n");
    for(i = 0; i < SIZE; i++)
        printf("%li\n", A[i]);
    */
    printf("Heapsort took %lu microseconds for sorting of %li elements\n", delta_us, SIZE);
    return 0;
}
So, once you plan to stick with the stack-only approach, you have to understand who the main consumers of your stack space are.
Player #1: the array A[] itself. Depending on the OS/build, it consumes approximately 40 or 80 MB of stack (10 million longs at 4 or 8 bytes each), one time only.
Player #2: beware recursion! In your case, this is the heapify() function. Each call consumes a decent chunk of stack to serve the calling convention, stack alignment, stack frames, etc. Do that millions of times in a tree-like pattern and tens of megabytes get spent here too. So you can try to re-implement this function in a non-recursive way to reduce the stack pressure, as in the sketch below.
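For example, a non-recursive heapify might look like this (a sketch of mine, reusing the question's heapSize global and swap() helper); the loop variable carries the index that the recursive call used to pass, so stack usage stays constant no matter how large SIZE is:
void heapify(long A[], long i)
{
    for (;;)
    {
        long left = 2*i + 1;
        long right = 2*i + 2;
        long index_of_max = i;
        if (left < heapSize && A[left] > A[index_of_max])
            index_of_max = left;
        if (right < heapSize && A[right] > A[index_of_max])
            index_of_max = right;
        if (index_of_max == i)
            break;                      /* heap property restored */
        swap(&A[index_of_max], &A[i]);
        i = index_of_max;               /* keep sifting down from the swapped child */
    }
}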

Segmentation Fault 11 in C caused by larger operation numbers

I know that segmentation fault 11 means the program has attempted to access an area of memory that it is not allowed to access.
Here I am trying to calculate a Fourier transform, using the following code.
It works well when nPoints = 2^15 (or of course with fewer points); however, it crashes when I further increase the points to 2^16. I am wondering, is that caused by occupying too much memory? But I did not notice excessive memory use during the run. And although it uses recursion, it transforms in place, so I thought it would not occupy that much memory. Then where's the problem?
Thanks in advance.
PS: one thing I forgot to say is that the results above were on macOS (8 GB memory). When I run the code on Windows (16 GB memory), it crashes when nPoints = 2^14. So it makes me confused whether it's caused by the memory allocation, as the Windows PC has more memory (but it's really hard to say, because the two operating systems use different memory strategies).
#include <stdio.h>
#include <tgmath.h>
#include <string.h>

// in-place FFT with O(n) memory usage

long double PI;
typedef long double complex cplx;

void _fft(cplx buf[], cplx out[], int n, int step)
{
    if (step < n) {
        _fft(out, buf, n, step * 2);
        _fft(out + step, buf + step, n, step * 2);
        for (int i = 0; i < n; i += 2 * step) {
            cplx t = exp(-I * PI * i / n) * out[i + step];
            buf[i / 2] = out[i] + t;
            buf[(i + n)/2] = out[i] - t;
        }
    }
}

void fft(cplx buf[], int n)
{
    cplx out[n];
    for (int i = 0; i < n; i++) out[i] = buf[i];
    _fft(buf, out, n, 1);
}

int main()
{
    const int nPoints = pow(2, 15);
    PI = atan2(1.0l, 1) * 4;
    double tau = 0.1;
    double tSpan = 12.5;
    long double dt = tSpan / (nPoints-1);
    long double T[nPoints];
    cplx At[nPoints];
    for (int i = 0; i < nPoints; ++i)
    {
        T[i] = dt * (i - nPoints / 2);
        At[i] = exp( - T[i]*T[i] / (2*tau*tau));
    }
    fft(At, nPoints);
    return 0;
}
You cannot allocate very large arrays on the stack. The default stack size on macOS is 8 MiB. The size of your cplx type is 32 bytes, so an array of 2^16 cplx elements is 2 MiB, and you have two of them (one in main and one in fft), so that is 4 MiB. That fits on the stack, but, at that size, the program runs to completion when I try it. At 2^17, it fails, which makes sense because then the program has two arrays taking 8 MiB on the stack.
The proper way to allocate such large arrays is to include <stdlib.h> and use cplx *At = malloc(nPoints * sizeof *At); followed by if (!At) { /* Print some error message about being unable to allocate memory and terminate the program. */ }. You should do that for At, T, and out. Also, when you are done with each array, you should free it, as with free(At);.
To calculate an integer power of two, use the integer operation 1 << power, not the floating-point operation pow(2, 16). We have designed pow well on macOS, but, on other systems, it may return approximations even when exact results are possible. An approximate result may be slightly less than the exact integer value, so converting it to an integer truncates to the wrong result. If it may be a power of two larger than suitable for an int, then use (type) 1 << power, where type is a suitably large integer type.
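Put together, the fix described above might look roughly like this (a sketch, not the answerer's exact code): heap-allocate the scratch array in fft and compute the point count with an integer shift.
#include <stdlib.h>

void fft(cplx buf[], int n)
{
    cplx *out = malloc(n * sizeof *out);   /* was a VLA on the stack */
    if (!out)
        return;                            /* or report the allocation failure */
    for (int i = 0; i < n; i++)
        out[i] = buf[i];
    _fft(buf, out, n, 1);
    free(out);
}
Correspondingly in main: const int nPoints = 1 << 15; with long double *T = malloc(nPoints * sizeof *T); and cplx *At = malloc(nPoints * sizeof *At);, both checked against NULL and released with free() when done.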
The following instrumented code clearly shows that the OP's code repeatedly updates the same locations in the out[] array and actually does not update most of the locations in that array.
#include <stdio.h>
#include <tgmath.h>
#include <assert.h>

// in-place FFT with O(n) memory usage

#define N_POINTS (1<<15)

double T[N_POINTS];
double At[N_POINTS];
double PI;

// prototypes
void _fft(double buf[], double out[], int step);
void fft( void );

int main( void )
{
    PI = 3.14159;
    double tau = 0.1;
    double tSpan = 12.5;
    double dt = tSpan / (N_POINTS-1);
    for (int i = 0; i < N_POINTS; ++i)
    {
        T[i] = dt * (i - (N_POINTS / 2));
        At[i] = exp( - T[i]*T[i] / (2*tau*tau));
    }
    fft();
    return 0;
}

void fft()
{
    double out[ N_POINTS ];
    for (int i = 0; i < N_POINTS; i++)
        out[i] = At[i];
    _fft(At, out, 1);
}

void _fft(double buf[], double out[], int step)
{
    printf( "step: %d\n", step );
    if (step < N_POINTS)
    {
        _fft(out, buf, step * 2);
        _fft(out + step, buf + step, step * 2);
        for (int i = 0; i < N_POINTS; i += 2 * step)
        {
            double t = exp(-I * PI * i / N_POINTS) * out[i + step];
            buf[i / 2] = out[i] + t;
            buf[(i + N_POINTS)/2] = out[i] - t;
            printf( "index: %d buf update: %d, %d\n", i, i/2, (i+N_POINTS)/2 );
        }
    }
}
Suggest running it via (where untitled1 is the name of the executable, on Linux):
./untitled1 > out.txt
less out.txt
The out.txt file is 8630880 bytes.
An examination of that file shows the lack of coverage, and shows that any one entry is NOT the sum of the prior two entries, so I suspect this is not a valid Fourier transform.

segmentation fault calloc in C

I am making a C program that measures the average time of calloc, malloc, and alloca calls. I got everything to compile, but when I run it I get a segmentation fault. The first thing it runs is calloc, so I am going to assume the problem starts there.
Here is my calloc function; the malloc and alloca ones are basically the same, so I figure there is no reason to post them yet.
double calloctest(int objectsize, int numberobjects, int numberoftests)
{
    double average = 0;
    for (int i = 0; i < numberoftests; i++)
    {
        clock_t begin = clock();
        int *objectsize = calloc(numberobjects, sizeof(char) * *objectsize);
        clock_t end = clock();
        double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
        average = average + time_spent;
        printf("%f", time_spent);
        free(objectsize);
    }
    double totalAverage;
    totalAverage = average / numberoftests;
    return totalAverage;
}
You have a local variable objectsize that shadows the function argument of the same name, and you dereference it in its own initializer, before calloc() has stored anything into it, so *objectsize reads an uninitialized pointer:
int *objectsize = calloc(numberobjects, sizeof(char) * *objectsize);
You probably meant to write:
int *object = calloc(numberobjects, objectsize);
...
free(object);
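Putting the pieces together, the repaired function would look something like this (my reconstruction of the fix described above, using the objectsize parameter directly as the element size):
double calloctest(int objectsize, int numberobjects, int numberoftests)
{
    double average = 0;
    for (int i = 0; i < numberoftests; i++)
    {
        clock_t begin = clock();
        void *object = calloc(numberobjects, objectsize);   /* no shadowing */
        clock_t end = clock();
        double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
        average = average + time_spent;
        printf("%f", time_spent);
        free(object);
    }
    return average / numberoftests;
}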

emulating a variable size struct in C; alignment, performance issues

It is possible to put arrays with custom length anywhere in a struct in C, but in that case additional malloc calls are required. Some compilers allow VLAs anywhere in a struct, but that is not standard compliant. So I decided to emulate VLAs within a struct in standard C.
I am in a situation where I really do have to get the maximum performance. The C code will be generated automatically, so readability or style is not important in this case.
There will be structs with many custom-size array members in between static-size members. Below is a very simple form of such a struct.
struct old_a {
    int n_refs;
    void **refs;
    int count;
};

struct old_a *old_a_new(int n_refs, int count) {
    struct old_a *p_a = malloc(sizeof(struct old_a));
    p_a->n_refs = n_refs;
    p_a->refs = malloc(n_refs * sizeof(void *));
    p_a->count = count;
    return p_a;
}

#define old_a_delete(p_a) do {\
    free(p_a->refs);\
    free(p_a);\
} while (0)
The additional malloc call for refs can be avoided as follows.
#define a_get_n_refs(p_a) *(int *)p_a
#define a_set_n_refs(p_a, rval) *(int *)p_a = rval
#define a_get_count(p_a) *(int *)((char *)p_a + sizeof(int) + a_get_n_refs(p_a) * sizeof(void *))
#define a_set_count(p_a, rval) *(int *)((char *)p_a + sizeof(int) + a_get_n_refs(p_a) * sizeof(void *)) = rval
#define a_get_refs(p_a, i) *(void **)((char *)p_a + sizeof(int) + i * sizeof(void *))
#define a_set_refs(p_a, i, rval) *(void **)((char *)p_a + sizeof(int) + i * sizeof(void *)) = rval
static void *a_new(int n_refs, int count) {
    void *p_a = malloc(sizeof(int) + n_refs * sizeof(void *) + sizeof(int));
    a_set_n_refs(p_a, n_refs);
    a_set_count(p_a, count);
    return p_a;
}

#define a_delete(p_a) do {\
    free(p_a);\
} while (0)
The emulated version seems to run 12~14% faster on my machine than the one with a pointer array. I assume that is due to the halved number of calls to malloc and free, and the reduced amount of dereferencing. The test code is below.
int main(int argc, char **argv) {
    const int n_as = atoi(argv[1]) * 10000;
    const int n_refs = n_as;
    const int count = 1;
    unsigned int old_sum = 0;
    unsigned int sum = 0;
    clock_t timer;

    timer = clock();
    struct old_a **old_as = malloc(n_as * sizeof(struct old_a *)); /* array of pointers */
    for (int i = 0; i < n_as; ++i) {
        old_as[i] = old_a_new(n_refs, count);
        for (int j = 0; j < n_refs; ++j) {
            old_as[i]->refs[j] = (void *)j;
            old_sum += (int)old_as[i]->refs[j];
        }
        old_sum += old_as[i]->n_refs + old_as[i]->count;
        old_a_delete(old_as[i]);
    }
    free(old_as);
    timer = clock() - timer;
    printf("old_sum = %u; elapsed time = %.3f\n", old_sum, (double)timer / CLOCKS_PER_SEC);

    timer = clock();
    void **as = malloc(n_as * sizeof(void *));
    for (int i = 0; i < n_as; ++i) {
        as[i] = a_new(n_refs, count);
        for (int j = 0; j < n_refs; ++j) {
            a_set_refs(as[i], j, (void *)j);
            sum += (int)a_get_refs(as[i], j);
        }
        sum += a_get_n_refs(as[i]) + a_get_count(as[i]);
        a_delete(as[i]);
    }
    free(as);
    timer = clock() - timer;
    printf("sum = %u; elapsed time = %.2f\n", sum, (double)timer / CLOCKS_PER_SEC);
    return 0;
}
Compiled with gcc test.c -otest -std=c99:
>test 4
old_sum = 3293684800; elapsed time = 7.04
sum = 3293684800; elapsed time = 6.07
>test 5
old_sum = 885958608; elapsed time = 10.74
sum = 885958608; elapsed time = 9.44
Please let me know if my code has any undefined behavior, implementation-defined behavior, et cetera. It is meant to be 100% portable to machines with a sane (standard-compliant) C compiler.
I am aware of memory alignment issues. The members of these emulated structs will only be int, double, and void *, so I think there will not be alignment problems, but I am not sure. Also, although the emulated struct appeared to run faster on my machine (Windows 7 64-bit, MinGW/gcc), I do not know how it is likely to behave with other hardware or compilers. Other than checking standard-guaranteed behavior, I really need help with hardware knowledge: which one is the more machine-friendly code (preferably in general)?
One thing to note: on some systems, int will be 2 bytes instead of 4. In that case, int only reaches 32767. Since you multiply the input by 10000, that will almost certainly cause problems on such machines. Use long instead.
Unless a sizable proportion of the work of your program is going to be allocating and freeing these data structures, the difference you observed in allocation / deallocation speed is unlikely to make a significant difference in the program's overall execution time.
Furthermore, do be aware that the two approaches are not equivalent. The latter does not produce a representation of a struct old_a, so any other code that uses the data structure produced must use the provided access macros (or an equivalent) to do so.
Moreover, the roll-your-own-struct approach has potential alignment issues. Depending on the implementation-dependent sizes and alignment requirements for various types, it may cause the members of the pointer array inside the pseudo-struct to be misaligned. If it does, then either a speed penalty or possibly even a program crash will result.
More generally, there are few safe assumptions about sizes of type representations. It is certainly unsafe to assume that the size of an int is the same as the size of a void *, or that either one is the same size as a double.
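If the packed layout has to stay, one way to sidestep the alignment hazard (a sketch of mine using C11's _Alignof, not code from the question) is to round each member's offset up to the alignment its type requires, instead of packing fields back-to-back:
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of 2). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* For the pseudo-struct layout [int n_refs][void *refs[n]][int count], the
 * refs array then starts at the first offset after the leading int that is
 * properly aligned for void *, rather than at sizeof(int) exactly. */
static size_t refs_offset(void)
{
    return align_up(sizeof(int), _Alignof(void *));
}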
It is possible to put arrays with custom length anywhere in a struct in C, but in that case additional malloc calls are required
No, it isn't.
There is the famous "struct hack" to get a structure with its array allocated in one go:
struct name {
    int namelen;
    char namestr[1];
};
And then
struct name *makename(char *newname)
{
    struct name *ret = malloc(sizeof(struct name)-1 + strlen(newname)+1);
    /* -1 for initial [1]; +1 for \0 */
    if(ret != NULL) {
        ret->namelen = strlen(newname);
        strcpy(ret->namestr, newname);
    }
    return ret;
}
See http://c-faq.com/struct/structhack.html for details
UPDATE
As mentioned, the array has to be the last member of the struct, which might be an unreasonable restriction.
UPDATE
In C99 this is now the blessed way, called a flexible array member; the declaration is modified to
struct name {
    int namelen;
    char namestr[];
};
and then it works and is allocated in one go.
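For example, with the flexible array member the earlier makename() loses the "-1 ... +1" size arithmetic (a sketch under the C99 declaration above):
struct name *makename(char *newname)
{
    /* sizeof(struct name) already excludes the flexible member, so only the
     * string plus its terminating '\0' needs to be added. */
    struct name *ret = malloc(sizeof(struct name) + strlen(newname) + 1);
    if (ret != NULL) {
        ret->namelen = strlen(newname);
        strcpy(ret->namestr, newname);
    }
    return ret;
}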
