Understanding UART register indexing - C

The function below prints out the contents of each UART register. This is the register map:
[image: UART register map]
Could somebody explain why the for loop goes up in += 4?
Thank you
#define UART0_BASE 0x21000

void print_uart(unsigned int base) {
    int i;
    unsigned int val;
    unsigned int adr;
    for (i = 0; i < 0x18; i += 4) {
        adr = base + i;
        val = *(volatile unsigned int *)adr; /* read the register */
        printf("Uart %s [0x%x] -> 0x%x\n", uart_reg[i >> 2], adr, val);
    }
}

Most likely to fit the start address of each register: the registers are spaced 4 bytes apart. Since the for loop runs up to 0x18 (24 decimal) in steps of 4, that makes 6 registers. It might look like the registers are only 16 bits wide, but there is often padding so that each register still occupies a full 4-byte slot.

Related

VESA skipping "blocks" of video memory when trying to fill the screen

I'm developing a simple OS kernel and I am trying to make a working video library so I can use it later (VBE version 3.0, 1920 × 1080 px, 32 bpp).
I wrote a pixel plotting function in C which seems to be working fine:
void putPixelRGB(struct MODEINFOBLOCK mib, short x, short y, int color) {
    int *pos = (int *)((char *)mib.framebuffer + x * 4 + y * 7680);
    *pos = color;
}
Then I tried to fill the whole screen using this function and two for loops:
for(int i = 0; i < 1920; i++) {
    for(int j = 0; j < 1080; j++) {
        putPixelRGB(mib, i, j, 0xFFFFFF);
    }
}
This is the result that I ended up with so far:
(I even tried to fill every single byte of the video memory with 0xFF to make sure that I'm not altering other pixels or stuff :P... and, uhh, I got the same result.)
dloop:
    mov byte [eax], 0xFF    ; eax contains the address of the FB memory.
    inc eax
    cmp eax, 8294400        ; 8294400 = 1920 * 1080 * 4
    jg done
    jmp dloop
done:
    hlt
Any idea why this doesn't work? Did I access memory the wrong way?
EDIT:
The MODEINFOBLOCK structure:
struct MODEINFOBLOCK {
    int attributes;
    char windowA, windowB;
    int granularity;
    int windowSize;
    int segmentA, segmentB;
    long winFuncPtr;
    int pitch;
    int resolutionX, resolutionY;
    char wChar, yChar, planes, bpp, banks;
    char memoryModel, bankSize, imagePages;
    char reserved0;
    char readMask, redPosition;
    char greenMask, greenPosition;
    char blueMask, bluePosition;
    char reservedMask, reservedPosition;
    char directColorAttributes;
    char* framebuffer;
    long offScreenMemOff;
    int offScreenMemSize;
    char reserved1[206];
};
You probably didn't enable the A20 gate.
With the A20 gate disabled, the 21st bit of physical addresses is ignored/masked to zero (to help emulate an old 8086 CPU, which had only 20 address lines). The result is that when you try to fill the frame buffer, the first 1 MiB of pixels works, then the second 1 MiB of pixels overwrites the first 1 MiB (leaving an unfilled black band), then the third 1 MiB works but gets overwritten by the fourth 1 MiB, and so on.
This creates "filled and not filled" horizontal bands. If you do the math ("1 MiB / 1920 / 4"), you'd expect the horizontal bands to be about 136.5 pixels tall, so there'd be slightly more than 7 bands ("1080 / 136.5"); which is what you're getting.
To enable the A20 gate; see https://wiki.osdev.org/A20_Line .

How can one make this dynamic bit range code GCC compliant for 64 bit compilers?

I am trying to update Ken Silverman's Paint N Draw 3D C software for Linux, GCC, and 64-bit use, and to preserve it on GitHub. I got his permission, but he's too busy to help. I don't want to do a bad job, and I am not a bit-twiddling expert, so I'd like to fix the main parts before I upload it.
In his code pnd3d.c he used a struct called bitmal_t * that contains a malloc'd buffer (I think his element name mal means the size of a malloc) and a size, to represent a voxel distance as a chain of bits spread across a concatenated set of 32-bit unsigned ints (in 2009 that meant 32 bits). So basically, distance is a function of how many bits are on (1) along the extended bit chain. For collisions, he looks up and down for zeros and ones.
Here is his bitmal_t:
//buf: cast to: octv_t* or surf_t*
//bit: 1 bit per sizeof(buf[0]); 0=free, 1=occupied
typedef struct bit { void *buf; unsigned int mal, *bit, ind, num, siz; } bitmal_t;
Here is his range-finding code that goes up and down the bit range looking for a one or a zero. I posted his originals, not my crappy nonworking version. Here are all the code snippets you would need to reproduce it.
static __forceinline int dntil0 (unsigned int *lptr, int z, int zsiz)
{
// //This line does the same thing (but slow & brute force)
//while ((z < zsiz) && (lptr[z>>5]&(1<<KMOD32(z)))) z++; return(z);
int i;
//WARNING: zsiz must be multiple of 32!
i = (lptr[z>>5]|((1<<KMOD32(z))-1)); z &= ~31;
while (i == 0xffffffff)
{
z += 32; if (z >= zsiz) return(zsiz);
i = lptr[z>>5];
}
return(bsf(~i)+z);
}
static __forceinline int uptil0 (unsigned int *lptr, int z)
{
// //This line does the same thing (but slow & brute force)
//while ((z > 0) && (lptr[(z-1)>>5]&(1<<KMOD32(z-1)))) z--; return(z);
int i;
if (!z) return(0); //Prevent possible crash
i = (lptr[(z-1)>>5]|(-1<<KMOD32(z))); z &= ~31;
while (i == 0xffffffff)
{
z -= 32; if (z < 0) return(0);
i = lptr[z>>5];
}
return(bsr(~i)+z+1);
}
static __forceinline int dntil1 (unsigned int *lptr, int z, int zsiz)
{
// //This line does the same thing (but slow & brute force)
//while ((z < zsiz) && (!(lptr[z>>5]&(1<<KMOD32(z))))) z++; return(z);
int i;
//WARNING: zsiz must be multiple of 32!
i = (lptr[z>>5]&(-1<<KMOD32(z))); z &= ~31;
while (!i)
{
z += 32; if (z >= zsiz) return(zsiz);
i = lptr[z>>5];
}
return(bsf(i)+z);
}
static __forceinline int uptil1 (unsigned int *lptr, int z)
{
// //This line does the same thing (but slow & brute force)
//while ((z > 0) && (!(lptr[(z-1)>>5]&(1<<KMOD32(z-1))))) z--; return(z);
int i;
if (!z) return(0); //Prevent possible crash
i = (lptr[(z-1)>>5]&((1<<KMOD32(z))-1)); z &= ~31;
while (!i)
{
z -= 32; if (z < 0) return(0);
i = lptr[z>>5];
}
return(bsr(i)+z+1);
}
Here are his set range to ones and zeroes functions:
//Set all bits in vbit from (x,y,z0) to (x,y,z1-1) to 0's
#ifndef _WIN64
static __forceinline void setzrange0 (void *vptr, int z0, int z1)
{
int z, ze, *iptr = (int *)vptr;
if (!((z0^z1)&~31)) { iptr[z0>>5] &= ((~(-1<<z0))|(-1<<z1)); return; }
z = (z0>>5); ze = (z1>>5);
iptr[z] &=~(-1<<z0); for(z++;z<ze;z++) iptr[z] = 0;
iptr[z] &= (-1<<z1);
}
//Set all bits in vbit from (x,y,z0) to (x,y,z1-1) to 1's
static __forceinline void setzrange1 (void *vptr, int z0, int z1)
{
int z, ze, *iptr = (int *)vptr;
if (!((z0^z1)&~31)) { iptr[z0>>5] |= ((~(-1<<z1))&(-1<<z0)); return; }
z = (z0>>5); ze = (z1>>5);
iptr[z] |= (-1<<z0); for(z++;z<ze;z++) iptr[z] = -1;
iptr[z] |=~(-1<<z1);
}
#else
static __forceinline void setzrange0 (void *vptr, __int64 z0, __int64 z1)
{
unsigned __int64 z, ze, *iptr = (unsigned __int64 *)vptr;
if (!((z0^z1)&~63)) { iptr[z0>>6] &= ((~(LL(-1)<<z0))|(LL(-1)<<z1)); return; }
z = (z0>>6); ze = (z1>>6);
iptr[z] &=~(LL(-1)<<z0); for(z++;z<ze;z++) iptr[z] = LL(0);
iptr[z] &= (LL(-1)<<z1);
}
//Set all bits in vbit from (x,y,z0) to (x,y,z1-1) to 1's
static __forceinline void setzrange1 (void *vptr, __int64 z0, __int64 z1)
{
unsigned __int64 z, ze, *iptr = (unsigned __int64 *)vptr;
if (!((z0^z1)&~63)) { iptr[z0>>6] |= ((~(LL(-1)<<z1))&(LL(-1)<<z0)); return; }
z = (z0>>6); ze = (z1>>6);
iptr[z] |= (LL(-1)<<z0); for(z++;z<ze;z++) iptr[z] = LL(-1);
iptr[z] |=~(LL(-1)<<z1);
}
#endif
Write some unit tests that pass on the original!
First of all, SSE2 is baseline for x86-64, so you should definitely be using that instead of just 64-bit integers.
GCC (unlike MSVC) assumes no strict-aliasing violations, so the set bit range functions (that cast an incoming pointer to signed int* (!!) or uint64_t* depending on WIN64 or not) might need to be compiled with -fno-strict-aliasing to make pointer-casting well-defined.
You could replace the loop part of the set/clear bit-range functions with memset (which gcc may inline), or with a hand-written SSE intrinsics loop if you expect the size to usually be small (under 200 bytes or so, where the overhead of calling libc memset isn't worth it).
I think those dntil0 functions in the first block are just bit-search loops for the first 0 or first 1 bit, forward or backward.
Rewrite them from scratch with SSE2 intrinsics: _mm_cmpeq_epi8 / _mm_movemask_epi8 to find the first byte that isn't all-0 or all-1 bits, then use bsf or bsr on that.
See the glibc source code for SSE2 memchr, or any simpler SSE2-optimized implementation, to find out how to do the byte-search part. Or glibc memmem for an example of comparing for equal, but that's easy: instead of looking for a non-zero _mm_movemask_epi8() (indicating there was a match), look for a result that's != 0xffff (all ones) to indicate that there was a mismatch. Use bsf or bsr on that bitmask to find the byte index into the SIMD vector.
So in total you'll use BSR or BSF twice in each function: one to find the byte index within the SIMD vector, and again to find the bit-index within the target byte.
For the bit-scan function, use GCC __builtin_clz or __builtin_ctz to find the first 1 bit. Bit twiddling: which bit is set?
To search for the first zero instead of the first one, bitwise invert, like __builtin_ctz( ~p[idx] ) where p is an unsigned char* into your search buffer (that you were using _mm_loadu_si128() on), and idx is an offset within that 16 byte window. (That you calculated with __builtin_ctz() on the movemask result that broke out of the vector loop.)
How the original worked:
z -= 32 is looping by 32 bits (the size of an int, because this was written assuming it would be compiled for x86 Windows or x86-64 Windows).
lptr[z>>5] converts the bit index to an int index, so it's simply looping over the buffer one int at a time.
When it finds a 4-byte element that's != 0xFFFFFFFF, it has found an int containing a bit that's not 1; i.e. it contains the bit we're looking for. So it uses bsf or bsr to bit-scan and find the position of that bit within this int.
It adds that to z (the bit-position of the start of this int).
This is exactly the same algorithm I described above, but implemented one integer at a time instead of 16 bytes at a time.
It should really be using uint32_t or unsigned int for bit-manipulations, not signed int, but it obviously works correctly on MSVC.
if (z >= zsiz) return(zsiz); This is the size check to break out of the loop if no bit is found.

Trouble storing a contents of a drive sector into character array using BIOS interrupt 0x13

We are working on a kernel in our Operating Systems class, in C, using bcc and some simulator software a professor wrote. The current step I am stuck on is to read a sector from a floppy using BIOS interrupt 0x13, store it in a character array, and then print it to the screen. Once we finish our kernel.c, we build it into a floppy.img file. Then we load a .txt file into sector 30 of the floppy so we can test that our readSector function works properly. Finally, we run the floppy.img file in a simulator.
int main(){
    void printString(char*);
    void readString(char*);
    void readSector(char*, int);
    int mod(int, int); // modulus function
    int div(int, int); // division function
    char buffer[512];

    readSector(buffer, 30);
    printString(buffer);
    while(1);
}
The printString function works fine and I have been using it for debugging purposes
void printString(char* chars){
    int i;
    for(i = 0; chars[i] != '\0'; i++){
        interrupt(0x10, 0xe * 256 + chars[i], 0, 0, 0);
    }
} // printString
We weren't really given a clear signature for the 0x13 interrupt call. I think this is the right order, but I'm not 100% sure. When I run the simulator it prints "Called interrupt 0x13" but doesn't print anything out after that, when it should print the contents of buffer from main.
void readSector(char* buffer, int sector){
    int relativeSector = mod(sector, 18) + 1; // sector % 18 + 1
    int head = mod(div(sector, 18), 2);       // (sector / 18) % 2
    int track = div(sector, 36);              // sector / 36
    interrupt(0x13, 2, 1, buffer, track, relativeSector, head, 0); // 0x13, AH, AL, BX, CH, CL, DH, DL
    printString("Called interrupt 0x13\0");
}
Since bcc doesn't include mod/div, I made these based on some given pseudocode:
int mod(int a, int b){ // a == modulend, b == modulosor
    while(a >= b){     // note: >= , otherwise mod(18, 18) would return 18
        a = a - b;
    }
    return a;
} // mod

int div(int a, int b){ // a == dividend, b == divisor, q == quotient
    int q = 0;
    while((q * b) <= a){
        q++;
    }
    return q - 1;
} // div
I'm not 100% sure how the 0x13 interrupt works; I assumed it would read sector 30 and write it into the buffer array. My professor took a look at this for a few minutes and said it looks okay, but wasn't sure why it wasn't working. I'm going to see him tomorrow to investigate further, but there are a few more steps due by Wednesday and I'm getting antsy trying to figure this one out. Any help would be much appreciated.

Is there a more efficient way of splitting a number into its digits?

I have to split a number into its digits in order to display it on an LCD. Right now I use the following method:
pos = 7;
do
{
LCD_Display(pos, val % 10);
val /= 10;
pos--;
} while (pos >= 0 && val);
The problem with this method is that division and modulo operations are extremely slow on an MSP430 microcontroller. Is there any alternative to this method, something that either does not involve division or that reduces the number of operations?
A note: I can't use any library functions, such as itoa. The libraries are big and the functions themselves are rather resource hungry (both in terms of number of cycles, and RAM usage).
You could do subtractions in a loop with predefined base 10 values.
My C is a bit rusty, but something like this:
int num[] = { 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1 };
for (pos = 0; pos < 8; pos++) {
    int cnt = 0;
    while (val >= num[pos]) {
        cnt++;
        val -= num[pos];
    }
    LCD_Display(pos, cnt);
}
Yes, there's another way, originally invented (at least AFAIK) by Terje Mathiesen. Instead of dividing by 10, you (sort of) multiply by the reciprocal. The trick, of course, is that in integers you can't represent the reciprocal directly. To make up for that, you work with scaled integers. If we had floating point, we could extract digits with something like:
input = 123
first digit = integer(10 * fraction(input * .1))
second digit = integer(100 * fraction(input * .01))
...and so on for as many digits as needed. To do this with integers, we basically just scale those by 2**32 (and round each up, since we'll use truncating math). In C, the algorithm looks like this:
#include <stdio.h>
// here are our scaled factors
static const unsigned long long factors[] = {
3435973837, // ceil((0.1 * 2**32)<<3)
2748779070, // ceil((0.01 * 2**32)<<6)
2199023256, // etc.
3518437209,
2814749768,
2251799814,
3602879702,
2882303762,
2305843010
};
static const char shifts[] = {
3, // the shift value used for each factor above
6,
9,
13,
16,
19,
23,
26,
29
};
int main() {
    unsigned input = 13754;
    for (int i = 8; i != -1; i--) {
        unsigned long long inter = input * factors[i];
        inter >>= shifts[i];
        inter &= (unsigned)-1;
        inter *= 10;
        inter >>= 32;
        printf("%u", (unsigned)inter); /* cast: %u expects unsigned, not unsigned long long */
    }
    return 0;
}
The operations in the loop will map directly to instructions on most 32-bit processors. Your typical multiply instruction will take 2 32-bit inputs, and produce a 64-bit result, which is exactly what we need here. It'll typically be quite a bit faster than a division instruction as well. In a typical case, some of the operations will (or at least with some care, can) disappear in assembly language. For example, where I've done the inter &= (unsigned)-1;, in assembly language you'll normally be able to just use the lower 32-bit register where the result was stored, and just ignore whatever holds the upper 32 bits. Likewise, the inter >>= 32; just means we use the value in the upper 32-bit register, and ignore the lower 32-bit register.
For example, in x86 assembly language, this comes out something like:
mov ebx, 9 ; maximum digits we can deal with.
mov esi, offset output_buffer
next_digit:
mov eax, input
mul factors[ebx*4]
mov cl, shifts[ebx]
shrd eax, edx, cl
mov edx, 10 ; overwrite edx => inter &= (unsigned)-1
mul edx
add dl, '0'
mov [esi], dl ; effectively shift right 32 bits by ignoring 32 LSBs in eax
inc esi
dec ebx
jnz next_digit
mov [esi], bl ; zero terminate the string
For the moment, I've cheated a tiny bit, and written the code assuming an extra item at the beginning of each table (factors and shifts). This isn't strictly necessary, but simplifies the code at the cost of wasting 8 bytes of data. It's pretty easy to do away with that too, but I haven't bothered for the moment.
In any case, doing away with the division makes this a fair amount faster on quite a few low- to mid-range processors that lack dedicated division hardware.
Another way is to use double dabble. This is a way to convert binary to BCD with only additions and bit shifts, so it's very appropriate for microcontrollers. After splitting into BCD you can easily print out each digit.
I would use a temporary string, like:
char buffer[8];
itoa(yourValue, buffer, 10);

int pos;
for(pos = 0; pos < 8; ++pos)
    LCD_Display(pos, buffer[pos]); /* maybe you'll need a cast here */
edit: since you can't use the library's itoa, I think your solution is already the best, provided you compile with maximum optimization turned on.
You may take a look at this: Most optimized way to calculate modulus in C
This is my attempt at a complete solution. Credit should go to Guffa for providing the general idea. This should work for 32-bit integers, signed or unsigned, including 0.
#include <stdlib.h>
#include <stdio.h>
#define MAX_WIDTH (10)
static unsigned int uiPosition[] = {
1u,
10u,
100u,
1000u,
10000u,
100000u,
1000000u,
10000000u,
100000000u,
1000000000u,
};
void uitostr(unsigned int uiSource, char* cTarget)
{
    int i, c = 0;
    for( i = 0; i != MAX_WIDTH; ++i )
    {
        cTarget[i] = 0;
    }
    if( uiSource == 0 )
    {
        cTarget[0] = '0';
        cTarget[1] = '\0';
        return;
    }
    for( i = MAX_WIDTH - 1; i >= 0; --i )
    {
        while( uiSource >= uiPosition[i] )
        {
            cTarget[c] += 1;
            uiSource -= uiPosition[i];
        }
        if( c != 0 || cTarget[c] != 0 )
        {
            cTarget[c] += 0x30;
            c++;
        }
    }
    cTarget[c] = '\0';
}

void itostr(int iSource, char* cTarget)
{
    if( iSource < 0 )
    {
        cTarget[0] = '-';
        uitostr((unsigned int)(iSource * -1), cTarget + 1);
    }
    else
    {
        uitostr((unsigned int)iSource, cTarget);
    }
}

int main()
{
    char szStr[MAX_WIDTH + 1] = { 0 };

    // signed integer
    printf("Signed integer\n");
    printf("int: %d\n", 100);
    itostr(100, szStr);
    printf("str: %s\n", szStr);
    printf("int: %d\n", -1);
    itostr(-1, szStr);
    printf("str: %s\n", szStr);
    printf("int: %d\n", 1000000000);
    itostr(1000000000, szStr);
    printf("str: %s\n", szStr);
    printf("int: %d\n", 0);
    itostr(0, szStr);
    printf("str: %s\n", szStr);
    return 0;
}

Misaligned Pointer Performance

Aren't misaligned pointers supposed to slow down performance in the best case, and in the worst case crash your program (assuming the compiler was nice enough to compile your invalid C program)?
Well, the following code doesn't seem to show any performance difference between the aligned and misaligned versions. Why is that?
/* brutality.c */
#ifdef BRUTALITY
xs = (unsigned long *) ((unsigned char *) xs + 1);
#endif
...
/* main.c */
#include <stdio.h>
#include <stdlib.h>
#define size_t_max ((size_t)-1)
#define max_count(var) (size_t_max / (sizeof var))
int main(int argc, char *argv[]) {
    unsigned long sum, *xs, *itr, *xs_end;
    size_t element_count = max_count(*xs) >> 4;

    xs = malloc(element_count * (sizeof *xs));
    if(!xs) exit(1);
    xs_end = xs + element_count - 1; sum = 0;

    for(itr = xs; itr < xs_end; itr++)
        *itr = 0;

#include "brutality.c"

    itr = xs;
    while(itr < xs_end)
        sum += *itr++;

    printf("%lu\n", sum);

    /* we could free the malloc-ed memory here */
    /* but we are almost done */
    exit(0);
}
Compiled and tested on two separate machines using
gcc -pedantic -Wall -O0 -std=c99 main.c
for i in {0..9}; do time ./a.out; done
I tested this some time in the past on Win32 machines and did not notice much of a penalty on 32-bit machines. On 64-bit, though, it was significantly slower. For example, I ran the following bit of code. On a 32-bit machine, the times printed were hardly changed. But on a 64-bit machine, the times for the misaligned accesses were nearly twice as long. The times follow the code.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#ifdef _WIN64
#define UINT unsigned __int64
#define ENDPART QuadPart
#else
#define UINT unsigned int
#define ENDPART LowPart
#endif
int main(int argc, char *argv[])
{
    LARGE_INTEGER startCount, endCount, freq;
    int i;
    int offset;
    int iters = atoi(argv[1]);
    char *p = (char*)malloc(16);
    double *d;

    for ( offset = 0; offset < 9; offset++ )
    {
        d = (double*)( p + offset );
        printf( "Address alignment = %u\n", (unsigned int)d % 8 );
        *d = 0;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&startCount);
        for(i = 0; i < iters; ++i)
            *d = *d + 1.234;
        QueryPerformanceCounter(&endCount);
        printf( "Time: %lf\n",
                (double)(endCount.ENDPART-startCount.ENDPART)/freq.ENDPART );
    }
}
Here are the results on a 64-bit machine. I compiled the code as a 32-bit application.
[P:\t]pointeralignment.exe 100000000
Address alignment = 0
Time: 0.484156
Address alignment = 1
Time: 0.861444
Address alignment = 2
Time: 0.859656
Address alignment = 3
Time: 0.861639
Address alignment = 4
Time: 0.860234
Address alignment = 5
Time: 0.861539
Address alignment = 6
Time: 0.860555
Address alignment = 7
Time: 0.859800
Address alignment = 0
Time: 0.484898
The x86 architecture has always been able to handle misaligned accesses, so you'll never get a crash. Other processors might not be as lucky.
You're probably not seeing any time difference because the loop is memory-bound; it can only run as fast as data can be fetched from RAM. You might think that the misalignment will cause the RAM to be accessed twice, but the first access puts it into cache, and the second access can be overlapped with getting the next value from RAM.
You're assuming either x86 or x64 architectures. On MIPS, for example, your code may result in a SIGBUS (bus fault) signal being raised. On other architectures, non-aligned accesses will typically be slower than aligned accesses, although it is very much architecture-dependent.
x86 or x64?
Misaligned pointers were a killer on x86, whereas 64-bit architectures are not nearly as prone to the crash, or even to slow performance at all.
It is probably because malloc of that many bytes is returning NULL. At least that's what it does for me.
You never defined BRUTALITY in your posted code. Are you sure you are testing in 'brutal' mode?
Maybe in order to malloc such a huge buffer, the system is paging memory to and from disk. That could swamp small differences. Try a much smaller buffer with a large in-program loop count around it.
I made the mods I've suggested here and in the comments and tested on my system (a tired, 4-year-old, 32-bit laptop). Code shown below. I do get a measurable difference, but only around 3%. I maintain my changes are a success, because your question indicates you get no difference at all, correct?
Sorry, I am using Windows, so I used the Windows-specific GetTickCount() API I am familiar with, because I often do timing tests and enjoy the simplicity of that misnamed API (it actually returns milliseconds since system start).
/* main.cpp */
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#define BRUTALITY

int main(int argc, char *argv[]) {
    unsigned long i, begin, end;
    unsigned long sum, *xs, *itr, *xs_begin, *xs_end;
    size_t element_count = 100000;

    xs = (unsigned long *)malloc(element_count * (sizeof *xs));
    if(!xs) exit(1);
    xs_end = xs + element_count - 1;

#ifdef BRUTALITY
    xs_begin = (unsigned long *) ((unsigned char *) xs + 1);
#else
    xs_begin = xs;
#endif

    begin = GetTickCount();
    for( i = 0; i < 50000; i++ )
    {
        for(itr = xs_begin; itr < xs_end; itr++)
            *itr = 0;
        sum = 0;
        itr = xs_begin;
        while(itr < xs_end)
            sum += *itr++;
    }
    end = GetTickCount();

    printf("sum=%lu elapsed time=%lumS\n", sum, end - begin );
    free(xs);
    exit(0);
}
