Standard C: Storing arrays in off-chip RAM

I would like to know if I can choose the storage location of arrays in C. There are a couple of questions on here already with some helpful info, but I'm looking for some extra info.
I have an embedded system with a soft-core ARM Cortex implemented on an FPGA.
Upon start-up, code is loaded from memory and executed by the processor. My code is in assembly and contains some C functions. One particular function is a UART interrupt handler, which I have included below:
void UART_ISR()
{
    int count, n=1000, t1=0, t2=1, display=0, y, z;
    int x[1000]; // storage array for first 1000 terms of Fibonacci series
    x[0] = t1; // the first two terms occupy the first two slots
    x[1] = t2;
    printf("\n\nFibonacci Series: \n\n %d \n %d \n ", t1, t2);
    count=2; /* count=2 because first two terms are already displayed. */
    while (count<n)
    {
        display=t1+t2;
        t1=t2;
        t2=display;
        x[count] = t2;
        ++count;
        printf(" %d \n",display);
    }
    printf("\n\n Finished. Sequence written to memory. Reading sequence from memory.....:\n\n");
    for (z=0; z<10000; z++){} // Delay
    for (y=0; y<1000; y++) { // Read variables from memory
        printf("%d \n",x[y]);
    }
}
So basically the first 1000 values of the Fibonacci series are printed and stored in the array x, and then the values are read back from the array and printed to the screen again after a short delay.
Please correct me if I'm wrong, but the values in the array x are stored on the stack as they are computed in the while loop and retrieved from the stack when the array is read from memory.
Here is the memory map of the system:
0x0000_0000 to 0x0000_0be0 is the code
0x0000_0be0 to 0x0010_0be0 is 1MB heap
0x0010_0be0 to 0x0014_0be0 is 256KB stack
0x0014_0be0 to 0x03F_FFFF is off-chip RAM
Is there a function in C that allows me to store the array x in the off-chip RAM for later retrieval?
Please let me know if you need any more info.
Thanks very much for helping
--W

No, not "in C" as in "specified by the language".
The C language doesn't care where things are stored; it specifies nothing about the existence of RAM at particular addresses.
But actual implementations, in the form of compilers, assemblers and linkers, often care a great deal about this.
With gcc, for instance, you can use the section variable attribute to force a variable into a particular section.
You can then control the linker to map that section to a particular memory area.
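As a rough sketch of the gcc approach, assuming a GNU toolchain: the section name .offchip, the names fib_store, fib_store_put and fib_store_get, and the linker-script fragment in the comment are all made up for illustration and would need adapting to your own build.

```c
#include <stddef.h>

#define FIB_TERMS 1000

/* ".offchip" is a hypothetical section name chosen for this sketch. The
 * array has static storage duration, so it lives in that section rather
 * than on the ISR's stack.
 *
 * A matching GNU ld linker-script fragment might look like this (the region
 * name is invented and LENGTH is a placeholder to fill in from your own
 * memory map):
 *
 *   MEMORY   { OFFCHIP (rw) : ORIGIN = 0x00140be0, LENGTH = <size of off-chip RAM> }
 *   SECTIONS { .offchip (NOLOAD) : { *(.offchip) } > OFFCHIP }
 */
__attribute__((section(".offchip"))) int fib_store[FIB_TERMS];

/* Store one term; returns 0 on success, -1 if the index is out of range. */
int fib_store_put(size_t index, int value)
{
    if (index >= FIB_TERMS)
        return -1;
    fib_store[index] = value;
    return 0;
}

/* Read one term back; out-of-range indexes yield 0. */
int fib_store_get(size_t index)
{
    return index < FIB_TERMS ? fib_store[index] : 0;
}
```

The C code itself just uses fib_store like any other global array; where it ends up is decided entirely by the linker script.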
UPDATE:
The other way to do this is manually, by not letting the compiler in on the secret and doing it yourself.
Something like:
int *external_array = (int *) 0x00140be0;
memcpy(external_array, x, sizeof x);
will copy the required number of bytes to the external memory. You can then read it back by swapping the first two arguments in the memcpy() call.
Note that this is way more manual, low-level and fragile, compared to letting the compiler/linker dynamic duo Just Make it Work for you.
Also, it seems very unlikely that you want to do all of that work from an ISR.
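The manual copy and the read-back could be wrapped in two small helpers, roughly like this. The names ext_save and ext_load are invented here, and the external base address is passed in as a parameter so the same code can be tried against an ordinary buffer before running it on the target.

```c
#include <stddef.h>
#include <string.h>

/* On the target, `ext` would be the start of off-chip RAM from the memory
 * map in the question, e.g. (int *)0x00140be0. It is a parameter here so the
 * helpers can also be exercised against a normal array. */

/* Copy `count` ints from a local array out to external memory. */
void ext_save(int *ext, const int *src, size_t count)
{
    memcpy(ext, src, count * sizeof *src);
}

/* Copy `count` ints back: the same call with source and destination swapped. */
void ext_load(int *dst, const int *ext, size_t count)
{
    memcpy(dst, ext, count * sizeof *dst);
}
```

Nothing stops other code from using the same addresses, which is part of why this route is more fragile than the linker-based one.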

Related

Global variable reads inside tight loops in C

Say I have a tight loop in C, within which I use the value of a global variable to do some arithmetic, e.g.
double c;
// ... initialize c somehow ...
double f(double*a, int n) {
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        sum += a[i]*c;
    }
    return sum;
}
with c the global variable. Is c "read anew from global scope" in each loop iteration? After all, it could've been changed by some other thread executing some other function, right? Hence, would the code be faster if I took a local (function stack) copy of c prior to the loop and only used this copy?
double f(double*a, int n) {
    double sum = 0.0;
    int i;
    double c_cp = c;
    for (i = 0; i < n; i++) {
        sum += a[i]*c_cp;
    }
    return sum;
}
Though I haven't specified how c is initialized, let's assume it's done in some way such that the value is unknown at compile time. Also, c is really a constant throughout runtime, i.e. I as the programmer know that its value won't change. Can I let the compiler in on this information, e.g. by using static double c in the global scope? Does this change the a[i]*c vs. a[i]*c_cp question?
My own research
Reading e.g. the "Global variables" section of this, it seems clear that taking a local copy of the global variable is the way to go. However, they want to update the value of the global variable, whereas I only ever want to read its value.
Using godbolt I fail to notice any real difference in the assembly for both c vs. c_cp and double c vs. static double c.

Any decently smart compiler will optimize your code so it will behave as your second code snippet. Using static won't change much, but if you want to ensure read on each iteration then use volatile.
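To make the difference concrete, here is a small sketch with two otherwise identical loops. The names scale, scale_v, sum_scaled and sum_scaled_volatile are invented for this example; comparing the generated assembly of the two functions (for instance on godbolt) is a good way to see what your compiler actually does.

```c
#include <stddef.h>

double scale = 2.0;              /* plain global: may be kept in a register */
volatile double scale_v = 2.0;   /* volatile: every access is a real load */

/* The compiler is free to load `scale` once before the loop, because
 * nothing inside the loop can modify it. */
double sum_scaled(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * scale;
    return sum;
}

/* Here the compiler must read `scale_v` from memory on every iteration. */
double sum_scaled_volatile(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * scale_v;
    return sum;
}
```

Both functions return the same result in a single-threaded program; only the number of loads differs.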
Great point there about changes from a different thread. The compiler only maintains the integrity of your code as far as single-threaded execution goes. That means it can reorder your code, skip something, or add something, as long as the end result is still the same.
With multiple threads it is your job to ensure that things still happen in a specific order, not just that the end result is right. The way to ensure that is with memory barriers. It's a fun topic to read about, but one that is best avoided unless you're an expert.
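For a taste of what that looks like, here is a minimal sketch using the C11 atomics from <stdatomic.h>, which is one way (not the only one) to get the ordering guarantees that barriers provide. The names shared_value, value_ready, publish and try_consume are made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static double shared_value;        /* payload written by the producer */
static atomic_bool value_ready;    /* flag that publishes the payload */

/* Producer: write the payload, then set the flag with release ordering so
 * the payload write cannot be reordered after the flag store. */
void publish(double v)
{
    shared_value = v;
    atomic_store_explicit(&value_ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag guarantees that, if it reads true,
 * the payload written before the matching release store is visible. */
bool try_consume(double *out)
{
    if (!atomic_load_explicit(&value_ready, memory_order_acquire))
        return false;
    *out = shared_value;
    return true;
}
```

One thread would call publish() and another would poll try_consume(); the release/acquire pair is what makes the plain write to shared_value safe to read on the other side.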

Once everything is translated to machine code, you will see no difference whatsoever. If c is global, any access to c will reference the address of c or, most probably in a tight loop, c will be kept in a register or, in the worst case, in the L1 cache.
On a Linux machine you can easily generate the assembly and examine the resultant code.
You can also run benchmarks.

Maximum size array program in C?

With the following code, I am trying to create an array of numbers and then sort them. But if I set a large array size (MAX), the program stops at the last 'randomly' generated number and never continues to the sorting. Could anyone please give me a hand with this?
#include <stdio.h>
#define MAX 2000000
int a[MAX];
int rand_seed=10;
/* from K&R
- returns random number between 0 and 62000.*/
int rand();
int bubble_sort();
int main()
{
    int i;
    /* fill array */
    for (i=0; i < MAX; i++)
    {
        a[i]=rand();
        printf(">%d= %d\n", i, a[i]);
    }
    bubble_sort();
    /* print sorted array */
    printf("--------------------\n");
    for (i=0; i < MAX; i++)
        printf("%d\n",a[i]);
    return 0;
}
int rand()
{
    rand_seed = rand_seed * 1103515245 +12345;
    return (unsigned int)(rand_seed / 65536) % 62000;
}
int bubble_sort(void)
{
    int t, x, y;
    /* bubble sort the array */
    for (x=0; x < MAX-1; x++)
        for (y=0; y < MAX-x-1; y++)
            if (a[y] > a[y+1])
            {
                t=a[y];
                a[y]=a[y+1];
                a[y+1]=t;
            }
    return 0;
}

The problem is that you are storing the array in the global section. C doesn't give any guarantee about the maximum size of the global section it can support; this depends on the OS, architecture and compiler.
So instead of creating a global array, create a global pointer and allocate a large chunk using malloc. The memory then lives on the heap, which is much bigger and can grow at runtime.
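A minimal sketch of that suggestion, with an invented helper name alloc_int_array; the important part is checking the result, since malloc reports failure by returning NULL.

```c
#include <stdlib.h>

/* Allocate `count` ints on the heap. Returns NULL when the request cannot
 * be satisfied, including when the byte count would overflow size_t. */
int *alloc_int_array(size_t count)
{
    if (count > (size_t)-1 / sizeof(int))
        return NULL;
    return malloc(count * sizeof(int));
}
```

In the question's program this would replace int a[MAX]; with a pointer that is set at the start of main, e.g. a = alloc_int_array(MAX);, followed by a NULL check and a free(a) at the end.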

Your array will land in the BSS section for static variables. It will not be part of the image, but the program loader will allocate the required space and fill it with zeros before your program starts 'real' execution. You can even control this process if you are using an embedded compiler and fill your static data with anything you like. This array may occupy 2GB of your RAM and yet your exe file may be only a few kilobytes. I've just managed to use an array of over 2GB this way and my exe was 34KB. I can believe a compiler may warn you when you approach maybe 2^31-1 elements (if your int is 32-bit), but static arrays with 2 million elements are not a problem nowadays (unless it is an embedded system, but I bet it is not).
The problem might be that your bubble sort has 2 nested loops (as all bubble sorts do), so trying to sort this array of 2 million elements causes the program to loop about 2*10^12 times (arithmetic series):
inner loop:
1: 1999999 times
2: 1999998 times
...
2000000: 1 time
So you must compare (and possibly swap) elements
2000000 * (1999999+1) / 2 = (4 / 2) * 1000000^2 = 2*10^12 times
(correct me if I am wrong above)
Your program simply stays in the sort routine for too long and you are not even aware of it. What you see is just the last rand number printed and the program not responding. Even on my really fast PC, sorting a 200K array this way took around 1 minute.
It is not related to your OS, compiler, heap etc. Your program is just stuck because your loop executes 2*10^12 times if you have 2 million elements.
To verify my words, print "sort started" before sorting and "sort finished" after that. I bet the last thing you'll see is "sort started". In addition, you may print the current x value before the inner loop in bubble_sort; you'll see that it is working.
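If you want to check the arithmetic rather than take my word for it, something like the following works. The function names are invented for this sketch, and the counted sort uses the same loop bounds as the question's bubble_sort but on a caller-supplied array, so it can be tried on a small input first.

```c
#include <stddef.h>

/* Exact number of inner-loop comparisons the question's bubble sort makes
 * for n elements: (n-1) + (n-2) + ... + 1 = n*(n-1)/2. */
unsigned long long bubble_comparisons(unsigned long long n)
{
    return n * (n - 1) / 2;
}

/* Same loops as the question's bubble_sort, but returning how many
 * comparisons were actually made. */
unsigned long long bubble_sort_counted(int *a, size_t n)
{
    unsigned long long comparisons = 0;
    for (size_t x = 0; x + 1 < n; x++) {
        for (size_t y = 0; y + 1 < n - x; y++) {
            comparisons++;
            if (a[y] > a[y + 1]) {
                int t = a[y];
                a[y] = a[y + 1];
                a[y + 1] = t;
            }
        }
    }
    return comparisons;
}
```

For n = 2000000 the formula gives 1,999,999,000,000 comparisons, which is the roughly 2*10^12 estimated above.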

Dynamic array:
int *Array;
Array = malloc(sizeof(int) * Size); /* needs <stdlib.h>; check for NULL before use */

The original C standard (ANSI 1989/ISO 1990) required that a compiler successfully translate at least one program containing at least one example of a set of environmental limits. One of those limits was being able to create an object of at least 32,767 bytes.
This minimum limit was raised in the 1999 update to the C standard to be at least 65,535 bytes.
No C implementation is required to provide for objects greater than that size, which means that they don't need to allow for an array of ints greater than
(int)(65535 / sizeof(int)).
In very practical terms, on modern computers, it is not possible to say in advance how large an array can be created. It can depend on things like the amount of physical memory installed in the computer, the amount of virtual memory provided by the OS, and the number of other tasks, drivers, and programs already running and how much memory they are using. So your program may be able to use more or less memory running today than it could use yesterday or will be able to use tomorrow.
Many platforms place their strictest limits on automatic objects, that is those defined inside of a function without the use of the 'static' keyword. On some platforms you can create larger arrays if they are static or by dynamic allocation.

C memory management in gcc

I am using gcc version 4.7.2 on Ubuntu 12.10 x86_64.
First of all these are the sizes of data types on my terminal:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8
sizeof(float) = 4
sizeof(double) = 8
sizeof(long double) = 16
Now please have a look at this code snippet:
#include <stdio.h>
int main(void)
{
    char c = 'a';
    printf("&c = %p\n", (void *)&c); /* %p expects a void pointer */
    return 0;
}
If I am not wrong, we can't predict anything about the address of c. But each time, this program prints some random hex address ending in f, so the next available location will be some hex value ending in 0.
I observed this pattern with other data types too. For an int the address was some hex value ending in c. For a double it was some random hex value ending in 8, and so on.
So I have 2 questions here.
1) Who governs this kind of memory allocation? Is it gcc or the C standard?
2) Whoever it is, why is it so? Why is the variable stored in such a way that the next available memory location starts at a hex value ending in 0? Is there any specific benefit?
Now please have a look at this code snippet:
#include <stdio.h>
int main(void)
{
    double a = 10.2;
    int b = 20;
    char c = 30;
    short d = 40;
    printf("&a = %p\n", (void *)&a);
    printf("&b = %p\n", (void *)&b);
    printf("&c = %p\n", (void *)&c);
    printf("&d = %p\n", (void *)&d);
    return 0;
}
Now what I observed here is completely new to me. I thought the variables would get stored in the same order they are declared. But no! That's not the case. Here is the sample output of one random run:
&a = 0x7fff8686a698
&b = 0x7fff8686a694
&c = 0x7fff8686a691
&d = 0x7fff8686a692
It seems that the variables get sorted in increasing order of size and are then stored in that sorted order, while still maintaining observation 1, i.e. the last variable (the largest one) gets stored in such a way that the next available memory location is a hex value ending in 0.
Here are my questions:
3) Who is behind this? Is it gcc or the C standard?
4) Why waste time sorting the variables first and then allocating the memory, instead of directly allocating memory on a 'first come, first served' basis? Is there any specific benefit to this kind of sorting?
Now please have a look at this code snippet:
#include <stdio.h>
int main(void)
{
    char array1[] = {1, 2};
    int array2[] = {1, 2, 3};
    printf("&array1[0] = %p\n", (void *)&array1[0]);
    printf("&array1[1] = %p\n\n", (void *)&array1[1]);
    printf("&array2[0] = %p\n", (void *)&array2[0]);
    printf("&array2[1] = %p\n", (void *)&array2[1]);
    printf("&array2[2] = %p\n", (void *)&array2[2]);
    return 0;
}
Now this is also shocking to me. What I observed is that an array is always stored at some random hex value ending in '0' if it has >= 2 elements, and if it has < 2 elements then it gets a memory location following observation 1.
So here are my questions:
5) Who is behind storing an array at some random hex value ending in 0? Is it gcc or the C standard?
6) Why waste the memory? I mean, array2 could have been stored immediately after array1 (and hence array2 would have had a memory location ending in 2). But instead array2 is stored at the next hex value ending in 0, thereby leaving 14 memory locations in between. Is there any specific benefit?

The address at which the stack and the heap start is given to the process by the operating system. Everything else is decided by the compiler, using offsets that are known at compile time. Some of these things may follow an existing convention followed in your target architecture and some of these do not.
The C standard does not mandate anything regarding the order of the local variables inside the stack frame (as pointed out in a comment, it doesn't even mandate the use of a stack at all). The standard only bothers to define order when it comes to structs and, even then, it does not define specific offsets, only the fact that these offsets must be in increasing order. Usually, compilers try to align the variables in such a way that access to them takes as few CPU instructions as possible - and the standard permits that, without mandating it.
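The struct guarantee is easy to check with offsetof. The struct below is a made-up one that simply mirrors the four variables from the question; the function name is invented too.

```c
#include <stddef.h>

/* Hypothetical struct mirroring the four locals from the question. */
struct locals {
    double a;
    int    b;
    char   c;
    short  d;
};

/* Returns 1 if every member sits at a higher offset than the one declared
 * before it, which the C standard guarantees for struct members. */
int members_in_declaration_order(void)
{
    return offsetof(struct locals, a) < offsetof(struct locals, b)
        && offsetof(struct locals, b) < offsetof(struct locals, c)
        && offsetof(struct locals, c) < offsetof(struct locals, d);
}
```

The exact offsets and any padding between members are still up to the implementation; only the ordering is fixed.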

Part of the reason is that this is mandated by the application binary interface (ABI) specification for your system and processor.
See the x86 calling conventions and the SVR4 x86-64 ABI supplement (I'm giving the URL of a recent copy; the latest original is surprisingly hard to find on the Web).
Within a given call frame, the compiler may place variables in arbitrary stack slots. It may try (when optimizing) to reorganize the stack at will, e.g. by decreasing alignment constraints. You should not worry about that.
A compiler tries to put local variables at stack locations with suitable alignment. See the alignof extension of GCC. Where exactly the compiler puts these variables is not important; see my answer here. (If it is important to your code, you really should pack the variables into a single common local struct, since each compiler, version and set of optimization flags could do different things; so don't depend on the precise behavior of your particular compiler.)
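Here is a small sketch of how the alignment observation could be checked in code. It uses the standard C11 _Alignof operator rather than the GCC extension, and the helper names is_aligned and locals_are_aligned are made up for this example.

```c
#include <stdint.h>

/* Returns 1 if the address `p` is a multiple of `alignment`
 * (`alignment` must be non-zero). */
int is_aligned(const void *p, uintptr_t alignment)
{
    return (uintptr_t)p % alignment == 0;
}

/* Declares the same four locals as the question and checks that each one
 * is placed at an address suitable for its type. */
int locals_are_aligned(void)
{
    double a = 10.2;
    int b = 20;
    char c = 30;
    short d = 40;
    return is_aligned(&a, _Alignof(double))
        && is_aligned(&b, _Alignof(int))
        && is_aligned(&c, _Alignof(char))
        && is_aligned(&d, _Alignof(short));
}
```

This says nothing about the order of the variables, only that each address satisfies the alignment of its type, which matches the addresses printed in the question (the double ends in 8, the int in 4, the short in 2).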

Purposely waste all of main memory to learn fragmentation

In my class we have an assignment, and one of the questions states:
Memory fragmentation in C: Design, implement, and run a C program that does the following: it allocates memory for a sequence of 3m arrays of 500000 elements each; then it deallocates all even-numbered arrays and allocates a sequence of m arrays of 700000 elements each. Measure the amount of time your program requires for the allocations of the first sequence and for the second sequence. Choose m so that you exhaust all of the main memory available to your program. Explain your timings.
My implementation of this is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void){
    clock_t begin1, stop1, begin2, stop2;
    double tdif = 0, tdif2 = 0;
    for(int k=0;k<1000;k++){
        double dif, dif2;
        const int m = 50000;
        begin1 = clock();
        printf("Step One\n");
        int *container[3*m];
        for(int i=0;i<(3*m);i++)
        {
            int *tmpAry = (int *)malloc(500000*sizeof(int));
            container[i] = tmpAry;
        }
        stop1 = clock();
        printf("Step Two\n");
        for(int i=0;i<(3*m);i+=2)
        {
            free(container[i]);
        }
        begin2 = clock();
        printf("Step Three\n");
        int *container2[m];
        for(int i=0;i<m;i++)
        {
            int *tmpAry = (int *)malloc(700000*sizeof(int));
            container2[i] = tmpAry;
        }
        stop2 = clock();
        /* clock() ticks are converted to seconds with CLOCKS_PER_SEC */
        dif = (double)(stop1 - begin1)/CLOCKS_PER_SEC;
        dif2 = (double)(stop2 - begin2)/CLOCKS_PER_SEC;
        tdif+=dif;
        tdif2+=dif2;
    }
    /* average over all 1000 runs */
    tdif/=1000;
    tdif2/=1000;
    printf("To Allocate the first array it took: %.5f\n",tdif);
    printf("To Allocate the second array it took: %.5f\n",tdif2);
    system("pause");
    return 0;
}
I have changed this up a few different ways, but what I consistently see is that when I initially allocate the memory for the 3*m arrays of 500000 elements, it uses up all of the available main memory. But when I then tell it to free them, the memory is not released back to the OS, so when it goes to allocate the m arrays of 700000 elements it does so in the page file (swap memory), and so it does not actually display memory fragmentation.
The above code runs this 1000 times and averages the results, which takes quite some time. The first sequence took 2.06913 seconds on average and the second sequence took 0.67594 seconds. To me the second sequence is supposed to take longer, to show how fragmentation works, but because swap is being used this does not occur. Is there a way around this, or am I wrong in my assumption?
I will ask the professor about what I have on Monday, but until then any help would be appreciated.

Many libc implementations (I think glibc included) don't release memory back to the OS when you call free(), but keep it so you can use it for the next allocation without a syscall. Also, because of the complexity of modern paging and virtual memory strategies, you can never be sure where anything is in physical memory, which makes it almost impossible to intentionally fragment it (even if it comes fragmented). You have to remember that virtual memory and physical memory are different beasts.
(The following is written for Linux, but is probably applicable to Windows and OS X.)
When your program makes the first allocations, let's say there is enough physical memory for the OS to squeeze all of the pages in. They aren't all next to each other in physical memory; they are scattered wherever they can be. Then the OS modifies the page table to make a set of contiguous virtual addresses that refer to the pages scattered around in memory. But here's the thing: because you don't really use the first memory you allocate, it becomes a really good candidate for swapping out. So, when you come along to do the next allocations, the OS, running out of memory, will probably swap out some of those pages to make room for the new ones. Because of this, you are actually measuring disk speeds and the efficiency of the operating system's paging mechanism, not fragmentation.
Remember, a set of contiguous virtual addresses is almost never physically contiguous in practice (or even in memory).
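If you want to experiment with this, one thing to try is writing to every block right after allocating it, so the pages are actually used instead of just reserved. The sketch below is only a starting point; alloc_and_touch and elapsed_seconds are invented names, and whether touching the memory changes your timings will depend on your OS and allocator.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocate `count` blocks of `elems` ints each, writing to every block so
 * the pages are actually used rather than merely reserved. Stores the
 * pointers in `out` and returns how many allocations succeeded. */
size_t alloc_and_touch(int **out, size_t count, size_t elems)
{
    size_t done = 0;
    for (; done < count; done++) {
        out[done] = malloc(elems * sizeof(int));
        if (out[done] == NULL)
            break;                          /* stop at the first failure */
        memset(out[done], 0xA5, elems * sizeof(int));
    }
    return done;
}

/* Seconds of processor time between two clock() readings. */
double elapsed_seconds(clock_t begin, clock_t end)
{
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

The return value also tells you how many blocks you really got, which the original program never checks.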

Frame Pointer / Program Counter / Array Overflow

I'm working on a practice problem set for C programming, and I've encountered this question. I'm not entirely sure what the question is asking for. Given that xDEADBEEF is the halt instruction, where do we inject it? And why is the FP relevant in this question? Thank you!
You’ve been assigned as the lead computer engineer on an interplanetary space mission to Jupiter. After several months in space, the ship’s main computer, a HAL9000, begins to malfunction and starts killing off the crew members. You’re the last crew member left alive and you need to trick the HAL 9000 computer into executing a HALT instruction. The good news is that you know that the machine code for a halt instruction is (in hexadecimal) xDEADBEEF (in decimal, this is -559,038,737). The bad news is that the only program that the HAL 9000 operating system is willing to actually run is chess. Fortunately, we have a detailed printout of the source code for the chess program (an excerpt of all the important parts is given below). Note that the getValues function reads a set of non-zero integers and places each number in sequence in the array x. The original author of the program obviously expected us to just provide two positive numbers, however there’s nothing in the program that would stop us from inputting three or more numbers. We also know that the stack will use memory locations between 8000 and 8999, and that the initial frame pointer value will be 8996.
void getValues(void) {
    int x[2]; // array to hold input values
    int k = 0;
    int n;
    n = readFromKeyboard(); // whatever you type on the keyboard is assigned to n
    while (n != 0) {
        x[k] = n;
        k = k + 1;
        n = readFromKeyboard(); // whatever you type on the keyboard is assigned to n
    }
    /* the rest of this function is not relevant */
}
int main(void) {
    int x;
    getValues();
    /* the rest of main is not relevant */
}
What sequence of numbers should you type on the keyboard to force the computer to execute a halt instruction?
SAMPLE Solution
One of the first three numbers should be -559038737. The fourth number must be the address of where 0xdeadbeef was placed into memory. Typical values for the 4th number are 8992 (0xdeadbeef is the second number) or 8991 (0xdeadbeef is first number).

What you want to do is overflow the input so that the program returns into a set of instructions you have written yourself, by overwriting the return address.
The problem lies here:
int x[2]; // array to hold input values
By passing in more than 2 values, you can overwrite memory that you shouldn't. Explaining the sample solution:
First input -559,038,737 puts xDEADBEEF in memory
Second input -559,038,737, why not.
Third number -559,038,737 can't hurt
Fourth number 8992 is the address we want the function to return into.
When the function call returns, it will return to the address that we overwrote the return address on the stack with (8992).
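To see the mechanism without touching a real stack, here is a toy simulation. The frame layout (two slots for x, then a saved frame pointer, then the return address) is my assumption; it is consistent with the sample solution but is not spelled out in the problem. All names are invented for this sketch.

```c
#include <stddef.h>

/* Assumed toy frame for getValues, lowest slot first:
 *   slots 0-1: the array x[2]
 *   slot  2  : saved frame pointer
 *   slot  3  : return address
 * This layout is a guess, not something the problem statement gives. */
enum { X0 = 0, X1 = 1, SAVED_FP = 2, RETURN_ADDR = 3, FRAME_SLOTS = 4 };

#define HALT_OPCODE (-559038737)   /* 0xDEADBEEF read as a signed 32-bit int */

/* Mimic the unchecked loop in getValues: copy inputs until a 0 is seen,
 * with no regard for the two-element size of x. The copy is clamped to the
 * toy frame so the simulation itself stays in bounds. Returns the number of
 * slots written. */
size_t simulate_get_values(int frame[FRAME_SLOTS], const int *inputs)
{
    size_t k = 0;
    while (k < FRAME_SLOTS && inputs[k] != 0) {
        frame[k] = inputs[k];
        k++;
    }
    return k;
}
```

Feeding it three copies of HALT_OPCODE followed by 8992 and a terminating 0 leaves 8992 in the RETURN_ADDR slot, which is exactly the effect the four inputs above are meant to have on the real frame.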
Here are some handy resources, and an excerpt:
The actual buffer-overflow hack works like this:
Find code with overflow potential.
Put the code to be executed in the buffer, i.e., on the stack.
Point the return address to the same code you have just put on the stack.
Also, a good book on the topic is "Hacking: The Art of Exploitation" if you like messing around with stacks and calling procedures.
In your case, it seems they are looking for you to encode your instructions in integers passed to the input.
An article on buffer overflowing

Hint: Read about buffer overflow exploits.
