Array of 20000000 elements limits [duplicate] - c

This question already has answers here:
Segmentation fault on large array sizes
(7 answers)
Closed 4 years ago.
For a university project, I have to sort a CSV file of 20 million records (each fits in 64 bits, for example 10000000 or 7000000, so I used unsigned long long) using MergeSort. So, I developed this C file:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
// Path to the dataset
#define DATASET_PATH "/Volumes/HDD/Lorenzo/Unito/2 Anno/ASD/Progetto/Progetto 2017-2018/laboratorio-algoritmi-2017-18/Datasets/ex1/integers.csv"
#define ELEMENTS_TO_SCAN 1000000 // the numbers of elements to be scanned
void mergeSort(unsigned long long * arrayToSort, int leftIndex, int rightIndex);
void merge(unsigned long long * arrayToSort, int left, int center, int right);
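void read(char pathToDataset[], unsigned long long arrayToFill[]);
void printArray(unsigned long long * arrayToPrint, int arrayLength);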
// from "Introduction to Algorithms" of T. H. Cormen
void mergeSort(unsigned long long * arrayToSort, int leftIndex, int rightIndex){
if(leftIndex < rightIndex){
int center = (leftIndex + rightIndex) / 2;
mergeSort(arrayToSort, leftIndex, center);
mergeSort(arrayToSort, center + 1, rightIndex);
merge(arrayToSort, leftIndex, center, rightIndex);
}
}
// from "Introduction to Algorithms" of T. H. Cormen
void merge(unsigned long long * arrayToSort, int left, int center, int right){
int n1 = center - left + 1;
int n2 = right - center;
unsigned long long leftSubArray[n1+1];
unsigned long long rightSubArray[n2+1];
leftSubArray[n1] = ULLONG_MAX; // here Cormen use infinite
rightSubArray[n2] = ULLONG_MAX; // here Cormen use infinite
for(int i = 0; i < n1; i++)
leftSubArray[i] = arrayToSort[left + i];
for(int j = 0; j < n2; j++)
rightSubArray[j] = arrayToSort[center + j + 1];
int i = 0;
int j = 0;
int k = 0;
for(k = left; k <= right; k++){
if(leftSubArray[i] <= rightSubArray[j]){
arrayToSort[k] = leftSubArray[i];
i++;
} else {
arrayToSort[k] = rightSubArray[j];
j++;
}
}
}
// it reads all the dataset, and saves every line (which contains a single element)
// in a position of an array to sort by MergeSort.
void read(char pathToDataset[], unsigned long long arrayToFill[]) {
FILE* dataset = fopen(pathToDataset, "r");
if(dataset == NULL ) {
printf("Error while opening the file.\n");
exit(EXIT_FAILURE); // exit with a failure status, it closes the program
}
int i = 0;
while (i < ELEMENTS_TO_SCAN && fscanf(dataset, "%llu", &arrayToFill[i]) == 1) { // stop at EOF or on a parse error
//printf("%llu\n", arrayToFill[i]); // ONLY FOR DEBUG, it will print 20M lines!
i++;
}
printf("\nRead %d lines.\n", i);
fclose(dataset);
}
void printArray(unsigned long long * arrayToPrint, int arrayLength){
printf("[");
for(int i = 0; i < arrayLength; i++) {
if (i == arrayLength-1) {
printf("%llu]", arrayToPrint[i]);
}
else {
printf("%llu, ", arrayToPrint[i]);
}
}
}
int main() {
unsigned long long toSort [ELEMENTS_TO_SCAN] = {0};
read(DATASET_PATH, toSort);
mergeSort(toSort,0,ELEMENTS_TO_SCAN-1);
printf("Merge finished\n");
return 0;
}
After some testing, I found that whenever ELEMENTS_TO_SCAN is bigger than 500000 (1/40 of 20 million), the program prints this on the terminal, and I don't know why:
Segmentation fault: 11
Can someone help me?

You're declaring the array as a local variable (i.e., on the stack). If you're dealing with larger arrays, consider making them global or using dynamic arrays. In general, dynamic allocation is better, because using globals makes it easier to get into bad habits.
Why are global variables bad, in a single threaded, non-os, embedded application
Segmentation fault 11 because of a 40 MB array in C

As people pointed out, an array this large can't be allocated on the stack. I would try allocating it dynamically. To do that, you just need to change the code like so:
int main() {
unsigned long long *toSort;
// allocate on the heap; in C, malloc's void * converts to the pointer type without a cast
toSort = malloc(ELEMENTS_TO_SCAN * sizeof(unsigned long long));
if (toSort == NULL) {
printf("Out of memory.\n");
return EXIT_FAILURE;
}
read(DATASET_PATH, toSort);
mergeSort(toSort,0,ELEMENTS_TO_SCAN-1);
printf("Merge finished\n");
free(toSort);
return 0;
}
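As you pointed out, merge() is also causing problems. Its leftSubArray and rightSubArray are variable-length arrays on the stack, and at the top level of the recursion together they copy the whole array. Just to note, if you use things like: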
int array[n];
you will run into problems eventually; that's a given. If you don't know at compile time how much memory you will use, either use a data structure that supports resizing, like a linked list, or allocate the memory dynamically.

Related

Struggling with PairInverse problem in c using recursion [duplicate]

I encountered a hard question I don't know the answer to: "Rearrange the digits of an integer in blocks of two with a recursive function". Here's an example:
Input: 123456
Output: 563412
My attempt (invalid, because it prints the result instead of returning it):
void pairinvPrint(unsigned long number) {
printf("%lu", number % 100);
if ((number / 100) <= 99) {
printf("%lu", number / 100);
}
else {
pairinvPrint(number / 100);
}
}
More I/O examples: 42 -> 42; 1234 -> 3412
However, the constraints are strict (no loops, arrays, pointers, global or static variables, or libraries), and the function should not print the solution directly but return it, so that it can be called like this:
printf("Rearrange int (%lu) = %lu", input, pairinvert(input));
Luckily, one condition makes it easier: the number of input digits is always even.
I have experimented for a while, but I can't come up with a working solution, except the invalid one using printf.
Does anyone have an idea or some inspiration for how to tackle this?
I'll bite :-)
unsigned long p(unsigned long p1, unsigned long p2) {
// no loops, no arrays, no pointers, no global, no static, no variables, no libraries
if (p1 < 100) return p2*100 + p1;
return p(p1/100, p2*100 + p1%100);
}
unsigned long pairinvert(unsigned long n) {
// no loops, no arrays, no pointers, no global, no static, no variables, no libraries
if (n < 100) return n;
return p(n/100, n%100);
}
// need <stdio.h> for printf()
#include <stdio.h>
int main(void) {
unsigned long input;
input = 123456;
printf("Rearrange int (%lu) = %lu\n", input, pairinvert(input));
input = 42;
printf("Rearrange int (%lu) = %lu\n", input, pairinvert(input));
input = 1234;
printf("Rearrange int (%lu) = %lu\n", input, pairinvert(input));
}
The following program should work.
#include <stdio.h>
void rearrange(int n, int *output) {
int lsd = 0, slsd = 0;
if(n == 0)
return;
if(n > 0) {
lsd = n%10;
}
if (n > 9) {
slsd = (n%100)/10;
}
*output = 100*(*output) + 10*slsd + lsd;
n = n/100;
rearrange(n, output);
}
int main() {
int n;
int output = 0;
scanf("%d", &n);
rearrange(n, &output);
printf("%d\n", output);
return 0;
}
It is simple to understand, so I am not writing any comments.
Note that it is tail-recursive. With -O2 optimization, compilers usually turn the recursive call into a jump, so it can recurse to any depth without growing the stack.
Try this:
#include <stdio.h>
unsigned long pairinv(unsigned long number, unsigned long result) {
if (number == 0) return result; // no pairs left: return the result (testing number, not the pair, so a "00" pair as in 120034 doesn't stop early)
unsigned long n = number % 100; // gets the lowest two-digit pair
result = result * 100 + n; // multiplies the result by 100 and adds n
return pairinv(number / 100, result); // continues by recursion on the remaining pairs
}
int main() {
unsigned long r= 0;
printf("%lu\n", pairinv(123456, r)); //==> 563412
return 0;
}

C Program crashes(Segmentation Fault) for large size of input array. How to prevent it without using static/global/malloc?

The following program sorts a large array of random numbers using heapsort. The output of the program is the total execution time of the recursive heapSort function (in microseconds). The size of the input array is defined by SIZE.
The program works fine for SIZE up to 1 million (1000000). But when I try to execute it with SIZE set to 10 million (10000000), it fails with a segmentation fault (core dumped).
Note: I have already tried increasing the soft and hard limits of the stack to 128 MB using the ulimit -s command on Linux. The SEGFAULT still persists.
Please suggest any changes to the code, or any other method, that will overcome the SEGFAULT without declaring the array dynamically or as global/static.
/* Program to implement Heap-Sort algorithm */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
long SIZE = 10000000; // Or #define SIZE 10000000
long heapSize;
void swap(long *p, long *q)
{
long temp = *p;
*p = *q;
*q = temp;
}
void heapify(long A[], long i)
{
long left, right, index_of_max;
left = 2*i + 1;
right = 2*i + 2;
if(left<heapSize && A[left]>A[i])
index_of_max = left;
else
index_of_max = i;
if(right<heapSize && A[right]>A[index_of_max])
index_of_max = right;
if(index_of_max != i)
{
swap(&A[index_of_max], &A[i]);
heapify(A, index_of_max);
}
}
void buildHeap(long A[])
{
long i;
for(i=SIZE/2; i>=0 ; i--)
heapify(A,i);
}
void heapSort(long A[])
{
long i;
buildHeap(A);
for(i=SIZE-1 ; i>=1 ; i--)
{
swap(&A[i], &A[0]);
heapSize--;
heapify(A, 0);
}
}
int main()
{
long i, A[SIZE];
heapSize = SIZE;
struct timespec start, end;
srand(time(NULL));
for(i = 0; i < SIZE; i++)
A[i] = rand() % SIZE;
/*printf("Unsorted Array is:-\n");
for(i = 0; i < SIZE; i++)
printf("%li\n", A[i]);
*/
clock_gettime(CLOCK_MONOTONIC_RAW, &start);//start timer
heapSort(A);
clock_gettime(CLOCK_MONOTONIC_RAW, &end);//end timer
//To find time taken by heapsort by calculating difference between start and stop time.
unsigned long delta_us = (end.tv_sec - start.tv_sec) * 1000000 \
+ (end.tv_nsec - start.tv_nsec) / 1000;
/*printf("Sorted Array is:-\n");
for(i = 0; i < SIZE; i++)
printf("%li\n", A[i]);
*/
printf("Heapsort took %lu microseconds for sorting of %li elements\n",delta_us, SIZE);
return 0;
}
If you plan to stick with a stack-only approach, you have to understand who the main consumers of your stack space are.
Player #1: the array A[] itself. Depending on the OS/build (whether long is 4 or 8 bytes), it consumes approx. 40 or 80 MB of stack. One-time only.
Player #2: beware of recursion! In your case, this is the heapify() function. Each nested call consumes a stack frame (return address, saved registers, locals, alignment padding). The calls are only nested along one root-to-leaf path, so the depth is bounded by the heap height, about log2(SIZE). That is roughly 24 frames for 10 million elements, which is small next to the array. Still, re-implementing heapify() non-recursively removes this stack pressure entirely.

Random matrix struct creation

I'm trying to make a struct that generates a random matrix and am getting "error: expected '=', ',', ';', 'asm' or '__attribute__' before 'matrix'" when compiling. How can I get this to work effectively and efficiently?
I guess "expected" errors are usually caused by typos, but I don't see any.
I'm very new to C so pointers and malloc are quite foreign to me. I really appreciate your help.
/* It's called RandomMatrixMaker.c */
#include <stdio.h>
#include <stdlib.h>
typdef struct {
char* name;
int MID;
int MRows;
int MCols;
long[][]* MSpace;
} matrix;
matrix makeRIDMatrix(char* name, int MID, int MRows, int MCols) {
matrix m;
static int i, j, r;
m.name = name;
m.MID = MID;
m.MRows = MRows;
m.MCols = MCols;
for (i=0; i<m.MRows; i++) {
for (j=0; i<m.MCols; j++) {
r = random(101);
*(m.MSpace[i][j]) = r;
}
}
return m;
}
int main(void) {
makeRIDMatrix("test", 1, 10, 10);
return 0;
}
There is indeed a typo. You misspelled typedef:
typdef struct {
should be:
typedef struct {
EDIT:
Also, there's no reason to use static here:
static int i, j, r;
You can just get rid of the static modifier.
int i, j, r;
As another poster mentioned, there's a typo, but even with that corrected, it wouldn't compile, due to the definition of matrix.MSpace.
Let's begin in makeRIDMatrix(). You've declared an automatic (stack) variable of type "matrix". At the end of the function, you return that object. Whilst this is permissible, it's not advisable. If the struct is large, you will be copying a lot of data unnecessarily. Better to pass a pointer to a matrix into makeRIDMatrix(), and have makeRIDMatrix() fill in the contents.
The test in the inner loop is against i, but should be against j.
Next, let's look at the definition of "matrix". The definition of "MSpace" is a mess, and wouldn't even compile. Even if it did, because you haven't defined the length of a row, the compiler would not be able to calculate the offset to any given item in the array. You want a two-dimensional array without giving the row length, but you can't do that in C. You can in other languages, but not C.
There's a lot more I could point out, but I'd be missing the real point. The real point is this:
C Is Not Java.
(It's also not one of the interpreted languages such as JavaScript, PHP, Python, Ruby and so on.)
You don't get dynamically-expanding arrays; you don't get automatic allocation of memory; you don't get garbage collection of unreferenced memory.
What you need is something more like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
typedef struct {
char* name;
int MID;
unsigned int MRows;
unsigned int MCols;
long *MSpace;
} matrix;
void makeRIDMatrix(matrix *pmx, char* name, int MID,
unsigned int MRows, unsigned int MCols) {
int i, j;
long *MSpace = malloc(sizeof(*MSpace)*MRows*MCols);
if (MSpace == NULL) {
return;
}
pmx->name = name;
pmx->MID = MID;
pmx->MRows = MRows;
pmx->MCols = MCols;
pmx->MSpace = MSpace;
srandom((unsigned int)time(NULL));
for (i=0; i<MRows; i++) {
for (j=0; j<MCols; j++) {
long int r = random() % 101L;
*(MSpace++) = r;
}
}
}
static inline long * item_addr(const matrix *pmx,
unsigned int row, unsigned int col) {
if (pmx == NULL || pmx->MSpace == NULL
|| row >= pmx->MRows || col >= pmx->MCols) {
return NULL;
}
return &(pmx->MSpace[row * pmx->MCols + col]);
}
long get_item(const matrix *pmx, unsigned int row, unsigned int col) {
long *addr = item_addr(pmx, row, col);
return addr == NULL ? 0L : *addr;
}
void set_item(matrix *pmx,
unsigned int row, unsigned int col,
long val) {
long *addr = item_addr(pmx, row, col);
if (addr != NULL) {
*addr = val;
}
}
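int main(void) {
matrix m = {0}; // zero-initialized, so m.MSpace stays NULL if allocation fails
makeRIDMatrix(&m, "test", 1, 10, 10);
free(m.MSpace); // the caller must free what makeRIDMatrix() allocated
return 0;
}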
Note a few things here. Firstly, for efficiency, I fill the array as if it were one-dimensional. All subsequent get/set of array items should be done through the getter/setter functions, for safety.
Secondly, a hidden nasty: makeRIDMatrix() uses malloc() to allocate the memory, but it's going to be the job of the calling function (or its successors) to explicitly free() the allocated pointer when it's finished with it.
Thirdly, I've changed the rows/cols variables to unsigned int: there's little sense in defining an array with negative indices!
Fourthly: little error checking. For example, makeRIDMatrix() neither knows nor cares whether the parameter values are sensible (e.g. the matrix pointer isn't checked for NULLness). That's an exercise for the student.
Fifthly, I've fixed your random number usage - after a fashion. Another exercise for the student: why is the way I did it not good practice?
However - all of this is moot. You need to get yourself a good C textbook, or a good online course, and work through the examples. The code you've given here shows that you're punching above your weight at the moment, and you need to develop some more C muscles before going into that ring!
In relation to your question about "variable sized arrays", you could have something like:
/* can stick this into your struct, this is just an example */
size_t rows, cols, i;
long **matrix;
/* set the values of rows, cols */
/* create the "array" of rows (array of pointers to longs) */
matrix = (long**)malloc(rows * sizeof(long*));
/* create the array of columns (array of longs at each row) */
for (i = 0; i < rows; i++)
matrix[i] = (long*)malloc(cols * sizeof(long));
/* ... */
/* free the memory at the end */
for (i = 0; i < rows; i++)
free(matrix[i]);
free(matrix);
Then you can access the dynamically allocated matrix just like any other array of arrays.
For example, to set the element in the first row (row 0) and fourth column (column 3) to 5:
matrix[0][3] = 5;

Sorting strings with C

My aim is to write an app which generates an array of strings (each randomly filled with a string of length 4) and sorts this array. The time this process takes should be measured. I coded the following:
#include <string.h>
#include <jni.h>
#include <time.h>
#include <math.h>
clock_t start, finish;
static int ARRAY_LENGTH = 200;
static int WORD_LENGTH = 4;
char values[200];
void sortStringArray(void){
int i, j;
for(i = 0; i < ARRAY_LENGTH; i++){
for(j = 0; j < ARRAY_LENGTH-1; j++){
if(strcmp(values[j], values[j+1]) > 0) {
char holder = values[j+1];
values[j+1] = values[j];
values[j] = holder;
}
}
}
}
char generateRandomChar(char aC[]){
int length = strlen(aC);
char randStr[WORD_LENGTH];
int m;
for(m = 0; m <WORD_LENGTH; m++){
int randNr = rand()%length;
randStr[m] = aC[randNr];
}
return randStr;
}
void fillStringArray(void)
{
char allowedChars[] = "abcdefghijklmnopqrstuvwxyz";
int k;
for(k = 0; k < ARRAY_LENGTH; k++){
char randStr = generateRandomChar(allowedChars);
values[k] = randStr;
}
}
double
Java_com_example_hellojni_HelloJni_processStringSort( JNIEnv* env, jobject thiz)
{
start = clock();
fillStringArray();
sortStringArray();
finish = clock();
return ((double)(finish - start));
}
Since I am pretty new to C, I am not that familiar with the concept of pointers, and therefore I get some errors.
Screenshot of the errors: http://img38.imageshack.us/img38/2894/androidndkdebugc.jpg
It would be helpful if somebody could explain to me where it would be useful to use a pointer in this code. Some help with the errors would be much appreciated.
Thanks! Ripei
Without rewriting your code from scratch, it is difficult to know where to start. I'm afraid it is all wrong. In order to get a good understanding of pointer and character-string use in C, you must read a good, authoritative book on the language. Luckily, C has one of the best such books in the world: The C Programming Language. If you haven't already got a copy, get one, and if you have, re-read the chapters on pointers and strings.
Well, for one thing, you seem to think that char means string... sometimes? char means a single character, a small integer (typically 0 to 255 or -128 to 127, depending on whether char is signed). As the warnings on line 15 say, values[j] and values[j+1] are not strings (char *), they are characters (char). You probably want to make values an array of strings, i.e., an array of arrays of characters.
The second set of warnings relates to line 31, where you're returning an array of characters (which decays to a pointer) from a function that is declared to return a single character. The compiler converts the pointer to a char (with a warning) and returns that, so you'll end up with a garbage value, which is probably not what you want.
To fix this, you'll have to make the function return a char *, but there's a catch: randStr ceases to exist as soon as the function returns, so you can't return a pointer to it. You could use strdup to duplicate the string (after making sure it is '\0'-terminated) and, after you're done using it in your main function, call free to release it.
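While we're on this function: in a parameter list, char aC[] is adjusted to char *aC, so the two declarations mean the same thing there. Writing char * simply makes it clearer that the function receives a pointer, not a copy of the array.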
The last message (apparently the only one reported as an error) is because you didn't declare rand(). Adding #include <stdlib.h> at the beginning of the program should fix it.
Thank you very much, Blindy, for your hints. I tried to implement them. Now the program doesn't throw errors, but the problem is that I can't check whether the operation is done correctly in the programming environment I have to work with. Do you think the code shown below is correct? Also, it takes very little time: 11 ms. Am I calculating this correctly?
Neil Butterworth, well, you're probably right, but I had to start somehow... and I tried my best to do so.
Vinko Vrsalovic, well, you're not right ;) I did it step by step, but I thought it was better to show you the whole program and all the errors at once.
#include <string.h>
#include <jni.h>
#include <time.h>
#include <stdlib.h>
#include <sys/time.h> // for gettimeofday()
jlong start, finish; // 64-bit, so milliseconds since the epoch don't overflow
static int ARRAY_LENGTH = 500;
static int WORD_LENGTH = 4;
static int LOOPS = 10;
char *values[1000];
static jlong getTime(void){
struct timeval now;
gettimeofday(&now, NULL);
return (jlong)now.tv_sec*1000 + now.tv_usec/1000; // widen before multiplying
}
void sortStringArray(void){
int i, j;
for(i = 0; i < ARRAY_LENGTH; i++){
for(j = 0; j < ARRAY_LENGTH-1; j++){
if(strcmp(values[j], values[j+1]) > 0) {
char *holder = values[j+1];
values[j+1] = values[j];
values[j] = holder;
}
}
}
}
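char* generateRandomChar(char *aC){
int length = strlen(aC);
char randStr[WORD_LENGTH + 1]; // +1 for the terminating '\0'
int m;
for(m = 0; m <WORD_LENGTH; m++){
int randNr = rand()%length;
randStr[m] = aC[randNr];
}
randStr[WORD_LENGTH] = '\0'; // strdup() needs a '\0'-terminated string
return strdup(randStr);
}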
void fillStringArray(void)
{
char *allowedChars = "abcdefghijklmnopqrstuvwxyz";
int k;
for(k = 0; k < ARRAY_LENGTH; k++){
char *randStr = generateRandomChar(allowedChars);
free(values[k]); // release the string from the previous loop (free(NULL) does nothing)
values[k] = randStr;
}
}
jlong
Java_com_example_hellojni_HelloJni_processStringSort( JNIEnv* env, jobject thiz)
{
start = getTime();
int i;
for(i = 0; i < LOOPS; i++){
fillStringArray();
sortStringArray();
}
finish = getTime();
return (finish - start);
}

Algorithm to convert infinitely long base 2^32 number to printable base 10

I'm representing an infinitely precise integer as an array of unsigned ints for processing on a GPU. For debugging purposes I'd like to print the base 10 representation of one of these numbers, but am having difficulty wrapping my head around it. Here's what I'd like to do:
//the number 4*(2^32)^2+5*(2^32)^1+6*(2^32)^0
unsigned int aNumber[3] = {4,5,6};
char base10TextRepresentation[50];
convertBase2To32ToBase10Text(aNumber,base10TextRepresentation);
Any suggestions on how to approach this problem?
Edit: Here's a complete implementation thanks to drhirsch
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#define SIZE 4
uint32_t divideBy10(uint32_t * number) {
uint32_t r = 0;
uint32_t d;
for (int i=0; i<SIZE; ++i) {
d = (number[i] + r*0x100000000) / 10;
r = (number[i] + r*0x100000000) % 10;
number[i] = d;
}
return r;
}
int zero(uint32_t* number) {
for (int i=0; i<SIZE; ++i) {
if (number[i] != 0) {
return 0;
}
}
return 1;
}
void swap(char *a, char *b) {
char tmp = *a;
*a = *b;
*b = tmp;
}
void reverse(char *str) {
int x = strlen(str);
for (int y = 0; y < x/2; y++) {
swap(&str[y],&str[x-y-1]);
}
}
void convertTo10Text(uint32_t* number, char* buf) {
int n = 0;
do {
int digit = divideBy10(number);
buf[n++] = digit + '0';
} while(!zero(number));
buf[n] = '\0';
reverse(buf);
}
int main(int argc, char** argv) {
uint32_t aNumber[SIZE] = {0,0xFFFFFFFF,0xFFFFFFFF,0xFFFFFFFF};
uint32_t bNumber[4] = {1,0,0,0};
char base10TextRepresentation[50];
convertTo10Text(aNumber, base10TextRepresentation);
printf("%s\n",base10TextRepresentation);
convertTo10Text(bNumber, base10TextRepresentation);
printf("%s\n",base10TextRepresentation);
}
If you have access to 64-bit arithmetic, it is easier. I would do something along the lines of:
uint32_t divideBy10(uint32_t* number) {
uint64_t r = 0;
for (int i=0; i<SIZE; ++i) {
uint64_t cur = (r << 32) | number[i]; // carry the remainder into this limb
number[i] = (uint32_t)(cur / 10);
r = cur % 10;
}
return (uint32_t)r; // the remainder is the next decimal digit
}
void convertTo10Text(uint32_t* number, char* buf) {
char *p = buf; // keep buf pointing at the start for reverse()
do {
uint32_t digit = divideBy10(number);
*p++ = (char)(digit + '0');
} while (!isEqual(number, zero)); // zero: an all-zero number of SIZE limbs
*p = '\0';
reverse(buf);
}
isEqual() and reverse() are left to be implemented. divideBy10() divides by 10 and returns the remainder.
Fundamentally, you need classic decimal printing: repeatedly divide your number (in your base 2^32) by ten and use the remainders as digits, least significant first. You may not have a divide-by-(anything, let alone) 10 routine, which is probably the key source of your problem.
If you are working in C or C++, you can get a complete arbitrary-precision arithmetic package from the GNU Multiple Precision Arithmetic Library (GMP). Most other widely used languages have similar packages available.
Of course, if you have too much free time, you can always implement multiprecision division yourself. You're already borrowing terminology from Knuth; he also supplies the multiprecision algorithms in Seminumerical Algorithms.
If it is .NET, take a look at this implementation of a BigInteger class.
How about using long doubles? On x86, long double is typically the 80-bit extended format, which has a 64-bit mantissa, so I guess accuracy is lost for larger numbers, as with any floating-point type.
