How to solve a runtime error happening when I use a big size of static array - arrays

my development environment : visual studio
Now, I have to create a input file and print random numbers from 1 to 500000 without duplicating in the file. First, I considered that if I use a big size of local array, problems related to heap may happen. So, I tried to declare as a static array. Then, in main function, I put random numbers without overlapping in the array and wrote the numbers in input file accessing array elements. However, runtime errors(the continuous blinking of the cursor in the console window) continue to occur.
The source code is as follows.
#define SIZE 500000
int sort[500000];
int main()
{
FILE* input = NULL;
input = fopen("input.txt", "w");
if (sort != NULL)
{
srand((unsigned)time(NULL));
for (int i = 0; i < SIZE; i++)
{
sort[i] = (rand() % SIZE) + 1;
for (int j = 0; j < i; j++)
{
if (sort[i] == sort[j])
{
i--;
break;
}
}
}
for (int i = 0; i < SIZE; i++)
{
fprintf(input, "%d ", sort[i]);
}
fclose(input);
}
return 0;
}
When I tried to reduce the array size from 1 to 5000, it has been implemented. So, Carefully, I think it's a memory out phenomenon. Finally, I'd appreciate it if you could comment on how to solve this problem.

“First, I considered that if I use a big size of local array, problems related to heap may happen.”
That does not make any sense. Automatic local objects generally come from the stack, not the heap. (Also, “heap” is the wrong word; a heap is a particular kind of data structure, but the malloc family of routines may use other data structures for managing memory. This can be referred to simply as dynamically allocated memory or allocated memory.)
However, runtime errors(the continuous blinking of the cursor in the console window)…
Continuous blinking of the cursor is normal operation, not a run-time error. Perhaps you are trying to say your program continues executing without ever stopping.
#define SIZE 500000
...
sort[i] = (rand() % SIZE) + 1;
The C standard only requires rand to generate numbers from 0 to 32767. Some implementations may provide more. However, if your implementation does not generate numbers up to 499,999, then it will never generate the numbers required to fill the array using this method.
Also, using % to reduce the rand result skews the distribution. For example, if we were reducing modulo 30,000, and rand generated numbers from 0 to 44,999, then rand() % 30000 would generate the numbers from 0 to 14,999 each two times out of every 45,000 and the numbers from 15,000 to 29,999 each one time out of every 45,000.
for (int j = 0; j < i; j++)
So this algorithm attempts to find new numbers by rejecting those that duplicate previous numbers. When working on the last of n numbers, the average number of tries is n, if the selection of random numbers is uniform. When working on the second-to-last number, the average is n/2. When working on the third-to-last, the average is n/3. So the average number of tries for all the numbers is n + n/2 + n/3 + n/4 + n/5 + … + 1.
For 5,000 elements, this sum is around 45,472.5. For 500,000 elements, it is around 6,849,790. So your program will average around 150 times as many tries with 500,000 elements as with 5,000. However, each try also takes longer: For the first selection, you check against zero prior elements for duplicates. For the second, you check against one prior element. For selection n, you check against n−1 elements. So, for the last of 500,000 elements, you check against 499,999 elements, and, on average, you have to repeat this 500,000 times. So the last selection takes around 500,000•499,999 = 249,999,500,000 units of work.
Refining this estimate, for each selection i, a successful attempt that gets completely through the loop of checking requires checking against all i−1 prior numbers. An unsuccessful attempt will average going halfway through the prior numbers. So, for selection i, there is one successful check of i−1 numbers and, on average, (i−1)/(n+1−i) unsuccessful checks of an average of (i−1)/2 numbers (the average of n/(n+1−i) tries includes the one successful try).
For 5,000 numbers, the average number of checks will be around 107,455,347. For 500,000 numbers, the average will be around 1,649,951,055,183. Thus, your program with 500,000 numbers takes more than 15,000 times as long as with 5,000 numbers.
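The try counts quoted above can be checked with a few lines of C. This is only a sketch: expected_tries is a name made up for illustration, and it evaluates the sum n + n/2 + n/3 + … + 1 directly, assuming a perfectly uniform random source.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): expected number of random
 * draws needed to fill n slots by rejecting duplicates, i.e. the sum
 * n/1 + n/2 + ... + n/n, assuming uniform draws. */
double expected_tries(long n)
{
    assert(n > 0);
    double sum = 0.0;
    for (long k = 1; k <= n; k++)
        sum += (double)n / (double)k;
    return sum;
}
```

Calling expected_tries(5000) and expected_tries(500000) should give values close to the 45,472.5 and 6,849,790 mentioned above.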
When I tried to reduce the array size from 1 to 5000, it has been implemented.
I think you mean that with an array size of 5,000, the program completes execution in a short amount of time?
So, Carefully, I think it's a memory out phenomenon.
No, there is no memory issue here. Modern general-purpose computer systems easily handle static arrays of 500,000 int.
Finally, I'd appreciate it if you could comment on how to solve this problem.
Use a Fisher-Yates shuffle: Fill the array A with integers from 1 to SIZE. Set a counter, say d, to the number of selections completed so far, initially zero. Then pick a random index r from d to SIZE-1, that is, one of the SIZE-d positions not yet selected. Move the number in that position of the array to the front by swapping A[r] with A[d]. Then increment d. Repeat until d reaches SIZE-1.
This will swap a random element of the initial array into A[0], then a random element from those remaining into A[1], then a random element from those remaining into A[2], and so on. (We stop when d reaches SIZE-1 rather than when it reaches SIZE because, once d reaches SIZE-1, there is only one more selection to make, but there is also only one number left, and it is already in the last position in the array.)
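A minimal sketch of that shuffle in C follows. The names random_below and fill_shuffled are invented for illustration. Because rand is only guaranteed to reach 32767, the helper glues two 15-bit draws together and rejects values that would skew the distribution; this is one possible workaround, not the only one, and the quality of the result still depends on your implementation's rand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: uniform value in [0, bound) for 1 <= bound <= 2^30.
 * Combines the low 15 bits of two rand() calls (RAND_MAX is only
 * guaranteed to be 32767) and rejects draws that would cause modulo bias. */
unsigned long random_below(unsigned long bound)
{
    const unsigned long range = 1UL << 30;
    assert(bound >= 1 && bound <= range);
    const unsigned long limit = range - range % bound;
    unsigned long r;
    do {
        unsigned long hi = (unsigned long)rand() & 0x7FFFUL;
        unsigned long lo = (unsigned long)rand() & 0x7FFFUL;
        r = (hi << 15) | lo;
    } while (r >= limit);
    return r % bound;
}

/* Fill a[0..n-1] with 1..n, then shuffle in place (Fisher-Yates). */
void fill_shuffled(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i + 1);
    for (size_t d = 0; d + 1 < n; d++) {
        /* Pick one of the n-d positions not yet selected: d .. n-1. */
        size_t r = d + (size_t)random_below((unsigned long)(n - d));
        int tmp = a[d];
        a[d] = a[r];
        a[r] = tmp;
    }
}
```

In the program from the question, you would call srand((unsigned)time(NULL)) once, then fill_shuffled(sort, SIZE), and then write the array to the file as before. The whole shuffle takes on the order of SIZE steps, so 500,000 elements finish almost instantly.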

Related

Cycling through interval in C efficiently

I have dynamically allocated array consisting of a lot of numbers (200 000+) and I have to find out, if (and how many) these numbers are contained in given interval. There can be duplicates and all the numbers are in random order.
Example of numbers I get at the beginning:
{1,2,3,1484984,48941651,489416,1816,168189161,6484,8169181,9681916,121,231,684979,795641,231484891,...}
Given interval:
<2;150000>
I created a simple algorithm with 2 for loops cycling through all numbers:
for( int j = 0; j <= numberOfRepeats; j++){
for( int i = 0; i < arraySize; i++){
if(currentNumber == array[i]){
counter++;
}
}
currentNumber++;
}
printf(" -> %d\n", counter);
}
This algorithm is too slow for my task. Is there more efficient way for me to implement my solution? Could sorting the arrays by value help in this case / wouldn't that be too slow?
Example of working program:
{ 1, 7, 22, 4, 7, 5, 11, 9, 1 }
<4;7>
-> 4
The problem was simple as the single comment in my question answered it - there was no reason for second loop. Single loop could do it alone.
My changed code:
for(int i = 0; i <= arraySize-1; i++){
if(array[i] <= endOfInterval && array[i] >= startOfInterval){
counter++;
}
}
This algorithm is too slow for my task. Is there more efficient way for me to implement my solution? Could sorting the arrays by value help in this case / wouldn't that be too slow?
Of course it is slow. A single-pass algorithm that counts the elements in the interval should suffice: go through the array once and count each element that passes the test (n[i] >= lower bound && n[i] < upper bound, or a similar test).
Only if you need to treat duplicates specially (e.g. not counting them) do you need to track whether you have already seen a value. In that case, sorting will be faster: a qsort(3) call is O(n log(n)), against the O(n*n) of your double loop, so it runs in almost linear time. You then make a second pass over the data, which makes the total O(n log(n) + n), still lower than O(n*n) for the large amount of data you have.
Sorting has the advantage that puts all the repeated key values together, so you have to consider only if the last element you read was the same as the one you are processing now, if it is different, then count it only if it is in the specified range.
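The sort-then-scan idea can be sketched as follows. This assumes duplicates should be counted only once; compare_ints and count_distinct_in_range are names chosen for illustration, and the array is sorted in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort; avoids the overflow risk of "x - y". */
int compare_ints(const void *p, const void *q)
{
    int x = *(const int *)p;
    int y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Sorts a[] in place, then counts each distinct value in [lo, hi] once. */
size_t count_distinct_in_range(int *a, size_t n, int lo, int hi)
{
    assert(a && lo <= hi);
    size_t count = 0;
    qsort(a, n, sizeof a[0], compare_ints);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && a[i] == a[i - 1])
            continue;               /* same key as the previous element */
        if (a[i] >= lo && a[i] <= hi)
            count++;
    }
    return count;
}
```

For the example array from the question with the interval <4;7>, this counts the distinct values 4, 5 and 7, whereas the plain single pass counts the repeated 7 as well.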
One final note: reading a set of 200,000 integers into an array in order to filter them by some criterion is normally a bad, non-scalable way to solve a problem. Your problem (selecting the elements that belong to a given interval) allows a better, scalable solution by streaming: you read a number, check whether it is in the interval, then output it, or count it, or do whatever you like with it, without using a large amount of memory to hold all the numbers before starting. That is a far better way to solve the problem, as it lets you read a truly unbounded set of numbers (coming e.g. from a file) and produce an output based on it:
#include <stdio.h>
#define A (2)
#define B (150000)
int main()
{
int the_number;
size_t count = 0;
int res;
while ((res = scanf("%d", &the_number)) > 0) {
if (the_number >= A && the_number <= B)
count++;
}
printf("%zu numbers fitted in the range\n", count);
}
In this example you can give the program 1.0E26 numbers (assuming that you have a file system large enough to hold a file of that size) and it will be able to handle them (you cannot create an array with the capacity to hold 10^26 values).

Given an array of integers of size n+1 consisting of the elements [1,n]. All elements are unique except one which is duplicated k times

I have been attempting to solve the following problem:
You are given an array of n+1 integers where all the elements lies in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
#include <vector>
struct LatticePoint{ // to hold duplicate and k
    int a;
    int b;
    LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};
LatticePoint findDuplicateAndK(const std::vector<int>& A){
    int n = A.size() - 1;
    std::vector<int> Numbers (n);
    for(int i = 0; i < n + 1; ++i){
        ++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-access
    }
    for(int i = 0; i < n; ++i){
        if(Numbers[i] > 1) {
            int duplicate = i + 1;
            int k = Numbers[i] - 1;
            return LatticePoint{duplicate, k};
        }
    }
    return LatticePoint{0, 0}; // not reached for valid input
}
So, the basic idea is this: we go along the array and each time we see the number A[i] we increment the value of Numbers[A[i] - 1]. Since only the duplicate appears more than once, the index (plus one) of the entry of Numbers with value greater than 1 must be the duplicate number, and the value of that entry minus 1 is the number of duplications. This algorithm is O(n) in time complexity and O(n) in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).
You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)
That's still O(n) space, but the constant factor is a lot smaller.
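A rough sketch of the "n bits" idea, written in plain C rather than with boost::dynamic_bitset, could look like this. The function name and the return convention are assumptions made for illustration. It assumes the input really has the stated form: len = n+1 values, all in [1, n], with exactly one repeated value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores the repeated value in *dup and its number of occurrences in
 * *count. Returns 0 on success, -1 if allocation fails or no repeat
 * is found. */
int find_dup_bitset(const int *a, size_t len, int *dup, size_t *count)
{
    assert(a && dup && count && len >= 2);
    const size_t bits = sizeof(unsigned) * CHAR_BIT;
    unsigned *seen = calloc(len / bits + 1, sizeof *seen); /* one bit per value */
    if (!seen)
        return -1;
    int rc = -1;
    for (size_t i = 0; i < len; i++) {
        size_t v = (size_t)a[i];
        if (seen[v / bits] & (1u << (v % bits))) {
            /* Second sighting: from here on only this value is counted. */
            *dup = a[i];
            *count = 2;
            for (size_t j = i + 1; j < len; j++)
                if (a[j] == a[i])
                    ++*count;
            rc = 0;
            break;
        }
        seen[v / bits] |= 1u << (v % bits);
    }
    free(seen);
    return rc;
}
```

The count returned here is the total number of occurrences, so it is one more than the k used in the question.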
The idea of your code works.
But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.
If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.
So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) data, and with ceil(log_2(n)) passes, each taking time O(n). Therefore we get the answer in time O(n log(n)).
Now we don't need to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?
Well, you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) buckets. And 3 passes is possible with the cube root. And so on.
So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most space and the least time.
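The two-bucket end of that family might be sketched in C as below. find_dup_by_halving is a hypothetical name, and the code assumes the input has the stated form (len = n+1 values in [1, n], exactly one repeated value). It keeps only a few integers of state and makes about log2(n) counting passes plus one final pass to count occurrences.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the repeated value and stores its number of occurrences in
 * *count. Halves the candidate value range [lo, hi] on every pass. */
int find_dup_by_halving(const int *a, size_t len, size_t *count)
{
    assert(a && count && len >= 2);
    int lo = 1, hi = (int)(len - 1);
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        size_t in_lower = 0;
        for (size_t i = 0; i < len; i++)
            if (a[i] >= lo && a[i] <= mid)
                in_lower++;
        /* More elements than distinct values: the repeat is in [lo, mid]. */
        if (in_lower > (size_t)(mid - lo + 1))
            hi = mid;
        else
            lo = mid + 1;
    }
    *count = 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] == lo)
            ++*count;
    return lo;
}
```

The halving is safe because the half that does not contain the repeated value holds only distinct values, so it can never have more elements than values; the overflow therefore always sits in the other half.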
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
#include <utility>
#include <vector>
std::pair<int, int> count_dup(std::vector<int> A) {
/* Try to put each element in its "home" position (that is,
* where the value is the same as the index). Since the
* values start at 1, A[0] isn't home to anyone, so we start
* the loop at 1.
*/
int n = A.size();
for (int i = 1; i < n; ++i) {
while (A[i] != i) {
int j = A[i];
if (A[j] == j) {
/* j is the duplicate. Now we need to count them.
* We have one at i. There's one at j, too, but we only
* need to add it if we're not going to run into it in
* the scan. And there might be one at position 0. After that,
* we just scan through the rest of the array.
*/
int count = 1;
if (A[0] == j) ++count;
if (j < i) ++count;
for (++i; i < n; ++i) {
if (A[i] == j) ++count;
}
return std::make_pair(j, count);
}
/* This swap can only happen once per element. */
std::swap(A[i], A[j]);
}
}
/* If we get here, every element from 1 to n is at home.
* So the duplicate must be A[0], and the duplicate count
* must be 2.
*/
return std::make_pair(A[0], 2);
}
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.

Making a character array rotate its cells left/right n times

I'm totally new here but I heard a lot about this site and now that I've been accepted for a 7 months software development 'bootcamp' I'm sharpening my C knowledge for an upcoming test.
I've been assigned a question on a test that I've passed already, but I did not finish that question and it bothers me quite a lot.
The question was a task to write a program in C that moves a character (char) array's cells by 1 to the left (it doesn't quite matter in which direction for me, but the question specified left). And I also took upon myself NOT to use a temporary array/stack or any other structure to hold the entire array data during execution.
So a 'string' or array of chars containing '0' '1' '2' 'A' 'B' 'C' will become
'1' '2' 'A' 'B' 'C' '0' after using the function once.
Writing this was no problem, I believe I ended up with something similar to:
void ArrayCharMoveLeft(char arr[], int arrsize, int times) {
int i;
for (i = 0; i <= arrsize ; i++) {
ArraySwap2CellsChar(arr, i, i+1);
}
}
As you can see the function is somewhat modular since it allows to input how many times the cells need to move or shift to the left. I did not implement it, but that was the idea.
As far as I know there are 3 ways to make this:
Loop ArrayCharMoveLeft times times. This feels instinctively inefficient.
Use recursion in ArrayCharMoveLeft. This should resemble the first solution, but I'm not 100% sure on how to implement this.
This is the way I'm trying to figure out: No loop within loop, no recursion, no temporary array, the program will know how to move the cells x times to the left/right without any issues.
The problem is that after swapping say N times of cells in the array, the remaining array size - times are sometimes not organized. For example:
Using ArrayCharMoveLeft with 3 as times with our given array mentioned above will yield
ABC021 instead of the expected value of ABC012.
I've run the following function for this:
int i;
char* lastcell;
if (!(times % arrsize))
{
printf("Nothing to move!\n");
return;
}
times = times % arrsize;
// Input checking. in case user inputs multiples of the array size, auto reduce to array size reminder
for (i = 0; i < arrsize-times; i++) {
printf("I = %d ", i);
PrintArray(arr, arrsize);
ArraySwap2CellsChar(arr, i, i+times);
}
As you can see the for runs from 0 to array size - times. If this function is used, say with an array containing 14 chars. Then using times = 5 will make the for run from 0 to 9, so cells 10 - 14 are NOT in order (but the rest are).
The worst thing about this is that the remaining cells always maintain the sequence, but at different position. Meaning instead of 0123 they could be 3012 or 2301... etc.
I've run different arrays on different times values and didn't find a particular pattern such as "if remaining cells = 3 then use ArrayCharMoveLeft on remaining cells with times = 1).
It always seem to be 1 out of 2 options: the remaining cells are in order, or shifted with different values. It seems to be something similar to this:
times   shift+direction to align
1       0
2       0
3       0
4       1R
5       3R
6       5R
7       3R
8       1R
the numbers change with different times and arrays. Anyone got an idea for this?
even if you use recursion or loops within loops, I'd like to hear a possible solution. Only firm rule for this is not to use a temporary array.
Thanks in advance!
If irrespective of efficiency or simplicity for the purpose of studying you want to use only exchanges of two array elements with ArraySwap2CellsChar, you can keep your loop with some adjustment. As you noted, the given for (i = 0; i < arrsize-times; i++) loop leaves the last times elements out of place. In order to correctly place all elements, the loop condition has to be i < arrsize-1 (one less suffices because if every element but the last is correct, the last one must be right, too). Of course when i runs nearly up to arrsize, i+times can't be kept as the other swap index; instead, the correct index j of the element which is to be put at index i has to be computed. This computation turns out somewhat tricky, due to the element having been swapped already from its original place. Here's a modified variant of your loop:
for (i = 0; i < arrsize-1; i++)
{
printf("i = %d ", i);
int j = i+times;
while (arrsize <= j) j %= arrsize, j += (i-j+times-1)/times*times;
printf("j = %d ", j);
PrintArray(arr, arrsize);
ArraySwap2CellsChar(arr, i, j);
}
Use standard library functions memcpy, memmove, etc as they are very optimized for your platform.
Use the correct type for sizes - size_t not int
#include <string.h>
char *ArrayCharMoveLeft(char *arr, const size_t arrsize, size_t ntimes)
{
ntimes %= arrsize;
if(ntimes)
{
char temp[ntimes];
memcpy(temp, arr, ntimes);
memmove(arr, arr + ntimes, arrsize - ntimes);
memcpy(arr + arrsize - ntimes, temp, ntimes);
}
return arr;
}
But you want it without the temporary array (more memory efficient, very bad performance-wise):
char *ArrayCharMoveLeft(char *arr, size_t arrsize, size_t ntimes)
{
ntimes %= arrsize;
while(ntimes--)
{
char temp = arr[0];
memmove(arr, arr + 1, arrsize - 1);
arr[arrsize -1] = temp;
}
return arr;
}
https://godbolt.org/z/od68dKTWq
https://godbolt.org/z/noah9zdYY
Disclaimer: I'm not sure if it's common to share a full working code here or not, since this is literally my first question asked here, so I'll refrain from doing so assuming the idea is answering specific questions, and not providing an example solution for grabs (which might defeat the purpose of studying and exploring C). This argument is backed by the fact that this specific task is derived from a programing test used by a programing course and it's purpose is to filter out applicants who aren't fit for intense 7 months training in software development. If you still wish to see my code, message me privately.
So, with a great amount of help from @Armali I'm happy to announce the question is answered! Together we came up with a function that takes an array of characters in C (string), and without using any previously written libraries (such as strings.h), or even a temporary array, it rotates all the cells in the array N times to the left.
Example: using ArrayCharMoveLeft() on the following array with N = 5:
Original array: 0123456789ABCDEF
Updated array: 56789ABCDEF01234
As you can see the first cell (0) is now the sixth cell (5), the 2nd cell is the 7th cell and so on. So each cell was moved to the left 5 times. The first 5 cells 'overflow' to the end of the array and now appear as the Last 5 cells, while maintaining their order.
The function works with various array lengths and N values.
This is not any sort of achievement, but rather an attempt to execute the task with as few variables as possible (only 4 ints, besides the char array, also counting the sub function used to swap the cells).
It was achieved using a nested loop, so it is by no means efficient runtime-wise, just memory-wise, while still using self-coded functions, with no external libraries (except stdio.h).
Refer to Armali's posted solution, it should get you the answer for this question.

How Can I Reduce The Time Taken For The Code Execution Below?

Purpose: To store numbers between 1-1000 in a random order.
My Code:
#include<time.h>
int main(){
int arr[1000]={0}, store[1000];
for(int i=0;i<1000;i++){
int no;
while(1){
srand(time(0));
no=rand();
no%=1001;
if(no==0)
continue;
//This ensures Loop will continue till the time a unique random number is generated
if(arr[no-1]!=no){
arr[no-1]=no;
break;
}
}
store[i]=no;
}
For me the code works perfectly fine,however, it took me 58 minutes to execute. Is there a way to speed up the program?
Practical Purpose: I have around 4000 employees and I want to give each one of them a unique random number for an upcoming project.
I tried to execute a code using 1000 to check the efficiency.
Create an array containing 1 to n. Iterate through the list and swap that entry with one that is randomly selected. You will then have a random list containing 1 to n.
From your first sentence the numbers do not have to be random but only need to be in random order.
Therefore you can try a simple approach:
Create an array arr of n elements and initialize with values 1..n
run a loop (counter i) over range 0..n-1
Pick a random number x in range 0..n-i-1
Swap element at index i with index i+x
With this algorithm you don't need to worry about collisions of random numbers.
You swap the numbers and afterwards you decrease the range of candidates.
A number picked once is not available to pick in later steps.
This solution is similar to William's answer. I don't really know if the result has better "randomness" or not.
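The listed steps translate fairly directly into C. The following is a sketch under stated assumptions: shuffle_1_to_n is a made-up name, n must not exceed RAND_MAX + 1 (true for 1,000 or 4,000 on any conforming implementation), and rand() % m is slightly biased, which is usually acceptable for handing out IDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills arr[0..n-1] with the values 1..n in random order. */
void shuffle_1_to_n(int *arr, int n)
{
    assert(arr && n > 0 && n - 1 <= RAND_MAX);
    for (int i = 0; i < n; i++)
        arr[i] = i + 1;                 /* step 1: values 1..n */
    for (int i = 0; i < n; i++) {       /* step 2: i over 0..n-1 */
        int x = rand() % (n - i);       /* step 3: x in 0..n-i-1 */
        int tmp = arr[i];               /* step 4: swap i and i+x */
        arr[i] = arr[i + x];
        arr[i + x] = tmp;
    }
}
```

Call srand once at program start, then shuffle_1_to_n(store, 1000). The loop does n swaps and never retries, so it finishes in well under a second even for 4,000 employees.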
Try to avoid branches on random numbers; the code will most likely run faster on modern processors.
This is because the processor cannot predict which way a branch that depends on a random number will go.
For example
while(1){
srand(time(0));
no=rand();
no%=1001;
if(no==0)
continue;
// ...
}
could be changed to
srand(time(0)); // seed once, outside the loop: reseeding within the same second restarts the same sequence
while(1) {
no = rand() % 1000 + 1;
// ...
}

Best (fastest) way to find the number most frequently entered in C?

Well, I think the title basically explains my doubt. I will have n numbers to read; these n numbers go from 1 to x, where x is at most 10^5. What is the fastest way (the least possible running time) to find out which number was entered the most times? It is known that the number that appears most often appears more than half of the time.
What I've tried so far:
//for (1<=x<=10⁵)
int v[100000+1];
//multiple instances , ends when n = 0
while (scanf("%d", &n)&&n>0) {
zerofill(v);
for (i=0; i<n; i++) {
scanf("%d", &x);
v[x]++;
if (v[x]>n/2)
i=n;
}
printf("%d\n", x);
}
Zero-filling a array of x positions and increasing the position vector[x] and at the same time verifying if vector[x] is greater than n/2 it's not fast enough.
Any idea might help, thank you.
Observation: No need to care about amount of memory used.
The trivial solution of keeping a counter array is O(n) and you obviously can't get better than that. The fight is then about the constants and this is where a lot of details will play the game, including exactly what are the values of n and x, what kind of processor, what kind of architecture and so on.
On the other side this seems really the "knockout" problem, but that algorithm will need two passes over the data and an extra conditional, thus in practical terms in the computers I know it will be most probably slower than the array of counters solutions for a lot of n and x values.
The good point of the knockout solution is that you don't need to put a limit x on the values and you don't need any extra memory.
If you know already that there is a value with the absolute majority (and you simply need to find what is this value) then this could make it (but there are two conditionals in the inner loop):
initialize count = 0
loop over all elements
if count is 0 then set champion = element and count = 1
else if element != champion decrement count
else increment count
at the end of the loop your champion will be the value with the absolute majority of elements, if such a value is present.
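In C, the knockout pass described above might look like the sketch below. knockout_champion is a name chosen for illustration. The returned value is only meaningful if a majority value is known to exist, as in the question; otherwise a second pass is needed to verify it.

```c
#include <assert.h>
#include <stddef.h>

/* One knockout pass over x[0..n-1]. Returns the surviving champion,
 * which is the majority value whenever some value fills more than
 * half of the positions. */
int knockout_champion(const int *x, size_t n)
{
    assert(x && n > 0);
    int champion = x[0];
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            champion = x[i];        /* no champion standing: take this one */
            count = 1;
        } else if (x[i] != champion) {
            count--;                /* a different value knocks one out */
        } else {
            count++;
        }
    }
    return champion;
}
```

Unlike the counter array, this needs no zero-filling between test cases and puts no limit on the range of the values.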
But as said before I'd expect a trivial
for (int i=0,n=size; i<n; i++) {
if (++count[x[i]] > half) return x[i];
}
to be faster.
EDIT
After your edit seems you're really looking for the knockout algorithm, but caring about speed that's probably still the wrong question with modern computers (100000 elements is nothing even for a nail-sized single chip today).
I think you can create a max heap of the counts of the numbers you read, and use heap sort to find every count greater than n/2.

Resources