String into Array - efficient way - arrays

Given the string "A,B,C\n1,2,3\n4,9,6", how can I convert it into this 2D array (in C)?
[A B C
1 2 3
4 9 6]
My idea was to count the rows (the number of \n characters plus 1), then count the columns (the number of ',' characters in a row plus 1), then allocate the 2D array, and finally copy the values into it.
Is there a more efficient way to do this? My approach goes through the same string several times.

Related

Efficient removal of duplicates in array

How can duplicates be removed and recorded from an array with the following constraints:
The running time must be at most O(n log n)
The additional memory used must be at most O(n)
The result must fulfil the following:
Duplicates must be moved to the end of the original array
The order of the first occurrence of each unique element must be preserved
For example, from this input:
int A[] = {2,3,7,3,2,11,2,3,1,15};
The result should be similar to this (only the order of duplicates may differ):
2 3 7 11 1 15 3 3 2 2
As I understand it, the goal is to split an array into two parts: unique elements and duplicates in such a way that the order of the first occurrence of the unique elements is preserved.
Using the array from the OP as an example:
A={2,3,7,3,2,11,2,3,1,15}
A solution could do the following:
1. Initialize a helper array B with the indices 0, ..., n-1:
B={0,1,2,3,4,5,6,7,8,9}
2. Sort the pairs (A[i],B[i]) using A[i] as the key, with a stable sorting algorithm of complexity O(n log n):
A={1,2,2,2,3,3,3,7,11,15}
B={8,0,4,6,1,3,7,2,5, 9}
3. With n being the size of the array, go through the pairs (A[i],B[i]) and, for every duplicate (A[i]==A[i-1]), add n to B[i]:
A={1,2, 2, 2,3, 3, 3,7,11,15}
B={8,0,14,16,1,13,17,2, 5, 9}
4. Sort the pairs (A[i],B[i]) again, but now using B[i] as the key:
A={2,3,7,11,1,15, 3, 2, 2, 3}
B={0,1,2, 5,8, 9,13,14,16,17}
A then contains the desired result.
Steps 1 and 3 are O(n) and steps 2 and 4 can be done in O(n log n), so overall complexity is O(n log n).
Note that this method also preserves the order of duplicates. If you want them sorted, you can assign indices n, n+1, ... in step 3 instead of adding n.
Here is a very important hint: when an algorithm is permitted O(n) extra space, that is not the same as saying it can only use the same amount of memory as the input array!
For example, given the input array int array[] = {2,3,7,3,2,11,2,3,1,15}; (10 elements), that is a total space of 10 * sizeof(int) bytes. On typical 64-bit platforms an int is 4 bytes long, making the array 40 bytes of data.
However, I can use more space for my extra array than just 40 bytes! In fact, I can make a histogram structure that looks like this:
#include <stdbool.h>  // bool
#include <stddef.h>   // size_t

struct histogram
{
  bool   is_used;  // Is this element in use in the histogram?
  int    value;    // The integer value represented by this element
  size_t index;    // The index in the output array of the FIRST instance of the value
  size_t count;    // The number of times the value appears in the source array
};
typedef struct histogram histogram;
And since that is a fixed, finite amount of space, I can feel totally free to allocate n of them!
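#include <stdlib.h>  // calloc

histogram * new_histogram( size_t size )
{
  // calloc zeroes the memory, so every is_used starts out false and every count at 0
  return calloc( size, sizeof(struct histogram) );
}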
On my machine that’s 240 bytes for n = 10 (24 bytes per histogram element).
And yes, this absolutely, totally complies with the O(n) extra space requirement! (Because we are only using space for n extra items. Bigger items, yes, but only n of them.)
Goals
So, why make a histogram with all that extra stuff in it?
We are counting duplicates — suggesting that we should be looking at a Counting Sort, and hence, a histogram.
Accept integers in a range beyond [0,n).
The example array has 10 items, so our histogram should only have 10 slots. But there are integer values larger than 9.
Keep all the non-duplicate values in the same order as input
So we need to track the index of the first instance of each value in the input array.
We are obviously not sorting the data, but the basic idea behind a Counting Sort is to build a histogram and then use that histogram to overwrite the array with the ordered elements.
This is a powerful idea. We are going to tweak it.
The Algorithm
Remember that our input array is also our output array! So we will overwrite the array’s input values with our algorithm.
Let’s look at our example again:
value:  2  3  7  3  2 11  2  3  1 15
index:  0  1  2  3  4  5  6  7  8  9
❶ Build the histogram:
index in histogram:  0    1    2    3    4    5    6    7    8    9
used?:               no   yes  yes  yes  yes  yes  no   yes  no   no
value:               0    11   2    3    1    15   0    7    0    0
index:               0    3    0    1    4    5    0    2    0    0
count:               0    1    3    3    1    1    0    1    0    0
I used a simple non-negative modulo function to get a hash index into the histogram: abs(value) % histogram_size, then found the first matching or unused entry, again modulo the histogram size. Our histogram has a single collision: 1 and 11 (mod 10) both hash to 1. Since we encountered 11 first it gets stored at index 1 of the histogram, and for 1 we had to seek to the first unused index: 4.
We can see that the duplicate values all have a count of 2 or more, and all non-duplicate values have a count of 1.
The magic here is the index value. Look at 11. Its index is 3, not 5. If we look at our desired output we can see why:
value:  2  3  7 11  1 15  2  2  3  3
index:  0  1  2  3  4  5  6  7  8  9
The 11 is at index 3 of the output. This is a very simple counting trick when building the histogram: keep a running index that we increment only when we first add a value to the histogram. This index is where the value should appear in the output!
❷ Use the histogram to put the non-duplicate values into the array.
Clearly, anything with a non-zero count appears at least once in the input, so it must also be output.
Here’s where our magic histogram index first helps us. We already know exactly where in the array to put the value!
value:  2  3  7 11  1 15
index:  0  1  2  3  4  5   ⟵ index into the array to put the value
You should take a moment to compare the array output index with the index values stored in the histogram above and convince yourself that it works.
❸ Use the histogram to put the duplicate values into the array.
So, at what index do we start putting duplicates into the array? Do we happen to have some magic index lying around somewhere that could help? From when we built the histogram?
Again stating the obvious, anything with a count greater than 1 is a value with duplicates. For each such value, put count-1 copies into the array.
We don’t care what order the duplicates appear, so we’ll just take them in the order they are stored in the histogram.
Complexity
The complexity of a Counting Sort is O(n+k): one pass over the input array (to build the histogram) and one pass over the histogram data (to rebuild the array in sorted order).
Our modification is: one pass over the input array (to build the histogram), then one pass over the histogram to build the non-duplicate partition, then one more pass over the histogram to build the duplicates partition. That’s a complexity of O(n+2k).
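Since our histogram has k = n slots, both cases reduce to O(n), assuming each histogram lookup takes constant time on average. The same passes are made for every input, so it is also Ω(n), making it Θ(n). One caveat: with the linear probing described above, an input whose values all collide (for example 0, 10, 20, ... with a 10-slot histogram) makes the build step O(n²) in the worst case.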
Aaaaaahhhh! I gotta code that!!!?
Yep. It is only a tiny bit more complex than you are used to. Remember, you only need a few things:
An array of integer values (obtained from the user?)
A histogram array
A function to turn an integer value into an index into the histogram
A function that does the three things:
Build the histogram from the array
Use the histogram to write the non-duplicate values back into the array in the correct spots
Use the histogram to write the duplicate values to the end of the array
Ability to print an integer array
Your main() should look something like this:
#include <stdio.h>
#include <stdlib.h>

int partition( int * array, int size );                               // you write this
void print_array( const char * label, const int * array, int size );  // and this

int main(void)
{
  // Get number of integers to input
  int size = 0;
  scanf( "%d", &size );

  // Allocate and get the integers
  int * array = malloc( size * sizeof *array );
  for (int n = 0; n < size; n++)
    scanf( "%d", &array[n] );

  // Partition the array between non-duplicate and duplicate values
  int pivot = partition( array, size );

  // Print the results
  print_array( "non-duplicates:", array, pivot );
  print_array( "duplicates: ", array+pivot, size-pivot );

  free( array );
  return 0;
}
Notice the complete lack of input error checking. You can assume that your professor will test your program without inputting hello or anything like that.
You can do this!

2D array minimum sum of Y elements and just two rows that we can chose to get minimum

Given a 2D array[X][Y], I have to find the smallest possible sum of Y elements, with these rules:
the sum must be built from just 2 rows,
each value must come from a different column index
Example:
for array
7 3 7 9
2 20 10 6
8 8 8 8
Result should be 18, as we get 3 + 7 from 1st row and 2 + 6 from 2nd.
I've been thinking about this for a few hours, but I can't figure out how to approach it.
Try this one here.
Method 1 (Naive Approach): Check every possible submatrix in the given 2D array. This solution requires 4 nested loops, and its time complexity is O(n^4).
Method 2 (Efficient Approach): Kadane’s algorithm for 1D arrays can be used to reduce the time complexity to O(n^3).

Efficient algorithm to print sum of elements at all possible subsequences of length 2 to n+1 [duplicate]

This question already has answers here:
Sum of products of elements of all subarrays of length k
(2 answers)
Permutation of array
(13 answers)
Closed 7 years ago.
I will start with an example. Suppose we have an array of size 3 with elements a, b and c like: (where a, b and c are some numerical values)
index: | 1 | 2 | 3 |
value: | a | b | c |
(Assume index starts from 1 as shown in the example above)
Now all possible increasing sub-sequences of length 2 are:
12 23 13
so the required value is the sum of the products of the elements at those indexes, that is, ab+bc+ac
For length 3 we have only one increasing sub-sequence, that is, 123 so abc should be printed.
For length 4 we have no sequence so 0 is printed and the program terminates.
So output for the given array will be:
ab+bc+ac,abc,0
So for example if the elements a, b and c are 1, 2 and 3 respectively then the output should be 11,6,0
Similarly, for an array of size 4 with elements a,b,c,d the output will be:
ab+ac+ad+bc+bd+cd,abc+abd+acd+bcd,abcd,0
and so on...
Now, brute force is obviously too inefficient for large arrays. Is there an efficient algorithm to compute the output for an array of a given size?
Edit 1: I tried finding a pattern. For example, for an array of size 4:
The first value we need is: (ab+ac+bc) + d(a+b+c) = ab+ac+ad+bc+bd+cd (take A = ab+ac+bc)
then the second value we need is: (abc) + d(A) = abc+abd+acd+bcd (take B = abc)
then the third value we need is: (0) + d(B) = abcd (take C = 0)
then the fourth value we need is: (0) + d(C) = 0
But it still requires a lot of computation and I can't figure out an efficient way to implement this.
Edit 2: My question is different from this one, because:
I don't need all possible permutations. I need all possible increasing sub-sequences from length 2 to n+1.
I also don't need to print all possible such sequences, I just need the value thus obtained (as explained above) and hence I am looking for some maths concept or/and some dynamic programming approach to solve this problem efficiently.
Note I am finding the set of all possible such increasing sub-sequences based on the index value and then computing based on the values at those index position as explained above.
As a post that seems to have disappeared pointed out, one way is to set up a recurrence relation. Let S(n,k) be the sum, over all increasing subsequences of 1..n of length k, of the product of the array elements indexed by the subsequence. Such a subsequence either ends in n or it does not. In the first case it is the concatenation of a subsequence of length k-1 of 1..n-1 and {n}; in the second case it is a subsequence of 1..n-1 of length k. Thus:
S(n,k) = S(n-1,k) + A[n] * S(n-1,k-1)
For this always to make sense we need to add:
S(n,0) = 1
S(n,m) = 0 for m>n

How do I find the maximum of each dimension in a cell array of matrices?

I am given a cell array A which consists of matrices of different sizes. For example, I could have a three element cell array where the dimensions for each element are:
A{1} -> 4 x 3
A{2} -> 16 x 4
A{3} -> 5 x 14
How would I traverse through the cell array and return the maximum for each dimension overall? For example, the expected output of this operation with the example A above should give:
[16 14]
This is because by examining the first dimension, the maximum number of rows over the three matrices is 16. Similarly, the maximum number of columns over the three matrices is 14.
My original answer returned the maximum element of the cell. Here is the corrected code, incorporating your comments:
knedlsepp basically got it. Minor improvement in performance:
[a(:,1),a(:,2)]=cellfun(@size,A);
max(a)
I guess you are looking for:
max(cell2mat(cellfun(@size,A(:),'uni',0)),[],1)

Find intersection between two arrays with restrictions

I have to write a program that finds the numbers common to two arrays.
The problem is that I have to do it in the most optimized way respecting some constraints:
-Having i,j indexes for the array A and w,x indexes for the array B, if A[i]=B[w] and A[j]=B[x] and i
-The maximum distance between these numbers has to be k (given by input);
-I have to use at maximum O(k) space in order to implement something to optimize the search;
-The numbers appear only once in each array (like sets).
I was thinking about constructing a balanced RBTree with k elements of the first array in order to optimize the search process, but I am in doubt about the space it requires (I think it's not O(k) because of the pointers and the color marking).
Does anyone have a better idea for this problem?
Edit: I'll put my examples here to make it more clear:
Array A: 3 7 5 9 10 15 16 1 6 2
Array B: 4 8 5 13 1 17 2 11
Constant k = 6
Output: 5 1 2
Edit2: In the output the numbers must appear in the same sequence as they are in the arrays.
Using K as Max Distance
I assume that when you say they must be presented in array order, the order from one array is sufficient. That is, for:
A: 1 2
B: 2 1
the result is 1 2 or 2 1, and not just 1 or just 2 because the ordering is crossed.
Note that the K constraint makes this less optimal
The first observation is that anything in the larger array past index (number of elements in the smaller array) + K - 1 can be ignored
The second observation is that all values are apparently int
The third observation is that this has to be optimal for huge arrays with a K that can be close to the size of the arrays
A radix sort is O(N) time and takes O(N) space, so we will use that
In order to allow for K we can copy both arrays to parallel arrays of (value, position) and not copy values that are unreachable in the larger array as per observation 1 i.e.
A: 71, 23, 42 ==> A2: { 71, 0 }, { 23, 1 }, { 42, 2 }
We can also create a similar array for results that is the same size as the smaller array
We can modify the radix sort to move values and positions together
Algorithm:
1) Copy arrays [ O(N) ]
2) Radix sort arrays A and B by value [ O(N) ]
3) Walk A and B: [ O(N) ]
   if A < B -> increment index in A
   if A > B -> increment index in B
   if A == B -> increment index in A and B;
                add the original A to the result IF the position difference is less than K
4) Radix sort results by position [ O(N) ]
5) Print result values [ O(N) ]
