Are bulk array operations faster than sequential operations?

Say I am given an array, and a function called replace:
void replace(from, to, items[])
whose job is to replace the array elements in the range [from, to) with the elements in items.
I'll assume the maximum size of the array is known beforehand, so I can ensure the array will never overflow.
My question is, if I am given a list of replacements (e.g. elements of the form (from, to, items)), is it possible for me to obtain the final resulting array with a faster time complexity than performing each operation sequentially?
In other words, is there any advantage to knowing the sequence of operations beforehand, or is it no better than being given each operation one by one (in terms of the asymptotic time complexity)?
Note: It seems the question is confusing; I did not intend to imply that the number of elements replacing a given range is the same as the size of that range! It could be fewer or more, causing shifting, and the point of the question was to ask whether knowing the operations beforehand could avoid extra work like shifting in the worst case.

I think this may not be what you are looking for, but using a Rope can improve the time complexity. It provides string-like operations such as concat without shifting the actual array. (The elements do not need to be chars; an arbitrary element type can serve as the alphabet.)
According to Wikipedia (...because I have never used one myself), a Rope is a balanced binary tree over string fragments; the rope represents the long string formed by concatenating all the fragments.
Your replace() function can be implemented using the rope's Split() and Concat() operations, both of which take only O(log n) time, where n is the length of the long string the rope represents (see the sketch after the steps below).
To get a normal array back, the rope can be converted in O(n) time using the Report() operation.
So the answer is: there is an algorithm faster than sequentially applying the operations to the array:
convert the given array to a rope
apply every replace job on the rope; each operation runs in O(log n) and does not copy actual array items
convert the processed rope back to a real array (this copies all items)
return it.
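To make the decomposition concrete, here is a minimal sketch of replace() in terms of Split and Concat. The rope API below (rope_split, rope_concat, rope_from_array, rope_free) is hypothetical, standing in for whatever rope library you use, with each call assumed to be O(log n) as described above:

    #include <stddef.h>

    typedef struct Rope Rope;   /* opaque rope type (hypothetical API) */

    /* Splits r at index i; returns the left part [0, i) and stores [i, len) in *right. */
    Rope *rope_split(Rope *r, size_t i, Rope **right);
    Rope *rope_concat(Rope *left, Rope *right);
    Rope *rope_from_array(const int *items, size_t n);
    void  rope_free(Rope *r);

    Rope *rope_replace(Rope *r, size_t from, size_t to,
                       const int *items, size_t n_items)
    {
        Rope *left, *mid, *right;
        left = rope_split(r, to, &right);     /* peel off the suffix [to, len) */
        left = rope_split(left, from, &mid);  /* peel off the range [from, to) */
        rope_free(mid);                       /* discard the replaced range */
        /* stitch together: prefix + new items + suffix */
        return rope_concat(rope_concat(left, rope_from_array(items, n_items)), right);
    }

Each rope_replace does a constant number of splits and concats, so m replacements on a string of length n cost O(m log n), plus the final O(n) Report, versus up to O(m*n) for the naive shifting approach.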

It's always linear, down to memcpy's level. The only way to speed it up is to trade time for space: override the array subscript operator so that elements between from and to point into items rather than the original array, which is not a good solution under any circumstances.
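For reference, a minimal sketch of that straightforward in-place replace (names are illustrative, and the backing buffer is assumed large enough, per the question): each call shifts the tail with memmove, so it is linear in the array length in the worst case.

    #include <string.h>

    /* Replaces a[from..to) with items[0..n_items) and returns the new length.
       Assumes the buffer behind a can hold the result. */
    size_t replace(int *a, size_t len, size_t from, size_t to,
                   const int *items, size_t n_items)
    {
        memmove(a + from + n_items, a + to, (len - to) * sizeof *a); /* shift tail */
        memcpy(a + from, items, n_items * sizeof *a);                /* fill gap */
        return len - (to - from) + n_items;
    }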

Related

Range Minimum Query for growing array

I have an array A[0..n] and I need to find the minimum value in the interval A[k₀..n]. Based on that, the array is extended with a value A[n+1] and I need the minimum in A[k₁..n+1]. Again the array is extended with some A[n+2] and queried for the min in A[k₂..n+2]. Is there a way to do each query in O(1) time (after some preprocessing)?
Compared with this earlier question: Range minimum queries when array is dynamic, a difference is that the queried intervals start at varying positions k₀, k₁, k₂, ... The end of the queried interval is always the rightmost end of the array. In my application I start with an empty array (n=0), so the preprocessing might be trivial. If this helps, in my application the new value used in the extension is always 1+(min returned by the last query). But the positions k₀, k₁, k₂, ... depend on data outside of the array.
There is no way that I know of to make both the addition of a new element and the query happen in O(1), and it's probably impossible (though I'm not exactly sure how to prove this). But you can pretty easily make both happen in O(log n) using a segment tree. That's probably good enough for any practical application.
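A minimal sketch of that approach, assuming a known upper bound CAP on the final array size (the bound and all names are illustrative): an iterative segment tree with the leaves at tree[CAP..2*CAP), where both append and the suffix-min query run in O(log CAP).

    #include <limits.h>

    #define CAP 1024                /* assumed upper bound on the final size */
    static int tree[2 * CAP];       /* leaf i lives at tree[CAP + i] */
    static int n = 0;               /* current number of elements */

    void append(int value)
    {
        int i = CAP + n++;
        tree[i] = value;
        for (i /= 2; i >= 1; i /= 2)        /* recompute mins up the path */
            tree[i] = tree[2*i] < tree[2*i+1] ? tree[2*i] : tree[2*i+1];
    }

    int suffix_min(int k)           /* min of A[k..n-1]; INT_MAX if empty */
    {
        int lo = CAP + k, hi = CAP + n, m = INT_MAX;
        /* Standard bottom-up query over [k, n). Nodes covering not-yet-
           appended leaves may be stale, but the query only reads nodes
           lying fully inside [k, n), and those are up to date. */
        while (lo < hi) {
            if (lo & 1) { if (tree[lo] < m) m = tree[lo]; lo++; }
            if (hi & 1) { --hi; if (tree[hi] < m) m = tree[hi]; }
            lo /= 2; hi /= 2;
        }
        return m;
    }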

What C construct would allow me to 'reverse reference' an array?

Looking for an elegant way (or a construct with which I am unfamiliar) that allows me to do the equivalent of 'reverse referencing' an array. That is, say I have an integer array
handle[number] = nameNumber
Sometimes I know the number and need the nameNumber, but sometimes I only know the nameNumber and need the matching [number] in the array.
The integer nameNumber values are each unique; that is, no two nameNumbers are the same, so every [number] and nameNumber pair is also unique.
Is there a good way to 'reverse reference' an array value (or some other construct) without having to sweep the entire array looking for the matching value, (or having to update and keep track of two different arrays with reverse value sets)?
If the array is sorted and you know its length, you can binary search for the element. That is an O(log n) search instead of an O(n) sweep through the array. Divide the array in half and check whether the element at the center is greater or less than what you're looking for, take the half your element is in, and divide it in half again. Each decision eliminates half of the remaining elements. Keep this process going and you'll eventually land on the element you're looking for.
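As a minimal sketch (assuming the handle array is kept sorted by nameNumber; the values here are illustrative), C's own bsearch does exactly this halving, and pointer arithmetic on the result recovers the matching index:

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        int handle[] = {11, 23, 42, 57, 90};    /* sorted nameNumbers */
        size_t count = sizeof handle / sizeof *handle;
        int nameNumber = 42;
        int *hit = bsearch(&nameNumber, handle, count, sizeof *handle, cmp_int);
        if (hit)    /* pointer difference recovers the index ("number") */
            printf("nameNumber %d is at number %td\n", nameNumber, hit - handle);
        return 0;
    }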
I don't know whether it's acceptable for you to use C++ and the Boost libraries. If it is, you can use boost::bimap<X, Y>.
Boost.Bimap is a bidirectional maps library for C++. With Boost.Bimap you can create associative containers in which both types can be used as keys. A bimap can be thought of as a combination of a std::map<X, Y> and a std::map<Y, X>.

Dynamic array guarantee clarification

In Skiena's Algorithm Design Manual, he mentions at one point:
The primary thing lost using dynamic arrays is the guarantee that each array access takes constant time in the worst case. Now all the queries will be fast, except for those relatively few queries triggering array doubling. What we get instead is a promise that the nth array access will be completed quickly enough that the total effort expended so far will still be O(n).
I'm struggling to understand this. How will an array query expand the array?
Dynamic arrays are arrays whose size does not need to be specified up front (think of an ArrayList in Java). Under the hood, a dynamic array is implemented using a regular array, and because it's a regular array, the implementation has to pick a concrete size for it.
So the typical approach is to initialize the underlying array with some capacity; when it fills up, the array is doubled in size.
Because of this, adding to a dynamic array usually takes constant time, but occasionally the underlying array must be doubled and every element copied, which takes longer than a normal add. Across n appends, though, the total copying is at most 1 + 2 + 4 + ... + n/2 < n elements, which is why the total effort stays O(n), matching Skiena's promise.
If your confusion lies with his use of the word 'query', I believe he means 'adding to or removing from the array', because a simple 'get' shouldn't be affected by the underlying array's size.
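A minimal C sketch of the doubling strategy (the struct and names are illustrative, and error handling is elided):

    #include <stdlib.h>

    typedef struct {
        int *data;
        size_t len;     /* elements in use */
        size_t cap;     /* allocated capacity */
    } DynArray;

    void push(DynArray *a, int value)
    {
        if (a->len == a->cap) {     /* full: double the capacity */
            a->cap = a->cap ? 2 * a->cap : 1;
            /* realloc may copy all a->len existing elements; this is the
               occasional slow step that amortizes to O(1) per push */
            a->data = realloc(a->data, a->cap * sizeof *a->data);
        }
        a->data[a->len++] = value;  /* the usual O(1) case */
    }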

Fast way to count smaller/equal/larger elements in array

I need to optimize my algorithm for counting how many numbers in an (unsorted) array are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; only the number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==: 1
First thought:
Iterate through the array and check whether each element is >, <, or == n. That's O(n*k) for k queries on n elements.
Possible optimization:
O((n+k) * log n).
First sort the array (I'm using C's qsort), then use binary search to find an equal number, and then somehow count the smaller and larger values. But how?
If the element exists (bsearch returns a pointer to it), I also need to handle possible duplicates (checking before and after the element while they compare equal to it), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of larger/smaller values from a pointer to an equal element?
And what do I do if the value isn't found (bsearch returns NULL)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), and that only if you employ a sort that runs in linear time (e.g. a radix sort). For comparison-based sorts such as quicksort, the cost rises to O(n log n).
On the other hand, sorting helps if you need to run multiple queries against the same set of numbers. The cost of k queries against n numbers goes from O(n*k) for k linear scans to O(n + k log n) with a linear-time sort, or O((n+k) log n) with a comparison-based sort. Given a sufficiently large k, the average query time goes down.
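A minimal sketch of the presort-then-binary-search approach (the boundary-search function names are illustrative): instead of bsearch, which returns an arbitrary match or NULL, two boundary searches find the first index >= n and the first index > n, and all three counts fall out by subtraction whether or not the value is present.

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* first index i with a[i] >= x (or len if none) */
    static size_t lower_bound(const int *a, size_t len, int x)
    {
        size_t lo = 0, hi = len;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /* first index i with a[i] > x (or len if none) */
    static size_t upper_bound(const int *a, size_t len, int x)
    {
        size_t lo = 0, hi = len;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] <= x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    int main(void)
    {
        int a[] = {5, 3, 1, 4, 2};
        size_t len = sizeof a / sizeof *a;
        qsort(a, len, sizeof *a, cmp_int);      /* presort once: O(n log n) */
        int x = 3;
        size_t lb = lower_bound(a, len, x), ub = upper_bound(a, len, x);
        printf("<: %zu  ==: %zu  >: %zu\n", lb, ub - lb, len - ub);
        return 0;
    }

For the example array this prints "<: 2  ==: 1  >: 2", and each subsequent query costs only two O(log n) searches.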
Since the array is (apparently?) not changing, presort it. This allows a binary search: O(log n).
a) Implement your own version of bsearch (it will be less code anyhow);
you can write it inline using indices instead of pointers,
and you won't need a function pointer to a specialized comparison function.
b) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need a linear scan for the beginning and end of the run of n's,
from which you can calculate the number of values less than n and greater than n.
It appears that you have some unwritten algorithm for choosing these (for n=3 you look for the count of values greater than and less than 2, and equal to 1), so there is no way to give specific code.
c) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that hold not just the value, but also its count and the number of values before and after it (see the sketch below). It may not use more memory at all if you have a lot of repeated values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data, or at least a sufficient description (aside from recommending a course or two in data structures and algorithms).
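A minimal sketch of option (c), assuming an unbalanced tree for brevity (a real implementation would want balancing): each node stores its value, its duplicate count, and the size of its subtree, so the number of elements less than x falls out of a single root-to-leaf walk.

    #include <stdlib.h>

    typedef struct Node {
        int value;
        size_t count;               /* duplicates of this value */
        size_t size;                /* total elements in this subtree */
        struct Node *left, *right;
    } Node;

    static size_t size_of(const Node *t) { return t ? t->size : 0; }

    Node *insert(Node *t, int x)
    {
        if (!t) {                   /* new node; calloc zeroes count/size */
            t = calloc(1, sizeof *t);
            t->value = x;
        }
        if (x < t->value)      t->left  = insert(t->left, x);
        else if (x > t->value) t->right = insert(t->right, x);
        else                   t->count++;
        t->size = size_of(t->left) + t->count + size_of(t->right);
        return t;
    }

    size_t count_less(const Node *t, int x)     /* elements strictly < x */
    {
        if (!t) return 0;
        if (x <= t->value) return count_less(t->left, x);
        return size_of(t->left) + t->count + count_less(t->right, x);
    }

    size_t count_greater(const Node *t, int x)  /* elements strictly > x */
    {
        if (!t) return 0;
        if (x >= t->value) return count_greater(t->right, x);
        return size_of(t->right) + t->count + count_greater(t->left, x);
    }

The equal count is then size_of(root) - count_less(root, x) - count_greater(root, x), and each query costs O(tree depth).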

Is it more efficient to use a linked list and delete nodes, or use an array and do a small computation on a string to see if an element can be skipped?

I am writing a program in C that reads a file. Each line of the file is a string of characters on which a computation will be done. The result of the computation on a particular string may imply that strings later in the file do not need any computation done to them. Also, if the reverse of a string comes alphabetically before the (current, non-reversed) string, it does not need to be checked.
My question is: would it be better to put each string in a linked list and delete each node once I find that particular strings don't need to be checked, or to use an array and check the last few characters of a string, skipping it if it is alphabetically after the string in the previous element? Either way, the list or array only needs to be iterated through once.
The rule of thumb is that if you are dealing with small objects (< 32 bytes), std::vector is better than a linked list for most general operations.
But for larger objects (say, 1 KB), you generally need to consider lists.
There is an article with a detailed comparison you can check; the link is here:
http://www.baptiste-wicht.com/2012/11/cpp-benchmark-vector-vs-list/3/
Without further details about your needs, it is a bit difficult to tell which one would fit your requirements better.
Arrays are easy to access, especially if you are going to do it in a non-sequential way, but they are hard to maintain if you need to perform deletions on them or if you don't have a good approximation of the final number of elements.
Lists are good if you plan to access them sequentially, but terrible if you need to jump between elements. Deletion on them, though, can be done in constant time if you are already at the node you want to delete.
I don't quite understand how you plan to access them, since you say either one would be iterated just once; if that is the case, then either structure will give you similar performance, since you are not really taking advantage of their key benefits.
It's really difficult to understand what you are trying to do, but it sounds like you should create an array of records, with each record holding one of your strings and a boolean flag indicating whether it should be processed.
Set each record's flag to true as you load the array from the file.
Use one pointer to scan the array once, processing only the strings from records whose flags are still true.
For each record processed, use a second pointer to scan from the first pointer + 1 to the end of the array, identify strings that won't need processing (in light of the current string), and set their flags to false. A sketch follows below.
-Al.
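A minimal sketch of that two-pointer, flagged-record scan (the record layout and the ruled_out predicate are illustrative stand-ins; the real skip rule depends on the computation the question leaves unstated):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        char str[64];           /* assumed maximum line length */
        bool needs_check;       /* set true while loading from the file */
    } Record;

    /* Hypothetical predicate: true if `later` no longer needs checking
       given the result for `current`. */
    static bool ruled_out(const char *current, const char *later)
    {
        (void)current; (void)later;
        return false;           /* stub: the real rule is problem-specific */
    }

    void process_all(Record *recs, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (!recs[i].needs_check)
                continue;                   /* skipped by an earlier result */
            /* ... run the computation on recs[i].str here ... */
            for (size_t j = i + 1; j < n; j++)
                if (recs[j].needs_check && ruled_out(recs[i].str, recs[j].str))
                    recs[j].needs_check = false;
        }
    }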
