Find specific date in a date array - arrays

I am working with a datetime array s constructed as follows:
ds = datetime(2010,01,01,'TimeZone','Europe/Berlin');
de = datetime(2030,01,01,'TimeZone','Europe/Berlin');
s = ds:hours(1):de;
I am using the ismember function to find the first occurrence of a specific date in that array.
ind = ismember(s,specificDate);
startPlace = find(ind,1);
The two lines above are called many times in my application and consume quite some time. It is clear to me that MATLAB compares ALL dates in s with specificDate, even though I need only the first occurrence of specificDate in s. To speed up the application, it would be good if MATLAB stopped comparing specificDate to s once the first match is found.
One solution would be to use a while loop, but with the while loop the application becomes even slower (I tried it).
Any idea how to work around this problem?

I'm not sure what your specific use-case is here, but with the step size between elements of s being one hour, your index is simply going to be the difference in hours between your specific date and the start date, plus one. No need to create or search through s in the first place:
startPlace = hours(specificDate-ds)+1;
And an example to test each solution:
specificDate = datetime(2017, 1, 1, 'TimeZone', 'Europe/Berlin'); % Sample date
ind = ismember(s, specificDate); % Compare to the whole vector
startPlace = find(ind, 1); % Find the index
isequal(startPlace, hours(specificDate-ds)+1) % Check equality of solutions
ans =
logical
1 % Same!
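
The same idea, computing the index from the offset instead of searching, can be sketched in Python. This is only an illustration under assumptions not in the original: it uses naive (timezone-free) datetimes, a shortened grid, and a hypothetical helper name index_by_arithmetic. Python indices are zero-based, so the "+1" from the MATLAB version disappears.

```python
from datetime import datetime, timedelta

# Assumed setup: an hourly grid starting at `start` (shortened for the example).
start = datetime(2010, 1, 1)
step = timedelta(hours=1)
grid = [start + i * step for i in range(24 * 400)]


def index_by_arithmetic(target, start, step):
    """Return the zero-based position of `target` on a regular grid.

    Hypothetical helper: no search is performed, only one subtraction
    and one division, so the cost does not depend on the grid length.
    """
    offset = target - start
    quotient, remainder = divmod(offset, step)
    if offset < timedelta(0) or remainder:
        raise ValueError("target does not lie on the grid")
    return quotient


target = datetime(2010, 6, 15, 12)
# The arithmetic index agrees with a linear search through the grid.
print(index_by_arithmetic(target, start, step) == grid.index(target))
```

The check for a non-zero remainder is a design choice for the sketch: it makes an off-grid date fail loudly instead of silently returning a neighbouring index.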

What you can do to save yourself some time is to convert the datetime values to datenum. You will then be comparing plain numbers rather than datetime objects, which can significantly reduce your processing time, like this:
s_new = datenum(s);
ind = ismember(s_new,datenum(specificDate));
startPlace = find(ind,1);

Related

Calculate change for each element in Pine Script array

I want to calculate the percent change between periods of each element in the "features" array (simply using the array as a grouping of financial time series data to report on). However the way the script is working now, it seems that it wants to calculate the percent change between each element in the array and not FOR each element in the array.
I don't think I've done anything wrong here in how I reference the array elements but I get the feeling there's some sort of 'under the hood' concept about how variables are processed by TV that is causing this issue.
//@version=4
study("My Script")
pct_change(source, period) =>
    now = source
    then = source[period]
    missing_now = na(now)
    missing_then = na(then)
    if not missing_now and not missing_then
        (now - then) / abs(then)
    else
        missing_now ? 0 : 1
evaluate(sources) =>
    s = array.size(sources)
    bar_changes = array.new_float()
    for i = 0 to 99999
        if i < s
            source = array.get(sources, i)
            array.push(bar_changes, pct_change(source, 1))
            continue
        else
            break
    bar_changes
features = array.new_float()
array.push(features, open)
array.push(features, high)
array.push(features, close)
bar_changes = evaluate(features)
plot(pct_change(open, 1))
plot(array.get(bar_changes, 0))
plot(pct_change(high, 1), color=color.aqua)
plot(array.get(bar_changes, 1), color=color.aqua)
plot(pct_change(close, 1), color=color.red)
plot(array.get(bar_changes, 2), color=color.red)
I think you have come across the same problem I'm facing, and it relates to using the history-referencing operator [] in connection with setting array element values.
I've boiled it down to a very simple script illustrating the problem here.
In essence, your code passes an array element to the pct_change() function, which uses the [] operator, and then uses the returned result in array.push() to set an array element value.
I got weird results when I experimented with arrays in my scripts soon after they were introduced, so I started digging to find the root of the problem, and it came down to the script referenced in the link above. So far, I believe that Pine Script still has some bugs when it comes to arrays, so we just have to wait until they are fixed.

Number of events in one array within w minutes after any event in a second array

I have two sorted arrays of Unix timestamps (integers representing the times at which some events happen). Let's call the arrays ts1 and ts2. I want to find the number of events in ts1 that lie within w minutes after any event in ts2. Let's say the method signature is as follows (take the first and second arrays and the window size, then return the number of events in ts1 that are within w minutes after any event in ts2):
critical_events(ts1,ts2,w)->int
Here are some test cases:
## Test cases.
ev = critical_events([.5,1.5,2.5],[1,2,3],.5)
print(ev==0)
ev = critical_events([1.4,1.4,2.7],[1,2,3],.5)
print(ev==2)
ev = critical_events([1.4,2.4,3.4],[1,2,3],.5)
print(ev==3)
I expect the length of the first array, n to be much larger than the length of the second one, m. Looking for efficient algorithms in terms of time and space and if possible, their average and worst case complexities in terms of n and m, time and space.
My attempt: instead of explaining my attempts, I'll just link to the code which should be self-explanatory (or at least better than what I can do in words): https://gist.github.com/ryu577/fdc22af4ed17d122a6aa25684597745b
You show them as sorted, so I assume they are (they need to be for this to work).
Because your first array is much larger than your second, loop over the second array.
I am using test case 2: ev = critical_events([1.4,1.4,2.7],[1,2,3],.5)
Next, run a binary search in ts1 for the first element of ts2 plus the interval (1 + 0.5 = 1.5).
Your startIndex is 0 and endIndex is 2, so the first search covers all elements.
The binary search results in index 2 in ts1. Note: because the array contains equal elements, you need to move right until you reach a higher number. You now know that 2.7 (and all elements after it, if there were any) lies after 1.5. Count is ts1.length - foundIndex.
Now you can set your start index to 2, because everything to the left of this index is smaller and cannot lie after 1.5.
Take the second element of ts2 and do a binary search; you will find index 2 again (2.5 < 2.7):
Count = Count + ts1.length - foundIndex.
To my knowledge, this is the fastest method. I believe the running time is O(m log n).
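
A minimal Python sketch of the binary-search idea, using the signature from the question. It is not the answerer's exact procedure: it counts the ts1 events inside each window rather than the events after it, and it assumes (inferred from the test cases, where 1.5 is not counted for the event at 1 with w = 0.5) that each window is half-open, [t, t + w). It also assumes both arrays are sorted and that overlapping windows must not be counted twice.

```python
from bisect import bisect_left


def critical_events(ts1, ts2, w):
    """Count events in ts1 that fall in [t, t + w) for some t in ts2.

    Assumptions for this sketch: ts1 and ts2 are sorted ascending and
    the window is closed on the left and open on the right.
    """
    count = 0
    prev_hi = 0  # everything left of this index is already handled
    for t in ts2:
        # First ts1 event at or after t, never before the previous window end.
        lo = bisect_left(ts1, t, prev_hi)
        # First ts1 event at or after t + w, i.e. just past the window.
        hi = bisect_left(ts1, t + w, lo)
        count += hi - lo
        prev_hi = hi
    return count


print(critical_events([0.5, 1.5, 2.5], [1, 2, 3], 0.5) == 0)
print(critical_events([1.4, 1.4, 2.7], [1, 2, 3], 0.5) == 2)
print(critical_events([1.4, 2.4, 3.4], [1, 2, 3], 0.5) == 3)
```

Each event in ts2 costs two binary searches over ts1, so the running time is O(m log n) with O(1) extra space. Restarting each search at prev_hi is what prevents double counting when windows overlap.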

Use of SumIf function with range object or arrays

I am trying to optimize a sub that uses Excel's SumIf function, since it takes a long time to finish.
The specific line (contained in a For loop) is this one:
Cupones = Application.WorksheetFunction.SumIf(Range("Test_FecFinCup"), Arr_FecFlujos(i), Range("Test_MtoCup"))
Here the ranges are named ranges in the workbook, and Arr_FecFlujos() is an array of dates.
That code works fine, except that it takes too much time to finish.
I am trying these two approaches:
Arrays:
Declare my arrays
With Test
Fluj = .Range(Range("Test_Emision").Cells(2, 1), Range("Test_Emision").Cells(2, 1).End(xlDown)).Rows.Count
Arr_FecFinCup = .Range("Test_FecFinCup")
Arr_MtoCup = .Range("Test_MtoCup")
End With
Cupones = Application.WorksheetFunction.SumIf(Arr_FecFinCup, Arr_FecFlujos(i), Arr_MtoCup)
The error tells me I need to work with Range objects, so I changed it to:
With Test
Set Rango1 = .Range("Test_FecIniCup")
Set Rango2 = .Range("Test_MtoCup")
End With
Cupones = Application.WorksheetFunction.SumIf(Rango1, Arr_FecFlujos(i), Rango2)
That one doesn't show any error messages, but the sum is incorrect.
Can anybody tell me what is going wrong with these methods and perhaps point me in the right direction?
It seems that you are trying to sum a range of numbers using a range of criteria:
WorksheetFunction.SumIf(Arr_FecFinCup, Arr_FecFlujos(i), Arr_MtoCup)
As far as I know, if the criteria parameter is given a range, Excel doesn't iterate over that range but instead looks for the one value in the criteria range that coincides with the row of the cell that it is calculating.
For example
Range("D3") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B1:B10"))
Excel will actually calculate it as follows:
Range("D3") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B3"))
If there is no coinciding row, the result is 0.
For example
Range("D7") = WorksheetFunction.SumIf(Range("A1:A10"),Range("B1:B5"))
Then D7 is always 0, because row 7 lies outside [B1:B5].
Therefore, the correct way to sum with multiple criteria is to use SUMIFS, as suggested by @mrtiq.

Rounding problems when creating date vectors

I want to create a vector containing dates in matlab. For that I specified the start time and the stop time:
WHM01_start = datenum('01-JAN-2005 00:00')
WHM01_stop = datenum('01-SEP-2014 00:00')
and then I created the vector with
WHM01_timevec = WHM01_start:datenum('01-JAN-2014 00:20') - datenum('01-JAN-2014 00:00'):WHM01_stop;
because I want time steps of 20 minutes each. Unfortunately, I get a rounding error after some thousands of values, which gives me
>> datestr(WHM01_timevec(254160))
ans =
31-Aug-2014 23:39:59
and not as expected, 31-Aug-2014 23:40:00
How can I correct these incorrect values?
Edit: I also saw this thread, but unfortunately that approach gives me a vector per date, and not a single number as desired.
You can give year, month, day, ... in numeric format to the function datenum. Datenum accepts vectors for one or several of its arguments, and if the numbers are too big (for example, 120 minutes), datenum knows what to do with it.
So by supplying the minutes vector in 20-minute increments, you can avoid rounding errors (at least on a 1-second level):
WHM01_start = datenum('01-JAN-2005 00:00');
WHM01_stop = datenum('01-SEP-2014 00:00');
time_diff = WHM01_stop - WHM01_start;
WHM01_timevec = datenum(2005,01,01,00,[00:20:time_diff*24*60],00);
datestr(WHM01_timevec(254160))
To answer your comment:
The reason you saw rounding errors is that you computed your time increment as the difference of two large numbers. Such a difference carries a (relatively) large rounding error.
MATLAB serial date numbers count days since the (fictional) date 0-Jan-0000. Your time increment is 1/3 hour, or 1/(24*3) days. Modifying your original code so that it reads
WHM01_timevec = WHM01_start:1/(24*3):WHM01_stop;
is an alternative way to reduce the rounding error, but for absurdly large time spans the first solution is the more robust approach.
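
The principle behind the first solution, building each timestamp from an exact integer count of minutes instead of repeatedly adding a fractional number of days, can be illustrated in Python. This is a sketch, not a translation of datenum: Python's datetime and timedelta use integer arithmetic internally, and the variable names are chosen for the example.

```python
from datetime import datetime, timedelta

start = datetime(2005, 1, 1)
stop = datetime(2014, 9, 1)
step_minutes = 20

# Number of whole 20-minute steps between start and stop (exact integer).
n_steps = (stop - start) // timedelta(minutes=step_minutes)

# Every element is start plus an integer multiple of the step,
# so no rounding error can accumulate along the vector.
timevec = [start + timedelta(minutes=step_minutes * i)
           for i in range(n_steps + 1)]

# MATLAB index 254160 is Python index 254159 (zero-based).
print(timevec[254159])  # 2014-08-31 23:40:00
```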
Related answer: use linspace instead of the colon operator :.
%// given
WHM01_start = datenum('01-JAN-2005 00:00')
WHM01_stop = datenum('01-SEP-2014 00:00')
%// number of elements
n = numel(WHM01_start: datenum('01-JAN-2014 00:20') - ...
datenum('01-JAN-2014 00:00') : WHM01_stop);
%// creating vector using linspace
WHM01_timevec = linspace(WHM01_start, WHM01_stop, n);
%// proof
datestr(WHM01_timevec(254160))
ans =
31-Aug-2014 23:40:00
Drawback of this solution: to determine the number of elements of the output vector I use the original vector created with :, which is not the best option probably.
Important quote from the linked answer:
Using linspace can reduce the probability of occurance of these issue, it's not a security.

Help with a special case of permutations algorithm (not the usual)

I have always been interested in algorithms, sort, crypto, binary trees, data compression, memory operations, etc.
I read Mark Nelson's article about permutations in C++ with the STL function next_perm(), which was very interesting and useful. After that I wrote a class method to get the next permutation in Delphi, since that is the tool I presently use most. This function works in lexicographic order; I got the idea for the algorithm from an answer in another topic here on Stack Overflow. But now I have a big problem: I'm working with permutations with repeated elements in a vector, and there are a lot of permutations that I don't need. For example, I have this first permutation for 7 elements in lexicographic order:
6667778 (6 = 3 times consecutively, 7 = 3 times consecutively)
For my work I consider valid perm only those with at most 2 elements repeated consecutively, like this:
6676778 (6 = 2 times consecutively, 7 = 2 times consecutively)
In short, I need a function that returns only permutations that have at most N consecutive repetitions, according to the parameter received.
Does anyone know if there is some algorithm that already does this?
Sorry for any mistakes in the text, I still don't speak English very well.
Thank you so much,
Carlos
My approach is a recursive generator that doesn't follow branches that contain illegal sequences.
Here's the python 3 code:
def perm_maxlen(elements, prefix = "", maxlen = 2):
    if not elements:
        yield prefix + elements
        return
    used = set()
    for i in range(len(elements)):
        element = elements[i]
        if element in used:
            #already searched this path
            continue
        used.add(element)
        suffix = prefix[-maxlen:] + element
        if len(suffix) > maxlen and len(set(suffix)) == 1:
            #would exceed maximum run length
            continue
        sub_elements = elements[:i] + elements[i+1:]
        for perm in perm_maxlen(sub_elements, prefix + element, maxlen):
            yield perm

for perm in perm_maxlen("6667778"):
    print(perm)
The implementation is written for readability, not speed, but the algorithm should be much faster than naively filtering all permutations.
print(sum(1 for _ in perm_maxlen("a"*100 + "b"*100, "", 1)))
For example, it runs this in milliseconds, where the naive filtering solution would take millennia or something.
So, in the homework-assistance kind of way, I can think of two approaches.
1. Work out all permutations that contain 3 or more consecutive repetitions (which you can do by treating the three-in-a-row as just one pseudo-digit and feeding it to a normal permutation generation algorithm). Make a lookup table of all of these. Now generate all permutations of your original string, and look each one up in the lookup table before adding it to the result.
2. Use a recursive permutation-generating algorithm (select each possibility for the first digit in turn, recurse to generate permutations of the remaining digits), but in each recursion pass along the last two digits generated so far. Then, in the recursively called function, if the two values passed in are the same, don't allow the first digit to be the same as those.
Why not just make a wrapper around the normal permutation function that skips values that have N consecutive repetitions? Something like:
(pseudocode)
function custom_perm(int max_rep)
    do
        p := next_perm()
    while count_max_reps(p) >= max_rep
    return p
Krusty, I'm already doing that at the end of the function, but it doesn't solve the problem, because it still has to generate all permutations and check each one.
consecutive := 1;
IsValid := True;
for n := 0 to len - 2 do
begin
  if anyVector[n] = anyVector[n + 1] then
    consecutive := consecutive + 1
  else
    consecutive := 1;
  if consecutive > MaxConsecutiveRepeats then
  begin
    IsValid := False;
    Break;
  end;
end;
Since I start with the first permutation in lexicographic order, this approach ends up generating a lot of unnecessary permutations.
This is easy to make, but rather hard to make efficient.
If you need to build a single piece of code that only considers valid outputs, and thus doesn't bother walking over the entire combination space, then you're going to have some thinking to do.
On the other hand, if you can live with the code internally producing all combinations, valid or not, then it should be simple.
Make a new enumerator, one which you can call that next_perm method on, and have this internally use the other enumerator, the one that produces every combination.
Then simply make the outer enumerator run in a while loop asking the inner one for more permutations until you find one that is valid, then produce that.
Pseudo-code for this:
generator1:
    when called, yield the next combination
generator2:
    internally keep a generator1 object
    when called, keep asking generator1 for a new combination
    check the combination
    if valid, then yield it
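
The two-generator pseudo-code above could look roughly like this in Python. It is a sketch under assumptions: the names all_perms, valid_perms, and max_run are made up for the example, itertools.permutations stands in for the inner next_perm enumerator, and duplicates caused by repeated elements are removed with a set. As the answer says, it still walks every permutation internally, so it is simple rather than efficient.

```python
from itertools import groupby, permutations


def max_run(seq):
    """Length of the longest run of equal consecutive elements."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def all_perms(elements):
    """generator1: yield every distinct permutation, valid or not."""
    seen = set()
    for p in permutations(elements):
        if p not in seen:
            seen.add(p)
            yield "".join(p)


def valid_perms(elements, max_consecutive=2):
    """generator2: wrap generator1 and pass through only valid results."""
    for p in all_perms(elements):
        if max_run(p) <= max_consecutive:
            yield p


for perm in valid_perms("6667778"):
    print(perm)
```

Because valid_perms is itself a generator, the caller sees only valid permutations one at a time and never has to know that invalid ones were produced and discarded internally.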
