Newbie to bash scripting here, and I could use some help with this if you have time. My customers upload files, and each has a datestamp in its filename like this:
* 20170815041135
* 20170820041135
* 20170823071727
* 20170826040609
* 20170828050704
* 20170830153011
I need to calculate the number of days between each upload, then find the average interval of the listed uploads.
I can find the date difference between two dates with this command
echo $(( ($(date --date="20170831" +'%s' ) - $(date --date="20170821" +'%s')) / (60*60*24) ))
gives 10
To do multiple dates I've read that I need an array, so here is my range of upload dates in an array.
array=( `20170830153011`,`20170828050704`,`20170826040609`,`20170823071727`,`20170820041135`,`20170815041135` )
I've read I need to loop through the calculation like this
for i in "${array[@]}" do
?
How do I add my array dates into the calculation?
Your datetimes into an array:
timestamps=(
20170815041135
20170820041135
20170823071727
20170826040609
20170828050704
20170830153011
)
Let's now convert those into epoch times:
epochs=()
for timestamp in "${timestamps[@]}"; do
    iso8601=$(sed -r 's/(....)(..)(..)(..)(..)(..)/\1-\2-\3T\4:\5:\6/' <<<"$timestamp")
    epochs+=( "$(date -d "$iso8601" "+%s")" )
done
printf "%s\n" "${epochs[@]}"
1502784695
1503216695
1503487047
1503734769
1503911224
1504121411
Now we can iterate over them to calculate the differences. Note that bash array indices start at zero:
n=0
sum=0
for ((i=1; i < ${#epochs[@]}; i++ )); do
    ((n++, diff=(${epochs[i]} - ${epochs[i-1]}), sum+=diff))
    echo "diff $n = $diff seconds = $((diff/86400)) days"
done
echo "average = $((sum/n)) seconds = $((sum/n/86400)) days"
diff 1 = 432000 seconds = 5 days
diff 2 = 270352 seconds = 3 days
diff 3 = 247722 seconds = 2 days
diff 4 = 176455 seconds = 2 days
diff 5 = 210187 seconds = 2 days
average = 267343 seconds = 3 days
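As a cross-check outside bash, here is a minimal Python sketch of the same calculation. It is only an illustration: it parses the timestamps as naive datetimes, so it assumes no daylight saving change falls inside the range, and the variable names are my own.

```python
from datetime import datetime

# Upload timestamps copied from the question (YYYYmmddHHMMSS).
stamps = [
    "20170815041135",
    "20170820041135",
    "20170823071727",
    "20170826040609",
    "20170828050704",
    "20170830153011",
]

# Parse as naive datetimes: no time zone or DST handling is applied here.
times = sorted(datetime.strptime(s, "%Y%m%d%H%M%S") for s in stamps)

# Seconds between each pair of consecutive uploads.
diffs = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

average_seconds = sum(diffs) / len(diffs)
average_days = average_seconds / 86400

print(diffs)
print(f"average = {average_seconds:.0f} seconds = {average_days:.2f} days")
```

The differences match the bash output above. The average comes out at about 3.09 days; the bash version prints 3 because shell arithmetic is integer-only and truncates.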
Convert each date to seconds since 1970 (the epoch).
Calculate the difference.
As far as I know, the date command interprets each timestamp in the local time zone, so daylight saving time on that date should be taken into account.
Related
I'm currently working on a small project on handling time differences in MATLAB. I have two input files, Time_in and Time_out. The two files contain arrays of times in the format e.g. 2315 (GMT, hours and minutes).
I've read both Time_in and Time_out into MATLAB, but I don't know how to perform the subtraction. Also, I want the corresponding answers to be in minutes only, e.g. 2 hrs 30 mins = 150 minutes.
This is one of several possible solutions:
First, you should convert your time strings to a MATLAB serial date number. If you've done this, you can do your calculation as you want:
% input time as string
time_in = '2115';
time_out = '2345';
% read the input time as datenum
dTime_in = datenum(time_in,'HHMM');
dTime_out = datenum(time_out,'HHMM');
% subtract to get the time difference
timeDiff = abs(dTime_out - dTime_in);
% Get the minutes of the time difference
timeout = timeDiff * 24 * 60;
Furthermore, to calculate the time differences correctly you also should put some information about the date in your time vector, in order to calculate the correct time around midnight.
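To illustrate the midnight issue, here is a small language-neutral sketch in Python rather than MATLAB. It assumes that time_out is always later than time_in and less than 24 hours after it; the function name is hypothetical.

```python
def minutes_between(time_in: str, time_out: str) -> int:
    """Minutes from time_in to time_out, both given as 'HHMM' strings."""
    start = int(time_in[:2]) * 60 + int(time_in[2:])
    end = int(time_out[:2]) * 60 + int(time_out[2:])
    # Assumption: time_out is never more than 24 hours after time_in,
    # so a negative difference means the interval crossed midnight.
    return (end - start) % (24 * 60)

print(minutes_between("2115", "2345"))  # 150
print(minutes_between("2315", "0045"))  # 90
```

With only HHMM available, taking the absolute value of the difference would give 1350 minutes for the second case instead of 90, which is why date information (or an assumption like the one above) is needed.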
If you need further information about the function datenum you should read the following part of the MATLAB documentation:
https://de.mathworks.com/help/matlab/ref/datenum.html
Any questions?
In a recent version of MATLAB, you could use textscan together with datetime and duration data types to do this.
% read the first file
fh1 = fopen('Time_in');
d1 = textscan(fh1, '%{HHmm}D');
fclose(fh1);
fh2 = fopen('Time_out');
d2 = textscan(fh2, '%{HHmm}D');
fclose(fh2);
Note the format specifier '%{HHmm}D' tells MATLAB to read the 4-digit string into a datetime array.
d1 and d2 are now cell arrays where the only element is a datetime vector. You can subtract these, and then use the minutes function to find the number of minutes.
result = minutes(d2{1} - d1{1})
Let's say I have daily data for a 30-year period in a matrix. To keep it simple, assume it has only 1 column and 10957 rows, one per day for the 30 years. The data starts in 2010. I want to find the max value for every year, so that the output will be 1 column and 30 rows. Is there an automated way to program this in MATLAB? Currently I'm doing it manually, like this:
%for the first year
max(RAINFALL(1:365));
.
.
%for the 30th year
max(RAINFALL(10593:10957));
It is exhausting to do it manually, and I have quite a few similar data sets. I used the code below to calculate the mean and standard deviation for the 30 years. I tried to modify the code to work for my task above, but I couldn't succeed. I hope someone can modify the code or suggest a new way.
data = rand(32872,100); % replace with your data matrix
[nDays,nData] = size(data);
% let MATLAB construct the vector of dates and worry about things like leap
% year.
dayFirst = datenum(2010,1,1);
dayStamp = dayFirst:(dayFirst + nDays - 1);
dayVec = datevec(dayStamp);
year = dayVec(:,1);
uniqueYear = unique(year);
K = length(uniqueYear);
a = nan(1,K);
b = nan(1,K);
for k = 1:K
% use logical indexing to pick out the year
currentYear = year == uniqueYear(k);
a(k) = mean2(data(currentYear,:));
b(k) = std2(data(currentYear,:));
end
One possible approach:
1. Create a column containing the year of each data value, using datenum and datevec to take care of leap years.
2. Find the maximum for each year with accumarray.
Code:
%// Example data:
RAINFALL = rand(10957,1); %// one column
start_year = 2010; %// data starts on January 1st of this year
%// Computations:
[year, ~] = datevec(datenum(start_year,1,1) + (0:size(RAINFALL,1)-1)); %// step 1
result = accumarray(year.'-start_year+1, RAINFALL.', [], @max); %// step 2
As a bonus: if you replace @max in step 2 with either @mean or @std, guess what you get... much simpler than your code.
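The same group-by-year idea can be sketched in Python, purely to illustrate the logic outside MATLAB. It assumes one value per day starting on 1 January of the start year; the function name and the test data are hypothetical.

```python
from datetime import date, timedelta

def yearly_max(values, start_year):
    """Return {year: max value}, assuming one value per day from 1 January of start_year."""
    first_day = date(start_year, 1, 1)
    result = {}
    for offset, value in enumerate(values):
        # Let the calendar arithmetic handle leap years.
        year = (first_day + timedelta(days=offset)).year
        if year not in result or value > result[year]:
            result[year] = value
    return result

# Hypothetical data: each value is its day index, so every yearly max
# is simply the index of that year's last day.
rainfall = list(range(10957))
maxima = yearly_max(rainfall, 2010)
print(len(maxima))     # 30
print(maxima[2010])    # 364
print(maxima[2039])    # 10956
```

The 10957 days cover exactly 2010 through 2039, which is why 30 groups come out.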
This may help you:
RAINFALL = rand(1,10957); % - Your data here
firstYear = 2010;
numberOfYears = 4;
cum = 0; % - cumulative factor
yearlyData = zeros(1,numberOfYears); % - preallocate the result (not strictly necessary)
for i = 1 : numberOfYears
yearLength = datenum(firstYear+i,1,1) - datenum(firstYear + i - 1,1,1);
yearlyData(i) = max(RAINFALL(1 + cum : yearLength + cum));
cum = cum + yearLength;
end
I want to accept (say) 3 time elements (for example 8:30, 8:20 and 8:00) from the user and store them in an array using 'datenum'. How can I achieve that? Please help.
Assuming that you just want to prompt the user given the current day and year and you only want the current time (hours and minutes - seconds is 0), you can do the following:
dateNumArray = []; %// Store datenums here
%// Enter a blank line to quit this loop
while true
timestr = input('Enter a time: ', 's');
if (isempty(timestr))
break;
end
%// Split the string up at the ':'
%//splitStr = strsplit(timestr, ':'); %// For MATLAB R2013a and up
splitStr = regexp(timestr, ':', 'split');
%// Read in the current date as a vector
%// Format of: [Year, Month, Day, Hour, Minute, Second]
timeRead = clock;
%// Replace hours and minutes with user prompt
%// Zero the seconds
timeRead(4:6) = [str2num(splitStr{1}) str2num(splitStr{2}) 0];
%// Convert to datenum format
dateNumArray = [dateNumArray datenum(timeRead)];
end
The code above keeps prompting for input, where each time is expected in HH:MM format. Note that I did not perform any error checking, so HH is expected to be between 0 and 23 and MM between 0 and 59. Press ENTER or RETURN after each entry; entering a blank line ends the loop. Each entry is read as a string, split at the : character, and the parts before and after the : are converted to numbers. We then get the current date and time with the clock command, which returns a 6-element vector holding the year, month, day, hour, minute and second. We replace the hour and minute with what the user entered and zero the seconds. Finally, we convert this vector with datenum and append the result to dateNumArray, so the array gains one datenum value for each time the user enters.
When you call this, here is an example scenario:
Enter a time: 8:30
Enter a time: 8:45
Enter a time: 8:00
Enter a time:
Here's the example output from above:
format bank %// Show whole numbers and little precision
dateNumArray
format %// Reset format
dateNumArray =
735778.35 735778.36 735778.33
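The fractional parts in that output are the time of day expressed as a fraction of 24 hours, since a datenum counts whole days plus a fraction. A short Python sketch (not MATLAB, and with a hypothetical function name) reproduces the fractions and adds the range check that the answer leaves out:

```python
def day_fraction(time_str: str) -> float:
    """Fraction of a day represented by an 'H:MM' or 'HH:MM' string."""
    hours_text, minutes_text = time_str.split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    # Reject values outside the ranges the answer assumes.
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"time out of range: {time_str!r}")
    return (hours * 60 + minutes) / (24 * 60)

for entry in ["8:30", "8:45", "8:00"]:
    print(entry, round(day_fraction(entry), 2))
```

This prints 0.35, 0.36 and 0.33, matching the digits after the decimal point above.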
I have coded this partly, but I am not sure about it, since what I get is only partial data.
I have a 4D matrix with dimensions xV(6,24,63,15), meaning xV(min,hour,day,customer). The data is collected every 10 minutes for 63 days for 15 customers.
That is why the first dimension has 6 rows: one per 10-minute interval.
What I want is to collect the data for, let's say, Monday of every week and use it for a plot.
There are 63/7 = 9 Mondays, each with 24 hours, and each hour has 6 data points (one every 10 minutes). For each 10-minute slot of each hour, I want a new matrix holding the values from all Mondays, so I can take the mean of it and plot it.
Is this possible?
I have come this far, but no luck:
n = 0;
m = 0;
while(n<24)
n = n + 1;
while(m<6)
m = m + 1;
Va(:,m) = x(m,n,1:63,1); %(min,hour,day,line)
Vb(:,m) = x(m,n,1:63,1);
Vc(:,m) = x(m,n,1:63,1);
end
end
the file: xV.mat
thanks again for help
firstMonday = 1; %// index of first Monday. 1 if first day is a Monday
result = xV(:,:,firstMonday:7:end,:);
This gives a 6x24x9x15 matrix containing only Mondays. To average over all Mondays, use
squeeze(mean(result,3)) %// mean along 3rd dim. Size is 6x24x15
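For readers who want to see the stride-of-7 selection spelled out, here is a plain Python sketch for a single customer. The nested-list layout, the names, and the assumption that day 0 is a Monday are all mine, and the data is synthetic.

```python
from statistics import mean

DAYS, HOURS, SLOTS = 63, 24, 6  # dimensions taken from the question

# Hypothetical data for one customer: readings[day][hour][slot].
# Each reading is set to its day index so the result is easy to check.
readings = [[[float(day) for _ in range(SLOTS)] for _ in range(HOURS)]
            for day in range(DAYS)]

first_monday = 0  # assumption: day 0 is a Monday (Python indices start at 0)
mondays = range(first_monday, DAYS, 7)

# Average each (hour, slot) cell over all Mondays.
monday_mean = [[mean(readings[day][hour][slot] for day in mondays)
                for slot in range(SLOTS)]
               for hour in range(HOURS)]

print(len(mondays))        # 9
print(monday_mean[0][0])   # 28.0
```

The result has one mean per hour and 10-minute slot (24 by 6), which corresponds to one customer's slice of the 6x24x15 array in the MATLAB answer.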
I am trying to design a function that will calculate 30 day rolling volatility.
I have a file with 3 columns: date, and daily returns for 2 stocks.
How can I do this? I have a problem in summing the first 30 entries to get my vol.
Edit:
So it will read an excel file, with 3 columns: a date, and daily returns.
daily.ret = read.csv("abc.csv")
e.g. date stock1 stock2
01/01/2000 0.01 0.02
etc etc, with years of data. I want to calculate rolling 30 day annualised vol.
This is my function:
calc_30day_vol = function()
{
stock1 = abc$stock1^2
stock2 = abc$stock1^2
j = 30
approx_days_in_year = length(abc$stock1)/10
vol_1 = 1: length(a1)
vol_2 = 1: length(a2)
for (i in 1 : length(a1))
{
vol_1[j] = sqrt( (approx_days_in_year / 30 ) * rowSums(a1[i:j])
vol_2[j] = sqrt( (approx_days_in_year / 30 ) * rowSums(a2[i:j])
j = j + 1
}
}
So stock1, and stock 2 are the squared daily returns from the excel file, needed to calculate vol. Entries 1-30 for vol_1 and vol_2 are empty since we are calculating 30 day vol. I am trying to use the rowSums function to sum the squared daily returns for the first 30 entries, and then move down the index for each iteration.
So from day 1-30, day 2-31, day 3-32, etc, hence why I have defined "j".
I'm new at R, so apologies if this sounds rather silly.
This should get you started.
First I have to create some data that looks like what you describe
library(quantmod)
getSymbols(c("SPY", "DIA"), src='yahoo')
m <- merge(ROC(Ad(SPY)), ROC(Ad(DIA)), all=FALSE)[-1, ]
dat <- data.frame(date=format(index(m), "%m/%d/%Y"), coredata(m))
tmpfile <- tempfile()
write.csv(dat, file=tmpfile, row.names=FALSE)
Now I have a csv with data in your very specific format.
Use read.zoo to read csv and then convert to an xts object (there are lots of ways to read data into R. See R Data Import/Export)
r <- as.xts(read.zoo(tmpfile, sep=",", header=TRUE, format="%m/%d/%Y"))
# each column of r has daily log returns for a stock price series
# use `apply` to apply a function to each column.
vols.mat <- apply(r, 2, function(x) {
#use rolling 30 day window to calculate standard deviation.
#annualize by multiplying by square root of time
runSD(x, n=30) * sqrt(252)
})
#`apply` returns a `matrix`; `reclass` to `xts`
vols.xts <- reclass(vols.mat, r) #class as `xts` using attributes of `r`
tail(vols.xts)
# SPY.Adjusted DIA.Adjusted
#2012-06-22 0.1775730 0.1608266
#2012-06-25 0.1832145 0.1640912
#2012-06-26 0.1813581 0.1621459
#2012-06-27 0.1825636 0.1629997
#2012-06-28 0.1824120 0.1630481
#2012-06-29 0.1898351 0.1689990
#Clean-up
unlink(tmpfile)
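To make the rolling calculation itself explicit, here is a minimal Python sketch of the same idea: a sample standard deviation over each 30-observation window, scaled by the square root of 252. It is an illustration, not a port of runSD; the function name and the synthetic returns are hypothetical, and unlike the R output it returns only the full windows instead of padding the first 29 entries with NA.

```python
from math import sqrt
from statistics import stdev

def rolling_annualised_vol(returns, window=30, periods_per_year=252):
    """Sample standard deviation over each full window, scaled by sqrt(periods_per_year)."""
    vols = []
    for end in range(window, len(returns) + 1):
        # Window covers observations end-window .. end-1.
        vols.append(stdev(returns[end - window:end]) * sqrt(periods_per_year))
    return vols

# Hypothetical returns: alternating +1% and -1% for 40 days.
returns = [0.01 if i % 2 == 0 else -0.01 for i in range(40)]
vols = rolling_annualised_vol(returns)
print(len(vols))  # 11
print(round(vols[0], 4))
```

With 40 observations and a window of 30 there are 11 full windows, so the first volatility corresponds to days 1 to 30, the second to days 2 to 31, and so on, which is the indexing the question was trying to build with "j".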