Matlab: Number of observations per year for very large array

I have a large array with daily data from 1926 to 2012. I want to find out how many observations are in each year (it varies from year-to-year). I have a column vector which has the dates in the form of:
19290101
19290102
.
.
.
One year here is going to be July through June of the next year.
So 19630701 to 19640630
I would like to use this vector to find the number of days in each year. I need the number of observations to use as inputs into a regression.

I can't tell whether the dates are stored numerically or as a string of characters; I'll assume they're numbers. What I suggest doing is to convert each value to the year and then using hist to count the number of dates in each year. So try something like this:
year = floor(date/10000);
obs_per_year = hist(year,1926:2012);
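This will give you a vector holding the number of observations in each calendar year, starting from 1926. Note that this counts January through December; it does not yet account for the July-to-June year described in the question.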

Series of years starting July 1st:
bin = datenum(1926:2012,7,1);
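Bin your vector of dates within each year with bin(1) <= x < bin(2), bin(2) <= x < bin(3), ... (dates must be serial date numbers here; if they are stored as yyyymmdd numbers, convert them first, e.g. with datenum(num2str(dates),'yyyymmdd'))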
count = histc(dates,bin);
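For comparison, here is a minimal sketch of the same July-to-June binning in Python, working directly on the yyyymmdd integers. The function names july_june_year and obs_per_year are hypothetical, and labelling each year by the calendar year in which it starts (so 19630701 to 19640630 counts as 1963) is an assumption, not something the question specifies.

```python
from collections import Counter

def july_june_year(yyyymmdd):
    """Return the starting calendar year of the July-June year containing the date."""
    year, mmdd = divmod(yyyymmdd, 10000)
    month = mmdd // 100
    # January-June belong to the year that started the previous July
    return year if month >= 7 else year - 1

def obs_per_year(dates):
    """Count observations per July-June year."""
    return Counter(july_june_year(d) for d in dates)

dates = [19630630, 19630701, 19631231, 19640630, 19640701]
counts = obs_per_year(dates)
print(sorted(counts.items()))
```

Years with no observations are simply absent from the Counter, so fill in zeros yourself if the regression needs a value for every year.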

Related

How to pull specific indices out of a character array in a loop?

I have an array that contains multiple dates in the format yyyymmdd, stored as a 50x1 double. I am trying to pull out the year,month, and day so I can use datenum to assign each date a serial number.
Indexing an individual date, converting it with str2num, then pulling out the appropriate characters works fine, but when I try to loop through the list of dates it doesn't work: only variations of the number 2 are returned.
dates = [20180910; 20180920; 20181012; 20181027; 20181103; 20181130; 20181225];
% version1
datesnums=num2str(dates); % dates is a list of dates stored as integers
for i=1:length(datesnums)
pullyy=str2num(datesnums(1:4));
pullmm=str2num(datesnums(5:6));
pulldd=str2num(datesnums(7:8));
end
As well as
%version2
datesnums=num2str(dates,'%d')
for i = 1:length(datesnums)
dd=datenum(str2num(datesnums(i(1:4))),str2num(datesnums(i(5:6))),str2num(datesnums(i(7:8))));
end
I'm trying to generate a new array that is just the serial numbers of the input dates. In the examples shown, I am only getting single integer values, which I know is because the loop is incorrect, and I get errors that say "Index exceeds the number of array elements (1)." for version 1. When I've gotten it to successfully loop through everything, the outputs are just '2222', '22', '22' for every single date, which is incorrect. What am I doing wrong? Do I need to incorporate a cell array?
To get all the years, month, and days in a loop:
datesnums=num2str(dates);
for i=1:size(datesnums, 1)
pullyy(i) = str2num(datesnums(i,1:4));
pullmm(i) = str2num(datesnums(i,5:6));
pulldd(i) = str2num(datesnums(i,7:8));
end
Actually, you can do this without a loop:
pullyy = str2num(datesnums(:,1:4));
pullmm = str2num(datesnums(:,5:6));
pulldd = str2num(datesnums(:,7:8));
Explanation:
If for example the dates vector is a [6x1] array:
dates =[...
20190901
20170124
20191215
20130609
20141104
20190328];
Then datesnums=num2str(dates); creates a char matrix of size [6x8] where each row corresponds to one element in dates:
datesnums =
6×8 char array
'20190901'
'20170124'
'20191215'
'20130609'
'20141104'
'20190328'
So in the loop you need to refer to the row index for each date and the column indices to extract the years, months, and days.
The easiest solution I can think of is:
SN = datenum(num2str(dates),'yyyymmdd')
You only have to specify the date format which is 'yyyymmdd'
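The same row-wise idea can be sketched in Python without going through strings at all, using integer division on the yyyymmdd values. The helper name split_yyyymmdd is hypothetical, and date.toordinal() is used here as a stand-in for a serial date number; it counts days from a different origin than MATLAB's datenum, so the absolute values differ.

```python
from datetime import date

def split_yyyymmdd(n):
    """Split an integer such as 20180910 into (year, month, day)."""
    year, rest = divmod(n, 10000)
    month, day = divmod(rest, 100)
    return year, month, day

dates = [20180910, 20180920, 20181012]
parts = [split_yyyymmdd(d) for d in dates]

# One serial day number per input date (Python's proleptic Gregorian ordinal)
serials = [date(y, m, d).toordinal() for y, m, d in parts]
print(parts)
print(serials)
```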

How to change from annual year end to annual mid year in SAS

I currently work in SAS and utilise arrays in this way:
Data Test;
input Payment2018-Payment2021;
datalines;
10 10 10 10
20 20 20 20
30 30 30 30
;
run;
I believe this automatically assumes a limit, either the start of the year or the end of the year (please correct me if I'm wrong).
So, if I wanted to say that this is June data and payments are set to increase every 9 months by 50%, I'm looking for a way for my code to recognise that my years run from the end of June to the end of the next June.
For example, if I wanted to say
Data Payment_Pct;
set test;
lastpayrise = "31Jul2018";
array payment:
array Pay_Inc(2018:2021) Pay_Inc: ;
Pay_Inc2018 = 0;
Pay_Inc2019 = 2; /*2 because there are two increments in 2019*/
Pay_Inc2020 = 1;
Pay_Inc2021 = 1;
do I = 2018 to 2021;
if i = year(pay_inc) then payrise(i) * 50% * Pay_Inc(i);
end;
run;
It's all well and good for me to manually do this for one entry but for my uni project, I'll need the algorithm to work these out for themselves and I am currently reading into intck but any help would be appreciated!
P.s. It would be great to have an algorithm that creates the following
Pay_Inc2019 Pay_Inc2020 Pay_Inc2021
1 2 1
OR, it would be great to know how SAS works in setting the array for 2018:2021: does it assume end of year, or can you set it to mid year?
Regarding input Payment2018-Payment2021; there is no automatic assumption of yearness or calendaring. The numbers 2018 and 2021 are the bounds for a numbered range list.
In a numbered range list, you can begin with any number and end with any number as long as you do not violate the rules for user-supplied names and the numbers are consecutive.
The meaning of the numbers 2018 to 2021 is up to the programmer. You state the variables correspond to the June payment in the numbered year.
You would have to iterate a date using 9-month steps and increment a counter based on the year in which the date falls.
Sample code
Dynamically adapts to the variable names that are arrayed.
data _null_;
array payments payment2018-payment2021;
array Pay_Incs pay_inc2018-pay_inc2021; * must be same range numbers as payments;
* obtain variable names of first and last element in the payments array;
lower_varname = vname(payments(1));
upper_varname = vname(payments(dim(payments)));
* determine position of the range name numbers in those variable names;
lower_year_position = prxmatch('/\d+\s*$/', lower_varname);
upper_year_position = prxmatch('/\d+\s*$/', upper_varname);
* extract range name numbers from the variable names;
lower_year = input(substr(lower_varname,lower_year_position),12.);
upper_year = input(substr(upper_varname,upper_year_position),12.);
* prepare iteration of a date over the years that should be the name range numbers;
date = mdy(06,01,lower_year); * june 1 of year corresponding to first variable in array;
format date yymmdd10.;
do _n_ = 1 by 1; * repurpose _n_ for an infinite do loop with interior leave;
* increment by 9-months;
date = intnx('month', date, 9);
year = year(date);
if year > upper_year then leave;
* increment counter for year in which iterating date falls within;
Pay_Incs( year - lower_year + 1 ) + 1;
end;
put Pay_Incs(*)=;
run;
Increment counter notes
There is a lot to unpack in this statement
Pay_Incs( year - lower_year + 1 ) + 1;
+ 1 at the end of the statement increments the addressed array element by 1, and is the syntax for the SUM statement:
variable + expression
The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0;
variable=sum(variable,expression);
year - lower_year + 1 computes the 1-based array index, 1..N, that addresses the corresponding variable in the named range list pay_inc<lower_year>-pay_inc<upper_year>.
Pay_Incs( <computed index> ) selects the variable of the SUM statement.
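To make the counting logic easier to follow outside SAS, here is a rough Python sketch of the same loop: start at June of the first year, step 9 months at a time, and count how many steps land in each year. The names add_months and count_increments are hypothetical; the June start and the 9-month step are taken from the sample code above.

```python
def add_months(year, month, n):
    """Advance a (year, month) pair by n months."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1

def count_increments(first_year, last_year, start_month=6, step=9):
    """Count how many step-month increments fall in each calendar year."""
    counts = {y: 0 for y in range(first_year, last_year + 1)}
    year, month = first_year, start_month
    while True:
        year, month = add_months(year, month, step)
        if year > last_year:
            break
        counts[year] += 1
    return counts

pay_inc = count_increments(2018, 2021)
print(pay_inc)
```

The increments land in March 2019, December 2019, September 2020 and June 2021, which gives the 0, 2, 1, 1 pattern from the question.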
This is a wonderful use case of the intnx() function. intnx() will be your best friend when it comes to aligning dates.
In the traditional calendar, the year starts on 01JAN. In your calendar, the year starts on 01JUN, five months later. We want to shift our date so that the year starts on 01JUN. This will allow you to take the year part of the date and determine what year you are on in the new calendar.
data want;
format current_cal_year
current_new_year year4.
;
current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
current_new_year = intnx('year.6', '01JUN2018'd, 1, 'B');
run;
Note that we shifted current_new_year by one year. To illustrate why, let's see what happens if we don't shift it by one year.
data want;
format current_cal_year
current_new_year year4.
;
current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
current_new_year = intnx('year.6', '01JUN2018'd, 0, 'B');
run;
current_new_year shows 2018, but we really are in 2019. For 5 months out of the year, this value will be correct. From June-December, the year value will be incorrect. By shifting it one year, we will always have the correct year associated with this date value. Look at it with different months of the year and you will see that the year part remains correct throughout time.
data want;
format cal_month date9.
cal_year
new_year year4.
;
do i = 0 to 24;
cal_month = intnx('month', '01JAN2016'd, i, 'B');
cal_year = intnx('year', cal_month, 0, 'B'); * start of the calendar year containing cal_month;
new_year = intnx('year.6', cal_month, 1, 'B'); * start of the next June-based year;
year_not_same = (year(cal_year) NE year(new_year) );
output;
end;
drop i;
run;
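The effect of the shifted interval can be sketched in Python as a plain rule: dates from June onward belong to the year that ends the following May. This assumes, as the answer above does, that the shifted year starts on 1 June and is named after the calendar year in which it ends; the function name shifted_year is hypothetical.

```python
from datetime import date

def shifted_year(d, start_month=6):
    """Label a date with the year of the start_month-based calendar it falls in.

    Months before start_month keep the calendar year; months from
    start_month onward are counted toward the following year.
    """
    return d.year + 1 if d.month >= start_month else d.year

for d in (date(2018, 5, 31), date(2018, 6, 1), date(2018, 12, 31), date(2019, 1, 1)):
    print(d.isoformat(), shifted_year(d))
```

If your year really runs July through June, pass start_month=7.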

count number of weekdays between two date strings in C99

I have two date strings in the form yyyy-mm-dd , just like
const char* date_start = "2015-09-30";
const char* date_end = "2015-10-03";
How do I calculate the number of weekdays (number of days which are neither Saturday nor Sunday) between the two dates? Dates where start and end day are equal can exist (the day count should then be 1 for a workday or 0 for a weekend day). All input dates are guaranteed to be valid (e.g. no 30th of February).
The solution needs to work with C99 on OS X as well as Windows, independent of the system locale settings.
I would prefer to use as little external code (i.e. libraries or frameworks) as possible.
Pseudo Code
Form time structures
struct tm start = {0};
start.tm_year = 2015-1900;
start.tm_mon = 9-1;
start.tm_mday = 30;
start.tm_isdst = 0;
struct tm end = ...
Form time number and set tm_wday field
time_t tstart = mktime(&start);
time_t tend = mktime(&end);
Find day difference
double day_diff = difftime(tend, tstart)/(24.0*60*60);
Some magic per weekday (left for OP)
numweekdays = ((long)day_diff/7)*5 + foo(start.tm_wday, end.tm_wday);
Convert the date strings to something more manageable, such as a day count since 1970-01-01 if that suits your use case, or a struct tm.
Calculate the difference in days (end minus start plus one).
For each complete week inside the difference (>= 7 days), add 5 weekdays and 2 weekend days.
Calculate the weekday status of the remaining (at most 6) days and add them accordingly.
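The steps above (full weeks times five, plus a check of the leftover days) can be sketched in Python as follows. This is only an illustration of the arithmetic, not a C99 solution; the function name count_weekdays is hypothetical, it assumes start <= end, and date.fromisoformat needs Python 3.7 or later.

```python
from datetime import date

def count_weekdays(start_str, end_str):
    """Count Monday-Friday days from start to end, both inclusive."""
    start = date.fromisoformat(start_str)
    end = date.fromisoformat(end_str)
    total_days = (end - start).days + 1        # inclusive day count
    full_weeks, remainder = divmod(total_days, 7)
    count = full_weeks * 5                     # every full week has 5 weekdays
    first_wday = start.weekday()               # Monday=0 ... Sunday=6
    # The leftover days form a run starting on the start date's weekday
    for offset in range(remainder):
        if (first_wday + offset) % 7 < 5:
            count += 1
    return count

print(count_weekdays("2015-09-30", "2015-10-03"))
```

Because any run of 7 consecutive days contains each weekday exactly once, it does not matter whether the leftover days are taken from the start or the end of the range.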

Utilizing SQL datepart to identify consecutive periods of time

I have a stored procedure that works correctly, but I don't understand the theory behind why it works. I'm identifying a consecutive period of time by utilizing datepart and dense_rank (I found the solution through help elsewhere).
select
c.bom
,h.x
,h.z
,datepart(year, c.bom) * 12 + datepart(month, c.bom) -- this returns an integer value for the year and month, allowing us to increment the number by one for each month
- dense_rank() over ( partition by h.x order by datepart(year, c.bom) * 12 + datepart(month, c.bom)) as grp -- this row does a dense rank and subtracts out the integer date and rank so that consecutive months (ie consecutive integers) are grouped as the same integer
from
#c c
inner join test.vw_info_h h
on h.effective_date <= c.bom
and (h.expiration_date is null or h.expiration_date > c.bom)
I understand in theory what is happening with the grouping functionality.
How does multiplying year * 12 + month work? Why do we multiply the year? What is happening in the backend?
The year component of a date is an integer value. Since there are 12 months in a year, multiplying the year value by 12 provides the total number of months that have passed to get to the first of that year.
Here's an example. Take the date February 11, 2012 (20120211 in CCYYMMDD format)
2012 * 12 = 24144 months from the start of time itself.
24144 + 2 months (february) = 24146.
Multiplying the year value by the number of months in a year allows you to establish month-related offsets without having to do any coding to handle the edge cases between the end of one year and the start of another. For example:
11/2011 -> 24143
12/2011 -> 24144
01/2012 -> 24145
02/2012 -> 24146
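To see why this month number combines with dense_rank to mark consecutive periods, here is a small Python sketch of the same idea. The function names are hypothetical, and the dense rank is emulated by numbering the distinct sorted month numbers 1..N, which is what dense_rank does within a single partition.

```python
from itertools import groupby

def month_index(year, month):
    """Continuous month number: consecutive months differ by exactly 1."""
    return year * 12 + month

def consecutive_groups(months):
    """Group (year, month) pairs into runs of consecutive months."""
    indexed = sorted({month_index(y, m) for y, m in months})
    # index minus rank stays constant inside a run and jumps after a gap
    keyed = [(idx - rank, idx) for rank, idx in enumerate(indexed, start=1)]
    return [[idx for _, idx in run] for _, run in groupby(keyed, key=lambda p: p[0])]

months = [(2011, 11), (2011, 12), (2012, 1), (2012, 3), (2012, 4)]
groups = consecutive_groups(months)
print(groups)
```

Within a run both the month number and the rank go up by one per row, so their difference is constant; a missing month makes the month number jump while the rank does not, which starts a new group.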

How can I find out all the dates of a given day in this month?

I have to write a program in C which, given the name of a day, returns all of the dates in this month that the day will be on.
For example, if the input is "Sunday" the output should be: 5,12,19,26 (which are the days Sunday will be on this month.)
Does any one have any idea how to do this? I have tried a lot.
You can use the time() function to get the current time.
Then use the localtime() function to get a struct (struct tm) with the information (year, month, day, ...) of the current time.
From the "struct tm", get tm_mday and tm_wday. Use these fields to determine the next or previous Sunday. E.g. if tm_mday is 12 and tm_wday is 3 (Wednesday), then we know that the 9th of this month is a Sunday (12-3 = 9). From that number, simply add or subtract 7 to get all other Sundays.
You need to know it for a given year too? Or is this for only this year? If you need to know it for any given year, you can do a "days per month" enum, having one for the leap years and one for the non-leap years.
You just need to know in which day of week started a year (i.e: "Monday", "Tuesday", etc)
You will have at most 5 dates for any given month, so you can use a fixed-length array of ints.
You know that the Gregorian calendar repeats itself every 400 years, and that if a year X started on day Y, then year X + 1 will start on day (Y + 1) % 7 if X is not a leap year, and on day (Y + 2) % 7 if it is.
That gives you the first day of any year, and knowing how many days each month has in a given year, you can easily work out which day of the week that month starts on ("Monday", etc.).
Then, all you have to do, is something like
int offset = 0;
int i;
/* advance from the 1st of the month until the weekday matches the requested one */
while ((monthStartingDate + offset) % 7 != myDate) {
    offset++;
}
i = offset + 1;
(myDate is the day-of-week number of the requested day, 0 to 6, and monthStartingDate is the day-of-week number of the first day of that month)
When you leave that loop, i holds the first occurrence; then you just add 7 until i is out of the month's bounds.
You can store each i in the array:
int res[5] = {0, 0, 0, 0, 0};
for ( ; i <= daysOfMonth(month, year); i += 7) {
    res[(i - 1) / 7] = i;
}
then you just return res.
Oh, I didn't know that you were able to use date functions :P I think the idea of the exercise was practicing C :P
1) Take a string (weekday name) from the input (use scanf or fgets)
2) Convert it to a number (find its index in a table of weekdays using a loop and strcmp), assigning 0 to Sunday, 1 to Monday ...
3) Get the current time with the time function and convert it to a tm struct with the localtime function
4) From the tm struct, calculate the first day in the current month that falls on the given weekday
first_mday_of_given_wday = (tm_struct.tm_mday + given_wday - tm_struct.tm_wday + 6) % 7 + 1
5) Find out how many days are in the current month. To do this:
put 1 into tm_mday and 0 into tm_isdst of your tm struct
duplicate the struct
increase tm_mon by 1 in the duplicate (watch for the last month! In that case increase tm_year by 1 and set tm_mon to 0, since tm_mon counts from 0)
convert both structs to time_t with the mktime function and calculate the difference (subtract these time_t values, or use difftime), then convert the result from seconds to days (divide by 60*60*24)
6) Run a loop through the calculated range:
for (i = first_mday_of_given_wday; i <= num_days_in_month; i += 7) printf("%d, ", i);
The 5th step can be omitted in certain situations. We know that a month has from 28 to 31 days. If none of the hypothetical days 29, 30, 31 of the current month can fall on the given weekday, we can treat the current month as having 28 days.
So simply assume we have 28 days in the current month if first_mday_of_given_wday is more than 3; otherwise calculate the number as shown in the 5th step.
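As a cross-check for a C implementation, the same calculation can be sketched in a few lines of Python. The function name dates_of_weekday is hypothetical, and unlike the steps above it numbers weekdays with Monday as 0, because that is the convention calendar.monthrange uses.

```python
import calendar

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def dates_of_weekday(year, month, weekday_name):
    """Return every day of the month that falls on the named weekday."""
    target = WEEKDAYS.index(weekday_name)            # Monday=0 ... Sunday=6
    # monthrange gives the weekday of the 1st and the number of days in the month
    first_wday, days_in_month = calendar.monthrange(year, month)
    first = (target - first_wday) % 7 + 1            # first matching day of month
    return list(range(first, days_in_month + 1, 7))

print(dates_of_weekday(2009, 7, "Sunday"))
```

For the current month, pass date.today().year and date.today().month from the datetime module.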
