Before asking my question, here's a little background so you understand what I'm doing. I'm looking to analyze a very large data set (a little under 2,000,000 rows). I've parsed the data set into MATLAB and built a structure array from it, with names, dates, returns, etc. for each asset i. Now I would like to restrict my data set to the range between two dates, and MATLAB doesn't seem particularly amenable to that. One suggestion I was given was to take the dates, which are of the form MM/DD/YYYY, and split them on the delimiter '/' to build three integer arrays for my data structure (which I'd call stock(i).month, stock(i).day, and stock(i).year). However, nothing I've tried works, and I'm very much stuck.
What I have been trying to do is something like the following:
%% Dates
fid = fopen('52c6d3831952b24a.csv');
C = textscan(fid, [repmat('%*s ',1,0),'%s %*[^\n]'], 'delimiter',',');
date = C{1}(2:end,1);
fclose(fid);
for i=1:numStock
locate = strcmp(uniquePermno{i},permno);
stock(i+1).date = date(locate);
end;
for i = 1:numStock
stock(i+1).date = char(stock(i+1).date);
D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');
stock(i+1).month = D{1}(1:end);
stock(i+1).day = D{2}(1:end);
stock(i+1).year = D{3}(1:end);
end
I initially wanted to save them as integers (and was using %u instead), but I was getting a strange situation where most of my entries were just 0 and the non-zero ones were very large (obviously not what I expected). However, the above form returns the following error:
Error using textscan
Buffer overflow (bufsize = 4095) while reading string from
file (row 1 u, field 1 u). Use 'bufsize' option. See HELP TEXTSCAN.
44444444444444444444455555555555555555555566666666666666666666677777777777777777777778888888888888888888889999999999999999999990000000000000000000000011111111111111111112222222222222222222222111111111
Error in makeData_CRSP (line 87)
D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');
So I'm honestly at a loss for how to approach this. What am I doing wrong? Seeing how I saved my dates vectors for my data structure, is this the best way to approach this problem?
You can use the datenum function to convert dates into numbers. The syntax is datenum(dateString, format). For example, if your dates are in the format YYYY MM DD then that would be
datenum('2012 12 04', 'yyyy mm dd')
Once you have converted all your dates like that, you can simply compare the resulting numbers using > and <:
>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 03', 'yyyy mm dd')
ans =
1
>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 05', 'yyyy mm dd')
ans =
0
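To tie this back to the question, here is a minimal sketch of filtering MM/DD/YYYY strings to a date window with datenum and a logical mask. The sample dates, the returns vector, and the window bounds are made up for illustration; they stand in for one stock(i).date entry. datevec is shown as one possible way to get the numeric year, month, and day arrays the question asked about.

```matlab
% Hypothetical sample data standing in for one asset's dates and returns
dates   = {'01/15/2010'; '03/02/2010'; '07/19/2010'; '11/30/2010'};
returns = [0.012; -0.004; 0.021; 0.003];

% Convert every date string to a serial date number in one call
dn = datenum(dates, 'mm/dd/yyyy');

% Window bounds (assumed values)
startDn = datenum('02/01/2010', 'mm/dd/yyyy');
endDn   = datenum('08/31/2010', 'mm/dd/yyyy');

% Logical mask of the rows inside the window, bounds included
inRange     = (dn >= startDn) & (dn <= endDn);
keptDates   = dates(inRange);
keptReturns = returns(inRange);

% Numeric year/month/day columns, if they are still needed
[yr, mo, dy] = datevec(dn);
```

Because the mask is an ordinary logical vector, the same inRange can index every other field of the struct that has one entry per date.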
Related
I received timestamp datasets which are in the format of yyyy-mm-dd HH:MM:SS.ms. I want to convert into yyyy-mm-dd HH:MM:SS format. Is there any way to select only in this format using matlab?
For example:
2012-08-01 00:10:00.0
should be:
2012-08-01 00:10:00
Please note that the millisecond values are all zero.
The general way would be to use datestr to convert it to your desired format.
dates = {'2012-08-01 00:10:00.1';
'2012-08-01 00:10:00.1'};
new = datestr(dates, 'yyyy-mm-dd HH:MM:SS');
% 2012-08-01 00:10:00
% 2012-08-01 00:10:00
Another approach: since all of your milliseconds are zero (so you don't have to worry about rounding), you can just use a regular expression to remove the milliseconds component (anything after the decimal point).
new = regexprep(dates, '\..*', '')
This is likely going to be more performant as you don't need to perform the intermediate step of converting to either a datetime object or a date number.
Since the input and output format are the same except for the milliseconds, don't use date functions, but simple string operations:
% example dates
C = {'2012-08-01 00:10:00.0'
'2013-08-02 00:11:11.0'
'2014-08-03 00:12:22.0'
'2015-08-04 00:13:33.0'
'2016-08-05 00:14:44.0'};
% method 1
D = cellfun(@(x)x(1:end-2), C, 'UniformOutput', false);
% method 2 (same, but no cellfun)
D = char(C);
D = cellstr(D(:,1:end-2));
% method 3
D = regexp(C, '[^\.]*', 'match', 'once');
% method 4
D = regexprep(C, '\..*$', '');
Let's say you need this data as datetime objects anyway; then I would do something like this:
inp = {'2012-08-01 00:10:00.0'; '2012-08-02 04:10:00.0'}; % Some date strings
t = datetime(inp,'InputFormat','yyyy-MM-dd HH:mm:ss.0'); % Convert to datetimes
datestr(t, 'yyyy-mm-dd HH:MM:SS') % convert back to strings
For the input and output format specifiers, see the documentation. I assume that the last part is always zero.
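If the values stay as datetime objects, the trip back through datestr can be skipped by setting the display format on the array itself. This is only a sketch of that variant: the input strings are the same made-up examples, and the 'S' in the input format (one fractional-second digit) is my substitution for the literal '0' used above.

```matlab
% Example inputs with a single fractional-second digit
inp = {'2012-08-01 00:10:00.0'; '2012-08-02 04:10:00.0'};

% Parse, reading the trailing digit as fractional seconds
t = datetime(inp, 'InputFormat', 'yyyy-MM-dd HH:mm:ss.S');

% Change how the datetimes are displayed and converted to text
t.Format = 'yyyy-MM-dd HH:mm:ss';

% Cell array of char vectors in the new format
out = cellstr(t);
```

The values in t are unchanged; only their text representation drops the fractional part.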
I'm currently working on a small project on handling time differences in MATLAB. I have two input files, Time_in and Time_out. Both contain arrays of times in a format such as 2315 (GMT, hours and minutes).
I've read both 'Time_in' and 'Time_out' into MATLAB, but I don't know how to perform the subtraction. Also, I want the answers expressed in minutes only, e.g. 2 hrs 30 mins = 150 minutes.
This is one of several possible solutions:
First, you should convert your time strings to MATLAB serial date numbers. Once you've done this, you can do the calculation you want:
% input time as string
time_in = '2115';
time_out = '2345';
% read the input time as datenum
dTime_in = datenum(time_in,'HHMM');
dTime_out = datenum(time_out,'HHMM');
% subtract to get the time difference
timeDiff = abs(dTime_out - dTime_in);
% Get the minutes of the time difference
timeout = timeDiff * 24 * 60;
Furthermore, to calculate the time differences correctly you also should put some information about the date in your time vector, in order to calculate the correct time around midnight.
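If the files really contain nothing but HHMM values, one way to get a sensible answer across midnight is to assume every interval is shorter than 24 hours and wrap negative differences with mod. That assumption is mine, not something stated in the question; when real dates are available, carrying the date along as suggested above is the safer route.

```matlab
% Example times that straddle midnight (made-up values)
time_in  = '2315';
time_out = '0145';

% Minutes since midnight for each time
minIn  = str2double(time_in(1:2))  * 60 + str2double(time_in(3:4));
minOut = str2double(time_out(1:2)) * 60 + str2double(time_out(3:4));

% Wrap into the range 0..1439, assuming less than 24 hours elapsed
elapsedMinutes = mod(minOut - minIn, 24 * 60);
```

Unlike the abs() version, this treats 2315 to 0145 as 150 minutes rather than 1290.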
If you need further information about the function datenum you should read the following part of the MATLAB documentation:
https://de.mathworks.com/help/matlab/ref/datenum.html
Any questions?
In a recent version of MATLAB, you could use textscan together with datetime and duration data types to do this.
% read the first file
fh1 = fopen('Time_in');
d1 = textscan(fh1, '%{HHmm}D');
fclose(fh1);
fh2 = fopen('Time_out');
d2 = textscan(fh2, '%{HHmm}D');
fclose(fh2);
Note the format specifier '%{HHmm}D' tells MATLAB to read the 4-digit string into a datetime array.
d1 and d2 are now cell arrays where the only element is a datetime vector. You can subtract these, and then use the minutes function to find the number of minutes.
result = minutes(d2{1} - d1{1})
I have a date cell array which is read from a csv file. The format is below:
date =
'2008.12.01'
'2008.12.02'
'2008.12.03'
'2008.12.04'
'2008.12.05'
... ...
And I want to:
turn the cell array into a string array,
use strread() to read its "yyyy", "mm" and "dd" values into three double arrays [year,mm,dd],
use datenummx() to turn [year,mm,dd] into a date serial number.
After I use
date = char(date);
the date array becomes this:
date =
2008.12.01
2008.12.02
2008.12.03
2008.12.04
2008.12.05
... ...
which I think is the result I want...
But after I use strread(), it gives me an odd result.
[year,month,day]=strread(date,'%d%d%d','delimiter','.');
year =
-1
0
0
0
0
... ...
BUT if I use the code below, the strread() can give me the right answer:
s = sprintf('2008.12.01')
s =
2008.12.01
[year,month,day]=strread(s,'%d%d%d','delimiter','.')
year =
2008
month =
12
day =
1
And I checked in MATLAB that both "date" and "s" are char arrays (using the function 'ischar' and by simply displaying both)...
But why does strread() give different results?
Can anyone answer?
By the way, I use MATLAB v6.5 (for my own reasons, so please don't comment asking "why not use a higher version")....
Your problem is this line:
date = char(date);
It does not create an array of strings; there is no array of strings in MATLAB. It creates a 2-D array of chars, which strread does not treat as one date per row. As you already noticed, your strread line works fine if you input a single date, so pass each date from your original cell array individually:
for idx=1:numel(date)
[year(idx),month(idx),day(idx)]=strread(date{idx},'%d%d%d','delimiter','.');
end
Preallocation of year, month and day improves the performance.
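A sketch of the same loop with preallocation follows. The three sample dates are taken from the question; the variable names dateStrs, yr, mo, and dy are my own, chosen so they do not shadow MATLAB's date function or other built-ins. The last line uses the public datenum(Y,M,D) form rather than datenummx.

```matlab
% Sample dates in the question's yyyy.mm.dd format
dateStrs = {'2008.12.01'; '2008.12.02'; '2008.12.03'};

% Preallocate the outputs so they do not grow inside the loop
n  = numel(dateStrs);
yr = zeros(n, 1);
mo = zeros(n, 1);
dy = zeros(n, 1);

% Parse one date string per iteration, as in the answer above
for idx = 1:n
    [yr(idx), mo(idx), dy(idx)] = strread(dateStrs{idx}, '%d%d%d', 'delimiter', '.');
end

% Convert the numeric parts to serial date numbers in one call
serial = datenum(yr, mo, dy);
```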
I need to concatenate all characters from a date/time string into a single 1x1 array.
If I do
K>> datestr(now, 'mmm dd yyyy - HH:MM')
ans =
Jan 04 2014 - 11:58
K>> size(datestr(now, 'mmm dd yyyy - HH:MM'))
ans =
1 19
I end up with a 1x19 array for my date. It needs to be 1x1 as it will be the first of many other comma separated values exported to a single CSV file.
values =[
software_version; date_time; % <===== date_time should be 1x1 to fit
P_total; P_total_SD; P_total_CV; % in single cell
t1;
con_int1; con_int2; ...
];
Do you have any suggestions on how to accomplish this?
My question is only partially addressed here, so I believe this is not a duplicate.
They suggest dlmwrite but that will not work for my case in which there are many other values being output to a single file.
The string has 19 characters, so as a character array it is necessarily of size 1x19. How to write it to a CSV file depends on the method you use; if the function in question supports writing strings to the file, I would expect it to accept a cell array as input. In that case, your 1x19 character array would sit in a single cell of the cell array:
values ={
software_version; date_time;
P_total; P_total_SD; P_total_CV;
t1;
con_int1; con_int2; ...
};
The difference from your code is to replace [] by {}.
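For illustration, here is one possible way to write such a cell array out as a single CSV row using only fprintf. The values, the fixed timestamp, and the output file name are placeholders I made up; the question does not say which export function is actually in use, and strjoin needs R2013a or later.

```matlab
% Hypothetical values; the real ones come from the question's workspace
software_version = '1.2.0';
date_time = datestr(datenum(2014, 1, 4, 11, 58, 0), 'mmm dd yyyy - HH:MM');
P_total   = 12.5;

% Mixed text and numbers live side by side in a cell array
values = {software_version; date_time; P_total};

% Turn every cell into text
parts = cell(1, numel(values));
for k = 1:numel(values)
    if ischar(values{k})
        parts{k} = values{k};
    else
        parts{k} = num2str(values{k});
    end
end

% Join with commas and write one CSV row (assumed file name)
csvLine = strjoin(parts, ',');
outFile = fullfile(tempdir, 'results_example.csv');
fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', csvLine);
fclose(fid);
```

This only works as-is when no text value contains a comma; such fields would need quoting.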
When trying the following concatenation:
for i=1:1:length(Open)
data(i,1) = Open(i);
data(i,2) = Close(i);
data(i,3) = High(i);
data(i,4) = Low(i);
data(i,5) = Volume(i);
data(i,6) = Adj_Close(i);
data(i,7) = cell2mat(dates(1,i));
end
Where all matrices but dates contain double values, and dates is a cell array with dates in the format '2001-01-01'. Running the code above, I get the following error:
??? Subscripted assignment dimension mismatch.
Error in ==> Test_Trades_part2 at 81
data(i,7) = cell2mat(dates(1,i));
The code above is tied to a master code which takes data from Yahoo Finance and then puts it in my SQL database.
A convenient way to store dates in completely numeric format is with datenum:
>> data(i,7) = datenum('2001-01-01');
>> disp(data(i,:))
0 0 0 0 0 0 730852
Whether this is useful to you depends on what you intend to do with the SQL database. However, converting back to a string in MATLAB is straightforward with the datestr command:
>> datestr(730852,'yyyy-mm-dd')
ans =
2001-01-01
APPENDIX:
A serial date number represents a calendar date as the number of days that have passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
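Since datenum also accepts a cell array of date strings, the whole loop from the question can be collapsed into one concatenation. The sketch below uses made-up prices and two sample dates; only the variable names and the 'yyyy-mm-dd' format come from the question.

```matlab
% Made-up sample columns, one row per trading day
Open      = [10.0; 10.5];
Close     = [10.4; 10.2];
High      = [10.6; 10.7];
Low       = [ 9.9; 10.1];
Volume    = [1500; 1800];
Adj_Close = [10.4; 10.2];

% Dates as a 1-by-N cell array of strings, as in the question
dates = {'2001-01-01', '2001-01-02'};

% Convert all dates at once; dates(:) forces a column
dn = datenum(dates(:), 'yyyy-mm-dd');

% Everything is numeric now, so plain concatenation works
data = [Open, Close, High, Low, Volume, Adj_Close, dn];

% Recover a readable date from column 7 when needed
firstDate = datestr(data(1, 7), 'yyyy-mm-dd');
```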
Thank you all for the help!
I solved this issue using the following methodology (incorporating structs; I should have thought of that, stupid me):
data = [open, close_price, high, low, volume, closeadj];
s = struct('OpenPrice', data(:,1), 'ClosePrice', data(:,2), 'High', data(:,3), 'Low', data(:,4), 'Volume', data(:,5), 'Adj_Close', data(:,6), 'Dates', {dates});
This way, all the values are held in the struct, which avoids the need to concatenate numeric and string matrices. It's odd, though, that numeric and string data can't be mixed in one matrix; I suppose that is the reason structs were created.