model.train() and model.eval() causing nan values - loops

Hey, so I am trying my hand at image classification/transfer learning using the monkey species dataset and ResNet-50 with a modified final fc layer to predict just the 10 classes. Everything works until I use model.train() and model.eval(); then, after the first epoch, the loss starts returning NaNs and the accuracy drops off, as you'll see below. Why does this only happen when switching between train and eval modes?
First I import the model, freeze the parameters, and attach the classifier:
%%capture
resnet = models.resnet50(pretrained=True)
for param in resnet.parameters():
    param.requires_grad = False
in_features = resnet.fc.in_features
# Build custom classifier
classifier = nn.Sequential(OrderedDict([('fc1', nn.Linear(in_features, 512)),
('relu', nn.ReLU()),
('drop', nn.Dropout(0.05)),
('fc2', nn.Linear(512, 10)),
]))
# ('output', nn.LogSoftmax(dim=1))
resnet.fc = classifier
resnet.to(device)
Then I set my loss function, optimizer, and scheduler:
# Step : Define criterion and optimizer
criterion = nn.CrossEntropyLoss()
# pass the optimizer to the appended classifier layer
optimizer = torch.optim.SGD(resnet.parameters(), lr=0.01)
# Scheduler
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.05)
Then setting the training and validation loops
epochs = 20
tr_losses = []
avg_epoch_tr_loss = []
tr_accuracy = []
val_losses = []
avg_epoch_val_loss = []
val_accuracy = []
val_loss_min = np.inf
resnet.train()
for epoch in range(epochs):
    for i, batch in enumerate(train_loader):
        # Pull the data and labels from the batch
        data, label = batch
        # If available push data and label to GPU
        if train_on_gpu:
            data, label = data.to(device), label.to(device)
        # Compute the logit
        logit = resnet(data)
        # Compute loss
        loss = criterion(logit, label)
        # Clearing the gradient
        resnet.zero_grad()
        # Backpropagate the gradients (accumulate the partial derivatives of loss)
        loss.backward()
        # Apply the updates to the optimizer step in the opposite direction to the gradient
        optimizer.step()
        # Store the losses of each batch
        # loss.item() separates the loss from comp graph
        tr_losses.append(loss.item())
        # Detach and store the average accuracy of each batch
        tr_accuracy.append(label.eq(logit.argmax(dim=1)).float().mean())
        # Print the rolling batch training loss every 40 batches
        if i % 40 == 0 and not i == 1:
            print(f'Batch No: {i} \tAverage Training Batch Loss: {torch.tensor(tr_losses).mean():.2f}')
    # Print the average loss for each epoch
    print(f'\nEpoch No: {epoch + 1},Training Loss: {torch.tensor(tr_losses).mean():.2f}')
    # Print the average accuracy for each epoch
    print(f'Epoch No: {epoch + 1}, Training Accuracy: {torch.tensor(tr_accuracy).mean():.2f}\n')
    # Store the avg epoch loss for plotting
    avg_epoch_tr_loss.append(torch.tensor(tr_losses).mean())
    resnet.eval()
    for i, batch in enumerate(val_loader):
        # Pull the data and labels from the batch
        data, label = batch
        # If available push data and label to GPU
        if train_on_gpu:
            data, label = data.to(device), label.to(device)
        # Compute the logits without computing the gradients
        with torch.no_grad():
            logit = resnet(data)
        # Compute loss
        loss = criterion(logit, label)
        # Store test loss
        val_losses.append(loss.item())
        # Store the accuracy for each batch
        val_accuracy.append(label.eq(logit.argmax(dim=1)).float().mean())
        if i % 20 == 0 and not i == 1:
            print(f'Batch No: {i+1} \tAverage Val Batch Loss: {torch.tensor(val_losses).mean():.2f}')
    # Print the average loss for each epoch
    print(f'\nEpoch No: {epoch + 1}, Epoch Val Loss: {torch.tensor(val_losses).mean():.2f}')
    # Print the average accuracy for each epoch
    print(f'Epoch No: {epoch + 1}, Epoch Val Accuracy: {torch.tensor(val_accuracy).mean():.2f}\n')
    # Store the avg epoch loss for plotting
    avg_epoch_val_loss.append(torch.tensor(val_losses).mean())
    # Checkpointing the model using val loss threshold
    if torch.tensor(val_losses).float().mean() <= val_loss_min:
        print("Epoch Val Loss Decreased... Saving model")
        # save current model
        torch.save(resnet.state_dict(), '/content/drive/MyDrive/1. Full Projects/Intel Image Classification/model_state.pt')
        val_loss_min = torch.tensor(val_losses).mean()
    # Step the scheduler for the next epoch
    scheduler.step()
    # Print the updated learning rate
    print('Learning Rate Set To: {:.5f}'.format(optimizer.state_dict()['param_groups'][0]['lr']),'\n')
The model starts to train, but then the losses become NaN:
Batch No: 0 Average Training Batch Loss: 9.51
Batch No: 40 Average Training Batch Loss: 1.71
Batch No: 80 Average Training Batch Loss: 1.15
Batch No: 120 Average Training Batch Loss: 0.94
Epoch No: 1,Training Loss: 0.83
Epoch No: 1, Training Accuracy: 0.78
Batch No: 1 Average Val Batch Loss: 0.39
Batch No: 21 Average Val Batch Loss: 0.56
Batch No: 41 Average Val Batch Loss: 0.54
Batch No: 61 Average Val Batch Loss: 0.54
Epoch No: 1, Epoch Val Loss: 0.55
Epoch No: 1, Epoch Val Accuracy: 0.81
Epoch Val Loss Decreased... Saving model
Learning Rate Set To: 0.01000
Batch No: 0 Average Training Batch Loss: 0.83
Batch No: 40 Average Training Batch Loss: nan
Batch No: 80 Average Training Batch Loss: nan

I see that resnet.zero_grad() is called after logit = resnet(data), which may be what causes the gradients to explode in your case.
Clear the gradients before the forward pass instead, as below:
# Clearing the gradient
optimizer.zero_grad()
logit = resnet(data)
# Compute loss
loss = criterion(logit, label)
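As background on why a gradient-clearing call is needed once per step at all: PyTorch adds each backward pass into the existing gradients rather than overwriting them. The sketch below imitates that accumulate-then-step behaviour in plain Python, without PyTorch. ToyParam and train are hypothetical names made up for this illustration, and the one-weight model is an assumption chosen only to keep the arithmetic easy to follow.

```python
class ToyParam:
    """A single scalar weight whose gradient accumulates across backward calls."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def backward(self, x, y):
        # Gradient of 0.5 * (w*x - y)**2 with respect to w is (w*x - y) * x.
        # Note the +=: the new gradient is added to whatever is already stored.
        self.grad += (self.value * x - y) * x

    def zero_grad(self):
        self.grad = 0.0

    def step(self, lr):
        self.value -= lr * self.grad


def train(clear_grads, steps=50, lr=0.1):
    """Fit w so that w * 2.0 == 6.0 (so w should approach 3.0)."""
    w = ToyParam(0.0)
    for _ in range(steps):
        if clear_grads:
            w.zero_grad()      # reset once per step, before backward
        w.backward(x=2.0, y=6.0)
        w.step(lr)
    return w.value


good = train(clear_grads=True)
bad = train(clear_grads=False)
print(f"with zero_grad each step:    w = {good:.4f}")
print(f"without zero_grad each step: w = {bad:.4f}")
```

With the reset in place the weight settles at the target; without it, stale gradients keep being re-applied and the weight never settles. This toy only shows the accumulation semantics and is not a reproduction of the NaN in the question.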


How do I add my array dates into the datediff calculation

Newbie to bash scripting here and could use some help on this if you have time. My customers upload files, and each has a datestamp in its filename like this:
* 20170815041135
* 20170820041135
* 20170823071727
* 20170826040609
* 20170828050704
* 20170830153011
I need to calculate the number of days between each upload then find the average interval of the listed uploads
I can find the date difference between two dates with this command
echo $(( ($(date --date="20170831" +'%s' ) - $(date --date="20170821" +'%s')) / (60*60*24) ))
gives 10
To do multiple dates I've read that I need an array, so here is my range of upload dates in an array.
array=( `20170830153011`,`20170828050704`,`20170826040609`,`20170823071727`,`20170820041135`,`20170815041135` )
I've read I need to loop through the calculation like this
for i in "${array[@]}"; do
?
How do I add my array dates into the calculation?
Your datetimes into an array:
timestamps=(
20170815041135
20170820041135
20170823071727
20170826040609
20170828050704
20170830153011
)
Let's now convert those into epoch times:
epochs=()
for timestamp in "${timestamps[@]}"; do
    iso8601=$(sed -r 's/(....)(..)(..)(..)(..)(..)/\1-\2-\3T\4:\5:\6/' <<<"$timestamp")
    epochs+=( "$(date -d "$iso8601" "+%s")" )
done
printf "%s\n" "${epochs[@]}"
1502784695
1503216695
1503487047
1503734769
1503911224
1504121411
Now we can iterate over them to calculate the differences. Note that bash array indices start at zero:
n=0
sum=0
for ((i=1; i < ${#epochs[@]}; i++ )); do
    ((n++, diff=(${epochs[i]} - ${epochs[i-1]}), sum+=diff))
    echo "diff $n = $diff seconds = $((diff/86400)) days"
done
echo "average = $((sum/n)) seconds = $((sum/n/86400)) days"
diff 1 = 432000 seconds = 5 days
diff 2 = 270352 seconds = 3 days
diff 3 = 247722 seconds = 2 days
diff 4 = 176455 seconds = 2 days
diff 5 = 210187 seconds = 2 days
average = 267343 seconds = 3 days
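For comparison, the same calculation can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the bash answer: average_interval_seconds is a hypothetical helper name, and it assumes naive datetimes (no time zone or daylight saving adjustment) and at least two timestamps.

```python
from datetime import datetime

stamps = [
    "20170815041135",
    "20170820041135",
    "20170823071727",
    "20170826040609",
    "20170828050704",
    "20170830153011",
]


def average_interval_seconds(stamps):
    """Return (list of gaps in seconds, mean gap) for YYYYmmddHHMMSS strings."""
    times = sorted(datetime.strptime(s, "%Y%m%d%H%M%S") for s in stamps)
    # Pair each timestamp with its successor and take the difference.
    diffs = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return diffs, sum(diffs) / len(diffs)


diffs, avg = average_interval_seconds(stamps)
for n, d in enumerate(diffs, start=1):
    print(f"diff {n} = {d:.0f} seconds = {d / 86400:.2f} days")
print(f"average = {avg:.0f} seconds = {avg / 86400:.2f} days")
```

Unlike the integer arithmetic in bash, the division here keeps the fractional days, so the average is not truncated to whole days.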
Convert each date to seconds since 1970 (the Unix epoch).
Calculate the difference.
I hope that the date command takes daylight saving time into account for those dates.

Praat: Get formant intensity

I have a praat script that extracts formant information from a folder of wavefiles:
clearinfo
min_f0 = 75
max_f0 = 350
directory$ = "./soundfiles/"
outputDir$ = "./test/"
strings = Create Strings as file list: "list", directory$ + "/*.WAV"
numberOfFiles = Get number of strings
for ifile to numberOfFiles
select Strings list
filename$ = Get string... ifile
Read from file... 'directory$''filename$'
soundname$ = selected$ ("Sound", 1)
outputFileName$ = outputDir$ + soundname$ + ".f0123"
appendInfoLine: outputFileName$
select Sound 'soundname$'
formant = To Formant (burg): 0, 4, 5000, 0.025, 50
formantStep = Get time step
selectObject: formant
table = Down to Table: "no", "yes", 6, "yes", 3, "yes", 3, "yes"
numberOfRows = Get number of rows
select Sound 'soundname$'
pitch = To Pitch: 0, min_f0, max_f0
selectObject: table
Append column: "Pitch"
for step to numberOfRows
selectObject: table
t = Get value: step, "time(s)"
selectObject: pitch
pitchValue = Get value at time: t, "Hertz", "Nearest"
selectObject: table
Set numeric value: step, "Pitch", pitchValue
endfor
#export to csv
selectObject: table
Save as comma-separated file: outputFileName$
removeObject(table)
select all
minus Strings list
Remove
endfor
select all
Remove
exit
And it generates the following output:
time(s),intensity,nformants,F1(Hz),B1(Hz),F2(Hz),B2(Hz),F3(Hz),B3(Hz),F4(Hz),B4(Hz),Pitch
0.025370,0.000007,3,213.115,14.053,2385.911,791.475,3622.099,677.605,--undefined--,--undefined--,--undefined--
0.031620,0.000007,3,208.843,15.034,2487.710,687.736,3818.027,645.184,--undefined--,--undefined--,197.5315925472943
...
This works great for what I need, but is there a way to get the intensity of each formant as well? Right now I only have the one intensity estimate.
It's an old question, but I'll still respond.
I ran into this too in 2002, when I was creating an editor for a hardware formant synthesizer (FS1R). I used Praat to do the wav -> formant tracks calculation, and the synthesizer expects formant frequencies and intensities as input.
I implemented several algorithms for it, but the one with the most realistic results evaluated the intensity for each formant at each frame in the spectrogram.
Here's the code that I've used for that.
Keep in mind that it was my goal to get a list of 512 frames with up to 8 freq/intensity pairs, and a fundamental pitch.
# Add to dynamic menu... Sound 1 "" 0 "" 0 "Sine-wave speech" Resample... 1 yourdisk:Praat:scripts:SWS
form Add Sounds
word wavePath e:\samples\wav\root\
word waveFile DOUG.wav
word OutPath e:\samples\wav\root\
integer minFP 75
integer maxFP 500
integer maxFF 5000
integer Amp_low_pass_freq 50
integer Formant_low_pass_freq 20
endform
echo Wave to FSeq - FORMANT EXTRACTION
echo -------------------------------------------------------
# LOAD WAVEFILE
echo loading 'wavePath$''waveFile$'
Read from file... 'wavePath$''waveFile$'
if numberOfSelected ("Sound") <> 1
pause Select one Sound then Continue
endif
snd$ = selected$("Sound", 1)
snd = selected("Sound", 1)
sampleRate = Get sample rate
numSamples = Get number of samples
dur = Get duration
zzz = 512/509*512
timeStep = dur/zzz
echo samplerate : 'sampleRate' herz
echo number of samples : 'numSamples'
echo duration : 'dur' seconds
echo timestep : 'timeStep' seconds
echo
# GET FUNDAMENTAL PITCH
echo getting fundamental pitch
# this was the old method, used until FSeqEdit 1.21:
# To Pitch... 'timeStep' 'minFP' 'maxFP'
# Interpolate
# this algorithm seems to work better
To Pitch (ac)... 'timeStep' 'minFP' 15 no 1e-06 0.1 0.01 1 1 'maxFP'
Kill octave jumps
Interpolate
select Pitch 'snd$'
Write to short text file... 'outPath$'pitch.txt
select Pitch 'snd$'
Remove
# GET VOICED/UNVOICED INFORMATION
echo getting voiced/unvoiced information
select Pitch 'snd$'
To PointProcess
select PointProcess 'snd$'
To TextGrid (vuv)... 0.02 'timeStep'
select TextGrid 'snd$'
Write to short text file... 'outPath$'vuv.txt
#create wide-band spectrogram for finding formant amplitudes
# to spectorgam analwidth maxfreq timestep freqstep windowshape
echo to spectogram
select 'snd'
To Spectrogram... 0.003 'maxFF' 0.001 40 Gaussian
select 'snd'
echo finding formants
To Formant (burg)... 'timeStep' 8 'maxFF' 0.025 50
Rename... untrack
Track... 6 'maxFP' 'maxFP'*3 'maxFP'*5 'maxFP'*7 'maxFP'*9 1 0.1 1
Rename... 'snd$'
select Formant untrack
Remove
select 'snd'
#start of main formant loop
#===========================
#for each chosen formant turn formant tracks into
#a Matrix then a Sound object for optional low-pass filtering
#NB this Sound object is the formant TRACK
#then back into a Matrix object for sound synthesis
for i from 1 to 6
# make a matrix from Fi
select Formant 'snd$'
echo extracting formant 'i'
To Matrix... 'i'
Rename... f'i'
#low-pass filter the formant track and tidy-up the names
#filtering needs a Sound object, so cast as Sound, filter and then back to Matrix
if Formant_low_pass_freq <> 0
To Sound (slice)... 1
Filter (pass Hann band)... 0 'formant_low_pass_freq' 'formant_low_pass_freq'
Down to Matrix
select Matrix f'i'
Remove
select Matrix f'i'_band
Rename... f'i'
select Sound f'i'
plus Sound f'i'_band
Remove
endif
#set up amplitude contour array (sample only at 1kHz) for i'th formant
#make it a Sound object so that it can be smoothed by filtering
Create Sound... amp'i' 0 'dur' 1000 sqrt(Spectrogram_'snd$'(x,Matrix_f'i'(x)))
#smooth out pitch amplitude modulation by low-pass filtering
if Amp_low_pass_freq <> 0
Filter (pass Hann band)... 0 'amp_low_pass_freq' 'amp_low_pass_freq'
select Sound amp'i'
Remove
select Sound amp'i'_band
Rename... amp'i'
endif
Extract part... 0 'dur' Rectangular 1 yes
To Intensity... 'minFP' 0
Write to short text file... 'outPath$'amp'i'.txt
select Matrix f'i'
Remove
endfor
#===========================
#end of the main formant loop
select Formant 'snd$'
Write to short text file... 'outPath$'formant.txt
#tidy-up
select Spectrogram 'snd$'
plus Formant 'snd$'
plus Pitch 'snd$'
plus PointProcess 'snd$'
plus TextGrid 'snd$'
Remove
echo
echo -------------------------------------------------------
echo done.
I'm not sure if this is what you need, but based on the comment from @nikolay-shmyrev, this is how you'd insert the measurement of formant intensity from Spectrogram objects into your script.
I seem to be inoculated against the pain of scripting using Praat...
I simplified the script below so that it works only on the currently selected Sound object (for testing), and simply kept the generated Table (so you can check it out), but it should point you in the right direction.
form Script...
positive Minimum_F0 75
positive Maximum_F0 350
positive Formants 4
endform
sound = selected("Sound")
pitch = To Pitch: 0, minimum_F0, maximum_F0
# You need this for the intensity
selectObject: sound
spectrogram = To Spectrogram: 0.005, 5000, 0.002, 20, "Gaussian"
selectObject: sound
formant = To Formant (burg): 0, formants, 5000, 0.025, 50
table = Down to Table: "no", "yes", 6, "yes", 3, "yes", 3, "yes"
Append column: "Pitch"
# Insert columns for each formant intensity
# (labeled here as "I#", where # is the formant index)
for f to formants
index = Get column index: "F" + string$(f) + "(Hz)"
Insert column: index + 1, "I" + string$(f)
endfor
for row to Object_'table'.nrow
selectObject: table
time = Object_'table'[row, "time(s)"]
# Get the intensity of each formant
for f to formants
frequency = Object_'table'[row, "F" + string$(f) + "(Hz)"]
selectObject: spectrogram
if frequency != undefined
intensity = Get power at: time, frequency
else
intensity = undefined
endif
selectObject: table
Set string value: row, "I" + string$(f), fixed$(intensity, 3)
endfor
selectObject: pitch
pitchValue = Get value at time: time, "Hertz", "Nearest"
selectObject: table
Set string value: row, "Pitch", fixed$(pitchValue, 3)
endfor
removeObject: spectrogram, formant, pitch
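The idea behind the script above (look up the spectrogram's power at each formant's frequency, frame by frame) can be sketched outside Praat as follows. This is only an illustration in Python: power_at and formant_powers are hypothetical names, the grid layout (cell centres at multiples of the time and frequency steps) is an assumption, and the nearest-cell lookup is a simplification that may differ from how Praat's own Get power at resolves a point between cells.

```python
def power_at(spectrogram, time_step, freq_step, time, frequency):
    """Return the power in the grid cell nearest to (time, frequency).

    spectrogram[t][f] holds the power for frame t and frequency bin f.
    Returns None for an undefined formant (frequency is None) or for a
    point that falls outside the grid.
    """
    if frequency is None:
        return None
    t = round(time / time_step)
    f = round(frequency / freq_step)
    if 0 <= t < len(spectrogram) and 0 <= f < len(spectrogram[t]):
        return spectrogram[t][f]
    return None


def formant_powers(spectrogram, time_step, freq_step, time, formants):
    """Look up the power at each formant frequency for one analysis frame."""
    return [power_at(spectrogram, time_step, freq_step, time, fq) for fq in formants]


# Toy grid: 3 frames x 4 frequency bins (made-up numbers).
grid = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
row = formant_powers(grid, time_step=0.002, freq_step=20.0,
                     time=0.002, formants=[21.0, 58.0, None])
print(row)
```

Undefined formants are passed through as None, mirroring the undefined check in the Praat loop.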

Saving mixed data cell array to ascii file in MATLAB

I get some data from an instrument that is formatted in a specific way. I need to load the data into MATLAB, manipulate some values, then save it back with the same format to load back into the instrument software for further analysis...
The issue I am having is the data is of mixed value types and they are kind of all over the place.
The file is tab delimited, I have added arrows eg --> to show the location of the tabs (like notepad++ does)
Scan-42/01
Temperature [K] :--> 295.00
Time [s] :--> 60
"Linspace"
0.01--> 0.96
0.02--> 0.95
0.03--> 0.95
"Logspace"
0.01--> 0.96
0.02--> 0.95
0.04--> 0.94
The data keeps going down but I have cut it off after 3 rows.
The data I need to manipulate will be the Temperature, and some of the values under Linspace and Logspace.
I am currently importing the data like this:
filename = 'test.asc';
delimiter = '\t';
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
Data in MATLAB looks like this:
Even if I could set up some kind of template in MATLAB where I could get the necessary values and then save them in exactly this format, that would work fine. The file must be saved as .asc, or the instrument will reject it.
Help is greatly appreciated.
Thanks
Hope this would work for you.
Code
%%// Note: file1 is your input .asc filename and file2 is the output .asc.
%%// Please specify their names before running this.
%%// **** Read in file data ****
fid = fopen(file1,'r');
A = importdata(file1,'\n')
%%// Delimiters (mind these assumptions)
linlog_delim1 = '--> ';
temperature_delim1 = 'Temperature [K] :--> ';
sep1 = cellfun(@(x) isequal(x,''),A)
sep1 = [sep1 ;1]
sep_ind = find(sep1)
full_data = regexp(A,linlog_delim1,'split')
%%// Temperature value
temp_ind = find(~cellfun(@isempty,strfind(A,'Temperature [K] :-->')))
temp_val = str2num(cell2mat(full_data{temp_ind,:}(1,2)))
%%// Linspace values
sep_linspace = cellfun(@(x) isequal(x,'"Linspace"'),A)
lin_start_ind = find(sep_linspace)+1
lin_stop_ind = sep_ind(find(sep_ind>lin_start_ind,1,'first'))-1
linspace_data = vertcat(full_data{lin_start_ind:lin_stop_ind})
linspace_valid_ind = cellfun(@str2num,linspace_data(:,1))
linspace_valid_val = cellfun(@str2num,linspace_data(:,2))
%%// Logspace values
sep_linspace = cellfun(@(x) isequal(x,'"Logspace"'),A)
log_start_ind = find(sep_linspace)+1
log_stop_ind = sep_ind(find(sep_ind>log_start_ind,1,'first'))-1
logpace_data = vertcat(full_data{log_start_ind:log_stop_ind})
logspace_valid_ind = cellfun(@str2num,logpace_data(:,1))
logspace_valid_val = cellfun(@str2num,logpace_data(:,2))
%%// **** Let us modify some data ****
temp_val = temp_val + 10;
linspace_valid_val_mod1 = linspace_valid_val+[1 2 3]'; %%//'
logspace_valid_val_mod1 = logspace_valid_val+[1 20 300]'; %%//'
%%// **** Write back file data ****
%%// Write back temperature data
A(temp_ind) = {[temperature_delim1,num2str(temp_val)]}
%%// Write back linspace data
mod_lin_val = cellfun(@strtrim,cellstr(num2str(linspace_valid_val_mod1)),'uni',0)
mod_lin_ind = cellstr(num2str(linspace_valid_ind))
sep_lin = repmat({linlog_delim1},numel(mod_lin_val),1)
A(lin_start_ind:lin_stop_ind)=cellfun(@horzcat,mod_lin_ind,sep_lin,mod_lin_val,'uni',0)
%%// Write back logspace data
mod_log_val = cellfun(@strtrim,cellstr(num2str(logspace_valid_val_mod1)),'uni',0)
mod_log_ind = cellstr(num2str(logspace_valid_ind))
sep_log = repmat({linlog_delim1},numel(mod_log_val),1)
A(log_start_ind:log_stop_ind)=cellfun(@horzcat,mod_log_ind,sep_log,mod_log_val,'uni',0)
%%// Remove leading whitespaces
A = strtrim(A)
%%// Write the modified data
fid2 = fopen(file2,'w');
for row = 1:numel(A)
fprintf(fid2,'%s\n',A{row,:});
end
fclose(fid);
fclose(fid2);
Changes for the demo:
Temperature has 10 added.
"Linspace" has 1, 2 and 3 added to its elements respectively.
"Logspace" has 1, 20 and 300 added to its elements respectively.
Results
Before -
Scan-42/01
Temperature [K] :--> 295.00
Time [s] :--> 60
"Linspace"
0.01--> 0.96
0.02--> 0.95
0.103--> 0.95
"Logspace"
0.01--> 0.96
0.02--> 0.95
0.04--> 0.94
After -
Scan-42/01
Temperature [K] :--> 305
Time [s] :--> 60
"Linspace"
0.01--> 1.96
0.02--> 2.95
0.103--> 3.95
"Logspace"
0.01--> 1.96
0.02--> 20.95
0.04--> 300.94
Edit 1:
Code
%%// I-O filenames
input_filename = 'gistfile1.txt';
output_file = 'gistfile1_out.txt';
%%// Get data from input filename
delimiter = '\t';
formatSpec = '%s%s%[^\n\r]';
fid = fopen(input_filename,'r');
dataArray = textscan(fid, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
%%// Get data into A
A(:,1) = dataArray{1,1}
A(:,2) = dataArray{1,2}
%%// Find separator indices
ind1 = find([cellfun(@(x) isequal(x,''),A(:,2));1])
temperature_ind = find(~cellfun(@isempty,strfind(A,'Temperature')))
temperature_val = str2num(cell2mat(A(temperature_ind,2)))
%%// Linspace values
sep_linspace = cellfun(@(x) isequal(x,'"Linspace"'),A(:,1))
lin_start_ind = find(sep_linspace)+1
lin_stop_ind = ind1(find(ind1>lin_start_ind,1,'first'))-1
linspace_valid_ind = cellfun(@str2num,A(lin_start_ind:lin_stop_ind,1))
linspace_valid_val = cellfun(@str2num,A(lin_start_ind:lin_stop_ind,2))
%%// Logspace values
sep_logspace = cellfun(@(x) isequal(x,'"Logspace"'),A(:,1))
log_start_ind = find(sep_logspace)+1
log_stop_ind = ind1(find(ind1>log_start_ind,1,'first'))-1
logspace_valid_ind = cellfun(@str2num,A(log_start_ind:log_stop_ind,1))
logspace_valid_val = cellfun(@str2num,A(log_start_ind:log_stop_ind,2))
%%// **** Let us modify some data ****
temp_val_mod1 = temperature_val + 10;
linspace_valid_val_mod1 = linspace_valid_val+[1:numel(linspace_valid_val)]';
logspace_valid_val_mod1 = logspace_valid_val+10.*[1:numel(logspace_valid_val)]';
%%// **** Write back file data into A ****
A(temperature_ind,2) = cellstr(num2str(temp_val_mod1))
A(lin_start_ind:lin_stop_ind,2) = cellstr(num2str(linspace_valid_val_mod1))
A(log_start_ind:log_stop_ind,2) = cellstr(num2str(logspace_valid_val_mod1))
%%// Write the modified data
fid2 = fopen(output_file,'w');
for row = 1:size(A,1)
fprintf(fid2,'%s\t%s\n',A{row,1},A{row,2});
end
%%// Close files
fclose(fid);
fclose(fid2);
Results
Before -
Scan-42/01
Temperature [K] : 295.00
Time [s] : 60
"Linspace"
0.01 0.96
0.02 0.95
0.03 0.95
"Logspace"
0.01 0.96
0.02 0.95
0.04 0.94
After -
Scan-42/01
Temperature [K] : 305
Time [s] : 60
"Linspace"
0.01 1.96
0.02 2.95
0.03 3.95
"Logspace"
0.01 10.96
0.02 20.95
0.04 30.94
Please note that the only formatting difference between input and output files is that there is no whitespaced row between "Linspace" and the previous row in the output file, as was there in the input file. This is seen similarly for "Logspace".
I've solved a nearly identical problem once before. The solution goes something like this:
First, you're already splitting your data up into chunks, so that's good. Judging by your comment, it seems that the data is consistently formatted from file to file, but inconsistently formatted in each individual file. That's fine.
What you need to do is iterate through dataArray, find each unique label (such as "Linspace"), and track that label's index. What you'll end up with is a vector of indices that tells you exactly where in dataArray these labels appear. Once you have all of the labels' indices, you need to look at dataArray and see how the data between each label is formatted. Then you'll write some code to break dataArray into sub-arrays. You'll need to write a different sub-array parser for each format.
I know that's a little abstract, so let me try to give you an example.
timeIndex = find(strcmp(dataArray, 'Time'), 1);
linspaceIndex = find(strcmp(dataArray, '"Linspace"'), 1);
logspaceIndex = find(strcmp(dataArray, '"Logspace"'), 1);
linSpaceData = dataArray(linspaceIndex+3:logspaceIndex-1); % This is the "sub-array" I was referring to. It's a little piece of dataArray that contains only the linspace data values.
This is just an example and will probably not plug-and-play; it's just meant to be a thought-provoker. Note the +3 and -1: those were just guessed. You'll have to determine them empirically for each range, as things like tabs, colons, and spaces can get in the way. That should be enough to get you started on your problem. Let me know if you need clarification, or if this isn't helpful. Good luck!
-Fletch
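The label-tracking idea described above can be sketched as a single pass that remembers which labelled section it is in and rewrites only the matching lines, leaving everything else untouched so the layout survives. This is an illustrative Python sketch rather than MATLAB: shift and edit_lines are hypothetical names, and it assumes a tab between the two fields, a blank line ending each section, and two-decimal output formatting.

```python
def shift(field, delta):
    """Add delta to the number in field, keeping its leading whitespace."""
    pad = field[: len(field) - len(field.lstrip())]
    return f"{pad}{float(field) + delta:.2f}"


def edit_lines(lines, temperature_delta, offsets):
    """Return a copy of lines with the temperature and labelled values shifted.

    offsets maps a section label (e.g. "Linspace") to the amount added to
    the second column of every row in that section.
    """
    out, label = [], None
    for line in lines:
        left, tab, right = line.partition("\t")
        if line.startswith('"'):
            label = line.strip().strip('"')      # entering a labelled section
        elif not line.strip():
            label = None                         # blank line ends the section
        elif tab and left.startswith("Temperature"):
            line = f"{left}\t{shift(right, temperature_delta)}"
        elif tab and label in offsets:
            line = f"{left}\t{shift(right, offsets[label])}"
        out.append(line)
    return out


sample = [
    "Scan-42/01",
    "Temperature [K] :\t 295.00",
    "Time [s] :\t 60",
    "",
    '"Linspace"',
    "0.01\t 0.96",
    "0.02\t 0.95",
    "",
    '"Logspace"',
    "0.01\t 0.96",
]
edited = edit_lines(sample, temperature_delta=10.0, offsets={"Linspace": 1.0})
print("\n".join(edited))
```

Because untouched lines are copied verbatim, the blank rows before each label are preserved; writing the result to a file with the .asc extension is then just a matter of joining the lines with newlines.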

Rounding up a number not always works as expected

I want to charge my users 1 credit for each hour or fraction they use a service.
To calculate the cost I'm using the following code, but in some cases, for example when the starting and ending dates are exactly at one day difference, I get a cost of 25 credits instead of 24:
NSNumberFormatter *format = [[NSNumberFormatter alloc]init];
[format setNumberStyle:NSNumberFormatterDecimalStyle];
[format setRoundingMode:NSNumberFormatterRoundUp];
[format setMaximumFractionDigits:0];
[format setMinimumFractionDigits:0];
NSTimeInterval ti = [endDate timeIntervalSinceDate:startDate];
float costValue = ti/3600;
self.cost = [format stringFromNumber:[NSNumber numberWithFloat:costValue]];
What am I doing wrong?
NSTimeInterval has sub-millisecond precision. If the dates are one day and a millisecond apart, rounding up charges a 25th credit.
Changing the code to do integer division should deal with the problem:
// You do not need sub-second resolution here, because you divide by
// the number of seconds in the hour anyway
NSInteger ti = [endDate timeIntervalSinceDate:startDate];
NSInteger costValue = (ti+3599)/3600;
// At this point, the cost is ready. You do not need a special formatter for it.
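The (ti+3599)/3600 expression is integer ceiling division: adding one less than the divisor pushes any partial hour over the next multiple before the division truncates. A small Python sketch of the same arithmetic, where credits_for is a hypothetical name used only for illustration and whole, non-negative seconds are assumed:

```python
def credits_for(seconds):
    """Credits owed for a whole number of elapsed seconds: one per started hour."""
    return (seconds + 3599) // 3600


for elapsed in (0, 1, 3600, 3601, 86400, 86401):
    print(elapsed, "seconds ->", credits_for(elapsed), "credits")

# Truncating a fractional interval first drops the stray millisecond.
print(credits_for(int(86400.001)))
```

Exactly one day costs 24 credits, and the 25th credit is only charged once a full extra second has elapsed.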

how to calculate rolling volatility

I am trying to design a function that will calculate 30 day rolling volatility.
I have a file with 3 columns: date, and daily returns for 2 stocks.
How can I do this? I have a problem in summing the first 30 entries to get my vol.
Edit:
So it will read an excel file, with 3 columns: a date, and daily returns.
daily.ret = read.csv("abc.csv")
e.g. date stock1 stock2
01/01/2000 0.01 0.02
etc etc, with years of data. I want to calculate rolling 30 day annualised vol.
This is my function:
calc_30day_vol = function()
{
stock1 = abc$stock1^2
stock2 = abc$stock1^2
j = 30
approx_days_in_year = length(abc$stock1)/10
vol_1 = 1: length(a1)
vol_2 = 1: length(a2)
for (i in 1 : length(a1))
{
vol_1[j] = sqrt( (approx_days_in_year / 30 ) * rowSums(a1[i:j])
vol_2[j] = sqrt( (approx_days_in_year / 30 ) * rowSums(a2[i:j])
j = j + 1
}
}
So stock1, and stock 2 are the squared daily returns from the excel file, needed to calculate vol. Entries 1-30 for vol_1 and vol_2 are empty since we are calculating 30 day vol. I am trying to use the rowSums function to sum the squared daily returns for the first 30 entries, and then move down the index for each iteration.
So from day 1-30, day 2-31, day 3-32, etc, hence why I have defined "j".
I'm new at R, so apologies if this sounds rather silly.
This should get you started.
First I have to create some data that look like you describe
library(quantmod)
getSymbols(c("SPY", "DIA"), src='yahoo')
m <- merge(ROC(Ad(SPY)), ROC(Ad(DIA)), all=FALSE)[-1, ]
dat <- data.frame(date=format(index(m), "%m/%d/%Y"), coredata(m))
tmpfile <- tempfile()
write.csv(dat, file=tmpfile, row.names=FALSE)
Now I have a csv with data in your very specific format.
Use read.zoo to read csv and then convert to an xts object (there are lots of ways to read data into R. See R Data Import/Export)
r <- as.xts(read.zoo(tmpfile, sep=",", header=TRUE, format="%m/%d/%Y"))
# each column of r has daily log returns for a stock price series
# use `apply` to apply a function to each column.
vols.mat <- apply(r, 2, function(x) {
    #use rolling 30 day window to calculate standard deviation.
    #annualize by multiplying by square root of time
    runSD(x, n=30) * sqrt(252)
})
#`apply` returns a `matrix`; `reclass` to `xts`
vols.xts <- reclass(vols.mat, r) #class as `xts` using attributes of `r`
tail(vols.xts)
# SPY.Adjusted DIA.Adjusted
#2012-06-22 0.1775730 0.1608266
#2012-06-25 0.1832145 0.1640912
#2012-06-26 0.1813581 0.1621459
#2012-06-27 0.1825636 0.1629997
#2012-06-28 0.1824120 0.1630481
#2012-06-29 0.1898351 0.1689990
#Clean-up
unlink(tmpfile)
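To make the windowing explicit, here is a minimal sketch of the same calculation in Python's standard library: a rolling sample standard deviation scaled by the square root of the number of trading periods per year. rolling_volatility is a hypothetical name, the 252-day year follows the R answer, and the short return series is made up purely for illustration.

```python
import math
import statistics


def rolling_volatility(returns, window=30, periods_per_year=252):
    """Annualised rolling sample standard deviation of a list of returns.

    Entries before the first full window are None, so the output lines up
    with the input: window 1 covers items 0..window-1, window 2 covers
    items 1..window, and so on.
    """
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = returns[i + 1 - window : i + 1]
            out.append(statistics.stdev(chunk) * math.sqrt(periods_per_year))
    return out


returns = [0.01, -0.02, 0.015, 0.0, -0.005, 0.02]
vols = rolling_volatility(returns, window=3)
print(vols)
```

A window of 3 is used here only so the toy series produces output; with real daily data the default window of 30 matches the question.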
