Saving a string array to CSV - arrays

I have a 25194081x2 matrix of strings called s1. Below is a preview of what the data looks like.
I am trying to save this matrix to CSV. I tried the code below, but for some reason it saves the first column of the matrix twice (side by side) instead of the two columns.
What am I doing wrong?
fileID= fopen('data.csv', 'w') ;
fprintf(fileID, '%s,%s\n', [s1(:,1) s1(:,2)]);
fclose(fileID)

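Don't merge the columns into one string array as you do now. fprintf consumes the elements of an array argument in column-major order, so it walks down the whole first column before it ever reaches the second. Instead, provide the columns as separate arguments and loop over the rows of s1: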
fileID= fopen('data.csv', 'w') ;
for k = 1:size(s1,1)
    fprintf(fileID, '%s,%s\n', s1(k,1), s1(k,2));
end
fclose(fileID)
Or, if you're using R2019a or later, you can use writematrix:
writematrix(s1, 'data.csv');

My version of MATLAB (R2016a) does not have the string type available yet, but your problem is one I was having regularly with cell arrays of character vectors. The trick I was using to avoid using a loop for fprintf should be applicable to you.
Let's start with sample data as close to yours as possible:
s1 = {'2F5E8693E','al1 1aj_25';
'3F5E8693E','al1 1aj_50';
'3F5E8693E','al1 1aj_50';}
Then this code usually executed much faster for me than having to loop on the matrix for writing to file:
% step 1: transpose, so that MATLAB's column-major order walks the data row by row
s = s1.' ;
% step 2: write everything into one long character array
so = sprintf('%s,%s\n', s{:} ) ;
% step 3: write to file in one go (no loop needed)
fid = fopen('data.csv', 'w') ;
fprintf(fid,'%c',so) ;
fclose(fid) ;
The only step that might be slightly different for you is step 2. I don't know if this syntax will work as well on a matrix of strings as on a cell array of character vectors, but I'm sure there is a way to get the same result: a single long vector of characters. Once you get that, fprintf will be uber fast to write it to file.
Note: If the amount of data is too large and memory is limited, you might not be able to generate the long char vector. In my experience, it was still faster to use this method in chunks (each of which fits in memory) rather than looping over each line of the matrix.

Related

How do I add two integers together from a csv file in c code?

I'll make this short. I really want to learn from the answer. I am not here to make anyone code for me, so you can choose whether to teach me how to solve this problem or simply write the whole code.
I am trying to make a script which can read a CSV with this pattern:
DATE,TEXT,EXPENSE or INCOME,BALANCE,STATUS,DENIED.
Example: 11.01.2011,Grocery shop, -200, 700, Done, No.
I want the first output to be the sum of all the expenses and income, and the second output to be the balance. I would like this info to be stored somewhere in the CSV file, so that it's there when I open the file with Excel.
If you are able to explain what each line does, that would be great, so I can learn as a coding noob, but if not, that's all good.
If this question has already been answered, I am sorry. I have tried for a couple of hours to find an answer, but I have only found code that I don't know how to modify to do what I want.
For any problem like this, I prefer to do it as a loop, structured in several parts:
1. read lines of text
2. when there are no more lines, we're done
3. for each line, break it into a number of fields
4. finally, do something interesting with the fields
The basic C function for step 1 is the fgets function. It's customary to bundle step 1 and step 2 together in the header of a while loop:
char linebuf[100];
while(fgets(linebuf, sizeof(linebuf), infp) != NULL) {
    /* step 3: break linebuf into fields */
    /* step 4: do something interesting with the fields */
}
Now, in the body of the loop, we have a line of text linebuf we've just read, and it's time to break it up into fields, in this case by searching for comma characters as delimiters. One way to do this sort of thing is using the library function strtok. Another is described in this chapter of some programming notes.
For files with columns of data, I like to store the broken-out fields in an array of pointers:
char *fields[MAXCOLS];
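A minimal sketch of filling such an array with strtok might look like the following. The helper name split_fields and the value chosen for MAXCOLS are assumptions for illustration, not part of the original answer. Note that strtok modifies the buffer in place, skips empty fields, and knows nothing about quoted commas, so this only suits simple files like the one in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXCOLS 16  /* assumed upper bound on columns per line */

/* Hypothetical helper: split `line` in place on commas and store pointers
 * to at most max_fields fields. Returns the number of fields stored.
 * Leading spaces inside a field (as in ", -200") are kept as-is. */
size_t split_fields(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && fields != NULL);

    /* drop the trailing newline that fgets leaves in the buffer */
    line[strcspn(line, "\r\n")] = '\0';

    for (tok = strtok(line, ","); tok != NULL && n < max_fields;
         tok = strtok(NULL, ",")) {
        fields[n++] = tok;
    }
    return n;
}
```

Inside the while loop, this would be called as split_fields(linebuf, fields, MAXCOLS), and the return value tells you how many entries of fields are valid for that line.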
So then, you can use ordinary string operations to do interesting things with the columns (remembering that arrays in C are 0-based).
For example, to see if column 3 is the word "yes" or something else:
if(strcmp(fields[2], "yes") == 0) {
/* column 3 was "yes" */ ;
} else {
/* column 3 was something else */ ;
}
If column 5 was an amount, convert it to a number so you can do something with it:
double amount, running_total = 0;
/* ... */
amount = atof(fields[4]);
running_total += amount;
(But beware: types float or double are not always good for dealing with monetary amounts, due to roundoff issues.)
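One common way around the roundoff caveat is to keep amounts as a whole number of cents in an integer type. The sketch below is one possible approach, not the answer's own code: parse_cents is a hypothetical helper, and it assumes plain decimal text with an optional sign, at most two decimal digits that matter (further digits are ignored), and no thousands separators or currency symbols.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert text such as " -200" or "12.5" into a
 * whole number of cents. Digits beyond the second decimal are ignored. */
long parse_cents(const char *s)
{
    long whole = 0, frac = 0, cents;
    int negative = 0, digits = 0;

    assert(s != NULL);

    /* skip leading spaces, e.g. the one after each comma in the sample line */
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-' || *s == '+') {
        negative = (*s == '-');
        s++;
    }
    /* integer part */
    while (isdigit((unsigned char)*s))
        whole = whole * 10 + (*s++ - '0');
    /* up to two decimal digits */
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s) && digits < 2) {
            frac = frac * 10 + (*s++ - '0');
            digits++;
        }
    }
    if (digits == 1)
        frac *= 10;  /* "12.5" means 50 cents, not 5 */

    cents = whole * 100 + frac;
    return negative ? -cents : cents;
}
```

With this, the running total becomes a long that is updated with running_total_cents += parse_cents(fields[4]); and it is only divided by 100 when printed.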

Can you preallocate an array of random size?

The essential part of the code in question can be distilled into:
list=rand(1,x); % where x is some arbitrarily large integer
hitlist=[];
for n=1:1:x
if rand(1) < list(n)
hitlist=[hitlist n];
end
end
list(hitlist)=[];
This program is running quite slowly, and I suspect that growing hitlist inside the loop is why; however, I don't know how to fix it. The length of the hitlist will necessarily vary in a random way, so I can't simply preallocate a 'zeros' of the proper size. I contemplated making the hitlist a zeros the length of my list, but then I would have to remove all the superfluous zeros, and I don't know how to do that without having the same problem.
How can I preallocate an array of random size?
I'm unsure about preallocating 'random size', but you can preallocate in large chunks, e.g. 1e3, or however is useful for your use case:
list=rand(1,x); % where x is some arbitrarily large integer
a = 1e3; % Increment of preallocation
hitlist=zeros(1,a);
k=1; % index of the next free slot in hitlist
for n=1:1:x
    if rand(1) < list(n)
        if k > numel(hitlist) % buffer is full
            hitlist = [hitlist zeros(1,a)]; % extend by another chunk
        end
        hitlist(k) = n;
        k=k+1;
    end
end
hitlist = hitlist(1:k-1); % trim excess
% hitlist(k:end) = []; % alternative trim, but might error
list(hitlist)=[];
This won't be the fastest possible, but it is at least a whole lot faster than growing the array on every iteration. Make sure to choose a suitable value for a; you can even base it on the available amount of RAM using memory and trim the excess afterwards, so that you don't need the in-loop extension at all.
As an aside: MATLAB works column-major, so running through matrices that way is faster. I.e. first the first column, then the second and so on. For a 1D array this doesn't matter, but for matrices it does. Hence I prefer to use list = rand(x,1), i.e. as column.
For this specific case, don't use this looped approach anyway, but use logical indexing:
list = rand(x,1);
list = list(list<rand(size(list)));

Matlab: concatenation of 'n' vectors with data from 'n' .csv files within a loop. To show in a structure or table

I know there are a lot of posts about concatenation of arrays, but I can't find one that I can use for my case.
I have the following code, that reads a .csv file with mixed data types (using this function), and can store the elements of one .csv file into a Vector of dimensions Nx1.
basepath = 'Unzipped\Portfolio';
files = dir(fullfile(basepath, '*.csv'));
% Pre-allocate data storage
data = cell(size(files));
N=0;
% Import each file using its filename
for k = 1:numel(files)
data{k} = csvimport(fullfile(basepath, files(k).name)); %reads all the data
[a, b]=size(data{k}); %the size of our data matrix
Index{k}=a-1; %Size of one .csv
N=N+Index{k}; %Size of all .csv
%for single .csv
V1s{k}=cell(Index{k}, 1);
V2s{k}=cell(Index{k}, 1);
V3s{k}=cell(Index{k}, 1);
V4s{k}=cell(Index{k}, 1);
%for all .csv
V1=cell(N,1);
V2=cell(N,1);
V3=cell(N,1);
V4=cell(N,1);
end
for i = 1:numel(files)
%Filling out every vector for a single .csv file and showing it up
[pathstr,name,ext] = fileparts(files(i).name);
V4s{i}(1:Index{i},1)=cellstr(name); %this vector contains only the name of the file repeatedly
V1s{i}=data{i}(2:end, 15);
V2s{i}=data{i}(2:end, 14);
V3s{i}=data{i}(2:end, 9);
C{i}=[V4s{i} V1s{i} V2s{i} V3s{i}] ;
Table{i}=array2table(C{i}, 'VariableNames', {'V1s' 'V2s' 'V3s' 'V4s'});
%For all .csv's
%This is where my doubts are and I know this one won't produce me any results.
V4=cat(1,V4s{:});
end
My problem comes when I have to make the concatenation of all the vectors one by one. It would be simple if I had, let's say, 3 of 'em.
I could use something like C=[V1; V2; V3]. But I can't find a way to make them all do that.
I was thinking about using counters and start indexing for every i, something like this:
V1(a:b,1)=V1s{i}
With the values a and b being counters. But I don't know how to write them.
I want a massive vector Nx1 with all of the V{i} going right after the next set of values.
Any help will be appreciated.
If you use the {:} operator, it will expand all of the contents of the cell array as input arguments. So if we call cat with the input data as a cell array, we can do the following to achieve what you want.
V1 = cat(1, V1s{:});
This is effectively the same as doing
V1 = [V1s{1}; V1s{2}; V1s{3}; ...; V1s{n}]
If I understood your question correctly (to read multiple csv files and store them into a N x 1 vector):
C = []; % start with an empty vector
for i = 1:numel(filenames) % filenames: cell array of csv file names (assumed)
    M = csvread(filenames{i}); % read the csv file into a matrix
    V = reshape(M, [numel(M) 1]); % reshape the matrix into an nx1 vector
    C = vertcat(C, V); % concatenate the nx1 vectors into one Nx1 vector
end
There are multiple ways to loop through your csv files. Either loop through an array of filenames, or you can obtain all the .csv files in the directory with:
files = dir('*.csv');
and loop through the csv files like this:
for i=1:length(files)
filename = files(i).name;
...
end
I had to work on an 'alternative' solution. I'm sure there's an easier way, but I'll settle for this one for now.
I tried this approach: indexing with counters. For that, you have to define those counters. Then, for every different V{i}, you can index its values within a certain range.
To make this work, add the following part to the last loop in my code above. The counters must be set to zero before the loop, otherwise this has no effect.
m=m+Index{i};
n=m-Index{i}+1;
V4(n:m,1)=V4s{i};
Remember, you have to place it below the comment lines that say
% For all .csv's
% This is where my doubts are and I know this one won't produce me any results.
And initialize m and n before getting into the loop:
m=0;
n=0;
You can apply this approach for indexing really large vectors or concatenating several of them.

MATLAB string to number error

I have dozens of arrays with different names, and I would like to run some calculations on them in a for loop, array by array. I am stuck on accessing these arrays inside the for loop. Can anybody help me with this problem? The text1 array contains the array names. My "s" struct holds all these arrays, under the same names that are listed in text1.
text1=['s.CustomerArray.DistanceDriven','s.CustomerArray.TimeDriven'];
for i=1:3
parameter=str2num(text1(i));
k=size(parameter,2);
a=100;
y=zeros(a,k);
end
After this part, my other calculations should start, using "parameter".
Regards,
Eren
I think you are doing several things wrong; here are some pointers.
Rather than listing the names manually, consider looping over the fieldnames, which can be obtained automatically.
If you are looping over strings, make sure to use a cell array with {}, rather than a character matrix with [].
If you have a constant, declare it outside the loop rather than inside it. Declaring it inside won't break the code, but it causes redundant evaluations.
If you want to store results obtained inside a loop, make sure to index the result variable with the loop variable.
That being said, here is a guess at what you are trying to do:
f = fieldnames(s.CustomerArray);
y = cell(numel(f),1);
parameter = cell(numel(f),1);
for t = 1:numel(f)
    parameter{t} = s.CustomerArray.(f{t}); % dynamic field access; each field may be an array
    y{t} = zeros(100,size(parameter{t},2));
end

How can I efficiently convert a large decimal array into a binary array in MATLAB?

Here's the code I am using now, where decimal1 is an array of decimal values, and B is the number of bits in binary for each value:
for (i = 0:1:length(decimal1)-1)
out = dec2binvec(decimal1(i+1),B);
for (j = 0:B-1)
bit_stream(B*i+j+1) = out(B-j);
end
end
The code works, but it takes a long time if the length of the decimal array is large. Is there a more efficient way to do this?
nelem = numel(decimal1); % number of values to convert
bitstream = zeros(nelem * B,1);
for i = 1:nelem
    bitstream((i-1)*B+1:i*B) = fliplr(dec2binvec(decimal1(i),B));
end
I think that should be correct and a lot faster (hope so :) ).
edit:
I think your main problem is that you probably don't preallocate the bit_stream matrix.
I tested both codes for speed and I see that yours is faster than mine (not very much tho), if we both preallocate bitstream, even though I (kinda) vectorized my code.
If we DON'T preallocate the bitstream, my code is A LOT faster. That happens because your code reallocates the matrix more often than mine.
So, if you know the B upfront, use your code, else use mine (of course both have to be modified a little bit to determine the length at runtime, which is no problem since dec2binvec can be called without the B parameter).
The function DEC2BINVEC from the Data Acquisition Toolbox is very similar to the built-in function DEC2BIN, so some of the alternatives discussed in this question may be of use to you. Here's one option to try, using the function BITGET:
decimal1 = ...; %# Your array of decimal values
B = ...; %# The number of bits to get for each value
nValues = numel(decimal1); %# Number of values in decimal1
bit_stream = zeros(1,nValues*B); %# Initialize bit stream
for iBit = 1:B %# Loop over the bits
bit_stream(iBit:B:end) = bitget(decimal1,B-iBit+1); %# Get the bit values
end
This should give the same results as your sample code, but should be significantly faster.
