Julia - specify type with readdlm - file

I have a large text file with small numbers which I need to import using Julia.
A toy example is
7
31 16
90 2 53
I found readdlm. When I go
a = readdlm("FileName.txt")
it works but the resulting array is of type Any and the resulting computations are really slow.
I've tried and failed to specify the type as int or specifically Int16.
How do I do that correctly?
Also, if I use readdlm, do I have to close the file?

Specifying a type for your toy example would give you errors, because the rows have different lengths: row 1 has only one value, row 2 has two, and so on. readdlm fills the gaps with empty strings (""), and since the table then mixes numbers and strings, its element type ends up being Any.
If all the data in your text file is clean (every row has the same number of values), you can pass the element type to readdlm:
using DelimitedFiles  # required on Julia 0.7 and later, where readdlm lives in this standard library
int_table = readdlm("FileName2.txt", Int16)
int_table
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
Where FileName2.txt is:
7 0 0
31 16 0
90 2 53
However, if your data has missing values, you will need to convert them to some numeric values or use the DataFrames package to handle them. I'm assuming here that you want a pure integer Array so I fill the values with 0:
any_table = readdlm("FileName.txt")
any_table
3x3 Array{Any,2}:
7 "" ""
31 16 ""
90 2 53
# fill missing values with 0
any_table[any_table .== ""] .= 0
# convert to integer table
clean_array = Array{Int16}(any_table)
clean_array
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
readdlm closes the file for you, so you don't have to worry about that.

Related

Does MATLAB support truly 1d arrays?

It would really help me reason about my MATLAB code if I didn't have to worry about accidental 2d operations. For instance, if I want to do element-wise multiplication of 1d arrays, but one is a row and another is a column, I end up with a 2d result.
>> a = 1:8;
>> a = a(:);
>> a .* cumsum(ones(8))
ans =
1 1 1 1 1 1 1 1
4 4 4 4 4 4 4 4
9 9 9 9 9 9 9 9
16 16 16 16 16 16 16 16
25 25 25 25 25 25 25 25
36 36 36 36 36 36 36 36
49 49 49 49 49 49 49 49
64 64 64 64 64 64 64 64
I'd like to prevent this type of thing, and likely other problems that I can't foresee, by keeping all my arrays 1d wherever I can. But every time I check the size() of a vector, I get at least 2 elements back:
>> size(1:1:6)
ans =
1 6
>> size(linspace(0, 5, 10))
ans =
1 10
I've tried the suggestions at How to create single dimensional array in matlab? and some of the options here (PDF download), and I can't get a "truly" 1d array. How would you deal with this type of issue?
There is no such thing as a 1D array in MATLAB. The documentation says (emphasis mine):
All MATLAB variables are multidimensional arrays, no matter what type of data. A matrix is a two-dimensional array often used for linear algebra.
You may use isvector, isrow and iscolumn to identify vectors, row vectors and column vectors respectively.
@Sardar has already said the last word. Another clue is ndims:
N = ndims(A) returns the number of dimensions in the array A. The
number of dimensions is always greater than or equal to 2. ...
But about your other question:
How would you deal with this type of issue?
There's not much you can do. Debug, find the mistake and fix it. If it's some one-time script, you are done. But if you are writing functions that may be used later, it's better to protect them from accepting arguments with unequal dimensions:
function myFunc(A, B)
    if ndims(A)~=ndims(B) || any(size(A)~=size(B))
        error('Matrix dimensions must agree.');
    end
    % ...
end
Or, if your function really needs them to be vectors:
function myFunc(A, B)
    if ~isvector(A) || ~isvector(B) || any(size(A)~=size(B))
        error('A and B must be vectors with same dimensions.');
    end
    % ...
end
You can also validate different attributes of arguments using validateattributes:
function myFunc(A, B)
    validateattributes(A, {'numeric'},{'vector'}, 'myFunc', 'A')
    validateattributes(B, {'numeric'},{'size', size(A)}, 'myFunc', 'B')
    % ...
end
Edit:
Also, if the function only needs the inputs to be vectors and their orientation does not matter, you can modify them inside the function (thanks to @CrisLuengo for commenting).
function myFunc(A, B)
    if ~isvector(A) || ~isvector(B) || length(A)~=length(B)
        error('A and B must be vectors with the same length.');
    end
    A = A(:);
    B = B(:);
    % ...
end
However, this is not recommended when the output of the function is also a vector with the same size as the inputs. This is because the caller expects the output to be in the same orientation as the inputs, and if this is not the case, problems may arise.
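If the function does return a vector, one way around the orientation problem is to work on columns internally and reshape the result before returning it. The following is only a sketch: the function name elementwiseProduct and the choice to follow the orientation of A (rather than B) are assumptions made for illustration.

```matlab
function C = elementwiseProduct(A, B)
    % Hypothetical example: multiplies two vectors element by element,
    % accepting any mix of row and column vectors.
    if ~isvector(A) || ~isvector(B) || length(A) ~= length(B)
        error('A and B must be vectors with the same length.');
    end
    C = A(:) .* B(:);          % both operands are columns here, so no 2d result
    C = reshape(C, size(A));   % hand the result back in the orientation of A
end
```

Called as elementwiseProduct(1:3, (1:3)'), this returns the row vector [1 4 9] instead of a 3-by-3 matrix, so the caller gets back the shape it passed in as the first argument.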

Multi-Dimensional Arrays Julia

I am new to using Julia and have little experience with the language. I am trying to understand how multi-dimensional arrays work in it and how to access the array at the different dimensions. The documentation confuses me, so maybe someone here can explain it better.
I created an array (m = Array{Int64}(6,3)) and am trying to access the different parts of that array. Clearly I am understanding it wrong so any help in general about Arrays/Multi-Dimensional Arrays would help.
Thanks
Edit I am trying to read a file in that has the contents
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
0
The purpose of the project is to take these fractions (58/129) and round the fractions using farey sequence. The last number in the row is what both numbers need to be below. Currently, I am not looking for help on how to do the problem, just how to create a multidimensional array with all the numbers except the last row (0). My trouble is how to put the numbers into the array after I have created it.
So I want m[0][0] = 58, so on. I'm not sure how syntax works for this and the manual is confusing. Hopefully this is enough information.
Julia's arrays are not lists-of-lists or arrays of pointers. They are a single container, with elements arranged in a rectangular shape. As such, you do not access successive dimensions with repeated indexing calls like m[j][i]. Instead, you use one indexing call with multiple indices: m[i, j]. Indices also start at 1 rather than 0, so the first element is m[1, 1], not m[0][0].
If you trim off that last 0 in your file, you can just use the built-in readdlm to load that file into a matrix. I've copied those first six rows into my clipboard to make it a bit easier to follow here:
julia> str = clipboard()
"58 129 10\n58 129 7\n25 56 10\n24 125 25\n24 125 15\n13 41 10"
julia> readdlm(IOBuffer(str), Int) # or readdlm("path/to/trimmed/file", Int)
6×3 Array{Int64,2}:
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
That's not very helpful in teaching you how Julia's arrays work, though. Constructing an array like m = Array{Int64}(6,3) (written Array{Int64}(undef, 6, 3) on Julia 0.7 and later) creates an uninitialized matrix with 18 elements arranged in 6 rows and 3 columns. It's a bit easier to see how things work if we fill it with a sensible pattern:
julia> m .= [10,20,30,40,50,60] .+ [1 2 3]
6×3 Array{Int64,2}:
11 12 13
21 22 23
31 32 33
41 42 43
51 52 53
61 62 63
This has set up the values of the array to have the row number in their tens place and the column number in the ones place. Accessing m[r,c] returns the value in m at row r and column c.
julia> m[2,3] # second row, third column
23
Now, r and c don't have to be integers — they can also be vectors of integers to select multiple rows or columns:
julia> m[[2,3,4],[1,2]] # Selects rows 2, 3, and 4 across columns 1 and 2
3×2 Array{Int64,2}:
21 22
31 32
41 42
Of course ranges like 2:4 are just vectors themselves, so you can more easily and efficiently write that example as m[2:4, 1:2]. A : by itself is a shorthand for a vector of all the indices within the dimension it indexes into:
julia> m[1, :] # all columns of the first row
3-element Array{Int64,1}:
11
12
13
julia> m[:, 1] # all rows of the first column
6-element Array{Int64,1}:
11
21
31
41
51
61
Finally, note that Julia's Array is column-major and arranged contiguously in memory. This means that if you just use one index, like m[2], you're just going to walk down that first column. As a special extension, we support what's commonly referred to as "linear indexing", where we allow that single index to span into the higher dimensions. So m[7] accesses the 7th contiguous element, wrapping around into the first row of the second column:
julia> m[5],m[6],m[7],m[8]
(51, 61, 12, 22)

Finding minimum positive value and its position in each column of a matrix

I need to find the minimum positive values in each column and its position inside the column of a certain matrix. So if I have:
A = [1 4
2 3
3 6]
I need to obtain the values 1 and 3, and the positions 1 and 2. Doing this inside a for loop, I correctly obtain the minimum values and their positions, but it also picks up negative values:
for bit = 1:2
[y(bit),x(bit)] = min(A(:,bit));
end
And if I use:
[y(bit),x(bit)] = min(A(A(:,bit)>0));
I don't receive the expected result. What am I doing wrong? Thanks.
This can be easily achieved using inf and min...
New method using inf and no looping
Take some random example:
% Generated using A = randi([-100, 100], 10, 3)
A = [ 31 41 -12
-93 -94 -24
70 -45 53
87 -91 59
36 -81 -63
52 65 -2
49 39 -11
-22 -37 29
31 90 42
-66 -94 51];
Set all non-positive values to positive infinity, which will ensure they are never the minimum value in the column.
A(A<=0) = inf;
% if you want to preserve A, use A2=A; A2(A<=0)=inf;
Now you can just use the min function as expected.
[mins, idx] = min(A);
% mins = 31, 39, 29: as expected
% idx = 1, 7, 8: the indices of the above values in each column as expected.
By default, min returns the column-wise minimum, which is what you want. To specify this explicitly, use min(A,[],1); see the documentation for more details.
Note that you could achieve the same result by using NaN instead of inf, since min ignores NaN values by default.
Your method
As for why you were getting an unexpected result: you weren't selecting the column of A in your loop, so the logical mask always indexed into the first column of A. The second attempt should be corrected to
[y(bit),x(bit)] = min(A(A(:,bit)>0, bit));
However, this will still give an unexpected result! The minimums will be correct, but their indices may be lower than expected. This is because the indices only count the positive values in each column, so you get the position among the positive numbers rather than the row number. The easiest "workaround" is to abandon this method and use the quicker one above, which doesn't require looping.
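If the loop has to stay, the index can be translated back to a row of A by keeping the row numbers of the positive entries. This is a minimal sketch with a made-up matrix; the variable names rows and k are illustrative, and it assumes every column contains at least one positive value.

```matlab
A = [-5  4
      2 -3
      1  6];
nCols = size(A, 2);
y = zeros(1, nCols);   % minimum positive value of each column
x = zeros(1, nCols);   % row of A where that value sits
for bit = 1:nCols
    rows = find(A(:, bit) > 0);        % row numbers of the positive entries
    [y(bit), k] = min(A(rows, bit));   % k counts only the positive entries
    x(bit) = rows(k);                  % translate k back to a row of A
end
% y = [1 4], x = [3 1]
```

Without the rows(k) step, the first column would report position 2 (the second positive value) instead of row 3.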

How to identify breaks within an array of MATLAB?

I have an array in MATLAB containing elements such as
A=[12 13 14 15 30 31 32 33 58 59 60];
How can I identify breaks in values of data? For example, the above data exhibits breaks at elements 15 and 33. The elements are arranged in ascending order and have an increment of one. How can I identify the location of breaks of this pattern in an array? I have achieved this using a for and if statement (code below). Is there a better method to do so?
count=0;
for i=1:numel(A)-1
    if(A(i+1)==A(i)+1)
        continue;
    else
        count=count+1;
        q(count)=i;
    end
end
This is a good time to use diff and find the neighbouring differences that aren't equal to 1. Note that diff returns an array one element shorter than its input, because it computes pairwise differences between consecutive elements. Element k of the result compares A(k+1) with A(k), so add 1 to the locations you find to get the index where each new run starts:
>> A=[12 13 14 15 30 31 32 33 58 59 60];
>> q = find(diff(A) ~= 1) + 1
q =
5 9
This tells us that locations 5 and 9 in your array are where the jumps happen, and that's right for your example data.
However, if you want to find the locations before the jump happens, such as in your code, don't add 1 to the result:
>> q = find(diff(A) ~= 1)
q =
4 8
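The break locations can also be used to cut the array into its consecutive runs. The sketch below assumes A is a non-empty row vector; the names breaks, runStart, runEnd and runs are chosen here for illustration.

```matlab
A = [12 13 14 15 30 31 32 33 58 59 60];
breaks   = find(diff(A) ~= 1);     % last index of every run except the final one
runStart = [1, breaks + 1];        % first index of each run
runEnd   = [breaks, numel(A)];     % last index of each run
% Collect each run in its own cell
runs = arrayfun(@(s, e) A(s:e), runStart, runEnd, 'UniformOutput', false);
% runStart = [1 5 9], runEnd = [4 8 11], runs{2} = [30 31 32 33]
```

Keeping both the start and end indices avoids having to choose between the "before the jump" and "after the jump" conventions.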

Importing text files with comments in MATLAB

Is there any character or character combination that MATLAB interprets as a comment when importing data from text files? That is, something that, when detected at the beginning of a line, tells MATLAB to ignore the whole line?
I have a set of points in a file that look like this:
And as you can see, MATLAB doesn't seem to understand them very well. Is there anything other than // I could use that MATLAB knows to ignore?
Thanks!
Actually, your data is not consistent: you must have the same number of columns on each line.
1)
Apart from that, using '%' as comments will be correctly recognized by importdata:
file.dat
%12 31
12 32
32 22
%abc
13 33
31 33
%ldddd
77 7
66 6
%33 33
12 31
31 23
matlab
data = importdata('file.dat')
2)
Otherwise use textscan to specify arbitrary comment symbols:
file2.dat
//12 31
12 32
32 22
//abc
13 33
31 33
//ldddd
77 7
66 6
//33 33
12 31
31 23
matlab
fid = fopen('file2.dat');
data = textscan(fid, '%f %f', 'CommentStyle','//', 'CollectOutput',true);
data = cell2mat(data);
fclose(fid);
If you use the function textscan, you can set the CommentStyle parameter to // or %. Try something like this:
fid = fopen('myfile.txt');
% textscan reads to the end of the file in a single call, so no loop is needed
myData = textscan(fid, '%f %f', 'CommentStyle', '//');
fclose(fid);
That will work if there are two numbers per line. I notice in your examples the number of numbers per line varies. There are some lines with only one number. Is this representative of your data? You'll have to handle this differently if there isn't a uniform number of columns in each row.
Have you tried %, the default comment character in MATLAB?
As Amro pointed out, if you use importdata this will work.
