Importing text files with comments in MATLAB

Is there any character or character combination that MATLAB interprets as a comment when importing data from text files? That is, when MATLAB detects it at the beginning of a line, does it know to ignore the whole line?
I have a set of points in a file that look like this:
And as you can see, MATLAB doesn't seem to understand them very well. Is there anything other than // that I could use so MATLAB knows to ignore the line?
Thanks!

Actually, your data is not consistent: each line must have the same number of columns.
1)
Apart from that, using '%' as comments will be correctly recognized by importdata:
file.dat
%12 31
12 32
32 22
%abc
13 33
31 33
%ldddd
77 7
66 6
%33 33
12 31
31 23
matlab
data = importdata('file.dat')
2)
Otherwise use textscan to specify arbitrary comment symbols:
file2.dat
//12 31
12 32
32 22
//abc
13 33
31 33
//ldddd
77 7
66 6
//33 33
12 31
31 23
matlab
fid = fopen('file2.dat');
data = textscan(fid, '%f %f', 'CommentStyle','//', 'CollectOutput',true);
data = cell2mat(data);
fclose(fid);
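To make the behaviour of CommentStyle concrete, here is a minimal Python sketch of the same idea: skip every line that begins with the comment marker and parse the remaining lines as rows of numbers. This is not MATLAB code and not how textscan is implemented; the function name read_numeric_rows and the inline sample text are assumptions made for illustration.

```python
def read_numeric_rows(text, comment="//"):
    """Parse whitespace-separated numbers, ignoring comment and blank lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that begin with the comment marker.
        if not stripped or stripped.startswith(comment):
            continue
        rows.append([float(token) for token in stripped.split()])
    return rows


# Hypothetical sample mirroring the first lines of file2.dat above.
sample = "//12 31\n12 32\n32 22\n//abc\n13 33\n"
rows = read_numeric_rows(sample)
print(rows)
```

Passing comment="%" would give the behaviour of the importdata example instead.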

If you use the function textscan, you can set the CommentStyle parameter to // or %. Try something like this:
fid = fopen('myfile.txt');
% A single textscan call reads every matching row, so no loop is needed.
myData = textscan(fid, '%f %f', 'CommentStyle', '//', 'CollectOutput', true);
fclose(fid);
myData = myData{1};  % N-by-2 numeric matrix
That will work if there are two numbers per line. I notice in your examples the number of numbers per line varies. There are some lines with only one number. Is this representative of your data? You'll have to handle this differently if there isn't a uniform number of columns in each row.

Have you tried %, the default comment character in MATLAB?
As Amro pointed out, if you use importdata this will work.

Multi-Dimensional Arrays Julia

I am new to using Julia and have little experience with the language. I am trying to understand how multi-dimensional arrays work in it and how to access the array at the different dimensions. The documentation confuses me, so maybe someone here can explain it better.
I created an array (m = Array{Int64}(6,3)) and am trying to access the different parts of that array. Clearly I am understanding it wrong so any help in general about Arrays/Multi-Dimensional Arrays would help.
Thanks
Edit I am trying to read a file in that has the contents
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
0
The purpose of the project is to take these fractions (58/129) and round them using a Farey sequence. The last number in each row is the bound that both numbers need to stay below. Currently, I am not looking for help on how to do the problem, just how to create a multidimensional array with all the numbers except the last row (0). My trouble is how to put the numbers into the array after I have created it.
So I want m[0][0] = 58, and so on. I'm not sure how the syntax works for this and the manual is confusing. Hopefully this is enough information.
Julia's arrays are not lists-of-lists or arrays of pointers. They are a single container, with elements arranged in a rectangular shape. As such, you do not access successive dimensions with repeated indexing calls like m[j][i] — instead you use one indexing call with multiple indices: m[i, j].
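If you trim off that last 0 in your file, you can just use the built-in readdlm to load that file into a matrix. (In Julia 1.0 and later, readdlm lives in the DelimitedFiles standard library, so run using DelimitedFiles first.) I've copied those first six rows into my clipboard to make it a bit easier to follow here: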
julia> str = clipboard()
"58 129 10\n58 129 7\n25 56 10\n24 125 25\n24 125 15\n13 41 10"
julia> readdlm(IOBuffer(str), Int) # or readdlm("path/to/trimmed/file", Int)
6×3 Array{Int64,2}:
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
That's not very helpful in teaching you how Julia's arrays work, though. Constructing an array like m = Array{Int64}(6,3) (written Array{Int64}(undef, 6, 3) in Julia 1.0 and later) creates an uninitialized matrix with 18 elements arranged in 6 rows and 3 columns. It's a bit easier to see how things work if we fill it with a sensible pattern:
julia> m .= [10,20,30,40,50,60] .+ [1 2 3]
6×3 Array{Int64,2}:
11 12 13
21 22 23
31 32 33
41 42 43
51 52 53
61 62 63
This has set up the values of the array to have the row number in their tens place and the column number in the ones place. Accessing m[r,c] returns the value in m at row r and column c.
julia> m[2,3] # second row, third column
23
Now, r and c don't have to be integers — they can also be vectors of integers to select multiple rows or columns:
julia> m[[2,3,4],[1,2]] # Selects rows 2, 3, and 4 across columns 1 and 2
3×2 Array{Int64,2}:
21 22
31 32
41 42
Of course ranges like 2:4 are just vectors themselves, so you can more easily and efficiently write that example as m[2:4, 1:2]. A : by itself is a shorthand for a vector of all the indices within the dimension it indexes into:
julia> m[1, :] # the first row of all columns
3-element Array{Int64,1}:
11
12
13
julia> m[:, 1] # all rows of the first column
6-element Array{Int64,1}:
11
21
31
41
51
61
Finally, note that Julia's Array is column-major and arranged contiguously in memory. This means that if you just use one index, like m[2], you're just going to walk down that first column. As a special extension, we support what's commonly referred to as "linear indexing", where we allow that single index to span into the higher dimensions. So m[7] accesses the 7th contiguous element, wrapping around into the first row of the second column:
julia> m[5],m[6],m[7],m[8]
(51, 61, 12, 22)
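The arithmetic behind linear indexing can be sketched in a few lines. The example below is Python rather than Julia, and the helper name linear_to_subscript is an assumption for illustration; it maps a 1-based linear index to the 1-based (row, column) pair it refers to in a column-major matrix with a given number of rows.

```python
def linear_to_subscript(k, nrows):
    """Map a 1-based linear index to a 1-based (row, column) pair, column-major."""
    # Column-major: the row index varies fastest, so the quotient is the
    # zero-based column and the remainder is the zero-based row.
    col, row = divmod(k - 1, nrows)
    return row + 1, col + 1


nrows = 6  # the 6x3 matrix m from the example above
subscripts = [linear_to_subscript(k, nrows) for k in (5, 6, 7, 8)]
print(subscripts)

# With the "row in the tens place, column in the ones place" pattern,
# each subscript pair reproduces the value stored at that position.
values = [10 * r + c for r, c in subscripts]
print(values)
```

The computed values agree with the tuple Julia printed for m[5], m[6], m[7], m[8].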

Julia - specify type with readdlm

I have a large text file with small numbers which I need to import using Julia.
A toy example is
7
31 16
90 2 53
I found readdlm. When I go
a = readdlm("FileName.txt")
it works but the resulting array is of type Any and the resulting computations are really slow.
I've tried and failed to specify the type as int or specifically Int16.
How do I do that correctly?
Also, if I use readdlm, do I have to close the file?
Your toy example would give you errors if you specified a type, because it contains missing values: row 1 has only one value while row 2 has two, and so on. Julia handles these missing values as strings, so the element type of your table ends up being Any, as readdlm can't figure out whether the values are numeric or character data.
In case all your data is nice and clean in the text file, you can set the Type of the table in readdlm:
int_table = readdlm("FileName2.txt", Int16)
int_table
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
Where FileName2.txt is:
7 0 0
31 16 0
90 2 53
However, if your data has missing values, you will need to convert them to some numeric values or use the DataFrames package to handle them. I'm assuming here that you want a pure integer Array so I fill the values with 0:
any_table = readdlm("FileName.txt")
any_table
3x3 Array{Any,2}:
7 "" ""
31 16 ""
90 2 53
# fill missing values with 0
any_table[any_table .== ""] .= 0
# convert to integer table
clean_array = Array{Int16}(any_table)
clean_array
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
readdlm closes the file for you, so you don't have to worry about that.
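The zero-filling step is independent of Julia, so here is a minimal Python sketch of the same idea: read ragged rows of integers and pad the short ones with a fill value until the table is rectangular. The function name pad_rows and the inline sample string are assumptions for illustration, not part of readdlm.

```python
def pad_rows(text, fill=0):
    """Parse ragged rows of integers and pad short rows with `fill`."""
    rows = [[int(token) for token in line.split()]
            for line in text.splitlines() if line.strip()]
    # The widest row decides how many columns the table has.
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


# Same contents as the toy FileName.txt in the question.
table = pad_rows("7\n31 16\n90 2 53\n")
print(table)
```

Padding with 0 is only appropriate when 0 cannot be mistaken for real data; otherwise a sentinel value outside the data range is a safer choice.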

How to identify breaks within an array of MATLAB?

I have an array in MATLAB containing elements such as
A=[12 13 14 15 30 31 32 33 58 59 60];
How can I identify breaks in values of data? For example, the above data exhibits breaks at elements 15 and 33. The elements are arranged in ascending order and have an increment of one. How can I identify the location of breaks of this pattern in an array? I have achieved this using a for and if statement (code below). Is there a better method to do so?
count=0;
for i=1:numel(A)-1
    if(A(i+1)==A(i)+1)
        continue;
    else
        count=count+1;
        q(count)=i;
    end
end
This is a good time to use diff and find the neighbouring differences that aren't equal to 1. Note that diff returns an array one element shorter than the input, because element k of the result is A(k+1) - A(k). As such, when you find the locations that aren't equal to 1, add 1 to them to get the index where each new run starts:
>> A=[12 13 14 15 30 31 32 33 58 59 60];
>> q = find(diff(A) ~= 1) + 1
q =
5 9
This tells us that locations 5 and 9 in your array are where the jumps happen, and that's right for your example data.
However, if you want to find the locations before the jump happens, such as in your code, don't add 1 to the result:
>> q = find(diff(A) ~= 1)
q =
4 8
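For comparison, the same pairwise-difference logic can be written out in Python. This is a sketch, not MATLAB code; the function name find_breaks is an assumption, and it deliberately returns 1-based positions so the numbers line up with the MATLAB output above.

```python
def find_breaks(values, step=1):
    """Return 1-based positions k where values[k+1] - values[k] != step."""
    return [i + 1
            for i in range(len(values) - 1)
            if values[i + 1] - values[i] != step]


A = [12, 13, 14, 15, 30, 31, 32, 33, 58, 59, 60]
before_jump = find_breaks(A)                 # last element of each run
after_jump = [k + 1 for k in before_jump]    # first element of the next run
print(before_jump, after_jump)
```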

Slice only working on the last line in a file using 'with open' and 'readline'

What I am trying to accomplish: Pull the last 8 characters from the lines in a file, slice them into two character chunks, compare those chunks with my dictionary, and list the results. This is literally the first thing I have done in python, and my head is spinning with all the answers here.
I think I need basic swimming instruction, and every answer seems to be a primer on free-diving for world records.
I am using the following code (Right now I have the h1 through h4 commented out because it is not returning keys that are in my dictionary):
d1 = {'30': 0, '31': 1, '32': 2, '33': 3, '34': 4, '35': 5, '36': 6, '37': 7, '38': 8, '39': 9,
      '41': 'A', '42': 'B', '43': 'C', '44': 'D', '45': 'E', '46': 'F'}
filename = raw_input("Filename? > ")
with open(filename) as file:
    for line in iter(file.readline, ''):
        h1 = line[-8:-6]
        h2 = line[-6:-4]
        h3 = line[-4:-2]
        h4 = line[-2:]
        #h1 = d1[h1]
        #h2 = d1[h2]
        #h3 = d1[h3]
        #h4 = d1[h4]
        print h1,h2,h3,h4
Here is part of the txt file I am using as input:
naa.60000970000192600748533031453442
naa.60000970000192600748533031453342
naa.60000970000192600748533031453242
naa.60000970000192600748533031453142
naa.60000970000192600748533031434442
naa.60000970000192600748533031434342
naa.60000970000192600748533031434242
naa.60000970000192600748533032363342
When I run my script, here is the output generated by the code above:
14 53 44 2
14 53 34 2
14 53 24 2
14 53 14 2
14 34 44 2
14 34 34 2
14 34 24 2
32 36 33 42
The last line looks exactly as I would expect. All the other lines have been shifted or have dropped characters. I am at a loss for this...I have tried many different ways to open the file in python, but have been unable to get them to loop through, or had other issues.
Is there a simple fix I am just missing here? Thanks, j
I suspect that what's going on is that each line you read has a newline character at the end, except the last one. So the last one is right, but in the others the newline counts as the final character, which shifts every slice by one position. In other words, I think your file lines look something like
>>> open("demo.txt").readline()
'naa.60000970000192600748533031453442\n'
where the \n is the symbol for the newline, and is only one character (it's not \ + n). I might write your code something like
with open(filename) as myfile:
    for line in myfile:
        line = line.strip() # get rid of leading and trailing whitespace
        h1 = line[-8:-6]
        h2 = line[-6:-4]
        h3 = line[-4:-2]
        h4 = line[-2:]
        print h1,h2,h3,h4
which for me produces
Filename? > demo.txt
31 45 34 42
31 45 33 42
31 45 32 42
31 45 31 42
31 43 44 42
31 43 43 42
31 43 42 42
32 36 33 42
We could simplify the h parts, but we'll leave that alone for now. :^)
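One possible way to simplify the h parts is to cut the tail into two-character chunks in a loop and look each chunk up in the dictionary. The sketch below is written for Python 3 (unlike the Python 2 code above), and the names HEX_PAIRS and decode_tail are assumptions for illustration.

```python
# Same mapping as d1 in the question.
HEX_PAIRS = {'30': 0, '31': 1, '32': 2, '33': 3, '34': 4,
             '35': 5, '36': 6, '37': 7, '38': 8, '39': 9,
             '41': 'A', '42': 'B', '43': 'C', '44': 'D', '45': 'E', '46': 'F'}


def decode_tail(line, width=8, size=2):
    """Split the last `width` characters into `size`-character chunks and decode them."""
    tail = line.strip()[-width:]  # strip() removes the trailing newline first
    chunks = [tail[i:i + size] for i in range(0, len(tail), size)]
    # Raises KeyError if a chunk is not a key of HEX_PAIRS.
    return [HEX_PAIRS[chunk] for chunk in chunks]


decoded = decode_tail('naa.60000970000192600748533031453442\n')
print(decoded)
```

Letting the lookup raise KeyError makes unexpected input obvious; HEX_PAIRS.get(chunk) could be used instead if unknown chunks should be tolerated.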

using array names in tcl to get the indices matching regexp

Following array is set in tcl
db(PR,) =
db(PR,132754) = 5 6 7 8 9 10 11 12 13 14 31 32 33 34 35 36 37 38 39 40
db(PR,144917) = 2 3 28 29
db(PR,83055) = 4 30
I want all the array indices except db(PR,), since it has nothing after the comma.
I tried:
array names db -regexp PR,\d+
but it gives no output, and
array names db -regexp PR,*
PR,144917 PR,132754 PR, PR,83055
returns the unwanted PR, index.
So how can I eliminate that array index from the array names output?
What about
array names db -regexp PR,.+
?
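Alternatively, if there are always digits after the comma (except for db(PR,)), you should escape the backslash so that Tcl passes \d through to the regular expression instead of substituting it:
array names db -regexp PR,\\d+
or use a bracket expression, escaping the opening bracket so that Tcl does not treat it as command substitution:
array names db -regexp PR,\[0-9]+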
If the criterion is simply "must be something after the comma", it can be as simple as
array names db -regexp ,.
array names db -glob *,?* ;# alternative
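To see why these patterns exclude the bare PR, key, the same filtering can be reproduced in Python. This is only a sketch: the key list is copied from the question, and re.search is used because it matches anywhere in the string, which is how an unanchored regular expression behaves.

```python
import re

# Array keys from the question, written as plain strings.
names = ["PR,", "PR,132754", "PR,144917", "PR,83055"]

# A raw string keeps the backslash in \d intact for the regex engine.
digits_after_comma = [n for n in names if re.search(r"PR,\d+", n)]

# "Something after the comma": a comma followed by any one character.
anything_after_comma = [n for n in names if re.search(r",.", n)]

print(digits_after_comma)
print(anything_after_comma)
```

Both patterns require at least one character after the comma, so the bare PR, key cannot match.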
