python: The comparison of numbers stored as elements in an array gives incorrect result

I am brand new to Python and am trying to write a script that parses a simple tab-delimited file into a list, iterates over that list, and compares values in it.
The input file looks like this:
1 11 12
9 46 200
471 56 30
And the code is:
with open("my_file.txt", "r") as my_file:
    str = my_file.read()
clnstr = str.replace('\n', '\t')
content_list = clnstr.split("\t")
for i in range(0, len(content_list)-3, 3 ):
    if (content_list[i+1] >= content_list[i]):
        print(content_list[i], "is smaller than", content_list[i+1])
    else :
        print(content_list[i], "is bigger than", content_list[i+1])
However, the output of these comparisons is wrong for some of the values:
1 is smaller than 11
9 is bigger than 46
471 is smaller than 56
I think it's because it's only comparing the first digits of the numbers stored in the array. If so, how do I fix this in the above code?
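For what it's worth, the elements that come out of split are strings, and Python compares strings character by character, so "9" sorts after "46" because "9" is greater than "4". Converting the fields with int() before comparing should give numeric results. Below is a minimal sketch of that idea, assuming every field is a whole number; the helper name compare_first_two and the inlined sample text are made up for illustration and stand in for reading my_file.txt.

```python
def compare_first_two(text):
    """Compare the first two tab-separated columns of each row as integers."""
    messages = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or short lines
        # Convert to int so the comparison is numeric, not character by character
        first, second = int(fields[0]), int(fields[1])
        if second >= first:
            messages.append(f"{first} is smaller than {second}")
        else:
            messages.append(f"{first} is bigger than {second}")
    return messages


# Same three rows as the input file above (hypothetical inline copy)
sample = "1\t11\t12\n9\t46\t200\n471\t56\t30\n"
for message in compare_first_two(sample):
    print(message)
```

Processing the file line by line also avoids the index arithmetic on a flattened list, though converting content_list[i] and content_list[i+1] with int() inside the original loop would work as well.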

Related

trouble with results of .split(" ") in Ruby

I am just starting to learn Ruby and I am having trouble splitting my strings by spaces.
First I read in my file and break it up by the newline character:
inputfile = File.open("myfile.in")
filelines = inputfile.read.split("\n")
Then I try to read each of the two numbers individually:
filelines.each_with_index {|val, index| do_something(val, index)}
Where do_something is defined as:
def do_something(value, index)
  if index == 0
    numcases = value
    puts numcases
  else
    value.split(" ")
    puts value
    puts value[0] #trying to access the first number
    puts value[1] #trying to access the second number
  end
end
But with a small input file like this one,
42
4 2
11 19
0 10
10 0
-10 0
0 -10
-76 -100
5 863
987 850
My output ends up looking like this:
42
4 2
4
11 19
1
1
0 10
0
10 0
1
0
-10 0
-
1
0 -10
0
-76 -100
-
7
5 863
5
987 850
9
8
So, as I understand it, the string is being broken up character by character rather than by spaces. I know it can read in the whole line, since I can print the contents of the array in its entirety, but I don't know what I am doing wrong.
I have also tried replacing value.split(" ") with:
value.gsub(/\s+/m, ' ').strip.split(" ")
value.split
value.split("\s")
Using RubyMine 2017.3.2
As was said in the comments, plus some other points, with an idiomatic code sample:
lines = File.readlines('myfile.in')
header_line, data_lines = lines[0], lines[1..-1]
num_cases = header_line.to_i
arrays_of_number_strings = data_lines.map(&:split)
arrays_of_numbers = arrays_of_number_strings.map do |array_of_number_strings|
  array_of_number_strings.map(&:to_i)
end
puts "#{num_cases} cases in file."
arrays_of_numbers.each { |a| p a }
File.readlines is super handy!
I don't think you were calling to_i on the header information; that will be important.
The data_lines.map(&:split) will return an array of the numbers as strings, but then you'll need to convert those strings to numbers too.
The p a in the final line will use the Array#inspect method, which is handy for viewing arrays as arrays, e.g. [12, 34].
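As a side note on the original symptom: split returns a new array and leaves value untouched, so value[0] and value[1] were indexing characters of the string. For readers who know Python better than Ruby, the same steps (read the lines, treat the first as a count, split the rest on whitespace, convert to integers) might look roughly like this. The function name parse_cases and the inline sample are illustrative only and not from the original post.

```python
def parse_cases(text):
    """Return (num_cases, rows): the header as an int, each data line as a list of ints."""
    lines = text.splitlines()
    num_cases = int(lines[0])  # header line holds the case count
    rows = [
        [int(token) for token in line.split()]  # split() with no argument splits on any whitespace
        for line in lines[1:]
        if line.strip()  # ignore blank lines
    ]
    return num_cases, rows


# A shortened, hypothetical copy of the input file from the question
sample = "42\n4 2\n11 19\n-76 -100\n"
num_cases, rows = parse_cases(sample)
print(f"{num_cases} cases in file.")
for row in rows:
    print(row)
```

Note that the result of split is kept and used here, rather than being discarded.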

How can I add only non zero content from text file?

I have the following type of data in a text file.
15 1
23 0
39 -1
71 -1
79 1
95 1
127 -2
151 2
183 1
191 -1
239 0
247 3
I want to create a 2D list from the text file as follows. I am able to do that with the code given below, with the following result:
[[15, 1.36896146582243],
[23, 0.000000000000000],
[39, 0.848993860380692],
[71, 0.629227476540724],
[79, 0.596517662620081],
[95, 0.543970127117099],
[127, 1.88189324753006],
[151, 1.72587115688942],
[183, 0.391932527534896],
[191, 0.383636720228727]]
However, I do not want all the entries; I want only those with nonzero values in the second column of my source text file. For example, I do not want the entries
23 0
239 0
How can I add such a conditional statement to my code?
with open("path.text") as file:
    R = [[int(x) for x in line.split()] for line in file]
There is no need to shoehorn it into a single list comprehension expression - it won't be faster in your case:
result = []
with open("path.text", "r") as f:
    for line in f:
        line = line.split()
        if len(line) < 2:  # just to make sure we have both columns
            continue
        column_2 = float(line[1])  # since, it appears, you have floats
        if column_2:
            result.append([int(line[0]), column_2])  # turn column 1 to int, too
UPDATE - Benchmark time - if you define your functions to closely match each other (so no floats handling or validation as above):
def f1():
    with open("path.text", "r") as f:
        return [[int(x) for x in line.split()] for line in f if '0' not in line.split()]

def f2():
    result = []
    with open("path.text", "r") as f:
        for line in f:
            line = line.split()
            if line[1] != '0':
                result.append([int(line[0]), int(line[1])])
    return result
Where path.text contains the same data as in the OP and assert f1() == f2() passes, here are some results on my system:
Python 3.5.1, 64-bit
f1() - 100,000 loops: 10.834s
f2() - 100,000 loops: 9.9601s
Python 2.7.11 64-bit
f1() - 100,000 loops: 6.9243s
f2() - 100,000 loops: 6.4012s
Most of this time is actually I/O; the difference in processing alone is far greater in relative terms.
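To see how much of that time is parsing rather than file access, one option is to time the two approaches on lines already held in memory. The following is only a rough sketch: the names with_comprehension, with_loop and DATA are made up here, the data is copied from the question, and the timings it prints will vary from machine to machine.

```python
import timeit

# The rows from the question, held in memory so that no file I/O is timed
DATA = [
    "15 1", "23 0", "39 -1", "71 -1", "79 1", "95 1",
    "127 -2", "151 2", "183 1", "191 -1", "239 0", "247 3",
]


def with_comprehension(lines):
    # Mirrors f1: splits each line twice (once for the test, once for the values)
    return [[int(x) for x in line.split()] for line in lines if '0' not in line.split()]


def with_loop(lines):
    # Mirrors f2: splits each line once and tests the second column
    result = []
    for line in lines:
        cols = line.split()
        if cols[1] != '0':
            result.append([int(cols[0]), int(cols[1])])
    return result


# Both approaches should agree before their timings are compared
assert with_comprehension(DATA) == with_loop(DATA)

for fn in (with_comprehension, with_loop):
    seconds = timeit.timeit(lambda: fn(DATA), number=10000)
    print(f"{fn.__name__}: {seconds:.4f}s for 10,000 loops")
```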
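A pythonic solution would be to add an if clause to the list comprehension that tests the second column of the split line (line[1] on its own would be the second character of the raw line, not the second column):
with open("path.text") as file:
    R = [[int(x) for x in line.split()] for line in file if line.split()[1] != '0']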
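If the comprehension form is preferred, one possible variation splits each line only once by feeding the comprehension from a generator, and compares the second column as a number so that values such as "00" or "-0" are also treated as zero. This is a sketch under the assumption that both columns are integers; the helper name nonzero_rows and the in-memory sample are hypothetical.

```python
def nonzero_rows(lines):
    """Keep [col1, col2] for every line whose second column is a nonzero integer."""
    split_lines = (line.split() for line in lines)  # split each line exactly once
    return [
        [int(cols[0]), int(cols[1])]
        for cols in split_lines
        if len(cols) >= 2 and int(cols[1]) != 0  # skip short lines and zero entries
    ]


# A few rows from the question; an open file object could be passed instead
sample = ["15 1\n", "23 0\n", "39 -1\n", "239 0\n", "247 3\n"]
print(nonzero_rows(sample))
```

Because the function accepts any iterable of lines, it can be called as nonzero_rows(file) inside the with block.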

How is an array sliced?

I have some sample code where the array is sliced as follows:
A = X(:,2:300)
What does this mean about the slice of the array?
: stands for 'all' when used by itself, and 2:300 gives an array of integers from 2 to 300 with a spacing of 1 (the default) in MATLAB. 2:300 is the same as 2:1:300, and you can use any spacing you wish; for example, 2:37:300 (result: [2 39 76 113 150 187 224 261 298]) generates equally spaced numbers.
Your statement says: select every row of the matrix X and columns 2 to 300, and store the result in A.
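For readers coming from Python, a rough analogue may help, bearing in mind two differences: MATLAB indices start at 1 and its ranges include the end point, while Python indices start at 0 and range and slices exclude the stop value. The small matrix below is a made-up stand-in for X, and a narrower column range is used so the result is easy to read.

```python
# MATLAB 2:37:300 -> start 2, step 37, stop at or before 300.
# Python's range excludes the stop value, hence 301.
cols = list(range(2, 301, 37))
print(cols)

# A tiny stand-in for X with 2 rows and 5 columns (hypothetical data)
X = [
    [10, 11, 12, 13, 14],
    [20, 21, 22, 23, 24],
]

# MATLAB X(:, 2:4) means all rows, columns 2 through 4 (1-based, inclusive).
# In 0-based, end-exclusive Python that is the slice [1:4] of every row.
A = [row[1:4] for row in X]
print(A)
```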

Julia - specify type with readdlm

I have a large text file with small numbers which I need to import using Julia.
A toy example is
7
31 16
90 2 53
I found readdlm. When I run
a = readdlm("FileName.txt")
it works but the resulting array is of type Any and the resulting computations are really slow.
I've tried and failed to specify the type as int or specifically Int16.
How do I do that correctly?
Also, if I use readdlm, do I have to close the file?
Your toy example would give you errors if you specified a type, because it has missing values: row 1 has only one value while row 2 has two, and so on, so readdlm has to pad the short rows. These missing values are read as empty strings, so the element type of your table ends up being Any, as readdlm can't tell whether the values are numeric or character data.
In case all your data is nice and clean in the text file, you can set the Type of the table in readdlm:
using DelimitedFiles  # needed on Julia 0.7 and later, where readdlm moved out of Base
int_table = readdlm("FileName2.txt", Int16)
int_table
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
Where FileName2.txt is:
7 0 0
31 16 0
90 2 53
However, if your data has missing values, you will need to convert them to some numeric values or use the DataFrames package to handle them. I'm assuming here that you want a pure integer Array so I fill the values with 0:
any_table = readdlm("FileName.txt")
any_table
3x3 Array{Any,2}:
7 "" ""
31 16 ""
90 2 53
# fill missing values with 0
any_table[any_table .== ""] .= 0
# convert to integer table
clean_array = Array{Int16}(any_table)
clean_array
3x3 Array{Int16,2}:
7 0 0
31 16 0
90 2 53
readdlm closes the file for you, so you don't have to worry about that.

How to identify breaks within an array of MATLAB?

I have an array in MATLAB containing elements such as
A=[12 13 14 15 30 31 32 33 58 59 60];
How can I identify breaks in values of data? For example, the above data exhibits breaks at elements 15 and 33. The elements are arranged in ascending order and have an increment of one. How can I identify the location of breaks of this pattern in an array? I have achieved this using a for and if statement (code below). Is there a better method to do so?
count=0;
for i=1:numel(A)-1
    if(A(i+1)==A(i)+1)
        continue;
    else
        count=count+1;
        q(count)=i;
    end
end
This is a good time to use diff and find the neighbouring differences that aren't equal to 1. Note that diff returns an array one element shorter than the input, because it computes pairwise differences between consecutive elements. Element k of the result is A(k+1) - A(k), so to get the location just after each jump, add 1 to the locations found:
>> A=[12 13 14 15 30 31 32 33 58 59 60];
>> q = find(diff(A) ~= 1) + 1
q =
5 9
This tells us that locations 5 and 9 in your array are where the jumps happen, which is right for your example data.
However, if you want the locations just before each jump, as in your code, don't add 1 to the result:
>> q = find(diff(A) ~= 1)
q =
4 8
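The same idea can be sketched in plain Python without any array library, which may make the off-by-one reasoning easier to follow. The function name break_positions is made up for this example, and it reports 1-based positions so the numbers line up with the MATLAB output above.

```python
def break_positions(values):
    """Return 1-based positions k where values[k+1] is not values[k] + 1."""
    return [
        i + 1  # convert Python's 0-based index to a MATLAB-style position
        for i in range(len(values) - 1)
        if values[i + 1] - values[i] != 1  # same test as diff(A) ~= 1
    ]


A = [12, 13, 14, 15, 30, 31, 32, 33, 58, 59, 60]
before = break_positions(A)        # last element before each jump
after = [p + 1 for p in before]    # first element after each jump
print(before)
print(after)
```

Here before corresponds to find(diff(A) ~= 1) and after to the version with 1 added.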
