How can I add only non zero content from text file? - arrays

I have the following type of data in a text file.
15 1
23 0
39 -1
71 -1
79 1
95 1
127 -2
151 2
183 1
191 -1
239 0
247 3
I want to create a 2D list from the text file as follows. I am able to do that with the code given below, with the following result:
[[15, 1.36896146582243],
[23, 0.000000000000000],
[39, 0.848993860380692],
[71, 0.629227476540724],
[79, 0.596517662620081],
[95, 0.543970127117099],
[127, 1.88189324753006],
[151, 1.72587115688942],
[183, 0.391932527534896],
[191, 0.383636720228727]]
However, I do not want all the entries; I want only those with a non-zero entry in the second column of my source text file. For example, I do not want the entries
23 0
239 0
How can I add the conditional statement to my code?
with open("path.text") as file:
    R = [[int(x) for x in line.split()] for line in file]

There is no need to shoehorn it into a single list comprehension expression - it won't be faster in your case:
result = []
with open("path.text", "r") as f:
    for line in f:
        line = line.split()
        if len(line) < 2:  # just to make sure we have both columns
            continue
        column_2 = float(line[1])  # since, it appears, you have floats
        if column_2:
            result.append([int(line[0]), column_2])  # turn column 1 to int, too
UPDATE - Benchmark time - if you define your functions to closely match each other (so no floats handling or validation as above):
def f1():
    with open("path.text", "r") as f:
        return [[int(x) for x in line.split()] for line in f if '0' not in line.split()]

def f2():
    result = []
    with open("path.text", "r") as f:
        for line in f:
            line = line.split()
            if line[1] != '0':
                result.append([int(line[0]), int(line[1])])
    return result
Where path.text contains the same data as in the OP and assert f1() == f2() passes, here are some results on my system:
Python 3.5.1, 64-bit
f1() - 100,000 loops: 10.834s
f2() - 100,000 loops: 9.9601s
Python 2.7.11 64-bit
f1() - 100,000 loops: 6.9243s
f2() - 100,000 loops: 6.4012s
Most of this time is actually I/O, so the difference in the processing itself is far greater in relative terms than these totals suggest.
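For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness using the standard timeit module. It is not the harness used for the numbers above; the temporary file, the path parameter on f1 and f2, and the loop count of 1000 are assumptions made to keep the example self-contained.

```python
import os
import tempfile
import timeit

# Same data as in the question, written to a temporary file (an assumption,
# so the example does not depend on an existing path.text).
SAMPLE = "15 1\n23 0\n39 -1\n71 -1\n79 1\n95 1\n127 -2\n151 2\n183 1\n191 -1\n239 0\n247 3\n"


def make_sample_file():
    fd, path = tempfile.mkstemp(suffix=".text")
    with os.fdopen(fd, "w") as f:
        f.write(SAMPLE)
    return path


def f1(path):
    # List comprehension: skip any line that has a '0' field.
    with open(path) as f:
        return [[int(x) for x in line.split()] for line in f if '0' not in line.split()]


def f2(path):
    # Plain loop: skip lines whose second field is '0'.
    result = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields[1] != '0':
                result.append([int(fields[0]), int(fields[1])])
    return result


path = make_sample_file()
rows = f2(path)
assert f1(path) == rows  # both variants must agree before timing them

t1 = timeit.timeit(lambda: f1(path), number=1000)
t2 = timeit.timeit(lambda: f2(path), number=1000)
print("f1: %.4fs, f2: %.4fs" % (t1, t2))

os.remove(path)
```

The absolute numbers will vary from machine to machine; only the ratio between the two is worth reading.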

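A Pythonic solution would be to add an if clause to the list comprehension. The test has to look at the second field of the split line, because line[1] on its own is only the second character of the line:
with open("path.text") as file:
    R = [[int(x) for x in line.split()] for line in file if line.split()[1] != '0']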
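One thing to watch with a filter like this: indexing the raw line gives a single character, not a column, so the condition needs to be applied to the split fields. A small sketch of that, assuming whitespace-separated integer columns; the function name read_nonzero_rows and the in-memory sample are made up for illustration:

```python
import io


def read_nonzero_rows(lines):
    """Return [first, second] pairs for lines whose second column is non-zero."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or short lines
            continue
        key, value = int(fields[0]), int(fields[1])
        if value != 0:
            rows.append([key, value])
    return rows


# The second character of "23 0" is '3', but the second field is '0'.
assert "23 0"[1] == "3"
assert "23 0".split()[1] == "0"

# io.StringIO stands in for the open file; any iterable of lines works.
sample = io.StringIO("15 1\n23 0\n39 -1\n\n239 0\n247 3\n")
rows = read_nonzero_rows(sample)
print(rows)  # [[15, 1], [39, -1], [247, 3]]
```

Comparing the parsed integer against 0 also catches spellings such as "00" or "-0", which a string comparison with '0' would let through.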

Related

python: The comparison of numbers stored as elements in an array gives incorrect result

I am brand new to Python and am trying to write a script that iterates over a list created by parsing a simple tab-delimited file and then compares values in that list.
The input file looks like this:
1 11 12
9 46 200
471 56 30
And the code is:
with open("my_file.txt", "r") as my_file:
    str = my_file.read()
    clnstr = str.replace('\n', '\t')
    content_list = clnstr.split("\t")
    for i in range(0, len(content_list)-3, 3 ):
        if (content_list[i+1] >= content_list[i]):
            print(content_list[i], "is smaller than", content_list[i+1])
        else :
            print(content_list[i], "is bigger than", content_list[i+1])
However the output of these comparisons is wrong for some of the values:
1 is smaller than 11
9 is bigger than 46
471 is smaller than 56
I think it's because it's only comparing the first digits of the numbers stored in the array? If so, how do I fix this in the above code?
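The values read from the file are strings, and Python compares strings character by character, so "46" sorts before "9". Converting each field with int() before comparing gives the numeric result. A minimal sketch, assuming every field is a valid integer; the function name and the in-memory sample are made up for illustration:

```python
import io


def compare_first_two_columns(lines):
    """Compare the first two columns of each line numerically."""
    messages = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # skip blank or short lines
            continue
        first, second = int(fields[0]), int(fields[1])
        # As in the question, equal values fall into the "smaller" branch.
        if second >= first:
            messages.append(f"{first} is smaller than {second}")
        else:
            messages.append(f"{first} is bigger than {second}")
    return messages


# String comparison is lexicographic; integer comparison is numeric.
assert ("46" >= "9") is False
assert (int("46") >= int("9")) is True

sample = io.StringIO("1\t11\t12\n9\t46\t200\n471\t56\t30\n")
messages = compare_first_two_columns(sample)
for message in messages:
    print(message)
```

Iterating over the file line by line also avoids the manual index arithmetic with range().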

Creating a series of 2-dimensional arrays from a text file in Julia

I'm trying to write a Sudoku solver, which is the fun part. The un-fun part is actually loading the puzzles into Julia from a text file. The text file consists of a series of puzzles comprising a label line followed by 9 lines of digits (0s being used to denote blank squares). The following is a simple example of the sort of text file I am using (sudokus.txt):
Easy 7
000009001
008405670
940000032
034061800
070050020
002940360
890000056
061502700
400700000
Medium 95
000300100
800016070
000009634
001070000
760000015
000020300
592400000
030860002
007002000
Hard 143
000003700
305061000
000200004
067002100
400000003
003900580
200008000
000490308
008100000
What I want to do is strip out the label lines and store the 9x9 grids in an array. File input operations are not my specialist subject, and I've tried various methods such as read(), readcsv(), readlines() and readline(). I don't know whether there is any advantage to storing the digits as characters rather than integers, but leading zeros have to be maintained (a problem I have encountered with some input methods and with abortive attempts to use parse()).
I've come up with a solution, but I suspect it's far from optimal:
function main()
    open("Text Files\\sudokus.txt") do file
        grids = Vector{Matrix{Int}}()
        grid = Matrix{Int}(0,9)
        row_no = 0
        for line in eachline(file)
            if !(all(i -> isnumber(i), line))
                continue
            else
                row_no += 1
                squares = split(line, "")
                row = transpose([parse(Int, square) for square in squares])
                grid = vcat(grid, row)
                if row_no == 9
                    push!(grids, grid)
                    grid = Matrix{Int}(0,9)
                    row_no = 0
                end
            end
        end
        return grids
    end
end
@time main()
I initially ran into @code_warntype problems from the closure, but I seem to have solved those by moving my grids, grid and row_no variables from the main() function to the open block.
Can anyone come up with a more efficient way to achieve my objective or improve my code? Is it possible, for example, to load 10 lines at a time from the text file? I am using Julia 0.6, but solutions using 0.7 or 1.0 will also be useful going forward.
I believe your file is well-structured; by that I mean that lines 1, 11, 21, ... contain the difficulty information and the lines between them contain the sudoku rows. Therefore, if we know the number of lines, we know the number of sudokus in the file. The code uses this information to pre-allocate an array of exactly the size needed.
If your file is too big, you can use eachline instead of readlines. readlines reads all the lines of the file into RAM, while eachline creates an iterable that reads lines one by one.
function readsudoku(file_name)
    lines = readlines(file_name)
    sudokus = Array{Int}(undef, 9, 9, div(length(lines),10)) # the last dimension is for each sudoku
    for i in 1:length(lines)
        if i % 10 != 1 # if i % 10 == 1 you have difficulty line
            sudokus[(i - 1) % 10, : , div(i-1, 10) + 1] .= parse.(Int, collect(lines[i])) # collect is used to create an array of `Char`s
        end
    end
    return sudokus
end
This should run on 1.0 and 0.7 but I do not know if it runs on 0.6. Probably, you should remove undef argument in Array allocation to make it run on 0.6.
Similar to Hckr's (faster) approach, my first idea is:
s = readlines("sudoku.txt")
smat = reshape(s, 10,3)
sudokus = Dict{String, Matrix{Int}}()
for k in 1:3
    sudokus[smat[1,k]] = parse.(Int, permutedims(hcat(collect.(Char, smat[2:end, k])...), (2,1)))
end
which produces
julia> sudokus
Dict{String,Array{Int64,2}} with 3 entries:
"Hard 143" => [0 0 … 0 0; 3 0 … 0 0; … ; 0 0 … 0 8; 0 0 … 0 0]
"Medium 95" => [0 0 … 0 0; 8 0 … 7 0; … ; 0 3 … 0 2; 0 0 … 0 0]
"Easy 7" => [0 0 … 0 1; 0 0 … 7 0; … ; 0 6 … 0 0; 4 0 … 0 0]

trouble with results of .split(" ") in Ruby

I am just starting to learn Ruby and I am having trouble splitting my strings by spaces.
First I read in my file and break it up by the newline character:
inputfile = File.open("myfile.in")
filelines = inputfile.read.split("\n")
Then I try to read each of the two numbers individually:
filelines.each_with_index {|val, index| do_something(val, index)}
Where do_something is defined as:
def do_something(value, index)
  if index == 0
    numcases = value
    puts numcases
  else
    value.split(" ")
    puts value
    puts value[0] #trying to access the first number
    puts value[1] #trying to access the second number
  end
end
but with a smaller input file like this one,
42
4 2
11 19
0 10
10 0
-10 0
0 -10
-76 -100
5 863
987 850
My output ends up looking like this:
42
4 2
4
11 19
1
1
0 10
0
10 0
1
0
-10 0
-
1
0 -10
0
-76 -100
-
7
5 863
5
987 850
9
8
So what I understand is that it is breaking the line up character by character, rather than by spaces. I know it can read in the whole line, as I can print the contents of the array in its entirety, but I don't know what I am doing wrong.
I have also tried replacing value.split(" ") with:
value.gsub(/\s+/m, ' ').strip.split(" ")
value.split
value.split("\s")
Using RubyMine 2017.3.2
As was said in the comments, plus some other points, with an idiomatic code sample:
lines = File.readlines('myfile.in')
header_line, data_lines = lines[0], lines[1..-1]
num_cases = header_line.to_i
arrays_of_number_strings = data_lines.map(&:split)
arrays_of_numbers = arrays_of_number_strings.map do |array_of_number_strings|
  array_of_number_strings.map(&:to_i)
end
puts "#{num_cases} cases in file."
arrays_of_numbers.each { |a| p a }
File.readlines is super handy!
I don't think you were calling to_i on the header information; that will be important.
The data_lines.map(&:split) will return an array of the numbers as strings, but then you'll need to convert those strings to numbers too.
The p a in the final line will use the Array#inspect method, which is handy for viewing arrays as arrays, e.g. [12, 34].

Octave read each line of file to vector

I have a file data.txt
1 22 34 -2
3 34 -3
2
3 43 -3 2 3
And I want to load this file into Octave as separate matrices
matrix1 = [1; 22; 34; -2]
matrix2 = [3; 34; -3]
.
.
.
How do I do this? I've tried fopen and fgetl, but it seems as if each character is given its own spot in the matrix. I want to separate the values, not the characters (it's space delimited).
quick and dirty:
A = dlmread("file");
matrix = 1; # just for nice varname generation
for i = 1:size(A,1)
  name = genvarname("matrix",who());
  eval([name " = A(i,:);"]);
  eval(["[_,__," name "] = find(" name ");"]);
end
clear _ __ A matrix i
The format needs to be as you specified.

find and replace values in cell array

I have a cell array like this: [...
0
129
8...2...3...4
6...4
0
I just want to find and replace specific values, but I can't use the ordinary function because the cells are different lengths. I need to replace many specific values at the same time and there is no general function about how values are replaced. However, sometimes several input values should be replaced by the same output.
so I want to say
for values 1:129
'if 0, then 9'
'elseif 1 then 50'
'elseif 2 or 3 or 4 then 61'
etc...up to 129
where these rules are applied to the entire array.
I've tried to work it out myself, but still getting nowhere. Please help!
Since your values appear to span the range 0 to 129, one solution is to add one to these values (so they span the range 1 to 130) and use them as indices into a vector of replacement values. Then you can apply this operation to each cell using the function CELLFUN. For example:
>> C = {0, 129, [8 2 3 4], [6 4], 0};  %# The sample cell array you give above
>> replacement = [9 50 61 61 61 100.*ones(1,125)];  %# A 1-by-130 array of replacement values (I added 125 dummy values)
>> C = cellfun(@(v) {replacement(v+1)},C);  %# Perform the replacement
>> C{:}  %# Display the contents of C
ans =
9
ans =
100
ans =
100 61 61 61
ans =
100 61
ans =
9
