Ruby String Split on "\t" loses "\n" - arrays

Trying to split this tab-delimited data set:
171 1000 21
269 1000 25
389 1000 40
1020 1-03 30 1
1058 1-03 30 1
1074 1-03 30 1
200 300 500
(For clarity, here are the same lines with the tabs and newlines written out:)
171\t1000\t21\t\n
269\t1000\t25\t\n
389\t1000\t40\t\n
1020\t1-03\t30\t1\n
1058\t1-03\t30\t1\n
1074\t1-03\t30\t1\n
200\t300\t\t500\n
a = text.split(/\n/)
a.each do |i|
  u = i.split(/\t/)
  puts u.size
end
==>
3
3
3
4
4
4
4
The \t\n combination seems to shave off the last \t, which I need for further importation. How can I get around this? Cheers
Edited: This is what I was expecting:
4
4
4
4
4
4
4

If this is for production, you should be using the CSV class, as DmitryZ pointed out in the comments. CSV processing has a surprising number of caveats and you should not do it by hand.
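A minimal sketch of the CSV route, assuming Ruby's `csv` library is available and that the data contains no double-quote characters (CSV would treat those as quoting). With `col_sep: "\t"`, empty fields come back as `nil` instead of being dropped, so every row keeps its four columns.

```ruby
require "csv"

# Two of the question's lines, with tabs and newlines written out.
text = "171\t1000\t21\t\n200\t300\t\t500\n"

# col_sep switches the separator from a comma to a tab; empty fields
# (including the trailing one) are returned as nil rather than discarded.
rows = CSV.parse(text, col_sep: "\t")

rows.each { |row| puts row.size } # prints 4 for each row
```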
But let's go through it as an exercise...
The problem is that split does not keep the delimiter, and by default it does not keep trailing null (empty) fields. You've hit both issues.
When you run a = text.split(/\n/) then the elements of a do not have newlines.
a = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
]
Then, as documented in String#split, "if the limit parameter is omitted, trailing null fields are suppressed.", so u = i.split(/\t/) will ignore that last field unless you give it a limit.
If you know it's always going to be 4 fields, you can use 4.
u = i.split(/\t/, 4)
But it's probably more flexible to use -1, because "If [the limit is] negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed." That keeps the empty fields without hard-coding the number of columns in the file.
u = i.split(/\t/, -1)
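A small self-contained check of the difference, using the question's data written out as a single string. It compares the default `split` with `split(..., -1)` line by line.

```ruby
text = "171\t1000\t21\t\n269\t1000\t25\t\n389\t1000\t40\t\n" \
       "1020\t1-03\t30\t1\n1058\t1-03\t30\t1\n1074\t1-03\t30\t1\n" \
       "200\t300\t\t500\n"

# Default limit: the trailing empty field after the last tab is dropped.
default_sizes = text.split("\n").map { |line| line.split("\t").size }

# Negative limit: trailing empty fields are kept.
kept_sizes = text.split("\n").map { |line| line.split("\t", -1).size }

p default_sizes # => [3, 3, 3, 4, 4, 4, 4]
p kept_sizes    # => [4, 4, 4, 4, 4, 4, 4]
```

The empty field in the middle of the last line (`300\t\t500`) survives in both cases. Only trailing empty fields are affected by the limit.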

Related

trouble with results of .split(" ") in Ruby

I am just starting to learn Ruby and I am having trouble splitting my strings by spaces.
First I read in my file and break it up by the newline character:
inputfile = File.open("myfile.in")
filelines = inputfile.read.split("\n")
Then I try to read each of the two numbers individually:
filelines.each_with_index {|val, index| do_something(val, index)}
Where do_something is defined as:
def do_something(value, index)
  if index == 0
    numcases = value
    puts numcases
  else
    value.split(" ")
    puts value
    puts value[0] #trying to access the first number
    puts value[1] #trying to access the second number
  end
end
but with a smaller input file like this one,
42
4 2
11 19
0 10
10 0
-10 0
0 -10
-76 -100
5 863
987 850
My output ends up looking like this:
42
4 2
4
11 19
1
1
0 10
0
10 0
1
0
-10 0
-
1
0 -10
0
-76 -100
-
7
5 863
5
987 850
9
8
So what I understand is that it is breaking the line up character by character rather than by spaces. I know it reads in the whole line, since I can print the contents of the array in its entirety, but I don't know what I am doing wrong.
I have also tried replacing value.split(" ") with:
value.gsub(/\s+/m, ' ').strip.split(" ")
value.split
value.split("\s")
Using RubyMine 2017.3.2
As was said in the comments, plus some other points, with an idiomatic code sample:
lines = File.readlines('myfile.in')
header_line, data_lines = lines[0], lines[1..-1]
num_cases = header_line.to_i
arrays_of_number_strings = data_lines.map(&:split)
arrays_of_numbers = arrays_of_number_strings.map do |array_of_number_strings|
  array_of_number_strings.map(&:to_i)
end
puts "#{num_cases} cases in file."
arrays_of_numbers.each { |a| p a }
File.readlines is super handy!
I don't think you were calling to_i on the header information; that will be important.
The data_lines.map(&:split) will return an array of the numbers as strings, but then you'll need to convert those strings to numbers too.
The p a in the final line will use the Array#inspect method, which is handy for viewing arrays as arrays, e.g. [12, 34].
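In the question's code, `value.split(" ")` returns a new array, but that result is thrown away, and `split` does not change `value` itself. So `value[0]` and `value[1]` still index single characters of the string, which matches the output shown (`"-76 -100"` gives `-` and `7`). Below is a minimal sketch of a fixed `do_something`. It keeps the question's method name, but the body is hypothetical: it stores the split result and converts the fields with `to_i`.

```ruby
# Hypothetical fixed version of the question's do_something.
def do_something(value, index)
  if index.zero?
    num_cases = value.to_i           # header line: number of cases
    puts num_cases
    num_cases
  else
    numbers = value.split.map(&:to_i) # keep the array that split returns
    puts numbers[0]                   # first number
    puts numbers[1]                   # second number
    numbers
  end
end
```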

What input operator can we use in C that ignores 'space' and accepts 'Enter' while taking array inputs

Consider an example:
5
1 0 5
1 1 7
1 0 3
2 1 0
2 1 1
Here, in the first line, 5 denotes the size of the array.
I'm entering five sequences one by one.
I want the first sequence, i.e. 1 0 5, to be stored in arr[0].
Note: 1, 0 and 5 are separated by spaces.
However, arr[0] should contain 105 without any space.
I want to accept the next sequence into arr[1] only after pressing 'Enter'.
So that arr[1] should contain 117, arr[2] should contain 103 and so on up to arr[4].
Is there any operator that I can use for this?
There are no operators that do I/O in C at all, so no.
I also don't think there's any standard function with those semantics; they tend to treat all whitespace as equal.
You should write your own, probably using fgets() to read in whole lines and then extracting the digits to convert to integers.

Merge multiple arrays of unique occurrences

I want to merge multiple arrays of unique occurrences into a single array. To get the arrays in the first place I use this code, where img_series is a slice from a TIFF image imported using imread:
a = unique(img_series);
occu = [a,histc(img_series(:),a)];
I do that multiple times, because the TIFF image I'm using has several hundred images stacked, and my RAM cannot hold them all if I import them at once. So each 'occu' looks something like this (the first number is the unique value, the second is the number of occurrences):
occu1      occu2      .....
 0   1      1   2
12   1     10   1
14   1     12   1
15   1     14   2
..  ..     ..  ..     .....
Now I want to merge them all together, or better merge them in each iteration, when I'm reading another stacked image.
The merged results should be a 2D matrix similar to the one above. The number of occurrences of the same values should be added to one another, as this is the whole point of counting them. So the result of the above example should be this:
occu_total
0 1
1 2
10 1
12 2
14 3
15 1
.. ..
I found the join command, but that one does not seem to work here. I guess I could do it the long way of searching the matching number and add the occurrences together and so on, but there must be a quicker way of doing it.
A = [0 1;12 1; 14 1;15 1];B = [1 2;10 1;12 1;14 2];
tmp = [A;B]; %// stack both tables into a single one
tmp(:,1) = tmp(:,1)+1; %// shift values by 1 so that value 0 becomes index 1
C = accumarray(tmp(:,1),tmp(:,2)); %// sum the counts belonging to each value
D = (1:numel(C)).'; %// column of all indices 1..max
E = [D C];
E((C==0),:)=[]; %// drop values that never occurred
E(:,1) = E(:,1)-1; %// undo the shift
E =
0 1
1 2
10 1
12 2
14 3
15 1
Job for accumarray. It uses the first argument as the key (subscript) and adds together the values belonging to each key. The 1 is added and later subtracted because 0 cannot be an index in MATLAB. Shifting every value up by 1 (assuming there are no negative numbers, and that the values are integers, as pixel values are) turns all of them into positive integer indices. If there are negative values, shift by an offset computed from the minimum instead: offset = 1 - min(tmp(:,1)); tmp(:,1) = tmp(:,1) + offset; and at the end E(:,1) = E(:,1) - offset;
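For comparison, here is the same merge-by-key idea sketched in Ruby (the language used elsewhere on this page), with a Hash standing in for accumarray. `merge_occurrences` is a hypothetical helper. Each table is assumed to be an array of `[value, count]` pairs, like the occu matrices above. Hash keys do not have to be positive integers, so no shift is needed for 0 or negative values.

```ruby
# Sum the counts of equal values across any number of [value, count] tables.
def merge_occurrences(*tables)
  totals = Hash.new(0) # missing values start at a count of 0
  tables.each do |table|
    table.each { |value, count| totals[value] += count }
  end
  totals.sort # [[value, total_count], ...] ordered by value
end

occu1 = [[0, 1], [12, 1], [14, 1], [15, 1]]
occu2 = [[1, 2], [10, 1], [12, 1], [14, 2]]
p merge_occurrences(occu1, occu2)
# => [[0, 1], [1, 2], [10, 1], [12, 2], [14, 3], [15, 1]]
```

To merge while reading each stack, the running total can be passed back in, e.g. `total = merge_occurrences(total, occu)` starting from `total = []`.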

Intertwining 3 arrays in matlab / octave to get correct pattern

I know I can intertwine 2 arrays by
C = [A(:),B(:)].'; %'
D = C(:)
But how can I intertwine 3 arrays in a pendulum pattern that goes back and forth? See the image below, with arrows showing the intertwining path I'm trying to get (each column is an array); the number pattern I'm trying to get is next to it, in one large column. Please note the numerical values are just examples to make it easier to read; they could also be decimals.
I tried the code below but the pattern is incorrect.
A=[1,2,3,4,5]
B=[10,20,30,40,50,60,70,80,90]
C=[100,200,300,400,500]
D = [A(:),B(:),C(:)].'; %'
E = D(:)
I get an error when building D because B is longer than A and C. And when the arrays do have the same length, the result still doesn't follow the pattern I'm trying to get:
1
10
100
2
20
200
3
30
300
4
40
400
5
50
500
error: horizontal dimensions mismatch (5x1 vs 9x1)
The pattern from the 3 arrays I'm trying to get is below.
1
10
100
20
2
30
200
40
3
50
300
60
4
70
400
80
5
90
500
PS: I'm using Octave 3.8.1, which is largely compatible with MATLAB.
Have you tried the following?
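D = zeros(4 * numel(A) - 1, 1); % initialization: A and C fill every 4th slot, B every 2nd
D(1 : 4 : end) = A; % positions 1, 5, 9, ...
D(2 : 2 : end) = B; % positions 2, 4, 6, ... (needs numel(B) == 2*numel(A) - 1)
D(3 : 4 : end) = C; % positions 3, 7, 11, ...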
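The index pattern behind that answer can also be sketched in Ruby. A goes to every 4th position starting at the first, B to every second position, and C to every 4th position starting at the third. `pendulum_interleave` is a hypothetical name. Like the Octave code, it assumes B has 2 * A.size - 1 elements and C has as many as A, as in the question.

```ruby
# Interleave three arrays as a, b, c, b, a, b, c, b, ... (0-based positions).
def pendulum_interleave(a, b, c)
  unless b.size == 2 * a.size - 1 && c.size == a.size
    raise ArgumentError, "expected b.size == 2 * a.size - 1 and c.size == a.size"
  end
  out = Array.new(4 * a.size - 1)
  a.each_with_index { |v, i| out[4 * i] = v }     # positions 0, 4, 8, ...
  b.each_with_index { |v, i| out[2 * i + 1] = v } # positions 1, 3, 5, ...
  c.each_with_index { |v, i| out[4 * i + 2] = v } # positions 2, 6, 10, ...
  out
end

a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50, 60, 70, 80, 90]
c = [100, 200, 300, 400, 500]
p pendulum_interleave(a, b, c)
# => [1, 10, 100, 20, 2, 30, 200, 40, 3, 50, 300, 60, 4, 70, 400, 80, 5, 90, 500]
```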

Get average of two consecutive values in a vector depending on logical vector

I am reading data from a file and trying to do some manipulation on the vector containing the data. Basically, I want to check whether the values come from consecutive lines and, if so, average each pair and put the value in an output vector.
Part of the data and the line numbers:
lines=[153 152 153 154 233 233 234 235 280 279 280 281];
Sail=[ 3 4 3 1.5 3 3 1 2 2.5 5 2.5 2 ];
Here is what I am doing:
Sail=S(lines);
Y=diff(lines)==1;
for ii=1:length(Y)
  if Y(ii)
    output(ceil(ii/2))=(Sail(ii)+Sail(ii+1))/2;
  end
end
Is this correct? Also, is there a way to do it without a for loop?
Thanks
My suggestion:
y = find(diff(lines)==1);
output = mean([Sail(y);Sail(y+1)]);
This assumes that when you have, say, [233 234 235], you want one value averaging the values from lines [233 234] and one averaging those from [234 235]. If you want to do something different when longer runs of consecutive lines exist in your data, the problem becomes more complex.
Incidentally, it's a bad idea to index with ceil(ii/2): you can't guarantee a unique index for each matching value of ii (ii = 1 and ii = 2, for example, both map to index 1, so one result would overwrite the other). If you do want an output the same size as Sail (with zeros where there is no match), you can do something like this:
output2 = zeros(size(Sail));
output2(y)=output;
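For comparison, the same vectorized idea sketched in Ruby with the question's sample data. Indices are 0-based here, so `y` holds one less than MATLAB's `find(diff(lines)==1)`. `output` and `output2` mirror the two MATLAB results above.

```ruby
lines = [153, 152, 153, 154, 233, 233, 234, 235, 280, 279, 280, 281]
sail  = [3, 4, 3, 1.5, 3, 3, 1, 2, 2.5, 5, 2.5, 2]

# Indices i where line i + 1 directly follows line i.
y = (0...lines.size - 1).select { |i| lines[i + 1] - lines[i] == 1 }

# One average per consecutive pair, like mean([Sail(y); Sail(y+1)]).
output = y.map { |i| (sail[i] + sail[i + 1]) / 2.0 }

# Same length as sail, zeros where there is no match.
output2 = Array.new(sail.size, 0.0)
y.zip(output) { |i, avg| output2[i] = avg }

p output # => [3.5, 2.25, 2.0, 1.5, 3.75, 2.25]
```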
