So I'm doing a beginner's challenge on HackerRank, and a strange behavior of Ruby is boggling my mind.
The challenge is to find and count how many pairs (of socks) there are in the array.
Here's my code.
n = 100
ar = %w(50 49 38 49 78 36 25 96 10 67 78 58 98 8 53 1 4 7 29 6 59 93 74 3 67 47 12 85 84 40 81 85 89 70 33 66 6 9 13 67 75 42 24 73 49 28 25 5 86 53 10 44 45 35 47 11 81 10 47 16 49 79 52 89 100 36 6 57 96 18 23 71 11 99 95 12 78 19 16 64 23 77 7 19 11 5 81 43 14 27 11 63 57 62 3 56 50 9 13 45)
def sockMerchant(n, ar)
  counter = 0
  ar.each do |item|
    if ar.count(item) >= 2
      counter += ar.count(item)/2
      ar.delete(item)
    end
  end
  counter
end
print sockMerchant(n, ar)
The problem is that it doesn't count correctly. After running the function, its internal array ar still contains countable pairs, and I can prove it by running it again.
There's more: if you sort the array, it behaves differently.
It doesn't make sense to me.
you can check the behavior on this link
https://repl.it/repls/HuskyFrighteningNaturallanguage
You're deleting items from a collection while iterating over it, so expect bad things to happen. In short, don't do that if you don't want such problems. See:
> arr = [1,2,1]
# => [1, 2, 1]
> arr.each {|x| puts x; arr.delete(x) }
# 1
# => [2]
We never get the 2 in our iteration.
A simple solution, which is a small variation of your code, could look as follows:
def sock_merchant(ar)
  ar.uniq.sum do |item|
    ar.count(item) / 2
  end
end
Which is basically finding all unique socks, and then counting pairs for each of them.
Note that its complexity is O(n^2): for each unique element of the array, you have to go through the whole array to count the elements equal to it.
An alternative: first group all socks, then check how many pairs of each type we have:
ar.group_by(&:itself).sum { |k,v| v.size / 2 }
ar.group_by(&:itself), short for ar.group_by { |x| x.itself }, will loop through the array and create a hash looking like this:
{"50"=>["50", "50"], "49"=>["49", "49", "49", "49"], "38"=>["38"], ...}
Calling sum on that hash iterates over it, summing the number of elements in each group divided by 2.
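For completeness, here is a minimal sketch of the group_by variant wrapped in a method and called on a small sample array (the method name sock_merchant_grouped is just for illustration):

def sock_merchant_grouped(ar)
  # Group equal socks together, then count whole pairs in each group.
  ar.group_by(&:itself).sum { |_sock, group| group.size / 2 }
end

socks = %w(10 20 20 10 10 30 50 10 20)
puts sock_merchant_grouped(socks)  # => 3 (two pairs of "10", one pair of "20")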
The scenario is as follows:
I have a dynamically changing text file which I'm passing to a variable to capture a pattern that occurs throughout the file. It looks something like this:
my @array1;
my $file = `cat <file_name>.txt`;

if ( @array1 = ( $file =~ m/<pattern_match>/g ) ) {
    print "@array1\n";
}
The array looks something like this:
10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
From the above array1 output, the pattern of the array is something like this:
T1 P1 t1(1) t1(2)...t1(25) T2 P2 t2(1) t2(2)...t2(25) so on and so forth
Currently, /g in the regex returns the set of values for a pattern that occurs only twice (only because the txt file contains the pattern that many times). The number of occurrences will change depending on the file that I plan to pass in dynamically.
What I intend to achieve:
The final result should be a csv file that contains these values in the following format:
T1,P1,t1(1),t1(2),...,t1(25)
T2,P2,t2(1),t2(2),...,t2(25)
so on and so forth
For instance: My final CSV file should look like this:
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
The delimiter for this pattern is T1, which is a time in the format \d\d:\d\d:\d\d.
Example: 10:38:49, 10:38:51, etc.
What I have tried so far:
use Data::Dumper;
use List::MoreUtils qw(part);

my $partitions = 2;
my $i = 0;
print Dumper part { $partitions * $i++ / @array1 } @array1;
In this particular case, my $partitions = 2; works because the pattern occurs only twice in the txt file, so I'm splitting the array into two. However, as mentioned earlier, the number of occurrences changes depending on the txt file I use.
The Question:
How can I make this code more generic to achieve my final goal of splitting the array into multiple equal sized arrays without losing the contents of the original array, and then converting these mini-arrays into one single CSV file?
If there is any other workaround for this other than array manipulation, please do let me know.
Thanks in advance.
PS: I considered a Hash of Hashes and an Array of Hashes, but that kind of data structure did not seem to be a healthy solution for the problem I'm facing right now.
As far as I can tell, all you need is splice, which will work fine as long as you know the record size and it is constant.
The data you showed has 52 fields, but the description of it requires 27 fields per record. It looks like each line has T, P, and t1 .. t24, rather than ending at t25.
Here's how it looks if I split the data into 26-element chunks:
use strict;
use warnings 'all';

my @data = qw/
    10:38:49 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54 10:38:51 788 56 51 56 61 56 59 56 51 56 80 56 83 56 50 45 42 45 50 45 50 45 43 45 54
/;

while ( @data ) {
    my @set = splice @data, 0, 26;
    print join(',', @set), "\n";
}
output
10:38:49,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
10:38:51,788,56,51,56,61,56,59,56,51,56,80,56,83,56,50,45,42,45,50,45,50,45,43,45,54
If you wanted to use List::MoreUtils instead of splice, the natatime function returns an iterator that will do the same thing as the splice above.
Like this
use List::MoreUtils qw/ natatime /;

my $iter = natatime 26, @data;

while ( my @set = $iter->() ) {
    print join(',', @set), "\n";
}
The output is identical to that of the program above
Note
It is very wrong to start a new shell process just to use cat to read a file. The standard method is to undefine the input record separator $/, like this:
my $file = do {
    open my $fh, '<', '<file_name>.txt' or die "Unable to open file for input: $!";
    local $/;
    <$fh>;
};
Or if you prefer you could use File::Slurper like this
use File::Slurper qw/ read_binary /;
my $file = read_binary '<file_name>.txt';
although you will probably have to install it as it is not a core module
input_file1:
a 1 33
a 34 67
a 68 78
b 1 99
b 100 140
c 1 70
c 71 100
c 101 190
input_file2:
a 5 23
a 30 72
a 76 78
b 5 30
c 23 88
c 92 98
I want to compare these two files so that, for every value of 'a' in file2, I can check whether its two integers (the boundaries) fall within one of the ranges of 'a' in file1, or between two of those ranges.
Instead of storing values like 'a 1 33', you could write each record as a single structure such as 'a:1:33'; that makes the data easier to read back later.
You can then read each line, split it on the ':' separator, and compare it against the other file.
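A minimal sketch of that idea, assuming Perl and the file names shown above; the containment rule (both boundaries inside a stored range) is just one interpretation of the comparison:

use strict;
use warnings;

# Build a hash of ranges per label from the first file.
my %ranges;
open my $fh1, '<', 'input_file1.txt' or die "Cannot open input_file1.txt: $!";
while (<$fh1>) {
    my ($label, $lo, $hi) = split /[\s:]+/;   # handles "a 1 33" or "a:1:33"
    push @{ $ranges{$label} }, [ $lo, $hi ];
}
close $fh1;

# Check each record of the second file against the stored ranges.
open my $fh2, '<', 'input_file2.txt' or die "Cannot open input_file2.txt: $!";
while (<$fh2>) {
    my ($label, $lo, $hi) = split /[\s:]+/;
    my $inside = grep { $lo >= $_->[0] && $hi <= $_->[1] } @{ $ranges{$label} || [] };
    print "$label $lo $hi ", ($inside ? "is inside a range" : "is not inside any range"), "\n";
}
close $fh2;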
What I am trying to accomplish: pull the last 8 characters from the lines in a file, slice them into two-character chunks, compare those chunks with my dictionary, and list the results. This is literally the first thing I have done in Python, and my head is spinning with all the answers here.
I think I need basic swimming instruction, and every answer seems to be a primer on free-diving for world records.
I am using the following code (right now I have the h1 through h4 lookups commented out because they are not finding keys that are in my dictionary):
d1 = {'30': 0, '31': 1, '32': 2, '33': 3, '34': 4, '35': 5, '36': 6, '37': 7, '38': 8, '39': 9,
      '41': 'A', '42': 'B', '43': 'C', '44': 'D', '45': 'E', '46': 'F'}

filename = raw_input("Filename? > ")

with open(filename) as file:
    for line in iter(file.readline, ''):
        h1 = line[-8:-6]
        h2 = line[-6:-4]
        h3 = line[-4:-2]
        h4 = line[-2:]
        #h1 = d1[h1]
        #h2 = d1[h2]
        #h3 = d1[h3]
        #h4 = d1[h4]
        print h1,h2,h3,h4
Here is part of the txt file I am using as input:
naa.60000970000192600748533031453442
naa.60000970000192600748533031453342
naa.60000970000192600748533031453242
naa.60000970000192600748533031453142
naa.60000970000192600748533031434442
naa.60000970000192600748533031434342
naa.60000970000192600748533031434242
naa.60000970000192600748533032363342
When I run my script, here is the output generated by the code above:
14 53 44 2
14 53 34 2
14 53 24 2
14 53 14 2
14 34 44 2
14 34 34 2
14 34 24 2
32 36 33 42
The last line looks exactly as I would expect. All the other lines have been shifted or have dropped characters. I am at a loss here... I have tried many different ways to open the file in Python, but have been unable to get them to loop through, or have had other issues.
Is there a simple fix I am just missing here? Thanks, j
I suspect that what's going on is that each line you read has a newline at the end, except the last one. So the last one is right, but in the others the trailing newline shifts your negative slices by one character. IOW, I think your file lines look something like
>>> open("demo.txt").readline()
'naa.60000970000192600748533031453442\n'
where the \n is the newline character, and is only one character (it's not \ + n). I might write your code something like
with open(filename) as myfile:
    for line in myfile:
        line = line.strip()  # get rid of leading and trailing whitespace
        h1 = line[-8:-6]
        h2 = line[-6:-4]
        h3 = line[-4:-2]
        h4 = line[-2:]
        print h1,h2,h3,h4
which for me produces
Filename? > demo.txt
31 45 34 42
31 45 33 42
31 45 32 42
31 45 31 42
31 43 44 42
31 43 43 42
31 43 42 42
32 36 33 42
We could simplify the h parts, but we'll leave that alone for now. :^)
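If you did want to simplify them later, one possible sketch (d1.get falls back to the raw chunk for anything not in the dictionary):

with open(filename) as myfile:
    for line in myfile:
        # Take the last 8 characters and cut them into 2-character chunks.
        tail = line.strip()[-8:]
        chunks = [tail[i:i + 2] for i in range(0, 8, 2)]
        print ' '.join(str(d1.get(c, c)) for c in chunks)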
I am trying to set a variable equal to the state FIPS code given the state abbreviation. Is there a shorter way to do this other than:
replace fips = "[fips code]" if other_variable=="[state_abbrev]"
I currently have 50 lines like this. I would like to create a loop, but given that I have two changing values, I don't know how to avoid looping through every permutation.
Here is an example of the strategy covered in the FAQ.
1) Create a dataset containing two variables: the state name and the associated fips code. To make this slightly more flexible, I include common semi-abbreviations for the state name. In the future, you could add a third variable that includes the two-letter state abbreviation.
clear
input fips str20 state
1 "alabama"
2 "alaska"
4 "arizona"
5 "arkansas"
6 "california"
8 "colorado"
9 "connecticut"
10 "delaware"
11 "district of columbia"
12 "florida"
13 "georgia"
15 "hawaii"
16 "idaho"
17 "illinois"
18 "indiana"
19 "iowa"
20 "kansas"
21 "kentucky"
22 "louisiana"
23 "maine"
24 "maryland"
25 "massachusetts"
26 "michigan"
27 "minnesota"
28 "mississippi"
29 "missouri"
30 "montana"
31 "nebraska"
32 "nevada"
33 "new hampshire"
34 "new jersey"
35 "new mexico"
36 "new york"
37 "north carolina"
37 "n. carolina"
38 "north dakota"
38 "n. dakota"
39 "ohio"
40 "oklahoma"
41 "oregon"
42 "pennsylvania"
44 "rhode island"
45 "south carolina"
45 "s. carolina"
46 "south dakota"
46 "s. dakota"
47 "tennessee"
48 "texas"
49 "utah"
50 "vermont"
51 "virginia"
53 "washington"
54 "west virginia"
54 "w. virginia"
55 "wisconsin"
56 "wyoming"
72 "puerto rico"
end
save statefips, replace
2) Load your primary dataset that holds a variable with state names and perform a many-to-one merge using statefips.dta.
sysuse census, clear
// Convert the state names to lowercase to ensure
// consistency with the statefips dataset
replace state = lower(state)
merge m:1 state using statefips.dta
drop if _merge == 2
drop _merge
If you wanted to preserve the case of the state names in your master data set, you could simply generate a temporary lowercased variable and use that for the merge (note that the merge key must exist under the same name in both datasets, so you would also need to rename state to statelower in statefips.dta), i.e.
gen statelower = lower(state)
merge m:1 statelower using statefips.dta
Also, once you've created the statefips.dta data set, there's no need to recreate it every time you want to perform a merge. You could simply bundle it along with your project's files and use it when necessary. If you find you want to add two-letter state abbreviations or make some other change, then it's practically instantaneous to recreate it.
No obvious shortcut, but in Stata you can run
. search merge, faq
to find a relevant FAQ by Kit Baum.