Less compact form - loops

def read_file(filename):
    trolls = dict()
    try:
        with open(filename) as file:
            for line in file:
                city, data = line.split(':')
                for i in data.split():
                trolls[city] = [int(_) for _ in data.split()]
    except OSError as error:
        print(f"Yeuch!: {error}")
    return trolls
How can I rewrite this line:
trolls[city] = [int(_) for _ in data.split()]
in a less compact way?

In these two lines in your code
for i in data.split():
trolls[city] = [int(_) for _ in data.split()]
the for loop needlessly repeats the second line once for every element in the list. Each pass builds the same list, so the result does not change.
(In your code the second line isn't indented, but it should be, otherwise you'll just get a SyntaxError. I assume it just didn't paste correctly and you have it indented.)
I think you meant to do this:
if city not in trolls:
    trolls[city] = []
for i in data.split():
    trolls[city].append(int(i))
Or simply delete the for i in data.split(): line from your original code.
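Putting the second suggestion into the whole function, here is a minimal sketch of the less compact form of the list comprehension. It assumes every line of the file looks like `city: 1 2 3` (inferred from the `split(':')` call); the names `numbers` and `item` are illustrative and not from the original code.

```python
def read_file(filename):
    """Map each city to the list of integers found after the colon."""
    trolls = {}
    try:
        with open(filename) as file:
            for line in file:
                # Assumes exactly one ':' per line, e.g. "Oslo: 1 2 3"
                city, data = line.split(':')
                # Explicit-loop equivalent of [int(_) for _ in data.split()]
                numbers = []
                for item in data.split():
                    numbers.append(int(item))
                trolls[city] = numbers
    except OSError as error:
        print(f"Yeuch!: {error}")
    return trolls
```

Like the comprehension, this overwrites the entry if the same city appears twice; the `if city not in trolls` variant above accumulates the values instead.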

Related

How can I create and shuffle a dataset for triplet mining in TensorFlow 2?

I'm working on a network that uses triplet mining for training. For it to work properly, my batches need to contain several images of the same class. The problem I'm currently facing is that I have 751 classes, for a total of 12,937 pictures, and a batch size of 48 pictures. When shuffling the dataset with the command below, the odds of getting several pictures from the same class in one batch are really low, which makes the triplet mining inefficient.
dataset = dataset.shuffle(12937)
What I need instead is a way of generating batches that contain a fixed number of pictures for every class represented in the batch. For example, with 12 classes per batch, there would be 4 pictures for each of them (12 × 4 = 48).
Another problem is how to reshuffle this dataset at the end of every epoch, so that I get different batches that still satisfy the condition above, that is, 12 classes with 4 pictures each.
Is there a proper way to do this? I can't really find one. Please let me know if I'm unclear or if you need further details.
================ EDIT ================
I've been trying a few things, and came up with something that would do what I want. The function would be the following:
counter = 0.
# Assuming a format such as (data, label)
def predicate(data, label):
    global counter
    allowed_labels = tf.constant([counter])
    isallowed = tf.equal(allowed_labels, tf.cast(label, tf.float32))
    reduced = tf.reduce_sum(tf.cast(isallowed, tf.float32))
    counter += 1
    return tf.greater(reduced, tf.constant(0.))
#@tf.function
def custom_shuffle(train_dataset, batch_size, samples_per_class = 4, iterations_in_epoch = 100, database='market'):
    assert batch_size%samples_per_class==0, F'batch size must be a {samples_per_class} multiple.'
    if database == 'market':
        class_nbr = 751
    else:
        raise Exception('Unsuported database yet')
    all_datasets = [train_dataset.filter(predicate) for _ in range(class_nbr)] # Every element of this array is a dataset of one class
    for i in range(iterations_in_epoch):
        choice = tf.random.uniform(
            shape=(batch_size//samples_per_class,),
            minval=0,
            maxval=class_nbr,
            dtype=tf.dtypes.int64,
        ) # Which classes will be in batch
        choice = tf.data.Dataset.from_tensor_slices(tf.concat([choice for _ in range(4)], axis=0)) # Exactly 4 picture from each class in the batch
        batch = tf.data.experimental.choose_from_datasets(all_datasets, choice)
        if i==0:
            all_batches = batch
        else:
            all_batches = all_batches.concatenate(batch)
    all_batches = all_batches.batch(batch_size)
    return all_batches
It does what I want; however, the returned dataset is extremely slow to iterate, which makes training the model impossible. As per this thread, I understood that I needed to decorate custom_shuffle with @tf.function, like the one that is commented out. However, when I do so, it raises the following error:
Traceback (most recent call last):
File "training.py", line 137, in <module>
main()
File "training.py", line 80, in main
train_dataset = get_dataset(TRAINING_FILENAMES, IMG_SIZE, BATCH_SIZE, database=database, func_type='train')
File "E:\Morgan\TransReID_TF\tfr_to_dataset.py", line 260, in get_dataset
dataset = custom_shuffle(dataset, batch_size)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\def_function.py", line 846, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "D:\Programs\Anaconda3\envs\AlignedReID_TF\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper
[[{{node BatchDatasetV2/_206}}]] [Op:__inference_custom_shuffle_11485]
Function call stack:
custom_shuffle
I don't understand this error and don't see how to fix it.
Is there something I'm doing wrong?
PS: I'm aware the lack of minimal code to reproduce this behavior makes it hard to debug, I'll try to provide some as soon as possible.
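One way to picture the batch layout described above (12 classes, 4 pictures each) is to sample indices in plain Python before any TensorFlow pipeline is involved. The sketch below is only that: the function name `balanced_batches` and its parameters are hypothetical, it assumes all labels are available as an in-memory list, and it does not address the @tf.function error. The resulting index lists could presumably be fed to an image loader, for instance through `tf.data.Dataset.from_generator`, but that part is an untested assumption.

```python
import random
from collections import defaultdict


def balanced_batches(labels, classes_per_batch=12, samples_per_class=4,
                     num_batches=100, seed=None):
    """Yield lists of dataset indices: samples_per_class per chosen class."""
    rng = random.Random(seed)

    # Group dataset indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    class_ids = list(by_class)

    for _ in range(num_batches):
        batch = []
        # Pick distinct classes for this batch (needs enough classes).
        for label in rng.sample(class_ids, classes_per_batch):
            pool = by_class[label]
            if len(pool) >= samples_per_class:
                batch.extend(rng.sample(pool, samples_per_class))
            else:
                # Too few pictures in this class: sample with replacement.
                batch.extend(rng.choices(pool, k=samples_per_class))
        yield batch
```

Calling the generator again at the start of each epoch draws fresh classes and pictures, so the batches differ between epochs while keeping the same 12 × 4 structure.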

How do I get matches from a text file and output them in an array?

I'm using a text file with one movie per line. If a user inputs Oz, I want to output all the movies in the file that have the word Oz in them.
This is what I have so far.
puts "Enter the keyword you want to search for: "
keyword = gets
movies_file = File.new("movies.txt", "r")
movies = movies_file.read
movies_list = movies.split(" ")
match_list = []
movies_list.each do |w|
  matchObj = w.match(keyword)
  if matchObj then
    matchlist.push(matchObj.captures[0])
  end
end
match_list.each do |title|
  puts title
end
Presuming you've got the file organized like this:
Wizard of Oz
Battlefield Earth
Twilight
Ozymandias
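Then you can read it in this way:
lines = File.readlines('movies.txt').map(&:chomp)
Then, to find matching lines, strip the trailing newline that gets leaves on the input and pass grep a regular expression built from the keyword (grep with a plain string only matches lines that equal it exactly):
keyword = gets.chomp
matches = lines.grep(/#{Regexp.escape(keyword)}/)
There's no need for all the each stuff. Also, the then after an if condition is almost never written; it's just useless decoration.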

Trying to define a function with a loop in ruby, and I'm getting an error where there isn't a function

I wrote a program to extract all (over 1000) the comments from a reddit post, and I'm having trouble defining a function.
The program:
require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'
#The URL, which changes.
url = ("https://www.reddit.com/r/AskReddit/comments/46n0zc.json")
#Sets up the JSON reader.
result = JSON.parse(open(url).read)
post = result[0]["data"]["children"]
children = result[1]["data"]["children"]
#Sets up the base location.
base = ("https://www.reddit.com" + post[0]["data"]["permalink"].to_s)
#Sets up the arrays.
mainIDs = Array.new
reIDs = Array.new
#Collects the main jsons.
children.each do |child|
  if child["data"].has_key? "body"
    mainIDs.push(child["data"]["id"].to_s)
  end
end
mainINT = mainIDs.count
#Collects the remaining.
children.each do |child|
  if child["data"].has_key? "children"
    reIDs = child["data"]["children"]
  end
end
remainINT = reIDs.count
puts "Main Comments: " + mainINT.to_s
puts "Total Comments: " + (mainINT + remainINT).to_s
#Divides the page.
puts ("__" * 50)
puts ("\n")
#Creates a function for collection.
def printAllComments(array, comINT)
  for i in array do i
    url = base + i
    puts "Post URL: " + url
    result = JSON.parse(open(url).read)
    children = result[1]["data"]["children"]
    int = comINT
    for i in children do child
      if child["data"].has_key? "body"
        puts "Comment Number: " + int.to_s
        puts "Author: " + child["data"]["author"]
        puts "Body: " + neatBD(child["data"]["body"].to_s)
        puts "ID: " + child["data"]["id"]
        puts "Ups: " + child["data"]["ups"].to_s
        puts "\n\n"
        int += 1
      end
    end
  end
end
printAllComments(mainINT, 1)
The "Creates a function for collection" is where the error is. When I run this, I get:
Main Comments: 64
Total Comments: 1676
____________________________________________________________________________________________________
007----extractallredditpostcomments.rb:51:in `printAllComments': undefined method `each' for 64:Fixnum (NoMethodError)
from 007----extractallredditpostcomments.rb:72:in `<main>'
When I should be getting the first 64 comments from the main array, instead it just breaks after it prints the line. What's weird is that the error is on line 51, and line 51 is the:
url = base + i
there's no 'each' function there.
What am I missing?
for i in array do is equivalent to array.each do |i|. If array is not, actually, an array, you will get the error you are getting. Proof:
for i in 64 do puts i end
# NoMethodError: undefined method `each' for 64:Fixnum
Stylistically... no Rubyist uses for; everyone uses each directly.
There are a couple of items that I would point out.
First, the loop syntax feels a bit funny. If you want to access each element of the array, I would use array.each do |item|, and the same for the inner loop: children.each do |child|.
Next, you are passing mainINT, which is an integer, instead of mainIDs, which is the array. This is what was throwing the error you asked about.
Lastly, you are trying to reference base from within your method, but it doesn't exist within the method's scope. Try passing it into the method, or make it visible there with @base (an instance variable on the top-level object) or $base (a global variable).
After changing these few items, I was able to get one URL to print out and then ran into an issue with parsing the JSON, but I will leave that to you. This should hopefully get you moving forward.
Patrick

Read from text file and assign data to new variable

Python 3 program allows people to choose from list of employee names.
Data held on text file look like this: ('larry', 3, 100)
(being the persons name, weeks worked and payment)
I need a way to assign each part of the text file to a new variable,
so that the user can enter a new amount of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
currently you are assigning the first three lines from the file to name, weeks_worked and weekly_payment. but what you want (I think) is to split a single line, formatted as ('larry', 3, 100), into its three parts (does each file have only one line?).
so you probably want code like:
from re import compile
# your code to choose file
line_format = compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
file = open(emp_choice + ".txt")
line = file.readline() # read the first line only
match = line_format.match(line)
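if match:
    name, weeks_worked, weekly_payment = match.groups()
    # groups() returns strings; convert the numbers before doing arithmetic
    weeks_worked, weekly_payment = int(weeks_worked), int(weekly_payment)
else:
    raise Exception('Could not match %s' % line)
# your code to update information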
the regular expression looks complicated, but is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional spaces (it's not clear to me if you have spaces or not
in various places between words, so this matches just in case)
\d+ matches a number (1 or more digits)
[^']* matches anything except a quote (so matches the name)
(...) (without the \ backslashes) indicates a group that you want to read
afterwards by calling .groups()
and these are built from simpler parts (like * and + and \d) which are described at http://docs.python.org/2/library/re.html
if you want to repeat this for many lines, you probably want something like:
name, weeks_worked, weekly_payment = [], [], []
for line in file.readlines():
    match = line_format.match(line)
    if match:
        name.append(match.group(1))
        weeks_worked.append(match.group(2))
        weekly_payment.append(match.group(3))
    else:
        raise ...
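As an alternative to the regular expression, here is a minimal sketch that swaps in `ast.literal_eval` from the standard library. It assumes each line is a valid Python tuple literal such as ('larry', 3, 100) and, following the question's code, treats the third field as a weekly payment. The helper names `parse_record` and `new_payment` are hypothetical.

```python
import ast


def parse_record(line):
    """Parse a line like "('larry', 3, 100)" into (name, weeks, payment)."""
    # literal_eval only evaluates literals; malformed input raises
    # ValueError or SyntaxError instead of running arbitrary code.
    record = ast.literal_eval(line.strip())
    if not (isinstance(record, tuple) and len(record) == 3):
        raise ValueError(f"Expected (name, weeks, payment), got: {line!r}")
    name, weeks_worked, weekly_payment = record
    return name, int(weeks_worked), int(weekly_payment)


def new_payment(weekly_payment, new_weeks):
    """Payment for a new number of weeks at the stored weekly rate."""
    return new_weeks * weekly_payment


name, weeks_worked, weekly_payment = parse_record("('larry', 3, 100)")
print(name + " will now be paid " + str(new_payment(weekly_payment, 5)))
```

Because the numbers come back as integers, the multiplication is arithmetic rather than the string repetition that happens when the fields are read as text.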

How do I check for pangrams in a line in ruby?

Some of you may notice I'm back already with the same painful code. I'm not sure whether the other question stays open once I accept an answer.
Now the problem is a little simpler. I found some code that checks for pangrams. It used to be def pangram?('sentence'), but I needed line to go in there, so I tried changing it to def pangram?(line). It doesn't seem to mesh well with my coding style and doesn't work. I tried to use .contain('a' . . 'z') to check for a pangram, but someone I know tried that and it didn't work. Google isn't much help either.
Any ideas for how I could check for pangrams in an if statement?
# To change this template, choose Tools | Templates
# and open the template in the editor
# This program reads a file line by line,
#separating lines by writing into certain text files.
#PPQ - Pangrams, Palindromes, and Quotes
class PPQ
  def pangram?(line)
    unused_letters = ('a'..'z').to_a - line.downcase.chars.to_a
    unused_letters.empty?
  end
  def categorize
    file_pangram = File.new('pangram.txt', 'w')
    file_palindrome = File.new('palindrome.txt', 'w')
    file_quotes = File.new('quotes.txt','w')
    File.open('ruby1.txt','r') do |file|
      while line = file.gets
        if(line.reverse == line)
          file_palindrome.write line
        elsif(pangram?(line)== true)
          file_pangram.write line
        else
          file_quotes.write line
        end
      end
    end
    file.close
    file_pangram.close
    file_palindrome.close
    file_quotes.close
  end
end
my_ruby_assignment = PPQ.new
my_ruby_assignment.categorize
I'm partial to simpler syntax, something like
def pangram?(line)
  ('a'..'z').all? { |letter| line.downcase.include?(letter) }
end
if pangram?(line) then file_pangram.write line end
def pangram?(string)
  str = string.chars.map(&:downcase)
  letters =('a'..'z').to_a
  result = true
  letters.each do |l|
    if !(str.include? l.downcase)
      result = false
      break
    end
  end
  result
end
