I have a problem with my Ruby script. I have an array
files = ["2020-09-14.access","2020-09-13.access","2020-09-11.access","2020-09-10.access","2020-09-09.access","2020-09-08.access","2020-09-07.access","2020-09-05.access","2020-09-04.access","2020-09-02.access","2020-09-01.access","2020-09-14.sale","2020-09-12.sale","2020-09-08.sale","2020-09-07.sale","2020-09-06.sale","2020-09-04.sale",]
that contains file names. There are two types of files: access and sale. Every file name contains the date the file was created. From each file type I want to get only the values with older dates, beginning from the file created two days ago. For the sale file type there is no problem: today is 2020-09-14, and the file created two days ago is 2020-09-12.sale. But for the access files there is no file created on 2020-09-12, so I want the file with the closest date to 2020-09-12, which means the value 2020-09-10.access, and I'm stuck here. In short, I want to get an array like this
to_del_files = [["2020-09-10.access","2020-09-09.access","2020-09-08.access","2020-09-07.access","2020-09-05.access","2020-09-04.access","2020-09-02.access","2020-09-01.access"],["2020-09-12.sale","2020-09-08.sale","2020-09-07.sale","2020-09-06.sale","2020-09-04.sale"]]
My code is below:
require 'date'
files = ["2020-09-14.access","2020-09-13.access","2020-09-10.access","2020-09-09.access","2020-09-08.access","2020-09-07.access","2020-09-05.access","2020-09-04.access","2020-09-02.access","2020-09-01.access","2020-09-14.sale","2020-09-12.sale","2020-09-08.sale","2020-09-07.sale","2020-09-06.sale","2020-09-04.sale",]
names = files.map {|x| x.split('.')[1] }.uniq
puts names
date = Date.today
date2ago = date -2
to_del_files = []
names.each do |item|
tmp = files.select { |x| x =~ /#{item}/ }
flag = tmp.select {|x| x =~ /#{date2ago}/ }
if flag.size > 0
index = tmp.find_index("#{flag[0]}")
to_del_files << tmp[index..-1]
else
#what to do in case where there is no such date in files
end
end
puts to_del_files
Thanks for any help.
In order for you to get the files to delete:
def old_files(files, date)
# <= keeps the file dated exactly two days ago, as the question asks
files.sort.filter { |file| Date.parse(file) <= date }
end
And then you can use:
files = ["2020-09-14.access","2020-09-13.access","2020-09-10.access","2020-09-09.access","2020-09-08.access","2020-09-07.access","2020-09-05.access","2020-09-04.access","2020-09-02.access","2020-09-01.access","2020-09-14.sale","2020-09-12.sale","2020-09-08.sale","2020-09-07.sale","2020-09-06.sale","2020-09-04.sale",]
today = Date.today
date = today -2
to_del_files = old_files(files, date)
I understand you wish to select elements from files corresponding to dates that are equal to or earlier than a given date. If that is correct you can do that as follows.
files = [
"2020-09-14.access", "2020-09-13.access", "2020-09-11.access",
"2020-09-10.access", "2020-09-09.access", "2020-09-08.access",
"2020-09-07.access", "2020-09-05.access", "2020-09-04.access",
"2020-09-02.access", "2020-09-01.access", "2020-09-14.sale",
"2020-09-12.sale", "2020-09-08.sale", "2020-09-07.sale",
"2020-09-06.sale", "2020-09-04.sale"
]
require 'date'
def files_on_or_before_date(arr)
files_on_or_before_target_date(arr, Date.today - 2)
end
def files_on_or_before_target_date(arr, target_date)
arr.select { |d| Date.strptime(d, '%Y-%m-%d') <= target_date }
end
files_on_or_before_target_date(files, Date.new(2020, 9, 12))
#=> ["2020-09-11.access", "2020-09-10.access", "2020-09-09.access",
# "2020-09-08.access", "2020-09-07.access", "2020-09-05.access",
# "2020-09-04.access", "2020-09-02.access", "2020-09-01.access",
# "2020-09-12.sale", "2020-09-08.sale", "2020-09-07.sale",
# "2020-09-06.sale", "2020-09-04.sale"]
files_on_or_before_target_date(files, Date.new(2020, 9, 10))
#=> ["2020-09-10.access", "2020-09-09.access", "2020-09-08.access",
# "2020-09-07.access", "2020-09-05.access", "2020-09-04.access",
# "2020-09-02.access", "2020-09-01.access", "2020-09-08.sale",
# "2020-09-07.sale", "2020-09-06.sale", "2020-09-04.sale"]
These return values can of course be added to an array.
See Date::strptime and DateTime#strftime, the latter for date formatting directives.
Date.strptime("2020-09-14.access", '%Y-%m-%d')
returns the same Date object as does
Date.strptime("2020-09-14", '%Y-%m-%d')
To guard against a possible future change in the implementation of Date::strptime, strptime's argument d could be replaced with d[/[^.]+/] or d[0, d.index('.')], both of which evaluate to "2020-09-14" when d = "2020-09-14.access".
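The question asks for one list per file type, while the methods above return a single flat array. A minimal sketch of how the two could be combined, assuming the file type is the extension after the dot and the cutoff date is passed in explicitly; `files_to_delete_by_type` is a hypothetical name, not something from the answers above:

```ruby
require 'date'

# Hypothetical helper: groups file names by extension, then keeps only
# the names dated on or before the cutoff within each group.
def files_to_delete_by_type(files, cutoff)
  files
    .group_by { |name| File.extname(name) } # ".access", ".sale"
    .values
    .map do |group|
      group.select { |name| Date.strptime(name[/[^.]+/], '%Y-%m-%d') <= cutoff }
    end
end

# Shortened sample list, for illustration only.
sample = ["2020-09-14.access", "2020-09-10.access", "2020-09-09.access",
          "2020-09-14.sale", "2020-09-12.sale", "2020-09-08.sale"]

result = files_to_delete_by_type(sample, Date.new(2020, 9, 12))
p result
#=> [["2020-09-10.access", "2020-09-09.access"],
#    ["2020-09-12.sale", "2020-09-08.sale"]]
```

Because the comparison is "on or before", a type with no file on the cutoff date simply starts at its closest earlier file, so no special case is needed.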
I am trying to find the highest sales between two given dates.
This is what my ad_report.csv file looks like, with headers:
date,impressions,clicks,sales,ad_spend,keyword_id,asin
2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
...
Below is all the working code I have that returns the row with the highest value, but not between the given dates.
require 'csv'
require 'date'
# get directory of the current file
LIB_DIR = File.dirname(__FILE__)
# get the absolute path of the ad_report & product_report CSV
# and set to a var
AD_CSV_PATH = File.expand_path('data/ad_report.csv', LIB_DIR)
PROD_CSV_PATH = File.expand_path('data/product_report.csv', LIB_DIR)
# create CSV::Table for ad-ad_report and product_report CSV
ad_report_table = CSV.parse(File.read(AD_CSV_PATH), headers: true)
prod_report_table = CSV.parse(File.read(PROD_CSV_PATH), headers: true)
## finds the row with the highest sales
sales_row = ad_report_table.max_by { |row| row[3].to_i }
At this point I can get the row that has the greatest sale, and all the data from that row, but it is not in the expected range.
Below I am trying to use range with the preset dates.
## range of date for items between
first_date = Date.new(2017, 05, 02)
last_date = Date.new(2017, 05, 31)
range = (first_date...last_date)
puts sales_row
Below is pseudocode of what I feel I am supposed to do, but there is probably a better method.
## check for highest sales
## return sales if between date
## else reject col if
## loop this until it returns date between
## return result
You could do this by creating a range containing two dates and then use Range#cover? to test if the date is in the range:
range = Date.new(2015, 1, 1)..Date.new(2020, 1, 1)
rows.select do |row|
range.cover?(Date.parse(row[0])) # the date is the first column
end.max_by { |row| row[3].to_i }
Although the Tin Man is completely right in that you should use a database instead.
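For reference, here is a self-contained sketch of the Range#cover? approach applied to the date range from the question. The rows are made up for illustration, and the columns are looked up by header name rather than by index, which is an assumption about how you would prefer to address them:

```ruby
require 'csv'
require 'date'

# Made-up sample rows in the same shape as ad_report.csv (fewer columns).
data = <<~ROWS
  date,impressions,clicks,sales
  2017-05-01,10,5,900
  2017-05-10,10,5,300
  2017-05-20,10,5,700
  2017-06-01,10,5,1000
ROWS

table = CSV.parse(data, headers: true)
range = Date.new(2017, 5, 2)..Date.new(2017, 5, 31)

# Keep only the rows whose date falls inside the range, then take the
# row with the largest sales figure among them.
best = table
       .select { |row| range.cover?(Date.parse(row['date'])) }
       .max_by { |row| row['sales'].to_i }

puts best['date']  # prints 2017-05-20
puts best['sales'] # prints 700
```

Note that a two-dot range includes the last date, whereas the three-dot range in the question would exclude it.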
You could obtain the desired value as follows. I have assumed that the field of interest ('sales') represents integer values. If not, change .to_i to .to_f below.
Code
require 'csv'
def greatest(fname, max_field, date_field, date_range)
largest = nil
CSV.foreach(fname, headers:true) do |csv|
largest = { row: csv.to_a, value: csv[max_field].to_i } if
date_range.cover?(csv[date_field]) &&
(largest.nil? || csv[max_field].to_i > largest[:value])
end
largest.nil? ? nil : largest[:row].to_h
end
Examples
Let's first create a CSV file.
str =<<~END
date,impressions,clicks,sales,ad_spend,keyword_id,asin
2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
2017-06-20,4451,1006,200000,24.87,UVOLBWHILJ,63N02JK10S
END
fname = 't.csv'
File.write(fname, str)
#=> 263
Now find the record within a given date range for which the value of "sales" is greatest.
greatest(fname, 'sales', 'date', '2017-06-17'..'2017-06-19')
#=> {"date"=>"2017-06-18", "impressions"=>"5283", "clicks"=>"3237",
# "sales"=>"1233", "ad_spend"=>"85.06", "keyword_id"=>"UVOLBWHILJ",
# "asin"=>"63N02JK10S"}
greatest(fname, 'sales', 'date', '2017-06-17'..'2017-06-25')
#=> {"date"=>"2017-06-20", "impressions"=>"4451", "clicks"=>"1006",
# "sales"=>"200000", "ad_spend"=>"24.87", "keyword_id"=>"UVOLBWHILJ",
# "asin"=>"63N02JK10S"}
greatest(fname, 'sales', 'date', '2017-06-22'..'2017-06-25')
#=> nil
I read the file line by line (using CSV::foreach) to keep memory requirements to a minimum, which could be essential if the file is large.
Notice that, because the date is in "yyyy-mm-dd" format, it is not necessary to convert two dates to Date objects to compare them; that is, they can be compared as strings (e.g. '2017-06-17' <= '2017-06-18' #=> true).
I have multiple csv files that have the name and the price of products. There may be or may not be products that are in both files. I have to find the highest and the lowest price across these files for each product.
I joined products from both files into one array:
Dir["./*.csv"].each do |file|
CSV.foreach(file, headers:true) do |row|
tmpRow = row.to_s.chomp + "," + file #saving name of the input file
list.push(tmpRow.chomp.split(","))
end
end
The array list looks like this:
[["5893105","2.38", "weightOrSomethingIrrelevant", "./FIAT_2.csv"]]
This is the main algorithm:
while list[0] do
if list[0] != nil
tmpPart = list[0][0]
tmpParts = list.select{ |part, price| part == tmpPart}
tmpParts.each do |tp|
tmpPrices.push(tp[1])
end
list[0][2].to_f != 0.0 ? tmpWeight = list[0][2].to_s : tmpWeight = "Undefined"
tmpMaxPrice = tmpParts.select{|part, price| part == tmpPart && price == tmpPrices.max}
tmpMinPrice = tmpParts.select{|part, price| part == tmpPart && price == tmpPrices.min}
result.push([tmpPart, tmpWeight, tmpPrices.max, tmpMaxPrice[0].last, tmpPrices.min, tmpMinPrice[0].last])
tmpPart = ""
list = list - tmpParts
tmpParts = []
tmpPrices = []
tmpMaxPrice = []
tmpMinPrice = []
tmpWeight = ""
end
end
The input files are huge (over 200 000 rows), so I am having problems with efficiency of my algorithm (as it processes one row in half a second).
I am wondering if there is any better way to write this app.
I would split this into several parts:
1) I suggest you have a table which represents files (the file name, location, line number etc) and connected to that a product table (the row data from that file)
2) script / function to ingest files and store rows as DB records
3) script / function to analyse rows and find products by name, using the DB and pulling price info out using Min/max.
This could later be improved to deal with naming inconsistencies products vs product occurrences etc.
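If a database is more than you need, an in-memory alternative (not the design suggested above) is to keep a hash keyed by product and update the minimum and maximum in a single pass, which avoids rescanning the whole list for every product. A minimal sketch, assuming each row has already been reduced to `[part, price, source_file]`; `price_extremes` is a hypothetical name:

```ruby
# Hypothetical helper: one pass over [part, price, source_file] rows,
# tracking the lowest and highest price seen so far for each part.
def price_extremes(rows)
  rows.each_with_object({}) do |(part, price, source), extremes|
    price = price.to_f # compare numerically, not as strings
    entry = (extremes[part] ||= { min: price, min_file: source,
                                  max: price, max_file: source })
    if price < entry[:min]
      entry[:min] = price
      entry[:min_file] = source
    end
    if price > entry[:max]
      entry[:max] = price
      entry[:max_file] = source
    end
  end
end

# Made-up rows for illustration.
rows = [
  ["5893105", "2.38", "./FIAT_2.csv"],
  ["5893105", "1.95", "./FIAT_1.csv"],
  ["7700001", "10.00", "./FIAT_1.csv"]
]

extremes = price_extremes(rows)
p extremes["5893105"]
```

For part 5893105 this records a minimum of 1.95 from ./FIAT_1.csv and a maximum of 2.38 from ./FIAT_2.csv. The rows could be fed straight from CSV.foreach instead of being collected into an array first.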
I start with an empty array, and a Hash of key, values.
I would like to iterate over the Hash and compare it against the empty array. If the value for each k,v pair doesn't already exist in the array, I would like to create an object with that value and then access an object method to append the key to an array inside the object.
This is my code
class Test
def initialize(name)
@name = name
@values = []
end
attr_accessor :name
def values=(value)
@values << value
end
def add(value)
@values.push(value)
end
end
l = []
n = {'server_1': 'cluster_x', 'server_2': 'cluster_y', 'server_3': 'cluster_z', 'server_4': 'cluster_x', 'server_5': 'cluster_y'}
n.each do |key, value|
l.any? do |a|
if a.name == value
a.add(key)
else
t = Test.new(value)
t.add(key)
l << t
end
end
end
p l
I would expect to see this:
[
#<Test:0x007ff8d10cd3a8 @name=:cluster_x, @values=["server_1, server_4"]>,
#<Test:0x007ff8d10cd2e0 @name=:cluster_y, @values=["server_2, server_5"]>,
#<Test:0x007ff8d10cd1f0 @name=:cluster_z, @values=["server_3"]>
]
Instead I just get an empty array.
I think that the condition if a.name == value is not being met and then the add method isn't being called.
@Cyzanfar gave me a clue as to what to look for, and I found the answer here
https://stackoverflow.com/a/34904864/5006720
n.each do |key, value|
found = l.detect {|e| e.name == value}
if found
found.add(key)
else
t = Test.new(value)
t.add(key)
l << t
end
end
@ARL you're almost there! The last thing you need to consider is the case where found actually holds an object, since detect will find a matching one at some point.
n.each do |key, value|
found = l.detect {|e| e.name == value}
if found
found.add(key)
else
t = Test.new(value)
t.add(key)
l << t
end
end
You actually only want to add a new instance of Test when found is nil. This code should yield your desired output:
[
#<Test:0x007ff8d10cd3a8 @name=:cluster_x, @values=["server_1, server_4"]>,
#<Test:0x007ff8d10cd2e0 @name=:cluster_y, @values=["server_2, server_5"]>,
#<Test:0x007ff8d10cd1f0 @name=:cluster_z, @values=["server_3"]>
]
I observe two things in your code :
def values=(value)
@values << value
def add(value)
@values.push(value)
two methods do the same thing, pushing a value, as << is a kind of syntactic sugar meaning push
you have changed the meaning of values=, which is usually reserved for a setter method, equivalent to attr_writer :values.
Just to illustrate that there are many ways to do things in Ruby, I propose the following :
class Test
def initialize(name, value)
@name = name
@values = [value]
end
def add(value)
@values << value
end
end
h_cluster = {} # intermediate hash whose key is the cluster name
n = {'server_1': 'cluster_x', 'server_2': 'cluster_y', 'server_3': 'cluster_z',
'server_4': 'cluster_x', 'server_5': 'cluster_y'}
n.each do | server, cluster |
puts "server=#{server}, cluster=#{cluster}"
cluster_found = h_cluster[cluster] # does the key exist ? => nil or Test
# instance with servers list
puts "cluster_found=#{cluster_found.inspect}"
if cluster_found
then # add server to existing cluster
cluster_found.add(server)
else # create a new cluster
h_cluster[cluster] = Test.new(cluster, server)
end
end
p h_cluster.collect { | cluster, servers | servers }
Execution :
$ ruby -w t.rb
server=server_1, cluster=cluster_x
cluster_found=nil
server=server_2, cluster=cluster_y
cluster_found=nil
server=server_3, cluster=cluster_z
cluster_found=nil
server=server_4, cluster=cluster_x
cluster_found=#<Test:0x007fa7a619ae10 @name="cluster_x", @values=[:server_1]>
server=server_5, cluster=cluster_y
cluster_found=#<Test:0x007fa7a619ac58 @name="cluster_y", @values=[:server_2]>
[#<Test:0x007fa7a619ae10 @name="cluster_x", @values=[:server_1, :server_4]>,
#<Test:0x007fa7a619ac58 @name="cluster_y", @values=[:server_2, :server_5]>,
#<Test:0x007fa7a619aac8 @name="cluster_z", @values=[:server_3]>]
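If the Test class is not needed for anything else, the same grouping can be sketched with Enumerable#group_by alone. This is an alternative under that assumption, not a replacement for the class-based answers; `servers_by_cluster` is a hypothetical name:

```ruby
n = { 'server_1': 'cluster_x', 'server_2': 'cluster_y', 'server_3': 'cluster_z',
      'server_4': 'cluster_x', 'server_5': 'cluster_y' }

# group_by collects [server, cluster] pairs under each cluster name;
# transform_values then keeps only the server from each pair.
servers_by_cluster = n.group_by { |_server, cluster| cluster }
                      .transform_values { |pairs| pairs.map(&:first) }

servers_by_cluster.each do |cluster, servers|
  puts "#{cluster}: #{servers.join(', ')}"
end
# prints:
# cluster_x: server_1, server_4
# cluster_y: server_2, server_5
# cluster_z: server_3
```

As in the execution log above, the keys written as 'server_1': become symbols, so the grouped values are symbols too.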
I'm writing a little program to generate some bogus top-ten sales numbers for book sales. I'm trying to do this in as compact a fashion as possible and do it without using MySQL or another DB.
I have written out what I want to happen. I've created a bogus catalog array and a bogus sales array corresponding sales to the index of the catalog entries. That part all works great.
I want to create a third array that includes all the titles from the catalog array with the sales numbers from the sales array, like a join in a DB, but without any DB. I can't figure out how to do that part of it though. I think once I have it in there I can sort it the way I want it, but making that third array is killing me. I cannot figure out what I'm doing wrong or how to do it right.
So given the following code:
require 'random_word'
class BestOnline
def initialize
@catalog = Array.new
@sales = Array.new
@topten = Array.new
inventory = rand(50) + 10
days = rand(1..50)
now = Time.now
yesterday = now - 86400
saleshistory = now - (days * 86400)
(1..inventory).each do
@catalog << {
:title => "#{RandomWord.adjs.next.capitalize} #{RandomWord.nouns.next.capitalize}",
:price => rand(5.99..29.99).round(2)}
end
(0..days).each do
@sales << {
:id => rand(0...@catalog.count), # exclusive range, so the id is always a valid index
:salescount => rand(0..24),
:date => rand(saleshistory..now) }
end
end
def bestsellers
@sales.each do
# THIS DOESNT WORK AND I'M STUCK AS HOW TO FIX IT.
# @topten << {
# :title => @catalog[:id],
# :salescount => @sales[:salescount]
# }
end
puts @topten.group_by{ |tt| tt[:salescount]}.sort_by{ |k,v| -k}.first(10)
end
end
BestOnline.new.bestsellers
How can I create a third array that contains the titles and number of sales and output the result of the top-ten books sold?
Try this out:
def bestsellers
@sales.each do |sale|
@topten << {
title: @catalog[sale[:id]][:title],
salescount: sale[:salescount] }
end
@topten.sort! { |x, y| y[:salescount] <=> x[:salescount] }
puts @topten.first(10)
end
I suggest you write:
def bestsellers(sales)
sales.max_by(10) { |h| h[:salescount] }
end
puts bestsellers(sales)
Enumerable#max_by was permitted to have an argument in Ruby v2.2.
There are several problems with the way you've structured your code. Now that you have running code (by incorporating @fbonds66's answer), I suggest you post it at SO's sister-site Code Review. The purpose of CR is to suggest improvements to working code. If you read through some of the questions and answers there I think you will be impressed.
I was doing the dereferencing wrong trying to build the 3rd array of the 1st two:
@sales.each do |sale|
@topten << {
:title => @catalog[sale[:id]][:title],
:salescount => sale[:salescount]
}
end
I needed to work on the hash returned from .each as |sale| and use correct syntax to get what I was after from the other arrays.
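One thing the snippets above do not do is add up the sales of a title that appears in several sales records. A minimal sketch of that join, assuming you want per-title totals; the catalog and sales data here are made up, and `totals` and `top_ten` are hypothetical names:

```ruby
# Made-up sample data in the same shape as @catalog and @sales.
catalog = [
  { title: 'Quiet Harbor', price: 12.99 },
  { title: 'Iron Garden', price: 8.50 },
  { title: 'Paper Moon', price: 19.99 }
]
sales = [
  { id: 0, salescount: 5 },
  { id: 1, salescount: 7 },
  { id: 0, salescount: 4 },
  { id: 2, salescount: 3 }
]

# Look up each sale's title through its catalog index and sum the counts.
totals = sales.each_with_object(Hash.new(0)) do |sale, acc|
  acc[catalog[sale[:id]][:title]] += sale[:salescount]
end

# max_by(n) returns up to n [title, count] pairs, largest first.
top_ten = totals.max_by(10) { |_title, count| count }
top_ten.each { |title, count| puts "#{title}: #{count}" }
# prints:
# Quiet Harbor: 9
# Iron Garden: 7
# Paper Moon: 3
```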
Is it possible to create an Array from another Array?
Lang: Ruby on Rails
Case
Workers are entitled to fill in their own work hours. Sometimes they forget to do it. This is what I want to tackle. In the end, I want an Array with time codes of periods the worker forgot to register his hours.
timecodes = [201201, 201202, 201203, 201204, 201205, 201206, 201207, 201208, 201209, 201210, 201211, 201212, 201213, 201301, 201302, 201304, 201305, 201306, ...]
Worker works from 201203 to 201209 with us.
timecards = [201203, 201204, 201205, 201207, 201208, 201209]
As you see, he forgot to register 201206.
What I want to do
# Create Array from timecode on start to timecode on end
worked_with_us = [201203, 201204, 201205, 201206, 201207, 201208, 201209]
#=> This is the actual problem, how can I automate this?
forgot_to_register = worked_with_us.?????(timecards)
forgot_to_register = worked_with_us - timecards # Thanks Zwippie
#=> [201206]
Now I know which period the worker forgot to register his hours.
All together
How can I create an Array from another Array, giving a start and end value?
You can just subtract arrays with - (minus):
[1, 2, 3] - [1, 3] #=> [2]
To build an array with years/months, this can be done with a Range, but this only works if you build an array for each year, something like:
months = (2012..2013).map do |year|
("#{year}01".."#{year}12").to_a.collect(&:to_i)
end.flatten
=> [201201, 201202, 201203, 201204, 201205, 201206, 201207, 201208, 201209, 201210, 201211, 201212, 201301, 201302, 201303, 201304, 201305, 201306, 201307, 201308, 201309, 201310, 201311, 201312]
And for the function to create those ranges dynamically:
def month_array(year_from, year_to, month_from=1, month_to=12)
(year_from..year_to).map do |year|
# Correct from/to months
mf = year_from == year ? month_from : 1
mt = year_to == year ? month_to : 12
(mf..mt).map do |month|
("%d%02d" % [year, month]).to_i
end
end.flatten
end
Update: You wanted other input parameters for this method, but I hope you can work that out yourself. :)
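If the full timecodes array from the question is already available, another option is to slice it by start and end value rather than generating the codes. A minimal sketch, assuming the codes are integers that sort in chronological order; `periods_between` is a hypothetical name:

```ruby
# Hypothetical helper: keeps the timecodes from first to last, inclusive.
def periods_between(timecodes, first, last)
  timecodes.select { |code| code.between?(first, last) }
end

timecodes = [201201, 201202, 201203, 201204, 201205, 201206, 201207,
             201208, 201209, 201210, 201211, 201212, 201213, 201301]
timecards = [201203, 201204, 201205, 201207, 201208, 201209]

worked_with_us = periods_between(timecodes, 201203, 201209)
forgot_to_register = worked_with_us - timecards
p forgot_to_register
#=> [201206]
```

This also works for period schemes that are not plain months, such as the 201213 code in the question, because it only relies on the codes that actually exist in timecodes.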