Applying a different multiplier to an integer depending on threshold - arrays

I have to build a program which calculates the annual cost of minutes used with phone providers, and the cost depends on different rates.
As an example one phone operator may have the following rates:
"rates": [
{"price": 15.5, "threshold": 150},
{"price": 12.3, "threshold": 100},
{"price": 8}
],
Operators can have multiple tariffs, with the last tariff always having no threshold.
So in the example above, the first 150 minutes will be charged at 15.5p per minute, the next 100 minutes at 12.3p per minute, and all subsequent minutes at 8p.
Therefore if:
AnnualUsage = 1000
the total cost would be 95.55.
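That is 150 × 15.5p + 100 × 12.3p + 750 × 8p = 2325p + 1230p + 6000p = 9555p, which is 95.55 in pounds.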
I'm struggling to visualise a method which accommodates the multiple tariffs an operator could have, multiplying a value by a different price depending on the threshold.
Please help!

Just another option; I think it's self-explanatory:
rates = [
{price: 15.5, threshold: 150},
{price: 12.3, threshold: 100},
{price: 8}
]
annual_usage = 1000
res = rates.each_with_object([]) do |h, ary|
  if h.key?(:threshold) && annual_usage > h[:threshold]
    annual_usage -= h[:threshold]
    ary << h[:threshold] * h[:price] / 100
  else
    ary << annual_usage * h[:price] / 100
    annual_usage = 0 # stop later tiers from billing the same minutes again
  end
end
res #=> [23.25, 12.3, 60]
res.sum #=> 95.55
Take a look at Enumerable#each_with_object.
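For reference, a minimal illustration of the idiom, where the block's second argument is the accumulator that each_with_object returns at the end:
(1..3).each_with_object([]) { |i, ary| ary << i * 2 } #=> [2, 4, 6]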

def tot_cost(rate_tbl, minutes)
  rate_tbl.reduce(0) do |tot, h|
    mins = [minutes, h[:threshold] || Float::INFINITY].min
    minutes -= mins
    tot + h[:price] * mins
  end
end
rate_tbl = [
{ price: 15.5, threshold: 150},
{ price: 12.3, threshold: 100 },
{ price: 8 }
]
tot_cost(rate_tbl, 130) #=> 2015.0 (130*15.5)
tot_cost(rate_tbl, 225) #=> 3247.5 (150*15.5 + (225-150)*12.3)
tot_cost(rate_tbl, 300) #=> 3955.0 (150*15.5 + 100*12.3 + (300-250)*8)
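Note the results are in pence; divide by 100 for pounds:
tot_cost(rate_tbl, 1000) / 100 #=> 95.55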
If desired, h[:threshold] || Float::INFINITY can be replaced by
h.fetch(:threshold, Float::INFINITY)
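One subtlety: the two forms differ when the key is present but holds nil:
h = { threshold: nil }
h[:threshold] || Float::INFINITY     #=> Infinity
h.fetch(:threshold, Float::INFINITY) #=> nil (the default applies only when the key is absent)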

RATES = [
{price: 15.5, threshold: 150},
{price: 12.3, threshold: 100},
{price: 8}
]
def total_cost(annual_usage)
  rate_idx = 0
  idx_in_threshold = 1
  1.upto(annual_usage).reduce(0) do |memo, i|
    threshold = RATES[rate_idx][:threshold]
    if threshold && (idx_in_threshold > threshold)
      idx_in_threshold = 1
      rate_idx += 1
    end
    idx_in_threshold += 1
    memo + RATES[rate_idx][:price]
  end
end
puts total_cost(1000).to_i
# => 9555 (pence, i.e. £95.55)
The key concepts:
using an enumerable method such as reduce to incrementally build the solution. You could alternatively use each, but reduce is more idiomatic.
tracking progress through the rates list via the rate_idx and idx_in_threshold variables. These give you all the information you need to decide whether to advance to the next tier.
Also, avoid writing your hash keys like "price": 15.5 - just remove the quotation marks; it's more idiomatic.
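Both forms produce the same symbol keys, so the quotation marks add nothing:
{ "price": 15.5 }.keys #=> [:price]
{ price: 15.5 }.keys   #=> [:price]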

With an object-oriented approach you can remove explicit if..else statements and perhaps make the code more self-explanatory.
class Total
  attr_reader :price

  def initialize(usage)
    @usage = usage
    @billed_usage = 0
    @price = 0
  end

  def apply(rate)
    applicable_usage = [@usage - @billed_usage, 0].max
    usage_to_apply = [applicable_usage, rate.fetch(:threshold, applicable_usage)].min
    @price += usage_to_apply * rate[:price]
    @billed_usage += usage_to_apply
  end
end
Simple usage with the each method:
rates = [
{:price => 15.5, :threshold => 150},
{:price => 12.3, :threshold => 100},
{:price => 8}
]
total = Total.new(1000)
rates.each { |rate| total.apply(rate) }
puts "Total: #{total.price}" # Total: 9555.0 (95.55)

How to merge 2 arrays where value in one matches a value in another with different key in Ruby

I have an array that contains other arrays of items with prices, but when one has a sale a new item is created. How do I merge or pull the value from one to the other to make one array, so that the sale price replaces the non-sale one but retains the original price?
Example:
items=[{"id": 123, "price": 100, "sale": false},{"id":456,"price":25,"sale":false},{"id":678, "price":75, "sale":true, "parent_price_id":123}]
Transform into:
items=[{"id":456,"price":25,"sale":false},{"id":678, "price":75, "sale":true, "parent_price_id":123, "original_price": 100}]
It's not the prettiest solution, but here's one way you can do it. I added a minitest spec to check it against the values you provided and it gives the answer you're hoping for.
require "minitest/autorun"
def merge_prices(prices)
  # Create a hash that maps each ID to its item
  price_map = prices.map { |price| [price[:id], price] }.to_h
  # Create a result array which is initially duplicated from the original
  result = prices.dup
  result.each do |price|
    if price.key?(:parent_price)
      price[:original_price] = price_map[price[:parent_price]][:price]
      # Delete the original
      result.delete_if { |x| x[:id] == price[:parent_price] }
    end
  end
  result
end
describe "Merge prices" do
it "should work" do
input = [
{"id":123, "price": 100, "sale": false},
{"id":456,"price":25,"sale": false},
{"id":678, "price":75, "sale": true, "parent_price":123}
].freeze
expected_output = [
{"id":456,"price":25,"sale": false},
{"id":678, "price":75, "sale": true, "parent_price":123, "original_price": 100}
].freeze
assert_equal(merge_prices(input), expected_output)
end
end
Let's begin by defining items in an equivalent, but more familiar, way:
items = [
[{:id=>123, :price=>100, :sale=>false}],
[{:id=>456, :price=>25, :sale=>false}],
[{:id=>678, :price=>75, :sale=>true, :parent_price=>123}]
]
with the desired return value being:
[
{:id=>456, :price=>25, :sale=>false},
{:id=>678, :price=>75, :sale=>true, :parent_price=>123,
:original_price=>100}
]
I assume that h[:sale] #=> false for every element h of items for which some element g of items has g[:parent_price] == h[:id].
A convenient first step is to create the following hash.
h = items.map { |(h)| [h[:id], h] }.to_h
#=> {123=>{:id=>123, :price=>100, :sale=>false},
# 456=>{:id=>456, :price=>25, :sale=>false},
# 678=>{:id=>678, :price=>75, :sale=>true, :parent_price=>123}}
Then:
h.keys.each { |k| h[k][:original_price] =
  h.delete(h[k][:parent_price])[:price] if h[k][:sale] }
#=> [123, 456, 678] (not used)
h #=> {456=>{:id=>456, :price=>25, :sale=>false},
# 678=>{:id=>678, :price=>75, :sale=>true, :parent_price=>123,
# :original_price=>100}}
Notice that Hash#delete returns the value of the deleted key.
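For example:
h = { a: 1, b: 2 }
h.delete(:a) #=> 1
h            #=> {:b=>2}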
The last two steps are to extract the values from this hash and replace items with the resulting array of hashes:
items.replace(h.values)
#=> [{:id=>456, :price=>25, :sale=>false},
# {:id=>678, :price=>75, :sale=>true, :parent_price=>123,
# :original_price=>100}]
See Array#replace.
If desired we could combine these steps as follows.
items.replace(
  items.map { |(h)| [h[:id], h] }.to_h.tap do |h|
    h.keys.each { |k| h[k][:original_price] =
      h.delete(h[k][:parent_price])[:price] if h[k][:sale] }
  end.values)
#=> [{:id=>456, :price=>25, :sale=>false},
# {:id=>678, :price=>75, :sale=>true, :parent_price=>123,
# :original_price=>100}]
See Object#tap.

Groovy for loop that compares records from database

I have a database with one table that holds hundreds of records. I need to write a for loop in a Groovy script that compares the first record with the second, the second with the third, and so on. I need to compare the length change between records and print every change greater than 30. Example: first record 30m, second record 40m, third record 100m. It should print the second-third pair.
I don't know the number of records in the table, so I don't know how to set up the for loop. Any suggestions?
The records also have an ip. Each ip can appear multiple times, and I need to compare all records within each ip.
record 1:
port_nbr | 1
pair | pairA
length | 30.00
add_date | 2020-06-16 00:01:13.237164
record 2:
port_nbr | 1
pair | pairA
length | 65.00
add_date | 2020-06-16 00:02:13.237164
record 3:
port_nbr | 2
pair | pairc
length | 65.00
add_date | 2020-06-16 00:02:13.237164
I expect the for loop to check whether the current record's port_nbr matches the next record's; if so, it checks whether the pair matches, and if it does, it compares whether the length changed by more than 30m. In this case it would report a 30+m change between records 1 and 2. After reporting, it would compare the second and third records; they don't share the same port_nbr and pair, so it would start comparing all port_nbr 2 records with the following records.
There could be ten records with port_nbr 1 but with different pairs. I need to check the pairs as well and only then compare lengths.
My code at this moment:
import java.sql.*;
import groovy.sql.Sql

class Main {
    static void main(String[] args) {
        def dst_db1 = Sql.newInstance('connection.........')
        dst_db1.getConnection().setAutoCommit(false)
        def sql = (" select d.* from (select d.*, lead((case when length <> 'N/A' then length else length_to_fault end)::float) over (partition by port_nbr, pair order by port_nbr, pair, d.add_date) as lengthh from diags d)d limit 10")
        def lastRow = [id:-1, port_nbr:-1, pair:'', lengthh:-1.0]
        dst_db1.eachRow( sql ) { row ->
            if( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair ){
                BigDecimal lengthChange =
                    new BigDecimal(row.lengthh ? row.lengthh : 0) - new BigDecimal(lastRow.lengthh ? lastRow.lengthh : 0)
                if( lengthChange > 30.0 ){
                    print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
                    println "/tbetween row ID ${lastRow.id} and ${row.id}"
                }
                lastRow = row
            } else {
                println "Key Changed"
                lastRow = row
            }
        }
    }
}
The following code will report length changes > 30 within the same port_nbr and pair.
def sql = 'Your SQL here.' // Should include "order by pair, port_nbr, date"
def lastRow = [id:-1, port_nbr:-1, pair:'', length:-1.0]
dst_db1.eachRow( sql ) { row ->
    if ( row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair ) {
        BigDecimal lengthChange =
            new BigDecimal( row.length ) - new BigDecimal( lastRow.length )
        if ( lengthChange > 30.0 ) {
            print "Port ${row.port_nbr}, ${row.pair} length change: $lengthChange"
            println "\tbetween row ID ${lastRow.id} and ${row.id}"
        }
        lastRow = row
    } else {
        println "Key changed"
        lastRow = row
    }
}
To run the above code without a database I prefixed it with this test code:
class DstDb1 {
    def eachRow ( sql, closure ) {
        rows.each( closure )
    }

    def rows = [
        [id:  1, port_nbr: 1, pair: 'pairA', length:  30.00 ],
        [id:  2, port_nbr: 1, pair: 'pairA', length:  65.00 ],
        [id:  3, port_nbr: 1, pair: 'pairA', length:  70.00 ],
        [id:  4, port_nbr: 1, pair: 'pairA', length:  75.00 ],
        [id:  5, port_nbr: 1, pair: 'pairB', length: 130.00 ],
        [id:  6, port_nbr: 1, pair: 'pairB', length: 165.00 ],
        [id:  7, port_nbr: 1, pair: 'pairB', length: 170.00 ],
        [id:  8, port_nbr: 1, pair: 'pairB', length: 175.00 ],
        [id:  9, port_nbr: 2, pair: 'pairC', length: 230.00 ],
        [id: 10, port_nbr: 2, pair: 'pairC', length: 265.00 ],
        [id: 11, port_nbr: 2, pair: 'pairC', length: 270.00 ],
        [id: 12, port_nbr: 2, pair: 'pairC', length: 350.00 ]
    ]
}
DstDb1 dst_db1 = new DstDb1()
Running the test gives this result:
Key changed
Port 1, pairA length change: 35 between row ID 1 and 2
Key changed
Port 1, pairB length change: 35 between row ID 5 and 6
Key changed
Port 2, pairC length change: 35 between row ID 9 and 10
Port 2, pairC length change: 80 between row ID 11 and 12
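The question also mentions grouping by ip. Assuming the table has an ip column (hypothetical here) and that it is included in the SQL ordering, the same key test extends naturally:
// sketch: include the (hypothetical) ip column in the key comparison
if ( row.ip == lastRow.ip && row.port_nbr == lastRow.port_nbr && row.pair == lastRow.pair ) {
    // ... same length comparison as above ...
}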

Manipulating character arrays quickly in R data.table [duplicate]

This question already has answers here: Faster way to read fixed-width files (4 answers). Closed 4 years ago.
I have a huge dataset (14 GB, 200 Mn rows) consisting of a single character vector. I've fread it (took > 30 minutes on a 48-core, 128 GB server). The string contains concatenated information on various fields. For instance, the first row of my table looks like:
2014120900000001091500bbbbcompany_name00032401
where the first 8 characters represent date in YYYYMMDD format, next 8 characters are id, next 6 the time in HHMMSS format and then next 16 are name (prefixed with b's) and the last 8 are price (2 decimal places).
I need to split the above one-column data.table into 5 columns: date, id, time, name, price.
For the above character vector that will turn out to be: date = "2014-12-09", id = 1, time = "09:15:00", name = "company_name", price = 324.01
I am looking for a (very) fast and efficient dplyr / data.table solution. Right now I am doing it using substr:
date = as.Date(substr(d, 1, 8), "%Y%m%d");
and it's taking forever to execute!
Update: With readr::read_fwf I am able to read the file in 5-10 mins. Apparently, the reading is faster than fread. Below is the code:
f = "file_name";
num_cols = 5;
col_widths = c(8,8,6,16,8);
col_classes = "ciccn";
col_names = c("date", "id", "time", "name", "price");
# takes 5-10 mins
data = readr::read_fwf(file = f, col_positions = readr::fwf_widths(col_widths, col_names), col_types = col_classes, progress = T);
setDT(data);
# object.size(data) / 2^30; # 17.5 GB
A possible solution:
library(data.table)
library(stringi)
widths <- c(8,8,6,16,8)
sp <- c(1, cumsum(widths[-length(widths)]) + 1)
ep <- cumsum(widths)
DT[, lapply(seq_along(sp), function(i) stri_sub(V1, sp[i], ep[i]))]
which gives:
V1 V2 V3 V4 V5
1: 20141209 00000001 091500 bbbbcompany_name 00032401
Including some additional processing to get the desired result:
DT[, lapply(seq_along(sp), function(i) stri_sub(V1, sp[i], ep[i]))
   ][, .(date = as.Date(V1, "%Y%m%d"),
         id = as.integer(V2),
         time = as.ITime(V3, "%H%M%S"),
         name = sub("^(bbbb)", "", V4),
         price = as.numeric(V5)/100)]
which gives:
date id time name price
1: 2014-12-09 1 09:15:00 company_name 324.01
But you are actually reading a fixed-width file. So you could also consider read.fwf from base R or read_fwf from readr, or write your own fread.fwf function like I did a while ago:
fread.fwf <- function(file, widths, enc = "UTF-8") {
  sp <- c(1, cumsum(widths[-length(widths)]) + 1)
  ep <- cumsum(widths)
  fread(file = file, header = FALSE, sep = "\n", encoding = enc)[, lapply(seq_along(sp), function(i) stri_sub(V1, sp[i], ep[i]))]
}
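A hypothetical call, reusing the widths and the placeholder file name from the question; the renaming step is just for readability:
dt <- fread.fwf("file_name", c(8, 8, 6, 16, 8))
setnames(dt, c("date", "id", "time", "name", "price"))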
Used data:
DT <- data.table(V1 = "2014120900000001091500bbbbcompany_name00032401")
Maybe your solution is not so bad.
I am using this data:
df <- data.table(text = rep("2014120900000001091500bbbbcompany_name00032401", 100000))
Your solution:
> system.time(df[, .(date = as.Date(substr(text, 1, 8), "%Y%m%d"),
+ id = as.integer(substr(text, 9, 16)),
+ time = substr(text, 17, 22),
+ name = substr(text, 23, 38),
+ price = as.numeric(substr(text, 39, 46))/100)])
user system elapsed
0.17 0.00 0.17
#Jaap solution:
> library(data.table)
> library(stringi)
>
> widths <- c(8,8,6,16,8)
> sp <- c(1, cumsum(widths[-length(widths)]) + 1)
> ep <- cumsum(widths)
>
> system.time(df[, lapply(seq_along(sp), function(i) stri_sub(text, sp[i], ep[i]))
+ ][, .(date = as.Date(V1, "%Y%m%d"),
+ id = as.integer(V2),
+ time = V3,
+ name = sub("^(bbbb)","",V4),
+ price = as.numeric(V5)/100)])
user system elapsed
0.20 0.00 0.21
An attempt with read.fwf:
> setClass("myDate")
> setAs("character","myDate", function(from) as.Date(from, format = "%Y%m%d"))
> setClass("myNumeric")
> setAs("character","myNumeric", function(from) as.numeric(from)/100)
>
> ff <- function(x) {
+ file <- textConnection(x)
+ read.fwf(file, c(8, 8, 6, 16, 8),
+ col.names = c("date", "id", "time", "name", "price"),
+ colClasses = c("myDate", "integer", "character", "character", "myNumeric"))
+ }
>
> system.time(df[, as.list(ff(text))])
user system elapsed
2.33 6.15 8.49
All outputs are the same.
Maybe try using a numeric matrix instead of a data.frame; aggregation should take less time.

Groovy, insert node after current node

I'll try my best to explain the situation.
I have the following db columns:
oid - task - start - end - realstart - realend
My requirement is to have an output like the following:
oid1 - task1 - start1 - end1
oid2 - task2 - start2 - end2
where task1 is task, task2 is task + "real", start1 is start, start2 is realstart, end1 is end, end2 is realend
BUT
the first row should always be created (those start/end fields are never empty); the second row should only be created if realstart and realend exist, which may not be the case.
Inputs are 6 arrays (one for each column), Outputs must be 4 arrays, something like this:
#input oid,task,start,end,realstart,realend
#output oid,task,start,end
I was thinking about using something like oid.each, but I don't know how to add nodes after the current one. Order is important in the requirement.
For any explanation please ask, thanks!
After your comment and understanding that you don't want (or cannot) change the input/output data format, here's another solution that does what you've asked using classes to group the data and make it easier to manage:
import groovy.transform.Canonical

@Canonical
class Input {
    String[] oids = [ 'oid1', 'oid2' ]
    String[] tasks = [ 'task1', 'task2' ]
    Integer[] starts = [ 10, 30 ]
    Integer[] ends = [ 20, 42 ]
    Integer[] realstarts = [ 12, null ]
    Integer[] realends = [ 21, null ]

    List<Object[]> getEntries() {
        // ensure all entries have the same size
        def entries = [ oids, tasks, starts, ends, realstarts, realends ]
        assert entries.collect { it.size() }.unique().size() == 1 : 'The input arrays do not all have the same size'
        return entries
    }

    int getSize() {
        oids.size() // any field would do, they have the same length
    }
}
@Canonical
class Output {
    List oids = [ ]
    List tasks = [ ]
    List starts = [ ]
    List ends = [ ]

    void add( oid, task, start, end, realstart, realend ) {
        oids << oid; tasks << task; starts << start; ends << end
        if ( realstart != null && realend != null ) {
            oids << oid; tasks << task + 'real'; starts << realstart; ends << realend
        }
    }
}
def input = new Input()
def entries = input.entries
def output = new Output()

for ( int i = 0; i < input.size; i++ ) {
    def entry = entries.collect { it[ i ] }
    output.add( *entry )
}
println output
Responsibility of arranging the data is on the Input class, while the responsibility of knowing how to organize the output data is in the Output class.
Running this code prints:
Output([oid1, oid1, oid2], [task1, task1real, task2], [10, 12, 30], [20, 21, 42])
You can get the results (Lists, actually; call toArray() on a List if you need an array) from the output object with output.oids, output.tasks, output.starts and output.ends.
The @Canonical annotation just makes the class implement equals, hashCode, toString and so on...
If you don't understand something, ask in the comments.
If you need an "array" whose size you don't know from the start, you should use a List instead; in Groovy that's very easy to use.
Here's an example:
final int OID = 0
final int TASK = 1
final int START = 2
final int END = 3
final int R_START = 4
final int R_END = 5
List<Object[]> input = [
    //oid, task, start, end, realstart, realend
    [ 'oid1', 'task1', 10, 20, 12, 21 ],
    [ 'oid2', 'task2', 30, 42, null, null ]
]

List<List> output = [ ]

input.each { row ->
    output << [ row[ OID ], row[ TASK ], row[ START ], row[ END ] ]
    if ( row[ R_START ] && row[ R_END ] ) {
        output << [ row[ OID ], row[ TASK ] + 'real', row[ R_START ], row[ R_END ] ]
    }
}

println output
Which outputs:
[[oid1, task1, 10, 20], [oid1, task1real, 12, 21], [oid2, task2, 30, 42]]

Azure Search scoring

I have sets of 3 identical (in Text) items in Azure Search, varying on Price and Points. Cheaper products with higher points are boosted higher (Price is boosted more than Points, and is boosted inversely).
However, I keep seeing search results similar to this.
Search is on ‘john milton’.
I get
Product="Id = 2-462109171829-1, Price=116.57, Points= 7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.499783
Product="Id = 2-462109171829-2, Price=116.40, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.454872
Product="Id = 2-462109171829-3, Price=115.64, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.316270
I expect the scoring order to be something like this, with the lowest price first.
Product="Id = 2-462109171829-3, Price=115.64, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-2, Price=116.40, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-1, Price=116.57, Points= 7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
What am I missing or are minor scoring variations acceptable?
The index is defined as
let ProductDataIndex =
    let fields =
        [|
            new Field (
                "id",
                DataType.String,
                IsKey = true,
                IsSearchable = true);
            new Field (
                "culture",
                DataType.String,
                IsSearchable = true);
            new Field (
                "gran",
                DataType.String,
                IsSearchable = true);
            new Field (
                "name",
                DataType.String,
                IsSearchable = true);
            new Field (
                "description",
                DataType.String,
                IsSearchable = true);
            new Field (
                "price",
                DataType.Double,
                IsSortable = true,
                IsFilterable = true);
            new Field (
                "points",
                DataType.Int32,
                IsSortable = true,
                IsFilterable = true)
        |]

    let weightsText =
        new TextWeights(
            Weights = ([| ("name", 4.);
                          ("description", 2.) |]
                       |> dict))

    let priceBoost =
        new MagnitudeScoringFunction(
            new MagnitudeScoringParameters(
                BoostingRangeStart = 1000.0,
                BoostingRangeEnd = 0.0,
                ShouldBoostBeyondRangeByConstant = true),
            "price",
            10.0)

    let pointsBoost =
        new MagnitudeScoringFunction(
            new MagnitudeScoringParameters(
                BoostingRangeStart = 0.0,
                BoostingRangeEnd = 10000000.0,
                ShouldBoostBeyondRangeByConstant = true),
            "points",
            2.0)

    let scoringProfileMain =
        new ScoringProfile (
            "main",
            TextWeights = weightsText,
            Functions =
                new List<ScoringFunction>(
                    [ priceBoost :> ScoringFunction
                      pointsBoost :> ScoringFunction ]),
            FunctionAggregation = ScoringFunctionAggregation.Sum)

    new Index
        (Name = ProductIndexName,
         Fields = fields,
         ScoringProfiles = new List<ScoringProfile>([ scoringProfileMain ]))
All indexes in Azure Search are split into multiple shards, allowing for quick scale-ups and scale-downs. When a search request is issued, it's issued against each of the shards independently. The result sets from each of the shards are then merged and ordered by score (if no other ordering is defined). It is important to know that the scoring function weighs query term frequency in each document against its frequency in all documents within the shard!
It means that in your scenario, in which you have three instances of every document, even with scoring profiles disabled, if one of those documents lands on a different shard than the other two, its score will be slightly different. The more data in your index, the smaller the differences will be (more even term distribution). It's not possible to assume on which shard any given document will be placed.
In general, document score is not the best attribute for ordering documents. It should only give you a general sense of document relevance against other documents in the result set. In your scenario, it would be possible to order the results by price and/or points if you marked the price and/or points fields as sortable. You can find more information on how to use the $orderby query parameter here: https://msdn.microsoft.com/en-us/library/azure/dn798927.aspx
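For example, with price and points marked IsSortable as in the index definition above, a query string along these lines (a sketch of the OData ordering syntax, not a complete request) gives a deterministic order:
search=john milton&$orderby=price asc,points desc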
