SOLR and complex pricing - solr

We have a property booking site (similar to Airbnb) with a LAMP setup and a relatively simple SOLR index. A user can search by:
Location
Guests
Date IN/OUT
They can also filter results for a specific price range, as well as sort by price.
The issue is that the price per night is never fixed and depends on several things, such as custom prices (for one day or a range of days), price per guest, discounts (e.g. a weekend discount), etc. For example, price_per_night might be 50$, but for the 10th and 11th of August the real price per night could be 110$ plus a 20$ extra guest fee, minus a discount of 30$ for that particular weekend. On top of this, commissions are applied to the total amount, so we'd like this to be as accurate as possible.
While we can calculate this on the server side, this of course affects the results returned for a price range as well as the sorting (highest price first vs. lowest).
Could anyone suggest any possible solution?
Below is an example of a select:
"docs": [
  {
    "id": 1,
    "property_type": 1,
    "room_type": 1,
    "minimum_nights": 2,
    "maximum_nights": 120,
    "location": "41.3902359,2.1685901",
    "price_per_night": 210,
    ........
    "unavailable_days": [
      "2016-09-15T00:00:00Z",
      "2016-09-16T00:00:00Z",
      ......
    ]
  }
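To make the pricing rule in the question concrete, here is a minimal Python sketch of the server-side calculation for one stay. It is only an illustration under assumptions: the names BASE_PRICE, OVERRIDES, nightly_price and stay_total are hypothetical, the year 2016 is assumed, and the extra guest fee is treated as a flat amount per night.

```python
from datetime import date, timedelta

BASE_PRICE = 50  # default price_per_night from the question's example

# Hypothetical per-date overrides: custom price, extra guest fee, discount.
OVERRIDES = {
    date(2016, 8, 10): {"price": 110, "guest_fee": 20, "discount": 30},
    date(2016, 8, 11): {"price": 110, "guest_fee": 20, "discount": 30},
}

def nightly_price(night):
    """Price for a single night: custom or base price, plus fee, minus discount."""
    rule = OVERRIDES.get(night, {})
    return (rule.get("price", BASE_PRICE)
            + rule.get("guest_fee", 0)
            - rule.get("discount", 0))

def stay_total(check_in, check_out):
    """Sum the nightly prices for every night from check_in up to (not including) check_out."""
    nights = (check_out - check_in).days
    return sum(nightly_price(check_in + timedelta(days=i)) for i in range(nights))

# One base-price night (Aug 9) plus the two overridden nights (Aug 10 and 11).
total = stay_total(date(2016, 8, 9), date(2016, 8, 12))
```

Because the total depends on the requested dates, a single static price_per_night field cannot represent it; any filter or sort on price needs a value computed per query or per date.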

Related

Measure that indicates sales volume per product per day (Google Data Studio)

I need to implement a measure that indicates sales volume per product per day. For the example table below (each line is a record of a sale):
id,create_date,report_date,quantity
329,2019-01-02 08:19:17,2019-01-02 14:34:12,6
243,2019-01-02 09:11:42,2019-01-03 15:30:14,6
238,2019-02-02 08:19:17,2019-03-02 14:36:17,2
170,2019-04-02 02:15:17,2019-04-02 14:37:12,2
238,2019-04-02 08:43:11,2019-04-02 14:41:01,8
238,2019-04-02 08:52:52,2019-04-02 14:39:12,1
238,2019-08-02 08:10:09,2019-08-02 15:02:12,1
238,2019-10-02 08:10:17,2019-10-02 18:34:11,1
170,2020-01-02 08:24:14,2020-01-02 19:31:31,2
170,2020-01-02 08:32:16,2020-01-02 21:52:32,3
The operations to reach the result:
1. Identify total sales and total products for each day.
For 2019-01-02, two sales were carried out, totaling 12 products (6 products for each sale on the day)
2. Divide total products by total sales, resulting in the product/sale ratio for the day (if the result is 2, it indicates that each sale on average corresponds to two products).
In the example table there are 6 different dates (YYYYMMDD); for each date, total products / number of sales on the day: (12/2, 2/1, 11/3, 1/1, 1/1, 5/1).
3. Average every day's story, resulting in a single value.
(3 + 2 + 3.6 + 1 + 1 + 3)/6 = 2.26 , indicating that on average two products are sold per sale per day.
As it involves many operations, I couldn't find a solution for this problem. Any help would be appreciated.
note: I accept alternative suggestions to offer the measure to indicate the volume of sales per product per day.
Please check the numbers given in your steps 2 and 3:
12/2=6 not 3
5/1 must be 5/2
I still think that you want to calculate a 'day story' in step 2; see the formula below.
Here are the steps for generating such a value:
create a table
add your time field as a dimension and set its type to Date rather than Date & Time
order by date ascending (optional)
create a field day story with the formula sum(quantity)/count(id)
add this field three times to your table
click on the AUT to the left of the field name and set Running calculation to 'Running average'
You have to convince your users to only look at the last line of the table.
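The three steps from the question, with the corrected ratios (12/2 = 6 and 5/2), can be checked outside Data Studio with a short Python sketch. It assumes that days are taken from create_date, since that is the column with 6 distinct dates; the variable names are illustrative only.

```python
from collections import defaultdict

# (id, create_date, quantity) rows from the example table
sales = [
    (329, "2019-01-02 08:19:17", 6),
    (243, "2019-01-02 09:11:42", 6),
    (238, "2019-02-02 08:19:17", 2),
    (170, "2019-04-02 02:15:17", 2),
    (238, "2019-04-02 08:43:11", 8),
    (238, "2019-04-02 08:52:52", 1),
    (238, "2019-08-02 08:10:09", 1),
    (238, "2019-10-02 08:10:17", 1),
    (170, "2020-01-02 08:24:14", 2),
    (170, "2020-01-02 08:32:16", 3),
]

# Step 1: total products and number of sales per day.
totals = defaultdict(lambda: [0, 0])  # day -> [products, sales]
for _id, created, quantity in sales:
    day = created[:10]  # keep the date part only
    totals[day][0] += quantity
    totals[day][1] += 1

# Step 2: products per sale for each day.
daily_ratio = {day: products / count for day, (products, count) in totals.items()}

# Step 3: average of the daily ratios.
measure = sum(daily_ratio.values()) / len(daily_ratio)
```

With the example data the daily ratios are 6, 2, 11/3, 1, 1 and 2.5, so the measure comes out at about 2.69 rather than 2.26.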

How to calculate the tax free amount of sales, based on date fields?

I need your help with a task that I have undertaken and am having difficulties with.
I have to calculate the NET amount of sales for some products that were sold in different cities in different years, so a different tax rate applies in each case.
Specifically, I have a dimension table (Dim_Cities) that lists the cities in which the products can be sold.
i.e.
Dim_Cities:
CityID, CityName, Area, District.
Dim_Cities:
1, "Athens", "Attiki", "Central Greece".
Also, I have a file/table which consists of the following information:
i.e.
[SalesArea]
,[EffectiveFrom_2019]
,[EffectiveTo_2019]
,[VAT_2019]
,[EffectiveFrom_2018]
,[EffectiveTo_2018]
,[VAT_2018]
,[EffectiveFrom_2017]
,[EffectiveTo_2017]
,[VAT_2017]
,[EffectiveFrom_2016_Semester1]
,[EffectiveTo_2016_Semester1]
,[VAT_2016_Semester1]
,[EffectiveFrom_2016_Semester2]
,[EffectiveTo_2016_Semester2]
,[VAT_2016_Semester2]
e.g.
"Athens", "2019-01-01", "2019-12-31", 0.24,
"2018-01-01", "2018-12-31", 0.24,
"2017-01-01", "2017-12-31", 0.17,
"2016-01-01", "2016-05-31", 0.16,
"2016-01-06", "2016-12-31", 0.24
And of course there is a fact table that holds all the information,
i.e.
FactSales_ID, CityID, SaleAmount (with VAT), SaleDate_ID.
The question is how to compute, for every city, the "TAX-Free SalesAmount" that corresponds to each particular sale date. In other words, I think that I have to create a function that computes the NET amount each time, removing the tax at the rate that applies for the date and city it finds. Can anyone help me or guide me to achieve this, please?
I'm not sure if you are asking how to query your data to produce this result or how to design your data warehouse to make this data available. I'm hoping you are asking about how to design your data warehouse, as this information should definitely be pre-calculated and held in your DW rather than being calculated every time anyone wants to report on the data.
One of the key points of building a DW is that all the complex business logic should be handled in the ETL (as much as possible) so that the actual reporting is simple; the only calculations in a reporting process are those that can't be pre-calculated.
If your CITY Dim is SCD2 (or could be made to be SCD2) then I would add the VAT rate as an attribute to that Dim; otherwise you could hold the VAT rate in a "worker" table.
When your ETL loads your Fact table, you would use the VAT rate on the CITY Dim (or in the worker table) to calculate the Net and Gross amounts and hold both as measures in your fact table.
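As a rough illustration of that ETL step, the Python sketch below looks up the VAT rate by area and sale date and derives the net amount from the VAT-inclusive amount as gross / (1 + rate). It assumes the wide VAT table from the question has been reshaped into one row per period; VAT_PERIODS, vat_rate and net_amount are hypothetical names, and only the 2017 to 2019 rows from the question are used.

```python
from datetime import date

# (area, effective_from, effective_to, vat_rate), one row per period (assumed layout)
VAT_PERIODS = [
    ("Athens", date(2019, 1, 1), date(2019, 12, 31), 0.24),
    ("Athens", date(2018, 1, 1), date(2018, 12, 31), 0.24),
    ("Athens", date(2017, 1, 1), date(2017, 12, 31), 0.17),
]

def vat_rate(area, sale_date):
    """Return the VAT rate in effect for the area on the sale date."""
    for period_area, start, end, rate in VAT_PERIODS:
        if period_area == area and start <= sale_date <= end:
            return rate
    raise LookupError(f"no VAT rate for {area} on {sale_date}")

def net_amount(gross, area, sale_date):
    """Strip VAT from a VAT-inclusive amount, rounded to 2 decimals."""
    return round(gross / (1 + vat_rate(area, sale_date)), 2)

net_2019 = net_amount(124.0, "Athens", date(2019, 6, 1))   # 24% VAT period
net_2017 = net_amount(117.0, "Athens", date(2017, 3, 15))  # 17% VAT period
```

Note that the net amount is obtained by dividing by (1 + rate), not by subtracting rate × gross, because the stored SaleAmount already includes the VAT.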

DB Schema: Versioned price model vs invoice-related data

I am creating a DB model for rental invoice generation.
The invoice consists of N booking time ranges.
Each booking belongs to a price model. A price model is a set of rules which determine a final price (base price + season price + quantity discount + ...).
That means the final price for the N bookings within an invoice can be a complex calculation, and of course I want to keep track of every aspect of the final price calculation for later review of an invoice.
The problem is that a price model can change in the future. So upon invoice generation, there are two possibilities:
(a) Never change a price model. Just make it immutable by versioning it and refer to a concrete version from an invoice.
(b) Put all the price information, discounts and extras into the invoice. That would mean a lot of data, as an invoice contains N bookings which may be partly in the range of a season price.
Basically, I would break down each booking into its days, and for each day I would have N rows calculating the base price, discounts and extra fees.
Possible table model:
Invoice
    id: int
InvoiceBooking # Each booking. One invoice has N bookings
    id: int
    invoiceId: int
    (other data, e.g. guest information)
InvoiceBookingDay # Days of a booking. Each booking has N days
    id: int
    invoiceBookingId: int
    date: date
InvoiceBookingDayPriceItem # Concrete discounts, etc. One day has many items
    id: int
    invoiceBookingDayId: int
    price: decimal
    title: string
My question is, which way should I prefer and why.
My considerations:
With solution (a), the invoice would be re-calculated using the price model information each time the data is viewed. I don't like this, as algorithms can change. It does not feel natural for the "read-only" nature of an invoice.
Also the version handling of price models is not a trivial task and the user needs to know about the version concept, which adds application complexity.
With solution (b), I generate a bunch of nested data and it adds a lot of complexity to the schema.
Which way would you prefer? Am I missing something?
Thank you
There is a third option which I recommend. I call it temporal (time) versioning and the layout of the table is really quite simple. You don't describe your pricing data so I'll just show a simple example.
Table: DailyPricing
ID  EffDate     Price  ...
A   01/01/2015  17.50  ...
B   01/01/2015  20.00  ...
C   01/01/2015  22.50  ...
B   01/01/2016  19.50  ...
C   07/01/2016  24.00  ...
This shows that all three price schedules (A, B and C just represent whatever method you use to distinguish between price levels) were given a price on Jan 1, 2015. On Jan 1, 2016, the price of plan B was reduced. In July, the price of plan C was increased.
To get the current price of a plan, the query is this:
select dp.Price
from DailyPricing dp
where dp.ID = 'A'
  and dp.EffDate = (
      select Max( dp2.EffDate )
      from DailyPricing dp2
      where dp2.ID = dp.ID
        and dp2.EffDate <= :DateOfInterest );
The DateOfInterest variable would be loaded with the current date/time. The subquery picks the latest effective date that is not after the date of interest, so this query returns the one price that is currently in effect. In this case, that is the price set on Jan 1, 2015, as it has never changed since taking effect. If the search had been for plan B, the price set on Jan 1, 2016 would have been returned, and for plan C, the price set on July 1, 2016. These are the latest prices set for each plan; that is, the current prices.
Such a query would more likely be in a join with probably the invoice table so you could perform the price calculation.
select ...
from Invoices i
join DailyPricing dp
  on dp.ID = i.ID
 and dp.EffDate = (
     select Max( dp2.EffDate )
     from DailyPricing dp2
     where dp2.ID = dp.ID
       and dp2.EffDate <= i.InvoiceDate )
where i.ID = 1234;
This is a little more complex than a simple query, but you are asking for more complex data (or, rather, a more complex view of the data). However, this calculation is probably only executed once and the final price stored back into the invoice data or elsewhere.
It would be calculated again only if the customer made some changes or you were going through an audit, rechecking the calculation for accuracy.
Notice something, however, that is subtle but very important. If the query above were being executed for an invoice that had just been created, the InvoiceDate would be the current date and the price returned would be the current price. If, however, the query was being run as a verification on an invoice that was two years old, the InvoiceDate would be two years ago and the price returned would be the price that was in effect two years ago.
In other words, the query to return current data and the query to return past data is the same query.
That is because current data and past data remain in the same table, differentiated only by the date the data takes effect. This, I think, is about the simplest solution to what you want to do.
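The behaviour described above can be tried out with a small, self-contained Python sketch using the standard sqlite3 module and the example rows. It is only an illustration: dates are stored as ISO strings so that they compare correctly as text, price_on is a hypothetical helper, and the `<=` comparison makes the subquery pick the latest effective date on or before the date of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DailyPricing (ID TEXT, EffDate TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO DailyPricing VALUES (?, ?, ?)",
    [
        ("A", "2015-01-01", 17.50),
        ("B", "2015-01-01", 20.00),
        ("C", "2015-01-01", 22.50),
        ("B", "2016-01-01", 19.50),
        ("C", "2016-07-01", 24.00),
    ],
)

QUERY = """
select dp.Price
from DailyPricing dp
where dp.ID = :plan
  and dp.EffDate = (
      select max(dp2.EffDate)
      from DailyPricing dp2
      where dp2.ID = dp.ID
        and dp2.EffDate <= :date_of_interest)
"""

def price_on(plan, date_of_interest):
    """Return the price in effect for a plan on the given ISO date, or None."""
    row = conn.execute(
        QUERY, {"plan": plan, "date_of_interest": date_of_interest}
    ).fetchone()
    return row[0] if row else None

price_c_before = price_on("C", "2016-03-01")  # before the July 2016 increase
price_c_after = price_on("C", "2016-08-01")   # after the July 2016 increase
```

The same query serves both cases: only the date passed in decides whether a past or a current price comes back.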
How about A and B?
It's not best practice to re-calculate any component of an invoice, especially if the component was printed. An invoice and invoice details should be immutable, and you should be able to reproduce it without re-calculating.
If you ever have a problem with figuring out how you got to a certain amount, or if there is a bug in your program, you'll be glad you have the details, especially if the calculations are complex.
Also, it's a good idea to keep a history of your pricing models so you can validate how you got to a certain price. You can make this simple to your users. They don't have to see the history -- but you should record their changes in the history log.

MS SQL - Calculating plan payments for a month

I need to calculate how much a plan has cost the customer in a specific month.
Plans have floating billing cycles of a month's length - for example a billing cycle can run from '2014-04-16' to '2014-05-16'.
I know the start date of a plan, and the end date can either be a specific date or NULL if the plan is still running.
If the end date is not null, then the customer is charged for a whole month, not pro rata. Example: if the billing cycle runs from the 4th to the 4th of each month and the customer ends the plan on the 10th, they will still be charged until the 4th of the next month.
Can anyone help me? I feel like I've been going over this a million times, and just can't figure it out.
Variables I have:
@planStartDate [Plan's start date]
@planEndDate [Plan's end date - can be null]
@billStartDate [The bill's start date - example: 2015-02-01]
@billEndDate [One month after bill's start date - 2015-03-01]
@price [the plan's price per billing cycle]
Here's the best answer I can give based on the very limited information you have given so far (by the way, in the future it would really help people answer your question faster and more efficiently if you could specify a lot more info: tables involved, all columns, etc.):
"I need to calculate how much a plan has cost the customer in a specific month."
SELECT customerID, SUM(price)  -- customerID: I assume you have a column of some sort in this table to distinguish between customers
FROM table_foo
WHERE planStartDate BETWEEN 'start of the date range you specify' AND 'end of the date range you specify'
GROUP BY customerID
It's a rough query, but that's the best I can give until you describe your setup more clearly (i.e. tables involved, ALL columns in the table, etc.).
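Since the question does not say how charges map onto calendar months, the following Python sketch shows only one possible reading: a full cycle price is charged on each cycle start date, and the cost for a bill month is the sum of the charges whose cycle start falls inside it. The function names and the rule that no new cycle starts on or after the plan's end date are assumptions, not something stated in the question.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Return d shifted by n months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + n
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def plan_cost_in_month(plan_start, plan_end, bill_start, bill_end, price):
    """Sum the cycle charges whose cycle start falls in [bill_start, bill_end)."""
    total = 0
    k = 0
    while True:
        # Always offset from plan_start so day clamping does not drift.
        cycle_start = add_months(plan_start, k)
        if cycle_start >= bill_end:
            break
        # Assumption: a cycle that would start on or after the end date is not charged.
        if plan_end is not None and cycle_start >= plan_end:
            break
        if cycle_start >= bill_start:
            total += price
        k += 1
    return total

# Plan started 2014-04-16 and is still running; bill month is February 2015.
cost = plan_cost_in_month(date(2014, 4, 16), None, date(2015, 2, 1), date(2015, 3, 1), 10)
```

Under this reading, a plan ended on 2015-02-10 costs nothing in February, because its last full cycle was already charged on 2015-01-16.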

Newrelic custom plugin metrics

I'm working directly with the HTTP API and trying to get some metrics from our storage.
The doc states "Tip: If you want the metric to appear as a percentage in the user interface, then you must define it as a percentage in the JSON."
However - I can't send metric values which are percentages; the POST response has status 400 with body
{"error":"Unable to parse request: null"}
My POST is
{
  "components": [
    {
      "duration": 1,
      "guid": "com.cumulus.Test5",
      "name": "ServerX",
      "metrics": {
        "Component/Filesystem/root/Percentage Used": "62%"
      }
    }
  ],
  "agent": {"host": "vss-syd", "version": "1.0.0", "pid": 1080}
}
Also, I have a metric "Number of devices offline" (for a ZFS storage pool) which is discrete, i.e. not continuous, so averages don't make sense, just absolute values.
I'd like to set an alert on it if it gets above 0.
I know the threshold is only 'greater than', so I can set thresholds @ 0.1 Alert & 0.2 Critical with no problem.
However, could someone please point me in the right direction as to how I should:
1. Send such a metric (i.e. do I need to specify [units] and aggregates?)
2. Create the Summary Metric + Graphs in the frontend (which 'Value' to select, e.g. 'Calls per minute')?
There are two issues that look like they could be the cause.
The first is that the duration should be 60, which is the number of seconds that the reported metrics cover. New Relic is optimized to work with this particular interval, and while you can have larger values (300 seconds is the recommended maximum), the minimum required value is 60. Smaller values may be accepted by the API, but the results will be unpredictable.
The second is that the percentage used is a string value which should instead be reported as an integer value, such as 62, or a float value of 62.0 if you wish to preserve that level of precision.
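Putting both fixes together, a corrected request body could be built as in the Python sketch below. It only constructs the JSON; it reuses the names from the question's POST, sets the duration to 60 and sends the percentage as a plain number. Whether the API needs anything else is not covered here.

```python
import json

payload = {
    "agent": {"host": "vss-syd", "version": "1.0.0", "pid": 1080},
    "components": [
        {
            "name": "ServerX",
            "guid": "com.cumulus.Test5",
            "duration": 60,  # seconds covered by this report
            "metrics": {
                # numeric value instead of the string "62%"
                "Component/Filesystem/root/Percentage Used": 62,
            },
        }
    ],
}

# Serialized body to send in the POST request
body = json.dumps(payload)
```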
Regarding the second portion of your question about reporting and displaying a metric related to "# of Failing Disks":
New Relic does not currently support reporting metrics that represent absolute values. All metric values are presented in aggregate over some particular time period. Summary Metrics are aggregated over the most recent ~4 minutes, while metrics on charts and tables are aggregated over the time period selected in the time picker.
That said, you could try something along the lines of "percentage of failing disks" where perhaps an average might still be useful in that any non-zero value indicates a failure.
This average would be of questionable value once the aggregation time period became larger than a few minutes. However, given that summary metrics are always aggregated over a fixed time period of ~4 minutes — and it is summary metrics that trigger alerts — this may still be useful to you.
