How to store and bill highly flexible licences? - database

I am currently working on a prototype for a future product, and on a concept for how highly flexible licences can best be stored and billed.
The following illustration explains the situation a little:
The billing should take place for each month and include the exact prices on a pro rata basis. In the example, January is to be billed, which is made up of 3 different licence periods. Each part should contain the exact price proportionate to its time in the month. So the concept is clear to me, but I am asking about the best technical implementation.
Should the periods be in an SQL database or a time-series DB or something else entirely? How should I mark periods that have been paid but extend into the next month?
SQL would be my first approach, with a Periods table that stores the periods (startDate, endDate, priceFactor, isBilled).
Problems:
How to bill ongoing periods, which extend over many months? (They have no endDate.)
SQL queries could get complex.
Thank you for your help!
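A minimal sketch of the pro rata calculation described above, assuming each period is a `(startDate, endDate, priceFactor)` tuple with an exclusive `endDate`, that an ongoing period has `endDate = None`, and that the monthly price is a hypothetical `base_price` multiplied by `priceFactor`. None of these names beyond the three column names come from the question.

```python
from datetime import date

def month_bounds(year, month):
    """Return (first day of month, first day of next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def bill_month(periods, year, month, base_price):
    """Return one invoice line per period that overlaps the given month."""
    month_start, month_end = month_bounds(year, month)
    days_in_month = (month_end - month_start).days
    lines = []
    for start, end, factor in periods:
        # Clip the period to the month; an open-ended period runs to month end.
        overlap_start = max(start, month_start)
        overlap_end = month_end if end is None else min(end, month_end)
        days = (overlap_end - overlap_start).days
        if days > 0:
            amount = base_price * factor * days / days_in_month
            lines.append((overlap_start, overlap_end, days, amount))
    return lines

# Hypothetical data: three periods touching January 2024, the last one ongoing.
periods = [
    (date(2023, 11, 15), date(2024, 1, 11), 1.0),
    (date(2024, 1, 11), date(2024, 1, 21), 2.0),
    (date(2024, 1, 21), None, 0.5),
]
invoice = bill_month(periods, 2024, 1, base_price=310.0)
for line in invoice:
    print(line)
```

Because each month's amount is derived by clipping the periods to that month, an ongoing period needs no special handling. One option under this approach would be to record which months have been invoiced, rather than keeping an isBilled flag on a period that spans several months.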

Related

Database Design: How do I handle tracking goals vs. actuals over time?

This isn't exactly a programming question, as I don't have an issue writing the code, but a database design question. I need to create an app that tracks sales goals vs. actual sales over time. The thing is that a person's goal can change (let's say daily at most).
Also, a location can have multiple agents with different goals that need to be added together for the location.
I've considered basically running a timed task to save daily goals per agent into a field. It seems that over the years that will be a lot of data, but it would let me simply query a date range and add all the daily goals up to get a goal for that date range.
Otherwise, I guess I could simply write changes (i.e. March 2nd - 15 sales / week, April 12th, 16 sales per week) which would be less data, but much more programming work to figure out goals based on a time query.
I'm assuming there is probably a best practice for this - anyone?
Put a date range on your goals. The start of the range is when you set that goal. The end of the range starts off as max-collating date (often 9999-12-31, depending on your database).
Treat this as "until forever" or "until further notice".
When you want to know what goals were in effect as of a particular date, you would have something like this in your WHERE clause:
...
WHERE effective_date <= #AsOfDate
AND expiry_date > #AsOfDate
...
When you change a goal, you need two operations. First you update the existing record (if it exists) and set the expiry_date to the new as-of date. Then you insert a new record with an effective_date of the new as-of date and an expiry_date of forever (e.g. '9999-12-31').
This gives you the following benefits:
Minimum number of rows
No scheduled processes to take daily snapshots
Easy retrieval of effective records as of a point in time
Ready-made audit log of changes
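The two-step change and the as-of lookup described above could be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for whatever database is actually in use. The table name `goals` and the columns `agent_id` and `weekly_goal` are assumptions made for illustration. Only `effective_date` and `expiry_date` come from the answer.

```python
import sqlite3

FOREVER = "9999-12-31"  # sentinel meaning "until further notice"

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE goals (
           agent_id       INTEGER NOT NULL,
           weekly_goal    INTEGER NOT NULL,
           effective_date TEXT    NOT NULL,  -- ISO dates sort correctly as text
           expiry_date    TEXT    NOT NULL
       )"""
)

def set_goal(agent_id, weekly_goal, as_of):
    """Close the currently open goal row (if any), then open a new one."""
    with conn:  # both statements are committed together, or neither is
        conn.execute(
            "UPDATE goals SET expiry_date = ? "
            "WHERE agent_id = ? AND expiry_date = ?",
            (as_of, agent_id, FOREVER),
        )
        conn.execute(
            "INSERT INTO goals VALUES (?, ?, ?, ?)",
            (agent_id, weekly_goal, as_of, FOREVER),
        )

def goal_as_of(agent_id, as_of):
    """Return the goal in effect on the given date, or None."""
    row = conn.execute(
        "SELECT weekly_goal FROM goals "
        "WHERE agent_id = ? AND effective_date <= ? AND expiry_date > ?",
        (agent_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

set_goal(1, 15, "2024-03-02")
set_goal(1, 16, "2024-04-12")
print(goal_as_of(1, "2024-04-11"), goal_as_of(1, "2024-04-12"))
```

The half-open range (effective_date inclusive, expiry_date exclusive) means that on the day of a change exactly one row matches.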

Deriving and saving the historical values into a separate table, or calculate the historical values from the existing data only when they're needed?

tl;dr general question about handling database data and design:
Is it ever acceptable/are there any downsides to derive data from other data at some point in time, and then store that derived data into a separate table in order to keep a history of values at that certain time, OR, should you never store data that is derived from other data, and instead derive the required data from the existing data only when you need it?
My specific scenario:
We have a database where we record peoples' vacation days and vacation day statuses. We track how many days they have left, how many days they've taken, and things like that.
One design requirement has changed and now asks that I be able to show how many days a person had left on December 31st of any given year. So I need to be able to say, "Bob had 14 days left on December 31st, 2010".
I see two ways we could do this:
1) A SQL Server Agent job that, on December 31st, captures the days remaining for everyone at that time, and inserts them into a table like "YearEndHistories", which would have your EmployeeID, Year, and DaysRemaining at that time.
2) We don't keep a YearEndHistories table, but instead if we want to find out the amount of days possessed at a certain time, we loop through all vacations added and subtracted that exist UP TO that specific time.
I like the feeling of certainty that comes with #1 --- the recorded values would be reviewed by administration, and there would be no arguing or possibility about that number changing. With #2, I like the efficiency --- one less table to maintain, and there's no derived data present in the actual tables. But I have a weird fear about some unseen bug slipping by and peoples' historical value calculation start getting screwed up or something. In 2020 I don't want to deal with, "I ended 2012 with 9.5 days, not 9.0! Where did my half day go?!"
One thing we have decided on is that it will not be possible to modify values in previous years. That means it will never be possible to go back to the previous calendar year and add a vacation day or anything like that. The value at the end of the year is THE value, regardless of whether or not there was a mistake in the past. If a mistake is discovered, it will be balanced out by rewarding or subtracting vacation time in the current year.
Yes, it is acceptable, especially if the calculation is complex or frequently called, or doesn't change very often (eg: A high score table in a game - it's viewed very often, but the content only changes on the increasingly rare occasions when a player does very well).
As a general rule, I would normalise the data as far as possible, then add in derived fields or tables where necessary for performance reasons.
In your situation, the calculation seems relatively simple - a sum of employee vacation days granted - days taken, but that's up to you.
As an aside, I would encourage you to get out of thinking about "loops" when data is concerned - try to think about the data as a whole, as a set. Something like:
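SELECT StaffID, sum(Vacation)
from
(
-- days granted up to the cut-off date
SELECT StaffID, Sum(VacationAllocated) as Vacation
from Allocations
where AllocationDate<=convert(datetime,'2010-12-31' ,120)
group by StaffID
-- union all, so that identical rows from the two halves are never merged
union all
-- days taken up to the cut-off date, as a negative number
SELECT StaffID, -Count(distinct HolidayDate)
from HolidayTaken
where HolidayDate<=convert(datetime,'2010-12-31' ,120)
group by StaffID
) totals
group by StaffID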
Derived data seems to me like a transitive dependency, which is avoided in normalisation.
That's the general rule.
In your case I would go for #1, which gives you a better "auditability", without performance penalty.
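As an illustration of option #1, the sketch below runs a set-based query of the kind shown earlier and stores the result in a YearEndHistories table. It uses Python's `sqlite3` module rather than SQL Server, so the Agent job is replaced by a plain function call. The sample rows, the column types and the function name `snapshot_year_end` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Allocations (StaffID INTEGER, VacationAllocated REAL, AllocationDate TEXT);
    CREATE TABLE HolidayTaken (StaffID INTEGER, HolidayDate TEXT);
    CREATE TABLE YearEndHistories (
        EmployeeID INTEGER, Year INTEGER, DaysRemaining REAL,
        PRIMARY KEY (EmployeeID, Year)  -- one frozen value per person per year
    );
    -- Hypothetical sample data
    INSERT INTO Allocations VALUES (1, 20, '2010-01-01'), (2, 15, '2010-01-01'),
                                   (1, 5, '2011-01-01');
    INSERT INTO HolidayTaken VALUES (1, '2010-07-05'), (1, '2010-07-06'),
                                    (2, '2010-08-02'), (1, '2011-02-01');
    """
)

def snapshot_year_end(year):
    """Derive each person's balance as of December 31st and store it."""
    cutoff = f"{year}-12-31"
    with conn:
        conn.execute(
            """INSERT INTO YearEndHistories (EmployeeID, Year, DaysRemaining)
               SELECT StaffID, ?, SUM(Vacation)
               FROM (
                   SELECT StaffID, SUM(VacationAllocated) AS Vacation
                   FROM Allocations WHERE AllocationDate <= ?
                   GROUP BY StaffID
                   UNION ALL
                   SELECT StaffID, -COUNT(DISTINCT HolidayDate)
                   FROM HolidayTaken WHERE HolidayDate <= ?
                   GROUP BY StaffID
               ) AS totals
               GROUP BY StaffID""",
            (year, cutoff, cutoff),
        )

snapshot_year_end(2010)
rows = conn.execute(
    "SELECT EmployeeID, DaysRemaining FROM YearEndHistories ORDER BY EmployeeID"
).fetchall()
print(rows)
```

With the primary key on (EmployeeID, Year), running the snapshot a second time for the same year fails instead of silently overwriting the stored values, which fits the rule that previous years are never modified.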

MySQL Database Table Structure Efficiency Advice

We are designing a MySQL table to track the number of followers on a daily basis for 10,000s of Twitter accounts. We've been struggling to figure out the most efficient way to store this data. The two options we are considering are:
1) OPTION 1 - Table with columns: Twitter ID, Month, Day1, Day2, Day3, etc. where each day would contain the number of followers for that account for each day of the specified month
2) OPTION 2 - Table with columns: Twitter ID, Day, Followers
Option 1 would result in about 30x fewer rows than Option 2. What I'm not sure about from a performance perspective is whether it's preferable to have fewer columns or fewer rows.
In terms of the queries we will be using, we just want to be able to query the data to get the number of followers for a specific Twitter account for arbitrary time ranges.
I would appreciate suggestions on which approach is better and why. Also, if there is a much better option than the ones I present please feel free to suggest it.
Thanks in advance for your help!
Option 2, no question.
Imagine trying to write a query using each option. Let's give the best case for option 1: We know we want the total for all 31 days of the month. Then with option 1 the query is:
select twitterid, day1+day2+day3+day4+day5+day6+day7+day8+day9+day10
+day11+day12+day13+day14+day15+day16+day17+day18+day19+day20
+day21+day22+day23+day24+day15+day26+day27+day28+day29+day30
+day31 as total
from table1
where month='2010-12';
With option 2 the query is:
select twitterid, sum(followers) as total
from table2
where day between '2010-12-01' and '2010-12-31'
group by twitterid;
The second looks way easier to me. If you don't think so, tell me if you immediately noticed the error in the option 1 version, and if you're confident that no programmer would ever make such an error.
Now imagine that the requirements change just slightly, and someone wants the total for one week. With the second version, that's easy: give a date range that describes that week. This could easily be done when building a query on the fly: Just ask for the start date and add 6 days to it for the end date. But with the first version, what are you going to do? You'd have to figure out which days of the month fall in that week and change the list of fields retrieved. The week might span two calendar months. This would be a giant pain.
As to performance: Sure, more rows take more time to retrieve. But longer rows also take more time to retrieve. Lesson 1 on database design: Don't throw out normalization to do a micro-optimization when you don't even have a good reason to believe there's a problem. Build a normalized database first. Then if it turns out that there are performance problems, tune it afterwards. Odds are that you can buy a faster hard drive for a whole lot less than the cost of one day of programmer's time taken finding a mistake in an unnecessarily complex query.
Of course it depends on what queries you are going to do - but unless every query requires the 31 days of that month, for your operational data, use Option 2.
It's better from a logical perspective (say later on you don't want queries per "30 calendar days", but "last X days")
It's better for writes, too (only update 1 row with 2 fields instead of overwriting all fields).
You can always optimize later (partitioning comes to mind)
Your data-warehouse can still be optimized for long-term aggregate statistics.
Use Option 2. Option 1 would be a nightmare for queries.
MySQL has good support for doing date ranges in queries, so it is easiest to just have row per day.
I would say option 2, but you would probably want to add a field for a primary key to speed up queries. And if that primary key is an integer value, even better.
Option 2 definitely (with a two-column unique key/constraint on Twitter ID and Day).
Option 1 will just be regrettable.
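A small sketch of Option 2 with the two-column key suggested above, again using Python's `sqlite3` module in place of MySQL. The table name `follower_counts`, the column names and the sample rows are assumptions. The point is that an arbitrary range, including one that spans two calendar months, needs nothing more than a BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE follower_counts (
           twitter_id INTEGER NOT NULL,
           day        TEXT    NOT NULL,   -- ISO date, e.g. '2010-12-31'
           followers  INTEGER NOT NULL,
           PRIMARY KEY (twitter_id, day)  -- one row per account per day
       )"""
)
conn.executemany(
    "INSERT INTO follower_counts VALUES (?, ?, ?)",
    [
        (42, "2010-12-29", 100),
        (42, "2010-12-30", 104),
        (42, "2010-12-31", 109),
        (42, "2011-01-01", 115),
        (42, "2011-01-02", 120),
        (7, "2010-12-31", 5),
    ],
)

def followers_between(twitter_id, first_day, last_day):
    """Daily follower counts for one account over an inclusive date range."""
    return conn.execute(
        "SELECT day, followers FROM follower_counts "
        "WHERE twitter_id = ? AND day BETWEEN ? AND ? ORDER BY day",
        (twitter_id, first_day, last_day),
    ).fetchall()

# A range that crosses the December/January boundary.
span = followers_between(42, "2010-12-30", "2011-01-01")
print(span)
```

The composite key also serves as the index for this lookup, since the query filters on the account first and then on a range of days.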

Date / Time reference table needed for Analytic?

Is it better to keep days of month, months, year, day of week and week of year as separate reference tables or in a common Answer table? The goal is to allow user content searches and action analytics to be filtered by all the various date-time values (there will be custom reporting for users based on their shared content). I am trying to ensure data accuracy by using IDs, and also to report on numbers of shares, etc. by time and date for system reporting by comparing various user groups. If we keep them in separate tables, what about time? Is a table with each hour, minute and second also needed?
Most databases support some sort of TIMESTAMP data type plus associated DAY(), MONTH(), DAYOFWEEK() functions.
The only valid reason for separate DAY or HOUR columns in a separate table is if you have precomputed totals and averages for each timeslot.
Even then it's only worth it if you expect a lot of filtering based on these values, as the cost of building these tables is high, and for most queries the standard SQL "GROUP BY ... HAVING ..." will perform well enough.
It sounds like you may be interested in a "star schema" (see Wikipedia), a common method in data warehousing to speed up queries, but be warned: designing and building a star schema is not a trivial exercise.
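To illustrate deriving date parts from a single timestamp column instead of joining to reference tables, here is a minimal sketch using Python's `sqlite3` module. SQLite has no DAYOFWEEK() function, so `strftime('%w', ...)` (0 = Sunday) stands in for it. The `shares` table and its rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (user_id INTEGER, shared_at TEXT)")
conn.executemany(
    "INSERT INTO shares VALUES (?, ?)",
    [
        (1, "2024-01-01 09:15:00"),  # a Monday
        (2, "2024-01-01 17:40:00"),  # same Monday
        (1, "2024-01-03 11:05:00"),  # a Wednesday
        (3, "2024-01-08 08:30:00"),  # the following Monday
    ],
)

# Count shares per day of week, computed from the timestamp at query time.
shares_by_weekday = conn.execute(
    """SELECT CAST(strftime('%w', shared_at) AS INTEGER) AS weekday,
              COUNT(*) AS shares
       FROM shares
       GROUP BY weekday
       ORDER BY weekday"""
).fetchall()
print(shares_by_weekday)
```

The same pattern works for hour, month or week of year by changing the format string, so no per-hour or per-second reference table is needed for this kind of filter.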

SSAS cube design, semi-additive measures, and running totals

I have what is to me a bit of a tricky design issue in my SSAS cube. The question is related to general accounting practices. I have a fact table containing financial transactions (i.e. a ledger) and each of those transactions is tagged with a transaction date and a period. The period does NOT relate directly to a day, or a series of days. Users may close a period in the middle of a day if that is when they have finished their month's work.
I need to be able to report on Accounts Receivable (AR) by both date and period. I am not using Enterprise Edition of SSAS so the time intelligence semi-additive options are not available to me, and even if they were they would only allow one time dimension to use non-standard aggregation and I believe in this case I need two that allow this.
Accounts Receivable is a running total: it should be the sum of the latest ledger item selected and everything that came before it. I know how to do this calculation in MDX for a single time dimension, but how can I allow this to work with two time dimensions, transaction date and period close? Is period close even considered a "time" dimension in this case? It does have a temporal aspect to it, and I do want the sums from all periods up to the current one.
I am stumped on how to relate the two time dimensions to a single fact table and use different aggregation for each. Maybe the best solution here is to have two periodic snapshot tables (instead of trying to aggregate this info from the FactLedger table), one aggregated by transaction date and one by period. That is the solution I am currently leaning towards, but I would love a second opinion.
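To make the two running totals concrete, here is a plain-Python sketch, not MDX and not an SSAS solution. It takes one small ledger and accumulates it once by transaction date and once by period. The ledger rows are invented for illustration, with one entry posted on a day after the earlier period had already been closed.

```python
from collections import defaultdict
from itertools import accumulate

# (transaction date, period, amount) -- hypothetical ledger rows
ledger = [
    ("2024-01-30", 1, 100.0),
    ("2024-01-31", 1, -40.0),
    ("2024-01-31", 2, 25.0),  # same day, but posted after period 1 was closed
    ("2024-02-01", 2, 10.0),
]

def running_total(rows, key_index):
    """Sum amounts per key, then accumulate the sums in key order."""
    sums = defaultdict(float)
    for row in rows:
        sums[row[key_index]] += row[2]
    keys = sorted(sums)
    return dict(zip(keys, accumulate(sums[k] for k in keys)))

ar_by_date = running_total(ledger, 0)
ar_by_period = running_total(ledger, 1)
print(ar_by_date)
print(ar_by_period)
```

Both views end at the same balance, but the intermediate values differ because the period boundary falls in the middle of a day. That is the behaviour the two snapshot tables would each have to capture.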
You can most certainly have more than one time dimension in a cube, and in this case I would actually just create one common time dimension and have it role play as two, transaction date and period close. To role play a dimension, just add it to the cube again in the Dimension Usage tab of the cube designer and rename it. Set up your references appropriately to key off of the two different fact columns.
Or maybe I'm not understanding the issue correctly. This sounds pretty straight-forward.
You can create your own time-table with periods and you can alter your fact_table's datetime format to match your time-table. Then 1 dimension would be enough.

Resources