Database schema design for financial forecasting


I need to develop a web app that lets companies forecast their financials.
The app has several screens: one for defining employee salaries, another for sales projections, and so on.
Basically, it turns an Excel financial forecast model into an app.
The question is: what would be the best way to design the database so that financial reports (e.g. a profit and loss statement or a balance sheet) can be generated quickly?
Assuming the forecast period is 5 years, would you have a table with 5 years * 12 months = 60 fields per row? Is that performant enough?
Would you use DB triggers to recalculate salary expenses whenever a single employee's data changes?

I'd think it would be better to store each month's forecast in its own row, in a table that looks like this:
month  forecast
-----  --------
1      30000
2      31000
3      28000
...    ...
60     52000
Then you can use aggregate functions to calculate forecast reports, discounted cash flow, etc. For example, the undiscounted total for just the first 4 years:
SELECT SUM(forecast) FROM FORECASTS WHERE month >= 1 AND month <= 48
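As a rough sketch of the one-row-per-month idea, the snippet below uses Python's built-in sqlite3 module as a stand-in for whatever database the app ends up using. The flat 1000-per-month figures, the `discounted_total` helper and the monthly discount rate are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecasts (month INTEGER PRIMARY KEY, forecast REAL NOT NULL)"
)
# Hypothetical data: a flat 1000 per month over 60 months.
conn.executemany(
    "INSERT INTO forecasts VALUES (?, ?)", [(m, 1000.0) for m in range(1, 61)]
)

# Undiscounted total for the first 4 years (months 1 to 48).
total_4y = conn.execute(
    "SELECT SUM(forecast) FROM forecasts WHERE month >= 1 AND month <= 48"
).fetchone()[0]

# Yearly totals: integer division maps months 1-12 to year 1, 13-24 to year 2, ...
yearly = conn.execute(
    "SELECT (month - 1) / 12 + 1 AS year, SUM(forecast) "
    "FROM forecasts GROUP BY year ORDER BY year"
).fetchall()


def discounted_total(monthly_rate, last_month):
    """Sum the forecasts up to last_month, discounting month m by (1 + rate)^m."""
    rows = conn.execute(
        "SELECT month, forecast FROM forecasts WHERE month <= ?", (last_month,)
    )
    return sum(f / (1.0 + monthly_rate) ** m for m, f in rows)


print(total_4y)
print(yearly)
print(discounted_total(0.01, 48))
```

With one row per month, changing the forecast horizon only changes the number of rows, not the table definition.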
For salary expenses, I would think that a view that does the calculations on the fly (or a "materialized view", if your DB engine supports them) should have sufficient performance, unless we're talking about a giant number of employees or a really slow DB.
Maybe have a salary history table that a trigger populates when employee data changes or payroll runs:
employeeId  month  Salary
----------  -----  ------
1           1      4000
2           1      3000
3           1      5000
1           2      4100
2           2      3100
3           2      4800
...         ...    ...
Then again, you can use SUM or another aggregate function to get the reported data.
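A minimal sketch of the salary history table with an on-the-fly view, again using sqlite3 as a stand-in engine. The rows are the six sample rows shown above; the table and view names (`salary_history`, `monthly_salary_expense`) are assumptions, and the trigger that would populate the table is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE salary_history (
        employee_id INTEGER NOT NULL,
        month       INTEGER NOT NULL,
        salary      REAL    NOT NULL,
        PRIMARY KEY (employee_id, month)
    );
    -- The view recalculates the monthly expense every time it is queried.
    CREATE VIEW monthly_salary_expense AS
        SELECT month, SUM(salary) AS total_salary
        FROM salary_history
        GROUP BY month;
    """
)
conn.executemany(
    "INSERT INTO salary_history VALUES (?, ?, ?)",
    [(1, 1, 4000), (2, 1, 3000), (3, 1, 5000),
     (1, 2, 4100), (2, 2, 3100), (3, 2, 4800)],
)

expenses = conn.execute(
    "SELECT month, total_salary FROM monthly_salary_expense ORDER BY month"
).fetchall()
print(expenses)  # [(1, 12000.0), (2, 12000.0)]
```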

Related

SQL tracking payment over time

I have been tasked with moving a process that pays people for training from an Excel spreadsheet to a SQL Server DB. I need to be able to track payments and the reason why each was approved or denied. Example:
Payment Run Jan 1
Student  Class                   Amount  Reason for NonPayment
Mary     Introduction to Python  0       No W2
John     Introduction to Java    100
Payment Run Feb 1
Student  Class                   Amount  Reason for NonPayment
Mary     Introduction to Python  100
Now I know I should make three tables: one for student info, one for course info, and a linking table with payments. It's the payments table that has me stumped. I can do that for Jan 1, but how do I track the changes?
I want to be able to say "In the Jan run Mary did not get paid because she was missing her W2, but she was paid in Feb". For every payment run, I need to be able to track who got paid, the amount paid, and the reason for nonpayment (if present).

My bad. I was forgetting about the many-to-one relationship. I was thinking more about addresses, where you "retire" the old address and only keep the new link active.
So instead of "retiring" the link to the payment table on every run, keep a "PaymentRunDate" field and have a key referencing the user.
Like this (given Mary's ID is 15, John's ID is 5, and dates are in European format, DD/MM/YYYY):
UserId  Class ID  PaymentRunDate  amount_paid  Reason
15      1         01/01/2022      0            No W2
5       2         01/01/2022      100
15      1         01/02/2022      100
and let the front end worry about how this is presented to the user.
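A small sketch of that payments table, run through sqlite3 rather than SQL Server. The column names follow the table above; storing dates as ISO strings and making (UserId, ClassId, PaymentRunDate) the primary key are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE payments (
        user_id          INTEGER NOT NULL REFERENCES students(user_id),
        class_id         INTEGER NOT NULL,
        payment_run_date TEXT    NOT NULL,  -- ISO date, one value per payment run
        amount_paid      REAL    NOT NULL,
        reason           TEXT,              -- NULL when the payment went through
        PRIMARY KEY (user_id, class_id, payment_run_date)
    );
    """
)
conn.executemany("INSERT INTO students VALUES (?, ?)", [(15, "Mary"), (5, "John")])
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?)",
    [
        (15, 1, "2022-01-01", 0, "No W2"),
        (5, 2, "2022-01-01", 100, None),
        (15, 1, "2022-02-01", 100, None),
    ],
)

# Mary's history across runs: one row per payment run, nothing overwritten.
history = conn.execute(
    """
    SELECT p.payment_run_date, p.amount_paid, p.reason
    FROM payments p
    JOIN students s ON s.user_id = p.user_id
    WHERE s.name = ?
    ORDER BY p.payment_run_date
    """,
    ("Mary",),
).fetchall()
print(history)
```

Because every run adds new rows, the January refusal and the February payment both stay queryable.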

How can I aggregate on 2 dimensions in Google Data Studio?

I have data in 2 dimensions (let's say, time and region) counting the number of visitors on a website for a given day and region, as per the following:
time        region   visitors
2021-01-01  Europe   653
2021-01-01  America  849
2021-01-01  Asia     736
2021-01-02  Europe   645
2021-01-02  America  592
2021-01-02  Asia     376
...         ...      ...
2021-02-01  Asia     645
...         ...      ...
I would like to create a table showing the average daily worldwide visitors for each month, that is:
time     visitors
2021-01  25238
2021-02  16413
This means I need to aggregate the data this way:
first, sum over regions for each distinct date;
then, calculate the average over dates.
I was thinking of taking a global average of all rows for each month and then multiplying that value by the number of days in the month, but since that number varies I can't do it.
Is there any way to do this?

Create 2 calculated fields:
Month(time)
SUM(visitors)/COUNT(DISTINCT(time))
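To see what that second calculated field computes, here is the same expression written as a SQL query and run with sqlite3. This is only an illustration outside Data Studio: the `visits` table, the `day` column name and the seven sample rows (taken from the question) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day TEXT, region TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("2021-01-01", "Europe", 653), ("2021-01-01", "America", 849),
        ("2021-01-01", "Asia", 736), ("2021-01-02", "Europe", 645),
        ("2021-01-02", "America", 592), ("2021-01-02", "Asia", 376),
        ("2021-02-01", "Asia", 645),
    ],
)

# Total visitors in the month divided by the number of distinct days seen.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS month,
           SUM(visitors) * 1.0 / COUNT(DISTINCT day) AS avg_daily_visitors
    FROM visits
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly)  # [('2021-01', 1925.5), ('2021-02', 645.0)]
```

Dividing the monthly total by the number of distinct dates gives the same number as summing per day first and then averaging, as long as every date that should count has at least one row.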

In case it might help someone: so far (January 2021), it seems there is no way to do that in Data Studio. Neither calculated fields nor data blending offer a GROUP BY-like function.
So, I found 2 alternative solutions:
First option: create an additional table in my data with the first aggregation (sum over regions). This gives a table with the number of visitors for each date. Then I import it into Data Studio and do the second aggregation in the table.
Second option: since my data is stored in BigQuery, a custom SQL query can be used to create another data source from the same dataset. This way, a GROUP BY statement can be used to sum over regions before the average is calculated.
These solutions have a big drawback: I cannot add controls to filter by region, since data from all regions is aggregated before it enters Data Studio.
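The two-step aggregation behind both workarounds can be sketched as a nested query. The snippet runs it in sqlite3 as a stand-in for the BigQuery custom query, so the date function (`strftime`) and the `visits` table and `day` column names are assumptions; the rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day TEXT, region TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("2021-01-01", "Europe", 653), ("2021-01-01", "America", 849),
        ("2021-01-01", "Asia", 736), ("2021-01-02", "Europe", 645),
        ("2021-01-02", "America", 592), ("2021-01-02", "Asia", 376),
        ("2021-02-01", "Asia", 645),
    ],
)

two_step = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS month,
           AVG(daily_total)       AS avg_daily_visitors  -- step 2: average over dates
    FROM (
        SELECT day, SUM(visitors) AS daily_total          -- step 1: sum over regions
        FROM visits
        GROUP BY day
    )
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(two_step)
```

The inner query is what the additional table in the first option would contain; the region column is gone after that step, which is why a region filter can no longer be applied downstream.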

SSAS cube with date range records

I have to build a cube based on date range records, and I'm not sure about the best way to proceed.
Imagine, say, a cube of cars and warranty periods. Each car has a warranty start date and a warranty end date. Then there may be extended warranty periods, so imagine:
CAR REG  TYPE      WARRANTY START  WARRANTY END
CAR A    PURCHASE  01/01/2016      31/01/2016
CAR A    EXTENDED  01/01/2017      30/06/2017
CAR A    EXTENDED  01/08/2017      30/01/2018  -- note, gap here
CAR B    PURCHASE  01/01/2016      31/01/2016
CAR B    EXTENDED  01/01/2017      30/06/2017
CAR B    EXTENDED  01/08/2017      30/01/2018  -- note, gap here
So there are multiple items, each with multiple date ranges. There is a main table (CARS) with car details (colour, model, etc.).
Now I want to build a cube which is reportable at month level: cars under warranty, warranty type, etc.
So plan 1 was to build a view which explodes the above out by a join to a date table, report by month, and feed this into a cube. But the number of cars multiplied by the months covered leads to many hundreds of millions of rows, which means the server sometimes runs out of TempDB space, and when it does run, the cube takes hours to build.
Is there a better way, such as a view for the car details and then another view on the warranty table (how do I get SSAS to deal with months in a date range)? Will the join in SSAS be more efficient than a join in a view in SQL?
Thanks all.

You can connect the start and end columns to the time dimension. Then, in the report, you can use the ":" operator to build a date range report.
You will find more details here: http://www.purplefrogsystems.com/blog/2013/04/mdx-between-start-date-and-end-date/

One approach which will work with drag-and-drop client tools like Excel or Power BI would be a many-to-many Date dimension. Since car A and B match, let's pretend there's a car C which has a warranty from 2015-07-30 to 2015-12-31.
Create a DimWarrantyDateRange dimension in which each row represents a unique combination of dates during which a warranty is active. The surrogate key is WarrantyDateRangeKey. Certainly the ETL that builds this table will be a bit expensive, but given the size of your data I think it's a worthwhile investment that will produce better query performance than if your m2m bridge table has one row per active day per car.
Each car should be assigned one WarrantyDateRangeKey. Add the WarrantyDateRangeKey column to your fact tables...
CAR REG  WarrantyDateRangeKey
A        1
B        1
C        2
m2mWarrantyDateRange
WarrantyDateRangeKey  DateKey
1                     20160101
1                     20160102
1                     ...
1                     20170629
1                     20170630
1                     20170801
1                     20170802
1                     ...
1                     20180129
1                     20180130
2                     20150701
2                     20150702
2                     ...
2                     20151230
2                     20151231
The tables relate together as follows...
FactTable -> DimWarrantyDateRange <- m2mWarrantyDateRange -> DimDate
Then in the cube your DimWarrantyDateRange should be a dimension, and m2mWarrantyDateRange should be a measure group with a count measure. DimDate should be a dimension. Then you should relate DimDate to FactTable as a many-to-many (m2m) dimension using m2mWarrantyDateRange as the intermediate measure group.
Now in Excel or Power BI you should be able to filter to a particular date and it will filter down to the cars which had an active warranty on that day.
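To make the bridge table concrete, here is a rough sketch of the ETL step and the resulting lookup, done in Python with sqlite3 rather than in SSAS. The periods for key 1 come from the question's warranty table; key 2 follows the bridge rows listed above (starting 2015-07-01). The table names and the `cars_under_warranty` helper are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def date_keys(start, end):
    """Yield a yyyymmdd integer key for every day from start to end inclusive."""
    day = start
    while day <= end:
        yield int(day.strftime("%Y%m%d"))
        day += timedelta(days=1)


# One surrogate key per distinct set of warranty periods.
range_sets = {
    1: [(date(2016, 1, 1), date(2016, 1, 31)),
        (date(2017, 1, 1), date(2017, 6, 30)),
        (date(2017, 8, 1), date(2018, 1, 30))],
    2: [(date(2015, 7, 1), date(2015, 12, 31))],
}
car_keys = {"A": 1, "B": 1, "C": 2}  # cars A and B share the same set of dates

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_car (car_reg TEXT PRIMARY KEY, range_key INTEGER)")
conn.execute("CREATE TABLE m2m_warranty_date_range (range_key INTEGER, date_key INTEGER)")
conn.executemany("INSERT INTO fact_car VALUES (?, ?)", list(car_keys.items()))
for key, periods in range_sets.items():
    for start, end in periods:
        conn.executemany(
            "INSERT INTO m2m_warranty_date_range VALUES (?, ?)",
            [(key, dk) for dk in date_keys(start, end)],
        )


def cars_under_warranty(date_key):
    """Return the cars whose date-range key covers the given day."""
    rows = conn.execute(
        """
        SELECT f.car_reg
        FROM fact_car f
        JOIN m2m_warranty_date_range b ON b.range_key = f.range_key
        WHERE b.date_key = ?
        ORDER BY f.car_reg
        """,
        (date_key,),
    )
    return [r[0] for r in rows]


print(cars_under_warranty(20170615))  # inside an extended period
print(cars_under_warranty(20170715))  # inside the gap
```

The bridge holds one row per active day per distinct date-range set, not per car, so cars with identical warranty dates add no extra rows.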

How to merge rows of SQL data on column-based logic?

I'm writing a margin report on our General Ledger and I've got the basics working, but I need to merge the rows based on specific logic and I don't know how...
My data looks like this:
value1  value2  location  date        category                     debitamount  creditamount
2029    390     ACT       2012-07-29  COSTS - Widgets and Gadgets  0.000        3.385
3029    390     ACT       2012-07-24  SALES - Widgets and Gadgets  1.170        0.000
And my report needs to display the two rows merged into one, like so:
plant  date        category             debitamount  creditamount
ACT    2012-07-29  Widgets and Gadgets  1.170        3.385
The logic to join them is contained in the value1 and value2 columns. Where the last 3 digits of value1 and all three digits of value2 are the same, the rows should be combined. Also, the 1st digit of value1 will always be 2 for sales and 3 for costs (not sure if that matters).
I.e. 2029-390 is money coming in for Widgets and Gadgets sold to customers, while 3029-390 is money being spent to buy the Widgets and Gadgets from suppliers.
How can I do this programmatically in my stored procedure? (SQL Server 2008 R2)
Edit: Would I load the 3000s into one table variable and the 2000s into another, then join the two on value2 and right(value1, 3)? Or something like that?

Try this:
SELECT RIGHT(LTRIM(RTRIM(value1)), 3), value2, MAX(location),
       MAX(date), MAX(category), SUM(debitamount), SUM(creditamount)
FROM table1
GROUP BY RIGHT(LTRIM(RTRIM(value1)), 3), value2
This sums the debit amounts and the credit amounts. For the other columns it picks the maximum value; assuming those columns are always the same whenever value2 and the last 3 digits of value1 match, it shouldn't matter which one is picked.
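The same GROUP BY can be tried out on the two sample rows with sqlite3. SQLite has no RIGHT(), so `substr(value1, -3)` is swapped in for it. Stripping the "COSTS - " / "SALES - " prefix with `instr` is an extra step not in the query above, and it assumes every category contains " - ".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE table1 (
        value1 TEXT, value2 TEXT, location TEXT, date TEXT,
        category TEXT, debitamount REAL, creditamount REAL
    )
    """
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2029", "390", "ACT", "2012-07-29", "COSTS - Widgets and Gadgets", 0.000, 3.385),
        ("3029", "390", "ACT", "2012-07-24", "SALES - Widgets and Gadgets", 1.170, 0.000),
    ],
)

merged = conn.execute(
    """
    SELECT substr(trim(value1), -3) AS suffix,   -- last 3 characters of value1
           value2,
           MAX(location),
           MAX(date),
           -- drop everything up to and including ' - ' (assumed prefix format)
           MAX(substr(category, instr(category, ' - ') + 3)),
           SUM(debitamount),
           SUM(creditamount)
    FROM table1
    GROUP BY suffix, value2
    """
).fetchall()
print(merged)
```

Both rows share the suffix 029 and value2 390, so they collapse into a single row carrying the debit from one and the credit from the other.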

Access Report / SQL DCount function

I have the following Tables and Fields in my DB:
tblProductionRecords:
pk_fldProductionRecordID,
fk_fldJobNumber_ID,
fldPartsCompleted,
tblJobs:
pk_fldJobID,
fldJobNumber
tblEmployee_ProductionRecord:
pk_fldEmp_ProdRec,
fk_fldProdRec_ID,
fk_fldEmployee_ID,
fldHours
tblEmployees:
pk_fldEmployeeID,
FldName
So what I am tracking is production records for a given job. The pieces of a production record are:
Part quantities.
Which employee(s) were involved in completing the quantities (it could be 1 or more employees, hence the many-to-many relationship between employees and production records).
How many hours each employee spent completing the quantities.
The problem I am faced with is tracking the total quantities for a job on my report. When a given production record has more than one employee who worked on those quantities, the quantities are added once for each employee record. So let's say 3 employees work on a job and the 3 of them together completed 1000 parts. My report will show a total of 3000 completed.
I understand creating groups, footers, headers, running sums and the Count function (I believe).
What I suspect I need is a field in my query that counts how many records there are in tblEmployee_ProductionRecord where fldProdRec_ID = "the current production record" in the detail section. With that number I can divide the total completed quantities between the 3 (or however many) employees and do a sum of that field.
I hope this is clear enough. I sincerely appreciate any help!
David

If I understood you correctly, your example (3 employees work on a job and each one created 1000 parts) will look like this in the tables:
tblEmployee_ProductionRecord:
pk_fldEmp_ProdRec  fk_fldProdRec_ID  fk_fldEmployee_ID  fldHours
1                  1                 1                  5
2                  2                 2                  5
3                  3                 3                  5
tblProductionRecords:
pk_fldProductionRecordID  fk_fldJobNumber_ID  fldPartsCompleted
1                         1                   1000
2                         1                   1000
3                         1                   1000
If yes, try the following query:
SELECT
    tblProductionRecords.fk_fldJobNumber_ID,
    Count(tblEmployee_ProductionRecord.pk_fldEmp_ProdRec) AS EmployeeCount,
    Sum(tblProductionRecords.fldPartsCompleted) AS CompletedSum
FROM
    tblEmployee_ProductionRecord
    INNER JOIN tblProductionRecords
        ON tblEmployee_ProductionRecord.fk_fldProdRec_ID = tblProductionRecords.pk_fldProductionRecordID
GROUP BY
    tblProductionRecords.fk_fldJobNumber_ID
HAVING
    tblProductionRecords.fk_fldJobNumber_ID=1;
This will return the following result:
fk_fldJobNumber_ID  EmployeeCount  CompletedSum
1                   3              3000
--> for job number one, three employees created a total of 3000 parts.
Is this what you wanted?
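The question seems to describe a different layout: a single production record of 1000 parts with three employee rows attached, which a plain join counts three times. A sketch of one way around that, assuming that reading, is to aggregate the employee rows per production record first and only then join. It is run with sqlite3 rather than Access; the shortened table and column names and the second production record are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE production_records (id INTEGER PRIMARY KEY, job_id INTEGER, parts_completed INTEGER);
    CREATE TABLE employee_production (id INTEGER PRIMARY KEY, prod_rec_id INTEGER, employee_id INTEGER, hours REAL);
    INSERT INTO production_records VALUES (1, 1, 1000), (2, 1, 500);
    -- Record 1 was worked on by three employees, record 2 by one.
    INSERT INTO employee_production VALUES
        (1, 1, 1, 5), (2, 1, 2, 5), (3, 1, 3, 5), (4, 2, 1, 2);
    """
)

# Plain join: parts are repeated once per employee row, so the sum is inflated.
naive = conn.execute(
    """
    SELECT pr.job_id, SUM(pr.parts_completed)
    FROM employee_production ep
    JOIN production_records pr ON pr.id = ep.prod_rec_id
    GROUP BY pr.job_id
    """
).fetchall()

# Collapse employee rows to one row per production record before joining.
fixed = conn.execute(
    """
    SELECT pr.job_id,
           SUM(pr.parts_completed) AS parts,
           SUM(e.emp_rows)         AS employee_rows,
           SUM(e.total_hours)      AS hours
    FROM production_records pr
    JOIN (
        SELECT prod_rec_id, COUNT(*) AS emp_rows, SUM(hours) AS total_hours
        FROM employee_production
        GROUP BY prod_rec_id
    ) e ON e.prod_rec_id = pr.id
    GROUP BY pr.job_id
    """
).fetchall()
print(naive)  # [(1, 3500)]
print(fixed)  # [(1, 1500, 4, 17.0)]
```

Each production record now contributes its parts exactly once, while the employee count and hours are still available for the report.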
