I have two datasets, billing and payments, which I have appended together and then sorted by id, so that one can view the bill charged to each id and the payments made by each id.
Here is a snapshot of it:
contractaccount  printdate  paymentdate  billingperiod  duedate    payable  payment
12345            01jan2015               201501         15jan2015  100
12345                       13jan2015                                       50
12345                       13jan2015                                       50
12345            29jan2015               201502         13feb2015  150
12345                       03feb2015                                       150
12345            05mar2015               201503         20mar2015  100
12345                       21mar2015                                       80
12345                       22mar2015                                       20
23456            15jan2015               201501         31jan2015  200
23456                       20jan2015                                       200
23456            12feb2015               201502         28feb2015  220
23456                       13feb2015                                       100
23456                       15feb2015                                       100
23456                       20feb2015                                       20
23456            10mar2015               201503         20mar2015  200
23456                       18mar2015                                       100
23456                       20mar2015                                       100
I want to sum the payments whose paymentdate falls between a bill's printdate and duedate, with the additional condition that the billingperiod corresponds to that printdate. For example, for id 12345 I want the sum of the payments made between 01jan2015 and 15jan2015, attributed to billingperiod 201501 because that billingperiod corresponds to printdate 01jan2015.
I have tried the following code:
bysort contractaccount: gen paymentsum=sum(payment) if paymentdate<=duedate & paymentdate>=printdate
but I still don't know how to add the additional condition mentioned above. The code also doesn't give me what I want, because it generates a cumulative sum of payments for each id. Ideally, I'd like output that shows, for each id and each billingperiod, the sum of the payments made subject to all of the date conditions above.
I have a hunch I'd need to write a loop on this, which is another challenge as I'm still a new user of Stata.
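Independently of Stata syntax, the rule described above is an interval match. Each bill row defines a window from printdate to duedate and already carries its own billingperiod. Every payment from the same contractaccount dated inside that window is added to that bill's total, giving a per-bill total rather than a running sum. The Python sketch below is not Stata; it only pins down the expected totals for the snapshot data. It assumes both window ends are inclusive and that the two 13jan2015 rows are separate payments.

```python
from datetime import date

# Month abbreviations as shown in Stata's %td display format (e.g. "01jan2015").
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_td(s):
    """Parse a date string such as '01jan2015' into a datetime.date."""
    return date(int(s[5:]), MONTHS[s[2:5]], int(s[:2]))


# Bill rows from the snapshot: (contractaccount, printdate, billingperiod, duedate, payable)
bills = [
    (12345, "01jan2015", "201501", "15jan2015", 100),
    (12345, "29jan2015", "201502", "13feb2015", 150),
    (12345, "05mar2015", "201503", "20mar2015", 100),
    (23456, "15jan2015", "201501", "31jan2015", 200),
    (23456, "12feb2015", "201502", "28feb2015", 220),
    (23456, "10mar2015", "201503", "20mar2015", 200),
]

# Payment rows from the snapshot: (contractaccount, paymentdate, payment).
# Assumption: the two 13jan2015 rows are two separate payments.
payments = [
    (12345, "13jan2015", 50), (12345, "13jan2015", 50),
    (12345, "03feb2015", 150), (12345, "21mar2015", 80),
    (12345, "22mar2015", 20),
    (23456, "20jan2015", 200), (23456, "13feb2015", 100),
    (23456, "15feb2015", 100), (23456, "20feb2015", 20),
    (23456, "18mar2015", 100), (23456, "20mar2015", 100),
]


def payments_per_bill(bills, payments):
    """Sum, per (account, billingperiod), the payments from the same account
    dated within [printdate, duedate] of that bill (assumed inclusive)."""
    totals = {}
    for acct, printdate, period, duedate, _payable in bills:
        start, end = parse_td(printdate), parse_td(duedate)
        totals[(acct, period)] = sum(
            amount
            for p_acct, paydate, amount in payments
            if p_acct == acct and start <= parse_td(paydate) <= end
        )
    return totals


for key, total in payments_per_bill(bills, payments).items():
    print(key, total)
```

Under these assumptions, the 21mar2015 and 22mar2015 payments for id 12345 fall after the 20mar2015 due date, so billingperiod 201503 for that id totals 0.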
Name   Canada  US  Euro
John   10      50  60
Mindy  5       60  100
Joe    20      15  55
Alan   10      30
I'd like to transform the original table above into the format below, effectively concatenating the first column and the first row into one column and putting the number for each person and geography into a second column:
Name Fields
John Canada 10
John US 50
John Euro 60
Mindy Canada 5
Mindy US 60
Mindy Euro 100
Joe Canada 20
Joe US 15
Joe Euro 55
Alan Canada 10
Alan US 30
I am able to get it to work without the concatenation of the first column and first row.
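The reshape described above is an unpivot. For each row, pair the name with each geography header, join the two into one label, and keep the cell value. The Python sketch below illustrates only that logic; it is not an Excel or Power Query solution. It assumes a blank cell, such as Alan's missing Euro value, should produce no output row.

```python
# Original wide table: one row per person, one column per geography.
# None marks a blank cell (Alan has no Euro value).
header = ["Name", "Canada", "US", "Euro"]
rows = [
    ["John", 10, 50, 60],
    ["Mindy", 5, 60, 100],
    ["Joe", 20, 15, 55],
    ["Alan", 10, 30, None],
]


def unpivot(header, rows):
    """Return ('<name> <geography>', value) pairs.

    Assumption: blank (None) cells are skipped rather than emitted as empty rows.
    """
    result = []
    for name, *values in rows:
        for geography, value in zip(header[1:], values):
            if value is not None:
                # Concatenate the first-column value with the column header.
                result.append((f"{name} {geography}", value))
    return result


for field, value in unpivot(header, rows):
    print(field, value)
```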
I need to model a star schema for a business need around liquidity stress testing.
I'll use an analogous example.
Let's say we have deals involving financing, financial securities, etc.
In the fact table, at a given date, these deals have an actual value of X euros,
but that value varies over time, so we also have projected values.
My concern is how to represent these projected values for the deals, and more specifically which granularity to choose.
(The examples below are an oversimplification of the fact table; and yes, dimension IDs are used
in the real table instead of names.)
Method 1: as many metric columns as there are computed projection values
AsofDate    DealId          0Day  1Day  7Days  1Month
2022-01-01  financingDeal1  100   99    98     85
2022-01-01  financingDeal2  150   150   120    120
2022-01-01  financingDeal3  100   99    98     85
2022-01-01  financingDeal4  100   99    98     85
Method 2: add a level of granularity. A row is no longer just a deal on a given date; it is a deal on a given date together with one of its projections over the next few days/months.
AsofDate    DealId          projection     value
2022-01-01  financingDeal1  0Day - actual  100
2022-01-01  financingDeal1  1Day           99
2022-01-01  financingDeal1  7Days          99
2022-01-01  financingDeal1  1Month         85
As I see it:
For method 1, the main drawback is that if we later add a new projection value, say for 3 months, we will need to add a '3Months' column in the ETL and in the OLAP cube.
For method 2:
we will have as many rows as deals * projections. We have 11 projections, so that is 11 rows per deal, and we have more than 1 million deals.
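To make the trade-off concrete, the Python sketch below converts Method 1 rows into Method 2 rows using the first two deals from the example. It is purely illustrative and is not ETL or cube code. In the long form, a new horizon such as '3Months' would arrive as extra rows with a new projection value rather than as a new column. The cost is the row count computed at the end, which takes 1 million deals as a lower bound from the question.

```python
# Method 1 (wide): one row per (AsofDate, DealId), one column per projection horizon.
# Illustrative data: the first two deals from the Method 1 example.
KEY_COLUMNS = ("AsofDate", "DealId")
wide_rows = [
    {"AsofDate": "2022-01-01", "DealId": "financingDeal1",
     "0Day": 100, "1Day": 99, "7Days": 98, "1Month": 85},
    {"AsofDate": "2022-01-01", "DealId": "financingDeal2",
     "0Day": 150, "1Day": 150, "7Days": 120, "1Month": 120},
]


def wide_to_long(rows):
    """Method 1 -> Method 2: emit one row per (AsofDate, DealId, projection)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column not in KEY_COLUMNS:
                # Each non-key column becomes a projection value, not a schema column.
                long_rows.append({"AsofDate": row["AsofDate"],
                                  "DealId": row["DealId"],
                                  "projection": column,
                                  "value": value})
    return long_rows


long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r)

# Method 2 row count at the volumes stated in the question: deals * projections.
deals, projections = 1_000_000, 11
print(deals * projections)  # 11000000
```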
What is your opinion on this?
Thanks for your consideration.
The table and the question are below. Does this need multi-line code with parentheses? This is for a business analyst assignment, and it's the first time I'm using SQL (I taught myself Python, JS, HTML, and CSS back when I was trying to become a web developer).
SQL Queries
Table Name: TRADES
DATE FIRM SYMBOL SIDE QUANTITY PRICE
2/3/2014 1ABC A123 B 200 41
2/4/2014 2BCD B234 B 600 60
2/7/2014 1ABC C345 S 600 70
2/10/2014 3CDE C345 S 600 70
2/12/2014 4DEF B234 B 200 62
2/14/2014 3CDE B234 B 300 61
2/21/2014 1ABC A123 B 300 40
2/24/2014 1ABC A123 S 300 30
2/25/2014 4DEF C345 B 2100 71
2/27/2014 CDE B234 S 1100 63
Q3. Your business user asks you to show them a table that includes the number of trades for each firm and symbol combination in the data table above. Please write the SQL query you would use to query the TRADES table to get the result below:
FIRM SYMBOL NO_TRADES
1ABC A123 3
2BCD B234 1
1ABC C345 1
3CDE C345 1
4DEF B234 1
3CDE B234 1
4DEF C345 1
CDE B234 1
This looks like simple aggregation:
select firm, symbol, count(*) as no_trades
from trades
group by firm, symbol
order by no_trades desc, symbol
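One quick way to check this aggregation without a database server is Python's built-in sqlite3 module. The sketch below loads the TRADES rows into an in-memory SQLite table and runs the same GROUP BY query. SQLite is only a stand-in here; the database used for the assignment may differ in dialect details. The ORDER BY puts the three-trade group first and then orders the remaining groups by symbol. The row order therefore won't exactly match the listing in the question, which follows first appearance in the data.

```python
import sqlite3

# TRADES rows from the question: (date, firm, symbol, side, quantity, price).
rows = [
    ("2/3/2014", "1ABC", "A123", "B", 200, 41),
    ("2/4/2014", "2BCD", "B234", "B", 600, 60),
    ("2/7/2014", "1ABC", "C345", "S", 600, 70),
    ("2/10/2014", "3CDE", "C345", "S", 600, 70),
    ("2/12/2014", "4DEF", "B234", "B", 200, 62),
    ("2/14/2014", "3CDE", "B234", "B", 300, 61),
    ("2/21/2014", "1ABC", "A123", "B", 300, 40),
    ("2/24/2014", "1ABC", "A123", "S", 300, 30),
    ("2/25/2014", "4DEF", "C345", "B", 2100, 71),
    ("2/27/2014", "CDE", "B234", "S", 1100, 63),
]

# SQLite in-memory database as a stand-in for the assignment's database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE trades ("date" TEXT, firm TEXT, symbol TEXT, '
             'side TEXT, quantity INTEGER, price REAL)')
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT firm, symbol, COUNT(*) AS no_trades
FROM trades
GROUP BY firm, symbol
ORDER BY no_trades DESC, symbol
"""
result = conn.execute(query).fetchall()
conn.close()

for firm, symbol, n in result:
    print(firm, symbol, n)
```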
Based on the following Data:
Location Salesperson Category SalesValue
North Bill Bikes 10
South Bill Bikes 90
South Bill Clothes 250
North Bill Accessories 20
South Bill Accessories 20
South Bob Bikes 200
South Bob Clothes 400
North Bob Accessories 40
I have the following Sales PivotTable in Excel 2016:
Bill Bob
Bikes 100 200
Clothes 10 160
Accessories 40 40
I would now like to display the difference between Bill and Bob and, importantly, be able to sort the table by that difference. I have tried adding Sales to the values a second time and showing it as a difference from "Bill". This gives me the correct values, but the table sorts by the underlying sales value rather than by the computed difference.
Bill Bob Difference
Bikes 100 200 100
Clothes 10 160 150
Accessories 40 40 0
I am fairly sure I need to use some form of DAX calculation but am having difficulty finding out exactly how. Can anyone give me a pointer?
Create a measure for that calculation:
If Bill and Bob are columns in your table:
Difference = ABS(TableName[Bill] - TableName[Bob])
If Bill and Bob are measures:
Difference = ABS([Bill] - [Bob])
UPDATE: an expression that calculates only the difference between Bob and Bill.
Create a measure (in this case DifferenceBillAndBob) with the following expression:
DifferenceBillAndBob =
ABS (
SUMX ( FILTER ( Sales, Sales[SalesPerson] = "Bob" ), [SalesValue] )
- SUMX ( FILTER ( Sales, Sales[SalesPerson] = "Bill" ), [SalesValue] )
)
It is not tested but should work.
Let me know if this helps.
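For reference, the Python sketch below (not DAX) shows what the DifferenceBillAndBob measure evaluates in each category row. It uses the sample data with the category spellings normalized. It totals SalesValue per salesperson, takes the absolute difference, and sorts on that computed value, which is what the question needs. Note that the raw rows give Clothes totals of 250 and 400, not the values in the pivot shown. Their difference, 150, does match the Difference column.

```python
from collections import defaultdict

# Sales rows from the question: (Location, Salesperson, Category, SalesValue).
# Assumption: 'Clothess' -> 'Clothes' and 'Accesories' -> 'Accessories'.
sales = [
    ("North", "Bill", "Bikes", 10),
    ("South", "Bill", "Bikes", 90),
    ("South", "Bill", "Clothes", 250),
    ("North", "Bill", "Accessories", 20),
    ("South", "Bill", "Accessories", 20),
    ("South", "Bob", "Bikes", 200),
    ("South", "Bob", "Clothes", 400),
    ("North", "Bob", "Accessories", 40),
]


def difference_by_category(rows, a="Bob", b="Bill"):
    """Per category: abs(sum of a's SalesValue - sum of b's SalesValue),
    mirroring ABS(SUMX(FILTER(... "Bob")) - SUMX(FILTER(... "Bill")))."""
    totals = defaultdict(lambda: defaultdict(int))
    for _location, person, category, value in rows:
        totals[category][person] += value
    return {cat: abs(t[a] - t[b]) for cat, t in totals.items()}


diffs = difference_by_category(sales)
# Sort on the computed difference itself, largest first.
for category, diff in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(category, diff)
```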
I have two requirements for my query.
First: a product list sorted by my measure, so that products with higher sales appear first.
ProductCode Sales
----------- ------------
123 18
332 17
245 16
656 15
Second: a cumulative sum over the presorted product list.
ProductCode Sales ACC
----------- ------------ ----
123 18 18
332 17 35
245 16 51
656 15 66
I wrote the MDX below to achieve these goals:
WITH
SET SortedProducts AS
    Order(
        [DIMProduct].[ProductCode].[ProductCode].AllMembers,
        [Measures].[Sales],
        BDESC
    )
MEMBER [Measures].[ACC] AS
    Sum(
        Head(
            [SortedProducts],
            Rank([DIMProduct].[ProductCode].CurrentMember, [SortedProducts])
        ),
        [Measures].[Sales]
    )
SELECT
    {[Measures].[Sales], [Measures].[ACC]} ON COLUMNS,
    SortedProducts ON ROWS
FROM [Model]
But it takes about 3 minutes to run. Any suggestions on how to optimize my code, or is this normal?
I have 9635 products in total.
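Outside MDX, the two requirements reduce to a descending sort followed by a running sum. The Python sketch below contrasts a single-pass cumulative sum with a version that re-sums the prefix for every row. The second version is roughly the shape of Sum(Head(SortedProducts, Rank(...))) if evaluated naively. It is an illustration only, not a model of how Analysis Services actually evaluates the query, which may cache or optimize differently. With 9635 products, the naive form implies about 46 million prefix additions.

```python
from itertools import accumulate

# Sample rows from the question (ProductCode, Sales), deliberately unsorted.
products = [(245, 16), (123, 18), (656, 15), (332, 17)]


def sorted_running_total(rows):
    """Sort by Sales descending, then add a cumulative sum in a single pass."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    acc = accumulate(value for _code, value in ordered)
    return [(code, value, total) for (code, value), total in zip(ordered, acc)]


def naive_running_total(rows):
    """Re-sum the whole prefix for each row (illustrative only): O(n^2) additions."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(code, value, sum(v for _c, v in ordered[:i + 1]))
            for i, (code, value) in enumerate(ordered)]


for row in sorted_running_total(products):
    print(row)

# Prefix additions performed by the naive form for n products: n(n+1)/2.
n = 9635
print(n * (n + 1) // 2)  # 46421430
```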
If you do a quick Google search, you'll find different ways to achieve this (there are many answers here as well).
That said, here is a different way to calculate your running total:
MEMBER [Measures].[SortedRank] AS Rank([Product].[Product].CurrentMember, [SortedProducts])
MEMBER [Measures].[ACC2] AS SUM(TopCount([SortedProducts], [Measures].[SortedRank]) ,[Measures].[Internet Sales Amount])
I don't know whether TopCount will perform better than Head in your case. For example, on my test machine your query against the AdventureWorks cube takes the same time with either Head or TopCount.
Hope this helps