I have a cube in SSAS 2008 that is built upon two tables - Accounts and Payments.
Both tables have dimensions and measures, and it appears to be working relatively well - I can get payments for accounts broken down into dimensions for either payments or accounts, and account measures broken down into account dimensions.
What I can't do is view measures for accounts where a relationship exists with the child payments table. For example, see the balance of all accounts that have at least 1 payment.
I understand I may need a separate cube for this, but I still can't see how my data source would need to be configured.
Ideally I'd rather not have to completely reformat the data into a fact/dimension snowflake schema, as I'm not entirely sure how to do this with the relational data I have; however, any suggestions on this would be welcome.
Thanks.
Update: Bounty added due to lack of interest...
My answer takes into account that you don't want to reformat your data into a traditional data warehouse schema. If it gets you further down the road then good for you but I suspect you'll run into more of these problems as you grow your project. It might be worth tinkering with how you might transform the data into a star schema before you need it.
I can suggest a few options. The first that comes to mind is to make a degenerate dimension in the accounts cube that is based on the payments fact table. The following example answers your "all accounts that have a payment" problem, but the approach should work for similar questions. I assumed an account statement date of the last day of each calendar month, so you'll want to count payments made in each calendar month.
create table accounts_fact
( account_id int not null,
statement_date datetime not null,
bal int not null,
constraint acc_pk primary key (account_id, statement_date)
)
create table payments_fact
( account_id int not null,
payment_date datetime not null,
amount money not null
)
insert into accounts_fact values (1, '20100131', 100)
insert into accounts_fact values (1, '20100228', 120)
insert into accounts_fact values (1, '20100331', 0)
insert into accounts_fact values (2, '20100131', 100)
insert into accounts_fact values (2, '20100228', 20)
insert into accounts_fact values (2, '20100331', 50)
insert into accounts_fact values (3, '20100131', 10)
insert into accounts_fact values (3, '20100228', 30)
insert into accounts_fact values (3, '20100331', 50)
insert into payments_fact values (1, '20100112', 50)
insert into payments_fact values (1, '20100118', 60)
insert into payments_fact values (1, '20100215', 70)
insert into payments_fact values (1, '20100318', 80)
insert into payments_fact values (1, '20100331', 90)
insert into payments_fact values (2, '20100112', 50)
insert into payments_fact values (2, '20100215', 60)
insert into payments_fact values (2, '20100320', 70)
insert into payments_fact values (3, '20100101', 50)
insert into payments_fact values (3, '20100118', 60)
insert into payments_fact values (3, '20100318', 70)
GO

create view dim_AccountPayments
as
select acc.account_id, acc.statement_date,
sum(case when pay.payment_date IS NULL THEN 0
else 1
end) as payment_count
from accounts_fact acc
left outer join payments_fact pay on acc.account_id = pay.account_id
and pay.payment_date >= dateadd(mm, -1, dateadd(dd, 1, acc.statement_date))
and pay.payment_date <= acc.statement_date
group by acc.account_id, acc.statement_date
GO

select * from dim_AccountPayments
This produces the following results:
account_id statement_date payment_count
1 2010-01-31 00:00:00.000 2
1 2010-02-28 00:00:00.000 1
1 2010-03-31 00:00:00.000 2
2 2010-01-31 00:00:00.000 1
2 2010-02-28 00:00:00.000 1
2 2010-03-31 00:00:00.000 1
3 2010-01-31 00:00:00.000 2
3 2010-02-28 00:00:00.000 0
3 2010-03-31 00:00:00.000 1
It should now be a doddle to make a payment count dimension in your accounts cube. For extra points, remove the group by and sum in the view to let SSAS do the aggregation; it just suited me to show the results table above. Use the view's SQL in your data source view if you don't have CREATE VIEW permission in the source database.
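For that variant, a minimal sketch of the un-aggregated view (same tables as above; the view name is just an illustration):

create view dim_AccountPaymentsDetail
as
select acc.account_id, acc.statement_date,
       -- 0/1 flag per joined row; SSAS can sum this into a payment count
       case when pay.payment_date IS NULL then 0 else 1 end as payment_flag
from accounts_fact acc
left outer join payments_fact pay on acc.account_id = pay.account_id
    and pay.payment_date >= dateadd(mm, -1, dateadd(dd, 1, acc.statement_date))
    and pay.payment_date <= acc.statement_date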
Option 2 would be to make payment count from the view above a measure in the accounts cube. You can do this similarly to the above solution except make your accounts fact use a view similar to dim_AccountPayments. This time you must group by all key fields and aggregate the measures in the database... Very ugly. I don't recommend it but it is possible.
If you go with option 1 then it's simple enough to make a named set in the account payments dimension called 'Made a payment this month', defined as the children of the All member filtered to remove the 0 member.
I hope I understood your question. I did have to make quite a few assumptions about your data structures but I hope it's useful.
Good luck.
I have variables in a column named Tier. Please see some examples below:
Tier
Tier 1 (Unspecified)
Tier 7 (Anti-client)
Tier 3 (Priority)
I would like the variables to be transformed like the below:
Tier
Tier 1
Tier 7
Tier 3
Would you know how to efficiently remove all the strings in brackets at the end of the variables?
Thanks
Chris
Is this what you need?
create table #table1
(id int
,Tier varchar(100)
)
insert into #table1 VALUES
(1, 'Tier 1 (Unspecified)'),
(2, 'Tier 7 (Anti-client)'),
(3, 'Tier 3 (Priority)')
select id,
substring(tier, 1, charindex('(', tier) - 1) as Tier
from #table1
You can use substring(tier, 1, charindex('(', tier) - 2) as Tier if you are sure that there is a space before the bracket.
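If some rows might not contain a bracket at all, charindex returns 0 and the substring call above would error; a hedged variant (sketch only) guards against that:

select id,
       case when charindex('(', tier) > 0
            then rtrim(substring(tier, 1, charindex('(', tier) - 1))
            else tier
       end as Tier
from #table1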
General Scenario:
I have an aggregated table per user and date with several measures.
The table stores up to 10 records per user and date (could be fewer, depending on the user's activity).
There is a column which is the sequence occurrence ordered by date.
Sample:
CREATE TABLE #Main (UserId int , DateId int , MeasureA numeric(20,2) , MeasureB numeric(20,2), PlayDaySeq int)
INSERT INTO #Main
VALUES (188, 20180522 ,75.00, 282287.00, 1),
(188, 20180518 ,250.00, 1431725.00, 2),
(188, 20180514 ,25.00, 35500.00, 3),
(188, 20180513 ,115.00, 67100.00, 4),
(188, 20180511 ,75.00, 10625.00, 5),
(188, 20180510 ,40.00, 2500.00, 6),
(188, 20180509 ,40.00, 750.00, 7),
(188, 20180508 ,160.00, 16250.00, 8),
(188, 20180507 ,135.00, 138200.00, 9),
(188, 20180507 ,150.00, 68875.00, 10)
The column PlayDaySeq is calculated as ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateId DESC),
and here is the table that will hold the new aggregated data for this user:
CREATE TABLE #Inc (UserId int , DateId int , MeasureA numeric(20,2) , MeasureB numeric(20,2), PlayDaySeq int)
INSERT INTO #Inc
VALUES (188, 20180523 ,225.00, 802921.00, 1)
Now a new record is available, so I used the following:
INSERT INTO #Main
SELECT *
FROM #Inc I
WHERE NOT EXISTS
(
SELECT 1
FROM #Main M
WHERE i.UserId = M.UserId
AND i.DateId = M.DateId
)
The question is:
I need to update the PlayDaySeq column so the new record will be 1 and all the rest will increment by 1,
and delete the records whose sequence becomes greater than 10.
What is the best way of doing that?
Keep in mind that the #Main table is pretty large (250M records).
I can update the sequence by running ROW_NUMBER again and then DELETE the ones that end up greater than 10,
but I'm looking for the most efficient way to do that.
Having one new row trigger an update of every other record does not sound like a good idea, however infrequently it happens. As the comment already mentioned, I don't see the need for such a column.
But you stated you have your reasons, so I will assume that is true.
My suggestion is to drop PlayDaySeq from the table and create a view with the following as an additional column:
ROW_NUMBER () OVER (PARTITION BY UserID ORDER BY DateId DESC) AS PlayDaySeq
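A minimal sketch of such a view, assuming the permanent table is called dbo.Main with the same columns as #Main above (the view and table names are illustrative only):

create view dbo.vw_MainWithSeq
as
select UserId,
       DateId,
       MeasureA,
       MeasureB,
       -- dbo.Main is a stand-in for your real 250M-row table
       ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateId DESC) AS PlayDaySeq
from dbo.Main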
Then whatever code was using that table should now use the view instead, which should keep the change minimal. You'll need to test this to see what the performance is like. Also, if you change the view to an indexed view, SQL Server stores the values in a table-like structure, so when you insert a new record it updates things for you automatically; again, you'll need to test the performance impact on inserts.
If I were you I would be more willing to try a different approach: instead of numbering the records 1, 2, 3, number them 100, 200, 300. Then, when the daily insert volume is small (say 20 records a day), you never need to update the remaining records; you just insert 11, 12 or 101, 102, which still keeps the order correct. A nightly job can then renumber the whole table back to 100, 200, 300 for a fresh start the next day, or the code can do it only when it runs out of numbers. But given the other meaning you say the column carries, this may not work for you at all.
I have three columns in MyTable and I need to calculate % of total by each group.
Code:
SELECT AppVintage, Strategy, Apps
FROM mytable
GROUP BY AppVintage,Strategy,Apps
Each date appears 4 times, with a different class for each line and then a total of applications for each line.
The table looks something like this:
Code for the sample data set:
CREATE TABLE mytable(
 AppVintage VARCHAR(6) NOT NULL
,Strategy VARCHAR(28) NOT NULL
,Apps INTEGER NOT NULL
,PRIMARY KEY (AppVintage, Strategy)
);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Nov-16','300',10197);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Nov-16','ORIG',29023);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Nov-16','400',7219);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Nov-16','500',9452);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Dec-16','300',12517);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Dec-16','ORIG',37762);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Dec-16','400',8992);
INSERT INTO mytable(AppVintage,Strategy,Apps) VALUES ('Dec-16','500',11229);
What I need is to add a column that calculates percentage of apps per strategy for each appvintage.
Is there a way to do this? Thanks in advance!
You can use window functions to do this (assuming SQL Server 2008+):
SELECT
AppVintage,
Strategy,
Apps,
-- multiply by 100.0 first so the division is decimal; Apps is an integer,
-- so Apps/SUM(Apps) on its own would truncate to 0
100.0 * Apps / SUM(Apps) OVER (PARTITION BY AppVintage) as Percent_Apps
FROM mytable
Let's say that we have to store information of different types of product in a database. However, these products have different specifications. For example:
Phone: cpu, ram, storage...
TV: size, resolution...
We want to store each specification in a column of a table, and all the products (whatever the type) must have a different ID.
To comply with that, I currently have one general table named Products (with an auto-increment ID) and one subordinate table for each type of product (ProductsPhones, ProductsTV...) holding the specifications and linked to the principal table with a foreign key.
I find this solution inefficient since the table Products has only one column (the auto incremented ID).
I would like to know if there is a better approach to solve this problem using relational databases.
The short answer is no. The relational model is a first-order logical model, meaning predicates can vary over entities but not over other predicates. That means dependent types and EAV models aren't supported.
EAV models are possible in SQL databases, but they don't qualify as relational since the domain of the value field in an EAV row depends on the value of the attribute field (and sometimes on the value of the entity field as well). Practically, EAV models tend to be inefficient to query and maintain.
PostgreSQL supports shared sequences which allows you to ensure unique auto-incremented IDs without a common supertype table. However, the supertype table may still be a good idea for FK constraints.
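For illustration, a rough sketch of that shared-sequence approach in PostgreSQL (table, column, and sequence names are made up for the example):

-- one sequence shared by all product tables, so every product gets a unique id
CREATE SEQUENCE product_id_seq;

CREATE TABLE phones (
    id         bigint PRIMARY KEY DEFAULT nextval('product_id_seq'),
    cpu        text,
    ram_gb     integer,
    storage_gb integer
);

CREATE TABLE tvs (
    id          bigint PRIMARY KEY DEFAULT nextval('product_id_seq'),
    size_inches integer,
    resolution  text
);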
You may find some use for your Products table later to hold common attributes like Type, Serial number, Cost, Warranty duration, Number in stock, Warehouse, Supplier, etc...
Having a Products table is fine. You can put there all the columns common across all types, like product name, description, cost, and price, just to name some, so it's not just the auto-increment ID. Having an internal ID of type int or bigint as the primary key is recommended. You may also add another field, "code" or whatever you want to call it, as a user-entered or user-friendly identifier, which is common with product management systems. Make sure you index it if it's used in searching or query criteria.
HTH
While this can't be done completely relationally, you can still normalize your tables some and make it a little easier to code around.
You can have these tables:
-- what are the products?
Products (Id, ProductTypeId, Name)
-- what kind of product is it?
ProductTypes (Id, Name)
-- what attributes can a product have?
Attributes (Id, Name, ValueType)
-- what are the attributes that come with a specific product type?
ProductTypeAttributes (Id, ProductTypeId, AttributeId)
-- what are the values of the attributes for each product?
ProductAttributes (ProductId, ProductTypeAttributeId, Value)
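A rough DDL sketch of those tables (column types and sizes are assumptions, not a definitive design):

CREATE TABLE ProductTypes (Id int PRIMARY KEY, Name varchar(50));

CREATE TABLE Products (
    Id            int PRIMARY KEY,
    ProductTypeId int REFERENCES ProductTypes(Id),
    Name          varchar(100)
);

CREATE TABLE Attributes (Id int PRIMARY KEY, Name varchar(50), ValueType varchar(20));

CREATE TABLE ProductTypeAttributes (
    Id            int PRIMARY KEY,
    ProductTypeId int REFERENCES ProductTypes(Id),
    AttributeId   int REFERENCES Attributes(Id)
);

CREATE TABLE ProductAttributes (
    ProductId              int REFERENCES Products(Id),
    ProductTypeAttributeId int REFERENCES ProductTypeAttributes(Id),
    Value                  varchar(100), -- stored as text; interpret using Attributes.ValueType
    PRIMARY KEY (ProductId, ProductTypeAttributeId)
);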
So for a Phone and TV:
ProductTypes (1, Phone) -- a phone type of product
ProductTypes (2, TV) -- a tv type of product
Attributes (1, ScreenSize, integer) -- how big is the screen
Attributes (2, Has4G, boolean) -- does it get 4g?
Attributes (3, HasCoaxInput, boolean) -- does it have an input for coaxial cable?
ProductTypeAttributes (1, 1, 1) -- a phone has a screen size
ProductTypeAttributes (2, 1, 2) -- a phone can have 4g
-- a phone does not have coaxial input
ProductTypeAttributes (3, 2, 1) -- a tv has a screen size
ProductTypeAttributes (4, 2, 3) -- a tv can have coaxial input
-- a tv does not have 4g (simple example)
Products (1, 1, CoolPhone) -- product 1 is a phone called coolphone
Products (2, 1, AwesomePhone) -- prod 2 is a phone called awesomephone
Products (3, 2, CoolTV) -- prod 3 is a tv called cooltv
Products (4, 2, AwesomeTV) -- prod 4 is a tv called awesometv
ProductAttributes (1, 1, 6) -- coolphone has a 6 inch screen
ProductAttributes (1, 2, True) -- coolphone has 4g
ProductAttributes (2, 1, 4) -- awesomephone has a 4 inch screen
ProductAttributes (2, 2, False) -- awesomephone has NO 4g
ProductAttributes (3, 3, 70) -- cooltv has a 70 inch screen
ProductAttributes (3, 4, True) -- cooltv has coax input
ProductAttributes (4, 3, 19) -- awesometv has a 19 inch screen
ProductAttributes (4, 4, False) -- awesometv has NO coax input
The reason this is not fully relational is that you'll still need to evaluate the value type (bool, int, etc) of the attribute before you can use it in a meaningful way in your code.
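For example, a query along these lines (using the names above; sketch only) would list every attribute of a single product, leaving the ValueType interpretation to your code:

SELECT p.Name AS Product,
       a.Name AS Attribute,
       a.ValueType,
       pa.Value
FROM Products p
JOIN ProductAttributes pa      ON pa.ProductId = p.Id
JOIN ProductTypeAttributes pta ON pta.Id = pa.ProductTypeAttributeId
JOIN Attributes a              ON a.Id = pta.AttributeId
WHERE p.Id = 1  -- CoolPhone in the example data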
I have a table with millions of rows. I have a column which should be set to true 90 days after the record was created. I have a createdDate column to track the created date. How to handle this for large tables?
Since the column should get updated depending on the created date, you need to place the query in a job and schedule it to run every day. You can use the query below to update the column values.
Link for adding SQL code to run through a daily job: how to schedule a job for sql query to run daily?
UPDATE T
SET YourColumn = 'yourvalue' -- placeholder column name and value
FROM TableName T
WHERE DateDiff(DD, T.CreateDate, Getdate()) > 90
Running an UPDATE like this on a schedule is fine even on a big database.
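As a side note, if CreateDate is indexed, a variant like the following (same placeholder names, roughly the same cutoff) keeps the date arithmetic off the column so the index can still be used:

UPDATE T
SET YourColumn = 'yourvalue'
FROM TableName T
WHERE T.CreateDate <= DATEADD(DAY, -90, GETDATE()) -- same intent: rows created more than ~90 days ago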
Why not simply use a computed column? That way the value will always be correct and yet there is no need for (costly) updates that need to be run on a regular basis.
CREATE TABLE t_my_table (row_id int IDENTITY (1,1) PRIMARY KEY,
value_x int NOT NULL,
value_y int NOT NULL,
createDate datetime NOT NULL,
is_90_days AS (CASE WHEN DateDiff(day, createDate, CURRENT_TIMESTAMP) >= 90 THEN Convert(bit, 1) ELSE Convert(bit, 0) END)
)
GO
INSERT INTO t_my_table
(value_x, value_y, createDate)
VALUES (10, 100, '18 aug 2016'),
(20, 200, '25 nov 2016'),
(30, 300, '12 dec 2016'),
(40, 400, '14 may 2017')
GO
SELECT * FROM t_my_table
To add the computed column to an already existing table, use this syntax:
ALTER TABLE t_my_table ADD is_90_days AS (CASE WHEN DateDiff(day, createDate, CURRENT_TIMESTAMP) >= 90 THEN Convert(bit, 1) ELSE Convert(bit, 0) END)
Adding a computed column is pretty much instantaneous since it's only a meta-data operation. The actual value is calculated upon SELECTion of the record. Since the DateDiff operation is pretty quick you'll hardly notice it.