Ensure data integrity in SQL Server

I have to make some changes in a small system that stores data in one table as follows:
TransId TermId StartDate EndDate IsActiveTerm
------- ------ ---------- ---------- ------------
1 1 2007-01-01 2007-12-31 0
1 2 2008-01-01 2008-12-31 0
1 3 2009-01-01 2009-12-31 1
1 4 2010-01-01 2010-12-31 0
2 1 2008-08-05 2009-08-04 0
2 2 2009-08-05 2010-08-04 1
3 1 2009-07-31 2010-07-30 1
3 2 2010-07-31 2011-07-30 0
where the rules are:
StartDate must be the previous term's EndDate + 1 day (terms cannot overlap)
there are many terms per transaction
term length is from 1 to n days (I made it 1 year to keep this example simple)
NOTE: IsActiveTerm is a computed column that depends on the current date, so it is not deterministic.
I need to ensure that terms do not overlap. In other words, I want to enforce this condition even when inserting or updating multiple rows at once.
What I am thinking of is adding INSTEAD OF triggers (for both INSERT and UPDATE), but this would require cursors, as I need to cope with multiple rows.
Does anyone have a better idea?

You can find pretty much everything about temporal databases in: Richard T. Snodgrass, "Developing Time-Oriented Database Applications in SQL", Morgan Kaufmann (2000), which I believe is out of print but can be downloaded via the link on his publication list.

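I've got a working solution:
CREATE TRIGGER TransTerms_EnsureCon ON TransTerms
FOR INSERT, UPDATE, DELETE AS
BEGIN
    -- Look for any pair of consecutive terms in the same transaction where
    -- the next term does not start the day after the previous term ends.
    IF EXISTS (SELECT *
               FROM TransTerms pT
               INNER JOIN TransTerms nT
                   ON pT.TransId = nT.TransId
                  AND nT.TermId = pT.TermId + 1
               WHERE nT.StartDate != DATEADD(d, 1, pT.EndDate)
                 AND pT.EndDate > pT.StartDate
                 AND nT.EndDate > nT.StartDate)
    BEGIN
        -- Severity 16 reports a real error; BEGIN...END keeps the
        -- rollback conditional on a violation being found.
        RAISERROR('Transaction violates sequenced CONSTRAINT', 16, 2)
        ROLLBACK TRANSACTION
    END
END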
P.S. Many thanks wallenborn!
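The rule the trigger enforces can also be checked outside the database, for example when validating a batch before inserting it. The following is a minimal Python sketch, not the poster's method: the function name find_gaps_or_overlaps and the tuple layout (TransId, TermId, StartDate, EndDate) are assumptions made for illustration, and "consecutive" is taken to mean adjacent in TermId order within one transaction.

```python
from datetime import date, timedelta
from itertools import groupby

def find_gaps_or_overlaps(rows):
    """Return (trans_id, prev_term_id, next_term_id) for every pair of
    consecutive terms where next.StartDate != prev.EndDate + 1 day.
    Each row is a hypothetical tuple (trans_id, term_id, start, end)."""
    violations = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for trans_id, group in groupby(ordered, key=lambda r: r[0]):
        terms = list(group)
        # Compare each term with the one that follows it.
        for prev, nxt in zip(terms, terms[1:]):
            if nxt[2] != prev[3] + timedelta(days=1):
                violations.append((trans_id, prev[1], nxt[1]))
    return violations

# Rows taken from the question's table (transactions 1 and 2).
rows = [
    (1, 1, date(2007, 1, 1), date(2007, 12, 31)),
    (1, 2, date(2008, 1, 1), date(2008, 12, 31)),
    (2, 1, date(2008, 8, 5), date(2009, 8, 4)),
    (2, 2, date(2009, 8, 5), date(2010, 8, 4)),
]
# A made-up bad batch: term 2 of transaction 3 starts three days late.
bad_rows = rows + [
    (3, 1, date(2009, 7, 31), date(2010, 7, 30)),
    (3, 2, date(2010, 8, 2), date(2011, 7, 30)),
]

print(find_gaps_or_overlaps(rows))      # []
print(find_gaps_or_overlaps(bad_rows))  # [(3, 1, 2)]
```

Like the trigger, this checks whole sets of rows at once, so no row-by-row cursor logic is needed.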

T-SQL Conditional Max or Sum based on group criteria

I have not been able to figure this out and I want to find out the most efficient way of doing this. I have this table (partial records shown):
prefix FolioNumber CmmtAmount CmmtNumber
------- ----------- ---------- ----------
100981 10098100005 1 100981100
100981 10098100006 3 100981100
100981 10098100007 9 100981100
100981 10098100009 2 100981100
100981 10098100010 6 100981100
600499 60049900001 0 NULL
600499 60049900003 2 600499300
600499 60049900004 5 600499500
From that table I need to come up with this result set:
prefix CmmtAmount
------ ----------
100981 9
600499 7
This is the logic:
For each prefix:
if there are multiple but identical CmmtNumber records, pick the one with the MAX(CmmtAmount)
if there are multiple but different CmmtNumber records, display the SUM(CmmtAmount) for all those records.
I've been looking at OVER/PARTITION but can't come up with the right query. Please help! Thanks.

Use CASE WHEN:
select
prefix,
case when min(CmmtNumber) = max(CmmtNumber) then max(CmmtAmount)
else sum(CmmtAmount)
end as CmmtAmount
from mytable
group by prefix
order by prefix;
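To see how this behaves on the sample rows, the same query can be run against an in-memory SQLite database from Python. This is only a sketch under assumptions: SQLite stands in for SQL Server, the table name mytable comes from the answer, and the column types are guesses. As in SQL Server, MIN and MAX ignore NULLs, which is why the NULL CmmtNumber row does not break the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "prefix TEXT, FolioNumber TEXT, CmmtAmount INTEGER, CmmtNumber TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("100981", "10098100005", 1, "100981100"),
        ("100981", "10098100006", 3, "100981100"),
        ("100981", "10098100007", 9, "100981100"),
        ("100981", "10098100009", 2, "100981100"),
        ("100981", "10098100010", 6, "100981100"),
        ("600499", "60049900001", 0, None),
        ("600499", "60049900003", 2, "600499300"),
        ("600499", "60049900004", 5, "600499500"),
    ],
)

query = """
select
  prefix,
  case when min(CmmtNumber) = max(CmmtNumber) then max(CmmtAmount)
       else sum(CmmtAmount)
  end as CmmtAmount
from mytable
group by prefix
order by prefix
"""
result = conn.execute(query).fetchall()
print(result)  # [('100981', 9), ('600499', 7)]
```

One edge case worth knowing: if every CmmtNumber in a group is NULL, the comparison is NULL and the ELSE branch (the sum) is used.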

I'm not sure I understand your requirements completely, but I'll give it a try:
SELECT Prefix, MAX(CmmtAmount)
FROM YOUR_TABLE
GROUP BY Prefix
HAVING MIN(CmmtNumber)=MAX(CmmtNumber)
UNION
SELECT Prefix, SUM(CmmtAmount)
FROM YOUR_TABLE
GROUP BY Prefix
HAVING MIN(CmmtNumber) != MAX(CmmtNumber)
I hope this works for you.

Advice on how best to manage this dataset?

New to SAS and would appreciate advice and help on how best to handle this data management situation.
I have a dataset in which each observation represents a client. Each client has a "description" variable which could include either a comprehensive assessment, treatment or discharge. I have created 3 new variables to flag each observation if they contain one of these.
So for example:
treatment_yes = 1 if description contains "tx" or "treatment"
dc_yes = 1 if description contains "dc", "d/c" or "discharge"
ca_yes = 1 if description contains "comprehensive assessment", "ca" or "comprehensive ax"
My end goal is to have a new dataset of clients that have gone through a Comprehensive Assessment, Treatment and Discharge.
I'm a little stumped as to what my next move should be here. I have all my variables flagged for clients. But there could be duplicate observations just because a client could have come in many times. So for example:
Client_id treatment_yes ca_yes dc_yes
1234 0 1 1
1234 1 0 0
1234 1 0 1
All I really care about is whether, for a particular client, the variables treatment_yes, ca_yes and dc_yes are all non-zero (i.e., each has at least one "1"). They could have more than one "1", as long as each is flagged at least once.
I was thinking my next step might be to collapse the data (how do you do this?) for each unique client ID and sum treatment_yes, dc_yes and ca_yes for each client.
Does that work?
If so, how the heck do I accomplish this? Where do I start?
thanks everyone!

I think the easiest thing to do at this point is to use a proc sql step to find the max value of each of your three variables, aggregated by client_id:
data temp;
input Client_id $ treatment_yes ca_yes dc_yes;
datalines;
1234 0 1 1
1234 1 0 0
1234 1 0 1
;
run;
proc sql;
create table temp_collapse as select distinct
client_id, max(treatment_yes) as treatment_yes,
max(ca_yes) as ca_yes, max(dc_yes) as dc_yes
from temp
group by client_id;
quit;
A better overall approach would be to use the dataset you used to create the _yes variables and do something like max(case when desc = "tx" then 1 else 0 end) as treatment_yes etc., but since you're still new to SAS and understand what you've done so far, I think the above approach is totally sufficient.
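The collapse-then-filter logic is language independent. Here is a minimal Python sketch of it, offered only to illustrate the idea and not as a replacement for the proc sql step: the function names collapse_flags and completed_all are hypothetical, and client 5678 is a made-up second client added so the final filter has something to exclude.

```python
from collections import defaultdict

def collapse_flags(rows):
    """Collapse to one record per client, keeping the max of each flag.
    Each row is (client_id, treatment_yes, ca_yes, dc_yes)."""
    collapsed = defaultdict(lambda: [0, 0, 0])
    for client_id, treatment_yes, ca_yes, dc_yes in rows:
        flags = collapsed[client_id]
        flags[0] = max(flags[0], treatment_yes)
        flags[1] = max(flags[1], ca_yes)
        flags[2] = max(flags[2], dc_yes)
    return {cid: tuple(flags) for cid, flags in collapsed.items()}

def completed_all(collapsed):
    """Clients flagged at least once for treatment, assessment and discharge."""
    return [cid for cid, flags in collapsed.items() if all(flags)]

rows = [
    ("1234", 0, 1, 1),
    ("1234", 1, 0, 0),
    ("1234", 1, 0, 1),
    ("5678", 1, 0, 0),  # hypothetical client with treatment only
]
collapsed = collapse_flags(rows)
print(collapsed)                 # {'1234': (1, 1, 1), '5678': (1, 0, 0)}
print(completed_all(collapsed))  # ['1234']
```

Taking the max works because the flags are 0/1; summing, as suggested in the question, would also work if the filter is "greater than 0" on each column.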

The following code allows you to preserve other variables from your original dataset. I have added two variables (var1 and var2) for illustrative purposes:
data temp;
input Client_id $ treatment_yes ca_yes dc_yes var1 var2 $;
datalines;
1234 0 1 1 10 A
1234 1 0 0 11 B
1234 1 0 1 12 C
;
run;
Join the dataset with itself so that each row of a client_id in the original dataset is merged with its corresponding row in an aggregated dataset constructed in a subquery.
proc sql;
create table want as
select *
from temp as a
left join (select client_id,
max(treatment_yes) as max_treat,
max(ca_yes) as max_ca,
max(dc_yes) as max_dc
from temp
group by client_id) as b
on a.client_id=b.client_id;
quit;

Can I set rules for string comparison in SQL? (or do I need to hardcode using CASE WHEN)

I need to compare ratings at two points in time and indicate whether the change was upwards, downwards, or whether the rating stayed the same.
For example:
This would be a table with four columns:
ID T0 T0+1 Status
1 AAA AA Lower
2 BB A Higher
3 C C Same
However, this does not work when applying regular string comparison, because in SQL
A<B
B<BBB
I need
A>B
B<BBB
So my order(highest to lowest): AAA,AA,A,BBB,BB,B
SQL order(highest to lowest): BBB,BB,B,AAA,AA,A
Now I have 2 options in mind, but I wonder if someone knows a better one:
1) Use CASE WHEN statements for all the possibilities of ratings going up and down (I have more values than indicated above)
CASE WHEN T0=T0+1 then 'Same'
WHEN T0='AAA' and T0+1<>'AAA' then 'Lower'
....address all other options for rating going down
ELSE 'Higher'
However, this generates a very large number of CASE WHEN statements.
2) My other option requires generating 2 tables. In table 1 I use case when statements to assign values/rank to the ratings.
For example:
CASE WHEN T0='AAA' then 6
WHEN T0='AA' then 5
WHEN T0='A' then 4
WHEN T0='BBB' then 3
WHEN T0='BB' then 2
WHEN T0='B' then 1
END
The same for T0+1.
Then in table 2 I use a regular comparison between column T0 and column T0+1 on the numeric values.
However, I am looking for a solution where I can do it in one table (with as few lines as possible), and ideally never really show the ranking column.
I think a nested statement would be the best option, but it did not work for me.
Does anybody have suggestions?
I use SQL Server 2008.

If you are working with credit ratings, it is very likely that this is not just about AAA > AA or BBB > BB.
Depending on the agency, it could also be AA+ or Aa1 for long term, F1+ for short term, or something else in different contexts or with other agencies.
It is also often required to convert ratings from one agency to another agency's scale.
Therefore it is better to use a mapping table such as:
Id | Rating
0 | AAA
1 | AA+
2 | AA
3 | AA-
4 | A+
5 | A
6 | A-
7 | BBB+
Using this table, you only have to join the rating in your data table with the rating in the mapping table:
SELECT d.Rating_T0, d.Rating_T1,
CASE WHEN d.Rating_T0 = d.Rating_T1 THEN '='
WHEN m0.id < m1.id THEN '<'
WHEN m0.id > m1.id THEN '>'
END
FROM yourData d
INNER JOIN RatingMapping m0
ON m0.Rating= d.Rating_T0
INNER JOIN RatingMapping m1
ON m1.Rating= d.Rating_T1
If you only store the Rating id in your data table, you will not only save space (1 byte for tinyint versus up to 4 chars) but will also be able to compare without the JOIN to the mapping table.
SELECT d.Rating_Id0, d.Rating_Id1,
CASE WHEN d.Rating_Id0 = d.Rating_Id1 THEN '='
WHEN d.Rating_Id0 < d.Rating_Id1 THEN '<'
WHEN d.Rating_Id0 > d.Rating_Id1 THEN '>'
END
FROM yourData d
The JOIN would only be required when you want to display the actual Rating value, such as AAA for Rating_ID = 0.
You could also add an agency_Id to the mapping table. This way, you can easily choose which agency's notation you want to display and easily convert between Agency 1 and Agency 2 or Agency 3 (e.g. Id 1 => S&P, Id 2 => Fitch, Id 3 => ...)
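The mapping-table idea boils down to comparing ranks instead of strings. A minimal Python sketch of that comparison follows, assuming the ordering given in the question (AAA, AA, A, BBB, BB, B); placing C last is an assumption, since the question only shows C in its example table, and the names RATING_ORDER and rating_change are hypothetical. The labels follow the question's Status column rather than the '<' and '>' symbols used in the queries above.

```python
# Highest rating first; a lower index means a better rating.
RATING_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B", "C"]
RANK = {rating: position for position, rating in enumerate(RATING_ORDER)}

def rating_change(t0, t1):
    """Classify the move from rating t0 to rating t1."""
    rank0, rank1 = RANK[t0], RANK[t1]
    if rank1 == rank0:
        return "Same"
    # Moving to a smaller index means moving up the scale.
    return "Higher" if rank1 < rank0 else "Lower"

print(rating_change("AAA", "AA"))  # Lower
print(rating_change("BB", "A"))    # Higher
print(rating_change("C", "C"))     # Same
```

The dictionary plays the role of the mapping table: the rank is looked up for the comparison and never has to be shown.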

Fill per-second data in q/kdb+

I have a CSV file with some high-frequency stock price data, and I'd like to get per-second price data from the table.
In each file, there are columns named date, time, symbol, price, volume, etc.
There are some seconds with no trading, so data is missing for those seconds.
How can I fill the missing data in q to get complete per-second data from 9:30 to 16:00? If a price is missing, just use the most recent price as the price for that second.
I'm considering writing a loop, but I don't know exactly how to do that.

Simplifying a little, I'll assume you have some random timestamps in your dataset like this:
time price
--------------------------------------
2015.01.20D22:42:34.776607000 7
2015.01.20D22:42:34.886607000 3
2015.01.20D22:42:36.776607000 4
2015.01.20D22:42:37.776607000 8
2015.01.20D22:42:37.886607000 7
2015.01.20D22:42:39.776607000 9
2015.01.20D22:42:40.776607000 4
2015.01.20D22:42:41.776607000 9
so there are some missing seconds there. I'm going to call this table t. So if you do a by-second type of query, obviously the seconds that are missing are still missing:
q)select max price by time.second from t
second | price
--------| -----
22:42:34| 7
22:42:36| 4
22:42:37| 8
22:42:39| 9
22:42:40| 4
22:42:41| 9
To get missing seconds, you have to join a list of nulls. In this case we know the data goes from 22:42:34 to 22:42:41, but in reality you'll have to find the min/max time and use that to create a temporary "null" table to join against:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N)
second price
--------------
22:42:34
22:42:35
22:42:36
22:42:37
22:42:38
22:42:39
22:42:40
22:42:41
Then left join:
q)([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35
22:42:36 4
22:42:37 8
22:42:38
22:42:39 9
22:42:40 4
22:42:41 9
You can use fills or whatever your favourite filling heuristic is after that.
q)fills `second xasc asc ([] second:22:42:34 + til 1+`int$22:42:41-22:42:34 ; price:(1+`int$22:42:41-22:42:34)#0N) lj select max price by time.second from t
second price
--------------
22:42:34 7
22:42:35 7
22:42:36 4
22:42:37 8
22:42:38 8
22:42:39 9
22:42:40 4
22:42:41 9
(Note the sort on second before fills!)
By the way for larger tables this will be much faster than a loop. Loops in q are generally a bad idea.
EDIT
You could use a comma join too, both tables need to be keyed on the second column
t,t1
(where t1 is the null-filled table keyed on second)
I haven't tested it, but I suspect it would be slightly faster than the lj version.
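For readers who do not use q, the three steps above (aggregate per second, build the full range of seconds, forward-fill) can be sketched in Python. This is an illustration of the logic only, under the assumption that ticks arrive as (datetime, price) pairs; the function name fill_seconds is hypothetical, and the explicit loop is fine here even though vectorised operations are preferable in q.

```python
from datetime import datetime, timedelta

def fill_seconds(ticks):
    """Max price per second, for every second between the first and last
    tick, with missing seconds filled from the previous second."""
    per_second = {}
    for ts, price in ticks:
        second = ts.replace(microsecond=0)
        per_second[second] = max(price, per_second.get(second, price))
    current, end = min(per_second), max(per_second)
    filled, last_price = [], None
    while current <= end:
        # Keep the previous price when this second has no trade.
        last_price = per_second.get(current, last_price)
        filled.append((current, last_price))
        current += timedelta(seconds=1)
    return filled

def tick(second, micro, price):
    return (datetime(2015, 1, 20, 22, 42, second, micro), price)

# The sample table t from above (nanoseconds truncated to microseconds).
ticks = [
    tick(34, 776607, 7), tick(34, 886607, 3), tick(36, 776607, 4),
    tick(37, 776607, 8), tick(37, 886607, 7), tick(39, 776607, 9),
    tick(40, 776607, 4), tick(41, 776607, 9),
]
prices = [price for _, price in fill_seconds(ticks)]
print(prices)  # [7, 7, 4, 8, 8, 9, 4, 9]
```

The printed prices match the filled table shown above for 22:42:34 through 22:42:41.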

Using aj, which is one of the most powerful features of kdb+:
q)data
sym time price size
----------------------------
MS 10:24:04 93.35974 8
MS 10:10:47 4.586986 1
APPL 10:50:23 0.7831685 1
GOOG 10:19:52 49.17305 0
The in-memory table needs to be sorted by sym and time, with the g# attribute applied to the sym column:
q)data:update `g#sym from `sym`time xasc data
q)meta data
c | t f a
-----| -----
sym | s g
time | v
price| f
size | j
Creating a rack table with one row per second per sym:
q)rack: `sym`time xasc (select distinct sym from data) cross ([] time:{x[0]+til `int$x[1]-x[0]}(min;max)@\:data`time)
Using aj to join the data :
q)aj[`sym`time; rack; data]
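What the as-of join does for each rack row is "take the last trade at or before this time". A minimal Python sketch of that lookup follows, assuming a single symbol and times expressed as integer seconds; the function name asof_join and the sample values are hypothetical, and this mirrors the idea rather than reproducing aj itself.

```python
from bisect import bisect_right

def asof_join(rack_times, trades):
    """For each rack time, return the price of the last trade at or before
    it, or None if there is no earlier trade.
    trades is a list of (time, price) sorted by time."""
    trade_times = [t for t, _ in trades]
    joined = []
    for rack_time in rack_times:
        # Index of the first trade strictly after rack_time.
        idx = bisect_right(trade_times, rack_time)
        joined.append((rack_time, trades[idx - 1][1] if idx else None))
    return joined

trades = [(34, 7.0), (36, 4.0), (37, 8.0)]  # made-up (second, price) pairs
rack = list(range(33, 39))                  # every second from 33 to 38
joined = asof_join(rack, trades)
print(joined)
# [(33, None), (34, 7.0), (35, 7.0), (36, 4.0), (37, 8.0), (38, 8.0)]
```

The binary search is why the trade list has to be sorted by time, which corresponds to the sorting requirement stated above for the q table.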

In SSRS, how can I add a row to aggregate all the rows that don't match a filter?

I'm working on a report that shows transactions grouped by type.
Type Total income
------- --------------
A 575
B 244
C 128
D 45
E 5
F 3
Total 1000
I only want to provide details for transaction types that represent more than 10% of the total income (i.e. A-C). I'm able to do this by applying a filter to the group:
Type Total income
------- --------------
A 575
B 244
C 128
Total 1000
What I want to display is a single row just above the total row that has a total for all the types that have been filtered out (i.e. the sum of D-F):
Type Total income
------- --------------
A 575
B 244
C 128
Other 53
Total 1000
Is this even possible? I've tried using running totals and conditionally hidden rows within the group. I've tried Iif inside Sum. Nothing quite seems to do what I need and I'm butting up against scope issues (e.g. "the value expression has a nested aggregate that specifies a dataset scope").
If anyone can give me any pointers, I'd be really grateful.
EDIT: Should have specified, but at present the dataset actually returns individual transactions:
ID Type Amount
---- ------ --------
1 A 4
2 A 2
3 B 6
4 A 5
5 B 5
The grouping is done using a row group in the tablix.

One solution is to solve that in the SQL source of your dataset instead of inside SSRS:
SELECT
CASE
WHEN CAST([Total income] AS FLOAT) / SUM([Total income]) OVER (PARTITION BY 1) >= 0.10 THEN [Type]
ELSE 'Other'
END AS [Type]
, [Total income]
FROM Source_Table
See also SQL Fiddle
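The bucketing rule itself is simple enough to check by hand. Here is a minimal Python sketch of it, assuming the per-type totals have already been computed and using the same "at least 10%" threshold as the query above; the function name summarize_with_other is hypothetical.

```python
def summarize_with_other(totals, threshold=0.10):
    """Keep types whose share of the grand total is at least `threshold`;
    fold the rest into a single 'Other' row, then append the total."""
    grand_total = sum(totals.values())
    rows, other = [], 0
    for type_name, amount in totals.items():
        if amount / grand_total >= threshold:
            rows.append((type_name, amount))
        else:
            other += amount
    rows.append(("Other", other))
    rows.append(("Total", grand_total))
    return rows

# Per-type totals from the question.
totals = {"A": 575, "B": 244, "C": 128, "D": 45, "E": 5, "F": 3}
report = summarize_with_other(totals)
print(report)
# [('A', 575), ('B', 244), ('C', 128), ('Other', 53), ('Total', 1000)]
```

In the report, the equivalent is to group the tablix on the relabelled Type column returned by the query, so that D, E and F collapse into one 'Other' group.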

Try to solve this in SQL, see SQL Fiddle.
SELECT I.*
,(
CASE
WHEN I.TotalIncome >= (SELECT Sum(I2.TotalIncome) / 10 FROM Income I2) THEN 10
ELSE 1
END
) AS TotalIncomePercent
FROM Income I
After this, create two sum groups.
SUM(TotalIncome * TotalIncomePercent) / 10
SUM(TotalIncome * TotalIncomePercent)
A second approach may be to use a calculated column in SSRS. Try creating a calculated column with the CASE expression above. If SSRS allows you to create it, you can use it in the same way as in the SQL approach.

1) To show only income greater than 10%, use a row visibility condition like
=iif(reportitems!total_income.value/10<= I.totalincome,true,false)
Here reportitems!total_income.value is the value of the textbox holding the total of all income (the total of the detail group), and I.totalincome is the current field value.
2) Add one more row outside the detail group to show the other income, and use this expression:
= reportitems!total_income.value-sum(iif(reportitems!total_income.value/10<= I.totalincome,I.totalincome,nothing))
