This is a simplified example of what I want to do. Assume there is a table named contractor that looks like this:
name | paid_adjustment_amount | adj_date
Bob | 1000 | 4/7/2016
Mary | 2000 | 4/8/2016
Bill | 5000 | 4/8/2016
Mary | 4000 | 4/10/2016
Bill | (1000) | 4/12/2016
Ann | 3000 | 4/30/2016
There is a view of the contractor table, let's call it v_sum, that is just a SUM of the paid_adjustment_amount grouped by name. So it looks like this:
name | total_paid_amount
Bob | 1000
Mary | 6000
Bill | 4000
Ann | 3000
Finally, there is another table called to_date_payment that looks like this:
name | paid_to_date_amount
Bob | 1000
Mary | 8000
Bill | 3000
Ann | 3000
Joe | 4000
I want to compare the information in the to_date_payment table to the v_sum view and insert a new row in the contractor table to show an adjustment. Something like this:
INSERT INTO contractor
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount - v_sum.total_paid_amount,
GETDATE()
FROM to_date_payment
LEFT JOIN v_sum ON to_date_payment.name = v_sum.name
WHERE to_date_payment.paid_to_date_amount - v_sum.total_paid_amount <> 0
OR v_sum.name IS NULL
Are there any issues with using a view for this? My understanding, please correct me if I'm wrong, is that a view is just a result set of a query. And, since the view is of the table I'm inserting new records into, I'm afraid there could be data integrity problems.
Thanks for the help!
In order to fully understand what you are doing, you should also provide the definition for v_sum. Generally speaking, views might provide some advantages, especially when indexed. More details can be found here and here.
Simple use of views does not provide performance benefits, but views are very good at providing an abstraction over tables.
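For reference, a plain (non-indexed) view matching the description in the question could look like the sketch below. The actual definition was not posted, so the body here is an assumption. A plain view stores no data of its own: its query is expanded into the statement that references it, so it always reflects the rows that are in contractor when that statement runs.

```sql
-- Assumed definition of v_sum; the real one was not posted.
CREATE VIEW v_sum
AS
SELECT name,
       SUM(paid_adjustment_amount) AS total_paid_amount
FROM contractor
GROUP BY name;
```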
In your particular case, I do not see any problem in JOINing with the view, but I would worry about potential problems related to:
1) JOIN using VARCHARs instead of integer - ON to_date_payment.name = v_sum.name - if possible, try to JOIN on integer (ids or foreign keys ids) values, as it is faster (indexes applied on integer columns will have a smaller key, comparisons are slightly faster).
2) OR in queries - usually leads to performance problems. One thing to try is to change the SELECT like this:
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount - v_sum.total_paid_amount,
GETDATE()
FROM to_date_payment
JOIN v_sum ON to_date_payment.name = v_sum.name
WHERE to_date_payment.paid_to_date_amount - v_sum.total_paid_amount <> 0
UNION ALL
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount, -- or NULL if this is really intended
GETDATE()
FROM to_date_payment
-- NOT EXISTS is usually faster than LEFT JOIN ... IS NULL
WHERE NOT EXISTS (SELECT 1 FROM v_sum V WHERE V.name = to_date_payment.name)
3) Possible undesired result - by default, arithmetic involving NULL returns NULL. When there is no match in v_sum, then v_sum.total_paid_amount is NULL and to_date_payment.paid_to_date_amount - v_sum.total_paid_amount will evaluate to NULL. Is this correct? Maybe to_date_payment.paid_to_date_amount - ISNULL(v_sum.total_paid_amount, 0) is intended.
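If the ISNULL variant is what is intended, the LEFT JOIN version needs no OR at all, because a name missing from v_sum is treated as a total of 0. A minimal sketch, assuming the contractor columns are named as in the question (the explicit column list and the aliases p and v are additions for illustration):

```sql
INSERT INTO contractor (name, paid_adjustment_amount, adj_date)
SELECT p.name,
       p.paid_to_date_amount - ISNULL(v.total_paid_amount, 0),
       GETDATE()
FROM to_date_payment AS p
LEFT JOIN v_sum AS v ON v.name = p.name
-- A name missing from v_sum counts as a total of 0, so no OR is needed.
WHERE p.paid_to_date_amount - ISNULL(v.total_paid_amount, 0) <> 0;
```

With the sample data from the question this would insert Mary 2000, Bill -1000 and Joe 4000; Bob and Ann already match and are skipped.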
I have two tables:
Account & Amount column
list of related accounts
Data samples:
Account | Amount
--------+---------
001 | $100
002 | $150
003 | $200
004 | $300
Account | Related Account
--------+------------------
001 | 002
002 | 003
003 | 002
My goal is to be able to aggregate all related accounts. From table two - 001,002 & 003 are actually all related to each other. What I would like to be able to do is to get a sum of all related accounts. Possibly ID 001 to 003 as Account #1, so I can aggregate them.
Result below
ID | Account | Amount
-----+-----------+--------
#1 | 001 | $100
#1 | 002 | $150
#1 | 003 | $200
#2 | 004 | $300
I can then manipulate the above table as below (final result)
ID | Amount
-----+--------
#1 | $450
#2 | $300
I tried doing a join, but it doesn't quite achieve what I want. I still have a problem relating account 001 with 003 (they are indirectly related because 002 is related to both 001 and 003).
If anyone can point me in the right direction, it would be much appreciated.
Well, you really made this harder than it should be.
If you could change the data in the second table, so it will not contain reversed duplicates (in your sample data - 2,3 and 3,2) it would simplify the solution.
If you could refactor both tables into a single table, where the related column is a self referencing nullable foreign key, it would simplify the solution even more.
Let's assume for a minute you can't do either, and you have to work with the data as provided. So the first thing you want to do is to ignore the reversed duplicates in the second table. This can be done using a common table expression and a couple of case expressions.
First, create and populate sample tables (Please save us this step in your future questions):
DECLARE @TAccount AS TABLE
(
Account int,
Amount int
)
INSERT INTO @TAccount (Account, Amount) VALUES
(1, 100),
(2, 150),
(3, 200),
(4, 300)
DECLARE @TRelatedAccounts AS TABLE
(
Account int,
Related int
)
INSERT INTO @TRelatedAccounts (Account, Related) VALUES
(1,2),
(2,3),
(3,2)
You want to get only the first two records from the @TRelatedAccounts table.
This is the AccountAndRelated CTE.
Now, you want to left join the @TAccount table with the results of this query, so for each Account we will have the Account, the Amount, and the Related Account, or NULL if the account is not related to any other account or it's the first in the relationship chain.
This is the CTERecursiveBase CTE.
Then, based on that, you can create a recursive CTE (called CTERecursive), and finally select the sum of amount from the recursive CTE based on the root of the recursion.
Here is the entire script:
;WITH AccountAndRelated AS
(
-- Normalize each pair so the larger account is always Account; DISTINCT then drops reversed duplicates
SELECT DISTINCT CASE WHEN Account > Related THEN Account ELSE Related END AS Account,
CASE WHEN Account > Related THEN Related ELSE Account END AS Related
FROM @TRelatedAccounts
)
, CTERecursiveBase AS
(
SELECT A.Account, R.Related, A.Amount
FROM @TAccount AS A
LEFT JOIN AccountAndRelated AS R ON A.Account = R.Account
)
, CTERecursive AS
(
-- Anchor: accounts that start a chain
SELECT Account AS Id, Account, Related, Amount
FROM CTERecursiveBase
WHERE Related IS NULL
UNION ALL
-- Recursive step: accounts related to one already in the result
SELECT R.Id, B.Account, B.Related, B.Amount
FROM CTERecursiveBase AS B
JOIN CTERecursive AS R ON B.Related = R.Account
)
SELECT Id, SUM(Amount) AS TotalAmount
FROM CTERecursive
GROUP BY Id
Results:
Id TotalAmount
1 450
4 300
You can see a live demo on rextester.
Now, let's assume you can modify the data in the second table. You can use the AccountAndRelated CTE to get only the records you need to keep in the @TRelatedAccounts table. This means you can then skip the AccountAndRelated CTE and use @TRelatedAccounts directly in the CTERecursiveBase CTE.
You can see a live demo of that as well.
Finally, let's assume you can refactor your database. In that case, I would recommend merging the two tables into one, so your @TAccount table would look like this:
Account Amount Related
1 100 NULL
2 150 1
3 200 2
4 300 NULL
Then you only need the recursive cte.
Here is a live demo of that option as well.
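As a rough sketch of that last option, assuming the merged table is a table variable with the three columns shown above, only the recursive CTE remains:

```sql
DECLARE @TAccount AS TABLE
(
    Account int,
    Amount  int,
    Related int NULL  -- previous account in the chain, NULL for a root
);

INSERT INTO @TAccount (Account, Amount, Related) VALUES
(1, 100, NULL),
(2, 150, 1),
(3, 200, 2),
(4, 300, NULL);

;WITH CTERecursive AS
(
    -- Anchor: accounts that start a chain
    SELECT Account AS Id, Account, Related, Amount
    FROM @TAccount
    WHERE Related IS NULL
    UNION ALL
    -- Recursive step: accounts whose Related account is already in the result
    SELECT R.Id, A.Account, A.Related, A.Amount
    FROM @TAccount AS A
    JOIN CTERecursive AS R ON A.Related = R.Account
)
SELECT Id, SUM(Amount) AS TotalAmount
FROM CTERecursive
GROUP BY Id;
```

This should return the same two rows as the full script: Id 1 with 450 and Id 4 with 300.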
I am at a bit of a standstill here. I have a simple left outer join to a table that is returning an ID.
My code is as follows:
Select distinct TenantID
,Name
,Name2
,TenantNumber
,Cashname
From Tenants
LEFT OUTER JOIN tMoney
on TenantNumber = CashNumber
and tMoney.CashName = Tenants.Name2
My result set is as follows:
TenantID | Name | Name2 | TenantNo | CashName
100 |MyShop | John's shop | 12345 |John's shop
999 |MyShop | John's Shop | 12345 |John's shop
My issue: for all intents and purposes, "John's shop" IS different from "John's Shop". I am correctly joining my money table on TenantNo and then on Name2, but the two Name2 values differ only by case.
Question:
Is there any way to differentiate a join based on case sensitivity? I would not want to use UPPER or LOWER due to the fact that it would ruin the case on reporting.
Thanks!
Adding Table information below, please assume all columns are trimmed of whitespace.
tMoney
CashNumber | CashName
102504 Bill's Place
102374 Tom's Shop
12345 John's Shop
12345 John's shop
Tenants
TenantID | Name | Name2 |TenantNumber
1 |MyShop | John's Shop | 12345
2 |MyShop | John's shop | 12345
3 |Shoppee | Bill's Place | 102504
4 | Shop2 | Toms Shop | 102374
Since I want to join to get the correct TenantID for an AR report, I would want to make sure I am always bringing in the correct tenant. If the case is different, is there anything I can write to differentiate a situation like John's Shop?
The problem is that in the second row of your results "John's Shop" shouldn't have matched "John's shop"?
You can use a case sensitive collation.
This is probably best achieved by altering the collation of the columns involved to allow index use but you can also do it at run time with an explicit COLLATE clause as below.
SELECT DISTINCT TenantID,
Name,
Name2,
TenantNumber,
Cashname
FROM Tenants
LEFT OUTER JOIN tMoney
ON TenantNumber = CashNumber
AND tMoney.CashName = Tenants.Name2 COLLATE Latin1_General_100_CS_AS
The comments about joining on id instead of name are likely correct though and would negate the need to do this at all.
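If changing the column collation is an option, it might look like the sketch below. The varchar(30) length and the NULL setting are assumptions, so substitute the real column definitions. Any index or constraint that references one of these columns has to be dropped before the change and recreated afterwards.

```sql
-- Assumed column definitions; keep the existing type, length and nullability.
ALTER TABLE tMoney
    ALTER COLUMN CashName varchar(30) COLLATE Latin1_General_100_CS_AS NULL;

ALTER TABLE Tenants
    ALTER COLUMN Name2 varchar(30) COLLATE Latin1_General_100_CS_AS NULL;
```

Both columns are given the same collation so that the join can compare them without an explicit COLLATE clause.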
If COLLATE ends up being too slow due to a lack of indexing, you could also do something like the below, where each 30 below must match the length of each column to avoid an invalid comparison.
LEFT OUTER JOIN tMoney ON
TenantNumber = CashNumber
AND CONVERT(VARBINARY(30),LTRIM(RTRIM(tMoney.CashName))) = CONVERT(VARBINARY(30),LTRIM(RTRIM(Tenants.Name2)))
I have data in my PeopleInfo table where there are some people that have multiple records that I am trying to combine together into one record for a view.
All people data is almost the same except for the PlanId and PlanName. So:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | NULL |
| John | Doe | 123456789 | 2 | Plan B | Secondary | NULL | 5.00 |
I would like to have only one John Doe record in my view, looking like this:
| FirstName | LastName | SSN | PlanId | PlanName | Status | Price1 | Price2 |
|-----------|----------|-----------|--------|----------|-----------|---------|--------|
| John | Doe | 123456789 | 1 | Plan A | Primary | 9.00 | 5.00 |
Where the Primary status determines which PlanId and PlanName to show. Can anyone help me with this query?
declare @t table (FNAME varchar(10), LNAME varchar(10), SSN varchar(10), PLANID int, PLANNAME varchar(10), stat varchar(10), Price1 decimal(18,2), Price2 decimal(18,2))
-- sample rows matching the question
insert into @t (FNAME,LNAME,SSN,PLANID,PLANNAME,stat,Price1,Price2) values
('john','doe','12345',1,'PlanA','primary',9.00,NULL),
('john','doe','12345',2,'PlanB','secondary',NULL,5.00)
-- MIN returns the primary row's values here only because they happen to sort first in this sample
select
FNAME,
LNAME,
SSN,
MIN(PLANID) PLANID,
MIN(PLANNAME) PLANNAME,
MIN(stat) stat,
MIN(Price1) Price1, -- aggregates ignore NULLs
MIN(Price2) Price2
from @t
GROUP BY FNAME,LNAME,SSN
(I can't yet add a comment, so have an answer.)
The only thing that troubles me here is that I am also determining which PlanId and PlanName to show, since they are different and I want to show a specific one based on the Status field of both records.
Then you don't even need GROUPing. It would be much simpler: just SELECT WHERE 'Primary' = Status, assuming that (A) there will always be a row with this Status for each user, and (B) you are happy to ignore all others.
P.S. If you will only be using Primary and Secondary Status values, you might want to change the column to a bit named something like isPrimaryPlan where 1 indicates true and 0 false. However, if you might bring in e.g. Bronze and Consolation Prize Plans later, then you'll need to retain a more variable datatype. Perhaps store the plans in a separate table and have an int FOREIGN KEY to it... I could go on!
OK, I'm back after having a sleep, which has improved my brain slightly.
First, let the record reflect that I don't like the database design here. The People and Plans should be separate tables, linked by foreign keys - via a 3rd table, e.g. PeoplePlans. That takes me to another point: the people here have no primary key (at least not that you have specified). So when writing the below, I had to pick the SSN, assuming that will always be present and unique.
Anyway, something like this should work, with the caveat that I'm not going to replicate the database structure to test it.
select
FirstName,
LastName,
SSN,
PlanId,
PlanName,
Status,
_ca._sum_Price1,
_ca._sum_Price2
from
PeopleInfo as _Primary
cross apply (
select
sum(Price1) as _sum_Price1,
sum(Price2) as _sum_Price2
from
PeopleInfo
where
_Primary.SSN = SSN
) as _ca
where
'Primary' = Status;
This SELECTs all People with Primary status in order to get you those rows. It then CROSS APPLYs their Primary and any other rows and takes the summed Prices.
Hopefully this makes sense. If not, you'll have to do some reading about CROSS APPLY, in addition to about good relational database design. ;-)
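To try the query without the real table, a table variable can stand in for PeopleInfo. This is only a sketch: the column types are guesses, and @PeopleInfo and the alias _All are names introduced here for illustration.

```sql
DECLARE @PeopleInfo TABLE
(
    FirstName varchar(20), LastName varchar(20), SSN char(9),
    PlanId int, PlanName varchar(20), Status varchar(20),
    Price1 decimal(18,2), Price2 decimal(18,2)
);

INSERT INTO @PeopleInfo VALUES
('John', 'Doe', '123456789', 1, 'Plan A', 'Primary',   9.00, NULL),
('John', 'Doe', '123456789', 2, 'Plan B', 'Secondary', NULL, 5.00);

SELECT _Primary.FirstName, _Primary.LastName, _Primary.SSN,
       _Primary.PlanId, _Primary.PlanName, _Primary.Status,
       _ca._sum_Price1, _ca._sum_Price2
FROM @PeopleInfo AS _Primary
CROSS APPLY (
    -- SUM skips NULLs, so each price comes from whichever row has it
    SELECT SUM(_All.Price1) AS _sum_Price1,
           SUM(_All.Price2) AS _sum_Price2
    FROM @PeopleInfo AS _All
    WHERE _All.SSN = _Primary.SSN
) AS _ca
WHERE _Primary.Status = 'Primary';
```

For the two sample rows this should return a single John Doe row with PlanId 1, Plan A, Primary, 9.00 and 5.00.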
I've sifted through views and other points and I've gotten to here. Take the example below:
Name | Quantity | Billed |
| | |
PC Tablet| 0 | 100 |
PC Tablet| 100 | -2345 |
Monitor | 9873 | 0 |
Keyboard | 200 | -300 |
So basically, this is the select I would run against this view. I want it to bring in the data ordered by Name, so it's in alphabetical order. Also, for a few reasons, some of the records appear more than once (I think the most is 4 times). If you add up the duplicate rows, the true 'quantity' and 'billed' values are correct.
NOTE: The actual query is very long, but I broke it down into a simple example to explain the problem. The idea is the same, but there are A LOT MORE columns that need to be added together. So I'm looking for a query that combines rows that have the same name. I've tried a bunch of different queries with no success: either it rolls ALL the rows into one, or it doesn't work and I get a bunch of NULL errors, or an error that the Name column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Is this even possible?
Try:
SELECT A.Name, A.TotalQty, B.TotalBilled
FROM (
SELECT Name, SUM(Quantity) as TotalQty
FROM YourTableHere
GROUP BY Name
) A
INNER JOIN
(
SELECT Name, SUM(Billed) as TotalBilled
FROM YourTableHere
GROUP BY Name
) B
ON A.Name = B.Name
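Since both sums come from the same table and the same grouping, a single GROUP BY may be enough, and an ORDER BY gives the alphabetical order asked for. A sketch, keeping the placeholder table name YourTableHere from above:

```sql
SELECT Name,
       SUM(Quantity) AS TotalQty,
       SUM(Billed)   AS TotalBilled
FROM YourTableHere
GROUP BY Name      -- every non-aggregated column in the SELECT list goes here
ORDER BY Name;
```

For the sample rows this would give Keyboard (200, -300), Monitor (9873, 0) and PC Tablet (100, -2245). Additional columns can be summed the same way by adding more SUM expressions.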
Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that Sql Server data mining is probably the right tool for the job, but I don't have too much experience with that.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like :
Column 1 | Column 2 | Certainty
CustomerName | CustomerAddress | 100%
CustomerAddress | CustomerCode | 100%
There is a 'functional dependency' test built in to the SQL Server Data Profiling component (which is an SSIS component that ships with SQL Server 2008). It is described pretty well on this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
and a.OrderNumber <> b.OrderNumber -- do not pair a row with itself
This would return you:
FirstOrderNumber | SecondOrderNumber
1 | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers whose customerAddress is not uniquely defined by customerName (multiply by 100 for a percentage):
SELECT AVG(CAST(imperfect AS float)) -- cast so AVG does not use integer arithmetic
FROM (
SELECT
customerName,
CASE
-- a name with at most one distinct address is consistent
WHEN COUNT(DISTINCT customerAddress) <= 1
THEN 0
ELSE 1
END AS imperfect
FROM orders
GROUP BY
customerName
) q
Substitute other columns instead of customerAddress and customerName into this query to find discrepancies between them.