I have a bunch of bank transactions in a table in SQL.
Example: http://sqlfiddle.com/#!6/6b2c8/1/0
I need to identify the transactions that are made between these 2 linked accounts. The Accounts table (not shown) links these 2 accounts to the one source (user).
For example:
I have an everyday account, and a savings account. From time to time, I may transfer money from my everyday account, to my savings account (or vice-versa).
The transaction descriptions are usually similar (Transfer to xxx/transfer from xxx), usually on the same day, and obviously, the same dollar amount.
EDIT: I now have the following query (dumbed down), which works for some scenarios.
Basically, I created 2 temp tables with all withdrawals and deposits that met certain criteria. I then join them together, based on a few requirements (same transaction amount, different account # etc). Then using the ROW_NUMBER function, I have ordered which ones are more likely to be inter-account transactions.
I now have an issue where if, for example:
$100 transferred from Account A to Account B
$100 Transferred from Account B to Account C
My query will match the transfer between Account A and C, then there is only one transaction for account B, and it will not be matched. So essentially, instead of receiving 2 rows back (2 deposits, lined up with 2 withdrawals), I only get 1 row (1 deposit, 1 withdrawal), for a transfer from A to B :(
INSERT INTO #Deposits
SELECT t.*
FROM dbo.Customer c
INNER JOIN dbo.Source src ON src.AppID = app.AppID
INNER JOIN dbo.Account acc ON acc.SourceID = src.SourceID
INNER JOIN dbo.Tran t ON t.AccountID = acc.AccountID
WHERE c.CustomerID = 123
AND t.Template = 'DEPOSIT'
INSERT INTO #Withdrawals
SELECT t.*
FROM dbo.Customer c
INNER JOIN dbo.Source src ON src.AppID = app.AppID
INNER JOIN dbo.Account acc ON acc.SourceID = src.SourceID
INNER JOIN dbo.Tran t ON t.AccountID = acc.AccountID
WHERE c.CustomerID = 123
AND t.Template = 'WITHDRAWAL'
;WITH cte
AS ( SELECT [...] ,
ROW_NUMBER() OVER ( PARTITION BY d.TranID ORDER BY SUM( CASE WHEN w.TranDate = d.TranDate THEN 2 ELSE 1 END ), w.TranID ) AS DepRN,
ROW_NUMBER() OVER ( PARTITION BY w.TranID ORDER BY SUM( CASE WHEN w.TranDate = d.TranDate THEN 2 ELSE 1 END ), d.TranID ) AS WdlRN
FROM #Withdrawals w
INNER JOIN #Deposits d ON w.TranAmount = d.TranAmount -- Same transaction amount
AND w.AccountID <> d.AccountID -- Different accounts, same customer
AND w.TranDate BETWEEN d.TranDate AND DATEADD(DAY, 3, d.TranDate) -- Same day, or within 3 days
GROUP BY [...]
)
SELECT *
FROM cte
WHERE cte.DepRN = cte.WdlRN
Maybe this is a start? I don't think we have enough info to say whether this would be reliable or would cause a lot of "false positives".
select t1.TransactionID, t2.TransactionID
from dbo.Transactions as t1 inner join dbo.Transactions as t2
on t2.AccountID <> t1.AccountID
and t2.TransactionDate = t1.TransactionDate
and t2.TransactionAmount = t1.TransactionAmount
and t2.TransactionID - t1.TransactionID between 1 and 20 -- maybe??
and t1.TransactionDesc like 'Transfer from%'
and t2.TransactionDesc like 'Transfer to%'
and t2.TransactionID > t1.TransactionID
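If the trouble is that one transaction ends up in more than one candidate pair (as in the A to B, B to C case from the question), one option is to enforce one-to-one pairing outside the join. Here is a minimal Python sketch of a greedy pairing, not a tested solution for the real data. `Txn`, `pair_transfers` and `max_days` are hypothetical names, and the ranking (smallest date gap first, then closest transaction IDs) is only an assumption about what makes a likely match.

```python
from datetime import date
from typing import NamedTuple


class Txn(NamedTuple):
    txn_id: int
    account: str
    day: date
    amount: int  # whole dollars, stored as a positive number on both sides


def pair_transfers(withdrawals, deposits, max_days=3):
    """Greedily pair each withdrawal with at most one deposit."""
    candidates = []
    for w in withdrawals:
        for d in deposits:
            gap = abs((d.day - w.day).days)
            if w.amount == d.amount and w.account != d.account and gap <= max_days:
                # Rank by date gap first, then by how close the IDs are.
                candidates.append((gap, abs(d.txn_id - w.txn_id), w.txn_id, d.txn_id))
    candidates.sort()

    used_w, used_d, pairs = set(), set(), []
    for _gap, _dist, w_id, d_id in candidates:
        # Each transaction may be used in one pair only.
        if w_id not in used_w and d_id not in used_d:
            used_w.add(w_id)
            used_d.add(d_id)
            pairs.append((w_id, d_id))
    return pairs


day = date(2015, 1, 5)
withdrawals = [Txn(1, "A", day, 100), Txn(3, "B", day, 100)]
deposits = [Txn(2, "B", day, 100), Txn(4, "C", day, 100)]
pairs = pair_transfers(withdrawals, deposits)
print(pairs)
```

With this sample data both transfers come back as separate pairs. A greedy pass like this can still pick the wrong partner when several candidates tie, so the ranking would need tuning against real transactions.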
I have a requirement to show 6 months of data in a Tableau dashboard. For that I created a view in SQL Server that joins 3 tables: T1, T2 and T3. Each table has 20 million records, and the number will keep increasing. The problem is that the query takes a long time to execute (about 2 hours) and nothing is displayed on the dashboard. Is there a way to improve the performance of the query?
The following is the query that creates the view. T1, T2 and T3 are the three tables. trackingIdentifier in T1 is the foreign key in T3 as transmissionTID, and
trackingIdentifier in T2 is the foreign key in T3 as claimSubmissionTID.
Indexes are built on trackingIdentifier, transmissionTID, trackingIdentifier.
RAM: 32768 MB
Processors: 32
SQL Version : 11.0.3381.0
Create View [dbo].[claimReceivingDashboard] AS
(
Select
trans.trackingIdentifier as trackingIdentifier,
trans.receiptdt as downloadDate,
trans.transactiondate as TransactionDate,
trans.purpose as Purpose ,
trans.recordCount as RecordCount ,
cast(acv.activityNet as decimal(12,4)) as Net,
d.Caption as DispositionID ,
trans.SenderID as SenderId ,
trans.transmissionfilename as filename ,
trans.damaninscomp as ReceiverID,
claim.claimid as claimid ,
claim.claimproviderid as providerid ,
claim.trackingIdentifier as claimTrackingId
from
endisposition d , t1 trans , t2 acv, t3 claim
where
claim.dispositionId = d.EnumId
and trans.trackingIdentifier = acv.transmissionTID
and claim.trackingIdentifier = acv.claimsubmissionTID
and cast(trans.transactiondate as date) > cast(Getdate() - 120 as date))
GO
You need to join the tables on the right columns, using explicit joins between endisposition d, t1, t2 and t3. Below is a sample:
from t1 trans
inner join t2 acv
on trans.trackingIdentifier = acv.transmissionTID
inner join t3 claim -- or left join, depending on the requirement
on acv.claimsubmissionTID = claim.trackingIdentifier
inner join endisposition d
on claim.dispositionId = d.EnumId
where condition
You can avoid recomputing the date condition by calculating cast(Getdate() - 120 as date) once and storing it in a variable.
If the joining columns are not indexed properly, create indexes after looking at the execution plan.
Could you explain how I get this particular table result?
My 4 queries to get each column separately are below.
I am not sure of the method here: do I nest the last 3 queries into the first, or do I use a union between the queries?
Bearing in mind that the information in each one doesn't really match, I assume Union or Union All isn't going to be useful.
Would a derived table be a better method? Sorry, my SQL skills are fairly basic.
I also need to retain the ability to 'tweak' the where clauses, as my admin decides to exclude certain records later (you IT folks will be used to that!).
So the ability to alter the where clauses would be good in a solution.
Just to make it more annoying for ya ;-)
Query table would need to look a little like this
Company Department Total_B Total_R Total_Ret RushJobs
ACME LSD 2 100 24 3
The four queries that work separately to get each column above are here. I have left in the respective Group By and where clauses. Incidentally, I_Department does map to just Department in the case of the 2nd query.
-- Total B count query from B
Select Company,Department, count(*) as Total_B from B
Group by Company,Department
Order BY Company;
--Select h count from h table
Select count(*) as Total_R, I_Department from H
where L ='re-box'
Group By I_Department
-- Select r count
Select Company,Department,Count (B_Number) AS Total_Ret
from P Inner Join B ON P.Record_Number = B.B_Number
where P.Request_Date >= 'SOMEDATE' and P.Request_Date <= 'SOMEDATERANGE'
Group By Company,Department
-- Select Rush Jobs
Select Company,Department,Count (*) as RushJobs
from Res
Inner Join B on Res.Item_Number = B.B_Number
where Res.Setup_Date >= 'Somedate' and Res.Setup_Date<= 'somedaterange'
and Res.Res_Priority = '1'
Group By Company,Department
So final table
Company Department Total_B Total_R Total_Ret RushJobs
ACME LSD 100 2 4 1
One approach would be to use a common table expression (CTE), also known as a WITH statement.
This allows each query to stay independent, so you can easily twerk (I was going to correct that typo but it was just too funny) the where clauses for each, and it combines the results at the end, returning one row per company and department with the four totals as columns.
-- CTE names must differ from the table names B and H they select from
With BTotals as (
-- Total B count query from B
Select Company, Department, count(*) as Total_B from B
Group by Company, Department),
HTotals as (
-- Select h count from h table
Select count(*) as Total_R, I_Department from H
where L = 're-box'
Group By I_Department),
R as (-- Select r count
Select Company, Department, Count(B_Number) AS Total_Ret
from P Inner Join B ON P.Record_Number = B.B_Number
where P.Request_Date >= 'SOMEDATE' and P.Request_Date <= 'SOMEDATERANGE'
Group By Company, Department),
RushJobs as (-- Select Rush Jobs
Select Company, Department, Count(*) as RushJobs
from Res
Inner Join B on Res.Item_Number = B.B_Number
where Res.Setup_Date >= 'Somedate' and Res.Setup_Date <= 'somedaterange'
and Res.Res_Priority = '1'
Group By Company, Department)
SELECT coalesce(BT.Company, R.Company, RJ.Company) as Company
, coalesce(BT.Department, HT.I_Department, R.Department, RJ.Department) as Department
, BT.Total_B, HT.Total_R, R.Total_Ret, RJ.RushJobs
FROM BTotals BT
-- H has no Company column, so it is matched on department only
FULL OUTER JOIN HTotals HT
on BT.Department = HT.I_Department
FULL OUTER JOIN R
on BT.Company = R.Company
and BT.Department = R.Department
FULL OUTER JOIN RushJobs RJ
on BT.Company = RJ.Company
and BT.Department = RJ.Department
-- ORDER BY is not allowed inside a CTE, so sort the final result instead
ORDER BY Company
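To sanity-check the shape of the combined result, here is a small runnable sketch of the CTE approach using Python's sqlite3 module rather than SQL Server. The sample rows are made up, only three of the four totals are shown, and it uses LEFT JOINs from the B totals (assuming every company/department appears in B) because older SQLite versions do not support FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE B (Company TEXT, Department TEXT, B_Number INTEGER);
CREATE TABLE H (I_Department TEXT, L TEXT);
CREATE TABLE Res (Item_Number INTEGER, Res_Priority TEXT);
INSERT INTO B VALUES ('ACME', 'LSD', 1), ('ACME', 'LSD', 2), ('ACME', 'HR', 3);
INSERT INTO H VALUES ('LSD', 're-box'), ('LSD', 're-box'), ('LSD', 'other'), ('HR', 're-box');
INSERT INTO Res VALUES (1, '1'), (2, '1'), (2, '2');
""")

query = """
WITH BTotals AS (
    SELECT Company, Department, COUNT(*) AS Total_B
    FROM B
    GROUP BY Company, Department),
HTotals AS (
    SELECT I_Department, COUNT(*) AS Total_R
    FROM H
    WHERE L = 're-box'
    GROUP BY I_Department),
RushJobs AS (
    SELECT B.Company, B.Department, COUNT(*) AS RushJobs
    FROM Res
    INNER JOIN B ON Res.Item_Number = B.B_Number
    WHERE Res.Res_Priority = '1'
    GROUP BY B.Company, B.Department)
SELECT BT.Company, BT.Department, BT.Total_B,
       COALESCE(HT.Total_R, 0), COALESCE(RJ.RushJobs, 0)
FROM BTotals BT
LEFT JOIN HTotals HT ON HT.I_Department = BT.Department
LEFT JOIN RushJobs RJ ON RJ.Company = BT.Company
                     AND RJ.Department = BT.Department
ORDER BY BT.Company, BT.Department
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each where clause lives in its own CTE, so any one of them can be changed without touching the others.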
Having issues getting a dataset to return with one date per client in the query.
Requirements:
Must return the most recent transaction date per client in the user's client list
Will need to be able to run through EXEC
Current Query:
SELECT
c.client_uno
, c.client_code
, c.client_name
, c.open_date
into #AttyClnt
from hbm_client c
join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = @login
and c.status_code = 'C'
select
ba.payr_client_uno as client_uno
, max(ba.tran_date) as tran_date
from blt_bill_amt ba
left outer join #AttyClnt ac on ba.payr_client_uno = ac.client_uno
where ba.tran_type IN ('RA', 'CR')
group by ba.payr_client_uno
Currently, this query produces at least 1 row per client with a date. The problem is that some clients have between 2 and 10 dates associated with them, bloating the returned table to about 30,000 rows instead of the ideal 246 rows or fewer.
When I try max(tran_uno) to get the most recent transaction number, I get the same result: some clients have 1 value and others have multiple values.
The bigger picture has 4 other queries doing other parts; I have only included the parts that pertain to the question.
Edit (2011-10-14 @ 1:45PM):
select
ba.payr_client_uno as client_uno
, max(ba.row_uno) as row_uno
into #Bills
from blt_bill_amt ba
inner join hbm_matter m on ba.matter_uno = m.matter_uno
inner join hbm_client c on m.client_uno = c.client_uno
inner join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = @login
and c.status_code = 'C'
and ba.tran_type in ('CR', 'RA')
group by ba.payr_client_uno
order by ba.payr_client_uno
--Obtain list of Transaction Date and Amount for the Transaction
select
b.client_uno
, ba.tran_date
, ba.tc_total_amt
from blt_bill_amt ba
inner join #Bills b on ba.row_uno = b.row_uno
Not quite sure what was going on, but it seems the temp tables were not acting right at all. Ideally I would have 246 rows of data, but the previous query syntax produced 400 to 5,000 rows, obviously with duplicated data.
I think you can use ranking to achieve what you want:
WITH ranked AS (
SELECT
client_uno = ba.payr_client_uno,
ba.tran_date,
ba.tc_total_amt,
rnk = ROW_NUMBER() OVER (
PARTITION BY ba.payr_client_uno
ORDER BY ba.tran_uno DESC
)
FROM blt_bill_amt ba
INNER JOIN hbm_matter m ON ba.matter_uno = m.matter_uno
INNER JOIN hbm_client c ON m.client_uno = c.client_uno
INNER JOIN hbm_persnl p ON c.resp_empl_uno = p.empl_uno
WHERE p.login = @login
AND c.status_code = 'C'
AND ba.tran_type IN ('CR', 'RA')
)
SELECT
client_uno,
tran_date,
tc_total_amt
FROM ranked
WHERE rnk = 1
ORDER BY client_uno
Useful reading:
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
WITH common_table_expression (Transact-SQL)
Using Common Table Expressions
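For anyone who wants to see the ranking idea run end to end, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). It keeps only the blt_bill_amt table, and the sample rows are invented; the joins to hbm_matter, hbm_client and hbm_persnl are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blt_bill_amt (
    tran_uno INTEGER,
    payr_client_uno INTEGER,
    tran_date TEXT,
    tc_total_amt REAL,
    tran_type TEXT
);
INSERT INTO blt_bill_amt VALUES
    (1, 10, '2011-09-01', 50.0, 'CR'),
    (2, 10, '2011-10-01', 75.0, 'RA'),
    (3, 20, '2011-08-15', 20.0, 'CR'),
    (4, 20, '2011-10-05', 99.0, 'XX');
""")

query = """
WITH ranked AS (
    SELECT payr_client_uno AS client_uno,
           tran_date,
           tc_total_amt,
           ROW_NUMBER() OVER (
               PARTITION BY payr_client_uno
               ORDER BY tran_uno DESC
           ) AS rnk
    FROM blt_bill_amt
    WHERE tran_type IN ('CR', 'RA')
)
SELECT client_uno, tran_date, tc_total_amt
FROM ranked
WHERE rnk = 1
ORDER BY client_uno
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each client comes back exactly once, with the date and amount taken from the same (latest) row, which is what a plain MAX() with GROUP BY cannot guarantee once extra columns are selected.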
I need to develop a query that will count the total number of 'open' cases per month.
I have a 'cases' table with an id and a name, and a 'state_changes' table with a datetime column, a caseid column and a state.
How can I calculate the number of cases in each month that have a record with state 'open' in the past, but without a corresponding record with state closed?
I'm using SQL Server 2000.
This should get you close (T-SQL):
SELECT
MONTH(s.casedate) m,
YEAR(s.casedate) y,
COUNT(DISTINCT c.caseid) count_cases
FROM
cases c
INNER JOIN state_changes s ON s.caseid = c.caseid
WHERE
s.state = 'open' /* "with state 'open'" */
AND s.casedate < GETDATE() /* "in the past" */
AND NOT EXISTS ( /* "without corresp. record with state 'closed'" */
SELECT 1 FROM state_changes i WHERE i.caseid = s.caseid AND i.state = 'closed'
)
GROUP BY
MONTH(s.casedate),
YEAR(s.casedate)
EDIT: To make a statistic over all twelve months (independent of actual cases existing in these months) you need a small helper table (let's call it month), that contains nothing but one column (let's call that month as well) with numbers from 1 to 12. Then you join against it:
SELECT
m.month,
COUNT(DISTINCT o.caseid) count_cases
FROM
month m
LEFT JOIN ( /* the helper table goes on the left so that every month is kept */
SELECT s.caseid, s.casedate
FROM cases c
INNER JOIN state_changes s ON s.caseid = c.caseid
WHERE
s.state = 'open'
AND YEAR(c.createddate) = YEAR(GETDATE()) /* whatever */
AND NOT EXISTS (
SELECT 1 FROM state_changes i WHERE i.caseid = s.caseid AND i.state = 'closed'
)
) o ON MONTH(o.casedate) = m.month
GROUP BY
m.month
ORDER BY
m.month
Create a query of the state changes tables for open events and one for close events.
Create a query that does an outer join of the open to the closed on the case ID returning the case ID from both queries
Query the latter query result for rows where the ID from the "close" event query is null
Count the number of rows in the latter query result.
Something very roughly like (off the top of my head, without correction):
SELECT COUNT(x.T1_CaseID)
FROM (
SELECT T1.CaseID AS T1_CaseID, T2.CaseID AS T2_CaseID
FROM (SELECT CaseID FROM state_changes WHERE state = 'open' AND [timestamp] BETWEEN '20090101' AND '20090130') AS T1
LEFT OUTER JOIN (SELECT CaseID FROM state_changes WHERE state = 'closed' AND [timestamp] BETWEEN '20090101' AND '20090130') AS T2
ON T1.CaseID = T2.CaseID
) AS x
WHERE x.T2_CaseID IS NULL
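The outer join steps above can be tried out with a small sketch using Python's sqlite3 module. The table layout and rows are assumptions made up for the example (the date column is called changed_at here). It also shows why the null test has to be written as IS NULL: a comparison with = NULL never evaluates to true, so it matches no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_changes (caseid INTEGER, state TEXT, changed_at TEXT);
INSERT INTO state_changes VALUES
    (1, 'open',   '2009-01-03'),
    (1, 'closed', '2009-01-20'),
    (2, 'open',   '2009-01-10'),
    (3, 'open',   '2009-01-15');
""")

# Open events on the left, closed events on the right; cases with no
# matching closed event come back with NULL on the right-hand side.
anti_join = """
SELECT COUNT(DISTINCT o.caseid)
FROM state_changes AS o
LEFT JOIN state_changes AS c
       ON c.caseid = o.caseid AND c.state = 'closed'
WHERE o.state = 'open' AND c.caseid {null_test}
"""

open_count = conn.execute(anti_join.format(null_test="IS NULL")).fetchone()[0]
never_matches = conn.execute(anti_join.format(null_test="= NULL")).fetchone()[0]
print(open_count, never_matches)
```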
I suppose this is quite a common SP in social network and community type websites.
I have this SP that returns all of a user's friends on their 'friends' page, ordered by those currently online, then alphabetically. It's taking quite a while to load and I am looking to speed it up.
I remember reading somewhere on SO that breaking up multiple joins into smaller result sets might speed it up. I haven't tried this yet, but I am curious to see what other recommendations SO might have on this procedure.
DECLARE @userID INT -- This variable is passed in
DECLARE @lastActivityMinutes INT
SET @lastActivityMinutes = 15
SELECT
Active = CASE WHEN DATEDIFF(MINUTE, b.LastActivityDate, GETDATE()) < @lastActivityMinutes THEN 1 ELSE 0 END,
a.DisplayName, a.ImageFile, a.UserId, b.LastActivityDate
FROM
Profile AS a
INNER JOIN aspnet_Users as b on b.userId = a.UserId
LEFT JOIN Friend AS x ON x.UserID = a.UserID
LEFT JOIN Friend AS z ON z.FriendID = a.UserID
WHERE ((x.FriendId = @userID AND x.status = 1) -- Status = 1 means friendship accepted
OR (z.UserID = @userID AND z.Status = 1))
GROUP BY a.userID, a.DisplayName, a.ImageFile, a.UserId, b.LastActivityDate
ORDER BY Active DESC, DisplayName ASC
I am not sure how to clip in my execution plan, but the main bottleneck seems to be a MERGE JOIN (Right Outer Join) that's costing me 29%. At various stages, Parallelism is also costing 9%, 6%, 5% and 9%, for a total of 29% as well.
My initial thoughts are to first return the JOINED results from the Profile and aspnet tables with a CTE and then do LEFT JOINS to the Friends table.
You are joining Friend twice using a LEFT JOIN, then removing the NULLs returned by the LEFT JOIN with the WHERE condition, then using GROUP BY to get rid of the duplicates.
This is not the best query possible.
Why don't you just use this:
SELECT Active = CASE WHEN DATEDIFF(MINUTE, b.LastActivityDate, GETDATE()) < @lastActivityMinutes THEN 1 ELSE 0 END,
a.DisplayName, a.ImageFile, a.UserId, b.LastActivityDate
FROM (
SELECT FriendID
FROM Friend
WHERE UserID = @userID
AND status = 1
UNION
SELECT UserID
FROM Friend
WHERE FriendID = @userID
AND status = 1
) x
) x
INNER JOIN
Profile AS a
ON a.UserID = x.FriendID
INNER JOIN
aspnet_Users as b
ON b.userId = a.UserId
ORDER BY
Active DESC, DisplayName ASC
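To see what the UNION subquery returns on its own, here is a minimal sketch using Python's sqlite3 module with invented rows. The Friend table layout follows the question; the Profile and aspnet_Users joins are left out. Because UNION removes duplicates, a friendship stored in both directions still yields one row per friend, so no GROUP BY is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friend (UserID INTEGER, FriendID INTEGER, Status INTEGER);
INSERT INTO Friend VALUES
    (1, 2, 1),  -- user 1 added user 2, accepted
    (3, 1, 1),  -- user 3 added user 1, accepted
    (2, 1, 1),  -- same friendship as the first row, stored the other way round
    (1, 4, 0),  -- not accepted yet
    (5, 6, 1);  -- unrelated users
""")

query = """
SELECT FriendID FROM Friend WHERE UserID = :uid AND Status = 1
UNION
SELECT UserID FROM Friend WHERE FriendID = :uid AND Status = 1
ORDER BY 1
"""
friend_ids = [row[0] for row in conn.execute(query, {"uid": 1})]
print(friend_ids)
```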