SQL Server, table-valued functions slow processing - sql-server

I tried everything but I couldn't overcome this problem.
I have a table-valued function.
When I call this function with
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2
WHERE o1.trackId = 497664
It takes a while to be executed. But when I do this.
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664
It is executed in 32 seconds. I created all indexes but It didn't help.
My function by the way:
ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(
#trackId INT,
#nTrackId INT,
#measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating,
COUNT(1) as numberOfSharedUsers,
CASE #measureType
WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
END as similarityRatio
FROM dbo.Tracks o1
INNER JOIN dbo.Tracks o2 ON o2.id != #trackId
INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
INNER JOIN dbo.Users o5 ON o5.id = o4.userId
WHERE o1.id = #trackId
AND o2.id = ISNULL(#nTrackId, o2.id)
GROUP BY o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating
)
Any help would be appreciated.
Thanks.
Emrah

I believe that your bottleneck is the calculations + your very expensive inner joins.
The way your are joining is basically creating a cross join - It is returning a result set with all ther records linked to all other records, Except the one for which the id is supplied. Then you go and add to that result set with the other inner joins.
For every inner join, SQL goes and creates a result set with all the rows matching.
So the first thing you do in your query is to tell SQL to basically do a cross join on the same table. (I am assuming you are still following, that looks pretty advanced so I'll just take you are familiar with advanced SQL syntax and operators)
Now in the next inner join, you are applying the Results table to your newly created huge result set, and only then filtering out the ones not both tables.
So as a start, see if you can't do your joins the other way around. (This really depends on your table record count and record sizes). Try to have the smallest result sets first and then join onto that.
The second thing you might want to try is to firstly limit your result set even before the joins.So start with a CTE where you filter for o1.id = #trackId. Then select * from this CTE , do your joins on the CTE and then filter in your query for o2.id = ISNULL(#nTrackId, o2.id)
I will work on an example, stay tuned...
--
Ok, I added an example, did a quick test and the values returned are the same. Run this through your data and let us know if there is any improvement. (Note, this does not address the INNER JOIN order point discussed, still do play around with that.)
Example:
ALTER FUNCTION [dbo].[FN_RatingSimilarity_NEW]
(
#trackId INT,
#nTrackId INT,
#measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH CTE_ALL AS
(
SELECT id,
name,
releaseDate,
numberOfRatings,
averageRating
FROM dbo.Tracks
WHERE id = #trackId
)
SELECT o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating,
COUNT(1) as numberOfSharedUsers,
CASE #measureType
WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
END as similarityRatio
FROM CTE_ALL o1
INNER JOIN dbo.Tracks o2 ON o2.id != #trackId
INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
INNER JOIN dbo.Users o5 ON o5.id = o4.userId
WHERE o2.id = ISNULL(#nTrackId, o2.id)
GROUP BY o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating
)

Related

Multiple INNER JOIN SELECT query not returning data

This is a stored procedure to select the full details of a wine.
It takes as a parameter a storage location ID.
This builds properly and is similar in structure to many sp's I have written that work,
however it returns no data. Is there something I am missing?
CREATE PROCEDURE [sp_retrieveInventory_Full]
(
#StorageLocationID [int]
)
AS
BEGIN
SELECT [WineName],
[RackID],
[RackColumn],
[RackRow],
[WineTypeID],
[WineVarietyID],
[VintnerName],
[dbo].[Vintner].[Country],
[dbo].[Vintner].[StateProvince],
[dbo].[Vintner].[Region],
[VendorName],
[Vintage],
[PurchaseDate],
[PurchasePrice],
[BottleSizeID],
[ABV],
[DrinkByDate],
[FoodType],
[TastingNotes],
[RatedBy],
[RatingScore]
FROM [dbo].[StorageLocation]
INNER JOIN [dbo].[Wine]
ON [dbo].[StorageLocation].[WineID] = [dbo].[Wine].[WineID]
INNER JOIN [dbo].[Vintner]
ON [dbo].[Vintner].[VintnerID] = [dbo].[Wine].[VintnerID]
INNER JOIN [dbo].[Vendor]
ON [dbo].[Vendor].[VendorID] = [dbo].[Wine].[VendorID]
INNER JOIN [dbo].[Pairing]
ON [dbo].[Pairing].[PairingID] = [dbo].[Wine].[PairingID]
INNER JOIN [dbo].[Rating]
ON [dbo].[Rating].[RatingID] = [dbo].[Wine].[RatingID]
WHERE [dbo].[StorageLocation].[StorageLocationID] = #storageLocationID
END
GO
When a query reruns no result it's because no records exists that satisfy the conditions in it's where clause, or, in case of queries that contains inner joins, at least one of the joins on condition returns no records.
The easiest way to figure out what join is messing up your query is to start removing the joins one by one, but to do that, you have to first change the select clause.
What I like to do is this: Start with all the joins except the last one - see if it returns any records:
SELECT 1
FROM [dbo].[StorageLocation]
INNER JOIN [dbo].[Wine]
ON [dbo].[StorageLocation].[WineID] = [dbo].[Wine].[WineID]
INNER JOIN [dbo].[Vintner]
ON [dbo].[Vintner].[VintnerID] = [dbo].[Wine].[VintnerID]
INNER JOIN [dbo].[Vendor]
ON [dbo].[Vendor].[VendorID] = [dbo].[Wine].[VendorID]
INNER JOIN [dbo].[Pairing]
ON [dbo].[Pairing].[PairingID] = [dbo].[Wine].[PairingID]
-- INNER JOIN [dbo].[Rating]
-- ON [dbo].[Rating].[RatingID] = [dbo].[Wine].[RatingID]
WHERE [dbo].[StorageLocation].[StorageLocationID] = #storageLocationID
If not, comment out the next join and run the query again. Rinse and repeat.
Please note, however, that there might be more than one inner join that prevents the query from returning records.

SQL Server : Cross Join in GreenPlum (Pre-LATERAL version of Postgres)

I'm attempting to convert the following SQL Server query into a GreenPlum version of the query:
INSERT INTO #TMP1 (part_id, file_id, location, measure_date)
SELECT DISTINCT
pt.part_id, qf.file_id, qf.edl_desc, pt.measure_date
FROM
part pt WITH (NOLOCK)
INNER JOIN
file_model qm with (nolock) on qm.file_model_id = pt.file_model_id
INNER JOIN
file qf with (nolock) on qf.file_id = qm.file_id;
INSERT INTO #part_list (file_id, part_id, measure_date)
SELECT DISTINCT
t1.file_id, k.part_id, k.measure_date
FROM
#TMP1 t1 WITH (NOLOCK)
CROSS APPLY
(SELECT DISTINCT TOP (300)
t2.part_id, t2.measure_date
FROM
#TMP1 t2 WITH (NOLOCK)
WHERE
t1.file_id = t2.file_id and t1.location = t2.location
ORDER BY
t2.measure_date DESC) k
WHERE
t1.measure_date >= dateadd(day, 30, getdate());
The idea here being that the final table contains the most recent up to 300 parts for all parts programs that are active (ie manufactured something) in the last 30 days.
Per the answers to this question, I am aware that LATERAL JOIN would do it, except my organization is using an older version of Postgres that does not have LATERAL, so I was left with implementing the following function instead:
CREATE FUNCTION BuildActiveParts(p_day INT, p_n INT)
RETURNS SETOF RECORD --TABLE (part_id bigint,file_id int, measure_date timestamp, location varchar(255))
AS $$
DECLARE
part_active RECORD;
part_list RECORD;
BEGIN
FOR part_active IN
SELECT DISTINCT qf.file_id, qf.location
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id WHERE pt.measure_date >= current_date - p_day LOOP
FOR part_list IN
SELECT DISTINCT pt.part_id, qf.file_id, pt.measure_date, qf.location
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id WHERE qf.file_id = part_active.file_id
AND qf.location = part_active.location
ORDER BY pt.measure_date DESC LIMIT p_n LOOP
RETURN NEXT part_list;
END LOOP;
END LOOP;
END
$$ LANGUAGE plpgsql;
-- Later used in:
--Build list of all active programs in last p_day days. This temporary table is a component of a larger function that produces a table based on this and other other calculations, called daily.
-- Note: this insert yields 'function cannot execute because it accesses relation'
INSERT INTO TMP_part_list ( part_id, file_id, measure_date, location)
SELECT DISTINCT * FROM BuildActiveParts(p_day, p_n) AS active_parts (part_id int, file_id text, measure_date timestamp, location text )
;
Unfortunately, this function is used in inserts to another table (an unavoidable reality of my business requirements), so while the function returns nice happy results when run in isolation, I get a big angry function cannot execute on segment because it accesses relation when I try to use it for its intended purpose. While I've seen suggestions to the effect of "make a VIEW instead", that's not really an option because a view resulting from the script this functionality is a part of would take too long to query.
What can I do, beyond embarking on a months-long excursion through a jungle of red tape to convince my organization to update their stuff, to resolve this?
Edit: Here are some attempts based on comments:
Attempt with function, did not work because of function cannot execute on segment because it accesses relation:
DROP FUNCTION IF EXISTS BuildRecentParts(TEXT, TEXT, INT);
CREATE FUNCTION BuildRecentParts(file_id TEXT, location_in TEXT, p_n INT)
RETURNS SETOF RECORD --TABLE (measure_date timestamp, part_id bigint)
AS $$
DECLARE
part_list RECORD;
BEGIN
FOR part_list IN
SELECT DISTINCT pt.measure_date, pt.part_id
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id
WHERE qf.file_id = file_id
AND qf.edl_desc = location_in
ORDER BY pt.measure_date DESC LIMIT p_n LOOP
RETURN NEXT part_list;
END LOOP;
END
$$ LANGUAGE plpgsql;
SELECT DISTINCT qf.file_id, qf.edl_desc, (SELECT pti.measure_date, pti.part_id FROM part pti
INNER JOIN file_model qmi on qmi.file_model_id = pti.file_model_id
INNER JOIN file qfi on qfi.file_id = qmi.file_id
WHERE qfi.file_id = qf.file_id
AND qfi.edl_desc = qf.edl_desc
ORDER BY pti.measure_date DESC LIMIT 300)
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id
WHERE pt.measure_date >= current_date - 30 ;
Attempt without function, will not work because subquery has multiple columns:
CREATE TEMPORARY TABLE TMP_TMP1 (part_id bigint, file_id varchar(255), location varchar(255), measure_date timestamp) DISTRIBUTED BY (part_id);
INSERT INTO TMP_TMP1 (part_id, file_id, location, measure_date)
SELECT DISTINCT pt.part_id, qf.file_id, qf.edl_desc, pt.measure_date
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id;
ANALYZE TMP_TMP1;
SELECT DISTINCT t1.file_id, t1.location, (SELECT t2.measure_date, t2.part_id FROM TMP_TMP1 t2
WHERE t2.file_id = t1.file_id
AND t2.location = t1.location
ORDER BY t2.measure_date DESC LIMIT 300)
FROM TMP_TMP1 t1
WHERE t1.measure_date >= current_date - 30;
I also attempted a recursive CTE, but found that that was unsupported.
Between answers here and from architects at my organization, we decided that we have struck a GreenPlum limitation that would be too costly to overcome, the logic that performs the Cross Join will be shifted to the R script that calls the stored procedure that this functionality would have been a part of.
Well, Greenplum doesn't have dirty reads so you can't implement the nolock hint you have. That is probably a good thing too. I would recommend removing that from SQL Server too.
I think the best solution is to use an Analytical function here instead of that function or even a correlated subquery which Greenplum supports. It is also more efficient in SQL Server to use this approach.
SELECT sub2.part_id, sub2.location, sub2.measure_date
FROM (
SELECT sub1.part_id, sub1.location, sub1.measure_date, row_number() over(partition by sub1.part_id order by sub1.measure_date desc) as rownum
FROM (
SELECT pt.part_id, qf.edl_desc as location, pt.measure_date
FROM part pt
INNER JOIN file_model qm on qm.file_model_id = pt.file_model_id
INNER JOIN file qf on qf.file_id = qm.file_id
WHERE pt.measure_date >= (now() - interval '30 days')
GROUP BY pt.part_id, qf.edl_desc, pt.measure_date
) AS sub1
) as sub2
WHERE sub2.rownum <= 300;
Now, I had to guess at your data because it looks like you could get into trouble with your original query if you have multiple qf.qcc_file_desc values because your original group by includes this. If you had multiple values, then things would get ugly.
I'm also not 100% sure on the row_number function without knowing your data. It might be this instead:
row_number() over(partition by sub1.part_id, sub1.location order by sub1.measure_date desc)

How to increase performance of this query?

I have an SQL query, it is running on MSSQL 2008 R2
View vMobileLastMobileHistory has about 1000 rows and
select * from vMobileLastMobileHistory is taking 0.2 sec
but this query is taking 5 seconds, how can I optimize this code?
(I think the problem is INTERSECT but I dont know how change this)
SELECT DISTINCT *
FROM
(
SELECT vMobileLastMobileHistory.*
FROM vMobileLastMobileHistory
LEFT OUTER JOIN MobileType_DomainAction ON
MobileType_DomainAction.tiMobileType = vMobileLastMobileHistory.tiMobileType
LEFT OUTER JOIN MobileType_User ON
MobileType_User.MobileID = MobileType_DomainAction.ID
WHERE MobileType_User.UserID = #UserID OR #UserID = - 1
INTERSECT
SELECT vMobileLastMobileHistory.*
FROM vMobileLastMobileHistory
LEFT OUTER JOIN dbo.Region_User ON
dbo.vMobileLastMobileHistory.strRegion = dbo.Region_User.strRegion
WHERE Region_User.iSystemUser = #UserID OR #UserID = - 1
INTERSECT
SELECT vMobileLastMobileHistory.*
FROM vMobileLastMobileHistory
LEFT OUTER JOIN Contractor_User ON
vMobileLastMobileHistory.strContractor = Contractor_User.strContractor
WHERE Contractor_User.iSystemUser = #UserID OR #UserID = - 1
)
The problem is that if you have any indexes on your iSytemUser columns, the optimise is unable to use them because it has to account for a specific userID being passed, or returning all results, it would be better to logically separate your two cases. In addition, since you don't care about any columns in the auxiliary tables, you could use EXISTS in your case of specific users to take advantage of a semi join:
IF (#UserID = -1)
BEGIN
SELECT DISTINCT *
FROM vMobileLastMobileHistory;
END
ELSE
BEGIN
SELECT DISTINCT *
FROM vMobileLastMobileHistory AS mh
WHERE EXISTS
( SELECT 1
FROM Contractor_User AS cu
WHERE cu.strContractor = mh.strContractor
AND cu.iSystemUser = #UserID
)
AND EXISTS
( SELECT 1
FROM Region_User AS ru
WHERE ru.strRegion = mh.strRegion
AND ru.iSystemUser = #UserID
)
AND EXISTS
( SELECT 1
FROM MobileType_DomainAction AS da
INNER JOIN MobileType_User AS mu
ON mu.MobileID = da.ID
WHERE da.tiMobileType = mh.tiMobileType
AND mu.iSystemUser = #UserID
);
END
Now you can have two execution plans for each case (returning all results, or for a specific user), in each case you only need to read from vMobileLastMobileHistory once, and you also limit the sorts required by removing the INTERSECT and replacing with 3 EXISTS clauses.
If they don't already exist then you may also which to consider some indexes on your tables. A good way of finding out what indexes would help is to run the query in SQL Server Management Studio with the option "Show Actual Execution Plan" enabled, this will then show you any missing indexes.
Most of time Intersect and Inner Join will be same. You are not share your data, so based on my knowledge and this link, I just replace intersect query into Inner join query as :
--I think you don't need distinct upper query. If you have issue inform me.
SELECT DISTINCT vml.*
FROM vMobileLastMobileHistory vml
LEFT OUTER JOIN MobileType_DomainAction mtda ON mtda.tiMobileType = vml.tiMobileType
LEFT OUTER JOIN MobileType_User ON MobileType_User.MobileID = mtda.ID
LEFT OUTER JOIN dbo.Region_User ON dbo.vml.strRegion = dbo.Region_User.strRegion
LEFT OUTER JOIN Contractor_User ON vml.strContractor = Contractor_User.strContractor
WHERE
(MobileType_User.UserID = #UserID
and Region_User.iSystemUser = #UserID or Contractor_User.iSystemUser = #UserID
) OR #UserID = - 1

How to join one select with another when the first one not always returns a value for specific row?

I have a complex query to retrieve some results:
EDITED QUERY (added the UNION ALL):
SELECT t.*
FROM (
SELECT
dbo.Intervencao.INT_Processo, analista,
ETS.ETS_Sigla, ATC.ATC_Sigla, PAT.PAT_Sigla, dbo.Assunto.SNT_Peso,
CASE
WHEN ETS.ETS_Sigla = 'PE' AND (PAT.PAT_Sigla = 'LIB' OR PAT.PAT_Sigla = 'LBR') THEN (0.3*SNT_Peso)
WHEN ETS.ETS_Sigla = 'CD' THEN (0.3*SNT_Peso)*0.3
ELSE SNT_Peso
END AS PESOAREA,
CASE
WHEN a.max_TEA_FimTarefa IS NULL THEN a.max_TEA_InicioTarefa
ELSE a.max_TEA_FimTarefa
END AS DATA_INICIO_TERMINO,
ROW_NUMBER() OVER (PARTITION BY ATC.ATC_Sigla, a.SRV_Id ORDER BY TEA_FimTarefa DESC) AS seqnum
FROM dbo.Tarefa AS t
INNER JOIN (
SELECT
MAX(dbo.TarefaEtapaAreaTecnica.TEA_InicioTarefa) AS max_TEA_InicioTarefa,
MAX (dbo.TarefaEtapaAreaTecnica.TEA_FimTarefa) AS max_TEA_FimTarefa,
dbo.Pessoa.PFJ_Descri AS analista, dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id
FROM dbo.TarefaEtapaAreaTecnica
LEFT JOIN dbo.Tarefa ON dbo.TarefaEtapaAreaTecnica.TRF_Id = dbo.Tarefa.TRF_Id
LEFT JOIN dbo.AreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.AreaTecnica.ATC_Id
LEFT JOIN dbo.ServicoAreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.ServicoAreaTecnica.ATC_Id
AND dbo.Tarefa.SRV_Id = dbo.ServicoAreaTecnica.SRV_Id
INNER JOIN dbo.Pessoa ON dbo.Pessoa.PFJ_Id = dbo.ServicoAreaTecnica.PFJ_Id_Analista
GROUP BY dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id, dbo.Pessoa.PFJ_Descri
) AS a ON t.SRV_Id = a.SRV_Id
INNER JOIN dbo.TarefaEtapaAreaTecnica AS TarefaEtapaAreaTecnica_1 ON
t.TRF_Id = TarefaEtapaAreaTecnica_1.TRF_Id
AND a.ATC_Id = TarefaEtapaAreaTecnica_1.ATC_Id
AND a.max_TEA_InicioTarefa = TarefaEtapaAreaTecnica_1.TEA_InicioTarefa
LEFT JOIN AreaTecnica ATC ON TarefaEtapaAreaTecnica_1.ATC_Id = ATC.ATC_Id
LEFT JOIN Etapa ETS ON TarefaEtapaAreaTecnica_1.ETS_Id = ETS.ETS_Id
LEFT JOIN ParecerTipo PAT ON TarefaEtapaAreaTecnica_1.PAT_Id = PAT.PAT_Id
LEFT OUTER JOIN dbo.Servico ON a.SRV_Id = dbo.Servico.SRV_Id
INNER JOIN dbo.Intervencao ON dbo.Servico.INT_Id = dbo.Intervencao.INT_Id
LEFT JOIN dbo.Assunto ON dbo.Servico.SNT_Id = dbo.Assunto.SNT_Id
) t
The result is following:
It works good, the problem is that I was asked that if when a row is not present on this query, it must contain values from another table (ServicoAreaTecnica), so I got this query for the other table based on crucial information of the first query. So if I UNION ALL I get this:
Query1 +
UNION ALL
SELECT INN.INT_Processo,
PES.PFJ_Descri,
NULL, --ETS.ETS_Sigla,
ART.ATC_Sigla,
NULL ,--PAT.PAT_Sigla,
ASS.SNT_Peso,
NULL, --PESOAREA
NULL, --DATA_INICIO_TERMINO
NULL --seqnum
FROM dbo.ServicoAreaTecnica AS SAT
INNER JOIN dbo.AreaTecnica AS ART ON ART.ATC_Id = SAT.ATC_Id
INNER JOIN dbo.Servico AS SER ON SER.SRV_Id = SAT.SRV_Id
INNER JOIN dbo.Assunto AS ASS ON ASS.SNT_Id = SER.SNT_Id
INNER JOIN dbo.Intervencao AS INN ON INN.INT_Id = SER.INT_Id
INNER JOIN dbo.Pessoa AS PES ON PES.PFJ_Id = SAT.PFJ_Id_Analista
The result is following:
So what I want to do is to remove row number 1 because row number 2 exists on the first query, I think I got it explained better this time. The result should be only row number 1, row number 2 would appear only if query 1 doesn't retrieve a row for that particular INN.INT_Processo.
Thanks!
Ok, there are two ways to reduce your record set. Given that you've already written the code to produce the table with the extra rows, it might be easiest to just add code to reduce that:
Select * from
(Select *
, Row_Number() over
(partition by IntProcesso, Analista order by ISNULL(seqnum, 0) desc) as RN
from MyResults) a
where RN = 1
This will assign row_number 1 to any rows that came from your first query, or to any rows from the second query that do not have matches in the first query, then filter out extra rows.
You could also use outer joins with isnull or coalesce, as others have suggested. Something like this:
Select ISNULL(a.IntProcesso, b.IntProcesso) as IntProcesso
, ISNULL(a.Analista, b.Analista) as Analista
, ISNULL(a.ETSsigla, b.ETSsigla) as ETSsigla
[repeat for the rest of your columns]
from Table1 a
full outer join Table2 b
on a.IntProcesso = b.IntProcesso and a.Analista = b.Analista
Your code is hard to read, because of the lengthy names of everything (and to be honest, the fact that they're in a language I don't speak also makes it a lot harder).
But how about: replacing your INNER JOINs with LEFT JOINs, adding more LEFT JOINs to draw in the alternative tables, and introducing ISNULL clauses for each variable you want in the results?
If you do something like ... Query1 Right Join Query2 On ... that should get only the rows in Query2 that don't appear in Query 1.

TSQL - Return recent date

Having issues getting a dataset to return with one date per client in the query.
Requirements:
Must have the recent date of transaction per client list for user
Will need have the capability to run through EXEC
Current Query:
SELECT
c.client_uno
, c.client_code
, c.client_name
, c.open_date
into #AttyClnt
from hbm_client c
join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
select
ba.payr_client_uno as client_uno
, max(ba.tran_date) as tran_date
from blt_bill_amt ba
left outer join #AttyClnt ac on ba.payr_client_uno = ac.client_uno
where ba.tran_type IN ('RA', 'CR')
group by ba.payr_client_uno
Currently, this query will produce at least 1 row per client with a date, the problem is that there are some clients that will have between 2 and 10 dates associated with them bloating the return table to about 30,000 row instead of an idealistic 246 rows or less.
When i try doing max(tran_uno) to get the most recent transaction number, i get the same result, some have 1 value and others have multiple values.
The bigger picture has 4 other queries being performed doing other parts, i have only included the parts that pertain to the question.
Edit (2011-10-14 # 1:45PM):
select
ba.payr_client_uno as client_uno
, max(ba.row_uno) as row_uno
into #Bills
from blt_bill_amt ba
inner join hbm_matter m on ba.matter_uno = m.matter_uno
inner join hbm_client c on m.client_uno = c.client_uno
inner join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
and ba.tran_type in ('CR', 'RA')
group by ba.payr_client_uno
order by ba.payr_client_uno
--Obtain list of Transaction Date and Amount for the Transaction
select
b.client_uno
, ba.tran_date
, ba.tc_total_amt
from blt_bill_amt ba
inner join #Bills b on ba.row_uno = b.row_uno
Not quite sure what was going on but seems the Temp Tables were not acting right at all. Ideally i would have 246 rows of data, but with the previous query syntax it would produce from 400-5000 rows of data, obviously duplications on data.
I think you can use ranking to achieve what you want:
WITH ranked AS (
SELECT
client_uno = ba.payr_client_uno,
ba.tran_date,
be.tc_total_amt,
rnk = ROW_NUMBER() OVER (
PARTITION BY ba.payr_client_uno
ORDER BY ba.tran_uno DESC
)
FROM blt_bill_amt ba
INNER JOIN hbm_matter m ON ba.matter_uno = m.matter_uno
INNER JOIN hbm_client c ON m.client_uno = c.client_uno
INNER JOIN hbm_persnl p ON c.resp_empl_uno = p.empl_uno
WHERE p.login = #login
AND c.status_code = 'C'
AND ba.tran_type IN ('CR', 'RA')
)
SELECT
client_uno,
tran_date,
tc_total_amt
FROM ranked
WHERE rnk = 1
ORDER BY client_uno
Useful reading:
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
WITH common_table_expression (Transact-SQL)
Using Common Table Expressions

Resources