How to remove duplicate rows when fetching data from several tables - sql-server

I need data from several tables, and the query below is the only way I know to get it (I only know the basics). The query works, but it returns duplicate rows. How do I remove them?
SELECT DISTINCT
a.int_order_id, a.trans_id, a.wtn,a.isp_id,
d.first_name, d.middle_initial, d.last_name,
d.company_name, d.email,
a.ilec_lob, a.node_type_id, a.cddd,
a.isp_ban, a.tos_version, a.isp_ckt_id,
a.isp_circuit_type, a.atm_vpi, a.atm_vci,
a.frs_dlci, b.order_create_date, b.pon,
b.order_status_id, e.trans_status_id,
e.description, c.STREET_NUMBER,
c.STREET_NUMBER_SUFFIX, c.DIRECTIONAL_ID,
c.street_name, c.thoroughfare_id,
c.street_suffix, c.address_line1, c.address_line2,
c.unit_type_id, c.unit_value, c.city, c.state_id, c.zip
FROM
VZEXTRACT1.vvov_os_ord_dsl A
JOIN
VZEXTRACT1.vvov_os_order_details B ON a.int_order_id = b.int_order_id
JOIN
VZEXTRACT1.vvov_os_ord_address C ON b.int_order_id = c.int_order_id
JOIN
vzextract1.vvov_os_ord_contact D ON c.int_order_id = d.int_order_id
JOIN
VZEXTRACT1.vvov_trans_status E ON b.order_status_id = e.trans_status_id
WHERE
a.isp_id NOT IN (657,500)
AND B.ORDER_CREATE_DATE >= to_date('01-MAY-15', 'DD-MON-RR')
AND B.ORDER_CREATE_DATE < to_date('30-JUL-15', 'DD-MON-RR')

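When you want to remove duplicate rows, you have three options:
1. DISTINCT: it applies to all selected columns together. If you still get duplicate rows after DISTINCT, those rows differ in at least one column, so first find out which column that is.
2. GROUP BY: you can use a GROUP BY clause to get distinct rows, and apply an aggregate function (for example MIN or MAX) to the column found in point 1 so that it no longer produces extra rows.
3. OVER (PARTITION BY): you can also use this clause on the column causing the duplicates, for example to concatenate its values per id. Every row with the same id then carries the same concatenated value, so DISTINCT collapses them into one row:
wm_concat(my_col) over (partition by id)
(WM_CONCAT is an undocumented Oracle function and is not available in Oracle 12c and later; LISTAGG, available since Oracle 11g Release 2, is the supported alternative.)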
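The three options above can be tried out on a toy data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Oracle/SQL Server, and the `orders` and `address` tables are hypothetical, simplified versions of the tables in the question: order 1 has two address rows, which is what multiplies the joined result. For point 3 it swaps in a `ROW_NUMBER()` variant (keep the first row per order) instead of string concatenation. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the order and address tables:
# order 1 has two address rows, which is what multiplies the joined rows.
SETUP = """
    CREATE TABLE orders  (int_order_id INTEGER, pon TEXT);
    CREATE TABLE address (int_order_id INTEGER, city TEXT);
    INSERT INTO orders  VALUES (1, 'PON-1'), (2, 'PON-2');
    INSERT INTO address VALUES (1, 'Boston'), (1, 'Albany'), (2, 'Dallas');
"""

JOIN = "FROM orders o JOIN address a ON a.int_order_id = o.int_order_id"


def dedupe_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # 1. DISTINCT: the two rows for order 1 differ in city, so both survive.
    distinct_rows = con.execute(
        f"SELECT DISTINCT o.int_order_id, o.pon, a.city {JOIN}").fetchall()
    # 2. GROUP BY plus an aggregate on the column causing the duplicates.
    grouped_rows = con.execute(
        f"SELECT o.int_order_id, o.pon, MIN(a.city) {JOIN} "
        "GROUP BY o.int_order_id, o.pon ORDER BY o.int_order_id").fetchall()
    # 3. Window variant: number the rows per order and keep only the first.
    ranked_rows = con.execute(
        "SELECT int_order_id, pon, city FROM ("
        " SELECT o.int_order_id AS int_order_id, o.pon AS pon, a.city AS city,"
        " ROW_NUMBER() OVER (PARTITION BY o.int_order_id ORDER BY a.city) AS rn "
        f" {JOIN}) AS t WHERE rn = 1 ORDER BY int_order_id").fetchall()
    con.close()
    return distinct_rows, grouped_rows, ranked_rows


if __name__ == "__main__":
    for rows in dedupe_demo():
        print(rows)
```

DISTINCT still returns three rows, while both the GROUP BY and the window version return one row per order. Which row you keep (here the alphabetically first city) is a business decision, not something SQL can decide for you.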

Related

Ignoring Nulls using Lag() in SQL Server 2018

I am trying to get the activity ID linked to a dispatch. In the data, ATTRIB_43 is populated only when a dispatch is created.
What I want is the activity ID of the row immediately before the row where the dispatch was created.
This is the code I am using:
Select sea.ROW_ID, sea.CREATED_DTTM,sea.SRA_SR_ID,sea.ATTRIB_43, tsk.CRT_DTS, tsk.TASK_DESC, datediff(ss,sea.CREATED_DTTM, tsk.CRT_DTS) as dd
, seal.x_isp_notes, seal.x_isp_comments, seal.comments, seal.x_isp_agent_desc
, tsk.TASK_SUB_TYPE_CD, tsk.TASK_TYPE_CD, tsk.WHAT_ID
, cdl.ORIGIN_NM
,LAG(sea.ROW_ID,1) over (partition by sea.ATTRIB_43 order by sea.CREATED_DTTM) AS 'FLAGID'
--, tsk.*,
, fdc.FISCAL_QUARTER
from GSEDATA.dbo.X_ISP_EXTRNL_CASE_ID sea
join rawdata.corp_ww.FISCAL_DAY_CALENDAR fdc on sea.CREATED_DATE = fdc.ACTUAL_DATE
left join rawdata.svc_base.SFDC_TASK_DTL tsk on sea.X_ISP_EXTRNL_CASE_ID = tsk.TASK_ID
left join rawdata.svc_base.SFDC_CASE_DTL cdl on cdl.case_id = tsk.what_id
left join GSEDATA.dbo.s_evt_act_logs seal on sea.ROW_ID = seal.row_id
where --sea.ATTRIB_43 = '04391481876'
sea.SRA_SR_ID = 'A-2Q7YF57W'
order by sea.CREATED_DTTM
But it does not work as expected: FLAGID comes back NULL on the rows where ATTRIB_43 is populated.
If I understand your question correctly, you are getting a NULL FLAGID for rows with a non-NULL ATTRIB_43 because you are using a PARTITION BY sea.ATTRIB_43 clause.
PARTITION BY divides the query result set into partitions. The window function is applied to each partition separately, and the computation restarts for each partition.
That is why all rows with a NULL ATTRIB_43 are grouped into one window, while each distinct non-NULL ATTRIB_43 value gets a separate window. A dispatch row is the first row in its window (typically the only one), so LAG() has no previous row to look at and returns NULL.
Remove the PARTITION BY sea.ATTRIB_43 clause if you want lag values across all rows:
LAG(sea.ROW_ID,1) over (order by sea.CREATED_DTTM) AS 'FLAGID'
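The effect of the partition can be reproduced on four made-up rows. This sketch uses Python's `sqlite3` (SQLite 3.25 or later for LAG) rather than SQL Server, and the `sea` table, row IDs and the integer `created_dttm` values are hypothetical; only R3 is a dispatch row. It also shows one way to return just the dispatch rows: compute LAG in a subquery first, because a WHERE at the same level would remove the previous rows before LAG could see them.

```python
import sqlite3

# Hypothetical rows (row_id, created_dttm, attrib_43); only R3 is a dispatch row.
ROWS = [("R1", 1, None), ("R2", 2, None), ("R3", 3, "D1"), ("R4", 4, None)]


def _connect():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sea (row_id TEXT, created_dttm INTEGER, attrib_43 TEXT)")
    con.executemany("INSERT INTO sea VALUES (?, ?, ?)", ROWS)
    return con


def flag_ids(partitioned):
    """Map each row_id to LAG(row_id), with or without PARTITION BY attrib_43."""
    window = ("PARTITION BY attrib_43 ORDER BY created_dttm" if partitioned
              else "ORDER BY created_dttm")
    con = _connect()
    rows = con.execute(
        f"SELECT row_id, LAG(row_id, 1) OVER ({window}) AS flagid FROM sea").fetchall()
    con.close()
    return dict(rows)


def dispatch_predecessors():
    """Previous row's ID for each dispatch row. LAG runs in the subquery, before
    the outer WHERE removes the non-dispatch rows it needs to look back at."""
    con = _connect()
    rows = con.execute("""
        SELECT row_id, attrib_43, flagid FROM (
            SELECT row_id, attrib_43,
                   LAG(row_id, 1) OVER (ORDER BY created_dttm) AS flagid
            FROM sea) AS t
        WHERE attrib_43 IS NOT NULL""").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(flag_ids(partitioned=True))   # R3 gets None: it is alone in its window
    print(flag_ids(partitioned=False))  # R3 gets R2
    print(dispatch_predecessors())
```

If the real query covers more than one service request, partitioning by sea.SRA_SR_ID instead of ATTRIB_43 is probably what you want, so that LAG does not look back into another request's rows; that is an assumption about the data, not something stated in the question.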

How to avoid repeating a field value in an Access query result

I have the below tables in my DB.
Orders:
OrderNo  ItemNo  OrderQty
1000     10      10
2000     20      10
VendorPO:
OrderNo  ItemNo  VendorPO  POQty
1000     10      100       5
2000     20      100       7
2000     20      200       3
And I used this query:
SELECT [Order].OrderNo, [Order].ItemNo, [Order].OrderQty, VendorPO.VendorPO, VendorPO.POQty
FROM [Order]
INNER JOIN VendorPO ON ([Order].ItemNo = VendorPO.ItemNo)
AND ([Order].OrderNo = VendorPO.OrderNo);
With these results:
Query Result
OrderNo  ItemNo  OrderQty  VendorPO  POQty
1000     10      10        100       5
2000     20      10        100       7
2000     20      10        200       3
I want to avoid repeating the order quantity in the query result: OrderQty 10 for order 2000 appears twice, once for each PO reference (POQty 7 and 3).
How can I show the order quantity only once?

SQL Server LEFT JOIN

This query has kept me busy for the last couple of days. I tried rewriting it in different ways, but I keep hitting the same problem. To simplify it, I moved part of the query into a view, which returns 23 records. Using a LEFT JOIN, I want to add fields from the table tblDatPositionsCalc to these 23 records. As you can see, I have an additional condition on tblDatPositionsCalc so that only the most recent records are considered; with this condition it returns 21 records. The join is on two fields together: colAccount and colId.
I simply want the query to return the 23 records from the view and, where possible, the matching information from tblDatPositionsCalc. Only 2 records in the view have no corresponding id and account in tblDatPositionsCalc, so out of the 23 records only 2 should have missing values in the fields coming from tblDatPositionsCalc.
The problem is that my query returns only the 21 records from tblDatPositionsCalc, and I don't understand why. I tried moving the date condition to just after the JOIN condition, but that did not help.
SELECT TOP (100) PERCENT
dbo.vwCurrPos.Account,
dbo.vwCurrPos.Id,
dbo.vwCurrPos.TickerBB,
dbo.vwCurrPos.colEquityCode,
dbo.vwCurrPos.colType,
dbo.vwCurrPos.colCcy,
dbo.vwCurrPos.colRegion,
dbo.vwCurrPos.colExchange,
dbo.vwCurrPos.[Instr Type],
dbo.vwCurrPos.colMinLastDay,
dbo.vwCurrPos.colTimeShift,
dbo.vwCurrPos.Strike,
dbo.vwCurrPos.colMultiplier,
dbo.vwCurrPos.colBetaVol,
dbo.vwCurrPos.colBetaEq,
dbo.vwCurrPos.colBetaFloor,
dbo.vwCurrPos.colBetaCurv,
dbo.vwCurrPos.colUndlVol,
dbo.vwCurrPos.colUndlEq,
dbo.vwCurrPos.colUndlFut,
tblDatPositionsCalc_1.colLots,
dbo.vwCurrPos.[Open Positions],
dbo.vwCurrPos.colListMatShift,
dbo.vwCurrPos.colStartTime,
tblDatPositionsCalc_1.colPrice,
tblDatPositionsCalc_1.colMktPrice,
dbo.vwCurrPos.colProduct,
dbo.vwCurrPos.colCalendar,
CAST(dbo.vwCurrPos.colExpiry AS DATETIME) AS colExpiry,
dbo.vwCurrPos.colEndTime,
CAST(tblDatPositionsCalc_1.colDate AS datetime) AS colDate,
dbo.vwCurrPos.colFund,
dbo.vwCurrPos.colExchangeTT,
dbo.vwCurrPos.colUserTag
FROM dbo.vwCurrPos
LEFT OUTER JOIN dbo.tblDatPositionsCalc AS tblDatPositionsCalc_1
ON tblDatPositionsCalc_1.colId = dbo.vwCurrPos.Id
AND tblDatPositionsCalc_1.colAccount = dbo.vwCurrPos.Account
WHERE (tblDatPositionsCalc_1.colDate =
(SELECT MAX(colDate) AS Expr1 FROM dbo.tblDatPositionsCalc))
ORDER BY
dbo.vwCurrPos.Account,
dbo.vwCurrPos.Id,
dbo.vwCurrPos.colEquityCode,
dbo.vwCurrPos.colRegion
Any idea what might cause the problem?
(Option 1) DrCopyPaste is right: move the date condition into the ON clause, so your FROM clause would look like:
...
FROM dbo.vwCurrPos
LEFT OUTER JOIN dbo.tblDatPositionsCalc AS tblDatPositionsCalc_1
ON tblDatPositionsCalc_1.colId = dbo.vwCurrPos.Id
AND tblDatPositionsCalc_1.colAccount = dbo.vwCurrPos.Account
and (tblDatPositionsCalc_1.colDate =
(SELECT MAX(colDate) AS Expr1 FROM dbo.tblDatPositionsCalc))
...
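Reason: a WHERE condition on a column of the left-joined table is evaluated after the join. For view rows without a match (or whose match has an older date), that column is NULL or fails the test, and "NULL = something" is never true, so those rows are removed. This effectively turns the LEFT JOIN into an INNER JOIN. In the ON clause, the condition only decides which rows match, so unmatched view rows are kept with NULLs.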
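The difference between the two placements can be checked on a miniature data set. This is a sketch using Python's `sqlite3` instead of SQL Server; `curr_pos` and `positions_calc` are hypothetical stand-ins for vwCurrPos and tblDatPositionsCalc, and the dates and prices are made up. Position B/3 only has a row for an older date, like the unmatched records in the question.

```python
import sqlite3

# Hypothetical miniature of vwCurrPos and tblDatPositionsCalc.
SETUP = """
    CREATE TABLE curr_pos (account TEXT, id INTEGER);
    CREATE TABLE positions_calc (col_account TEXT, col_id INTEGER,
                                 col_date TEXT, col_price REAL);
    INSERT INTO curr_pos VALUES ('A', 1), ('A', 2), ('B', 3);
    INSERT INTO positions_calc VALUES
        ('A', 1, '2024-01-01',  9.0),
        ('A', 1, '2024-01-02', 10.0),
        ('A', 2, '2024-01-02', 20.0),
        ('B', 3, '2024-01-01', 30.0);
"""

LATEST = "(SELECT MAX(col_date) FROM positions_calc)"
BASE = """SELECT p.account, p.id, c.col_price
          FROM curr_pos p
          LEFT JOIN positions_calc c
            ON c.col_id = p.id AND c.col_account = p.account"""


def positions(filter_in_on):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    if filter_in_on:
        # Date test is part of the match: B/3 survives with a NULL price.
        sql = f"{BASE} AND c.col_date = {LATEST} ORDER BY p.account, p.id"
    else:
        # Date test runs after the join and discards B/3 entirely.
        sql = f"{BASE} WHERE c.col_date = {LATEST} ORDER BY p.account, p.id"
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(positions(filter_in_on=False))  # 2 rows: behaves like an INNER JOIN
    print(positions(filter_in_on=True))   # 3 rows: B/3 kept with None
```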
(Option 2) As opposed to pushing code into additional views, where it is harder to maintain, you can nest SQL SELECT statements:
select
X.x1,X.x2,
Y.*
from X
left join
(select Z.z1 as y1, Z.z2 as y2, Z.z3 as y3
from Z
where Z.z3 = (select max(Z.z3) from Z) -- z3 plays the role of colDate; z1, z2 are the join keys
) as Y
on X.x1 = Y.y1 and X.x2 = Y.y2
The advantage here is that you can run and check each nested subquery on its own and move on quickly. If you are still building up more logic, have a look at common table expressions (CTEs): http://msdn.microsoft.com/en-us/library/ms175972.aspx
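The same nested-select idea can be written as a CTE. The sketch below again uses Python's `sqlite3` (which supports WITH) on hypothetical stand-in tables; the CTE `latest_calc` plays the role of the derived table Y: it is filtered to the latest date first, and only then left-joined, so view rows without a current match are kept.

```python
import sqlite3

# Hypothetical miniature of vwCurrPos (X) and tblDatPositionsCalc (Z).
SETUP = """
    CREATE TABLE curr_pos (account TEXT, id INTEGER);
    CREATE TABLE positions_calc (col_account TEXT, col_id INTEGER,
                                 col_date TEXT, col_price REAL);
    INSERT INTO curr_pos VALUES ('A', 1), ('B', 3);
    INSERT INTO positions_calc VALUES
        ('A', 1, '2024-01-01',  9.0),
        ('A', 1, '2024-01-02', 10.0),
        ('B', 3, '2024-01-01', 30.0);
"""


def positions_with_cte():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # latest_calc is filtered to the most recent date before the LEFT JOIN.
    rows = con.execute("""
        WITH latest_calc AS (
            SELECT col_account, col_id, col_price
            FROM positions_calc
            WHERE col_date = (SELECT MAX(col_date) FROM positions_calc)
        )
        SELECT p.account, p.id, y.col_price
        FROM curr_pos p
        LEFT JOIN latest_calc AS y
          ON y.col_account = p.account AND y.col_id = p.id
        ORDER BY p.account, p.id""").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(positions_with_cte())
```

As with the nested SELECT, you can run the body of the CTE by itself to check it before adding the join.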

Retrieving data from different tables depending on the value in a column

Please pardon me if this question has been asked before, but as a novice in databases I don't have enough vocabulary to search for what I need.
I am using SQL Server 2008.
I have a table tblPDCDetails with several columns. One of the columns, PDCof, holds the values:
"A" (for applicant),
"C" (for co-applicant),
"G" (for guarantor).
Another column, HolderID, holds the unique ID of the holder.
The PDC holders reside in their respective tables: applicants in tblApplBasicDetails, co-applicants in their own table, and so on.
What I need is to retrieve the names of the holders from their respective tables, depending on the value in the PDCof column.
Can I do that at all?
If not, how should I work around it?
This should do:
SELECT A.*,
COALESCE(B.Name,C.Name,D.Name) Name
FROM dbo.tblPDCDetails A
LEFT JOIN dbo.tblApplBasicDetails B
ON A.HolderID = B.HolderID
AND A.PDCof = 'A'
LEFT JOIN dbo.tblCoApplBasicDetails C
ON A.HolderID = C.HolderID
AND A.PDCof = 'C'
LEFT JOIN dbo.tblGuarantorlBasicDetails D
ON A.HolderID = D.HolderID
AND A.PDCof = 'G'
The other option is to use a CASE expression:
Select case main.PDCof
when 'A' then (select B.Name from dbo.tblApplBasicDetails B where B.HolderID = main.HolderID)
when 'C' then (select C.Name from dbo.tblCoApplBasicDetails C where C.HolderID = main.HolderID)
when 'G' then (select D.Name from dbo.tblGuarantorlBasicDetails D where D.HolderID = main.HolderID)
end as Name
,main.*
from tblPDCDetails main
Which option to use depends on how often you run it: a few times a day, or a few thousand times an hour.
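Both answers can be checked against each other on made-up data. The sketch below uses Python's `sqlite3`, so the `dbo.` schema prefix is dropped; the table names follow the first answer, and the `PDCId` column and all names are hypothetical. HolderID 1 deliberately exists in several holder tables, to show that the extra `AND A.PDCof = ...` condition in each ON clause is what picks the right table.

```python
import sqlite3

SETUP = """
    CREATE TABLE tblPDCDetails (PDCId INTEGER, PDCof TEXT, HolderID INTEGER);
    CREATE TABLE tblApplBasicDetails       (HolderID INTEGER, Name TEXT);
    CREATE TABLE tblCoApplBasicDetails     (HolderID INTEGER, Name TEXT);
    CREATE TABLE tblGuarantorlBasicDetails (HolderID INTEGER, Name TEXT);
    INSERT INTO tblPDCDetails VALUES (10, 'A', 1), (11, 'C', 1), (12, 'G', 2);
    INSERT INTO tblApplBasicDetails       VALUES (1, 'Alice'), (2, 'Amir');
    INSERT INTO tblCoApplBasicDetails     VALUES (1, 'Carol');
    INSERT INTO tblGuarantorlBasicDetails VALUES (1, 'Gus'), (2, 'Gita');
"""

# Option 1: one conditional LEFT JOIN per holder table, then COALESCE.
COALESCE_SQL = """
    SELECT A.PDCId, COALESCE(B.Name, C.Name, D.Name) AS Name
    FROM tblPDCDetails A
    LEFT JOIN tblApplBasicDetails B
      ON A.HolderID = B.HolderID AND A.PDCof = 'A'
    LEFT JOIN tblCoApplBasicDetails C
      ON A.HolderID = C.HolderID AND A.PDCof = 'C'
    LEFT JOIN tblGuarantorlBasicDetails D
      ON A.HolderID = D.HolderID AND A.PDCof = 'G'
    ORDER BY A.PDCId"""

# Option 2: CASE with one correlated subquery per holder type.
CASE_SQL = """
    SELECT main.PDCId,
           CASE main.PDCof
             WHEN 'A' THEN (SELECT B.Name FROM tblApplBasicDetails B
                            WHERE B.HolderID = main.HolderID)
             WHEN 'C' THEN (SELECT C.Name FROM tblCoApplBasicDetails C
                            WHERE C.HolderID = main.HolderID)
             WHEN 'G' THEN (SELECT D.Name FROM tblGuarantorlBasicDetails D
                            WHERE D.HolderID = main.HolderID)
           END AS Name
    FROM tblPDCDetails main
    ORDER BY main.PDCId"""


def holder_names(sql):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(holder_names(COALESCE_SQL))
    print(holder_names(CASE_SQL))
```

Both queries return the same names here. Note that the CASE version assumes HolderID is unique within each holder table; if it is not, SQL Server raises an error because a subquery used as an expression returned more than one value.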

MySQL: complex SELECT query from multiple tables

Table visits:
fields [id, patient_id (fk), doctor_id (fk), flag (Xfk), type (Xfk), time_booked, date, ...]
Xfk = refers to another table, but a matching row does not have to exist, so I don't use a constraint.
SELECT `v`.`date`, `v`.`time_booked`, `v`.`stats`, `p`.`name` as pt_name,
`d`.`name` as dr_name, `f`.`name` as flag_name, `f`.`color` as flag_color,
`vt`.`name` as type, `vt`.`color` as type_color
FROM (`visits` v, `users` p, `users` d, `flags` f, `visit_types` vt)
WHERE `p`.`id`=`v`.`patient_id`
AND `d`.`id`=`v`.`doctor_id`
AND `v`.`flag`=`f`.`id`
AND `v`.`type`=`vt`.`id`
AND `v`.`date` >= '2013-02-27'
AND (v.date <= DATE_ADD('2013-02-27', INTERVAL 7 DAY))
AND (`v`.`doctor_id`='00002' OR `v`.`doctor_id`='00001')
ORDER BY `v`.`date` ASC, `v`.`time_booked` ASC;
That is one big statement!
My questions are:
1. Should I consider using JOIN instead of listing multiple tables in the FROM clause, and if so, why?
This query runs in 0.0009 s, so I think it is fine, and I get all my data in one query. Or is it bad practice?
2. In the SELECT part I want to say:
if v.type != 0, select f.name and f.color; otherwise, don't select them or even touch the flags table f.
Is that possible?
Also, currently, if a flag is not found, every row is replicated as many times as there are rows in the flags table! Is there a way to prevent this, for both the flags and the visit_types table?
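If it's running fast, I wouldn't mess with it. I generally prefer explicit JOIN ... ON syntax instead of matching columns in the WHERE clause.
Any chance you could remove the ` characters? They make it a bit harder to read, in my opinion.
Look at CASE for MySQL (the page below describes the CASE statement for stored programs; in a SELECT list you use the CASE operator, which ends with END rather than END CASE): http://dev.mysql.com/doc/refman/5.0/en/case.html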
select case when v.type <> 0 then
f.name
else
''
end as name, ...
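Since flag and type are optional references, explicit LEFT JOINs are one way to keep every visit exactly once, whether or not a matching flag or visit type exists, and the CASE above can then blank out the type name. This is a sketch under assumptions: it runs on Python's `sqlite3` rather than MySQL, the users/patient/doctor joins are left out for brevity, and the sample rows are hypothetical (visit 2 uses 0 for "no flag, no type").

```python
import sqlite3

SETUP = """
    CREATE TABLE visits (id INTEGER, doctor_id TEXT, flag INTEGER, type INTEGER);
    CREATE TABLE flags (id INTEGER, name TEXT, color TEXT);
    CREATE TABLE visit_types (id INTEGER, name TEXT, color TEXT);
    INSERT INTO visits VALUES (1, '00001', 1, 1), (2, '00002', 0, 0);
    INSERT INTO flags VALUES (1, 'urgent', 'red'), (2, 'follow-up', 'blue');
    INSERT INTO visit_types VALUES (1, 'checkup', 'green');
"""


def visit_rows():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # LEFT JOIN keeps a visit once even when no flag/type row matches;
    # the missing columns come back as NULL (None in Python).
    rows = con.execute("""
        SELECT v.id,
               f.name  AS flag_name,
               f.color AS flag_color,
               CASE WHEN v.type <> 0 THEN vt.name ELSE '' END AS type_name
        FROM visits v
        LEFT JOIN flags f        ON f.id = v.flag
        LEFT JOIN visit_types vt ON vt.id = v.type
        WHERE v.doctor_id IN ('00001', '00002')
        ORDER BY v.id""").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(visit_rows())
```

Each visit appears once even though the flags table has two rows, because every join has an ON condition tying it to the visit.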
