Change value of column in whole group - sql-server

Given Example
A group is defined via the GRP_ID and the GRP_MAIN set. Orange and green are examples of what it should get. Blue is what we have.
All rows with the same NAME should be grouped under the same unique GRP_ID (unique matters, because some data is already grouped), and the MAX, TOP, or AVG record should be marked as GRP_MAIN.
My first question: how can I set a value for a column across a whole group, so that after grouping by NAME I can set GRP_ID = 1?
Second: how can I check all existing GRP_ID values against a number?
Third: how do I pick the next free number not already used in GRP_ID?

Just copy this into an empty query window and execute... adapt to your needs...
You should use this query once to shuffle your data into a new table with columns for GRP_ID and GRP_MAIN. GRP_ID should be IDENTITY...
DECLARE @tbl TABLE(VALUE INT, NAME VARCHAR(10));
INSERT INTO @tbl VALUES
(12,'ab')
,(1,'ab')
,(2,'ab')
,(34,'ab')
,(5,'ab')
,(6,'ab')
,(3,'fg')
,(45,'fg')
,(65,'fg')
,(2,'ht')
,(3,'ht')
,(44,'hh')
,(5,'hh')
,(6,'hh')
,(7,'hh');
-- Number the distinct names; this number becomes the GRP_ID
WITH DistinctNames AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY Name) AS Inx, x.NAME
    FROM
    (
        SELECT DISTINCT NAME
        FROM @tbl
    ) AS x
)
SELECT dn.Inx AS GRP_ID
      ,CASE WHEN tbl.Value=MainRec.VALUE THEN 1 ELSE 0 END AS GRP_MAIN
      ,tbl.VALUE
      ,tbl.NAME
FROM @tbl AS tbl
INNER JOIN DistinctNames AS dn ON tbl.NAME=dn.NAME
-- MainRec is the row with the highest VALUE within each NAME
CROSS APPLY(SELECT TOP 1 x.* FROM @tbl AS x WHERE x.NAME=dn.NAME ORDER BY VALUE DESC) AS MainRec
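For the "next free number" part of the question, one possible sketch is to update the ungrouped rows in place through a CTE, starting new ids just above the highest GRP_ID already in use. The table variable @items, its sample rows, and the names @offset, NewGrp and rn are assumptions made for illustration. This approach continues from max + 1 rather than filling gaps in the existing numbers.

```sql
-- Hypothetical table: two rows are already grouped, the rest are not
DECLARE @items TABLE (VALUE INT, NAME VARCHAR(10), GRP_ID INT NULL, GRP_MAIN BIT NULL);
INSERT INTO @items (VALUE, NAME, GRP_ID, GRP_MAIN) VALUES
 (7,'zz',1,1), (3,'zz',1,0)
,(12,'ab',NULL,NULL), (34,'ab',NULL,NULL)
,(3,'fg',NULL,NULL), (65,'fg',NULL,NULL);

-- Highest GRP_ID in use (0 if none); new ids start right after it
DECLARE @offset INT = (SELECT ISNULL(MAX(GRP_ID), 0) FROM @items);

WITH ranked AS
(
    SELECT GRP_ID, GRP_MAIN
          ,DENSE_RANK() OVER (ORDER BY NAME) AS NewGrp                       -- one number per NAME
          ,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY VALUE DESC) AS rn   -- 1 = highest VALUE
    FROM @items
    WHERE GRP_ID IS NULL
)
UPDATE ranked
SET GRP_ID   = @offset + NewGrp
   ,GRP_MAIN = CASE WHEN rn = 1 THEN 1 ELSE 0 END;

SELECT * FROM @items ORDER BY GRP_ID, VALUE DESC;
```

Every row of a NAME receives the same GRP_ID because DENSE_RANK returns one number per NAME. Checking whether a particular number is already taken can be done with an EXISTS test against GRP_ID.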

Related

T-SQL : Cleaning up data, merging rows into columns

I'm trying to clean up some Active Directory data in SQL Server. I have managed to read the raw LFD file into a table. Now I need to clean up some attributes whose values are spread out over multiple rows. I can identify records that need to be appended to the prior record by the fact that they have a leading space.
Example:
ID    Record                      IsPartOfPrior
3114  memberOf:                   0
3115  CN=Sharepoint-Members-home  1
3116  memberOf:                   0
3117  This is                     1
3118  part of the                 1
3119  next line.                  1
Ultimately, I would like to have the following table generated:
ID Record
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line
I could write it with a cursor, setting variables, working with temp tables and populating a table. But there has to be a set-based (maybe recursive?) approach to this?
I could use the STUFF method to combine various rows together, but how do I group the various sets together? I'm thinking that I first have to define a group ID per record, and then stuff the rows together per group ID?
Thanks for any help.
Batch with comments below. Should work starting with SQL Server 2008.
--Here I emulate your table
DECLARE @yourtable TABLE (ID int, Record nvarchar(max), IsPartOfPrior bit)
INSERT INTO @yourtable VALUES
(3114,'memberOf:',0),(3115,'CN=Sharepoint-Members-home',1),(3116,'memberOf:',0),(3117,'This is',1),(3118,'part of the',1),(3119,'next line.',1)
--Getting max ID
DECLARE @max_id int
SELECT @max_id = MAX(ID)+1
FROM @yourtable
--For each header row (IsPartOfPrior = 0) we find the ID of the next header row
--and use STUFF and FOR XML PATH to concatenate the rows in between
SELECT y.ID,
    y.Record + ISNULL(b.Record, '') as Record --ISNULL keeps header rows that have no continuation
FROM @yourtable y
OUTER APPLY (
    SELECT TOP 1 ID as NextPrior
    FROM @yourtable
    WHERE IsPartOfPrior = 0 and y.ID < ID
    ORDER BY ID ASC
    ) as t
OUTER APPLY (
    SELECT STUFF((
        SELECT ' '+Record
        FROM @yourtable
        WHERE ID > y.ID and ID < ISNULL(t.NextPrior,@max_id)
        ORDER BY id ASC
        FOR XML PATH('')
    ),1,1,'') as Record
    ) as b
WHERE y.IsPartOfPrior = 0
The output:
ID Record
----------- -----------------------------------------
3114 memberOf:CN=Sharepoint-Members-home
3116 memberOf:This is part of the next line.
This works as long as the IDs are numeric and ascending.
Yet another option if you are on SQL Server 2012 or later.
Example
Declare @YourTable Table ([ID] int,[Record] varchar(50),[IsPartOfPrior] int)
Insert Into @YourTable Values
(3114,'memberOf:',0)
,(3115,'CN=Sharepoint-Members-home',1)
,(3116,'memberOf:',0)
,(3117,'This is',1)
,(3118,'part of the',1)
,(3119,'next line.',1)
;with cte as (
    -- Running count of header rows: each IsPartOfPrior = 0 row starts a new Grp
    Select *,Grp = sum(IIF([IsPartOfPrior]=0,1,0)) over (Order By ID)
    From @YourTable
)
Select ID
      ,Record = Stuff((Select ' ' +Record From cte Where Grp=A.Grp Order by ID For XML Path ('')),1,1,'')
From (Select Grp,ID=min(ID)from cte Group By Grp ) A
Returns
ID    Record
3114  memberOf: CN=Sharepoint-Members-home
3116  memberOf: This is part of the next line.
If it helps with the visualization, the cte produces:
ID    Record                      IsPartOfPrior  Grp   << Notice Grp Values
3114  memberOf:                   0              1
3115  CN=Sharepoint-Members-home  1              1
3116  memberOf:                   0              2
3117  This is                     1              2
3118  part of the                 1              2
3119  next line.                  1              2
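On SQL Server 2017 or later, the same running-sum grouping can be combined with STRING_AGG instead of the STUFF / FOR XML PATH pattern. This is only a sketch under that version assumption; it reuses the sample data from the answer above, and like that answer it puts a space between "memberOf:" and the first continuation row.

```sql
DECLARE @YourTable TABLE ([ID] INT, [Record] VARCHAR(50), [IsPartOfPrior] INT);
INSERT INTO @YourTable VALUES
 (3114,'memberOf:',0)
,(3115,'CN=Sharepoint-Members-home',1)
,(3116,'memberOf:',0)
,(3117,'This is',1)
,(3118,'part of the',1)
,(3119,'next line.',1);

WITH cte AS
(
    -- Each header row (IsPartOfPrior = 0) increments the running sum and starts a new Grp
    SELECT *, Grp = SUM(CASE WHEN IsPartOfPrior = 0 THEN 1 ELSE 0 END) OVER (ORDER BY ID)
    FROM @YourTable
)
SELECT ID     = MIN(ID)
      ,Record = STRING_AGG(Record, ' ') WITHIN GROUP (ORDER BY ID)  -- concatenate in ID order
FROM cte
GROUP BY Grp
ORDER BY MIN(ID);
```

Unlike FOR XML PATH(''), STRING_AGG does not turn characters such as & or < into XML entities.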

TSQL : Find PAIR Sequence in a table

I have following table in T-SQL(there are other columns too but no identity column or primary key column):
Oid Cid
1 a
1 b
2 f
3 c
4 f
5 a
5 b
6 f
6 g
7 f
So in above example I would like to highlight that following Oid are duplicate when looking at Cid column values as "PAIRS":
Oid:
1 (1 matches Oid: 5)
2 (2 matches Oid: 4 and 7)
Please NOTE that Oid 2 match did not include Oid 6, since the pair of 6 has letter 'G' as well.
Is it possible to write a query, without using a WHILE loop, that highlights the Oid values as above, along with a count of how many other matches exist in the database?
I am trying to find the patterns within the dataset relating to these two columns. Thank you in Advance.
Here is a worked example - see comments for explanation:
--First set up your data in a table variable
declare @oidcid table (Oid int, Cid char(1));
insert into @oidcid values
(1,'a'),
(1,'b'),
(2,'f'),
(3,'c'),
(4,'f'),
(5,'a'),
(5,'b'),
(6,'f'),
(6,'g'),
(7,'f');
--This cte gets a table with all of the cids in order, for each oid
with cte as (
    select distinct Oid, (select Cid + ',' from @oidcid i2
                          where i2.Oid = i.Oid order by Cid
                          for xml path('')) Cids
    from @oidcid i
)
select Oid, cte.Cids
from cte
inner join (
    -- Here we get just the lists of cids that appear more than once
    select Cids, Count(Oid) as OidCount
    from cte group by Cids
    having Count(Oid) > 1 ) as gcte on cte.Cids = gcte.Cids
-- And when we list them, we are showing the oids with duplicate cids next to each other
Order by cte.Cids
-- YourTable is a placeholder for the actual table name
select o1.Cid, o1.Oid, o2.Oid
, count(*) over (partition by o1.Cid) + 1 as [cnt]
from YourTable o1
join YourTable o2
on o1.Cid = o2.Cid
and o1.Oid < o2.Oid
order by o1.Cid, o1.Oid, o2.Oid
Maybe like this then:
WITH CTE AS
(
SELECT Cid, oid
,ROW_NUMBER() OVER (PARTITION BY cid ORDER BY cid) AS RN
,SUM(1) OVER (PARTITION BY oid) AS maxRow2
,SUM(1) OVER (PARTITION BY cid) AS maxRow
FROM oid
)
SELECT * FROM CTE WHERE maxRow != 1 AND maxRow2 = 1
ORDER BY oid

How can I ignore duplicate rows where columns do not contain data

I have a table with duplicate rows; however, some of the duplicates are missing data in a column that other copies have populated. How can I remove/ignore only those rows where the column is blank? In some instances:
Name    Employee#  Location  City
-----------------------------------------
BowerT  48999      NJ Foods
BowerT  48999      NJ Foods  Pearl
BowerT  48999      NJ Foods  Johns
BowerT  48999      NJ Foods  Johns
I'm using a CTE to delete duplicates; however, if the 2nd, 3rd, or 4th row has the data I need for that column, I lose it because those rows have a row number greater than 1.
;With hrEmployee as
(
Select
*,
Row_Number () Over (Partition BY Employee_Number order by Employee_Number) As RowNumber
From
[dbo].[hrEmployee]
Where
Employee_Number = '48999'
)
Delete hrEmployees
where RowNumber > 1
What am I missing?
Here is a complete example. The relevant code change is:
ROW_NUMBER() OVER (PARTITION BY Employee_Number ORDER BY
CASE WHEN ISNULL(City,'') = '' THEN 1 ELSE 0 END
) as RowNumber
This simply orders the rows by what you want to keep: rows where City is NULL or '' (blank) sort last, so a row with a City value gets row number 1 whenever one exists. You can rank the rows any way you want by specifying a different ORDER BY.
DECLARE @Table AS TABLE (Name VARCHAR(10), Employee_Number INT, Location VARCHAR(20), City VARCHAR(20))
INSERT INTO @Table VALUES ('BowerT',48999,'NJ Foods',NULL)
,('BowerT',48999,'NJ Foods','Pearl')
,('BowerT',48999,'NJ Foods','Johns')
,('BowerT',48999,'NJ Foods','Johns')
-- Before the delete
SELECT *
FROM
    @Table
;WITH hrEmployee AS (
    SELECT
        *
        ,ROW_NUMBER() OVER (PARTITION BY Employee_Number ORDER BY
            CASE WHEN ISNULL(City,'') = '' THEN 1 ELSE 0 END
        ) as RowNumber
    FROM
        @Table
    where Employee_Number = '48999'
)
DELETE
FROM
    hrEmployee
WHERE
    RowNumber > 1
-- After the delete
SELECT *
FROM
    @Table

How to check for a specific condition by looping through every record in SQL Server?

I have the following table:
ID Name
1 Jagan Mohan Reddy868
2 Jagan Mohan Reddy869
3 Jagan Mohan Reddy
The Name column is VARCHAR(55).
For another task we need to keep only the first 10 characters, i.e. VARCHAR(10).
My requirement is to check whether the truncated value of one row equals the Name of another row. For example, if I take the Name of ID 1, 'Jagan Mohan Reddy868', and truncate it with SUBSTRING(Name, 0,11), in this case the result is equal to the Name of the row with ID 3, 'Jagan Mohan Reddy'. I need to make a list of rows of that kind. Can somebody help me achieve this in SQL Server?
My main check is that the truncated values of the Name column should not match any non-truncated value of the Name column. If they do, I need to get those records.
Assuming I understand the question, I think you are looking for something like this:
Create and populate sample data (Please save us this step in your future questions)
DECLARE @T as TABLE
(
    Id int identity(1,1),
    Name varchar(15)
)
INSERT INTO @T VALUES
('Hi, I am Zohar.'),
('Hi, I am Peled.'),
('Hi, I am Z'),
('I''m Zohar peled')
Use a cte with a self inner join to get the list of ids that match the first 10 chars:
;WITH cte as
(
    SELECT T2.Id As Id1, T1.Id As Id2
    FROM @T T1
    INNER JOIN @T T2 ON LEFT(T1.Name, 10) = t2.Name AND T1.Id <> T2.Id
)
Select the records from the original table, inner joined with a union of the Id1 and Id2 from the cte:
SELECT T.Id, Name
FROM @T T
INNER JOIN
(
    SELECT Id1 As Id
    FROM CTE
    UNION
    SELECT Id2
    FROM CTE
) U ON T.Id = U.Id
Results:
Id Name
----------- ---------------
1 Hi, I am Zohar.
3 Hi, I am Z
Try this
SELECT Id,Name
FROM(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Name, LEFT(Name,11) ORDER BY ID) RN
FROM Table1 T
) Tmp
WHERE Tmp.RN = 1
Loop over all the values in your column and apply the SUBSTRING() function to each one. Note that string positions in SQL start at 1, not 0. You can confirm this by passing a string to CHARINDEX() like this:
CHARINDEX('Y', 'Your String')
It returns 1, which shows that positions are 1-based.
You can then save the substring as the value of another column with length 10.
I hope this helps.
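A small sketch of the 1-based positions, using a value from the question; the variable @Name and the column aliases are only illustrative. In T-SQL, a SUBSTRING start position below 1 does not raise an error; the characters before position 1 simply do not exist, so SUBSTRING(x, 0, 11) yields the first 10 characters.

```sql
DECLARE @Name VARCHAR(55) = 'Jagan Mohan Reddy868';

SELECT CHARINDEX('J', @Name)   AS FirstPosition  -- 1, positions are 1-based
      ,SUBSTRING(@Name, 0, 11) AS FromZero       -- 'Jagan Moha' (10 characters)
      ,SUBSTRING(@Name, 1, 10) AS FromOne        -- 'Jagan Moha'
      ,LEFT(@Name, 10)         AS LeftTen;       -- 'Jagan Moha'
```

All three expressions give the same 10-character result; LEFT(Name, 10) or SUBSTRING(Name, 1, 10) states the intent more directly.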
I think this should cover all the cases you are looking for.
-- Create Table
DECLARE @T as TABLE
(
    Id int identity(1,1),
    Name varchar(55)
)
-- Create Data
INSERT INTO @T VALUES
('Jagan Mohan Reddy868'),
('Jagan Mohan Reddy869'),
('Jagan Mohan Reddy'),
('Mohan Reddy'),
('Mohan Reddy123551'),
('Mohan R')
-- Get Matching Items
select *, SUBSTRING(name, 0, 11) as ShorterName
from @T
where SUBSTRING(name, 0, 11) in
(
    -- get all shortnames with a count > 1
    select SUBSTRING(name, 0, 11) as ShortName
    from @T
    group by SUBSTRING(name, 0, 11)
    having COUNT(*) > 1
)
order by Name, LEN(Name)

Creating permutation via recursive CTE in SQL server?

Looking at :
;WITH cte AS(
SELECT 1 AS x UNION
SELECT 2 AS x UNION
SELECT 3 AS x
)
I can create permutation table for all 3 values :
SELECT T1.x , y=T2.x , z=t3.x
FROM cte T1
JOIN cte T2
ON T1.x != T2.x
JOIN cte T3
ON T2.x != T3.x AND T1.x != T3.x
This uses the power of SQL's cartesian product plus eliminating equal values.
OK.
But is it possible to enhance this recursive pseudo CTE :
;WITH cte AS(
SELECT 1 AS x , 2 AS y , 3 AS z
UNION ALL
...
)
SELECT * FROM cte
So that it will yield same result as :
NB: there are other solutions on SO that use a recursive CTE, but they produce a string representation of the permutations rather than spreading the values across columns.
I tried to do the lot in a CTE.
However, trying to "redefine" a rowset dynamically is a little tricky. While the task is relatively easy using dynamic SQL, doing it without poses some issues.
While this answer may not be the most efficient or straightforward, or even correct in the sense that it's not all CTE, it may give others a basis to work from.
To best understand my approach, read the comments. It may also be worthwhile to look at each CTE expression in turn by altering the line of code below in the main block and commenting out the section below it.
SELECT * FROM <CTE NAME>
Good luck.
IF OBJECT_ID('tempdb..#cteSchema') IS NOT NULL
DROP Table #cteSchema
GO
-- BASE CTE
;WITH cte AS( SELECT 1 AS x, 2 AS y, 3 AS z),
-- So we know what columns we have from the CTE we extract it to XML
Xml_Schema AS ( SELECT CONVERT(XML,(SELECT * FROM cte FOR XML PATH(''))) AS MySchema ),
-- Next we need to get a list of the columns from the CTE, by querying the XML, getting the values and assigning a num to the column
MyColumns AS (SELECT D.ROWS.value('fn:local-name(.)','SYSNAME') AS ColumnName,
D.ROWS.value('.','SYSNAME') as Value,
ROW_NUMBER() OVER (ORDER BY D.ROWS.value('fn:local-name(.)','SYSNAME')) AS Num
FROM Xml_Schema
CROSS APPLY Xml_Schema.MySchema.nodes('/*') AS D(ROWS) ),
-- How many columns we have in the CTE, used a couple of times below
ColumnStats AS (SELECT MAX(NUM) AS ColumnCount FROM MyColumns),
-- create a cartesian product of the column names and values, so now we get each column with its possible values,
-- so {x=1, x =2, x=3, y=1, y=2, y=3, z=1, z=2, z=3} -- you get the idea.
PossibleValues AS (SELECT MyC.ColumnName, MyC.Num AS ColumnNum, MyColumns.Value, MyColumns.Num,
ROW_NUMBER() OVER (ORDER BY MyC.ColumnName, MyColumns.Value, MyColumns.Num ) AS ID
FROM MyColumns
CROSS APPLY MyColumns MyC
),
-- Now we have the possible values of each "column" we now have to concat the values together using this recursive CTE.
AllRawXmlRows AS (SELECT CONVERT(VARCHAR(MAX),'<'+ISNULL((SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = 1),'')+'>'+Value) as ConcatedValue, Value,ID, Counterer = 1 FROM PossibleValues
UNION ALL
SELECT CONVERT(VARCHAR(MAX),CONVERT(VARCHAR(MAX), AllRawXmlRows.ConcatedValue)+'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'><'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer+1)+'>'+CONVERT(VARCHAR(MAX),PossibleValues.Value)) AS ConcatedValue, PossibleValues.Value, PossibleValues.ID,
Counterer = Counterer+1
FROM AllRawXmlRows
INNER JOIN PossibleValues ON AllRawXmlRows.ConcatedValue NOT LIKE '%'+PossibleValues.Value+'%' -- I hate this, there has to be a better way of making sure we don't duplicate values....
AND AllRawXmlRows.ID <> PossibleValues.ID
AND Counterer < (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- The above made a list but was missing the final closing XML element. so we add it.
-- we also restrict the list to the items that contain all columns, the section above builds it up over many columns
XmlRows AS (SELECT DISTINCT
ConcatedValue +'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'>'
AS ConcatedValue
FROM AllRawXmlRows WHERE Counterer = (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- Wrap the output in row and table tags to create the final XML
FinalXML AS (SELECT (SELECT CONVERT(XML,(SELECT CONVERT(XML,ConcatedValue) FROM XmlRows FOR XML PATH('row'))) FOR XML PATH('table') )as XMLData),
-- Prepare a CTE that represents the structure of the original CTE with
DataTable AS (SELECT cte.*, XmlData
FROM FinalXML, cte)
--SELECT * FROM <CTE NAME>
-- GETS destination columns with XML data.
SELECT *
INTO #cteSchema
FROM DataTable
DECLARE @XML VARCHAR(MAX) ='';
SELECT @XML = XMLData FROM #cteSchema --Extract XML Data from the
ALTER TABLE #cteSchema DROP Column XMLData -- Removes the superfluous column
DECLARE @h INT
EXECUTE sp_xml_preparedocument @h OUTPUT, @XML
SELECT *
FROM OPENXML(@h, '/table/row', 2)
WITH #cteSchema -- just use the #cteSchema to define the structure of the xml that has been constructed
EXECUTE sp_xml_removedocument @h
How about translating 1,2,3 into a column, which will look exactly like the example you started from, and use the same approach ?
;WITH origin (x,y,z) AS (
SELECT 1,2,3
), translated (x) AS (
SELECT col
FROM origin
UNPIVOT ( col FOR cols IN (x,y,z)) AS up
)
SELECT T1.x , y=T2.x , z=t3.x
FROM translated T1
JOIN translated T2
ON T1.x != T2.x
JOIN translated T3
ON T2.x != T3.x AND T1.x != T3.x
ORDER BY 1,2,3
If I understood the request correctly, this might just do the trick.
To run it on more columns, just add them to the origin cte definition and the UNPIVOT column list, plus one more self-join per extra column.
I don't know how you pass your 1 - n values for it to be dynamic, but if you tell me, I could try to edit the script to be dynamic too.
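For comparison, here is a minimal sketch of a genuinely recursive CTE that spreads the permutations across the columns x, y and z. It is hard-coded to three values and three columns; the names vals, perm and lvl are assumptions made for illustration, and it is not meant as the definitive answer to the dynamic case.

```sql
WITH vals AS
(
    SELECT 1 AS v UNION ALL
    SELECT 2 UNION ALL
    SELECT 3
),
perm AS
(
    -- Anchor: choose a value for x; y and z are still empty
    SELECT 1 AS lvl, v AS x, CAST(NULL AS INT) AS y, CAST(NULL AS INT) AS z
    FROM vals
    UNION ALL
    -- Recursive step: fill the next column with a value not used so far
    SELECT p.lvl + 1
          ,p.x
          ,CASE WHEN p.lvl = 1 THEN v.v ELSE p.y END
          ,CASE WHEN p.lvl = 2 THEN v.v ELSE p.z END
    FROM perm AS p
    INNER JOIN vals AS v
        ON v.v <> p.x
       AND (p.y IS NULL OR v.v <> p.y)
    WHERE p.lvl < 3
)
SELECT x, y, z
FROM perm
WHERE lvl = 3       -- keep only rows where all three columns are filled
ORDER BY x, y, z;
```

Each recursion level fills one more column, so the final level holds the six complete permutations of 1, 2, 3. Supporting more columns would mean adding one CASE expression and one exclusion condition per column.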

Resources