I have a large dataset (1,000,000+ rows) from which I want to merge multiple rows based on a key.
To make things clear, I've made a minimal example.
First, some test data:
declare @TmpTable table
(
Pnr varchar(10),
Status varchar(10),
Komkod varchar(10),
Fornvn varchar(10)
)
insert into @TmpTable values
('01010101', '01', null, null ),
('01010101', null , '0430', null ),
('01010101', null , null, 'Test' ),
('02020202', '10', null, null ),
('02020202', null, '3004', null ),
('02020202', null , null, 'Test' )
Then the desired output:
I want to merge the 6 rows into 2, one per Pnr.
I've tried different things (window functions, inner joins) but never found a good solution.
So now I need some good ideas. Performance is king.
You can simply aggregate; aggregate functions such as MAX ignore NULLs automatically:
select Pnr,
Max(Status) Status,
Max(Komkod) Komkod,
Max(Fornvn) Fornvn
from @TmpTable
group by Pnr;
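To check the behaviour outside SQL Server, here is a minimal sketch that runs the same GROUP BY / MAX query against the question's six rows. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and an ordinary table named TmpTable in place of the table variable; both are illustration choices, not part of the original answer.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, and a regular table
# replaces the table variable from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TmpTable (Pnr TEXT, Status TEXT, Komkod TEXT, Fornvn TEXT)"
)
conn.executemany(
    "INSERT INTO TmpTable VALUES (?, ?, ?, ?)",
    [
        ("01010101", "01", None, None),
        ("01010101", None, "0430", None),
        ("01010101", None, None, "Test"),
        ("02020202", "10", None, None),
        ("02020202", None, "3004", None),
        ("02020202", None, None, "Test"),
    ],
)

# MAX skips NULLs, so each column keeps its single non-NULL value per Pnr.
merged = conn.execute(
    """
    SELECT Pnr, MAX(Status), MAX(Komkod), MAX(Fornvn)
    FROM TmpTable
    GROUP BY Pnr
    ORDER BY Pnr
    """
).fetchall()

for row in merged:
    print(row)
```

Note that this only merges cleanly when each key has at most one non-NULL value per column; if a key had two different Status values, MAX would silently keep the larger one.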
I can't believe I can't figure this out or find anything related to this. I'm trying to generate a set of column headers dynamically but with no data (as if it was an empty table).
SELECT Null AS [CODE], Null AS [DESC];
will return
CODE DESC
----------- -----------
NULL NULL
which is close, but I need it to have no records:
CODE DESC
----------- -----------
The closest I can get to the exact requirement is:
DECLARE @Table TABLE
(
[CODE] bit NULL,
[DESC] bit NULL
);
SELECT [CODE], [DESC]
FROM @Table;
Which is what I'll go with if I can't find anything similar but this just feels soooo verbose for something that feels trivial?
Just use a false condition in a WHERE clause:
SELECT Null AS [CODE], Null AS [DESC]
WHERE 1=0
See the demo.
This way you can pass any value to the 2 columns, not just null.
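As a quick check that a false WHERE condition still yields the column headers, the sketch below runs the same statement and inspects the result metadata. It assumes SQLite via Python's sqlite3 module rather than SQL Server, and uses double quotes instead of square brackets around the column names.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server for this check.
conn = sqlite3.connect(":memory:")
cur = conn.execute('SELECT NULL AS "CODE", NULL AS "DESC" WHERE 1 = 0')

# The cursor metadata carries the column names even when no rows come back.
headers = [col[0] for col in cur.description]
rows = cur.fetchall()

print(headers)
print(rows)
```

The condition is evaluated per candidate row, so the single constant row is filtered out while the shape of the result set stays intact.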
Try using a CTE:
WITH CTE AS(
SELECT Null AS [CODE], Null AS [DESC]
)
SELECT * FROM CTE WHERE [CODE] IS NOT NULL AND [DESC] IS NOT NULL
SELECT null as [Code], null as [DESC]
WHERE 1=2
An example scenario for my question would be:
How to get all persons who have multiple address types?
Now here's my sample data:
CREATE TABLE #tmp_1 (
ID uniqueidentifier PRIMARY KEY
, FirstName nvarchar(max)
, LastName nvarchar(max)
)
CREATE TABLE #tmp_2 (
SeedID uniqueidentifier PRIMARY KEY
, SomeIrrelevantCol nvarchar(max)
)
CREATE TABLE #tmp_3 (
KeyID uniqueidentifier PRIMARY KEY
, ID uniqueidentifier REFERENCES #tmp_1(ID)
, SeedID uniqueidentifier REFERENCES #tmp_2(SeedID)
, SomeIrrelevantCol nvarchar(max)
)
INSERT INTO #tmp_1
VALUES
('08781F73-A06B-4316-B6A5-802ED58E54BE', 'AAAAAAA', 'aaaaaaa'),
('4EC71FCE-997C-46AA-B119-6C5A2545DDC2', 'BBBBBBB', 'bbbbbbb'),
('B0726ABF-738E-48BC-95CB-091C9D731A0E', 'CCCCCCC', 'ccccccc'),
('6C6CE284-A63C-49D2-B2CC-F25C9CBC8FB8', 'DDDDDDD', 'ddddddd')
INSERT INTO #tmp_2
VALUES
('4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'Value1'),
('4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'Value2'),
('6F6EFED6-8EA0-4F70-A63F-6A103D0A71BD', 'Value3')
INSERT INTO #tmp_3
VALUES
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'sdfsdgdfbgcv'),
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'asdfadsas'),
(NEWID(), '08781F73-A06B-4316-B6A5-802ED58E54BE', '4C891FE9-60B6-41BE-A64B-11A9A8B58AB2', 'xxxxxeeeeee'),
(NEWID(), '4EC71FCE-997C-46AA-B119-6C5A2545DDC2', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'sdfsdfsd'),
(NEWID(), 'B0726ABF-738E-48BC-95CB-091C9D731A0E', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'zxczxcz'),
(NEWID(), 'B0726ABF-738E-48BC-95CB-091C9D731A0E', '6F6EFED6-8EA0-4F70-A63F-6A103D0A71BD', 'eerwerwe'),
(NEWID(), '6C6CE284-A63C-49D2-B2CC-F25C9CBC8FB8', '4D10B4EC-C929-4D6B-8C94-11B680CF2221', 'vbcvbcvbcv')
Which gives you:
This is my attempt:
SELECT
t1.*
, Cnt -- not really needed. Just added for visual purposes
FROM #tmp_1 t1
LEFT JOIN (
SELECT
xt.ID
, COUNT(1) Cnt
FROM (
SELECT
#tmp_3.ID
, COUNT(1) as Cnt
FROM #tmp_3
GROUP BY ID, SeedID
) xt
GROUP BY ID
) t2
ON t1.ID = t2.ID
WHERE t2.Cnt > 1
Which gives:
ID FirstName LastName Cnt
B0726ABF-738E-48BC-95CB-091C9D731A0E CCCCCCC ccccccc 2
08781F73-A06B-4316-B6A5-802ED58E54BE AAAAAAA aaaaaaa 2
Although this gives me the correct results, I'm afraid that this query is not the right way to do this performance-wise because of the inner queries. Any input is very much appreciated.
NOTE:
A person can have multiple addresses of the same address type.
"Person-Address" is not the exact use-case. This is just an example.
The Cnt column is not really needed in the result set.
The way you have named your sample tables and data helps little in understanding the problem.
I think you want all IDs which have 2 or more SomeIrrelevantCol values in the last table?
This can be done by:
select * from #tmp_1
where ID in
(
select ID
from #tmp_3
group by ID
having count(distinct SomeIrrelevantCol)>=2
)
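If the requirement is instead "two or more distinct SeedID values per person", which is what the question's own GROUP BY ID, SeedID attempt counts, the same IN / HAVING pattern applies with a different column. The sketch below is an illustration under stated assumptions: SQLite via Python's sqlite3 stands in for SQL Server, the tables are named tmp_1 and tmp_3 without the # prefix, and the GUIDs are replaced by short hypothetical keys (A to D, s1 to s3) that mirror the sample data.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; short hypothetical keys
# replace the GUIDs from the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tmp_1 (ID TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tmp_3 (ID TEXT, SeedID TEXT, SomeIrrelevantCol TEXT);

    INSERT INTO tmp_1 VALUES
        ('A', 'AAAAAAA', 'aaaaaaa'),
        ('B', 'BBBBBBB', 'bbbbbbb'),
        ('C', 'CCCCCCC', 'ccccccc'),
        ('D', 'DDDDDDD', 'ddddddd');

    INSERT INTO tmp_3 VALUES
        ('A', 's1', 'sdfsdgdfbgcv'),
        ('A', 's2', 'asdfadsas'),
        ('A', 's2', 'xxxxxeeeeee'),
        ('B', 's1', 'sdfsdfsd'),
        ('C', 's1', 'zxczxcz'),
        ('C', 's3', 'eerwerwe'),
        ('D', 's1', 'vbcvbcvbcv');
    """
)

# Keep only persons linked to at least two different SeedID values.
multi_type = conn.execute(
    """
    SELECT ID, FirstName, LastName
    FROM tmp_1
    WHERE ID IN (
        SELECT ID
        FROM tmp_3
        GROUP BY ID
        HAVING COUNT(DISTINCT SeedID) >= 2
    )
    ORDER BY ID
    """
).fetchall()

for row in multi_type:
    print(row)
```

On this sample data both columns give the same two persons, but counting SomeIrrelevantCol would also match a person who has two rows of the same SeedID with different text.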
Specifically I'm using SQL Server Compact 4.0, if that makes a difference.
I have 3 tables (note, userTable, verse). The userTable and verse tables have no correlation except through this note table, so I can't do a single subquery joining the two tables.
INSERT INTO [note]
([verse_id]
,[user_id]
,[text]
,[date_created]
,[date_modified])
VALUES
( (SELECT Id FROM verse
WHERE volume_lds_url = 'ot'
AND book_lds_url = 'gen'
AND chapter_number = 8
AND verse_number = 16)
, (SELECT Id FROM userTable
WHERE username = 'canichols2')
,'test message'
,GETDATE()
,GETDATE());
GO
As far as I can tell, the statement should work.
The outer statement works fine if I hard-code the foreign key values, and each of the subqueries works as it should, returning only one column and one row.
Error Message: There was an error parsing the query. [ Token line number = 8,Token line offset = 14,Token in error = SELECT ]
So it doesn't like the subquery in a scalar VALUES clause, but I can't figure out how to use an INSERT INTO .... SELECT .... statement with the 2 different tables.
Table Definitions
Since @Prasanna asked for them, here are the definitions:
CREATE TABLE [userTable] (
[Id] int IDENTITY (1,1) NOT NULL
, [username] nvarchar(100) NOT NULL
, [email] nvarchar(100) NOT NULL
, [password] nvarchar(100) NULL
);
GO
ALTER TABLE [userTable] ADD CONSTRAINT [PK_user] PRIMARY KEY ([Id]);
GO
CREATE TABLE [note] (
[Id] int IDENTITY (1,1) NOT NULL
, [verse_id] int NULL
, [user_id] int NULL
, [text] nvarchar(4000) NOT NULL
, [date_created] datetime DEFAULT GETDATE() NOT NULL
, [date_modified] datetime NULL
);
GO
ALTER TABLE [note] ADD CONSTRAINT [PK_note] PRIMARY KEY ([Id]);
GO
CREATE TABLE [verse] (
[Id] int IDENTITY (1,1) NOT NULL
, [volume_id] int NULL
, [book_id] int NULL
, [chapter_id] int NULL
, [verse_id] int NULL
, [volume_title] nvarchar(100) NULL
, [book_title] nvarchar(100) NULL
, [volume_long_title] nvarchar(100) NULL
, [book_long_title] nvarchar(100) NULL
, [volume_subtitle] nvarchar(100) NULL
, [book_subtitle] nvarchar(100) NULL
, [volume_short_title] nvarchar(100) NULL
, [book_short_title] nvarchar(100) NULL
, [volume_lds_url] nvarchar(100) NULL
, [book_lds_url] nvarchar(100) NULL
, [chapter_number] int NULL
, [verse_number] int NULL
, [scripture_text] nvarchar(4000) NULL
);
GO
ALTER TABLE [verse] ADD CONSTRAINT [PK_scriptures] PRIMARY KEY ([Id]);
GO
I'm aware it's not in first normal form or anything, but that's how it was given to me, and I didn't feel like dividing it up into multiple tables.
SubQuery Results
To show the results and how there's only 1 row.
SELECT Id FROM verse WHERE volume_lds_url = 'ot'
AND book_lds_url = 'gen'
AND chapter_number = 8
AND verse_number = 16
Id
200
And the second subquery
SELECT Id FROM userTable
WHERE username = 'canichols2'
Id
1
Attention: The target system is SQL-Server-Compact-CE-4
This smaller brother seems to support neither sub-selects as scalar values nor declared variables. Find details in the comments...
Approach 1
As long as you can be sure that the sub-select returns exactly one scalar value, it should be easy to transform your VALUES into a SELECT. Try this:
INSERT INTO [note]
([verse_id]
,[user_id]
,[text]
,[date_created]
,[date_modified])
SELECT
(SELECT Id FROM verse
WHERE volume_lds_url = 'ot'
AND book_lds_url = 'gen'
AND chapter_number = 8
AND verse_number = 16)
, (SELECT Id FROM userTable
WHERE username = 'canichols2')
,'test message'
,GETDATE()
,GETDATE();
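A different technique that avoids scalar sub-selects altogether is to pull both lookups from a cross join inside the INSERT ... SELECT. The sketch below is only an illustration: it runs on SQLite via Python's sqlite3 module, uses trimmed-down versions of the verse, userTable and note tables, and swaps GETDATE() for SQLite's datetime('now'). Whether SQL Server Compact 4.0 accepts this exact statement has not been verified here.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server Compact; reduced table
# definitions; datetime('now') used in place of GETDATE().
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE verse (
        Id INTEGER PRIMARY KEY,
        volume_lds_url TEXT,
        book_lds_url TEXT,
        chapter_number INTEGER,
        verse_number INTEGER
    );
    CREATE TABLE userTable (Id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE note (
        Id INTEGER PRIMARY KEY,
        verse_id INTEGER,
        user_id INTEGER,
        text TEXT NOT NULL,
        date_created TEXT,
        date_modified TEXT
    );
    INSERT INTO verse VALUES (200, 'ot', 'gen', 8, 16);
    INSERT INTO userTable VALUES (1, 'canichols2');
    """
)

# One row from verse and one row from userTable are paired by the cross
# join; the WHERE clause narrows each side to the wanted record.
conn.execute(
    """
    INSERT INTO note (verse_id, user_id, text, date_created, date_modified)
    SELECT v.Id, u.Id, 'test message', datetime('now'), datetime('now')
    FROM verse AS v
    CROSS JOIN userTable AS u
    WHERE v.volume_lds_url = 'ot'
      AND v.book_lds_url = 'gen'
      AND v.chapter_number = 8
      AND v.verse_number = 16
      AND u.username = 'canichols2'
    """
)

inserted = conn.execute("SELECT verse_id, user_id, text FROM note").fetchall()
print(inserted)
```

With this form, nothing is inserted if either lookup matches no row, and several rows are inserted if a lookup matches more than one, so it relies on both filters being unique.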
Approach 2
No experience with Compact editions of SQL-Server, but you might try this:
DECLARE @id1 INT=(SELECT Id FROM verse
WHERE volume_lds_url = 'ot'
AND book_lds_url = 'gen'
AND chapter_number = 8
AND verse_number = 16);
DECLARE @id2 INT=(SELECT Id FROM userTable
WHERE username = 'canichols2');
INSERT INTO [note]
([verse_id]
,[user_id]
,[text]
,[date_created]
,[date_modified])
SELECT @id1
,@id2
,'test message'
,GETDATE()
,GETDATE();
SELECT
dbo.PaymentMaster.Code,
dbo.PaymentMaster.OrderNo,
dbo.PaymentMaster.CustIssueNo,
dbo.PaymentMaster.Amount,
dbo.CustomerMaster.CustName,
dbo.OrderRequestMaster.ORequestQty,
InvoiceMaster.SGST,
InvoiceMaster.CGST,
InvoiceMaster.SGST,
InvoiceMaster.SubTotal,
InvoiceMaster.TotalAmount
FROM
dbo.CustomerIssueMaster,
dbo.PaymentMaster,
dbo.CustomerMaster,
dbo.OrderMaster,
dbo.OrderRequestMaster,
InvoiceMaster
WHERE (dbo.PaymentMaster.CustIssueNo = dbo.CustomerIssueMaster.CustIssueNo
OR OrderMaster.OrderNo = PaymentMaster.OrderNo)
AND OrderRequestMaster.ORequestNo = OrderMaster.ORequestNo
I want to create an invoice where all conditions match, but I don't get any rows even though there is one row that should be returned.
The data below comes from PaymentMaster. I want the customer info, order request info, order info and payment info in InvoiceMaster, together with the InvoiceMaster fields, i.e. Code, InvoiceNo, CustNo, InvoiceDate, SubTotal, CGST, SGST, TotalAmount.
Code  PaymentNo      OrderNo        CustIssueNo     PaymentMode  PaymentStatus  Amount  PaymentDate  InvoiceNo  InvoiceDate
4     KCS[P][00004]  NULL           KCS[CI][00001]  Cash         Paid           500     14:28.3      NULL       NULL
5     KCS[P][00005]  KCS[O][00001]  NULL            Cash         Paid           2000    22:33.9      NULL       NULL
6     KCS[P][00006]  KCS[O][00002]  NULL            Cash         Paid           2000    40:38.1      NULL       NULL
I got this results
Without any data we are going to be very hard pressed to help. You say one row should meet criteria, but it is very hard to see how only one record would be returned unless your tables have only one record!
I note from your query that you want CustName, without including CustomerMaster in your WHERE clause - this will lead to trouble!
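To illustrate why a table listed in FROM without a join condition causes trouble, here is a small sketch with made-up data. It assumes SQLite via Python's sqlite3 module, cut-down table definitions, and hypothetical row counts (3 payments, 2 customers, 0 invoices); the question does not say how many rows the real tables hold.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; reduced tables; the customer
# names and row counts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PaymentMaster  (Code INTEGER, Amount REAL);
    CREATE TABLE CustomerMaster (CustName TEXT);
    CREATE TABLE InvoiceMaster  (SubTotal REAL);

    INSERT INTO PaymentMaster  VALUES (4, 500), (5, 2000), (6, 2000);
    INSERT INTO CustomerMaster VALUES ('Customer A'), ('Customer B');
    """
)

# No join condition: every payment is paired with every customer.
pairs = conn.execute(
    "SELECT COUNT(*) FROM PaymentMaster, CustomerMaster"
).fetchone()[0]

# Adding an empty table to the comma list removes every row.
with_empty = conn.execute(
    "SELECT COUNT(*) FROM PaymentMaster, CustomerMaster, InvoiceMaster"
).fetchone()[0]

print(pairs, with_empty)
```

So an unjoined table either multiplies the result or, if it happens to be empty, wipes it out entirely. The second case is one possible explanation for getting no rows at all, assuming one of the listed tables is still empty.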
Please use the following code in a query window and add any missing fields that are important. Then create some INSERT lines to populate. Re-post this so that we can help you with a query that does what you want:
declare @CustomerIssueMaster table
(
CustIssueNo int
)
declare @PaymentMaster table
(
Code int,
PaymentNo varchar(50),
OrderNo varchar(50),
CustIssueNo varchar(50),
PaymentMode varchar(50),
PaymentStatus varchar(50),
Amount decimal (18,2),
PaymentDate datetime,
InvoiceNo int,
InvoiceDate datetime
)
declare @CustomerMaster table
(
CustName varchar(50)
)
declare @OrderRequestMaster table
(
ORequestQty int,
ORequestNo int
)
declare @InvoiceMaster table
(
SGST decimal(18,2),
CGST decimal(18,2),
SubTotal decimal(18,2),
TotalAmount decimal(18,2)
)
declare @OrderMaster table
(
OrderNo int
)
INSERT INTO @PaymentMaster VALUES(4, 'KCS[P][00004]', NULL, 'KCS[CI][00001]','Cash','Paid',500,'2014-03-28', NULL, NULL)
INSERT INTO @PaymentMaster VALUES(5, 'KCS[P][00005]', 'KCS[O][00001]', NULL, 'Cash','Paid',2000,'2014-03-28', NULL, NULL)
INSERT INTO @PaymentMaster VALUES(6, 'KCS[P][00006]', 'KCS[O][00002]', NULL, 'Cash','Paid',2000,'2014-03-28', NULL, NULL)
SELECT * FROM @PaymentMaster
EDIT
Please note I have amended the structure of @PaymentMaster and given three INSERTs to mimic the data you show from SELECT * FROM PaymentMaster, with one exception: I do not understand what format your PaymentDate is supposed to be. The first two looked like times, but the third cannot be. I just stuck in a dummy date instead.
Now will you please do the same for the other tables? I have shown you the format you need to use.
I have a table with about 2.2 million rows, and for each row I want the topmost parent (root) on the same row, if the row has one. Below you can see my query for only one row.
This statement takes about 45 seconds, so running it for every row will take a long time. Not all rows have a parent key; about 1 million have no parent, which could be something to think about. Hopefully some of you have a better solution to this problem and can share it.
WITH allRows
AS (SELECT Organisasjonsnummer AS ID,
Navn,
Organisasjonsnummer [RootId],
Navn [RootName]
FROM Virksomhetstjeneste.Virksomhet
WHERE Hovedenhet_id IS NULL
UNION ALL
SELECT a1.Organisasjonsnummer AS ID,
a1.Navn,
a2.[RootId],
a2.[RootName]
FROM Virksomhetstjeneste.Virksomhet a1
JOIN allRows a2
ON a2.ID = a1.Hovedenhet_id)
SELECT *
FROM allRows
Where ID = 980659763
Result
ID Navn RootId RootName
980659763 NILLE AS AVD ALTA 953581477 NILLE AS
I'm on record many times as being a fan of hierarchyid. Here's how I'd go about doing it for your situation.
First things first: forgive me for not using your table and column names; I don't speak Norwegian and going back and forth between English and that was error prone for me. So here's the setup:
USE [tempdb];
IF OBJECT_ID('dbo.myTable') IS NOT NULL
DROP TABLE [dbo].[myTable];
CREATE TABLE [dbo].[myTable]
(
[ID] INT NOT NULL ,
CONSTRAINT [PK_myTable] PRIMARY KEY ( [ID] ) ,
[ParentID] INT NULL ,
[Name] VARCHAR(50) NOT NULL ,
[Path] HIERARCHYID NULL,
[Root] AS [Path].GetAncestor([Path].GetLevel() - 1) PERSISTED
);
INSERT INTO [dbo].[myTable]
( [ID], [ParentID], [Name] )
VALUES ( 1, NULL, '1' ),
( 2, 1, '2' ),
( 3, 1, '3' ),
( 4, 2, '4' );
WITH [allRows]
AS (
SELECT [ID] ,
[ParentID] ,
CAST(CONCAT('/', [ID], '/') AS VARCHAR(MAX)) AS [Path]
FROM [dbo].[myTable]
WHERE [ParentID] IS NULL
UNION ALL
SELECT [child].[ID] ,
[child].[ParentID] ,
CAST(CONCAT([parent].[Path], [child].[ID], '/') AS VARCHAR(MAX)) AS [Path]
FROM [dbo].[myTable] AS [child]
JOIN [allRows] AS [parent]
ON [parent].[ID] = [child].[ParentID]
)
UPDATE [m]
SET [m].[Path] = [a].[Path]
FROM [dbo].[myTable] AS [m]
JOIN [allRows] AS [a]
ON [a].[ID] = [m].[ID];
This is just your standard recursive CTE to walk a parent/child hierarchy. The trick here though is that I'm calculating something that I can use as a hierarchyid as I go. Once the hierarchy is walked, I update the base table with the calculated hierarchy.
Since you mentioned that your table is large-ish, you may want to batch those updates. I'll leave that as an exercise for the reader. Also keep in mind that this is a one-time operation (though you will have to keep the [Path] column up to date for inserts/updates/deletes; I also leave this as an exercise for the reader).
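The path-building step can be tried on its own without hierarchyid. The sketch below is a minimal illustration, assuming SQLite via Python's sqlite3 module (SQLite has no hierarchyid type, so the path stays a plain string) and the same four sample rows; the root is then read off as the first path segment.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; the path is kept as text
# because SQLite has no hierarchyid type.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE myTable (ID INTEGER PRIMARY KEY, ParentID INTEGER, Name TEXT);
    INSERT INTO myTable VALUES (1, NULL, '1'), (2, 1, '2'), (3, 1, '3'), (4, 2, '4');
    """
)

rows = conn.execute(
    """
    WITH RECURSIVE allRows (ID, ParentID, Path) AS (
        -- Anchor: rows without a parent start a path of their own.
        SELECT ID, ParentID, '/' || ID || '/'
        FROM myTable
        WHERE ParentID IS NULL
        UNION ALL
        -- Recursive step: append the child's ID to the parent's path.
        SELECT child.ID, child.ParentID, parent.Path || child.ID || '/'
        FROM myTable AS child
        JOIN allRows AS parent ON parent.ID = child.ParentID
    )
    SELECT ID, Path FROM allRows ORDER BY ID
    """
).fetchall()

paths = dict(rows)

# The first segment of each path is the ID of the topmost ancestor.
roots = {node_id: int(path.split("/")[1]) for node_id, path in paths.items()}

print(paths)
print(roots)
```

Each path string has the same '/1/2/4/' shape that the answer's CTE produces before it is stored in the hierarchyid column.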
Now that you've got the hierarchy stored in the row, you can do magic:
SELECT [child].[ID] ,
[child].[Name] ,
[root].[ID] ,
[root].[Name]
FROM [dbo].[myTable] AS [child]
JOIN [dbo].[myTable] AS [root]
ON [root].[Path] = [child].[Root]
WHERE child.[ID] = 4;
Which is to say that I can now get the top-level ancestor for a given ID with just a join. Having the root be a persisted computed column is a) fancy and b) unnecessary; it just made the last select a lot cleaner.
If you don't want to do that, you can drop the [Root] column completely and then the join predicate becomes [root].[Path] = [child].[Path].GetAncestor([child].[Path].GetLevel() - 1).
Lastly, keep in mind that the hierarchyid data type is natively indexable. So you can index [Path], [Root], or both and it will likely improve performance.