I have two tables in a SQL database. They both contain 3 columns that match, plus additional columns that hold different information in each table. I want to write a query that interleaves them according to date / timestamp. Table A is for a machine that runs and takes a sample every 10 minutes. Table B is the log file that has entries logged when the operator makes adjustments, turns the machine on / off, etc.
I have used the following query but it is giving me duplicates on table A.
I added the where(BatchTable.Batch = 'HB20419' and EventLogTable.Batch = 'HB20419') clause just to cut down on the amount of data being returned until I get the query figured out. One complication is that each table has its own date / time columns, which are named differently and are completely independent of each other.
SELECT BatchTable.Asset_Number,BatchTable.Recipe,BatchTable.Batch,BatchTable.Group_No, BatchTable.Sample_No, BatchTable.SampleDate, BatchTable.SampleTime, BatchTable.Weight, EventLogTable.EvtTime, EventLogTable.EvtValueBefore, EventLogTable.EvtValueAfter, EventLogTable.EvtComment
FROM BatchTable,EventLogTable
where(BatchTable.Batch = 'HB20419' and EventLogTable.Batch = 'HB20419')
order by Asset_Number, Recipe, Batch, Group_No, Sample_No ASC
Here is how that query would look using aliases, formatting and ANSI-92 style joins.
SELECT bt.Asset_Number
, bt.Recipe
, bt.Batch
, bt.Group_No
, bt.Sample_No
, bt.SampleDate
, bt.SampleTime
, bt.Weight
, elt.EvtTime
, elt.EvtValueBefore
, elt.EvtValueAfter
, elt.EvtComment
FROM BatchTable bt
join EventLogTable elt on elt.Batch = bt.Batch
WHERE bt.Batch = 'HB20419'
ORDER BY Asset_Number
, Recipe
, Batch
, Group_No
, Sample_No ASC
I had to make up some sample data, but it sounds like you want to union the two tables together to "interleave" them. You can do this by aliasing the column names to match and selecting NULL for the columns that exist only in the other table. I acknowledge that I'm guessing at your desired outcome to some extent.
Make some sample data:
DECLARE @batch TABLE (SampleDate VARCHAR(MAX), SampleTime VARCHAR(MAX), Recipe VARCHAR(MAX))
DECLARE @event TABLE (EvtTime DATETIME, EvtComment VARCHAR(MAX))
INSERT INTO @batch (SampleDate, SampleTime, Recipe) VALUES ('2018-08-09', '11:56:25 AM', 'Peanut Butter'), ('2018-08-09', '12:11:25 PM', 'Chocolate')
INSERT INTO @event (EvtTime, EvtComment) VALUES ('2018-08-09 11:58:22 AM', 'Turned up speed'), ('2018-08-09 11:59:22 AM', 'Turned down temperature')
Then select and union to interleave:
SELECT CAST(SampleDate + ' ' + SampleTime AS DATETIME) AS [Date], Recipe, NULL AS EvtComment FROM @batch
UNION
SELECT EvtTime AS [Date], NULL AS Recipe, EvtComment FROM @event
ORDER BY [Date]
Which yields:
Date                    Recipe        EvtComment
----------------------- ------------- -----------------------
2018-08-09 11:56:25.000 Peanut Butter NULL
2018-08-09 11:58:22.000 NULL          Turned up speed
2018-08-09 11:59:22.000 NULL          Turned down temperature
2018-08-09 12:11:25.000 Chocolate     NULL
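As a rough sketch of how the same idea might look closer to the tables in the question: the version below keeps the Batch filter on both halves, uses UNION ALL so that identical rows are not silently collapsed, and adds a RowSource column so you can tell samples from events. The table variables, data types, sample values and the RowSource / EventTime names are assumptions made for illustration; only the column names Batch, Sample_No, SampleDate, SampleTime, Weight, EvtTime and EvtComment come from the question.

```sql
-- Hypothetical stand-ins for BatchTable and EventLogTable; data types are assumed.
DECLARE @BatchTable TABLE (
    Batch VARCHAR(20), Sample_No INT,
    SampleDate VARCHAR(10), SampleTime VARCHAR(11), Weight DECIMAL(9, 2)
);
DECLARE @EventLogTable TABLE (
    Batch VARCHAR(20), EvtTime DATETIME, EvtComment VARCHAR(100)
);

INSERT INTO @BatchTable (Batch, Sample_No, SampleDate, SampleTime, Weight) VALUES
    ('HB20419', 1, '2018-08-09', '11:56:25 AM', 10.20),
    ('HB20419', 2, '2018-08-09', '12:06:25 PM', 10.40);
INSERT INTO @EventLogTable (Batch, EvtTime, EvtComment) VALUES
    ('HB20419', '2018-08-09 11:58:22 AM', 'Turned up speed');

-- One row per sample and one row per event, interleaved by time.
SELECT CAST(SampleDate + ' ' + SampleTime AS DATETIME) AS EventTime,
       'Sample' AS RowSource,
       Batch,
       Sample_No,
       Weight,
       CAST(NULL AS VARCHAR(100)) AS EvtComment   -- column exists only in the event log
FROM @BatchTable
WHERE Batch = 'HB20419'
UNION ALL
SELECT EvtTime, 'Event', Batch, NULL, NULL, EvtComment
FROM @EventLogTable
WHERE Batch = 'HB20419'
ORDER BY EventTime, RowSource;
```

Because each table contributes its own rows instead of being joined, a sample is never repeated once per event, which is where the duplicates in the original query came from.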
USE dev_db
GO
CREATE TABLE T1_VALS
(
[SITE_ID] [int] NULL,
[LATITUDE] [numeric](10, 6) NULL,
[UNIQUE_ID] [int] NULL,
[COLLECT_RANK] [int] NULL,
[CREATED_RANK] [int] NULL,
[UNIQUE_ID_RANK] [int] NULL,
[UPDATE_FLAG] [int] NULL
)
GO
INSERT INTO T1_VALS
(SITE_ID,LATITUDE,UNIQUE_ID,COLLECT_RANK,CREATED_RANK,UNIQUE_ID_RANK)
VALUES
(207442,40.900470,59664,1,1,1),
(207442,40.900280,61320,1,1,2),
(204314,40.245220,48685,1,2,2),
(204314,40.245910,59977,1,1,1),
(202416,39.449530,9295,1,1,2),
(202416,39.449680,62264,1,1,1)
I generated the COLLECT_RANK and CREATED_RANK columns from two date columns (not shown here) and the UNIQUE_ID_RANK column from the UNIQUE_ID column shown here.
I used a ranking function with an OVER clause to generate these columns. A _RANK value of 1 means the latest date or greatest UNIQUE_ID value. I thought my solution would be pretty straightforward using these rank values via array and cursor processing, but I seem to have painted myself into a corner.
My problem: I need to choose LONGITUDE value and its UNIQUE_ID based upon the following business rules and set the update value, (1), for that record in its UPDATE_FLAG column.
Select the record w/most recent Collection Date (i.e. RANK value = 1) for a given SITE_ID. If multiple records exist w/same Collection Date (i.e. same RANK value), select the record w/most recent Created Date (RANK value =1) for a given SITE_ID. If multiple records exist w/same Created Date, select the record w/highest Unique ID for a given SITE_ID (i.e. RANK value = 1).
Your suggestions would be most appreciated.
I think you can use top and order by:
select top 1 t1.*
from t1_vals t1
order by collect_rank asc, created_rank asc, unique_id desc;
If you want this for sites, which might be what your question is asking, then use row_number():
select t1.*
from (select v.*,
row_number() over (partition by site_id order by collect_rank asc, created_rank asc, unique_id desc) as seqnum
from t1_vals v
) t1
where seqnum = 1;
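Since the question also asks to set UPDATE_FLAG on the chosen record, here is a minimal sketch of one way to do that with the same row_number() ordering, by updating through a common table expression. It assumes the T1_VALS table from the question already exists and is populated; the CTE name ranked is made up for the example.

```sql
-- Assumes T1_VALS from the question exists; "ranked" is a hypothetical CTE name.
WITH ranked AS (
    SELECT UPDATE_FLAG,
           ROW_NUMBER() OVER (
               PARTITION BY SITE_ID
               ORDER BY COLLECT_RANK ASC, CREATED_RANK ASC, UNIQUE_ID DESC
           ) AS seqnum
    FROM T1_VALS
)
-- Flag only the first-ranked row of each site; other rows are left untouched.
UPDATE ranked
SET UPDATE_FLAG = 1
WHERE seqnum = 1;
```

If flags from an earlier run might still be set, you would probably want to reset UPDATE_FLAG to NULL or 0 for the table before running this.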
I have a few tables and I'm building Telerik reports on top of them. The structure and sample data are given below:
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Leave'))
BEGIN;
DROP TABLE [Leave];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Addition'))
BEGIN;
DROP TABLE [Addition];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('Deduction'))
BEGIN;
DROP TABLE [Deduction];
END;
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE object_id = OBJECT_ID('EmployeeInfo'))
BEGIN;
DROP TABLE [EmployeeInfo];
END;
GO
CREATE TABLE [EmployeeInfo] (
[EmpID] INT NOT NULL PRIMARY KEY,
[EmployeeName] VARCHAR(255)
);
CREATE TABLE [Addition] (
[AdditionID] INT NOT NULL PRIMARY KEY,
[AdditionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Deduction] (
[DeductionID] INT NOT NULL PRIMARY KEY,
[DeductionType] VARCHAR(255),
[Amount] VARCHAR(255),
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
CREATE TABLE [Leave] (
[LeaveID] INT NOT NULL PRIMARY KEY,
[LeaveType] VARCHAR(255) NULL,
[DateFrom] VARCHAR(255),
[DateTo] VARCHAR(255),
[Approved] Binary,
[EmpID] INT FOREIGN KEY REFERENCES EmployeeInfo(EmpID)
);
GO
INSERT INTO EmployeeInfo([EmpID], [EmployeeName]) VALUES
(1, 'Marcia'),
(2, 'Lacey'),
(3, 'Fay'),
(4, 'Mohammad'),
(5, 'Mike')
INSERT INTO Addition([AdditionID], [AdditionType], [Amount], [EmpID]) VALUES
(1, 'Bonus', '2000', 2),
(2, 'Increment', '5000', 5)
INSERT INTO Deduction([DeductionID], [DeductionType], [Amount], [EmpID]) VALUES
(1, 'Late Deductions', '2000', 4),
(2, 'Delayed Project Completion', '5000', 1)
INSERT INTO Leave([LeaveID],[LeaveType],[DateFrom],[DateTo], [Approved], [EmpID]) VALUES
(1, 'Annual Leave','2018-01-08 04:52:03','2018-01-10 20:30:53', 1, 1),
(2, 'Sick Leave','2018-02-10 03:34:41','2018-02-14 04:52:14', 1, 2),
(3, 'Casual Leave','2018-01-04 11:06:18','2018-01-05 04:11:00', 1, 3),
(4, 'Annual Leave','2018-01-17 17:09:34','2018-01-21 14:30:44', 1, 4),
(5, 'Casual Leave','2018-01-09 23:31:16','2018-01-12 15:11:17', 1, 3),
(6, 'Annual Leave','2018-02-16 18:01:03','2018-02-19 17:16:04', 1, 2)
The query I am using to get the output is something like this:
SELECT Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
SUM(DATEDIFF(Day, Leave.DateFrom, Leave.DateTo)) [#OfLeaves],
DatePart(MONTH, Leave.DateFrom)
FROM EmployeeInfo Info
LEFT JOIN Leave
ON Info.EmpID = Leave.EmpID
LEFT JOIN Addition
ON Info.EmpID = Addition.EmpID
LEFT JOIN Deduction
ON Info.EmpID = Deduction.EmpID
WHERE Approved = 1
GROUP BY Info.EmployeeName, Addition.AdditionType, Addition.Amount, Deduction.DeductionType, Deduction.Amount,
Leave.LeaveType,
DatePart(MONTH, Leave.DateFrom)
I want output that I can show on the report, but because I'm using joins the data repeats on multiple rows for the same user, so it also appears multiple times on the report.
The output I am getting is something like this
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey Bonus 2000 NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
Although what I want it looks something like this:
Fay NULL NULL NULL NULL Casual Leave 4 1
Lacey Bonus 2000 NULL NULL Annual Leave 3 2
Lacey NULL NULL NULL NULL Sick Leave 4 2
Marcia NULL NULL Delayed Project Completion 5000 Annual Leave 2 1
Mohammad NULL NULL Late Deductions 2000 Annual Leave 4 1
As there was only one bonus and it was not allocated multiple times, it should appear only once. I am stuck on formatting the table layout in the report, so I am hoping for a hint on shaping the output in the query so that I won't have to do it there.
My own recommendation on this case is to change the left joins to a single table in the following way:
select
info.employeename, additiontype, additionamount, deductiontype, deductionamount, leavetype, [#OfLeaves], leavemth
from Employeeinfo info
join
(
Select
Leave.empid, null as additiontype, null as additionamount, null as deductiontype, null as deductionamount, leave.leavetype, DATEDIFF(Day, Leave.DateFrom, Leave.DateTo) [#OfLeaves], DatePart(MONTH, DateFrom) leavemth
from leave
where approved = 1
Union all
Select
Addition.empid, additiontype, amount, null, null, null, null, null
From addition
Union all
Select empid, null, null, deductiontype, amount, null, null, null
From deduction
) payadj on payadj.empid= info.empid
This approach separates the different pay adjustments into their own columns and also ensures that you don't get the doubled-up rows that appear when the joins match the same employee ID multiple times.
You might need to explicitly name all the null columns for each Union - I haven't tested it, but I thought you only need to name the columns in a union all once.
The output comes in the format below;
employeename bonus leavetype
Lacey 2000 null
Lacey null Sick Leave
Lacey null Annual Leave
Rather than type out the full result set here is a link to sqlfiddle;
http://sqlfiddle.com/#!18/935e9/5/0
The problem you're facing is based on how you are joining the tables together. It's not syntax that's necessarily wrong but how we look at the data and how we understand the relationships between the tables. When doing the LEFT JOINs your query is able to find EmpIDs in each table and it is happy with that and grabs the records (or returns NULL if there are no records matching the EmpID). That isn't really what you're looking for since it can join too much together. So let's see why this is happening. If we take out the join to the Addition table your results would look like this:
Fay NULL NULL Casual Leave 4 1
Lacey NULL NULL Annual Leave 3 2
Lacey NULL NULL Sick Leave 4 2
Marcia Delayed Project Completion 5000 Annual Leave 2 1
Mohammad Late Deductions 2000 Annual Leave 4 1
You are still left with two rows for Lacey. The reason for these two rows is because of the join to the Leave table. Lacey has taken two leaves of absence. One for Sick Leave and the other for Annual Leave. Both of those records share the same EmpID of 2. So when you join to the Addition table (and/or to the rest of the tables) on EmpID the join looks for all matching records to complete that join. There's a single Addition record that matches two Leave records joined on EmpID. Thus, you end up with two Bonus results--the same Addition record for the two Leave records. Try running this query and check the results, it should also illustrate the problem:
SELECT l.LeaveType, l.EmpID, a.AdditionType, a.Amount
FROM Leave l
LEFT JOIN Addition a ON a.EmpID = l.EmpID
The results using your provided data would be:
Annual Leave 1 NULL NULL
Sick Leave 2 Bonus 2000
Casual Leave 3 NULL NULL
Annual Leave 4 NULL NULL
Casual Leave 3 NULL NULL
Annual Leave 2 Bonus 2000
So the data itself isn't wrong. It's just that when joining on EmpID in this way the relationships may be confusing.
So the problem is the relationship between the Leave table and the others. It doesn't make sense to join Leave to the Addition or Deduction tables directly on EmpID because it may look as though Lacey received a bonus for each leave of absence for example. This is what you are experiencing here.
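One quick way to see where the multiplication comes from is to count the child rows per employee in each table. This is only a diagnostic sketch, and it assumes the EmployeeInfo, Leave, Addition and Deduction tables from the question exist with the sample data loaded; the LeaveRows, AdditionRows and DeductionRows aliases are made up for the example.

```sql
-- Assumes the question's tables exist; the column aliases are hypothetical.
SELECT e.EmpID,
       e.EmployeeName,
       (SELECT COUNT(*) FROM [Leave]   l WHERE l.EmpID = e.EmpID) AS LeaveRows,
       (SELECT COUNT(*) FROM Addition  a WHERE a.EmpID = e.EmpID) AS AdditionRows,
       (SELECT COUNT(*) FROM Deduction d WHERE d.EmpID = e.EmpID) AS DeductionRows
FROM EmployeeInfo e
ORDER BY e.EmpID;
```

Any employee with more than one row in one child table and at least one row in another will have that other table's values repeated when all the tables are joined on EmpID alone.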
I would suggest three separate queries (and potentially three reports). One to return the leave of absence data and the others for the Addition and Deduction data. Something like:
--Return each employee's leaves of absence
SELECT e.EmployeeName
, l.LeaveType
, SUM(DATEDIFF(Day, l.DateFrom, l.DateTo)) [#OfLeaves]
, DatePart(MONTH, l.DateFrom) [LeaveMonth]
FROM EmployeeInfo e
LEFT JOIN Leave l ON e.EmpID = l.EmpID
WHERE l.Approved = 1
GROUP BY e.EmployeeName, l.LeaveType, DatePart(MONTH, l.DateFrom)
--Return each employee's Additions
SELECT e.EmployeeName
, a.AdditionType
, a.Amount
FROM EmployeeInfo e
LEFT JOIN Addition a ON e.EmpID = a.EmpID
--Return each employee's Deductions
SELECT e.EmployeeName
, d.DeductionType
, d.Amount
FROM EmployeeInfo e
LEFT JOIN Deduction d ON e.EmpID = d.EmpID
Having three queries should better represent the relationship the EmployeeInfo table has with each of the others and separate concerns. From there you can GROUP BY the different types of data and aggregate the values and get total counts and sums.
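If a single result set with one row per employee is still wanted, one possible sketch is to aggregate each child table on its own first and only then join the totals to EmployeeInfo, so no child table can multiply another. This assumes the tables from the question exist; the TotalAdditions, TotalDeductions and LeaveDays names are invented here, and the CAST is needed only because the question stores Amount as VARCHAR.

```sql
-- Assumes the question's tables exist; the derived-table column names are hypothetical.
SELECT e.EmployeeName,
       a.TotalAdditions,
       d.TotalDeductions,
       l.LeaveDays
FROM EmployeeInfo AS e
LEFT JOIN (
    -- One row per employee: total of all additions
    SELECT EmpID, SUM(CAST(Amount AS DECIMAL(18, 2))) AS TotalAdditions
    FROM Addition
    GROUP BY EmpID
) AS a ON a.EmpID = e.EmpID
LEFT JOIN (
    -- One row per employee: total of all deductions
    SELECT EmpID, SUM(CAST(Amount AS DECIMAL(18, 2))) AS TotalDeductions
    FROM Deduction
    GROUP BY EmpID
) AS d ON d.EmpID = e.EmpID
LEFT JOIN (
    -- One row per employee: total approved leave days
    SELECT EmpID, SUM(DATEDIFF(DAY, DateFrom, DateTo)) AS LeaveDays
    FROM [Leave]
    WHERE Approved = 1
    GROUP BY EmpID
) AS l ON l.EmpID = e.EmpID
ORDER BY e.EmployeeName;
```

This loses the per-leave-type and per-month detail, so it suits a summary report; the three separate queries remain the better fit when each leave record must be listed.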
Here are some resources which may help if you hadn't found these already:
Explanation of SQL Joins: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SQL Join Examples: https://www.w3schools.com/sql/sql_join.asp
Telerik Reporting Documentation: https://docs.telerik.com/reporting/overview
This table is for demo purposes, but I have a physical table whose values I need to insert into another table. There is no primary key in this table. My question: when a single SELECT statement combines aggregate values (SUM, AVG, etc.) with non-aggregate fields, is listing all the non-aggregate fields in the GROUP BY clause the only way to get all the data, or is there some other way as well? What would be the impact of listing a large number of fields in the GROUP BY clause?
Here is the sample:
CREATE TABLE #SummaryData(
[Col_Name] varchar(20) not NULL,
[Col_Date] datetime NULL,
[ColC] [decimal](18, 4) NULL,
[ColD] [decimal](18, 4) NULL,
[ColE] [decimal](18, 4) NULL
)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('BOA' ,'03/10/2017', 2.4507 ,33536.0000 ,0.0073)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('BOA' , '03/11/2017' , 9.9419,47041.0000, 0.0088)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('Merrill Lynch', '03/10/2017', 2.8152, 32371.0000, 0.0042)
INSERT INTO #SummaryData ([Col_Name],[Col_Date],[ColC],[ColD],[ColE])
VALUES ('Merrill Lynch', '03/11/2017', 9.9333, 35671.0000, 0.0444)
--NOTE: Next SELECT will be used to INSERT data into another table, so I need all fields
SELECT [Col_Name],[Col_Date],[ColC],
CASE WHEN SUM([ColE]) > 0 THEN SUM([ColD])/SUM([ColE]) ELSE 0 END AS SomeVal , [ColE]
FROM #SummaryData
GROUP BY [Col_Name],[Col_Date],[ColE],[ColC]
If I do not include ColE and ColC in the GROUP BY clause I get:
Msg 8120, Level 16, State 1, Line 21
Column '#SummaryData.Col_Date' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Whenever you use an aggregate function, all non-aggregate values in your SELECT statement need to appear in your group by statement. If you want to insert aggregate values then you need to use the group by. With that said, why do you need to use the SUM function? This would only be needed if you had duplicate entries you were consolidating. The below query avoids the SUM and thus does not need a group by.
SELECT [Col_Name],[Col_Date],[ColC],
CASE WHEN [ColE] > 0 THEN [ColD]/[ColE] ELSE 0 END AS SomeVal , [ColE]
FROM #SummaryData
If you want to see all of the records, you can't use a GROUP BY at all. If you need intermediate values such as SUM(ColE) and SUM(ColD) from the whole table, you can calculate them and put them into a variable. Then you can use the variables however you want to.
DECLARE @SumE DECIMAL(18, 4);
SELECT @SumE = SUM(ColE) FROM #SummaryData
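Another option, not mentioned above, is a windowed aggregate: SUM(...) OVER (...) returns the group total on every detail row, so no GROUP BY is needed and every column stays available for the INSERT. The sketch below is self-contained and uses a hypothetical #SummaryDemo temp table with a subset of the question's columns; partitioning by Col_Name is an assumption about the level at which the totals are wanted.

```sql
-- Hypothetical demo table mirroring part of #SummaryData from the question.
CREATE TABLE #SummaryDemo (
    Col_Name VARCHAR(20) NOT NULL,
    Col_Date DATETIME NULL,
    ColD DECIMAL(18, 4) NULL,
    ColE DECIMAL(18, 4) NULL
);

INSERT INTO #SummaryDemo (Col_Name, Col_Date, ColD, ColE) VALUES
    ('BOA',           '20170310', 33536.0000, 0.0073),
    ('BOA',           '20170311', 47041.0000, 0.0088),
    ('Merrill Lynch', '20170310', 32371.0000, 0.0042),
    ('Merrill Lynch', '20170311', 35671.0000, 0.0444);

-- Every detail row is returned; the sums are computed per Col_Name (assumed grouping).
SELECT Col_Name,
       Col_Date,
       ColD,
       ColE,
       CASE WHEN SUM(ColE) OVER (PARTITION BY Col_Name) > 0
            THEN SUM(ColD) OVER (PARTITION BY Col_Name)
                 / SUM(ColE) OVER (PARTITION BY Col_Name)
            ELSE 0
       END AS SomeVal
FROM #SummaryDemo;

DROP TABLE #SummaryDemo;
```

Using an empty OVER () instead of PARTITION BY Col_Name would give totals over the whole table, which matches the variable-based approach.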
That's correct: the GROUP BY clause must list every non-aggregated column in the SELECT list.
But why?
Simple Demo:-
create table emp (empid int , departmentName varchar(15))
go
insert into emp values (1 , 'HR')
insert into emp values (2 , 'HR')
insert into emp values (3 , 'HR')
insert into emp values (4 , 'Sales')
insert into emp values (5 , 'Sales')
insert into emp values (7 , 'Developemnet')
insert into emp values (8 , 'Developemnet')
insert into emp values (9 , 'Developemnet')
insert into emp values (10 , 'Developemnet')
insert into emp values (11 , 'Developemnet')
The Desired Result is:-
countEmpID departmentName
5 Developemnet
3 HR
2 Sales
To achieve that, you must select count(empid) and departmentName, then group by the non-aggregated column (departmentName), because that is what forms the groups, as in the following code:
select count (empid) countEmpID, departmentName
from emp
group by departmentName
If you don't put the non-aggregated columns in the GROUP BY clause, the following error is raised:
Msg 8120, Level 16, State 1, Line 15 Column 'emp.departmentName' is
invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
Hope it helps.
I have a table of DetailRecords containing records that seem to be "duplicates" of other records, but they have a unique primary key [ID]. I would like to delete these "duplicates" from the DetailRecords table and keep the record with the longest/highest Duration. I can tell that they are linked records because their DateTime field is within 3 seconds of another row's DateTime field and the Duration is within 2 seconds of one another. Other data in the row will also be duplicated exactly, such as Number, Rate, or AccountID, but this could be the same for the data that is not "duplicate" or related.
CREATE TABLE #DetailRecords (
[AccountID] INT NOT NULL,
[ID] VARCHAR(100) NULL,
[DateTime] VARCHAR(100) NULL,
[Duration] INT NULL,
[Number] VARCHAR(200) NULL,
[Rate] DECIMAL(8,6) NULL
);
I know that I will most likely have to perform a self join on the table, but how can I find two rows that are similar within a DateTime range of plus or minus 3 seconds, instead of just exactly the same?
I am having the same trouble with the Duration within a range of plus or minus 2 seconds.
The key is taking the absolute value of the difference between the dates and between the durations. I don't know SQL Server, but here's how I'd do it in SQLite. The technique should be the same; only the specific function names will be different.
SELECT a.id, b.id
FROM DetailRecords a
JOIN DetailRecords b
ON a.id > b.id
WHERE abs(strftime('%s', a.DateTime) - strftime('%s', b.DateTime)) <= 3
AND abs(a.duration - b.duration) <= 2
Taking the absolute value of the difference covers the "plus or minus" part of the range. The self join is on a.id > b.id because a.id <> b.id would return every pair twice (once in each order), and allowing a.id = b.id would match every row with itself.
Given the entries...
ID|DateTime |Duration
1 |2014-01-26T12:00:00|5
2 |2014-01-26T12:00:01|6
3 |2014-01-26T12:00:06|6
4 |2014-01-26T12:00:03|11
5 |2014-01-26T12:00:02|10
6 |2014-01-26T12:00:01|6
I get the pairs...
5|4
2|1
6|1
6|2
And you should really store those dates as DateTime types if you can.
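For SQL Server, the same technique might be sketched with ABS and DATEDIFF as below. This is an untested translation, not the original answer's code: the #DetailDemo temp table, the single AccountID value, the extra AccountID match and the tie-break on ID are all assumptions added for illustration. The sample rows are the ones listed above.

```sql
-- Hypothetical demo table with a subset of the #DetailRecords columns.
CREATE TABLE #DetailDemo (
    AccountID INT NOT NULL,
    ID VARCHAR(100) NULL,
    [DateTime] VARCHAR(100) NULL,
    Duration INT NULL
);

INSERT INTO #DetailDemo (AccountID, ID, [DateTime], Duration) VALUES
    (1, '1', '2014-01-26T12:00:00', 5),
    (1, '2', '2014-01-26T12:00:01', 6),
    (1, '3', '2014-01-26T12:00:06', 6),
    (1, '4', '2014-01-26T12:00:03', 11),
    (1, '5', '2014-01-26T12:00:02', 10),
    (1, '6', '2014-01-26T12:00:01', 6);

-- Step 1: list candidate pairs (within 3 seconds and within 2 units of Duration).
SELECT a.ID AS IdA, b.ID AS IdB
FROM #DetailDemo AS a
JOIN #DetailDemo AS b
  ON a.AccountID = b.AccountID
 AND a.ID > b.ID
WHERE ABS(DATEDIFF(SECOND, a.[DateTime], b.[DateTime])) <= 3
  AND ABS(a.Duration - b.Duration) <= 2;

-- Step 2: delete a row when a near-duplicate with a longer Duration exists.
-- On equal Duration the row with the lower ID is kept (an arbitrary tie-break).
DELETE a
FROM #DetailDemo AS a
WHERE EXISTS (
    SELECT 1
    FROM #DetailDemo AS b
    WHERE b.AccountID = a.AccountID
      AND b.ID <> a.ID
      AND ABS(DATEDIFF(SECOND, b.[DateTime], a.[DateTime])) <= 3
      AND ABS(b.Duration - a.Duration) <= 2
      AND (b.Duration > a.Duration
           OR (b.Duration = a.Duration AND b.ID < a.ID))
);

SELECT ID, [DateTime], Duration FROM #DetailDemo ORDER BY ID;

DROP TABLE #DetailDemo;
```

Note that "within 3 seconds" is not transitive, so long chains of near-duplicates may need a different grouping rule; it is worth reviewing the pairs from step 1 before running the delete against real data.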
You could use a self-referential CTE and compare the DateTime fields.
;WITH CTE AS (
SELECT AccountID,
ID,
[DateTime],
Number,
Rate,
-- partition by the columns that are duplicated exactly; add any other matching keys here
rn = ROW_NUMBER() OVER (PARTITION BY AccountID, Number, Rate ORDER BY [DateTime])
FROM DetailRecords
)
SELECT earliestAccountID = c1.AccountID,
earliestDateTime = c1.[DateTime],
recentDateTime = c2.[DateTime],
recentAccountID = c2.AccountID
FROM CTE c1
INNER JOIN CTE c2
ON c2.AccountID = c1.AccountID AND c2.Number = c1.Number AND c2.Rate = c1.Rate
AND c1.rn = 1 AND c2.rn = 2 AND c1.[DateTime] <> c2.[DateTime]
Edit
I made several assumptions about the data set, so this may not be as relevant as you need. If you're simply looking for difference between possible duplicates, specifically DateTime differences, this will work. However, this does not constrain to your date range, nor does it automatically assume what the DateTime column is used for or how it is set.