SQL Server select distinct some columns but show all - sql-server

I have a SQL Server database that updates weekly. There is often repeated information, but the datetime column is always different. How do I display all columns, including the timestamp, but show only rows that are unique when the timestamp is ignored?
I need to keep the information for a month, so I cannot delete the repeated rows. Showing the row with the most recent timestamp would be ideal, but it is not essential.
Thanks in advance.

SELECT col_a, col_b, col_c...., max(timestamp)
FROM table
GROUP BY col_a, col_b, col_c... -- Same cols as above except timestamp
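If some columns should be shown but not used to decide what counts as a repeat, a ROW_NUMBER() variant of the same idea may help. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for SQL Server (needs SQLite 3.25 or later for window functions), and the table name weekly and the columns loaded_at and note are hypothetical.

```python
import sqlite3

# In-memory stand-in for the weekly table; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly (col_a TEXT, col_b TEXT, loaded_at TEXT, note TEXT)"
)
conn.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?, ?)",
    [
        ("x", "1", "2024-01-01", "first load"),
        ("x", "1", "2024-01-08", "second load"),
        ("y", "2", "2024-01-01", "only load"),
    ],
)

# Number the rows inside each (col_a, col_b) group, newest first,
# then keep only row 1 of every group.
LATEST_PER_GROUP = """
SELECT col_a, col_b, loaded_at, note
FROM (
    SELECT w.*,
           ROW_NUMBER() OVER (
               PARTITION BY col_a, col_b
               ORDER BY loaded_at DESC
           ) AS rn
    FROM weekly AS w
) AS ranked
WHERE rn = 1
ORDER BY col_a, col_b
"""

latest_rows = conn.execute(LATEST_PER_GROUP).fetchall()
print(latest_rows)
# [('x', '1', '2024-01-08', 'second load'), ('y', '2', '2024-01-01', 'only load')]
```

The query text itself uses only ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), which SQL Server also supports, so it should carry over with the real table and column names substituted.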

Related

Enable SYSTEM_VERSIONING Error - Overlapping Dates in History Table

I recently migrated my SQL 2019 database from a VM into Azure SQL.
I used the MS Data Migration tool, but unfortunately, it wouldn't migrate data from Temporal Tables.
So I just used the tool to create the table schemas and then used SSIS to move the data.
Since my existing history table had data in it, I wanted to keep the SysStartDate and SysEndDate fields. In order to do this, I had to disable SYSTEM_VERSIONING in my Azure SQL database as well as DROP the PERIOD on the table.
The data migration was a success so I re-created my PERIOD on the table but when I tried to enable SYSTEM_VERSIONING with a specified history table, I get the following error:
Msg 13573, Level 16, State 0, Line 34
Setting SYSTEM_VERSIONING to ON failed because history table 'xxxxxHistory' contains overlapping records.
I find this odd because the existing tables were originally joined as a temporal table so I don't understand why there would be a conflict now.
ALTER TABLE xxx.xxx
ADD PERIOD FOR SYSTEM_TIME(SysStartTime, SysEndTime)
ALTER TABLE xxx.xxx
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=xxx.xxxHistory))
I expected to get a working temporal table. Instead, I get the same Msg 13573 error shown above.
I ran the following query to identify the overlaps but I don't get any:
SELECT
xxxxKeyNumeric
,SysStartTime
,SysEndTime
FROM
xxxx.xxxxhistory o
WHERE EXISTS
(
SELECT
1
FROM
xxxx.xxxxhistory o2
WHERE
o2.xxxxKeyNumeric = o.xxxxKeyNumeric
AND o2.SysStartTime <= o.SysEndTime
AND o.SysStartTime <= o2.SysEndTime
AND o2.xxxxPK != o.xxxxPK
)
ORDER BY
o.xxxxKeyNumeric,
o.SysStartTime
I found this explanation for the error:
"There are multiple records for the same record with overlapping start and end dates. The end date for the last row in the history table should match the start date for the active record in the parent table" (from the blog of a DBA)
This happened to me after switching the historic table, touching a few rows, then trying to go back to the old historic table.
UPDATE: Happened again, and this time the table had millions of rows. I had to write a query, comparing the start date and end date of every row in the history table.
Possible causes:
For every PK, the start and end dates of the history rows must not overlap. The query below finds this specific issue.
The latest history row for a PK has an end date later than the start date of that PK's row in the main table. The query below can be modified to check for this.
Among the rows with the same PK, two rows cover the same time interval. If they overlap by even a single millisecond and someone requests that exact millisecond, SQL Server cannot know which of the two versions is the correct one.
For the first issue:
-- Pair each history row (ant) with the chronologically next row for the same PK (post)
-- and report pairs where the earlier row ends after the later one starts.
-- The row-number aliases avoid CURRENT, which is a reserved keyword in T-SQL.
select ant.*, post.*, DATEDIFF(day, ant.end_date, post.start_date) as gap_days
from
(SELECT
PK_column
, start_date
, end_date
, ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc) AS rn
,(ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc))-1 AS next_rn
FROM huge_table_HIST
) ant
inner join
(SELECT
PK_column
, start_date
, end_date
, ROW_NUMBER() OVER(PARTITION BY PK_column ORDER BY end_date desc, start_date desc) AS rn
FROM huge_table_HIST
) post
ON ant.PK_column=post.PK_column AND ant.next_rn=post.rn
WHERE ant.end_date > post.start_date
Surprisingly, it doesn't fail if:
you have multiple rows with exactly the same start date and end date for the same PK. SQL Server seems to consider them a single point in time instead of an interval. They will only appear if you request the exact millisecond in which they exist.
there are gaps between the end date of a history row and the start date of the next one. SQL Server considers that the PK just didn't exist in that time interval.
Temporal tables depend on the temporal table's primary key values combined with the SysStartTime to determine uniqueness in the history table.
This can very easily happen if you make changes to the primary key definition. Also, if your history table's fields corresponding to the temporal table's PK are not populated, or many / all are populated with a default value, overlaps are detected and you get that error.
Check that your PK is defined on the system versioned temporal table, then check that the corresponding values in your history table's primary key fields are correct (i.e. unique for any given PK & SysStartTime value.)
You may have to update the history table accordingly before applying the system versioning relationship again.
This error can also occur when there are multiple records per primary key for any given value of the GENERATED ALWAYS AS ROW START or GENERATED ALWAYS AS ROW END column.
The following queries will help identify those records.
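-- IDs with more than one history row for the same SysStartTime
select ID
from dbo.HistoryTable
group by ID, SysStartTime
having count(*) > 1;
-- IDs with more than one history row for the same SysEndTime
select ID
from dbo.HistoryTable
group by ID, SysEndTime
having count(*) > 1;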
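The overlap rule discussed in the answers above (for one key, a row must not start before an earlier row has ended, while touching rows and gaps are fine) can also be checked with a running maximum of the end time. This is only a sketch under assumptions: it runs on Python's built-in sqlite3 rather than SQL Server (SQLite 3.25 or later), and the table history with columns id, sys_start and sys_end is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER, sys_start TEXT, sys_end TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "2024-02-01"),
        (1, "2024-02-01", "2024-03-01"),  # touches the previous row: fine
        (1, "2024-02-15", "2024-04-01"),  # starts before an earlier row ends: overlap
        (2, "2024-01-01", "2024-01-10"),
        (2, "2024-01-20", "2024-02-01"),  # gap after the previous row: fine
    ],
)

# For each row, take the latest end time among all earlier rows of the same id.
# A row overlaps if that value is later than the row's own start time.
FIND_OVERLAPS = """
SELECT id, max_prev_end, sys_start
FROM (
    SELECT id, sys_start, sys_end,
           MAX(sys_end) OVER (
               PARTITION BY id
               ORDER BY sys_start, sys_end
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS max_prev_end
    FROM history
) AS ordered
WHERE max_prev_end > sys_start
ORDER BY id, sys_start
"""

overlaps = conn.execute(FIND_OVERLAPS).fetchall()
print(overlaps)
# [(1, '2024-03-01', '2024-02-15')]
```

Using a running MAX instead of only the immediately preceding row also catches a long interval that swallows several later ones. The same window frame syntax is available in SQL Server 2012 and later, so the query text should translate once the real key and period columns are substituted.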

SQL Server find column difference from different tables and fill intermediate dates

I have the purchase entry in one table #temp1 and sales history in another table #temp2 for multiple stores. There might be no sales, no purchase, or both/either of them in a day. I need to build a graph of daily stock.
Basically, I am stuck on the query part. First, I need to combine both tables to view the data together...
Secondly, I need to find the cumulative values for the stock; something like ...
Once I have that, I need to plot it. Please help!
If you start out by using a UNION, something like:
SELECT Store, Date, Purchase, 0 Sales FROM #temp1
UNION ALL
SELECT Store, Date, 0, Sales FROM #temp2
You have all the data in one table/view. From there, you can get things consolidated by
SELECT
Store, Date,
Sum(Purchase) Purchase,
Sum(Sales) Sales,
Sum(Purchase) - Sum(Sales) InStock
FROM CombinedData -- hypothetical name for the table/view holding the UNION ALL result above
GROUP BY
Store, Date
That will give you a view with the Store, Date, Purchases, Sales and In Stock in one row. If you work things via query rather than temp tables, you can easily use the final view to feed SSRS and draw your graph.
Hope that helps.
Yes, the hint by @mark worked...
select store,date,ISNULL(purchase,0) as purchase,0 as sales
into #tbl
from #temp1
union all
select store,date,0 as purchase,ISNULL(sales,0) as sales from #temp2
select store,date,sum(purchase) as PUR,sum(sales) as SAL,sum(purchase-sales) as STOCK from #tbl
group by store,date
order by store,date
drop table #tbl
And the empty dates in between are handled automatically by the SSRS reporting tool.
But I have not been able to solve the cumulative sums so far...
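For the cumulative part, one possible approach is a windowed SUM over the daily totals. The sketch below is an assumption-laden illustration rather than a tested SQL Server solution: it runs on Python's built-in sqlite3 (SQLite 3.25 or later), the tables purchases and sales stand in for #temp1 and #temp2, and the column name day is hypothetical. In SQL Server, SUM(...) OVER (... ORDER BY ...) requires version 2012 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for #temp1 (purchases) and #temp2 (sales).
conn.execute("CREATE TABLE purchases (store TEXT, day TEXT, purchase INTEGER)")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("A", "2024-01-01", 10), ("A", "2024-01-03", 5)],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "2024-01-02", 4), ("A", "2024-01-03", 6)],
)

# 1. Stack purchases and sales, 2. total them per store and day,
# 3. accumulate (purchase - sales) per store in date order.
RUNNING_STOCK = """
WITH movements AS (
    SELECT store, day, purchase, 0 AS sales FROM purchases
    UNION ALL
    SELECT store, day, 0 AS purchase, sales FROM sales
),
daily AS (
    SELECT store, day, SUM(purchase) AS purchase, SUM(sales) AS sales
    FROM movements
    GROUP BY store, day
)
SELECT store, day, purchase, sales,
       SUM(purchase - sales) OVER (
           PARTITION BY store
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS stock
FROM daily
ORDER BY store, day
"""

stock_rows = conn.execute(RUNNING_STOCK).fetchall()
for row in stock_rows:
    print(row)
# ('A', '2024-01-01', 10, 0, 10)
# ('A', '2024-01-02', 0, 4, 6)
# ('A', '2024-01-03', 5, 6, 5)
```

Days with no purchase and no sale produce no row here, so the running total simply carries over to the next day that has movement.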

Difference between duplicate check if using Distinct and Group by with aggregate

Okay it has been quite some time since I have used SQL Server very intensively for writing queries.
There has to be some gotcha that I am missing.
As per my understanding, the following two queries should return the same number of duplicate records:
SELECT COUNT(INVNO)
, INVNO
FROM INVOICE
GROUP BY INVNO
HAVING COUNT(INVNO) > 1
ORDER BY INVNO;
SELECT DISTINCT invno
FROM INVOICE
ORDER BY INVNO;
There are no null values in INVNO
Where could I possibly be going wrong?
Those queries will not return the same results. The first one will only give you INVNO values that have duplicates; the second will give all unique INVNO values, even if they appear only once in the entire table.
The GROUP BY query will filter out all the single invoices, while the DISTINCT will simply pick one from every invoice. The first query's result is a subset of the second's.
In addition to what Adam said, the GROUP BY will often return the data sorted on the grouped columns, but that order is not guaranteed without an explicit ORDER BY.
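The difference is easy to see on a tiny data set. The following is just an illustrative sketch, run on Python's built-in sqlite3 instead of SQL Server, with made-up invoice numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invno TEXT)")
# A1 appears twice, B2 once, C3 three times (sample values only).
conn.executemany(
    "INSERT INTO invoice VALUES (?)",
    [("A1",), ("A1",), ("B2",), ("C3",), ("C3",), ("C3",)],
)

# Only invoice numbers that occur more than once.
duplicated = conn.execute(
    """
    SELECT invno, COUNT(invno)
    FROM invoice
    GROUP BY invno
    HAVING COUNT(invno) > 1
    ORDER BY invno
    """
).fetchall()

# Every invoice number, each listed once.
distinct_all = conn.execute(
    "SELECT DISTINCT invno FROM invoice ORDER BY invno"
).fetchall()

print(duplicated)    # [('A1', 2), ('C3', 3)]
print(distinct_all)  # [('A1',), ('B2',), ('C3',)]
```

B2 shows up only in the DISTINCT result, which is why the two row counts differ whenever some invoice number is not duplicated.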

SQL SERVER - Retrieve Last Entered Data

I've searched for a long time for how to get the last entered data in a table, but I always get the same answer.
SELECT TOP 1 CustomerName FROM Customers
ORDER BY CustomerID DESC;
My scenario is: how do I get the last entry if the Customers table has only a CustomerName column, with no other columns such as ID or createdDate? I entered four names in the following order.
James
Arun
Suresh
Bryen
Now I want to select the last entered CustomerName, i.e., Bryen. How can I get it?
If the table was not designed with an ordering column (IDENTITY, TIMESTAMP, an identifier generated using a SEQUENCE, etc.), SQL Server does not keep the INSERT order. So the "last" record is meaningless without some criterion to order by.
One possible workaround applies if, by chance, records in this table are linked to records in some other table (FKs, 1:1 or 1:n connection) and that table has a timestamp or something similar from which you can deduce the insertion order.
More details about "ordering without criteria" can be found here and here.
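-- Caution: ORDER BY (SELECT 1000) imposes no real ordering, so the numbering is arbitrary
-- and row 4 is not guaranteed to be the row that was inserted last.
; with cte_new as (
select *,row_number() over(order by(select 1000)) as new from tablename
)
select * from cte_new where new=4 -- 4 = number of rows in the example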

Indexing on DateTime and VARCHAR fields in SQL Server 2000, which one is more efficient?

We have a CallLog table in Microsoft SQL Server 2000. The table contains a CallEndTime field of type DATETIME, and it is an indexed column.
We regularly delete free-of-charge calls and generate monthly fee statistics reports and call detail record reports. All of these SQL statements use CallEndTime as the query condition in the WHERE clause. Because the CallLog table holds a lot of records, the queries are slow, so we want to optimize them, starting with indexing.
Question
Would it be more efficient to query on an extra indexed VARCHAR column, CallEndDate?
Such as
-- DATETIME based query
SELECT COUNT(*) FROM CallLog WHERE CallEndTime BETWEEN '2011-06-01 00:00:00' AND '2011-06-30 23:59:59'
-- VARCHAR based queries
SELECT COUNT(*) FROM CallLog WHERE CallEndDate BETWEEN '2011-06-01' AND '2011-06-30'
SELECT COUNT(*) FROM CallLog WHERE CallEndDate LIKE '2011-06%'
SELECT COUNT(*) FROM CallLog WHERE CallEndMonth = '2011-06'
It has to be the datetime. Dates are essentially stored as a number in the database so it is relatively quick to see if the value is between two numbers.
If I were you, I'd consider splitting the data over multiple tables (by month, year or whatever) and creating a view to combine the data from all those tables. That way, any functionality which needs the entire data set can use the view, and anything which only needs a month's worth of data can access the specific table, which will be a lot quicker as it will contain much less data.
I think comparing DATETIME values is much faster than using the LIKE operator.
I agree with DoctorMick on splitting your DateTime into persisted columns Year, Month, Day.
For your query, which selects COUNT(*), check whether the execution plan contains a table lookup node. If so, this might be because your CallEndTime column is nullable, since you said that you have a [nonclustered] index on the CallEndTime column. If you make the column NOT NULL and rebuild that index, the count would become an index scan, which is not so slow, and I think you will get much faster results.
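One side note on the DATETIME query itself: an upper bound of '2011-06-30 23:59:59' with BETWEEN leaves out any call that ended in the last second of the month with a fractional part, so a half-open range (>= first day, < first day of next month) may be safer. The sketch below only illustrates that boundary effect. It runs on Python's built-in sqlite3, which stores these values as text rather than as a real DATETIME, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CallLog (CallEndTime TEXT)")
conn.executemany(
    "INSERT INTO CallLog VALUES (?)",
    [
        ("2011-05-31 23:59:59.000",),  # May: outside the month
        ("2011-06-01 00:00:00.000",),  # first instant of June
        ("2011-06-30 23:59:59.500",),  # last second of June, with a fraction
        ("2011-07-01 00:00:00.000",),  # July: outside the month
    ],
)

# Closed range as in the question: misses the 23:59:59.500 row.
between_count = conn.execute(
    """
    SELECT COUNT(*) FROM CallLog
    WHERE CallEndTime BETWEEN '2011-06-01 00:00:00' AND '2011-06-30 23:59:59'
    """
).fetchone()[0]

# Half-open range: everything from June 1 up to, but not including, July 1.
half_open_count = conn.execute(
    """
    SELECT COUNT(*) FROM CallLog
    WHERE CallEndTime >= '2011-06-01' AND CallEndTime < '2011-07-01'
    """
).fetchone()[0]

print(between_count, half_open_count)  # 1 2
```

The half-open form keeps the comparison on the bare CallEndTime column, so it can still use the existing index on that column.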
