SQL Server: query help on how to find oldest value?

I have a table called dbo.files with a column created_time that contains values like:
2012-05-21 13:28:56.960
This table has over 138 million rows.
I would like to find the oldest value in the table within the created_time column. This would tell me when I first started writing to the table.
What I have tried so far as a test:
select count(*)
from dbo.files
where created_time < '2010-01-01 00:00:00.000'
This comes back with 0.
How can I find the oldest value in that column?

SELECT min(created_time)
FROM dbo.Files
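On a table this size, that query will scan unless created_time is indexed; with an index on the column, SQL Server can answer MIN with a single seek on the first key. A minimal sketch, assuming you are able to add indexes (the index name is illustrative):

CREATE INDEX IX_files_created_time ON dbo.files (created_time);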

Related

String or binary data would be truncated error in SQL server. How to know the column name throwing this error

I have an INSERT query that loads data using a SELECT query with several joins between tables.
While running that query, it gives the error "String or binary data would be truncated".
I am trying to insert thousands of rows and multiple columns into that table,
so it is not possible to eyeball all the data and see which values are throwing this error.
Is there any specific way to identify which column is throwing this error, or which specific record is not getting inserted properly and causing it?
I found one article on this:
RareSQL
But that covers inserting literal values one row at a time; I am inserting multiple rows at the same time using SELECT statements.
E.g.,
INSERT INTO TABLE1 (COLUMN1, COLUMN2, ...) SELECT COLUMN1, COLUMN2, ... FROM TABLE2 JOIN TABLE3
Also, in my case, I have multiple insert and update statements and am not even sure which statement is throwing this error.
You can do a selection like this:
select TABLE2.ID, TABLE3.ID, TABLE1.COLUMN1, TABLE1.COLUMN2, ...
FROM TABLE2
JOIN TABLE3
ON TABLE2.JOINCOLUMN1 = TABLE3.JOINCOLUMN2
LEFT JOIN TABLE1
ON TABLE1.COLUMN1 = TABLE2.COLUMN1 and TABLE1.COLUMN2 = TABLE2.COLUMN2, ...
WHERE TABLE1.ID IS NULL
The first join reproduces the selection you have been using for the insert and the second join is a left join, which will yield null values for TABLE1 if a row having the exact column values you wanted to insert does not exist. You can apply this logic to your other queries, which were not given in the question.
You might just have to do it the hard way. To make it a little simpler, you can do this:
Temporarily remove the INSERT command from the query so you are getting a result set out of it. You might need to give some of the columns aliases if they don't come with one. Then wrap that SELECT query as a subquery and test likely columns (nvarchars, etc.) like this:
Select top 5 len(Col1), *
from (Select col1, col2, ... your query (without insert) here) A
Order by 1 desc
This will sort the rows with the largest values in the specified column first and just return the rows with the top 5 values - enough to see if you've got a big problem or just one or two rows with an issue. You can quickly change which column you're checking simply by changing the column name in the len(Col1) part of the first line.
If the subquery takes a long time to run, create a temp table with the same columns but with the string sizes large (like varchar(max) or something) so there are no truncation errors. Then you can do the insert just once into that table and run your tests against it instead of re-running the subquery each time, as sketched below.
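A minimal sketch of that staging approach, assuming two suspect string columns (the table and column names are placeholders carried over from the question):

-- Oversize every string column so the staging insert itself cannot truncate
CREATE TABLE #Staging (COLUMN1 varchar(max), COLUMN2 varchar(max));

INSERT INTO #Staging (COLUMN1, COLUMN2)
SELECT COLUMN1, COLUMN2
FROM TABLE2 JOIN TABLE3 ON ...;  -- your original select

-- Probe one column at a time for suspiciously long values
SELECT TOP 5 LEN(COLUMN1) AS Col1Len, *
FROM #Staging
ORDER BY 1 DESC;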
From this answer,
you can use a temp table and compare it with the target table.
For example, this:
Insert into dbo.MyTable (columns)
Select columns
from MyDataSource ;
becomes this:
Select columns
into #T
from MyDataSource;
select *
from tempdb.sys.columns as TempCols
full outer join MyDb.sys.columns as RealCols
on TempCols.name = RealCols.name
and TempCols.object_id = Object_ID(N'tempdb..#T')
and RealCols.object_id = Object_ID(N'MyDb.dbo.MyTable')
where TempCols.name is null -- no match for real target name
or RealCols.name is null -- no match for temp target name
or RealCols.system_type_id != TempCols.system_type_id
or RealCols.max_length < TempCols.max_length ;

Talend: Get most common value in a column

I have a table with a couple hundred rows. I want to know the most common value of the data in one of the columns. How do I go about that?
I recommend you do it in your SQL query with something like this:
select top 1 column, count(*) cnt
from table
group by column
order by count(*) desc
This syntax has to be adapted to your RDBMS. For instance, in Oracle it would be something like this:
select column from (
select column, count(*)
from table
group by column
order by count(*) desc
) where rownum = 1
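If several values tie for the highest count and you want all of them, SQL Server's TOP 1 WITH TIES is a small variant of the first query; a sketch:

select top 1 with ties column, count(*) cnt
from table
group by column
order by count(*) desc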
If you want to do it in Talend, you can use:
Input -- tAggregateRow -- tSortRow -- tSampleRow -- Output
In tAggregateRow you use a count function to count the frequency of values in your column, then you sort them in descending order in tSortRow, and then you get the first line with tSampleRow (just put "1").

How do I setup a daily archive job in SQL server to keep my DB small and quick?

I have a DB in SQL Server and one of the tables receives a large amount of data every day (100,000+ rows). The data is reported on, but my client only needs it for 7 days. On the odd occasion he will require access to historic data, but this can be ignored.
I need to ensure that my primary lookup table stays as small as can be (so that my queries are as quick as possible), and any data older than 7 days goes into a secondary (archiving) table, within the same database. Data feeds in consistently to my primary table throughout the day from a variety of data sources.
How would I go about performing this? I managed to put together the code below using other questions, but I am now receiving an error ("Msg 8101, Level 16, State 1, Line 12
An explicit value for the identity column in table 'dbo.Archived Data Import' can only be specified when a column list is used and IDENTITY_INSERT is ON.").
Below is my current code:
DECLARE @NextIDs TABLE(IndexID int primary key)
DECLARE @7daysago datetime
SELECT @7daysago = DATEADD(d, -7, GETDATE())
WHILE EXISTS(SELECT 1 FROM [dbo].[Data Import] WHERE [Data Import].[Receive Date] < @7daysago)
BEGIN
BEGIN TRAN
INSERT INTO @NextIDs(IndexID)
SELECT TOP 10000 IndexID FROM [dbo].[Data Import] WHERE [Data Import].[Receive Date] < @7daysago
INSERT INTO [dbo].[Archived Data Import]
SELECT a.*
FROM [dbo].[Data Import] AS a
INNER JOIN @NextIDs AS b ON a.IndexID = b.IndexID
DELETE a
FROM [dbo].[Data Import] AS a
INNER JOIN @NextIDs AS b ON a.IndexID = b.IndexID
DELETE FROM @NextIDs
COMMIT TRAN
END
What am I doing wrong here? I'm using SQL Server 2012 Express, so I cannot use partitioning (which would be ideal).
Beyond this, how do I turn this into a daily recurring task? Any help would be much appreciated.
An explicit value for the identity column in table 'dbo.Archived Data Import' can only be specified when a column list is used and IDENTITY_INSERT is ON
So... set identity insert on. Also, use DELETE ... OUTPUT INTO ... rather than SELECT -> INSERT -> DELETE.
DECLARE @7daysago datetime
SELECT @7daysago = DATEADD(d, -7, GETDATE());
SET IDENTITY_INSERT [dbo].[Archived Data Import] ON;
WITH CTE as (
SELECT TOP 10000 *
FROM [dbo].[Data Import]
WHERE [Data Import].[Receive Date] < @7daysago)
DELETE CTE
OUTPUT DELETED.id, DELETED.col1, DELETED.col2, ...
INTO [dbo].[Archived Data Import] (id, col1, col2, ...);
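IDENTITY_INSERT can only be ON for one table per session, so it is worth turning it back off once the batch finishes:

SET IDENTITY_INSERT [dbo].[Archived Data Import] OFF;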
Beyond this, how do I turn this into a daily recurring task?
Use conversation timers and activated procedures. See Scheduling Jobs in SQL Server Express.
Without seeing your table definitions, I am going to assume that your archive table has the same definition as your current table. Am I right in assuming that you have an identity column as Archived Data Import.IndexID? If so, switch it to be a plain int large enough to hold the expected values.
In order to schedule this, you will need to create a .bat file that runs this procedure and schedule it with Windows Task Scheduler.
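A minimal sketch of such a batch file, assuming the archive logic has been wrapped in a stored procedure (the server, database, and procedure names are placeholders):

rem archive.bat - run the archive procedure via sqlcmd with Windows authentication
sqlcmd -S .\SQLEXPRESS -d MyDb -E -Q "EXEC dbo.ArchiveDataImport;"

Pointing a daily Task Scheduler task at this file gives you the recurring job without SQL Server Agent, which Express editions lack.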

Conditional SQL on Volume of Records

I need to set up an "alert" (it will send an email) if the number of records inserted in, say, 1 hour exceeds, say, 10 records.
I'm thinking about doing this in an INSERT trigger, but I'm not clear on what the best way to check that condition (or the syntax for it) is.
Thanks
I don't think a trigger and an audit table are the best way to do this. What I would do is the following:
Add a column (of type datetime) to your table called CreateDT
You can add a default value of GETDATE() to the column.
Then, in an external process, you can do a select like the following:
SELECT COUNT(*)
FROM TABLE
WHERE CreateDT >= dateadd(hour, datediff(hour, 0, getdate()) - 1, 0)
and CreateDT < dateadd(hour, datediff(hour, 0, getdate()), 0)
This checks the count for the prior full hour.
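The dateadd/datediff pair is a common idiom for truncating a datetime to the hour (datediff counts whole hours since date 0, and dateadd adds them back onto date 0). A worked example of the two boundaries:

-- If getdate() = 2014-03-05 14:37:12.000:
-- dateadd(hour, datediff(hour, 0, getdate()) - 1, 0)  =>  2014-03-05 13:00:00.000 (prior hour start)
-- dateadd(hour, datediff(hour, 0, getdate()), 0)      =>  2014-03-05 14:00:00.000 (current hour start)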
To check the counts for the last 24 hours you could get a list like this (note that SQL Server uses DATEPART rather than a HOUR() function):
SELECT DATEPART(HOUR, CreateDT), COUNT(*)
FROM (
SELECT CreateDt
FROM TABLE
WHERE CreateDT > dateadd(day, -1, getdate())
) T
GROUP BY DATEPART(HOUR, CreateDT)
Assuming you have a created time column on your table:
Create Trigger trig_CheckRecordCount
ON TableName
For Insert
as
Begin
IF( (select COUNT(*)
from TableName
where CreatedOn >
(
Select DATEADD(HOUR, -1, MAX(CreatedOn))
from inserted)) > 10)
begin
-- call a stored procedure to send email
end
end
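If Database Mail is configured, the email step itself could use msdb.dbo.sp_send_dbmail; a sketch, with the profile name and recipient as placeholders:

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'AlertsProfile', -- assumed Database Mail profile
@recipients = 'dba@example.com',
@subject = 'Insert volume alert',
@body = 'More than 10 rows were inserted in the last hour.';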

Select rows based on time or all rows if the table is empty

I have 2 tables. One is a production table and the other is a daily reporting table. The daily reporting table is emptied each day and is a subset of the production table. I want to update the daily table with all new rows from the production table. I thought about using a WHERE clause:
SELECT ftime,
fdate,
fdata,
fdata2
INTO table2
FROM table1
WHERE ftime > table2.ftime
I am not having much luck. I am new to SQL and I am just not sure how to go about this, and can't seem to find anything on the net for this specific issue.
This will eventually go into a stored procedure when I get it working.
Any tips, hints, would be greatly appreciated.
Since table2 already exists (it is emptied each day, not dropped), use INSERT ... SELECT rather than SELECT ... INTO:
INSERT INTO table2 (ftime, fdate, fdata, fdata2)
SELECT ftime,
fdate,
fdata,
fdata2
FROM table1
WHERE ftime > (select MAX(ftime) from table2)
OR NOT EXISTS (select * FROM table2);
If table2 is empty (such as if you have just done your daily purge), all of table1 will get pulled into table2.
Otherwise it will only insert the new records from table1 with ftime later than what exists in table2.
Make sure you have an index on table2.ftime.
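A minimal sketch of that index (the name is illustrative):

CREATE INDEX IX_table2_ftime ON table2 (ftime);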
