SSMS T-SQL set column for duplicate row number - sql-server

I have a table with approximately 10 million rows (and 20 columns, about 4 GB) where about 10% of the rows have a duplicate ID. The database is in SQL Server 2014 Express and I am using SSMS.
I created a new column CNT (int, null) to count the occurrences of each row where I have a duplicate ID. Desired result would look like:
ID CNT
100 1
100 2
101 1
102 1
102 2
103 1
104 1
Not being really familiar with advanced SQL capabilities I did some research and came up with using a CTE to set the CNT column. Worked fine on a small test table - but it was obvious this is not the way to go for a large table (I killed it after 5+ hours on a pretty decent system.)
Here's the code that I attempted to implement:
with CTE as
(select dbo.database.id, dbo.database.cnt,
RN = row_number() over (partition by id order by id)
from dbo.database)
update CTE set CNT = RN
Column ID is of type Int. All columns allow nulls - there are no keys or indexed columns.

Edit: Martin is right. At the moment I can only offer an alternative to the CTE: create a new table with the same structure as your old one and insert the old table's data into it like this (add the remaining columns to the SELECT list alongside ID):
INSERT INTO newTable
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM oldTable;
Then you can drop your old table. Definitely not a perfect solution, but it should work.
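A minimal sketch of the copy-with-ROW_NUMBER idea above, written in Python with an in-memory SQLite database standing in for SQL Server (assumption: SQLite 3.25 or later, which supports window functions). The names old_table, new_table and payload are hypothetical; a real table would list all 20 columns.

```python
import sqlite3

# In-memory SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")

# Hypothetical source table: an id that may repeat plus one other column.
conn.execute("CREATE TABLE old_table (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO old_table (id, payload) VALUES (?, ?)",
    [(100, "a"), (100, "b"), (101, "c"), (102, "d"),
     (102, "e"), (103, "f"), (104, "g")],
)

# Target table has the same columns plus the occurrence counter.
conn.execute("CREATE TABLE new_table (id INTEGER, payload TEXT, cnt INTEGER)")

# Number the rows within each id while copying them across.
conn.execute(
    """
    INSERT INTO new_table (id, payload, cnt)
    SELECT id, payload,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS cnt
    FROM old_table
    """
)

rows = conn.execute("SELECT id, cnt FROM new_table ORDER BY id, cnt").fetchall()
print(rows)
```

The same INSERT ... SELECT shape applies on SQL Server. Whether it actually beats the UPDATE through a CTE on 10 million rows is something to measure rather than assume.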

Related

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to contain the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use the RANK function: rank the rows with OVER (PARTITION BY UnitID ...) and pick the rows with rank 2.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
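WITH DupeCalc AS (
SELECT
-- number the rows within each UnitID, oldest JobUnitKeyID first
DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2 -- every row after the first one for its UnitID
ORDER BY UnitID DESC -- ORDER BY belongs in the outer query, not in the CTE
;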
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you would have to use Min(JobUnitKeyID) together with UnitID and join back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Using Min or Max requires you to join back to the same data, which will be inherently slower.
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
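To make the first point concrete, here is a small sketch comparing the two approaches on a unit that appears three times. It uses Python with in-memory SQLite as a stand-in for SQL Server (assumption: SQLite 3.25 or later), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobUnits (JobUnitKeyID INTEGER, UnitID INTEGER)")
# Made-up data: unit 105 appears twice, unit 107 three times, unit 106 once.
conn.executemany(
    "INSERT INTO JobUnits VALUES (?, ?)",
    [(1, 105), (2, 105), (3, 106), (4, 107), (5, 107), (6, 107)],
)

# GROUP BY / MAX: one row per duplicated unit, so only the LAST duplicate.
max_rows = conn.execute(
    """
    SELECT MAX(JobUnitKeyID), UnitID
    FROM JobUnits
    GROUP BY UnitID
    HAVING COUNT(*) > 1
    ORDER BY UnitID
    """
).fetchall()

# ROW_NUMBER: every row after the first one within each unit.
rn_rows = conn.execute(
    """
    WITH DupeCalc AS (
        SELECT JobUnitKeyID, UnitID,
               ROW_NUMBER() OVER (PARTITION BY UnitID
                                  ORDER BY JobUnitKeyID) AS DupID
        FROM JobUnits
    )
    SELECT JobUnitKeyID, UnitID
    FROM DupeCalc
    WHERE DupID >= 2
    ORDER BY UnitID, JobUnitKeyID
    """
).fetchall()

print(max_rows)  # misses row 5 for unit 107
print(rn_rows)   # includes rows 5 and 6 for unit 107
```

Changing the filter to DupID = 2 would return exactly the second record per unit, which is what the question literally asks for.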

How to SELECT LIMIT in ASE 12.5? LIMIT 10, 10 gives syntax error?

How can I LIMIT the result returned by a query in Adaptive Server IQ/12.5.0/0306?
The following gives me a generic error near LIMIT:
SELECT * FROM mytable LIMIT 10, 10;
Any idea why? This is my first time with this DBMS.
Sybase IQ uses the ROW_COUNT option to limit the number of rows returned.
You will have to set the row count before your statement and decide whether the setting should be TEMPORARY or not.
See the documentation for ROW_COUNT and for SET options.
The LIMIT clause is not supported in Sybase IQ 12. I don't think there is a simple or clean solution, much as in old versions of SQL Server. But there are some approaches that work in SQL Server 2000 and should also work in Sybase IQ 12. I can't promise that the queries below will work as copy and paste.
Subquery
SELECT TOP 10 *
FROM mytable
WHERE Id NOT IN (
SELECT TOP 10 Id FROM mytable ORDER BY OrderingColumn
)
ORDER BY OrderingColumn
Basically, it fetches 10 rows but skips the first 10. For this to work, Id must be unique and the ordering matters: an Id must not appear more than once in the results, otherwise you may filter out valid rows.
Asc-Desc
Another workaround relies on ordering alone. The query below fetches the 10 rows of the second page. You have to take special care of the last page, because the simple formula page * rows per page does not work properly there.
SELECT *
FROM
(
SELECT TOP 10 *
FROM
(
SELECT TOP 20 * -- (page * rows per page)
FROM mytable
ORDER BY Id
) AS t1
ORDER BY Id DESC
) AS t2
ORDER BY Id ASC
I've found some reports that subqueries in the FROM clause do not work in ASE 12, so this approach may not be possible there.
Basic iteration
In this scenario you simply iterate through the rows. Let's assume the Id of the tenth row is 15. The query below then selects the next 10 rows after the tenth row. This breaks down if you order by a column other than Id; in that case it is not possible.
SELECT TOP 10 *
FROM mytable
WHERE Id > 15
ORDER BY Id
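The iteration can be sketched as a loop that remembers the last Id seen and asks for the next page after it. This is only an illustration in Python with in-memory SQLite, so it uses LIMIT where the Sybase and SQL Server queries above use TOP. The fetch_page helper, the Name column, and the sample rows are hypothetical.

```python
import sqlite3


def fetch_page(conn, last_id, page_size):
    """Return the next page of rows whose Id is greater than last_id."""
    return conn.execute(
        "SELECT Id, Name FROM mytable WHERE Id > ? ORDER BY Id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER PRIMARY KEY, Name TEXT)")
# Odd Ids only (1, 3, ..., 49): gaps in Id do not matter for this technique.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 50, 2)],
)

pages = []
last_id = 0  # assumption: all Ids are positive
while True:
    page = fetch_page(conn, last_id, 10)
    if not page:
        break
    pages.append(page)
    last_id = page[-1][0]  # remember where this page ended

print([len(p) for p in pages])
```

Because each page starts from the last Id of the previous one, the last page needs no special handling; it is simply shorter.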
Here is an article about other workarounds in SQL Server 2000. Some of them should also work in a similar way in Sybase IQ 12.
http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET
All of these are workarounds. If you can, try to migrate to a newer version.

SQLite Row_Num/ID

I have a SQLite database that I'm trying to use data from. Basically, there are multiple sensors writing to the database, and I need to join each row to the preceding row to calculate the value difference for that time period. The only catch is that the ROWID field in the database can't be used to join on anymore, since more sensors are beginning to write to the database.
In SQL Server it would be easy to use Row_Number and partition by sensor. I found this topic: How to use ROW_NUMBER in sqlite and implemented the suggestion:
select id, value ,
(select count(*) from data b where a.id >= b.id and b.value='yes') as cnt
from data a where a.value='yes';
It works but is very slow. Is there anything simple I'm missing? I've tried to join on the time difference possibly, create a view. Just at wits end! Thanks for any ideas!
Here is sample data:
ROWID - SensorID - Time - Value
1 2 1-1-2015 245
2 3 1-1-2015 4456
3 1 1-1-2015 52
4 2 2-1-2015 325
5 1 2-1-2015 76
6 3 2-1-2015 5154
I just need to join row 6 with row 2 and row 3 with row 5 and so forth based on the sensorID.
The subquery can be sped up with an index with the correct structure.
In this case, the column with the equality comparison must come first, and the column with the inequality comparison second:
CREATE INDEX xxx ON MyTable(SensorID, Time);
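As a sketch of how that index lines up with a previous-row lookup, the following uses Python's sqlite3 module with the sample data from the question. Assumptions: the table is called data, the dates are stored as ISO text so that they sort correctly, and 2-1-2015 is read as the day after 1-1-2015. The index name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (SensorID INTEGER, Time TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (2, "2015-01-01", 245),
        (3, "2015-01-01", 4456),
        (1, "2015-01-01", 52),
        (2, "2015-01-02", 325),
        (1, "2015-01-02", 76),
        (3, "2015-01-02", 5154),
    ],
)

# Equality column (SensorID) first, inequality column (Time) second.
conn.execute("CREATE INDEX idx_sensor_time ON data (SensorID, Time)")

# Join each reading to the latest earlier reading of the same sensor.
deltas = conn.execute(
    """
    SELECT cur.SensorID, cur.Time, cur.Value - prev.Value AS Delta
    FROM data AS cur
    JOIN data AS prev
      ON prev.SensorID = cur.SensorID
     AND prev.Time = (SELECT MAX(p.Time)
                      FROM data AS p
                      WHERE p.SensorID = cur.SensorID
                        AND p.Time < cur.Time)
    ORDER BY cur.SensorID, cur.Time
    """
).fetchall()

print(deltas)
```

The first reading of each sensor has no predecessor and drops out of the inner join; a LEFT JOIN would keep it with a NULL delta.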

How can I assign a number to each row in a table representing the record number?

How can I show the number of rows in a table in a way that when a new record is added the number representing the row goes higher and when a record is deleted the number gets updated accordingly?
To be clearer, suppose I have a simple table like this:
ID int (primary key) Name varchar(5)
The ID is set to get incremented by itself (using identity specification) so it can't represent the number of row(record) since if I have for example 3 records as:
ID NAME
1 Alex
2 Scott
3 Sara
and I delete Alex and Scott and add a new record it will be:
3 Sara
4 Mina
So basically I'm looking for a sql-side solution for doing this so that I don't change anything else in the source code in multiple places.
I tried to write something to get the job done, but it fails. Here it is:
SELECT COUNT(*) AS [row number],Name
FROM dbo.Test
GROUP BY ID, Name
HAVING (ID = ID)
This shows as:
row number Name
1 Alex
1 Scott
1 Sara
while I want it to get shown as:
row number Name
1 Alex
2 Scott
3 Sara
If you just want the number against the rows while selecting the data, and not stored in the database, then you can use this:
select row_number() over(order by id) from dbo.Test
This will give the row number n for the nth row.
Try
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
FROM MyTable
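A short sketch of why this fits the question: the number is computed at query time, so it closes the gaps left by deleted rows while the identity values stay as they are. It uses Python with in-memory SQLite in place of SQL Server (assumption: SQLite 3.25 or later), with AUTOINCREMENT standing in for IDENTITY(1,1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT is used here as a stand-in for a SQL Server identity column.
conn.execute("CREATE TABLE Test (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
conn.executemany(
    "INSERT INTO Test (Name) VALUES (?)",
    [("Alex",), ("Scott",), ("Sara",)],
)

NUMBERED = (
    "SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNumber, ID, Name "
    "FROM Test ORDER BY ID"
)

before = conn.execute(NUMBERED).fetchall()

# Same scenario as in the question: remove two rows, then add a new one.
conn.execute("DELETE FROM Test WHERE Name IN ('Alex', 'Scott')")
conn.execute("INSERT INTO Test (Name) VALUES ('Mina')")

after = conn.execute(NUMBERED).fetchall()

print(before)
print(after)  # row numbers restart at 1 although the IDs are 3 and 4
```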
What you want is called an auto increment.
For SQL-Server this is achieved by adding the IDENTITY(1,1) attribute to the table definition.
Other RDBMS use a different syntax. Firebird for example has generators, which do the counting. In a BEFORE-INSERT trigger you would assign the ID-field to the current value of the generator (which will be increased automatically).
I had this exact problem a while ago, but I was using SQL Server 2000, where ROW_NUMBER() isn't available, even though it is the best solution. A workaround is to create a temporary table, insert all the values into it with an auto-increment column, and replace the current table with the new table in T-SQL.

MS Access row number, specify an index

Is there a way in MS access to return a dataset between a specific index?
So let's say my dataset is:
rank | first_name | age
1 Max 23
2 Bob 40
3 Sid 25
4 Billy 18
5 Sally 19
But I only want to return the records between 'rank' 2 and 4, so that my result set is Bob, Sid and Billy. However, rank is not part of the table; it should be generated when the query is run. Why don't I use an autogenerated number? Because if a record is deleted the numbering will be inconsistent, and what if I wanted the results in reverse!
This is obviously very simple. The reason I ask is that I am working on a product catalogue and I am looking for a more efficient way of paging through the returned dataset: if I only return one page's worth of data from the database, that is obviously going to be quicker than returning a complete set of 3000 records and then having to subselect from that set!
Thanks R.
Original suggestion:
SELECT * from table where rank BETWEEN 2 and 4;
Modified after comment, that rank is not existing in structure:
Select top 100 * from table;
And if you want to fetch subsequent results, you can take the ID of the last record from the first query, say it was ID 100, and use a WHERE clause to get the next 100:
Select top 100 * from table where ID > 100;
But these won't give you what you're looking for either, I bet.
How are you calculating rank? I assume you are basing it on some data in another dataset somewhere. If so, create a function, do a table join, or do something that can calculate rank based on values in other table(s), then you can do queries based on the rank() function.
For example:
select *
from table
where rank() between 2 and 4
If you are not calculating rank based on some data somewhere, there really isn't a way to write this query, and you might as well be returning three random rows from the table.
I think you need to use a correlated subquery to calculate the rank on the fly e.g. I'm guessing the rank is based on name:
SELECT T1.first_name, T1.age,
(
SELECT COUNT(*) + 1
FROM MyTable AS T2
WHERE T1.first_name > T2.first_name
) AS rank
FROM MyTable AS T1;
The bad news is the Access data engine is poorly optimized for this kind of query; in my experience, performance will start to noticeably degrade beyond a few hundred rows.
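For illustration, the correlated-subquery rank can be wrapped in an outer query that keeps only ranks 2 to 4. The sketch below runs it through Python's sqlite3 module rather than Access, so it shows the logic only and says nothing about Access performance. The table name people and the alias rnk are hypothetical, and rank is based on first_name as guessed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Max", 23), ("Bob", 40), ("Sid", 25), ("Billy", 18), ("Sally", 19)],
)

# Rank = 1 + number of names that sort before this one; then keep ranks 2..4.
page = conn.execute(
    """
    SELECT first_name, age, rnk
    FROM (
        SELECT T1.first_name, T1.age,
               (SELECT COUNT(*) + 1
                FROM people AS T2
                WHERE T1.first_name > T2.first_name) AS rnk
        FROM people AS T1
    ) AS ranked
    WHERE rnk BETWEEN 2 AND 4
    ORDER BY rnk
    """
).fetchall()

print(page)
```

Because the rank here follows alphabetical order, the page is Bob, Max and Sally rather than the Bob, Sid and Billy of the question; a different ordering column in the subquery changes which rows land on each page.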
If it is not possible to maintain the rank on the db side of the house (e.g. high insertion environment) consider doing the paging on the client side. For example, an ADO classic recordset object has properties to support paging (PageCount, PageSize, AbsolutePage, etc), something for which DAO recordsets (being of an older vintage) have no support.
As always, you'll have to perform your own timings, but I suspect that when there are, say, 10K rows you will find it faster to take on the overhead of fetching all the rows into an ADO recordset and then finding the page (then perhaps fabricating a smaller ADO recordset consisting of just that page's worth of rows) than to perform a correlated subquery that fetches only the rows for the page.
Unfortunately the LIMIT keyword isn't available in MS Access -- that's what is used in MySQL for a multi-page presentation. If you can write an order key into the results table, then you can use it something like this:
SELECT TOP 25 MyOrder, Etc FROM Table1 WHERE MyOrder in
(SELECT TOP 55 MyOrder FROM Table1 ORDER BY MyOrder DESC)
ORDER BY MyOrder ASC
If I understand you correctly, there are only first_name and age columns in your table. If this is the case, then there is no way to return Bob, Sid, and Billy with a single query, unless you do something like
SELECT * FROM Table
WHERE FirstName = 'Bob'
OR FirstName = 'Sid'
OR FirstName = 'Billy'
But I think that this is not what you are looking for.
This is because SQL databases make no guarantee as to the order that the data will come out of the database unless you specify an ORDER BY clause. It will usually come out in the same order it was added, but there are no guarantees, and once you get a lot of rows in your table, there's a reasonably high probability that they won't come out in the order you put them in.
As a side note, you should probably add a "rank" column (this column is usually called id) to your table, and make it an auto-incrementing integer (see the Access documentation), so that you can do the query mentioned by Sev. It's also important to have a primary key so that you can be certain which rows are being updated when you run an update query, or which rows are being deleted when you run a delete query. For example, if you had two people named Max, and they were both 23, how would you delete one row without deleting the other? If you had an auto-incrementing unique column in there, you could specify the unique ID in your query to delete only one.
[ADDITION]
Upon reading your comment: if you add an autoincrement field and want to read 3 rows, and you know the ID of the first row you want to read, then you can use TOP to read 3 rows.
Assuming your data looks like this
ID | first_name | age
1 Max 23
2 Bob 40
6 Sid 25
8 Billy 18
15 Sally 19
You can query Bob, Sid and Billy with the following query:
SELECT TOP 3 first_name, age
FROM Table
WHERE ID >= 2
ORDER BY ID

Resources