Row numbers for a query in informix - database

I am using informix database, I want a query which you could also generate a row number along with the query
Like
select row_number(),firstName,lastName
from students;
row_number() firstName lastName
1 john mathew
2 ricky pointing
3 sachin tendulkar
Here firstName, lastName are from Database, where as row number is generated in a query.

The best way is to use a (newly initialized) sequence.
begin work;
create sequence myseq;
select myseq.nextval,s.firstName,s.lastName from students s;
drop sequence myseq;
commit work;

You may not be able to use ROWID in a table that's fragmented across multiple DBSpaces, so any solution that uses ROWID is not particularly portable. It's also strongly discouraged.
If you don't have a SERIAL column in your source table (which is a better way of implementing this as a general concept), have a look at
CREATE SEQUENCE, which is more or less the equivalent of an Orrible function that generates unique numbers when SELECTed from (as opposed to SERIAL, which generates the unique number when the row is INSERTed).

Given a table called Table3 with 3 columns:
colnum name datatype
======= ===== ===
1 no text;
2 seq number;
3 nm text;
NOTE:
seq is a field within the Table that has unique values in ascending order. The numbers do not have to be contiguous.
Here is query to return a rownumber (RowNum) along with query result
SELECT table3.no, table3.seq, Table3.nm,
(SELECT COUNT(*) FROM Table3 AS Temp
WHERE Temp.seq < Table3.seq) + 1 AS RowNum
FROM Table3;

I think the easiest way would be to use the following code and adjust its return accordingly.
SELECT rowid, * FROM table
It works for me but please note that it will return the row number in the database, not the row number in the query.
P.S. it's an accepted answer from Experts Exchange.

select sum(1) over (order by rowid) as row_number, M.* from systables M

I know its an old question, but since i just faced this problem and got a soultion not mentioned here, i tough i could share it, so here it is:
1- You need to create a FUNCTION that return numbers in a given range:
CREATE FUNCTION fnc_numbers_in_range (pMinNumber INT, pMaxNumber INT)
RETURNING INT as NUMERO;
DEFINE numero INT;
LET numero = 0;
FOR numero = pMinNumber TO pMaxNumber
RETURN numero WITH RESUME;
END FOR;
END FUNCTION;
2- You Cross the results of this Function with the table you want:
SELECT * FROM TABLE (fnc_numbers_in_range(0,10000)), my_table;
The only thing is that you must know before-hand the number of rows you want, you may get this with the COUNT(*) Function.
This works with my Informix Database, other implementations may need some tweaking.

Using OLAP expressions you need the OVER() with something in it, since you don't want partitions include a SORT clause, like this:
SELECT ROW_NUMBER() OVER(ORDER BY lastName, firstName) AS rn, firstName, lastName
FROM students;
and if you don't want to order by name, you could use the way records were entered in the system by ordering by ROWID.

Related

SQL Server : Row Number without ordering

I want to create a Select statement that ranks the column as is without ordering.
Currently, the table is in the following order:
ITEM_Description1
ITEM_Description2
ITEM_StockingType
ITEM_RevisionNumber
I do not want the results to be numerical in any way, nor depend on the VariableID numbers, but with ROW_Number(), I have to choose something. Does anyone know how I can have the results look like this?
Row| VariableName
---------------------
1 | ITEM_Description1
2 | ITEM_Description2
3 | ITEM_StockingType
4 | ITEM_RevisionNumber
My code for an example is shown below.
SELECT
VariableName,
ROW_NUMBER() OVER (ORDER BY VariableID) AS RowNumber
FROM
SeanVault.dbo.TempVarIDs
Using ORDER BY (SELECT NULL) will give you the results your looking for.
SELECT
VariableName,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM
SeanVault.dbo.TempVarIDs
Your problem seems to be with this sentence:
Currently, the table is in the following order:
No, your table is NOT implicitly ordered!!
Although it might look like this...
The only way to enforce the resultset's sort order is an ORDER BY-clause at the outer most SELECT.
If you want to maintain the sort order of your inserts you can use
a column like ID INT IDENTITY (which will automatically increase a sequence counter)
Using GETDATE() on insert will not solve this, as multiple row inserts might get the same DateTime value.
You do not have to show this in your output of course...
Your table has no inherent order. Even if you get that order a 100 times in a row is no guarantee it will be that order on the 101 time.
You can add an identity column to the table.

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there is two separate records for 105. I only want the returned data set to return the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experience, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example: (I can't read your first column name, so I'm calling it JobUnitK
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use RANK function. Rank the rows OVER PARTITION BY UnitId and pick the rows with rank 2 .
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID, ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
ORDER BY UnitID Desc
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case using Min(JobUnitKeyID) in conjunction with UnitID to join back on the UnitID where the JobUnitKeyID <> MinJobUnitKeyID` is required.
Except, using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.

How can we add a column on the fly in a dynamic table in SQL SERVER?

My question needs little explanation so I'd like to explain this way:
I've got a table (lets call it RootTable), it has one million records, and not in any proper order. What I'm trying to do is to get number of rows(#ParamCount) from RootTable and at the same time these records must be sorted and also have an additional column(with unique data) added on the fly to maintain a key for row identification which will be used later in the program. It can take any number of parameters but my basic parameters are the two which mentioned below.
It's needed for SQL SERVER environment.
e.g.
RootTable
ColumnA ColumnB ColumnC
ABC city cellnumber
ZZC city1 cellnumber
BCD city2 cellnumber
BCC city3 cellnumber
Passing number of rows to return #ParamCount and columnA startswith
#paramNameStartsWith
<b>#paramCount:2 <br>
#ParamNameStartsWith:BC</b>
desired result:
Id(added on the fly) ColumnA ColumnB ColumnC
101 BCC city3 cellnumber
102 BCD city2 cellnumber
Here's another point about Id column. Id must maintain its order, like in the above result it's starting from 101 because 100 is already assigned to the first row when sorted and added column on the fly, and because it starts with "ABC" so obviously it won't be in the result set.
Any kind of help would be appreciated.
NOTE: My question title might not reflect my requirement, but I couldn't get any other title.
So first you need your on-the-fly-ID. This one is created by the ROW_NUMBER() function which is available from SQL Server 2005 onwards. What ROW_NUMBER() will do is pretty self-explaining i think. However it works only on a partition. The Partition is specified by the OVER clause. If you include GROUP BY within the OVER clause, you will have multiple partitions. In your case, there is only one partition which is the whole table, therefor GROUP BY is not necessary. However an ORDER BY is required so that the system knows which record should get which row number in the partition. The query you get is:
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
Now you have a row number for your whole table. You cannot include any condition like your #ParamNameStartsWith parameter here because you wanted a row number set for the whole table. The query above has to be a subquery which provides the set on which the condition can be applied. I use a CTE here, i think that is better for readability:
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT *
FROM OrderedList
WHERE ColumnA LIKE #ParamNameStartsWith+'%'
Please note that i added the wildcard % after the parameter, so that the condition is basically "starts with" #ParamNameStartsWith.
Finally,if i got you right you wanted only #ParamCount rows. You can use your parameter directly with the TOP keyword which is also only possible with SQL Server 2005 or later.
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT TOP (#ParamCount) *
FROM OrderedList
WHERE ColumnA LIKE #ParamNameStartsWith+'%'

"order by newid()" - how does it work?

I know that If I run this query
select top 100 * from mytable order by newid()
it will get 100 random records from my table.
However, I'm a bit confused as to how it works, since I don't see newid() in the select list. Can someone explain? Is there something special about newid() here?
I know what NewID() does, I'm just
trying to understand how it would help
in the random selection. Is it that
(1) the select statement will select
EVERYTHING from mytable, (2) for each
row selected, tack on a
uniqueidentifier generated by NewID(),
(3) sort the rows by this
uniqueidentifier and (4) pick off the
top 100 from the sorted list?
Yes. this is pretty much exactly correct (except it doesn't necessarily need to sort all the rows). You can verify this by looking at the actual execution plan.
SELECT TOP 100 *
FROM master..spt_values
ORDER BY NEWID()
The compute scalar operator adds the NEWID() column on for each row (2506 in the table in my example query) then the rows in the table are sorted by this column with the top 100 selected.
SQL Server doesn't actually need to sort the entire set from positions 100 down so it uses a TOP N sort operator which attempts to perform the entire sort operation in memory (for small values of N)
In general it works like this:
All rows from mytable is "looped"
NEWID() is executed for each row
The rows are sorted according to random number from NEWID()
100 first row are selected
as MSDN says:
NewID() Creates a unique value of type
uniqueidentifier.
and your table will be sorted by this random values.
use select top 100 randid = newid(), * from mytable order by randid
you will be clarified then..
I have an unimportant query which uses newId() and joins many tables. It returns about 10k rows in about 3 seconds. So, newId() might be ok in such cases where performance is not too bad & does not have a huge impact. But, newId() is bad for large tables.
Here is the explanation from Brent Ozar's blog - https://www.brentozar.com/archive/2018/03/get-random-row-large-table/.
From the above link, I have summarized the methods which you can use to generate a random id. You can read the blog for more details.
4 ways to get a random row from a large table:
Method 1, Bad: ORDER BY NEWID() > Bad performance!
Method 2, Better but Strange: TABLESAMPLE > Many gotchas & is not really
random!
Method 3, Best but Requires Code: Random Primary Key >
Fastest, but won't work for negative numbers.
Method 4, OFFSET-FETCH (2012+) > Only performs properly with a clustered
index.
More on method 3:
Get the top ID field in the table, generate a random number, and look for that ID. For top N rows, call the code below N times or generate N random numbers and use in an IN clause.
/* Get a random number smaller than the table's top ID */
DECLARE #rand BIGINT;
DECLARE #maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT #rand = ABS((CHECKSUM(NEWID()))) % #maxid;
/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= #rand;

Efficient checking of possible duplicate entities

I have a requirement to produce a list of possible duplicates before a user saves an entity to the database and warn them of the possible duplicates.
There are 7 criteria on which we should check the for duplicates and if at least 3 match we should flag this up to the user.
The criteria will all match on ID, so there is no fuzzy string matching needed but my problem comes from the fact that there are many possible ways (99 ways if I've done my sums corerctly) for at least 3 items to match from the list of 7 possibles.
I don't want to have to do 99 separate db queries to find my search results and nor do I want to bring the whole lot back from the db and filter on the client side. We're probably only talking of a few tens of thousands of records at present, but this will grow into the millions as the system matures.
Anyone got any thoughs of a nice efficient way to do this?
I was considering a simple OR query to get the records where at least one field matches from the db and then doing some processing on the client to filter it some more, but a few of the fields have very low cardinality and won't actually reduce the numbers by a huge amount.
Thanks
Jon
OR and CASE summing will work but are quite inefficient, since they don't use indexes.
You need to make UNION for indexes to be usable.
If a user enters name, phone, email and address into the database, and you want to check all records that match at least 3 of these fields, you issue:
SELECT i.*
FROM (
SELECT id, COUNT(*)
FROM (
SELECT id
FROM t_info t
WHERE name = 'Eve Chianese'
UNION ALL
SELECT id
FROM t_info t
WHERE phone = '+15558000042'
UNION ALL
SELECT id
FROM t_info t
WHERE email = '42#example.com'
UNION ALL
SELECT id
FROM t_info t
WHERE address = '42 North Lane'
) q
GROUP BY
id
HAVING COUNT(*) >= 3
) dq
JOIN t_info i
ON i.id = dq.id
This will use indexes on these fields and the query will be fast.
See this article in my blog for details:
Matching 3 of 4: how to match a record which matches at least 3 of 4 possible conditions
Also see this question the article is based upon.
If you want to have a list of DISTINCT values in the existing data, you just wrap this query into a subquery:
SELECT i.*
FROM t_info i1
WHERE EXISTS
(
SELECT 1
FROM (
SELECT id
FROM t_info t
WHERE name = i1.name
UNION ALL
SELECT id
FROM t_info t
WHERE phone = i1.phone
UNION ALL
SELECT id
FROM t_info t
WHERE email = i1.email
UNION ALL
SELECT id
FROM t_info t
WHERE address = i1.address
) q
GROUP BY
id
HAVING COUNT(*) >= 3
)
Note that this DISTINCT is not transitive: if A matches B and B matches C, this does not mean that A matches C.
You might want something like the following:
SELECT id
FROM
(select id, CASE fld1 WHEN input1 THEN 1 ELSE 0 "rule1",
CASE fld2 when input2 THEN 1 ELSE 0 "rule2",
...,
CASE fld7 when input7 THEN 1 ELSE 0 "rule2",
FROM table)
WHERE rule1+rule2+rule3+...+rule4 >= 3
This isn't tested, but it shows a way to tackle this.
What DBS are you using? Some support using such constraints by using server side code.
Have you considered using a stored procedure with a cursor? You could then do your OR query and then step through the records one-by-one looking for matches. Using a stored procedure would allow you to do all the checking on the server.
However, I think a table scan with millions of records is always going to be slow. I think you should work out which of the 7 fields are most likely to match are make sure these are indexed.
I'm assuming your system is trying to match tag ids of a certain post, or something similar. This is a multi-to-multi relationship and you should have three tables to handle it. One for the post, one for tags and one for post and tags relationship.
If my assumptions are correct then the best way to handle this is:
SELECT postid, count(tagid) as common_tag_count
FROM posts_to_tags
WHERE tagid IN (tag1, tag2, tag3, ...)
GROUP BY postid
HAVING count(tagid) > 3;

Resources