SQL NOT LIKE and LIKE - sql-server

I'm having a problem with understanding the LIKE and NOT LIKE operators in SQL. This is a query that I've executed:
select serial_number from UNIT U
group by serial_number
order by serial_number
which yields 2000 results.
When I execute this query, I get 1950 results:
select serial_number from UNIT U
WHERE op_name LIKE 'Assembly'
group by serial_number
order by serial_number
So when I execute this query, I expect to get 50 results, but instead I get 2000:
select serial_number from UNIT U
WHERE op_name NOT LIKE 'Assembly'
group by serial_number
order by serial_number
Any explanations? Thanks a bunch.

The group you're doing makes it not really valid to do the sort of count comparison you're attempting.
Suppose you have 10 unique serial numbers, and for each of those serial numbers there are two rows (so 20 rows total), one with op_name "Xyz", and another with op_name "Assembly". Your first query would return 10 rows. Your second query would return 10 rows. Your third query would return 10 rows. Because of the group, LIKE "Assembly" and NOT LIKE "Assembly" are not mutually exclusive.

NULL is neither LIKE nor NOT LIKE anything.
Actually, re-reading your numbers more carefully, there may be another cause. (My earlier point is true, but this is more likely.)
Suppose you have the following data:
serial_number | op_name
--------------+---------
1 | Assembly
1 | Not
1 will be returned by both queries.

Without the wildcards % your LIKE operates just like an =.
Check and see what the values are...most likely if you are using LIKE, you also want to use wildcards for example:
select serial_number from UNIT U WHERE op_name NOT LIKE '%Assembly%' group by serial_number order by serial_number

Related

order of column values in select from scalar type collection

If I have :
create type tvn as table of number;
select column_value from table(tvn(1,9,27,3,7))
Should Oracle return values in the order exactly matching they appear in collection constructor? I mean should it be always:
1
9
27
3
7
?
No SQL output has a guaranteed order unless you add ORDER BY. It might come back in order 99 times out of 100, but without the ORDER BY, there are no guarantees.
The most common counterexample is running a statement in parallel.

Group by an evaluated field (sql server) [duplicate]

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's rule of normalize to only represent data once in the database ought to extend to code as well. I'd want to only right that calculation expression once (in the SELECT column list), and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is because ORDER BY is the last thing that runs in a SQL Query, here is the order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so once you have the columns from the SELECT clause you can use ordinal positioning
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, it won't run
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There is a lot of elementary inconsistencies in SQL, and use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be a similar queries (the difference between the two can be made infinitesimally small, after all), which are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
use aliasses :
SELECT DATEPART(YEAR,LastSeenOn) as 'seen_year', COUNT(*) as 'count'
FROM Employee AS e
GROUP BY 'seen_year'
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year
databases that don't support this basically are choosing not to. understand the order of the processing of the various steps, but it is very easy (as many databases have shown) to parse the sql, understand it, and apply the translation for you. Where its really a pain is when a column is a long case statement. having to repeat that in the group by clause is super annoying. yes, you can do the nested query work around as someone demonstrated above, but at this point it is just lack of care about your users to not support group by column numbers.

SQL Server : Row Number without ordering

I want to create a Select statement that ranks the column as is without ordering.
Currently, the table is in the following order:
ITEM_Description1
ITEM_Description2
ITEM_StockingType
ITEM_RevisionNumber
I do not want the results to be numerical in any way, nor depend on the VariableID numbers, but with ROW_Number(), I have to choose something. Does anyone know how I can have the results look like this?
Row| VariableName
---------------------
1 | ITEM_Description1
2 | ITEM_Description2
3 | ITEM_StockingType
4 | ITEM_RevisionNumber
My code for an example is shown below.
SELECT
VariableName,
ROW_NUMBER() OVER (ORDER BY VariableID) AS RowNumber
FROM
SeanVault.dbo.TempVarIDs
Using ORDER BY (SELECT NULL) will give you the results your looking for.
SELECT
VariableName,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM
SeanVault.dbo.TempVarIDs
Your problem seems to be with this sentence:
Currently, the table is in the following order:
No, your table is NOT implicitly ordered!!
Although it might look like this...
The only way to enforce the resultset's sort order is an ORDER BY-clause at the outer most SELECT.
If you want to maintain the sort order of your inserts you can use
a column like ID INT IDENTITY (which will automatically increase a sequence counter)
Using GETDATE() on insert will not solve this, as multiple row inserts might get the same DateTime value.
You do not have to show this in your output of course...
Your table has no inherent order. Even if you get that order a 100 times in a row is no guarantee it will be that order on the 101 time.
You can add an identity column to the table.

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there is two separate records for 105. I only want the returned data set to return the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experience, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example: (I can't read your first column name, so I'm calling it JobUnitK
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use RANK function. Rank the rows OVER PARTITION BY UnitId and pick the rows with rank 2 .
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID, ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
ORDER BY UnitID Desc
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case using Min(JobUnitKeyID) in conjunction with UnitID to join back on the UnitID where the JobUnitKeyID <> MinJobUnitKeyID` is required.
Except, using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.

Postgresql inner select with distinct

I'm using Postgresql 9.2 and have a simple students table as follow
id | proj_id | mark | name | test_date
I have 2 queries which is described below
select * from (select distinct on (proj_id) proj_id , mark, name,
test_date from students )
t
where t.mark <= 1000
VS
select distinct on (proj_id) proj_id , mark, name, test_date from
students where mark <= 1000
when I run each query for more than 10000 records each query returns different result especially result count although for less than 3000 records the result would be the same.
is this postgresql 9.2 bug or I'm missing something ?
Your queries are producing two different sets of results because they are applying the logic differently.
The first query is getting a distinct set of results, and then applying the 'mark' filter.
The second query is applying the 'mark' filter, and then getting a distinct set of results.
As you don't have any ordering applied the first query could potential return a different number of rows each time it is run - as the mark field could contain any of the values that relate to the proj_id.

Resources