Select a large volume of data with like SQL server

I have a table with an ID column.
The ID values look like this: IDxxxxyyy
where each x is a digit from 0 to 9.
I have to select the rows whose ID matches ID0xxx% through ID3xxx%; that is around 4000 ID prefixes with a % wildcard, from ID0000% to ID3999%.
It is like combining LIKE with IN:
Select * from TABLE where ID in (ID0000%,ID0001%,...,ID3999%)
I cannot figure out how to select with this condition.
If you have any idea, please help.
Thank you so much!

You can use pattern matching with LIKE, e.g.
WHERE ID LIKE 'ID[0-3][0-9][0-9][0-9]%'
This will match a string that:
Starts with ID (ID)
Then has a third character that is a number between 0 and 3 [0-3]
Then has 3 further numbers ([0-9][0-9][0-9])
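The three rules above can be checked outside the database with a regular expression. This is only a sketch in Python, not T-SQL: the sample IDs are made up, and the regex is case-sensitive, whereas LIKE follows the column's collation.

```python
import re

# Regex analogue of LIKE 'ID[0-3][0-9][0-9][0-9]%'.
# re.match anchors at the start only, which plays the role of the trailing %.
ID_PREFIX = re.compile(r"ID[0-3][0-9]{3}")

def matches_like(value: str) -> bool:
    """True when value starts with ID, a digit 0-3, then three more digits."""
    return ID_PREFIX.match(value) is not None

# Hypothetical sample values, not taken from the question.
sample_ids = ["ID0000abc", "ID3999xyz", "ID4000abc", "ID12", "XD1234abc"]
selected = [v for v in sample_ids if matches_like(v)]
print(selected)
```

Only the first two sample values pass: ID4000abc fails on the third character, ID12 is too short, and XD1234abc does not start with ID.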
This is not likely to perform well at all. If it is not too late to alter your table design, I would split the identifier into its components, store them separately, and use a computed column for the full ID, e.g.
CREATE TABLE T
(
NumericID INT NOT NULL,
YYY CHAR(3) NOT NULL, -- Or whatever type makes up yyy in your ID
FullID AS CONCAT('ID', FORMAT(NumericID, '0000'), YYY),
CONSTRAINT PK_T__NumericID_YYY PRIMARY KEY (NumericID, YYY)
);
Then your query is as simple as:
SELECT FullID
FROM T
WHERE NumericID >= 0
AND NumericID < 4000;
This is significantly easier to read and write, and will be significantly faster too.

This should do it: it will get all the IDs that start with IDx, with x going from 0 to 3.
Select * from TABLE where ID LIKE 'ID[0-3]%'

You can try:
Select * from TABLE where id like 'ID[0-3][0-9]%[a-zA-Z]';

Related

Identify if a column is Virtual in Snowflake

Snowflake does not document its Virtual Column capability that uses the AS clause. I am doing a migration and need to filter out virtual columns programmatically.
Is there any way to identify that a column is virtual? The INFORMATION_SCHEMA.COLUMNS view shows no difference between a virtual and a non-virtual column definition.

There is a difference between a column defined with DEFAULT and a virtual column (also known as a computed or generated column):
Virtual column
CREATE OR REPLACE TABLE T1(i INT, calc INT AS (i*i));
INSERT INTO T1(i) VALUES (2),(3),(4);
SELECT * FROM T1;
When using the AS (expression) syntax, the expression is not visible in COLUMN_DEFAULT:
DEFAULT Expression
In the case of the definition DEFAULT (expression):
CREATE OR REPLACE TABLE T2(i INT, calc INT DEFAULT (i*i));
INSERT INTO T2(i) VALUES (2),(3),(4);
SELECT * FROM T2;
It is visible in COLUMN_DEFAULT:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'T2';
Comparing side-by-side with SHOW COLUMNS:
SHOW COLUMNS LIKE 'CALC';
-- kind: VIRTUAL_COLUMN
One notable difference between them is that a virtual column cannot be updated:
UPDATE T1
SET calc = 1;
-- Virtual column 'CALC' is invalid target.
UPDATE T2
SET calc = 1;
-- success

How about using SHOW COLUMNS? You can identify virtual columns by the expression field not being null.
create table foo (id bigint, derived bigint as (id * 10));
insert into foo (id) values (1), (2), (3);
SHOW COLUMNS IN TABLE foo;
SELECT "table_name", "column_name", "expression" FROM table(result_scan(last_query_id()));
| table_name | column_name | expression |
| ---------- | ----------- | -------------- |
| FOO | ID | null |
| FOO | DERIVED | ID*10 |

I normally use the desc table option.
First, let's create the table with some example data:
create or replace temporary table ColumnTypesTest (
id int identity(1,1) primary key,
userName varchar(30),
insert_DT datetime default CAST(CONVERT_TIMEZONE('UTC', CAST(CURRENT_TIMESTAMP() AS TIMESTAMP_TZ(9))) AS TIMESTAMP_NTZ(9)) not null,
nextDayAfterInsert datetime as dateadd(dd,1,insert_DT)
);
insert into ColumnTypesTest (userName) values
('John'),
('Cris'),
('Anne');
select * from ColumnTypesTest;
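| ID | USERNAME | INSERT_DT | NEXTDAYAFTERINSERT |
| -- | -------- | ----------------------- | ----------------------- |
| 1 | John | 2021-10-04 19:11:21.069 | 2021-10-05 19:11:21.069 |
| 2 | Cris | 2021-10-04 19:11:21.069 | 2021-10-05 19:11:21.069 |
| 3 | Anne | 2021-10-04 19:11:21.069 | 2021-10-05 19:11:21.069 |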
Now the answer to your question
Using 'desc table <table_name>;' you get a column named kind, which tells you whether the column is virtual. Separately, the default column is NULL when the column has no default value.
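| name | type | kind | null? | default | primary key | unique key | check | expression | comment | policy name |
| ---- | ---- | ---- | ----- | ------- | ----------- | ---------- | ----- | ---------- | ------- | ----------- |
| ID | NUMBER(38,0) | COLUMN | N | IDENTITY START 1 INCREMENT 1 | Y | N | | | | |
| USERNAME | VARCHAR(30) | COLUMN | Y | | N | N | | | | |
| INSERT_DT | TIMESTAMP_NTZ(9) | COLUMN | N | CAST(CONVERT_TIMEZONE('UTC', CAST(CURRENT_TIMESTAMP() AS TIMESTAMP_TZ(9))) AS TIMESTAMP_NTZ(9)) | N | N | | | | |
| NEXTDAYAFTERINSERT | TIMESTAMP_NTZ(9) | VIRTUAL | Y | | N | N | | DATE_ADDDAYSTOTIMESTAMP(1, INSERT_DT) | | |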
With 'desc table <table_name>' you get the metadata of the table, including a column named kind, which says VIRTUAL or COLUMN. If it is VIRTUAL, the expression column shows how that column is calculated.
I use this in stored procedures: a while loop goes through the resultSet and pushes each row into a JavaScript array of arrays, and the next query in the stored procedure is then built dynamically from that array. You can use JavaScript's filter to keep just the virtual columns. This is part of the advantage of mixing JavaScript and SQL in Snowflake stored procedures.
Here is the documentation, which doesn't say much.
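The filtering step can be sketched as follows. This is Python rather than the JavaScript of a Snowflake stored procedure, and the rows are a hand-copied subset of the desc table output above (name, type, kind, expression), not something read from a live connection.

```python
# Each inner list mimics one row of DESC TABLE output, reduced to
# (name, type, kind, expression). The values are copied from the example table.
desc_rows = [
    ["ID", "NUMBER(38,0)", "COLUMN", None],
    ["USERNAME", "VARCHAR(30)", "COLUMN", None],
    ["INSERT_DT", "TIMESTAMP_NTZ(9)", "COLUMN", None],
    ["NEXTDAYAFTERINSERT", "TIMESTAMP_NTZ(9)", "VIRTUAL",
     "DATE_ADDDAYSTOTIMESTAMP(1, INSERT_DT)"],
]

NAME, KIND = 0, 2  # positions of the name and kind fields in each row

# Keep only the virtual columns, as a JavaScript Array.filter call would.
virtual_rows = [row for row in desc_rows if row[KIND] == "VIRTUAL"]

# For a migration, the columns to copy are the ones that are not virtual.
physical_names = [row[NAME] for row in desc_rows if row[KIND] != "VIRTUAL"]

print([row[NAME] for row in virtual_rows])
print(physical_names)
```

The second list is what a dynamically built INSERT or SELECT column list would normally be generated from.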

SQL Server check constraint logic

I've got a table with this structure:
CREATE TABLE #Mine
(
ProductID INT
, CountryID INT
, ApplicationID INT
);
Let's assume it has data as follows:
ProductID CountryID ApplicationID
1 2 -1
1 3 -1
1 3 2
I'd like to enforce the rule that if a ProductID/CountryID combination exists with ApplicationID = -1, no other row with that combination may exist in the entire table. In my example, the 2nd and 3rd rows together would violate this.
I could create a custom function to validate that and make a CHECK constraint out of it. Is there perhaps a more elegant way to do it?

I would split your task. First, add a unique constraint (this can be the table key):
CREATE UNIQUE INDEX IX_UQ ON Mine(ProductId, CountryId, ApplicationId)
This handles the trivial validation and speeds up the trigger query.
Second, your check involves multiple rows, so a plain CHECK constraint is not possible. This is a task for a trigger:
CREATE TRIGGER trMine
ON Mine FOR INSERT,UPDATE
AS
-- Mark != 0 means the group holds an ApplicationId = -1 row plus at least one other row
IF (EXISTS(
SELECT Mark FROM
(
SELECT MAX(CASE WHEN M.ApplicationId=-1 THEN 1 ELSE 0 END)*(COUNT(*)-1) Mark
FROM Mine M
JOIN inserted I ON M.ProductId=I.ProductId AND M.CountryId=I.CountryId
GROUP BY M.ProductId,M.CountryId
) Q
WHERE Mark != 0
)) THROW 50000, 'Validation error', 1;
When there are 2 or more records (COUNT(*)-1>0) and there is any record with ApplicationId=-1, Mark evaluates to something != 0. This is your violation rule.
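To see the Mark rule on the sample rows from the question, here is a small Python sketch of the same arithmetic. It is an illustration only: it checks every group rather than just the groups touched by inserted, and the function name violating_pairs is made up.

```python
from collections import defaultdict

def violating_pairs(rows):
    """Return the (ProductID, CountryID) pairs whose Mark is non-zero."""
    groups = defaultdict(list)
    for product_id, country_id, application_id in rows:
        groups[(product_id, country_id)].append(application_id)

    bad = []
    for pair, app_ids in groups.items():
        has_minus_one = 1 if -1 in app_ids else 0   # MAX(CASE WHEN ... THEN 1 ELSE 0 END)
        mark = has_minus_one * (len(app_ids) - 1)   # ... * (COUNT(*) - 1)
        if mark != 0:
            bad.append(pair)
    return bad

# The three sample rows from the question.
rows = [(1, 2, -1), (1, 3, -1), (1, 3, 2)]
print(violating_pairs(rows))
```

The pair (1, 2) has a single row, so COUNT(*)-1 is 0 and it passes; the pair (1, 3) has a -1 row and one other row, so its Mark is 1 and it is reported.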

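You can just use a unique filtered index. Note that this only prevents duplicate rows with ApplicationID = -1 for a ProductID/CountryID pair; it does not reject rows with other ApplicationID values for the same pair: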
CREATE UNIQUE INDEX IX_UniqueNegativeApp ON Mine(ProductID, CountryID) WHERE ApplicationID = -1

Any reason why I shouldn't use "between X and Y" on a varchar field in SQL to return a number?

I've got an indexed (but not unique) varchar field of Employee IDs in a table, and in a query I need to return rows that are exactly 4 numerical characters but also over 1000.
I've found various questions on here about using validation methods to check that the field contains 0-9 characters, or doesn't contain a-z characters etc, but these are unrelated to this question.
Background:
I've got a table with various values, sample set as follows:
EmployeeID
----------
6745
EMP1
EMP2
1874
LTST
5694
0014
What I would like to do is return all values except EMP1, EMP2, LTST and 0014.
My question is: are there any reasons why I shouldn't use a WHERE clause like where EmployeeID between '1000' and '9999'? I ask because EmployeeID is a varchar column.
If I can do this, should I also ORDER BY EmployeeID, or does this not matter?

I believe "0014" would be left out of the where clause between '1000' and '9999', so that's a reason. Perhaps between '0000' and '9999' would suit your purposes better. Just remember that you're still sorting based on text. If you have any entries like "1_99", this would also show up in your query results with your given between clause.
If you're looking to only return 4-character numbers excluding leading zeroes, then the following addition should suffice:
WHERE EmployeeID BETWEEN '1000' AND '9999' AND TRY_CAST(EmployeeID As int) IS NOT NULL
...or, more intuitively:
WHERE TRY_CAST(EmployeeID As int) BETWEEN 1000 AND 9999

Run the following code as an example and you'll see that SQL Server doesn't treat INT the same as integers stored as VARCHAR:
WITH IntsAsVars
AS (
SELECT var = '1000',
int = 1000
UNION ALL
SELECT var = '100',
int = 100
UNION ALL
SELECT var = '999',
int = 999
UNION ALL
SELECT var = '99',
int = 99
UNION ALL
SELECT var = '750',
int = 750
UNION ALL
SELECT var = '10',
int = 10
UNION ALL
SELECT var = '2',
int = 2
)
SELECT *
FROM IntsAsVars
--WHERE var BETWEEN '2' AND '750'
/* should return 2, 10, 99, 100 & 750 if it works like INT
but does it? */
ORDER BY
--var ASC,
int ASC;
Run it with the WHERE clause uncommented and you will see that SQL Server does not consider the other records to be between '2' and '750' when they are stored as varchar.
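The same effect can be reproduced with plain string comparison. The sketch below is Python, not T-SQL; Python compares strings by code point, which agrees with SQL Server for digit-only strings like these but can differ from a real collation once other characters are involved.

```python
# The integer values from the query above, stored as strings.
values = ["1000", "100", "999", "99", "750", "10", "2"]

# Text comparison: what BETWEEN '2' AND '750' does on a varchar column.
as_text = sorted(v for v in values if "2" <= v <= "750")

# Numeric comparison: what BETWEEN 2 AND 750 does on an int column.
as_int = sorted(int(v) for v in values if 2 <= int(v) <= 750)

print(as_text)
print(as_int)
```

As text, only '2' and '750' fall inside the range: everything starting with '1' sorts before '2', and everything starting with '9' sorts after '750'. As integers, the range contains 2, 10, 99, 100 and 750.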

If your real data is exactly like the sample data, with every non-numeric value beginning with a letter, you could use your query to achieve the desired result.
However, be aware of the sort order of the data. If you have an EmployeeId of 1ABC, it will be included in the data returned by WHERE EmployeeID BETWEEN '1000' AND '9999'!
Your approach is not suitable for filtering out non-numeric values!
An additional ORDER BY affects the order of the results only, it has no effect on the evaluation of the WHERE condition.

I'd say the simplest way is to use like:
select * from yourtable
where EmployeeID like '[1-9][0-9][0-9][0-9]'

Let's say you have this input:
IF OBJECT_ID('tempdb..#test') IS NOT NULL
DROP TABLE #test
CREATE TABLE #test
(
EmployeeID VARCHAR(255)
)
CREATE CLUSTERED INDEX CIX_test_EmployeeID ON #test(EmployeeID)
INSERT INTO #test
VALUES
('6745'),
('EMP1'),
('EMP2'),
('1874'),
('LTST'),
('5694'),
('1000'),
('9999'),
('10L'),
('187'),
('9X9'),
('7est'),
('1ok'),
('0_o'),
('0014');
Your statement would also return '1ok','187', '10L' and so on.
Since you mentioned that your employeeID has a fixed length, you could use something like this:
SELECT *
FROM #test
WHERE EmployeeID LIKE '[1-9][0-9][0-9][0-9]'

conditional "next value for sequence"

Scenario:
A SQL Server 2012 table named "Test" has two fields, "CounterNo" and "Value", both integers.
There are 4 sequence objects defined, named sq1, sq2, sq3, sq4.
I want to do the following on inserts:
if CounterNo = 1 then Value = next value for sq1
if CounterNo = 2 then Value = next value for sq2
if CounterNo = 3 then Value = next value for sq3
I thought of creating a custom function and assigning it as the default value of the Value field, but when I tried, custom functions do not support "next value for" on sequence objects.
Another way is to use a trigger. That table already has a trigger.
Using a stored procedure for inserts would be the best way, but Entity Framework 5 Code First does not support it.
Can you suggest a way to achieve this?
(If you can show me how to do it with custom functions, please post that too; it is another question of mine.)
Update:
In reality there are 23 fields in that table, with primary keys set, and I currently generate this counter value on the software side using a "counter table". It is not good to generate counter values on the client side.
I'm using 4 sequence objects as counters because they represent different types of records.
If I use all 4 counters on the same record at the same time, all of them generate their next values. I want only the related counter to generate its next value while the others remain the same.

I'm not sure if I fully understand your use case, but maybe the following sample illustrates what you need.
Create Table Vouchers (
Id uniqueidentifier Not Null Default NewId()
, Discriminator varchar(100) Not Null
, VoucherNumber int Null
-- ...
, MoreData nvarchar(100) Null
);
go
Create Sequence InvoiceSequence AS int Start With 1 Increment By 1;
Create Sequence OrderSequence AS int Start With 1 Increment By 1;
go
Create Trigger TR_Voucher_Insert_VoucherNumer On Vouchers After Insert As
If Exists (Select 1 From inserted Where Discriminator = 'Invoice')
Update v
Set VoucherNumber = Next Value For InvoiceSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Invoice';
If Exists (Select 1 From inserted Where Discriminator = 'Order')
Update v
Set VoucherNumber = Next Value For OrderSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Order';
go
Insert Into Vouchers (Discriminator, MoreData)
Values ('Invoice', 'Much')
, ('Invoice', 'More')
, ('Order', 'Data')
, ('Invoice', 'And')
, ('Order', 'Again')
;
go
Select * From Vouchers;
Now Invoice- and Order-Numbers will be incremented independently. And as you can have multiple insert triggers on the same table, that shouldn't be an issue.
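The idea of one independent counter per record type can be modelled in a few lines of Python. This is only a sketch of the behaviour, not of SQL Server sequences: the names sequences and assign_voucher_numbers are made up, and the numbering here follows list order, whereas the trigger gives no guarantee about which row of a multi-row insert receives which number.

```python
from itertools import count

# One independent counter per record type, standing in for the two sequences.
sequences = {"Invoice": count(1), "Order": count(1)}

def assign_voucher_numbers(discriminators):
    """Pair each discriminator with the next value of its own counter only."""
    return [(d, next(sequences[d])) for d in discriminators]

# Same discriminators, in the same order, as the sample insert above.
rows = assign_voucher_numbers(["Invoice", "Invoice", "Order", "Invoice", "Order"])
for discriminator, number in rows:
    print(discriminator, number)
```

Drawing a number for an Invoice never advances the Order counter, which is the property the question asks for.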

I think you're thinking about this in the wrong way. You have 3 values and these values are determined by another column. Switch it around, create 3 columns and remove the Counter column.
If you have a table with value1, value2 and value3 then the Counter value is implied by the column in which the value resides. Create a unique index on these three columns and add an identity column for a primary key and you're sorted; you can do it all in a stored procedure easily.

If you have four different types of records, use four different tables, with a separate identity column in each one.
If you need to see all the data together, then use a view to combine them:
create view v_AllTypes as
select * from type1 union all
select * from type2 union all
select * from type3 union all
select * from type4;
Alternatively, do the calculation of the sequence number on output:
select t.*,
row_number() over (partition by CounterNo order by t.id) as TypeSeqNum
from AllTypes t;
Something seems amiss with your data model if it requires conditional updates to four identity columns.

get max from table where sum required

Suppose I have a table with following data:
gameId difficultyLevel numberOfQuestions
--------------------------------------------
1 1 2
1 2 2
1 3 1
In this example the game is configured for 5 questions, but I'm looking for a SQL statement that will work for n number of questions.
What I need is a SQL statement that, given a question's displayOrder, will return the difficulty level of that question. For example, given a displayOrder of 3 with the table data above, it will return 2.
Can anyone advise how the query should look like?

I'd recommend a game table with a 1:m relationship with a question table.
You shouldn't repeat columns in a table - it violates first normal form.
Something like this:
create table if not exists game
(
game_id bigint not null auto_increment,
name varchar(64),
description varchar(64),
primary key (game_id)
);
create table if not exists question
(
question_id bigint not null auto_increment,
text varchar(64),
difficulty int default 1,
game_id bigint,
primary key (question_id) ,
foreign key (game_id) references game(game_id)
);
select
game.game_id, name, description, question_id, text, difficulty
from game left join question
on game.game_id = question.game_id
order by question_id;

Things might be easier for you if you change your design as duffymo suggests, but if you must do it that way, here's a query that should do the trick.
SELECT MIN(difficultyLevel) AS difficultyLevel
FROM
(
SELECT difficultyLevel, (SELECT SUM(numberOfQuestions) FROM yourTable sub WHERE sub.difficultyLevel <= yt.difficultyLevel) AS questionTotal
FROM yourTable yt
) AS innerSQL
WHERE innerSQL.questionTotal >= #displayOrder
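The query works by building a running total of numberOfQuestions per difficulty level and picking the lowest level whose total reaches the displayOrder. A Python sketch of the same logic, assuming the rows of a single game and a made-up function name difficulty_for:

```python
from itertools import accumulate

def difficulty_for(display_order, levels):
    """levels: (difficultyLevel, numberOfQuestions) pairs for one game.

    Returns the lowest level whose running question total reaches
    display_order, or None when display_order exceeds the total.
    """
    ordered = sorted(levels)
    running_totals = accumulate(n for _, n in ordered)
    for (level, _), total in zip(ordered, running_totals):
        if total >= display_order:
            return level
    return None

# The table data from the question: level 1 has 2 questions, level 2 has 2, level 3 has 1.
levels = [(1, 2), (2, 2), (3, 1)]
print(difficulty_for(3, levels))
```

The running totals are 2, 4 and 5, so a displayOrder of 3 first fits under the total 4, which belongs to difficulty level 2.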
