I have a table filled with some words of different length.
ID | WORD | LENGTH
1 | able | 4
2 | acid | 4
3 | about | 5
.....
A method in C# generates a random number, and I want to get the word of a given length whose ID is the nearest match to that number. Currently I am using this query:
select top 1 word from vocabulary where length = 4 and id <= 3;
The problem is that this always returns the first 4-letter word in the table, which is not what I need.
I can't use this either:
select top 1 word from vocabulary where length = 4 and id >= 3;
because when the random number is close to the last id in the table, there may be no further word of the requested length. For example,
select top 1 word from vocabulary where length = 4 and id >= 2;
could fail to find a match.
Is there a way to select the one value that is the nearest match in the requested direction (<= or >=)?
Thanks.
declare @vocabulary table (ID int, Word varchar(max), LENGTH int)
insert into @vocabulary(ID,Word,LENGTH)values(1,'able',4),(2,'acid',4),(3,'about',5)
declare @random int = rand() * 10
select @random
select top 1 word from @vocabulary where LENGTH = 4 order by ABS(ID - @random)
The result is the word whose ID is nearest to the random number.
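The same ordering-by-distance idea can be sketched outside the database. The Python snippet below is only an illustration: the `nearest_word` helper and the in-memory `vocabulary` list are hypothetical stand-ins for the table, and breaking ties by the lower ID is an assumption (the SQL above leaves ties unspecified).

```python
def nearest_word(vocabulary, length, target_id):
    """Return the word of the given length whose ID is closest to target_id.

    vocabulary is a list of (id, word) tuples (hypothetical stand-in for the
    table). Returns None when no word has the requested length.
    """
    candidates = [(word_id, word) for word_id, word in vocabulary
                  if len(word) == length]
    if not candidates:
        return None
    # Distance first, then ID, so that ties resolve deterministically
    # (assumption: prefer the lower ID on a tie).
    _, word = min(candidates, key=lambda row: (abs(row[0] - target_id), row[0]))
    return word


vocabulary = [(1, "able"), (2, "acid"), (3, "about")]
print(nearest_word(vocabulary, 4, 3))  # acid
```

Because the distance is absolute, the search looks in both directions, so it still finds a match when the random number falls near either end of the ID range.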
Not sure if it's crucial to retrieve the row based on an externally generated random number, but if you just want a random word of given length, you could do something like this...
Setup:
DROP TABLE IF EXISTS DICTIONARY;
CREATE TABLE DICTIONARY (
ID int,
WORD nvarchar(255),
LENGTH AS LEN(WORD),
CONSTRAINT DICTIONARY_PK PRIMARY KEY (ID),
);
CREATE INDEX DICTIONARY_I1 ON DICTIONARY (LENGTH) INCLUDE (WORD);
INSERT INTO DICTIONARY (ID, WORD) VALUES
(1, 'able'),
(2, 'acid'),
(3, 'about'),
(4, 'boss'),
(5, 'brain'),
(6, 'child'),
(7, 'computer'),
(8, 'hint'),
(9, 'human'),
(10, 'ichthyosaur'),
(11, 'mother'),
(12, 'otorhinolaryngologist');
Query for getting a random row of given length (4 in this example):
DECLARE @length int = 4;
SELECT TOP 1 * FROM DICTIONARY WHERE LENGTH = @length ORDER BY NEWID();
The query plan is nice, which may be important for a large table and/or frequent querying.
I have a query but it is not working correctly,
Select *
from "Firms"
Where "Properties" IN ('{1,2}')
That's my Postgres query; the "Properties" column is an int array.
Only rows containing both of these values are returned, but I want to fetch rows containing any of the values and, if possible, order them by the number of matching values.
Test case:
create table array_any(id integer, array_fld int[]);
insert into array_any values (1, ARRAY[1,2]), (2, ARRAY[2,3]), (3, ARRAY[3,4]);
select id, count(*) from array_any,
lateral unnest(array_fld) as s where s = ANY(ARRAY[1,2]) group by id order by id;
id | count
----+-------
1 | 2
2 | 1
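The test case above orders by id, while the question also asks for ordering by the number of matches. As a rough sketch of that ranking logic (in Python rather than SQL, with a hypothetical `rank_by_matches` helper and the same three sample rows), one could count the overlap per row and sort by it:

```python
def rank_by_matches(rows, wanted):
    """Return (id, match_count) pairs for rows sharing any value with wanted.

    rows is a list of (id, list_of_ints); rows with no match are dropped.
    The result is ordered by match count descending, then by id.
    """
    wanted_set = set(wanted)
    counts = []
    for row_id, values in rows:
        matches = sum(1 for value in values if value in wanted_set)
        if matches:
            counts.append((row_id, matches))
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts


rows = [(1, [1, 2]), (2, [2, 3]), (3, [3, 4])]
print(rank_by_matches(rows, [1, 2]))  # [(1, 2), (2, 1)]
```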
I need a query or function to count the 0's between 1's in a string.
For example:
String1 = '10101101' -> Result=3
String2 = '11111001101' -> Result=1
String3 = '01111111111' -> Result=1
I only need to search for the 101 pattern, or the 01 pattern if it is at the beginning of the string.
You may try to decompose the input strings using SUBSTRING() and a number table:
SELECT
String, COUNT(*) AS [101Count]
FROM (
SELECT
v.String,
SUBSTRING(v.String, t.No - 1, 1) AS PreviousChar,
SUBSTRING(v.String, t.No, 1) AS CurrentChar,
SUBSTRING(v.String, t.No + 1, 1) AS NextChar
FROM (VALUES
('10101101'),
('11111001101'),
('01111111111')
) v (String)
CROSS APPLY (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t (No)
) cte
WHERE
CASE WHEN PreviousChar = '' THEN '1' ELSE PreviousChar END = '1' AND
CurrentChar = '0' AND
NextChar = '1'
GROUP BY String
Result:
String 101Count
10101101 3
11111001101 1
01111111111 1
Notes:
The table with alias v is the source table, the table with alias t is the number table. If the input strings have more than 10 characters, use an appropriate number (tally) table.
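The neighbour check performed by the WHERE clause above can be sketched in a few lines of Python. The `count_isolated_zeros` name is hypothetical; as in the query, the position before the first character is treated as a '1' so that a leading "01" counts.

```python
def count_isolated_zeros(s):
    """Count zeros that have a '1' on both sides.

    The start of the string is treated as if it were preceded by a '1',
    mirroring the CASE expression in the query above.
    """
    count = 0
    for i, ch in enumerate(s):
        if ch != "0":
            continue
        previous = s[i - 1] if i > 0 else "1"
        following = s[i + 1] if i + 1 < len(s) else ""
        if previous == "1" and following == "1":
            count += 1
    return count


for sample in ("10101101", "11111001101", "01111111111"):
    print(sample, count_isolated_zeros(sample))
```

Unlike the fixed ten-row number table, the loop covers every character, so strings longer than ten characters need no special handling.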
-- This converts "111101101010111" to "01101010" and "011101000" to "01110"
regexp_replace(field, '^1*(.*)1*0*$', '\1')
-- This converts "01101010" to "0000"
regexp_replace(field, '1', '')
-- This counts the string length, returning 4 for '0000':
LENGTH(field)
-- Put all together:
LENGTH(
regexp_replace(
regexp_replace(field, '^1*(.*)1*0*$', '\1')
, '1', '')
)
Different or more complicated cases require a modification of the regular expression.
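For anyone who wants to try the strip-then-count idea outside the database, here is a small Python sketch. It is an adaptation, not the SQL above: in Python's `re` engine a greedy `(.*)` would also swallow the trailing run, so the sketch uses a lazy group `(.*?)`, and the helper names are hypothetical. Whether a given SQL dialect needs the same adjustment is not something the answer states.

```python
import re


def strip_outer(field):
    """Drop leading 1s and the trailing run of 1s followed by 0s."""
    # Lazy group: capture as little as possible so that the trailing
    # "1*0*" part can absorb the tail of the string.
    match = re.fullmatch(r"1*(.*?)1*0*", field)
    return match.group(1) if match else field


def count_inner_zeros(field):
    """Count the zeros left after stripping the outer runs."""
    return len(strip_outer(field).replace("1", ""))


print(strip_outer("111101101010111"))        # 01101010
print(strip_outer("011101000"))              # 01110
print(count_inner_zeros("111101101010111"))  # 4
```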
Update
For "zeros between 1s" I see now that you mean "101" sequences. This is more complicated because of the possibility of overlapping sequences such as "10101". Suppose you want to count this as 2:
1. Replace 101 with 11011. Now 10101 will become either 1101101 or 1101111011. In either case, the "101" sequences are well apart and there are still only two of them.
2. Replace all 101s with 'X'. You now have 1X11X1.
3. Replace [01] with the empty string. You now have XX.
4. Use LENGTH to count the X's.
Any extra special sequence such as "01" at the beginning can be converted first to "X1" (and "10" at the end to "1X"), which then folds neatly into the workflow above.
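The replacement steps above can be sketched in Python to check them against the sample strings. The `count_101` name is hypothetical, and instead of converting a leading "01" to "X1" the sketch prepends a '1' to the input, which is a simplification that has the same effect for the leading case.

```python
def count_101(s):
    """Count "101" occurrences, including overlapping ones such as "10101".

    A '1' is prepended so that a leading "01" is counted as well
    (assumption: this is the only special case that needs handling).
    """
    padded = "1" + s
    # Step 1: pull overlapping sequences apart.
    spaced = padded.replace("101", "11011")
    # Step 2: mark every remaining "101".
    marked = spaced.replace("101", "X")
    # Steps 3 and 4: ignore the 0s and 1s and count the markers.
    return marked.count("X")


for sample in ("10101101", "11111001101", "01111111111"):
    print(sample, count_101(sample))
```

After the first replacement no two "101" sequences share a character, which is why a plain non-overlapping replace is enough in the second step.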
By using the LIKE operator with % you can decide how to match a specific string. This SQL query returns every record that starts with 101 or 01:
SELECT ColumnsYouWant FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
You can simply COUNT the ColumnYouWant, like this:
SELECT COUNT(ColumnYouWant) FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
Or you can use a method of your backend language to count the rows that the first query returns. The counting method will depend on the language you are working with.
SQL Documentation for LIKE: https://www.w3schools.com/sql/sql_like.asp
SQL Documentation for COUNT: https://www.w3schools.com/sql/sql_count_avg_sum.asp
The other solutions do not account for all of the characters (up to 11 in the examples shown).
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '10101101'),
(2, '11111001101'),
(3, '01111111111')) V(id, string);
Query
;with
split_cte as (
select id, n, substring(t.string, v.n, 1) subchar
from #tTEST t
cross apply (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)
where v.n<=len(t.string)),
lead_lag_cte as (
select id, n, lead(subchar, 1, 9) over (partition by id order by n) lead_c, subchar,
lag(subchar, 1, 9) over (partition by id order by n) lag_c
from split_cte)
select id, sum(case when (lead_c=1 and lag_c=9) then 1 else
case when (lead_c=1 and lag_c=1) then 1 else 0 end end) zero_count
from lead_lag_cte
where subchar=0
group by id;
Results
id zero_count
1 3
2 1
3 1
Another way, perhaps quicker:
DECLARE @T TABLE (ID INT, STRING VARCHAR(32));
INSERT INTO @T
VALUES (1, '10101101'),
(2, '11111001101'),
(3, '01111111111');
SELECT *, LEN(STRING) - LEN(REPLACE(STRING, '0', '')) AS NUMBER_OF_ZERO
FROM @T
Result:
ID STRING NUMBER_OF_ZERO
----------- -------------------------------- --------------
1 10101101 3
2 11111001101 3
3 01111111111 1
select (len(replace('1'+x, '101', '11011')) - len(replace(replace('1'+x, '101', '11011'), '101', '')))/3
from
(
values
('10101101'),
('11111001101'),
('01111111111'),
('01010101010101010101')
) v(x);
I have a column notes with a length of more than 80,0000 characters.
As per the transformation rule, I have to write a SQL script that splits the notes column as follows:
First 300 characters in Column_A
Next 300 characters in Column_B
Next 300 characters in Column_C
and so on.
So I am looking for an output as below:
for every client ID, continuing to the end of the notes column.
Ouch! That's quite a complex requirement. You will need to combine a number of skills to solve this one.
Firstly you need to create additional rows. One way to achieve this is via recursion. In the example below I've calculated how many rows are required for each Client Id. I've then used recursion to create them.
You also need to break each row into three 300-character blocks. In my example I've used three 3-character blocks instead, so it's easier to read, but the principle will scale up. Using SUBSTRING and the record number you can calculate the starting point for each column.
I've created some sample records in a CTE called Raw. This allows anyone to follow the example, which is up on Stack Data Exchange (link below).
Example
DECLARE @ColumnWidth INT = 3; -- Use to adjust required length of columns A, B and C.
DECLARE @ColumnCount INT = 3; -- Use to adjust number of output columns.
WITH [Raw] AS
(
/* This CTE creates sample records for us to experiment with.
* The note column contains each letter of the alphabet, repeated
* 3 times. The repetition will help us validate the result set.
*
* Using ceiling, to round up, the field length (@ColumnWidth) and
* the number of fields (@ColumnCount) and the number of characters (LEN)
* we can calculate how many rows are required.
*/
SELECT
r.ClientId,
r.Note,
CEILING(CAST(LEN(r.Note) AS DECIMAL(18, 8)) / (@ColumnWidth * @ColumnCount)) AS RecordsRequired
FROM
(
VALUES
(1, 'aaabbbcccdddeeefffggghhhiiijjjkkklllmmmnnnooopppqqqrrrssstttuuuvvvwwwxxxyyyzz'),
(2, 'aaabbbcccdddeeefffggghhhiiijjjkkklll'),
(3, 'aaabbbcccdddeeefffggghhhiiijjjkkklllmmmnnno'),
(4, 'aaabbbcccdddeeefffggghhhiiijjjkkklllmmmnnnoooppp'),
(5, 'aaabbbcccdddeeefffggghhhiiijjj'),
(6, 'aaabbbcccdd')
) AS r(ClientId, Note)
),
MultiRow AS
(
/* This CTE uses recursion to return multiple rows for
* each original row.
* The number returned matches the RecordsRequired value
* from the Raw CTE.
*/
SELECT
1 AS RecordNumber,
RecordsRequired,
ClientId,
Note
FROM
[Raw]
UNION ALL
-- Keep repeating each record until the number of required rows has been returned.
SELECT
RecordNumber + 1 AS RecordNumber,
RecordsRequired,
ClientId,
Note
FROM
MultiRow
WHERE
RecordNumber < RecordsRequired
)
/* Each record returned by the MultiRow CTE is numbered: 1, 2, 3 etc.
* Using this we can extract blocks of text from the original Note column.
*/
SELECT
ClientId,
SUBSTRING(Note, ((@ColumnWidth * @ColumnCount) * RecordNumber) - ((@ColumnWidth * 3) -1), @ColumnWidth) AS Column_A,
SUBSTRING(Note, ((@ColumnWidth * @ColumnCount) * RecordNumber) - ((@ColumnWidth * 2) -1), @ColumnWidth) AS Column_B,
SUBSTRING(Note, ((@ColumnWidth * @ColumnCount) * RecordNumber) - ((@ColumnWidth * 1) -1), @ColumnWidth) AS Column_C
FROM
MultiRow
ORDER BY
ClientId, RecordNumber
;
Here is how you can do this:
DECLARE @c TABLE(ID INT, Notes VARCHAR(26))
INSERT INTO @c VALUES
(1, 'abcdefghijklmnopqrstuvwxyz'),
(2, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
DECLARE @size INT = 26
DECLARE @chunk INT = 5
;WITH tally AS(SELECT 1 s1, @chunk + 1 s2, 2*@chunk + 1 s3
UNION ALL
SELECT s3 + @chunk, s3 + 2*@chunk, s3 + 3*@chunk FROM tally
WHERE s3 < @size)
SELECT c.ID,
SUBSTRING(Notes, t.s1, @chunk) A,
SUBSTRING(Notes, t.s2, @chunk) B,
SUBSTRING(Notes, t.s3, @chunk) C
FROM @c c
CROSS JOIN tally t
ORDER BY c.ID, t.s1
Output:
ID A B C
1 abcde fghij klmno
1 pqrst uvwxy z
2 ABCDE FGHIJ KLMNO
2 PQRST UVWXY Z
Description:
The tally CTE returns the starting positions, which you then use in the SUBSTRING function. For the above configuration it returns:
s1 s2 s3
1 6 11
16 21 26
The recursive CTE spreads the starting positions across rows, with three starting positions per row. The rest should be easy to understand.
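The starting-position arithmetic may be easier to follow in a short Python sketch. The `split_note` helper is hypothetical; it uses 0-based offsets where SUBSTRING uses 1-based ones, and it assumes three output columns by default as in the answers above.

```python
def split_note(note, width, columns=3):
    """Split note into rows of `columns` chunks, each `width` characters long.

    Each row covers width * columns characters; the final row may contain
    short or empty chunks when the note length is not an exact multiple.
    """
    row_span = width * columns
    rows = []
    for start in range(0, len(note), row_span):
        rows.append(tuple(
            note[start + i * width:start + (i + 1) * width]
            for i in range(columns)
        ))
    return rows


for row in split_note("abcdefghijklmnopqrstuvwxyz", 5):
    print(row)
# ('abcde', 'fghij', 'klmno')
# ('pqrst', 'uvwxy', 'z')
```

With `width=300` the same loop would produce the Column_A, Column_B and Column_C values described in the question.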
The required result can be obtained using simple looping:
/* Declare a temporary table for storing the results */
DECLARE @Result_TABLE AS TABLE
(
CustomerId BIGINT
,ColA VARCHAR(300)
,ColB VARCHAR(300)
,ColC VARCHAR(300)
)
DECLARE @CustomerCount INT --To store customer count
DECLARE @IteratorForCustomers INT = 1 --To iterate over each customer
/* Get count of customers */
SELECT @CustomerCount = COUNT (1) FROM Customers
DECLARE @CustomerId BIGINT --To store customer id in looping
DECLARE @TempNote VARCHAR(MAX) -- To store customer note of each customer in looping
/* Loop for all customers */
WHILE (@IteratorForCustomers <=@CustomerCount)
BEGIN
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY CustomerID ) AS RowId
,CustomerId
,Customer_Note
FROM Customers
)
SELECT
@CustomerId = a.CustomerId
,@TempNote = a.Customer_Note
FROM CTE a
WHERE
RowId = @IteratorForCustomers
/* Loop for generating each row with three columns */
WHILE (LEN(@TempNote)>0)
BEGIN
INSERT INTO @Result_TABLE
VALUES
(
@CustomerId,SUBSTRING(@TempNote,1,300),SUBSTRING(@TempNote,301,300),SUBSTRING(@TempNote,601,300)
)
SET @TempNote = CASE WHEN LEN(@TempNote)>900 THEN SUBSTRING(@TempNote,901,LEN(@TempNote)-900)
ELSE NULL END
END
SET @IteratorForCustomers = @IteratorForCustomers + 1
END
SELECT * FROM @Result_TABLE
In my SQL Server database, I have a table like this :
counter, value
12345, 10.1
12370, 10.5
12390, 9.7
12405, 10.1
12510, 12.3
Let's assume that I input a value of 5. I need to fill in the data between the first record and second record by increment of 5 in the counter column.
For example, using Record 1 and Record 2, here is the additional data that needs to be inserted into the table:
12345, 10.1 --> Record 1
12350, 10.1
12355, 10.1
12360, 10.1
12365, 10.1
12370, 10.5 --> Record 2
Other than using a database cursor to loop through each record in the table and then select the MIN counter after Record 1, is there any other way that I can achieve it with less I/O overhead ? I just need to insert additional counter between the range based on the input parameter.
Thanks for your input.
If you're wanting to compute a weighted average, there's no need to create these rows. You can just work out how many rows you would have added and use that information to calculate the average. E.g.:
declare @t table (counter int not null, value decimal(19,4) not null)
insert into @t(counter, value) values
(12345, 10.1),
(12370, 10.5),
(12390, 9.7 ),
(12405, 10.1),
(12510, 12.3)
declare @gap int
set @gap = 5
;With Numbered as (
select counter,value,ROW_NUMBER() OVER (ORDER BY counter) as rn
from @t
), Paired as (
select n1.counter,n1.value,
(n2.counter - n1.counter)/@gap as Cnt --What do we do for the last row?
from Numbered n1
left join
Numbered n2
on
n1.rn = n2.rn - 1
)
select SUM(value*COALESCE(Cnt,1))/SUM(COALESCE(Cnt,1)) from Paired
As you can (hopefully) see, I've currently decided that the last row counts as just 1, but anything else could be done there also.
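To make the weighting explicit, here is a rough Python sketch of the same calculation. The `weighted_average` helper is hypothetical; it uses integer division for the step count, as the integer arithmetic in the query does, gives the last row a weight of 1 as described above, and assumes a non-empty input.

```python
def weighted_average(rows, gap):
    """Average the values, weighting each by the number of steps it spans.

    rows is a non-empty list of (counter, value) pairs. The weight of a row
    is the number of `gap`-sized steps up to the next counter; the last row
    has no successor and is given a weight of 1.
    """
    rows = sorted(rows)
    total = 0.0
    weight_sum = 0
    for index, (counter, value) in enumerate(rows):
        if index + 1 < len(rows):
            weight = (rows[index + 1][0] - counter) // gap
        else:
            weight = 1
        total += value * weight
        weight_sum += weight
    return total / weight_sum


readings = [(12345, 10.1), (12370, 10.5), (12390, 9.7), (12405, 10.1), (12510, 12.3)]
print(round(weighted_average(readings, 5), 4))
```

For the sample data the weights come out as 5, 4, 3, 21 and 1, so no filler rows have to be materialised to obtain the average.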
Filling gaps with values is usually a problem best answered using a Numbers table (a table with a single int column containing numbers from 1 to some sufficiently large number):
declare @n1 int = 12345, @n2 int = 12370, @step int = 5
select @n1 + (n * @step)
from numbers
where n < (@n2 - @n1) / @step
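The query above generates the missing counters for one pair of records. A minimal Python sketch of the same idea applied to every adjacent pair might look like this; the `fill_gaps` helper is hypothetical, and carrying the earlier record's value forward follows the example in the question.

```python
def fill_gaps(rows, step):
    """Insert rows every `step` counters between adjacent records.

    rows is a list of (counter, value) pairs. Each inserted row repeats the
    value of the record that precedes it.
    """
    rows = sorted(rows)
    filled = []
    for index, (counter, value) in enumerate(rows):
        filled.append((counter, value))
        if index + 1 == len(rows):
            break
        next_counter = rows[index + 1][0]
        # range() stops before next_counter, so the next record is not duplicated.
        for extra in range(counter + step, next_counter, step):
            filled.append((extra, value))
    return filled


for row in fill_gaps([(12345, 10.1), (12370, 10.5)], 5):
    print(row)
```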
A recursive CTE should work as well:
;WITH
Initial AS (SELECT COUNTER,value FROM yourtable),
maxvalue AS (SELECT MAX(COUNTER) Mvalue FROM Initial),
recur AS (
SELECT COUNTER, value FROM yourtable
UNION ALL
SELECT counter+5,value FROM recur r WHERE COUNTER+5< (SELECT Mvalue FROM maxvalue)
AND NOT EXISTS (SELECT 1 FROM Initial o WHERE o.COUNTER=r.COUNTER+5)
)
SELECT * FROM recur ORDER BY COUNTER
Just replace 'yourtable' with the name of your table.
I am working in SQL Server 2008 R2 with a priority ordered set of content that must be assigned to a set of buckets to achieve a content specified value. Each item in the content list is related to nodes within a ragged tree hierarchy (the buckets). Each bucket has a value assigned to it and can hold a fixed quantity of content.
I am trying to allocate content in priority order to the buckets that they relate to (or any parent/grandparent up the tree from related content). I must start with the highest bucket value (with empty spaces) and stop only when the bucket values match or exceed my content value.
Hopefully my crude example will help. Assuming the B’s are buckets that can each hold 2 pieces of content and C’s are content. The bracketed numbers are the bucket value and required content value.
C1 would be allocated to B1 (highest value in B1's tree) and B4 to give it a total value of 7. Both B1 and B4 now only have one slot remaining.
C2 would be allocated B1 and B5, leaving no slots in B1 and 1 slot in B5.
C3 would not be able to use B1 as there are no slots available, so would result in B2, B5 and B9, leaving no slots in B5 and one slot in B2 / B9.
And so on...
I can see how to achieve this iteratively by creating a list of all buckets and their relationship with all child / grandchild buckets, then looping through content one at a time, assigning its buckets and reducing the remaining bucket spaces. The reason I feel that it needs to be a loop is the unknown number of spaces remaining in each bucket, which depends on processing all higher priority content.
But looping through content one at a time feels intrinsically wrong and there must be a more efficient way to solve this allocation problem – ideally in one pass…
Example SQL Server code (to match the above diagram)
--core table/fields
CREATE TABLE Bucket
(
Id int,
Name varchar(3),
BucketValue int,
SlotRemaining int --only required for my solution to hold number of slots left to fill
)
CREATE TABLE BucketParent
(
ChildBucketId int,
ParentBucketId int
)
CREATE TABLE Content
(
Id int,
Name varchar(3),
ContentValue int,
AllocationState int, --only required for my solution to identify content that still needs processing
--1=unprocessed, 2=Complete
Priority int --order to work through content 1=most important
)
CREATE TABLE ContentBucket
(
ContentId int,
BucketId int
)
Go
CREATE TABLE ContentPriorityBucket -- table to record my allocation of content to the most valuable bucket
(
ContentId int,
BucketId int
)
Go
--test data to match example (wish I'd made it smaller now :)
INSERT INTO Bucket Values (1,'B1', 4, null)
INSERT INTO Bucket Values (2,'B2', 5, null)
INSERT INTO Bucket Values (3,'B3', 4, null)
INSERT INTO Bucket Values (4,'B4', 3, null)
INSERT INTO Bucket Values (5,'B5', 3, null)
INSERT INTO Bucket Values (6,'B6', 3, null)
INSERT INTO Bucket Values (7,'B7', 4, null)
INSERT INTO Bucket Values (8,'B8', 2, null)
INSERT INTO Bucket Values (9,'B9', 1, null)
INSERT INTO Bucket Values (10,'B10', 2, null)
INSERT INTO Bucket Values (11,'B11', 1, null)
INSERT INTO BucketParent Values (8, 4)
INSERT INTO BucketParent Values (4, 1)
INSERT INTO BucketParent Values (9, 5)
INSERT INTO BucketParent Values (5, 1)
INSERT INTO BucketParent Values (5, 2)
INSERT INTO BucketParent Values (10, 5)
INSERT INTO BucketParent Values (10, 6)
INSERT INTO BucketParent Values (6, 2)
INSERT INTO BucketParent Values (6, 3)
INSERT INTO BucketParent Values (11, 6)
INSERT INTO BucketParent Values (11, 7)
INSERT INTO BucketParent Values (7, 3)
INSERT INTO Content Values (1,'C1', 5, null, 1)
INSERT INTO Content Values (2,'C2', 8, null, 2)
INSERT INTO Content Values (3,'C3', 9, null, 3)
INSERT INTO Content Values (4,'C4', 10, null, 4)
INSERT INTO ContentBucket Values (1,8)
INSERT INTO ContentBucket Values (1,4)
INSERT INTO ContentBucket Values (2,9)
INSERT INTO ContentBucket Values (3,9)
INSERT INTO ContentBucket Values (4,10)
INSERT INTO ContentBucket Values (4,7)
GO
--Iterative solution that I am trying to improve on
UPDATE Bucket
SET SlotRemaining = 2 --clear previous run and allocate maximum bucket size
UPDATE Content
SET AllocationState = 1 --set state to unprocessed
--Clear last run
TRUNCATE Table ContentPriorityBucket
GO
DECLARE @ContentToProcess int = 0
DECLARE @CurrentContent int
DECLARE @CurrentContentValue int
SELECT @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1
WHILE (@ContentToProcess > 0)
BEGIN
-- get next content to process
SELECT Top(1) @CurrentContent = ID,
@CurrentContentValue = ContentValue
FROM Content
WHERE AllocationState =1
ORDER BY Priority;
WITH BucketList (Id, BucketValue, SlotRemaining)
as
(
-- list buckets related to content
SELECT b.Id
,b.BucketValue
,b.SlotRemaining
FROM ContentBucket cb
INNER JOIN Bucket b on cb.BucketId = b.Id
WHERE cb.ContentId = @CurrentContent
-- need to pull back all buckets (even those that are full as they may have empty parents)
UNION ALL
SELECT b.Id
,b.BucketValue
,b.SlotRemaining
FROM BucketList bl
INNER JOIN BucketParent bp on bl.Id = bp.ChildBucketId
INNER JOIN Bucket b on bp.ParentBucketId = b.Id
),
DistinctBucketList (Id, BucketValue, SlotRemaining)
as
(
--dedupe buckets
SELECT distinct Id
, BucketValue
, SlotRemaining
FROM BucketList
),
BucketListOrdered (Id, BucketValue, RowOrder)
as
(
--order buckets
SELECT Id
,BucketValue
,ROW_NUMBER() OVER (ORDER BY BucketValue desc, Id)-- added id to get consistent result if two buckets have same value
FROM DistinctBucketList
WHERE SlotRemaining >0
),
CulmativeBucketListWithinRequiredValue (Id, RowOrder, CulmativeBucketValue, RequiredBucket)
as
(
-- this will mark all buckets up to the bucket value, but will be 1 bucket short
SELECT blo.Id
,blo.RowOrder
,SUM(blc.BucketValue) CulmativeBucketValue
,CASE
WHEN SUM(blc.BucketValue) <=@CurrentContentValue THEN 1
ELSE 0
END RequiredBucket
FROM BucketListOrdered blo
LEFT JOIN BucketListOrdered blc ON blc.RowOrder <= blo.RowOrder
GROUP BY blo.Id, blo.RowOrder
)
-- this will identify all buckets required to top content value
INSERT INTO ContentPriorityBucket
SELECT @CurrentContent
,b.Id
FROM CulmativeBucketListWithinRequiredValue b
WHERE b.RowOrder <= (SELECT Max(RowOrder) + 1 FROM CulmativeBucketListWithinRequiredValue WHERE RequiredBucket =1)
--reduce all used bucket sizes by 1 (could alternatively determine this from ContentPriorityBucket)
UPDATE Bucket
SET SlotRemaining = SlotRemaining -1
WHERE id in (SELECT BucketId FROM ContentPriorityBucket WHERE ContentId = @CurrentContent)
-- update processed content
UPDATE Content
SET AllocationState = 2
WHERE @CurrentContent = Id
SELECT @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1
END
SELECT ContentId, BucketId FROM ContentPriorityBucket
/*
DROP TABLE Bucket
DROP TABLE BucketParent
DROP TABLE Content
DROP TABLE ContentBucket
DROP TABLE ContentPriorityBucket
*/
There are a couple of points to make about this problem.
First, generalized bin-packing is an NP-complete problem, and therefore cannot be solved in general in a single pass. This specific bin-packing, since it is an ordered packing, may be different, but the issue of the complexity of the problem remains; it's certainly not O(1), so it may need a loop no matter what.
Single-pass, non-looping solutions for this do not seem possible; it looks like a problem that isn't suited to set-based solutions. You could create a table-valued CLR function that finds the bucket each item fits into. Otherwise, keeping the looping solution would be fine. (If you post the code, it might be easier to see whether improvements are possible.)
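To illustrate why each item depends on the ones before it, here is a rough Python sketch of the per-content loop using the sample data from the question. It is not a translation of the T-SQL: all names are hypothetical, every bucket is assumed to start with 2 slots, and the sketch stops taking buckets as soon as the running total reaches the content value.

```python
def ancestors_and_self(bucket_ids, parents):
    """Return the given buckets plus every bucket reachable via parent links."""
    seen = set()
    stack = list(bucket_ids)
    while stack:
        bucket_id = stack.pop()
        if bucket_id in seen:
            continue
        seen.add(bucket_id)
        stack.extend(parents.get(bucket_id, []))
    return seen


def allocate(bucket_values, parents, contents, slots_per_bucket=2):
    """Assign buckets to contents in priority order.

    contents is a list of (content_id, required_value, priority, related_ids).
    Returns {content_id: [bucket ids, highest value first]}.
    """
    slots = {bucket_id: slots_per_bucket for bucket_id in bucket_values}
    allocation = {}
    for content_id, required, _priority, related in sorted(contents, key=lambda c: c[2]):
        candidates = [b for b in ancestors_and_self(related, parents) if slots[b] > 0]
        # Highest bucket value first; the id breaks ties for a stable result.
        candidates.sort(key=lambda b: (-bucket_values[b], b))
        chosen, total = [], 0
        for bucket_id in candidates:
            if total >= required:
                break
            chosen.append(bucket_id)
            total += bucket_values[bucket_id]
        # Slots consumed here change what later contents can use.
        for bucket_id in chosen:
            slots[bucket_id] -= 1
        allocation[content_id] = chosen
    return allocation


bucket_values = {1: 4, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3, 7: 4, 8: 2, 9: 1, 10: 2, 11: 1}
parents = {8: [4], 4: [1], 9: [5], 5: [1, 2], 10: [5, 6], 6: [2, 3], 11: [6, 7], 7: [3]}
contents = [(1, 5, 1, [8, 4]), (2, 8, 2, [9]), (3, 9, 3, [9]), (4, 10, 4, [10, 7])]
print(allocate(bucket_values, parents, contents))
```

The slot counts updated at the end of each iteration are what make the allocation of one content item depend on all higher-priority items, which is the dependency that makes a purely set-based formulation awkward.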