TSQL remap big number to smaller, but keep Identity - sql-server

I'm trying to load a huge number into Field1, an INT column that can hold at most 2,147,483,647. I can't change the DDL, so I tried to find an ad hoc solution: cut a single digit out of the middle of the number and then add a check for uniqueness.
The numbers look like 29000001234, and I want to keep that general format, with the zeros in the middle, so the values stay easy to recognize. I don't want to introduce any new columns or tables for this task; my freedom there is limited because this is a third-party schema.
Can anybody suggest a better solution for remapping all the numbers under that limit? This is my draft:
DECLARE @fl FLOAT = 29000001234
DECLARE @i INT
SELECT @i = (SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),1,4) +
SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),7,LEN(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)))) )
select @i;

But if you really want to remove the middle digits, here's another approach:
DECLARE @fl FLOAT = 29000001234
DECLARE @i INT
DECLARE @StringFloat as varchar(80)
SET @StringFloat = CONVERT(varchar(80), CAST(@fl AS bigint))
SET @i = CAST( CONCAT(LEFT( @StringFloat, 4 ), RIGHT( @StringFloat, 5 )) as int )
SELECT @i;

I think arithmetic operations should be less expensive than string operations, so you should use them instead:
DECLARE @fl FLOAT = 29000001234
DECLARE @flBig BIGINT = @fl
DECLARE @i INT
SET @i = (@flBig / 1000000000) * 10000000 + (@flBig % 100000000)
select @i; --> 290001234
The provided example assumes the first part of the number has a maximum of two digits (i.e. 29 in your case) and that you want to allow a larger number in the left part (up to 999999).
NOTE: the parentheses are redundant, since division and multiplication have the same precedence and the modulo operator has higher precedence than addition. I have used them just to highlight the parts of the computation.
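Since the original goal also includes a uniqueness check after the remap, a collision test along these lines could be run before loading; StagingTable and SourceValue are hypothetical names standing in for wherever the original 11-digit values live:
-- Sketch: flag any source values whose remapped INT would collide (hypothetical table/column names).
SELECT (SourceValue / 1000000000) * 10000000 + (SourceValue % 100000000) AS RemappedValue,
       COUNT(*) AS Occurrences
FROM StagingTable
GROUP BY (SourceValue / 1000000000) * 10000000 + (SourceValue % 100000000)
HAVING COUNT(*) > 1;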

You can't do that without an arithmetic overflow or without losing your original data.
If you are limited by the columns of your destination table or query, use multiple rows:
declare @c bigint = 29000001234;
declare @s bigint = 1000000000; -- Separator value
;with cte(partNo, partValue) as (
select 1, @c % @s
union all
select partNo + 1, (@c / power(@s, partNo)) % @s
from cte
where (@c / power(@s, partNo)) > 0
)
select partValue
from cte;
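For completeness, the parts this CTE produces can be put back together to recover the original value; a minimal sketch reusing the same @c and @s declared above:
-- Reassemble the original value from its parts (sketch; same recursive CTE as above).
;with cte(partNo, partValue) as (
select 1, @c % @s
union all
select partNo + 1, (@c / power(@s, partNo)) % @s
from cte
where (@c / power(@s, partNo)) > 0
)
select sum(partValue * power(@s, partNo - 1)) as originalValue -- 29000001234
from cte;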

Seems like a strange situation; I'm not sure why you need to go to all the trouble of converting a big number to a string and then removing a digit from the middle. Some more information about why, or about what the real goal is, would be helpful.
That said, maybe it would be easier to just subtract a constant amount from these values? e.g.:
DECLARE @fl FLOAT = 29000001234
DECLARE @i INT
DECLARE @OFFSET BIGINT = 29000000000
SET @i = CAST(@fl AS BIGINT) - @OFFSET
SELECT @i
Which gives you an INT of 1234 as the result using your example.
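Applied to a whole load rather than a single value, the same idea might look like the sketch below; StagingTable and SourceValue are hypothetical names, and the WHERE clause guards against values the offset cannot bring into INT range:
DECLARE @OFFSET BIGINT = 29000000000;
-- Sketch: convert only the rows that actually fit an INT after subtracting the offset.
SELECT SourceValue,
       CAST(SourceValue - @OFFSET AS INT) AS Field1Value
FROM StagingTable
WHERE SourceValue - @OFFSET BETWEEN 0 AND 2147483647;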

The following query drops increasingly wide blocks of digits from the original third-party value and returns the results that fit in an INT. The results could be outer joined with the existing data to find a suitable new value.
declare @ThirdPartyValue as BigInt = 29000001234;
declare @MaxInt as BigInt = 2147483647;
declare @TPV as VarChar(19) = Cast( @ThirdPartyValue as VarChar(19) );
declare @TPVLen as Int = Len( @TPV );
with
-- 0 through 9.
Digits as (
select Digit from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Digit ) ),
-- 0 through @TPVLen .
Positions as (
select Ten_1.Digit * 10 + Ten_0.Digit as Number
from Digits as Ten_0 cross join Digits as Ten_1
where Ten_1.Digit * 10 + Ten_0.Digit <= @TPVLen ),
-- 1 through @TPVLen - 1 .
Widths as (
select Number
from Positions
where 0 < Number and Number < @TPVLen ),
-- Try dropping Width digits at Position from @TPV .
AlteredTPVs as (
select P.Number as Position, W.Number as Width,
Stuff( @TPV, P.Number, W.Number, '' ) as AlteredTPV
from Positions as P cross join Widths as W
where P.Number + W.Number <= @TPVLen )
-- See which results fit in an Int .
select Position, Width, AlteredTPV, Cast( AlteredTPV as BigInt ) as AlteredTPVBigInt
from AlteredTPVs
where Cast( AlteredTPV as BigInt ) <= @MaxInt -- Comment out this line to see all results.
order by Width, Position
It could be more clever about returning only distinct new values.
This general idea could be used to hunt down blocks of zeroes or other suitable patterns to arrive at a set of values to be tested against the existing data.
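The outer join mentioned above could be sketched by replacing the final SELECT of the query with something like the following, where ExistingTable and Field1 are hypothetical placeholders for the third-party table and its INT column:
-- Sketch: keep only candidate values that are not already taken in the destination table.
select distinct Cast( A.AlteredTPV as BigInt ) as CandidateValue
from AlteredTPVs as A
left outer join ExistingTable as E on E.Field1 = Cast( A.AlteredTPV as BigInt )
where Cast( A.AlteredTPV as BigInt ) <= @MaxInt
  and E.Field1 is null;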

Related

How to generate random alphanumeric unique characters with specified length

The problem is described below:
Generate a unique string of alphanumeric characters.
The length should be 32 characters.
It may be seeded with the current time to help with the uniqueness of the generated values.
The characters must come from this pool: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Sample Output: 445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1
I can do this in Java, but I would greatly appreciate help doing it in MS SQL / T-SQL.
First, you need to split the string into separate rows. Then, do a SELECT with ORDER BY NEWID() for the random sort. Finally, use FOR XML PATH('') to concatenate them back:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E4
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
SELECT 1 FROM( VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1)
)t(N)
),
CteTally(N) AS(
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E1
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
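On SQL Server 2017 or later, the FOR XML PATH('') concatenation could arguably be replaced with STRING_AGG; a minimal sketch under that assumption, keeping the same pool and output length:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 100 rows, enough to cover the pool
CteTally(N) AS(SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM E2)
SELECT STRING_AGG(c, '') WITHIN GROUP (ORDER BY rnd) AS Result
FROM (
SELECT TOP(32) SUBSTRING(@str, N, 1) AS c, NEWID() AS rnd
FROM CteTally
ORDER BY NEWID()
) AS t;
Like the FOR XML PATH('') version, this draws 32 distinct positions from the pool, so no character is repeated within one result.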
I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base-conversion algorithm to convert that long number into a new representation, which is then translated to a string using your desired characters as its "digits".
For your specific scenario we need about 183 bits, or log2(52) x 32, to reach your desired length. newid() will generate the unique bit sequence, but only 128 bits at a time, so a series of values is simply concatenated until there are enough. Having a value to operate on, the main loop is essentially the same long division we learned in elementary school. The intermediate calculations are kept in place in the varbinary array, and the loop continues only until enough output characters are obtained. Each iteration determines another low-order digit in the new base, and it can terminate early since those digits won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid(), so make sure log2(len(pool)) x output length is at least 128.
The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limitation by setting the 128-byte maximum length of @e. For this question @e only needs to be 32 bytes long, and it could be adjusted upward or downward as necessary, or just defined as varbinary(max). If you need something more truly random you could find another source for the entropy bits, like crypt_gen_random(). Since uniqueness appears to be the primary concern, this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.
This is fast and generic, and it can easily be wrapped up in a function. A more robust implementation would handle the extra checks mentioned above.
declare @characterPool varchar(256) =
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;
declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128)
declare @e varbinary(128) = 0x; /* entropy */
while @n < @numLoops
begin
set @e = @e + cast(newid() as binary(16)); set @n += 1; /* concatenate 128 bits per pass */
end
declare @b int; /* byte */
declare @d int; /* dividend */
declare @out varchar(128) = '';
declare @outputBase int = len(@characterPool);
declare @entropyBytes int = len(@e);
declare @m int = 0;
while @m < @outputLength
begin
set @b = 0; set @d = 0; set @n = 0;
while @n < @entropyBytes /* big-endian */
begin
set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
set @d = @b / @outputBase;
set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
set @n += 1;
end
set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
set @m += 1;
end
select @out as "UniqueString"
http://rextester.com/EYAK79470
As one simple test of the algorithm you could assign a known value in hexadecimal format and confirm that the output (using 0123456789ABCDEF as the character pool) is the same hexadecimal value. In the same way this obviously works with base64, binary and octal.
Update: The main loop can be made faster by not iterating over more bytes than necessary. I don't know how crypt_gen_random() compares to newid() in terms of speed or CPU usage, so this change might not even be a net positive; I'll just note it as an alternative to explore. You will want to keep the bytes from newid() on the little end and attach the rest to the front.
declare @e varbinary(1024) = cast(newid() as binary(16));
declare @padBytes int = ceiling(log(len(@characterPool)) / log(2) * @outputLength) - 128;
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */

Avg of float inconsistency

The SELECT returns right at 23,000 rows.
The EXCEPT returns between 60 and 200 rows (and not the same rows each time).
The EXCEPT should return 0 rows, as it is select a EXCEPT select a.
PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
[tf] is a float, and I get that float is not exact.
But I naively thought avg(float) would be repeatable.
Avg(float) does not appear to be repeatable.
What is the solution?
TF is between 0 and 1 and I only need about 5 significant digits.
I just need avg(TF) to be the same number from run to run.
Decimal(9,8) gives me enough precision, and if I cast to decimal(9,8) the EXCEPT properly returns 0 rows.
I can change [TF] to decimal(9,8), but it will be a bit of work and a lot of regression testing, as some of the tests that use [tf] take over a day to run.
Is changing [TF] to decimal(9,8) the best solution?
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
except
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
order by [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
In this case tf is term frequency of tf-idf
tf normalization is subjective and does not require much precision
Avg(tf) needs to be consistent from select to select or the results are not consistent
In a single select with joins I need a consistent avg(tf)
Going with decimal and a low precision for tf got consistent results
This is very similar to: SELECT SUM(...) is non-deterministic when adding the column-values of datatype float.
The problem is that with an inexact datatype (FLOAT/REAL), the order of the arithmetic operations on floating-point values matters. Demo from Connect:
DECLARE @fl FLOAT = 100000000000000000000
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl - 100000000000000000000
SELECT CONVERT(NVARCHAR(40), @fl, 2)
-- 0.000000000000000e+000
GO
DECLARE @fl FLOAT = 0
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000
SELECT @fl
-- 507904
Possible solutions:
CAST all arguments to an exact datatype like DECIMAL/NUMERIC (see the sketch at the end of this answer)
alter the table and change FLOAT to DECIMAL
you can try to force the query optimizer to calculate the sum in the same order
The good news is that when a stable query result matters to your application, you can force the order to be the same by preventing parallelism with OPTION (MAXDOP 1).
It looks like the initial link is dead; see the WebArchive copy.
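For the query in the question, the first option would presumably amount to wrapping the aggregated column in a cast on both sides of the EXCEPT, i.e. replacing each avg([FTSindexWordOnce].[tf]) with:
-- Sketch: aggregate an exact type so the sum is order-independent and the EXCEPT returns 0 rows.
avg(cast([FTSindexWordOnce].[tf] as decimal(9,8))) AS [avgTFraw]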

Convert 32 bit binary string of 1's and 0's to Signed Decimal Number in SQL Server

I am using SQL Server 2012 Express. I have a string of 1's and 0's 32 bits in length.
01010010000100010111001101110011
How would I convert that to a Signed Decimal Number in a SQL script?
Currently I use an online web tool to get the answer, and my searching has not led me to the answer I need.
You can perform the conversion with a single T-SQL statement if you use a Tally table:
;WITH Tally(i) AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0)) a(n)
CROSS JOIN (VALUES (0), (0), (0), (0)) b(n)
)
SELECT SUM(t.v) AS DecimalNumber
FROM (
SELECT POWER(CAST(SUBSTRING(x.d, i, 1) AS DECIMAL(10,0)) * 2, 32 - i)
FROM (VALUES ('01010010000100010111001101110011')) x(d)
CROSS JOIN Tally) AS t(v)
Explanation:
Tally is a table expression returning all values from 1-32.
Using these values we can extract every single digit out of the binary string using SUBSTRING.
With the use of POWER mathematical function we can convert every separate binary digit to decimal.
Using SUM we can add up all separate decimal numbers to get the expected result.
You can try something like the below:
DECLARE @Binary VARCHAR(100) = '01010010000100010111001101110011';
DECLARE @characters CHAR(36),
@result BIGINT,
@index SMALLINT,
@base BIGINT;
SELECT @characters = '0123456789abcdefghijklmnopqrstuvwxyz',
@result = 0,
@index = 0,
@base = 2;
WHILE @index < LEN(@Binary)
BEGIN
SELECT @result = @result + POWER(@base, @index) * (CHARINDEX(SUBSTRING(@Binary, LEN(@Binary) - @index, 1), @characters) - 1);
SET @index = @index + 1;
END
SELECT @result;
This will help you convert from any base (I used @base = 2 for binary) to base 10. We start from the far right and move to the left until we run out of digits. The conversion is (base ^ index) * digit.
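As a quick sanity check of the loop (a suggestion, not part of the original answer): the sample string regrouped into nibbles is 0x52117373, so a direct binary-to-int cast should agree with the loop's result when run right after it in the same batch:
-- Spot check: both columns should show 1376875379 for the sample 32-bit string.
SELECT @result AS LoopResult, CAST(0x52117373 AS INT) AS ViaBinaryCast;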
In addition to Giorgos Betsos's reply, see the inline table-valued function below:
CREATE FUNCTION [dbo].[udf_BinaryToDecimal]
(
@Binary VARCHAR(31)
)
RETURNS TABLE AS RETURN
WITH Tally (n) AS
(
--32 Rows
SELECT TOP (LEN (@Binary)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM (VALUES (0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
)
SELECT
SUM(SUBSTRING(REVERSE(@Binary),n+1,1) * POWER(2,n)) TenBase
FROM Tally
GO
/*How to Use*/
SELECT TenBase
FROM udf_BinaryToDecimal ('01010010000100010111')
/*Result -> 336151*/
I had to add this code to the end of Abhishek's code example to get the signed number I needed.
DECLARE @MyNewExValue INT
IF @result > 2147483647
BEGIN
SET @result = @result - (2147483648 * 2)
END
SET @MyNewExValue = @result
SELECT @MyNewExValue
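As an illustration of why the adjustment works (standard two's complement; the example value is mine, not from the question): a 32-bit string with the high bit set, such as '10000000000000000000000000000001', comes out of the loop as 2147483649, and subtracting 4294967296 gives the signed value:
-- Sketch: 2147483649 - 4294967296 = -2147483647, matching a direct binary-to-int cast.
SELECT 2147483649 - CAST(4294967296 AS BIGINT) AS AdjustedValue,
       CAST(0x80000001 AS INT)                 AS ViaBinaryCast;  -- both -2147483647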
This solution will work with binary strings of any (bigint) length:
DECLARE @input varchar(max) =
'01010010000100010111001101110011'
;WITH N(V) AS
(
SELECT
ROW_NUMBER()over(ORDER BY (SELECT 1))
FROM
(VALUES(1),(1),(1),(1))M(a),
(VALUES(1),(1),(1),(1))L(a),
(VALUES(1),(1),(1),(1))K(a)
)
SELECT SUM(SUBSTRING(REVERSE(@input),V,1)*POWER(CAST(2 as BIGINT), V-1))
FROM N
WHERE V <= LEN(@input)
It should be noted that the most upvoted solution above only works with binary strings exactly 32 bits in length, and that a string of '00000000000000000000000000000000' returns 1 (because POWER(0, 0) evaluates to 1). One possible fix is sketched below.
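One way to avoid both issues (a sketch, not taken from any of the answers above) is to multiply the digit by the power instead of folding it into the base, and to derive the exponent from the string length:
-- Sketch: digit * 2^(position from the right); handles any length up to 32 and returns 0 for all zeroes.
;WITH Tally(i) AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0)) a(n)
CROSS JOIN (VALUES (0), (0), (0), (0)) b(n)
)
SELECT SUM(CAST(SUBSTRING(x.d, i, 1) AS BIGINT) * POWER(CAST(2 AS BIGINT), LEN(x.d) - i)) AS DecimalNumber
FROM (VALUES ('01010010000100010111001101110011')) x(d)
CROSS JOIN Tally
WHERE i <= LEN(x.d);
This returns the unsigned value; the sign adjustment shown in the other answers still applies for strings with the high bit set.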

Remove the 7th digit of a bigint for every row in a Sql Server table

I am maintaining a SQL Server database and some C# code which uploads data to it from a third party. The database has a table 'LessonRoom' which contains a row for each lesson that occurs in a particular room. It has a field 'SourceKey' which is a bigint and is formed by concatenating a room id and a lesson id; the C# which builds this key is as follows:
SourceKey = long.Parse(RoomId.ToString().PadRight(7, '0') + LessonId.ToString());
This code started falling over because the LessonIds grew too large and the resulting number no longer fits in a bigint (C# long). The RoomIds are only ever 5 digits long, so an easy fix is PadRight(6, '0').
Now I have a solution but I need to update the existing data. I don't know how to remove a zero from the 7th digit of a SQL Server bigint in every row of 500,000 rows. Do I have to write a query to convert the value to a string, remove the zero, parse and put it back or can anyone think of a more succinct way to do it?
Essentially I need to turn this number:
6,159,800,830,114,069,893
Into this one:
615,980,830,114,069,893
Since you know it is always the 7th character you want to remove, you can do this quite easily with STUFF:
declare @SourceKey bigint = 6159800830114069893
select cast(stuff(cast(@SourceKey as varchar(25)), 7, 1, '') as bigint)
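Applied to the table from the question, the one-off migration for the 500,000 existing rows might look like the following; it assumes none of the rows have been converted yet (every SourceKey still carries the 7-character room prefix), so run it once, ideally inside a transaction after a backup:
-- Sketch: strip the padding zero at position 7 from every existing key (assumes the old format throughout).
UPDATE LessonRoom
SET SourceKey = CAST(STUFF(CAST(SourceKey AS VARCHAR(25)), 7, 1, '') AS BIGINT);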
You could also solve this with the modulo operator :)
Here is a simple T-SQL example:
DECLARE @input AS BIGINT
DECLARE @expect AS BIGINT
DECLARE @rest AS BIGINT
DECLARE @result AS BIGINT
DECLARE @resultShort AS BIGINT
SET @input = 6159800830114069893
SET @expect = 615980830114069893
SET @rest = @input % 1000000000000
SET @result = ( ( @input - @rest ) / 10 ) + @rest
SET @resultShort = ( ( @input - @input % 1000000000000 ) / 10 ) + @input % 1000000000000
SELECT @rest, @result,
CASE
WHEN @result = @expect THEN 'true'
ELSE 'false'
END AS test,
@resultShort,
CASE
WHEN @resultShort = @expect THEN 'true'
ELSE 'false'
END AS test2

How to add or concatenate money type and string on query mssql

I have a situation like this:
I have a column of type money, with 2 decimals. Example data: 65.00
I need to left-pad it to 12 digits with zeros, so that the output looks like this:
(65.00 converted to 6500) + leading zeros = 000000006500
Output: 000000006500
How can I achieve this? Thank you for your help and suggestions.
You can do this with a couple of casts, multiplying by 100, and using REPLICATE to pad with the requisite number of zeroes.
I'm assuming you DO want up to two decimal places included as digits, but no more.
DECLARE @value MONEY;
SET @value = 65.123;
DECLARE @intValue BIGINT;
SET @intValue = CAST(@value * 100.0 AS BIGINT);
SELECT REPLICATE('0',12-LEN(@intValue)) + CAST(@intValue AS NVARCHAR(20));
Returns 000000006512
If you need to do this on a set, a CTE can be used for the intermediate step, e.g.
WITH cte AS
(
SELECT CAST(MoneyField * 100.0 AS BIGINT) AS intValue
FROM SomeTable
)
SELECT
REPLICATE('0',12-LEN(cte.intValue)) + CAST(cte.intValue AS NVARCHAR(20))
FROM cte;
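On SQL Server 2012 and later, FORMAT could arguably do the zero padding in one step (it is slower than plain string concatenation, so treat this as a convenience sketch rather than the recommended approach):
-- Sketch: scale the money value to cents, then let FORMAT left-pad to 12 digits.
DECLARE @value MONEY = 65.00;
SELECT FORMAT(CAST(@value * 100 AS BIGINT), '000000000000') AS Padded;  -- 000000006500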
It is possible, but the output column should be of type varchar(15) or similar; if you want to do further operations on the output you will have to convert it back to int or another numeric type. In T-SQL this takes REPLICATE and LEN (rather than MySQL's REPEAT and LENGTH):
SELECT CONCAT(REPLICATE('0', 12 - LEN(CAST(65.00 * 100 AS INT))), CAST(65.00 * 100 AS INT));
