I have a table with IDs and locales. The same ID can be listed more than once with a different locale:
ID Locale
123456 EN_US
234567 EN_US
234567 EN_CA
345678 EN_US
I need to create a unique identifier in the form of a numeric ID (Integer) for each record, while maintaining the ability to reverse engineer the original components.
I was thinking bit shifting might work: assign a numerical value to each locale, but I'm not quite sure how to implement it. Has anyone faced this challenge before? Also, I have 75 locales, so I'm not sure if that would be an issue with bit shifting.
Lastly, I'm using SQL Server with a Linked Server connection to Teradata (that's my data source). I don't think Teradata supports bitwise operations out of the box, so I'm assuming I'll have to do it in MSSQL.
Thank you.
You can create a composite numeric key, mapping your 75 unique values into the last 2 digits of the numeric key. You can parse it back into its components with simple modulus 100 arithmetic or just a substring. If you will ever exceed 100 values, use 3 digits instead. 9 digits total will fit in an int, 10-18 will fit in a bigint.
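A minimal sketch of this two-digit suffix scheme, written in Python only to illustrate the arithmetic (the same multiplication, integer division and modulus carry over to T-SQL). The locale numbers in LOCALE_CODES and the function names are made up for the example; they are not from the question.

```python
# Hypothetical locale numbering; any stable mapping into 0..99 would do.
LOCALE_CODES = {"EN_US": 1, "EN_CA": 2}
CODE_TO_LOCALE = {code: locale for locale, code in LOCALE_CODES.items()}


def make_key(record_id, locale):
    """Append the two-digit locale code to the ID."""
    return record_id * 100 + LOCALE_CODES[locale]


def split_key(key):
    """Recover (ID, locale) with integer division and modulus 100."""
    record_id, code = divmod(key, 100)
    return record_id, CODE_TO_LOCALE[code]


print(make_key(234567, "EN_CA"))   # 23456702
print(split_key(23456702))         # (234567, 'EN_CA')
```

The mapping table has to stay stable once keys have been issued, otherwise old keys decode to the wrong locale.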
Converting 234567-EN_US into an integer is easy. Just use CHECKSUM on the concatenated string value. It would not be reversible, however, and CHECKSUM values are not guaranteed to be unique.
You could store this CHECKSUM value on the original table, however, and then use it to backtrack from whatever table you're going to store the integer in.
Another solution would be to assign each locale an Integer value (as Marc B suggested). Call that X. Then call your existing integer ID (234567) Y. Your final key would be (X * 1,000,000) + Y. You could then reverse the formula to get the values back. This only works, of course, if your existing integer IDs stay below 1,000,000. With larger IDs you need a larger multiplier, and the final key may then have to be a BigInt.
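As a rough illustration of the formula and its reversal, here is a sketch in Python. ID_LIMIT, combine and separate are names invented for the example, and the locale numbers are assumed to run from 1 to 75.

```python
ID_LIMIT = 1_000_000      # existing IDs (Y) must stay below this multiplier
INT_MAX = 2**31 - 1       # largest value of a 32-bit signed INT


def combine(locale_number, record_id):
    """Build the key (X * 1,000,000) + Y."""
    if not 0 <= record_id < ID_LIMIT:
        raise ValueError("record_id does not fit below the multiplier")
    return locale_number * ID_LIMIT + record_id


def separate(key):
    """Reverse the formula: returns (X, Y)."""
    return divmod(key, ID_LIMIT)


largest = combine(75, 999_999)
print(largest, largest <= INT_MAX)   # 75999999 True
print(separate(2_234_567))           # (2, 234567)
```

Under these assumptions the largest possible key is 75,999,999, which is still inside the range of a 32-bit INT. A wider type only becomes necessary once the IDs or the multiplier grow.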
I am using ASP.NET MVC and SQL Server and I want to store a 12 digit value which is 221133556677.
This is where I wanted to store the value in, So Int36 can only store up to 10 digit.
So how can I change the data type into numeric(12,0) in order to store the 12 digit value.
[Display(Name = "IC")]
[Required(AllowEmptyStrings = false, ErrorMessage = "IC is required")]
public int IC { get; set; }
So Int36 can only store up to 10 digit.
In all computers in the world that follow standard architecture, there IS NO SUCH THING AS INT36. Bytes are 8 bits, so it is 32. Not 36.
And Int64 has been a thing for ages too. It has a MUCH, MUCH larger range.
In SQL Server it is named BIGINT and has a range that may surprise you:
-2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
Case closed?
Oh, no....
So how can I change the data type into numeric(12,0) in order to store the 12 digit
value.
Just do it? Let's start with your C# code, which uses int, not long. An int is 32 bits (not 36). Just change it to... oh, you insist on using numeric (decimal)? Then use decimal instead of int in C#. Done. Otherwise I would go with long in C# and bigint in the database.
Note, though, that this "number" is likely NOT A NUMBER. It is a numeric string. Storing it as a number makes little sense if you may one day need partial searches and will never use things like average, sum etc.
Now, you may want to read some documentation:
https://learn.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-ver15
covers SQL Server's integer data types. Reading it helps you not to overlook the obvious larger-range data type.
According to the SQL Server documentation you can use BIGINT.
It's a signed 64-bit integer with a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807).
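A quick sanity check of those ranges against the value from the question, sketched in Python (whose integers are unbounded, so the limits are written out by hand):

```python
INT32_MAX = 2**31 - 1   # upper limit of SQL Server INT / C# int
INT64_MAX = 2**63 - 1   # upper limit of SQL Server BIGINT / C# long

value = 221133556677    # the 12-digit number from the question

print(value <= INT32_MAX)    # False: too large for a 32-bit integer
print(value <= INT64_MAX)    # True: fits comfortably in 64 bits
print(len(str(INT64_MAX)))   # 19 digits in the largest BIGINT value
```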
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings
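SQL Server's integer types take no size argument, whereas the precision in numeric(12,0) does limit the number of digits. From what I have been able to find, in MySQL the number in parentheses on an integer type such as int(11) doesn't affect the size of the number the column can store; it only matters when ZEROFILL is used: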
What is the size of column of int(11) in mysql in bytes?
A long can store 12 digits just fine. So use long instead of int in your C#.
long twelveDigits = 221133556677;
Console.WriteLine($"\nHere is twelve digit number, {twelveDigits}.");
Console.Write("\nPress any key to exit...");
Console.ReadKey(true);
See here: Long data type MSDocs
And SQLServer has the Data Type bigint
See here: int, bigint, smallint, and tinyint (Transact-SQL)
and: www.sqlservertutorial.net
These should get you taken care of.
My SQL Server database was created & designed by a freelance developer.
I see the database getting quite big, and I want to ensure that the column datatypes are the most efficient for keeping the size as small as possible.
Most columns were created as
VARCHAR (255), NULL
This covers those where they are
Numerics with a length of 2 numbers maximum
Numerics where a length will never be more than 3 numbers or blank
Alpha which will contain just 1 letter or are blank
Then there are a number of columns which are alphanumeric with a maximum of 10 characters, and others which are alphanumeric with a maximum of 25.
There is one big alphanumeric column which can be up to 300 characters.
There has been an amendment for a column which shows the time taken in seconds to race an event. Under 1000 seconds and up to 2 decimal places
This is set as DECIMAL (18,2) NULL
The question is: can I reduce the size of the database by changing the column data types, or was the original design optimal for its purpose?
You should definitely strive to use the most appropriate data types for all columns - and in this regard, that freelance developer did a very poor job - both from a point of consistency and usability (just try to sum up the numbers in a VARCHAR(255) column, or sort by their numeric value - horribly bad design...), but also from a performance point of view.
Numerics with a length of 2 numbers maximum
Numerics where a length will never be more than 3 numbers or blank
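-> if you don't need any fractional decimal points (only whole numbers) - use INT (4 bytes), or TINYINT (1 byte, 0 to 255) or SMALLINT (2 bytes) if the values are guaranteed to fit and size matters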
Alpha which will contain just 1 letter or are blank
-> in this case, I'd use a CHAR(1) (or NCHAR(1) if you need to be able to handle Unicode characters, like Hebrew, Arabic, Cyrillic or East Asian languages). Since it's really only ever 1 character (or nothing), there's no point in using a variable-length string datatype, since that only adds at least 2 bytes of overhead per string stored
There is one big alphanumeric column which can be up to 300 characters.
-> That's a great candidate for a VARCHAR(300) column (or again: NVARCHAR(300) if you need to support Unicode). Here I'd definitely use a variable-length string type to avoid padding the column with spaces up to the defined length if you really want to store fewer characters.
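To make the size argument concrete, here is a small Python sketch that picks the narrowest SQL Server integer type for a given value range. The byte sizes and ranges are those of SQL Server's integer types; the helper name smallest_integer_type is invented for the example.

```python
# (type name, storage bytes, minimum, maximum) for SQL Server integer types.
INTEGER_TYPES = [
    ("TINYINT", 1, 0, 255),
    ("SMALLINT", 2, -2**15, 2**15 - 1),
    ("INT", 4, -2**31, 2**31 - 1),
    ("BIGINT", 8, -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return (type name, bytes) of the narrowest type covering the range."""
    for name, size, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name, size
    raise ValueError("range does not fit any integer type")


print(smallest_integer_type(0, 99))    # ('TINYINT', 1)
print(smallest_integer_type(0, 999))   # ('SMALLINT', 2)
```

The same reasoning applies to the race-time column: values under 1000 with two decimal places fit in DECIMAL(5,2), which SQL Server stores in 5 bytes, against 9 bytes for DECIMAL(18,2).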
I have a SQL Server database that has a table that contains a field of type varbinary(256).
When I view this binary field via a query in SSMS, the value looks like this:
0x004BC878B0CB9A4F86D0F52C9DEB689401000000D4D68D98C8975425264979CFB92D146582C38D74597B495F87FEA09B68A8440A
When I view this same field (and same record) using CFDUMP, the value looks like this:
075-56120-80-53-10279-122-48-1144-99-21104-1081000-44-42-115-104-56-10584373873121-49-714520101-126-61-115116891237395-121-2-96-101104-886810
(For the example below, the original binary value will be #A, and the CFDUMP value above will be #B)
I have tried using CAST(#B as varbinary(256)) but didn't get the same value as #A.
What must I do to convert the value retrieved from CFDUMP into the correct binary representation?
Note: I no longer have the applicable records in the database. I need to convert #B into the correct value that can re-INSERT into a varbinary(256) field.
(Expanded from comments)
I do not mean this sarcastically, but what difference does it make how they display binary? It is simply a difference in how the data is presented. It does not mean the actual binary values differ.
It is similar to how dates are handled. Internally, they are big numbers. But since most people do not know which date 1234567890 represents, applications choose to display the number in a more human-friendly format. So SSMS might present the date as 2009-02-13 23:31:30.000, while CF might present it as {ts '2009-02-13 23:31:30'}. Even though the presentations differ, it is still the same value internally.
As far as binary goes, SSMS displays it as hexadecimal. If you use binaryEncode() on your query column, and convert the binary to hex, you can see it is the same value. Just without the leading 0x:
writeDump( binaryEncode(yourQuery.binaryColumn, "hex") )
If you are having some other issue with binary, could you please elaborate?
Update:
Unfortunately, I do not think you can easily convert the cfdump representation back into binary. Unlike Railo's implementation, Adobe's cfdump just concatenates the numeric representation of the individual bytes into one big string, with no delimiter. (The dashes are simply the minus signs of negative numbers.) You can reproduce this by looping through the bytes of your sample string. The code below produces the same string of numbers you posted.
bytes = binaryDecode("004BC878B0CB9A4F...", "hex");
for (i=1; i<=arrayLen(bytes); i++) {
WriteOutput( bytes[i] );
}
I suppose it is theoretically possible to convert that string into binary, but it would be very difficult. AFAIK, there is no way to accurately determine where one number (or byte) begins and the other ends. There are some clues, but ultimately it would come down to guesswork.
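The ambiguity can be shown with a short Python sketch. It mimics the delimiter-free dump described above (each byte written as a signed decimal number); cfdump_style is a name invented for the illustration, not a ColdFusion function.

```python
def cfdump_style(data):
    """Concatenate each byte as a signed decimal number, with no delimiter."""
    signed = [b - 256 if b > 127 else b for b in data]
    return "".join(str(n) for n in signed)


# The first eight bytes of the value in the question.
print(cfdump_style(bytes.fromhex("004BC878B0CB9A4F")))
# 075-56120-80-53-10279

# Two different byte sequences that dump to exactly the same text.
print(cfdump_style(bytes([1, 23])), cfdump_style(bytes([12, 3])))
# 123 123
```

Because [1, 23] and [12, 3] both come out as "123", the dump alone cannot tell you where one byte ends and the next begins.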
Railo's implementation displays the byte values separated by a dash "-". Two consecutive dashes indicate that the next number is negative, i.e. "0", "75", "-56", ...
0-75--56-120--80--53--102-79--122--48--11-44--99--21-104--108-1-0-0-0--44--42--115--104--56--105-84-37-38-73-121--49--71-45-20-101--126--61--115-116-89-123-73-95--121--2--96--101-104--88-68-10
So you could probably parse that string back into an array of bytes. Then insert the binary into your database using <cfqueryparam cfsqltype="CF_SQL_BINARY" ..>. Unfortunately that does not help you, but the explanation might help the next guy.
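For the dash-separated format, the parsing step could look roughly like this in Python. It is only a sketch of the idea, with parse_dash_separated as an invented helper name, tried here on the first eight values of the string above.

```python
import re


def parse_dash_separated(dump):
    """Turn '0-75--56-...' (signed bytes joined by '-') back into bytes."""
    # Each number is preceded by the start of the string or by one delimiter
    # dash; a second dash right after that is the number's minus sign.
    numbers = re.findall(r"(?:^|-)(-?\d+)", dump)
    # Map signed values (-128..127) back to unsigned bytes (0..255).
    return bytes(int(n) & 0xFF for n in numbers)


sample = "0-75--56-120--80--53--102-79"
print(parse_dash_separated(sample).hex().upper())   # 004BC878B0CB9A4F
```

The result matches the first eight bytes of the hexadecimal value shown in the question.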
At this point, I think your best bet is to just restore the data from a database backup.
I have a question about a statement in one of my books.
Talking about key-indexed search in a symbol table, at a certain point it says: "If there aren't records (but only keys), we can use a bit table. In this case, the symbol table is called an existence table, because we can consider the k-th bit as an indicator of whether key k is in the table or not. For example, using a 313-word table on a 32-bit computer, we can use this method to quickly determine whether a given 4-digit internal telephone number was already assigned."
Well, I know what a word is, so in that case the existence table should be a 10,016-bit table. But what does that mean? What does the 4-digit telephone number have to do with it? And how can you implement a symbol table with key-indexed search when the records correspond to the keys?
There are 9000 four-digit numbers (in base 10, decimal), and 10000 (nonnegative) numbers with at most four digits, so a table with at least 10,000 bits is sufficient to indicate for each of these numbers whether it's present (is bit number n set or not?). For five-digit numbers - 90,000 of them - you'd need a larger table.
Since the bit-table can only tell you either "yes, we have it" or "no, we haven't", you can't use it if you need any information exceeding that. But if that's all you need to know, any injective mapping of keys to indices into the table (array) gives you access to that information, compactly stored. In the case of the telephone numbers, the mapping is trivial.
You can use a bit table of 10000 bits (each bit corresponding to a phone number), which fits in 313 32-bit words (10000/32 = 312.5, rounded up to 313).
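A minimal sketch of such an existence table in Python, assuming 32-bit words stored in a plain list and the phone number itself used as the bit index (the trivial mapping mentioned above). The names assign and is_assigned are invented for the example.

```python
WORD_BITS = 32
NUM_KEYS = 10_000                                     # numbers 0000..9999
NUM_WORDS = (NUM_KEYS + WORD_BITS - 1) // WORD_BITS   # rounds up to 313

table = [0] * NUM_WORDS                               # nothing assigned yet


def assign(number):
    """Set the bit that stands for this phone number."""
    table[number // WORD_BITS] |= 1 << (number % WORD_BITS)


def is_assigned(number):
    """Test the bit that stands for this phone number."""
    return bool((table[number // WORD_BITS] >> (number % WORD_BITS)) & 1)


assign(1234)
print(NUM_WORDS, is_assigned(1234), is_assigned(1235))   # 313 True False
```

The table answers only "assigned or not". Attaching a record to each number would need an array of records rather than bits.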
I need to multiply a number like 00000000001099 by 0.01 and then convert the result to two decimal places (e.g., 10.99 after multiplication) in a derived column in an SSIS package.
Right now I am using this expression, (dt_numeric,2,2)((DT_CY)((dt_wstr,14)PRICE) * 0.01), but it is failing.
I get the column PRICE with the value 00000000001099 from a flat file, and after the conversion I need to write the value back to a flat file.
Since your string is 14 characters long you cannot use DT_I4 - it'll just figure out that this is very wrong and give you the error about potential loss of data. You could edit the error handling and ignore possible truncations, but a better way is to use a datatype that can hold your number.
Your Derivation should look like this:
(DT_NUMERIC,X,2)(((DT_NUMERIC,X+2,2)([InputColumn]))*0.01)
In your example
(DT_NUMERIC,14,2)(((DT_NUMERIC,16,2)([PRICE]))*0.01)
The extra step with X+2,2 lets the numeric hold 99999999999999; you then divide by 100 (or multiply by 0.01) and cast back to the smallest possible numeric (X,2). You might want to use a bigger standardized numeric type: look at MSDN/BOL to see the storage requirements for each of them, and just pick the biggest type that takes the same number of bytes as your requirement.
This should work...
(DT_DECIMAL, 2 )(DT_WSTR, 20 )((DT_I4)@[User::Cost] * 0.01)
While the value 00000000001099 is a number, it cannot be represented this way in a numeric datatype. The leading zeros will be stripped. Because you are showing the number this way, I must presume it is stored in a string datatype. In the data flow before your derived column, I would recommend using the "Data Conversion" component to convert the string to a numeric type. Then, in the downstream derived column component, perform the multiplication to get the decimal point in the correct place.
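The two steps (convert the string to a number, then scale it) can be illustrated outside SSIS with a short Python sketch using exact decimal arithmetic. It only shows the intended result and is not an SSIS expression.

```python
from decimal import Decimal

raw = "00000000001099"                  # PRICE as read from the flat file

# Step 1: converting the string to a number drops the leading zeros.
# Step 2: multiplying by 0.01 moves the decimal point two places.
price = Decimal(raw) * Decimal("0.01")

# Keep exactly two decimal places for the output file.
formatted = str(price.quantize(Decimal("0.01")))
print(formatted)                        # 10.99
```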