Generate unique ID from string - sql-server

I am trying to take a text string and create a unique numerical value from it and I am not having any luck.
For example, I have user names (first and last) and birthdate. I have tried taking these values and converting them to varbinary, which does give me a numerical value from the data, but it isn't unique. Out of ~700 records, I will get at least 100 numerical values that are duplicated but the text of first name, last name, and birthdate that was used to generate the number is different.
Here is some code I have been trying:
SELECT CONVERT(VARCHAR(300), CONVERT(BIGINT,(CONVERT(VARBINARY, SE.FirstName) + CONVERT(VARBINARY, SE.BirthDate) ))) FROM ELIGIBILITY SE
If I use that code and convert the following data, the result is 3530884780910457344. So the same number is generated from this unique data:
David 12/03/1952
Janice 12/23/1952
Michael 03/24/1952
Mark 12/23/1952
I am looking for some way, the simpler the better, to take these values and generate a unique numerical value from that data. And the reason why I need to use these values as input is because I am trying to avoid creating duplicates in the future as well as be able to predict the numerical value based on the formula. This is why NewID() won't work for me.

How about simply:
SELECT CHECKSUM(name, BirthDate) FROM dbo.ELIGIBILITY;
Of course, since there is still a chance of collisions, maybe you should define more clearly what you are actually trying to do. You've stated some reasons why e.g. NEWID() won't work, but I still don't follow the underlying purpose of this unique number.
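As for why the original query collides: when SQL Server converts a binary value longer than 8 bytes to BIGINT, it keeps only the rightmost 8 bytes. Assuming BirthDate is stored as NVARCHAR text (an assumption, though the number quoted in the question matches it exactly), those 8 bytes are just the characters 1952, so every row with that birth year gets the same number. The Python sketch below reproduces the truncation and, as one possible alternative, derives a repeatable 63-bit value from a SHA-256 hash of all the fields. The function names and the last name "Doe" are hypothetical, and a truncated hash makes collisions very unlikely rather than impossible.

```python
import hashlib


def truncated_bigint(first_name: str, birth_date: str) -> int:
    """Mimic CONVERT(BIGINT, <varbinary>): only the last 8 bytes survive.

    Assumes both columns are NVARCHAR, i.e. stored as UTF-16LE bytes.
    """
    data = (first_name + birth_date).encode("utf-16-le")
    return int.from_bytes(data[-8:], "big", signed=True)


def stable_id(first_name: str, last_name: str, birth_date: str) -> int:
    """Hypothetical alternative: a 63-bit value taken from a SHA-256 digest."""
    # Normalise so that ' David' and 'david' produce the same key.
    parts = (first_name, last_name, birth_date)
    key = "\x1f".join(p.strip().lower() for p in parts)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Drop one bit so the result always fits a signed BIGINT.
    return int.from_bytes(digest[:8], "big") >> 1


rows = [("David", "12/03/1952"), ("Janice", "12/23/1952"),
        ("Michael", "03/24/1952"), ("Mark", "12/23/1952")]

for first, born in rows:
    print(first, truncated_bigint(first, born), stable_id(first, "Doe", born))
```

If a value like this is used as a key, a unique constraint on the column would still be needed to catch the rare collision.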

Related

Find Maximum of Array of Index/Match from Concatenated String

Given the following table:
I would like the Actual Start to show the Preferred Start value, if the Depends column is empty (easy).
If the Depends column contains one or more comma-separated Id values, I would like to split on comma, look up the array of "Preferred Start" values based on the corresponding Id value, and then select the maximum value.
The following formula will correctly split the "Depends" cell:
=FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")
Which can be verified, by using an array-valued MAX function (this returns "4"):
={MAX((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")))}
However, what I really want to do is:
={MAX(INDEX(Table1[Preferred Start],MATCH((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")),Table1[Id],0)))}
Somewhere along the way however, it loses the "arrayness", and simply returns the "Preferred Start" of the first Id number of the split (Id 3, 17 Jan 18).
Is what I'm trying to do even possible without resorting to VBA? I suspect I will run into a circular reference in actuality, since I really need to take the maximum of the "Actual Start" (adjusted for dependencies), to properly cascade a chain of dependent items.
Thanks

This is a known issue with INDEX: it is reluctant to return an array without some coercion. Generically, this should work:
=INDEX(range,N(IF(1,{array})))
so that becomes the following with your specific scenario
=MAX(INDEX(Table1[Preferred Start],N(IF(1,MATCH((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")),Table1[Id],0)))))
Confirm with CTRL+SHIFT+ENTER.
I assume that every row has a different ID number, because the MATCH function will only find the first match for each ID.
Alternatively, for a completely different approach, you can use the AGGREGATE function (with SEARCH instead of FILTERXML), which doesn't require "array entry" and returns the correct MAX even if IDs repeat, i.e.
=AGGREGATE(14,6,Table1[Preferred Start]/SIGN(SEARCH(","&Table1[Id]&",",","&G6&",")),1)
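The logic these formulas implement (split Depends on commas, look up each Preferred Start, take the latest) can also be sketched outside Excel. This is a minimal Python illustration only; the table contents and the function name are made up, apart from Id 3 starting on 17 Jan 18 as mentioned in the question.

```python
from datetime import date

# Hypothetical stand-in for Table1: Id -> Preferred Start.
preferred_start = {
    1: date(2018, 1, 8),
    3: date(2018, 1, 17),
    4: date(2018, 1, 29),
}


def actual_start(own_id: int, depends: str) -> date:
    """Return the latest Preferred Start among the dependencies.

    With an empty Depends cell, fall back to the row's own Preferred Start.
    """
    ids = [int(tok) for tok in depends.split(",") if tok.strip()]
    if not ids:
        return preferred_start[own_id]
    return max(preferred_start[i] for i in ids)


print(actual_start(1, ""))     # the row's own Preferred Start
print(actual_start(1, "3,4"))  # the later of Id 3 and Id 4
```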

Reorder the match to include the max in it:
=INDEX(Table1[Preferred Start],MATCH(MAX((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s"))),Table1[Id],0))
Enter as an array formula using Ctrl-Shift-Enter.

How to compare two strings in sql in natural order

I had a quick question.
I need to compare two strings in SQL in natural order. So if I have a string like ‘20091210’ and ‘20101213’ then the latter would be greater. The string could also contain alpha characters so ‘Y4550’ would be greater than ‘Y4500’. I tried using the CHECKSUM system function to convert the string to a hashed number but that isn’t giving me a number with regard to natural order.
Do you know of anything that I can use aside from making a CLR function?

If I'm understanding your question right, you want to compare two string columns in the same row, or a column with a variable. For that, you can simply use the < and > operators:
SELECT * FROM Users WHERE Username > 'Tom'
That will return any users whose username falls alphabetically after "Tom."
If we were talking about multiple records, ORDER BY will do the trick:
SELECT * FROM Users ORDER BY Username
That will sort users by their username, in ascending alphabetical order.
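Note that plain string comparison like this goes character by character, so it agrees with numeric order only when the digit runs have the same length, as in the examples from the question. A natural-order comparison instead treats each run of digits as a number. A rough Python sketch of that idea follows; natural_key is a made-up helper, and 'Y450' and 'Y1000' are illustrative values that do not come from the question.

```python
import re


def natural_key(text: str) -> list:
    """Build a sort key in which digit runs compare as integers."""
    # re.split with a capture group alternates text, digits, text, ...
    parts = re.split(r"([0-9]+)", text)
    return [
        (0, int(part), "") if index % 2 else (1, 0, part.lower())
        for index, part in enumerate(parts)
        if part
    ]


# Same-length values: plain and natural comparison agree.
print("20091210" < "20101213", natural_key("20091210") < natural_key("20101213"))
print("Y4500" < "Y4550", natural_key("Y4500") < natural_key("Y4550"))

# Different-length digit runs: the two orderings disagree.
print("Y450" < "Y1000", natural_key("Y450") < natural_key("Y1000"))
```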

Using a string key to return a value from an array

I have a named array of 14 rows by 2 columns. The first column holds a string key (e.g. Country), and the second an attribute (e.g. Owner). I want to retrieve the Owner by supplying the Country.
I only know how to use =INDEX to retrieve values from named arrays, but that expects col/row numbers.
How might I achieve my requirement?

For the sake of an answer.
Feed the INDEX function with a MATCH function to provide the requisite row number, along the lines:
=INDEX(B:B,MATCH(A2,A:A,0))
VLOOKUP will work, but INDEX/MATCH is more powerful, so if you are already comfortable with INDEX it might be better to add MATCH to your arsenal rather than to bother with V/H LOOKUP.
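For readers who think in code, the two-step lookup can be pictured as follows: MATCH finds the position of the key, and INDEX returns the value at that position. This is a small Python analogy with invented country and owner names, not a description of how Excel works internally.

```python
# Hypothetical two-column named array, split into its columns.
countries = ["France", "Japan", "Brazil"]  # column A: keys
owners = ["Alice", "Bob", "Carol"]         # column B: attributes


def lookup_owner(country: str) -> str:
    row = countries.index(country)  # like MATCH(key, A:A, 0): exact-match position
    return owners[row]              # like INDEX(B:B, row): value at that position


print(lookup_owner("Japan"))
```

Where MATCH returns #N/A for a missing key, list.index raises ValueError instead.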

How to get array from non-array field for INSERT command?

I'm trying to use a SELECT statement together with an INSERT INTO command. Everything would work fine if it weren't for one small problem: some fields of the table are defined as real[], but my input is numeric. Thus, the question:
Is there a function in PostgreSQL to create out of the single numeric input an array of type real (with just one element)?
My setting looks like this:
tempLogTable(..., logValue NUMERIC, ...)
finalLogTable(..., logValues REAL[], ...)
The idea is to insert the tuples from the tempLogTable to the finalLogTable using INSERT INTO ... SELECT .... Unfortunately, because of various reasons the data types are given and I would not like to change these for the moment (not to break anything).
I'm using PostgreSQL 9.2.

SELECT ARRAY[thenumeric::real] FROM the_table;
or
SELECT ARRAY[thenumeric]::real[] FROM the_table;
They're not really any different for a one-element array.
real has limits that numeric doesn't. In particular, comparing real values for equality doesn't work reliably; you should instead check whether the two values differ by less than a small (somewhat task-specific) amount. real also can't represent values as large or as small as numeric can. See the floating point guide, among other material on comparing floats. Doing this right is much harder when the values are wrapped in arrays.
For the purpose you describe, where it sounds like you are just collecting stats or historical data, that isn't going to be a problem. It usually only turns out to be an issue where people try to write:
WHERE some_real = some_other_real
which will result in surprising and unexpected behaviour.
You should be fine with an INSERT INTO ... SELECT as described.
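To see why equality on real values is fragile, here is a small Python sketch that rounds a value to 4-byte single precision (the format PostgreSQL uses for real) and compares it with the original. The helper names and the tolerance of 1e-6 are arbitrary choices for illustration.

```python
import math
import struct
from decimal import Decimal


def to_real(value) -> float:
    """Round a number to IEEE 754 single precision (4 bytes)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def as_real_array(value) -> list:
    """Rough analogue of ARRAY[thenumeric::real]: a one-element list."""
    return [to_real(value)]


log_value = Decimal("0.1")
stored = as_real_array(log_value)

print(stored[0] == float(log_value))                            # exact equality
print(math.isclose(stored[0], float(log_value), abs_tol=1e-6))  # tolerance check
```

Values that are exactly representable in binary, such as 0.5, survive the round trip unchanged, which is part of why the problem tends to show up only intermittently.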

Dropping Leading Zeros

I have a form that records a student ID number. Some of those numbers contain a leading zero. When the number gets recorded into the database it drops the leading 0.
The field is set up to only accept numbers. The length of the student ID varies.
I need the field to be recorded and displayed with the leading zero.

If you are always going to have a number of a certain length (say, it will always be 10 characters), then you can just get the length of the number in the database (after it is converted to a string) and then add the appropriate 0's.
However, if this is an arbitrary amount of leading zeros, then you will have to store the content as a string in the database so you can capture the leading zeros.

It sounds like this should be stored as string data. It sounds like the leading zeros are part of the data itself, not just part of its formatting.
You could reformat the data for display with the leading zeros in it; however, I believe you should store the correct form of the ID number. It will lead to fewer bugs down the road (e.g. you format it in one place but forget to in another).

There are a few ways of doing this, depending on the answers to my comments on your question:
1. Store the extra data in the database by converting the datatype from numeric to varchar/string.
Advantages: Very simple to implement; you can treat all the values in the same way.
Disadvantages: If you've got very large amounts of data, storage sizes will escalate; indexing and sorting on strings doesn't perform so well.
Use if: Each number may have an arbitrary length (and hence number of zeros).
Don't use if: You're going to be spending a lot of time sorting data. Sorting numeric strings is a pain in the ass; look up natural sorting to see some of the pitfalls.
2. Continue to store the data in the database as numeric, but pad the numeric back to a set length (i.e. 10 as I have suggested in my example below).
Advantages: Data will index better, search better, and not require such large amounts of storage if you've got large amounts of data.
Disadvantages: Every query or display of data will require every data instance to be padded to the correct length, causing a slight performance hit.
Use if: All the output numbers will be the same length (i.e. including zeros they're all [for example] 10 digits); large amounts of sorting will be necessary.
3. Add a field to your table to store the original length of the numeric. Continue to store the value as numeric (to leverage the sorting/indexing performance gains of numeric vs. string), and in the new field store the length including the significant zeros.
Advantages: Reduction in required storage space; maximum use of indexing; sorting of numerics is far easier than sorting text numerics; you still get the ability to pad numerics to arbitrary lengths, like you have with option 1.
Disadvantages: An extra field is required in your database, so all your queries will have to pull that extra field, potentially requiring a slight increase in resources at query/display time.
Use if: Storage space/indexing/sorting performance is any sort of concern.
Don't use if: You don't have the luxury of changing the table structure to include the extra value; this will overcomplicate already complex queries.
If I were you and I had access to modify the db structure slightly, I'd go with option 3. Sure, you need to pull out an extra field to get the length, but the slightly increased complexity pays huge dividends in the advantages versus the disadvantages. The performance hit of padding the string back out to the correct length will be far outweighed by the gains in indexing performance and storage space.
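To make the idea of storing the value and its original length concrete, here is a minimal Python sketch of the round trip. The function names are hypothetical, and it assumes the ID consists of digits only.

```python
def split_id(raw: str) -> tuple:
    """Return (numeric value, original length) as they would be stored."""
    return int(raw), len(raw)


def restore_id(value: int, length: int) -> str:
    """Rebuild the displayed ID by left-padding with zeros to the stored length."""
    return str(value).zfill(length)


value, length = split_id("00123")
print(value, length)              # the two stored fields
print(restore_id(value, length))  # the ID as originally entered
```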

I worked with a database with a similar problem. They were storing zip codes as a number. The consequence was that people in New Jersey couldn't use our app.
You're using data that is logically a text string and not a number. It just happens to look like a number, but you really need to treat it as text. Use a text-oriented data type, or at least create a database view that enables you to pull back a properly formatted value for this.

See here: Pad or remove leading zeroes from numbers

declare @recordNumber integer;
set @recordNumber = 93088;
declare @padZeroes integer;
set @padZeroes = 8;
-- left-pad with zeros, then keep the rightmost @padZeroes characters
select
right( replicate('0',@padZeroes)
+ convert(varchar,@recordNumber), @padZeroes);

Unless you intend to do calculations on that ID, it's probably best to store it as text/string.

Another option: since the field is an id, I would recommend creating a secondary field for the display number (nvarchar) that you can use for reports, etc.
Then in your application when the student id is entered you can insert that into the database as the number, as well as the display number.

An Oracle solution
Store the ID as a number and convert it into a character for display. For instance, to display 42 as a zero-padded, three-character string:
SQL> select to_char(42, '099') from dual;
042
Change the format string to fit your needs.
(I don't know if this is transferable to other SQL flavors, however.)

You could just concatenate '1' to the beginning of the ID when storing it in the database. When retrieving it, treat it as a string and remove the first char.
MySQL Example:
SET @student_id = '123456789';
INSERT INTO student_table (id,name) VALUES(CONCAT('1',@student_id),'John Smith');
...
SELECT SUBSTRING(id,2) FROM student_table;
Mathematically:
Initially I overthought it and did it mathematically, by adding an integer to the student ID, depending on its length (like 1,000,000,000 if it's 9 digits), before storing it.
SET @new_student_id = ABS(@student_id) + POW(10, CHAR_LENGTH(@student_id));
INSERT INTO student_table (id,name) VALUES(@new_student_id,'John Smith');
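Both variants produce the same stored number, which a short Python sketch can confirm. The function names are invented for illustration, and the ID is assumed to contain digits only.

```python
def encode_with_prefix(student_id: str) -> int:
    """String variant: put a '1' in front, then store as a number."""
    return int("1" + student_id)


def encode_with_power(student_id: str) -> int:
    """Arithmetic variant: add 10 to the power of the ID's length."""
    return int(student_id) + 10 ** len(student_id)


def decode(stored: int) -> str:
    """Drop the leading sentinel digit to recover the original ID."""
    return str(stored)[1:]


stored = encode_with_prefix("00123")
print(stored, encode_with_power("00123"), decode(stored))
```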
