I have a customer table, and my requirement is to add a new varchar column that automatically obtains a random unique value each time a new customer is created.
I thought of writing an SP that randomizes a string, then check and re-generate if the string already exists. But to integrate the SP into the customer record creation process would require transactional SQL stuff at code level, which I'd like to avoid.
Help please?
edit:
I should've emphasized, the varchar has to be 5 characters long with numeric values between 1000 and 99999, and if the number is less than 10000, pad 0 on the left.
if it has to be varchar, you can cast a uniqueidentifier to varchar.
to get a random uniqueidentifier do NewId()
here's how you cast it:
CAST(NewId() as varchar(36))
EDIT
as per your comment to #Brannon:
are you saying you'll NEVER have over 99k records in the table? if so, just make your PK an identity column, seed it with 1000, and take care of "0" left padding in your business logic.
This question gives me the same feeling I get when users won't tell me what they want done, or why, they only want to tell me how to do it.
"Random" and "Unique" are conflicting requirements unless you create a serial list and then choose randomly from it, deleting the chosen value.
But what's the problem this is intended to solve?
With your edit/update, sounds like what you need is an auto-increment and some padding.
Below is an approach that uses a bogus table, then adds an IDENTITY column (assuming that you don't have one) which starts at 1000, and then which uses a Computed Column to give you some padding to make everything work out as you requested.
CREATE TABLE Customers (
CustomerName varchar(20) NOT NULL
)
GO
INSERT INTO Customers
SELECT 'Bob Thomas' UNION
SELECT 'Dave Winchel' UNION
SELECT 'Nancy Davolio' UNION
SELECT 'Saded Khan'
GO
ALTER TABLE Customers
ADD CustomerId int IDENTITY(1000,1) NOT NULL
GO
ALTER TABLE Customers
ADD SuperId AS right(replicate('0',5)+ CAST(CustomerId as varchar(5)),5)
GO
SELECT * FROM Customers
GO
DROP TABLE Customers
GO
I think Michael's answer with the auto-increment should work well - your customer will get "01000" and then "01001" and then "01002" and so forth.
If you want to or have to make it more random, in this case, I'd suggest you create a table that contains all possible values, from "01000" through "99999". When you insert a new customer, use a technique (e.g. randomization) to pick one of the existing rows from that table (your pool of still available customer ID's), and use it, and remove it from the table.
Anything else will become really bad over time. Imagine you've used up 90% or 95% of your available customer ID's - trying to randomly find one of the few remaining possibility could lead to an almost endless retry of "is this one taken? Yes -> try a next one".
Marc
Does the random string data need to be a certain format? If not, why not use a uniqueidentifier?
insert into Customer ([Name], [UniqueValue]) values (#Name, NEWID())
Or use NEWID() as the default value of the column.
EDIT:
I agree with #rm, use a numeric value in your database, and handle the conversion to string (with padding, etc) in code.
Try this:
ALTER TABLE Customer ADD AVarcharColumn varchar(50)
CONSTRAINT DF_Customer_AVarcharColumn DEFAULT CONVERT(varchar(50), GETDATE(), 109)
It returns a date and time up to milliseconds, wich would be enough in most cases.
Do you really need an unique value?
Related
I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The data value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison since one will still remain VARCHAR and the other is INT. If possible through a T-SQL query to convert on the fly that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN
(SELECT intField
FROM TableB
) as B
ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed.
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a Stored Procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or even replacinga a field with a simple CASE statement) "behind the scenes" and masking it so that any code outside of your control doesn't know that a change happened, not nearly as difficult as most people might think. I have done all of these operations (datatype change, move a field to a different table, replace a field with simple logic, etc, etc) and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. Ok. It will be there waiting for them when they are ready. Any if there are several areas to update, then they can be done one at a time. You can even create new stored procedures to run in parallel for any updated app code to have the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, then delete the original versions of those stored procedures.
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
from t2
where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
);
This should be close enough to except for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
You could create a persisted view on the table with the char column, with a calculated column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMA_BINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
--, other columns including the PK, etc
FROM dbo.TablewithCharColumn;
CREATE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
PCT_ID INT NOT NULL
CONSTRAINT PK_PCT
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, SomeChar VARCHAR(50) NOT NULL
, SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:
I have a table called ComplaintCodes which contains about 15 rows and 2 columns: ComplaintCodeId and ComplaintCodeText.
I want to insert a new row into that table but have its ID set to 1 which will also add 1 to all of the ID's that exist already. Is this possible?
EDIT
Using SQL Server and ComplaintCodeId is an identity / PK column
It's possible as two separate DML statements, an UPDATE to update the ID and a subsequent INSERT. But this will fail if you are using the ID as a foreign key in another table of course, so you'd need to find a way to update across all related tables.
Why would you want to do this though? Suggest you take a step back and reconsider the design decision that has brought you to this question.
And yes, as podiluska says in his/her(/its!) comment, please specify which DBMS you are using in your question and/or tags.
update <table> set ComplaintCodeId =ComplaintCodeId +1
insert into <table>
select 1,'other column'
Edit:
If its a PK+Identity column, then its a very bad idea to do like this. You cannot update an identity column..
Instead of updating you could do something like this:
select row_number() over (order by ComplaintCodeId desc) as row_num,
ComplaintCodeId
from table
and use row_num instead of ComplaintCodeId
After some thought, it seems to me that the best solution to your problem is to change the PK to be non-identity. Then you can set the value to whatever you'd like.
I still think that using a Display Order column (which is the only reason I can think you'd care the order in the table) would be a fine solution, but if you really want the PK order to be the display order, then changing the PK to non-identity would be a good long-term solution as you wouldn't have these problems in the future.
I'm in the process of writing a Membership Provider for use with our existing membership base. I use EF4.1 for all of my database access and one of the issued that I'm running into is when the DB was originally setup the relationships were done programmatically instead of in the db. One if the relationships needs to be made on a column that isn't required for all of our users, but in order to make the relationships does need to be unique (from my understanding).
My solution that I believe will work is to do an MD5 hash on the userid field (which is unique ...which would/should guarantee a unique value in that field). The part that I'm having issues with on sql server is the query that would do this WITHOUT replacing the existing values stored in the employeeNum field (the one in question).
So in a nutshell my question is. What is the best way to get a unique value in the employeeNum field (possibly based on an md5 hash of the userid field) on all the rows in which a value isn't already present. Also, to a minor/major extent...does this sound like a good plan?
If your question is just how to generate a hash value for userid, you can do it this way using a computed column (or generate this value as part of the insert process). It isn't clear to me whether you know about the HASHBYTES function or what other criteria you're looking at when you say "best."
DECLARE #foo TABLE
(
userid INT,
hash1 AS HASHBYTES('MD5', CONVERT(VARCHAR(12), userid)),
hash2 AS HASHBYTES('SHA1', CONVERT(VARCHAR(12), userid))
);
INSERT #foo(userid) SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 500;
SELECT userid, hash1, hash2 FROM #foo;
Results:
userid hash1 hash2
------ ---------------------------------- ------------------------------------------
1 0xC4CA4238A0B923820DCC509A6F75849B 0x356A192B7913B04C54574D18C28D46E6395428AB
2 0xC81E728D9D4C2F636F067F89CC14862C 0xDA4B9237BACCCDF19C0760CAB7AEC4A8359010B0
500 0xCEE631121C2EC9232F3A2F028AD5C89B 0xF83A383C0FA81F295D057F8F5ED0BA4610947817
In SQL Server 2012, I highly recommend at least SHA2_256 instead of either of the above. (You forgot to mention what version you're using - always useful information.)
All that said, I still want to call attention to the point I made in the comments: the "best" solution here is to fix the model. If employeeNum is optional, EF shouldn't be made to think it is required or unique, and it shouldn't be used in relationships if it is not, in fact, some kind of identifier. Why would a user care about collisions between employeeNum and userid if you're using the right attribute for the relationship in the first place?
EDIT as requested by OP
So what is wrong with saying UPDATE table SET EmployeeNum = 1000000 + UserID WHERE EmployeeNum IS NULL? If EmployeeNum will stay below 1000000 then you've guaranteed no collisions and you've avoided hashing altogether.
You could generate similar padding if employeeNum might contain a string, but again is it EF that promotes these horrible column names? Why would a column with a Num suffix contain anything but a number?
You could also use a uniqueidentifier setting the default value to (newid())
Create a new column EmployeeNum as uniqueidentifer, then:
UPDATE Employees SET EmployeeNum = newid()
Then set as primary key.
UPDATE EMPLOYEE
SET EMPLOYEENUM = HASHBYTES('SHA1', CAST(USERID AS VARCHAR(20)))
WHERE EMPLOYEENUM IS NULL
I have the following table:
tbl_ProductCatg
Id IDENTITY
Code
Description
a few more.
Id field is auto-incremented and I have to insert this field value in Code field.
i.e. if Id generated is 1 then in Code field the value should be inserted like 0001(formatted for having length of four),if id is 77 Code should be 0077.
For this, I made the query like:
insert into tbl_ProductCatg(Code,Description)
values(RIGHT('000'+ltrim(Str(SCOPE_IDENTITY()+1,4)),4),'testing')
This query runs well in sql server query analyzer but if I write this in C# then it insets Null in Code even Id field is updated well.
Thanks
You may want to look at Computed Columns (Definition)
From what is sounds like you are trying to do, this would work well for you.
CREATE TABLE tbl_ProductCatg
(
ID INT IDENTITY(1, 1)
, Code AS RIGHT('000' + CAST(ID AS VARCHAR(4)), 4)
, Description NVARCHAR(128)
)
or
ALTER TABLE tbl_ProductCatg
ADD Code AS RIGHT('000' + CAST(id AS VARCHAR(4)), 4)
You can also make the column be PERSISTED so it is not calculated every time it is referenced.
Marking a column as PERSISTED Specifies that the Database Engine will physically store the computed values in the table, and update the values when any other columns on which the computed column depends are updated.
Unfortunately SCOPE_IDENTITY isn't designed to be used during an insert so the value will not be populated until after the insert happens.
The three solutions I can see of doing this would be either making a stored procedure to generate the scope identity and then do an update of the field.
insert into tbl_ProductCatg(Description) values(NULL,'testing')
update tbl_ProductCatg SET code=RIGHT('000'+ltrim(Str(SCOPE_IDENTITY()+1,4)),4) WHERE id=SCOPE_IDENTITY()
The second option, is taking this a step further and making this into a trigger which runs on UPDATE and INSERT. I've always been taught to avoid triggers where possible and instead do things at the SP level, but triggers are justified in some cases.
The third option is computed fields, as described by #Adam Wenger
I want to create a PIN that is unique within a table but not incremental to make it harder for people to guess.
Ideally I'd like to be able to create this within SQL Server but I can do it via ASP.Net if needed.
EDIT
Sorry if I wasn't clear: I'm not looking for a GUID as all I need is a unique id for that table; I just don't want it to be incremental.
Add a uniqueidentifier column to your table, with a default value of NEWID(). This will ensure that each column gets a new unique identifier, which is not incremental.
CREATE TABLE MyTable (
...
PIN uniqueidentifier NOT NULL DEFAULT newid()
...
)
The uniqueidentifier is guaranteed to be unique, not just for this table, but for all tables.
If it's too large for your application, you can derive a smaller PIN from this number, you can do this like:
SELECT RIGHT(REPLACE((SELECT PIN from MyTable WHERE UserID=...), '-', ''), 4/*PinLength*/)
Note that the returned smaller PIN is not guaranteed to be unique for all users, but may be more manageable, depending upon your application.
EDIT: If you want a small PIN, with guaranteed uniqueness, the tricky part is that you need to know at least the maximum number of users, in order to choose the appropriate size of the pin. As the number of users increases, the chances of a PIN collision increases. This is similar to the Coupon Collector's problem, and approaches n log n complexity, which will cause very slow inserts (insert time proportional to the number of existing elements, so inserting M items then becomes O(N^2)). The simplest way to avoid this is to use a large unique ID, and select only a portion of that for your PIN, assuming that you can forgo uniqueness of PIN values.
EDIT2:
If you have a table definition like this
CREATE TABLE YourTable (
[id] [int] IDENTITY(1,1) NOT NULL,
[pin] AS (CONVERT(varchar(9),id,0)+RIGHT(pinseed,3)) PERSISTED,
[pinseed] [uniqueidentifier] NOT NULL
)
This will create the pin from the pinseed a unique ID and the row id. (RAND does not work - since SQL server will use the same value to initialize multiple rows, this is not the case with NEWID())
Just so that it is said, I advise that you do not consider this in any way secure. You should consider it always possible that another user could guess someone else's PIN, unless you somehow limit the number of allowed guesses (e.g. stop accepting requests after 3 attempts, similar to a bank witholding your card after 3 incorrect PIN entries.)
What you want is a GUID
http://en.wikipedia.org/wiki/Globally_unique_identifier
Most languages have some sort of API for generating this... a google search will help ;)
How about a UNIQUEIDENTIFIER type column with a default value of NEWID()?
That will generate a new GUID for each row.
Please have in mind that by requiring an unique PIN (which is uncommon) you will be limiting the max number of allowed users to the the PIN specification. Are you sure you want this ?
A not very elegant solution but which works is to use an UNIQUE field, and then loop attempting to insert a random generated PIN until the insert is successful.
You can use the following to generate a BIGINT, or other datatype.
SELECT CAST(ABS(CHECKSUM(NEWID()))%2000000000+1 as BIGINT) as [PIN]
This creates a number between 1 and 2 billion. You will simulate some level of randomness since it's derived from the NEWID function. You can also format the result as you wish.
This doesn't guarantee uniqueness. I suggest that you use a unique constraint on the PIN column. And, your code that creates the new PIN should check that the new value is unique before it assigns the value.
Use a random number.
SET #uid = ROUND(RAND() * 100000)
The more sparse your values are in the table, the better this works. If the number of assigned values gets large is relationship to the number of available values, it does not work as well.
Once the number is generated you have a couple of options.
1) INSERT the value inside of a retry loop. If you get a dupe error, regenerate the value (or try the value +/-1) and try again.
2) Generate the value and look for the MAX and MIN existing unique identifiers.
DECLARE
#uid INTEGER
SET #uid = ROUND(RAND() * 10000, 1)
SELECT #uid
SELECT MAX(uid) FROM table1 WHERE uid < #uid
SELECT MIN(uid) FROM table1 WHERE uid > #uid
The MIN and MAX value give you a range of available values to work from if the random value is already assigned.