How to generate a serial STRING primary key in a database

My colleagues don't like the int serial numbers auto-generated by the database and want to use a string primary key like:
"camera0001"
"camera0002"
As cameras may be deleted, I cannot use "total number of cameras + 1" as the id of a new camera.
If you were me, how would you generate this kind of key in your program?
PS: I think an auto-generated serial number as primary key is OK; I just don't like arguing with my colleagues.

Don't do it like "camera0001"! Argue it out; that is a horrible design mistake.
Try one of these:
http://en.wikipedia.org/wiki/Database_normalization
http://www.datamodel.org/NormalizationRules.html
Or just google: database normalization
Each column in a database should contain only one piece of information. Keep the ID and the type in different columns. You can display them together if you wish, but do not store them together! You would have to split them constantly, which makes simple queries difficult. The string will also take more space on disk and in cache memory, and if it is used as an FK it will waste space there too.
Have a pure numeric auto-generated ID column and a type column that is a foreign key to a table that contains a description, like:
    Table1
      YourID    int      auto id PK
      YourType  char(1)  FK

    TypeTable
      YourType     char(1)       PK
      Description  varchar(100)

    Table1
      YourID  YourType  YourData....
      1       C         xyz
      2       C         abc
      3       R         dffd
      4       C         fg

    TypeTable
      YourType  Description
      C         Camera
      R         Radio
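To make the "display them together, store them apart" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the layout above; the lower-cased, zero-padded label format and the `display_ids` helper are assumptions for illustration, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TypeTable (
        YourType    TEXT PRIMARY KEY,      -- e.g. 'C'
        Description TEXT NOT NULL          -- e.g. 'Camera'
    );
    CREATE TABLE Table1 (
        YourID   INTEGER PRIMARY KEY AUTOINCREMENT,
        YourType TEXT NOT NULL REFERENCES TypeTable (YourType),
        YourData TEXT
    );
    INSERT INTO TypeTable VALUES ('C', 'Camera'), ('R', 'Radio');
    INSERT INTO Table1 (YourType, YourData)
    VALUES ('C', 'xyz'), ('C', 'abc'), ('R', 'dffd'), ('C', 'fg');
""")


def display_ids(conn):
    """Build labels such as 'camera0001' at query time; nothing is stored."""
    rows = conn.execute("""
        SELECT lower(t.Description) || printf('%04d', x.YourID)
        FROM Table1 AS x
        JOIN TypeTable AS t ON t.YourType = x.YourType
        ORDER BY x.YourID
    """)
    return [label for (label,) in rows]


labels = display_ids(conn)
```

The key stays a plain integer, and the colleagues still get their "camera0001"-style string wherever it is shown.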

I don't agree that a sequence number is always the best key. When there is a natural primary key available, I prefer it to a sequence number. If, say, your cameras are identified by some reasonably short model name or code, like you identify your "Super Duper Professional Camera Model 3" as "SDPC3" in the catalog and all, that "SDPC3" would, in my opinion, be an excellent choice for a primary key.
But that doesn't sound like what your colleagues want to do here. They want to take a product category, "camera", that of course no one expects to be unique, and then make it unique by tacking on a sequence number. This gives you the worst of both worlds: it's hard to generate, it's a long string, which makes it slower to process, and it's still meaningless. No one is going to remember that "camera0002904" is the 3 megapixel camera with the blue case while "camera0002905" is the 4 megapixel camera with the red case. No one is going to consistently remember that sort of thing, anyway. So you're not going to use these values as useful display values to the user.
If you are absolutely forced to do something like this, I'd say make two fields: one for the category, and one for the sequence number. If they want them concatenated together for some display, fine. Preferably make the sequence number unique across categories so it can be the primary key by itself, but if necessary you can assign sequence numbers within the category. MySQL will do this automatically (with MyISAM tables, when the AUTO_INCREMENT column is the second column of a composite key); most databases will require you to write some code to do it. (Post again if you want discussion on how.) Oh, and I wouldn't have anyone type in "camera" for the category. This should be a look-up table of legal values, and then you post the primary key of this look-up table into the product record. Otherwise you're going to have "camera" and "Camera" and "camrea" and dozens of other typos and variations.

Have a table with your serial number counters, increment it and insert your record.
OR
Set the Id to 'camera' + PAD((RIGHT(MAX(ID), 4) + 1), '0', 4)
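The first option (a counter table) can be sketched as follows, again with Python's sqlite3. The `serial_counter` and `camera` tables and the `insert_camera` helper are hypothetical names. The counter update and the insert share one transaction, and a deleted id is never handed out again because the counter only moves forward. The MAX(ID) variant, by contrast, can hand the same id to two concurrent sessions unless the table is locked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE serial_counter (
        prefix     TEXT PRIMARY KEY,
        last_value INTEGER NOT NULL
    );
    CREATE TABLE camera (
        id   TEXT PRIMARY KEY,
        name TEXT
    );
    INSERT INTO serial_counter VALUES ('camera', 0);
""")


def insert_camera(conn, name, prefix="camera"):
    """Bump the counter and insert the row in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE serial_counter SET last_value = last_value + 1"
            " WHERE prefix = ?",
            (prefix,),
        )
        (n,) = conn.execute(
            "SELECT last_value FROM serial_counter WHERE prefix = ?",
            (prefix,),
        ).fetchone()
        new_id = f"{prefix}{n:04d}"
        conn.execute(
            "INSERT INTO camera (id, name) VALUES (?, ?)", (new_id, name)
        )
    return new_id


first = insert_camera(conn, "lobby")
second = insert_camera(conn, "garage")
with conn:
    conn.execute("DELETE FROM camera WHERE id = ?", (second,))
third = insert_camera(conn, "roof")  # the deleted id is not reused
```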

Related

Trigger before insert to create a primary key from name and surname

I need to generate a char(3) primary key for my database table "People" from the name and surname. I inherited an old database and I have to replicate this id. For example, from 'John' 'Smith' I would like to generate the id 'JOS'; then, if there are two 'John Smith' rows, it should go on like 'JSM', and so on.
So I thought I could probably do this in a trigger before insert. Is that really possible? If yes, is it the best way to do it? How do I set up the trigger?
    CREATE OR REPLACE TRIGGER cacciatore_bef_ins
    BEFORE INSERT ON eddo.CACCIATORI
    FOR EACH ROW
    DECLARE
      temp CHAR(3);
      pos  NUMBER := 1;
    BEGIN
      -- genera_cod and esiste are my own functions (not shown here)
      temp := genera_cod(:new.nome, :new.cognome, pos);
      WHILE esiste(temp) LOOP
        pos  := pos + 1;  -- move on to the next candidate code
        temp := genera_cod(:new.nome, :new.cognome, pos);
      END LOOP;
      :new.codcacciatore := temp;
    END cacciatore_bef_ins;
    /
"I inherited an old database and I have to replicate this id."
Says who? The fact that you inherited something bad doesn't mean that you have to make it worse. Just out of curiosity: what would the 3rd John Smith's primary key value be? What about Joan Smadden? She could be yet another JOS, as well as JSM.
I'd suggest that you
- keep the "old" values
- keep the column datatype
  - it is difficult to change a primary key's datatype and values because of foreign key constraints. If there are none, consider switching to something easier to maintain
  - CHAR(3) suggests that you don't expect many rows in this table. There are (in the English alphabet) 26^3 = 17,576 possible combinations. That's probably enough for what you store, but you might run out of combinations for those "John Smith" variations. How do you find an available primary key? That depends on the algorithm you use (I'm not sure which one it is, based on what you described and my first question)
- finally, see whether you can use something "simpler", e.g.
  - all digits (from 000 to 999), then
  - a letter + 2 digits (e.g. A00, A01, ..., A99, B00, B01, ...)
  - and so on
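A rough Python sketch of that "simpler" scheme, under the assumption that it stops after the letter + 2 digits stage (the answer leaves "and so on" open). `candidate_codes` and `next_free_code` are hypothetical helpers; in a real system the set of used codes would come from the table itself.

```python
from string import ascii_uppercase


def candidate_codes():
    """Yield 3-character codes: 000..999 first, then A00..Z99."""
    for n in range(1000):
        yield f"{n:03d}"
    for letter in ascii_uppercase:
        for n in range(100):
            yield f"{letter}{n:02d}"


def next_free_code(used):
    """Return the first candidate that is not already taken."""
    for code in candidate_codes():
        if code not in used:
            return code
    raise RuntimeError("code space exhausted")


letters_only = 26 ** 3                          # three-letter combinations
two_stage_total = sum(1 for _ in candidate_codes())
```

These two stages give 1,000 + 2,600 = 3,600 codes, compared with 17,576 for three arbitrary letters, so the simpler scheme trades capacity for predictability.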
A bad idea is still a bad idea even if the data volume is small. The fact that this is a conversion from a manual system is all the more reason to change; everything else will be different anyway. Processes which work reasonably well manually can fail miserably when automated. This is a case in point.
With a manual system, reducing to 3 letters is a good, effective, efficient process. It gets me to the correct section of the file cabinet very quickly. From there I have only a few files to scan in order to locate the exact one needed.
That is manually.
In an automated system it is the exact opposite: ineffective, inefficient, and limited. I can guess with a high level of confidence that your 3-letter code derives from the first letters of the first, middle, and last names of the individual. (It could of course be derived from something else, but let's go with it anyway.) This is both natural and intuitive to a person, and if there is more than one John O. Smith (JOS) or Joan Ophelia Saunders (JOS), then so what: there are other means (data points) to identify the exact file. Not so in an automated system. I have a hard time imagining that anyone would think of OOS as an indication for Joan Ophelia Saunders. But in an automated system it is a very simple algorithmic change.
If you think the 3-char code lookup really needs to be maintained, and that is not a bad assumption, then add it as a column and create a non-unique index. When the user enters 'JOS', provide a list of qualifying names and maybe another data point (like address) to choose from, just like a manual file cabinet. Then you do not need to try making them unique; indeed, you do not want them unique.
For your primary key, however, just use a generated integer (sequence). BTW, hint of the day: make the 3-letter code column varchar2(3), not char(3).
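A minimal sketch of that layout, using Python's sqlite3 rather than Oracle so it can be run anywhere. The `people` table, its columns and the sample rows are invented for illustration: the key is a generated integer, the legacy code is an ordinary indexed column, and a lookup returns every match for the user to choose from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id      INTEGER PRIMARY KEY,   -- generated key, carries no meaning
        code    TEXT NOT NULL,         -- legacy 3-letter code, NOT unique
        name    TEXT NOT NULL,
        surname TEXT NOT NULL,
        address TEXT
    );
    CREATE INDEX people_code_idx ON people (code);
    INSERT INTO people (code, name, surname, address) VALUES
        ('JOS', 'John',  'Smith',    '1 Main St'),
        ('JOS', 'Joan',  'Saunders', '9 Elm Rd'),
        ('MAR', 'Mario', 'Rossi',    '3 Via Roma');
""")


def lookup(conn, code):
    """Return all people filed under a code, like one drawer of the cabinet."""
    rows = conn.execute(
        "SELECT id, name, surname, address FROM people"
        " WHERE code = ? ORDER BY id",
        (code,),
    )
    return rows.fetchall()


matches = lookup(conn, "JOS")
```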

Single Big SQL Server lookup table

I have a SQL Server 2008 database with a snowflake-style schema, so lots of different lookup tables, like Language, Countries, States, Status, etc. All these lookup tables have almost identical structures: two columns, Code and Decode. My project manager would like all of these different tables to be one BIG table, so I would need another column, say CodeCategory, and my primary key columns for this big table would be CodeCategory and Code. The problem is that for any of the tables that have the actual code (say Language Code), I cannot establish a foreign key relationship into this big decode table, as the CodeCategory would not be in the fact table, just the code. And codes by themselves will not be unique (they will be within a CodeCategory), so I cannot make an FK from just the fact table code field into the big lookup table's Code field.
So am I missing something, or is this impossible to do and still be able to do FKs in the related tables? I wish I could do this: have a FK where one of the columns I was matching to in the lookup table would match to a string constant. Like this (I know this is impossible but it gives you an idea what I want to do):
ALTER TABLE [dbo].[Users] WITH CHECK ADD CONSTRAINT [FK_User_AppCodes]
FOREIGN KEY('Language', [LanguageCode])
REFERENCES [dbo].[AppCodes] ([AppCodeCategory], [AppCode])
The above does not work, but if it did I would have the FK I need. Where I have the string 'Language', is there any way in T-SQL to substitute the table name from code instead?
I absolutely need the FKs, so if nothing like this is possible, then I will have to stick with my many little lookup tables. Any assistance would be appreciated.
Brian
It is not impossible to accomplish this, but it is impossible to accomplish this and not hurt the system on several levels.
While a single lookup table (as has been pointed out already) is a truly horrible idea, I will say that this pattern does not require a single field PK or that it be auto-generated. It requires a composite PK comprised of ([AppCodeCategory], [AppCode]) and then BOTH fields need to be present in the fact table that would have a composite FK of both fields back to the PK. Again, this is not an endorsement of this particular end-goal, just a technical note that it is possible to have composite PKs and FKs in other, more appropriate scenarios.
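Purely as a technical illustration of the composite PK/FK mechanics described above (not as an endorsement of the single-table design), here is a sketch using Python's sqlite3. The `LanguageCategory` column, its DEFAULT and its CHECK are assumptions added to show how both fields end up in the referencing table; SQLite stands in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript("""
    CREATE TABLE AppCodes (
        AppCodeCategory TEXT NOT NULL,
        AppCode         TEXT NOT NULL,
        Decode          TEXT NOT NULL,
        PRIMARY KEY (AppCodeCategory, AppCode)
    );
    CREATE TABLE Users (
        UserId           INTEGER PRIMARY KEY,
        -- the extra column the composite FK forces into the table
        LanguageCategory TEXT NOT NULL DEFAULT 'Language'
                         CHECK (LanguageCategory = 'Language'),
        LanguageCode     TEXT NOT NULL,
        FOREIGN KEY (LanguageCategory, LanguageCode)
            REFERENCES AppCodes (AppCodeCategory, AppCode)
    );
    INSERT INTO AppCodes VALUES ('Language', 'EN', 'English'),
                                ('Country',  'CA', 'Canada');
""")


def try_insert(code):
    """Insert a user with the given language code; report whether it worked."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Users (LanguageCode) VALUES (?)", (code,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


en_ok = try_insert("EN")  # ('Language', 'EN') exists in AppCodes
ca_ok = try_insert("CA")  # 'CA' exists only under 'Country', so it is refused
```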
The main problem with this type of approach to constants is that each constant is truly its own thing: Languages, Countries, States, Statuses, etc. are all completely separate entities. While the structure of them in the database is the same (as of today), the data within that structure does not represent the same things. You would be locked into a model that either prevents you from adding additional lookup fields later (such as ISO codes for Language and Country but not the others, or something related to States that is not applicable to the others), or would require adding NULLable fields with no way to know which Category/ies they applied to (have fun debugging issues related to that and/or explaining to the new person -- who has been there for 2 days and is tasked with writing a new report -- that the 3 digit ISO Country Code does not apply to the "Deleted" status).
This approach also requires that you maintain an arbitrary "Category" field in all related tables. And that is per lookup. So if you have CountryCode, LanguageCode, and StateCode in the fact table, each of those FKs gets a matching CategoryID field, so now that is 6 fields instead of 3. Even if you were able to use TINYINT for CategoryID, if your fact table has even 200 million rows, then those three extra 1 byte fields now take up 600 MB, which adversely affects performance. And let's not forget that backups will take longer and take up more space, but disk is cheap, right? Oh, and if backups take longer, then restores also take longer, right? Oh, but the table has closer to 1 billion rows? Even better ;-).
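The storage arithmetic above is easy to check. This small Python sketch counts only the raw bytes of the added columns (decimal megabytes) and ignores index, page and backup overhead, which would push the real figure higher; `extra_storage_mb` is a hypothetical helper.

```python
def extra_storage_mb(rows, extra_columns, bytes_per_value=1):
    """Raw size of the added category columns, in decimal megabytes."""
    return rows * extra_columns * bytes_per_value / 1_000_000


at_200_million = extra_storage_mb(200_000_000, 3)    # three TINYINT columns
at_1_billion = extra_storage_mb(1_000_000_000, 3)
```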
While this approach looks maybe "cleaner" or "easier" now, it is actually more costly in the long run, especially in terms of wasted developer time, as you (and/or others) in the future try to work around issues related to this poor design choice.
Has anyone even asked your project manager what the intended benefit of this is? It is a reasonable question if you are going to spend some amount of hours making changes to the system that there be a stated benefit for that time spent. It certainly does not make interacting with the data any easier, and in fact will make it harder, especially if you choose a string for the "Category" instead of a TINYINT or maybe SMALLINT.
If your PM still presses for this change, then it should be required, as part of that project, to also change any enums in the app code accordingly so that they match what is in the database. Since the database is having its values munged together, you can accomplish that in C# (assuming your app code is in C#; if not, translate to whatever is appropriate) by setting the enum values explicitly, following a pattern in which the first X digits are the "category" and the remaining Y digits are the "value". For example:
Assuming the "Country" category == 1 and the "Language" category == 2, you could do:

    enum AppCodes
    {
        // Countries
        UnitedStates = 1000001,
        Canada = 1000002,
        SomewhereElse = 1000003,
        // Languages
        EnglishUS = 2000001,
        EnglishUK = 2000002,
        French = 2000003
    };
Absurd? Completely. But also analogous to the request of merging all lookup tables into a single table. What's good for the goose is good for the gander, right?
Is this being suggested so you can minimise the number of admin screens you need for CRUD operations on your standing data? I've been here before and decided it was better/safer/easier to build a generic screen which used metadata to decide what table to extract from/write to. It was a bit more work to build but kept the database schema 'correct'.
All the standing data tables had the same basic structure, they were mainly for dropdown population with occasional additional fields for business rule purposes.

Schema for Inspection/Exam Questions/Answers with scoring

I'm writing an application that will generate inspections for our locations. Basically, think of them as health inspection forms. Each "inspection" will have a series of questions and answers. The answers can be numeric (1,2,3,4,5 - which will represent their point values), multiple choice ('Yes','No') that will map to points (1 for yes, 0 for no), or flat text answers that will not map to points but might be usable by the application layer for averaging. So, for example, we could have a field for "Sauce Temperature" which carries no points, but could be used for reporting down the road.
Questions can be reused on multiple inspection forms but can have different point values. So can answers.
I'm having trouble figuring out the schema for this. My instinct says EAV would be a good way to go, but the more I think about it, the more I'm thinking more of a data warehouse model would be better.
Particularly, I'm having a problem figuring out the best way to map the min_points, max_points and no_points to each question/answer. This is where I am thinking I'm going to have to use EAV. I'm kind of stuck on it actually. If it was a survey or something where there were no points, or the same point value for each answer, it would be pretty simple. Question table, answer table, some boilerplate tables for input type and so forth. But since each question MAY have a point value, and that point value may change depending on which location is using that question, I'm not sure how to proceed.
So, the example questions are as follows
Was the food hot [Yes, No] Possible points = 5 (5 for yes, 0 for no)
Was the food tasty [1,2,3,4,5] Possible points = 5 (1 for 1, 2 for 2, etc)
Was the manager on duty [Yes, No] Possible points = 5 (5 for yes, 0 for no)
Was the building clean [1,2,3,4,5] Possible Points = 10 (2 for 1, 4 for 2, 6 for 3, etc)
Was the staff professional [Yes, No] Possible Points = 5 (5 for yes, 0 for no)
Freezer Temp [numerical text input]
Manager on duty [text input]
Since all the answers can have different data types and point values I'm not sure how to build out the database for them.
I'm thinking (Other tables, names and other imp details left out or changed for brevity)
    CREATE TABLE IF NOT EXISTS inspection (
        id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
        store_id mediumint(8) unsigned not null,
        inspection_id mediumint(8) unsigned not null,
        date_created datetime,
        date_modified timestamp,
        INDEX IDX_STORE (store_id),
        INDEX IDX_inspection (inspection_id),
        FOREIGN KEY (store_id) REFERENCES store (store_id) ON DELETE CASCADE,
        FOREIGN KEY (inspection_id) REFERENCES inspection (inspection_id) ON DELETE CASCADE
    );

    CREATE TABLE IF NOT EXISTS input_type (
        input_type_id tinyint(4) unsigned not null auto_increment PRIMARY KEY,
        input_type_name varchar(255),
        date_created datetime,
        date_modified timestamp
    );

    CREATE TABLE IF NOT EXISTS inspection_question (
        question_id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
        question text,
        input_type_id tinyint(4) unsigned,  -- same type as input_type.input_type_id
        date_created datetime,
        date_modified timestamp
    );

    -- column types not decided yet
    CREATE TABLE IF NOT EXISTS inspection_option (
        option_id,
        value
    );
But here's where I'm kind of stuck. I'm not sure how to build the question answers tables to account for points, no points, and different data types.
Also, I know I'll need mapping tables for stores to inspections and so forth, but I've left those all off for now, since it's not important to the question.
So, should I make a table for answers where all possible answers (built from either the options table or entered as text) are stored in that table and then a mapping table to map an "answer" to a "question" (for any particular inspection) and store the points there?
I'm just not thinking right. I could use some help.
There’s no right or wrong answer here, I’m just tossing out some ideas and discussion points.
I would propose that the basic “unit” isn’t the question, but the pair of question + answer type (e.g. 1-5, text, or whatever). Seems to me that Was the food hot / range 1 to 5 and Was the food hot / text description are so very different you’d go nuts trying to relate a question with two (or more) answer types (let alone answer keys for those answers--ignore that for now, I pick up on that later). Call the pair a QnA item. You may end up with a lot of similar pairs, but hey, it's what you've got to work with.
So you have a “pool” of QnA items. How are they selected for use? Are specific forms (or questionnaires) built from items in the pool, or are they randomly selected every time a questionnaire is filled out? Are forms specifically related to location, or might a form be used at any location? How fussy are they at building their forms/questionnaires? How the QnA items are collected/associated with one another and/or their ultimate results is pretty important, and you should work it all out before you start writing code, unless you really like rewriting code.
Given a defined QnA item, you should also have an “answer key” for that item: a means by which a given answer (based on the item's answer type) is measured: Zero, Value, Value * 2, whatever. This apparently can vary from usage to usage (questionnaire to questionnaire? Does it differ based on the location at which the questionnaire is presented? If so, how or why?) Are there standardized answer key algorithms (always zero, always Value * 2, etc.) or are these also extremely free-form? Determining how they are used/associated with the QnA items will be essential for proper modeling.
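One possible shape for the QnA item plus answer key idea, sketched with Python's sqlite3. Everything here is an assumption made for illustration: the `qna_item` and `answer_key` tables, keying the answer key by form, and the `score` helper. The open questions above (per-location keys, algorithmic keys) would change the model. The sample points come from the question's own examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qna_item (
        item_id     INTEGER PRIMARY KEY,
        question    TEXT NOT NULL,
        answer_type TEXT NOT NULL          -- 'yes_no', 'range_1_5', 'text'
    );
    CREATE TABLE answer_key (              -- points per form, item and answer
        form_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL REFERENCES qna_item (item_id),
        answer  TEXT NOT NULL,
        points  INTEGER NOT NULL,
        PRIMARY KEY (form_id, item_id, answer)
    );
    INSERT INTO qna_item VALUES
        (1, 'Was the food hot',       'yes_no'),
        (2, 'Was the building clean', 'range_1_5'),
        (3, 'Freezer Temp',           'text');
    INSERT INTO answer_key VALUES
        (1, 1, 'Yes', 5), (1, 1, 'No', 0),
        (1, 2, '1', 2), (1, 2, '2', 4), (1, 2, '3', 6),
        (1, 2, '4', 8), (1, 2, '5', 10);
""")


def score(conn, form_id, answers):
    """Sum points for {item_id: answer}; items without a key score nothing."""
    total = 0
    for item_id, answer in answers.items():
        row = conn.execute(
            "SELECT points FROM answer_key"
            " WHERE form_id = ? AND item_id = ? AND answer = ?",
            (form_id, item_id, str(answer)),
        ).fetchone()
        if row is not None:
            total += row[0]
    return total


total = score(conn, 1, {1: "Yes", 2: 3, 3: "-18"})
```

Text items such as "Freezer Temp" simply have no rows in `answer_key`, so they carry no points while their raw answers stay available for reporting.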

Simple database design question about foreign keys

I have a simple question about database design...
Let's say we have Table Customer with some fields:
(PK) Id,
Firstname,
Lastname,
Address,
City,
(FK) Sex_Id...
So...
Would it be a good idea to have an additional table Table Sex where data about Sex ('M', 'W') would be saved?
Sex_Id,
Value
or should Sex values ('M' or 'W') be saved directly into table Customer? What about query speed etc.?
Thanks in advance,
best Regards.
Or, one could use an existing standard. ISO 5218 covers four codes:
0 = Not Known
1 = Male
2 = Female
9 = Not applicable (lawful person such as corporation, organization etc)
ISO 5218 is a legal encoding and does not cover medical/biological aspects.
Obviously, a reference table containing those codes should use the natural key (as per the above list), and not a synthetic key.
Joe Celko's Data Measurements And Standards in SQL is a great (albeit boring) read.
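The ISO 5218 codes listed above could back a reference table keyed by the code itself. A minimal sketch with Python's sqlite3 follows; the `sex` and `customer` table names, the DEFAULT of 0 and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript("""
    CREATE TABLE sex (
        sex_code    INTEGER PRIMARY KEY,   -- the ISO 5218 code, used as-is
        description TEXT NOT NULL
    );
    INSERT INTO sex VALUES
        (0, 'Not known'), (1, 'Male'), (2, 'Female'), (9, 'Not applicable');
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname  TEXT,
        sex_code  INTEGER NOT NULL DEFAULT 0 REFERENCES sex (sex_code)
    );
    INSERT INTO customer (firstname, lastname, sex_code)
    VALUES ('Ada', 'Lovelace', 2), ('Acme', 'Corp', 9);
""")

rows = conn.execute("""
    SELECT c.firstname, s.description
    FROM customer AS c
    JOIN sex AS s ON s.sex_code = c.sex_code
    ORDER BY c.id
""").fetchall()
```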
You could try a multivalued attribute, but I prefer to do this: If there are only 2 values, you could consider using a BOOL type for that attribute in your DB and making 0 = Male and 1 = Female (commenting, of course, to avoid confusion). When data is entered in the external program (given there is one), you could just do a quick mapping where if they check "male", the attribute is 0 in the DB, and if they check "female", the attribute value is 1 in the DB.
How many different values are you planning on having for Sex? If you aren't going to be adding more possible values for that column, it doesn't make sense to use a foreign key.
You can use a character for the column, storing "M" or "W", and also use a foreign key into a table (primary key of a character) if you need to store any more details about that thing; You get the benefit of easy to write/read queries (no join required) for basic stuff, but still have the possibility of adding more data later on.
That said, unless you actually do have more columns in your Sex table, you could probably not create it at all now and add it later when you actually do have a need for it.
In your example, the extra table does not buy you anything.
@marc_s has the right idea here: add a good CHECK CONSTRAINT to make sure the local values are in the proper subset.
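A minimal sketch of the CHECK constraint approach, using Python's sqlite3. The `customer` columns follow the question; the `add_customer` helper and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname  TEXT,
        sex       TEXT NOT NULL CHECK (sex IN ('M', 'W'))
    )
""")


def add_customer(firstname, lastname, sex):
    """Insert a customer; return False if the CHECK constraint refuses it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customer (firstname, lastname, sex)"
                " VALUES (?, ?, ?)",
                (firstname, lastname, sex),
            )
        return True
    except sqlite3.IntegrityError:
        return False


w_accepted = add_customer("Ann", "Lee", "W")
x_accepted = add_customer("Bob", "Ray", "X")  # not in the allowed subset
```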
Now, if your example contained additional attributes on the related object, like a 'name' or 'description', or further links to other objects like an 'alias' or some kind of date range, then absolutely yes, create another table.

Is this a valid use of NULL foreign keys?

Consider this very small contrived subset of my schema:
SensorType1
ID : PK
SensorType2
ID : PK
Reading
Timestamp
Value
SensorType1_ID FK -> SensorType1
SensorType2_ID FK -> SensorType2
Some readings are for SensorType1, some are for SensorType2. I would probably add a constraint to ensure that exactly one of those FKs is always pointing somewhere.
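Such an "exactly one FK is set" constraint could look like the sketch below, written with Python's sqlite3. The CHECK relies on SQLite evaluating IS NULL to 0 or 1; other databases may need a different spelling. The `add_reading` helper and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript("""
    CREATE TABLE SensorType1 (ID INTEGER PRIMARY KEY);
    CREATE TABLE SensorType2 (ID INTEGER PRIMARY KEY);
    CREATE TABLE Reading (
        Timestamp      TEXT NOT NULL,
        Value          REAL NOT NULL,
        SensorType1_ID INTEGER REFERENCES SensorType1 (ID),
        SensorType2_ID INTEGER REFERENCES SensorType2 (ID),
        -- true only when exactly one of the two FKs is NULL
        CHECK ((SensorType1_ID IS NULL) <> (SensorType2_ID IS NULL))
    );
    INSERT INTO SensorType1 VALUES (1);
    INSERT INTO SensorType2 VALUES (1);
""")


def add_reading(type1_id, type2_id):
    """Insert a reading; return False if a constraint refuses it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Reading VALUES ('2024-01-01T00:00:00', 1.5, ?, ?)",
                (type1_id, type2_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_reading(1, None),     # only type 1 set: accepted
    add_reading(None, 1),     # only type 2 set: accepted
    add_reading(1, 1),        # both set: refused
    add_reading(None, None),  # neither set: refused
]
```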
I've read a lot in the past about NULL FKs being very bad design, but I've been wrestling with my schema for days (see previous posts) and no matter which way I twist and turn it, I either end up with a NULL-able FK somewhere, or I have to duplicate my reading table (and its dependents) for every sensor type I have (3).
The above just seems to solve the problem nicely, but it leaves a not-so-nice taste in my mouth, for some reason. It is the ONE place in my entire schema where I allow NULL fields.
I thought a bit of peer review would help me accept it before I move on.
Thanks!
What is wrong with doing it like:

    Sensor
      ID: PK
      ... common sensor fields ...
    SensorType1
      ID: FK(Sensor)
      ... specifics ...
    SensorType2
      ID: FK(Sensor)
      ... specifics ...
    Reading
      ID: PK
      Sensor: FK(Sensor)
      Timestamp: DateTime
      Value: whatever
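The layout above is a supertype/subtype arrangement, and a runnable sketch with Python's sqlite3 might look like this. The `Name`, `MaxTemp` and `Channel` columns are invented stand-ins for the "common fields" and "specifics", which the answer leaves open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript("""
    CREATE TABLE Sensor (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL                 -- stand-in for common sensor fields
    );
    CREATE TABLE SensorType1 (
        ID      INTEGER PRIMARY KEY REFERENCES Sensor (ID),
        MaxTemp REAL                       -- stand-in for type-1 specifics
    );
    CREATE TABLE SensorType2 (
        ID      INTEGER PRIMARY KEY REFERENCES Sensor (ID),
        Channel INTEGER                    -- stand-in for type-2 specifics
    );
    CREATE TABLE Reading (
        ID        INTEGER PRIMARY KEY,
        Sensor    INTEGER NOT NULL REFERENCES Sensor (ID),  -- never NULL
        Timestamp TEXT NOT NULL,
        Value     REAL NOT NULL
    );
    INSERT INTO Sensor VALUES (1, 'probe A'), (2, 'logger B');
    INSERT INTO SensorType1 VALUES (1, 50.0);
    INSERT INTO SensorType2 VALUES (2, 4);
    INSERT INTO Reading (Sensor, Timestamp, Value) VALUES
        (1, '2024-01-01T00:00:00', 20.5),
        (2, '2024-01-01T00:00:00', 3.0);
""")

readings = conn.execute("""
    SELECT s.Name, r.Value
    FROM Reading AS r
    JOIN Sensor AS s ON s.ID = r.Sensor
    ORDER BY r.ID
""").fetchall()
```

Every reading points at exactly one `Sensor` row, so no nullable FK is needed, and the type-specific tables hang off the same key.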
First, naming PKs just "ID" means they have to change names constantly throughout the model. It makes following the RI difficult. I know some people like that. I hate it because it prevents an automated approach to finding columns.
I do things like this
SELECT * FROM ALL_TAB_COLUMNS WHERE Column_Name = :1;
If you need to "role play" (have the same FK twice in a table), then
LIKE '%' || :1 should work.
But you're changing column names even when not forced to. ID becomes Location_ID and then becomes LoggingLocation_ID for no technical reason.
I'm assuming this isn't a physical model. If it is, why are you vertically partitioning LiveMonitoringLocation and HandProbingLocation? Is it just to avoid a nullable column? If so, your utility function is all messed up. Nullable columns are fine... adding a new table to avoid a nullable column is like driving from NYC to Cleveland to Boston in order to avoid any red lights.

Resources