Enforce uniqueness on column based on contents from another column - sql-server

I have the typical Invoice/InvoiceItems master/detail tables (the ones every book and tutorial out there uses as examples). I also have a "Proforma" table which holds data similar to invoices and is sometimes linked to invoices. The link lives on each item in the invoice, in a column optionally referencing a proforma, something like this:
id | id_invoice | id_proforma | amount ....... and a bunch of irrelevant stuff
-----------------------------------------------
1 | 1 | null | 100
2 | 1 | null | 40
3 | 2 | 3 | 1000
4 | 3 | 4 | 473
5 | 3 | 4 | 139
Basically, each item in an invoice can be linked to a proforma. There is also a business rule that says that each proforma can be used in only one invoice (it's OK to use it in many items within the same invoice).
Currently that rule is enforced on the application side but this has problems with concurrency, as 2 users could take the same proforma at the same time and the system would let it pass. My intention is to have the DB validate this in addition to some front-end visual clues, but so far I've failed to come with an approach for this particular case.
Filtered unique indexes could serve well, except that the same proforma can be used twice if it's for the same invoice, so my question is, how can I make the DB server enforce that rule?
The database engine can be SQL Server 2012 or later, in any edition from Express to Enterprise.

You can create a user-defined scalar function that returns 1 when the proforma id and invoice id combination is valid, then put a check constraint on the table requiring the function to return 1. Like this (tweak to fit your table names/needs):
-- Here's the function:
create function dbo.svfIsCombinationValid (
    @id_invoice int
    , @id_proforma int
)
returns bit
as
begin
    declare @return bit = 1;
    if exists (
        select 1
        from dbo.YourInvoiceProformaXRefTable
        where id_proforma = @id_proforma
        and id_invoice <> @id_invoice
    )
    begin
        set @return = 0;
    end;
    return @return;
end;
After that, you can alter the table and add the check constraint:
alter table dbo.YourInvoiceProformaXRefTable
add constraint CK_YourInvoiceProformaXRefTable_UniqueInvoiceProforma
check (dbo.svfIsCombinationValid(id_invoice,id_proforma)=1);
This is OK with NULLs (multiple invoices can have rows where id_proforma is NULL), but if both values are non-null, then the combination must either be new or match the invoice already using that proforma.
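For illustration, here is a minimal sketch of the same rule in SQLite (via Python's sqlite3), using a trigger instead of a check-constraint function; the table and column names are assumptions based on the question, not a real schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table invoice_items (
    id integer primary key,
    id_invoice integer not null,
    id_proforma integer,        -- optional reference to a proforma
    amount integer not null
);

-- Reject an insert when the proforma is already used by a *different* invoice.
create trigger trg_one_invoice_per_proforma
before insert on invoice_items
when new.id_proforma is not null
  and exists (select 1 from invoice_items
              where id_proforma = new.id_proforma
                and id_invoice <> new.id_invoice)
begin
    select raise(abort, 'proforma already used by another invoice');
end;
""")

con.execute("insert into invoice_items values (1, 1, null, 100)")
con.execute("insert into invoice_items values (2, 2, 3, 1000)")
con.execute("insert into invoice_items values (3, 2, 3, 50)")      # same invoice: allowed
try:
    con.execute("insert into invoice_items values (4, 3, 3, 75)")  # other invoice: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Like the check-constraint approach, the rule is enforced inside the database, so two concurrent users cannot both claim the same proforma.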

Related

SQL Server 2012 - Inserting multiple records from one block of text

On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I'm looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record.
I'm guessing some sort of "WHILE loop" for each block of text but can't get my head around how it would work.
To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements.
I agree with Zohar Peled; this is probably not the right way to normalize these tables. If this is still something you need to do, then:
In SQL Server 2016+ you can use string_split().
Prior to that version, you can use a CSV splitter table-valued function by Jeff Moden:
table setup:
create table dbo.note (
id int not null identity(1,1) primary key
, created datetime2(2) not null
/* other cols */
);
create table dbo.note_lines (
id int not null identity(1,1) primary key
, noteId int not null foreign key references dbo.note(id)
, noteLineNumber int not null
, noteLineText varchar(8000) not null
);
insert statements:
declare @noteId int;
insert into dbo.note values (sysutcdatetime());
set @noteId = scope_identity();
declare @note_txt varchar(8000) = 'On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I''m looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record.
I''m guessing some sort of "WHILE loop" for each block of text but can''t get my head around how it would work.
To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements.';
insert into dbo.note_lines (noteId, noteLineNumber, noteLineText)
select @noteId, s.ItemNumber, s.Item
from dbo.delimitedsplit8K(@note_txt, char(10)) s;
And after the insert:
select * from note;
select * from note_lines;
rextester demo: http://rextester.com/SCODAG90159
return (respectively):
+----+---------------------+
| id | created |
+----+---------------------+
| 1 | 2017-04-06 15:16:59 |
+----+---------------------+
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | noteId | noteLineNumber | noteLineText |
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | 1 | On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I'm looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record. |
| 2 | 1 | 2 | I'm guessing some sort of "WHILE loop" for each block of text but can't get my head around how it would work. |
| 3 | 1 | 3 | To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements. |
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
splitting strings reference:
Tally OH! An Improved SQL 8K “CSV Splitter” Function - Jeff Moden
Splitting Strings : A Follow-Up - Aaron Bertrand
Split strings the right way – or the next best way - Aaron Bertrand
string_split() in SQL Server 2016 : Follow-Up #1 - Aaron Bertrand
Note: the function used for the example is designed for a limit of 8000 characters; if you need more, there are alternatives in the links above.
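The same split-and-insert flow can be sketched in Python with SQLite, where str.splitlines() stands in for delimitedsplit8K(@note_txt, char(10)); this is an illustration under assumed table definitions, not the T-SQL itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table note (id integer primary key autoincrement, created text not null);
create table note_lines (
    id integer primary key autoincrement,
    noteId integer not null references note(id),
    noteLineNumber integer not null,
    noteLineText text not null
);
""")

note_txt = "first line\nsecond line\nthird line"
cur = con.execute("insert into note (created) values (datetime('now'))")
note_id = cur.lastrowid  # plays the role of scope_identity()
con.executemany(
    "insert into note_lines (noteId, noteLineNumber, noteLineText) values (?, ?, ?)",
    [(note_id, n, line) for n, line in enumerate(note_txt.splitlines(), start=1)],
)
print(con.execute("select noteLineNumber, noteLineText from note_lines").fetchall())
# -> [(1, 'first line'), (2, 'second line'), (3, 'third line')]
```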

Subquery for calculated field giving invalid argument to function error

I have a table with a list of stores and attributes that dictate the age of the store in weeks and the order volume of the store. The second table lists the UPLH goals based on age and volume. I want to return the stores listed in the first table along with its associated UPLH goal. The following works correctly:
SELECT store, weeksOpen, totalItems,
(
SELECT max(UPLH)
FROM uplhGoals as b
WHERE b.weeks <= a.weeksOpen AND 17000 between b.vMIn and b.vmax
) as UPLHGoal
FROM weekSpecificUPLH as a
But this query, which is replacing the hard coded value of totalItems with the field from the first table, gives me the "Invalid argument to function" error.
SELECT store, weeksOpen, totalItems,
(
SELECT max(UPLH)
FROM uplhGoals as b
WHERE b.weeks <= a.weeksOpen AND a.totalItems between b.vMIn and b.vmax
) as UPLHGoal
FROM weekSpecificUPLH as a
Any ideas why this doesn't work? Are there any other options? I can easily use DMax() and cycle through every record to create a new table, but that seems the long way around something that a query should be able to produce.
SQLFiddle: http://sqlfiddle.com/#!9/e123a8/1
It appears that the SQLFiddle output (below) is what I was looking for, even though Access gives the error.
| store | weeksOpen | totalItems | UPLHGoal |
|-------|-----------|------------|----------|
| 1 | 15 | 13000 | 30 |
| 2 | 37 | 4000 | 20 |
| 3 | 60 | 10000 | 30 |
EDIT:
weekSpecificUPLH is a query, not a table. If I create a new test table in Access with identical fields, it works. This would indicate to me that it has something to do with the [totalItems] field, which is actually a calculated result. So instead I replaced that field with [a.IPO * a.OPW]. Same error. It's as if it's not treating it as the correct type of number.
I've tried:
SELECT store, weeksOpen, (opw * ipo) as totalItems,
(
SELECT max(UPLH)
FROM uplhGoals as b
WHERE 17000 between b.vMIn and b.vmax AND b.weeks <= a.weeksOpen
) as UPLHGoal
FROM weekSpecificUPLH as a
which works, but replacing the '17000' with 'totalItems' gives the same error. I even tried using Val(totalItems), to no avail.
Try to turn it into
b.vmin < a.totalItems AND b.vmax > a.totalItems
Although there are questions about your DB design.
For future questions, it would be very helpful if you revealed your DB structure.
For example, it seems the records in the weekSpecificUPLH table are not related to the records in the uplhGoals table, are they?
Or, more generally: these tables are not related in any way except by the rules described by the data itself in the goals table (which is "external" to the DB model).
Thus, when you call them "associated" you invite confusion, I presume, because everyone immediately starts thinking of a classical relation in the sense of the relational model.
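A runnable sketch of the suggested comparison, here in SQLite via Python's sqlite3; note the goal bands below are made up for illustration, since the real uplhGoals rows aren't shown in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table weekSpecificUPLH (store int, weeksOpen int, totalItems int);
create table uplhGoals (weeks int, vmin int, vmax int, UPLH int);
insert into weekSpecificUPLH values (1, 15, 13000), (2, 37, 4000);
-- hypothetical goal bands: (min weeks open, volume range, goal)
insert into uplhGoals values (0, 0, 4999, 20), (0, 5000, 99999, 25), (10, 5000, 99999, 30);
""")
# Correlated subquery using < and > instead of BETWEEN, as suggested above.
rows = con.execute("""
    select a.store,
           (select max(b.UPLH)
            from uplhGoals b
            where b.weeks <= a.weeksOpen
              and b.vmin < a.totalItems and b.vmax > a.totalItems) as UPLHGoal
    from weekSpecificUPLH a
""").fetchall()
print(rows)  # -> [(1, 30), (2, 20)]
```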
Something was changing the type of the totalItems value. To solve it, I:
Copied the weekSpecificUPLH query results to a new table, 'tempUPLH'
Used that table in place of the query, which correctly pulled the UPLHGoal from the 'uplhGoals' table

How to label result tables in multiple SELECT output

I wrote a simple dummy procedure to check the data that is saved in the database. When I run my procedure, it outputs the data as below.
I want to label the tables so that even a QA person can identify the data given in each result set. How can I do it?
**Update:** This procedure is run manually through Management Studio; it has nothing to do with my application. All I want to check is whether the data has been inserted/updated properly.
For better clarity, I want to show the table names above the table as a label.
Add another column to the result set, and name it so that whoever reads the output can distinguish the tables :)
Select 'Employee' as TABLE_NAME, * from Employee
Output will look like this:
| TABLE_NAME | ID | Number | ...
------------------------------
| Employee | 1 | 123 | ...
Or you can call the column 'Employee'
SELECT 'Employee' AS 'Employee', * FROM employee
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1 | 123 | ...
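A quick sketch of this label-column trick, here run in SQLite through Python (the Employee table and its rows are made-up stand-ins):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table Employee (ID integer, Number integer)")
con.execute("insert into Employee values (1, 123)")

# A constant column whose value (or name) labels the result set.
cur = con.execute("select 'Employee' as TABLE_NAME, * from Employee")
print([d[0] for d in cur.description])  # -> ['TABLE_NAME', 'ID', 'Number']
print(cur.fetchall())                   # -> [('Employee', 1, 123)]
```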
Add an extra column whose name (not its value!) is the label.
SELECT 'Employee' AS "Employee", e.* FROM employee e
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1 | 123 | ...
By doing so, you will see the label, even if the result does not contain rows.
I like to stick a whole extra result set, one that looks like a label or title, between the result sets with real data.
SELECT 0 AS [Our Employees:]
WHERE 1 = 0
-- Your first "Employees" query goes here
SELECT 0 AS [Our Departments:]
WHERE 1 = 0
-- Now your second real "Departments" query goes here
-- ...and so on...
Each label ends up as the column header of an empty result set above the real data. It's a bit looser-formatted, with more whitespace than I'd like, but it's the best I've come up with so far.
Unfortunately, there is no built-in way of labeling SELECT query output in SQL Server or SSMS. I needed something very similar a few years ago, and we settled on a workaround:
Adding another table which contains the list of table aliases.
Here is what we did:
We prepended another table to the beginning of the data set, containing the list of table names. So the first table looks as follows:
Name
Employee
Department
Courses
Class
Attendance
In C#, while reading the tables, you can iterate through the first table and assign a TableName to each of the remaining tables in the DataSet.
This is best done using Reporting Services and creating a simple report. You can then email this report daily if you wish.

References in a table

I have a table like this that contains items that have been added to the database.
Catalog table example
id | element | catalog
0 | mazda | car
1 | penguin | animal
2 | zebra | animal
etc....
And then I have a table where the user selects items from that table, and I keep a reference of what has been selected like this
User table example
id | name | age | itemsSelected
0 | john | 18 | 2;3;7;9
So what I am trying to say is that I keep a reference to what the user has selected as a string of IDs, but this seems a tad troublesome.
Because when I do a query to get information about a user, all I get is the string 2;3;7;9, when what I really want is an array of the items corresponding to those IDs.
Right now I get the IDs, and I have to split the string and then run another query to find the elements the IDs correspond to.
Is there an easier way to do this, if my question is understandable?
Yes, there is a way to do this. You create a third table which contains a map between the two. It's called a many-to-many relationship.
You have your Catalogue table (int, varchar(MAX), varchar(MAX)) or similar.
You have your User table (int, varchar(MAX), varchar(MAX), varchar(MAX)) or similar, essentially, remove the last column and then create another table:
You create a UserCatalogue table: (int UserId, int CatalogueId) with a primary key on both columns. Then the UserId column gets a foreign key to User.Id, and the CatalogueId column gets a foreign key to Catalogue.Id. This preserves the relationship and eases queries. It also means that if Catalogue.Id number 22 does not exist, you cannot accidentally insert it as a relation between the two. This is called referential integrity: if you declare that a column must reference another table, SQL Server will enforce that relationship.
After you create this, for each itemsSelected you add an entry: I.e.
UserId | CatalogueId
0 | 2
0 | 3
0 | 7
0 | 9
This also allows you to use JOINs on the tables for faster queries.
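A sketch of this junction-table design in SQLite via Python; names are assumed from the answer, and note that SQLite only enforces referential integrity when PRAGMA foreign_keys is ON:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("pragma foreign_keys = on")  # SQLite has FKs off by default
con.executescript("""
create table Catalogue (Id integer primary key, Element text, Catalog text);
create table User (Id integer primary key, Name text, Age integer);
create table UserCatalogue (
    UserId integer not null references User(Id),
    CatalogueId integer not null references Catalogue(Id),
    primary key (UserId, CatalogueId)
);
""")
con.execute("insert into Catalogue values (2, 'zebra', 'animal')")
con.execute("insert into User values (0, 'john', 18)")
con.execute("insert into UserCatalogue values (0, 2)")       # valid pair
try:
    con.execute("insert into UserCatalogue values (0, 22)")  # no Catalogue 22
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # referential integrity at work
```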
Additionally, and unrelated to the question, you can also optimize your Catalogue table a bit by creating another table, CatalogueGroup, which contains your last column there (catalog: car, animal) and is referenced via a foreign key from the current Catalogue table definition. This will also save storage space and speed up SQL Server's work, as it no longer has to read a string column if you only want the element value.

Metadata database design

I am trying to store metadata about a document in SQL Server. The documents are stored in a document archive, which returns an identifier so I can get a document back by asking the archive for it by identifier.
Our users would like to be able to search for these documents based on different metadata. The metadata could be 1 attribute or 5 depending on the document type, and the users should be able to create new document types from an admin site.
I can see two solutions here. One is that each document type gets its own metadata table, where all metadata attributes are predefined; if one should be added, a new column needs to be created, and if a new document type is created, a new metadata table needs to be created. Our DBA will freak out with a solution like this, and I also see a problem with indexes: if the document type has 5 different metadata attributes, it needs to be searchable with 1 or 4 of them specified in the search. Then I would need to write indexes for all the different combinations of possible searches.
Here is an example (fictive):
|documentId | Name | InsertDate | CustomerId | City
| 1 | John | 2014-01-01 | 2 | London
| 2 | John | 2014-01-20 | 5 | New York
| 3 | Able | 2014-01-01 | 10 | Paris
I could here say:
Give me all documents where Name = 'John'
Give me all documents where Name = 'John' and CustomerId = 5
Give me all documents where InsertDate = '2014-01-01' and City = 'London'
These would be 3 different indexes, and then I haven't covered all possible combinations. This isn't practical.
So I am looking into the evil 'EAV' (anti)pattern.
Instead of having the metadata as columns, I can have them as rows.
|documentId | MetaAttribute | MetaValue
| 1 | Name | John
| 1 | InsertDate | 2014-01-01
| 1 | CustomerId | 2
| 1 | City | London
| 2 | Name | John
| 2 | InsertDate | 2014-01-20
| 2 | CustomerId | 5
| 2 | City | New York
| 3 | Name | Able
| 3 | InsertDate | 2014-01-01
| 3 | CustomerId | 10
| 3 | City | Paris
Here it's simple to create one index on MetaAttribute and MetaValue, and it's covered. If a new document type is created, new metadata can be added for that document type in a MetaAttribute table (which contains all MetaAttributes for the different document types). So there is no need to create new tables or columns if a new document type is added or a new attribute is added to a document type. Instead, all MetaValues must be strings :( and the SQL query to find the document id is a bit more complicated.
This is what I figured out. (In this example the MetaAttribute is a string, but would be an ID to the MetaAttribute Table)
SELECT * FROM [Document]
WHERE ID IN (SELECT documentId FROM [MetaData]
WHERE ((MetaAttribute = 'Name' AND MetaValue = 'John')
OR (MetaAttribute = 'CustomerId' and MetaValue = '5'))
GROUP BY [documentId]
HAVING Count(1) = 2)
Here I need to ask if Name = 'John' and CustomerId = 5. I do that by finding all records that have Name = 'John' or CustomerId = '5', then grouping them on documentId and counting the number of items in each group. If I get 2, then both Name = 'John' and CustomerId = '5' are true for this search. I return the documentId and use that to retrieve information about the document, like the document archive storage id.
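The group/count query can be tried out in SQLite with the sample rows from the post; only document 2 carries both Name = 'John' and CustomerId = '5':

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table MetaData (documentId int, MetaAttribute text, MetaValue text)")
con.executemany("insert into MetaData values (?, ?, ?)", [
    (1, 'Name', 'John'), (1, 'CustomerId', '2'),
    (2, 'Name', 'John'), (2, 'CustomerId', '5'),
    (3, 'Name', 'Able'), (3, 'CustomerId', '10'),
])
# A document matches when it has a row for *each* condition (count = 2).
rows = con.execute("""
    select documentId from MetaData
    where (MetaAttribute = 'Name' and MetaValue = 'John')
       or (MetaAttribute = 'CustomerId' and MetaValue = '5')
    group by documentId
    having count(1) = 2
""").fetchall()
print(rows)  # -> [(2,)]
```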
There should be a better SQL statement for this, shouldn't there?
So my question is: is there a better approach than these 2? Is the EAV (anti)pattern so bad that I should stick with the first approach, with a freaked-out DBA and "ten millions of indexes"?
We are talking about a system that will have around 10-20 million new records each month and contain data for at least 3 years... so the tables will be pretty big, and good indexes are necessary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- One per document
CREATE TABLE Document
(
DocumentId -- primary key
,DocumentType
,<etc>
)
-- One per "type" of document
CREATE TABLE DocumentType
(
DocumentTypeId -- primary key
,Name
)
-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
AttributeId -- primary key
,Name
)
-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
DocumentTypeId
,AttributeId
-- compound primary key on both columns
-- foreign keys on both columns
)
-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
DocumentId
,AttributeId
,Value
-- compound primary key on DocumentId, AttributeId
-- foreign keys on both columns to their respective parent tables
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Document and DocumentAttributeValues tables will ever be large. An index on (AttributeId, Value) facilitates lookups by attribute type, and depending on cardinality, an index on (Value, AttributeId) could make searches for specific values quite efficient.
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT do.DocumentId
from Document do
inner join DocumentAttributes da1
    on da1.Name = 'Name'
inner join DocumentAttributeValues dav1
    on dav1.AttributeId = da1.AttributeId
    and dav1.DocumentId = do.DocumentId
    and dav1.Value = 'John'
inner join DocumentAttributes da2
    on da2.Name = 'CustomerId'
inner join DocumentAttributeValues dav2
    on dav2.AttributeId = da2.AttributeId
    and dav2.DocumentId = do.DocumentId
    and dav2.Value = '5'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
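For a runnable check, here is the bottom-up query in SQLite (via Python's sqlite3), seeded with the John/CustomerId = 5 example used earlier in the thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table DocumentAttributes (AttributeId integer primary key, Name text);
create table DocumentAttributeValues (DocumentId int, AttributeId int, Value text,
    primary key (DocumentId, AttributeId));
insert into DocumentAttributes values (1, 'Name'), (2, 'CustomerId');
insert into DocumentAttributeValues values
    (1, 1, 'John'), (1, 2, '2'),
    (2, 1, 'John'), (2, 2, '5');
""")
rows = con.execute("""
    select xx.DocumentId
    from (select dav.DocumentId
          from DocumentAttributes da
          inner join DocumentAttributeValues dav on dav.AttributeId = da.AttributeId
          where da.Name = 'Name' and dav.Value = 'John'
          union all
          select dav.DocumentId
          from DocumentAttributes da
          inner join DocumentAttributeValues dav on dav.AttributeId = da.AttributeId
          where da.Name = 'CustomerId' and dav.Value = '5') xx
    group by xx.DocumentId
    having count(*) = 2
""").fetchall()
print(rows)  # -> [(2,)] : only document 2 has both attributes
```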
While further refinements might be possible, the more attributes you're filtering on, the uglier the queries will be. Five attributes max might work OK in SQL, but if you've got tons of attributes, a NoSQL solution might be what you're looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns which allow you to define hundreds of columns even if only a subset are used at a time
Column Sets allow you to group these columns and treat them as a group
Filtered indexes can index only the rows that actually have values in them.
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
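As a rough analogue, SQLite's partial indexes behave like SQL Server's filtered indexes: only rows that actually carry a value get indexed. A small sketch (SQLite syntax, not SQL Server's; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table docs (id integer primary key, rarely_used_attr text)")
# Index only the rows where the attribute is populated.
con.execute("""
    create index ix_docs_attr on docs (rarely_used_attr)
    where rarely_used_attr is not null
""")
con.execute("insert into docs values (1, null), (2, 'x')")
# An equality predicate implies NOT NULL, so the partial index is usable.
print(con.execute(
    "explain query plan select id from docs where rarely_used_attr = 'x'"
).fetchall())
print(con.execute("select id from docs where rarely_used_attr = 'x'").fetchall())
```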
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others and SQL Server's Index tuning advisor can propose the indexes and statistics to use based on a trace captured using SQL Server's Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full Text Search. This will allow you to execute queries using arbitrary attributes. This is another technique used by document/content management systems, ERPs and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified SELECT) and simpler administration (no worries about column order in indexes; only one FTS index to manage).
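As a rough analogue of the CONTAINS() predicate above, here is an FTS5 sketch in SQLite via Python; MATCH plays the role of CONTAINS, the sample rows are made up, and this assumes fts5 is compiled into SQLite (as it is in standard Python builds):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Name is full-text indexed; ListPrice is stored but not searchable.
con.execute("create virtual table Product using fts5(Name, ListPrice unindexed)")
con.executemany("insert into Product values (?, ?)", [
    ("Mountain Bike", "80.99"),
    ("Road Bike", "80.99"),
])
rows = con.execute(
    "select Name from Product where Product match 'Mountain'"
).fetchall()
print(rows)  # -> [('Mountain Bike',)]
```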
