SQL If Else Logic - sql-server

I have a staging table with the following structure:
ID | BookID | Title | Cost |
----------------------------
1  | Test   | 1234  | 1234 |
This is populated through my system when an Excel file is picked up; I place all of the values from that sheet into my staging table.
I also have another table; for this example I'll call it my Specials table. It has an identical structure to my staging table.
ID | BookID | Title  | Cost |
-----------------------------
1  | Test   | Mr Men | 4.99 |
What I'm doing now is amending a proc that does a whole host of calculations based on the data inside my staging table. A typical call looks like this:
BookTitle = dbo.StagingTable.Title
My amendment needs to check whether the book's name in my staging table also appears in the Specials table. If it does, I should bring back that data instead of the data in my staging table.
The BookId values are the same in both, and I'm using a LEFT OUTER JOIN to tie the two together. What I'm struggling with is figuring out the correct syntax to do what I want.
LEFT OUTER JOIN dbo.Specials s ON dbo.StagingTable.BookId = s.BookId
Could someone point me in the right direction please?
The above is just small snippets from a larger proc that I can't share, so if things seem odd, that's why. I've simply taken the bits I could to help better explain my issue.

In T-SQL, you can do:
SELECT *
FROM dbo.StagingTable
WHERE StagingTable.Title IN (SELECT Specials.Title FROM dbo.Specials)
to get all the rows in the StagingTable that have a Title that is also present in the Specials table.
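An equivalent form, if you prefer a correlated check, uses EXISTS (a small sketch against the same tables; the aliases are mine):
SELECT *
FROM dbo.StagingTable st
WHERE EXISTS (SELECT 1 FROM dbo.Specials s WHERE s.Title = st.Title)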

Please test the following SELECT statement; I hope it is what you require:
select
    staging.ID,
    staging.BookID,
    Title = case when staging.Title <> isnull(Specials.Title, staging.Title)
                 then Specials.Title
                 else staging.Title
            end,
    staging.Cost
from staging
left outer join Specials on staging.BookID = Specials.BookID
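If I'm reading the CASE right, it ultimately returns the Specials title whenever the join finds a row and the staging title otherwise, so the same result can be written more compactly with ISNULL (a sketch, untested against the real proc):
select
    staging.ID,
    staging.BookID,
    Title = isnull(Specials.Title, staging.Title), -- fall back to the staging title when no Specials row matches
    staging.Cost
from staging
left outer join Specials on staging.BookID = Specials.BookID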

Related

Are Views or Functions faster in SQL?

I have a table with customer receipts. I'm trying to generate a report showing each customer's name, address, and purchase totals by department. The desired output should look like:
| Customer      | Address               | Clothing | Electronics | Hardware | Household |
| Homer Simpson | 724 Evergreen Terr    | $42      | $20         | $500     | $24       |
| Walter White  | 308 Negra Arroyo Lane | $120     | $80         | $52      | $2400     |
The receipts table is part of a temporal model. So, the code looks like:
Select c.customername, a.address, r.receiptno, ri.Department, ri.Amount
from customer c
inner join customer_address_lnk cal on cal.customerid = c.id
inner join address a on cal.addressid = a.id
inner join customer_receipts_lnk crl on crl.customerid = c.id
inner join receipts r on crl.receiptid = r.id
inner join receipts_receiptitem_lnk rrl on rrl.receiptid = r.id
inner join receiptitem ri on ri.id = rrl.receiptitemid
The lnk tables are linking tables.
The receiptitem table has the following columns: ID, Department, Amount, CreatedDate, UpdatedDate
The idea is that if the receipt is updated, the updated amount can be adjusted for returns, price adjustments, and so forth.
The goal is to get the query under 5 sec. Since we have over 125 million rows in the receiptitem table alone, it takes SQL Server 20+ minutes to calculate the report.
I've tried CTEs on views without success. I've tried different JOIN orders. I've used LEFT JOINs. Even PIVOT didn't slow it down, but I still can't get it under 20 minutes.
Before I start down the path of creating a Function to get it under the 5 second goal, I'm open to any suggestions. I have limited ability to alter indices at this time.
Any thoughts?
Well, obviously views and SQL functions are different things.
Try to use a function where it needs to be clear to a future user (maybe yourself!) that the data returned requires certain parameters and does not make sense without them, sort of like forcing the user to include a WHERE clause.
In your example, you may want to force the user to filter by CustomerId or ReceiptId.
HOWEVER....
In this case, the view approach would probably be better.
Functions, by design, cannot use temporary tables; they use table variables instead, and table variables are much slower than temp tables for large row counts.
The query you've included is really straightforward with no surprises. The view would be the simplest and best approach here.
For 125M rows, I suggest either checking the execution plan during processing (include a WHERE clause for this), or dumping data into a summary table that is updated periodically. Or both. Check indexes all along the way.
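As a sketch of what such a summary query could look like, conditional aggregation over the joins from the question produces the pivoted shape directly (names reused from the question; the department labels are assumptions, and this is untested):
SELECT c.customername,
       a.address,
       SUM(CASE WHEN ri.Department = 'Clothing'    THEN ri.Amount ELSE 0 END) AS Clothing,
       SUM(CASE WHEN ri.Department = 'Electronics' THEN ri.Amount ELSE 0 END) AS Electronics,
       SUM(CASE WHEN ri.Department = 'Hardware'    THEN ri.Amount ELSE 0 END) AS Hardware,
       SUM(CASE WHEN ri.Department = 'Household'   THEN ri.Amount ELSE 0 END) AS Household
FROM customer c
INNER JOIN customer_address_lnk cal ON cal.customerid = c.id
INNER JOIN address a ON cal.addressid = a.id
INNER JOIN customer_receipts_lnk crl ON crl.customerid = c.id
INNER JOIN receipts r ON crl.receiptid = r.id
INNER JOIN receipts_receiptitem_lnk rrl ON rrl.receiptid = r.id
INNER JOIN receiptitem ri ON ri.id = rrl.receiptitemid
GROUP BY c.customername, a.address
Run periodically, its output could feed the summary table.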
Here is more (better) discussion: Test SQL Queries

How to label result tables in multiple SELECT output

I wrote a simple dummy procedure to check the data saved in the database. When I run my procedure it outputs the data as below.
I want to label the tables so that even a QA person can identify which data each result set contains. How can I do it?
**Update:** This procedure is run manually through Management Studio; it has nothing to do with my application. All I want to check is whether the data has been inserted/updated properly.
For better clarity, I want to show the table names above the tables as labels.
Add another column to the result set, and name it so whoever reads the output can tell the tables apart :)
Select 'Employee' as TABLE_NAME, * from Employee
Output will look like this:
| TABLE_NAME | ID | Number | ...
--------------------------------
| Employee   | 1  | 123    | ...
Or you can call the column 'Employee':
SELECT 'Employee' AS 'Employee', * FROM employee
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1  | 123    | ...
Add an extra column whose name (not its value!) is the label.
SELECT 'Employee' AS "Employee", e.* FROM employee e
The output will look like this:
| Employee | ID | Number | ...
------------------------------
| Employee | 1  | 123    | ...
By doing so, you will see the label, even if the result does not contain rows.
I like to stick a whole separate result set that looks like a label or title between the result sets with real data.
SELECT 0 AS [Our Employees:]
WHERE 1 = 0
-- Your first "Employees" query goes here
SELECT 0 AS [Our Departments:]
WHERE 1 = 0
-- Now your second real "Departments" query goes here
-- ...and so on...
Ends up looking like this:
It's a bit looser-formatted, with more whitespace than I'd like, but it's the best I've come up with so far.
Unfortunately there is no way to label SELECT query output in SQL Server or SSMS. I needed something very similar a few years ago, and we settled for a workaround: adding another result set that contains the list of table names.
Here is what we did: we prepended the data set with a table listing the names of the tables that follow, so the first table looks like this:
Name
----------
Employee
Department
Courses
Class
Attendance
In C#, while reading the tables, you can iterate through the first table and assign a TableName to each subsequent table in the DataSet.
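For the SQL side, the first result set can be produced with a table value constructor (a sketch assuming SQL Server 2008+; the names must match the order of the result sets that follow):
-- First result set: the names of the tables that follow, in order
SELECT Name
FROM (VALUES ('Employee'), ('Department'), ('Courses'), ('Class'), ('Attendance')) AS t(Name)
-- Then the real result sets, in the same order
SELECT * FROM Employee
SELECT * FROM Department
-- ...and so on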
This is best done using Reporting Services and creating a simple report. You can then email this report daily if you wish.

Metadata database design

I am trying to store metadata about a document in SQL Server. The documents are stored in a document archive, which returns an identifier so I can get a document back by asking the archive for it by identifier.
Our users would like to be able to search for documents based on different metadata. The metadata could be 1 attribute or 5 depending on the document type, and users should be able to create new document types from an admin site.
I can see two solutions here. One is that each document type gets its own metadata table, where all metadata attributes are predefined; if an attribute should be added, a new column needs to be created, and if a new document type is created, a new metadata table needs to be created. Our DBA will freak out with a solution like this, and I also see a problem with indexes: if a document type has 5 different metadata attributes, it needs to be searchable with 1 or 4 of them specified in the search, so I would need indexes for all the different combinations of possible searches.
Here is a (fictive) example:
| documentId | Name | InsertDate | CustomerId | City     |
| 1          | John | 2014-01-01 | 2          | London   |
| 2          | John | 2014-01-20 | 5          | New York |
| 3          | Able | 2014-01-01 | 10         | Paris    |
Here I could say:
Give me all documents where Name = 'John'
Give me all documents where Name = 'John' and CustomerId = 5
Give me all documents where InsertDate = '2014-01-01' and City = 'London'
That would be 3 different indexes, and I still haven't covered all possible combinations. This isn't practical.
So I am looking into the evil 'EAV' (anti)pattern.
Instead of having the metadata as columns, I can have them as rows:
| documentId | MetaAttribute | MetaValue  |
| 1          | Name          | John       |
| 1          | InsertDate    | 2014-01-01 |
| 1          | CustomerId    | 2          |
| 1          | City          | London     |
| 2          | Name          | John       |
| 2          | InsertDate    | 2014-01-20 |
| 2          | CustomerId    | 5          |
| 2          | City          | New York   |
| 3          | Name          | Able       |
| 3          | InsertDate    | 2014-01-01 |
| 3          | CustomerId    | 10         |
| 3          | City          | Paris      |
Here it's simple to create one index on MetaAttribute and MetaValue, and it's covered. If a new document type is created, new metadata for that document type can be registered in a MetaAttribute table (that contains all MetaAttributes for the different document types). So there is no need to create new tables or columns when a new document type is added or when a new attribute is added to a document type. Instead, all MetaValues must be strings :( and the SQL query to find the document id is a bit more complicated.
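For example, the covering index could look like this (the table name is taken from the query below; the index name is mine, and this is a sketch, not tested at the 10-20 million rows/month scale):
CREATE INDEX IX_MetaData_Attribute_Value
    ON MetaData (MetaAttribute, MetaValue)
    INCLUDE (documentId)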
This is what I figured out (in this example MetaAttribute is a string, but it would be an ID into the MetaAttribute table):
SELECT *
FROM [Document]
WHERE ID IN (SELECT documentId
             FROM [MetaData]
             WHERE (MetaAttribute = 'Name' AND MetaValue = 'John')
                OR (MetaAttribute = 'CustomerId' AND MetaValue = '5')
             GROUP BY documentId
             HAVING COUNT(1) = 2)
Here I need to ask whether Name = 'John' and CustomerId = 5. I do that by finding all records that match either condition, grouping them by documentId, and counting the number of items in each group. If I get 2, then both Name = 'John' and CustomerId = '5' are true for this search. I return the documentId and use it to retrieve information about the document, such as the document archive storage id.
There should be a better SQL statement for this, shouldn't there?
So my question is: is there a better approach than these two? Is the EAV pattern so bad that I should stick with the first approach and live with a freaked-out DBA and "ten millions of indexes"?
We are talking about a system that will get around 10-20 million new records each month and keep data for at least 3 years, so the tables will be pretty big, and good indexes are necessary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- One per document
CREATE TABLE Document
(
DocumentId -- primary key
,DocumentType
,<etc>
)
-- One per "type" of document
CREATE TABLE DocumentType
(
DocumentTypeId -- primary key
,Name
)
-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
AttributeId -- primary key
,Name
)
-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
DocumentTypeId
,AttributeId
-- compound primary key on both columns
-- foreign keys on both columns
)
-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
DocumentId
,AttributeId
,Value
-- compound primary key on DocumentId, AttributeId
-- foreign keys on both columns to their respective parent tables
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Document and DocumentAttributeValues tables will ever be large. An index on (AttributeId + Value) facilitates lookups by attribute type, and depending on cardinality an index on (Value + AttributeId) could make searches for specific values quite efficient.
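For instance (names from the sketch above; untested, like the rest of this):
-- Seek by attribute, then value
CREATE INDEX IX_DAV_Attribute_Value
    ON DocumentAttributeValues (AttributeId, Value)
-- Seek by value first, useful when values are highly selective
CREATE INDEX IX_DAV_Value_Attribute
    ON DocumentAttributeValues (Value, AttributeId)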
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT do.DocumentId
from Document do
inner join DocumentAttributes da1
   on da1.Name = 'Name'
inner join DocumentAttributeValues dav1
   on dav1.DocumentId = do.DocumentId -- tie each value row back to the document
  and dav1.AttributeId = da1.AttributeId
  and dav1.Value = 'John'
inner join DocumentAttributes da2
   on da2.Name = 'CustomerId'
inner join DocumentAttributeValues dav2
   on dav2.DocumentId = do.DocumentId
  and dav2.AttributeId = da2.AttributeId
  and dav2.Value = '5'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
While further refinements might be possible, the more attributes you're filtering on, the uglier the queries will be. Five attributes max might work OK in SQL, but if you've got tons of attributes, a NoSQL solution might be what you're looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns, which allow you to define hundreds of columns even if only a subset is used at a time
Column Sets, which allow you to group these columns and treat them as a group
Filtered Indexes, which can index only the rows that actually have values in them
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
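A minimal sketch of how the three features fit together, using the attribute names from the question (the table, column types, and index name are my assumptions):
CREATE TABLE DocumentMetadata
(
    DocumentId    int NOT NULL PRIMARY KEY,
    Name          nvarchar(100) SPARSE NULL,
    CustomerId    int           SPARSE NULL,
    City          nvarchar(100) SPARSE NULL,
    AllAttributes xml COLUMN_SET FOR ALL_SPARSE_COLUMNS -- the column set: all sparse columns as one XML group
)
-- Filtered index: only rows that actually carry a City value are indexed
CREATE INDEX IX_DocumentMetadata_City
    ON DocumentMetadata (City)
    WHERE City IS NOT NULL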
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others, and SQL Server's Database Engine Tuning Advisor can propose indexes and statistics based on a trace captured using SQL Server Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full Text Search. This will allow you to execute queries using arbitrary attributes. This is another technique used by document/content management systems, ERPs and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified SELECT) and simpler administration (no worries about column order in indexes; only one FTS index to manage).
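Setting up FTS is a one-time task per table; a sketch against the AdventureWorks table used above (the key index name is that database's primary key and is an assumption here):
CREATE FULLTEXT CATALOG ftProducts AS DEFAULT
CREATE FULLTEXT INDEX ON Production.Product (Name)
    KEY INDEX PK_Product_ProductID -- unique key index required by FTS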

Possible to query a database into Excel on a cell-by-cell basis? Or another solution..?

I have various large views/stored procedures that basically churn out a lot of data into an Excel spreadsheet. There was a problem where some of the company amounts weren't flowing through. I narrowed it down to a piece of code in a stored procedure (note this is cut down for simplicity):
LEFT OUTER JOIN view_creditrating internal_creditrating
             ON creditparty.creditparty = internal_creditrating.company
LEFT OUTER JOIN (SELECT company, contract, SUM(amount) amount
                 FROM COMMON_OBJ.amount
                 WHERE status = 'Active'
                 GROUP BY company, contract) col
             ON vd.contract = col.contract
Table with issue:
company | contract | amount |
TVC     | NULL     | 1006   |
KS      | 10070    | -2345  |
NYC-G   | 10060    | 334000 |
NYC-G   | 100216   | 4000   |
UECR    | NULL     | 0      |
SP      | 10090    | 84356  |
Basically some of the contracts are NULL, so when there is a LEFT OUTER JOIN on contract, the rows with NULL contracts drop out and don't flow through. So I decided to join on company instead.
This also causes problems, because a company appears in the table more than once in order to show different contracts. With that change the query becomes ambiguous: it won't know whether I want contract 10060's amount or contract 100216's amount, and more often than not it gives me the incorrect amount. I thought about leaving the final ON clause as company = company, which causes the fewest issues, and then somehow directly querying for each affected cell value, since the inconsistency only affects a few cells. I've searched, though, and I don't think that is possible.
Is this possible? Or is there another way to fix this on the database end?
As you've worked out, the problem is in the ON clause, and its use of NULL.
One way to alter the NULL to be a value you can match against is to use COALESCE, which would alter the clause to:
ON coalesce(vd.contract,'No Contract') = coalesce(col.contract,'No Contract')
This will turn all NULLs into 'No Contract', which changes the NULL = NULL test (which would evaluate to UNKNOWN, so the rows never match) into 'No Contract' = 'No Contract', which returns TRUE.
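A standalone demo of why the NULL rows never matched in the first place (with the default ANSI_NULLS ON):
-- NULL = NULL evaluates to UNKNOWN, not TRUE, so such rows are dropped by the join
SELECT CASE WHEN NULL = NULL THEN 'match' ELSE 'no match' END -- returns 'no match'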

Detecting Correlated Columns in Data

Suppose I have the following data:
OrderNumber | CustomerName | CustomerAddress | CustomerCode
1           | Chris        | 1234 Test Drive | 123
2           | Chris        | 1234 Test Drive | 123
How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perfectly? I'm thinking that SQL Server data mining is probably the right tool for the job, but I don't have much experience with it.
Thanks in advance.
UPDATE:
By "correlate", I mean in the statistics sense, that whenever column a is x, column b will be y. In the above data, The last three columns correlate with each other, and the first column does not.
The input of the operation would be the name of the table, and the output would be something like:
Column 1        | Column 2        | Certainty
CustomerName    | CustomerAddress | 100%
CustomerAddress | CustomerCode    | 100%
There is a 'functional dependency' test built into the SQL Server Data Profiling task (an SSIS component that ships with SQL Server 2008). It is described pretty well in this blog post:
http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx
I have played a little bit with accessing the data profiler output via some (under-documented) .NET APIs and it seems doable. However, since my requirement dealt with the distribution of column values, I ended up going with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and the output viewer.
What do you mean by correlate? Do you just want to see if they're equal? You can do that in T-SQL by joining the table to itself:
select distinct
case when a.OrderNumber < b.OrderNumber then a.OrderNumber
else b.OrderNumber
end as FirstOrderNumber,
case when a.OrderNumber < b.OrderNumber then b.OrderNumber
else a.OrderNumber
end as SecondOrderNumber
from
MyTable a
inner join MyTable b on
a.CustomerName = b.CustomerName
and a.CustomerAddress = b.CustomerAddress
and a.CustomerCode = b.CustomerCode
This would return:
FirstOrderNumber | SecondOrderNumber
1                | 2
Correlation is defined on metric spaces, and your values are not metric.
This will give you the fraction of customers whose customerAddress is not uniquely determined by customerName:
SELECT AVG(1.0 * imperfect) -- 1.0 * forces decimal math; a plain AVG over ints would round to 0 or 1
FROM (
    SELECT
        customerName,
        CASE
            WHEN COUNT(DISTINCT customerAddress) = 1
            THEN 0
            ELSE 1
        END AS imperfect -- 1 when this customerName maps to more than one address
    FROM orders
    GROUP BY
        customerName
) q
Substitute other columns for customerAddress and customerName in this query to find discrepancies between other pairs.
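If you only want the violating values listed, the dependency check can also be phrased directly against the same assumed orders table (a sketch):
SELECT customerName
FROM orders
GROUP BY customerName
HAVING COUNT(DISTINCT customerAddress) > 1 -- each row returned violates Name -> Address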