I have a database table that records staff contacts with clients. Usually only one staff member contacts a client; occasionally, however, two staff members have a contact with a client at the same time. One staff member is flagged as the primary, the other as the secondary. To link the two, the secondary row stores the primary's ContactID in its SecondaryContactID field, as in this example:
ContactID  SecondaryContactID  ContactDate  StaffMemberID  ContactLocation
---------  ------------------  -----------  -------------  ---------------
123456     Null                01/JUL/2013  John           SydneyCBD
123457     123456              01/JUL/2013  James          Null
Our major corporate app has a bug: it does not store the same ContactLocation for the secondary staff member as for the primary (even though in reality they are always at the same location); it defaults to Null. So in the example above, "SydneyCBD" should appear in both rows.
In my extract I need these records on two rows, pretty much like the example, but how do I get SydneyCBD to print instead of Null? Some sort of CASE subquery using SecondaryContactID as a link?
You can try something like this:
SELECT T1.CONTACTID,
       T1.SECONDARYCONTACTID,
       T1.CONTACTDATE,
       T1.STAFFMEMBERID,
       -- Use the primary contact's location when this row's location is NULL
       ISNULL(T1.CONTACTLOCATION, T2.CONTACTLOCATION) AS ContactLocation
FROM TABLE1 T1
LEFT JOIN TABLE1 T2
       ON T1.SECONDARYCONTACTID = T2.CONTACTID;
A working example can be found on SQL Fiddle.
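If you want to try the self-join locally without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no ISNULL, so the standard COALESCE function (which SQL Server also supports) stands in for it; the table name table1 and the two rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table1 (
    ContactID INTEGER PRIMARY KEY,
    SecondaryContactID INTEGER,
    ContactDate TEXT,
    StaffMemberID TEXT,
    ContactLocation TEXT)""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [(123456, None, "01/JUL/2013", "John", "SydneyCBD"),
     (123457, 123456, "01/JUL/2013", "James", None)],
)

rows = conn.execute("""
    SELECT t1.ContactID,
           t1.SecondaryContactID,
           t1.ContactDate,
           t1.StaffMemberID,
           -- Fall back to the primary row's location when this row has none
           COALESCE(t1.ContactLocation, t2.ContactLocation) AS ContactLocation
    FROM table1 t1
    LEFT JOIN table1 t2
           ON t1.SecondaryContactID = t2.ContactID
    ORDER BY t1.ContactID
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN matters: primary rows have no SecondaryContactID, so an inner join would drop them from the extract.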
I'm using Access 2016 to view data from a table on our SQL server. I have a massive audit log where the record being viewed is represented by a "FolderID" field. I have another table that has values for the FolderID (represented as "fid") along with columns identifying the record's name and other ID numbers.
I want to be able to replace the FolderID value in the first table with CUSTOMER_NAME value from the second table so I know what's being viewed at a glance.
I've tried googling different join techniques to build a query that will accomplish this, but my google-fu is weak or I'm just not caffeinated enough today.
Table 1
EventTime             EventType  FolderID
4/4/2019 1:23:39 PM   A          12345
Table 2
fid    acc  Other_ID  Third_ID  CUSTOMER_NAME
12345  0    9875      12345678  Doe, John
Basically I want to query Table 2 to search for fid using the value in Table 1 for FolderID, and I want it to respond with the CUSTOMER_NAME associated with the FolderID/fid. The result would look like:
EventTime             EventType  FolderID
4/4/2019 1:23:39 PM   A          Doe, John
I'm stupid because I thought I was too smart to use the freaking Query Wizard. When I did, and it prompted me to create relationships and actually think about what I was doing, it came up with this.
SELECT [table1].EventTime, [table1].EventType, [table1].FolderID,
       [table1].ObjRef, [table1].AreaID, [table1].FileID,
       [table2].CUSTOMER_NAME, [table2].fid
FROM [table1]
LEFT JOIN [table2] ON [table2].[fid] = [table1].[FolderID];
You can run this query and check whether it helps:
SELECT EventTime, EventType, CUSTOMER_NAME AS FolderID
FROM Table1
INNER JOIN Table2 ON Table1.FolderID = Table2.fid;
Basically, AS does what you want here: it renames the output column, so CUSTOMER_NAME is displayed under the FolderID heading.
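One difference between the two queries is worth checking: the comma (inner) join drops audit events whose FolderID has no match in Table 2, while a LEFT JOIN starting from the audit table keeps them with a NULL name. The sketch below shows this using Python's sqlite3 module instead of Access; the second event row (FolderID 99999) is a hypothetical addition with no matching fid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (EventTime TEXT, EventType TEXT, FolderID INTEGER)")
conn.execute("CREATE TABLE table2 (fid INTEGER, acc INTEGER, Other_ID INTEGER, "
             "Third_ID INTEGER, CUSTOMER_NAME TEXT)")
conn.execute("INSERT INTO table1 VALUES ('4/4/2019 1:23:39 PM', 'A', 12345)")
# Hypothetical event whose FolderID has no customer row
conn.execute("INSERT INTO table1 VALUES ('4/4/2019 1:25:00 PM', 'B', 99999)")
conn.execute("INSERT INTO table2 VALUES (12345, 0, 9875, 12345678, 'Doe, John')")

rows = conn.execute("""
    SELECT t1.EventTime,
           t1.EventType,
           -- Show the customer name under the FolderID heading
           t2.CUSTOMER_NAME AS FolderID
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.fid = t1.FolderID
    ORDER BY t1.EventTime
""").fetchall()
print(rows)
```

Switching LEFT JOIN to INNER JOIN in this sketch would return only the first event.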
I am trying to get all the data from all tables in one DB.
I have looked around, but I haven't been able to find any solution that works for my current problem.
I made a C# program that creates a table for each day the program runs. The table name looks like tbl18_12_2015 for today's date (Danish date format).
Now, in order to make a yearly report, I would love to get ALL the data from all the tables in the DB that stores these reports. I have no way of knowing how many tables there will be or what they are called, other than the format (tblDD_MM_YYYY).
I'm thinking of something like this (which obviously doesn't work):
SELECT * FROM DB_NAME.*
All the tables have the same columns, and one of them is an auto-incrementing primary key.
Here is a table named tbl17_12_2015:
ID  PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
3   92545     TOM       20,5     A     NULL    NULL
4   92545     TOM       20,5     A     NULL    NULL
6   117681    LISA      NULL     NULL  207     R
Here is a table named tbl18_12_2015:
ID  PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
3   117681    LISA      30       A     NULL    NULL
4   53694     DAVID     78       A     NULL    NULL
6   58461     MICHELLE  NULL     NULL  207     R
What I would like to get is something like this (from all tables in the DB):
PERSONID  NAME      PAYMENT  TYPE  RESULT  TYPE
92545     TOM       20,5     A     NULL    NULL
92545     TOM       20,5     A     NULL    NULL
117681    LISA      NULL     NULL  207     R
117681    LISA      30       A     NULL    NULL
53694     DAVID     78       A     NULL    NULL
58461     MICHELLE  NULL     NULL  207     R
I have tried several different queries, but none of them returned this, just a lot of information about the tables.
Thanks in advance, and happy holidays
Edit: corrected the column 3 header of tbl18_12_2015 to English rather than Danish.
Thanks to all those who tried to help me solve this question, but I can't (most likely due to my skill set) get the UNION to work, so I decided to refactor my DB instead.
While you could store the table names in a database and use dynamic SQL to UNION them together, this is NOT a good idea and you shouldn't even consider it - STOP NOW!
What you need to do is create a new table with the same fields, and add an ID (auto-incrementing identity column) and a DateTime field. Then, instead of creating a new table for each day, just write your data to this table along with the DateTime. You can then use the DateTime field to filter your results, whether you want something from a day, week, month, year, decade, etc. - and you don't need dynamic SQL, and you don't end up with 10,000 database tables.
I know some people posted comments expressing the same sentiments, but, really, this should be an answer.
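A minimal sketch of that single-table design, using Python's sqlite3 module rather than SQL Server. The table name payments, the ReportDate column, and the renamed PAYMENT_TYPE/RESULT_TYPE columns (the question has two columns both called TYPE) are assumptions for illustration; the last row is a made-up 2014 entry that shows the year filter working.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# One table for every day's data, instead of one table per day
conn.execute("""CREATE TABLE payments (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ReportDate TEXT NOT NULL,  -- ISO 8601 'YYYY-MM-DD', so text order is date order
    PERSONID INTEGER,
    NAME TEXT,
    PAYMENT REAL,
    PAYMENT_TYPE TEXT,
    RESULT INTEGER,
    RESULT_TYPE TEXT)""")
conn.execute("CREATE INDEX ix_payments_date ON payments (ReportDate)")

rows = [
    (date(2015, 12, 17), 92545, "TOM", 20.5, "A", None, None),
    (date(2015, 12, 17), 92545, "TOM", 20.5, "A", None, None),
    (date(2015, 12, 17), 117681, "LISA", None, None, 207, "R"),
    (date(2015, 12, 18), 117681, "LISA", 30, "A", None, None),
    (date(2015, 12, 18), 53694, "DAVID", 78, "A", None, None),
    (date(2015, 12, 18), 58461, "MICHELLE", None, None, 207, "R"),
    (date(2014, 12, 18), 1, "HYPOTHETICAL", 10, "A", None, None),  # made-up row
]
conn.executemany(
    "INSERT INTO payments (ReportDate, PERSONID, NAME, PAYMENT, PAYMENT_TYPE, "
    "RESULT, RESULT_TYPE) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(d.isoformat(), *rest) for d, *rest in rows],
)

def yearly_report(conn, year):
    """All rows for one calendar year, using a half-open date range."""
    return conn.execute(
        "SELECT PERSONID, NAME, PAYMENT, PAYMENT_TYPE, RESULT, RESULT_TYPE "
        "FROM payments WHERE ReportDate >= ? AND ReportDate < ? ORDER BY ID",
        (f"{year}-01-01", f"{year + 1}-01-01"),
    ).fetchall()

report = yearly_report(conn, 2015)
print(len(report))
```

The half-open range (>= start, < next start) lets the date index be used and works the same way for a day, a month or a decade.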
If all the tables are in the same database, you can use the UNION operator to combine them.
You could start with something like this to select all the table names from a given database:
For SQL Server:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='dbName'
For MySQL:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA='dbName'
Once you have the list of tables, you can move all the tables into one database and build your report using UNIONs.
You will need a UNION between each pair of SELECT queries.
Do not use *; always list the names of the columns you are selecting.
If you want to keep duplicates, UNION ALL is what you want; plain UNION removes duplicate rows.
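To make the "list the tables, then UNION them" idea concrete, here is a sketch in Python with sqlite3. It reads table names from SQLite's sqlite_master catalog instead of INFORMATION_SCHEMA, keeps only names matching the tblDD_MM_YYYY pattern, and builds a UNION ALL statement from them. On SQL Server the equivalent would be dynamic SQL built from INFORMATION_SCHEMA.TABLES; the function name and the PAYMENT_TYPE/RESULT_TYPE column names are assumptions.

```python
import re
import sqlite3

# Only names following the daily pattern tblDD_MM_YYYY are combined
DAILY_TABLE = re.compile(r"^tbl\d{2}_\d{2}_\d{4}$")
COLUMNS = "PERSONID, NAME, PAYMENT, PAYMENT_TYPE, RESULT, RESULT_TYPE"

def union_all_daily_tables(conn):
    names = [name for (name,) in conn.execute(
                 "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
             if DAILY_TABLE.match(name)]
    if not names:
        return []
    # Names were validated by the regex above, so quoting them here is safe
    sql = " UNION ALL ".join(f'SELECT {COLUMNS} FROM "{name}"' for name in names)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
data = {
    "tbl17_12_2015": [(92545, "TOM", 20.5, "A", None, None),
                      (92545, "TOM", 20.5, "A", None, None),
                      (117681, "LISA", None, None, 207, "R")],
    "tbl18_12_2015": [(117681, "LISA", 30, "A", None, None),
                      (53694, "DAVID", 78, "A", None, None),
                      (58461, "MICHELLE", None, None, 207, "R")],
}
for table, rows in data.items():
    conn.execute(f'CREATE TABLE "{table}" (ID INTEGER PRIMARY KEY, {COLUMNS})')
    conn.executemany(f'INSERT INTO "{table}" ({COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)', rows)
conn.execute("CREATE TABLE settings (k TEXT, v TEXT)")  # not a daily table: ignored

combined = union_all_daily_tables(conn)
print(len(combined))
```

Note that sorting names like tbl17_12_2015 alphabetically is not chronological across months or years, which is one more reason the single-table design above is easier to report on.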
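The two TOM rows in tbl17_12_2015 show why the column list and the choice of UNION matter. Once the auto-increment ID is left out, those rows are identical, so UNION collapses them while UNION ALL keeps both. A small sqlite3 sketch, using only a subset of the question's columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl17_12_2015 (ID INTEGER PRIMARY KEY, PERSONID INTEGER, "
             "NAME TEXT, PAYMENT REAL)")
conn.executemany("INSERT INTO tbl17_12_2015 VALUES (?, ?, ?, ?)",
                 [(3, 92545, "TOM", 20.5), (4, 92545, "TOM", 20.5)])
conn.execute("CREATE TABLE tbl18_12_2015 (ID INTEGER PRIMARY KEY, PERSONID INTEGER, "
             "NAME TEXT, PAYMENT REAL)")
conn.execute("INSERT INTO tbl18_12_2015 VALUES (3, 117681, 'LISA', 30)")

def combine(op):
    # Explicit column list (no *), so the per-table ID does not make rows distinct
    sql = (f"SELECT PERSONID, NAME, PAYMENT FROM tbl17_12_2015 {op} "
           "SELECT PERSONID, NAME, PAYMENT FROM tbl18_12_2015")
    return conn.execute(sql).fetchall()

print(len(combine("UNION")))      # the two TOM rows collapse into one
print(len(combine("UNION ALL")))  # duplicates are kept
```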
If you want unique records based on PERSONID but the rows are likely to differ, an UPDATE_DATE column would probably help you decide which one to keep. But what if the records with the same PERSONID evolved independently in each table?
Then you'd need to define business rules to determine which specific changes to keep and merge into the single resulting record, and you'd be on your own.
What is "Skyttenavn"? Is it Danish? If it means the same as "NAME", you'd want to alias that column as NAME in the SELECT query, although UNION matches columns by their position in the select list, not by name.
You'd also need a new auto-incremented ID as a unique primary key if the IDs from different tables are likely to conflict. If you instead want to copy the original ID values into a table with an identity column, set IDENTITY_INSERT to ON while inserting them, then back to OFF so that automatic incrementing resumes.
I am trying to store metadata about documents in SQL Server. The documents are stored in a document archive, which returns an identifier so I can later retrieve a document by asking the archive for it by that identifier.
Our users would like to be able to search for these documents based on different metadata. A document may have 1 attribute or 5, depending on the document type, and users should be able to create new document types from an admin site.
I can see two solutions here. One is that each document type gets its own metadata table, where all metadata attributes are predefined; if an attribute is added, a new column has to be created, and if a new document type is created, a new metadata table has to be created. Our DBA will freak out at a solution like this, and I also see a problem with indexes: if a document type has 5 different metadata attributes, it needs to be searchable with any combination of them specified in the search. Then I would need to write an index for every possible combination of search criteria.
Here is a (fictitious) example:
| documentId | Name | InsertDate | CustomerId | City
| 1          | John | 2014-01-01 | 2          | London
| 2          | John | 2014-01-20 | 5          | New York
| 3          | Able | 2014-01-01 | 10         | Paris
Here I could ask:
Give me all documents where Name = 'John'
Give me all documents where Name = 'John' AND CustomerId = 5
Give me all documents where InsertDate = '2014-01-01' AND City = 'London'
That would need 3 different indexes, and I still haven't covered all possible combinations. This isn't practical.
So I am looking into the evil 'EAV' (entity-attribute-value) (anti)pattern.
Instead of having the metadata as columns, I can have them as rows:
| documentId | MetaAttribute | MetaValue
| 1          | Name          | John
| 1          | InsertDate    | 2014-01-01
| 1          | CustomerId    | 2
| 1          | City          | London
| 2          | Name          | John
| 2          | InsertDate    | 2014-01-20
| 2          | CustomerId    | 5
| 2          | City          | New York
| 3          | Name          | Able
| 3          | InsertDate    | 2014-01-01
| 3          | CustomerId    | 10
| 3          | City          | Paris
Here it's simple to create one index on MetaAttribute and MetaValue, and every search is covered. If a new document type is created, its metadata attributes can be registered in a MetaAttribute table (which contains all MetaAttributes for the different document types). So there is no need to create new tables or columns when a new document type is added or when a new attribute is added to a document type. On the other hand, all MetaValues must be strings :( and the SQL query to find the document id is a bit more complicated.
This is what I figured out. (In this example MetaAttribute is a string, but it would really be an ID referencing the MetaAttribute table.)
SELECT * FROM [Document]
WHERE ID IN (SELECT documentId FROM [MetaData]
WHERE ((MetaAttribute = 'Name' AND MetaValue = 'John')
OR (MetaAttribute = 'CustomerId' and MetaValue = '5'))
GROUP BY [documentId]
HAVING Count(1) = 2)
Here I need to ask whether Name = 'John' and CustomerId = 5. I do that by finding all records that have either Name = 'John' or CustomerId = '5', then grouping them by documentId and counting the rows in each group. If the count is 2, both Name = 'John' and CustomerId = '5' are true for that document. I then return the documentId and use it to retrieve information about the document, such as its document archive storage id.
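The GROUP BY / HAVING trick is a form of relational division, and it generalises to any number of criteria. Here is a sketch of it in Python with sqlite3, using the MetaData rows from the question. The helper find_documents is hypothetical, and it counts DISTINCT MetaAttribute (rather than Count(1)) so that a repeated attribute could not satisfy two criteria; the primary key on (documentId, MetaAttribute) already prevents that here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetaData (documentId INTEGER, MetaAttribute TEXT, "
             "MetaValue TEXT, PRIMARY KEY (documentId, MetaAttribute))")
conn.execute("CREATE INDEX ix_meta_attr_value ON MetaData (MetaAttribute, MetaValue)")
conn.executemany("INSERT INTO MetaData VALUES (?, ?, ?)", [
    (1, "Name", "John"), (1, "InsertDate", "2014-01-01"),
    (1, "CustomerId", "2"), (1, "City", "London"),
    (2, "Name", "John"), (2, "InsertDate", "2014-01-20"),
    (2, "CustomerId", "5"), (2, "City", "New York"),
    (3, "Name", "Able"), (3, "InsertDate", "2014-01-01"),
    (3, "CustomerId", "10"), (3, "City", "Paris"),
])

def find_documents(conn, criteria):
    """Return ids of documents matching ALL (attribute, value) pairs in criteria."""
    if not criteria:
        return []
    # One OR-ed predicate per criterion; a document qualifies only if it matched all
    predicate = " OR ".join("(MetaAttribute = ? AND MetaValue = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = (f"SELECT documentId FROM MetaData WHERE {predicate} "
           "GROUP BY documentId HAVING COUNT(DISTINCT MetaAttribute) = ? "
           "ORDER BY documentId")
    return [doc_id for (doc_id,) in conn.execute(sql, [*params, len(criteria)])]

print(find_documents(conn, {"Name": "John"}))
print(find_documents(conn, {"Name": "John", "CustomerId": "5"}))
print(find_documents(conn, {"InsertDate": "2014-01-01", "City": "London"}))
```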
There should be a better SQL statement for this, shouldn't there?
So my question is: is there a better approach than these two? Is the EAV pattern so bad that I should stick with the first approach and have a freaked-out DBA and "ten million indexes"?
We are talking about a system that will have around 10-20 million new records each month and will hold data for at least 3 years, so the tables will be pretty big and good indexes are necessary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- Column types are illustrative; adjust them to your data.
-- One per "type" of document
CREATE TABLE DocumentType
(
    DocumentTypeId int NOT NULL PRIMARY KEY
   ,Name           nvarchar(100) NOT NULL
)
-- One per document
CREATE TABLE Document
(
    DocumentId     int NOT NULL PRIMARY KEY
   ,DocumentTypeId int NOT NULL REFERENCES DocumentType (DocumentTypeId)
   -- ,<etc>
)
-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
    AttributeId int NOT NULL PRIMARY KEY
   ,Name        nvarchar(100) NOT NULL UNIQUE
)
-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
    DocumentTypeId int NOT NULL REFERENCES DocumentType (DocumentTypeId)
   ,AttributeId    int NOT NULL REFERENCES DocumentAttributes (AttributeId)
   ,PRIMARY KEY (DocumentTypeId, AttributeId)  -- compound primary key
)
-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
    DocumentId  int NOT NULL REFERENCES Document (DocumentId)
   ,AttributeId int NOT NULL REFERENCES DocumentAttributes (AttributeId)
   ,Value       nvarchar(400) NOT NULL
   ,PRIMARY KEY (DocumentId, AttributeId)  -- compound primary key
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Document and DocumentAttributeValues tables will ever be large. An index on (AttributeId, Value) facilitates lookups by attribute type, and depending on cardinality, an index on (Value, AttributeId) could make searches for specific attributes quite efficient.
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT d.DocumentId
from Document d
inner join DocumentAttributeValues dav1
  on dav1.DocumentId = d.DocumentId
 and dav1.Value = 'John'
inner join DocumentAttributes da1
  on da1.AttributeId = dav1.AttributeId
 and da1.Name = 'Name'
inner join DocumentAttributeValues dav2
  on dav2.DocumentId = d.DocumentId
 and dav2.Value = '5'
inner join DocumentAttributes da2
  on da2.AttributeId = dav2.AttributeId
 and da2.Name = 'CustomerId'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
While further refinements might be possible, the more attributes you’re filtering on, the uglier the queries will be. Five attributes max might work OK in SQL, but if you’ve got tons of attributes, a NoSQL solution might be what you’re looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
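Since the queries above are untested, here is a sketch that runs both styles against a small SQLite copy of the normalized tables through Python's sqlite3 module. In the top-down version every attribute value row is tied back to the document with DocumentId; without that condition the joins would not restrict which document a value belongs to. The sample rows are hypothetical and mirror the question's Name/CustomerId data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Document (DocumentId INTEGER PRIMARY KEY);
    CREATE TABLE DocumentAttributes (AttributeId INTEGER PRIMARY KEY,
                                     Name TEXT NOT NULL UNIQUE);
    CREATE TABLE DocumentAttributeValues (
        DocumentId  INTEGER NOT NULL REFERENCES Document (DocumentId),
        AttributeId INTEGER NOT NULL REFERENCES DocumentAttributes (AttributeId),
        Value       TEXT NOT NULL,
        PRIMARY KEY (DocumentId, AttributeId));
    CREATE INDEX ix_dav_attr_value ON DocumentAttributeValues (AttributeId, Value);
    INSERT INTO Document VALUES (1), (2), (3);
    INSERT INTO DocumentAttributes VALUES (1, 'Name'), (2, 'CustomerId');
    INSERT INTO DocumentAttributeValues VALUES
        (1, 1, 'John'), (1, 2, '2'),
        (2, 1, 'John'), (2, 2, '5'),
        (3, 1, 'Able'), (3, 2, '10');
""")

TOP_DOWN = """
    SELECT d.DocumentId
    FROM Document d
    JOIN DocumentAttributeValues dav1 ON dav1.DocumentId = d.DocumentId AND dav1.Value = 'John'
    JOIN DocumentAttributes da1 ON da1.AttributeId = dav1.AttributeId AND da1.Name = 'Name'
    JOIN DocumentAttributeValues dav2 ON dav2.DocumentId = d.DocumentId AND dav2.Value = '5'
    JOIN DocumentAttributes da2 ON da2.AttributeId = dav2.AttributeId AND da2.Name = 'CustomerId'
"""
BOTTOM_UP = """
    SELECT xx.DocumentId
    FROM (SELECT dav.DocumentId FROM DocumentAttributes da
          JOIN DocumentAttributeValues dav ON dav.AttributeId = da.AttributeId
          WHERE da.Name = 'Name' AND dav.Value = 'John'
          UNION ALL
          SELECT dav.DocumentId FROM DocumentAttributes da
          JOIN DocumentAttributeValues dav ON dav.AttributeId = da.AttributeId
          WHERE da.Name = 'CustomerId' AND dav.Value = '5') xx
    GROUP BY xx.DocumentId
    HAVING COUNT(*) = 2
"""
top_down = sorted(conn.execute(TOP_DOWN).fetchall())
bottom_up = sorted(conn.execute(BOTTOM_UP).fetchall())
print(top_down, bottom_up)
```

Comparing the two result sets on real data (and their execution plans on SQL Server) is a cheap way to check which style suits your workload.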
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns which allow you to define hundreds of columns even if only a subset are used at a time
Column Sets allow you to group these columns and treat them as a group
Filtered indexes can index only the rows that actually have values in them.
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
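Sparse columns and column sets are SQL Server specific, but the filtered-index idea can be tried in SQLite, where it is called a partial index. The sketch below (Python's sqlite3 module; the table DocumentMeta and its rows are hypothetical) indexes only rows that actually have a CustomerId and checks with EXPLAIN QUERY PLAN that an equality search can use that index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wide table: for many document types most metadata columns stay NULL
conn.execute("""CREATE TABLE DocumentMeta (
    DocumentId INTEGER PRIMARY KEY,
    Name TEXT, CustomerId INTEGER, City TEXT)""")
# Partial index: only rows that actually have a CustomerId are indexed
# (SQLite's counterpart to a SQL Server filtered index)
conn.execute("CREATE INDEX ix_meta_customer ON DocumentMeta (CustomerId) "
             "WHERE CustomerId IS NOT NULL")
conn.executemany("INSERT INTO DocumentMeta VALUES (?, ?, ?, ?)", [
    (1, "John", 2, "London"), (2, "John", 5, "New York"),
    (3, "Able", 10, "Paris"), (4, "Report", None, None)])

query = "SELECT DocumentId FROM DocumentMeta WHERE CustomerId = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (5,)).fetchall()
matches = conn.execute(query, (5,)).fetchall()
print(matches)
print([row[-1] for row in plan])  # last column holds the plan description
```

The index is usable because an equality test on CustomerId implies CustomerId IS NOT NULL, so the rows without a value (document 4) never need to be stored in it.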
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others, and SQL Server's Database Engine Tuning Advisor can propose indexes and statistics based on a workload trace captured with SQL Server Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full-Text Search. This will allow you to execute queries using arbitrary attributes. It is another technique used by document/content management systems, ERPs and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified select) and administration (no worries about column order in indexes, only one FTS index to manage)
Let's say you have a web application that manages books for book sellers, and it is built on a multi-tenant database with a single books table that contains books from several book sellers.
Now let's say that each book seller really wants each of their books to have a unique number associated with it so they can look books up by that number, but it's important to them that the number is roughly consecutive for them. (It's OK if there are small breaks in the sequence due to deleted books and other events that cause an AutoNumber to get consumed but not used).
Obviously each book already has a unique number (primary key) associated with it that is generated via AutoNumber and is unique across book sellers. That is not what I am discussing here.
Let's just assume SQL Server from here on, but the discussion applies equally to Oracle (except that Oracle uses sequences that are independent of tables, while the current version of SQL Server must use a table to accomplish the same thing).
We want a number that increments safely in the context of a book seller. We want to keep the benefits of using AutoNumber, but with one sequence per book seller. There seem to be two options, and neither is very good:
Create one single-column table per book seller. This scares me because I can't think of another example of dynamically changing the schema in a web request (adding a new table whenever a new book seller is added to the system via the web application). It also seems really heavyweight to have one table per book seller. I know a future version of SQL Server will support sequences, but even that would still be a schema change at run time.
Roll your own auto-numbering behavior. This seems really risky because databases' built-in AutoNumber features take care of a lot of stuff for you, and giving that up is a big deal. Attempts to re-implement it yourself are probably error-prone and may cause poorer concurrency than the built-in AutoNumber.
Hopefully there are additional options that I'm missing. Has anyone successfully dealt with a similar situation? Thanks.
Is there a reason you couldn't have a 2 field table with:
BookSeller_ID, BookID
You wouldn't need to change schema as you add sellers, and it would be trivial to track per seller:
SELECT MAX(BookID)
FROM ThisTable
WHERE BookSeller_ID = 123
For additional info you could also add a Universal_BookID field that linked to your unique ID referenced in the 3rd paragraph.
EDIT:
To clarify, if you have sellers 1 2 and 3 you could have a table like:
SellerID  BookID  BookUniversalID
1         1       123
2         1       456
3         1       999
1         2       1234
1         3       8798
1         4       999
1         5       10000001
3         2       123
3         3       456
You keep track of which seller has which IDs assigned and which actual book each one links to, and to determine the next ID for a seller, just query:
SELECT MAX(bookid) FROM ThisTable WHERE SellerID = 1
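Reading MAX and then inserting MAX + 1 as two separate steps can hand the same number to two concurrent requests, so the read and the insert should happen together under a lock. A minimal sketch with Python's sqlite3, where BEGIN IMMEDIATE takes SQLite's write lock before MAX is read and the compound primary key rejects any duplicate that slips through; the helper add_book is hypothetical. On SQL Server the usual equivalent is reading MAX inside the transaction with UPDLOCK, HOLDLOCK hints.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("""CREATE TABLE ThisTable (
    SellerID INTEGER NOT NULL,
    BookID INTEGER NOT NULL,
    BookUniversalID INTEGER NOT NULL,
    PRIMARY KEY (SellerID, BookID))""")  # rejects a duplicate number per seller

def add_book(conn, seller_id, universal_id):
    """Assign the seller's next BookID (MAX + 1) and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        conn.execute(
            "INSERT INTO ThisTable (SellerID, BookID, BookUniversalID) "
            "SELECT ?, COALESCE(MAX(BookID), 0) + 1, ? FROM ThisTable WHERE SellerID = ?",
            (seller_id, universal_id, seller_id))
        (book_id,) = conn.execute(
            "SELECT MAX(BookID) FROM ThisTable WHERE SellerID = ?",
            (seller_id,)).fetchone()
        conn.execute("COMMIT")
        return book_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

ids = [add_book(conn, seller, uid) for seller, uid in
       [(1, 123), (2, 456), (3, 999), (1, 1234), (1, 8798), (3, 1000)]]
print(ids)
```

COALESCE handles a seller's first book: with no rows, MAX returns NULL and the numbering starts at 1.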
DENSE_RANK works in both SQL Server and Oracle.
Assuming your table looks vaguely like this:
CREATE TABLE dbo.BOOKS
(
internal_book_id int identity(1,1) primary key
, seller_id int NOT NULL
, title varchar(50) NOT NULL
)
Whenever you present the identity value to the seller, use the dense_rank() function to generate the surrogate values.
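CREATE VIEW dbo.BOOK_TO_SELLER_MAP
AS
SELECT
    B.*
,   DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
    dbo.BOOKS B
-- A view cannot take parameters, so filter when querying it, e.g.:
-- SELECT * FROM dbo.BOOK_TO_SELLER_MAP WHERE seller_id = @sellerId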
For the combination of seller_id and the generated id, you ought to always match back to the true id (assuming no physical deletes).
Demo code
;
WITH BOOKS (internal_book_id, seller_id, title)
AS
(
SELECT 1, 100, 'Secret of NIMH'
UNION ALL SELECT 2, 400, 'Once and Future King'
UNION ALL SELECT 7, 88, 'Microsoft SQL Server 2008'
UNION ALL SELECT 8, 100, 'Bonfire of the Vanities'
UNION ALL SELECT 9, 100, 'Canary Row'
UNION ALL SELECT 10, 400, '1916'
UNION ALL SELECT 11, 100, 'The Picture of Dorian Gray'
UNION ALL SELECT 12, 88, 'The Disasters of War'
)
, BOOK_TO_SELLER_MAP AS
(
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
BOOKS B
)
SELECT
*
FROM
BOOK_TO_SELLER_MAP V
ORDER BY
V.seller_id
, V.unique_book_id_for_seller
Results
internal_book_id  seller_id  title                       unique_book_id_for_seller
7                 88         Microsoft SQL Server 2008   1
12                88         The Disasters of War        2
-------------------------------------------------------------------------------
1                 100        Secret of NIMH              1
8                 100        Bonfire of the Vanities     2
9                 100        Canary Row                  3
11                100        The Picture of Dorian Gray  4
-------------------------------------------------------------------------------
2                 400        Once and Future King        1
10                400        1916                        2
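The "assuming no physical deletes" caveat matters because the numbers are recomputed on every query. This sketch (Python's sqlite3 module, which needs SQLite 3.25 or later for window functions) numbers seller 100's books with the same DENSE_RANK expression, deletes one book, and numbers them again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOKS (internal_book_id INTEGER PRIMARY KEY, "
             "seller_id INTEGER NOT NULL, title TEXT NOT NULL)")
conn.executemany("INSERT INTO BOOKS VALUES (?, ?, ?)", [
    (1, 100, "Secret of NIMH"), (8, 100, "Bonfire of the Vanities"),
    (9, 100, "Canary Row"), (11, 100, "The Picture of Dorian Gray")])

MAP_QUERY = """
    SELECT internal_book_id,
           DENSE_RANK() OVER (PARTITION BY seller_id ORDER BY internal_book_id)
               AS unique_book_id_for_seller
    FROM BOOKS
    WHERE seller_id = ?
    ORDER BY internal_book_id
"""

before = dict(conn.execute(MAP_QUERY, (100,)).fetchall())
conn.execute("DELETE FROM BOOKS WHERE internal_book_id = 8")  # a physical delete
after = dict(conn.execute(MAP_QUERY, (100,)).fetchall())
print(before)
print(after)
```

After the delete, the books that were numbered 3 and 4 become 2 and 3, so if sellers store or print these numbers, soft deletes (or a stored per-seller number) are needed.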
OMG Ponies is correct that sequences are the only correct way to achieve this. There isn't really another viable option.
This will be a long question so I'll try and explain it as best as I can.
I've developed a simple reporting tool in which a number of results are stored and given a report ID. These results were generated from a particular quote being used on the main system, and a huge list of these quotes is stored in a quotes table. Here is the current batch:
REPORTS
REP_ID  DESC       QUOTE_ID
-----------------------------------
1       Test       1
2       Today      1
3       Last Week  2
RESULTS
RES_ID  TITLE       REFERENCE  REP_ID
---------------------------------------------------
1       Equipment   Toby       1
2       Inventory   Carl       1
3       Stocks      Guest      2
4       Portfolios  Guest      3
QUOTE
QUOTE_ID  QUOTE
------------------------------------
1         Booking a meeting room
2         Car Park Policy
3         New User Guide
So far, so good, a simple stored procedure was able to pull all the information necessary.
Now, the feature list has been upped to include categories and groups of the quotes. In the Reports table quote_id has been changed to group_id to link to the following tables.
REPORTS
- REPORT_ID
- DESC
- GROUP_ID
GROUP
- GROUP_ID
- GROUP
GROUP_CAT_JOIN
- GCJ_ID
- CAT_ID
- GROUP_ID
CATEGORIES
- CAT_ID
- CATEGORY
CAT_QUOTE_JOIN
- CQJ_ID
- CAT_ID
- QUOTE_ID
The idea behind these changes is that instead of running a report on a quote, I should now run a report on a group, where a group is a set of quotes for certain occasions. I should also be able to run a report on a category, where a category is also a set of quotes, for certain departments. The trick is that several categories can fall into one group.
To explain it further, the results table has a report_id that links to reports, reports has a group_id that links to groups, groups and categories are linked through a group_cat_join table, the same with categories and quotes through a cat_quote_join table.
In basic terms I should be able to pull all the results from either a group of quotes or a category of quotes. The query will aim to pull all the results from a certain report under either a certain category, a group or both. This puzzle has left me stumped for days now as inner joins don't appear to be working and I'm struggling to find other ways to solve the problem using SQL.
Can anyone here help me?
Here's some extra clarification.
I want to be able to return all the results within a category, but as of right now the solution below and the ones I've tried always output every solution within a description, which is not what I want.
Here's an example of the data I have in there at the moment
Results
RES_ID  TITLE       REFERENCE  REP_ID
---------------------------------------------------
1       Equipment   Toby       1
2       Inventory   Carl       1
3       Stocks      Guest      2
4       Portfolios  Guest      3
Reports
REP_ID  DESC       GROUP_ID
-----------------------------------
1       Test       1
2       Today      1
3       Last Week  2
GROUP
GROUP_ID  GROUP
---------------------------------
1         Standard
2         Target Week
GROUP_CAT_JOIN
GCJ_ID  GROUP_ID  CAT_ID
----------------------------------
1       1         1
2       1         2
3       2         3
CATEGORIES
CAT_ID  CAT
-------------------------------
1       York Office
2       Glasgow Office
3       Aberdeen Office
CAT_QUOTE_JOIN
CQJ_ID  CAT_ID  QUOTE_ID
-----------------------------------
1       1       1
2       2       2
3       3       3
QUOTE
QUOTE_ID  QUOTE
------------------------------------
1         Booking a meeting room
2         Car Park Policy
3         New User Guide
This is the test data I am using at the moment and to my knowledge it is similar to what will be run through once this is done. In all honesty I'm still trying to get my head around this structure.
The result I am looking for is if I choose to search by group I'll get everything within a group, if I choose everything inside a category I get everything just inside that category, and if I choose something from a category in a group I get everything inside that category. The problem at the moment is that whenever the group is referenced everything inside every category that's linked to the group is pulled.
The following will get the necessary rows from the results:
select
    a.*
from
    results a
    inner join reports b on
        a.rep_id = b.rep_id
        and (-1 = @GroupID or
             b.group_id = @GroupID)
        and (-1 = @CatID or
             b.cat_id = @CatID)
Note that I used -1 as the placeholder for all Groups and Categories. Obviously, use a value that makes sense to you. However, this way, you can specify a specific group_id or a specific cat_id and get the results that you want.
Additionally, if you want Group/Category/Quote details, you can always append more inner joins to get that info.
Also note that I added the Group_ID and Cat_ID conditions to the Reports table. This would be the SQL necessary if and only if you add a Cat_ID column to the Reports table. I know that your current table structure doesn't support this, but it needs to. Otherwise, as my grandfather used to say, "Boy, you can't get there from here." The issue here is that you want to limit reports by group and category, but reports only knows about group. Therefore, we need to tie something to the category from reports. Otherwise, it will never, ever, ever limit reports by category. The only thing that you can limit by both group and category is quotes. And that doesn't seem to be your requirement.
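Here is a sketch of the "-1 means all" filter in Python with sqlite3, assuming reports has gained the cat_id column described above. The cat_id values given to reports 1 to 3 are hypothetical (one category per report); the other rows come from the question's sample data, and results_for is an illustrative helper.

```python
import sqlite3

ALL = -1  # placeholder meaning "no filter", as in the answer

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (rep_id INTEGER PRIMARY KEY, "desc" TEXT,
                          group_id INTEGER, cat_id INTEGER);
    CREATE TABLE results (res_id INTEGER PRIMARY KEY, title TEXT,
                          reference TEXT, rep_id INTEGER);
    -- cat_id values are hypothetical
    INSERT INTO reports VALUES (1, 'Test', 1, 1), (2, 'Today', 1, 2),
                               (3, 'Last Week', 2, 3);
    INSERT INTO results VALUES (1, 'Equipment', 'Toby', 1), (2, 'Inventory', 'Carl', 1),
                               (3, 'Stocks', 'Guest', 2), (4, 'Portfolios', 'Guest', 3);
""")

def results_for(conn, group_id=ALL, cat_id=ALL):
    return [row[0] for row in conn.execute("""
        SELECT a.res_id
        FROM results a
        JOIN reports b
          ON a.rep_id = b.rep_id
         AND (:group_id = -1 OR b.group_id = :group_id)
         AND (:cat_id = -1 OR b.cat_id = :cat_id)
        ORDER BY a.res_id""", {"group_id": group_id, "cat_id": cat_id})]

print(results_for(conn))                        # everything
print(results_for(conn, group_id=1))            # one group
print(results_for(conn, cat_id=1))              # one category
print(results_for(conn, group_id=1, cat_id=2))  # a category within a group
```

Because each report now carries its own category, filtering by category no longer pulls in every result from the other categories of the same group.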
As an addendum: If you add cat_id to results instead of reports, the join condition should be:
and (-1 = @CatID or
     a.cat_id = @CatID)
Is this what you are looking for?
SELECT a.*
FROM Results a
JOIN Reports b ON a.REP_ID = b.REP_ID
-- Assumes Reports still has a QUOTE_ID column, as in the original schema
WHERE EXISTS (
    SELECT * FROM CAT_QUOTE_JOIN c
    WHERE c.QUOTE_ID = b.QUOTE_ID   -- correlation to the outer query
      AND c.CAT_ID = @CAT_ID        -- parameterization
)
OR EXISTS (
    -- note that subquery table aliases are not visible to other subqueries
    -- so we can reuse the same letters
    SELECT * FROM CAT_QUOTE_JOIN c, GROUP_CAT_JOIN d
    WHERE c.CAT_ID = d.CAT_ID       -- subquery join
      AND c.QUOTE_ID = b.QUOTE_ID   -- correlation to the outer query
      AND d.GROUP_ID = @GROUP_ID    -- parameterization
)