Explaining row and column dependencies - database

This is a simple and common scenario at work, and I'd appreciate some input.
Say I am generating a report for the owners of a pet show, and they want to know which of their customers have bought how many of each pet. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.
As the shop owner, I might expect reports in the form:
Customer Dog Cat Rabbit
1 2 3 0
2 0 1 1
3 1 2 0
4 0 0 1
And if one day I decided to stock Goldfish then the report should now come out as.
Customer Dog Cat Rabbit Goldfish
1 2 3 0 0
2 0 1 1 0
3 1 2 0 0
4 0 0 1 0
5 0 0 0 1
But as you probably know, to have a query which works this way would involve some form of dynamic code generation and would be harder to do.
The simplest query would work along the lines of:
Cross join Customers and Pets, Outer join Sales, Group, etc.
and generate:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
...etc
a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.
b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?

The name of the concept is 'cross-tab' and can be accomplished in several ways.
MS Access has proprietary extensions to SQL to make this happen. SQL pre-2k5 has a CASE trick and 2k5 and later has PIVOT, but I think you still need to know what the columns will be.

Some databases indeed support some way of creating cross tables, but I think most need to know
the columns in advance, so you'd have to modify the SQL (and get a database that supports such an extension).
Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.
And about a way to explain the problem... you could show them in an Excel how many steps are needed to get the desired result:
Source data (your second listing).
Select values from the pets column
Place each pet type found on a new column
Count values per each type per client
Fill the values
and then say that SQL gives you only the source data, so it's of course more work.

This concept is called pivoting
SQL assumes that your data is represented in terms of relations with fixed structure.
Like, equality is a binary relation, "customer has this many pets of this type" is a ternary relation and so on.
When you see this resultset:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
, it's actually a relation defined by all possible combinations of domain values being in this relation.
Like, a customer 1 (domain customers id's) has exactly 2 (domain positive numbers) pets of genus dog (domain pets).
We don't see rows like these in the resultset:
Customer Pet Quantity
1 Dog 3
Pete Wife 0.67
, because the first row is false (customer 1 doesn't have 3 items of dog, but 2), and the second row values are out of their domain scopes.
SQL paradigma implies that your relations are defined when you issue a query and each row returned defines the relation completely.
SQL Server 2005+ can map rows into columns (that is what you want), but you should know the number of columns when designing the query (not running).
As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.

I have always called this pivoting, but that may not be the formal name.
Whatever it's called you can do almost all of this in plain SQL.
SELECT customer, count(*), sum(CASE WHEN pet='dog' THEN 1 ELSE 0 END) as dog, sum(case WHEN pet='cat' THEN 1 ELSE 0 END) as cast FROM customers join pets
Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for a list of pets. The query is built into a string then that string is used to create a prepared statement.

Related

How to model arbitrarily ordering items in database?

I accepted a new feature to re-order some items by using Drag-and-Drop UI and save the preference for each user to the database. What's the best way to do so?
After reading some questions on StackOverflow, I found this solution.
Solution 1: Use decimal numbers to indicate order
For example,
id item order
1 a 1
2 b 2
3 c 3
4 d 4
If I insert item 4 between item 1 and 2, the order becomes,
id item order
1 a 1
4 d 1.5
2 b 2
3 c 3
In this way, every new order = order[i-1] + order[i+1] / 2
If I need to save the preference for every user, then I need to another relationship table like this,
user_id item_id order
1 1 1
1 2 2
1 3 3
1 4 1.5
I need num_of_users * num_of_items records to save this preference.
However, there's a solution I can think of.
Solution 2: Save the order preference in a column in the User table
This is straightforward by adding a column in the User table to record the order. Each value would be parsed as an array of item_ids that ranked by the index of the array.
user_id . item_order
1 [1,4,2,3]
2 [1,2,3,4]
Is there any limitation of this solution? Or is there any other ways to solve this problem?
Usually, an explicit ordering deals with the presentation or some specific processing of data. Hence, it's a good idea to separate entities of theirs presentation/processing. For example
users
-----
user_id (PK)
user_login
...
user_lists
----------
list_id, user_id (PK)
item_index
item_index can be a simply integer value :
ordered continuously (1,2...N): DELETE/INSERT of the whole list are normally required to change the order
ordered discretely with some seed (10,20...N): you can insert new items without reordering the whole list
Another reason to separate entity data and lists: reordering lists should be done in transaction that may lead to row/table locks. In case of separated tables only data in list table is impacted.

yet another tsql question

i have three tables
documents
attributes
attributevalues
documents can have many attributes
and these atributes have value in attributevalue table
what i want in single query get all documents and assigned atributes of relevant documents in row each row
(i assume every documents have same attributes assigned dont need complexity of diffrent attribues now)
for example
docid attvalue1 attvalue2
1 2 2
2 2 2
3 1 1
how can i do that in single query
Off the top if my head, I don't think you can do this without dynamic SQL.
The crux of the Entity-Attribute-Value (EAV) technique (which is what you are using) is to store columns as rows. What you want to do is convert those rows back to columns for the purpose of this query. Using PIVOT makes this possible. However, PIVOT requires knowing the number of rows that need to be converted to columns at the time the query is written. So assuming you are using EAV because you need flexible attributes/values, you won't know this information when you write the query.
So the solution would be to use dynamic SQL in conjunction with PIVOT. Did a quick search and this looks promising (didn't really read the whole thing):
http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx
For the record, I am not a fan of dynamic SQL and would recommend finding another approach to the larger problem (e.g. pivoting in application code).
If you know all the attributes (and their IDs) at design-time:
SELECT d.docid,
a1.attvalue AS attvalue1
a2.attvalue AS attvalue2
FROM documents d
JOIN attributevalues a1 ON d.docid = a1.docid
JOIN attributevalues a2 ON d.docid = a2.docid
WHERE a1.attrid = 1
AND a2.attrid = 2
If you don't, things get quite a bit messier and difficult to answer without knowing your schema.
lets make example
documents table's columns
docid,docname,createddate,createduser
and values
1 account.doc 10.10.2010 aeon
2 hr.doc 10.11.2010 aeon
atributes table's columns
attid,name,type
and values
1 subject string
2 recursive int
attributevalues table's columns
attvalueid,docid,attid,attvalue(sql_variant)
and values
1 1 1 "accounting doc"
1 1 2 0
1 2 1 "humen r doc"
1 2 2 1
and I want query result
docid,name,atribvalue1,atribvalue1,atribvalueN
1 account.doc "accounting doc" 0
2 hr.doc "humen r doc" 1

Query a Lookup Table in Excel or Access

I've got a mental block about what I'm sure is a common scenario:
I have some data in a csv file that I need to do some very basic reporting from.
The data is essentially a table with Resources as column headings and People as row headings, the rest of the table consists of Y/N flag, "Y" if the person has access to the resource, "N" if they don't. Both the resources and the people have unique names.
Sample data:
Res1 Res2 Res3
Bob Y Y N
Tom N N N
Jim Y N Y
The table is too large to simply view it as whole in Excel(say 300 resources and 600 people), so I need a way to easily query and display (A simple list would be ok) what resources a person has access to, given the person's name.
The person that will need to use this has MS Office, and not much else on their PC.
So, the question is: What is the best way to manipulate this data to get the report I need? My gut says MS Access would be the best, but I can't figure out to automatically import data like this into a normal relational database. If not Access, perhaps there are some functions in Excel that could help me out?
You should normalize your data. This will make it easier to query against. For example:
table users:
UserID UserName
1 Bob
2 Tim
3 Jim
table resources:
ResourceID ResourceDesc
1 Printer #1
2 Fax Machine
3 Bowling Ball Wax
table users_resources:
LinkID UserID ResourceID
1 1 1
2 1 2
3 3 1
4 3 3
SELECT ResourceID
FROM users_resources, users
WHERE users.UserName="Bob"

How do I create nested categories in a Database?

I am making a videos website where categories will be nested:
e.g. Programming-> C Language - > MIT Videos -> Video 1
Programming -> C Language -> Stanford Video - > Video 1
Programming -> Python -> Video 1
These categories and sub-categories will be created by users on the fly. I will need to show them as people create them in the form of a navigable menu, so that people can browse the collection easily.
Could someone please help me with how I can go about creating such a database?
Make a categories table with the following fields:
CategoryID - Integer
CategoryName - String/Varchar/Whatever
ParentID - Integer
Your ParentID will then reference back to the CategoryID of its parent.
Example:
CategoryID CategoryName ParentID
---------------------------------
1 Dog NULL
2 Cat NULL
3 Poodle 1
4 Dachsund 1
5 Persian 2
6 Toy Poodle 3
Quassnoi said :
You should use either nested sets or parent-child models.
I used to implement both of them. What I could say is:
Use the nested set architecture if your categories table doesn't change often, because on a select clause it's fast and with only one request you can get the whole branch of the hierarchy for a given entry. But on a insert or update clause it takes more time than a parent child model to update the left and right (or lower and upper in the example below) fields.
Another point, quite trivial I must admit, but:
It's very difficult to change the hierarchy by hand directly in the database (It could happen during the development). So, be sure to implement first an interface to play with the nested set (changing parent node, move a branch node, deleting a node or the whole branch etc.)
Here are two articles on the subject:
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
Last thing, I didn't try it, but I read somewhere that you can have more than one tree in a nested set table, I mean several roots.
You should use either nested sets or parent-child models.
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
See this answer for more detail.
Nested sets is more easy to query, but it's harder to update and harder to build a tree structure.
From the example in your question it looks like you'd want it to be possible for a given category to have multiple parents (e.g., "MIT Videos -> Video 1 Programming" as well as "Video -> Video 1 Programming"), in which case simply adding a ParentID column would not be sufficient.
I would recommend creating two tables: a simple Categories table with CategoryID and CategoryName columns, and a separate CategoryRelationships table with ParentCategoryID and ChildCategoryID columns. This way you can specify as many parent-child relationships as you want for any particular category. It would even be possible using this model to have a dual relationship where two categories are each other's parent and child simultaneously. (Off the top of my head, I can't think of a great use for this scenario, but at least it illustrates how flexible the model is.)
What you need is a basic parent-child relationship:
Category (ID: int, ParentID: nullable int, Name: nvarchar(1000))
A better way to store the parent_id of the table is to have it nested within the ID
e.g
100000 Programming
110000 C Language
111000 Video 1 Programming
111100 C Language
111110 Stanford Video
etc..so all you need it a script to process the ID such that the first digit represents the top level category and so on as you go deeper down the hierarchy

GROUP_CONCAT and DISTINCT are great, but how do i get rid of these duplicates i still have?

i have a mysql table set up like so:
id uid keywords
-- --- ---
1 20 corporate
2 20 corporate,business,strategy
3 20 corporate,bowser
4 20 flowers
5 20 battleship,corporate,dungeon
what i WANT my output to look like is:
20 corporate,business,strategy,bowser,flowers,battleship,dungeon
but the closest i've gotten is:
SELECT DISTINCT uid, GROUP_CONCAT(DISTINCT keywords ORDER BY keywords DESC) AS keywords
FROM mytable
WHERE uid !=0
GROUP BY uid
which outputs:
20 corporate,corporate,business,strategy,corporate,bowser,flowers,battleship,corporate,dungeon
does anyone have a solution? thanks a ton in advance!
What you're doing isn't possible with pure SQL the way you have your data structured.
No SQL implementation is going to look at "Corporate" and "Corporate, Business" and see them as equal strings. Therefore, distinct won't work.
If you can control the database,
The first thing I would do is change the data setup to be:
id uid keyword <- note, not keyword**s** - **ONE** value in this column, not a comma delimited list
1 20 corporate
2 20 corporate
2 20 business
2 20 strategy
Better yet would be
id uid keywordId
1 20 1
2 20 1
2 20 2
2 20 3
with a seperate table for keywords
KeywordID KeywordText
1 Corporate
2 Business
Otherwise you'll need to massage the data in code.
Mmm, your keywords need to be in their own table (one record per keyword). Then you'll be able to do it, because the keywords will then GROUP properly.
Not sure if MySql has this, but SQL Server has a RANK() OVER PARTITION BY that you can use to assign each result a rank...doing so would allow you to only select those of Rank 1, and discard the rest.
You have two options as I see it.
Option 1:
Change the way your store your data (keywords in their own table, join the existing table with the keywords table using a many-to-many relationship). This will allow you to use DISTINCT. DISTINCT doesn't work currently because the query sees "corporate" and "corporate,business,strategy" as two different values.
Option 2:
Write some 'interesting' sql to split up the keywords strings. I don't know what the limits are in MySQL, but SQL in general is not designed for this.

Resources