Preventing aggregation along dimensions' attributes - sql-server

Say I have this schema (sorry for the slightly convoluted example):
CREATE TABLE Sales
(
ID INT PRIMARY KEY,
Shop NVARCHAR(MAX),
ShopLocationLeft NVARCHAR(MAX),
ShopLocationRight NVARCHAR(MAX),
Amount DECIMAL
)
INSERT INTO Sales VALUES
(1, 'Shop #1', 'New', 'York', 10000),
(2, 'Shop #2', 'New', 'Delhi', 1000),
(3, 'Shop #3', 'North', 'York', 5000)
Then I create a cube with a Shop dimension with 3 attributes:
Name (column Shop)
Location Left (column ShopLocationLeft)
Location Right (column ShopLocationRight)
I can explore the cube along this dimension:
SELECT
[Amount] ON COLUMNS,
[Shop].[Name].Children ON ROWS
FROM
[Sales]
To get:
Amount
Shop #1 10000
Shop #2 1000
Shop #3 5000
So far so good.
But using other attributes like Location Left:
SELECT
[Amount] ON COLUMNS,
[Shop].[Location Left].Children ON ROWS
FROM
[Sales]
We get:
Amount
New 11000
North 5000
So the cube is allowing exploration and aggregation 1 level deeper than the dimension, along the attributes, making them some kind of sub-dimensions.
Which in this case has no business meaning.
I was expecting that, like an SQL SELECT, this would display the Location Left column instead:
Amount
New 10000
New 1000
North 5000
Because for me this dimension has 3 points:
('Shop #1', 'New', 'York')
('Shop #2', 'New', 'Delhi')
('Shop #3', 'North', 'York')
Which should be considered atomic entities that can't be broken down further.
I understand that this behavior can be useful (e.g. for first and last name) but in this case it does not make any sense.
Or if I had defined an n-levels hierarchy for an attribute (e.g. country -> city -> location) it would be logical too as I would have explicitly asked for a deeper exploration and aggregation.
How to prevent this behavior when it would lead to non relevant results?

If you have an attribute Location Left in your Shop dimension, you can choose ID as the Key column and Location Left as the Name column of this attribute (in the Dimension Structure tab, right-click the Location Left attribute, select Properties, then look for the KeyColumn and NameColumn properties). If you do this, you will see 'New' displayed multiple times in the results.
If you instead choose Location Left as both the Key column and the Name column, you will see only one entry per Location Left name.
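As a rough relational analogy (SSAS itself is configured in the designer, not in code), the key column plays the role of the GROUP BY column. The sketch below uses Python's standard sqlite3 module and the Sales rows from the question; the variable names by_name and by_id are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Sales (
    ID INTEGER PRIMARY KEY, Shop TEXT,
    ShopLocationLeft TEXT, ShopLocationRight TEXT, Amount NUMERIC)""")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (1, "Shop #1", "New", "York", 10000),
    (2, "Shop #2", "New", "Delhi", 1000),
    (3, "Shop #3", "North", "York", 5000),
])

# Analogy for Key = Location Left: equal texts collapse into one member.
by_name = conn.execute(
    "SELECT ShopLocationLeft, SUM(Amount) FROM Sales "
    "GROUP BY ShopLocationLeft ORDER BY ShopLocationLeft").fetchall()

# Analogy for Key = ID, Name = Location Left: one member per row,
# so the label 'New' shows up twice.
by_id = conn.execute(
    "SELECT ShopLocationLeft, SUM(Amount) FROM Sales "
    "GROUP BY ID ORDER BY ID").fetchall()

print(by_name)
print(by_id)
```

The second result has the shape the question asked for: three rows, with 'New' repeated.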

Related

How to implement many-to-many-to-many database relationship?

I am building a SQLite database and am not sure how to proceed with this scenario.
I'll use a real-world example to explain what I need:
I have a list of products that are sold by many stores in various states. Not every Store sells a particular Product at all, and those that do may only sell it in one State or another. Most stores sell a product in most states, but not all.
For example, let's say I am trying to buy a vacuum cleaner in Hawaii. Joe's Hardware sells vacuums in 18 states, but not in Hawaii. Walmart sells vacuums in Hawaii, but not microwaves. Burger King does not sell vacuums at all, but will give me a Whopper anywhere in the US.
So if I am in Hawaii and search for a vacuum, I should only get Walmart as a result. While other stores may sell vacuums, and may sell in Hawaii, they don't do both but Walmart does.
How do I efficiently create this type of relationship in a relational database? (Specifically, I am currently using SQLite, but need to be able to convert to MySQL in the future.)
Obviously, I would need tables for Product, Store, and State, but I am at a loss on how to create and query the appropriate join tables...
If I, for example, query a certain Product, how would I determine which Store would sell it in a particular State, keeping in mind that Walmart may not sell vacuums in Hawaii, but they do sell tea there?
I understand the basics of 1:1, 1:n, and M:n relationships in RD, but I am not sure how to handle this complexity where there is a many-to-many-to-many situation.
If you could show some SQL statements (or DDL) that demonstrates this, I would be very grateful. Thank you!
An accepted and common way is to use a table that has one column referencing the product and another referencing the store. Such a table goes by many names: reference table, associative table, and mapping table, to name a few.
You want these references to be efficient, so reference by a number that uniquely identifies the row it points to. By default, an SQLite table has a special, normally hidden column that holds such a unique number. It is the rowid, and it is typically the most efficient way of accessing rows, as SQLite has been designed with this common usage in mind.
SQLite allows you to create one column per table that is an alias of the rowid: you simply declare the column as INTEGER PRIMARY KEY, and typically you'd name the column id.
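The rowid alias can be checked directly. This is a minimal sketch using Python's standard sqlite3 module; the table name demo_products is hypothetical and not part of the schema used in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is declared INTEGER PRIMARY KEY, so it becomes an alias of rowid.
conn.execute(
    "CREATE TABLE demo_products (id INTEGER PRIMARY KEY, productname TEXT)")
# No id is supplied; SQLite assigns the rowid automatically.
conn.executemany(
    "INSERT INTO demo_products (productname) VALUES (?)",
    [("thingummy",), ("Sky Hook",)])

rows = conn.execute(
    "SELECT rowid, id, productname FROM demo_products ORDER BY id").fetchall()
for rowid, id_, name in rows:
    print(rowid, id_, name)
```

Each row reports the same number for rowid and id, which is why an id column declared this way costs nothing extra to look up.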
So utilising these the reference table would have a column for the product's id and another for the store's id catering for every combination of product/store.
As an example, three tables are created (products, stores, and a reference/mapping table), the first two of which are then populated, using :-
CREATE TABLE IF NOT EXISTS _products(id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE IF NOT EXISTS _stores (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE IF NOT EXISTS _product_store_relationships (storereference INTEGER, productreference INTEGER);
INSERT INTO _products (productname,productcost) VALUES
('thingummy',25.30),
('Sky Hook',56.90),
('Tartan Paint',100.34),
('Spirit Level Bubbles - Large', 10.43),
('Spirit Level bubbles - Small',7.77)
;
INSERT INTO _stores (storename) VALUES
('Acme'),
('Shops-R-Them'),
('Harrods'),
('X-Mart')
;
The resultant _products and _stores tables hold the rows inserted above, while _product_store_relationships is still empty.
Placing products into stores (for example) could be done using :-
-- Build some relationships/references/mappings
-- Columns are named explicitly so that each pair reads (product id, store id)
INSERT INTO _product_store_relationships (productreference, storereference) VALUES
(2,2), -- Sky Hooks are in Shops-R-Them
(2,4), -- Sky Hooks in X-Mart
(1,3), -- thingummys in Harrods
(1,1), -- and Acme
(1,2), -- and Shops-R-Them
(4,4), -- Spirit Level Bubbles - Large in X-Mart
(5,4), -- Spirit Level bubbles - Small in X-Mart
(3,3) -- Tartan Paint in Harrods
;
The _product_store_relationships would then be :-
A query such as the following would list the products in stores sorted by store and then product :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
ORDER BY storename, productname
;
The resultant output being :-
This query lists only the rows whose product name contains an s or S (SQLite's LIKE is case-insensitive for ASCII characters by default), with the output sorted by productcost in ascending order, then storename, then productname :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE productname LIKE '%s%'
ORDER BY productcost,storename, productname
;
Output :-
Expanding the above to consider states requires 2 new tables, _states and _store_state_references.
Strictly, there is no real need for a reference table if a store is only ever in one state, but a reference table also copes with the case where a chain of stores is considered to be one store.
The SQL could be :-
CREATE TABLE IF NOT EXISTS _states (id INTEGER PRIMARY KEY, statename TEXT);
INSERT INTO _states (statename) VALUES
('Texas'),
('Ohio'),
('Alabama'),
('Queensland'),
('New South Wales')
;
CREATE TABLE IF NOT EXISTS _store_state_references (storereference, statereference);
INSERT INTO _store_state_references VALUES
(1,1),
(2,5),
(3,1),
(4,3)
;
If the following query were run :-
SELECT storename,productname,productcost,statename
FROM _stores
JOIN _store_state_references ON _stores.id = _store_state_references.storereference
JOIN _states ON _store_state_references.statereference =_states.id
JOIN _product_store_relationships ON _stores.id = _product_store_relationships.storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE statename = 'Texas' AND productname = 'Sky Hook'
;
The output would be :-
Without the WHERE clause :-
The following would make Shops-R-Them have a presence in all states :-
INSERT INTO _store_state_references VALUES
(2,1),(2,2),(2,3),(2,4)
;
Now the query for Sky Hooks in Texas results in :-
Note: this just covers the basics of the topic.
You will need to create a combined mapping table for products, states, and stores, e.g. tbl_product_states_stores, which stores each valid combination of product, state, and store. The columns will be id, product_id, state_id, stores_id.
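A minimal sketch of that three-way mapping table follows, using Python's standard sqlite3 module. The product, store, and state tables, their sample rows, and the sellers helper are assumptions made up for illustration; only tbl_product_states_stores and its columns come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE state   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (product, state, store) combination that actually exists.
CREATE TABLE tbl_product_states_stores (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    state_id   INTEGER NOT NULL REFERENCES state(id),
    stores_id  INTEGER NOT NULL REFERENCES store(id),
    UNIQUE (product_id, state_id, stores_id)
);
INSERT INTO product VALUES (1, 'vacuum'), (2, 'tea');
INSERT INTO store VALUES (1, 'Walmart'), (2, 'Joe''s Hardware');
INSERT INTO state VALUES (1, 'Hawaii'), (2, 'Ohio');
INSERT INTO tbl_product_states_stores (product_id, state_id, stores_id) VALUES
    (1, 1, 1),  -- Walmart sells vacuums in Hawaii
    (2, 1, 1),  -- Walmart sells tea in Hawaii
    (1, 2, 2);  -- Joe's Hardware sells vacuums in Ohio
""")

def sellers(product_name, state_name):
    """Return the names of stores selling the product in the state."""
    rows = conn.execute("""
        SELECT store.name
        FROM tbl_product_states_stores AS m
        JOIN product ON product.id = m.product_id
        JOIN state   ON state.id   = m.state_id
        JOIN store   ON store.id   = m.stores_id
        WHERE product.name = ? AND state.name = ?
        ORDER BY store.name
    """, (product_name, state_name)).fetchall()
    return [name for (name,) in rows]

print(sellers("vacuum", "Hawaii"))
```

Because each row records all three ids at once, a store that sells vacuums somewhere and sells something in Hawaii is not reported unless it sells vacuums in Hawaii.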

Best way to use compound Index to query with multiple combination of query parameters?

I am building functionality to estimate inventory for my ad-serving platform. The fields I am trying to estimate on, with their cardinalities, are as below:
FIELD: CARDINALITY
location: 10000 (bengaluru, chennai etc..)
n/w speed : 6 (w, 4G, 3G, 2G, G, NA)
priceRange : 10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
users: contains number of users falling under any of the above combination.
Ex. {'location':'bengaluru', 'n/w':'4G', priceRange:8, users: 1000}
means 1000 users are from bengaluru having 4G and priceRange = 8
So the total number of combinations can be 10000 * 6 * 10 = 600000, and in future more fields can be added, up to around 29 (currently there are 3: location, n/w, priceRange), so the total can reach the order of 10mn. Now I want to estimate how many users fall under a given combination.
Now queries I will need are as follows:
1) find all users who are from location:bengaluru , n/w:3G, priceRange: 6
2) find all users from bengaluru
3) Find all users falling under n/w: 3G and priceRange: 8
What is the best possible way to approach to this?
Which database is best suited for this requirement? What indexes do I need to build? Will a compound index help? If yes, then how? Any help is appreciated.
Here's my final answer:
Create table Attribute(
ID int,
Name varchar(50));
Create table AttributeValue(
ID int,
AttributeID int,
Value varchar(50));
Create table userAttributeValue(
userID int,
AttributeID int,
AttributeValue varchar(50));
Create table User(
ID int);
Insert into user (ID) values (1),(2),(3),(4),(5);
Insert into Attribute (ID,Name) Values (1,'Location'),(2,'nwSpeed'),(3,'PriceRange');
Insert into AttributeValue values
(1,1,'bengaluru'),(2,1,'chennai'),
(3,2, 'w'), (4, 2,'4G'), (5,2,'3G'), (6,2,'2G'), (7,2,'G'), (8,2,'NA'),
(9,3,'1'), (10,3,'2'), (11,3,'3'), (12,3,'4'), (13,3,'5'), (14,3,'6'), (15,3,'7'), (16,3,'8'), (17,3,'9'), (18,3,'10');
Insert into UserAttributeValue (userID, AttributeID, AttributeValue) values
(1,1,1),
(1,2,5),
(1,3,9),
(2,1,1),
(2,2,4),
(3,2,6),
(2,3,13),
(4,1,1),
(4,2,4),
(4,3,13),
(5,1,1),
(5,2,5),
(5,3,13);
Select USERID
from UserAttributeValue
where (AttributeID,AttributeValue) in ((1,1),(2,4))
GROUP BY USERID
having count(distinct concat(AttributeID,AttributeValue))=2
Now, if you need a count, wrap userID in COUNT and divide by the number of attributes passed in: each matching user has one record per attribute, so to get the "count of users" you'd need to divide by the number of attributes.
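One alternative to the division, sketched here as an assumption rather than as the method above, is to keep the GROUP BY and count the grouped users. The sketch uses Python's standard sqlite3 module with the rows from the INSERT above; the integer column types, the composite index name idx_uav_attr_value_user, and the OR-based filter (swapped in for the row-value IN list for portability) are my own choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE UserAttributeValue (
    userID INTEGER, AttributeID INTEGER, AttributeValue INTEGER)""")
# Hypothetical composite index covering the filter columns, then the user.
conn.execute("""CREATE INDEX idx_uav_attr_value_user
    ON UserAttributeValue (AttributeID, AttributeValue, userID)""")
conn.executemany("INSERT INTO UserAttributeValue VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 2, 5), (1, 3, 9),
    (2, 1, 1), (2, 2, 4), (3, 2, 6), (2, 3, 13),
    (4, 1, 1), (4, 2, 4), (4, 3, 13),
    (5, 1, 1), (5, 2, 5), (5, 3, 13),
])

# Wanted (AttributeID, AttributeValue) pairs: Location = bengaluru, nwSpeed = 4G.
wanted = [(1, 1), (2, 4)]
where = " OR ".join("(AttributeID = ? AND AttributeValue = ?)" for _ in wanted)
params = [value for pair in wanted for value in pair] + [len(wanted)]

# A user qualifies only when every wanted attribute matched.
matching = [row[0] for row in conn.execute(
    f"SELECT userID FROM UserAttributeValue WHERE {where} "
    "GROUP BY userID HAVING COUNT(DISTINCT AttributeID) = ? "
    "ORDER BY userID", params)]
user_count = len(matching)
print(matching, user_count)
```

With the sample rows, users 2 and 4 are the only ones in bengaluru on 4G, so the count is 2 without any division.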
This allows for N growth of Attributes and the AttributeValues per user without changes to UI or database if UI is designed correctly.
By treating each datapoint as an attribute and storing them in one place we can enforce database integrity.
The Attribute and AttributeValue tables become lookups for UserAttributeValue, so you can translate the IDs back to the attribute name and the value.
This also means we only have 4 tables user, attribute, attributeValue, and UserAttributeValue.
Technically you don't have to store attributeID on the userAttributeValue, but for performance reasons on later joins/reporting I think you'll find it beneficial.
You need to add proper primary keys, foreign keys, and indexes to the tables. They should be fairly self-explanatory. On UserAttributeValue I would have a few composite indexes, each with a different order of the columns of the unique key. Which ones depends on the type of reporting/analysis you'll be doing, but adding keys as performance tuning is needed is commonplace.
Assumptions:
You're ok with all datavalues being varchar data in all cases.
If needed, you could add a datatype, precision, and scale on the attribute table and allow the UI to cast the attribute value as needed. But since the values are all in the same field in the database, they all have to be the same datatype and of the same precision/scale.
Pivot tables to display the data across will likely be needed, and you know how to handle those (and your engine supports them!)
Gotta say I loved the mental exercise, but I would still appreciate feedback from others on SO. I've used this approach in one system I've developed, and it's been in two I've supported. There are some challenges, but it does follow 3rd normal form db design (except for the replicated attributeID in UserAttributeValue, but that's there for performance gain in reporting/filtering).

Best way to extend information on a relational database

Let's say that we have to store information of different types of product in a database. However, these products have different specifications. For example:
Phone: cpu, ram, storage...
TV: size, resolution...
We want to store each specification in a column of a table, and all the products (whatever the type) must have a different ID.
To comply with that, now I have one general table named Products (with an auto increment ID) and one subordinate table for each type of product (ProductsPhones, ProductsTV...) with the specifications and linked with the principal with a Foreign Key.
I find this solution inefficient since the table Products has only one column (the auto incremented ID).
I would like to know if there is a better approach to solve this problem using relational databases.
The short answer is no. The relational model is a first-order logical model, meaning predicates can vary over entities but not over other predicates. That means dependent types and EAV models aren't supported.
EAV models are possible in SQL databases, but they don't qualify as relational since the domain of the value field in an EAV row depends on the value of the attribute field (and sometimes on the value of the entity field as well). Practically, EAV models tend to be inefficient to query and maintain.
PostgreSQL supports shared sequences which allows you to ensure unique auto-incremented IDs without a common supertype table. However, the supertype table may still be a good idea for FK constraints.
You may find some use for your Products table later to hold common attributes like Type, Serial number, Cost, Warranty duration, Number in stock, Warehouse, Supplier, etc...
Having Products table is fine. You can put there all the columns common across all types like product name, description, cost, price just to name some. So it's not just auto increment ID. Having an internal ID of type int or long int as the primary key is recommended. You may also add another field "code" or whatever you want to call it for user-entered or user-friendly which is common with product management systems. Make sure you index it if used in searching or query criteria.
HTH
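The supertype/subtype layout discussed in these answers can be sketched as follows, using Python's standard sqlite3 module. The table names Products, ProductsPhones, and ProductsTV come from the question; the specific columns and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: one row per product, holding the shared ID and common columns.
CREATE TABLE Products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    price REAL
);
-- Subtypes: the primary key is also the foreign key to Products.
CREATE TABLE ProductsPhones (
    product_id INTEGER PRIMARY KEY REFERENCES Products(id),
    cpu TEXT, ram_gb INTEGER, storage_gb INTEGER
);
CREATE TABLE ProductsTV (
    product_id INTEGER PRIMARY KEY REFERENCES Products(id),
    size_inches INTEGER, resolution TEXT
);
""")

phone_id = conn.execute(
    "INSERT INTO Products (type, name, price) VALUES ('phone', 'CoolPhone', 299.0)"
).lastrowid
conn.execute("INSERT INTO ProductsPhones VALUES (?, 'octa-core', 8, 128)",
             (phone_id,))

tv_id = conn.execute(
    "INSERT INTO Products (type, name, price) VALUES ('tv', 'CoolTV', 499.0)"
).lastrowid
conn.execute("INSERT INTO ProductsTV VALUES (?, 55, '3840x2160')", (tv_id,))

# Common columns come from Products, specifications from the subtype table.
phone = conn.execute("""
    SELECT p.id, p.name, p.price, ph.ram_gb
    FROM Products p JOIN ProductsPhones ph ON ph.product_id = p.id
""").fetchone()
print(phone_id, tv_id, phone)
```

Every product, whatever its type, draws its ID from the single Products table, so IDs never collide across types.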
While this can't be done completely relationally, you can still normalize your tables some and make it a little easier to code around.
You can have these tables:
-- what are the products?
Products (Id, ProductTypeId, Name)
-- what kind of product is it?
ProductTypes (Id, Name)
-- what attributes can a product have?
Attributes (Id, Name, ValueType)
-- what are the attributes that come with a specific product type?
ProductTypeAttributes (Id, ProductTypeId, AttributeId)
-- what are the values of the attributes for each product?
ProductAttributes (ProductId, ProductTypeAttributeId, Value)
So for a Phone and TV:
ProductTypes (1, Phone) -- a phone type of product
ProductTypes (2, TV) -- a tv type of product
Attributes (1, ScreenSize, integer) -- how big is the screen
Attributes (2, Has4G, boolean) -- does it get 4g?
Attributes (3, HasCoaxInput, boolean) -- does it have an input for coaxial cable?
ProductTypeAttributes (1, 1, 1) -- a phone has a screen size
ProductTypeAttributes (2, 1, 2) -- a phone can have 4g
-- a phone does not have coaxial input
ProductTypeAttributes (3, 2, 1) -- a tv has a screen size
ProductTypeAttributes (4, 2, 3) -- a tv can have coaxial input
-- a tv does not have 4g (simple example)
Products (1, 1, CoolPhone) -- product 1 is a phone called coolphone
Products (2, 1, AwesomePhone) -- prod 2 is a phone called awesomephone
Products (3, 2, CoolTV) -- prod 3 is a tv called cooltv
Products (4, 2, AwesomeTV) -- prod 4 is a tv called awesometv
ProductAttributes (1, 1, 6) -- coolphone has a 6 inch screen
ProductAttributes (1, 2, True) -- coolphone has 4g
ProductAttributes (2, 1, 4) -- awesomephone has a 4 inch screen
ProductAttributes (2, 2, False) -- awesomephone has NO 4g
ProductAttributes (3, 3, 70) -- cooltv has a 70 inch screen
ProductAttributes (3, 4, True) -- cooltv has coax input
ProductAttributes (4, 3, 19) -- awesometv has a 19 inch screen
ProductAttributes (4, 4, False) -- awesometv has NO coax input
The reason this is not fully relational is that you'll still need to evaluate the value type (bool, int, etc) of the attribute before you can use it in a meaningful way in your code.
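That last step, interpreting Value according to the attribute's ValueType, might look like the sketch below. It is plain Python with the sample rows above held in dictionaries; the cast_value and typed_attributes helpers are hypothetical names, and storing every value as text is an assumption.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, bool, str]

# Attributes: Id -> (Name, ValueType)
attributes: Dict[int, Tuple[str, str]] = {
    1: ("ScreenSize", "integer"),
    2: ("Has4G", "boolean"),
    3: ("HasCoaxInput", "boolean"),
}
# ProductTypeAttributes: Id -> AttributeId
product_type_attributes: Dict[int, int] = {1: 1, 2: 2, 3: 1, 4: 3}
# ProductAttributes: (ProductId, ProductTypeAttributeId, Value stored as text)
product_attributes: List[Tuple[int, int, str]] = [
    (1, 1, "6"), (1, 2, "True"),
    (3, 3, "70"), (3, 4, "True"),
]

def cast_value(raw: str, value_type: str) -> Value:
    """Convert the stored text into the type named by ValueType."""
    if value_type == "integer":
        return int(raw)
    if value_type == "boolean":
        return raw.strip().lower() == "true"
    return raw  # unknown types are left as text

def typed_attributes(product_id: int) -> Dict[str, Value]:
    """Collect one product's attributes with their values cast."""
    result: Dict[str, Value] = {}
    for pid, pta_id, raw in product_attributes:
        if pid == product_id:
            name, value_type = attributes[product_type_attributes[pta_id]]
            result[name] = cast_value(raw, value_type)
    return result

print(typed_attributes(1))
print(typed_attributes(3))
```

The database cannot enforce that "6" is an integer here; that check lives in the application code.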

how to make an easy graphical tool to combine database tables in vb.net?

We have to combine 3 distinct SQL databases into one. I've copied all tables into a single database, and now I have to match them together. The situation is this:
the database has 3 Tables. Call them TB1 TB2 TB3
each table has its own ID column
each table contains different information about the same item, except for the shelf property. for example, tb1 contains shelf, size and color, tb2 contains shelf, quantity and serial number, tb3 contains shelf, price and material
a shelf can contain multiple items. So same shelf does not mean same item. But a single item cannot be on 2 shelves.
the ID numbers of the tables do not match. so for example ID 30 of tb1 is not the same item as ID 30 on tb2.
An item present in one table MIGHT not be present in other tables.
each table contains about 1000 rows
What I need to do is to come up with a tool that allows the user to quickly create connections between tables. My current idea is to make a form with 3 DataGridViews next to one another, showing the 3 tables. When I select a row in the first DataGridView, the other two automatically scroll to the rows where the shelf number is the same (if there is one). The user selects one row in each table and hits the Save button, and the three ID numbers from the individual tables are saved into a new table.
But maybe there is a better solution to this. Maybe something graphical, easier to use than selecting single rows in each table?
Thanks
The lack of a common Primary Key across the tables makes this difficult - as I'm sure you discovered.
I'd try something like this and see how it looks in the DGV
SELECT 'tb1' AS SourceTable, ID, shelf, size, color, NULL AS QTY, NULL AS SN, NULL AS Price, NULL AS Material FROM TB1
UNION
SELECT 'tb2', ID, shelf, NULL, NULL, quantity, serial_number, NULL, NULL FROM TB2
UNION
SELECT 'tb3', ID, shelf, NULL, NULL, NULL, NULL, price, material FROM TB3
You might be able to add an ORDER BY ID, SourceTable to the bottom.
Each SELECT needs the same number of columns. Only the first SELECT is used for column headers.
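To try the idea without the real data, here is a sketch using Python's standard sqlite3 module. The column types and the four sample rows are invented for illustration, and it sorts by shelf (an assumption on my part) so that candidate matches from the three tables sit next to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TB1 (ID INTEGER PRIMARY KEY, shelf TEXT, size TEXT, color TEXT);
CREATE TABLE TB2 (ID INTEGER PRIMARY KEY, shelf TEXT, quantity INTEGER,
                  serial_number TEXT);
CREATE TABLE TB3 (ID INTEGER PRIMARY KEY, shelf TEXT, price REAL, material TEXT);
INSERT INTO TB1 VALUES (30, 'A1', 'M', 'red');
INSERT INTO TB2 VALUES (30, 'B7', 5, 'SN-001'), (41, 'A1', 2, 'SN-002');
INSERT INTO TB3 VALUES (12, 'A1', 9.5, 'steel');
""")

# Every SELECT pads the columns it lacks with NULL so the shapes line up.
cur = conn.execute("""
SELECT 'tb1' AS SourceTable, ID, shelf, size, color,
       NULL AS QTY, NULL AS SN, NULL AS Price, NULL AS Material FROM TB1
UNION ALL
SELECT 'tb2', ID, shelf, NULL, NULL, quantity, serial_number, NULL, NULL FROM TB2
UNION ALL
SELECT 'tb3', ID, shelf, NULL, NULL, NULL, NULL, price, material FROM TB3
ORDER BY shelf, SourceTable
""")
headers = [d[0] for d in cur.description]
rows = cur.fetchall()
print(headers)
for row in rows:
    print(row)
```

The rows for shelf A1 from all three tables come out together, which is the grouping the user needs to see when deciding which IDs belong to the same item.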

How to use Sphinx to search across large, JOINed tables?

I have several different tables in my database and I'm trying to use Sphinx to do fast full-text searches. For ease of discussion, let's say the main records of interest are packing slips, one of which is included when an order ships. How do I use Sphinx to execute complex queries across all of these tables without completely denormalizing the database?
Each packing slip lists the order number, shipper, recipient, and the tracking number of each box included with the shipment. A separate table contains information about the order items. An additional table contains the customer address information. So, orders contain boxes and boxes contain items. (Example schema listed at the bottom of this question).
I would like to be able to query Sphinx for answers to questions like:
How many people who live on a street named "Maple" ordered an item with "large" in the description?
Which orders include the word "blue" in either the box description or the order items' descriptions?
To answer these types of questions, I need to refer to several tables. Since Sphinx doesn't have JOINs, one option is to denormalize the database. Denormalizing using a view, so that each row represents an order item plus all of the data of its parent box and order, would result in billions of very wide rows. So I've been creating a separate index for each table instead. But that doesn't allow me to query across tables as a SQL JOIN would. Is there another solution?
Example database
CREATE TABLE orders (
id integer PRIMARY KEY,
date_ordered date,
customer_po varchar
);
INSERT INTO orders VALUES (1, '2012-12-13', NULL);
INSERT INTO orders VALUES (2, '2012-12-14', 'DF312442');
CREATE TABLE parties (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
party_type varchar,
company varchar,
city varchar,
state char(2)
);
INSERT INTO parties VALUES (1, 1, 'shipper', 'ACME, Inc.', 'New York', 'NY');
INSERT INTO parties VALUES (2, 1, 'recipient', 'Wylie Coyote Corp.', 'Flagstaff', 'AZ');
INSERT INTO parties VALUES (3, 2, 'shipper', 'Cyberdyne', 'Las Vegas', 'NV');
-- Please disregard the fact that this design permits multiple shippers and multiple recipients
-- per order. This is a vastly simplified version of the system I'm working on.
CREATE TABLE boxes (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
tracking_num varchar NOT NULL,
description varchar NOT NULL
);
INSERT INTO boxes VALUES (1, 1, '1234567890', 'household goods');
INSERT INTO boxes VALUES (2, 1, '0987654321', 'kitchen appliances');
INSERT INTO boxes VALUES (3, 2, 'ABCDE12345', 'audio equipment');
CREATE TABLE box_contents (
id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(id),
box integer NOT NULL REFERENCES boxes(id),
qty_units integer,
description varchar
);
INSERT INTO box_contents VALUES (1, 1, 1, 4, 'cookbook');
INSERT INTO box_contents VALUES (2, 1, 1, 2, 'baby bottle');
INSERT INTO box_contents VALUES (3, 1, 2, 1, 'television');
INSERT INTO box_contents VALUES (4, 2, 3, 2, 'lamp');
You put the JOIN in the sql_query that builds the index. The tables remain normalized, but you denormalize when building the index.
It's only a basic example, but your query would be something like:
sql_query = SELECT o.id,customer_po,UNIX_TIMESTAMP(date_ordered) AS date_ordered, \
GROUP_CONCAT(DISTINCT party_type) AS party_type, \
GROUP_CONCAT(DISTINCT company) AS company, \
GROUP_CONCAT(DISTINCT city) AS city, \
GROUP_CONCAT(DISTINCT description) AS description \
FROM orders o \
INNER JOIN parties p ON (o.id = p.order_id) \
INNER JOIN box_contents b ON (o.id = b.order_id) \
GROUP BY o.id \
ORDER BY NULL
Update: alternatively, you can use sql_joined_field to do the same while avoiding JOINs in the actual sql_query. Sphinx then does the join for you.
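To see what such a JOINed query hands to the indexer, the same aggregation can be tried on the example data. This is only a sketch: it uses SQLite's group_concat through Python's standard sqlite3 module instead of a real Sphinx source, simplifies the column types, and leaves out the MySQL-specific UNIX_TIMESTAMP call. The docs dictionary is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, date_ordered TEXT, customer_po TEXT);
CREATE TABLE parties (id INTEGER PRIMARY KEY, order_id INTEGER, party_type TEXT,
                      company TEXT, city TEXT, state TEXT);
CREATE TABLE box_contents (id INTEGER PRIMARY KEY, order_id INTEGER, box INTEGER,
                           qty_units INTEGER, description TEXT);
INSERT INTO orders VALUES (1, '2012-12-13', NULL), (2, '2012-12-14', 'DF312442');
INSERT INTO parties VALUES
    (1, 1, 'shipper', 'ACME, Inc.', 'New York', 'NY'),
    (2, 1, 'recipient', 'Wylie Coyote Corp.', 'Flagstaff', 'AZ'),
    (3, 2, 'shipper', 'Cyberdyne', 'Las Vegas', 'NV');
INSERT INTO box_contents VALUES
    (1, 1, 1, 4, 'cookbook'), (2, 1, 1, 2, 'baby bottle'),
    (3, 1, 2, 1, 'television'), (4, 2, 3, 2, 'lamp');
""")

# One output row per order: the child rows are flattened into text fields.
docs = {
    order_id: {"city": city, "description": description}
    for order_id, city, description in conn.execute("""
        SELECT o.id,
               GROUP_CONCAT(DISTINCT p.city),
               GROUP_CONCAT(DISTINCT b.description)
        FROM orders o
        INNER JOIN parties p ON o.id = p.order_id
        INNER JOIN box_contents b ON o.id = b.order_id
        GROUP BY o.id
    """)
}
for order_id, fields in docs.items():
    print(order_id, fields)
```

Each order becomes a single document whose text fields contain every city and item description attached to it, so the index stays at one row per order rather than one row per item.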
