I have a users table with the following format:
users(id,name.....,settings)
The settings field is of type NUMBER and contains a bitmask of settings.
I have to support (inter alia) queries like:
find all users who have setting1, setting23, and setting125
Today such a query looks like:
select * from users where bit_and(settings,2^1+2^23+2^125) = 2^1+2^23+2^125
Of course it is not a perfect implementation, but it has already worked this way for a long time.
The problem is that today we have 126 different settings, which is exactly the limit of Oracle 11g for bitwise operations. That means we can't add new settings anymore.
I'm trying to find an alternative solution to this issue.
The obvious way is, instead of the settings field, to create a mapping table (user --> setting), like:
user_id | setting
128 | 1
128 | 23
128 | 125
But then a query like the one above becomes:
select *
from users u1 join settings s1 on u1.id = s1.user_id and s1.setting = 1
join settings s2 on u1.id = s2.user_id and s2.setting = 23
join settings s3 on u1.id = s3.user_id and s3.setting = 125
It doesn't look good...
So if someone can advise any solution or approach to this issue, it would be very helpful.
Here's my answer to a related question.
You could easily simplify your query:
select *
from users u
join settings s
on u.id = s.user_id
and s.setting in (1, 23, 125)
This gives you an "or" version of the query.
select u.id, sum(s.setting)
from users u
join settings s
on u.id = s.user_id
and s.setting in (1, 23, 125)
group by u.id
having sum(s.setting) = 149
This gives you the "and" version of the query (149 = 1 + 23 + 125).
Your new design is fundamentally OK, but assuming the "get users of given settings" will be a predominant query, you can fine-tune it in the following way...
CREATE TABLE "user" (
user_id INT PRIMARY KEY
-- Other fields ...
);
CREATE TABLE user_setting (
setting INT,
user_id INT,
PRIMARY KEY(setting, user_id),
CHECK (setting BETWEEN 1 AND 125),
FOREIGN KEY (user_id) REFERENCES "user" (user_id)
) ORGANIZATION INDEX COMPRESS;
Note the order of fields in PRIMARY KEY and the ORGANIZATION INDEX COMPRESS clause:
ORGANIZATION INDEX will cluster (store physically close together) the rows having the same setting.
COMPRESS will minimize the storage (and caching!) cost of repeated setting fields.
You can then get users connected to any of the given settings like this...
SELECT * FROM "user"
WHERE user_id IN (
SELECT user_id FROM user_setting
WHERE setting IN (1, 23, 125)
);
...which will be very quick thanks to the favorable indexing and minimized I/O.
You can also get users that have all of the given settings like this:
SELECT * FROM "user"
WHERE user_id IN (
SELECT user_id
FROM user_setting
WHERE setting IN (1, 23, 125)
GROUP BY user_id
HAVING COUNT(setting) = 3
);
Using a bitfield for all settings makes it awkward to query and hard to optimize for query performance (in your old design, every query is a table scan!). OTOH, the "column per setting" design would require a separate index per column for good performance, and you'd still have some less-than-elegant queries.
Also, these approaches are inflexible, unlike your new design, which can easily be extended to accept more settings, or to store additional information about each setting (instead of just a number) by adding another table and referencing it from user_setting.
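For illustration, such an extension might look like the following (a sketch only; the setting table and its columns are assumptions, not part of your current schema):
CREATE TABLE setting (
setting_id INT PRIMARY KEY,
description VARCHAR2(100) -- or any other per-setting information
);
ALTER TABLE user_setting
ADD CONSTRAINT fk_user_setting_setting
FOREIGN KEY (setting) REFERENCES setting (setting_id);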
Store each setting in its own column.
I am building functionality to estimate inventory for my ad-serving platform. The fields on which I am trying to estimate, with their cardinalities, are as below:
FIELD: CARDINALITY
location: 10000 (bengaluru, chennai etc..)
n/w speed : 6 (w, 4G, 3G, 2G, G, NA)
priceRange : 10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
users: contains the number of users falling under any of the above combinations.
Ex. {'location':'bengaluru', 'n/w':'4G', priceRange:8, users: 1000}
means 1000 users are from bengaluru having 4G and priceRange = 8
So the total number of combinations can be 10000 * 6 * 10 = 600,000, and in the future more fields may be added, to around 29 (currently there are 3: location, n/w, priceRange), so the total number of combinations can reach the order of 10 million. I want to estimate how many users fall under a given combination.
The queries I will need are as follows:
1) Find all users who are from location: bengaluru, n/w: 3G, priceRange: 6
2) Find all users from bengaluru
3) Find all users falling under n/w: 3G and priceRange: 8
What is the best possible way to approach this?
Which database is best suited for this requirement? What indexes do I need to build? Will a compound index help? If yes, then how? Any help is appreciated.
Here's my final answer:
Create table Attribute(
ID int,
Name varchar(50));
Create table AttributeValue(
ID int,
AttributeID int,
Value varchar(50));
Create table UserAttributeValue(
userID int,
AttributeID int,
AttributeValue int);
Create table User(
ID int);
Insert into user (ID) values (1),(2),(3),(4),(5);
Insert into Attribute (ID,Name) Values (1,'Location'),(2,'nwSpeed'),(3,'PriceRange');
Insert into AttributeValue values
(1,1,'bengaluru'),(2,1,'chennai'),
(3,2, 'w'), (4, 2,'4G'), (5,2,'3G'), (6,2,'2G'), (7,2,'G'), (8,2,'NA'),
(9,3,'1'), (10,3,'2'), (11,3,'3'), (12,3,'4'), (13,3,'5'), (14,3,'6'), (15,3,'7'), (16,3,'8'), (17,3,'9'), (18,3,'10');
Insert into UserAttributeValue (userID, AttributeID, AttributeValue) values
(1,1,1),
(1,2,5),
(1,3,9),
(2,1,1),
(2,2,4),
(3,2,6),
(2,3,13),
(4,1,1),
(4,2,4),
(4,3,13),
(5,1,1),
(5,2,5),
(5,3,13);
Select USERID
from UserAttributeValue
where (AttributeID,AttributeValue) in ((1,1),(2,4))
GROUP BY USERID
having count(distinct concat(AttributeID,AttributeValue))=2
Now if you need a count rather than the list of IDs: each matching user has one record per attribute passed in, so you can count the userIDs and divide by the number of attributes, or simply wrap the grouped query above and count its rows.
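For example (a minimal sketch of the wrapping approach):
Select count(*) as matchingUsers
from (
Select USERID
from UserAttributeValue
where (AttributeID,AttributeValue) in ((1,1),(2,4))
GROUP BY USERID
having count(distinct concat(AttributeID,AttributeValue))=2
) matched;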
This allows for N growth of attributes and attribute values per user without changes to the UI or database, if the UI is designed correctly.
By treating each data point as an attribute and storing them all in one place, we can enforce database integrity.
The Attribute and AttributeValue tables become lookups for UserAttributeValue, so you can translate the IDs back to the attribute name and the value.
This also means we only have 4 tables: User, Attribute, AttributeValue, and UserAttributeValue.
Technically you don't have to store AttributeID on UserAttributeValue, but for performance reasons on later joins/reporting I think you'll find it beneficial.
You need to add proper primary keys, foreign keys, and indexes to the tables. They should be fairly self-explanatory. On UserAttributeValue I would have a few composite indexes, each with a different order of the unique key. It just depends on the type of reporting/analysis you'll be doing, but adding keys as performance tuning is needed is commonplace.
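As an illustration only (the index names and column orders here are assumptions to tune against your actual reporting), those keys and composite indexes could look like:
ALTER TABLE UserAttributeValue
ADD PRIMARY KEY (userID, AttributeID, AttributeValue);
CREATE INDEX ix_uav_attribute_value_user
ON UserAttributeValue (AttributeID, AttributeValue, userID);
CREATE INDEX ix_uav_value_attribute_user
ON UserAttributeValue (AttributeValue, AttributeID, userID);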
Assumptions:
You're OK with all data values being varchar data in all cases.
If needed, you could add a datatype, precision, and scale on the Attribute table and allow the UI to cast the attribute value as needed; but since they are all stored in the same field in the database, they all have to be the same datatype, and of the same precision/scale.
Pivot tables will likely be needed to display the data across, and you know how to handle those (and your engine supports them!).
Gotta say I loved the mental exercise, but I would still appreciate feedback from others on SO. I've used this approach in one system I've developed and it's been in two I've supported. There are some challenges, but it does follow third normal form database design (except for the replicated AttributeID in UserAttributeValue, which is there for performance gain in reporting/filtering).
Due to non-disclosure at my work, I have created an analogy of the situation. Please try to focus on the problem and not on "Why don't you rename this table, merge those tables, etc.", because the actual problem is much more complex.
Here's the deal:
Let's say I have an "Employee Pay Rise" record that has to be approved.
There is a table of individual "Users".
There are tables that group users together, for example "Managers", "Executives", "Payroll", "Finance". These groupings are of different types with different properties.
When creating a "PayRise" record, the user who is creating the record also selects both a number of these groups (managers, executives etc) and/or single users who can 'approve' the pay rise.
What is the best way to relate a single "EmployeePayRise" record to 0 or more user records, and 0 or more of each of the groupings?
I would assume that the users are linked to the groups? If so, in this case I would just link the EmployeePayRise record to the one user it applies to and the user that can approve it. So basically you'd have two columns representing this: the EmployeePayRise.employeeId and EmployeePayRise.approvalById columns. If you need to get to groups, you'd join on EmployeePayRise.employeeId = Employee.id. Keep it simple without over-complicating your design.
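A minimal sketch of that layout (assuming a hypothetical Employee table keyed by id; adjust the names to your schema):
CREATE TABLE EmployeePayRise (
id INT PRIMARY KEY,
employeeId INT NOT NULL REFERENCES Employee (id), -- the employee the rise applies to
approvalById INT NOT NULL REFERENCES Employee (id) -- the user who can approve it
-- pay rise details ...
);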
My first thought was to create a table that relates individual approvers to pay rise rows.
create table pay_rise_approvers (
pay_rise_id integer not null references some_other_pay_rise_table (pay_rise_id),
pay_rise_approver_id integer not null references users (user_id),
primary key (pay_rise_id, pay_rise_approver_id)
);
You can't have good foreign keys that reference managers sometimes, and reference payroll some other times. Users seems the logical target for the foreign key.
If the person creating the pay rise rows (not shown) chooses managers, then the user interface is responsible for inserting one row per manager into this table. That part's easy.
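For instance, if "Managers" is (hypothetically) a table that links manager rows to users, the UI layer could run something along these lines (:pay_rise_id being the newly created pay rise):
-- one row per manager for the selected pay rise (a sketch; managers(user_id) is assumed)
INSERT INTO pay_rise_approvers (pay_rise_id, pay_rise_approver_id)
SELECT :pay_rise_id, m.user_id
FROM managers m;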
A person that appears in more than one group might be a problem. I can imagine a vice-president appearing in both "Executive" and "Finance" groups. I don't think that's particularly hard to handle, but it does require some forethought. Suppose the person who entered the data changed her mind, and decided to remove all the executives from the table. Should an executive who's also in finance be removed?
Another problem is that there's a pretty good chance that not every user should be allowed to approve a pay rise. I'd give some thought to that before implementing any solution.
I know it looks ugly, but I think sometimes the solution can be to have the table name in the table and a union query:
create table approve_pay_rise (
rise_proposal varchar2(10) -- foreign key to payrise table
, approver varchar2(10) -- key of record in table named in other_table
, other_table varchar2(15) );
insert into approve_pay_rise values ('prop000001', 'e0009999', 'USERS');
insert into approve_pay_rise values ('prop000001', 'm0002200', 'MANAGERS');
Then, either in code, a case statement, repeated statements for each other_table value (select ... where other_table = '' .. select ... where other_table = ''), or a union select (sketched below).
I have to admit I shudder when I encounter it, and I'll now go wash my hands after typing a recommendation to do it, but it works.
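For illustration, the union select could look something like this (a sketch; the name columns on USERS and MANAGERS are assumptions):
select apr.rise_proposal, u.user_name as approver_name
from approve_pay_rise apr
join users u on u.user_id = apr.approver and apr.other_table = 'USERS'
union all
select apr.rise_proposal, m.manager_name
from approve_pay_rise apr
join managers m on m.manager_id = apr.approver and apr.other_table = 'MANAGERS';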
Sounds like you might need two tables ("ApprovalUsers" and "ApprovalGroups"). The SELECT statement(s) would be a UNION of the UserIDs from "ApprovalUsers" and the UserIDs from any other groups of users that appear in "ApprovalGroups" related to the PayRiseId.
SELECT UserID
INTO #TempApprovers
FROM ApprovalUsers
WHERE PayRiseId = 12345
IF EXISTS (SELECT GroupName FROM ApprovalGroups WHERE GroupName = 'Executives' AND PayRiseId = 12345)
BEGIN
INSERT INTO #TempApprovers
SELECT UserID
FROM Executives
END
....
EDIT: this would/could duplicate UserIds, so you would probably want to GROUP BY UserID (i.e. SELECT UserID FROM #TempApprovers GROUP BY UserID)
Currently I am designing a small Twitter/Facebook kind of system, wherein a user should be able to see his friends' latest activities.
I am using ASP.NET with MySQL database.
My Friendships table is as follows:
|Friendshipid|friend1|Friend2|confirmed|
Friend1 and Friend2 in the above table are userids.
The user activities table design is as follows:
|activityId|userid|activity|Dated|
Now, I am looking for the best way to query the latest 50 friend activities for a user.
For example, let's say if Tom logs into the system, he should be able to see latest 50 activities among all his friends.
Any pointers on the best practices, a query or any information is appreciated.
It largely depends on what data is stored in the Friendships table. For example, what order are the Friend1 and Friend2 fields stored in? If, for the fields (friend1, friend2) the tuple (1, 2) exists, will (2, 1) exist also?
If this is not the case, then this should work:
SELECT a.*
FROM Activities a
INNER JOIN Friendships f
ON (a.userid = f.friend1 AND f.friend2 = [my own id])
OR (a.userid = f.friend2 AND f.friend1 = [my own id])
WHERE a.userid != [my own id]
AND f.confirmed = TRUE
ORDER BY a.Dated DESC
LIMIT 50;
If you have database performance concerns, you may redefine the friendship table as follows:
friendshipid, userid, friendid, confirmed
When you query the latest 50 activities, the SQL would be:
SELECT act.*
FROM Activities AS act
INNER JOIN Friendships AS fs
ON fs.friendid = act.userid
AND fs.userid = 'logon_user_id'
AND fs.confirmed = TRUE
ORDER BY act.dated DESC
LIMIT 50;
And if there is an index on the Friendships(userid) column, it gives the database a chance to optimize the query.
The redefined friendship table needs two rows created when a friendship occurs, but it still obeys the business rules, and it has a performance benefit when you need it.
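For example (a sketch; adjust names and types to your schema), the supporting index and the two rows per friendship would look like:
CREATE INDEX idx_friendships_userid ON Friendships (userid, confirmed, friendid);
-- one friendship between users 1 and 2 is stored as two rows
INSERT INTO Friendships (userid, friendid, confirmed) VALUES (1, 2, TRUE);
INSERT INTO Friendships (userid, friendid, confirmed) VALUES (2, 1, TRUE);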
For each user in my webapp, there are n related Widgets. Each widget is represented in the database in a Widgets table. Users can sort their widgets, they'll never have more than a couple dozen widgets, and they will frequently sort widgets.
I haven't dealt with database items that have an inherent order to them very frequently. What's a good strategy for ordering them? At first, I thought a simple "sortIndex" column would work just fine, but then I started wondering how to initialize this value. It presumably has to be a unique value, and it should be greater or less than every other sort index. I don't want to have to check all of the other sort indexes for that user every time I create a new widget, though. That seems unnecessary.
Perhaps I could have a default "bottom-priority" sort index? But then how do I differentiate between those? I suppose I could use a creation date flag, but then what if a user wants to insert a widget in the middle of all of those bottom-priority widgets?
What's the standard way to handle this sort of thing?
If you have users sorting widgets for their own personal tastes, you want to create a lookup table, like so:
create table widgets_sorting
(
SortID int primary key,
UserID int,
WidgetID int,
SortIndex int
)
Then, to sort a user's widgets:
select
w.*
from
widgets w
inner join widgets_sorting s on
w.WidgetID = s.WidgetID
inner join users u on
s.UserID = u.UserID
order by
s.SortIndex asc
This way, all you'll have to do for new users is add new rows to the widgets_sorting table. Make sure you put a foreign key constraint and an index on both the WidgetID and the UserID columns.
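A sketch of those constraints and indexes (assuming the users and widgets tables are keyed by UserID and WidgetID respectively):
ALTER TABLE widgets_sorting
ADD FOREIGN KEY (UserID) REFERENCES users (UserID);
ALTER TABLE widgets_sorting
ADD FOREIGN KEY (WidgetID) REFERENCES widgets (WidgetID);
CREATE INDEX ix_sorting_user ON widgets_sorting (UserID, SortIndex);
CREATE INDEX ix_sorting_widget ON widgets_sorting (WidgetID);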
These lookup tables are really the best way to solve the many-to-many relationships that are common with this sort of personalized listing. Hopefully this points you in the right direction!
The best way for user-editable sorting is to keep the IDs in a linked list:
user_id widget_id prev_widget_id
---- ---- ----
1 1 0
1 2 8
1 3 7
1 7 1
1 8 3
2 3 0
2 2 3
This will make 5 widgets for user 1 in this order: 1, 7, 3, 8, 2; and 2 widgets for user 2 in this order: 3, 2
You should make UNIQUE indexes on (user_id, widget_id) and (user_id, prev_widget_id).
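For instance (assuming the table is called widget_orders, as in the queries below):
CREATE UNIQUE INDEX ux_widget_orders_widget ON widget_orders (user_id, widget_id);
CREATE UNIQUE INDEX ux_widget_orders_prev ON widget_orders (user_id, prev_widget_id);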
To get widgets in the intended order, you can query like this, say, in Oracle:
SELECT w.*
FROM (
SELECT widget_id, level AS widget_order
FROM widget_orders
START WITH
user_id = :myuser
AND prev_widget_id = 0
CONNECT BY
user_id = PRIOR user_id
AND prev_widget_id = PRIOR widget_id
) o
JOIN widgets w
ON w.widget_id = o.widget_id
ORDER BY
widget_order
To update the order, you will need to update at most 3 rows (even if you move the whole block of widgets).
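For illustration, one way to express such a move (a sketch; :myuser is the owner, :moved_widget is the widget being moved, and :new_prev_widget is the widget it should now follow, 0 for the head of the list; MySQL would need the old predecessor captured separately first):
-- 1. whatever used to follow the moved widget now follows its old predecessor
UPDATE widget_orders
SET prev_widget_id = (SELECT prev_widget_id FROM widget_orders
WHERE user_id = :myuser AND widget_id = :moved_widget)
WHERE user_id = :myuser AND prev_widget_id = :moved_widget;
-- 2. whatever used to follow the new predecessor now follows the moved widget
UPDATE widget_orders
SET prev_widget_id = :moved_widget
WHERE user_id = :myuser AND prev_widget_id = :new_prev_widget AND widget_id <> :moved_widget;
-- 3. the moved widget now follows its new predecessor
UPDATE widget_orders
SET prev_widget_id = :new_prev_widget
WHERE user_id = :myuser AND widget_id = :moved_widget;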
SQL Server and PostgreSQL 8.4 implement this functionality using recursive CTEs:
WITH
-- RECURSIVE
-- uncomment the previous line in PostgreSQL
q AS
(
SELECT widget_id, prev_widget_id, 1 AS widget_order
FROM widget_orders
WHERE user_id = #user_id
AND prev_widget_id = 0
UNION ALL
SELECT wo.widget_id, wo.prev_widget_id, q.widget_order + 1
FROM q
JOIN widget_orders wo
ON wo.user_id = #user_id
AND wo.prev_widget_id = q.widget_id
)
SELECT w.*
FROM q
JOIN widgets w
ON w.widget_id = q.widget_id
ORDER BY
widget_order
See this article in my blog on how to implement this functionality in MySQL:
Sorting lists
I like to use a two-table approach - which can be a bit confusing but if you're using an ORM such as ActiveRecord it's easy, and if you write a bit of clever code it can be manageable.
Use one table to link user to sorting, and one table to link widget, position, and sorting. This way it's a lot clearer what's going on, and you can use an SQL join or a separate query to pull the various data from the various tables. Your structure should look like this:
-- Standard user + widgets tables; make sure they both have unique IDs
CREATE TABLE users;
CREATE TABLE widgets;
-- The sorting tables
CREATE TABLE sortings (
id INT, -- autoincrement etc.
user_id INT
);
CREATE TABLE sorting_positions (
sorting_id INT,
widget_id INT,
position INT
);
Hopefully this makes sense, if you're still confused, comment on this message and I'll write you up some basic code.
Jamie
If you mean that each user assigns his own sort order to the widgets, then Eric's answer is correct. Presumably you then have to give the user a way to assign the sort value. But if the number is modest, as you say, then you can just give him a screen listing all the widgets, and either let him type in the order number, or display them in order and put up and down buttons beside each, or, if you want to be fancy, give him a way to drag and drop.
If the order is the same for all users, the question becomes, Where does this order come from? If it's arbitrary, just assign a sequence number as new widgets are created.
I have a postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course, godawfully slow. The best I could figure out to do it in SQL was to join sub-selects, which were also slow, and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
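If you need this for all users at once rather than a single userid, the same idea extends naturally (a sketch, keeping this answer's table and column names):
SELECT DISTINCT ON (userid, code) userid, code, content, createtime
FROM metatable
ORDER BY userid, code, createtime DESC;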
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to have a time field that stays empty until the entry is replaced by a newer value, at which point you insert the 'deprecation date'. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.
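A sketch of that "current values" query, assuming a hypothetical valid_to (deprecation date) column on usermetadata:
-- current value of every code for one user: the rows that have not been superseded
SELECT code, content
FROM usermetadata
WHERE userid = 15
AND valid_to IS NULL;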
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
SELECT UserId, Code, MAX(Date) as LastDate
FROM Table
GROUP BY UserId, Code
) as Latest ON
Table.UserId = Latest.UserId
AND Table.Code = Latest.Code
AND Table.Date = Latest.Date
WHERE
Table.UserId = #userId