Let's say I have a table that stores user data. It stores two types of items: a UserId (partition key) with attributes (a JSON blob), and a reference back to the UserId, keyed off values within those attributes. For example, here are three rows of the table:
pk | attributes | userId
5 | { email: example@example.com, tel: 123456789 } | null
email/example@example.com | null | 5
phone/123456789 | null | 5
This is so I am able to query directly off of values to obtain attributes, without needing to do a scan and filter (a very compute-intensive operation on large tables).
My question is: can I, in a single query, do something like getByPartitionKey(email/example@example.com), obtain the userId, and then use that userId to query for the whole attributes document, without making two individual requests? Something akin to a join in SQL.
Your data model is very wrong. Here is how to achieve what you want:
pk | sk | phone | email | other
user123 | user123 | 0293480983 | example@example.com | some map {}
SELECT * FROM mytable WHERE pk = 'user123'
This would allow you to get all of the information for a given userId. If you want the same information but this time by email, you create a GSI (global secondary index) on the email attribute:
email | pk | sk | phone | other
example@example.com | user123 | user123 | 0293480983 | some map {}
SELECT * FROM mytable.myindex WHERE email = 'example@example.com'
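A GSI only covers the key you define on it, so a lookup by phone would need its own index following the same pattern. A sketch, assuming a second GSI named myphoneindex keyed on the phone attribute:
SELECT * FROM mytable.myphoneindex WHERE phone = '0293480983'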
I have two tables (e.g.):
Users (ID, firstName, middleName, lastName)
Contacts (ID, userID, serialNo, phoneNumber, eMail).
I will be communicating (sending messages) with Users via phoneNumber or eMail or both, and saving a record of it in the database, e.g.:
Log (ID, userID, contactID, message, onPhoneOrEmail), where the last field stores, say, 'p', 'e', or 'b' for phoneNumber, eMail, or both.
So when I check the logs, I can tell which message was sent to which email/phone number.
Problem:
What to do when Users change their contact details?
If I update the Contacts table, I lose the meaning of the Log, because the messages were not sent to the new number.
If I store the number or email in the Log itself, it would be too much data to store (at large scale, compared to just one character).
Last: if I add a new Contact with an incremented serial number (the serialNo field), will that be feasible? What about performance? (Uniqueness is not required; Users can change their number or email as many times as they want - these are just for communication.)
I read this and this, but could not get an appropriate answer regarding performance/methodological issues.
Please guide.
SAMPLE DATA:
USERS
| ID | firstName | middleName | lastName |
| 1 | John | null | Cena |
CONTACTS
| ID | userID | serialNo | phoneNumber | eMail |
| 1 | 1 | 1 | 123456 | abc@xyz.com |
| 2 | 1 | 2 | null | xyz@mnp.com |
| 3 | 1 | 3 | 987654 | null |
If you say that a User can change his contact details, this means you inverted the dependency. The User has the Contact, so it is reasonable to associate a contactID with a user and not the opposite. Now a User can change, e.g., his phone whenever he wants, while it makes no sense for the same phone number to change its user at some point.
So it would be turned like this:
User (ID, firstName, middleName, lastName, contactID)
Contact(ID,serialNo, phoneNumber, eMail)
Log (ID, userID, message, onPhoneOrEmail).
You don't need both userID and contactID on Log. Remember that one is a foreign key to the other (a transitive dependency).
EDIT
If you need to store multiple contacts per User, keep your schema but change the Log to
Log (ID, contactID, message, onPhoneOrEmail)
From my point of view, changing a user's contact means you remove one and add another. If you have never sent any message to the contact you are removing, you have no reason to keep it stored; otherwise, if you need a record, you have to keep the contact information even after you have replaced it (perhaps with a column flagging it as no longer valid). This is already the default behavior in MySQL (ON DELETE RESTRICT).
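A minimal sketch of the edited design in DDL (table names taken from the question and answer; column types are assumptions):
CREATE TABLE Users (
    ID INT PRIMARY KEY,
    firstName VARCHAR(50),
    middleName VARCHAR(50),
    lastName VARCHAR(50)
);
CREATE TABLE Contacts (
    ID INT PRIMARY KEY,
    userID INT NOT NULL REFERENCES Users(ID),
    serialNo INT,
    phoneNumber VARCHAR(20),
    eMail VARCHAR(255)
);
-- Deleting a contact that has log entries is refused by the FK
-- (ON DELETE RESTRICT), so the message history stays intact.
CREATE TABLE Log (
    ID INT PRIMARY KEY,
    contactID INT NOT NULL,
    message TEXT,
    onPhoneOrEmail CHAR(1),
    FOREIGN KEY (contactID) REFERENCES Contacts(ID) ON DELETE RESTRICT
);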
Get rid of your Contact table.
Create a new UserPhone table (PK - ID, FK - User.Id, Phone#, ActiveDate).
Create a new UserEmail table (PK - ID, FK - User.Id, Email, ActiveDate).
It looks like SerialNumber is just an incrementer for one User's Contact data. If it is just an incrementer, ActiveDate should suffice as a replacement.
When phone or email information changes, do not update the existing record; add a new record with today's date instead.
Your Log table will look like (PK - LogID, FK - UserEmail.ID, FK - UserPhone.ID).
No need for the PhoneOrEmail field. That information can be determined by the presence of the FKs.
You might have some other design issues but this answer should get you on the right track.
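A minimal sketch of that layout (column types are assumptions; the two nullable FKs on Log let an entry point at a phone, an email, or both):
CREATE TABLE UserPhone (
    ID INT PRIMARY KEY,
    UserId INT NOT NULL REFERENCES Users(ID),
    Phone VARCHAR(20) NOT NULL,
    ActiveDate DATE NOT NULL
);
CREATE TABLE UserEmail (
    ID INT PRIMARY KEY,
    UserId INT NOT NULL REFERENCES Users(ID),
    Email VARCHAR(255) NOT NULL,
    ActiveDate DATE NOT NULL
);
-- Which channel was used can be read off whichever FK is non-null.
CREATE TABLE Log (
    LogID INT PRIMARY KEY,
    UserEmailID INT NULL REFERENCES UserEmail(ID),
    UserPhoneID INT NULL REFERENCES UserPhone(ID),
    Message TEXT
);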
I'm currently designing my tables. I have three types of user: pyd, ppp, and ppk. Which is better: inserting the data in one row or in multiple rows? Or any other suggestion? Thanks.
I would go for 3 tables:
user_type
typeID | typeDescription
Main_table
id_main_table | id_user | id_type
table_bhg_i
id_bhg_i | id_main_table | data1 | data2 | data3
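As a sketch in DDL (column types are assumptions; data1..data3 are placeholders as in the layout above):
CREATE TABLE user_type (
    typeID INT PRIMARY KEY,
    typeDescription VARCHAR(50) NOT NULL
);
CREATE TABLE Main_table (
    id_main_table INT PRIMARY KEY,
    id_user INT NOT NULL,
    id_type INT NOT NULL REFERENCES user_type(typeID)
);
CREATE TABLE table_bhg_i (
    id_bhg_i INT PRIMARY KEY,
    id_main_table INT NOT NULL REFERENCES Main_table(id_main_table),
    data1 VARCHAR(100),
    data2 VARCHAR(100),
    data3 VARCHAR(100)
);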
Although I see you are inserting IDs for each user, I don't quite understand how you are going to differentiate between the users. Had I designed this DB, I would have gone for tables like:
tableName: UserTypes
This table would contain two fields: the first is the ID and the second is the type of user, like:
UsertypeID | UserType
UsertypeID is the primary key and can be auto-increment, while UserType would be your user types pyd, ppk, and so on. Designing it this way gives you the flexibility to add more types later without changing the schema of the table.
Next, you can add a table for generating multiple users of a particular type. This table would reference the UsertypeID of the previous table, which lets you add new users easily and removes redundancy:
tableName: Users
This table would contain the user's ID, the user's name, and a reference to the user type:
UserId | UserName | UserTypeID
The next thing you can do is make a table to hold the data; let it be called DataTable:
tableName: DataTable
This table will contain the data of the users and references them easily:
DataTabID | DataFields (can be any in number) | UserID (references Users table)
These tables should be more than sufficient. If you have doubts, ask me in the chat.
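For example, listing every user together with the type name is then a simple join (a sketch using the table and column names above):
SELECT u.UserId, u.UserName, t.UserType
FROM Users u
JOIN UserTypes t ON t.UsertypeID = u.UserTypeID;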
I am trying to store metadata about a document in SQL Server. The documents themselves are stored in a document archive, which returns an identifier, so I can get a document back by asking the archive for it by that identifier.
Our users would like to be able to search for documents based on different metadata. The metadata could be 1 attribute or 5, depending on the document type, and users should be able to create new document types from an admin site.
I can see two solutions here. One is that each document type gets its own metadata table, where all metadata attributes are predefined; if one should be added, a new column needs to be created, and if a new document type is created, a new metadata table needs to be created. Our DBA will freak out with a solution like this, and I also see a problem with indexes: if the document type has 5 different metadata attributes, it needs to be searchable with 1 or 4 of them specified in the search, so I would need to write indexes for all the different combinations of possible searches.
Here is a (fictitious) example:
|documentId | Name | InsertDate | CustomerId | City
| 1 | John | 2014-01-01 | 2 | London
| 2 | John | 2014-01-20 | 5 | New York
| 3 | Able | 2014-01-01 | 10 | Paris
Here I could say:
Give me all documents where Name = 'John'
Give me all documents where Name = 'John' and CustomerId = 5
Give me all documents where InsertDate = '2014-01-01' and City = 'London'
That is already 3 different indexes, and I still haven't covered all possible combinations. This isn't practical.
So I am looking into the evil 'EAV' (anti)pattern.
Instead of having the metadata as columns, I can have them as rows:
|documentId | MetaAttribute | MetaValue
| 1 | Name | John
| 1 | InsertDate | 2014-01-01
| 1 | CustomerId | 2
| 1 | City | London
| 2 | Name | John
| 2 | InsertDate | 2014-01-20
| 2 | CustomerId | 5
| 2 | City | New York
| 3 | Name | Able
| 3 | InsertDate | 2014-01-01
| 3 | CustomerId | 10
| 3 | City | Paris
Here it's simple to create one index on MetaAttribute and MetaValue, and everything is covered. If a new document type is created, new metadata can be registered for it in a MetaAttribute table (that contains all MetaAttributes for the different document types). So there is no need to create new tables or columns when a new document type is added, or when a new attribute is added to a document type. The downside is that all MetaValues must be strings :( and the SQL query to find the document id is a bit more complicated.
This is what I figured out. (In this example MetaAttribute is a string, but it would be an ID into the MetaAttribute table.)
SELECT * FROM [Document]
WHERE ID IN (SELECT documentId FROM [MetaData]
             WHERE ((MetaAttribute = 'Name' AND MetaValue = 'John')
                 OR (MetaAttribute = 'CustomerId' AND MetaValue = '5'))
             GROUP BY [documentId]
             HAVING Count(1) = 2)
Here I need to ask whether Name = 'John' and CustomerId = 5. I do that by finding all records where Name = 'John' or CustomerId = '5', grouping them on documentId, and counting the items in each group. If I get 2, then both Name = 'John' and CustomerId = '5' are true for this search. I return the documentId and use that to retrieve information about the document, like the document archive storage id.
There should be a better SQL statement for this, shouldn't there?
So my question is: is there a better approach than these two? Is the EAV pattern so bad that I should stick with the first approach and have a freaked-out DBA and "ten millions of indexes"?
We are talking about a system that will get around 10-20 million new records each month and keep data for at least 3 years, so the tables will be pretty big and good indexes are necessary for performance.
Best Regards
Magnus
The EAV model is appealing if you have unbounded attributes--that is, anyone can set up anything as an attribute. However, it sounds from your description that this is not the case--the possible document attributes come from a known and fairly limited set. If this is the case, routine normalization suggests the following:
-- One per document
CREATE TABLE Document
(
DocumentId -- primary key
,DocumentType
,<etc>
)
-- One per "type" of document
CREATE TABLE DocumentType
(
DocumentTypeId -- primary key
,Name
)
-- One per possible document attribute.
-- Note that multiple document types can reference the same attribute
CREATE TABLE DocumentAttributes
(
AttributeId -- primary key
,Name
)
-- This lists which attributes are used by a given type
CREATE TABLE DocumentTypeAttributes
(
DocumentTypeId
,AttributeId
-- compound primary key on both columns
-- foreign keys on both columns
)
-- This contains the final association of document and attributes
CREATE TABLE DocumentAttributeValues
(
DocumentId
,AttributeId
,Value
-- compound primary key on DocumentId, AttributeId
-- foreign keys on both columns to their respective parent tables
)
A tighter model with more robust keys could be implemented to ensure at the database level that an attribute cannot be assigned to a document with an “inappropriate” type.
Queries have to use joins, but (presumably) only the Document and DocumentAttributeValues tables will ever be large. An index on (AttributeId, Value) facilitates lookups by attribute type, and depending on cardinality an index on (Value, AttributeId) could make searches for specific values quite efficient.
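Those two indexes as a sketch (index names are made up):
-- Lookups by attribute type
CREATE INDEX IX_DocAttrValues_Attr_Value
    ON DocumentAttributeValues (AttributeId, Value);
-- Searches for specific values, if cardinality warrants it
CREATE INDEX IX_DocAttrValues_Value_Attr
    ON DocumentAttributeValues (Value, AttributeId);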
(Edit)
Ooh, clever, I created two tables with the same name. I've renamed the last one to DocumentAttributeValues. (Free advice is clearly worth what you paid for it!)
This shows how ugly these systems can get in SQL, as you have to “look up” both attributes separately. On the plus side you don’t have to worry about “does this type go with this document”, as those rules have (better had) been applied when the data was loaded. Two examples:
This one spells everything out in joins, and as such I think it might perform worse than the next:
-- Top-down
SELECT do.DocumentId
from Document do
inner join DocumentAttributes da1
   on da1.Name = 'Name'
inner join DocumentAttributeValues dav1
   on dav1.AttributeId = da1.AttributeId
  and dav1.DocumentId = do.DocumentId
  and dav1.Value = 'John'
inner join DocumentAttributes da2
   on da2.Name = 'CustomerId'
inner join DocumentAttributeValues dav2
   on dav2.AttributeId = da2.AttributeId
  and dav2.DocumentId = do.DocumentId
  and dav2.Value = '5'
This one picks out the attributes, then finds which documents have all of them. It might perform better, as there’s one less table to process:
-- Bottom-up
SELECT xx.DocumentId
from (-- All documents with name "John"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'Name'
and dav.Value = 'John'
-- This combines the two sets, with "all" keeping any duplicate entries
union all
-- All documents with CustomerId = "5"
select dav.DocumentId
from DocumentAttributes da
inner join DocumentAttributeValues dav
on dav.AttributeId = da.AttributeId
where da.Name = 'CustomerId'
and dav.Value = '5') xx -- Have to give the subquery an alias
group by xx.DocumentId
having count(*) = 2
While further refinements might be possible, the more attributes you're filtering on, the uglier the queries will be. Five attributes max might work OK in SQL, but if you've got tons of attributes, a NoSQL solution might be what you're looking for.
(Please note that, as with my original post, I have not tested this code, so there may be typos or subtle--or not so subtle--errors in here.)
SQL Server 2008+ offers three related features for dealing with such cases:
Sparse Columns which allow you to define hundreds of columns even if only a subset are used at a time
Column Sets allow you to group these columns and treat them as a group
Filtered indexes can index only the rows that actually have values in them.
These features allow you to work with more-or-less normal SQL statements to handle all metadata columns.
These features were specifically added to address the EAV/metadata scenario.
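As a rough sketch of how sparse columns and a filtered index fit together (SQL Server syntax; table and column names invented for illustration):
-- Sparse columns cost no storage in rows where they are NULL
CREATE TABLE DocumentMeta
(
    DocumentId INT PRIMARY KEY,
    CustomerId INT SPARSE NULL,
    City NVARCHAR(100) SPARSE NULL
);
-- A filtered index covers only the rows that actually carry a value
CREATE INDEX IX_DocumentMeta_City
    ON DocumentMeta (City)
    WHERE City IS NOT NULL;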
EDIT
If you have a limited set of attributes that are always filled, there is no need for Sparse Columns or the EAV anti-pattern either.
You can create your tables as you normally would and add indexes to optimize the real workload you encounter. Certain types of queries will occur far more often than others and SQL Server's Index tuning advisor can propose the indexes and statistics to use based on a trace captured using SQL Server's Profiler.
It's quite possible that only a subset of the columns will accelerate searches and the rest can be added as include columns in the index.
Full Text Search
A more powerful option is to use SQL Server's Full Text Search. This will allow you to execute queries using arbitrary attributes. This is another technique used by document/content management systems, ERPs, and CRMs to handle arbitrary attributes.
With FTS you simply specify the columns to include in one FTS index and don't have to create separate indexes for each attribute.
You can use FTS predicates in SELECT queries like this:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
This can result in much simpler queries (you just write a modified SELECT) and simpler administration (no worries about column order in indexes; only one FTS index to manage).
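Setting FTS up is a one-time step per table. A sketch against the same table as above (the catalog name and the unique key index name are assumptions):
CREATE FULLTEXT CATALOG ProductCatalog;
CREATE FULLTEXT INDEX ON Production.Product (Name)
    KEY INDEX PK_Product_ProductID
    ON ProductCatalog;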
I have a system that allows a person to select a form type that they want to fill out from a drop down box. From this, the rest of the fields for that particular form are shown, the user fills them out, and submits the entry.
Form Table:
| form_id | age_enabled | profession_enabled | salary_enabled | name_enabled |
This describes the metadata of a form so the system will know how to draw it. Each _enabled column is a boolean that is true if the form should include a field for that column.
Entry Table:
| entry_id | form_id | age | profession | salary | name | country |
This stores a submitted form, where age, profession, etc. store the actual values filled out in the form (or null if the field didn't exist in that form).
Users can add new forms to the system on the fly.
Now the main question: I would like to add the ability for a user designing a new form to be able to include a list of possible values for an attribute (e.g. profession is a drop down list of say 20 professions instead of just a text box when filling out the form). I can't simply store a global list of possible values for each column because each form will have a different list of values to pick from.
The only solution I can come up with is to include another set of columns in the Form table, like profession_values, and store the values in a character-delimited format. I am concerned that a column may one day have a large number of possible values and will get out of control.
Note that new columns can be added later to Form if necessary (and thus Entry in turn), but 90% of forms have the same base set of columns, so I think this design is better than an EAV design. Thoughts?
I have never seen a relational design for such a system (as a whole) and I can't seem to figure out a decent way to do this.
Create a new table to contain groups of values:
-- "values" and "group" are reserved words, so they must be quoted
CREATE TABLE "values" (
  id SERIAL,
  "group" INT NOT NULL,
  value TEXT NOT NULL,
  label TEXT NOT NULL,
  PRIMARY KEY (id),
  UNIQUE ("group", value)
);
For example:
INSERT INTO "values" ("group", value, label) VALUES (1, 'NY', 'New York');
INSERT INTO "values" ("group", value, label) VALUES (1, 'CA', 'California');
INSERT INTO "values" ("group", value, label) VALUES (1, 'FL', 'Florida');
So, group 1 contains three possible values for your drop-down selector. Then, your form table can reference what group a particular column uses.
Note also that you should add fields to a form via rows, not columns. That is, your app shouldn't be adjusting the schema when you add new forms; it should only create new rows. So make each field its own row:
CREATE TABLE form (
id SERIAL,
name TEXT NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE form_fields (
  id SERIAL,
  form_id INT NOT NULL REFERENCES form(id),
  field_label TEXT NOT NULL,
  field_type TEXT NOT NULL,
  field_select INT, -- the "values"."group" to present for a select field
  PRIMARY KEY (id)
);
INSERT INTO form (name) VALUES ('new form');
$id = last_insert_id()
INSERT INTO form_fields (form_id, field_label, field_type) VALUES ($id, 'age', 'text');
INSERT INTO form_fields (form_id, field_label, field_type) VALUES ($id, 'profession', 'text');
INSERT INTO form_fields (form_id, field_label, field_type) VALUES ($id, 'salary', 'text');
INSERT INTO form_fields (form_id, field_label, field_type, field_select) VALUES ($id, 'state', 'select', 1);
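Rendering a form is then a query over rows rather than columns. A sketch that pulls one form's fields along with any drop-down options (using the tables above, with field_select holding the values group):
SELECT f.field_label, f.field_type, v.value, v.label
FROM form_fields f
LEFT JOIN "values" v ON v."group" = f.field_select
WHERE f.form_id = 1
ORDER BY f.id;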
I think you are starting from the wrong place entirely.
| form_id | age_enabled | profession_enabled | salary_enabled | name_enabled |
Are you just going to keep adding to this table for every single form field you could ever have? Generically, the list could be endless.
How will your application code display a form if all the fields are in columns in this table?
What about a form table like this:
| form_id | form description |
Then another table, formAttributes with one row per entry on the form:
| attribute_id | form_id | position | name | type |
Then a third table, formAttributeValidValues, with one row per valid value of an attribute:
| attribute_id | value_id | value |
This may seem like more work to begin with, but it really isn't. Think about how easy it is to add or remove a new attribute or value on a form. Also think about how your application will render the form:
for form_element in (select name, attribute_id
from formAttributes
where form_id = :bind
order by position asc) loop
render_form_element
if form_element.type = 'list of values' then
render_values with 'select ... from formAttributeValidValues'
end if
end loop;
The dilemma will then become how to store the form results. Ideally you would store them with 1 row per form element, in a table that is something like:
| completed_form_id | form_id | attribute_id | value |
If you only ever work on one form at a time, then this model will work well. If you want to do aggregations over lots of forms, the resulting queries become more difficult; however, that is reporting, which can run in a different process from the online form entry. You can start to think of things like pivot queries to transform the rows into columns, or materialized views to pull together forms of the same type, etc.
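A sketch of reading one completed form back out, assuming the results table above is called formResults:
SELECT fa.name, r.value
FROM formResults r
JOIN formAttributes fa ON fa.attribute_id = r.attribute_id
WHERE r.completed_form_id = :bind
ORDER BY fa.position;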
An application I am developing needs to provide access to data based on a list of cities defined for each client. A client can have:
access to all cities in a country OR
access to all cities in a state / region OR
access to select cities in any state or country.
What would be the best way to define this in the database (if the db has a Country table, State / Region table, City table and a Client table)?
Clarification:
(A simplified view of the tables with only the essential columns pertaining to this question).
Country table -
idCountry | Name
State table -
idState | idCountry | Name
City table -
idCity | idState | Name
Client table -
idClient | Name
You could create a self-related Location table (Id, Name, ParentLocation) and an AccessControl table (ClientId, LocationId). When a client is related to a location, you grant access to all locations below it. Some examples:
ID Name Parent
-------------------
1 World NULL -- Need to represent all countries
2 Brazil 1 -- A country
3 São Paulo 2 -- A state
4 São Paulo 3 -- A city
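With that table, "everything below a granted location" is a walk down ParentLocation. A sketch using a recursive CTE (PostgreSQL/MySQL syntax; SQL Server omits the RECURSIVE keyword; ClientId 42 is just an example):
WITH RECURSIVE reachable AS (
    -- locations granted directly to the client
    SELECT l.Id
    FROM Location l
    JOIN AccessControl ac ON ac.LocationId = l.Id
    WHERE ac.ClientId = 42
    UNION ALL
    -- ...plus everything beneath them
    SELECT child.Id
    FROM Location child
    JOIN reachable r ON child.ParentLocation = r.Id
)
SELECT Id FROM reachable;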
If you want to stick with your current model, maybe use a table like (ClientId, CountryId nullable, StateId nullable, CityId nullable). This way you could define security access as in your definition, but you would need to deal with nullable fields.