Is there any DB server which could support the following operations?

I need to store a list of strings as a field alongside an ID: listId, <list>.
I need the following operations in O(1) time:
Removing a given string from an existing listId.
Adding a new string to an existing listId.
Is there any DB which could support these operations? Having HashSet as one of its datatypes would help. Note that I need a highly scalable solution, where a list could have 10 million keys, in 1000+ listIds.
I understand that such a datatype, if it exists in any database, would have considerable indexing overhead, and I believe the chances of something similar existing are slim. If there is nothing, I will implement something myself.

What you describe sounds like a textbook case for normalization.
You'd have two tables: one that contains the lists, and another that contains the list elements.
They are linked through the list ID.
Lists table:
id | name | (+ whatever else you need)
List elements table:
id | listId (references an id in the lists table) | (+ whatever else you need)
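The two-table layout can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a benchmarked design: the table and column names, the `value` column, and the helper functions are assumptions made for the example, and SQLite only stands in for whatever server is chosen. The composite primary key gives set-like behavior, but lookups go through a B-tree index, so add and remove are O(log n) rather than strictly O(1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE lists (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Assumed layout: one row per (list, string); the key makes each string unique per list.
    CREATE TABLE list_elements (
        list_id INTEGER NOT NULL REFERENCES lists(id),
        value   TEXT    NOT NULL,
        PRIMARY KEY (list_id, value)
    );
""")


def add_string(list_id, value):
    # INSERT OR IGNORE makes adding idempotent, similar to HashSet.add.
    conn.execute(
        "INSERT OR IGNORE INTO list_elements (list_id, value) VALUES (?, ?)",
        (list_id, value),
    )


def remove_string(list_id, value):
    # The primary-key index locates the row without scanning the list.
    conn.execute(
        "DELETE FROM list_elements WHERE list_id = ? AND value = ?",
        (list_id, value),
    )


def members(list_id):
    rows = conn.execute(
        "SELECT value FROM list_elements WHERE list_id = ? ORDER BY value",
        (list_id,),
    )
    return [row[0] for row in rows]


conn.execute("INSERT INTO lists (id, name) VALUES (1, 'example')")
add_string(1, "alpha")
add_string(1, "beta")
add_string(1, "alpha")   # duplicate, silently ignored
remove_string(1, "beta")
print(members(1))
```

Neither operation reads or rewrites the rest of the list, which is the property the question is after.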

Related

SQL DB schema best practice for List item table

I have a table, say Table1, which has the following columns:
1. Id
2. Name
3. TransportModeId
4. ParkingId
5. ActivityId
Columns 3, 4 and 5 are foreign keys, and all three reference simple list tables which have the following columns:
1. Id
2. Item
For simplicity I have shown 3 tables; my actual schema contains almost 25 list tables.
What would be the best practice?
Option 1.
Keep all list tables separate, which will create 25 tables, but on the other hand I will have a clean, modular schema.
Option 2.
Make one table with a self join and put all the items in it. A row with a null ParentId represents the name of a list, and the table can be referenced from more than one other table, as described above. It would have to be kept in some kind of common module.
thanks
Option 1 is how it is normally done when designing a system that is not supposed to be very configurable by the end user or implementer. It has several important advantages; two of them:
When you need to add an extra attribute to any of the enumerations (e.g. parking location to the Parking enumeration), it is quite simple and does not produce extra problems.
It is optimized for speed, using the relational database engine's native algorithms for linking records.
As for Option 2:
It is something called generalization. You take several types with similar attributes (methods) and create one class/table with a structure that fits the different purposes.
The self reference you describe is not a good idea for Option 2; rather, reference a separate EnumerationType table containing the type names (Parking, Activity, etc.) with an id.
This approach could make sense if you need to let the end user configure the attributes himself within your app. Otherwise it could cause you problems when you find out that different enumeration tables need different structures.
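A rough sketch of the generalized variant (Option 2 with a type table instead of a self reference), using Python's sqlite3 module. The table names `enumeration_type` and `enumeration_item`, the sample rows, and the `items_of` helper are all hypothetical, chosen only to illustrate the shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per enumeration (Parking, Activity, ...).
    CREATE TABLE enumeration_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- All items of all enumerations share one table.
    CREATE TABLE enumeration_item (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES enumeration_type(id),
        item    TEXT    NOT NULL
    );
    INSERT INTO enumeration_type (id, name) VALUES (1, 'Parking'), (2, 'Activity');
    INSERT INTO enumeration_item (type_id, item) VALUES
        (1, 'Street'), (1, 'Garage'), (2, 'Hiking');
""")


def items_of(type_name):
    # Fetch the items of one enumeration by its type name.
    rows = conn.execute(
        """
        SELECT i.item
        FROM enumeration_item AS i
        JOIN enumeration_type AS t ON t.id = i.type_id
        WHERE t.name = ?
        ORDER BY i.item
        """,
        (type_name,),
    )
    return [row[0] for row in rows]


parking_items = items_of("Parking")
print(parking_items)
```

The trade-off described above shows up here: a foreign key from Table1 to `enumeration_item` cannot by itself guarantee that ParkingId points at a Parking item, and an attribute needed by only one enumeration would have to be added to the shared table.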

Finding a suitable data structure for deletion from both lists

This might be deleted, since it involves idea sharing, which is not quite allowed on Stack Overflow, but before that, if I could get any ideas from solid programmers, it would be a win for me.
Assume that you have a class Student, stored in the database, and this class has a list property called favoriteTeachers. This list constantly gets updated by the system and involves the id of teachers.
You also have a class Teacher, also stored in database and likewise has a list property favouriteStudents. It is again updated constantly and involves the id's of students.
In our system, when a student calls a function (let's say notMyFavoriteTeacher), the system has to apply the changes below:
Delete the given teacher's id from favouriteTeacher list
Delete the student's id from given teacher's favouriteStudent list
I was concerned that the number of rows updated could exhaust the database, so instead of mapping students to their favorite teachers in a separate table as (user_id, teacher_id), I created a column and stored a string containing the teachers' ids separated by commas (e.g. "1,2,14,4,25"). The same applies to the teacher as well.
However, when we call this function we face another problem. To perform the operation you need to convert the string to a list, find the element by linear search, delete it, convert the list back to a string, and push it back to the db. And you have to do the same for the teacher class as well. Without the string method deletion would be easier, but since we would be handling deletion and addition operations about 2k times a day, I did not think it would be feasible to use separate tables.
I wanted to ask: in order to decrease the number of operations, could a data structure be chosen that would increase the efficiency?
Storing a relation as an array in a single column is a violation of first normal form, and should not be done without good reason. Although various forms of denormalization may result in increased efficiency in some cases, I don't see this case being one of those. What's worse, you'll get no help from the database in enforcing referential integrity. And some operations will result in guaranteed row scans: when deleting a teacher, you will have to examine every row of every student to remove the teacher from each student's favorite list. The same goes for deleting a student.
Relational databases are designed and built to link rows to other rows. You need a very good reason to keep them from doing what they're designed to do. You should go ahead and design a proper relational schema, and only if actual measurement shows that it is too slow should you worry about its performance.
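As a minimal sketch of such a relational schema, using Python's sqlite3 module: the table name `favorite`, the index name, and the helper functions are assumptions for illustration. One row in the junction table represents the relationship from both sides, so notMyFavoriteTeacher becomes a single indexed DELETE, and deleting a teacher cleans up via ON DELETE CASCADE instead of a scan over every student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY);
    CREATE TABLE teacher (id INTEGER PRIMARY KEY);
    CREATE TABLE favorite (
        student_id INTEGER NOT NULL REFERENCES student(id) ON DELETE CASCADE,
        teacher_id INTEGER NOT NULL REFERENCES teacher(id) ON DELETE CASCADE,
        PRIMARY KEY (student_id, teacher_id)
    );
    -- Second index so lookups by teacher are as cheap as lookups by student.
    CREATE INDEX favorite_by_teacher ON favorite (teacher_id);

    INSERT INTO student (id) VALUES (1), (2);
    INSERT INTO teacher (id) VALUES (10), (11);
    INSERT INTO favorite (student_id, teacher_id) VALUES (1, 10), (1, 11), (2, 10);
""")


def not_my_favorite_teacher(student_id, teacher_id):
    # One row removed; both "lists" are updated at once.
    conn.execute(
        "DELETE FROM favorite WHERE student_id = ? AND teacher_id = ?",
        (student_id, teacher_id),
    )


def favorite_teachers(student_id):
    rows = conn.execute(
        "SELECT teacher_id FROM favorite WHERE student_id = ? ORDER BY teacher_id",
        (student_id,),
    )
    return [row[0] for row in rows]


def favorite_students(teacher_id):
    rows = conn.execute(
        "SELECT student_id FROM favorite WHERE teacher_id = ? ORDER BY student_id",
        (teacher_id,),
    )
    return [row[0] for row in rows]


not_my_favorite_teacher(1, 10)
conn.execute("DELETE FROM teacher WHERE id = 11")  # cascades into favorite
print(favorite_teachers(1), favorite_students(10))
```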
First of all, I don't understand your choice of storing ids of favorite teachers/students as comma separated strings, because either in the case of comma separated values or in case of a table with studentId, teacherId structure, you do exactly 2 row updates/deletes (first in the favoriteTeachers table, second in the favoriteStudent table).
But one way of optimizing performance given your current data structure would be to keep the comma separated strings sorted. I mean, from the very creation of the rows, keep your comma separated ids like "1, 5, 7, 15". This way, if you convert the string to a list, you can perform a binary search, which takes O(log n) time instead of O(n).
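The sorted-string idea might look like the following sketch, which uses Python's standard bisect module. The function names are made up for the example. Note that splitting and re-joining the string still touch every id, so only the search step drops to O(log n); the overall operation remains O(n).

```python
from bisect import bisect_left


def parse_ids(csv_ids):
    # "1, 5, 7, 15" -> [1, 5, 7, 15]; assumes the stored string is already sorted.
    return [int(part) for part in csv_ids.split(",") if part.strip()]


def remove_id(csv_ids, target):
    ids = parse_ids(csv_ids)
    pos = bisect_left(ids, target)            # binary search for the id
    if pos < len(ids) and ids[pos] == target:
        del ids[pos]
    return ",".join(str(i) for i in ids)


def add_id(csv_ids, new_id):
    ids = parse_ids(csv_ids)
    pos = bisect_left(ids, new_id)
    if pos == len(ids) or ids[pos] != new_id:
        ids.insert(pos, new_id)               # insert in place, keeping the order
    return ",".join(str(i) for i in ids)


print(remove_id("1, 5, 7, 15", 7))
print(add_id("1,5,15", 7))
```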
You are losing all the benefits provided by any RDBMS by storing it as a list of strings. Create a separate table with Student_id and favorite teacher_id. Apply filtering conditions (either for student or for teacher) before joining it to main tables.

database design for dictionary application

Currently I'd like to develop a dictionary application for mobile devices. The dictionary uses an offline file/database to translate words. It translates between just two languages, for example an English - Spanish dictionary.
I have a simple design in mind: two tables, an English table and a Spanish table.
Each table contains:
word_id = the id which would be a foreign key for other table
word = the word
word_description
correspond_trans_id = the id in the other table of this word's translation into the other language.
Also, because this is for a mobile application, the database uses SQLite.
The definition data for each table has been provided ordered by the field 'word'. However, I'm still thinking about what happens when definitions are added. Because the table is ordered by 'word', is there any method to insert a new record so that it stays in order by word? Or any idea to make it more efficient?
For each word there are usually a few possible translations, depending on the context. If you want a bidirectional dictionary for two languages, you need at least three tables:
ENGLISH
ID | WORD
1 | 'dictionary'
GERMAN
ID | WORD
1 | 'lexikon'
2 | 'wörterbuch'
TRANSLATION_EN_DE
ID_EN | ID_DE
1 | 1
1 | 2
The first two tables contain all the words known in each language, and the bidirectional mapping is done by the third, mapping table. This is a common n:n mapping case.
With two more tables you are always able to add a new language to your dictionary. If you do it with one table you will have multiple definitions for a single word, and thus a db that is not normalized.
You can also merge your language tables into a single table, defining each word's language in another column (referencing a language table). In that case you'll need a two-column index on the language and the word itself.
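The three-table layout from this answer can be tried out with Python's sqlite3 module (SQLite being the engine the question mentions). The sketch below reuses the sample rows from the answer; the lowercase table names, the constraints, and the two lookup functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE english (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    CREATE TABLE german  (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
    -- n:n mapping: one row per (English word, German word) pair.
    CREATE TABLE translation_en_de (
        id_en INTEGER NOT NULL REFERENCES english(id),
        id_de INTEGER NOT NULL REFERENCES german(id),
        PRIMARY KEY (id_en, id_de)
    );
    INSERT INTO english (id, word) VALUES (1, 'dictionary');
    INSERT INTO german  (id, word) VALUES (1, 'lexikon'), (2, 'wörterbuch');
    INSERT INTO translation_en_de (id_en, id_de) VALUES (1, 1), (1, 2);
""")


def english_to_german(word):
    rows = conn.execute(
        """
        SELECT g.word
        FROM english AS e
        JOIN translation_en_de AS t ON t.id_en = e.id
        JOIN german AS g ON g.id = t.id_de
        WHERE e.word = ?
        ORDER BY g.word
        """,
        (word,),
    )
    return [row[0] for row in rows]


def german_to_english(word):
    # Same mapping table, walked in the other direction.
    rows = conn.execute(
        """
        SELECT e.word
        FROM german AS g
        JOIN translation_en_de AS t ON t.id_de = g.id
        JOIN english AS e ON e.id = t.id_en
        WHERE g.word = ?
        ORDER BY e.word
        """,
        (word,),
    )
    return [row[0] for row in rows]


print(english_to_german("dictionary"))
print(german_to_english("lexikon"))
```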
What do you intend to do when a word in language 1 can be translated by more than one word in language 2? I think you have to use something like wursT's design to handle that.
RE inserting records in alphabetical order: You do not normally worry about the physical ordering of records in a database. You use an ORDER BY clause to retrieve them in any desired order, and an index to make it efficient. There is nothing in the SQL standard to control physical ordering. Umm, I recall coming across something about forcing a physical ordering on some database I worked with, I think it was MySQL, but most will not give you any control of this. I haven't worked with SQLite so I can't say if it provides a way.
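A small sqlite3 sketch of the ORDER BY plus index approach. The table mirrors the question's `word_id` and `word` columns; the index name and sample words are invented for the example. New rows are inserted in whatever order they arrive and still come back sorted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE english (
        word_id INTEGER PRIMARY KEY,
        word    TEXT NOT NULL
    );
    -- The index lets the engine return rows in word order without a separate sort.
    CREATE INDEX english_word_idx ON english (word);
""")

# Initial data, deliberately not in alphabetical order.
for w in ["pear", "apple", "mango"]:
    conn.execute("INSERT INTO english (word) VALUES (?)", (w,))

# A later addition: no need to find the "right place" for it.
conn.execute("INSERT INTO english (word) VALUES (?)", ("banana",))

words = [row[0] for row in conn.execute("SELECT word FROM english ORDER BY word")]
print(words)
```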
Surely the relationship between words and their possible translations is one-to-many or many-to-many. I'm not clear how you will represent this in your model. Seems like you may need at least one more table.
I agree with Matt. To make life much easier I would stick with one table. Also, if you plan to use CoreData, the index modelling of traditional database design is different from the object-graph-based model you work with in Objective-C/iOS.
It's very easy to think along the traditional lines of Select querying and inner / outer joins but for example your column 'correspond_trans_id' would normally be handled by setting a 'relationship' when defining your data model for the two tables (if you are using CoreData of course).
In essence unless there is a good reason to have two tables I would stick with just one.
In relation to the ordering, you might not need to keep the words in order in the dataset. I'm guessing you want to keep everything alphabetical, which would involve some work if the data were ever to change, even for just one table.
Again using CoreData, NSFetchRequest and NSSortDescriptor, it is very easy to return a set of records ordered by a specified column, freeing you from having to worry about amends and additions to your database.
If you have any questions give me a shout.

Should the descriptive tags associated with an entity be stored in a separate database table?

I have a Questions model, and just like StackOverflow, each question can be tagged with multiple descriptive tags by a user.
What I'm trying to decide is whether it's necessary for the Tags associated with a question to be stored in a separate table in the database.
Or could I store the Tags as a single field of the Questions table as a list of space-separated strings?
I'm not sure which makes more sense - is there any good reason to separate the data?
Using a comma-separated string for a multi-valued attribute is another SQL Antipattern. :-)
How long does the string need to be? Stated another way: how many tags can a given entry have? (It depends on how long the individual tags are.)
How do you account for strings that contain the separator character? What if a character you currently use as a separator becomes a legitimate character in a tag?
How do you insert or delete elements from the list in SQL? (You have to fetch the whole list into the application, explode the list, filter through it, and re-post it to the database.)
How can you do aggregates like COUNT(*) in SQL?
How do you search efficiently for all entries that share a given tag? (You have to use costly pattern-matching queries.)
The solution is to use a separate table, as most other folks on this thread are advising.
Separating tags into their own table, plus a further table with a many:many relationship between Tags and Questions, is what's known in relational land as "normal form". It makes it easier and faster to perform tasks such as getting all questions tagged with a certain tag, finding the most popular tags, etc.
(Just in case you don't know: a "many:many relationship" is a table with just two columns [a foreign key into Tags and one into Questions], neither of which is unique on its own; the pair is commonly made the primary key so the same tag cannot be attached to a question twice.)
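To make the two tasks mentioned above concrete, here is a hedged sqlite3 sketch. The table names, sample data, and helper functions are invented for the example, and the composite primary key on the linking table is an assumption added here so that a question cannot carry the same tag twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags      (id INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
    CREATE TABLE question_tags (
        question_id INTEGER NOT NULL REFERENCES questions(id),
        tag_id      INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (question_id, tag_id)
    );
    INSERT INTO questions (id, title) VALUES
        (1, 'How do I index a table?'),
        (2, 'What is a join?'),
        (3, 'How do I parse JSON?');
    INSERT INTO tags (id, name) VALUES (1, 'sql'), (2, 'indexing'), (3, 'json');
    INSERT INTO question_tags (question_id, tag_id) VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")


def questions_tagged(tag_name):
    # All questions sharing a tag: an indexed join, no pattern matching.
    rows = conn.execute(
        """
        SELECT qt.question_id
        FROM question_tags AS qt
        JOIN tags AS t ON t.id = qt.tag_id
        WHERE t.name = ?
        ORDER BY qt.question_id
        """,
        (tag_name,),
    )
    return [row[0] for row in rows]


def tag_counts():
    # Most popular tags first; ties broken alphabetically.
    return conn.execute(
        """
        SELECT t.name, COUNT(*) AS uses
        FROM question_tags AS qt
        JOIN tags AS t ON t.id = qt.tag_id
        GROUP BY t.name
        ORDER BY uses DESC, t.name
        """
    ).fetchall()


print(questions_tagged("sql"))
print(tag_counts())
```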
I would put the questions in one table, the tags in one table, and have a separate table to connect the tags to questions. This would be the best way to build that database. It keeps all tags consistent and greatly reduces redundancy.
By separating the data like this, you can ensure that searching for a specific tag will bring back the same items. You don't have to worry about whether the tag is spelled the same throughout all the questions. Also, you can limit the tag options more easily this way.
You should definitely store the tags in a separate table, it makes everything easier, and that's the whole idea of a 'relational' database.

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList (
    ListName varchar(15),
    Value    varchar(15),
    Display  varchar(15),
    PRIMARY KEY (ListName, Display)
);
You could also add a sortOrder field if you want to manually define the order to display them in.
It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In case of localized display, so you need a different formatting for each culture, then you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList | IdPick | Text
1 | 1 | Apples
1 | 2 | Oranges
1 | 3 | Pears
2 | 1 | Dogs
2 | 2 | Cats
and optionally..
PickList
Id | Description
1 | Fruit
2 | Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field. Do you now split this data into an additional table, or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity, and it makes it easier to interpret the data in the database when you're not using the application.
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
Create one table for lists and one table for list_options.
-- Put in the name of the list
insert into lists (id, name) values (1, 'Country in North America');
-- Put in the values of the list
insert into list_options (id, list_id, value_text) values
    (1, 1, 'Canada'),
    (2, 1, 'United States of America'),
    (3, 1, 'Mexico');
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
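The display_order trick from this answer, as a small sqlite3 sketch. The `country` table, its default of 9, and the sample rows are assumptions that follow the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (
        country_id    INTEGER PRIMARY KEY,
        name          TEXT    NOT NULL,
        display_order INTEGER NOT NULL DEFAULT 9   -- default sorts after the pinned rows
    );
    -- Pinned entries get explicit low values.
    INSERT INTO country (name, display_order) VALUES ('United States', 1), ('Canada', 2);
    -- Everything else takes the default and falls back to alphabetical order.
    INSERT INTO country (name) VALUES ('Mexico'), ('Brazil'), ('Argentina');
""")

ordered = [
    row[0]
    for row in conn.execute("SELECT name FROM country ORDER BY display_order, name")
]
print(ordered)
```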
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_ option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID) which you join to the list_ option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value. (BTW, whenever I try to post the words "list" and "option" connected with an underscore, the preview window goes a little wacky. That's why I put a space there.)
The query would look something like:
select
    b.option_name,
    b.option_value
from
    list a,
    list_option b
where
    a.name = 'States'
    and a.list_id = b.list_id
order by
    b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
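For reference, the LIST / LIST_OPTION lookup from this answer can be run end to end with Python's sqlite3 module. This is a sketch under assumptions: the column types, the index name, and the sample rows are invented, and the query is written with an explicit JOIN, which is equivalent to the comma-style join in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (
        list_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        descr   TEXT
    );
    CREATE TABLE list_option (
        list_option_id INTEGER PRIMARY KEY,
        list_id        INTEGER NOT NULL REFERENCES list(list_id),
        option_name    TEXT NOT NULL,
        option_value   TEXT NOT NULL,
        manual_sort    INTEGER
    );
    -- Index on list.name, since the lookup filters on it.
    CREATE INDEX list_name_idx ON list (name);

    INSERT INTO list (list_id, name, descr) VALUES (1, 'States', 'US states');
    INSERT INTO list_option (list_id, option_name, option_value, manual_sort) VALUES
        (1, 'Texas',  'TX', 2),
        (1, 'Alaska', 'AK', 1);
""")


def options_for(list_name):
    # Parameterized so the list name is never spliced into the SQL text.
    return conn.execute(
        """
        SELECT b.option_name, b.option_value
        FROM list AS a
        JOIN list_option AS b ON b.list_id = a.list_id
        WHERE a.name = ?
        ORDER BY b.manual_sort ASC
        """,
        (list_name,),
    ).fetchall()


print(options_for("States"))
```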
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or, if it changes often, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id in value and populate a single table with :
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the first option, as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model but you really want to persist it in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world) or depending on the data you could store it manually in a table each - there are situations where it would make good sense to put it all in the same table but do consider this only if you feel it makes really good sense. Normally putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist (it needs an additional attribute, for instance) you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.
This has served us well:
SQL> desc aux_values;
 Name            Type
 --------------- ------------
 VARIABLE_ID     VARCHAR2(20)
 VALUE_SEQ       NUMBER
 DESCRIPTION     VARCHAR2(80)
 INTEGER_VALUE   NUMBER
 CHAR_VALUE      VARCHAR2(40)
 FLOAT_VALUE     FLOAT(126)
 ACTIVE_FLAG     VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.
