SQL Column Search Question (SQL Server 2005)

SQL Column Search Question (SQL Server 2005) - sql-server

I have a column varchar(70) in one of my table where I store space separated tags:
Id Tags
1 Baby Kids Learning Alphabets
2 Kids Baby
3 Comedy Movie Fun
100 Kids Learning Alphabets
500 Kids Baby
I perform search on the column:
Get all ids where we have Baby Kids and Alphabets in the tags
I can do where Tags like '%Baby%' or Tags like '%Kids%' or Tags like '%Alphabets %'
Select query isslow when there are large # of rows. But add\delete\edit is always very fast.
So I added another table Called Tags where I store the tags alphabetically like:
Tag Id
Alphabets 1
Alphabets 100
Baby 1
Baby 2
Baby 500
Comedy 3
Kids 1
This make the searching faster, buy update\delete\insert painful.
Is my design right for future growth?
How should you design this tags column?
Thanks for reading
EDIT:
*All I am trying to pull is the list of related ids. Basically find all ids having the given tags or tag. Like on this question, you see "Related Questions" on the right. That is what I am trying to get.*

You will really have to analyze which aspects of your site will grow and what your goals are. Will there be tons of inserts? Will there be tons of tags? Do you need to answer the question "How many questions have the tag 'Alphabet'?" I hate "it depends" answers, but it really does depend on your goals and expectations.

Here is what I recommend:
Create 2 new tables. One simply stores the name of the tag
Tag Id
Alphabets 1
Baby 2
Comedy 3
Kids 4
Learning 5
Alphabets 6
The second one will link the tag to the entry in the first table
main_id tag_id
1 2
1 4
1 5
1 6
100 4
100 5
100 6
Then you can simply search with a join between the tables. It will be a LOT faster. Be sure to include the appropriate indexes.

Related

Simple database design - some columns have multiple values

Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contains multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and the fields like starring and genre just have multiple comma-separated values stored. I can query it fine by using wildcards and like and in operators, but I'm wondering if there's a more elegant way to break out the data such that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!

You need to create extra tables in order to stick with the normalization. In your situation you need 4 extra tables to represent these n->m relations(2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id
file_name
1
/users/home/song/empire.png
2
/users/home/song/promiscuous.png
genre table
id
name
1
pop
2
blues
3
rock
image_genres table
image_id
genre_id
1
2
1
3
2
1
stars table
id
name
1
Jay-Z
2
Alicia Keys
3
Nelly Furtado
4
Timbaland
image_stars table
image_id
star_id
1
1
1
2
2
3
2
4
For unique actor count in database you can simply run the sql query below
SELECT COUNT(name) FROM stars

Database - Multilanguage online dictionary

I am implementing a database for a multilanguage online dictionary. At the moment I have only 2 languages (RO - Romanian, CS - Czech), but I think maybe in future I will want to add more languages (for example EN - english).
The website will be multilanguage too, so a romanian user can look for czech words and a czech user can look for romanian words.
I'm thinking how could I implement the TRANSLATIONS table for multiple translations. At the moment I have this implementation, but it's not the way to go, because in case I will add another 10 languages, I would have to add another 10 columns:
WORD (id,word,language_id,..)
TRANSLATIONS(id,ro_id,cs_id)
LANGUAGE(id,code)
LANGUAGE
id code
1 ro
2 cs
WORD
id word language_id
1 xxx 1
2 yyy 2
TRANSLATIONS
id ro_id cs_id
1 1 2
I could make a table like:
TRANSLATIONS(id,word_id,translation_id)
but in this case I don't know how to search for translation and i would have to add both cases:
TRANSLATIONS
id word_id translation_id
1 1 2
2 2 1
Some ideas? I hope my description makes sense.
Thanks.
UPDATE: Another approach could be a table with JSON column like:
TRANSLATIONS(id,data)
id data
1 {"ro":1,"cs":2,"another_lang": "another_id"}
but is this a good idea?

I think your database design may be a little off.
Maybe you could have something like :
WORD (id,word,language_id,..)
TRANSLATIONS(id,language_id_from,language_id_to,word_id_from,word_id_to)
LANGUAGE(id,code)
Then you would have records like:
LANGUAGES
id code
1 EN (english)
2 RO (romanian)
WORDS
id word language_id
221 hi 1
4423 Bună 2
TRANSLATIONS
id lang_id_from lang_id_to word_id_from word_id_to
54512 1 2 221 4423
Then, for a given word in a given language, such as "hi" in english, you would look for the translations record that matches the id of the word (1) and target the language id you're looking for, such as Romanian (2).
Something like:
SELECT *
FROM Translations
WHERE word_id_from = 1 AND language_id_to = 2
And it would be scalable, because you can add new languages, words and translations as you want, and your search wouldn't be affected.

relational database design for multidimensional matrix questions

I am designing a relational DB for an online survey.
However, I am not sure what is the best relational database design for storing multidimensional matrix questions.
Let's say, I have the following question (sorry, it does not allow me to insert HTML table):
What was your experience of...
----------| Not friendly| (2) |Very friendly|Length of stay|Visited in the last year?|
Sydney |radio button | rb | rb | drop down | check box |
--------------------------------------------------------------------------------------
New York | rb | rb | rb | drop down | check box |
--------------------------------------------------------------------------------------
London | rb | rb | rb | drop down | check box |
--------------------------------------------------------------------------------------
Do you think I should do something along the following lines or is there a better way?
To hold all the question:
Question
questionID
question
QuestionMatrix2d
matrix2dID
questionID
subquestionID
subquestion
QuestionMatrix
questionID
matrix2dID
question_parentID
And to hold all the responses:
QuestionResponse
questionID
response_code
QuestionMatrix2dResponse
questionID
subquestionID
response_code
Thank you for your help.

I disagree with ryan1234. This totally is a relational problem, and there is very little reason not to put it into a database.
I have to do a bit of guesswork though, in what you're trying to achieve here. You have an online survey, so I assume it will be used by more than one person. Your database will need to acommodate for that by having a session or user table, I'll go with the latter since it is more clear to read.
Secondly, you have a list of locations (Sidney, New York, London). I assume this list can either change over time or even from one questionaire to the next.
Then you have a set of questions. You don't explicitly state that these would be variable or fixed. Since you designed a set of tables for that, I assume it's supposed to be variable. Please note that your questions are not a matrix, but a list. Even if they are hierarchical, they still do not compose a matrix.
Last but not least you've got answers to those questions.
Lets create a users table:
user_id user_name
1 me
2 somebody else
Second table is as simple: locations
location_id location_name
1 Sidney
2 New York
3 London
Third table is a bit more complicated - and to be honest: just plain ugly. But this is what you get if you design a database in a database, and the alternatives (using DDL or storing that information in XML/JSON or even outside the database) are not pretty either. If there is a hierarchical question (your examples don't show them), you could add a "parent_question_id" column.
question_id question_text question_type question_type_info
1 How do you rate RADIO 0 to 5
2 Length of stay COMBOBOX 1 day, 2 days, whatever
3 Visited last year CHECKBOX
Finally you need a fourth table to store all the answers
user_id location_id question_id value
1 1 1 2 <-- value here means "rating of 2"
1 1 2 5 <-- value here means "5 days"
1 1 3 1 <-- value here means "yes, visited last year"
Yep. ugly as well. If you had a fixed list of questions I could provide you with a pretty database :)
Edit: Answering to your comments: To link your questions to a survey, you'll need a few more tables surveys defining which questions for which locations are going to be asked. The following database layout lets you specify a list of locations and a list of questions asked as well as a survey name.
Table surveys:
survey_id survey_name
1 Spring 2013 London Travel Survey
2 Spring 2013 Northern Hemisphere Short Survey
Table survey_questions:
survey_id question_id
1 1
1 2
1 3
2 1
Table survey_locations:
survey_id location_id
1 1
2 1
2 2
The contents I put in here gives you two surveys. Survey #1 will ask all three questions just on one location: 'London'. Survey #2 will just ask one question on both London and New York. If you want to ask different questions on different locations your table layout will have to accommodate for that, but such a system won't fit into your original table-like layout.

Having done things similar to this, I would recommend considering not turning this into a relational problem. What if you have objects and just serialize them to something like JSON and store that?
Doing this relationally you'll end up spending quite a bit of time making tables and wiring together complex drawing code in your application to make sure the questions/answers draw in the right order etc.
Otherwise I think you can make your approach work. There is no silver bullet for designing survey stuff in an RDBMS.

Explaining row and column dependencies

This is a simple and common scenario at work, and I'd appreciate some input.
Say I am generating a report for the owners of a pet show, and they want to know which of their customers have bought how many of each pet. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.
As the shop owner, I might expect reports in the form:
Customer Dog Cat Rabbit
1 2 3 0
2 0 1 1
3 1 2 0
4 0 0 1
And if one day I decided to stock Goldfish then the report should now come out as.
Customer Dog Cat Rabbit Goldfish
1 2 3 0 0
2 0 1 1 0
3 1 2 0 0
4 0 0 1 0
5 0 0 0 1
But as you probably know, to have a query which works this way would involve some form of dynamic code generation and would be harder to do.
The simplest query would work along the lines of:
Cross join Customers and Pets, Outer join Sales, Group, etc.
and generate:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
...etc
a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.
b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?

The name of the concept is 'cross-tab' and can be accomplished in several ways.
MS Access has proprietary extensions to SQL to make this happen. SQL pre-2k5 has a CASE trick and 2k5 and later has PIVOT, but I think you still need to know what the columns will be.

Some databases indeed support some way of creating cross tables, but I think most need to know
the columns in advance, so you'd have to modify the SQL (and get a database that supports such an extension).
Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.
And about a way to explain the problem... you could show them in an Excel how many steps are needed to get the desired result:
Source data (your second listing).
Select values from the pets column
Place each pet type found on a new column
Count values per each type per client
Fill the values
and then say that SQL gives you only the source data, so it's of course more work.

This concept is called pivoting
SQL assumes that your data is represented in terms of relations with fixed structure.
Like, equality is a binary relation, "customer has this many pets of this type" is a ternary relation and so on.
When you see this resultset:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
, it's actually a relation defined by all possible combinations of domain values being in this relation.
Like, a customer 1 (domain customers id's) has exactly 2 (domain positive numbers) pets of genus dog (domain pets).
We don't see rows like these in the resultset:
Customer Pet Quantity
1 Dog 3
Pete Wife 0.67
, because the first row is false (customer 1 doesn't have 3 items of dog, but 2), and the second row values are out of their domain scopes.
SQL paradigma implies that your relations are defined when you issue a query and each row returned defines the relation completely.
SQL Server 2005+ can map rows into columns (that is what you want), but you should know the number of columns when designing the query (not running).
As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.

I have always called this pivoting, but that may not be the formal name.
Whatever it's called you can do almost all of this in plain SQL.
SELECT customer, count(*), sum(CASE WHEN pet='dog' THEN 1 ELSE 0 END) as dog, sum(case WHEN pet='cat' THEN 1 ELSE 0 END) as cast FROM customers join pets
Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for a list of pets. The query is built into a string then that string is used to create a prepared statement.

How do I create nested categories in a Database?

I am making a videos website where categories will be nested:
e.g. Programming-> C Language - > MIT Videos -> Video 1
Programming -> C Language -> Stanford Video - > Video 1
Programming -> Python -> Video 1
These categories and sub-categories will be created by users on the fly. I will need to show them as people create them in the form of a navigable menu, so that people can browse the collection easily.
Could someone please help me with how I can go about creating such a database?

Make a categories table with the following fields:
CategoryID - Integer
CategoryName - String/Varchar/Whatever
ParentID - Integer
Your ParentID will then reference back to the CategoryID of its parent.
Example:
CategoryID CategoryName ParentID
---------------------------------
1 Dog NULL
2 Cat NULL
3 Poodle 1
4 Dachsund 1
5 Persian 2
6 Toy Poodle 3

Quassnoi said :
You should use either nested sets or parent-child models.
I used to implement both of them. What I could say is:
Use the nested set architecture if your categories table doesn't change often, because on a select clause it's fast and with only one request you can get the whole branch of the hierarchy for a given entry. But on a insert or update clause it takes more time than a parent child model to update the left and right (or lower and upper in the example below) fields.
Another point, quite trivial I must admit, but:
It's very difficult to change the hierarchy by hand directly in the database (It could happen during the development). So, be sure to implement first an interface to play with the nested set (changing parent node, move a branch node, deleting a node or the whole branch etc.)
Here are two articles on the subject:
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
Last thing, I didn't try it, but I read somewhere that you can have more than one tree in a nested set table, I mean several roots.

You should use either nested sets or parent-child models.
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
See this answer for more detail.
Nested sets is more easy to query, but it's harder to update and harder to build a tree structure.

From the example in your question it looks like you'd want it to be possible for a given category to have multiple parents (e.g., "MIT Videos -> Video 1 Programming" as well as "Video -> Video 1 Programming"), in which case simply adding a ParentID column would not be sufficient.
I would recommend creating two tables: a simple Categories table with CategoryID and CategoryName columns, and a separate CategoryRelationships table with ParentCategoryID and ChildCategoryID columns. This way you can specify as many parent-child relationships as you want for any particular category. It would even be possible using this model to have a dual relationship where two categories are each other's parent and child simultaneously. (Off the top of my head, I can't think of a great use for this scenario, but at least it illustrates how flexible the model is.)

What you need is a basic parent-child relationship:
Category (ID: int, ParentID: nullable int, Name: nvarchar(1000))

A better way to store the parent_id of the table is to have it nested within the ID
e.g
100000 Programming
110000 C Language
111000 Video 1 Programming
111100 C Language
111110 Stanford Video
etc..so all you need it a script to process the ID such that the first digit represents the top level category and so on as you go deeper down the hierarchy

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight