Database design implementing support for different languages - database

Today I was practicing database design by trying to design a database to store my favorite shows. I have normalised it to third normal form (at least I think I have). But I have run into a problem with this design: how can I perform a query like this:
Is there any show that has language (let's say Italian) and status Emitting and there are more than 5 episodes available?
I think I have made a mistake with the language and show_international_data tables, but I am not sure...
Also if you could tell me how bad this design is and what to improve, or just throw an awesome article, that would be awesome!!
Here is the image:

You are correct. There is a problem with the show_international_data but that's not the only issue with this design.
I would remove the show_ID column from show_languages and add it to show_international_data instead. This reordering of tables will allow you to write a query to answer your question.
select s.show_ID
from show s
join show_international_data sid
  on s.show_ID = sid.show_ID
where s.episodes_available > 5
  and sid.status = 'Emitting'
  and sid.language_id = (select language_ID
                         from show_languages
                         where language = 'Italian')
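To try the restructured tables end to end, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the composite key on show_international_data and the sample rows are assumptions made for illustration. The original diagram is not reproduced here, so only the columns used by the query are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE show_languages (
    language_ID INTEGER PRIMARY KEY,
    language    TEXT NOT NULL UNIQUE
);
CREATE TABLE show (
    show_ID            INTEGER PRIMARY KEY,
    episodes_available INTEGER NOT NULL
);
-- show_ID now lives here, one row per (show, language) pair
CREATE TABLE show_international_data (
    show_ID     INTEGER NOT NULL REFERENCES show(show_ID),
    language_ID INTEGER NOT NULL REFERENCES show_languages(language_ID),
    status      TEXT NOT NULL,
    PRIMARY KEY (show_ID, language_ID)
);
-- made-up sample data
INSERT INTO show_languages VALUES (1, 'Italian'), (2, 'English');
INSERT INTO show VALUES (10, 12), (11, 3), (12, 8);
INSERT INTO show_international_data VALUES
    (10, 1, 'Emitting'),
    (11, 1, 'Emitting'),
    (12, 2, 'Emitting'),
    (12, 1, 'Finished');
""")

def shows_matching(conn, language, status, min_episodes):
    """Return IDs of shows in `language` with `status` and more than `min_episodes` episodes."""
    rows = conn.execute("""
        SELECT s.show_ID
        FROM show AS s
        JOIN show_international_data AS sid ON sid.show_ID = s.show_ID
        JOIN show_languages AS l ON l.language_ID = sid.language_ID
        WHERE s.episodes_available > ?
          AND sid.status = ?
          AND l.language = ?
        ORDER BY s.show_ID
    """, (min_episodes, status, language)).fetchall()
    return [row[0] for row in rows]

italian_shows = shows_matching(conn, "Italian", "Emitting", 5)
print(italian_shows)
```

Here the language lookup is written as a second join rather than a subquery. Both forms should return the same rows as long as language names are unique.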
Beyond this specific question however you should perhaps consider the following:
Do genre, theme and summary really vary depending on language?
...and do rating & cover_img NOT vary with language?
show to show_producer should probably be a many to many relation.
genre to show & theme to show could also potentially be many to many.
If I was developing this I would also create tables for status, genre & theme and store the IDs in show_international_data (or whatever table they end up in) rather than storing them as varchar


How to store static data that will be localized?

I am developing a health care system. When a doctor starts to type a diagnosis, I want him to be able to select it from a displayed list instead of typing it out.
The list contains diseases or symptoms, which will then be inserted into a diagnosis table in the database.
I did that for two reasons:
I want all doctors to use the same list of symptoms when writing their diagnoses, so the data can be worked on later, instead of each one typing in his own way.
The data will be localized and translated into different languages when displayed in different regions.
My problem is this: should I put all of these in a lookup table in a database or in a config file? There are 3000 rows in 7 languages (each language will have its own column), and I may add or remove data at any time.
I would put them in a database. I find it easier to maintain, and faster to query than a config file.
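As a rough illustration of the database option, here is a minimal sketch using Python's built-in sqlite3 module. Note that it swaps the one-column-per-language layout from the question for one row per (term, language), which is an assumption on my part and not something the question asked for. The table names, codes and sample terms are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnosis_term (
    term_id INTEGER PRIMARY KEY,
    code    TEXT NOT NULL UNIQUE      -- language-neutral key, made up here
);
CREATE TABLE diagnosis_term_label (
    term_id INTEGER NOT NULL REFERENCES diagnosis_term(term_id),
    lang    TEXT NOT NULL,
    label   TEXT NOT NULL,
    PRIMARY KEY (term_id, lang)
);
INSERT INTO diagnosis_term VALUES (1, 'DIABETES'), (2, 'DIARRHEA'), (3, 'FEVER');
INSERT INTO diagnosis_term_label VALUES
    (1, 'en', 'Diabetes'), (1, 'fr', 'Diabète'),
    (2, 'en', 'Diarrhea'), (2, 'fr', 'Diarrhée'),
    (3, 'en', 'Fever'),    (3, 'fr', 'Fièvre');
""")

def suggest(conn, typed, lang, limit=10):
    """Return (term_id, label) pairs whose label starts with what was typed.

    The typed text is used as a LIKE prefix, so '%' and '_' in it act as
    wildcards; a real form would probably escape them.
    """
    return conn.execute("""
        SELECT term_id, label
        FROM diagnosis_term_label
        WHERE lang = ? AND label LIKE ? || '%'
        ORDER BY label
        LIMIT ?
    """, (lang, typed, limit)).fetchall()

print(suggest(conn, "dia", "en"))
```

The diagnosis table would then store only term_id, so every doctor records the same value regardless of the language in which the list was displayed.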

Designing a database for an e-commerce store

Hi I am trying to design a database for an e-commerce website but I can't seem to find a way to do this right, this is what I have so far:
The problem appears at the products. I have 66 types of products, most of them having different fields. I have two ideas, but neither of them seems very practical:
At first I thought I would make a table for each product type, but that would result in 66 tables, which is not very easy to maintain. I had already started doing that and created the Product_Notebook and Product_NotebookBag tables. Then I stopped, thought about it a bit, and decided this solution is not very good.
After thinking about it a bit more I came up with option B which is storing the data into a separate field called description. For example:
"Color : Red & Compatibility : 15.6 & CPU : Intel"
In this approach I could take the string and manipulate it after retrieving it from the database.
I know this approach is also not a very good idea, that's why I am asking for a more practical approach.
See my answer to this question here on Stack Overflow. For your situation I recommend using Entity Attribute Value (EAV).
As I explain in the linked answer, EAV is to be avoided almost all of the time for many good reasons. However, tracking product attributes for an online catalog is one application where the problems with EAV are minimal and the benefits are extensive.
Simply create a ProductProperties table and put all the possible fields there. (You can actually just add more fields to your Products table)
Then, when you list your products, just use the fields you need.
Surely, there are many fields in common as well.
By the way, if you're thinking of storing the data in an array (option B?), you'll regret it later. You won't be able to easily sort your table that way.
Also, that option will make it hard to find a particular item by a specific characteristic.
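To make the Entity Attribute Value suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (product, product_attribute) and the sample rows are assumptions for illustration, not the schema from the question. Fields common to all products stay as ordinary columns, and only the type-specific ones become attribute rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL,
    name         TEXT NOT NULL,
    price        REAL NOT NULL
);
-- one row per (product, attribute) instead of one table per product type
CREATE TABLE product_attribute (
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    attribute  TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute)
);
-- made-up sample data
INSERT INTO product VALUES
    (1, 'NotebookBag', 'Bag A', 29.9),
    (2, 'NotebookBag', 'Bag B', 35.0),
    (3, 'Notebook', 'Laptop C', 899.0);
INSERT INTO product_attribute VALUES
    (1, 'Color', 'Red'),  (1, 'Compatibility', '15.6'),
    (2, 'Color', 'Blue'), (2, 'Compatibility', '15.6'),
    (3, 'Color', 'Red'),  (3, 'CPU', 'Intel');
""")

def find_by_attribute(conn, attribute, value):
    """Return names of products that have the given attribute value."""
    rows = conn.execute("""
        SELECT p.name
        FROM product AS p
        JOIN product_attribute AS a ON a.product_id = p.product_id
        WHERE a.attribute = ? AND a.value = ?
        ORDER BY p.name
    """, (attribute, value)).fetchall()
    return [row[0] for row in rows]

def attributes_of(conn, product_id):
    """Return all type-specific attributes of one product as a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM product_attribute WHERE product_id = ?",
        (product_id,),
    ).fetchall()
    return dict(rows)

print(find_by_attribute(conn, "Color", "Red"))
print(attributes_of(conn, 1))
```

Compared with the "Color : Red & Compatibility : 15.6" string from option B, each characteristic is an indexable row, so finding items by a specific characteristic is an ordinary query. The trade-off is that every value is stored as text.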

Need advice on multilingual data storage

This is more of a question for experienced people who've worked a lot with multilingual websites and e-shops. This is NOT a database structure question or anything like that. This is a question on how to store a multilingual website, NOT how to store translations. A multilingual website can not only be translated into multiple languages, but can also have language-specific content. For instance, an English version of the website can have a completely different structure than the same website in Russian or any other language. I've thought up two storage schemas for such cases:
table contents // to store some HYPOTHETICAL content
id // content id
table contents_loc // to translate the content
content, // ID of content to translate
lang, // language to translate to
value, // translated content
online // availability flag, VERY IMPORTANT
Pros:
- Content can be stored in multiple languages. This schema is pretty common, except maybe for the "online" flag in the "_loc" tables. About that below.
- Every content can not only be translated into multiple languages, but also you could mark online=false for a single language and disable the content from appearing in that language. Alternatively, that record could be removed from "_loc" table to achieve the same functionality as online=false, but this time it would be permanent and couldn't be easily undone. For instance we could create some sort of a menu, but we don't want one or more items to appear in english - so we use online=false on those "translations".
Cons:
- Quickly gets pretty ugly with more complex table relations.
- More difficult queries.
table contents // to store some HYPOTHETICAL content
id, // content id
online, // content availability (not the same as in first example)
lang, // language of the content
value, // translated content
Pros:
1. Less painful to implement
2. Shorter queries
Cons:
2. Every multilingual record would now have 3 different IDs. It would be bad e.g. for products in an e-shop, since the first version would allow us to store different languages under the same ID and this one would require 3 separate records to represent the same product.
First storage option would seem like a great solution, since you could easily use it instead of the second one as well, but you couldn't easily do it the other way around.
The only problem is ... the first structure seems a bit like an overkill (except in cases like product storage)
So my question to you is:
Is it logical to implement the first storage option? In your experience, would anyone ever need such a solution?
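For reference, the first storage option can be sketched with Python's built-in sqlite3 module as follows. The column types, the composite key and the sample menu items are assumptions for illustration. The table and column names follow the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contents (
    id INTEGER PRIMARY KEY
);
CREATE TABLE contents_loc (
    content INTEGER NOT NULL REFERENCES contents(id),
    lang    TEXT NOT NULL,
    value   TEXT NOT NULL,
    online  INTEGER NOT NULL DEFAULT 1,   -- 0 hides this language version only
    PRIMARY KEY (content, lang)
);
-- made-up menu items; item 2 is switched off in English
INSERT INTO contents VALUES (1), (2);
INSERT INTO contents_loc VALUES
    (1, 'en', 'Home', 1),     (1, 'ru', 'Главная', 1),
    (2, 'en', 'Partners', 0), (2, 'ru', 'Партнёры', 1);
""")

def visible_contents(conn, lang):
    """Return (id, value) for content that is online in the given language."""
    return conn.execute("""
        SELECT c.id, l.value
        FROM contents AS c
        JOIN contents_loc AS l ON l.content = c.id
        WHERE l.lang = ? AND l.online = 1
        ORDER BY c.id
    """, (lang,)).fetchall()

print(visible_contents(conn, "en"))
print(visible_contents(conn, "ru"))
```

The same content ID is shared by every language version, and setting online back to 1 restores a hidden item without retyping its translation.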
The question we ask ourselves is always:
Is the content the same for multiple languages and do they need a relation?
Translatable models
If the answer is yes, you need a translatable model, that is, a model with multiple versions of the same record. So you need a language flag for each record.
PROS: It gives you a structure in which you can see for example which content has not yet been translated.
Separate records per language
But many times we see a different solution as the better one: just separate the languages totally. We mostly see this in CMS solutions. The story is not only translated but also different. For example, in country 1 they have a different menu structure, other news items, other products and other pages.
PROS: Total flexibility and no unexpected records from other languages.
We see it like writing a magazine: you can write one, then translate it to another language. Yes, that's possible, but in the real world we see more and more that the content is structurally different. People don't like to be surprised, so you need lots of steps to make sure content is not visible in the wrong languages, pages don't get created in duplicate, etc.
Sharing logic
So what we do most of the time is: share the views, make the buttons, inputs etc. translatable, but keep the content separated, so that every admin can just work in his own area. If we need to confirm that some records are available in all languages, we can always handle that by creating a link (nicely relational) between them, but it is not the standard we use most of the time.
Really translatable records like products
Because we are flexible in creating models etc., we can just decide how to work with them based on the requirements. I would not try to look for a general solution that works for everything, because there is none. You need a solution based on your data.
Assuming that you need a translatable model, as it is described by Luc, I would suggest coming up with some sort of special-character-delimited key-value pair format for the value column of the content table. Example:
#en=English Term#de=German Term
You may use UDFs (User Defined Functions in T-SQL) to set/get the appropriate term based on the specified language.
For selecting :
select id, dbo.GetContentInLang(value, #lang)
from content
For updating:
update content
set value = dbo.SetContentInLang(value, #lang, new_content)
where id = #id
The UDFs:
a. do have a performance hit, but this is also the case for the join that you would have to do between the content and content_loc tables
b. are somewhat difficult to implement, but are reusable practically throughout your database.
You can also do the above on the application/UI layer.
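On the application layer, the get/set pair could look roughly like the sketch below. The function names mirror the UDF names above, but the parsing rules are my own assumptions: '#' separates entries, the first '=' separates the language code from the term, and '#' is not allowed inside a term.

```python
def parse_terms(value):
    """Split '#en=English Term#de=German Term' into {'en': ..., 'de': ...}."""
    terms = {}
    for part in value.split("#"):
        if not part:
            continue  # skip the empty piece before the leading '#'
        lang, _, text = part.partition("=")
        terms[lang] = text
    return terms

def get_content_in_lang(value, lang, default=None):
    """Return the term stored for `lang`, or `default` if it is missing."""
    return parse_terms(value).get(lang, default)

def set_content_in_lang(value, lang, new_content):
    """Return a new packed string with the term for `lang` added or replaced."""
    if "#" in new_content:
        raise ValueError("'#' is the delimiter and cannot appear in content")
    terms = parse_terms(value)
    terms[lang] = new_content
    return "".join(f"#{code}={text}" for code, text in terms.items())

stored = "#en=English Term#de=German Term"
print(get_content_in_lang(stored, "de"))
print(set_content_in_lang(stored, "fr", "Terme français"))
```

The restriction on '#' shows the main cost of a packed column: the delimiter has to be escaped or forbidden, and the database cannot index or filter on a single language.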

Template Matching for relational database

I am trying to do the following:
We are trying to design a fraud detection system for the stock market.
I know the specifications for the frauds (they are like templates).
So I want to know if I can design a template and find all records that match it.
I can't use traditional queries because the templates are complex.
For example, one of my frauds is circular trading. It works like this:
A bought from B, B bought from C, and C bought from A (it's a cycle),
and this cycle can include 4 or 5 persons.
Is there any good suggestion for this situation?
I don't see why you can't use "traditional queries" as you've stated. SQL can be used to write extraordinarily complex queries. For that matter I'm not sure that this is a hugely challenging question.
Firstly, I'd look at the behavior you have described as very transactional, therefore I'd treat the transactions as a model. I'd likely have a transactions table with some columns like buyer, seller, amount, etc...
You could alternatively have the shares as their own table and store, say, the previous 100 owners of each share in the same table by putting all the primary keys of the owners into an "owners" column in your shares table, like 234/823/12334/1234/... That way you can see if that share was owned by the same person, or look for patterns in the string.
I wouldn't suggest making up a "small language". I don't see why you'd want to do something like that when you have a huge selection of wonderful languages and databases to choose from, all of which have well refined and tested methods to solve exactly what you are doing.
My best advice is to pop open your IDE (thumbs up for TextMate) and pick your favorite language (Ruby in my case). Find some sample data, create your database and start writing some code! You can't go wrong experimenting like this; it will expose better ways to go about it than we can dream up here on Stack Overflow.
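As one way to experiment with the transactions-table idea, here is a minimal sketch of the circular trading template as a recursive query, run through Python's built-in sqlite3 module. The trades table, its columns and the sample rows are assumptions for illustration. It also assumes trader identifiers never contain '/', because the visited traders are tracked in a '/'-delimited string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (
    trade_id INTEGER PRIMARY KEY,
    buyer    TEXT NOT NULL,
    seller   TEXT NOT NULL
);
-- made-up sample data: a 3-cycle, a 2-cycle, a 4-cycle and one stray trade
INSERT INTO trades (buyer, seller) VALUES
    ('A', 'B'), ('B', 'C'), ('C', 'A'),
    ('D', 'E'), ('E', 'D'),
    ('F', 'G'), ('G', 'H'), ('H', 'I'), ('I', 'F'),
    ('A', 'D');
""")

# Follow "X bought from Y" links, never revisiting a trader except to
# return to the first one, and stop after max_len trades.
CYCLE_SQL = """
WITH RECURSIVE chain(first_trader, last_trader, path, depth) AS (
    SELECT buyer, seller, '/' || buyer || '/' || seller || '/', 1
    FROM trades
    WHERE buyer <> seller
    UNION ALL
    SELECT c.first_trader, t.seller, c.path || t.seller || '/', c.depth + 1
    FROM chain AS c
    JOIN trades AS t ON t.buyer = c.last_trader
    WHERE c.depth < :max_len
      AND c.last_trader <> c.first_trader
      AND (t.seller = c.first_trader
           OR instr(c.path, '/' || t.seller || '/') = 0)
)
SELECT path FROM chain
WHERE last_trader = first_trader AND depth >= :min_len
"""

def find_cycles(conn, min_len=3, max_len=5):
    """Return each trading cycle once, as a tuple starting at its smallest member."""
    cycles = set()
    params = {"min_len": min_len, "max_len": max_len}
    for (path,) in conn.execute(CYCLE_SQL, params):
        members = path.strip("/").split("/")[:-1]   # drop the repeated first trader
        i = members.index(min(members))
        cycles.add(tuple(members[i:] + members[:i]))  # same cycle, fixed rotation
    return sorted(cycles)

print(find_cycles(conn))
```

The query finds every cycle once per starting trader, so the Python side collapses the rotations. On a large trades table the recursion would need limits such as a date window or a per-share filter, which are left out of this sketch.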
Definitely Data Mining. But as you point out, you've already got the models (your templates). Look up fraud DETECTION rather than prevention for better search results?
I know some banks use SPSS PASW Modeler for fraud detection. It is very intuitive, and you can see what you are doing as you play around with the data, so you can implement your templates. I agree with Joseph: you need to get playing and make some new data structures.
Maybe a timeseries model?
Theoretically you could develop a "Small Language" first, something with a simple syntax (that makes expressing the domain - in your case fraud patterns - easy) and from it generate one or more SQL queries.
As with most solutions, this could be thought of as a slider: at one extreme there is the "full Fraud Detection Language"; at the other, you could just build stored procedures for the most common cases, and write new stored procedures which use the more "basic" blocks you wrote before to implement the various patterns.
What you are trying to do falls under the Data Mining umbrella, so you could also try to learn more about it: maybe you can find a Data Mining package for your specific DB (you didn't specify) and see if it helps you find common patterns in your data.

Database Design

This is a general database question, not related to any particular database or programming language.
I've done some database work before, but it's generally just been whatever works. This time I want to plan for the future.
I have one table that stores a list of spare parts: Name, Part Number, Location etc. I also need to store which device(s) they are applicable to.
One way to do it is to create a column for each device in my spare parts table. This is how it's being done in the current database. One concern is that if I want to add a new device in the future I have to create a new column, but it makes the programming easier.
My idea is to create a separate Applicability table. It would store the Part ID and Device ID, if a part is applicable to more than one device it would have more than one row.
My questions are whether this is a valid way to do it, whether it would provide an advantage over the original way, and whether there are any better ways to do it.
Thanks for any answers.
I agree with Rex M's answer, this is a standard approach. One thing you could do on the PartsApplicability table is remove the ID column, and make the PartID/DeviceID a composite primary key. This will ensure that your Part cannot be associated to the same Device more than once, and vice-versa.
You're describing the standard setup of a many-to-many relationship in an RDBMS, using an intermediate join table. Definitely the way to go if that's how your model will end up working.
Using a separate table to hold many-to-many relationships is the right way to go.
Some of the benefits for join tables are
Parts may be applicable to any device and creating new devices or parts will not lead to modifications to the database schema
You don't have to save nulls or other sentinel values for each part-device mapping that doesn't exist, i.e. things will be cleaner
Your tables remain narrow which makes them easier to understand
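These benefits can be seen in a minimal sketch using Python's built-in sqlite3 module. The table names (part, device, part_applicability), the column types and the sample rows are assumptions for illustration. The junction table uses the composite primary key suggested in one of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (
    part_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    part_number TEXT NOT NULL,
    location    TEXT
);
CREATE TABLE device (
    device_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- one row per applicable (part, device) pair; the composite key forbids duplicates
CREATE TABLE part_applicability (
    part_id   INTEGER NOT NULL REFERENCES part(part_id),
    device_id INTEGER NOT NULL REFERENCES device(device_id),
    PRIMARY KEY (part_id, device_id)
);
-- made-up sample data
INSERT INTO part VALUES (1, 'Fan', 'P-100', 'Shelf 1'), (2, 'Fuse', 'P-200', 'Shelf 2');
INSERT INTO device VALUES (1, 'Device A'), (2, 'Device B');
INSERT INTO part_applicability VALUES (1, 1), (1, 2), (2, 1);
""")

def parts_for_device(conn, device_name):
    """Return the names of all parts applicable to the named device."""
    rows = conn.execute("""
        SELECT p.name
        FROM part AS p
        JOIN part_applicability AS pa ON pa.part_id = p.part_id
        JOIN device AS d ON d.device_id = pa.device_id
        WHERE d.name = ?
        ORDER BY p.name
    """, (device_name,)).fetchall()
    return [row[0] for row in rows]

def add_link(conn, part_id, device_id):
    """Link a part to a device; return False if the pair already exists."""
    try:
        conn.execute("INSERT INTO part_applicability VALUES (?, ?)", (part_id, device_id))
    except sqlite3.IntegrityError:
        return False
    return True

# A new device is just new rows, not a new column in the parts table.
conn.execute("INSERT INTO device VALUES (3, 'Device C')")
add_link(conn, 2, 3)

print(parts_for_device(conn, "Device A"))
print(parts_for_device(conn, "Device C"))
```

The extra programming compared with one column per device is a two-table join, which stays the same no matter how many devices are added.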
You seem to be on your way to discovering the database normal forms. The 3rd normal form or BCNF (Boyce-Codd normal form) should be a good goal to have, although sometimes it's a good idea to break the rules.
Your second design is a very good design, and similar to what I've done (at work and on my own projects) many times in terms of describing relationships between things. Lookup tables and their equivalent are often far simpler to use than trying to stuff everything in one table.
Would also agree on making the programming easier. Ultimately, you'll find that learning more makes programming far easier than trying to push things into what you already know even when they really don't fit. Knowing how to properly join tables and the like will make your programming with databases far easier than continually modifying columns would be.
