I have a legacy application with the following two tables, which have a 1-to-1 mapping:
customer (already has 40 columns)
customer_additional_attributes (has 20 columns)
My question: wouldn't it be a better design to combine the customer and customer_additional_attributes tables, since that would save an extra join or query whenever data has to be fetched from customer_additional_attributes?
Is there any disadvantage to a single table with a large number of columns (as in the scenario above)?
The data layout that you have is called "vertical partitioning". This is when the columns of an entity are split across multiple tables. In a normalized structure, this is problematic, because inserting a row (for instance) is not necessarily atomic: it affects two tables.
But there are good reasons for doing this. The most obvious is when the rows are too wide. If the combined columns exceed the database's row limits, they simply will not fit in one table, so they are spread across multiple tables.
Similarly, if some columns are much larger and rarely used, then putting them in another table can be a big win for performance.
Before combining the tables, you should find out whether the data structure is intentional. It might simply be the result of "laziness": the first table was created, then additional attributes came along and were put into another table. Or it could be quite intentional, and you would want to understand why.
Note that the join between the two tables should be pretty fast, particularly if the same primary key is used for both.
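As a rough illustration of the points above, here is a minimal sketch using Python's built-in sqlite3 module. The column names (name, notes) and the helper functions are assumptions made up for the example, not the real 40 + 20 column schema. It shows the two tables sharing one primary key, a transaction that keeps the two-table insert atomic, and the primary-key join.

```python
import sqlite3

def build_customer_db():
    """Create two 1-to-1 tables that share the same primary key (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE customer_additional_attributes (
            customer_id INTEGER PRIMARY KEY
                        REFERENCES customer(customer_id),
            notes       TEXT
        );
    """)
    return conn

def add_customer(conn, customer_id, name, notes):
    # One transaction covers both inserts: either both rows
    # are written or, if an error occurs, neither is.
    with conn:
        conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, name))
        conn.execute(
            "INSERT INTO customer_additional_attributes VALUES (?, ?)",
            (customer_id, notes),
        )

def fetch_customer(conn, customer_id):
    # The join condition uses the primary key of both tables.
    return conn.execute(
        """SELECT c.customer_id, c.name, a.notes
           FROM customer AS c
           JOIN customer_additional_attributes AS a
             ON a.customer_id = c.customer_id
           WHERE c.customer_id = ?""",
        (customer_id,),
    ).fetchone()
```

Calling add_customer(conn, 1, "Ada", "prefers email") followed by fetch_customer(conn, 1) returns the combined row as a single tuple, which is all the application sees of the split.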
If you actually have a many-to-many relationship, you may need to create an intermediate table: one table for customer, one for customer_attributes, and one for customer_additional_attributes containing the ids of the other two tables.
I know it is a big and general question. Let me describe what I am looking for.
In big projects, we have some entities with many properties. ("Many" means over 100 properties for a single entity.) These properties have a one-to-one relation. As time goes by, these tables with many columns become a real problem for maintenance and further development.
As you might guess, these 90 columns were created over time by many projects, not by a single one. Requirements have therefore shaped the table design over a long period.
For example, there is a table that stores information about payments between banks worldwide.
Some columns are foreign keys to other tables (Customer, TransferType, etc.).
Some columns are parameters of the current payment (IsActive, IsLoaded, IsOurCustomer, etc.).
Some columns are fields of the payment (Information Bank, Receiver Bank, etc.).
And so on.
The number of fields keeps growing, and we now have about 90 columns with a one-to-one relation.
What are the concerns when dividing a table into smaller tables? I know the normalization rules and I am not interested in them here. (Duplicated columns are already normalized.)
I am trying to find patterns or rules for dividing a table whose columns have a one-to-one relation.
If all of the columns depend only on the table's primary key and are not repeating (phone1, phone2), they should be part of the same table. If you split a table, you will have to do joins when you need all of its columns. If many of the values are null, you may investigate SQL Server's sparse columns (which don't take up any room when the value is null).
I have two tables in my database, one for logins and a second for user details (the database has more than these two tables). The logins table has 12 columns (Id, Email, Password, PhoneNumber ...) and the user details table has 23 columns (Job, City, Gender, ContactInfo ..). The two tables have a one-to-one relationship.
I am thinking of creating one table that contains the columns of both tables, but I am not sure, because this may make the table too big.
So this leads to my question: how many columns make a table big? Is there a certain or approximate number at which a table is considered big, so that we should stop adding columns to it and create another one? Or is it up to the programmer to decide that number?
The number of columns isn't realistically a problem. Any performance issues you seem to be worried about are more likely to come from the size of the data in the table: for example, if the table has billions of rows, or if one of the columns contains 200 MB of XML data in each row.
Normally, the only issue arising from a multitude of columns is how it pertains to indexing, as it can get troublesome trying to create 100 different indexes covering each variation of each query.
Point here is, we can't really give you any advice since just the number of tables and columns and relations isn't enough information to go on. It could be perfectly fine, or not. The nature of the data, and how you account for that data with proper normalization, indexing and statistics, is what really matters.
The constraint that makes us stop adding columns to an existing table in SQL is the maximum number of columns that the database engine supports for a single table. For SQL Server, that is 1,024 columns for a non-wide table, or 30,000 columns for a wide table.
35 columns is not a particularly large number of columns for a table.
There are a number of reasons why decomposing a table (splitting up by columns) might be advisable. One of the first reasons a beginner should learn is data normalization. Data normalization is not directly concerned with performance, although a normalized database will sometimes outperform a poorly built one, especially under load.
The first three steps in normalization result in 1st, 2nd, and 3rd normal forms. These forms have to do with the relationship that non-key values have to the key. A simple summary is that a table in 3rd normal form is one where all the non-key values are determined by the key, the whole key, and nothing but the key.
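To make "nothing but the key" concrete, here is a small sketch using Python's sqlite3 module. The employee/department schema is a made-up example, not something from the question. A department's location depends on the department rather than on the employee key, so it is stored once in its own table and a single UPDATE changes it for every employee.

```python
import sqlite3

def build_3nf_db():
    """Hypothetical schema: location is a fact about the department, not the employee."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (
            dept_id  INTEGER PRIMARY KEY,
            name     TEXT NOT NULL UNIQUE,
            location TEXT NOT NULL
        );
        CREATE TABLE employee (
            emp_id  INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            dept_id INTEGER NOT NULL REFERENCES department(dept_id)
        );
        INSERT INTO department VALUES (1, 'Billing', 'Berlin');
        INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Raj', 1);
    """)
    return conn

def relocate_department(conn, dept_id, new_location):
    # The location is stored once, so one row is updated
    # no matter how many employees belong to the department.
    with conn:
        conn.execute(
            "UPDATE department SET location = ? WHERE dept_id = ?",
            (new_location, dept_id),
        )

def employee_locations(conn):
    # Join back to the department to see each employee's location.
    return conn.execute(
        """SELECT e.name, d.location
           FROM employee AS e
           JOIN department AS d ON d.dept_id = e.dept_id
           ORDER BY e.emp_id"""
    ).fetchall()
```

Had location been a column on employee instead, the same change would have to touch every employee row in that department, which is the kind of update anomaly the normal forms are meant to prevent.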
There is a whole body of literature out there that will teach you how to normalize, what the benefits of normalization are, and what the drawbacks sometimes are. Once you become proficient in normalization, you may wish to learn when to depart from the normalization rules, and follow a design pattern like Star Schema, which results in a well structured, but not normalized design.
Some people treat normalization like a religion, but that's overselling the idea. It's definitely a good thing to learn, but it's only a set of guidelines that can often (but not always) lead you in the direction of a satisfactory design.
A normalized database tends to outperform a non normalized one at update time, but a denormalized database can be built that is extraordinarily speedy for certain kinds of retrieval.
And, of course, all this depends on how many databases you are going to build, and their size and scope.
I take it that the login table contains data that is only used when the user logs into your system. For all other purposes, the details table is used.
Separating these sets of data into separate tables is not a bad idea and could work perfectly well for your application. However, another option is having the data in one table and separating them using covering indexes.
One aspect of an index no one seems to consider is that an index can be thought of as a sub-table within a table. When a SQL statement accesses only the fields within an index, the I/O required to perform the operation can be limited to only the index rather than the entire row. So creating a "login" index and a "details" index would achieve the same benefits as separate tables, with the added benefit that any operations that do need all the data would not have to perform a join of two tables.
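The "index as a sub-table" idea can be checked with a small experiment. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, with a made-up users table and index name (ix_users_login). Other engines report this differently, so treat it as an illustration only.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        email    TEXT NOT NULL,
        password TEXT NOT NULL,
        job      TEXT,
        city     TEXT
    );
    -- Hypothetical "login" index: holds every column the login query needs.
    CREATE INDEX ix_users_login ON users (email, password);
""")

# Touches only email and password, so the index alone can answer it.
login_plan = plan_for(
    conn, "SELECT password FROM users WHERE email = 'a@example.com'"
)

# Needs job and city, which are not in the index, so the row must be read too.
details_plan = plan_for(
    conn, "SELECT job, city FROM users WHERE email = 'a@example.com'"
)

print(login_plan)
print(details_plan)
```

The first plan should mention a covering index, meaning the table rows are never read. The second plan uses the same index to find the row but then has to visit the table for the remaining columns.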
What is the best way to model a database? I have many known channels with values. Is it better to create one table with many columns, one for each channel, or to create two tables, one for values and one for channels? Like this:
Table RAW_VALUES: SERIE_ID, CHANNEL_1, ..., CHANNEL_1000
or
Table RAW_VALUES: SERIE_ID, CHANNEL_ID, VALUE
Table CHANNELS: CHANNEL_ID, NAME, UNIT, ....
My question is about search performance and saving database space.
Thanks.
Usually, one would want to know what type of queries you will run against the tables as well as the data distribution etc to choose between two designs. However, I think that there are more fundamental issues here to guide you.
The second alternative is certainly more flexible. Adding one more channel ("Channel_1001") can be done simply by inserting rows in the two tables (a simple DML operation), whereas if you use the first option, you need to add a column to the table (a DDL operation), and that will not be usable by any programs using this table unless you modify them.
That type of flexibility alone is probably a good reason to go with the second option.
Searching will also be better served with the second option. You may create one index on the raw_values table and support indexed searches on the Channel/Value columns. (I would avoid the name "value" for a column by the way.)
Now if you consider what column(s) to index under the first option, you will probably be stumped: you have 1001 columns there. If you want to support indexed searches on the values, would you index them all? Even if you were dealing with just 10 channels, you would still need to index those 10 columns under your first option; not a good idea in general to load a table with more than a few indexes.
As an aside, if I am not mistaken, the limit in Oracle is 1000 columns per table these days, but a table with more than 255 columns will store a row in multiple row pieces, each holding up to 255 columns, and that would create a lot of avoidable I/O for each select you issue against this table.
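A minimal sketch of the second alternative, again using Python's sqlite3 module. The column name reading (used in place of VALUE, as suggested above), the index name, and the helper functions are assumptions for illustration. It shows that adding a channel is a plain INSERT and that a single index serves searches on any channel.

```python
import sqlite3

def build_channel_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE channels (
            channel_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            unit       TEXT
        );
        CREATE TABLE raw_values (
            serie_id   INTEGER NOT NULL,
            channel_id INTEGER NOT NULL REFERENCES channels(channel_id),
            reading    REAL,
            PRIMARY KEY (serie_id, channel_id)
        );
        -- One index supports searches by channel and value for every channel.
        CREATE INDEX ix_raw_values_channel_reading
            ON raw_values (channel_id, reading);
    """)
    return conn

def add_channel(conn, channel_id, name, unit):
    # Adding channel 1001 is ordinary DML; no ALTER TABLE is needed.
    with conn:
        conn.execute(
            "INSERT INTO channels VALUES (?, ?, ?)", (channel_id, name, unit)
        )

def add_reading(conn, serie_id, channel_id, reading):
    with conn:
        conn.execute(
            "INSERT INTO raw_values VALUES (?, ?, ?)",
            (serie_id, channel_id, reading),
        )

def series_above(conn, channel_name, threshold):
    """Return the ids of series whose reading on a channel exceeds threshold."""
    rows = conn.execute(
        """SELECT r.serie_id
           FROM raw_values AS r
           JOIN channels AS c ON c.channel_id = r.channel_id
           WHERE c.name = ? AND r.reading > ?
           ORDER BY r.serie_id""",
        (channel_name, threshold),
    ).fetchall()
    return [row[0] for row in rows]
```

The trade-off is that reading one complete series now returns one row per channel instead of one wide row, so code that wants the wide shape has to pivot the result.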
I have two tables with the same columns: one is used to store bank amounts and the other to store cash desk amounts. Both might hold a lot of data, so I'm concerned about data retrieval speed. I don't know whether it's better to combine them by adding an extra column that identifies the type of each record, or to keep a separate table for each one.
The main question you should be asking is - how am I querying the tables.
If there is no real logical connection between the 2 tables (you don't want to get the rows in the same query), use 2 different tables. Otherwise you will need to hold another column that tells you what type of row you are working on, and that will slow you down and make your queries more complex.
In addition, FKs might be a problem if the same column is a FK to 2 different places.
In addition (2nd), locks might be an issue: if you work on one type you might block the other.
Conclusion: 2 tables, not just for speed.
In theory you have one unique entity, so you should consider one table for your accounts and another one for your types of accounts. For better performance you could place these account types on two different filegroups and partitions and create an index on the typeFK column of the account table. In this scenario you logically have one entity, as relational theory dictates, while physically your data is separated, so data retrieval is fast.
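The logical half of this suggestion (one account table, one type table, an index on the type key) might look like the sketch below, written with Python's sqlite3 module. All table, column, and function names are assumptions for illustration. Filegroups and partitions are physical-storage features of engines such as SQL Server and are not shown, because SQLite has no equivalent.

```python
import sqlite3

def build_accounts_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account_type (
            type_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE account (
            account_id   INTEGER PRIMARY KEY,
            type_id      INTEGER NOT NULL REFERENCES account_type(type_id),
            amount_cents INTEGER NOT NULL
        );
        -- Index on the type key so per-type queries avoid a full scan.
        CREATE INDEX ix_account_type_id ON account (type_id);
        INSERT INTO account_type VALUES (1, 'bank'), (2, 'cashdesk');
    """)
    return conn

def add_entry(conn, type_name, amount_cents):
    # Look up the type id by name while inserting the record.
    with conn:
        conn.execute(
            """INSERT INTO account (type_id, amount_cents)
               SELECT type_id, ? FROM account_type WHERE name = ?""",
            (amount_cents, type_name),
        )

def total_for(conn, type_name):
    """Sum the amounts of one record type; 0 if there are no rows."""
    row = conn.execute(
        """SELECT COALESCE(SUM(a.amount_cents), 0)
           FROM account AS a
           JOIN account_type AS t ON t.type_id = a.type_id
           WHERE t.name = ?""",
        (type_name,),
    ).fetchone()
    return row[0]
```

With this layout a query that needs both bank and cash desk rows simply drops the WHERE clause, which is the main thing two separate tables cannot offer without a UNION.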
I was wondering which approach is better for designing databases?
I currently have one big table (97 columns per row) with references to lookup tables wherever I could use them.
Wouldn't it be better for performance to group some columns into smaller tables and give them key columns that reference the whole row?
If you split up your table into several parts, you'll need additional joins to get all your columns for a single row - that will cost you time.
97 columns isn't much, really - I've seen way beyond 100.
It all depends on how your data is being used. If your row just has 97 columns, all the time, and needs all 97 columns, then it really hardly ever makes sense to split those up into various tables.
It might make sense if:
you can move some "large" columns (like XML, VARCHAR(MAX) etc.) into a separate table, if you don't need those all the time -> in that case, your "basic" row becomes smaller and your basic table will perform better, as long as you don't need those extra-large columns
you can move away some columns to a separate table that aren't always present, e.g. columns that might be "optional" and only present for e.g. 20% of the rows - in that case, you might save yourself some processing for the remaining 80% of the cases where those columns aren't needed.
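Both cases in the list above lead to the same shape: a narrow main table plus a side table keyed by the same id, holding the large or optional columns only for the rows that have them. Here is a small sketch with Python's sqlite3 module. The item and item_document tables and both functions are hypothetical names chosen for the example.

```python
import sqlite3

def build_item_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (
            item_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL
        );
        -- Side table: a row exists only for items that have a document.
        CREATE TABLE item_document (
            item_id INTEGER PRIMARY KEY REFERENCES item(item_id),
            body    TEXT NOT NULL
        );
        INSERT INTO item VALUES (1, 'with document'), (2, 'without document');
        INSERT INTO item_document VALUES (1, '<doc>large payload</doc>');
    """)
    return conn

def fetch_basic(conn, item_id):
    # The common case reads only the narrow table.
    return conn.execute(
        "SELECT item_id, title FROM item WHERE item_id = ?", (item_id,)
    ).fetchone()

def fetch_with_document(conn, item_id):
    # LEFT JOIN keeps items that have no side row; body is then None.
    return conn.execute(
        """SELECT i.item_id, i.title, d.body
           FROM item AS i
           LEFT JOIN item_document AS d ON d.item_id = i.item_id
           WHERE i.item_id = ?""",
        (item_id,),
    ).fetchone()
```

Whether this pays off depends on how often the wide columns are actually needed. If most queries end up calling something like fetch_with_document, the split only adds a join.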
It would be better to group relevant columns into different tables. This will improve the performance of your database as well as your ease of use as the programmer. You should first try to find all the different relationships between your columns, and then attempt to break everything into tables while keeping these relationships in mind (using primary keys, foreign keys, references and so forth). Try to create a diagram like this one http://www.simple-talk.com/iwritefor/articlefiles/354-image008.gif and take it from there.
Unless your data is denormalized it is likely best to keep all the columns in the same table. SQL Server reads pages into the buffer pool from individual tables. Thus you will have the cost of the joins on every access even if the pages accessed are already in the buffer pool. If you access just a few rows of the data per query with a key then an index will serve that query fine with all columns in the same table. Even if you will scan a large percentage of the rows (> 1% of a large table) but only a few of the 97 columns you are still better off keeping the columns in the same table as you can use a non clustered index that covers the query. However, if the data is heavily denormalized then normalizing it, which by definition breaks it into many tables based upon the rules of normalization to eliminate redundancy, will result in much improved performance and you will be able to write queries to access only the specific data elements you need.