One table with many columns in SQL Server

Imagine one book table. A book's details run to at least 40 columns (like: Name, HeadAuthor, SecondAuthor, ISBN, DateOfPublish and so many more columns).
I want to add 30 more columns to this table that are completely related to my job but not really related to the book itself (like: LibraryId, VisitedTimes, DownloadedTimes, HaveFiveStars, HaveFourStars, HaveThreeStars, HaveTwoStars, HaveOneStar [to calculate a book's rank], SoldTimes, LeasedTimes and some more).
So, in total we have 70 columns for at least 5 million books.
The second set of 30 columns will be filled eventually, but:
Another important point is that some libraries may fill all of the first 40 columns completely, while other libraries with many books may fill only 10 of those 40 columns. So in that case we have at least 2 million rows with many NULL or 0 columns.
I want speed & performance.
This question is very important to me, and I can't test both approaches to check the speed and performance myself, so please don't tell me to go and check it myself.
I just need the one best solution that explains what I should do!
Is it okay to make a book table with 70 columns? Or should I split the 70 columns into 2 tables with a 1:1 relation? Or save the first 40 columns as JSON in one string field (would JSON be fast to read)?
Does it really matter: one 70-column table, or two tables of 40 and 30 columns with a 1:1 relation?

Just add a separate table that has book_id as a foreign key.
Since not all books will have the additional details, use a LEFT OUTER JOIN from the book table to the additional-details table.
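A minimal sketch of that split, with table and column names taken from the question (only a few of the 40 core columns are shown, and SQLite via Python is used purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (                  -- the core bibliographic columns
    BookId     INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    HeadAuthor TEXT,
    ISBN       TEXT
);
CREATE TABLE book_stats (            -- the ~30 job-specific columns
    BookId       INTEGER PRIMARY KEY REFERENCES book(BookId),
    LibraryId    INTEGER,
    VisitedTimes INTEGER DEFAULT 0,
    SoldTimes    INTEGER DEFAULT 0
);
INSERT INTO book VALUES (1, 'Dune', 'Frank Herbert', '978-0441013593');
INSERT INTO book VALUES (2, 'Emma', 'Jane Austen',   '978-0141439587');
-- stats exist only for book 1; book 2 has no book_stats row yet
INSERT INTO book_stats VALUES (1, 10, 500, 42);
""")

# LEFT OUTER JOIN: books without a stats row still come back, with NULLs.
rows = conn.execute("""
    SELECT b.Name, s.VisitedTimes
    FROM book b
    LEFT JOIN book_stats s ON s.BookId = b.BookId
    ORDER BY b.BookId
""").fetchall()
print(rows)   # [('Dune', 500), ('Emma', None)]
```

Rows that never get stats cost nothing in the stats table, instead of 30 NULL columns per book.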

I would create 2 tables, with most of the mandatory and important columns in table1 (maybe 10-15 columns) and the rest in table2.
Most importantly, some of your columns are redundant, like HaveFiveStars, HaveFourStars, HaveThreeStars, HaveTwoStars and HaveOneStar. Instead of 5 columns here you can have only one column, like ViewerRating.
Similarly, you can eliminate other columns.
I think performance will improve.
Read this:
Which is more efficient: Multiple MySQL tables or one large table?
Most of the reasons are already mentioned in that link. Also, the discussion there is not really MySQL-specific; it applies generically to any RDBMS.
You should read each and every line carefully. The reasons given there are very technical; there is no hand-waving.
You have mentioned that there will be 5 million rows and that most of the columns will be NULL. I would say that not only will your performance improve, the schema will also be much easier to maintain.
There are many good points made there in favour of multiple tables.
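For instance, the five star-count columns can be replaced by a one-row-per-rating table, with a single ViewerRating value derived at query time. A sketch with assumed table and column names, using SQLite for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_rating (          -- one row per individual rating
    BookId INTEGER NOT NULL,
    Stars  INTEGER NOT NULL CHECK (Stars BETWEEN 1 AND 5)
);
INSERT INTO book_rating VALUES (1, 5), (1, 4), (1, 4), (2, 2);
""")

# One derived ViewerRating per book instead of 5 stored counter columns.
ranks = conn.execute("""
    SELECT BookId, ROUND(AVG(Stars), 2) AS ViewerRating, COUNT(*) AS Votes
    FROM book_rating
    GROUP BY BookId
    ORDER BY BookId
""").fetchall()
print(ranks)   # [(1, 4.33, 3), (2, 2.0, 1)]
```

If recomputing the average on 5 million books ever becomes too slow, the aggregate can still be cached in one column, but the raw ratings remain the source of truth.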


SQL Server : data type to use in a table

Might be a silly question to ask, but what data type should I set a column to so that I can enter multiple values?
Example: I have two tables, one called Application_users and the other Products.
Application_Users has an id column.
What I want is a column in Products called Application_Users_id into which I enter 1,2,3,4.
The idea is that if an Application_User_id is, say, 3, they would only see products where Products.Application_Users_ID contains a 3.
So what data type do I use so that I can enter values such as 1,2,3,4 in a column?
I have tried NVARCHAR and INTEGER but neither works (NVARCHAR works but won't let me amend it, e.g. add numbers).
Let me know what everyone thinks is the best approach here please.
Thanks
John
It might be a silly question, but you would be surprised how many developers make the very same mistake. It comes up so often that I have a ready-to-paste comment to address it:
Read Is storing a delimited list in a database column really that bad?, where you will see a lot of reasons why the answer to this question is Absolutely yes!
And if you actually go and read this link, you'll see that it's so wrong and so frequently used that Bill Karwin addressed it in the first chapter of his book - SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
Having said that, SQL Server does support XML columns, in which you can store multiple values, but this is not a case where you would want to use them. XML columns are good for storing things like property bags, where you can't tell in advance what data types you'll have to deal with, for example.
tl;dr; - So what should you do?
What you want to do is probably a many to many relationship between Application_users and Products.
The way to create a many to many relationship is to add another table that will be the "bridge" between the two tables - let's call it Application_users_to_products.
This table will have only two columns - application_user_id and product_id, each of them is a foreign key to the respective table, and the combination of both columns is the primary key of the bridge table.
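A minimal sketch of that bridge table (sample names and ids are made up; SQLite via Python is used for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Application_users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products          (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Application_users_to_products (
    application_user_id INTEGER REFERENCES Application_users(id),
    product_id          INTEGER REFERENCES Products(id),
    PRIMARY KEY (application_user_id, product_id)
);
INSERT INTO Application_users VALUES (3, 'John'), (4, 'Jane');
INSERT INTO Products VALUES (10, 'Widget'), (20, 'Gadget');
-- one row per (user, product) pair instead of '1,2,3,4' in one column
INSERT INTO Application_users_to_products VALUES (3, 10), (4, 10), (4, 20);
""")

# "Which products can user 3 see?" becomes a plain join, no string parsing.
rows = conn.execute("""
    SELECT p.name
    FROM Products p
    JOIN Application_users_to_products up ON up.product_id = p.id
    WHERE up.application_user_id = 3
""").fetchall()
print(rows)   # [('Widget',)]
```

The composite primary key also prevents the same (user, product) pair from being entered twice, which a comma-separated string cannot do.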

PostgreSQL: correct number of fields

I would like your opinion. I have a table with 120 VARCHAR fields into which I will have to insert about 1,000 records per month for at least 10 years, with a total number of 240,000 records.
I could divide the fields into multiple tables but I'd rather keep it that way. Do you think I will have problems in the future?
Thank you
Well, if the data in the columns follows a certain logic, keep it flat, which means I would leave it the way it is. Otherwise, separate it into multiple tables. It depends on your data.
I once worked with medical data where one table contained over 100 columns, but all of those columns were needed to produce a diagnostic result. I don't remember exactly what it was, because I worked with that data set some years ago. But in that case it would have been more complicated if the columns had been separated into multiple tables. Logically, the data in each column served a common purpose, so it was easier to have them all in the same place (the table).
If you put the columns all together just to be lazy, so that you only have to query one table, I would recommend separating the columns into different tables to make them more comfortable to work with, and to make the database schema more understandable.

Should I separate my table?

I have a question about my DB design.
I want to save some information about provinces and the country. At first I thought I could save all of this information in one table (General_info). But then, for each record in this table, the values of the columns which belong to the country would be repeated.
Another idea is to separate this into two tables (General_info_country and General_info_province): the first table with only 3 columns and the other with more than 10.
What should I do? Which approach is more efficient?
The biggest issue with the first approach (one big table) is that if any one country's information changes, you need to update multiple rows, meaning you might make a mistake and end up with inconsistent information.
The second approach is normalized and considered a better relational design.
If the relation is one to one and will remain one to one, then you can go both ways.
But if it is one to many, then of course you must split them, and put the foreign key on the many side.
In the case where the relation is one to one, the question becomes: how frequently are you going to use all of the attributes?
If you are going to use both sets of columns together frequently, then I suggest you do not split them, because you would need to join them in a lot of your queries.
But if you are going to use the 2 tables separately rather than using the info from both together, then splitting them could save you some time.
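A minimal sketch of the two-table approach (table and column names assumed; SQLite via Python for illustration), showing the single-UPDATE benefit mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (                -- the ~3 country-level columns
    CountryId INTEGER PRIMARY KEY,
    Name      TEXT,
    Capital   TEXT
);
CREATE TABLE province (               -- the 10+ province-level columns
    ProvinceId INTEGER PRIMARY KEY,
    CountryId  INTEGER REFERENCES country(CountryId),
    Name       TEXT
);
INSERT INTO country  VALUES (1, 'Canada', 'Ottawa');
INSERT INTO province VALUES (1, 1, 'Ontario'), (2, 1, 'Quebec');
""")

# Country data lives in exactly one row, so a change is a single UPDATE
# instead of touching every province row that repeats the country fields.
conn.execute("UPDATE country SET Capital = 'Ottawa, ON' WHERE CountryId = 1")
rows = conn.execute("""
    SELECT p.Name, c.Capital
    FROM province p JOIN country c ON c.CountryId = p.CountryId
    ORDER BY p.Name
""").fetchall()
print(rows)   # [('Ontario', 'Ottawa, ON'), ('Quebec', 'Ottawa, ON')]
```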

How do I organize such database in SQLite?

Folks! I need some help with organizing a database for an application and I have no idea how to do it. Suppose the following:
There is a list of academic subjects. For each subject we need a list of academic groups which attend that subject. Then, for each group, we need a list of dates. And for each date we need a list of students and whether each student was present that day or not.
I have only ugly data structures in mind; I will appreciate any help.
UPDATE
How do I see it:
Table1 (the first column is the date, the second is the list of ids of the students who were present):
10/10/11 | id1, id2, id3
10/11/11 | id1, id3, id5
Table2:
subject1 | id1 id2 id3
subject2 | id3 id2
And here the ids are ids of groups. I don't know how to connect those tables.
There are many considerations to balance when designing a database, but based on the information you provided so far, something like this might be a good start:
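(The original ER diagram is not reproduced here; the following is a rough sketch of such a schema, with assumed table and column names, using SQLite via Python for illustration.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grp     (group_id   INTEGER PRIMARY KEY, name TEXT); -- "group" is reserved
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);

-- which groups attend which subjects (many-to-many)
CREATE TABLE subject_group (
    subject_id INTEGER REFERENCES subject,
    group_id   INTEGER REFERENCES grp,
    PRIMARY KEY (subject_id, group_id)
);

-- one meeting of a group for a subject on a given date;
-- the parent PK "migrates" down (an identifying relationship)
CREATE TABLE meeting (
    subject_id INTEGER,
    group_id   INTEGER,
    held_on    TEXT,      -- ISO date, e.g. '2011-10-10'
    PRIMARY KEY (subject_id, group_id, held_on),
    FOREIGN KEY (subject_id, group_id) REFERENCES subject_group
);

-- one row per (meeting, student): was the student present?
CREATE TABLE attendance (
    subject_id INTEGER,
    group_id   INTEGER,
    held_on    TEXT,
    student_id INTEGER REFERENCES student,
    present    INTEGER NOT NULL,   -- 0 or 1
    PRIMARY KEY (subject_id, group_id, held_on, student_id),
    FOREIGN KEY (subject_id, group_id, held_on) REFERENCES meeting
);

INSERT INTO subject VALUES (1, 'Algebra');
INSERT INTO grp     VALUES (1, 'Group A');
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO subject_group VALUES (1, 1);
INSERT INTO meeting VALUES (1, 1, '2011-10-10');
INSERT INTO attendance VALUES (1, 1, '2011-10-10', 1, 1),
                              (1, 1, '2011-10-10', 2, 0);
""")

# Who was present at Group A's Algebra meeting on 2011-10-10?
present = conn.execute("""
    SELECT s.name
    FROM attendance a JOIN student s ON s.student_id = a.student_id
    WHERE a.subject_id = 1 AND a.group_id = 1
      AND a.held_on = '2011-10-10' AND a.present = 1
""").fetchall()
print(present)   # [('Ann',)]
```

Note how the attendance table's natural primary key has accumulated the keys of subject, group and meeting above it, which is exactly the trade-off discussed next.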
This ER model uses a lot of identifying relationships (i.e. "migrating" the parent's primary key into the child's PK) and results in natural primary keys, as opposed to non-identifying relationships, which would require the use of surrogate keys. A lot of people like surrogate keys these days, but the truth is that both design strategies have pros and cons. In particular:
Natural keys are bigger (they "accumulate" fields over multiple levels of parent-child relationships).
But also, natural keys require less JOINing.
Natural keys can enforce constraints better in some special cases (such as diamond-shaped dependencies).
You will design one table for each kind of "thing" (subjects, groups, students, meetings) in your database. Each table will have one column for each datum (piece of information) you need to store about the thing. Additionally, there must be one column, or a predictable combination of columns, that will allow you to uniquely identify each thing (row) that you store in the table.
Then, you will decide how the things (subjects, groups, students, meetings) are related to each other and make sure that you have the correct columns in each table to store those relationships. You will discover that in some cases this can be done by adding one or more columns to the tables you already defined. In other cases, you will need to add a completely new table that doesn't store a "thing", per se, but rather a relationship between two things.
Once you have your list of tables and columns, if you feel that fails to represent some part of the problem correctly, post another question with the work you've already done and I'm sure you'll find someone to help you complete the assignment.
Response to your update:
You are on the wrong track. It is a bad idea (and contrary to correct relational database design) to ever store two values in a single field. So each of the tables you wrote about should have two columns (as you said), but the second column should store one and only one id. Instead of one row in table1 for 10/10/11, you would have three separate rows in your table.
But, before you start worrying about the "relationships", create tables to hold the "things".
I also suggest you pick up a basic guide to relational databases.

SQL Optimization: how many columns on a table?

In a recent project I have seen tables with anywhere from 50 to 126 columns.
Should a table hold less columns per table or is it better to separate them out into a new table and use relationships? What are the pros and cons?
Generally it's better to design your tables first to model the data requirements and to satisfy rules of normalization. Then worry about optimizations like how many pages it takes to store a row, etc.
I agree with other posters here that the large number of columns is a potential red flag that your table is not properly normalized. But it might be fine in this case. We can't tell from your description.
In any case, splitting the table up just because the large number of columns makes you uneasy is not the right remedy. Is this really causing any defects or performance bottleneck? You need to measure to be sure, not suppose.
A good rule of thumb that I've found is simply whether or not a table keeps gaining columns as a project continues.
For instance:
On a project I'm working on, the original designers decided to include site permissions as columns in the user table.
So now, we are constantly adding more columns as new features are implemented on the site. Obviously this is not optimal. A better solution would be to have a table containing permissions and a join table between users and permissions to assign them.
However, for other, more archival information, or for tables that simply don't have to grow, or that can be cached, kept to a minimum of pages, or filtered effectively, having a large table doesn't hurt too much as long as it doesn't hamper maintenance of the project.
At least that is my opinion.
Usually an excess of columns points to improper normalization, but it is hard to judge without some more details about your requirements.
I can picture times when it might be necessary to have this many columns, or more. Examples would be if you had to denormalize and cache data, or a type of row with many attributes. I think the keys are to avoid SELECT * and to make sure you are indexing the right columns and composites.
If you had an object modelling the data in the database, would you have a single object with 120 fields, or would you look through the data to extract what is logically distinguishable? You can inline address data with customer data, but it makes sense to pull it out into an Addresses table, even if it keeps a 1:1 mapping with the Person.
Down the line you might need to have a record of their previous address, and by splitting it out you've removed one major problem refactoring your system.
Are any of the fields duplicated over multiple rows? I.e., are the customer's details replicated, one per invoice? In which case there should be one customer entry in the Customers table, and n entries in the Invoices table.
One place where you should not fix broken normalisation is where you have a facts table (for auditing, etc.) whose purpose is to aggregate data to run analyses on. These tables are usually populated from the properly normalised tables, however (overnight, for example).
It sounds like you have potential normalization issues.
If you really want to, you can create a new table for each of those columns (a little extreme) or group of related columns, and join it on the ID of each record.
It could certainly affect performance if people are running around with a lot of "Select * from GiantTableWithManyColumns"...
Here are the official statistics for SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms143432.aspx
Keep in mind these are the maximums, and are not necessarily the best for usability.
Think about splitting the 126 columns into sections.
For instance, if it is some sort of "person" table
you could have
Person
ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode, Telephone, CellPhone, Fax
But you could separate that into
Person
ID, AddressID, PhoneID
Address
ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode
Phone
ID, Telephone, Cellphone, fax
In the second one, you also save yourself some data replication: all the people with the same address share the same AddressID instead of copying the same text over and over.
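A runnable sketch of that split (a subset of the columns above; SQLite via Python for illustration), showing two people sharing one Address row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Address (
    ID INTEGER PRIMARY KEY, AddressNum TEXT, AddressSt TEXT,
    Province TEXT, Country TEXT, PostalCode TEXT
);
CREATE TABLE Phone (
    ID INTEGER PRIMARY KEY, Telephone TEXT, CellPhone TEXT, Fax TEXT
);
CREATE TABLE Person (
    ID INTEGER PRIMARY KEY, Name TEXT,
    AddressID INTEGER REFERENCES Address(ID),
    PhoneID   INTEGER REFERENCES Phone(ID)
);
-- two people share one address row: no copied street text
INSERT INTO Address VALUES (1, '10', 'Main St', 'ON', 'Canada', 'K1A0B1');
INSERT INTO Person (ID, Name, AddressID) VALUES (1, 'Ann', 1), (2, 'Bob', 1);
""")

# Both people point at the same AddressID, so a correction to the
# street text is one UPDATE on Address, not one per person.
shared = conn.execute(
    "SELECT COUNT(*) FROM Person WHERE AddressID = 1"
).fetchone()[0]
print(shared)   # 2
```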
The UserData table in SharePoint has 201 fields but is designed for a special purpose.
Normal tables should not be this wide in my opinion.
You could probably normalize some more. And read some posts on the web about table optimization.
It is hard to say without knowing a little bit more.
Well, I don't know how many columns are possible in SQL, but one thing I am very sure of is that when you design a table, each table is an entity: each table should contain information about a person, a place, an event or an object. I have yet to see a single thing that has that much data/information.
The second thing you should note is that there is a method called normalization, which is basically used to divide data/information into subsections so that one can easily maintain the database. I think this will clarify your idea.
I'm in a similar position. Yes, there truly is a situation where a normalized table has, as in my case, about 90 columns: a workflow application that tracks many states that a case can have, in addition to variable attributes of each state. So as each case (represented by the record) progresses, eventually all columns are filled in for that case. Now in my situation there are 3 logical groupings (15 cols + 10 cols + 65 cols). So do I keep it in one table (the index is CaseID), or do I split it into 3 tables connected by a one-to-one relationship?
From the SQL Server maximum-capacity specifications (http://msdn.microsoft.com/en-us/library/ms143432.aspx):
Columns in a table (merge publication): 246
Columns in a table (SQL Server snapshot or transactional publication): 1,000
Columns in a table (Oracle snapshot or transactional publication): 995
So in a merge-replicated table, we can have a maximum of 246 columns.
A table should have as few columns as possible.
In SQL Server, tables are stored on pages; an extent is 8 pages.
In SQL Server, a page can hold about 8,060 bytes; the more rows you can fit on a page, the fewer IOs you have to do to return the data.
You probably want to normalize (a.k.a. vertically partition) your database.
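Rough back-of-envelope arithmetic behind that point, with made-up row sizes and the ~8,060-byte usable page size mentioned above:

```python
# Narrower rows pack more rows per ~8,060-byte SQL Server data page,
# so a full scan reads fewer pages (fewer IOs).
PAGE_BYTES = 8060          # approximate usable space per data page
TOTAL_ROWS = 5_000_000     # e.g. the book table from the first question

def pages_needed(row_bytes, rows=TOTAL_ROWS, page=PAGE_BYTES):
    rows_per_page = page // row_bytes
    # ceiling division: a partially filled last page still costs one read
    return -(-rows // rows_per_page)

wide   = pages_needed(row_bytes=800)  # hypothetical 70-column row
narrow = pages_needed(row_bytes=400)  # hypothetical row after splitting
print(wide, narrow)   # 500000 250000
```

Halving the row width halves the pages a scan must touch; real savings depend on actual row sizes, fill factor and row overhead, which this sketch ignores.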
