I'm work many times on CRM Online which causes this issue, and then I have to do create entity relationship to get thorugh the issue.
So,
Is there a way I can increase the field limit in CRM form while doing customization?
Version: Any Online Version
No, as noted in the comments this is a limit in SQL.
Maximum Capacity Specifications for SQL Server
Quite a good discussion here, Best Practices - Maximum number of fields on an entity?
(Though why you would want to put 1000 fields on a single entity is beyond me. Would be interested to know your solution for curiosities sake).
This field limit for the database is not limited to SQL but also affect to other Database too so incase we need to reuse the same data in other databases then there may be the issues.
So now this limit makes sense to me, though still wondering all columns are stored in single column or span across rows like chain of columns, as like Oracle does.
I know oracle support 255 columns in a single row and upto 1000 in table, but if you got more then 255 then it spans the remaining to next corresponding row accordingly.
Related
I have to create a database to store information being sent and received to / from a 3rd party web service portal. There are about 150 fields of information to be sent though I can remove about 50 of those fields by normalising (there are three sets addresses that can be saved in an address table, for example). However, this still leaves a table that could potentially have 100 columns.
I've come up with two ways of handling this though I'm not sure which to use:
1. Have a table with 100 columns and three references to an address table.
2. Break it down into maybe 15-20 separate dedicated tables.
Option 1 seems the quickest as it involves the fewest joins but the idea of a table with 100 columns doesn't feel right.
Option 2 feels better and would break things down in to more managable chunks but it won't save any database space and will increase the number of joins. Pretty much all the columns in the database will have a value and I cannot normalise these columns any further.
My question is, in this situation is it acceptable to have a table with c.100 columns in it or should I try and break it down over several tables for presentation?
Please note: The table structure will not change over the course of it's useage, a new database would be created for a new version of the web service portal. I have no control over the web service data structure.
Edit: #Oded's answer below has made me think a bit more about how the data will be accessed; it will really only be accessed in whole and not in part. I wouldn't for example, need to return columns 5-20 on a regular basis.
Answer: I accepted Oded's answer based on the comments after he posted it helped me make my mind up and I decided to go with option 1. As the data is accessed in full then having one table seems the better solution. If, for example, I regularly wanted to access columns 5-20 rather than the full table row then I'd see about breaking it up into separate tables for performance reasons.
Speaking from a relational purist point of view - first, there is nothing against having 100 columns in a table, if they are related. The point here is that if after normalizing you still have 100 columns, that's OK.
But you should normalize, and in the process you may very well end up with 15-20 separate dedicated tables, which most relational database professionals would agree is a better design (avoid data duplication with the update/delete issues associated, smaller data footprint etc...).
Pragmatically, however, if there is a measurable performance problem, it may be sensible to denormalize your design for performance benefit. The key here - measureable. Don't optimize before you have an actual problem.
In that respect, I'd say you should go with the set of 15-20 tables as an initial design.
From MSDN:Maximum Capacity Specifications for SQL Server :
Columns per nonwide table: 1,024
Columns per wide table: 30,000
So I think 100 columns is ok in your case. And also maybe you need to note(from same link):
Columns per primary key: 16
Of course this is only in the case if need data only as Log for a service.
If after reading from service you need to maintain data -> then normalising seems better...
If you find it easier to "manage" tables with fewer columns, however you happen to define manageability (e.g. less horizontal scrolling when looking at the table data in SSMS), you can break the table up into several tables with 1-to-1 relationships without violating the rules of normalization.
I'm designing a database that will need to be optimized for maximum speed.
All the database data is generated once from something I call an input database (which holds the data I'm editing, mainly some polylines, markers, etc for google maps).
So the database is not subject to editing, but it needs to hold as many data as it can for quickly displaying results to the user (routes across town, custom polylines, etc).
The question is: choosing smaller data types for example like smallint over int will improve performance or it will affect it? Space is not quite a problem, after some quick calculations, the database will not exceed 200mb, and there will not be tables with more than 100.000 rows (average will be around 5.000).
I'm asking this because I read some articles around the internet and some say that smaller data types improve performance others say that it affects it because additional processing must be done. I'm aware that for smaller databases probably results are not noticeable, but I'm interested in every bit because I'm expecting many requests which will trigger a lot more queries.
The hosting environment is gonna be Windows Server 2008 R2 with SQL Server 2008 R2.
EDIT 1: Just to give you an example because I don't have a proper table structure yet:
I'm going to have a table which will hold public transportation lines (somewhere around 200), identified by a unique number in real life, and which is going to be referenced in all sorts of tables and on which all sorts of operations are going to be made. These referencing tables will hold the largest amount of data.
Because lines have unique numbers, I have thought of 3 examples of designs:
The PK is the line number of datatype: smallint
The PK is the line number of datatype: int
The PK is something different (identity for example) and the line number is stored in a different field.
Just for the sake of argument, because I used this on the 'input database' which is not subject to optimization, the PK is a GUID (16 bytes); if you like, you can make a comparison of how bad is this compared to others, if it really is
So keep in mind that the PK is going to be referenced in at least 15 tables, some of which will have over 50.000 rows (the rest averaging 5.000 as I said above) which are going to be subject to constant querying and manipulation, and I'm interested in every bit of speed that I can get.
I can detail this even more if you need. Thanks
EDIT 2: And another question related to this came to my mind, think it fits into this discussion:
Will I see any performance improvements in this specific scenario if I use native SQL queries from inside my .NET application rather than using LINQ to SQL? I know LINQ is strongly optimized and generates very good queries performance-wise, but still, sure worth asking. Thanks again.
Can you point to some articles that say that smaller data types = more processing? Keeping in mind that even with SSDs most workloads today are I/O-bound (or memory-bound) and not CPU-bound.
Particularly in cases where the PK is going to be referenced in many tables, it will be beneficial to use the smallest data type possible. In this case if that's a SMALLINT then that's what I would use (though you say there are about 200 values, so theoretically you could use TINYINT which is half the size and supports 0-255). Where you need to exercise caution is if you aren't 100% sure that there will always be ~200 values. Once you need 256 you're going to have to change the data type in all of the affected tables, and this is going to be a pain. So sometimes a trade-off is made between accommodating future growth and squeezing the absolute most performance today. If you don't know for certain that you will never exceed 255 or 32,000 values then I would probably just an INT. Unless you also don't know that you won't ever exceed 2 billion values, in which case you would use BIGINT.
The difference between INT/SMALLINT/TINYINT is going to be more noticeable in disk space than in performance. (And if you're on Enterprise, the differences in both disk space and performance can be offset quite a bit using data compression - particularly while your INT values all fit within SMALLINT/TINYINT, though in the latter case it really will be negligible because the values are unique.) On the other hand, the difference between any of these and GUID is going to be much more noticeable in both performance and disk space. Marc gave some great links from Kimberly; I wrote this article in 2003 and while it's a little dated it does contain most of the salient points that are still relevant today.
Another trade-off that sometimes needs to be considered (though not in your specific case, it seems) is whether values need to be unique across multiple systems. This is where you might need to sacrifice some performance in order to meet business requirements. In a lot of cases folks take the easy way and resign themselves to GUID. But there are other solutions too, such as identity ranges, a central custom sequence generator, and the new SEQUENCE object in SQL Server 2012. I wrote about SEQUENCE back in 2010 when the first public beta of SQL Server 2012 was released.
I think you will need to provide some more details about the tables structure and sample queries that will be running against them. Based on the information that you have provided I believe that impact of choosing smaller data types will be just a couple of percents and I would suggest to give higher attention to indexes that you will have. SQL Server does a good job on suggesting what indexes to create by providing you with execution plans for your queries and tuning advisor tool
One suggestion that I have is to incorporate a decimal datatype instead of using a combination of fields. For example, instead of having a table with Date (YYYYMMDD), Store (SSSS), and Item (IIII), I would recommend...YYYYMMDD.SSSSIIII. Especially when querying multiple tables with this same key combination, it dramatically improves processing time.
I read your article(SQL Server partitioning: not the answer to everything)
and being amazing of use partitioning for my case or not
I must to store about 1000 record per a second this data is about location of mobile nodes, these data make my database too huge
do you think i must partitioning my database or not(I have so much reporting in future).
1000 a second isn't that much.
Is it every second of 24/7?
In a defined window?
Is it a peak of 1000 per second but usually less?
We have a recent system growing at 20 million rows/month (after tidy ups of say another 50-80 million) and we're not thinking of anything like partitioning.
That's a lot of data.
What is the lifecycle of the data i.e. do you only need to store the records for a finite amount of time? For example after a month, perhaps certain data can be archived off or moved to a Data warehouse?
Given the volume of data that you intend to work with you are probably going to want to use an architecture that scales easily? For this reason you may want to look at using Cloud type services such as Amazon Ec2, or SQL Data Services on the Azure Platform.
http://aws.amazon.com/ec2/
http://www.microsoft.com/azure/data.mspx
Perhaps if you provide more specific details about what it is you are actually looking to do i.e. what business process you are looking to support, we may be able to provide more specific assistance.
Without such details it is not possible to ascertain whether or not SQL Server Partitioning would be an appropriate design approach for you.
You might need to look at a different RDMS. I would take a look at Vertica.
Presuming the table in question is indexed, then one of two options is certainly warranted when any of the indexes outgrow the available RAM. Not surprisingly, one of them is, increase RAM. The other of course is vertical partitioning.
gbn's answer provides some good things to consider which you have not mentioned, such as how many records per month (or week, or day) are being added. Richard's comment as to how big the (average) record is is also significant, particularly in terms of how big the average records for the indexes are, presuming the indexes do not include all the fields from the table.
gbn's answer however also seems a bit reckless to me. Growing at 20 million rows per month and not even "thinking of anything like partitioning". Without sufficient metrics as alluded to above, this is a possible recipe for disaster. You should at least be thinking about it, even it just to determine how long you can sustain your current and/or expected rate of growth, before needing to consider more RAM or partitioning.
I have a request to allow a dynamic table to have 1000 columns(randomly selected by my end users). This seems like a bad idea to me. It's a customizable table so it will have a mixture of varchar(200) and float columns(float best matches the applications c++ double type). This database is mostly an index for a legacy application and serves as a reporting repository. It's not the system of record. The application has thousands of data points very few of which could be normalized out.
Any ideas as to what the performance implications of this are? Or an ideal table size to partition this down too?
Since I don't know what fields out of 20k worth of choices the end users will pick normalizing the tables is not feasible. I can separate this data out to several tables That I would have to dynamically manage (fields can be added or drooped. The rows are then deleted and the system of record is re parsed to fill the table.) My preference is to push back and normalize all 20k bits of data. But I don't see that happening.
This smells like a bad design to me.
Things to consider:
Will most of those columns be contain NULL values?
Will many be named Property001, Property002, Property003, etc...?
If so, I recommend you rethink your data normalization.
from SQL2005 documentation:
SQL Server 2005 can have up to two billion tables per database and 1,024 columns per table. (...) The maximum number of bytes per row is 8,060. This restriction is relaxed for tables with varchar, nvarchar, varbinary, or sql_variant columns that cause the total defined table width to exceed 8,060 bytes. The lengths of each one of these columns must still fall within the limit of 8,000 bytes, but their combined widths may exceed the 8,060 byte limit in a table.
what is the functionality of these columns? why not better split them into master table, properties (lookup tables) and values?
Whenever you feel the need to ask what limits the system has, you have a design problem.
If you were asking "How many characters can I fit into a varchar?" then you shouldn't be using varchars at all.
If you seriously want to know if 1000 columns is okay, then you desperately need to reorganize the data. (normalization)
MS SQL Server has a limit of 1024 columns per table, so you're going to be running right on the edge of this. Using varchar(200) columns, you'll be able to go past the 8k byte per row limit, since SQL will store 8k on the data page, and then overflow the data outside of the page.
SQL 2008 added Sparse Columns for scenarios like this - where you'd have a lot of columns with null values in them.
Using Sparse Columns
http://msdn.microsoft.com/en-us/library/cc280604.aspx
As a rule: the wider the table the slower the performance. Many thin tables are preferable to one fat mess of a table.
If your table is that wide it's almost certainly a design issue. There's no real rule on how many is preferable, I've never really come across tables with more than 20 columns in the real world. Just group by relation. It's a RDBMS after all.
This will have huge performance and data issues. It probably needs to be normalized.
While SQl server will let you create a table that has more than 8060 bytes inteh row, it will NOT let you store more data than that in it. You could have data unexpectedly truncated (and worse not until several months later could this happen by which time fixing this monstrosity is both urgent and exptremely hard).
Querying this will also be a real problem. How would you know which of the 1000 columns to look for the data? Should every query ask for all 1000 columns in the where clause?
And the idea that this would be user customizable is scary indeed. Why would the user need a 1000 fields to customize? Most applications I've seen which give the user a chance to customize some fields set a small limit (usually less than 10). If there is that much they need to customize, then the application hasn't done a good job of defining what the customer actually needs.
Sometimes as a developer you just have to stand up and say no, this is a bad idea. This is one of those times.
As to what you shoud do instead (other than normalize), I think we would need more information to point you in the right direction.
And BTW, float is an inexact datatype and should not be used for fields where calculations are taking place unless you like incorrect results.
I have to disagree with everyone here.....I know it sounds mad but using tables with hundreds of columns is the best thing I have ever done.
Yes many columns frequently have null values;
Yes I could normalise it to just a few tables and transpose;
Yes it is inefficient
However it is incredibly fast and easy to analyze column data in endless different ways
Wasteful and inelegant - you'll never build anything as useful!
That is too many. Any more than 50 columns wide and you are asking for trouble in performance, code maintenance, and troubleshooting when there are problems.
Seems like an awful lot. I would first make sure that the data is normalized. That might be part of your problem. What type of purpose will this data serve? Is it for reports? Will the data change?
I would think a table that wide would be a nightmare performance and maintenance-wise.
Did you think of viewing your final (1000 columns) table as the result of a crosstab query? Your original table would then have just a few columns but many thousand records.
Can you please elaborate on your problem? I think nobody really understand why you need these 1000 columns!
Are there any hard limits on the number of rows in a table in a sql server table? I am under the impression the only limit is based around physical storage.
At what point does performance significantly degrade, if at all, on tables with and without an index. Are there any common practicies for very large tables?
To give a little domain knowledge, we are considering usage of an Audit table which will log changes to fields for all tables in a database and are wondering what types of walls we might run up against.
You are correct that the number of rows is limited by your available storage.
It is hard to give any numbers as it very much depends on your server hardware, configuration, and how efficient your queries are.
For example, a simple select statement will run faster and show less degradation than a Full Text or Proximity search as the number of rows grows.
BrianV is correct. It's hard to give a rule because it varies drastically based on how you will use the table, how it's indexed, the actual columns in the table, etc.
As to common practices... for very large tables you might consider partitioning. This could be especially useful if you find that for your log you typically only care about changes in the last 1 month (or 1 day, 1 week, 1 year, whatever). You could then archive off older portions of the data so that it's available if absolutely needed, but won't be in the way since you will almost never actually need it.
Another thing to consider is to have a separate change log table for each of your actual tables if you aren't already planning to do that. Using a single log table makes it VERY difficult to work with. You usually have to log the information in a free-form text field which is difficult to query and process against. Also, it's difficult to look at data if you have a row for each column that has been changed because you have to do a lot of joins to look at changes that occur at the same time side by side.
In addition to all the above, which are great reccomendations I thought I would give a bit more context on the index/performance point.
As mentioned above, it is not possible to give a performance number as depending on the quality and number of your indexes the performance will differ. It is also dependent on what operations you want to optimize. Do you need to optimize inserts? or are you more concerned about query response?
If you are truly concerned about insert speed, partitioning, as well a VERY careful index consideration is going to be key.
The separate table reccomendation of Tom H is also a good idea.
With audit tables another approach is to archive the data once a month (or week depending on how much data you put in it) or so. That way if you need to recreate some recent changes with the fresh data, it can be done against smaller tables and thus more quickly (recovering from audit tables is almost always an urgent task I've found!). But you still have the data avialable in case you ever need to go back farther in time.