I have an app that will have entries of both varchar(max) and varbinary(max) data types. I was considering putting these both in a separate table, together, even if only one of the two will be used at any given time.
The question is whether storing them together has any impact on performance. Considering that they are stored in the heap, I'm thinking that having them together will not be a problem. However, the varchar(max) column will be probably have the text in row table option set.
I couldn't find any performance testing or profiling while "googling bing," probably too specific a question?
The SQL Server 2008 table looks like this:
Id
ParentId
Version
VersionDate
StringContent - varchar(max)
BinaryContent - varbinary(max)
The app will decide which of the two columns to select for when the data is queried. The string column will much used much more frequently than the binary column - will this have any impact on performance?
See this earlier answer and this answer.
text in row option is deprecated and applies the the text, ntext, image data types
varchar(max) by default stores the text in the row up to the limit and then outside the row above the limit unless large value types out of row option is set, in which it's always stored out of row - which would now mean you're storing the data out of the table and then out of that table, too ;-).
If you are already storing them in this separate table, that might not strictly be necessary unless you need the one-to-many relationship which your other columns suggest - since the size of the existing data in your parent row may force these elements out of row. With the data logically in a separate table, you do get more options, however.
Related
I have been working with Redshift and now testing Snowflake. Both are columnar databases. Everything I have read about this type of databases says that they store the information by column rather than by row, which helps with the massive parallel processing (MPP).
But I have also seen that they are not able to change the order of a column or add a column in between existing columns (don't know about other columnar databases). The only way to add a new column is to append it at the end. If you want to change the order, you need to recreate the table with the new order, drop the old one, and change the name of the new one (this is called a deep copy). But this sometimes can't be possible because of dependencies or even memory utilization.
I'm more surprised about the fact that this could be done in row databases and not in columnar ones. Of course, there must be a reason why it's not a feature yet, but I clearly don't have enough information about it. I thought it was going to be just a matter of changing the ordinal of the tables in the information_schema but clearly is not that simple.
Does anyone know the reason of this?
Generally, column ordering within the table is not considered to be a first class attribute. Columns can be retrieved in whatever order you require by listing the names in that order.
Emphasis on column order within a table suggests frequent use of SELECT *. I'd strongly recommend not using SELECT * in columnar databases without an explicit LIMIT clause to minimize the impact.
If column order must be changed you do that in Redshift by creating a new empty version of the table with the columns in the desired order and then using ALTER TABLE APPEND to move the data into the new table very quickly.
https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE_APPEND.html
The order in which the columns are stored internally cannot be changed without dropping and recreating them.
Your SQL can retrieve the columns in any order you want.
General requirement to have columns listed in some particular order is for the viewing purpose.
You could define a view to be in the desired column order and use the view in the required operation.
CREATE OR REPLACE TABLE CO_TEST(B NUMBER,A NUMBER);
INSERT INTO CO_TEST VALUES (1,2),(3,4),(5,6);
SELECT * FROM CO_TEST;
SELECT A,B FROM CO_TEST;
CREATE OR REPLACE VIEW CO_VIEW AS SELECT A,B FROM CO_TEST;
SELECT * FROM CO_VIEW;
Creating a view to list the columns in the required order will not disturb the actual table underneath the view and the resources associated with recreation of the table is not wasted.
In some databases (Oracle especially) ordering columns on table will make difference in performance by storing NULLable columns at the end of the list. Has to do with how storage is beiing utilized within the data block.
I got a database that have 2TB of data, and i wanna reduce it to 500Go by dropping some rows and removing some useless columns, but i have other ideas of optimizations, and i need an answer of some questions before.
My database got one .mdf file, and 9 other .ndf file and each file has an initiale size of 100Go.
Should I reduce the initiale size of each .ndf file to 50Go? can this operation affect my data?
Dropping an index help to reduce space?
PS : My Database contains only one single table, that has one clustered index and two other non clustered indexes,
I want to remove the two non clustered indexes
Remove the insertdate column
If you have any other ideas of optimizations, it would be very helpful
Before droping any indexes run these two views.
sys.dm_db_index_usage_stats
sys.dm_db_index_operational_stats
They will let you know if any of them are being used to support queries. The last thing you want is to remove an index and start seeing full table scans on a 2TB table.
If you can't split up the table into a relational model then try these for starters.
Check your data types.
-Can you replace NVARCHAR with VARCHAR or NCHAR with CHAR? (they take up half the space)
-Does your table experience a lot of Updates or a lot of Inserts (above view will tell you this)? If there are very few updates then consider changing CHAR fields to VARCHAR fields. Heavy updates can cause page splits and result in poor Page fullness.
-Check that columns only storing a Date with no time are not declared as Datetime
-Check value ranges in numeric fields i.e. try and use Smallint instead of Int.
Look at the activity on the table, update & insert behaviour. If the activity means very few Pages are re-arranged then consider increasing your Fill Factor.
Look at the Plan Cache, get an idea of how the table is being queried, if the bulk of queries focus on a specific portion of the table then implement a Filtered Index.
Is your Clustered Index Unique? If not then SQL creates a "hidden extra Integer column" that creates uniqueness under the bonnet.
What is the best way to model a database? I have many known channels with values. Is it better create one table with many columns, one for each channel or create two table one for values and one for channels? Like that:
Table RAW_VALUES: SERIE_ID, CHANNEL_1, ..., CHANNEL_1000
or
Table RAW_VALUES: SERIE_ID, CHANNEL_ID, VALUE
Table CHANNELS: CHANNEL_ID, NAME, UNIT, ....
My question is about performance to search some data or save database space.
Thanks.
Usually, one would want to know what type of queries you will run against the tables as well as the data distribution etc to choose between two designs. However, I think that there are more fundamental issues here to guide you.
The second alternative is certainly more flexible. Adding one more channel ("Channel_1001") can be done simply by inserting rows in the two tables (a simple DML operation), whereas if you use the first option, you need to add a column to the table (a DDL operation), and that will not be usable by any programs using this table unless you modify them.
That type of flexibility alone is probably a good reason to go with the second option.
Searching will also be better served with the second option. You may create one index on the raw_values table and support indexed searches on the Channel/Value columns. (I would avoid the name "value" for a column by the way.)
Now if you consider what column(s) to index under the first option, you will probably be stumped: you have 1001 columns there. If you want to support indexed searches on the values, would you index them all? Even if you were dealing with just 10 channels, you would still need to index those 10 columns under your first option; not a good idea in general to load a table with more than a few indexes.
As an aside, if I am not mistaken, the limit is 1000 columns per table these days, but a table with more than 255 columns will store a row in multiple row pieces, each storing 255 columns and that would create a lot of avoidable I/O for each select you issue against this table.
I'm making changes to a Sql Server (2008) database to change an existing column type from a UUID to a varchar(128) - the column, we'll call it reference was originally used to save an ID of a reference project that provided more information about the data in the row, but now the business requirements have changed and we're just going to allow free-form text instead. There were no foreign keys setup against the column, so there won't be any breaking relationships.
My biggest concern at the moment is one of backwards compatibility with existing stored procedures. Unfortunately, the database is very big and very old, and contains hundreds of stored procs littered throughout the design. I'm very nervous about making a change which could potentially break existing stored procs.
I know this is a loaded question, but in general, can UUID columns be converted into varchar columns without deleterious effects? It seems like any existing stored proc which inserted into or queried on the UUID column would still work, given that UUIDs can be represented as strings.
I've tried performed the below steps and I didn't see any issues, so i think you can go ahead and change the column datatype:
Created the table with the column type as unique identifier.
Created the stored procedure to insert a value into that table through NewID() function.
Execute the stored procedure, and data was inserted without any issue.
Now, I changed the column type to varchar and again executed the stored procedure.
Procedure ran fine without any issue, and data was inserted.
So, the answer is Yes, you can change the column data type.
I want to add a column to one of my tables in SQL Server. I don't want it to be at the end of the column listing in the table...I actually want it to be somewhere else (location wise) in the table. Is there another option besides dropping and rebuilding (populating) the table to accomplish this? I obviously don't want ot lose any of my data, but I would prefer it not have to have the column at the end of the table definition.
Thanks,
S
Column order in the table is irrelevant -- it's purely cosmetic.
There is no extension to ALTER TABLE that allows you to specify the ordinal position of a new column (either for adding a new column or moving an existing column).
For more on the subject, see:
Change Order of Column In Database Tables
Short answer:
I agree with OMG Ponies, column order isn't important. If you don't have a clustered index, rather drop and recreate the table than run an ALTER TABLE x ADD col.
Long answer:
If your table has a fair bit of data (50Mb comes to mind) then you will be better off recreating the table rather than ALTER TABLE x ADD col The data page allocation plan for the table is calculated at table creation time so when you add a column, SQL Server will typically put your new column's data in separate pages and put forward pointers from your existing data pages to the new data pages for the column you added. If you're going to use the new column extensively then your table IO will be quite poor since reading even 1 row will require reading at least 2 pages. Table scans will also perform poorly since forward pointers will always be followed, causing table scans that are normally sequential to jump back and forth on your disk during a read.
In this case it's better to rename the existing table, recreate your table with the new column, insert into table_name select col1, col2, 'null or default for new col', col3 from temp_renamed_table and finally drop the old table that you renamed. The data pages will be much better organised and your IO will be faster despite looking the same from a SQL developer's point of view than when ALTER TABLE is used. If you have a clustered index the table will be reorganised when you add the column and page splits are less likely. You could also run ALTER TABLE x REBUILD if you have SQL Server 2008, don't have a clustered index and lots of time when users aren't using your table. It's hard to comment on your indexing strategy without knowing much more.
This is a much better reason for recreating the table than something cosmetic like column order.
It is an extremely bad practice to rearrange columns in a table. Do not even consider trying to do such a thing. The column order is irrelevant if you have used correct coding practices (such as never and I do mean never) using select *.
If you have used select * and you change the order of the columns, you are even more at risk of breaking code because the query may not be expecting Price as the third column but as the second column and that could seriously mess up a lot of things.
Further the only way to do this is to create another table, move your data and then drop the old table and rename the first one. Of course if you have FKs, they too have to be dropped and recreated. This takes alot of time if you hav ea large data set and could cause problems for users.
There is no cuircumstance where you would ever consider doing this for a table that in on production as it is just too risky. If you are in the early stages of design, you could consider doing it.
ALTER TABLE my_table ADD COLUMN column_name VARCHAR(50) AFTER col_name;
substituting whatever def you want for VARCHAR(50)
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Edit This is of course the right answer for a MySQL server... but this is not what the OP wants.