PostgreSQL column size vs table size - database

I am trying to get the size of a single table, and then the size of each row.
When I use
SELECT pg_table_size(*mytable*) as DATA;
I get 393216 bytes.
Using
SELECT (SUM(pg_column_size(t) - 24)) FROM *mytable* AS t;
(as written here How to get each row size of a particular table in postgresql..?)
I get 560669 bytes.
560669 vs 393216 bytes - which one is real?

From https://www.postgresql.org/docs/14/functions-admin.html
pg_table_size - Computes the disk space used by the specified table,
excluding indexes (but including its TOAST table if any, free space
map, and visibility map).
So pg_table_size gives you the amount of disk space PostgreSQL is using for the table, plus some metadata that PostgreSQL keeps about the table (the Visibility Map and Free Space Map). Deleting a row will not decrease this number (unless you do a VACUUM FULL), so we wouldn't expect the disk space used by a table to match the sum of the data in each visible row. Instead, the disk space used by the table would normally be larger.
pg_column_size - Shows the number of bytes used to store any individual
data value. If applied directly to a table column value, this reflects
any compression that was done.
So, applied to a whole row as in pg_column_size(t.*), this returns the size of each row's data (including the row-header information stored on disk).
I'm not sure whether you'd consider the row-header information 'real', but it does take up space on your hard drive, so which number is correct depends on your use case.
Using an example table from a database I have:
SELECT pg_table_size('users');
-- 3751936 <-- the size of my 'users' table on disk, including table meta-data
SELECT (SUM(pg_column_size(t.*))) FROM users AS t;
-- 3483028 <-- the total size on disk of the visible rows, including row "header" metadata.
SELECT (SUM(pg_column_size(t.*)-24)) FROM users AS t;
-- 3069412 <-- the size of the data in visible rows, excluding row "header" meta-data
We'd expect each of these queries to return different numbers, and each is useful for a different purpose.
As for the specific numbers that you've posted (with the pg_column_size sum being larger than pg_table_size), I can't explain them.
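To see where the bytes reported by pg_table_size go, the total can be broken down per storage fork, and the largest rows can be listed individually. This is only a sketch against the 'users' example table above; the column aliases are made up for illustration, and ctid is used as a row locator because no key column is known.

```sql
-- Break the on-disk size of one table down into its parts.
SELECT pg_relation_size('users', 'main') AS heap_bytes,            -- table data pages only
       pg_relation_size('users', 'fsm')  AS free_space_map_bytes,  -- Free Space Map fork
       pg_relation_size('users', 'vm')   AS visibility_map_bytes,  -- Visibility Map fork
       pg_table_size('users')            AS table_bytes,           -- the forks above plus TOAST
       pg_indexes_size('users')          AS index_bytes,           -- all indexes on the table
       pg_total_relation_size('users')   AS total_bytes;           -- table_bytes + index_bytes

-- Size of each visible row (including the row header), largest first.
SELECT ctid,
       pg_column_size(t.*) AS row_bytes
FROM users AS t
ORDER BY row_bytes DESC
LIMIT 5;
```

Any difference between table_bytes and the sum of the three forks is the TOAST table, which pg_table_size includes.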

Related

Reading a table from Oracle is 10 times slower than reading from Sql-Server

I have a table with just 5 columns and 8500 rows. The following simple query takes around 8 seconds on an Oracle database, whereas if I import the same table into a Sql-Server database it takes less than 1 second.
SELECT CustomerList.* FROM CustomerList ORDER BY TypeID, OrderNr, Title
QUESTION: I am completely new to databases and have only started learning about them, but 8 seconds for 8500 records is far too long. Is there anything that I can try to resolve the issue?
UPDATE: I exported the table from the Oracle database as a text file and then imported the text file into another fresh Oracle database to create the same table. When I executed the above query against this newly created database, the execution time was again the same as before (i.e. around 8 seconds).
Regarding the High Water Mark (HWM): in Oracle, space for a table's rows is allocated in big chunks called 'extents'. When an extent is filled up with rows of data, a new extent is allocated. The HWM is the pointer to the highest allocated address.
If rows are deleted, the space they occupied remains allocated to that table and available for new rows without having to acquire more space for them. And the HWM remains. Even if you delete ALL of the rows (a simple DELETE FROM MYTABLE), all of the space remains allocated to the table and available for new rows without having to acquire more space. And the HWM remains.
So say you have a table with 1 billion rows. Then you delete all but one of those rows. You still have the space for 1 billion rows, and the HWM is set accordingly. Now, if you select from that table without a WHERE condition that would use an index (thus forcing a Full Table Scan, or FTS), Oracle still has to scan that billion-row space to find all of the rows, which could be scattered across the whole space. But when you insert those rows into another database (or even another table in the same database), you only need enough space for those rows. So selecting against the new table is correspondingly faster.
That is ONE possible cause of your issue.
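One way to check whether the HWM explanation applies is to compare the number of blocks below the high water mark with the number of rows. The following is a rough sketch, assuming the table lives in the current schema and was created with an unquoted name (so the data dictionary stores it as CUSTOMERLIST); it is not a diagnosis of the 8-second query itself.

```sql
-- Refresh optimizer statistics so NUM_ROWS and BLOCKS are current.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CUSTOMERLIST');
END;
/

-- A large BLOCKS value next to a small NUM_ROWS suggests mostly empty
-- space below the high water mark.
SELECT table_name, num_rows, blocks, avg_row_len
FROM   user_tables
WHERE  table_name = 'CUSTOMERLIST';

-- Rebuilding the segment resets the HWM. Indexes on the table become
-- UNUSABLE after a MOVE and have to be rebuilt afterwards.
ALTER TABLE CustomerList MOVE;
```

If BLOCKS is already small for 8500 rows, the HWM is unlikely to be the cause and the time is being spent elsewhere.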

How to enable in-row LOB storage in Sybase and when to consider enabling it?

I had some blocking on a table that contains IMAGE and TEXT columns (similar to this SO question). After some research, it seems that the in-row and off-row feature in Sybase ASE 15.7 can improve performance: if the LOB data is smaller than 4K in logical storage, it is placed on the same page as the rest of the row's values, which is called in-row (more info here).
Can anyone explain:
- How to enable this feature on the database? Is it enabled with the create table command, or with alter table?
- How to check if it's enabled?
- Why it might reduce or remove the blocking?
- Why Text/image datatypes might cause locks/blocks, and why enabling in-row storage would remove them?
You have to enable the option for each column: via an alter table against the column for an existing table, or in the column definition if it's a new table:
alter table tablename modify mycol in row (500)
Here I've said anything less than 500 bytes in length for the column will be stored in-row; anything over that in size will be stored off-row (the old default behaviour). This can massively shrink a table where you have lots of very small text/image columns and a large page size, as it avoids wasting a full page per row in the text chain.
Once enabled, it will show against the column in sp_help output. To check whether it's a benefit you need to consider:
Your Sybase dataserver page size (bigger page sizes waste more space for each text chain page)
The average size of data in your text/image column, as the smaller the data the more you will benefit from using in-row LOBs. If all your data is larger than the page size of your dataserver, there will be no benefit as the data will still be stored off-row as it was before the change.
You have to re-load the data into the table for the change to take effect so you can do this via a select into (to create a new copy of the table and data) or via BCP out/in.
There's some more info in the Sybase docs here:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc32300.1570/html/sqlug/CHECDADI.htm
In terms of blocking, where the data meets the criteria and is held in-row it would be read from the table itself rather than from the text chain which is effectively a big heap at the end of the table. You also get the benefit (size of data depending) of significant space savings on your table size which reduces IO and thus can help performance.

sql azure setting value to null increases table size

I had a uniqueidentifier field in SQL Server (SQL Azure to be precise) that I wanted to get rid of. Initially, when I ran the code mentioned in SQL Azure table size to determine the size of the table, it was about 0.19 MB.
As a first step I set all values in the uniqueidentifier field to null. There are no constraints/indexes that use the column. Now, when I ran the code to determine the table sizes, the table had increased in size to about 0.23 MB. There are no records being added to the table (it's in a staging environment).
I proceeded to delete the column and it still hovered at the same range.
Why does the database size show an increase when I delete a column? Any suggestions?
Setting a uniqueidentifier column to NULL does not change the record size in any way, since it is a fixed-size type (16 bytes). Dropping a fixed-size column does not change the record size either, unless it is the last column in the physical layout and the space can be reused later. ALTER TABLE ... DROP COLUMN is only a logical operation; it simply marks the column as dropped, see SQL Server Columns Under the Hood.
In order to reclaim the space you need to drop the column and then rebuild the clustered index of the table, see ALTER INDEX ... REBUILD.
For the record (since SHRINK is not allowed in SQL Azure anyway), on a standalone SQL Server SHRINK would have solved nothing: this is not about page reservation but about physical record size.
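The drop-then-rebuild sequence described above could look roughly like this. The table and column names (dbo.StagingTable, LegacyGuid) are hypothetical, and the sketch assumes the table has a clustered index, since ALTER INDEX ALL does not rebuild a heap.

```sql
-- Logical operation only: the column is marked as dropped, rows keep their size.
ALTER TABLE dbo.StagingTable DROP COLUMN LegacyGuid;

-- Rewrite every row without the dropped column by rebuilding all indexes,
-- including the clustered index that holds the table data.
ALTER INDEX ALL ON dbo.StagingTable REBUILD;

-- Check the result; pages are 8 KB each.
SELECT SUM(reserved_page_count) * 8 AS reserved_kb,
       SUM(used_page_count)     * 8 AS used_kb,
       SUM(row_count)               AS row_count_all_indexes
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.StagingTable');
```

Note that row_count_all_indexes counts each row once per index, so filter on index_id IN (0, 1) if only the table's own row count is wanted.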
It's counting the number of reserved pages to calculate the size. Deleting a column may reduce the number of pages that are actually utilized to store data, but the newly-freed pages are probably still reserved for future inserts.
I think you'd need to shrink the database to see the size decrease, as per: http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/ae698613-79d5-4f23-88b4-f42ee4b3ad46/
As an aside, I am fairly sure that setting the value of a non-variable-length column (like a GUID) to null will not save you any space at all; only deleting the column will do so. This is per Space used by nulls in database

Reduced Row size in SQL hasn't reduced table size

Can someone explain some behaviour I'm seeing in SQL Server 2005?
I've been tasked with reducing the size of our DB.
The table contains nearly 6 million records, and I calculated the row size as being 1990 bytes. I took a copy of the table, and reduced the row size down to 803 bytes, through various techniques.
When I compare the original table's Data Size (right-click properties or sp_spaceused) with the new table's, I'm seeing a saving of just 21.7 MB. This is nowhere near what I was expecting.
Here is how I calculated the row-size:
If the column was numeric/decimal then I used the MSDN size (http://msdn.microsoft.com/en-us/library/ms187746.aspx), for everything else I used syscolumns.length. If the column was nullable I added an extra byte.
Here are some of the changes I implemented.
Turned unnecessary nvarchars into varchars
Made columns NOT NULL
Reduced max length of varchar columns to suit actual data
Removed some unused columns
Changed a couple of datetime columns into smalldatetime
Turned some decimals into ints.
Merged 16 nullable BIT columns into a bit masked int.
From this, my calculations showed a 60% row size reduction and against a 6M row table I would have expected more than 21MB of saving. It went down from 2,762,536 KB to 2,740,816 KB.
Can someone please explain this behaviour to me?
p.s. This does not take into account any indexes.
The problem is that altering a table does not reclaim any space. Dropping a column is logical only: the column is hidden, not deleted. Modifying a column type will often result in adding a new column and hiding the previous one. All these operations increase the physical size of the table. To reclaim the space 'for real' you need to rebuild the table. With SQL 2008 and up you would issue an ALTER TABLE ... REBUILD. In SQL 2005 you can issue DBCC DBREINDEX(table).
I think you need to rebuild the clustered index on the table.
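A before/after comparison around the rebuild might be sketched as follows. dbo.BigTable is a hypothetical name standing in for the copied table; only one of the two rebuild statements is needed, depending on the server version.

```sql
-- Size before the rebuild (reserved, data, index_size, unused).
EXEC sp_spaceused N'dbo.BigTable';

-- SQL Server 2008 and up: rebuild the table itself.
ALTER TABLE dbo.BigTable REBUILD;

-- SQL Server 2005: rebuild all indexes on the table instead.
-- This reclaims the row space only if the table has a clustered index.
-- DBCC DBREINDEX ('dbo.BigTable');

-- Size after the rebuild; the data figure should now reflect the smaller rows.
EXEC sp_spaceused N'dbo.BigTable';
```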

How can I get the size in bytes of a table returned by a SQL query?

How can I get the size in bytes of a table returned by a SQL query in SSMS?
Are you looking for the size of the table, or the size of a row in the table? The latter is only readily available if all your columns are of fixed size, i.e. nchar and not nvarchar etc.
With variable-size columns you can take the maximum length of each column and sum these to get a maximum row size, but this won't accurately reflect your real row sizes.
select sum(max_length)
from sys.columns
where object_id = object_id('MyTable')
You might also create a query that returns DATALENGTH for each column in any particular row to get the total size of only that row.
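A per-row DATALENGTH query of the kind described above might look like this. The column names Id, Title and Notes are hypothetical placeholders for the real columns of MyTable, and the result covers only the column data, not the per-row storage overhead.

```sql
-- Sum the actual stored length of each column, row by row.
-- DATALENGTH returns NULL for NULL values, so ISNULL maps those to 0.
SELECT Id,
       ISNULL(DATALENGTH(Id), 0)
     + ISNULL(DATALENGTH(Title), 0)
     + ISNULL(DATALENGTH(Notes), 0) AS data_bytes
FROM MyTable
ORDER BY data_bytes DESC;
```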
SQL queries don't return tables, they return results. There is no API to determine the size of a result because results have streaming semantics: you read the result until the end, and you cannot know the size upfront. Sending the size upfront would require the server to first produce the result, store it somewhere, determine its size (number of rows), and then send the size followed by the result. Obviously, this is inefficient and completely undesirable. It is much better to start streaming the result as soon as it is available, without having to store it in between.
Perhaps you're looking for something else?
The size of a table in the database can always be determined from its number of pages, see sys.allocation_units. The helper procedure sp_spaceused can read and format this information for you.
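Reading the page counts directly from sys.allocation_units could be sketched like this, reusing the MyTable placeholder name from the earlier snippet. The totals cover every index and LOB allocation unit that belongs to the table; pages are 8 KB each.

```sql
-- Page counts for all allocation units of one table, converted to KB.
SELECT SUM(au.total_pages) * 8 AS reserved_kb,
       SUM(au.used_pages)  * 8 AS used_kb,
       SUM(au.data_pages)  * 8 AS data_kb
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  -- In-row and row-overflow units link through hobt_id, LOB units through partition_id.
  ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
  OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE p.object_id = OBJECT_ID('MyTable');
```

Running EXEC sp_spaceused 'MyTable' should give comparable figures in a formatted form.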
In SSMS only, you can "include client statistics" from one of the menus, which gives some information.
Otherwise, as per Remus' answer.

Resources