Clarification on the Snowflake Micro Partition Size?

I need clarification on the size of Snowflake's micro-partitions. The official Snowflake documentation states:
Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is smaller because data is always stored compressed).
However, in some places I see the following statement about micro-partition size:
Snowflake also stores multiple rows together in micro-partitions, which are variable-length data blocks of around 16 Mb size
So how much data does a micro-partition hold: 16 MB or 50-500 MB? Or does each micro-partition contain a data block that is about 16 MB in size?

The key point is compression:
Benefits of Micro-partitioning
As the name suggests, micro-partitions are small in size (50 to 500 MB, before compression), which enables extremely efficient DML and fine-grained pruning for faster queries.
Columns are also compressed individually within micro-partitions. Snowflake automatically determines the most efficient compression algorithm for the columns in each micro-partition.
The 50-500 MB range refers to uncompressed data; the micro-partition itself holds roughly 16 MB after compression.
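To make the relationship concrete, here is a minimal arithmetic sketch in Python (the 16 MB stored size and the 50-500 MB uncompressed range are the figures quoted above; the resulting ratios are only illustrative, not Snowflake internals):

    # Relate the documented uncompressed range to the ~16 MB stored size
    # of a micro-partition. Actual compression ratios depend on the data.
    UNCOMPRESSED_RANGE_MB = (50, 500)   # per the Snowflake docs, before compression
    STORED_SIZE_MB = 16                 # approximate size after compression

    for uncompressed_mb in UNCOMPRESSED_RANGE_MB:
        ratio = uncompressed_mb / STORED_SIZE_MB
        print(f"{uncompressed_mb} MB uncompressed -> ~{STORED_SIZE_MB} MB stored "
              f"(~{ratio:.0f}x compression)")

In other words, a single micro-partition can represent anywhere from roughly 3x to 30x compression, depending on the data.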

Related

How to estimate the RAM used in an aggregation stage in a MongoDB query?

According to the documentation, each stage must not exceed 100 MB of RAM. How can that limit not be exceeded if your collection contains billions of rows?
How to estimate the size of each stage?

What is the maximum file size limit of a SQLite db file containing one table on Android

What is the maximum size limit of a SQLite db file containing one table on Android devices? Is there any limit on the number of columns inside a table?
The answers to these and other questions can be found here: Limits In SQLite
The file size, as far as SQLite is concerned, will more than likely be constrained by the underlying file system rather than by SQLite's theoretical limit of 140 terabytes (281 TB as of Version 3.33.0 - see Update). The underlying restriction, as far as SQLite is concerned, is the maximum number of pages, which defaults to 1073741823 but can be as large as 2147483646, as per:
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the
database file from growing too large and consuming too much disk
space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to
1073741823, is the maximum number of pages allowed in a single
database file. An attempt to insert new data that would cause the
database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 2147483646.
When used with the maximum page size of 65536, this gives a maximum
SQLite database size of about 140 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at
run-time.
Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 2^64
(18446744073709551616 or about 1.8e+19). This limit is unreachable
since the maximum database size of 140 terabytes will be reached
first. A 140 terabytes database can hold no more than approximately
1e+13 rows, and then only if there are no indices and if each row
contains very little data.
Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different databases can have
page sizes that are powers of two between 512 and 65536, inclusive.
The maximum size of a database file is 2147483646 pages. At the
maximum page size of 65536 bytes, this translates into a maximum
database size of approximately 1.4e+14 bytes (140 terabytes, or 128
tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not
have access to hardware capable of reaching this limit. However, tests
do verify that SQLite behaves correctly and sanely when a database
reaches the maximum file size of the underlying filesystem (which is
usually much less than the maximum theoretical database size) and when
a database is unable to grow due to disk space exhaustion.
Update (1 June 2021)
As of SQLite Version 3.33.0 (not yet included with Android) the maximum number of pages has been increased (doubled), so the theoretical maximum database size is now 281 TB. As per:
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the database file from growing too large and consuming too much disk space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to 1073741823, is the maximum number of pages allowed in a single database file. An attempt to insert new data that would cause the database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 4294967294. When used with the maximum page size of 65536, this gives a maximum SQLite database size of about 281 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at run-time.
Maximum Database Size
Every database consists of one or more "pages". Within a single database, every page is the same size, but different databases can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281474 gigabytes or 256,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
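As a quick way to check how these figures combine for a given database, here is a minimal Python sketch (using the standard-library sqlite3 module; the database path is a placeholder) that reads the page size and maximum page count and multiplies them to get the theoretical maximum size for that database's current settings:

    import sqlite3

    # "example.db" is just a placeholder path for an existing database.
    conn = sqlite3.connect("example.db")

    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    max_page_count = conn.execute("PRAGMA max_page_count").fetchone()[0]

    # Theoretical maximum size for this database = max pages * bytes per page.
    max_bytes = page_size * max_page_count
    print(f"page_size      = {page_size} bytes")
    print(f"max_page_count = {max_page_count} pages")
    print(f"theoretical max database size ~ {max_bytes / 1e12:.1f} TB")

    conn.close()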
However, other limits may be of concern, so it is suggested that you study the document linked above.
The default maximum number of columns is 2000; you can change this at compile time to a maximum of 32767, as per:
Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper
bound on:
- The number of columns in a table
- The number of columns in an index
- The number of columns in a view
- The number of terms in the SET clause of an UPDATE statement
- The number of columns in the result set of a SELECT statement
- The number of terms in a GROUP BY or ORDER BY clause
- The number of values in an INSERT statement
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.
In most applications, the number of columns is small - a few dozen.
There are places in the SQLite code generator that use algorithms that
are O(N²) where N is the number of columns. So if you redefine
SQLITE_MAX_COLUMN to be a really huge number and you generate SQL that
uses a large number of columns, you may find that sqlite3_prepare_v2()
runs slowly.
The maximum number of columns can be lowered at run-time using the
sqlite3_limit(db,SQLITE_LIMIT_COLUMN,size) interface.
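The run-time interface mentioned in the last sentence is also exposed in Python's sqlite3 module (Connection.getlimit()/setlimit(), available since Python 3.11). A minimal sketch, assuming an in-memory database:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Read the current per-connection column limit (compile-time default is 2000).
    current = conn.getlimit(sqlite3.SQLITE_LIMIT_COLUMN)
    print(f"current column limit: {current}")

    # Lower the limit for this connection; setlimit() returns the previous value.
    previous = conn.setlimit(sqlite3.SQLITE_LIMIT_COLUMN, 100)
    print(f"column limit lowered from {previous} to "
          f"{conn.getlimit(sqlite3.SQLITE_LIMIT_COLUMN)}")

    conn.close()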

Why is the LMDB database taking more space than the actual data size?

I put around 11K keys & values into an LMDB database.
The LMDB database file size became 21 MB.
For the same data, LevelDB takes only 8 MB (with Snappy compression).
LMDB env info:
VERSION=3
format=bytevalue
type=btree
mapsize=1073741824
maxreaders=126
db_pagesize=4096
To check why the LMDB file size is larger, I iterated through all keys & values inside the database. The total size of all keys & values is 10 MB, but the actual size of the file is 21 MB.
What is the remaining 11 MB (21 MB - 10 MB) of file size used for?
If I compress the data before the put operation, the file only shrinks by 2 MB.
Why is the LMDB database file size larger than the actual data size?
Is there any way to shrink it?
The database is bigger than the raw data because LMDB has to do some bookkeeping to keep the data sorted. There is also per-record overhead: even if your record (key + value) is, say, 1 KB, LMDB allocates a fixed amount of space to store it (I don't know the exact value). This overhead is always to be expected.
Compression doesn't work well on small records.
LMDB doesn't support prefix or block compression. Your best bet is to use a key-value store that does, such as WiredTiger.
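To see where the extra space goes, you can inspect LMDB's page accounting directly. Here is a minimal sketch using the Python lmdb binding (the database path is a placeholder); env.stat() reports the page size and how many branch, leaf and overflow pages the main database uses:

    import lmdb

    # Open an existing environment read-only; "./mydb" is a placeholder path.
    env = lmdb.open("./mydb", readonly=True)

    stat = env.stat()
    page_size = stat["psize"]
    used_pages = stat["branch_pages"] + stat["leaf_pages"] + stat["overflow_pages"]

    print(f"entries            : {stat['entries']}")
    print(f"page size          : {page_size} bytes")
    print(f"pages in use       : {used_pages}")
    print(f"space in used pages: {used_pages * page_size / 1e6:.1f} MB")

    env.close()

The gap between the raw key/value bytes and the used-page total is the B-tree and per-page overhead; any further gap up to the file size is space the file has grown to but is not currently using.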

SQL Server is creating too many LOB pages

I have a strange situation with a SQL Server database where the actual data in the table is roughly 320 MiB. This is determined by summing up the DATALENGTH of all the columns, which ignores fragmentation, index space and other SQL Server internal overhead. The problem, though, is that the table is roughly 40 GiB in size and it's growing at an alarming rate, very disproportionate to the amount of data (in bytes or rows) that was inserted.
I used the sys.dm_db_index_physical_stats function to look at the physical data and the roughly 40 GiB of data is tied up in LOB_DATA.
Most of the 320 MiB that makes up the table contents is of type ntext. My question is: how come SQL Server has allocated 40 GiB of LOB_DATA when there's only roughly 310 MiB of ntext data?
Will the problem go away if we convert the column to nvarchar(max)? Are there any storage engine specifics regarding ntext and LOB_DATA that are causing the LOB_DATA pages not to be reclaimed? Why is it growing at such a disproportionate rate relative to the amount of changes being made?

How to calculate the row size of unstructured data?

In a classical RDBMS it's relatively easy to calculate the maximum row size by adding up the maximum size of each field defined within a table. That value, multiplied by the predicted number of rows, gives the maximum table size, excluding indexes, logs, etc.
Today, in the era of storing unstructured data in structured ways, it's relatively hard to tell what the table size will be.
Is there any way to calculate or predict table or even database growth and storage requirements without a sample data load?
What are your ways of calculating row size and planning storage capacity for an unstructured database?
It is pretty much the same: find the average size of the data you need to persist and multiply it by your estimated transaction count per time unit.
Database engines may allocate data-file chunks exponentially (first 16 MB, then 32 MB, etc.), so you need to know how your DBMS engine works to translate the data size into physical storage size.
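As a rough illustration of that estimate (all numbers below are hypothetical placeholders, and the doubling chunk allocation is only an assumed model of how an engine might grow its data files):

    # Back-of-the-envelope capacity estimate: average record size times
    # expected record count, rounded up to an exponentially growing
    # data-file chunk (16 MB, 32 MB, 64 MB, ...). All inputs are made up.
    AVG_RECORD_BYTES = 2_000      # average persisted record size (assumption)
    RECORDS_PER_DAY = 50_000      # estimated transactions per day (assumption)
    DAYS = 365

    raw_bytes = AVG_RECORD_BYTES * RECORDS_PER_DAY * DAYS

    # Model an engine that doubles its data-file chunk on each extension.
    chunk = 16 * 1024 * 1024
    allocated = 0
    while allocated < raw_bytes:
        allocated += chunk
        chunk *= 2

    print(f"raw data estimate : {raw_bytes / 1e9:.1f} GB")
    print(f"allocated on disk : {allocated / 1e9:.1f} GB")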
