How to calculate row size of unstructured data? - database

In a classical RDBMS it's relatively easy to calculate the maximum row size by adding up the maximum size of each field defined within a table. This value multiplied by the predicted number of rows gives the maximum table size, excluding indexes, logs, etc.
Today, in the era of storing unstructured data in structured ways, it's relatively hard to tell what the optimal table size will be.
Is there any way to calculate or predict table or even database growth and storage requirements without a sample data load?
What are your ways of calculating row size and planning storage capacity for an unstructured database?

It is pretty much the same. Find the average size of the data you need to persist and multiply it by your estimated transaction count per time unit.
Database engines may allocate datafile chunks exponentially (first 16 MB, then 32 MB, etc.), so you need to know how your DBMS engine works to translate the data size into physical storage size.
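To make that arithmetic concrete, here is a minimal Python sketch. The average record size, transaction rate, planning horizon, and the 16 MB doubling allocation pattern are all assumptions for illustration; replace them with your engine's documented behaviour.

```python
# Rough storage estimate: average persisted size x expected transactions,
# then rounded up to the datafile chunks the engine actually allocates.
# All inputs below are assumptions for illustration only.

AVG_ROW_BYTES = 4 * 1024          # assumed average size of one persisted record
TX_PER_DAY = 50_000               # assumed inserts per day
DAYS = 365                        # planning horizon

raw_bytes = AVG_ROW_BYTES * TX_PER_DAY * DAYS

# Model an engine that grows its datafile exponentially: 16 MB, 32 MB, 64 MB, ...
# (check your DBMS documentation for its real allocation strategy).
allocated = 0
chunk = 16 * 1024 * 1024
while allocated < raw_bytes:
    allocated += chunk
    chunk *= 2

print(f"raw data:  {raw_bytes / 1024**3:.1f} GiB")
print(f"allocated: {allocated / 1024**3:.1f} GiB (after exponential chunk growth)")
```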

Related

In Postgres, is overall indexing time for a column dependent on row count or on table disk space usage?

I'm working with a few tables that have large row counts and a commonly named column (table_id). I now intend to add an index to this column in each table. One or two of these tables use 10x more space than the others, but let's say for simplicity that all tables have the same row count. Basically, the 10x extra space is because some tables have many more columns than others. I have two inter-related questions:
Is overall indexing time a function of the table's disk usage, or just of the row count?
Additionally, would duplicate values in a column speed up the indexing time at all, or would it actually slow down indexing?
If indexing time is only dependent on row count, then all my tables should get indexed at the same speed. I'm sure someone could do benchmark tests to answer all these questions, but my disks are currently tied up indexing those tables.
The speed of indexing depends on the following factors:
The row count: this is one of the factors with the biggest effect on indexing speed.
The column type (int, text, bigint, json) also influences indexing speed.
Duplicate data mainly affects index size, not indexing speed; it may have a very slight effect on speed. If a column contains a lot of duplicate data, its index will be smaller.
Disk speed can also affect index creation. For example, if the index is created in a different tablespace that is configured to sit on another disk, and that disk is an SSD while the table's data is on a regular HDD, index creation will be faster.
Also, PostgreSQL has memory-usage and other configuration parameters that affect index creation speed, so if parameters such as buffer memory are set high, indexing will be faster.
The speed of CREATE INDEX depends on several factors:
the kind of index
the number of rows
the speed of your disks
the setting of maintenance_work_mem and max_parallel_maintenance_workers
The effort to sort the data grows with O(n * log(n)), where n is the number of rows. Reading and writing should grow linearly with the number of rows.
I am not certain, but I'd say that duplicate rows should slow down indexing a little bit from v13 on, where B-tree index deduplication was introduced.
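As a concrete illustration of the last two factors, here is a minimal Python sketch (using psycopg2, with an assumed connection string, table name, and setting values) that raises the session's maintenance memory and parallel worker settings before building the index on table_id:

```python
# Sketch: build the index with session-level settings that affect build speed.
# Connection details, table/index names, and the values chosen are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # assumed DSN
conn.autocommit = True                          # avoid wrapping the DDL in an explicit transaction
with conn.cursor() as cur:
    # More sort memory and more parallel workers generally speed up the build
    # (values here are illustrative, not recommendations).
    cur.execute("SET maintenance_work_mem = '1GB'")
    cur.execute("SET max_parallel_maintenance_workers = 4")
    cur.execute("CREATE INDEX my_table_table_id_idx ON my_table (table_id)")
conn.close()
```

Both SET statements only apply to the current session here, so they do not disturb the server-wide configuration.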

How to estimate tablespace size in Oracle

I want to estimate tablespace and block size in an Oracle database when I expect between 100K and 500K records in each table.
I cannot find an equation or a way to estimate this.
It will depend a lot on your specific data, the number and type of indexes you have, and on things like use of compression and encryption of data at rest. There isn't a single one-size-fits-all formula to use. If your data is already stored somewhere else, use that as a guideline for the rough order of magnitude for the raw table data, then make your best guess on space required for indexes and leave room to grow based on the number and type of transactions you expect against the tables.
The trick is to give yourself just enough space to hold the data you have, plus room to grow in reasonably sized increments over time (you don't want to grow too often, so don't make the datafile auto expand increments too small).
You don't have to be super precise as far as Oracle is concerned. If you don't allocate enough space to start, your datafiles can grow over time as needed, or you can add more datafiles to your tablespace. If you allocate too much space to start, you can reduce the size of your datafiles so that you don't have a lot of empty space to backup for no reason.
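For a rough starting point in the 100K-500K row range mentioned in the question, a back-of-the-envelope calculation like the following Python sketch can help; the average row size, index overhead, and growth headroom factors are assumptions to be replaced with values derived from your own data:

```python
# Back-of-the-envelope tablespace sizing, assuming you can sample or guess
# an average row size. The overhead and headroom factors are assumptions;
# compression, encryption, and your actual indexes will change them.

AVG_ROW_BYTES = 500          # assumed average row size, ideally taken from existing data
INDEX_OVERHEAD = 1.5         # assumed: indexes add roughly 50% on top of table data
GROWTH_HEADROOM = 1.3        # assumed: leave roughly 30% free space to grow into

for rows in (100_000, 250_000, 500_000):
    est_bytes = rows * AVG_ROW_BYTES * INDEX_OVERHEAD * GROWTH_HEADROOM
    print(f"{rows:>7} rows -> ~{est_bytes / 1024**2:.0f} MiB")
```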

How to estimate the RAM used in an aggregation stage in a MongoDB query?

According to the documentation, each stage must not exceed 100 MB of RAM. How can you avoid exceeding this amount if your collection contains billions of rows?
How to estimate the size of each stage?
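There is no exact formula for this, but one way to get a rough, hedged estimate is to multiply an assumed average BSON document size by the number of documents a blocking stage (such as a $sort or $group) has to hold at once, and compare that with the documented 100 MB limit. The Python sketch below is only a ballpark model, not how the server actually accounts for memory:

```python
# Very rough model (not MongoDB's real accounting): a blocking stage must hold
# its working set in memory, so estimate
#   documents buffered by the stage x average BSON document size
# and compare it with the documented 100 MB per-stage limit.
# Both inputs below are assumptions you would measure on your own data.

STAGE_LIMIT_BYTES = 100 * 1024 * 1024

avg_doc_bytes = 2 * 1024        # assumed average BSON document size
docs_in_stage = 200_000         # assumed documents the stage must buffer

estimate = avg_doc_bytes * docs_in_stage
verdict = "over" if estimate > STAGE_LIMIT_BYTES else "under"
print(f"estimated stage working set: {estimate / 1024**2:.0f} MiB ({verdict} the 100 MB limit)")
```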

What is the maximum file size limit of a SQLite db file containing one table on android

What is the maximum size limit of a SQLite db file containing one table on Android devices? Is there any limit on the number of columns inside a table?
The answers to these questions and others can be found here: Limits In SQLite
File size, as far as SQLite is concerned, will more than likely be constrained by the underlying file system rather than by SQLite's theoretical limit of 140 terabytes (281 TB as of Version 3.33.0 - see the update below). The underlying restriction, as far as SQLite is concerned, is the maximum number of pages, which defaults to 1073741823 but can be as large as 2147483646, as per :-
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the
database file from growing too large and consuming too much disk
space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to
1073741823, is the maximum number of pages allowed in a single
database file. An attempt to insert new data that would cause the
database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 2147483646.
When used with the maximum page size of 65536, this gives a maximum
SQLite database size of about 140 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at
run-time.
Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 2^64
(18446744073709551616 or about 1.8e+19). This limit is unreachable
since the maximum database size of 140 terabytes will be reached
first. A 140 terabytes database can hold no more than approximately
1e+13 rows, and then only if there are no indices and if each row
contains very little data.
Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different database can have
page sizes that are powers of two between 512 and 65536, inclusive.
The maximum size of a database file is 2147483646 pages. At the
maximum page size of 65536 bytes, this translates into a maximum
database size of approximately 1.4e+14 bytes (140 terabytes, or 128
tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not
have access to hardware capable of reaching this limit. However, tests
do verify that SQLite behaves correctly and sanely when a database
reaches the maximum file size of the underlying filesystem (which is
usually much less than the maximum theoretical database size) and when
a database is unable to grow due to disk space exhaustion.
Update (1 June 2021)
As of SQLite Version 3.33.0 (not yet included with Android) the maximum page count has been increased (doubled). So the theoretical maximum database size is now 281 TB. As per :-
Maximum Number Of Pages In A Database File
SQLite is able to limit the size of a database file to prevent the database file from growing too large and consuming too much disk space. The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to 1073741823, is the maximum number of pages allowed in a single database file. An attempt to insert new data that would cause the database file to grow larger than this will return SQLITE_FULL.
The largest possible setting for SQLITE_MAX_PAGE_COUNT is 4294967294. When used with the maximum page size of 65536, this gives a maximum SQLite database size of about 281 terabytes.
The max_page_count PRAGMA can be used to raise or lower this limit at run-time.
Maximum Database Size
Every database consists of one or more "pages". Within a single database, every page is the same size, but different database can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 1.4e+14 bytes (281 terabytes, or 256 tebibytes, or 281474 gigabytes or 256,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
However, other limits may be of concern, so it is suggested that you study the document linked above.
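The size ceilings quoted above are just page count times page size; the short Python sketch below reproduces that arithmetic and reads the effective page_size and max_page_count of a database (the file name is assumed) using the max_page_count PRAGMA mentioned in the documentation:

```python
# The arithmetic behind the quoted limits, plus reading the effective values
# at run time with Python's bundled sqlite3 module.
import sqlite3

# Largest SQLITE_MAX_PAGE_COUNT x maximum page size (64 KiB), as quoted above.
print(2_147_483_646 * 65_536)   # ~1.4e14 bytes, i.e. ~140 TB (before 3.33.0)
print(4_294_967_294 * 65_536)   # ~2.8e14 bytes, i.e. ~281 TB (3.33.0 and later)

# What this particular database is actually configured with (file name assumed):
con = sqlite3.connect("example.db")
page_size = con.execute("PRAGMA page_size").fetchone()[0]
max_pages = con.execute("PRAGMA max_page_count").fetchone()[0]
print(f"page_size={page_size}, max_page_count={max_pages}, "
      f"ceiling={page_size * max_pages / 1024**4:.1f} TiB")
con.close()
```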
The default maximum number of columns is 2000; you can change this at compile time to a maximum of 32767, as per :-
Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper bound on:
The number of columns in a table
The number of columns in an index
The number of columns in a view
The number of terms in the SET clause of an UPDATE statement
The number of columns in the result set of a SELECT statement
The number of terms in a GROUP BY or ORDER BY clause
The number of values in an INSERT statement
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.
In most applications, the number of columns is small - a few dozen.
There are places in the SQLite code generator that use algorithms that
are O(N²) where N is the number of columns. So if you redefine
SQLITE_MAX_COLUMN to be a really huge number and you generate SQL that
uses a large number of columns, you may find that sqlite3_prepare_v2()
runs slowly.
The maximum number of columns can be lowered at run-time using the
sqlite3_limit(db,SQLITE_LIMIT_COLUMN,size) interface.
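For completeness, the sqlite3_limit() interface mentioned above is also reachable from CPython 3.11+ as Connection.setlimit(); the sketch below lowers the column limit at run time (the limit value of 100 is just an example):

```python
# Lowering the run-time column limit via the sqlite3_limit() interface quoted
# above. Requires Python 3.11+, whose sqlite3 module exposes it as
# Connection.setlimit(); the value 100 is just an example.
import sqlite3

con = sqlite3.connect(":memory:")
previous = con.setlimit(sqlite3.SQLITE_LIMIT_COLUMN, 100)   # returns the prior limit
print("previous column limit:", previous)
print("current column limit:", con.getlimit(sqlite3.SQLITE_LIMIT_COLUMN))
con.close()
```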

Does DynamoDB scale when partition key bucket gets huge?

Does DynamoDB scale when partition key bucket gets huge? What are the possible solutions if this is really a problem?
TL;DR - Yes, DynamoDB scales very well as long as you use the service as intended!
As a user of Dynamo your responsibility is to choose a partition key (and range key combination if need be) that provides a relatively uniform distribution of the data and to allocate enough read and write capacity for your use case. If you do use a range key, this means you should aim to have approximately the same number of elements for each of the partition key values in your table.
As long as you follow this rule, Dynamo will scale very well. Even when you hit the size limit for a partition, Dynamo will automatically split the data in the original partition into two equally sized partitions, each of which will receive about half the data (again, as long as you did a good job of choosing the partition key and range key). This is very well explained in the DynamoDB documentation.
Of course, as your table grows and you get more and more partitions, you will have to allocate more and more read and write capacity to ensure enough is provisioned to sustain all partitions of your table. The capacity is equally distributed to all partitions in your table (except for splits - though, again, if the distribution is uniform even splits will receive capacity uniformly).
For reference, see also: how partitions work.
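To make the capacity-distribution point concrete, here is a small back-of-the-envelope Python sketch; the provisioned throughput and partition count are made-up numbers, and DynamoDB's real partition management is more involved:

```python
# Illustration of the even capacity split described above (numbers are assumed).
# With provisioned capacity spread evenly over partitions, a single hot
# partition key can only use its partition's share, even if the table as a
# whole has plenty of headroom.

provisioned_rcu = 3_000     # assumed read capacity units for the whole table
partitions = 10             # assumed number of physical partitions

per_partition_rcu = provisioned_rcu / partitions
print(f"each partition gets ~{per_partition_rcu:.0f} RCU")
print("a skewed partition key that concentrates reads on one partition is "
      "limited to that share, which is why a uniform key distribution matters")
```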
