I want to estimate tablespace and block size in an Oracle database when I expect each table to hold from 100K to 500K records.
I cannot find an equation or method to estimate this.
It will depend a lot on your specific data, the number and type of indexes you have, and on things like use of compression and encryption of data at rest. There isn't a single one-size-fits-all formula to use. If your data is already stored somewhere else, use that as a guideline for the rough order of magnitude for the raw table data, then make your best guess on space required for indexes and leave room to grow based on the number and type of transactions you expect against the tables.
The trick is to give yourself just enough space to hold the data you have, plus room to grow in reasonably sized increments over time (you don't want to grow too often, so don't make the datafile autoextend increments too small).
You don't have to be super precise as far as Oracle is concerned. If you don't allocate enough space to start, your datafiles can grow over time as needed, or you can add more datafiles to your tablespace. If you allocate too much space to start, you can reduce the size of your datafiles so that you don't have a lot of empty space to back up for no reason.
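For illustration, a minimal sketch of both halves of that advice (the file path, tablespace name, and sizes are invented):

CREATE TABLESPACE app_data
  DATAFILE '/u01/oradata/ORCL/app_data01.dbf' SIZE 500M
  AUTOEXTEND ON NEXT 250M MAXSIZE 10G;   -- start modest, grow in fixed increments

-- If you over-allocated, shrink the datafile later
-- (Oracle only lets you shrink down to the highest-used block in the file)
ALTER DATABASE DATAFILE '/u01/oradata/ORCL/app_data01.dbf' RESIZE 2G;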
I'm somewhat confused about tablespaces and what determines the size needed for one to hold the data.
I have read the documentation, and many articles, including answers here in stackoverflow about tablespace, but I still don't get it.
Let's say I want to create 3 tables:
customer
product
sales
Does the above schema affect the size you choose for your tablespace, or is it completely irrelevant? If it is irrelevant, then what is relevant in this case?
Can someone please explain in simple terms for people who are new to this subject?
The size (and number) of data files assigned to a tablespace depends on the amount of data that you're going to be storing in your tables. In most organizations, it also depends on what size chunks your storage admins prefer to use, how long it takes to get additional storage space, and other organization-specific bits of information.
Estimating the size of a table can get a bit complicated depending on how close you want to get and how much knowledge you have about your data. For estimating the size of data files to allocate to a tablespace, though, you can generally get away with a pretty basic estimate and then just monitor actual utilization.
Let's say that your customer table has a customer_id column that is a numeric identifier, a name column that averages, say, 30 characters, and a create_date that tells you when the row was created. Roughly, that means that every row requires 7 bytes for the create_date, 30 bytes for the name, and, let's say, an average of 5 bytes for the customer_id, for a total of 42 bytes. If we expect to have, say, 1,000,000 customers in the first 6 months (we're an optimistic bunch), we'd expect our table to be about 42 MB in size. If we repeat the process for the other tables in the tablespace and add up the results, that gives us a guess at how big the data files we allocate would need to be to cover the first 6 months of operation.
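If you'd rather have Oracle do that arithmetic while accounting for block size and pctfree, the dbms_space package has a create_table_cost procedure that takes an average row size and a row count. A minimal sketch using the figures above (the tablespace name is invented):

SET SERVEROUTPUT ON
DECLARE
  l_used_bytes  NUMBER;
  l_alloc_bytes NUMBER;
BEGIN
  -- 42-byte average row, 1,000,000 rows, default 10% pctfree
  DBMS_SPACE.CREATE_TABLE_COST(
    tablespace_name => 'APP_DATA',
    avg_row_size    => 42,
    row_count       => 1000000,
    pct_free        => 10,
    used_bytes      => l_used_bytes,
    alloc_bytes     => l_alloc_bytes);
  DBMS_OUTPUT.PUT_LINE('Bytes used:      ' || l_used_bytes);
  DBMS_OUTPUT.PUT_LINE('Bytes allocated: ' || l_alloc_bytes);
END;
/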
Of course, in reality, there are lots of complications. You can't just add up the size of the columns to get the size of a row. You'd have to figure out how many rows would be in a block which may depend on patterns of how data changes over time. I'm ignoring things like pctfree that reserve space for future updates to rows. Plus your estimates for how many rows you're going to have and how big various strings will be are rarely particularly accurate. So the estimate you're coming up with is extremely rough. In this case, though, even if you're off by a factor of 2, it's not that big of a deal in general. Once you do the initial allocation, you'll want to monitor how much space is actually used. So you can always go in later and add files, increase the size of files, etc. if you're using more space than you guessed.
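As a starting point for that monitoring, a sketch of the kind of query you might run against the standard DBA views to compare allocated and free space per tablespace:

SELECT df.tablespace_name,
       ROUND(SUM(df.bytes) / 1024 / 1024)              AS allocated_mb,
       ROUND(MAX(NVL(fs.free_bytes, 0)) / 1024 / 1024) AS free_mb
FROM   dba_data_files df
       LEFT JOIN (SELECT tablespace_name, SUM(bytes) AS free_bytes
                  FROM   dba_free_space
                  GROUP  BY tablespace_name) fs
              ON fs.tablespace_name = df.tablespace_name
GROUP  BY df.tablespace_name
ORDER  BY df.tablespace_name;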
I have to create an index in a specific tablespace in an Oracle database. I would like to know if there is a way to tell how much space in the tablespace the creation of the index will take, so I can make sure that my tablespace is capable of holding the index.
The dbms_space package has a procedure create_index_cost that will tell you the number of bytes that would be allocated to the index segment (which is presumably what you care about if you're trying to determine whether it will fit in your tablespace) and the number of bytes of that allocation that would actually be used. This procedure relies on the statistics that have been gathered on the underlying table, however, so if those statistics are inaccurate, the procedure's estimates will also be inaccurate.
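A minimal sketch of calling it from SQL*Plus (the table, column, and tablespace names are invented; you pass the actual CREATE INDEX statement you intend to run, and the table's statistics need to be reasonably fresh):

SET SERVEROUTPUT ON
DECLARE
  l_used_bytes  NUMBER;
  l_alloc_bytes NUMBER;
BEGIN
  DBMS_SPACE.CREATE_INDEX_COST(
    ddl         => 'CREATE INDEX customer_name_idx ON customer (name) TABLESPACE app_idx',
    used_bytes  => l_used_bytes,
    alloc_bytes => l_alloc_bytes);
  DBMS_OUTPUT.PUT_LINE('Bytes used by index data:   ' || l_used_bytes);
  DBMS_OUTPUT.PUT_LINE('Bytes allocated to segment: ' || l_alloc_bytes);
END;
/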
A quick web search yields this (removed the actual calcs for that user's particular case):
A rough estimate of the space the index will need can be made by adding the expected actual length of each column, plus 6 bytes for the rowid, plus 2 bytes for the header, multiplied by the number of table rows that will have an entry ... we will use 20% for block overhead ...
The actual allocation will vary depending on your tablespace extent allocation method.
From here
Or a more detailed way of estimating here.
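If you want to plug real numbers into that rule of thumb without measuring column lengths by hand, a rough sketch that pulls them from the optimizer statistics in the data dictionary (assuming, purely for illustration, an index on the NAME column of a CUSTOMER table, with the 6-byte rowid, 2-byte header, and 20% overhead from above):

SELECT ROUND((SUM(c.avg_col_len) + 6 + 2) * t.num_rows * 1.2) AS est_index_bytes
FROM   user_tab_columns c
       JOIN user_tables t ON t.table_name = c.table_name
WHERE  c.table_name  = 'CUSTOMER'
AND    c.column_name IN ('NAME')
GROUP  BY t.num_rows;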
In a classical RDBMS it's relatively easy to calculate the maximum row size by adding up the maximum size of each field defined within a table. This value multiplied by the predicted number of rows gives the maximum table size, excluding indexes, logs, etc.
Today, in the era of storing unstructured data in structured ways, it's relatively hard to tell what the optimal table size will be.
Is there any way to calculate or predict table, or even database, growth and storage requirements without a sample data load?
What are your ways of calculating row size and planning storage capacity for an unstructured database?
It is pretty much the same. Find the average size of the data you need to persist and multiply it by your estimated transaction count per time unit.
Database engines may allocate datafile chunks exponentially (first 16 MB, then 32 MB, etc.), so you need to know how your DBMS engine works to translate the data size into physical storage size.
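For the "classical" part of the calculation, you can usually pull the declared maximum row size straight out of the data dictionary instead of adding column sizes by hand. An Oracle-flavoured sketch (the table name is invented; other engines have equivalent catalog views):

-- Declared maximum row size: sum of the maximum byte length of each column
SELECT SUM(data_length) AS max_row_bytes
FROM   user_tab_columns
WHERE  table_name = 'SALES';

-- Once representative data is loaded and statistics gathered,
-- avg_row_len * num_rows gives a much tighter estimate
SELECT avg_row_len * num_rows AS est_table_bytes
FROM   user_tables
WHERE  table_name = 'SALES';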
If I use a larger datatype where I know a smaller datatype would have been sufficient for the values I will insert into a table, will it affect performance in SQL Server in terms of speed or in any other way?
e.g. IsActive (0, 1, 2, 3), not more than 3 in any case.
I know I should use tinyint, but for reasons outside my control (consider it a compulsion), I am making every numeric field bigint and every character field nvarchar(max).
Please give statistics if possible, to help me try to overcome that compulsion.
I need some solid analysis that can really make someone rethink before choosing a datatype.
EDIT
Say I am using:
SELECT * FROM tblXYZ WHERE IsActive = 1
How will it be affected? Assume I have 1 million records.
Will it only waste memory, or will it hurt performance as well?
I know that the more pages there are, the more indexing effort is required, so performance will also be affected. But I need some statistics if possible.
You are basically wasting 7 bytes per bigint column per row. This will make your tables bigger, and thus fewer rows will be stored per page, so more IO will be needed to bring back the same number of rows than if you had used tinyint. If you have a billion-row table it will add up.
Defining this in statistical terms is somewhat difficult, you can literally do the maths and work out the additional IO overhead.
Let's take a table with 1 million rows, assume no page padding or compression, and use some simple figures.
Given a table whose row size is 100 bytes and that contains 10 tinyints, the number of rows per page (assuming no padding / fragmentation) is 80 (8096 / 100).
By using bigints, a total of 70 bytes would be added to the row size (10 fields that are 7 bytes larger each), giving a row size of 170 bytes and reducing the rows per page to 47.
For the 1 million rows this results in 12,500 pages for the tinyints and 21,277 pages for the bigints.
Taking a single disk reading sequentially, we might expect 300 IOs per second of sequential reading, and each read is 8k (i.e. a page).
The respective read times for this theoretical disk are then 41.6 seconds and 70.9 seconds, for a very theoretical scenario of a made-up table / row.
That however only applies to a scan; under an index seek the increase in IO would be relatively small, depending on how many of the bigints were in the index or clustered key. In terms of backup and restore, as mentioned, the data is expanded out and the time loss can be calculated as linear unless compression is at play.
In terms of memory caching, each byte wasted on a page on disk is a byte wasted in memory, but that only applies to the pages in memory. This is where it gets more complex, since the memory wastage depends on how many of the pages are sitting in the buffer pool, but for the above example it would be broadly 97.6 MB of data vs 166 MB of data, and assuming the entire table was scanned and thus in the buffer pool, you would be wasting roughly 68 MB of memory.
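If you want to measure the overhead on your own hardware instead of a theoretical disk, a rough T-SQL sketch (the table and column names are invented) that loads the same million rows with each datatype and compares the space actually reserved:

-- Two throwaway tables that differ only in the datatype of the flag column
CREATE TABLE dbo.Flags_tinyint (Id INT IDENTITY PRIMARY KEY, IsActive TINYINT NOT NULL);
CREATE TABLE dbo.Flags_bigint  (Id INT IDENTITY PRIMARY KEY, IsActive BIGINT  NOT NULL);

-- Load 1 million rows with values 0-3 into the tinyint table ...
INSERT INTO dbo.Flags_tinyint (IsActive)
SELECT TOP (1000000) ABS(CHECKSUM(NEWID())) % 4
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- ... and copy the same rows into the bigint table
INSERT INTO dbo.Flags_bigint (IsActive)
SELECT IsActive FROM dbo.Flags_tinyint;

-- Compare reserved / data / index space for the two tables
EXEC sp_spaceused 'dbo.Flags_tinyint';
EXEC sp_spaceused 'dbo.Flags_bigint';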
A lot of it comes down to space. Your bigints are going to take 8 times the space (8 bytes vs 1 byte for tinyint). Your nvarchar is going to take twice as many bytes as a varchar. Making it max won't affect much of anything.
This will really come into play if you're doing lookups on values. The indexes you will (hopefully) be applying will be much larger.
I'd at least pare it down to int. Bigint is way overkill. But something about this field is calling out to me that something else is wrong with the table as well. Maybe it's just the column name — IsActive sounds like it should be a boolean/bit column.
More than that, though, I'm concerned about your varchar(max) fields. Those will add up even faster.
All the 'wasted' space also comes into play for DR: if you are 4-6 times the size due to poor datatype choices, your recovery can take that much longer as well.
Not only do the extra pages require more IO to serve, you also effectively shrink your memory cache by the same factor. With billions of rows, depending on your server, you could be dealing with constant memory pressure and cache churn simply because you chose a datatype that was 8 times the size you needed it to be.
I've been using Oracle for quite some time, since Oracle 8i was released. I was new to the database at that time and was taught that it was best to use constant extent sizes when defining tablespaces.
From what I have read, it seems that today, using 10g/11g, Oracle can manage these extent sizes for you automatically, and that it may not keep extent sizes constant. I can easily see how this can use disk space more efficiently, but are there downsides to it? I'm thinking it may be time to let go of the past on this one (assuming my past teaching was correct in the first place).
Yes, except for very unusual cases it's time to let go of the past and use the new Oracle extent management features. Use locally-managed tablespaces (LMTs) and auto extent sizing and you won't have to think about this stuff again.
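A minimal sketch of what that looks like when creating a tablespace (the file path and sizes are invented; AUTOALLOCATE is the default extent allocation for locally-managed tablespaces anyway):

CREATE TABLESPACE orders_data
  DATAFILE '/u01/oradata/ORCL/orders_data01.dbf' SIZE 1G
  AUTOEXTEND ON NEXT 256M MAXSIZE 16G
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE
  SEGMENT SPACE MANAGEMENT AUTO;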
As a DBA, the variable extent sizing worried me at first, since in the 7.3 days I spent a lot of time reorganizing tablespaces to eliminate the fragmentation that resulted from extent allocation with non-zero percent increases (and you needed non-zero percent increases because your maximum number of extents was capped at different levels depending on the database block size used when you created the database). However, Oracle uses an algorithm to determine the rate and magnitude of extent size increases that effectively eliminates fragmentation.
Also, forget anything you have heard about how the optimum configuration is to have a table or index fit into a single extent, or that you can somehow manage I/O through extent configuration: this has never been true. In the days of dictionary-managed tablespaces there was probably some penalty to having thousands of extents managed in a dictionary table, but LMTs use bitmaps and this is not an issue. Oracle buffers blocks, not segment extents.
If you have unlimited disk space with instant access time, you don't have to care about extents at all.
You would just make every table INITIAL 100T NEXT 100T MAXEXTENTS UNLIMITED PCTINCREASE 0 and forget about extents for the next 300 years.
The problems arise when your disk space is not unlimited or access time varies.
Extents are intended to cope with data sparseness: when your data are fragmented, your HDD head has to jump from one place to another, which takes time.
The ideal situation is having all your data for each table to reside in one extent, while having data for the table you join most often to reside in the next extent, so everything can be read sequentially.
Note that access time also includes the time needed to figure out where your data resides. If your data are extremely sparse, extra lookups into the extent dictionaries are required.
Nowadays, disk space is not what matters, while access time still matters.
That's why Oracle created extent management.
This is less efficient in terms of space used than a hand-crafted extent layout, but more efficient in terms of access time.
So if you have enough disk space (i.e. your database will take less than half of the disk for 5 years), then just use automatic extents.