I am using Sybase Adaptive Server Enterprise 15.7 and I have created a table like
create table student(
rollNum int identity,
name varchar(16),
primary key(rollNum)
)with identity_gap = 50
When records are inserted, rollNum jumps:
from 3 to 51, where I expected it to jump to 53, maintaining a gap of 50
from 60 to 101, where I expected it to jump to 110, maintaining a gap of 50
Is this behaviour expected, or am I missing something?
You're misunderstanding how the identity gap setting works. Your identity gap value (50) is reserved as a block in memory at once and then allocated as required (i.e. values 1 to 50). When those values are all used up, a new block is reserved at that point.
If the instance is killed or restarted before the first block is fully used, the next block (51-100, since 1-50 was previously allocated) is allocated on restart, and so the next identity value is 51.
The identity gap setting ensures that you never get a jump of more than the value set (50) between values; it is not an absolute gap value.
This is why it's important not to set the identity gap too low on heavily inserted tables, as there could be a small performance hit in constantly allocating small blocks of values each time all values are taken. For example on a very heavily hit table you may want to consider an identity gap of 1000 or even 10000 to prevent constant allocation of blocks of values.
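If the gap you chose turns out to be too small for a table's insert rate, it can be changed in place. A minimal sketch, assuming the student table from the question (the value 1000 is illustrative, not a recommendation for your workload):

```sql
-- Change the identity_gap attribute of an existing table in Sybase ASE.
-- Pick the value based on how heavily the table is inserted into.
sp_chgattribute 'student', 'identity_gap', 1000
```

The new gap takes effect for subsequent block allocations; already-allocated blocks are unaffected.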
Related
I have been having a challenge with identity columns jumping after some server restarts. For example, the column counts from 1, 2, 3, 4, then later jumps to 108, 109, 110, then later jumps to 10001, 10002, 10003.
I am currently managing the IDs manually through triggers, but this is an expensive exercise over time.
An option to avoid the identity cache at the table level is to use a sequence (with no cache) instead of an identity column.
create sequence dbo.YourTableId as int minvalue 1 start with 1 increment by 1 no cache;
GO
create table dbo.YourTable
(
Id int not null constraint DF_YourTable_Id default (next value for dbo.YourTableId),
[Name] varchar(50) not null,
...
CREATE SEQUENCE (Transact-SQL)
Warning: setting the sequence to no cache may impact the insertion performance.
To find a compromise you can set the cache size to a smaller number than the default value. For example, use cache 10 instead of no cache.
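As a sketch of that compromise, the cache size of an existing sequence can be adjusted without recreating it (sequence name taken from the example above):

```sql
-- Smaller cache: at most 10 values can be lost on an unexpected shutdown.
ALTER SEQUENCE dbo.YourTableId CACHE 10;
-- Or disable caching entirely (no gaps on restart, slowest inserts):
-- ALTER SEQUENCE dbo.YourTableId NO CACHE;
```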
Here is more.
I am running an MS Access database on SQL Server, managed through SQL Server Management Studio 18. For some reason, when I created a new entry, 994 identity values were skipped. My last indexed number was 19311, and then it suddenly jumped to 20305 when captured. What can I do to let it run from 19311 onward again?
This is pretty common.
An identity seed is allocated before a query is committed. This means that if you run a query that inserts 100 records, but press Cancel at the prompt asking whether you actually want to add those 100 records, the identity seed is still incremented by 100. The same goes for copy-pasting records and many, many other operations.
You shouldn't need to prevent this from happening. Identity values are not meant to convey any meaning, and there shouldn't be a real need for changing them. If you've set your identity column to an Int(8) or Long Integer, you still have plenty of numbers to use.
SQL Server explicitly blocks updating an identity column, and you also can't reseed an identity column below the initially set seed. This means: as soon as you've inserted number 20305, you can't reset the seed to a lower number than 20305.
You can work around that limitation by deleting all records higher than 20305, and then running DBCC CHECKIDENT ( table_name ) on SQL server with your table name to reset the seed to the highest occurring value. You can then re-add the deleted records.
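A minimal sketch of that workaround, using an explicit RESEED value; the table and column names are placeholders, and 19311 is the last value you want to keep:

```sql
-- Remove the rows created after the jump (back them up first if you need them).
DELETE FROM dbo.YourTable WHERE Id > 19311;
-- Reseed so that the next insert receives 19312.
DBCC CHECKIDENT ('dbo.YourTable', RESEED, 19311);
```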
See more on this Q&A for reclaiming the lost numbers, though I certainly advise against it.
I have an SSIS package that exports 2.5 GB of data, containing 10 million records, into a SQL Server database which has 10 partitions, including the PRIMARY filegroup.
Before changing the default Maximum insert commit size, i.e. "2147483647", and Rows per batch, the transformation took 7 minutes with the fast load option.
But after changing them to more sensible values using the formula below, the execution took only 2 minutes.
FYI: DefaultBufferMaxRows and DefaultBufferSize were at their default values in both scenarios, i.e. 10000 and 10 MB respectively.
The calculations below were used to determine Maximum insert commit size and Rows per batch.
1) Calculated the length of a record from the source being transferred, which comes to around 1038 bytes:
CREATE TABLE [dbo].[Game_DATA2](
[ID] [int] IDENTITY(1,1) NOT NULL, -- AUTO CALCULATED
[Number] [varchar](255) NOT NULL, -- 255 bytes
[AccountTypeId] [int] NOT NULL, -- 4 bytes
[Amount] [float] NOT NULL, -- 8 bytes
[CashAccountNumber] [varchar](255) NULL, -- 255 bytes
[StartDate] [datetime] NULL, -- 8 bytes
[Status] [varchar](255) NOT NULL, -- 255 bytes
[ClientCardNumber] [varchar](255) NULL -- 255 bytes
)
2) Rows per batch = packet_size / bytes per record = 32767 / 1038 = approx. 32.
3) Maximum insert commit size = packet size * number of transactions = 32767 * 100 = 3276700.
(Packet size and number of transactions are variables that can be changed as per requirements.)
Question:
Is there any relationship between Rows per batch and Maximum insert commit size? There is no information about them in the archived article on tuning DFT (Data Flow Task) execution.
Do these settings work together with DefaultBufferMaxSize and DefaultBufferMaxRows? If yes, how?
These parameters apply only to a DFT OLE DB Destination in Fast Load mode. An OLE DB Destination in Fast Load issues an insert bulk command. The two parameters control it in the following way:
Maximum insert commit size - controls how much data is inserted in a single batch. So, if you have MICS set to 5000 and you have 9000 rows and you encounter an error in the first 5000 rows, the entire batch of 5000 will be rolled back. MICS equates to the BATCHSIZE argument of the BULK INSERT Transact-SQL command.
Rows per batch - merely a hint to the query optimizer. It should be set to the actual expected number of rows. RPB equates to the ROWS_PER_BATCH argument of the BULK INSERT Transact-SQL command.
Specifying a value for the MICS will have a few effects. Each batch is copied to the transaction log, which will cause it to grow quickly, but offers the ability to back up that transaction log after each batch. Also, having a large batch will negatively affect memory if you have indexes on the target table, and if you are not using table locking, you might have more blocking going on.
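For reference, the two BULK INSERT arguments these settings map to look like this (file path, table, and values are placeholders, not taken from the question):

```sql
BULK INSERT dbo.Game_DATA2
FROM 'C:\data\game_data.csv'
WITH (
    BATCHSIZE = 5000,          -- counterpart of Maximum insert commit size
    ROWS_PER_BATCH = 10000000, -- optimizer hint: total expected rows
    TABLOCK                    -- table lock often helps bulk-load performance
);
```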
BULK INSERT (Transact-SQL) - MS Article on this command.
DefaultBufferMaxSize and DefaultBufferMaxRows control RAM buffer management inside the DFT itself, and do not interact with the options mentioned above.
Rows per batch - The default value for this setting is -1 which specifies all incoming rows will be treated as a single batch. You can change this default behavior and break all incoming rows into multiple batches. The allowed value is only positive integer which specifies the maximum number of rows in a batch.
Maximum insert commit size - The default value for this setting is '2147483647' (the largest value for a 4-byte integer type), which specifies that all incoming rows will be committed once, on successful completion. You can specify a positive value for this setting to indicate that a commit will be done for each batch of that many records. You might be wondering whether changing the default value for this setting puts overhead on the dataflow engine by committing several times. Yes, that is true, but at the same time it relieves the pressure on the transaction log and tempdb to grow tremendously, specifically during high volume data transfers.
The above two settings are very important to understand in order to improve the performance of tempdb and the transaction log. For example, if you leave Maximum insert commit size at its default, the transaction log and tempdb will keep growing during the extraction process, and if you are transferring a high volume of data, tempdb will soon run out of space; as a result, your extraction will fail. So it is recommended to set these to an optimum value based on your environment.
Note: The above recommendations are based on experience gained working with DTS and SSIS over the last couple of years. But as noted before, there are other factors that impact performance, one of them being infrastructure and network. You should do thorough testing before putting these changes into your production environment.
Dear Harsimranjeet Singh,
Based on my personal experience, Rows_Per_Batch determines the number of rows per batch that the OLE DB Destination receives from the DFT component, whereas DefaultBufferMaxRows determines the batch size of the DFT itself; so DefaultBufferMaxRows depends on the specification of the SSIS server, while Rows_Per_Batch depends on the destination server, and each must be set according to its own conditions.
Also, Maximum_Insert_Commit_Size determines the number of records after which the data is written to the log and committed; decreasing this number increases the number of log writes, which is bad, but it keeps MSDB (a system database) from inflating, which is very good for performance.
Another point is the relationship between DefaultBufferMaxRows and DefaultBufferSize, which must be set together. DefaultBufferMaxRows multiplied by the size of each record should be approximately equal to DefaultBufferSize; if the product is bigger, SSIS reduces the row count to fit, and if it is smaller than the minimum buffer size, SSIS increases it to reach the minimum buffer size. Such adjustments can seriously reduce the performance of your package.
Good Luck!
I'm looking for ways to reduce memory consumption by SQLite3 in my application.
At each execution it creates a table with the following schema:
(main TEXT NOT NULL PRIMARY KEY UNIQUE, count INTEGER DEFAULT 0)
After that, the database is filled with 50k operations per second. Write only.
When an item already exists, it updates "count" using an update query (I think this is called an UPSERT). These are my queries:
INSERT OR IGNORE INTO table (main) VALUES (#SEQ);
UPDATE table SET count=count+1 WHERE main = #SEQ;
This way, with 5 million operations per transaction, I can write really fast to the DB.
I don't really care about disk space for this problem, but I have a very limited RAM space. Thus, I can't waste too much memory.
sqlite3_memory_used() reports that memory consumption grows to almost 3 GB during execution. If I limit it to 2 GB through sqlite3_soft_heap_limit64(), database operation performance drops to almost zero when the limit is reached.
I had to raise the cache size to 1M pages (page size is the default) to reach desirable performance.
What can I do to reduce memory consumption?
It seems that the high memory consumption may be caused by too many operations being concentrated in one big transaction. Committing smaller transactions, e.g. every 1M operations, may help. 5M operations per transaction consumes too much memory.
However, you'll have to balance operation speed against memory usage.
If smaller transactions are not an option, PRAGMA shrink_memory may be a choice.
Use sqlite3_status() with SQLITE_STATUS_MEMORY_USED to trace the dynamic memory allocation and locate the bottleneck.
I would:
prepare the statements (if you're not doing it already)
lower the amount of INSERTs per transaction (10 sec = 500,000 sounds appropriate)
use PRAGMA locking_mode = EXCLUSIVE; if you can
Also (in case you don't know), PRAGMA cache_size is in pages, not in MBs. Make sure your target memory is defined as PRAGMA cache_size * PRAGMA page_size, or in SQLite >= 3.7.10 you can also use PRAGMA cache_size = -kibibytes;. Setting it to 1 M(illion) pages would result in 1 or 2 GB.
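As a sketch of that arithmetic, here are two ways to pin the cache to roughly the same memory budget (the 256 MiB target is illustrative, and the first form assumes the old 1024-byte default page size):

```sql
-- Option A: size in pages.  262144 pages x 1024-byte pages = 256 MiB.
PRAGMA cache_size = 262144;
-- Option B (SQLite >= 3.7.10): negative value = size in KiB,
-- independent of page size.  -262144 KiB = 256 MiB.
PRAGMA cache_size = -262144;
```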
I'm curious how cache_size helps in INSERTs though...
You can also benchmark whether PRAGMA temp_store = FILE; makes a difference.
And of course, whenever your database is not being written to:
PRAGMA shrink_memory;
VACUUM;
Depending on what you're doing with the database, these might also help:
PRAGMA auto_vacuum = 1|2;
PRAGMA secure_delete = ON;
I ran some tests with the following pragmas:
busy_timeout=0;
cache_size=8192;
encoding="UTF-8";
foreign_keys=ON;
journal_mode=WAL;
legacy_file_format=OFF;
synchronous=NORMAL;
temp_store=MEMORY;
Test #1:
INSERT OR IGNORE INTO test (time) VALUES (?);
UPDATE test SET count = count + 1 WHERE time = ?;
Peaked ~109k updates per second.
Test #2:
REPLACE INTO test (time, count) VALUES
(?, coalesce((SELECT count FROM test WHERE time = ? LIMIT 1) + 1, 1));
Peaked at ~120k updates per second.
I also tried PRAGMA temp_store = FILE; and the updates dropped by ~1-2k per second.
For 7M updates in a transaction, the journal_mode=WAL is slower than all the others.
I populated a database with 35,839,987 records and now my setup is taking nearly 4 seconds per each batch of 65521 updates - however, it doesn't even reach 16 MB of memory consumption.
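A third variant worth benchmarking, if upgrading is an option, is the native UPSERT added in SQLite 3.24.0 (2018), which collapses the two statements into one; this assumes the time column has a UNIQUE constraint or index:

```sql
-- Insert a new row with count 1, or bump count if time already exists.
INSERT INTO test (time, count) VALUES (?, 1)
ON CONFLICT(time) DO UPDATE SET count = count + 1;
```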
Ok, here's another one:
Indexes on INTEGER PRIMARY KEY columns (don't do it)
When you create a column with INTEGER PRIMARY KEY, SQLite uses this
column as the key for (index to) the table structure. This is a hidden
index (as it isn't displayed in SQLite_Master table) on this column.
Adding another index on the column is not needed and will never be
used. In addition it will slow INSERT, DELETE and UPDATE operations
down.
You seem to be defining your PK as NOT NULL + UNIQUE. PK is UNIQUE implicitly.
Assuming that all the operations in one transaction are distributed all over the table so that all pages of the table need to be accessed, the size of the working set is:
about 1 GB for the table's data, plus
about 1 GB for the index on the main column, plus
about 1 GB for the original data of all the table's pages changed in the transaction (probably all of them).
You could try to reduce the amount of data that gets changed for each operation by moving the count column into a separate table:
CREATE TABLE main_lookup(main TEXT NOT NULL UNIQUE, rowid INTEGER PRIMARY KEY);
CREATE TABLE counters(rowid INTEGER PRIMARY KEY, count INTEGER DEFAULT 0);
Then, for each operation:
SELECT rowid FROM main_lookup WHERE main = #SEQ;
if not exists:
INSERT INTO main_lookup(main) VALUES(#SEQ);
--read the inserted rowid
INSERT INTO counters VALUES(#rowid, 0);
UPDATE counters SET count=count+1 WHERE rowid = #rowid;
In C, the inserted rowid is read with sqlite3_last_insert_rowid.
Doing a separate SELECT and INSERT is not any slower than INSERT OR IGNORE; SQLite does the same work in either case.
This optimization is useful only if most operations update a counter that already exists.
In the spirit of brainstorming, I'll venture an answer. I have not done any testing like this fellow:
Improve INSERT-per-second performance of SQLite?
My hypothesis is that the index on the text primary key might be more RAM-intensive than a couple of indexes on two integer columns (what you'd need to simulate a hashed-table).
EDIT: Actually, you don't even need a primary key for this:
create table foo( slot integer, myval text, occurrences int);
create index ix_foo on foo(slot); -- not a unique index
An integer primary key (or a non-unique index on slot) would leave you with no quick way to determine if your text value were already on file. So to address that requirement, you might try implementing something I suggested to another poster, simulating a hashed-key:
SQLite Optimization for Millions of Entries?
A hash-key-function would allow you to determine where the text-value would be stored if it did exist.
http://www.cs.princeton.edu/courses/archive/fall08/cos521/hash.pdf
http://www.fearme.com/misc/alg/node28.html
http://cs.mwsu.edu/~griffin/courses/2133/downloads/Spring11/p677-pearson.pdf
I had a uniqueidentifier field in SQL Server (SQL Azure, to be precise) that I wanted to get rid of. Initially, when I ran the code mentioned in SQL Azure table size to determine the size of the table, it was about 0.19 MB.
As a first step, I set all values in the uniqueidentifier field to null. There are no constraints/indexes that use the column. Now, when I run the code to determine the table sizes, the table has increased in size to about 0.23 MB. No records are being added to the table (it's in a staging environment).
I proceeded to delete the column and the size still hovered in the same range.
Why does the database size show an increase when I delete a column? Any suggestions?
Setting a uniqueidentifier column to NULL does not change the record size in any way, since it is a fixed-size type (16 bytes). Dropping a fixed-size column does not change the record size either, unless it is the last column in the physical layout and the space can be reused later. ALTER TABLE ... DROP COLUMN is only a logical operation; it simply marks the column as dropped, see SQL Server Columns Under the Hood.
In order to reclaim the space you need to drop the column and then rebuild the clustered index of the table, see ALTER INDEX ... REBUILD.
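A minimal sketch of that sequence; the table, column, and index names are placeholders:

```sql
-- Logically drop the column (space is not reclaimed yet).
ALTER TABLE dbo.YourTable DROP COLUMN YourGuidColumn;
-- Rebuilding the clustered index rewrites the rows and reclaims the space.
ALTER INDEX PK_YourTable ON dbo.YourTable REBUILD;
-- Or rebuild every index on the table:
-- ALTER INDEX ALL ON dbo.YourTable REBUILD;
```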
For the record (since SHRINK is not allowed in SQL Azure anyway), on standalone SQL Server SHRINK would have solved nothing; this is not about page reservation but about physical record size.
It's counting the number of reserved pages to calculate the size. Deleting a column may reduce the number of pages that are actually utilized to store data, but the newly-freed pages are probably still reserved for future inserts.
I think you'd need to shrink the database to see the size decrease, as per: http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/ae698613-79d5-4f23-88b4-f42ee4b3ad46/
As an aside, I am fairly sure that setting the value of a non-variable-length column (like a GUID) to null will not save you any space at all; only deleting the column will do so. See Space used by nulls in database.