WAL sequence number infinite? - sql-server

I am wondering whether database WAL sequence numbers are infinite. I guess most WAL records reserve a fixed size for the WAL number? Is that number simply so big that it will never reach its end? That might be quite a waste of space. Or have the big DB players invented a better method?
Or do they implement some logic to let the WAL start at 0 again? That would have a heavy impact on many spots in the code...?
EDIT:
Impact: for example, recovery after a crash relies on the sequence number increasing along the timeline. If the sequence could start over, recovery could get confused.
Term WAL sequence number: WAL (Write Ahead Log, a.k.a. the transaction log that is guaranteed to be on disk before the application layer is told that a transaction was successful). This log carries a growing number that is used to keep the database consistent, e.g. during recovery, by checking the WAL sequence number stored on the pages against the sequence number in the WAL.
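To make the concern concrete, here is a toy sketch (invented structures, not any real engine's code) of the redo comparison I mean:

    # Toy redo loop: recovery replays only WAL records that are newer than
    # the page they touch, which only works if the LSN never goes backwards.
    def redo(pages, wal_records):
        for rec in wal_records:                  # records come in LSN order
            page = pages[rec["page_id"]]
            if rec["lsn"] > page["lsn"]:         # page is stale -> replay the change
                page["data"] = rec["data"]
                page["lsn"] = rec["lsn"]         # stamp the page with the record's LSN
            # if rec["lsn"] <= page["lsn"], the change already reached the data file

    pages = {1: {"lsn": 10, "data": "old"}}
    wal = [{"lsn": 11, "page_id": 1, "data": "new"}]
    redo(pages, wal)
    print(pages[1])                              # {'lsn': 11, 'data': 'new'}

If the sequence wrapped back to 0, that comparison would treat new records as old, which is exactly the confusion I mean.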

I would not assume that every database implements the same strategy.
Speaking only for Oracle, the SCN (system change number) is a 48-bit number, so an Oracle database can handle nearly 300 trillion transactions before hitting the limit. Realistically, that will take eons. Even if you could do a thousand transactions per second, the SCN wouldn't hit the limit for roughly 300 billion seconds, or about 9,500 years. Now, there are various things besides transactions that can cause the SCN to increment (famously the recent issue with hot backups and database links that caused a few users to exceed the database's checks on the reasonability of the SCN), so it won't really take 9,500 years to hit the limit. But, realistically, it gives Oracle plenty of time to move to a 64-bit SCN some years down the line, buying everyone a few more centuries of headroom.
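That figure is easy to sanity-check with a little arithmetic (nothing Oracle-specific here):

    # Back-of-the-envelope check of the 48-bit SCN headroom.
    max_scn = 2 ** 48                          # about 2.8e14 possible values
    rate = 1_000                               # assumed SCN increments per second
    years = max_scn / rate / (365 * 24 * 3600)
    print(f"{max_scn:,} values -> ~{years:,.0f} years at {rate}/s")
    # -> 281,474,976,710,656 values -> ~8,926 years at 1000/s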

Like SQL Server, DB2 calls that counter a Log Sequence Number (LSN). IBM recently expanded the size of their LSN from six bytes to eight bytes, unsigned. The LSN is an ever-growing pointer that shows where in the log files that specific log record can be found. An eight byte LSN means that a DB2 database can write nearly 16 exbibytes of log records before running out of address space, at which point the contents of the database must be unloaded and copied into a new database.
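The 16 exbibyte figure is simply the size of an unsigned eight-byte address space:

    # An unsigned eight-byte LSN addresses 2**64 bytes of log.
    addressable = 2 ** 64                # bytes of log address space
    print(addressable / 2 ** 60, "EiB")  # 1 EiB = 2**60 bytes -> 16.0 EiB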

Postgres WAL positions technically can overflow, but only after writing about 16 EiB (2^64 bytes) of WAL, as the WAL location pointer is a 64-bit byte offset.

See What is an LSN: Log Sequence Number. The article describes the structure of the SQL Server LSN (the WAL number) and shows how to decode one. Since LSNs are of a fixed size and don't roll over, it follows that you can run out of them. It will take a very, very long time, though.
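For a rough feel of that structure, here is a small sketch (hedged: the three parts are commonly described as the VLF sequence number, the log block offset, and the slot number; the example value is made up) that splits the colon-separated hex form:

    # Split a SQL Server-style LSN ("vlf:block:slot" in hex) into its parts.
    # The value below is invented purely for illustration.
    def parse_lsn(lsn: str) -> dict:
        vlf_seq, block_offset, slot = (int(part, 16) for part in lsn.split(":"))
        return {"vlf_seq": vlf_seq, "block_offset": block_offset, "slot": slot}

    print(parse_lsn("0000002d:00000408:0001"))
    # -> {'vlf_seq': 45, 'block_offset': 1032, 'slot': 1}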

In PostgreSQL, WAL logs are stored as a set of segment files. "Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000." The number doesn't wrap around.
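For illustration, a minimal sketch of how those 24-character names are put together, assuming the default 16 MB segment size (8 hex digits of timeline ID, then the high and low parts of the segment number):

    # Compose a PostgreSQL WAL segment file name (default 16 MB segments).
    WAL_SEG_SIZE = 16 * 1024 * 1024
    SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 segments per 4 GB of WAL

    def wal_segment_name(timeline: int, segno: int) -> str:
        return (f"{timeline:08X}"
                f"{segno // SEGS_PER_XLOGID:08X}"
                f"{segno % SEGS_PER_XLOGID:08X}")

    print(wal_segment_name(1, 0))    # 000000010000000000000000 (the very first segment)
    print(wal_segment_name(1, 257))  # 000000010000000100000001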

Related

Snowflake query still running after Bytes Scanned is 100%

This might be more of a Snowflake knowledge question than an issue.
I am running a COPY command from S3 to Snowflake,
and I see it took 30 minutes for Bytes Scanned to reach 100%; however, even after Bytes Scanned hit 100 percent, it took another 40 minutes for the query to complete.
Can somebody please explain what is going on here? As it stands, I feel it is tough to estimate how much time any running COPY command might take while looking at the history screen.
Sounds like you're referring to the 100% in the Bytes Scanned column of the query profile. If you have transformations in your COPY INTO command this will take additional time to process. As others have mentioned, the size of the warehouse will have an impact as the warehouse size will determine the number of cores and threads, which directly affect the parallelism of the writes.
In short, Bytes Scanned is just a measure of the total data read by Snowflake that will be processed by the job; the job itself still has to run.
In the past we have found that each X-Small can load about 40 MB/s from S3, and thus a Small can load twice that. So that's our baseline expectation of load speeds.
What can legitimately slow down a copy is if you are copying from the root of the bucket (s3://buck_name/) but there are millions of files in that directory, with only one new 100-byte file. But I suspect that is also not the case.
The next thing it might be is a failure to run the query, which in the profile would show up as multiple profile stage tabs along the lines of 1 \ 1001 \ 2002; the increment of the stage number in the thousands indicates the query failed to execute and was re-run. This can sometimes be due to the warehouse getting corrupted, and sometimes due to the new runtime of the current release failing, in which case the retries can run on older releases to see if those succeed. There are often clues to some of this; time spent "spilling to internal/external storage" is something we have seen when such bugs occur.
But in reality, if things seem "really" strange, I would open a support ticket and ask for an explanation of what is happening, with the usual "this is what I am seeing, this is why I think it's strange".
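If you want to compare Bytes Scanned against total elapsed time outside the UI, one way is a sketch like the following using the snowflake-connector-python package (all connection parameters are placeholders; adjust to your account):

    # Sketch: compare elapsed time with Bytes Scanned for recent COPY statements.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***",
        warehouse="my_wh", database="my_db", schema="public",
    )
    cur = conn.cursor()
    cur.execute("""
        SELECT query_id, total_elapsed_time / 1000 AS seconds, bytes_scanned
        FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
        WHERE query_text ILIKE 'copy into%'
        ORDER BY start_time DESC
    """)
    for query_id, seconds, bytes_scanned in cur:
        print(query_id, seconds, bytes_scanned)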

Replication methods in PostgreSQL 9.5

Hi guys, help me.
What is the difference between the wal_level settings logical, hot_standby, and minimal, and what is a WAL segment? Do I need to use big segments or not?
Right now I use wal_segment: 50.
Why, after I insert 5 million rows, does the segment archive go over 50?
The parameter wal_level determines how much information is written to the transaction log (write-ahead log, short WAL).
The settings in decreasing order of amount of WAL emitted:
For logical replication or logical decoding, you need logical.
To run physical replication with hot standby, you need hot_standby.
To archive WAL files with archive_mode = on, you need archive.
The minimal level logs only the information required for crash recovery.
Note that from PostgreSQL 9.6 on, archive and hot_standby have been deprecated and replaced with the new setting replica.
A WAL segment is one 16 MB transaction log file as seen in pg_xlog or pg_wal.
I guess that with wal_segment you mean the parameter checkpoint_segments (max_wal_size since 9.5).
It does not impose an absolute limit on the number of WAL segments; it determines after how much WAL a checkpoint will be forced. If your archive_command is slow, WAL may pile up.
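To see what your server is actually running with, here is a small check (a sketch using the psycopg2 driver; the connection string is a placeholder):

    # Sketch: read the WAL-related settings discussed above from a running server.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")   # placeholder DSN
    cur = conn.cursor()
    cur.execute("""
        SELECT name, setting, unit
        FROM pg_settings
        WHERE name IN ('wal_level', 'archive_mode',
                       'checkpoint_segments', 'max_wal_size')
    """)
    for name, setting, unit in cur.fetchall():
        print(name, "=", setting, unit or "")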

Solr File Descriptor Count

I have an Apache Solr 4.2.1 instance with 4 cores of 21 GB total size (625 MB + 30 MB + 20 GB + 300 MB).
It runs on a 4 Core CPU, 16GB RAM, 120GB HD, CentOS dedicated machine.
1st core is fully imported once a day.
2nd core is fully imported every two hours.
3rd core is delta imported every two hours.
4th core is fully imported every two hours.
The server also has a decent amount of queries (search for and create, update and delete documents).
Every core has maxDocs: 100 and maxTime: 15000 for autoCommit, and maxTime: 1000 for autoSoftCommit.
The System usage is:
Around 97% of 14.96 GB Physical Memory
0MB Swap Space
Around 94% of 4096 File Descriptor Count
From 60% to 90% of 1.21GB of JVM-Memory.
When I reboot the machine, the File Descriptor Count falls to near 0, and then steadily, over the course of a week or so, it climbs back to the aforementioned value.
So, to conclude, my questions are:
Is 94% of 4096 File Descriptor Count normal?
How can I increase the maximum File Descriptor Count?
How can I calculate the theoretical optimal value for the maximum and used File Descriptor Count?
Will the File Descriptor Count reach 100%? If yes, will the server crash? Or will it keep itself below 100% and function as it should?
Thanks a lot beforehand!
Sure.
ulimit -n <number>. See Increasing ulimit on CentOS.
There really isn't one - as many as needed, depending on a lot of factors, such as your mergeFactor (if you have many files, the number of open files will be large as well - this is especially true for indices that aren't full imports; check the number of files in your data directories and issue an optimize if the same index has become very fragmented and has a large mergeFactor), the number of searchers, other software running on the same server, etc.
It could. Yes (or at least it won't function properly, as it won't be able to open any new files). No. In practice you'll get errors about being unable to open a file, with the message "Too many open files".
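If you want to watch the count yourself rather than trust the admin UI, a small sketch using the psutil package works (the process matching below is simplistic; adjust it to your setup):

    # Sketch: report open file descriptors and the per-process limit for Solr (Linux).
    import psutil

    for proc in psutil.process_iter(["pid", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "solr" in cmdline.lower():                    # crude match, adjust as needed
            p = psutil.Process(proc.info["pid"])
            soft, hard = p.rlimit(psutil.RLIMIT_NOFILE)  # what ulimit -n gave the process
            print(f"pid={p.pid} open_fds={p.num_fds()} soft={soft} hard={hard}")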
So, the issue with the File Descriptor Count (FDC), and more precisely with the ever-increasing FDC, was that I was committing after every update!
I noticed that Solr wasn't deleting old transaction logs. Thus, after the period of one week FDC was maxing out and I was forced to reboot.
I stopped committing after every update and now my Solr stats are:
Around 55% of 14.96 GB Physical Memory
0MB Swap Space
Around 4% of 4096 File Descriptor Count
From 60% to 80% of 1.21GB of JVM-Memory.
Also, the old transaction logs are now deleted by auto commit (soft & hard) and Solr has no more performance warnings!
So, as pointed very well in this article:
Understanding Transaction Logs, Soft Commit and Commit in SolrCloud
"Be very careful committing from the client! In fact, don’t do it."

Is 2ms a realistic performance goal for small writes to LMDB on a Raspberry Pi 3?

I'm about to begin work on a new project on a Raspberry Pi 3. It will be controlled with a complex GUI so interactive performance is important.
I decided to use LMDB for persistence, as - outside performance - its special traits make my system a lot simpler and for me there's no downside.
The application will be written in Rust, using the lmdb crate.
The critical path will be a single-threaded part where I will get the current timestamp and write a total of 16 or 20 bytes (not sure yet, is 16 bytes really better?) to the database under a key I already have computed.
For doing so (beginning with taking the timestamp and ending after the write transaction has been committed) I have a performance budget of 2 milliseconds.
As far as I have heard, writes are LMDB's weakest point, and these are very small random writes, so this is probably close to the worst possible workload for it. That thought made me ask this question.
Further information:
as this is a GUI application, this path will be called at most 100 times per second, and also never more than 1000 times per hour (this is a rewrite from scratch, and those are 10 times the numbers measured in the current system).
I do not understand much about how LMDB works, but as far as I understand it uses memory-mapped files. I was hoping this means the OS internals will write back the pages in an aggregated fashion.
Is 2ms a realistic goal for such an application? What would I need to consider to keep myself inside this window? Do I need to cache those writes manually?
It depends on how much data you have in your LMDB.
2ms is definitely achievable for small DB. I have got over 100K writes per second with SSDs and small DB size (<1GB). But if the DB is larger than your memory, you probably should not use LMDB if you are worried about write performance.
To wrap up: if your DB is smaller than memory, you can go ahead and use LMDB; 2 ms is definitely enough for a write operation. If your DB is larger than memory, then you had better use LSM-based KV stores like LevelDB or RocksDB.
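The cheapest way to answer this for your exact hardware is to measure. The question targets the Rust lmdb crate, but as an illustration here is the same measurement sketched with the Python lmdb binding (the path is arbitrary, and the key layout is made up); it mirrors the described critical path of a small value under a precomputed key, committed once per event:

    # Sketch: time 1000 single small committed writes.
    import lmdb, os, time

    env = lmdb.open("/tmp/bench-lmdb", map_size=1 << 30)   # 1 GiB map, arbitrary path

    worst = 0.0
    for i in range(1000):
        key = i.to_bytes(8, "big")        # stands in for the precomputed key
        value = os.urandom(16)            # the 16-byte payload
        start = time.perf_counter()
        with env.begin(write=True) as txn:
            txn.put(key, value)           # commit happens when the block exits
        worst = max(worst, time.perf_counter() - start)

    print(f"worst commit latency: {worst * 1000:.2f} ms")

With the default sync=True, every commit waits for the data to reach storage, so on a Pi's SD card that sync cost is most likely what decides whether you stay inside the 2 ms budget.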

How do I quickly fill up a multi-petabyte NAS?

My company's product will produce petabytes of data each year at our client sites. I want to fill up a multi-petabyte NAS to simulate a system that has been running for a long time (3 months, 6 months, a year, etc). We want to analyze our software while it's running on a storage system under load.
I could write a script that creates this data (a single script could take weeks or months to execute). Are there recommendations on how to farm out the script (multiple machines, multiple threads)? The NAS has 3 load balanced incoming links... should I run directly on the NAS device?
Are there third-party products that I could use to create load? I don't even know how to start searching for products like this.
Does it matter if the data is realistic? Does anyone know anything about NAS/storage architecture? Can it just be random bits, or does the regularity of the data matter? We are fanning the data out on disk in this format:
x:\<year>\<day-of-year>\<hour>\<minute>\<guid-file-name>.ext
You are going to be limited by the write speed of the NAS/disks - I can think of no way of getting round that.
So the challenge then is simply to write-saturate the disks for as long as needed. A script or set of scripts running on a reasonable machine should be able to do that without difficulty.
To get started, use something like Bonnie++ to find out how fast your disks can write. Then you could use the code from Bonnie as a starting point to saturate the writes - after all, to benchmark a disk Bonnie has to be able to write faster than the NAS.
Assuming you have 3×1 Gbit Ethernet connections, the max network input to the box is about 300 MB/s. A PC is capable of saturating a 1 Gbit Ethernet connection, so 3 PCs should work. Get each PC to write a section of the tree and voilà.
Of course, to fill a petabyte at 300 MB/s will take about a month.
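Here is a minimal sketch of such a writer that fans random data out into the year/day-of-year/hour/minute/GUID layout described above (ROOT, FILE_SIZE, BLOCK and the thread count are placeholders to tune):

    # Sketch: fill <root>/<year>/<day-of-year>/<hour>/<minute>/<uuid>.bin with random data.
    import datetime, os, threading, uuid

    ROOT = "/mnt/nas"               # placeholder mount point
    FILE_SIZE = 256 * 1024 * 1024   # 256 MiB per file
    BLOCK = 4 * 1024 * 1024         # write in 4 MiB chunks

    def writer(stop: threading.Event) -> None:
        while not stop.is_set():
            now = datetime.datetime.now()
            d = os.path.join(ROOT, f"{now.year}", f"{now.timetuple().tm_yday:03d}",
                             f"{now.hour:02d}", f"{now.minute:02d}")
            os.makedirs(d, exist_ok=True)
            with open(os.path.join(d, f"{uuid.uuid4()}.bin"), "wb") as f:
                for _ in range(FILE_SIZE // BLOCK):
                    f.write(os.urandom(BLOCK))   # os.urandom can become the bottleneck;
                                                 # reuse a pre-generated buffer if so

    stop = threading.Event()
    threads = [threading.Thread(target=writer, args=(stop,)) for _ in range(4)]
    for t in threads:
        t.start()
    # run until you call stop.set() and join the threads

At the ~300 MB/s ceiling above, a petabyte is still roughly 3.3 million seconds of writing, i.e. around 38 days, no matter how the writers are arranged.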
Alternatively, you could lie to your code about the state of the NAS. On Linux, you could write a user-space filesystem that pretends to have several petabytes of data by creating on-the-fly metadata (filename, length, etc.) for a petabyte's worth of files. When the product reads, generate random data on the fly. When your product writes, write it to real disk and remember that you have "real" data if it is read again.
Since your product presumably won't read the whole petabyte during this test, nor write much of it, you could easily simulate an arbitrarily full NAS instantly.
Whether this takes more or less than a month to develop is an open question :)
