I have a TimescaleDB database running on Docker (which is running on Ubuntu).
As you can guess, I store a lot of time-series data in it and to save disk space I activated a compression policy:
ALTER TABLE measurements SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'time DESC',
  timescaledb.compress_segmentby = 'device'
);
SELECT add_compression_policy('measurements', INTERVAL '2 weeks');
and if I check the compression stats I get the result I expect:
before compression --> 81GB
after compression --> 3900MB
Up to here it's all fine, except that it does not free up actual disk space: the Docker folder that contains the TimescaleDB files keeps growing as if compression had never been activated in the first place.
VACUUMing the whole DB has no effect, except for taking up an additional 1GB of disk space.
EDIT:
By disk space I mean the specific filesystem directory where the TimescaleDB volume is mounted, i.e. where the database files are stored on the host machine.
It turns out that everything worked fine; I was just misled by what the compression stats were telling me.
In the end compression was active, but the chunks were not old enough, so they weren't compressed yet, just pending compression.
We noticed it by manually checking the chunks' status (see the query below).
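For anyone hitting the same confusion, this is roughly how we checked it (assuming TimescaleDB 2.x; the view and function names differ slightly on 1.x):
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'measurements'
ORDER BY range_start;
-- per-chunk compression stats, including chunks that are still waiting to be compressed
SELECT chunk_name, compression_status,
       before_compression_total_bytes, after_compression_total_bytes
FROM chunk_compression_stats('measurements');
-- optional: compress eligible chunks right away instead of waiting for the policy job
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('measurements', older_than => INTERVAL '2 weeks') AS c;
The hypertable-level stats we had been looking at only reflected the chunks that were already compressed, which is why the numbers looked fine while the directory on disk kept growing.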
Thanks to everyone who commented.
I have 13 million documents on Azure Blob Storage that I can azcopy to my desktop's internal drive within 24 hours. However, as soon as I try to transfer these files to my external hard drive, the time needed to complete the transfer jumps to 60 days. The files aren't large - about 100 KB each - so the entire transfer is about 1.3 TB. I have tried:
Zipping the files, transfer, unzip. Problem: Unzipping takes just as long
azcopy directly to the external SSD
robocopy files from internal to external drive
Simple ctrl-c ctrl-v.
Each of the above options takes months to complete the transfer. Any ideas on how to speed this up? Why would azcopy be so much faster for an internal drive than an external one?
There could be several reasons for the performance issue.
You can run a performance benchmark test on specific blob containers or file shares to view general performance statistics and to identify performance bottlenecks. You can run the test by uploading or downloading generated test data.
Use the following command to run a performance benchmark test.
Syntax
azcopy benchmark 'https://<storage-account-name>.blob.core.windows.net/<container-name>'
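Since the workload in question is millions of files of roughly 100 KB, it may help to make the benchmark mimic that. The --file-count, --size-per-file and --mode flags should be available in recent AzCopy v10 releases, but check azcopy benchmark --help and treat the exact values as assumptions to tune:
azcopy benchmark 'https://<storage-account-name>.blob.core.windows.net/<container-name>' --file-count 10000 --size-per-file 100K --mode Download
Comparing a download benchmark (which never touches the external drive) against the observed copy speed to that drive helps show whether the bottleneck is the network path or the drive itself.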
Optimize the performance of AzCopy with Azure Storage
There are several options for transferring data to and from Azure, depending on your needs: Transfer data to and from Azure
Fast Data Transfer (a separate tool from AzCopy) is a tool for fast upload of data into Azure – up to 4 terabytes per hour from a single client machine. It moves data from your premises to Blob Storage, to a clustered file system, or direct to an Azure VM. It can also move data between Azure regions.
The tool works by maximizing utilization of the network link. It efficiently uses all available bandwidth, even over long-distance links. On a 10 Gbps link, it reaches around 4 TB per hour, which makes it about 3 to 10 times faster than competing tools we’ve tested. On slower links, Fast Data Transfer typically achieves over 90% of the link’s theoretical maximum, while other tools may achieve substantially less.
For example, on a 250 Mbps link, the theoretical maximum throughput is about 100 GB per hour. Even with no other traffic on the link, other tools may achieve substantially less than that. In the same conditions (250 Mbps, with no competing traffic) Fast Data Transfer can be expected to transfer at least 90 GB per hour. (If there is competing traffic on the link, Fast Data Transfer will reduce its own throughput accordingly, in order to avoid disrupting your existing traffic.)
Fast Data Transfer runs on Windows and Linux. Its client-side portion is a command-line application that runs on-premises, on your own machine. A single client-side instance supports up to 10 Gbps. Its server-side portion runs on Azure VM(s) in your own subscription. Depending on the target speed, between 1 and 4 Azure VMs are required. An Azure Resource Manager template is supplied to automatically create the necessary VM(s).
Fast Data Transfer is particularly worth considering in scenarios like these:
Your files are very small (e.g. each file is only tens of KB).
You have an ExpressRoute with private peering.
You want to throttle your transfers to use only a set amount of network bandwidth.
You want to load directly to the disk of a destination VM (or to a clustered file system). Most Azure data loading tools can’t send data direct to VMs. Tools such as Robocopy can, but they’re not designed for long-distance links. We have reports of Fast Data Transfer being over 10 times faster.
You are reading from spinning hard disks and want to minimize the overhead of seek times. In our testing, we were able to double disk read performance by following the tuning tips in Fast Data Transfer’s instructions.
I'm using rclone to transfer data between a minio bucket and a shared storage. I'm migrating a store, and the amount of data is around 200GB of product pictures. Every single picture has its own folder/path, so there are also a lot of folders that need to be created. rclone is installed on the new server and the storage is connected to the server via SAN. The transfer has been running for over a week and we are at 170GB right now. Everything works fine, but it is really slow in my opinion. Is it normal that a transfer out of a bucket into a classic filesystem is that slow?
(Doing the math, the speed is only 2.3Mbps. I am honestly not going to pay anything for that speed.)
Perhaps you should break the issue down and diagnose it part by part. Below are several common places to look when a transfer is slow (generally speaking, for any file transfer):
First of all, networks and file systems usually don't perform well with lots of small files, so to isolate the issue, upload a bigger file (1GB+) to minio first, and test each step below with the big file first.
Is the speed of the source fast enough? Try copying the files from minio to a local storage or Ramdisk (/tmp is usually tmpfs and in turn stored in RAM, use mount to check).
Is the speed of the destination fast enough? Try dd or other disk performance testing utility.
Is the network latency to source high? Try pinging or curling the API (with timing)
Is the latency to the destination high? Since the storage is attached via SAN, try iostat to check device-level latency.
Maybe the CPU is the bottleneck? Encoding and decoding take quite a lot of computing power; try top while a copy is running.
Again, try these steps with the big file and the many small files separately. There is a good chance the small files are the issue; if so, I would look for the concurrency options in rclone. A rough sketch of these checks follows below.
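Something like this on the destination server, for example (the paths and the minio endpoint are assumptions to adapt):
# raw sequential write speed of the SAN-backed destination (writes about 1 GB, then cleans up)
dd if=/dev/zero of=/mnt/storage/ddtest bs=1M count=1024 oflag=direct && rm /mnt/storage/ddtest
# round-trip time to the minio API (the health endpoint is a guess; adjust host and port)
curl -o /dev/null -s -w 'total: %{time_total}s\n' http://minio.example.com:9000/minio/health/live
# check whether /tmp is really RAM-backed before using it as a scratch area
mount | grep ' /tmp '
# watch per-device latency and CPU load while a copy is running
iostat -x 5
top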
I had the same problem copying hundreds of thousands of small files from an S3-compatible storage to a local storage. Originally I was using s3fs+rsync. Very (very) slow, and it was getting stuck on the largest folders. Then I discovered rclone, and finished the migration within a few hours with these parameters:
rclone copy source:/bucket /destination/folder --checkers 256 --transfers 256 --fast-list --size-only --progress
Explanation of the options (from https://rclone.org/flags/)
--checkers 256 Number of checkers to run in parallel (default 8)
--transfers 256 Number of file transfers to run in parallel (default 4)
--fast-list Use recursive list if available; uses more memory but fewer transactions
--size-only Skip based on size only, not mod-time or checksum (wouldn't apply in your case if copying to an empty destination)
--progress Show progress during transfer
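Applied to the minio-to-SAN migration above, it might look something like this (the remote name, bucket and destination path are placeholders, and 256 parallel transfers can overwhelm a small server, so start lower and raise the numbers while watching CPU and I/O):
rclone copy minio:product-pictures /mnt/storage/pictures --transfers 64 --checkers 128 --fast-list --progress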
I want to securely delete the contents of my SSD drive. I had a look at sdelete but I realized that file names are not deleted or overwritten.
Is there any free tool with which I can achieve the above?
Thank you
I'm not sure if you want to delete the files permanently, or securely delete them from the drive so that they cannot be recovered anymore.
So, these are the two ways:
Delete permanently: in Windows Explorer, you can select the file and press Shift+Del on the keyboard. This way the file will not be moved to your Recycle Bin;
Secure delete: when you delete a file from an HDD, the sectors on the disk are marked as unused and not really erased. So you need software that overwrites those sectors with "nothing" and prevents other users from recovering your deleted files with recovery software. One very good program is Eraser, which has a very thorough method for completely erasing a file from the disk, called the "Gutmann standard": it overwrites the deleted files 35 times. Yes, there is software that keeps trying to read the same sectors on the disk several times.
But since in your case the disk is an SSD, the only way to securely erase the file, really destroying all the data, is reformatting it.
An alternative to this bad solution, which prevents the situation in the first place, is enabling full-drive encryption. This option is already available on Windows 10.
Note: of course, the file that you want to delete can't be in use.
Erasing an SSD is not that easy, because SSDs are more like mini-computers with their own OS, showing you only some of the data saved in their flash chips. Also, wear-leveling algorithms and overprovisioning make secure deletion at the user level next to impossible.
As far as I know there is only one solution to securely delete data on an SSD (without destroying the SSD):
Perform the Secure Erase command using SSD software - usually provided by the SSD manufacturer itself.
It deletes and recreates the internal encryption key, which makes all the data stored on the SSD unreadable.
Note that the secure erase command is not supported by every SSD.
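For completeness, if the drive can be attached to a Linux machine (or booted from a Linux live USB), the ATA Secure Erase command can usually be issued with hdparm as well. Treat this as a sketch: replace /dev/sdX with the right device, and note that many BIOSes leave drives in a "frozen" security state that blocks the command until a suspend/resume cycle.
# check that the drive supports the ATA security feature set and is not frozen
hdparm -I /dev/sdX | grep -A8 Security
# set a temporary password, then issue the secure erase (this destroys ALL data on /dev/sdX)
hdparm --user-master u --security-set-pass p /dev/sdX
hdparm --user-master u --security-erase p /dev/sdX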
A bit of background: We have 17 different TempDB database files and 6 TempDB log files on the server. These are spread out on different drives, but are hosted on 2 drive arrays.
I’m seeing Disk IO response times exceeding the recommended limits. Typically you want your disks to respond in 5-10ms, with nothing going over 200ms. We’re seeing random spikes up to 800ms on the TempDB files, but only on one drive array.
Proposed solution: Restart SQL server. While SQL server is shut down, reboot the drive array hosting the majority of the TempDB files. In addition, while SQL is down, redo the network connection to bypass the network switch in an attempt to eliminate any source of slowness on the hardware.
Is this a good idea or a shot in the dark? Any ideas?
Thanks in advance.
17? Who came up with that number? Please read this and this - very few scenarios where > 8 files will help, particularly if you only have 2 underlying arrays/controllers. Some suggestions:
1. Use an even number of files. Most folks start with 4 or 8, and increase beyond that only when they've proven that they still have contention (and also know that their underlying I/O can actually handle more files and scale with them; in some cases it will have no effect or the exact opposite effect - a different drive letter does not necessarily mean better I/O pathing).
2. Make sure all of the data files are sized the same, and have identical autogrow settings (see the sketch after this list). Having 17 files with different sizes and autogrowth settings will defeat round robin - in a lot of cases only one file will be used due to the way SQL Server performs proportional fill. And having an odd number just seems... well, odd to me.
3. Get rid of your 5 extra log files. They are absolutely useless.
4. Use trace flag 1117 to make sure that all the data files grow at the same time and (because of 2.) at the same rate. Note though that this trace flag applies to all databases, not just tempdb. More info here.
5. You can also consider trace flag 1118 to change allocation, but please read this first.
6. Make sure instant file initialization is on, so that the file doesn't have to be zeroed out when it expands.
7. Pre-size your tempdb files so that they don't have to grow during normal day-to-day activity. Do not shrink tempdb files because they suddenly got big - this is just a rinse and repeat operation, since if they got that big once, they'll get that big again. It's not like you can lease out the recovered space in the meantime.
8. When possible, perform DBCC CHECKDB elsewhere. If you're running CHECKDB regularly, yay! Pat yourself on the back. However this can take a toll on tempdb - please see this article on optimizing this operation and pulling it away from your production instance where feasible.
9. Finally, validate what type of contention you're seeing. You say that tempdb performance crawls, but in what way? How are you measuring this? Some info on determining the exact nature of tempdb bottlenecks here and here and here and here and here.
Have you considered making less use of tempdb explicitly (fewer #temp tables, @table variables, and static cursors - or cursors altogether)? Are you making heavy use of RCSI, or MARS, or LOB-type local variables?
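As a hedged T-SQL sketch of points 2, 3 and 7 above (the logical file names and sizes below are placeholders - check yours with the first query and size the files to what your workload actually needs):
-- current tempdb layout: logical names, type, size and growth settings
SELECT name, type_desc, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files;
-- example: give every data file the same size and the same fixed autogrowth
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8192MB, FILEGROWTH = 512MB);
-- example: drop one of the extra log files (tempdb only needs one)
ALTER DATABASE tempdb REMOVE FILE templog2;
-- quick look at allocation contention: tempdb is database_id 2
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGE%LATCH_%' AND resource_description LIKE '2:%';
If the last query mostly shows waits on resources like 2:1:1 or 2:1:3 (the PFS and SGAM pages), that points at allocation contention rather than raw I/O latency.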
I have an application that uses SQL FILESTREAM to store images. I insert a LOT of images (several million images per day).
After a while, the machine stops responding and seems to be out of memory... Looking at the memory usage of the PC, we don't see any process taking a lot of memory (neither SQL nor our application). We tried to kill our process and it didn't restore our machine. We then killed the SQL services and that did not restore the system either. As a last resort, we even killed all processes (except the system ones) and the memory still remained high (we are looking at the Task Manager's Performance tab). Only a reboot does the job at that point. We have tried on Win7, WinXP and Win2K3 Server, always with the same results.
Unfortunately, this isn't a one-shot deal, it happens every time.
Has anybody seen that kind of behaviour before? Are we doing something wrong using the SQL FILESTREAMS?
You say you insert a lot of images per day. What else do you do with the images? Do you update them, many reads?
Is your file system optimized for FILESTREAMs?
How do you read out the images?
If you do a lot of updates, remember that SQL Server will not modify the filestream object but create a new one and mark the old for deletion by the garbage collector. At some time the GC will trigger and start cleaning up the old mess. The problem with FILESTREAM is that it doesn't log a lot to the transaction log and thus the GC can be seriously delayed. If this is the problem it might be solved by forcing GC more often to maintain responsiveness. This can be done using the CHECKPOINT statement.
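A minimal sketch of forcing that cleanup (the database name is a placeholder; sp_filestream_force_garbage_collection exists from SQL Server 2012 onwards, and under the FULL recovery model the garbage collector also needs log backups before it can remove the old files):
USE YourImageDb;
-- force a checkpoint so the filestream garbage collector can see the deleted/updated files
CHECKPOINT;
-- SQL Server 2012+: run the filestream garbage collector explicitly
EXEC sp_filestream_force_garbage_collection @dbname = N'YourImageDb';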
UPDATE: You shouldn't use FILESTREAM for small files (less than 1 MB). Millions of small files will cause problems for the filesystem and the Master File Table. Use varbinary(max) instead. See also Designing and implementing FILESTREAM storage.
UPDATE 2: If you still insist on using FILESTREAM for storage (you shouldn't, for large amounts of small files), you must at least configure the file system accordingly.
Optimize the file system for large amounts of small files (use these as tips and make sure you understand what they do before applying them):
Change the Master File Table reservation to maximum in the registry (fsutil.exe behavior set mftzone 4)
Disable 8.3 file names (fsutil.exe behavior set disable8dot3 1)
Disable last access update(fsutil.exe behavior set disablelastaccess 1)
Reboot and create a new partition
Format the storage volumes using a block size that will fit most of the files (2k or 4k depending on your image files).
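For the last step, the format command could look like this (the drive letter and the 4 KB unit size are assumptions; /A sets the allocation unit size and /Q performs a quick format):
format E: /FS:NTFS /A:4096 /Q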