Out of memory on ffmpeg when converting to H265 - batch-file

I have a bunch of video files, mostly H264. To save storage, I wrote a batch script that converts all of them to H265 using ffmpeg. Problem: some files cause ffmpeg to use ALL my memory (24 GB). Then it crashes (because it tries to allocate even more RAM), which stops the conversion process.
I think these files are corrupt in some way, because other files convert fine with low memory consumption. Now I want to reject the broken ones, so that unattended conversion is possible.
How is it possible to detect such corruption? Can ffmpeg do this, or is a third party tool required?
My ffmpeg call
set crf=20
set codec=265
ffmpeg -hide_banner -i "!fullSourcePath!" -c:v libx%codec% -crf %crf% "%targetPath%\!targetFileName!"
mkvalidator can't help
mkvalidator says that a corrupt file is valid:
mkvalidator.exe "V:\Filme\_LegacyFormat\22 Jump Street.mkv"
........................................................................................................................
WRN0D0: There are 5306 bytes of void data..
mkvalidator 0.5.0: the file appears to be valid
file created with libebml v1.3.0 + libmatroska v1.4.1 / mkvmerge v6.9.1 ('Blue Panther') 64bit built on Apr 18 2014 18:23:38
eac3to331 can't help either
I found the tool eac3to331, which has a check flag. But it reported no errors, although the tested file seems corrupt (it causes my PC to crash after several minutes of running ffmpeg).
eac3to.exe -check "V:\Filme\_LegacyFormat\22 Jump Street.mkv"
MKV, 1 video track, 2 audio tracks, 1 subtitle track, 1:51:57, 24p /1.001
1: h264/AVC, English, 1920x808 24p /1.001 (240:101)
2: DTS, German, 5.1 channels, 1509kbps, 48kHz
3: DTS, English, 5.1 channels, 1509kbps, 48kHz
4: Subtitle (SRT), German
v01 Extracting video track number 1...
a02 Extracting audio track number 2...
a03 Extracting audio track number 3...
s04 Extracting subtitle track number 4...
Video track 1 contains 161039 frames.
eac3to processing took 1 minute, 26 seconds.
Done.
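One commonly used way to probe a file with ffmpeg itself is to decode it to the null muxer and look at what ffmpeg prints at error level. The following is a minimal sketch of that idea, not a guaranteed detector: it assumes ffmpeg is on the PATH, the helper names are hypothetical, and a file that decodes cleanly may still misbehave during encoding. The timeout only kills a decode that runs too long; it does not cap memory use.

```python
import subprocess

def build_check_command(path):
    """Return an ffmpeg command that decodes `path` and discards the output."""
    return ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"]

def looks_decodable(path, timeout_s=3600):
    """Hypothetical helper: True if ffmpeg decodes the file without reporting errors."""
    try:
        result = subprocess.run(
            build_check_command(path),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a decode that never finishes as a rejected file
    # With "-v error", anything on stderr is an error message from ffmpeg.
    return result.returncode == 0 and result.stderr.strip() == ""
```

A batch script could call such a check before each conversion and skip the files it rejects.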

Related

Merge two videos with different HTTP Range header bytes

I want to save 10 seconds from any part of a video by using its URL (without downloading it completely).
The server supports the Range header, so it's possible to get specific byte ranges. The video is fine when I save it with something like Range: bytes=0-102400, but when I change the start byte and save a video with Range: bytes=307200-614400, it's no longer playable.
I know it probably lacks the MIME/header information that should be at the beginning of the file, because the first bytes, which specify the file's format, are not part of this response.
So I saved the video from 0-102400, which is fine and playable, and wanted to fetch that specific range and somehow append it after the first file, to get both a correct file header (less than 1 second of video) and that middle part of the video.
first.webm Range: bytes=0-102400 (valid playable file)
middle.webm Range: bytes=307200-614400 (not playable file)
I tried to merge them using this command recommended by this answer
ffmpeg -f concat -i list.txt -c copy merged.webm
logs:
[matroska,webm # 000002143c3e77c0] File ended prematurely00 bitrate=3752.0kbits/s speed=N/A
[matroska,webm # 000002143c429e40] Format matroska,webm detected only with low score of 1, misdetection possible!
[matroska,webm # 000002143c429e40] EBML header parsing failed
[concat # 000002143c3dda80] Impossible to open 'middle.webm'
list.txt: Invalid data found when processing input
frame= 42 fps=0.0 q=-1.0 Lsize= 10kB time=00:00:01.64 bitrate= 48.7kbits/s speed= 231x
video:9kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 8.500000%
The generated video only shows the first video (which was already playable before merging) and then ends.
I got this idea from this answer and don't want to download the complete video. If I can't merge them, is it possible to write a MIME/header for middle.webm manually to make it a valid playable video, and if so, how?

Drop some parts of video and re-make key frames (c++ + libav)

I'm trying to drop some parts of a video in my app using libav. For example, in a video that is 00:08:00 long, I try to drop frames 100-250 and 400-500 (just as an example).
I wrote this code that copies AVPackets and drops some of them, but there is a problem: in our videos every keyframe is followed by 29 non-key frames. So when my code drops frames 100-250, frame 100 may be a non-key frame. In that case, where the parts are joined (in this example frame 250 to frame 400), frame 400 ends up positioned after a keyframe that it does not belong to.
In this section the video frames are shown garbled.
Cutting speed is very important in my code, so I can't decode and re-encode all of the video frames.
The question is: how can I decode and re-encode only the beginning of each part (from the first frame of the part to the first keyframe) and copy the other frames without decoding?
Or is there any other FAST solution for splitting/merging (dropping some parts of a video)?
The question is: how can I decode and re-encode only the beginning of each part (from the first frame of the part to the first keyframe) and copy the other frames without decoding?
You can't. It doesn't work that way.
Start thinking in time, not in frames.
You can quickly extract parts of base_video.mp4 into new videos, for example:
ffmpeg -ss 00:00:00.000 -i base_video.mp4 -t 8.000 -c copy -strict -2 new_video_8seconds_fromstart.mp4
-ss 00:00:00.000 is the time at which the new video starts
-t is the duration in seconds and milliseconds; for example, for a duration of 8 seconds use 8.000
-an if you don't want audio
-strict -2 is needed to copy some codecs, such as DTS
If you want re-encoding, remove -c copy, but it will never be fast:
ffmpeg -ss 00:00:00.000 -i base_video.mp4 -t 8.000 new_video_8seconds_fromstart.mp4
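To make the "time, not frames" advice concrete, here is a small sketch that turns a list of dropped frame ranges into the time segments to keep, and builds one stream-copy command per segment. It assumes a constant frame rate; the function names and the example frame rate are hypothetical. With -c copy, ffmpeg can only start a segment on a keyframe, so the actual cut points may shift.

```python
def kept_segments(total_frames, dropped_ranges, fps):
    """Return (start_seconds, duration_seconds) for each part that is kept.

    dropped_ranges holds inclusive (first_frame, last_frame) pairs.
    """
    segments = []
    cursor = 0  # first frame of the next kept part
    for first, last in sorted(dropped_ranges):
        if first > cursor:
            segments.append((cursor / fps, (first - cursor) / fps))
        cursor = max(cursor, last + 1)
    if cursor < total_frames:
        segments.append((cursor / fps, (total_frames - cursor) / fps))
    return segments

def cut_command(source, start_s, duration_s, target):
    """Build the stream-copy command shown above for one kept segment."""
    return ["ffmpeg", "-ss", f"{start_s:.3f}", "-i", source,
            "-t", f"{duration_s:.3f}", "-c", "copy", target]

if __name__ == "__main__":
    for start, duration in kept_segments(1000, [(100, 250), (400, 500)], 25):
        print(" ".join(cut_command("base_video.mp4", start, duration, "part.mp4")))
```

The resulting parts would still have to be joined afterwards, for example with ffmpeg's concat demuxer.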

make a video from a subset of images and audio

I want to create a video of three images and three audio files but the duration time of each image should be the time of the corresponding audio file.
Lets say I have three images image_0.png, image_1.png and image_2.png and three audio files audio_0.mp3 (length 10 seconds) , audio_1.mp3 (length 15 seconds), audio_2.mp3 (length 12 seconds).
I want to create a video showing first image_0.png with audio_0.mp3 for 10 seconds, then image_1.png with audio_1.mp3 for 15 seconds and in the end image_2.png with audio_2.mp3 for 12 seconds.
I tried to make this with avconv. I tried different variations of -i commands
avconv -i imageInputFile.png -i audioInputFile.mp3 -c copy output.avi
nothing worked. Indeed, I could make for each image+audio a single avi video, but I failed concatenating all single avi files... Besides this is not the best way I think because of quality loss.
How would you do this? Is this even possible with avconv?
first concatenate all your .mp3 in one single .mp3
then name your .png something like img01.png, img02.png ... imgxx.png
then try:
mencoder 'mf://img*.png' -oac mp3lame -ovc lavc -fps 1 -ofps 25 -vf harddup -audiofile audio.mp3 -o test.avi
Obviously, replace lavc with your preferred codec and 1 with a reasonable value to fit the frames to your audio track.
Some may argue that it's stupid to recompress the audio and that I could use -oac copy instead, but when converting from multiple sources that can cause issues.
This command creates a 25 fps video stream with 15-26 duplicated frames per second. If you remove -ofps 25 you will avoid the duplicate frames, but some decoders could hang, especially when seeking.

FFMpeg: Take Certain Amount of Screenshots Between X and X?

Is there any way to get ffmpeg to take X number of screenshots between X time and X time? The way I'm doing my command line code now is like this:
ffmpeg -ss 79 -i 1.avi -r 1/2.15 -f image2 1_%%05d.jpg
This method only starts taking screenshots starting at 79 seconds, but I can't figure out a way to set an ending time (before the video ends).
Also, I will be displaying these video screenshots on a website and want there to be the same amount of screenshots per video file for consistency purposes. Is there a way to set how many screenshots I want from a video? As in, ffmpeg figures out how much time is between the two points I specify, then figures out how often to take a screenshot based on how many I want total from a video?
There is a -vframes option to control how many video frames ffmpeg writes to the output.
There is also a -t option to control how many seconds of content to process.
Use either one of them.
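For the "same number of screenshots per video" part, the output rate can be derived from the two time points and the wanted count. A minimal sketch, assuming the options are combined as below (-ss before -i, then -t, -r and -vframes as output options); the function name is hypothetical and the exact frames picked depend on the source video.

```python
def screenshot_command(source, start_s, end_s, count, pattern):
    """Build an ffmpeg command for `count` evenly spaced screenshots."""
    if end_s <= start_s or count < 1:
        raise ValueError("need end_s > start_s and count >= 1")
    duration = end_s - start_s
    rate = count / duration  # screenshots per second between the two points
    return ["ffmpeg", "-ss", str(start_s), "-i", source,
            "-t", str(duration), "-r", f"{rate:.6f}",
            "-vframes", str(count), "-f", "image2", pattern]

if __name__ == "__main__":
    # 20 screenshots between second 79 and second 179
    print(" ".join(screenshot_command("1.avi", 79, 179, 20, "1_%05d.jpg")))
```

Inside a batch file the % in the output pattern has to be doubled, as in the command from the question.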

How many files can I put in a directory?

Does it matter how many files I keep in a single directory? If so, how many files in a directory is too many, and what are the impacts of having too many files? (This is on a Linux server.)
Background: I have a photo album website, and every image uploaded is renamed to an 8-hex-digit id (say, a58f375c.jpg). This is to avoid filename conflicts (if lots of "IMG0001.JPG" files are uploaded, for example). The original filename and any useful metadata is stored in a database. Right now, I have somewhere around 1500 files in the images directory. This makes listing the files in the directory (through FTP or SSH client) take a few seconds. But I can't see that it has any effect other than that. In particular, there doesn't seem to be any impact on how quickly an image file is served to the user.
I've thought about reducing the number of images by making 16 subdirectories: 0-9 and a-f. Then I'd move the images into the subdirectories based on what the first hex digit of the filename was. But I'm not sure that there's any reason to do so except for the occasional listing of the directory through FTP/SSH.
FAT32:
Maximum number of files: 268,173,300
Maximum number of files per directory: 2^16 - 1 (65,535)
Maximum file size: 2 GiB - 1 without LFS, 4 GiB - 1 with
NTFS:
Maximum number of files: 2^32 - 1 (4,294,967,295)
Maximum file size
  Implementation: 2^44 - 2^16 bytes (16 TiB - 64 KiB)
  Theoretical: 2^64 - 2^16 bytes (16 EiB - 64 KiB)
Maximum volume size
  Implementation: 2^32 - 1 clusters (256 TiB - 64 KiB)
  Theoretical: 2^64 - 1 clusters (1 YiB - 64 KiB)
ext2:
Maximum number of files: 10^18
Maximum number of files per directory: ~1.3 × 10^20 (performance issues past 10,000)
Maximum file size
  16 GiB (block size of 1 KiB)
  256 GiB (block size of 2 KiB)
  2 TiB (block size of 4 KiB)
  2 TiB (block size of 8 KiB)
Maximum volume size
  4 TiB (block size of 1 KiB)
  8 TiB (block size of 2 KiB)
  16 TiB (block size of 4 KiB)
  32 TiB (block size of 8 KiB)
ext3:
Maximum number of files: min(volumeSize / 2^13, numberOfBlocks)
Maximum file size: same as ext2
Maximum volume size: same as ext2
ext4:
Maximum number of files: 2^32 - 1 (4,294,967,295)
Maximum number of files per directory: unlimited
Maximum file size: 2^44 - 1 bytes (16 TiB - 1)
Maximum volume size: 2^48 - 1 bytes (256 TiB - 1)
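Several of the limits above are powers of two written next to their decimal or binary-unit equivalents. The pairs can be cross-checked with a few lines of arithmetic; the variable names below are only for illustration.

```python
KIB = 2**10
TIB = 2**40

fat32_files_per_dir = 2**16 - 1   # 65,535
max_files_32bit = 2**32 - 1       # 4,294,967,295 (NTFS and ext4 file count)
ntfs_impl_file_size = 2**44 - 2**16
ext4_max_file_size = 2**44 - 1
ext4_max_volume_size = 2**48 - 1

if __name__ == "__main__":
    print(f"{fat32_files_per_dir:,}")
    print(f"{max_files_32bit:,}")
    print(ntfs_impl_file_size == 16 * TIB - 64 * KIB)
    print(ext4_max_file_size == 16 * TIB - 1)
    print(ext4_max_volume_size == 256 * TIB - 1)
```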
I have had over 8 million files in a single ext3 directory. libc readdir() is what find, ls and most of the other methods discussed in this thread use to list large directories.
The reason ls and find are slow in this case is that readdir() only reads 32K of directory entries at a time, so on slow disks it will require many many reads to list a directory. There is a solution to this speed problem. I wrote a pretty detailed article about it at: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/
The key takeaway is: use getdents() directly -- http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html -- rather than anything based on libc readdir(), so you can specify the buffer size when reading directory entries from disk.
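Calling getdents() with a custom buffer size needs C or a raw syscall wrapper. As a related but different and weaker technique, a script can at least stream directory entries one at a time instead of building and sorting a full listing the way ls does. The sketch below uses Python's os.scandir for that; it does not control the read buffer size, and the function name is hypothetical.

```python
import os

def count_entries(directory):
    """Count directory entries lazily, without sorting or calling stat()."""
    total = 0
    with os.scandir(directory) as entries:
        for _ in entries:
            total += 1
    return total

if __name__ == "__main__":
    print(count_entries("."))
```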
I have a directory with 88,914 files in it. Like yourself this is used for storing thumbnails and on a Linux server.
Listing files via FTP or a PHP function is slow, yes, but there is also a performance hit when displaying a file. E.g. www.website.com/thumbdir/gh3hg4h2b4h234b3h2.jpg has a wait time of 200-400 ms. For comparison, on another site I have with around 100 files in a directory, the image is displayed after just ~40 ms of waiting.
I've given this answer because most people have only written about how directory search functions will perform, which you won't be using on a thumbnail folder. You will just be serving files statically, so what matters is how the files perform when they are actually used.
It depends a bit on the specific filesystem in use on the Linux server. Nowadays the default is ext3 with dir_index, which makes searching large directories very fast.
So speed shouldn't be an issue, other than the one you already noted, which is that listings will take longer.
There is a limit to the total number of files in one directory. I seem to remember it definitely working up to 32000 files.
Keep in mind that on Linux if you have a directory with too many files, the shell may not be able to expand wildcards. I have this issue with a photo album hosted on Linux. It stores all the resized images in a single directory. While the file system can handle many files, the shell can't. Example:
-shell-3.00$ ls A*
-shell: /bin/ls: Argument list too long
or
-shell-3.00$ chmod 644 *jpg
-shell: /bin/chmod: Argument list too long
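The error comes from the shell expanding the wildcard into one huge argument list. One way around it is to let a script do the matching itself, so no long argument list is ever built. A minimal sketch of the chmod example; the function name is hypothetical.

```python
import fnmatch
import os

def chmod_matching(directory, pattern, mode):
    """Apply `mode` to every regular file in `directory` whose name matches `pattern`."""
    changed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                os.chmod(entry.path, mode)
                changed += 1
    return changed

if __name__ == "__main__":
    # Same effect as "chmod 644 *jpg" in the current directory
    print(chmod_matching(".", "*jpg", 0o644))
```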
I'm working on a similar problem right now. We have a hierarchical directory structure and use image ids as filenames. For example, an image with id=1234567 is placed in
..../45/67/1234567_<...>.jpg
using last 4 digits to determine where the file goes.
With a few thousand images, you could use a one-level hierarchy. Our sysadmin suggested no more than a couple of thousand files in any given directory (ext3) for efficiency, backup, or whatever other reasons he had in mind.
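The mapping from id to directory described above can be sketched in a few lines. The function name is hypothetical, and zero-padding ids shorter than four digits is an assumption that the answer does not cover.

```python
def shard_dir(image_id):
    """Return the two-level directory derived from the last 4 digits of the id."""
    digits = f"{image_id:04d}"[-4:]  # assumption: pad short ids with zeros
    return f"{digits[:2]}/{digits[2:]}"

if __name__ == "__main__":
    print(shard_dir(1234567))
```

For id 1234567 this yields 45/67, the directory used in the example path.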
For what it's worth, I just created a directory on an ext4 file system with 1,000,000 files in it, then randomly accessed those files through a web server. I didn't notice any premium on accessing those over (say) only having 10 files there.
This is radically different from my experience doing this on ntfs a few years back.
I've been having the same issue, trying to store millions of files on an Ubuntu server in ext4. I ended up running my own benchmarks and found that a flat directory performs way better while being way simpler to use:
Wrote an article.
The biggest issue I've run into is on a 32-bit system. Once you pass a certain number, tools like 'ls' stop working.
Trying to do anything with that directory once you pass that barrier becomes a huge problem.
It really depends on the filesystem used, and also some flags.
For example, ext3 can have many thousands of files; but after a couple of thousands, it used to be very slow. Mostly when listing a directory, but also when opening a single file. A few years ago, it gained the 'htree' option, that dramatically shortened the time needed to get an inode given a filename.
Personally, I use subdirectories to keep most levels under a thousand or so items. In your case, I'd create 256 directories, with the two last hex digits of the ID. Use the last and not the first digits, so you get the load balanced.
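For the 8-hex-digit filenames from the question, the "last two hex digits" scheme could look like the sketch below. The function name is hypothetical; it assumes the part before the extension is the hex id.

```python
def hex_shard_path(filename):
    """Place a file under one of 256 directories named after the id's last two hex digits."""
    stem = filename.rsplit(".", 1)[0]
    return f"{stem[-2:]}/{filename}"

if __name__ == "__main__":
    print(hex_shard_path("a58f375c.jpg"))
```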
If the time involved in implementing a directory partitioning scheme is minimal, I am in favor of it. The first time you have to debug a problem that involves manipulating a 10000-file directory via the console you will understand.
As an example, F-Spot stores photo files as YYYY\MM\DD\filename.ext, which means the largest directory I have had to deal with while manually manipulating my ~20000-photo collection is about 800 files. This also makes the files more easily browsable from a third party application. Never assume that your software is the only thing that will be accessing your software's files.
It absolutely depends on the filesystem. Many modern filesystems use decent data structures to store the contents of directories, but older filesystems often just added the entries to a list, so retrieving a file was an O(n) operation.
Even if the filesystem does it right, it's still absolutely possible for programs that list directory contents to mess up and do an O(n^2) sort, so to be on the safe side, I'd always limit the number of files per directory to no more than 500.
ext3 does in fact have directory size limits, and they depend on the block size of the filesystem. There isn't a per-directory "max number" of files, but a per-directory "max number of blocks used to store file entries". Specifically, the size of the directory itself can't grow beyond a b-tree of height 3, and the fanout of the tree depends on the block size. See this link for some details.
https://www.mail-archive.com/cwelug@googlegroups.com/msg01944.html
I was bitten by this recently on a filesystem formatted with 2K blocks, which was inexplicably getting directory-full kernel messages warning: ext3_dx_add_entry: Directory index full! when I was copying from another ext3 filesystem. In my case, a directory with a mere 480,000 files was unable to be copied to the destination.
"Depends on filesystem"
Some users mentioned that the performance impact depends on the filesystem used. Of course. Filesystems like EXT3 can be very slow. But even if you use EXT4 or XFS, you cannot prevent listing a folder through ls or find, or through an external connection like FTP, from becoming slower and slower.
Solution
I prefer the same approach as @armandino. For that I use this little PHP function to convert IDs into a file path that results in 1000 files per directory:
function dynamic_path($int) {
    // 1000 = 1000 files per dir
    // 10000 = 10000 files per dir
    // 2 = 100 dirs per dir
    // 3 = 1000 dirs per dir
    // ceil() puts IDs 1-1000 into directory 1, which is what the results below show
    return implode('/', str_split((string) (int) ceil($int / 1000), 2)) . '/';
}
or you could use the second version if you want to use alpha-numeric characters:
function dynamic_path2($str) {
    // 26 alpha + 10 num + 3 special chars (._-) = 39 combinations
    // -1 = 39^2 = 1521 files per dir
    // -2 = 39^3 = 59319 files per dir (if every combination exists)
    $left = substr($str, 0, -1);
    return implode('/', str_split($left ? $left : $str[0], 2)) . '/';
}
results:
<?php
$files = explode(',', '1.jpg,12.jpg,123.jpg,999.jpg,1000.jpg,1234.jpg,1999.jpg,2000.jpg,12345.jpg,123456.jpg,1234567.jpg,12345678.jpg,123456789.jpg');
foreach ($files as $file) {
echo dynamic_path(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>
1/1.jpg
1/12.jpg
1/123.jpg
1/999.jpg
1/1000.jpg
2/1234.jpg
2/1999.jpg
2/2000.jpg
13/12345.jpg
12/4/123456.jpg
12/35/1234567.jpg
12/34/6/12345678.jpg
12/34/57/123456789.jpg
<?php
$files = array_merge($files, explode(',', 'a.jpg,b.jpg,ab.jpg,abc.jpg,ddd.jpg,af_ff.jpg,abcd.jpg,akkk.jpg,bf.ff.jpg,abc-de.jpg,abcdef.jpg,abcdefg.jpg,abcdefgh.jpg,abcdefghi.jpg'));
foreach ($files as $file) {
echo dynamic_path2(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>
1/1.jpg
1/12.jpg
12/123.jpg
99/999.jpg
10/0/1000.jpg
12/3/1234.jpg
19/9/1999.jpg
20/0/2000.jpg
12/34/12345.jpg
12/34/5/123456.jpg
12/34/56/1234567.jpg
12/34/56/7/12345678.jpg
12/34/56/78/123456789.jpg
a/a.jpg
b/b.jpg
a/ab.jpg
ab/abc.jpg
dd/ddd.jpg
af/_f/af_ff.jpg
ab/c/abcd.jpg
ak/k/akkk.jpg
bf/.f/bf.ff.jpg
ab/c-/d/abc-de.jpg
ab/cd/e/abcdef.jpg
ab/cd/ef/abcdefg.jpg
ab/cd/ef/g/abcdefgh.jpg
ab/cd/ef/gh/abcdefghi.jpg
As you can see, for the $int version every folder contains up to 1000 files and up to 99 directories, each containing 1000 files and 99 directories ...
But do not forget that too many directories cause the same performance problems!
Finally, you should think about how to reduce the total number of files. Depending on your target, you can use CSS sprites to combine multiple tiny images like avatars, icons, smilies, etc., or, if you use many small non-media files, consider combining them, e.g. in JSON format. In my case I had thousands of mini-caches and finally decided to combine them in packs of 10.
The question comes down to what you're going to do with the files.
Under Windows, any directory with more than 2k files tends to open slowly for me in Explorer. If they're all image files, more than 1k tend to open very slowly in thumbnail view.
At one time, the system-imposed limit was 32,767. It's higher now, but even that is way too many files to handle at one time under most circumstances.
What most of the answers above fail to show is that there is no "One Size Fits All" answer to the original question.
In today's environment we have a large conglomerate of different hardware and software -- some is 32 bit, some is 64 bit, some is cutting edge and some is tried and true - reliable and never changing.
Added to that is a variety of older and newer hardware, older and newer OSes, different vendors (Windows, Unixes, Apple, etc.) and a myriad of utilities and servers that go along.
As hardware has improved and software is converted to 64 bit compatibility, there has necessarily been considerable delay in getting all the pieces of this very large and complex world to play nicely with the rapid pace of changes.
IMHO there is no one way to fix a problem. The solution is to research the possibilities and then by trial and error find what works best for your particular needs. Each user must determine what works for their system rather than using a cookie cutter approach.
I for example have a media server with a few very large files. The result is only about 400 files filling a 3 TB drive. Only 1% of the inodes are used but 95% of the total space is used. Someone else, with a lot of smaller files may run out of inodes before they come near to filling the space. (On ext4 filesystems as a rule of thumb, 1 inode is used for each file/directory.)
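How close a filesystem is to running out of inodes can be read from the same counters that df -i reports. A minimal sketch using os.statvfs (Unix only); the function name is hypothetical, and some filesystems report zero total inodes, in which case no percentage is given.

```python
import os

def inode_usage_percent(path):
    """Return the percentage of inodes in use on the filesystem holding `path`."""
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return None  # filesystem does not report a fixed inode count
    used = stats.f_files - stats.f_ffree
    return 100.0 * used / stats.f_files

if __name__ == "__main__":
    print(inode_usage_percent("/"))
```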
While the total number of files that a directory may contain is theoretically nearly infinite, in practice the realistic limits are determined by overall usage, not just by filesystem capabilities.
I hope that all the different answers above have promoted thought and problem solving rather than presenting an insurmountable barrier to progress.
I ran into a similar issue. I was trying to access a directory with over 10,000 files in it. It was taking too long to build the file list and run any type of commands on any of the files.
I thought up a little php script to do this for myself and tried to figure a way to prevent it from time out in the browser.
The following is the php script I wrote to resolve the issue.
Listing Files in a Directory with too many files for FTP
How it helps someone
I recall running a program that was creating a huge amount of files at the output. The files were sorted at 30000 per directory. I do not recall having any read problems when I had to reuse the produced output. It was on an 32-bit Ubuntu Linux laptop, and even Nautilus displayed the directory contents, albeit after a few seconds.
ext3 filesystem: Similar code on a 64-bit system dealt well with 64000 files per directory.
I respect this doesn't totally answer your question as to how many is too many, but an idea for solving the long term problem is that in addition to storing the original file metadata, also store which folder on disk it is stored in - normalize out that piece of metadata. Once a folder grows beyond some limit you are comfortable with for performance, aesthetic or whatever reason, you just create a second folder and start dropping files there...
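The idea of recording the folder alongside the other metadata and rolling over to a new folder at a chosen limit can be sketched as follows. The class name, the folder naming scheme, and the in-memory dict standing in for the database column are all hypothetical.

```python
class FolderAllocator:
    """Assign each new file to the current folder, starting a new folder at the limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0      # index of the folder currently being filled
        self.count = 0        # number of files already in that folder
        self.locations = {}   # stands in for the database: file id -> folder

    def store(self, file_id):
        if self.count >= self.limit:
            self.current += 1
            self.count = 0
        folder = f"folder_{self.current:04d}"
        self.locations[file_id] = folder
        self.count += 1
        return folder

if __name__ == "__main__":
    allocator = FolderAllocator(limit=2)
    for file_id in ["a", "b", "c", "d", "e"]:
        print(file_id, allocator.store(file_id))
```

Because the folder is looked up rather than computed from the id, the limit can be changed later without moving existing files.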
Not an answer, but just some suggestions.
Select a more suitable FS (file system). Historically, the issues you describe were central enough to drive the evolution of file systems over decades, so more modern file systems handle them better. First make a comparison table, based on your ultimate purpose, from a list of file systems.
I think it's time to shift your paradigm. I personally suggest using a distributed-system-aware FS, which means no limits at all regarding size, number of files, etc. Otherwise you will sooner or later be challenged by new unanticipated problems.
I'm not sure it will work, but if you don't mind some experimentation, give AUFS over your current file system a try. I guess it has facilities to present multiple folders as a single virtual folder.
To overcome hardware limits you can use RAID-0.
There is no single figure that is "too many", as long as it doesn't exceed the limits of the OS. However, the more files in a directory, regardless of the OS, the longer it takes to access any individual file, and on most OSes the performance is non-linear, so finding one file out of 10,000 takes more than 10 times longer than finding a file in 1,000.
Secondary problems associated with having a lot of files in a directory include wild card expansion failures. To reduce the risks, you might consider ordering your directories by date of upload, or some other useful piece of metadata.
≈ 135,000 FILES
NTFS | WINDOWS 2012 SERVER | 64-BIT | 4TB HDD | VBS
Problem: Catastrophic hardware issues appear when a [single] specific folder amasses roughly 135,000 files.
"Catastrophic" = CPU Overheats, Computer Shuts Down, Replacement Hardware needed
"Specific Folder" = has a VBS file that moves files into subfolders
Access = the folder is automatically accessed/executed by several client computers
Basically, I have a custom-built script that sits on a file server. When something goes wrong with the automated process (ie, file spill + dam) then the specific folder gets flooded [with unmoved files]. The catastrophe takes shape when the client computers keep executing the script. The file server ends up reading through 135,000+ files; and doing so hundreds of times each day. This work-overload ends up overheating my CPU (92°C, etc.); which ends up crashing my machine.
Solution: Make sure your file-organizing scripts never have to deal with a folder that has 135,000+ files.
flawless,
flawless,
absolutely flawless :
( G. M. - RIP )
function ff () {
d=$1; f=$2;
p=$( echo $f |sed "s/$d.*//; s,\(.\),&/,g; s,/$,," );
echo $p/$f ;
}
ff _D_ 09748abcGHJ_D_my_tagged_doc.json
0/9/7/4/8/a/b/c/G/H/J/09748abcGHJ_D_my_tagged_doc.json
ff - gadsf12-my_car.json
g/a/d/s/f/1/2/gadsf12-my_car.json
and also this
ff _D_ 0123456_D_my_tagged_doc.json
0/1/2/3/4/5/6/0123456_D_my_tagged_doc.json
ff .._D_ 0123456_D_my_tagged_doc.json
0/1/2/3/4/0123456_D_my_tagged_doc.json
enjoy !
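For reference, the same path splitting can be written outside the shell. This is a rough Python equivalent of ff, under the assumption that the delimiter is a pattern that behaves the same in sed and in Python's re module, which holds for the examples above.

```python
import re

def ff(delimiter, filename):
    """Split the part before the first delimiter match into one directory per character."""
    prefix = re.sub(delimiter + ".*", "", filename, count=1)
    return "/".join(prefix) + "/" + filename

if __name__ == "__main__":
    print(ff("_D_", "09748abcGHJ_D_my_tagged_doc.json"))
    print(ff(".._D_", "0123456_D_my_tagged_doc.json"))
```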
