Looping an ffmpeg command that joins two files

The command ffmpeg -i file-1.mp4 -vf ass=file-1a.ass burned-1.mp4
works to burn the file-1a.ass subtitles onto the file-1.mp4 video.
But I have to repeat the same command for over 40 different videos and subtitle files, and each time I have to wait for the output to render.
So perhaps there is a way to automatically run the same command on all the files.
Looking for an answer, I found this loop:
for f in *; do ffmpeg $f;
But I am confused about how to use it with two files, the .mp4 and the .ass file, and also the output file, which should carry the same number.
I imagine I should give the same name to each pair of files, such as:
1.mp4 1.ass
2.mp4 2.ass
3.mp4 3.ass
etc
and then
for f in *; do ffmpeg -i $f.mp4 -vf ass=$f.ass $f-output.mp4
But I have no clear idea how to put this together.

You have the right idea. But as written it won’t work: the loop executes with f == 1.mp4, then again with f == 1.ass, and so on.
So you want to modify the loop to only iterate over .mp4 files. Then you want to strip the .mp4 extension from the value of f, that is, strip the last 4 characters from the value of f, using ${f:0: -4} (this means “get a substring of f, starting at character 0 and ending 4 characters before the end”).
You obviously want to terminate the loop with done. I also suggest wrapping the parameters in quotes, to prevent word splitting (that is, if the filenames contain certain characters, they might be split into multiple arguments to ffmpeg).
Putting it all together:
for f in *.mp4; do f=${f%.*}; ffmpeg -i "$f.mp4" -vf ass="$f.ass" "$f-output.mp4"; done
Of course, once you have run this, you need to get rid of all the output files before you can run it again. Or you can just put the output files in a different directory to begin with.
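For instance, a minimal sketch of that variant (the output directory name here is my own choice):
mkdir -p output
for f in *.mp4; do f=${f%.*}; ffmpeg -i "$f.mp4" -vf ass="$f.ass" "output/$f.mp4"; done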
Edit: Another user posted an answer, which seems to have been deleted. It was a good answer but lacked explanation. It was basically the same as my answer, except that it used ${f%.mp4} to strip the .mp4 extension. My answer is probably slightly more complex but slightly more efficient, so it’s basically a matter of personal preference.
Edit 2: Based on the link provided by llogan’s comment, I have made these changes:
Remove the quotes in the assignment, as assignments are not subject to word splitting (this is also stated in the bash man page).
Use ${f%.*} to strip the extension. This strips a dot followed by any sequence of characters from the end. It looks for the shortest possible match, so it’s really looking for a dot followed by any sequence of non-dot characters at the end.
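A quick illustration of the two stripping approaches, using a made-up name:
$ f=1.mp4
$ echo "${f:0: -4}"
1
$ echo "${f%.*}"
1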

Related

Replace a number in a file using array data, bash

I'm not an expert in bash and I'm trying to write an interactive-like script to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to write code that reads specific numbers from the file, stores them in an array, modifies that array using some arithmetic operation, and then replaces the numbers in the original file with the modified array. So far I've done everything except replacing the numbers in the file: I tried using sed, but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern: characters that mean something specific to sed (the most common mistake is an unescaped /, which interferes with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
If you post data samples and more of your script, we can look at where you went wrong.
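For illustration, here is a hedged sketch of the corrected command, using the array names from the question and assuming the stored values contain no characters that are special to sed:
sed -i "s,${readfile[$j]},${d[$k]}," file.txt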

Stable text feed for vim through vimserver

I am searching for a highly stable way to feed text (the output of a program) into vim through vimserver. Assume that I have started a (g)vim session with gvim --servername vim myfile. The file myfile contains a (unique) line OUT: which marks the position where the text should be pasted. I can straightforwardly achieve this from the command line with vim --servername vim --remote-send ':%s/OUT:/TEXT\\rOUT:/<Enter>'. I can repeatedly feed more text using the same command. Inside a C program I can execute it with system(). However, TEXT, which is dynamic and arbitrary (received as a stream in the C program), needs to be passed on the command line, and hence it needs to be escaped. Furthermore, with the substitution command %s, vim will jump to the position where TEXT is inserted. I would like to find a way to paste large chunks of arbitrary text seamlessly into vim. An idea is to have vim read from a POSIX pipe with :r pipe and to write the string from within the C program to the pipe. Ideally the solution would be such that I can continuously edit the same file manually without noticing that output is added at OUT:, as long as this location is outside the visible area.
The purpose of this text feed is to create a command-line-based front end for scripting languages. The blocks of input are entered manually by the user in a vim buffer and are sent to the interpreter through a pipe using vim's :! [interpreter] command. The [interpreter] can of course write the output to stdout (preceded by the original lines of input), in which case the input line is replaced by input and output (to be distinguished by some leading key characters, for instance). However, commands might take a long time to produce the actual output, while the user might want to continue editing the file. Therefore my idea is to have [interpreter] return OUT: immediately and to append subsequent lines of output in its place as they become available, using vimserver. However, the output must be inserted in a way which does not disturb or corrupt the edits possibly made by the user at the same time.
EDIT
The proposed solutions seem to work.
However there seem to be at least two caveats:
* if I send text two or more times this way, the `` part of the commands will not take me back to the original cursor position (and even if I do it just once, the markers are modified, which may interrupt the user editing the file manually)
* if the user opens a different buffer (e.g. the online help), the commands will fail (or maybe insert the text into the present buffer)
Any ideas?
EDIT: After actually trying, this should work for you:
vim --servername vim --remote-send \
":set cpo+=C<CR>/OUT:<CR>:i<CR>HELLO<CR>WORLD<CR>.<CR>\`\`"
As far as I can see, the only caveats would be a period on a line by itself, which would terminate :insert instead of being inserted, and <..> sequences that might be interpreted as keypresses. Also, you need to replace any newlines in the text with <CR>. However, you have no worries about regular expressions getting muddled, the input is not the command line, the amount of escaping necessary is minimal, and the jumping is compensated for with the double backticks.
Check out :help 'cpoptions', :help :insert, and :help ''.
Instead of dealing with the escaping, I would rather use lower-level functions. Use let lnum = search('^OUT:$') to locate the marker line, and call setline(lnum, [text, 'OUT:']) to insert the text and the marker line again.
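A rough sketch of that approach driven from the shell (HELLO and WORLD are placeholder text; note that append() inserts above the marker line without overwriting anything, whereas setline() replaces existing lines):
vim --servername vim --remote-expr 'append(search("^OUT:$") - 1, ["HELLO", "WORLD"])'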

How do I find out if a file name has any extension in Unix?

I need to find out if a file or directory name contains any extension, in Unix, for a Bourne shell script.
The logic will be:
If there is a file extension
Remove the extension
And use the file name without the extension
This is my first question on SO, so it will be great to hear from someone.
The concept of an extension isn't as strictly well-defined as in traditional / toy DOS 8+3 file names. If you want to find file names containing a dot, where the dot is not the first character, try this:
case $filename in
[!.]*.*) filename=${filename%.*};;
esac
This will trim the extension (as per the above definition, starting from the last dot if there are several) from $filename if there is one, and otherwise do nothing.
If you will not be processing files whose names might start with a dot, the case is superfluous, as the assignment will also leave the value untouched if there isn't a dot; but with this belt-and-suspenders example, you can easily pick whichever approach you prefer, in case you need to extend it one way or another.
To also handle file names which start with a dot (so that e.g. .foo.txt is trimmed to .foo, while a lone .bashrc is still left alone), try the pattern ?*.* instead.
The case expression in pattern ) commands ;; esac syntax may look weird or scary, but it's quite versatile, and well worth learning.
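A quick illustration with a made-up name; note that only the last extension is trimmed:
$ filename=report.final.txt
$ case $filename in [!.]*.*) filename=${filename%.*};; esac
$ echo "$filename"
report.final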
I would use a shell-agnostic solution. Running the name through:
cut -d . -f 1
will give you everything up to the first dot (-d . sets the delimiter and -f 1 selects the first field). You can play with the params (try --complement to reverse the selection) and get pretty much anything you want.
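For example (with a made-up name), note that this cuts at the first dot, not the last:
$ echo archive.tar.gz | cut -d . -f 1
archive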

Removing a line from a text file?

I am working with a text file, which contains a list of processes under my programs control, along with relevant data.
At some point, one of the processes will finish, and thus will need to be removed from the file (as it's no longer under control).
Here is a sample of the file contents (which has entries added "randomly"):
PID=25729 IDLE=0.200000 BUSY=0.300000 USER=-10.000000
PID=26416 IDLE=0.100000 BUSY=0.800000 USER=-20.000000
PID=26522 IDLE=0.400000 BUSY=0.700000 USER=-30.000000
So for example, if I wanted to remove the line that says PID=26416..., how could I do that without writing the file over again?
I can use external unix commands, however I am not very familiar with them so please if that is your suggestion, give an example.
Thanks!
Either you keep the contents of the file in memory and then rewrite the file, or you keep one file per PID with the relevant information in it, and simply delete a process's file when it's no longer running. Or you could use a database instead.
As others have already pointed out, your only real choice is to rewrite the file.
The obvious way to do that with "external UNIX commands" would be grep -v "PID=26416" (or whatever PID you want to remove, obviously).
Edit: It is probably worth mentioning that if the lines are all the same length (as you've shown here) and order doesn't matter, you can delete a line more efficiently by copying the last line into the space being vacated, then shortening the file to eliminate what had been the last line. This will only work if the lines really are all the same length, though (e.g., if you got a PID of '1', you'd need to pad it to the same length as the others in the file).
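A speculative sketch of that trick with standard tools, assuming fixed-length records of LEN bytes including the newline (the length, file name, and record position are all made up):
LEN=54   # assumed record length in bytes, newline included
tail -c "$LEN" procs.txt > /tmp/last                 # grab the last record
dd if=/tmp/last of=procs.txt bs="$LEN" seek=1 conv=notrunc 2>/dev/null   # overwrite record 2
truncate -s -"$LEN" procs.txt                        # drop the now-duplicate tail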
The only way is by copying each character that comes after the deleted line down over the characters that are deleted.
It is far more efficient to simply rewrite the file.
how could I do that, without writing the file over again?
You cannot. Filesystems (perhaps besides more esoteric record-based ones) do not support insertion or deletion in the middle of a file.
So you'll have to write the lines to a temporary file up to the line you want to delete, skip over that line, and write the rest of the lines to the file. When done, rename/copy the temp file to the original filename.
Why are you maintaining these in a text file? That's not the best model for such a task. But, if you're stuck with it ... if these lines are guaranteed to all be the same length (it appears that way from the sample), and if the order of the lines in the file doesn't matter, then you can write the last line over the line for the process that has died and then shorten the file by one line with the (f)truncate() call if you're on a POSIX system: see Jonathan Leffler's answer in How to truncate a file in C?
But note carefully netrom's answer, which gives three different better ways to maintain this info.
Also, if you stick with a text file (preferably written from scratch each time from data structures you maintain, as per netrom's first suggestion), and you want to be sure that the file is always well formed, then write the new data into a temp file on the same device (putting it in the same directory is easiest) and then do a rename() call, which is an atomic operation.
You can use sed:
sed -i.bak -e '/PID=26416/d' test
-i is for editing in place; it also creates a back-up file, with the extension .bak appended.
-e is for specifying the pattern. The /d indicates all lines matching the pattern should be deleted.
test is the filename
The unix command for it is:
grep -v "PID=26416" myfile > myfile.tmp
mv myfile.tmp myfile
The grep -v part outputs the file without the rows with the search term.
The > myfile.tmp part creates a new temp file for this output.
The mv part renames the temp file to the original file.
Note that we are rewriting the file here, and moreover, we can lose data if someone writes to the file between the two commands.

Why should text files end with a newline?

I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years but I've always wondered — why?
Because that’s how the POSIX standard defines a line:
3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
Therefore, lines not ending in a newline character aren't considered actual lines. That's why some programs have problems processing the last line of a file if it isn't newline terminated.
There's at least one hard advantage to this guideline when working on a terminal emulator: All Unix tools expect this convention and work with it. For instance, when concatenating files with cat, a file terminated by newline will have a different effect than one without:
$ more a.txt
foo
$ more b.txt
bar$ more c.txt
baz
$ cat {a,b,c}.txt
foo
barbaz
And, as the previous example also demonstrates, when displaying the file on the command line (e.g. via more), a newline-terminated file results in a correct display. An improperly terminated file might be garbled (second line).
For consistency, it’s very helpful to follow this rule – doing otherwise will incur extra work when dealing with the default Unix tools.
Think about it differently: If lines aren’t terminated by newline, making commands such as cat useful is much harder: how do you make a command to concatenate files such that
it puts each file’s start on a new line, which is what you want 95% of the time; but
it allows merging the last and first line of two files, as in the example above between b.txt and c.txt?
Of course this is solvable but you need to make the usage of cat more complex (by adding positional command line arguments, e.g. cat a.txt --no-newline b.txt c.txt), and now the command rather than each individual file controls how it is pasted together with other files. This is almost certainly not convenient.
… Or you need to introduce a special sentinel character to mark a line that is supposed to be continued rather than terminated. Well, now you’re stuck with the same situation as on POSIX, except inverted (line continuation rather than line termination character).
Now, on non POSIX compliant systems (nowadays that’s mostly Windows), the point is moot: files don’t generally end with a newline, and the (informal) definition of a line might for instance be “text that is separated by newlines” (note the emphasis). This is entirely valid. However, for structured data (e.g. programming code) it makes parsing minimally more complicated: it generally means that parsers have to be rewritten. If a parser was originally written with the POSIX definition in mind, then it might be easier to modify the token stream rather than the parser — in other words, add an “artificial newline” token to the end of the input.
Each line should be terminated in a newline character, including the last one. Some programs have problems processing the last line of a file if it isn't newline terminated.
GCC warns about it not because it can't process the file, but because it has to as part of the standard.
The C language standard says
A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.
Since this is a "shall" clause, we must emit a diagnostic message for a violation of this rule.
This is in section 2.1.1.2 of the ANSI C 1989 standard. Section 5.1.1.2 of the ISO C 1999 standard (and probably also the ISO C 1990 standard).
Reference: The GCC/GNU mail archive.
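For illustration, this is roughly what an older GCC would print (the file name is mine, and newer GCC versions may no longer warn by default):
$ printf 'int main(void) { return 0; }' > nonl.c
$ gcc -Wall -pedantic -c nonl.c
nonl.c:1: warning: no newline at end of file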
This answer is an attempt at a technical answer rather than opinion.
If we want to be POSIX purists, we define a line as:
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
An incomplete line as:
A sequence of one or more non- <newline> characters at the end of the file.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195
A text file as:
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397
A string as:
A contiguous sequence of bytes terminated by and including the first null byte.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_396
From this, then, we can derive that the only time we will potentially encounter any issues is when we deal with the concept of a line of a file, or with a file as a text file (a text file being an organization of zero or more lines, where a line, as we know, must terminate with a <newline>).
Case in point: wc -l filename.
From wc's manual we read:
A line is defined as a string of characters delimited by a <newline> character.
What, then, are the implications for JavaScript, HTML, and CSS files, given that they are text files?
In browsers, modern IDEs, and other front-end applications there are no issues with skipping EOL at EOF; the applications parse the files properly. They have to, since not all operating systems conform to the POSIX standard, so it would be impractical for non-OS tools (e.g. browsers) to handle files according to the POSIX standard (or any OS-level standard).
As a result, we can be relatively confident that skipping EOL at EOF will have virtually no negative impact at the application level, regardless of whether it is running on a UNIX OS.
At this point we can confidently say that skipping EOL at EOF is safe when dealing with JS, HTML, and CSS on the client side. In fact, we can state that minifying any one of these files so that it contains no <newline> is safe.
We can take this one step further and say that as far as NodeJS is concerned, it too cannot assume POSIX semantics, given that it can run in non-POSIX-compliant environments.
What are we left with then? System-level tooling.
This means the only issues that may arise are with tools that try to make their functionality adhere to POSIX semantics (e.g. the definition of a line, as used by wc).
Even so, not all shells automatically adhere to POSIX. Bash, for example, does not default to POSIX behavior; it can be enabled with the --posix switch or the POSIXLY_CORRECT environment variable.
Food for thought on the value of EOL being <newline>: https://www.rfc-editor.org/old/EOLstory.txt
Staying on the tooling track, for all practical intents and purposes, let's consider this:
Let's work with a file that has no EOL. As of this writing the file in this example is a minified JavaScript with no EOL.
$ curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o x.js
$ curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o y.js
$ cat x.js y.js > z.js
$ ls -l x.js y.js z.js
-rw-r--r-- 1 milanadamovsky 7905 Aug 14 23:17 x.js
-rw-r--r-- 1 milanadamovsky 7905 Aug 14 23:17 y.js
-rw-r--r-- 1 milanadamovsky 15810 Aug 14 23:18 z.js
Notice that the concatenated file's size is exactly the sum of its individual parts. If the concatenation of JavaScript files is a concern, the more appropriate concern would be to start each JavaScript file with a semicolon.
As someone else mentioned in this thread: what if you want to cat two files whose output becomes just one line instead of two? In other words, cat does what it's supposed to do.
The man page for cat only mentions reading input up to EOF, not up to <newline>. Note that the -n switch of cat will also print out a non-<newline>-terminated (incomplete) line as a line, counting it like any other (the count starts at 1, according to the man page):
-n Number the output lines, starting at 1.
Now that we understand how POSIX defines a line, this behavior becomes ambiguous, or really, non-compliant.
Understanding a given tool's purpose and compliance will help in determining how critical it is to end files with an EOL. In C, C++, Java (JARs), etc... some standards will dictate a newline for validity - no such standard exists for JS, HTML, CSS.
For example, instead of using wc -l filename, one could use awk '{x++} END {print x}' filename, and rest assured that the task's success is not jeopardized by a file we may want to process that we did not write (e.g. a third-party library such as the minified JS we curl'd), unless our intent was truly to count lines in the POSIX-compliant sense.
Conclusion
There will be very few real-life use cases where skipping EOL at EOF for certain text files such as JS, HTML, and CSS will have a negative impact, if any at all. If we rely on <newline> being present, we restrict the reliability of our tooling to only the files that we author, and we open ourselves up to potential errors introduced by third-party files.
Moral of the story: Engineer tooling that does not have the weakness of relying on EOL at EOF.
Feel free to post use cases as they apply to JS, HTML and CSS where we can examine how skipping EOL has an adverse effect.
It may be related to the difference between:
text file (each line is supposed to end in an end-of-line)
binary file (there are no true "lines" to speak of, and the length of the file must be preserved)
If each line does end in an end-of-line, this avoids, for instance, that concatenating two text files would make the last line of the first run into the first line of the second.
Plus, an editor can check at load time whether the file ends in an end-of-line, save that in its local 'eol' option, and use it when writing the file.
A few years back (2005), many editors (ZDE, Eclipse, SciTE, ...) did "forget" that final EOL, which was not very appreciated.
Not only that, but they interpreted that final EOL incorrectly, as 'start a new line', and actually started to display another line as if it already existed.
This was very visible with a 'proper' text file written by a well-behaved text editor like vim, compared to opening it in one of the above editors, which displayed an extra line below the real last line of the file. You would see something like this:
1 first line
2 middle line
3 last line
4
Some tools expect this. For example, wc expects this:
$ echo -n "Line not ending in a new line" | wc -l
0
$ echo "Line ending with a new line" | wc -l
1
A separate use case: commit hygiene, when your text file is version controlled.
If content is added to the end of the file, then the line that was previously the last line will have been edited to include a newline character. This means that blaming the file to find out when that line was last edited will show the newline addition, not the commit before it that you actually wanted to see.
(The example is specific to git, but the same approach applies to other version control systems too.)
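You can see the effect in the diff itself; a rough illustration with made-up content:
$ git diff
diff --git a/notes.txt b/notes.txt
...
-old last line
\ No newline at end of file
+old last line
+appended line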
Basically, there are many programs which will not process files correctly if they don't get a final EOL before EOF.
GCC warns you about this because it's expected as part of the C standard (section 5.1.1.2, apparently).
"No newline at end of file" compiler warning
I've wondered this myself for years. But I came across a good reason today.
Imagine a file with a record on every line (e.g., a CSV file), and that the computer was writing records at the end of the file when it suddenly crashed. Was the last line complete? (Not a nice situation.)
But if we always terminate the last line, then we would know: simply check whether the last line is terminated. Otherwise we would probably have to discard the last line every time, just to be safe.
This originates from the very early days when simple terminals were used. The newline char was used to trigger a 'flush' of the transferred data.
Today, the newline char isn't required anymore. Sure, many apps still have problems if the newline isn't there, but I'd consider that a bug in those apps.
If however you have a text file format where you require the newline, you get simple data verification very cheap: if the file ends with a line that has no newline at the end, you know the file is broken. With only one extra byte for each line, you can detect broken files with high accuracy and almost no CPU time.
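A minimal sketch of such a check (the file name is assumed): the last byte of a well-formed file is a newline, which command substitution strips, so a non-empty result flags a truncated final record:
if [ -n "$(tail -c 1 records.csv)" ]; then
    echo "records.csv: last record incomplete, discarding it" >&2
fi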
In addition to the above practical reasons, it wouldn't surprise me if the originators of Unix (Thompson, Ritchie, et al.) or their Multics predecessors realized that there is a theoretical reason to use line terminators rather than line separators: With line terminators, you can encode all possible files of lines. With line separators, there's no difference between a file of zero lines and a file containing a single empty line; both of them are encoded as a file containing zero characters.
So, the reasons are:
Because that's the way POSIX defines it.
Because some tools expect it or "misbehave" without it. For example, wc -l will not count a final "line" if it doesn't end with a newline.
Because it's simple and convenient. On Unix, cat just works and it works without complication: it just copies the bytes of each file, without any need for interpretation. There's no clean DOS equivalent to cat: using copy a+b c will end up merging the last line of file a with the first line of file b if a doesn't end with a newline.
Because a file (or stream) of zero lines can be distinguished from a file of one empty line.
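The terminator convention makes that last distinction directly observable:
$ printf '' | wc -l
0
$ printf '\n' | wc -l
1
The first is a file of zero lines; the second contains exactly one (empty) line.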
Why should text files end with a newline?
Because that's the sanest choice to make.
Take a file with the following content,
one\n
two\n
three
where \n means a newline character, which on Windows is \r\n, a carriage return followed by a line feed, because it's so cool, right?
How many lines does this file have? Windows says 3, we say 3, POSIX (Linux) says that the file is crippled because there should be a \n at the end of it.
Regardless, what would you say its last line is? I guess anybody agrees that three is the last line of the file, but POSIX says that's a crippled line.
And what is its second line? Oh, here we have the first strong separation:
Windows says two because a file is "lines separated by newlines" (wth?);
POSIX says two\n, adding that that's a true, honest line.
What's the consequence of Windows choice, then? Simple:
You cannot say that a file is made up of lines
Why? Take the last line from the previous file and replicate it a few times... What do you get? This:
one\n
two\n
threethreethreethree
Try, instead, to swap second and third line... And you get this:
one\n
threetwo\n
Therefore
You must say that a text file is an alternation of lines and \ns, which starts with a line and ends with a line
which is quite a mouthful, right?
And you want another strange consequence?
You must accept that an empty file (0 bytes, really 0 bits) is a one-line file, magically, always because they are cool at Microsoft
Which is quite a craziness, don't you think?
What is the consequence of POSIX choice?
That the file on the top is just a bit crippled, and we need some hack to deal with it.
Being serious
I'm being provocative in the preceding text, for the reason that dealing with text files lacking the \n at the end forces you to treat them with ad-hoc tricks/hacks. You always need an if/else somewhere to make things work, where the branch dealing with the crippled line deals only with the crippled line, all the other lines taking the other branch. That one line always gets singled out for special treatment, no?
My conclusion
I'm in favour of POSIX definition of a line for the following reasons:
A file is naturally conceived as a sequence of lines
A line shouldn't be one thing or another depending on where it is in the file
An empty file is not a one-line file, come on!
You should not be forced to make hacks in your code
And yes, Windows does encourage you to omit the trailing \r\n: if you want a two-line file, you have to omit the trailing \r\n, otherwise text editors will show it as a three-line file.
Presumably simply that some parsing code expected it to be there.
I'm not sure I would consider it a "rule", and it certainly isn't something I adhere to religiously. Most sensible code will know how to parse text (including encodings) line-by-line (any choice of line endings), with-or-without a newline on the last line.
Indeed - if you end with a new line: is there (in theory) an empty final line between the EOL and the EOF? One to ponder...
There's also a practical programming issue with files lacking newlines at the end: The read Bash built-in (I don't know about other read implementations) doesn't work as expected:
printf $'foo\nbar' | while read line
do
    echo $line
done
This prints only foo! The reason is that when read encounters the last line, it writes the contents to $line but returns exit code 1 because it reached EOF. This breaks the while loop, so we never reach the echo $line part. If you want to handle this situation, you have to do the following:
while read line || [ -n "${line-}" ]
do
    echo $line
done < <(printf $'foo\nbar')
That is, do the echo if the read failed because of a non-empty line at end of file. Naturally, in this case there will be one extra newline in the output which was not in the input.
Why should (text) files end with a newline?
As well expressed by many, because:
Many programs do not behave well, or fail without it.
Even for programs that do handle a file lacking an ending '\n', the tool's functionality may not meet the user's expectations, which can be unclear in this corner case.
Programs rarely disallow final '\n' (I do not know of any).
Yet this raises the next question:
What should code do about text files without a newline?
Most important - Do not write code that assumes a text file ends with a newline. Assuming a file conforms to a format leads to data corruption, hacker attacks and crashes. Example:
// Bad code
while (fgets(buf, sizeof buf, instream)) {
    // If there is no '\n' (overlong line, or missing final newline),
    // this chops off a real data character instead of the newline.
    buf[strlen(buf) - 1] = '\0'; // attempt to rid trailing \n
    ...
}
If the final trailing '\n' is needed, alert the user to its absence and the action taken. In other words, validate the file's format. Note: this may include a limit on the maximum line length, the character encoding, etc.
Define clearly, document, the code's handling of a missing final '\n'.
As far as possible, do not generate files that lack the ending '\n'.
It's very late here, but I just faced a bug in file processing, and it happened because the files were not ending with a newline. We were processing text files with sed, and sed was omitting the last line from the output, which produced an invalid JSON structure and sent the rest of the process into a failed state.
All we were doing was:
There is one sample file, say foo.txt, with some JSON content inside it:
[{
someProp: value
},
{
someProp: value
}] <-- No newline here
The file was created on a Windows machine, and Windows scripts were processing it using PowerShell commands. All good.
But when we processed the same file using the sed command sed 's|value|newValue|g' foo.txt > foo.txt.tmp,
the newly generated file was:
[{
someProp: value
},
{
someProp: value
and boom, it failed the rest of the processes because of the invalid JSON.
So it's always good practice to end your files with a newline.
I was always under the impression the rule came from the days when parsing a file without an ending newline was difficult. That is, you would end up writing code where an end of line was defined by either the EOL character or EOF. It was just simpler to assume a line always ended with EOL.
However I believe the rule is derived from C compilers requiring the newline. And as pointed out on “No newline at end of file” compiler warning, #include will not add a newline.
Imagine that the file is being processed while it is still being generated by another process. It might have to do with that: a final newline can serve as a flag that the file is complete and ready to be processed.
I personally like newlines at the end of source code files.
It may have its origin with Linux, or all UNIX systems for that matter. I remember there were compilation errors (gcc, if I'm not mistaken) because source code files did not end with an empty new line. Why it was made this way, one is left to wonder.
IMHO, it's a matter of personal style and opinion.
In olden days, I didn't put that newline. A character saved means more speed through that 14.4K modem.
Later, I put that newline so that it's easier to select the final line using shift+downarrow.
