Why is it such a mess with mimetypes?

I'd like to create a database table populated with mimetypes and related known extensions.
Here is a Venn Diagram I've just created:
Can we rely on libmagic to identify files for security purposes?
It seems libmagic is aware of only a very small part of "all known mimetypes".
For libmagic, I've used:
find ~/file-5.14/magic/Magdir/ -type f -exec grep '!:mime' "{}" \; > /tmp/mime.log
Where file-5.14 is the source code of the Linux file command.
For Apache, I've used its mime.types file.
For IANA, I've used the IANA Media Types registry.
Is there a bias in my methodology?
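For what it's worth, the comparison behind such a diagram can be sketched in a few lines of Python. This is only a rough sketch: the sample lines are made up, the function names are mine, and it assumes that Magdir annotations look like "!:mime type/subtype" and that mime.types lines look like "type/subtype ext1 ext2". Note that comment lines are skipped, so if a listing mentions some types only in comments, the Apache set will come out smaller, which is one possible source of bias.

```python
import re

# Matches libmagic annotations such as "!:mime<TAB>image/png" in Magdir files.
MAGIC_MIME = re.compile(r"^!:mime\s+(\S+)")


def libmagic_types(lines):
    """Collect MIME types from Magdir-style lines (the lines the grep keeps)."""
    found = set()
    for line in lines:
        match = MAGIC_MIME.match(line.strip())
        if match:
            found.add(match.group(1).lower())
    return found


def apache_types(lines):
    """Map each MIME type of a mime.types-style listing to its extensions."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mime, *extensions = line.split()
        table[mime.lower()] = extensions
    return table


# Tiny made-up samples, only to show the shape of the comparison.
magic = libmagic_types(["!:mime\timage/png", "0 string foo", "!:mime application/pdf"])
apache = apache_types(["# comment", "image/png png", "text/html html htm"])

print(sorted(magic & set(apache)))  # known to both sources
print(sorted(magic - set(apache)))  # only in the libmagic sample
print(sorted(set(apache) - magic))  # only in the Apache sample
```

The dictionary returned by apache_types is also roughly the shape a "mimetype plus known extensions" table would need.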

How to find where a string was first added in code, with ClearCase?

I want to find in which ClearCase label a specific string was first added to the code.
I am using base ClearCase.
I previously recommended (8 years ago) limiting the scope of your search and using the -exec clause of a cleartool find.
Example:
cleartool find -all -type f -user myLogin \
-version "lbtype(A_LABEL)" \
-exec ...
If you can do so in a dynamic view, you can then directly grep the content of CLEARCASE_XPN, the variable set by cleartool find for each version found.
It references an extended pathname that (in a dynamic view) you can directly read and grep for your code.
You can do so for each label you can find in your VOB, from the oldest to the newest.
Z:myvob>ct lstype -kind lbtype -short
Z:myvob>ct find . -version "lbtype(A_LABEL)" -print
If you are looking for a specific change in a given source file, the cleartool annotate command will give you a good start. If you're familiar with Git, this is the equivalent of "git blame".
Annotate works only if the element is one of the text file types (text_file, utf?_text_file, etc.) since those store delta information on a per version basis.
One caveat is that this will tell you what version the change came from, but if that version was created by a merge, you may have to backtrack the merge to find the original location of the change. ALMToolbox's "visual annotate" tool does that for you, if I recall correctly.

Error when trying to recursively add file extension to all files

Referring to this post, recursively add file extension to all files, I am trying to recursively add extensions to many files within many separate subfolders. All of the files appearing at the end of my subfolders do not have any extension at all, and I would like to give them all a .html extension.
I have tried the following in my command prompt after using cd to change to the parent directory that I would like to use:
find /path -type f -not -name "*.*" -exec mv "{}" "{}".html \;
However, I receive the following error: "FIND: Invalid switch"
I am new to using the command prompt for this type of manipulation, so please excuse my ignorance. I am thinking that maybe I have to change the /path to the directory I want it to look through, but I tried that to no avail.
I have also tried the following command:
find . -type f -exec mv '{}' '{}'.html \;
and receive the following error: FIND: Parameter format not correct
I am running Windows 10.
Seems like -not isn't available in your find version, use ! instead:
find /path -type f \! -name "*.*" -exec mv "{}" "{}".html \;
From find manual:
-not expr
Same as ! expr, but not POSIX compliant.
-not is an archaic form of logical negation; the current form is ! (exclamation). It has to be followed by a boolean expression. In this case, you followed it with -name, which fouled the command line parsing. -name is another option, not a valid expression operator.
You need to build the negation within your regular expression: negate the period, not the entire name.
I see another strong indicator: what is FIND? The command you supposedly ran is find; UNIX is case-significant. At whatever command line you're using, type man find or find --help to get a list of options and semantics. I'm worried that the bash you have under Windows isn't full-featured.
Are you familiar with the Windows command rename? It has a syntax similar to the UNIX mv, although it will work on multiple files. For instance
rename [^.]* *.html
I think would work for a single directory.
Apologies to all who commented and left answers. I believe I was unclear that I was trying to do this specifically from the Windows cmd prompt. I used the following to add extensions to all files at the end of my subfolders:
FOR /R %f IN (*.) DO REN "%f" *.html
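If Python happens to be installed, the same rename can be sketched in a way that behaves identically on Windows and Unix. This is only a sketch: add_extension is a name I made up, and it treats any file name containing a dot as "already has an extension" (so dotfiles are left alone) and refuses to overwrite an existing target.

```python
import os


def add_extension(root, extension=".html"):
    """Append extension to every file under root whose name contains no dot."""
    renamed = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if "." in name:
                continue  # already has an extension (or is a dotfile)
            source = os.path.join(folder, name)
            target = source + extension
            if os.path.exists(target):
                continue  # never overwrite an existing file
            os.rename(source, target)
            renamed.append(target)
    return renamed
```

Calling add_extension(".") from the parent directory would cover the same files as the FOR /R loop; the returned list shows what was renamed.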

ClearCase UCM: Branch created of file that is not part of an activity. What happened?

I have somehow created a branch of a file in clearcase UCM that is not part of an activity. I have no idea how to reproduce this, but my stream is showing many files with this symptom. How can I find these files, remove them, and prevent it from happening again in the future?
Here is an example of one such file, names redacted to protect the innocent:
xxxxxxxxxxx.cpp@@/main/xxx-integration/xxxxxx-xxxxxxxx/0 Rule: .../xxx-xxxxxxx/LATEST
A ct lsact -long | grep <filename> returns no results.
Update:
I used a find command to track down all the files that are on the branch given (and redacted) above, though I still do not understand the issue.
Per VonC's answer, here is what I ended up doing:
cleartool find . -type f -version "version(.../xxx/LATEST)&&version(.../xxx/0)" -print | tee ~/tmp/files2
I then read through the list of files generated to make sure they made sense, then I verified they were not attached to an activity and removed the versions:
# read each version path into $REPLY; quoting keeps paths with spaces intact
while read -r
do
if [ -z "$(ct describe -fmt "%[activity]p" "$REPLY")" ]
then
ct rmbranch -f "${REPLY%/0}"
fi
done < ~/tmp/files2
That can happen if those files were checked out in a base ClearCase view, i.e. a non-UCM view, with a simple config spec:
element * .../xxx-integration/LATEST -mkbranch xxxxxx-xxxxxxxx
You can use a find command similar to "How can I find all elements on a branch with version LATEST that has no label applied?".
The difference is: for each version found, you need to describe it in order to check if there is an activity attached to it or not (with a fmt_ccase):
cleartool describe -fmt "%[activity]p" "$CLEARCASE_XPN"

Text specification for a tree of files?

I'm looking for examples of specifying files in a tree structure, for example, for specifying the set of files to search in a grep tool. I'd like to be able to include and exclude files and directories by name matches. I'm sure there are examples out there, but I'm having a hard time finding them.
Here's an example of a possible syntax:
*.py *.html
*.txt *.js
-*.pyc
-.svn/
-*combo_*.js
(this would mean: include files with extensions .py, .html, .txt and .js; exclude .pyc files, anything under a .svn directory, and any file matching *combo_*.js)
I know I've seen these sorts of specifications in other tools before. Is this ringing any bells for anyone?
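To make the proposed syntax concrete, here is a rough Python sketch of how such a spec could be interpreted. It is just one possible reading, not an existing tool's format: parse_spec and selected are names I made up, a leading "-" marks an exclusion, a trailing "/" marks a directory, and there is no escaping, so a file whose name starts with a dash cannot be included.

```python
from fnmatch import fnmatchcase


def parse_spec(text):
    """Split a spec into include patterns, excluded file patterns, excluded dirs."""
    includes, excluded_files, excluded_dirs = [], [], []
    for token in text.split():
        if not token.startswith("-"):
            includes.append(token)
        elif token.endswith("/"):
            excluded_dirs.append(token[1:-1])  # drop the "-" and the "/"
        else:
            excluded_files.append(token[1:])   # drop the "-"
    return includes, excluded_files, excluded_dirs


def selected(path, spec):
    """Return True if a '/'-separated relative path is selected by the spec."""
    includes, excluded_files, excluded_dirs = spec
    *directories, name = path.split("/")
    if any(fnmatchcase(d, p) for d in directories for p in excluded_dirs):
        return False  # somewhere under an excluded directory
    if any(fnmatchcase(name, p) for p in excluded_files):
        return False  # file name explicitly excluded
    return any(fnmatchcase(name, p) for p in includes)


SPEC = parse_spec("""
*.py *.html
*.txt *.js
-*.pyc
-.svn/
-*combo_*.js
""")

for candidate in ["src/app.py", "src/app.pyc", ".svn/entries.txt", "js/all_combo_min.js"]:
    print(candidate, selected(candidate, SPEC))
```

Exclusions win over inclusions here, which is a design choice; rsync-style "first matching rule wins" would be an equally reasonable alternative.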
There is no single standard format for this kind of thing, but if you want to copy something that is widely recognized, have a look at the rsync documentation. Look at the chapter on "INCLUDE/EXCLUDE PATTERN RULES."
Apache Ant provides "Ant globs", or patterns, where:
**/foo/**/*.java
means "any file ending in '.java' in a directory which includes a directory named 'foo' in its path" -- including ./foo/X.java
In your example syntax, is it implicitly understood that there's an escaping character so that you can explicitly include a file that begins with a dash? (The same question goes for any other wildcard characters, but I suppose I'd expect to see more files with dashes in their names than asterisks.)
Various command shells use * (and possibly ? to match a single char), as in your example, but they generally only match against a string of characters that doesn't include a path component separator (i.e. '\' on Windows systems, '/' elsewhere). I've also seen such source control apps as Perforce use additional patterns that can match against path component separators. For instance, with Perforce the pattern "foo/...ext" (without quotes) will match all files under the foo/ directory structure that end with "ext", regardless of whether they are in foo/ itself or in one of its descendant directories. This seems to be a useful pattern.
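The difference between the two kinds of wildcard can be sketched with a small pattern-to-regex translation in Python. This only approximates the behaviour described above and is not Perforce's actual implementation; pattern_to_regex is a made-up name, and "/" is assumed to be the separator.

```python
import re


def pattern_to_regex(pattern):
    """Translate '*' (within one path component) and '...' (across components)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")      # may cross '/' separators
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays inside one path component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


deep = pattern_to_regex("foo/...ext")
shallow = pattern_to_regex("foo/*ext")

for path in ["foo/a.ext", "foo/bar/baz/a.ext", "other/a.ext"]:
    print(path, bool(deep.fullmatch(path)), bool(shallow.fullmatch(path)))
```

With these definitions, "foo/...ext" accepts files at any depth under foo/, while "foo/*ext" only accepts files directly inside foo/.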
If you're using bash, you can use the extglob extension to get some nice globbing functions. Enable it as follows:
shopt -s extglob
Then you can do things like the following:
# everything but .html, .jpg or .gif files
ls -d !(*.html|*gif|*jpg)
# list file9, file22 but not fileit
ls file+([0-9])
# begins with apl or un only
ls -d +(apl*|un*)
See also this page.
How about find in unixish environments?
Find can, of course, do more than build a list of files, but that is one of the common ways it is used. From the man page:
NAME
find -- walk a file hierarchy
SYNOPSIS
find [-H | -L | -P] [-EXdsx] [-f pathname] pathname ... expression
find [-H | -L | -P] [-EXdsx] -f pathname [pathname ...] expression
DESCRIPTION
The find utility recursively descends the directory tree for each
pathname listed, evaluating an expression (composed of the
``primaries'' and ``operands'' listed below) in terms of each file
in the tree.
To achieve your goal I would write something like (formatted for readability):
find . -name .svn -type d -prune -o \
-type f \( -name '*.py' -o -name '*.html' -o -name '*.txt' -o -name '*.js' \) \
! -name '*combo_*.js' \
-print
Moreover, there is an idiomatic pattern using xargs which makes find suitable for sending the whole list so constructed to an arbitrary command, as in:
find /path -type f -print0 | xargs -0 rm
find(1) is a fine tool as described in the previous answer, but if things get more complicated, you should consider either writing your own script in any of the usual suspects (Ruby, Perl, Python et al.) or using one of the more powerful shells such as zsh, which has ** recursive globbing and lets you specify things to exclude. The latter is probably more complicated though.
You might want to check out ack, which allows you to specify file types to search in with options like --perl, etc.
It also ignores .svn directories by default, as well as core dumps, editor cruft, binary files, and so on.

In ClearCase, how can I view old version of a file in a static view, from the command line?

In a static view, how can I view an old version of a file?
Given an empty file (called empty in this example) I can subvert diff to show me the old version:
% cleartool diff -ser empty File@@/main/28
This feels like a pretty ugly hack. Have I missed a more basic command? Is there a neater way to do this?
(I don't want to edit the config spec - that's pretty tedious, and I'm trying to look at a bunch of old versions.)
Clarification: I want to send the version of the file to stdout, so I can use it with the rest of Unix (grep, sed, and so on.) If you found this question because you're looking for a way to save a version of an element to a file, see Brian's answer.
I'm trying to look at a bunch of old versions
I am not sure if you are speaking about "a bunch of old versions" of one file, or "a bunch of old versions" from several files.
To visualize several old versions of one file, the simplest way is to display its version tree (ct lsvtree -graph File), then select a version, right-click on it and 'Send To' an editor which accepts multiple files (like Notepad++). In a few clicks you will have a view of those old versions.
Note: you must have CC6.0 or 7.0.1 IFix01 (7.0.0 and 7.0.1 fail to 'Send To' a file, with the following error message: "Access to unnamed file was denied").
But to visualize several old versions of different files, I would recommend a dynamic view and editing the config spec of that view (and not the snapshot view you are currently working with), in order to quickly select all those old files (hopefully through a simple select rule like 'element * aLabel')
[From the comments:]
what's the idiomatic way to "cat" an earlier revision of a file?
The idiomatic way is through a dynamic view (that you configure with the exact same config spec as your existing snapshot view).
You can then browse (as in 'change directory to') the various extended paths of a file.
If you want to cat all versions of a branch of a file, you go in:
cd /view/MyView/vobs/myVobs/myPath/myFile@@/main/[...]/maBranch
cat 1
cat 2
...
cat x
'1', '2', ... 'x' being the version 1, 2, ... x of your file within that branch.
For a snapshot view, the extended path is not accessible, so your "hack" is the way to go.
However, 2 remarks here:
to quickly display all previous revisions of a snapshot file in a given branch, you can type:
(one line version for copy-paste, Unix syntax:)
cleartool find addon.xml -ver 'brtype(aBranch) && !version(.../aBranch/LATEST) && ! version(.../aBranch/0)' -exec 'cleartool diff -ser empty "$CLEARCASE_XPN"'
(multi-line version for readability:)
cleartool find addon.xml -ver 'brtype(aBranch) &&
!version(.../aBranch/LATEST) &&
! version(.../aBranch/0)'
-exec 'cleartool diff -ser empty "$CLEARCASE_XPN"'
you can quickly have an output a little nicer with
(one line version for copy-paste, Unix syntax:)
cleartool find addon.xml -ver 'brtype(aBranch) && !version(.../aBranch/LATEST) && ! version(.../aBranch/0)' -exec 'cleartool diff -ser empty "$CLEARCASE_XPN"' | ccperl -nle '$a=$_; $b = $a; $b =~ s/^>+\s(?:file\s+\d+:\s+)?//g;print $b if $a =~/^>/'
(multi-line version for readability:)
cleartool find addon.xml -ver 'brtype(aBranch) &&
!version(.../aBranch/LATEST) &&
! version(.../aBranch/0)'
-exec 'cleartool diff -ser empty "$CLEARCASE_XPN"'
| ccperl -nle '$a=$_; $b = $a;
$b =~ s/^>+\s(?:file\s+\d+:\s+)?//g;
print $b if $a =~/^>/'
That way, the output is nicer.
The "cleartool get" command (man page) mentioned below by Brian doesn't write to stdout:
The get command copies only file elements into a view.
On a UNIX or Linux system, copy /dev/hello_world/foo.c@@/main/2 into the current directory.
cmd-context get -to foo.c.temp /dev/hello_world/foo.c@@/main/2
On a Windows system, copy \dev\hello_world\foo.c@@\main\2 into the C:\build directory.
cmd-context get -to C:\build\foo.c.temp \dev\hello_world\foo.c@@\main\2
So maybe then, by following the get with a cat (or type on Windows), you can do something with the output of said cat (type) command.
cmd-context get -to C:\build\foo.c.temp \dev\hello_world\foo.c@@\main\2 && type C:\build\foo.c.temp
I know this is an old thread...but I couldn't let this thrashing go by unresolved....
Static views have a "ct get" command that does exactly what you are looking for.
cleartool get -to ~/foo File@@/main/28
will save this version of the file in ~/foo.
[ Rewritten based on the first comment ]
All files in ClearCase, including versions, are available in the virtual directory structure. I don't have a lot of familiarity with static views, but I believe they still go through a virtual fs; they just get updated differently.
In that case, you can just do:
cat File@@/main/28
It can get ugly if you also have to find the right version of a directory that contained that file element. We have a Perl script at work that uses this approach to analyze historical changes made to files, and we quickly ran out of command-line space on Windows to actually run the commands!
If File is a ClearCase element, and cat File works, and the view is set correctly, then try:
cat File@@/main/28
(note: without the ct shell-- you shouldn't need this if you're already in the view.)
Try typing:
ct ls -l File
If it shows the file with an extended name similar to the above, then you should be able to cat the file using an extended name.
ct shell cat File@@version
