I have a directory full of small text files.
I want to create a new text file containing the contents of all the small files, converted to lower case first. Once I have appended the small text files to get the larger one, I just want to sort it and save only the unique elements.
cat directoryname/* | tr '[:upper:]' '[:lower:]' > filename.txt
sort -u filename.txt
or just:
cat directoryname/* | tr '[:upper:]' '[:lower:]' | sort -u > unique_elements.txt
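As a rough sketch, here is the one-liner above run on throwaway data. The scratch directory and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
work=$(mktemp -d)
mkdir "$work/small"
printf 'Apple\nbanana\n' > "$work/small/a.txt"
printf 'APPLE\nCherry\n' > "$work/small/b.txt"

# Concatenate every file, lower-case the text, then sort and drop duplicate lines.
cat "$work"/small/* | tr '[:upper:]' '[:lower:]' | sort -u > "$work/unique_elements.txt"

cat "$work/unique_elements.txt"
# Remove the scratch directory when done: rm -rf "$work"
```

Quoting the character classes keeps the shell from treating [:upper:] as a glob that could match a file in the current directory.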
Edit: I missed the part about the lowercase; see Kimvais' answer for the case transformation.
First, to append the contents of all files in /path/to/dir to one file:
find /path/to/dir -maxdepth 1 -type f -exec cat {} \; >> /var/tmp/large_file
Or:
cat /path/to/dir/*.log >> /var/tmp/large_file
Then to sort and only keep unique items:
sort -u /var/tmp/large_file -o /var/tmp/large_file
Or, with redirection:
sort -u /var/tmp/large_file > /var/tmp/sorted_file
man sort
man find
I have a .csv file which looks something like this:
unnamed_0711-42_p1.mov,day1_0711-42_p1.mov
unnamed_0711-51_p2.mov,day1_0711-51_p2.mov
unnamed_0716-42_p1_2.mov,day1_0716-42_p1_2.mov
unnamed_0716-51_p2_2.mov,day1_0716-51_p2_2.mov
I have written this code to rename files from the name in field 1 (e.g. unnamed_0711-42_p1.mov), to the name in field 2 (e.g. day1_0711-42_p1.mov).
csv=/location/rename.csv
cat $csv | while IFS=, read -r -a arr; do mv "${arr[@]}"; done
However, this script only works when it and all the files that need to be renamed are in the same directory. This was okay previously, but now I need to find files in various subdirectories (without adding the full path to my .csv file).
How can I adapt my script so that it searches for the files in subdirectories and then renames them as before?
A simple way to make this work, though it leads to an inefficient script, is this:
for D in `find . -type d`
do
# run the rename loop inside each directory; $csv must be an absolute path
(cd "$D" && while IFS=, read -r -a arr; do mv "${arr[@]}"; done < "$csv")
done
This runs your command once for every directory under the current one, which means it walks through the entire list of filenames for each subdirectory. An alternative is to search for each file as you process its name:
csv=files.csv
while IFS=, read -ra arr; do
while IFS= read -r -d '' old_file; do
old_dir=$(dirname "$old_file")
mv "$old_file" "$old_dir/${arr[1]}"
done < <(find . -name "${arr[0]}" -print0)
done<"$csv"
This uses find to locate each old filename, then uses dirname to get the directory of the old file (which we need so that mv does not place the renamed file into a different directory).
This will rename every instance of each file (i.e., if unnamed_0711-42_p1.mov appears in multiple subdirectories, each instance will be renamed to day1_0711-42_p1.mov). If you know each file name will only appear once, you can speed things up a bit by adding -quit after -print0 in the find command, so that find stops at the first match.
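For comparison, here is a minimal sketch of the same idea that lets find run mv itself through -exec, so no process substitution is needed and plain sh is enough. The scratch tree, file names and CSV below are invented for the example.

```shell
#!/bin/sh
# Hypothetical scratch tree with the same old name in two subdirectories.
work=$(mktemp -d)
mkdir -p "$work/tree/day a" "$work/tree/b"
touch "$work/tree/day a/unnamed_1.mov" "$work/tree/b/unnamed_1.mov" "$work/tree/b/other.mov"
printf 'unnamed_1.mov,day1_1.mov\n' > "$work/rename.csv"

# For each "old,new" pair, rename every match inside its own directory.
# $1 is the path found by find, $2 is the new base name.
while IFS=, read -r old new; do
    find "$work/tree" -type f -name "$old" -exec sh -c \
        'mv "$1" "$(dirname "$1")/$2"' sh {} "$new" \;
done < "$work/rename.csv"

find "$work/tree" -type f | sort
# Remove the scratch directory when done: rm -rf "$work"
```

Note that -name treats its argument as a glob pattern, so a name containing characters such as * or [ would need escaping.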
The script below
while IFS=, read -ra arr # -r to prevent mangling backslashes
do
find . -type f -name "${arr[0]}" -printf "mv '%p' '%h/${arr[1]}'\n" | bash
done<csvfile
should do it.
See the find manpage to understand what printf specifiers like %p and %h do.
OK, sedAwkPerl-fu-gurus. Here's one similar to these (Extract specific strings...) and (Using awk to...), except that I need to use the number extracted from columns 4-10 in each line of File A (a PO number from a sales order line item) and use it to locate all related lines from File B and print them to a new file.
File A (purchase order details) lines look like this:
xxx01234560000000000000000000 yyy zzzz000000
File B (vendor codes associated with POs) lines look like this:
00xxxxx01234567890123456789001234567890
Columns 4-10 in File A have a 7-digit PO number, which is found in columns 7-13 of file B. What I need to do is parse File A to get a PO number, and then create a new sub-file from File B containing only those lines in File B which have the POs found in File A. The sub-file created is essentially the sub-set of vendors from File B who have orders found in File A.
I have tried a couple of things, but I'm really spinning my wheels on trying to make a one-liner for this. I could work it out in a script by defining variables, etc., but I'm curious whether someone knows a slick one-liner to do a task like this. The two referenced methods put together ought to do it, but I'm not quite getting it.
Here's a one-liner:
egrep -f <(cut -c4-10 A | sed -e 's/^/^.{6}/') B
It looks like the POs in file B actually start at column 8, not 7, but I made my regex start at column 7 as you asked in the question.
And in case there's the possibility of duplicates in A, you could increase efficiency by weeding those out before scanning file B:
egrep -f <(cut -c4-10 A | sort -u | sed -e 's/^/^.{6}/') B
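A small sketch of this approach on invented sample lines may help. It assumes the column positions given in the question, writes the generated patterns to a temporary file instead of using process substitution, and uses grep -E, which is equivalent to egrep.

```shell
#!/bin/sh
work=$(mktemp -d)
# File A: PO number in columns 4-10 (made-up sample lines).
printf '%s\n' 'xxx01234560000 yyy' 'xxx76543210000 zzz' 'xxx01234560001 yyy' > "$work/A"
# File B: PO number in columns 7-13; the third line has the digits in the wrong place.
printf '%s\n' '00abcd0123456V001' '00abcd9999999V002' \
              '01234560000000V003' '00efgh7654321V004' > "$work/B"

# Build one anchored regex per distinct PO, e.g. ^.{6}0123456
cut -c4-10 "$work/A" | sort -u | sed 's/^/^.{6}/' > "$work/patterns"

# Keep only the lines of B whose columns 7-13 hold one of those POs.
grep -E -f "$work/patterns" "$work/B" > "$work/subset"
cat "$work/subset"
# Remove the scratch directory when done: rm -rf "$work"
```

Anchoring each pattern with ^.{6} is what stops a PO number from matching when it happens to appear elsewhere on a line.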
sed 's_^...\([0-9]\{7\}\).*_/^.\\{6\\}\1/p_' FIRSTFILE > FILTERLIST
sed -n -f FILTERLIST SECONDFILE > FILTEREDFILE
The first line generates a sed script from the first file, then the second line uses that script to filter the second file. This can be combined into one line too...
If the files are not that big you can do something like
awk 'NR == FNR { array[substr($0, 4, 7)]; next }  # first file: read the FIRSTFILE PO numbers into an array
substr($0, 7, 7) in array { print $0 }' FIRSTFILE SECONDFILE > FILTERED
You can also do it like this (but it will find the PO numbers anywhere on a line):
fgrep -f <(cut -b 4-10 FIRSTFILE) SECONDFILE
Another way using only grep:
grep -f <(grep -Po '^.{3}\K.{7}' fileA) fileB
Explanation:
-P for perl regex
-o to select only the match
\K drops everything matched so far from the reported match (similar in effect to a Perl positive lookbehind)
How can I append the following code to the end of numerous php files in a directory and its sub directory:
</div>
<div id="preloader" style="display:none;position: absolute;top: 90px;margin-left: 265px;">
<img src="ajax-loader.gif"/>
</div>
I have tried with:
echo "my text" >> *.php
But the terminal displays the error:
bash : *.php: ambiguous redirect
I usually use tee because I think it looks a little cleaner and it generally fits on one line.
echo "my text" | tee -a *.php
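A minimal sketch of the tee approach on two made-up files in a scratch directory. Since tee also copies its input to standard output, that copy is discarded here.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '<?php // one\n' > "$work/one.php"
printf '<?php // two\n' > "$work/two.php"

# tee -a appends its input to every file named on the command line;
# its copy on standard output is thrown away.
echo "my text" | tee -a "$work"/*.php > /dev/null

tail -n 1 "$work/one.php" "$work/two.php"
# Remove the scratch directory when done: rm -rf "$work"
```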
You don't specify the shell, but you could try the foreach command. Under tcsh (and I'm sure a very similar version is available for bash) you can type something like this interactively:
foreach i (*.php)
foreach> echo "my text" >> $i
foreach> end
$i will take on the name of each file each time through the loop.
As always, when doing operations on a large number of files, it's probably a good idea to test them in a small directory with sample files to make sure it works as expected.
Oops, the error message shows bash (I'll tag your question with it). The equivalent loop would be
for i in *.php
do
echo "my text" >> "$i"
done
If you want to cover the directories one level below the one where you are, you can specify
*/*.php
rather than *.php
BashFAQ/056 does a decent job of explaining why what you tried doesn't work. Have a look.
Since you're using bash (according to your error), the for command is your friend.
for filename in *.php; do
echo "text" >> "$filename"
done
If you'd like to pull "text" from a file, you could instead do this:
for filename in *.php; do
cat /path/to/sourcefile >> "$filename"
done
Now ... you might have files in subdirectories. If so, you could use the find command to find and process them:
find . -name "*.php" -type f -exec sh -c 'cat /path/to/sourcefile >> "$1"' sh {} \;
The find command selects files using conditions like -name and -type, then the -exec option runs basically the same thing I showed you in the previous "for" loop. Each file name is passed to sh as $1 rather than pasted into the command text, which keeps names containing spaces or quotes intact. The final \; tells find that this is the end of the arguments to -exec.
You can man find for lots more details about this.
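As a sketch of the find approach on a scratch tree (the directory layout and footer.html are invented, with footer.html standing in for /path/to/sourcefile), this variant ends -exec with + so that find hands many files to a single sh instead of starting one per file:

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/site/sub dir"
printf '<?php echo 1;\n' > "$work/site/index.php"
printf '<?php echo 2;\n' > "$work/site/sub dir/page.php"
printf 'keep\n' > "$work/site/notes.txt"
printf '</div>\n' > "$work/footer.html"   # stands in for /path/to/sourcefile

# The source file arrives as $1 and the found files follow it; the inner
# loop appends the source to each of them.
find "$work/site" -type f -name '*.php' -exec sh -c '
    src=$1; shift
    for f in "$@"; do cat "$src" >> "$f"; done
' sh "$work/footer.html" {} +

tail -n 1 "$work/site/sub dir/page.php"
# Remove the scratch directory when done: rm -rf "$work"
```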
The find command is portable and is generally recommended for this kind of activity especially if you want your solution to be portable to other systems. But since you're currently using bash, you may also be able to handle subdirectories using bash's globstar option:
shopt -s globstar
for filename in **/*.php; do
cat /path/to/sourcefile >> "$filename"
done
You can man bash and search for "globstar" for more details about this. This option requires bash version 4 or higher.
NOTE: You may have other problems with what you're doing. PHP scripts don't need to end with a ?>, so you might be adding HTML that the script will try to interpret as PHP code.
You can use sed combined with find. Assume your project tree is
/MyProject/
/MyProject/Page1/file.php
/MyProject/Page2/file.php
etc.
Save the code you want to append in /MyProject/. Call it append.txt
From /MyProject/ run:
find . -name "*.php" -print | xargs sed -i '$r append.txt'
Explanation:
find looks for all .php files, including those in subdirectories
xargs will run sed on all the .php files that have just been found
sed does the appending. '$r append.txt' means: at the last line of the file ($), read in (r) whatever is in append.txt and output it after that line. Don't forget -i; otherwise sed will just print the appended result without saving it.
Source: http://www.grymoire.com/unix/Sed.html#uh-37
You can do the following (it works even if there are spaces in your file paths):
#!/bin/bash
# Create a temporary file named /tmp/end_of_my_php.txt
cat << EOF > /tmp/end_of_my_php.txt
</div>
<div id="preloader" style="display:none;position: absolute;top: 90px;margin-left: 265px;">
<img src="ajax-loader.gif"/>
</div>
EOF
find . -type f -name "*.php" | while IFS= read -r the_file
do
echo "Processing $the_file"
#cp "$the_file" "${the_file}.bak" # Uncomment if you want to save a backup of your file
cat /tmp/end_of_my_php.txt >> "$the_file"
done
echo
echo done
PS: You must run the script from the directory you want to process
Inspired by @Dantastic's answer:
echo "my text" | tee -a file1.txt | tee -a file2.txt
I need to get all the file extension types in a folder. For instance, if the directory's ls gives the following:
a.t
b.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
I should get this by running the script
.t
.t.pg
.bin
.old
.txt
I have a bash shell.
Thanks a lot!
See the BashFAQ entry on ParsingLS for a description of why many of these answers are evil.
The following approach avoids this pitfall (and, by the way, completely ignores files with no extension):
shopt -s nullglob
for f in *.*; do
printf '%s\n' ".${f#*.}"
done | sort -u
Among the advantages:
Correctness: ls behaves inconsistently and can result in inappropriate results. See the link at the top.
Efficiency: Minimizes the number of subprocesses invoked (only one, sort -u, and that could be removed as well if we wanted to use Bash 4's associative arrays to store results)
Things that still could be improved:
Correctness: this will correctly discard newlines in filenames before the first . (which some other answers won't) -- but filenames with newlines after the first . will be treated as separate entries by sort. This could be fixed by using nulls as the delimiter, or by the aforementioned bash 4 associative-array storage approach.
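For reference, here is a sketch of the same loop in plain sh on the question's sample names. It replaces nullglob with an explicit existence test, and the scratch directory and the extra README file (no extension) are assumptions added for the demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)
touch "$work/a.t" "$work/b.t.pg" "$work/c.bin" "$work/d.bin" \
      "$work/e.old" "$work/f.txt" "$work/g.txt" "$work/README"

exts=$(
    cd "$work" || exit 1
    for f in *.*; do
        [ -e "$f" ] || continue          # an unmatched glob stays literal in sh
        printf '%s\n' ".${f#*.}"         # strip everything up to the first dot
    done | LC_ALL=C sort -u
)
printf '%s\n' "$exts"
# Remove the scratch directory when done: rm -rf "$work"
```

The result is sorted, so the order differs from the listing in the question, and README is skipped because it never matches *.*.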
Try this:
ls -1 | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u
ls lists files in your folder, one file per line
sed magic extracts extensions
sort -u sorts extensions and removes duplicates
sed magic reads as:
s/ / /: substitutes whatever is between first and second / by whatever is between second and third /
^: match beginning of line
[^.]: match any character that is not a dot
*: match it as many times as possible
\( and \): remember whatever is matched between these two parentheses
\.: match a dot
.: match any character
*: match it as many times as possible
$: match end of line
\1: this is what has been matched between parentheses
People are really over-complicating this - particularly the regex:
ls | grep -o "\..*" | uniq
ls - get all the files
grep -o "\..*" - -o only show the match; "\..*" match at the first "." & everything after it
uniq - drop repeated lines that are next to each other, keeping the original order (duplicates that are not adjacent are kept)
you can also sort if you like, but sorting doesn't match the example
This is what happens when you run it:
> ls -1
a.t
a.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
> ls | grep -o "\..*" | uniq
.t
.t.pg
.bin
.old
.txt
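If the same extension can show up on lines that are not next to each other, one order-preserving alternative to uniq is awk's "seen" idiom. This is a sketch on a made-up listing rather than real ls output:

```shell
#!/bin/sh
# Made-up listing in which the duplicate extensions are not adjacent.
result=$(printf '%s\n' a.txt b.bin c.txt d.bin |
    grep -o '\..*' |
    awk '!seen[$0]++')      # print a line only the first time it appears
printf '%s\n' "$result"
```

Each extension is printed in first-seen order, which matches the unsorted output style of the example.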
I have an external file that contains a list of patterns (pattern per line).
pattern1
foo bar
pattern_n
bar
bar foo
I would like to grep all files including the ones within sub-folders using those patterns, if the pattern matches, copy the file to some /tmp/mybackup/ and then delete it. What would be a good way of doing this?
If I understand your problem correctly, you need the following switches to grep:
-R to scan recursively
-l to print only matching filenames
-f to read the patterns from a file
-I to ignore binary files
so:
grep -RlIf patterns-file *
then feed this result to some other utility to perform the backup, e.g. xargs:
grep -RlIf patterns-file * | xargs -I {} mv {} /tmp/backup
or with a loop:
for afile in `grep -RlIf patterns-file *`; do
mv "$afile" /tmp/backup
done
Try
for x in `fgrep -f patternfile.txt -l -r .`; do cp "$x" /tmp/mybackup && rm "$x"; done
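Putting the pieces together, here is a hedged sketch that reads the matching file names line by line, so paths with spaces survive, and deletes a file only after its copy succeeded. The scratch tree, the pattern file and the backup directory are all invented for the example, and the pattern file is kept outside the searched tree so it does not match itself.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/sub dir" "$work/backup"
printf 'pattern1\nfoo bar\n' > "$work/patterns"
printf 'has foo bar here\n' > "$work/src/a.txt"
printf 'nothing\n' > "$work/src/sub dir/b.txt"
printf 'pattern1\n' > "$work/src/sub dir/c.txt"

# -R recurse, -l print matching file names, -I skip binary files, -f read patterns.
grep -RlIf "$work/patterns" "$work/src" | while IFS= read -r f; do
    cp "$f" "$work/backup/" && rm "$f"    # delete only if the copy worked
done

ls "$work/backup"
# Remove the scratch directory when done: rm -rf "$work"
```

All copies land flat in one backup directory, so two matching files with the same base name would overwrite each other. File names containing newlines are not handled by this line-based loop.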