UNIX: replace a particular line in a file

I have a file that is about 5.5GB in size. I want to view a particular line of the file, let's say line number 100001, and replace that line with my own text. How can I do this with Unix commands? I cannot open the file in an editor, and it is on a remote machine.
Can anyone share some ideas on how to view that line and replace it with some other text?
Thanks :)

If you want to modify the line in-place and the replacement data is the same length as the text being replaced, you can use dd to (carefully!) overwrite part of the file.
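# getting the byte offset of the start of the line and its length (newline included)
read start length < <(perl -ne 'if ($. == 100001) { print 0+$s, " ", length, "\n"; exit } $s += length' bigfile)
# writing over the existing data; 'new line' plus the newline echo adds must be exactly $length bytes
echo 'new line' | dd of=bigfile bs=1 seek="$start" count="$length" conv=notrunc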
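Because a wrong offset or length silently corrupts the file, it can help to wrap the same idea in a function that refuses to write unless the lengths match. This is a minimal sketch, assuming bash, lines terminated by `\n`, and awk in place of perl for the offset calculation; `replace_line_in_place` is a hypothetical helper name, not an existing tool.

```shell
# Overwrite line N of FILE in place with TEXT, but only when TEXT has exactly
# the same number of bytes as the existing line (newline excluded). Any other
# length would shift the rest of the file, so the function refuses instead.
# replace_line_in_place is a hypothetical helper, not a standard command.
replace_line_in_place() {
  local file=$1 n=$2 text=$3 start len bytes
  # Byte offset where line N starts and its length without the newline.
  # LC_ALL=C makes awk's length() count bytes rather than characters.
  read -r start len < <(LC_ALL=C awk -v n="$n" '
    NR == n { print off + 0, length($0); exit }
    { off += length($0) + 1 }' "$file")
  if [ -z "$len" ]; then
    echo "line $n not found" >&2
    return 1
  fi
  bytes=$(( $(printf '%s' "$text" | wc -c) ))
  if [ "$bytes" -ne "$len" ]; then
    echo "replacement is $bytes bytes, line is $len bytes" >&2
    return 1
  fi
  # conv=notrunc keeps everything after the overwritten bytes.
  printf '%s' "$text" | dd of="$file" bs=1 seek="$start" conv=notrunc 2>/dev/null
}

# Example (bigfile is a placeholder name):
# replace_line_in_place bigfile 100001 'same-length text'
```

The newline is left untouched, so the replacement text is given without one.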
If the replacement data is a different length, and it's not at the very end of the file, you have no choice but to rewrite the file. This requires having enough disk space to keep both bigfile and a copy of it!
# The old file is renamed to bigfile.bak; a new bigfile is written with changes.
sed -i.bak -e '100001 c \
new line' bigfile
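To only view the line before changing it, sed can print it and quit right away, so a 5.5GB file is read only up to that line. A minimal sketch; `print_line` is a hypothetical helper name used for illustration.

```shell
# Print line N of FILE, then quit so sed does not read the rest of the file.
# print_line is a hypothetical helper, not a standard command.
print_line() {
  local file=$1 n=$2
  sed -n "${n}{p;q;}" "$file"
}

# Example (bigfile is a placeholder name):
# print_line bigfile 100001
```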

Related

Make a list of all files in two folders then iterate through the combined list randomly

I have two directories with photos that I want to manipulate to output a random order of the files each time a script is run. How would I create such a list?
d1=/home/Photos/*.jpg
d2=/mnt/JillsPC/home/Photos/*.jpg
# somehow make a combined list, files = d1 + d2
# somehow randomise the file order
# during execution of the for;do;done loop, no file should be repeated
for f in $files; do
echo $f # full path to each file
done
I wouldn't use variables if you don't have to. It's more natural if you chain a couple of commands together with pipes or process substitution. That way everything operates on streams of data without loading the entire list of names into memory all at once.
You can use shuf to randomly permute input lines, and find to list files one per line. Or, to be maximally safe, let's use \0 separators. Finally, a while loop with process substitution reads line by line into a variable.
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(find /home/Photos/ /mnt/JillsPC/home/Photos/ -name '*.jpg' -print0 | shuf -z)
That said, if you do want to use some variables then you should use arrays. Arrays handle file names with whitespace and other special characters correctly, whereas regular string variables muck them all up.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
Iterating in order would be easy:
for file in "${files[@]}"; do
echo "$file"
done
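The difference between a string variable and an array shows up as soon as a file name contains a space. A small demonstration, assuming bash and a temporary directory whose own path has no spaces; the file names are made up for the example.

```shell
# Compare a plain string variable with an array when a name contains a space.
# The directory and file names are invented for this demonstration.
dir=$(mktemp -d)
touch "$dir/summer trip.jpg" "$dir/beach.jpg"

as_string=$(echo "$dir"/*.jpg)   # one string: the space inside a name is ambiguous
as_array=("$dir"/*.jpg)          # one array element per file

count=0
for f in $as_string; do count=$((count + 1)); done
echo "string loop saw $count words"        # 3 here: "summer trip.jpg" was split in two
echo "array holds ${#as_array[@]} files"   # 2

rm -r "$dir"
```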
Shuffling is tricky though. shuf is still the best tool but it works best on a stream of data. We can use printf to print each file name with the trailing \0 we need to make shuf -z happy.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[@]}" "${d2[@]}")
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(printf '%s\0' "${files[@]}" | shuf -z)
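If shuf is not available (it is part of GNU coreutils), the array can also be shuffled in pure bash. This sketch swaps in a different technique, a Fisher-Yates shuffle driven by bash's `$RANDOM`; `shuffle_files` is a hypothetical function name, and it operates on a global array called `files`.

```shell
# Shuffle the global array "files" in place with a Fisher-Yates shuffle,
# using bash's $RANDOM instead of an external tool such as shuf.
# $RANDOM only goes up to 32767 and "% (i + 1)" adds a slight bias, which is
# fine for ordering photos but not for anything security-related.
shuffle_files() {
  local i j tmp
  for (( i = ${#files[@]} - 1; i > 0; i-- )); do
    j=$(( RANDOM % (i + 1) ))
    tmp=${files[i]}
    files[i]=${files[j]}
    files[j]=$tmp
  done
}

files=("a b.jpg" "c.jpg" "d.jpg")
shuffle_files
for file in "${files[@]}"; do
  echo "$file"   # every name appears exactly once, in random order
done
```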
Further reading:
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
How can I find and safely handle file names containing newlines, spaces or both?
I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?
I came up with this solution after some more reading:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
printf '%s\n' "${files[@]}" | sort -R
Edit: updated with John's improvements from comments.
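You can add any number of directories to the array declaration (though see the caveat about complex names in the comments).
Note that sort -R is not quite the same as shuf: GNU sort -R sorts by a random hash of each line, so identical lines end up next to each other. That does not matter here because file paths are unique, but shuf is the tool for a true random permutation.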
This was the original, which works, but is not as robust as the above:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
(IFS=$'\n'; echo "${files[*]}") | sort -R
With IFS=$'\n', echoing the array will display it line by line (IFS=$'somestring' is syntax for string literals with escape sequences. So unlike '\n', $'\n' is the correct way to set it to a line break). IFS is not needed when using the printf method above.
echo "${files[*]}" prints all array elements at once, joined by the first character of the IFS set in the surrounding subshell, so each name ends up on its own line.
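The quoting difference and the joining behaviour can be checked directly. A short sketch, assuming bash; the array contents are made up for the example.

```shell
# '\n' in single quotes is two characters (backslash, n);
# $'\n' is ANSI-C quoting and yields one newline character.
literal='\n'
ansi=$'\n'
echo "${#literal} ${#ansi}"   # prints: 2 1

# "${files[*]}" joins the elements with the first character of IFS.
# The parentheses run each line in a subshell, so IFS is unchanged afterwards.
files=("a b.jpg" "c.jpg")
(IFS=$'\n'; printf '%s\n' "${files[*]}")   # a newline between names
(IFS='\n'; printf '%s\n' "${files[*]}")    # a backslash between names: a b.jpg\c.jpg
```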

Add text on certain lines of a file, with the added text depending on the output of a command that takes a substring from the line

I'm trying to make a shell script that takes an input file (thousands of lines) and produces an output file that is the same except that certain lines get added text. When text is added to (the middle of) a line, the exact added text depends on a substring of that line. The correlation between the substring and the added text is complex and comes from an external program that I can call from the shell. I don't have the source for this converter program or any control over how the mapping is done.
To explain further...
I have an input file of this general format:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, output_1),
FIELD_INFO(field_name_2, output_2),
Yadda Yadda
The whole file needs to be copied, with added text, but the only important parts for me are the field names (e.g. field_name_1, field_name_2). I have a command line program called "converter" that can take a file of field names and output a list of corresponding actions. Converter cannot operate directly on the input file. The input to converter needs to be just field names and the output of converter has extra information I don't need:
converter_field_name_1 "action1" /* Use this action for field_name_1 */
converter_field_name_2 "action2" /* use this action for field_name_2 */
The desire is to create a second file that looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, action1, output_1),
FIELD_INFO(field_name_2, action2, output_2),
Yadda Yadda
Here is the script I'm working on, but I've hit a wall (or two):
#!/bin/bash
filename="input_file"
# Let's create an array of the field names to feed to the converter program
field_array=($(sed -e '/^\s*FIELD_INFO/ s/FIELD_INFO(\(.*\),.*),/\1/' -e 't' -e 'd' < ${filename}))
# Save the array to a file, to be able to use the converter's file option
printf "%s\n" "${field_array[#]}" > script_field_names.txt
# Use converter on the whole file and extract only the actions into another array
action_array=($(converter -f script_field_names.txt | cut -d'"' -f 2))
# I will make and use an associative array and try to use
# sed to do the substitution
declare -A mapper
for i in ${!field_array[*]}
do
mapper[${field_array[i]}]=${action_array[i]}
done
#Now go back through the file and add action names (source file unchanged)
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
I know now that I can't use the sed group capture "\1" as an index into the mapper array like this. It is not working as a key and the output looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, , output_1),
FIELD_INFO(field_name_2, , output_2),
Yadda Yadda
My actual script has debug statements scattered throughout and I know the field array, action array, and mapper array are all getting created correctly. But my idea of using the group capture substring from sed as the index into the mapper array is not working because I now know that sed expands the variables before running in the sub-shell, so the mapper[] array is not seeing the substring as an index.
What should I be doing instead? This script may only be used once, but it's too time consuming and error prone to do the addition of the action strings by hand. I want to come up with a way to make this work but I can't tell if I'm close or completely on the wrong path.
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
[...]
I now know that sed expands the variables before running in the sub-shell, so the mapper[] array is not seeing the substring as an index.
Good job identifying the problem. Also, the non-greedy quantifier .*? does not work with sed and ${mapper[\1], \2} should probably be ${mapper[\1]}, \2.
If you want to keep your current approach I see two options.
Do the replacement line by line in bash, either by creating a giant sed command string that lists the action for each line, or by executing sed inside a loop for each line while creating the command strings on the fly.
Instead of the array mapper, create a file that lists the actions to be inserted in the order they appear in the file. Then use GNU sed's R filename command, which inserts the next line from filename. You can use this to insert the correct action each time you come across a field. However, the line break is inserted too, so you have to fiddle with the hold space and so on to remove these line breaks afterwards.
Neither option is great, so I'd switch to awk to insert the actions:
sed -En 's/^\s*FIELD_INFO\(([^,]*).*/\1/p' "$filename" > fields
converter -f fields | cut -d\" -f2 > actions
awk '/^\s*FIELD_INFO\(/ {getline a < "actions"; sub(",", ", " a ",")} 1' "$filename"
With GNU grep you can simplify the first line to
grep -Po '^\s*FIELD_INFO\(\K[^,]*' "$filename" > fields
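If you would rather not rely on the actions coming back in exactly the same order as the fields, a variation is to look each field up by name. This sketch assumes a tab-separated map file built from the two files above with `paste fields actions > map`; `add_actions` is a hypothetical function name, and action names containing `&` or `\` would need escaping because awk's sub() treats them specially.

```shell
# Insert the action for each FIELD_INFO line by looking the field name up in a
# "field<TAB>action" map file (assumed to come from: paste fields actions > map).
# add_actions is a hypothetical helper name.
add_actions() {
  local map=$1 input=$2
  awk -F'\t' '
    NR == FNR { action[$1] = $2; next }            # first file: load field -> action
    /^[[:space:]]*FIELD_INFO\(/ {
      name = $0
      sub(/^[[:space:]]*FIELD_INFO\(/, "", name)   # drop the prefix
      sub(/,.*/, "", name)                         # keep text up to the first comma
      if (name in action) sub(/,/, ", " action[name] ",")
    }
    { print }
  ' "$map" "$input"
}

# Example, with the file names used above:
# paste fields actions > map
# add_actions map "$filename"
```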
Why not try this:
sed -n -e 's/^[ ]*FIELD_INFO(\(.*\),.*,/\1/p' -- input_file > script_field_names.txt
printf '/^[ ]*FIELD_INFO(%s,/ s/(\\(.[^,]*\\), \\(.[^)]*\\))/(\\1, %s, \\2)/\n' \
$(converter -f script_field_names.txt | cut -d'"' -f 2 |
paste -- script_field_names.txt -) |
sed -f /dev/stdin -- input_file
where
paste emits the map of fields (from file) and actions (from stdin)
printf emits a script read by sed from stdin
each script line becomes: /^[ ]*FIELD_INFO(fieldnameN,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, actionN, \2)/
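The trick that makes this work is that printf reuses its format string until all arguments are consumed, so every field/action pair turns into one sed command. A reduced sketch with a simplified substitution (replace the first ", " on the matching line), assuming a sed that can read its script from /dev/stdin; the sample input is made up.

```shell
# Toy input standing in for input_file.
input=$(mktemp)
printf 'Blah\nFIELD_INFO(field_name_1, output_1),\nFIELD_INFO(field_name_2, output_2),\n' > "$input"

# printf repeats its format for each pair of arguments, producing one sed
# command per field: on that field's line, replace the first ", " with
# ", action, ".
printf '/FIELD_INFO(%s,/ s/, /, %s, /\n' field_name_1 action1 field_name_2 action2 |
  sed -f /dev/stdin -- "$input"
```

This should print the Blah line unchanged, followed by FIELD_INFO(field_name_1, action1, output_1), and FIELD_INFO(field_name_2, action2, output_2),.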

printing part of file

Is there a magic Unix command for printing part of a file? I have a file with several million lines, and I would like to skip the first million or so lines and print the next million.
Thank you in advance.
To extract data, sed is your friend.
Assuming this is a one-off task that you can type at the command line:
sed -n '200000,300000p' file | enscript
"number comma (,) number" is one form of a range command in sed. This one starts at line 200,000 and prints (the p command) until it reaches line 300,000. For the case in the question, use '1000001,2000000p' instead.
If you want the output to go to your screen, remove the | enscript.
enscript is a utility that manages sending data to PostScript-compatible printers. My Linux distro doesn't have it, so it's not necessarily a standard utility. Hopefully you know which command to pipe to in order to get the output printed on paper (often lpr or lp).
If you want to "print" to another file, use
sed -n '200000,300000p' file > smallerFile
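On a file with millions of lines, sed keeps reading to the end even after the range is finished. Adding a q command for the last line stops it early. A minimal sketch; `print_lines` is a hypothetical helper name.

```shell
# Print lines FIRST..LAST of FILE and quit as soon as LAST has been printed,
# so sed does not read the remaining lines. print_lines is a hypothetical helper.
print_lines() {
  local file=$1 first=$2 last=$3
  sed -n "${first},${last}p;${last}q" "$file"
}

# Skip the first million lines and print the next million:
# print_lines file 1000001 2000000 > smallerFile
```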
IHTH
I would suggest awk as it is a little easier and more flexible than sed:
awk 'FNR>12 && FNR<23' file
where FNR is the record number (the line number within the current file). So the above prints lines 13 through 22.
And you can make it more specific like this:
awk 'FNR<100 || FNR >990' file
which prints lines if the record number is less than 100 or over 990. Or, lines over 100 and lines containing "fred"
awk 'FNR >100 || /fred/' file
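Like sed, awk reads the whole file unless told to stop, so for a plain range it is worth exiting once the last wanted line has passed. A minimal sketch; `print_lines_awk` is a hypothetical helper name.

```shell
# Print lines FIRST..LAST of FILE and stop reading after LAST.
# print_lines_awk is a hypothetical helper name.
print_lines_awk() {
  awk -v first="$2" -v last="$3" 'FNR > last { exit } FNR >= first' "$1"
}

# Example: print_lines_awk file 1000001 2000000
```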

Unique file names in a directory in unix

I have a capture file in a directory to which logs are being written:
word.cap
Now there is a script that, when the file's size reaches exactly 1.6GB, clears it and creates files in the format below in the same directory:
word.cap.COB2T_1389889231
word.cap.COB2T_1389958275
word.cap.COB2T_1390035286
word.cap.COB2T_1390132825
word.cap.COB2T_1390213719
Now I want to pick up all these files in a script one by one and perform some actions on them.
My script is:
today=`date +%d_%m_%y`
grep -E '^IPaddress|^Node' /var/rawcap/word.cap.COB2T* | awk '{print $3}' >> snmp$today.txt
sort -u snmp$today.txt > snmp_final_$today.txt
So what should I write to pick up all file names of the format above one by one? I will put this script in crontab, but I don't want to read the main word.cap file because it is still being written.
As per your comment:
Thanks, this is working, but I have a small issue. Some files are bzipped, i.e. word.cap.COB2T_1390213719.bz2, and I don't want these files in the list. What should be done?
You could add a condition inside the loop:
for file in word.cap.COB2T*; do
if [[ "$file" != *.bz2 ]]; then
# Do something here
echo "$file"
fi
done
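Putting the loop together with the grep/awk/sort steps from the question could look like this. A sketch under assumptions: bash, the rotated files all match word.cap.COB2T_*, and the directory is passed as an argument (the question uses /var/rawcap); `collect_snmp` is a hypothetical function name. The pattern never matches the live word.cap file itself.

```shell
# Collect the 3rd field of IPaddress/Node lines from the rotated capture files
# in DIR, skipping compressed *.bz2 files. The live word.cap file does not
# match the pattern. collect_snmp is a hypothetical helper name.
collect_snmp() {
  local dir=$1 file
  for file in "$dir"/word.cap.COB2T_*; do
    [ -f "$file" ] || continue          # the glob matched nothing
    [[ "$file" == *.bz2 ]] && continue  # skip compressed captures
    grep -E '^(IPaddress|Node)' -- "$file" | awk '{print $3}'
  done | sort -u
}

# Example for the cron job:
# collect_snmp /var/rawcap > "snmp_final_$(date +%d_%m_%y).txt"
```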

How can I remove lines from .txt files using a loop?

In my current directory I have a couple of .txt files. I want to write a script to search for a string in those .txt files, and delete lines which contains that string.
For example, I'd like to delete all lines which have the word "start" in all .txt files in my current directory.
I have written the following code, but I don't know how to continue!
#!/bin/bash
files=`find . -maxdepth 1 -name \*.txt`
How should I use "while" to go through each file?
Use Globs to Populate Loop Variables
When you use -maxdepth 1 on the current directory, you aren't recursing into subdirectories. If that's the case, there's no need at all to use find just to match files with an extension; you can use shell globs instead to populate your loop constructs. For example:
#!/bin/bash
# Run sed on each file to delete the line.
for file in *.txt; do
sed -i '/text to match/d' "$file"
done
This is simple, and avoids a number of filename-related issues that you may have when passing filename arguments between processes. Keep it simple!
Easy peasy:
sed -i '/string/d' *.txt
This will delete every line containing 'string' in each .txt file. (A substitution such as s/^.*string.*// would only empty those lines and leave blank lines behind.)
You use while together with read to get each file name in turn, after piping the results of find into it. Then you just pass the file name to sed to delete the lines you're interested in.
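The find | while read pattern described above can be sketched as follows, assuming bash and a find that supports -maxdepth and -print0 (GNU and BSD find both do). `delete_matching_lines` is a hypothetical function name; the pattern is inserted into a sed address, so it must not contain slashes. Writing to a temporary file and moving it back avoids GNU-only sed -i, but the replaced file gets default permissions.

```shell
# Delete every line matching PATTERN from each .txt file directly inside DIR.
# find -print0 with read -d '' keeps names with spaces or newlines intact.
# delete_matching_lines is a hypothetical helper name.
delete_matching_lines() {
  local dir=$1 pattern=$2
  find "$dir" -maxdepth 1 -type f -name '*.txt' -print0 |
  while IFS= read -r -d '' file; do
    sed "/$pattern/d" "$file" > "$file.tmp" && mv -- "$file.tmp" "$file"
  done
}

# Example: delete_matching_lines . start
```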
# Python: remove the first line of a file in place.
# file_listoflinks is the path of the file to edit.
with open(file_listoflinks, 'r+', encoding='utf-8') as f_link:
    lines = f_link.readlines()  # read and store all lines in a list
    f_link.seek(0)              # move the file pointer back to the beginning
    f_link.truncate()           # empty the file
    # write back every line except the first (lines[1:] is line 2 to the last line)
    f_link.writelines(lines[1:])
