sed addressing for each of multiple input files - file

I would like to print from line 10 until the end of the file for each of several files in a folder. For a single file, I would do this with sed -n '10,$p', however when providing multiple input files to sed the addressing becomes in terms of the concatenated files. How can I print using the sed command and address each file's line numbers? This website says that the $ addressing character refers to each file's end if the -s option is used, but this does not work for me on my Macbook Pro.
Ideally I would like the whole procedure to be done with a single tool without writing a loop. I'm ok with the output being concatenated. I'm open to other tools than sed. tail might work for this like so tail -n +10 filenames but this is very very slow, so I imagine sed is better to use.

awk 'FNR>9{print $0}' file1 file2
This will do it

Related

Add text on certain lines of a file, with the added text depending on the output of a command that takes a substring from the line

I'm trying to make a shell script to take an input file (thousands of lines) and produce an output file that is the same except that on certain lines there will be added text. When the text is added to the (middle of) the line, the exact added text will depend on a substring on the line. The correlation between the substring and the added text is complex and comes from an external program that I can call in the shell. I don't have the source for this converter program nor any control over how the mapping is done.
To explain further...
I have an input file of this general format:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, output_1),
FIELD_INFO(field_name_2, output_2),
Yadda Yadda
The whole file needs to be copied, with added text, but the only important parts for me are the field names (e.g. field_name_1, field_name_2). I have a command line program called "converter" that can take a file of field names and output a list of corresponding actions. Converter cannot operate directly on the input file. The input to converter needs to be just field names and the output of converter has extra information I don't need:
converter_field_name_1 "action1" /* Use this action for field_name_1 */
converter_field_name_2 "action2" /* use this action for field_name_2 */
The desire is to create a second file that looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, action1, output_1),
FIELD_INFO(filed_name_2, action2, output_2),
Yadda Yadda
Here is the script I'm working on, but I've hit a wall (or two):
#!/bin/bash
filename="input_file"
# Let's create an array of the field names to feed to the converter program
field_array=($(sed -e '/^\s*FIELD_INFO/ s/FIELD_INFO(\(.*\),.*),/\1/' -e 't' -e 'd' < ${filename}))
# Save the array to a file, to be able to use the converter's file option
printf "%s\n" "${field_array[#]}" > script_field_names.txt
# Use converter on the whole file and extract only the actions into another array
action_array=($(converter -f script_field_names.txt | cut -d'"' -f 2))
# I will make and use an associative array and try to use
# sed to do the substitution
declare -A mapper
for i in ${!field_array[*]}
do
mapper[${field_array[i]}]=${action_array[i]}
done
#Now go back through the file and add action names (source file unchanged)
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
I know now that I can't use the sed group capture "\1" as an index into the mapper array like this. It is not working as a key and the output looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, , output_1),
FIELD_INFO(field_name_2, , output_2),
Yadda Yadda
My actual script has debug statements scattered throughout and I know the field array, action array, and mapper array are all getting created correctly. But my idea of using the group capture substring from sed as the index into the mapper array is not working because I now know that sed expands the variables before running in the sub-shell, so the mapper[] array is not seeing the substring as an index.
What should I be doing instead? This script may only be used once, but it's too time consuming and error prone to do the addition of the action strings by hand. I want to come up with a way to make this work but I can't tell if I'm close or completely on the wrong path.
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
[...]
I now know that sed expands the variables before running in the sub-shell, so the mapper[] array is not seeing the substring as an index.
Good job identifying the problem. Also, the non-greedy quantifier .*? does not work with sed and ${mapper[\1], \2} should probably be ${mapper[\1]}, \2.
If you want to keep your current approach I see two options.
Do the replacement line by line in bash, either by creating a giant sed command string that lists the action for each line, or by executing sed inside a loop for each line while creating the command strings on the fly.
Instead of the array mapper, create a file that lists the actions to be inserted in the order from the file. Then use GNU sed's R filename command. This command inserts the next line from filename. You can use this to insert the correct action each time you come across a filed. However, the linebreak is inserted too. So you have to fiddle with the hold space and so on to remove these linebreaks afterwards.
Both options are not that great. Therefore I'd switch to awk to insert the actions:
sed -En 's/^\s*FIELD_INFO\(([^,]*).*/\1/p' "$filename" > fields
converter -f fields | cut -d\" -f2 > actions
awk '/^\s*FIELD_INFO\(/ {getline a < "actions"; sub(",", ", " a ",")} 1' "$filename"
With GNU grep you can simplify the first line to
grep -Po '^\s*FIELD_INFO\(\K[^,]*' "$filename" > fields
Why not try,
sed -n -e 's/^[ ]*FIELD_INFO(\(.*\),.*,/\1/p' -- input_file > script_field_names.txt
printf '/^[ ]*FIELD_INFO(%s,/ s/(\\(.[^,]*\\), \\(.[^)]*\\))/(\\1, %s, \\2)/\n' \
$(converter -f script_field_names.txt | cut -d'"' -f 2 |
paste -- script_field_names.txt -) |
sed -f /dev/stdin -- input_file
where
paste emits the map of fields (from file) and actions (from stdin)
printf emits a script read by sed from stdin
each script line becomes: /^[ ]*FIELD_INFO(fieldnameN,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, actionN, \2)/

sed: how to replace sth. by a backslash followed by reference

Despite all the sed-backslash discussions on Stackoverflow I cannot find a working solution for my specific problem. I want to precede a certain string in a file by a backslash: something -> \something.
sed -i -- 's/\(something\)/\\\1/g' file
This always returns the string \1 instead of \something, because for some reason sed thinks it should escape the third backslash. The (from my point of view more logical) behaviour can be achieved by inserting a space between \\ and \1 in the sed command, but then the result is \ something (i.e. with an inserted space in the result) which is not what I want.
I am running this command in a batch file on Windows, using sed from cygwin (I hope this does not matter as I am aiming for a cross-platform solution).
EDIT: /usr/bin/sed version 4.2.2.
In Windows cmd with Cygwin, use this sed command:
sed -e 's/\(something\)/\\\\\1/g' file
You can start your script from a batch file
myBatch.bat
#echo off
c:\cygwin64\bin\bash ./mySed
mySed
#!/bin/bash
echo asdfsomethingasdf | sed 's/\(something\)/\\\1/g'
It can be necessary to use /usr/bin/sed when your path isn't completely set

How to edit a file with shell scripting

I have a file containing thousands of lines like this:
0x7f29139ec6b3: W 0x7fff06bbf0a8
0x7f29139f0010: W 0x7fff06bbf0a0
0x7f29139f0014: W 0x7fff06bbf098
0x7f29139f0016: W 0x7fff06bbf090
0x7f29139f0036: R 0x7f2913c0db80
I want to make a new file which contains only the second hex number on each line (the part marked in bold above)
I have to put all these hex numbers in an array in a C program. So I am trying to make a file with only the hex numbers on the right hand side, so that my C program can use the fscanf function to read these numbers from the modified file.
I guess we can use some shell script to make a file containing those hex numbers? grep or something?
You can use sed and edit inplace. For matching "R" or any other char use
sed -i "s/.*:..//g" file
cat file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
You can use grep -oP command:
grep -oP ' \K0x[a-fA-F0-9]*' file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
0x7f2913c0db80
You can run a command on the file that will create a new file in the format you want: somecommand <oldfile >newfile. That will leave the original file intact and create a new one for you to feed to your C program.
As to what somecommand should be, you have multiple options. The easiest is probably awk:
awk '{print $NF}'
But you can also do it with sed or grep or perl or cut ... see other answers for an embarrassment of choices.
Since it seems that you always want to select the third field, the simplest approach is to use cut:
cut -d ' ' -f 3 file
or awk:
awk '{print $3}' file

printing part of file

Is there a magic unix command for printing part of a file? I have a file that has several millions of lines and I would like to skip first million or so lines and print the next million lines of the file.
Thank you in advance.
To extract data, sed is your friend.
Assuming a 1-off task that you can enter to your cmd-line:
sed -n '200000,300000p' file | enscript
"number comma (,) number" is one form of a range cmd in sed. This one starts at line 2,000,000 and *p*rints until you get to 3,000,000.
If you want the output to go to your screen remove the | enscript
enscript is a utility that manages the process of sending data to Postscript compatible printers. My Linux distro doesn't have that, so its not necessarily a std utility. Hopefully you know what command you need to redirect to to get output printed to paper.
If you want to "print" to another file, use
sed -n '200000,300000p' file > smallerFile
IHTH
I would suggest awk as it is a little easier and more flexible than sed:
awk 'FNR>12 && FNR<23' file
where FNR is the record number. So the above prints lines above 12 and below 23.
And you can make it more specific like this:
awk 'FNR<100 || FNR >990' file
which prints lines if the record number is less than 100 or over 990. Or, lines over 100 and lines containing "fred"
awk 'FNR >100 || /fred/' file

Moving things in terminal based on their name

Edit: I think this has been answered successfully, but I can't check 'til later. I've reformatted it as suggested though.
The question: I have a series of files, each with a name of the form XXXXNAME, where XXXX is some number. I want to move them all to separate folders called XXXX and have them called NAME. I can do this manually, but I was hoping that by naming them XXXXNAME there'd be some way I could tell Terminal (I think that's the right name, but not really sure) to move them there. Something like
mv *NAME */NAME
but where it takes whatever * was in the first case and regurgitates it to the path.
This is on some form of Linux, with a bash shell.
In the real life case, the files are 0000GNUmakefile, with sequential numbering. I'm having to make lots of similar-but-slightly-altered versions of a program to compile and run on a cluster as part of my research. It would probably have been quicker to write a program to edit all the files and put in the right place in the first place, but I didn't.
This is probably extremely simple, and I should be able to find an answer myself, if I knew the right words. Thing is, I have no formal training in programming, so I don't know what to call things to search for them. So hopefully this will result in me getting an answer, and maybe knowing how to find out the answer for similar things myself next time. With the basic programming I've picked up, I'm sure I could write a program to do this for me, but I'm hoping there's a simple way to do it just using functionality already in Terminal. I probably shouldn't be allowed to play with these things.
Thanks for any help! I can actually program in C and Python a fair amount, but that's through trial and error largely, and I still don't know what I can do and can't do in Terminal.
SO many ways to achieve this.
I find that the old standbys sed and awk are often the most powerful.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p'
If you're satisfied that the commands look right, pipe the command line through a shell:
ls | sed -rne 's:^([0-9]{4})(NAME)$:mv -iv & \1/\2:p' | sh
I put NAME in brackets and used \2 so that if it varies more than your example indicates, you can come up with a regular expression to handle your filenames better.
To do the same thing in gawk (GNU awk, the variant found in most GNU/Linux distros):
ls | gawk '/^[0-9]{4}NAME$/ {printf("mv -iv %s %s/%s\n", $1, substr($0,0,4), substr($0,5))}'
As with the first sample, this produces commands which, if they make sense to you, can be piped through a shell by appending | sh to the end of the line.
Note that with all these mv commands, I've added the -i and -v options. This is for your protection. Read the man page for mv (by typing man mv in your Linux terminal) to see if you should be comfortable leaving them out.
Also, I'm assuming with these lines that all your directories already exist. You didn't mention if they do. If they don't, here's a one-liner to create the directories.
ls | sed -rne 's:^([0-9]{4})(NAME)$:mkdir -p \1:p' | sort -u
As with the others, append | sh to run the commands.
I should mention that it is generally recommended to use constructs like for (in Tim's answer) or find instead of parsing the output of ls. That said, when your filename format is as simple as /[0-9]{4}word/, I find the quick sed one-liner to be the way to go.
Lastly, if by NAME you actually mean "any string of characters" rather than the literal string "NAME", then in all my examples above, replace NAME with .*.
The following script will do this for you. Copy the script into a file on the remote machine (we'll call it sortfiles.sh).
#!/bin/bash
# Get all files in current directory having names XXXXsomename, where X is an integer
files=$(find . -name '[0-9][0-9][0-9][0-9]*')
# Build a list of the XXXX patterns found in the list of files
dirs=
for name in ${files}; do
dirs="${dirs} $(echo ${name} | cut -c 3-6)"
done
# Remove redundant entries from the list of XXXX patterns
dirs=$(echo ${dirs} | uniq)
# Create any XXXX directories that are not already present
for name in ${dirs}; do
if [[ ! -d ${name} ]]; then
mkdir ${name}
fi
done
# Move each of the XXXXsomename files to the appropriate directory
for name in ${files}; do
mv ${name} $(echo ${name} | cut -c 3-6)
done
# Return from script with normal status
exit 0
From the command line, do chmod +x sortfiles.sh
Execute the script with ./sortfiles.sh
Just open the Terminal application, cd into the directory that contains the files you want moved/renamed, and copy and paste these commands into the command line.
for file in [0-9][0-9][0-9][0-9]*; do
dirName="${file%%*([^0-9])}"
mkdir -p "$dirName"
mv "$file" "$dirName/${file##*([0-9])}"
done
This assumes all the files that you want to rename and move are in the same directory. The file globbing also assumes that there are at least four digits at the start of the filename. If there are more than four numbers, it will still be caught, but not if there are less than four. If there are less than four, take off the appropriate number of [0-9]s from the first line.
It does not handle the case where "NAME" (i.e. the name of the new file you want) starts with a number.
See this site for more information about string manipulation in bash.

Resources