How to edit a file with shell scripting - arrays

I have a file containing thousands of lines like this:
0x7f29139ec6b3: W 0x7fff06bbf0a8
0x7f29139f0010: W 0x7fff06bbf0a0
0x7f29139f0014: W 0x7fff06bbf098
0x7f29139f0016: W 0x7fff06bbf090
0x7f29139f0036: R 0x7f2913c0db80
I want to make a new file which contains only the second hex number on each line (the address after the R/W marker).
I have to put all these hex numbers into an array in a C program, so I am trying to make a file with only the hex numbers on the right-hand side; my C program can then use fscanf to read these numbers from the modified file.
I guess we can use some shell script to make a file containing those hex numbers? grep or something?

You can use sed and edit the file in place. To strip everything up to and including the "W"/"R" marker and its trailing space, use
sed -i 's/.*: . //' file
cat file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
0x7f2913c0db80

You can use grep -oP command:
grep -oP ' \K0x[a-fA-F0-9]*' file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
0x7f2913c0db80

You can run a command on the file that will create a new file in the format you want: somecommand <oldfile >newfile. That will leave the original file intact and create a new one for you to feed to your C program.
As to what somecommand should be, you have multiple options. The easiest is probably awk:
awk '{print $NF}'
But you can also do it with sed or grep or perl or cut ... see other answers for an embarrassment of choices.
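Putting those pieces together, the awk version of the full command would be:
awk '{print $NF}' <oldfile >newfile
$NF is awk's last field on each line, which here is the second hex number.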

Since it seems that you always want to select the third field, the simplest approach is to use cut:
cut -d ' ' -f 3 file
or awk:
awk '{print $3}' file

Related

Add text on certain lines of a file, with the added text depending on the output of a command that takes a substring from the line

I'm trying to make a shell script that takes an input file (thousands of lines) and produces an output file that is the same except that certain lines will have added text. When the text is added to the (middle of the) line, the exact added text depends on a substring of the line. The correlation between the substring and the added text is complex and comes from an external program that I can call in the shell. I don't have the source for this converter program nor any control over how the mapping is done.
To explain further...
I have an input file of this general format:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, output_1),
FIELD_INFO(field_name_2, output_2),
Yadda Yadda
The whole file needs to be copied, with added text, but the only important parts for me are the field names (e.g. field_name_1, field_name_2). I have a command line program called "converter" that can take a file of field names and output a list of corresponding actions. Converter cannot operate directly on the input file. The input to converter needs to be just field names and the output of converter has extra information I don't need:
converter_field_name_1 "action1" /* Use this action for field_name_1 */
converter_field_name_2 "action2" /* use this action for field_name_2 */
The desire is to create a second file that looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, action1, output_1),
FIELD_INFO(field_name_2, action2, output_2),
Yadda Yadda
Here is the script I'm working on, but I've hit a wall (or two):
#!/bin/bash
filename="input_file"
# Let's create an array of the field names to feed to the converter program
field_array=($(sed -e '/^\s*FIELD_INFO/ s/FIELD_INFO(\(.*\),.*),/\1/' -e 't' -e 'd' < ${filename}))
# Save the array to a file, to be able to use the converter's file option
printf "%s\n" "${field_array[@]}" > script_field_names.txt
# Use converter on the whole file and extract only the actions into another array
action_array=($(converter -f script_field_names.txt | cut -d'"' -f 2))
# I will make and use an associative array and try to use
# sed to do the substitution
declare -A mapper
for i in ${!field_array[*]}
do
mapper[${field_array[i]}]=${action_array[i]}
done
#Now go back through the file and add action names (source file unchanged)
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
I know now that I can't use the sed group capture "\1" as an index into the mapper array like this. It is not working as a key and the output looks like this:
Blah Blah Unimportant
Something Something
FIELD_INFO(field_name_1, , output_1),
FIELD_INFO(field_name_2, , output_2),
Yadda Yadda
My actual script has debug statements scattered throughout, and I know the field array, action array, and mapper array are all getting created correctly. But my idea of using the group-capture substring from sed as the index into the mapper array is not working, because I now know that the shell expands the variables before sed even runs, so the mapper[] array never sees the captured substring as an index.
What should I be doing instead? This script may only be used once, but it's too time consuming and error prone to do the addition of the action strings by hand. I want to come up with a way to make this work but I can't tell if I'm close or completely on the wrong path.
sed -e "s/FIELD_INFO(\(.*\),\(.*?),\)/FIELD_INFO(\1, ${mapper[\1], \2}/" < ${filename}
[...]
I now know that the shell expands the variables before sed even runs, so the mapper[] array never sees the captured substring as an index.
Good job identifying the problem. Also, the non-greedy quantifier .*? does not work in sed, and ${mapper[\1], \2} should probably be ${mapper[\1]}, \2.
If you want to keep your current approach I see two options.
Do the replacement line by line in bash, either by creating a giant sed command string that lists the action for each line, or by executing sed inside a loop for each line while creating the command strings on the fly.
Instead of the array mapper, create a file that lists the actions to be inserted, in the order they occur in the file. Then use GNU sed's R filename command, which inserts the next line from filename. You can use this to insert the correct action each time you come across a field. However, the linebreak is inserted too, so you have to fiddle with the hold space and so on to remove these linebreaks afterwards (a rough sketch follows).
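A sketch of that second option, assuming GNU sed and an actions file with one action per line (like the one created below); the second pass joins each appended action back onto its FIELD_INFO line:
sed '/^\s*FIELD_INFO(/R actions' "$filename" |
sed '/^\s*FIELD_INFO(/{N;s/(\([^,]*\),\(.*\)\n\(.*\)/(\1, \3,\2/}'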
Both options are not that great. Therefore I'd switch to awk to insert the actions:
sed -En 's/^\s*FIELD_INFO\(([^,]*).*/\1/p' "$filename" > fields
converter -f fields | cut -d\" -f2 > actions
awk '/^\s*FIELD_INFO\(/ {getline a < "actions"; sub(",", ", " a ",")} 1' "$filename"
With GNU grep you can simplify the first line to
grep -Po '^\s*FIELD_INFO\(\K[^,]*' "$filename" > fields
Why not try,
sed -n -e 's/^[ ]*FIELD_INFO(\(.*\),.*,/\1/p' -- input_file > script_field_names.txt
printf '/^[ ]*FIELD_INFO(%s,/ s/(\\(.[^,]*\\), \\(.[^)]*\\))/(\\1, %s, \\2)/\n' \
$(converter -f script_field_names.txt | cut -d'"' -f 2 |
paste -- script_field_names.txt -) |
sed -f /dev/stdin -- input_file
where
paste emits the map of fields (from file) and actions (from stdin)
printf emits a script read by sed from stdin
each script line becomes: /^[ ]*FIELD_INFO(fieldnameN,/ s/(\(.[^,]*\), \(.[^)]*\))/(\1, actionN, \2)/

Shell Script regex matches to array and process each array element

While I've handled this task easily in other languages, I'm at a loss for which commands to use when shell scripting (CentOS/bash).
I have some regex that produces many matches in a file I've read into a variable, and I would like to collect those matches into an array so I can loop over and process each entry.
For the regex I typically use https://regexr.com/ to form my capture groups, then throw that at JS/Python/Go to get an array and loop, but in shell scripting I'm not sure what I can use.
So far I've played with "sed" to find all matches and replace, but I don't know whether it can return an array of matches to loop over.
Take a regex, run it on a file, get an array back. I would love some help with shell scripting for this task.
EDIT:
Based on comments, put this together (not working via shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or on other lines with no assets at all, you can refine it like this:
#!/bin/sh
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do it with GNU grep with perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" "$i" "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
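For instance, a sketch of the -E variant of the same substitution, which captures only what is inside the parentheses:
sed -En 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
This prints '1a/1b/1c.ext' and so on: still quoted, but without the parentheses.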

How to split lines in a file, and have the output names be based on those lines

I am using CentOS. I have a file that contains information like:
100000,UniqueName1
100000,UniqueName2
100000,UniqueName4
100000,SoloName9
I want to split this out into files, one for each line, each named:
[secondvalue]_file.txt
For an example:
SoloName9_file.txt
Is it possible to split the file in this fashion using a command, or will I need to write a shell script? If the former, what's the command?
Thank you!
Here's one approach. Use the sed command to turn this file into a valid shell script that you can then execute.
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile >your.sh
chmod +x your.sh
./your.sh
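For the sample lines above, the generated your.sh ends up containing commands like:
echo 100000 >UniqueName1_file.txt
echo 100000 >UniqueName2_file.txt
so each line writes its first field into a file named after its second field.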
Note that trailing whitespace in the file would take some additional work.
Writing it into a shell script file gives you a chance to review it, but you can also execute it as a single line.
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile | sh

How can I use sed (or awk or maybe a perl one-liner) to get values from specific columns in file A and use it to find lines in file B?

OK, sedAwkPerl-fu-gurus. Here's one similar to these (Extract specific strings...) and (Using awk to...), except that I need to use the number extracted from columns 4-10 in each line of File A (a PO number from a sales order line item) and use it to locate all related lines from File B and print them to a new file.
File A (purchase order details) lines look like this:
xxx01234560000000000000000000 yyy zzzz000000
File B (vendor codes associated with POs) lines look like this:
00xxxxx01234567890123456789001234567890
Columns 4-10 in File A have a 7-digit PO number, which is found in columns 7-13 of file B. What I need to do is parse File A to get a PO number, and then create a new sub-file from File B containing only those lines in File B which have the POs found in File A. The sub-file created is essentially the sub-set of vendors from File B who have orders found in File A.
I have tried a couple of things, but I'm really spinning my wheels on trying to make a one-liner for this. I could work it out in a script by defining variables, etc., but I'm curious whether someone knows a slick one-liner to do a task like this. The two referenced methods put together ought to do it, but I'm not quite getting it.
Here's a one-liner:
egrep -f <(cut -c4-10 A | sed -e 's/^/^.{6}/') B
It looks like the POs in file B actually start at column 8, not 7, but I made my regex start at column 7 as you asked in the question.
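For example, the sample File A line above turns into this pattern:
^.{6}0123456
cut -c4-10 pulls out the PO number 0123456, and the sed prefixes it with ^.{6}, so egrep requires it to appear at columns 7-13 of each File B line.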
And in case there's the possibility of duplicates in A, you could increase efficiency by weeding those out before scanning file B:
egrep -f <(cut -c4-10 A | sort -u | sed -e 's/^/^.{6}/') B
sed 's_^...\([0-9]\{7\}\).*_/^.\{6\}\1/p_' FIRSTFILE > FILTERLIST
sed -n -f FILTERLIST SECONDFILE > FILTEREDFILE
The first line generates a sed script from FIRSTFILE (note that sed has no \d, so the digits are matched with [0-9]); the second line uses that script to filter SECONDFILE. The two steps can be combined into one line too...
If the files are not that big you can do something like
awk 'NR==FNR {po[substr($0,4,7)]; next}   # first file: remember the PO numbers
     substr($0,7,7) in po' FIRSTFILE SECONDFILE > FILTERED
You can do it like this (but it will find the PO numbers anywhere on a line):
fgrep -f <(cut -b 4-10 FIRSTFILE) SECONDFILE
Another way using only grep:
grep -f <(grep -Po '^.{3}\K.{7}' fileA) fileB
Explanation:
-P for perl regex
-o to select only the match
\K resets the start of the reported match, so it acts like a variable-length positive lookbehind
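For the sample File A line, the inner grep alone would produce the seven-digit PO:
grep -Po '^.{3}\K.{7}' fileA
0123456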

printing part of file

Is there a magic unix command for printing part of a file? I have a file that has several millions of lines and I would like to skip first million or so lines and print the next million lines of the file.
Thank you in advance.
To extract data, sed is your friend.
Assuming a 1-off task that you can enter to your cmd-line:
sed -n '200000,300000p' file | enscript
"number comma (,) number" is one form of a range address in sed. This one starts at line 200,000 and prints (the p command) every line through line 300,000.
If you want the output to go to your screen remove the | enscript
enscript is a utility that manages the process of sending data to PostScript-compatible printers. My Linux distro doesn't have it, so it's not necessarily a standard utility. Hopefully you know what command you need to redirect to, to get output printed to paper.
If you want to "print" to another file, use
sed -n '200000,300000p' file > smallerFile
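As a small efficiency tweak on a file this size, you can also tell sed to quit once the range is done instead of reading the remaining lines:
sed -n '200000,300000p;300000q' file > smallerFile
(The p runs before the q on line 300,000, so the last line of the range is still printed.)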
IHTH
I would suggest awk as it is a little easier and more flexible than sed:
awk 'FNR>12 && FNR<23' file
where FNR is the record number. So the above prints lines above 12 and below 23.
And you can make it more specific like this:
awk 'FNR<100 || FNR >990' file
which prints lines if the record number is less than 100 or over 990. Or, lines over 100 or lines containing "fred":
awk 'FNR >100 || /fred/' file
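For the numbers in the question itself (skip the first million lines, print the next million), a sketch along the same lines:
awk 'FNR>1000000 && FNR<=2000000' file
awk 'FNR>2000000{exit} FNR>1000000' file
The second version does the same but exits after line 2,000,000, which saves reading the rest of a multi-million-line file.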
