Bash shell scripting: How to replace characters at specific byte offsets - arrays

I'm looking to replace characters at specific byte offsets.
Here's what is provided:
An input file that is simple ASCII text.
An array within a Bash shell script; each element of the array is a numerical byte-offset value.
The goal:
Take the input file, and at each of the byte-offsets, replace the character there with an asterisk.
So essentially the idea I have in mind is to go through the file byte by byte, and if the current byte offset matches an element of the offsets array, replace that byte with an asterisk.
This post seems to indicate that the dd command would be a good candidate for this action, but I can't understand how to perform the replacement multiple times on the input file.
Input file looks like this:
00000
00000
00000
The array of offsets looks like this:
offsetsArray=("2" "8" "9" "15")
The output file's desired format looks like this:
0*000
0**00
00*00
Any help you could provide is most appreciated. Thank you!

Please check my comment about the newline offset. Assuming this is correct (note I have changed your offset array), then I think this should work for you:
#!/bin/bash
read -r -d ''
offsetsArray=("2" "8" "9" "15")
txt="${REPLY}"
for i in "${offsetsArray[@]}"; do
txt="${txt:0:$i-1}*${txt:$i}"
done
printf "%s" "$txt"
Explanation:
read -d '' reads the whole input (redirected file) in one go into the $REPLY variable. If you have large files, this can run you out of memory.
We then loop through the offsets array, one element at a time. We use each index i to grab i-1 characters from the beginning of the string, then insert a * character, then add the remaining bytes from offset i. This is done with bash parameter expansion. Note that while your offsets are one-based, bash strings use zero-based indexing.
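For example, with the first offset of 2 applied to the first line of the sample input, the expansion works like this:
$ txt="00000"
$ i=2
$ echo "${txt:0:$i-1}*${txt:$i}"
0*000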
In use:
$ ./replacechars.sh < input.txt
0*000
0**00
00*00
$
Caveat:
This is not really a very efficient solution, as it causes the string containing the whole file to be copied for every offset. If you have large files and/or a large number of offsets, then this will run slowly. If you need something faster, then another language that allows in-place modification of individual characters in a string would be a much better choice.

The usage of dd can be a bit confusing at first, but it's not that hard:
outfile="test.txt"
# create some test data
echo -n 0123456789abcde > "$outfile"
offsetsArray=("2" "7" "8" "13")
for offset in "${offsetsArray[@]}"; do
dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*'
done
cat "$outfile"
Important for this example is the use of conv=notrunc; otherwise dd truncates the file to the length of the blocks it seeks over. bs=1 specifies that you want to work with blocks of size 1, and seek specifies the offset at which to start writing count blocks.
The above produces 01*3456**9abc*e
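To see why conv=notrunc matters, here is the same kind of write without it, on a throwaway file (trunc.txt is just for illustration). dd truncates the file at the seek position before writing, so everything past the written byte is lost:
echo -n 0123456789abcde > trunc.txt
dd bs=1 count=1 seek=2 of=trunc.txt <<< '*'
cat trunc.txt
This prints 01* instead of the full 15-character string.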

With the same offset considerations as @DigitalTrauma's superior solution, here's a GNU awk-based alternative. It assumes your file contains no null bytes: RS=$'\0' slurps the whole file as a single record, -F '' makes every character its own field, and the loop overwrites the field at each offset with a * before printing everything back out with an empty output field separator.
(IFS=','; awk -F '' -v RS=$'\0' -v OFS='' -v offsets="${offsetsArray[*]}" \
'BEGIN{split(offsets, o, ",")};{for (k in o) $o[k]="*"; print}' file)
0*000
0**00
00*00

Related

Replace a number in a file using array data, bash

I'm not an expert in bash coding, and I'm trying to write an interactive-like script to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to write a script that reads some specific numbers from the file, stores them in an array, modifies that array with some arithmetic, and then replaces the numbers in the original file with the modified values. So far I've done everything except replacing the numbers in the file. I tried using sed, but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
If you post data samples and more of your script, we can look at where you went wrong.
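In the meantime, here is a minimal sketch of the fix, with hypothetical array contents, using double quotes so the shell expands the variables and | as the sed delimiter:
readfile=(1.50 2.75)   # hypothetical original coordinates
d=(3.00 5.50)          # hypothetical modified coordinates
for j in "${!readfile[@]}"; do
    sed -i "s|${readfile[$j]}|${d[$j]}|" file.txt
done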

UNIX - I want to find in a text file the records which have more columns than expected and their line numbers

I have a pipe-delimited file, and one record has more columns than expected.
For example:
File NPS.txt
1|a|10
2|b|20
3|c|30
4|d|40|old
The last record has more columns than expected, and I want to know its line number to understand what the problem is.
I found this command:
awk -F'|' '{print NF}' NPS.txt | sort | uniq -c
With this command I can see that one line has an extra column, but I do not know which line it is.
I would use a bash script
a) Define a counter variable, starting at 0,
b) iterate over each line in your file, adding +1 to the counter at the beginning of each loop,
c) split each line into an array based on the "|" delimiter, logging the counter value if the array contains more than 3 elements. You can log to the console or write to a file.
It's been a while since I've scripted in Linux, but these references might help:
Intro:
https://www.guru99.com/introduction-to-shell-scripting.html
For Loops:
https://www.cyberciti.biz/faq/bash-for-loop/
Bash String Splitting
How do I split a string on a delimiter in Bash?
Making Scripts Executable
https://www.andrewcbancroft.com/blog/musings/make-bash-script-executable/
There may be a good one-liner out there, but it's not a difficult script to write.
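A minimal sketch of the script described above, assuming the expected column count is 3 and the file is NPS.txt:
#!/bin/bash
counter=0
while IFS= read -r line; do
    counter=$((counter + 1))               # (b) increment the counter for each line
    IFS='|' read -r -a fields <<< "$line"  # (c) split the line on the | delimiter
    if [ "${#fields[@]}" -gt 3 ]; then
        echo "line $counter has ${#fields[@]} columns: $line"
    fi
done < NPS.txt
As for the one-liner, something like awk -F'|' 'NF > 3 {print NR": "$0}' NPS.txt should print each offending line together with its line number.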

Read a file from byte a until byte b

Mainly, I want to break a big file into smaller files.
I use streams because I do not want to keep the big file on my disk.
What I am looking for is something similar to:
sed -n 'a,bp' # this uses lines in the file, while I want bytes
or:
cat filename | head -c b | tail -c $((b-a)) # this way takes too long with big files
If performance is an issue and you are using large files, you would do better with a bigger block size in dd, like this:
dd bs=$a skip=1 if=filename | dd "bs=$((b-a))" count=1
Here the first dd skips the first a bytes by treating them as a single block of size a, and the second dd reads a single block of b-a bytes from the pipe.
If you want to extract from byte offset a to byte offset b, you can use the dd command:
dd bs=1 "skip=$a" "count=$(($b - $a))" if=filename
The quotes are optional. The main problem to worry about is whether the shell arithmetic will handle offsets bigger than 31 bits (2 GiB). Most likely it won't be an issue (for example, 64-bit Bash handles 12-digit numbers with ease on Mac OS X), but be cautious if you need to deal with really large files on 32-bit systems. You could use bc instead of the built-in $((…arithmetic…)) notation, if need be.
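For instance, with offsets too big for 32-bit shell arithmetic (hypothetical values shown), the count can be computed with bc instead:
a=5000000000
b=6000000000
count=$(echo "$b - $a" | bc)
dd bs=1 "skip=$a" "count=$count" if=filename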

Provide hex value as an input to gets in C

I'm working on a simple arc injection exploit, wherein this particular string gives me the desired address of the place where I'd like to jump: Á^F@^@. This is the address 0x004006c1 (I'm using a 64-bit Intel processor, so x86-64 with little-endian byte order).
When I provide this string Á^F@^@ as input to a vulnerable gets() routine in my function and inspect the addresses using gdb, the address gets modified to 0x00400681 instead of 0x004006c1. I'm not quite sure why this is happening. Furthermore, is there any way to easily provide hexadecimal values to a gets routine on stdin? I've tried something like: 121351...12312\xc1\x06\x40\x00, but instead of picking up \xc1 as a single byte, it translates each character literally, so I get something like 5c78.. (hex for \ and x, followed by hex for c and 1).
Any help is appreciated, thanks!
You could just put the raw bytes into a file somewhere and pipe it directly into your application.
$ path/to/my_app <raw_binary_data
Alternatively, you could wrap the application in a shell script that converts escaped hex bytes into their corresponding byte values. The echo utility will do this when the -e switch is set on the command line, for example:
$ echo '\x48\x65\x6c\x6c\x6f'
\x48\x65\x6c\x6c\x6f
$ echo -e '\x48\x65\x6c\x6c\x6f'
Hello
You can use this feature to process your application's input as follows:
while read -r line; do echo -e "$line"; done | path/to/my_app
To terminate the input, try pressing Ctrl+D or Ctrl+C.
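As another option (not part of the original answer), printf interprets \x escapes itself, so you can pipe a payload in directly; the leading filler bytes here are placeholders for whatever padding your exploit needs:
printf 'AAAAAAAA\xc1\x06\x40\x00' | path/to/my_app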

Easiest way to overwrite a series of files with zeros

I'm on Linux. I have a list of files and I'd like to overwrite them with zeros and remove them. I tried using
srm file1 file2 file3 ...
but it's too slow (I have to overwrite and remove ~50 GB of data) and I don't need that kind of security (I know that srm does a lot of passes instead of a single pass with zeros).
I know I could overwrite every single file using the command
cat /dev/zero > file1
and then remove it with rm, but I can't do that manually for every single file.
Is there a command like srm that does a single pass of zeros, or maybe a script that can do cat /dev/zero on a list of files instead of on a single one? Thank you.
Something like this, using stat to get the correct size to write, and dd to overwrite the file, might be what you need:
while IFS= read -r f
do
    read -r blocks blocksize < <(stat -c "%b %B" "$f")
    dd if=/dev/zero bs="$blocksize" count="$blocks" of="$f" conv=notrunc
    rm "$f"
done < list_of_files.txt
Use /dev/urandom instead of /dev/zero for (slightly) better erasure semantics.
Edit: added conv=notrunc option to dd invocation to avoid truncating the file when it's opened for writing, which would cause the associated storage to be released before it's overwritten.
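For reference, %b and %B are the number of allocated blocks and the size in bytes of each block, so blocks times blocksize covers the file's allocated storage. A quick check (output will vary by file and filesystem):
$ stat -c "%b %B" somefile
8 512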
I use shred for doing this.
The following are the options that I generally use.
shred -n 3 -z <filename> - This will make 3 passes overwriting the file with random data, then a final pass overwriting it with zeros. The file will remain on disk, but it will contain only zeros.
shred -n 3 -z -u <filename> - Similar to above, but also unlinks (i.e. deletes) the file. The default option for deleting is wipesync, which is the most secure but also the slowest. Check the man pages for more options.
Note: -n is used here to control the number of iterations for overwriting with random data. Increasing this number will make the shred operation take longer but shred more thoroughly. I think 3 is enough, but I may be wrong.
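If you only want the single pass of zeros the question asks for, setting the iteration count to zero should do it (an untested sketch; -n 0 skips the random passes entirely):
shred -n 0 -z -u <filename>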
The purpose of srm is to destroy the data in the file before releasing its blocks.
cat /dev/null > file is not at all equivalent to srm because
it does not destroy the data in the file: the blocks will be released with the original data intact.
Using /dev/zero instead of /dev/null does not even work because /dev/zero never ends.
Redirecting the output of a program to the file will never work for the same reason given for cat /dev/null.
You need a special-purpose program that opens the given file for writing, writes zeros over all bytes of the file, and then removes the file. That's what srm does.
Is there a command like srm that does a single pass of zeros,
Yes. srm does this with the correct parameters. From man srm:
srm -llz
-l lessens the security. Only two passes are written: one pass with
0xff and a final pass with random values.
-l -l for a second time lessens the security even more: only one
random pass is written.
-z wipes the last write with zeros instead of random data
srm -llzr will do the same recursively if wiping a directory.
You can even use srm -llz [file1] [file2] [file3] to wipe multiple files in this way with a single command.
