Packer build volume size and CloudFormation Launch Template volume size - filesystems

I am running Packer with an extra disk set to 10GB, and in CloudFormation I want one of my Launch Templates to use a larger volume size for that disk (say, 20GB).
If I create a file system directly on the whole disk, everything is fine: the extra space (the additional 10GB) is used as if it were part of it.
mkfs -t ext4 $EXTRA_DISK
Now I want to create multiple partitions on that disk, and I want to use that leftover space (the extra 10GB) instead of leaving it unpartitioned.
But if I partition the disk, the partitions have a fixed size and the 10GB left on the volume goes unused.
parted -a optimal $EXTRA_DISK mklabel gpt
parted -a optimal $EXTRA_DISK mkpart primary 0% 40%
parted -a optimal $EXTRA_DISK mkpart primary 40% 100%
#
PARTITIONS=$(fdisk -l | grep nvme | grep -v $BOOT_DISK | awk '{print $1}' | sed '1d' | cut -d':' -f1)
for partition in $PARTITIONS; do mkfs.ext4 $partition; done
Is there a way to make sure that at least one partition uses all of the remaining space when the CloudFormation Launch Template specifies a larger volume size, or is it simply impossible (which would make sense, since I'm defining the partitions beforehand)?
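For what it's worth, one common pattern (an assumption on my part, not something from the setup above) is to let the last partition absorb the extra space by growing it at first boot, e.g. from instance user data, assuming growpart from cloud-utils is available in the AMI:
# Hypothetical device/partition names -- adjust to the instance's NVMe naming
EXTRA_DISK=/dev/nvme1n1
LAST_PART=2
growpart "$EXTRA_DISK" "$LAST_PART"       # stretch the last partition to the end of the disk
resize2fs "${EXTRA_DISK}p${LAST_PART}"    # grow the ext4 file system to match
This keeps the partition layout baked by Packer intact; only the last partition stretches to whatever size the Launch Template provides.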

Related

Changing a partition with fdisk shows a warning like "partition#x contains ext4-signature"

I'm shrinking a partition with
#Reduce Partition Size
fsck -f /dev/sdb2
resize2fs /dev/sdb2 -M -p
#Limit Partition
fdisk /dev/sdb
... # Now I change partition 2 to the new (smaller) size
fdisk gives me a (red) warning: "Partition #2 contains an ext4 signature."
Is there something wrong? Why does fdisk show me this warning?
I ran into the same warning.
It means there is already an ext4 file system (its signature) on this partition.
For example, the same warning also appears when growing a partition and its file system. Just select "N" if you don't plan to remove the file system.
Created a new partition 1 of type 'Linux' and of size 100 GiB.
Partition #1 contains an ext4 signature.
Do you want to remove the signature? [Y]es/[N]o: N
Most complete answer here: https://unix.stackexchange.com/questions/477991/what-is-a-vfat-signature/478001#478001
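As a side note (not from the original answer), you can see exactly which signatures fdisk is talking about with wipefs; a minimal sketch, using the /dev/sdb2 partition from the question:
wipefs /dev/sdb2          # list the file system signatures present on the partition
# wipefs -a /dev/sdb2     # only if you really do want to erase them (this wipes the file system metadata)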

Vlookup-like function using awk in ksh

Disclaimers:
1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite them.
2) I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
3) You will find some text in capital letters here and there. It is of course not me "shouting" at you, but only a way to make portions of text stand out. Please do not consider this an act of impoliteness.
4) For those of you who get to the bottom of this novella alive, THANKS IN ADVANCE for your patience, even if you are not able to (or don't feel like) helping me. My disclaimer here is that, after browsing the site for a while, I noticed that the most common "complaint" from people willing to help seems to be the lack (and/or poor quality) of information provided by those seeking help. So I preferred to be accused of overwording if need be... At least it is not a common offense...
The "Problem":
I have 2 files (a and b for simplification). File a has 7 columns separated by commas. File b has 2 columns separated by commas.
What I need: Whenever the data in the 7th column of file a matches -EXACT MATCHES ONLY- the data in the 1st column of file b, a new line, containing the whole line of file a plus column 2 of file b, is to be appended to a new file "c".
--- MORE INFO IN THE NOTES AT THE BOTTOM ---
file a:
Server Name,File System,Path,File,Date,Type,ID
horror,/tmp,foldera/folder/b/folderc,binaryfile.bin,2014-01-21 22:21:59.000000,typet,aaaaaaaa
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333
hostile,/sad,folder22,higefile.hug,2016-06-17 18:43:12.000000,typeasd,77777777
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999
file b:
ID,Size
11111111,215915
22222222,1716
33333333,212856
44444444,1729
55555555,215927
66666666,1728
88888888,1729
99999999,213876
bbbbbbbb,26669080
Expected file c:
Server Name,File System,Path,File,Date,Type,ID,Size
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
Additional notes:
0) Notice how the line with ID "aaaaaaaa" in file a does not make it into file c, since ID "aaaaaaaa" is not present in file b. Likewise, the line with ID "bbbbbbbb" in file b does not make it into file c, since ID "bbbbbbbb" is not present in file a and is therefore never looked for in the first place.
1) The data is obviously completely made up due to confidentiality issues, though the examples provided fairly resemble what the real files look like.
2) I added headers just to provide a better idea of the nature of the data. The real files don't have them, so there is no need to skip them in the source files nor create them in the destination file.
3) Both files come sorted by default, meaning that IDs will be properly sorted in file b, while they will most likely be scrambled in file a. File c should preferably follow the order of file a (though I can rearrange it later to fit my needs anyway, so no worries there, as long as the code does what I need and doesn't mess up the data by combining the wrong lines).
4) VERY VERY VERY IMPORTANT:
4.a) I already have a "working" ksh script (attached below) that uses cat, grep, while and if to do the job. It worked like a charm (well, acceptably) with 160K-line sample files (it was able to output roughly 60K lines an hour, which, projected, would yield an acceptable "20 days" to produce 30 million lines [KEEP ON READING]), but somehow (I have plenty of processor and memory capacity) cat and/or grep seem to be struggling to process a real-life 5-million-line file (both file a and file b can have up to 30 million lines each, so that's the maximum probable number of lines in the resulting file, even assuming 100% of the lines in file a find their match in file b), and file c is now only being fed a couple hundred lines every 24 hours.
4.b) I was told that awk, being stronger, should succeed where the weaker commands I worked with seem to fail. I was also told that working with arrays might be the solution to my performance problem, since all data is loaded into memory at once and worked on from there, instead of having to cat | grep file b as many times as there are lines in file a, as I am currently doing.
4.c) I am working on AIX, so I only have sh and ksh, no bash, therefore I cannot use the array tools provided by the latter; that's why I thought of awk, that and the fact that I think awk is probably "stronger", though I might (probably?) be wrong.
Now, I present to you the magnificent piece of ksh code (obvious sarcasm here, though I like the idea of you picturing for a brief moment in your mind the image of the monkey holding up and showing all other jungle-crawlers their future lion king) I have managed to develop (feel free to laugh as hard as you need while reading this code, I will not be able to hear you anyway, so no feelings harmed :P ):
cat "${file_a}" | while read -r line_file_a; do
server_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $1}'`
filespace_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $2}'`
folder_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $3}'`
file_name_file_a=`echo "${line_file_a}" | awk -F"," '{print $4}'`
file_date_file_a=`echo "${line_file_a}" | awk -F"," '{print $5}'`
file_type_file_a=`echo "${line_file_a}" | awk -F"," '{print $6}'`
file_id_file_a=`echo "${line_file_a}" | awk -F"," '{print $7}'`
cat "${file_b}" | grep ${object_id_file_a} | while read -r line_file_b; do
file_id_file_b=`echo "${line_file_b}" | awk -F"," '{print $1}'`
file_size_file_b=`echo "${line_file_b}" | awk -F"," '{print $2}'`
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}" >> ${file_c}.csv
fi
done
done
One last additional note, just in case you wonder:
The "if" section was not only built as a mean to articulate the output line, but it servers a double purpose, while safe-proofing any false positives that may derive from grep, IE 100 matching 1000 (Bear in mind that, as I mentioned earlier, I am working on AIX, so my grep does not have the -m switch the GNU one has, and I need matches to be exact/absolute).
You have reached the end. CONGRATULATIONS! You've been awarded the medal to patience.
$ cat stuff.awk
BEGIN { FS=OFS="," }
NR == FNR { a[$1] = $2; next }
$7 in a { print $0, a[$7] }
Note the order for providing the files to the awk command, b first, followed by a:
$ awk -f stuff.awk b.txt a.txt
host1,/,somefolder,test1.txt,2016-08-18 00:00:20.000000,typez,11111111,215915
host20,/,somefolder/somesubfolder,usr.cfg,2015-12-288 05:00:20.000000,typen,22222222,1716
hoster,/lol,foolie,anotherfile.sad,2014-01-21 22:21:59.000000,typelol,66666666,1728
hostie,/,someotherfolder,somefile.txt,2016-06-17 18:43:12.000000,typea,33333333,212856
hostin,/var,folder30,someotherfile.cfg,2014-01-21 22:21:59.000000,typo,44444444,1729
hostn,/usr,foldie,tinyfile.lol,2016-08-18 00:00:20.000000,typewhatever,55555555,215927
server10,/usr,foldern,tempfile.tmp,2016-06-17 18:43:12.000000,tipesad,99999999,213876
EDIT: Updated calculation
You can estimate how often you are calling another program:
At least 7 awk's + 1 cat + 1 grep for each line in file a, i.e. 9 * 160,000 process launches.
For file b: 2 awk's, plus one file open and one file close, for each hit. With 60K output lines, that is another 4 * 60,000.
A small change in the code brings this down to "only" 160,000 invocations of grep:
cat "${file_a}" | while IFS=, read -r server_name_file_a \
filespace_name_file_a folder_name_file_a file_name_file_a \
file_date_file_a file_type_file_a file_id_file_a; do
grep "${object_id_file_a}" "${file_b}" | while IFS="," read -r line_file_b; do
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
fi
done
done >> ${file_c}.csv
Well, try this with your 160K files and see how much faster it is.
Before I explain why this still is the wrong way, I will make another small improvement: I will replace the cat feeding the while loop with a redirect at the end (after the done).
while IFS=, read -r server_name_file_a \
filespace_name_file_a folder_name_file_a file_name_file_a \
file_date_file_a file_type_file_a file_id_file_a; do
grep "${object_id_file_a}" "${file_b}" | while IFS="," read -r line_file_b; do
if [ "${file_id_file_a}" = "${file_id_file_b}" ]; then
echo "${server_name_file_a},${filespace_name_file_a},${folder_name_file_a},${file_name_file_a},${file_date_file_a},${file_type_file_a},${file_id_file_a},${file_size_file_b}"
fi
done
done < "${file_a}" >> ${file_c}.csv
The main drawback of these solutions is that you are reading the complete file_b again and again with grep for each line in file a.
This solution is a nice improvement in performance, but there is still a lot of overhead from grep. Another huge improvement can be found with awk.
The best solution is using awk as explained in What is "NR==FNR" in awk? and found in the answer by @jas.
Only one process is started, and both files are read only once.
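For convenience, the same stuff.awk logic from that answer can also be run inline as a one-liner (nothing new here, just the script folded onto the command line, writing straight to file c):
awk 'BEGIN { FS=OFS="," } NR == FNR { a[$1] = $2; next } $7 in a { print $0, a[$7] }' b.txt a.txt > c.csv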

Unix command-line tool for randomly shuffling chunks of bytes in a binary file?

Is there an easy way of randomly shuffling fixed-size chunks of bytes?
I have a large binary file (say, hundreds of gigabytes) containing many fixed-size chunks of bytes. I do not care much about the quality of the randomness, but I want to shuffle two-byte elements (or really any fixed size, up to 8 bytes) within the binary file. Is there a way of combining Unix core tools to achieve this goal? If there is no such tool, I might have to write some C code. I want to hear what recommendations people have.
Here's a stupid shell trick to do so.
First, break the file down into 2-byte chunks using xxd.
Shuffle it with shuf.
Reassemble the file using xxd.
e.g.
xxd -p -c 2 input_file | shuf - | xxd -p -r - output_file
I haven't tested it on huge files. You may want to use an intermediary file.
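The same idea should extend to the other chunk sizes mentioned in the question (equally untested on huge files); e.g. for 8-byte chunks:
xxd -p -c 8 input_file | shuf - | xxd -p -r - output_file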
Alternatively, you could use sort -R like so:
xxd -c 2 in_file | sort -R | cut -d' ' -f 2 | xxd -r -p - out_file
This depends on xxd outputting offsets, which make every line unique so each one sorts differently.
Given the size of the input files you have to work with, this is a sufficiently complex problem. I wouldn't try to push the limits of shell scripting; it's best to code this in C or another language.
I'm not aware of a tool that can make this easy.
Try:
split -b $CHUNK_SIZE $FILE && find . -name "x*" | perl -MList::Util='shuffle' -e "print shuffle<>" | xargs cat > temp.bin
This creates a large number of files, each of size $CHUNK_SIZE (or less for the last one, if the total file size is not divisible by $CHUNK_SIZE), named xaa, xab, xac, and so on; it then lists the files, shuffles the list, and concatenates them.
This will take up an extra 2x of disk space and probably won't work with large files (with a 2-byte chunk size, a 100GB input would mean 50 billion files).

parsing the output of the 'w' command?

I'm writing a program which requires knowledge of the current load on the system and the activity of any users (it's a load balancer).
This is a university assignment, and I am required to use the w command. I'm having a hard time parsing this command's output because it is very verbose. Any suggestions on what I can do would be appreciated. This is a small part of the program, and I am free to use whatever method I like.
The most condensed version of w which still has the information I require is `w -u -s -f`, which produces this:
10:13:43 up 9:57, 2 users, load average: 0.00, 0.00, 0.00
USER TTY IDLE WHAT
fsm tty7 22:44m x-session-manager
fsm pts/0 0.00s w -u -s -f
So out of that, I am interested in the first number after "load average" and the smallest idle time (so I will need to parse them all).
My background process will call w, so the fact that w itself shows the lowest idle time will not matter (all I will see is the tty times).
Do you have any ideas?
Thanks
(I am allowed to use alternative Unix commands, like grep, if that helps.)
Are you allowed to use other Unix commands? You could use grep, sed or head/tail to get the lines you need, and cut to split them up as needed.
Check out: http://www.gnu.org/s/libc/manual/html_node/Regexp-Subexpressions.html#Regexp-Subexpressions
Use regular expressions to match [0-9]+\.[0-9]{2} on the first line. You may have to fiddle with which characters are escaped. That will give you 3 load averages.
The remaining output is column-based, so if you count off the string positions from w, you'll be able to just strncpy the interesting bits.
Another possible approach (which sounds like it goes against the assignment, but I'd keep it in mind) is to grab the source code of w and hack it up to just hand you the information via function calls. If you're feeling really hardcore, you can learn all the library API calls and do it directly that way.
I found I can use a combination of commands, like so:
w -u -s -f | grep load | cut -d " " -f 11
and
w -u -s -f | grep tty | cut -d " " -f 13
The first takes the output of w, uses grep to select only the line containing "load", and then cuts away everything except the 11th chunk of data (the delimiter being a space), which is the first load number, with a trailing comma.
The second does something similar for the per-user idle times; if there are multiple ttys, it produces a list.
This is easy enough to parse, unless someone has an objection or a suggestion to improve on it.
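As an aside (not part of the original exchange), cut with a space delimiter counts every consecutive space as a separate field, so the field numbers shift whenever the uptime portion changes width; awk splits on runs of whitespace and is a bit more robust:
# First load average: third field from the end of the "load average" line, comma stripped
w -u -s -f | awk '/load average/ { gsub(",", "", $(NF-2)); print $(NF-2) }'
# IDLE column of every user line (skip the two header lines)
w -u -s -f | awk 'NR > 2 { print $3 }'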

How many bytes per inodes?

I need to create a very large number of files which are not very big (around 4kB or 8kB each).
It's not possible on my computer because doing so uses up 100% of the inodes and I cannot create more files:
$ df -i /dev/sda5
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda5 54362112 36381206 17980906 67% /scratch
(I started deleting files, which is why it is now at 67%.)
The inode size is 256 bytes on my filesystem (ext4):
$ sudo tune2fs -l /dev/sda5 | grep Inode
Inode count: 54362112
Inodes per group: 8192
Inode blocks per group: 512
Inode size: 256
I wonder if it's possible to set this value very low, even below 128, when reformatting. If yes, what value should I use?
Thanks
The default bytes per inode is usually 16384, which is the default inode_ratio in /etc/mke2fs.conf (it's read prior to filesystem creation). If you're running out of inodes, you might try for example:
mkfs.ext4 -i 8192 /dev/mapper/main-var2
Another option that affects this is -T, typically -T news which further reduces it to 4096.
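For reference, those defaults come from /etc/mke2fs.conf; a typical excerpt looks like the following (the exact contents vary by distribution):
[defaults]
    inode_ratio = 16384
[fs_types]
    news = {
        inode_ratio = 4096
    }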
Also, you cannot change the number of inodes in an ext3 or ext4 filesystem without re-creating it or hex-editing it. Reiser filesystems allocate inodes dynamically, so you'll never have this issue with them.
You can find out the approximate bytes-per-inode ratio by dividing the filesystem's size by its number of inodes. For example:
$ sudo tune2fs -l /dev/sda1 | awk -F: ' \
/^Block count:/ { blocks = $2 } \
/^Inode count:/ { inodes = $2 } \
/^Block size:/ { block_size = $2 } \
END { blocks_per_inode = blocks/inodes; \
print "blocks per inode:\t", blocks_per_inode, \
"\nbytes per inode:\t", blocks_per_inode * block_size }'
blocks per inode: 3.99759
bytes per inode: 16374.1
I have found the solution to my problem in the mke2fs man page:
-I inode-size
Specify the size of each inode in bytes. mke2fs creates 256-byte inodes by default. In kernels after 2.6.10 and some earlier vendor kernels it is possible to utilize inodes larger than 128 bytes to store extended attributes for improved performance. The inode-size value must be a power of 2 larger or equal to 128. The larger the inode-size, the more space the inode table will consume, and this reduces the usable space in the filesystem and can also negatively impact performance. Extended attributes stored in large inodes are not visible with older kernels, and such filesystems will not be mountable with 2.4 kernels at all. It is not possible to change this value after the filesystem is created.
The maximum you will be able to set is given by your block size:
sudo tune2fs -l /dev/sda5 | grep "Block size"
Block size: 4096
Hope this can help....
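Putting the two answers together, a minimal sketch of a reformat that combines the smallest allowed inode size with a low bytes-per-inode ratio could look like this (device name taken from the question; note this re-creates the filesystem and destroys all data on it):
# -I 128: smallest allowed inode size; -i 4096: reserve one inode per 4096 bytes of space
sudo mkfs.ext4 -I 128 -i 4096 /dev/sda5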
