MongoDB $in Operator Limit when using Array from sed command


I am trying to process a large amount of data. When I pass 100 values in the array it works fine, but as soon as I pass 150 or more it starts failing.
Example:
DBQuery.shellBatchSize = 100000 ;
permissibleCars = [ "C:1456797:665","C:146:5722","C:145:57805","C:146:6070","C:14:60908"]
db.getCollection('contracts').aggregate([
{$match:
{ "methods.name": "image",
"methods.status": "ACTIVE",
container: {"$in": permissibleCars},
Class : "Download"
} },
{"$group" : {_id:"$container", count:{$sum:1}}}],
{ allowDiskUse: true}
);
This works fine while the number of entries in permissibleCars is low, say 100, but once it goes past 150 or so it starts failing randomly with the error below.
2017-08-16T21:30:35.101+0000 E QUERY [thread1] SyntaxError: unterminated string literal #(shell):1:4091
2017-08-16T21:30:35.132+0000 E QUERY [thread1] SyntaxError: missing ; before statement #(shell):1:6
2017-08-16T21:30:35.162+0000 E QUERY [thread1] SyntaxError: missing ; before statement #(shell):1:2
2017-08-16T21:30:35.193+0000 E QUERY [thread1] ReferenceError: permissibleCars is not defined :
Since the same script runs fine with a smaller array, it cannot be a syntax issue in the query itself.
Is there any way to fix this so that I can pass a larger number of values? I am running this through a shell script:
for((i=0; i < ${#arr[@]}; i+=batchsize))
do
set display=lastline
IFS=,
part=( "${arr[@]:i:batchsize}" )
{ echo "DBQuery.shellBatchSize = $contracts_count ; "; cat query/container_count_tmp.js; } > query/container_count.js
sed -i "2i permissibleCars = [ ${part[*]} ]" query/container_count.js
mongo mngdb-test-02:27068/test_db -u test_user -p test123 < query/container_count.js >> output/container_count.txt
done
Array declaration:
distinct_array=`sed ':a;N;$!ba;s/\n/ /g' output/userdistinct.txt`
declare -a arr=($distinct_array)
echo " Total Number of Distinct Ids Stored in Array ${#arr[@]}"
batchsize=150
Any help will be highly appreciated.
Note: I checked the page on the MongoDB $in limit, but it did not have much information.
I have uploaded sample data for testing and to replicate the issue: https://drive.google.com/file/d/0ByHEfbo541jIYlJhSGJIdElCODQ/view?usp=sharing

This is not a MongoDB limitation. The POSIX standard only requires sed implementations to support line lengths of at least 8192 bytes, so a sed other than GNU sed may truncate longer lines. That would explain the syntax error: the array string has been truncated.
https://www.gnu.org/software/sed/manual/html_node/Limitations.html
For a workaround, use perl instead of sed:
perl -ni -e "print; print 'permissibleCars = [ ${part[*]} ]' if $. == 2" query/container_count.js
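
Another way to sidestep the in-place edit is to write the array while the file is being generated. The following is only a minimal sketch, not the original poster's script: the temporary directory, the template contents, and the sample IDs are hypothetical. It emits one array element per line, so no single line grows with the batch size.

```shell
#!/usr/bin/env bash
# Sketch: generate each batch file by concatenation, one array element per line.
# The work directory, template contents, and sample IDs are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/template.js" <<'EOF'
db.getCollection('contracts').aggregate([
  {$match: {container: {"$in": permissibleCars}}},
  {"$group": {_id: "$container", count: {$sum: 1}}}
]);
EOF

arr=( '"C:1:1"' '"C:2:2"' '"C:3:3"' '"C:4:4"' '"C:5:5"' )
batchsize=2
batch_files=()

for ((i = 0; i < ${#arr[@]}; i += batchsize)); do
  part=( "${arr[@]:i:batchsize}" )
  out="$workdir/batch_$i.js"
  {
    echo 'DBQuery.shellBatchSize = 100000;'
    echo 'permissibleCars = ['
    # printf repeats the format for every element; a trailing comma is
    # accepted in a JavaScript array literal.
    printf '  %s,\n' "${part[@]}"
    echo '];'
    cat "$workdir/template.js"
  } > "$out"
  batch_files+=( "$out" )
done

# Report the longest line of the first batch file as a sanity check.
longest=$(awk '{ if (length($0) > max) max = length($0) } END { print max }' "${batch_files[0]}")
echo "files: ${#batch_files[@]}, longest line in first file: $longest characters"
```

Each generated file could then be fed to mongo in the same way as in the question's loop.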

Related

reading multiple matches into arrays with bash

The utility 'sas2ircu' can output multiple lines for every hard drive attached to the host. A sample of the output for a single drive looks like this:
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
I have a bash script that executes the sas2ircu command and does the following with the output:
identifies a drive by the RDY string
reads the numerical value of the enclosure (ie, 5) into an array 'enc'
reads the numerical value of the slot (ie, 20) into another array 'slot'
The code I have serves its purpose, but I'm trying to figure out if I can combine it into a single line and run the sas2ircu command once instead of twice.
mapfile -t enc < <(/root/sas2ircu 0 display|grep -B3 RDY|awk '/Enclosure/{print $NF}')
mapfile -t slot < <(/root/sas2ircu 0 display|grep -B2 RDY|awk '/Slot/{print $NF}')
I've done a bunch of reading on awk but I'm still quite novice with it and haven't come up with anything better than what I have. Suggestions?

You should be able to eliminate the grep and combine the awk scripts into a single awk script. The general idea is to capture the enclosure and slot data and then, if/when we see State/RDY, print the enclosure and slot to stdout:
awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
I don't have sas2ircu so I'll simulate some data (based on OP's sample):
$ cat raw.dat
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 7
Slot # : 12
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 9
Slot # : 23
SAS Address : 5003048-0-185f-b21c
State : Off (OFF)
Simulating the sas2ircu call:
$ cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
5 20
7 12
The harder part is going to be reading these into 2 separate arrays and I'm not aware of an easy way to do this with a single command (eg, mapfile doesn't provide a way to split an input file across 2 arrays).
One idea using a bash/while loop:
unset enc slot
while read -r e s
do
enc+=( ${e} )
slot+=( ${s} )
done < <(cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}')
This generates:
$ typeset -p enc slot
declare -a enc=([0]="5" [1]="7")
declare -a slot=([0]="20" [1]="12")
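
As a variation on the loop above, the pairs can be read once with mapfile and then split with parameter expansion. This is a sketch under the assumption that bash 4 or later is available and that each awk output line holds exactly two space-separated fields; the inline sample data stands in for the real sas2ircu output.

```shell
#!/usr/bin/env bash
# Simulated sas2ircu output (hypothetical sample based on the question).
raw='Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 7
Slot # : 12
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 9
Slot # : 23
SAS Address : 5003048-0-185f-b21c
State : Off (OFF)'

# One pass: each ready drive becomes a line of the form "enclosure slot".
mapfile -t pairs < <(printf '%s\n' "$raw" |
  awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}')

# Applied to an array, these expansions act on every element:
# %% strips everything from the first space on, ## strips up to the last space.
enc=( "${pairs[@]%% *}" )
slot=( "${pairs[@]##* }" )

typeset -p enc slot
```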

Saving the output of a for loop in an array in Bash

I am trying to write a custom script to monitor the disk usage space of "n" number of servers. I have two arrays, one array consists of the actual usage and the other array consists of the allowed limit. I would like to loop through the used storage array; determine the percentage, round it off to the nearest integer and output the same on the console to be later saved in an array.
I have the following piece of code that does this:
readarray -t percentage_storage_limit <<< "$(for ((j=0; j < ${#storage_usage_array[@]}; j++));
do $(awk "BEGIN {
ac=100*${storage_usage_array[$j]}/${storage_limit_array[$j]};
i=int(ac);
print (ac-i<0.5)?i:i+1
}");
done)";
The length of both storage_usage_array and storage_limit_array are the same. An index in storage_usage_array corresponds to the storage used on a server and an index on storage_limit_array corresponds to the limit on the same server.
Although the statement runs, I see "command not found" errors like the following, and the output is not saved in the percentage_storage_limit array.
8: command not found
4: command not found
Am I missing something here? Any help would be really appreciated.

I think you're overcomplicating this syntax-wise. I would just accumulate the array within the for loop:
percentage_storage_limit=()
for ((j=0; j < ${#storage_usage_array[@]}; j++)); do
percentage_storage_limit+=( $(
awk -v u="${storage_usage_array[$j]}" -v l="${storage_limit_array[$j]}" '
BEGIN {
ac = 100 * u / l
i = int(ac)
print (ac-i < 0.5) ? i : i+1
}
'
) )
done
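
If the usage and limit values are whole numbers, the same rounding can be done without awk. This is a sketch under that assumption (non-negative integers, non-zero limits); the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: round-half-up percentage using only shell integer arithmetic.
# Sample values are hypothetical; inputs are assumed to be non-negative
# integers and limits are assumed to be non-zero.
storage_usage_array=(42 7 150)
storage_limit_array=(500 200 300)

percentage_storage_limit=()
for ((j = 0; j < ${#storage_usage_array[@]}; j++)); do
  u=${storage_usage_array[j]}
  l=${storage_limit_array[j]}
  # round(100*u/l) == floor((200*u + l) / (2*l)) for non-negative integers.
  percentage_storage_limit+=( $(( (200 * u + l) / (2 * l) )) )
done

printf '%s\n' "${percentage_storage_limit[@]}"
```

With these sample values the ratios are 8.4, 3.5 and 50, which round to 8, 4 and 50.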

The reason it doesn't work is that when you enclose awk in $(...) at command position, you tell bash to execute its output. Bash therefore tries to run 8 or 4 as a command and reports that no such command exists. Just don't enclose awk in $(...): you want to capture its output, not execute it. It would also be better to use < <(...) than <<< "$(...)":
readarray -t percentage_storage_limit < <(
for ((j=0; j < ${#storage_usage_array[@]}; j++)); do
awk "BEGIN {
ac=100*${storage_usage_array[$j]}/${storage_limit_array[$j]};
i=int(ac);
print (ac-i<0.5)?i:i+1
}";
done
)
Anyway Glenn's answer shows the 'good' way to do this, without readarray call.
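
The difference between executing and capturing a substitution can be seen in isolation. This is a small sketch, assuming no command named 8 exists on the PATH and no command_not_found_handle function is defined.

```shell
#!/usr/bin/env bash
# A bare $(...) at command position runs the substitution's output as a command.
# Here the output is "8", so bash tries to run a command called 8.
err=$( { $(echo 8); } 2>&1 )
status=$?
echo "bare substitution: exit status $status, message: $err"

# Assigning the substitution to a variable treats the output as data instead.
value=$(echo 8)
echo "captured value: $value"
```

An exit status of 127 is what bash reports when a command cannot be found.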

shell script : if array value was greater than a number then run a command

I have a file containing, on each line, the number of emails a user has sent followed by the username. For example (I don't know how many lines it has):
info.txt >
500 example1
40 example2
20 example3
....
..
.
If the number is greater than X, I want to run commands that include the username and act on that user.
getArray() {
users=() # Create array
while IFS= read -r line # Read a line
do
users+=("$line") # Append line to the array
done < "$1"
}
getArray "/root/.myscripts/spam1/info.txt"
# i know this part is incorrect and need help here :
if [ "${users[1$]}" -gt "50" ]
then
echo "${users[2$] has sent ${users[1$]} emails"
fi
please Help
Thanks

Not knowing how many lines of input you have is no reason to use an array. Indeed, it is generally more useful if you assume your input is infinite (an input stream), so reading into an array is impossible. Just read each line and take action if necessary:
#!/bin/sh
while read -r count user; do
if test "$count" -gt 50; then
echo "$user has sent $count emails"
fi
done < /root/.myscripts/spam1/info.txt
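
The same loop can be extended to call a per-user action and to skip malformed lines. This is a sketch only: handle_user is a hypothetical placeholder for whatever should happen to the user, the threshold and the inline sample data are assumptions, and bash is assumed because the flagged users are collected in an array.

```shell
#!/usr/bin/env bash
threshold=50
flagged=()

# Hypothetical placeholder: replace the body with the real action on the user.
handle_user() {
  flagged+=( "$1" )
  echo "$1 has sent $2 emails"
}

while read -r count user; do
  # Skip blank or malformed lines whose first field is not a plain integer.
  case $count in
    ''|*[!0-9]*) continue ;;
  esac
  if [ "$count" -gt "$threshold" ]; then
    handle_user "$user" "$count"
  fi
done <<'EOF'
500 example1
40 example2
20 example3
....
75 example4
EOF

echo "flagged users: ${flagged[*]}"
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell and the flagged array is still set after the loop.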

Shell script - awk extract block from file into array

I'm currently writing a shell script that reads a Vagrantfile and bootstraps it (in a nutshell ;) )
But I'm hitting a wall with the following piece of code:
TEST=()
while read result; do
TEST+=(`echo ${result}`)
done <<< `awk '/config.vm.define[ \s]\"[a-z]*\"[ \s]do[ \s]\|[a-zA-Z_]*\|/, /end/ { print }' Vagrantfile`
echo "${TEST[1]}"
When I pass a Vagrantfile into this awk pattern regex with two machines defined (config.vm.define) in it they are found.
The output
config.vm.define "web" do |web|
web.vm.box = "CentOs"
web.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
web.vm.hostname = 'dev.local'
web.vm.network :forwarded_port, guest: 90, host: 9090
web.vm.network :private_network, ip: "22.22.22.11"
web.vm.provision :puppet do |puppet|
puppet.manifests_path = "puppet/manifests"
puppet.manifest_file = "web.pp"
puppet.module_path = "puppet/modules"
puppet.options = ["--verbose", "--hiera_config /vagrant/hiera.yaml", "--parser future"]
end
config.vm.define "db" do |db_mysql|
db_mysql.vm.box = "CentOs"
db_mysql.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
db_mysql.vm.hostname = 'db.mysql.local'
db_mysql.vm.network :private_network, ip: "22.22.22.22"
db_mysql.vm.network :forwarded_port, guest: 3306, host: 3306
db_mysql.vm.provision :puppet do |puppet|
puppet.manifests_path = "puppet/manifests"
puppet.manifest_file = "db.pp"
puppet.module_path = "puppet/modules"
puppet.options = ["--verbose", "--hiera_config /vagrant/hiera.yaml", "--parser future"]
end
But I can't seem to pass them into a array nicely. What I want is that the TEST array contains two indexes with the machine config.vm.define block as their corresponding values.
E.g.
TEST[0] = 'config.vm.define "web" do |web|
.... [REST OF THE BLOCK CONTENT] ...
end'
TEST[1] = 'config.vm.define "db" do |db_mysql|
.... [REST OF THE BLOCK CONTENT] ...
end'
The output of echo "${TEST[1]}" is empty, while echo "${TEST[0]}" returns the whole block shown above.
I played with IFS / RS / FS but I can't seem to get the output I want.

A solution might be to write the two blocks to two separate files (blk1 and blk2) as:
awk '
/config.vm.define[[:space:]]\"[a-z]*\"[[:space:]]do[[:space:]]\|[a-zA-Z_]*\|/{f=1; i++}
f{print $0 > "blk"i}
/end/ {f=0}' Vagrantfile
and then later read these two files into the bash array as
IFS= TEST=( $(cat <"blk1") $(cat <"blk2") )
Note:
The regex \s seems to work only in newer versions of gawk (it works with version 4.1, but not with version 3.1.8).
For gawk version 3.1.8, use [[:space:]] instead.
For gawk version 4.1, the regex \s does not work inside brackets ([\s]). Use either config.vm.define[[:space:]] or config.vm.define\s.
Update
An alternative could be to insert an artificial separator between the blocks, for instance the string ###. Then you could do
IFS= TEST=()
while IFS= read -r -d '#' line ; do
TEST+=($line)
done < <(awk '
/config.vm.define[[:space:]]\"[a-z]*\"[[:space:]]do[[:space:]]\|[a-zA-Z_]*\|/{f=1; i++}
f{print }
/end/ {f=0; print "###"}' Vagrantfile)
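
A variation on the separator idea is to use a single control character that cannot appear in the file and to quote each block when appending it. This is a sketch under several assumptions: the sample Vagrantfile is hypothetical and much shorter than the real one, the start pattern is simplified, the first line containing end closes a block (as in the commands above), and the ASCII record separator (octal 036) never occurs in the input.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened Vagrantfile used only for this demonstration.
vagrantfile=$(mktemp)
cat > "$vagrantfile" <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.define "web" do |web|
    web.vm.box = "CentOs"
  end
  config.vm.define "db" do |db_mysql|
    db_mysql.vm.box = "CentOs"
  end
end
EOF

TEST=()
# read -d stops at the record separator instead of at a newline,
# so every block arrives as one multi-line string.
while IFS= read -r -d $'\036' block; do
  TEST+=( "${block%$'\n'}" )   # drop the trailing newline, keep the rest intact
done < <(awk '
  /config\.vm\.define[ \t]+"[a-z]*"[ \t]+do/ { f = 1 }
  f { print }
  f && /end/ { f = 0; printf "\036" }
' "$vagrantfile")

echo "${#TEST[@]} blocks found"
printf '%s\n' "${TEST[1]}"
```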

KSH Error : '$' unexpected

Below KSH script results in the error "Syntax error at line 4: '$' unexpected"
!#/bin/ksh
for i in `cat pins.list`
do
set -A array_${i} `grep -i "$i " pins.txt | awk '{print $2}'`
echo "Elements of array_${i} are ${array_${i}[@]}"
done
#=================================
I am creating multiple arrays (array_$i) for each iteration of i, after parsing the file pins.txt.
I can see the arrays array_block , array_group, array_range created and the elements of pins.txt stored in these arrays correctly, but I am unable to print the values of each of these arrays due to this error. Printing the contents of these 3 arrays outside the loop has no issues. But I need to access these arrays inside the loop for further processing in my script. Is there a way to resolve this?
Contents of pins.list and pins.txt are as follows:
pins.list (Arrays)
==================
block
group
range
pins.txt
===========
range 444
group 46
range 32
block 96
group 99
range 123
block 56
range 22
Thanks

You cannot create a dynamic variable name in this way, you need eval. For example:
while read i
do
eval "set -A array_${i} \$(grep -i $i pins.txt | awk '{print \$2}')"
eval "echo \"Elements of array_${i} are \${array_${i}[@]}\" "
done < pins.list
I have changed from a for loop to a while, this is an alternative method of reading a file rather than using cat (also, check your #! line).
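
For readers who can use bash rather than ksh, a name reference avoids eval. This is a sketch for bash 4.3 or later, not for ksh88; the temporary directory is hypothetical and the file contents are copied from the question.

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ for namerefs (declare -n).
tmp=$(mktemp -d)
printf '%s\n' block group range > "$tmp/pins.list"
cat > "$tmp/pins.txt" <<'EOF'
range 444
group 46
range 32
block 96
group 99
range 123
block 56
range 22
EOF

while read -r name; do
  # ref becomes an alias for the array named array_<name>.
  declare -n ref="array_${name}"
  # Exact match on the first column; the values are plain numbers,
  # so word splitting of the awk output is safe here.
  ref=( $(awk -v key="$name" '$1 == key { print $2 }' "$tmp/pins.txt") )
  echo "Elements of array_${name} are ${ref[*]}"
  unset -n ref   # remove the alias itself, not the array it points to
done < "$tmp/pins.list"
```

The arrays array_block, array_group and array_range remain available after the loop under their own names.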
