Difference between two solr indexes

I have two Solr indexes: index A contains 100,000 docs, index B contains 110,000 docs, and A is a subset of B. I have to perform the operation
A XOR B = result
and delete result.

Answer:
If there are only ~100,000 documents, dump all document ids and diff them. On a Linux-based system, simple command-line tools are enough (note the rows parameter: the default is 10, so it must be set higher than the index size to dump every id):
curl "<a href="http://your.hostA:port/solr/index/select?*:*&fl=id&wt=csv">http://your.hostA:port/solr/index/select?*:*&fl=id&wt=csv" > /tmp/idsA
curl "<a href="http://your.hostB:port/solr/index/select?*:*&fl=id&wt=csv">http://your.hostB:port/solr/index/select?*:*&fl=id&wt=csv" > /tmp/idsB
diff /tmp/idsA /tmp/idsB | grep "<\|>" | awk '{print $2;}' | sed 's/\(.*\)/<id>\1<\/id>/g' > /tmp/ids_to_delete.xml
Now you have the file. Wrap its contents in "<delete>" and "</delete>" and upload it to Solr using curl:
curl -X POST -H 'Content-Type: text/xml' --data-binary @/tmp/ids_to_delete.xml "http://your.hostA:port/solr/index/update"

How to create solr password hash

From the Solr documentation, to create a user I need to add the following lines to the security.json config file:
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
}
},
I know that under authentication.credentials the key solr is the username and the value IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c= is the hash of the password SolrRocks.
But my question is, how can I generate that hash?
The documentation does not mention it anywhere. It does not look like md5, sha1, argon2, or any hash known to me, and after base64-decoding it appears to be raw binary data.
What kind of hash is that, and how can I create it from bash?
You'd usually use set-user in the Authentication API to add the user. The stored value is base64(sha256(sha256(salt + password))), a space, then base64(salt).
rmalchow on GitHub has created a standalone version for bash:
#!/bin/bash
PW=$1
# random 48-character salt (requires pwgen)
SALT=$(pwgen 48 -1)
# Solr stores base64(sha256(sha256(salt + password))) followed by base64(salt)
echo "hash : $(echo -n "$SALT$PW" | sha256sum -b | xxd -r -p | sha256sum -b | xxd -r -p | base64 -w 1024) $(echo -n "$SALT" | base64 -w 1024)"

How can I get the names of all namespaces containing the word "nginx" and store those names in an array

Basically I want to automate a task where I have some namespaces in Kubernetes I need to delete and others that I want to leave alone. The namespaces to delete contain the word nginx. So I was thinking I could filter the output of kubectl get namespace with some regex, store the matching names in an array, and then iterate through that array deleting them one by one.
array=($(kubectl get ns | jq -r 'keys[]'))
declare -p array
for n in {array};
do
kubectl delete $n
done
I tried doing something like this, but it's very basic and doesn't even have the regex; I just left it here as an example to show what I'm trying to achieve. Any help is appreciated, and thanks in advance.
kubectl get ns doesn't output JSON unless you add -o json. This:
array=($(kubectl get ns | jq -r 'keys[]'))
Should result in an error like:
parse error: Invalid numeric literal at line 1, column 5
kubectl get ns -o json emits a JSON response that contains a list of Namespace resources in the items key. You need to get the metadata.name attribute from each item, so:
kubectl get ns -o json | jq -r '.items[].metadata.name'
You only want namespaces that contain the word "nginx". We could filter the above list with grep, or we could add that condition to our jq expression:
kubectl get ns -o json | jq -r '.items[]|select(.metadata.name|test("nginx"))|.metadata.name'
This will output your desired namespaces. At this point, there's no reason to store them in an array and use a for loop; you can just pipe the output to xargs:
kubectl get ns -o json |
jq -r '.items[]|select(.metadata.name|test("nginx"))|.metadata.name' |
xargs kubectl delete ns
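If you do want the names in a bash array first (say, to inspect them before deleting anything), mapfile is one way to do it; this is a sketch assuming bash 4+:

mapfile -t namespaces < <(kubectl get ns -o json |
  jq -r '.items[]|select(.metadata.name|test("nginx"))|.metadata.name')
declare -p namespaces
for n in "${namespaces[@]}"; do
  kubectl delete ns "$n"
done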
The output of kubectl get ns:

NAME                          STATUS   AGE
default                       Active   75d
kube-node-lease               Active   75d
kube-public                   Active   75d
kube-system                   Active   75d
oci-service-operator-system   Active   31d
olm                           Active   31d
command
kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}'
Output
default
kube-node-lease
This will give you the list of matching namespaces (the demo pattern "de" stands in for "nginx"):
array=$(kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}')
Testing
bash-4.2$ array=$(kubectl get ns --no-headers | awk '{if ($1 ~ "de") print $1}')
bash-4.2$ echo $array
default kube-node-lease
bash-4.2$ for n in $array; do echo $n; done
default
kube-node-lease
bash-4.2$
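Note that array=$(...) stores a single whitespace-separated string and relies on word splitting in the loop; for a genuine bash array, as the question's title asks, wrap the substitution in parentheses. A sketch, with "nginx" swapped in for the demo pattern:

array=($(kubectl get ns --no-headers | awk '$1 ~ "nginx" {print $1}'))
for n in "${array[@]}"; do kubectl delete ns "$n"; done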

using command variable in column output not working

I'm using a script that uses curl to obtain specific array values from a configuration. I'd like to place the output into columns, separating the values (the values are unknown to the script). Here's my code:
# get overlay networks and their details
get_overlay=$(curl -H "X-Person-Token: $auth_token" -H "X-Person-Email: $auth_email" -k "$api_host/api/v1/networks")
# overlay names and uuids, one per line
overlay_name=$(echo "$get_overlay" | jq '.[] | .name')
overlay_uuid=$(echo "$get_overlay" | jq '.[] | .uuid')
echo ""
echo -e "Overlay UUID\n$overlay_name $overlay_uuid" | column -t
exit 0
Here's the output:
Overlay UUID
"TESTOVERLAY"
"Auto_API_Overlay"
"ANOTHEROVERLAYTEST" "ea178905-6ab0-4154-ab05-412dc4b39151"
"e5be9dbe-b0fc-4e30-aaf5-ac4bdcd863a7"
"850ebf6b-3651-4cf1-aae1-5a6c03fad61b"
What I was expecting was:
Overlay UUID
"TESTOVERLAY" "ea178905-6ab0-4154-ab05-412dc4b39151"
"Auto_API_Overlay" "e5be9dbe-b0fc-4e30-aaf5-ac4bdcd863a7"
"ANOTHEROVERLAYTEST" "850ebf6b-3651-4cf1-aae1-5a6c03fad61b"
I'm an absolute beginner at this; any insight is very much appreciated.
Thanks!
I would suggest using paste to combine your two variables line by line:
paste <(printf 'Overlay\n%s\n' "$overlay_name") <(printf 'UUID\n%s\n' "$overlay_uuid") | column -t
Two process substitutions are used to pass the contents of each variable along with their titles.
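Alternatively, jq can emit the name and uuid of each record together, which avoids keeping two parallel variables in sync. A sketch, assuming the same JSON shape as above:

echo "$get_overlay" |
  jq -r '["Overlay","UUID"], (.[] | [.name, .uuid]) | @tsv' |
  column -t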

Solr Server Posting Error

How do I post 5000 files to a Solr server?
When posting them with the command "java -jar post.jar dir/*.xml", the shell reports that the argument list is too long.
The quickest solution would be a bash script like the following:
for i in *.xml; do
  curl -X POST -H 'Content-Type: text/xml' --data-binary @"$i" http://localhost:8080/solr/update
  echo "item: $i"
done
which adds to Solr, using curl, all the xml files within the current directory.
Otherwise you can write a Java main similar to the one included in post.jar, which adds all the xml files within a directory instead of having to pass all of them as arguments.
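If the files live in subdirectories, or you want to sidestep shell globbing entirely, a find-based variant works too; this is a sketch, with the host, port, and update URL assumed from the question:

find dir -name '*.xml' -print0 |
while IFS= read -r -d '' f; do
  curl -s -X POST -H 'Content-Type: text/xml' --data-binary @"$f" http://localhost:8080/solr/update
done
# commit once after all files are posted
curl -s -X POST -H 'Content-Type: text/xml' --data-binary '<commit/>' http://localhost:8080/solr/update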

How to boost a SOLR document when indexing with /solr/update

To index my website, I have a Ruby script that in turn generates a shell script that uploads every file in my document root to Solr. The shell script has many lines that look like this:
curl -s \
  "http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
  -F "myfile=@/extra/www/docroot/about/core-team/index.html"
...and ends with:
curl -s http://localhost:8983/solr/update --data-binary \
'<commit/>' -H 'Content-type:text/xml; charset=utf-8'
This uploads all documents in my document root to Solr. I use tika and ExtractingRequestHandler to upload documents in various formats (primarily PDF and HTML) to Solr.
In the script that generates this shell script, I would like to boost certain documents based on whether their id field (a/k/a url) matches certain regular expressions.
Let's say that these are the boosting rules (pseudocode):
boost = 2 if url =~ /cool/
boost = 3 if url =~ /verycool/
# otherwise we do not specify a boost
What's the simplest way to add that index-time boost to my http request?
I tried:
curl -s \
  "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
  -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
  -F boost=3
and:
curl -s \
  "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
  -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
  -F boost.id=3
Neither made a difference in the ordering of search results. What I want is for the boosted results to come first in search results, regardless of what the user searched for (provided of course that the document contains their query).
I understand that if I POST in XML format I can specify the boost value for either the entire document or a specific field. But if I do that, it isn't clear how to specify a file as the document contents. Actually, the Tika page provides a partial example:
curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
--data-binary #tutorial.html -H 'Content-type:text/html'
But again it isn't clear where/how to specify my boost. I tried:
curl \
  "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3" \
  --data-binary @mydoc.html -H 'Content-type:text/html'
and
curl \
  "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3" \
  --data-binary @mydoc.html -H 'Content-type:text/html'
Neither of which altered search results.
Is there a way to update just the boost attribute of a document (not a specific field) without altering the document contents? If so, I could accomplish my goal in two steps:
1) Upload/index document as I have been doing
2) Specify boost for certain documents
To index a document in Solr, you have to POST it to the /update handler, with the documents to index in the body of the POST request. In general you use Solr's XML update format, and in that XML you can attach a boost value to a specific field or to the whole document.
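For instance, a document-level boost in the legacy XML update format would look like this (the field values are placeholders, and note that index-time boosts were removed in Solr 7, so this applies only to older versions):

curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary \
'<add>
  <doc boost="3.0">
    <field name="id">/verycool/core-team/</field>
    <field name="text">extracted document body goes here</field>
  </doc>
</add>'

A <commit/> is still required afterwards, as in the question's script.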
