How to post 5000 files to Solr server?
While posting by using command "java -jar post.jar dir/*.xml", command tool tells Argument list is too long.
The quickest solution would be using a bash script like the following:
for i in $( ls *.xml); do
cat $i | curl -X POST -H 'Content-Type: text/xml' -d #- http://localhost:8080/solr/update
echo item: $i
done
which adds to Solr, using curl, all the xml files within the current directory.
Otherwise you can write a Java main similar to the one included in post.jar, which adds all the xml files within a directory instead of having to pass all of them as arguments.
Related
How might i take the output from a pipe and use curl to post that as a file?
E.g. the following workds
curl -F 'file=#data/test.csv' -F 'filename=test.csv' https://mydomain#apikey=secret
I'd like to get the file contents from a pipe instead but I can't quite figure out how to specify it as a file input. My first guess is -F 'file=#-' but that's not quite right.
cat data/test.csv | curl -F 'file=#-' -F 'filename=test.csv' https://mydomain#apikey=secret
(Here cat is just a substitute for a more complex sequence of events that would get the data)
Update
The following works:
cat test/data/test.csv | curl -XPOST -H 'Content-Type:multipart/form-data' --form 'file=#-;filename=test.csv' $url
If you add --trace-ascii - to the command line you'll see that curl already uses that Content-Type by default (and -XPOST doesn't help either). It was rather your fixed -F option that did the trick!
I have a very specific problem with bash and curl.
What we do is:
reading a password from jenkins and paste it to a config-file (i don't have access to the password)
read parameters from config-file in bash (host, user, password, etc.) and store it in variables
post something with curl to a database and store the result in a variable
Recently we added shellcheck to our deploy-scripts and therefore we need to put the variables in quotes.
That's the request we want to send (shellcheck-approved):
result=$(curl -s -XPOST "${dbURL}" --header "Authorization: Basic $(echo -n "${dbUser}:${dbPwd}" | base64)" --data-binary "blabla")
And here's the error message we get in return:
{"error":"authorization failed"}
It does work, when I unquote the password-variable ("${dbUser}":${dbPwd}). But then spellcheck complains, that I need to put all variables in quotes. Also it does work on another machine with different password (which I have no access to either).
It is the same, when I use --user username:password. So it seems like the problem lies within the password.
Using google and testing the procedure (without the curl) with different special characters couldn't solve it either.
Has anyone experienced something like this?
Edit1:
This is an extract from jenkins-deploy-file ..
stage('config files') {
withCredentials([string(credentialsId: "${env_params.db_password}", variable: 'db_pw')]) {
sshagent(credentials: ["${env_params.user}"]) {
sh "echo \"dbPwd=${db_pw}\" >> environment_variables/config.properties"
This is how the shell script stores the password ..
dbPwd=$(grep ^"$dbPwd" <PATH>/config.properties | cut -d "=" -f2)
thanks for your support.
It seems like there are trailing whitespaces in the password-storage.
I removed them using sed and now it works.
dbPwd=$(grep ^"$dbPwd" <PATH>/config.properties | cut -d "=" -f2 | sed -e 's/[[:space:]]*$//')
You can just set the password in another file and use the content of the file as your password variable.
How can I convert more than one document using the Document Conversion service.
I have between 50-100 MS Word and PDF documents that I want to convert using the convert_document API method?
For example, can you supply multiple .pdf or *.doc files like this?:
curl -u "username":"password" -X POST
-F "config={\"conversion_target\":\"ANSWER_UNITS\"};type=application/json"
-F "file=#\*.doc;type=application/msword"
"https://gateway.watsonplatform.net/document-conversion-experimental/api/v1/convert_document"
That gives an error unfortunately: curl: (26) couldn't open file "*.doc".
I have also tried "file=#file1.doc,file2.doc,file3.doc" but that gives errors as well.
The service only accept one file at a time, but you can call it multiple time.
#!/bin/bash
USERNAME="<service-username>"
PASSWORD="<service-password>"
URL="https://gateway.watsonplatform.net/document-conversion-experimental/api/v1/convert_document"
DIRECTORY="/path/to/documents"
for doc in *.doc
do
echo "Converting - $doc"
curl -u "$USERNAME:$PASSWORD" \
-F 'config={"conversion_target":"ANSWER_UNITS"};type=application/json' \
-F "file=#$doc;type=application/pdf" "$URL"
done
Document Conversion documentation and API Reference.
I am using SOLR 5 and I want to scan documents that have no extensions. Unfortunately changing the file to have extensions is not an option in my case.
the command I am using is simply:
$bin/post -c mycore ../foldertobescaned -type application/pdf
the command works fine for documents that do have extension but I am getting:
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
If renaming the files is not an option, you can use the following script as a workaround until Solr improves its post method. It is a simple bash for loop that submits each file individually and works regardless of the file extension. Note that this script will be slower than using post on the whole folder, because each individual file transfer needs to be initialized.
Save the script below as postFolderToSolr.sh inside your Solr folder (so that Solrs bin/ folder is a subdirectory), make it executable with chmod +x postFolderToSolr.sh and then use it as follows: ./postFolderToSolr.sh mycore /home/user1/foldertobescaned/ application/pdf
Using no arguments or the wrong number of arguments prints a short usage message as help.
#!/bin/bash
set -o nounset
if [ "$#" -ne 3 ]
then
echo "Post contents of a folder to Solr."
echo
echo "Usage: postFolderToSolr.sh <colletionName> </path/to/folder> <MIME>"
echo
exit 1
fi
collection=$1
inputPath=${2%/} # remove suffix / if it exists
mime=$3
for element in $inputPath"/"*; do
bin/post -c $collection -type $mime $element
done
I am trying to make a web search application with solr but I have problems. The problem is that in the example that I followed, all the files are in the same folder. But I want to index files from different directories (ie give the root folder and index all the xml files from all subdirectories). Is that possible?
Try the SimplePostTool recursive option:
java -Dauto -Drecursive -jar post.jar
Try this in a shell script (untested):
#!/bin/sh
FILES=$(find . -iname "*.xml")
URL=http://localhost:8983/solr/update
for f in $FILES; do
echo "Posting $f"
curl $URL --data-binary #$f -H 'Content-type:application/xml'
echo
done
#send the commit command to make sure all the changes are flushed and visible
curl $URL --data-binary '<commit/>' -H 'Content-type:application/xml'
echo
Place it in the root folder where you have the xml files.
(i assume you have linux and the 'post.sh' script is the example you followed)