SOLR POST files with no extension - solr

I am using SOLR 5 and I want to scan documents that have no extensions. Unfortunately changing the file to have extensions is not an option in my case.
the command I am using is simply:
$bin/post -c mycore ../foldertobescaned -type application/pdf
the command works fine for documents that do have extension but I am getting:
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

If renaming the files is not an option, you can use the following script as a workaround until Solr improves its post method. It is a simple bash for loop that submits each file individually and works regardless of the file extension. Note that this script will be slower than using post on the whole folder, because each individual file transfer needs to be initialized.
Save the script below as postFolderToSolr.sh inside your Solr folder (so that Solrs bin/ folder is a subdirectory), make it executable with chmod +x postFolderToSolr.sh and then use it as follows: ./postFolderToSolr.sh mycore /home/user1/foldertobescaned/ application/pdf
Using no arguments or the wrong number of arguments prints a short usage message as help.
#!/bin/bash
set -o nounset
if [ "$#" -ne 3 ]
then
echo "Post contents of a folder to Solr."
echo
echo "Usage: postFolderToSolr.sh <colletionName> </path/to/folder> <MIME>"
echo
exit 1
fi
collection=$1
inputPath=${2%/} # remove suffix / if it exists
mime=$3
for element in $inputPath"/"*; do
bin/post -c $collection -type $mime $element
done

Related

Strange behaviour of sejda-console in batch for loop

I try to write a wrapper script for songbook generation using lilypond, latex and sejda-console (for the pdf part). Everything works so far, but I have a problem with sejda that is giving me nuts.
Here is the relevant part of my code:
for %%i in (%f%) do (
sejda-console.bat extractbybookmarks -f ".\%%~ni.pdf" -o "export\%%~ni\%title%.pdf" -l 2 -p [BOOKMARK_NAME] -e "%title%" --overwrite
)
where f is a ";"-separated list of files. This command works for the first file, but fails for all others. I can't find any difference between the commands that sejda receives. Here is my console output:
make_sheet.bat -t "Live it up" --supress *.lytex
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_drums.pdf -o export\book_drums\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Validating parameters.
Starting task (org.sejda.impl.sambox.ExtractByOutlineTask#28701274) execution.
Opening C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\.\book_drums.pdf
Retrieving outline information for level 2 and match regex Live it up
Starting extraction by outline, level 2 and match regex Live it up
Found 0 inherited images and 0 inherited fonts potentially unused
Starting extracting Live it up pages 9 9
Created output temporary buffer C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\.sejdaTmp2789047920522272436.tmp
Appended relevant outline items
Filtering annotations
Skipped acroform merge, nothing to merge
Ending extracting Live it up
Task progress: 0% done
Moving C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\.sejdaTmp2789047920522272436.tmp to C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\Live it up.pdf.
Extraction completed and outputs written to org.sejda.model.output.FileOrDirectoryTaskOutput#478190fc[C:\Users\skr1_\Desktop\Tools\Songbook\Sample\out\export\book_drums\Live it up.pdf]
Task (org.sejda.impl.sambox.ExtractByOutlineTask#28701274) executed in 0 seconds
Completed execution
C:\Users\skr1_\Desktop\Tools\Songbook\Sample>(sejda-console.bat extractbybookmarks -f ".\book_general.pdf" -o "export\book_general\Live it up.pdf" -l 2 -p [BOOKMARK_NAME] -e "Live it up" --overwrite )
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_general.pdf -o export\book_general\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Invalid value (File '.\book_general.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Invalid value (File '.\book_general.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
C:\Users\skr1_\Desktop\Tools\Songbook\Sample>(sejda-console.bat extractbybookmarks -f ".\book_guitar.pdf" -o "export\book_guitar\Live it up.pdf" -l 2 -p [BOOKMARK_NAME] -e "Live it up" --overwrite )
Configuring Sejda 3.2.30
Starting execution with arguments: 'extractbybookmarks -f .\book_guitar.pdf -o export\book_guitar\Live it up.pdf -l 2 -p [BOOKMARK_NAME] -e Live it up --overwrite'
Java version: '1.8.0_151'
Invalid value (File '.\book_guitar.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Invalid value (File '.\book_guitar.pdf' does not exist): --files -f value... : pdf files to operate on. A list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123) (required)
Even worse, if I copy the commands that sejda receives and paste them as arguments for a new command, everything works fine.
I suspect that something is happening with the working directory in between, but I don't get it.
Also, note that the output includes the command for subsequent passes of the for-loop (starting with "(sejda-console.bat ...") though echo is off. It is not included for the first run, however.
I'm not an expert with programming, especially not with batch, and any help would be very appreciated.
I'd suggest that sejda.bat is changing the current directory.
Try
pushd
call sejda.bat ...
popd

How to Run a For Loop excluding certain filenames in PuTTY Terminal (SSH connection)?

In a SSH PuTTY connection, I have a directory with a bunch of files - I want to exclude some of these files in a FOR loop operation - Specifically, I want to exclude any files that have the words "Parrot" or "Tiger" in their filename - The below gives me the list of files I'm looking for.
ls Zoo_Animals*sas |grep -vi Parrot |grep -vi Tiger
For context, normally filenames would appear like Zoo_Animals_Monkey_07.sas or Zoo_Animals_Lion_12.sas, etc.
So, If I'd like to run a command for ALL animals, I'd normally Unixuse:
for file in Zoo_Animals*.sas; do <command here> "$file"; done
BUT, I can't seem to figure out how to run this FOR loop while excluding the files for Tigers and Parrots.
Any help would be greatly appreciated!!
Embed your first command inside the second using $()
for f in $(ls Zoo_Animals*.sas |grep -vi Parrot |grep -vi Tiger); do echo $f; done

finding a file in unix using wildcards in file name

I have few files in a folder with name pattern in which one of the section is variable.
file1.abc.12.xyz
file2.abc.14.xyz
file3.abc.98.xyz
So the third section (numeric) in above three file names changes everyday.
Now, I have a script which does some tasks on the file data. However, before doing the work, I want to check whether the file exists or not and then do the task:
if(file exist) then
//do this
fi
I wrote the below code using wildcard '*' in numeric section:
export mydir=/myprog/mydata
if[find $mydir/file1.abc.*.xyz]; then
# my tasks here
fi
However, it is not working and giving below error:
[find: not found [No such file or directory]
Using -f instead of find does not work as well:
if[-f $mydir/file1.abc.*.xyz]; then
# my tasks here
fi
What am I doing wrong here ? I am using korn shell.
Thanks for reading!
for i in file1.abc.*.xyz ; do
# use $i here ...
done
I was not using spaces before the unix keywords...
For e.g. "if[-f" should actually be " if [ -f" with spaces before and after the bracket.

Append some text to the end of multiple files in Linux

How can I append the following code to the end of numerous php files in a directory and its sub directory:
</div>
<div id="preloader" style="display:none;position: absolute;top: 90px;margin-left: 265px;">
<img src="ajax-loader.gif"/>
</div>
I have tried with:
echo "my text" >> *.php
But the terminal displays the error:
bash : *.php: ambiguous redirect
I usually use tee because I think it looks a little cleaner and it generally fits on one line.
echo "my text" | tee -a *.php
You don't specify the shell, you could try the foreach command. Under tcsh (and I'm sure a very similar version is available for bash) you can say something like interactively:
foreach i (*.php)
foreach> echo "my text" >> $i
foreach> end
$i will take on the name of each file each time through the loop.
As always, when doing operations on a large number of files, it's probably a good idea to test them in a small directory with sample files to make sure it works as expected.
Oops .. bash in error message (I'll tag your question with it). The equivalent loop would be
for i in *.php
do
echo "my text" >> $i
done
If you want to cover multiple directories below the one where you are you can specify
*/*.php
rather than *.php
BashFAQ/056 does a decent job of explaining why what you tried doesn't work. Have a look.
Since you're using bash (according to your error), the for command is your friend.
for filename in *.php; do
echo "text" >> "$filename"
done
If you'd like to pull "text" from a file, you could instead do this:
for filename in *.php; do
cat /path/to/sourcefile >> "$filename"
done
Now ... you might have files in subdirectories. If so, you could use the find command to find and process them:
find . -name "*.php" -type f -exec sh -c "cat /path/to/sourcefile >> {}" \;
The find command identifies what files using conditions like -name and -type, then the -exec command runs basically the same thing I showed you in the previous "for" loop. The final \; indicates to find that this is the end of arguments to the -exec option.
You can man find for lots more details about this.
The find command is portable and is generally recommended for this kind of activity especially if you want your solution to be portable to other systems. But since you're currently using bash, you may also be able to handle subdirectories using bash's globstar option:
shopt -s globstar
for filename in **/*.php; do
cat /path/to/sourcefile >> "$filename"
done
You can man bash and search for "globstar" for more details about this. This option requires bash version 4 or higher.
NOTE: You may have other problems with what you're doing. PHP scripts don't need to end with a ?>, so you might be adding HTML that the script will try to interpret as PHP code.
You can use sed combined with find. Assume your project tree is
/MyProject/
/MyProject/Page1/file.php
/MyProject/Page2/file.php
etc.
Save the code you want to append on /MyProject/. Call it append.txt
From /MyProject/ run:
find . -name "*.php" -print | xargs sed -i '$r append.txt'
Explain:
find does as it is, it looks for all .php, including subdirectories
xargs will pass (i.e. run) sed for all .php that have just been found
sed will do the appending. '$r append.txt' means go to the end of the file ($) and write (paste) whatever is in append.txt there. Don't forget -i otherwise it will just print out the appended file and not save it.
Source: http://www.grymoire.com/unix/Sed.html#uh-37
You can do (Work even if there's space in your file path) :
#!/bin/bash
# Create a tempory file named /tmp/end_of_my_php.txt
cat << EOF > /tmp/end_of_my_php.txt
</div>
<div id="preloader" style="display:none;position: absolute;top: 90px;margin-left: 265px;">
<img src="ajax-loader.gif"/>
</div>
EOF
find . -type f -name "*.php" | while read the_file
do
echo "Processing $the_file"
#cp "$the_file" "${the_file}.bak" # Uncomment if you want to save a backup of your file
cat /tmp/end_of_my_php.txt >> "$the_file"
done
echo
echo done
PS: You must run the script from the directory you want to browse
Inspired from #Dantastic answer :
echo "my text" | tee -a file1.txt | tee -a file2.txt

Solr Server Posting Error

How to post 5000 files to Solr server?
While posting by using command "java -jar post.jar dir/*.xml", command tool tells Argument list is too long.
The quickest solution would be using a bash script like the following:
for i in $( ls *.xml); do
cat $i | curl -X POST -H 'Content-Type: text/xml' -d #- http://localhost:8080/solr/update
echo item: $i
done
which adds to Solr, using curl, all the xml files within the current directory.
Otherwise you can write a Java main similar to the one included in post.jar, which adds all the xml files within a directory instead of having to pass all of them as arguments.

Resources