Is it possible to put myBatis (iBatis) xml mappers outside the project? - ibatis

According to the user guide, I can use a file URL instead of a classpath
resource:
// Using classpath relative resources
<mappers>
<mapper resource="org/mybatis/builder/AuthorMapper.xml"/>
</mappers>
// Using url fully qualified paths
<mappers>
<mapper url="file:///var/sqlmaps/AuthorMapper.xml"/>
</mappers>
In my project I'm trying to put my mapper XML "outside" the project,
and I'm doing this:
<mapper url="file://D:/Mappers/ComponentMapper1.xml" />
The output of my log4j console:
Error building SqlSession.
The error may exist in file://D:/Mappers/ComponentMapper1.xml
Cause: org.apache.ibatis.builder.BuilderException: Error parsing
SQL Mapper Configuration. Cause: java.net.UnknownHostException: D
Is this a bug, or am I doing something wrong?

You just need an additional forward slash before the drive letter, i.e. file:///D:/Mappers/ComponentMapper1.xml.
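To see why the missing slash produces UnknownHostException: D, it can help to look at how a URL parser splits the two forms. The following is a minimal sketch using java.net.URI from the standard library; it only illustrates the parsing and is not how MyBatis itself loads the mapper. The class and method names are made up for this example.

```java
import java.net.URI;

public class FileUrlHostSketch {
    // Returns the host component of the URL, or null when the authority is empty.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns the path component of the URL.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        // Two slashes: "D:" lands in the authority part, so "D" is treated as a host name.
        String broken = "file://D:/Mappers/ComponentMapper1.xml";
        // Three slashes: the authority is empty and the drive letter is part of the path.
        String fixed = "file:///D:/Mappers/ComponentMapper1.xml";

        System.out.println("host=" + hostOf(broken) + " path=" + pathOf(broken));
        System.out.println("host=" + hostOf(fixed) + " path=" + pathOf(fixed));
    }
}
```

With two slashes the drive letter is read as a host, which matches the error in the log; with three slashes the host is empty and the whole D:/Mappers/... part ends up in the path.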

The resource attribute in the Sql Map Config is resolved relative to the classpath, so try adding the directory that contains ComponentMapper1.xml to the classpath.
set CLASSPATH=%CLASSPATH%;D:/Mappers/
...
<mapper resource="ComponentMapper1.xml" />

You must use
<mapper url="file:///usr/local/ComponentMapper1.xml" />
Here file:///usr/local/ComponentMapper1.xml is the path to your XML file. Use url instead of resource if you want to keep mappers outside the resource directory.

Related

WiX installer can't open config file

I have a WPF application and I've created an MSI to install it with the WixToolset 3.11 and Visual Studio Extension 2019. I'm trying to add either XmlFile or XmlConfig item to change values in the config file. I'm getting the following error:
Failed to open XML file C:\Program Files(x86)\CO Apps\Main App\OurApp.exe.config. system error: -2147024786
The file path is the full file path because I gave it the full path while trying to resolve the issue. Here are the important parts of the wxs file:
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi"
xmlns:wixutil="http://schemas.microsoft.com/wix/UtilExtension">
<Product Id="9E76F000-5525-4BDF-8262-AE46B035D9CE"
Name="Our App"
Language="1033"
Version="2.0.0.0"
Manufacturer="CO Apps"
UpgradeCode="7CFB1B51-F5D5-4AD4-A509-F5C9BC05F875">
<Package Id="*" InstallerVersion="200" Compressed="yes" InstallScope="perMachine" Description="Our production application." />
<Directory Id="ProgramFilesFolder">
<Directory Id="VTAPPSDIR" Name="CO Apps">
<Directory Id="INSTALLFOLDER" Name="Our App">
<Component Id="MainExecutable" Guid="748368D7-7581-4809-A8FE-DFB1093D6A02">
<File Id="MainFile" Name="$(var.OurApp.TargetFileName)" DiskId="1" Source="$(var.OurApp.TargetDir)OurApp.exe" KeyPath="yes"></File>
<File Id="OurApp.exe.config" ReadOnly="no" Source="$(var.OurApp.TargetDir)OurApp.exe.config"></File>
... More File items for DLLs
<wixutil:XmlFile Id="SetAppMode" Action="setValue" File="C:\Program Files (x86)\CO Apps\Our App\OurApp.exe.confg" ElementPath="configuration/userSettings/OurApp.Properties.Settings/setting/AppMode/value" Value="Main" />
</Directory>
</Directory>
</Directory>
So I'm trying to set the "AppMode" value to "Main" when this installs. What I'm trying to set isn't the point it's that it can't seem to find or open the file. I've tried putting the XmlFile in its own component. I've tried several variations of File paths including [INSTALLDIR] and [INSTALLLOCATION] and the filename by itself. Without that line everything works great. With that line in, I get the error and it rolls back the install. I also tried XmlConfig instead of XmlFile:
<wixutil:XmlConfig Id="ClearConfigAppMode" Action="delete" File="[INSTALLLOCATION]OurApp.exe.config" ElementPath="userSettings/OurApp.Properties.Settings" Name="AppMode" />
<wixutil:XmlConfig Id="SetAppMode" Action="create" File="[INSTALLLOCATION]OurApp.exe.config" ElementPath="userSettings/OurApp.Properties.Settings" On="install" Node="element">
<wixutil:XmlConfig Id="SetConfigAppModeName" ElementId="SetAppMode" File="[INSTALLLOCATION]OurApp.exe.config" Name="name" Value="AppMode" />
<wixutil:XmlConfig Id="SetConfigAppModeSerializeAs" ElementId="SetAppMode" File="[INSTALLLOCATION]OurApp.exe.config" Name="serializeAs" Value="String" />
</wixutil:XmlConfig>
<wixutil:XmlConfig Id="SetAppModeValue" Action="create" File="[INSTALLLOCATION]OurApp.exe.config" ElementPath="userSettings/OurApp.Properties.Settings" On="install" Node="element" Sequence="2">
<wixutil:XmlConfig Id="SetAppModeVAlueMain" ElementId="SetAppModeValue" File="[INSTALLLOCATION]OurApp.exe.config" Name="Value" Value="Main" />
</wixutil:XmlConfig>
Since XmlConfig doesn't have setValue on an existing element I used the delete action to remove the item for use in development and insert a new one. Same error. It happens logged on as myself or as Administrator. Does anyone have a working example of WiX with WPF creating a MSI? I'm not looking for something as complex as WixBA. I just need to modify the app.exe.config file on install.
Thanks,
Mike
Example: Though I rarely use this feature, I have this working example here (my test project for XML): https://github.com/glytzhkof/WiXUpdateXmlFile. Snippets of the sample here and here.
Disclaimer: I am not sure if this follows best practice for XML updates, since I prefer to do XML updates from application launch code instead, if possible (single source, easier debuggability, and generally more familiar territory for most developers).
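As a rough illustration of updating the setting from application code rather than from the installer, here is a minimal sketch. It is written in Java only to keep the example self-contained; a WPF application would do the equivalent in C#. The sample XML is an assumption: it follows the usual .NET user settings layout, where each setting is a setting element with name and serializeAs attributes and a value child, which is also what the XmlConfig attempt in the question creates. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SettingUpdateSketch {
    // Assumed config shape; a real OurApp.exe.config will contain more sections.
    static final String SAMPLE =
        "<configuration><userSettings><OurApp.Properties.Settings>"
        + "<setting name=\"AppMode\" serializeAs=\"String\"><value>Dev</value></setting>"
        + "</OurApp.Properties.Settings></userSettings></configuration>";

    // The setting is selected by its name attribute, not by an element called AppMode.
    private static final String APP_MODE_VALUE =
        "/configuration/userSettings/OurApp.Properties.Settings/setting[@name='AppMode']/value";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse config", e);
        }
    }

    private static Node appModeValue(Document doc) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(APP_MODE_VALUE, doc, XPathConstants.NODE);
            if (node == null) {
                throw new IllegalStateException("AppMode setting not found");
            }
            return node;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    static void setAppMode(Document doc, String mode) {
        appModeValue(doc).setTextContent(mode);
    }

    static String getAppMode(Document doc) {
        return appModeValue(doc).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        setAppMode(doc, "Main");
        System.out.println("AppMode is now " + getAppMode(doc));
    }
}
```

The sketch only changes the in-memory document; writing it back to disk is left out.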
app.config/web.config appsettings: Maybe check out this answer regarding appsettings or this answer (looks better) - just for your review, not necessarily a suggestion. Keeping deployed files read-only helps a lot to overwrite them reliably during updates and the file you generate can be kept untouched by the installer (the file is de-coupled from installer - it never touches them). Or as I wrote: HKCU can also be used to write "the few settings you actually have to change". Not so nice conceptually?
Clouded Settings: Personally I think settings should never be file-based but clouded in our day and age (kept in a remote database). See section 6 and 7 here. How realistic this is for your application I don't know. New challenges and problems - no doubt (network issues, firewalls, launch problems, etc...), but benefits: versioned settings, recovery and management (enforce new settings). Not sure about all the practicalities - never been involved that much, but would love to get rid of settings files - especially for corporate apps. However, sometimes nice concepts don't meet reality well - maybe it is too involved?

Setting timeout for pollenrich using configuration property

I am using pollenrich in my code to get the message from the queue:
<pollEnrich uri="activemq:queueName" timeout="5000"/>
Now, I want to read the timeout value from config file declared in etc folder.
Something like this:
<pollEnrich uri="file:inbox?fileName=data.txt" timeout="{{readTimeout}}"/>
While doing so, I am getting the following error:
org.xml.sax.SAXParseException : cvc-datatype-valid.1.2.1: '{{readTimeout}}' is not a valid value for 'integer'
This error occurs only for pollEnrich and nowhere else in my code. I am able to use other properties from the config file in the same Camel context,
e.g.,
<from uri="timer://TestTimer?period={{timer.interval}}&amp;delay={{startupDelay}}"/>
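See the documentation at http://camel.apache.org/using-propertyplaceholder.html, in the section titled "Using property placeholders for any kind of attribute in the XML DSL". The timeout attribute is typed as an integer in the XML schema, so the raw placeholder fails validation. That section describes the placeholder namespace (xmlns:prop="http://camel.apache.org/schema/placeholder"), which lets you write prop:timeout="{{readTimeout}}" instead.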

Configuration file

I need to read some properties from a configuration file, but I don't want to store the properties file inside the application. What is the best practice?
For example, if execute as follows
java -jar payara-micro.jar --deploy demo.jar
I want to keep the parameter file where payara-micro.jar is located, and I need to read it from code inside the war file. How can I achieve this?
Thank you.
You may first start payara-micro with the --rootDir path option. Payara treats this directory as its working directory, so it creates a config directory there. Then edit the domain.xml file as needed and start payara-micro again. All resources you create will be available to your beans as usual. For example, you may add some properties like this:
...
<resources>
<jdbc-resource pool-name="DerbyPool" jndi-name="jdbc/__default" object-type="system-all" />
<jdbc-connection-pool is-isolation-level-guaranteed="false" name="DerbyPool" datasource-classname="org.apache.derby.jdbc.EmbeddedDataSource" res-type="javax.sql.DataSource">
<property name="databaseName" value="${com.sun.aas.instanceRoot}/lib/databases/embedded_default" />
<property name="connectionAttributes" value=";create=true" />
</jdbc-connection-pool>
<connector-connection-pool max-pool-size="250" steady-pool-size="1" name="jms/__defaultConnectionFactory-Connection-Pool" resource-adapter-name="jmsra" connection-definition-name="javax.jms.ConnectionFactory"></connector-connection-pool>
<connector-resource pool-name="jms/__defaultConnectionFactory-Connection-Pool" jndi-name="jms/__defaultConnectionFactory" object-type="system-all-req"></connector-resource>
<context-service description="context service" jndi-name="concurrent/__defaultContextService" object-type="system-all"></context-service>
<managed-executor-service maximum-pool-size="200" core-pool-size="1" long-running-tasks="true" keep-alive-seconds="300" hung-after-seconds="300" task-queue-capacity="20000" jndi-name="concurrent/__defaultManagedExecutorService" object-type="system-all"></managed-executor-service>
<managed-scheduled-executor-service core-pool-size="1" long-running-tasks="true" keep-alive-seconds="300" hung-after-seconds="300" jndi-name="concurrent/__defaultManagedScheduledExecutorService" object-type="system-all"></managed-scheduled-executor-service>
<managed-thread-factory description="thread factory" jndi-name="concurrent/__defaultManagedThreadFactory" object-type="system-all"></managed-thread-factory>
<custom-resource factory-class="org.glassfish.resources.custom.factory.PropertiesFactory" res-type="java.util.Properties" jndi-name="myconf">
<property name="some.my.property" value="some.value"></property>
</custom-resource>
</resources>
(see custom-resource tag)
Then just inject it into your bean:
@Resource(type=java.util.Properties.class, name="myconf")
private Properties parameters;
Also you may specify --domainConfig file to keep configuration anywhere you want.
Use --help to see full options list.
You can pass system properties to the payara micro using a command line argument, like this:
java -jar payara-micro.jar --deploy app.war --systemProperties=sys.properties
Also check out the Payara micro documentation about this option.
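On the application side, values passed this way can then be read with System.getProperty. The sketch below is self-contained: it simulates the runtime by loading a properties-format string and copying it into the system properties, which is an assumption about what the option does, and the key names app.mode and app.missing are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SystemPropertySketch {
    // Simulates the runtime: parse properties-format text and publish it as system properties.
    static void applyAsSystemProperties(String propertiesText) {
        Properties parsed = new Properties();
        try {
            parsed.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : parsed.stringPropertyNames()) {
            System.setProperty(key, parsed.getProperty(key));
        }
    }

    // What application code would do: read a setting, falling back to a default.
    static String setting(String key, String fallback) {
        return System.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        applyAsSystemProperties("app.mode=Main\n");
        System.out.println(setting("app.mode", "Dev"));
        System.out.println(setting("app.missing", "fallback"));
    }
}
```

Supplying a fallback keeps the application usable when the properties file is absent or a key is missing.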
You can pass system properties configured in domain.xml file. This overrides the default domain.xml.
java -jar payara-micro.jar --domainConfig domain.xml --deploy app.war
You can get the default domain.xml from the payara-micro.jar

Can I force the installer project to use the .config file from the built solution instead of the original one?

I am using the solution to this question in order to apply configuration changes to App.config in a Winforms project. I also have an installer project for the project that creates an installable *.msi file. The problem is, the config file bundled in the installers is the original, un-transformed config file. So we're not getting the production connection strings in the production installer even though the config file for the built winforms project has all the correct transformations applied.
Is there any way to force the installer project to use the output of project build?
First of all: it is impossible to make the Setup Project point to another app.config file by using the Primary output option. So my solution is going to be a workaround. I hope you find it useful in your situation.
Overview:
The basic idea is:
Remove the forced app.config from the Setup Project;
Add a file pointing to the app.config, manually;
Use MSBuild to get into the vdproj file, and change it to match the real output of the transformed app.config.
Some drawbacks are:
The setup project only gets updated if the project it deploys builds. Not a real drawback!
You need MSBuild 4.0. This can also be worked around!
You need a custom task called FileUpdate. It is open source and has an installer.
Let's work:
1) Go to your Setup Project, and select the Primary Output object, right click and go to Properties. There you will find the Exclude Filter... add a filter for *.config, so it will remove the hard-coded app.config.
2) Right click your Setup Project in the Solution Explorer -> Add -> File... select any file that ends with .config.
3) Download the MSBuild Community Tasks Project; I recommend the msi installer.
4) Unload your project (the csproj) and replace the code from the other question with this one:
Code:
<UsingTask TaskName="TransformXml" AssemblyFile="$(MSBuildExtensionsPath)\Microsoft\VisualStudio\v10.0\Web\Microsoft.Web.Publishing.Tasks.dll" />
<Import Project="$(MSBuildExtensionsPath)\MSBuildCommunityTasks\MSBuild.Community.Tasks.Targets" />
<Target Name="AfterCompile" Condition="exists('app.$(Configuration).config')">
<!-- Generate transformed app config in the intermediate directory -->
<TransformXml Source="app.config" Destination="$(IntermediateOutputPath)$(TargetFileName).config" Transform="app.$(Configuration).config" />
<!-- Force build process to use the transformed configuration file from now on. -->
<ItemGroup>
<AppConfigWithTargetPath Remove="app.config" />
<AppConfigWithTargetPath Include="$(IntermediateOutputPath)$(TargetFileName).config">
<TargetPath>$(TargetFileName).config</TargetPath>
</AppConfigWithTargetPath>
</ItemGroup>
<PropertyGroup>
<SetupProjectPath>$(MSBuildProjectDirectory)\$(IntermediateOutputPath)$(TargetFileName).config</SetupProjectPath>
</PropertyGroup>
<!-- Change the following so that this Task can find your vdproj file -->
<FileUpdate Files="$(MSBuildProjectDirectory)\..\Setup1\Setup1.vdproj"
Regex="(.SourcePath. = .8:).*\.config(.)"
ReplacementText="$1$(SetupProjectPath.Replace(`\`,`\\`))$2" />
<FileUpdate Files="$(MSBuildProjectDirectory)\..\Setup1\Setup1.vdproj"
Regex="(.TargetName. = .8:).*\.config(.)"
ReplacementText="$1$(TargetFileName).config$2" />
</Target>
5) The previous code must be changed, so that it can find your vdproj file. I have placed a comment in the code, indicating where you need to make the change.
Now, every time you build your main project, MSBuild will change the Setup project so that it uses the correct app.config file. It may have drawbacks, but this solution can be polished and improved. If you need help, leave a comment and I'll try to respond ASAP.
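To make the first FileUpdate step less opaque, here is a small stand-alone Java sketch that applies the same regular expression and replacement to a single line. The sample vdproj line and the paths are made up for illustration, and the class and method names are hypothetical; this only mimics what the task does and is not part of the build.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VdprojRewriteSketch {
    // Same pattern as the FileUpdate Regex attribute: group 1 is the key and the "8:" prefix,
    // group 2 is the closing quote after the old .config path.
    private static final Pattern SOURCE_PATH =
        Pattern.compile("(.SourcePath. = .8:).*\\.config(.)");

    static String rewriteSourcePath(String line, String newPath) {
        // vdproj files store paths with doubled backslashes, hence the Replace call in the target.
        String escaped = newPath.replace("\\", "\\\\");
        // quoteReplacement keeps backslashes in the path from being read as regex escapes.
        String replacement = "$1" + Matcher.quoteReplacement(escaped) + "$2";
        return SOURCE_PATH.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Illustrative input line, not copied from a real project.
        String line = "\"SourcePath\" = \"8:..\\\\App\\\\app.config\"";
        System.out.println(rewriteSourcePath(line, "C:\\proj\\obj\\Release\\App.exe.config"));
    }
}
```

Because the pattern matches any SourcePath entry that ends in .config, it would rewrite every such entry in the file, which is why step 2 adds only one .config file to the setup project.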
Resources I Used
MSBuild 4.0 is needed because I use String's Replace function to replace each single "\" with a double "\\" in the path. See
MSBuild Property Functions for details about using functions in MSBuild.
I learned about the FileUpdate Task in this other question. The official project is MSBuild Community Tasks Project.
These two topics were important to my findings:
Trying to include configuration specific app.config files in a setup project
Problems with setup project - am I thick?
Another solution I've found is not to use the transformations but just have a separate config file, e.g. app.Release.config. Then add this line to your csproj file.
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|x86' ">
<AppConfig>App.Release.config</AppConfig>
</PropertyGroup>
This will force the deployment project to use the correct config file when packaging.
I combined the best of the following answers to get a fully working solution without using any external tools at all:
1. Setup App.Config transformations
Source: https://stackoverflow.com/a/5109530
In short:
Manually add additional .config files for each build configuration and edit the raw project file to include them similar to this:
<Content Include="App.config" />
<Content Include="App.Debug.config" >
<DependentUpon>App.config</DependentUpon>
</Content>
<Content Include="App.Release.config" >
<DependentUpon>App.config</DependentUpon>
</Content>
Then include the following XML at the end of the project file, just before the closing </Project> tag:
<UsingTask TaskName="TransformXml" AssemblyFile="$(MSBuildExtensionsPath)\Microsoft\VisualStudio\v$(VisualStudioVersion)\Web\Microsoft.Web.Publishing.Tasks.dll" />
<Target Name="AfterCompile" Condition="exists('app.$(Configuration).config')">
<TransformXml Source="app.config" Destination="$(IntermediateOutputPath)$(TargetFileName).config" Transform="app.$(Configuration).config" />
<ItemGroup>
<AppConfigWithTargetPath Remove="app.config" />
<AppConfigWithTargetPath Include="$(IntermediateOutputPath)$(TargetFileName).config">
<TargetPath>$(TargetFileName).config</TargetPath>
</AppConfigWithTargetPath>
</ItemGroup>
</Target>
Finally edit the additional .config files to include the respective transformations for each build configuration:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
<!-- transformations here-->
</configuration>
2. Include the appropriate .config in the setup project
First, add a command in the postbuild event of your main project to move the appropriate transformed .config file to a neutral location (e.g. the main bin\ directory):
copy /y "$(TargetDir)$(TargetFileName).config" "$(ProjectDir)bin\$(TargetFileName).config"
(Source: https://stackoverflow.com/a/26521986)
Open the setup project and click the "Primary output..." node to display the properties window. There, add an ExcludeFilter "*.config" to exclude the default (untransformed) .config file.
(Source: https://stackoverflow.com/a/6908477)
Finally add the transformed .config file (from the postbuild event) to the setup project (Add > File).
Done.
You can now freely add build configurations and corresponding config transforms and your setup project will always include the appropriate .config for the active configuration.
I accomplished this in a different manner with no external tools:
I added a post-build event that copied the target files to a 'neutral' directory (the root of the /bin folder in the project) and then added this file to the .vdproj. The deployment project now picks up whatever the latest built version is:
Post Build Command:
copy /y "$(TargetDir)$(TargetFileName).config" "$(ProjectDir)bin\$(TargetFileName).config"
This worked for what I needed without any external tools, and works nicely with SlowCheetah transformations.
Based off Alec's answer, here is a similar element that you can use along with the transformations and still get their full benefit:
<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<Content Include="$(OutputPath)$(AssemblyName).dll.config">
<InProject>false</InProject>
<Link>$(AssemblyName).dll.config</Link>
</Content>
</ItemGroup>
This way, you can use the SlowCheetah transforms or the built-in ones to transform your .config file, and then go into your Visual Studio Deployment Project (or other) and include the Content from the affected project in your Add -> Project Output... page easily, with minimal changes.
None of the above solutions or any articles worked for me in a deployment/setup project. I spent many days figuring out the right solution. Finally this approach worked for me.
Prerequisites
I've used a utility called ctt.exe to transform the file explicitly. You can download it from here:
http://ctt.codeplex.com/
I've used a custom installer in the setup project to capture installation events.
Follow these steps to achieve app config transformation
1) Add your desired config files to your project and modify your .csproj file like this:
<Content Include="app.uat.config">
<DependentUpon>app.config</DependentUpon>
</Content>
<Content Include="app.training.config">
<DependentUpon>app.config</DependentUpon>
</Content>
<Content Include="app.live.config">
<DependentUpon>app.config</DependentUpon>
</Content>
I've added them as content so that they can be copied to output directory.
2) Add the ctt.exe you downloaded to your project.
3) Add custom installer to your project which should look like this
[RunInstaller(true)]
public partial class CustomInstaller : System.Configuration.Install.Installer
{
string currentLocation = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
string[] transformationfiles = Directory.GetFiles(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "app.*.config");
public CustomInstaller()
{
InitializeComponent();
// Attach the 'Committed' event.
this.Committed += new InstallEventHandler(MyInstaller_Committed);
this.AfterInstall += new InstallEventHandler(CustomInstaller_AfterInstall);
}
void CustomInstaller_AfterInstall(object sender, InstallEventArgs e)
{
try
{
Directory.SetCurrentDirectory(currentLocation);
var environment = Context.Parameters["Environment"];
var currentconfig = transformationfiles.Where(x => x.Contains(environment)).FirstOrDefault();
if (currentconfig != null)
{
FileInfo finfo = new FileInfo(currentconfig);
if (finfo != null)
{
var commands = string.Format(@"/C ctt.exe s:yourexename.exe.config t:{0} d:yourexename.exe.config ", finfo.Name);
using (System.Diagnostics.Process execute = new System.Diagnostics.Process())
{
execute.StartInfo.FileName = "cmd.exe";
execute.StartInfo.RedirectStandardError = true;
execute.StartInfo.RedirectStandardInput = true;
execute.StartInfo.RedirectStandardOutput = true;
execute.StartInfo.UseShellExecute = false;
execute.StartInfo.CreateNoWindow = true;
execute.StartInfo.Arguments = commands;
execute.Start();
}
}
}
}
catch
{
// Do nothing...
}
}
// Event handler for 'Committed' event.
private void MyInstaller_Committed(object sender, InstallEventArgs e)
{
XmlDocument doc = new XmlDocument();
var execonfigPath = currentLocation + @"\yourexe.exe.config";
var file = File.OpenText(execonfigPath);
var xml = file.ReadToEnd();
file.Close();
doc.LoadXml(FormatXmlString(xml));
doc.Save(execonfigPath);
foreach (var filename in transformationfiles)
File.Delete(filename);
}
private static string FormatXmlString(string xmlString)
{
System.Xml.Linq.XElement element = System.Xml.Linq.XElement.Parse(xmlString);
return element.ToString();
}
}
Here I am using two event handlers. In CustomInstaller_AfterInstall I load the correct config file and apply the transformation.
In MyInstaller_Committed I delete the transformation files, which are not needed on the client machine once the transformation has been applied. I also re-indent the transformed file, because ctt leaves the transformed elements badly aligned.
4) Open your setup project and add the project output content files so that setup can copy config files like app.uat.config, app.live.config etc. onto the client machine.
The snippet from the previous step loads all available config files, but we need to supply the right transform file:
string[] transformationfiles = Directory.GetFiles(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "app.*.config");
For that I've added a UI dialog to the setup project to get the current config. The dialog lets the user select an environment such as "Live", "UAT", "Test" etc.
Now pass the selected environment to your custom installer and filter the files by it.
Explaining how to add the dialog, how to set up the parameters etc. would make this a lengthy article, so please search for those. The idea is to apply the transformation for the user-selected environment.
The advantage of this approach is that you can use the same setup file for any environment.
Here is the summary:
Add config files
Add the ctt.exe file
Add custom installer
Apply transformation on exe.config under after install event
Delete transformation files from client's machine
Modify the setup project so that
setup copies all config files (project output content) and ctt.exe into the output directory
configure UI dialog with radio buttons (Test,Live,UAT..)
pass the selected value to custom installer
The solution might look lengthy, but there is no choice because the MSI always copies app.config and doesn't care about project build events and transformations. SlowCheetah works only with ClickOnce, not with setup projects.
The question is old, but the following could still help many folks out there.
I would simply use Wix WiFile.exe to replace the concerned file in the msi this way (for the sake of this example, we call your msi yourPackage.msi):
Step 1. From command prompt run: WiFile.exe "yourPackage.msi" /x "app.exe.config."
The above will extract the "wrong" app.exe.config file from the msi and place it in the same directory as your msi;
Step 2. Place the new (prod) config file (must have the same name as the extracted file: app.exe.config) in same location as your msi;
This means that you are overwriting the app.exe.config that was just extracted in step 1 above with your new (production) config file;
Step 3. From command prompt run: WiFile.exe "yourPackage.msi" /u "app.exe.config."
THAT'S ALL!
The above can be done in a few seconds. You could automate the task if you wanted, for instance, by running it as batch or else.
After running step 3 above, your msi will contain the new config file, which will now be installed at your clients' when they run the setup.

Indexing PDF with Solr

Can anyone point me to a tutorial.
My main experience with Solr is indexing CSV files. But I cannot find any simple instructions/tutorial to tell me what I need to do to index pdfs.
I have seen this: http://wiki.apache.org/solr/ExtractingRequestHandler
But it makes very little sense to me. Do I need to install Tika?
I'm lost, please help.
With solr-4.9 (the latest version as of now), extracting data from rich documents like PDFs, spreadsheets (xls, xlsx family), presentations (ppt, pptx), and documents (doc, txt etc.) has become fairly simple.
The sample code examples provided in the downloaded archive from
here contains a basic solr template project to get you started quickly.
The necessary configuration changes are as follows:
1. Change solrconfig.xml to include the following lines:
<lib dir="<path_to_extraction_libs>" regex=".*\.jar" />
<lib dir="<path_to_solr_cell_jar>" regex="solr-cell-\d.*\.jar" />
Create a request handler as follows:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults" />
</requestHandler>
2. Add the necessary jars from the Solr example to your project.
3. Define the schema as per your needs and fire a query like:
curl "http://localhost:8983/solr/collection1/update/extract?literal.id=1&literal.filename=testDocToExtractFrom.txt&literal.created_at=2014-07-22+09:50:12.234&commit=true" -F "myfile=@testDocToExtractFrom.txt"
go to the GUI portal and query to see the indexed contents.
Let me know if you face any problems.
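If you build the request URL in code instead of typing it by hand, the literal.* parameters need to be URL-encoded (the timestamp above contains a space and colons). The following is a minimal sketch using only the Java standard library (URLEncoder with a Charset argument needs Java 10 or later). The class and method names are hypothetical, and it only assembles the URL; it does not send the file.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ExtractUrlSketch {
    // Joins the handler URL and the query parameters, encoding each key and value.
    static String buildUrl(String handlerUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return handlerUrl + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "1");
        params.put("literal.filename", "testDocToExtractFrom.txt");
        params.put("literal.created_at", "2014-07-22 09:50:12.234");
        params.put("commit", "true");
        System.out.println(
            buildUrl("http://localhost:8983/solr/collection1/update/extract", params));
    }
}
```

The resulting string can be passed to curl in place of the hand-written URL, with the -F file upload part unchanged.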
You could use the DataImportHandler. The DataImportHandler is defined in solrconfig.xml, and its configuration goes in a separate XML config file (data-config.xml).
For indexing PDFs you could
1.) crawl the directory to find all the PDFs using the FileListEntityProcessor
2.) read the PDFs from a "content/index" XML file, using the XPathEntityProcessor
Once you have the list of related PDFs, use the TikaEntityProcessor.
Look at this http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/ (example with ppt) and this: Solr : data import handler and solr cell
The hardest part of this is getting the metadata from the PDFs, using a tool like Aperture simplifies this. There must be tonnes of these tools
Aperture is a Java framework for extracting and querying full-text content and metadata from PDF files
Aperture grabbed the metadata from the PDFs and stored it in XML files.
I parsed the xml files using lxml and posted them to solr
Use Solr's ExtractingRequestHandler. It uses Apache Tika to parse the PDF file. I believe that it can pull out the metadata etc. You can also pass through your own metadata.
Extracting Request Handler
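import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.extraction.ExtractingParams;
public class SolrCellRequestDemo {
public static void main(String[] args) throws IOException, SolrServerException {
SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/my_collection").build();
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("my-file.pdf"));
// extractOnly=true returns the extracted content without indexing the document
req.setParam(ExtractingParams.EXTRACT_ONLY, "true");
NamedList<Object> result = client.request(req);
System.out.println("Result: " + result);
}
}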
This may help.
Apache Solr can now index all sorts of binary files like PDF, Word, etc. Check out this doc:
https://lucene.apache.org/solr/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html

Resources