I have a Solr project. I want to load my CSV file data into Solr using the DataImportHandler. I wrote this db-data-config.xml:
<dataConfig>
<dataSource type="FileDataSource"/>
<document>
<entity name="item" processor="FileListEntityProcessor" fileName="TableArchive.csv" baseDir="${solr.install.dir}/server/solr/archiveCore" dataSource="null" recursive="true" rootEntity="false">
<field column="NameAdded" name="NameAdded" />
<field column="DateAdded" name="DateAdded" />
<field column="NameModified" name="NameModified" />
<field column="DateModified" name="DateModified" />
<field column="strSO" name="strSO" />
<field column="strCust" name="strCust" />
<field column="strOperator" name="strOperator" />
<field column="PackName" name="PackName" />
<field column="DocName" name="DocName" />
</entity>
</document>
</dataConfig>
When I run the DataImportHandler from the Solr admin panel, it does not index the file, and I don't know how to solve it.
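FileListEntityProcessor only lists the matching files; it does not read their contents, so the fields declared above never receive values. The usual pattern is to nest a LineEntityProcessor entity inside it and split each line with a RegexTransformer. A minimal sketch, assuming the CSV is comma-separated with the columns in the order listed above (the entity names, the data source name fds and the regex are illustrative, not part of the original config):
<dataConfig>
  <dataSource type="FileDataSource" name="fds"/>
  <document>
    <!-- outer entity lists the matching files and emits fileAbsolutePath for each one -->
    <entity name="files" processor="FileListEntityProcessor"
            fileName="TableArchive.csv"
            baseDir="${solr.install.dir}/server/solr/archiveCore"
            rootEntity="false" dataSource="null">
      <!-- inner entity reads each file line by line; every line arrives in the rawLine column -->
      <entity name="line" processor="LineEntityProcessor"
              url="${files.fileAbsolutePath}" dataSource="fds"
              transformer="RegexTransformer">
        <!-- the regex assumes nine comma-separated columns with no embedded commas -->
        <field column="rawLine"
               regex="^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),(.*)$"
               groupNames="NameAdded,DateAdded,NameModified,DateModified,strSO,strCust,strOperator,PackName,DocName"/>
      </entity>
    </entity>
  </document>
</dataConfig>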
I have one multivalued date field; its definition in schema.xml is shown below:
<field name="fecha_referencia" type="pdates" uninvertible="true" indexed="true" stored="true"/>
It can take at most three values; here is an example where it is already indexed:
fecha_referencia:["2015-12-04T00:00:00Z",
"2014-12-15T00:00:00Z",
"2014-02-03T00:00:00Z"]
What I want to know is whether the values can be split at indexing time (I am indexing via DIH) into other dynamic fields or separate fields.
Example of what I am looking for:
fecha_referencia:["2015-12-04T00:00:00Z",
"2014-12-15T00:00:00Z",
"2014-02-03T00:00:00Z"],
fecha1:2015-12-04T00:00:00Z,
fecha2:2014-12-15T00:00:00Z,
fecha3:2014-02-03T00:00:00Z
Note: I have tried a regex but have had no luck.
Any contribution would be of great help and gratefully received...
This is my data.config.xml structure:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://10.152.11.47:5433/meta" user="us" password="ntm" URIEncoding="UTF-8" />
<document >
<entity name="tr_ident" query="SELECT id_ident, titulo,proposito,descripcion,palabra_cve
FROM ntm_p.tr_ident">
<field column="id_ident" name="id_ident" />
<field column="titulo" name="titulo" />
<field column="proposito" name="proposito" />
<field column="descripcion" name="descripcion" />
<field column="palabra_cve" name="palabra_cve" />
<entity name="tr_fecha_insumo" query="select fecha_creacion,fech_ini_verif,
fech_fin_verif from ntm_p.tr_fecha_insumo where id_fecha_insumo='${tr_ident.id_ident}'">
<field name="fecha_creacion" column="fecha_creacion" />
<field name="fech_ini_verif" column="fech_ini_verif" />
<field name="fech_fin_verif" column="fech_fin_verif" />
</entity>
<entity name="ti_fecha_evento"
query="select tipo_fecha,fecha_referencia from ntm_p.ti_fecha_evento where id_fecha_evento='${tr_ident.id_ident}'">
<field column="fecha_referencia" name="fecha_referencia" />
<entity name="tc_tipo_fecha" query="select des_tipo_fecha,id_tipo_fecha from ntm_p.tc_tipo_fecha where id_tipo_fecha='${ti_fecha_evento.tipo_fecha}'">
<field column="des_tipo_fecha" name="des_tipo_fecha" />
<field column="id_tipo_fecha" name="id_tipo_fecha" />
</entity>
</entity>
</entity>
</document>
</dataConfig>
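Since the source is PostgreSQL, one option (not in the config above) is to pivot the dates in SQL so each one lands in its own column, rather than relying on a transformer. A sketch of an extra nested entity inside tr_ident, assuming fecha1, fecha2 and fecha3 are defined in the schema (or match a dynamic field) and that at most three dates exist per document; the entity name fechas_separadas is illustrative:
<!-- pivots up to three fecha_referencia values into separate columns -->
<entity name="fechas_separadas"
        query="select (array_agg(fecha_referencia order by fecha_referencia))[1] as fecha1,
                      (array_agg(fecha_referencia order by fecha_referencia))[2] as fecha2,
                      (array_agg(fecha_referencia order by fecha_referencia))[3] as fecha3
               from ntm_p.ti_fecha_evento
               where id_fecha_evento='${tr_ident.id_ident}'">
  <field column="fecha1" name="fecha1" />
  <field column="fecha2" name="fecha2" />
  <field column="fecha3" name="fecha3" />
</entity>
The existing ti_fecha_evento entity can stay as it is, so fecha_referencia keeps receiving all the values.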
When I try to do an import of the schooLocationDetails Solr core, I get the error below. I am using Solr 5.3.1.
Exception while processing: opportunityDetails document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://:<solr_pwd>#<solr_server>:<solr_port>/solr/locationCore: org.apache.solr.search.SyntaxError: Cannot parse 'locationId:': Encountered "" at line 1, column 22.
Below is my data-config.xml for the Solr core schooLocationDetails.
<dataConfig>
<document>
<entity name="school" dataSource="datasource" query="select * from school_table" transformer="RegexTransformer">
<field column="recordKey" name="recordKey" />
<field column="name" name="name" />
<field column="location" name="location" />
<field column="title" name="title" />
</entity>
<entity name="locationDetail" processor="SolrEntityProcessor" url="http://<solr-user>:<solr_pwd>#<solr_server>:<solr_port>/solr/locationCore" query="locationId:${school.location}"
fl="*,old_version:_version_">
<field column="locationId" name="locationId" />
<field column="city" name="city" />
<field column="state" name="state" />
<field column="old_version" name="old_version" />
</entity>
</document>
</dataConfig>
You have to add the entity that references the value inside the other entity. When they're two separate entities, they can't reference each other's values (and they'll simply be imported one after the other instead).
<entity name="school" dataSource="datasource" query="select * from school_table" transformer="RegexTransformer">
<field column="recordKey" name="recordKey" />
<field column="name" name="name" />
<field column="location" name="location" />
<field column="title" name="title" />
<entity name="locationDetail" processor="SolrEntityProcessor" url="" query="locationId:${school.location}"
fl="*,old_version:_version_">
<field column="locationId" name="locationId" />
<field column="city" name="city" />
<field column="state" name="state" />
<field column="old_version" name="old_version" />
</entity>
</entity>
The data is imported with the data import handler:
<dataConfig>
<dataSource
...
/>
<!-- product import -->
<document>
<!-- entity = table -->
<entity name="skn" pk="SKN" rootEntity="true" query="select * from skn">
<field column="SKN" name="id" />
<field column="root" name="root" />
<field column="SEARCHDESCRIPTION" name="SEARCHDESCRIPTION" />
<entity name="sku" child="true" query="select * from sku where SKN = '${skn.SKN}'">
<field column="SKU" name="id" />
<field column="variant1" name="variant1" />
<field column="variant2" name="variant2" />
<field column="v1_long" name="v1_long" />
<field column="v2_long" name="v2_long" />
<field column="v1_type" name="v1_type" />
<field column="v2_type" name="v2_type" />
</entity>
</entity>
<propertyWriter
dateFormat="yyyy-MM-dd HH:mm:ss"
type="SimplePropertiesWriter"
directory="conf"
filename="dataimport.properties"
locale="de-DE"
/>
</document>
</dataConfig>
I can get all children for a certain parent, or all parents for a certain child (so the nested structure is working). But I cannot retrieve parents together with their corresponding children.
I tried the following query:
q={!parent which="id:1"}&fl=*,[child]&rows=200
It returns the parent document but not the corresponding children. I don't get any error message. I also checked the log file.
Can anybody help?
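The [child] transformer normally needs a parentFilter that matches every parent document, and the which clause of {!parent} is also meant to select all parents, not a single one. A sketch, assuming a marker field such as content_type with the value parent is added to every skn document (that field is not in the config above, and variant1:red is only a placeholder child condition):
q=id:1&fl=*,[child parentFilter=content_type:parent]&rows=200
or, to select parents by a child condition and still return their children:
q={!parent which="content_type:parent"}variant1:red&fl=*,[child parentFilter=content_type:parent]&rows=200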
I'm indexing data from a database. I'm using delta import to fetch the recently updated data. However, I find that it fetches the whole data set twice and processes it once, even though the changes apply to only one row.
My config.xml where the deltaQuery is given:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.github.cassandra.jdbc.CassandraDriver" url="jdbc:c*://127.0.0.1:9042/test" autoCommit="true" rowLimit = '-1' batchSize="-1"/>
<document name="content">
<entity name="test" query="SELECT * from person" deltaImportQuery="select * from person where seq=${dataimporter.delta.seq}" deltaQuery="select seq from person where last_modified > '${dataimporter.last_index_time}' ALLOW FILTERING" autoCommit="true">
<field column="seq" name="id" />
<field column="last" name="last_s" />
<field column="first" name="first_s" />
<field column="city" name="city_s" />
<field column="zip" name="zip_s" />
<field column="street" name="street_s" />
<field column="age" name="age_s" />
<field column="state" name="state_s" />
<field column="dollar" name="dollar_s" />
<field column="pick" name="pick_s" />
</entity>
</document>
</dataConfig>
There are about 2,100,000 rows, so it always causes high memory consumption and ends up running out of memory. What could be the problem? Or does it simply work this way?
If Solr is running out of memory, then it is time to add more memory to the Solr box. Adding more RAM will help alleviate the issue.
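For example, the heap can be raised when starting Solr with bin/solr start -m 4g, or more permanently by setting SOLR_HEAP in solr.in.sh (solr.in.cmd on Windows); the 4g value is only an illustration and should be sized to the machine.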
I want to use multiple data sources in the DataImportHandler in Solr and pass a URL value to the child entity after querying the database in the parent entity.
Here is my rss-data-config file:
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/HCDACoreDB"
user="root" password="CDA#318"/>
<dataSource type="URLDataSource" name="ds-url"/>
<document>
<entity name="feeds" query="select f.feedurl, f.feedsource, c.categoryname from feeds f, category c where f.feedcategory = c.categoryid">
<field column="feedurl" name="url" dataSource="ds-db"/>
<field column="categoryname" name="category" dataSource="ds-db"/>
<field column="feedsource" name="source" dataSource="ds-db"/>
<entity name="rss"
transformer="HTMLStripTransformer"
forEach="/RDF/channel | /RDF/item"
processor="XPathEntityProcessor"
url="${dataimporter.functions.encodeUrl(feeds.feedurl)}" >
<field column="source-link" dataSource="ds-url" xpath="/rss/channel/link" commonField="true" />
<field column="Source-desc" dataSource="ds-url" xpath="/rss/channel/description" commonField="true" />
<field column="title" dataSource="ds-url" xpath="/rss/channel/item/title" />
<field column="link" dataSource="ds-url" xpath="/rss/channel/item/link" />
<field column="description" dataSource="ds-url" xpath="/rss/channel/item/description" stripHTML="true"/>
<field column="pubDate" dataSource="ds-url" xpath="/rss/channel/item/pubDate" />
<field column="guid" dataSource="ds-url" xpath="/rss/channel/item/guid" />
<field column="content" dataSource="ds-url" xpath="/rss/channel/item/content" />
<field column="author" dataSource="ds-url" xpath="/rss/channel/item/creator" />
</entity>
</entity>
</document>
What I am doing is: in the first entity, named feeds, I query the database and want to use feedurl as the URL for the child entity named rss.
The error I get when I run the dataimport is:
java.net.MalformedURLException: no protocol: nullselect f.feedurl, f.feedsource, c.categoryname from feeds f, category c where f
.feedcategory = c.categoryid
The URL is null, meaning the feedurl is not being assigned to the URL.
Any suggestion on what I am doing wrong?
Here's an example:
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource name="db1" ... />
<dataSource name="db2"... />
<document>
<entity name="outer" dataSource="db1" query=" ... ">
<field column="id" />
<entity name="inner" dataSource="db2" query=" select from ... where id = ${outer.id} ">
<field column="innercolumn" splitBy=":::" />
</entity>
</entity>
</document>
The idea is to have a single definition of the nested entity that performs the extra query against the other database. You can access the parent entity's fields like this: ${outer.id}
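Applied to the rss-data-config above, that would mean naming the data source on the child entity itself (dataSource is normally an entity-level attribute) and referencing the parent column directly. A sketch, assuming feedurl already holds a complete URL; the forEach shown here follows the /rss/channel paths used in the field xpaths rather than the /RDF paths of the original, and whether this alone resolves the MalformedURLException is not certain:
<entity name="rss"
        dataSource="ds-url"
        processor="XPathEntityProcessor"
        transformer="HTMLStripTransformer"
        forEach="/rss/channel | /rss/channel/item"
        url="${feeds.feedurl}">
  <!-- the xpath field mappings stay as in the original config; the per-field
       dataSource attributes are no longer needed once it is set on the entity -->
</entity>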