How to index a multidimensional array in a Solr field - solr

I am indexing a MySQL database with Solr. I have a one-to-many relation between the users table and the order table: one user can have many orders.
The order table has several columns (id, orderDate, caseNumber).
My goal is to index these tables in Solr with a USR_ID field that stores the user id and a multidimensional ORDERS field that stores each of that user's orders as an associative array.
The desired result is:
{
"USR_ID":"10",
"ORDERS":[
{"ID":"1" ,"ORDER_DATE":"12-03-2018", "CASE_NUMBER":"554"}, //FIRST FIELD
{"ID":"9","ORDER_DATE":"15-03-2018", "CASE_NUMBER":"569"} //SECOND FIELD
]
}
What I am getting is a one-dimensional array with all the order columns:
{
"USR_ID":"10",
"ORDERS":[
"1", "12-03-2018", "554", //FIRST FIELD
"9", "15-03-2018", "569" //SECOND FIELD
]
}
Here is what I tried.
Entity configuration in data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/mydb1"
user=""
password=""/>
<document>
<entity name="USERS"
pk="USR_ID"
query="SELECT USR_UID FROM USERS"
deltaImportQuery="SELECT USR_UID FROM USERS WHERE USR_UID='${dih.delta.USR_UID}'"
deltaQuery="SELECT USR_UID FROM USERS WHERE USERS.USR_UPDATE_DATE > '${dih.last_index_time}'">
<entity name="ORDER" pk="ID"
query="SELECT ID AS ORDERID, ORDER_DATE, CASE_NUMBER FROM ORDER WHERE USR_ID = '${USERS.USR_UID}'"
deltaQuery="select ID from ORDER where UPDATED_AT > '${dih.last_index_time}'"
parentDeltaQuery="SELECT USR_UID FROM USERS WHERE USR_UID = ${ORDER.USR_UID}">
<field column="ORDERID" name="ORDERS" />
<field column="ORDER_DATE" name="ORDERS" />
<field column="CASE_NUMBER" name="ORDERS" />
</entity>
</entity>
</document>
</dataConfig>
Here are the field definitions in the schema.xml file:
<field name="USR_ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="ORDERS" type="text_general" indexed="true" stored="true" required="false" multiValued="true"/>

You will have to go with sub-documents, or at least have one document per order, since you only have an ID at the root level:
{
"USR_ID":"10",
"ID":"1" ,
"ORDER_DATE":"12-03-2018",
"CASE_NUMBER":"554"
}
See this good explanation of nested documents:
http://yonik.com/solr-nested-objects/
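To make the nested shape concrete, here is a minimal Python sketch that groups flat user/order rows into one parent document per user, with each order as a child. It assumes the `_childDocuments_` key used by Solr's JSON update format for nested documents; the `id` values, the `user_`/`order_` prefixes, and the `nest_orders` helper are hypothetical names chosen for illustration, not something from the question.

```python
def nest_orders(rows):
    """Group flat (user, order) rows into one parent document per user."""
    parents = {}
    for row in rows:
        # Create the parent document the first time a user id is seen.
        parent = parents.setdefault(
            row["USR_ID"],
            {
                "id": "user_" + row["USR_ID"],  # hypothetical unique key
                "USR_ID": row["USR_ID"],
                "_childDocuments_": [],
            },
        )
        # Each order becomes a child document with its own unique key.
        parent["_childDocuments_"].append(
            {
                "id": "order_" + row["ID"],  # hypothetical unique key
                "ORDER_DATE": row["ORDER_DATE"],
                "CASE_NUMBER": row["CASE_NUMBER"],
            }
        )
    return list(parents.values())


rows = [
    {"USR_ID": "10", "ID": "1", "ORDER_DATE": "12-03-2018", "CASE_NUMBER": "554"},
    {"USR_ID": "10", "ID": "9", "ORDER_DATE": "15-03-2018", "CASE_NUMBER": "569"},
]
docs = nest_orders(rows)
```

Parent and child documents share one uniqueKey field, which is why the sketch prefixes the ids so that user 10 and an order with the same number cannot collide.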

The answer was to add the child="true" attribute to the nested entity when you define your data-config.xml file.
In my case:
<entity child="true" name="ORDER" pk="ID"
query="SELECT ID AS ORDERID, ORDER_DATE, CASE_NUMBER FROM ORDER WHERE USR_ID = '${USERS.USR_UID}'"
deltaQuery="select ID from ORDER where UPDATED_AT > '${dih.last_index_time}'"
parentDeltaQuery="SELECT USR_UID FROM USERS WHERE USR_UID = ${ORDER.USR_UID}">
<field column="ORDERID" name="ORDERS" />
<field column="ORDER_DATE" name="ORDERS" />
<field column="CASE_NUMBER" name="ORDERS" />
</entity>

Related

How to separate the values of a multivalued field into dynamic fields

I have one multivalued date field; its definition in schema.xml is shown below:
<field name="fecha_referencia" type="pdates" uninvertible="true" indexed="true" stored="true"/>
It can hold at most three values; here is an example of one that is already indexed:
fecha_referencia:["2015-12-04T00:00:00Z",
"2014-12-15T00:00:00Z",
"2014-02-03T00:00:00Z"]
I want to know whether the values can be split at indexing time (I am indexing via DIH) into other dynamic fields or separate fields.
Example of what I am looking for:
fecha_referencia:["2015-12-04T00:00:00Z",
"2014-12-15T00:00:00Z",
"2014-02-03T00:00:00Z"],
fecha1:2015-12-04T00:00:00Z,
fecha2:2014-12-15T00:00:00Z,
fecha3:2014-02-03T00:00:00Z
Note: I have tried using a regex but have had no luck.
Any contribution would be a great help and much appreciated.
This is my data.config.xml structure:
<dataConfig>
<dataSource type="JdbcDataSource" driver="org.postgresql.Driver" url="jdbc:postgresql://10.152.11.47:5433/meta" user="us" password="ntm" URIEncoding="UTF-8" />
<document >
<entity name="tr_ident" query="SELECT id_ident, titulo,proposito,descripcion,palabra_cve
FROM ntm_p.tr_ident">
<field column="id_ident" name="id_ident" />
<field column="titulo" name="titulo" />
<field column="proposito" name="proposito" />
<field column="descripcion" name="descripcion" />
<field column="palabra_cve" name="palabra_cve" />
<entity name="tr_fecha_insumo" query="select fecha_creacion,fech_ini_verif,
fech_fin_verif from ntm_p.tr_fecha_insumo where id_fecha_insumo='${tr_ident.id_ident}'">
<field name="fecha_creacion" column="fecha_creacion" />
<field name="fech_ini_verif" column="fech_ini_verif" />
<field name="fech_fin_verif" column="fech_fin_verif" />
</entity>
<entity name="ti_fecha_evento"
query="select tipo_fecha,fecha_referencia from ntm_p.ti_fecha_evento where id_fecha_evento='${tr_ident.id_ident}'">
<field column="fecha_referencia" name="fecha_referencia" />
<entity name="tc_tipo_fecha" query="select des_tipo_fecha,id_tipo_fecha from ntm_p.tc_tipo_fecha where id_tipo_fecha='${ti_fecha_evento.tipo_fecha}'">
<field column="des_tipo_fecha" name="des_tipo_fecha" />
<field column="id_tipo_fecha" name="id_tipo_fecha" />
</entity>
</entity>
</entity>
</document>
</dataConfig>

Handling null values in a nested entity of the Solr data importer

I'm using the Solr Data Importer to import some category data. I didn't want to use a left join in the parent query because it's too complicated; I preferred to use nested entity queries in the configuration to keep it simple.
I've got three one-to-one relationships for the feature images of a category. My question is: how can I handle the case where the value in a mediaItemX_id field is null? I've tried the nested configuration below, but when the value is null it reports invalid SQL, because the placeholder in the nested query doesn't print null; it prints blank.
<entity name="category" query="SELECT concat('CATEGORY_', c.id) as docId, c.id, externalIdentifier, name, description, shortDescription, mediaItem1_id, mediaItem2_id, mediaItem3_id, created, lastUpdated, keywords, 'CATEGORY' as docType,
name as autoSuggestField
FROM categories c inner join base_content bc where c.id = bc.id">
<field column="id" name="categoryId" />
<field column="externalIdentifier" name="externalIdentifier" />
<field column="docType" name="docType" />
<field column="name" name="name" />
<field column="description" name="description" />
<field column="shortDescription" name="shortDescription" />
<field column="created" name="created" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
<field column="lastUpdated" name="lastUpdated" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
<field column="publishDate" name="publishDate" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
<field column="archiveDate" name="archiveDate" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
<field column="autoSuggestField" name="suburbSuggest" />
<field column="keywords" name="keywords" />
<entity name="mediaItem1" query="SELECT uri, title, altText from media where ${category.mediaItem1_id} is not null and id = ${category.mediaItem1_id}">
<field column="uri" name="featureImage1Url" />
<field column="title" name="featureImage1Title" />
<field column="altText" name="featureImage1AltText" />
</entity>
<entity name="mediaItem2" query="SELECT uri, title, altText from media where ${category.mediaItem2_id} is not null and id = ${category.mediaItem2_id}">
<field column="uri" name="featureImage2Url" />
<field column="title" name="featureImage2Title" />
<field column="altText" name="featureImage2AltText" />
</entity>
<entity name="mediaItem3" query="SELECT uri, title, altText from media where ${category.mediaItem3_id} is not null and id = ${category.mediaItem3_id}">
<field column="uri" name="featureImage3Url" />
<field column="title" name="featureImage3Title" />
<field column="altText" name="featureImage3AltText" />
</entity>
</entity>
Solr supports the ${value:default} notation for replacements in other places, so I'd at least try it here. The default needs to go on every occurrence of the placeholder, since any occurrence without one would still render as blank:
${category.mediaItem1_id:0} IS NOT NULL AND id = ${category.mediaItem1_id:0}
I was unable to find a decent way to skip the entity entirely when the value is null.
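To see why a blank substitution breaks the nested query and what a default would change, here is a minimal Python sketch. The `render` function is a hypothetical stand-in for placeholder substitution, not DIH's actual resolver, and whether DIH honors the `:default` form in this position is exactly what the answer above suggests trying.

```python
import re

# Matches ${name} and ${name:default}; a simplified, assumed grammar.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def render(template, values):
    """Substitute placeholders; a missing or None value becomes the default, or ''."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        value = values.get(name)
        if value is None:
            return default if default is not None else ""
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)


row = {"category.mediaItem1_id": None}  # parent row with a NULL media id

# Without a default the comparison loses its right-hand side: invalid SQL.
broken = render("SELECT uri FROM media WHERE id = ${category.mediaItem1_id}", row)

# With a default the statement stays valid and simply matches no row
# (assuming no media row has id 0).
fixed = render("SELECT uri FROM media WHERE id = ${category.mediaItem1_id:0}", row)

print(repr(broken))
print(repr(fixed))
```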

Why does the Solr Data Import Handler hash the uniqueKey?

I have a very strange problem with Solr 4.6.0.
The uniqueKey field "id" contains a hash for every document instead of my string value. If I add just one custom document with the update request handler in the Solr admin, I get, for example, the ID value "book_45" that I specified, so that is correct.
But when I do a full import with the DIH (Data Import Handler), the id field contains a hash like "[B@53bd370f" for every document instead of my custom value. So the problem must be in the DIH.
My import script:
<dataConfig>
<dataSource
type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://host/database"
user="user"
password="password" />
<document name="project">
<entity name="document" transformer="RegexTransformer"
query="SELECT CONCAT('book_', b.id) AS book_id, b.slug, b.title, b.isbn,
b.publisher, b.releaseYear AS release_year, b.language, b.pageCount AS page_count, b.description,
b.print, b.addedBy_id AS added_by_id, b.dt AS created,
GROUP_CONCAT(a.name SEPARATOR ';') AS authors
FROM Book b
LEFT JOIN author_book ab ON ab.book_id = b.id
LEFT JOIN Author a ON a.id = ab.author_id
GROUP BY b.id
">
<field column="book_id" name="id" />
<field column="slug" name="book_slug" />
<field column="title" name="book_title" />
<field column="isbn" name="book_isbn" />
<field column="publisher" name="book_publisher" />
<field column="release_year" name="book_release_year" />
<field column="language" name="book_language" />
<field column="page_count" name="book_page_count" />
<field column="description" name="book_description" />
<field column="print" name="book_print" />
<field column="added_by_id" name="book_added_by_id" />
<field column="created" name="book_created" />
<field column="authors" splitBy=";" name="authors" />
</entity>
</document>
</dataConfig>
The id field in my schema.xml (which is the same as in the default shipped core collection1):
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
Does anyone know what I am missing?
The [B@53bd370f value is not a hash but the result of calling toString() on a byte[]. Whatever MySQL returns is being treated as a byte[] instead of a String.
Try casting the id to varchar or char like this:
SELECT cast(CONCAT('book_', b.id) as CHAR) AS book_id...

I want to use multiple data sources in the DataImportHandler in Solr and pass a URL value to a child entity after querying the database in the parent entity

I want to use multiple data sources in the DataImportHandler in Solr and pass a URL value to a child entity after querying the database in the parent entity.
Here is my rss-data-config file:
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-db" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/HCDACoreDB"
user="root" password="CDA#318"/>
<dataSource type="URLDataSource" name="ds-url"/>
<document>
<entity name="feeds" query="select f.feedurl, f.feedsource, c.categoryname from feeds f, category c where f.feedcategory = c.categoryid">
<field column="feedurl" name="url" dataSource="ds-db"/>
<field column="categoryname" name="category" dataSource="ds-db"/>
<field column="feedsource" name="source" dataSource="ds-db"/>
<entity name="rss"
transformer="HTMLStripTransformer"
forEach="/RDF/channel | /RDF/item"
processor="XPathEntityProcessor"
url="${dataimporter.functions.encodeUrl(feeds.feedurl)}" >
<field column="source-link" dataSource="ds-url" xpath="/rss/channel/link" commonField="true" />
<field column="Source-desc" dataSource="ds-url" xpath="/rss/channel/description" commonField="true" />
<field column="title" dataSource="ds-url" xpath="/rss/channel/item/title" />
<field column="link" dataSource="ds-url" xpath="/rss/channel/item/link" />
<field column="description" dataSource="ds-url" xpath="/rss/channel/item/description" stripHTML="true"/>
<field column="pubDate" dataSource="ds-url" xpath="/rss/channel/item/pubDate" />
<field column="guid" dataSource="ds-url" xpath="/rss/channel/item/guid" />
<field column="content" dataSource="ds-url" xpath="/rss/channel/item/content" />
<field column="author" dataSource="ds-url" xpath="/rss/channel/item/creator" />
</entity>
</entity>
</document>
</dataConfig>
What I am doing is querying the database in the first entity, named feeds, and I want to use the feedurl as the URL for the child entity, named rss.
The error I get when I run the dataimport is:
java.net.MalformedURLException: no protocol: nullselect f.feedurl, f.feedsource, c.categoryname from feeds f, category c where f.feedcategory = c.categoryid
The URL is null, meaning the feedurl is not being assigned to the URL.
Any suggestion on what I am doing wrong?
Here's an example:
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource name="db1" ... />
<dataSource name="db2"... />
<document>
<entity name="outer" dataSource="db1" query=" ... ">
<field column="id" />
<entity name="inner" dataSource="db2" query=" select from ... where id = ${outer.id} ">
<field column="innercolumn" splitBy=":::" />
</entity>
</entity>
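</document>
</dataConfig>
The idea is to nest one entity definition inside the other, so that the inner entity runs the extra query against the other data source. Note that dataSource is set on each entity, not on the individual fields.
You can access the parent entity's fields like this: ${outer.id}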

DynamicField names from SQL value

I have this "catch all" field in my schema.xml:
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
In the example below, let's say I have a table with two columns, "custom_key" and "custom_value", holding these values:
custom_key: "mykey"
custom_value: "myvalue"
My goal is to index a document that has a field called "mykey" with the value "myvalue". How can I do that?
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/MY_DB"
user="MYUSER"
password="MYPASS"
batchSize="-1"/>
<document>
<entity name="article" query="SELECT id, custom_key, custom_value FROM mytable">
<field column="id" name="id"/>
<field column="custom_value" name=":::WHAT TO PUT HERE?:::_s"/>
</entity>
</document>
</dataConfig>
I found a (hacky?) solution that works for my purposes. I will not mark this question as answered for a few days, in case someone comes up with a cleaner/better solution.
<dataConfig>
<script><![CDATA[
function insertVariants(row) {
row.put(row.get('custom_key') + '_custom', row.get('custom_value'));
return row;
}
]]></script>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/MY_DB"
user="MYUSER"
password="MYPASS"
batchSize="-1"/>
<document>
<entity name="article" query="SELECT id, custom_key, custom_value FROM mytable" transformer="script:insertVariants">
<field column="id" name="id"/>
</entity>
</document>
</dataConfig>
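As a rough illustration of what the script transformer does to each row, here is a small Python sketch. The `insert_variants` helper mirrors the JavaScript function above and is hypothetical; `fnmatchcase` is only a stand-in for how a dynamicField pattern is matched, under the assumption that the generated field name has to match some dynamicField (or explicit field) in the schema.

```python
from fnmatch import fnmatchcase


def insert_variants(row, suffix="_custom"):
    """Mirror the script transformer: add a field named after custom_key."""
    out = dict(row)  # copy so the input row is left untouched
    out[row["custom_key"] + suffix] = row["custom_value"]
    return out


row = {"id": 1, "custom_key": "mykey", "custom_value": "myvalue"}

# Same suffix as the transformer above.
doc = insert_variants(row)

# Variant whose field name would match the "*_s" dynamicField from the question.
doc_s = insert_variants(row, suffix="_s")

# Rough check of each generated name against the "*_s" pattern.
matches_custom = fnmatchcase("mykey_custom", "*_s")
matches_s = fnmatchcase("mykey_s", "*_s")
```

With the `_custom` suffix the schema would need a matching pattern such as `*_custom`; with `_s` the generated name falls under the `*_s` catch-all shown in the question.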
