I want to retrieve the user's name, but the response contains only the id.
I am using Solr 5.5.0.
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://server:3306/dbname" user="user" password="pwd"/>
<document name="user">
<entity name="user" query="select id,name from user">
<field column="id" name="id"/>
<field column="name" name="name"/>
</entity>
</document>
</dataConfig>
<field type="int" indexed="true" stored="true" name="id" />
<field multiValued="true" name="name" type="text" indexed="true" stored="true" />
output
response:
{
"numFound": 38,
"start": 0,
"docs": [
{
"id": "1",
"_version_": 1527443171669180400
},
{
"id": "3",
"_version_": 1527443171672326100
},
Related
I am working on a project where the specification requires a parent-child relationship within the Solr data collection, i.e. a user and the collection of languages they speak (each of which is made up of multiple data fields). My production system is a Solr 4.10 implementation, but I have a 5.5 implementation at my disposal as well. Thus far, I have not gotten this to work on either one, and I have yet to find a complete documentation source on how to implement it.
The goal is to get a resulting document from Solr that looks like this:
{
"id": 123,
"firstName": "John",
"lastName": "Doe",
"languagesSpoken": [
{
"id": 243,
"abbreviation": "en",
"name": "English"
},
{
"id": 442,
"abbreviation": "fr",
"name": "French"
}
]
}
In my schema.xml, I have flattened out all of the fields as follows:
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="firstName" type="text_general" indexed="true" stored="true" />
<field name="lastName" type="text_general" indexed="true" stored="true" />
<field name="languagesSpoken" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="languagesSpoken_id" type="int" indexed="true" stored="true" />
<field name="languagesSpoken_abbreviation " type="text_general" indexed="true" stored="true" />
<field name="languagesSpoken_name" type="text_general" indexed="true" stored="true" />
The latest rendition of my db-data-config.xml looks like this:
<dataConfig>
<dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:...." />
<document name="clients">
<entity name="client" query="SELECT * FROM clients" deltaImportQuery="SELECT * FROM clients WHERE id = ${dih.delta.id}" deltaQuery="SELECT id FROM clients WHERE updateDate > '${dih.last_index_time}'">
<field column="id" name="id" />
<field column="firstName" name="firstName" />
<field column="lastName" name="lastName" />
<entity name="languagesSpoken" child="true" query="SELECT id, abbreviation, name FROM languages WHERE clientId = ${client.id}">
<field name="languagesSpoken_id" column="id" />
<field name="languagesSpoken_abbreviation" column="abbreviation" />
<field name="languagesSpoken_name" column="name" />
</entity>
</entity>
</document>
...
On the 4.10 server, when the data comes out of Solr, I get one flat document with the fields for a single language inline with firstName and lastName, like this:
{
"id": 123,
"firstName": "John",
"lastName": "Doe",
"languagesSpoken_id": 243,
"languagesSpoken_abbreviation ": "en",
"languagesSpoken_name": "English"
}
On the 5.5 server, when the data comes out, I get separate documents for the root client document and the child language documents with no relationship between them like this:
{
"id": 123,
"firstName": "John",
"lastName": "Doe"
},
{
"languagesSpoken_id": 243,
"languagesSpoken_abbreviation": "en",
"languagesSpoken_name": "English"
},
{
"languagesSpoken_id": 442,
"languagesSpoken_abbreviation": "fr",
"languagesSpoken_name": "French"
}
I have spent several days now trying to figure out what is going on here to no avail. Can anybody provide me with a pointer as to what I am missing here?
Thanks,
-- Jeff
You may want to flatten your JSON objects, as described in the answer linked below, before you import them into Solr:
https://stackoverflow.com/a/19101235/929902
POST http://localhost:8983/solr/ggg_core/update?boost=1.0&commitWithin=1000&overwrite=true&wt=json HTTP/1.1
Then, once you read the documents back from Solr, you can unflatten them in a similar way.
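As a rough illustration of the flattening step (this is not the code from the linked answer), the sketch below assumes the document has already been parsed into nested Map and List objects. The class name JsonFlattener and the "_" separator are made up for this example; pick whatever naming matches your schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns nested maps/lists into a single-level map.
class JsonFlattener {

    // Nested keys are joined with "_"; list items get their index,
    // e.g. languagesSpoken_0_name.
    static Map<String, Object> flatten(Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                flattenInto(join(prefix, String.valueOf(i)), items.get(i), out);
            }
        } else {
            // Scalar leaf: store it under the accumulated key.
            out.put(prefix, value);
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "_" + key;
    }
}
```

Unflattening is the reverse walk: split each key on the separator and rebuild the nested maps and lists. Note that a separator that can also occur inside a field name makes that step ambiguous.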
I am trying to index a nested structure like the one below and am having difficulty doing so with both SolrJ and the DIH. I have battled with this for a while and would really appreciate some help.
How do I fix this with either SolrJ or DIH?
Thanks
What I want my data to look like in my index:
"docs": [
{
"name": "MR INCREDIBLE ",
"id": 101,
"job": "super hero",
"_version_": "1483934897344086016",
"children": [
{
"c_name":"Violet",
"c_age":10,
"c_gender":"female"
},
{
"c_name":"Dash",
"c_age":8,
"c_gender":"male"
}
]
}
]
My schema.xml
<schema name="datasearch" version="1.5">
<uniqueKey>id</uniqueKey>
<fields>
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="job" type="string" indexed="true" stored="true"/>
<!-- I want to add children here -->
<!-- <field name="children" indexed="true" stored="true"/> -->
<field name="c_name" type="string" indexed="true" stored="true"/>
<field name="c_age" type="int" indexed="true" stored="true"/>
<field name="c_sex" type="string" indexed="true" stored="true"/>
</fields>
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" />
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" />
<fieldType name="long" class="solr.TrieLongField" />
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
</types>
<defaultSearchField>name</defaultSearchField>
</schema>
SolrJ Attempt
val serverUrl = current.configuration.getString("solr.server.url").get
val solr = new HttpSolrServer(serverUrl)
def testAddChildDoc={
val doc = {
new SolrInputDocument(){
addField("id", "101")
addField("name", "Mr Incredible")
}
}
val c1 = new SolrInputDocument(){
addField("c_name", "violet")
addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
addField("c_name", "dash")
addField("c_age", 8)
}
doc.addChildDocument(c1)
doc.addChildDocument(c2)
solr.deleteByQuery("*:*")
solr.add(doc)
solr.commit(true, true)
}
Response
=>ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: [doc=null] missing required field: id
[RemoteSolrException: [doc=null] missing required field: id]
So I went ahead and added an id to the child docs, changing the above to:
...
val c1 = new SolrInputDocument(){
addField("id", "101")
addField("c_name", "violet")
addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
addField("id", "101")
addField("c_name", "dash")
addField("c_age", 8)
}
.....
Then I reran the get-all query, and now I get the results below:
SolrJ Attempt 2 plus get-all query
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1415194092582",
"wt": "json"
}
},
"response": {
"numFound": 3,
"start": 0,
"docs": [
{
"id": 101,
"c_name": "violet",
"c_age": "10"
},
{
"id": 101,
"c_name": "dash",
"c_age": "8"
},
{
"id": 101,
"name": "Mr Incredible",
"_version_": "1483938552238571520"
}
]
}
}
So I gave up here and tried the DIH as below:
db-dataconfig.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="org.postgresql.Driver"
url="jdbc:postgresql://xxx:5432/xxxx"
user="xx" password="xx"
readOnly="true" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" />
<document>
<entity name="parent" query="select id,name, job from PARENTS LIMIT 1" >
<field column="name"/>
<field column="id"/>
<field column="job"/>
<entity child="true" name="children" query="select c_name, c_gender, c_age from CHILDREN" where="pid = ${parent.id}" processor="CachedSqlEntityProcessor">
<field column="c_age" />
<field column="c_gender" />
<field column="c_name"/>
</entity>
</entity>
</document>
</dataConfig>
Get-all query after a full import with the DIH config above; no children were indexed:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1415195060664",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"name": "Mr Incredible",
"id": 101,
"_version_": "1483939357483073536"
}
]
}
}
To be able to use child="true" in DIH, apply the patch from https://issues.apache.org/jira/browse/SOLR-5147 (I think it's the same DIH patch as in SOLR-3076).
The patch itself seems to be incompatible with the current trunk only in negligible details.
To get the following response from Solr 4.10.1:
{
"name": "MR INCREDIBLE ",
"id": 101,
"job": "super hero",
"type": "parent",
"_root_":"101",
"_version_": "1483934897344086016",
"childDocuments": [
{
"c_name":"Violet",
"c_age":10,
"c_gender":"female",
"id":"101_Violet",
"_root_":"101"
},
{
"c_name":"Dash",
"c_age":8,
"c_gender":"male",
"id":"101_Dash",
"_root_":"101"
}
]
}
a "type" field needs to be defined in the schema to differentiate between parent and child documents:
<fields>
<field name="_version_" type="long" indexed="true" stored="true" />
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="job" type="string" indexed="true" stored="true"/>
<field name="c_name" type="string" indexed="true" stored="true"/>
<field name="c_age" type="int" indexed="true" stored="true"/>
<field name="c_gender" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true" />
</fields>
Child documents also need a unique "id", just like any other document.
All the documents in the index should be in a parent/child relation; otherwise, the queries may return unexpected results. If you need documents that are neither parents nor children, assign them a fake parent.
SolrJ
To work with child/parent docs, solrj.jar version 4.5 or higher is required.
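import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
// serverUrl is assumed to hold the base URL of your Solr core
SolrServer solr = new HttpSolrServer(serverUrl);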
SolrInputDocument doc = new SolrInputDocument();
String id = "101";
doc.addField("id", id);
doc.addField("name", "Mr Incredible");
doc.addField("job", "super hero");
doc.addField("type", "parent");
SolrInputDocument childDoc1 = new SolrInputDocument();
String name1 = "Violet";
childDoc1.addField("id", id + "_" + name1);
childDoc1.addField("c_name", name1);
childDoc1.addField("c_age", 10);
childDoc1.addField("c_gender", "female");
doc.addChildDocument(childDoc1);
SolrInputDocument childDoc2 = new SolrInputDocument();
String name2 = "Dash";
childDoc2.addField("id", id + "_" + name2);
childDoc2.addField("c_name", name2);
childDoc2.addField("c_age", 8);
childDoc2.addField("c_gender", "male");
doc.addChildDocument(childDoc2);
solr.add(doc);
solr.commit();
Finally, the query looks like this:
http://localhost/solr/core/select?q={!parent which='type:parent'}&fl=*,[child parentFilter=type:parent]&wt=json&indent=true
To get only results of female gender:
http://localhost/solr/core/select?q={!parent which='type:parent'}c_gender:female&fl=*,[child parentFilter=type:parent childFilter=c_gender:female]&wt=json&indent=true
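When these queries are sent from code rather than typed into a browser, the local-params syntax ({!parent ...}) and the [child ...] transformer contain characters that have to be URL-encoded. Here is a minimal sketch using only the JDK; the class and method names are hypothetical, and it leaves out the optional childFilter for brevity:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles an encoded block-join select URL.
class BlockJoinQueryUrl {

    // parentFilter identifies parent docs (e.g. "type:parent");
    // childQuery restricts the children (may be empty to match all parents).
    static String build(String baseUrl, String parentFilter, String childQuery) {
        String q = "{!parent which='" + parentFilter + "'}" + childQuery;
        String fl = "*,[child parentFilter=" + parentFilter + "]";
        return baseUrl + "/select?q=" + encode(q) + "&fl=" + encode(fl) + "&wt=json";
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, BlockJoinQueryUrl.build("http://localhost/solr/core", "type:parent", "c_gender:female") yields an encoded form of the second query above, minus the childFilter.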
Hi, can anybody point me in the right direction for using Solr's Data Import Handler (DIH) to create an array of strings from the results of a SQL query?
My Solr DIH config looks like this:
<dataConfig>
<dataSource driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/data"
user="xxxxx"
password="xxxxxx" />
<document>
<entity name="item" query="select id, subject from table1">
<field column="id" name="id" />
<field column="subject" name="subject" />
<entity name="ip_address" query="select ip_address from table2 where id='${item.id}'">
<field column="ip_address" name="ip_address" />
</entity>
</entity>
</document>
</dataConfig>
The query on table2 actually returns multiple rows, so I need this to be reflected in my documents,
e.g.:
{
"numFound": 1,
"start": 0,
"docs": [
{
"id": "29331109",
"subject": "Test document",
"ip_address": [
"88.103.210.139",
"88.103.210.144",
"88.103.210.133"
],
"_version_": 1468439879154139100
}
]
}
This is almost working for me except that Solr is only populating the first ip_address in my documents.
Here's the relevant part of my Schema:
<!-- Custom Field names -->
<field name="serial_number" type="string" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="ip_address" type="string" indexed="true" stored="true" multiValued="true"/>
How is the "ip_address" field defined in schema.xml? It should be a multiValued field.
We need full-text search over a database with millions of records (music metadata). I have only been working with Solr for roughly two weeks and need some help with indexing. I am using DataImportHandler with a SQL query that generates a result like this:
As you can see in the attached image above, the id (Integer data type) is repeated in the SQL result that DIH consumes. When I set the uniqueKey to <uniqueKey>id</uniqueKey>, Solr overwrites the values, leaving only one record/row; I think it is the last one processed, which is the one with countryCode 'TL'.
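The overwrite behaviour described here follows from how a uniqueKey works: adding a document whose key already exists replaces the earlier one. The toy model below is plain Java, not Solr code; UniqueKeyIndex is a made-up class that only mimics that last-write-wins rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for an index keyed by a uniqueKey field (an illustration only).
class UniqueKeyIndex {
    private final String uniqueKeyField;
    private final Map<String, Map<String, Object>> docsByKey = new LinkedHashMap<>();

    UniqueKeyIndex(String uniqueKeyField) {
        this.uniqueKeyField = uniqueKeyField;
    }

    // A document with an already-known key replaces the stored one.
    void add(Map<String, Object> doc) {
        String key = String.valueOf(doc.get(uniqueKeyField));
        docsByKey.put(key, doc);
    }

    int size() {
        return docsByKey.size();
    }

    Map<String, Object> get(Object key) {
        return docsByKey.get(String.valueOf(key));
    }
}
```

Feeding it several joined rows that share the same id leaves a single entry holding the values of the last row, which matches the observation that only the 'TL' row survives.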
When I first had this issue, I knew why Solr was overwriting the value; that's normal. So I thought of adding a global identifier to each record in the db, a GUID. Without thinking things through properly, I ended up with the same duplicates: as you can see, charGuid, which is a uuid() from MySQL, is duplicated.
But when I use charGuid (String data type) as the uniqueKey, <uniqueKey>charGuid</uniqueKey>, I get all records indexed and nothing is overwritten, but of course duplicates are inevitable. The problem I foresee is that when I have to do an incremental update, Solr will not know exactly which document to update. In fact, a quick test from the admin console revealed that the last or first record it finds with that unique key is updated. This is not acceptable.
I stumbled upon an article referencing multiValued="true" and thought that making the fields that represent a JOIN column in my SQL multivalued would do the trick, but it doesn't. I was hoping a record with id:10 would be returned with a list of countryCode values, but no.
I am just puzzled as to how to circumvent this issue, and why I could not find a similar problem posted by someone else.
If I don't get a meaningful answer, I guess I will have to use charGuid as the <uniqueKey>, which allows duplicates, and then use Solr's document deduplication detection to handle updates of my index, but I want to believe there is a better way.
Update
Here are my data-config.xml and schema.xml definitions:
<entity name="albums" query="select * from Album">
<entity name="track" query="select t.id as id, t.title as trackTitle, t.removed as trackRemovedDate, t.productState from Track t where t.albumId='${albums.id}'"/>
<entity name="albumSalesAreaId" query="select asa.salesAreaId as albumSalesAreaId from AlbumSalesArea asa where asa.albumId='${albums.id}'"/>
<entity name="albumSalesArea" query="select sa.name as albumSalesArea from SalesArea sa where sa.id='${albumSalesAreaId.salesAreaId}'"/>
<entity name="salesAreaCountry" query="select sac.countryId as 'salesAreaCountry' from SalesAreaCountry sac where sac.salesAreaId ='${salesArea.id}'"/>
<entity name="countryId" query="select c.id as 'countryId' from Country c where c.id = '${salesAreaCountry.countryId}'"/>
<entity name="countryName" query="select c.name as 'countryName' from Country c where c.id = '${salesAreaCountry.countryId}'"/>
</entity>
**Schema.xml**
<!--new multivalue fields -->
<field name="albumSalesArea" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="albumSalesAreaId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="salesAreaCountry" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="countryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="countryName" type="text_general" indexed="true" stored="true" multiValued="true"/>
When I compare my Solr response with the SQL result, the SQL result contains countryCode but Solr returns none; it only returned:
"albumSalesAreaId": [
1,
3
],
I am not sure why country etc. are not showing up.
Update 2
data-config.xml
<document name="content">
<entity name="albums" query="select * from Album">
<entity name="tracks" query="select t.id, t.title, t.removed, t.productState from Track t where t.albumId='${albums.id}'">
<field column="id" name="id" />
<field column="title" name="trackTitle" />
<field column="removed" name="trackRemovedDate" />
<field column="productState" name="trackProductState" />
</entity>
<entity name="albumSalesAreaIds" query="select salesAreaId from AlbumSalesArea where albumId = '${albums.id}'">
<field column="salesAreaId" name="albumSalesAreaId"/>
</entity>
<entity name="albumSalesAreaNames" query="select name from SalesArea where id = '${albumSalesAreaIds.salesAreaId}'">
<field column="name" name="albumSalesArea"/>
</entity>
<entity name="salesAreaCountryIds" query="select countryId from SalesAreaCountry where salesAreaId ='${albumSalesAreaIds.salesAreaId}'">
<field column="countryId" name="countryId" />
</entity>
<entity name="salesAreaCountry" query="select name from Country where id ='${salesAreaCountryIds.countryId}'">
<field column="name" name="countryName" />
</entity>
<field column="title" name="albumTitle"/>
<field column="removed" name="albumRemovedDate"/>
<field column="productState" name="albumProductState" />
</entity>
</document>
schema.xml
<field name="catchall" type="text_general" stored="true" indexed="true" multiValued="true"/>
<field name="publisher" type="text_general" indexed="true" stored="true"/>
<field name="uuid" type="binary" indexed="false" stored="true"/>
<field name="trackRemovedDate" type="tdate" indexed="true" stored="true"/>
<field name="albumRemovedDate" type="tdate" indexed="true" stored="true"/>
<field name="trackProductState" type="int" indexed="true" stored="true"/>
<field name="albumProductState" type="int" indexed="true" stored="true"/>
<field name="countryCode" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="albumTitle" type="text_general" indexed="true" stored="true"/>
<field name="trackTitle" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="guid" type="text_general" indexed="true" stored="true"/>
<!--new multivalue fields -->
<field name="albumSalesAreaId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="salesAreaCountry" type="int" stored="true" indexed="true" multiValued="true"/>
<field name="countryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="countryName" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="albumSalesArea" type="text_general" indexed="true" stored="true" multiValued="true"/>
sample solr response for id:5
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"indent": "true",
"q": "id:5",
"_": "1383221233535",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "5",
"catchall": [
"5",
"Test Album 5",
"2011-10-21 00:00:00.0",
"[B#261ca3cb",
"Test Track 1",
"Ya man 2",
"2011-10-17 16:21:29.0",
"1",
"1450412569164513280"
],
"albumTitle": "Test Album 5",
"albumRemovedDate": "2011-10-21T00:00:00Z",
"uuid": "6oT/MMl+RDaPyKpGK1KN0w==",
"trackTitle": [
"Test Track 1",
"Ya man 2"
],
"trackRemovedDate": "2011-10-17T16:21:29Z",
"albumSalesAreaId": [
1
],
"_version_": 1450412569164513300
}
]
}
}
SQL result for id:5
trackTitle and albumSalesAreaId seem to be correct, but I am not sure why the others are not included. However, if I hard-code the albumSalesAreaNames entity with from SalesArea where id = 1, then the albumSalesArea field is added to the result. So it seems that from SalesArea where id = '${albumSalesAreaIds.salesAreaId}' is returning null, which my earlier 'IN' test also confirmed.
This looks like a problem that can be solved simply with multivalued fields.
If you use multivalued fields in this structure, you will obtain one document with ID=10: each duplicated value appears only once, and all the other fields are multivalued. For example, the NAME field will contain the 4 different countries, and so will country_code.
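To make the expected result concrete, here is a small, Solr-independent sketch of what collapsing the joined SQL rows into one multivalued document per id amounts to. RowCollapser and the column names used with it are illustrative assumptions, and the sketch uses a set so that repeated values are kept only once.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups joined rows by key and collects distinct values per column.
class RowCollapser {

    // Each row maps column name -> value and includes the key column.
    static Map<Object, Map<String, Set<Object>>> collapse(
            List<Map<String, Object>> rows, String keyColumn) {
        Map<Object, Map<String, Set<Object>>> docs = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Object key = row.get(keyColumn);
            Map<String, Set<Object>> doc = docs.computeIfAbsent(key, k -> new LinkedHashMap<>());
            for (Map.Entry<String, Object> cell : row.entrySet()) {
                if (cell.getKey().equals(keyColumn) || cell.getValue() == null) {
                    continue; // the key itself and NULL cells are not collected
                }
                doc.computeIfAbsent(cell.getKey(), c -> new LinkedHashSet<>())
                   .add(cell.getValue());
            }
        }
        return docs;
    }
}
```

With rows that share id 10 but differ in countryCode, the result holds one entry for 10 whose countryCode set contains every distinct code, while a title repeated on every row appears once.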
Have a look at this article on how to structure your DataImportHandler config to achieve this:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
Basically, you need one query for each multivalued field:
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document name="products">
<entity name="item" query="select * from item">
<field column="ID" name="id" />
<field column="code" name="code" />
<entity name="countryName" query="select name from countrytable where item_id='${item.ID}'">
<field column="name" name="countryName" />
</entity>
<entity name="countryCode" query="select countryCode from countrytable where item_id='${item.ID}'">
<field column="countryCode" name="countryCode" />
</entity>
</entity>
</document>
</dataConfig>
(Posted on behalf of the OP).
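SOLUTION
The fix was to nest the albumSalesAreaNames entity inside the albumSalesAreaIds entity, so that ${albumSalesAreaIds.salesAreaId} is resolved against the enclosing entity's current row:
<entity name="albumSalesAreaIds" query="select salesAreaId from AlbumSalesArea where albumId = '${albums.id}'">
<entity name="albumSalesAreaNames" query="select name from SalesArea where id = '${albumSalesAreaIds.salesAreaId}'">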
<field column="name" name="albumSalesArea"/>
</entity>
<field column="salesAreaId" name="albumSalesAreaId"/>
</entity>
I am using Solr to index my database of reports. Reports can have text, submitter information, etc. This currently works and looks like this:
"docs": [
{
"Text": "Some Report Text",
"ReportId": "1",
"Date": "2013-08-09T14:59:28.147Z",
"SubmitterId": "11111",
"FirstName": "John",
"LastName": "Doe",
"_version_": 1444554112206110700
}
]
The other thing a report can have is viewers (which is a one-to-many relationship between a single report and the viewers.) I want to be able to capture those viewers like this in my JSON output:
"docs": [
{
"Text": "Some Report Text",
"ReportId": "1",
"Date": "2013-08-09T14:59:28.147Z",
"SubmitterId": "11111",
"FirstName": "John",
"LastName": "Doe",
"Viewers": [
{ "ViewerId": "22222" },
{ "ViewerId": "33333" }
],
"_version_": 1444554112206110700
}
]
I cannot seem to get that to happen, however. Here is my data-config.xml (parts removed that aren't necessary to the question):
<entity name="Report" query="select * from Reports">
<field column="Text" />
<field column="ReportId" />
<!-- Get Submitter Information as another entity. -->
<entity name="Viewers" query="select * from ReportViewers where Id='${Report.ReportId}'">
<field column="Id" name="ViewerId" />
</entity>
</entity>
And the schema.xml:
<field name="Text" type="text_en" indexed="true" stored="true" />
<field name="ReportId" type="string" indexed="true" stored="true" />
<field name="Viewers" type="string" indexed="true" stored="true" multiValued="true" />
<field name="ViewerId" type="string" indexed="true" stored="true" />
When I do the data import, I just don't see anything. No errors, nothing apparently wrong, but I'm pretty sure my data-config and/or my schema are not correct. What am I doing wrong?
Unfortunately, Solr does not allow this kind of nesting inside a field (see http://lucene.472066.n3.nabble.com/Possible-to-have-Solr-documents-with-deeply-nested-data-structures-i-e-hashes-within-hashes-td4004285.html). You need to flatten your data!
So
"Viewers": [
{ "ViewerId": "22222" },
{ "ViewerId": "33333" }
]
is not possible. Instead flatten it and have a ViewerIds array:
"ViewerIds": ["22222", "33333" ]
In your schema, you will have:
<field name="ViewerIds" type="string" indexed="true" stored="true" multiValued="true" />
and modify your data-config accordingly.
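In plain Java terms, the flattening suggested here amounts to pulling the ViewerId out of each nested viewer object and keeping only the list of ids. The sketch below is illustrative only; ViewerFlattener is a hypothetical helper, not part of Solr or DIH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reduces nested viewer objects to a flat list of ids.
class ViewerFlattener {

    // Each viewer is assumed to be a map with a "ViewerId" entry.
    static List<String> viewerIds(List<Map<String, String>> viewers) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> viewer : viewers) {
            String id = viewer.get("ViewerId");
            if (id != null) {
                ids.add(id); // viewers without an id are skipped
            }
        }
        return ids;
    }
}
```

The resulting list is what would be stored in the multiValued ViewerIds field.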