Not wanting to clog up the question, I've left out most of the code, but I can add it if it helps.
I'm using Breeze 1.4.9 and Breeze.Angular v0.9.0.
I have a simple model: a ChartDefinition has a single DataQuery, and that DataQuery has some parameters.
I have a breeze query:
var query = breeze.EntityQuery
.from("ChartDefinitions")
.expand(["DataQuery","DataQuery.Parameters"]);
//.noTracking();
I can see the server's response (I've replaced most of the simple properties with '...'):
[{"$id":"1","$type":"itaprm4.Domain.ChartDefinition, itaprm4","Id":1,"Title":"FirstChart", ... ,
"DataQuery":
{"$id":"2","$type":"itaprm4.Domain.DataQuery, itaprm4","Id":1, ... ,
"Parameters":
[{"$id":"3","$type":"itaprm4.Domain.DataQueryParameter, itaprm4","Id":1, ...}]
}
}
,{"$id":"4","$type":"itaprm4.Domain.ChartDefinition, itaprm4","Id":2,"Title":"ProjectBudgets", ... ,
"DataQuery":
{"$id":"5","$type":"itaprm4.Domain.DataQuery, itaprm4","Id":2, ... ,
"Parameters":[]
}
},
{"$id":"6","$type":"itaprm4.Domain.ChartDefinition, itaprm4","Id":3,"Title":"ProjectActuals", ... ,
"DataQuery":
{"$id":"7","$type":"itaprm4.Domain.DataQuery, itaprm4","Id":3, ... ,
"Parameters":
[{"$id":"8","$type":"itaprm4.Domain.DataQueryParameter, itaprm4","Id":2,"DataQueryId":3, ...},
{"$id":"9","$type":"itaprm4.Domain.DataQueryParameter, itaprm4","Id":3,"DataQueryId":3, ...}
]
}
}]
After the entities have been materialised, though, that last DataQuery object ($id:7) has a Parameters array, but it only contains the last parameter ($id:9).
Digging around in breeze.debug, I saw that noTracking sends the materialisation code down a different path, so I tacked the noTracking() option onto the query. This results in both parameters appearing in the materialised Parameters array. (I'm assuming that since Breeze can materialise the object graph correctly, there isn't anything wrong with the code on the server, so I haven't included it in this question.)
I would simply keep the noTracking option on, but I'm registering a constructor function with Breeze, and it doesn't get called when noTracking is on:
store.registerEntityTypeCtor('ChartDefinition', ChartDefinition);
Is there something else I need to do to get the parameters array filled without the noTracking option?
Edit:
Another observation: without the noTracking option, the DataQueryParameter with $id:8 actually ends up in the Parameters array of the DataQuery with $id:5.
It turns out this had a lot to do with what was on the server!
Our NHibernate set-up was using a different name for the DataQuery's identifier property than the DataQueryId column (the devs on the team tell me there were some issues with updating entities and that doing this solved them):
<class name="DataQuery" table="sys_DataQuery" dynamic-update="true" >
<id name="Id" column="DataQueryId" type="int" unsaved-value="0">
<generator class="identity" />
</id>
...
<bag name="Parameters" cascade="all-delete-orphan">
<key column="DataQueryId"/>
<one-to-many class="DataQueryParameter"/>
</bag>
</class>
<class name="DataQueryParameter" table="sys_DataQueryParameter" dynamic-update="true" >
...
<property name="DataQueryId" type="int" not-null="true" insert="true" update="true" />
...
</class>
With matching identifiers in the class definitions.
Changing the Id to DataQueryId solved my problem:
<class name="DataQuery" table="sys_DataQuery" dynamic-update="true" >
<id name="DataQueryId" column="DataQueryId" type="int" unsaved-value="0">
<generator class="identity" />
</id>
...
This seems to make sense; how would Breeze know to match DataQueryParameter.DataQueryId to DataQuery.Id? I still have no idea why Breeze could materialise the object graph correctly with noTracking switched on, though.
I need to index on Solr a bean that contains a generic spatial field (generally, a polygon).
I configured my Solr core schema in this way (following the tutorial here):
<fieldType name="area" class="solr.RptWithGeometrySpatialField" spatialContextFactory="org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory"
autoIndex="true"
validationRule="repairBuffer0"
distErrPct="0.025"
maxDistErr="0.001"
distanceUnits="kilometers" />
....
<field name="location" type="area" indexed="true" stored="true" required="true" multiValued="false" />
My bean class is as follows:
public class MySolrBean {

    @Field("id")
    private String id;

    @Field("location")
    private Geometry location;

    // getters and setters...
}
where Geometry refers to com.vividsolutions.jts.geom.Geometry (jts-1.13)
When I try to add a new bean to the index with SolrClient.addBean(Object) I get the following error:
Unable to parse shape given formats "lat,lon", "x y" or as WKT because java.text.ParseException: Unknown Shape definition [com.vividsolutions.jts.geom.Polygon:POLYGON ((1 0, 0.9980267284282716 0.0627905195293134, 0.9921147013144779 0.12533323356430...]
where the WKT representation of my polygon is prefixed by the class FQN. I remember seeing a similar problem some time ago with ZonedDateTime: I changed my code to use java.util.Date and everything worked.
Now, though, I don't know which class to use instead of com.vividsolutions.jts.geom.Geometry, and searching the web I didn't find any documentation about it.
Can anyone help me sort out this issue?
EDIT
Forgot to mention I'm using the latest Solr and SolrJ distribution: 6.5.1.
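In the meantime, a workaround I'm experimenting with (just a sketch, not a confirmed fix) is to declare the bean field as a plain String and serialize the polygon to WKT myself with JTS's WKTWriter, so that SolrJ sends text matching the formats the error message mentions:
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.io.WKTWriter;
import org.apache.solr.client.solrj.beans.Field;

public class MySolrBean {

    @Field("id")
    private String id;

    // Hold the WKT text instead of the JTS object so that addBean()
    // serializes a value like "POLYGON ((1 0, ...))".
    @Field("location")
    private String location;

    public void setLocation(Geometry geometry) {
        this.location = new WKTWriter().write(geometry);
    }

    // remaining getters and setters...
}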
Let me preface by mentioning that I've been through everything I could find about this topic including the Solr docs and all of the SO questions.
I have a Solr instance that I've set up with a Data Import Handler to pull in data from MSSQL using the JDBC driver. The data comes in, but it isn't structured as I'd expect based on the Solr DIH documentation:
<document>
<entity>
<entity />
</entity>
</document>
I've tried all the attributes, like rootEntity, flatten, using CachedSqlProvider, etc. With multiValued="true", the result ends up as:
docs [
{
recordId: '1234',
name: 'whatever',
subrows_col1: ['x','y','z']
subrows_col2: ['a','b','c']
}
]
when what I'm looking for is:
docs [
{
recordId: '1234',
name: 'whatever',
subrows: [{
col1: 'x',
col2: 'a'
},
{
col1: 'y',
col2: 'b'
},
{
col1: 'z',
col2: 'c'
}]
} ]
I've seen the block-join stuff, but I'm confused as to where it goes. I added
<add>
  <doc>
    <field />
    <doc>
      <field />
    </doc>
  </doc>
</add>
to the DIH requestHandler, but it did nothing. I added it to the /update requestHandler and got an error. I have no clue where that is supposed to go. Does it only apply at query time, or only when you push data to Solr via /update?
Where do I define the structure of the document? I tried nested fields in the schema, entities in the DIH config, and the block-join stuff in the requestHandlers. Nothing has worked yet.
Obviously I'm missing something.
Indexing nested documents in DIH is finally supported from Solr 5.1 onwards:
https://issues.apache.org/jira/browse/SOLR-5147
Simply add child="true" to the child entity and Solr DIH will automagically index it as a child document.
Example taken from the JIRA issue linked above:
<document>
<entity name='PARENT' query='select * from PARENT'>
<field column='id' />
<field column='desc' />
<field column='type_s' />
<entity child='true' name='CHILD' query="select * from CHILD where parent_id='${PARENT.id}'">
<field column='id' />
<field column='desc' />
<field column='type_s' />
</entity>
</entity>
</document>
I've also decompiled DocBuilder.class in solr-dataimporthandler-5.3.0.jar and found this code snippet:
if (doc != null) {
    if (epw.getEntity().isChild()) {
        childDoc = new DocWrapper();
        handleSpecialCommands(arow, childDoc);
        addFields(epw.getEntity(), childDoc, arow, vr);
        doc.addChildDocument(childDoc);
    } else {
        handleSpecialCommands(arow, doc);
        addFields(epw.getEntity(), doc, arow, vr);
    }
}
Notice that epw.getEntity().isChild() returns true when child="true" is set, so the builder creates a new DocWrapper and adds it as a child document instead of simply adding the entity's columns as new fields on the parent document.
DIH does not produce nested documents. Solr supports them, but DIH can't yet generate them.
The nested entities in DIH exist to merge sources and to create documents by iterating over a different source, e.g. an outer entity reads a directory for file names and an inner entity loads content from those files, with each file getting its own record.
You may want to move your nested object code into the client with SolrJ for now.
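For example, a minimal SolrJ sketch of building the nested shape from the question (field names are borrowed from the desired output; the SolrClient setup and the child id scheme are assumptions):
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class NestedIndexer {
    // Assumes an already-configured SolrClient pointing at your collection.
    public static void indexParentWithChild(SolrClient client) throws Exception {
        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "1234");
        parent.addField("name", "whatever");

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "1234-1"); // child docs need their own unique ids
        child.addField("col1", "x");
        child.addField("col2", "a");

        parent.addChildDocument(child);

        client.add(parent);
        client.commit();
    }
}
At query time you would then use the block-join query parsers (e.g. {!parent which=...}) to relate parents and children.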
Let's say I have two XML document types, A and B, that look like this:
A:
<xml>
<a>
<name>First Number</name>
<num>1</num>
</a>
<a>
<name>Second Number</name>
<num>2</num>
</a>
</xml>
B:
<xml>
<b>
<aKey>1</aKey>
<value>one</value>
</b>
<b>
<aKey>2</aKey>
<value>two</value>
</b>
</xml>
I'd like to index it like this:
<doc>
<str name="name">First Name</str>
<int name="num">1</int>
<str name="spoken">one</str>
</doc>
<doc>
<str name="name">Second Name</str>
<int name="num">2</int>
<str name="spoken">two</str>
</doc>
So, in effect, I'm trying to use a value from A as a key in B. Using DataImportHandler, I've used the following as my data config definition:
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name="document" transformer="LogTransformer" logLevel="trace"
processor="FileListEntityProcessor" baseDir="/tmp/somedir"
fileName="A.*.xml$" recursive="false" rootEntity="false"
dataSource="null">
<entity name="a"
transformer="RegexTransformer,TemplateTransformer,LogTransformer"
logLevel="trace" processor="XPathEntityProcessor" url="${document.fileAbsolutePath}"
stream="true" rootEntity="true" forEach="/xml/a">
<field column="name" xpath="/xml/a/name" />
<field column="num" xpath="/xml/a/num" />
<entity name="b" transformer="LogTransformer"
processor="XPathEntityProcessor" url="/tmp/somedir/b.xml"
stream="false" forEach="/xml/b" logLevel="trace">
<field column="spoken" xpath="/xml/b/value[../aKey=${a.num}]" />
</entity>
</entity>
</entity>
</document>
</dataConfig>
However, I encounter two problems:
I can't get the XPath expression with the predicate to match any rows, regardless of whether I use an alternative like /xml/b[aKey=${a.num}]/value or even a hardcoded value for aKey.
Even when I remove the predicate, the parser goes through the B file once for every row in A, which is obviously inefficient.
My question is: how, in light of the problems listed above, do I index the data correctly and efficiently with the DataImportHandler?
I'm using Solr 3.6.2.
Note: This is a bit similar to this question, but it deals with two XML document types instead of a RDBMS and an XML document.
I have had very bad experiences using the DataImportHandler for that kind of data. A simple Python script to merge your data would probably be smaller than your current configuration and much more readable. Depending on your requirements and data size, you could create a temporary XML file or pipe results directly to Solr. If you really have to use the DataImportHandler, you could use a URLDataSource and set up a minimal server which generates your XML. Obviously I'm a Python fan, but it's quite likely that it's also an easy job in Ruby, Perl, ...
I finally went with another solution, due to an additional design requirement I didn't originally mention. What follows is the explanation and discussion.
If you only have one or a couple of import flow types for your Solr instances:
Then it might be best to go with Achim's answer and develop your own importer: either, as Achim suggests, in your favorite scripting language, or in Java using SolrJ's ConcurrentUpdateSolrServer.
This is because the DataImportHandler framework does have a sudden spike in its learning curve once you need to define more complex import flows.
If you have a nontrivial number of different import flows:
Then I would suggest you consider staying with the DataImportHandler, since you will probably end up implementing something similar anyway. And, as the framework is quite modular and extensible, customization isn't a problem.
This is the additional requirement I mentioned, so in the end I went with that route.
How I solved my particular quandary was indexing the files I needed to reference into separate cores and using a modified SolrEntityProcessor to access that data. The modifications were as follows:
applying the patch for the sub-entity problem,
adding caching (a quick solution using Guava; there's probably a better way using an available Solr API for accessing other cores locally, but I was in a bit of a hurry at that point); a rough sketch follows below.
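To illustrate the caching part, here is a rough sketch of the kind of Guava cache I mean (names are hypothetical; the SolrServer pointing at the reference core is assumed):
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocumentList;

public class CachingLookup {
    private final SolrServer referenceCore;

    // Cache query string -> results so repeated sub-entity lookups hit memory
    // instead of going back to the reference core every time.
    private final LoadingCache<String, SolrDocumentList> cache =
        CacheBuilder.newBuilder()
            .maximumSize(10000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build(new CacheLoader<String, SolrDocumentList>() {
                @Override
                public SolrDocumentList load(String q) throws Exception {
                    return referenceCore.query(new SolrQuery(q)).getResults();
                }
            });

    public CachingLookup(SolrServer referenceCore) {
        this.referenceCore = referenceCore;
    }

    public SolrDocumentList lookup(String q) {
        return cache.getUnchecked(q);
    }
}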
If you don't want to create a new core for each file, an alternative would be an extension of Achim's idea, i.e. creating a custom EntityProcessor that would preload the data and enable querying it somehow.
I have the following field type (notice no filters, no tokenizers)
<fieldType name="text_names" class="solr.StrField" />
I create a field in my schema using that type:
<field name="exact_type" type="text_names" indexed="true" stored="true" />
Now I search: q=*:*&fq=exact_type:aa&fl=exact_type
I still get results which have something other than 'aa' in the exact_type field.
What am I missing here?
Also this behaves the same:
q=exact_type:aa&fl=exact_type
I don't think "q=*:*" works with the DisMax handler, and I believe that's what you are using. The correct syntax for the two queries should be:
q=&fq=exact_type:aa&fl=exact_type
fq=exact_type:aa&fl=exact_type
I have some XML to ingest into Solr, which sounds like a use case that is intended to be solved by the DataImportHandler. What I want to do is pull the column name from one XML attribute and the value from another attribute. Here is an example of what I mean:
<document>
<data ref="reference.foo">
<value>bar</value>
</data>
</document>
From this XML snippet, I want to add a field with name reference.foo and value bar. The DataImportHandler includes an XPathEntityProcessor for processing XML documents. I've tried using it, and it works perfectly if I give it a known column name (e.g., <field column="ref" xpath="/document/data/@ref">), but I have not been able to find any documentation or examples to suggest either how to do what I want, or that it cannot be done. So:
Can I do this using XPathEntityProcessor? If so, how?
If not, can I do this some other way with DataImportHandler?
Or am I left with writing my own import handler?
I haven't managed to find a way to do this without bringing in a transformer, but by using a simple ScriptTransformer I worked it out. It goes something like this:
...
<script>
function makePair(row) {
var theKey = row.get("theKey");
var theValue = row.get("theValue");
row.put(theKey, theValue);
row.remove("theKey");
row.remove("theValue");
return row;
}
</script>
...
<entity name="..."
processor="XPathEntityProcessor"
transformer="script:makePair"
forEach="/document"
...>
<field column="theKey" xpath="/document/data/#ref" />
<field column="theValue" xpath="/document/data/value" />
</entity>
...
Hope that helps someone!
Note: if your dynamic field is multivalued, you have to iterate over theKey, since row.get("theKey") will return a list.
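A drop-in variant of the function above for that case (untested sketch, assuming the key and value lists line up by index):
<script>
    function makePair(row) {
        var keys = row.get("theKey");      // a java.util.List when multivalued
        var values = row.get("theValue");
        for (var i = 0; i < keys.size(); i++) {
            row.put(keys.get(i), values.get(i));
        }
        row.remove("theKey");
        row.remove("theValue");
        return row;
    }
</script>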
What you want to do is select the node keying on an attribute value.
From your example, you'd do this:
<field column="ref" xpath="/document/data[#ref='reference.foo']"/>