Check @lptr's comment for the solution.
I have this piece of XML from which I need to extract the values and ids using SQL Server:
<root>
<field id="1" value="gfjsdgfdjy duahsd "absdjsd"" />
<field id="37" value="ysgfdyua" />
<field id="13" value="asdas" />
<field id="73" value="fgdgfd" />
<field id="adsf" value="fdsa" />
</root>
This is what I use to extract the values and ids from that XML, which is stored in the variable @test, and insert them into a temp table:
insert into #tmp (field, val)
select field.value('@id', 'nvarchar(100)') as fieldID,
field.value('@value', 'nvarchar(200)') as val
from @test.nodes('root/field') A(field)
That query works fine until there's a value that contains double quotes, as in the example above, which throws the following error: XML parsing: line 1, character 108, whitespace expected
Any way of working around that?
I should mention that I do not create these XMLs by hand but get them from a DB, so any mistakes in their creation are not my fault.
Conformant XML parsers are required to report all well-formedness errors, so any conformant XML parser will reject this ill-formed XML.
You can sometimes get around this by using parsers (which are not conformant XML parsers) that attempt to repair errors. However, I don't know if any of them are capable of handling this particular problem. Note that in the general case the error can't be detected. Consider:
<root>
<field id="1" value="some " inner " 3" />
<field id="2" value="some " inner= " 3" />
</root>
The second field is well-formed XML.
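To make the point above concrete, here is a minimal sketch using Python's standard-library parser (a stand-in chosen for illustration, not the SQL Server parser). It feeds the two fields to the parser separately; the helper name `try_parse` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two fields from the example above, each wrapped in its own root.
ILL_FORMED = '<root><field id="1" value="some " inner " 3" /></root>'
WELL_FORMED = '<root><field id="2" value="some " inner= " 3" /></root>'


def try_parse(xml_text):
    """Return the first field's attributes, or None if the parser rejects the text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return dict(root.find("field").attrib)


first = try_parse(ILL_FORMED)    # rejected: 'inner' is not followed by '='
second = try_parse(WELL_FORMED)  # accepted, but read as two separate attributes
print(first)
print(second)
```

The second field parses without complaint, yet the parser sees a `value` attribute and an `inner` attribute rather than one value containing quotes, which is why a stray quote cannot be detected reliably in general.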
Whenever a supplier provides you with ill-formed XML, you need to complain, as you would with any other product defect. Usually I have found suppliers very responsive to such error reports. It's surprising how often the XML export capability was entrusted to some junior programmer with no XML knowledge or experience, even if the company concerned is a very professional outfit.
I'm looking to configure SOLR to query a table based on certain data.
I unfortunately have to work with how the Database is setup, but here's what I'm after.
I have a table named Company that will contain a certain "prefix" value.
I want to use that prefix value to determine what tables I should query for the DIH.
As a quick sample:
<entity name="company" query="Select top 1 prefix from Company">
<field name="prefix" column="prefix"/>
<entity name="item" query="select * from ${company.prefix}item">
<field column="ItemID" name="id"/>
<field column="Description" name="description"/>
</entity>
</entity>
However I only ever seem to get 1 document processed despite that table containing over 200,000 rows.
What am I doing wrong?
I think you could achieve this by:
using a stored procedure. You can call a stored procedure from DIH as seen here
inside the stored procedure, you can do the table lookup as needed, and then return the results from the real query.
Depending on how comfortable you are with SQL Server's SQL dialect, you might be able to put everything into a single SQL query and use that directly in DIH, but I'm not sure about that.
A quick one: I am looking for a data generation tool. I have an entity with three dates: the date it was made, a start date and an end date. I want the data generation to respect these constraints:
made may be today or some day after
start may be equal to made but not before it
end must be at least one day after start
I looked at http://generatedata.com and http://mockaroo.com, but neither gave me a way to maintain the constraints. That is all I need, but I'm not sure which software can do it. I just need quick data to test my application. Thanks.
And just by the by, have you ever been in a situation where you can't find what you need?
benerator is the tool to use. It is very flexible, though there is a bit of a learning curve. For my situation above, I just write the following in benerator's XML descriptor file (that's what it uses) and I'm good to go. In fact, I can now even set ranges for the made, start and end dates. This is a section of a generate tag for 30 records of an entity (let's call it MY_ENTITY) with those dates:
<import class="org.databene.commons.TimeUtil"/>
<generate name="MY_ENTITY" count="30" consumer="ENTITY_OUT">
<attribute name="MADE_DATE" type="date" script ="TimeUtil.today()" />
<variable name= "for_startDate" type="int" min="0" max="10" />
<attribute name="START_DATE" type="date" script="TimeUtil.addDays(this.MADE_DATE,
for_startDate)" nullable="false"/>
<variable name="for_endDate" type="int" min="1" max="10" />
<attribute name="END_DATE" type = "date" script="TimeUtil.addDays(this.START_DATE,
for_endDate)" nullable="false"/>
</generate>
benerator supports many databases through JDBC and comes bundled with several JDBC drivers. Try it here: http://bergmann-it.de/test-software/index.php?lang=en. It's open source.
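For readers who want to see the constraint logic outside benerator, here is a rough Python sketch of the same idea: the made date is today, the start date is 0 to 10 days after made, and the end date is 1 to 10 days after start, matching the ranges in the descriptor above. The function names and the dictionary keys are illustrative assumptions, not part of benerator.

```python
import random
from datetime import date, timedelta


def make_record(today, rng):
    """Build one record whose dates satisfy made <= start < end."""
    made = today                                          # like TimeUtil.today()
    start = made + timedelta(days=rng.randint(0, 10))     # start may equal made
    end = start + timedelta(days=rng.randint(1, 10))      # end is at least a day later
    return {"MADE_DATE": made, "START_DATE": start, "END_DATE": end}


def generate(count, today=None, rng=None):
    """Generate `count` records; `today` and `rng` can be fixed for repeatable runs."""
    today = today or date.today()
    rng = rng or random.Random()
    return [make_record(today, rng) for _ in range(count)]


if __name__ == "__main__":
    for record in generate(3):
        print(record)
```

Deriving each date from the previous one by a non-negative (or strictly positive) offset guarantees the ordering by construction, so no rejected samples or retries are needed.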
I have a couple of nested ResultMaps in iBatis that use exactly the same database column names. This causes ambiguity, so incorrect results are retrieved for the different database tables.
For example:
<sql namespace="Shipment">
<resultMap id="consignment" class="com.model.Consignment">
<result property="consignmentId" column="Consignment_cd" />
<result property="shipmentCd" column="Shipment_cd" />
<result property="shipmentUnit" column="Shipment_Unit" />
<result property="location" resultMap="Shipment.size" />
</resultMap>
<resultMap id="size" class="com.model.Size">
<result property="consignmentId" column="Consignment_cd" />
<result property="shipmentCd" column="Shipment_cd" />
<result property="shipmentUnit" column="Shipment_Unit" />
</resultMap>
</sql>
Now when I write my select query joining the Size & Consignment tables, I get the same values for Shipment Code and Shipment Unit returned, whereas there are different values for these two columns in the database. Please note that I need both the Shipment Code and Unit from both the Size and Consignment levels pulled up in a single query.
Could someone help me solve this problem?
The only solution I found to this issue is to alias the column names with the table name or a short name as a prefix. You are writing the queries yourself, right?
Your select would look like this:
select consignment.Shipment_cd as consignment_Shipment_cd,
size.Shipment_cd as size_Shipment_cd
from consignment
join size on whatever
And yes, that gets pretty heavy if you want to pull a lot of columns in the same query.
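The effect of the aliases can be sketched with a small, self-contained Python example. It uses an in-memory SQLite database purely as a stand-in for the real database; the join condition on Consignment_cd and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name, like a result map does

# Two tables that share column names but hold different values.
conn.executescript("""
CREATE TABLE consignment (Consignment_cd TEXT, Shipment_cd TEXT);
CREATE TABLE size (Consignment_cd TEXT, Shipment_cd TEXT);
INSERT INTO consignment VALUES ('C1', 'SHIP-A');
INSERT INTO size VALUES ('C1', 'SHIP-B');
""")

# Aliasing gives every column a unique name in the result set.
row = conn.execute("""
SELECT consignment.Shipment_cd AS consignment_Shipment_cd,
       size.Shipment_cd        AS size_Shipment_cd
FROM consignment
JOIN size ON size.Consignment_cd = consignment.Consignment_cd
""").fetchone()

print(row["consignment_Shipment_cd"], row["size_Shipment_cd"])
```

With unique names in the result set, each resultMap can point its `column` attributes at its own aliases (for example `size_Shipment_cd` in the `size` map) instead of the shared raw column name.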
When using DataImportHandler with SqlEntityProcessor, I want to have several entity definitions, each with a different query, feeding the same schema.
How can I search both types of entities but also distinguish their source at the same time? Example:
<document>
<entity name="entity1" query="query1">
<field column="column1" name="column1" />
<field column="column2" name="column2" />
</entity>
<entity name="entity2" query="query2">
<field column="column1" name="column1" />
<field column="column2" name="column2" />
</entity>
</document>
How can I get data from both entity 1 and entity 2?
As long as your schema fields (e.g. column1, column2) are compatible between the different entities, you can just run DataImportHandler and it will populate the Solr collection from both queries.
Then, when you query, you will see all entities combined.
If you want to mark which source each document came from, I would recommend adding another field (e.g. type) and assigning it a different static value in each entity definition using TemplateTransformer.
Also beware of the clean command. By default it deletes everything from the index. As you are populating the index from several sources, you need to make sure it does not delete too much. Use preImportDeleteQuery to delete only the entries whose type field has the value you set for that entity.
I'm creating a BCP IN process to load a pipe delimited file into a table. The BCP utility gives me the error Error = [Microsoft][SQL Server Native Client 10.0][SQL Server]Syntax error at line 4 column 51 in xml format file when it runs. The first rows of the format file are as follows:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|", MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
The location of the error is right at the "|" of the FIELD entry. I have tried various variations of single and double quotes, no quotes and slash in front of and around the pipe, but the error is always thrown at the same spot.
The file itself is just multiple rows of data with a pipe delimiter and short header/trailer records at the top and bottom of the file.
Not sure if this is the issue, but the comma after the TERMINATOR attribute doesn't look right:
TERMINATOR="|",
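One way to check this suspicion independently of bcp is to run the FIELD line through any XML parser. The sketch below uses Python's standard library and a shortened version of the line (the COLLATION attribute is omitted for brevity); the wrapper strings and the helper name `well_formed` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Minimal wrapper so the xsi prefix is declared, as in the format file.
HEADER = (
    '<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><RECORD>'
)
FOOTER = "</RECORD></BCPFORMAT>"

WITH_COMMA = '<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|", MAX_LENGTH="100" />'
WITHOUT_COMMA = '<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100" />'


def well_formed(field_line):
    """Return True if the FIELD line parses inside the wrapper, False otherwise."""
    try:
        ET.fromstring(HEADER + field_line + FOOTER)
        return True
    except ET.ParseError:
        return False


print(well_formed(WITH_COMMA))     # attributes must be separated by whitespace only
print(well_formed(WITHOUT_COMMA))
```

XML attributes are separated by whitespace alone, so a parser stops at the comma immediately after the closing quote of `TERMINATOR="|"`, which is consistent with the error being reported at that spot.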
I can't tell from the question, but if this has something to do with Command windows, batch files, and/or not-so-subtle flat file manipulation, you may be being hit by the fact that the pipe character has a very specific use. There are a number of obscure ways to "escape" special characters in these environments... and while I never found any solid documentation back when I was fighting that fight, over time I determined that using one or a combination of quotes, ^, and & would often solve my problems. For example, you might carefully try the following:
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="^|", MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="&|", MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS" />
Of course, it might just be @Conrad's comma problem, in which case ignore all this.