XmlDiff and Patch use scenario from https://docs.microsoft.com/en-us/previous-versions/dotnet/articles/aa302294(v=msdn.10)

I need to compare two XML files that conform to the same XSD and get the differences, including additions, deletions and updates. When an update has happened, I also need the value from before the update.
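For reference, the add/delete/update report described above can also be computed directly, without XmlDiff. This is a minimal Python (ElementTree) sketch, not the article's method. It assumes that each Subaru element is identified by its model attribute and each child by its namespace URI plus local name. It uses the Experiment 3 files from below and returns old and new values for updated parts:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.Subaru.com}"  # namespace URI bound to both ns1 and ns2

# Experiment 3 inputs, copied from the question.
OLD_XML = """<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
  <ns1:Subaru model="Outback">
    <ns1:Muffler> 500 </ns1:Muffler>
    <ns1:Bumper> 150 </ns1:Bumper>
    <ns1:Floormat> 75 </ns1:Floormat>
    <ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
  </ns1:Subaru>
  <ns1:Subaru model="Legacy">
    <ns1:Muffler> 400 </ns1:Muffler>
    <ns1:Bumper> 100 </ns1:Bumper>
    <ns1:Floormat> 50 </ns1:Floormat>
    <ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
  </ns1:Subaru>
</PartPriceInfo>"""

NEW_XML = """<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
  <ns2:Subaru model="Outback">
    <ns2:Muffler> 600 </ns2:Muffler>
    <ns2:Bumper> 150 </ns2:Bumper>
    <ns2:Floormat> 75 </ns2:Floormat>
    <ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
  </ns2:Subaru>
  <ns2:Subaru model="Impreza">
    <ns2:Muffler> 450 </ns2:Muffler>
    <ns2:Bumper> 120 </ns2:Bumper>
    <ns2:Floormat> 65 </ns2:Floormat>
    <ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
  </ns2:Subaru>
</PartPriceInfo>"""


def index_models(xml_text):
    """Map each Subaru model name to {part local name: trimmed text}."""
    root = ET.fromstring(xml_text)
    models = {}
    for car in root.iter(NS + "Subaru"):  # matches ns1:Subaru and ns2:Subaru alike
        models[car.get("model")] = {
            child.tag[len(NS):]: (child.text or "").strip() for child in car
        }
    return models


def diff_models(old_xml, new_xml):
    """Return (added models, deleted models, [(model, part, old value, new value)])."""
    old, new = index_models(old_xml), index_models(new_xml)
    added = [m for m in new if m not in old]
    deleted = [m for m in old if m not in new]
    updated = []
    for model, parts in old.items():
        if model not in new:
            continue
        for part, old_value in parts.items():
            new_value = new[model].get(part)
            # Parts added to or removed from an existing model are not reported here.
            if new_value is not None and new_value != old_value:
                updated.append((model, part, old_value, new_value))
    return added, deleted, updated


print(diff_models(OLD_XML, NEW_XML))
```

The same idea should carry over to LINQ to XML in C#, but that translation is not shown here.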
XmlDiff and Patch looked promising, so I wrote some code to try them myself based on the example in the article, but I get different patched results depending on what the second XML looks like.
This is my spike code:
// Requires: using System.Globalization; using System.IO; using System.Text;
//           using System.Xml; using Microsoft.XmlDiffPatch;
public void TestDiffGram()
{
    using (var sw = new StringWriter(CultureInfo.InvariantCulture))
    {
        var settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        using (var xw = XmlWriter.Create(sw, settings))
        {
            var ignore = XmlDiffOptions.IgnoreWhitespace |
                         XmlDiffOptions.IgnorePrefixes |
                         XmlDiffOptions.IgnoreNamespaces;
            var xmldiff = new XmlDiff(ignore);
            xmldiff.Algorithm = XmlDiffAlgorithm.Fast;
            // false: compare whole documents rather than XML fragments
            xmldiff.Compare("c://Temp//xml1.xml", "c://Temp//xml2.xml", false, xw);
            xw.Close();
        }
        File.WriteAllText("c://Temp//diffgram.xml", sw.ToString());

        // Apply the diffgram to xml1, which should reproduce xml2.
        XmlDocument sourceDoc = new XmlDocument(new NameTable());
        sourceDoc.LoadXml(File.ReadAllText("c://Temp//xml1.xml"));
        using (XmlTextReader diffgramReader = new XmlTextReader("c://Temp//diffgram.xml"))
        {
            new XmlPatch().Patch(sourceDoc, diffgramReader);
        }
        using (XmlTextWriter output = new XmlTextWriter("c://Temp//test.xml", Encoding.Unicode))
        {
            sourceDoc.Save(output);
        }
    }
}
xml1.xml and xml2.xml are variations of the two XML files in the article.
1. Experiment 1
Both files have the same models in the same order, but two of Outback's elements are updated in the second file.
[xml1.xml]
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler> 500 </ns1:Muffler>
<ns1:Bumper> 150 </ns1:Bumper>
<ns1:Floormat> 75 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Legacy">
<ns1:Muffler> 400 </ns1:Muffler>
<ns1:Bumper> 100 </ns1:Bumper>
<ns1:Floormat> 50 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
</PartPriceInfo>
[xml2.xml]
<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
<ns2:Subaru model="Outback">
<ns2:Muffler> 600 </ns2:Muffler>
<ns2:Bumper> 150 </ns2:Bumper>
<ns2:Floormat> 75 </ns2:Floormat>
<ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
</ns2:Subaru>
<ns2:Subaru model="Legacy">
<ns2:Muffler> 400 </ns2:Muffler>
<ns2:Bumper> 100 </ns2:Bumper>
<ns2:Floormat> 50 </ns2:Floormat>
<ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
</ns2:Subaru>
</PartPriceInfo>
[Patched-up results]
The model Outback has the two updated elements, but I expected their prefix to be ns2, and it is ns1, as shown below. Why is this?
<?xml version="1.0" encoding="UTF-8"?>
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler>600</ns1:Muffler>
<ns1:Bumper>150</ns1:Bumper>
<ns1:Floormat>75</ns1:Floormat>
<ns1:WindShieldWipers>25</ns1:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Legacy">
<ns1:Muffler>400</ns1:Muffler>
<ns1:Bumper>100</ns1:Bumper>
<ns1:Floormat>50</ns1:Floormat>
<ns1:WindShieldWipers>20</ns1:WindShieldWipers>
</ns1:Subaru>
</PartPriceInfo>
2. Experiment 2
The second XML has two of Outback's elements updated and one new model, Impreza. This is the case described in the article.
[xml1.xml]
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler> 500 </ns1:Muffler>
<ns1:Bumper> 150 </ns1:Bumper>
<ns1:Floormat> 75 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Legacy">
<ns1:Muffler> 400 </ns1:Muffler>
<ns1:Bumper> 100 </ns1:Bumper>
<ns1:Floormat> 50 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
</PartPriceInfo>
[xml2.xml]
<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
<ns2:Subaru model="Outback">
<ns2:Muffler> 600 </ns2:Muffler>
<ns2:Bumper> 150 </ns2:Bumper>
<ns2:Floormat> 75 </ns2:Floormat>
<ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
</ns2:Subaru>
<ns2:Subaru model="Legacy">
<ns2:Muffler> 400 </ns2:Muffler>
<ns2:Bumper> 100 </ns2:Bumper>
<ns2:Floormat> 50 </ns2:Floormat>
<ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
</ns2:Subaru>
<ns2:Subaru model="Impreza">
<ns2:Muffler> 450 </ns2:Muffler>
<ns2:Bumper> 120 </ns2:Bumper>
<ns2:Floormat> 65 </ns2:Floormat>
<ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
</ns2:Subaru>
</PartPriceInfo>
[Patched-up results]
Outback's Muffler has the prefix ns1, but its WindShieldWipers has ns2, although both were updated from the second XML. Does anyone understand why Outback's Muffler prefix is ns1?
All of Impreza's elements should have the prefix ns2, because this model does not appear in the original XML, but its WindShieldWipers has ns1. Why is this?
<?xml version="1.0" encoding="UTF-8"?>
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler>600</ns1:Muffler>
<ns1:Bumper>150</ns1:Bumper>
<ns1:Floormat>75</ns1:Floormat>
<ns2:WindShieldWipers xmlns:ns2="http://www.Subaru.com">25</ns2:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Legacy">
<ns1:Muffler>400</ns1:Muffler>
<ns1:Bumper>100</ns1:Bumper>
<ns1:Floormat>50</ns1:Floormat>
<ns1:WindShieldWipers>20</ns1:WindShieldWipers>
</ns1:Subaru>
<ns2:Subaru xmlns:ns2="http://www.Subaru.com" model="Impreza">
<ns2:Muffler>450</ns2:Muffler>
<ns2:Bumper>120</ns2:Bumper>
<ns2:Floormat>65</ns2:Floormat>
<ns1:WindShieldWipers>20</ns1:WindShieldWipers>
</ns2:Subaru>
</PartPriceInfo>
3. Experiment 3
The second XML has the model Legacy removed, a new model Impreza added, and two of Outback's elements updated.
[xml1.xml]
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler> 500 </ns1:Muffler>
<ns1:Bumper> 150 </ns1:Bumper>
<ns1:Floormat> 75 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Legacy">
<ns1:Muffler> 400 </ns1:Muffler>
<ns1:Bumper> 100 </ns1:Bumper>
<ns1:Floormat> 50 </ns1:Floormat>
<ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
</ns1:Subaru>
</PartPriceInfo>
[xml2.xml]
<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
<ns2:Subaru model="Outback">
<ns2:Muffler> 600 </ns2:Muffler>
<ns2:Bumper> 150 </ns2:Bumper>
<ns2:Floormat> 75 </ns2:Floormat>
<ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
</ns2:Subaru>
<ns2:Subaru model="Impreza">
<ns2:Muffler> 450 </ns2:Muffler>
<ns2:Bumper> 120 </ns2:Bumper>
<ns2:Floormat> 65 </ns2:Floormat>
<ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
</ns2:Subaru>
</PartPriceInfo>
[Patched-up results]
Outback's updated elements should have the prefix ns2, but only one of them has it.
All of Impreza's elements should have ns2, but they all have ns1.
<?xml version="1.0" encoding="UTF-8"?>
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
<ns1:Subaru model="Outback">
<ns1:Muffler>600</ns1:Muffler>
<ns1:Bumper>150</ns1:Bumper>
<ns1:Floormat>75</ns1:Floormat>
<ns2:WindShieldWipers xmlns:ns2="http://www.Subaru.com">25</ns2:WindShieldWipers>
</ns1:Subaru>
<ns1:Subaru model="Impreza">
<ns1:Muffler>450</ns1:Muffler>
<ns1:Bumper>120</ns1:Bumper>
<ns1:Floormat>65</ns1:Floormat>
<ns1:WindShieldWipers>20</ns1:WindShieldWipers>
</ns1:Subaru>
</PartPriceInfo>
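The prefixes themselves are not significant here. Under the Namespaces in XML rules, ns1:Muffler and ns2:Muffler are the same element name because both prefixes are bound to http://www.Subaru.com. The diff was also run with IgnorePrefixes and IgnoreNamespaces, so presumably the diffgram is not meant to preserve prefixes at all. The Python sketch below compares the Experiment 3 patched result with xml2 by expanded name (namespace URI plus local name), attributes and trimmed text. The XML declaration is left out of the patched string:

```python
import xml.etree.ElementTree as ET

# Experiment 3: xml2 and the patched result, copied from the question.
XML2 = """<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
  <ns2:Subaru model="Outback">
    <ns2:Muffler> 600 </ns2:Muffler>
    <ns2:Bumper> 150 </ns2:Bumper>
    <ns2:Floormat> 75 </ns2:Floormat>
    <ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
  </ns2:Subaru>
  <ns2:Subaru model="Impreza">
    <ns2:Muffler> 450 </ns2:Muffler>
    <ns2:Bumper> 120 </ns2:Bumper>
    <ns2:Floormat> 65 </ns2:Floormat>
    <ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
  </ns2:Subaru>
</PartPriceInfo>"""

PATCHED = """<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
  <ns1:Subaru model="Outback">
    <ns1:Muffler>600</ns1:Muffler>
    <ns1:Bumper>150</ns1:Bumper>
    <ns1:Floormat>75</ns1:Floormat>
    <ns2:WindShieldWipers xmlns:ns2="http://www.Subaru.com">25</ns2:WindShieldWipers>
  </ns1:Subaru>
  <ns1:Subaru model="Impreza">
    <ns1:Muffler>450</ns1:Muffler>
    <ns1:Bumper>120</ns1:Bumper>
    <ns1:Floormat>65</ns1:Floormat>
    <ns1:WindShieldWipers>20</ns1:WindShieldWipers>
  </ns1:Subaru>
</PartPriceInfo>"""


def canonical(xml_text):
    """List (expanded name, attributes, trimmed text) for each element in document order.

    ElementTree stores names as {namespace-uri}local-name and discards prefixes,
    so ns1:Muffler and ns2:Muffler compare equal when both map to the same URI.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip()) for el in root.iter()]


print(canonical(XML2) == canonical(PATCHED))  # prints True
```

By this namespace-aware comparison, the patched document carries the same names and values as xml2. Only the prefix spelling differs, and text whitespace differs too, which IgnoreWhitespace told XmlDiff to ignore.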
My expectation is that if a value in the patched result comes from the second file, it would be associated with the prefix ns2. Sometimes it is and sometimes it is not. Is my expectation wrong? I would be happy to be corrected.
Xoxo


In PDF, which encoding and font does Russian text need?

I want to generate a PDF with Russian text.
I am using libharu with VS2010.
My source code file is encoded as Cyrillic (ISO).
I use this code to set the font and encoding:
detail_font = HPDF_GetFont(pdf, "Times-Roman", "ISO8859-5");
The full code:
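#include <stdio.h>
#include <string.h>
#include <setjmp.h>
#include "hpdf.h"

jmp_buf env;

/* libharu calls this on any error; jump back to main so it can clean up. */
#ifdef HPDF_DLL
void __stdcall
#else
void
#endif
error_handler (HPDF_STATUS error_no, HPDF_STATUS detail_no, void *user_data)
{
    printf ("ERROR: error_no=%04X, detail_no=%u\n",
            (HPDF_UINT)error_no, (HPDF_UINT)detail_no);
    longjmp (env, 1);
}

int main (int argc, char **argv)
{
    HPDF_Doc pdf;
    char fname[256];
    HPDF_Page page;
    HPDF_Font detail_font;
    HPDF_UINT page_height = 400;
    HPDF_UINT page_width = 400;

    strcpy (fname, "encoding");
    strcat (fname, ".pdf");

    pdf = HPDF_New (error_handler, NULL);
    if (!pdf) {
        printf ("error: cannot create PdfDoc object\n");
        return 1;
    }
    if (setjmp (env)) {
        HPDF_Free (pdf);
        return 1;
    }

    page = HPDF_AddPage (pdf);
    /* set the page size before starting the text object. */
    HPDF_Page_SetWidth (page, page_width);
    HPDF_Page_SetHeight (page, page_height);

    /* built-in Times-Roman with the single-byte ISO8859-5 (Cyrillic) encoding. */
    detail_font = HPDF_GetFont (pdf, "Times-Roman", "ISO8859-5");

    HPDF_Page_BeginText (page);
    /* move the position of the text to top of the page. */
    HPDF_Page_MoveTextPos (page, 10, 280);
    HPDF_Page_SetFontAndSize (page, detail_font, 16);
    HPDF_Page_MoveTextPos (page, 0, -20);
    /* the bytes of this literal must be ISO 8859-5, i.e. the source file's encoding. */
    HPDF_Page_ShowText (page, "регистратор температуры ");
    /* finish to print text. */
    HPDF_Page_EndText (page);

    HPDF_SaveToFile (pdf, fname);
    /* clean up */
    HPDF_Free (pdf);
    return 0;
}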
The PDF I get does not render the Russian text correctly.
How can I solve this problem? Is my source file's encoding not usable?
I can get a correct PDF with Russian text by using UTF-8 encoding, but then the font is embedded in my PDF, so I cannot choose UTF-8.
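One thing worth checking on the encoding side is which bytes actually reach HPDF_Page_ShowText. With a single-byte encoding such as ISO8859-5, each byte of the string should be one character code, so the literal has to be stored as ISO 8859-5 bytes. If the editor or compiler stores it as UTF-8 instead, each Cyrillic letter becomes two bytes. This Python sketch only prints both byte sequences for the string from the code above. It does not call libharu:

```python
TEXT = "регистратор температуры"  # the string passed to HPDF_Page_ShowText

iso_bytes = TEXT.encode("iso8859_5")  # one byte per character; these letters fall in 0xB0-0xFF
utf8_bytes = TEXT.encode("utf-8")     # two bytes per Cyrillic letter

print(len(TEXT), len(iso_bytes), len(utf8_bytes))
print(iso_bytes.hex(" "))
print(utf8_bytes.hex(" "))
```

If the bytes in the compiled string look like the second line rather than the first, the source file's encoding is the problem. If they match the first line, the font side is the more likely cause.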
The characters can be displayed, but they overlap.
I opened the PDF as plain text.
Check the font dictionary:
/Type /Font
/BaseFont /Times-Bold
/Subtype /Type1
/FirstChar 32
/LastChar 255
/Widths [
250 333 555 500 500 1000 833 278 333 333 500 570 250 333 250 278
500 500 500 500 500 500 500 500 500 500 333 333 570 570 570 500
930 722 667 722 722 667 611 778 778 389 500 778 667 944 722 778
611 778 722 556 667 722 722 1000 722 722 667 333 278 333 581 500
333 500 556 444 556 444 333 500 556 278 333 556 278 833 556 500
556 556 444 389 333 556 500 722 500 500 444 394 220 394 520 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 250 250 250
250 250 250 250 250 250 250 250 250 250 250 250 250 500 250 250
]
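The Russian characters are at codes 176 to 255 (0xB0 to 0xFF), and nearly all of them have width 250 above.
So the widths for codes 176 to 255 have to be modified,
for example with a Widths array like this: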
/Widths [
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600
250 600 250 250 250 250 250 250 250 250 250 250 250 250 250 250
600 600 600 600 600 600 900 600 600 600 600 600 700 600 700 600
600 600 600 600 600 600 700 600 800 880 800 800 600 600 600 600
500 500 500 480 540 500 600 500 500 500 500 500 600 600 500 500
500 500 500 500 800 500 580 500 700 800 700 600 600 500 600 500
800 600 600 600 600 600 600 600 600 600 600 600 600 500 600 600
]
Of course, I filled the array with 600 and then changed the widths of the Russian characters to values I liked.
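Editing 224 numbers by hand is error-prone, so the array can be generated instead. This is a minimal Python sketch that only builds the text of a /Widths array for FirstChar 32 to LastChar 255. The widths are in glyph-space units (1/1000 of the font size). The 600 default and the single 900 override for Ж (0xB6 in ISO 8859-5, the 900 in the hand-edited array above) are example values. How the array gets into the PDF is not covered here:

```python
def build_widths(first_char, last_char, default, overrides):
    """Return a /Widths list for codes first_char..last_char (inclusive).

    overrides maps a character code to its width in glyph-space units.
    """
    widths = [default] * (last_char - first_char + 1)
    for code, width in overrides.items():
        if not first_char <= code <= last_char:
            raise ValueError(f"code {code} is outside {first_char}..{last_char}")
        widths[code - first_char] = width
    return widths


def format_widths(widths, per_line=16):
    """Render the list as a PDF array, 16 entries per line like the dumps above."""
    lines = [" ".join(str(w) for w in widths[i:i + per_line])
             for i in range(0, len(widths), per_line)]
    return "/Widths [\n" + "\n".join(lines) + "\n]"


# Example: every code 600 wide, with Ж (0xB6) widened to 900.
widths = build_widths(32, 255, 600, {0xB6: 900})
print(format_widths(widths))
```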

Flink SQL Match_Recognize giving incomplete results

I have the following data, given to Flink as a stream:
ID Val eventTime.rowtime
266 25 9000
266 22 10000
266 19 11000
266 18 12000
266 16 13000
266 15 14000
266 14 15000
266 13 16000
266 14 17000
266 15 18000
266 17 19000
266 18 20000
266 18 21000
266 19 22000
266 21 23000
266 21 24000
266 21 25000
266 22 26000
266 21 27000
266 21 28000
266 22 29000
266 24 30000
266 23 31000
266 24 32000
266 25 33000
266 24 34000
266 22 35000
266 23 36000
266 24 37000
266 19 38000
I need to run the following MATCH_RECOGNIZE query:
Select ID, sts, ets, intervalValue,valueDescription, intvDuration from
RawEvents Match_Recognize (
PARTITION BY ID
ORDER BY eventTime
MEASURES
A.ID AS id,
FIRST(A.eventTime) As sts,
LAST(A.eventTime) As ets,
MAX(A.val) As intervalValue,
'max' As valueDescription,
TIMESTAMPDIFF(SECOND, FIRST(A.eventTime), LAST(A.eventTime)) As
intvDuration
AFTER MATCH SKIP TO NEXT ROW
PATTERN (A+ B)
DEFINE
A as A.val>=20,
B As true)
I expect the output to include intervals like
(266,1970-01-01 00:00:09.0,1970-01-01 00:00:10.0,25.0,max,1)
(266,1970-01-01 00:00:10.0,1970-01-01 00:00:10.0,22.0,max,0)
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:23.0,22.0,max,0)
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:24.0,22.0,max,0)
...
(266,1970-01-01 00:00:23.0,1970-01-01 00:00:37.0,22.0,max,0)
...
(266,1970-01-01 00:00:37.0,1970-01-01 00:00:37.0,22.0,max,0)
but what I actually get is only the first two records.
Below is my full code, which converts the stream into a table and the query result back into a stream:
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.getConfig().setAutoWatermarkInterval(10);
DataStream<String> stringStream = env.addSource(new
LinearRoadSource("C:\\Work\\Data\\linear.csv"));
DataStream<SpeedEvent> speedStream = stringStream.map(new
SpeedMapper()).setParallelism(1);
speedStream = speedStream.assignTimestampsAndWatermarks(new
AssignerWithPeriodicWatermarks<SpeedEvent>() {
private long maxTimestampSeen = 0;
@Override
public Watermark getCurrentWatermark() {
return new Watermark(maxTimestampSeen);
}
@Override
public long extractTimestamp(SpeedEvent temperatureEvent, long l)
{
long ts = temperatureEvent.getTimestamp();
// if (temperatureEvent.getKey().equals("W"))
maxTimestampSeen = Long.max(maxTimestampSeen,ts);
return ts;
}
}).setParallelism(1);
TupleTypeInfo<Tuple3<String, Double, Long>> inputTupleInfo = new
TupleTypeInfo<>(
Types.STRING(),
Types.DOUBLE(),
Types.LONG()
);
StreamTableEnvironment tableEnv =
StreamTableEnvironment.getTableEnvironment(env);
tableEnv.registerDataStream("RawEvents",
keyedStream.map((MapFunction<SpeedEvent, Tuple3<String,
Double, Long>>) event -> new Tuple3<>(event.getKey(), event.getValue(),
event.getTimestamp())).returns(inputTupleInfo),
"ID, val, eventTime.rowtime"
);
// Java string literals cannot span lines, so the query is built by concatenation.
Table intervalResult = tableEnv.sqlQuery(
"Select ID, sts, ets, intervalValue, valueDescription, intvDuration from " +
"RawEvents Match_Recognize ( " +
"PARTITION BY ID " +
"ORDER BY eventTime " +
"MEASURES " +
"A.ID AS id, " +
"FIRST(A.eventTime) As sts, " +
"LAST(A.eventTime) As ets, " +
"MAX(A.val) As intervalValue, " +
"'max' As valueDescription, " +
"TIMESTAMPDIFF(SECOND, FIRST(A.eventTime), LAST(A.eventTime)) As intvDuration " +
"AFTER MATCH SKIP TO NEXT ROW " +
"PATTERN (A+ B) " +
"DEFINE " +
"A as A.val>=20, " +
"B As true)");
TupleTypeInfo<Tuple6<String, Timestamp, Timestamp, Double, String,
Integer>> tupleTypeInterval = new TupleTypeInfo<>(
Types.STRING(),
Types.SQL_TIMESTAMP(),
Types.SQL_TIMESTAMP(),
Types.DOUBLE(),
Types.STRING(),
Types.INT()
);
DataStream<Tuple6<String, Timestamp, Timestamp, Double, String, Integer>>
queryResultAsStream = tableEnv.toAppendStream(intervalResult, tupleTypeInterval);
queryResultAsStream.print();
Is there anything I've done wrong, or something I forgot to do?
I am using Flink 1.8.1.
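To see which rows the query itself should produce once every row has been processed, here is a simplified Python model of PATTERN (A+ B). It uses greedy A+, DEFINE A AS val >= 20, B AS true and AFTER MATCH SKIP TO NEXT ROW, and runs over the ID 266 data above. It is my approximation of standard row-pattern semantics, not Flink's implementation, and it does not by itself explain the missing rows. It reproduces the two rows Flink did return. It also predicts that every match starting in the 23 s to 37 s run ends at 37 s, because greedy A+ extends to the last row with val >= 20 before B takes the 19 at 38 s. So rows such as (23, 23, ...) or (23, 24, ...) in the expected list would not come from this pattern. As far as I know, event-time MATCH_RECOGNIZE only processes rows once the watermark has passed them, so the custom watermark assigner is one thing worth checking:

```python
# val for ID 266 at eventTime 9000, 10000, ..., 38000 ms, copied from the question.
VALS = [25, 22, 19, 18, 16, 15, 14, 13, 14, 15, 17, 18, 18, 19, 21,
        21, 21, 22, 21, 21, 22, 24, 23, 24, 25, 24, 22, 23, 24, 19]
ROWS = [(9000 + 1000 * i, v) for i, v in enumerate(VALS)]


def match_a_plus_b(rows, threshold=20):
    """Simplified model of PATTERN (A+ B) over one partition sorted by event time.

    DEFINE A AS val >= threshold, B AS true, with greedy A+.
    AFTER MATCH SKIP TO NEXT ROW resumes at the row after each match's first row,
    so with this pattern a match may start at every A row.
    Returns (sts, ets, intervalValue, intvDuration) with times in seconds.
    """
    matches = []
    for start in range(len(rows)):
        end = start
        while end < len(rows) and rows[end][1] >= threshold:
            end += 1  # greedy: A+ takes every consecutive qualifying row
        if end == len(rows):
            end -= 1  # no row left after the A run: the last A row must serve as B
        if end <= start:
            continue  # the start row is not an A row, so no match starts here
        a_rows = rows[start:end]
        sts, ets = a_rows[0][0] // 1000, a_rows[-1][0] // 1000
        matches.append((sts, ets, max(v for _, v in a_rows), ets - sts))
    return matches


for m in match_a_plus_b(ROWS):
    print(m)
```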

How to add a row in SQL if the value from one column is greater than the value in another column

An example of my dataset would be:
ZoneA: 0-100
ZoneB: 100-200
Name SubName startValueA endValueA startValueB endValueB
A X 0 25 0 100
A X 25 35 0 100
A X 35 80 0 100
A X 80 95 0 100
A X 95 120 0 100
A Y 120 145 100 200
A Y 145 160 100 200
A Y 160 175 100 200
A Y 175 190 100 200
A Y 190 200 100 200
Essentially what I'm desiring is this:
Name SubName startValueA endValueA startValueB endValueB Percent
A X 0 25 0 100 1
A X 25 35 0 100 1
A X 35 80 0 100 1
A X 80 95 0 100 1
A X 95 100 0 100 .2 <--- (100-95)/(120-95)
A X 100 120 100 200 .8 <--- (120-100)/(120-95)
A Y 120 145 100 200 1
A Y 145 160 100 200 1
A Y 160 175 100 200 1
A Y 175 190 100 200 1
A Y 190 200 100 200 1
So a row is added where valueA crosses over the valueB boundary, and then the resulting percentage of each part is calculated. Basically, I'm trying to figure out how much of each valueA range belongs in each zone as defined by valueB. I have the first row done fairly simply with something along the lines of:
case
when endValueA <= endValueB then 1
else (endValueB - startValueA) * 1.0 / (endValueA - startValueA) -- * 1.0 avoids integer division if the columns are integers
end
I'm just not sure how to get the additional row added with the inverse percentage.
Thanks in advance for the help!
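One way to think about it is as interval overlap: each valueA range produces one output row per zone it overlaps, and the percentage is the overlap length divided by the range length. In SQL this is usually written as a join from the data to a zones table on the overlap condition (startValueA < zone end and endValueA > zone start); that zones table is hypothetical here. This is a minimal Python sketch of the logic. It takes the zone bounds from ZoneA and ZoneB in the question rather than from the startValueB/endValueB columns:

```python
ZONES = [(0, 100), (100, 200)]  # ZoneA and ZoneB from the question


def split_by_zone(name, sub_name, start_a, end_a, zones=ZONES):
    """Yield one row per zone that [start_a, end_a] overlaps, with that zone's share."""
    length = end_a - start_a
    for zone_start, zone_end in zones:
        lo, hi = max(start_a, zone_start), min(end_a, zone_end)
        if hi > lo:  # positive-length overlap with this zone
            yield (name, sub_name, lo, hi, zone_start, zone_end, (hi - lo) / length)


# (Name, SubName, startValueA, endValueA) from the example dataset.
ROWS = [("A", "X", 0, 25), ("A", "X", 25, 35), ("A", "X", 35, 80), ("A", "X", 80, 95),
        ("A", "X", 95, 120), ("A", "Y", 120, 145), ("A", "Y", 145, 160),
        ("A", "Y", 160, 175), ("A", "Y", 175, 190), ("A", "Y", 190, 200)]

for row in ROWS:
    for out in split_by_zone(*row):
        print(out)
```

The 95 to 120 row becomes (95, 100) in zone 0 to 100 with 0.2 and (100, 120) in zone 100 to 200 with 0.8, as in the desired output. Every other row keeps a percentage of 1.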

Calculating a column in SQL using the column's own output as input

I have a problem that I find very hard to solve:
I need to calculate a column R_t in SQL where, for each row, the sum of the previously calculated values, SUM(R_{t-1}), is required as input. The calculation is grouped by the ProjectID column. I have no clue how to proceed.
The formula I am trying to implement is $R_t = \dfrac{\text{ContractValue}_t - \sum_{i=0}^{t-1} R_i}{\text{RemainingHours}_t} \cdot \text{HoursRegistered}_t$, where t denotes time and the sum runs over R from t = 0 to t-1 (it is 0 for t = 0).
Time is always consecutive and always begins at t = 0, but the number of time periods may differ across projects, e.g. one project has t = {0,1,2} and another t = {0,1,2,3,4,5}. The time period never "jumps", e.g. from 5 to 7.
The expected output for ProjectID 101 (using the data below) is:
R_0 = (500,000 - 0) / 500 * 65 = 65,000
R_1 = (500,000 - (65,000)) / 435 * 100 = 100,000
R_2 = (500,000 - (65,000 + 100,000)) / 335 * 85 = 85,000
R_3 = (500,000 - (65,000 + 100,000 + 85,000)) / 250 * 69 = 69,000
etc...
This calculation is done for each ProjectID.
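The recurrence only needs the running total of the previous R values. That makes it awkward for window functions, because a window cannot read the column it is computing, but it fits a loop, or a recursive CTE that carries the total forward row by row. This Python sketch of that loop for ProjectID 101 uses the values from the INSERT statement below:

```python
# ProjectID 101, t = 0..8, from the INSERT statement.
CONTRACT_VALUE = [500000, 500000, 500000, 500000, 450000, 450000, 450000, 450000, 450000]
HOURS_REGISTERED = [65, 100, 85, 69, 100, 80, 90, 45, 16]
REMAINING_HOURS = [500, 435, 335, 250, 331, 231, 151, 61, 16]


def compute_r(contract_value, hours_registered, remaining_hours):
    """R_t = (ContractValue_t - sum of R_0..R_{t-1}) / RemainingHours_t * HoursRegistered_t."""
    results = []
    running_sum = 0.0  # sum of all R values computed so far (0 for t = 0)
    for cv, hours, remaining in zip(contract_value, hours_registered, remaining_hours):
        r = (cv - running_sum) / remaining * hours
        results.append(r)
        running_sum += r
    return results


r = compute_r(CONTRACT_VALUE, HOURS_REGISTERED, REMAINING_HOURS)
print([round(x) for x in r])
```

The first four values are 65,000, 100,000, 85,000 and 69,000, as expected. Because RemainingHours equals HoursRegistered at t = 8, the R values add up to exactly the final ContractValue of 450,000. A SQL version that rounds at every step, for example by casting to numeric(15,0), can differ from these unrounded values in the last digit.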
My question is how to formulate this in a SQL query. My first thought was a recursive CTE, but I am not sure it is the right way to proceed: as I understand it, recursive CTEs are meant for hierarchical structures, which this isn't really.
My other thought was to calculate SUM(R_{t-1}) with window functions, i.e. SUM() OVER (PARTITION BY ... ORDER BY ...) together with LAG, but the recursion really gives me trouble and I keep running my head against the wall.
Below is a query for creating the input data:
CREATE TABLE [dbo].[InputForRecursiveCalculation]
(
[Time] int NULL,
ProjectID [int],
ContractValue float,
ContractHours float,
HoursRegistered float,
RemainingHours float
)
GO
INSERT INTO [dbo].[InputForRecursiveCalculation]
(
[Time]
,[ProjectID]
,[ContractValue]
,[ContractHours]
,[HoursRegistered]
,[RemainingHours]
)
VALUES
(0,101,500000,500,65,500),
(1,101,500000,500,100,435),
(2,101,500000,500,85,335),
(3,101,500000,500,69,250),
(4,101,450000,650,100,331),
(5,101,450000,650,80,231),
(6,101,450000,650,90,151),
(7,101,450000,650,45,61),
(8,101,450000,650,16,16),
(0,110,120000,90,10,90),
(1,110,120000,90,10,80),
(2,110,130000,90,10,70),
(3,110,130000,90,10,60),
(4,110,130000,90,10,50),
(5,110,130000,90,10,40),
(6,110,130000,90,10,30),
(7,110,130000,90,10,20),
(8,110,130000,90,10,10)
GO
For those of you who dare to download something from a complete stranger, I have created an Excel file demonstrating the calculation (please download the file, as you will not be able to see the actual formulas in the HTML preview shown when you first click the link):
https://www.dropbox.com/s/3rxz72lbvooyc4y/Calculation%20example.xlsx?dl=0
Best regards,
Victor
I think this will be useful for you. There is an additional column, SumR, that holds the running total of R per ProjectID up to and including the current row.
;with recu as
(
select
Time,
ProjectId,
ContractValue,
ContractHours,
HoursRegistered,
RemainingHours,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as R,
cast((ContractValue - 0)*HoursRegistered/RemainingHours as numeric(15,0)) as SumR
from
InputForRecursiveCalculation
where
Time=0
union all
select
input.Time,
input.ProjectId,
input.ContractValue,
input.ContractHours,
input.HoursRegistered,
input.RemainingHours,
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours as numeric(15,0)),
cast((input.ContractValue - prev.SumR)*input.HoursRegistered/input.RemainingHours + prev.SumR as numeric(15,0))
from
recu prev
inner join
InputForRecursiveCalculation input
on input.ProjectId = prev.ProjectId
and input.Time = prev.Time + 1
)
select
*
from
recu
order by
ProjectID,
Time
RESULTS:
Time ProjectId ContractValue ContractHours HoursRegistered RemainingHours R SumR
----------- ----------- ---------------------- ---------------------- ---------------------- ---------------------- --------------------------------------- ---------------------------------------
0 101 500000 500 65 500 65000 65000
1 101 500000 500 100 435 100000 165000
2 101 500000 500 85 335 85000 250000
3 101 500000 500 69 250 69000 319000
4 101 450000 650 100 331 39577 358577
5 101 450000 650 80 231 31662 390239
6 101 450000 650 90 151 35619 425858
7 101 450000 650 45 61 17810 443668
8 101 450000 650 16 16 6332 450000
0 110 120000 90 10 90 13333 13333
1 110 120000 90 10 80 13333 26666
2 110 130000 90 10 70 14762 41428
3 110 130000 90 10 60 14762 56190
4 110 130000 90 10 50 14762 70952
5 110 130000 90 10 40 14762 85714
6 110 130000 90 10 30 14762 100476
7 110 130000 90 10 20 14762 115238
8 110 130000 90 10 10 14762 130000

How to transform a CSV file data in Apache camel

I want to transform the data in some fields of specific rows in a CSV file. I tried the following.
1) Using CSV marshalling and unmarshalling I achieved it, but the output CSV does not come out in the proper order, even though I sent a list of maps (i.e. List<Map<String,Object>>).
Following is my program:
from("file:E://camelinput//?noop=true")
.unmarshal(csv)
.convertBodyTo(List.class)
.process(new Processor() {
@Override
public void process(Exchange msg) throws Exception {
List<List<String>> data = (List<List<String>>) msg.getIn().getBody();
List<Map<String,Object>> newdata=new ArrayList<Map<String,Object>>();
Map<String,Object> map=null;
for (List<String> line : data) {
System.out.println(line.size());
map=new HashMap<String,Object>();
if("1502873".equals(line.get(3))){
line.set(18, "Y");
}
// newdata.add(line);
int count=0;
for(Object field:line){
// System.out.println("line.get(count) "+line.get(count));
map.put(String.valueOf(count),field);
count++;
}
newdata.add(map);
}
msg.getIn().setBody(newdata);
}
})
.marshal().csv().convertBodyTo(List.class)
.to("file:E://camelout").end();
2) I also tried using .split(body()) and processing each row (i.e. without marshalling), but it takes a very long time and then terminates with an interrupted exception.
Following is the code:
from("file:E://camelinput//?noop=true")
.unmarshal(csv)
.convertBodyTo(List.class)
.split(body())
.process(new Processor() {
@Override
public void process(Exchange msg) throws Exception {
List<String> rec = msg.getIn().getBody(List.class); // after the split, each exchange carries one CSV row
if("1502873".equals(rec.get(3))){
rec.set(18, "Y");
}
String dt=rec.toString().trim().replace("[","").replace("]", "");
msg.getIn().setBody(dt, String.class);
}
})
.to("file:E://camelout").end();
Following is my sample CSV:
25 STANDARD N 1435635 415 1087 15904 7 null 36 Cross Mechanical Pencil, Lead and Eraser, 0.5 mm 2 23162 116599 7/7/2015 15:45 N 828
25 STANDARD N 1435635 415 1087 15905 8 null 36 Jumbo Ballpoint and Selectip Refill, Medium, Black 4 23163 116599 7/7/2015 15:45 N 829
25 STANDARD N 1435635 415 1087 15906 1 3487 null 598 Copier Toner, Cannon 220 23164 116599 7/7/2015 15:45 N 830
25 STANDARD N 1435635 415 1087 15907 2 3495 null 823 Envelopes 27 23165 116599 7/7/2015 15:45 N 831
25 STANDARD N 1435635 415 1087 15908 3 3513 null 789 Legal Pads, 8 1/2 x 11 3/4" White" 30 23166 116599 N 832
25 STANDARD N 1435635 415 1087 15909 4 3577 null 791 Paper Clips 5 23167 116599 7/7/2015 15:45 N 833
31 STANDARD N 1574437 415 1087 15910 5 null 36 Clic Stic Pen, Fine, Black 0.72 23168 116599 7/7/2015 15:45 N 834
31 STANDARD N 1574437 557 1233 15911 6 null 36 Laser Cards, 50 Note Cards/Envelopes, 4-1/4 inch x 5-1/2 inch, White 21.58 23169 116599 7/7/2015 15:45 N 835
31 STANDARD N 1574437 578 1275 15912 1 201 null 32 Keyboard - 101 Key 20.82 23170 116599 7/7/2015 15:45 N 836
25 STANDARD N 1574437 147 2033 15913 1 225 null 30 Monitor - 19" 225.39 23171 116599 7/7/2015 15:45 N 837
1314 STANDARD N 1502873 22 2199 16287 1 628 null 1 Envoy Deluxe Laptop 822.87 23545 116599 7/7/2015 15:45 N 838
1314 STANDARD N 1502873 22 2199 16288 1 151 null 91 Envoy Standard Laptop 1283.44 23546 116599 7/7/2015 15:45 N 839
7653 STANDARD N 1502873 22 2199 16289 2 606 null 1 Battery - Extended Life 28 23547 116599 7/7/2015 15:45 N 840
7652 STANDARD N 1502873 21 459 16290 1 2157 null 1 Envoy Laptop - Rugged 1525.02 23548 116599 7/7/2015 15:45 N 841
1314 STANDARD N 1502873 3 1594 16291 1 251 null 32 RAM - 256MB 51.25 23549 116599 7/7/2015 15:45 N 842
7654 STANDARD N 1502873 22 2199 16292 1 606 null 1 Battery - Extended Life 28 23550 116599 7/7/2015 15:45 N 843
7652 STANDARD N 1502873 21 459 16293 1 247 null 30 Monitor - 17" 225.39 23551 116599 7/7/2015 15:45 N 844
1704 STANDARD N 1502873 41 2200 16294 2 225 null 30 Monitor - 19" 225.39 23552 116599 7/7/2015 15:45 N 845
7658 STANDARD N 1502873 21 460 16295 1 201 null 32 Keyboard - 101 Key 20.82 23553 116599 7/7/2015 15:45 N 846
I have large CSV files that contain hundreds of thousands of rows.
I think your solution 1 might be overly complex if you only want to alter values in the CSV and write it back in the same order. Just edit the fields in the original List and marshal it back to a file.
I've assumed here that your data is actually delimited by tabs rather than by the random amounts of spaces in your example, and I've included the CsvDataFormat that I used. The code uses camel-core and camel-csv version 2.15.3.
public void configure() {
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter('\t'); // Tabs
csv.setQuoteDisabled(true); // Otherwise single quotes will be doubled.
from("file://src/data?fileName=data.csv&noop=true&delay=15m")
.unmarshal(csv)
.convertBodyTo(List.class)
.process(new Processor() {
@Override
public void process(Exchange msg) throws Exception {
List<List<String>> data = (List<List<String>>) msg.getIn().getBody();
for (List<String> line : data) {
// Checks if column two contains text STANDARD
// and alters its value to DELUXE.
if ("STANDARD".equals(line.get(1))) {
System.out.println("Original:" + line);
line.set(1, "DELUXE");
System.out.println("After: " + line);
}
}
}
}).marshal(csv).to("file://src/data?fileName=out.csv")
.log("done.").end();
}
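For comparison, the same edit can be done as a stream of lines, which keeps memory use flat for files with hundreds of thousands of rows. This is a Python sketch, not Camel. Like the answer above, it assumes tab-delimited data with no quoting. The column positions (3 and 18) and the value 1502873 come from the question; the sample rows are hypothetical:

```python
def flag_lines(lines, match_col=3, match_value="1502873", target_col=18, new_value="Y"):
    """Yield tab-delimited lines, setting target_col where match_col equals match_value.

    Lines are handled one at a time, so memory use does not grow with file size
    and row order is preserved. No quote handling is done (like setQuoteDisabled(true)).
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Only touch rows long enough to have both columns.
        if len(fields) > max(match_col, target_col) and fields[match_col] == match_value:
            fields[target_col] = new_value
        yield "\t".join(fields) + "\n"


# Hypothetical 19-field rows (indices 0..18); only the first one matches.
matching = "\t".join(["1314", "STANDARD", "N", "1502873"] + ["x"] * 15)
other = "\t".join(["25", "STANDARD", "N", "1435635"] + ["x"] * 15)
result = list(flag_lines([matching + "\n", other + "\n"]))
print(result)
```

For a real file, pass an open input file as lines and write the generator's output with writelines on an output file, so that at most one row is held in memory at a time.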
The problem is that you are processing the lines one at a time in a single thread. If parallel processing is acceptable for you, try using a thread pool:
<camel:camelContext id="camelContext">
.....
<camel:threadPoolProfile id="customThreadPoolProfile"
defaultProfile="true"
poolSize="{{split.thread.pool.size}}"
maxPoolSize="{{split.thread.max.pool.size}}"
maxQueueSize="{{split.thread.max.queue.size}}">
</camel:threadPoolProfile>
</camel:camelContext>
Then update the split:
.split(body().tokenize("\n"))
.streaming()
.parallelProcessing()
.executorServiceRef("customThreadPoolProfile")
.....
.end()
