How to read Word Paragraph Line by Line in C#? - wpf

I am trying to read Word content line by line, but I am facing an issue: when a paragraph spans multiple lines, I get its content back as a single line. Can anyone please help me with this?
Expected Output:
Line 1 - > TERM BHFKGBHFGFKJHGKJSHFKG ABC1 IOUTOYTIUYRUYTIREYTU B08
Line 2 - > NBHFBHDFGJDSBHKHDGFJGJGDJK 3993 JBHKJSFGSDKFJDGFJKDSBF3993
Line 3 - > JHBJKFHKJGDGFSFGB08 HGHGGFGFDGJFFFDSGFABC1 JJBVHGHDFTERM
Line 4 - > TERMBHFKGBHFGFKJHGKJSHFKG ABC1IOUTOYTIUYRUYTIREYTU B08NBHFBHDFGJDSBHKHDGFJGJGDJK
Line 5 - > 39931234567890987654321
Actual Output:
Single Line -> TERM BHFKGBHFGFKJHGKJSHFKG ABC1 IOUTOYTIUYRUYTIREYTU B08 NBHFBHDFGJDSBHKHDGFJGJGDJK 3993 JBHKJSFGSDKFJDGFJKDSBF3993 JHBJKFHKJGDGFSFGB08 HGHGGFGFDGJFFFDSGFABC1 JJBVHGHDFTERM
TERMBHFKGBHFGFKJHGKJSHFKG ABC1IOUTOYTIUYRUYTIREYTU B08NBHFBHDFGJDSBHKHDGFJGJGDJK
39931234567890987654321
Below is my code sample:
OpenXml:
using (WordprocessingDocument doc = WordprocessingDocument.Open(fs, false))
{
    var bodyText = doc.MainDocumentPart.Document.Body;
    if (bodyText.ChildElements.Count > 0)
    {
        foreach (var items in bodyText)
        {
            if (items is Paragraph)
            {
                var par = items.InnerText;
            }
        }
    }
}
Office.Interop
object nullobj = System.Reflection.Missing.Value;
Word.Application app = new Word.Application();
Word.Document doc = app.Documents.Open(FilePath, ref nullobj, FileAccess.Read,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj);
foreach (Word.Paragraph paragraph in doc.Paragraphs)
{
    var line = paragraph.Range.Text;
}

It is not possible to determine individual lines in the closed file. Lines are dynamically generated when a document is opened in Word and where a line "breaks" depends on many factors - it's not necessarily the same from system profile to system profile. So it's necessary to use the interop, not Open XML to pick up where lines break on the screen.
What's more, the Word object model does not provide "Line" objects for this very reason - there is no "line", only a visual representation of how the page will print, given the current printer driver and version of Windows.
The only part of the Word object model that recognizes "lines" is Selection, as this works solely with what's displayed on the screen.
The following code demonstrates how this can be done.
First, since Selection is being worked with and this is visible on-screen, ScreenUpdating is disabled in order to reduce screen flicker and speed up processing. (Note that working with selections is generally much slower than other object model processing.)
Using ComputeStatistics the number of lines in a paragraph is determined. An array (you can also use a list or anything else) to contain the lines is instantiated. The paragraph range is "collapsed" to its starting point and visually selected.
Now the lines in the paragraph are looped, based on the number of lines. The selection is extended (MoveEnd method) by one line (again, moving by lines is only available to a selection) and the selected text written to the array (or whatever).
Finally, screen updating is turned back on.
wdApp.ScreenUpdating = false;
foreach (Word.Paragraph para in doc.Paragraphs)
{
    Word.Range rng = para.Range;
    // Number of lines the paragraph currently occupies in the layout
    int lNumLines = rng.ComputeStatistics(Word.WdStatistic.wdStatisticLines);
    string[] aLines = new String[lNumLines];
    // Collapse to the start of the paragraph and select that point
    rng.Collapse(Word.WdCollapseDirection.wdCollapseStart);
    rng.Select();
    for (int i = 0; i < lNumLines; i++)
    {
        // Extend the selection by one line, store its text, then move past it
        wdApp.Selection.MoveEnd(Unit: Word.WdUnits.wdLine, Count: 1);
        aLines[i] = wdApp.Selection.Text;
        wdApp.Selection.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
    }
    for (int i = 0; i < aLines.Length; i++)
    {
        Debug.Print(aLines[i]);
    }
}
wdApp.ScreenUpdating = true;

In Word, a paragraph is a single line of text. Change the size of the print area (e.g. change the margins and/or page size) or the font/point size and the text reflows accordingly. Moreover, since Word uses the active printer driver to optimise the page layout, what exists on a given line on one computer may not exist on the same line on another computer.
Depending on your requirements, though, you could employ Word's predefined '\Line' bookmark to navigate between lines or the Rectangle.Lines property.

Related

Google Apps Script: how to create an array of values for a given value by reading from a two column list?

I have a set of data in a Google spreadsheet in two columns. One column is a list of article titles and the other is the ID of a hotel that is in that article. Call it list1.
Example data
I would like returned a new list with article titles in one column, and an array of the hotel IDs in that article in the other column. Call it list2.
Example data
There are thousands of lines that this needs to be done for, so my hope was to use Google Apps Script to perform this task. My original thinking was to:
1. Create column 1 of list2, which has the unique article titles (no script here, just the G-sheets =unique() formula).
2. Iterate through the titles in list2, looking for a match in the first column of list1.
3. If there is a match:
   - retrieve its corresponding value in column 2
   - push it to an empty array in column two of list2
   - move on to the next row in list1
4. If there is no longer a match, loop back to step 2.
I've written the following code. I am currently getting a type error (TypeError: Cannot read property '0' of undefined (line 13, file "Code")), however, I wanted to ask whether this is even a valid approach to the problem?
function getHotelIds() {
  var outputSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('list2');
  var lastRow = outputSheet.getLastRow();
  var data = outputSheet.getRange(2,1,lastRow,2).getValues();
  var workingSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('list1');
  var lastActiveRow = workingSheet.getLastRow();
  var itemIDS = [];
  for (var i=1; i<=data.length; i++) {
    var currentArticle = data[i][0];
    var lookupArticle = workingSheet[i][0];
    if (currentArticle === lookupArticle) {
      var tempValue = [workingSheet[i][1]];
      itemIDS.push(tempValue);
    }
  }
}
Use a simple Google Sheets formula:
You can use a very simple formula to achieve your goal instead of a long and complicated script.
Use =unique(list1!A2:A) in cell A2 of the list2 sheet to get the unique article titles.
Then put this formula in cell B2 and drag it down column B so that it applies to every unique title:
=JOIN(",",filter(list1!B:B,list1!A:A=A2))
You got the idea right, but the logic needed some tweaking. The "undefined" error is caused by workingSheet[i][0]: workingSheet is a Sheet object, not an array of data. Also, it is not necessary to get the data from list2 (the output sheet). Instead, you have to get the data from the list1 (source) sheet and iterate over it.
I added a new variable, oldHotel, which will be used to compare each line with the current hotel. If it's different, it means we have reached a different Hotel and the data should be written in list2.
function getHotelIds() {
  var outputSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('list2');
  var outLastRow = outputSheet.getLastRow();
  var workingSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('list1');
  var lastActiveRow = workingSheet.getLastRow();
  // Assumes list1 is sorted so that rows with the same title are adjacent.
  var sourceValues = workingSheet.getRange("A2:B" + lastActiveRow).getValues();
  var itemIDS = [];
  var oldHotel = sourceValues[0][0]; //first hotel of the list
  for (var i = 0; i < sourceValues.length; i++) {
    if (sourceValues[i][0] != oldHotel) {
      // A different hotel has been reached: write the finished group to list2.
      outputSheet.getRange(outLastRow + 1, 1, 1, 2).setValues([
        [oldHotel, itemIDS.toString()]
      ]);
      oldHotel = sourceValues[i][0]; //new Hotel will be compared
      outLastRow = outputSheet.getLastRow(); //lastrow has updated
      itemIDS = []; //clears the array to include the next codes
    }
    // Collect the ID of the current row (including the first row of a new group).
    itemIDS.push(sourceValues[i][1]);
    /* When we reach the end of the list, oldHotel never changes again,
       so the last group has to be written explicitly. */
    if (i == sourceValues.length - 1) {
      outputSheet.getRange(outLastRow + 1, 1, 1, 2).setValues([
        [oldHotel, itemIDS.toString()]
      ]);
    }
  }
}
I also converted the itemIDS array to a String each time, so it's written down in a single cell without issues.
Make sure each column of the Sheet is set to "Plain text" from Format > Number > Plain Text
References
getRange
setValues
toString()
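The grouping step described in this answer can also be sketched as a plain function that works on the two-dimensional array returned by getValues(). This is only an illustrative sketch: the name groupIdsByTitle is hypothetical, it assumes a runtime that provides Map (such as the Apps Script V8 runtime), and unlike the loop above it does not require list1 to be sorted by title.

```javascript
// Hypothetical helper (not part of the original answers).
// Takes rows of [title, id] and returns rows of [title, "id1,id2,..."],
// keeping titles in the order in which they first appear.
function groupIdsByTitle(rows) {
  var groups = new Map();
  rows.forEach(function (row) {
    var title = row[0];
    var id = row[1];
    if (title === '') return; // skip blank rows
    if (!groups.has(title)) groups.set(title, []);
    groups.get(title).push(id);
  });
  // Join each group's IDs so that every group fits into a single cell.
  return Array.from(groups, function (entry) {
    return [entry[0], entry[1].join(',')];
  });
}
```

In a script, the returned array could be written to list2 with a single setValues call on a range that has as many rows as the result, which avoids one write per group.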

Why does Flink emit duplicate records on a DataStream join + Global window?

I'm learning/experimenting with Flink, and I'm observing some unexpected behavior with the DataStream join, and would like to understand what is happening...
Let's say I have two streams with 10 records each, which I want to join on an id field. Let's assume that each record in one stream has a matching one in the other, and that the IDs are unique in each stream. Let's also say I have to use a global window (requirement).
Join using DataStream API (my simplified code in Scala):
val stream1 = ... // from a Kafka topic on my local machine (I tried with and without .keyBy)
val stream2 = ...
stream1
  .join(stream2)
  .where(_.id).equalTo(_.id)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .apply {
    (row1, row2) => // ...
  }
  .print()
Result:
Everything is printed as expected, each record from the first stream joined with a record from the second one.
However:
If I re-send one of the records (say, with an updated field) from one of the streams to that same stream, two duplicate join events get emitted 😞
If I repeat that operation (with or without updated field), I will get 3 emitted events, then 4, 5, etc... 😞
Could someone in the Flink community explain why this is happening? I would have expected only 1 event emitted each time. Is it possible to achieve this with a global window?
In comparison, the Flink Table API behaves as expected in that same scenario, but for my project I'm more interested in the DataStream API.
Example with Table API, which worked as expected:
tableEnv
  .sqlQuery(
    """
      |SELECT *
      | FROM stream1
      | JOIN stream2
      | ON stream1.id = stream2.id
    """.stripMargin)
  .toRetractStream[Row]
  .filter(_._1) // just keep the inserts
  .map(...)
  .print() // works as expected, after re-sending updated records
Thank you,
Nicolas
The issue is that records are never removed from your global window. So you trigger the join operation on the global window, whenever a new record has arrived, but the old records are still present.
Thus, to get it running in your case, you'd need to implement a custom evictor. I expanded your example into a minimal working example and added the evictor, which I will explain after the snippet.
val data1 = List(
  (1L, "myId-1"),
  (2L, "myId-2"),
  (5L, "myId-1"),
  (9L, "myId-1"))
val data2 = List(
  (3L, "myId-1", "myValue-A"))
val stream1 = env.fromCollection(data1)
val stream2 = env.fromCollection(data2)
stream1.join(stream2)
  .where(_._2).equalTo(_._2)
  .window(GlobalWindows.create()) // assume this is a requirement
  .trigger(CountTrigger.of(1))
  .evictor(new Evictor[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)], GlobalWindow](){
    override def evictBefore(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {}
    override def evictAfter(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {
      import scala.collection.JavaConverters._
      val lastInputTwoIndex = elements.asScala.zipWithIndex.filter(e => e._1.getValue.isTwo).lastOption.map(_._2).getOrElse(-1)
      if (lastInputTwoIndex == -1) {
        println("Waiting for the lookup value before evicting")
        return
      }
      val iterator = elements.iterator()
      for (index <- 0 until size) {
        val cur = iterator.next()
        if (index != lastInputTwoIndex) {
          println(s"evicting ${cur.getValue.getOne}/${cur.getValue.getTwo}")
          iterator.remove()
        }
      }
    }
  })
  .apply((r, l) => (r, l))
  .print()
The evictor will be applied after the window function (join in this case) has been applied. It's not entirely clear how your use case exactly should work in case you have multiple entries in the second input, but for now, the evictor only works with single entries.
Whenever a new element comes into the window, the window function is immediately triggered (count = 1). Then the join is evaluated with all elements having the same key. Afterwards, to avoid duplicate outputs, we remove all entries from the first input in the current window. Since the second input may arrive after the first inputs, no eviction is performed while the second input is empty. Note that my Scala is quite rusty; you will be able to write it in a much nicer way. The output of a run is:
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
4> ((1,myId-1),(3,myId-1,myValue-A))
4> ((5,myId-1),(3,myId-1,myValue-A))
4> ((9,myId-1),(3,myId-1,myValue-A))
evicting (1,myId-1)/null
evicting (5,myId-1)/null
evicting (9,myId-1)/null
A final remark: if the Table API already offers a concise way of doing what you want, I'd stick to it and then convert the result to a DataStream when needed.
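To make the cause of the growing number of duplicates concrete, here is a small simulation in plain JavaScript. It does not use any Flink API: simulateGlobalWindowJoin and its event format are hypothetical, and it models only a single key, a window that never drops elements, a trigger that fires on every element, and a join that emits the cross product of both sides. The optional flag mimics the idea of the evictor above by clearing the first input after a firing once the second input is present.

```javascript
// Conceptual sketch only, not Flink code.
// events: array of { side: 'left' | 'right', value: any } for ONE join key.
// Returns how many joined pairs are emitted after each incoming event.
function simulateGlobalWindowJoin(events, evictLeftAfterFire) {
  var left = [];
  var right = [];
  var emittedPerEvent = [];
  events.forEach(function (e) {
    // A global window keeps every element it has ever received.
    (e.side === 'left' ? left : right).push(e.value);
    // A count trigger of 1 fires on every element; the join then pairs
    // every buffered left element with every buffered right element.
    var pairs = [];
    left.forEach(function (l) {
      right.forEach(function (r) {
        pairs.push([l, r]);
      });
    });
    emittedPerEvent.push(pairs.length);
    // Simplified eviction: drop already-joined left elements once the
    // right side is present.
    if (evictLeftAfterFire && right.length > 0) {
      left = [];
    }
  });
  return emittedPerEvent;
}

var events = [
  { side: 'left', value: 'a1' },
  { side: 'right', value: 'b1' },
  { side: 'left', value: 'a2' }, // re-sent record
  { side: 'left', value: 'a3' }  // re-sent again
];
console.log(simulateGlobalWindowJoin(events, false)); // [ 0, 1, 2, 3 ]
console.log(simulateGlobalWindowJoin(events, true));  // [ 0, 1, 1, 1 ]
```

Without eviction, every re-sent record is joined together with all earlier records that are still buffered, which matches the 2, 3, 4, ... emitted events described in the question.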

How do I update unlockable characters in SpriteKit game with Swift 3?

I have currently made a game featuring one player. I also made a character screen where the user can choose which character he/she wants to play with. How do I make it so that a certain high score unlocks a certain character, and allows the user to equip this character to use in the game?
Right now my player has its own Swift file that defines all of its properties:
import SpriteKit
class Player: SKSpriteNode, GameSprite {
var initialSize = CGSize(width:150, height: 90)
var textureAtlas: SKTextureAtlas = SKTextureAtlas(named: "Rupert")
let maxFlyingForce: CGFloat = 80000
let maxHeight: CGFloat = 900
var health:Int = 1
var invulnerable = false
var damaged = false
var damageAnimation = SKAction()
var dieAnimation = SKAction()
var forwardVelocity: CGFloat = 190
var powerAnimation = SKAction()
init() {
super.init(texture:nil, color: .clear, size: initialSize)
createAnimations()
self.run(soarAnimation, withKey: "soarAnimation")
let bodyTexture = textureAtlas.textureNamed("pug3")
self.physicsBody = SKPhysicsBody(texture: bodyTexture, size: self.size)
self.physicsBody?.linearDamping = 0.9
self.physicsBody?.mass = 10
self.physicsBody?.allowsRotation = false
self.physicsBody?.categoryBitMask = PhysicsCategory.rupert.rawValue
self.physicsBody?.contactTestBitMask = PhysicsCategory.enemy.rawValue | PhysicsCategory.treat.rawValue | PhysicsCategory.winky.rawValue | PhysicsCategory.ground.rawValue
func createAnimations() {
let rotateUpAction = SKAction.rotate(toAngle: 0.75, duration: 0.475)
rotateUpAction.timingMode = .easeOut
let rotateDownAction = SKAction.rotate(toAngle: 0, duration: 0.475)
rotateDownAction.timingMode = .easeIn
let flyFrames: [SKTexture] = [
textureAtlas.textureNamed("pug1"),
textureAtlas.textureNamed("pug2"),
textureAtlas.textureNamed("pug3"),
textureAtlas.textureNamed("pug4"),
textureAtlas.textureNamed("pug3"),
textureAtlas.textureNamed("pug2")
]
let flyAction = SKAction.animate(with:flyFrames, timePerFrame: 0.07)
flyAnimation = SKAction.group([SKAction.repeatForever(flyAction), rotateUpAction])
let soarFrames:[SKTexture] = [textureAtlas.textureNamed("pug5")]
let soarAction = SKAction.animate(with: soarFrames, timePerFrame: 1)
soarAnimation = SKAction.group([SKAction.repeatForever(soarAction), rotateDownAction])
This is not all the code but you get the point.
I then say: let player = Player() in my Gamescene file which essentially attaches all the attributes in the player file to my player that will be seen in the Gamescene. Even if I am able to replace the initial player with a certain different player, there are so many animations that I don't know how to replace everything at once. I want to set a condition that spans over both the gamescene class and the player class so that it can just sub out certain images for other ones and keep the same actions.
Thank you for any help!
Here are some techniques you can use for making things like this more manageable:
Have a naming convention for your character images and/or sprite sheets, so that you can pass a name in to your Player() constructor. Then, instead of loading textureNamed("pug3"), you load textureNamed("\(playerName)3"). If the only difference between your characters is the sprite sheets, this is actually all you need on the Player end.
If your characters are more complex, with differences beyond just the images, like being larger or having more health, then you will probably want to go with a more data-oriented approach. There are a couple of approaches to this, but a handy one is to read your texture names, hitbox sizes, health levels, etc., out of a .plist file instead of hard-coding them. Then just pass in the name of the .plist file to load for the character you want. Then, to create a new character, you just create a new .plist file. Another approach would be to create a "character definition" struct that you could pass to the Player constructor that contains the information you need to construct the player (this, in turn, could be loaded from a .plist file, as well, but you could also hard-code them or save them directly using codable serialization).
If neither of the above approaches is sufficient, say if you need different behavior between characters, you could always go the subclassing route: pull the various parts and pieces out into functions, and then override those functions to add the specific functionality you need for more complex characters.

Store formatting information in an array then apply it to a range

I'm trying to create a script that will automatically format a selection based on the formatting of a table in another sheet. The idea is that a user can define a table style for header, rowOdd and rowEven in the Formats sheet, then easily apply it to a selected table using the script.
I've managed to get it working, but only by applying one type of formatting (background colour).
I based my code for reading the code into an array on this article.
As you will hopefully see from my code below, I am only able to read one formatting property into my array.
What I would like to do is read all formatting properties into the array, then apply them to the range in one go. I'm new to this so sorry if my code is a mess!
function formatTable() {
  var activeRange = SpreadsheetApp.getActiveSpreadsheet().getActiveRange(); //range to apply formatting to
  var arr = new Array(activeRange.getNumRows());
  var tableStyleSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Formats"); //location of source styles
  var tableColours = {
    header: tableStyleSheet.getRange(1, 1, 1).getBackground(),
    rowEven: tableStyleSheet.getRange(2, 1, 1).getBackground(),
    rowOdd: tableStyleSheet.getRange(3, 1, 1).getBackground()
  }
  for (var x = 0; x < activeRange.getNumRows(); x++) {
    arr[x] = new Array(activeRange.getNumColumns());
    for (var y = 0; y < activeRange.getNumColumns(); y++) {
      x == 0 ? arr[x][y] = tableColours.header :
        x % 2 < 1 ? arr[x][y] = tableColours.rowOdd : arr[x][y] = tableColours.rowEven;
      Logger.log(arr);
    }
  }
  activeRange.setBackgrounds(arr);
}
Thanks!
I might be wrong, but based on the list of methods given in Class Range, a feature to save or store formatting details does not exist yet.
However, you may want to try using the following:
copyFormatToRange(gridId, column, columnEnd, row, rowEnd) or copyFormatToRange(sheet, column, columnEnd, row, rowEnd), which copies the formatting of the range to the given location.
moveTo(target), which cuts and pastes (both format and values) from this range to the target range.
Did you know that you can get all of the different formatting elements for a range straight into an array?
E.g.
var backgrounds = sheet.getRange("A1:D50").getBackgrounds();
var fonts = sheet.getRange("A1:D50").getFontFamilies();
var fontcolors = sheet.getRange("A1:D50").getFontColors();
etc.
However, there's no way to get all of the formatting in one call unfortunately, so you have to handle each element separately. Then you can apply all of the formats in one go:
targetRng.setFontColors(fontcolors);
targetRng.setBackgrounds(backgrounds);
and so on.
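For the header / odd row / even row scheme from the question, one way to avoid repeating the nested loop for every formatting property is a small helper that builds the two-dimensional array for any property. The sketch below is an assumption-laden illustration: buildStyleGrid is a hypothetical name, and it treats the first row as the header and the first data row as "odd".

```javascript
// Hypothetical helper (not part of the Apps Script API).
// styles: { header: value, rowOdd: value, rowEven: value } for ONE property,
// e.g. background colours or font colours.
function buildStyleGrid(numRows, numCols, styles) {
  var grid = [];
  for (var r = 0; r < numRows; r++) {
    // Row 0 is the header; rows 1, 3, 5, ... count as odd data rows.
    var value = r === 0 ? styles.header
      : (r % 2 === 1 ? styles.rowOdd : styles.rowEven);
    var row = [];
    for (var c = 0; c < numCols; c++) {
      row.push(value);
    }
    grid.push(row);
  }
  return grid;
}
```

In a script, the result of one call could be passed to setBackgrounds and the result of another call (built from the font colours read from the Formats sheet) to setFontColors, so each property needs only one read per style row and one write for the whole target range.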

How to use string indexing with IDataReader in F#?

I'm new to F# and trying to dive in first and do a more formal introduction later. I have the following code:
type Person =
    {
        Id: int
        Name: string
    }
let GetPeople() =
    //seq {
    use conn = new SQLiteConnection(connectionString)
    use cmd = new SQLiteCommand(sql, conn)
    cmd.CommandType <- CommandType.Text
    conn.Open()
    use reader = cmd.ExecuteReader()
    let mutable x = {Id = 1; Name = "Mary"; }
    while reader.Read() do
        let y = 0
        // breakpoint here
        x <- {
            Id = unbox<int>(reader.["id"])
            Name = unbox<string>(reader.["name"])
        }
    x
    //}
let y = GetPeople()
I plan to replace the loop body with a yield statement and clean up the code. But right now I'm just trying to make sure the data access works by debugging the code and looking at the data reader. Currently I'm getting a System.InvalidCastException. When I put a breakpoint at the point indicated by the commented line above and then type reader["name"] in the Immediate window, I get a valid value from the database, so I know it's connecting to the db OK. However, if I try to put reader["name"] (as opposed to reader.["name"]) in the source file, I get a "This value is not a function and cannot be applied" message.
Why can I use reader["name"] in the immediate window but not in my fsharp code? How can I use string indexing with the reader?
Update
Following Jack P.'s advice I split out the code into separate lines and now I see where the error occurs:
let id = reader.["id"]
let id_unboxed = unbox id // <--- error on this line
id has the type object {long} according to the debugger.
Jack already answered the question regarding different syntax for indexing in F# and in the immediate window or watches, so I'll skip that.
In my experience, the most common reason for getting System.InvalidCastException when reading data from a database is that the value returned by reader.["xyz"] is actually DBNull.Value instead of an actual string or integer. Casting DBNull.Value to integer or string will fail (because it is a special value), so if you're working with nullable columns, you need to check this explicitly:
let name = reader.["name"]
let name_unboxed : string =
    if Object.Equals(name, DBNull.Value) then null else unbox name
You can make the code nicer by defining the ? operator which allows you to write reader?name to perform the lookup. If you're dealing with nulls you can also use reader?name defaultValue with the following definition:
let (?) (reader:IDataReader) (name:string) (def:'R) : 'R =
    let v = reader.[name]
    if Object.Equals(v, DBNull.Value) then def
    else unbox v
The code then becomes:
let name = reader?name null
let id = reader?id -1
This should also simplify debugging as you can step into the implementation of ? and see what is going on.
You can use reader["name"] in the immediate window because the immediate window uses C# syntax, not F# syntax.
One thing to note: since F# is much more concise than C#, there can be a lot going on within a single line. In other words, setting a breakpoint on the line may not help you narrow down the problem. In those cases, I normally "expand" the expression into multiple let-bindings on multiple lines; doing this makes it easier to step through the expression and find the cause of the problem (at which point, you can just make the change to your original one-liner).
What happens if you pull the item accesses and unbox calls out into their own let-bindings? For example:
while reader.Read() do
    let y = 0
    // breakpoint here
    let id = reader.["id"]
    let id_unboxed : int = unbox id
    let name = reader.["name"]
    let name_unboxed : string = unbox name
    x <- { Id = id_unboxed; Name = name_unboxed; }
x
