Can window operator be used in flink batch mode? - apache-flink

I have a program that contains a window operator. It works perfectly in streaming mode. However, when I switch to batch mode, the window results are not emitted. My questions are:
Is this because the watermark does not advance in batch mode?
How can I use a window operator in batch mode?

I assume you are referring to using batch execution mode with the DataStream or Table API (and not the legacy DataSet API).
Watermarks are unnecessary in batch mode, but you do need to use a source that handles bounded inputs, so that Flink realizes it has fully processed all of the input. For example, if you are using the KafkaSource you must use setBounded(), or if you are using the FileSource then you should not use monitorContinuously().
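The point about bounded sources can be illustrated with a small Flink-free model. This is only a sketch in plain Java, not Flink's API: the names BoundedWindowSketch, TumblingSum, Event, and onEndOfInput are hypothetical, and the 1000 ms window size is an arbitrary assumption. It shows window results being held until something signals that the input is finished, which is why a source that never reports the end of its input leaves the windows unemitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the idea; none of these types are part of Flink.
public class BoundedWindowSketch {

    /** Hypothetical event: a timestamp in milliseconds plus a value. */
    static final class Event {
        final long timestampMs;
        final int value;

        Event(long timestampMs, int value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Sums values per tumbling window and holds the results until end of input. */
    static final class TumblingSum {
        private final long windowSizeMs;
        private final Map<Long, Integer> sums = new TreeMap<>();

        TumblingSum(long windowSizeMs) {
            this.windowSizeMs = windowSizeMs;
        }

        void onElement(Event e) {
            // Assumes non-negative timestamps.
            long windowStart = e.timestampMs - (e.timestampMs % windowSizeMs);
            sums.merge(windowStart, e.value, Integer::sum);
        }

        /** Called only when the source reports that its input is exhausted. */
        List<String> onEndOfInput() {
            List<String> out = new ArrayList<>();
            for (Map.Entry<Long, Integer> w : sums.entrySet()) {
                long start = w.getKey();
                out.add("[" + start + "," + (start + windowSizeMs) + ") -> " + w.getValue());
            }
            sums.clear();
            return out;
        }
    }

    /** In this model, a source that is not bounded never signals end of input. */
    static List<String> run(List<Event> input, boolean bounded) {
        TumblingSum window = new TumblingSum(1000);
        for (Event e : input) {
            window.onElement(e);
        }
        return bounded ? window.onEndOfInput() : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(100, 1), new Event(900, 2), new Event(1500, 5));
        System.out.println(run(events, true));   // both windows are emitted
        System.out.println(run(events, false));  // nothing is emitted
    }
}
```

In a real job the equivalent of the bounded flag is the source configuration mentioned above, for example setBounded() on the KafkaSource builder.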

Related

Is it possible to change process function code (behavior) at runtime?

I want to make my Flink application as configurable as possible. I also want to change the behavior of the process function at runtime instead of stopping the cluster and redeploying the JAR file.
Is there any documentation for that? Or is it possible to inject process function code into a running job? For instance, from the web UI, I would get the process function input (as Java code), and after submitting the form, I would update the process function's behavior.
You can use a BroadcastProcessFunction (or a KeyedBroadcastProcessFunction), and on the broadcast channel, communicate (in some fashion) what the process function is supposed to do.
I've seen this technique used to broadcast JavaScript code (to be executed by Rhino), commands in a DSL, references to a JAR file to load, etc.
It's old and not well documented, but https://github.com/alpinegizmo/flink-training-exercises/blob/master/src/main/java/com/ververica/flinktraining/solutions/datastream_java/broadcast/TaxiQuerySolution.java is an example of this approach that uses Janino to compile and execute dynamically supplied Java expressions.
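The shape of this approach can be sketched without Flink. The class below is a hypothetical plain-Java stand-in, not the linked solution and not Flink's API: onControlMessage plays the role that processBroadcastElement would play in a BroadcastProcessFunction, and onElement plays the role of processElement. The command names "upper" and "reverse" are made up for illustration; a real job would more likely keep the current rule in broadcast state and might ship a script or DSL expression rather than a fixed command name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of "behavior delivered over a control channel".
public class BroadcastRuleSketch {

    /** Hypothetical command vocabulary understood by the control channel. */
    private static final Map<String, Function<String, String>> RULES = new HashMap<>();

    static {
        RULES.put("upper", s -> s.toUpperCase());
        RULES.put("reverse", s -> new StringBuilder(s).reverse().toString());
    }

    /** The rule currently applied to data elements; starts as a pass-through. */
    private Function<String, String> current = Function.identity();

    /** Control side: swaps the active rule; unknown commands are ignored. */
    void onControlMessage(String command) {
        Function<String, String> rule = RULES.get(command);
        if (rule != null) {
            current = rule;
        }
    }

    /** Data side: applies whichever rule was most recently installed. */
    String onElement(String value) {
        return current.apply(value);
    }

    public static void main(String[] args) {
        BroadcastRuleSketch fn = new BroadcastRuleSketch();
        System.out.println(fn.onElement("abc"));  // pass-through before any command
        fn.onControlMessage("upper");
        System.out.println(fn.onElement("abc"));  // upper-cased
        fn.onControlMessage("reverse");
        System.out.println(fn.onElement("abc"));  // reversed
    }
}
```

The job itself keeps running throughout; only the data sent on the control channel changes what the function does.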

Does Windows ftp process commands at the same time or in sequence?

I'm having trouble finding the answer to this question; maybe I'm just not asking it properly. I have to put a relatively large file (~500MB at least) on an FTP server and then run a process that takes it as a parameter. My question is as follows: if I'm using ftp.exe to do this, does the put command block until the file has finished being copied?
I was planning on using a .bat file to execute the commands needed, but I don't know whether the file will be completely copied before the other process starts reading it.
Edit: for clarity's sake, here is a sample of the .bat that I would be executing.
ftp -s:commands.txt ftpserver
and the contents of the commands.txt would be
user
password
put fileName newFileName
quote cmd_to_execute
quit
The Windows ftp.exe (as probably all similar scriptable clients) executes the commands one-by-one.
No parallel processing takes place.
FTP as a protocol doesn't specify placing a lock on a file before writing to it. However, this doesn't prevent a server from implementing such a feature, as it adds real value.
Some file systems (e.g. NTFS) may provide a locking mechanism to prevent concurrent access. See the Wikipedia article on file locking.
See this thread as a reference: How do filesystems handle concurrent read/write?

Minifilter to detect block-level or disk-level changes made to a file?

I'm trying to develop a File System Minifilter driver to intercept I/O operations and determine the disk-level changes made to a particular file. I found some sample code in the Windows driver samples repository: https://github.com/Microsoft/Windows-driver-samples/tree/master/filesys/miniFilter/ .
This is my requirement: each time a write operation occurs on a particular file, I need to filter it and find out which disk-level changes this write operation makes to the file. But I'm not sure which I/O operation I should filter for my requirement. Please point me in the right direction. I'm doing this for incremental backup purposes.
If you are interested in disk-level changes, you only need to look at operations flagged FLTFL_CALLBACK_DATA_IRP_OPERATION where IrpFlags & IRP_NOCACHE is set.

Is it possible to run a batch file which runs an executable jar on a different JVM?

I have a web project that picks up a report generated by a batch job and streams it to the ServletOutputStream. The batch is run by the task scheduler at regular intervals. The problem is that the report the user downloads is not real time.
Generating and downloading the report at the click of a button without using the batch is not possible because the server is not capable of it: multiple DB calls and multiple cursors are used, and the application goes OOM. Memory tuning didn't help at all (WebLogic 10.3.6, JRockit 1.6). The batch is set up to use the Sun JVM with 1024m of heap space, and it is working well.
Now I want to call this batch at the click of a button, and I want it to run on the JVM specified in the .bat file (Sun JVM 1.6), not the already loaded JRockit. Is it possible to do this? If yes, how?
Any help on this is appreciated. Thanks!

FileSystemWatcher handling moving file - another solution

Hi
I was trying to use FileSystemWatcher to detect whether some files or directories have been moved to another location. The problem is that I had to use the onCreated and onDeleted events to handle this, and there are many issues with this solution:
How could I detect the change if I select more than one file and press Ctrl+C, Ctrl+V, or right-click and select Copy and then Paste in the same directory?
How could I detect it if I select more than one directory?
Lastly, what if I simulate moving a file? I could delete the file and create one with the same name in a different place.
I know I could use timers, process locking detection, or verification of which process uses the file (if it is explorer.exe, then it could be a file move), but this solution is not perfect and it's very inefficient. I was thinking about how to solve this issue, and I have decided to implement it in a low-level language. Is it possible to do this using C or assembler? I know that everything is possible in assembler, so is it possible to implement this in asm? I would like to create my own FileSystemWatcher using assembler or C, but where should I look for information on how to do this?
File movement within the same filesystem can be detected easily using a filesystem filter driver, as the filesystem receives the corresponding request from the OS. Other scenarios, such as moving to another disk or moving by a copy/delete sequence, are hard to trace even with a filter driver, because you would need to match the file which has been created/written to against the file which is being deleted (possibly on the other disk).
If you plan to write some security mechanism (like DRM), then I should point out that the data can be altered during copying (e.g. encrypted or compressed), which makes your task even harder.
Still, you can look at filesystem filter drivers. Should you decide to go on with detection of filesystem events, such a driver is a much more reliable and powerful mechanism than FileSystemWatcher.
