Node.JS: How does "fs.watchFile" work? - filesystems

According to the API docs for Node 0.4.3, the fs.watchFile(filename, [options], listener) function starts a routine that will
"Watch for changes on filename. The callback listener will be called each time the file is accessed."
It also says
"The options if provided should be an object containing two members a boolean, persistent, and interval, a polling value in milliseconds"
which indicates that it will check every so often based on what is in interval. But it also says
"The default is { persistent: true, interval: 0 }."
So does that mean it will check every millisecond to see if the file time changed? Does it listen for OS level events? How does that work?

Yes, cpedros is correct, this does seem to be a duplicate. I think I can shed some more light on this though.
Each OS has its own file change event that gets fired. On Linux it is inotify (formerly dnotify), on Mac it is FSEvents, and on Windows it is ReadDirectoryChangesW (which .NET exposes as FileSystemWatcher). I'm not sure whether the underlying code handles each case, but that's the general idea.
If you just want to watch a file on Linux, I recommend node-inotify-plus-plus. If you want to watch a directory, use inotify-plus-plus with node-walk. I've implemented this and it worked like a charm.
I can post some code if you're interested. The beauty behind node-inotify-plus-plus is that it abstracts much of the nastiness of inotify and gives an intuitive API for listening to specific events on a file.
EDIT: This shouldn't be used to watch tons of files. On my system, the max is 8192. Your max can be found with this command: cat /proc/sys/fs/inotify/max_user_watches. Instead, you can watch just the directories for changes and then figure out the individual files from there. A modified event will fire if a file directly under that directory is modified.
EDIT: Thanks @guiomie for pointing out that watching files is now fully supported on Windows. I assume this is with the v0.6.x release.
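To make the interval option from the question more concrete, here is a minimal sketch, assuming a current Node release rather than 0.4.3. Current Node documents fs.watchFile as polling the file with stat at the given interval, and the sketch reproduces one polling step by hand. checkOnce and watchWithInterval are hypothetical helper names, not Node APIs, and the explicit interval is an illustration, not a recommended value.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// One polling step: stat the file and compare its mtime with the value
// seen on the previous step. (Hypothetical helper, not part of Node.)
function checkOnce(file, prevMtimeMs) {
  const { mtimeMs } = fs.statSync(file);
  return { changed: mtimeMs !== prevMtimeMs, mtimeMs };
}

// The same idea using the built-in API with an explicit interval.
// Returns a function that stops the watcher again.
function watchWithInterval(file, intervalMs, onChange) {
  const listener = (curr, prev) => {
    // The listener receives the current and previous fs.Stats objects.
    if (curr.mtimeMs !== prev.mtimeMs) onChange(curr, prev);
  };
  fs.watchFile(file, { persistent: true, interval: intervalMs }, listener);
  return () => fs.unwatchFile(file, listener);
}

// Demo of the manual polling step on a temporary file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "watch-demo-"));
const file = path.join(dir, "data.txt");
fs.writeFileSync(file, "v1");
const first = checkOnce(file, -1);

// Simulate a later modification by moving the mtime forward explicitly.
const later = new Date(first.mtimeMs + 5000);
fs.utimesSync(file, later, later);
const second = checkOnce(file, first.mtimeMs);

console.log(first.changed, second.changed);
```

Calling watchWithInterval(file, 1000, cb) would then invoke cb at most about once per second after a change, which is the trade-off behind choosing an interval: shorter means faster detection but more stat calls.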

To extend on tjameson's fantastic answer, you could use watchr to normalise the API across Node versions and OS watching differences. It also provides events for unlink and new instead of just change, and adds support for directory tree watching.


Guava Cache as ValueState in Flink

I am trying to de-duplicate events in my Flink pipeline. I am trying to do that using guava cache.
My requirement is that, I want to de-duplicate over a 1 minute window. But at any given point I want to maintain not more than 10000 elements in the cache.
A small background on my experiment with Flink windowing:
Tumbling Windows: I was able to implement this using tumbling windows + a custom trigger. But the problem is that if an element occurs in the 59th second and again in the 61st second, it is not recognized as a duplicate, because the two occurrences fall into different windows.
Sliding Windows: I also tried sliding window with 10 second overlap + custom trigger. But an element that came in the 55th second is part of 5 different windows and it is written to the sink 5 times.
Please let me know if I should not be seeing the above behavior with windowing.
Back to Guava:
I have an Event which looks like this and an EventsWrapper for these events which looks like this. I will be getting a stream of EventsWrappers, and I should remove duplicate Events across different EventsWrappers.
For example, if I have two EventsWrappers like below:
[EventsWrapper{id='ew1', org='org1', events=[Event{id='e1', name='event1'}, Event{id='e2', name='event2'}]},
EventsWrapper{id='ew2', org='org2', events=[Event{id='e1', name='event1'}, Event{id='e3', name='event3'}]}]
I should emit as output the following:
[EventsWrapper{id='ew1', org='org1', events=[Event{id='e1', name='event1'}, Event{id='e2', name='event2'}]},
EventsWrapper{id='ew2', org='org2', events=[Event{id='e3', name='event3'}]}]
i.e., making sure that the e1 event is emitted only once, assuming these two events are within the time and size requirements of the cache.
I created a RichFlatmap function where I initiate a guava cache and value state like this. And set the Guava cache in the value state like this. My overall pipeline looks like this.
But each time I try to update the guava cache inside the value state:
eventsState.value().put(eventId, true);
I get the following error:
java.lang.NullPointerException
at com.google.common.cache.LocalCache.hash(LocalCache.java:1696)
at com.google.common.cache.LocalCache.put(LocalCache.java:4180)
at com.google.common.cache.LocalCache$LocalManualCache.put(LocalCache.java:4888)
at events.piepline.DeduplicatingFlatmap.lambda$flatMap$0(DeduplicatingFlatmap.java:59)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
On further digging, I found out that the error is because the keyEquivalence inside the Guava cache is null.
I checked by directly setting on the Guava cache(not through state, but directly on the cache) and that works fine.
I felt this could be because, ValueState is not able to serialize GuavaCache. So I added a Serializer like this and registered it like this:
env.registerTypeWithKryoSerializer((Class<Cache<String,Boolean>>)(Class<?>)Cache.class, CacheSerializer.class);
But this didn't help either.
I have the following questions:
1. Any idea what I might be doing wrong with the Guava cache in the above case?
2. Is what I am seeing with my tumbling and sliding window implementations expected, or am I doing something wrong?
3. What will happen if I don't set the Guava cache in ValueState, and instead just use it as a plain object in the DeduplicatingFlatmap class, operating directly on the Guava cache rather than through the ValueState? My understanding is that the Guava cache won't be part of the checkpoint. So when the pipeline fails and restarts, the Guava cache will have lost all the values in it and will be empty on restart. Is this understanding correct?
Thanks a lot in advance for the help.
1. See below.
2. These windows are behaving as expected.
3. Your understanding is correct.
Even if you do get it working, using a Guava cache as ValueState will perform very poorly, because RocksDB is going to deserialize the entire cache on every access, and re-serialize it on every update.
Moreover, it looks like you are trying to share a single cache instance across all of the orgs that happen to be multiplexed across a single flatmap instance. That's not going to work, because the RocksDB state backend will make a copy of the cache for each org (a side effect of the serialization involved).
Your requirements aren't entirely clear, but a deduplication query might help. But I'm thinking MapState in combination with timers in a KeyedProcessFunction is more likely to be the building block you need. Here's an example that might help you get started (but you'll be wanting to handle the timers differently).
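The linked example is not reproduced here. As a rough model of the "map of seen ids plus expiry" idea, the following is plain JavaScript, not Flink API code. It assumes duplicates are identified by event id and expire after one minute. The Deduplicator class and its method names are hypothetical, the expiry sweep stands in for what a timer callback would do, and the 10000-element size cap from the question is not modeled.

```javascript
// Model of deduplication state with time-based expiry.
// In a keyed stream processor this map would live in keyed state and the
// cleanup would run from timers; here both are simulated by hand.
class Deduplicator {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // event id -> time at which the entry expires
  }

  // Remove entries whose expiry time has passed.
  expire(nowMs) {
    for (const [id, expiresAt] of this.seen) {
      if (expiresAt <= nowMs) this.seen.delete(id);
    }
  }

  // Return only the events whose id has not been seen within the TTL.
  filter(events, nowMs) {
    this.expire(nowMs);
    const fresh = [];
    for (const ev of events) {
      if (!this.seen.has(ev.id)) {
        this.seen.set(ev.id, nowMs + this.ttlMs);
        fresh.push(ev);
      }
    }
    return fresh;
  }
}

// The two wrappers from the question, arriving one second apart.
const dedup = new Deduplicator(60 * 1000);
const ew1 = dedup.filter([{ id: "e1" }, { id: "e2" }], 0);
const ew2 = dedup.filter([{ id: "e1" }, { id: "e3" }], 1000);
// More than a minute later, e1 is no longer remembered.
const ew3 = dedup.filter([{ id: "e1" }], 61 * 1000);

console.log(ew1.map(e => e.id), ew2.map(e => e.id), ew3.map(e => e.id));
```

Because the state is keyed by event id rather than by org, the e1 event in the second wrapper is dropped even though the wrappers belong to different orgs, which is the output the question asks for.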

How broadcast and emit internal mechanism work in angularjs?

I am creating an example of broadcast and emit. How does AngularJS internally maintain the lists of broadcast event listeners? And how does it identify which $on handler should be executed when a broadcast event is fired?
As far as I know, if you define multiple functions on one event, AngularJS calls all of them, in the order they were registered on each scope. If you call $broadcast('anyEvent'), AngularJS sends the event downward from the current scope (the one you called it on) to all of its descendants, looking up the listeners registered under that event name on each scope and calling them. The same goes for $emit, except that it travels upward from the scope through its ancestors (not downward).
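The mechanism can be modeled in a few lines. This is a toy sketch, not AngularJS source: ToyScope and its methods are hypothetical names, and features such as stopPropagation and the event object are left out. It rests on one fact about AngularJS 1.x, namely that $on appends the listener to a per-scope list kept under the event name, so every listener registered for a name is called.

```javascript
// Toy model of a scope tree with per-scope listener lists.
class ToyScope {
  constructor(parent = null) {
    this.parent = parent;
    this.children = [];
    this.listeners = {}; // event name -> array of listener functions
    if (parent) parent.children.push(this);
  }

  // Register a listener; several listeners per name are allowed.
  on(name, fn) {
    if (!this.listeners[name]) this.listeners[name] = [];
    this.listeners[name].push(fn);
  }

  // Call every listener registered on this scope for the given name.
  fire(name, args) {
    for (const fn of this.listeners[name] || []) fn(...args);
  }

  // Upward: this scope first, then each ancestor up to the root.
  emit(name, ...args) {
    for (let scope = this; scope; scope = scope.parent) scope.fire(name, args);
  }

  // Downward: this scope first, then all descendants, depth first.
  broadcast(name, ...args) {
    this.fire(name, args);
    for (const child of this.children) child.broadcast(name, ...args);
  }
}

const log = [];
const root = new ToyScope();
const child = new ToyScope(root);
const grandchild = new ToyScope(child);

root.on("ping", () => log.push("root"));
child.on("ping", () => log.push("child-1"));
child.on("ping", () => log.push("child-2"));
grandchild.on("ping", () => log.push("grandchild"));

child.emit("ping");      // reaches child, then root
child.broadcast("ping"); // reaches child, then grandchild
console.log(log.join(" "));
```

Note that neither call reaches scopes in the "wrong" direction: emit from child never touches grandchild, and broadcast from child never touches root.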
I'd leave this in the comment section if my reputation allowed it, but I just started becoming an active member on StackOverflow...
Anyway, I myself am not too familiar with implementing Broadcast/Emit functionality within Angular but I did happen to come across, what I thought, was a very well written article on the matter. Hope this helps!
http://www.oakwoodsys.com/angularjs-using-emit-broadcast-open-controller-communication/

Webkit GTK: Determine when a document is finished loading

There are other questions on StackOverflow which are close to what I want to know, like Webkit GTK :: How to detect when a download has finished?, but I think I'm asking something a bit different:
In general, in the event-driven C Webkit-GTK API there are a lot of events which may relate to the idea of when some document is finished "loading". The problem is the documentation is pretty sparse, and the idea of "finished loading" isn't necessarily clear, because it can refer to a lot of things. Does "finished loading" mean that the document is finished downloading? That it's finished creating the DOM tree? That it's finished downloading including all other resources (like CSS, JS and image files?)
Relevant signals are signal::notify::load-status, document-load-finished, and resource-load-finished.
The load-status signal fires every time the load status changes, so you need to manually call webkit_web_view_get_load_status and check the status each time. Even so, when the status finally is WEBKIT_LOAD_FINISHED, I'm not sure what that means: does it mean WebKit is done downloading the resource, or that it's finished creating the DOM tree, or what?
Question:
What is the difference between the various "finished" signals, and is there any signal that is equivalent to the standard Javascript DOM event window.onload?
I believe the document-load-finished signal is what you are looking for as it seems (in my opinion) to match more closely what you are trying to test for.
One way to find out which is correct would be to try each of the available ways of checking whether a document has "loaded" manually. That is, try the one I linked to above, and output a string to the terminal when the value is true. If the value is true before the page has completely displayed all of its contents, chances are that it's not the one you're after. Then move on to the next, until you've got the right one.
Other than that, I'm not really sure what else you can do, since as you mentioned, the definition isn't very clear. It's times like these I wish Gtk documentation was a little more verbose.

calling custom actions from File instead of Binary

I've found lots of examples of calling custom actions in WiX using the Binary element, but no examples where a File element was used. Can anyone give me an example?
Not strictly true about needing to run the action deferred! You can use the InstallExecute action to run all the spooled actions up to that point, including, for example file installation. After that you could schedule an immediate action which depends on the newly-installed file, which at this point will be present.
That said, if the file is going to make any changes to the machine state, then the CA really needs to be deferred in system context, so InstallExecute doesn't really buy you anything.

FindFirstChangeNotification API

I am using the FindFirstChangeNotification API to monitor the changes happening in a particular folder. But how can I exclude change notifications for one particular file (present in the watched folder) only?
It works at the directory level. If you want to exclude a specific file, then just ignore any notifications about it in your application logic.
Use ReadDirectoryChangesW(), which monitors a directory and, optionally, its whole subtree. It does basically the same job as FindFirstChangeNotification and FindNextChangeNotification, but it is more powerful: the buffer it fills tells you which file changed and why it changed, so you can filter in your application logic without the overhead of further system calls to find out which file changed. You get an array of these structures:
typedef struct _FILE_NOTIFY_INFORMATION {
    DWORD NextEntryOffset;
    DWORD Action; // <- reason for the change
    DWORD FileNameLength;
    WCHAR FileName[1];
} FILE_NOTIFY_INFORMATION, *PFILE_NOTIFY_INFORMATION;
FindNextChangeNotification is more like a sledgehammer: you still need to check the folder to see what exactly changed, but it is easier to use if you already know which file to hunt for. FindFirst/FindNext are also slightly easier to use in terms of thread waiting and I/O completion logic.
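Filtering one file out of that array of structures could look roughly like the sketch below. It is JavaScript rather than Win32 C and runs on a hand-built buffer instead of one filled by the real API, so it only models the record layout: NextEntryOffset is the byte distance to the next record (0 on the last one), FileNameLength is in bytes, and FileName is UTF-16 without a terminating NUL. makeRecord, parseNotifications and the file names are hypothetical.

```javascript
const FILE_ACTION_MODIFIED = 3; // value of the Win32 constant

// Build one record laid out like FILE_NOTIFY_INFORMATION (little-endian).
function makeRecord(action, name, isLast) {
  const nameBytes = Buffer.from(name, "utf16le");
  const size = Math.ceil((12 + nameBytes.length) / 4) * 4; // DWORD-aligned
  const rec = Buffer.alloc(size);
  rec.writeUInt32LE(isLast ? 0 : size, 0); // NextEntryOffset
  rec.writeUInt32LE(action, 4);            // Action
  rec.writeUInt32LE(nameBytes.length, 8);  // FileNameLength, in bytes
  nameBytes.copy(rec, 12);                 // FileName, not NUL-terminated
  return rec;
}

// Walk the chain of records by following NextEntryOffset.
function parseNotifications(buf) {
  const out = [];
  let offset = 0;
  for (;;) {
    const next = buf.readUInt32LE(offset);
    const action = buf.readUInt32LE(offset + 4);
    const nameLen = buf.readUInt32LE(offset + 8);
    const name = buf.toString("utf16le", offset + 12, offset + 12 + nameLen);
    out.push({ action, name });
    if (next === 0) break; // last record in the buffer
    offset += next;
  }
  return out;
}

// Two simulated change records; the first is the file to be excluded.
const buffer = Buffer.concat([
  makeRecord(FILE_ACTION_MODIFIED, "ignore.tmp", false),
  makeRecord(FILE_ACTION_MODIFIED, "report.txt", true),
]);

const excluded = "ignore.tmp";
const relevant = parseNotifications(buffer).filter(
  n => n.name.toLowerCase() !== excluded // Windows file names are case-insensitive
);
console.log(relevant);
```

The point of the design is that the exclusion is a plain string comparison on data the notification already carries, with no extra directory scan.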
