I have an issue whilst streaming to a file; I'm sure there is a simple solution, but I'm struggling to find it! What I'm attempting to do is very straightforward: I'm taking the contents of a class, serializing it into XML and streaming it to a file. The code I'm using is:
ObservableCollection<RackItemViewModel> rackVMs = ProjectTree.GetTreeData();
XmlSerializer serializer = new XmlSerializer(typeof(RackItem));

using (TextWriter tw = new StreamWriter(filename, false))
{
    foreach (RackItemViewModel VM in rackVMs)
        serializer.Serialize(tw, VM.RackItem);
}
ProjectTree.GetTreeData() just returns the data to be serialized. If I run the program and save the data, it all works as expected: the data is saved and can be read back with Deserialize. The problem I'm having is when I perform more than one save. If I save one set of data to one file and then another set of data to another file, the first file is correct, but the second file is a concatenation of the first file and the second! It seems that either the stream or the XmlSerializer is not releasing its contents between saves. I've tried using WriteFile instead of a stream and I still get the same issue; I've also tried flushing and closing the stream, but this has no effect. Needless to say, if I close and restart the application between saves it all works fine.
Could anyone tell me what I'm doing wrong please?
Before writing to the new file, try flushing the stream with tw.Flush().
I thought I'd tidy up this thread as I've managed to solve the problem. It turns out it was nothing to do with the serializing or streaming of the data. The data buffer being written wasn't fully released between writes: I was checking the view model object, which was OK, but the object actually being written (RackItem) wasn't following suit. Silly error on my part. Thanks for the suggestions.
We are planning to use Flink to process a stream of data from a Kafka topic (logs in JSON format).
But for that processing, we need to use input files which change every day, and the information within them can change completely (not the format, but the contents).
Each time one of those input files changes, we will have to reload it into the program while keeping the stream processing going.
Reloading the data could be done the same way it is done now:
DataSet<String> globalData = env.readTextFile("file:///path/to/file");
But so far I couldn't find examples or come up with a way to trigger that reload in a stream processing job.
As extra information, we won't be using HDFS but the local filesystem on each node, so the reload will have to be done on each node, from the local file.
This is because the only reason we would need HDFS would be for these input files, which are just 100 MB in total, and using HDFS would be overkill.
So far I have been experimenting with RichMapFunction, trying to find a Kafka topic that would provide this functionality (reload files) and trying to find examples of this, with no luck.
Edit:
After reading a lot more, I found in several places that this is the way to go: the DataArtisans examples.
Trying to write simple code that would apply a simple change to a stream from a control stream, I ended up with the following:
public class RichCoFlatMapExample extends EventTimeJoinHelper {

    private String config_source_path = "NOT_INITIALIZED";

    @Override
    public void open(Configuration conf) {
        config_source_path = "first_file_path";
    }

    public abstract void processElement1(String one, String two, Collector<String> out) {
        config_source_path = one;
    }

    public abstract void processElement2(String one, String two, Collector<String> out) {
        String three = two + config_source_path;
        out.collect(three);
    }
}
The problem I'm having now is that, no matter what I try, I get the following error:
Class 'RichCoFlatMapExample' must either be declared abstract or implement abstract method 'processElement1(String, String, Collector)' in 'RichCoFlatMapExample'
The problem is, the requested method is implemented, but I can't make the methods "abstract" in a non-abstract class (I get an error from the IDE).
If I make the class RichCoFlatMapExample abstract, I won't be able to call it from Flink methods (DataStream methods).
I'm not sure what is happening, but I think this must be close. I will keep trying and update if I make this work.
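For reference, this is roughly how I understand a plain RichCoFlatMapFunction would have to be declared (just a sketch, untested; the class name ConfigAwareFlatMap is mine and the state handling is deliberately naive):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// First input: control stream carrying the path of the file to (re)load.
// Second input: the data stream to enrich.
public class ConfigAwareFlatMap extends RichCoFlatMapFunction<String, String, String> {

    private String configSourcePath = "NOT_INITIALIZED";

    @Override
    public void open(Configuration conf) {
        configSourcePath = "first_file_path";
    }

    @Override
    public void flatMap1(String control, Collector<String> out) {
        // An element on the control stream updates the path (a reload could be triggered here).
        configSourcePath = control;
    }

    @Override
    public void flatMap2(String value, Collector<String> out) {
        // Elements of the data stream are enriched with the current configuration.
        out.collect(value + configSourcePath);
    }
}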
Flink can monitor a directory and ingest files when they are moved into that directory; maybe that's what you are looking for. See the PROCESS_CONTINUOUSLY option for readFile in the documentation.
However, if the data is in Kafka, it would be much more natural to use Flink's Kafka consumer to stream the data directly into Flink. There is also documentation about using the Kafka connector. And the Flink training includes an exercise on using Kafka with Flink.
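A rough sketch of how that could look (the exact Kafka connector and schema class names vary between Flink versions, so treat this as an outline rather than a drop-in job):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class MonitoredConfigJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Control stream: re-scan the path every 60 seconds and re-emit files that changed.
        String configPath = "file:///path/to/config";
        DataStream<String> configStream = env.readFile(
                new TextInputFormat(new Path(configPath)),
                configPath,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                60_000L);

        // Data stream: JSON log lines consumed directly from the Kafka topic.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "log-processor");
        DataStream<String> logStream = env.addSource(
                new FlinkKafkaConsumer<>("logs", new SimpleStringSchema(), props));

        // The two streams can then be combined, for example:
        // logStream.connect(configStream).flatMap(new MyCoFlatMapFunction()).print();

        env.execute("Continuously monitored config file");
    }
}

Note that with PROCESS_CONTINUOUSLY a modified file is re-processed in its entirety, so downstream logic should replace (rather than append to) whatever it built from the previous version of the file.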
I'm trying to record how a file changes over time, down to the smallest details, which means that reading changes only on each file save wouldn't give sufficient data for my use case; it needs to be per keystroke.
I just tried out inotify, but it can only notify me on file save, not on each modification.
I then realized (I'm quite inexperienced with file-system stuff) that this is because text editors use in-memory buffers to store yet-to-happen changes, committing the contents of the buffer on file save (I think, at least).
So, it seems to me that I could
read the buffer of a text-editor (which seems like it would have to be code specific to each particular editor; not viable)
force a save to the file on every keypress (which again seems like it would require editor-specific code)
monitor the keypresses of the user, store my own buffer of them, and then once the file is saved, correlate the keypresses with the diff (this seems way too hard and prone to error)
read the contents of the file on an interval faster than a person is likely to press any keys (this is hacky, and ridiculous)
I can't think of any other ways. It seems that to properly get the behavior I want I'd need to have editing occur within the terminal, or within a web form. I'd love to hear other ideas though, or a potential editor-agnostic solution to this problem. Thank you for your time.
I've been trying to use the Log class to capture some strange device-specific failures using local storage. When I went into the Log class and traced the code, I noticed what seems to be a bug.
When I call the p(String) method, it calls getWriter() to get the 'output' instance of the Writer. It notices that output is null, so it calls createWriter() to create it. Since I haven't set a file URL, the following code gets executed:
if (getFileURL() == null) {
    return new OutputStreamWriter(Storage.getInstance().createOutputStream("CN1Log__$"));
}
On the Simulator, I notice this file is created and contains log info.
So in my app, I want to display the logs after an error is detected (to debug). I call getLogContent() to retrieve it as a string, but it does some strange things:
if (instance.isFileWriteEnabled()) {
    if (instance.getFileURL() == null) {
        instance.setFileURL("file:///" + FileSystemStorage.getInstance().getRoots()[0] + "/codenameOne.log");
    }
    Reader r = new InputStreamReader(FileSystemStorage.getInstance().openInputStream(instance.getFileURL()));
The main problem I see is that it's using a different file URL than the default writer location, and since the creation of the Writer didn't set the file URL, the getLogContent() method will never see the logged data. (The other issue I have is a style issue: a method that gets content shouldn't be persistently setting the location of that content for the instance, but that's another story.)
As a workaround, I think I can just call getLogContent() at the beginning of the application, which should set the file URL correctly in a place that it will be retrieved from later. I'll test that next.
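Something along these lines is what I have in mind (untested; the class and method names around the Log calls are just placeholders):

import com.codename1.io.Log;

public class MyApp {
    public void init(Object context) {
        // Calling this early forces Log to set its file URL, so later writes
        // and reads should end up using the same location.
        Log.getLogContent();
    }

    void onError(Throwable t) {
        Log.p("Something went wrong: " + t);
        String logs = Log.getLogContent();  // should now contain the entries written above
        // ... show the logs in a dialog, send them to a server, etc. ...
    }
}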
In the meantime, is this a bug, or is it functionality I don't understand from my user perspective?
It's more like "unimplemented functionality". This specific API dates back to LWUIT.
The main problem with that method is that we are currently writing into the log file, and getting the contents of a file that we might be in the middle of writing can be a problem and might actually cause a failure. So this approach was mostly abandoned in favour of the more robust crash protection approach.
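For completeness, wiring up the crash protection approach usually amounts to something like this in the app's init (a sketch, assuming the crash protection API is unchanged; the surrounding class is a placeholder):

import com.codename1.io.Log;

public class MyApp {
    public void init(Object context) {
        // Have crashes and logged errors reported back to the developer instead
        // of reading the log file from inside the running app.
        Log.bindCrashProtection(true);
    }

    void riskyOperation() {
        try {
            // ... device specific code that might fail ...
        } catch (Exception e) {
            Log.e(e);  // log the exception; with crash protection bound it can reach the developer
        }
    }
}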
Just this: I'd like to stream uploaded files directly from the net to the filesystem to avoid out-of-memory errors. Can I do it with CakeRequest::input()? Is there any other way?
Maybe it's best to read the API documentation for CakeRequest::input(), or the source:
http://api.cakephp.org/2.3/source-class-CakeRequest.html#876
According to the source, input() reads directly from php://input via the _readInput() method.
However, if I read that part of the source code correctly, it will read the entire stream in memory before returning its content. So I don't think this will give you what you want.
There may be other solutions, maybe a plugin exists. However, you may write your own implementation, using the CakeRequest as an example?
You may also check the HttpSocket class.
I found myself passing InputStream/OutputStream objects around my application modules.
I'm wondering if it's better to:
- save the content to disk and pass something like a Resource between the various method calls, or
- use a byte[] array instead of having to deal with streams every time.
What's your approach in these situations? Thanks
Edit:
I have a Controller that receives a file uploaded by the user. I have a utility module that provides some functionality to render a file:
utilityMethod(InputStream is, OutputStream os)
The file in the InputStream is the one uploaded by the user; os is the stream associated with the response. I'm wondering if it's better to have the utility method save the generated file to a .tmp file and return the file path (or a byte[], etc.) and have the controller deal with the OutputStream directly.
I try to keep as much in RAM as possible (mostly for performance reasons, and RAM is cheap). So I'm using a FileBackedBuffer to "save" data of unknown size. It has a limit: when fewer than limit bytes are written to it, it keeps them in an internal buffer; if more data is written, I create the actual file. This class has methods to get an InputStream and an OutputStream from it, so the calling code isn't bothered with the petty details.
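A minimal sketch of what such a buffer can look like (everything here, including the name FileBackedBuffer, is illustrative; a real implementation also has to deal with flushing, concurrency and temp-file cleanup):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Keeps written data in memory until it exceeds the limit, then spills everything to a temp file.
public class FileBackedBuffer {

    private final int limit;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;        // created lazily once the limit is exceeded
    private OutputStream fileOut;

    public FileBackedBuffer(int limit) {
        this.limit = limit;
    }

    // Stream handed to producers; transparently switches to the file once the limit is hit.
    // The producer should close it before anyone calls getInputStream().
    public OutputStream getOutputStream() {
        return new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                write(new byte[] { (byte) b }, 0, 1);
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                if (fileOut == null && memory.size() + len > limit) {
                    spillToFile();
                }
                if (fileOut != null) {
                    fileOut.write(b, off, len);
                } else {
                    memory.write(b, off, len);
                }
            }

            @Override
            public void close() throws IOException {
                if (fileOut != null) {
                    fileOut.close();
                }
            }
        };
    }

    // Stream handed to consumers, backed by whichever store the data ended up in.
    public InputStream getInputStream() throws IOException {
        if (spillFile != null) {
            return new BufferedInputStream(new FileInputStream(spillFile));
        }
        return new ByteArrayInputStream(memory.toByteArray());
    }

    private void spillToFile() throws IOException {
        spillFile = File.createTempFile("buffer", ".tmp");
        fileOut = new BufferedOutputStream(new FileOutputStream(spillFile));
        memory.writeTo(fileOut);  // copy what was buffered in memory so far
        memory = null;            // not needed anymore
    }
}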
The answer actually depends on the context of the problem, which we don't know.
So, imagining the most generic case, I would create two abstractions. The first abstraction would take InputStream/OutputStream as parameters, whereas the other would take byte[].
The one that takes streams can read the data and pass it to the byte[] implementation. So now your users can use both the stream abstraction and the byte[] abstraction based on their needs/comfort.
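For example (all names here are made up for illustration), the two layers could look like this:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class FileRenderer {

    // byte[]-based abstraction: works entirely in memory.
    public static byte[] render(byte[] input) {
        // ... the actual rendering work happens here ...
        return input;
    }

    // Stream-based abstraction: reads the input, delegates to the byte[] version,
    // and writes the result to the output stream.
    public static void render(InputStream in, OutputStream out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        out.write(render(buffer.toByteArray()));
    }
}

Callers with small payloads can use the byte[] variant directly, while a controller can simply hand the upload's InputStream and the response's OutputStream to the stream variant.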