Building a Route to output differences of two files - apache-camel

I'm trying to put together a file diff route... could someone help? Here is what I have:
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter(",");

from("file:inputdir?delete=true&sortBy=ignoreCase:file:name")
    .unmarshal(csv)
    .pollEnrich("file:backup?fileName=test.csv&sendEmptyMessageWhenIdle=true")
    .unmarshal(csv)
    // Need to aggregate here!!!!
    .log("test");
A csv file gets dropped in the /input directory and then a backup file is consumed from the /backup directory. I would like to compare these two files and output the difference.

This is not a Camel-specific problem. To solve it, you can implement the diff functionality yourself, or you can use an existing library such as java-diff-utils.
Pseudocode:
// read file 1 into a list "list1"
// read file 2 into a list "list2"
// use java-diff-utils to calculate the difference
Patch patch = DiffUtils.diff(list1, list2);
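For example, here is a minimal, standalone sketch of that pseudocode. It assumes java-diff-utils 1.3.x (package difflib) on the classpath, and the two file paths are placeholders; inside the route, the same logic could live in a Processor or in the AggregationStrategy of the pollEnrich step:
import difflib.Delta;
import difflib.DiffUtils;
import difflib.Patch;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CsvDiff {
    public static void main(String[] args) throws Exception {
        // read file 1 and file 2 into line lists (placeholder paths)
        List<String> list1 = Files.readAllLines(Paths.get("input/test.csv"));
        List<String> list2 = Files.readAllLines(Paths.get("backup/test.csv"));

        // use java-diff-utils to calculate the difference
        Patch<String> patch = DiffUtils.diff(list1, list2);

        // each Delta describes one inserted, deleted or changed chunk of lines
        for (Delta<String> delta : patch.getDeltas()) {
            System.out.println(delta);
        }
    }
}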

Related

How to read a text file from resources without javaClass

I need to read a text file with readLines() and I've already found this question, but the code in the answers always uses some variation of javaClass; it seems to work only inside a class, while I'm using just a simple Kotlin file with no declared classes. Writing it like this is correct syntax-wise but it looks really ugly and it always returns null, so it must be wrong:
val lines = object {}.javaClass.getResource("file.txt")?.toURI()?.toPath()?.readLines()
Of course I could just specify the raw path like this, but I wonder if there's a better way:
val lines = File("src/main/resources/file.txt").readLines()
Thanks to this answer for providing the correct way to read the file. Currently, reading files from resources without using javaClass or similar constructs doesn't seem to be possible.
// use this if you're inside a class
val lines = this::class.java.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
// use this otherwise
val lines = object {}.javaClass.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
According to other similar questions I've found, the second way might also work within a lambda, but I haven't tested it. Notice the ?. operator and the lines?.let {} syntax needed from this point onward, because getResourceAsStream() returns null if no resource is found with the given name.
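For illustration, a minimal sketch of that null-safe handling (the resource name file.txt is just the example used above):
// getResourceAsStream() returns null when the resource is missing,
// so the result stays nullable and is consumed inside let
val lines = object {}.javaClass.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
lines?.let { content ->
    content.forEach { println(it) }
} ?: println("file.txt not found on the classpath")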
Kotlin doesn't have its own means of getting a resource, so you have to use Java's method Class.getResource. You should not assume that the resource is a file (i.e. don't use toPath) as it could well be an entry in a jar, and not a file on the file system. To read a resource, it is easier to get the resource as an InputStream and then read lines from it:
val lines = this::class.java.getResourceAsStream("file.txt").bufferedReader().readLines()
I'm not sure if my response answers your exact question, but perhaps you could do something like this:
I'm guessing that in the final use case the file names would be dynamic, not statically declared. In that case, if you have access to (or know the path to) the folder, you could do something like this:
// Create an extension function on the String class to retrieve the list of
// files available within a folder. No check is made here that the receiver
// actually points to a directory; listFiles() simply returns null if it doesn't.
fun String.getFilesInFolder(): Array<out File>? = File(this).listFiles()
// Call the extension function on the String folder path wherever required
fun retrieveFiles(): Array<out File>? = [PATH TO FOLDER].getFilesInFolder()
Once you have a reference to the Array<out File> object, you could do something like this:
// Create an extension function to read a file's lines
fun File.retrieveContent() = readLines()
// You can further expand this use case to conditionally return readLines()
// or the entire file content via a buffered reader, or to convert the file
// content to a data class through GSON/whatever.
// You can also use generic constraints; see this article for the possibilities:
// https://kotlinlang.org/docs/generics.html#generic-constraints
// Then simply call this extension function after retrieving the files in the folder.
listOfFiles?.forEach { singleFile -> println(singleFile.retrieveContent()) }
To have the same URL work both from a JAR and locally, the URL (or path) needs to be a relative path from the repository root.
It is not the location of your file or folder from your src folder, such as "/main/resources/your-folder/" or "/client/notes/somefile.md".
It must be a relative path from the repository root, i.e. "src/main/resources/your-folder/" or "src/client/notes/somefile.md".
Now you get the drill, and luckily for IntelliJ IDEA users, you can get the correct path with a right-click on the folder or file -> Copy Path/Reference... -> Path From Repository Root (this is it).
Last, paste it and do your thing.

How to read config files on elixir mix project

I am creating an elixir project to search for patterns in files.
I want to store those patterns in config files to allow for easy changes in the app.
My first idea is storing those files as exs files in the config folder in the mix project.
So, the questions are:
Is there an easy way to store the config in the files as a keyword list?
How would I load it in the app?
I see there are modules like File to read a file, but is there no standard way to parse keyword lists in Elixir? I was thinking of something similar to the YAML files in Rails.
You can read keyword lists stored in a *.exs file, using Mix.Config.read(path). For writing Elixir terms to a *.exs file, you can use Inspect.Algebra.to_doc(%Inspect.Opts{pretty: true}) and write the resulting string content to a file using File.write. It's not as well formatted as if you did it by hand, but it's definitely still readable.
If you don't mind using Erlang terms, you can read and write those easily using :file.consult(path) and :file.write_file(path, :io_lib.fwrite('~p.\n', [config])) respectively.
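For illustration, a minimal sketch of that Erlang-term round trip (the file name patterns.config and the keys are made up for this example):
# write a keyword list as a single Erlang term, terminated by a dot
config = [patterns: ["TODO", "FIXME"], case_sensitive: false]
:ok = :file.write_file("patterns.config", :io_lib.fwrite('~p.~n', [config]))

# read it back; :file.consult/1 returns {:ok, terms}, one entry per term in the file
{:ok, [loaded]} = :file.consult("patterns.config")
IO.inspect(loaded[:patterns])   # => ["TODO", "FIXME"]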
Using Code.eval_file
Another option is to evaluate the file as a code file, using Code.eval_file, and get back the result as an Elixir construct.
Config file config1.ex:
%{configKey1: "configValue1", configKey2: "configValue2"}
Reading the file:
{content, _} = Code.eval_file("config1.ex")
*Evaluating a code file has security considerations that need to be kept in mind.
Regarding the use of Mix.Config.read! in #bitwalker's correct answer:
The config file needs to be in a specific format:
[
appName: [key1: "val1", key2: "val2"]
]
In the Mix.Config.read code, it tries to validate the contents and expects a top-level keyword list [ {}, {}, ... ] whose keys have values that are themselves keyword lists.
The code is not long:
def validate!(config) do
  if is_list(config) do
    Enum.all?(config, fn
      {app, value} when is_atom(app) ->
        if Keyword.keyword?(value) do
          true
        else
          raise ArgumentError,
            "expected config for app #{inspect app} to return keyword list, got: #{inspect value}"
        end

      _ ->
        false
    end)
  else
    raise ArgumentError,
      "expected config file to return keyword list, got: #{inspect config}"
  end
end
We can circumvent this by using a first key that is not an atom; the validation then stops but does not throw:
[
  {"mockFirstKey", "mockValue"},
  myKey1: "myValue1",
  myKey2: "myValue2"
]

boost log every hour

I'm using Boost.Log and I want to implement a basic log-file principle: a new error log at the beginning of each hour (if errors exist), named like "file_%Y%m%d%H.log".
I have 2 problems with this boost library:
1. How to rotate file at the beginning of each hour?
This isn't possible with the rotation_at_time_interval parameter, because it creates the new file relative to the first record written to the file, so the hour in the file name doesn't follow that rule. Is it possible to have multiple rotation_at_time_point values for one file in a sink, or is there some other solution?
2. When the file exceeds some size, I want it to start a new file and in that case append an index to the file name. With the rotation_size parameter and %N in the file name, N keeps incrementing the whole time the application is running. I want that N to be reset at the beginning of each hour, just as my file name changes. Does anybody have an idea how to do that with the Boost.Log library?
This is a basic principle of creating log files in industry; I really don't understand why it can't be done with a library dedicated to creating log files.
The library itself doesn't provide a way to rotate the file at the beginning of every hour, but I had the same problem, so I used a function wrapper that returns true at the beginning of every hour.
I find this way better for me, because I can control the efficiency of the code.
from boost.org:
bool is_it_time_to_rotate();

void init_logging()
{
    boost::shared_ptr< sinks::text_file_backend > backend =
        boost::make_shared< sinks::text_file_backend >(
            keywords::file_name = "file_%5N.log",
            keywords::time_based_rotation = &is_it_time_to_rotate
        );
}
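The body of is_it_time_to_rotate isn't shown above; as one possible (assumed, not from the original answer) implementation, the predicate can remember the hour of the previously written record and request a rotation whenever the hour changes:
#include <ctime>

// Returns true for the first record written in a new hour, so the backend
// rotates the file roughly at the top of every hour. Not thread-safe as
// written; guard the static with a mutex if several threads log concurrently.
bool is_it_time_to_rotate()
{
    static int last_hour = -1;
    std::time_t now = std::time(0);
    std::tm local = *std::localtime(&now);

    if (local.tm_hour != last_hour)
    {
        bool rotate = (last_hour != -1); // don't rotate before the very first record
        last_hour = local.tm_hour;
        return rotate;
    }
    return false;
}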
As for the second question, I don't really understand it well.

Importing Excel file with dynamic name into SQL table via SSIS?

I've done a few searches here, and while some issues are similar, they don't seem to be exactly what I need.
What I'm trying to do is import an Excel file into a SQL table via SSIS, but the problem is that I will never know the exact filename. We get files at no steady interval, and the file usually has a date/month in the name. For instance, our current file is "Census Data - May 2013.xls". We will only ever load ONE file at a time, so I don't need to loop through a directory for multiple Excel files.
My concept is that I can take this file, copy it to a "Loading" directory, and load it from there. At the start of the package, I will first clear out the loading directory, then scan the original directory for an Excel file, copy it to the loading directory and then load it into SQL. I suppose I may have to store the file names somewhere so I don't copy the same file into the loading directory in subsequent months, but I'm not really sure of the best way to handle that.
I've pretty much got everything down except the part that scans the directory for the Excel file and copies it to the loading directory. I've taken the majority of my info from this page, which (again) is close to what I want to do but not quite exactly the solution I need.
Can anyone get me over the finish line? I can't seem to get the Excel Connection Manager right (this is my first time using variables), and I can't figure out how to get the file into the Loading directory.
Problem statement
How do I dynamically identify a file name?
You will require some mechanism to inspect the contents of a folder and see what exists. Specifically, you are looking for an Excel file in your "Loading" directory. You know the file extension and that is it.
Resolution A
Use a ForEach File Enumerator.
Configure the Enumerator with an Expression on FileSpec of *.xls or *.xlsx depending on which flavor of Excel you're dealing with.
Add another Expression on Directory to be your Loading directory.
I typically create SSIS Variables named FolderInput and FileMask and assign those in the Enumerator.
Now when you run your package, the Enumerator is going to look in Directory and find all the files that match the FileSpec.
Something needs to be done with what is found. You need to use the file name that the Enumerator returns. That's done through the Variable Mappings tab. I created a third Variable called CurrentFileName and assigned it the result of the enumerator.
If you put a Script Task inside the ForEach Enumerator, you should be able to see that the value in the "Locals" window for #[User::CurrentFileName] has updated from the Design time value of whatever to the "real" file name.
Resolution B
Use a Script Task.
You will still need to create a Variable to hold the current file name and it probably won't hurt to also have the FolderInput and FileMask Variables available. Set the former as ReadWrite and the latter as ReadOnly variables.
Choose the .NET language of your choice; I'm using C#. The method System.IO.Directory.EnumerateFiles does the work of finding the file:
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
namespace ST_fe2ea536a97842b1a760b271f190721e
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            string folderInput = Dts.Variables["User::FolderInput"].Value.ToString();
            string fileMask = Dts.Variables["User::FileMask"].Value.ToString();
            try
            {
                var files = Directory.EnumerateFiles(folderInput, fileMask, SearchOption.AllDirectories);
                foreach (string currentFile in files)
                {
                    Dts.Variables["User::CurrentFileName"].Value = currentFile;
                    break;
                }
            }
            catch (Exception e)
            {
                Dts.Events.FireError(0, "Script overkill", e.ToString(), string.Empty, 0);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
    }
}
Decision tree
Given the two resolutions to the above problem, how do you choose? Normally, people say "It Depends", but the only time it would really depend is if the process should stop/error out when more than one file exists in the Loading folder. That's a case where the ForEach Enumerator would be more cumbersome than a Script Task. Otherwise, as I stated in my original response, it adds cost to your project for development, testing and maintenance for no appreciable gain.
Bits and bobs
Further addressing nuances in the question: configuring Excel. You'll need to be more specific about what isn't working. Both Siva's SO answer and the linked blogspot article show how to use the value of the Variable I call CurrentFileName to ensure the Excel file points to the "right" file.
You will need to set the DelayValidation to True for both the Connection Manager and the Data Flow as the design-time value for the Variable will not be valid when the package begins execution. See this answer for a longer explanation but again, Siva called that out in their SO answer.

How do I get a temporary File object (of correct content-type, without writing to disk) directly from a ZipEntry (RubyZip, Paperclip, Rails 3)?

I'm currently trying to attach image files to a model directly from a zip file (i.e. without first saving them on a disk). It seems like there should be a clearer way of converting a ZipEntry to a Tempfile or File that can be stored in memory to be passed to another method or object that knows what to do with it.
Here's my code:
def extract(file = nil)
  Zip::ZipFile.open(file) { |zip_file|
    zip_file.each { |image|
      photo = self.photos.build
      # photo.image = image # this doesn't work
      # photo.image = File.open image # also doesn't work
      # photo.image = File.new image.filename
      photo.save
    }
  }
end
But the problem is that photo.image is an attachment (via paperclip) to the model, and assigning something as an attachment requires that something to be a File object. However, I cannot for the life of me figure out how to convert a ZipEntry to a File. The only way I've seen of opening or creating a File is to use a string to its path - meaning I have to extract the file to a location. Really, that just seems silly. Why can't I just extract the ZipEntry file to the output stream and convert it to a File there?
So the ultimate question: Can I extract a ZipEntry from a Zip file and turn it directly into a File object (or attach it directly as a Paperclip object)? Or am I stuck actually storing it on the hard drive before I can attach it, even though that version will be deleted in the end?
UPDATE
Thanks to blueberry fields, I think I'm a little closer to my solution. Here's the line of code that I added, and it gives me the Tempfile/File that I need:
photo.image = zip_file.get_output_stream image
However, my Photo object won't accept the file that's getting passed, since it's not an image/jpeg. In fact, checking the content_type of the file shows application/x-empty. I think this may be because getting the output stream seems to append a timestamp to the end of the file, so that it ends up looking like imagename.jpg20110203-20203-hukq0n. Edit: Also, the tempfile that it creates doesn't contain any data and is of size 0. So it's looking like this might not be the answer.
So, next question: does anyone know how to get this to give me an image/jpeg file?
UPDATE:
I've been playing around with this some more. It seems an output stream is not the way to go, but rather an input stream (which one is which has always kind of confused me). Using get_input_stream on the ZipEntry, I get the binary data in the file. I think now I just need to figure out how to get this into a Paperclip attachment (as a File object). I've tried pushing the ZipInputStream directly to the attachment, but of course, that doesn't work. I really find it hard to believe that no one has tried to cast an extracted ZipEntry as a File. Is there some reason this would be considered bad programming practice? It seems to me that skipping the disk write for a temp file would be perfectly acceptable and supported in something like Zip archive management.
Anyway, the question still stands:
Is there a way of converting an Input Stream to a File object (or Tempfile)? Preferably without having to write to a disk.
Try this
Zip::ZipFile.open(params[:avatar].path) do |zipfile|
  zipfile.each do |entry|
    filename = entry.name
    basename = File.basename(filename)
    tempfile = Tempfile.new(basename)
    tempfile.binmode
    tempfile.write entry.get_input_stream.read

    user = User.new
    user.avatar = {
      :tempfile => tempfile,
      :filename => filename
    }
    user.save
  end
end
Check out the get_input_stream and get_output_stream messages on ZipFile.
