Conditional ETL in Camel based on matching .md5

I've looked through the docs for a way to use Camel for ETL just as in the site's examples, except with these additional conditionals based on an md5 match.
Like the Camel example, myetl/myinputdir would be monitored for any new file, and if one is found, the file ${filename} would be processed.
Except it would first wait for ${filename}.md5 to show up, which would contain the correct md5. If ${filename}.md5 never showed up, it would simply ignore the file until it did.
And if ${filename}.md5 did show up but the md5 didn't match, the file would still be processed, but with an error condition.
I found suggestions to use crypto for the matching, but I haven't figured out how to ignore the file until the matching .md5 file shows up. These two files really need to be processed as a matched pair for everything to work properly, and they may not arrive in the input directory at the exact same millisecond. Or, alternately, the md5 file might show up a few milliseconds before the data file.

You could use an aggregator to combine the two files based on their file name. If your files are suitably named, you can use the file name (without the .md5 extension) as the correlation ID, and continue the route once completionSize reaches 2. If you set groupExchanges to true, then in the next route step you have access both to the file to compute the hash value for and to the contents of the md5 file to compare the hash value against. If the md5 file or the content file never arrives within completionTimeout, you can trigger whatever action is appropriate for your scenario.
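A minimal sketch of such a route in the Camel Java DSL (assuming Camel 2.x with Java 8 lambdas; the pairKey header, the direct:verifyMd5 endpoint, and the 60-second timeout are illustrative assumptions, not from the question):

import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;

public class Md5PairRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:myetl/myinputdir")
            // Derive one correlation key shared by data.csv and data.csv.md5
            .process(exchange -> {
                String name = exchange.getIn().getHeader(Exchange.FILE_NAME, String.class);
                String key = name.endsWith(".md5")
                        ? name.substring(0, name.length() - ".md5".length())
                        : name;
                exchange.getIn().setHeader("pairKey", key);
            })
            // Hold each file until its partner arrives; grouping the exchanges
            // gives the next step both the data file and the .md5 contents
            .aggregate(header("pairKey"), new GroupedExchangeAggregationStrategy())
                .completionSize(2)
                .completionTimeout(60000) // partner never showed up: take the error path
            .to("direct:verifyMd5"); // hypothetical next route that computes and compares the hash
    }
}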

Related

How does one split large result set from a Group By into multiple flat files?

I'm far from an SSIS expert, and I'm attempting to correct an error (unspecified in the messages) that began once I modified a variable to increase the size of the data accumulated and exported into a flat file. (Note: the variable was a date in the WHERE clause that limited the data returned by the SELECT.)
So in the data flow there's a GROUP BY component, and I'm trying to find the appropriate component to put between that and the flat file destination component to chop up the results. I figured there'd be something to export, say, flatFile1.csv, flatFile2.csv, etc. based on a number of lines (so if I set a limit of 1 million lines and the results returned 3.5 million, I'd get 4 files, with the last one containing half a million lines), or perhaps a max file size with similar results.
Which component should I use from the toolbox to guarantee a manageable file size?
Is a script component the only way to handle output of any size? If so, would it sit between the Group By and the Flat File destination components, or would the script completely obviate the need for the Flat File destination?

Active Directory Query using LDAP Query in custom search

Some of the users in the domain I'm working on have no manager assigned or no job title, so I tried to create a new query with this LDAP query in the Define Query > Custom Search > Advanced tab:
(&(objectCategory=user)(objectClass=user))(|(!manager=*)(!title=*)
This returns zero results even though I know such users exist. Using the Custom Search creates the same search string and also returns zero results. Based on research elsewhere, I also tried this, which again returns zero results:
(&(objectCategory=person)(objectClass=user))(|(!manager=*)(!title=*)
What am I doing wrong?
Also, I want to search only in specific folders and their subfolders; should I prepend this:
(|(OU=Innsbruck)(OU=Totnes)(OU=Dueren))
where these are immediately below the domain, and each location has its own subfolders of Computers, Groups, and Users.
Your query is just invalid. That window doesn't tell you that - it just gives zero results.
You're missing closing parentheses, and you need to put the OR condition inside the AND condition. You also need to use (objectCategory=person), not (objectCategory=user). (Strictly speaking, you don't need objectCategory at all, since (objectClass=user) is enough to limit the search to user objects, but it doesn't hurt.)
This is what it should look like:
(&(objectCategory=person)(objectClass=user)(|(!manager=*)(!title=*)))
I will usually paste my query into Notepad++, which highlights matching parentheses, so it's easy to find missing ones. Or you can break it up over multiple lines to make it easier to read and easier to spot errors:
(&
    (objectCategory=person)
    (objectClass=user)
    (|
        (!manager=*)
        (!title=*)
    )
)
Regardless of how you search (through the Users and Computers UI or through code) you can only search one OU at a time. There is no OU attribute or any other attribute that you can use in a query to limit to specific OUs.
In the UI, you can click 'Browse' in the top right to pick the OU you want to search.
If you are doing this in code, you can do a couple of things to limit it to specific OUs:
Search each OU separately (you can optionally set the Search Scope to not search sub-OUs if you want), or
Search the whole domain, then look at the distinguishedName attribute of each result and discard the results from OUs you don't want.
Option #2 will probably perform faster, since it makes fewer network requests.
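If you go the code route, option #2 might look like this JNDI sketch (plain Java; the host, credentials, base DN, and OU names are placeholders, and the filter uses the strictly bracketed NOT form, (!(manager=*)), which strict parsers require):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class OuFilteredSearch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://dc.example.com:389");
        env.put(Context.SECURITY_PRINCIPAL, "user@example.com");
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        DirContext ctx = new InitialDirContext(env);
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE); // search the whole domain
        sc.setReturningAttributes(new String[] { "distinguishedName" });

        String filter = "(&(objectCategory=person)(objectClass=user)"
                + "(|(!(manager=*))(!(title=*))))";
        NamingEnumeration<SearchResult> results =
                ctx.search("DC=example,DC=com", filter, sc);

        while (results.hasMore()) {
            String dn = (String) results.next()
                    .getAttributes().get("distinguishedName").get();
            // Keep only results under the OUs you care about
            if (dn.contains("OU=Innsbruck") || dn.contains("OU=Totnes")
                    || dn.contains("OU=Dueren")) {
                System.out.println(dn);
            }
        }
        ctx.close();
    }
}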
It seems to me that the filter is not compliant with RFC 4515 (LDAP: String Representation of Search Filters).
Maybe AD and the tool you are using accept it, but NOT filters should be of the form (!(manager=*)).
(&(objectCategory=person)(objectClass=user)(|(!(manager=*))(!(title=*))))

SAP FM EPS2_GET_DIRECTORY_LISTING file mask

The FM EPS2_GET_DIRECTORY_LISTING has a parameter file_mask, which I guess should act as a pattern. I need to read from the application server the files containing a given word, but file_mask is not working correctly. For example, if I pass "*ZIP" it returns a file named '.TXT'. Is there a proper way to use that parameter?
The parameters are described in SAP Note 1860206, which I will not quote here because I'm not sure about its copyright status. However, wildcards generally do not work as expected in this case - your best bet is to read the directory without the parameter and filter the resulting table afterwards.
I had a similar problem, but due to the poor implementation of the standard FILE_MASK-based filtering in EPS2_GET_DIRECTORY_LISTING (e.g. the * wildcard can be used only at the end of the file mask string), I ended up with a solution where I read the entire directory content and then process it with regular expressions to find the matching files/directories.
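The directory read itself happens in ABAP via the FM, but the "filter afterwards" step is just pattern matching; here it is sketched in Java for illustration (the file names and the mask are made up):

import java.util.List;
import java.util.regex.Pattern;

public class DirectoryFilter {
    public static void main(String[] args) {
        // Pretend this list came back from EPS2_GET_DIRECTORY_LISTING without a mask
        List<String> listing = List.of("DATA1.ZIP", "NOTES.TXT", "data2.zip");
        // Keep only names ending in .zip, case-insensitively
        Pattern mask = Pattern.compile(".*\\.zip$", Pattern.CASE_INSENSITIVE);
        listing.stream()
               .filter(name -> mask.matcher(name).matches())
               .forEach(System.out::println); // prints DATA1.ZIP and data2.zip
    }
}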

Auto-generating destinations of split files in SSIS

I am working on my first SSIS package. I have a view with data that looks something like:
Loc  Data
1    asd
1    qwe
2    zxc
3    jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a Conditional Split to flat file destinations, but that would require a destination for each location. I have a lot of locations, and they will all be handled the same way other than being split into different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use a script component as a workaround?
You should be able to set an expression using a variable. Define your path up to the directory, then set the variable equal to that column.
You'll need an Execute SQL task to return a single-row result set, and loop over it in a container for every row in your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs, the expression will look like:
"C:\\Documents\\MyPath\\location" + @[User::LocationColumn] + ".txt"
It should end up feeding your directory with files according to location.
Set User::LocationColumn equal to the location column in your result set, and write your result set grouped by location, so all records for a location go to a single file.
I spent some time trying to complete this task using the method @Phoenix suggested, but I stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements, one per location plus an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string to an expression that uses a variable, and change that variable as the package runs.
By varying the variable, the destination file also changes.
The connection string expression is:
"C:\\Documents\\ABC\\Files\\" + @[User::data] + ".txt"

SSIS suitability

I'm trying to create an SSIS package to import some dataset files. However, given that I seem to be hitting a brick wall every time I achieve a small part of the task, I need to take a step back and perform a sanity check on what I'm trying to achieve. If you good people can advise whether SSIS is the way to go about this, then I would appreciate it.
These are my questions from this morning:
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate over the files in a share on the SQL Server. For each file it finds, a script task runs to check various attributes of the filename, such as looking for a three-letter code, a date in CCYYMM format, the name of the data contained therein, and optionally some comments. For example:
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values 'ABC', '201007', and 'SalesData' into variables.
I then want to move the file to an error folder if it doesn't meet certain criteria:
Three-character code
Six-character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to look up the character code, the date (or part thereof), and the dataset name against a lookup table to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example, I may have a table called 'Checklist' with these columns:
Client code  Dataset    SSIS_Package
ABC          SalesData  NorthSalesData.dtsx
DEF          SalesData  SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call it "FileName". There are (at least) two ways you can parse that - expressions or a Script Task; it depends on which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
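For instance (assuming FileName holds just the file name, not the full path), FileCode's expression could be as simple as:
SUBSTRING(@[User::FileName], 1, 3)
and FileDate's:
SUBSTRING(@[User::FileName], 5, 6)
The dataset name is where FINDSTRING comes in, since its length varies.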
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
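The Script Task itself would be C# or VB.NET, but the parsing logic carries over directly; here is a sketch in Java (the pattern is my guess at the naming convention, not something from the original post):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameParser {
    // Three-letter code, CCYYMM date, dataset name, optional comment, .csv extension
    private static final Pattern NAME =
            Pattern.compile("^([A-Z]{3})_(\\d{6})_([A-Za-z]+)(?:_(.+))?\\.csv$");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("ABC_201007_SalesData_some comment.csv");
        if (m.matches()) {
            System.out.println("FileCode    = " + m.group(1)); // ABC
            System.out.println("FileDate    = " + m.group(2)); // 201007
            System.out.println("FileDataset = " + m.group(3)); // SalesData
        } else {
            // Doesn't meet the criteria: move the file to the error folder
        }
    }
}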
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.
