How to prevent Maven Surefire from executing test methods of a test class in parallel? - maven-surefire-plugin

For our application I have written a class that does some house-keeping on received input files, i.e. it is supposed to move processed files into an "archive" folder and delete files older than a given cut-off date from that archive.
For that class I have also written some unit tests, which first create a bunch of files with appropriate attributes, then execute the house-keeping code, and finally verify that files have been properly moved or deleted and that the others still exist.
Because the unit tests manipulate "physical" files, they must not run in parallel, so that they do not interfere with each other.
How can I prevent Maven's Surefire plugin from running the test methods of this specific class in parallel?
I found a few related SO questions (usually about how to execute methods or classes in parallel), but no solution for preventing parallel execution of the methods of a test class (ideally for a single class only).
Any advice or hint?
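Not an authoritative answer, but as I understand the Surefire documentation (2.18+ with the JUnit 4.7+ provider and parallel execution configured in the pom), test classes carrying the JCIP @NotThreadSafe annotation are taken out of the parallel pool and executed in a single thread after the thread-safe tests; the jcip-annotations artifact must be on the test classpath. A minimal sketch with hypothetical class and method names:

import net.jcip.annotations.NotThreadSafe;
import org.junit.Test;

// Sketch: with Surefire 2.18+ and the JUnit 4.7+ provider, a class annotated with
// @NotThreadSafe is scheduled in a single thread after the parallel tests finish,
// so its test methods never run concurrently with each other.
@NotThreadSafe
public class ArchiveHousekeepingTest {

    @Test
    public void movesProcessedFilesIntoArchive() {
        // create input files, run the house-keeping code, then verify the moves
    }

    @Test
    public void deletesFilesOlderThanCutOff() {
        // runs in the same single thread as the method above, never in parallel with it
    }
}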

Related

SSIS: Why does sequence container require TransactionOption Required in order to report task failure?

I have a job that creates files on the network folder from various non-database sources. Within the job, I isolate the various file creation tasks (contained in a sequence container) from the move file task (foreach enumerator) in order to prevent a spider's web of precedence constraints from the various file creation tasks:
Data flow task that contains script component using C# and LDAP to pull data from Active Directory and output it to multiple files
Script Component that downloads files from SFTP (implements WinSCPNET.dll)
Upon successful completion, the sequence container then goes to a foreach file enumerator to move the extracted files to a folder that indicates files are ready for loading - there is no problem here.
However, an intermittent problem arose in production where the AD connection was terminating before the file extract process completed, resulting in partial files (this was not observed in testing, but should have been contemplated - my bad). So, I added a foreach enumerator outside of the sequence container with a failure precedence constraint to delete these partial extract files.
During testing of this fix, I set one of the tasks within the sequence container to report failure. Initially, the sequence container reported success, thus bypassing the delete foreach enumerator. I tried setting the MaximumErrorCount from 0 to 1, but that did not result in the desired behavior change. I then changed the sequence container's TransactionOption from Supported to Required and this appears to have fixed the problem. Now, the job moves files that are completely extracted while deleting files that report an error on extraction.
My question is this: Is there a potential problem going this route? I am unsure as to why this solution works. The documentation online discusses the TransactionOption in the context of a connection to the database. But, in this case there is no connection to the database. I just don't want to release a patch that may have a potential bug that I am not aware of.
Regarding Transactions and Files.
Presume you write your files to NTFS or another file system that supports transactions. Then all file-create and file-save actions are enclosed in one transaction. If the transaction fails due to a task failure, all files created inside the transaction are rolled back, i.e. deleted.
So you get an "all or nothing" behaviour for the files, receiving files only if all extractions worked out.
If you store the files on a non-transactional file system, like old FAT, this "all or nothing" no longer works and you can receive a partial set of files; a transaction set on the Sequence Container will have no such effect.

How can I synchronize different Apache Camel servers to work together on files without issues?

Our setup: Our production environment runs several instances (distinct JVMs) of Apache Camel spread over a few physical computers (each computer runs more than one JVM).
Most of our Camel routes are HTTP-based (REST or SOAP), and we have a network component that load-balances the HTTP queries, so it works fine for that purpose.
On the file side, the shared folder is a Linux NFS mount. Each of the physical computers has the same shared folder mounted.
My issue: For a "new" pattern, we have to deal with files: somebody will produce files and put them in that shared folder, and Apache Camel will have to detect the files, do some work on them, and rename them.
We have 2 uses for that pattern: one is processing a single file, the other is processing a folder containing several files.
I've tried various things, but I can't find a reliable way to ensure that one and only one Apache Camel will consume the file.
Here are some of my attempts:
Using the file component with the option readLock=markerFile. As far as I know, it worked fine for the single file (there may have been issues I'm not aware of, as the single-file case wasn't used that much), but it did not work properly for the folder (I don't remember the exact issues; I stopped trying that a year ago).
Using a timer to start a Java bean that does the work:
I tried to create my own lock files, but it turns out that this is not reliable on an NFS file system. On some rare occasions, two physical computers would both think they had succeeded in creating the lock file and process the folder, and when the rename operation happened one JVM would just hang there forever.
I tried to use a database with primary key constraints to synchronize the Camels, but I'm getting some SQL deadlock exceptions which result in processing failures (see the sketch after this question for a variation on this idea).
I'm out of ideas, and I'm worried that there is no effective way to synchronize Camels to work together with our architecture (parallel peers).
Is there a way to simulate a master/slave architecture using the same code spread over different Camels?
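Not a definitive answer, but here is a minimal sketch of the database-backed idea mentioned above, assuming Camel 3.x with camel-sql's JdbcMessageIdRepository: every JVM shares one repository table, and the file consumer uses it both as its read lock and as its idempotent repository, so only the node that wins the database insert keeps a given file. The folder path, repository name and the DataSource wiring are illustrative only.

import javax.sql.DataSource;

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.processor.idempotent.jdbc.JdbcMessageIdRepository;

public class SharedFolderConsumer {

    public static void main(String[] args) throws Exception {
        // Hypothetical helper: the database that every Camel instance can reach.
        DataSource dataSource = createSharedDataSource();

        CamelContext context = new DefaultCamelContext();

        // One JDBC-backed idempotent repository shared by every JVM: the insert of a
        // file name succeeds on exactly one node, so only that node processes the file.
        JdbcMessageIdRepository repo = new JdbcMessageIdRepository(dataSource, "sharedFolderConsumer");
        context.getRegistry().bind("fileRepo", repo);

        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                from("file:/mnt/shared/inbox"             // the NFS-mounted folder (illustrative path)
                        + "?idempotent=true"
                        + "&idempotentRepository=#fileRepo"
                        + "&readLock=idempotent"          // cluster-aware read lock backed by the repository
                        + "&move=.done")                  // rename after successful processing
                    .routeId("shared-folder-consumer")
                    .log("Processing ${header.CamelFileName} on this node only");
            }
        });

        context.start();
        Thread.currentThread().join();                    // keep the JVM alive for the consumer
    }

    private static DataSource createSharedDataSource() {
        // Placeholder: wire up the DataSource for the shared database here.
        throw new UnsupportedOperationException("configure your shared DataSource");
    }
}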

Should SSIS *.dtsx files be marked as -diff in .gitattributes

I'm using git to version control SSIS packages and I know that SSIS generates some crazy XML that is going to badly confuse any merge algorithms.
I'd like to know if having the following line in my .gitattributes file is the correct thing to do:
*.dtsx -diff
I believe this will stop git from attempting to merge the file, which is what I would like.
Am I correct in thinking that this also stops git from generating deltas and therefore stores every change as a whole file? (and therefore, takes up more storage)
My repository also holds the source for the database schema and any other source files, so I'm thinking that switching the repo to fast forward only is not appropriate.
If you don't want files to be merged in git, you need to use the -merge attribute. That way you are still able to 'diff'.
We also treat packages as binaries; that does imply you will need to make the changes multiple times if you need a patch on a branch and also need it in your main tree.
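For reference, a sketch of how that could look in .gitattributes; the binary macro is git's built-in shorthand for -diff -merge -text:

# Keep textual diffs but never attempt a textual merge of SSIS packages
*.dtsx -merge
# Or treat the packages fully as binary
*.dtsx binary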

Cross-contamination between AngularJS tests

We have a configuration file that we use in our AngularJS app. Because we need our configuration information during the build phase, we define it as a value. The configuration file contains information about where to find the assets for one of several Assessments, so we have a configurationService with an updateAssessment() function that looks in the configuration file at the various Assessments that are defined and then copies the properties from the specific Assessment into another value, assessmentSettings.
We have some situations where we want to read in some additional settings from an XML file, but when those settings are already provided as part of the configuration for that Assessment, we want to ignore them. I have a test that checks that this is done, and it runs and passes.
However, massaging the project configuration value, then calling configurationService.updateAssessment(1) in that test causes 41 other tests in different files to fail. My understanding is that Angular should be torn down and brought back up for each test and should certainly not cross-contaminate across different files. Is there something different about values that would cause this to happen?
Note that the project itself seems to load and run fine. I haven't provided code examples because it would be a fair amount of code and I don't think it would be that enlightening. Angular 1.3.

Where should Map put temporary files when running under Hadoop

I am running Hadoop 0.20.1 under SLES 10 (SUSE).
My Map task takes a file and generates a few more; I then generate my results from these files. I would like to know where I should place these files so that performance is good and there are no collisions. If Hadoop can delete the directory automatically, that would be nice.
Right now I am using the temp folder and the task id to create a unique folder, and then working within subfolders of that folder:
// Build a per-task scratch folder from the configured temp dir and the unique task id
String reduceTaskId = job.get("mapred.task.id");
String reduceTempDir = job.get("mapred.temp.dir");
String myTemporaryFoldername = reduceTempDir + File.separator + reduceTaskId + File.separator;
// Work inside a named sub-folder of that per-task folder
File diseaseParent = new File(myTemporaryFoldername + REDUCE_WORK_FOLDER);
The problem with this approach is that I am not sure it is optimal; also, I have to delete each new folder or I start to run out of space.
Thanks
akintayo
(edit)
I found that the best place to keep files that you don't want beyond the life of the map task is job.get("job.local.dir"), which provides a path that will be deleted when the map task finishes. I am not sure if the delete is done on a per-key basis or for each tasktracker.
The problem with that approach is that the sort and shuffle is going to move your data away from where that data was localized.
I do not know much about your data, but the distributed cache might work well for you.
${mapred.local.dir}/taskTracker/archive/: The distributed cache. This directory holds the localized distributed cache. Thus the localized distributed cache is shared among all the tasks and jobs.
http://www.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/
"It is common for a MapReduce program to require one or more files to be read by each map or reduce task before execution. For example, you may have a lookup table that needs to be parsed before processing a set of records. To address this scenario, Hadoop’s MapReduce implementation includes a distributed file cache that will manage copying your file(s) out to the task execution nodes.
The DistributedCache was introduced in Hadoop 0.7.0; see HADOOP-288 for more detail on its origins. There is a great deal of existing documentation for the DistributedCache: see the Hadoop FAQ, the MapReduce Tutorial, the Hadoop Javadoc, and the Hadoop Streaming Tutorial. Once you’ve read the existing documentation and understand how to use the DistributedCache, come on back."
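A rough sketch of the DistributedCache usage described above, using the old 0.20 "mapred" API that the question is on; the HDFS path and class names are illustrative only:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class LookupCacheExample {

    // Driver side: register a file that the framework copies to every task node.
    public static void registerLookupFile(JobConf conf) throws IOException {
        DistributedCache.addCacheFile(URI.create("/user/akintayo/lookup/table.txt"), conf);
    }

    // Task side (e.g. in Mapper.configure): resolve the node-local copy of that file.
    public static Path localLookupFile(JobConf conf) throws IOException {
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        return (localFiles != null && localFiles.length > 0) ? localFiles[0] : null;
    }
}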
