Hadoop: write a file and put it in the Distributed Cache

I need to create a file dynamically, based on the contents of the Hadoop job.properties, and then put it in the Distributed Cache.
When I create the file, I see that it is created under the path "/tmp".
I create a symbolic name and refer to this file in the cache. However, when I try to read the file from the Distributed Cache, I cannot access it. I get the error: Caused by: java.io.FileNotFoundException: Requested file /tmp/myfile6425152127496245866.txt does not exist.
Do I need to specify a path when creating the file, and then use that same path when accessing/reading the file?
I only need the file to be available while the job is running.

I'm not sure what you mean by
"I only need the file to be available only till the job is running".
But when I use the distributed cache, I pass a path like this:
final String NAME_NODE = "hdfs://sandbox.hortonworks.com:8020";
job.addCacheFile(new URI(NAME_NODE + "/user/hue/users/users.dat"));
Hope this helps.
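To expand on that a little: the FileNotFoundException suggests the file exists only in the local /tmp of the machine that created it, while the cache is looking for it on the cluster file system. A minimal sketch of the usual pattern, using only the JDK so it runs without Hadoop on the classpath: write the file locally, then build an HDFS URI with a "#name" fragment (the fragment is what the cache uses as the symbolic link name). The class name CacheFileSketch, the directory /user/hue/cache and the link name myfile are assumptions for illustration; the actual upload and registration calls are only shown as comments.

```java
import java.io.IOException;
import java.io.Writer;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class CacheFileSketch {

    // Writes the given properties to a temp file on the LOCAL file system.
    static Path writeLocalFile(Properties props) throws IOException {
        Path tmp = Files.createTempFile("myfile", ".txt");
        try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            props.store(w, null);
        }
        return tmp;
    }

    // Builds the URI to register with the cache; the fragment after '#'
    // is the symbolic link name the tasks would open.
    static URI cacheUri(String nameNode, String hdfsDir, String fileName, String linkName)
            throws URISyntaxException {
        return new URI(nameNode + hdfsDir + "/" + fileName + "#" + linkName);
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("example.key", "example.value"); // stand-in for job.properties content

        Path local = writeLocalFile(props);
        URI uri = cacheUri("hdfs://sandbox.hortonworks.com:8020",
                "/user/hue/cache",                 // hypothetical HDFS directory
                local.getFileName().toString(),
                "myfile");                         // hypothetical link name
        System.out.println(local + " -> " + uri);

        // With Hadoop on the classpath, the remaining steps would be roughly:
        //   FileSystem.get(conf).copyFromLocalFile(
        //       new org.apache.hadoop.fs.Path(local.toString()),
        //       new org.apache.hadoop.fs.Path(uri.getPath()));
        //   job.addCacheFile(uri);
    }
}
```

In a task the file would then normally be opened through the link name (here "myfile") in the working directory. If it is only needed while the job runs, the HDFS copy can be deleted once the job has finished.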

Related

Need custom backup filenames for file copy using Ansible

I have a set of hosts that fall into the three categories below:
source_hosts (multiple servers)
ansible_host (single server)
destination_hosts (multiple servers)
Based on our architecture, the plan is to do the following steps.
1. Verify that the files exist on source_hosts and that the source user has permission to copy them. Also verify that the "path to folder" on the destination exists and has permissions for the files to be copied. Checking that we are not "running out of space" on the destination should also be considered.
2. If the above verification is successful, the files should be copied from source_host to ansible_server.
Note: I plan to use Ansible's fetch module for this: http://docs.ansible.com/ansible/fetch_module.html
3. From the Ansible server, the files should be copied over to the destination servers' respective locations.
Note: I plan to use Ansible's copy module for this:
http://docs.ansible.com/ansible/copy_module.html
4. If the file already exists on the destination server, a backup must be created with an identifier, say "tkt432", along with the timestamp.
Note: Again, I am planning to use the copy module for backups, but I don't know how to append the identifier to the backed-up files. As far as I know, the module has no feature for appending a custom identifier to file names.
I have the following concerns:
1. What would be the ideal Ansible module to address Step 1?
2. How do I address the issue highlighted in Step 4?
Any other suggestions are welcome.
Q: "What would be the ideal ansible module to address Step 1 ?"
A: The file and stat modules. For checking "running out of space", see "Using ansible to manage disk space".
Q: "How do I address the issue highlighted in Step 4 ? If the file already exists on the destination server a backup must be created with an identifier say "tkt432" along with the timestamp."
A: Quoting from the parameters of the copy module:
backup - Create a backup file including the timestamp ...
Neither the extension nor the location of the backup files is configurable. See "add optional backup_dir for the backup option #16305".
Q: "Any other suggestions are welcomed."
A: Take a look at the synchronize module.
Q: "1. Is there any module to check file/folder permissions (rights) for copy-paste operation with that user id?"
A: There are no copy-paste operations in Ansible.
Q: "Requesting more inputs on how we can append identifiers like "tkt432" to backup filenames while using "copy" modules backup option or any other good solution."
A: There is nothing more to add. Ansible does not do that.
Q: "I feel I won't be able to use the copy module and will have to fallback to writing shell scripts for the above-mentioned issues."
A: Yes. The shell and command modules could help with this.

How to rename a file while using "move" in URL in apache camel

I have a URL like
url = "file:D:/inputFolder?move=D:/outputFolder". We build this URL dynamically.
I want to rename the file while moving it, so I changed it to something like this:
url = "file:D:/inputFolder?move=D:/outputFolder&fileName=abc.txt". But it seems that move and fileName do not work together; the file is not renamed.
Is there an alternative? Please note that I want to do this with "move" only.
I also cannot use .setHeader(..).
Thanks,
Hi,
as far as I understand, you are trying to move the file using one single URI.
That is not really how Camel works.
The idea of Camel is to have a "consumer" and a "producer": the consumer loads data (e.g. your file) and the producer puts the data somewhere (e.g. saves the file into a folder).
That being said, here is what worked for me with a Java route:
from("file:/home/chris/temp/camel/in")
.to("file:/home/chris/temp/camel/out/?fileName=test.txt");
The from part configures the folder where Camel looks for new files. A few notes on that:
The file component checks the folder every 0.5 seconds for new files. This can be changed with the delay parameter.
The noop option controls whether the file is moved or copied. It defaults to false, which means the file is moved.
In the to part you configure where the file is supposed to be moved. Here you can use the fileName parameter to rename the file.
Be careful with this though, because setting an option directly in the URI makes it "static".
What I mean is that the only way to change the parameter is to reconfigure the route completely or to restart it, and normally you would not want to do either.
Note 1:
Writing every file that arrives in the input folder to the same file name overwrites the previous file by default.
You could, for example, use the fileExists parameter to append the content instead: fileExists=Append (see the Camel file documentation for details).
Note 2:
The file component has an option to not "move" the file but to copy, rename and delete it. This is sometimes necessary, for example when the file has to be moved onto a different drive and a simple rename does not work.
See the documentation for the Camel file component for details on that as well.
Note 3:
You can have multiple to() statements in the same route to have the file moved to multiple locations. For example:
from("file:/home/chris/temp/camel/in")
.to("file:/home/chris/temp/camel/out/?fileName=test.txt")
.to("smtp:....");
I hope this helps and answers your question.
Greets
Chris
There are two possible ways to achieve your goal.
1. Use both a "consumer" and a "producer"
This way, you are free to control where and how the destination is set, and you have great freedom to control the file name by using a processor/bean.
from("file:D:/inputFolder")
.to("file:D:/outputFolder?fileName=abc.txt")
2. Use a "consumer" only
This way, you treat the work as control over the source data. It can be used when the file is moved within the same drive. The drawback is that the rename pattern for the file name is limited (refer to the Camel file language).
from("file:D:/inputFolder?move=${file:parent}/../outputFolder/abc.txt")

UNIX Shell script: file reading issue

I have to read a file in my shell script. I was using PL/SQL's UTL_FILE to open the file.
But I now have to handle a change that appends a timestamp to the file name,
e.g. the file import.data becomes import_20152005101200.data.
The timestamp is the time at which the file arrives at the server.
Since the file name has changed, I can't access the file the old way.
I came up with the solution below:
UTL_FILE.FOPEN ('path','import_${file_date}.data','r');
To achieve this, I have to get the file name, trim it using SUBSTR to get the timestamp, and pass that to the file_date variable.
However, I am not able to find out how to get the name of a file in a particular path. I could use basename, but my file name keeps changing because of the timestamp.
Any help or alternative ideas are welcome.
PL/SQL isn't a good tool to solve this problem; UTL_FILE doesn't have any tools to list all the files in a folder.
A better solution is to define a stored procedure which uses UTL_FILE, and to pass the name of the file to process as an argument to the procedure. That way, you can use the shell (which has many powerful commands and tools to examine folders and files) or a scripting language like Python to determine which file to process.
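The "determine which file to process" step could look roughly like this. It is a sketch in Java rather than shell or Python, assuming a JVM is available on the server. It picks the most recently modified file matching import_*.data and extracts the timestamp part of the name. The class and method names are made up for illustration, and choosing by modification time is an assumption based on "the time at which the file arrives".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportFileFinder {

    // Name layout taken from the question: import_<digits>.data
    private static final Pattern NAME = Pattern.compile("import_(\\d+)\\.data");

    // Returns the most recently modified file in dir matching import_*.data, if any.
    static Optional<Path> newest(Path dir) throws IOException {
        Path best = null;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "import_*.data")) {
            for (Path p : stream) {
                if (best == null
                        || Files.getLastModifiedTime(p).compareTo(Files.getLastModifiedTime(best)) > 0) {
                    best = p;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    // Extracts the timestamp part of the file name (the file_date value).
    static Optional<String> fileDate(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Optional<Path> file = newest(dir);
        if (file.isPresent()) {
            // Print the name so a wrapper script can pass it to the stored procedure.
            System.out.println(file.get().getFileName());
        } else {
            System.err.println("No import_*.data file found in " + dir);
        }
    }
}
```

The printed file name is what would be passed as the argument to the stored procedure.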

How do I write Java code that will export a file to a USB drive

It is exactly what the title says. I have been looking for quite some time and haven't found anything. The main use would be as an auto-run file to collect error reports from our office computers.
You could do this if you run the program from the USB drive itself and treat wherever it is stored as the "working directory". The USB drive may get different IDs on different computers, so relying on a fixed drive name gets messy.
My recommendation is to use Files and Path from java.nio.file (http://docs.oracle.com/javase/7/docs/api/java/nio/file/Path.html).
A warning though: if you copy a directory (folder), the files within that folder are not copied automatically. That is just the way it works (more here: http://docs.oracle.com/javase/tutorial/essential/io/copy.html).
This assumes the file is always in the same place and always goes to the same place. For example:
Files.copy(source, destination, options);
or, for a more advanced approach, you can open the file as an input stream and copy from that:
Files.copy(inputStream, path, options);
etc.
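Since Files.copy does not copy the contents of a directory, here is a minimal sketch of a recursive copy that walks the source tree and copies each entry. The class name ReportExporter and the command-line arguments are assumptions for illustration; when run from the USB drive, the target can be a relative path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReportExporter {

    // Copies a single file or a whole directory tree from source to target.
    static void copyTree(Path source, Path target) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(source)) {
            // Files.walk lists a directory before its contents.
            entries = walk.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Path dest = target.resolve(source.relativize(entry).toString());
            if (Files.isDirectory(entry)) {
                Files.createDirectories(dest);
            } else {
                Path parent = dest.getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                Files.copy(entry, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ReportExporter <source> <target>");
            return;
        }
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example, it could be started from the drive with the folder that holds the error reports as the first argument and a relative folder such as reports as the second; both names are placeholders.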

Is it possible to open a flatfile in PLSQL when only a partial filename is known?

Is it possible to open a flatfile when only part of the file name is known?
I have files in a directory that have a timestamp appended to the file name. Is it possible to open one by specifying only the known part of the file name (excluding the timestamp)?
Is it possible with a PLSQL only approach?
There is a dbms_ package which allows you to get a listing of the directory (or you can implement your own in a Java stored procedure; search for examples). This will allow you to find the file you are looking for. If necessary, choose which of the matching files is the relevant one and then process it.
See http://notdennis.wordpress.com/2013/07/03/listing-directory-files-plsql/
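For the "implement your own in a Java stored procedure" route, the Java side might look something like the sketch below: a static method that returns the names of the files in a directory that start with the known part of the name. The class and method names are assumptions; loading the class into the database, publishing it to PL/SQL and granting the file permissions it needs are not shown.

```java
import java.io.File;
import java.util.Arrays;

class DirLister {

    // Returns the sorted names of regular files in dir whose names start with prefix.
    // An unreadable or missing directory yields an empty array.
    static String[] filesStartingWith(String dir, String prefix) {
        File[] matches = new File(dir).listFiles((d, name) -> name.startsWith(prefix));
        if (matches == null) {
            return new String[0];
        }
        return Arrays.stream(matches)
                .filter(File::isFile)
                .map(File::getName)
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : ".";
        String prefix = args.length > 1 ? args[1] : "";
        for (String name : filesStartingWith(dir, prefix)) {
            System.out.println(name);
        }
    }
}
```

The caller can then pick the relevant name from the result and open that file as usual.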
