Hadoop Map Reduce - Read HDFS File - FileAlreadyExists error - file

I am new to Hadoop. I am trying to read an existing file on HDFS using the code below. The configuration seems fine and the file path is correct as well:
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    private static Text f1, f2, hdfsfilepath;
    private static HashMap<String, ArrayList<String>> friendsData = new HashMap<>();

    @Override
    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        Path path = new Path("hdfs://cshadoop1" + conf.get("hdfsfilepath"));
        FileSystem fs = FileSystem.get(path.toUri(), conf);
        if (fs.exists(path)) {
            // try-with-resources closes the HDFS stream when reading is done
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(fs.open(path)))) {
                String line;
                // read the next line on every iteration so the loop ends at EOF
                while ((line = br.readLine()) != null) {
                    StringTokenizer str = new StringTokenizer(line, ",");
                    String friend = str.nextToken();
                    ArrayList<String> friendDetails = new ArrayList<>();
                    while (str.hasMoreTokens()) {
                        friendDetails.add(str.nextToken());
                    }
                    friendsData.put(friend, friendDetails);
                }
            }
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String k : friendsData.keySet()) {
            context.write(new Text(k), new Text(friendsData.get(k).toString()));
        }
    }
}
I get the following exception when I run the code:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://cshadoop1/socNetData/userdata/userdata.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
I am just trying to read an existing file. Any ideas what I am missing here? Appreciate any help.

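The exception tells you that your output directory already exists, but it must not. FileOutputFormat.checkOutputSpecs refuses to submit a job whose output path already exists. Delete the directory (for example with hdfs dfs -rm -r, or with FileSystem.delete(path, true) in the driver) or choose a different name.
Moreover, the name of your output directory, 'userdata.txt', looks like the name of a file, and it may well be the file you are trying to read. So check that you have not mixed up your input and output paths: the file read in setup() should not also be passed to FileOutputFormat.setOutputPath().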
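Once the output path is fixed, the job will actually run setup(), so it is worth checking the parsing logic on its own. Below is a JDK-only sketch of that logic. It assumes each line of the side file has the form friend,detail1,detail2,..., which is inferred from the StringTokenizer usage. The class FriendsFileParser and its parse method are hypothetical names. Reading the next line in the loop condition is what makes the loop stop at the end of the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper: the parsing logic from setup(), decoupled from HDFS.
final class FriendsFileParser {

    private FriendsFileParser() {
    }

    // Parses lines of the (assumed) form "friend,detail1,detail2,..." into a map.
    static Map<String, List<String>> parse(Reader source) throws IOException {
        Map<String, List<String>> friendsData = new HashMap<>();
        // try-with-resources closes the reader and the stream underneath it.
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() runs on every iteration, so the loop stops at end of input.
            while ((line = br.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line, ",");
                if (!tokens.hasMoreTokens()) {
                    continue; // skip blank lines instead of throwing NoSuchElementException
                }
                String friend = tokens.nextToken();
                List<String> details = new ArrayList<>();
                while (tokens.hasMoreTokens()) {
                    details.add(tokens.nextToken());
                }
                friendsData.put(friend, details);
            }
        }
        return friendsData;
    }
}
```

In the mapper, setup() would then only open the stream, for example new InputStreamReader(fs.open(path), StandardCharsets.UTF_8) (this assumes the file is UTF-8), and store the result in a Map<String, List<String>> field. The parsing itself can be unit-tested without a cluster.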

Related

How do I apply the trim and split methods to an ArrayList after reading a file?

public static void main(String[] args) throws FileNotFoundException {
    File fileToOpen = new File("Faculty.txt");
    Scanner fileInput = new Scanner(fileToOpen);
    while (fileInput.hasNextLine()) {
        ArrayList<String> saTokens = new ArrayList<>();
        saTokens.add(fileInput.nextLine());
        System.out.println(saTokens);
    }
    fileInput.close();
}
I'm able to print the contents of the Faculty.txt file, but I need to split each line at every ":" and trim all the extra spaces.
I don't know where to apply these methods.
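One natural place to apply the methods is per line, before anything is added to the list: split the line at ':' and trim each piece. Below is a minimal sketch that assumes every line of Faculty.txt is a ':'-separated record. FacultyReader, splitAndTrim and readAll are hypothetical names.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class FacultyReader {

    // Splits one line at every ':' and trims whitespace from each piece.
    static List<String> splitAndTrim(String line) {
        List<String> tokens = new ArrayList<>();
        for (String piece : line.split(":")) {
            tokens.add(piece.trim());
        }
        return tokens;
    }

    // Reads every remaining line from the scanner and returns the tokens of each line.
    static List<List<String>> readAll(Scanner input) {
        List<List<String>> rows = new ArrayList<>();
        while (input.hasNextLine()) {
            rows.add(splitAndTrim(input.nextLine()));
        }
        return rows;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner fileInput = new Scanner(new File("Faculty.txt"))) {
            for (List<String> row : readAll(fileInput)) {
                System.out.println(row);
            }
        }
    }
}
```

Note that String.split drops trailing empty strings, so a line ending in ':' produces no empty last field.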

What are the steps to opening a .dat file in java?

I have another assignment that is giving me trouble.
I need to write code that opens a .dat file from my hard drive in the NetBeans IDE.
The file doesn't contain binary data, just plain text characters.
When I run the code I've written, it gives an error because it cannot locate the file. Any suggestions?
package countedcones;

import java.io.*;

public class CountedCones {
    public static void main(String[] args) throws IOException {
        // "icecream.dat" is resolved relative to the working directory;
        // try-with-resources closes the reader even if an exception occurs
        try (BufferedReader br = new BufferedReader(new FileReader("icecream.dat"))) {
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        }
    }
}
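A relative name such as "icecream.dat" is resolved against the JVM's working directory. When the program runs from NetBeans, that is typically the project folder, not the src or package folder. The sketch below prints the directory the program is actually searching and then reads the file. It assumes the file is plain ASCII or UTF-8 text. DatFileReader and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class DatFileReader {

    // Reads all lines of a text file; a relative path is resolved
    // against the JVM's working directory.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("icecream.dat");
        // Show where the JVM is actually looking, to diagnose "file not found".
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Resolved path: " + file.toAbsolutePath());
        if (!Files.exists(file)) {
            System.out.println("Not found; move the file there or use an absolute path.");
            return;
        }
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

If the printed path is not where icecream.dat lives, either move the file there or pass an absolute path to Paths.get.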

JavaMail MimeBodyPart.saveFile produces corrupted files

I'm using the JavaMail library to parse email MIME messages.
I'm trying to extract the attached files and save them to the local disk, but the saved files are not valid and their size differs from the original. Only *.txt files are saved correctly; *.pdf and *.xlsx files are not.
Can you please help me fix the code?
My code is:
private static void Test3() {
    String email_string = File_Reader.Read_File_To_String("D:\\8.txt");
    MimeMessage mm = Email_Parser.Get_MIME_Message_From_Email_String(email_string);
    Email_Parser.Save_Email_Attachments_To_Folder(mm, "D:\\TEST");
}

public static String Read_File_To_String(String file_path) {
    byte[] encoded = new byte[0];
    try {
        encoded = Files.readAllBytes(Paths.get(file_path));
    } catch (IOException exception) {
        Print_To_Console(exception.getMessage(), true, false);
    }
    return new String(encoded, m_encoding);
}

public static MimeMessage Get_MIME_Message_From_Email_String(String email_string) {
    MimeMessage mm = null;
    try {
        Session s = Session.getDefaultInstance(new Properties());
        InputStream is = new ByteArrayInputStream(email_string.getBytes());
        mm = new MimeMessage(s, is);
    } catch (MessagingException exception) {
        Print_To_Console(exception.getMessage(), true, false);
    }
    return mm;
}

public static void Save_Email_Attachments_To_Folder(MimeMessage mm, String output_folder_path) {
    ArrayList<Pair<String, InputStream>> attachments_InputStreams = Get_Attachments_InputStream_From_MimeMessage(mm);
    String attachment_filename;
    String attachment_filename_save_path;
    InputStream attachment_InputStream;
    MimeBodyPart mbp;
    for (Pair<String, InputStream> attachments_InputStream : attachments_InputStreams) {
        attachment_filename = attachments_InputStream.getKey();
        attachment_filename = Get_Encoded_String(attachment_filename);
        attachment_filename_save_path = String.format("%s\\%s", output_folder_path, attachment_filename);
        attachment_InputStream = attachments_InputStream.getValue();
        try {
            mbp = new MimeBodyPart(attachment_InputStream);
            mbp.saveFile(attachment_filename_save_path);
        } catch (MessagingException | IOException exception) {
            Print_To_Console(exception.getMessage(), true, false);
        }
    }
}
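You're doing something very strange in Save_Email_Attachments_To_Folder. (Not to mention the strange naming convention that mixes camel case and underscores. :-)) I don't know what InputStreams you're collecting, but constructing new MimeBodyParts from them and then using the new MimeBodyPart to save the attachment to a file is almost certainly not what you want to do. The MimeBodyPart(InputStream) constructor parses its input as a complete MIME body part (headers followed by content). If those streams hold the attachments' decoded content, the start of each file is likely being misread as headers.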
What exactly is Get_Attachments_InputStream_From_MimeMessage doing? Why iterate over the message to collect a bunch of InputStreams, then iterate over the InputStreams to save them? Why not iterate over the message to find the attachments and save them as you find them using the MimeBodyPart.saveFile method? Have you seen the msgshow.java sample program?
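The String round trip in Read_File_To_String and Get_MIME_Message_From_Email_String is a separate place where bytes can change. It may not be the cause here, since attachments are usually base64-encoded and therefore plain ASCII. Still, decoding bytes into a String and encoding them again is not lossless for arbitrary data. This is especially true when the two steps use different charsets: m_encoding on the way in, the platform default via getBytes() on the way out. The JDK-only sketch below uses UTF-8 for both steps to show the effect. RoundTripCheck is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripCheck {

    // Decodes bytes into a String and encodes them again, mirroring
    // Read_File_To_String followed by getBytes() (UTF-8 for illustration).
    static byte[] roundTrip(byte[] original) {
        String text = new String(original, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // True if the bytes come back unchanged after the round trip.
    static boolean survives(byte[] original) {
        return Arrays.equals(original, roundTrip(original));
    }
}
```

ASCII bytes survive. A byte such as 0xFF, which is not valid UTF-8, is replaced by U+FFFD and comes back as three different bytes. Passing the file's bytes straight to new MimeMessage(session, inputStream), for example via Files.newInputStream, avoids the String step entirely.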

What does 'moveFailed' really do?

I want to create a file input that behaves as follows:
1. Process the exchange
2. Attempt to copy the input file to a shared drive
3. If step (2) fails (e.g. the share is down), move the file to a local directory instead
According to the documentation, the 'moveFailed' parameter allows you to "set a different target directory when moving files after processing (configured via move defined above) failed". So it sounds like moveFailed would cover step (3).
The following test, however, fails. What am I doing wrong? I am using Camel 2.10.0.fuse.
package sandbox.camel;

import java.io.File;
import org.apache.camel.Endpoint;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.mock.MockEndpoint;
import org.junit.Test;

public class MoveFailedTest extends org.apache.camel.test.junit4.CamelTestSupport {

    private String failedDir = "move-failed";

    @Override
    protected RouteBuilder createRouteBuilder() throws Exception {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("file:tmp/prepare").to("file:tmp/input");
                from("file:tmp/input?move=/doesnotexist&moveFailed=" + failedDir).to("file:tmp/output");
            }
        };
    }

    @Test
    public void test_move() throws Exception {
        // arrange
        File moveFailedDir = new File("tmp/input/" + failedDir);
        moveFailedDir.mkdirs();
        File[] failedCount1 = moveFailedDir.listFiles();
        failedCount1 = failedCount1 == null ? new File[0] : failedCount1;
        String messagePayload = "Hello";
        Endpoint input = getMandatoryEndpoint("file:tmp/prepare");
        MockEndpoint output = getMockEndpoint("mock:file:tmp/output");
        output.setMinimumExpectedMessageCount(1);
        output.expectedBodiesReceived(messagePayload);

        // act
        template.asyncSendBody(input, messagePayload);
        Thread.sleep(3000);

        // assert: only 1 output
        assertMockEndpointsSatisfied();
        // assert: renamed failed, hence input file was moved to 'movefailed' directory
        File[] failedCount2 = moveFailedDir.listFiles();
        assertEquals("No file appeared in 'movefailed' directory", failedCount1.length + 1, failedCount2.length);
    }
}
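Your test is most likely wrong. The autoCreate option is true by default, which means directories are created if needed. So the move to /doesnotexist most likely succeeds rather than fails, and moveFailed never comes into play.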

Cannot Find Symbol for another class file

I've had this problem a few times: I create another class file, and the main class file can't find it.
Here's the main class file:
package textfiles;

import java.io.IOException;

public class FileData
{
    public static void main(String[] args)
    {
        String file_name = "Lines.txt";
        try {
            ReadFile file = new ReadFile(file_name);
            String[] aryLines = file.OpenFile();
            for (int i = 0; i < aryLines.length; i++)
            {
                // print each element, not the array reference
                System.out.println(aryLines[i]);
            }
        }
        catch (IOException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
Here is the class file it can't find:
package textfiles;

import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;

public class ReadFile
{
    private String path;

    public ReadFile(String file_path)
    {
        path = file_path;
    }

    public String[] OpenFile() throws IOException
    {
        FileReader fr = new FileReader(path);
        BufferedReader br = new BufferedReader(fr);
        int numberOfLines = readLines();
        String[] textData = new String[numberOfLines];
        for (int i = 0; i < numberOfLines; i++)
        {
            textData[i] = br.readLine();
        }
        br.close();
        return textData;
    }

    int readLines() throws IOException
    {
        FileReader file_to_read = new FileReader(path);
        BufferedReader bf = new BufferedReader(file_to_read);
        // count into a local variable so repeated calls start from zero
        int numberOfLines = 0;
        while (bf.readLine() != null)
        {
            numberOfLines++;
        }
        bf.close();
        return numberOfLines;
    }
}
Following a suggestion for this, I've tried running javac textfiles\ReadFile.java and javac textfiles\FileData.java. That doesn't work. I've made sure ReadFile compiles and fixed all the errors there.
The compiler error I get is:
C:\Users\Liloka\Source>javac FileData.java
FileData.java:13: cannot find symbol
symbol : class ReadFile
location: class textfiles.FileData
ReadFile file = new ReadFile(file_name);
^
FileData.java:13: cannot find symbol
symbol : class ReadFile
location: class textfiles.FileData
ReadFile file = new ReadFile(file_name);
^
2 errors
I'm using Notepad++ and the command prompt (.cmd), so it can't be an IDE error.
Thanks in advance!
Make sure the java files are all in the textfiles directory:
textfiles/FileData.java
textfiles/ReadFile.java
And run:
javac textfiles/FileData.java textfiles/ReadFile.java
java textfiles.FileData
Your code works without any modification. I think you are compiling from the wrong directory:
C:\Users\Liloka\Source>javac FileData.java
Move FileData.java into the textfiles directory.
You have to compile all the java files used by your main class. As ReadFile is used by FileData you have to compile it too.
Did you try
javac FileData.java ReadFile.java
or
javac *.java
?
There may be a conflict with previously generated classes.
Try removing all the generated .class files and building the project again.
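Unrelated to the compiler error, ReadFile opens the file twice: once to count the lines and once to read them. The sketch below does it in a single pass with java.nio.file.Files.readAllLines. It keeps the question's class and method names so that FileData could call it unchanged, but it assumes the file is UTF-8. It also omits the package declaration; add package textfiles; at the top when placing it in the textfiles directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class ReadFile {
    private final String path;

    ReadFile(String filePath) {
        path = filePath;
    }

    // Reads the whole file in one pass; the array size comes from the list,
    // so no separate line-counting pass is needed.
    String[] OpenFile() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }
}
```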
