Reading Hadoop sequence file in Flink


How can I read a Hadoop sequence file in Flink? I hit multiple issues with the approach below.
I have:
DataSource<String> source = env.readFile(new SequenceFileInputFormat(config), filePath);
and
public static class SequenceFileInputFormat extends FileInputFormat<String> {
...
@Override
public void setFilePath(String filePath) {
org.apache.hadoop.conf.Configuration config = HadoopUtils.getHadoopConfiguration(configuration);
logger.info("Initializing:"+filePath);
org.apache.hadoop.fs.Path hadoopPath = new org.apache.hadoop.fs.Path(filePath);
try {
reader = new SequenceFile.Reader(hadoopPath.getFileSystem(config), hadoopPath, config);
key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), config);
value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), config);
} catch (IOException e) {
logger.error("sequence file creation failed.", e);
}
}
}
One of the issues: Could not read the user code wrapper: SequenceFileInputFormat.

Once you get an InputFormat, you can call ExecutionEnvironment.createInput(<input format>) to create your DataSource.
For SequenceFiles, the type of the data is always Tuple2<key, value>, so you have to use a map function to convert to whatever type you're trying to read.
I use this code to read a SequenceFile that contains Cascading Tuples...
Job job = Job.getInstance();
FileInputFormat.addInputPath(job, new Path(directory));
env.createInput(HadoopInputs.createHadoopInput(new SequenceFileInputFormat<Tuple, Tuple>(), Tuple.class, Tuple.class, job));
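On the custom format in the question: Flink input formats are Serializable and are shipped to the workers via Java serialization, so a reader opened in setFilePath() cannot travel with the object. The usual pattern is to keep only the configuration as serializable state and to open the reader on the worker. The sketch below shows that pattern with plain JDK classes only. LazyReaderFormatSketch and its methods are hypothetical names, and the BufferedReader is a stand-in for SequenceFile.Reader. Whether this is what caused the "Could not read the user code wrapper" error is an assumption; that message can also come from a class-loading problem.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringReader;

// Hypothetical stand-in for a custom input format; not Flink API.
class LazyReaderFormatSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Plain configuration survives serialization.
    private final String filePath;

    // Stand-in for SequenceFile.Reader: marked transient so it is never serialized.
    private transient BufferedReader reader;

    LazyReaderFormatSketch(String filePath) {
        this.filePath = filePath;
    }

    // Would correspond to the format's open(...) call on the worker.
    void open() {
        reader = new BufferedReader(new StringReader("stand-in for " + filePath));
    }

    boolean isOpen() {
        return reader != null;
    }

    String getFilePath() {
        return filePath;
    }

    // Simulates shipping the format from the client to a worker.
    static LazyReaderFormatSketch roundTrip(LazyReaderFormatSketch format) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(format);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyReaderFormatSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

After the round trip the copy still knows its file path but has no reader, so the worker side has to call open() again before reading.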

Related

How can I e-mail a .csv file in Codename One?

In my app, I create a file with a comma-separated array by writing to an OutputStream. Then I want to be able to share this by e-mail so a user can get the data. This is the code I use to create the file:
public String getLogFile(String logName) {
String path = FileSystemStorage.getInstance().getAppHomePath() + "exp " + logName + ".csv";
Set<Long> keys;
OutputStream os = null;
try {
os = FileSystemStorage.getInstance().openOutputStream(path);
Hashtable<Long, Integer> log = (Hashtable<Long, Integer>) dataStorage
.readObject(logName);
keys = log.keySet();
for (Long key : keys) {
String outString = (key + "," + log.get(key) + "\n");
System.out.println(outString);
byte[] buffer = outString.getBytes();
os.write(buffer);
}
} catch (IOException e) {
AnalyticsService.sendCrashReport(e, "Error writing log", false);
e.printStackTrace();
} finally {
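// os stays null if openOutputStream() failed, so guard against a NullPointerException
if (os != null) {
try {
os.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}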
}
return path;
}
Then, I've created a button that when pressed passes the path of the file to share. I've tried to use MIME types such as "text/plain" and "text/comma-separated-values", but that causes errors. Here is the code executed when the button is pressed.
public void exportLog(String logName) {
String path = dataBuffer.getLogFile(logName);
EmailShare email = new EmailShare();
// email.share("Here is your log.", path, "text/plain");
email.share("Here is your log.", path, "text/comma-separated-values");
}
When the button is pressed (in the simulator), I get this stack trace after selecting the dummy e-mail contact to send to:
java.lang.NullPointerException
at com.codename1.impl.javase.JavaSEPort.scale(JavaSEPort.java:3483)
at com.codename1.ui.Image.scale(Image.java:963)
at com.codename1.ui.Image.scaledImpl(Image.java:933)
at com.codename1.ui.Image.scaled(Image.java:898)
at com.codename1.impl.javase.JavaSEPort$60.save(JavaSEPort.java:6693)
at com.codename1.share.ShareForm.<init>(ShareForm.java:75)
at com.codename1.share.EmailShare$1$2$1.actionPerformed(EmailShare.java:102)
at com.codename1.ui.util.EventDispatcher.fireActionSync(EventDispatcher.java:455)
at com.codename1.ui.util.EventDispatcher.fireActionEvent(EventDispatcher.java:358)
at com.codename1.ui.List.fireActionEvent(List.java:1532)
at com.codename1.ui.List.pointerReleasedImpl(List.java:2011)
at com.codename1.ui.List.pointerReleased(List.java:2021)
at com.codename1.ui.Form.pointerReleased(Form.java:2560)
at com.codename1.ui.Component.pointerReleased(Component.java:3108)
at com.codename1.ui.Display.handleEvent(Display.java:2017)
at com.codename1.ui.Display.edtLoopImpl(Display.java:1065)
at com.codename1.ui.Display.mainEDTLoop(Display.java:994)
at com.codename1.ui.RunnableWrapper.run(RunnableWrapper.java:120)
at com.codename1.impl.CodenameOneThread.run(CodenameOneThread.java:176)

The EmailShare class expects a path to an image file, not an arbitrary file, as its second argument, so loading it fails.
The Message class is indeed better suited for this. You can also use the cloud send option, which won't launch the native email app. For example, the Log class offers this directly through the Log.sendLog API.

It looks like the Message class is better suited for this task, and should allow attachments, etc.

Using Database as Alfresco ContentStore

I'm working with Alfresco 4.2 and I need to use a table in my database as the document content store.
From what I have gathered online, I just have to implement custom DBContentStore, DBContentWriter and DBContentReader classes. I was advised to use the FileContentStore class as a reference.
I need some help mapping FileContentStore onto my new classes.
For example:
DBContentWriter has to extend AbstractContentWriter, and the API docs say that the only methods I have to override are:
getReader() to create a reader to the underlying content
getDirectWritableChannel() to write content to the repository.
What about the second method?
protected WritableByteChannel getDirectWritableChannel()
This is called by getContentOutputStream():
public OutputStream getContentOutputStream() throws ContentIOException
{
try
{
WritableByteChannel channel = getWritableChannel();
OutputStream is = new BufferedOutputStream(Channels.newOutputStream(channel));
// done
return is;
}
The main entry point is putContent(InputStream is), which should write the content into a DB table.
public final void putContent(InputStream is) throws ContentIOException
{
try
{
OutputStream os = getContentOutputStream();
copyStreams(is, os);
Where copyStreams does something like this:
public final int copyStreams(InputStream in, OutputStream out, long sizeLimit) throws IOException
{
int byteCount = 0;
IOException error = null;
long totalBytesRead = 0;
try
{
byte[] buffer = new byte[BYTE_BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) != -1)
{
// We are able to abort the copy immediately upon limit violation.
totalBytesRead += bytesRead;
if (sizeLimit > 0 && totalBytesRead > sizeLimit)
{
StringBuilder msg = new StringBuilder();
msg.append("Content size violation, limit = ")
.append(sizeLimit);
throw new ContentLimitViolationException(msg.toString());
}
out.write(buffer, 0, bytesRead);
byteCount += bytesRead;
}
out.flush();
}
finally
{
try
{
in.close();
}
catch (IOException e)
{
error = e;
logger.error("Failed to close input stream: " + this, e);
}
try
{
out.close();
}
catch (IOException e)
{
error = e;
logger.error("Failed to close output stream: " + this, e);
}
}
if (error != null)
{
throw error;
}
return byteCount;
}
}
The goal is to write code that reads from and writes to the DB using these methods.
When out.flush() is called, the content should be written into the BLOB field.
Thanks

Without looking at the example implementation in FileContentStore, it is difficult to determine everything that getDirectWritableChannel() needs to do. That said, creating a WritableByteChannel to your database should be relatively easy.
Assuming you are using the BLOB type and JDBC to get at your database, you just need to obtain a stream for your BLOB and turn it into a channel.
OutputStream stream = myBlob.setBinaryStream(1);
WritableByteChannel channel = Channels.newChannel(stream);
Will you need to override other methods? Maybe. If you have specific issues with those, feel free to raise them.
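The stream-to-channel step above can be tried without a database. The sketch below is a minimal illustration, not the Alfresco implementation: BlobChannelSketch is a hypothetical class, and a ByteArrayOutputStream stands in for the stream that myBlob.setBinaryStream(1) would return.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper; in a real store the stream would come from the JDBC Blob.
class BlobChannelSketch {

    // Wraps any OutputStream (e.g. the one from myBlob.setBinaryStream(1)) in a channel.
    static WritableByteChannel toChannel(OutputStream blobStream) {
        return Channels.newChannel(blobStream);
    }

    // Pushes content through the channel and returns what reached the underlying stream.
    static byte[] writeThroughChannel(byte[] content) {
        // Assumption: an in-memory sink stands in for the BLOB stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel channel = toChannel(sink)) {
            ByteBuffer buffer = ByteBuffer.wrap(content);
            // A channel may accept fewer bytes than offered, so loop until drained.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Closing the channel also closes the wrapped stream. With a real BLOB, the row that holds it presumably still has to be inserted or updated and the transaction committed after the channel is closed; that part depends on the schema and is not shown here.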

JavaMail MimeBodyPart.saveFile produces corrupted files

I'm using the JavaMail library to parse MIME email messages.
I'm trying to extract the attached files and save them to the local disk, but the saved files are not valid and their size differs from the original. Only *.txt files are saved correctly; *.pdf and *.xlsx files are not.
Can you please help me fix the code?
My code is:
private static void Test3() {
String email_string = File_Reader.Read_File_To_String("D:\\8.txt");
MimeMessage mm = Email_Parser.Get_MIME_Message_From_Email_String(email_string);
Email_Parser.Save_Email_Attachments_To_Folder(mm,"D:\\TEST");
}
public static String Read_File_To_String(String file_path) {
byte[] encoded = new byte[0];
try {
encoded = Files.readAllBytes(Paths.get(file_path));
} catch (IOException exception) {
Print_To_Console(exception.getMessage(), true,false);
}
return new String(encoded, m_encoding);
}
public static MimeMessage Get_MIME_Message_From_Email_String(String email_string) {
MimeMessage mm = null;
try {
Session s = Session.getDefaultInstance(new Properties());
InputStream is = new ByteArrayInputStream(email_string.getBytes());
mm = new MimeMessage(s, is);
} catch (MessagingException exception) {
Print_To_Console(exception.getMessage(), true, false);
}
return mm;
}
public static void Save_Email_Attachments_To_Folder(MimeMessage mm, String output_folder_path) {
ArrayList<Pair<String, InputStream>> attachments_InputStreams = Get_Attachments_InputStream_From_MimeMessage(mm);
String attachment_filename;
String attachment_filename_save_path;
InputStream attachment_InputStream;
MimeBodyPart mbp;
for (Pair<String, InputStream> attachments_InputStream : attachments_InputStreams) {
attachment_filename = attachments_InputStream.getKey();
attachment_filename = Get_Encoded_String(attachment_filename);
attachment_filename_save_path = String.format("%s\\%s", output_folder_path, attachment_filename);
attachment_InputStream = attachments_InputStream.getValue();
try {
mbp = new MimeBodyPart(attachment_InputStream);
mbp.saveFile(attachment_filename_save_path);
} catch (MessagingException | IOException exception) {
Print_To_Console(exception.getMessage(), true, false);
}
}
}

You're doing something very strange in Save_Email_Attachments_To_Folder. (Not to mention the strange naming convention using both camel case and underscores. :-)) I don't know what InputStreams you're collecting, but constructing new MimeBodyParts from them and then using each new MimeBodyPart to save the attachment to a file is almost certainly not what you want to do.
What exactly is Get_Attachments_InputStream_From_MimeMessage doing? Why iterate over the message to collect a bunch of InputStreams, then iterate over the InputStreams to save them? Why not iterate over the message to find the attachments and save them as you find them using the MimeBodyPart.saveFile method? Have you seen the msgshow.java sample program?
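Separately from the restructuring suggested above, the question's code reads the raw message into a String using m_encoding and later turns it back into bytes with getBytes(), which uses the platform default charset. Attachments are often base64-encoded, in which case this round trip is harmless, but if the message contains 8-bit or binary parts it can alter the bytes. Whether that contributes to the corruption here is an assumption. The sketch below, using a hypothetical CharsetRoundTripSketch class and sample bytes resembling the start of a PDF, only demonstrates the general hazard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that bytes -> String -> bytes is not always lossless.
class CharsetRoundTripSketch {

    // Decodes raw bytes to a String and encodes them again with the same charset.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    // True if the bytes come back unchanged.
    static boolean survives(byte[] raw, Charset charset) {
        return Arrays.equals(raw, roundTrip(raw, charset));
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes, as commonly found near the start of PDF files.
        byte[] raw = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println("ISO-8859-1 lossless: " + survives(raw, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 lossless: " + survives(raw, StandardCharsets.UTF_8));
    }
}
```

ISO-8859-1 maps every byte to exactly one character, so it preserves the data, while UTF-8 replaces the invalid sequences and the bytes change. Avoiding the String entirely, by handing an InputStream over the file directly to the MimeMessage constructor, sidesteps the issue.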

Parse method of BasicDexFileReader

Can I use the parse method of BasicDexFileReader to load a dex file that has been decrypted into a byte array?
public void parse(byte[] dexBytes) throws IllegalArgumentException, IOException/*,
RefNotFoundException */ {
// Get a DalvikValueReader on the input stream.
reader = new DalvikValueReader(dexBytes, FILE_SIZE_OFFSET);
readHeader();
readStrings();
readTypes();
}
I would be glad if someone could explain what exactly the purpose of the parse method is, and whether it can be used in the way I have described.
Thanks

Take a look at DexMaker.java, which needs to solve this problem for generating code rather than decrypting it.
Here's the relevant sample:
byte[] dex = ...;
/*
* This implementation currently dumps the dex to the filesystem. It
* jars the emitted .dex for the benefit of Gingerbread and earlier
* devices, which can't load .dex files directly.
*
* TODO: load the dex from memory where supported.
*/
File result = File.createTempFile("Generated", ".jar", dexCache);
result.deleteOnExit();
JarOutputStream jarOut = new JarOutputStream(new FileOutputStream(result));
jarOut.putNextEntry(new JarEntry(DexFormat.DEX_IN_JAR_NAME));
jarOut.write(dex);
jarOut.closeEntry();
jarOut.close();
try {
return (ClassLoader) Class.forName("dalvik.system.DexClassLoader")
.getConstructor(String.class, String.class, String.class, ClassLoader.class)
.newInstance(result.getPath(), dexCache.getAbsolutePath(), null, parent);
} catch (ClassNotFoundException e) {
throw new UnsupportedOperationException("load() requires a Dalvik VM", e);
} catch (InvocationTargetException e) {
throw new RuntimeException(e.getCause());
} catch (InstantiationException e) {
throw new AssertionError();
} catch (NoSuchMethodException e) {
throw new AssertionError();
} catch (IllegalAccessException e) {
throw new AssertionError();
}
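The jar-writing half of the sample above runs on any JVM, so it can be checked in isolation before involving DexClassLoader. The sketch below is a minimal version under stated assumptions: DexJarSketch is a hypothetical class, the entry name "classes.dex" is assumed to be the value of DexFormat.DEX_IN_JAR_NAME, and the read-back method exists only to verify the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper: wraps in-memory dex bytes in a temporary jar.
class DexJarSketch {
    // Assumption: this is the entry name DexClassLoader looks for inside a jar.
    static final String ENTRY_NAME = "classes.dex";

    // Writes the bytes as a single jar entry; a null dir means the default temp directory.
    static File writeJar(byte[] dex, File dir) {
        try {
            File jar = File.createTempFile("Generated", ".jar", dir);
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
                out.putNextEntry(new JarEntry(ENTRY_NAME));
                out.write(dex);
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the entry back, only to confirm that the jar holds the original bytes.
    static byte[] readEntry(File jar) {
        try (JarFile jarFile = new JarFile(jar);
             InputStream in = jarFile.getInputStream(jarFile.getJarEntry(ENTRY_NAME))) {
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                copy.write(buffer, 0, n);
            }
            return copy.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this approach writes the decrypted bytes to disk, at least temporarily, which may or may not be acceptable for the use case in the question.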

Get stream from java.sql.Blob in Hibernate

I'm trying to use a Hibernate @Entity with java.sql.Blob to store some binary data. Storing doesn't throw any exceptions (though I'm not sure it really stores the bytes), but reading does. Here is my test:
@Test
public void shouldStoreBlob() {
InputStream readFile = getClass().getResourceAsStream("myfile");
Blob blob = dao.createBlob(readFile, readFile.available());
Ent ent = new Ent();
ent.setBlob(blob);
em.persist(ent);
long id = ent.getId();
Ent fromDb = em.find(Ent.class, id);
//Exception is thrown from getBinaryStream()
byte[] fromDbBytes = IOUtils.toByteArray(fromDb.getBlob().getBinaryStream());
}
So it throws an exception:
java.sql.SQLException: could not reset reader
at org.hibernate.engine.jdbc.BlobProxy.getStream(BlobProxy.java:86)
at org.hibernate.engine.jdbc.BlobProxy.invoke(BlobProxy.java:108)
at $Proxy81.getBinaryStream(Unknown Source)
...
Why? Shouldn't it read the bytes from the DB here? And what can I do to make it work?

Try to refresh entity:
em.refresh(fromDb);
Stream will be reopened. I suspect that find(...) is closing the blob stream.

It is not at all clear how you are using JPA here, but certainly you do not need to deal with Blob data type directly if you are using JPA.
You just need to annotate a field in the entity with @Lob, somewhat like this:
@Lob
@Basic(fetch = LAZY)
@Column(name = "image")
private byte[] image;
Then, when you retrieve your entity, the bytes are read back into the field, and you can wrap them in a stream and do whatever you want with them.
Of course, you will need getter and setter methods in your entity to do the byte conversion. In the example above, the getter would look somewhat like this:
private Image getImage() {
Image result = null;
if (this.image != null && this.image.length > 0) {
result = new ImageIcon(this.image).getImage();
}
return result;
}
And the setter somewhat like this
private void setImage(Image source) {
BufferedImage buffered = new BufferedImage(source.getWidth(null), source.getHeight(null), BufferedImage.TYPE_INT_RGB);
Graphics2D g = buffered.createGraphics();
g.drawImage(source, 0, 0, null);
g.dispose();
ByteArrayOutputStream stream = new ByteArrayOutputStream();
try {
ImageIO.write(buffered, "JPEG", stream);
this.image = stream.toByteArray();
}
catch (IOException e) {
assert (false); // should never happen
}
}
}

Set a breakpoint in org.hibernate.engine.jdbc.BlobProxy#getStream on the stream.reset() line and examine the cause of the IOException:
private InputStream getStream() throws SQLException {
try {
if (needsReset) {
stream.reset(); // <---- Set breakpoint here
}
}
catch ( IOException ioe) {
throw new SQLException("could not reset reader");
}
needsReset = true;
return stream;
}
In my case the IOException was caused by using org.apache.commons.io.input.AutoCloseInputStream as the source for the Blob:
InputStream content = new AutoCloseInputStream(stream);
...
Ent ent = new Ent();
...
Blob blob = Hibernate.getLobCreator(getSession()).createBlob(content, file.getFileSize());
ent.setBlob(blob);
em.persist(ent);
While flushing a Session, Hibernate closes the InputStream content (or rather, org.postgresql.jdbc2.AbstractJdbc2Statement#setBlob closes the InputStream in my case). Once an AutoCloseInputStream is closed, its reset() method raises an IOException.
Update
In your case you use a FileInputStream, which also throws an exception from its reset method.
There is a problem in the test case: you create the blob and read it from the database inside one transaction. When you create Ent, the Postgres JDBC driver closes the InputStream while flushing the session. When you load Ent (em.find(Ent.class, id)), you get the same BlobProxy object, which holds the already closed InputStream.
Try this:
TransactionTemplate tt;
@Test
public void shouldStoreBlob() {
// Generic type arguments must be reference types, so Long is used instead of long
final long id = tt.execute(new TransactionCallback<Long>()
{
@Override
public Long doInTransaction(TransactionStatus status)
{
try
{
InputStream readFile = getClass().getResourceAsStream("myfile");
Blob blob = dao.createBlob(readFile, readFile.available());
Ent ent = new Ent();
ent.setBlob(blob);
em.persist(ent);
return ent.getId();
}
catch (Exception e)
{
return 0L;
}
}
});
byte[] fromStorage = tt.execute(new TransactionCallback<byte[]>()
{
@Override
public byte[] doInTransaction(TransactionStatus status)
{
Ent fromDb = em.find(Ent.class, id);
try
{
return IOUtils.toByteArray(fromDb.getBlob().getBinaryStream());
}
// getBinaryStream() can throw SQLException in addition to the IOException from copying
catch (IOException | SQLException e)
{
return new byte[] {};
}
}
});
}
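The reset() behaviour described in this answer can be observed with JDK streams alone. The sketch below uses a hypothetical ResetSketch class to show that a FileInputStream rejects reset(), while a ByteArrayInputStream accepts it even after close(). Copying the content into a byte array before creating the Blob is therefore one possible workaround, at the cost of holding the whole content in memory; that suggestion is an assumption and not something the answer above proposes.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: probes which streams tolerate reset().
class ResetSketch {

    // Mirrors the check in BlobProxy#getStream: reset() either works or throws IOException.
    static boolean canReset(InputStream in) {
        try {
            in.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // FileInputStream does not support mark/reset, so reset() throws.
    static boolean fileStreamCanReset() {
        try {
            File tmp = File.createTempFile("reset-sketch", ".bin");
            tmp.deleteOnExit();
            try (InputStream in = new FileInputStream(tmp)) {
                return canReset(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ByteArrayInputStream ignores close(), so reset() still succeeds afterwards.
    static boolean byteArrayStreamCanResetAfterClose(byte[] content) {
        ByteArrayInputStream in = new ByteArrayInputStream(content);
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return canReset(in);
    }
}
```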

My current and only solution is to close the write session and open a new Hibernate session to read the streamed data back. It works, although I do not know what makes the difference. I called inputStream.close(), but that was not enough.
Another way:
I also tried calling the blob's free() method after the session.save(attachment) call, but it throws another exception:
Exception in thread "main" java.lang.AbstractMethodError: org.hibernate.lob.SerializableBlob.free()V
at my.hibernatetest.HibernateTestBLOB.storeStreamInDatabase(HibernateTestBLOB.java:142)
at my.hibernatetest.HibernateTestBLOB.main(HibernateTestBLOB.java:60)
I am using PostgreSQL 8.4 + postgresql-8.4-702.jdbc4.jar, Hibernate 3.3.1.GA

Is the method IOUtils.toByteArray closing the input stream?
