EML vs MSG in terms of EWS - solr

I am going to ask a very basic question about the difference between the EML and MSG file formats, but I am not expecting "MSG is the Outlook-understandable format" as an answer. I need to know which properties I won't be able to extract if I use EML. I am fairly familiar with OLE and MIME.
I am writing a metadata extractor that will be integrated with Solr. I am using EWS (Exchange Web Services), which is quite easy to use and has many advantages and disadvantages.
This question is meant to summon all Exchange experts to shed some light on EML versus MSG. I have read endless blogs, but none explains why to choose one over the other.
Reference: Difference between a .msg file and a .eml file
Note: I don't want to convert EML to MSG or vice versa. I will be happy to use any of the component.

Okay, so given your last comment, your actual question is about the message body, so you don't need to worry about MSG vs EML. Exchange stores bodies in one of three formats, Text, HTML or RTF (or a combination of these), and it will perform an on-the-fly conversion if a client asks for a specific format that is not available. I would say that for what you are doing, just use HTML (which is the default format EWS will return) and you won't have a problem. It's pretty rare for people to use RTF these days (HTML has been the default format in Outlook since 2000). I would suggest reading https://msdn.microsoft.com/en-us/library/cc463905(v=exchg.80).aspx . The only time I could see you losing formatting in the body if you go with HTML is if you have RTF messages with embedded OLE objects, but it is pretty rare for people to use these nowadays.
Cheers
Glen

Related

How to write a message recovered from an MS Exchange server via JavaMail as an EML without parsing

I am not a programmer. I am a software solution designer. For compliance reasons I have to recover the messages received in our MS Exchange mailboxes and save them in their original form as an EML file, before I can save them to our CRM database for treatment by backoffice personnel.
My question is: can I read the message and write the input stream directly to the EML file without parsing it?
Our software architect and legal teams want to be sure that the contents are exactly equal to the original received e-mails in case of a regulatory audit or investigation.
Also, can I save it to a blob-type database field and create a link that can be used to download the file?
Sorry for my lack of knowledge, but I am originally a COBOL analyst.
Thanks!
Our CRM software package uses JavaMail to send and receive e-mail messages from our MS Exchange Server, and stores them in the package database (Oracle Exadata) for issue/ticket management.
EML files are MIME format. You can use JavaMail to read the MIME content of a message and write it to an EML file without parsing it first.
However, note that Exchange does not store the message in MIME format. So, even though the message may be received in MIME format, Exchange may transform it into its own internal format, and then transform it back to MIME when the message is read. Depending on your Exchange configuration, this transformation may or may not preserve the original MIME content exactly. In some cases Exchange will transform the message into a different MIME format that it thinks will be easier for the client to process. I'm not an Exchange expert, so if this concerns you, you'll need to look into the Exchange documentation in more detail.
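To illustrate the "write without parsing" idea, here is a minimal sketch in Python rather than JavaMail (in JavaMail the analogous call would be the message's `writeTo(OutputStream)`). It assumes the raw message bytes have already been fetched from the server; the function names `save_raw_eml` and `verify_eml` and the sample message are hypothetical. Recording a SHA-256 digest alongside the file is one possible way to show later that the stored copy is unchanged, though it cannot prove that Exchange delivered the bytes it originally received.

```python
import hashlib
import tempfile
from pathlib import Path


def save_raw_eml(raw_bytes: bytes, target: Path) -> str:
    """Write the message bytes unchanged and return their SHA-256 hex digest."""
    # Binary mode: no newline translation or re-encoding takes place.
    target.write_bytes(raw_bytes)
    return hashlib.sha256(raw_bytes).hexdigest()


def verify_eml(target: Path, expected_digest: str) -> bool:
    """Re-read the file and confirm it still matches the recorded digest."""
    return hashlib.sha256(target.read_bytes()).hexdigest() == expected_digest


# Hypothetical sample standing in for bytes fetched from the mail server.
sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Audit test\r\n"
    b"\r\n"
    b"Body text.\r\n"
)

eml_path = Path(tempfile.mkdtemp()) / "message.eml"
digest = save_raw_eml(sample, eml_path)
```

The same bytes could be stored in a blob column instead of a file; the digest check would work the same way on the value read back from the database.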

EML vs MAI mail format

I would like to allow various end users to download a message file from my system "as is" so that they will be able to open it easily.
I don't want to use the Outlook .msg format because not all of the users have Microsoft Outlook. HTML is not suitable because I cannot convert emails with attachments to HTML.
I investigated and found two common formats: EML and MAI.
Can you explain each of them so that I can decide on the right format?
While testing, I took a MAI file, renamed it to EML, and could successfully open it with programs that support EML files, such as Outlook. So I am wondering: is it the same format with only a different extension?
I added the MailEnable tag because I understood that if I take an EML file and rename it to MAI, it should work, and I am wondering whether that is correct without exception. As I understand it, EML is more common, so I may choose that format, but I still want to be able to load the EML files into a MailEnable inbox for searching and managing through the web interface.
EML isn't a format, it's just a file extension that is typically used for email messages in the MIME format.
I have no idea what MAI is, I've never heard of it and Google isn't turning anything up.
Since MIME is the only standard email format in existence, use that.
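As a rough illustration of "EML is just MIME in a file", the sketch below uses Python's standard `email` package to build a message with an attachment, serialize it to the bytes that would be saved under a `.eml` name, and parse those bytes back. The addresses, subject and attachment name are made-up sample values, and nothing here is specific to MailEnable or the MAI extension.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a small MIME message with one binary attachment (sample values).
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Report"
msg.set_content("See the attached file.")
msg.add_attachment(
    b"col1,col2\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

# These bytes are what would be written to a file such as message.eml.
eml_bytes = msg.as_bytes(policy=policy.SMTP)

# Any MIME-aware reader can parse the same bytes back.
parsed = message_from_bytes(eml_bytes, policy=policy.default)
attachment_names = [part.get_filename() for part in parsed.iter_attachments()]
body_text = parsed.get_body(preferencelist=("plain",)).get_content()
```

Because the attachment travels inside the same MIME structure, the single file stays self-contained, which HTML export alone would not give you.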

IBM Watson, how to input data of entire books

I'm using the IBM Watson Analytics trial. It says it only takes data as CSV, Excel and a few other formats. How can I convert books or bodies of text into an acceptable format? Thank you.
It seems that the architecture of WCA (Watson Content Analytics) does not support PDF itself. Please refer to the following images from IBM Link
I think it would be better to convert the PDF to text with a converter such as CONVERTER and push it into a database or similar store.
Then you can crawl the text data from it.
FYI, the document has to have a KEY column (e.g. the name of the book).
Even if you do convert your book into an acceptable format (.csv, .xls, .xlsx, .sav), Watson Analytics isn't optimized for text analytics. It sounds like Watson Explorer is the offering that would best suit your needs.
Hope this helps.
Even though CSV or XLS is an acceptable file format, datasets need to follow a specific structure: a header for every column, with the data in the rows below it. I am not sure how the contents of a book can fit into that format.
I have recently published this blog post on how to structure and refine data before importing into Watson Analytics to get the best results.
For your specific requirement, you can look into Watson Explorer as suggested by Brennan above, or even better you can learn to use IBM Content Analytics here.
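For what it's worth, one way a body of text could be forced into the header-plus-rows shape described above is to treat each paragraph as a row. The sketch below is only an assumption about a workable layout, not something Watson Analytics prescribes: the column names `book`, `paragraph_no` and `text`, and the helper functions `book_to_rows` and `rows_to_csv`, are hypothetical.

```python
import csv
import io
import re


def book_to_rows(title, text):
    """Split text on blank lines; one row per paragraph, keyed by book title."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        # Collapse internal line breaks so each paragraph is a single cell.
        {"book": title, "paragraph_no": i, "text": " ".join(p.split())}
        for i, p in enumerate(paragraphs, start=1)
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["book", "paragraph_no", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_text = "First paragraph,\nwith a comma.\n\nSecond paragraph."
rows = book_to_rows("Sample Book", sample_text)
csv_text = rows_to_csv(rows)
```

Whether the resulting table is useful for analysis is a separate matter, as the answers above point out; this only addresses the file structure.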

Tasked with a very large email conversion project and lost

I'm not sure where to start or what to even consider using for the following problem.
I have a Lotus Notes infrastructure with many user mail files that I need to extract and store in another format or database, such as SQL. After the conversion I need a way to index all of the emails while preserving the from, to, subject and content fields; attachments are not important. I need some way to search for all emails containing a keyword or context, regardless of sender, and to display them all in a search results form ordered by date. Does anyone know what may help in my situation?
Software that does what you want is called e-mail archiving. There are many products; for example, you could try an open source one: http://www.mailarchiva.com/index.do
If MailArchiva does not import NSF files natively, you just have to convert them to a format it supports.
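To make the stated requirement concrete (keyword search across all senders, results ordered by date), here is a toy sketch using Python's built-in `sqlite3` module. It is not how MailArchiva or any archiving product works internally; the table layout, the `search` helper and the sample rows are all hypothetical, and it assumes the messages have already been exported from Notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails ("
    "id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT, "
    "subject TEXT, sent_at TEXT, body TEXT)"
)

# Sample rows; ISO 8601 timestamps sort correctly as plain text.
sample = [
    ("alice@example.com", "bob@example.com", "Budget review",
     "2015-03-02T09:00:00", "Please review the budget."),
    ("carol@example.com", "bob@example.com", "Lunch",
     "2015-01-15T12:00:00", "Pizza today?"),
    ("dave@example.com", "alice@example.com", "Re: numbers",
     "2015-02-10T08:30:00", "The budget figures are attached."),
]
conn.executemany(
    "INSERT INTO emails (sender, recipient, subject, sent_at, body) "
    "VALUES (?, ?, ?, ?, ?)",
    sample,
)


def search(connection, keyword):
    """Return (sent_at, sender, subject) for matching emails, oldest first."""
    pattern = f"%{keyword}%"
    return connection.execute(
        "SELECT sent_at, sender, subject FROM emails "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY sent_at",
        (pattern, pattern),
    ).fetchall()


results = search(conn, "budget")
```

A `LIKE` scan is fine for a demonstration but will likely be too slow for many large mail files; a real deployment would more plausibly rely on a full-text index such as the one an archiving product or a search server provides.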

How does Google Docs store documents (on the backend)?

I half imagine there being these great .docs in the sky... but another part of me doubts that my documents are even being stored in anything we'd traditionally call a "file." Does Google have its own document format? I feel like it must. Some branch of some existing format like ODF, maybe? Any idea what it's like, what's special about it (if anything), and/or why it is the way it is?
As far as I'm aware, Google Docs originally generated RTF files. Now, however, with the recent push of HTML5 and integration of the ContentEditable module, they may very well just store documents as plain HTML within their database.
I would guess that Google definitely extracts some information from the file for indexing. For editing purposes, however, I do not think the internal format is much different from ODF, MS Office or other file formats. But those are only guesses; maybe someone else knows more.

Resources