Microsoft Word text parser in C

I would like to know how to parse Microsoft Word (.doc and .docx) documents and extract their text content. The programming language should be plain C (compiled with gcc).
Are there any libraries that already do this job?
Extension: can I use the same procedure to extract text from Microsoft PowerPoint files as well?

Microsoft Word documents are an enormous beast - you definitely don't want to be writing this code yourself. Look into using an existing free Word library such as antiword or wvWare.

I don't know about libraries that exist, but the format specifications are available from Microsoft for free and under a promise not to sue you for using them.

On Windows, let Word do the job and interface with its COM object; on Linux, the job has already been done in antiword. Alternatively, you can automate OpenOffice.org on any platform through the UNO object model.

If you're willing to go through the effort of using a COM interface in C, you can use the IFilter interface built into every version of Windows since Windows 2000. You can use it to extract text from any office document (Word, Excel, etc.), PDF file or any file type that has IFilter support installed.
I wrote a blog post about it a few years back. It's all C++, but you can use COM objects from C.

Related

Extract cab file and use the database

I found an old Windows Mobile dictionary application and I want to get at its database. I extracted the CAB file, but I don't know how to convert the database to a CSV or SQL file. Does anybody have an idea how to do this?
You can download the file from here: http://www.mediafire.com/download/z32xgmc9fia3nr2/OGD.Akilli.Sozluk.CAB
I use Ubuntu.
SQLite provides the C source code for a command-line shell program that can do this, and Ubuntu apparently ships it as well. Here is the man page for the sqlite shell included with Ubuntu. Please review the documentation, as there are a few ways to convert to CSV or get the schema of a table (for example, the .mode csv and .schema dot-commands).
Alternatively, you can use a 3rd party tool to view the database in a GUI. I can't speak for Linux solutions, but Firefox has a 3rd party plugin called "SQLite Manager" that will let you use SQLite files in a GUI if you prefer it that way.

ms search text ifilter

I'm a newbie in MS Search so please forgive the dumb question :-)
I'm storing a large amount of specialized text files for a card game (bridge).
These files are plain text files with a specific format that describes a bridge game played in a championship.
The only difference from a regular .txt file is the file extension, which is ".lin" rather than ".txt".
What I need is to implement a new IFilter that is an exact copy of the standard MS Search text IFilter, but registered for another file extension.
Is this possible by copying and pasting an existing filter and tweaking (tampering with) its content?
Or do I have to use C# to edit the IFilter and recompile?
The Windows 7 SDK has a sample IFilter implementation that would be a good blueprint for what you are trying to do. It contains a project called "SmpFilt". The code shows parsing of a text file with a custom file extension. You will need to modify the code to parse your text instead and pull out any custom attributes from your .lin files.
Unfortunately, you can no longer build custom IFilters with managed code (C#, VB, etc.). The sample project is in C++. Windows 7 and Server 2008 won't load IFilters written in managed code.
Good luck.

FTP Transactions Using Microsoft Visual C++ 6

Is there any tutorial about FTP transactions (such as download, upload, and file/directory listings) using Microsoft Visual C++ 6 with the C language instead of C++?
You basically want a WinInet FTP client, which is the Win32 API for this kind of thing. You can do all this in straight C.
There's a decent writeup here:
http://www.teksoftco.com/articles/ftp%20client.htm
but the gist is: you use InternetOpen/InternetConnect to get a connection, then use FtpOpenFile/FtpGetFile/FtpPutFile etc. There are FtpFindFirstFile/InternetFindNextFile to enumerate directories, and other functions for interrogating your current directory, deleting files, etc.
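As a rough illustration of that call sequence, here is a minimal sketch of a directory listing in plain C. The function name ftp_list_directory, the agent string "ftp-sketch", and the choice of passive mode are assumptions made for this example, not something taken from the linked article. The WinInet calls are compiled only on Windows; elsewhere the function is a stub that returns -1 so the file still builds.

```c
#ifdef _WIN32
#include <windows.h>
#include <wininet.h>   /* link with wininet.lib (MSVC) or -lwininet (MinGW) */
#endif
#include <stdio.h>

/*
 * Print the names of the entries matching `pattern` in the server's
 * current directory. Returns the number of entries printed, or -1 on
 * failure. The function name and parameters are hypothetical; host,
 * user and password are supplied by the caller.
 */
int ftp_list_directory(const char *host, const char *user,
                       const char *password, const char *pattern)
{
#ifdef _WIN32
    HINTERNET inet, ftp, find;
    WIN32_FIND_DATAA entry;
    int count = -1;

    /* Step 1: initialise WinInet for this process. */
    inet = InternetOpenA("ftp-sketch", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
    if (inet == NULL)
        return -1;

    /* Step 2: open an FTP session (passive mode is an assumption). */
    ftp = InternetConnectA(inet, host, INTERNET_DEFAULT_FTP_PORT, user, password,
                           INTERNET_SERVICE_FTP, INTERNET_FLAG_PASSIVE, 0);
    if (ftp != NULL) {
        /* Step 3: enumerate the directory entries. */
        find = FtpFindFirstFileA(ftp, pattern, &entry, INTERNET_FLAG_RELOAD, 0);
        if (find != NULL) {
            count = 0;
            do {
                printf("%s\n", entry.cFileName);
                count++;
            } while (InternetFindNextFileA(find, &entry));
            InternetCloseHandle(find);
        } else if (GetLastError() == ERROR_NO_MORE_FILES) {
            count = 0;   /* directory is empty, not an error */
        }
        InternetCloseHandle(ftp);
    }
    InternetCloseHandle(inet);
    return count;
#else
    /* WinInet exists only on Windows; this stub keeps the file portable. */
    (void)host; (void)user; (void)password; (void)pattern;
    return -1;
#endif
}
```

A call such as ftp_list_directory("ftp.example.com", NULL, NULL, "*") would attempt an anonymous login against a placeholder host. Downloads and uploads would use FtpGetFileA and FtpPutFileA on the same session handle before it is closed.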

How can I programmatically create PowerPoint presentations. On Linux. For Free.

I'd like to create a PowerPoint (not Javascript/HTML/PDF/Keynote/.mov) using code (any language, C preferred) for free.
(I've seen this SO question which references how to create them in C#)
Is this even possible? How can I write the raw bits that make up a PowerPoint file? Any good libraries for doing this?
UPDATE The Microsoft Reference Page for the binary format is here.
Open Office has an API. You can use the C++ bindings (doc available here). If you really need C, you'll have to do some wrapping.. but hey, it's Christmas, isn't it ;-)
Open Office has export functions to create .ppt compatible files.
PowerPoint you may not, but OpenOffice Impress you may. (Yoda style answer :) )
Take a look at the ODF Toolkit project. They aim to produce lots of libraries for generating this kind of content programmatically.
Unless you're specifically interested in PowerPoint 2003 binary files, note that PowerPoint 2007 and later .pptx files are actually a collection of XML files inside a ZIP archive. You can see that by simply renaming a .pptx file to .zip and opening it.
You can create these XML files in any way you like, such as writing code to do it.
PresentationML defines the PowerPoint XML documents; have a look here, for example:
http://msdn.microsoft.com/en-us/openspecifications/hh295812.aspx
The standards can be found here:
http://www.ecma-international.org/publications/standards/Ecma-376.htm
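The ZIP-container point can be checked from C without any library by looking at the first bytes of a file. The sketch below tells a ZIP container (as used by .docx/.xlsx/.pptx) apart from the older OLE2 compound file (as used by binary .doc/.xls/.ppt). The enum and function names are made up for this example, and a ZIP signature only shows that the file is a ZIP archive, not that it really contains PresentationML.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Container kinds; the names are hypothetical, chosen for this sketch. */
typedef enum {
    OFFICE_UNKNOWN = 0,
    OFFICE_OLE2,   /* legacy binary formats (compound file) */
    OFFICE_ZIP     /* ZIP container, as used by the 2007+ XML formats */
} office_container;

/* Classify a buffer that holds the first bytes of a file. */
office_container classify_header(const unsigned char *buf, size_t len)
{
    static const unsigned char ole2_magic[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    static const unsigned char zip_magic[4] = { 'P', 'K', 0x03, 0x04 };

    if (len >= sizeof ole2_magic &&
        memcmp(buf, ole2_magic, sizeof ole2_magic) == 0)
        return OFFICE_OLE2;
    if (len >= sizeof zip_magic &&
        memcmp(buf, zip_magic, sizeof zip_magic) == 0)
        return OFFICE_ZIP;
    return OFFICE_UNKNOWN;
}

/* Read the first 8 bytes of a file and classify them. */
office_container classify_file(const char *path)
{
    unsigned char buf[8];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return OFFICE_UNKNOWN;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return classify_header(buf, n);
}
```

Calling classify_file("slides.pptx") (a placeholder file name) should report OFFICE_ZIP for a 2007-style file; the individual XML parts would then have to be read with a ZIP library of your choice.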
If you don't mind going to Java, Apache POI provides readers and writers for most MS Office formats (up to the 2003 version anyway).

Open and read Excel from a Linux based C program?

I am trying to locate a set of source code that would allow me to open and read the contents of an Excel file on Linux from within a C program.
I don't really want to link against the OpenOffice SDK if I can find something that just does these two things.
carl
If the following suits you, you may take the read routines from
Sourceforge
and the write routines from
What is a simple and reliable C library for working with Excel files?
As far as I know there is no library that does this. The common method is to save the file as CSV from Excel, although formatting etc. is then lost.
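If you go the CSV route, reading the exported file from C takes little code. Below is a minimal sketch of a splitter for one CSV record; the function name csv_split_line is hypothetical. It assumes comma separators and double-quoted fields in which a doubled quote stands for a literal quote, and it does not handle line breaks inside quoted fields.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split one CSV record in place.
 * Pointers to the NUL-terminated fields are stored in fields[] and the
 * number of fields found (at most max_fields) is returned.
 * The input line is modified; a trailing CR or LF ends the record.
 */
size_t csv_split_line(char *line, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *src = line;

    while (count < max_fields) {
        char *dst = src;          /* fields are compacted in place */
        fields[count++] = dst;

        if (*src == '"') {        /* quoted field */
            src++;
            while (*src != '\0') {
                if (*src == '"') {
                    if (src[1] == '"') {   /* "" means a literal quote */
                        *dst++ = '"';
                        src += 2;
                    } else {               /* closing quote */
                        src++;
                        break;
                    }
                } else {
                    *dst++ = *src++;
                }
            }
        }

        /* Unquoted data, or stray data after a closing quote. */
        while (*src != '\0' && *src != ',' && *src != '\r' && *src != '\n')
            *dst++ = *src++;

        if (*src == ',') {
            src++;                /* step past the separator first */
            *dst = '\0';
        } else {
            *dst = '\0';          /* end of record */
            break;
        }
    }
    return count;
}
```

In practice you would read each line of the exported file with fgets and pass the buffer to csv_split_line together with an array of char pointers that is large enough for the expected number of columns.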
You could try to use the Excel plugin of Gnumeric:
http://svn.gnome.org/viewvc/gnumeric/trunk/plugins/excel/
It works very well (inside gnumeric).
You can use xlhtml to convert the Excel files into HTML, and then use your favorite HTML parser to extract the cell data.
Check out the answers to What is the best C library that can access Excel files?
Possible things for you to look at:
C : xlsLib
C++ : LibExcel
Though I think both are write-only, which is perhaps not what you need.
Grab the xls reading code from Open Office.
Why don't you just use Google Docs? With Gears it has offline support and you can edit files too. Just a thought: http://docs.google.com
Check out XLSX I/O at https://sourceforge.net/projects/xlsxio/
It is a cross-platform C library to read from and write to Excel .xlsx files.
Works on Windows, OS X, Linux and does not require Excel or Office to be installed.
It is intended for sequential access to data in .xlsx files, so if it's only the values you are interested in this is what you need.

Resources