How are readers for a given file extension made? - file

For example, there are many PDF readers on the market that are not from Adobe. So how do their authors make readers (viewers) for these extensions?
I want to make an online application that can view these formats:
PDF,
Word,
PowerPoint.
Are there special libraries or frameworks for that?

They will either look up the official file format specification and implement a viewer for it, or they will try to reverse engineer the file format and make a viewer from that.
For PDF, the file format has always been publicly shared by Adobe so others could implement viewers (and more); Adobe still makes their version of the specification public here: http://www.adobe.com/devnet/pdf/pdf_reference.html. Meanwhile, the PDF file format became an international standard through the ISO as ISO 32000, so the latest version of the PDF specification can also be obtained through the ISO or your country's standards organisation (if it is a member of the ISO community).
For Word and PowerPoint, you would have to find the information from Microsoft. These are proprietary file formats, and certainly at the beginning of their life no public documentation (that I'm aware of) existed. The later formats have been (at least partially) made public by Microsoft; how complete that documentation is, I don't know.
As to your second point, how you would implement this, there are basically two ways:
1) You can write everything from scratch. That is certainly feasible for PDF; some tens of companies have done so.
2) You could benefit from the (very many) man-years of work these companies have put in by using an existing library that supports the file format. Open source, free, and commercial libraries exist that implement all or part of the features in these file formats.
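Whichever route you take, a viewer first has to decide which format it has been handed, and the extension alone is not reliable. Below is a minimal sketch in C that sniffs the leading bytes of a file. The names doc_kind and sniff_document are made up for this illustration. It only recognises containers: a ZIP signature is consistent with .docx/.pptx but also with any other ZIP file, and the OLE2 signature is consistent with legacy .doc/.ppt but also with other compound files. The PDF check is strict (signature at offset 0), which some real-world files may not satisfy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    DOC_UNKNOWN,
    DOC_PDF,        /* begins with "%PDF-" */
    DOC_ZIP_BASED,  /* ZIP container: possibly .docx/.pptx, or any other ZIP */
    DOC_OLE2        /* OLE2 compound file: possibly legacy .doc/.ppt */
} doc_kind;

/* Classify a buffer holding the first bytes of a file. */
doc_kind sniff_document(const unsigned char *buf, size_t len)
{
    static const unsigned char zip_magic[]  = { 'P', 'K', 0x03, 0x04 };
    static const unsigned char ole2_magic[] = { 0xD0, 0xCF, 0x11, 0xE0,
                                                0xA1, 0xB1, 0x1A, 0xE1 };

    assert(buf != NULL || len == 0);

    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0)
        return DOC_PDF;
    if (len >= sizeof zip_magic && memcmp(buf, zip_magic, sizeof zip_magic) == 0)
        return DOC_ZIP_BASED;
    if (len >= sizeof ole2_magic && memcmp(buf, ole2_magic, sizeof ole2_magic) == 0)
        return DOC_OLE2;
    return DOC_UNKNOWN;
}
```

After this step, the actual parsing and rendering would be handed to a format-specific library or to your own implementation of the specification.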

Related

How to get extended locale information with Windows CRT API

I am working on a personal project in which I need to obtain full locale formatting information from a C locale.
I cannot simply use localeconv or localeconv_l since lconv does not provide all formatting information needed. To solve this on *NIX there are nl_langinfo and nl_langinfo_l functions, however they are not present on Windows.
What ways are there to obtain locale formatting information on Windows?
Start with GetUserDefaultUILanguage.
Similar and related APIs include:
GetUserDefaultLocaleName
GetUserDefaultLCID
GetUserDefaultLangID
The Perl 5 open source C code contains an emulation of nl_langinfo() for Windows and other platforms that lack it. You can steal the code, though it is complicated by trying to work on a bunch of different platforms with a bunch of different configurations.
A few fields aren't implemented, such as the Japanese emperor era names, but anything in common use is available.
Start with this file: https://github.com/Perl/perl5/blob/blead/locale.c
The code continues to evolve.
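One portable way to recover some of what nl_langinfo() reports (weekday and month names) is to ask strftime(), which is part of the standard C library and therefore also available in the Windows CRT. The following is only a sketch of that idea, not a copy of what Perl does; the helper names weekday_name and month_name are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: full weekday name (0 = Sunday ... 6 = Saturday) in the
 * current LC_TIME locale. Returns the number of bytes written, 0 on failure. */
size_t weekday_name(int wday, char *out, size_t out_size)
{
    struct tm tm;

    assert(out != NULL && out_size > 0);
    if (wday < 0 || wday > 6)
        return 0;
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = 1;      /* keep the struct within valid ranges */
    tm.tm_wday = wday;   /* %A is derived from tm_wday */
    return strftime(out, out_size, "%A", &tm);
}

/* Hypothetical helper: full month name (0 = January ... 11 = December). */
size_t month_name(int mon, char *out, size_t out_size)
{
    struct tm tm;

    assert(out != NULL && out_size > 0);
    if (mon < 0 || mon > 11)
        return 0;
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = 1;
    tm.tm_mon = mon;     /* %B is derived from tm_mon */
    return strftime(out, out_size, "%B", &tm);
}
```

Call setlocale(LC_TIME, ...) beforehand to select the locale of interest. Items that strftime() cannot express (for example, separators and grouping rules beyond what lconv offers) would still need a platform API or a table.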

Reading Microsoft Outlook MSG Content in pure C code

I need to read some Microsoft Outlook MSG file in pure C code. What I need is a library that doesn't depend on any particular framework (.NET, Java, etc), so a library/class/set of functions completely written in C.
Well, if you're okay with MFC or ATL (which would be C++), you should take a look at this article. Also, the format is described in detail here.
What you'll essentially be doing is reading the nodes which have information in ASCII text format, which boils down to the __substg1.0_xxxxxxxx nodes ending with 001E.
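As a small illustration of the naming scheme mentioned above: the eight hex digits after "__substg1.0_" encode a 16-bit property ID followed by a 16-bit property type, where 001E marks an 8-bit string and 001F a Unicode (UTF-16) string. The sketch below only decodes such a name; it assumes you already have a way to enumerate the streams of the compound file and that the stream name has been converted to a plain char string. The function name parse_substg_name is made up, and names carrying an extra suffix after the eight digits are deliberately rejected here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define PT_STRING8 0x001Eu  /* 8-bit character string */
#define PT_UNICODE 0x001Fu  /* UTF-16 string */

/* Split a stream name such as "__substg1.0_0037001E" into property ID and
 * property type. Returns 1 on success, 0 if the name does not match. */
int parse_substg_name(const char *name, unsigned *prop_id, unsigned *prop_type)
{
    static const char prefix[] = "__substg1.0_";
    const size_t prefix_len = sizeof prefix - 1;
    const char *hex;
    unsigned long tag;
    size_t i;

    assert(name != NULL && prop_id != NULL && prop_type != NULL);

    if (strncmp(name, prefix, prefix_len) != 0)
        return 0;
    hex = name + prefix_len;
    if (strlen(hex) != 8)
        return 0;
    for (i = 0; i < 8; ++i)
        if (!isxdigit((unsigned char)hex[i]))
            return 0;

    tag = strtoul(hex, NULL, 16);
    *prop_id = (unsigned)(tag >> 16) & 0xFFFFu;   /* upper four hex digits */
    *prop_type = (unsigned)(tag & 0xFFFFu);       /* lower four hex digits */
    return 1;
}
```

For example, "__substg1.0_0037001E" decodes to property ID 0x0037 with type PT_STRING8, so the stream content can be read as 8-bit text.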

Simple C audio library

I'm looking for a simple-ish library for outputting audio. I'd like it to meet these criteria:
Licensed under LGPL/zlib/MIT or something similar; I'm going to use it in an indie commercial application and I don't have the money for a license.
Written in C, but C++ is fine.
Cross-platform (Windows, Linux, maybe OSX)
Able to read from some sort of audio file in memory (I'd prefer WAV or OGG, but I will gladly use less popular formats if need be; I've seen the use of a memfile struct and user-defined I/O callbacks). I need the file to be in memory because I put all my resources into a .zip archive, and I use another library to load those archived files into memory.
Supports playing multiple sounds at the same time, having a max of 8 or so is ok.
I'd really like to have either the source code or a static library (MinGW/GCC lib???.a), but if nothing else is available I will use a shared library.
I must have come across two dozen different audio libraries in my search, none of which quite met these criteria...
I would recommend PortAudio + libsndfile. It is a very popular combination that meets your requirements and is used by many other applications, including Audacity.
Some of the candidates that immediately spring to my mind are:
SDL (there is a tutorial that demonstrates how to play a .wav format sound)
libav
ffmpeg
libao
OpenAL Soft
Jack Audio
You may have already looked at these and eliminated them, though. Can you give some more detail about the libraries that you have eliminated from consideration and why? This will help narrow down our recommendations.
You might want to look into SDL and SDL_mixer. Here is a good tutorial.
I've used SDL_mixer and it makes it easy to play background sounds or music and play multiple simultaneous sounds without having a need to write your own sound sample mixer.
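For a sense of what a sample mixer does at its core, here is a minimal sketch in C. It is not SDL_mixer's implementation, and the function name mix_s16 is invented for this example. It assumes all inputs are signed 16-bit PCM at the same sample rate and channel layout, sums them in a wider integer, and clamps the result so that loud passages saturate instead of wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mix n_inputs buffers of n_samples signed 16-bit samples each into out.
 * Sums are accumulated in 32 bits and clamped to the 16-bit range. */
void mix_s16(int16_t *out, const int16_t *const *inputs,
             size_t n_inputs, size_t n_samples)
{
    size_t i, k;

    assert(out != NULL);
    assert(inputs != NULL || n_inputs == 0);

    for (i = 0; i < n_samples; ++i) {
        int32_t acc = 0;
        for (k = 0; k < n_inputs; ++k)
            acc += inputs[k][i];
        if (acc > INT16_MAX)
            acc = INT16_MAX;
        else if (acc < INT16_MIN)
            acc = INT16_MIN;
        out[i] = (int16_t)acc;
    }
}
```

A real mixer would also track a play position and volume per voice and would be called from the audio callback; those parts are left out here.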
I ended up using PortAudio (very low-level, flexible license) and wrote a mixer myself. See this topic I made on the C++ forums for some other people's tips on writing a custom mixer. It's not hard at all, really; I'm surprised that there are so many mixer libraries out there. For a breakdown of the WAV format (ready-to-stream raw audio data with a 44-byte header) see this.
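Since the file is already in memory, the 44-byte header mentioned above can be read directly from the buffer. The sketch below is an assumption-laden illustration: it accepts only the simplest layout (RIFF/WAVE, a 16-byte "fmt " chunk describing uncompressed PCM, immediately followed by the "data" chunk). Real files may contain extra chunks or a longer "fmt " chunk, which this code rejects rather than handles. The names wav_view and wav_parse_canonical are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view onto PCM data inside an in-memory WAV file. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    const unsigned char *pcm;  /* points into the caller's buffer */
    uint32_t pcm_bytes;
} wav_view;

/* WAV fields are little-endian regardless of the host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out for a canonical 44-byte-header PCM file, else 0. */
int wav_parse_canonical(const unsigned char *buf, size_t len, wav_view *out)
{
    assert(out != NULL);

    if (buf == NULL || len < 44)
        return 0;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0 ||
        memcmp(buf + 12, "fmt ", 4) != 0 || memcmp(buf + 36, "data", 4) != 0)
        return 0;
    if (rd32(buf + 16) != 16 || rd16(buf + 20) != 1)  /* 16-byte fmt, PCM */
        return 0;

    out->channels = rd16(buf + 22);
    out->sample_rate = rd32(buf + 24);
    out->bits_per_sample = rd16(buf + 34);
    out->pcm_bytes = rd32(buf + 40);
    if (out->pcm_bytes > len - 44)            /* tolerate a truncated file */
        out->pcm_bytes = (uint32_t)(len - 44);
    out->pcm = buf + 44;
    return 1;
}
```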

Is a .BIN CD Image file a standard format?

On this page:
http://en.wikipedia.org/wiki/List_of_file_formats
I found that .BIN files are used by many applications. Are they all the same format?
I am going to deal with .BIN files, and I want to know the standard format.
Google could not help me find a site explaining the standard structure of the .BIN format, because it changes BIN to Binary in the search results.
I am talking about CD image .BIN files.
The BIN extension indicates that it is just binary data and doesn't say anything about the actual format. Like your linked Wikipedia page suggests, the extension has different meanings depending on where it's used.
If you know it is a CD image, the actual structure of the data inside the image is usually some file system (probably ISO 9660 with Rock Ridge or Joliet extensions). On Linux this can be mounted through a loop device and used like a regular CD.
EDIT
The ISO 9660:1988 standard can be downloaded freely online:
http://www.ecma-international.org/publications/standards/Ecma-119.htm
A draft of the current ISO 9660:1999 is available here:
http://www.pismotechnic.com/cfs/iso9660-1999.html
The draft does not represent the official released standard (which you may purchase) but it may be close enough to get you most of the way. Note, these do not include any information about extensions that may be in use. The linked Wikipedia page lists a few of the most common extensions each of which will also have their own published standards.
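To check whether a given image really carries an ISO 9660 file system, you can look for the volume descriptor that the standard places at logical sector 16: its bytes 1 to 5 hold the identifier "CD001". The sketch below is an illustration under assumptions, not a complete parser. It tries three common layouts: plain 2048-byte sectors, raw 2352-byte Mode 1 sectors (user data after a 16-byte sync and header), and raw 2352-byte Mode 2 Form 1 sectors (user data after 24 bytes). The function name detect_cd_layout is invented, and images with other layouts (audio tracks, data tracks that do not start at the beginning of the file) will simply not be recognised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look for the ISO 9660 identifier "CD001" at sector 16 under a few common
 * sector layouts. Returns the sector size (2048 or 2352) and stores the
 * offset of the user data within each sector, or returns 0 if not found. */
size_t detect_cd_layout(const unsigned char *img, size_t len, size_t *data_offset)
{
    static const struct { size_t sector; size_t offset; } layouts[] = {
        { 2048, 0 },   /* plain image, user data only */
        { 2352, 16 },  /* raw Mode 1: 12 sync + 4 header bytes */
        { 2352, 24 }   /* raw Mode 2 Form 1: + 8 subheader bytes */
    };
    size_t i;

    assert(img != NULL || len == 0);
    assert(data_offset != NULL);

    for (i = 0; i < sizeof layouts / sizeof layouts[0]; ++i) {
        /* Byte 0 of the descriptor is its type; the identifier follows. */
        size_t pos = 16 * layouts[i].sector + layouts[i].offset;
        if (len >= pos + 6 && memcmp(img + pos + 1, "CD001", 5) == 0) {
            *data_offset = layouts[i].offset;
            return layouts[i].sector;
        }
    }
    return 0;
}
```

Only the first 40 KB or so of the image need to be read for this test. An image with raw 2352-byte sectors generally has to be converted to 2048-byte sectors before it can be treated as a plain ISO 9660 image.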

Where can I get started with Unicode-friendly programming in C?

So, I’m working on a plain-C (ISO/IEC 9899:1999) project, and am trying to figure out where to get started re: Unicode, UTF-8, and all that jazz.
Specifically, it’s a language interpreter project, and I have two primary places where I’ll need to handle Unicode: reading in source files (the language ostensibly supports Unicode identifiers and such), and in ‘string’ objects.
I’m familiar with all the obvious basics about Unicode, UTF-7/8/16/32 & UCS-2/4, so on and so forth… I’m mostly looking for useful, C-specific (that is, please no C++ or C#, which is all that’s been documented here on SO previously) resources as to my ‘next steps’ to implement Unicode-friendly stuff… in C.
Any links, manpages, Wikipedia articles, example code, is all extremely welcome. I’ll also try to maintain a list of such resources here in the original question, for anybody who happens across it later.
A must read before considering anything else, if you’re unfamiliar with Unicode, and what an encoding actually is: http://www.joelonsoftware.com/articles/Unicode.html
The UTF-8 home-page: http://www.utf-8.com/
man 3 iconv (as well as iconv_open and iconvctl)
International Components for Unicode (via Geoff Reedy)
libbasekit, which seems to include light Unicode-handling tools
Glib has some Unicode functions
A basic UTF-8 detector function, by Christoph
International Components for Unicode provides a portable C library for handling unicode. Here's their elevator pitch for ICU4C:
The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code can not make use of them. The ICU4C libraries fills in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).
GLib has some Unicode functions and is a pretty lightweight library. It's not near the same level of functionality that ICU provides, but it might be good enough for some applications. The other features of GLib are good to have for portable C programs too.
GTK+ is built on top of GLib. GLib provides the fundamental algorithmic language constructs commonly duplicated in applications. This library has features such as (this list is not a comprehensive list):
Object and type system
Main loop
Dynamic loading of modules (i.e. plug-ins)
Thread support
Timer support
Memory allocator
Threaded Queues (synchronous and asynchronous)
Lists (singly linked, doubly linked, double ended)
Hash tables
Arrays
Trees (N-ary and binary balanced)
String utilities and charset handling
Lexical scanner and XML parser
Base64 (encoding & decoding)
I think one of the interesting questions is - what should your canonical internal format for strings be? The 2 obvious choices (to me at least) are
a) utf8 in vanilla c-strings
b) utf16 in unsigned short arrays
In previous projects I have always chosen UTF-8. Why? Because it's the path of least resistance in the C world. Everything you interface with (stdio, string.h, etc.) will work fine.
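One caveat with UTF-8 in plain C strings is that the byte-oriented functions keep working, but lengths are counted in bytes rather than characters. A minimal sketch of counting code points instead: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new code point. The function name utf8_count_code_points is invented here, and the input is assumed to be valid, NUL-terminated UTF-8; no validation is performed.

```c
#include <assert.h>
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated string that is assumed to be
 * valid UTF-8, by counting the bytes that are not continuation bytes. */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the string "h\xC3\xA9llo" (the word "héllo"), strlen() reports 6 bytes while this function reports 5 code points. Note that code points are still not the same as user-perceived characters when combining marks are involved.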
Next comes the question of file format. The problem here is that it's visible to your users (unless you provide the only editor for your language). Here I guess you have to take what they give you and try to guess the encoding by peeking (byte order marks help).
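Peeking for a byte order mark can be sketched as follows. The enum and function names are invented for this example. Two caveats: many UTF-8 files carry no BOM at all, so ENC_UNKNOWN is a normal result that needs a fallback (for instance, assume UTF-8 and validate), and the bytes FF FE 00 00 are ambiguous in principle, since they could also be a UTF-16LE BOM followed by a NUL character; this sketch treats them as UTF-32LE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    ENC_UNKNOWN,
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UTF32LE,
    ENC_UTF32BE
} bom_encoding;

/* Inspect the first bytes of a file for a byte order mark. Stores the BOM
 * length (0 if none) in *bom_len so the caller can skip it. */
bom_encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(bom_len != NULL);
    assert(buf != NULL || len == 0);

    *bom_len = 0;
    /* The UTF-32 checks come first: FF FE is a prefix of the UTF-32LE BOM. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}
```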

Resources