How to use Sphinx3 in an application - c

I used Sphinx4 for some time which really fits my needs. I load a recognizer, pass the audio data to it and use the recognized String in my application.
Right now I'm working on a C application (C++ is unfortunately not an option) where I need something similar and thought that I could use Sphinx3 which is written in C.
The problem is that I don't really know how it is used inside an application and there is no "Hello World"-example as Sphinx4 provides it.
I already compiled and installed sphinxbase and sphinx3 and now I can include the sphinx header files in my application.
Now to my questions:
Is there a "simple" and well documented example application that uses sphinx3 from a C environment?
How can I load up the sphinx3 engine and call a recognizer with my binary audio data?
OR: Do I need to start an application like "sphinx3_decode" and call it from my own application? If so, is there an example application for that?
Thank you in advance!
Best regards,
Robert

It's not recommended to use Sphinx3. From the website:
Sphinx-3 is CMU’s large vocabulary speech recognition system. It’s
older C based decoder that we continue to maintain. It’s planned to
make it obsolete in the future, it’s still most accurate decoder for
large vocabulary tasks. We are using it as a baseline to check the
recognizer accuracy. This decoder is only intended for researchers who
want to evaluate bleeding edge methods in ASR like tree search method.
If you need to use a decoder you should use pocketsphinx. You can find the tutorial and the API documentation on the website:
http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx
http://cmusphinx.sourceforge.net/api/pocketsphinx/pocketsphinx_8h.html

I recently worked on an integrated project on the Punjabi language.
Here are the steps that we used:
First we recorded the Punjabi audio data in a vacuumed room at a 16000 Hz sample rate.
Then we segmented the recorded data with the Praat software into small wav and raw files of 2 to 30 seconds and saved them in a folder named train.
Then we took a Linux system (Ubuntu), installed the required tools such as autoconf and automake, and untarred Sphinx 3 along with four packages: cmuclmtk, pocketsphinx, sphinxbase and sphinxtrain.
Then, based on the small wav files, we created the supporting files: transcription, dic, phone, filler, file id, ccs etc.
Then we opened the terminal and typed "sphinx_fe" to check whether Sphinx was functional.
Then we created a folder named "man" and changed to its path in the terminal.
Then we ran the command "sphinxtrain -t man setup". This creates a folder named "etc" inside the "man" folder, containing the files "feat.params" and "config".
We changed the config file according to our data.
Then we moved all the files that we had created before (transcription, dic, etc.) into the etc folder located in the man folder.
Then we placed the "lang1.sh" script in the etc folder and the remaining 4 scripts in the man folder.
Then we opened the etc folder in the terminal and ran the command "lang1.sh".
Then we ran a series of commands in the terminal: "mfcgen2.sh", then "verify3.sh", then "hmm4.sh", and finally "end-test.sh" to get the final result.
If you have worked with Sphinx 4, you probably already know the files mentioned in the steps above. I hope this helps you.

Related

How to extract deployment files from an MSI database

An MSI database contains a set of tables, and I can successfully enumerate the File table, which has the metadata of all deployable files. What I need to extract is the actual contents of those files. msiexec, lessmsi and 7-zip can all do it, but I couldn't find any source/API to do it.
What I've discovered is that all other (resource) files are in the Binary table, and the Data field can be used to get the content of those files (like icons, custom DLLs etc).
Further, I found that the Media table contains information about the .CAB file (the MSI has all content embedded with <MediaTemplate EmbedCab="yes"/>). This simply means the CAB file contains the actual content. I probably need to read the contents from the "Structured Storage" of the .msi file.
How to extract the contents of CAB/MSI file, using native C Msi* functions?
Phil has given you the easy/simple answer but I thought I might give you a little more information since you've done some research. Check out:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa372919(v=vs.85).aspx
This is where the structured storage is. You'll see something like Disk1.cab as the Name (PK) and binary data. The data is a CAB file with the file entry in the cab matching the File.File column. From there you can use the File.FileName column to get the short name and long name (you'll want the long name no doubt) and do a join to the Component table to get the Directory table ID.
You'll also need to recurse the directory table to build the tree of directories and know where to put the files.
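The directory recursion described above can be sketched in plain C. This is only an illustration under stated assumptions: the rows are a hypothetical in-memory copy of the Directory table (columns Directory, Directory_Parent, DefaultDir) that you would first have to read through the MSI database functions, and the names `dir_row`, `long_name` and `build_dir_path` are made up for this example. It handles the "short|long" name form and the "." entry (same folder as the parent), but not the "target:source" form that DefaultDir may also use.

```c
#include <stddef.h>
#include <string.h>

/* One row of a simplified, hypothetical in-memory copy of the Directory table. */
struct dir_row {
    const char *id;          /* Directory column (primary key) */
    const char *parent;      /* Directory_Parent column, NULL for a root */
    const char *default_dir; /* DefaultDir column, possibly "short|long" */
};

/* Return the long part of a "short|long" name, or the whole name if there is no '|'. */
const char *long_name(const char *name)
{
    const char *bar = strchr(name, '|');
    return bar ? bar + 1 : name;
}

/* Linear search for the row whose key equals id; NULL if it is missing. */
static const struct dir_row *find_dir(const struct dir_row *rows, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(rows[i].id, id) == 0)
            return &rows[i];
    return NULL;
}

/* Append s to the string in out if it fits; -1 if the buffer is too small. */
static int append(char *out, size_t out_size, const char *s)
{
    size_t used = strlen(out), add = strlen(s);
    if (used + add + 1 > out_size)
        return -1;
    memcpy(out + used, s, add + 1);
    return 0;
}

static int build_at_depth(const struct dir_row *rows, size_t n, const char *id,
                          char *out, size_t out_size, int depth)
{
    if (depth > 64)                      /* guard against cyclic parent links */
        return -1;
    const struct dir_row *row = find_dir(rows, n, id);
    if (row == NULL)
        return -1;
    /* A root has no parent, or names itself as its parent. */
    int is_root = row->parent == NULL || strcmp(row->parent, row->id) == 0;
    if (is_root) {
        if (out_size == 0)
            return -1;
        out[0] = '\0';
    } else if (build_at_depth(rows, n, row->parent, out, out_size, depth + 1) != 0) {
        return -1;
    }
    const char *name = long_name(row->default_dir);
    if (strcmp(name, ".") == 0)          /* "." means: same folder as the parent */
        return 0;
    if (out[0] != '\0' && append(out, out_size, "/") != 0)
        return -1;
    return append(out, out_size, name);
}

/* Write the path of directory `id`, built from the root down, into out.
 * Returns 0 on success, -1 on a missing row, a cycle, or a too-small buffer. */
int build_dir_path(const struct dir_row *rows, size_t n, const char *id,
                   char *out, size_t out_size)
{
    return build_at_depth(rows, n, id, out, out_size, 0);
}
```

The same `long_name` helper applies to the File.FileName column. To place a file, you would resolve the directory ID found through the Component table with `build_dir_path` and then append the file's long name.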
Fun stuff. There's some libraries in C# that make this WAY simpler. Or just call msiexec /a as Phil says. :)
The most straightforward way to extract all the files to some location is to install the product in "administrative" mode. If you do a:
msiexec /a [path to msi] TARGETDIR=[some folder]
you'll see what happens.
In C++, call MsiInstallProduct() with the package path and the equivalent property string (ACTION=ADMIN TARGETDIR=[some folder]).
You have gotten many good answers already, including the use of dark.exe from the WiX toolkit. By downloading the WiX source code you should be able to get the code you need ready-made from there. I assume you may already have done this.
Chris has already linked to the DTF code you can check, but here is a link directly to dark.exe as well: https://github.com/wixtoolset/wix3/tree/develop/src/tools/dark. I would try both. This is C#, you seem to want native.
UPDATE: Before I get to the Win32 features you can use, check out this little summary of the C# DTF features: How to programmatically read the properties inside an MSI file?
Native Win32 functions: The database functions to deal with an MSI file can be found on MSDN (this is to deal with the MSI file as a database). There are also MSI Installer Functions (used to deal with the MSI file as an actual installer).
You can certainly find good examples of native code for this with a good Google search. Have fun!
BTW: It would help to have a description of the actual problem you are trying to solve as well as what you need technically. There could - as always - be less involved ways to achieve what you need. Unless you are writing security software or a malware scanner or something super-involved.
And so it is clear: WiX's dark.exe fully decompiles MSI files into WiX source files and the resource files used to build them - you can then text and binary compare the various types of content (text compare for tables, binary compare for binaries, etc...). The process to do so via command line is described in the following answer: How can I compare the content of two (or more) MSI files? (this is about comparing MSI files, but one option to do so is to decompile them - see section on dark.exe - just for reference for others who find your question).
I like to link things together so we can find content easily at a later point in time. Strictly speaking it doesn't seem necessary here, you have what you need I think but others could perhaps benefit from some further links. Here are some related links:
Extract MSI from EXE.
What is the purpose of administrative installation initiated using msiexec /a?
How do I extract files from an MSI package? (explains why you should not use 7-Zip to extract).

Create a binary file extension reader for mobile

It is an ancient binary file extension, actually a video file created by Inter-Tel Web Conference software. It contains a screen recording video and voice audio, and also can capture the keyboard chat log, attendees and the document manager window during a conference. It can be played with Inter-Tel Collaboration Player, a standalone application included with the Web Conference software package.
What I am trying to do now is find a way to play these files on mobile. Although Inter-Tel Collaboration Player can export the files in AVI format, I want to know how to make a command line script for that, because the application has lots of problems with Windows 7, 8 and 10 and doesn't have a Mac OS version.
What is the way to create a new player for this kind of file?
"Linktivity stopped support on this app, http://linktivity.com even disappeared from the web..."
It seems they were bought out by Mitel Software so now everything is under the Mitel brand name.
"I just want to find a way to manipulate this file extension, a new good player for mobile and computer"
To open/edit those .lrec files with modern software you'll have to look at their:
Collaboration products.
Unified Communication products.
I tried:
To contact them just to double-check facts, but they expect a realtime phone conversation with a salesperson so it wasn't an option. I'd be a fake potential customer, but you can provide a real-world issue (with background details) to see if they can solve it.
I also downloaded the MiCollab app for Android, but it needs login details before even starting anything (so no progress on checking whether an .lrec file from a PC would open within Android).
Export videos for mobile playback:
I've tried the desktop software. Unfortunately it does not accept external commands, so there is no way to make a script that takes multiple lrecs and gives back multiple AVIs.
The only option is to extract frames from the .lrec bytes and use a tool like FFmpeg to combine the images (since the format appears to store image grabs as frames) into one .MP4 video. MP4 is then playable on mobile devices.
Also any of your existing AVI files should be converted with FFmpeg to MP4.
You can download FFmpeg for Windows here (just the big blue button, ignore other options).
Copy the ffmpeg.exe file to some folder like c:\ffmpeg and put your avi's there.
Now open Command prompt and do cd C:\ffmpeg to reach folder, then type : ffmpeg -i filename.avi filename.mp4 (replace filename with preferred for input and output)
If you know how, just include ffmpeg.exe path to Control Panel PATH settings so that FFmpeg can be accessed from any folder (no need to move files to its own folder).
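For converting many exported AVI files, the command line from the steps above can be generated per file. The following is a minimal sketch, not a finished tool: the function name `build_convert_command` is made up for this example, it assumes plain file names (no directory part containing dots, no double quotes in the name), and it only uses the `-i` option already shown above.

```c
#include <stdio.h>
#include <string.h>

/* Write `ffmpeg -i "<input>" "<stem>.mp4"` into cmd, where <stem> is the
 * input name without its last extension.
 * Returns 0 on success, -1 if the name has no extension or cmd is too small. */
int build_convert_command(const char *input, char *cmd, size_t cmd_size)
{
    const char *dot = strrchr(input, '.');
    if (dot == NULL || dot == input)
        return -1;                      /* no extension to replace */
    int stem_len = (int)(dot - input);  /* characters before the last '.' */
    int written = snprintf(cmd, cmd_size, "ffmpeg -i \"%s\" \"%.*s.mp4\"",
                           input, stem_len, input);
    if (written < 0 || (size_t)written >= cmd_size)
        return -1;                      /* output was truncated */
    return 0;
}
```

A caller could loop over the AVI file names, build each command and pass it to `system()` from `<stdlib.h>`, provided ffmpeg is reachable through PATH as described above.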
PS:
I am still researching how to get the frames. It's an awkward format without the specs: the byte order is Big Endian, but then entry values are filled as Little Endian, and it's not clear whether to reverse every two or every four bytes because it's mixed up like that. The pixel bytes themselves seem to be compressed, but not as JPEG, more like ZIP or similar. The only confirmed bytes so far are for video width and video height. It seems doable though if the .lrec only contains screen recordings.
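When probing a format with mixed byte order like this, it helps to read the same bytes under both interpretations and see which one yields plausible values (for example a sensible video width). The helpers below are a generic sketch for that. They assume nothing about the real .lrec layout, which is undocumented, and any offsets you feed them are your own guesses.

```c
#include <stdint.h>

/* Read a 16-bit value stored most significant byte first (Big Endian). */
uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a 16-bit value stored least significant byte first (Little Endian). */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Read a 32-bit Big Endian value. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit Little Endian value. */
uint32_t read_u32_le(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For the bytes 04 00, `read_u16_be` gives 1024 while `read_u16_le` gives 4, so for a width field the first reading is the believable one. Because the helpers work byte by byte, the result does not depend on the byte order of the machine running the code.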
After some research, I found that Media Player Classic can play .lrec files. I don't know, if this helps you a bit.
For your own video player for your company, you would need the encoding info or a decoder directly from Inter-Tel since they own the licences; without it you can't create one.
Edit: Deprecated info see comments.

Decompressing .lz file

Curiosity is one of my personal keys. I got the folder of an executable C application. This folder includes many files: some are .so files, some are .ini files and others are .lz files. I decided to try some reverse engineering, so I used an online reverse engineering tool for the .so files, and the .ini files can simply be opened in Notepad as we all know. My problem now is opening the .lz files, which I already know contain libraries used by functions in the .so files.
What I want to know is: how can I decompress them with a desktop tool or even an online tool?
Should be Lzip.
When you are in the Linux world, one very useful command is file:
$ file myFile.lz
myFile.lz: lzip compressed data, version: 1
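If you want to run the same check from your own C code, an lzip file can be recognized by its header: the four magic bytes "LZIP" followed by one version byte. The sketch below only identifies the format and does not decompress anything, and the function names are made up for this example. For the actual decompression, the lzip command line tool (`lzip -d myFile.lz`) is the usual route.

```c
#include <stdio.h>
#include <string.h>

/* Return the lzip version byte if buf starts with the lzip magic "LZIP",
 * or -1 if the buffer is too short or the magic does not match. */
int lzip_version_from_header(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 'L', 'Z', 'I', 'P' };
    if (len < 5 || memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    return buf[4];
}

/* Read the first five bytes of a file and run the header check on them.
 * Returns -1 as well if the file cannot be opened. */
int lzip_version_of_file(const char *path)
{
    unsigned char header[5];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);
    return lzip_version_from_header(header, n);
}
```

A file that `file` reports as "lzip compressed data, version: 1" should make `lzip_version_of_file` return 1.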

Signable, streamable, "readable" archive format?

Is there any archive format that offers the following:
be digitally sign-able with a digital certificate from a trusted source like Verisign - for preventing changes to the file (I am not referring to read only, but in case the file was changed it should no longer be signed telling the user this is not the original file)
be stream-able - be able to be opened even if not all of the content has been transferred (also not strictly linearly)
be "readable" - be able to read the data without extracting to a temporary folder (AFAIK if you open a file in a zip archive it is extracted first, and this stays true even for zip based formats like OOXML. This is not what I want)
be portable - support on at least Windows, Linux and Mac OS X is a must, or at least future support
be free of patents - be open source, preferably under a license that allows commercial use (as far as I know the GPL is a share-alike license so it doesn't allow commercial use; BSD on the other hand allows it)
Note: Though it may come in handy eventually, I cannot think right now of a scenario that would require both point 1 and point 2 simultaneously. Or let's leave it as being able to check the signature only once the whole file has been downloaded.
I am not interested in:
being able to be compressed
being supported on legacy systems
Does any existing archive format fit this description (tar evolutions like DAR and pax come to mind) ?
If there is, are there programming libraries available for the above mentioned OSs?
If not, would it be hard to create such a thing?
Usage scenario:
I want to use this to create a new media container.
Current media containers contain the audio, video and subtitle streams directly.
Matroska, currently the most advanced container, has supplementary features like attachments and menus.
The menu functionality however is not implemented and very limited.
What I want to create is one level higher.
I want to create a file similar in a way to OOXML.
Also all of the menuing should be done in web technologies like HTML5 (as it is now, the video tag allows for any kind of codec to be used) and CSS.
Also, just like you have holograms on DVDs to prove the authenticity, I want to create a sign-able file.
Research notes:
Before asking this question I stumbled upon this:
Whats the best way digitally sign a zip file for download using .Net
While detached signing would be feasible for the individual files contained in this archive, it is not an elegant solution for the archive file. It is not end user friendly. End users should be able to double-click the file to open it in a media player like VLC, and see a message that the file is legit (just like you see in a browser whether the page is transmitted with SSL through HTTPS or not).
EDIT: clarified point 5
EDIT 2: added a note to clarify point 1 and 2
EDIT 3: added usage scenario
EDIT 4: added research notes section
P.S.: This is my first question on StackOverflow
I doubt that you'll find such a format out of the box. I understand how such a solution can be built with the help of our SolFS, but SolFS doesn't have built-in signing (you can add signing easily).

Configuration Management for FPGA Designs

Which configuration management tool is the best for FPGA designs, specifically Xilinx FPGAs programmed with VHDL and C for the embedded (MicroBlaze) software?
There isn't a "best", but configuration control solutions that work for software will be OK for FPGAs - the flow is very similar. I use Subversion at work and git at home, and wrote a little on 'why' at my blog.
In other answers, binary files keep getting mentioned. The only binary files I deal with are compilation products (equivalent to software object files and executables), so I don't keep them in the version control repository. Instead, for each release/tag that I create, I keep a zipfile with all the important (and irritatingly slow to reproduce) ones in it.
I don't think it much matters what revision control tool you use -- anything that you would consider good in general will probably be OK here. I personally use Git for a sizable Verilog + software project, and I'm quite happy with it.
What will bite you in the ass -- no matter what version control you use -- is this: The Xilinx tools don't generally respect a clean division between "input" and "output" or between (human edited) "source" and (opaque) "binary." Many of the tools like to store some state information, like a last-run time or a hash value, in their "input" files meaning that you'll get lots of false changes. Coregen does this to its .xco files, and project navigator (the main GUI) does this to its .xise files. Also, both tools have a habit of inserting or removing lines for default-valued parameters, seemingly at random.
The biggest issue I've encountered is the work-flow with Coregen: In many cases, at least one of the following is true:
You have to manually edit the HDL files produced by Coregen.
The parameters that went into Coregen are stored somewhere other than the .xco file (usually in what looks like an output file).
You have to copy-and-paste the output from Coregen into your top-level design.
This means that there is no single logical source/master location for your input to the core-generating process. So even if you have the .xco file under version control, there's no expectation that the design you're running corresponds to it. If you re-generate "the same" core from its nominal inputs, you probably won't get the right outputs. And don't even think about merging.
I suggest CM tools that support version labeling and binary files. Most Software CM applications are fine with ASCII text files. They may just store a "difference" file rather than the entire file for updates.
My recommendations: PVCS, ClearCase and Subversion. DO NOT USE Microsoft SourceSafe. I don't like it because it only supports one label per revision.
I've seen Perforce and Subversion used in a couple of FPGA-intensive companies.
We use Perforce, and it's great. You can have your code that lives in Linux-land checked in side-by-side with your Specs and Docs that live in Windows-land. And you get branching, labels, etc.
I've seen everything from Clearcase to RCS used, and it is really all okay for this kind of thing. The important thing is to get a good set of check-in policies established for your group, and make sure they stick to it.
And have automated nightly regressions. That way, when someone breaks the rules, they can be identified and publicly shamed.
I have personally used Perforce, Subversion, git and ClearCase for FPGA projects. Since VHDL and C are just text files, any of them works fine. However, be sure to capture the other project and constraint files and any libraries you use.
Also think about what to do with the outputs, e.g. log file and bitstreams. Both tend to be big and the bitstreams are binaries.
Previously I used Subversion but switched to git two years ago. Git handles FPGA design files just as well as it handles every other text and binary file. Git is all you need for version controlling your files and artifacts.
For building the designs, I recommend just using a single ISE project called "ise" (living in a subdirectory called "ise/"). You can take a look at my (very modest) FPGA open-source project on github for the file layout. I don't bother storing the ISE files at all since they are easy to regenerate. The only things I save are the Verilog files and some ISIM waveform config files. In other projects that use coregen I save the coregen.cgp project file and all of the *.xco scripts for regenerating cores. Then I use a Makefile for actually running coregen on the *.xco files. There are a few other Xilinx-specific files you should version control too: *.ucf, *.coe, *.xcf, etc.
I experimented with using Makefiles and the Xilinx command-line tools but found that ISE did a much better job tracking dependencies and calling the tools with the right arguments. Just don't make the mistake of trying to version control your ise/ project files or you will go mad. Xilinx has something like 300 different file types which change every release. If you want to save a file, you can try the ISE project file itself with a .xise extension. Anything that is hard to recreate, like the golden bitfile that you know works and took 6 hours to build, you might want to copy that and configuration manage it explicitly.

Resources