What is the Informix database file extension?

I've got an image file of a hard drive from a client that wants a database extracted out of it. The client does not know any details except that the database was once installed on the server from which the image was created.
I found out that it is a UNIX system with an Informix DBS installed, but I am unable to find any database files. I'm not sure about the version of Informix, but it seems it was installed around 15 years ago.
I'm not able to boot from the image. I'm just viewing the files.
Do Informix database files have an extension, and if so, what could it be? Any other tips on how to identify the database files?

Do you know whether the database was Informix Standard Engine (SE) or Informix (Informix Dynamic Server — IDS — or one of its multitude of namesakes over the years)?
Standard Engine
If it was SE, then the database files are in a directory database.dbs and the files holding the indexes and data for the tables have extensions .idx and .dat. That's pretty much fixed and very easy. The files and the directory should belong to group informix; the owner will be whoever created the database or the table within the database.
Informix Dynamic Server
If it was IDS, then there is no guaranteed naming convention, and Informix didn't even recommend one. Depending on the state of the disk, I'd look for large files that are owned by user informix and belong to group informix and have 660 (-rw-rw----) permissions. The files will have structure, but it isn't easy to discern it.
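If you want to script the hunt, a rough C sketch like this could shortlist candidates (assuming the ownership data survived in the image; /mnt/image and the 1 MiB size floor are placeholders, not part of any Informix convention):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Walk the mounted image and print large files with -rw-rw----
 * permissions, along with their uid/gid so they can be matched
 * against the informix user and group. */
static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftwbuf) {
    (void)ftwbuf;
    if (type == FTW_F && (sb->st_mode & 0777) == 0660 &&
        sb->st_size > 1024 * 1024)
        printf("%10lld  uid=%d gid=%d  %s\n",
               (long long)sb->st_size, (int)sb->st_uid,
               (int)sb->st_gid, path);
    return 0;
}

int main(void) {
    return nftw("/mnt/image", visit, 16, FTW_PHYS);
}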
For example, I have 'chunk 0 of the root dbspace' in a file toru_31.rootdbs.c0 (server name toru_31 — a naming standard I imposed on my systems). It starts with:
0x0000: 00 00 00 00 01 00 AB 89 03 00 00 18 30 01 C0 06 ............0...
0x0010: 00 00 00 00 00 00 00 00 49 42 4D 20 49 6E 66 6F ........IBM Info
0x0020: 72 6D 69 78 20 44 79 6E 61 6D 69 63 20 53 65 72 rmix Dynamic Ser
0x0030: 76 65 72 20 43 6F 70 79 72 69 67 68 74 20 32 30 ver Copyright 20
0x0040: 30 31 2C 20 32 30 31 31 20 20 49 42 4D 20 43 6F 01, 2011 IBM Co
0x0050: 72 70 6F 72 61 74 69 6F 6E 2E 00 00 00 00 00 00 rporation.......
0x0060: 00 00 00 00 00 00 00 00 00 00 03 00 00 08 00 00 ................
0x0070: BA 2D CC 50 1A 00 00 00 00 00 00 00 C8 00 00 00 .-.P............
0x0080: 31 31 37 33 05 00 00 00 00 00 00 00 00 00 00 00 1173............
0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
However, 'chunk 1 of the root dbspace' (file toru_31.rootdbs.c1) starts:
0x0000: 00 00 00 00 02 00 F1 C6 00 00 00 18 18 00 E4 07 ................
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
* (125)
0x07F0: 00 00 00 00 00 00 00 00 00 00 00 00 61 C6 92 00 ............a...
0x0800: 69 6E 66 6F 72 6D 69 78 69 6E 66 6F 72 6D 69 78 informixinformix
* (127)
0x1000: 02 00 00 00 02 00 BE B6 06 00 08 08 20 00 D8 07 ............ ...
0x1010: 00 00 00 00 00 00 00 00 10 06 00 00 01 00 00 00 ................
0x1020: EA 06 00 00 03 00 00 00 75 08 00 00 08 00 00 00 ........u.......
0x1030: A1 08 00 00 08 00 00 00 B5 0A 00 00 30 03 00 00 ............0...
There is very little information there to give the game away. Although the appearance of informix (128 lines containing informix twice) looks like a tell-tale sign, it isn't — I created the chunk with a program that writes informix over the disk space. It simply shows where the server has not yet written data to this chunk.
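Once you have candidate files, you can at least check page zero for the copyright banner visible in the first dump above. A minimal sketch (the banner text varies by version, and the offset is just what my dump shows, so treat it as illustrative):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Does the first page of this file carry the IDS banner seen
 * at offset 0x18 of chunk 0 of the root dbspace? */
int looks_like_root_chunk(const char *path) {
    char page[2048];
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return 0;
    size_t n = fread(page, 1, sizeof(page), fp);
    fclose(fp);
    return memmem(page, n, "Informix Dynamic Server", 23) != NULL;
}

Bear in mind that only page zero of chunk 0 carries the banner; as the second dump shows, other chunks may give you much less to go on.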
You can look for $INFORMIXDIR. In fact, if there's a directory /INFORMIXTMP, you can look in the text file /INFORMIXTMP/.infxdirs to see where Informix has been installed.

From those directories, you can look in $INFORMIXDIR/etc for onconfig files - they're text files that contain an entry ROOTPATH, which is the pathname of chunk 0 of the root dbspace — basically, the starting point for the whole system. There's usually an onconfig.std which is a template; the naming convention is not firmly fixed, though I always use onconfig.servername (so the config file for toru_31 is onconfig.toru_31).

You may also find other files in $INFORMIXDIR/etc. For example, there's a file oncfg_toru_31.31 (prefix oncfg_, followed by the server name toru_31, followed by a dot and the server number 31) which contains information about the chunks and other disk space used by the server. You might also see binary files analogous to .conf.toru_31 and .infos.toru_31 — the .infos file is normally present only while the server is up, but the .conf file persists. These files have some limited information in them, most notably the name of the onconfig file.
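Extracting ROOTPATH from an onconfig file is then trivial (grep ROOTPATH onconfig.* does it interactively; here is the same idea as a C sketch, in case you're scripting the whole hunt):

#include <stdio.h>

/* Print the ROOTPATH entry from an onconfig file: the pathname of
 * chunk 0 of the root dbspace. Usage: ./rootpath onconfig.toru_31 */
int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror(argv[1]); return 1; }
    char line[512], path[512];
    while (fgets(line, sizeof(line), fp))
        if (sscanf(line, " ROOTPATH %511s", path) == 1)
            printf("%s\n", path);
    fclose(fp);
    return 0;
}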
If you can find these files on the disk, then you can proceed to identify where the data was stored on the disk.

Related

C program in Docker: fwrite(3) and write(2) fail to modify files on Windows but not on MacOS

I am writing a guest OS on top of Linux (Ubuntu distribution) within a Docker container. The filesystem is implemented as a single file resting inside the host OS, so anytime a file is changed in the guest OS filesystem, the file on the host OS must be opened, the correct block(s) must be overwritten, and the file must be closed.
My partner and I have developed the following recursive helper function to take in a block number and offset to abstract away all details at the block-level for higher level functions:
/**
 * Recursive procedure to write n bytes from buf to the
 * block specified by block_num. Also updates FAT to
 * reflect changes.
 *
 * @param block_num identifier for block to begin writing
 * @param buf buffer to write from
 * @param n number of bytes to write
 * @param offset number of bytes to start writing from as
 *        measured from start of file
 *
 * @returns number of bytes written
 */
int write_bytes(int block_num, const char *buf, int n, int offset) {
    BlockTuple red_tup = reduce_block_offset(block_num, offset);
    block_num = red_tup.block;
    offset = red_tup.offset;
    FILE *fp = fopen(fat->fname, "r+");
    int bytes_to_write = min(n, fat->block_size - offset);
    int write_n = max(bytes_to_write, 0);
    fseek(fp, get_block_start(block_num) + offset, SEEK_SET);
    fwrite(buf, 1, write_n, fp); // This line is returning 48 bytes written
    fclose(fp);
    // Check if there are bytes remaining
    int bytes_left = n - write_n;
    if (bytes_left > 0) {
        // Recursively write on next block
        int next_block = get_free_block();
        set_fat_entry(block_num, next_block); // point block to next block
        set_fat_entry(next_block, 0xFFFF);
        return write_bytes(next_block, buf + write_n, bytes_left,
                           max(0, offset - fat->block_size)) + write_n;
    } else {
        set_fat_entry(block_num, 0xFFFF); // mark file as terminated
        return write_n;
    }
}
The issue is that fwrite(3) is reporting 48 bytes written (when n is passed as 48) but hexdumping the file on the host OS reveals no bytes have been changed:
00000000 00 01 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000
This is particularly wacky because when my partner runs the code on the exact same commit (with no uncommitted changes), her write goes through and the file on the host OS hexdumps to:
00000000 00 01 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 66 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |f1..............|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 01 00 00 02 00 01 06 e7 36 75 63 00 00 00 00 |.........6uc....|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 e0 53 f8 88 c0 0d 37 ca 84 1f 19 b0 6c a8 68 7b |.S....7.....l.h{|
00000210 57 83 cf 13 f0 42 21 d3 21 e1 da de d4 8a f1 e6 |W....B!.!.......|
00000220 f0 12 98 fb 1c 30 4c 04 b3 16 1d 96 17 ba d7 5a |.....0L........Z|
00000230 7e f3 8a f5 6a 42 6b ef 58 f6 bc 01 db 0c 02 53 |~...jBk.X......S|
00000240 e5 10 7e f3 4a d5 3f ac 8e 38 82 c3 95 f8 11 8e |..~.J.?..8......|
00000250 a6 82 eb 3b 24 56 9a 75 44 36 8b 25 60 83 4c 04 |...;$V.uD6.%`.L.|
00000260 07 9e 14 99 9c 9f 87 3c 8a d4 c3 e8 17 60 81 0e |.......<.....`..|
00000270 bc eb 1d 35 68 fc d5 be 4f 1c 9d 5e 72 57 65 01 |...5h...O..^rWe.|
00000280 b7 43 54 26 d6 6d ba 51 bf 12 8c a1 03 d5 66 b3 |.CT&.m.Q......f.|
00000290 90 0d 60 b8 95 8d 15 bd 53 9a 70 77 4f 7a 04 1e |..`.....S.pwOz..|
000002a0 9e b2 4c 9a 79 dd de 48 cd fe 1e dc 57 7d d1 7f |..L.y..H....W}..|
000002b0 3f f5 77 96 fa e7 d7 33 33 48 ce 0a 4d 61 ab 96 |?.w....33H..Ma..|
000002c0 5f c4 88 bf c6 3a 09 37 76 c4 b8 db bc 6a 7d c0 |_....:.7v....j}.|
000002d0 c4 89 68 e7 b4 70 f8 a6 a8 00 9d c4 63 da fb 66 |..h..p......c..f|
000002e0 be d2 cd 68 1c d2 ff bf 00 e9 37 ab 6b 1a 3c f2 |...h......7.k.<.|
000002f0 7b c1 a2 c4 46 ae db 93 b4 4f 64 79 14 2a 1a d4 |{...F....Ody.*..|
00000300 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000
The 48 bytes I'm referring to that don't get written are the bytes written to the directory block running from address 00000100-0000012E (the bytes below that represent the actual file being written; the code segfaults on my end before reaching that write). It's worth noting my container can still format the filesystem file, so not all writes are broken. This snippet just represents the first write that did not work.
We are both running the code in an identical Docker container. The only difference I could imagine is that my computer is a Windows machine and hers is a Mac. What could possibly be the issue?
The very first thing I suspected was some conflict with the host OS blocking my write, but assigning and printing the return value of fwrite(3) showed that 48 bytes were indeed written on both machines.
I also expected that my buffer was simply all 0s (it is initially allocated using calloc(3)), but printing out the first 48 bytes of the buffer proved that theory false.
I finally considered that this was some issue with the higher-level interface in <stdio.h> rather than the lower-level one in <unistd.h>. I replaced fopen(3), fwrite(3), fseek(3), and fclose(3) with their lower-level equivalents (open(2), write(2), etc.) and it still turned up 48 bytes written with no actual change to the files.
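Roughly, the lower-level variant of the write path looks like this (a sketch, reusing fat and get_block_start from the code above):

#include <fcntl.h>
#include <unistd.h>

// Sketch: the same write path via the unbuffered <unistd.h>
// interface. write(2) also reported 48 bytes written.
static ssize_t write_block_raw(int block_num, const char *buf,
                               int write_n, int offset) {
    int fd = open(fat->fname, O_RDWR);
    if (fd < 0)
        return -1;
    lseek(fd, get_block_start(block_num) + offset, SEEK_SET);
    ssize_t written = write(fd, buf, write_n);
    close(fd);
    return written;
}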
EDIT:
The guest OS filesystem can be formatted with respect to user parameters. All testing above was performed with a block size of 256 bytes and 128 blocks total. I've attempted the exact same write sequence again with a block size of 1024 bytes and 16384 blocks total, and there was no error. It's still unclear why the code works on my partner's machine for both format configs and not mine, but this may narrow it down.
Running strace reveals the following excerpt around the write:
openat(AT_FDCWD, "minfs", O_RDWR) = 4
newfstatat(4, "", {st_mode=S_IFREG|0777, st_size=32768, ...}, AT_EMPTY_PATH) = 0
lseek(4, 0, SEEK_SET) = 0
read(4, "\0\1\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
write(4, "f1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48) = 48
close(4)
It again appears the bytes get written, but running hd after the program finishes reveals the same output as above. My thought was that perhaps the bytes written in the excerpt were overwritten later on, but the only write after the excerpt above in the strace is:
lseek(4, 0, SEEK_SET) = 0
read(4, "\0\1\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 320) = 320
write(4, "\0", 1) = 1
close(4)
which should be at address 320, squarely after the write at address 256 above.
It turns out the mismatch was due to undefined behavior around when changes to an mmap(2)ed region are synchronized with the underlying file. There was a section of code where a region of memory mapped via mmap(2) was modified, immediately followed by reads/writes to the file on the host OS containing the mapped region. It seems the Mac would write the changes through before the following section, while Windows wouldn't synchronize until after the fact, resulting in the mismatch.
The problem was fixed by calling msync(2) with the MS_SYNC flag immediately after modifying the mapped region, forcing the write-through behavior.
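In sketch form, the fix looks like this (illustrative values; minfs stands in for our filesystem file):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("minfs", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);

    // Map the whole filesystem image.
    char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    // Modify the mapped region (e.g. patch a directory entry).
    memcpy(map + 0x100, "f1", 2);

    // Force the dirty pages through to the file *now*, so that
    // the read(2)/write(2) calls that follow see the change.
    if (msync(map, st.st_size, MS_SYNC) == -1) { perror("msync"); return 1; }

    munmap(map, st.st_size);
    close(fd);
    return 0;
}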
Links to documentation here: mmap(2), msync(2).

How to parse this Apple-specific MPEG file header?

I have an MPEG file which starts like this:
0: 00 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 ..my_filename.mp
10: 67 00 04 fc 00 00 f0 00 b2 10 39 a8 b2 10 39 ad g.........9...9.
20: 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 67 .my_filename.mpg
30: 03 92 3b 40 00 00 00 00 03 7a b5 7c 03 7a d7 d0 ..;@.....z.|.z..
40: 00 4d 6f 6f 56 54 56 4f 44 01 00 01 2a 00 80 00 .MooVTVOD...*...
50: 00 00 00 00 36 b2 83 00 00 04 fc b2 10 39 a8 b2 ....6........9..
60: 10 39 ad 00 00 00 00 00 00 00 00 00 00 00 00 00 .9..............
70: 00 00 00 00 00 00 00 00 00 00 81 81 35 d3 00 00 ............5...
80: 00 36 b2 83 6d 64 61 74 00 00 01 ba 21 00 01 00 .6..mdat....!...
90: 05 80 2b 81 00 00 01 bb 00 0c 80 2f d9 04 e1 ff ..+......../....
a0: c0 c0 20 e0 e0 2e 00 00 01 c0 07 ea ff ff ff ff .. .............
What is the file format of the beginning of the file (first 0x80 bytes), and how do I parse it?
I've run a Google search on MooVTVOD; it looks like something related to QuickTime and iTunes.
What I understand already:
There are 4 bytes of big-endian file size in front of mdat, according to the QuickTime .mov file format when the .mov contains an MPEG.
Right after mdat there is the MPEG-PS header 00 00 01 ba. Shortly after that there is the MPEG-PES header 00 00 01 c0, indicating an audio stream.
However, the first 0x80 bytes in this file seem to be in a different file format (not QuickTime .mov, not MPEG-PS, not MPEG-PES), and in this question I'm only interested in the file format of those first 0x80 bytes.
Media players such as VLC routinely ignore junk at the beginning of the file, and start playing the MPEG-PS stream at offset 0x80. However, I'm interested in the 0x80 bytes they ignore.
The file format is the QuickTime "movie atom," which contains meta information about the media file, or the media itself. mdat is a Media Data atom.
Media atoms describe and define a track’s media type and sample data.
the spec is here: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html
You can parse it with this Python script:
https://github.com/kzahel/quicktime-parse
or this software, mentioned in the linked question:
https://archive.codeplex.com/?p=mp4explorer
The file format of the first 0x80 bytes in the question is MacBinary II.
Based on the file format description at https://github.com/mietek/theunarchiver/wiki/MacBinarySpecs, it's not MacBinary I, because MacBinary II has data[0x7a] == '\x81' and data[0x7b] == '\x81' (whereas MacBinary I has data[0x7a] == '\x00' and MacBinary III has data[0x7a] == '\x82'); and it's also not MacBinary III, because that one has data[0x66 : 0x6a] == 'mBIN'.
The CRC value data[0x7c : 0x7e] is incorrect because the filename was modified before posting to Stack Overflow. FYI, the CRC algorithm used is CRC-16/XMODEM (available on https://crccalc.com/), also implemented as CalcCRC in MacBinary.c.
The data[0x41 : 0x45] == 'MooV' is the file type code. According to the Excel spreadsheet downloadable from TCDB, it means QuickTime movie (video and maybe audio).
The data[0x45 : 0x49] == 'TVOD' is the file creator code. TCDB and this database indicate that it means QuickTime Player.
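For illustration, here's a minimal C sketch that pulls these fields out of the 128-byte header (offsets per the MacBinary specs linked above; the function name is just for this example):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Dump the interesting fields of a 128-byte MacBinary II header. */
void dump_macbinary_header(const uint8_t h[128]) {
    char name[64] = {0};
    uint8_t len = h[0x01];                 /* Pascal-style name length */
    memcpy(name, h + 0x02, len > 63 ? 63 : len);

    uint32_t data_fork = (uint32_t)h[0x53] << 24 | (uint32_t)h[0x54] << 16 |
                         (uint32_t)h[0x55] << 8 | h[0x56];  /* big-endian */

    printf("filename:  %s\n", name);
    printf("type:      %.4s\n", (const char *)(h + 0x41));  /* 'MooV' */
    printf("creator:   %.4s\n", (const char *)(h + 0x45));  /* 'TVOD' */
    printf("data fork: %u bytes\n", data_fork);
    printf("version:   MacBinary %s\n",
           memcmp(h + 0x66, "mBIN", 4) == 0 ? "III"
           : h[0x7a] == 0x81               ? "II"
                                           : "I");
}

On the dump above this reports type MooV, creator TVOD, and a data fork of 0x36b283 bytes, matching the mdat atom size at offset 0x80.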
More info and links about MacBinary: http://fileformats.archiveteam.org/wiki/MacBinary
Please note that all these headers are unnecessary to play the video: by removing the first 0x88 bytes we get an MPEG-PS video file, which many video players can play (not only QuickTime Player, and not only players running on macOS).

Visual C++, Extract text information from a DLL database (sentence mining)

The 2006 version of the free offline Chinese sentence dictionary Jukuu contains a collection of 100,000 publicly sourced example sentences in Chinese and English in a .dll file.
The application is about 80 MB, but once installed, a 500 MB DLL dictionary file is created with the source text. For whatever reason the application doesn't run on my computer, and I'd like to extract all the example sentences so I can do some POS analysis on them.
Opening the 500 MB .dll file shows mostly gibberish, except for some fragments of text here and there and references to other resources.
I'm wondering if there is any way I can extract the information in plain text?
The application can be downloaded here: http://www.jukuu.com/down/download.html
Thanks!
Edit: Never mind, it looks like when viewed in hex, the file is ordered in a way that is not conducive to sentence mining at all:
00 06 00 02 00 07 03 00 01 00 00 00 FF FE 6E 65 76 65 72 7E 73 74 61
6E 64 20 75 70 0B 00 00 80 00 00 00 00 00 00 00 00 FF FE 33 30 32
32 31 38 32 36 38 2D 00 16 00 06 00 02 00 07 03 00 01 00 00 00 FF
FE 65 78 74 72 65 6D 65 6C 79 7E 63 6C 6F 73 65 0B 00 00 80 00 00
00 00 00 00 00
Something like ÿþnever~stand upÿþ302218268-ÿþextremely~close
Any other ideas on how to mine sentences from the application? Maybe a batch script?
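One idea I'm considering is a strings(1)-style scan in C: walk the file, dump the printable run after each FF FE marker seen in the hex above, and filter the results afterwards. A rough sketch, assuming nothing about the record layout:

#include <ctype.h>
#include <stdio.h>

/* Print printable runs that follow each FF FE marker in the file.
 * Usage: ./extract dict.dll */
int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    FILE *fp = fopen(argv[1], "rb");
    if (!fp) { perror("fopen"); return 1; }

    int prev = EOF, c;
    while ((c = fgetc(fp)) != EOF) {
        if (prev == 0xFF && c == 0xFE) {
            // Collect the printable run that follows the marker.
            int ch, len = 0;
            while ((ch = fgetc(fp)) != EOF && isprint(ch)) {
                putchar(ch);
                len++;
            }
            if (len)
                putchar('\n');
            c = ch; // keep scanning from where the run ended
        }
        prev = c;
    }
    fclose(fp);
    return 0;
}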

What does the raw result format returned from Microsoft SQL Server look like?

I have been using SQL for about 10 years, but now I realised I never knew how the client actually receives and processes the data it gets from the server.
My question is: what does the result from Microsoft SQL Server actually look like in raw format? By analogy, a result from an HTTP server contains HTTP headers, including a Content-Type header to tell what the body format is (mostly HTML for web pages).
The protocol name is TDS (Tabular Data Stream).
Some documentation is available at MSDN.
Here is a very basic example of the data being transferred for the simple query select 'foo' as 'bar':
Request
Packet header (type, length, etc.)
01 01 00 5C 00 00 01 00
Packet data
16 00 00 00 - headers total length
12 00 00 00 - first header length
02 00 - type
00 00 00 00 00 00 00 01 00 00 00 00 - data
0A 00 73 00 65 00 6C 00 65 00 63 00 74 00 20 00
27 00 66 00 6F 00 6F 00 27 00 20 00 61 00 73 00
20 00 27 00 62 00 61 00 72 00 27 00 0A 00 20 00
20 00 20 00 20 00 20 00 20 00 20 00 20 00 - sql
Response
Packet header (type, length, etc.)
04 01 00 33 00 00 01 00
Packet data
columns metadata
81 - record id
01 - count
first column
00 00 00 00 00 - user type
20 00 - flags
A7 - type
03 00 - length
09 04 D0 00 34 - collation
03 - column name length
62 00 61 00 72 00 - column name bytes
rows
D1 - record id
03 00 - length
66 6F 6F - value
ending data
FD - record id
10 00 - status
C1 00 - current command
01 00 00 00 00 00 00 00 - rows total
We can also look at the parser implementation thanks to the reference sources.
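For reference, every TDS message is split into packets that start with the 8-byte header shown above. Here is a small C sketch of decoding it (field layout per the MS-TDS specification):

#include <stdint.h>
#include <stdio.h>

/* The 8-byte TDS packet header (multi-byte fields big-endian). */
struct tds_header {
    uint8_t  type;      /* 0x01 = SQL batch, 0x04 = tabular result, ... */
    uint8_t  status;    /* bit 0 set = end of message */
    uint16_t length;    /* total packet length, header included */
    uint16_t spid;      /* server process id */
    uint8_t  packet_id; /* incremented per packet, wraps at 255 */
    uint8_t  window;    /* unused, always 0 */
};

static struct tds_header parse_tds_header(const uint8_t b[8]) {
    struct tds_header h = {
        .type      = b[0],
        .status    = b[1],
        .length    = (uint16_t)(b[2] << 8 | b[3]),
        .spid      = (uint16_t)(b[4] << 8 | b[5]),
        .packet_id = b[6],
        .window    = b[7],
    };
    return h;
}

int main(void) {
    /* The request header from the example above: 01 01 00 5C 00 00 01 00 */
    const uint8_t raw[8] = {0x01, 0x01, 0x00, 0x5C, 0x00, 0x00, 0x01, 0x00};
    struct tds_header h = parse_tds_header(raw);
    printf("type=0x%02X status=0x%02X length=%u spid=%u packet=%u\n",
           h.type, h.status, h.length, h.spid, h.packet_id);
    /* -> type=0x01 (SQL batch), status=0x01 (EOM), length=92 */
    return 0;
}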

Why does SQL Management Studio output null-separated characters when saving as CSV?

and can it be configured not to happen?
I usually find myself saving the result of a query as a .csv and processing it later on my Unix machine. The characters being null-separated means I have to filter those characters out, which is a bit of a pain.
So, these are the questions:
Why is this so?
EDIT:
Because it outputs in UTF-16 by default. The easiest conversion would then be:
iconv -f utf-16 -t utf-8 origFile.csv > newFile.csv
Can it be disabled somehow? How?
Here's a piece of a hexdump of a file generated this way. Each character is followed by a null char (00):
00000cf0 36 00 36 00 32 00 0d 00 0a 00 36 00 38 00 34 00 |6.6.2.....6.8.4.|
00000d00 30 00 36 00 32 00 31 00 36 00 0d 00 0a 00 36 00 |0.6.2.1.6.....6.|
00000d10 38 00 34 00 30 00 36 00 33 00 36 00 34 00 0d 00 |8.4.0.6.3.6.4...|
00000d20 0a 00 36 00 38 00 34 00 30 00 36 00 38 00 34 00 |..6.8.4.0.6.8.4.|
00000d30 32 00 0d 00 0a 00 36 00 38 00 34 00 30 00 37 00 |2.....6.8.4.0.7.|
00000d40 30 00 32 00 31 00 0d 00 0a 00 36 00 38 00 34 00 |0.2.1.....6.8.4.|
00000d50 30 00 37 00 37 00 39 00 37 00 0d 00 0a 00 36 00 |0.7.7.9.7.....6.|
00000d60 38 00 34 00 30 00 37 00 39 00 32 00 31 00 0d 00 |8.4.0.7.9.2.1...|
00000d70 0a 00 36 00 38 00 34 00 30 00 38 00 32 00 34 00 |..6.8.4.0.8.2.4.|
00000d80 31 00 0d 00 0a 00 36 00 38 00 34 00 30 00 38 00 |1.....6.8.4.0.8.|
00000d90 36 00 36 00 31 00 0d 00 0a 00 36 00 38 00 34 00 |6.6.1.....6.8.4.|
00000da0 30 00 38 00 37 00 35 00 31 00 0d 00 0a 00 36 00 |0.8.7.5.1.....6.|
00000db0 38 00 34 00 31 00 30 00 32 00 35 00 34 00 0d 00 |8.4.1.0.2.5.4...|
00000dc0 0a 00 36 00 38 00 34 00 31 00 30 00 34 00 34 00 |..6.8.4.1.0.4.4.|
The file is being output in UTF-16 ("Unicode"), not ASCII. UTF-16 uses two bytes to represent each of these characters, hence the trailing 00s.
There might be an option to save as ANSI or ASCII, which would use 8-bit characters.
I know this is an old post... but for new visitors:
When you are saving data from Microsoft SQL Management Studio, you will notice that the Save button has a little arrow next to it. If you select the little arrow, you can choose 'Save With Encoding...', which will allow you to select the encoding you desire.
On Unix, I suggest the use of iconv -futf-16le -tutf-8 to filter your output. :-)
