Format file size as MB, GB etc the Microsoft way - string-formatting

Basically the same problem as Format file size as MB, GB etc: format the size of a file in a human-readable way. The twist: I need it to be the same algorithm that MS Windows uses in its file explorer (or else my users get confused). What algorithm does MS use?
Note: it's not the one from the answers to the referenced question.

I fear I have found the answer: https://blogs.msdn.com/b/oldnewthing/archive/2011/03/15/10140985.aspx
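For what it's worth, if the goal is simply to match Explorer, one option is to skip reimplementing the algorithm and call the shell's own formatting routine. A minimal sketch in C (Windows only; StrFormatByteSizeW from shlwapi.h is documented to produce strings like "2.39 MB" - whether it matches Explorer in every corner case is exactly what the linked article discusses):

    /* Minimal sketch: format byte counts with the shell's own routine
     * (shlwapi.h, link with shlwapi.lib) instead of reimplementing it. */
    #include <windows.h>
    #include <shlwapi.h>
    #include <stdio.h>

    #pragma comment(lib, "shlwapi.lib")

    int main(void)
    {
        LONGLONG sizes[] = { 532LL, 1340LL, 23506LL, 2400016LL, 2400000000LL };
        WCHAR buf[32];

        for (int i = 0; i < (int)(sizeof sizes / sizeof sizes[0]); i++) {
            if (StrFormatByteSizeW(sizes[i], buf, 32) != NULL)
                wprintf(L"%lld -> %ls\n", sizes[i], buf);
        }
        return 0;
    }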

Related

Streaming larger files in java

We are streaming files from the server in zip format and writing them into an Oracle BLOB object using piped streams. It works fine for me up to about 300MB, but I have a requirement to store data greater than 2GB. When I tried to store 1GB of data, it failed. Please suggest a better way to stream larger files in Java.
-- Thanks in advance
If your code fails around 300MB, you have almost certainly written faulty code. My guess is your JVM heap size is set to ~512MB and you only have ~300MB of free memory for your own purposes -- which is more than enough: just stream your file in small chunks (maybe 1KiB, or even 1MiB if you want) and you'll be good to go:
https://stackoverflow.com/a/55788/351861
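The chunked-copy idea the answer describes is language-agnostic; here is a minimal sketch of the same fixed-buffer read/write loop in C (in Java it would be the equivalent loop over an InputStream/OutputStream; the file names are placeholders):

    /* Minimal sketch of chunked streaming: copy an arbitrarily large
     * input through a small fixed buffer, so memory use stays constant
     * regardless of file size. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *in  = fopen("big_input.zip", "rb");   /* hypothetical names */
        FILE *out = fopen("copy.zip", "wb");
        if (!in || !out) { perror("fopen"); return EXIT_FAILURE; }

        static unsigned char buf[1 << 20];          /* 1 MiB chunk */
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                perror("fwrite");
                return EXIT_FAILURE;
            }
        }

        fclose(in);
        fclose(out);
        return 0;
    }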

How to truncate a file in FAT32 file system without zero padding in C?

I have a 2TB HDD containing a single FAT32 partition. When truncating a file to a larger size, say 100MB or 200MB, using ftruncate(), it takes 5 to 10 seconds to do the zero padding. Is there a way to truncate a file that takes less time, or that skips the zero padding?
No. According to the GNU C library documentation, ftruncate() specifically adds zeros at the end when extending a file. It also says you could try plain truncate(), which will attempt to fill the extension with holes instead, but this isn't supported on all systems, so you might not be able to rely on it.
You could try to hack the file system by opening the inode and changing the size value to make the file system think the file is larger, but this is a security issue: the 'new' bytes would expose whatever stale data was already on disk.
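For reference, the call under discussion - a minimal sketch of extending a file with POSIX ftruncate() (the file name is a placeholder; FAT32 has no sparse-file support, so the new region really is zeroed on disk, which is where the 5 to 10 seconds go):

    /* Sketch: extend a file to 100 MB with ftruncate(). On file systems
     * without sparse-file support (like FAT32) the new region must be
     * physically zeroed, which is why this can take seconds. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (ftruncate(fd, 100L * 1024 * 1024) != 0)  /* grow to 100 MB */
            perror("ftruncate");

        close(fd);
        return 0;
    }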

What are possible values for the FileSystemName string that GetVolumeInformation returns?

The MSDN documentation and the knowledge base article for GetVolumeInformation are not very specific about what the file system name string can contain.
The obvious values are NTFS, CDFS and FAT32. But can it also detect other file systems, and what would those strings be?
I also read somewhere that version numbers are sometimes included in the string. Any idea regarding this? I don't remember the specifics anymore. :(
Thanks for your help!
This function can detect the following file systems:
FAT, FAT32, NTFS, HPFS, CDFS, UDF, NWFS
As far as I remember from my experiments about 3 years ago, ext2 and ext3 were not detectable at all on Windows XP SP3.
Edit
Since Vista SP2, there is also support for exFAT.
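A minimal sketch of querying the string yourself, in C against the Win32 API (the volume root "C:\\" is a placeholder):

    /* Sketch: print the file system name string GetVolumeInformation
     * returns for a volume root (e.g. "NTFS", "FAT32", "exFAT", "CDFS"). */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char  fsName[MAX_PATH + 1];
        DWORD serial, maxComponentLen, fsFlags;

        if (GetVolumeInformationA("C:\\",
                                  NULL, 0,        /* volume label not needed */
                                  &serial, &maxComponentLen, &fsFlags,
                                  fsName, sizeof fsName)) {
            printf("File system: %s\n", fsName);
        } else {
            printf("GetVolumeInformation failed: %lu\n", GetLastError());
        }
        return 0;
    }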

Help required with ancient, unknown storage system

Morning all,
I've gone and told a customer I could migrate some of their old data out of a DOS-based system into the new system I've developed for them. However, I said that without actually looking at the files that store the data in the old system - I just figured a quick Google search would solve the problem for me... I was wrong!
Anyway, this program has a folder with hundreds... well, 800 files with all sorts of file extensions: .ave, .bak, .brw, .dat, .001, .002 through .007, .dbf, .dbe and .his.
.bak obviously isn't a SQL backup file.
Does anyone have programming experience with any of those file types who might be able to point me towards some way to read and extract the data?
I can't mention the program name, because I don't think the original developer would allow it...
Thanks.
I'm willing to bet that the .dbf file is in DBase format, which is really straightforward. The contents of that might provide clues to the rest of them.
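If it helps, the dBASE header layout is published and easy to check by hand. A minimal sketch in C that reads the fixed 32-byte .dbf header (the file name is a placeholder; offsets follow the dBASE III spec):

    /* Sketch: read the fixed 32-byte dBASE (.dbf) header. Per the
     * published dBASE III layout: u32 record count at offset 4, u16
     * header and record lengths at offsets 8 and 10 (little-endian). */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("data.dbf", "rb");
        unsigned char h[32];

        if (!f || fread(h, 1, 32, f) != 32) { perror("dbf"); return 1; }

        unsigned long nrecs = h[4] | (h[5] << 8)
                            | ((unsigned long)h[6] << 16)
                            | ((unsigned long)h[7] << 24);
        unsigned hdrlen = h[8]  | (h[9]  << 8);
        unsigned reclen = h[10] | (h[11] << 8);

        printf("version byte: 0x%02x\n", h[0]);
        printf("last update : %02d-%02d-%02d (YY-MM-DD)\n", h[1], h[2], h[3]);
        printf("records     : %lu, header %u bytes, record %u bytes\n",
               nrecs, hdrlen, reclen);
        fclose(f);
        return 0;
    }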
The unix 'file' utility can be used to recognize many file types by their 'magic number'. It examines the file's contents and compares them with thousands of known formats. If the files are in any kind of common format, this can probably save you a good amount of work.
If they're NOT in a common format, it may send you chasing after red herrings. Take its suggestions as just that: suggestions.
To complement the sites suggested by Greg and Dmitriy, there's also the repository of file formats at http://www.wotsit.org ("What's its format?").
If that doesn't help, a good hex editor (with dump display) is your friend... I've always found it amazing how easy it can be to read and recognize many file formats.
Could be anything. Your best bet is to open it with a hex editor and see what you can see.
Most older systems used a basic ISAM, which had one file per table containing a set of fixed-length data records. The other files would probably be indexes.
As you only need the data, not the indexes, just look for the files with repeating data patterns (it often looks like pretty patterns on the hex editor screen).
When you find the file with the data, try to locate a known record, e.g. "Mr Smith", and see if you can work out the other fields. Integers are often stored byte for byte, dates are often encoded as days from a known start date, and money could be in BCD.
If you see a strong pattern, then most likely each record is a fixed length. There will probably be a header block on the file, say 128 or 256 bytes, followed by the fixed-length records.
Many old systems were written in COBOL. There is plenty of info on the net about COBOL formats, and some companies even sell COBOL ODBC drivers!
I think Greg is right about the .dbf file. You should try to find information about the other file formats using sites like http://filext.com and http://dotwhat.net. The .bak file is usually a copy of another file with the same name but a different extension. For example, there may be a database.dbf file and a database.bak file containing a backup of it. You should ask your customer (if possible) for any details/documentation/source code of the application that used these files.
Back in the DOS days, programmers used to make up their own file extensions pretty much as they saw fit. The DBF might well be a DBase file, which is easy enough to read, and the .BAK is probably a backup of one of the other important files, or just a backup left by a text editor.
For the remaining files, the first thing I would do is check whether they are in a readable ASCII format by opening them in a text editor.
If this doesn't give you a good result, try opening them in a binary editor that shows hex and ASCII side by side, with control characters blanked out. Look for repeating patterns that might correspond to record fields. For example, if the .HIS file were something like an order history file, it might contain embedded product codes or names. If this is the case, count the number of bytes between such fields. If it is a regular number, you probably have a flat binary file of records. This is best decoded by opening the file in the app, looking at the values in a given record, and searching for the corresponding values in the binary file. Time-consuming, and a pain in the ass, but workable enough once you get the hang of it.
Happy hacking!
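If you don't have a hex editor to hand, a side-by-side hex/ASCII dump is only a few lines of C - a throwaway sketch for eyeballing record patterns (the default file name is a placeholder):

    /* Throwaway sketch: dump a file 16 bytes per line, hex on the left,
     * printable ASCII on the right (other bytes shown as '.'). Enough to
     * spot fixed-length record patterns by eye. */
    #include <ctype.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *f = fopen(argc > 1 ? argv[1] : "mystery.dat", "rb");
        if (!f) { perror("fopen"); return 1; }

        unsigned char buf[16];
        size_t n;
        long off = 0;

        while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
            printf("%08lx  ", off);
            for (size_t i = 0; i < 16; i++) {
                if (i < n) printf("%02x ", buf[i]);
                else       printf("   ");
            }
            printf(" ");
            for (size_t i = 0; i < n; i++)
                putchar(isprint(buf[i]) ? buf[i] : '.');
            putchar('\n');
            off += (long)n;
        }
        fclose(f);
        return 0;
    }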
.DBF is a dBASE or early FoxPro database.
.DAT was used by Btrieve, and IIRC Paradox for DOS.
The .DBE and .00x files are probably either temporary or index files related to the .DAT files.
.DBF is easy. They'll open with MS Access or Excel (pre-2007 versions of Office, anyway), or with ADO or ODBC.
If the .DAT files are indeed Btrieve, you're in a world of hurt. They're a mess, even if you can get your hands on the right version of the data dictionary and a copy of the Btrieve structure. (Been there, done that, wore out the t-shirt before I got done.)
As others have suggested, I recommend a hex editor if you can't figure out what those files are and that dbf is probably Dbase.
BAK seems to be a backup file. I'm thinking that *.001, *.002, etc might be part of the backup. Are they all the same size? Maybe the backup was broken up into smaller pieces so that it could fit onto removable media?
Finally, take this as a life lesson. Before sending that Statement of Work over, if the customer asks you to import data from System A to System B, always ask for the sample schema, sample data and sample files. Lots of times, things that seem straightforward end up being nightmares.
Good luck!
Be sure to use the Modified date on the files as a clue: if the .001, .002, etc. all have similar timestamps, maybe along with the .BAK, they might be part of the backup. Also, there may be some old cruft in the directory you can (somewhat safely) ignore. Look for .BAT files and try to dissect them as well.
One hint: if the .dbf files are DBase, FoxPro, or one of the other products that used that format, then you may be able to read them using ODBC. My system still has the ODBC driver for .dbf (Vista, with VS 2008 - how it got there I'd have to hunt up, but I'd guess it was MDAC, Microsoft Data Access, that put it there). So you may not have a "world of hurt" ahead, if the ODBC driver will read the .dbf files.
I seem to remember (with little confidence, from DBase III tinkering 20+ years ago) that DBase used .001, .002, ... files for memo (big text) fields.
Good luck trying to salvage the data.
The DBF format is fairly common.
The other files are puzzling.
I'm guessing that either you're dealing with old Btrieve files (bad), or (hopefully) with the results of some ill-conceived backup scheme where someone backed up his database into the same directory rather than onto another drive, in which case you could ignore these.
It's now part of Pervasive, but years ago I used Data Junction to migrate data between lots of file types. Have a look, unless you want to write a parser.
.dat can also be old Clarion 2.1 files... It works on an ISAM basis too, with key/index files.

Dataset size limit as an xml file

We are currently using a DataSet to load and save our data to an XML file, and there is a good possibility that the XML file could get very large.
We are wondering whether there is any limit on the size of an XML file, so that the DataSet would not run into issues in the future due to its size. Please advise.
Thanks
N
Well, the OS's maximum file size is one thing to consider (although modern OSes won't have this problem); old OSes supported only 2 GB per file, if I recall correctly.
Also, the time you will need to waste on updating the file is enormous.
If you're going for a very, very large file, use a small DB instead (MySQL, SQL Server Express, or SQLite).
DataSets are stored in memory, so the limit should be somewhere around the amount of memory the OS can address for user processes.
While doing work for a prior client, reading and parsing large XML files of over 2 GB each, the system choked when trying to use an XML reader. While working with Microsoft, we were ultimately passed on to the person who wrote the XML engine. His recommendation was for us to read and process the file in smaller chunks; it couldn't handle loading the entire thing into memory at one time. However, if you are trying to WRITE XML as a stream to a final output .XML file, you should be good to go on most current OSes, which support files over 2 GB.
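To illustrate the "write as a stream" point (the thread shows no code, so this is a sketch with invented element names): emitting a large XML document record by record keeps memory use constant no matter how big the output file grows. In C:

    /* Sketch: stream a large XML document to disk record by record.
     * Nothing is buffered beyond one record; the <rows>/<row> element
     * names, the file name, and the row count are all made up. */
    #include <stdio.h>

    int main(void)
    {
        FILE *out = fopen("huge.xml", "wb");   /* placeholder name */
        if (!out) { perror("fopen"); return 1; }

        fputs("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rows>\n", out);
        for (long i = 0; i < 10000000L; i++)   /* ~10M records, adjust */
            fprintf(out, "  <row id=\"%ld\" value=\"%ld\"/>\n", i, i * 2);
        fputs("</rows>\n", out);

        fclose(out);
        return 0;
    }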
