I should preface this by saying I'm working on a Pocket PC app and the data files live on SD cards.
I have an app that has to create an array of size x.
malloc is failing every time.
I've got a 1 GB file on a 4 GB card.
I've got 64 MB of onboard memory (shared between RAM, data storage, applications, and the OS).
I can't process the data because the array I need is too big.
Accessing the SD card is almost as fast as RAM.
I'm working in C++ (MFC).
What's the best way to access the file I'm going to use as an array?
Or would there be a different way to do this?
You should create a file large enough for the array, suitably padded (according to GetSystemInfo), and then map the file with CreateFileMapping/MapViewOfFile.
At least, that would be my first try - there might be restrictions on how large a mapped file can be on CE.
You'd need to create a window of n records (that will fit in memory) and move that window so as to keep the record(s) you're working on inside it. I'm not fluent enough in MFC to give you a code sample, but it wouldn't be all that hard.
In C#, I'd write a custom IEnumerable<T>.
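For what it's worth, the mapping-plus-window idea might look roughly like this in plain Win32 C (not MFC-specific, untested on CE; the record layout, window size, and file path are placeholders, and error handling is omitted):

```c
/* Rough Win32 sketch of the "map a window of the file" idea, not MFC-specific
   and untested on CE. The record layout, window size and file path are just
   placeholders; error handling is omitted. */
#include <windows.h>

typedef struct { float values[256]; } Record;      /* hypothetical record layout */
#define RECORDS_PER_WINDOW 4096                    /* tune so the window fits in RAM */

int main(void)
{
    HANDLE file = CreateFileW(L"\\Storage Card\\data.bin",
                              GENERIC_READ | GENERIC_WRITE, 0, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE mapping = CreateFileMapping(file, NULL, PAGE_READWRITE, 0, 0, NULL);

    SYSTEM_INFO si;
    GetSystemInfo(&si);                             /* view offsets must be a multiple */
    DWORD granularity = si.dwAllocationGranularity; /* of the allocation granularity   */

    /* Map a window that starts at the (aligned) byte offset of record 100000. */
    ULONGLONG byteOffset = 100000ULL * sizeof(Record);
    ULONGLONG alignedOffset = byteOffset - (byteOffset % granularity);
    SIZE_T windowBytes = (SIZE_T)(RECORDS_PER_WINDOW * sizeof(Record)
                                  + (byteOffset - alignedOffset));

    BYTE *view = (BYTE *)MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS,
                                       (DWORD)(alignedOffset >> 32),
                                       (DWORD)(alignedOffset & 0xFFFFFFFF),
                                       windowBytes);
    Record *records = (Record *)(view + (byteOffset - alignedOffset));
    records[0].values[0] = 1.0f;                    /* work on records in the window */

    UnmapViewOfFile(view);      /* to move the window: unmap, then map a new offset */
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```

Moving the window is then just UnmapViewOfFile followed by another MapViewOfFile at a different (granularity-aligned) offset.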
I have five hundred XML files and I want to store their contents in an array or list, then have LoadRunner read from that array and send the XML files to the application server. But I am not familiar with C.
example:
Result01.xml,Result02.xml,Result03.xml,Result4.xml,Result05.xml,....Result500.xml,
Thanks!
CPU, Disk, Memory, Network. This is your finite resource pool. Attempting, for every single virtual user, to pull all of the XML files into memory is going to drive the memory requirements for each virtual user through the roof, and you will very likely wind up in a swap-of-death situation on your load generator.
Consider storing the names of your files and their location (on a very fast SSD local to the load generator) in a parameter file. Select a filename from that file randomly, read the file from disk, and then submit it as appropriate. This limits your in-memory need to the size of your largest XML file, which you can free() as soon as you are done using it. This does introduce a disk dependency, but note it is a read-only dependency, and the recommendation to use an SSD for the storage is because of the absurdly high read IOPS on that medium, which keeps the window of conflict to an absolute minimum.
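The per-iteration work then boils down to: read the one file named by the parameter, submit it, free it. LoadRunner scripts are C, so a rough sketch of that shape might look like the following (the parameter lookup, e.g. via lr_eval_string, and the actual web_* submit call are left out as placeholders; error handling is minimal):

```c
/* Rough C-style sketch: read one XML file into memory, use it, then free it.
   In a LoadRunner script the filename would come from the parameter file,
   and "send to the server" would be whatever web_* call your script already
   uses; both are left as placeholders here. */
#include <stdio.h>
#include <stdlib.h>

static char *read_whole_file(const char *path, long *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);

    char *buf = (char *)malloc((size_t)len + 1);
    if (buf != NULL) {
        fread(buf, 1, (size_t)len, f);   /* read the whole file in one go */
        buf[len] = '\0';
        if (out_len) *out_len = len;
    }
    fclose(f);
    return buf;
}

int main(void)
{
    const char *path = "Result01.xml";   /* would be the parameterised filename */
    long len = 0;
    char *xml = read_whole_file(path, &len);
    if (xml != NULL) {
        /* ... hand xml/len to the request that posts it to the server ... */
        free(xml);                       /* release as soon as the file is sent */
    }
    return 0;
}
```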
This is also a good time to brush up on your C programming. There are lots of great books out there. You need to be proficient with the language of your tool, no matter what the tool is, if you are going to be effective with the tool.
I was thinking recently, whenever I use a disc, I use it by either burning an image onto it, or by formatting it and using it like a USB. I never used it as a raw storage medium to poke bytes into/read bytes from.
I am now curious if it is possible to use a DVD as a blob of binary data that I can write bits onto as I please.
From what I understand, it is trivial to write to a DVD using C if I format it, so that I can interface with it much like a typical C or D drive (I can even change the drive letter to C or D if I want to).
I'm curious if I can do the same without formatting the disc, so that the only bits on it are the ones that I write, or the default ones.
To summarize, I want to be able to perform the following operations on an unformatted DVD-RW:
read a bunch of bytes at an offset into an in-memory byte pool
overwrite a bunch of bytes at an offset from an in-memory byte pool, without affecting other bytes on the disc
How can this be accomplished?
Thanks ahead of time.
On Linux, you can just open the block device and do sufficiently aligned writes:
Documentation/cdrom/packet-writing.txt in the kernel sources
You only need to format the media as DVD+RW once, using dvd+rw-format. This is a relatively simple procedure, so you could extract it from the source code of that tool.
However, according to the kernel documentation, what is a “sufficiently aligned write” is somewhat up to interpretation—the spec says 2 KiB, but some drives require more alignment. There is also no wear leveling or sector remapping at this layer, so good results really require that you use on-disk data structures which reflect that this technology is closer in reality to write-once rather than truly random access.
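As an illustration, a minimal read-modify-write of a single 2 KiB block might look like this (assuming the media was already prepared with dvd+rw-format, and guessing that the drive shows up as /dev/sr0; the device path and offset are placeholders):

```c
/* Minimal sketch: 2 KiB-aligned read/write on the DVD block device.
   Assumes the media was prepared once with dvd+rw-format and that the
   drive shows up as /dev/sr0 (device path is an assumption).
   Some drives need a larger alignment than 2 KiB. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 2048               /* 2 KiB logical block size */

int main(void)
{
    int fd = open("/dev/sr0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) return 1;

    off_t offset = 100 * (off_t)BLOCK;           /* must be block-aligned */

    /* Read-modify-write one block so the other bytes in it are preserved. */
    if (pread(fd, buf, BLOCK, offset) != BLOCK) { perror("pread"); return 1; }
    memcpy(buf, "hello", 5);
    if (pwrite(fd, buf, BLOCK, offset) != BLOCK) { perror("pwrite"); return 1; }

    fsync(fd);
    close(fd);
    free(buf);
    return 0;
}
```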
I am working on a project where I am using words encoded as vectors, each about 2000 floats long. When I process raw text I need to retrieve the vector for each word as I come across it and do some computations with it. Needless to say, for a large vocabulary (~100k words) this has a large storage requirement (about 8 GB in a text file).
I initially had a system where I split the large text file into smaller ones and then, for a particular word, read its file and retrieved its vector. This was too slow, as you might imagine.
I next tried reading everything into RAM (about 40 GB of RAM), figuring that once everything was read in it would be quite fast. However, it takes a long time to read in, and a disadvantage is that I can only use certain machines which have enough free RAM to do this. However, once the data is loaded, it is much faster than the other approach.
I was wondering how a database would compare with these approaches. Retrieval would be slower than the RAM approach, but there wouldn't be the memory overhead. Also, any other ideas would be welcome, and I have had some myself (i.e. caching, using a server that has everything loaded into RAM, etc.). I might benchmark a database, but I thought I would post here to see what others had to say.
Thanks!
UPDATE
I used Tyler's suggestion, although in my case I did not think a B-tree was necessary. I just hashed the words and their offsets. I could then look up a word and read in its vector at runtime. I cached the words as they occurred in the text, so each vector is read in at most once; this also avoids the overhead of reading in and storing unneeded words, making it superior to the RAM approach.
Just an FYI, I used Java's RandomAccessFile class and made use of the readLine(), getFilePointer(), and seek() functions.
Thanks to all who contributed to this thread.
UPDATE 2
For a further performance improvement, check out the buffered RandomAccessFile from:
http://minddumped.blogspot.com/2009/01/buffered-javaiorandomaccessfile.html
Apparently readLine() from RandomAccessFile is very slow because it reads byte by byte. This gave me a nice improvement.
As a rule, anything custom coded should be much faster than a generic database, assuming you have coded it efficiently.
There are specific C libraries that solve this problem using B-trees. In the old days there was a famous library called "Btrieve" that was very popular because it was fast. In this application a B-tree will be faster and easier than fooling around with a database.
If you want optimal performance you would use a data structure called a suffix tree. There are libraries which are designed to create and use suffix trees. This will give you the fastest word lookup possible.
In either case there is no reason to store the entire dataset in memory; just keep the B-tree (or suffix tree) in memory, with offsets into the data file. This will require about 3 to 5 megabytes of memory. When you query the tree you get an offset back. Then open the file, seek forward to the offset, and read the vector off disk.
You could use a simple text-based index file that just maps the words to indices, and another file that just contains the raw vector data for each word. Initially you read the index into a hashmap that maps each word to its data-file index and keep it in memory. If you need the data for a word, you calculate its offset in the data file (2000 * 4 * index bytes, for 32-bit floats) and read it as needed. You probably want to cache this data in RAM (if you are in Java, perhaps just use a weak map as a starting point).
This is basically implementing your own primitive database, but it may still be preferable because it avoids database setup/deployment complexity.
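A rough C sketch of the offset arithmetic and the read (the word-to-index lookup itself is left out, "vectors.bin" is a made-up file name, and each record is assumed to be 2000 32-bit floats as in the question):

```c
/* Sketch of the index-plus-fixed-size-records idea. "vectors.bin" is a made-up
   file name; each record is 2000 32-bit floats (8000 bytes) as in the question.
   How you map a word to its integer index (hash map, B-tree, sorted list) is
   left out - this just shows the offset arithmetic and the read. */
#include <stdio.h>
#include <stdlib.h>

#define DIM 2000                          /* floats per word vector */

/* Read the vector stored at position word_index in the data file. */
static int read_vector(FILE *data, long word_index, float *out /* DIM floats */)
{
    long offset = word_index * DIM * (long)sizeof(float);   /* 8000 * index bytes */
    if (fseek(data, offset, SEEK_SET) != 0) return -1;
    if (fread(out, sizeof(float), DIM, data) != DIM) return -1;
    return 0;
}

int main(void)
{
    FILE *data = fopen("vectors.bin", "rb");
    float *vec = (float *)malloc(DIM * sizeof(float));
    if (data != NULL && vec != NULL) {
        /* word_index would come from your in-memory word -> index lookup */
        if (read_vector(data, 42, vec) == 0) {
            /* ... use vec; cache it if the word is likely to recur ... */
        }
    }
    free(vec);
    if (data) fclose(data);
    return 0;
}
```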
I am looking for a lightweight embedded database to store (and rarely modify) a few kilobytes of data (5kb to 100kb) in Java applications (mostly Android but also other platforms).
Expected characteristics:
fast when reading, but not necessarily fast when writing
almost no size overhead (kilobytes used even when there is no data), but not necessarily very compact (kilobytes used per kilobyte of actual data)
very small database client library JAR file size
Open Source
QUESTION: Is there a database format specialized for those tiny cases?
Text-based solutions acceptable too.
If relevant: it will be this kind of data.
Stuff it in an object and serialize it out to a file. Write the new file on save, rename it on top of the old one to "commit" it so you don't have to worry about corrupting it if the write fails. No DB, no nothing. Simple.
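The question is about Java, but the write-then-rename trick is language-agnostic; here is a minimal C-flavoured sketch (file names are placeholders; note that rename() atomically replaces an existing file on POSIX systems such as Android, while on Windows you would need something like MoveFileEx):

```c
/* Minimal write-new-file-then-rename sketch; file names are placeholders and
   the "serialized object" is just a string here. rename() atomically replaces
   an existing file on POSIX systems (including Android). */
#include <stdio.h>
#include <string.h>

static int save_atomically(const char *path, const char *bytes, size_t len)
{
    char tmp[512];
    FILE *out;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    out = fopen(tmp, "wb");
    if (out == NULL) return -1;
    if (fwrite(bytes, 1, len, out) != len || fclose(out) != 0) {
        remove(tmp);                /* write failed; the old file is untouched */
        return -1;
    }

    /* Rename on top of the old file to "commit" the save. */
    return rename(tmp, path) == 0 ? 0 : -1;
}

int main(void)
{
    const char *data = "key=value\n";   /* stands in for the serialized object */
    return save_atomically("settings.dat", data, strlen(data));
}
```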
If you can use flat (text) files, you could keep the file on disk and read/seek around in it, never reading it all in at once. If you need a faster lookup, you could build an index that maps keys to record numbers, use the index to find the right "rows", and use the record number either as an offset into a fixed-size-record data file or as a line number in a text file.
I don't know about Java and that static initializer message, but that sounds to me like a code size limit, not a data limit. Why would runtime data affect the bytecode?
Can't suggest specific libraries. Maybe there's some Berkeley DB, DSV or xBase style library around.
We are currently using a DataSet for loading and saving our data to an XML file, and there is a good possibility that the XML file could get very large.
We are wondering if there is any limit on the size of an XML file, so that the DataSet would not run into any issues in the future due to its size. Please advise.
Thanks
N
Well, the OS maximum file size is one thing to consider
(although modern OSes won't have this problem).
Older OSes supported only 2 GB per file, if I recall correctly.
Also, the time you will need to waste on updating the file is enormous.
If you're going for a very, very large file, use a small DB instead (MySQL, SQL Server Express, or SQLite).
DataSets are stored in memory, so the limit should be somewhere around the amount of memory the OS can address for a user process.
While doing work for a prior client, reading and parsing large XML files of over 2 GB each, the system choked while trying to use an XML reader. While working with Microsoft, we ultimately were passed on to the person who wrote the XML engine. His recommendation was to read and process the file in smaller chunks; it couldn't handle loading the entire thing into memory at one time. However, if you are trying to WRITE XML as a stream to a final output .XML file, you should be good to go on most current OSes, which support files over 2 GB.