So, for example, let's say I wanted to set up a SQLite database that contains some data on invoices. For simplicity, let's say each invoice has a date, an invoice number, and a company associated with it. Is there a good way for the database to access or store a PDF file (~300-700 KB per file) of the specified invoice? If this wouldn't work, are there any alternative ideas on what might work well?
Any help is greatly appreciated
You could store each file as a BLOB, which is a byte array/stream, so the file is stored in the database exactly as it is.
However, it may be more efficient (see linked article) to just store the path to the file, or perhaps just the file name (depending upon standards) and then use that to retrieve and view the invoice.
Up to around 100 KB, it can be more efficient to store files as BLOBs. You may find this document useful to read: "SQLite 35% Faster Than The Filesystem".
SQLite does support a BLOB data type, which stores data exactly as it is entered. From the documentation:
The current implementation will only support a string or BLOB length up to 2^31-1 or 2147483647
This limit is much larger than your expected need of 300-700 KB per file, so what you want should be possible. The other thing to consider is the size of your database. Unless you expect to have well north of around 100 TB, then the database size limit also should not pose a problem.
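As a rough illustration of the BLOB approach described above, here is a minimal Python sketch using the standard-library sqlite3 module. The table name `invoices`, its column names, and the helper functions are assumptions made for this example, not something from the question.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: the three invoice fields from the question plus the PDF bytes.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS invoices (
            invoice_number TEXT PRIMARY KEY,
            invoice_date   TEXT NOT NULL,
            company        TEXT NOT NULL,
            pdf            BLOB NOT NULL
        )
        """
    )


def store_invoice(conn, invoice_number, invoice_date, company, pdf_bytes):
    # A Python bytes object bound to a parameter is stored as a BLOB, unchanged.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO invoices (invoice_number, invoice_date, company, pdf) "
            "VALUES (?, ?, ?, ?)",
            (invoice_number, invoice_date, company, pdf_bytes),
        )


def load_invoice_pdf(conn, invoice_number):
    # Returns the stored bytes, or None if there is no such invoice.
    row = conn.execute(
        "SELECT pdf FROM invoices WHERE invoice_number = ?",
        (invoice_number,),
    ).fetchone()
    return None if row is None else bytes(row[0])


# Demo with an in-memory database and placeholder bytes standing in for a real PDF.
conn = sqlite3.connect(":memory:")
create_schema(conn)
store_invoice(conn, "INV-0001", "2024-01-31", "Example Co", b"%PDF-1.4 placeholder")
pdf = load_invoice_pdf(conn, "INV-0001")
```

In a real application the bytes would come from reading the PDF in binary mode, for example `open(path, "rb").read()`. To store a path instead, the `pdf` column could simply become a TEXT column holding the file location.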
Related
I have an application which produces a large amount of data that is all written once and is then unchangeable (by law), and is rarely ever read. When it is read, it is always read in its entirety, as in, all the data for 2012 is read in one shot, and either processed for reporting or output in a different format for export (or, gasp, printed). The only way to access the data is to access an entire day's worth of data, or more than one day.
This data is easily represented as either two or three relational tables, or as a long list of self-contained documents.
What is the most storage-space-efficient way to store such data in a file system? Specifically, we're thinking of using Amazon S3 (File storage) for storage, though we could use something like RDS (their version of MySQL).
My current best bet is a gzipped file with JSON data for the entire day, one file per day.
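For reference, the one-gzipped-JSON-file-per-day idea could look roughly like the following Python sketch. The file naming scheme (`YYYY-MM-DD.json.gz`), the function names, and the record fields are assumptions for illustration only; uploading the resulting file to S3 is left out.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_day(directory, day, records):
    # Write all records for one day into a single gzip-compressed JSON file.
    path = Path(directory) / f"{day}.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        # Compact separators avoid spending bytes on whitespace before compression.
        json.dump(records, fh, separators=(",", ":"))
    return path


def read_day(path):
    # Read a whole day back in one shot, matching the stated access pattern.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Demo in a temporary directory with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
    day_file = write_day(tmp, "2012-01-01", sample)
    restored = read_day(day_file)
```

Because the data is write-once and always read a full day at a time, there is no need for random access inside a file, which is what makes whole-file compression a reasonable fit here.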
Unless my data was pure ASCII (and even if it was), I would probably choose a binary storage method like one of
BSON
Protocol Buffers
Bencode
I would use Windows Azure's Table Storage because it allows heterogeneous structured data to be stored in a single table. Having database-like storage will allow you to append data as needed. You can easily create a new table for each year.
Assume one has 100K+ plaintext files. Each file has some structured information associated with it. Files are likely to be retrieved by describing those properties. That is, I have a file important_file and an array with (mandatory) values filled in: {property0: value0, ..., propertyN: valueN}. Each of those fields is filled in before the file is added to the collection, so at any moment thereafter I can describe that file with those values.
The question is: is it better to store the files within a DB (size is guaranteed to be <= 5 MB, and is ~500 KB in 99% of cases) or directly in the FS? Should I look at a document-oriented solution (like MongoDB) in case the answer is "DB"?
Links to similar cases are appreciated.
If you are using Oracle, storing files outside the database has no benefits, according to Tom Kyte.
I suspect other modern DBMSes behave similarly. Even if some of them don't, consider very carefully whether it's worth trading the data integrity (guaranteed by the database) for performance...
I am storing image files (like jpg, png) in a PostgreSQL database. I found information on how to do that here.
Likewise, I want to store videos in a PostgreSQL database. I searched the net - some say one should use a data type such as bytea to store binary data.
Can you tell me how to use a bytea column to store videos?
I would generally recommend against storing huge blobs (binary large objects) inside PostgreSQL unless referential integrity is your paramount requirement. Storing huge files in the filesystem is much more efficient:
Much faster, less disk space used, easier backups.
I have written a more comprehensive assessment of the options you've got in a previous answer to a similar question. (With deep links to the manual.)
We did some tests on the practical limits of the bytea data type. The theoretical limit is 1 GB, but the practical limit is about 20 MB. Processing larger bytea data eats too much RAM, and encoding and decoding take some time too. Personally, I don't think storing videos in the database is a good idea, but if you need to, then use large objects (blobs).
Without knowing what programming language you are using, I can only give a general approach:
Create a table with a column of type 'bytea'.
Get the contents of the video file into a variable.
Insert a row into that table with that variable as the data for the bytea column.
I am searching for a key-value store that can handle values with a size of several gigabytes. I have had a look at Riak, Redis, CouchDB, and MongoDB.
I want to store a user's workspace (equivalent to a directory in a filesystem, recursively with subdirectories and files in it) in this DB. Of course I could use the file system, but then I don't have features such as caching in RAM, failover, backup, and replication/clustering that are supported by Redis, for instance.
This implies that most of the values saved will be binary data, possibly several gigabytes in size, as one file in a workspace is mapped to one key-value tuple.
Does anyone have experience with any of these products?
First off, computing an MD5 or CRC32 over gigabytes of data is going to be painfully expensive computationally. It is probably better to avoid that. How about storing the data in a file and indexing the filename?
If you insist, though, my suggestion is still to store just the hash, not the entire data value, with a lookup array/table pointing to the final data location. The safety of this approach (the chance of two different values sharing a hash) depends on the number of large samples: the more samples, the more likely a collision. The longer the hash you create (32-bit vs 64-bit vs 1024-bit, etc.), the safer it gets, too. Almost any dictionary system in a programming language, or any database engine, will have a binary data storage mechanism. Failing that, you could store a string of the hex value corresponding to the hashed number in a char column.
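A minimal Python sketch of the "store the hash, look up the location" idea follows. SHA-256 and the plain dictionary used as the lookup table are assumptions chosen for the example; any sufficiently long hash and any key-value store could take their place. Reading in chunks keeps memory use flat, but the whole file still has to be read once, so the cost concern above still applies.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    # Hash the file in fixed-size chunks so it never has to fit in memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_file(index, path):
    # Record hash -> location; the large value itself stays on disk.
    key = file_digest(path)
    index[key] = str(path)
    return key


# Demo: index one temporary file in a hypothetical in-memory lookup table.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "workspace.bin"
    sample_path.write_bytes(b"example workspace contents")
    lookup = {}
    sample_key = index_file(lookup, sample_path)
```

The hex digest returned here is the kind of string that could be stored in a char column, as suggested above.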
We are now using MongoDB, as it supports large binary values, is very popular and has a large user base. Maybe we are going to switch to another store, but currently it looks very good!
In a Windows Phone 7 application, I would like to query a big XML file (a list of cities) stored using Isolated Storage. If I do it this way, will the whole file be loaded into memory (> 5 MB)? If so, what other solutions do I have?
Edit:
More details. I want to use AutoCompleteBox (http://www.jeff.wilcox.name/2008/10/introducing-autocompletebox/), but instead of using a web service (this is fixed data, no need to be online), I want to query a file/database/isolated storage... I have a fixed list of cities. I said in the comments it's 40k, but it finally seems closer to 1k rows.
Instead of using isolated storage for this, would it be an option for you to use a web service? Or are you designing your app for offline use?
Querying a web service (WCF or a JSON-enabled web service) is really simple, and will be easier for you to maintain :)
Rather than having one big file containing all the data, can you not break it down into lots of smaller files (one for each city)?
You could have a separate file to keep an index of them all if need be. Alternatively, depending on the naming of the files, you may be able to use IsolatedStorageFile.GetFileNames to get a list of all files.
I would create my own file format, using, for example, a separator between fields, with one row for each record.
That way you can read your file line-by-line to fill your data structure with these advantages:
no need to pull the whole file into memory
no XML overhead (in a desktop application it may not be a problem, but in the phone context a 5 MB text file may become quite a bit smaller)
Dumb example:
New York City; 12345
Berlin; 25635
...
EDIT: Given that the volume is not that large, you don't need any form of indexing or on-demand loading. I would store the cities as stated above (one record per line), load them into a list, and use LINQ to select the items you need. This will probably be fast and keep your application very responsive.
In this case, in my opinion, XML is not the best tool for the job. Your structure is very simple and storing in XML would probably double the file size, which is a concern for a mobile device, and would also slow the parsing, also a concern in this case.
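To make the one-record-per-line idea concrete, here is a small language-neutral sketch, written in Python purely for illustration (on the phone the same logic would be written in C#). The function names and the prefix-matching rule for suggestions are assumptions, not part of the answer above.

```python
import io


def load_cities(lines):
    # Parse "name; code" records, one per line, reading line by line.
    cities = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, code = line.partition(";")
        cities.append((name.strip(), code.strip()))
    return cities


def suggest(cities, prefix, limit=10):
    # Case-insensitive prefix match, the kind of query an autocomplete box needs.
    prefix = prefix.casefold()
    matches = [name for name, _ in cities if name.casefold().startswith(prefix)]
    return matches[:limit]


# Demo using the two sample records from the answer.
sample_file = io.StringIO("New York City; 12345\nBerlin; 25635\n")
city_list = load_cities(sample_file)
berlin_matches = suggest(city_list, "ber")
```

With roughly 1k rows, a linear scan like this on every keystroke should be cheap, which is the point made in the edit above.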