Using Redis to temporarily cache uploaded files

I would like to temporarily cache uploaded files in Redis. I know it will use a lot of memory, but I think it is the best way to get really low latency for a short period of time.
How do I store files in Redis? Do I somehow convert them to binary, store that, and decode it when I need the files?

Strings in Redis are binary safe, which means you can store binary files without any problem (https://redis.io/topics/data-types#strings).
How you do this depends on the language and frameworks you are using, but, generally speaking, one way to accomplish it is simply to store the file content in Redis as base64.
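For instance, a rough sketch in Erlang (matching the code elsewhere on this page), assuming the eredis client is available; the key, the one-hour TTL and the function names are only illustrative:

put_file(Key, Path) ->
    {ok, Bin} = file:read_file(Path),                 % raw bytes of the upload
    {ok, C} = eredis:start_link(),                    % connects to localhost:6379
    %% Redis strings are binary safe, so the raw binary can be stored as-is;
    %% use base64:encode(Bin) instead if you want a text-safe value.
    {ok, <<"OK">>} = eredis:q(C, ["SETEX", Key, "3600", Bin]),
    ok.

get_file(Key) ->
    {ok, C} = eredis:start_link(),
    eredis:q(C, ["GET", Key]).                        % {ok, Bin} or {ok, undefined}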
Hope it helps.

Related

How are huge files stored in a database?

I was just wondering how exactly huge files are stored in databases. Most BLOBs are limited to 1 GB as far as I know, but if you take YouTube for example, they have multiple full-HD videos with over an hour of running time (I think that's a bit larger than 1 GB).
Are they using some kind of special database, is there another datatype I've never heard of, or are they just using a simple method like splitting the files?
If they use, let's say, a method where they split and rearrange the bits and bytes, how can the end user watch a video without noticing?
It's just a question out of pure curiosity, but I would be happy if you could answer it.
It is not really the best idea to store files in a database. YouTube and other websites are web applications that store files in file systems. Databases are then only needed to store the information required to locate the files on the file system before serving them to users.
The files could be stored on disk, with a DB holding only the paths. I'm not sure what you're asking.
Why do you want to store it as a BLOB? You can just store it as a file (FLV or whatever) and stream it from there.
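To illustrate the streaming suggestion, here is a rough Erlang sketch of serving a large file in fixed-size chunks instead of loading it whole; Send is a stand-in for whatever function ships a chunk to the client:

stream_file(Path, Send) ->
    {ok, Io} = file:open(Path, [read, binary, raw]),
    ok = stream_loop(Io, Send),
    file:close(Io).

stream_loop(Io, Send) ->
    case file:read(Io, 65536) of                  % read 64 KB at a time
        {ok, Chunk} -> Send(Chunk), stream_loop(Io, Send);
        eof         -> ok
    end.

Because the chunks are sent in order, the viewer just sees a continuous video even though the file is never handled as one piece.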

Text Compression in Erlang

Is there a text compression library for Erlang? When working with very long strings, it may be advantageous to compress the character data. Has anyone compressed text or thought of a way to do it in Erlang?
I was thinking of using the zip module, but instead of working with files, I would work in memory, like this:
compress(LargeText) ->
    Binary = list_to_binary(LargeText),
    {ok, {_, Zipped}} = zip:zip("ZipName", [{"Name", Binary}], [memory]),
    Zipped.
Then I would unzip the text back into memory when I need it, like this:
{ok, [{"Name", Binary}]} = zip:unzip(Zipped, [memory]).
My Erlang application is supposed to be part of a middle tier through which large text passes into and out of a storage system. The storage system is intended for storing large text. To optimize storage, the text needs to be compressed before it is sent. Assume that the text value is like a CLOB data type in an Oracle database. I am thinking that if I combine the zipping with erlang:garbage_collect/0, I can pull it off.
Or, if it's not possible in Erlang, perhaps it is possible using a system call via os:cmd({Some UNIX script}), and then I would grab the output in Erlang? If there's a better way, please show it.
There is a zlib module for Erlang, which supports in-memory compression and decompression.
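For example, a minimal in-memory round trip with zlib, with no temporary files or archive framing involved:

%% zlib:gzip/1 and zlib:gunzip/1 work the same way if gzip framing is needed.
compress(LargeText) ->
    zlib:compress(unicode:characters_to_binary(LargeText)).

decompress(Compressed) ->
    unicode:characters_to_list(zlib:uncompress(Compressed)).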
You can consider using Snappy compression, which is a lot faster than zip, especially for decompression.
Edit:
Nowadays I am using LZ4 a lot and I am very happy with it. It has nice, readable code and a simple format, is well maintained, and is even faster than Snappy.

Database for huge files like audio and video

My application creates a large number of files, each up to 100 MB. Currently we store these files in the file system, which works pretty well, but I am wondering if there is a better solution, such as storing the files in some kind of file database. The simple advantage of a database would be if it could split each file and store it in small chunks instead of as one 100 MB file.
A file system is perfectly suited for storing files. If you need to associate them with a database, do it by filename. The file system already does numerous fancy things to ensure it is efficient. It's probably best that you don't try to outsmart it.
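As a sketch of that pattern in Erlang, you could write each file to a content-addressed path and keep only the resulting metadata in the database; the /data/blobs directory and the returned map are just illustrative:

store(SourcePath) ->
    {ok, Bin} = file:read_file(SourcePath),                  % fine for a sketch; stream in production
    Hash = binary_to_list(binary:encode_hex(crypto:hash(sha256, Bin))),
    Dest = filename:join(["/data/blobs", Hash]),             % content-addressed path on disk
    ok = filelib:ensure_dir(Dest),                           % create parent directories as needed
    ok = file:write_file(Dest, Bin),
    #{path => Dest, size => byte_size(Bin)}.                 % the row you would insert in the DB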
Relational databases are no good with files this big. You could go to something like HDFS, but it may not be worth the trouble if what you have is doing the job. I believe HDFS does break large files down into chunks, though.

How to efficiently store hundreds of thousands of documents?

I'm working on a system that will need to store a lot of documents (PDFs, Word files, etc.). I'm using Solr/Lucene to search for relevant information extracted from those documents, but I also need a place to store the original files so that they can be opened/downloaded by the users.
I was thinking about several possibilities:
file system - probably not a good idea for storing 1M documents
SQL database - but I won't need most of its relational features, as I need to store only the binary document and its id, so this might not be the fastest solution
NoSQL database - I don't have any experience with them, so I'm not sure if they are any good either; there are also many of them, so I don't know which one to pick
The storage I'm looking for should be:
fast
scalable
open-source (not crucial but nice to have)
Can you recommend what, in your opinion, would be the best way of storing those files?
A filesystem -- as the name suggests -- is designed and optimised to store large numbers of files in an efficient and scalable way.
You can follow Facebook's example, as it stores a lot of files (15 billion photos):
They initially started with an NFS share served by commercial storage appliances.
Then they moved to their own HTTP file server implementation called Haystack.
Here is a Facebook note if you want to learn more: http://www.facebook.com/note.php?note_id=76191543919
Regarding the NFS share: keep in mind that NFS shares usually limit the number of files in one folder for performance reasons. (This could be a bit counterintuitive if you assume that all recent file systems use B-trees to store their structure.) So if you are using commercial NFS appliances (like NetApp) you will likely need to keep files in multiple folders.
You can do that if you have any kind of id for your files. Just divide its ASCII representation into groups of a few characters and make a folder for each group.
For example, we use integers for ids, so the file with id 1234567891 is stored as storage/0012/3456/7891.
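A minimal Erlang sketch of that sharding scheme; the storage prefix and the 12-digit padding simply mirror the example above:

shard_path(Id) when is_integer(Id) ->
    Padded = lists:flatten(io_lib:format("~12..0B", [Id])),      % "001234567891"
    [A, B, C] = [lists:sublist(Padded, Start, 4) || Start <- [1, 5, 9]],
    filename:join(["storage", A, B, C]).
%% shard_path(1234567891) -> "storage/0012/3456/7891"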
Hope that helps.
In my opinion...
I would store the files compressed on disk (file system) and use a database to keep track of them,
and possibly use SQLite if this is its only job.
File system: looking at the big picture, the DBMS uses the file system anyway, and the file system is dedicated to keeping files, so you get its optimizations for free (as LukeH mentioned).

How to best store a large JSON document (2+ MB) in a database?

What's the best way to store large JSON files in a database? I know about CouchDB, but I'm pretty sure that won't support files of the size I'll be using.
I'm reluctant to just read them off disk because of the time required to read and then update them. The file is an array of ~30,000 elements, so I think storing each element separately in a traditional database would kill me when I try to select them all.
I have lots of documents in CouchDB that exceed 2 MB and it handles them fine. Those limits are outdated.
The only caveat is that the default JavaScript view server has a pretty slow JSON parser, so view generation can take a while with large documents. You can use my Python view server with a C-based JSON library (jsonlib2, simplejson, yajl) or use the built-in Erlang views, which don't even hit JSON serialization, and view generation will be plenty fast.
If you intend to access specific elements one (or several) at a time, there's no way around breaking the big JSON into traditional DB rows and columns.
If you'd like to access it in one shot, you can convert it to XML and store that in the DB (maybe even compressed; XML is highly compressible). Most DB engines support storing an XML object. You can then read it in one shot and, if needed, translate it back to JSON using forward-read approaches like SAX or any other efficient XML-reading technique.
But as @therefromhere commented, you could always save it as one big string (I would again check whether compressing it improves anything).
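If you do store it as one big string, it is cheap to check whether compression is worth it first; a small Erlang sketch, assuming the document has already been serialized to a binary:

compression_ratio(Json) when is_binary(Json) ->
    Gz = zlib:gzip(Json),                    % gzip-framed DEFLATE of the serialized document
    {byte_size(Json), byte_size(Gz)}.        % raw size vs compressed size, in bytes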
You don't really have a variety of choices here: you can cache them in RAM using something like memcached, or push them to disk, reading and writing them with a database (an RDBMS like PostgreSQL/MySQL or a document-oriented database like CouchDB). The only real alternative to these is a hybrid system that caches the most frequently accessed documents in memcached for reading, which is how a lot of sites operate.
2+ MB isn't a massive deal for a database, and provided you have plenty of RAM, it will do an intelligent enough job of caching and using your RAM effectively. Do you know when and how often these documents are accessed, and how many users you have to serve?
