How to find a path for a Blob in Postgres - database

I am in the process of decommissioning a postgres database that has tens of thousands of blob files in it. The original setup of this database did not scale well, storing thousands of image files as blobs. The process now is to push the database files to slower storage and disable the database server.
I would like to be able to work out how to extract these image files by their blob ID, if such a thing is possible.
I know that, in general, the files are stored in:
/<path to postgres>/pg_data/base/<database_oid>/
However, the files in there do not correlate to the blob's ID within the database. Is there a query I can run that will give me a mapping from OIDs to file paths, or am I misunderstanding how the files are stored on disk?

It turns out I did misunderstand how data is stored.
Postgres stores blob data as a collection of byte-array chunks plus references to them, not as one file per blob, so the files under base/ do not map to individual blobs. It is non-trivial to reconstruct the files from the on-disk data.
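If the blobs are large objects, the chunks can still be read through SQL while the server is running: the pg_largeobject system catalog holds one row per chunk, keyed by loid (the large object's OID) and pageno. Below is a minimal sketch of stitching those rows back together. It assumes the rows were already fetched (for example with SELECT loid, pageno, data FROM pg_largeobject) and that the server uses the default chunk size of 2048 bytes. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict

# Default LOBLKSIZE (BLCKSZ / 4 with 8 kB blocks). This is an assumption:
# a custom PostgreSQL build can use a different value.
LOBLKSIZE = 2048


def reassemble_large_objects(rows, chunk_size=LOBLKSIZE):
    """Rebuild each large object from (loid, pageno, data) rows.

    Chunk `pageno` starts at byte offset pageno * chunk_size. Missing or
    short chunks read back as zero bytes, so gaps are zero-filled.
    """
    chunks = defaultdict(dict)
    for loid, pageno, data in rows:
        chunks[loid][pageno] = bytes(data)

    objects = {}
    for loid, pages in chunks.items():
        buf = bytearray()
        for pageno in sorted(pages):
            offset = pageno * chunk_size
            if len(buf) < offset:
                buf.extend(b"\x00" * (offset - len(buf)))  # fill the gap
            buf[offset:offset + len(pages[pageno])] = pages[pageno]
        objects[loid] = bytes(buf)
    return objects


# Hypothetical rows, deliberately out of order.
rows = [
    (16401, 1, b"world"),
    (16401, 0, b"h" * 2048),
    (16402, 0, b"tiny"),
]
blobs = reassemble_large_objects(rows)
```

Each value in blobs could then be written to a file named after its loid. The server-side lo_export function and psql's \lo_export command do the same job one object at a time, which is likely simpler when only a few blobs are needed.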

Related

Any reason to NOT use FileTable (as opposed to plain FileStream) in SQL Server? [duplicate]

I want to store images in a SQL database. The size of each image is between 50 KB and 1 MB. I was reading about FILESTREAM and FileTable but I don't know which to choose. Each row will have 2 images and some other fields.
The images will never be updated or deleted, and about 3000 rows will be inserted a day.
Which is recommended in this situation?
Originally, storing files (that is, binary data) in a database was considered a bad idea. The usual workaround is to store the file path in the database and ensure that a file actually exists at that path. It was possible to store files in the database though, with the varbinary(MAX) data type.
FILESTREAM was introduced in SQL Server 2008. It handles the varbinary(MAX) column by storing only a pointer in the database files and keeping the data itself in a separate file on the filesystem, dramatically improving performance.
FileTable was introduced with SQL Server 2012 and is an enhancement over FILESTREAM, because it exposes file metadata directly to SQL and allows access to the files from outside SQL Server (you can browse to the files).
Advice: Definitely leverage FileStream, and it might not be a bad idea to use FileTable as well.
More reading (short): http://www.databasejournal.com/features/mssql/filestream-and-filetable-in-sql-server-2012.html
In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage.
If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
Benefits of the FILETABLE:
Windows API compatibility for file data stored within a SQL Server database. Windows API compatibility includes the following:
Non-transactional streaming access and in-place updates to FILESTREAM data.
A hierarchical namespace of directories and files.
Storage of file attributes, such as created date and modified date.
Support for Windows file and directory management APIs.
Compatibility with other SQL Server features including management tools, services, and relational query capabilities over FILESTREAM and file attribute data.
It depends. I personally prefer storing a link to the image in the table. It is simpler, and the files in the directory can be backed up separately.
You have to take several things into account:
How you will process the images. Having only a link lets you easily incorporate the images into web pages (with proper configuration of the web server).
How many images there are. If they are stored in the DB and there are a lot of them, this will increase the size of the DB and its backups.
Whether the images change often. In that case it may be better to keep them inside the DB so that the DB backup reflects their current state.

What is the better and faster way to save and get files: Azure Blob storage or SQL Server?

[Background]
I am creating a WCF service for storing and retrieving articles for our university.
I need to save the files and their metadata.
The service needs to be used by about 1000 people a day.
The storage will contain about 60000 articles.
I have three different ways to do it:
I can save the metadata (file name, file type) in SQL Server, which also gives me a unique ID, and save the files in Azure Blob storage.
I can save both metadata and data in SQL Server.
I can save both metadata and data in Azure Blob storage.
Which way should I choose, and why?
If you can suggest your own solution, that would be wonderful.
P.S. Both of them run on Azure.
I would recommend going with option 1 - save metadata in database but save files in blob storage. Here're my reasons:
Blob storage is designed for exactly this purpose. As of this writing, a storage account can hold 500 TB of data and a single blob can be up to 200 GB, so space is not a limitation.
Compared to SQL Server, it is extremely cheap to store in blob storage.
The reason I am recommending storing metadata in database is because blob storage is a simple object store without any querying capabilities. So if you want to search for files, you can query your database to find the files and then return the file URLs to your users.
However, please keep in mind that because these (database server and blob storage) are two distinct data stores, you won't be able to achieve transactional consistency. When creating files, I would recommend uploading the file to blob storage first and then creating the record in the database. Likewise, when deleting files, I would recommend deleting the record from the database first and then removing the blob. With this ordering a failure can leave a blob without a record, but never a record that points at a missing blob. If you're concerned about orphaned blobs (i.e. blobs without a matching record in the database), I would recommend running a background task that finds the orphaned blobs and deletes them.
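The ordering described above can be sketched as follows. This is only an illustration under stated assumptions: a plain dict stands in for the blob container, an in-memory SQLite database stands in for the metadata store, and every function, table, and column name is hypothetical rather than part of any Azure or SQL Server API.

```python
import sqlite3
import uuid

# Stand-ins (assumptions): a dict plays the blob container and an in-memory
# SQLite database plays the metadata store.
blob_store = {}
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (blob_name TEXT PRIMARY KEY, file_name TEXT, file_type TEXT)"
)


def create_file(file_name, file_type, data):
    """Upload the blob first, then insert the record that points at it."""
    blob_name = uuid.uuid4().hex
    blob_store[blob_name] = data
    with db:
        db.execute(
            "INSERT INTO files VALUES (?, ?, ?)", (blob_name, file_name, file_type)
        )
    return blob_name


def delete_file(blob_name):
    """Delete the record first, then the blob."""
    with db:
        db.execute("DELETE FROM files WHERE blob_name = ?", (blob_name,))
    blob_store.pop(blob_name, None)


def remove_orphaned_blobs():
    """Background sweep: drop blobs that no record references."""
    known = {row[0] for row in db.execute("SELECT blob_name FROM files")}
    orphans = [name for name in blob_store if name not in known]
    for name in orphans:
        del blob_store[name]
    return orphans


kept = create_file("thesis.pdf", "application/pdf", b"%PDF-1.4 ...")
blob_store["left-behind"] = b"upload whose insert never happened"
removed = remove_orphaned_blobs()
```

A real sweep would probably also skip blobs uploaded within the last few minutes, since their database record may simply not have been inserted yet.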

Concatenate VARBINARY Data in MSSQL

We have an application that stores Word and PDF documents in a share on a server. I'm looking into the possibility of storing these as BLOBs in the associated Microsoft SQL database instead, which seems like it's probably a good idea.
Separately, an idea which I'm investigating is the possibility of allowing users to easily view all of the documents in the share associated with a case (let's imagine they're grouped into folders by case) as one continuous stream on a tablet, as if they were all one big PDF file.
I think I've worked out how to do the latter, running a web service to convert the Word documents to PDFs and then concatenate them and the extant PDFs. But that's if we continue to store the documents as files in an NTFS share. What if we stored the documents as BLOBs in MSSQL instead?
Is there a way to concatenate BLOB data so that for every, say, 10 BLOB records (which might represent Word or PDF files), I could create an 11th record which was a concatenation of the other 10 as one giant PDF?
SQL Server BLOBs are not an effective way of storing files. SQL Server 2008 brought about a better mechanism for this called FILESTREAM ( http://technet.microsoft.com/en-us/library/gg471497.aspx) which stores the files directly on the file system while keeping them managed by SQL Server.
As for the files, you would not be able to simply concatenate the bytes of the PDF files to form one continuous file, because each PDF carries its own internal structure. There are several libraries that you could use to merge them properly, potentially on the fly. This would also remove the need to store the concatenated document.

Save Access Report as PDF/Binary

I am using an Access 2007 (VBA - adp) front end with a SQL Server 2005 back end. I have a report, and I want to save a copy of it as a PDF, stored as binary data in SQL Server.
Report Opened
Report Closed - Closed Event Triggered
Report Saved as PDF and uploaded into SQL Server table as Binary File
Is this possible and how would I achieve this?
There are different opinions on whether it's a good idea to store binary files in database tables. Some say it's ok, some prefer to save the files in the file system and only store the location of the file in the DB.
I'm one of those who say it's ok - we have a >440 GB SQL Server 2005 database in which we store PDF files and images. It runs perfectly well and we don't have any problems with it (for example with speed...that's usually one main argument of the "file system" people).
If you don't know how to save the files in the database, google "GetChunk" and "AppendChunk" and you will find examples like this one.
Concerning database design:
It's best if you make two tables: one with only an ID and the blob field (where the PDF files are stored), and one with the same ID and the additional fields used for filtering.
If you do it this way, all the searching and filtering will happen on the small table, and only when you know the ID of the file you want to load, you hit the big table exactly one time and load the file.
We do it like this and like I said before - the database contains nearly 450 GB of files, and we have no speed problems at all.
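A minimal sketch of that two-table layout, using an in-memory SQLite database as a stand-in for SQL Server. The table names, column names, and helper functions are hypothetical and only illustrate the idea of filtering on the small table and touching the blob table once.

```python
import sqlite3

# SQLite stands in for SQL Server here; all names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE document_meta (
        id          INTEGER PRIMARY KEY,
        report_name TEXT NOT NULL,
        customer    TEXT NOT NULL
    );
    CREATE TABLE document_blob (
        id      INTEGER PRIMARY KEY REFERENCES document_meta(id),
        content BLOB NOT NULL
    );
""")


def add_document(report_name, customer, content):
    """Insert the metadata row and the blob row under the same ID."""
    with db:
        cur = db.execute(
            "INSERT INTO document_meta (report_name, customer) VALUES (?, ?)",
            (report_name, customer),
        )
        db.execute(
            "INSERT INTO document_blob (id, content) VALUES (?, ?)",
            (cur.lastrowid, content),
        )
    return cur.lastrowid


def find_ids(customer):
    """Search and filter on the small table only."""
    rows = db.execute(
        "SELECT id FROM document_meta WHERE customer = ? ORDER BY id", (customer,)
    )
    return [row[0] for row in rows]


def load_content(doc_id):
    """Hit the big table exactly once, by primary key."""
    row = db.execute(
        "SELECT content FROM document_blob WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


first = add_document("Report A", "Contoso", b"%PDF-A")
second = add_document("Report B", "Fabrikam", b"%PDF-B")
```

Here find_ids("Fabrikam") never reads any PDF bytes, and load_content(second) fetches a single row from the blob table.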
The easiest way to do this is to save the report out to disk as a PDF (if you don't know how to do that, I recommend this thread on the MSDN forums). After that, you'll need to use ADO to import the file using OLE embedding into a binary type of field. I'm rusty on that, so I can't give specifics, but Google searching has been iffy so far.
I'd recommend against storing PDF files in Access databases -- Jet has a strict limit to database size, and PDFs can fill up that limit if you're not careful. A better bet is to use OLE linking to the file, and retrieving it from disk each time the user asks for it.
The last bit of advice is to use an ObjectFrame to show the PDF on disk, which MSDN covers very well here.
