storing long text - database

What is the best way to store long texts (articles) in a database? It doesn't need to be searchable.
I want to allow people to read the first chapter of every book in my bookstore. Dumping it into a database field makes it difficult to style paragraphs using CSS.
EDIT: Access database

If it is SQL Server 2005, use VARCHAR(MAX).
EDIT:
It seems he said Access,
so I would go with Memo:
Up to 63,999 characters. (If the Memo field is manipulated through DAO and only text and numbers [not binary data] will be stored in it, then the size of the Memo field is limited by the size of the database.)
or OLE Object (if you can)
An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked (OLE/DDE link: A connection between an OLE object and its OLE server, or between a Dynamic Data Exchange (DDE) source document and a destination document.) to or embedded (embed: To insert a copy of an OLE object from another application. The source of the object, called the OLE server, can be any application that supports object linking and embedding. Changes to an embedded object are not reflected in the original object.) in a Microsoft Access table.
Up to 1 gigabyte (limited by available disk space)

you have several options:
store it as a long single string with no formatting, which will look bland on the screen.
store it as a long single string with embedded html and css, which will be a bad choice if you ever want to make your site have a different look/feel.
normalize it so you have tables to store books, chapters, paragraphs, etc. you could then format and style the text as you load it into the application.
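The normalized option could look roughly like the sketch below. It uses Python with SQLite as a stand-in for whatever database you actually have; the table names, column names and the `sample-paragraph` CSS class are all made up for illustration.

```python
import sqlite3
from html import escape

# Hypothetical schema: one row per paragraph, ordered within its chapter.
SCHEMA = """
CREATE TABLE books      (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE chapters   (chapter_id INTEGER PRIMARY KEY,
                         book_id INTEGER NOT NULL REFERENCES books(book_id),
                         number INTEGER NOT NULL, title TEXT NOT NULL);
CREATE TABLE paragraphs (paragraph_id INTEGER PRIMARY KEY,
                         chapter_id INTEGER NOT NULL REFERENCES chapters(chapter_id),
                         position INTEGER NOT NULL, body TEXT NOT NULL);
"""


def render_chapter(conn, chapter_id, css_class="sample-paragraph"):
    """Return the chapter as HTML, one <p> per stored paragraph."""
    rows = conn.execute(
        "SELECT body FROM paragraphs WHERE chapter_id = ? ORDER BY position",
        (chapter_id,),
    )
    # The stored text is plain, so escape it and add the markup at load time.
    return "\n".join(
        f'<p class="{css_class}">{escape(body)}</p>' for (body,) in rows
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO books VALUES (1, 'Example Book')")
conn.execute("INSERT INTO chapters VALUES (1, 1, 1, 'Chapter One')")
conn.executemany(
    "INSERT INTO paragraphs (chapter_id, position, body) VALUES (?, ?, ?)",
    [(1, 1, "It was a dark & stormy night."), (1, 2, "Nothing happened.")],
)
html = render_chapter(conn, 1)
```

Because the markup is added when the text is loaded, a redesign only touches `render_chapter` and the stylesheet, not the stored rows.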

The main difference between long text (CLOB / TEXT / VARCHAR(MAX)) and long data (BLOB / IMAGE / VARBINARY(MAX)) is that the former is subject to character set conversions while the latter is not.
If you need to make character set conversions on the database side, use CLOB and similar.
If you always want to retrieve your data exactly as you stored it, byte for byte (as opposed to character for character), use BLOB and similar.
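To make the distinction concrete, here is a small Python illustration. It does not talk to a database; it only mimics what a character set conversion does to text and what a byte-for-byte round trip looks like. The sample string and the two encodings are arbitrary choices.

```python
text = "café"                        # what a CLOB-style column holds: characters
stored_utf8 = text.encode("utf-8")   # one possible byte representation of it

# A character set conversion keeps the characters but changes the bytes.
as_latin1 = stored_utf8.decode("utf-8").encode("latin-1")

# A BLOB-style column hands back exactly the bytes it was given.
blob_roundtrip = bytes(stored_utf8)

same_characters = as_latin1.decode("latin-1") == text
same_bytes = as_latin1 == stored_utf8
```

Here `same_characters` is true while `same_bytes` is false: the converted value is still the same text, but it is no longer the same sequence of bytes.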

I don't know which database you're using, but if the text doesn't need to be searchable, then you can simply store the HTML-formatted text (for instance, the value coming from an FCKEditor or a similar component). If you also need searchability, then you can store both HTML and plain text in two separate fields.
Fields can be nvarchar(MAX) if you use MS SQL Server 2008 or any equivalent datatype on other databases.
EDIT:
Seems you're using Access, so go for Memo data type!
If you decide to store HTML, consider storing only generic markup (div, p) to divide your text, then later apply CSS formatting by wrapping the stored text in another div that specifies formatting classes for its child elements.
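As a rough sketch of that wrapping step (Python here, but any server-side language would do; the function name and the `chapter-preview` class are invented for the example):

```python
def wrap_chapter(stored_html: str, css_class: str = "chapter-preview") -> str:
    """Wrap generic stored markup in a div that carries the styling hook."""
    return f'<div class="{css_class}">\n{stored_html}\n</div>'


# The stored value contains only plain <p> elements, no classes or styles.
stored = "<p>First paragraph.</p>\n<p>Second paragraph.</p>"
page_fragment = wrap_chapter(stored)
```

The stylesheet can then target the children with a selector such as `.chapter-preview p`, so changing the look of the site never requires touching the stored text.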

I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
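A minimal sketch of the pointer approach, again with Python and SQLite standing in for the real database. The `samples` table, the file naming scheme and the temporary directory are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_chapter(conn, root: Path, book_id: int, text: str) -> Path:
    """Write the text to a file and record only the file name in the database."""
    path = root / f"book_{book_id}_chapter1.html"
    path.write_text(text, encoding="utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO samples (book_id, file_name) VALUES (?, ?)",
        (book_id, path.name),
    )
    return path


def load_chapter(conn, root: Path, book_id: int):
    """Look up the pointer, then read the file; None if no sample is recorded."""
    row = conn.execute(
        "SELECT file_name FROM samples WHERE book_id = ?", (book_id,)
    ).fetchone()
    if row is None:
        return None
    return (root / row[0]).read_text(encoding="utf-8")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (book_id INTEGER PRIMARY KEY, file_name TEXT NOT NULL)"
)
root = Path(tempfile.mkdtemp())  # stands in for a shared documents folder
save_chapter(conn, root, 42, "<p>Call me Ishmael.</p>")
```

Storing the file name rather than an absolute path means the folder can be moved without updating any rows.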

Use a CLOB.

For SQL Server
TEXT / NTEXT for SQL Server 2000
VARCHAR(MAX) / NVARCHAR(MAX) for SQL Server 2005 onwards

I would propose storing the first chapter as pdf file. This is secure and allows for good formatting. Then use a blob, clob, varchar, or text field depending on your product (see the other answers).
Or you could use images and look into something like Amazon's "look inside". It would work with the same db techniques.
Alternatively you could use something like markup.
I personally do not like to put html in my database. Even if it is only for output. Too easy to put in some javascript. But maybe I'm just too cautious.
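If you do store HTML, one way to ease that worry is to audit the markup before saving it. The sketch below uses Python's standard `html.parser` and flags anything other than bare `div` and `p` tags. The allowed set and the function name are my own choices, and this is only an audit, not a complete sanitizer.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"div", "p"}  # assumption: only generic block markup is stored


class _TagAudit(HTMLParser):
    """Collect every tag or attribute that falls outside the allowed set."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag <{tag}>")
        # Attributes are rejected outright, which also rules out onclick etc.
        for name, _value in attrs:
            self.problems.append(f"attribute {name} on <{tag}>")


def find_disallowed(html_text: str) -> list:
    """Return a list of problems; an empty list means the markup passed."""
    audit = _TagAudit()
    audit.feed(html_text)
    audit.close()
    return audit.problems
```

For example, `find_disallowed('<p onclick="x()">hi</p><script>alert(1)</script>')` reports both the `onclick` attribute and the `script` tag, so the value can be rejected before it reaches the database.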

The following applies to Jet 4.0 only, being the version of the Access Database Engine in the era Access2000 to Access2003 inclusive:
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
If you are using ACE, being the version of the Access Database Engine in the Access2007 era, the Attachment data type would be an option. However, I don't really know how it works and I've never used it, so I can't recommend it or say whether it's better or worse for this purpose. I'm also wary of new data types in the first release of a major version of the Access Database Engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4 and don't want to commit to something that may never work properly. The Attachment type in the ACCDB format was introduced for Sharepoint compatibility, and that outside dependency is something that gives me pause. Will the ACCDB data type change someday if Sharepoint changes the way it works? I'm not sure I'd want to take that risk.

Put it in a TEXT field, and store it with its <p> tags so you'll be able to style paragraphs.
As it doesn't need to be searchable, it won't impact your SQL performance.

Related

SQL Blob to Base 64 in Table for FileMaker

I have looked and found some instances where something similar is being done for websites etc....
I have a SQL table that I am accessing in FileMaker Pro (through ESS) via an ODBC connection to the SQL database and I have everything I need, except there is one field (LNL_BLOB) in one table (duo.MMOBJS) which is an image "(image, null)" and which cannot be accessed via the ODBC connection.
What I am hoping to accomplish is to find a way that, when an image is placed in the field, it is ALSO converted to Base64 in another field in the same table. Also, the database creator has a "View" (a foreign concept to us FileMaker developers) with this same data called "dbo.VW_BLOB_IMAGES", if that is helpful.
If there is a field with Base64 text, within FileMaker I can decode it to get the image.
What thoughts do you all have? Is there an even better way?
NOTE: I am using many tables and lots of the data in the app that I have made, this image is not the only reason I have created the ODBC connection.
[Screenshots: Table, View]
Well, one way to get Base64 out of SQL Server is to trick the XML engine into converting your column to Base64, then strip out the XML wrapper:
SELECT SUBSTRING(Q.Base64Data, 7, LEN(Q.Base64Data) - 9)  -- drop the leading <r B=" and the trailing "/>
FROM (
    SELECT (
        SELECT LNL_BLOB AS B
        FROM duo.MMOBJS
        FOR XML RAW('r'), BINARY BASE64
    ) AS [Base64Data]
) AS [Q]
You'd probably want to add that to your select statement or a view, rather than add it to the table; but, you could write a trigger that would maintain the field using that definition.
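If you want to sanity-check what the Base64 field should contain, or verify the substring arithmetic in the query above, a few lines of Python are enough. The sample bytes are just the PNG file signature and the function names are made up; nothing here touches SQL Server or FileMaker.

```python
import base64


def blob_to_base64(blob: bytes) -> str:
    """Encode raw image bytes the way BINARY BASE64 would present them."""
    return base64.b64encode(blob).decode("ascii")


def base64_to_blob(text: str) -> bytes:
    """Decode the Base64 text back into the original bytes."""
    return base64.b64decode(text)


sample = b"\x89PNG\r\n\x1a\n"      # the 8-byte PNG signature
encoded = blob_to_base64(sample)   # "iVBORw0KGgo="

# FOR XML RAW('r') wraps the value like this; SUBSTRING(s, 7, LEN(s) - 9)
# in T-SQL corresponds to the slice [6:-3] in Python.
wrapped = f'<r B="{encoded}"/>'
unwrapped = wrapped[6:-3]
```

Decoding `unwrapped` with `base64_to_blob` gives back the original bytes, which is the same round trip FileMaker would perform on the field.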

SQL Server 2014 - Insert xml data into varbinary field

In my stored procedure, I am creating an XML file which has the potential to be very large, > 1GB in size. The data needs to be inserted into a varbinary column and I was wondering what the most efficient method of doing this is in SQL Server 2014?
I was storing it in an xml column but have been asked to move it to this new column as a result of a decision outside of my control
If you have the slightest chance to speak with these people, you should do so!
You must be aware that XML is not stored as the string representation you see, but as a hierarchically organized tree. Reading or manipulating this data is astonishingly fast! If you store the XML as a BLOB, you will keep it in its string format (hopefully this is unicode/UCS-2!). Reading this data will need a cast to NVARCHAR(MAX) and then to XML, which means a full parse of the whole document to rebuild the hierarchy tree. Only when this is done can you use XML data type methods like .value() or .nodes(). You will need this very expensive process over and over and over and ...
Especially in cases of huge XMLs (or, even worse, many of them) this is a really bad decision! Why should one do this? It will take roughly the same amount of storage space.
The only thing you will get is bad performance! And you will be the one who has to repair this later...
VARBINARY is the appropriate type for data where you do not care what's inside (e.g. pictures). If these XMLs are just plain archive data and you do not want to read or manipulate them, this can be a choice. But there is no advantage at all!
I would look into using a File Table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
And check this out for inserting blobs: How to insert a blob into a database using sql server management studio

Want to remove or obfuscate credit card info from xml stored as text

We have a log table that has a varchar(max) field, that contains a copy of xml passed back and forth in remote system calls.
The problem is that we have credit card data stored openly in the column. Our goal is to obfuscate, mask OR delete the credit card data. The format of the XML varies, but we know what the format will be in most cases.
So lets say the table is titled RemoteSysLog, and the field storing the text is titled InBoundMessage.
Any ideas on a solution?
We have considered:
using the xml.remove functionality
using some sort of pattern matching for 16-digit numbers and replacing them with xxxxx (not sure what to use)
replacing the nodes with standard replace, substring, charindex string processing routines.
There are multiple sources that are logging to this table, so the company does NOT want to edit all the source code for those 'legacy' apps. The solution needs to be within this scope.
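For the pattern-matching option, a rough sketch in Python might look like the following. It is not tied to the RemoteSysLog table. It assumes card numbers are 13 to 19 digits, possibly separated by spaces or hyphens, and it uses the Luhn checksum to avoid masking other long numbers such as order IDs. The function names and the keep-last-four masking rule are my own choices.

```python
import re

# Assumption: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(message: str) -> str:
    """Replace likely card numbers with X's, keeping the last four digits."""

    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_ok(digits):
            return match.group()   # leave numbers that fail the checksum alone
        return "X" * (len(digits) - 4) + digits[-4:]

    return CANDIDATE.sub(_mask, message)
```

For instance, `mask_cards("<Number>4111 1111 1111 1111</Number>")` yields `<Number>XXXXXXXXXXXX1111</Number>`, while a 16-digit value that fails the checksum is left untouched. Because it works on the raw text, it does not depend on knowing the XML format in advance.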
The log service should cleanse the payment information before storing the messages...it's just too risky. Even if you make a pass afterwards to redact the CC info, your database and transaction backups may still contain the information you meant to wipe.
Since you're describing XML messages in need of transformation, my go-to approach would involve xslt.
Something like this.

How to attach and view pdf documents to access database

I have a very simple database in Access, but for each record I need to attach a scanned document (probably PDF). What is the best way to do this? The database should not just link to a file on the PC, but should copy and keep the file with it, meaning that if the original file goes missing, or the database is moved or copied, the file should still be accessible from within the database. Is this possible, and what is the easiest way of doing it? If I should, I can write a macro; I just don't know where to start. Also, when I display a report of the table, I would like to see just thumbnails of the documents.
Thank you.
As the other answerers have noted, storing file data inside a database table can be a questionable practice. That said, I wouldn't personally rule it out, though if you are going to take that option, I'd strongly suggest splitting out the file data into its own table in its own backend file. For example:
Create a new database file called Scanned files.mdb (or Scanned files.accdb).
Add a single table called Scans with fields such as FileID (AutoNumber, primary key), MainTableID (matches whatever is the primary key of the main table in the main database file), FileName (Text), FileExt (Text) and FileData ('OLE object', really just a BLOB - don't actually use OLE Objects because they will bloat the database horribly).
Back in the frontend, add a reference to Scans as a linked table.
Use a bit of VBA to upload and extract files from the Scans table (if you're interested in the mechanics of this, post a separate question).
Use the VBA Shell routine (if you must) or ShellExecute from the Windows API (= the better option IMO) to open extracted data.
If you are using the newer ACCDB format, then you have the 'attachment' field type available as smk081 suggests. This basically does most of the above steps for you, however doing things 'by hand' gives you greater flexibility - for example, it allows giving each file a 'DateScanned' or 'DateEffective' field.
That said, your requirement for thumbnails will require explicit coding whatever option you take. It might be possible to leverage the Windows file previewing API, though I'd be certain thumbnails are a definite requirement before investigating this - Access VBA is powerful enough to encourage attempts at complex solutions, but frequently not clean and modern enough to allow fulfilling them in a particularly maintainable fashion.
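To give an idea of the shape of the split-out Scans table described in the steps above, here is a sketch using Python and SQLite rather than Access and VBA. The field names follow the list above. The SQLite types, the in-memory database and the helper function names are stand-ins I picked for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the separate backend file
conn.execute(
    """
    CREATE TABLE Scans (
        FileID      INTEGER PRIMARY KEY AUTOINCREMENT,
        MainTableID INTEGER NOT NULL,   -- key of the record in the main table
        FileName    TEXT NOT NULL,
        FileExt     TEXT NOT NULL,
        FileData    BLOB NOT NULL       -- raw file bytes, no OLE wrapper
    )
    """
)


def upload_scan(conn, main_id: int, file_name: str, file_ext: str, data: bytes) -> int:
    """Copy the file bytes into the Scans table and return the new FileID."""
    cur = conn.execute(
        "INSERT INTO Scans (MainTableID, FileName, FileExt, FileData) "
        "VALUES (?, ?, ?, ?)",
        (main_id, file_name, file_ext, data),
    )
    return cur.lastrowid


def extract_scan(conn, file_id: int):
    """Return (file name with extension, bytes), or None if the ID is unknown."""
    row = conn.execute(
        "SELECT FileName, FileExt, FileData FROM Scans WHERE FileID = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        return None
    name, ext, data = row
    return f"{name}.{ext}", bytes(data)


pdf_bytes = b"%PDF-1.4 example bytes"
file_id = upload_scan(conn, 1, "invoice", "pdf", pdf_bytes)
```

In Access the extracted bytes would be written to a temporary file and then opened with ShellExecute, as described above.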
There is an Attachment type under Data Type when you go into Design View of your table. You can add an attachment field here. When you go into the Datasheet view of the table you can select this field for a particular row and a window will open for you to specify the attachment. This will cause your database to quickly grow in size if you add a lot of large attachments.
You can use an OLE field in a table, but I would really suggest you not use this approach. The database is going to be HUGE in no time, and you're going to regret it.
Instead, you should consider adding a field that stores the path to the file, and keep the files in one folder on your network. Then you can use a SHELL() command to open the file. What's the difference between restoring an Access database and restoring PDF files if something goes wrong? This will keep your database at a manageable size and reduce the possibility of corruption.

SQL Server 2005 - How do I convert image data type to character format

Background: I am a software tester working with a test case management database that stores data using the deprecated image data type. I am relatively inexperienced with SQL Server.
The problem: Character data with rich text formatting is stored as an image data type. Currently the only way to see this data in a human readable format is through the test case management tool itself which I am in the process of replacing. I know that there is no direct way to convert an image data type to character, but clearly there is some way this can be accomplished, given that the test case management software is performing that task. I have searched this site and have not found any hits. I have also not yet found any solutions by searching the net.
Objective: My goal is to export the data out of the SQL Server database into an Access database. There are fewer than 10,000 rows in the database. At a later stage in the project, the Access database will be upsized to SQL Server.
Request: Can someone please give me a method for converting the image data type to a character format.
You presumably want to convert to byte data rather than character. This post on my blog, "Save and Restore Files/Images to SQL Server Database", might be useful. It contains code for exporting to a byte array and to a file. The entire C# project is downloadable as a zip file.
One solution (for human readability) is to pull it out in chunks that you convert from binary to character data. If every byte is valid ASCII, there shouldn't be a problem (although legacy data is often not what you expect).
First, create a table like this:
create table Nums(
n int primary key
);
and insert the integers from 0 up to at least (maximum image column length in bytes)/8000. Then the following query (untested, so think it through) should get your data out in a relatively useful form. Be sure whatever client you're pulling it to won't truncate strings at smaller than 8000 bytes. (You can do smaller chunks if you want to be opening the result in Notepad or something.)
SELECT
yourTable.keycolumn,
Nums.n as chunkPosition,
CAST(SUBSTRING(imageCol,n*8000+1,8000) AS VARCHAR(8000)) as chunk
FROM yourTable
JOIN Nums
ON Nums.n <= (DATALENGTH(yourTable.imageCol)-1)/8000
ORDER BY yourTable.keycolumn, Nums.n
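If it helps to see the chunk arithmetic outside of SQL, the same slicing can be mimicked in Python. This is only an illustration of what the join against Nums computes for one image value. The function name is made up, and bytes that are not valid ASCII are replaced rather than raising an error.

```python
CHUNK = 8000  # same chunk size as in the query above


def chunk_image(data: bytes, size: int = CHUNK):
    """Yield (chunkPosition, text) pairs, one per chunk of the binary value."""
    if not data:
        return  # this sketch simply skips empty values
    last = (len(data) - 1) // size       # highest chunk number that has data
    for n in range(last + 1):
        # T-SQL SUBSTRING(col, n*8000+1, 8000) is the slice [n*8000:(n+1)*8000]
        piece = data[n * size:(n + 1) * size]
        yield n, piece.decode("ascii", errors="replace")


sample = b"A" * 8000 + b"B" * 8000 + b"C" * 5
chunks = list(chunk_image(sample))
```

With the 16,005-byte sample this produces chunk positions 0, 1 and 2, the last chunk holding the remaining 5 characters. Concatenating the chunks in order gives back the full text.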
