Best way to move data from SQL to file system (FileStream?) - sql-server

At the moment I have a SQL Server Express instance with a database nearing the 10gig limit. I have a table that stores details about activities, one column being "Note" which can be up to 3000 characters long.
I would like to take this table and convert the "Note" column to use the file system so that it frees up some database space. I also need this column to allow full text searching. I've pored over all the documents on FileStreams that I can find but I still can't figure out the best way to do this.
My main question is how I get my existing data (nvarchar(3000)) moved over to this new Filestream column and allow full text searches to work on this. Everything I've read states that you have to have a column that denotes the file extension being used -- but I don't know how to file my data into the new field using a specific file type. I'd be fine using .txt as the type, if I could figure out how.
Any ideas or best practices would be appreciated. I'm not hung up on the FileStream, so I'd take any other recommendations as well. My company is too cheap to buy SQL Standard, so I'd l

Related

Performance issue, flat file vs database storage

I am working on a project that requires me to develop a system that provides JSON output. The flow is as follows:
1a) Some tables will be updated via the administration panel (my company side)
1b) Some related tables will be updated via the administration panel (partner side)
-- Let's say SuperHeroes & Males were updated in 1a), and Studios & Years were updated in 1b)
2) A client browses our site and requests an information set which:
Has an enabled and not deleted row in SuperHeroes (Ant-man)
Has an enabled and not deleted row in Males linked to SuperHeroes (Scott Lang)
Joins the above records, then checks whether they are linked to an existing row in Studios (Marvel)
Is linked to an existing row in Years (2015)
3) A very small amount of data is output as a JSON string like the following: { id:1,type:marvel },{ id:1,type:dc }
All rows in the above 4 tables can be updated/deleted at any time without notification [and there are no foreign keys either]
I am thinking of updating the information in a flat file every time 1a) is performed (we can update the system on my company's side but not on the partner's side, and they have refused to save extra information into a flat file, so we have no easy way to know whether the Studios or Years tables have been modified)
The JSON request would then first load the information from the flat file (all output data will be stored in this file), then use a simple SQL statement to check whether a linked record exists in Studios & Years
I have done my research and am getting confused. I concluded that when the amount of data is small a flat file will be great, but beware of the file growing larger and larger (the flat file we are talking about will never hold more than 50 rows at a time, and it should not be modified frequently)
Some answers said a database is good at querying data (I think so too, and the requirement will perform an SQL check anyway)
So I don't know whether a flat file is a good choice when my data is small but I still need some communication with the database.
I appreciate your time and your help; all ideas & hints are welcome, thanks!
Your conclusion regarding the amount of data is absolutely correct, and a file should handle those 50 rows, but..
Using a database as storage should give you more options in the future, e.g.:
you'll be able to produce any output thanks to the separation of data from representation (today it's JSON, but what if you have to produce XML at some point? will you add files that store XML next to the JSON files? and then CSV or any other format?)
transactions will guarantee ACID properties
scalability and performance - if your dataset gets bigger (you never know :)), many DBs offer you possibilities like table partitioning, partitioning-based clustering or replication
Wrong choices regarding architecture and technology made at the beginning of the project will always backfire.
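As a rough illustration of the "separation of data from representation" point, here is a minimal sketch that reads rows once and renders them as either JSON or XML. It uses Python's built-in sqlite3 as a stand-in database; the table name heroes, its columns and the sample rows are assumptions made up for the example, not the asker's real schema.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET


def fetch_rows(conn):
    """Return the rows to publish as plain dicts, independent of output format."""
    cur = conn.execute(
        "SELECT id, type FROM heroes WHERE enabled = 1 ORDER BY id, type"
    )
    return [{"id": row[0], "type": row[1]} for row in cur]


def to_json(rows):
    """Render the rows as a JSON array."""
    return json.dumps(rows)


def to_xml(rows):
    """Render the same rows as XML without touching the query."""
    root = ET.Element("heroes")
    for row in rows:
        ET.SubElement(root, "hero", id=str(row["id"]), type=row["type"])
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data; the real tables would live in the project's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (id INTEGER, type TEXT, enabled INTEGER)")
conn.executemany(
    "INSERT INTO heroes VALUES (?, ?, ?)",
    [(1, "marvel", 1), (1, "dc", 1), (2, "marvel", 0)],
)
rows = fetch_rows(conn)
print(to_json(rows))
print(to_xml(rows))
```

Adding CSV or another format later would mean one more rendering function, with no change to how the data is stored.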

How to attach and view pdf documents to access database

I have a very simple database in Access, but for each record I need to attach a scanned document (probably a PDF). What is the best way to do this? The database should not just link to a file on the PC, but should copy and keep the file with it, meaning that if the original file goes missing, or the database is moved or copied, the file should still be accessible from within the database. Is this possible, and what is the easiest way of doing it? If I have to, I can write a macro; I just don't know where to start. Also, when I display a report of the table, I would like to see just thumbnails of the documents.
Thank you.
As the other answerers have noted, storing file data inside a database table can be a questionable practice. That said, I wouldn't personally rule it out, though if you are going to take that option, I'd strongly suggest splitting out the file data into its own table in its own backend file. For example:
Create a new database file called Scanned files.mdb (or Scanned files.accdb).
Add a single table called Scans with fields such as FileID (AutoNumber, primary key), MainTableID (matches whatever is the primary key of the main table in the main database file), FileName (Text), FileExt (Text) and FileData ('OLE object', really just a BLOB - don't actually use OLE Objects because they will bloat the database horribly).
Back in the frontend, add a reference to Scans as a linked table.
Use a bit of VBA to upload and extract files from the Scans table (if you're interested in the mechanics of this, post a separate question).
Use the VBA Shell routine (if you must) or ShellExecute from the Windows API (= the better option IMO) to open extracted data.
If you are using the newer ACCDB format, then you have the 'attachment' field type available as smk081 suggests. This basically does most of the above steps for you, however doing things 'by hand' gives you greater flexibility - for example, it allows giving each file a 'DateScanned' or 'DateEffective' field.
That said, your requirement for thumbnails will require explicit coding whatever option you take. It might be possible to leverage the Windows file previewing API, though I'd be certain thumbnails are a definite requirement before investigating this - Access VBA is powerful enough to encourage attempts at complex solutions, but frequently not clean and modern enough to allow fulfilling them in a particularly maintainable fashion.
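The upload/extract mechanics for a separate Scans table can be sketched as follows. This is not Access VBA: it is a small Python illustration that uses the built-in sqlite3 module as a stand-in for the backend file, with the field names taken from the steps above. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def create_scans_table(conn):
    """Create the Scans table with the fields described in the answer."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Scans (
               FileID INTEGER PRIMARY KEY AUTOINCREMENT,
               MainTableID INTEGER NOT NULL,
               FileName TEXT NOT NULL,
               FileExt TEXT NOT NULL,
               FileData BLOB NOT NULL
           )"""
    )


def upload_file(conn, main_table_id, path):
    """Copy the file's bytes into the table and return the new FileID."""
    path = Path(path)
    cur = conn.execute(
        "INSERT INTO Scans (MainTableID, FileName, FileExt, FileData) "
        "VALUES (?, ?, ?, ?)",
        (main_table_id, path.stem, path.suffix, path.read_bytes()),
    )
    conn.commit()
    return cur.lastrowid


def extract_file(conn, file_id, target_dir):
    """Write the stored bytes back to disk and return the path of the copy."""
    row = conn.execute(
        "SELECT FileName, FileExt, FileData FROM Scans WHERE FileID = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    out = Path(target_dir) / (row[0] + row[1])
    out.write_bytes(row[2])
    return out
```

The extracted copy is what would then be handed to the operating system to open, which is the role Shell or ShellExecute plays in the VBA version.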
There is an Attachment type under Data Type when you go into Design View of your table. You can add an attachment field here. When you go into the Datasheet view of the table you can select this field for a particular row and a window will open for you to specify the attachment. This will cause your database to quickly grow in size if you add a lot of large attachments.
You can use an OLE field in a table, but I would really suggest you not use this approach. The database is going to be HUGE in no time, and you're going to regret it.
Instead, you should consider adding a field that stores the path to the file, and keep the files in one folder on your network. Then you can use a SHELL() command to open the file. What's the difference between restoring an Access database and restoring PDF files if something goes wrong? This will keep your database at a manageable size and reduce the possibility of corruption.
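A minimal sketch of the path-in-the-database approach, again in Python with sqlite3 standing in for the Access table. The table name Documents, the column names and the naming scheme for the copied file are assumptions for illustration only.

```python
import shutil
import sqlite3
from pathlib import Path


def init_db(conn):
    """Create a table that stores only the path, never the file contents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Documents ("
        "RecordID INTEGER PRIMARY KEY, FilePath TEXT NOT NULL)"
    )


def attach_scan(conn, record_id, source_file, shared_folder):
    """Copy the scan into the shared folder and record where it went."""
    source_file = Path(source_file)
    # Hypothetical naming scheme: one file per record, named after the record ID.
    target = Path(shared_folder) / f"{record_id}{source_file.suffix}"
    shutil.copy2(source_file, target)
    conn.execute(
        "INSERT OR REPLACE INTO Documents (RecordID, FilePath) VALUES (?, ?)",
        (record_id, str(target)),
    )
    conn.commit()
    return target


def scan_path(conn, record_id):
    """Look up the stored path, or return None if the record has no scan."""
    row = conn.execute(
        "SELECT FilePath FROM Documents WHERE RecordID = ?", (record_id,)
    ).fetchone()
    return Path(row[0]) if row else None
```

Because the file is copied into the shared folder first, losing the original scan does not break the record, although the folder then has to be backed up together with the database.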

Better way to store updatable scientific data?

I am using a file consisting of published scientific data. I'm using this file with a program that reads in the first 5 space delimited data fields, and everything after that is considered a comment by the program.
2 example lines (of thousands):
FeII 1608.4511 0.521 55.36 -1300 M03 Journal of Physics
FeII 1611.23045 0.0321 55.36 1100 01J AJ
The program reads it as:
FeII 1608.4511 0.521 55.36 -1300
FeII 1611.23045 0.0321 55.36 1100
These numbers are each measurements and most (don't get me started) have associated errors that are not listed in this file. I would like to store this information in a useful and updatable way. That is, say the first entry FeII 1608.4511 has an error of plus/minus 0.002. Consider when a new measurement is made and changes it to: FeII 1608.45034 plus/minus 0.0005. I would like to update the value, the error, and record some information about the publication that it came from.
The program that uses this file is legacy code and is both crucial and inflexible: and it needs the file to look like the above output when it's read in. I would really like for there to be a way to update the input file to include things like errors on the values and publication hyperlinks in comments. I would also like a kind of version control ability to return the state of this large file today; or in 5 months after 20 more lines are updated with new values.
Any suggestions on how best to accomplish this? Should I store everything in some kind of database?
Databases are deeply tied to identity. If a database can't identify a row by the data that's in it, a database isn't going to help you.
If I were you, I'd start by storing the base file in a version control system, not a database. At 20 changes per 5 months, I'd probably make those changes manually and commit each batch of changes. (I don't know what might constitute a batch for you. Could be a single change every time.)
Since the format of the existing file is both crucial and brittle, I'm not sure whether modifying it is a good idea. I think I'd feel better about storing error ranges and publication hyperlinks in a separate file, and using a script to put the pieces together for applications that can use error ranges and hyperlinks.
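One way to picture the "separate file plus a script" idea is the sketch below. It leaves the legacy file untouched and looks up extra information in a sidecar CSV keyed on the first two fields of each line. The sidecar layout (columns species, value, error, url), that choice of key, and the example URL are all assumptions for illustration; the 0.002 error is the one mentioned in the question.

```python
import csv
import io


def load_annotations(sidecar_text):
    """Index sidecar rows by (species, value) as they appear in the base file."""
    reader = csv.DictReader(io.StringIO(sidecar_text))
    return {(row["species"], row["value"]): row for row in reader}


def merge(base_text, annotations):
    """Combine each base line with its annotation, if one exists."""
    merged = []
    for line in base_text.splitlines():
        parts = line.split(None, 5)  # five data fields, the rest is comment
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        extra = annotations.get((parts[0], parts[1]), {})
        merged.append(
            {
                "fields": parts[:5],
                "comment": parts[5].strip() if len(parts) > 5 else "",
                "error": extra.get("error"),
                "url": extra.get("url"),
            }
        )
    return merged


base = (
    "FeII 1608.4511 0.521 55.36 -1300 M03 Journal of Physics\n"
    "FeII 1611.23045 0.0321 55.36 1100 01J AJ\n"
)
# Hypothetical sidecar content; the URL is a placeholder.
sidecar = "species,value,error,url\nFeII,1608.4511,0.002,https://example.org/paper\n"
records = merge(base, load_annotations(sidecar))
```

Both files are plain text, so they can sit side by side in the same version control repository. Note that the key changes whenever a measured value is updated, so the sidecar row has to be updated in the same commit.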
A database sounds sensible; SQL Server Express is free and widely used.
You can read in the text file including all comments and output the edited data in the same format. You can use a number of front ends including Access, for rapid development, or something you create yourself in VB.Net, or even Excel, at a pinch.
You will need to consider the structure of the table(s) but it should not be too difficult, and you can get help here.
To update the file with error values and links, you don't need any database; just open the file, iterate through the lines and update each one.
If you want to be able to restore a line's state, you definitely need some kind of database. You can create a database in SQL Server or Firebird, for example, and store in it a row for each historical state of a line (with the date of creation, of course); your file itself would be the repository for current values, and you would be able to restore the file as of a given date with some simple fetching of the database information.
If you can't use a database like Firebird or SQL Server, you can store the historical data in a simple text file; it's up to you. Just remember that you will necessarily need, as #CatCall commented, a way to identify each line in order to create a relation between the line in the file and the historical data stored in your repository.
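The history-table idea could look roughly like this. The sketch uses Python's sqlite3 instead of SQL Server or Firebird, and it assumes each line has been given a stable identifier (line_key, a made-up name) that does not change when the measured value does. Dates are stored as ISO strings so that they sort correctly as text.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open the history store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS line_history (
               line_key TEXT NOT NULL,
               content TEXT NOT NULL,
               recorded_on TEXT NOT NULL
           )"""
    )
    return conn


def record(conn, line_key, content, recorded_on):
    """Store one historical state of a line."""
    conn.execute(
        "INSERT INTO line_history VALUES (?, ?, ?)",
        (line_key, content, recorded_on),
    )


def state_as_of(conn, date):
    """Return {line_key: content} using the latest state on or before date."""
    rows = conn.execute(
        """SELECT h.line_key, h.content
           FROM line_history AS h
           WHERE h.recorded_on = (
               SELECT MAX(recorded_on) FROM line_history
               WHERE line_key = h.line_key AND recorded_on <= ?
           )
           ORDER BY h.line_key""",
        (date,),
    ).fetchall()
    return dict(rows)
```

Rebuilding the file for a given date is then a matter of writing out the values of state_as_of in the order the legacy program expects.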

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straight forward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (i.e. changed the nvarchar to varchar2)
Next I exported the data using SQL Server's export wizard, which created a CSV flat file. However my main issue is that I can't find a way to force SQL Server to double quote column names. One of my columns contains commas, so unless I can find a method for SQL Server to quote column names I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm referring to quoting the column values in the CSV. For example I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
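To make the quoting requirement concrete, here is a small sketch using Python's csv module rather than the SQL Server export wizard. It writes the address from the example with every non-numeric value quoted and then reads it back as a single field. The column names id and address are assumptions, and the address is written on one line.

```python
import csv
import io

rows = [
    ("id", "address"),
    (1, "101 High Street, Sometown, Some county, PO5TC053"),
]

# Quote every non-numeric value so embedded commas stay inside one field.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(rows)
csv_text = buffer.getvalue()

# Reading it back: the quoted address comes out as one field, commas included.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
print(csv_text)
```

Any loader that honours double-quoted fields should treat the address as one column in the same way.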
After looking at some options with SQLDeveloper, and trying to export/import manually, I found a utility in SQL Server Management Studio that gets the desired results and is easy to use. Do the following:
Go to the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in Oracle is something you should not do. It causes all sorts of problems.
As Robert has said, I'd strongly advise against quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQLServer really have a comma in the column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better as the need to "escape" quotes becomes another issue). If you're exporting in CSV that should be an option .. but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.

storing long text

What is the best way to store long texts (articles) in a database? It doesn't need to be searchable.
I want to allow people to read the first chapter of every book in my bookstore. Dumping it into a database field makes it difficult to style paragraphs using CSS.
EDIT: access database
If it is SQL Server 2005, use VARCHAR(MAX)
EDIT:
It seems he said Access,
so I would go with Memo
Up to 63,999 characters. (If the Memo field is manipulated through DAO and only text and numbers [not binary data] will be stored in it, then the size of the Memo field is limited by the size of the database.)
or OLE Object (if you can)
An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked (OLE/DDE link: A connection between an OLE object and its OLE server, or between a Dynamic Data Exchange (DDE) source document and a destination document.) to or embedded (embed: To insert a copy of an OLE object from another application. The source of the object, called the OLE server, can be any application that supports object linking and embedding. Changes to an embedded object are not reflected in the original object.) in a Microsoft Access table.
Up to 1 gigabyte (limited by available disk space)
you have several options:
store it as a long single string with no formatting, which will look bland on the screen.
store it as a long single string with embedded html and css, which will be a bad choice if you ever want to make your site have a different look/feel.
normalize it so you have tables to store books, chapters, paragraphs, etc. you could then format and style the text as you load it into the application.
The main difference between long text (CLOB / TEXT / VARCHAR(MAX)) and long data (BLOB / IMAGE / VARBINARY(MAX)) is that the former is subject to character set conversions while the latter is not.
If you need to make character set conversion on the database side, use CLOB and similar.
If you always want to retrieve your data as you stored it, byte-to-byte (as opposed to character-to-character), use BLOB and similar.
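The character-versus-byte distinction can be seen without any database at all. The sketch below is plain Python, not database code: it shows that converting text between character sets keeps the characters but changes the bytes, which is the behaviour to expect from CLOB-like types, whereas BLOB-like types hand back the original bytes untouched. The sample string and the two encodings are arbitrary choices.

```python
text = "café"

# What was originally stored, as bytes in one character set.
stored_utf8 = text.encode("utf-8")

# Character semantics (CLOB-like): a conversion to another character set
# preserves the characters but produces different bytes.
as_latin1 = stored_utf8.decode("utf-8").encode("latin-1")
same_characters = as_latin1.decode("latin-1") == text
same_bytes = as_latin1 == stored_utf8

# Byte semantics (BLOB-like): no conversion happens, so the bytes are identical.
blob_copy = bytes(stored_utf8)

print(same_characters, same_bytes, blob_copy == stored_utf8)
```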
I don't know which database you're using, but if the text doesn't need to be searchable, then you can simply store the HTML formatted text (for instance, the value coming from an FCKEditor or components like this). If you also need searchability, then you can store both HTML and plain text in two separate fields.
Fields can be nvarchar(MAX) if you use MS SQL Server 2008 or any equivalent datatype on other databases.
EDIT:
Seems you're using Access, so go for Memo data type!
If you decide to store HTML, consider storing only generic markup (div, p) to divide your text, then later apply CSS formatting by wrapping the stored text within another div that specifies formatting classes for its child elements.
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
Use a CLOB.
For SQL Server
TEXT / NTEXT for SQL Server 2000
VARCHAR(MAX) / NVARCHAR(MAX) for SQL Server 2005 onwards
I would propose storing the first chapter as pdf file. This is secure and allows for good formatting. Then use a blob, clob, varchar, or text field depending on your product (see the other answers).
Or you could use images and look into something like Amazon's "look inside". It would work with the same db techniques.
Alternatively you could use something like markup.
I personally do not like to put html in my database. Even if it is only for output. Too easy to put in some javascript. But maybe I'm just too cautious.
The following applies to Jet 4.0 only, being the version of the Access Database Engine in the era Access2000 to Access2003 inclusive:
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
If you are using ACE, being the version of the Access Database Engine in the Access2007 era, the Attachment data type would be an option, however I don't really know how it works, I've never used it so I can't recommend it nor say whether it's better or worse for this purpose. I'm also wary of new data types in the first release of a major version of the Access Database Engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4 and don't want to commit to something that may never work properly. The Attachment type in the ACCDB format was introduced for Sharepoint compatibility, and that outside dependency is something that gives me pause. Will the ACCDB data type change someday if Sharepoint changes the way it works? I'm not sure I'd want to take that risk.
Put it in a TEXT field, and store it with its <p> tags so you'll be able to style paragraphs.
As it doesn't need to be searchable, it won't impact your sql performance.
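As a sketch of the generic-markup idea that several answers mention, the function below turns plain chapter text into <p> elements inside a wrapping div, so that all styling can come from CSS. It assumes paragraphs are separated by blank lines, and the class name chapter is a made-up default. It escapes the text first, which also addresses the worry about stray HTML or JavaScript ending up in the output.

```python
import html


def text_to_paragraphs(text, css_class="chapter"):
    """Wrap blank-line-separated paragraphs in <p> tags inside one styled div."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return f'<div class="{html.escape(css_class, quote=True)}">\n{body}\n</div>'


print(text_to_paragraphs("It was a dark night.\n\nThe <b>end</b>."))
```

The result can be stored as is in a Memo or TEXT field, or generated at display time from plain text kept in the field.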

Resources