I have a requirement where i need to store a very large lookup(10 million records) file in database
File will look like this
{
"users":["user1","user2","user3","user4"]
}
and this file will also be updated time to time by the admin what should be the database i should choose here
Related
I have File csv that contains large data,every time the user upload new file the old data will be updated or deleted it depends on the file and save the new data.
I am using Spring bash for this task.
I am creating a job that contains two steps :
first steps : A tasklet for updating the old data
second steps : steps that contains a reader,procssor and writer with chunk data to persist the new data
the problèm is in the time of save and update is very lard 12min for file that contains 80000 row.
can I optimize the time for this job ?
Import (export) big data process, update, delete, update using joining to tables, searching process, all these operations very very faster on Databases than on programming languages. I recommended to you these:
Use SQL Server BULK INSERT command to import data from CSV into Database. For example, for 10 million records this process will be executed in 12 seconds.
After importing data you can update, delete or insert new data on the database using joining to import table.
This is the best way, I think that.
I have a UI that users can search IDs of students. The current database contains student data for the last 2 years, and data before that has been "FULL" backed up in some files in which saved in some name format that contains date, like backup_db_2017_01_to_2018_01.
Currently when the user searches for an old student ID:
I search the current database and if there is no data, it automatically restores the last backup and merges data with the current database. If the id is not in the last backup, it restored another backup, and so on...
In this way, too much data merged with current data and it takes too long time. In the worst scenario, the student id is in the oldest backup.
I wonder what is the best way to do that?
I assume that you have the space to RESTORE and MEREGE all of the old backups?
You could consider merging all of the old data onto a READ-ONLY FILEGROUP so that it is always available but not able to be updated.
my application play videos files after that user they are registered .(files are larger than 100 MB ) .
Is it better to do I store them on the hard drive and Keep file path in database ?
Or
do I store in database as File Stream Type ?
When data is stored in the database, are more secure against manipulation vs with stored in hard ?
How to provide data security against manipulation ?
Thanks .
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee foto in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee foto, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it LARGE_DATA.
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
1 - depends on how you define "better". In general, I prefer to store binary assets in the database so they are backed up alongside the associated data, but cache them on the file system. Streaming the binary data out of SQL Server for a page request is a real performance hog, and it doesn't really scale.
If an attacker can get to your hard drive, your entire system is compromised - storing things in the database will offer no significant additional security.
3 - that's a whole question in its own right. Too wide for Stack Overflow...
If an application requires images (ie. JPGs, PNGs etc) to be referenced in a database-driven application, should these images just be stored in a file system with their path referenced in a database, or should the images actually be stored in the database as BLOBS?
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee foto in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee foto, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
Currently, I would like provide this as an option to the user when storing data to the database.
Save the data to a file and use a background thread to read data from the textfile to SQL server.
Flow of my program:
- A stream of data coming from a server constantly (100 per second).
- want to store the data in a textfile and use background thread to copy data from the textfile back to the SQL database constantly as another user option.
Has this been done before?
Cheers.
Your question is indeed a bit confusing.
I'm guessing you mean that:
100 rows per second come from a certain source or server (eg. log entries)
One option for the user is textfile caching: the rows are stored in a textfile and periodically an incremental copy of the contents of the textfile into (an) SQL Server table(s) is performed.
Another option for the user is direct insert: the data is stored directly in the database as it comes in, with no textfile in between.
Am I right?
If yes, then you should do something in the lines of:
Create a trigger on an INSERT action to the table
In that trigger, check which user is inserting. If the user has textfile caching disabled, then the insert can go on. Otherwise, the data is redirected to a textfile (or a caching table)
Create a stored procedure that checks the caching table or text file for new data, copies the new data into the real table, and deletes the cached data.
Create an SQL Server Agent job that runs above stored procedure every minute, hour, day...
Since the interface from T-SQL to textfiles is not very flexible, I would recommend using a caching table instead. Why a textfile?
And for that matter, why cache the data before inserting it into the table? Perhaps we can suggest a better solution, if you explain the context of your question.