Populate SQL database from textfile on a background thread constantly - sql-server

Currently, I would like to provide this as an option to the user when storing data to the database:
Save the data to a file and use a background thread to read the data from the text file into SQL Server.
Flow of my program:
- A stream of data arrives from a server constantly (100 per second).
- I want to store the data in a text file and, as another user option, have a background thread constantly copy the data from the text file into the SQL database.
Has this been done before?
Cheers.

Your question is indeed a bit confusing.
I'm guessing you mean that:
100 rows per second come from a certain source or server (e.g. log entries).
One option for the user is text file caching: the rows are stored in a text file, and periodically the new contents of the text file are copied incrementally into one or more SQL Server tables.
Another option for the user is direct insert: the data is stored directly in the database as it comes in, with no text file in between.
Am I right?
If yes, then you should do something along the lines of:
Create a trigger on the INSERT action for the table (it would need to be an INSTEAD OF trigger so that the rows can be redirected).
In that trigger, check which user is inserting. If the user has text file caching disabled, the insert can go ahead. Otherwise, the data is redirected to a text file (or a caching table).
Create a stored procedure that checks the caching table or text file for new data, copies the new data into the real table, and deletes the cached data.
Create a SQL Server Agent job that runs the above stored procedure on a schedule (every minute, hour, day...).
Since the interface from T-SQL to text files is not very flexible, I would recommend using a caching table instead. Why a text file?
And for that matter, why cache the data before inserting it into the table? Perhaps we can suggest a better solution if you explain the context of your question.
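For what it's worth, the file-plus-background-thread flow from the question can be sketched roughly as follows. This is only an illustration in Python, with SQLite standing in for SQL Server; the file name, the `readings` table, and the function names are all made up for the example. The worker remembers a byte offset so that each pass copies only complete lines that were appended since the previous pass.

```python
import os
import sqlite3
import tempfile
import threading
import time


def drain_file_to_db(path, db_path, stop_event, poll_seconds=0.05):
    """Background worker: copy newly appended, complete lines into the database."""
    conn = sqlite3.connect(db_path)  # a connection owned by this thread only
    conn.execute("CREATE TABLE IF NOT EXISTS readings (payload TEXT)")
    offset = 0  # byte position of the first line that has not been copied yet
    while True:
        # Sample the stop flag before reading, so the final pass still sees
        # everything that was written before the flag was set.
        stopping = stop_event.is_set()
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        # Consume only up to the last newline; a partial trailing line is
        # left for the next pass.
        end = chunk.rfind(b"\n") + 1
        if end:
            rows = [(line.decode("utf-8"),) for line in chunk[:end].splitlines()]
            with conn:  # one transaction per batch
                conn.executemany("INSERT INTO readings (payload) VALUES (?)", rows)
            offset += end
        if stopping:
            break
        time.sleep(poll_seconds)
    conn.close()


def demo():
    """Write 100 lines to a file while the worker drains it; return the row count."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "incoming.txt")  # hypothetical file name
        db_path = os.path.join(workdir, "target.db")  # SQLite stand-in for SQL Server
        open(path, "w").close()  # the file must exist before the worker starts
        stop = threading.Event()
        worker = threading.Thread(target=drain_file_to_db, args=(path, db_path, stop))
        worker.start()
        with open(path, "a", encoding="utf-8") as f:
            for i in range(100):
                f.write(f"reading-{i}\n")
                f.flush()
        stop.set()
        worker.join()
        conn = sqlite3.connect(db_path)
        count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
        conn.close()
        return count
```

Batching the inserts per pass keeps the number of transactions low, which is usually what matters at around 100 rows per second. The sketch never truncates the file, so a real version would also need some rotation or cleanup policy.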

Related

What is the best way to update multiple records in a database (800000 rows) and persist the new data from a CSV file using Spring Batch?

I have a CSV file that contains a large amount of data. Every time the user uploads a new file, the old data is updated or deleted, depending on the file, and the new data is saved.
I am using Spring Batch for this task.
I am creating a job that contains two steps:
First step: a tasklet for updating the old data.
Second step: a chunk-oriented step with a reader, processor and writer to persist the new data.
The problem is that saving and updating take very long: 12 minutes for a file that contains 80000 rows.
Can I reduce the run time of this job?
Bulk operations such as import and export, update, delete, join-based updates and searches run much faster inside the database than in application code. I recommend the following:
Use the SQL Server BULK INSERT command to import the data from the CSV file into the database. For example, for 10 million records this import can complete in about 12 seconds, though the exact time depends on the hardware and the table design.
After importing the data, you can update, delete or insert rows in the target tables by joining them to the import table.
I think this is the best approach.
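To illustrate the "import into a staging table, then update by joining" idea, here is a minimal sketch. It uses Python with SQLite as a stand-in, so the bulk load is a plain `executemany` rather than BULK INSERT, and the `customers` and `staging` tables and their columns are hypothetical. The point is that the update and the insert are each a single set-based statement instead of one statement per row.

```python
import csv
import io
import sqlite3

# Hypothetical uploaded file: id 1 already exists and changes, id 3 is new.
CSV_TEXT = "id,name\n1,alice-updated\n3,carol\n"


def load_and_merge(conn, csv_text):
    """Load the CSV into a staging table, then apply it with set-based SQL."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with conn:  # one transaction for the whole load
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging (id, name) VALUES (:id, :name)", rows)
        # Update rows that already exist in the target table.
        conn.execute(
            """
            UPDATE customers
            SET name = (SELECT s.name FROM staging s WHERE s.id = customers.id)
            WHERE id IN (SELECT id FROM staging)
            """
        )
        # Insert rows that are only present in the staging table.
        conn.execute(
            """
            INSERT INTO customers (id, name)
            SELECT s.id, s.name FROM staging s
            WHERE s.id NOT IN (SELECT id FROM customers)
            """
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    load_and_merge(conn, CSV_TEXT)
    result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    conn.close()
    return result
```

On SQL Server the two statements could probably be written as an UPDATE ... FROM join and an INSERT ... SELECT, or combined into a MERGE; that part is not shown here.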

Read single text file and based on a particular value of a column load that record into its respective table

I have been searching the internet for a solution to my problem but cannot seem to find any information. I have a large single text file (10 million rows), and I need to create an SSIS package to load these records into different tables based on the transaction group assigned to each record. That is, Tx_grp1 would go into the Tx_Grp1 table, Tx_Grp2 would go into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are written to the file in the order in which they occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This task can be solved with SSIS; it just takes some experience. Here are the main steps, with some discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that fields may have different data types depending on the tx_group value. If this is the case, I would declare all fields as sufficiently long strings and convert their types later in the data flow.
Create an OLE DB connection manager for the database you will use to store the results.
Create a main data flow where you will process the file, and add a Flat File Source.
Add a Conditional Split to the output of the Flat File Source, and define as many conditions and outputs there as you have transaction groups.
For each transaction group output, add a Data Conversion for the fields if necessary. Note: you cannot change the data type of an existing column; if you need to cast a string to int, create a new column.
For each destination table, add an OLE DB Destination. Connect it to the proper transaction group output and map the fields.
Basically, you are done. Test the package thoroughly on a test DB before using it on a production DB.
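Outside of SSIS, the routing that the Conditional Split performs can be sketched in a few lines. This is only an illustration of the logic, written in Python, and not an SSIS artifact: the second pipe-delimited field selects the output, and the remaining fields are kept as strings to be converted later. The sample dates and field values are invented.

```python
from collections import defaultdict


def split_by_group(lines):
    """Route each pipe-delimited record to a bucket named after its transaction group."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # First field is the date, second is the group, the rest vary per group.
        date, tx_group, *fields = line.split("|")
        groups[tx_group].append((date, *fields))
    return groups


# Made-up records following the layout shown in the question.
SAMPLE = [
    "2024-01-01|tx_grp1|a|b|c",
    "2024-01-01|tx_grp2|a|b|c|d",
    "2024-01-02|tx_grp10|a|b",
    "2024-01-03|tx_grp1|x|y|z",
]

routed = split_by_group(SAMPLE)
```

Each bucket in `routed` corresponds to one Conditional Split output, and each would then be written to its own table with the column count for that group.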

How to transfer only new records between two different databases (ie. Oracle and MSSQL) using SSIS?

Do you know how to transfer only new records between two different databases (i.e. Oracle and MSSQL) using SSIS? There is no problem transferring only new data between two tables in the same database and server, but is it possible to do this between completely different servers and databases?
P.S. I know about the solution using Lookup, but it is not very efficient if you need to check and add a lot of records (50k and more) several times per day. I would like to operate on new data only.
You have several options:
Timestamp based solution
If you have a column which stores the insertion time in the source system, you can select only the records created since the last load. With the same logic, you can transfer modified records too: just mark each record with a timestamp value whenever it changes.
Sequence based solution
If there is a sequence in the source table, you can load the new records based on that sequence. Query the last value from the destination system, then load everything which is larger than that value.
CDC based solution
If you have CDC (Change Data Capture) in your source system, you can track the changes and you can load them based on the CDC entries.
Full load
This is the most resource hungry solution: you have to copy all data from the source to the destination. If you do not have any column which marks the new records, you should use this solution.
You have several options to achieve this:
TRUNCATE the destination table and reload it from source
Use a Lookup component to determine which records are missing
Load all data from source to a temporary table and write a query which retrieves the new/changed records.
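As a rough sketch of the last variant (load everything into a temporary table, then query for the differences), the comparison itself can be a single EXCEPT query. The example below uses Python with SQLite purely as a stand-in; the `staging` and `target` tables and the `Id` and `Val` columns are hypothetical.

```python
import sqlite3


def find_new_or_changed(conn):
    """Return staging rows that are missing from, or differ from, the target table."""
    return conn.execute(
        """
        SELECT Id, Val FROM staging
        EXCEPT
        SELECT Id, Val FROM target
        ORDER BY Id
        """
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (Id INTEGER PRIMARY KEY, Val TEXT)")
    conn.execute("CREATE TABLE staging (Id INTEGER PRIMARY KEY, Val TEXT)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])
    # Full copy of the source: row 2 has changed, row 3 is new.
    conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "a"), (2, "B"), (3, "c")])
    result = find_new_or_changed(conn)
    conn.close()
    return result
```

This still transfers every source row into the staging table first, so it saves work on the destination side only.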
Summary
If you have at least one column, which marks the new/modified records, you can use it to implement a differential/incremental load with SSIS. If you do not have any clue, which columns/rows are changed, you have to load (or at least query) all of them.
There is no approach that enables a one-query (INSERT .. SELECT) solution across multiple servers without transferring all the data. (Please note that a multi-server query using Linked Servers still transfers the data from the source system.)
What about variables? Is it possible to use the same variable between different databases and servers in SSIS?
I would like to read the last id from a destination table and pass it to the query against the source table (different server!).
I can set a variable in a database scope like this:
DECLARE @Last int
SET @Last = (SELECT TOP 1 Id FROM dbo.Table_1 ORDER BY Id DESC)
SELECT *
FROM dbo.Table_2
WHERE ID > @Last;
However, this works only between two tables in the same database (as a single SQL command). I can create a variable for an entire SSIS package in Variables --> Add variable, but I don't know whether it is possible to use that variable in a similar way as above: to hold the last id from the destination table and pass it to the query against a table on the source server as a lower bound.
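The idea of carrying the last id from one connection to another can be sketched outside SSIS as follows. This is a minimal illustration in Python with two separate in-memory SQLite databases standing in for the destination and source servers; the `Val` column is hypothetical. In SSIS this would presumably map to an Execute SQL Task that stores the single-row result in a package variable, which is then used as a parameter of the source query.

```python
import sqlite3


def copy_new_rows(source, dest):
    """Copy rows whose Id is greater than the highest Id already in the destination."""
    # Step 1: ask the destination for its last Id (0 if the table is empty).
    last_id = dest.execute("SELECT COALESCE(MAX(Id), 0) FROM Table_1").fetchone()[0]
    # Step 2: pass that value as a parameter to the query on the other connection.
    new_rows = source.execute(
        "SELECT Id, Val FROM Table_2 WHERE Id > ? ORDER BY Id", (last_id,)
    ).fetchall()
    # Step 3: insert only the new rows into the destination.
    with dest:
        dest.executemany("INSERT INTO Table_1 (Id, Val) VALUES (?, ?)", new_rows)
    return len(new_rows)


def demo():
    source = sqlite3.connect(":memory:")  # stand-in for the source server
    dest = sqlite3.connect(":memory:")    # stand-in for the destination server
    source.execute("CREATE TABLE Table_2 (Id INTEGER PRIMARY KEY, Val TEXT)")
    dest.execute("CREATE TABLE Table_1 (Id INTEGER PRIMARY KEY, Val TEXT)")
    source.executemany("INSERT INTO Table_2 VALUES (?, ?)", [(i, f"v{i}") for i in range(1, 6)])
    dest.executemany("INSERT INTO Table_1 VALUES (?, ?)", [(i, f"v{i}") for i in range(1, 4)])
    first = copy_new_rows(source, dest)   # rows 4 and 5 are new
    second = copy_new_rows(source, dest)  # nothing new on the second run
    total = dest.execute("SELECT COUNT(*) FROM Table_1").fetchone()[0]
    source.close()
    dest.close()
    return first, second, total
```

The value lives in the program (or package) rather than in either database, which is why it can cross servers.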

What is a better alternative to Excel for loading data to a SQL Server database?

I have a huge amount of trouble loading spreadsheets into a SQL Server database.
Currently, I'm using an SSIS package to load the data and I have had to make lots of adjustments to get the data to load:
All numbers must be formatted as text (otherwise they don't load properly).
Sometimes numbers must be preceded with single quote (') to get them to load.
If a column has a mix of number cells and text cells, the text cells must come first in the file (otherwise only numbers load and text comes in as NULL).
If a user changes a column name the file will not load.
If a user changes a tab name the file won't load.
If a user adds a new column (even at the end of a sheet) the file won't load.
Extra sheets in the file are not a problem, thankfully!
Dates are temperamental: it is hard to predict whether they will load properly.
Connection strings to the Excel file must include "IMEX=1" or things are worse.
Scheduled SSIS jobs must be run as 32-bit even on 64-bit system.
I've been loading the data (usually 200,000-500,000 rows per file) into a table with all fields defined as nvarchar. Then, when loaded I transfer that data in the next step of the SSIS package to the working table with typed data fields.
All of the requirements that I must put on the user for how to format the Excel file are really a pain. We usually have to send the file back multiple times until all the formatting issues are corrected before the file will load. I'd like to eliminate this thrash.
I know I'm not the only one that is facing this type of problem. So, I must ask...
What is a better alternative to Excel for loading data into a SQL Server database?
Or, am I going about this the wrong way? Should I be using something other than SSIS to load Excel spreadsheets?
You can try OpenRowSet:
SELECT *
INTO SomeTable
From OpenRowSet('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=\\servername\c$\filename.xls;HDR=YES;IMEX=1', [Sheet2$])
Not really a SQL answer, but an easy one:
You could require the users to copy and paste their data into an Excel spreadsheet template in which everything except the data fields is locked. This will prevent many of the pain points described.

Is there a unique signature value for the state of an sqlite3 DB (or table)?

I was looking for some single value that tracked the writes to either the DB or an individual table in the DB.
I would like to say "This data was extracted at this time, from this DB, in this state"
I am not bothered about future updates recreating the data of the table, just information equivalent to a simple count of the number of writes would do.
This would allow me to record the same info when I did other extracts from the DB and so could check to ensure consistency.
Thanks in advance :-)
Use the database file's modified date and time from the file system.
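A small sketch of recording such a signature is shown below. Besides the file's modification time, it also reads the "file change counter" from the database header, a 4-byte big-endian integer at offset 24 that the SQLite file format documentation describes as being incremented when the file is modified. Treat this as an assumption-laden illustration: the function and file names are made up, the counter is per database rather than per table, and in WAL mode it might not advance on every transaction.

```python
import os
import sqlite3
import struct
import tempfile


def db_signature(path):
    """Return file-level values that change when the database file is written."""
    with open(path, "rb") as f:
        header = f.read(100)  # the SQLite database header is 100 bytes long
    # File change counter: 4-byte big-endian integer at header offset 24.
    (change_counter,) = struct.unpack(">I", header[24:28])
    return {
        "mtime": os.path.getmtime(path),
        "size": os.path.getsize(path),
        "change_counter": change_counter,
    }


def demo():
    """Take a signature before and after one committed write."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.db")  # hypothetical database file
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        conn.close()
        before = db_signature(path)

        conn = sqlite3.connect(path)
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        conn.close()
        after = db_signature(path)
        return before, after
```

Storing the signature alongside each extract lets two extracts be compared later: if the values match, the file was most likely not written in between.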
