I'm working on an application that requires a lot of data. This data is stored in SAP (a large enterprise resource planning system) and needs to be loaded into an Oracle database. The data I'm talking about is 15,000+ rows long, and each row has 21 columns.
Every time an interaction is made with SAP (4 times a day), those 15,000 rows are exported and have to be loaded into the Oracle database. I'll try to explain what I do now to achieve my goal:
Export data from SAP to a CSV file
Remove all rows in the Oracle database
Load the exported CSV file and import this into the Oracle database
The point is that the Oracle database has to reflect any row that changed in SAP, so right now the whole table is replaced on every run. This process takes about 1 minute.
Now I'm wondering if it would be faster to compare each row of the CSV file against the Oracle database and update only the rows that changed. I'm asking before trying it because it would require a lot of coding. Maybe someone has done something similar before and can point me toward the best solution.
All the comments helped me reduce the time: first truncate, then insert all rows using the Oracle DataAccess library (ODP.NET) instead of OleDb.
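The truncate-and-reload approach can be sketched in a few lines. This is a minimal illustration, not the poster's actual ODP.NET code: it assumes the python-oracledb driver, a reachable database, and hypothetical table/file names. The key idea is the same as ODP.NET array binding: one batched `executemany` instead of 15,000 single-row inserts.

```python
import csv

def load_csv_rows(path):
    """Read the SAP export into a list of tuples, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(row) for row in csv.reader(f)]

def build_insert_sql(table, column_count):
    """Build a positional-bind INSERT so the driver can batch rows."""
    placeholders = ", ".join(f":{i}" for i in range(1, column_count + 1))
    return f"INSERT INTO {table} VALUES ({placeholders})"

def reload_table(conn, table, rows):
    """Truncate, then insert every row in one batched call.

    `conn` is assumed to be an open python-oracledb connection.
    """
    with conn.cursor() as cur:
        cur.execute(f"TRUNCATE TABLE {table}")
        cur.executemany(build_insert_sql(table, len(rows[0])), rows)
    conn.commit()
```

The batching is what matters: `executemany` sends the rows as one array-bound round trip, which is the same reason switching from OleDb to the Oracle DataAccess library sped things up.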
Related
We are building a DWH and the initial load would be millions of rows (a few tables have around 300 million rows). The data will later be updated every 10 minutes using an SSIS package, which will move a few thousand rows at a time. The migration is from Oracle to SQL Server.
Can you suggest an efficient way of extracting the data initially? Is using the SQL Server Import and Export wizard a good and faster option than SSIS for the initial load?
Thanks
First: the SQL Server Import and Export wizard creates an SSIS package under the covers.
I recently had to solve the same problem - our Oracle-to-SQL Server replication infrastructure cratered and we had to rebuild it, which involved initial table loads of the same size that you describe. We used SSIS packages for all of them, and the performance was sufficient to complete the task in the window we had available.
Another option to consider would be exporting the Oracle data to a flat file and importing it with BCP, if the Oracle data are clean enough. If you go that route, though, I'm afraid that others will need to assist - I can barely spell "BCP".
I just extracted and loaded 24.5 million rows in 9 minutes from an Oracle DB to SQL Server, which I found super awesome!!!
Solution: use the Attunity connector for Oracle and change the batch size to whatever suits you (1000/5000/10000). 1000 worked for me (the default is 100).
The National Weather Service's Climate Prediction Center maintains recent weather data from about 1,400 weather stations across the United States. The data for the previous day can always be found at the following address:
http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt
In an ambitious attempt to store weather data for future reference, I want to store this data row by row using SQL Server 2012. Five years ago a similar question was asked, and one answer mentioned the BULK INSERT command. I do not have access to this option.
Is there an option which allows for direct import of a web hosted text file which does not use the BULK statement? I do not want to save the file as I plan on automating this process and having it run daily direct to the server.
Update: I have found another option in Ad Hoc Distributed Queries. This option is also unavailable to me based on the nature of the databases in question.
Why do you NOT have access to BULK INSERT? I can't think of a reason it would be disabled in your version of SQL Server.
I can think of a couple ways of doing the work.
#1) Record a macro, using Excel, to do everything from the data import to the parsing of the data sets, and then save the result as a CSV file. I just did it; very easy. Then use BULK INSERT to get the data from the CSV into SQL Server.
#2) Record a macro, using Excel, to do everything from the data import to the parsing of the data sets. Then use a VBA script to send the data to SQL Server. You will find several ideas at the link below.
http://www.excel-sql-server.com/excel-sql-server-import-export-using-vba.htm#Excel%20Data%20Export%20to%20SQL%20Server%20using%20ADO
#3) You could actually use Python or R to get the data from the web. Both have excellent HTML parsing packages. Then, as mentioned in point #1 above, save the data as a CSV (using Python or R) and BULK INSERT into SQL Server.
R is probably a bit off topic here, but still a viable option. I just did it to test my idea and everything is done in just two lines of code!! How efficient is that!!
X <- read.csv(url("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt"))
write.csv(X, file = "C:\\Users\\rshuell001\\Desktop\\foo.csv")
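The Python variant of option #3 might look like the sketch below. It assumes the report's data lines are whitespace-delimited with a fixed number of fields per line (the field-count filter is a heuristic for skipping titles and headers; station names containing spaces would need fixed-width parsing instead). The function and variable names are made up for illustration.

```python
import csv
import urllib.request

URL = ("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/"
       "cdus/prcp_temp_tables/dly_glob1.txt")

def fetch_report(url=URL):
    """Download the daily report as plain text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("ascii", errors="replace")

def report_to_csv(text, out_path, expected_fields):
    """Keep only whitespace-delimited lines with the expected number
    of fields; this skips titles, headers, and blank lines."""
    rows = [line.split() for line in text.splitlines()]
    rows = [r for r in rows if len(r) == expected_fields]
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

From there, the resulting CSV can be loaded with BULK INSERT as in option #1, or sent row by row through an ODBC connection if BULK INSERT really is unavailable.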
I'm copying about 200 tables from Oracle to SQL Server using SSIS. Right now, the basic package template follows this logic:
Get time
Truncate table
Load data and get row count
Record table name, row count, and time to log table.
Currently, I copy and paste the package and change the data flow. Is there a better way to do this? I know SSIS is metadata driven, but doing 200 tables like this is a little ridiculous, and if my boss wants me to change something in the template, I get to do it all over again. Is there a way to loop through tables? I would just use linked servers in SQL Server, but since we have SQL Server Enterprise I'm able to use the Attunity connectors, and they are much faster.
Any help would be appreciated. It seems like there must be a better way but I'm not familiar enough with SSIS to really know what to ask for.
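One way to cut down the copy-and-paste is to generate the template's bookkeeping SQL from a table list instead of editing 200 packages by hand. A rough sketch (table and log-table names here are hypothetical, and the data flow itself still has to move the rows between the two statements):

```python
def table_template_sql(table, log_table="dbo.LoadLog"):
    """Generate the bookkeeping statements of the package template
    for one table: truncate before the load, log a row count after.
    The data flow (Attunity source -> destination) runs in between.
    """
    truncate = f"TRUNCATE TABLE dbo.{table};"
    log = (f"INSERT INTO {log_table} (table_name, row_count, loaded_at) "
           f"SELECT '{table}', COUNT(*), GETDATE() FROM dbo.{table};")
    return truncate, log

def all_templates(tables):
    """One (truncate, log) pair per table, keyed by table name."""
    return {t: table_template_sql(t) for t in tables}
```

A foreach loop over such a list handles the Execute SQL steps; the harder part, swapping the data-flow metadata per table, is what tools like Biml or the EzAPI/package-generation approach address.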
I have 4k records in an Access database. One of the fields contains ~100 lines of text per record, and another field has ~25 lines, so the total database size reaches ~30 MB, and it takes 15-20 seconds to load the database in VB.NET using ODBC (http://www.homeandlearn.co.uk/net/nets12p5.html).
Updating any of the other small fields also takes time because the database is so large.
As an alternative I used RTF files (plain text files were not preserving all the newline characters). These files are only around 5-10 KB each, but for 4k records and 2 fields I now have 8k files, and copying those 8k RTF files takes a huge amount of time: a 5 MB transfer takes an hour or so.
Is there any other alternative for storing this data, so that it is portable and easily loaded/accessed/updated from VB.NET?
MDB Databases
MDB is the Access database file type. Access databases were never designed to be used as backends for web systems; they are mainly for light office use.
Improving performance
For a temporary improvement in performance, you can compact and repair the database: open it up and find the option in the Tools menu. Alternatively, you can do this programmatically. This should be done reasonably frequently, depending on the number of changes made to your database. (See: What does compacting and repairing a database do?)
Also, slowness is often a sign of inefficient design. Consider reading up on database normalisation if your database is not fully normalised. This should significantly improve performance and is an essential standard that should be learned.
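As an illustration of what normalisation buys you here (using SQLite as a stand-in for any backend; the table and column names are made up): moving the large multi-line text into its own table keeps the main table narrow, so loading and updating the small fields no longer drags ~30 MB of text along.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Small, frequently updated fields stay in the main table.
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")

# Large multi-line text lives in a separate table, one row per
# text field, joined back only when it is actually needed.
cur.execute("""CREATE TABLE record_texts (
                 record_id INTEGER REFERENCES records(id),
                 field_name TEXT,
                 body TEXT)""")

cur.execute("INSERT INTO records VALUES (1, 'first', 'open')")
cur.execute("INSERT INTO record_texts VALUES (?, ?, ?)",
            (1, "notes", "line 1\nline 2"))

# Updating a small field now touches only the narrow table.
cur.execute("UPDATE records SET status = 'closed' WHERE id = 1")
conn.commit()
```

Note that the text column also preserves embedded newlines, which removes the original reason for the one-RTF-file-per-field workaround.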
Alternatives
For 4k+ records you should probably be using a decent database system designed specifically for larger amounts of data.
SQL Server is an excellent database system from Microsoft. MySQL is also a great open-source alternative. The Internet is full of tutorials on how to connect to these databases.
I sometimes use Access databases in .NET too. OK, MS Access isn't the best database for this kind of application, I know. But the ease of writing complex queries and the functional, well-known reports make Access a good cost-benefit solution.
I saw the link you indicated. That was my first technique too, but then I realized there was another way, easier and faster. I suggest you link to the Access database differently:
Create a DataSet, if you haven't already.
Create a connection to the MS Access database using the Database Explorer.
Drag and drop the desired tables onto the DataSet (.NET will generate the designer code for you behind the scenes).
In code, create a TableAdapter object and a table object.
Suppose your DataSet is named DS1 and one of its tables is named table01.
' VB.NET -- check IntelliSense autocomplete for your data objects.

' Create a TableAdapter object and a table object (both designed when
' you dropped the Database Explorer objects onto the DataSet):
Dim table01_TA As New DS1TableAdapters.table01TableAdapter
Dim table01 As New DS1.table01DataTable

' Load the database data into the in-memory table:
table01 = table01_TA.GetData()

' Do your operations using table01 (add, update, insert, delete,
' queries). For automatic generation of update, insert, and delete
' commands, make sure your table has primary keys and correct
' relationships.

' Finally, update through the table adapter. Unless you do this,
' the data will not be written back to the database:
table01_TA.Update(table01)
I suggest you use LINQ to query your data, and the DataTable methods for adding and editing data. These methods are created automatically when you drop the Database Explorer tables onto the DataSet and save it. It's also worth compacting and repairing the Access database frequently.
Contact me if you have trouble.
I agree with Tom's recommendation. Get yourself a decent database server. However, judging by your description of your performance issues it seems like you have other serious problems which are probably going to be difficult to resolve here.
I am tasked with exporting the data contained inside a MaxDB database to SQL Server 200x. I was wondering if anyone has gone through this before and what your process was.
Here is my idea, but it's not automated.
1) Export data from MaxDB for each table as a CSV.
2) Clean the CSV to remove "?" (which MaxDB uses for nulls) and fix the date strings.
3) Use SSIS to import the data into tables in SQL Server.
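Step 2 is easy to script. A sketch in Python - note the assumptions: it treats "?" as a whole-field null marker only, and it guesses a DD.MM.YYYY export date format (check what your MaxDB export actually produces and adjust the regex accordingly):

```python
import csv
import re

# Assumption: the export uses "?" for NULL and DD.MM.YYYY dates.
DATE_RE = re.compile(r"^(\d{2})\.(\d{2})\.(\d{4})$")

def clean_field(value):
    """Map the null marker to an empty field and reformat dates
    to YYYY-MM-DD, which SQL Server parses unambiguously."""
    if value == "?":
        return ""
    m = DATE_RE.match(value)
    if m:
        return f"{m.group(3)}-{m.group(2)}-{m.group(1)}"
    return value

def clean_csv(src_path, dst_path):
    """Rewrite a CSV export with every field cleaned."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([clean_field(v) for v in row])
```

Going field by field through a real CSV parser (rather than a global search-and-replace on the file) avoids mangling legitimate "?" characters inside quoted text values.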
I was wondering if anyone has tried linking MaxDB to SQL Server or what other suggestions or ideas you have for automating this.
Thanks.
AboutDev.
I managed to find a solution to this. There is an open-source MaxDB library that will allow you to connect to it through .NET, much like the SQL Server provider. You can use that to get schema information and data, then write a little code to generate scripts to run in SQL Server to create the tables and insert the data.
MaxDb Data Provider for ADO.NET
If this is a one time thing, you don't have to have it all automated.
I'd pull the CSVs into SQL Server tables and keep them forever; they will help with any questions a year from now. You can prefix them all the same, "Conversion_" or whatever. There are no constraints or FKs on these tables. You might consider using varchar for every column (or just the ones that cause problems, or not at all if the data is clean), just to be sure there are no data type conversion issues.
Then pull the data from these conversion tables into the proper final tables. I'd use a single conversion stored procedure to do everything (but I like T-SQL). If the data isn't that large (millions and millions of rows or less), just loop through and build out all the tables, printing log info as necessary, or inserting into exception/bad-data tables as necessary.
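The all-varchar staging tables described above are easy to generate mechanically once you have the column lists from MaxDB. A sketch (the "Conversion_" prefix follows the suggestion above; the varchar width is an arbitrary choice):

```python
def staging_table_ddl(table, columns, prefix="Conversion_", width=4000):
    """Emit a CREATE TABLE with every column as nullable varchar,
    so the raw CSV loads without data-type conversion errors."""
    cols = ",\n  ".join(f"[{c}] varchar({width}) NULL" for c in columns)
    return f"CREATE TABLE dbo.{prefix}{table} (\n  {cols}\n);"
```

Running this over the schema information pulled through the MaxDB ADO.NET provider gives you all 200-odd CREATE statements in one pass, and the typed conversion then happens once, in the stored procedure, where errors can be logged to the exception tables.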