I am trying to have my ASP.NET Web API service read a .csv file and update a database using Entity Framework. The .csv file is about 20,000-30,000 rows.
As of now I am using a TextFieldParser to read the .csv; for each row of the .csv file I create a new object, then add the object to the EF context.
Once it's done adding all rows to the context, I call db.SaveChanges();
Watching the console I noticed it issues an insert statement for each row... which takes a long time. Is there a better, more efficient way to accomplish this?
if (filetype == "xxx")
{
    using (TextFieldParser csvReader = new TextFieldParser(downloadFolder + fileName))
    {
        csvReader.SetDelimiters(new string[] { "," });
        csvReader.HasFieldsEnclosedInQuotes = true;
        int rowCount = 1;
        while (!csvReader.EndOfData)
        {
            string[] fieldData = csvReader.ReadFields();
            // skip header row
            if (rowCount != 1)
            {
                var t = new GMI_adatpos
                {
                    PACCT = fieldData[3]
                };
                db.GMI_adatpos.Add(t);
            }
            rowCount++;
        }
    }
}
db.SaveChanges();
This issue is very common.
In your case, we can split it into two categories:
Add vs AddRange performance
Write & database round-trips
Add vs AddRange Performance
The Add method tries to detect changes every time you add a new record, while AddRange does it only once. Detecting changes on every Add can take several minutes for this many rows.
This issue is very easy to fix: simply create a list, add each entity to that list instead, and use AddRange with the list at the end.
List<GMI_adatpos> list = new List<GMI_adatpos>();
if (filetype == "xxx")
{
    using (TextFieldParser csvReader = new TextFieldParser(downloadFolder + fileName))
    {
        csvReader.SetDelimiters(new string[] { "," });
        csvReader.HasFieldsEnclosedInQuotes = true;
        int rowCount = 1;
        while (!csvReader.EndOfData)
        {
            string[] fieldData = csvReader.ReadFields();
            // skip header row
            if (rowCount != 1)
            {
                var t = new GMI_adatpos
                {
                    PACCT = fieldData[3]
                };
                list.Add(t);
            }
            rowCount++;
        }
    }
}
db.GMI_adatpos.AddRange(list);
db.SaveChanges();
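Alternatively, you can keep using Add and just turn off automatic change detection while you add. This is a minimal sketch, assuming EF6's DbContext API (Configuration.AutoDetectChangesEnabled and ChangeTracker.DetectChanges()); db and list are the context and entity list from the snippet above:

// Sketch: assumes EF6, where change detection is controlled via DbContext.Configuration.
bool previous = db.Configuration.AutoDetectChangesEnabled;
try
{
    db.Configuration.AutoDetectChangesEnabled = false; // skip DetectChanges on every Add
    foreach (var t in list)
    {
        db.GMI_adatpos.Add(t); // Add marks entities as Added, so pure inserts don't need change detection
    }
    db.ChangeTracker.DetectChanges(); // run change detection once, to be safe
    db.SaveChanges();
}
finally
{
    db.Configuration.AutoDetectChangesEnabled = previous; // restore the default behavior
}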
Write & Database round-trips
Every time you save a record, you perform a database round-trip. So if you insert an average of 30,000 records, you perform 30,000 database round-trips, which is insane!
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
You can either call BulkSaveChanges instead of SaveChanges, or create a list to insert and use BulkInsert directly for even more performance.
BulkSaveChanges Solution (way faster than SaveChanges)
db.GMI_adatpos.AddRange(list);
db.BulkSaveChanges();
BulkInsert Solution (faster than BulkSaveChanges, but does not save related entities)
db.BulkInsert(list);
Because the number of items added to the DbContext is very high, RAM gradually fills up and the operation becomes very slow. It is therefore better, after every few records (e.g. 100), to call SaveChanges and renew the DbContext.
if (filetype == "xxx")
{
    using (TextFieldParser csvReader = new TextFieldParser(downloadFolder + fileName))
    {
        csvReader.SetDelimiters(new string[] { "," });
        csvReader.HasFieldsEnclosedInQuotes = true;
        int rowCount = 1;
        while (!csvReader.EndOfData)
        {
            if (rowCount % 100 == 0)
            {
                // save first, then dispose and renew the context
                db.SaveChanges();
                db.Dispose();
                db = new AppDbContext(); // your DbContext
            }
            string[] fieldData = csvReader.ReadFields();
            // skip header row
            if (rowCount != 1)
            {
                var t = new GMI_adatpos
                {
                    PACCT = fieldData[3]
                };
                db.GMI_adatpos.Add(t);
            }
            rowCount++;
        }
    }
}
db.SaveChanges(); // save the remaining rows
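If a third-party library is not an option, another route is to skip EF entirely for the import and use SqlBulkCopy, which ships with ADO.NET. A minimal sketch, assuming SQL Server, an existing connectionString, and that dbo.GMI_adatpos / PACCT from the question are the real table and column names; parsedRows is a hypothetical collection of the csvReader.ReadFields() results:

// Sketch: requires System.Data and System.Data.SqlClient.
// Assumes SQL Server and the table/column names from the question.
var table = new DataTable();
table.Columns.Add("PACCT", typeof(string));

// fill the DataTable from the parsed CSV rows (fieldData[3] as in the question)
foreach (string[] fieldData in parsedRows)
{
    table.Rows.Add(fieldData[3]);
}

using (var con = new SqlConnection(connectionString))
using (var bulk = new SqlBulkCopy(con))
{
    bulk.DestinationTableName = "dbo.GMI_adatpos";
    bulk.ColumnMappings.Add("PACCT", "PACCT");
    con.Open();
    bulk.WriteToServer(table); // one streamed bulk operation instead of one INSERT per row
}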
Related
I am looking for a solution where I want to pass hundreds of records to a form that has more than 50 fields. I did some research on TestNG data providers, but it looks like they return only strings, so I feel it will not be feasible to go with data providers, as it is not good to pass 50 string arguments to a specific function. I also did some research on reading Excel files and found two ways: I can go with either jxl or Apache POI. But with those I am not able to read the data by column header; I cannot go with the row-and-column-number approach, as I have so many fields to work with. The reason is that if one field is added to the form in the future, it would all become rework again, which is not feasible.
I have been following this link:
http://www.softwaretestinghelp.com/selenium-framework-design-selenium-tutorial-21/
for reading data column-wise, but I am still not getting the records based on the column header. Do we have any other way to achieve this?
Thanks
"testNG data providers but it looks like that it returns only strings" - incorrect. It allows you to return a multidimensional array of type Object. What kind of object you create is your own code. You may choose to read from the excel, encapsulate all the fields in one object (your own pojo) or multiple objects and then the method argument can have just those object types declared and not the 50 strings.
Both jxl and POI are libraries for interacting with Excel. If you want specific behavior, such as reading cells based on a header, you need to write code for that yourself - it doesn't come out of the box.
If you are concerned about the addition of one more column, then build your indices first by reading the header row, put them in a relevant data structure, and then go about reading your data.
I finally achieved this with the help of Apache POI. I created one centralized function that returns a list of maps keyed by the column titles.
Here is my main test function, followed by the data provider and the centralized getXLSData method:
@Test(dataProvider = "dpCreateNewCust")
public void createNewCustomer(List<Map<String, String>> sheetList) {
    try {
        // Step 2. Login
        UtilityMethods.SignIn();
        for (Map<String, String> map : sheetList) {
            // Step 3. New Customer
            if (map.get("Testcase").equals("Yes")) {
                // Process the excel data
                ProcessNewCustomer(map);
            }
        }
    } catch (InterruptedException e) {
        System.out.println("Login Exception Raised: <br> The exception was caught: " + e);
    }
}

// My data provider
@DataProvider(name = "dpCreateNewCust")
public Object[][] dpCreateNewCust() {
    XLSfilename = System.getProperty("user.dir") + "//src//watts//XLSFiles//testcust.xlsx";
    List<Map<String, String>> arrayObject = UtilityMethods.getXLSData(XLSfilename, Sheetname);
    return new Object[][] { { arrayObject } };
}
//---- getXLSData method in the UtilityMethods class:
public static List<Map<String, String>> getXLSData(String filename, String sheetname)
{
    List<String> titleList = new ArrayList<String>();
    List<Map<String, String>> sheetList = new ArrayList<Map<String, String>>();
    try {
        FileInputStream file = new FileInputStream(filename);
        // Get the workbook instance for the .xlsx file
        XSSFWorkbook XLSbook = new XSSFWorkbook(file);
        // Get the sheet from the workbook by name
        WorkSheet = XLSbook.getSheet(sheetname);
        // Iterate through each row of the sheet
        int i = 0;
        Iterator<Row> rowIterator = WorkSheet.iterator();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            // For each row, iterate through each column
            Iterator<Cell> cellIterator = row.cellIterator();
            int j = 0;
            Map<String, String> valueMap = new HashMap<>();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();
                if (i == 0) {
                    // header row: collect the column titles
                    titleList.add(cell.getStringCellValue());
                } else {
                    String cellval = "";
                    switch (cell.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            cellval = cell.getBooleanCellValue() + "";
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            cellval = String.valueOf(cell.getNumericCellValue());
                            break;
                        case Cell.CELL_TYPE_STRING:
                            cellval = cell.getStringCellValue();
                            break;
                        default:
                            break;
                    }
                    // compare string contents, not references (was: cellval != "")
                    if (!cellval.isEmpty()) {
                        valueMap.put(titleList.get(j), cellval);
                        valueMap.put("ResultRow", String.valueOf(row.getRowNum()));
                        valueMap.put("ResultCol", String.valueOf(0));
                    }
                }
                j++;
            }
            if (i != 0 && !valueMap.isEmpty()) {
                sheetList.add(valueMap);
            }
            i++;
        }
        file.close();
        XLSbook.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sheetList;
}
I changed the name of one of my tables, then afterwards entered some data and pulled it using a view; to my surprise the data is not showing. I tried renaming it back to its original name with no luck; the same thing is happening.
Then I finally tried retyping the data in one of the columns and executed the view, and the data finally shows. Now the problem arises: I would need to re-enter the data in that column every time a row is inserted, which is obviously not a good thing to do.
Here is the code for how I added the data:
tblcsv.Columns.AddRange(new DataColumn[7] {
    new DataColumn("unit_name", typeof(string)),
    new DataColumn("unit", typeof(string)),
    new DataColumn("adrress", typeof(string)),
    new DataColumn("latitude", typeof(string)),
    new DataColumn("longitude", typeof(string)),
    new DataColumn("region", typeof(string)),
    new DataColumn("linkid", typeof(string))
});
string ReadCSV = File.ReadAllText(forex);
foreach (string csvRow in ReadCSV.Split('\n'))
{
    if (!string.IsNullOrEmpty(csvRow))
    {
        // Add each row to the DataTable
        tblcsv.Rows.Add();
        int count = 0;
        foreach (string FileRec in csvRow.Split(','))
        {
            tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
            if (count == 5)
            {
                tblcsv.Rows[tblcsv.Rows.Count - 1][6] = link;
            }
            count++;
        }
    }
}
string consString = ConfigurationManager.ConnectionStrings["diposlConnectionString"].ConnectionString;
using (SqlConnection con = new SqlConnection(consString))
{
    using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
    {
        // Set the destination table name
        sqlBulkCopy.DestinationTableName = "dbo.FRIENDLY_FORCES";
        // [OPTIONAL]: Map the DataTable columns to the database table columns
        sqlBulkCopy.ColumnMappings.Add("unit_name", "unit_name");
        sqlBulkCopy.ColumnMappings.Add("unit", "unit");
        sqlBulkCopy.ColumnMappings.Add("adrress", "adrress");
        sqlBulkCopy.ColumnMappings.Add("latitude", "latitude");
        sqlBulkCopy.ColumnMappings.Add("longitude", "longitude");
        sqlBulkCopy.ColumnMappings.Add("region", "region");
        sqlBulkCopy.ColumnMappings.Add("linkid", "linkid");
        con.Open();
        sqlBulkCopy.WriteToServer(tblcsv);
        con.Close();
    }
}
The column region is where I manually edited the data.
Did renaming the table do something to my data? Or am I just missing something?
Thank you
As I understand it, underneath Dapper's Query there is a DataReader, and underneath Dapper's Execute there is an ExecuteNonQuery. Correct me if I am wrong.
Can we use Dapper for a DataSet-style query that returns multiple tables?
No, there is no built-in support for DataSet, primarily because it seems largely redundant, but also because that isn't what Dapper targets. But that doesn't mean it doesn't include an API for handling a query that selects multiple results; see QueryMultiple:
using (var multi = conn.QueryMultiple(sql, args))
{
    var ids = multi.Read<int>().ToList();
    var customers = multi.Read<Customer>().ToList();
    dynamic someOtherRow = multi.Read().Single();
    int qty = someOtherRow.Quantity, price = someOtherRow.Price;
}
Note that this API is forwards only (due to the nature of IDataReader etc) - basically, each Read / Read<T> etc maps to the next result grid in turn.
I might be late here, but this is how I am doing the conversion of the IDataReader to a DataSet. Dapper returns an IDataReader when we use the ExecuteReaderAsync method. More information on this addition can be found here and here.
This is my attempt:
public async Task<DataSet> GetUserInformationOnUserId(int UserId)
{
    var storedprocedure = "usp_getUserInformation";
    var param = new DynamicParameters();
    param.Add("@userId", UserId);
    var reader = await SqlMapper.ExecuteReaderAsync(_connectionFactory.GetEpaperDBConnection, storedprocedure, param, commandType: CommandType.StoredProcedure);
    var dataset = ConvertDataReaderToDataSet(reader);
    return dataset;
}
And ConvertDataReaderToDataSet takes in the IDataReader; you can use this method to convert the reader to a DataSet:
public DataSet ConvertDataReaderToDataSet(IDataReader data)
{
    DataSet ds = new DataSet();
    ds.EnforceConstraints = false;
    int i = 0;
    while (!data.IsClosed)
    {
        // Load consumes one result grid and closes the reader after the last one
        ds.Tables.Add("Table" + (i + 1));
        ds.Tables[i].Load(data);
        i++;
    }
    return ds;
}
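For completeness, a hypothetical call site for the two methods above, from inside another async method (the user id 42 is made up):

// Hypothetical usage of GetUserInformationOnUserId / ConvertDataReaderToDataSet above.
DataSet ds = await GetUserInformationOnUserId(42);
foreach (DataTable table in ds.Tables)
{
    Console.WriteLine(table.TableName + ": " + table.Rows.Count + " row(s)");
}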
Suppose I have 3 entities generated from EF, say tab1, tab2, and tab3. In my Silverlight app I call SubmitChanges to save the data to the DB, and all changes are processed by WCF and EF automatically.
The question is: how can I know the order of the update operations in the database?
I need to know this because I have triggers on those tables and need to know the order in which they are updated.
One thing you can do is override PersistChangeSet() in your DomainService and manually control the order of saves. Just do nothing in your regular update/insert methods. Here's some pseudocode for a document-saving example to explain my answer:
[Insert]
public void InsertDocument(MyDocument objDocument) { }

[Update]
public void UpdateDocument(MyDocument objDocument) { }

protected override bool PersistChangeSet()
{
    try
    {
        // have to save the document first to get its id....
        MyDocument objDocumentBeingSaved = null;
        foreach (ChangeSetEntry CSE in ChangeSet.ChangeSetEntries.Where(i => i.Entity is MyDocument))
        {
            var changedEntity = (MyDocument)CSE.Entity;
            objDocumentBeingSaved = documentRepository.SaveDocument(changedEntity);
            break; // only one doc
        }
        if (objDocumentBeingSaved == null)
            throw new NullReferenceException("CreateDocumentDomainService.PersistChangeSet(): Error saving document information. Document is null in entity set.");
        // save document assignments after saving the document object
        foreach (ChangeSetEntry CSE in ChangeSet.ChangeSetEntries.Where(i => i.Entity is DocumentAssignment))
        {
            var changedEntity = (DocumentAssignment)CSE.Entity;
            changedEntity.DocumentId = objDocumentBeingSaved.Id;
            changedEntity.Id = documentRepository.SaveDocumentAssignment(objDocumentBeingSaved, changedEntity);
        }
        // save line items after saving document assignments
        foreach (ChangeSetEntry CSE in ChangeSet.ChangeSetEntries.Where(i => i.Entity is LineItem))
        {
            var changedEntity = (LineItem)CSE.Entity;
            changedEntity.DocumentId = objDocumentBeingSaved.Id;
            changedEntity.Id = documentRepository.SaveLineItem(objDocumentBeingSaved, changedEntity);
        }
        documentRepository.GenerateDocumentNumber(objDocumentBeingSaved.Id);
    }
    catch
    {
        // ....
        throw;
    }
    return false;
}
I have created an ADO.NET entity data model, and I am using LINQ to update/edit my Oracle database.
using (Entities ent = new Entities())
{
    RUSHPRIORITYRATE rp = new RUSHPRIORITYRATE();
    rp.RATE = rate;
    var query = from j in ent.RUSHPRIORITYRATEs
                select j;
    List<RUSHPRIORITYRATE> list = query.ToList();
    if (list.Count == 0)
    {
        ent.AddToRUSHPRIORITYRATEs(rp);
        ent.SaveChanges();
    }
    else
    {
        foreach (RUSHPRIORITYRATE r in query)
        {
            r.RATE = rp.RATE;
        }
        ent.SaveChanges();
    }
}
I have a method that either adds to or updates a table that will always have one record; the record's value is only updated once there is one record in place. Adding to the table is no problem, but I've looked up how to update records through MSDN, and "ent" does not seem to have the SubmitChanges method that the solution requires. Running this, I get the error: "The property 'RATE' is part of the object's key information and cannot be modified."