I'm converting an Excel app from a local MS Access backend (with DAO) to SQL Server running on Azure (with ADO).
A common task I perform with DAO is the index + seek method: scan a large number of input rows (~10,000, using multi-field indexes), check for matching records in the database, and update or add records as required. The recordset's NoMatch property after a Seek works very nicely for deciding whether to add or update.
This seems like it should be pretty simple with SQL Server, but I can't find a good solution that lets me check for matches, add or update, and use multi-column indexes.
Downloading the table to memory and then doing a batch update would be fine, but ADO's Find method doesn't seem as good as index + seek, and it can't use multiple columns. Connecting to SQL Server with an ADO provider that supports index seek would also work (Jet 4.0?), but I can't find examples of that either.
Am I missing something obvious? What is the best way to check and then add or update a large number of rows in SQL Server? Thanks
Edit:
Here's a simple example of the operation I'm doing currently in Access/DAO:
Set rs = db.OpenRecordset("TableName", dbOpenTable)
With rs
    .Index = "MultiFieldIndex"
    'Loop through the input data
    For i = 1 To 10000
        .Seek "=", Criteria1, Criteria2
        If Not .NoMatch Then
            'Found a match, so edit and update specific fields
            .Edit
            !Field1 = a
            !Field2 = b
        Else
            'No match found, add a new record then populate
            .AddNew
            !Field3 = c
            !Field4 = d
        End If
        .Update
    Next i
End With
What's the best way to do something like this with SQL only? I'd still probably start by loading a disconnected recordset of the full target table, but I'm not sure how to update a few thousand records when I don't know in advance whether each input row needs an update or an insert, or what the input criteria values will be. How do I find the rows I need to check or update without index + seek?
OR can I create a temp table in memory containing only the input data and then somehow just 'merge' that table into the database, letting the database figure out whether to update or add?
I feel like this should be a pretty basic procedure, but maybe I'm just missing some fundamental SQL concept?
Thanks for all the help!
The solution would be to write the upsert in SQL on the server, for example as a stored procedure:
create procedure dbo.prc_TableNameUpdIns (@parm1 int, @parm2 int)
AS
IF EXISTS (SELECT * FROM MyTable WHERE A = @parm1 AND B = @parm2)
    UPDATE ...
ELSE
    INSERT ...
GO
(you could use merge but I advise against it)
And you call the SP by creating an ADO connection (in VBA):
Dim myAdoConnection As ADODB.Connection
Dim X As Integer, Y As Integer
X = 7
Y = 8
'Set up and open myAdoConnection here
'Calling the proc (ADO exposes stored procedures as methods of the connection)
myAdoConnection.prc_TableNameUpdIns X, Y
Doing as much of the processing as possible in SQL is preferable. The only time you need this type of ADO recordset manipulation is when you need to update the form only and not necessarily the database itself.
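If you prefer a more explicit call, you can also run the procedure through an ADODB.Command with typed parameters. This is only a minimal sketch; the connection string and the example parameter values are assumptions you would replace with your own:
Dim cn As ADODB.Connection
Dim cmd As ADODB.Command

Set cn = New ADODB.Connection
'Connection string is an assumption - substitute your Azure SQL server, database and credentials
cn.Open "Provider=SQLNCLI11;Data Source=tcp:myserver.database.windows.net;" & _
        "Initial Catalog=MyDb;User ID=MyUser;Password=MyPassword;"

Set cmd = New ADODB.Command
With cmd
    .ActiveConnection = cn
    .CommandType = adCmdStoredProc
    .CommandText = "dbo.prc_TableNameUpdIns"
    .Parameters.Append .CreateParameter("@parm1", adInteger, adParamInput, , 7)
    .Parameters.Append .CreateParameter("@parm2", adInteger, adParamInput, , 8)
    .Execute , , adExecuteNoRecords
End With

cn.Close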
See the documentation about ADO connections and research the subject some more; this is only a pointer.
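As for the staging-table idea raised in the question, that also works without MERGE: push the input rows into a staging table, then run one set-based UPDATE and one INSERT for the rows that had no match. A rough sketch in VBA, where the staging table, target table and column names are all assumptions:
Dim cn As ADODB.Connection
Set cn = New ADODB.Connection
cn.Open "<your connection string>"

'1) Load the ~10,000 input rows into a staging table (parameterised INSERTs, a bulk import, etc.)

'2) Update the rows that already exist, matched on the same columns as the old multi-field index
cn.Execute _
    "UPDATE t SET t.Field1 = s.Field1, t.Field2 = s.Field2 " & _
    "FROM TableName t INNER JOIN StagingTable s " & _
    "ON t.Criteria1 = s.Criteria1 AND t.Criteria2 = s.Criteria2;"

'3) Insert the rows that had no match
cn.Execute _
    "INSERT INTO TableName (Criteria1, Criteria2, Field3, Field4) " & _
    "SELECT s.Criteria1, s.Criteria2, s.Field3, s.Field4 " & _
    "FROM StagingTable s LEFT JOIN TableName t " & _
    "ON t.Criteria1 = s.Criteria1 AND t.Criteria2 = s.Criteria2 " & _
    "WHERE t.Criteria1 IS NULL;"

cn.Close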
I've got some SSIS packages that take CSV files that come from the vendor and puts them into our local database. The problem I'm having is that sometimes the vendor adds or removes columns and we don't have time to update our packages before our next run, which causes the SSIS packages to abend. I want to somehow prevent this from happening.
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
I've started looking into using ADO connections, but my local machine has neither the ACE nor JET providers and I think the server the package gets deployed to also lacks those providers (and I doubt I can get them installed on the deployment server).
I'm at a loss as to how I can load these tables in a way that's fast and reliable while ignoring newly added or removed columns (although if a CSV file is missing a column the table has, that's not a big deal). Any ideas?
I went with a different approach, which seems to be working (after I worked out some kinks). What I did was take the CSV file rows and put them into a temporary datatable. When that was done, I did a bulk copy from the datatable to my database. In order to deal with missing or new columns, I determined what columns were common to both the CSV and the table and only processed those common columns (new columns were noted in the log file so they can be added later). Here's my BulkCopy module:
Private Sub BulkCopy(csvFile As String)
    'Assumes module-level fields set up elsewhere: csvReader, currentRow, csvColumns,
    'commonColumns, tableColumns, tableColumnTypes, tempTable, dbConnection, tableName
    'and message, plus the helpers GetDataType() and LogThis().
    Dim i As Integer
    Dim rowCount As Int32 = 0
    Dim colCount As Int32 = 0
    Dim writeThis As ArrayList = New ArrayList

    tempTable = New DataTable()
    Try
        '1) Set up the columns in the temporary data table, using commonColumns
        For i = 0 To commonColumns.Count - 1
            tempTable.Columns.Add(New DataColumn(commonColumns(i).ToString))
            tempTable.Columns(i).DataType = GetDataType(commonColumns(i).ToString)
        Next

        '2) Start adding data from the csv file to the temporary data table
        While Not csvReader.EndOfData
            currentRow = csvReader.ReadFields() 'Read the next row of the csv file
            rowCount += 1
            writeThis.Clear()
            For index As Integer = 0 To UBound(currentRow)
                If commonColumns.Contains(csvColumns(index)) Then
                    Dim location As Integer = tableColumns.IndexOf(csvColumns(index))
                    Dim columnType As String = tableColumnTypes(location).ToString
                    If currentRow(index).Length = 0 Then
                        writeThis.Add(DBNull.Value)
                    Else
                        writeThis.Add(currentRow(index))
                    End If
                End If
            Next
            Dim row As DataRow = tempTable.NewRow()
            row.ItemArray = writeThis.ToArray
            tempTable.Rows.Add(row)
        End While
        csvReader.Close()

        '3) Bulk copy the temporary data table to the database table.
        Using copy As New SqlBulkCopy(dbConnection)
            '3.1) Set up the column mappings
            For i = 0 To commonColumns.Count - 1
                copy.ColumnMappings.Add(commonColumns(i).ToString, commonColumns(i).ToString)
            Next
            '3.2) Set the destination table name
            copy.DestinationTableName = tableName
            '3.3) Copy the temporary data table to the database table
            copy.WriteToServer(tempTable)
        End Using
    Catch ex As Exception
        message = "*****ERROR*****" + vbNewLine
        message += "BulkCopy: Encountered an exception of type " + ex.GetType.ToString()
        message += ": " + ex.Message + vbNewLine + "***************" + vbNewLine
        LogThis(message)
    End Try
End Sub
There may be something more elegant out there, but this so far seems to work.
Look into Biml, which builds and executes your SSIS package dynamically based on the metadata at run time.
Based on this comment:
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
And this:
I used a csvreader to read the file. The insert was via a sqlcommand object.
It would appear at first glance that the bottleneck is not in the flat file source but in the destination. An OLE DB Command executes in a row-by-row fashion, one statement per input row. Changing this to an OLE DB Destination converts the process to a bulk insert operation. To test this out, just use the flat file source and connect it to a derived column; run that and check the speed. If it's faster, change to the OLE DB Destination and try again. It also helps to be inserting into a heap (no clustered or nonclustered indexes) and to use TABLOCK.
However, this does not solve your whole varied-file problem. I don't know what the flat file source does if the file is short one or more columns compared with how you originally configured it at design time. It might fail, or it might import the rows in some jagged form where part of the next row is assigned to the last columns in the current row. That could be a big mess.
However, I do know what happens when a flat file source gets extra columns. I put in this Connect item for it, which was sadly rejected: https://connect.microsoft.com/SQLServer/feedback/details/963631/ssis-parses-flat-files-incorrectly-when-the-source-file-contains-unexpected-extra-columns
What happens is that the extra columns are concatenated into the last column. If you plan for it, you could make the last column large and then parse it in SQL from the staging table. You could also just jam the whole row into SQL and parse each column from there, but that's a bit clunky because you'll end up with a lot of CHARINDEX() calls checking the position of values all over the place.
An easier option might be to parse it in .NET in a script task, using some combination of Split() to get all the values and checking the count of values in the array to know how many columns you have. This would also allow you to direct the rows to different outputs based on what you find.
And lastly, you could ask the vendor to commit to a format: either a fixed number of columns, or a format that handles variation, like XML.
I've got a C# solution (I haven't checked it, but I think it works) for a source script component.
It reads the header into an array using Split.
Then, for each data row, it uses the same Split function, checks the header value to identify the column, and uses the row value to set the output.
You will need to put all the output columns into the output area.
Any columns that are not present in the file will have a null value on exit.
public override void CreateNewOutputRows()
{
    using (System.IO.StreamReader sr = new System.IO.StreamReader(@"[filepath and name]"))
    {
        string FullText = sr.ReadToEnd();
        //Split on both \r and \n so Windows line endings and trailing blank lines are handled
        string[] rows = FullText.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        //Get header values
        string[] header = rows[0].Split(',');

        for (int i = 1; i < rows.Length; i++)
        {
            string[] rowVals = rows[i].Split(',');

            //One output row per data row
            Output0Buffer.AddRow();

            for (int j = 0; j < rowVals.Length; j++)
            {
                if (j >= header.Length) break; //ignore any values beyond the header width

                //Deal with each known header name
                switch (header[j])
                {
                    case "Field 1 Name": //this is where you use known column names
                        Output0Buffer.FieldOneName = rowVals[j]; //Cast if not string
                        break;
                    case "Field 2 Name":
                        Output0Buffer.FieldTwoName = rowVals[j]; //Cast if not string
                        break;
                    //continue this pattern for all column names
                }
            }
        }
    }
}
I have used Stack Overflow as a resource hundreds of times, but this is my first time posting a question for some help!
I've got a table in SQL Server 2005 which contains 4 nVarChar(Max) fields.
I'm trying to pull out the data from an Access (2010) VBA Module using ADO 2.8
I'm connecting using SQL driver SQLNCLI10
(I can't use a linked table, as the 'table' I will ultimately be querying is a Table-Valued Function)
When I then print / use the recordset, the data is getting jumbled and concatenated with other fields in the same record - with a bunch of obscure characters thrown in.
The VBA: (various other methods were tried with the same result)
Sub TestWithoutCasting()
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset
    Dim i As Integer

    cn.Open "Provider=SQLNCLI10;Data Source=ART;DataTypeCompatibility=80;MARS Connection=True;"
    Set rs = cn.Execute("SELECT * FROM JobDetail WHERE JobID = 2558 ORDER BY SeqNo ASC")

    Do While Not rs.EOF
        For i = 0 To rs.Fields.Count - 1
            Debug.Print rs.Fields(i).Name & ": " & rs.Fields(i).Value
        Next i
        rs.MoveNext
    Loop
End Sub
Example Output:
SeqNo: 1
CommandID: 2
Parameter1: 2 Daily Report é [& some other chars not showing on here]
Parameter2: [Null]
Parameter3: [Null]
Parameter4: [Null]
Description: Daily Report
Active: False
Expected Output:
SeqNo: 1
CommandID: 2
Parameter1: SELECT Day_Number ,Day_Text ,Channel_Group_ID [...etc]
Parameter2: [Null]
Parameter3: [Null]
Parameter4: [Null]
Description: Daily Report
Active: False
So, it's grabbing bits of data from other fields instead of the correct data (in this case, it's an SQL statement)
I then tried casting the nvarchar(max) fields as text at source
View Created:
CREATE VIEW TestWithCast
AS
SELECT jd.JobID, jd.SeqNo, jd.CommandID
,cast(jd.Parameter1 as text) as Parameter1
,cast(jd.Parameter2 as text) as Parameter2
,cast(jd.Parameter3 as text) as Parameter3
,cast(jd.Parameter4 as text) as Parameter4
,jd.[Description]
,jd.Active
FROM JobDetail jd
Now, I initially had some luck here - using the same code as above does bring back data - but when I use this code in my main code (which jumps in and out of other procedures), as soon as I've read the first result from the recordset, the rest of the records/fields appear to be wiped and set to Null. I also tried assigning the value of each field to a variable while the rest of the VBA runs before getting the next record, but that doesn't help either.
It almost feels like I need to dump the recordset into a local Access table and query from there - which is a bizarre workaround for what is already a workaround (casting as text).
Is there something I'm completely missing here, or do I indeed need to cast as text and load into a local table?
Thanks for any help - it's driving me mad!
ps. Hope I've given the right level of detail / info - please let me know if I missed anything key.
EDIT:
Yikes, I think I've done it / found the issue...
I changed the driver to SQLSRV32 (v6.01) and it seems to work fine directly against the text-cast field.
So... why would it work with an older driver but not with the newer one that various sources I read recommend?
And... will there be a significant drawback to using this instead of the native client?
EDIT 2:
OK, I've tried a few drivers on a few machines, in each case both with the TEXT casting and directly against VARCHAR(MAX):
[On my windows 7 machine w/ SQLSMS 2008]
SQL Native Client 10.0 - Neither method works reliably with this driver
SQL Server 6.01 - BOTH methods appear to work reliably - further testing needed though
[On our production server w/ SQLS 2005]
SQL Native Client (v2005.90) - Does not work at all with varchar(max), but DOES work with text casting
SQL Server (v2008.86) - BOTH methods appear to work reliably - further testing needed though
This should make deployment interesting!
It's not a real answer, because I did not test it, but... you are using a "DataTypeCompatibility=80" parameter in your connection. As far as I know, DataTypeCompatibility=80 refers to SQL Server 2000, where the nvarchar(max) field type was not yet implemented.
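If you want to test that theory, the change is just to drop that parameter from the connection string. A sketch of the question's connection without it (untested; the security settings here are assumptions):
Dim cn As New ADODB.Connection
'Same data source as in the question, but without DataTypeCompatibility=80
cn.Open "Provider=SQLNCLI10;Data Source=ART;Integrated Security=SSPI;MARS Connection=True;"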
I had the same problem and solved it by converting the field to nvarchar(1000). That would be an easy, compatible solution for your problem if 1000 characters is enough.
I have a database formatting problem in which I am trying to concatenate column "B" rows based on column "A" rows. Like So:
https://docs.google.com/spreadsheet/ccc?key=0Am8J-Fv99YModE5Va3hLSFdnU0RibmQwNVFNelJCWHc
Sorry I couldn't post a picture - I don't have enough reputation points yet. I'll get them eventually though!
So I'd like to solve this problem within Excel or Access. It's currently an Access database, but I can export it to Excel easily. As you can see, I want to find "userid" in column A, and where column A has multiple instances of the same value, such as "shawn", I'd like to combine those rows and concatenate the property numbers as shown.
Even though there are multiple instances of column A still, I could just filter all unique instances of the table later. My concern is how to concatenate column B with a "|" in the middle if column A has multiple instances.
This is just a segment of my data (There is a lot more), so I would be very thankful for your help.
The pseudo code in my head so far is:
If( Column A has more than one instance)
Then Concatenate(Column B with "#"+ "|" +"#")
I'm also wondering if there is a way to do this in Access with grouping.
Well Anyways, PLEASE HELP.
In Excel we can achieve this easily with a custom function in a VBA module. Hopefully using VBA (macros) is not an issue for you.
Here is the code for the function, which can be added in VBA. (Press Alt+F11 to open the Visual Basic Editor, right-click the project and add a module, then put the code below in the module.)
Public Function ConcatenatePipe(ByVal lst As Range, ByVal values As Range, ByVal name As Range) As String
    ConcatenatePipe = ""
    Dim i As Integer
    For i = 1 To lst.Count
        If name.Value = lst.Item(i).Value Then ConcatenatePipe = ConcatenatePipe & "|" & values.Item(i).Value
    Next
    ConcatenatePipe = Mid(ConcatenatePipe, 2)
End Function
You can use this function in column F of your example in Excel. Put the formula below in F2 and then copy the cell down the rest of column F: =ConcatenatePipe($A$2:$A$20,$B$2:$B$20,E2)
I believe you can solve this with an SQL GROUP BY function. At least, here's how I'd do it in MySQL or similar:
SELECT userid, GROUP_CONCAT(propertynum SEPARATOR '|') FROM Names GROUP BY userid
as described in this stack overflow post: How to use GROUP BY to concatenate strings in MySQL?
Here's a link on how to use SQL in MS Access: http://www.igetit.net/newsletters/Y03_10/SQLInAccess.aspx
Unfortunately there is not a GROUP_CONCAT function in MSAccess, but this other SO post explains some ways round that: is there a group_concat function in ms-access?
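If you want to do it inside Access itself, the usual workaround is a small VBA function that builds the concatenated string from a DAO recordset and is then called from a query. A rough, untested sketch, assuming a table named Names with fields userid and propertynum (adjust the names to your schema):
Public Function ConcatProperties(ByVal UserID As String) As String
    Dim rs As DAO.Recordset
    Dim result As String

    'Table and field names here are assumptions
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT propertynum FROM Names WHERE userid = '" & Replace(UserID, "'", "''") & "'")

    Do While Not rs.EOF
        result = result & "|" & rs!propertynum
        rs.MoveNext
    Loop
    rs.Close

    ConcatProperties = Mid(result, 2) 'drop the leading pipe
End Function
You could then call it from a query such as SELECT DISTINCT userid, ConcatProperties(userid) AS Properties FROM Names.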
I have a table in Access with 1 field called HostName, it is a text field, with 100 char max. I use it to store DNS host names. The field is setup as the primary key. If I do the following query it returns the expected results, but takes about 8 seconds to complete on a table with 1 million records:
SELECT TOP 1 HostsRev.HostName
FROM HostsRev
WHERE (((HostsRev.HostName)>="test"))
ORDER BY HostsRev.HostName;
If I remove the "ORDER BY" part, it returns in less than 1 second, but doesn't always return what I would expect -- not the first record that is >= to "test".
I am doing the query via ADO from a C++ app, but I've tested in Access also, by creating a query, and get the same results.
What I need is to quickly find the first record, if any, that starts with a given string. I also tried using LIKE query but that had the same results. I need to do this because if I search on images.google.com, I need to know if the list contains google.com but not images.google.com (I actually store the host names in reverse string order to make this work correctly, and reverse the strings before doing the lookup).
The issue is that TOP on its own does not apply any sorting to the data, so without the ORDER BY it will return rows in whatever order it finds them and thus give different results. You could try the following instead:
SELECT Min(HostName) FROM HostsRev WHERE HostName >= "test"
Not sure if this will give any better performance, but it's worth a go :)
I am not sure if you can do this from C++, not being a C++ programmer, but ADO recordsets support an Index property that lets you set the index you wish to use, and a Seek method to search on that index. Here is some code in VB for what it is worth.
Dim conn As ADODB.Connection
Dim rs As ADODB.Recordset

Set conn = New ADODB.Connection
conn.Open ConnectionString

Set rs = New ADODB.Recordset
'Index/Seek needs the table opened directly (adCmdTableDirect) with a server-side cursor
rs.Open "mytable", conn, adOpenKeyset, adLockOptimistic, adCmdTableDirect
rs.Index = "primarykey"
rs.Seek "test", adSeekAfterEQ

If rs.EOF Then
    'record not found
End If
I have a database with a bunch of stuff in it, and right now I'm reading in data, doing some processing on it, and then sticking it in a new database. My code generates this string:
query_string = "INSERT INTO OrgPhrase (EXACT_PHRASE,Org_ID) VALUES (HELLO,123)"
Then it's used this way:
Dim InsertCmd = New System.Data.OleDb.OleDbCommand(query_string, connection)
InsertCmd.ExecuteNonQuery()
The associated database (OLEdb connection) exists and opens fine, with all the tables and columns it's trying to work with already existing. The error message I get is "No value given for one or more required parameters"
Am I missing something? Did I spell something wrong? I don't have a ton of experience with database work, but I've never had this trouble inserting before.
I believe the query should be
query_string = "INSERT INTO OrgPhrase (EXACT_PHRASE,Org_ID) VALUES ('HELLO',123)"
Also, it may be that the table has more than two columns that are NOT NULL, in which case values for them are required as well.
Consider parameterizing the query string. There are a couple of reasons for this. First, you can pass in the values without having to worry about whether or not you need single quotes. Second, you prevent SQL injection.
query_string = "INSERT INTO OrgPhrase (EXACT_PHRASE,Org_ID) VALUES (@ExactPhrase,@OrgId)"
You then create parameters based on the parameter names in the string. Unless, of course, your query string always has the same values, but that sounds a bit too hard-coded to be good.