I'm new to SQL Server and I have a question.
I have built some tables that store the documents produced by my program.
I need to write a stored procedure that inserts a document into these tables within a transaction.
My idea is to create a main procedure that runs in a transaction and takes two user-defined table types as parameters:
one for the main (header) table and one for the details table.
My questions are:
Is it reasonable to create many user-defined table types? For example,
if my database has 10 tables, do I need to create 10 table types?
Can I use polymorphism in the procedure parameters, so that I can write a single procedure? For example:
if I have two tables, Person and Teacher, where every teacher is a person, could my parameter always be of the Person type while also allowing a Teacher to be passed?
I have also looked at the XML type, but it is slower to use and also more difficult.
I wonder whether SQL Server has other ways to write to multiple tables in one transaction that would require fewer parameters.
Thanks in advance; I hope you can help me.
It all depends on your front end. If you have two different UIs that capture the header and the details separately, you would need two separate stored procedures; if not, one stored procedure would suffice.
A user-defined table type can decrease the number of parameters to a stored procedure. I agree that XML is complex; a user-defined table type is much easier to use.
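To make that concrete, here is a minimal sketch of a header/details insert wrapped in one transaction. All of the type, table, and column names (DocumentHeaderType, DocumentDetailType, dbo.DocumentHeader, dbo.DocumentDetail) are invented placeholders for illustration, not names from your schema:

CREATE TYPE dbo.DocumentHeaderType AS TABLE
(
    DocumentId int NOT NULL,
    Title      nvarchar(100) NOT NULL
);
GO
CREATE TYPE dbo.DocumentDetailType AS TABLE
(
    DocumentId int NOT NULL,
    LineNumber int NOT NULL,
    ItemText   nvarchar(200) NULL
);
GO
CREATE PROCEDURE dbo.InsertDocument
    @Header  dbo.DocumentHeaderType READONLY,
    @Details dbo.DocumentDetailType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Insert the header row(s), then the detail rows, as one unit of work.
        INSERT INTO dbo.DocumentHeader (DocumentId, Title)
        SELECT DocumentId, Title FROM @Header;

        INSERT INTO dbo.DocumentDetail (DocumentId, LineNumber, ItemText)
        SELECT DocumentId, LineNumber, ItemText FROM @Details;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);  -- re-raise the error to the caller
    END CATCH;
END;

The client fills the two table-valued parameters and makes a single call, so either the whole document is inserted or nothing is.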
I have a database which has nearly 1000 tables. All textual fields are of varchar type. It also has 1000+ stored procedures and functions.
These procedures and functions have their own varchar parameters and varchar variables.
I need to convert every varchar field to nvarchar in one go, in every table, procedure, and function.
Of course I could do it one by one, but that would take years to do manually.
I don't need to change the size: if a column is varchar(50), I want it to become nvarchar(50). No change in size.
Of course I could do it one by one, but that would take years to do manually.
This appears to be the answer to your question: if doing it 'in one go' would take years as a single transaction, then it would have to be executed in digestible chunks, based on what maintenance windows you have available to do this.
Tables first, taking into account any PK-FK ordering of execution.
Views next, unless any views use schemabinding; in that case you'd have to drop the view, alter the table, and recreate the view, in that order.
Then functions and stored procedures.
OR
If it's possible, get a copy of your database, apply all of your changes to it, then switch your current production database with that copy. This assumes you can make sure that all production data in the current database is present in the copy when you do the switch.
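For the table pass, you don't have to hand-write the statements; a metadata query can generate them. This is only a rough sketch: it assumes no schemabound dependencies, ignores indexes, constraints, and computed columns on the affected columns, and the output should be reviewed before it is run in chunks:

SELECT 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
     + ' ALTER COLUMN ' + QUOTENAME(c.name)
     + ' nvarchar('
     + CASE WHEN c.max_length = -1 THEN 'max'
            ELSE CAST(c.max_length AS varchar(10))  -- for varchar, max_length in bytes equals the declared length
       END
     + ')'
     + CASE WHEN c.is_nullable = 1 THEN ' NULL' ELSE ' NOT NULL' END
     + ';'
FROM sys.columns AS c
JOIN sys.tables  AS t  ON t.object_id     = c.object_id
JOIN sys.schemas AS s  ON s.schema_id     = t.schema_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name = 'varchar';

The procedures and functions can be located the same way by searching sys.sql_modules for 'varchar', but their definitions still have to be edited and re-created individually.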
I'm reading and parsing CSV files into a SQL Server 2008 database. This process uses a generic CSV parser for all files.
The CSV parser is placing the parsed fields into a generic field import table (F001 VARCHAR(MAX) NULL, F002 VARCHAR(MAX) NULL, Fnnn ...) which another process then moves into real tables using SQL code that knows which parsed field (Fnnn) goes to which field in the destination table. So once in the table, only the fields that are being copied are referenced. Some of the files can get quite large (a million rows).
The question is: does the number of fields in a table significantly affect performance or memory usage? Even if most of the fields are not referenced. The only operations performed on the field import tables are an INSERT and then a SELECT to move the data into another table, there aren't any JOINs or WHEREs on the field data.
Currently, I have three field import tables, one with 20 fields, one with 50 fields and one with 100 fields (this being the maximum number of fields I've encountered so far). There is currently logic to use the smallest table possible.
I'd like to make this process more generic, and have a single table of 1000 fields (I'm aware of the 1024 columns limit). And yes, some of the planned files to be processed (from 3rd parties) will be in the 900-1000 field range.
For most files, there will be fewer than 50 fields.
At this point, dealing with the existing three field import tables (plus planned tables for more fields: 200, 500, 1000?) is becoming a logistical nightmare in the code, and dealing with a single table would resolve a lot of issues, provided I don't give up much performance.
First, to answer the question as stated:
Does the number of fields in a table affect performance even if not referenced?
If the fields are fixed-length (*INT, *MONEY, DATE/TIME/DATETIME/etc., UNIQUEIDENTIFIER, etc.), and the field is not marked as SPARSE and compression hasn't been enabled (both introduced in SQL Server 2008), then the full size of the field is taken up (even if NULL), and this does affect performance, even if the fields are not in the SELECT list.
If the fields are variable-length and NULL (or empty), then they just take up a small amount of per-row overhead (the NULL bitmap and, for non-trailing columns, the variable-length column offset array).
Regarding space in general, is this table a heap (no clustered index) or clustered? And how are you clearing the table out for each new import? If it is a heap and you are just doing a DELETE, then it might not be getting rid of all of the unused pages. You would know if there is a problem by seeing space taken up even with 0 rows when doing sp_spaceused. Suggestions 2 and 3 below would naturally not have such a problem.
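As a quick check (a sketch; dbo.FieldImport is a placeholder for your staging table name), compare reserved space to the row count, and clear the table with TRUNCATE, which deallocates pages even on a heap:

EXEC sys.sp_spaceused N'dbo.FieldImport';  -- reserved / data / unused space vs. row count
TRUNCATE TABLE dbo.FieldImport;            -- releases all pages, unlike a plain DELETE on a heap
EXEC sys.sp_spaceused N'dbo.FieldImport';  -- should now show close to 0 KB reserved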
Now, some ideas:
Have you considered using SSIS to handle this dynamically?
Since you seem to have a single-threaded process, why not create a global temporary table at the start of the process each time? Or drop and recreate a real table in tempdb? Either way, if you know the destination, you can even dynamically create this import table with the destination field names and datatypes. Even if the CSV importer doesn't know the destination, at the beginning of the process you can call a proc that does know the destination and have it create the "temp" table; the importer can then still generically import into a standard table name, with no fields specified and without error, as long as the fields in the table are NULLable and there are at least as many of them as there are columns in the file.
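A minimal sketch of that idea, assuming a hypothetical helper procedure that knows the destination table (the names PrepareImportTable and ##CsvImport are invented for illustration):

CREATE PROCEDURE dbo.PrepareImportTable
    @DestinationTable sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..##CsvImport') IS NOT NULL
        DROP TABLE ##CsvImport;

    -- Build a staging table whose columns mirror the destination, but as NULLable text.
    DECLARE @sql nvarchar(max);

    SELECT @sql = N'CREATE TABLE ##CsvImport ('
        + STUFF((SELECT N', ' + QUOTENAME(c.name) + N' varchar(max) NULL'
                 FROM sys.columns AS c
                 WHERE c.object_id = OBJECT_ID(@DestinationTable)
                 ORDER BY c.column_id
                 FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'')
        + N');';

    EXEC sys.sp_executesql @sql;
END;

Because ##CsvImport is a global temporary table, it survives after the procedure returns, and the generic importer can insert into it by position without knowing the column names.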
Does the incoming CSV data have embedded returns, quotes, and/or delimiters? Do you manipulate the data between the staging table and destination table? It might be possible to dynamically import directly into the destination table, with proper datatypes, but no in-transit manipulation. Another option is doing this in SQLCLR. You can write a stored procedure to open a file and spit out the split fields while doing an INSERT INTO...EXEC. Or, if you don't want to write your own, take a look at the SQL# SQLCLR library, specifically the File_SplitIntoFields stored procedure. This proc is only available in the Full / paid-for version, and I am the creator of SQL#, but it does seem ideally suited to this situation.
Given that:
all fields import as text
destination field names and types are known
number of fields differs between destination tables
what about having a single XML field and importing each line as a single-level document with each field being <F001>, <F002>, etc? By doing this you wouldn't have to worry about number of fields or have any fields that are unused. And in fact, since the destination field names are known to the process, you could even use those names to name the elements in the XML document for each row. So the rows could look like:
ID LoadFileID ImportLine
1 1 <row><FirstName>Bob</FirstName><LastName>Villa</LastName></row>
2 1 <row><Number>555-555-5555</Number><Type>Cell</Type></row>
Yes, the data itself will take up more space than the current VARCHAR(MAX) fields, both due to XML being double-byte and the inherent bulkiness of the element tags to begin with. But then you aren't locked into any physical structure. And just looking at the data will be easier to identify issues since you will be looking at real field names instead of F001, F002, etc.
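Moving such a row into its destination is then just a matter of shredding the XML. A sketch, assuming the staging table is dbo.ImportStaging and ImportLine is an xml column (the destination dbo.Person and its columns are likewise just illustrative):

INSERT INTO dbo.Person (FirstName, LastName)
SELECT x.value('(FirstName/text())[1]', 'nvarchar(50)'),
       x.value('(LastName/text())[1]',  'nvarchar(50)')
FROM dbo.ImportStaging AS s
CROSS APPLY s.ImportLine.nodes('/row') AS t(x)
WHERE s.LoadFileID = 1;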
In terms of at least speeding up the process of reading the file, splitting the fields, and inserting, you should use Table-Valued Parameters (TVPs) to stream the data into the import table. I have a few answers here that show various implementations of the method, differing mainly based on the source of the data (file vs a collection already in memory, etc):
How can I insert 10 million records in the shortest time possible?
Pass Dictionary<string,int> to Stored Procedure T-SQL
Storing a Dictionary<int,string> or KeyValuePair in a database
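The database-side pieces are just a table type and a procedure that accepts it; the streaming itself happens in the client code shown in those links. A sketch with invented names, trimmed to a few of the Fnnn columns:

CREATE TYPE dbo.ImportRowType AS TABLE
(
    F001 varchar(max) NULL,
    F002 varchar(max) NULL,
    F003 varchar(max) NULL   -- ...extend with as many Fnnn columns as you need
);
GO
CREATE PROCEDURE dbo.LoadImportRows
    @Rows dbo.ImportRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- dbo.FieldImport is a placeholder name for the staging table from the question.
    INSERT INTO dbo.FieldImport (F001, F002, F003)
    SELECT F001, F002, F003
    FROM @Rows;
END;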
As was correctly pointed out in the comments, even if your table has 1000 columns and most of them are NULL, it should not affect performance much, since NULLs do not waste much space.
You mentioned that you may have real data with 900-1000 non-NULL columns. If you are planning to import such files, you may come across another limitation of SQL Server. Yes, the maximum number of columns in a table is 1024, but there is a limit of 8060 bytes per row. If your columns are varchar(max), then each such column will consume 24 bytes out of 8060 in the actual row and the rest of the data will be pushed off-row:
SQL Server supports row-overflow storage which enables variable length columns to be pushed off-row. Only a 24-byte root is stored in the main record for variable length columns pushed out of row; because of this, the effective row limit is higher than in previous releases of SQL Server. For more information, see the "Row-Overflow Data Exceeding 8 KB" topic in SQL Server Books Online.
So, in practice you can have a table with only 8060 / 24 = 335 non-NULL varchar(max) columns whose data is pushed off-row (strictly speaking even a bit fewer, since there are other row headers as well).
There are so-called wide tables that can have up to 30,000 columns, but the maximum size of the wide table row is 8,019 bytes. So, they will not really help you in this case.
Yes. Large records take up more space on disk and in memory, which means they are slower to load than small records and fewer of them fit in memory. Both effects will hurt performance.
I have a group of 20 records that I need to batch insert, all over one connection, so I see two solutions (XML or a stored procedure). This operation is executed frequently, so I need fast performance and the least overhead.
1) I think XML performs slower, but it lets me freely specify how many records to insert in a batch by producing the appropriate XML. I don't know the values of each field in a record in advance, and there may be characters that would malform the XML, such as " or field tags appearing inside values. How should I prevent this?
2) Using a stored procedure is faster, but I need to define all the input parameters, which is a tedious task, and if I need to increase or decrease the number of records inserted in a batch, I have to change the stored procedure.
So which solution is better in my environment, given these constraints?
XML is likely the better choice; however, there are other options.
If you're using SQL Server 2008 you can use table-valued parameters instead.
Starting with .NET 2.0, you also have the option of using SqlBulkCopy.
If you're using Oracle, you can pass a user-defined type, but I'm not sure which versions of ODP.NET and Oracle that works with.
Note that these are all .NET samples; I don't know whether they will work for you. It would probably help if you included the database, its version, and the client technology that you're using.
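For the SQL Server / table-valued parameter route, the server side is small. This is only a sketch with invented names (RecordBatchType, dbo.TargetTable, and Field1/Field2 stand in for your real columns):

CREATE TYPE dbo.RecordBatchType AS TABLE
(
    Field1 nvarchar(100) NULL,
    Field2 nvarchar(100) NULL
);
GO
CREATE PROCEDURE dbo.InsertRecordBatch
    @Batch dbo.RecordBatchType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based insert, whether the batch has 20 rows or 200.
    INSERT INTO dbo.TargetTable (Field1, Field2)
    SELECT Field1, Field2
    FROM @Batch;
END;

Because the parameter is a table, the batch size can change without altering the procedure, and there is no XML escaping to worry about.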
I have a SQL Server 2008 database. This database has a stored procedure that will update several records. The ids of these records are stored in a parameter that is passed in via a comma-delimited string. The property values associated with each of these ids are passed in via two other comma-delimited strings. It is assumed that the length (in terms of tokens) and the orders of the values are correct. For instance, the three strings may look like this:
Param1='1,2,3,4,5'
Param2='Bill,Jill,Phil,Will,Jack'
Param3='Smith,Li,Wong,Jos,Dee'
My challenge is that I'm not sure what the best way is to actually parse these three CSVs and update the corresponding records. I have access to a procedure named ConvertCSVtoTable, which converts a CSV to a temp table of records. So Param1 would return
1
2
3
4
5
after the procedure is called. I thought about using a cursor, but that seems to get really messy.
Can someone tell me, or show me, what the best way to address this problem is?
I'd give some thought to reworking the inputs to your procedure. Since you're running SQL 2008, my first choice would be to use a table-valued parameter. My second choice would be to pass the parameters as XML. Your current method, as you already know, is a real headache and is more error prone.
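For example, with a table-valued parameter the three parallel strings collapse into one structured parameter. This is only a sketch; the type, table, and column names are invented, since the real schema isn't shown:

CREATE TYPE dbo.PersonUpdateType AS TABLE
(
    PersonId  int NOT NULL PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL
);
GO
CREATE PROCEDURE dbo.UpdatePersons
    @Updates dbo.PersonUpdateType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE p
    SET    p.FirstName = u.FirstName,
           p.LastName  = u.LastName
    FROM   dbo.Person AS p
    JOIN   @Updates   AS u
      ON   u.PersonId = p.Id;
END;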
You can use a bulk load to insert the values into a temp table, then PIVOT them and insert them into the proper one.
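A rough sketch of that idea, assuming a key/attribute/value staging shape (all of the names here are invented for illustration):

CREATE TABLE #Staging (PersonId int, Attribute sysname, Value nvarchar(50));

-- Bulk load (or otherwise insert) the split values, one row per attribute.
INSERT INTO #Staging (PersonId, Attribute, Value) VALUES
(1, 'FirstName', 'Bill'), (1, 'LastName', 'Smith'),
(2, 'FirstName', 'Jill'), (2, 'LastName', 'Li');

-- Pivot the attributes back into columns and update the real table.
UPDATE p
SET    p.FirstName = pv.FirstName,
       p.LastName  = pv.LastName
FROM   dbo.Person AS p
JOIN  (SELECT PersonId, FirstName, LastName
       FROM   #Staging
       PIVOT (MAX(Value) FOR Attribute IN (FirstName, LastName)) AS x) AS pv
  ON   pv.PersonId = p.Id;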