I have a large table in SQL Server 2008 with roughly 500,000 records and 40 columns. Some of the columns are strings and contain \n and other special characters. I want to convert this table to an XML file for use in a project. When I use FOR XML to export the table, errors occur.
For example, when I test:
select testData.*
from testData
FOR XML PATH('sample'), TYPE, ELEMENTS, ROOT('TestData')
only 3,500 records are converted to XML, and the final element (record 3,500) is incomplete.
When I test without TYPE:
select testData.*
from testData
FOR XML PATH('sample'), ELEMENTS, ROOT('TestData')
All the records are converted, but CR/LF characters are inserted into the output, which breaks the XML file. For example, a tag like Product gets split into prod CRLF uct.
I searched for a long time but no page was helpful.
If it's a one-off job, you can use Altova XMLSpy, which has a free 30-day trial. The Altova MissionKit suite contains a lot of tools, such as MapForce, which can map a database to XML.
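If you would rather stay in T-SQL, one common workaround is to capture the whole document as a single xml value and save it from the client; the truncation you see with TYPE is usually SSMS's "XML data" result-size limit (Tools > Options > Query Results > Results To Grid) rather than FOR XML itself. A minimal sketch, untested against your table:
-- Build the complete document as one xml value; save it from the client afterwards
-- (e.g. open the XML cell in SSMS and save it, or extract it with sqlcmd/bcp).
DECLARE @doc xml =
    (SELECT testData.*
     FROM testData
     FOR XML PATH('sample'), TYPE, ELEMENTS, ROOT('TestData'));
SELECT @doc AS TestDataXml;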
I'm successfully reading numbers from a .csv file into SQL Server with the statement below, given that I've created a linked server named CSV_IMPORT.
select *
from CSV_IMPORT...Sophos#csv
However, the problem is that if a number contains a comma, it shows up as NULL instead of the correct value. How can I read the "54,375" correctly into SQL Server? Thank you very much for your help.
Below is the data in the CSV file.
09/07/2017,52029,70813,10898,6691,6849,122,25,147427
09/08/2017,47165,61253,6840,5949,5517,75,2,126801
09/14/2017,"54,375","16944","15616","2592","3280",380,25,"96390"
This is the result from the statement:
2017-09-07 00:00:00.000 52029 70813 10898 6691 6849 122 25 147427
2017-09-08 00:00:00.000 47165 61253 6840 5949 5517 75 2 126801
2017-09-14 00:00:00.000 NULL 16944 15616 2592 3280 380 25 96390
One way to go would be to use a temporary table. Read all the data as text, then replace every comma in the whole table with a dot (.) if you want it as a decimal separator, or with an empty string ('') if it is a thousands separator, and then load the data into the existing table, converting everything (you don't have to do the conversion explicitly; SQL Server does it implicitly).
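A minimal sketch of that staging pattern (the staging and target table layouts here are made up, and it assumes the linked server can be made to hand the column over as raw text, for example via a schema.ini for the text driver):
-- Stage the values as plain text, strip the thousands separator, then convert on load.
CREATE TABLE #staging (SaleDate varchar(20), Amount varchar(20));

-- ... load #staging from the CSV with every column read as text ...

INSERT INTO dbo.Sales (SaleDate, Amount)
SELECT CONVERT(datetime, SaleDate, 101),          -- mm/dd/yyyy dates
       CAST(REPLACE(Amount, ',', '') AS int)      -- '54,375' -> 54375
FROM #staging;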
Last year I did a project for a client that involved importing CSV files which were meant to be in the same format but came from different sources and hence were inconsistent (even to the point of using different separators depending on the source). I ended up writing a CLR routine that read the CSV line by line, parsed the content, and added it to a DataTable. I then inserted this DataTable into SQL Server using the SqlBulkCopy class.
The advantage of this approach was that I was totally in control of dealing with all the anomalies in the file. It was also much faster than the alternative of inserting the whole file into a temporary table of varchars and then parsing within SQL Server. Effectively I did one line-by-line parse in C# and one bulk insert of the parsed data.
I have financial data for over 6,600 stocks stored in a FoxPro database. I was able to download the database views into a set of 15 files, which I did first as .dbf files and then as comma-delimited .txt files.
For the .dbf set of files I used the SpatiaLite virtualization extension with Python and SQLite to convert them into SQLite tables, then merged them into an 8-table database (let's call it DBF-derived). With c as the cursor:
c.execute("CREATE VIRTUAL TABLE temp_virt USING VirtualDbf({}, UTF-8)".format(file))
c.execute("CREATE TABLE {} AS SELECT * FROM temp_virt;".format(table_name))
For the .txt files, I used pandas to convert and combine 12 of the 15 files into 5 CSV files, then combined them with the remaining 3 .txt files in Python and SQLite to create an 8-table database (let's call it CSV-derived), using a modified version of this code (from this page):
import csv

# c is an open sqlite3 cursor; csvfile and tablename are defined elsewhere.
with open(csvfile, "r", newline="") as f:
    reader = csv.reader(f)
    header = True
    for row in reader:
        if header:
            # gather column names from the first row of the csv
            header = False
            sql = "DROP TABLE IF EXISTS %s" % tablename
            c.execute(sql)
            sql = "CREATE TABLE %s (%s)" % (tablename,
                  ", ".join(["%s text" % column for column in row]))
            c.execute(sql)
            for column in row:
                if column.lower().endswith("_id"):
                    index = "%s__%s" % (tablename, column)
                    sql = "CREATE INDEX %s on %s (%s)" % (index, tablename, column)
                    c.execute(sql)
            insertsql = "INSERT INTO %s VALUES (%s)" % (tablename,
                        ", ".join(["?" for column in row]))
        else:
            # data rows: insert using the statement built from the header
            c.execute(insertsql, row)
Now when I examined both SQLite databases, I found the following:
The DBF-derived database retained its ID column (although it was not designated as a primary key).
The ID column did not survive the export to .txt, so in the CSV-derived database I declared the stock-ticker column as the primary key.
The DBF-derived database was not indexed in SQLite.
The CSV-derived database got automatic indexing in SQLite.
Dates retained their date format in the CSV-derived database, whereas they turned into a number of days in the DBF-derived database.
The main data type that came through the virtualization process for the DBF-derived database was REAL, which is also the data type I set when creating the CSV-derived database.
Everything else was identical, except that the CSV-derived database was 22% smaller than the DBF-derived one, which puzzles me considering that it is indexed and has the same data and data types.
The two databases display the same information in the DB Browser program.
Is there any explanation for the difference in size? Is it because of the 3 .txt files that I did not convert to CSV?
It is hard to understand what you are doing, and particularly why you would ever want to use CSV as an intermediate step when you could get the data directly from the other database system. Anyway, it is your choice. The difference is probably due to the fact that VFP DBF character fields carry trailing spaces: a 30-character field holding a single letter still has a length of 30. Your conversion to SQLite might not be trimming those trailing spaces, while in a CSV file the values are already saved trimmed.
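If that is the cause, it is easy to verify and fix on the SQLite side. A rough sketch, with placeholder table and column names:
-- Check whether a text column still carries DBF padding (placeholder names).
SELECT COUNT(*) FROM prices WHERE ticker <> RTRIM(ticker);

-- Rebuild with trimmed text columns; repeat RTRIM for every character field.
CREATE TABLE prices_trimmed AS
SELECT RTRIM(ticker) AS ticker, close_price, volume
FROM prices;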
Probably the easiest and most reliable way would be to create the SQLite tables and fill them with data directly from within a VFP program (using VFP is not a must, of course; it could be done in any language).
SQL Server 2008 (but I have access to higher versions too).
I'm getting a string from another database on the same server. Using the code below, I get some data and replace part of the content:
INSERT INTO [DestinationDatabase].[DBO].[Table] (ID, XML)
SELECT ID, REPLACE(XML, 'ReferenceID="1234"', 'PropertyID="2468"')
FROM [SourceDatabase].[DBO].[Customers]
This works as expected, but every record has a different ReferenceID, so is there a way to strip out the current four-digit ReferenceID value (there are around 1,000 records with different values) and replace it with another four-digit value?
I will get the replacement value from another procedure, but at this stage I need to know whether it is possible to find the four digits, strip them out, and replace them.
If you want to use the REPLACE function, you can do it like this:
REPLACE(XML,'ReferenceID="'+cast(table.field as nvarchar)+'"','ReferenceID="2468"')
REPLACE(XML,'ReferenceID="'+cast(table.field as nvarchar)+'"','ReferenceID="'+cast(table.another_field as nvarchar)+'"')
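If the current four-digit value is not stored in a separate column, a pattern-based sketch could work instead (this assumes every row contains exactly one ReferenceID="nnnn" token; @newId stands in for the value your other procedure supplies):
-- Locate the existing four digits with PATINDEX and overwrite them in place with STUFF.
DECLARE @newId char(4) = '2468';
SELECT ID,
       STUFF([XML],
             PATINDEX('%ReferenceID="[0-9][0-9][0-9][0-9]"%', [XML]) + LEN('ReferenceID="'),
             4,
             @newId) AS PatchedXml
FROM [SourceDatabase].[DBO].[Customers];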
You could use the XML functions to do this, but it seems like your XML column is not of the xml data type. Is that correct?
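For completeness, if the column were stored as (or converted to) the xml data type, the same change could be made with XML DML. A sketch only, assuming a made-up structure in which ReferenceID is an attribute on a <Customer> root element:
-- Hypothetical: requires an xml-typed column and the correct path to the attribute.
DECLARE @newId varchar(4) = '2468';
UPDATE [SourceDatabase].[DBO].[Customers]
SET [XML].modify('replace value of (/Customer/@ReferenceID)[1]
                  with sql:variable("@newId")');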
I am exporting a file that is going to be picked up by another system. To avoid rework in the other system, I am trying to match an existing Excel CSV output exactly. I have a date column in the DB that I want to export as dd/mm/yyyy. In the data flow task I have the following SQL as the source, where I do the appropriate conversion. If I run this query in SSMS I get the right output.
SELECT [Code]
,[Agency_Name]
,[Region_Group]
,CONVERT( varchar(20), [GrossAmtYrly] , 1) GrossAmtYrly
,CONVERT ( varchar(20), [SaleDate] , 103) SaleDate
,[MemberNo]
,[Surname]
,[Scale]
FROM [Land].[Sales]
I then link this to a flat file destination; the column it is mapped to is set to DT_STR, width 20, not text qualified.
But the output file spits out the date in the format yyyy-mm-dd.
Similarly for GrossAmtYrly: the old Excel-generated CSV had the amount with commas after every 3 digits, wrapped in quotes. The output column it is mapped to is DT_STR, width 20, with text qualified set to true.
The output file for that column is missing the commas for GrossAmtYrly.
So it seems like my conversions in the SQL are being ignored completely, but I can't work out why.
Thanks in advance for any suggestions!
Using SSIS 2012 (Visual Studio 2010); the DB is SQL Server 2012.
I'd use a derived column in the data flow to convert it to the format you want. If it's coming in as a text field in the format yyyy-mm-dd, you can convert it to dd/mm/yyyy with the following expression:
SUBSTRING(dt,9,2) + "/" + SUBSTRING(dt,6,2) + "/" + SUBSTRING(dt,1,4)
Thanks Custodian, I figured out how to get it to work.
I double-clicked the flow arrow between the tasks, and the Metadata tab shows the data type of each column. When I first set this up I used the Table or View access mode, so the date and gross amount columns were set to DT_DATE and DT_CY, and I think SSIS was implicitly converting the columns back to their original types.
I couldn't work out how to change them, so I deleted the DB source and recreated it starting with the SQL command option, and now everything works as expected.
I am copying data from an Excel sheet to SQL Server tables.
In some of the sheets the data is bigger than the size allowed by the table's schema in SQL.
i.e. a table column has the data type nvarchar(50), whereas my Excel sheet has more than 50 characters in some of the cells.
While copying, the rows that contain such data are not being inserted into the database. Instead, I would like to insert those rows by truncating the extra characters. How do I do this?
You can use Java's substring method with a check on the length of the string, something like:
row1.foobar.length() > 50 ? row1.foobar.substring(0,50) : row1.foobar
This uses Java's String length method to test whether the value is longer than 50 characters. If it is, it uses the substring method to take the characters between 0 and 50 (so the first 50 characters); if it's not, it returns the whole string.
If you pop this in a tMap or a tJavaRow, you should be able to limit strings to 50 characters (or whatever length you want, with some tweaking).
If you'd prefer to remove any rows not compliant with your database schema then you should define your job's schema to match the database schema and then use a tSchemaComplianceCheck component to filter out the rows that don't match that schema.
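If the rows end up in a SQL Server staging table before the final load, the same cap can also be applied in T-SQL rather than in the job; a small sketch with made-up table and column names:
-- Truncate oversized values while copying from a hypothetical staging table.
INSERT INTO dbo.Target (Foobar)
SELECT LEFT(Foobar, 50)      -- keep only the first 50 characters
FROM dbo.Staging;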