How do I generate a primary key using SSIS? - sql-server

I'm mucking about with SSIS and can't figure out how to generate a primary key.
I have a very simple SSIS Data Flow:
Excel File Source -> Character Map -> ADO.NET Destination
The Excel file has the following structure:
Field1
The destination table has the following structure:
ID - UniqueIdentifier (GUID)
Field1
There's no problem mapping Field1, but how do I get ID to map to the SQL Server NEWID() function (or something equivalent, like the .NET method System.Guid.NewGuid())?

Create a DEFAULT constraint on the ID column that calls NEWID(). In SQL Server Management Studio, open the table in design view, select the ID column, and set its Default Value or Binding property to NEWID(). Save the table.
In SSIS, map only Field1; SQL Server will then generate the ID automatically on every insert.
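If you prefer not to use the table designer, the same default can be added with T-SQL; the table and constraint names below are just examples:

```sql
-- Add a default so SQL Server generates a GUID for ID on every insert
-- (dbo.Destination and DF_Destination_ID are example names).
ALTER TABLE dbo.Destination
    ADD CONSTRAINT DF_Destination_ID DEFAULT NEWID() FOR ID;
```

With the default in place, an insert that supplies only Field1 gets a fresh GUID automatically.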

Related

How can I split a row of data and insert each row (but different columns) into two tables (with an FK relationship) in SSIS?

I have two tables in SQL Server:
Person
ID (PK, int, IDENTITY)
Name (varchar(100))
UploadedBy (varchar(50))
DateAdded (datetime)
PersonFile
ID (PK, int, IDENTITY)
PersonId (FK, int)
PersonFile (varchar(max))
I am reading in a large file (150 MB), and I have a script component that successfully parses it into several columns. The issue is that I need to insert the first three columns of each parsed row into my Person table first, then use the ID of that row to insert the final column into my PersonFile table. Is there an easy way to do this in SSIS?
I suppose I could script everything out to handle the inserts in the database, but in that case I might as well skip SSIS altogether and use PowerShell. I also thought about writing a stored procedure in SQL Server and passing the information to it to handle the inserts. But again, this seems very inefficient.
What's the best way to insert a row of data into two tables when one of them has a foreign key constraint?
I think the best way is to use a staging table in the database to hold the parsed source file, and then use stored procedures or SQL queries to load your tables. There is a Lookup component in SSIS that could be used here, but I avoid it for various reasons.
Create a table resembling the source file, something like:
CREATE TABLE dbo.[SourceFileName] (
    Name nvarchar(100) NULL,
    UploadedBy nvarchar(50) NULL,
    DateAdded datetime NULL,
    PersonFile nvarchar(max) NULL
);
Truncate the staging table, use a Data Flow task to load the source data into it, then use scripts or stored procedures to insert the data into your destination tables (load Person first, then PersonFile).
For the Person insert, do something like:
INSERT INTO dbo.Person (Name, UploadedBy, DateAdded)
SELECT Name, UploadedBy, DateAdded
FROM dbo.SourceFileName;
For the PersonFile insert, join back to the destination table:
INSERT INTO dbo.PersonFile (PersonId, PersonFile)
SELECT
    Person.ID,
    SourceFile.PersonFile
FROM dbo.SourceFileName AS SourceFile
JOIN dbo.Person AS Person
    ON Person.Name = SourceFile.Name;
You should also add a UNIQUE constraint to the column that identifies the person (Name, for example).
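For example (the constraint name here is just an assumption):

```sql
-- Enforce that the join column used above uniquely identifies a person.
ALTER TABLE dbo.Person
    ADD CONSTRAINT UQ_Person_Name UNIQUE (Name);
```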
One very common thing to do would be to stage the data first.
So you insert all columns into a table on the server, which also has an extra nullable column for the PersonID.
Then you'd have a stored procedure that inserts unique Person records into the Person table and updates the staging table with the resulting PersonID. That PersonID is the extra field you need for the PersonFile insert, which can then be performed either in the same procedure or in another one. (You'd call these procedures from SSIS with an Execute SQL Task.)
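A sketch of that procedure, assuming the staging table (called dbo.Staging here, with the extra nullable PersonID column described above) and the Person/PersonFile tables from the question:

```sql
CREATE PROCEDURE dbo.LoadPersonAndFile
AS
BEGIN
    SET NOCOUNT ON;

    -- Insert people that are not in dbo.Person yet.
    INSERT INTO dbo.Person (Name, UploadedBy, DateAdded)
    SELECT DISTINCT s.Name, s.UploadedBy, s.DateAdded
    FROM dbo.Staging AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Person AS p WHERE p.Name = s.Name);

    -- Write the generated identity values back to the staging rows.
    UPDATE s
    SET s.PersonID = p.ID
    FROM dbo.Staging AS s
    JOIN dbo.Person AS p ON p.Name = s.Name;

    -- The second insert can now use the captured keys.
    INSERT INTO dbo.PersonFile (PersonId, PersonFile)
    SELECT s.PersonID, s.PersonFile
    FROM dbo.Staging AS s;
END;
```

This matches rows back to Person by Name, so it relies on Name being unique, as the previous answer also notes.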
I suppose this could possibly be done purely in SSIS, for example with a Script destination that performs an insert and retrieves the PersonID for a second insert, but I'm fairly sure performance would take a huge hit with that approach.

Using pandas to_sql to append data frame to an existing table in sql server gives IntegrityError

I tried to append my pandas dataframe to an existing table in SQL Server, like below. All the column names in the dataframe are identical to those in the database table.
df.to_sql(table_name, engine, schema_name, index=False, method='multi', if_exists='append', chunksize=100)
But it failed and I got error like below:
IntegrityError: ('23000', "[23000] [Microsoft][ODBC Driver 17 for SQL Server]
[SQL Server]Cannot insert explicit value for identity column in table 'table_name'
when IDENTITY_INSERT is set to OFF. (544) (SQLParamData)")
I have no clue what that means or what I should do to make it work. It looks like the issue is that IDENTITY_INSERT is set to OFF? I'd appreciate it if anyone could help me understand why, and what I can do about it. Thanks.
In layman's terms, the dataframe contains values for the table's identity (primary key) column, and the insert is rejected because IDENTITY_INSERT is set to OFF. This means the primary key is meant to be generated by the database itself. Another possibility is that primary keys in the dataframe duplicate keys already in the table; you cannot insert duplicate primary keys.
You have two options:
First: check which column in the database table is the primary key or identity column; once identified, remove that column from your dataframe and then try saving it to the database again.
Second: turn identity inserts on with SET IDENTITY_INSERT Table1 ON and try again.
If your dataframe doesn't contain unique primary keys, you may still get another error.
If you get an error after trying both options, please update your question with the table schema and the dataframe contents from df.head(5).
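A minimal sketch of the first option, using SQLite to stand in for SQL Server (the table and column names here are invented for the example):

```python
import sqlite3

import pandas as pd

# An AUTOINCREMENT column stands in for the SQL Server identity column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)"
)

df = pd.DataFrame({"ID": [10, 20], "Name": ["Alice", "Bob"]})

# Drop the identity column so the database generates the keys itself.
df.drop(columns=["ID"]).to_sql("people", conn, index=False, if_exists="append")

rows = conn.execute("SELECT ID, Name FROM people").fetchall()
print(rows)  # the database assigned IDs 1 and 2, not 10 and 20
```

The same `drop(columns=[...])` call before `to_sql` is what the first option amounts to against SQL Server.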

POSTGRES - INSERT INTO FOREIGN TABLE

I created a table on an external server:
CREATE FOREIGN TABLE external_table (
    field_1 varchar(15) NULL,
    field_2 int4 NULL
)
SERVER server_name
OPTIONS (compression 'pglz', stripe_row_count '500000');
Now I want to insert into external_table, but if I run this query
INSERT INTO external_table (field_1, field_2) VALUES ('test',1);
it returns this error:
ERROR: operation is not supported
How can I add records to a foreign table?
I've tried with the following insert
INSERT INTO external_table (field_1, field_2) select 'test',1;
It works, but in my case I can't use an INSERT INTO ... SELECT statement.
It looks like the extension you are using supports INSERT INTO ... SELECT but not direct (VALUES-based) inserts.
You should probably mention which extension you are using when asking this kind of question.
PS: It looks like the extension you are using is cstore_fdw. It does not support direct inserts because they would completely cancel the benefits of columnar storage and add extra overhead. If you are using cstore_fdw, use bulk inserts instead of single-row ones. Inserting into a regular table and moving the data into the cstore_fdw table once it reaches a certain size (i.e. stripe_row_count rows) is a much better option.
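The buffering pattern described above might look like this (buffer_table is an assumed name; when to flush depends on your stripe_row_count setting):

```sql
-- Single-row inserts go to an ordinary table first.
INSERT INTO buffer_table (field_1, field_2) VALUES ('test', 1);

-- Once enough rows have accumulated, move them in one bulk statement,
-- then clear the buffer.
INSERT INTO external_table (field_1, field_2)
SELECT field_1, field_2 FROM buffer_table;

TRUNCATE buffer_table;
```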

How to create identity column when importing data from Excel into MS SQL Server (with Import and Export Wizard)?

I need to import a large amount of data from Excel into MS SQL Server using the Import/Export Wizard, and I'll continue importing more data into the same table on a weekly basis.
The bad thing is that my Excel data doesn't have an identity column to use as a primary key. The only option with what's available is to use two string columns as a composite primary key, which is not a good idea.
Is there a way for SQL Server to add an auto-identity (integer) column when importing the data, and what's the trick? I'd prefer such a column to be added automatically, because I'll be importing a large amount of data into the same table every week.
I tested a couple of times (with no success) and looked for a solution on the internet, but didn't find an answer to this particular question. Thanks in advance!
You can create the table first along with the new identity column.
CREATE TABLE YourTable
(
    id INT IDENTITY,
    col1....
    col2....
    col3....
    PRIMARY KEY (id)
)
Then run the import/export wizard. When you get to the destination section pick your newly created table and map all the fields except the identity column. After the import you can check the table and see the id column has been populated.
The column names in the Excel sheet should be the same as those in the SQL table.
Map the Excel columns to the SQL table columns by clicking Edit Mappings.
Just don't map the (identity) column of the SQL table to anything.
In the Import/Export Wizard, don't check the Enable identity insert checkbox (leave it unselected).
Then go ahead and import. This worked for me.
Previously, when I checked Enable identity insert, it gave me an error.
I had a similar issue. I have a SQL table with an identity column (auto increment ID value) defined. I needed to import an Excel spreadsheet into this table.
I finally got it to work via the following:
Do NOT add a column to the Excel spreadsheet for your identity column in the SQL table.
When you run the import wizard and get to the Edit Mappings step, do NOT select the Enable identity insert checkbox. This, in particular, was tripping me up.

Is it possible to bring a GUID from SQL Server in PostgreSQL using a Foreign Table?

I am trying to pull in data from a table in SQL Server into PostgreSQL using a Foreign Table. Example:
CREATE FOREIGN TABLE test (
    testguid text,
    name text
)
SERVER test_server
OPTIONS (schema_name 'dbo', table_name 'test');
My problem is that the first field, testguid, is a GUID (uniqueidentifier) data type in SQL Server, something like ca902d1e-1082-e711-80bf-005056bdd6cd, and it comes into PostgreSQL as blank. The other, non-GUID fields come through correctly.
Does anybody have a solution for doing this?
