Replication Snapshot Agent file naming - sql-server

When SQL Server Snapshot Agent creates a snapshot (for transactional replication), there's a bunch of .PRE, .SCH, .BCP, and .IDX files, usually prefixed with the object name, a sequence number and part number. Like MY_TABLE_1#1.bcp for MY_TABLE.
But when table names are a little longer like MY_TABLE_IS_LONG it can name the files like MY_TABLE_IS_LO890be30c_1#1.
I want to process some of these files manually (i.e. grab a snapshot and process the BCPs myself) but that requires the full name of the table, and I haven't been able to find where that hex number is created from or stored. They don't appear to be a straight object_id, and I've checked various backing tables in the distribution and publication databases where the tables have an objid and sycobjid and it's neither of those either (after converting hex to decimal).
Does anyone know where that number comes from? It must be somewhere.

It appears they're just random. When the snapshot is generated, a set of commands is placed into the distribution database (you can see them with EXEC sp_browsereplcmds). These commands hardcode the table name along with the script names and the order in which to run them.
When you run the distribution agent for the first time, it gets those replicated commands, and these instruct it to run all the scripts (alternately, if you've got it set to replication support only, I suspect these commands are just ignored).
In order to process the scripts semi-automatically you'd need to grab everything from replcmds (hopefully on a quiet system) and parse the commands before running them manually.
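As a small aid for handling the files by hand, the naming pattern described in the question can be split apart in Python. This is only a sketch: the regular expression is inferred from the two example names in the question (MY_TABLE_1#1.bcp and MY_TABLE_IS_LO890be30c_1#1), and the function name parse_snapshot_name and the "eight trailing hex digits" heuristic are assumptions, not documented behaviour. It cannot recover the full table name; for that you still need the commands shown by sp_browsereplcmds.

```python
import re

# Inferred from the question's examples: <prefix>_<sequence>#<part>[.<ext>]
NAME_RE = re.compile(
    r"(?P<prefix>.+)_(?P<seq>\d+)#(?P<part>\d+)(?:\.(?P<ext>\w+))?"
)

# Heuristic (assumption): a shortened name ends in eight lowercase hex digits.
HEX_TAIL_RE = re.compile(r"(?P<stem>.+?)(?P<tail>[0-9a-f]{8})")


def parse_snapshot_name(filename):
    """Split a snapshot file name into its parts, or return None if it
    does not follow the inferred pattern."""
    match = NAME_RE.fullmatch(filename)
    if match is None:
        return None
    prefix = match.group("prefix")
    tail = HEX_TAIL_RE.fullmatch(prefix)
    return {
        "prefix": prefix,
        "sequence": int(match.group("seq")),
        "part": int(match.group("part")),
        "extension": match.group("ext"),
        # Only a guess: a real table name could also end in hex digits.
        "looks_truncated": tail is not None,
        "stem": tail.group("stem") if tail else prefix,
        "hex_suffix": tail.group("tail") if tail else None,
    }
```

Anything flagged with looks_truncated would then need to be matched against the command text from the distribution database, since the stem alone is ambiguous.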

Related

What is the most efficient way to remove a space from a string when importing a csv file to SQL Server using SSIS?

I will be importing records from thousands of CSV files using SSIS. These CSV files will contain a Postal Code column, which has the format A5A 5A5, where "A" is any letter and "5" is any number from 0 to 9.
There is a space between the "A5A" and "5A5" that I want to remove, so that all Postal Codes appear as "A5A5A5".
I am reviewing the documentation and see several options, and I'm trying to narrow down the best one, i.e. the one that requires the least number of steps. So far I am looking at the Derived Column transformation, but that would involve adding another column to my SQL Table.
Is there a way I can trim the space without having to add an extra column?
As @Larnu answers via comments, a Derived Column is likely the most appropriate component to use here.
The expression you're looking for is a REPLACE. Syntax ought to be
REPLACE([PostalCode], " ", "")
You have 10 columns from your CSV. The Derived Column can either replace an existing column or add a new column to the row buffer. I would advocate adding a new column, PostalCodeStripped or something like that. At some point, something weird is going to happen with the data and you'll get an A5A 5A5 that didn't get the space stripped. Having both the original and the parsed value available in debugging can help sort out problems (oh, this has a non-breaking space or a tab instead of a space, or in addition to it).
But, just because a column is in the buffer does not mean you need to create a column for that in the destination table. Just unmap the PostalCode from the row buffer and map PostalCodeStripped to the PostalCode column in the database. You'll see what I'm talking about in the destination component. By default, they'll map based on name matching but you're welcome to wire them up however you see fit.
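To illustrate why keeping the original value around helps, here is a minimal Python sketch (not SSIS) of the failure mode mentioned above: replacing only the plain space leaves a non-breaking space or tab in place. The function names and the validation pattern are assumptions for illustration; the pattern simply encodes the A5A5A5 shape from the question.

```python
import re

# Letter-digit-letter-digit-letter-digit, per the format in the question.
POSTAL_RE = re.compile(r"[A-Za-z][0-9][A-Za-z][0-9][A-Za-z][0-9]")


def strip_plain_space(value):
    """Same idea as REPLACE(col, " ", ""): removes U+0020 only."""
    return value.replace(" ", "")


def strip_all_whitespace(value):
    """Removes every whitespace character, including tabs and
    non-breaking spaces, by splitting on whitespace and rejoining."""
    return "".join(value.split())


def is_clean(value):
    """True when the value has the expected six-character shape."""
    return POSTAL_RE.fullmatch(value) is not None


def find_suspects(raw_values):
    """Return the raw values that are still malformed after the
    plain-space replacement, so they can be inspected."""
    return [v for v in raw_values if not is_clean(strip_plain_space(v))]
```

Running find_suspects over a sample of raw values is a quick way to see whether the simple REPLACE is enough for your files before deciding how to build the package.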
ELT is an alternate option. Bulk load the data into a staging table, then do a simple select into the destination to do the transformation. I might be tempted to not use SSIS. BCP or Import-DbaCsv (DBATools PowerShell module) would both be quick alternatives. If you know PowerShell and want to process the files in a pipe, you can pipe the files into Import-DbaCsv. The PowerShell script can also execute Invoke-DbaQuery to run update or insert queries to do the transformation.
SSIS can also just do the bulk load and then run the T-SQL to do the transformations. I don't like the overhead of maintaining and upgrading SSIS packages. I'd take T-SQL jobs over SSIS jobs any day. (We have about 1/2 year for a FTE to upgrade our SSIS packages to SQL 2019. The T-SQL jobs just keep working when moved to a new version.)
Or go the ETL route and do the transformation in the SSIS data flow. A Derived Column transformation between a flat file source and an OLE DB destination should do the trick.
To handle multiple files, you can use the Foreach Loop Container. There's an enumerator for files using a wildcard path. (The initial T-SQL task just truncates the table for testing.)
You'll need to parameterize the thing to get the file source to be each file.
For PowerShell it might be something like (no transformation yet) the script below.
Get-ChildItem 'C:\TestFolder\*.csv' |
import-dbacsv -SqlInstance 'localhost\DEV' -Database 'Test' -Schema 'dbo' -Table 'Test' -AutoCreateTable -verbose
If you run this in the ISE, be aware of a bug where the connection might not be released after calling import-dbacsv that will cause it to hang. This is not an issue in the command line from what I can tell. (If this happens to you, you might have to kill the ISE process - closing it is not enough.)

Unit testing SQL scripts

Consider the script below, stored in a .sql file:
if not exists (select * from sys.tables where name='abc_form')
CREATE TABLE abc_forms (
x BIGINT IDENTITY,
y VARCHAR(60),
PRIMARY KEY (x)
)
The script above has a bug in the table name: the existence check looks for abc_form, but the table created is abc_forms.
For programming languages like Java or C, the compiler helps catch most name resolution errors.
For a SQL script, how should one approach unit testing it? Static analysis...?
15 years ago I did something like you request via a lot of scripting. But we had special formats for the statements.
We had three different kinds of files:
One SQL file to setup the latest version of the complete database schema
One file for all the changes to apply to older database schemas (custom format like version;SQL)
One file for SQL statements the code uses on the database (custom format like statementnumber;statement)
It was required that every statement was on one line so that it could be extracted with awk!
1) First I set up the latest version of the database by executing one statement after the other and logging the errors to a file.
2) Secondly I did the same for all changes to have a second schema
3) I compared the two database schemas to find any differences
4) I filled in some dummy test values in the complete latest schema for testing
5) Last but not least I executed every SQL statement against the latest schema with test data and logged every error again.
In the end the whole thing ran every night, and there was no morning without new errors that one of the 20 developers had put into version control. But it saved us a lot of time during the next install at a new customer.
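Step 5 can be sketched in a few lines of Python. This is an illustration under assumptions, not the original tooling (which used awk): SQLite stands in for the real database so the example is self-contained, the one-statement-per-line statementnumber;statement format is taken from the description above, and run_statements is a made-up name.

```python
import sqlite3


def run_statements(conn, lines):
    """Execute each 'number;statement' line against conn and collect
    (number, error message) pairs for the statements that fail."""
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first semicolon only; the rest is the SQL text.
        number, _, statement = line.partition(";")
        try:
            conn.execute(statement)
        except sqlite3.Error as exc:
            errors.append((number, str(exc)))
    return errors


def nightly_check(schema_sql, statement_lines):
    """Build the latest schema in a scratch database, then try every
    statement the application uses and return the failures."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        return run_statements(conn, statement_lines)
    finally:
        conn.close()
```

A statement that refers to a misspelled table fails at this stage rather than at a customer site, which is the point of the nightly run.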
You could also generate the SQL scripts from your code.
Code first avoids these kinds of problems. Choosing between code first or database first usually depends on whether your main focus is on your data or on your application.
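For the specific bug in the question, a very narrow static check is also possible. The sketch below is an assumption-laden example rather than a general T-SQL parser: it only recognises an "if not exists (... name='X') CREATE TABLE Y" pair written the way the question writes it, compares the two names case-insensitively, and uses a hypothetical function name, find_guard_mismatches.

```python
import re

# Matches a guard of the form used in the question, followed directly
# by the CREATE TABLE it protects.
GUARD_RE = re.compile(
    r"if\s+not\s+exists\s*\(.*?name\s*=\s*'(?P<guard>[^']+)'\s*\)\s*"
    r"create\s+table\s+(?P<created>[\w.\[\]]+)",
    re.IGNORECASE | re.DOTALL,
)


def find_guard_mismatches(sql_text):
    """Return (guard_name, created_name) pairs where the existence
    check and the CREATE TABLE refer to different table names."""
    mismatches = []
    for match in GUARD_RE.finditer(sql_text):
        guard = match.group("guard")
        # Drop an optional schema prefix and square brackets.
        created = match.group("created").split(".")[-1].strip("[]")
        if guard.lower() != created.lower():
            mismatches.append((guard, created))
    return mismatches
```

A check like this catches only the one pattern it was written for, so it complements rather than replaces running the scripts against a scratch database.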

Why script generated by SSMS shown in red is different from script stored in system tables

Why is the script generated by SSMS (shown in red) different from the script stored in the system tables? Please notice the stored procedure names in the query, the query result and Object Explorer.
All of these methods give me the same script:
sys.sql_modules
OBJECT_DEFINITION
sp_helptext
However, generating the script from SSMS (right click -> Script as Create or Modify) gives a different script.
How is it possible that the scripts differ?
The answer can be confusing.
The stored procedure getBudgets4programManager2 was renamed (very likely using sp_rename, https://msdn.microsoft.com/en-us/library/ms188351.aspx), so the original definition does not match the new name. By the way, notice that the definition stored in metadata always uses CREATE as the DDL command, even after an ALTER PROCEDURE statement has been issued.
At the same time, SSMS scripting features will not simply get the definition from metadata as it has an object representation of the stored procedure, it will normalize the schema name & object name, and it may also normalize the DDL command accordingly (CREATE/ALTER). Notice that the schema is showing it is normalized (i.e. [dbo]), and that the current name is also normalized.
As for why the metadata definition is not updated when you rename the object, the answer is not 100% clear, but such a change would affect any features in the SQL Server engine that rely on the definition, including the WITH ENCRYPTION option on ALTER/CREATE PROCEDURE as well as the verification of digital signatures.
As far as I know, other elements in both versions of the scripts should remain intact (comments, blank spaces, etc.).
I hope this information helps.

Better way to store updatable scientific data?

I am using a file consisting of published scientific data. I'm using this file with a program that reads in the first 5 space delimited data fields, and everything after that is considered a comment by the program.
2 example lines (of thousands):
FeII 1608.4511 0.521 55.36 -1300 M03 Journal of Physics
FeII 1611.23045 0.0321 55.36 1100 01J AJ
The program reads it as:
FeII 1608.4511 0.521 55.36 -1300
FeII 1611.23045 0.0321 55.36 1100
These numbers are each measurements and most (don't get me started) have associated errors that are not listed in this file. I would like to store this information in a useful and updatable way. That is, say the first entry FeII 1608.4511 has an error of plus/minus 0.002. Consider when a new measurement is made and changes it to: FeII 1608.45034 plus/minus 0.0005. I would like to update the value, the error, and record some information about the publication that it came from.
The program that uses this file is legacy code and is both crucial and inflexible: and it needs the file to look like the above output when it's read in. I would really like for there to be a way to update the input file to include things like errors on the values and publication hyperlinks in comments. I would also like a kind of version control ability to return the state of this large file today; or in 5 months after 20 more lines are updated with new values.
Any suggestions on how best to accomplish this? Should I store everything in some kind of database?
Databases are deeply tied to identity. If a database can't identify a row by the data that's in it, a database isn't going to help you.
If I were you, I'd start by storing the base file in a version control system, not a database. At 20 changes per 5 months, I'd probably make those changes manually and commit each batch of changes. (I don't know what might constitute a batch for you. Could be a single change every time.)
Since the format of the existing file is both crucial and brittle, I'm not sure whether modifying it is a good idea. I think I'd feel better about storing error ranges and publication hyperlinks in a separate file, and using a script to put the pieces together for applications that can use error ranges and hyperlinks.
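A minimal sketch of such a script, in Python, might look like the following. The split into five fields plus a comment follows the question's description of how the legacy program reads a line. The side-file structure, the key (first two fields) and the names split_line and combine are assumptions made for illustration. Note that keying on the measured value itself is fragile, because the key changes whenever the value is updated, which is exactly the identity problem raised above.

```python
def split_line(line):
    """Split a data line the way the legacy program does: the first
    five whitespace-delimited fields, then everything else as comment."""
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError("expected at least five fields: %r" % line)
    comment = parts[5].strip() if len(parts) == 6 else ""
    return parts[:5], comment


def combine(lines, sidecar):
    """Join the untouched legacy lines with a side table of errors and
    links. `sidecar` maps (field1, field2) to a dict that may contain
    'error' and 'link'."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        fields, comment = split_line(line)
        extra = sidecar.get((fields[0], fields[1]), {})
        records.append({
            "fields": fields,
            "comment": comment,
            "error": extra.get("error"),
            "link": extra.get("link"),
        })
    return records
```

The legacy file is only ever read here, so the crucial program keeps seeing exactly the format it expects, while newer tools consume the combined records.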
A database sounds sensible, SQL Server Express is free and widely used.
You can read in the text file including all comments and output the edited data in the same format. You can use a number of front ends including Access, for rapid development, or something you create yourself in VB.Net, or even Excel, at a pinch.
You will need to consider the structure of the table(s) but it should not be too difficult, and you can get help here.
For updating the information in the file introducing errors and links, you don't need any database; just open the file, iterate through the lines and update each one.
If you want to be able to restore a line's state, you definitely need some kind of database. You can create a database in SQL Server or Firebird, for example, and store in it a row for each historical state of a line (with its creation date, of course). Your file itself would be the repository for current values, and you would be able to restore the file as of a given date with some simple fetching of the database information.
If you can't use a database like Firebird or SQL Server, you can store the historical data in a simple text file; it's up to you. Just remember that you will necessarily need, as @CatCall commented, a way to identify each line in order to create a relation between the line in the file and the historical data stored in your repository.

What FoxPro data tools can I use to find corrupted data?

I have some SQL Server DTS packages that import data from a FoxPro database. This was working fine until recently. Now the script that imports data from one of the FoxPro tables bombs out about 470,000 records into the import. I'm just pulling the data into a table with nullable varchar fields so I'm thinking it must be a weird/corrupt data problem.
What tools would you use to track down a problem like this?
FYI, this is the error I'm getting:
Data for source column 1 ('field1') is not available. Your provider may require that all Blob columns be rightmost in the source result set.
There should not be any blob columns in this table.
Thanks for the suggestions. I don't know for sure whether it is a corruption problem. I just started downloading FoxPro from my MSDN Subscription, so I'll see if I can open the table. SSRS opens the table, it just chokes before running through all the records. I'm just trying to figure out which record it's having a problem with.
Cmrepair is an excellent freeware utility to repair corrupted .DBF files.
Have you tried writing a small program that just copies the existing data to a new table?
Also,
http://fox.wikis.com/wc.dll?Wiki~TableCorruptionRepairTools~VFP
My company uses Foxpro to store quite a bit of data... In my experience, data corruption is very obvious, with the table failing to open in the first place. Do you have a copy of foxpro to open the table with?
At 470,000 records you might want to check to see if you're approaching the 2 gigabyte limit on FoxPro table size. As I understand it, the records can still be there, but become inaccessible after the 2 gig point.
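One way to check this without FoxPro installed is to read the table header directly. The Python sketch below assumes the usual DBF header layout (record count as a 32-bit little-endian integer at byte 4, header length and record length as 16-bit integers at bytes 8 and 10); the function names are made up for the example, and the 2 gigabyte figure is the one quoted above.

```python
import os
import struct

TWO_GIB = 2 * 1024 ** 3  # the 2 gigabyte limit mentioned above


def read_dbf_header(path):
    """Return (record_count, header_length, record_length)."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if len(head) < 12:
        raise ValueError("file too short to contain a DBF header")
    # Byte 0: file type, bytes 1-3: last update date, then the sizes.
    _ftype, _yy, _mm, _dd, count, header_len, record_len = struct.unpack(
        "<4BIHH", head
    )
    return count, header_len, record_len


def record_offset(n, header_len, record_len):
    """Byte offset at which record n (1-based) starts."""
    return header_len + (n - 1) * record_len


def size_report(path):
    """Compare the size implied by the header with the size on disk.
    The two may differ by one byte if an end-of-file marker is present."""
    count, header_len, record_len = read_dbf_header(path)
    expected = header_len + count * record_len
    return {
        "records_in_header": count,
        "expected_bytes": expected,
        "actual_bytes": os.path.getsize(path),
        "exceeds_limit": expected >= TWO_GIB,
    }
```

If record_offset for the record where the import fails lands near TWO_GIB, the size limit is a more likely culprit than corrupt data.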
@Lance:
if you have access to Visual FoxPro command line window, type:
SET TABLEVALIDATE 11
USE "YourTable" EXCLUSIVE && If the table is damaged VFP must display an error here
PACK && To reindex the table and remove records marked for deletion
PACK MEMO && If you have memo fields
After doing that, the structure of the table must be valid. If you want to see fields with invalid data, you can try:
SELECT * FROM YourTable WHERE EMPTY(YourField) && All records with YourField empty
SELECT * FROM YourTable WHERE LEN(YourMemoField) > 200 && All records with a long memo field, there can be corrupted data
etc.
Use Repair Databases from my site (www.shershahsoft.com) for FREE (and it will always be FREE).
I designed this program to repair damaged FoxPro/FoxBase/dBase files. The program is very quick: it will repair a 1 GB table in less than a minute.
You can assign files and folders to the program. When you start the program it marks all the corrupted files, and clicking the Repair or Check and Repair button repairs them. Moreover, it creates a "CorruptData" folder in each folder where the actual data exists and keeps copies of the corrupt files there.
One thing to keep in mind: always run Windows CheckDsk on the drives where you store the files. When records are being copied to a table and a power failure occurs, lost clusters are left behind, which Windows converts to files during CheckDsk. After that, Repair Databases will do the job for you.
I have used many paid and free programs that repair tables, but all such programs leave extra records in the tables with ambiguous characters (and they are time consuming too). The programmer needs to find and delete such records manually. Repair Databases actually recovers the original records, so you need no further action other than reindexing your files.
During the repair process a File Open dialog sometimes appears, asking you to locate the compact index file for a table with indexes. You may cancel the dialog at that point; the table will still be repaired, but you will need to reindex the file later. (This dialog may appear several times, depending on the number of corrupted indexes.)
