Detect hijacks with only content change in ClearCase snapshot view

Normally, an update of a snapshot view detects hijacks by examining file size and timestamp. Is there a way in ClearCase to detect a file whose size and timestamp are unchanged but whose content has changed?

This isn't taken into account by ClearCase, since it assumes that, if the content has changed, the timestamp also has.
See "How the update operation determines whether a file is hijacked":
When a version is loaded into a snapshot view, the file size and last-modified time stamp (as reported by the UNIX® or Windows® file system) are recorded in the view database.
These values are modified each time you check out a file, check in a file, or load a new version into the view.
The update operation
When you update a view, the current size and last-modified time stamp of a non-checked-out file are compared with the size and time stamp recorded in the view database.
If either value is different from the value in the view database, the file is considered hijacked.
Changing only the read-only permission (on UNIX systems) or attribute (on Windows systems) of a non-checked-out file does not necessarily mean that the file is considered hijacked.
The content isn't taken into account here.
The only time I ran into this case, I simply created another snapshot view and fired up a diff tool (WinMerge, KDiff3, BeyondCompare, ...) to compare the contents of the two snapshot views; a scripted equivalent is sketched below.
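If a graphical diff tool is inconvenient (for instance on a build server), the same comparison can be scripted. This is a minimal sketch, assuming a second, freshly loaded snapshot view sits next to the suspect one; the two view paths are placeholders:
import filecmp
import os

# paths of the suspect view and of a freshly loaded reference view (placeholders)
view_a = r'C:\views\myview_snapshot'
view_b = r'C:\views\myview_snapshot_fresh'

for root, _, files in os.walk(view_a):
    for name in files:
        path_a = os.path.join(root, name)
        rel = os.path.relpath(path_a, view_a)
        path_b = os.path.join(view_b, rel)
        # shallow=False forces a byte-by-byte comparison, so files whose size and
        # timestamp are identical but whose content differs are still reported
        if os.path.isfile(path_b) and not filecmp.cmp(path_a, path_b, shallow=False):
            print('content differs:', rel)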

Related

Is there a faster way to delete the first x rows from a DBF?

I have been trying to use CDBFLite to delete records of a DBF file from records 1 to 5 million or so (in order to decrease the filesize). Due to factors beyond my control, this is something I will have to do every day. The filesize exceeds 2 GB.
However, it takes forever to run the delete commands. Is there a faster way to just eliminate the first X records of a DBF (and thus result in a smaller file size)?
As noted by Ethan, a .DBF file typically caps at the standard 32-bit capacity of 2 GB per single file, unless you are dealing with another engine such as Sybase's Advantage Database Server, which can read/write .DBF files and exceed the 2 GB limit.
That said, the standard DBF format keeps a single character on each record as a "deleted" flag, yet the record still occupies its space. To reduce the size, you need to PACK the file, which actually REMOVES the deleted records and thus shrinks the file back down.
Ethan has options via Python; I can offer more via C#.NET and the Microsoft Visual FoxPro OLE DB Provider, but I don't know what you have access to.
If you have VFP (or dBASE) directly, it should be as simple as getting to the command window and doing:
USE [YourTable] exclusive
pack
But I would make a backup copy of the file first as a simple precaution.
Here's a very rough outline using my dbf package:
import dbf
import shutil
database = r'\some\path\to\database.dbf'
backup = r'\some\backup\path\database.backup.dbf'
# make backup copy
shutil.copy(database, backup)
# open the copy
backup = dbf.Table(backup)
# overwrite the original with a new, empty table of the same structure
database = backup.new(database)
# copy over the last xxx records
with dbf.Tables(backup, database):
    for record in backup[-10000:]:
        database.append(record)
I suspect copying over the last however many records you want will be quicker than packing.

Does the stream_id change if I move the file to some other directory within the same filetable?

I am using MSSQL 2012 and its FileTable feature to store a large number of files in hierarchical directories. I reference the entries of the FileTable from other custom tables via the column stream_id, which is unique for every record in the FileTable. Sometimes I need to move files to another location on the same FileTable. So far I had noticed that the stream_id does not change when I move a file to another directory. However, now in the production environment the stream_id does change after the move, so my custom table ends up referencing a non-existent entry in the FileTable.
For moving the files I am using File.Move(source, target);
Is there something wrong with the deployment of the FileTable in my production environment, or is it just a feature that the stream_id can sometimes change when I change the location?
I haven't found any reference on the internet regarding the stream_id and its lifetime.
The stream_id is the pointer to the blob, the path_locator is the PK of the table. The former refers to the file no matter where it is in the system, the latter refers to whatever file is currently occupying the space at that path location.
My understanding is that the stream_id will not change for the life of that file. If you are seeing the "same" file with a different stream_id, consider whether an application (like MS Word), created a temp file then renamed the temp file to the current file when you saved. This would result in the new file having the same path_locator but a different stream_id.
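One way to check which of the two is happening is to record the stream_id and path_locator of the file before and after the move. Below is a minimal sketch with pyodbc; the connection string, database name, FileTable name (Documents) and file name are placeholders, while stream_id, name and path_locator are standard FileTable columns:
import pyodbc

# connection string is an assumption; adjust driver/server/database to your setup
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;'
    'DATABASE=MyDb;Trusted_Connection=yes;'
)
cursor = conn.cursor()

def locate(file_name):
    # returns (stream_id, path) for the first file with that name (sketch only)
    cursor.execute(
        'SELECT stream_id, path_locator.ToString() '
        'FROM dbo.Documents WHERE name = ?', file_name)
    return cursor.fetchone()

before = locate('report.docx')
# ... move the file (File.Move on the share, drag and drop, ...) ...
after = locate('report.docx')
print('stream_id unchanged:', before[0] == after[0])
print('path_locator changed:', before[1] != after[1])
If the stream_id changes and the file is edited by an application such as Word or Excel, the save-via-temp-file-and-rename pattern described above is the usual explanation.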

Allocate list of views accessed in last 12 months

Is it possible to list the ClearCase views accessed in the last 12 months only? On a particular server, I want to list only the views accessed within a 12-month period. Since I am decommissioning this server, I want to keep a record of these views. Is it possible?
Any input is appreciated!
Considering cleartool lsview, you can use the -age option:
Reports when and by whom the view was last accessed. Because view access events are updated only every 60 seconds, lsview may not report all recent events.
This technote describes precisely which events will modify the "last accessed" date of a view.
Only operations that result in a view database change will change this "last accessed" time.
These actions include:
Writing a view private object
Removing a view private file
Checking out a file (which creates a view-private copy of the checked-out version)
Checking in a file (which removes the view-private copy)
Creating a view
Writing or creating a derived object
Winking in a derived object
Promoting a derived object
Setting a config spec
Actions such as starting the view, cd'ing in to a view, and setting to a view do not change the view configuration or database, and, thus, do not update the last accessed time.
Additionally, since ClearCase caches RPC results to improve performance, subsequent executions of cleartool lsview -age may not immediately reflect the most recent operation that changed the above "last accessed" time. The "last accessed" change may take up to 5 minutes to be reflected in the command's output.
If the "last accessed" is to be used in a script to delete views over a certain age, please note that this implementation issue may cause views that are in fact in use to be eligible for removal.
One example is a view that is created to hold trigger scripts that are under source control. This view's configuration may never change, and it may not be used for actual modifications to the trigger scripts. Any such views would have to be specifically excluded from removal.
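With those caveats in mind, the 12-month list itself can be produced by wrapping cleartool lsview -age in a small script. The sketch below assumes the -age output contains a line starting with "Last accessed" followed by an ISO-style date; the exact wording and date format vary by ClearCase version and locale, so the regular expression will likely need adjusting:
import re
import subprocess
from datetime import datetime, timedelta

cutoff = datetime.now() - timedelta(days=365)
# assumption: "-age" prints a line like "Last accessed 2014-07-01T10:23:11+01:00 by ..."
date_re = re.compile(r'Last accessed (\d{4}-\d{2}-\d{2})')

# all view tags registered in the current region
tags = subprocess.check_output(['cleartool', 'lsview', '-short'], text=True).splitlines()

for tag in (t.strip() for t in tags if t.strip()):
    info = subprocess.check_output(['cleartool', 'lsview', '-age', tag], text=True)
    m = date_re.search(info)
    if m and datetime.strptime(m.group(1), '%Y-%m-%d') >= cutoff:
        print(tag)
If your lsview supports the -host option, it can be added to restrict the listing to views hosted on the server being decommissioned.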

Replication Snapshot Agent file naming

When SQL Server Snapshot Agent creates a snapshot (for transactional replication), there's a bunch of .PRE, .SCH, .BCP, and .IDX files, usually prefixed with the object name, a sequence number and part number. Like MY_TABLE_1#1.bcp for MY_TABLE.
But when table names are a little longer like MY_TABLE_IS_LONG it can name the files like MY_TABLE_IS_LO890be30c_1#1.
I want to process some of these files manually (i.e. grab a snapshot and process the BCPs myself), but that requires the full name of the table, and I haven't been able to find where that hex number is created from or stored. It doesn't appear to be a straight object_id, and I've checked various backing tables in the distribution and publication databases where the articles have an objid and sync_objid; it's neither of those either (after converting the hex to decimal).
Does anyone know where that number comes from? It must be somewhere.
It appears they're just random. What happens is that when the snapshot is generated, a set of commands is placed into the distribution database (you can see them with EXEC sp_browsereplcmds); these have the hardcoded table name along with the script names and the order in which to run them.
When you run the distribution agent for the first time, it gets those replicated commands, and these instruct it to run all the scripts (alternatively, if you've got it set to replication support only, I suspect these commands are just ignored).
In order to process the scripts semi-automatically you'd need to grab everything from replcmds (hopefully on a quiet system) and parse the commands before running them manually.
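A rough sketch of that approach with pyodbc, assuming you can connect to the distribution database (the connection string and the file-name prefix are placeholders); it searches the replicated command text for the prefix seen in the snapshot file name to recover the full table name:
import pyodbc

# connection string is an assumption; point it at the distribution database
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;'
    'DATABASE=distribution;Trusted_Connection=yes;'
)
cursor = conn.cursor()
cursor.execute('EXEC sp_browsereplcmds')

prefix = 'MY_TABLE_IS_LO890be30c'   # prefix taken from the snapshot file name
for row in cursor.fetchall():
    command = row.command or ''
    # long commands are split across several rows; this sketch only needs the
    # fragment that mentions the snapshot file prefix
    if prefix in command:
        print(command)   # the command text also contains the real table name
        break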

Why md5 always changes on a certain file?

I have this task that needs investigation as to why the md5 value of a file keeps changing.
Example:
1. I need to generate the diagnostic file of a certain machine.
2. Generating it produces a .zip file, say Diag.zip, which contains all the information/files of that machine.
3. Inside Diag.zip there is an .xls file, say Data.xls, which holds a summary of all files on that machine, including each file's directory, file version, file size, create time, and md5.
4. Then I save all the information from Data.xls in a database.
5. After a day or so, I repeat steps 1-4.
Then, when I queried all the saved Data.xls data in the database over a two-week range, it showed that almost all files on that machine had their md5 value changed.
The question is: why does the md5 value change every time I generate a new diagnostic file?
There seems to be an issue with Excel files, in particular Excel 2003 .xls files. Whenever they get opened in Excel, even if they don't get changed and don't get saved, Excel automatically updates some of the file's metadata, such as the "Document Properties and Personal Information" and "Last Accessed Statistics". The file therefore changes a little bit every time it gets opened, and that changes the MD5 as well.
One way to avoid this problem is to remove "document properties and personal information":
Excel 2007: Remove Hidden Data and Personal Information from Office Documents
Excel 2010, Excel 2013: Remove Hidden Data and Personal Information by Inspecting Workbooks
Another way to avoid this would be to use .xlsx files. I have been trying to replicate this behavior with .xlsx files, but it seems to happen only with .xls (2003) files.
The MD5 is computed from the file's contents, not from its name or timestamps. If any byte of the content changes, the md5 hash changes; the exact same content will always return the exact same md5 hash.
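For reference, this is how such a checksum is typically computed; the path below is a placeholder. Hashing the same .xls before and after merely opening it in Excel makes the behavior described above easy to confirm:
import hashlib

def file_md5(path):
    # hash the raw bytes of the file in chunks, so large files fit in memory
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

print(file_md5(r'C:\diag\Data.xls'))   # placeholder path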

Resources