Data / file lineage library / code

I am looking for a program or library that can provide data/file lineage: something that monitors a file or directory and records metadata about the history of the data object and its derivation, such as the processes that access the file, the time of access, the changes made, etc.
Does anybody know of anything that provides this functionality? Unfortunately, the only things I have found so far are research papers, and they don't seem to have made their code available.
I'd be glad to hear any ideas people might have.

(This is my area of research, so I'm pretty up on what is available.)
Unfortunately, there's very little available for file-level provenance of the kind you are looking for. The most advanced tools out there are PASS, from Harvard, which requires a custom kernel and was not stable enough for day-to-day use last I checked, and Burrito, from Stanford, which is not quite as fully featured and is built using SystemTap.
Burrito: http://www.pgbovine.net/burrito.html
If you want to try PASS, the researchers are normally pretty friendly about sharing source. But again, this code is not stable.
Another thing that you should be aware of is that provenance at that level is extremely bulky, so be prepared to dedicate a lot of disk space, or prune your provenance regularly.

Related

Looking for advice on a Visual Studio/VS Code language for non-developer

In short, I am looking to create a POC (proof of concept) for an interface I have designed for aggregating data from several different sources. Currently I am just using flat tables in SQL with no relationships (although there are fields in the data for that purpose) to store the data, which is being gathered by various means (mostly PowerShell scripts). I also want it to be modular enough that I can add hooks for REST calls to other sources. So, if I make a call to get data about a certain asset, I don't have to know where that data lives; the middleware will know where to find it and use the appropriate method to get it (SQL, REST, etc.).
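To make that dispatch idea concrete, here is a rough, hypothetical sketch in Python; the class names, table layout, and URL scheme are purely illustrative, not an existing library or the asker's actual design:

```python
# Hypothetical sketch only: the middleware routes asset lookups to whichever
# backend owns that asset type, so callers never need to know where data lives.
import json
import sqlite3
import urllib.request
from abc import ABC, abstractmethod


class AssetSource(ABC):
    @abstractmethod
    def get_asset(self, asset_id: str) -> dict:
        ...


class SqlSource(AssetSource):
    """Reads from a flat SQL table (illustrative schema)."""
    def __init__(self, db_path: str):
        self.db_path = db_path

    def get_asset(self, asset_id: str) -> dict:
        with sqlite3.connect(self.db_path) as conn:
            row = conn.execute(
                "SELECT name, location, health FROM assets WHERE id = ?",
                (asset_id,),
            ).fetchone()
        return {"name": row[0], "location": row[1], "health": row[2]} if row else {}


class RestSource(AssetSource):
    """Fetches from a REST endpoint (illustrative URL layout)."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def get_asset(self, asset_id: str) -> dict:
        with urllib.request.urlopen(f"{self.base_url}/assets/{asset_id}") as resp:
            return json.load(resp)


class Middleware:
    """Callers ask by asset type; the middleware picks the right source."""
    def __init__(self):
        self._routes: dict[str, AssetSource] = {}

    def register(self, asset_type: str, source: AssetSource) -> None:
        self._routes[asset_type] = source

    def get_asset(self, asset_type: str, asset_id: str) -> dict:
        return self._routes[asset_type].get_asset(asset_id)


mw = Middleware()
mw.register("vm", SqlSource("assets.db"))
mw.register("switch", RestSource("https://inventory.example.local"))
```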
I have been in IT for over 31 years. I have experience with SQL (and was a professional DBA at one point) and a few scripting languages, and I wrote a similar app in C# at one point, but I was given a template interface from a professional developer and just ran with that, changing it as I went.
I want a high-level view with colors indicating the general health of a location, and the ability to search the data for assets (anything from a VM, bare-metal server, Ethernet switch, FC switch, connected SAN, hypervisor, vCenter, Nutanix cluster, etc.) to find their location, health and, most importantly, their connections to other assets.
I know a little about everything: enough to be dangerous in most areas, and a master of a few. I know BASIC and PowerShell well enough to call myself an expert, though more likely I'm just a hacker. I know the basic concepts of coding well enough (I was an adjunct at a community college 20+ years ago) to teach them to others. I want to use a language I will be able to hand over to a professional to maintain once we have a new DevOps member on the team.
EDIT: I wrote a very small program earlier this year using React. It was broken into the interface and the 'middleware' as two separate projects. Java is not an intuitive language for me, and while I could go this direction again, I am looking for possible alternatives.
I'm not sure there's a 'non-developer' way of doing this, unfortunately. If you really want to do this yourself, I'd suggest a C# API since you have some experience there (even if it's minimal, it's still comfort), and luckily for you they just released minimal APIs, which should make your life much easier (https://dotnetcoretutorials.com/2021/07/16/building-minimal-apis-in-net-6/).
In terms of actually displaying the data, if it's a really simple UI I'd keep it simple and use JavaScript and HTML/CSS (i.e. no frameworks such as React). It's a POC, and if someone comes in later and wants to move to a framework because there's more work to be done, it's not that hard to take your existing code and move it into something more modern. But if it's small, a framework would be over-engineering, especially if you'd need to learn it all.
For smaller projects like this, you can kind of learn just the code you need for the project, without needing to truly learn to code. Get your components from a library like Bootstrap (it's basically premade elements like dropdowns, progress bars, etc., and its documentation is pretty good), and keep it as simple as possible. Good luck!

Is there value in producing code so flexible that it will never need to be updated?

I am currently involved in a debate with my coworkers about how I should design an API that will be used by my department. Specifically, I am tasked with writing an API that will serve as a wrapper facade for accessing Active Directory information, tailored to my company's/department's needs. I am aware that open source wrapper facades already exist, but that is not the crux of this question and is merely being used as an example.
When I presented my design proposal to my team, they shot me down because the API was not "configurable" enough. They claimed that they did not want the API to make the link between "Phone number" and <Obscure Active Directory representation of Phone number>. Every person in the meeting (except for me) agreed that they would prefer to ask around, "What is the correct field in Active Directory to use for the user's phone number?", and plug that into their respective apps (LOL!).
They asked me, "What if our company decides to use a different field for phone number and you weren't around to make the change in your source code?" They eventually admitted that they were afraid to be tasked with changing someone else's source code, even if the code was pristine and had extensive unit tests. Every senior IT person in my department was in agreement on this.
Is this really the correct attitude to have when designing software?!
http://en.wikipedia.org/wiki/Inner_platform_effect
While hard-coding too many assumptions into your program is bad, overzealously avoiding hard-coded assumptions can be just as bad. If you try to make code excessively flexible, it becomes essentially impossible to configure, as the configuration scheme becomes almost a programming language in itself. I think in general, phone number is a common enough thing that it can just be hard coded as a field.
If I understood correctly, they want the option of defining the mappings outside the code, be it through a configuration file, a database, or whatever. If that is correct, I think they have a valid point: why be forced to change any code at all if all you need to do is change a configuration mapping?
If possible, you should always err on the side of more configurable. It will save you headaches later.
Column Names
Specifically in your case, columns in tables are inherently non-static. They will commonly change as your needs change.
If you have a "phonenum" column and a second phone number is later added, the column might be split into "phonenum1" and "phonenum2"; that would need to be changed in the code. Then if they are renamed to "Home_Phone", "Work_Phone" and "Cell_Phone", the code would have to be changed again. If, however, you had a mapping file (a key/value config file), all of these changes would be extremely simple to make.
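As a rough illustration of that mapping-file idea (the file name and attribute names below are made up), the code only ever refers to logical field names and the config file does the translation:

```python
# Sketch of a key/value mapping file. Assume field_map.json contains, e.g.:
#   {"phone_number": "telephoneNumber", "email": "mail"}
# The attribute names are placeholders, not a statement about the real schema.
import json

with open("field_map.json") as f:
    FIELD_MAP = json.load(f)

def directory_attribute(logical_name: str) -> str:
    """Translate an application-level field name into whatever the directory
    or table currently calls it; a rename becomes a config edit, not a code edit."""
    return FIELD_MAP[logical_name]

print(directory_attribute("phone_number"))   # -> "telephoneNumber"
```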
In General
I disagree with dsimcha that an application can be 'too configurable'. What he is talking about is 'feature bloat', where there are so many intertwining configurables that it becomes impossible to change any one without futzing all the others. This is a very real problem. However, the problem is not the number of configuring options, the problem is how they are presented to the user.
Present all the configuration options in a concise, clear, streamlined manner, with comments that explain each one and how it interacts with the others. Do that and you can have as many configuration variables as you want, because you have been careful to keep them segregated into singles or pairs and have marked them as such.
You should be writing applications so that external (environmental) changes do NOT require code changes. Things such as:
Database user password changes
Column names change
"Temp folder" location changes
Target Machine name/ip change
App needs to be run twice a day instead of once
Logging levels
None of those changes affect the function of the application and so there should be NO CODE CHANGES required. That is the metric you should use if you ever wonder whether hard-coding is all right.
If the functionality needs to change, it should be a code change. Otherwise, make it configurable.
It seems easy enough to do both: produce a flexible API which allows the field to be specified, and then a wrapper around it which knows about the obscure ActiveDirectory name.
Of course, you could build that flexible solution later and just hard code the name for the moment. If that's significantly easier than the two-pronged approach, it's reasonable to argue for it - but if you'd probably end up with that sort of separation internally anyway, then it doesn't do much harm to expose it.
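A minimal sketch of that two-pronged shape, with a fake in-memory dictionary standing in for the directory and an illustrative attribute name (this is not a real Active Directory client API):

```python
# Fake stand-in for the directory, just to show the shape of the API.
_FAKE_DIRECTORY = {
    "jdoe": {"telephoneNumber": "555-0100", "mail": "jdoe@example.com"},
}

def get_user_attribute(username: str, attribute: str) -> str:
    """Flexible core: the caller names the directory attribute explicitly."""
    return _FAKE_DIRECTORY[username][attribute]

def get_phone_number(username: str) -> str:
    """Convenience wrapper: it alone knows the obscure attribute name."""
    return get_user_attribute(username, "telephoneNumber")

print(get_phone_number("jdoe"))   # callers never see "telephoneNumber"
```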
I can honestly say I have been in your position before and I agree with the argument they are presenting you. Especially with an in-house app you will see feature creep. The more useful your application, the worse the feature creep. It is possible your application could be used in another office and they will have fields mapped differently than your current office. If you hard code mappings you are then stuck with different versions for different locations. Maintaining separate versions of source code quickly becomes a nightmare for a programmer. If you design in configurability now and your application is forgotten you have lost very little, but if your application becomes a standard across the company you have saved yourself an immense amount of time in the future.
Fear of change, as well as fear of accountability for making a change, is not uncommon in IT software organizations. Often, the culture in the organization is what drives this attitude.
Having said that, in your specific example, you are describing an API that provides a facade on top of the ActiveDirectory service - one that appears to be intended to be used by different groups and/or projects in your organization.
In that particular scenario, it makes sense to make portions of your API support configurability, since you may ultimately find that the requirements for different projects or groups could be different, or change over time.
As a general practice, when I build code that involves a mapping of one programming interface to another and there are data mapping considerations involved, I try to make the mapping configurable. I've found that this helps both unit testing as well as dealing with evolving requirements or contradictory requirements from different consumers.
If you're asking "should I hard-code everything?", then no, I don't think that's a good idea.
In two years you will be gone, and some programmer will waste a lot of time trying to update your legacy code when updating a configuration file would have been way easier.
In some cases it makes sense to hard-code information, but I don't think your situation is one of those cases. I'd need more knowledge of the situation to be sure; this is just my guess from what you said.
I think it depends on why the API is being created, and what problems you're aiming to solve. If the aim of the API is to be a service that lives on a server somewhere and manages requests from different applications, then I think your approach is probably the way to go, with the addition of a database or config files to perhaps customize the LDAP paths of certain properties.
However, if the goal of the API is to simply be a set of classes that abstract away the details of accessing Active Directory, but not which properties are being accessed, then what your coworkers have specified is the way to go.
Either approach isn't necessarily right or wrong, so it ultimately depends on your overall reasons for creating the API in the first place.

Database design for physics hardware

I have to develop a database for a unique environment. I don't have experience with database design and could use everybody's wisdom.
My group is designing a database for a piece of physics hardware and a data acquisition system. We need a system that will store all the hardware configuration parameters and track the changes to these parameters as they are changed by the user.
The setup:
We have nearly 200 detectors and roughly 40 parameters associated with each detector. Of these 40 parameters, we expect only a few to change during the course of the experiment. Most parameters associated with a single detector are static.
We collect data for this experiment in timed runs. During these runs, the parameters loaded into the hardware must not change, although we should be able to edit the database at any time to prepare for the next run.
The current plan:
The database will provide the difference between the current parameters and the parameters used during last run.
At the start of a new run, the most recent database changes will be loaded into the hardware.
The settings used for the upcoming run must be tagged with a run number and the current date and time. This is essential. I need a run-by-run history of the experimental setup.
There will be several different clients that both read and write to the database. Although changes to the database will be infrequent, I cannot guarantee that the changes won't happen concurrently.
Must be robust and non-corruptible. The configuration of the experimental hardware depends on this database: any breakdown of the database would prevent data acquisition, and our time is expensive. Database backups?
My current plan is to implement the above requirements using an SQLite database, although I am unsure whether it can support all my requirements. Is there any other technology I should look into? Has anybody done something similar? I am willing to learn any technology, as long as it's mature.
Tips and advice are welcome.
Thank you,
Sean
Update 1:
Database access:
There are three lightweight applications that can write to and read from the database, and one application that can only read.
The applications with write access are each responsible for setting a non-overlapping subset of the hardware parameters. To be specific, we have one application (of which there may be multiple copies) which sets the high voltage, one application which sets the remaining hardware parameters that may change during the experiment, and one GUI which sets the nearly static parameters that are only essential for the proper reconstruction of the data.
The program with read-only access is our data analysis software. It needs access to nearly all of the parameters in the database to properly format the incoming data into something we can analyze. The number of connections to the database should be >10.
Backups:
Another setup at our lab dumps an XML file every run. Even though I don't think XML is appropriate here, I was planning to back up the system every run, just in case.
Some basic things about the design: make sure that you don't delete data from any tables. Keep track of the most recent data (probably best with a most-recently-updated datetime), but when a data value changes, don't delete the old value. When a run is initiated, tag every table used with the run ID (in another column); this way you maintain a full historical record of every setting and can pin down exactly what state was used for a given run.
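For example, one possible shape of this in SQLite (table and column names invented for illustration; values are only ever inserted, never updated or deleted, and a run start snapshots the latest state under a run ID):

```python
# Minimal sketch of an append-only, run-tagged parameter history in SQLite.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("experiment.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS parameter_history (
        id          INTEGER PRIMARY KEY,
        detector_id INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        value       TEXT    NOT NULL,
        updated_at  TEXT    NOT NULL      -- ISO timestamp, newest row wins
    );
    CREATE TABLE IF NOT EXISTS run_settings (
        run_id      INTEGER NOT NULL,
        detector_id INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        value       TEXT    NOT NULL,
        tagged_at   TEXT    NOT NULL
    );
""")

def set_parameter(detector_id: int, name: str, value: str) -> None:
    """Record a new value by inserting a new row; old rows are never touched."""
    conn.execute(
        "INSERT INTO parameter_history (detector_id, name, value, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (detector_id, name, value, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def snapshot_run(run_id: int) -> None:
    """At run start, copy the latest value of every parameter into run_settings,
    stamped with the run ID and the current time."""
    conn.execute(
        """INSERT INTO run_settings (run_id, detector_id, name, value, tagged_at)
           SELECT ?, detector_id, name, value, ?
           FROM parameter_history p
           WHERE updated_at = (SELECT MAX(updated_at) FROM parameter_history
                               WHERE detector_id = p.detector_id AND name = p.name)""",
        (run_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
```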
Ask around among your colleagues.
You don't say what kind of physics you're doing, or how big the working group is, but in my discipline (particle physics) there is a deep repository of experience in putting up and running just this type of system (we call it "slow controls" and the like). There is a pretty good chance that someone you work with has either done this or knows someone who has. There may be a detailed description of the last time it was done in someone's thesis.
I don't personally have much to do with this, but I do know one thing: a common feature is a no-delete, no-overwrite design. You can only add data, never remove it. This preserves your chances of figuring out what really happened in case of trouble.
Perhaps I should explain a little more. While this is an important task and has to be done right, it is not really related to physics, so you can't look it up on SPIRES or on arXiv.org. No one writes papers on the design and implementation of medium-sized slow-controls databases, but they do sometimes put it in their dissertations. The easiest way to find a pointer really is to ask a bunch of people around the lab.
This is not a particularly large database by the sounds of things. So you might be able to get away with using Oracle's free database which will give you all kinds of great flexibility with journaling (not sure if that is an actual word) and administration.
Your mention of 'non-corruptible' right after you say "There will be several different clients that both read and write to the database" raises a red flag for me. Are you planning on creating some sort of application that has an interface for this? Or were you planning on direct access to the DB via a tool like TOAD?
In order to preserve your data integrity you will need to get really strict on your permissions. I would only allow one (and a backup) person to have admin rights with the ability to do the data manipulation outside the GUI (which will make your life easier).
Backups? Yes, absolutely! Not only should you do daily, weekly and monthly backups, you should do both full and incremental backups. Also, test your backup images often to confirm they are in fact working.
As for the data structure I would need much greater detail in what you are trying to store and how you would access it. But from what you have put here I would say you need the following tables (to begin with):
Detectors
Parameters
Detector_Parameters
Some additional notes:
Since you will be making so many changes, I recommend using a version control system like SVN to keep track of all your DDL scripts, etc. I would also recommend using something like Bugzilla for bug tracking (if needed) and Google Docs for team document management.
Hope that helps.

Non-file FileSystems?

I've been thinking on this for a while now (you know, that dangerous thing programmers tend to do) and I've been wondering, is the method of storing data that we're so accustomed to really all that efficient? The trouble with answering this question is that I really don't have anything to compare it to, since it's the only thing I've ever used.
I don't mean FAT or NTFS or a particular type of file system, I mean the filesystem structure as a whole. We are simply used to thinking of "files" inside "folders" like our hard drive was one giant filing cabinet. This is a great analogy and indeed, it makes it a lot easier to learn when we think of it this way, but is it really the best way to go about describing programs and their respective parts?
I'd like to know if anyone can think of (or knows about) a data storage technique that might be used to store data for an Operating System to use that would organize the parts of data in a different manner. Does anything... different even exist?
Emails are often stored in folders. But ever since I have migrated to Gmail, I have become accustomed to classifying my emails with tags.
I often wondered if we could manage a whole file-system that way: instead of storing files in folders, you could tag files with the tags you like. A file identifier would not look like this:
/home/john/personal/contacts.txt
but more like this:
contacts[john,personal]
Well... just food for thought (maybe this already exists!)
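(A toy sketch of what that could look like, purely as food for thought: a flat store plus a tag index, with no directories at all.)

```python
# Illustrative only: files live in a flat namespace and are found by tag.
from collections import defaultdict

class TagStore:
    def __init__(self):
        self.files = {}                   # name -> content
        self.tags = defaultdict(set)      # tag  -> set of names

    def put(self, name, content, tags):
        self.files[name] = content
        for tag in tags:
            self.tags[tag].add(name)

    def find(self, *tags):
        """Return names carrying ALL of the given tags."""
        sets = [self.tags[t] for t in tags]
        return set.intersection(*sets) if sets else set()

store = TagStore()
store.put("contacts", "alice, bob", tags=["john", "personal"])
print(store.find("john", "personal"))     # {'contacts'}
```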
You can, for example, have dedicated solutions like Oracle raw partitions. Other databases support similar things. In these cases the filesystem provides unnecessary overhead and can be omitted; the DB software takes care of organising the structure itself.
The problem seems very application dependent and files/folders seem to be a reasonable compromise for many applications (and is easy for human beings to comprehend).
Mainframes used to just give programmers a number of 'devices' to use. A device corresponded to a drive or a partition thereof, and the programmer was responsible for the organisation of all data on it. Of course they quickly built up libraries to help with that.
The only OS I can think of that doesn't use the common hierarchical arrangement of flat files (like UNIX's) is PICK. It used a sort of relational database as the filesystem.
Microsoft had originally planned to introduce a new file system for Windows Vista (WinFS, Windows Future Storage). The idea was to store everything in a relational database (SQL Server). As far as I know, this project was never (or not yet?) finished.
There's more information about it on wikipedia.
I knew a guy who wrote his doctorate about a hard disk that comes with its own file system. It was based on an extension of SCSI commands that allowed the usual open, read, write and close commands to be sent to the disk directly, bypassing the file system drivers of the OS. I think the conclusion was that it is inflexible, and does not add much efficiency.
Anyway, this disk based file system still had a folder like structure I believe, so I don't think it really counts for you ;-)
Well, there's always Pick, where the OS and file system were an integrated database.
Traditional file systems are optimized for fast file access if you know the name of the file you want (including its path). Directories are a way of grouping files together so that they're easier to find if you know properties of the file but not its actual name.
Traditional file systems are not good at finding files if you know very little about them, however they are robust enough that one can add a layer on top of them to aid in retrieving files based on content or meta-information such as tags. That's what indexers are for.
The bottom line is we need a way to store persistently the bytes that the CPU needs to execute. So we have traditional file systems which are very good at organizing sequential sets of bytes. We also need to store persistently the bytes of files that aren't executed directly, but are used by things that do execute. Why create a new system for the same fundamental thing?
What more should a file system do other than store and retrieve bytes?
I'll echo the other responses. If I could pick a filesystem type, I personally would rather see a hybrid approach: a flat database of subtrees, where each subtree is treated internally as a cohesive unit; the subtrees themselves would have no hierarchy among them, but would instead carry metadata and be queryable on that metadata.
The reason for files is that humans like to attach names to "things" they have to use. Otherwise, it becomes hard to talk or think about or even distinguish them.
When we have too many things on a heap, we like to separate the heap. We sort it by some means and we like to build hierarchies where you can navigate arbitrarily sized amounts of things.
Hence directories and files just map our natural way of working with real objects, since you can put anything in a file. On Unix, even hardware is mapped into the file system as "device nodes": special files which you can read/write to send commands to the hardware.
I think the metaphor is so powerful, it will stay.
I spent a while trying to come up with an automagically versioning file system that would maintain versions (and version history) of any specific file and/or directory structure.
The idea was that all of the standard access commands (e.g. dir, read, etc.) would have an optional date/time parameter that could be passed to access the file system as it looked at that point in time.
I got pretty far with it, but had to abandon it when I had to actually go out and earn some money. It's been on the back-burner since then.
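In case it helps anyone picking the idea back up, here is a toy sketch of the "read as of a point in time" part (not the original project's code; the timestamps and API are invented for illustration):

```python
# Every write appends a (timestamp, content) version; reads take an optional
# "as of" time, defaulting to now.
import bisect
import time

class VersionedFile:
    def __init__(self):
        self._versions = []               # list of (timestamp, content), in order

    def write(self, content, when=None):
        self._versions.append((when if when is not None else time.time(), content))

    def read(self, as_of=None):
        """Return the newest content at or before `as_of`."""
        as_of = as_of if as_of is not None else time.time()
        idx = bisect.bisect_right([t for t, _ in self._versions], as_of) - 1
        if idx < 0:
            raise FileNotFoundError("no version existed at that time")
        return self._versions[idx][1]

f = VersionedFile()
f.write("v1", when=100.0)
f.write("v2", when=200.0)
print(f.read(as_of=150.0))   # "v1"
print(f.read(as_of=250.0))   # "v2"
```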
If you take a look at the start-up times for operating systems, it should be clear that improvements in accessing disks can be made. I'm not sure if the changes should be in the file system or rather in the OS start-up code.
Personally, I'm really sorry WinFS didn't fly. I loved the concept.
From Wikipedia (http://en.wikipedia.org/wiki/WinFS):
WinFS includes a relational database for storage of information, and allows any type of information to be stored in it, provided there is a well defined schema for the type. Individual data items could then be related together by relationships, which are either inferred by the system based on certain attributes or explicitly stated by the user. As the data has a well defined schema, any application can reuse the data; and using the relationships, related data can be effectively organized as well as retrieved. Because the system knows the structure and intent of the information, it can be used to make complex queries that enable advanced searching through the data and aggregating various data items by exploiting the relationships between them.

Software protection for small vendors

This is a problem we all have to consider at some point.
After many years and many approaches, I tend to agree in general with the statement:
"For any protected software used by more than a few hundred people, you can find a cracked version. So far, every protection scheme can be tampered with."
Does your employer enforce the use of anti-piracy software?
Further, every time I post about this subject, someone will remind me:
"First of all, no matter what kind of protection you'll employ,a truly dedicated cracker will, eventually, get through all of the protective barriers."
What's the best value for money c# code protection for a single developer
So, notwithstanding these two broadly true disclaimers, let's talk about "protection"!
I still feel that for smaller apps that are unlikely to warrant the time and attention of a skilled cracker, protection IS a worthwhile exercise.
It seems obvious that no matter what you do, if the cracker can switch the outcome of an IF statement (jmp) by patching the application, then all the passwords and dongles in the world are not going to help.
So My approach has been to obfuscate the code with virtualization using products like:
http://www.oreans.com/codevirtualizer.php
I have been very happy with this product. To my knowledge it has never been defeated.
I can even compress the executable with PEcompact
Does anyone else have experience with it?
Had nothing but problems with EXEcryptor
http://www.strongbit.com/news.asp
Even the site is a headache to use.
The compiled apps would crash when doing any WMI calls.
This approach allows you to surround smaller sections of code with the obfuscation and thus protect the security checking etc.
I use the online authorization approach, as the application needs data from the server regularly, so it makes no sense for the user to use it offline for extended periods. By definition, the app is worthless at that point, even if it is cracked.
So a simple encrypted handshake is plenty good. I just check it occasionally within the obfuscation protection. If the user installs the app on a different machine, a new ID is uploaded upon launch, and the server disables the old ID and returns a new authorization.
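Something in the spirit of that handshake might look like the following sketch (Python used purely for illustration; the shared secret, machine-ID scheme, and message format are all placeholders, and a real design would also need replay protection):

```python
# Toy keyed check-in: the client proves knowledge of a shared secret via an
# HMAC over its machine ID and a nonce; the server decides whether to authorize.
import hashlib
import hmac
import os
import uuid

SHARED_SECRET = b"replace-with-per-install-secret"   # placeholder

def client_hello() -> dict:
    machine_id = uuid.getnode()                      # crude hardware-ish ID
    nonce = os.urandom(16).hex()
    mac = hmac.new(SHARED_SECRET, f"{machine_id}:{nonce}".encode(),
                   hashlib.sha256).hexdigest()
    return {"machine_id": machine_id, "nonce": nonce, "mac": mac}

def server_check(msg: dict, known_machine_ids: set) -> bool:
    expected = hmac.new(SHARED_SECRET,
                        f"{msg['machine_id']}:{msg['nonce']}".encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["mac"]):
        return False                                 # tampered or wrong secret
    # Server-side policy goes here: disable the old ID, issue a new authorization, etc.
    return msg["machine_id"] in known_machine_ids
```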
I also use a hash of the compiled app and check it at launch to see if a single bit has changed, then open the app as a file (with a read LOCK) from within the app to prevent anyone changing it once launched.
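A bare-bones sketch of that launch-time self-check (the digest value is a placeholder baked in at build time; keeping the file handle open only loosely approximates the read lock, which is really OS-specific):

```python
# Hash the running executable at startup and compare against a known-good digest.
import hashlib
import sys

KNOWN_GOOD_SHA256 = "replace-with-the-real-digest"   # placeholder, set at build time

def self_check() -> bool:
    exe = open(sys.argv[0], "rb")        # handle deliberately kept open for the app's lifetime
    digest = hashlib.sha256(exe.read()).hexdigest()
    return digest == KNOWN_GOOD_SHA256
```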
Since all static strings are clearly visible in the .exe file, I try to be generic with error messages and so forth. You will not find the string "Authorization failed" anywhere.
To protect against memory dumps, I use a simple text obfuscation technique (like XORing every character). This makes plain-text data in memory harder to distinguish from variables and so forth.
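The technique is as simple as it sounds; a toy version (this is obfuscation, not encryption):

```python
# XOR every byte with a key so readable strings don't sit as-is in a memory dump.
def xor_obfuscate(data: bytes, key: int = 0x5A) -> bytes:
    return bytes(b ^ key for b in data)

secret = xor_obfuscate(b"Authorization failed")   # store this form in memory
print(xor_obfuscate(secret))                      # XOR again to recover the original
```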
Then of course there is AES for any data that is really sensitive. I like counter mode for text as this results in no repeating sequences revealing underlying data like a sequence of white spaces.
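For the AES part, a sketch using the third-party `cryptography` package (assumed installed); key and nonce handling here is deliberately naive, and as noted below none of this helps if the key can be pulled from memory:

```python
# AES in counter (CTR) mode: a stream cipher construction, so identical plaintext
# blocks (e.g. runs of spaces) do not produce repeating ciphertext blocks.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)          # in practice: derived per user/installation
nonce = os.urandom(16)        # must be unique per message for a given key

def encrypt(plaintext: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(plaintext) + enc.finalize()

def decrypt(ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    return dec.update(ciphertext) + dec.finalize()

print(decrypt(encrypt(b"   lots of    spaces   ")))
```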
But with all these techniques, if the Key or Initialization vector can be dumped from memory, or the IF statement bypassed, everything is wasted.
I tend to use a switch statement rather than a conditional statement. Then I create a second function that is basically a dead end instead of the function that actually performs the desired task.
Another idea is to code pointers with a variable added. The variable is the result of the authorization (usually zero). This will inevitably lead to a GPF at some point.
I only use this as a last resort, after a few lower-level authorizations have failed; otherwise real users may encounter it, and then the reputation of your software suffers.
What techniques do you use?
(this is NOT a thread debating the merits of implementing something. It is designed for those that have decided to do SOMETHING)
I disagree, xsl.
We protect our code, not because we want to protect our revenue; we accept that those who would use it without a license probably would never pay for it anyway.
Instead, we do it to protect the investment our customers have made in our software. We believe that the use of our software makes them more competitive in their marketplace, and that if other companies had access to it without paying they would gain an unfair advantage, i.e. they would become as competitive without carrying the overhead of the licensing cost.
We are very careful to ensure that the protection - which is home grown - is as unobtrusive as possible to the valid users, and to this end we would never consider 'buying in' an off the shelf solution that may impact this.
You don't need a few hundred users to get your software cracked. I got annoyed at having my shareware cracked so many times, so as an experiment I created a program called Magic Textbox (which was just a form with a textbox on it) and released it to shareware sites (it had its own PAD file and everything). A day later a cracked version of Magic Textbox was available.
This experience made me pretty much give up trying to protect my software with anything more than rudimentary copy protection.
I personally use the code techniques discussed here. These tricks have the benefit of inconveniencing pirates without making life more difficult for your legitimate end-users.
But the more interesting question isn't "what", but "why". Before a software vendor embarks on this type of exercise, it's really important to build a threat model. For example, the threats for a low-priced B2C game are entirely different to those for a high-value B2B app.
Patrick Mackenzie has a good essay where he discusses some of the threats, including an analysis of 4 types of potential customer. I recommend doing this threat analysis for your own app before making choices about protecting your business model.
I've implemented hardware keying (dongles) before myself, so I'm not totally unfamiliar with the issues. In fact, I've given it a great deal of thought. I don't agree with anyone violating copyright law, as your crackers are doing. Anyone who doesn't want to legally acquire a copy of your software should do without. I never violate software copyright myself. That being said...
I really, really dislike the word "protect" used here. The only thing you are trying to protect is your control. You are not protecting the software. The software is just fine either way, as are your users.
The reason that keeping people from copying and sharing your software is such an unholy PITA is that preventing such activities is unnatural. The whole concept of a computer revolves around copying data, and it is simple human nature to want to share useful things. You can fight these facts if you really insist, but it will be a lifelong fight. God isn't making humans any differently, and I'm not buying a computer that can't copy things. Perhaps it would be better to find some way to work with computers and people, rather than fighting against them all the time?
I, along with the majority of professional software developers, am employed full time by a company that needs software developed so that it can do its business, not so it can have a "software product" with artificial scarcity to "sell" to users. If I write something generally useful (that isn't considered a "competitive advantage" here), we can release it as Free Software. No "protection" is needed.
From some of the links:
The concept I tried to explain is what I call the “crack spread”. It doesn’t matter that a crack (or keygen, or pirated serial, or whatever) exists for your application. What matters is how many people have access to the crack.
Where/when to check the serial number: I check once on startup. A lot of people say “Check in all sorts of places”, to make it harder for someone to crack by stripping out the check. If you want to be particularly nasty to the cracker, check in all sorts of places using inlined code (i.e. DON’T externalize it all into SerialNumberVerifier.class) and if at all possible make it multi-threaded and hard to recognize when it fails, too. But this just makes it harder to make the crack, not impossible, and remember your goal is generally not to defeat the cracker. Defeating the cracker does not make you an appreciable amount of money. You just need to defeat the casual user in most instances, and the casual user does not have access to a debugger nor know how to use one.
If you’re going to phone home, you should be phoning home with their user information and accepting the serial number as the output of your server’s script, not phoning home with the serial number and accepting a boolean, etc, as the output. i.e. you should be doing key injection, not key verification. Key verification has to ultimately happen within the application, which is why public key crypto is the best way to do it. The reason is that the Internet connection is also in the hands of the adversary :) You’re a hosts file change away from a break-once, break-everywhere exploit if your software is just expecting to read a boolean off the Internet.
Do not make an “interesting” or “challenging” protection. Many crackers crack for the intellectual challenge alone. Make your protection hard to crack but as boring as possible.
There are some cracks which search for byte patterns in search for the place to patch. They usually aren’t defeated by a recompile, but if your .EXE is packed (by ASProtect, Armadillo, etc) these kind of cracks must first unpack the .EXE.. and if you use a good packer such as ASProtect, the cracker will be able to unpack the EXE manually using an assembly level debugger such as SoftICE, but won’t be able to create a tool which unpacks the .EXE automatically (to apply the byte patches afterwards).
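To illustrate the public-key suggestion from the excerpt above, here is a small sketch using Ed25519 signatures from the third-party `cryptography` package (assumed installed); the license format is invented, and only the public key ships inside the application:

```python
# The vendor signs license data with a private key kept on the server; the app
# verifies it with the embedded public key. Field names are illustrative only.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# --- vendor side (never ships) ---
private_key = Ed25519PrivateKey.generate()
license_blob = b"user=jdoe;expires=2030-01-01"
signature = private_key.sign(license_blob)

# --- application side (ships with the public key only) ---
public_key = private_key.public_key()

def license_is_valid(blob: bytes, sig: bytes) -> bool:
    try:
        public_key.verify(sig, blob)
        return True
    except InvalidSignature:
        return False

print(license_is_valid(license_blob, signature))                 # True
print(license_is_valid(b"user=jdoe;expires=2099-01-01", signature))  # False
```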
I have used .NET Reactor in the past with good results - http://www.eziriz.com/
What I liked about this product is that it did not require you to obfuscate the code in order to have pretty good protection.
xsl, that is a very narrow point of view with MANY built-in assumptions.
It seems obvious to me that any app that relies on delivering something from a server under your control should be able to do a fairly good job of figuring out who has a valid account!
I am also of the belief that regular updates (meaning a newly compiled app with code in different locations) will make cracked versions obsolete quickly. If your app communicates with a server, launching a secondary process to replace the main executable every week is a piece of cake.
So yes, nothing is uncrackable, but with some clever intrinsic design it becomes a moot point. The only significant factors are how much time the crackers are willing to spend on it, and how much effort your potential customers are willing to exert in hunting down the product of those efforts on a weekly or even daily basis!
I suspect that if your app provides a useful, valuable function, then they will be willing to pay a fair price for it. If not, competitive products will enter the market and your problem just solved itself.
