I wrote a new consensus algorithm. Is there a self-evaluation checklist I can run through to see whether it meets the basic requirements? For example, is it resistant to double-spend attacks? How does it scale?
I reviewed the entire algorithm. Though the idea is great, it feels a bit incomplete. The self-evaluation checklist below is based on the requirements and safety measures of well-established blockchains such as ETH and BTC.
System Criteria:
What is the required storage capacity of the system?
-- RAM usage, bandwidth
What happens when the entire network goes offline?
Algorithm evaluation:
Is the algorithm scalable, i.e. does it remain operable when the number of users grows exponentially?
How long does it take the miners to reach a 2/3 consensus?
Are there safety measures for the users' funds?
How can a user transfer funds safely? (For example, cryptographic schemes under which the data can be read only by authorised entities.)
Architecture evaluation:
Is it decentralized, transparent and immutable?
User evaluation:
Is there enough incentive for a miner/validator to validate the transactions?
Is there enough incentive for a "new" miner/validator to join the network?
Is it possible for a single entity to dominate the network?
What safety measures are in place to prevent blind/unreliable transfer of data?
Resistance to attacks evaluation:
Is the algorithm resistant to double-spend attacks, eclipse attacks, Sybil attacks (forged identities), and 67% attacks?
Is there a way for honest users to defend against such attacks? If not, how likely is it that an attacker succeeds in attacking the blockchain?
As an attacker, what is the weakness of this algorithm? Once something is confirmed by 2/3 of the validators it is unchangeable, so how could you obtain that 2/3 vote?
These are some questions that came to my mind while reading the algorithm description and were left unanswered. A consensus algorithm takes into account the maximum throughput and latency of current systems in order to provide a holistic idea of how to evade attacks and secure its users. If the consensus algorithm fails to do either of those, it will not fly in the market, because a weak algorithm makes the network untrustworthy. To make sure it is not, questions like these should be asked, in addition to the blockchain- and algorithm-specific questions that arise in a user's mind when deciding whether to join a network. At the end of the day, everyone likes to keep their money safe, secure, and hidden away from the general public to avoid any and all kinds of attack.
I'll admit I didn't read it too carefully, but I was looking at how the document handles the CAP theorem.
There is a statement in your doc: "since they (validators) are looking at the full blockchain picture". This statement is never true in a distributed system.
Second statement: "Once 2/3 of the validators approve an item" - who decides that 2/3 has been reached? When does the customer know that the transaction is good? It seems the system is not very stable and will come to a halt quite often.
Looking forward to other comments from the community :)
I will be in my final year of Electrical and Computer Engineering next semester, and I am searching for a graduation project in embedded systems or hardware design. My professor advised me to look for an existing system and try to improve it using hardware/software codesign, and he gave me the example of an "Automated License Plate Recognition" system, where I could use dedicated hardware written in VHDL or Verilog to make the system perform better.
I have searched a bit and found some YouTube videos showing the system working OK.
So I don't know if there is any room for improvement. How do I know whether certain algorithms or systems are slow and could benefit from codesign?
How to know if certain algorithms or systems are slow and can benefit from codesign?
In many cases, this is an architectural question that is only answered with large amounts of experience or even larger amounts of system modeling and analysis. In other cases, 5 minutes on the back of an envelope could show you that a specialized co-processor adds weeks of work but no performance improvement.
An example of a hard case is any modern mobile phone processor. Take a look at the TI OMAP5430. Notice it has at least 10 processors of varying types (the PowerVR block alone has multiple execution units) and dozens of full-custom peripherals. Any time you wish to offload something from the 'main' CPUs, there is a potential bus bandwidth / silicon area / time-to-market cost that has to be considered.
An easy case would be something like what your professor mentioned. A DSP/GPU/FPGA will perform image processing tasks, like 2D convolution, orders of magnitude faster than a CPU. But 'housekeeping' tasks like file-management are not something one would tackle with an FPGA.
In your case, I don't think that your professor expects you to do something 'real'. I think what he's looking for is your understanding of what CPUs/GPUs/DSPs are good at, and what custom hardware is good at. You may wish to look for an interesting niche problem, such as those in bioinformatics.
I don't know what codesign is, but I have done some Verilog before; I think simple image (or signal) processing tasks are good candidates for such embedded systems, because they often involve real-time processing of massive loads of data (preferably SIMD operations).
Image processing tasks often look easy, because our brain does mind-bogglingly complex processing for us, but actually they are very challenging. I think this challenge is what's important, not whether such a system has been implemented before. I would go with implementing the Hough transform (first for lines and circles, then the generalized one; it's considered a slow algorithm in image processing) and doing some real-time segmentation. I'm sure it will become a challenging task as it evolves.
First thing to do when partitioning is to look at the dataflows. Draw a block diagram of where each of the "subalgorithms" fits, along with the data going in and out. Anytime you have to move large amounts of data from one domain to another, start looking to move part of the problem to the other side of the split.
For example, consider an image processing pipeline which does an edge-detect followed by a compare with threshold, then some more processing. The output of the edge-detect will be (say) 16-bit signed values, one for each pixel. The final output is a binary image (a bit set indicates where the "significant" edges are).
One (obviously naive, but it makes the point) implementation might be to do the edge detect in hardware, ship the edge image to software and then threshold it. That involves shipping a whole image of 16-bit values "across the divide".
Better: do the threshold in hardware as well. Then you can ship 8 "1-bit pixels" per byte (or even run-length encode the result).
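To make the bandwidth difference concrete, here is a minimal Python sketch of the packing step; it only illustrates the data-volume arithmetic, and the image size and threshold value are arbitrary assumptions.

def pack_thresholded(edge_values, threshold):
    # Threshold 16-bit edge magnitudes and pack 8 one-bit pixels per byte.
    # Assumes len(edge_values) is a multiple of 8 (pad otherwise).
    bits = [1 if abs(v) >= threshold else 0 for v in edge_values]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

# A 640x480 edge image of 16-bit values is 600 KiB raw; thresholded and
# packed it is 37.5 KiB, a 16x reduction in traffic "across the divide".
edge_image = [0] * (640 * 480)
print(len(edge_image) * 2)                      # 614400 bytes raw
print(len(pack_thresholded(edge_image, 100)))   # 38400 bytes packed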
Once you have a sensible bandwidth partition, you have to find out if the blocks that fit in each domain are a good fit for that domain, or maybe consider a different partition.
I would add that in general, HW/SW codesign is useful when it reduces cost.
There are 2 major cost factors in embedded systems:
development cost
production cost
The higher your production volume, the more important the production cost becomes, and the less important the development cost.
Today it is harder to develop hardware than software, which means the development cost of a codesign solution will be higher, so codesign is useful mostly for high-volume production. However, you need FPGAs (or similar) to do codesign today, and they cost a lot.
That means codesign is useful when the cost of the necessary FPGA is lower than that of an existing solution for your type of problem (CPU, GPU, DSP, etc.), assuming both solutions meet your other requirements. And that will mostly be the case for high-performance systems, because FPGAs are costly today.
So, basically, you will want to codesign your system if it will be produced in high volumes and is a high-performance device.
This is a bit simplified and might no longer hold in a decade or so. There is ongoing research on HW/SW synthesis from high-level specifications, and FPGA prices are falling, so in a decade or so codesign might become useful for most embedded systems.
Whatever project you end up doing, my suggestion would be to build both a software version and a hardware version of the algorithm and compare their performance. You can also compare development time, etc. This will make your project a lot more scientific and more helpful to everyone else, should you choose to publish anything. Blindly assuming hardware is faster than software is not a good idea, so profiling is important.
I need to implement a multi-agent system for an assignment. I have been brainstorming ideas about what to implement, but I have not come up with anything great yet. I do not want it to be a traffic simulation application, but I need something just as useful.
I once saw an application of multiagent systems for studying/simulating fire evacuation plans in large buildings. Imagine a large building with thousands of people; in case of fire, you want these people to follow some rules for evacuating the building. To evaluate the effectiveness of your evacuation plan rules, you may want to simulate various scenarios using a multiagent system. I think it's a useful and interesting application. If you search the Web, you will find papers and works in this area, from which you might get further inspiration.
A few come to mind:
Exploration and mapping: send a team of agents out into an environment to explore, then assimilate all of their observations into consistent maps (not an easy task!)
Elevator scheduling: how to service call requests during peak capacities considering the number and location of pending requests, car locations, and their capacities (not too far removed from traffic-light scheduling, though)
Air traffic control: consider landing priorities (e.g. fuel, number of passengers, emergency conditions, etc.), airplane position and speed, and landing conditions (e.g. number of runways, etc.). Then develop a set of rules so that each "agent" (i.e. airplane) assumes its place in a landing sequence. Note that this is a harder version of the flocking problem mentioned in another reply.
Not sure what you mean by "useful", but... you can always have a look at swarm-based AI (schools of fish, flocks of birds, etc.). Each agent (boid) is very, very simple in this case. Make the individual agents follow each other, stay away from a predator, and so on.
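For a flavour of how simple each agent can be, here is a minimal 2D boid sketch in Python; the steering weights, the flee radius, and the omission of a separation rule are arbitrary simplifications, not a reference implementation.

import random

class Boid:
    def __init__(self):
        self.x, self.y = random.uniform(0, 100), random.uniform(0, 100)
        self.vx, self.vy = random.uniform(-1, 1), random.uniform(-1, 1)

def step(boids, predator, dt=1.0):
    cx = sum(b.x for b in boids) / len(boids)      # flock centre
    cy = sum(b.y for b in boids) / len(boids)
    avx = sum(b.vx for b in boids) / len(boids)    # average velocity
    avy = sum(b.vy for b in boids) / len(boids)
    for b in boids:
        b.vx += 0.01 * (cx - b.x) + 0.05 * (avx - b.vx)   # cohesion + alignment
        b.vy += 0.01 * (cy - b.y) + 0.05 * (avy - b.vy)
        dx, dy = b.x - predator[0], b.y - predator[1]
        if dx * dx + dy * dy < 400:                        # flee if within 20 units
            b.vx += 0.1 * dx
            b.vy += 0.1 * dy
        b.x += b.vx * dt
        b.y += b.vy * dt

flock = [Boid() for _ in range(50)]
for _ in range(100):
    step(flock, predator=(50.0, 50.0))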
It's not quite multi-agent, but have you considered a variation on ant colony optimisation?
http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms
What are some good ways for an organization to share key data across many departments and applications?
To give an example, let's say there is one primary application and database to manage customer data. There are ten other applications and databases in the organization that read that data and relate it to their own data. Currently this data sharing is done through a mixture of database (DB) links, materialized views, triggers, staging tables, re-keying information, web services, etc.
Are there any other good approaches for sharing data? And, how do your approaches compare to the ones above with respect to concerns like:
duplicate data
error prone data synchronization processes
tight vs. loose coupling (reducing dependencies/fragility/test coordination)
architectural simplification
security
performance
well-defined interfaces
other relevant concerns?
Keep in mind that the shared customer data is used in many ways, from simple, single record queries to complex, multi-predicate, multi-sort, joins with other organization data stored in different databases.
Thanks for your suggestions and advice...
I'm sure you saw this coming, "It Depends".
It depends on everything. And the solution to sharing Customer data for department A may be completely different for sharing Customer data with department B.
My favorite concept that has emerged over the years is "eventual consistency". The term came from Amazon's discussions of their distributed systems.
The premise is that while the state of data across a distributed enterprise may not be perfectly consistent now, it "eventually" will be.
For example, when a customer record gets updated on system A, system B's customer data is now stale and not matching. But, "eventually", the record from A will be sent to B through some process. So, eventually, the two instances will match.
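A toy Python sketch of that hand-off; the system names, record shape, and the queue standing in for "some process" are all invented for illustration:

from collections import deque

system_a = {}          # source of the change
system_b = {}          # stale replica
change_queue = deque() # the "some process" carrying updates from A to B

def update_customer_on_a(customer_id, fields):
    system_a.setdefault(customer_id, {}).update(fields)
    change_queue.append((customer_id, dict(fields)))   # B is now stale

def replicate_to_b():
    while change_queue:                                # runs later, asynchronously
        customer_id, fields = change_queue.popleft()
        system_b.setdefault(customer_id, {}).update(fields)

update_customer_on_a(42, {"name": "Acme Corp", "city": "Boston"})
assert system_a != system_b        # inconsistent right now
replicate_to_b()
assert system_a == system_b        # eventually consistent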
When you work with a single system, you don't have "EC", rather you have instant updates, a single "source of truth", and, typically, a locking mechanism to handle race conditions and conflicts.
The better your operations can work with "EC" data, the easier it is to separate these systems. A simple example is a data warehouse used by sales. They use the DW to run their daily reports, but they don't run their reports until the early morning, and they always look at "yesterday's" (or earlier) data. So there's no real-time need for the DW to be perfectly consistent with the daily operations system. It's perfectly acceptable for a process to run at, say, close of business and move over the day's transactions and activities en masse in a large, single update operation.
You can see how this requirement can solve a lot of issues. There's no contention for the transactional data, no worry that some report's data is going to change in the middle of accumulating a statistic because the report made two separate queries against the live database, and no need for the high-detail chatter to suck up network and CPU processing during the day.
Now, that's an extreme, simplified, and very coarse example of EC.
But consider a large system like Google. As consumers of Search, we have no idea when or how long it takes for a search result that Google harvests to show up on a search page. 1 ms? 1 s? 10 s? 10 hrs? It's easy to imagine how, if you're hitting Google's West Coast servers, you may very well get a different search result than if you hit their East Coast servers. At no point are these two instances completely consistent. But by and large they are mostly consistent, and for their use case, their consumers aren't really affected by the lag and delay.
Consider email. A wants to send a message to B, but in the process the message is routed through systems C, D, and E. Each system accepts the message, assumes complete responsibility for it, and then hands it off to another. The sender sees the email go on its way. The receiver doesn't really miss it because they don't necessarily know it's coming. So there is a big window of time during which that message can move through the system without anyone concerned knowing or caring how fast it goes.
On the other hand, A could have been on the phone with B. "I just sent it, did you get it yet? Now? Now? Get it now?"
Thus, there is some kind of underlying, implied level of performance and response. In the end, "eventually", A's outbox matches B's inbox.
These delays, and the acceptance of stale data, whether it's a day old or 1-5 seconds old, are what control the ultimate coupling of your systems. The looser this requirement, the looser the coupling, and the more flexibility you have at your disposal in terms of design.
This is true down to the cores in your CPU. Modern multi-core, multi-threaded applications running on the same system can have different views of the "same" data, only microseconds out of date. If your code can work correctly with potentially inconsistent data, then happy days: it zips along. If not, you need to pay special attention to ensure your data is completely consistent, using techniques like volatile memory qualifiers or locking constructs. All of which, in their way, cost performance.
So, this is the base consideration. All of the other decisions start here. Answering this can tell you how to partition applications across machines, what resources are shared, and how they are shared. What protocols and techniques are available to move the data, and how much it will cost in terms of processing to perform the transfer. Replication, load balancing, data shares, etc. etc. All based on this concept.
Edit, in response to first comment.
Correct, exactly. The game here, for example: if B can't change customer data, then what is the harm when the customer data changes? Can you "risk" it being out of date for a short time? Perhaps your customer data comes in slowly enough that you can replicate it from A to B almost immediately. Say the change is put on a queue that, because of low volume, gets picked up readily (< 1 s); even then it would be "out of transaction" with the original change, so there's a small window where A has data that B does not.
Now the mind really starts spinning. What happens during that 1 s of "lag"? What's the worst possible scenario? And can you engineer around it? If you can engineer around a 1 s lag, you may be able to engineer around a 5 s, 1 min, or even longer lag. How much of the customer data do you actually use on B? Maybe B is a system designed to facilitate order picking from inventory. It's hard to imagine anything more being necessary than a customer ID and perhaps a name, just something to grossly identify who the order is for while it's being assembled.
The picking system doesn't necessarily need to print out all of the customer information until the very end of the picking process, and by then the order may have moved on to another system, perhaps one that is more current with shipping information in particular, so in the end the picking system hardly needs any customer data at all. In fact, you could EMBED and denormalize the customer information within the picking order, so there's no need or expectation of synchronizing later. As long as the customer ID is correct (and it will never change anyway) and the name (which changes so rarely it's not worth discussing), that's the only real reference you need, and all of your pick slips are perfectly accurate at the time of creation.
The trick is the mindset: break the systems up and focus on the essential data that's necessary for the task. Data you don't need doesn't have to be replicated or synchronized. Folks chafe at things like denormalization and data reduction, especially when they come from the relational data modeling world, and with good reason: it should be considered with caution. But once you go distributed, you have implicitly denormalized. Heck, you're copying it wholesale now. So you may as well be smarter about it.
All of this can be mitigated through solid procedures and a thorough understanding of the workflow. Identify the risks and work up policies and procedures to handle them.
But the hard part is breaking the chain to the central DB at the beginning, and instructing folks that they can't "have it all" like they may expect when you have a single, central, perfect store of information.
This is definitely not a comprehensive reply. Sorry for my long post; I hope it adds to the thoughts presented here.
I have a few observations on some of the aspects that you mentioned.
duplicate data
It has been my experience that this is usually a side effect of departmentalization or specialization. A department pioneers the collection of certain data that other specialized groups then see as useful. Since those groups don't have direct access to the data, as it is intermingled with other data collection, they too start collecting and storing it in order to use it, inherently making it duplicate. This issue never goes away, and just as there is a continuous effort to refactor code and remove duplication, there is a need to continuously bring duplicate data under centralized access, storage, and modification.
well-defined interfaces
Most interfaces are defined with good intentions, keeping other constraints in mind. However, we simply have a habit of growing out of the constraints placed by previously defined interfaces. Again, a case for continuous refactoring.
tight coupling vs loose coupling
If anything, most software is plagued by this issue. Tight coupling is usually the result of an expedient solution built under the time constraints we face. Loose coupling incurs a certain degree of complexity, which we dislike when we just want to get things done. The web services mantra has been going around for a number of years, and I have yet to see a good example of a solution that completely alleviates the problem.
architectural simplification
To me this is the key to fighting all the issues you have mentioned in your question. The SIP vs. H.323 VoIP story comes to mind. SIP is very simple and easy to build with, while H.323, like a typical telecom standard, tried to envisage every VoIP issue on the planet and provide a solution for it. The end result: SIP grew much more quickly, while being H.323-compliant is a pain. In fact, H.323 compliance is a megabuck industry.
A few architectural fads that I have come to appreciate:
Over the years, I have grown to like the REST architecture for its simplicity. It provides simple, uniform access to data and makes it easy to build applications around it. I have seen enterprise solutions suffer more from the duplication, isolation, and accessibility of data than from any other issue, such as performance. To me, REST is a remedy for some of those ills.
To solve a number of those issues, I like the concept of central "Data Hubs". A Data Hub represents a "single source of truth" for a particular entity, but it stores only IDs, no information like names, etc. In fact, it stores only ID maps: for example, mapping the Customer ID in system A to the Client Number from system B and to the Customer Number in system C. Interfaces between the systems use the hub to know how to relate information in one system to the other.
It acts as a central translation layer: instead of having to write specific mapping code for A->B, A->C, and B->C, with the attendant exponential increase as you add more systems, you only need to convert to/from the hub: A->Hub, B->Hub, C->Hub, D->Hub, etc.
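A minimal sketch of such a hub in Python; the system names and ID formats are made up, and a real hub would of course live in its own store or service:

class DataHub:
    def __init__(self):
        self.entities = []   # one dict of {system_name: local_id} per customer

    def register(self, **ids):
        self.entities.append(dict(ids))

    def translate(self, from_system, from_id, to_system):
        for mapping in self.entities:
            if mapping.get(from_system) == from_id:
                return mapping.get(to_system)
        return None

hub = DataHub()
hub.register(system_a="CUST-1001", system_b="CLI-77", system_c=900012)

# A->C without any A->C specific mapping code: both sides only know the hub.
print(hub.translate("system_a", "CUST-1001", "system_c"))   # 900012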
My company is looking to start distributing some software we developed, and we would like to let people try the software before buying. We'd also like to make sure it can't be copied and distributed to our customers' customers.
One model we've seen is tying a license to a MAC address so the software will only work on one machine.
What I'm wondering is, what's a good way to generate a license key with different information embedded in it such as license expiration date, MAC address, and different software restrictions?
I've used both FLEXlm from Macrovision (formerly Globetrotter) and the newer RLM from Reprise Software (as I understand, written by FlexLM's original authors). Both can key off either the MAC address or a physical dongle, can be either node-locked (tied to one machine only) or "floating" (any authorized machine on the network can get a license doled out by a central license server, up to a maximum number of simultaneously checked-out copies determined by how much they've paid for). There are a variety of flexible ways to set it up, including expiration dates, individual sub-licensed features, etc. Integration into an application is not very difficult. These are just the two I've used, I'm sure there are others that do the job just as well.
These programs are easily cracked, meaning that there are known exploits that let people bypass the protection of an application that uses them, either by cutting their own licenses to spoof the license server, or by merely patching your binary to bypass the license check (essentially replacing the subroutine call to their library with code that just says "return true"). It's more complicated than that, but that's mostly what it boils down to. You'll see cracked versions of your product posted to various warez sites. It can be very frustrating and demoralizing, all the more so because the crackers are often interested in cracking for cracking's sake and don't even have any use for your product or knowledge of what to do with it. (This is obvious if you have a sufficiently specialized program.)
Because of this, some people will say you should write your own, maybe even change the encryption scheme frequently. But I disagree. It's true that rolling your own means that known exploits against FLEXlm or RLM won't instantly work for your application. However, unless you are a total expert on this kind of security (which clearly you aren't or you wouldn't be asking the question), it's highly likely that in your inexperience you will end up writing a much less secure and more crackable scheme than the market leaders (weak as they may be).
The other reason not to roll your own is simply that it's an endless cat-and-mouse game. It's better for your customers and your sales to put minimal effort into license security and spend that time debugging or adding features. You need to come to grips with the idea that a licensing scheme merely "keeps honest people honest"; it does not prevent determined cracking. Accept that the crackers wouldn't have paid for the software anyway.
Not everybody can take this kind of zen attitude. Some people can't sleep at night knowing that somebody somewhere is getting something for nothing. But try to learn to deal with it. You can't stop the pirates, but you can balance your time/effort/expense trying to stop all piracy versus making your product better for users. Remember, sometimes the most pirated applications are also the most popular and profitable. Good luck and sleep well.
I'd suggest you take the pieces of information you want in the key, hash them with MD5, and then just take the first X characters (where X is a key length you think is manageable).
Cryptographically, it's far from perfect, but this is the sort of area where you want to put in the minimum amount of effort which will stop a casual attacker - anything more quickly becomes a black hole.
Oh, I should also point out: if you go down this path, you will want to include the expiration date (and any other information you might want to read out yourself) in plain text (or slightly obfuscated) as part of the key as well. The MD5 is just there to stop the end user from changing the expiration date to extend the license.
The easiest thing would be a key file like this...
# License key for XYZZY
expiry-date=2009-01-01
other-info=blah
key=[md5 hash of MAC address, expiry date, other-info]
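As a rough illustration, here is a Python sketch of generating and checking such a file; the secret salt, the 16-character truncation, and the field separator are my assumptions rather than part of the original suggestion:

import hashlib

SECRET = b"replace-with-a-product-secret"   # assumed; without some secret, a user could simply recompute the hash

def make_key(mac_address, expiry_date, other_info):
    payload = f"{mac_address}|{expiry_date}|{other_info}".encode()
    return hashlib.md5(SECRET + payload).hexdigest()[:16]

def check_key(mac_address, fields):
    expected = make_key(mac_address, fields["expiry-date"], fields["other-info"])
    return fields["key"] == expected

fields = {"expiry-date": "2009-01-01", "other-info": "blah"}
fields["key"] = make_key("00:1A:2B:3C:4D:5E", fields["expiry-date"], fields["other-info"])
print(check_key("00:1A:2B:3C:4D:5E", fields))   # True
print(check_key("AA:BB:CC:DD:EE:FF", fields))   # False: wrong machine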
We've used the following algorithm at my company for years without a single incident.
Decide which fields you want in the code. Bit-pack as much as possible. For example, dates could be "number of days since 2007", and then you can get away with 16 bits.
Add an extra "checksum" field. (You'll see why in a second.) The value of this field is a checksum of the packed bytes from the other fields. We use "first 32 bits from MD5."
Encrypt everything using TEA. For the key, use something that identifies the customer (e.g. company name + personal email address), that way if someone wants to post a key on the interweb they have to include their own contact info in plain text.
Convert the hex to a string in some sensible way. You can use straight hex digits, but some people like to pick a different set of 16 characters to make it less obvious. Also include dashes or something at regular intervals so it's easier to read over the phone.
To decrypt, convert the string back to hex and decrypt with TEA. But then there's this extra step: compute your own checksum of the fields (ignoring the checksum field) and compare it to the given checksum. This is the step that ensures no one has tampered with the key.
The reason this works is that TEA mixes the bits completely, so if even one bit is changed, all the other bits are equally likely to change during TEA decryption, and therefore the checksum will not match.
Is this hackable? Of course! Almost everything is, but this is tight enough and simple to implement.
If tying the key to contact information is not sufficient, then include a field for "Node ID" and lock it to the MAC address or some such, as you suggest.
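Here is a hedged Python sketch of the whole pack / checksum / TEA / hex flow; the field layout (16-bit day count plus 16-bit product code), the key derivation from name+email, and the dash grouping are illustrative choices, not the exact format described above.

import hashlib
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9

def tea_encrypt(v0, v1, k):
    # Standard TEA: 32 rounds over a 64-bit block with a 128-bit key.
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k[0]) ^ (v1 + s) ^ ((v1 >> 5) + k[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + k[2]) ^ (v0 + s) ^ ((v0 >> 5) + k[3]))) & MASK
    return v0, v1

def tea_decrypt(v0, v1, k):
    s = (DELTA * 32) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k[2]) ^ (v0 + s) ^ ((v0 >> 5) + k[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + k[0]) ^ (v1 + s) ^ ((v1 >> 5) + k[1]))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1

def customer_key(name_and_email):
    # 128-bit TEA key derived from the customer's contact info (assumed scheme).
    return struct.unpack(">4I", hashlib.md5(name_and_email.encode()).digest())

def make_license(days_since_2007, product_code, name_and_email):
    packed = struct.pack(">HH", days_since_2007, product_code)            # bit-packed fields
    checksum = struct.unpack(">I", hashlib.md5(packed).digest()[:4])[0]   # "first 32 bits of MD5"
    v0 = struct.unpack(">I", packed)[0]
    c0, c1 = tea_encrypt(v0, checksum, customer_key(name_and_email))
    raw = "%08X%08X" % (c0, c1)
    return "-".join(raw[i:i + 4] for i in range(0, len(raw), 4))          # readable over the phone

def check_license(license_str, name_and_email):
    raw = license_str.replace("-", "")
    c0, c1 = int(raw[:8], 16), int(raw[8:], 16)
    v0, checksum = tea_decrypt(c0, c1, customer_key(name_and_email))
    packed = struct.pack(">I", v0)
    expected = struct.unpack(">I", hashlib.md5(packed).digest()[:4])[0]
    if checksum != expected:
        return None                                   # tampered key or wrong customer info
    return struct.unpack(">HH", packed)               # (days_since_2007, product_code)

key = make_license(days_since_2007=900, product_code=7,
                   name_and_email="Acme Inc. jane@example.com")
print(key)                                               # something like XXXX-XXXX-XXXX-XXXX
print(check_license(key, "Acme Inc. jane@example.com"))  # (900, 7)
print(check_license(key, "someone else"))                # almost certainly None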
Don't use MAC addresses. On some hardware we've tested - in particular some IBM Thinkpads - the MAC address can change on a restart. We didn't bother investigating why this was, but we learned quite early during our research not to rely on it.
Obligatory disclaimer & plug: the company I co-founded produces the OffByZero Cobalt licensing solution. So it probably won't surprise you to hear that I recommend outsourcing your licensing, & focusing on your core competencies.
Seriously, this stuff is quite tricky to get right, & the consequences of getting it wrong could be quite bad. If you're low-volume high-price a few pirated copies could seriously dent your revenue, & if you're high-volume low-price then there's incentive for warez d00dz to crack your software for fun & reputation.
One thing to bear in mind is that there is no such thing as truly crack-proof licensing; once someone has your byte-code on their hardware, you have given away the ability to completely control what they do with it.
What a good licensing system does is raise the bar sufficiently high that purchasing your software is a better option - especially with the rise in malware-infected pirated software. We recommend you take a number of measures towards securing your application:
get a good third-party licensing system
pepper your code with scope-contained checks (e.g. no one global variable like fIsLicensed, don't check the status of a feature near the code that implements the feature)
employ serious obfuscation in the case of .NET or Java code
The company I worked for actually used a USB dongle. This was handy because:
Our software was also installed on that USB stick
The program would only run if it found the (unique) hardware key (any standard USB stick has one, so you don't have to buy something special; any stick will do)
It was not restricted to one computer, but could be installed on another system if desired
I know most people don't like dongles, but in this case it was quite handy: the stick was actually used for a special-purpose media player that we also delivered, so the USB keys could be used as a demo on any PC and then, without any modifications, in the real application (i.e. the real players) once the client was satisfied.
We keep it simple: store all the license data in an XML file (easy to read and manage), create a hash of the whole XML, and then encrypt it with a utility (also our own and simple).
This is also far from perfect, but it can hold for some time.
Almost every commercial license system has been cracked. We have used many over the years, and all of them eventually got cracked. The general rule is: write your own, change it every release, and once you're happy, try to crack it yourself.
Nothing is really secure. Ultimately, look at the big players like Microsoft: they go with the model that honest people will pay and others will copy. Don't put too much effort into it.
If your application is worth paying money for, people will pay.
I've used a number of different products that do the license generation and have created my own solution but it comes down to what will give you the most flexibility now and down the road.
Topics that you should focus on for generating your own license keys are...
Hex formatting, elliptic-curve cryptography, and any of the encryption algorithms such as AES/Rijndael, DES, Blowfish, etc. These are great for creating license keys.
Of course, it isn't enough to have a key; you also need to associate it with a product and program the application to lock itself down based on the key system you've created.
I have messed around with creating my own solution but in the end when it came down to making money with the software I had to cave and get a commercial solution that would save me time in generating keys and managing my product line...
My favorite so far has been License Vault from SpearmanTech but I've also tried FlexNet (costly), XHEO (way too much programming required), and SeriousBit Ellipter.
I chose the License Vault product in the end because I could get it much cheaper than the others and it simply had more to offer us, as we do most of our work in .NET 3.5.
It is difficult to provide a good answer without knowing anything about your product and customers. For enterprise software sold to technical people you can use a fairly complex licensing system and they'll figure it out. For consumer software sold to the barely computer-literate, you need a much simpler system.
In general, I've adopted the practice of making a very simple system that keeps the honest people honest. Anyone who really wants to steal your software will find a way around any DRM system.
In the past I've used Armadillo (now Software Passport) for C++ projects. I'm currently using XHEO for C# projects.
If your product requires the use of the internet, then you can generate a unique id for the machine and use that to check with a license web service.
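For instance, a rough Python sketch of such an online check; the endpoint URL and the response format are hypothetical:

import hashlib
import json
import platform
import urllib.request
import uuid

def machine_id():
    # uuid.getnode() can fall back to a random value on some systems,
    # so mix in other reasonably stable identifiers as well.
    raw = f"{uuid.getnode()}|{platform.node()}|{platform.machine()}"
    return hashlib.sha256(raw.encode()).hexdigest()

def is_licensed(license_key):
    payload = json.dumps({"key": license_key, "machine": machine_id()}).encode()
    req = urllib.request.Request(
        "https://licensing.example.com/api/check",   # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp).get("valid", False)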
If it does not, I think going with a commercial product is the way to go. Yes, they can be hacked, but the person who is absolutely determined to hack it is unlikely ever to have paid anyway.
We have used: http://www.aspack.com/asprotect.aspx
We also use a function call in their SDK product that gives us a unique ID for a machine.
It's a good company, although clearly not native English speakers, since their first product was called "AsPack".