How to chain together points in an array

I have a series of points, each with a length and a rotation.
I need to create separate chains from points whose lines overlap, but I'm having real trouble doing this efficiently.
I have an array of simple Point objects, in no particular order, and I can loop through them and test them with a simple "intersect" function. I need to end up with an array of chains, each with an ordered list of points. (Or another way of representing the chains).
At the moment every avenue I explore seems to involve a convoluted hack of arrays, nudges and fudges. Having never studied Computer Science I wonder if there is some sort of data structure or technique that would lend itself well to this sort of thing.
Could anyone point me in the right direction to achieve this? Pseudocode is fine (or indeed any language), although I am coding in Processing/Java if that helps.
Many thanks,
Josh

You can use the union-find algorithm to find the joined sets.
If your sets are always well-defined (no multiple intersections and so on), then some modifications seem possible to build the chains during the join process:
for every set, besides a 'representative', keep the two extreme segments, and update them when joining.
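
A minimal union-find sketch in C (the asker's Processing/Java would be analogous; the capacity, the intersect test, and the extreme-segment bookkeeping described above are assumptions you would fill in):

#define NPOINTS 1000                 /* illustrative capacity */

static int parent[NPOINTS];

static void uf_init(int n) {         /* every point starts as its own chain */
    for (int i = 0; i < n; i++)
        parent[i] = i;
}

static int uf_find(int x) {          /* representative of x's set, with path halving */
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static void uf_union(int a, int b) { /* merge the chains containing points a and b */
    a = uf_find(a);
    b = uf_find(b);
    if (a != b)
        parent[a] = b;               /* here you would also merge the extreme segments */
}

After calling uf_union(i, j) for every pair where your intersect(i, j) test is true, uf_find(i) gives a label identifying the chain each point belongs to; grouping points by that label yields one list per chain, which you can then order by walking from one extreme segment to the other.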


Why would we need to sort an array?

This might be a silly question to those of you who have been in this field for a while; nevertheless, I would appreciate your insight on the matter: why does an array need to be sorted, and in what scenarios would we need to sort an array?
So far it is clear to me that the whole purpose of sorting is to organize the data in a way that minimizes searching complexity and improves the overall efficiency of the program, but I would appreciate it if someone could describe a scenario in which sorting an array is most useful. If we are searching for something specific, like a number, wouldn't sorting the array be at least as demanding as simply iterating through it until we find what we are looking for? I hope that made sense.
Thanks.
This is just a general question for my coursework.
A lot of different algorithms work much faster with a sorted array, including searching, comparing, and merging arrays.
For a one-time operation, you're right: it is easier and faster to use an unsorted array. But as soon as you need to repeat an operation multiple times on the same array, it becomes much faster to sort the array once and then reap the benefits.
Even if you are going to modify the array, you can keep it sorted, and again this improves the performance of all the other operations.
Sorting brings useful structure in a list of values.
In raw data, reading a value tells you absolutely nothing about the other values in the list. In a sorted list, when you read a value, you know that all preceding elements are not larger, and following elements are not smaller.
So to search a raw list, you have no choice but exhaustive comparison, while when searching a sorted list, comparing against the middle element tells you in which half the searched value must lie, and this drastically reduces the number of tests to be performed.
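
That halving strategy is binary search; a minimal sketch in C:

#include <stddef.h>

/* return the index of target in the sorted array a[0..n-1], or -1 if absent */
long binary_search(const int *a, size_t n, int target) {
    size_t lo = 0, hi = n;               /* invariant: target, if present, is in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of (lo + hi) */
        if (a[mid] == target)
            return (long)mid;
        if (a[mid] < target)
            lo = mid + 1;                /* preceding elements are not larger: go right */
        else
            hi = mid;                    /* following elements are not smaller: go left */
    }
    return -1;
}

Each comparison halves the remaining range, so a 5000-element array is searched in about 13 comparisons instead of up to 5000.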
When the list is given in sorted order, you benefit from this directly. When it is given in no known order, you have to weigh whether the cost of the sort is worth the accelerated searches.
Sorting has other algorithmic uses than search in a list, but it is always the ordering property which is exploited.

Multiple Keys for exact and approximate lookups in C

I am trying to develop a network resource manager component in C which keeps track of various network elements over TCP/UDP sockets. For this, I use three values:
Hardware Location Number
Service Group Number
Node Number
The rule is that no two elements on a network may have the same set of these three numbers. Thus, each location's identity will be unique on the network. This information needs to be saved in the program (non-persistently) in a way so that given any of the parameters (could be just a single number, or a combination of any two, or all three) the program returns the eligible candidates by performing a quick search.
Addition and deletion should also be efficient, but given that there will be few insertions or deletions after the initial transient phase, it is acceptable for them to be a bit slower than search. Using trees is one option, but the answer to 'Which one to use?' still eludes me (not that I know many, but I look forward to learning new ones if they serve my purpose).
To do this, I could maintain three different trees separately, with similar nodes pointing to the same structure in memory, but that feels inefficient and not compact. I am looking for a unified data structure that can handle these variations, like multiple keys.
Or I could have a single AVL tree with multiple keys (if that is allowed).
The number of elements in the network is dynamic, so using a 3D array is out of the question.
A friend also suggested hashing, but I am not too sure.
Please help.
Hashing seems like a silly choice for this. Perhaps the most significant reason is that you seem interested in approximate lookups. Hashing your values will likely mean iterating through the entire collection to find a group of nodes that have a common prefix, or a similar prefix.
PATRICIA is commonly used in routing tables, and makes itself quite amenable to searching for items that have similar keys. Note that I have found much misleading information about PATRICIA tries, which I've written about here. I found this resource to be particularly helpful.
As with an AVL tree, you'll need to combine the three keys to form one (preferably without hashing).
unsigned int key[3] = { hardware_location_number, service_group_number, node_number };
/* ^------- Use something like this as your key */
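
If you use an ordered structure (an AVL tree, a B-tree, or a trie over the key bits), that composite key needs a total order; a sketch of a lexicographic comparison (the field order is an assumption, put the field you query on most often first):

/* compare two composite keys field by field:
   hardware location first, then service group, then node number */
int key_cmp(const unsigned int a[3], const unsigned int b[3]) {
    for (int i = 0; i < 3; i++) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
    }
    return 0;
}

With this ordering, all elements sharing a hardware location (or a hardware location plus a service group) occupy one contiguous range of the tree, so a partial-key lookup becomes a range scan. A lookup on the node number alone, however, still needs either a full scan or a second index.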

How does a non-deterministic Turing machine work?

I understand they aren't real, and they seem to branch the computation whenever there are 2 options instead of picking one. But, for example, if I say this:
"Non deterministically guess a bijection p of vertices from Graph G to Graph H" (context here is Graph Isomorphism)
What is that supposed to mean? I understand the bijection, but it says "non deterministically guess". If it's guessing, how is that an algorithmic approach? How can it guarantee it's going to work?
They don't; they just sort of illustrate a point. Basically, what they do is guess an answer and check whether it's right (deterministically). It's not the guessing-the-answer part that's important, though; it's checking that the answer is right. It's just like asking: given an arbitrary solution, is it correct? For example, there are problems that take exponential time to compute; some of their answers can be checked in polynomial time, but some can't. What the non-deterministic TM does is divide those two groups: the problems whose solutions can be checked quickly from those that can't. And this brings up the bigger question: if one group of problems' solutions can be verified much more quickly than another's, can their solutions also be generated more quickly? That question hasn't been answered yet.
There are different ways to picture one. One I find useful is the oracle model. Did you ever see the Far Side cartoon where a derivation on the blackboard has "Here a miracle occurs" as one of the intermediate steps? In this version of an NDTM, when you need to choose something, the oracle writes the correct version on the right part of the tape. (This is taken from Garey and Johnson, Computers and Intractability, their classic book on NP-complete problems.) You aren't allowed to assume you've got the right one, though, and there may not be a correct one.
Therefore, when you non-deterministically guess a bijection, you're getting the correct bijection for your purposes, provided one exists.
It isn't a good basis for an algorithm, since deterministically simulating a non-deterministic Turing machine is basically exponential in the number of nondeterministic choices, and the algorithmic equivalent of the nondeterministic guess is to try every possible bijection.
From a theoretical point of view, I'd translate it as "If there is a bijection such that....". From an algorithmic point of view, find another book, or another chapter of the same book, since that approach is useless for even moderately large graphs.
I believe what is meant is "non-deterministically choose a candidate solution" and then test whether that solution is correct. Since all possible choices (guesses) are tested, the solution is guaranteed to be found, if one exists.
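
In deterministic terms, "guess a bijection and check it" becomes "enumerate every bijection and check each one", which is exactly where the exponential blowup comes from. A sketch in C (the 4-vertex sample graphs and the adjacency-matrix representation are illustrative):

#include <stdbool.h>
#include <stdio.h>

#define N 4

/* two sample graphs as adjacency matrices: both are 4-vertex paths */
static const bool G[N][N] = { {0,1,0,0}, {1,0,1,0}, {0,1,0,1}, {0,0,1,0} };
static const bool H[N][N] = { {0,0,1,1}, {0,0,0,1}, {1,0,0,0}, {1,1,0,0} };

/* does the bijection p map G exactly onto H? (the polynomial-time "check") */
static bool is_isomorphism(const int p[N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (G[i][j] != H[p[i]][p[j]])
                return false;
    return true;
}

/* deterministic stand-in for the nondeterministic guess:
   recursively try every one of the N! bijections */
static bool search(int p[N], bool used[N], int k) {
    if (k == N)
        return is_isomorphism(p);
    for (int v = 0; v < N; v++) {
        if (!used[v]) {
            used[v] = true;
            p[k] = v;
            if (search(p, used, k + 1))   /* if any branch accepts, accept */
                return true;
            used[v] = false;
        }
    }
    return false;
}

int main(void) {
    int p[N];
    bool used[N] = { false };
    printf(search(p, used, 0) ? "isomorphic\n" : "not isomorphic\n");
    return 0;
}

The NDTM "accepts" if any branch accepts; the deterministic simulation has to try the branches one after another, which is why it takes up to N! checks.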
A physical implementation of the non-deterministic Turing machine is the DNA computer. For example, here's an outline of how to solve the traveling salesman problem in DNA:
Get/make a bunch of DNA sequences, each with length proportional to the cost of an edge in your graph and sticky ends with sequences uniquely identifying one of the vertices that the edge connects.
Mix them together with DNA ligase in a big beaker. They'll anneal to each other in sequences that represent every possible path through the graph (OK, not the really long ones).
Remove all the sequences that are missing at least one vertex. To do this, sequentially select for each vertex using hybridization. For example, if "ACGTACA" encodes vertex 1, select for sequences that bind to its reverse complement, "TGTACGT". Then repeat this selection for every other vertex.
Sort the remaining sequences by size using gel electrophoresis. Then sequence the shortest one. The sequence encodes the shortest path through your graph.

Explaining benefits of an array to a lay person?

I develop code in our proprietary system using a scripting language that is unique to that system.
Our director has allowed us to request enhancements to this language, which currently lacks user definable arrays.
I need to write a concept brief on why we need arrays and how they can benefit us, however I need to explain it in a fashion that someone who has no understanding of code will understand.
I'm a programmer, therefore I suck at documentation and explaining things in a non-technical manner. I tried banging my head on the desk to see if anything useful would come out but it hasn't. Can anyone help?
I love analogies.
It's much easier to have a 100-DVD holder that sits neatly on your floor and holds 100 DVDs in order than 100 individual DVDs scattered around your house wherever you last used them.
Especially relevant when you need to move the collection from one place to another or share it with a friend.
What's your application area? To speak the users' language you need to know that. Suppose it's stock trading: then what to you is an array may, to the users, be a portfolio -- get the quotes for several stocks at once rather than having to do it repeatedly, one at a time. If your application area is CRM, then the array will let the users check on a group of customers at once, rather than one at a time. And so on, and so forth.
In every application area there will be cases in which users may want to deal with a bunch of things at once, it being easier than dealing with one thing at a time. Phrase it in the appropriate vocabulary, and you have the case for arrays!
You might want to see if you can move the business away from your custom scripting environment and into a standard scripting environment like Lua or Python. You might be surprised at how much easier it is to get Lua up and running than it is to:
Support an in house system
Create tools for it (do you have an IDE?)
Train new programmers in it
Live without modern features that you lack the time/skills to implement.
Key to getting that to happen would be to make Lua interoperable with your standard scripting system, or to write a translation from your old scripts to Lua scripts.
The benefit is that it makes the code shorter, and thus less money is spent coding and debugging. You can then present some example code that you could have made shorter had the language supported arrays.
Sounds like you've been asked to create code in the past (or anticipate having to create code in the future), where your job would have been faster/easier/cheaper if the system that you used had arrays.
That's the issue: you want to do more for your director and you need arrays to help you.
Your director will understand the business benefits of you having a better toolkit--you'll be able to do more for him or her. And that's how you increase business efficiency.
Tell your director: I want to improve my productivity for you and our team. To do so, arrays would be very helpful.
I like Alex's answer - it has to be put in terms of the user's problems. What problem (that they care about) can they do with it that they cannot do without it?
I used to teach introductory programming in college, and arrays are simply not something that comes easily to non-programmers. They need to understand some other basics first, like the sequential nature of programs, the lego-block way programs are constructed, the idea of run-time (as opposed to write-time) and, really importantly, the concept of a variable as a container for a value, how that is different from its name, and how its contents change over time while its name does not.
I found a useful way to get into this area is to let them program a very simple, decimal, simulated computer, in "machine language". They get the notion of memory address vs. memory contents, and that address is just a number. That makes it a lot easier to introduce arrays in a more "real" language.
Another approach is to have them work on a kind of problem where they really start wishing they could invent variables on the fly. They don't want to just have a variable A; they feel a need for A1, A2, etc., and then they would really like to say Ai, where i is another variable. Once they feel the need for that, they will grasp arrays. (For example, they could take a simple program that asks for their name and has a simple conversation with them, and then extend it to talk to two people at once, then three, and so on.)
Then a useful next step is "parallel arrays", which can serve as rudimentary arrays of structures: N$(i) can be the name of student i, while A(i) is the age of student i. This makes the idea useful.
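
In C-like notation (the student names and ages are illustrative), the parallel-array idea looks like this:

#include <stdio.h>

#define NSTUDENTS 3

int main(void) {
    /* two parallel arrays: element i of each describes student i */
    const char *name[NSTUDENTS] = { "Ada", "Brian", "Grace" };
    int         age[NSTUDENTS]  = { 36, 41, 29 };

    for (int i = 0; i < NSTUDENTS; i++)
        printf("%s is %d years old\n", name[i], age[i]);
    return 0;
}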
Only then would I dare to start to introduce algorithms like sorting, merging, table lookup, and so on.
I think to fully realize the potential of arrays, you must somehow mention two things:
1) Array Algorithms
Sort, Find, etc. All the basics. Equate this in your business brief with structured data that can organize itself. No extra query language. No variable naming conventions. All you need is good standards.
2) Multi-Dimensional Arrays
The power of arrays seems fully realized to me with matrices. With these you can practically hold limitless data.
Plus, depending on the power of the proprietary language you are using, arrays can store objects.
The power of an array is that it allows you to put a group of things together so that you can perform the same operation on all of them with less code.
Sorting is one example of an array operation, and is like having a box of index cards that you are putting in order.
Or if you had a collection of letters that need to go out, being able to write a loop that stamps each letter and then mails it is better than writing out:
Take the first letter, stamp it.
Mail it.
Take the second letter, stamp it.
Mail it.
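
A sketch of the loop version, with stamp and mail as hypothetical helpers standing in for the real work:

#include <stdio.h>

#define LETTER_COUNT 3

static void stamp(int letter) { printf("Stamping letter %d\n", letter); }
static void mail(int letter)  { printf("Mailing letter %d\n", letter);  }

int main(void) {
    int letters[LETTER_COUNT] = { 1, 2, 3 };
    /* one loop replaces LETTER_COUNT copies of the same two steps */
    for (int i = 0; i < LETTER_COUNT; i++) {
        stamp(letters[i]);
        mail(letters[i]);
    }
    return 0;
}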
Basically, anything you'd use to refer to the first, second, third, fifth, etc., is like an array.
And then indexed/hashed arrays are like having an index in a book: you know the author describes the Defenestration of Prague somewhere in the volume, but looking in the index shows that it's on page 255.
Here's the easiest-to-understand benefit: It lets you refer to things by a number. Try to emphasize the importance of this.

Linked lists or hash tables?

I have a linked list of around 5000 entries ("NOT" inserted simultaneously), and I traverse the list, looking for a particular entry on occasion (though not very often). Should I consider a hash table as the more optimal choice for this case, replacing the linked list (which is doubly-linked and linear)? I am using C on Linux.
If a profiler has not shown this code to be a slow part of the application, then you shouldn't do anything about it yet.
If it is slow but the code is tested, works, and is clear, and there are other, slower areas that you can speed up, do those first.
If it is buggy, then you need to fix it anyway; go for the hash table, as it will be faster than the list. This assumes that the order in which the data is traversed does not matter. If you care about insertion order, stick with the list (you can do things with a hash table to keep the order, but that will make the code much trickier).
Given that you need to search the list only on occasion, the odds of this being a significant bottleneck in your code are small.
Another data structure to look at is the "skip list", which basically lets you skip over a large portion of the list when searching. It requires the list to be sorted, however, which, depending on what you are doing, may make the code slower overall.
Whether a hash table is more optimal depends on the use case, which you have not described in detail. More importantly, make sure the performance bottleneck is actually in this part of the code: if this code is called only once in a while and is not on a critical path, there is no point changing it.
Have you measured and found a performance hit with the lookup? A hash_map or hash table should be good.
If you need to traverse the list in order (not as a part of searching for elements, but say for displaying them) then a linked list is a good choice. If you're only storing them so that you can look up elements then a hash table will greatly outperform a linked list (for all but the worst possible hash function).
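
For a sense of what the replacement would look like, a minimal chained hash table sketch in C (the integer key type and bucket count are illustrative choices):

#include <stdlib.h>

#define NBUCKETS 1024

struct entry {
    int key;                     /* illustrative: integer keys */
    void *value;
    struct entry *next;          /* collision chain */
};

static struct entry *buckets[NBUCKETS];

static unsigned bucket_of(int key) {
    return (unsigned)key % NBUCKETS;
}

/* insert a key/value pair (no duplicate check, for brevity); returns -1 on allocation failure */
static int put(int key, void *value) {
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;
    e->value = value;
    e->next = buckets[bucket_of(key)];
    buckets[bucket_of(key)] = e;
    return 0;
}

/* expected O(1) lookup instead of an O(n) list traversal */
static void *get(int key) {
    for (struct entry *e = buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (e->key == key)
            return e->value;
    return NULL;
}

With ~5000 entries spread over 1024 buckets, a lookup inspects around five entries on average instead of walking up to 5000 list nodes. If you also need ordered traversal, you can keep the list and store pointers to its nodes as the hash table's values.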
If your application calls for both types of operations, you might consider keeping both, and using whichever one is appropriate for a particular task. The memory overhead would be small, since you'd only need to keep one copy of each element in memory and have the data structures store pointers to these objects.
As with any optimization step that you take, make sure you measure your code to find the real bottleneck before you make any changes.
If you care about performance, you definitely should. If you're iterating through the thing to find a certain element with any regularity, it's going to be worth it to use a hash table. If it's a rare case, though, and the ordinary use of the list is not a search, then there's no reason to worry about it.
If you only traverse the collection, I don't see any advantage in using a hash map.
I advise against hashes in almost all cases.
There are two reasons. First, the size of the hash table is fixed.
Second, and much more importantly, the hashing algorithm: how do you know you've got it right? How will it behave with real data rather than test data?
I suggest a balanced B-tree: always O(log n), no uncertainty with regard to a hash algorithm, and no size limits.
