C-Input/Output From File-Insertion Sort - c

How can I approach this programming task?
Can you give me some hints or advice?
c: Read the file and sort the words alphabetically (I have done the reading, but not the sorting).

It looks like the problem is worded poorly; is c supposed to direct you to read the words into an unsorted list? That would make sense to me.
Anyway, design your insertionsort function to match the prototype of the standard library's qsort. This way you can reuse your code and move the logic for comparing two words out of your sort function. Determining whether a word "comes before" another word is trivial.
For calculating the running time of your algorithm, take a look at the clock function. This does not return the elapsed running time of your program, but it is a better indicator of how much CPU time your sorting algorithm took. A good way to minimize the running time of your program is to refrain from making system calls and heap allocations in your loops, if possible. Note that insertion sort has a very bad worst-case time complexity but is very good for almost-sorted data. Selecting the right sorting algorithm for your data set can make a big difference.
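A minimal sketch of the two suggestions above, assuming the words are already in memory as an array of char * (the names insertion_sort, compare_words and timed_word_sort are made up for this illustration). The sort takes the same parameters as qsort, so the comparison logic lives in a separate function, and clock is read before and after the sort only.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Comparator in the style qsort expects. The array elements are char *,
   so each argument is really a pointer to a char *. */
int compare_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Swap two elements of `size` bytes each, byte by byte (no heap allocation). */
static void swap_bytes(unsigned char *x, unsigned char *y, size_t size)
{
    while (size--) {
        unsigned char t = *x;
        *x++ = *y;
        *y++ = t;
    }
}

/* Insertion sort with the same parameter list as the standard qsort. */
void insertion_sort(void *base, size_t nmemb, size_t size,
                    int (*compar)(const void *, const void *))
{
    unsigned char *arr = base;
    for (size_t i = 1; i < nmemb; i++) {
        /* Move element i left until its left neighbour is not larger. */
        for (size_t j = i;
             j > 0 && compar(arr + (j - 1) * size, arr + j * size) > 0;
             j--) {
            swap_bytes(arr + (j - 1) * size, arr + j * size, size);
        }
    }
}

/* Sort the words and return the CPU time the sort took, in seconds. */
double timed_word_sort(const char **words, size_t n)
{
    clock_t start = clock();
    insertion_sort(words, n, sizeof *words, compare_words);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling timed_word_sort(words, count) after reading the file sorts the array in place. For a small word list the measured time may well be reported as zero, because clock has limited resolution.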

Related

Why bubble sort is not efficient?

I am developing backend project using node.js and going to implement sorting products functionality.
I researched some articles and there were several articles saying bubble sort is not efficient.
Bubble sort was used in my previous projects and I was surprised why it is bad.
Could anyone explain about why it is inefficient?
If you can explain by c programming or assembler commands it would be much appreciated.
Bubble Sort has O(N^2) time complexity so it's garbage for large arrays compared to O(N log N) sorts.
In JS, if possible use built-in sort functions that the JS runtime might be able to handle with pre-compiled custom code, instead of having to JIT-compile your sort function. The standard library sort should (usually?) be well-tuned for the JS interpreter / JIT to handle efficiently, and use an efficient implementation of an efficient algorithm.
The rest of this answer is assuming a use-case like sorting an array of integers in an ahead-of-time compiled language like C compiled to native asm. Not much changes if you're sorting an array of structs with one member as the key, although cost of compare vs. swap can vary if you're sorting char* strings vs. large structs containing an int. (Bubble Sort is bad for any of these cases with all that swapping.)
See Bubble Sort: An Archaeological Algorithmic Analysis for more about why it's "popular" (or widely taught / discussed) despite being one of the worst O(N^2) sorts, including some accidents of history / pedagogy. It also includes an interesting quantitative analysis of whether it's actually (as sometimes claimed) one of the easiest to write or understand, using a couple of code metrics.
For small problems where a simple O(N^2) sort is a reasonable choice (e.g. the N <= 32 element base case of a Quick Sort or Merge Sort), Insertion Sort is often used because it has good best-case performance (one quick pass in the already-sorted case, and efficient in almost-sorted cases).
A Bubble Sort (with an early-out for a pass that didn't do any swaps) is also not horrible in some almost-sorted cases but is worse than Insertion Sort. But an element can only move toward the front of the list one step per pass, so if the smallest element is near the end but otherwise fully sorted, it still takes Bubble Sort O(N^2) work. Wikipedia explains Rabbits and turtles.
Insertion Sort doesn't have this problem: a small element near the end will get inserted (by copying earlier elements to open up a gap) efficiently once it's reached. (And reaching it only requires comparing already-sorted elements to determine that and move on with zero actual insertion work). A large element near the start will end up moving upwards quickly, with only slightly more work: each new element to be examined will have to be inserted before that large element, after all others. So that's two compares and effectively a swap, unlike the one swap per step Bubble Sort would do in its "good" direction. Still, Insertion Sort's bad direction is vastly better than Bubble Sort's "bad" direction.
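To make the "turtle" case concrete, here is a small sketch that counts comparisons for both sorts on the input 2, 3, ..., n, 1 (sorted except for the smallest element at the end). The function names are made up for this illustration, and the Bubble Sort variant is the early-exit one that also shrinks the range each pass.

```c
#include <stddef.h>

/* Bubble Sort with early exit; returns the number of comparisons made. */
unsigned long bubble_sort_count(int *a, size_t n)
{
    unsigned long compares = 0;
    int swapped = 1;
    for (size_t pass = 0; swapped && pass + 1 < n; pass++) {
        swapped = 0;
        for (size_t i = 0; i + 1 < n - pass; i++) {
            compares++;
            if (a[i] > a[i + 1]) {
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
                swapped = 1;
            }
        }
    }
    return compares;
}

/* Insertion Sort; returns the number of comparisons made. */
unsigned long insertion_sort_count(int *a, size_t n)
{
    unsigned long compares = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0) {
            compares++;
            if (a[j - 1] <= key)
                break;              /* found the insertion point */
            a[j] = a[j - 1];        /* shift right to open a gap */
            j--;
        }
        a[j] = key;
    }
    return compares;
}

/* Fill a[0..n-1] with 2, 3, ..., n, 1. */
void fill_turtle(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i] = (int)i + 2;
    if (n > 0)
        a[n - 1] = 1;
}
```

For n = 100 this Bubble Sort makes 4950 comparisons (the 1 moves one slot per pass, so all 99 passes are needed), while Insertion Sort makes 197: one per element until it reaches the 1, then 99 to insert it.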
Fun fact: state of the art for small-array sorting on real CPUs can include SIMD Network Sorts using packed min/max instructions, and vector shuffles to do multiple "comparators" in parallel.
Why Bubble Sort is bad on real CPUs:
The pattern of swapping is probably more random than Insertion Sort, and less predictable for CPU branch predictors. Thus leading to more branch mispredicts than Insertion Sort.
I haven't tested this myself, but think about how Insertion Sort moves data: each full run of the inner loop moves a group of elements to the right to open up a gap for a new element. The size of that group might stay fairly constant across outer-loop iterations so there's a reasonable chance of predicting the pattern of the loop branch in that inner loop.
But Bubble Sort doesn't do so much creation of partially-sorted groups; the pattern of swapping is unlikely to repeat (see footnote 1).
I searched for support for this guess I just made up, and did find some: Insertion sort better than Bubble sort? quotes Wikipedia:
Bubble sort also interacts poorly with modern CPU hardware. It produces at least twice as many writes as insertion sort, twice as many cache misses, and asymptotically more branch mispredictions.
(IDK if that "number of writes" was naive analysis based on the source, or looking at decently optimized asm):
That brings up another point: Bubble Sort can very easily compile into inefficient code. The notional implementation of swapping actually stores into memory, then re-reads that element it just wrote. Depending on how smart your compiler is, this might actually happen in the asm instead of reusing that value in a register in the next loop iteration. In that case, you'd have store-forwarding latency inside the inner loop, creating a loop-carried dependency chain. And also creating a potential bottleneck on cache read ports / load instruction throughput.
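As an illustration of the "reuse the value in a register" idea, here is a sketch of a Bubble Sort whose inner loop carries the current largest element in a local variable instead of storing it and reloading it on the next iteration. The function names are made up, and whether this actually compiles to better asm or runs faster depends on the compiler and CPU; it has not been measured here.

```c
#include <stddef.h>

/* One Bubble Sort pass over a[0..n-1]. The element being bubbled up is kept
   in `carried`, so each iteration does one load and one store and never
   re-reads a value it has just written. */
void bubble_pass_carry(int *a, size_t n)
{
    if (n < 2)
        return;
    int carried = a[0];
    for (size_t i = 1; i < n; i++) {
        int next = a[i];
        if (carried > next) {
            a[i - 1] = next;        /* smaller value drops back one slot */
        } else {
            a[i - 1] = carried;     /* carried element settles here */
            carried = next;         /* the larger one is carried onward */
        }
    }
    a[n - 1] = carried;             /* the maximum ends up last */
}

/* Full sort built from the pass above (no early exit, to keep it short). */
void bubble_sort_carry(int *a, size_t n)
{
    for (size_t end = n; end > 1; end--)
        bubble_pass_carry(a, end);
}
```

The result of each pass is the same as with the usual compare-and-swap formulation; only the way the data flows through the loop differs.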
Footnote 1: Unless you're sorting the same tiny array repeatedly; I tried that once on my Skylake CPU with a simplified x86 asm implementation of Bubble Sort I wrote for this code golf question (the code-golf version is intentionally horrible for performance, optimized only for machine-code size; IIRC the version I benchmarked avoided store-forwarding stalls and locked instructions like xchg mem,reg).
I found that with the same input data every time (copied with a few SIMD instructions in a repeat loop), the IT-TAGE branch predictors in Skylake "learned" the whole pattern of branching for a specific ~13-element Bubble Sort, leading to perf stat reporting under 1% branch mispredicts, IIRC. So it didn't demonstrate the tons of mispredicts I was expecting from Bubble Sort after all, until I increased the array size some. :P
Bubble sort runs in O(n^2) time complexity. Merge sort takes O(n*log(n)) time, while quick sort takes O(n*log(n)) time on average, thus performing better than bubble sort.
Refer to this: complexity of bubble sort.

Distinguishing between sorting algorithms

Is there any way to distinguish between sorting algorithms from their executable files? I found this problem in a varsity programming mailing list that goes like this: Say I have a number of executable files that sort an array of data using different algorithms. I know what algorithms are used to code those executables, but I don't know which algorithm was used in which executable file. The algorithms used are:
UNAWARE BUBBLE SORT
BUBBLE SORT WITH EARLY EXIT
TRADITIONAL INSERTION SORT
INSERTION SORT ON LIST
INSERTION SORT WITH BINARY SEARCH
TRADITIONAL SELECTION SORT
MERGE SORT
TRADITIONAL QUICK SORT
QUICK SORT MEDIAN OF THREE
RANDOMIZED QUICK SORT
SHELL SORT TIMES 4
BOGO SORT
RADIX SORT LSD FIRST
BUCKET SORT
COUNTING SORT
You can check their asymptotic behavior by giving them larger and larger input, but many of the listed algorithms fall in the same complexity classes, so you wouldn't be able to distinguish between, say merge sort and quick sort based on this alone.
To break some of these degeneracies you could also look at the memory usage of the different executables. To continue with the merge sort and quick sort example, you would see that merge sort requires O(n) additional space, while quick sort only needs O(log n) additional space (stack size) to perform the sort.
You might be able to deduce something from giving them degenerate input, such as a megabyte of zeros or a megabyte of reversed strings. But you wouldn't be able to make more than educated guesses.
(Excellent comments below. Making this a community wiki, feel free to edit.)
Change the kinds of data and the amount of data you input and compare execution times.
Changing the nature of the data (repeating small numbers (few digits), vs widely distributed data with no duplicates) helps you determine whether a sorting algorithm is comparison-based, (radix/bucket sort vs comparison-based sorts). For example, sorting 1000000 1-digit numbers is super fast with bucket sort since it scales mainly off of the number of digits, but slower for comparison-based sorts that scale mainly off the data set size.
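To see why 1-digit data favours the non-comparison sorts, here is a minimal counting sort sketch. It assumes every value is a single decimal digit (0 to 9); the function name is made up for this illustration. It makes one pass to count and one pass to write, with no element-to-element comparisons at all.

```c
#include <stddef.h>

/* Sort an array whose values are all in 0..9.
   Returns 0 on success, or -1 (leaving the array untouched) if a value
   outside that range is found. */
int counting_sort_digits(unsigned char *a, size_t n)
{
    size_t count[10] = {0};

    /* Pass 1: count how often each digit occurs. */
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 9)
            return -1;
        count[a[i]]++;
    }

    /* Pass 2: write the digits back in increasing order. */
    size_t out = 0;
    for (unsigned d = 0; d < 10; d++)
        for (size_t c = 0; c < count[d]; c++)
            a[out++] = (unsigned char)d;
    return 0;
}
```

The work grows linearly with the number of elements, which is the kind of timing signature that separates counting, bucket and radix sorts from the comparison-based ones.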
You could also tailor the data to perform better for some algorithms over others, like using best case scenario and worst case scenario for the various algorithms and look for the .exe with the most dramatic change in execution time.
For example, to distinguish between insertion sort and selection sort, use an almost sorted data set like (2, 3, ..., 98, 99, 1). Insertion sort will do a single insert-shift (for the final 1) and is then done. This will take almost no time. Selection sort will have to swap at every index, since the minimum will always be at the final index, and this will take a long time.
Use the following commands in CMD to find the processing time of each executable; with those times we can order them.
echo %time%
filename.exe
echo %time%

Which sort takes O(n) in the given condition

I am preparing for a competition and stumbled upon this question: Considering a set of n elements which is sorted except for one element that appears out of order. Which of the following takes O(n) time?
Quick Sort
Heap Sort
Merge Sort
Bubble Sort
My reasoning is as follows:
I know merge sort takes O(n log n) even in the best case, so it's not the answer.
Quick sort will also take O(n^2), since the array is almost sorted.
Bubble sort can be chosen, but only if we modify it slightly to check whether a swap has been made in a pass.
Heap sort can be chosen, because creating the min-heap of a sorted array takes O(n) time, and only one element is out of place, so it takes log n.
Hence I think it's heap sort. Is this reasoning correct? I would like to know if I'm missing something.
Let's start with bubble sort. In my experience, most resources define bubble sort with a stopping condition of not performing any swaps in an iteration (see e.g. Wikipedia). In this case bubble sort will indeed stop after a linear number of steps. However, I remember stumbling upon descriptions that specified a fixed number of iterations, which makes your case quadratic. Therefore, all I can say about this case is "probably yes"; it depends on the definition used by the judges of the competition.
You are right regarding merge sort and quick sort: the classical versions of both algorithms need at least Ω(n log n) time on every input (and quick sort can degrade further, to Θ(n^2), with a poor pivot choice).
However, your reasoning regarding heap sort seems incorrect to me. In a typical implementation of heap sort, the heap is being built in the order opposite to the desired final order. Therefore, if you decide to build a min-heap, the outcome of the algorithm will be a reversed order, which—I guess—is not the desired one. If, on the other hand, you decide to build a max-heap, heap sort will obviously spend lots of time sifting elements up and down.
Therefore, in this case I'd go with bubble sort.
This is a bad question because you can guess which answer is supposed to be right, but it takes so many assumptions to make it actually right that the question is meaningless.
If you code bubblesort as shown on the Wikipedia page, then it will stop in O(n) if the element that's out of order is "below" its proper place with respect to the sort iteration. If it's above, then it moves no more than one position toward its proper location on each pass.
To get the element unconditionally to its correct location in O(n), you'd need a variation of bubblesort that alternately makes passes in each direction.
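The bidirectional variation described above is usually called cocktail (shaker) sort. A minimal sketch, with a made-up function name and a pass counter added only so the linear behaviour on "one element out of place" input can be observed:

```c
#include <stddef.h>

static void swap_ints(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Bubble sort that alternates forward and backward passes and stops after
   the first pass that makes no swap. Returns the number of passes made. */
unsigned cocktail_sort(int *a, size_t n)
{
    unsigned passes = 0;
    if (n < 2)
        return passes;

    size_t lo = 0, hi = n - 1;      /* unsorted region is a[lo..hi] */
    int swapped = 1;
    while (swapped && lo < hi) {
        /* Forward pass: carries the largest element up to a[hi]. */
        swapped = 0;
        for (size_t i = lo; i < hi; i++)
            if (a[i] > a[i + 1]) {
                swap_ints(&a[i], &a[i + 1]);
                swapped = 1;
            }
        passes++;
        hi--;
        if (!swapped)
            break;

        /* Backward pass: carries the smallest element down to a[lo]. */
        swapped = 0;
        for (size_t i = hi; i > lo; i--)
            if (a[i - 1] > a[i]) {
                swap_ints(&a[i - 1], &a[i]);
                swapped = 1;
            }
        passes++;
        lo++;
    }
    return passes;
}
```

On 2, 3, ..., 100, 1 this takes three passes, and on 100, 1, 2, ..., 99 it takes two, so a single misplaced element costs O(n) work whichever side of its proper place it starts on.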
The conventional implementations of the other sorts are O(n log n) on nearly sorted input, though Quicksort can be O(n^2) if you're not careful. A proper implementation with a Dutch National Flag partition is required to prevent bad behavior.
Heapsort takes only O(n) time to build the heap, but Theta(n log n) time to pull n items off the heap in sorted order, each in Theta(log n) time.

How to solve problems that have many distinct levels (they are recursive) in c

I would like to know how to handle problems where a process is used at many different 'levels' of a problem in C, preferably in an 'idiomatic' way. I know I did not explain this well enough, so let me give an example:
Consider the general problem of making a game solver, which is supposed to print the best next move. I think that it should check all possible moves in a for loop and see whether each one is a winning move (in this round). If it is, return the move; otherwise, check every possible move the opponent can play against your move (another for loop) and call the function to find the best move again.
However, I find that this approach has some limitations, such as performance (the program will spend its time running the boilerplate code required to call the functions, etc.) and limited flexibility, since the function will have to find a way to tell the caller how good a move it found. That is, if it could be done at all.
bestmove()
{
    for (; i < maxmove; i++)
    {
        if (checkifwinning(moves[i])) return;
        for (; n < maxopponentmove; n++)
        {
            bestmove();
        }
    }
}
I have been messing with Haskell for a while now, so I am afraid that my mind is set on seeking recursive solutions. I hope that you can show me a way to write this function in a 'C native' way.
The C language permits recursive functions. However, it does not guarantee tail-call elimination (although recent GCC compilers are sometimes able to optimize tail-recursive calls into purely iterative machine code).
When coding recursive functions, you should avoid recursion that is too deep and local call frames that are too big (keep large data in heap memory instead).
You are talking about searching a game tree to find a best move; you could use a standard algorithm like minimax. Your approach seems to be a depth-first search of the tree which terminates at the first winning move found; note that this won't find the shortest path to a winning move nor does it guard against the opponent's winning.
There are ways to speed up the searching of game trees such as alpha-beta pruning. Such a standard algorithm is the way to go - much better than worrying about the overhead of calling functions in C, etc. C function calls are not expensive. Beware of such "optimizations" when writing C code - the optimizer is likely to be better at such things anyway. At the very least, first write a version in the most straightforward way so you have something to benchmark against. Your job is to find a good algorithm.
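As a rough sketch of what a recursive minimax search with alpha-beta pruning can look like in C, here is the negamax formulation applied to a toy game chosen only for illustration (it is not from the question): a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. The names negamax, best_move and MAX_TAKE are made up. The return value is how the callee tells the caller how good a position is.

```c
#define MAX_TAKE 3   /* toy rule: a move removes 1..MAX_TAKE stones */

/* Score of the position for the player about to move:
   +1 if that player can force a win, -1 otherwise.
   alpha/beta bound the scores the caller still cares about. */
int negamax(int stones, int alpha, int beta)
{
    if (stones == 0)
        return -1;   /* the opponent just took the last stone: we lost */

    int best = -1;
    for (int take = 1; take <= MAX_TAKE && take <= stones; take++) {
        /* The opponent's best score, negated, is our score for this move. */
        int score = -negamax(stones - take, -beta, -alpha);
        if (score > best)
            best = score;
        if (best > alpha)
            alpha = best;
        if (alpha >= beta)
            break;   /* prune: the remaining moves cannot change the result */
    }
    return best;
}

/* Number of stones to take from `stones`, or 0 if no move is possible.
   When every move loses, the first legal move is returned. */
int best_move(int stones)
{
    int best_take = 0;
    int best_score = -2;   /* lower than any real score */
    for (int take = 1; take <= MAX_TAKE && take <= stones; take++) {
        int score = -negamax(stones - take, -1, 1);
        if (score > best_score) {
            best_score = score;
            best_take = take;
        }
    }
    return best_take;
}
```

For example, best_move(7) returns 3, leaving the opponent with 4 stones, which is a lost position in this toy game. Without a transposition table the search revisits positions, so this sketch is only practical for small piles.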
I think what you really need is a branch-and-bound like algorithm: keep a list of "open" moves (i.e., not yet considered) and a list of "closed" moves (i.e., already considered). In C this would best be implemented having two queues (open and closed). Then you would have an algorithm as follows:
while (open is not empty)
{
    pop gameState from open
    if (gameState in closed)
        continue;
    push gameState to closed
    for (eachMove)
    {
        computeNewState();
        addStateToOpen();
    }
}
There are several advantages to this compared to a recursive approach:
you are sure that every game state is only considered once
you do not "overwhelm" the stack

How to best sort a portion of a circular buffer?

I have a circular, statically allocated buffer in C, which I'm using as a queue for a breadth first search. I'd like to have the top N elements in the queue sorted. It would be easy to just use a regular qsort() - except it's a circular buffer, and the top N elements might wrap around. I could, of course, write my own sorting implementation that uses modular arithmetic and knows how to wrap around the array, but I've always thought that writing sorting functions is a good exercise, but something better left to libraries.
I thought of several approaches:
1. Use a separate linear buffer - first copy the elements from the circular buffer, then apply qsort, then copy them back. Using an additional buffer means an additional O(N) space requirement, which brings me to
2. Sort the "top" and "bottom" halves using qsort, and then merge them using the additional buffer
3. Same as 2. but do the final merge in-place (I haven't found much on in-place merging, but the implementations I've seen don't seem worth the reduced space complexity)
On the other hand, spending an hour contemplating how to elegantly avoid writing my own quicksort, instead of adding those 25 (or so) lines might not be the most productive either...
Correction: Made a stupid mistake of switching DFS and BFS (I prefer writing a DFS, but in this particular case I have to use a BFS), sorry for the confusion.
Further description of the original problem:
I'm implementing a breadth first search (for something not unlike the fifteen puzzle, just more complicated, with about O(n^2) possible expansions in each state, instead of 4). The "bruteforce" algorithm is done, but it's "stupid" - at each point, it expands all valid states, in a hard-coded order. The queue is implemented as a circular buffer (unsigned queue[MAXLENGTH]), and it stores integer indices into a table of states. Apart from two simple functions to queue and dequeue an index, it has no encapsulation - it's just a simple, statically allocated array of unsigned's.
Now I want to add some heuristics. The first thing I want to try is to sort the expanded child states after expansion ("expand them in a better order") - just like I would if I were programming a simple best-first DFS. For this, I want to take part of the queue (representing the most recent expanded states), and sort them using some kind of heuristic. I could also expand the states in a different order (so in this case, it's not really important if I break the FIFO properties of the queue).
My goal is not to implement A*, or a depth first search based algorithm (I can't afford to expand all states, but if I don't, I'll start having problems with infinite cycles in the state space, so I'd have to use something like iterative deepening).
I think you need to take a big step back from the problem and try to solve it as a whole - chances are good that the semi-sorted circular buffer is not the best way to store your data. If it is, then you're already committed and you will have to write the buffer to sort the elements - whether that means performing an occasional sort with an outside library, or doing it when elements are inserted I don't know. But at the end of the day it's going to be ugly because a FIFO and sorted buffer are fundamentally different.
Previous answer, which assumes your sort library has a robust and feature-filled API (as requested in your question, this does not require you to write your own mod sort or anything - it depends on the library supporting arbitrarily located data, usually through a callback function. If your sort doesn't support linked lists, it can't handle this):
The circular buffer has already solved this problem using % (mod) arithmetic. QSort, etc don't care about the locations in memory - they just need a scheme to address the data in a linear manner.
They work as well for linked lists (which are not linear in memory) as they do for 'real' linear non circular arrays.
So if you have a circular array with 100 entries, and you find you need to sort the top 10, and the top ten happen to wrap in half at the top, then you feed the sort the following two bits of information:
The function to locate an array item is (x % 100)
The items to be sorted are at locations 95 to 105
The function will convert the addresses the sort uses into an index used in the real array, and the fact that the array wraps around is hidden. Although it may look weird to sort an array past its bounds, a circular array, by definition, has no bounds. The % operator handles that for you, and you might as well be referring to that part of the array as 1295 to 1305 for all it cares.
Bonus points for having an array with 2^n elements.
Additional points of consideration:
It sounds to me that you're using a sorting library which is incapable of sorting anything other than a linear array - so it can't sort linked lists, or arrays with anything other than simple ordering. You really only have three choices:
You can re-write the library to be more flexible (ie, when you call it you give it a set of function pointers for comparison operations, and data access operations)
You can re-write your array so it somehow fits your existing libraries
You can write custom sorts for your particular solution.
Now, for my part I'd re-write the sort code so it was more flexible (or duplicate it and edit the new copy so you have sorts which are fast for linear arrays, and sorts which are flexible for non-linear arrays)
But the reality is that right now your sort library is so simple you can't even tell it how to access data that is non linearly stored.
If it's that simple, there should be no hesitation to adapting the library itself to your particular needs, or adapting your buffer to the library.
Trying an ugly kludge, like somehow turning your buffer into a linear array, sorting it, and then putting it back in is just that - an ugly kludge that you're going to have to understand and maintain later. You're going to 'break' into your FIFO and fiddle with the innards.
-Adam
I'm not seeing exactly the solution you asked for in c. You might consider one of these ideas:
If you have access to the source for your libc's qsort(), you might copy it and simply replace all the array access and indexing code with appropriately generalized equivalents. This gives you some modest assurance that the underlying sort is efficient and has few bugs. No help with the risk of introducing your own bugs, of course. Big O like the system qsort, but possibly with a worse multiplier.
If the region to be sorted is small compared to the size of the buffer, you could use the straight ahead linear sort, guarding the call with a test-for-wrap and doing the copy-to-linear-buffer-sort-then-copy-back routine only if needed. Introduces an extra O(n) operation in the cases that trip the guard (for n the size of the region to be sorted), which makes the average O(n^2/N) < O(n).
I see that C++ is not an option for you. ::sigh:: I will leave this here in case someone else can use it.
If C++ is an option you could (subclass the buffer if needed and) overload the [] operator to make the standard sort algorithms work. Again, should work like the standard sort with a multiplier penalty.
Perhaps a priority queue could be adapted to solve your issue.
You could rotate the circular queue until the subset in question no longer wraps around. Then just pass that subset to qsort like normal. This might be expensive if you need to sort frequently or if the array element size is very large. But if your array elements are just pointers to other objects then rotating the queue may be fast enough. And in fact if they are just pointers then your first approach might also be fast enough: making a separate linear copy of a subset, sorting it, and writing the results back.
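The copy-out, sort, copy-back idea, combined with a test for whether the range wraps at all, might look roughly like this. It is a sketch under assumptions: the buffer holds unsigned values as in the question, the function and comparator names are made up, and the comparator simply orders by value, whereas the real one would look up the heuristic for each state index.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in comparator: orders plain unsigned values. */
static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Sort `count` elements of the circular buffer `buf` (capacity `cap`),
   beginning at index `start` and wrapping past the end if necessary.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int sort_circular_range(unsigned *buf, size_t cap, size_t start, size_t count)
{
    if (cap == 0 || start >= cap || count > cap)
        return -1;
    if (count < 2)
        return 0;

    if (start + count <= cap) {
        /* The range does not wrap: sort it in place. */
        qsort(buf + start, count, sizeof *buf, cmp_unsigned);
        return 0;
    }

    /* The range wraps: linearize it, sort the copy, write it back. */
    size_t first = cap - start;          /* elements before the wrap point */
    unsigned *tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, buf + start, first * sizeof *tmp);
    memcpy(tmp + first, buf, (count - first) * sizeof *tmp);
    qsort(tmp, count, sizeof *tmp, cmp_unsigned);
    memcpy(buf + start, tmp, first * sizeof *tmp);
    memcpy(buf, tmp + first, (count - first) * sizeof *tmp);
    free(tmp);
    return 0;
}
```

Since the queue in the question is statically allocated, the malloc could be replaced by a static scratch array of the same maximum length, which avoids an allocation on every call.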
Do you know about the rules regarding optimization? You can google them (you'll find a few versions, but they all say pretty much the same thing, DON'T).
It sounds like you are optimizing without testing. That's a huge no-no. On the other hand, you're using straight C, so you are probably on a restricted platform that requires some level of attention to speed, so I expect you need to skip the first two rules because I assume you have no choice:
Rules of optimization:
1. Don't optimize.
2. If you know what you are doing, see rule #1.
You can go to the more advanced rules:
Rules of optimization (cont):
3. If you have a spec that requires a certain level of performance, write the code unoptimized and write a test to see if it meets that spec. If it meets it, you're done. NEVER write code taking performance into consideration until you have reached this point.
4. If you complete step 3 and your code does not meet the specs, recode it leaving your original "most obvious" code in there as comments and retest. If it does not meet the requirements, throw it away and use the unoptimized code.
5. If your improvements made the tests pass, ensure that the tests remain in the codebase and are re-run, and that your original code remains in there as comments.
Okay, so finally--I'm not saying this because I read it somewhere. I've spent DAYS trying to untangle some god-awful messes that other people coded because it was "Optimized"--and the really funny part is that 9 times out of 10, the compiler could have optimized it better than they did.
I realize that there are times when you will NEED to optimize, all I'm saying is write it unoptimized, test and recode it. It really won't take you much longer--might even make writing the optimized code easier.
The only reason I'm posting this is because almost every line you've written concerns performance, and I'm worried that the next person to see your code is going to be some poor sap like me.
How about something like this example? It easily sorts a part of the buffer, or however much you want, without having to allocate a lot of extra memory.
It takes only two pointers, a status flag, and a counter for the for loop.
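#include <stdio.h>

#define _PRINT_PROGRESS
#define N 10

typedef unsigned char BYTE;

BYTE buff[N] = {4, 5, 2, 1, 3, 5, 8, 6, 4, 3};
BYTE *a = buff;
BYTE *b = buff;
BYTE changed = 0;

int main(void)
{
    BYTE n = 0;
    do
    {
        b++;                        /* b always points one element past a */
        changed = 0;
        for (n = 0; n < (N - 1); n++)
        {
            if (*a > *b)
            {
                /* XOR swap; safe here because a and b never alias */
                *a ^= *b;
                *b ^= *a;
                *a ^= *b;
                changed = 1;
            }
            a++;
            b++;
        }
        a = buff;                   /* rewind both pointers for the next pass */
        b = buff;
#ifdef _PRINT_PROGRESS
        for (n = 0; n < N; n++)
            printf("%d ", buff[n]);
        printf("\n");
#endif
    }
    while (changed);                /* stop after a pass with no swaps */
    return 0;
}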

Resources