I have to organize a struct or an array of some people's first name, last name, and age then organize them in alphabetical order writing them from an input file to an output file using the string library.
This is partically for a Lab that I have tomorrow afternoon in which my TA said we might be asked to complete the task I stated above. I'm trying to get opinions or suggestions from individuals whom are much more experienced in C than myself so I'll be more prepared than I was last lab where I didn't so hot.
I'm stuck, any suggestions on where I should start?
Well, reading and writing the input and output should be a no brainer (you open the file, read/write and then close it when you're done with it).
The trick is to get those pesky strings sorted.
The way I look at strnig sorting is an array of arrays sorting. The first array-level is an iteration over all the structs while the second is an iteration over all the letters of a name (note that some names are longer then others)
1. First you sort the first column - all the first letters
2. Then all the sub-columns - within each first letter group you sort the second letter
3. Repeat 1&2 until you run out of letters to sort.
The sorting of an array of letters is the same as sorting an array of byte (AKA chars), and if you're working with an algorithm like bubble sort, working on a sub array is seamless.
Hope this idea helps
You can use strcmp from string.h library, which compares two strings and returns the result.
You should read every string from input file and keep in memory. You will probably use a list for that, for example linked list. While adding you can organize them in alphabetical order. Here is an example:
Take the an element in list.
Compare it with new string.
Where should it be placed? if it should be before the current element, place it before the current element. If it should be after current element, do the same for next element.
This is a simple algorithm for your problem.
The tricky part is to implement the comparation of two strings. the strcmp function only tells you whether the strings are identical. You can loop through the char array to do it.
Once this is done, the remaining sorting is simple.
Related
I have an array with quasar URLs stored in it
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
I want to sort out the URL array on basis of the number next to spec- and not using alphabetic order. I sorted the array with sort but it didn't help as it always took the 3rd row and 2nd last row to the top because they have a 1.
I'd like to have an output like this
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
If you will always have this pattern, you can try:
fileName = strsplit(myUrl, '/')(end)
number = strsplit(fileName(5:end), '.')(0)
Gonna walk you through this cause understanding is everything...
We start with
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
First we split the URL on the / characters. This will return a vector of strings split up from this character. Since the number to sort on resides after the final /, we can pass end to grab the last one. Now we have
spec-0269-51581-0467.fits
Next, let's remove that pesky spec- from the number. This step isn't actually necessary, since it's constant across all the URLs, but let's just do it for fun. We can use Matlab's substring to grab the characters after the -, using fileName(5:end). This will create a string starting with the 5th character (in this case, a 0) and continue to the end. Great, now we have
0269-51581-0467.fits
Looking good! Again, this part isn't completely necessary either, but just in case for whatever reason you may need to, I've included it. We can use the strsplit function again, but this time split on the ., and grab the first element by passing a 0. Now, we have
0269-51581-0467
Go ahead and sort that little guy and you're good to go!
I have a large document that I want to build an index of for word searching. (I hear this type of array is really called a concordances). Currently it takes about 10 minutes. Is there a fast way to do it? Currently I iterate through each paragraph and if I find a word I have not encountered before, I add it too my word array, along with the paragraph number in a subsidiary array, any time I encounter that same word again, I add the paragraph number to the index. :
associativeArray={chocolate:[10,30,35,200,50001],parsnips:[5,500,100403]}
This takes forever, well, 5 minutes or so. I tried converting this array to a string, but it is so large it won't work to include in a program file, even after removing stop words, and would take a while to convert back to an array anyway.
Is there a faster way to build a text index other than linear brute force? I'm not looking for a product that will do the index for me, just the fastest known algorithm. The index should be accurate, not fuzzy, and there will be no need for partial searches.
I think the best idea is to build a trie, adding a word at the time of your text, and having for each leaf a List of location you can find that word.
This would not only save you some space, since storing word with similar prefixes will require way less space, but the search will be faster too. Search time is O(M) where M is the maximum string length, and insert time is O(n) where n is the length of the key you are inserting.
Since the obvious alternative is an hash table, here you can find some more comparison between the two.
I would use a HashMap<String, List<Occurrency>> This way you can check if a word is already in yoz index in about O(1).
At the end, when you have all word collected and want to search them very often, you might try to find a hash-function that has no or nearly no collisions. This way you can guarantee O(1) time for the search (or nearly O(1) if you have still some collisions).
Well, apart from going along with MrSmith42's suggestion of using the built in HashMap, I also wonder how much time you are spending tracking the paragraph number?
Would it be faster to change things to track line numbers instead? (Especially if you are reading the input line-by-line).
There are a few things unclear in your question, like what do you mean in "I tried converting this array to a string, but it is so large it won't work to include in a program file, even after removing stop words, and would take a while to convert back to an array anyway."?! What array, is your input in form of array of paragraphs or do you mean the concordance entries per word, or what.
It is also unclear why your program is so slow, probably there is something inefficient there - i suspect is you check "if I find a word I have not encountered before" - i presume you look up the word in the dictionary and then iterate through the array of occurrences to see if paragraph# is there? That's slow linear search, you will be better served to use a set there (think hash/dictionary where you care only about the keys), kind of
concord = {
'chocolate': {10:1, 30:1, 35:1, 200:1, 50001:1},
'parsnips': {5:1, 500:1, 100403:1}
}
and your check then becomes if paraNum in concord[word]: ... instead of a loop or binary search.
PS. actually assuming you are keeping list of occurrences in array AND scanning the text from 1st to last paragraph, that means arrays will form sorted, so you only need to check the very last element there if word in concord and paraNum == concord[word][-1]:. (Examples are in pseudocode/python but you can translate to your language)
I'm sure there may be a matlab function to do this but I'm required to write my own. As the title says, I need to write a function which when given a single cell array of strings, returns a structure array, containing the same strings but in alphabetical order. Furthermore, the 'count' fields must contain the number of times that that particular string has occurred eg
z=myfunction({'bag','dig','bag'})
ans =
str: 'bag'
count = 2
Ideally, the method should have an expected number of comparisons for n strings of O(n log n)
Assuming you don't want to use standard functions like sort or unique this is not an easy question. Furthermore it is more about math then about programming.
If you just want to practice programming, try implementing something simple like bubble sort.
This will not achieve O(n log n) however, if you really want that look into merge sort for example.
Several options are explained roughly here, and with a bit of searching it should not be hard to find what you need.
I'm writing a blackjack program and would like to be able to write the top scores to a file. Obviously the first few times the program is run it will populate the high score list with every score, after which I'd like only scores that are greater than the number 10 score to be added, and the initial number 10 score (now number 11) to be deleted. I've been thinking of using a linked list like this:
struct highScore
{
char name;
int score;
highScore *next;
};
My knowledge of linked lists is pretty basic so I intend on doing my research before I can actually code it.
I'm wondering if I'm overcomplicating this and if there's a simpler way to get the job done, or am I on the right track here?
I do not think that linked list is best approach. Do not take me wrong, but linked list is used to add and to remove at random position, most commonly at beginning or end. Problem whit linked list is that it has same inefficiency to find element as ordinary array, becuase it has to check each element until it finds one.
I think that using array will make your program having same efficiency and less complicated code.
Comparing both data structures (linked list and array):
Finding element is in both case proportional to length in average.
Inserting element at end is constant in both structures.
Linked lists are efficient to add element at any position, but position has to be found, so this compensates whit problem that array has, where you have to move right elements by one position on right.
I think that something like
typedef struct
{
int score;
char name[51]; //name should hold 50 chars and null
} highScore;
highScore scores[10]; // voilla 10 scores
//or if you need little bit of complications
//and more speed when moving array during insertion
highScore * scorespointer[10]; //array of 10 pointers on score
Will make simple solution whit same efficiency as linked list.
If you implement your structure as linked list you will not be able to serialize this directly in file, because you can store only name and score, but not pointer on next highScore. Pointers can not be stored in file, because they are dynamical allocated and are valid only during program lifetime.
If you are disappointed whit this solution, you can check heap and tree for ultimate efficency for more than few (like in your case) scores.
I think your approach looks good. Using a linked list for this makes sense, or even a queue (which can be implemented using a linked list).
A linked list is certainly a good way to do this. It keeps the data related to a score together and with some basic bookkeeping you can keep it up to date. When you keep it sorted, you can easily insert new scores at the right place and remove too low scores from the linked list.
An alternative is to maintain two arrays with 10 elements each, one for the names and one for the scores but this has several drawbacks:
If you're going to add more information later, you'll need another array and more importantly,
Any operation that you do on one array also needs to be done on the other which can cause trouble later on (bugs).
So, in short, your solution looks good.
Just to understand your requirement:
the program runs and a score is obtained
you read the scores file into a list and check the new score against the old top scores
you modify the list and write it back to the file
program exits.
If this is correct, then a linked list seems like overkill. An array would do the job fine. Read each entry from the file into the array, inserting the new score as you go. You could forego the array altogether and just write out to a file each score you read in, again inserting the new score as you go. Then rename the files (scores.txt becomes scores.bak, scores.new becomes scores.txt) - this way you get a backup of the scores file too. Note that your file is best written as text (ie. not binary).
As has been stated, your structure will not compile as pure C code. If you want to write in C as opposed to C++, you should make your compiler reject C++. How you do that depends upon your platform, but it might just be a matter of naming the file scores.c instead of scores.cpp.
I have been given an assignment to write a program that reads in a number of assignment marks from a text file into an array, and then counts how many marks there are within particular brackets, i.e. 40-49, 50-59 etc. A value of -1 in the text file means that the assignment was not handed in, and a value of 0 means that the assignment was so bad that it was ungraded.
I could do this easily using a couple of for loops, and then using if statements to check the values whilst incrementing appropriate integers to count the number of occurences, but in order to get higher marks I need to implement the program in a "better" way. What would be a better, more efficient way to do this? I'm not looking for code right now, just simply "This is what you should do". I've tried to think of different ways to do it, but none of them seem to be better, and I feel as if I'm just trying to make it complicated for the sake of it.
I tried using the 2D array that the values are stored in as a parameter of a function, and then using the function to print out the number of occurences of the particular values, but I couldn't get this to compile as my syntax was wrong for using a 2D array as a parameter, and I'm not too sure about how to do this.
Any help would be appreciated, thanks.
Why do you need a couple for loops? one is enough.
Create an array of size 10 where array[0] is marks between 0-9, array[1] is marks between 10-19, etc. When you see a number, put it in the appropriate array bucket using integer division, e.g. array[(int)mark/10]++. when you finish the array will contain the count of the number of marks in each bucket.
As food for thought, if this is a school assignment, you might want to apply other things you have learned in the course.
Did you learn sorting yet? Maybe you could sort the list first so that you are not iterating over the array several times. You can just go over it once, grab all the -1's, and spit out how many you have, then grab all the ones in the next bracket and so on.
edit: that is of course, assuming that you are using a 1d array.