Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am looking to create a simple spreadsheet application where users can set the number of rows and columns through their input.
For example, the user inputs:
newSS 10 9
which means the user wants to create a new spreadsheet of 10 columns and 9 rows.
An empty spreadsheet should be created and displayed in the console window.
The catch is that there will always be one extra row and one extra column beyond what the user entered, because those are reserved for the headers.
So 10 columns and 9 rows will be stored as 11 columns and 10 rows in a 2D array, because the first index of every column and row is the header. It should look something like this:
The first row would show: A B C D E F G H I J
The first column would show: 1 2 3 4 5 6 7 8 9
as their labels.
The cells in between are addressed by their particular index, say A1, A2, B4, etc., and the user can then edit the values inside.
Can anyone teach me how to get started? I am a very new C programmer and am facing some issues with the code. I am able to create a 2D array, but I need to alter the first row and first column so they print as headers according to how many rows and columns the user wants. So with 5 columns the header will be A-E.
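#include <stdio.h>

WORKSHEET *ws_new(int cols, int rows) {
    int r;
    int c;
    /* One extra row and column are reserved for the headers, so indices
       1..rows and 1..cols are valid. With n[rows][cols] the loops below
       would write past the end of the array. */
    int n[rows + 1][cols + 1];
    for (r = 1; r <= rows; r++)
    {
        for (c = 1; c <= cols; c++)
        {
            n[r][c] = 0;
            printf("%d", n[r][c]);
        }
        printf("\n");
    }
    return NULL; /* placeholder: WORKSHEET is not defined in this snippet */
}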
I am looking to make a simple spreadsheet application
Spreadsheets cannot be a simple application. They have two related components which are complex:
a nice GUI presenting a table of cells (you could also consider a terminal interface, using something like ncurses, but that won't be simpler than providing a simple GUI). If your program doesn't provide some tabular interface, don't call it a spreadsheet.
a lazy "functional" interpreter, running in each spreadsheet cell (or interpreting a 2D array of formulas); you have some scripting language involved, and your spreadsheet has conceptually some matrix of formulae.
You could look (for inspiration) into the source code of existing free software spreadsheets, e.g. Gnumeric
For the GUI part, use some existing GUI toolkit. Since you want to code that in C, consider GTK.
For the interpreter part, first read the Dragon Book (after having read SICP), something like Programming Language Pragmatics, and probably Lisp In Small Pieces. If you want to parse your own formula language, read more about parsing techniques and recursive descent parsing, and look into the bison infix calc example.
You need at least several months, and perhaps several years, of work.
You might embed some existing interpreter (instead of designing and implementing your own one). Consider using Lua or Guile.
Your incomplete code is conceptually wrong: what a spreadsheet needs to have is some matrix of formulae, not just of numbers (like your n array). Each cell contains a formula (or a tagged union of formula and plain number), and you want to keep the AST of that formula. A cell apparently containing a number is a degenerate case of a formula reduced to a constant number.
This answer shows how to implement a numerical matrix as some abstract data type. It could inspire you to represent a spreadsheet as some matrix of formulae. Of course you need to also have a type for the AST of your formulae.
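As a much smaller starting point than a full formula interpreter, the header layout described in the question can be sketched as follows. This is only an illustration in Python, not the asker's C code: the names new_sheet, set_cell and render are hypothetical, and the sketch assumes at most 26 columns (A-Z). Each cell simply stores whatever value it is given, so a later version could store a formula AST there instead of a plain number.

```python
from string import ascii_uppercase


def new_sheet(cols, rows):
    """Return a (rows + 1) x (cols + 1) grid; row 0 and column 0 hold the headers."""
    if not 1 <= cols <= 26:
        raise ValueError("this sketch only labels columns A-Z")
    grid = [[""] * (cols + 1) for _ in range(rows + 1)]
    grid[0][1:] = list(ascii_uppercase[:cols])   # column headers A, B, C, ...
    for r in range(1, rows + 1):
        grid[r][0] = str(r)                      # row headers 1, 2, 3, ...
    return grid


def set_cell(grid, ref, value):
    """Store value at a reference such as 'B4' (column letter, then row number)."""
    col = ascii_uppercase.index(ref[0].upper()) + 1
    row = int(ref[1:])
    grid[row][col] = value


def render(grid):
    """Format the grid as right-aligned text, one line per row."""
    return "\n".join(" ".join(f"{str(cell):>4}" for cell in row) for row in grid)


sheet = new_sheet(10, 9)   # corresponds to the input "newSS 10 9"
set_cell(sheet, "B4", 7)
print(render(sheet))
```

The same indexing carries over to a C 2D array: allocate rows + 1 by cols + 1 elements and treat index 0 in each dimension as the header.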
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Process of extracting data,
I am analyzing 4000 to 8000 DICOM files using MATLAB code. The DICOM files are read using the dicomread() function. Each DICOM file contains 932*128 photon count data coming from 7 detectors. While reading the DICOM files, I convert the data to double and store it in 7 cell array variables (one per detector). So each cell contains 128*128 photon count data and each cell array contains 4000 to 8000 cells.
Question.
When I save each variable separately, the size of each variable is 3 GB, so for 7 variables it will be 21 GB. Saving them and reading them back takes an awful lot of time. (My computer has 4 GB of RAM.)
Is there a way to reduce the size of variable?
Thanks.
A different data type will help. You can save the data as float (single in MATLAB) instead of double, as DICOM files have it as float too (from http://northstar-www.dartmouth.edu/doc/idl/html_6.2/DICOM_Attributes.html; Graphic Data). This halves the size at no loss. You might want to expand to double when doing operations on the data to avoid inaccuracies creeping in.
Additional compression by saving it as uint16 (a further 2x space saving) or even uint8 (4x) might be possible, but I would be wary of this: it might work in all test cases but cause problems when you least expect it.
Cell array is not problematic in terms of speed or size - you will not gain (much) by switching to something else. Your data gobbles up memory, not the cell array itself. If you wish, you can save data in a 128x128x7x8000 float array - it should work just fine too.
But if the number of images (the 4000-8000) can increase at any point, resizing the array will be a pretty costly operation in terms of space and time. Cell arrays are much easier to extend: 8k values to move around instead of 8k*115k=900M values.
Another option is to separate data in chunks. You probably don't need to be working on all 4000 images at once. You can load 500 images, finish your work on them, move on to next 500 images etc. Batch size obviously depends on your hardware and what processing you do with data, but I guess about 500 could be a pretty reasonable starting point.
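To put rough numbers on the advice above, here is a small arithmetic sketch in Python (not MATLAB). The helper names array_bytes and batches are hypothetical, and the figures are raw in-memory sizes for a 128x128x7x8000 array, not the size of a saved file. That array holds 917,504,000 values, which is about 7.3 GB as double and about 3.7 GB as single.

```python
# Bytes per element for the numeric types mentioned above.
BYTES_PER_ELEMENT = {"double": 8, "single": 4, "uint16": 2, "uint8": 1}


def array_bytes(shape, dtype):
    """Raw size in bytes of a dense array with the given shape and element type."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


def batches(items, size):
    """Yield consecutive slices of at most `size` items, for chunked processing."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


shape = (128, 128, 7, 8000)
for dtype in ("double", "single", "uint16", "uint8"):
    print(dtype, array_bytes(shape, dtype) / 1e9, "GB")

# Hypothetical file list; in practice each batch would be read, processed, then released.
file_names = [f"image_{i:04d}.dcm" for i in range(4000)]
for batch in batches(file_names, 500):
    pass  # load and process only these 500 files here
```

With a batch size of 500, only one batch of images needs to be in memory at a time, which matters on a machine with 4 GB of RAM.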
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Examples include http://www.thewordfinder.com/, http://www.anagram-solver.org/, or the various applications for "cheating" at anagram based games such as Words With Friends.
Where would one begin if they were looking to create an application with similar functionality using Swift?
Since this seems to be getting down-voted, can someone tell me where the best place to ask this question is?
Converting a word list into a prefix tree (especially one that uses extra memory to avoid sparse arrays) should allow one to efficiently check all permutations of an input word, since entire branches of the search can be eliminated quickly.
Let's consider a simplified example. Assume our dictionary contains 3 words: aab, abb, and abc. If we convert it to a prefix tree, that would look like:
a
  ab
  b
    b
    c
Where indentation indicates children of the row above. All the words in our dictionary start with an a, followed by either ab or b, and in the latter case followed by either b or c. As you can see, we don't branch on every letter, but rather on alternatives. This helps keep the size of the tree down.
Now, if we iterate over all permutations of our input word cba:
abc
acb
bac
bca
cab
cba
All permutations that start with a letter other than a can be eliminated quickly since our prefix tree has no matching entry at the root level. acb can be eliminated on the second check, meaning only abc would find a match. If you integrate the check into your algorithm for generating permutations, you could match incrementally and avoid generating partial permutations that don't match the current branch of the tree you're exploring.
The comment about avoiding sparse arrays is about not having to use binary search to match children of the current branch. Instead, each node can hold an array of 26 elements, indexed by each letter's offset from 'a' (its ASCII value minus that of 'a'; pick upper or lowercase and be consistent), which allows very quick lookups at the cost of additional memory.
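The incremental matching described above can be sketched as follows. This is a simplified illustration in Python rather than Swift, with two stated differences from the answer: it uses one letter per node (no merged branches such as "ab") and a dictionary of children instead of a 26-element array. The names build_trie and anagrams are hypothetical.

```python
from collections import Counter

END = None  # key marking "a word ends at this node"; never collides with a letter


def build_trie(words):
    """Build a nested-dict prefix tree with one letter per level."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def anagrams(trie, letters):
    """Return every dictionary word that uses all of `letters` exactly once."""
    found = set()
    remaining = Counter(letters)

    def walk(node, prefix, left):
        if left == 0:
            if END in node:
                found.add(prefix)
            return
        for ch in list(remaining):
            # Only extend the prefix with letters that are still unused AND
            # present in the tree, so dead branches are never generated.
            if remaining[ch] > 0 and ch in node:
                remaining[ch] -= 1
                walk(node[ch], prefix + ch, left - 1)
                remaining[ch] += 1

    walk(trie, "", len(letters))
    return found


trie = build_trie(["aab", "abb", "abc"])
print(anagrams(trie, "cba"))
```

For the three-word dictionary above, the search never explores prefixes starting with b or c, because the root has no such child.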
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Suppose you have two strings. Each string has lines, separated by a newline character. Now you want to compare both strings and find the best method (fewest steps) to transform the second string into the first, using only additions or deletions of lines.
i.e.
string #2:
abc
def
efg
hello
123
and string #1:
abc
def
efg
adc
123
The best (fewest steps) solution to transform string #2 into string #1 would be:
remove line at line position 3 ('hello')
add 'adc' after line position 3
How would one write a generic algorithm to find the quickest, least steps, solutions for transforming one string to another, given that you can only add or remove lines?
This is a classic problem.
For a given set of allowed operations the edit distance between two strings is the minimal number of operations required to transform one into the other.
When the set of allowed operations consists of insertion and deletion only, it is known as the longest common subsequence edit distance.
You'll find everything you need to compute this distance in Longest common subsequence problem.
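A minimal sketch of that approach in Python, assuming lines are compared for exact equality: build the longest common subsequence table with dynamic programming, then walk it to list the deletions and insertions. The function name line_diff is hypothetical. The number of operations it returns equals len(source lines) + len(target lines) - 2 * LCS length.

```python
def line_diff(source, target):
    """Return a shortest list of ('delete', line) / ('insert', line) steps
    that turns `source` into `target`, working line by line."""
    a, b = source.split("\n"), target.split("\n")

    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, keeping common lines and recording the rest.
    ops, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    ops += [("delete", line) for line in a[i:]]
    ops += [("insert", line) for line in b[j:]]
    return ops


string2 = "abc\ndef\nefg\nhello\n123"
string1 = "abc\ndef\nefg\nadc\n123"
print(line_diff(string2, string1))
```

On the example from the question this yields two steps: delete 'hello', then insert 'adc'. The table costs time and memory proportional to the product of the two line counts, which is fine for small inputs.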
Note that to answer this question fully, one would have to thoroughly cover the huge subject of graph similarity search / graph edit distance, which I will not do here. I will, however, point you in directions where you can study the problem more thoroughly on your own.
... to find the quickest, least steps, solutions for transforming one string to another ...
This is a common problem known as the (minimum) edit distance problem (originally the 'String-to-String Correction Problem' of R. Wagner and M. Fischer). Finding the optimal (minimum, i.e. fewest steps) edit distance, which is what you ask for in your question, is non-trivial.
See e.g.:
https://en.wikipedia.org/wiki/Edit_distance
https://web.stanford.edu/class/cs124/lec/med.pdf
The minimum edit distance problem for string similarity is in itself a subclass of the more general minimum graph edit distance problem, or graph similarity search (since any string or even sequenced object, as you have noted yourself, can be represented as a graph), see e.g. A survey on graph edit distance.
For details regarding this problem here on SO, refer to e.g. Edit Distance Algorithm and Faster edit distance algorithm.
This should get you started.
I'd tag this as a math problem (algorithmic instructions) rather than a language-specific one, unless someone can point you to an existing C library for solving edit distance problems.
The fastest way would be to remove all sub-strings, then append (not insert) all new sub-strings; and to do "all sub-strings at once" if you can (possibly leading to a destPointer = sourcePointer approach).
The overhead of minimising the number of sub-strings removed and inserted will be higher than removing and inserting/appending without checking whether it's necessary. It's like spending $100 to pay a consultant to determine if you should spend $5.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I want to make an algorithm which will enable the conduct of A/B testing over a variable number of subjects with a variable number of properties per subject.
For example, I have 1000 people with the following properties: they come from two departments, some are managers, some are women, etc. The number of properties may increase or decrease according to the situation.
I want to make an algorithm which will split the population in two with the best representation possible of all the properties in both A and B. So I want two groups of 500 people with an equal number from each department, an equal number of managers and an equal number of women. More specifically, I would like to maintain the ratio of each property in both A and B. So if we have 10% managers, I want 10% of sample A and 10% of sample B to be managers.
Any pointers on where to begin? I am pretty sure that such an algorithm exists. I have a gut feeling that this may be unsolvable in some cases as there may be an odd number of managers AND women AND Dept. 1.
Make a list of permutations of all a/b variables.
Dept1,Manager,Male
Dept1,Manager,Female
Dept1,Junior,Male
...
Dept2,Junior,Female
Go through all the people and assign them to their respective permutation. Maybe randomise the order of the people first just to be sure there is no bias in the order they are added to each permutation.
Dept1,Manager,Male-> Person1, Person16, Person143...
Dept1,Manager,Female-> Person7, Person10, Person83...
Have a second process that goes through each permutation and assigns half the people to one test group and half to the other. You will need to account for odd numbers of people in a group, but that should be fairly easy to factor in; a larger sample size will reduce the impact of these odd numbers on the final results.
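The two steps above can be sketched like this in Python. The function name stratified_split and the dictionary layout of each person are assumptions made for illustration. People are bucketed by their combination of property values (what the answer calls a permutation), each bucket is shuffled and split in half, and the leftover person from an odd-sized bucket goes to whichever group is currently smaller.

```python
import random
from collections import defaultdict


def stratified_split(people, keys, seed=0):
    """Split `people` (a list of dicts) into two groups, balancing every
    combination of the properties named in `keys`."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    strata = defaultdict(list)
    for person in people:
        strata[tuple(person[k] for k in keys)].append(person)

    group_a, group_b = [], []
    for members in strata.values():
        rng.shuffle(members)           # remove any bias from the input order
        half = len(members) // 2
        group_a.extend(members[:half])
        group_b.extend(members[half:2 * half])
        if len(members) % 2:           # odd bucket: leftover goes to the smaller group
            smaller = group_a if len(group_a) <= len(group_b) else group_b
            smaller.append(members[-1])
    return group_a, group_b


# Hypothetical population of 1000 people with three properties.
people = [
    {"id": i, "dept": i % 2 + 1, "manager": i % 10 == 0, "female": i % 4 < 2}
    for i in range(1000)
]
a, b = stratified_split(people, ["dept", "manager", "female"])
print(len(a), len(b))
```

Within every bucket the two groups differ by at most one person, and the overall group sizes also differ by at most one.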
The algorithm for splitting the groups is simple: take each group of people who have all dimensions in common and assign half to the treatment and half to the control. You don't need to worry about odd numbers of people; whatever statistical test you are using will account for that. If some dimension is very skewed (e.g., there are only 2 females in your entire sample), it may be wise to throw the dimension out.
Simple A/B tests usually use a t-test or g-test, but in your case you'd be better off using an ANOVA to determine the significance of the treatment on each of the individual dimensions.
Edit:
So, this format would work:
featureID charge xcoordinate ycoordinate
1 2 5105.9217 336.125209180674
1 2 5108.7642 336.124751115092
2 0 2434.9217 145.893331325278
But what if I have two columns with multiple values that are linked? Say the column MachineQuality has a machine and a quality linked, and the column looks like this:
MachineQuality
[[{1:1224}, {2:3453}], [{1:2242}, {2:4142}]]
Now if I want to split that up like I did with the coordinates of the convexhull I would need 2 rows instead of 1. But wouldn't I need 2 rows for every row that is already in (so 4, because there are already 2 extra for the coordinates) like this:
featureID charge xcoordinate ycoordinate quality1 quality2
1 2 5105.9217 336.125209180674 1224 3453
1 2 5105.9217 336.125209180674 2242 4142
1 2 5108.7642 336.124751115092 1224 3453
1 2 5108.7642 336.124751115092 2242 4142
[...]
Would it have to be like this?
I'm very new to R, my knowledge doesn't go much further than knowing how to make a vector and some simple plots. I'm going to use R for an internship project the next couple of months and during this time I will (hopefully) learn some of the ins and outs of R. However, before I start I need to produce the data that I'm going to do the statistics on. I need to know beforehand how I should format my output CSV data so that I can easily read it in once I start my R analysis.
One thing that I've been asked to do is make a CSV file out of the data so that it can be read in by R. The example CSV files for importing with R that I've seen all look like this
featureID Charge value
1 2 10
2 0 9
However, my data mostly consists of columns whose entries contain multiple values. To clarify:
As an example, my data consists of "features" that, amongst other information, have a "convexhull". This convexhull consists of paired x and y coordinates. So what I could have for data is (only showing two coordinates, there can be many):
featureID Charge Convexhull
1 2 [[{'y': '336.125209180674'}, {'x': '5105.9217'}], [{'y': '336.124751115092'}, {'x': '5108.7642'}]]
Is it possible to get this into one CSV file and read it into R correctly (so that the paired x and y coordinates are preserved)? If so, what should the CSV file look like? For example, I've seen examples of CSV files with multiple values that look like this:
featureID  charge  xcoordinate  ycoordinate
1          2       5105.9217    336.125209180674
                   5108.7642    336.124751115092
2          0       2434.9217    145.893331325278
But I can't find if this is easily imported by R.
If this is not doable in one CSV file, are the CSV files easily imported independently, with a primary key idea, like database linking?
The only critical things are that you have a unique character separating your data columns and that each column is the same length. As long as the second row in your last example is filled in that will import fine.
You need to consider what you want to do with the data after it's in R to decide how you might want any other special formatting beforehand. But, as long as the column separator is a unique character and the columns are of equal length then it will import.
(You can violate the unique separator requirement if your entries are wrapped in quotes. And if you want to get really fancy you could "import" almost anything. But if someone's asking you to format the data then they probably want a rectangular data.frame compatible layout. They probably want unique values in each column (no columns of points). But that's between you and them.)
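As an illustration of producing such a rectangular layout, here is a sketch in Python using the standard csv module. It assumes the data is available in the nested form shown in the question; the function name to_long_rows and the dictionary layout of each feature are hypothetical. It writes one fully filled-in row per coordinate pair, so every column has the same length.

```python
import csv
import io


def to_long_rows(features):
    """Flatten each feature's convex hull into one row per (x, y) point."""
    rows = []
    for feature in features:
        for point in feature["convexhull"]:
            coords = {}
            for part in point:   # each point looks like [{'y': ...}, {'x': ...}]
                coords.update(part)
            rows.append([feature["featureID"], feature["charge"],
                         coords["x"], coords["y"]])
    return rows


features = [
    {"featureID": 1, "charge": 2,
     "convexhull": [[{"y": "336.125209180674"}, {"x": "5105.9217"}],
                    [{"y": "336.124751115092"}, {"x": "5108.7642"}]]},
]

buffer = io.StringIO()           # stands in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(["featureID", "charge", "xcoordinate", "ycoordinate"])
writer.writerows(to_long_rows(features))
csv_text = buffer.getvalue()
print(csv_text)
```

A file in this shape, with a comma as the unique separator and a header row, is the kind of input R's read.csv is designed for, and the x and y values stay paired because they share a row.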
long vs. wide form. Your last example is known as long form (except all cells should be filled in) and your first example is roughly wide form as discussed on the ?reshape page and illustrated in the examples at the end of that page. You likely want to stick with long form. For an alternative see the reshape2 package.
save & load. Note that if you are only writing it out to read it back in to R later (as opposed to communicating it to some other software) you could use save and load which don't require any change to the object at all.
json. Another possibility, given the form of your example, is that you might want to look at the rjson package.