Creating binary indexed tree - arrays

I read various tutorials on BITs (the TopCoder one, etc.), and all the operations are well explained there, but I don't understand how a BIT is created.
Given a 1-D array, how do we build the corresponding BIT? For example, if the array is 10 8 5 9 1, what will its BIT be?
I am a beginner, so apologies if my question sounds stupid, but I am not understanding this. Please help.

You simply start with an empty structure (all 0s) and insert each element. The complexity is O(N log N), but the rest of your algorithm is likely O(N log N) as well, so it will not matter.
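A minimal Python sketch of that approach, assuming the usual 1-indexed layout (`tree[0]` unused); the names `build`, `update` and `prefix_sum` are illustrative. It builds the BIT for the example array 10 8 5 9 1 by inserting each element into an all-zero tree.

```python
def update(tree, i, delta):
    """Add delta at 1-based position i, touching O(log n) nodes."""
    n = len(tree) - 1
    while i <= n:
        tree[i] += delta
        i += i & -i          # move to the next node whose range covers i


def prefix_sum(tree, i):
    """Sum of elements 1..i (1-based)."""
    total = 0
    while i > 0:
        total += tree[i]
        i -= i & -i          # drop the lowest set bit
    return total


def build(values):
    """Start from all zeros and insert each element: O(n log n)."""
    tree = [0] * (len(values) + 1)   # index 0 is unused
    for i, v in enumerate(values, start=1):
        update(tree, i, v)
    return tree


tree = build([10, 8, 5, 9, 1])
print(tree[1:])               # [10, 18, 5, 32, 1]
print(prefix_sum(tree, 3))    # 23  (10 + 8 + 5)
```

Each `tree[i]` holds the sum of the block of original elements that ends at i and whose length is the lowest set bit of i: `tree[2]` covers elements 1..2 (10 + 8 = 18), `tree[4]` covers elements 1..4 (10 + 8 + 5 + 9 = 32), while `tree[3]` and `tree[5]` cover only themselves.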

Mathematical Idea with the purpose to store data more efficiently

Dear reader,
I have been thinking about how to store data efficiently since the beginning of my studies, and while taking a shower I came up with the following idea:
For example, you take a picture and convert it into 0s and 1s. Then you take this extremely long number and divide it by, e.g., 10, then again by 10, then again by 10, and so on, until at the end you have a small number. Now the small number and the calculation path are stored, and anyone who wants to read the data only has to perform the inverse operations to get the original back.
My gut feeling tells me the idea is too good to be true, but I would still like to know why it would not work.
Kind regards
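Fun theorem: no one-to-one map on the natural numbers can send every number to a strictly smaller one. Proof by contradiction: the smallest number has nothing smaller to map to. The same counting argument applies to files: there are 2^n bit strings of length n but only 2^n - 1 strings shorter than n, so no lossless scheme can shorten every input.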
There are lots of ways to map numbers one-to-one so that many of them map to smaller numbers; these are lossless compression algorithms. Most have the property that applying the algorithm repeatedly makes the data larger, or leaves it unchanged.
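A quick illustration of that last property using Python's standard `zlib` module (DEFLATE, chosen here only as a typical lossless compressor, not as anything the commenter specified):

```python
import zlib

original = b"a" * 10_000                  # highly repetitive input
once = zlib.compress(original)            # first pass: lots of redundancy to remove
twice = zlib.compress(once)               # compress the compressed output again

print(len(original), len(once), len(twice))
# The first pass shrinks the data enormously; the second pass has no
# redundancy left to exploit, so its output is no smaller than its input.
```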
In your proposal, to the extent I understand it, you would have to store all the remainders of the division, which would be as large as the original data.
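A small Python sketch of the proposal as described, with hypothetical helper names `divide_down` and `rebuild`: integer division by 10 discards information, so the "calculation path" must record every remainder, and those remainders turn out to be the decimal digits themselves.

```python
def divide_down(n, divisor=10):
    """Repeatedly divide n by divisor, recording the remainder of every step.

    The 'calculation path' has to include these remainders, otherwise the
    inverse operation cannot recover n.
    """
    remainders = []
    while n >= divisor:
        n, r = divmod(n, divisor)
        remainders.append(r)
    return n, remainders


def rebuild(small, remainders, divisor=10):
    """Inverse operation: multiply back up and add each remainder."""
    n = small
    for r in reversed(remainders):
        n = n * divisor + r
    return n


data = int.from_bytes(b"picture bytes", "big")   # stand-in for an image
small, path = divide_down(data)
assert rebuild(small, path) == data
print(small, len(path), len(str(data)))
```

`small` ends up as a single digit, but `path` holds one remainder for every other decimal digit of the input, so storing it costs as much as storing the number itself.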

How to find all sums in an int array

I have an array of integers, for example
{2,3,7}
and I need to find the ways a given number can be written as a sum of these numbers (each one may be used repeatedly).
For example, let's say I need to find 17.
I could do 7+2+2+3+3, 7+2+2+2+2+2, 7+3+7, 3+3+3+2+2+2+2, etc.
But looping through everything is very inefficient; it would be O(N^N) in the best case...
How would I solve a problem like this in an optimized way?
I believe you're asking Stack Overflow to help you solve the knapsack problem. If you manage to find a polynomial-time solution, you can go claim a million-dollar reward for solving P = NP. Good luck!
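For a small target like 17, though, the classic dynamic-programming approach to this unbounded "coin change" variant needs only O(N·target) time. That is pseudo-polynomial (polynomial in the target's value, not in its bit length), so it does not contradict the NP-hardness above. A minimal Python sketch; `count_sums` and `list_sums` are illustrative names. Listing every combination is inherently as large as its output, so only the count is cheap.

```python
def count_sums(parts, target):
    """Number of multisets of `parts` (each usable any number of times) summing to target."""
    ways = [0] * (target + 1)
    ways[0] = 1                      # the empty sum
    for p in parts:                  # outer loop over parts: order of terms doesn't matter
        for s in range(p, target + 1):
            ways[s] += ways[s - p]
    return ways[target]


def list_sums(parts, target):
    """Enumerate each combination once, as a non-increasing list of terms."""
    parts = sorted(set(parts), reverse=True)
    result = []

    def go(start, remaining, chosen):
        if remaining == 0:
            result.append(list(chosen))
            return
        for i in range(start, len(parts)):
            p = parts[i]
            if p <= remaining:
                chosen.append(p)
                go(i, remaining - p, chosen)   # i, not i + 1: parts may repeat
                chosen.pop()

    go(0, target, [])
    return result


print(count_sums([2, 3, 7], 17))     # 6
for combo in list_sums([2, 3, 7], 17):
    print("+".join(map(str, combo)))
```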

Updating values in an array with logical indexing with a non-constant value

A common problem I encounter when I want to write concise/readable code:
I want to update all the values of a vector matching a logical expression with a value that depends on the previous value.
For example, double all even entries:
weights = [10 7 4 8 3];
weights(mod(weights,2)==0) = weights(mod(weights,2)==0) * 2;
% weights = [20 7 8 16 3]
Is it possible to write the second line more concisely (i.e. avoiding the double use of the logical expression, something like i+=3 for i=i+3 in other languages)? I often use this kind of vector operation in different contexts/variables, and with long conditionals I feel that my code is less concise and readable than it could be.
Thanks!
How about
ind = mod(weights,2)==0;
weights(ind) = weights(ind)*2;
This way you avoid calculating the indices twice and it's easy to read.
Regarding your other comment to Wauzl: those powerful matrix operation capabilities come from the Fortran side. This is purely MATLAB's design, which is quickly becoming obsolete. Let's use this horribleness further:
for i=1:length(weights), if mod(weights(i),2)==0, weights(i)=weights(i)*2; end, end
It is even slightly faster than your two-liner, because there the conditional indexing is done twice, once on each side. In general, consider switching to Python 3.
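For comparison with the Python 3 suggestion, a sketch in NumPy (assuming NumPy is installed), where a masked augmented assignment writes the condition only once:

```python
import numpy as np

weights = np.array([10, 7, 4, 8, 3])
# Augmented assignment through a boolean mask: the condition appears once,
# and only the selected (even) entries are doubled in place.
weights[weights % 2 == 0] *= 2
print(weights.tolist())   # [20, 7, 8, 16, 3]
```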
Well, after more searching around, I found this link that deals with this issue (I did search before posting, I swear!), and there is interesting further discussion of this topic in the links in that thread. So apparently there are ambiguity issues when introducing such an operator.
Looks like that is the price we have to pay in terms of syntactic limitations for having such powerful matrix operation capabilities.
Thanks a lot anyway, Wauzl!

Most efficient way to store a big DNA sequence?

I want to pack a giant DNA sequence (about 3,000,000,000 base pairs) with an iOS app. Each base pair can have the value A, C, T or G. Storing each base pair in one byte would give a 3 GB file, which is way too much. :)
Now I thought of storing each base pair in two bits (four base pairs per octet), which gives a 750 MB file. 750 MB is still way too much, even when compressed.
Are there any better file formats for efficiently storing giant DNA sequences on disk? Memory is not a problem, as I read in chunks.
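A minimal Python sketch of that 2-bit packing; the mapping A=0, C=1, G=2, T=3 and the function names are assumptions for illustration. The sequence length must be stored separately so the padding bits in the last byte can be ignored.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit code per base
BASES = "ACGT"                             # inverse of CODE


def pack(seq: str) -> bytes:
    """Pack four bases per byte, first base in the two highest bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover `length` bases; trailing padding bits are ignored."""
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "ACGTTGCA" + "GATTACA"
packed = pack(seq)
print(len(seq), len(packed))   # 15 4
assert unpack(packed, len(seq)) == seq
```

At four bases per byte, 3,000,000,000 bases come to the 750 MB mentioned above; any further saving has to come from actual compression.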
I think you'll have to use two bits per base pair, plus implement compression as described in this paper.
"DNA sequences... are not random; they contain
repeating sections, palindromes, and other features that
could be represented by fewer bits than is required to spell
out the complete sequence in binary...
With the proposed algorithm, sequence will be compressed by 75%
irrespective of the number of repeated or non-repeated
patterns within the sequence."
DNA Compression Using Hash Based Data Structure, International Journal of Information Technology and Knowledge Management
July-December 2010, Volume 2, No. 2, pp. 383-386.
Edit: There is a program called GenCompress which claims to compress DNA sequences efficiently:
http://www1.spms.ntu.edu.sg/~chenxin/GenCompress/
Edit: See also this question on BioStar.
If you don't mind having a complex solution, take a look at this paper or this paper or even this one which is more detailed.
But I think you need to specify better what you're dealing with. Some specific applications can lead to different storage choices. For example, the last paper I cited deals with lossy compression of DNA...
Base pairs always pair up, so you should only have to store one strand. I doubt this works if there are certain mutations in the DNA (like a thymine dimer) that cause the opposite strand not to be the exact complement of the stored strand. Beyond that, I don't think you have many options other than to compress it somehow. But then again, I'm not a bioinformatics guy, so there might be some pretty sophisticated ways to store a lot of DNA in a small space. Another idea, since it's an iOS app, is to put just a reader on the device and read the sequence from a web service.
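A small Python sketch of recovering the opposite strand from the stored one. It assumes plain Watson-Crick pairing (A-T, C-G), i.e. the case where the caveat above does not apply; the opposite strand is read in the reverse direction, hence the reversal.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")   # A<->T, C<->G


def reverse_complement(strand: str) -> str:
    """Opposite strand, read 5'->3': complement each base, then reverse."""
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))   # CGGCAT
```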
Use a diff from a reference genome. From the size you give (3 Gbp), it looks like you want to include a full human sequence. Since sequences don't differ much from person to person, you should be able to compress massively by storing only a diff.
It could help a lot, unless your goal is to store the reference sequence itself. Then you're stuck.
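A simplified Python sketch of the diff idea, with hypothetical `diff` and `patch` helpers. It assumes both sequences have the same length and differ only by single-base substitutions; real genomes also have insertions and deletions, which this sketch does not handle.

```python
def diff(reference: str, sample: str) -> list[tuple[int, str]]:
    """Record (position, base) wherever sample differs from reference.

    Assumes equal lengths, i.e. substitutions only.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def patch(reference: str, changes: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus the recorded changes."""
    seq = list(reference)
    for i, base in changes:
        seq[i] = base
    return "".join(seq)


ref = "ACGTACGTACGT"
sample = "ACGTACCTACGA"
changes = diff(ref, sample)
print(changes)   # [(6, 'C'), (11, 'A')]
assert patch(ref, changes) == sample
```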
Consider this: how many different combinations can you get out of the 4 letters? (For 4-letter blocks there are 4^4 = 256.)
actg = 1
atcg = 2
atgc = 3, and so on, so that
you can turn the sequence into an array like [1,2,3]. Then you can go one step further:
check whether 1 is followed by 2, and convert 12 to a, 13 to b, and so on...
If I understand DNA a bit, this means you cannot get certain values,
as a must be matched with t, and c with g, or something like that, which reduces your options.
So basically you can look for a sequence and give it a code that you can also convert back...
You want to look into a 3D space-filling curve (SFC). A 3D SFC reduces the 3D complexity to a 1D complexity. It's a little bit like an octree or an R-tree. If you can store your full DNA in an SFC, you can look for similar tiles in the tree, although an SFC is most likely to be used with lossy compression. Maybe you can use a block-sorting algorithm like the BWT (Burrows-Wheeler transform) if you know the size of the tiles, and then try an entropy coder like Huffman coding or a Golomb code?
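A naive Python sketch of the Burrows-Wheeler transform mentioned above. It tends to group identical characters into runs that a subsequent entropy coder (Huffman, Golomb, ...) can exploit. The `$` sentinel is an assumption and must not occur in the input; this quadratic version is only for illustration and far too slow for a genome.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    row = next(r for r in table if r.endswith(sentinel))   # the original rotation
    return row[:-1]


print(bwt("banana"))   # annb$aa
assert inverse_bwt(bwt("GATTACAGATTACA")) == "GATTACAGATTACA"
```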
You can use tools like MFCompress, Deliminate, or Comrad. These tools achieve an entropy of less than 2 bits per symbol, i.e. each base takes fewer than 2 bits to store.
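To see what "entropy less than 2" means, here is a minimal Python sketch of order-0 Shannon entropy in bits per symbol (the function name is illustrative). A sequence with all four bases equally frequent scores exactly 2 bits; real DNA compressors typically go lower by exploiting context (neighbouring bases, repeats), which this simple frequency count ignores.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(seq: str) -> float:
    """Order-0 Shannon entropy: sum of p * log2(1/p) over symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(entropy_bits_per_symbol("ACGT" * 100))   # 2.0
print(entropy_bits_per_symbol("AAAC" * 100))   # skewed frequencies: below 2
```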

Large test data for knapsack problem

I am a research student looking for large data sets for the knapsack problem, to test my algorithm, but I couldn't find any. I need data with 1000 items; the capacity doesn't matter. The more items, the better for my algorithm. Is there any huge data set available on the internet? If anybody knows, please tell me; I need it urgently.
You can quite easily generate your own data. Just use a random number generator and generate lots and lots of values. To test that your algorithm gives the correct results, compare it to the results from another known working algorithm.
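A minimal Python sketch of that advice; the function names, value ranges and seed are illustrative assumptions. It generates items in the `name,weight,value` CSV layout used in the answer below and includes a standard dynamic-programming solver as a known-correct reference for integer weights. Randomly generated names may occasionally repeat, which does not matter for benchmarking.

```python
import csv
import random


def generate_items(n, seed=0, max_weight=20, max_value=300):
    """Random 0-1 knapsack items as (name, weight, value) tuples."""
    rng = random.Random(seed)   # fixed seed: reproducible test data
    return [
        (f"X{rng.randrange(10**6):06d}", rng.randint(1, max_weight), rng.randint(1, max_value))
        for _ in range(n)
    ]


def write_csv(items, path):
    """Write items as `name,weight,value` lines."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(items)


def best_value(items, capacity):
    """Exact 0-1 knapsack optimum by dynamic programming, O(n * capacity).

    Requires integer weights; use it to check another algorithm's answers.
    """
    best = [0] * (capacity + 1)
    for _, w, v in items:
        for c in range(capacity, w - 1, -1):   # backwards: each item used at most once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


items = generate_items(1000)
capacity = sum(w for _, w, _ in items) // 2
print(capacity, best_value(items, capacity))
```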
I have the same requirement.
Only an exact method is guaranteed to give the optimal answer, and the obvious one, brute force, won't work for large problems.
However we could pitch our algorithms against each other...
To be clear, my algorithm works for 0-1 problems (i.e. 0 or 1 of each item), with integer or decimal data.
I also have a version that works for 2 dimensions (e.g. Volume and Weight vs. Value).
My file reader uses a simple CSV format (Item-name, weight, value):
X229257,9,286
X509192,11,272
X847469,5,184
X457095,4,88
etc....
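A minimal Python reader for that `Item-name, weight, value` format; the function name is illustrative, and parsing weight and value as `float` is an assumption that covers both the integer and the decimal data mentioned above.

```python
import csv
import io


def read_items(f):
    """Parse `name,weight,value` lines into (name, weight, value) tuples.

    Blank lines are skipped; numbers are read as floats so both integer
    and decimal data work.
    """
    items = []
    for row in csv.reader(f):
        if not row:
            continue
        name, weight, value = (field.strip() for field in row)
        items.append((name, float(weight), float(value)))
    return items


sample = io.StringIO("X229257,9,286\nX509192,11,272\nX847469,5,184\nX457095,4,88\n")
print(read_items(sample)[0])   # ('X229257', 9.0, 286.0)
```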
If I recall correctly, I've tested mine on 1000 items too.
Regards.
PS:
I ran my algorithm against the problem on Rosetta Code that Mark highlighted (thank you). I got the same result, but my solution is much more scalable than the dynamic programming / LP solutions and will work on much bigger problems.
