Processing time between strings and integers in C

As we know, integer storage is 4 bytes and character storage is 1 byte. Here is my problem: I have a huge amount of data and I need to write it to a file.
For example, my data is like:
Integers - 123456789 (of length 9) (9! = 362,880 records in total)
Characters - abcdefghi (of length 9) (9! = 362,880 records in total)
Which one will take less processing time? Any thoughts?

It's insignificant compared to the file's access time.

If your integers are stored in individual 32-bit ints and you save them in binary, you have 4 bytes per integer and no conversion overhead.
If your character strings are stored in arrays of 9 chars and you save them as-is, you have 9 bytes per string and no conversion overhead.
In this case strings will take more I/O time than integers.
If you convert your integers into readable 9-char strings and save them the same way as the other strings, the I/O time will be the same, but there will be extra processing time for the integers required for their conversion into text.

Using integers will save you some space, and maybe some time as well, but the difference will be so small that you won't notice it. An integer takes 4 bytes and 9 characters take 9 bytes, so for each value you use 5 extra bytes. Your data set has 9! = 362,880 records, so you would be wasting 362,880 * 5 bytes, which is about 1.73 MB. That is a very small amount, and writing it to disk will not be noticeable. So choose integers or characters based on what suits you better, not based on which will be faster, as you won't notice a difference for a data set of this size.

It seems that the processing time would be the same if you write the same number of bytes.
A file is stored on the hard drive, not in RAM.


what size do we consider the pointer of 2d array in space complexity?

This is the code given in my algorithms book. We need to calculate its space complexity.
This is the answer given:
The space complexity is S = C + Sp, and in this case Sp is zero, as the code is independent of n.
But I wanted to calculate the C part of the code, and in this case it would be 5 * 2 bytes = 10 bytes, considering a, x, n, 0 and -1.
So my question is: what would C be if a were a 2D matrix? Do we count it as 2 bytes only, as in the case of a 1D array, or do we take it as 4 bytes?
Pointers and integers have not been 16 bits since the mid-eighties, so I wonder what you mean by your 2 bytes.
Also note that your accounting of space is a little odd: you don't count the variable i, but you do count literal constants, which usually appear as immediate arguments in the assembly code.
Anyway, on a 32-bit machine, all addresses (and pointers) are represented in 4 bytes, and the extra accounting of array dimensions is performed by the compiler and hard-coded into the assembly.
On a 64-bit machine, count 8 bytes per pointer and 4 bytes per int.

Convert int/string to byte array with length n

How can I convert a value like 5 or "Testing" to an array of type byte with a fixed length of n bytes?
Edit:
I want to represent the number 5 in bits. I know that it's 101, but I want it represented as an array with a length of, for example, 6 bytes, so 000000 ....
I'm not sure what you are trying to accomplish here, but assuming you simply want to represent characters in the binary form of their ASCII codes, you can pad the binary representation with zeros. For example, if the set number of digits you want is 10, then encoding the letter a (ASCII code 97) in binary gives 1100001, which padded to 10 digits becomes 0001100001. That is the encoding of a single character; a string, which is made up of multiple characters, becomes a sequence of these 10-digit binary codes, each representing the corresponding character in the ASCII table.
The encoding of data matters because the system needs to know how to interpret the binary data. There is also endianness, depending on the system architecture, but that's less of an issue these days, with many older and modern processors (such as ARM processors) being bi-endian.
So forget about representing the number 5 and the string "WTF" using
the same number of bytes - it makes the brain hurt. Stop it.
A bit more reading on character encoding will be great.
Start here - https://en.wikipedia.org/wiki/ASCII
Then this - https://en.wikipedia.org/wiki/UTF-8
Then brain hurt - https://en.wikipedia.org/wiki/Endianness

Perl: Efficiently store/get 2D array of constrained integers in file

This is an attempt to improve my earlier question,
"Perl: seek to and read bits, not bytes", by explaining more
thoroughly what I was trying to do.
I have x, a 9136 x 42 array of integers that I want to store
super-efficiently in a file. The integers have the following
constraints:
All of the 9136 integers in x[0..9135][0] are between
-137438953472 and 137438953471, and can therefore be stored
using 38 bits.
All of the 9136 integers in x[0..9135][1] are between -16777216 and
16777215, and can therefore be stored using 25 bits.
And so on... (the integer bit constraints are known in
advance; Perl doesn't have to compute them)
Question: Using Perl, how do I efficiently store this array in a file?
Notes:
If an integer can be stored in 25 bits, it can also be stored
in 4 bytes (32 bits), if you're willing to waste 7 bits. In my
situation, however, every bit counts.
I want to use file seek() to find data quickly, not read
sequentially through the file.
The array will normally be accessed as x[i]. In other words,
I'll want the 42 integers corresponding to a given x[i], so
these 42 integers should be stored close to each other
(ideally, they should be stored adjacent to each other in the
file)
My initial approach was to just lay down a bitstream, and
then find a way to read it back and change it back into an
integer. My original question focused on that, but perhaps
there's a better solution to the bigger problem that I'm not
seeing.
Far too much detail on what I'm doing:
https://github.com/barrycarter/bcapps/blob/master/ASTRO/bc-read-cheb.m
https://github.com/barrycarter/bcapps/blob/master/ASTRO/bc-read-cheb.pl
https://github.com/barrycarter/bcapps/blob/master/ASTRO/playground.pl
I'm not sure I should be encouraging you, but it looks like Data::BitStream will do what you ask.
The program below writes a 38-bit value and a 25-bit value to a file, and then opens and retrieves the values intact.
#!/usr/bin/perl
use strict;
use warnings;

use Data::BitStream;

{
    my $bs_out = Data::BitStream->new(
        mode => 'w',
        file => 'bits.dat',
    );
    printf "Maximum %d bits per word\n", $bs_out->maxbits;
    $bs_out->write(38, 137438953471);
    $bs_out->write(25, 16777215);
    printf "Total %d bits written\n\n", $bs_out->len;
}

{
    my $bs_in = Data::BitStream->new(
        mode => 'ro',
        file => 'bits.dat',
    );
    printf "Total %d bits read\n\n", $bs_in->len;
    print "Data:\n";
    print $bs_in->read(38), "\n";
    print $bs_in->read(25), "\n";
}
output
Maximum 64 bits per word
Total 63 bits written
File size 11 bytes
Total 63 bits read
Data:
137438953471
16777215
38 plus 25 is 63 bits of data written, which the module confirms. But there is clearly some additional housekeeping data involved, as the total size of the resulting file is eleven bytes, not just the eight that would be the minimum necessary. Note that, when reopened, the stream remembers that it is 63 bits long. Still, it is shorter than the sixteen bytes a file would need to contain two plain 64-bit integers.
What you do with this information is up to you, but remember that data packed in this way will be extremely difficult to debug with a hex editor. You may be shooting yourself in the foot if you adopt something like this.

Write 9 bits binary data in C

I am trying to write binary data to a file that does not fit in 8 bits. From what I understand, you can write binary data of any length if you can group it into a predefined length of 8, 16, 32 or 64 bits.
Is there a way to write just 9 bits to a file? Or two 9-bit values?
I have one value in the range ±32768 and 3 values in the range ±256. What would be the way to save the most space?
Thank you
No, I don't think there's any way using C's file I/O APIs to express storing less than 1 char of data, which will typically be 8 bits.
If you're on a 9-bit system, where CHAR_BIT really is 9, then it will be trivial.
If what you're really asking is "how can I store a number that has a limited range using the precise number of bits needed", inside a possibly larger file, then that's of course very possible.
This is often called bitstreaming and is a good way to optimize the space used for some information. Encoding/decoding bitstream formats requires you to keep track of how many bits you have "consumed" of the current input/output byte in the actual file. It's a bit complicated but not very hard.
Basically, you'll need:
A byte stream s, i.e. something you can put bytes into, such as a FILE *.
A bit index i, i.e. an unsigned value that keeps track of how many bits you've emitted.
A current byte x, into which bits are put, incrementing i each time. When i reaches CHAR_BIT, write x to s and reset i to zero.
You cannot store values in the range –256 to +256 in nine bits either. That is 513 values, and nine bits can only distinguish 512 values.
If your actual ranges are –32768 to +32767 and –256 to +255, then you can use bit-fields to pack them into a single structure:
struct MyStruct
{
    int a : 16;
    int b : 9;
    int c : 9;
    int d : 9;
};
Objects such as this are still rounded up to a whole number of bytes: the fields use 43 bits in total, and the next whole number of eight-bit bytes gives 48 bits, or six bytes. (Compilers are free to pad bit-fields further in their layout rules, so the actual sizeof may be larger on some systems.)
You can either accept this padding of 43 bits to 48 or use more complicated code to concatenate bits further before writing to a file. This requires additional code to assemble bits into sequences of bytes. It is rarely worth the effort, since storage space is currently cheap.
You can apply the principle of base64 (just enlarging your base, not making it smaller).
Every value would be written across two bytes and combined with the last/next byte by shift and OR operations.
I hope this very abstract description helps you.

Efficient way to store a fixed range float

I have a (big) array of floats, and each float takes 4 bytes.
Given that my floats range between 0 and 255, is there a way to store each float in fewer than 4 bytes?
I can do any amount of computation on the whole array.
I'm using C.
How much precision do you need?
You can store each float in 2 bytes by multiplying it by 2^8 and storing the result as an unsigned short (ranging from 0 to 65,535), then dividing by 2^8 when you need the actual value. This is essentially the same as using a fixed-point format instead of floating point.
Your precision is limited to 1.0 / (2^8) = 0.00390625 when you do this, however.
The absolute range of your data doesn't really matter much; it's the amount of precision you need. If you can get away with, say, 6 digits of precision, then you only need as much storage as would be required to store the integers from 1 to 1,000,000, and that's 20 bits. Supposing this, what you can do is:
1) Shift your data so that the smallest element has value 0. I.e. subtract a single value from every element. Record this shift.
2) Scale (multiply) your data by a number just large enough so that after truncation to an integer, you will not lose any precision you need.
3) Now this might be tricky unless you can pack your data into convenient 8- or 16-bit units--pack the data into successive unsigned integers. Each one of your data values needs 20 bits in this example, so value 1 takes up the first 20 bits of integer 1, value 2 takes up the remaining 12 bits of integer 1 and the first 8 bits of integer 2, and so on. In this hypothetical case you end up saving ~40%.
4) Now, decoding: unpack the values (you have saved the number of bits in each one), un-scale, and un-shift.
So, this will do it, and might be faster and more compact than standard compression algorithms, as they aren't allowed to make assumptions about how much precision you need, but you are.
For example, you could store integers (floats with .0) in one byte, but the other floats would need more bytes.
You could also use fixed-point if you don't worry about precision...
