Reading a specific sector from a hard drive using C on Linux (GNU/Linux)


I know that hard drives appear as system files (/dev/sdXX) and are therefore treated like files. I have some questions about this:
I tried the following lines of code, but without success:
-------first attempt----------
int numSecteur=2;
char secteur [512];
FILE* disqueF=fopen("/dev/sda","r"); //tried "rb" and sda1 ...every thing
fseek(disqueF, numSecteur*512,SEEK_SET);
fread(secteur, 512, 1, disqueF);
fclose(disqueF);
-------Second attempt----------
int i=open("/dev/sda1",O_RDONLY);
lseek(i, 0, SEEK_SET);
read(i,secteur,512);
close(i);
------printing the results----------
printf("hex : %04x\n",secteur);
printf("string : %s\n",secteur);
Why is the size of the file /dev/sda1 just 8 KBytes?
How is the data stored (binary or hex...) "for the printing"?
Please, I need some clues; if anyone needs more details, just ask.
Thanks a lot.
PS: I am running Kali 2 64-bit (Debian-based) on VMware, and I am root.

I tried the following lines of code, but without success
That's not a question.
Why is the size of the file /dev/sda1 just 8 KBytes?
That isn't the file size, it's the device number. (A device number has two parts, major and minor, so the device number of sda1 is 8, 1.)
How is the data stored (binary or hex...) "for the printing"?
Data isn't "stored for the printing". Data is stored (in electrical voltages representing binary, but you don't need to know that), and you can print it any way you want.
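The two-part device number can also be read programmatically. The following is a minimal sketch, assuming Linux with glibc (where major() and minor() come from <sys/sysmacros.h>); device_numbers is a hypothetical helper name, not something from the question.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Fetch the major and minor device numbers of a device node.
   Returns 0 on success, -1 if `path` cannot be stat'ed or is not a
   block or character device. */
int device_numbers(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);   /* e.g. 8 for SCSI/SATA disks */
    *min = minor(st.st_rdev);   /* e.g. 1 for the first partition */
    return 0;
}
```

Calling device_numbers("/dev/sda1", &maj, &min) would be expected to yield 8 and 1 on a typical system. The actual size of the device in bytes is not in stat() at all; it is available through the BLKGETSIZE64 ioctl.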

You cannot printf() an array of char like that; you need to print it one byte at a time. For example, as a hex dump:
for (size_t i = 0; i < sizeof(secteur); i++) {
    printf("%02x ", (unsigned char)secteur[i]); // the cast avoids sign extension of bytes >= 0x80
    if ((i + 1) % 16 == 0)
        printf("\n");
}
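Putting the read and the dump together, a minimal sketch could look like the following. It assumes a 512-byte logical sector size; read_sector and hex_dump are hypothetical helper names, and reading a real device such as /dev/sda requires root.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* assumed logical sector size */

/* Read sector number `sector` from the device (or file) at `path` into
   `buf`, which must hold SECTOR_SIZE bytes.
   Returns 0 on success, -1 on any error or short read. */
int read_sector(const char *path, unsigned long sector, unsigned char *buf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t pos = (off_t)sector * SECTOR_SIZE;
    ssize_t n = -1;
    if (lseek(fd, pos, SEEK_SET) == pos)
        n = read(fd, buf, SECTOR_SIZE);
    close(fd);
    return n == SECTOR_SIZE ? 0 : -1;
}

/* Print `len` bytes as two-digit hex values, 16 per line. */
void hex_dump(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        printf("%02x ", buf[i]);
        if ((i + 1) % 16 == 0)
            printf("\n");
    }
}
```

A caller would declare unsigned char secteur[SECTOR_SIZE], call read_sector("/dev/sda", 2, secteur), check the return value, and then pass the buffer to hex_dump. Using unsigned char for the buffer avoids the sign-extension problem altogether.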

Related

Portable way to determine sector size in Linux

I want to write a small program in C which can determine the sector size of a hard disk. I wanted to read the file located in /sys/block/sd[X]/queue/hw_sector_size, and it worked in CentOS 6/7.
However when I tested in CentOS 5.11, the file hw_sector_size is missing, and I have only found max_hw_sectors_kb and max_sectors_kb.
Thus, I'd like to know which API I can use to determine the sector size in CentOS 5, or whether there is another, better way to do so. Thanks.

The fdisk utility displays this information (and runs successfully on kernels even older than the 2.6.x vintage on CentOS 5), so that seems a likely place to look for an answer. Fortunately, we're living in the wonderful world of open source, so all it requires is a little investigation.
The fdisk program is provided by the util-linux package, so we need that first.
The sector size is displayed in the output of fdisk like this:
Disk /dev/sda: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
If we look for Sector size in the util-linux code, we find it in disk-utils/fdisk-list.c:
fdisk_info(cxt, _("Sector size (logical/physical): %lu bytes / %lu bytes"),
fdisk_get_sector_size(cxt),
fdisk_get_physector_size(cxt));
So, it looks like we need to find fdisk_get_sector_size, which is defined in libfdisk/src/context.c:
unsigned long fdisk_get_sector_size(struct fdisk_context *cxt)
{
assert(cxt);
return cxt->sector_size;
}
Well, that wasn't super helpful. We need to find out where cxt->sector_size is set:
$ grep -lri 'cxt->sector_size.*=' | grep -v tests
libfdisk/src/alignment.c
libfdisk/src/context.c
libfdisk/src/dos.c
libfdisk/src/gpt.c
libfdisk/src/utils.c
I'm going to start with alignment.c, since that filename sounds promising. Looking through that file for the same regex I used to list the files, we find this:
cxt->sector_size = get_sector_size(cxt->dev_fd);
Which leads me to:
static unsigned long get_sector_size(int fd)
{
int sect_sz;
if (!blkdev_get_sector_size(fd, &sect_sz))
return (unsigned long) sect_sz;
return DEFAULT_SECTOR_SIZE;
}
Which in turn leads me to the definition of blkdev_get_sector_size in lib/blkdev.c:
#ifdef BLKSSZGET
int blkdev_get_sector_size(int fd, int *sector_size)
{
if (ioctl(fd, BLKSSZGET, sector_size) >= 0)
return 0;
return -1;
}
#else
int blkdev_get_sector_size(int fd __attribute__((__unused__)), int *sector_size)
{
*sector_size = DEFAULT_SECTOR_SIZE;
return 0;
}
#endif
And there we go. There is a BLKSSZGET ioctl that seems useful. A search for BLKSSZGET leads us to this stackoverflow question, which includes the following information in a comment:
For the record: BLKSSZGET = logical block size, BLKBSZGET = physical
block size, BLKGETSIZE64 = device size in bytes, BLKGETSIZE = device
size/512. At least if the comments in fs.h and my experiments can be
trusted. – Edward Falk Jul 10 '12 at 19:33

gnu FORTRAN unformatted file record markers stored as 64-bit width?

I have a legacy code and some unformatted data files that it reads, and it worked with gnu-4.1.2. I don't have access to the method that originally generated these data files. When I compile this code with a newer gnu compiler (gnu-4.7.2) and attempt to load the old data files on a different computer, it is having difficulty reading them. I start by opening the file and reading in the first record which consists of three 32-bit integers:
open(unit, file='data.bin', form='unformatted', status='old')
read(unit) x,y,z
I am expecting these three integers here to describe x,y,z spans so that next it can load a 3D matrix of float values with those same dimensions. However, instead it's loading a 0 for the first value, then the next two are offset.
Expecting:
x=26, y=127, z=97 (1A, 7F, 61 in hex)
Loaded:
x=0, y=26, z=127 (0, 1A, 7F in hex)
When I checked the data file in a hex editor, I think I figured out what was happening.
The first record marker in this case has a value of 12 (0C in hex), since the record holds three integers of 4 bytes each. This marker is stored both before and after the record. However, I notice that the 32 bits immediately after each record marker are 00000000. So either the record markers are treated as 64-bit integers (little-endian) or there is a 32-bit zero padding after each record marker. Either way, the code generated with the new compiler reads the record markers as 32-bit integers and does not expect any padding. This effectively shifts and corrupts the data being read in.
Is there an easy way to fix this non-portable issue? The old and new hardware are 64 bit architecture and so is the executable I compiled. If I try to use the older compiler version again will it solve the problem, or is it hardware dependent? I'd prefer to use the newer compilers because they are more efficient, and I really don't want to edit the source code to open all the files as access='stream' and manually read in a trailing 0 integer after each record marker, both before and after each record.
P.S. I could probably write a C++ code to alter the data files and remove these zero paddings if there is no easier alternative.

See the -frecord-marker= option in the gfortran manual. With -frecord-marker=8 you can read the old style unformatted sequential files produced by older versions of gfortran.

Seeing as how Fortran doesn't have a standardization on this, I opted to convert the data files to a new format that uses 32-bit wide record lengths instead of 64-bit wide. In case anyone needs to do this in the future I've included some Visual C++ code here that worked for me and should be easily modifiable to C or another language. I have also uploaded a Windows executable (fortrec.zip) here.
CFile OldFortFile, OutFile;
const int BUFLEN = 1024*20;
char pbuf[BUFLEN];
int i, iIn, iRecLen, iRecLen2, iLen, iRead, iError = 0;
CString strInDir = "C:\\folder\\";
CString strIn = "file.dat";
CString strOutDir = "C:\\folder\\fortnew\\";
CString strOut = strOutDir + strIn; //build the output path before strIn gains its directory prefix
system("mkdir \"" + strOutDir + "\""); //create a subdir to hold the output files
strIn = strInDir + strIn;
if(OldFortFile.Open(strIn,CFile::modeRead|CFile::typeBinary)) {
if(OutFile.Open(strOut,CFile::modeCreate|CFile::modeWrite|CFile::typeBinary)) {
while(true) {
iRead = OldFortFile.Read(&iRecLen, sizeof(iRecLen)); //Read the leading record length marker
if (iRead < sizeof(iRecLen)) //end of file reached
break;
OutFile.Write(&iRecLen, sizeof(iRecLen));//Write the marker back as a 32-bit value
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
iError++;
break;
}
i = iRecLen;
while (i > 0) {
iLen = (i > BUFLEN) ? BUFLEN : i;
OldFortFile.Read(&pbuf[0], iLen);
OutFile.Write(&pbuf[0], iLen);
i -= iLen;
}
if (i != 0) { //Buffer length mismatch
iError++;
break;
}
OldFortFile.Read(&iRecLen2, sizeof(iRecLen2));
if (iRecLen != iRecLen2) {//ensure we have reached the end of the record properly
//Record length mismatch
iError++;
break;
}
OutFile.Write(&iRecLen2, sizeof(iRecLen2));
OldFortFile.Read(&iIn, sizeof(iIn));
if (iIn != 0) {//this is the padding we need to ignore, ensure it's always zero
//Padding not found
iError++;
break;
}
}
OutFile.Close();
OldFortFile.Close();
}
else { //Could not create the output file.
OldFortFile.Close();
return;
}
}
else { //Could not open the input file
}
if (iError == 0) {
//File successfully converted
} else {
//Encountered error
}
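For anyone without MFC, the same conversion can be sketched in portable C. This is only a sketch under stated assumptions: the file uses 8-byte record markers in the same byte order as the machine running the converter, and no record is longer than a 4-byte marker can describe. convert_record_markers is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Copy Fortran sequential unformatted records from `in` to `out`,
   rewriting each 8-byte record marker as a 4-byte marker.
   Assumes file and host share the same byte order.
   Returns the number of records converted, or -1 on an I/O error,
   an oversized record, or a leading/trailing marker mismatch. */
long convert_record_markers(FILE *in, FILE *out)
{
    char buf[4096];
    long records = 0;
    uint64_t head, tail;

    while (fread(&head, sizeof head, 1, in) == 1) {
        if (head > INT32_MAX)
            return -1;                  /* does not fit a 4-byte marker */
        uint32_t len = (uint32_t)head;
        if (fwrite(&len, sizeof len, 1, out) != 1)
            return -1;

        /* Copy the record payload in chunks. */
        for (uint64_t left = head; left > 0; ) {
            size_t chunk = left < sizeof buf ? (size_t)left : sizeof buf;
            if (fread(buf, 1, chunk, in) != chunk ||
                fwrite(buf, 1, chunk, out) != chunk)
                return -1;
            left -= chunk;
        }

        /* The trailing marker must repeat the leading one. */
        if (fread(&tail, sizeof tail, 1, in) != 1 || tail != head)
            return -1;
        if (fwrite(&len, sizeof len, 1, out) != 1)
            return -1;
        records++;
    }
    return ferror(in) ? -1 : records;
}
```

Both streams should be opened in binary mode ("rb" and "wb"). Reading the marker as a single 64-bit value is equivalent to the 32-bit length plus 32-bit zero padding checked in the code above, as long as the data is little-endian.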

Create an array of values from different text files in C

I'm working in C on 64-bit Ubuntu 14.04.
I have a number of .txt files, each containing lines of floating point values (1 value per line). The lines represent parts of a complex sample, and they're stored as real(a1) \n imag(a1) \n real(a2) \n imag(a2), if that makes sense.
In a specific scenario there are 4 text files each containing 32768 samples (thus 65536 values), but I need to make the final version dynamic to accommodate up to 32 files (the maximum samples per file would not exceed 32768 though). I'll only be reading the first 19800 samples (depending on other things) though, since the entire signal is contained in those 39600 points (19800 samples).
A common abstraction is to represent the files / samples as a matrix, where columns represent return signals and rows represent the value of each signal at a sampling instant, up until the maximum duration.
What I'm trying to do is take the first sample from each return signal and move it into an array of double-precision floating point values to do some work on, move on to the second sample for each signal (which will overwrite the previous array) and do some work on them, and so forth, until the last row of samples have been processed.
Is there a way in which I can dynamically open files for each signal (depending on the number of pulses I'm using in that particular instance), read the first sample from each file into a buffer and ship that off to be processed? On the next iteration, the file pointers would all be aligned to the second sample; it would then move those into an array and ship it off again, until the desired number of samples (19800 in our hypothetical case) has been reached.
I can read samples just fine from the files using fscanf:
rx_length = 19800;
int x;
float buf;
double *range_samples = calloc(num_pulses, 2 * sizeof(range_samples));
for (i=0; i < 2 * rx_length; i++){
x = fscanf(pulse_file, "%f", &buf);
*(range_samples) = buf;
}
All that needs to happen (in my mind) is that I need to cycle both sample# and pulse# (in that order), so when finished with one pulse it would move on to the next set of samples for the next pulse, and so forth. What I don't know how to do is declare file pointers for all return signal files when their number can vary between calls (e.g. do the whole thing for 4 pulses, and on the next call it can be 16 or 64).
If there are any ideas / comments / suggestions I would love to hear them.
Thanks.

I would turn the code you posted into a function that takes an array of file names as an argument:
void doPulse( const char **file_names, const int size )
{
    FILE *file = NULL;
    // declare your other variables
    for ( int i = 0; i < size; ++i )
    {
        file = fopen( file_names[i], "r" );
        if ( file == NULL )
            continue; // could not open this file, skip it
        // do the work on that file
        fclose( file );
        file = NULL;
    }
}

What you need is a generator. It would be reasonably easy in C++, but as you tagged C, I can imagine a function that takes a custom struct (the state of the object) as a parameter. It could be something like this (pseudocode):
typedef struct GtorState {
    char **files;
    int filesIndex;
    FILE *currentFile;
} GtorState;
void gtorInit(GtorState *state, char **files) {
    // load the array of files into state, set the index to 0, and open the first file
}
int nextValue(GtorState *state, double *real, double *imag) {
    // read 2 values from currentFile and assign them to real and imag
    // on eof, close currentFile and open files[++filesIndex]
    // return 0 if real and imag were found, 1 on eof of the last file, 2 on error
}
Then your main program could contain:
GtorState state;
// initialize the list of files to process
gtorInit(&state, files);
double real, imag;
int cr;
while (0 == (cr = nextValue(&state, &real, &imag))) {
    // process (real, imag)
}
if (cr == 2) {
    // process (at least display) the error
}
Alternatively, your main program could iterate over the values of the different files and call a stateful function, analogous to the generator above, that processes the values and at the end exposes the results through its state.
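One way the generator idea could be filled in is sketched below. The names gtor_state, gtor_init and gtor_next are hypothetical, and the explicit n_files count is an assumption added so the generator knows which file is the last one; each file is assumed to hold one value per line, alternating real and imaginary parts.

```c
#include <stdio.h>

/* Generator state: walks a list of files, yielding one complex sample
   (two consecutive values) per call to gtor_next(). */
struct gtor_state {
    const char **files;  /* file names, owned by the caller */
    int n_files;         /* number of entries in files */
    int index;           /* index of the file currently being read */
    FILE *current;       /* NULL if not open */
};

void gtor_init(struct gtor_state *s, const char **files, int n_files)
{
    s->files = files;
    s->n_files = n_files;
    s->index = 0;
    s->current = n_files > 0 ? fopen(files[0], "r") : NULL;
}

/* Returns 0 when a sample was stored in *real and *imag,
   1 at the end of the last file, 2 on error. */
int gtor_next(struct gtor_state *s, double *real, double *imag)
{
    for (;;) {
        if (s->index >= s->n_files)
            return 1;                     /* all files consumed */
        if (s->current == NULL)
            return 2;                     /* a file failed to open */

        int got = fscanf(s->current, "%lf %lf", real, imag);
        if (got == 2)
            return 0;

        /* Clean end of file, or a malformed/unreadable file? */
        int clean_eof = (got == EOF) && !ferror(s->current);
        fclose(s->current);
        s->current = NULL;
        if (!clean_eof)
            return 2;

        /* Move on to the next file, if there is one. */
        if (++s->index < s->n_files)
            s->current = fopen(s->files[s->index], "r");
    }
}
```

The calling loop is the same as in the pseudocode: keep calling gtor_next() while it returns 0, then check whether the final return value was 2. Note that this walks the files one after another; it does not interleave samples across files.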

Tried a slightly different approach and it's working really well.
Instead of reading from the different files each time I want to do something, I read the entire contents of each file into a 2D array range_phase_data[sample_number][pulse_number], and then access different parts of the array depending on which range bin I'm currently working on.
Here's an excerpt:
#define REAL(z,i) ((z)[2*(i)])
#define IMAG(z,i) ((z)[2*(i)+1])
for (i=0; i<rx_length; i++){
printf("\t[%s] Range bin %i. Samples %i to %i.\n", __FUNCTION__, i, 2*i, 2*i+1);
for (j=0; j<num_pulses; j++){
REAL(fft_buf, j) = range_phase_data[2*i][j];
IMAG(fft_buf, j) = range_phase_data[2*i+1][j];
}
printf("\t[%s] Range bin %i done, ready to FFT.\n", __FUNCTION__, i);
// do stuff with the data
}
This removes the need to dynamically allocate file pointers; instead, it just opens the files one at a time and writes the data to the corresponding column in the matrix.
Cheers.
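The loading step is not shown in the excerpt, so here is a rough sketch of how one file might be read into one column. It assumes the matrix is a flat, row-major array of doubles rather than the 2D array used above; load_column is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Read up to n_rows values (one per line) from the text file `path`
   into column `col` of a row-major matrix that has n_cols columns.
   Returns the number of values read, or -1 if the file cannot be opened. */
long load_column(const char *path, double *matrix,
                 size_t n_rows, size_t n_cols, size_t col)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    size_t row = 0;
    while (row < n_rows &&
           fscanf(f, "%lf", &matrix[row * n_cols + col]) == 1)
        row++;

    fclose(f);
    return (long)row;
}
```

With num_pulses files, the matrix would be allocated once with calloc(2 * rx_length * num_pulses, sizeof(double)), and load_column would be called once per file with col set to the pulse index. Only one file is open at any time, so the number of pulses can change freely between calls.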

reading a raw audio file as Matlab does in C

I have the following small script that I want to write in C :
`%% getting the spectrum
clear, clc ;
fileName ='M0.raw'
[x,fs] = audioread(fileName);
[xPSD,f] = pwelch(x,hanning(8192),0,8192*4 ,fs);
plot(f,10*log10(abs(xPSD)));
xlim([0 22e3]);
absxPSD = abs(xPSD);
save('absXPSD.txt','absxPSD','-ascii');
save('xPSD.txt','xPSD','-ascii');
save('xValues.txt','x','-ascii');
save('frequency.txt','f','-ascii');`
Without going into details: I have a problem getting the correct result. When I checked, I found that the data I read is wrong. Here is a sample that reads the raw file, to compare it with what Matlab reads:
#include <stdio.h>
#include <stdlib.h>
int main (){
FILE* inp =NULL;
FILE* oup =NULL;
double value =0;
inp = fopen("M0.raw","r");
oup = fopen("checks.txt","w+");
UPDATE: after LoPiTaL's answer, I tried to skip over the RIFF header, which is 44 bytes long, using fseek:
fseek (inp,352,SEEK_SET);// that didn't help getting the right result !!
if( inp == NULL || oup==NULL){
printf(" error at file opning \n");
return -1;
}
while (!(feof(inp))){
fread(&value,sizeof(double),1,inp);
printf(" %f \n ",value);
fprintf(oup,"%f\n",value);
}
fclose(inp);
fclose(oup);
return 0;
}
and the result that I get is :
-28083683309813134333858080554409220100578902032859386180468433149049781495379346137536863936326139303879846829175766826833343673613788446579155215033623707200818670767132304934425064429529496303287641688697019947073799877821581901737052884168025721481955133510652655692037990001524306465271815108431928360960.000000
0.000000
0.000000
0.000000
0.000000
0.000000
-20701636078248669570005757343846586744027511881225108933223144646890577802102653022204406730988428912367583701134782419138464527797567258583836429190479797597328678189654150340845........................................................................
and my aim is to get those value :
-1.0162354e-02
-9.3688965e-03
-7.5073242e-03
-1.9531250e-03
3.7231445e-03
1.3549805e-02
2.3223877e-02
3.2867432e-02
4.4830322e-02
5.5114746e-02
6.7291260e-02
7.7636719e-02
8.8562012e-02
9.5794678e-02
1.0055542e-01
1.0415649e-01
1.0351563e-01
1.0235596e-01
9.8785400e-02
9.1796875e-02
8.3648682e-02
7.1594238e-02
The audio file is mono and has 16-bit resolution. Any idea how I can solve this? Thanks for any help.

You must open the file in binary mode, for starters. Otherwise you get text mode, which can do translations of line endings, for instance. Not good with binary data.
Binary mode:
inp = fopen("M0.raw", "rb");
^
|
muy
importante

Sure, you cannot read an audio file as-is and hope that the data is laid out the way you think it is.
Ignoring compressed audio files, which of course you have to decode before reading, let's focus on RAW audio files:
RAW audio files are usually WAV files. WAV files have a RIFF header at the beginning of the file, which you have to skip before reading the audio data.
http://en.wikipedia.org/wiki/Resource_Interchange_File_Format
The sample data starts after the RIFF header.
As you stated, the data has 16-bit resolution. That means each sample is only two bytes, not the eight bytes of a double: a signed value between -32768 (0x8000) and 32767 (0x7FFF), which audioread scales to the range -1.0 to 1.0.
So you have to read two bytes at a time (i.e. into a signed short) and then divide by 32768 to do the conversion:
signed short ss;
double value;
FILE* inp = NULL;
inp = fopen("M0.raw","rb"); //As stated in the other answer, use binary mode!
fseek (inp,44,SEEK_SET); // Only 44 bytes!!
//We have already discarded the header here....
while (fread(&ss, sizeof(signed short) ,1 , inp) == 1){
    //Now we have to convert from signed short to double:
    value=((double)ss)/32768.0;
    //Print the results:
    printf(" %f \n ",value);
    fprintf(oup,"%f\n",value); //oup is the output file opened in the question
}
Of course, Matlab's audioread function already does all of this for you, so there you don't have to care about the encoding. In your example the data is 16-bit, but another file could use 8, 24 or 32 bits, or could even be differentially encoded or compressed despite being a WAV file (see the RIFF header for more information).
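Reading straight into a signed short relies on the machine being little-endian, like the file. A byte-order-independent variant can be sketched as follows, assuming 16-bit little-endian PCM samples; pcm16le_to_double and read_pcm16le are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* Decode one signed 16-bit little-endian sample from two bytes and
   scale it to the range [-1.0, 1.0). */
double pcm16le_to_double(const unsigned char b[2])
{
    int v = b[0] | (b[1] << 8);   /* 0 .. 65535 */
    if (v >= 32768)
        v -= 65536;               /* reinterpret as two's complement */
    return v / 32768.0;
}

/* Read up to `max` samples from the current position of `f` into `out`.
   Returns the number of samples read. */
size_t read_pcm16le(FILE *f, double *out, size_t max)
{
    unsigned char b[2];
    size_t n = 0;

    while (n < max && fread(b, 1, 2, f) == 2)
        out[n++] = pcm16le_to_double(b);
    return n;
}
```

As a check against the values in the question: the first expected sample, -1.0162354e-02, is -333 / 32768, which corresponds to the bytes 0xB3 0xFE in the file. The caller is still responsible for opening the file with "rb" and seeking past the header first.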

What could cause a Labwindows/CVI C program to hate the number 2573?

Using Windows
So I'm reading from a binary file a list of unsigned int data values. The file contains a number of datasets listed sequentially. Here's the function to read a single dataset from a char* pointing to the start of it:
function read_dataset(char* stream, t_dataset *dataset){
//...some init, including setting dataset->size;
for(i=0;i<dataset->size;i++){
dataset->samples[i] = *((unsigned int *) stream);
stream += sizeof(unsigned int);
}
//...
}
where read_dataset is called in a context such as this:
//...
char buff[10000];
t_dataset* dataset = malloc( sizeof( *dataset) );
unsigned long offset = 0;
for(i=0;i<number_of_datasets; i++){
fseek(fd_in, offset, SEEK_SET);
if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){
break;
}
read_dataset(buff, *dataset);
// Do something with dataset here. It's screwed up before this, I checked.
offset += profileSize;
}
//...
Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers.
For example, what should be
...
1831
2229
2406
2637
2609
2573
2523
2247
...
becomes
...
1831
2229
2406
2637
2609
0xDB00000A
0xC7000009
0xB2000008
...
If you think those hex numbers look suspicious, you're right. Turns out the hex values for the values that were changed are really familiar:
2573 -> 0xA0D
2523 -> 0x9DB
2247 -> 0x8C7
So apparently this number 2573 causes my stream pointer to gain a byte. This remains until the next dataset is loaded and parsed, and god forbid it contain a number 2573. I have checked a number of spots where this happens, and each one I've checked began on 2573.
I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.

You don't specify how you obtained the bytes in memory (pointed to by stream), nor what platform you're running on, but I wouldn't be surprised to find that you're on Windows and that you used the C stdio library call fopen(filename, "r"). Try using fopen(filename, "rb") instead. On Windows (and MS-DOS), fopen() translates MS-DOS line endings "\r\n" (hex 0x0D 0x0A) in the file to Unix-style "\n", unless you append "b" to the file mode to indicate binary. The number 2573 is 0x0A0D, which is stored little-endian as the bytes 0x0D 0x0A, so it looks exactly like such a line ending and loses its 0x0D byte.
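A minimal sketch of reading the values with the file opened in binary mode is shown below. read_uints is a hypothetical helper, not the asker's function, and it assumes the file stores the values in the host's own unsigned int size and byte order.

```c
#include <stdio.h>

/* Read `count` unsigned ints stored at byte `offset` of a binary file.
   Returns the number of values actually read (0 if the file cannot be
   opened or the seek fails). */
size_t read_uints(const char *path, long offset,
                  unsigned int *out, size_t count)
{
    FILE *f = fopen(path, "rb");  /* "b" disables CR/LF translation on Windows */
    if (f == NULL)
        return 0;

    size_t n = 0;
    if (fseek(f, offset, SEEK_SET) == 0)
        n = fread(out, sizeof *out, count, f);

    fclose(f);
    return n;
}
```

Reading directly into an unsigned int array with fread also avoids casting a char pointer to unsigned int *, which can be a problem on platforms with strict alignment requirements.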

A couple of irrelevant points.
sizeof(*dataset) doesn't do what you think it does.
There is no need to use seek on every read.
I don't understand how you are calling a function that only takes one parameter but you are giving it two (or at least I don't understand why your compiler doesn't object).