extract epoch causes data overlapping postgres - c

I'm working with Postgres and trying to extract the epoch time from a timestamp column.
This is the stored procedure:
CREATE OR REPLACE FUNCTION epochtime(sampleid integer, starttime timestamp without time zone, stoptime timestamp without time zone)
RETURNS text AS
$BODY$DECLARE
result text;
BEGIN
SELECT INTO result string_agg(concat_ws(',',epochres), ',')
FROM (
Select extract('epoch' from "Timestamp") as epochres from "Results"
where "SampleID"=sampleid and "Timestamp" >= starttime
and "Timestamp" <= stoptime order by "Timestamp" asc ) res;
return result;
END$BODY$
LANGUAGE plpgsql VOLATILE
C function where the stored procedure is being called:
static int getEpoch(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const
objv[]){
Tcl_Obj *result;
char sampleid[15];
char repStartTime[256];
char repStopTime[256];
int length;
int lengthTwo;
int lengthThree;
if (objc != 4) {
Tcl_WrongNumArgs(interp, 2, objv, "number of argument error");
return TCL_ERROR;
}
strcpy(sampleid, Tcl_GetStringFromObj(objv[1], &lengthThree));
strcpy(repStartTime, Tcl_GetStringFromObj(objv[2], &length));
strcpy(repStopTime, Tcl_GetStringFromObj(objv[3], &lengthTwo));
char command[256];
PQclear(res);
strcpy(command, "Select \"epochtime\"('");
strcat(command, sampleid);
strcat(command,"','");
strcat(command, repStartTime);
strcat(command,"','");
strcat(command, repStopTime);
strcat(command, "')");
res = PQexec(conn,command);
result = Tcl_GetObjResult(interp);
Tcl_SetStringObj(result, PQgetvalue(res,0,0), strlen(PQgetvalue(res,0,0)));
return TCL_OK;
}
For a small amount of data, say around 700 numbers, it works fine. But if I try to get a lot of numbers, such as 10000, it causes a segmentation fault in the program, and if I run the query from the command line the results overlap: the numbers literally look like they are on top of each other.
Example with a small number of records in the column:
Example with a large number of records in the column:
Like I said, everything works as it should with a small number of records, but I need it to work the same way with a large number of records.

The problem that you have is that you concatenate all epochs into a single string and that leads to a buffer overrun when you have many epochs. In other words, when returning many epochs, the resulting string is larger than the allocated memory for the result, leading to the segmentation fault.
The overprinting issue in pgAdmin is just a side effect. pgAdmin is bad with long strings.
A solution would be to simply return the selection of epochs to your C program as individual records and do any concatenation there. You should allocate a string buffer large enough to hold all epochs in a single string, which you could estimate from the number of epochs returned, multiplied by the typical epoch length which appears to be 10 characters in your case (plus the separator and space, obviously, making it 12).
But do you really need to have all epochs in a single string?
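If a single string is still needed, the concatenation suggested above could be sketched as follows. This is only an illustration under assumptions: the query is assumed to be changed so that it returns one epoch per row, and the rows are modeled here as a plain array of C strings so the sketch does not depend on libpq. The function name join_values is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Join n strings with sep into one freshly allocated, NUL-terminated
 * string. Returns NULL on allocation failure; the caller frees the result.
 * Assumption: in the real program each value would come from
 * PQgetvalue(res, row, 0) for row = 0 .. PQntuples(res) - 1. */
char *join_values(const char *const *values, size_t n, const char *sep)
{
    size_t sep_len = strlen(sep);
    size_t total = 1;                       /* room for the terminating NUL */
    for (size_t i = 0; i < n; i++)
        total += strlen(values[i]) + (i + 1 < n ? sep_len : 0);

    char *out = malloc(total);
    if (!out)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(values[i]);
        memcpy(p, values[i], len);
        p += len;
        if (i + 1 < n) {                    /* no separator after the last value */
            memcpy(p, sep, sep_len);
            p += sep_len;
        }
    }
    *p = '\0';
    return out;
}
```

Because the buffer is sized from the actual values rather than from a fixed array, the result cannot outgrow its allocation no matter how many rows come back.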

C11: how to quickly convert a char array into ints, then modify ints and update char array

There are two parts of the problem that I don't know how to solve:
Input
The user can enter some inputs like 12,14y or 15m and I need to extract the two ints and the character. For now, I simply use:
char buffer[50];
scanf("%s", buffer);
switch (buffer[strlen(buffer)-1]) {
// ... I use this to read the last char
}
This can give me the information of how many ints I have to read:
one in the m,n case -> sscanf(buffer, "%d%c", &int1, &c)
two in the y,s,b case -> sscanf(buffer, "%d,%d%c", &int1, &int2, &c)
I need these numbers for the core of my program, so I need int values not only the string.
The problem is that online I read about sscanf inefficiency and I need a good way to do this task quickly.
Output
My code has to modify these numbers in just one case (y) and keep a modified copy of the user input. For example, if the user's input is 1,12y, I have to change it to 1,10y and store it as a char array, so it is not only an input. The modification of int2 is quite long to explain; I can say that the new value will be less than the original one (in my example, 12 becomes 10). The only idea I have is about how to size the new char array: I can calculate the lengths of int1 and int2 by dividing them by increasing powers of 10 until I get a result between 1 and 9, e.g.:
int1 = 201:
201 no
20.1 no
2.01 yes
=> 3 tries, length = 3
Then I use a malloc. But then, how can I write my "output" in the new char array? e.g.:
input = "1,201y"
-> int1 = 1, int2 = 201
-> length(int1) = 1, length(int2) = 2
// if the core program sets int2 = 51, then
char *out = malloc(1+2+1);
// now I have to write "1,51y" in this char array
I've already coded the "core" program, but now I want a fast "translation" of the user input (because in the core program I need to know whether it's an int1m, int1n, int1,int2y, int1,int2s or int1,int2b command), and I don't know how to write the modified input back into a string (for strings I use dynamically allocated char arrays). Only the y command can modify int2.
I hope it's clear what I have to do.
The problem is that online I read about sscanf inefficiency
"Online" isn't a very trustworthy source. Inefficiency depends entirely on what you compare the function with.
If you compare with any plain C function then all of the stdio.h functions are very much inefficient. As is malloc for that matter. However, printing to the screen and waiting on the human user are by far the largest bottlenecks in this program, so you might want to re-consider why and what you are optimizing.
That being said, you can easily roll out a manual specialized version of the string to integer conversion, by calling strtol family of functions. Here's a version supporting exactly 1 or 2 integers in the input string (it can easily be rewritten to use a loop instead):
#include <stdlib.h>
int parse_input (const char* input, int* i1, int* i2, char* ch)
{
char* endptr=NULL;
const char* cptr=input;
int result;
result = strtol(cptr, &endptr, 10);
if(cptr==endptr)
{
return 0;
}
*i1 = result;
if(*endptr != ',')
{
*ch = *endptr;
return 1;
}
cptr=endptr+1;
result = strtol(cptr, &endptr, 10);
if(cptr==endptr)
{
return 0;
}
*i2 = result;
*ch = *endptr;
return 2;
}
Some extra error handling might be needed too. This compiles to around 50 instructions on x86_64, not counting the strtol calls, and some 20 of those instructions are related to parameter stacking and the calling convention.
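For the output half of the question (writing the modified numbers back into a char array), a possible sketch is shown below. It relies on snprintf to measure the text first, so the digit counting by repeated division is not needed. The function name format_command is hypothetical, and the sketch assumes the two-integer form of the command.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build "int1,int2c" (for example "1,51y") in a freshly allocated string.
 * Returns NULL on failure; the caller frees the result. */
char *format_command(int i1, int i2, char ch)
{
    /* With a NULL buffer and size 0, snprintf only reports the length needed. */
    int len = snprintf(NULL, 0, "%d,%d%c", i1, i2, ch);
    if (len < 0)
        return NULL;

    char *out = malloc((size_t)len + 1);    /* +1 for the terminating NUL */
    if (!out)
        return NULL;

    snprintf(out, (size_t)len + 1, "%d,%d%c", i1, i2, ch);
    return out;
}
```

Note that the allocation has to cover the comma, the command letter and the terminating NUL as well as the digits, which is why measuring with snprintf is less error-prone than adding up digit counts by hand.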

How do I read and parse a text file with numbers, fast (in C)?

Last update: my classmate uses fread() to read about one third of the whole file into a string at a time, which avoids running out of memory. He then processes this string and separates it into his data structure. Note that you need to take care of one problem: at the end of the string, the last several characters may not form a whole number. Think of a way to detect this situation so that you can join these characters with the first several characters of the next string.
Each number corresponds to a different variable in your data structure. Your data structure should be very simple, because inserting data into it one item at a time is very slow; most of the time is spent on these insertions. Therefore, the fastest way to process the data is to use fread() to read the file into a string and separate this string into different one-dimensional arrays.
For example (just an example, not from my project), I have a text file like:
72 24 20
22 14 30
23 35 40
42 29 50
19 22 60
18 64 70
.
.
.
Each row is one person's information. The first column is the person's age, the second column is his deposit, and the third is his wife's age.
Then we use fread() to read this text file into a string, and I use strtok() to separate it (you can use a faster way to separate it).
Don't use a data structure to store the separated data!
I mean, don't do this:
struct person
{
int age;
int deposit;
int wife_age;
};
struct person *my_data_store;
my_data_store=malloc(sizeof(struct person)*length_of_this_array);
//then insert separated data into my_data_store
Don't use data structure to store data!
The fastest way to store your data is like this:
int *age;
int *deposit;
int *wife_age;
age=(int*)malloc(sizeof(int)*age_array_length);
deposit=(int*)malloc(sizeof(int)*deposit_array_length);
wife_age=(int*)malloc(sizeof(int)*wife_array_length);
// the values of age_array_length, deposit_array_length and wife_array_length can be found with `wc -l`; you can run wc -l from your C program
// then insert the separated data into these arrays as you split it with `strtok()`.
Second update: the best way is to use fread() to read part of the file into a string, then separate this string into your data structure. By the way, don't use any standard library function that converts a string into an integer, such as fscanf() or atoi(); they are too slow, so we should write our own function to convert a string into an integer. Not only that, we should design a simpler data structure to store the data. My classmate can read this 1.7G file within 7 seconds, so there is a way to do this, and it is much better than using multiple threads. I haven't seen his code; after I see it, I will post a third update to explain how he does it. That will be two months from now, after our course has finished.
Update: I used multithreading to solve this problem, and it works! Note: don't use clock() to measure the time when using multiple threads; that is why I thought the execution time had increased.
One thing I want to clarify: scanning the file without storing the values in my structure takes about 20 seconds, and storing the values in my structure takes about 60 seconds. My definition of "time of reading the file" includes both: time of reading the file = scanning the file + storing the values in my structure. So, are there any suggestions for storing the values faster? (By the way, I don't have control over the input file; it is generated by our professor. I am trying to use multithreading to solve this problem; if it works, I will tell you the result.)
I have a file, its size is 1.7G.
It looks like:
1 1427826
1 1427827
1 1750238
1 2
2 3
2 4
3 5
3 6
10 7
11 794106
.
.
and so on.
The file has about ten million lines. I need to read this file and store these numbers in my data structure within 15 seconds.
I have tried using fread() to read the whole file and then strtok() to separate each number, but it still takes 80 seconds. With fscanf() it is even slower. How do I speed it up? Maybe we cannot get it under 15 seconds, but 80 seconds is too long. How can I read it as fast as possible?
Here is part of my reading code:
int Read_File(FILE *fd,int round)
{
clock_t start_read = clock();
int first,second;
first=0;
second=0;
fseek(fd,0,SEEK_END);
long int fileSize=ftell(fd);
fseek(fd,0,SEEK_SET);
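char * buffer=(char *)malloc(sizeof(char)*(fileSize+1));
char *string_first;
long int newFileSize=fread(buffer,1,fileSize,fd);
buffer[newFileSize]='\0'; // strtok() needs a NUL-terminated string
char *string_second;
string_first=strtok(buffer," \t\n"); // fetch the first token before entering the loop
while(string_first!=NULL)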
{
first=atoi(string_first);
string_second=strtok(NULL," \t\n");
second=atoi(string_second);
string_first=strtok(NULL," \t\n");
max_num= first > max_num ? first : max_num ;
max_num= second > max_num ? second : max_num ;
root_level=first/NUM_OF_EACH_LEVEL;
leaf_addr=first%NUM_OF_EACH_LEVEL;
if(root_addr[root_level][leaf_addr].node_value!=first)
{
root_addr[root_level][leaf_addr].node_value=first;
root_addr[root_level][leaf_addr].head=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].tail=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].g_credit[0]=1;
root_addr[root_level][leaf_addr].head->neighbor_value=second;
root_addr[root_level][leaf_addr].head->next=NULL;
root_addr[root_level][leaf_addr].tail=root_addr[root_level][leaf_addr].head;
root_addr[root_level][leaf_addr].degree=1;
}
else
{
//insert its new neighbor
Neighbor *newNeighbor;
newNeighbor=(Neighbor*)malloc(sizeof(Neighbor));
newNeighbor->neighbor_value=second;
root_addr[root_level][leaf_addr].tail->next=newNeighbor;
root_addr[root_level][leaf_addr].tail=newNeighbor;
root_addr[root_level][leaf_addr].degree++;
}
root_level=second/NUM_OF_EACH_LEVEL;
leaf_addr=second%NUM_OF_EACH_LEVEL;
if(root_addr[root_level][leaf_addr].node_value!=second)
{
root_addr[root_level][leaf_addr].node_value=second;
root_addr[root_level][leaf_addr].head=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].tail=(Neighbor *)malloc(sizeof(Neighbor));
root_addr[root_level][leaf_addr].head->neighbor_value=first;
root_addr[root_level][leaf_addr].head->next=NULL;
root_addr[root_level][leaf_addr].tail=root_addr[root_level][leaf_addr].head;
root_addr[root_level][leaf_addr].degree=1;
root_addr[root_level][leaf_addr].g_credit[0]=1;
}
else
{
//insert its new neighbor
Neighbor *newNeighbor;
newNeighbor=(Neighbor*)malloc(sizeof(Neighbor));
newNeighbor->neighbor_value=first;
root_addr[root_level][leaf_addr].tail->next=newNeighbor;
root_addr[root_level][leaf_addr].tail=newNeighbor;
root_addr[root_level][leaf_addr].degree++;
}
}
Some suggestions:
a) Consider converting (or pre-processing) the file into a binary format; with the aim to minimise the file size and also drastically reduce the cost of parsing. I don't know the ranges for your values, but various techniques (e.g. using one bit to tell if the number is small or large and storing the number as either a 7-bit integer or a 31-bit integer) could halve the file IO (and double the speed of reading the file from disk) and slash parsing costs down to almost nothing. Note: For maximum effect you'd modify whatever software created the file in the first place.
b) Reading the entire file into memory before you parse it is a mistake. It doubles the amount of RAM required (and the cost of allocating/freeing) and has disadvantages for CPU caches. Instead read a small amount of the file (e.g. 16 KiB) and process it, then read the next piece and process it, and so on; so that you're constantly reusing the same small buffer memory.
c) Use parallelism for file IO. It shouldn't be hard to read the next piece of the file while you're processing the previous piece of the file (either by using 2 threads or by using asynchronous IO).
d) Pre-allocate memory for the "neighbour" structures and remove most/all malloc() calls from your loop. The best possible case is to use a statically allocated array as a pool - e.g. Neighbor myPool[MAX_NEIGHBORS]; where malloc() can be replaced with &myPool[nextEntry++];. This reduces/removes the overhead of malloc() while also improving cache locality for the data itself.
e) Use parallelism for storing values. For example, you could have multiple threads where the first thread handles all the cases where root_level % NUM_THREADS == 0, the second thread handles all cases where root_level % NUM_THREADS == 1, etc.
With all of the above (assuming a modern 4-core CPU), I think you can get the total time (for reading and storing) down to less than 15 seconds.
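Suggestion (b) raises the question of what to do when a number is split across two reads. One possible sketch, under the assumption that the input contains only unsigned decimal numbers separated by whitespace, keeps a tiny parser state between chunks instead of copying leftover bytes. All names here (parse_state, parse_chunk, parse_finish, parse_stream, collector, collect) are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Parser state that survives from one chunk to the next, so a number
 * split across two reads is still assembled correctly. */
typedef struct {
    unsigned long value;   /* digits accumulated so far */
    int in_number;         /* nonzero while inside a run of digits */
} parse_state;

typedef void (*number_fn)(unsigned long value, void *ctx);

/* Feed one chunk to the parser; emit() is called for every completed number. */
void parse_chunk(parse_state *st, const char *buf, size_t len,
                 number_fn emit, void *ctx)
{
    for (size_t i = 0; i < len; i++) {
        unsigned digit = (unsigned)(unsigned char)buf[i] - '0';
        if (digit <= 9) {
            st->value = st->in_number ? 10 * st->value + digit : digit;
            st->in_number = 1;
        } else if (st->in_number) {
            emit(st->value, ctx);
            st->in_number = 0;
        }
    }
}

/* Call once after the last chunk to flush a number that ends at EOF. */
void parse_finish(parse_state *st, number_fn emit, void *ctx)
{
    if (st->in_number) {
        emit(st->value, ctx);
        st->in_number = 0;
    }
}

/* Read a stream in 16 KiB pieces, reusing one small buffer. */
void parse_stream(FILE *fp, number_fn emit, void *ctx)
{
    char buf[16 * 1024];
    parse_state st = {0, 0};
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        parse_chunk(&st, buf, n, emit, ctx);
    parse_finish(&st, emit, ctx);
}

/* Example callback: append each number to a caller-supplied array. */
typedef struct {
    unsigned long *out;
    size_t count;
    size_t cap;
} collector;

void collect(unsigned long value, void *ctx)
{
    collector *c = ctx;
    if (c->count < c->cap)
        c->out[c->count++] = value;
}
```

In the question's program the callback would pair up consecutive numbers and insert them into the graph; overflow checking is left out of this sketch to keep it short.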
My suggestion would be to form a processing pipeline and thread it. Reading the file is an I/O bound task and parsing it is CPU bound. They can be done at the same time in parallel.
There are several possibilities. You'll have to experiment.
Exploit what your OS gives you. If Windows, check out overlapped I/O. This lets your computation proceed with parsing one buffer full of data while the Windows kernel fills another. Then switch buffers and continue. This is related to what @Neal suggested, but has less overhead for buffering. Windows is depositing data directly in your buffer through the DMA channel. No copying. If Linux, check out memory-mapped files. Here the OS is using the virtual memory hardware to do more or less what Windows does with overlapping.
Code your own integer conversion. This is likely to be a bit faster than making a clib call per integer.
Here's example code. You want to absolutely limit the number of comparisons.
// Process one input buffer.
*end_buf = ' '; // add a sentinel at the end of the buffer
for (char *p = buf; p < end_buf; p++) {
    // somewhat unsafe (but fast) reliance on unsigned wrapping
    unsigned val = *p - '0';
    if (val <= 9) {
        // Found start of integer; val already holds its first digit.
        for (;;) {
            unsigned digit_val = *++p - '0'; // advance first, so no digit is counted twice
            if (digit_val > 9) break;
            val = 10 * val + digit_val;
        }
        ... do something with val
    }
}
Don't call malloc once per record. You should allocate blocks of many structs at a time.
Experiment with buffer sizes.
Crank up compiler optimizations. This is the kind of code that benefits greatly from excellent code generation.
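The advice to allocate blocks of many structs at a time could look roughly like the sketch below. The Neighbor type is modeled on the question's struct (its real definition is not shown, so the fields are an assumption), and NeighborPool, pool_alloc, pool_free_all and the block size are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical node type modeled on the question's Neighbor struct. */
typedef struct Neighbor {
    int neighbor_value;
    struct Neighbor *next;
} Neighbor;

#define POOL_BLOCK 65536   /* nodes per malloc call; a tunable assumption */

typedef struct PoolBlock {
    struct PoolBlock *prev;        /* previously filled block, kept for freeing */
    size_t used;                   /* nodes handed out from this block */
    Neighbor nodes[POOL_BLOCK];
} PoolBlock;

typedef struct {
    PoolBlock *current;
} NeighborPool;

/* Hand out one node, calling malloc only once per POOL_BLOCK nodes. */
Neighbor *pool_alloc(NeighborPool *pool)
{
    if (!pool->current || pool->current->used == POOL_BLOCK) {
        PoolBlock *b = malloc(sizeof *b);
        if (!b)
            return NULL;
        b->prev = pool->current;
        b->used = 0;
        pool->current = b;
    }
    return &pool->current->nodes[pool->current->used++];
}

/* Release every block at once; individual nodes are never freed. */
void pool_free_all(NeighborPool *pool)
{
    while (pool->current) {
        PoolBlock *prev = pool->current->prev;
        free(pool->current);
        pool->current = prev;
    }
}
```

Each malloc(sizeof(Neighbor)) in the loading loop would then become pool_alloc(&pool). Nodes handed out in sequence sit next to each other in memory, which also helps cache locality when the lists are walked later.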
Yes, standard library conversion functions are surprisingly slow.
If portability is not a problem, I'd memory-map the file. Then, something like the following C99 code (untested) could be used to parse the entire memory map:
#include <stdlib.h>
#include <errno.h>
struct pair {
unsigned long key;
unsigned long value;
};
typedef struct {
size_t size; /* Maximum number of items */
size_t used; /* Number of items used */
struct pair item[];
} items;
/* Initial number of items to allocate for */
#ifndef ITEM_ALLOC_SIZE
#define ITEM_ALLOC_SIZE 8388608
#endif
/* Adjustment to new size (parameter is old number of items) */
#ifndef ITEM_REALLOC_SIZE
#define ITEM_REALLOC_SIZE(from) (((from) | 1048575) + 1048577)
#endif
items *parse_items(const void *const data, const size_t length)
{
const unsigned char *ptr = (const unsigned char *)data;
const unsigned char *const end = (const unsigned char *)data + length;
items *result;
size_t size = ITEM_ALLOC_SIZE;
size_t used = 0;
unsigned long val1, val2;
result = malloc(sizeof (items) + size * sizeof (struct pair));
if (!result) {
errno = ENOMEM;
return NULL;
}
while (ptr < end) {
/* Skip newlines and whitespace. */
while (ptr < end && (*ptr == '\0' || *ptr == '\t' ||
*ptr == '\n' || *ptr == '\v' ||
*ptr == '\f' || *ptr == '\r' ||
*ptr == ' '))
ptr++;
/* End of data? */
if (ptr >= end)
break;
/* Parse first number. */
if (*ptr >= '0' && *ptr <= '9')
val1 = *(ptr++) - '0';
else {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
while (ptr < end && *ptr >= '0' && *ptr <= '9') {
const unsigned long old = val1;
val1 = 10UL * val1 + (*(ptr++) - '0');
if (val1 < old) {
free(result);
errno = EDOM; /* Overflow! */
return NULL;
}
}
/* Skip whitespace. */
while (ptr < end && (*ptr == '\t' || *ptr == '\v' ||
*ptr == '\f' || *ptr == ' '))
ptr++;
if (ptr >= end) {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
/* Parse second number. */
if (*ptr >= '0' && *ptr <= '9')
val2 = *(ptr++) - '0';
else {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
while (ptr < end && *ptr >= '0' && *ptr <= '9') {
const unsigned long old = val2;
val2 = 10UL * val2 + (*(ptr++) - '0');
if (val2 < old) {
free(result);
errno = EDOM; /* Overflow! */
return NULL;
}
}
if (ptr < end) {
/* Error unless whitespace or newline. */
if (*ptr != '\0' && *ptr != '\t' && *ptr != '\n' &&
*ptr != '\v' && *ptr != '\f' && *ptr != '\r' &&
*ptr != ' ') {
free(result);
errno = ECOMM; /* Bad data! */
return NULL;
}
/* Skip the rest of this line. */
while (ptr < end && *ptr != '\n' && *ptr != '\r')
ptr++;
}
/* Need to grow result? */
if (used >= size) {
items *const old = result;
size = ITEM_REALLOC_SIZE(used);
result = realloc(result, sizeof (items) + size * sizeof (struct pair));
if (!result) {
free(old);
errno = ENOMEM;
return NULL;
}
}
result->item[used].key = val1;
result->item[used].value = val2;
used++;
}
/* Note: we could reallocate result here,
* if memory use is an issue.
*/
result->size = size;
result->used = used;
errno = 0;
return result;
}
I've used a similar approach to load molecular data for visualization. Such data contains floating-point values, but precision is typically only about seven significant digits, no multiprecision math needed. A custom routine to parse such data beats the standard functions by at least an order of magnitude in speed.
At least the Linux kernel is pretty good at observing memory/file access patterns; using madvise() also helps.
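For completeness, obtaining the memory map itself might look like the sketch below on a POSIX system. It is a minimal illustration, not part of the parser above: the function name map_file is hypothetical, and the choice of MADV_SEQUENTIAL assumes the file is parsed once from start to end.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() under strict -std= modes on glibc */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and hint that it will be read front to back.
 * Returns NULL on failure or for an empty file; on success *length holds
 * the file size and the caller later calls munmap() on the mapping. */
const void *map_file(const char *path, size_t *length)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (ptr == MAP_FAILED)
        return NULL;

    /* Advisory only: a failure here does not affect correctness. */
    (void)madvise(ptr, (size_t)st.st_size, MADV_SEQUENTIAL);

    *length = (size_t)st.st_size;
    return ptr;
}
```

The returned pointer and length can be passed straight to parse_items(), which never needs a terminating NUL because it checks against the end pointer.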
If you cannot use a memory map, then the parsing function would be a bit different: it would append to an existing result, and if the final line in the buffer is partial, it would indicate so (and the number of chars not parsed), so that the caller can memmove() the buffer, read more data, and continue parsing. (Use 16-byte aligned addresses for reading new data, to maximize copy speeds. You don't necessarily need to move the unread data to the exact beginning of the buffer, you see; just keep the current position in the buffered data.)
Questions?
First, what's your disk hardware? A single SATA drive is likely to be topped out at 100 MB/sec. And probably more like 50-70 MB/sec. If you're already moving data off the drive(s) as fast as you can, all the software tuning you do is going to be wasted.
If your hardware CAN support reading faster? First, your read pattern - read the whole file into memory once - is the perfect use-case for direct IO. Open your file using open( "/file/name", O_RDONLY | O_DIRECT );. Read to page-aligned buffers (see man page for valloc()) in page-sized chunks. Using direct IO will cause your data to bypass double buffering in the kernel page cache, which is useless when you're reading that much data that fast and not re-reading the same data pages over and over.
If you're running on a true high-performance file system, you can read asynchronously and likely faster with lio_listio() or aio_read(). Or you can just use multiple threads to read, and use pread() so you don't have to waste time seeking, because when you read using multiple threads a seek on an open file affects all threads trying to read from the file.
And do not try to read fast into a newly-malloc'd chunk of memory - memset() it first. Because truly fast disk systems can pump data into the CPU faster than the virtual memory manager can create virtual pages for a process.

Very Slow Data Processing

Consider the following code that loads a dataset of records into a buffer and creates a Record object for each record. A record constitutes one or more columns and this information is uncovered at run-time. However, in this particular example, I have set the number of columns to 3.
typedef unsigned int uint;
typedef struct
{
uint *data;
} Record;
Record *createNewRecord (short num_cols);
int main(int argc, char *argv[])
{
time_t start_time, end_time;
int num_cols = 3;
char *relation;
FILE *stream;
int offset;
char *filename = "file.txt";
stream = fopen(filename, "r");
fseek(stream, 0, SEEK_END);
long fsize = ftell(stream);
fseek(stream, 0, SEEK_SET);
if(!(relation = (char*) malloc(sizeof(char) * (fsize + 1))))
printf((char*)"Could not allocate buffer");
fread(relation, sizeof(char), fsize, stream);
relation[fsize] = '\0';
fclose(stream);
char *start_ptr = relation;
char *end_ptr = (relation + fsize);
while (start_ptr < end_ptr)
{
Record *new_record = createNewRecord(num_cols);
for(short i = 0; i < num_cols; i++)
{
sscanf(start_ptr, " %u %n",
&(new_record->data[i]), &offset);
start_ptr += offset;
}
}
return 0;
}
Record *createNewRecord (short num_cols)
{
Record *r;
if(!(r = (Record *) malloc(sizeof(Record))) ||
!(r->data = (uint *) malloc(sizeof(uint) * num_cols)))
{
printf("Failed to create a new record\n");
}
return r;
}
This code is highly inefficient. My dataset contains around 31 million records (~1 GB) and this code processes only ~200 records per minute. The reason I load the dataset into a buffer is that I'll later have multiple threads process the records in this buffer, and hence I want to avoid file accesses. Moreover, I have 48 GB of RAM, so holding the dataset in memory should not be a problem. Any ideas on how to speed things up?
SOLUTION: the sscanf function was actually extremely slow and inefficient. When I switched to strtoul, the job finished in less than a minute. Malloc-ing ~ 3 million structs of type Record took only a few seconds.
I am confident that some non-numeric data is lurking in the file.
int offset;
...
sscanf(start_ptr, " %u %n", &(new_record->data[i]), &offset);
start_ptr += offset;
Notice that if the file begins with non-numeric input, offset is never set, and if it happened to hold the value 0, start_ptr += offset; would never advance.
If non-numeric data exists later in the file, like "3x", offset will get the value of 1 and cause the while loop to proceed slowly, since it will not get an updated value while the scan keeps failing.
Best to check results of fread(), ftell() and sscanf() for unexpected return values and act accordingly.
Further: long fsize may be too small a type for the file size. Look into using fgetpos() and fsetpos().
Note: to save processing time, consider using strtoul() as it is certainly faster than sscanf(" %u %n"). Again - check for errant results.
BTW: If the code needs to use sscanf(), use sscanf("%u%n"), which is a tad faster and, for your code, has the same functionality.
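A strtoul()-based replacement for the inner loop, with the checks recommended above, might be sketched like this. The function name parse_uints and its interface are hypothetical. One caveat of strtoul() itself: it accepts a leading sign, so "-5" is not rejected by this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse up to max unsigned integers, separated by whitespace, from the
 * NUL-terminated text. Returns the number of values stored and sets
 * *rest to the first unparsed non-whitespace character, so the caller
 * can tell a clean end of input (**rest == '\0') from bad data such
 * as "3x". */
size_t parse_uints(const char *text, unsigned long *out, size_t max,
                   const char **rest)
{
    size_t count = 0;
    const char *p = text;

    while (count < max) {
        char *end;
        errno = 0;
        unsigned long v = strtoul(p, &end, 10);
        if (end == p)             /* no digits: end of input or junk */
            break;
        if (errno == ERANGE)      /* value does not fit in unsigned long */
            break;
        out[count++] = v;
        p = end;                  /* always advances, so the loop cannot stall */
    }
    while (isspace((unsigned char)*p))
        p++;                      /* skip trailing whitespace before reporting */
    *rest = p;
    return count;
}
```

Unlike sscanf(), strtoul() stops at the first character that is not part of the number and does not scan the rest of the buffer, and the end pointer makes a stalled loop easy to detect.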
I'm not an optimization professional but I think some tips should help.
First of all, I suggest you make filename and num_cols macros, since I don't see you changing their values in the code; as literals they tend to be faster.
Second, using a struct to store only one member is generally not recommended, but if you want to use it with functions you should only pass pointers. Since you call malloc once to store the struct and again to store its only member, I suppose that is the reason it is so slow: you're using twice the memory you need. This might not be the case with some compilers, however. Practically, using a struct with only one member is pointless. If you want to ensure that the integer you get (in your case) is specifically a record, you can typedef it.
You should also make end_ptr and fsize const for some optimization.
Now, as for functionality, have a look at memory-mapped I/O.

Need help in C printf formatted output

I am displaying a partition table, and the table is displayed somewhat like:
Number  Device name     Partition type      Size in MB
------------------------------------------------------------
1       /dev/sda1       NTFS                       300
2       /dev/sda2      *Win95 FAT32                 99
3       /dev/sda3       Unknown                    128
4       /dev/sda4       NTFS                     19472
120     /dev/sda120     NTFS                      3000
*=Active partition
Now for displaying the above, we are using formatted output printf and the format string is
"%-6d=partition number %-25.25s=device name %c=active partition %-30.30s=part type %7Ld=size"
Now I want to display the same partition table with a slight modification, so that gaps in the partition slots are displayed as a range, like:
5-119 /dev/sda5.../dev/sda119 Empty 0
I am using the formatted string as:
%d-%-6d=partition range %s%d...%s%d=(/dev/sda5.../dev/sda119) %c %-30.30s %7Ld
but it does not help me.
What should be the correct format string? I am using a gcc compiler.
I think you need to use snprintf() to prepare the two composite strings, and then a simpler printf() to do the actual printing. Since you've not shown your actual code, we have to guess at everything, which is a nuisance...
int min = 5;
int max = 119;
char *dev = "/dev/sda";
char num_range[32];
char dev_range[60];
snprintf(num_range, sizeof(num_range), "%d-%d", min, max);
snprintf(dev_range, sizeof(dev_range), "%s%d...%s%d", dev, min, dev, max);
printf("%-10s %-50.50s %c%-30.30s %7d", num_range, dev_range, ' ', "Empty", 0);
You specified %-25.25s for a single device, so it isn't clear whether you should double that for the range, or you should use some other value (or even the same value); you'll need to tweak that part of the format string to suit yourself. This technique is also how I get a colon at the end of a name — format the name and the colon into a string, and then format that string into the final print operation.
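The colon trick mentioned above follows the same two-step pattern. A small sketch, with a hypothetical function name format_field and arbitrarily chosen column widths:

```c
#include <stdio.h>

/* Format "name:" left-justified in a 12-character column, followed by a
 * right-justified value. The colon is glued to the name first, so the
 * padding lands after the colon instead of between name and colon. */
int format_field(char *out, size_t out_size, const char *name, int value)
{
    char label[32];
    snprintf(label, sizeof label, "%s:", name);
    return snprintf(out, out_size, "%-12s%5d", label, value);
}
```

Formatting the composite piece into its own buffer first lets a single width specifier apply to the whole piece, which a one-pass format string such as "%s:%d" cannot express.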

Parsing a file in C

I need to parse a file and do some processing on it. The file is a text file, and the data is variable-length data of the form "PP1004181350D001002003..........". A timestamp follows each PP, so 1004181350 is 2010-04-18 13:50. Each D introduces a data point made of three separate values, each three digits long, so D001002003 has the three coordinates 001, 002 and 003.
Now I need to parse this data from a file, storing each timestamp in an array and the corresponding data in arrays that have as many rows as there are data points, with one value for each of the three coordinates. The end arrays might look like:
TimeStamp[1] = "135000", low[1] = "001", medium[1] = "002", high[1] = "003"
TimeStamp[2] = "135015", low[2] = "010", medium[2] = "012", high[2] = "013"
TimeStamp[3] = "135030", low[3] = "051", medium[3] = "052", high[3] = "043"
....
The question is how do I go about doing this in C? How do I go through this string looking for these patterns and storing the values in the corresponding arrays for further processing?
Note: the seconds value in the timestamp is added by us, as it is known that each data point comes 15 seconds after the previous one.
edit: updated to follow your specs.
While your file seems to be variable length, your data isn't, so you could use fscanf and do something like this:
while(fscanf(file,"PP%*6d%4d", &timestamp) == 1)
{
    for(int i = 0; fscanf(file, "D%3d%3d%3d", &low, &medium, &high) == 3; i++)
    {
        // hhmm -> hhmmss; assumes at most four data points per PP record,
        // so the seconds stay below 60
        int stamped = timestamp*100 + i*15;
        //Do something with variables (e.g. convert to string, push into vector, ...)
    }
}
Note that this reads the data into integers (timestamp, low, medium and high are ints). A string version looks like this (timestamp, low, medium and high are char arrays):
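// Assumes zero-initialized arrays: char timestamp[N][7], low[N][4], medium[N][4], high[N][4]
const char first[] = {'0', '1', '3', '4'}; // tens digit of the seconds: 00, 15, 30, 45
const char second[] = {'0', '5'};          // units digit of the seconds
char hhmm[4];
int n = 0;                                 // index of the current data point
while(fscanf(file,"PP%*6d%4c", hhmm) == 1)
{
    for(int i = 0; fscanf(file, "D%3c%3c%3c", low[n], medium[n], high[n]) == 3; i++, n++)
    {
        memcpy(timestamp[n], hhmm, 4);     // needs <string.h>
        timestamp[n][4]=first[i%4];
        timestamp[n][5]=second[i%2];
    }
}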
edit: some more explanation of the format string: %*6d means "read 6 digits and discard them" (* means: do not store the result in a variable). %4d and %4c consume the same four characters in this context (as 1 digit is one char), but here the result is saved in the corresponding variable.
As long as your patterns aren't variable length, you could simply use fscanf. If you need something more complex, you might try PCRE, but for this case I think sscanf will suffice.
I wouldn't recommend using fscanf directly on input data, because it is very sensitive to the input: if one byte is wrong and the data suddenly doesn't match the format specifier, you could in the worst case get a memory overwrite.
It is better to either read with fgetc and parse as the data comes in, or read into a buffer (fread) and process it from there.
Simply parsing? Here it is!
UPDATE: Check out KillianDS's code above. That's even better!
[STEP 1] Search for \n (or CR+LF).
[STEP 2] Starting from the first character on the line, you know the number of characters each data field occupies. Read that many characters from the file.
Use atoi() to convert the character data to int:
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
Repeat for all fields.
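The fixed-width steps above can be sketched without fscanf or atoi(), working on a record that has already been read into a buffer. This is an illustration under assumptions: one record per NUL-terminated string, in the "PP" + yymmddhhmm + repeated "D" + 9 digits layout from the question. The names fixed_digits, Sample and parse_record are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert exactly n decimal digits at s to an int.
 * Returns -1 if any of the n characters is not a digit; the scan stops
 * at the first bad character, so it never reads past a terminating NUL. */
int fixed_digits(const char *s, size_t n)
{
    int value = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isdigit((unsigned char)s[i]))
            return -1;
        value = 10 * value + (s[i] - '0');
    }
    return value;
}

typedef struct {
    int low, medium, high;
} Sample;

/* Parse one record: "PP", yymmddhhmm, then any number of "D" groups of
 * three 3-digit values. Stores hhmm in *hhmm and up to max samples in out.
 * Returns the number of samples, or -1 if the record is malformed. */
int parse_record(const char *line, int *hhmm, Sample *out, int max)
{
    if (line[0] != 'P' || line[1] != 'P')
        return -1;
    if (fixed_digits(line + 2, 6) < 0)       /* yymmdd, validated only */
        return -1;
    *hhmm = fixed_digits(line + 8, 4);
    if (*hhmm < 0)
        return -1;

    const char *p = line + 12;
    int count = 0;
    while (*p == 'D' && count < max) {
        if (fixed_digits(p + 1, 9) < 0)      /* validate all nine digits first */
            return -1;
        out[count].low = fixed_digits(p + 1, 3);
        out[count].medium = fixed_digits(p + 4, 3);
        out[count].high = fixed_digits(p + 7, 3);
        count++;
        p += 10;                             /* 'D' plus nine digits */
    }
    return count;
}
```

The seconds part of each timestamp can then be derived from the sample index, as described in the question (one data point every 15 seconds).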
