How to get the distinct count of a column using C [closed] - c

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I would like to get the distinct count of a column of a large data file using C. How can I do it? Please advise. Thanks. My sample data file is below.
For 2nd attribute the distinct count is 6.
399547,v4149,p3178,1990,2065,fraud
399940,v5852,p3194,8278,2180,fraud
399983,v3476,p3199,766,1125,fraud
400206,v3467,p3216,494,311000,fraud
400345,v4497,p3219,1211,432100,fraud
400471,v3473,p3225,41392,3710,fraud
400498,v3476,p3225,102,23820,fraud
401325,v4497,p3297,1322,1110,fraud

Make a search tree for every column. Let's say you have 10 rows in a file with 2 distinct values for the nth column viz. 3456 and 3457. Your search tree for nth column will look like:
For the sample file you'll end up with 6 search trees, one per column. Once you have read the entire file, traverse each search tree and count its nodes; that gives you the number of distinct values for that column.
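A minimal sketch of the search-tree idea for a single column, assuming the column values have already been split out into an array of strings. The names `struct node`, `tree_insert`, `tree_free` and `count_distinct_tree` are made up for illustration. Instead of traversing the tree at the end, this version counts a value at the moment it is inserted for the first time, which yields the same number.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node of an (unbalanced) binary search tree keyed by value. */
struct node {
    char *key;
    struct node *left, *right;
};

/* Insert key if absent. Returns 1 if a node was added, 0 if the key was
   already present, -1 on allocation failure. */
static int tree_insert(struct node **root, const char *key)
{
    while (*root != NULL) {
        int cmp = strcmp(key, (*root)->key);
        if (cmp == 0)
            return 0;
        root = (cmp < 0) ? &(*root)->left : &(*root)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = malloc(strlen(key) + 1);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->key, key);
    n->left = n->right = NULL;
    *root = n;
    return 1;
}

/* Release every node of the tree. */
static void tree_free(struct node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root->key);
    free(root);
}

/* Count the distinct strings among values[0..n-1]; -1 on allocation failure. */
long count_distinct_tree(const char *const *values, size_t n)
{
    struct node *root = NULL;
    long distinct = 0;
    for (size_t i = 0; i < n; ++i) {
        int added = tree_insert(&root, values[i]);
        if (added < 0) {
            tree_free(root);
            return -1;
        }
        distinct += added;
    }
    tree_free(root);
    return distinct;
}
```

Because the tree is not balanced, input that arrives already sorted degrades it to a linked list; a balanced tree or a hash table would avoid that for very large files.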

Read and split every line.
Put the second attributes into an array.
qsort the array
You now have an array with equal strings adjacent to each other. You can loop over the array and count the entries that differ from their predecessor.
The declaration below works if your entries are all 5 characters long; otherwise you must malloc() memory for each attribute.
char (*array)[6];
int i;
int n; /* number of lines read */
int distinct;
/* read the data file and put it into array */
/* qsort() array */
distinct = (n > 0) ? 1 : 0; /* an empty file has no distinct values */
for (i = 1; i < n; ++i) {
    if (strcmp(array[i], array[i - 1]) != 0)
        ++distinct;
}
printf("There are %d distinct values\n", distinct);
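The steps above can be sketched as two small functions. This is an illustration under stated assumptions, not a complete program: `FIELD_LEN`, `get_field` and `count_distinct_sorted` are hypothetical names, fields are assumed to be shorter than `FIELD_LEN`, and the lines are assumed to be plain comma-separated text with no quoting.

```c
#include <stdlib.h>
#include <string.h>

#define FIELD_LEN 32  /* assumed maximum field length, terminator included */

/* Copy the field with zero-based index col of a comma-separated line into
   out. Returns 0 on success, -1 if the line has too few fields or the
   field does not fit. */
int get_field(const char *line, int col, char out[FIELD_LEN])
{
    const char *start = line;
    for (int i = 0; i < col; ++i) {
        start = strchr(start, ',');
        if (start == NULL)
            return -1;
        ++start;                          /* skip the comma */
    }
    size_t len = strcspn(start, ",\r\n"); /* field ends at comma or line end */
    if (len >= FIELD_LEN)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* qsort() comparator: each element is a char[FIELD_LEN] holding a string. */
static int cmp_field(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Sort the n fields in place, then count runs of equal strings. */
size_t count_distinct_sorted(char (*array)[FIELD_LEN], size_t n)
{
    if (n == 0)
        return 0;
    qsort(array, n, sizeof array[0], cmp_field);
    size_t distinct = 1;
    for (size_t i = 1; i < n; ++i)
        if (strcmp(array[i], array[i - 1]) != 0)
            ++distinct;
    return distinct;
}
```

A caller would read each line with fgets(), pass it to `get_field(line, 1, array[i])` for the second attribute, and call `count_distinct_sorted(array, n)` once the file has been read.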

You can use std::map<std::string,int> - it will hold key-value pairs, where key is vNNNN, and value is number of repetitions.
First loop will scan input file and populate this map, then number of keys in map will be distinct count.
EDIT: If you cannot use C++ and do require C, you will have to find some hashmap library for C, like sparsehash.
If the amount of data is really big, it may not fit in memory. In that case, I would recommend using a temporary SQLite database to parse, store and index your data, and then running a standard SELECT DISTINCT on it.

Related

Copying 0 as 000 in C++? [closed]

Closed 9 years ago.
This question is ANSWERED. Nothing to do with formatters but instead my idiocy when it comes to copying to new buffers.
I'm hoping this is a one line answer. I have a snprintf() statement which is something like the following:
snprintf(buffer, sizeof(buffer), "%03d", 0U);
I'm expecting buffer to hold 000 but for some reason it only holds 00. Assume the buffer is plenty large enough to hold what I want. Am I being silly?
EDIT:
See below for complete code with context. I was trying to simplify it before as I didn't think all this context was necessary. The point still remains, using %04u gives me 000 in the first CSV row. %03u only gives me 00.
uint16_t CSVGenerator::write_csv_data(TestRecord* record)
{
    // Define the templates.
    const char *row_template = "%04u,%6.3E,%6.3E,%6.3E,%6.3E,%6.3E\n";
    char csv_row_buffer[CSV_ROW_BUFFER_SIZE];
    // Add the data.
    uint16_t row_count = 0U;
    for (uint16_t reading = 0U; reading < MEASURE_READING_COUNT; ++reading)
    {
        // Parse the row.
        snprintf(csv_row_buffer, sizeof(csv_row_buffer), row_template,
                 // Test ID
                 MEASURE_PERIOD_SECS * reading,
                 // Impedances Z1-Z5.
                 record->measurements[reading][0U],
                 record->measurements[reading][1U],
                 record->measurements[reading][2U],
                 record->measurements[reading][3U],
                 record->measurements[reading][4U]);
        // Add it to the main buffer, excluding the terminator.
        strncpy((m_csv_data_buffer + (reading * CSV_ROW_BUFFER_SIZE) - 1U),
                csv_row_buffer, (sizeof(csv_row_buffer) - 1U));
        // Increment the row count.
        ++row_count;
    } // for : each reading.
    return row_count;
}
How do you check that it contains only "000"? If you are reading it from (m_csv_data_buffer + (reading * CSV_ROW_BUFFER_SIZE)) you are actually losing the first byte since you've copied it to (m_csv_data_buffer + (reading * CSV_ROW_BUFFER_SIZE) - 1U) in your code.
strncpy handles null terminators implicitly so I'm guessing where you're subtracting 1 from the target buffer address you're actually putting the first character of your new row into the last byte of the previous row.
You seem to be using a combination of fixed buffer sizes and variable string lengths. This is what is the likely cause of what you're seeing.
Are you on a 16-bit machine? Maybe you declared buffer as a char*, so sizeof(buffer) evaluates to 2 and snprintf copies only the first two bytes of the actual output "000" (including the terminator character 0x00).
I suspect the problem resides on how you read back the content of your buffer, not the actual content, maybe the:
Some dummy code to keep buffer in scope while I check it..
is way more important than what you posted
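One way to avoid mixing fixed slot sizes with variable-length rows is to format each row directly at a running offset in the main buffer. The sketch below is written in C and is only an assumption about what the code is meant to do: `append_rows`, its parameters and the two-column row format are invented for illustration and are not the asker's API.

```c
#include <stdio.h>

/* Append `rows` CSV rows back to back into buf. Each call to snprintf()
   starts where the previous row's terminator was written, so the result is
   one contiguous string. Returns the total length written (terminator
   excluded), or -1 if the buffer is too small or formatting fails. */
int append_rows(char *buf, size_t buf_size, unsigned rows)
{
    size_t used = 0;
    for (unsigned r = 0; r < rows; ++r) {
        int len = snprintf(buf + used, buf_size - used,
                           "%03u,%u\n", r, r * 10u);
        if (len < 0 || (size_t)len >= buf_size - used)
            return -1;          /* output error, or the row was truncated */
        used += (size_t)len;    /* next row overwrites this terminator */
    }
    return (int)used;
}
```

Checking the return value of snprintf() is what makes this safe: it reports the length the row needed, so truncation is detected instead of silently dropping characters.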

Correct my thinking in this C exercise [closed]

Closed 9 years ago.
The exercise asks which of the numbers from 1 to 500 are equal to the sum of their own digits, each raised to the third power.
for example 1^3=1
and 371 makes 3^3+7^3+1^3 = 371
How I approached the problem:
I was thinking if I could have an array of strings with 500 slots, each slot containing a string converted number, then I could do math with each slot's string. If they met the criteria I would apply then that slot would be printed.
I tried the function sprintf without much success. In a loop it just initializes the strings (or is it arrays? after 3 hours I am confused) [0] slot, leaving all other slots unchanged.
I don't want you to solve the exercise, but rather to guide me with my logic. Please ask me to add the code of what I did if you want to.
Always start by clearly defining your algorithm, so you know what you are doing. Split it up into simple steps. Something like this:
For each integer i in the interval 1 to 500:
Check if the condition holds for this i
If it holds:
Print i
else:
Do nothing
Now you need to define "Check if the condition holds for this i". I would use some modulo and division arithmetic to extract the digits, but I leave the details to you.
Note that I have said nothing about C or any other programming language. Only when you know your algorithm should you start thinking about implementation.
(There is actually the possibility of a slightly different algorithm than the one given above, where you have one loop for each digit nested inside each other. That solution may be acceptable to you but it will not be as generic)
#include <stdio.h>

// cube function raises a number to its third power
int cube(int x)
{
    return x * x * x;
}

int main(void)
{
    for (int i = 1; i <= 500; i++)
    {
        // loop for checking each number i
        int sum = 0; // to store the sum of the cubes of the digits
        int n = i;   // copy of i
        // The while loop below does the task: it extracts a digit from the number and adds its cube to the sum.
        // The last digit of the number is its remainder when divided by 10, e.g. 35 % 10 = 5.
        // Once we have used this digit, shorten the number by dividing by 10, e.g. 35 / 10 becomes 3 (integer division).
        while (n > 0)
        {
            int rem = n % 10; // extract the last digit
            sum += cube(rem);
            n /= 10;          // remove the digit we just extracted
        }
        if (sum == i) // we got the number we wanted
            printf("%d\n", i);
    }
    return 0;
}
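The alternative mentioned earlier, with one nested loop per digit, might look like the sketch below. The function name `cube_digit_numbers` and its output-array interface are assumptions chosen so the result can be checked; the approach only works for numbers of at most three digits, which is why it is less generic.

```c
#include <stddef.h>

/* Store in out[] (capacity cap) every n with 1 <= n <= limit that equals
   the sum of the cubes of its digits. Only valid for limit <= 999, since
   the loops enumerate exactly three digits. Returns how many were stored. */
size_t cube_digit_numbers(int limit, int *out, size_t cap)
{
    size_t found = 0;
    for (int h = 0; h <= 9; ++h)            /* hundreds digit */
        for (int t = 0; t <= 9; ++t)        /* tens digit */
            for (int u = 0; u <= 9; ++u) {  /* units digit */
                int n = 100 * h + 10 * t + u;
                if (n < 1 || n > limit)
                    continue;
                if (h * h * h + t * t * t + u * u * u == n && found < cap)
                    out[found++] = n;
            }
    return found;
}
```

Leading zero digits contribute nothing to the sum, so one- and two-digit numbers are handled by the same loops.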

How to delete row number using C programing? [closed]

Closed 9 years ago.
I have a structure, and in the structure I have an array.
I read a text file and load it into the array inside the structure.
What I have is a list of first names, last names, and results.
So what is the best way to find a row number, select which row to delete, and delete it? Again, the array is inside a structure.
I know I can use memmove and realloc but how do I use these?
Well, all you can do is move the following elements towards the start, and decrease the "logical" length. The logical length is different from the physical length, which is the maximum number of elements the array can hold, based on how much memory has been allocated.
So, assuming an array starting at array and with count elements, code to delete the n-th element would be:
if( n < count - 1)
    memmove(array + n, array + n + 1, ((count - n) - 1) * sizeof *array);
--count;
This copies the following elements (unless you're deleting the very last one, in which case there's nothing to copy) and then decreases the logical length.
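Wrapped in a function with a bounds check, the same idea might look like this. The `struct record` layout and the name `delete_at` are assumptions for illustration, since the question does not show the real structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical row type; the real structure is not shown in the question. */
struct record {
    char name[32];
    int result;
};

/* Delete element n (zero-based) from an array holding *count records.
   Returns 0 on success, -1 if n is out of range. */
int delete_at(struct record *array, size_t *count, size_t n)
{
    if (n >= *count)
        return -1;
    /* Shift the tail down by one; moves zero bytes for the last element. */
    memmove(array + n, array + n + 1, (*count - n - 1) * sizeof *array);
    --*count;   /* logical length shrinks; allocated size is unchanged */
    return 0;
}
```

If the array was obtained from malloc(), the caller may afterwards shrink it with realloc(array, *count * sizeof *array), but that is optional: the logical length alone determines which rows are in use.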

Problems with arrays in C [closed]

Closed 10 years ago.
Given a shifted array (for example):
[17 34 190 1 4]
which shifted from (we don't know original)
[1 4 17 34 190]
What would be a good function to find the position of that number?
For example, if I pass 1, it would return position 3 (counting from 0).
A linear search for the answer would always work, but I believe you can get there in O(log n) time.
One approach is some sort of binary search for the shift point: check where a value of the shifted sorted array goes against the order it is supposed to follow. It is like creating a trie: keep forming the sorted tree until you find the "illegal" node (this glosses over a lot of details, I know). That tells you where the inflection point is, and you can now treat the array as 2 sorted vectors. Quickly compare the value to find against the max entry of each vector so you know which one to search. BSearch that sorted vector for your value and return its index.
The hard part is finding the inflection point. :)
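A common variant avoids locating the inflection point as a separate step: at every iteration of a binary search, at least one half of the current range is still sorted, so one comparison decides which half can contain the value. The sketch below uses that variant rather than the two-phase approach described above. It assumes an ascending array with distinct values that has been rotated; `find_in_rotated` is a made-up name.

```c
#include <stddef.h>

/* Return the zero-based index of match in a rotated ascending array of
   distinct ints, or arr_size if it is not present. Runs in O(log n). */
size_t find_in_rotated(const int *arr, size_t arr_size, int match)
{
    size_t lo = 0, hi = arr_size;          /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] == match)
            return mid;
        if (arr[lo] <= arr[mid]) {         /* left part [lo, mid] is sorted */
            if (arr[lo] <= match && match < arr[mid])
                hi = mid;                  /* value lies in the left part */
            else
                lo = mid + 1;
        } else {                           /* right part [mid, hi) is sorted */
            if (arr[mid] < match && match <= arr[hi - 1])
                lo = mid + 1;              /* value lies in the right part */
            else
                hi = mid;
        }
    }
    return arr_size;                       /* not found */
}
```

With duplicate values the test `arr[lo] <= arr[mid]` no longer identifies the sorted half reliably, and the worst case falls back to a linear scan.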
You would have to scan the array.
size_t pos_in_arr(int *arr, size_t arr_size, int match)
{
    size_t i;
    for (i = 0; i < arr_size; i++)
        if (arr[i] == match)
            break;
    return i;
}
This function would return the position as asked, or one more than the maximum position in case the element is not found.
The solution is what you ask, but it is probably not what you need, because it does not use in any way the fact that the array has been shifted. I suspect the original problem to be more complex.
For example if you knew that in the original array one element was fifth and now is seventh, and the element you are looking for was twenty-third, you could answer "twenty-fifth" without actually scanning the array up to the twenty-fifth position, which could be the point of the whole exercise. But to build such a solution, one would need to know more about the problem.

Finding min and max from data file [closed]

Closed 10 years ago.
I am trying to write a simple program that reads integers from a data file and outputs the minimum and maximum value. The first integer of the input file will indicate how many more integers will be read, and then the integers will be listed.
Example input file:
5 100 -25 42235 7 -1
As simple as this is, I'm not sure where to start. I imagine I will need to create an array whose size is the first integer, then increment my position in the array and compare each element to a min/max value.
I am not sure what the syntax would be to declare an array with the first integer, then fill the array with the remaining integers, or if that is even the right approach. After creating the array, I should have no problem assigning values to min/max variables with a loop increasing position through the array.
You don't need an array. Just keep track of the current minimum and maximum (they start as the first number you read). After you read each number, if it's lower than the minimum it becomes the new minimum, and if it's higher than the maximum it becomes the maximum.
There is no need to use an array to store data. You just need to find out minimum and maximum values from data received from a file.
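// needs #include <limits.h> for LONG_MAX and LONG_MIN
// start from the extremes so the first value read always replaces them
long lMin = LONG_MAX, lMax = LONG_MIN;
int nCount;
long lNo;
// open file
if( fscanf( file, "%d", &nCount ) != 1 )
    nCount = 0; // no count could be read
for( int i = 0; i < nCount; i++ )
{
    if( fscanf( file, "%ld", &lNo ) != 1 )
        break; // stop on missing or malformed input
    if( lNo > lMax )
        lMax = lNo;
    if( lNo < lMin )
        lMin = lNo;
}
printf(" Min = %ld Max = %ld\n", lMin, lMax );
// close file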
