Compare 2 arrays of string for matches in C optimization

I have a Perl script with 2 arrays, one holding keys and one holding longer strings.
I need to check whether the strings in the second array have matches in the keys array.
The number of records is huge, in the millions, so I use Inline::C to speed up the search; however, it still takes hours to process the records.
--Perl part
# %h contains {"AAAAA1" => 1, "BBBBBB" => 1, "BB1234" => 1, "C12345" => 1, ...}
my @k=sort keys %h;
# @k contains ["AAAAA1", "BBBBBB", "BB1234", "C12345", ...]
my @nn;
# @n contains ["AAAAA1999", "AAAAABBB134", "D123edae", "C12345CCSAER"]
# "AAAAA1" (from @k) can be found in "AAAAA1999" (in @n) = OK
foreach(@n) {
my $res=array_search(\@k,$_);
if($res) {
$y++;
} else {
$z++;
push @nn,$_;
}
}
--C part
int fastcmp ( char *p1, char *p2 ) {
while( *p1 ){
char *a = p1, *b = p2;
if (*b != *a) return 0;
++p1; ++b;
}
return 1;
}
int array_search(AV *a1, SV *s1){
STRLEN bytes1;
char *p1,*p2,*n;
long a1_size,i,c;
a1_size = av_len(a1);
p1 = SvPV(s1,bytes1);
for(i=start;i<=a1_size;++i){
SV** elem = av_fetch(a1, i, 0);
SV** elem_next = (i<a1_size-1)?av_fetch(a1, i+1, 0):elem;
p2 = SvPV_nolen (*elem);
n = SvPV_nolen (*elem_next);
if (p1[0] == p2[0]) {
if (fastcmp(p1,p2)>0) {
return i;
}
}
if ((p1[0] == p2[0]) && (p2[0] != n[0])) { return -1; }
}
return -1;
}
If somebody could help to optimize the search, that could be nice.
Thanks.
Note: added comments to help what is inside each variables.

The implementation you have fails in many ways:
Fails for @a=chr(0xE9); utf8::upgrade($x=$a[0]); array_search(\@a, $x);
Fails for "abc"=~/(.*)/; array_search(["abc"], $1);
Fails for array_search(["a\0b"], "a\0c");
It also incorrectly assumes the strings are NUL-terminated, which can lead to a segfault when they aren't.
Your approach scans @k for each element of @n, but if you build a trie (as the following code does), it can be scanned once.
my $alt = join '|', map quotemeta, keys %h;
my $re = qr/^(?:$alt)/;
my @nn = sort grep !/$re/, @n;
my $z = @nn;
my $y = @n - @nn;
For example, if there are 1,000 Ns and 1,000 Hs, your solution does up to 1,000,000 comparisons and mine does 1,000.
Note that 5.10+ is needed for the regex optimisation of alternations into a trie. Regexp::List can be used on older versions.
A proper C implementation will be a little faster because you can do a trie search using a function that does just that rather than using the regex engine.

Related

How to retrieve JsonbPair values of input JSONB object in PostgreSQL?

I am trying to write a PostgreSQL (11.2) server side function to read the key-value pairs of an input JSONB object. I did this (in print_kv_pair below) by trying to
extract the JsonPairs from the input jsonb object and
iterate through the keys and values and print them.
For example, for '{"a":1, "b": 2}', I expect it to print
k = "a", v = 1
k = "b", v = 2
However, the code outputs strange characters for the key, and the values (1 and 2) are not of a numeric type as I expect. Please see the sample output at the end of the question.
Can someone explain how to fix the code and correctly iterate through the key-value pairs?
PG_FUNCTION_INFO_V1(print_kv_pair);
Datum
print_kv_pair(PG_FUNCTION_ARGS)
{
//1. extracting JsonbValue
Jsonb *jb1 = PG_GETARG_JSONB_P(0);
JsonbIterator *it1;
JsonbValue v1;
JsonbIteratorToken r1;
JsonbParseState *state = NULL;
if (jb1 == NULL)
PG_RETURN_JSONB_P(jb1);
if (!JB_ROOT_IS_OBJECT(jb1))
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("Can only take objects")));
it1 = JsonbIteratorInit(&jb1->root);
r1 = JsonbIteratorNext(&it1, &v1, false);
if (r1 != WJB_BEGIN_OBJECT)
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("Iterator was not an object")));
JsonbValue *object = &v1;
Assert(object->type == jbvObject);
//2. iterating through key-value pairs
JsonbPair *ptr;
for (ptr = object->val.object.pairs;
ptr - object->val.object.pairs < object->val.object.nPairs; ptr++)
{
//problem lines!!!
char *buf = pnstrdup(ptr->key.val.string.val, ptr->key.val.string.len);
elog(NOTICE, "print_kv_pair(): k = %s", buf); //debug
if (ptr->value.type != jbvNumeric) {
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("value must be numeric")));
}
elog(NOTICE, "print_kv_pair(): v = %s", DatumGetCString(DirectFunctionCall1(numeric_out,
NumericGetDatum(ptr->value.val.numeric))) ); //debug
}
elog(NOTICE, "print_kv_pair(): ok4");
PG_RETURN_BOOL(true);
}
Sample output with problem line disabled:
=> select print_kv_pair('{"a":1.0, "b": 2.0}'::jsonb);
NOTICE: print_kv_pair(): k = $�K
ERROR: value must be numeric
It seems that part 1 (extracting the JsonbValue) isn't working properly, and the extracted value points to invalid memory.
(I'm not very familiar with JSONB or the server side PostgreSQL programming.) Any suggestion is appreciated.

reverse vowels in string in c

I'm a beginner programmer. I was trying out the problem of reversing vowels in a string.
Ex: input: zabedfigu, output: zubidfega
When I run the following code, I get a runtime error. I've tried changing the loop conditions, for example incrementing the pointer pc1 only up to the middle index, but that either gives me a runtime error or doesn't produce the required output. I'd like some help getting my code to work, as well as any other way of solving the problem. TIA.
#include<stdio.h>
char* reverseVowels(char* str)
{
char *pc1, *pc2;
int i;
pc1 = &str[0];
for(i=0; str[i]!='\0';++i)
;
pc2 = &str[i-1];
while(pc1!=pc2)
{
if((*pc1=='a')||(*pc1=='e')||(*pc1=='i')||(*pc1=='o')||(*pc1=='u'))
{
while(pc2!=pc1)
{
if((*pc2=='a')||(*pc2=='e')||(*pc2=='i')||(*pc2=='o')||(*pc2=='u'))
{
char temp;
temp = *pc1;
*pc1 = *pc2;
*pc2 = temp;
++pc2;
break;
}
else
++pc2;
}
++pc1;
}
else
++pc1;
}
//return str;
return NULL;
}
int main()
{
char string[20], *pstr;
scanf("%s", string);
//pstr = reverseVowels(string);
//printf("%s", pstr);
reverseVowels(string);
printf("%s", string);
return 0;
}
You have several answers and comments pointing out the fundamental flaw in your code — that you're incrementing pc2 instead of decrementing it. However, I think your algorithm is more complicated than need be. You could:
Subject at all times to pc1 < pc2:
If pc1 is not pointing at a vowel, increment it
If pc2 is not pointing at a vowel, decrement it
If the pointers are different, swap the vowels and adjust the pointers
With test code, and with the addition of an is_vowel() function which detects both upper-case and lower-case vowels, I ended up with:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
static inline int is_vowel(int c)
{
c = tolower(c);
return (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u');
}
static void reverse_vowels(char *string)
{
char *p1 = string;
char *p2 = p1 + strlen(p1); // '\0' is not a vowel
while (p1 < p2)
{
while (p1 < p2 && !is_vowel((unsigned char)*p1))
p1++;
while (p1 < p2 && !is_vowel((unsigned char)*p2))
p2--;
if (p1 != p2)
{
char c = *p1;
*p1++ = *p2;
*p2-- = c;
}
}
}
int main(void)
{
#ifdef INTERACTIVE
char line[1024];
while (fgets(line, sizeof(line), stdin) != NULL)
{
line[strcspn(line, "\n")] = '\0';
printf("Input: [%s]\n", line);
reverse_vowels(line);
printf("Output: [%s]\n", line);
}
#else
char strings[][40] =
{
"",
"a",
"b",
"ab",
"abe",
"abeci",
"nnnnummmmmmmmmmmmippppoqq",
"AbleWasIEreISawElba",
"A Man, A Plan, A Canal - Panama!"
};
enum { NUM_STRINGS = sizeof(strings) / sizeof(strings[0]) };
for (int i = 0; i < NUM_STRINGS; i++)
{
printf("Input: [%s]\n", strings[i]);
reverse_vowels(strings[i]);
printf("Output: [%s]\n", strings[i]);
}
#endif /* INTERACTIVE */
return 0;
}
You can compile it with -DINTERACTIVE to give you an interactive test, or by default it gives a fixed set of tests.
Default output:
Input: []
Output: []
Input: [a]
Output: [a]
Input: [b]
Output: [b]
Input: [ab]
Output: [ab]
Input: [abe]
Output: [eba]
Input: [abeci]
Output: [ibeca]
Input: [nnnnummmmmmmmmmmmippppoqq]
Output: [nnnnommmmmmmmmmmmippppuqq]
Input: [AbleWasIEreISawElba]
Output: [ablEWasIerEISawelbA]
Input: [A Man, A Plan, A Canal - Panama!]
Output: [a Man, a Plan, a CAnal - PAnamA!]
Sample interactive session (my program was called rv61):
$ rv61
Input: []
Output: []
a
Input: [a]
Output: [a]
b
Input: [b]
Output: [b]
ab
Input: [ab]
Output: [ab]
ae
Input: [ae]
Output: [ea]
abcde
Input: [abcde]
Output: [ebcda]
ablewasiereisawelba
Input: [ablewasiereisawelba]
Output: [ablewasiereisawelba]
palindromic nonsense
Input: [palindromic nonsense]
Output: [pelendromic nonsinsa]
vwlsmssng
Input: [vwlsmssng]
Output: [vwlsmssng]
AManAPlanACanal-Panama!
Input: [AManAPlanACanal-Panama!]
Output: [aManaPlanaCAnal-PAnamA!]
a big and complex sentence with multiple words of a number of lengths and so on
Input: [ a big and complex sentence with multiple words of a number of lengths and so on ]
Output: [ o bog and cemplox sentunca woth moltepli wurds if e nember ef longths and si an ]
$
Note that the testing tests a number of degenerate cases — an empty string, a string with no vowels, a string with one vowel, etc. The palindromic tests benefit from supporting mixed case — it's hard to spot that vowels have been swapped if they're all lower case and the text is a palindrome.
Another test that could be applied is to reverse the vowels twice; the output should be the same as the input. Conservation tests can be important. (If you had a sort which didn't preserve all the elements in the array but added random new ones and/or dropped initial ones, you wouldn't be happy. But that's a topic for another day.)
Having a simple test harness along the lines shown can be helpful for library functions. Many of my library functions have a #ifdef TEST … #endif at the end to allow them to be tested for sanity. The best tests verify that the result is what is expected; these ones are lazy and leave it to visual inspection to validate the output. If it was a library function, there'd be a header to declare the function which would be #included in the source, and the function would not be static. (My default compilation options require either a declaration of the function before it is defined, or the function must be static. I make functions static in sample code like this since there's no other file referencing the function, so there's no need for a header to declare the function, and only headers should declare externally visible functions.)
Note too that the is_vowel name is carefully chosen to avoid the reserved names in the C standard:
Function names that begin with either is or to, and a lowercase letter may be added to the declarations in the <ctype.h> header.
Using isVowel() would have been OK too; using isvowel() would be using a reserved name.
Bill Woodger commented:
Why the comment about '\0' not being a vowel? …
With the code as shown, after the initialization of p2, it is true that *p2 == '\0'. The observation that '\0' is not a vowel matters if the string is non-empty because, if it matched the is_vowel() predicate, the null byte could be moved to some point earlier in the string, truncating it.
Suppose the function was reverse_controls() instead of reverse_vowels() and the test used iscntrl() instead of is_vowel(). Then the code would have to handle it differently for a non-zero length string because the null byte would be reported as a control character and that would send things awry — it would be swapped with the first other control character (if there was another) in the string, truncating the string. That is not what's intended.
The problem here is that you are incrementing both pointers, the one at position 0 and the one at the end position. The first pointer should be incremented and the second one decremented, so instead of doing this:
++pc2;
You should do this:
--pc2;
The problem occurs because you increment the pointer pc2 instead of decrementing it, like this: --pc2.
Updated
According to cleblanc's comment, my previous answer was not working for an input like abade, so then I changed the code to fix that problem.
#include <stdio.h>
char* reverseVowels(char* str)
{
char *pc1, *pc2;
int i;
pc1 = &str[0];
for(i=0; str[i]!='\0';++i)
;
pc2 = &str[i-1];
while(pc1<pc2)
{
if((*pc1=='a')||(*pc1=='e')||(*pc1=='i')||(*pc1=='o')||(*pc1=='u'))
{
while(pc2!=pc1)
{
if((*pc2=='a')||(*pc2=='e')||(*pc2=='i')||(*pc2=='o')||(*pc2=='u'))
{
char temp;
temp = *pc1;
*pc1 = *pc2;
*pc2 = temp;
--pc2;
break;
}
else
--pc2;
}
++pc1;
}
else
++pc1;
}
//return str;
return NULL;
}
int main()
{
char string[20], *pstr;
scanf("%s", string);
//pstr = reverseVowels(string);
//printf("%s", pstr);
reverseVowels(string);
printf("%s\n", string);
return 0;
}

Dynamically enumerate keys in libconfig

In libconfig, is it possible to dynamically enumerate keys?
As an example, in this example config file from their repo - if someone invented more days in the hours section, could the code dynamically enumerate them and print them out?
Looking at the docs, I see lots of code to get a specific string, or list out an array, but I can't find an example where it enumerates the keys of a config section.
Edit
Received some downvotes, so thought I'd have another crack at being more specific.
I'd like to use libconfig to track some state in my application, read in the last known state when the app starts, and write it out again when it exits. My app stores things in a tree (of depth 2), so this could be nicely represented as an associative array in a libconfig-compatible file as below. The point is that the list of IDs (1234/4567) can change. I could track them in another array, but if I could just enumerate the 'keys' in the ids array below, that would be neater.
so
ids = {
"1234" = [1,2,3]
"4567" = [9,10,11,23]
}
e.g. (pseudocode)
foreach $key(config_get_keys_under(&configroot)){
config_get_String($key)
}
I can't see anything obvious in the header file.
You can use the config_setting_get_elem function to get the n-th element of a group, array or list, and then (if it's a group) use config_setting_name to get its name. But AFAIK a key name can't start with a digit. So consider the following config structure:
ids = (
{
key = "1234";
value = [1, 2, 3];
},
{
key = "4567";
value = [9, 10, 11, 23];
}
);
Then you can easily enumerate through all members of the ids getting the values you want using the following code:
#include <stdio.h>
#include <libconfig.h>
int main(int argc, char **argv) {
struct config_t cfg;
char *file = "config.cfg";
config_init(&cfg);
/* Load the file */
printf("loading [%s]...\n", file);
if (!config_read_file(&cfg, file)) {
printf("failed\n");
return 1;
}
config_setting_t *setting, *member, *array;
setting = config_lookup(&cfg, "ids");
if (setting == NULL) {
printf("no ids\n");
return 2;
}
int n = 0, k, v;
char const *str;
while (1) {
member = config_setting_get_elem(setting, n);
if (member == NULL) {
break;
}
printf("element %d\n", n);
if (config_setting_lookup_string(member, "key", &str)) {
printf(" key = %s\n", str);
}
array = config_setting_get_member(member, "value");
k = 0;
if (array) {
printf(" values = [ ");
while (1) {
if (config_setting_get_elem(array, k) == NULL) {
break;
}
v = config_setting_get_int_elem(array, k);
printf("%s%d", k == 0 ? "" : ", ", v);
++k;
}
printf(" ]\n");
}
++n;
}
printf("done\n");
/* Free the configuration */
config_destroy(&cfg);
return 0;
}

Convert an array into function parameters

I register functions at a global registry. A function can have multiple arguments. I can register and call them from the registry.
Here is one of my unit tests to understand the registry.
void *a_test_function_d(int a, char *b){
printf("*** c_test called\n");
isRunD = a;
testChar = b;
return NULL;
}
TEST(testWithMultibleArguments) {
isRunD = 0;
testChar = "";
add_command(a_test_function_d);
assertEquals(1, avl_tree_count(command_registry));
exec_command("a_test_function_d", 42, "test");
assertEquals(42, isRunD);
assertEquals("test", testChar);
avl_tree_free(command_registry);
command_registry = NULL;
}
This works fine for me so far. But here comes the part I can't find a nice solution for. From a line parser I get tokens. The first one should be the command; the following tokens are the arguments. If I had a fixed number of arguments, I wouldn't have any problem, but how can I construct a function or a macro that handles a variable number of tokens and passes them as arguments to a function?
This is what I have so far:
// split lines into tokens
char *token;
token = strtok(linebuffer," ");
if (token) {
if ( has_cammand(token) ) {
// HOW TO PUT ARGS from strtok(linebuffer," ") to FUNCTION....
exec_command(token /* , a1, a2, a3 */ );
} else {
uart_puts("Command not found.\n");
}
}
My line buffer is a char* and can look like:
find honigkuchen
set name peter
(coming from a user input interactive shell).
The prototypes of the functions would be:
void *find(char *);
void *set(char *, char *);
Of course I can define a macro and count __VA_ARGS__, or take the array and do an if-else on 1, 2, 3, 4, … parameters, but this seems a bit messy to me.
There must be a better way to convert an array to a parameter list.
Pass the array and the number of items in the array as arguments to the function under test. Is there some reason to complicate this further?
Keep in mind that an array passed to a function is really a pointer to the first item in the array.
So, if you have:
// Prototype for test function:
bool testFunction( char *items, int itemCount );
char items[10];
int itemCount = 0;
// Get items from where ever
items[0] = 'a';
items[1] = 'r';
items[2] = 'r';
items[3] = 'a';
items[4] = 'y';
itemCount = 5;
// Assume testFunction returns true if the test succeeds, else false
if( testFunction( items /*or &items[0] to make it more clear*/, itemCount ) )
puts( "Success!" );
else
puts( "Failure :(" );
Ask away if anything is unclear...

Find Verbs in a String

I am trying (and having trouble) to write a program (In C) that accepts a string in the command line (eg. $ test.out "This is a string") and looks through the string to find verbs (and nouns, but if I figure out verbs, I can do nouns on my own).
A list of alphabetically sorted verbs is given in the file lexicon.h, and is what I am supposed to use as my dictionary.
I know how to accept the string from the command line and use that input to create an array of strings, each string itself being a separate word, and I already have a working program that can do that, and that I hope to use part of for this one.
I am supposed to create a function called binary_search(...stuffgoeshere...) and use that to search through the lexicon file and find the verb.
I would like some suggestions or guidance on how to create a function (binary_search) that can check to see if an already separated word matches any on the list in lexicon.h. I do not want someone to just write an answer, I would like to know why you are suggesting what you do. Hopefully I can learn something fun out of this!
I know it's messy, but this is what I have so far.
Also note that lexicon's verb array has 637 values (as seen when I make int size = 637)
This program no longer compiles, as I have not yet figured out how to make the binary_search function work. I am trying to modify a binary search function used in an example for class; however, that one searched numbers in a text file, not strings of characters.
If there is anything else I should include, let me know. Thank you for your help!
#include <stdio.h>
#include <string.h>
#include "lexicon.h"
int binary_search(char word[], char verbs[][], int size);
int
main(int argc, char*argv[])
{
char word[80];
char str[80],
args[80][80];
int counter = 0,
a = 0,
i = 0,
index = 0,
t = 0;
while(str[a] != '\0')
{
if(str[a] == ' ')
{
args[index][i] = '\0';
i = 0;
a++;
index ++;
counter ++;
}
args[index][i++] = str[a++];
}
args[index][i] = '\0';
counter = counter + 1;
printf("\nThe verbs were: ");
int verbposition= -1;
int size = 637;
while(t<counter)
{
strcpy(word, args[t]);
verbposition = binary_search(word, verbs, size);
if(verbposition > -1)
printf("%s", args[t]);
t++;
}
return 0;
}
int
binary_search(char word[], char &verbs[][], int size)
{
int bottom = 0,
top = size - 1,
found = 0,
middle;
while(bottom <= top && !found)
{
middle = (bottom + top) / 2;
if(strcmp(word, verbs[middle]))
{
found = 1;
return = middle;
}
if(strcmp(word, verbs[middle]) > 0)
{
top = middle - 1;
}
else
bottom = middle + 1;
}
return -1;
}
You are on the right track. I would strongly suggest adding print statements, as they will give you a clear idea of where you are going wrong.
