I am practising with the PCRE regex library, matching regular expressions against a given data buffer. When there is a match, I have to store the matched string in my array/list. But when I print the list (using a for loop), the output is wrong. Here is how the logic works:
1. First I load the patterns. I have a function that does this and returns the patterns in an array.
2. Iterate over each pattern and search for a match in the data buffer. The PCRE library handles this.
3. If a match exists, store it in a list/array.
4. Print all the matches in a loop.
My sample code is:
code.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pcre.h>
#define NUMBER 3
#define MAX_GROUPS 200
char *data = "Jun 12 05 12:24:48 100.101.102.103 end";
char **load_patern()
{
char **regex_array = malloc (sizeof (char *) * NUMBER);
regex_array[0] = "(\\w+\\s\\d+\\s\\d+\\s\\d+:\\d+:\\d+)"; //pattern to match date time
regex_array[1] = "(\\d+.\\d+.\\d+.\\d+)"; //pattern for ip_adress
regex_array[2] = "(end)";//pattern for "end"
return regex_array;
}
int main()
{
char **patern_list = load_patern();
int t,i,size,rc,p,re_err_offset,re_vce[MAX_GROUPS];
char sBuffer[512];
char *pLasts = NULL;
pcre *re_compiled;
pcre_extra *re_extra;
const char *re_err_str;
char *token,*logg,*next,*match_str;
char **struct_list = malloc (sizeof (char *) * NUMBER);
for(t=0;t<3;t++) //3 is number of patterns, patern_list size
{
snprintf(sBuffer, sizeof(sBuffer), "%s",data);
pLasts = sBuffer;
re_compiled = pcre_compile(patern_list[t], 0, &re_err_str,
&re_err_offset, NULL);
re_extra = pcre_study(re_compiled, 0, &re_err_str);
next = pLasts;
size = strlen(pLasts);
rc = pcre_exec(re_compiled, re_extra, next, size, 0, 0, re_vce, MAX_GROUPS);
if(rc>0) //if match exists
{
next[re_vce[3]] = '\0';
match_str = next + re_vce[2];
struct_list[t] = match_str;
printf("data at [%d]:%s\n",t,struct_list[t]);//this prints correctly
}
}
//but,here am trying to print each match stored in struct_list[], but it fails to display correctly.
for(p=0; p<3; p++)
{
printf("loop_one: [%d]----%s\n",p,struct_list[p]);
}
return 0;
}
The output from the loop iterating over p should be:
loop_one: 0----Jun 12 05 12:24:48
loop_one: 1----100.101.102.103
loop_one: 2----end
Is there anything I am missing?
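One likely cause, sketched here under assumptions: every entry of struct_list points into the same sBuffer, which snprintf refills on each pass and which the '\0' write truncates, so entries stored earlier change afterwards. Giving each match its own copy avoids that. The helper name copy_match is hypothetical, and its start and end arguments stand in for the re_vce[2] and re_vce[3] pair that pcre_exec fills in.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of subject[start..end).
 * start and end play the role of re_vce[2] and re_vce[3].
 * The caller owns the result and must free() it. */
char *copy_match(const char *subject, int start, int end)
{
    size_t len;
    char *out;

    if (subject == NULL || start < 0 || end < start)
        return NULL;
    len = (size_t)(end - start);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, subject + start, len);   /* copy only the matched bytes */
    out[len] = '\0';                     /* terminate the copy, not the buffer */
    return out;
}
```

Inside the loop this would become struct_list[t] = copy_match(next, re_vce[2], re_vce[3]); with no '\0' written into the shared buffer. PCRE also provides pcre_get_substring for the same job. Separately, the unescaped dots in the IP pattern match any character, so "(\\d+\\.\\d+\\.\\d+\\.\\d+)" is probably what was intended.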
I am fairly new to C and have been trying my hand at some Arduino projects on Proteus. I recently tried implementing a keypad and LCD interface with Peter Fleury's libraries. So far the characters I input are displayed fine, but I run into a problem when trying to print to the serial port. The value of the keys seems to be concatenated with every iteration, so the output has extra characters like this:
The value before the comma is from the 'key' variable, the value after it the 'buf' variable:
151
(The 5 I input in the second iteration was added to the 1 from the first iteration and then put into the variable I print)
I figure it may be due to my missing or incorrect use of pointers. Here is my code:
#include <avr/io.h>
#include <util/delay.h>
#include <stdlib.h>
#include <stdio.h>
#include "lcd.h"
#include "mat_kbrd.h"
#include "funciones.h"
#include "menu.h"
char buf[256];
char* coma = ",";
int main(void)
{
pin_init();
serial_begin();
lcd_init(LCD_DISP_ON);
kbrd_init();
bienvenida();
while (1) {
int i = 0;
char key = 0;
//char *peso;
//int pesoSize = 1;
char peso[100];
//peso = calloc(pesoSize,sizeof(char));
int salida = 0;
lcd_clrscr();
desechos();
key = kbrd_read();
if (key != 0) {
lcd_gotoxy(0,3);
lcd_putc(key);
_delay_ms(2000);
lcd_clrscr();
cantidad();
while (salida != 1) {
char keypeso = 0;
keypeso = kbrd_read();
//pesoSize = i;
//peso = realloc(peso,pesoSize*sizeof(char));
if (keypeso != 0) {
if (keypeso == '+') {
salida = 1;
keypeso = *("");
lcd_clrscr();
calcularTotal(key,peso);
_delay_ms(2000);
} else {
lcd_gotoxy(i,1);
lcd_putc(keypeso);
snprintf(peso, sizeof peso, "%s%s",peso, &keypeso);
//strcat(peso,&keypeso);
i++;
_delay_ms(2000);
}
}
}
snprintf(buf, sizeof buf, "%s%s%s", &key,coma,peso);
serial_println_str(buf);
}
}
}
&key and &keypeso each point to a single char, but the %s format specifier expects a pointer to a NUL-terminated string, so snprintf keeps reading past that char. Use %c rather than %s for single characters, and pass the char itself, not its address.
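A minimal sketch of that fix, assuming the goal is to append one key at a time to a text buffer and then print "key,peso". The helper names append_char and format_line are hypothetical. The sketch also avoids passing peso as both destination and source to snprintf, which the C standard leaves undefined, and it assumes the buffer starts out as an empty string (peso is never initialised in the code above).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one character to the NUL-terminated
 * string in buf (capacity cap bytes). Returns 0 on success, -1 if full. */
int append_char(char *buf, size_t cap, char c)
{
    size_t len = strlen(buf);

    if (len + 1 >= cap)
        return -1;              /* no room for c plus the terminator */
    buf[len] = c;
    buf[len + 1] = '\0';
    return 0;
}

/* Hypothetical helper: format "<key>,<peso>", using %c for the single
 * char and %s for the string. Returns what snprintf returns. */
int format_line(char *out, size_t cap, char key, const char *peso)
{
    return snprintf(out, cap, "%c,%s", key, peso);
}
```

In the loop above, peso[0] = '\0'; would go before the inner while, append_char(peso, sizeof peso, keypeso) would replace the first snprintf call, and format_line(buf, sizeof buf, key, peso) the second.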
I'm using the LXLE 14.04 distribution of Linux. I want to write a C program to read commands, interpret and perform them. I'd like the program to be efficient, and I do not want to use a linked list. The commands are operations on sets. Each set can contain any of the values from 0 through 127 inclusive. I decided to represent a set as an array of characters containing 128 bits. If the bit at position pos is turned on, then the number pos is in the set, and if the bit at position pos is turned off, then the number pos is not in the set. For example, if the bit at position 4 is 1, then the number 4 is in the set, and if the bit at position 11 is 1, then the number 11 is in the set.
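The bit layout described above can be sketched as follows. The type name bitset and the functions set_clear, set_add and set_has are hypothetical, and the sketch uses unsigned char rather than char so that shifting and masking never involve a sign bit.

```c
#include <string.h>

#define SET_MAX 128                     /* values 0..127 */

typedef struct {
    unsigned char bits[SET_MAX / 8];    /* 16 bytes = 128 bits */
} bitset;

/* Clear every bit, giving the empty set. */
void set_clear(bitset *s)
{
    memset(s->bits, 0, sizeof s->bits);
}

/* Turn on the bit for value; returns -1 if value is out of range. */
int set_add(bitset *s, int value)
{
    if (value < 0 || value >= SET_MAX)
        return -1;
    /* value / 8 selects the byte, value % 8 the bit inside it */
    s->bits[value / 8] |= (unsigned char)(1u << (value % 8));
    return 0;
}

/* Return 1 if value is in the set, 0 otherwise. */
int set_has(const bitset *s, int value)
{
    if (value < 0 || value >= SET_MAX)
        return 0;
    return (s->bits[value / 8] >> (value % 8)) & 1;
}
```

The functions take a pointer so that the bits they change are still set after they return.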
The program should read commands and interpret them in a certain way. There are a few commands: read_set, print_set, union_set, intersect_set, sub_set and halt.
For example, the command read_set A,1,2,14,-1 in the terminal will cause the reading of values of the list into the specified set in the command. In this case the specified set in the command is A. The end of the list is represented by -1. So after writing this command, the set A will contain the elements 1,2,14.
This is what I have so far. Below is the file set.h
#include <stdio.h>
typedef struct
{
char array[16]; /*Takes 128 bits of storage*/
}set;
extern set A , B , C , D , E , F;
This is the file main.c
#include <stdio.h>
#include "set.h"
#include <string.h>
#include <stdlib.h>
set A , B , C , D , E , F; /*Variable definition*/
set sets[6];
/*Below I want to initialize sets so that set[0] = A set[1] = B etc*/
sets[0].array = A.array;
sets[1].array = B.array;
sets[2].array = C.array;
sets[3].array = D.array;
sets[4].array = E.array;
sets[5].array = F.array;
void read_set(set s,char all_command[])
{
int i, number = 0 , pos;
char* str_num = strtok(NULL,"A, ");
unsigned int flag = 1;
printf("I am in the function read_set right now\n");
while(str_num != NULL) /*without str_num != NULL get segmentation fault*/
{
number = atoi(str_num);
if(number == -1)
return;
printf("number%d ",number);
printf("str_num %c\n",*str_num);
i = number/8; /*Array index*/
pos = number%8; /*bit position*/
flag = flag << pos;
s.array[i] = s.array[i] | flag;
str_num = strtok(NULL, ", ");
if(s.array[i] & flag)
printf("Bit at position %d is turned on\n",pos);
else
printf("Bit at position %d is turned off\n",pos);
flag = 1;
}
}
typedef struct
{
char *command;
void (*func)(set,char*);
} entry;
entry chart[] = { {"read_set",&read_set} };
void (*getFunc(char *comm) ) (set,char*)
{
int i;
for(i=0; i<2; i++)
{
if( strcmp(chart[i].command,comm) == 0)
return chart[i].func;
}
return NULL;
}
int main()
{
#define PER_CMD 256
char all_comm[PER_CMD];
void (*ptr_one)(set,char*) = NULL;
char* comm; char* letter;
while( (strcmp(all_comm,"halt") != 0 ) & (all_comm != NULL))
{
printf("Please enter a command");
gets(all_comm);
comm = strtok(all_comm,", ");
ptr_one = getFunc(comm);
letter = strtok(NULL,",");
ptr_one(sets[*letter-'A'],all_comm);
all_comm[0] = '\0';
letter[0] = '\0';
}
return 0;
}
I defined a command structure called chart that has a command name and function pointer for each command. Then I have created an array of these
structures which can be matched within a loop.
In the main function, I've created a pointer called ptr_one. ptr_one holds the value of the proper function depending on the command entered by the user.
The problem is that, since the user decides which set to use, I need to represent the sets as some variable, so that different sets can be sent to the function ptr_one. I thought about
creating an array in main.c like so
set sets[6];
sets[0] = A;
sets[1] = B;
sets[2] = C;
sets[3] = D;
sets[4] = E;
sets[5] = F;
And then call the function ptr_one in the main function like this ptr_one(sets[*letter-'A'] , all_command).
That way, I convert my character into a set.
The problem is that while writing the above code I got the following compile error:
error: expected '=', ',', ';', 'asm' or '__attribute__' before '.' token
I also tried the following in the file main.c
sets[0].array = A.array;
sets[1].array = B.array;
sets[2].array = C.array;
sets[3].array = D.array;
sets[4].array = E.array;
sets[5].array = F.array;
But I got this compile error: expected '=', ',', ';', 'asm' or '__attribute__' before '.' token
I know similar questions have been asked, but they don't seem to help in my
specific case. I also tried set sets[6] = { {A.array},{B.array},{C.array},{D.array},{E.array},{F.array} }, but it did not compile.
What's my mistake, and how can I initialize sets so that it holds the sets A through F?
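Two things appear to go wrong here, and the sketch below is one possible arrangement rather than the only fix. Assignment statements such as sets[0] = A; are only allowed inside a function, which is what the compiler message is reporting, and an array member cannot be assigned with = at all. A table of pointers to the existing objects can be built with a file-scope initializer instead. The names set_table and lookup_set are hypothetical.

```c
#include <stddef.h>

typedef struct {
    char array[16];                 /* 128 bits of storage */
} set;

set A, B, C, D, E, F;               /* zero-initialised at file scope */

/* Addresses of file-scope objects are constant expressions,
 * so this initializer is valid outside any function. */
set *set_table[] = { &A, &B, &C, &D, &E, &F };

/* Hypothetical helper: map 'A'..'F' to the matching set, or NULL. */
set *lookup_set(char letter)
{
    if (letter < 'A' || letter > 'F')
        return NULL;
    return set_table[letter - 'A'];
}
```

With this table, read_set would take a set * rather than a set. As posted, the function receives a copy of the struct, so the bits it turns on are lost when it returns.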
I'm trying to get the values of XML tags in C using regexec, and I cannot use an XML parser.
Below is my sample code. Can someone help me get the expected output?
char value[500];
regex_t regexp_data;
regmatch_t matched_data[10];
char pattern_str[] = "<CODE[ \t]*^*>[ \t]*\\(.*\\)[ \t]*<\\/CODE[ \t]*>";
char msg_str[] = "<ROOT><INFO><CODE>5001</CODE><MSG>msg one</MSG></INFO> <INFO><CODE>5002</CODE><MSG>msg two</MSG></INFO></ROOT>";
if ((regcomp(&regexp_data, pattern_str, REG_NEWLINE) == 0) &&
(regexec(&regexp_data, msg_str, 10, matched_data, 0) == 0))
{
int i;
for (i=0; i < 10; ++i)
{
memset(value, '\0', sizeof(value));
memcpy(value, &msg_str[matched_data[i].rm_so], (matched_data[i].rm_eo - matched_data[i].rm_so));
printf ("value [%s]\n", value);
}
regfree(&regexp_data);
}
/*----------------------
Output
value [<CODE>5001</CODE><MSG>msg one</MSG></INFO><INFO><CODE>5002</CODE>]
value [5001</CODE><MSG>msg one</MSG></INFO><INFO><CODE>5002]
----------------------
Expected Output
value [5001]
value [5002]
----------------------*/
Per Wiktor's comment, .* is too greedy, so I updated the regex to "<CODE[ \t]*>\\s*([0-9]*)\\s*<\\/CODE[ \t]*>" and passed in the REG_EXTENDED flag to avoid having to escape the parentheses.
As for capturing multiple matches, you want to follow how the gist Wiktor linked captures multiple matches. In order to get every match, you have to call regexec on the string multiple times while advancing a pointer to the source string by the length of the entire match. The first array element in the array of matches is the entire match, while the subsequent elements are the captured groups. Since you only have one captured group, you only need to pass in a size of 2, not 10. Here's the full code I used:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <regex.h>
int main() {
    char value[500];
    regex_t regexp_data;
    regmatch_t matched_data[2];
    char pattern_str[] = "<CODE[ \t]*>\\s*([0-9]*)\\s*<\\/CODE[ \t]*>";
    char msg_str[] = "<ROOT><INFO><CODE>5001</CODE><MSG>msg one</MSG></INFO><INFO><CODE>5002</CODE><MSG>msg two</MSG></INFO></ROOT>";
    char *cursor = msg_str;
    if (regcomp(&regexp_data, pattern_str, REG_EXTENDED | REG_NEWLINE) != 0) {
        printf("Couldn't compile.\n");
        return 1;
    }
    while (regexec(&regexp_data, cursor, 2, matched_data, 0) != REG_NOMATCH) {
        memset(value, '\0', sizeof(value));
        memcpy(value, cursor + matched_data[1].rm_so, (matched_data[1].rm_eo - matched_data[1].rm_so));
        printf("value [%s]\n", value);
        cursor += matched_data[0].rm_eo;
    }
    regfree(&regexp_data);
}
Your regular expression is matching from the first instance of <CODE> to the last instance of </CODE>. To prevent this, you can replace the \\(.*\\) with \\([^<]*\\), so your regex becomes:
char pattern_str[] = "<CODE[ \t]*^*>[ \t]*\\([^<]*\\)[ \t]*<\\/CODE[ \t]*>";
I am using Dev-c++ IDE to compile my C (WIN32 API) programs.
I am using the regex library provided by http://gnuwin32.sourceforge.net/packages/regex.htm
I am using this documentation for reference and the same has been provided from the above site... http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html
Following is the Code:
#include <cstdlib>
#include <iostream>
#include <sys/types.h>
#include <regex.h>
#include <conio.h>
#include <stdio.h>
using namespace std;
int main(int argc, char *argv[])
{
int a;
regex_t re;
char str[128] = "onces sam lived with samle to win samile hehe sam hoho sam\0";
regmatch_t pm;
a = regcomp(&re,"sam", 0);
if(a!=0)
{
puts("Invalid Regex");
getch();
return 0;
}
a = regexec(&re, &str[0], 1, &pm, REG_EXTENDED);
printf("\n first match at %d",pm.rm_eo);
int cnt = 0;
while(a==0)
{
a = regexec(&re, &str[0] + pm.rm_eo, 1, &pm, 0);
printf("\n next match %d",pm.rm_eo);
cnt++;
if(cnt>6)break;
}
getch();
return EXIT_SUCCESS;
}
The while loop runs forever, displaying the end positions of the first and second matches over and over and never going any further.
I have used the cnt variable to count 6 turns and then break out of the loop to stop the infinite run.
The Output is:
first match at 9
next match 15
next match 9
next match 15
next match 9
next match 15
What am I missing here?
Try this instead:
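int cnt = 0;
int offset = 0;
/* REG_EXTENDED is a regcomp() flag, so pass 0 as the regexec() eflags */
a = regexec(&re, &str[0], 1, &pm, 0);
while (a == 0) {
    printf("\n %s match at %d", offset ? "next" : "first", offset + pm.rm_so);
    offset += pm.rm_eo;   /* rm_eo is relative to where this search started */
    cnt++;
    a = regexec(&re, &str[0] + offset, 1, &pm, 0);
}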
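You were not actually stepping through your string. pm.rm_eo is relative to the pointer passed to regexec, so each call has to start from the accumulated offset, not from &str[0] + pm.rm_eo. That is what caused the unending loop.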
I came up with this code, a slightly more compact version of @jxh's solution that also avoids the extra &str[0] lookup:
int cnt = 0;
int offset = 0;
/* REG_EXTENDED is a regcomp() flag, so pass 0 as the regexec() eflags */
while (!regexec(&re, str + offset, 1, &pm, 0)) {
    printf("%s match at %d\n", offset ? "next" : "first", offset + pm.rm_so);
    offset += pm.rm_eo;
    cnt++;
}
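For reference, the stepping logic from the answers above can be wrapped into a self-contained function. This is a sketch under assumptions, not the answerers' code: the name find_all is hypothetical, the pattern is compiled with REG_EXTENDED (at regcomp time, where that flag belongs), and REG_NOTBOL is passed to regexec on later calls so that ^ does not match in the middle of the string. It also steps over empty matches, which would otherwise stall the loop.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: store the start offset of each match of pattern
 * in text into starts[] (at most max entries). Returns the number of
 * matches stored, or -1 if the pattern does not compile. */
int find_all(const char *pattern, const char *text, int *starts, int max)
{
    regex_t re;
    regmatch_t pm;
    int count = 0;
    size_t offset = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (count < max &&
           regexec(&re, text + offset, 1, &pm, offset ? REG_NOTBOL : 0) == 0) {
        /* rm_so and rm_eo are relative to text + offset */
        starts[count++] = (int)(offset + (size_t)pm.rm_so);
        if (pm.rm_eo == pm.rm_so) {
            /* Empty match: step one character so the loop advances */
            if (text[offset + (size_t)pm.rm_eo] == '\0')
                break;
            offset += (size_t)pm.rm_eo + 1;
        } else {
            offset += (size_t)pm.rm_eo;
        }
    }
    regfree(&re);
    return count;
}
```

Called with the pattern "sam" and the string from the question, it should report five matches, the first starting at offset 6.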
I have a definite set of strings and its corresponding numbers:
kill -> 1
live -> 2
half_kill -> 3
dont_live -> 4
The list has 30 such strings and their number mappings.
If the user enters "kill", I need to return 1, and if they enter "dont_live", I need to return 4.
How should I achieve this in a C program? I am looking for an efficient solution because this operation needs to be done hundreds of times.
Should I put them in #defines in my .h file?
Thanks in advance.
Sort your table, and use the standard library function bsearch to perform a binary search.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct entry {
    const char *str;
    int n;
};
/* sorted according to str */
struct entry dict[] = {
    { "dont_live", 4 },
    { "half_kill", 3 },
    { "kill", 1 },
    { "live", 2 },
};
int compare(const void *s1, const void *s2)
{
    const struct entry *e1 = s1;
    const struct entry *e2 = s2;
    return strcmp(e1->str, e2->str);
}
int
main (int argc, char *argv[])
{
    struct entry *result, key = { NULL, 0 };
    /* bsearch would call strcmp on a NULL key without this check */
    if (argc < 2) {
        fprintf(stderr, "usage: %s keyword\n", argv[0]);
        return 1;
    }
    key.str = argv[1];
    result = bsearch(&key, dict, sizeof(dict)/sizeof(dict[0]),
                     sizeof dict[0], compare);
    if (result)
        printf("%d\n", result->n);
    return 0;
}
Here's what you get when you run the program.
$ ./a.out kill
1
$ ./a.out half_kill
3
$ ./a.out foo
<no output>
PS: I reused portions of sidyll's program. My answer should now be CC BY-SA compliant :p
A possible solution:
#include <stdio.h>
#include <string.h>
struct entry {
    const char *str;
    int n;
};
/* the { 0, 0 } entry marks the end of the table */
struct entry dict[] = {
    { "kill", 1 },
    { "live", 2 },
    { "half_kill", 3 },
    { "dont_live", 4 },
    { 0, 0 }
};
int
number_for_key(const char *key)
{
    int i = 0;
    const char *name = dict[i].str;
    while (name) {
        if (strcmp(name, key) == 0)
            return dict[i].n;
        name = dict[++i].str;
    }
    return 0;
}
int
main (int argc, char *argv[])
{
    char s[100];
    printf("enter your keyword: ");
    /* %99s keeps scanf from writing past the end of s */
    if (scanf("%99s", s) != 1)
        return 1;
    printf("the number is: %d\n", number_for_key(s));
    return 0;
}
Here's one approach:
int get_index(char *s)
{
    static const char mapping[] = "\1.kill\2.live\3.half_kill\4.dont_live";
    char buf[sizeof mapping];
    const char *p;
    snprintf(buf, sizeof buf, ".%s", s);
    p = strstr(mapping, buf);
    return p ? p[-1] : 0;
}
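The . mess is to work around kill being a substring of half_kill. Without that issue you could simply search for the string directly. Note that this also matches any prefix of a key (for example "ki" returns 1), so it only suits input already known to be a complete key.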
If it is a very short list of strings, then a simple block of ifs will be more than sufficient:
if (0 == strcmp(value, "kill")) {
return 1;
}
if (0 == strcmp(value, "live")) {
return 2;
}
...
If the number approaches 10, though, I would begin to profile my application and consider a map-style structure.
If you have a fixed set of strings, you have two options: generate a perfect hash function (check gperf or cmph) or create a trie, so that you never have to check a character more than once.
Compilers usually use perfect hashes to recognize language keywords. In your case I would probably go with the trie; it should be the fastest way (but nothing beats direct measurement!).
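A minimal sketch of the trie option, assuming keys contain only lowercase letters and underscores as in the question. All names here (trie_node, trie_insert, trie_lookup) are hypothetical, the node pool is a fixed array sized for the four sample keys, and the c - 'a' mapping assumes an ASCII-like character set.

```c
#define ALPHABET 27                 /* 'a'..'z' plus '_' */
#define MAX_NODES 64                /* enough for the four sample keys */

struct trie_node {
    int child[ALPHABET];            /* 0 means "no child"; node 0 is the root */
    int value;                      /* 0 means "no key ends here" */
};

static struct trie_node nodes[MAX_NODES];
static int node_count = 1;          /* node 0 is the root */

/* Map a character to a child slot, or -1 if it is outside the alphabet. */
static int slot(char c)
{
    if (c >= 'a' && c <= 'z')
        return c - 'a';
    if (c == '_')
        return 26;
    return -1;
}

/* Insert key with a non-zero value; returns -1 on bad input or a full pool. */
int trie_insert(const char *key, int value)
{
    int cur = 0;
    for (; *key; key++) {
        int s = slot(*key);
        if (s < 0)
            return -1;
        if (nodes[cur].child[s] == 0) {
            if (node_count >= MAX_NODES)
                return -1;
            nodes[cur].child[s] = node_count++;
        }
        cur = nodes[cur].child[s];
    }
    nodes[cur].value = value;
    return 0;
}

/* Return the value stored for key, or 0 if key is not present.
 * Each input character is examined exactly once. */
int trie_lookup(const char *key)
{
    int cur = 0;
    for (; *key; key++) {
        int s = slot(*key);
        if (s < 0 || nodes[cur].child[s] == 0)
            return 0;
        cur = nodes[cur].child[s];
    }
    return nodes[cur].value;
}
```

After inserting the four sample keys once at start-up, trie_lookup("dont_live") would return 4, and a prefix such as "ki" returns 0 because no key ends at that node.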
Is it really a bottleneck? You should worry about efficiency only if the simple solution proves to be too slow.
Having said that, possible speed improvements are checking the lengths first:
If it's 4 characters then it could be "kill" or "live"
If it's 9 characters then it could be "half_kill" or "dont_live"
or checking the first character in a switch statement:
switch (string[0]) {
case 'k':
if (strcmp(string, "kill") == 0)
return 1;
return 0;
case 'l':
...
default:
return 0;
}
Use a hashmap/hash table; I think this would be the best solution.
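Standard C has no built-in hash table, so here is a small sketch of one, assuming a fixed table with linear probing is acceptable for about 30 keys. The names map_put and map_get are hypothetical, the hash is the well-known djb2 function, and keys are stored by pointer rather than copied.

```c
#include <string.h>

#define TABLE_SIZE 64               /* power of two, well above 30 keys */

struct map_entry {
    const char *key;                /* NULL marks an empty slot */
    int value;
};

static struct map_entry table[TABLE_SIZE];

/* djb2 string hash */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert or update key. The key is not copied, so it must outlive the
 * table (string literals do). Returns -1 if the table is full. */
int map_put(const char *key, int value)
{
    unsigned long i = hash_str(key) & (TABLE_SIZE - 1);
    int probes;
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i].key == NULL || strcmp(table[i].key, key) == 0) {
            table[i].key = key;
            table[i].value = value;
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1);     /* linear probing */
    }
    return -1;
}

/* Return the value for key, or 0 if it is absent. */
int map_get(const char *key)
{
    unsigned long i = hash_str(key) & (TABLE_SIZE - 1);
    int probes;
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i].key == NULL)
            return 0;
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

The table would be filled once with map_put("kill", 1) and so on, after which each lookup costs one hash plus, typically, one strcmp.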
Can you use an enum?
#include <stdio.h>
int main(void) {
    enum outcome { kill=1, live, half_kill, dont_live };
    printf("%i\n", kill); //1
    printf("%i\n", dont_live); //4
    printf("%i\n", half_kill); //3
    printf("%i\n", live); //2
    return 0;
}
Create a list of const values:
const int kill = 1;
const int live = 2;
const int half_kill = 3;
etc