How can I parse this unknown array notation? What language is this array notation from?

I am accessing data from an old database and several of the records have this array notation which defines the criteria for a real estate search. There are several hundred and I need to convert them to data I can use in JS and PHP.
Here is an example of the array notation. I haven't been able to find any other questions asking about this format.
a:14:{s:2:"id";s:22:"our-listings-metrolist";s:3:"map";a:4:{s:8:"latitude";s:17:"38.93309311783631";s:9:"longitude";s:19:"-120.74187943878752";s:4:"zoom";s:1:"8";s:4:"open";s:1:"0";}s:4:"feed";s:15:"ncarmetrolistca";s:6:"panels";a:2:{s:9:"office_id";a:3:{s:7:"display";s:1:"1";s:9:"collapsed";s:1:"0";s:6:"hidden";s:1:"0";}s:4:"type";a:3:{s:7:"display";s:1:"1";s:9:"collapsed";s:1:"0";s:6:"hidden";s:1:"0";}}s:9:"office_id";s:5:"01PHA";s:11:"search_type";s:0:"";s:3:"idx";s:15:"ncarmetrolistca";s:14:"search_subtype";s:0:"";s:10:"snippet_id";s:22:"our-listings-metrolist";s:13:"snippet_title";s:42:"Our Sacramento / Sierra Foothills Listings";s:10:"page_limit";s:1:"6";s:7:"sort_by";s:17:"DESC-ListingPrice";s:4:"view";s:4:"grid";s:12:"price_ranges";s:4:"true";}
It's not hard to understand, and I will write my own parser if I have to, but I'm hoping I don't need to. a defines an array, s defines a string, and i defines an integer. The integer after each type character gives the length of the array or string (for an integer, it is the value itself), and then comes the value at that position, which is either a key or a value.
What kind of notation is this? Is there some way I can parse it quickly into a format that can be used in JS and PHP, or do I need to build my own parser?

That's PHP's serialization format, the output of serialize(); here it encodes an array.
For instance:
$obj = ['a'=>1, 'b'=>true, 'c'=>'foo'];
echo serialize($obj); /* prints: a:3:{s:1:"a";i:1;s:1:"b";b:1;s:1:"c";s:3:"foo";} */
To unserialize, just use the unserialize() function. To use the data in JS, pass the result to json_encode().
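If PHP itself is not available where the records are read, the format is simple enough to walk by hand. The following is only a minimal sketch in C, not a replacement for unserialize(): it turns one serialized value into JSON text and assumes well-formed input, only the types N, b, i, d, s and a, and strings without control characters. The names Out, put, emit_string and convert are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Fixed-size output buffer (hypothetical helper type). */
typedef struct { char *buf; size_t cap, len; } Out;

/* Append n bytes and keep the buffer null-terminated; -1 if it does not fit. */
static int put(Out *o, const char *s, size_t n) {
    if (o->len + n + 1 > o->cap) return -1;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
    return 0;
}

/* Emit a JSON string, escaping only double quotes and backslashes. */
static int emit_string(Out *o, const char *s, size_t n) {
    if (put(o, "\"", 1)) return -1;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] == '"' || s[i] == '\\') && put(o, "\\", 1)) return -1;
        if (put(o, s + i, 1)) return -1;
    }
    return put(o, "\"", 1);
}

/* Convert the value at *p to JSON and advance *p past it.
   as_key is nonzero when the value is an array key, which JSON needs quoted.
   Returns 0 on success, -1 on malformed input or a full buffer. */
static int convert(const char **p, Out *o, int as_key) {
    const char *s = *p;
    char *end;
    switch (*s) {
    case 'N':                                   /* N; */
        if (s[1] != ';') return -1;
        *p = s + 2;
        return put(o, "null", 4);
    case 'b': case 'i': case 'd': {             /* b:1;  i:42;  d:0.5; */
        if (s[1] != ':') return -1;
        const char *v = s + 2;
        const char *semi = strchr(v, ';');
        if (!semi || semi == v) return -1;
        *p = semi + 1;
        if (*s == 'b') return *v == '1' ? put(o, "true", 4) : put(o, "false", 5);
        if (as_key) return emit_string(o, v, (size_t)(semi - v));
        return put(o, v, (size_t)(semi - v));   /* INF/NAN would not be valid JSON */
    }
    case 's': {                                 /* s:LEN:"bytes"; */
        if (s[1] != ':') return -1;
        unsigned long n = strtoul(s + 2, &end, 10);
        if (end == s + 2 || end[0] != ':' || end[1] != '"') return -1;
        const char *v = end + 2;
        if (strlen(v) < n + 2 || v[n] != '"' || v[n + 1] != ';') return -1;
        *p = v + n + 2;
        return emit_string(o, v, n);
    }
    case 'a': {                                 /* a:COUNT:{key value key value ...} */
        if (s[1] != ':') return -1;
        unsigned long n = strtoul(s + 2, &end, 10);
        if (end == s + 2 || end[0] != ':' || end[1] != '{') return -1;
        *p = end + 2;
        if (put(o, "{", 1)) return -1;
        for (unsigned long i = 0; i < n; i++) {
            if (i && put(o, ",", 1)) return -1;
            if (convert(p, o, 1) || put(o, ":", 1) || convert(p, o, 0)) return -1;
        }
        if (**p != '}') return -1;
        (*p)++;
        return put(o, "}", 1);
    }
    default:
        return -1;
    }
}
```

Every PHP array becomes a JSON object here, including arrays with purely numeric keys; whether that is acceptable depends on how the data is consumed.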

Related

Perl structure flow to C

I've started working on a program which is in Perl and has to be transformed into C.
Many of its subroutines access structure members in a way that is unfamiliar to me, because I have little to no knowledge of Perl syntax and structure flow.
Example:
$ref->{$struct2[$value]->{field1}}->{struct_insideStruct2}->{$ref2->{field}}
$ref is a third structure
$ref2 is a local copy of a parameter which is of type struct1
My question is: How do you create a line like this in C?
Do I need to create nested multiple structures?
I need to understand how multiple access operators in Perl work and whether I can create something similar in C. Thanks in advance.
I recommend not trying to translate directly between languages, as this is likely to result in clumsy and unnatural code. That would certainly be the case here, as discussed further down. The best I can do for this question is to explain what the expression does:
$ref -> { $struct2[$value]->{field1} }
-> { struct_insideStruct2 }
-> { $ref2->{field} }
The $ref is a reference to a hash (associative array); it's OK to think of it as a pointer to a hash. One can tell because the -> ("arrow") operator dereferences, and the {...} on its right means that on its left there must be a hash reference; this returns a value that it points to.
In this case, the key with which it is dereferenced (the index into the associative array) involves an element of the array @struct2 at index $value; that element is another hash reference, being dereferenced (indexed into) with a key field1 (string literal†).
What this returns is another hash reference, which is then indexed into (dereferenced) with the key struct_insideStruct2 (string), and this again returns a hash reference.
That last one is indexed with a key which itself is produced by dereferencing another hash reference, $ref2, with a key field (string).
This is an example of a Perl complex data structure. How do you like it? I don't, not very much. Even in Perl, ideally I'd like to see this rewritten as a class, as it goes too deep and wide and so it packs too much complexity without any supporting structure which a class can provide.
If you still really wish to do that kind of thing in C, you can. You may want to find a good hash implementation (or use structs and nest them carefully), and probably dust off your function pointer syntax and such. But I would recommend not getting into all that.
Instead, once you understand the deep-nested data structure explained above, and the data it represents, find a way to implement what it means and does in your code in a native C way. We always want to use logic, techniques, and idioms native to the language at hand.
Along with linked documentation also see the short and sweet perlintro. The full reference for Perl's references is perlref.
† Normally such "barewords" need to be quoted, 'string' (or using "", or the q() or qq() operators ...). But if the bareword is the sole thing between {} then the quoting may be omitted.
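To make the "native C way" a little more concrete, here is one possible sketch, under heavy assumptions. It treats the two computed keys (the results of $struct2[$value]->{field1} and $ref2->{field}) as strings looked up by linear search, and turns the fixed key struct_insideStruct2 into an ordinary struct member. All type and function names (Leaf, Inner, Entry, Outer, find_entry, find_leaf, lookup) and the int payload are hypothetical; the real layout depends on what the Perl data actually holds.

```c
#include <stddef.h>
#include <string.h>

/* All names are hypothetical; the real layout depends on the Perl data. */
struct Leaf  { const char *key; int value; };             /* innermost hash entry     */
struct Inner { const struct Leaf *leaves; size_t n; };    /* struct_insideStruct2     */
struct Entry { const char *key; struct Inner inside; };   /* one entry of the outer hash */
struct Outer { const struct Entry *entries; size_t n; };  /* what $ref points to      */

/* Linear search stands in for a hash lookup; fine for small tables. */
static const struct Entry *find_entry(const struct Outer *o, const char *key) {
    for (size_t i = 0; i < o->n; i++)
        if (strcmp(o->entries[i].key, key) == 0) return &o->entries[i];
    return NULL;
}

static const struct Leaf *find_leaf(const struct Inner *in, const char *key) {
    for (size_t i = 0; i < in->n; i++)
        if (strcmp(in->leaves[i].key, key) == 0) return &in->leaves[i];
    return NULL;
}

/* Rough counterpart of $ref->{$k1}->{struct_insideStruct2}->{$k2}.
   Returns NULL where Perl would yield undef. */
static const int *lookup(const struct Outer *ref, const char *k1, const char *k2) {
    const struct Entry *e = find_entry(ref, k1);
    if (!e) return NULL;
    const struct Leaf *l = find_leaf(&e->inside, k2);
    return l ? &l->value : NULL;
}
```

A call would look like lookup(ref, struct2[value].field1, ref2->field), assuming those fields are C strings. If the set of keys is known at compile time, plain struct members or enum-indexed arrays would be simpler still.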

splitting JSON string using regex

I want to split a JSON document which has a pattern like [[[1,2],[3,4][5,6]]] using regex. The pairs represent x and y. What I want to do is take this string and produce a list {"1,2", "3,4", "5,6"}. Eventually I want to split the pairs: I was thinking I could make the list {"1,2", "3,4", "5,6"} and use a for loop to split each pair. Is this approach correct to get the x and y separately?
JSON is not a regular language but a context-free language, and as such it cannot be matched by a regular expression. You need a full JSON parser like the ones referenced in the comments to your question.
... but if you are going to have a fixed structure, such as exactly three levels of square brackets with the layout you posted in your question, then there is a regexp that can parse it (it describes a subset of the JSON grammar, not general enough to parse other JSON content):
You'll have numbers: ([+-]?[0-9]+)
Then you'll have brackets and separators: \[\[\[, ,, \],\[ and \]\]\]
and finally, put all this together:
\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]
and if you want to permit spaces between symbols, then you need:
\s*\[\s*\[\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*\]\s*\]\s*
This regexp has six capture groups that match the corresponding integers in the input string, as the linked demo shows.
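As a rough illustration of how the six groups might be read out in code, here is a sketch using the POSIX <regex.h> API (an assumption: it is not available on every platform). It uses the whitespace-free pattern with [0-9] rather than \d, since POSIX extended syntax has no \d. Two deliberate deviations from the pattern above: ^ and $ anchors are added so that the whole string must match, and the closing brackets are left unescaped, which POSIX treats as literal characters. The function name parse_three_pairs is made up for the example.

```c
#include <regex.h>
#include <stdlib.h>

/* Fills xy[0..5] with x1,y1,x2,y2,x3,y3.
   Returns 0 if the text matches the fixed three-pair layout, -1 otherwise. */
static int parse_three_pairs(const char *text, long xy[6]) {
    static const char *pattern =
        "^\\[\\[\\[([+-]?[0-9]+),([+-]?[0-9]+)],"
        "\\[([+-]?[0-9]+),([+-]?[0-9]+)],"
        "\\[([+-]?[0-9]+),([+-]?[0-9]+)]]]$";
    regex_t re;
    regmatch_t m[7];              /* m[0] is the whole match, m[1..6] the groups */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    int rc = regexec(&re, text, 7, m, 0);
    regfree(&re);
    if (rc != 0) return -1;

    for (int i = 0; i < 6; i++)   /* strtol stops at the first non-digit */
        xy[i] = strtol(text + m[i + 1].rm_so, NULL, 10);
    return 0;
}
```

Note that the string exactly as written in the question, [[[1,2],[3,4][5,6]]], lacks a comma between the second and third pair, so this pattern rejects it.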
Clarification
Regular languages, and regular grammars, and regular expressions form a class of languages with many practical properties, for example:
You can parse them efficiently in one pass with what is called a finite automaton.
You can define the automaton that accepts the sentences of the language simply with a regular expression.
You can combine regexps (or automata) to build acceptors for more complex languages (the union, intersection, symmetric difference, concatenation, etc. of language sets).
You can decide whether the language one regular expression defines is a subset of, a superset of, or neither with respect to the language of another.
By contrast, it limits the power of languages that can be defined with it:
you cannot define languages that allow nesting of subexpressions (like the bracketing you allow in JSON expressions or the tag nesting allowed in XML documents)
you cannot define languages which collect context and use it in another place of the sentence (for example, sentences that identify a number and have to match that same number in another place of the sentence)
But the point of my answer is that, if you bound the upper limit of nesting (let's say, for example, to three levels of brackets, like the example you posted), you can make your language regular and then parse it with a regular expression. It is not easy to do that, because this often leads to complex expressions (as you have seen in my answer), but it is not impossible, and you gain the possibility of identifying parts of the sentence as submatches of the regular subexpressions embedded in the global one.
If you want to allow nesting, you need to switch to context-free languages, which are defined with context-free grammars and are accepted by a more complex stack-based automaton. Then you lose some of the operations you had:
You can no longer decide, in general, whether one language is included in another.
You can still take the union of context-free languages, but their intersection or difference is not, in general, context free.
But you will be able to match unbounded nested sentences. Normally, programming languages are defined with a context free grammar and a little more work for context checking (for example, to check if some identifier being used is actually defined in the declaration section or to match the starting and ending tag identifiers at matching levels in an XML document)
For context free languages, see this.
For regular languages, see this.
Second clarification
As you didn't say in your question that you wanted to match real (decimal) numbers, I have modified the demo to allow fixed-point numbers (not general floating point with exponential notation; you'll need to work that out yourself, as an exercise). Just run some tests and modify the regexp to adapt it to your needs.
(well, if you want to see the solution, look at it)
Yeah, I tried using the regex in my code but it is not working, so I am trying a different approach now. I have an idea of how to approach it, but it is not really working. First off, let me be more clear on the question. What I am trying to do is parse a JSON document, like the image below. The file has strings with the pattern [[[1,2],[3,4][5,6]]]. What I am trying to get out of this is each pair as a list, so that the list holds x-y pairs.
the string structure
My approach: first remove the "[[" and "]]" at the beginning and at the end, so I have a string with the same pattern throughout, which gives me the string "[1,2],[3,4][5,6]". This is my code but it is not working. How do I fix it? The other thing I thought could be an issue is that the strings are not all the same length. So how do I replace just the beginning and the ending?
my code
Then I can use a regex split method to get a list of the form {"1,2", "3,4", "5,6"}. I am not really sure how to do this though.
Then I take the x and the y and add them to the list, so I get a list of x-y pairs. I would appreciate it if you could show me how to do this.
This is the approach I am working on, but if there is a better way of doing it I will be glad to see it.
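For the narrower goal of just collecting the x-y pairs, one option that avoids both regexes and bracket stripping is to scan the string for integers and group them two by two. The sketch below does exactly that and nothing more: it assumes integer coordinates, ignores the bracket structure entirely (so it does not validate the input), and uses the made-up names Pair and extract_pairs.

```c
#include <ctype.h>
#include <stdlib.h>

typedef struct { long x, y; } Pair;

/* Collects integers in order of appearance and groups them into pairs.
   Writes at most max pairs to out and returns how many were written.
   A trailing unpaired integer is silently dropped. */
static size_t extract_pairs(const char *s, Pair *out, size_t max) {
    size_t n = 0;
    int have_x = 0;
    long x = 0;

    while (*s && n < max) {
        int starts_number =
            isdigit((unsigned char)*s) ||
            ((*s == '-' || *s == '+') && isdigit((unsigned char)s[1]));
        if (starts_number) {
            char *end;
            long v = strtol(s, &end, 10);   /* end points just past the digits */
            if (have_x) {
                out[n].x = x;
                out[n].y = v;
                n++;
                have_x = 0;
            } else {
                x = v;
                have_x = 1;
            }
            s = end;
        } else {
            s++;                            /* skip brackets, commas, spaces */
        }
    }
    return n;
}
```

Because the brackets are skipped rather than parsed, the varying string lengths and the number of pairs do not matter. For anything beyond this fixed shape, a real JSON parser remains the safer choice.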

Access array elements from string argument in Modelica

I have a task in Modelica where, within a function, I want to read out values of a record (parameters) according to a given string argument, similar to the dictionary type in Python.
For example, I have a record containing coefficients for different media and I want to read out the coefficients for methane, so my argument is the string "Methane".
Until now I have solved this by adding a second array to my coefficients record that stores the names of the media as strings. I search this array in a for loop to match the requested medium name and then access the coefficients array using the index found.
This is obviously very complicated and leads to a lot of confusing code and nested for loops. Isn't there a more convenient way like the one Python presents with its dictionary type, where a string is directly linked to a value?
Thanks for the help!
There are several different alternatives you can use. I will add the pattern I like most:
model M
  function index
    input String[:] keys;
    input String key;
    output Integer i;
  algorithm
    i := Modelica.Math.BooleanVectors.firstTrueIndex({k == key for k in keys});
  end index;
  constant String[3] keys = {"A","B","C"};
  Real[size(keys,1)] values = {1,2*time,3};
  Real c = values[index(keys,"B")] "Coefficient";
  annotation(uses(Modelica(version="3.2.1")));
end M;
I like this code because a Modelica compiler can make it efficient. You create a keys vector and a corresponding data vector. The reason it is not a record is that you want the keys vector to be constant, while the values may vary over time (for a more generic dictionary than you wanted).
The compiler can then create a constant index for any constant names you want to look up from this. This makes sorting and matching better in the compiler (since there are no unknown indexes). If there is a key you want to look up at run-time, the code will work for this as well.

How to convert a PGresult to custom data type with libpq (PostgreSQL)

I'm using the libpq library in C to access my PostgreSQL database. When I do res = PQexec(conn, "SELECT point FROM test_point3d"); I don't know how to convert the PGresult I get to my custom data type.
I know I can use the PQgetvalue function, but again I don't know how to convert the returned string to my custom data type.
The best way to think about this is that data types interact with applications over a textual interface. Libpq returns a string from just about anything. The programmer has a responsibility to parse the string and create a data type from it. I know the author has probably abandoned the question, but I am working on something similar and it is worth documenting a few important tricks here that are helpful in some cases.
Obviously if this is a C language type, with its own in and out representation, then you will have to parse the string the way you would normally.
However for arrays and tuples, the notation is basically
[open_type_identifier][csv_string][close_type_identifier]
For example a tuple may be represented as:
(35,65,1111111,f,f,2011-10-06,"2011-10-07 13:11:24.324195",186,chris,f,,,,f)
This makes it easy to parse. You can generally use existing CSV processors once you strip off the first and last character. Moreover, consider:
select row('test', 'testing, inc', array['test', 'testing, inc']);
row
-------------------------------------------------
(test,"testing, inc","{test,""testing, inc""}")
(1 row)
As this shows you have standard CSV escaping inside nested attributes, so you can, in fact, determine that the third attribute is an array, and then (having undoubled the quotes), parse it as an array. In this way nested data structures can be processed in a manner roughly similar to what you might expect with a format like JSON. The trick though is that it is nested CSV.
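A minimal sketch of that "strip the delimiters, then split as CSV" idea for the tuple case follows. It is a plain string routine with no libpq dependency; in a real program the input would be the string returned by PQgetvalue(res, row, col). It assumes the text form shown above: fields separated by commas, optional double quotes around a field, a doubled quote or a backslash escape inside quotes, and a completely empty unquoted field meaning SQL NULL. The name split_row and its buffer-based interface are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Splits "(f1,f2,...)" into fields. Unquoted, unescaped field text is copied
   into buf; fields[i] points into buf, or is NULL for an SQL NULL field.
   Returns the number of fields, or -1 on malformed input or lack of space. */
static int split_row(const char *row, char *buf, size_t cap,
                     const char *fields[], int max_fields) {
    size_t len = strlen(row), used = 0;
    int n = 0;

    if (len < 2 || row[0] != '(' || row[len - 1] != ')') return -1;
    const char *p = row + 1, *end = row + len - 1;  /* text between the parentheses */

    for (;;) {
        if (n == max_fields) return -1;
        if (p == end || *p == ',') {
            fields[n++] = NULL;                     /* empty unquoted field: NULL */
        } else {
            size_t start = used;
            int quoted = 0;
            while (p < end && (quoted || *p != ',')) {
                if (*p == '"') {
                    if (quoted && p[1] == '"') {
                        p++;                        /* "" inside quotes: one literal quote */
                    } else {
                        quoted = !quoted;           /* opening or closing quote */
                        p++;
                        continue;
                    }
                } else if (*p == '\\' && p + 1 < end) {
                    p++;                            /* backslash escapes the next character */
                }
                if (used + 2 > cap) return -1;      /* room for this char and a terminator */
                buf[used++] = *p++;
            }
            if (quoted || used >= cap) return -1;
            buf[used++] = '\0';
            fields[n++] = buf + start;
        }
        if (p == end) return n;
        p++;                                        /* skip the comma */
    }
}
```

Applied to the row shown above, the third field comes back as {test,"testing, inc"}, which can then be handed to a similar routine that understands the array braces.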

How can i store data in C in a tabular format?

I want to store data in C in tabular format. I am having difficulty working out how to represent the following. Can someone help?
For example:
I want to store the following entries; what would be the ideal way of storing them in C?
    IP Address   Domain Name
1.) 10.1.1.2     www.yahoo.com
2.) 20.1.1.3     www.google.com
Should I use structures? For example:
struct table
{
    unsigned char ip_address;
    char domain_name[20];
};
If not, please clarify?
You are probably mixing two different questions:
How to organize data in your program (in memory): this is the part about using structures.
How to serialize data, that is, to store it in external storage such as a file. This is the part about "tabular" format, which implies text with fields delimited by tabs.
If IP and domain often come together in your program then it is reasonable to use a structure (or a class in C++) for that. Regarding your example, I do not know the restrictions on domain name length, but "20" would definitely be insufficient. I'd suggest using dynamically allocated strings here. For storing an IPv4 address you may use a 32-bit unsigned int; a char is insufficient. Do you intend to support IPv6 also? Then you need 128 bits for the address.
In C (and C++) there is no built-in serialization facility like the one in virtually every dynamic (or "managed") language such as C#, Java or Python. So defining a structure does not automatically give you methods for writing/reading your data. You should use a library for serialization or write your own code for reading/writing your data.
The method of storage depends at least partially on what you're going to do with the information. If it's simply to read it in and then print it out again, you could process it strictly as text.
However, network programs often make use of this type of data. See the structures in the system header files netinet/in.h, arpa/inet.h, and sys/socket.h, or see the man page for inet_aton().
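Putting the 32-bit integer and dynamically allocated string suggestions together with the socket headers mentioned above, a row might be sketched like this. It assumes a POSIX system and uses inet_pton(), a close relative of inet_aton() that reports malformed input more strictly. The names host_entry and host_entry_init are made up for the example, and IPv4 only is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

struct host_entry {
    uint32_t ip;      /* IPv4 address in host byte order */
    char *domain;     /* dynamically allocated, owned by the entry */
};

/* Parses dotted-decimal text and copies the domain name.
   Returns 0 on success, -1 on a malformed address or allocation failure.
   The caller releases e->domain with free(). */
static int host_entry_init(struct host_entry *e,
                           const char *ip_text, const char *domain) {
    struct in_addr addr;
    if (inet_pton(AF_INET, ip_text, &addr) != 1) return -1;

    size_t n = strlen(domain) + 1;
    e->domain = malloc(n);
    if (!e->domain) return -1;
    memcpy(e->domain, domain, n);

    e->ip = ntohl(addr.s_addr);   /* s_addr is in network byte order */
    return 0;
}
```

Storing the address as an integer makes comparison and sorting cheap; inet_ntop() can turn it back into text for display.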
Structures are the way to go. Use sufficiently sized arrays. An IPv4 address in dotted-decimal text takes up to 16 chars including the terminating null, and domain names take a maximum of 255 chars.
struct table
{
    char ip_addr[16];
    char domain_name[255];
};
Unfortunately I cannot make comments. But with respect to Amarghosh's answer, this problem would be perfectly solved using fixed-length arrays for the fields, since both sets of data (if the domain is top-level only) are of limited length: 15 characters for the IP address (assuming IPv4), and a 63-character ASCII limit per label for domain names.
There are two issues in representing tabular data:
1. Representing a row
2. Representing many rows.
In your example, the row can be represented by:
struct Table_Record
{
    unsigned char ip_address[4];
    char domain_name[MAX_DOMAIN_LENGTH];
};
I've decided to use a fixed field length for the domain name. This will make processing simpler.
The next question is how to structure the rows. This is a decision you will have to make. The simplest suggestion is to use an array. However, an array has a fixed size and needs to be reallocated if there are more entries than it can hold:
struct Table_Record table[MAX_ROWS];
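If the number of rows is not known in advance, the reallocation can be wrapped in a small growable table. This is only a sketch: the value chosen for MAX_DOMAIN_LENGTH, the struct Table wrapper and the functions table_add and table_find are assumptions made for the example, and the lookup is a plain linear search.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_DOMAIN_LENGTH 254   /* assumption: 253 characters plus terminator */

struct Table_Record {
    unsigned char ip_address[4];
    char domain_name[MAX_DOMAIN_LENGTH];
};

struct Table {
    struct Table_Record *rows;  /* heap array, grown on demand */
    size_t count, capacity;
};

/* Appends a row, doubling the capacity when the array is full.
   Returns 0 on success, -1 if the name is too long or memory runs out. */
static int table_add(struct Table *t, const unsigned char ip[4], const char *domain) {
    if (strlen(domain) >= MAX_DOMAIN_LENGTH) return -1;
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? t->capacity * 2 : 8;
        struct Table_Record *rows = realloc(t->rows, cap * sizeof *rows);
        if (!rows) return -1;
        t->rows = rows;
        t->capacity = cap;
    }
    struct Table_Record *r = &t->rows[t->count++];
    memcpy(r->ip_address, ip, 4);
    strcpy(r->domain_name, domain);   /* length was checked above */
    return 0;
}

/* Linear search by IP address; returns the domain name or NULL. */
static const char *table_find(const struct Table *t, const unsigned char ip[4]) {
    for (size_t i = 0; i < t->count; i++)
        if (memcmp(t->rows[i].ip_address, ip, 4) == 0)
            return t->rows[i].domain_name;
    return NULL;
}
```

A table starts out zero-initialized (struct Table t = {0};) and is released with free(t.rows).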
Another data structure for the table is a linked list (single or double, your choice). Unfortunately, the C language does not provide a list data structure, so you will have to either write your own or obtain a library.
Alternative useful data structures are maps (associative arrays) and trees (although many maps are implemented using trees). A map would allow you to retrieve the value for a given key. If the key is the IP address, the map would return the domain name.
If you are going to read and write this data using files, I suggest using a database rather than writing your own. Many people recommend SQLite.
