I am accessing data from an old database and several of the records have this array notation which defines the criteria for a real estate search. There are several hundred and I need to convert them to data I can use in JS and PHP.
Here is an example of the array notation. I haven't been able to find any other questions asking about this format.
a:14:{s:2:"id";s:22:"our-listings-metrolist";s:3:"map";a:4:{s:8:"latitude";s:17:"38.93309311783631";s:9:"longitude";s:19:"-120.74187943878752";s:4:"zoom";s:1:"8";s:4:"open";s:1:"0";}s:4:"feed";s:15:"ncarmetrolistca";s:6:"panels";a:2:{s:9:"office_id";a:3:{s:7:"display";s:1:"1";s:9:"collapsed";s:1:"0";s:6:"hidden";s:1:"0";}s:4:"type";a:3:{s:7:"display";s:1:"1";s:9:"collapsed";s:1:"0";s:6:"hidden";s:1:"0";}}s:9:"office_id";s:5:"01PHA";s:11:"search_type";s:0:"";s:3:"idx";s:15:"ncarmetrolistca";s:14:"search_subtype";s:0:"";s:10:"snippet_id";s:22:"our-listings-metrolist";s:13:"snippet_title";s:42:"Our Sacramento / Sierra Foothills Listings";s:10:"page_limit";s:1:"6";s:7:"sort_by";s:17:"DESC-ListingPrice";s:4:"view";s:4:"grid";s:12:"price_ranges";s:4:"true";}
It's not hard to understand, and I will write my own parser if I have to, but I'm hoping I don't need to. a defines an array, s defines a string, and i defines an integer. The number after each type character gives the length of the array or string, and the value at that position is then either a key or a value.
What kind of notation is this? Is there some way I can parse it quickly into a format that can be used in JS and PHP, or do I need to build my own parser?
That's PHP's native serialization format, i.e. the output of serialize() applied to an array or object.
For instance:
$obj = ['a'=>1, 'b'=>true, 'c'=>'foo'];
echo serialize($obj); /* prints: a:3:{s:1:"a";i:1;s:1:"b";b:1;s:1:"c";s:3:"foo";} */
To unserialize, just use the unserialize() function.
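Since the goal is data you can use in both JS and PHP, here is a minimal sketch of the conversion (the $raw variable stands in for one of your stored records): unserialize() back to a PHP array, then json_encode() so JavaScript can consume it.
$raw = 'a:3:{s:1:"a";i:1;s:1:"b";b:1;s:1:"c";s:3:"foo";}'; // one serialized record from the old database
$data = unserialize($raw); // back to a native PHP array
echo json_encode($data); /* prints: {"a":1,"b":true,"c":"foo"} */
Looping that over your several hundred records gives you JSON you can hand straight to JavaScript.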
I want to split a JSON document which has a pattern like [[[1,2],[3,4][5,6]]] using regex. The pairs represent x and y. What I want to do is take this string and produce a list like {"1,2", "3,4", "5,6"}. Eventually I want to split the pairs. I was thinking I could build the list {"1,2", "3,4", "5,6"} and use a for loop to split the pairs. Is this approach correct for getting x and y separately?
JSON is not a regular language but a context-free language, and as such it cannot be matched by a regular expression. You need a full JSON parser like the ones referenced in the comments to your question.
... but, if you are going to have a fixed structure, like only three levels of square brackets, and exactly the layout you posted in your question, then there is a regexp that can parse it (it would cover a subset of the JSON grammar, not general enough to parse other JSON content):
You'll have numbers: ([+-]?[0-9]+)
Then you'll have brackets and separators: \[\[\[, ,, \],\[ and \]\]\]
and finally, put all this together:
\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]
and if you want to permit spaces between symbols, then you need:
\s*\[\s*\[\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*\]\s*\]\s*
This regexp has six capturing groups that will match the corresponding integers in the matching string, as the following demo shows.
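For instance, a quick sketch of how those groups come out using PHP's preg_match (any PCRE-compatible engine behaves the same; note this assumes a comma between every pair, as the regexp requires):
$input = '[[[1,2],[3,4],[5,6]]]';
$pattern = '/\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]/';
if (preg_match($pattern, $input, $m)) {
    $pairs = [[$m[1], $m[2]], [$m[3], $m[4]], [$m[5], $m[6]]]; /* $m[1]..$m[6] are the six captured integers, grouped into x/y pairs */
    print_r($pairs);
}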
Clarification
Regular languages (the ones defined by regular grammars or regular expressions) form a class of languages with many practical properties, for example:
You can parse them efficiently in one pass with what is called a finite automaton
You can define the automaton to accept language sentences simply with a regular expression.
You can easily combine regexps (or automata) to build acceptors for more complex languages (the union, intersection, symmetric difference, concatenation, etc. of language sets).
You can easily decide whether the language defined by one regular expression is a subset, a superset, or neither of the language defined by another.
By contrast, this class limits the power of the languages you can define with it:
You cannot define languages that allow nesting of subexpressions (like the bracketing allowed in JSON expressions or the tag nesting allowed in XML documents).
You cannot define languages that collect context and reuse it in another place of the sentence (for example, sentences that identify a number and have to match that same number somewhere else in the sentence).
But the point of my answer is that, if you bound the upper limit of nesting (say, to three levels of brackets, like the example you posted), you can make your language regular and then parse it with a regular expression. That is not always easy, because it often leads to complex expressions (as you have seen in my answer), but it is not impossible, and you gain the possibility of identifying parts of the sentence as submatches of the regular subexpressions embedded in the global one.
If you want to allow nesting, you need to switch to context-free languages, which are defined with context-free grammars and are accepted by a more complex, stack-based automaton. You then lose part of the set of operations you had:
You'll no longer be able, in general, to decide whether one such language overlaps or includes another.
You'll no longer be able to freely construct a language from other context-free languages: they are still closed under union and concatenation, but not under intersection or difference.
But you will be able to match unboundedly nested sentences. Normally, programming languages are defined with a context-free grammar plus a little extra work for context checking (for example, checking that an identifier being used is actually defined in the declaration section, or matching the starting and ending tag identifiers at corresponding levels in an XML document).
For context-free languages, see this.
For regular languages, see this.
Second clarification
Since in your question you didn't say whether you also want to match real (decimal) numbers, I have modified the demo to allow fixed-point numbers (not general floating point with exponential notation; you'll need to work that out yourself, as an exercise). Just make some tests and adapt the regexp to your needs.
(well, if you want to see the solution, look at it)
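If you only want the gist of that change, a hedged guess at it: extend the number subexpression from ([+-]?\d+) to
([+-]?\d+(?:\.\d+)?)
so that fixed-point values such as -120.74 are also accepted.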
Yeah, I tried using the regex in my code but it is not working, so I am trying a different approach now. I have an idea of how to approach it but it is not really working either. First off, let me be clearer about the question. What I am trying to do is parse a JSON document, like the image below. The file has strings with the pattern [[[1,2],[3,4][5,6]]]. What I am trying to get out of this is each pair as a list, so the list holds x-y pairs.
the string structure
My approach: first replace the "[[" and "]]" at the beginning and at the end, so I have a string with the same pattern throughout, which gives me the string "[1,2],[3,4][5,6]". This is my code, but it is not working. How do I fix it? The other thing I thought could be an issue is that the strings are not all the same length, so how do I replace just the beginning and the ending?
my code
Then I can use a regex split method to get a list of the form {"1,2", "3,4", "5,6"}. I am not really sure how to do this, though.
Then I take the x and the y, split them apart, and add them to the list, so I end up with a list of x-y pairs. I would appreciate it if you could show me how to do this.
This is the approach I am working on, but if there is a better way of doing it I would be glad to see it.
Is there a way to turn a Maude expression into a string?
I'm looking for the equivalent of Haskell's show.
There's no really clean solution. You could convert the expression into a metaterm, metaPrettyPrint it, convert the resulting Qids to strings and concatenate them, but you would still need to deal with lexical issues, such as where to insert spaces.
(From the Maude mailing list)
I have created a full-text catalog that stores the data from some of the columns in a table, but the contents seem to have been split apart by characters that I don't really want to be considered word delimiters ("/", "-", "_", etc.).
I know that I can set the language for the word breaker, and http://msdn.microsoft.com/en-us/library/ms345188.aspx gives some idea of how to install new languages - but I need more direct control than that, because all of those languages still break on the characters I don't want to break on.
Is there a way to define my own language to use for finding word breakers?
Full-text indexes only consider the characters _ and ` while indexing; all the other characters are ignored and the words get split where those characters occur. This is mainly because full-text indexes are designed to index large documents, where only proper words are considered in order to make the search more refined.
We faced a similar problem. To solve it we kept a translation table, where characters like #, -, / were replaced with special sequences like '`at`', '`dash`', '`slash`', etc. When searching the full text, you have to apply the same replacements to your search string and then search. This takes care of the special characters.
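A rough T-SQL illustration of that translation-table idea (the table, column, and token names are made up; the full-text index would be built on the translated column rather than the original):
-- keep a shadow column in which the problem characters become unique tokens
UPDATE dbo.Documents
SET SearchText = REPLACE(REPLACE(REPLACE(Body, '/', 'QSLASHQ'), '-', 'QDASHQ'), '_', 'QUNDERQ');
-- apply the same translation to the user's search string before querying
DECLARE @needle nvarchar(200) = N'PART-NO/123';
SET @needle = REPLACE(REPLACE(REPLACE(@needle, '/', 'QSLASHQ'), '-', 'QDASHQ'), '_', 'QUNDERQ');
SELECT DocumentId FROM dbo.Documents WHERE CONTAINS(SearchText, @needle);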
The ability to configure FTS indexing is fairly limited out of the box. I don't think that you can use languages to do this.
If you are up for a challenge and have some C++ knowledge, you can always write a custom IFilter implementation. It's not trivial, but not too difficult. See here for IFilter resources.
I'm using the libpq library in C to access my PostgreSQL database. So when I do res = PQexec(conn, "SELECT point FROM test_point3d"); I don't know how to convert the PGresult I got into my custom data type.
I know I can use the PQgetValue function, but again I don't know how to convert the returning string to my custom data type.
The best way to think about this is that data types interact with applications over a textual interface: libpq returns a string for just about anything, and the programmer has the responsibility to parse the string and build a data type from it. I know the author has probably abandoned the question, but I am working on something similar and it is worth documenting a few tricks here that are helpful in some cases.
Obviously if this is a C language type, with its own in and out representation, then you will have to parse the string the way you would normally.
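For example, here is a minimal sketch in C for the original point question, assuming the column's text form looks like (x,y); the struct name is made up and error handling is omitted:
#include <stdio.h>
#include <libpq-fe.h>

typedef struct { double x; double y; } MyPoint;

/* parse the textual form libpq returns for a point, e.g. "(1.5,2.25)" */
static int parse_point(const char *text, MyPoint *out)
{
    return sscanf(text, " ( %lf , %lf )", &out->x, &out->y) == 2;
}

/* after res = PQexec(conn, "SELECT point FROM test_point3d"); */
static void print_points(PGresult *res)
{
    for (int i = 0; i < PQntuples(res); i++) {
        MyPoint p;
        if (parse_point(PQgetvalue(res, i, 0), &p))
            printf("x = %f, y = %f\n", p.x, p.y);
    }
}
Tuples and arrays need a bit more work, as described below.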
However for arrays and tuples, the notation is basically
[open_type_identifier][csv_string][close_type_identifier]
For example a tuple may be represented as:
(35,65,1111111,f,f,2011-10-06,"2011-10-07 13:11:24.324195",186,chris,f,,,,f)
This makes it easy to parse. You can generally use existing CSV processors once you strip off the first and last character. Moreover, consider:
select row('test', 'testing, inc', array['test', 'testing, inc']);
row
-------------------------------------------------
(test,"testing, inc","{test,""testing, inc""}")
(1 row)
As this shows, you have standard CSV escaping inside nested attributes, so you can in fact determine that the third attribute is an array and then (having undoubled the quotes) parse it as an array. In this way nested data structures can be processed in a manner roughly similar to what you might expect with a format like JSON. The trick, though, is that it is nested CSV.