Why are arrays in JSON not uniform? - arrays

I believe that an array, as a data structure, is an organized set of items, and that by definition in JSON it is an ordered set of key:value pairs. I tried to test this with a simple example.
{
"employees":[{
"Srno":1,
"EmpID":123,
"Name":"John Doe"
},
{
"Srno":2,
"Name":"James Mars"}]
}
The idea was for every element in the employees array to have three properties, viz. Srno, EmpID and Name.
However, the second element is intentionally left with 2 out of 3 properties, viz. Srno and Name only.
My assumption was that it would not parse. But it did.
Then this statement from JSON.org about arrays is incorrect:
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
Where am I mistaken in my understanding of arrays in JSON? Can someone clarify please.

JSON defines a syntax for exchange of structured data, but doesn't define much in the way of semantics at all.
{
"example":[{
"id":1,
"a":123,
"b":"John Doe"
},
{
"id":1,
"a":"ABC",
"c":"James Mars",
"d": true
}]
}
The above snippet is perfectly valid JSON. Notice -- in addition to your "concerns" about arrays:
There is no way of specifying that ID must be unique.
There is no way of specifying that nodes with the same name have the same datatype.
In summary, not only does JSON not require that each node has an identical number of properties, the properties that exist don't have to have the same names or the same data-types.
Conversely, you could duplicate the first node of your example entirely (3 properties with the same names and values) and it would be equally valid. It's purely syntax, no semantics.
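To see this in practice, here is a minimal sketch in Python (my choice of language, not the asker's). It parses the document from the question with the standard json module and then runs the uniformity check that JSON itself never makes. The helper name keys_are_uniform is made up for illustration.

```python
import json

# The document from the question: the second element has no "EmpID".
doc = """
{
  "employees": [
    {"Srno": 1, "EmpID": 123, "Name": "John Doe"},
    {"Srno": 2, "Name": "James Mars"}
  ]
}
"""

def keys_are_uniform(items):
    """Return True if every object in the list has the same set of keys.

    Hypothetical helper: this is an application-level rule, not part of JSON.
    """
    key_sets = {frozenset(item) for item in items}
    return len(key_sets) <= 1

data = json.loads(doc)            # parses without error
employees = data["employees"]
print(len(employees))
print(keys_are_uniform(employees))
```

If you need that kind of guarantee, it has to come from a separate layer, such as a schema validator or your own checks like the one above.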

Your assumption is that programming languages should give some sort of parse error given an array where the values are of different type, like in your example. That assumption is VERY wrong.
Sure, you're correct if you're talking about Java, C++ or C# for example, but Perl, Python, PHP, Ruby, R, JavaScript, Smalltalk, ActionScript, Clojure, ColdFusion, Common Lisp (and most other Lisps), Powershell, Dylan, Groovy, Gambas, Matlab, io, VBScript and many many more languages would accept an array with objects of different types.
JSON is just like those languages. Nothing weird going on at all.
PS. I would recommend learning a dynamically typed language (one from the list above maybe) to get a wider understanding of programming in general. Just as I would advise all dynamic-language advocates to learn a static one!

Related

Perl structure flow to C

I've started working on a program which is in Perl and has to be transformed into C.
There are a lot of subroutines which have structure member accessing which is unfamiliar to me, because I have little to no knowledge about Perl syntax and structure flow.
Example:
$ref->{$struct2[$value]->{field1}}->{struct_insideStruct2}->{$ref2->{field}}
$ref is a third structure
$ref2 is a local copy of a parameter which is of type struct1
My question is: How do you create a line like this in C?
Do I need to create nested multiple structures?
I need to understand how multiple access operators in Perl work and whether I can create something similar in C. Thanks in advance.
I recommend not trying to translate directly between languages, as this is likely to result in clumsy and unnatural code. That would certainly be the case here, as commented further down. The best I can do for this question is to explain what the expression does:
$ref -> { $struct2[$value]->{field1} }
-> { struct_insideStruct2 }
-> { $ref2->{field} }
The $ref is a reference to a hash (associative array); it's OK to think of it as a pointer to a hash. One can tell because the -> ("arrow") operator dereferences, and the {...} on its right means that on its left there must be a hash reference; this returns a value that it points to.
In this case, the key with which it is dereferenced (the index into the associative array) involves an element of the array @struct2 at index $value; that element is another hash reference, being dereferenced (indexed into) with the key field1 (a string literal†).
What this returns is another hash reference, which is then indexed into (dereferenced) with the key struct_insideStruct2 (string), and this again returns a hash reference.
That last one is indexed with a key which itself is produced by dereferencing another hash reference, $ref2, with a key field (string).
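For readers who don't know Perl, the same chain of lookups can be modelled with nested dictionaries. The sketch below uses Python only as a neutral notation. All the data values ("k1", "k2", 42) are invented for illustration; only the field names come from the question.

```python
# Invented sample data; only the key names mirror the Perl expression.
struct2 = [{"field1": "k1"}]          # @struct2: an array of hash references
value = 0                             # $value: index into @struct2
ref2 = {"field": "k2"}                # $ref2: a hash reference
ref = {                               # $ref: a hash reference, three levels deep
    "k1": {
        "struct_insideStruct2": {
            "k2": 42,
        },
    },
}

# $ref->{ $struct2[$value]->{field1} }->{struct_insideStruct2}->{ $ref2->{field} }
outer_key = struct2[value]["field1"]  # the first key is itself the result of a lookup
inner_key = ref2["field"]             # and so is the last key
result = ref[outer_key]["struct_insideStruct2"][inner_key]
print(result)
```

Splitting the computed keys into named variables, as done here, is also a reasonable first step before deciding how to represent the data in C.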
This is an example of a Perl complex data structure. How do you like it? I don't, not very much. Even in Perl, ideally I'd like to see this rewritten as a class, as it goes too deep and wide and so it packs too much complexity without any supporting structure which a class can provide.
If you really do still wish to do that kind of thing in C, you can. You may want to find a good hash implementation (or use structs and nest them carefully), and probably dust off your function pointer syntax and such. But I would recommend not getting into all that.
Instead, once you understand the deep-nested data structure explained above, and the data it represents, find a way to implement what it means and does in your code in a native C way. We always want to use logic, techniques, and idioms native to the language at hand.
Along with linked documentation also see the short and sweet perlintro. The full reference for Perl's references is perlref.
† Normally such "barewords" need be under quotes, 'string' (or using "", or q() or qq() operators ...). But if that is a sole thing between {} then the quoting may be omitted.

Should one use arrays and or dictionaries in TCL

Since Tcl 8.5, we have both dictionaries, and arrays. Now, everybody knows of the advantages of the dictionaries.
Is there an advantage to an array, other than the environment array?
Has anyone found an advantage of arrays, assuming that one need not use a Tcl older than 8.5?
You can trace an array variable, but you cannot trace a dictionary value.
Other than that, the syntax for fetching an array value is more terse.
References: array dict
The big semantic advantage of arrays is that you can trace elements of the array; they really are collections of variables. This also means that you can use elements with commands like vwait, and have Tk widgets use them to store their models, and so on. (All of those depend on traces to work.)
The big semantic advantage of dictionaries is that you can pass them from one context to another cheaply; they really are values. This makes using them as an argument to a procedure or returning it from a procedure both trivial and cheap.
Syntactically, arrays are nicer.

splitting JSON string using regex

I want to split a JSON document which has a pattern like [[[1,2],[3,4][5,6]]] using regex. The pairs represent x and y. What I want to do is take this string and produce a list like {"1,2", "3,4", "5,6"}. Eventually I want to split the pairs. I was thinking I could make a list of {"1,2", "3,4", "5,6"} and use a for loop to split the pairs. Is this approach correct to get the x and y separately?
JSON is not a regular language but a context-free language, and as such cannot be matched by a regular expression. You need a full JSON parser like the ones referenced in the comments to your question.
... but if you are going to have a fixed structure, like exactly three levels of square brackets, with the structure you posted in your question, then there is a regexp that can parse it (it would match a subset of the JSON grammar, not general enough to parse other JSON content):
You'll have numbers: ([+-]?[0-9]+)
Then you'll have brackets and separators: \[\[\[, ,, \],\[ and \]\]\]
and finally, put all this together:
\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]
and if you want to permit spaces between symbols, then you need:
\s*\[\s*\[\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*\]\s*\]\s*
This regexp has six capturing groups, one for each integer in the matched string, as the demo shows.
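As a rough illustration of how the six groups can be used, here is a sketch in Python (an assumption on my part; the question does not say which language is in use). It rebuilds the whitespace-tolerant pattern from smaller pieces; the names NUM, PAIR and parse_three_pairs are invented here. Note that the string as written in the question lacks a comma between [3,4] and [5,6], so it does not match.

```python
import re

# The whitespace-tolerant pattern from above, built from one reusable piece.
NUM = r"([+-]?\d+)"
PAIR = rf"\[\s*{NUM}\s*,\s*{NUM}\s*\]"
PATTERN = re.compile(
    rf"\s*\[\s*\[\s*{PAIR}\s*,\s*{PAIR}\s*,\s*{PAIR}\s*\]\s*\]\s*"
)

def parse_three_pairs(text):
    """Return [(x, y), (x, y), (x, y)], or None if the text does not match."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    nums = [int(g) for g in m.groups()]          # the six capture groups
    return list(zip(nums[0::2], nums[1::2]))     # regroup as (x, y) pairs

print(parse_three_pairs("[[[1,2],[3,4],[5,6]]]"))
print(parse_three_pairs("[[[1,2],[3,4][5,6]]]"))  # missing comma: no match
```

The pattern only accepts exactly three pairs; a different number of pairs needs a different pattern, which is the price of treating a fixed-depth subset of JSON as a regular language.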
Clarification
Regular languages, and regular grammars, and regular expressions form a class of languages with many practical properties, for example:
You can parse them efficiently in one pass with what is called a finite automaton
You can define the automaton to accept language sentences simply with a regular expression.
You can combine regexps (or automata) to build acceptors for more complex languages (union, intersection, symmetric difference, concatenation, etc.).
You can decide whether the language defined by one regular expression is a subset of, a superset of, or neither of these with respect to the language defined by another.
By contrast, it limits the power of languages that can be defined with it:
you cannot define languages that allow nesting of subexpressions (like the bracketing you allow in JSON expressions or the tag nesting allowed in XML documents)
you cannot define languages which collect context and use it in another place of the sentence (for example, sentences that identify a number and have to match that same number in another place of the sentence)
But the point of my answer is that if you bound the nesting depth (let's say, for example, to three levels of brackets, like the example you posted), you can make your language regular and then parse it with a regular expression. It is not easy to do that, because it often leads to complex expressions (as you have seen in my answer), but it is not impossible, and you gain the ability to identify parts of the sentence as submatches of the regular subexpressions embedded in the global one.
If you want to allow unbounded nesting, you need to switch to context-free languages, which are defined with context-free grammars and are accepted by a more complex stack-based automaton. Then you lose part of the set of operations you had:
You'll no longer be able to decide, in general, whether one language is included in another.
You'll no longer be able to construct a language from the intersection or difference of other context-free languages (union still works, but an intersection or difference may fail to be context-free).
But you will be able to match unbounded nested sentences. Normally, programming languages are defined with a context free grammar and a little more work for context checking (for example, to check if some identifier being used is actually defined in the declaration section or to match the starting and ending tag identifiers at matching levels in an XML document)
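The difference can be made concrete with a small sketch (Python; the function name is invented here). Checking that brackets balance at arbitrary depth needs memory that grows with the input, which is exactly what a finite automaton lacks.

```python
def brackets_balanced(text):
    """Check that every '[' has a matching ']' at any nesting depth.

    With a single bracket type a depth counter is enough; with several
    bracket types a real stack would be needed. Either way the memory is
    unbounded, which a finite automaton does not have.
    """
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:      # closing bracket with nothing open
                return False
    return depth == 0

print(brackets_balanced("[[[1,2],[3,4],[5,6]]]"))
print(brackets_balanced("[[1,2]"))
```

If the depth is capped at a constant, the counter can only take finitely many values, and the check becomes something a regular expression can do.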
For context free languages, see this.
For regular languages, see this.
Second clarification
As your question didn't say that you wanted to match real (decimal) numbers, I have modified the demo to allow fixed-point numbers (not general floating point with exponential notation; you'll need to work that out yourself, as an exercise). Just run some tests and modify the regexp to adapt it to your needs.
(well, if you want to see the solution, look at it)
Yeah, I tried using the regex in my code but it is not working, so I am trying a different approach now. I have an idea of how to approach it, but it is not really working. First off, let me be more clear on the question. What I am trying to do is parse a JSON document. The file has strings with a [[[1,2],[3,4][5,6]]] pattern. What I am trying to get out of this is each pair as a list, so that the list holds x-y pairs.
My approach: first remove the "[[" and "]]" at the beginning and at the end, so I have a string with the same pattern throughout, which gives me the string "[1,2],[3,4][5,6]". My code for this is not working. How do I fix it? The other thing I thought could be an issue is that the strings are not all the same length. So how do I replace just the beginning and the ending?
Then I can use a regex split method to get a list of the form {"1,2", "3,4", "5,6"}. I am not really sure how to do this though.
Then I take the x and the y and add those to the list, so I end up with a list of x-y pairs. I would appreciate it if you could show me how to do this.
This is the approach I am working on, but if there is a better way of doing it I will be glad to see it.
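As the answer above suggests, a JSON parser avoids the string surgery altogether. A minimal sketch in Python follows. The language is an assumption, since the question doesn't name one, and so is the comma between every pair; without it the text is not valid JSON.

```python
import json

# Assumes the real data is valid JSON, i.e. has a comma between every pair.
text = "[[[1,2],[3,4],[5,6]]]"

rings = json.loads(text)        # nested lists: [[[1, 2], [3, 4], [5, 6]]]
pairs = rings[0]                # strip one level of nesting
xs = [x for x, y in pairs]      # all x values
ys = [y for x, y in pairs]      # all y values
print(pairs)
print(xs, ys)
```

Because the parser already returns numbers, there is no need to split "1,2" style strings or to care how long each input string is.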

Lua's hybrid array and hash table; does it exist anywhere else?

Lua's implementation of tables keeps its elements in two parts: an array part and a hash part.
Does such a thing exist in any other languages?
Take a look at section 4, Tables, in The Implementation of Lua 5.0.
Lua 5.1 Source Code - table.c
This idea was original with Roberto Ierusalimschy and the rest of the Lua team. I heard Roberto give a talk about it at the MIT Lightweight Languages workshop in 2003, and in this talk he discussed prior work and argued convincingly that the idea was new. I don't know if other languages have copied it since.
The original Awk has a somewhat more restricted language model than Lua; either a number or a string can be used as a key in an array, but arrays themselves are not first-class values: an array must have a name, and an array cannot be used as a key in the array.
Regarding the implementation, I have checked the sources for the original Awk as maintained by Brian Kernighan, and the implementation of Awk uses a hash table, not Lua's hybrid array/table structure. The distinction is important because in Lua, when a table is used with consecutive integer keys, the space overhead is the same as for a C array. This is not true for original Awk.
I have not bothered to investigate all later implementations of awk, e.g., Gnu Awk, mawk, and so on.
EDIT: This doesn't answer the question, which was about the implementation.
AWK also did it.
It's interesting how some languages conflate operations that are different in others:
List indexing - a[10]
Associative indexing - a['foo']
Object field access - a.foo
Function/Method calls - a('foo') / a.foo()
Very incomplete examples:
Perl is the rare language where sequential/associative indexing have separate syntax - a[10] / a{'foo'}. AFAIK, object fields map to one of the other operations, depending on which the implementor of the class felt like using.
In Python, all 4 are distinct; sequential/associative indexing use same syntax but separate data types are optimized for them.
In Ruby, object fields are methods with no arguments - a.foo.
In JavaScript, object fields a.foo are syntax sugar for associative indexing a['foo'].
In Lua and AWK, associative arrays are also used for sequential indexing - a[10].
In Arc, sequential and associative indexing looks like function calls - (a 10) / (a "foo"), and I think a.foo is syntax sugar for this too(?).
The closest thing I can think of is Javascript - you create an array with new Array(), and then proceed to index either by number or by string value. It could well be for performance reasons some Javascript implementations choose to do so using two arrays, for the reasons noted in the Lua documentation you linked to.
ArrayWithHash is a fast implementation of array-hashtable hybrid in C++.
Since C++ is a statically typed language, only integer keys are allowed in ArrayWithHash (no way to insert string or pointer key). In other words, it is something like an array with hash table backup for large indices. Also it uses different hash table implementation which is less memory-efficient than Lua table implementation.
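To make the hybrid idea concrete, here is a toy sketch in Python. It is not Lua's or ArrayWithHash's actual implementation: the class name, the fixed array_size and the routing rule are all simplifications invented for illustration (Lua resizes its array part dynamically, and ArrayWithHash accepts integer keys only).

```python
class HybridTable:
    """Toy array+hash hybrid: small non-negative integer keys go to a list,
    everything else goes to a dict."""

    _MISSING = object()   # sentinel marking an empty array slot

    def __init__(self, array_size=8):
        self._array = [self._MISSING] * array_size   # dense "array part"
        self._hash = {}                               # sparse "hash part"

    def _in_array(self, key):
        return type(key) is int and 0 <= key < len(self._array)

    def __setitem__(self, key, value):
        if self._in_array(key):
            self._array[key] = value
        else:
            self._hash[key] = value

    def __getitem__(self, key):
        if self._in_array(key):
            value = self._array[key]
            if value is self._MISSING:
                raise KeyError(key)
            return value
        return self._hash[key]

t = HybridTable()
t[0] = "a"          # stored in the array part
t[1000] = "b"       # index too large: stored in the hash part
t["name"] = "c"     # non-integer key: stored in the hash part
print(t[0], t[1000], t["name"])
```

The caller sees a single container; which part holds a given key is an implementation detail, which is the point of the design.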

What is the actual definition of an array? [duplicate]

Closed 13 years ago.
Possible Duplicate:
Arrays, What’s the point?
I tried to ask this question before in What is the difference between an array and a list? but my question was closed before reaching a conclusive answer (more about that).
I'm trying to understand what is really meant by the word "array" in computer science. I am trying to reach an answer not have a discussion as per the spirit of this website. What I'm asking is language agnostic but you may draw on your knowledge of what arrays are/do in various languages that you've used.
Ways of thinking about this question:
Imagine you're designing a new programming language and you decide to implement arrays in it; what does that mean they do? What will the properties and capabilities of those things be? If it depends on the type of language, how so?
What makes an array an array?
When is an array not an array? When it is, for example, a list, vector, table, map, or collection?
It's possible there isn't one precise definition of what an array is. If that is the case, are there any standard or near-standard assumptions about what an array is? Are there any common areas at least? Maybe there are several definitions; if that is the case, I'm looking for the most precision in each of them.
Language examples:
(Correct me if I'm wrong on any of these).
C arrays are contiguous blocks of memory of a single type that can be traversed using pointer arithmetic or accessed at a specific offset point. They have a fixed size.
Arrays in JavaScript, Ruby, and PHP have a variable size and can store an object/scalar of any type; they can also grow or have elements removed from them.
PHP arrays come in two types: numeric and associative. Associative arrays have elements that are stored and retrieved with string keys. Numeric arrays have elements that are stored and retrieved with integers. Interestingly if you have: $eg = array('a', 'b', 'c') and you unset($eg[1]) you still retrieve 'c' with $eg[2], only now $eg[1] is undefined. (You can call array_values() to re-index the array). You can also mix string and integer keys.
At this stage I sort of suspect that C arrays are the only true arrays here, and that strictly speaking, for an array to be an array it has to have all the characteristics I mention in that first bullet point. If that's the case then (again, these are suspicions that I'm looking to have confirmed or rejected) arrays in JS and Ruby are actually vectors, and PHP arrays are probably tables of some kind.
Final note: I've made this community wiki so if answers need to be edited a few times in lieu of comments, go ahead and do that. Consensus is in order here.
It is, or should be, all about abstraction
There is actually a good question hidden in there, a really good one, and it brings up a language pet peeve I have had for a long time.
And it's getting worse, not better.
OK: there is something lowly and widely disrespected Fortran got right that my favorite languages like Ruby still get wrong: they use different syntax for function calls, arrays, and attributes. Exactly how abstract is that? In Fortran, function(1) has the same syntax as array(1), so you can change one to the other without altering the program. (I know, not for assignments, and in the case of Fortran it was probably an accident of goofy punch card character sets and not anything deliberate.)
The point is, I'm really not sure that x.y, x[y], and x(y) should have different syntax. What is the benefit of attaching a particular abstraction to a specific syntax? To make more jobs for IDE programmers working on refactoring transformations?
Having said all that, it's easy to define array. In its first normal form, it's a contiguous sequence of elements in memory accessed via a numeric offset and using a language-specific syntax. In higher normal forms it is an attribute of an object that responds to a typically-numeric message.
array |əˈrā|
noun
1 an impressive display or range of a particular type of thing : there is a vast array of literature on the topic | a bewildering array of choices.
2 an ordered arrangement, in particular
an arrangement of troops.
Mathematics: an arrangement of quantities or symbols in rows and columns; a matrix.
Computing: an ordered set of related elements.
Law: a list of jurors empaneled.
3 poetic/literary elaborate or beautiful clothing : he was clothed in fine array.
verb
[ trans. ] (usu. be arrayed) display or arrange (things) in a particular way : arrayed across the table was a buffet | the forces arrayed against him.
[ trans. ] (usu. be arrayed in) dress someone in (the clothes specified) : they were arrayed in Hungarian national dress.
[ trans. ] Law empanel (a jury).
ORIGIN Middle English (in the senses [preparedness] and [place in readiness] ): from Old French arei (noun), areer (verb), based on Latin ad- ‘toward’ + a Germanic base meaning ‘prepare.’
From FOLDOC:
array
1. <programming> A collection of identically typed data items
distinguished by their indices (or "subscripts"). The number
of dimensions an array can have depends on the language but is
usually unlimited.
An array is a kind of aggregate data type. A single
ordinary variable (a "scalar") could be considered as a
zero-dimensional array. A one-dimensional array is also known
as a "vector".
A reference to an array element is written something like
A[i,j,k] where A is the array name and i, j and k are the
indices. The C language is peculiar in that each index is
written in separate brackets, e.g. A[i][j][k]. This expresses
the fact that, in C, an N-dimensional array is actually a
vector, each of whose elements is an N-1 dimensional array.
Elements of an array are usually stored contiguously.
Languages differ as to whether the leftmost or rightmost index
varies most rapidly, i.e. whether each row is stored
contiguously or each column (for a 2D array).
Arrays are appropriate for storing data which must be accessed
in an unpredictable order, in contrast to lists which are
best when accessed sequentially. Array indices are
integers, usually natural numbers, whereas the elements of
an associative array are identified by strings.
2. <architecture> A processor array, not to be confused with
an array processor.
Also note that in some languages, when they say "array" they actually mean "associative array":
associative array
<programming> (Or "hash", "map", "dictionary") An array
where the indices are not just integers but may be
arbitrary strings.
awk and its descendants (e.g. Perl) have associative
arrays which are implemented using hash coding for faster
look-up.
If you ignore how programming languages model arrays and lists, and ignore the implementation details (and consequent performance characteristics) of the abstractions, then the concepts of array and list are indistinguishable.
If you introduce implementation details (still independent of programming language) you can compare data structures like linked lists, array lists, regular arrays, sparse arrays and so on. But then you are no longer comparing arrays and lists per se.
The way I see it, you can only talk about a distinction between arrays and lists in the context of a programming language. And of course you are then talking about arrays and lists as supported by that language. You cannot generalize to any other language.
In short, I think this question is based on a false premise, and has no useful answer.
EDIT: in response to Ollie's comments:
I'm not saying that it is not useful to use the words "array" and "list". What I'm saying is the words do not and cannot have precise and distinct definitions ... except in the context of a specific programming language. While you would like the two words to have distinct meaning, it is a fact that they don't. Just take a look at the way the words are actually used. Furthermore, trying to impose a new set of definitions on the world is doomed to fail.
My point about implementation is that when we compare and contrast the different implementations of arrays and lists, we are doing just that. I'm not saying that it is not a useful thing to do. What I am saying is that when we compare and contrast the various implementations we should not get all hung up about whether we call them arrays or lists or whatever. Rather we should use terms that we can agree on ... or not use terms at all.
To me, "array" means "ordered collection of things that is probably efficiently indexable" and "list" means "ordered collection of things that may be efficiently indexable". But there are examples of both arrays and lists that go against the trend; e.g. PHP arrays on the one hand, and Java ArrayLists on the other hand. So if I want to be precise ... in a language-agnostic context, I have to talk about "C-like arrays" or "linked lists" or some other terminology that makes it clear what data structure I really mean. The terms "array" and "list" are of no use if I want to be clear.
An array is an ordered collection of data items indexed by integer. It is not possible to be certain of anything more. Vote for this answer if you believe this is the only reasonable outcome of this question.
An array:
1. is a finite collection of elements
2. has ordered elements, and this order is their only structure
3. has elements of the same type
4. supports efficient random access
5. has no expectation of efficient insertions
6. may or may not support append
(1) differentiates arrays from things like iterators or generators. (2) differentiates arrays from sets. (3) differentiates arrays from things like tuples where you get an int and a string. (4) differentiates arrays from other types of lists. Maybe it's not always true, but a programmer's expectation is that random access is constant time. (5) and (6) are just there to deny additional requirements.
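As one concrete instance of properties (3) and (4), Python's standard array module provides a sequence that enforces a single element type and supports indexing by integer, next to the built-in list, which accepts anything. This is only an illustration in one language, not a definition.

```python
from array import array

a = array("i", [2, 3, 5, 7, 1])   # "i": every element is a C signed int
print(a[2])                        # random access by integer index

try:
    a.append("eleven")             # wrong element type
except TypeError as exc:
    print("rejected:", exc)

mixed = [2, "three", 5.0]          # a plain list accepts any mix of types
print(len(mixed))
```

That the same language ships both containers, under the names "array" and "list", fits the point made in other answers that the terms only become precise within a given language.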
I would argue that a real array stores values in contiguous memory. Anything else is only called an array because it can be used like an array, but it isn't really one ("arrays" in PHP are definitely not actual arrays (non-associative)). Vectors and such are extensions of arrays, adding additional functionality.
An array is a container, and the objects it holds have no relationship to each other except their order; the objects are stored in abstractly contiguous space (at the high level; at the low level it may of course be contiguous too), so you can access them by slot[x,y,z...].
For example, given array[2,3,5,7,1], you can get 5 using slot[2] (slot[3] in some languages).
A list is a container too, but each object (or, more exactly, each object holder such as a slot or node) has indicators which "point" to other object(s), and this is the main relationship; in general the space is not contiguous at either the high or the low level, though it may be; so accessing by slot[x,y,z...] is not recommended.
For example, given |-2-3-5-7-1-|, you need to travel from the first object to the third one to get 5.
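The two access patterns can be sketched side by side. This is Python, with a hand-rolled singly linked list; the names Node, list_from and nth are invented for illustration.

```python
class Node:
    """One cell of a singly linked list (hypothetical minimal version)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def list_from(values):
    """Build a linked list holding the given values in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def nth(head, index):
    """Walk `index` links from the head: the cost grows with the index."""
    node = head
    for _ in range(index):
        node = node.next
    return node.value

values = [2, 3, 5, 7, 1]
head = list_from(values)
print(values[2])     # array: direct access by slot
print(nth(head, 2))  # list: visit 2, then 3, then arrive at 5
```

Both calls return the same element; the difference is only in how many steps it takes to reach it.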

Resources