Ruby PG gem: select field in array format

I have a postgresql field that stores a 4-element array. I want to select the value of that field, but it's coming back as a string:
{43.690916,-79.396774,43.700845,-79.37125}
I assumed the gem would know the format of that field and return an array, but apparently I was wrong.
How can I get this into an array without resorting to string methods? That seems like a hack. I thought moving from four individual float fields to a single array field (with associated methods) would make the records easier to access.
There is no migration involved, and please don't assume this is Rails; it is not. Here is the structure:
Column | Type | Collation | Nullable | Default
-----------+------------------------+-----------+----------+-------------------------------------------
loc_id | integer | | not null | nextval('mjtable_loc_id_seq'::regclass)
locname | character varying(255) | | |
locbounds | double precision[] | | |

By default the pg gem hands you every value as its text representation, so an array column comes back as a string like the one you posted, and you need to deal with it as such unless you tell the gem how to decode it.
conn = PG.connect( dbname: 'pgarray_development') # or whatever db name
data = conn.exec('SELECT * FROM foos').entries
=> [{"id"=>"1", "coords"=>"{1.0,2.0,3.0,4.0}"]
data.first['coords'].class
=>String
But you can do this
conn.type_map_for_results = PG::BasicTypeMapForResults.new conn
conn.exec("select coords::float[] from foos").values
=> [[1.0, 2.0, 3.0, 4.0]]
There are probably other ways to use type casts; see https://bitbucket.org/ged/ruby-pg/wiki/Home
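If you would rather not set up type mapping at all, another option is to pull the elements apart on the SQL side, so each value arrives as a plain scalar. This is only a sketch using the names implied by the question: the table is presumably mjtable (judging by the sequence name), and the aliases are just a guess at what the four floats mean.
SELECT locbounds[1] AS lat_min,
       locbounds[2] AS lng_min,
       locbounds[3] AS lat_max,
       locbounds[4] AS lng_max
FROM   mjtable;
Note that Postgres arrays are 1-based by default, so the first element is locbounds[1], not locbounds[0].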

Related

How to convert array<string> to string in hive

I want to convert an array<string> to string in Hive. The array data is as follows:
+-------------------------------------+--+
| NULL |
| ["Extension","Terms & Conditions"] |
| ["Value (generic or item level)"] |
+-------------------------------------+--+
I want to collect the array values and convert them to a string without the brackets and quotes, so that I get a result like:
+-------------------------------------+--+
| NULL |
| Extension,Terms & Conditions |
| Value (generic or item level) |
+-------------------------------------+--+
The following query: select concat_ws(',', col_name) as col_name from table_stg; provides the result, but it returns NULL as an empty string. I tried several references like:
How can I convert array to string in hive sql?
Hive - How to cast array to string?
But I am not getting the desired result. Is there any way to get it?
With reference to Vamsi's comment, I managed to get this working and thought I would post it here as well for community reference.
select case when col_name is NULL then NULL
else concat_ws(',',col_name) end from table_name;
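An equivalent form uses Hive's IF() function instead of CASE; this is just a sketch assuming the same table and column names as above:
select if(col_name is null, null, concat_ws(',', col_name)) as col_name
from table_name;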

Element-wise sum of array across rows of a dataset - Spark Scala

I am trying to group the below dataset based on the column "id" and sum the arrays in the column "values" element-wise. How do I do it in Spark using Scala?
Input: (dataset of 2 columns, column1 of type String and column2 of type Array[Int])
| id | values |
---------------
| A | [12,61,23,43]
| A | [43,11,24,45]
| B | [32,12,53,21]
| C | [11,12,13,14]
| C | [43,52,12,52]
| B | [33,21,15,24]
Expected Output: (dataset or dataframe)
| id | values |
---------------
| A | [55,72,47,88]
| B | [65,33,68,45]
| C | [54,64,25,66]
Note:
The result has to be flexible and dynamic. That is, even if there are thousands of columns, or if the file is several TB or PB in size, the solution should still work.
I'm a little unsure about what you mean when you say it has to be flexible, but just off the top of my head I can think of a couple of ways. The first (and in my opinion the prettiest) one uses a udf:
// Creating a small test example (assumes an active SparkSession named spark)
import spark.implicits._                                   // for .toDF and the 'col symbol syntax
import org.apache.spark.sql.functions.{udf, collect_list}
val testDF = spark.sparkContext.parallelize(Seq(("a", Seq(1,2,3)), ("a", Seq(4,5,6)), ("b", Seq(1,3,4)))).toDF("id", "arr")

// Transpose the collected list of arrays and sum each position
val sum_arr = udf((list: Seq[Seq[Int]]) => list.transpose.map(arr => arr.sum))

testDF
  .groupBy('id)
  .agg(sum_arr(collect_list('arr)) as "summed_values")
If you have billions of identical ids, however, the collect_list will of course be a problem. In that case you could do something like this:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.sum

testDF
  .flatMap{ case Row(id: String, list: Seq[Int]) => list.indices.map(index => (id, index, list(index))) }
  .toDF("id", "arr_index", "arr_element")
  .groupBy('id, 'arr_index)
  .agg(sum("arr_element") as "sum")
  .groupBy('id)
  .agg(collect_list('sum) as "summed_values")
The single-line solution below worked for me (n here is the known, fixed length of the arrays; adjust the column names to match your data):
ds.groupBy("Country").agg(array((0 until n).map(i => sum(col("Values").getItem(i))) :_* ) as "Values")

How to get the dimensionality of an ARRAY column?

I'm working on a project that collects information about your schema from the database directly. I can get the data_type of the column using information_schema.columns, which will tell me if it's an ARRAY or not. I can also get the underlying type (integer, bytea etc) of the ARRAY by querying information_schema.element_types as described here:
https://www.postgresql.org/docs/9.1/static/infoschema-element-types.html
My problem is that I also need to know how many dimensions the array has, whether it is integer[], or integer[][] for example. Does anyone know of a way to do this? Google isn't being very helpful here, hopefully someone more familiar with the Postgres spec can lead me in the right direction.
For starters, the dimensionality of an array is not reflected in the data type in Postgres. The syntax integer[][] is tolerated, but it's really just integer[] internally.
Read about array types in the manual.
This means that dimensions can vary within the same array type (the same table column).
To get actual dimensions of a particular array value:
SELECT array_dims(my_arr); -- [1:2][1:3]
Or to just get the number of dimensions:
SELECT array_ndims(my_arr); -- 2
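my_arr above stands for any array column or value; here is a self-contained illustration with an array literal:
SELECT array_dims('{{1,2,3},{4,5,6}}'::int[]);   -- [1:2][1:3]
SELECT array_ndims('{{1,2,3},{4,5,6}}'::int[]);  -- 2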
There are more array functions for similar needs. See table of array functions in the manual.
Related:
Use string[][] with Npgsql
If you need to enforce particular dimensions in a column, add a CHECK constraint. To enforce 2-dimensional arrays:
ALTER TABLE tbl ADD CONSTRAINT tbl_arr_col_must_have_2_dims
CHECK (array_ndims(arr_col) = 2);
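With that constraint in place, values of the wrong dimensionality are rejected at insert time. A quick sketch against the hypothetical tbl / arr_col names above (note that NULLs still pass unless you also declare the column NOT NULL):
INSERT INTO tbl (arr_col) VALUES ('{{1,2},{3,4}}');  -- accepted: array_ndims = 2
INSERT INTO tbl (arr_col) VALUES ('{1,2,3}');        -- rejected: array_ndims = 1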
Multidimensional array support in Postgres is very specific. Multidimensional array types do not exist. If you declare an array as multidimensional, Postgres automatically casts it to the simple array type:
create table test(a integer[][]);
\d test
Table "public.test"
Column | Type | Modifiers
--------+-----------+-----------
a | integer[] |
You can store arrays of different dimensions in a column of an array type:
insert into test values
(array[1,2]),
(array[array[1,2], array[3,4]]);
select a, a[1] a1, a[2] a2, a[1][1] a11, a[2][2] a22
from test;
a | a1 | a2 | a11 | a22
---------------+----+----+-----+-----
{1,2} | 1 | 2 | |
{{1,2},{3,4}} | | | 1 | 4
(2 rows)
This is a key difference between Postgres and programming languages like C, python etc. The feature has its advantages and disadvantages but usually causes various problems for novices.
You can find the number of dimensions in the system catalog pg_attribute:
select attname, typname, attndims
from pg_class c
join pg_attribute a on c.oid = attrelid
join pg_type t on t.oid = atttypid
where relname = 'test'
and attnum > 0;
attname | typname | attndims
---------+---------+----------
a | _int4 | 2
(1 row)
It is not clear whether you can rely on this number, as per the documentation:
attndims - Number of dimensions, if the column is an array type; otherwise 0. (Presently, the number of dimensions of an array is not enforced, so any nonzero value effectively means "it's an array".)
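A short sketch of why that caveat matters: attndims only records how many bracket pairs appeared in the declaration (at least on the versions I have checked), while the stored values can have any dimensionality:
create table test1(a integer[]);                            -- declared with a single []
insert into test1 values (array[array[1,2], array[3,4]]);   -- a 2-D value is still accepted

select attndims from pg_attribute
where attrelid = 'test1'::regclass and attname = 'a';       -- 1

select array_ndims(a) from test1;                           -- 2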

sphinx - Column count doesn't match

I have the following index definition in Sphinx:
mysql> desc rec;
+-----------+---------+
| Field | Type |
+-----------+---------+
| id | integer |
| desc | field |
| tid | uint |
| gid | uint |
| no | uint |
+-----------+---------+
And I ran the following successfully in SphinxQL:
replace into rec VALUES ('24','test test',1,1, 1 );
But when I run it through the C MySQL API I get this error:
Column count doesn't match value count at row 1
The C code is this:
if (mysql_query(con, "replace into rec VALUES ('24','test test',1,1, 1 )") )
{
fprintf(stderr, "%s\n", mysql_error(con));
mysql_close(con);
exit(1);
}
Please note that the C program connects to SphinxQL with no issues.
One problem may be that you are quoting the integer for the id column. I would try taking out the single quotes around the 24. The column named desc is also concerning, since that is a reserved word in MySQL.
A good best practice is to always specify the column names, even if you are inserting into all columns. The reason is that you may later alter the table to add a column, and you don't want to have to go back and change all of your code to match the new structure. It also makes your code clearer, since you don't have to reference the table structure to know what the values mean, and it helps when a tool like Sphinx uses a different column order than you expect. Try changing your code to this, which specifies the columns, quotes them (MySQL uses backticks for identifiers), and removes the quotes around the value for the id column:
if (mysql_query(con, "replace into rec (`id`, `desc`, `tid`, `gid`, `no`) VALUES (24, 'test test', 1, 1, 1)") )

Are discrete string values repeated on disk for each duplication?

I have to store a definite set of string values in a column in a large table. You're probably wondering why I don't use another look-up table and set a FK-PK relationship; well imagine there's a good reason for that.
Does Oracle use a compression mechanism for such columns? Or is there any way to make it use one?
If the answer is no, does Oracle just store the exact characters for every duplicated value? Can you provide a reference?
As with dates, Oracle does not compress this data for you.
Setting up a simple environment:
create table test ( str varchar2(100) );
insert all
into test values ('aaa')
into test values ('aba')
into test values ('aab')
into test values ('abb')
into test values ('bbb')
select * from dual;
and using DUMP(), which returns the datatype, the length in bytes and the internal representation of the data, you can see what is stored using this query:
select str, dump(str)
from test
The answer is that in every case 3 bytes are stored.
+-----+-----------------------+
| STR | DUMP(STR) |
+-----+-----------------------+
| aaa | Typ=1 Len=3: 97,97,97 |
| aba | Typ=1 Len=3: 97,98,97 |
| aab | Typ=1 Len=3: 97,97,98 |
| abb | Typ=1 Len=3: 97,98,98 |
| bbb | Typ=1 Len=3: 98,98,98 |
+-----+-----------------------+
SQL Fiddle
As jonearles suggests in the linked answer, you can use table compression to reduce the number of stored bytes, but there are a number of trade-offs. Declare your table as follows instead:
create table test ( str varchar2(100) ) compress;
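To verify that compression is actually enabled you can check the data dictionary; a sketch, assuming a reasonably recent Oracle version where USER_TABLES exposes the COMPRESSION and COMPRESS_FOR columns:
select table_name, compression, compress_for
from   user_tables
where  table_name = 'TEST';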
Please note all the warnings in the documentation and in jonearles' answer; there are too many to list here.
It's highly unlikely that you need to save a few bytes in this manner.
