I'm trying to concatenate the elements of an int array into one string in Hive.
The function concat_ws works only for string arrays, so I tried cast(my_int_array as string) but it's not working.
Any suggestion?
Try to transform using /bin/cat:
from mytable select transform(my_int_array) using '/bin/cat' as (my_int_array);
Second option is to alter table and replace delimiters:
1) ALTER TABLE mytable CHANGE COLUMN my_int_array my_int_array_string string;
2) SELECT REPLACE(my_int_array_string, '\002', ', ') FROM mytable;
It seems that the easiest way is to write a custom UDF to perform this specific task:
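import java.util.ArrayList;
import java.util.stream.Collectors;
import org.apache.hadoop.hive.ql.exec.UDF;
public class ConcatIntArray extends UDF {
    // Joins the array elements into one string, separated by the given delimiter.
    public String evaluate(ArrayList<Integer> in, final String delimiter) {
        return in.stream().map(String::valueOf).collect(Collectors.joining(delimiter));
    }
}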
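The joining step of such a UDF can be tried outside Hive. The following is a minimal sketch in plain Java with no Hive dependency. The class name IntArrayJoiner and the choice to return null for a null array are assumptions made for illustration, not part of the answer above:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not a Hive class): joins integers into one delimited string.
final class IntArrayJoiner {
    private IntArrayJoiner() {}

    static String join(List<Integer> values, String delimiter) {
        // Assumption: a null array yields null, mirroring how SQL functions usually treat NULL input.
        if (values == null) {
            return null;
        }
        // Convert each element to its string form, then join with the delimiter.
        return values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(delimiter));
    }
}
```

Calling IntArrayJoiner.join(List.of(1, 2, 3), ",") gives "1,2,3", and an empty list gives an empty string.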
I'm using Flink SQL, and the following schema shows my source data (taken from some Twitter data):
CREATE TABLE `twitter_raw` (
`entities` ROW(
`hashtags` ROW(
`text` STRING,
`indices` INT ARRAY
) ARRAY,
`urls` ROW(
`indices` INT ARRAY,
`url` STRING,
`display_url` STRING,
`expanded_url` STRING
) ARRAY,
`user_mentions` ROW(
`screen_name` STRING,
`name` STRING,
`id` BIGINT
) ARRAY
)
)
WITH (...);
I want to get only the hashtags in a collection, so I have to map the array of constructed objects (ROW) to an array of STRING,
as in this schema:
CREATE TABLE `twitter_raw` (
`entities` ROW(
`hashtags` STRING ARRAY,
`urls` STRING ARRAY,
`user_mentions` STRING ARRAY
)
)
WITH (...);
How can I achieve this with Flink SQL? Are there built-in functions (JSON functions?) for this, should I write my own UDF, or do I have to write a DataStream job?
Thanks in advance.
The SQL command UNNEST helps in this case. It is like EXPLODE in Spark.
You can solve it by creating a new row for each hashtag in the hashtags array:
SELECT hashtag, index
FROM twitter_raw
CROSS JOIN UNNEST(hashtags) AS t (hashtag, index)
You can define either a computed column or a VIEW that extracts the hashtags field using dot notation, e.g.:
CREATE VIEW hashtags_raw (hashtags) AS
SELECT entities.hashtags AS hashtags FROM twitter_raw
I have a function which has a return type of IList<Product>
class Product
{
    public int Id { get; set; }
    public string ProductClass { get; set; }
    public string ProductName { get; set; }
}
I have to build a comma-separated string of the ProductName values. I am trying the code below, but it does not give me the correct result:
Array arrayofProduct = MyFunction().ToArray();
string productNames = string.Join(",", arrayofProduct);
I think it is because arrayofProduct has 3 columns and I have to pass only 1 (i.e. ProductName) to get the comma separated list.
Use Linq to Select the ProductName into a collection and then use that to construct the desired comma separated string
var names = MyFunction().Select(p => p.ProductName);
string productNames = string.Join(",", names);
Alternatively, you can query your array of products directly. Because arrayofProduct is declared as the non-generic Array, cast its elements to Product first:
string productNames = string.Join(",", arrayofProduct.Cast<Product>().Select(x => x.ProductName));
I'm trying to concatenate a number of characters corresponding to some ints (the first 15 ASCII characters for example):
;with cte as (
select 1 nr
union all
select nr + 1
from cte
where nr <= 15)
select (
select char(nr)
from cte
for xml path (''), type).value('.', 'nvarchar(max)')
option (maxrecursion 0)
but I'm getting an error saying:
Msg 6841, Level 16, State 1, Line 1
FOR XML could not serialize the
data for node 'NoName' because it contains a character (0x0001) which
is not allowed in XML. To retrieve this data using FOR XML, convert it
to binary, varbinary or image data type and use the BINARY BASE64
directive.
Even if I try to modify my CTE's seed from 1 to 10 for example, I still get the error but for a different character, 0x000B.
I have two possible solutions I'm looking for:
find a way to concatenate all the characters (any method other than FOR XML) - preferred solution
or
remove all characters that are not allowed in XML - I've tried this but it seems I just hit other non-allowed characters. I've also looked for a list of these non-allowed characters but I couldn't find one.
Any help is very much appreciated.
Update - context:
This is part of a bigger CTE where I'm trying to generate random character sets from random numbers by doing multiple divisions and modulus operations.
I take each number modulo 256, turn the result into its corresponding CHAR(), then divide the number by 256, and so on until its modulo or quotient is 0.
In the end I want to concatenate all of these characters. I have everything in place, I'm just encountering this error which does not allow me to concatenate the generated strings from CHAR().
This might sound weird and you might say that it's not a SQL-task and you can do it in other languages, but I want to try and find a solution in SQL, no matter how low the performance is.
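The decomposition described above can be sketched as follows. This is in Java rather than T-SQL, purely to illustrate the arithmetic. The class and method names are hypothetical, and the cast to char is only an approximation of SQL Server's CHAR(), whose results for values 128 to 255 depend on the database code page:

```java
// Hypothetical helper: splits a non-negative number into base-256 digits
// and maps each digit to a character, least significant digit first.
final class Base256Chars {
    private Base256Chars() {}

    static String toChars(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        // Stop once the quotient reaches 0, as in the description above.
        while (n > 0) {
            sb.append((char) (n % 256)); // current base-256 digit as a character
            n /= 256;                    // move on to the next digit
        }
        return sb.toString();
    }
}
```

For example, 16961 equals 65 + 66 * 256, so toChars(16961) yields "AB". Digits below 32 produce the control characters that trigger the FOR XML error.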
XML PATH is just one of the techniques used for grouped concatenation. Aaron Bertrand explains and compares all of them in Grouped Concatenation in SQL Server. Built-in support for this is coming in the next version of SQL Server in the form of STRING_AGG.
Bertrand's article explains that XML PATH can only work with XML-safe characters. Non-printable characters like 0x1 (SOH) and 0xB (Vertical Tab) won't work without XML-encoding the data first. Typically, this isn't a problem because real data doesn't contain non-printable characters - what would a SOH or VT look like in a text box?
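As for the list of allowed characters: XML 1.0 permits only the code points #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The sketch below shows that check and a filter built on it, in Java for illustration only. The class and method names are hypothetical, and it is an assumption that FOR XML applies exactly this list, although that would be consistent with the errors reported for 0x0001 and 0x000B:

```java
// Hypothetical helper: checks code points against the XML 1.0 Char production.
final class XmlChars {
    private XmlChars() {}

    static boolean isAllowed(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point removed.
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(XmlChars::isAllowed)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

Under this rule, tab, line feed and carriage return are the only characters below 0x20 that survive, which is why moving the seed from 1 to 10 only shifts the error to the next disallowed character.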
Perhaps the easiest way to solve your problem is to use NCHAR() instead of CHAR() to generate Unicode characters, and to start from 32 instead of 0 or 1.
For now, the fastest and safest method to aggregate strings is to use a SQLCLR custom aggregate. If you don't use sloppy techniques like concatenating strings directly, it will also consume the least amount of memory. The various GROUP_CONCAT implementations shown in this project are small enough to copy and use in your own projects. They also work with any Unicode character, even non-printable ones.
BTW, SQL Server vNext brings STRING_AGG to aggregate strings. We'll just have to wait a year or two.
The non-ordered version, GROUP_CONCAT, is just 99 lines. It simply collects all strings in a dictionary and writes them out at the end:
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Collections.Generic;
using System.Text;
namespace GroupConcat
{
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,
MaxByteSize = -1,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
IsNullIfEmpty = true)]
public struct GROUP_CONCAT : IBinarySerialize
{
private Dictionary<string, int> values;
public void Init()
{
this.values = new Dictionary<string, int>();
}
public void Accumulate([SqlFacet(MaxSize = 4000)] SqlString VALUE)
{
if (!VALUE.IsNull)
{
string key = VALUE.Value;
if (this.values.ContainsKey(key))
{
this.values[key] += 1;
}
else
{
this.values.Add(key, 1);
}
}
}
public void Merge(GROUP_CONCAT Group)
{
foreach (KeyValuePair<string, int> item in Group.values)
{
string key = item.Key;
if (this.values.ContainsKey(key))
{
this.values[key] += Group.values[key];
}
else
{
this.values.Add(key, Group.values[key]);
}
}
}
[return: SqlFacet(MaxSize = -1)]
public SqlString Terminate()
{
if (this.values != null && this.values.Count > 0)
{
StringBuilder returnStringBuilder = new StringBuilder();
foreach (KeyValuePair<string, int> item in this.values)
{
for (int value = 0; value < item.Value; value++)
{
returnStringBuilder.Append(item.Key);
returnStringBuilder.Append(",");
}
}
return returnStringBuilder.Remove(returnStringBuilder.Length - 1, 1).ToString();
}
return null;
}
public void Read(BinaryReader r)
{
int itemCount = r.ReadInt32();
this.values = new Dictionary<string, int>(itemCount);
for (int i = 0; i <= itemCount - 1; i++)
{
this.values.Add(r.ReadString(), r.ReadInt32());
}
}
public void Write(BinaryWriter w)
{
w.Write(this.values.Count);
foreach (KeyValuePair<string, int> s in this.values)
{
w.Write(s.Key);
w.Write(s.Value);
}
}
}
}
Just another approach (works with non-printables too):
You are appending one character after another, so you do not need any grouped concatenation at all. Your recursive (or rather iterative) CTE is a hidden RBAR in its own right and will do this for you.
The following example uses a list of ints (considering your use case where you need to do this with random numbers) as input:
DECLARE @SomeInts TABLE(ID INT IDENTITY,intVal INT);
INSERT INTO @SomeInts VALUES(36),(33),(39),(32),(35),(37),(1),(2),(65);
WITH cte AS
(
SELECT ID,intVal AS nr,CAST(CHAR(intVal) AS VARCHAR(MAX)) AS targetString FROM @SomeInts WHERE ID=1
UNION ALL
SELECT si.ID,intVal + 1,targetString + CHAR(intVal)
FROM @SomeInts AS si
INNER JOIN cte ON si.ID=cte.ID+1
)
SELECT targetString, CAST(targetString AS varbinary(max))
FROM cte
option (maxrecursion 0);
The result (printed and as growing hex list --> beware of x01 and x02):
I have a string array like this.
string[] ColumnArray = new string[] { "First story", "second data", "third way" };
The following is the LINQ query on this array.
string query = (from x in ColumnArray
where x.Contains("Story")
select x).First();
But sometimes the query will be like this.
string query = (from x in ColumnArray
where ( x.Contains("Story") || x.Contains("View"))
select x).First();
The condition should be added dynamically, so how can Dynamic LINQ help here?
I have written something like this.
string dynamiccondition= // some condition.
var query = (from x in ColumnArray.AsEnumerable().AsQueryable().Where(dynamiccondition).Select(x));
// but this is not working.
Any suggestion?
In Dynamic LINQ you can use logical operators such as AND (&&) and OR (||), so try something like this:
string dynamiccondition="it.Contains(\"Story\") OR it.Contains(\"View\")";
var query = ColumnArray.AsQueryable()
.Where(dynamiccondition);
I have a straightforward query like this:
Select emp_Name from Employee where empID=123
How can I get the result as a single String instead of a List<String>, given that this query will return only one value?
Change the return type of the method in your mapper interface to String. It will then return the single value, or null if there is no matching entry.