Ingest unstructured file into Snowflake table

I have a file with 200 rows. When I load the file into a Snowflake table it produces 200 rows, but what I want is a single row containing the data of all 200 lines.
create or replace table sample_test_single_col (
    LOADED_AT timestamp,
    FILENAME string,
    single_col varchar(2000)
);
COPY INTO sample_test_single_col
from (
    SELECT
        CURRENT_TIMESTAMP as LOADED_AT,
        METADATA$FILENAME as FILENAME,
        s.$1 as single_col
    from @%table_stage s
)
file_format = (type = csv);
Input:-
From:- robert
Sent: Thursday, August 03, 2006 1:15 PM
To: Jerry
Subject: RE: Latest news
All documents are scanned.
Desired output:-
Row LOADED_AT FILENAME SINGLE_COL
1 06-06-2022 03:14 #table_stage/filename.csv From:- robert
Sent: Thursday, August 03, 2006 1:15 PM
To: Jerry
Subject: RE: Latest news
All documents are scanned.
Current Output:-
Row LOADED_AT FILENAME SINGLE_COL
1 06-06-2022 03:14 #table_stage/filename.csv From:- robert
2 06-06-2022 03:14 #table_stage/filename.csv Sent: Thursday, August 03, 2006 1:15 PM
3 06-06-2022 03:14 #table_stage/filename.csv To: Jerry
4 06-06-2022 03:14 #table_stage/filename.csv Subject: RE: Latest news
5 06-06-2022 03:14 #table_stage/filename.csv All documents are scanned.
Any help will be appreciated!!

The default value of the RECORD_DELIMITER parameter when loading data is the new line character. That is why each line becomes a new row when you load the file.
You can set the parameter to a sequence that you don't expect to appear in your file:
COPY INTO sample_test_single_col
from (
    SELECT
        CURRENT_TIMESTAMP as LOADED_AT,
        METADATA$FILENAME as FILENAME,
        s.$1 as single_col
    from @mystage s
)
file_format = (type = csv RECORD_DELIMITER = 'NONEXISTENT');
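A side note not covered in the original answer: because the file format is still type = csv, the default FIELD_DELIMITER of ',' also applies, so a comma inside the text would split each record into several columns and s.$1 would only capture the text up to the first comma. If that is a concern, adding FIELD_DELIMITER = NONE to the same file_format clause should keep the whole record in the single column.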

Related

How to filter Django querysets with multiple aggregations

Let's say I have a Django model table
class Table(models.Model):
    name = models.CharField(max_length=100)                             # max_length is required (value assumed)
    date_created = models.DateTimeField()
    total_sales = models.DecimalField(max_digits=10, decimal_places=2)  # both arguments are required (values assumed)
some data for context
Name    date_created    total_sales
a       2020-01-01      200
b       2020-02-01      300
c       2020-04-01      400
...     ...             ...
c       2020-12-01      1000
c       2020-12-12      500
Now I want to filter an aggregate of:
total_yearly_sales = 10500
total_monthly_sales = 1500 (the current month being December)
total_daily_sales = 500
and also do a group by on name:
models.Table.objects.values('name').annotate(Sum('total_sales')).order_by()
I want to do this in one query (one DB hit).
Hence the query should generate
total_yearly_sales
total_monthly_sales
total_daily_sales
total_sales_grouped_by_name, i.e. {a: 200, b: 300, c: 1900}
I know this is too much to ask. Hence let me express my immense gratitude and thanks for having a look at this.
cheers
I can generate the above queries individually like so:
today = timezone.now().date()
todays_sales = models.Table.objects.filter(date_created__date__gte=today, date_created__date__lte=today).aggregate(Sum('total_sales'))
=> 500
monthly_sales = models.Table.objects.filter(date_created__year=today.year, date_created__month=today.month).aggregate(Sum('total_sales'))  # this month
=>10500
total_yearly_sales = models.Table.objects.filter(date_created__year=today.year).aggregate(Sum('total_sales')) => 10500
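One way to get the three scalar totals in a single query (a sketch, not from the original post, assuming Django 2.0+ where aggregate functions accept a filter argument, and the Table model above):

from django.db.models import Q, Sum
from django.utils import timezone

today = timezone.now().date()

# One DB hit: each Sum is restricted by its own filter
totals = models.Table.objects.aggregate(
    total_yearly_sales=Sum('total_sales', filter=Q(date_created__year=today.year)),
    total_monthly_sales=Sum('total_sales', filter=Q(date_created__year=today.year,
                                                    date_created__month=today.month)),
    total_daily_sales=Sum('total_sales', filter=Q(date_created__date=today)),
)

The grouped-by-name totals return one row per name rather than a single scalar, so they still need their own queryset, e.g. models.Table.objects.values('name').annotate(Sum('total_sales')).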

Filesystem connector produces no output & StreamExecutionEnvironment.execute() throws "no operators defined"

I have the following code that is meant to write data generated by the datagen connector to a file, but when I run the application no target directory is created and no data is written.
When I add env.execute() at the end of the code, it complains: "No operators defined in streaming topology. Cannot execute."
How can I make the application work? Thanks.
test("insert into table") {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
val tenv = StreamTableEnvironment.create(env)
val ddl =
"""
create temporary table abc(
name STRING,
age INT
) with (
'connector' = 'datagen'
)
""".stripMargin(' ')
tenv.executeSql(ddl)
val sql =
"""
select * from abc
""".stripMargin(' ')
val sinkDDL =
s"""
create temporary table xyz(
name STRING,
age INT
) with (
'connector' = 'filesystem',
'path' = 'D:\\${System.currentTimeMillis()}-csv' ,
'format' = 'csv'
)
""".stripMargin(' ')
tenv.executeSql(sinkDDL)
val insertInSQL =
"""
insert into xyz
select name, age from abc
""".stripMargin(' ')
tenv.executeSql(insertInSQL)
// env.execute()
}
I think you should have a UDF in the table execution; see
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/functions/udfs.html#table-functions
You can look at the example there, write the function, and insert it into your SQL pipeline; this acts as the "operator" referred to in your error message.
I think it actually works, just not when we think it does :)
I've tested this with the Blink planner in Flink 1.12:
"org.apache.flink" %% "flink-table-planner-blink" % flinkVersion % "provided"
Calling env.execute() on the StreamExecutionEnvironment at the end is actually not required, since each .executeSql() earlier in the program already submits an asynchronous job. The sink in your code is attached to one of those jobs and not to the job that env.execute() would launch (which in this case is an empty job, triggering the error you mention). I found a clue about that in this response on the mailing list.
When I run the code in the question (with the Blink planner, and adapting the output to 'path' = '/tmp/hello-flink-${System.currentTimeMillis()}-csv' on my host), I see several hidden files progressively being created. I'm guessing they are similarly hidden on your Windows host (files starting with a . below are hidden on Linux):
ls -ltra /tmp/hello-flink-1609574239647-csv
total 165876
drwxrwxrwt 40 root root 12288 Jan 2 08:57 ..
-rw-rw-r-- 1 svend svend 134217771 Jan 2 08:59 .part-393f5557-894a-4396-bdf9-c7813fdd1d75-0-0.inprogress.48863a2b-f022-401b-95e3-659ec4920162
drwxrwxr-x 2 svend svend 4096 Jan 2 08:59 .
-rw-rw-r-- 1 svend svend 35616014 Jan 2 08:59 .part-393f5557-894a-4396-bdf9-c7813fdd1d75-0-1.inprogress.3412bcb0-d30d-43be-819b-1acf26a0a8bb
What's happening is simply that the rolling policy of the FileSystem SQL connector is by default waiting much longer before committing the files; in streaming mode, in-progress part files are only finalized when a checkpoint completes, which is why the checkpointing interval is configured below.
If you start your code from the IDE, you can adapt the creation of the environment as follows (this would normally be done in conf/flink-conf.yaml):
val props = new Properties
props.setProperty("execution.checkpointing.interval", "10000") // 10000 ms
val conf = ConfigurationUtils.createConfiguration(props)
val fsEnv = StreamExecutionEnvironment.createLocalEnvironment(1, conf)
and use a small file size in the output connector:
create temporary table xyz(
    name STRING,
    age INT
) with (
    'connector' = 'filesystem',
    'path' = '/tmp/hello-flink-${System.currentTimeMillis()}-csv',
    'format' = 'csv',
    'sink.rolling-policy.file-size' = '1Mb'
)
And CSV files should now be committed much faster:
ls -ltra hello-flink-1609575075617-csv
total 17896
-rw-rw-r-- 1 svend svend 1048669 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-0
-rw-rw-r-- 1 svend svend 1048644 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-1
-rw-rw-r-- 1 svend svend 1048639 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-2
-rw-rw-r-- 1 svend svend 1048676 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-3
-rw-rw-r-- 1 svend svend 1048680 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-4
-rw-rw-r-- 1 svend svend 1048642 Jan 2 09:11 part-a6158ce5-25ea-4361-be11-596a67989e4a-0-5

Employee in/out time calculation using Crystal Reports

I have employee attendance software and data like this:
emp id   Date       Time
1        15/06/16   08:00 12:30 01:00 08:00
2        15/06/16   08:00 12:30 01:00 07:30
How do I calculate the total hours worked in the day using Crystal Reports?
For example:
emp id 1 on 15/06/16 worked a total of 12 hours, and
emp id 2 on 15/06/16 worked a total of 11:30 hours.
Try this:
Create a formula:
Split(Totext({database.Time})," ")[1] & " " & "in time" & ChrW(13) &
Split(Totext({database.Time})," ")[2] & " " & "out time" & ChrW(13) &
Split(Totext({database.Time})," ")[3] & " " & "in time" & ChrW(13) &
Split(Totext({database.Time})," ")[4] & " " & "out time"
Place in detail section
Edit..............
Create a formula with the code below and place it after the database field.
Note: {Database.fieldname} contains the whole string 08:00 12:30 01:00 08:00.
Numbervar starthour;
Numbervar startminute;
Numbervar endhour;
Numbervar endminute;
Numbervar Finalhour;
Numbervar Finalminute;

// Assumes {Database.fieldname} holds four space-separated times: in1 out1 in2 out2
// First pair: out time [2] minus in time [1]
starthour   := ToNumber(Split(Split({Database.fieldname}," ")[2],":")[1]) - ToNumber(Split(Split({Database.fieldname}," ")[1],":")[1]);
startminute := ToNumber(Split(Split({Database.fieldname}," ")[2],":")[2]) - ToNumber(Split(Split({Database.fieldname}," ")[1],":")[2]);
// Second pair: out time [4] minus in time [3]
endhour     := ToNumber(Split(Split({Database.fieldname}," ")[4],":")[1]) - ToNumber(Split(Split({Database.fieldname}," ")[3],":")[1]);
endminute   := ToNumber(Split(Split({Database.fieldname}," ")[4],":")[2]) - ToNumber(Split(Split({Database.fieldname}," ")[3],":")[2]);

// Carry minutes over into hours
if (startminute + endminute) >= 60 then
(
    Finalhour   := (starthour + endhour) + 1;
    Finalminute := (startminute + endminute) - 60
)
else
(
    Finalhour   := (starthour + endhour);
    Finalminute := (startminute + endminute)
);

Finalhour & ":" & Finalminute

Date format issue in Firefox browser

My code
for(n in data.values){
data.values[n].snapshot = new Date(data.values[n].snapshot);
data.values[n].value = parseInt(data.values[n].value);
console.log(data.values[n].snapshot);
}
Here console.log shows the correct date in Chrome as 'Thu Aug 07 2014 14:29:00 GMT+0530 (India Standard Time)', but in Firefox it shows 'Invalid Date'.
If I console.log(data.values[n].snapshot) before the new Date line, it shows the date as
2014-08-07 14:29
How can I convert the date to a format Firefox understands?
The Date object only officially accepts two formats:
Mon, 25 Dec 1995 13:30:00 GMT
2011-10-10T14:48:00
This means that your date 2014-08-07 14:29 is invalid.
Your date can be easily made compatible with the second date format though (assuming that date is yyyy-mm-dd hh:mm):
for (var n in data.values) {
    // Apply the replacement to the date string itself, turning "2014-08-07 14:29" into "2014-08-07T14:29"
    data.values[n].snapshot = new Date(data.values[n].snapshot.replace(/\s/g, "T"));
    data.values[n].value = parseInt(data.values[n].value, 10);
    console.log(data.values[n].snapshot);
}

Smarty - scope of dates

I'm writing Smarty3 code that shows the expected delivery date of a product depending on the current time and the day of the week. An order placed within the time window on the right is delivered on the day on the left:
Tuesday   -> [14:00 Friday, 14:00 Monday)
Wednesday -> [14:00 Monday, 14:00 Tuesday)
Thursday  -> [14:00 Tuesday, 14:00 Wednesday)
Friday    -> [14:00 Wednesday, 14:00 Thursday)
Monday    -> [14:00 Thursday, 14:00 Friday)
I'm thinking of storing the list in an array where the key is the day of the week and the value is the range.
How can I check whether the current server date falls within any of the ranges, and how should I store the ranges in the array so that I can check them?
In PHP (shift back by the 14:00 cut-off, add two days, then skip over the weekend):
$expected = time() - 14*3600 + 2*86400;
if (date('w', $expected) == 6) {   // Saturday: push to Sunday
    $expected += 86400;
}
if (date('w', $expected) == 0) {   // Sunday: push to Monday
    $expected += 86400;
}
print date('l', $expected);
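For example (my own walk-through, not part of the original answer): an order placed Thursday at 15:00 becomes Thursday 01:00 after subtracting 14 hours, Saturday 01:00 after adding two days, and is then pushed past Saturday and Sunday to Monday, which matches the Monday row of the table above (orders from 14:00 Thursday to 14:00 Friday).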
In Smarty:
{$expected = $smarty.now - 14*3600 + 2*86400}
{if date('w',$expected) == 6}
    {$expected = $expected + 86400}
{/if}
{if date('w',$expected) == 0}
    {$expected = $expected + 86400}
{/if}
{$expected|date_format:"%A"}
