How to download more than 100MB of data into CSV from a Snowflake database table - snowflake-cloud-data-platform

Is there a way to download more than 100MB of data from Snowflake into Excel or CSV?
I'm able to download up to 100MB through the UI by clicking the 'Download or View Results' button.

You'll need to consider using what we call an "unload", a.k.a. COPY INTO <location>,
which is documented here:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html
Another option is to use a different type of client, such as a Python script or similar; a minimal sketch follows.
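For example, here is a minimal sketch of that Python route using the snowflake-connector-python package (the account, credentials, and table name below are placeholders, not values from the question):
import csv
import snowflake.connector  # assumes snowflake-connector-python is installed

# Hypothetical connection details; substitute your own.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_warehouse",
)
cur = conn.cursor()
try:
    cur.execute("SELECT * FROM my_db.my_schema.my_table")  # placeholder query
    with open("/tmp/output.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        for row in cur:  # rows stream from the server, so no 100MB UI limit
            writer.writerow(row)
finally:
    cur.close()
    conn.close()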
I hope this helps...Rich
.....EDITS AS FOLLOWS....
Using the unload (COPY INTO <location>) isn't quite as overwhelming as it may appear, and if you can use the SnowSQL client (instead of the web UI) you can "grab" the files from what we call an "INTERNAL STAGE" fairly easily. An example follows.
CREATE TEMPORARY STAGE my_temp_stage;

COPY INTO @my_temp_stage/output_filex
FROM (SELECT * FROM databaseNameHere.SchemaNameHere.tableNameHere)
FILE_FORMAT = (
    TYPE = 'CSV'
    COMPRESSION = GZIP
    FIELD_DELIMITER = ','
    ESCAPE = NONE
    ESCAPE_UNENCLOSED_FIELD = NONE
    DATE_FORMAT = 'AUTO'
    TIME_FORMAT = 'AUTO'
    TIMESTAMP_FORMAT = 'AUTO'
    BINARY_FORMAT = UTF8
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    NULL_IF = ('')
    EMPTY_FIELD_AS_NULL = FALSE
)
OVERWRITE = TRUE
SINGLE = FALSE
MAX_FILE_SIZE = 5368709120
HEADER = TRUE;

ls @my_temp_stage;

GET @my_temp_stage file:///tmp/;
This example:
Creates a temporary stage object in Snowflake, which will be discarded when you close your session.
Takes the results of your query and unloads them into one or more gzipped CSV files in that internal temporary stage, depending on the size of your output. Notice that I didn't create a separate database object called a FILE FORMAT; doing so is considered a best practice, but you can run one-off extracts like this without it if you don't mind the command being this long.
Lists the files in the stage, so you can see what was created.
Pulls the files down using GET. In this case it was run on my Mac and the file(s) were placed in /tmp; if you are using Windows you will need to modify the local path slightly.

Related

How to Import Just The FileName into SQL via SSIS

I am trying to write a file name to a table in my database - at the moment all I am achieving is importing the whole file path.
I have a Foreach Loop whose Collection looks in a specific folder for a specific file type (the retrieved file name is fully qualified).
This has a variable mapping to "ImportInvoiceFilePath".
Within that is a Data Flow Task which includes the flat file source and a derived column that writes the file path to the database.
This works fine - but what I am trying really hard to work out is: how do I get just the file name (no extension) to write to the database as well?
Literally worked it out. Set my Foreach Loop to Name Only, then in the connection to my source file, under Expressions, put:
@[User::ProcessingInvoiceFilePath] + "\\" + @[User::ImportInvoiceFileName] + ".saf"
where .saf is the file type.
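For what it's worth, the name-only extraction the derived column performs is just a path operation; here is a minimal Python sketch of the same idea (the path below is hypothetical):
from pathlib import Path

full_path = r"C:\Invoices\INV001.saf"  # hypothetical fully qualified path
print(Path(full_path).name)  # file name with extension: INV001.saf
print(Path(full_path).stem)  # file name only, no extension: INV001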

Auto-generating destinations of split files in SSIS

I am working on my first SSIS package. I have a view with data that looks something like:
Loc  Data
1    asd
1    qwe
2    zxc
3    jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a conditional split to flat file destinations, but that would require a destination for each location. I have a lot of locations, and they will all be handled the same way other than being split into different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use a Script Component as a workaround?
You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL Task that returns a single-row result set, and a Foreach Loop container that iterates over every row of your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
When your package runs, the expression will evaluate to:
"C:\\Documents\\MyPath\\location" + @[User::LocationColumn] + ".txt"
It should end up feeding your directory with one file per location.
Set User::LocationColumn equal to the location column in your result set, and write your query to group by location, so all records for a location are written to a single file.
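For reference, outside SSIS the same per-location split can be sketched in a few lines of Python, with made-up rows matching the question's (Loc, Data) shape:
import csv
from collections import defaultdict

rows = [("1", "asd"), ("1", "qwe"), ("2", "zxc"), ("3", "jkl")]  # sample data

# Group rows by Loc, mirroring the conditional-split / per-location idea.
grouped = defaultdict(list)
for loc, data in rows:
    grouped[loc].append((loc, data))

# One output file per distinct Loc value: Loc1.txt, Loc2.txt, ...
for loc, records in grouped.items():
    with open(f"Loc{loc}.txt", "w", newline="") as f:
        csv.writer(f).writerows(records)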
I spent some time trying to complete this task using the method @Phoenix suggested, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements (one for each location, plus an extra one to grab the distinct locations), but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string with an expression that uses a variable, and change that variable for each location.
By varying the variable, the destination file also changes.
The connection string expression is:
"C:\\Documents\\ABC\\Files\\" + @[User::data] + ".txt"
Vote for this if it helps you.

How to write to CSV in Spark

I'm trying to find an effective way of saving the result of my Spark job as a CSV file. I'm using Spark with Hadoop, and so far all my files are saved as part-00000.
Any ideas how to make Spark save to a file with a specified file name?
Since Spark uses the Hadoop File System API to write data to files, this is sort of inevitable. If you do
rdd.saveAsTextFile("foo")
it will be saved as "foo/part-XXXXX", with one part-* file per partition in the RDD you are trying to save. The reason each partition in the RDD is written as a separate file is fault tolerance: if the task writing the 3rd partition (i.e. part-00002) fails, Spark simply re-runs the task and overwrites the partially written/corrupted part-00002, with no effect on the other parts. If they all wrote to the same file, it would be much harder to recover a single task from failure.
The part-XXXXX files are usually not a problem if you are going to consume them again in Spark or Hadoop-based frameworks, because they all use the HDFS API: if you ask them to read "foo", they will read all the part-XXXXX files inside foo as well.
I'd suggest doing it this way (Java example):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Write everything to a single partition, then merge to one destination file.
theRddToPrint.coalesce(1, true).saveAsTextFile(textFileName);
FileSystem fs = anyUtilClass.getHadoopFileSystem(rootFolder);
FileUtil.copyMerge(
    fs, new Path(textFileName),
    fs, new Path(textFileNameDestiny),
    true, fs.getConf(), null);
Extending Tathagata Das's answer to Spark 2.x and Scala 2.11:
using Spark SQL we can do this in a one-liner.
// implicits for magic functions like .toDF
import spark.implicits._
val df = Seq(
("first", 2.0),
("choose", 7.0),
("test", 1.5)
).toDF("name", "vals")
// write the DataFrame/Dataset to external storage
df.write
.format("csv")
.save("csv/file/location")
Then you can go ahead and proceed with adoalonso's answer.
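If you're on PySpark rather than Scala, here is a rough sketch of the same write plus the single-file rename that adoalonso's answer describes (df is assumed to be an existing DataFrame, the paths are made up, and the rename assumes a local filesystem rather than HDFS):
import glob
import shutil

# Collapse to one partition so only a single part-*.csv file is produced.
df.coalesce(1).write.option("header", True).csv("/tmp/out_dir")

# Spark still names the output part-*, so rename it to the name you want.
part_file = glob.glob("/tmp/out_dir/part-*.csv")[0]
shutil.move(part_file, "/tmp/result.csv")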
I have an idea, but no ready code snippet. Internally (as the name suggests) Spark uses the Hadoop output format (as well as an InputFormat when reading from HDFS).
In Hadoop's FileOutputFormat there is a protected member setOutputName, which you can call from an inherited class to set a different base name.
It's not really a clean solution, but inside a foreachRDD() you can basically do whatever you like, including creating a new file.
In my solution this is what I do: I save the output on HDFS (for fault-tolerance reasons), and inside a foreachRDD I also create a TSV file with statistics in a local folder.
I think you could probably do the same if that's what you need.
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html#output-operations
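A rough PySpark Streaming sketch of that pattern (the stream, the statistic, and both paths are made up):
def write_stats(time, rdd):
    # Save the full batch on HDFS for fault tolerance...
    rdd.saveAsTextFile(f"hdfs:///output/batch-{time}")
    # ...and append a one-line TSV statistic to a local file.
    with open("/tmp/stats.tsv", "a") as f:
        f.write(f"{time}\t{rdd.count()}\n")

stream.foreachRDD(write_stats)  # stream is an assumed existing DStream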

AS400: How to know which program created a file?

I am not an expert on AS400; I just know some commands, and I exported some files from AS400 (iSeries) into SQL Server 2005.
I actually need to know which RPG program created a file in a library, because that file contains statistical data drawn from other files stored in other AS400 libraries.
This screenshot shows the file STTMVF in the library DAT_4DWH (via DSPLIB DAT_4DWH).
So is there a command that lets me know which RPG program created the file STTMVF?
If yes, I need to open the RPG or CL source and try to understand which physical files are used to compose this statistics file.
Thanks in advance!
You can use journal management or program references to determine what is writing to the file.
Journal management
Starting the journal
To create a basic journal you need to create a journal receiver, create a journal, and activate journaling for the file. Replace RECEIVER-LIB, RECEIVER-FILE, JOURNAL-LIB, JOURNAL-FILE, FILE-LIB and FILE with values appropriate for your system.
CRTJRNRCV JRNRCV(RECEIVER-LIB/RECEIVER-FILE)
CRTJRN JRN(JOURNAL-LIB/JOURNAL-FILE) JRNRCV(RECEIVER-LIB/RECEIVER-FILE)
STRJRNPF FILE(FILE-LIB/FILE) JRN(JOURNAL-LIB/JOURNAL-FILE) OMTJRNE(*OPNCLO)
Dumping the journal
DSPJRN JRN(JOURNAL-LIB/JOURNAL-FILE) FILE(FILE-LIB/FILE) RCVRNG(*CURCHAIN) JRNCDE(R) ENTTYP(PT PX DL UP) OUTPUT(*OUTFILE) OUTFILFMT(*TYPE1) OUTFILE(QTEMP/QADSPJRN)
Querying the journal
The field JOPGM will contain the program name that inserted, updated, or deleted records from the file.
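A hypothetical sketch of that query from Python over ODBC; the DSN is a placeholder, and since QTEMP is job-scoped you would point the DSPJRN OUTFILE at a permanent library (here called MYLIB) before querying from a separate connection:
import pyodbc  # assumes the IBM i Access ODBC driver is configured

conn = pyodbc.connect("DSN=MYAS400")  # hypothetical data source name
# JOPGM holds the program that inserted, updated, or deleted records.
for row in conn.execute("SELECT DISTINCT JOPGM FROM MYLIB.QADSPJRN"):
    print(row.JOPGM)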
Removing the journal
ENDJRNPF FILE(FILE-LIB/FILE)
DLTJRN JRN(JOURNAL-LIB/JOURNAL-FILE)
Program references
Dumping the references
DSPPGMREF PGM(*ALLUSR/*ALL) OUTPUT(*OUTFILE) OUTFILE(QTEMP/QADSPPGM)
Querying the references
Search the file for all references where the field WHFNAM equals FILE. The field WHPNAM will contain the program name. Due to file overrides, etc this method is not as accurate as using a journal.
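The same ODBC sketch works for the references dump (again assuming the OUTFILE went to a hypothetical permanent library MYLIB instead of QTEMP):
import pyodbc

conn = pyodbc.connect("DSN=MYAS400")  # hypothetical data source name
# WHPNAM is the referencing program; WHFNAM is the referenced file.
sql = "SELECT DISTINCT WHPNAM FROM MYLIB.QADSPPGM WHERE WHFNAM = ?"
for row in conn.execute(sql, "STTMVF"):
    print(row.WHPNAM)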

Read a value from a .xls file using .bat files

I just want to know if there is any way to read a value from an .xls file using a .bat file.
For example: I have an .xls file named test.xls with two columns, 'EID' and 'Mail ID'. When we give the .bat file an EID as input, it should extract the mail ID which corresponds to that EID and echo the result.
EID      MailID
E22222   MynameisA@company.com
E33333   MynameisB@company.com
...
...
So, per the table above, when I give my .bat file the input E22222, it should read the corresponding mail ID MynameisA@company.com and echo that value.
I hope I have presented my question clearly. Please get back to me for any clarifications.
Thanks and regards
Maddy
There is no facility to do this directly with traditional .bat files. However, you might investigate PowerShell, which is designed to be able to do this sort of thing. PowerShell integrates well with existing Windows applications (such as Excel) and may provide the tools you need to do this easily.
A quick search turned up this example of reading Excel files from PowerShell.
You can't do this directly from a batch file. Furthermore, to manipulate Excel files in scripting you need Excel to be installed.
What you can do is wrap the Excel-specific stuff in a VBScript and call that from your batch file.
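As another non-batch route: if Python is available, the lookup is a few lines, assuming the xlrd package (which reads legacy .xls files) and the two-column layout from the question:
import sys
import xlrd  # assumes the xlrd package is installed

eid = sys.argv[1]  # e.g. E22222, passed in from the .bat file
book = xlrd.open_workbook("test.xls")
sheet = book.sheet_by_index(0)
for r in range(1, sheet.nrows):  # row 0 is the header
    if sheet.cell_value(r, 0) == eid:
        print(sheet.cell_value(r, 1))  # echo the matching mail ID
        break
The batch file can then capture the echoed value, e.g. with for /f %%i in ('python lookup.py E22222') do set MAILID=%%i.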
You can do it with Alacon - a command-line utility for the Alasql database.
It works with Node.js, so you need to install Node.js and then the Alasql package.
To take data from an Excel file you can use the following command:
> node alacon "SELECT VALUE [mail ID] FROM XLS('mydata.xls', {headers:true}) WHERE EID = ?" "E22222"
The first parameter is a SQL expression which reads data from the XLS file (with headers) and searches for the value given in the second parameter, "E22222". The command returns the mail ID value.
This will be hard (very close to impossible) in a .bat file, especially when using the original XLS file. Even after an export to CSV, it will be much easier to use a script/programming language (Perl, C, whatever) to do this.
