I use a working IronPython script to export all the columns of a Spotfire table to an .xls file.
I want to update the script below so that it exports only the “Selected columns” (see picture) defined in the Columns section of the Table Properties.
With more than 1,500 columns across 50 tables, hard-coding/predefining the list of columns to export is not an option. If I change the column selection in the Table Properties, only those selected columns must be exported.
In the example I would like my Test.xls file to contain only the three columns “studyid”, “etcd” and “element” of my TE table.
With the script below, the “domain” and “tesrcdtc” columns are also exported.
IronPython script:
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, Directory
from System.Collections.Generic import List
from Spotfire.Dxp.Data import *
from Spotfire.Dxp.Application.Visuals import *
DataTable = Document.ActiveDataTableReference
Rows = Document.ActiveFilteringSelectionReference.GetSelection(DataTable).AsIndexSet()
writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
stream = File.OpenWrite("C:/Export/Test.xls")
Cols = []
for col in DataTable.Columns:
    Cols.append(col.Name)
writer.Write(stream, DataTable, Rows, Cols)
stream.Close()
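The suggested fix is to replace the loop that collects every column name with an explicit list of the wanted columns: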
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, Directory
from System.Collections.Generic import List
from Spotfire.Dxp.Data import *
from Spotfire.Dxp.Application.Visuals import *
DataTable = Document.ActiveDataTableReference
Rows = Document.ActiveFilteringSelectionReference.GetSelection(DataTable).AsIndexSet()
writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
stream = File.OpenWrite("C:/Export/Test.xls")
Cols = ["studyid", "etcd", "element"] # Instead of adding all the columns, just add the ones you want
writer.Write(stream, DataTable, Rows, Cols)
stream.Close()
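Hard-coding conflicts with the requirement above, though. An alternative is to read the column list from the Table visualization itself, so whatever is currently selected under Table Properties is what gets exported. A minimal sketch, assuming a script parameter vis pointing at the Table visualization; the TablePlot property names (TableColumns, DataColumnReference) are from memory of the Spotfire API and may differ by version:
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from Spotfire.Dxp.Application.Visuals import TablePlot
from System.IO import File

table = vis.As[TablePlot]()  # 'vis' is an assumed script parameter of type Visualization
DataTable = table.Data.DataTableReference
Rows = Document.ActiveFilteringSelectionReference.GetSelection(DataTable).AsIndexSet()
writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
stream = File.OpenWrite("C:/Export/Test.xls")

# Only the columns currently shown in the table visualization
Cols = [c.DataColumnReference.Name for c in table.TableColumns]

writer.Write(stream, DataTable, Rows, Cols)
stream.Close()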
My company is attempting to use Snowflake named internal stages as a data lake to store vendor extracts.
One vendor provides an extract of 1,000+ columns in a pipe-delimited .dat file. This is a canned report that they extract; the column names will always remain the same, but the column positions can change over time without warning.
Based on my research, a user can only query a file in a named internal stage using the following syntax:
-- problematic because the order of the columns can change
select t.$1, t.$2 from @mystage1 (file_format => 'myformat', pattern => '.*data.*[.]dat.gz') t;
Is there any way to use the column names instead?
E.g.,
select t.first_name from @mystage1 (file_format => 'myformat', pattern => '.*data.*[.]csv.gz') t;
I appreciate everyone's help and I do realize that this is an unusual requirement.
You could read these files with a UDF. Parse the CSV inside the UDF with code that is aware of the headers, then output either multiple columns or one variant.
For example, let's create a CSV file inside Snowflake that we can play with later:
create or replace temporary stage my_int_stage
file_format = (type=csv compression=none);
copy into '@my_int_stage/fx3.csv'
from (
select *
from snowflake_sample_data.tpcds_sf100tcl.catalog_returns
limit 200000
)
header=true
single=true
overwrite=true
max_file_size=40772160
;
list @my_int_stage;
-- 34MB uncompressed CSV, because why not
Then this is a Python UDF that can read that CSV and parse it into an Object, while being aware of the headers:
create or replace function uncsv_py()
returns table(x variant)
language python
imports = ('@my_int_stage/fx3.csv')
handler = 'X'
runtime_version = '3.8'
as $$
import csv
import sys

IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

class X:
    def process(self):
        # import_dir already ends with a path separator
        with open(import_dir + 'fx3.csv', newline='') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                yield (row,)
$$;
And then you can query this UDF, which outputs a table:
select *
from table(uncsv_py())
limit 10
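Since each row comes back as an object keyed by the CSV headers, a named column can then be pulled out of the variant with a path expression, for example select x:CR_ORDER_NUMBER::number from table(uncsv_py()); (CR_ORDER_NUMBER is one of the uppercase headers the sample export above should produce).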
A limitation of what I showed here is that the Python UDF needs an explicit file name (for now), as it can't take a whole folder. Java UDFs can read a whole folder; it will just take longer to write an equivalent UDF.
https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-functions.html
https://docs.snowflake.com/en/user-guide/unstructured-data-java.html
I have a folder with subfolders stacked with .xls files, which I want to merge into one large DataFrame and export to an MSSQL server. Furthermore, the file names contain a timestamp (ddmmmyyyy) which I need to extract and add to the df as a column.
import pandas as pd
import numpy as np
import os, pymssql, pyodbc
from datetime import datetime
from sqlalchemy import create_engine
def connect():
    return pyodbc.connect(
        r'DRIVER={SQL Server};'
        r'SERVER=myServer;'
        r'DATABASE=myDB;'
        r'UID=myUser;'
        r'PWD=myPwd;'
        r'TDS_Version=7.3;'
        r'Port=1337'
    )
cnx = create_engine('mssql://', creator=connect)
cnx.connect()
# Parse files and dump to SQL
folder = "\myFolder\""
for root, dirs, files in os.walk(folder):
for file in files:
if file.endswith(".xls") and ("~" not in file):
df = pd.read_excel(root + "/" + file,header=5)
tmp = file.split("_")[2]
tmp = datetime.strptime(tmp, '%d%b%Y')
df['Created'] = tmp
df.to_sql(name="myTable", con=cnx, if_exists='append', index=False)
# Check the dumped content
sql = "SELECT * FROM myTable"
df = pd.read_sql(sql, cnx)
df.head()
The connection works, and from what I gather the loop runs, but no new data is added to the DataFrame; df.head() returns an unchanged table. Does anyone have a clue what I'm doing wrong?
Also, I get this annoying connection warning when running the create_engine statement, although it doesn't seem to affect anything:
SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
Any help appreciated! :)
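One way to see whether the loop is actually picking up files is to collect the frames first and write once, which also matches the stated goal of one large merged DataFrame. A minimal sketch, under the question's own folder and filename assumptions:
frames = []
for root, dirs, files in os.walk(folder):
    for file in files:
        if file.endswith(".xls") and ("~" not in file):
            df = pd.read_excel(os.path.join(root, file), header=5)
            # filename assumed to carry the date as its third "_"-separated part
            df['Created'] = datetime.strptime(file.split("_")[2], '%d%b%Y')
            frames.append(df)

if frames:  # an empty list means the walk never matched a file
    merged = pd.concat(frames, ignore_index=True)
    merged.to_sql(name="myTable", con=cnx, if_exists='append', index=False)
If frames comes out empty, the os.walk root path is the first thing to check.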
I use peewee with an existing table:
import peewee
from playhouse.postgres_ext import *

db = PostgresqlExtDatabase('mydb')  # assumed connection; 'db' was referenced but not defined in the original snippet

class Rules(peewee.Model):
    channels = JSONField(null=True)
    remark = peewee.CharField(max_length=500, null=True)

    class Meta:
        database = db
        db_table = 'biz_rule'
        schema = 'opr'
For example, my table has a record whose channels column contains:
["A012102","C012102","D012102","E012102"]
I want to check whether "A012102" is in that list. How do I write the code?
If you're using PostgreSQL 9.4+, you can use the jsonb data type using the corresponding postgres_ext.BinaryJSONField peewee field type. It has contains_any() and contains_all() methods that correspond to the PostgreSQL ?| and ?& operators (see the PostgreSQL JSON docs). So I think it'd be something like this:
from playhouse.postgres_ext import BinaryJSONField
class Rules(peewee.Model):
    channels = BinaryJSONField(null=True)
    ...
query = Rules.select().where(Rules.channels.contains_all('A012102'))
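A quick usage sketch (assuming db is connected and the table holds the example row):
# True if at least one row has "A012102" among its channels
print(query.exists())
for rule in query:
    print(rule.remark)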
I have a group of Excel files in a folder. The file names look like:
ABC 2014-09-13.xlsx
ABC 2014-09-14.xlsx
ABC 2014-09-15.xlsx
I need to get the data from the latest Excel file and load it into a table using an SSIS package.
This may not be the shortest answer, but it will help you.
Steps:
Create a Foreach Loop to fetch all the Excel files, and insert all the file names into a table.
Create a variable. Assign it the MAX() of the Excel file names; because the names embed an ISO-style date (yyyy-mm-dd), the maximum file name is also the most recent file.
Add a second Foreach Loop. Just like the first loop, pick the Excel files one by one, compare each file name with the variable value, and load the one that matches.
As this is a duplicate question, I will put an answer anyway, with some changes and additional info.
You should already have created the destination table for the Excel import and added a Connection Manager to the package.
Create 2 variables: MainDir, where the Excel files live, and ExcelFile, to hold the last file's full name.
Add a Script Task to the package. Open it and in the Script tab set ReadOnlyVariables = User::MainDir and ReadWriteVariables = User::ExcelFile.
Press the Edit Script... button and in the new window paste this code into Main:
string fileMask = "*.xlsx";
string mostRecentFile = string.Empty;
string rootFolder = Dts.Variables["User::MainDir"].Value.ToString();

System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(rootFolder);
System.IO.FileInfo mostRecent = null;

// Get every .xlsx in the folder and sort ascending by name; because the
// file names embed a yyyy-mm-dd date, the last entry is the most recent file.
System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileMask, System.IO.SearchOption.TopDirectoryOnly);
Array.Sort(legacyArray, (f2, f1) => f2.Name.CompareTo(f1.Name));
mostRecent = legacyArray[legacyArray.Length - 1];

if (mostRecent != null)
{
    mostRecentFile = mostRecent.FullName;
}

Dts.Variables["User::ExcelFile"].Value = mostRecentFile;
Dts.TaskResult = (int)ScriptResults.Success;
Create an Excel Connection Manager; in Edit mode, point the Excel file path at some existing Excel file, pick the Excel version, and if needed keep First row has column names checked.
In the properties of the Excel Connection Manager, find Expressions and add the property ExcelFilePath with the value @[User::ExcelFile].
Add a Data Flow Task and connect it to the Script Task.
Add an Excel Source to the Data Flow Task. Open the editor, select the Excel Connection Manager you created before, change Data access mode to SQL command, and add this line (make sure the Excel sheet name is Sheet1): SELECT * FROM [Sheet1$]. Also check that all necessary columns are selected in the Columns tab.
The last component is an OLE DB Destination, which you must connect to the Excel Source component. Set its connection manager, then select the target table and the column mappings.
That's all you need to do to import the latest Excel file.
I am relatively new to GUIs in Matlab, and I have created a simple GUI using GUIDE. I want to connect to a database (already defined and working!) and populate a listbox with the values from the database so the user can choose which to use (in this case they are chemical compounds). I haven't been able to find a good tutorial or clues on how to populate the listbox in this way. So far, I have:
function load_listbox(hObject,handles)
conn = database('antoine_db','','');
setdbprefs('datareturnformat','structure'); %sets the db preferences to a structure
query = 'SELECT ID,"Compound Name" FROM antoine_data ORDER BY ID';
result = fetch(conn,query);
%%The following creates a structure containing the names and ID's
%%of everything in the database
data = struct([]);
for i = 1:length(result.ID)
    data(i).id = result.ID(i);
    data(i).name = char(result.CompoundName(i));
end
names = data.name;
handles.compounds = names;
whos;
set(handles.listbox1,'String',handles.compounds,'Value',1);
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
end
What would be the simplest way to populate a listbox from a database (or large array) like this? As of right now, the listbox is populated with only the first item in names, because somehow names contains only the first item. Yet if I just display data.name, I get the entire list of 300 items!
I got it! The problem was that I was converting data.name to a character array; originally it was a cell. So I added names(i) = data(i).name; in the for loop and removed names = data.name;. The listbox is now populated with all of the compound names! The working function looks like this:
function load_listbox(hObject,handles)
conn = database('antoine_db','','');
setdbprefs('datareturnformat','structure'); %sets the db preferences to a structure
query = 'SELECT ID,"Compound Name" FROM antoine_data ORDER BY ID';
result = fetch(conn,query);
%%The following creates a structure containing the names and ID's
%%of everything in the database
data = struct([]);
for i = 1:length(result.ID)
    data(i).id = result.ID(i);
    data(i).name = result.CompoundName(i); % this is a cell
    names(i) = data(i).name;
end
handles.compounds = names;
set(handles.listbox1,'String',handles.compounds,'Value',1);
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
end
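As a side note: since result.CompoundName already comes back as a cell array of strings with 'datareturnformat' set to 'structure', the loop can probably be dropped and the listbox populated directly with set(handles.listbox1,'String',result.CompoundName,'Value',1). Untested, but it avoids the element-by-element copy.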