How do I get the index to take the filename as its value? - loops

I have a list of filenames that I want to make (they don't exist yet). I want to loop through the list and create each file. Next I want to write to each file a path (along with other text not shown here) that includes the name of the file. I have written something similar to below so far but cannot see how to get the index i to take the file name values. Please help.
import os
biglist=['sleep','heard','shed']
for i in biglist:
    myfile=open('C:\autumn\winter\spring\i.txt','w')
    myfile.write('DATA = c:\autumn\winter\spring\i.dat')
    myfile.close

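You could try the Python function below.
import sys
biglist=['sleep','heard','shed']
def create_file():
    for i in biglist:
        try:
            # Raw string keeps backslashes literal; the trailing separator is
            # added separately because a raw string cannot end in a backslash.
            file_name_with_ext = r"C:\autumn\winter\spring" + "\\" + i + ".txt"
            file = open(file_name_with_ext, 'a')
            file.close()
        except OSError:
            print("caught error!")
            sys.exit(1)  # non-zero exit status signals the failure
create_file() #invoking the function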
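The question also asks for a path containing the file name to be written into each file. A minimal sketch of that, assuming f-strings are acceptable, is shown here. The helper name write_data_files is hypothetical, and a temporary directory stands in for C:\autumn\winter\spring so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def write_data_files(names, base_dir):
    """Create <name>.txt in base_dir and write a DATA line naming <name>.dat."""
    base = Path(base_dir)
    created = []
    for name in names:
        txt_path = base / f"{name}.txt"   # the loop variable becomes the file name
        dat_path = base / f"{name}.dat"   # and is reused inside the written text
        with open(txt_path, "w") as handle:
            handle.write(f"DATA = {dat_path}\n")
        created.append(txt_path)
    return created


if __name__ == "__main__":
    # Temporary directory used only for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_data_files(["sleep", "heard", "shed"], tmp):
            print(path.name, "->", path.read_text().strip())
```

The with block closes each file automatically, which avoids the missing parentheses on myfile.close in the question.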

Related

Python loop to extract all sequences shorter than 100AA in multiple fasta files

I'm kinda new to python and I wrote a script to loop through all fasta files in a directory and extract the sequences shorter than 100AA of each file:
from Bio import SeqIO
import sys
import os
def loop_extractsmorfs(input_handle, output_handle):
    files = os.listdir(input_handle)
    for file in SeqIO.parse(files, "fasta"):
        if len(file.seq) <= 100 :
            files.append(file)
    SeqIO.write(files, output_handle, "fasta")
if __name__=="__main__" :
    loop_extractsmorfs(input_handle=sys.argv[1], output_handle=sys.argv[2])
When I run this code on the terminal using both the input_handle and output_handle as arguments I get:
AttributeError: 'list' object has no attribute 'read'
I imagine there must be some mistake in the way I'm using the os.listdir or something but the examples I found online only show how to "print the files in that directory" and I need to extract and write new files.
You are trying to parse a list of files, which SeqIO.parse cannot do: it expects a single file handle or filename, hence the error about 'read'. Parse each file separately and then, for convenience, combine the resulting iterators into one (via chain in my example). The sequences you want to keep should be collected in a separate list.
So:
from Bio import SeqIO
import sys
from pathlib import Path
from itertools import chain
def loop_extractsmorfs(input_dirpath: Path, output_filepath: Path) -> None:
    little_records = []
    for record in chain(
        *(SeqIO.parse(filepath, "fasta") for filepath in input_dirpath.iterdir())
    ):
        if len(record.seq) <= 100:
            little_records.append(record)
    SeqIO.write(little_records, output_filepath, "fasta")
if __name__=="__main__" :
    loop_extractsmorfs(
        input_dirpath=Path(sys.argv[1]), output_filepath=Path(sys.argv[2])
    )
This could be put into a list comprehension but I would not like to overload the answer.
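For reference, the comprehension form of the same chain-and-filter pattern might look like the sketch below. To keep it runnable without Biopython, read_fasta is a hypothetical, deliberately minimal stand-in for SeqIO.parse that yields (header, sequence) tuples; short_records and the *.fasta glob are also assumptions made for illustration.

```python
import tempfile
from itertools import chain
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from one FASTA file.

    Minimal stand-in for SeqIO.parse, used only so the sketch runs
    without Biopython installed.
    """
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def short_records(input_dir, max_len=100):
    """Parse every *.fasta file separately, chain the results, keep short ones."""
    files = sorted(Path(input_dir).glob("*.fasta"))
    records = chain.from_iterable(read_fasta(f) for f in files)
    return [(header, seq) for header, seq in records if len(seq) <= max_len]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.fasta").write_text(">short\nMKV\n>long\n" + "A" * 150 + "\n")
        print(short_records(tmp))
```

With Biopython, the same comprehension would filter on len(record.seq) over the chained SeqIO.parse iterators.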

How to export Snowflake Web UI Worksheet SQL to file

Classic Snowflake Web UI and the new Snowsight are great at importing sql from a file but neither allows you to export sql to a file. Is there a workaround?
You can use an IDE to connect to Snowflake and write queries. The scripts can then be saved using the IDE's features and synced with a Git repo as well.
DBeaver is one such IDE that supports Snowflake:
https://hevodata.com/learn/dbeaver-snowflake/
The query pane is interactive so the obvious workaround will be:
CTRL + A (select all)
CTRL + C (copy)
<open_favourite_text_editor>
CTRL + V (paste)
CTRL + S (save)
This tool can help you while the team develops a native feature to export worksheets:
"Snowflake Snowsight Extensions wrap Snowsight features that do not have API or SQL alternatives, such as manipulating Dashboards and Worksheets, and retrieving Query Profile and step timings."
https://github.com/Snowflake-Labs/sfsnowsightextensions
Further explained on this post:
https://medium.com/snowflake/importing-and-exporting-snowsight-dashboards-and-worksheets-3cd8e34d29c8
For example, to save to a file within PowerShell:
PS > $dashboards | foreach {$_.SaveToFolder("path/to/folder")}
PS > $dashboards[0].SaveToFile("path/to/folder/mydashboard.json")
ETA: I'm adding this edit to the front because this is what actually worked.
Again, BSON was a dead end & punycode is irrelevant. I don't know why punycode is referenced in the metadata file; but my best guess is that they might use punycode to encode the worksheet name itself (though I'm not sure why that would be needed since it shouldn't need to be part of a URL).
After doing terrible things and trying a number of complex ways of dealing with escape character hell, I found that the actual encoding is very simple. It just works as an 8 bit encoding with anything that might cause problems escaped away (null, control codes, double quotes, etc.). To load, treat the file as a text file using an 8-bit encoding; extract the data as a JSON field, then re-encode that extracted data as that same encoding. I just used latin_1 to read; but it may not even matter which encoding you use as long as you are consistent and use the same one to re-encode. The encoded field will then be valid zlib compressed data.
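The round trip described above can be sketched with the standard library alone. The key names wsContents, encoding and body come from the downloaded files; pack_body is a hypothetical helper that only imitates such a file for the demonstration, and nothing here is claimed to be how Snowflake itself writes them.

```python
import json
import zlib


def pack_body(sql_text):
    """Imitate a worksheet file: zlib bytes stored as 8-bit text inside JSON."""
    compressed = zlib.compress(sql_text.encode("utf8"))
    # latin_1 maps every byte 0-255 to exactly one character, so no byte is lost.
    body = compressed.decode("latin_1")
    return json.dumps({"wsContents": {"encoding": "gzip", "body": body}})


def unpack_body(document):
    """Extract the JSON field, re-encode with the same codec, then decompress."""
    body = json.loads(document)["wsContents"]["body"]
    return zlib.decompress(body.encode("latin_1")).decode("utf8")


if __name__ == "__main__":
    print(unpack_body(pack_body("select current_date();")))
```

The important detail is that the same 8-bit codec is used in both directions, so each character maps back to the byte it came from.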
I decided that I wanted to start from scratch, so I needed to back up the worksheets first, and I made a Python script based on my findings above. Be warned that this may return even worksheets that you previously closed for good. After running this and verifying that backups were created, I just ran rm @~/worksheet_data/;, closed the tab & reopened it.
Here's the code (fill in the appropriate base directory location):
import os
from collections import OrderedDict
import configparser
from sqlalchemy import create_engine, exc
from snowflake.sqlalchemy import URL
import pathlib
import json
import zlib
import string
def format_filename(s: str) -> str: # From https://gist.github.com/seanh/93666
    """Take a string and return a valid filename constructed from the string.
    Uses a whitelist approach: any characters not present in valid_chars are
    removed. Also spaces are replaced with underscores.
    Note: this method may produce invalid filenames such as ``, `.` or `..`
    When I use this method I prepend a date string like '2009_01_15_19_46_32_'
    and append a file extension like '.txt', so I avoid the potential of using
    an invalid filename.
    """
    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
    filename = ''.join(c for c in s if c in valid_chars)
    # filename = filename.replace(' ','_') # I don't like spaces in filenames.
    return filename
def trlng_dash(s: str) -> str:
    """Removes a trailing dash if present."""
    # endswith() also handles an empty string, where s[-1] would raise IndexError.
    return s[:-1] if s.endswith('-') else s
sso_authenticate = True
# Assumes CLI config file exists.
config = configparser.ConfigParser()
home = pathlib.Path.home()
config_loc = home/'.snowsql/config' # Assumes it's set up from Snowflake CLI.
base_dir = home/r'{your Desired base directory goes here.}'
json_dir = base_dir/'json' # Location for your worksheet stage JSON files.
sql_dir = base_dir/'sql' # Location for your worksheets.
# Assumes CLI config file exists.
config.read(config_loc)
# Add connection parameters here (assumes CLI config exists).
# Using sso so only 2 are needed.
# If there's no config file, etc. enter by hand here (or however you want to do it).
connection_params = {
    'account': config['connections']['accountname'],
    'user': config['connections']['username'],
}
if sso_authenticate:
    connection_params['authenticator'] = 'externalbrowser'
if config['connections'].get('password', None) is not None:
    connection_params['password'] = config['connections']['password']
if config['connections'].get('rolename', None) is not None:
    connection_params['role'] = config['connections']['rolename']
if locals().get('database', None) is not None:
    connection_params['database'] = database
if locals().get('schema', None) is not None:
    connection_params['schema'] = schema
sf_engine = create_engine(URL(**connection_params))
if not base_dir.exists():
    base_dir.mkdir()
if not json_dir.exists():
    json_dir.mkdir()
if not (sql_dir).exists():
    sql_dir.mkdir()
with sf_engine.connect() as connection:
    # Download everything from the user stage (@~) into the local JSON directory.
    connection.execute(f'get @~/worksheet_data/ \'file://{str(json_dir.as_posix())}\';')
for file in [path for path in json_dir.glob('*') if path.is_file()]:
    if file.suffix != '.json':
        file.replace(file.with_suffix(file.suffix + '.json'))
with open(json_dir/'metadata.json', 'r') as metadata_file:
    files_meta = json.load(metadata_file)
# List of files from metadata file will contain some empty worksheets.
files_description_orig = OrderedDict(
    (file_key_value['name'], file_key_value)
    for file_key_value in sorted(
        files_meta['activeWorksheets'] + list(files_meta['inactiveWorksheets'].values()),
        key=lambda x: x['name'])
    if file_key_value['name'])
# files_description will only track non empty worksheets
files_description = files_description_orig.copy()
# Create updated files description filtering out empty worksheets.
for item in files_description_orig:
    json_file = json_dir/f"{files_description_orig[item]['name']}.json"
    # If a file didn't make it or was deleted by hand, we should
    # remove from the filtered description & continue to the next item.
    if not (json_file.exists() and json_file.is_file()):
        del files_description[item]
        continue
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
        # If the file represents a worksheet with a body field, we want it.
        if not json_dat['wsContents'].get('body'):
            del files_description[item]
            ## Delete JSON files corresponding to empty worksheets.
            # f.close()
            # try:
            #     (json_dir/f"{files_description_orig[item]['name']}.json").unlink()
            # except:
            #     pass
# Produce a list of normalized filenames (no illegal or awkward characters).
file_names = set(
    format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    for item in files_description)
# Add useful information to our files_description OrderedDict
for file_name in file_names:
    repeats_cnt = 0
    file_name_repeats = (
        item
        for item
        in files_description
        if file_name == format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    )
    for file_uuid in file_name_repeats:
        files_description[file_uuid]['normalizedName'] = file_name
        files_description[file_uuid]['stemSuffix'] = '' if repeats_cnt == 0 else f'({repeats_cnt:0>2})'
        repeats_cnt += 1
# Now we iterate on non-empty worksheets only.
for item in files_description:
    json_file = json_dir/f"{files_description[item]['name']}.json"
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
        body = json_dat['wsContents']['body']
        body_bin = body.encode('latin_1')
        body_txt = zlib.decompress(body_bin).decode('utf8')
        sql_file = sql_dir/f"{files_description[item]['normalizedName']}{files_description[item]['stemSuffix']}.sql"
        with open(sql_file, 'w') as sql_f:
            sql_f.write(body_txt)
        creation_stamp = files_description[item]['created']/1000
        os.utime(sql_file, (creation_stamp,creation_stamp))
print('Done!')
As mentioned at "Is there any option in snowflake to save or load worksheets?" (and in Snowflake's own documentation), in the Classic UI, the worksheets are saved at the user stage under @~/worksheet_data/.
You can download it with a get command like:
get @~/worksheet_data/<name> file:///<your local location>; (though you might need quoting if running from Windows).
The problem is that I do not know how to access it programmatically. The downloaded files look like JSON but they are not valid JSON. The main key is "wsContents" and contains most of the worksheet information. Its value includes two subkeys, "encoding" and "body".
The "encoding" key denotes that gzip is being used. The "body" key seems to be the actual worksheet data which looks a lot like a straight binary representation of the compressed text data. As such, any JSON reader will choke on it.
If it is anything like that, I do not currently know how to access it programmatically using Python.
I do see that a JSON like format exists, BSON, that is bundled into PyMongo. Trying to use this on these files fails. I even tried bson.is_valid and it returns False so I am assuming that it means that these files in Snowflake are not actually BSON.
Edited to add: Again, BSON is a dead end.
Examining the "body" value as just binary data, the first two bytes of sample files do seem to correspond to default zlib compression (0x789c). However, attempting to run straight zlib.decompress on the slice created from that first byte to the last corresponding to the first & last characters of the "body" value results in the error:
Error - 3 while decompressing data: invalid code lengths set
This makes me think that the bytes there, as is, are at least partly garbage and still need some processing before they can be decompressed.
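A small standard-library experiment illustrates both observations: default zlib output does begin with the bytes 0x78 0x9c, and the data stops being decompressible once the text is converted back to bytes with a different codec than the one used to produce it. This is only a sketch of one plausible cause; can_decompress is a hypothetical helper, and the exact error raised in the attempt above is not reproduced here.

```python
import zlib


def can_decompress(body_text, encoding):
    """Return True if body_text, re-encoded with `encoding`, is valid zlib data."""
    try:
        zlib.decompress(body_text.encode(encoding))
        return True
    except zlib.error:
        return False


# Compress some text and keep it as an 8-bit string, one character per byte.
compressed = zlib.compress(b"select current_date();")
header = compressed[:2]
body_text = compressed.decode("latin_1")

if __name__ == "__main__":
    print(header.hex())
    # Same codec in both directions: the original bytes are recovered.
    print(can_decompress(body_text, "latin_1"))
    # A different codec turns every byte >= 0x80 into two bytes, corrupting the stream.
    print(can_decompress(body_text, "utf8"))
```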
One clue that I failed to mention earlier is that the metadata file (called "metadata" and which serves as an inventory of the remaining files at the @~/worksheet_data/ location) declares that the files use the punycode encoding. However, I do not know how to use that information. The data in these files doesn't particularly look like what I feel punycode should look like, nor does it particularly make sense to me that you would use punycode on binary data that is not meant to ever be used to directly generate text, such as zlib compressed data.

How to read a text file from resources without javaClass

I need to read a text file with readLines() and I've already found this question, but the code in the answers always uses some variation of javaClass; it seems to work only inside a class, while I'm using just a simple Kotlin file with no declared classes. Writing it like this is correct syntax-wise but it looks really ugly and it always returns null, so it must be wrong:
val lines = object {}.javaClass.getResource("file.txt")?.toURI()?.toPath()?.readLines()
Of course I could just specify the raw path like this, but I wonder if there's a better way:
val lines = File("src/main/resources/file.txt").readLines()
Thanks to this answer for providing the correct way to read the file. Currently, reading files from resources without using javaClass or similar constructs doesn't seem to be possible.
// use this if you're inside a class
val lines = this::class.java.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
// use this otherwise
val lines = object {}.javaClass.getResourceAsStream("file.txt")?.bufferedReader()?.readLines()
According to other similar questions I've found, the second way might also work within a lambda but I haven't tested it. Notice the need for the ?. operator and the lines?.let {} syntax needed from this point onward, because getResourceAsStream() returns null if no resource is found with the given name.
Kotlin doesn't have its own means of getting a resource, so you have to use Java's method Class.getResource. You should not assume that the resource is a file (i.e. don't use toPath) as it could well be an entry in a jar, and not a file on the file system. To read a resource, it is easier to get the resource as an InputStream and then read lines from it:
val lines = this::class.java.getResourceAsStream("file.txt").bufferedReader().readLines()
I'm not sure this answers your exact question, but I'm guessing that in the final use case the file names would be dynamic, not statically declared. In that case, if you have access to or know the path to the folder, you could do something like this:
// Create an extension function on the String class to retrieve a list of
// files available within a folder. Though I have not added a check here
// to validate this, a condition can be added to assert if the extension
// called is executed on a folder or not
fun String.getFilesInFolder(): Array<out File>? = File(this).listFiles()
// Call the extension function on the String folder path wherever required
fun retrieveFiles(): Array<out File>? = [PATH TO FOLDER].getFilesInFolder()
Once you have a reference to the Array<out File>? object, you could do something like this:
// Create an extension function to read
fun File.retrieveContent() = readLines()
// You can further expand this use case to conditionally return
// readLines() or entire file data using a buffered reader or convert file
// content to a Data class through GSON/whatever.
// You can use Generic Constraints
// Refer this article for possibilities
// https://kotlinlang.org/docs/generics.html#generic-constraints
// Then simply call this extension function after retrieving files in the folder.
retrieveFiles()?.forEach { singleFile -> println(singleFile.retrieveContent()) }
For the same URL to work both from a JAR and locally, the URL (or path) needs to be a relative path from the repository root,
meaning the location of your file or folder from your src folder.
It could be "/main/resources/your-folder/" or "/client/notes/somefile.md".
The URL must be a relative path from the repository root:
it must be "src/main/resources/your-folder/" or "src/client/notes/somefile.md".
Now you get the drill, and luckily for IntelliJ IDEA users, you can get the correct path with a right-click on the folder or file -> Copy Path/Reference.. -> Path From Repository Root (this is it).
Last, paste it and do your thing.

Import timeseries via loop (pot. generic)

Solved, now working example!
I have a set of time series that is populated in a folder structure as follows:
TimeSeriesData\Config_000000\seed_001\Aggregate_Quantity.txt
…
TimeSeriesData\Config_000058\seed_010\Aggregate_Quantity.txt
And in each file there is a first line containing a character, followed by lines containing the data. E.g.
share
-21.75
-20.75
…
Now, I would like to import all those files (at best, without specifying ex post the number of configs and seeds, at worst with supplying it) into single time series of the like: “AggQuant_config_seed” where config relates to the config ID and seed to the seed id.
I tried the following (using the non-preferred way), but “parsing” the “path” does not work / I do not know how to do it.
string base = "TimeSeriesData/Config_"
string middle = "/seed_"
string endd = "/Aggregate_Quantity.txt"
string path = ""
loop for (i=0;i<=58;i+=1)
    loop for (j=1;j<=10;j+=1)
        path = ""
        sprintf path "%s%06d%s%03d%s",base,i,middle,j,endd
        append @path #will be named share
        rename share Agg_Q_$i_$j #rename
    endloop
endloop
To sum up the problem, the following does not work:
string path="somwhere/a_file.txt" #string holding path
#wrong: append $path #use append on string
append @path #works!
And if possible, a way to search recursively through a set of folders, using the folder information together with file-information for the variable name, would be nice. Is that possible within gretl?
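As an illustration of the recursive idea outside gretl, the sketch below walks the folder tree and derives each series name from the folder names. It is written in Python, not gretl, so it does not answer whether gretl can do this natively; load_series and the name pattern AggQuant_<config>_<seed> are assumptions based on the layout described above.

```python
import tempfile
from pathlib import Path


def load_series(root):
    """Collect every Aggregate_Quantity.txt under root, keyed by config and seed."""
    series = {}
    for path in sorted(Path(root).rglob("Aggregate_Quantity.txt")):
        seed_dir = path.parent.name            # e.g. "seed_001"
        config_dir = path.parent.parent.name   # e.g. "Config_000000"
        if not (seed_dir.startswith("seed_") and config_dir.startswith("Config_")):
            continue
        seed_id = int(seed_dir[len("seed_"):])
        config_id = int(config_dir[len("Config_"):])
        lines = path.read_text().splitlines()
        # The first line is the column label ("share"); the rest are the data.
        values = [float(line) for line in lines[1:] if line.strip()]
        series[f"AggQuant_{config_id}_{seed_id}"] = values
    return series


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp, "TimeSeriesData", "Config_000000", "seed_001")
        folder.mkdir(parents=True)
        (folder / "Aggregate_Quantity.txt").write_text("share\n-21.75\n-20.75\n")
        print(load_series(tmp))
```

Because the files are discovered by walking the tree, the number of configs and seeds does not have to be supplied in advance.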
I realize that in many cases I would like to refer to a specific “help” section like those for functions and commands, but for operators instead (like “$”, which is obviously wrong here).

Unable to open a file with uigetfile in Matlab

I am building a code that lets the user open some files.
reference = warndlg('Choose the files for analysis.');
uiwait(reference);
filenames2 = uigetfile('./*.txt','MultiSelect', 'on');
if ~iscell(filenames2)
    filenames2 = {filenames2}; % force it to be a cell array of strings
end
numberOfFiles = numel(filenames2);
data = importdata(filenames2{i},delimiterIn,headerlinesIn);
When I run the code, the prompts show up, I press OK, and then nothing happens. The code just stops, telling me :
Error using importdata (line 137)
Unable to open file.
Error in FreqVSChampB_no_spec (line 119)
data=importdata(filenames2{1},delimiterIn,headerlinesIn);
I never get the opportunity to select a file. The cell array stays empty.
MATLAB can't find the file that you have selected. Your variable filenames2 contains only the name of the file, not its full path. If you don't provide the full path to importdata, it will search for whatever file name you provide on the MATLAB path, and if it can't find it it will error as you see.
Try something like this - I'm just doing it with single selection for ease of description, but you can do something similar with multiple selection.
[fileName, pathName] = uigetfile('*.txt');
fullNameWithPath = fullfile(pathName, fileName);
importdata(fullNameWithPath)
fullfile is useful, as it inserts the correct character between pathName and fileName (\ on Windows, / on Unix).
You can try to add
pause(0.1);
just after uiwait(reference);
That works for me. In fact, I've noticed that the active window changes when uiwait and uigetfile are used.
