How do I read the built-in Project.toml from a Pluto notebook?

I would like to instantiate the Project.toml that's built into a Pluto notebook by the native package manager. How do I read it from the notebook?
Say, I have a notebook, e.g.,
nb_source = "https://raw.githubusercontent.com/fonsp/Pluto.jl/main/sample/Interactivity.jl"
How can I create a temporary environment, and get the packages for the project of this notebook? In particular, how do I complete the following code?
cd(mktempdir())
import Pluto, Pkg
Pkg.activate(".")
nb = download(nb_source)
### Some code using Pluto's built-in package manager
### to read the Project.toml from nb --> nb_project_toml
cp(nb_project_toml, "./Project.toml", force=true)
Pkg.instantiate()

So, first of all, the notebook you are looking at is a Pluto 0.17.0 notebook, which does not have the internal package manager. I think it was added in Pluto 0.19.0.
This is what the very last few cells look like in a notebook that uses the internal Pluto package manager:
# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
[compat]
Plots = "~1.32.0"
PlutoUI = "~0.7.40"
PyCall = "~1.94.1"
"""
# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised
julia_version = "1.8.0"
...
So you could add something like:
include(nb)
write("./Project.toml", PLUTO_PROJECT_TOML_CONTENTS)
This has the drawback of running all the code in your notebook, which might take a while.
Alternatively, you could read the notebook file until you find the # ╔═╡ 00000000-0000-0000-0000-000000000001 line and then either parse the following string yourself or eval everything after that (something like eval(Meta.parse(string_stuff_after_comment)) should do it...)
I hope that helps a little bit.

Pluto.load_notebook_nobackup() reads the information of a notebook. This gives a dictionary of dependencies in the field .nbpkg_ctx.env.project.deps:
import Pluto, Pkg
Pkg.activate(;temp=true)
nb_source = "https://raw.githubusercontent.com/fonsp/Pluto.jl/main/sample/PlutoUI.jl.jl"
nb = download(nb_source)
nb_info = Pluto.load_notebook_nobackup(nb)
deps = nb_info.nbpkg_ctx.env.project.deps
Pkg.add([Pkg.PackageSpec(name=p, uuid=u) for (p, u) in deps])

Related

How to export Snowflake Web UI Worksheet SQL to file

The Classic Snowflake Web UI and the new Snowsight are great at importing SQL from a file, but neither allows you to export SQL to a file. Is there a workaround?
You can use an IDE to connect to Snowflake and write queries. The scripts can then be downloaded using IDE features and synced with a Git repo as well.
DBeaver is one such IDE that supports Snowflake:
https://hevodata.com/learn/dbeaver-snowflake/
The query pane is interactive, so the obvious workaround is:
CTRL + A (select all)
CTRL + C (copy)
<open_favourite_text_editor>
CTRL + V (paste)
CTRL + S (save)
This tool can help you while the team develops a native feature to export worksheets:
"Snowflake Snowsight Extensions wrap Snowsight features that do not have API or SQL alternatives, such as manipulating Dashboards and Worksheets, and retrieving Query Profile and step timings."
https://github.com/Snowflake-Labs/sfsnowsightextensions
Further explained on this post:
https://medium.com/snowflake/importing-and-exporting-snowsight-dashboards-and-worksheets-3cd8e34d29c8
For example, to save to a file within PowerShell:
PS > $dashboards | foreach {$_.SaveToFolder("path/to/folder")}
PS > $dashboards[0].SaveToFile("path/to/folder/mydashboard.json")
ETA: I'm adding this edit to the front because this is what actually worked.
Again, BSON was a dead end & punycode is irrelevant. I don't know why punycode is referenced in the metadata file; but my best guess is that they might use punycode to encode the worksheet name itself (though I'm not sure why that would be needed since it shouldn't need to be part of a URL).
After doing terrible things and trying a number of complex ways of dealing with escape character hell, I found that the actual encoding is very simple. It just works as an 8 bit encoding with anything that might cause problems escaped away (null, control codes, double quotes, etc.). To load, treat the file as a text file using an 8-bit encoding; extract the data as a JSON field, then re-encode that extracted data as that same encoding. I just used latin_1 to read; but it may not even matter which encoding you use as long as you are consistent and use the same one to re-encode. The encoded field will then be valid zlib compressed data.
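To make that concrete, here is a minimal sketch of that round trip for a single downloaded worksheet file (the filename worksheet.json is only a placeholder; the full backup script follows below):
import json
import zlib
# Read the staged worksheet file with an 8-bit codec, pull out the compressed
# body, re-encode it with the same codec, then inflate it with zlib.
with open('worksheet.json', 'r', encoding='latin_1') as f:
    doc = json.load(f)
body = doc['wsContents']['body']  # zlib-compressed SQL stored as text
sql_text = zlib.decompress(body.encode('latin_1')).decode('utf8')
print(sql_text)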
I decided that I wanted to start from scratch, so I needed to back up the worksheets first, and I made a Python script based on my findings above. Be warned that this may return even worksheets that you previously closed for good. After running it and verifying that backups were created, I just ran rm @~/worksheet_data/;, closed the tab, and reopened it.
Here's the code (fill in the appropriate base directory location):
import os
from collections import OrderedDict
import configparser
from sqlalchemy import create_engine, exc
from snowflake.sqlalchemy import URL
import pathlib
import json
import zlib
import string
def format_filename(s: str) -> str:  # From https://gist.github.com/seanh/93666
    """Take a string and return a valid filename constructed from the string.
    Uses a whitelist approach: any characters not present in valid_chars are
    removed. Also spaces are replaced with underscores.
    Note: this method may produce invalid filenames such as ``, `.` or `..`
    When I use this method I prepend a date string like '2009_01_15_19_46_32_'
    and append a file extension like '.txt', so I avoid the potential of using
    an invalid filename.
    """
    valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
    filename = ''.join(c for c in s if c in valid_chars)
    # filename = filename.replace(' ','_') # I don't like spaces in filenames.
    return filename

def trlng_dash(s: str) -> str:
    """Removes a trailing dash if present."""
    return s[:-1] if s[-1] == '-' else s
sso_authenticate = True
# Assumes CLI config file exists.
config = configparser.ConfigParser()
home = pathlib.Path.home()
config_loc = home/'.snowsql/config' # Assumes it's set up from Snowflake CLI.
base_dir = home/r'{your Desired base directory goes here.}'
json_dir = base_dir/'json' # Location for your worksheet stage JSON files.
sql_dir = base_dir/'sql' # Location for your worksheets.
# Assumes CLI config file exists.
config.read(config_loc)
# Add connection parameters here (assumes CLI config exists).
# Using sso so only 2 are needed.
# If there's no config file, etc. enter by hand here (or however you want to do it).
connection_params = {
    'account': config['connections']['accountname'],
    'user': config['connections']['username'],
}
if sso_authenticate:
    connection_params['authenticator'] = 'externalbrowser'
if config['connections'].get('password', None) is not None:
    connection_params['password'] = config['connections']['password']
if config['connections'].get('rolename', None) is not None:
    connection_params['role'] = config['connections']['rolename']
if locals().get('database', None) is not None:
    connection_params['database'] = database
if locals().get('schema', None) is not None:
    connection_params['schema'] = schema
sf_engine = create_engine(URL(**connection_params))
if not base_dir.exists():
    base_dir.mkdir()
if not json_dir.exists():
    json_dir.mkdir()
if not sql_dir.exists():
    sql_dir.mkdir()
with sf_engine.connect() as connection:
    connection.execute(f'get @~/worksheet_data/ \'file://{str(json_dir.as_posix())}\';')
for file in [path for path in json_dir.glob('*') if path.is_file()]:
    if file.suffix != '.json':
        file.replace(file.with_suffix(file.suffix + '.json'))
with open(json_dir/'metadata.json', 'r') as metadata_file:
    files_meta = json.load(metadata_file)
# List of files from metadata file will contain some empty worksheets.
files_description_orig = OrderedDict((file_key_value['name'], file_key_value) for file_key_value in sorted(files_meta['activeWorksheets'] + list(files_meta['inactiveWorksheets'].values()), key=lambda x: x['name']) if file_key_value['name'])
# files_description will only track non-empty worksheets.
files_description = files_description_orig.copy()
# Create updated files description filtering out empty worksheets.
for item in files_description_orig:
    json_file = json_dir/f"{files_description_orig[item]['name']}.json"
    # If a file didn't make it or was deleted by hand, we should
    # remove it from the filtered description & continue to the next item.
    if not (json_file.exists() and json_file.is_file()):
        del files_description[item]
        continue
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
    # If the file represents a worksheet with a body field, we want it.
    if not json_dat['wsContents'].get('body'):
        del files_description[item]
        ## Delete JSON files corresponding to empty worksheets.
        # f.close()
        # try:
        #     (json_dir/f"{files_description_orig[item]['name']}.json").unlink()
        # except:
        #     pass
# Produce a list of normalized filenames (no illegal or awkward characters).
file_names = set(
    format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    for item in files_description)
# Add useful information to our files_description OrderedDict.
for file_name in file_names:
    repeats_cnt = 0
    file_name_repeats = (
        item
        for item
        in files_description
        if file_name == format_filename(trlng_dash(files_description[item]['encodedDetails']['scriptName']).strip())
    )
    for file_uuid in file_name_repeats:
        files_description[file_uuid]['normalizedName'] = file_name
        files_description[file_uuid]['stemSuffix'] = '' if repeats_cnt == 0 else f'({repeats_cnt:0>2})'
        repeats_cnt += 1
# Now we iterate on non-empty worksheets only.
for item in files_description:
    json_file = json_dir/f"{files_description[item]['name']}.json"
    with open(json_file, 'r', encoding='latin_1') as f:
        json_dat = json.load(f)
    body = json_dat['wsContents']['body']
    body_bin = body.encode('latin_1')
    body_txt = zlib.decompress(body_bin).decode('utf8')
    sql_file = sql_dir/f"{files_description[item]['normalizedName']}{files_description[item]['stemSuffix']}.sql"
    with open(sql_file, 'w') as sql_f:
        sql_f.write(body_txt)
    creation_stamp = files_description[item]['created']/1000
    os.utime(sql_file, (creation_stamp, creation_stamp))
print('Done!')
As mentioned at Is there any option in snowflake to save or load worksheets? (and in Snowflake's own documentation), in the Classic UI the worksheets are saved at the user stage under @~/worksheet_data/.
You can download them with a GET command like:
get @~/worksheet_data/<name> file:///<your local location>; (though you might need quoting if running from Windows).
The problem is that I do not know how to access it programmatically. The downloaded files look like JSON but are not valid JSON. The main key is "wsContents", which contains most of the worksheet information. Its value includes two subkeys, "encoding" and "body".
The "encoding" key denotes that gzip is being used. The "body" key seems to be the actual worksheet data, which looks a lot like a straight binary representation of the compressed text data. As such, any JSON reader will choke on it.
If it is anything like that, I do not currently know how to access it programmatically using Python.
I do see that a JSON-like format exists, BSON, which is bundled with PyMongo. Trying to use it on these files fails. I even tried bson.is_valid and it returns False, so I assume these files in Snowflake are not actually BSON.
Edited to add: Again, BSON is a dead end.
Examining the "body" value as just binary data, the first two bytes of sample files do seem to correspond to default zlib compression (0x789c). However, attempting to run zlib.decompress directly on the byte slice spanning the first to the last character of the "body" value results in the error:
Error - 3 while decompressing data: invalid code lengths set
This makes me think that the bytes there, as is, are at least partly garbage and still need some processing before they can be decompressed.
One clue that I failed to mention earlier is that the metadata file (called "metadata", which serves as an inventory of the remaining files at the @~/worksheet_data/ location) declares that the files use the punycode encoding. However, I have not figured out how to use that information. The data in these files doesn't particularly look like what I would expect punycode to look like, nor does it make sense to me to use punycode on binary data that is never meant to directly generate text, such as zlib-compressed data.

Keras Multiple inputs - Expected to see 2 array(s), but instead got the following list of 1 arrays:

Following is the code to create the model. The model has 2 input layers, an embedding layer, an LSTM, an attention layer, and a dense layer. I am getting an error (image attached) when I try to execute model.fit with multiple inputs.
Not sure why? Please explain.
import numpy as np
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
MAX_SEQUENCE_LENGTH = 20
# First input layer
sequence_ip = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
# Second input layer
time_Decay_ip = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='float32')
# Adding embedding layer
embedding_layer = Embedding(vocab_length, output_dim = 32, input_length=seq_length, trainable=True)
embedded_sequences = embedding_layer(sequence_ip)
l_gru = LSTM(100, return_sequences=True)(embedded_sequences)
l_att = attention()([l_gru, time_Decay_ip])
preds = Dense(1, activation='softmax', trainable = True)(l_att)
model = Model([sequence_ip, time_Decay_ip], preds)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()
model.fit(x = [np.array(X_train), np.array(time_decay_tr)], y = np.array(Y_train), validation_data=(X_test, Y_test), nb_epoch=10, batch_size=9)
I had this error when using Keras; there are multiple ways to fix it:
You can either uninstall and reinstall the data you are using,
or you can uninstall and reinstall TensorFlow.
You should also check whether your setup is trying to run CUDA (for NVIDIA cards); if it is and you don't have an NVIDIA GPU, use the CPU instead.
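For reference, the "Expected to see 2 array(s)" message is also what Keras raises when a two-input model is fed only one array, for example in validation_data. A minimal sketch, assuming a held-out time-decay array (called time_decay_te here) exists alongside X_test:
# Both x and validation_data must supply one array per Input, in the same
# order as Model([sequence_ip, time_Decay_ip], preds).
model.fit(
    x=[np.array(X_train), np.array(time_decay_tr)],
    y=np.array(Y_train),
    validation_data=([np.array(X_test), np.array(time_decay_te)], np.array(Y_test)),
    epochs=10,  # `nb_epoch` is the older Keras spelling
    batch_size=9,
)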

How to save the results of an np.array for future use when using Google Colab

I am working on an Information Retrieval project, using Google Colab. I am at the phase where I have computed some features ("input_features") and the labels ("labels") in a for loop, which took about 4 hours to finish.
So at the end I have appended the results to an array:
input_features = np.array(input_features)
labels = np.array(labels)
So my question would be:
Is it possible to save those results in order to use them for future purposes when using Google Colab?
I have found 2 options that could maybe be applied, but I don't know where these files are created.
1) Save them as CSV files. My code would be:
from numpy import savetxt
# save to csv file
savetxt('input_features.csv', input_features, delimiter=',')
savetxt('labels.csv', labels, delimiter=',')
And in order to load them:
from numpy import loadtxt
# load array
input_features = loadtxt('input_features.csv', delimiter=',')
labels = loadtxt('labels.csv', delimiter=',')
# print the array
print(input_features)
print(labels)
But I still don't get anything back when I print.
2) Save the arrays using pickle, where I followed these instructions from here:
https://colab.research.google.com/drive/1EAFQxQ68FfsThpVcNU7m8vqt4UZL0Le1#scrollTo=gZ7OTLo3pw8M
from google.colab import files
import pickle
def features_pickeled(input_features, results):
    input_features = input_features + '.txt'
    pickle.dump(results, open(input_features, 'wb'))
    files.download(input_features)
def labels_pickeled(labels, results):
    labels = labels + '.txt'
    pickle.dump(results, open(labels, 'wb'))
    files.download(labels)
And to load them back:
from io import BytesIO
def features_load_from_local():
    loaded_features = {}
    uploaded = files.upload()
    for input_features in uploaded.keys():
        unpickeled_features = uploaded[input_features]
        loaded_features[input_features] = pickle.load(BytesIO(unpickeled_features))
    return loaded_features
def labels_load_from_local():
    loaded_labels = {}
    uploaded = files.upload()
    for labels in uploaded.keys():
        unpickeled_labels = uploaded[labels]
        loaded_labels[labels] = pickle.load(BytesIO(unpickeled_labels))
    return loaded_labels
#How do I print the pickled files to see if I have them ready for use???
When using Python I would do something like this for pickle:
# Create pickle file
with open("name.pickle", "wb") as pickle_file:
    pickle.dump(name, pickle_file)
# Load the pickle file
with open("name.pickle", "rb") as name_pickled:
    name_b = pickle.load(name_pickled)
But the thing is that I don't see any files being created in my Google Drive.
Is my code correct, or am I missing some part of the code?
This is a long description, but I hope it explains in detail what I want to do and what I have tried so far.
Thank you in advance for your help.
Google Colaboratory notebook instances are never guaranteed to have access to the same resources when you disconnect and reconnect because they are run on virtual machines. Therefore, you can't "save" your data in Colab. Here are a few solutions:
Colab saves your code. If the for loop operation you referenced doesn't take an extreme amount of time to run, just leave the code and run it every time you connect your notebook.
Check out np.save. This function allows you to save an array to a binary file. Then, you could re-upload your binary file when you reconnect your notebook. Better yet, you could store the binary file on Google Drive, mount your drive to your notebook, and reference it like that.
# Mount driver to authenticate yourself to gdrive
from google.colab import drive
drive.mount('/content/gdrive')
#---
# Import necessary libraries
import numpy as np
from numpy import savetxt
import pandas as pd
#---
# Create array
arr = np.array([1, 2, 3, 4, 5])
# save to csv file
savetxt('arr.csv', arr, delimiter=',') # You will see the result if you click the File icon (left panel)
And then you can load it again by:
# You can copy the path when you find your file in the file icon
arr = pd.read_csv('/content/arr.csv', sep=',', header=None) # You can also save your result as a txt file
arr
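For completeness, here is a minimal np.save / np.load sketch of the same idea; the Drive path below is just an example and depends on how your drive is mounted:
import numpy as np
from google.colab import drive
drive.mount('/content/gdrive')
# Save both arrays as binary .npy files on the mounted drive.
np.save('/content/gdrive/MyDrive/input_features.npy', input_features)
np.save('/content/gdrive/MyDrive/labels.npy', labels)
# In a later session (after mounting the drive again), load them back.
input_features = np.load('/content/gdrive/MyDrive/input_features.npy')
labels = np.load('/content/gdrive/MyDrive/labels.npy')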

How to compile and simulate a modelica model that is part of a package with JModelica?

My question is similar to janpeter's question. I am studying the ebook by Tiller and trying to simulate the example 'Architecture Driven Approach' with OpenModelica and JModelica. The minimal example 'BaseSystem' works fine in OpenModelica, but with JModelica version 1.14 I get errors in the compilation process and my script fails. My Python script is:
import matplotlib.pyplot as plt
from pymodelica import compile_fmu
from pyfmi import load_fmu
# Variables: modelName, modelFile, extraLibPath
modelName = 'BaseSystem'
modelFile = 'BaseSystem.mo'
extraLibPath = 'C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample\Architectures'
compilerOption = {'extra_lib_dirs':[extraLibPath]}
# Compile model
fmuName = compile_fmu( modelName, modelFile, compiler_options=compilerOption)
# Load model
model = load_fmu( fmuName)
# Simulate model
res = model.simulate( start_time=0.0, final_time=5.0)
# Extract interesting values
res_w = res['sensor.w']
res_y = res['setpoint.y']
tSim = res['time']
# Visualize results
fig = plt.figure(1)
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.plot(tSim, res_w, 'g-')
ax2.plot(tSim, res_y, 'b-')
ax1.set_xlabel('t (s)')
ax1.set_ylabel('w (???)', color='g')
ax2.set_ylabel('y (???)', color='b')
plt.title('BaseSystem')
plt.legend()
plt.grid(True)
plt.show()
My problem is: how do I compile and simulate a model that is part of a package?
I am not a jModelica user, but I think I see some confusion in your script. You wrote:
modelName = 'BaseSystem'
modelFile = 'BaseSystem.mo'
extraLibPath = 'C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample\Architectures'
But that implies (to me) that the compiler should open the package stored at C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample\Architectures. But the top-level package is ModelicaByExample and the model you want is Architectures.BaseSystem. So I think something like this is probably more appropriate:
modelName = 'Architectures.BaseSystem'
modelFile = 'package.mo'
extraLibPath = 'C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample'
The essential point here is that you should be opening ModelicaByExample (specifically, the package.mo file in the ModelicaByExample directory). That opens the ModelicaByExample package. You need to open this package because it is the top-level package. You can't load just a sub-package (which is what it looked like you were trying to do). Then, once you've got ModelicaByExample loaded, you can ask the compiler to specifically compile Architectures.BaseSystem.
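Putting those values together, the compile step would look roughly like this (an untested sketch using the same compile_fmu API as in your script):
from pymodelica import compile_fmu
# Raw string keeps the backslashes in the Windows path literal.
extraLibPath = r'C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample'
# Load the top-level package.mo and ask for the fully qualified model name.
fmuName = compile_fmu('Architectures.BaseSystem', 'package.mo',
                      compiler_options={'extra_lib_dirs': [extraLibPath]})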
I suspect OpenModelica was "helping" you by opening the top-level package anyway, even if you were asking it to open the sub-package.
But again, I don't know jModelica very well and I have definitely not tested any of these suggestions.
Thank you Michael Tiller. With your support I found the solution.
First, the modelName has to be fully qualified. Second, as you mentioned, the extraLibPath should end at the top-level directory of the library ModelicaByExample. But then I got errors that JModelica couldn't find components or declarations that are part of the Modelica Standard Library (MSL).
So I added the modelicaLibPath to the MSL, but the error messages remained the same. After many attempts, I launched the command line with administrator privileges and all errors were gone.
Here is the executable python script: BaseSystem.py
### Attention!
# The script and/or the command line must be
# started with administrator privileges
import matplotlib.pyplot as plt
from pymodelica import compile_fmu
from pyfmi import load_fmu
# Variables: modelName, modelFile, extraLibPath
modelName = 'Architectures.SensorComparison.Examples.BaseSystem'
modelFile = ''
extraLibPath = 'C:\Users\Tom\Desktop\Tiller2015a\ModelicaByExample'
modelicaLibPath = 'C:\OpenModelica1.9.2\lib\omlibrary\Modelica 3.2.1'
compileToPath = 'C:\Users\Tom\Desktop\Tiller2015a'
# Set the compiler options
compilerOptions = {'extra_lib_dirs':[modelicaLibPath, extraLibPath]}
# Compile model
fmuName = compile_fmu( modelName, modelFile, compiler_options=compilerOptions, compile_to=compileToPath)
# Load model
model = load_fmu( fmuName)
# Simulate model
res = model.simulate( start_time=0.0, final_time=5.0)
# Extract interesting values
res_w = res['sensor.w']
res_y = res['setpoint.y']
tSim = res['time']
# Visualize results
fig = plt.figure(1)
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.plot(tSim, res_w, 'g-')
ax2.plot(tSim, res_y, 'b-')
ax1.set_xlabel('t (s)')
ax1.set_ylabel('sensor.w (rad/s)', color='g')
ax2.set_ylabel('setpoint.y (rad/s)', color='b')
plt.title('BaseSystem')
plt.legend()
plt.grid(True)
plt.show()

R tm: reloading a 'PCorpus' backend filehash database as corpus (e.g. in restarted session/script)

Having learned loads from answers on this site (thanks!), it's finally time to ask my own question.
I'm using R (tm and lsa packages) to create, clean and simplify, and then run LSA (latent semantic analysis) on, a corpus of about 15,000 text documents. I'm doing this in R 3.0.0 under Mac OS X 10.6.
For efficiency (and to cope with having too little RAM), I've been trying to use either the 'PCorpus' option in tm (backend database support provided by the 'filehash' package), or the newer 'tm.plugin.dc' option for so-called 'distributed' corpus processing. But I don't really understand how either one works under the bonnet.
An apparent bug using DCorpus with tm_map (not relevant right now) led me to do some of the preprocessing work with the PCorpus option instead. And it takes hours. So I use R CMD BATCH to run a script doing things like:
> # load corpus from predefined directory path,
> # and create backend database to support processing:
> bigCcorp = PCorpus(bigCdir, readerControl = list(load=FALSE), dbControl = list(useDb = TRUE, dbName = "bigCdb", dbType = "DB1"))
> # converting to lower case:
> bigCcorp = tm_map(bigCcorp, tolower)
> # removing stopwords:
> stoppedCcorp = tm_map(bigCcorp, removeWords, stoplist)
Now, supposing my script crashes soon after this point, or I just forget to export the corpus in some other form, and then I restart R. The database is still there on my hard drive, full of nicely tidied-up data. Surely I can reload it back into the new R session, to carry on with the corpus processing, instead of starting all over again?
It feels like a noodle question... but no amount of dbInit() or dbLoad() or variations on the 'PCorpus()' function seem to work. Does anyone know the correct incantation?
I've scoured all the related documentation, and every paper and web forum I can find, but total blank - nobody seems to have done it. Or have I missed it?
The original question was from 2013. Meanwhile, in February 2015, a duplicate (or similar) question was answered:
How to reconnect to the PCorpus in the R tm package?. The answer in that post is essential, although pretty minimalist, so I'll try to augment it here.
These are some comments I've just discovered while working on a similar problem:
Note that the dbInit() function is not part of the tm package.
First you need to install the filehash package, which the tm documentation only "suggests" installing. This means it is not a hard dependency of tm.
Supposedly, you can also use the filehashSQLite package with library("filehashSQLite") instead of library("filehash"); both packages have the same interface and work seamlessly together, thanks to their object-oriented design. So also install "filehashSQLite" (edit 2016: some functions such as tm::content_transformer() are not implemented for filehashSQLite).
Then this works:
library(filehashSQLite)
# this string becomes the filename, must not contain dots.
# Example: "mydata.sqlite" is not permitted.
s <- "sqldb_pcorpus_mydata" # replace mydata with something more descriptive
suppressMessages(library(filehashSQLite))
if(! file.exists(s)){
    # csv is a data frame of 900 documents, 18 cols/features
    pc = PCorpus(DataframeSource(csv), readerControl = list(language = "en"), dbControl = list(dbName = s, dbType = "SQLite"))
    dbCreate(s, "SQLite")
    db <- dbInit(s, "SQLite")
    set.seed(234)
    # add another record, just to show we can.
    # key="test", value = "Hi there"
    dbInsert(db, "test", "hi there")
} else {
    db <- dbInit(s, "SQLite")
    pc <- dbLoad(db)
}
show(pc)
# <<PCorpus>>
# Metadata: corpus specific: 0, document level (indexed): 0
#Content: documents: 900
dbFetch(db, "test")
# remove it
rm(db)
rm(pc)
#reload it
db <- dbInit(s, "SQLite")
pc <- dbLoad(db)
# the corpus entries are now accessible, but not loaded into memory.
# now 900 documents are bound via "Active Bindings", created by makeActiveBinding() from the base package
show(pc)
# [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"
# ...
# [883] "883" "884" "885" "886" "887" "888" "889" "890" "891" "892"
# [893] "893" "894" "895" "896" "897" "898" "899" "900"
# [901] "test"
dbFetch(db, "900")
# <<PlainTextDocument>>
# Metadata: 7
# Content: chars: 33
dbFetch(db, "test")
#[1] "hi there"
This is what the database backend looks like in the RStudio IDE: you can see that the documents from the data frame have been encoded somehow inside the SQLite table.
