Spark Streaming display (streaming) not working - file

I am following this example to simulate streaming in Spark from a source file. At the end of the example, a function named display is used, which is supported only in Databricks. I run my code in a Jupyter notebook. What is the alternative in Jupyter to get the same output that the display function produces?
Update_1:
The code:
# Source
sourceStream=spark.readStream.format("csv").\
option("header",True).\
schema(schema).option("ignoreLeadingWhiteSpace",True).\
option("mode","dropMalformed").\
option("maxFilesPerTrigger",1).load("D:/PHD Project/Paper_3/Tutorials/HeartTest_1/").\
withColumnRenamed("output","label")
#stream test data to the ML model
streamingHeart=pModel.transform(sourceStream).select('label')
I write the stream to a CSV sink as follows:
streamingHeart.writeStream.outputMode("append").\
format("csv").option("path", "D:/PHD Project/Paper_3/Tutorials/sa1/").\
option("checkpointLocation", "checkpoint/filesink_checkpoint").start()
The problem is that the generated output files are empty. What might be the reason?

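I solved the problem by changing the checkpoint location, as follows:
streamingHeart.writeStream.outputMode("append").\
format("csv").option("path", "D:/PHD Project/Paper_3/Tutorials/sa1/").\
option("checkpointLocation", "checkpoint/filesink_checkpoint_1").start()
Presumably the old checkpoint directory still held state from an earlier run, so the query treated the input files as already processed and wrote nothing new.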

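On the original question of a display alternative in Jupyter: one option that is sometimes used is Spark's built-in memory sink, which keeps the streamed rows in an in-memory table that can be queried from a notebook cell. The sketch below assumes the streamingHeart DataFrame and the spark session from the code above; the helper name start_memory_sink and the table name heart_predictions are made up for illustration.

```python
def start_memory_sink(streaming_df, query_name="heart_predictions"):
    """Start a streaming query that writes into an in-memory table.

    streaming_df is expected to be a streaming DataFrame such as
    streamingHeart above; query_name is a hypothetical table name.
    """
    return (
        streaming_df.writeStream
        .format("memory")        # built-in sink, suited to small debug-sized output
        .queryName(query_name)   # the in-memory table is registered under this name
        .outputMode("append")
        .start()
    )


# Usage in a notebook (assumes `spark` and `streamingHeart` already exist):
#   query = start_memory_sink(streamingHeart)
#   spark.sql("SELECT * FROM heart_predictions").show()   # re-run the cell to refresh
#   query.stop()
```

Unlike display, this does not refresh by itself; the SELECT has to be re-run to see newly arrived rows. The memory sink holds all rows on the driver, so it is only reasonable for small test streams.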

Sagemaker Studio trial component chart not showing

I am wondering why I cannot display the loss and accuracy curves in the Trial components chart in SageMaker Studio.
I am using TensorFlow's Keras API for training.
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="sm_entrypoint.sh",
    source_dir=".",
    role=role,
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    framework_version="2.4",
    py_version="py37",
    metric_definitions=[
        {'Name': 'train:loss', 'Regex': 'loss: ([0-9.]+)'},
        {'Name': 'val:loss', 'Regex': 'val_loss: ([0-9.]+)'},
        {'Name': 'train:accuracy', 'Regex': 'accuracy: ([0-9.]+)'},
        {'Name': 'val:accuracy', 'Regex': 'val_accuracy: ([0-9.]+)'}
    ],
    enable_sagemaker_metrics=True
)
estimator.fit(
    inputs="s3://xxx",
    experiment_config={
        "ExperimentName": "urbansounds-20211027",
        "TrialName": "tf-classical-NN-20211027",
        "TrialComponentDisplayName": "Train"
    }
)
The regexes are enabled and appear to capture the metrics correctly: under the Metrics tab, each metric shows a count of 12, matching the 12 epochs I specified.
However, the chart is empty. The x-axis shows time here, but the chart is also empty when I switch it to epoch.
tl;dr: in your entry_point source code (sm_entrypoint.sh), you need to explicitly tell the experiment tracker which epoch each metric value belongs to, using the log_metric() function.
There are two ways the tracker works in SageMaker Experiments: (1) you log the metric in your entry_point code and use the metric_definitions argument of the estimator to teach SM to parse the metric from the logs, the way you did it; or (2) you explicitly create a Tracker instance inside your entry_point and invoke its log_metric() method. Apparently only method (2) tells the SM Tracker which epoch each metric entry belongs to.
I found the answer in a random video, https://youtu.be/gMnkfPztIHU?t=141, after days of searching :(
Oh, there is also a catch: SageMaker images do not have the Tracker package installed, so if you just put from smexperiments.tracker import Tracker in your entry_point source code, your SM Estimator will complain. So you will need to install sagemaker-experiments in your image:
create a requirements.txt file that has sagemaker-experiments==0.1.35 in it;
specify the source dir by including source_dir="./dir_that_contains_requirements.txt" in your estimator creation.
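Method (2) could look roughly like the sketch below. It assumes the Python training script (the one that sm_entrypoint.sh ends up running) has a Keras-style history dict at hand; the helper names log_history and log_history_to_sagemaker are made up for illustration, while Tracker.load() and log_metric() come from the sagemaker-experiments package mentioned above.

```python
def log_history(tracker, history):
    """Log every per-epoch value together with its epoch number.

    tracker: any object exposing
             log_metric(metric_name=..., value=..., iteration_number=...),
             for example a smexperiments Tracker.
    history: dict mapping metric name -> list of per-epoch values, the shape
             of the `history` attribute returned by Keras model.fit().
    """
    for name, values in history.items():
        for epoch, value in enumerate(values):
            tracker.log_metric(
                metric_name=name,
                value=float(value),
                iteration_number=epoch,  # this is what ties the value to an epoch
            )


def log_history_to_sagemaker(history):
    """Intended to be called at the end of training, inside the SageMaker job."""
    # Imported lazily so log_history() stays usable where the package is missing.
    from smexperiments.tracker import Tracker

    # Assumption: with no arguments, Tracker.load() resolves the trial
    # component of the running training job from the environment.
    with Tracker.load() as tracker:
        log_history(tracker, history)
```

In a training script this would be used as log_history_to_sagemaker(model.fit(...).history). Keeping the loop separate from the Tracker import makes it possible to try the logging logic locally with a stand-in object.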

Error 2005: While attempting to GET response from form recognizer

Currently I'm using Form Recognizer version 2.1 preview to train a custom model. I'm able to test the model in the Form Recognizer Labeling Tool and get the output. When I submit the same file that I tested in the labeling tool from my program, I get the error below.
{"status": "failed", "createdDateTime": "2020-09-25T20:03:21Z", "lastUpdatedDateTime": "2020-09-25T20:03:21Z", "analyzeResult": {"version": "2.1.0", "errors": [{"code": "2005", "message": "The file submitted couldn't be parsed. This can be due to one of the following reasons: the file format is not supported ( Supported formats include JPEG, PNG, BMP, PDF and TIFF), the file is corrupted or password protected."}]}}
The GET request code used is:
resp = requests.get(url=get_url,headers={"Ocp-Apim-Subscription-Key":FORM_RECOGNIZER_SUBSCRIPTION_KEY})
I saved the file to the server, then read it from there and passed the file contents to Form Recognizer. This worked for me, but I don't know why the error happened.
I also encountered this exact error message while following this article.
The article shows 4 steps:
1. Train Model (requires SAS to blob - whole folder)
2. Get model result
3. Analyze (requires SAS to a single file)
4. Get analyze result
5. Profit
I got this error on step 4.
After monkeying around, I figured out that the cause is not actually in step 4 but in step 3. I was providing the SAS to the blob instead of the SAS to the file.
After I corrected the SAS URL, it worked perfectly.
Here is how to get SAS to blob:
Here is how to get SAS to a file:
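The step 3 mix-up described above can be caught before the analyze call. In Azure's own naming, a service SAS URL carries an sr (signed resource) query parameter: c when the SAS was issued for a whole container (the "folder" in the answer above) and b when it was issued for a single blob (the "file"). A minimal sketch of such a check, assuming a service SAS URL; the function names and the example URLs are made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def sas_signed_resource(sas_url):
    """Return the `sr` (signed resource) value of a service SAS URL, or None."""
    query = parse_qs(urlparse(sas_url).query)
    values = query.get("sr")
    return values[0] if values else None


def is_single_file_sas(sas_url):
    """True when the SAS was issued for one blob (sr=b), not a container (sr=c)."""
    return sas_signed_resource(sas_url) == "b"


# Hypothetical URLs with the signature shortened to "xxx":
folder_sas = "https://account.blob.core.windows.net/forms?sv=2019-12-12&sr=c&sig=xxx"
file_sas = "https://account.blob.core.windows.net/forms/invoice.pdf?sv=2019-12-12&sr=b&sig=xxx"
print(is_single_file_sas(folder_sas), is_single_file_sas(file_sas))
```

An account-level SAS has no sr parameter, so for such URLs the check returns False and does not say anything useful.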
What file are you trying to use as input? Form Recognizer supports PDF, TIFF and images (PNG and JPEG) as file types and inputs to the analyze API. See more details here - https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-labeled-data?tabs=v2-0#analyze-forms-for-key-value-pairs-and-tables

Zlib.Gunzip not working in Grafana Plugin

I am creating my first Grafana panel plugin to display GLG graphics. I am using the React simple panel plugin.
For the GLG implementation I have a GLG static library (it can't be installed with npm), so I added my GLG library files (GlgCE.js, GlgToolkitCE.js, gunzip.min.js) to an external folder. I import all these library files in the SimplePanel.tsx file. One of my steps is to decompress the created data. In my GlgToolkit.js I have the code below, which creates a Zlib.Gunzip object and decompresses the data, which is in Uint8Array format.
tproto.__glg_gunzip_hook__ = (data) => {
    var gunzip = new Zlib.Gunzip(data);
    return gunzip.decompress();
};
My problem is that the above code is not working. While debugging, I can see that it is unable to create the Zlib.Gunzip object: the gunzip variable comes back undefined, and the data is not decompressed.
It would be great if anybody could help me with this. How can one library file communicate with another (in this case gunzip.min.js)?
I found my own solution: I imported the gunzip.min.js file into my library file.
import * as Zlib from './gunzip.min.js';

How I can get real img file name by src in Selenium?

Using LogoImg.GetAttribute("src") I get the following src:
https://scol.stage-next.sc.local/lspprofile/5a2e7338d6e9a927741175e2/image?id=5a2fbc98d6e9a9177c8c1592
But the real name of the file is: TestImage - 9fb0c49d-69b1-49ed-8c63-4283e405b781.jpg
If I enter the src in my browser, the file is downloaded with its real name.
How can I get the real name of the file in Selenium? I need it for a test.
Well, the task was solved by other means: I just compared the differences in src. But an answer to the question would still be interesting.
You are able to retrieve the src attribute as follows:
https://scol.stage-next.sc.local/lspprofile/5a2e7338d6e9a927741175e2/image?id=5a2fbc98d6e9a9177c8c1592
This is a reference to the resource stored in the database, so it wouldn't be possible to retrieve the name TestImage - 9fb0c49d-69b1-49ed-8c63-4283e405b781.jpg before the file is downloaded.
To make sure the download has completed and then read the filename, you will need to use either of the following:
glob.glob() or fnmatch: https://stackoverflow.com/a/4296148/771848
The Watchdog module, to monitor changes within a directory: see "python selenium, find out when a download has completed?"
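The glob.glob() route could be sketched as follows. This assumes the browser is configured to download into a dedicated directory that is empty before the test, and that unfinished downloads carry one of the temporary suffixes listed in the code; the function name wait_for_download and the default values are made up for illustration.

```python
import glob
import os
import time

# Suffixes browsers commonly use for unfinished downloads (e.g. Chrome, Firefox).
PARTIAL_SUFFIXES = (".crdownload", ".part", ".tmp")


def wait_for_download(directory, pattern="*", timeout=30.0, poll=0.5):
    """Return the name of the newest finished file matching pattern, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        paths = [
            p for p in glob.glob(os.path.join(directory, pattern))
            if os.path.isfile(p)
        ]
        pending = [p for p in paths if p.endswith(PARTIAL_SUFFIXES)]
        done = [p for p in paths if not p.endswith(PARTIAL_SUFFIXES)]
        # Only report a result once nothing in the directory is still downloading.
        if done and not pending:
            return os.path.basename(max(done, key=os.path.getmtime))
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

After triggering the download in the test, a call such as wait_for_download(download_dir, pattern="TestImage*") would return the real file name once the file is complete, where download_dir is whatever directory the browser profile was pointed at.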

SharePoint GetFolderByServerRelativeUrl /Files API not returning list of files as expected

I'm new to SharePoint and I'm having trouble with a few of the simple examples I've found. I'm not sure whether I have a permission set incorrectly or I'm not understanding this properly.
When I use a browser to access my URL:
https://mysite.com/_api/web/GetFolderByServerRelativeUrl('/SCF/Shared%20Documents/FY%202014%20Memos')
part of the returned XML says there are 87 items, <d:ItemCount m:type="Edm.Int32">87</d:ItemCount>, which correctly matches the number of files inside this folder.
Here's where I get confused. When I use the following to show the contents of the folder, I don't get any of the file information listed in the resulting XML as I would expect:
https://mysite.com/_api/web/GetFolderByServerRelativeUrl('/SCF/Shared%20Documents/FY%202014%20Memos')/Files
I've also tried the following to get specific file info, but I get a file not found message:
https://mysite.com/_api/web/GetFolderByServerRelativeUrl('/SCF/Shared%20Documents/FY%202014%20Memos/096.pdf')
Am I missing something simple?
This behavior occurs because the wrong web context is specified for the SP.Web.getFolderByServerRelativeUrl method in the REST query:
https://[server]/[web]/_api/web/GetFolderByServerRelativeUrl('/[web]/[library]/[folder]')
where the [web] segment before /_api is the web site from which the folder and files are retrieved.
Assume the following site structure:
/ News web (root)
|
Archive sub web
|
Documents library
|
2008 Folder
Then the following REST query:
https://[server]/archive/_api/web/GetFolderByServerRelativeUrl('/archive/Documents/2008')/Files
or
https://[server]/archive/_api/web/GetFolderByServerRelativeUrl('Documents/2008')/Files
will return the files located in the 2008 folder of the Documents library under the Archive sub site.
Believe it or not, my problem was sending the parameter in double quotes instead of single quotes.
good:
https://[server]/[web]/_api/web/GetFolderByServerRelativeUrl('/[web]/[library]/[folder]')
bad:
https://[server]/[web]/_api/web/GetFolderByServerRelativeUrl("/[web]/[library]/[folder]")
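Both answers above come down to how the request URL is put together: the web context in front of /_api, and single quotes around the server-relative path. A minimal sketch of a helper that builds such a URL; the function name folder_files_url and the host server.example are made up for illustration.

```python
from urllib.parse import quote


def folder_files_url(web_url, server_relative_folder):
    """Build the REST URL that lists the files of a folder.

    web_url: URL of the web (site or sub site) that contains the library,
             e.g. "https://server.example/archive".
    server_relative_folder: folder path such as "/archive/Documents/2008".
    """
    web_url = web_url.rstrip("/")
    path = quote(server_relative_folder)  # encodes spaces as %20, keeps "/"
    # The path argument goes in single quotes, not double quotes.
    return f"{web_url}/_api/web/GetFolderByServerRelativeUrl('{path}')/Files"


print(folder_files_url("https://server.example/archive", "/archive/Documents/2008"))
```

The sketch does not handle folder names that themselves contain a single quote.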
