In Zeppelin, I am having to specify the interpreter at the start of every paragraph. Is there a way to set the interpreter for the whole session?
%pyspark
import re
Took 0 seconds.
import pandas as pd
console :1: error: '.' expected but identifier found. import pandas as pd
%pyspark
import pandas as pd
Took 0 seconds.
How do I set the interpreter for the whole session?
The Spark Interpreter group currently has 4 interpreters, as listed here:
https://zeppelin.incubator.apache.org/docs/0.5.0-incubating/interpreter/spark.html
The default interpreter is %spark, and it is selected based on the order of the interpreters listed in the zeppelin.interpreters property in the zeppelin-site.xml config file.
The current order of interpreters in your zeppelin-site.xml (zeppelin.interpreters property) will be:
org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter
Modify this to ...
org.apache.zeppelin.spark.PySparkInterpreter, org.apache.zeppelin.spark.SparkInterpreter
and restart Zeppelin (zeppelin-daemon.sh restart)
This will make %pyspark the default interpreter.
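For reference, a rough sketch of what that property could look like in conf/zeppelin-site.xml (Hadoop-style configuration format; the interpreter list here is trimmed to the two classes above):

<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkInterpreter</value>
  <description>Comma-separated interpreter configurations. The first interpreter becomes the default.</description>
</property>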
Thanks
The above answer did not work on recent Zeppelin versions.
To set the default interpreter, check /etc/zeppelin/conf/interpreter.json and look for something like:
...
{
  "name": "spark",
  "class": "org.apache.zeppelin.spark.SparkInterpreter",
  "defaultInterpreter": true,
  "editor": {
    "language": "scala",
    "editOnDblClick": false
  }
},
...
{
  "name": "pyspark",
  "class": "org.apache.zeppelin.spark.PySparkInterpreter",
  "defaultInterpreter": false,
  "editor": {
    "language": "python",
    "editOnDblClick": false
  }
}
Now everything seems straightforward: we just need to change defaultInterpreter for spark to false and defaultInterpreter for pyspark to true.
Then restart Zeppelin (sudo stop zeppelin; sudo start zeppelin).
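If you prefer to script that change, here is a rough Python sketch (not part of Zeppelin; it assumes the entries above live under interpreterSettings -> interpreterGroup, which is where they appear in the interpreter.json files I have seen, and you should back the file up first):

import json

path = "/etc/zeppelin/conf/interpreter.json"  # adjust for your installation

with open(path) as f:
    config = json.load(f)

# Walk every interpreter group and make pyspark the default instead of spark.
for setting in config["interpreterSettings"].values():
    for interp in setting.get("interpreterGroup", []):
        if interp.get("name") == "spark":
            interp["defaultInterpreter"] = False
        elif interp.get("name") == "pyspark":
            interp["defaultInterpreter"] = True

with open(path, "w") as f:
    json.dump(config, f, indent=2)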
Even Fishball's answer for recent Zeppelin seems outdated. My conf/interpreter.json came with spark as the default ("defaultInterpreter": true) and python/pyspark as not ("defaultInterpreter": false), and yet Zeppelin picked python/pyspark as the default. In my case, I wanted spark over pyspark.
The solution was to just drag and drop interpreters in Zeppelin web console's interpreter binding section.
Change zeppelin.interpreter.group.default in conf/zeppelin-site.xml from spark to whichever interpreter you would like to use.
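For illustration, the property could look roughly like this in conf/zeppelin-site.xml (a sketch; the property only exists in newer Zeppelin releases, and python here is just an example value):

<property>
  <name>zeppelin.interpreter.group.default</name>
  <value>python</value>
</property>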
I created 3 package versions (0.1.0.1, 0.1.0.2, 0.1.0.3) of a 2nd-generation managed package.
I promoted 0.1.0.3 to release
I created versions 0.1.0.4 - 0.1.0.6 with --skipancestorcheck
I made the following updates in the sfdx-project.json file:
I specified ancestorId in the sfdx-project.json file as the Id (04t......) of the released version 0.1.0.3
I set "versionName" to "ver 0.2" from "ver 0.1"
I set "versionNumber" to "0.2.0.NEXT" from "0.1.0.NEXT"
I ran the command sfdx force:package:beta:version:create --package "Test App" --installationkey "XXX" --definitionfile config/project-scratch-def.json --wait 10 -c
I got the error:
"ERROR running force:package:version:create: An unexpected error occurred. Please include this ErrorId if you contact support: 636537585-177978 (-693775129)"
Salesforce Support found that the ErrorId is related to "java.lang.RuntimeException: Failed to retrieve Second Generation Package Version info of ancestor 05i3z000000fyk0AAA."
05i3z000000fyk0AAA is the Id of the Package2Version record for version 0.1.0.3 in my Dev Hub org.
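(For context, that record can be looked up in the Dev Hub with a Tooling API query roughly like the one below; DevHub is my org alias:)

sfdx force:data:soql:query -t -u DevHub -q "SELECT Id, Package2Id, MajorVersion, MinorVersion, PatchVersion, BuildNumber, IsReleased FROM Package2Version WHERE Id = '05i3z000000fyk0AAA'"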
My expectation was that a new beta version of the package would be created, but instead I got the error.
Has anyone run into something similar?
I am now stuck, because I cannot delete the released version of a 2nd-generation package and I cannot create a new one. Do you have any idea how to get out of this?
I tried ancestorVersion: HIGHEST and all possible values for ancestorId and ancestorVersion, but nothing helped.
My latest sfdx-project.json file is:
{
  "packageDirectories": [
    {
      "path": "force-app",
      "default": true,
      "package": "Test App Core",
      "versionName": "ver 0.2",
      "versionNumber": "0.2.0.NEXT",
      "ancestorId": "04t..........YAAQ"
    }
  ],
  "name": "test-salesforce-core",
  "namespace": "zenoo_app",
  "sfdcLoginUrl": "https://login.salesforce.com",
  "sourceApiVersion": "55.0",
  "packageAliases": {
    "Test App Core": "0Ho3........CAA",
    "Test App Core#0.1.0-1": "04t..............OAAQ",
    "Test App Core#0.1.0-2": "04t..............TAAQ",
    "Test App Core#0.1.0-3": "04t..............YAAQ"
  }
}
I am using sfdx-cli version 7.176
I'm using Kinesis Data Analytics Studio which provides a Zeppelin environment.
Very simple code:
%flink.pyflink

from pyflink.common.serialization import JsonRowDeserializationSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer

# create env = determine app runs locally or remotely
env = s_env or StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///home/ec2-user/flink-sql-connector-kafka_2.12-1.13.5.jar")

# create a kafka consumer
deserialization_schema = JsonRowDeserializationSchema.builder() \
    .type_info(type_info=Types.ROW_NAMED(
        ['id', 'name'],
        [Types.INT(), Types.STRING()])
    ).build()

kafka_consumer = FlinkKafkaConsumer(
    topics='nihao',
    deserialization_schema=deserialization_schema,
    properties={
        'bootstrap.servers': 'kakfa-brokers:9092',
        'group.id': 'group1'
    })
kafka_consumer.set_start_from_earliest()

ds = env.add_source(kafka_consumer)
ds.print()
env.execute('job1')
I can get this working locally and can see the change logs being printed to the console. However, I cannot get the same results in Zeppelin.
I also checked STDOUT on the task managers in the Flink web console; nothing is there either.
Am I missing something? I have searched for days and could not find anything on this.
I'm not 100% sure, but I think you may need a sink to begin pulling data through the datastream; you could potentially use the included PrintSinkFunction.
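As a hedged illustration (reusing the names from the question rather than the answer's exact suggestion): in PyFlink 1.13, ds.print() is itself a sink that writes to the task managers' stdout, so one way to see records directly in the Zeppelin notebook output is to collect results back to the client:

# Sketch only: assumes env and kafka_consumer from the question above.
# execute_and_collect() submits the job and streams results back to the client,
# so records show up in the notebook instead of in task-manager stdout.
ds = env.add_source(kafka_consumer)
with ds.execute_and_collect() as results:
    for i, row in enumerate(results):
        print(row)
        if i >= 9:  # stop after a few records, just for the illustration
            break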
I'm trying to change the default interpreter in an interpreter group.
Specifically, I'm changing
"defaultInterpreter": true,
to
"defaultInterpreter": false,
in
{
  "name": "spark",
  "class": "org.apache.zeppelin.spark.SparkInterpreter",
  "defaultInterpreter": true,
  "editor": {
    "language": "scala",
    "editOnDblClick": false,
    "completionKey": "TAB",
    "completionSupport": true
  }
},
Then changing the next section (Spark SQL) to true.
But when I restart Zeppelin, interpreter.json gets reverted.
Please advise.
OK, after digging around I think I found the right place.
interpreters/<interpreter>/interpreter-settings.json is the actual place to edit the settings.
conf/interpreter.json is generated.
I wish these things were easy to find in the docs or via search, though, rather than by developers digging around and trying different things.
I am trying to use the scikit_bring_your_own/container/decision_trees/train mode. Running it via the AWS CLI, I had no issues. Trying to replicate it by creating a SageMaker training job, I am facing an issue loading data from S3 into the Docker image path.
In the CLI we would specify docker run -v $(pwd)/test_dir:/opt/ml --rm ${image} train, which is where the input is referred from.
In the training job, I specified the S3 bucket location and the output path for the model artifacts.
The error comes from the exception in train ("container/decision_trees/train"):
raise ValueError(('There are no files in {}.\n' +
                  'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                  'the data specification in S3 was incorrectly specified or the role specified\n' +
                  'does not have permission to access the data.').format(training_path, channel_name))

Traceback (most recent call last):
  File "/opt/program/train", line 55, in train
    'does not have permission to access the data.').format(training_path, channel_name))
So I do not understand whether any tweaking is required or some access is missing.
Kindly help.
If you set the InputDataConfig in the CreateTrainingJob API like this
"InputDataConfig": [
{
"ChannelName": "train",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": "s3://<bucket>/a.csv"
}
},
"InputMode": "File",
},
{
"ChannelName": "eval",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": "s3://<bucket>/b.csv"
}
},
"InputMode": "File",
}
]
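For reference, the same channels wired into a full CreateTrainingJob call via boto3 might look roughly like the sketch below (the image URI, role ARN, bucket and job names are placeholders, and the resource settings are arbitrary):

import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="decision-trees-byoc-demo",  # placeholder name
    AlgorithmSpecification={
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/decision-trees:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::<account>:role/<sagemaker-execution-role>",
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<bucket>/a.csv",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "InputMode": "File",
        },
        {
            "ChannelName": "eval",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<bucket>/b.csv",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "InputMode": "File",
        },
    ],
    OutputDataConfig={"S3OutputPath": "s3://<bucket>/output/"},
    ResourceConfig={"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 10},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)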
SageMaker downloads the data specified above from S3 to the /opt/ml/input/data/<channel_name> directory in the Docker container. In this case, the algorithm container should be able to find the input data under:
/opt/ml/input/data/train/a.csv
/opt/ml/input/data/eval/b.csv
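A minimal sketch of how the container's training code can then pick those files up (paths follow the convention above; pandas is assumed to be available in the image):

import os
import pandas as pd

# SageMaker mounts each channel under /opt/ml/input/data/<channel_name>
train_dir = "/opt/ml/input/data/train"

input_files = [os.path.join(train_dir, f) for f in os.listdir(train_dir)]
if not input_files:
    raise ValueError("There are no files in {}; check the channel name, the S3Uri "
                     "and the execution role's permissions.".format(train_dir))

# With the InputDataConfig above this is just a.csv
train_data = pd.concat(pd.read_csv(f, header=None) for f in input_files)
print(train_data.shape)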
You can find more details in https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html
I have a ServiceBuswithQueue ARM template that has an outputs section like the one below:
"outputs": {
"serviceBusNamespaceName": {
"type": "string",
"value": "[parameters('serviceBusNamespaceName')]"
},
"namespaceConnectionString": {
"type": "string",
"value": "[listkeys(variables('authRuleResourceId'), variables('sbVersion')).primaryConnectionString]"
},
"sharedAccessPolicyPrimaryKey": {
"type": "string",
"value": "[listkeys(variables('authRuleResourceId'), variables('sbVersion')).primaryKey]"
},
"serviceBusQueueName": {
"type": "string",
"value": "[parameters('serviceBusQueueName')]"
}
}
For that I created Continuous Integration (CI) and Continuous Deployment (CD) pipelines in VSTS. In the CD pipeline I used a PowerShell task to deploy the above ARM template. Now I want to pass an output of this ARM template, such as "$(serviceBusQueueName)", as an input parameter to the next ARM template in the release.
I know the above scenario can be achieved using ARM outputs between the two ARM tasks in the release, but I don't want that, because I am currently using a PowerShell task to deploy the ARM template.
Before posting this question I researched and found the following links, but they did not help resolve my issue:
Azure ARM templates - using the output of other deployments
How do I use ARM 'outputs' values in another release task?
Can anyone please suggest how to resolve this issue?
You can override the parameters of the next template by passing the corresponding values from the first deployment's outputs.
First, capture the outputs and turn them into release variables in the script:
# Start the deployment
Write-Host "Starting deployment...";
$outputs = New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -Mode Incremental -TemplateFile $templateFilePath -TemplateParameterFile $parametersFilePath;

foreach ($key in $outputs.Outputs.Keys) {
    $type  = $outputs.Outputs.Item($key).Type
    $value = $outputs.Outputs.Item($key).Value
    Write-Host "##vso[task.setvariable variable=$key;]$value"
}
You can display all the environment variables in a subsequent script:
Write-Host "Environment variables:"
gci env:* | sort-object name
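To then feed one of those outputs into the next ARM deployment, a sketch along these lines should work (it assumes the next template defines a serviceBusQueueName parameter and that $nextTemplateFilePath points at that template; New-AzureRmResourceGroupDeployment exposes template parameters as dynamic cmdlet parameters):

# Consume the release variable set above in a later PowerShell task
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName `
    -Mode Incremental `
    -TemplateFile $nextTemplateFilePath `
    -serviceBusQueueName "$(serviceBusQueueName)"

Because the variable was set with ##vso[task.setvariable ...], it is only available to tasks that run after the script above.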