More than 2 notebooks with R code in Zeppelin fail with error "sparkr interpreter not responding" - apache-zeppelin

I have run into a strange issue running R notebooks in Zeppelin (0.7.2). The Spark interpreter is in per-note scoped mode, the Spark version is 1.6.2, and SPARK_HOME is set.
Please find below the steps to reproduce the issue:
Create a notebook (Note1) and run any R code in a paragraph. I ran the following code:
%r
# Build a local R data.frame, convert it to a Spark DataFrame,
# and derive a new column (SparkR API for Spark 1.6).
rdf <- data.frame(c(1,2,3,4))
colnames(rdf) <- c("myCol")
sdf <- createDataFrame(sqlContext, rdf)
withColumn(sdf, "newCol", sdf$myCol * 2.0)
Create another notebook (Note2) and run any R code in a paragraph. I ran the same code as above.
Up to this point everything works fine.
Create a third notebook (Note3) and run any R code in a paragraph. I ran the same code. This notebook fails with the error:
org.apache.zeppelin.interpreter.InterpreterException: sparkr is not
responding
What I understood from my analysis is that the process created for the sparkr interpreter is not being killed properly, and this makes every third notebook throw an error while executing. The process is killed on restarting the sparkr interpreter, after which another 2 notebooks can be executed successfully. That is, the error is thrown on every third notebook run that uses the sparkr interpreter.
Please help me fix this problem.

You need to set spark.r.numRBackendThreads to a value larger than 2. By default it is 2, which means you can only have 2 threads for the RBackend. Since you are using per-note scoped mode, each note consumes one RBackend thread, so you can only run 2 notes.
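For example (a sketch, not the only way to do it: the property name spark.r.numRBackendThreads is real, while the value 10 is an arbitrary example), you can add the property under Interpreter -> spark -> edit in the Zeppelin UI, or pass it through SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh since SPARK_HOME is set:
export SPARK_SUBMIT_OPTIONS="--conf spark.r.numRBackendThreads=10"
Restart the Spark interpreter afterwards so the new value takes effect.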

Related

How to use p3dfft multiple times for different dimensions? Any idea?

I want to use p3dfft first with NxNxN dimensions and in between with 2Nx2Nx2N dimensions. I have tried to implement that, but it shows
P3DFFT Setup error: the problem is already initialized.
Currently multiple setups not supported.
Quit the library using p3dfft_clean before initializing another setup
or, if I try to set up p3dfft twice (without cleaning the first setup), it exits with error code 139. See below:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 4127 RUNNING AT localhost.localdomain
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I tried making a header file with one function that sets up a given dimension, but the error is still there when the function is called twice for two different dimensions.

CommandSequence taking too long to download

With both DKPy-SITL and our APM2 board, the wait_ready method is causing our program to raise an API Exception due to the command list (waypoints) taking too long to download. In the past (with droneapi) this wasn't an issue for me. Some waypoints are being downloaded, but the process takes about 10 seconds for each one, which leads me to believe something weird is going on.
Are there any ways to speed up the download process? I've posted the relevant code below.
self.vehicle = connect(connection_string, baud=baud_rate,
                       status_printer=dronekit_printer, wait_ready=True)
and later in another asynchronous method
def commands(self):
    commands = self.vehicle.commands
    commands.download()
    commands.wait_ready()
    return commands
The error occurs on commands.wait_ready(). There has to be a faster way to download commands than sitting there for over 30 seconds on an i7 4790k processor, especially since I've run the same code off a slower computer in the past with droneapi. If need be, I can raise an issue on the dronekit github as well.
I had the same issue. The first download call always goes well (0 commands). Once you have uploaded some commands, the second time you try to download it fails with a 'Timeout' exception.
What I did to solve this was to call clear without download after the first time.
Something like this:
cmds = vehicle.commands
if not cmds.count > 0:
    # Download
    cmds.download()
    # Wait until download is finished
    cmds.wait_ready()
cmds.clear()
# Add / modify the commands here and then upload them
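For context, here is a fuller sketch of that flow (the connection string and waypoint coordinates are made-up placeholders; connect, Command, and the commands API come from the dronekit and pymavlink packages):
from dronekit import connect, Command
from pymavlink import mavutil

# Hypothetical SITL endpoint; substitute your own connection string.
vehicle = connect('udp:127.0.0.1:14550', wait_ready=True)

cmds = vehicle.commands
if not cmds.count > 0:
    # Only block on the first (empty) download
    cmds.download()
    cmds.wait_ready()
cmds.clear()

# Add one example waypoint, then push the mission to the vehicle.
cmds.add(Command(0, 0, 0,
                 mavutil.mavlink.MAV_FRAME_GLOBAL_RELATIVE_ALT,
                 mavutil.mavlink.MAV_CMD_NAV_WAYPOINT,
                 0, 0, 0, 0, 0, 0,
                 -35.363, 149.165, 20))
cmds.upload()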

C program exits giving error ORA-12162: TNS:net service name is incorrectly specified

I am working on a remote Red Hat server, where I'm developing a C application to insert data into a remote Oracle database. First I installed the OCI Instant Client RPM on the server and tried to compile a sample program. After sorting out the linking I could compile it, but when I run it, it exits with the error:
ORA-12162: TNS:net service name is incorrectly specified
The sample code I used is from René Nyffenegger's collection of things on the web (René Nyffenegger on Oracle); refer to that code if you need to clarify anything, since I'm quoting only a few pieces in this post.
In the code I added some prints to check for the error, and it seems it gets stuck in the OCIServerAttach() function: r gets a printed value of -1.
r = OCIServerAttach(srv, err, dbname, strlen(dbname), (ub4) OCI_DEFAULT);
printf("r value %d", r);
if (r != OCI_SUCCESS) {
    checkerr(err, r);
    goto clean_up;
}
Another point is that the compilation process gives a warning saying that a certain library is not included, but the executable file is still created. Here is the message I get during compilation:
[laksithe#loancust ~]$ gcc -L$ORACLE_HOME/lib/ -L$ORACLE_HOME/rdbms/lib/ -o oci_test oci_test.o -L/usr/lib/oracle/12.1/client64/lib -lclntsh `cat $ORACLE_HOME/lib/sysliblist`
cat: /lib/sysliblist: No such file or directory
Going through the web I found that creating a tnsnames.ora file with the connection details could solve the problem, but even that didn't work for me. Here is the link for that blog: blog
It has been a week since this error appeared and I couldn't solve it. Could someone please help me?
The connection string format I used is abc.ghi.com:1521/JKLMN
My recommendation is to bypass tnsnames completely. Oracle has always allowed you to put in the direct connection details, but EZConnect makes that even easier.
When you format your connection string, instead of listing the TNS name, use the actual connection properties in the following format:
servername:port/service name
For Example
MyOracle.MyCompany.Com:1521/SalesReporting
Your connection string might also require direct=true, but I'm honestly not sure.
I like the idea of tnsnames, but it's a double-edged sword. When it works, it's great. When it doesn't, you want to throw something. With EZConnect, it always works.
By the way, if you don't know the properties of the three items above, find a machine that connects via tnsnames and run:
tnsping <your TNS-named database>
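As a quick sanity check before wiring the EZConnect string into OCI, you can try it with sqlplus from any machine with the client installed (the credentials and service name here are placeholders):
sqlplus scott/tiger@MyOracle.MyCompany.Com:1521/SalesReporting
If that connects, the same host:port/service string should also work as the dbname argument passed to OCIServerAttach().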

Undocumented Managed VM task queue RPCFailedError

I'm running into a very peculiar and undocumented issue with a GAE Managed VM and Task Queues. I understand that the Managed VM service is in beta, so this question may not be relevant forever, but it's definitely causing me lots of headaches now.
The main symptom of the issue is that, in certain (not completely known to me) circumstances, I'm seeing the following error/traceback:
File "/home/vmagent/my_app/some_file.py", line 265, in some_ndb_tasklet
res = yield some_task.add_async('some-task-queue-name')
File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 472, in _on_rpc_completion
result = rpc.get_result()
File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/home/vmagent/python_vm_runtime/google/appengine/api/taskqueue/taskqueue.py", line 1948, in ResultHook
rpc.check_success()
File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/home/vmagent/python_vm_runtime/google/appengine/ext/vmruntime/vmstub.py", line 312, in _WaitImpl
raise self._ErrorException(*_DEFAULT_EXCEPTION)
RPCFailedError: The remote RPC to the application server failed for call taskqueue.BulkAdd().
I've gone through my local App Engine SDK to trace this through, and I can get up to the last line of the trace, but google/appengine/ext/vmruntime/ doesn't exist on my machine at all, so I have no idea what's happening in vmstub.py. From looking at the local code, some_task.add_async('the-queue') is spinning up an RPC and waiting for it to finish, but this error is not what the except apiproxy_errors.ApplicationError, e: at line 1949 of taskqueue.py is expecting...
The code that's generating the error looks something like this:
@ndb.tasklet
def kickoff_tasks(batch_of_payloads):
    for task_payload in batch_of_payloads:
        # task_payload is a dict
        task = taskqueue.Task(
            url='/the/handler/url',
            params=task_payload)
        res = yield task.add_async('some-valid-task-queue-name')
Other things worth noting:
this code itself is running in a task handler kicked off by another task.
I first saw this error before implementing this sort of batching, and assumed the issue was because I had added too many tasks from within a task handler.
In some cases, I can run this successfully with a batch size of 100, but in others, it fails consistently (depending on the data in the payloads) at 100, and sometimes succeeds at batch sizes of 50.
The task payloads themselves include batches of items, and are tuned to be just small enough to fit in a task. App Engine advertises a maximum task size of 100KB, so I'm keeping the payloads to under 90,000 bytes right now. Lowering the size even more doesn't seem to help any.
I've also tried implementing an exponential backoff to retry the kickoff_tasks method when this error appears, but it seems that once the error is raised, I can't add any other tasks at all from within the same handler (i.e. I can't kick off a "continue from where you left off" task; I just have to let this one fail and restart itself). A sketch of the batching pattern I'm describing follows below.
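For reference, a minimal sketch of the batching pattern described above, written as a plain synchronous variant for clarity (the queue name and handler URL are placeholders; taskqueue.Queue and its add() method are part of the App Engine SDK):
from google.appengine.api import taskqueue

def add_in_batches(payloads, batch_size=50):
    # Split the payloads into chunks so a single BulkAdd RPC
    # never carries more than batch_size tasks.
    queue = taskqueue.Queue('some-valid-task-queue-name')
    for i in range(0, len(payloads), batch_size):
        chunk = payloads[i:i + batch_size]
        tasks = [taskqueue.Task(url='/the/handler/url', params=p)
                 for p in chunk]
        queue.add(tasks)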
So, my question is, what is actually causing this error? How can I avoid it, or fix this so that I'm handling it correctly?
This is a known issue that is being worked on. There are actually two issues - the RPC failure itself and the lack of handling of the RPCFailedError exception by the SDK.
There is some public discussion of the issue here.
If you're using App Engine Flexible and the python-compat-multicore image, a new bug popped up related to App Engine using a newer version of the requests library that broke the communication between App Engine Flexible and the datastore. You can fix this error by monkey patching the library in your appengine_config.py file.
Add the following code to appengine_config.py:
try:
    import google.appengine.ext.vmruntime.vmstub as vmstub
except ImportError:
    pass
else:
    if isinstance(vmstub.DEFAULT_TIMEOUT, (int, long)):
        # Newer requests libraries do not accept integers as header values.
        # Be sure to convert the header value before sending.
        # See Support Case ID 11235929.
        vmstub.DEFAULT_TIMEOUT = bytes(vmstub.DEFAULT_TIMEOUT)
Note that if you do not have an appengine_config.py file, you can just create it in your base project directory (wherever you put your app.yaml file). This file gets run during App Engine startup.

Parallel processing - `forking` fails under Mac OS 10.6.8

It appears that fork fails under Mac OS 10.6.8. The algorithm is coded in R, and I have mainly been using the foreach package for parallel processing. The code works perfectly well when run sequentially (foreach with %do%) but not when run in parallel (foreach with %dopar%), even though the processes run in parallel do NOT communicate.
I used foreach on this same machine a month ago and it worked fine. An update of the OS has been performed in the meantime.
Error Messages
I received several kinds of error messages that seem to occur almost stochastically. The errors also differ depending on whether the code is run from the terminal (either by copy-pasting into the R environment or with R CMD BATCH) or from the R console.
When run from the Terminal, I get
Error in { : task 1 failed - "object 'dp' not found"
When run from the R console, I get either
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
....
<repeated many times when run on the R console>
or
Error in { : task 1 failed - "object 'dp' not found"
with the exact same code! Note that although this second error message is the same as the one received on the Terminal, the number of things printed (through the print() function) on the screen vastly differs!
What I've tried
I updated the foreach package and restarted my computer, but it did not help.
I tried to print pretty much everything I could, but it ended up being quite hard to keep track of what the algorithm is doing. For example, it often threw the error about the missing object dp without executing the print statement on the line that precedes the use of dp.
I tried to use %dopar% while registering only 1 CPU. The output did not change on the Terminal, but it changed on the Console: now the console gives the exact same error, at the same time as the terminal.
I made sure that several CPUs were in use when I asked for several CPUs.
I tried to use mclapply instead of foreach and registerDoMC() instead of registerDoParallel() to register the number of cores.
Extra-Info
My version of R is 3.0.2 GUI 1.62 Snow Leopard build. My machine has 16 cores.
