zeppelin giving error in pyspark interpreter for dynamic input - apache-zeppelin

I am not able to get dynamic input in pyspark from the Zeppelin context.
This is what I do:
%pyspark
z.input("name")
It gives me this error
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark.py", line 162, in
eval(compiledCode)
File "", line 1, in
AttributeError: 'PyZeppelinContext' object has no attribute 'input'
I am able to execute the statements in the %spark interpreter.

According to http://zeppelin-project.org/docs/zeppelincontext.html, ZeppelinContext is automatically created and injected into the Scala language backend.
Are you sure that Python can use the Zeppelin Context?
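One way to check this from inside a %pyspark paragraph is to ask the injected object what it actually exposes. The sketch below is plain Python introspection. The helper name public_attributes and the stand-in class FakeContext are made up for illustration and are not part of Zeppelin; in a real notebook you would pass the injected z instead.

```python
def public_attributes(obj):
    """Return the sorted names of attributes that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


class FakeContext(object):
    """Hypothetical stand-in for the injected context; not Zeppelin's real class."""

    def show(self, data):
        return data


ctx = FakeContext()
print(public_attributes(ctx))   # names available on the object
print(hasattr(ctx, "input"))    # False would explain the AttributeError
```

If "input" is missing from the list for the real object, the method is simply not implemented in that interpreter's version of the context.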


how to connect a discord bot through proxy

I am trying to run a discord bot using discord.py through a proxy. The discord.py documentation is pretty scarce on the subject and not up to date with the aiohttp implementation.
The discord.py documentation basically says to use a ProxyConnector and pass it as an argument when the client is created.
But in aiohttp this approach is deprecated, and client.ClientSession().get is recommended instead. The problem is that client.ClientSession().get asks me to provide a URL.
I tried ProxyConnector anyway, but it doesn't work when I finally run the bot (it can't connect to the Discord API). I'm not sure what's wrong, as the proxy itself works fine with any other HTTPS service.
Code with recommended way
conn = client.ClientSession().get(proxy='<proxy_url>', proxy_auth=BasicAuth(<proxy_auth>))
self.client = discord.Client(connector=conn)
Code with deprecated way
conn = ProxyConnector(proxy='<proxy_url>', proxy_auth=BasicAuth(<proxy_auth>))
self.client = discord.Client(connector=conn)
Traceback
Traceback (most recent call last):
File "C:/Users/airiau/PycharmProjects/pronostics/main.py", line 50, in <module>
main()
File "C:/Users/airiau/PycharmProjects/pronostics/main.py", line 46, in main
bot.run(config['token'])
File "C:\Users\airiau\PycharmProjects\pronostics\sample\DiscordBot.py", line 352, in run
self.client.run(self.token)
File "C:\Users\airiau\venv-3.6\lib\site-packages\discord\client.py", line 519, in run
self.loop.run_until_complete(self.start(*args, **kwargs))
File "C:\Program Files (x86)\Python36-32\lib\asyncio\base_events.py", line 468, in run_until_complete
return future.result()
File "C:\Users\airiau\venv-3.6\lib\site-packages\discord\client.py", line 491, in start
yield from self.connect()
File "C:\Users\airiau\venv-3.6\lib\site-packages\discord\client.py", line 444, in connect
self.ws = yield from DiscordWebSocket.from_client(self)
File "C:\Users\airiau\venv-3.6\lib\site-packages\discord\gateway.py", line 207, in from_client
timeout=60, loop=client.loop)
File "C:\Program Files (x86)\Python36-32\lib\asyncio\tasks.py", line 358, in wait_for
return fut.result()
File "C:\Users\airiau\venv-3.6\lib\site-packages\discord\gateway.py", line 65, in _ensure_coroutine_connect
ws = yield from websockets.connect(gateway, loop=loop, klass=klass)
File "C:\Users\airiau\venv-3.6\lib\site-packages\websockets\py35\client.py", line 19, in __await__
return (yield from self.client)
File "C:\Users\airiau\venv-3.6\lib\site-packages\websockets\client.py", line 210, in connect
factory, wsuri.host, wsuri.port, **kwds)
File "C:\Program Files (x86)\Python36-32\lib\asyncio\base_events.py", line 787, in create_connection
', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 10060] Connect call failed ('104.16.59.37', 443), [Errno 10060] Connect call failed ('104.16.60.37', 443)
After further research, I found this answer:
It appears that WebSockets used by discord.py do not support HTTP
proxies. This would just magically work with HTTPS, but since the
proxy is HTTP it doesn't. That means that, short of rewriting
discord.py with HTTP proxy support (by using websocket-client, for
example, which supports HTTP proxies), we may be out of luck.
So it looks like this might not be possible at all.
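For context, an HTTP proxy carries a TLS or WebSocket connection by way of a CONNECT request: the client asks the proxy to open a raw tunnel to the target host, and only then starts the real handshake. The sketch below only builds that request so the mechanism is visible; it does not patch discord.py or websockets. The function name build_connect_request and the example host and credentials are placeholders, not values from the question.

```python
import base64


def build_connect_request(host, port, username=None, password=None):
    """Build the bytes of an HTTP CONNECT request for a proxy tunnel."""
    target = "{}:{}".format(host, port)
    lines = [
        "CONNECT {} HTTP/1.1".format(target),
        "Host: {}".format(target),
    ]
    if username is not None and password is not None:
        # Basic proxy authentication: base64 of "username:password".
        token = base64.b64encode(
            "{}:{}".format(username, password).encode("utf-8")
        ).decode("ascii")
        lines.append("Proxy-Authorization: Basic {}".format(token))
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder host and credentials, for illustration only.
request = build_connect_request("gateway.example.invalid", 443, "user", "secret")
print(request.decode("ascii"))
```

A client with HTTP proxy support would send these bytes to the proxy, wait for a 2xx status line, and then run the TLS and WebSocket handshakes over the same socket.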

Running Local Version of Schema.org

I am attempting to run a local version of the schema.org app so I can write a proposal for an addition to the ontology. I followed the tutorial at http://dataliberate.com/2016/02/10/evolving-schema-org-in-practice-pt1-the-bits-and-pieces/, which had me set up Google App Engine and download a forked version of schema.org using Git.
Unfortunately, I cannot get the schema.org app to run on my machine. Sample GAE apps work fine, but whenever I start the schema.org app I get the following error:
Traceback (most recent call last):
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\lib\rdflib\plugins\parsers\pyRdfa\__init__.py", line 580, in graph_from_source
if not rdfOutput : raise f
rdflib.plugins.parsers.pyRdfa.FailedSource
ERROR2016-09-29 14:54:39,825 wsgi.py:263]
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 240, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 299, in _LoadHandler
handler, path, err = LoadObject(self._handler)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\runtime\wsgi.py", line 85, in LoadObject
obj = __import__(path[0])
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\sdoapp.py", line 2585, in <module>
read_schemas(loadExtensions=ENABLE_HOSTED_EXTENSIONS)
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\api.py", line 1055, in read_schemas
apirdflib.load_graph('core',file_paths)
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\apirdflib.py", line 118, in load_graph
g.parse(file=open(full_path(f),"r"),format=format)
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\lib\rdflib\graph.py", line 1037, in parse
parser.parse(source, self, **args)
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\lib\rdflib\plugins\parsers\structureddata.py", line 145, in parse
check_lite=check_lite
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\lib\rdflib\plugins\parsers\structureddata.py", line 176, in _process
processor.graph_from_source(orig_source, graph=graph, pgraph=processor_graph, rdfOutput=False)
File "C:\Users\Kevin\Desktop\Ontology\schemaorg\lib\rdflib\plugins\parsers\pyRdfa\__init__.py", line 662, in graph_from_source
if not rdfOutput : raise b
FailedSource
INFO 2016-09-29 10:54:39,951 module.py:788] default: "GET /_ah/warmup HTTP/1.1" 500-
The problem is occurring when it tries to parse the RDF, but I suspect the lack of RDF output is being caused by the 500 error. I have done an extensive search and found plenty of examples of the 500 error with GAE, but none of the suggested fixes has worked (e.g., increasing the TIMEOUT setting, rolling back to SDK 1.36).
I am running the app on localhost:9080. I get a 500 error whenever I try to access it from the browser. I can, however, access the admin at localhost:8001. For some reason, it shows two instances running.
Any help would be greatly appreciated. Let me know if you need more information.
This problem has now been fixed with a Windows-specific patch to the Schema.org codebase, as referenced in Git issues #1384 and #1412.
Pulling the latest code from the repository should clear the problem.

PostgreSQL to MySQL data migration

I am trying to move my PostgreSQL database with all the data inside it to a MySQL database so I am using MySQL Workbench > Data migration tool.
On the "Reverse Engineer Source" step I got a strange error:
ERROR: Reverse engineer selected schemata: ProgrammingError("('42P01', '[42P01] ERROR: relation "public.psqlcfg_lid_seq" does not exist;\nError while executing the query (7) (SQLExecDirectW)')"): error calling Python module function DbPostgresqlRE.reverseEngineer Failed
The complete error log where this error message appears at its end is:
Starting...
Connect to source DBMS...
- Connecting...
Connecting to ...
Opening ODBC connection to DSN=InventoryDBDS...
Connected
Connect to source DBMS done
Reverse engineer selected schemata....
Reverse engineering public from InventoryDB
- Reverse engineering catalog information
Traceback (most recent call last):
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\db_postgresql_re_grt.py", line 335, in reverseEngineer
return PostgresqlReverseEngineering.reverseEngineer(connection, catalog_name, schemata_list, context)
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\db_generic_re_grt.py", line 228, in reverseEngineer
catalog = cls.reverseEngineerCatalog(connection, catalog_name)
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\db_generic_re_grt.py", line 388, in reverseEngineerCatalog
cls.reverseEngineerSequences(connection, schema)
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\db_postgresql_re_grt.py", line 76, in reverseEngineerSequences
min_value, max_value, start_value, increment_by, last_value, is_cycled, ncache = cls.execute_query(connection, seq_details_query % (schema.name, seq_name)).fetchone()
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\db_generic_re_grt.py", line 76, in execute_query
return cls.get_connection(connection_object).cursor().execute(query, *args, **kwargs)
pyodbc.ProgrammingError: ('42P01', '[42P01] ERROR: relation "public.psqlcfg_lid_seq" does not exist;\nError while executing the query (7) (SQLExecDirectW)')
Traceback (most recent call last):
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\workbench\wizard_progress_page_widget.py", line 192, in thread_work
self.func()
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\migration_schema_selection.py", line 160, in task_reveng
self.main.plan.migrationSource.reverseEngineer()
File "C:\Program Files\MySQL\MySQL Workbench CE 6.0.6\modules\migration.py", line 335, in reverseEngineer
self.state.sourceCatalog = self._rev_eng_module.reverseEngineer(self.connection, self.selectedCatalogName, self.selectedSchemataNames, self.state.applicationData)
SystemError: ProgrammingError("('42P01', '[42P01] ERROR: relation "public.psqlcfg_lid_seq" does not exist;\nError while executing the query (7) (SQLExecDirectW)')"): error calling Python module function DbPostgresqlRE.reverseEngineer
ERROR: Reverse engineer selected schemata: ProgrammingError("('42P01', '[42P01] ERROR: relation "public.psqlcfg_lid_seq" does not exist;\nError while executing the query (7) (SQLExecDirectW)')"): error calling Python module function DbPostgresqlRE.reverseEngineer Failed
I've searched the web for anything related to the error 42P01 that appears in the log, but couldn't find any reference. If someone could tell me what exactly I am doing wrong here, that would be really great.
Thanks
This error brought me here.
If your "psqlcfg_lid_seq" sequence name actually contains both uppercase and lowercase characters, remember that PostgreSQL converts unquoted names to all lowercase in a query.
The basic rule is: to match a name case-sensitively, wrap it in double quotation marks ("), so that the conversion is avoided.
However, MySQL Workbench forgets to do that when it tries to fetch sequences.
The query is in db_postgresql_re_grt.py, located in %Program Files%\MySQL\MySQL Workbench (your version, for example "6.1 CE")\modules on Windows.
Around line 70, you will find the SQL query in the variable seq_details_query. It will be something like:
seq_details_query = """SELECT min_value, max_value, start_value,
increment_by, last_value, is_cycled, cache_value
FROM %s.%s"""
Change that to:
seq_details_query = """SELECT min_value, max_value, start_value,
increment_by, last_value, is_cycled, cache_value
FROM \"%s\".\"%s\""""
With that change the sequences can be fetched and the whole flow can proceed.
Note that you may need to restart MySQL Workbench for the modified script to take effect.
I'm surprised the MySQL guys still haven't fixed this problem. Maybe I need to report the bug somehow? :P
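The effect of the quoting can be sketched outside Workbench. The helper below is illustrative and not part of MySQL Workbench: quote_ident and sequence_details_query are made-up names, and the mixed-case sequence name is only a guess at what a real one might look like. It wraps a name in double quotes and doubles any embedded double quote, which is how PostgreSQL expects quoted identifiers to be written.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier so its case is preserved."""
    # An embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


def sequence_details_query(schema, sequence):
    """Build a sequence lookup with both parts quoted."""
    return "SELECT last_value FROM {}.{}".format(
        quote_ident(schema), quote_ident(sequence)
    )


# Hypothetical mixed-case name; unquoted, PostgreSQL would fold it to lowercase.
print(sequence_details_query("public", "PSQLCfg_LID_seq"))
```

Quoting through a helper, rather than pasting literal quotes into the format string, also copes with names that themselves contain a double quote.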
42P01 is a generic error meaning the object doesn't exist.
In this case it's the sequence public.psqlcfg_lid_seq that does not exist.
Based on the series of error messages, the error happens when the tool tries to query the attributes of that sequence (with SELECT ... FROM schema_name.sequence_name)
Presumably this sequence is still referenced somewhere in the database even though it no longer exists. In theory there are safeguards in PostgreSQL against this situation (dependency tracking), but I believe they depend on your server version and maybe on the specifics of the dependency.
To find the references to this sequence, one approach would be to dump the database schema to an SQL text file (with pg_dump -s) and search for the text psqlcfg_lid_seq within it.
Once found, presumably some ALTER statements may be able to remove the references.
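The search step can be sketched in a few lines. This assumes the schema has already been dumped to a text file with pg_dump -s; the function name find_references and the sample dump text are invented for illustration. The match is case-insensitive so that a mixed-case spelling of the sequence name is not missed.

```python
def find_references(dump_text, name):
    """Return (line_number, line) pairs for every line that mentions name."""
    needle = name.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(dump_text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Invented sample; a real dump would be read from the file written by pg_dump -s.
sample_dump = """CREATE TABLE psqlcfg (
    lid integer DEFAULT nextval('psqlcfg_lid_seq'::regclass) NOT NULL
);
ALTER TABLE psqlcfg OWNER TO postgres;
"""

for number, line in find_references(sample_dump, "psqlcfg_lid_seq"):
    print(number, line)
```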

Getting database for Echoprint Api using Python getting error

I am trying to get started with the Echoprint API. I just installed the Echoprint server and started it, then ran a command to load the Echoprint database:
ritesh@L901134:~/echoprint/util$ python fastingest.py -b /home/ritesh/Downloads/echoprint-dump.json
The error log I am getting is:
1/1 /home/ritesh/Downloads/echoprint-dump.json
Traceback (most recent call last):
File "fastingest.py", line 62, in <module>
codes, bigeval = parse_json_dump(f)
File "fastingest.py", line 14, in parse_json_dump
codes = json.load(open(jfile))
File "/usr/lib/python2.7/json/__init__.py", line 278, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Can anyone please tell me why I am getting this error and how to fix it, so that I can load the complete database?
Try this command:
python splitdata.py .../big.json
It splits the file into several smaller JSON files. You can then ingest them one by one and find out which file is broken.
For example, after splitting you will have:
big-1.json
big-2.json
big-3.json - broken
big-4.json
big-5.json
Go on and ingest files 1, 2, 4 and 5, then split big-3.json again, and keep repeating this until you have narrowed it down to the broken piece, which you can repair manually.
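Checking the pieces by hand gets tedious, so the test can be scripted. This is a minimal sketch, assuming Python 3 (the traceback above is from Python 2.7) and that the split files already exist. The name find_broken_files and the throwaway file contents are illustrative, and none of this is part of the Echoprint utilities.

```python
import json
import os
import tempfile


def find_broken_files(paths):
    """Return {path: error message} for every file that is not valid JSON."""
    broken = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except ValueError as error:  # json.JSONDecodeError subclasses ValueError
            broken[path] = str(error)
    return broken


# Demonstration with two throwaway files, one of them deliberately truncated.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "big-1.json")
bad = os.path.join(workdir, "big-3.json")
with open(good, "w", encoding="utf-8") as handle:
    handle.write('[{"code": "abc"}]')
with open(bad, "w", encoding="utf-8") as handle:
    handle.write('[{"code": "abc"')

for path, message in find_broken_files([good, bad]).items():
    print(path, "->", message)
```

In Python 3 the error message includes the line and column where parsing stopped, which helps when repairing the broken piece by hand.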

Mercurial client error 255 and HTTP error 404 when attempting to push large files to server

Problem:
19/06/10 Update: More evidence problem is server-side. Receiving this error on Windows 7 command line (see below for full traceback):
URLError: <urlopen error [Errno 10054] An existing connection was forcibly closed by the remote host>
abort: error: An existing connection was forcibly closed by the remote host
When attempting to push a changeset that contains 6 large files (.exe, .dmg, etc) to my remote server my client (MacHG) is reporting the error:
"Error During Push. Mercurial reported
error number 255: abort: HTTP Error
404: Not Found"
What does the error even mean?! The only thing unique (that I can tell) about this commit is the size, type, and filenames of the files. How can I determine which exact file within the changeset is failing? How can I delete the corrupt changeset from the repository? In a different post, someone reported using "mq" extensions to effectively delete an erroneous changeset from the history within a repository, but mq looks overly complicated for what I'm trying to solve.
Background:
I can push and pull the following: source files, directories, .class files and a .jar file to and from the server, using both MacHG and TortoiseHG.
I successfully committed to my local repository the first-time addition of the 6 large .exe, .dmg etc. installer files (about 130 MB total).
In the following commit to my local repository, I removed ("untracked" / forget) the 6 files causing the problem. However, the previous (failing) changeset is still queued to be pushed to the server (i.e. my local host is trying to push the "add" and then the "remove" to the remote server, in keeping with the "keep everything in history" philosophy of the source control system).
I can commit .txt, .java files etc. using TortoiseHG from Windows PCs. I haven't actually tested committing or pushing the same large files using TortoiseHG.
Please help!
Setup:
Client applications = MacHG v0.9.7 (SCM 1.5.4), and TortoiseHG v1.0.4 (SCM 1.5.4)
Server = HTTPS, IIS7.5, Mercurial 1.5.4, Python 2.6.5, setup using these instructions:
http://www.jeremyskinner.co.uk/mercurial-on-iis7/
In IIS7.5 the CGI handler is configured to handle ALL verbs (not just GET, POST and HEAD).
My hgweb.cgi file on the server is as follows:
#!/usr/bin/env python
#
# An example hgweb CGI script, edit as necessary
# Path to repo or hgweb config to serve (see 'hg help hgweb')
#config = "/path/to/repo/or/config"
# Uncomment and adjust if Mercurial is not installed system-wide:
#import sys; sys.path.insert(0, "/path/to/python/lib")
# Uncomment to send python tracebacks to the browser if an error occurs:
#import cgitb; cgitb.enable()
from mercurial import demandimport; demandimport.enable()
from mercurial.hgweb import hgweb, wsgicgi
application = hgweb('C:\inetpub\wwwroot\hg\hgweb.config')
wsgicgi.launch(application)
My hgweb.config file on the server is as follows:
[collections]
C:\Mercurial Repositories = C:\Mercurial Repositories
[web]
baseurl = /hg
allow_push = usernamea
allow_push = usernameb
Output from the command line on my MacBook (both Mercurial and MacHG installed) using the -v and --traceback flags:
macbook15:hgrepos coderunner$ hg -v --traceback push
pushing to https://coderunner:***@hg.mydomain.com.au/hg/hgrepos
searching for changes
3 changesets found
Traceback (most recent call last):
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 50, in _runcatch
return _dispatch(ui, args)
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 471, in _dispatch
return runcommand(lui, repo, cmd, fullargs, ui, options, d)
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 341, in runcommand
ret = _runcommand(ui, options, cmd, d)
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 522, in _runcommand
return checkargs()
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 476, in checkargs
return cmdfunc()
File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 470, in <lambda>
d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
File "/Library/Python/2.6/site-packages/mercurial/util.py", line 401, in check
return func(*args, **kwargs)
File "/Library/Python/2.6/site-packages/mercurial/commands.py", line 2462, in push
r = repo.push(other, opts.get('force'), revs=revs)
File "/Library/Python/2.6/site-packages/mercurial/localrepo.py", line 1491, in push
return self.push_unbundle(remote, force, revs)
File "/Library/Python/2.6/site-packages/mercurial/localrepo.py", line 1636, in push_unbundle
return remote.unbundle(cg, remote_heads, 'push')
File "/Library/Python/2.6/site-packages/mercurial/httprepo.py", line 235, in unbundle
heads=' '.join(map(hex, heads)))
File "/Library/Python/2.6/site-packages/mercurial/httprepo.py", line 134, in do_read
fp = self.do_cmd(cmd, **args)
File "/Library/Python/2.6/site-packages/mercurial/httprepo.py", line 85, in do_cmd
resp = self.urlopener.open(req)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 389, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 502, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 427, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 361, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 510, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
abort: HTTP Error 404: Not Found
macbook15:hgrepos coderunner$
Output from a Windows 7 host (which has only TortoiseHG installed) attempting to push the same files to the server (a different changeset, but one that contains the same 6 file additions as the changeset being pushed from the MacBook):
c:\repositories\hgrepos>hg -v --traceback push
pushing to https://coderunner:***@hg.mydomain.com.au/hg/hgrepos
searching for changes
1 changesets found
Traceback (most recent call last):
File "mercurial\dispatch.pyo", line 50, in _runcatch
File "mercurial\dispatch.pyo", line 471, in _dispatch
File "mercurial\dispatch.pyo", line 341, in runcommand
File "mercurial\dispatch.pyo", line 522, in _runcommand
File "mercurial\dispatch.pyo", line 476, in checkargs
File "mercurial\dispatch.pyo", line 470, in <lambda>
File "mercurial\util.pyo", line 401, in check
File "mercurial\commands.pyo", line 2462, in push
File "mercurial\localrepo.pyo", line 1491, in push
File "mercurial\localrepo.pyo", line 1636, in push_unbundle
File "mercurial\httprepo.pyo", line 235, in unbundle
File "mercurial\httprepo.pyo", line 134, in do_read
File "mercurial\httprepo.pyo", line 85, in do_cmd
File "urllib2.pyo", line 389, in open
File "urllib2.pyo", line 407, in _open
File "urllib2.pyo", line 367, in _call_chain
File "mercurial\url.pyo", line 523, in https_open
File "mercurial\keepalive.pyo", line 259, in do_open
URLError: <urlopen error [Errno 10054] An existing connection was forcibly closed by the remote host>
abort: error: An existing connection was forcibly closed by the remote host
c:\repositories\hgrepos>
Is it a keep-alive issue? Is IIS7.5 at fault? Is Python 2.6.5 at fault?
Went through the same pain points...
With the default settings on the IIS server, you will not be able to push large repositories to the server, as IIS has a default maximum request length of only 4 MB, and a timeout for CGI scripts of 15 min, making it impossible to upload large files.
To enable the uploading of large files (and this is not easy to find on the web…), do the following:
1. In IIS Manager, click on the web site node, and click the Limits… link.
2. Then specify a connection time-out sufficiently large (I chose 1 hour here, or 3600 seconds)
3. Next, click the node containing hg (as per the installation procedure), then double-click CGI
4. Specify a sufficiently-long time out for CGI scripts (e.g., 10 hours)
Now, edit C:\inetpub\wwwroot\hg\web.config, so that it has a new <security> section under <system.webServer>, and an <httpRuntime> specification under <system.web>:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    […]
    <security>
      <requestFiltering>
        <requestLimits maxAllowedContentLength="2147482624" />
      </requestFiltering>
    </security>
  </system.webServer>
  <system.web>
    <httpRuntime executionTimeout="540000" maxRequestLength="2097151" />
  </system.web>
</configuration>
This specifies an HTTP timeout of a bit more than 6 days, and a maximum upload limit of about 2 GB.
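Both figures can be checked with a little arithmetic, given that executionTimeout is expressed in seconds, maxRequestLength in kilobytes and maxAllowedContentLength in bytes. The variable names below are only for illustration.

```python
EXECUTION_TIMEOUT_S = 540000            # executionTimeout, in seconds
MAX_REQUEST_LENGTH_KB = 2097151         # maxRequestLength, in kilobytes
MAX_ALLOWED_CONTENT_BYTES = 2147482624  # maxAllowedContentLength, in bytes

timeout_days = EXECUTION_TIMEOUT_S / 86400.0
request_bytes = MAX_REQUEST_LENGTH_KB * 1024
request_gib = request_bytes / float(1024 ** 3)

print(timeout_days)                                # 6.25
print(request_bytes == MAX_ALLOWED_CONTENT_BYTES)  # True
print(round(request_gib, 3))                       # 2.0
```

The two size limits describe the same ceiling in different units, so if you lower one of them it makes sense to lower the other to match.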
Had the same issue using IIS 7 as the server. I tried the solution above, which resolved the error 255 issue, but I still got Errno 10054 with larger files. I then increased the Connection Time-out in IIS, which worked.
To change it: Web Site -> Manage Web Site -> Advanced Settings -> Connection Limits -> Connection Time-out. The default is 2 minutes. I changed mine to 20 minutes and it worked.
Not sure why this works, but it seems that Mercurial makes a connection to the server, takes a while to process larger files, and only then sends a request. By that time IIS has disconnected the client.
Ok, your solution did it!
I already had a requestLimits tag like this:
<requestLimits maxUrl="16384" maxQueryString="65536" />
so I added maxAllowedContentLength ="524288000" to it like this:
<requestLimits maxUrl="16384" maxQueryString="65536" maxAllowedContentLength ="524288000" />
And that did it!
I'm just posting this for anyone else coming into this thread from a search.
There's currently an issue using the largefiles extension in the Mercurial Python module when hosted via IIS. See this post if you're encountering issues pushing large changesets (or large files) to IIS via TortoiseHg.
The problem ultimately turns out to be a bug in SSL processing introduced in Python 2.7.3 (which probably explains why there are so many unresolved posts from people looking into problems with Mercurial). Rolling back to Python 2.7.2 let me get a little further (blocked at 30 MB pushes instead of 15 MB), but to properly solve the problem I had to install the IISCrypto utility to completely disable transfers over SSLv2.
