I started creating Google Dataproc instances with image version 1.3 because of some problems in version 1.2. Zeppelin has to be version 0.8, because Dataproc 1.3 installs Spark 2.3. I can import Zeppelin JSON notebooks created in version 0.7.3 into 0.8, and, as expected, that does not throw an error. But if I create a notebook in Zeppelin 0.8, export its JSON to my local machine, and then try to import that JSON back into either version 0.7.3 or version 0.8, both throw the same error (invalid JSON). How can I fix this? The same installation that created the JSON cannot read it back, which is a strange issue.
I have copied the JSON created by Zeppelin 0.8 below for your review. It contains a single line of Spark code.
Thanks.
"{\"paragraphs\":[{\"text\":\"%spark \\n\\nval dd = 2\",\"user\":\"anonymous\",\"dateUpdated\":\"2018-07-19T12:58:54+0000\",\"config\":{\"colWidth\":12,\"fontSize\":9,\"enabled\":true,\"results\":{},\"editorSetting\":{\"language\":\"scala\",\"editOnDblClick\":false,\"completionKey\":\"TAB\",\"completionSupport\":true},\"editorMode\":\"ace/mode/scala\"},\"settings\":{\"params\":{},\"forms\":{}},\"apps\":[],\"jobName\":\"paragraph_1532005121582_199122006\",\"id\":\"20180719-125841_1980776305\",\"dateCreated\":\"2018-07-19T12:58:41+0000\",\"status\":\"READY\",\"progressUpdateIntervalMs\":500,\"focus\":true,\"$$hashKey\":\"object:523\"}],\"name\":\"ddd\",\"id\":\"2DNR1W1HM\",\"noteParams\":{},\"noteForms\":{},\"angularObjects\":{\"spark:shared_process\":[]},\"config\":{\"isZeppelinNotebookCronEnable\":false,\"looknfeel\":\"default\",\"personalizedMode\":\"false\"},\"info\":{}}"
I had the same error and used https://www.freeformatter.com/json-escape.html to unescape the escape characters. Once you unescape the string that you have quoted in this question, make sure you remove the double quotes from the beginning and end of it. The result will be valid JSON, which you can import into Zeppelin.
PS: Make sure you remove any PHI or PII from the initial JSON that you got from exporting the notebook. In other words, before you export the notebook, use the "clear output" function provided in the Zeppelin notebook.
PPS: You can also use "JSON Unescape" within Sublime Text.
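If you prefer to script the same fix instead of using an online tool or editor, a few lines of Python will do it. This is only a minimal sketch, assuming the exported file contains exactly the quoted string shown above; the file names are placeholders.

import json

# The export is a JSON-encoded string whose value is the real note JSON,
# so a single json.loads() strips the outer quotes and the \" escapes.
with open("note_exported_from_0.8.json") as f:
    inner = json.loads(f.read())

# Sanity check: the unescaped text should now parse as a normal JSON object.
json.loads(inner)

with open("note_importable.json", "w") as f:
    f.write(inner)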
When I try to create an Avro file from data containing a date object, it throws an AvroTypeException error.
This is the schema I have used:
[screenshot of the schema]
This is the code that writes the data:
[screenshot of the code]
This is the error shown while running the code:
[screenshot of the error]
Please find here the link to the full version of the code I have tried.
NOTE: Python version: 3.7.10, avro-python3 version 1.10.2
Any help or suggestions are appreciated.
avro-python3 never supported logical types and was abandoned in favor of having the avro library support both Python 2 and 3 (and now it only supports Python 3).
To fix the problem you should pip uninstall avro-python3 and pip install avro.
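As a minimal sketch of what that looks like with the plain avro package (1.10 or later), where the record and field names below are only illustrative and not your actual schema, a field declared with the date logical type can then be written directly from a datetime.date:

import datetime
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# Illustrative schema: "date" is a logical type on top of int (days since the epoch).
schema = avro.schema.parse("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "event_date", "type": {"type": "int", "logicalType": "date"}}
  ]
}
""")

with DataFileWriter(open("events.avro", "wb"), DatumWriter(), schema) as writer:
    # With logical-type support in the avro package, a datetime.date is
    # accepted here and serialized as the underlying int.
    writer.append({"name": "launch", "event_date": datetime.date(2021, 6, 1)})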
I am having a problem exporting a collection from Mongo Atlas to my local machine. I have tried several different formats including this one, which I found in the official Atlas documentation on importing and exporting data.
First I log into my Atlas like so:
mongosh "mongodb+srv://cluster0.oyvrw.mongodb.net/dbname" --username uname
Then I try the command from the official docs:
mongoexport --uri mongodb+srv://uname:password@cluster0.oyvrw.mongodb.net/dbname --collection colname --type json --out cats.json
I have looked around at other similar questions and tried everything I can find online without success. One suggestion was not to run the command from the mongo shell but from the regular command line, but this does not work either.
It seems like it should be easier to get a collection out of Atlas to JSON. Any help or suggestions are much appreciated. Thanks!
For anyone facing this error: the mongoexport command does not work inside mongosh. It must be run from the system shell.
However, mongoexport is part of the MongoDB Database Tools, which, as of MongoDB 4.4, are released separately. As a result, running mongoexport in the system shell will give a "command not found" error if the installed version of MongoDB is 4.4 or greater and the tools have not been installed separately.
To solve this you can install the database tools using homebrew:
brew install mongodb/brew/mongodb-database-tools
Of course, make sure you already have Homebrew installed; if not, a quick Google search will help.
Then the following command should work to perform the export:
mongoexport --uri mongodb+srv://<username>:<password>@cluster0.oyvrw.mongodb.net/<dbName> --collection <collectionName> --type json --out /Users/macuser/desktop/exportBU.json
Hope that helps anyone having similar problems getting data in/out of MongoDB.
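If installing the Database Tools is not an option, a small pymongo script can produce a comparable JSON dump. This is just a sketch; the placeholders in the connection string and the output path are assumptions for you to fill in, and mongodb+srv URIs also require the dnspython package.

from pymongo import MongoClient
from bson.json_util import dumps

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.oyvrw.mongodb.net/<dbName>")
collection = client["<dbName>"]["<collectionName>"]

# json_util handles BSON-specific types (ObjectId, dates) that the plain json module cannot.
with open("exportBU.json", "w") as f:
    f.write(dumps(list(collection.find()), indent=2))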
I'm trying z.load in Apache Zeppelin as follows:
%dep
z.load("/zeppelin-0.5.6-incubating-bin-all/lplibs/hive/csv-serde-1.0.5-jar-with-dependencies.jar")
I get an ERROR, and it says (I'm not sure this is the actual error):
Must be used before SparkInterpreter (%spark) initialized
Hint: put this paragraph before any Spark code and restart Zeppelin/Interpreter
This Zeppelin paragraph is the first one I have in my notebook, so I'm not sure what it's complaining about.
Right now I can't check your problem, but you should restart the interpreter (by pushing the restart button) before loading the dependency jar file.
There is a chance that the SparkContext has already been started by another notebook.
So, as Kangrok mentioned, just restart the Spark interpreter.
Apart from that, why not use the latest Zeppelin, in which you don't need %dep to load your dependencies? Instead, they can be loaded from the Interpreter screen.
More details can be found here https://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/manual/dependencymanagement.html
I'm using Fossil SCM as my only solution for version control and tickets. So far, so good. Its self-contained and minimalist approach suits my needs. But I would like to start doing some analysis of my projects' history and development, and a good source for that is the project timelines. I could do some HTML parsing to convert the Fossil timeline output to something else, but I would like to know if there is any option to export that information in another structured format (e.g. JSON or similar). Web searches have not produced any useful findings on the issue. Any pointers to a solution?
Thanks,
Offray
Have you tried fossil json timeline branch trunk?
fossil help json
Usage: fossil json SUBCOMMAND ?OPTIONS?
In CLI mode, the -R REPO common option is supported. Due to limitations
in the argument dispatching code, any -FLAGS must come after the final
sub- (or subsub-) command.
The commands include:
anonymousPassword
artifact
branch
cap
config
diff
dir
g
login
logout
query
rebuild
report
resultCodes
stat
tag
timeline
user
version (alias: HAI)
whoami
wiki
Run 'fossil json' without any subcommand to see the full list (but be
aware that some listed might not yet be fully implemented).
Compile json when you build from source:
./configure --json
The key to getting this working is to enable JSON support in Fossil by compiling it from source. The current version has it disabled, so looking for any clue about it in the command-line help originally got me nothing. Thanks to user 2612611 for the initial clue about it. Here is the procedure I followed:
Go to https://www.fossil-scm.org/download.html and download the source tarball package.
Uncompress the previous package.
Go to the folder where you uncompressed the package (let's call it /uncompress-folder).
Run ./configure --json
Run make.
Optional: put your newly created fossil binary in your path or where the previous one was installed (something like sudo mv /uncompress-folder/fossil /usr/bin/fossil).
Open the Fossil repository whose history you want to export and launch the Fossil web interface (fossil ui).
Go to http://localhost:8080/json/timeline/checkin?limit=0, where http://localhost:8080 is your local machine's interface for fossil ui, and json/timeline/checkin?limit=0 is the JSON API call saying: JSON export of the timeline (/json/timeline) check-ins (/checkin) for all history (?limit=0). If instead of the 0 at the end of the URL you put another integer n, you will get the last n check-ins.
From the command prompt you should be able to get the same result, stored in the file timeline.json, by running fossil json timeline checkin --limit=0 > timeline.json instead of using the web browser, but in my local test it didn't work.
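Once you have timeline.json, any JSON library can be used for the kind of history analysis mentioned in the question. Below is a minimal sketch in Python; I am assuming from memory that the entries sit under a payload -> timeline envelope with a user field on each check-in, so verify the key names against your own export.

import json
from collections import Counter

with open("timeline.json") as f:
    data = json.load(f)

# Assumed layout of Fossil's JSON envelope; check these key names against your file.
entries = data.get("payload", {}).get("timeline", [])

# Example analysis: number of check-ins per user.
per_user = Counter(entry.get("user", "unknown") for entry in entries)
for user, count in per_user.most_common():
    print(user, count)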
The API is still a moving target, but you can find documentation on this excellent project at [1] and a demo interface to test the parameters at [2].
[1] https://docs.google.com/document/d/1fXViveNhDbiXgCuE7QDXQOKeFzf2qNUkBEgiUvoqFN4/view?pli=1#
[2] http://fossil.wanderinghorse.net/repos/fossil-sgb/json/
I'm using the Tika parser to index my files into Solr. I created my own parser (which extends XMLParser). It uses my own MIME type.
I created a jar file whose contents look like this:
src
|- main
|  |- some_packages
|     |- MyParser.java
|- resources
   |- META-INF
   |  |- services
   |     |- org.apache.tika.parser.Parser (which contains the line: some_packages.MyParser)
   |- org
      |- apache
         |- tika
            |- mime
               |- custom-mimetypes.xml
In custom-mimetypes.xml I put the definition of the new MIME type, because my XML files have some special tags.
Now here is the problem: I have been testing parsing and indexing with Solr on Glassfish installed on my local machine, and it worked just fine. Then I wanted to install it on a remote server with the same version of Glassfish (3.1.1). I copied over the Solr application and its home directory with all libraries (including the Tika jars and the jar with my custom parser). Unfortunately it doesn't work there. After posting files to Solr, I can see in the content-type field that it detected my custom MIME type, but none of the fields that are supposed to be there appear, as if the MyParser class was never run. The only fields I get are the ones from Dublin Core. I checked (by simply adding some print statements) that Tika is only using XMLParser.
Has anyone had a similar problem? How can I handle this?
The problem was that I was using Java 7 to compile my parser, but Apache Tika was compiled with Java 5...