Is there a way to bulk download files in Jive via API or script? - export

We have to extract almost 1,000 documents for a divestiture. Doing it by clicking through the UI is going to take a long time. I know Jive has an API, but I can't find anything that would let us download multiple files from multiple groups.
Any ideas are appreciated.
Thanks!

Sure. Use /contents/{contentID} to grab a document.
There's more detail in the Document Entity Section of the Jive REST API Documentation.
You might find your list of documents to retrieve by using the Search methods of the API. Here's a curl example:
curl -v -u <username:password> "https://<url>/api/core/v3/search/contents?filter=search(<search term>)"
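Building on that, here is a rough Python sketch of a search-then-download loop; treat it as a starting point under stated assumptions: basic auth is enabled, the v3 search endpoint above is available, and each hit exposes a binaryURL download field (that field name is an assumption, so confirm it against the Document entity docs before relying on it).

import json
import requests

BASE = "https://your-jive-instance"   # assumption: replace with your site
AUTH = ("username", "password")       # assumption: basic auth is enabled

def search_contents(term):
    # v3 search endpoint, same as the curl example above
    resp = requests.get(
        f"{BASE}/api/core/v3/search/contents",
        params={"filter": f"search({term})"},
        auth=AUTH,
    )
    resp.raise_for_status()
    text = resp.text
    # Jive prefixes JSON responses with a security guard string; strip it if present
    if text.startswith("throw"):
        text = text.split("\n", 1)[1]
    return json.loads(text).get("list", [])

def download_all(term):
    for item in search_contents(term):
        url = item.get("binaryURL")   # assumption: present on file-type content
        if not url:
            continue
        data = requests.get(url, auth=AUTH)
        data.raise_for_status()
        name = item.get("name") or item.get("subject", "unnamed")
        with open(name, "wb") as fh:
            fh.write(data.content)

download_all("divestiture")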
Also, just so you know, there is an active Jive Developer Community where questions like this are likely to get more eyeballs. And, as a starting point for development with Jive in general, check out https://developer.jivesoftware.com/

Related

Semantria Integration with DB

I need to know whether anyone has integrated a database with Semantria and sent the output to a database, Excel, or a text file.
I have tried to explore Semantria via Excel and the API, but the integration does not work perfectly.
It depends on what kind of integration you're looking for.
I have already done many integrations with different storages, including indexing services and RDBMS solutions.
Unfortunately there are no ready-to-use components available on the market, so you will need to build the integration on your own.
Semantria offers SDKs (https://github.com/Semantria/semantria-sdk) for all modern languages; you will need to build the logic that fetches the analysis results and saves them to your chosen storage.
Can you please explain what storage you use and what Semantria output you're interested in?
Thanks George.
Well, at the moment we are just focusing on pulling the data from a DB (for instance MySQL or Oracle), and the output should go back into the same DB; I will take care of any transformation needed on the output.
Where I am stuck is setting up the link between the DB and Semantria. How will these SDKs help? I have never worked on anything like this.
A brief explanation would be of great help.
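To illustrate the glue code George describes, here is a minimal sketch of the DB-to-Semantria-to-DB loop in Python. It assumes the Python SDK's Session / queueDocument / getProcessedDocuments interface from the GitHub examples, a hypothetical MySQL table called feedback, and result field names that should be checked against the SDK docs; treat it as an outline, not a tested integration.

import time
import semantria            # assumption: the Python SDK from the GitHub repo above
import mysql.connector      # assumption: MySQL as the source and target DB

session = semantria.Session("YOUR_KEY", "YOUR_SECRET")
db = mysql.connector.connect(user="app", password="secret", database="reviews")
cur = db.cursor()

# 1. Pull rows to analyze from the source table (hypothetical schema)
cur.execute("SELECT id, body FROM feedback WHERE sentiment IS NULL LIMIT 100")
rows = cur.fetchall()

# 2. Queue each row as a Semantria document
for row_id, text in rows:
    session.queueDocument({"id": str(row_id), "text": text})

# 3. Poll for processed results and write them back to the same DB
time.sleep(10)   # crude wait; production code should poll or use callbacks
for doc in session.getProcessedDocuments():
    # "sentiment_score" is an assumption; check the fields the SDK returns
    cur.execute(
        "UPDATE feedback SET sentiment = %s WHERE id = %s",
        (doc["sentiment_score"], int(doc["id"])),
    )
db.commit()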

How to crawl entire Wikipedia?

I need a sitemap that can help people and Google discover the pages as well.
I've tried the WebSphinx application.
I realized that if I put wikipedia.org as the starting URL, it will not crawl further.
So how do I actually crawl the entire Wikipedia? Can anyone give me some guidelines? Do I need to go and find those URLs specifically and supply multiple starting URLs?
Does anyone have suggestions for a good website with a tutorial on using WebSphinx's API?
Crawling Wikipedia live is a bad idea; it is hundreds of TBs of data uncompressed. I would suggest working offline with the various dumps provided by Wikipedia. Find them here: https://dumps.wikimedia.org/
You can create a sitemap for Wikipedia using the page meta information, external links, interwiki links, and redirects databases, to name a few.
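As a concrete starting point, here is a small Python sketch that streams a pages-articles XML dump and prints sitemap-style URLs built from the page titles. The dump filename and the export namespace URI are assumptions; check the dump you actually download.

import bz2
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"         # assumption: dump file name
NS = "{http://www.mediawiki.org/xml/export-0.10/}"    # assumption: export schema version

def sitemap_urls(dump_path):
    with bz2.open(dump_path, "rb") as fh:
        # iterparse streams the dump, so it never has to fit in memory
        for _, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                if title:
                    yield "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
                elem.clear()   # free memory for pages already processed

for url in sitemap_urls(DUMP):
    print(url)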

Parse data with tika for apache solr

I have managed to get Apache Nutch to index a news website and pass the results off to Apache Solr.
I used this tutorial: https://github.com/renepickhardt/metalcon/wiki/simpleNutchSolrSetup (the only difference is that I decided to use Cassandra instead).
As a test I am trying to crawl CNN to extract the title of each article and the date it was published.
Question 1:
How do I parse the data from the webpage to extract the date and the title?
I have found this article about a plugin. It seems a bit outdated and I am not sure it still applies. I have also read that Tika can be used as well, but again most tutorials are quite old.
http://www.ryanpfister.com/2009/04/how-to-sort-by-date-with-nutch/
Another SO article is this: How to extend Nutch for article crawling. I would prefer to use Nutch only because that is what I started with, but I do not really have a strong preference.
Anything would be a great help.
Norconex HTTP Collector will store with your document all the metadata it can find, without restriction. That ranges from the HTTP header values obtained when downloading a page to all the tags in the HTML page itself.
That may well be too many fields for you. If so, you can reject the ones you do not want or, instead, be explicit about the ones you want to keep by adding a "KeepOnlyTagger" to the <importer> section of your configuration:
<tagger class="com.norconex.importer.tagger.impl.KeepOnlyTagger"
        fields="title,pubdate,anotherone,etc"/>
You'll find how to get started quickly along with configuration options here: http://www.norconex.com/product/collector-http/configuration.html
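Whichever crawler you end up with, the title/date extraction itself can be prototyped in a few lines of Python with the standard library; the meta tag names below are assumptions, so check the actual markup of the site you crawl.

from html.parser import HTMLParser

class ArticleMetaParser(HTMLParser):
    """Collects the <title> text and a publication-date meta tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.pubdate = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        # property names here are assumptions; news sites vary
        elif tag == "meta" and attrs.get("property") in ("article:published_time", "og:pubdate"):
            self.pubdate = attrs.get("content")

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

html = ("<html><head><title>Example story</title>"
        "<meta property='article:published_time' content='2014-05-01'></head></html>")
parser = ArticleMetaParser()
parser.feed(html)
print(parser.title, parser.pubdate)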

Index my own data in Solr

I am new to Solr and have a couple of questions for more experienced people:
I am able to get the example running; however, what exactly is start.jar?
I know that by running "java -jar start.jar" I can start Solr. But do I run this command after I index my own data, rather than the given sample data? If not, what should I do to run my own Solr instance with my own indexed data?
I need to index my own data, not related to the given Solr example at all. How exactly should I do it? Should I copy the example directory and then modify the fields in schema.xml? Should I then run post.sh to index the data, just as I did to set up the example?
Thanks a lot for your help!
Steps:
Decide what the document structure you store in Solr will be (somewhat like creating the schema of a relational DB for one table).
Remove the example core and create your own core with that schema.
Once the schema works with no errors (check the logs of the server that hosts the Solr app), you can start feeding your data into Solr. You POST it via HTTP in a specific structure which is documented in the Solr wiki. Various frameworks have classes to handle that; see the sketch below.
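For example, a minimal sketch of that HTTP POST in Python could look like this (the core name and field names are assumptions tied to whatever schema you define):

import json
import urllib.request

# assumption: a core named "mycore" with id/title/body fields in its schema
SOLR_UPDATE = "http://localhost:8983/solr/mycore/update?commit=true"

docs = [
    {"id": "1", "title": "First document", "body": "Some text to index"},
    {"id": "2", "title": "Second document", "body": "More text to index"},
]

req = urllib.request.Request(
    SOLR_UPDATE,
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))   # Solr replies with a status JSON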
Marked as Wiki as this is too broad an answer for someone who did not bother to RTFM...
Custom indexing is not a difficult task; I worked on it just a few days ago. First you need to write your document in XML, CSV, or JSON (formats supported by Solr) containing fields according to your schema.xml, then run the following command in example/exampledocs:
For a document mydoc.xml
./post.sh mydoc.xml
If the status value in the output is 0, the indexing was successful and you can search for your document in Solr.
Reference: http://www.solrtutorial.com/solr-in-5-minutes.html
Though the question is old, I am writing for new visitors with the same issue. The question can't be answered in a few words. You must understand what Solr is, what the Solr Admin UI is, and why you would use Solr instead of a relational database; then you can understand how to import sample data. I have recently published two articles, Solr Introduction and Importing Sample Data, which might be helpful for you:
http://www.devtrainings.com/2017/03/apache-solr-introduction-and-server.html
http://www.devtrainings.com/2017/03/apache-solr-index-data-and-run-search.html

Where are the specifications/XSDs for Amazon MWS feed XML processing reports?

Amazon provides a batch of documents describing the format of the feeds we can send via MWS; however, we also need to know what to expect in their responses: what status codes may be reported, what the structure of the XML is when errors are reported, and so on.
Where can I get this information?
The MWS XML schemata are documented within the Selling on Amazon Guide to XML linked from the Developer Guides section in the Amazon Marketplace Web Service (Amazon MWS) Documentation.
I'm omitting a direct link to the PDF, as it might change once in a while. For the same reason, the XSD files you are looking for are not publicly linked by Amazon either; rather, you'll find the links to the most current schema documents within the respective sections of the Selling on Amazon Guide to XML.
You might also be interested in the Amazon MWS Developer Guide, the Feeds API Reference and the guide for the Amazon MWS Scratchpad, which are all available there as well.
Good luck!
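To make the response side more concrete, here is a short Python sketch that walks a saved feed processing report and prints its status and result codes. The element names follow the usual ProcessingReport layout and are assumptions; verify them against the XSDs once you have them.

import xml.etree.ElementTree as ET

# report.xml: a saved feed submission result (processing report)
tree = ET.parse("report.xml")

for report in tree.iter("ProcessingReport"):
    print("Status:", report.findtext("StatusCode"))
    summary = report.find("ProcessingSummary")
    if summary is not None:
        print("  successful:", summary.findtext("MessagesSuccessful"))
        print("  with error:", summary.findtext("MessagesWithError"))
    for result in report.iter("Result"):
        print("  {}: {} / {}".format(
            result.findtext("ResultCode"),
            result.findtext("ResultMessageCode"),
            result.findtext("ResultDescription"),
        ))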
I know this is a rather old question but I just wanted to look at the actual XML schema files myself today.
There is an XML Documentation PDF hosted on images-na.ssl-images-amazon.com which I assume will stay there for a while. This PDF contains links to the core schema files amzn-envelope.xsd, amzn-header.xsd, and amzn-base.xsd and some other API schemas like Product.xsd which all appear to be relative to https://images-na.ssl-images-amazon.com/images/G/01/rainier/help/xsd/release_1_9/.
The PDF explicitly states that "The XSD samples shown on the Help pages may not reflect the latest XSDs. We recommend using the provided XSD links to obtain the latest versions."
However, the official MWS Feeds API documentation also links to some XSDs but these are relative to https://images-na.ssl-images-amazon.com/images/G/01/rainier/help/xsd/release_4_1/ now, e.g. Price.xsd. Schema references also seem to be relative to this path. For example, Price.xsd includes amzn-base.xsd via <xsd:include schemaLocation="amzn-base.xsd"/> and sure enough there it is.
Unfortunately, I have no idea whether release_4_1 is the latest release of the schemas but the link from the MWS API documentation is a good indicator to me.
Another way to get the XSDs, which I think is the most "official" way, is to go to your Seller Central and navigate to Help > XML & data exchange > Reference > XSDs.
There you can download all the XSDs available to your account.
Hope it helps!
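Once you have the XSDs downloaded (from either source above), a quick sanity check is to validate your feeds against them locally. A minimal sketch, assuming lxml is installed and that amzn-envelope.xsd and the files it includes sit in one directory, as the relative schemaLocation references suggest:

from lxml import etree   # assumption: lxml is installed (pip install lxml)

# amzn-envelope.xsd pulls in amzn-header.xsd and amzn-base.xsd via relative
# schemaLocation includes, so keep all downloaded XSDs together
schema = etree.XMLSchema(etree.parse("amzn-envelope.xsd"))

feed = etree.parse("my_price_feed.xml")   # hypothetical feed file
if schema.validate(feed):
    print("feed is valid")
else:
    for error in schema.error_log:
        print(error.line, error.message)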
It seems that these XSD files are outdated.
I just checked the official Seller Central help page for the XSD files, https://sellercentral-europe.amazon.com/gp/help/G1611, and for the OrderReport it still references release_4_1.
Some time ago Amazon added a new field to the OrderReport for EU markets. The new field is IsSoldByAB.
I have been using the XSD files for many years for automatic code generation, and it fails from time to time because of new fields like this one. The field is not described in either of these XSD releases:
release_1_9 ($Revision: #7 $, $Date: 2006/05/23 $)
release_4_1 ($Revision: #10 $, $Date: 2007/09/06 $)
and I have not been able to find a version that includes it.
For some years now I have extended the XSD files on my own to generate my code. IsSoldByAB is just a boolean field like IsPrime or IsBusinessOrder, so this was an easy task, but it is not "official"...
