How can I publish my dataset on Google Dataset Search?


I have prepared a dataset and want to publish it on Google Dataset Search, but I couldn't find any procedure for doing this online.
A similar question on Google support, "How can I post my doctoral lab's datasets on Google Dataset Search?", has answers, but the links they give are not very useful.
Can

The easiest way to make your dataset eligible to be included in Dataset Search results is to upload it to a repository that adheres to metadata standards used by Dataset Search. There are many such repositories. You can try the following:
https://zenodo.org
https://figshare.com
https://dataverse.harvard.edu/
https://www.kaggle.com/datasets
If you don't want to use an external repository you will need to embed metadata in the webpage you host that describes the datasets. More information on this here.
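
As a rough illustration of the self-hosting route, the sketch below builds a schema.org `Dataset` description as JSON-LD and wraps it in the `<script>` element that would go into the page describing the dataset. It assumes the metadata meant above is schema.org `Dataset` markup; the dataset name, URL, license and creator are hypothetical placeholders, and the helper functions are made up for this example.

```python
import json


def dataset_jsonld(name, description, url, license_url=None, creator=None):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are only added when a value is supplied.
    if license_url:
        doc["license"] = license_url
    if creator:
        doc["creator"] = {"@type": "Person", "name": creator}
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD in the script element to paste into the page's HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical values, for illustration only.
example = dataset_jsonld(
    name="Example lab measurements",
    description="Sensor readings collected for a thesis project.",
    url="https://example.org/datasets/lab-measurements",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator="Jane Doe",
)
print(as_script_tag(example))
```

The printed block goes into the HTML of the page that describes the dataset, so that a crawler can read the metadata from it.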

Related

exporting the experiment/simulation screen with PLE possible?

I built a model for my thesis and would like to share it with my professor via the AnyLogic Cloud. Unfortunately, when I try to export the model it shows:
"Model parameter values overridden by experiment 'Simulation' will not be used; default values will be exported."
and the simulation screen does not appear in the cloud version. Is this because I use the PLE? If not, how can I improve my export?
I have already tried the documentation and various Google searches, but I was not able to find anything useful.
Thank you in advance! :)

There is no simulation screen in the cloud. This is independent of your version.
To learn how to create simulation setups in the cloud, check the "Run configuration" part of your model before uploading it. Also check the help on that topic and the example models (those uploaded to the cloud by AnyLogic, which you will also find in the AnyLogic example library).

How to do FTS within Google Cloud Platform

Does Google Cloud Platform have a product to do full-text search via an API with non-web data (such as json or xml documents)? This may seem like a pretty silly question, but the only options I have come across are:
Search inside of Google App Engine (only available for python2, not python3) -- https://cloud.google.com/appengine/training/fts_intro/.
Related to web search only: https://developers.google.com/custom-search/docs/tutorial/introduction
Using a managed Elasticsearch: https://console.cloud.google.com/marketplace/details/google/elasticsearch.
Cloud Firestore explicitly states it doesn't offer that and suggests using Algolia (and gives details on integrating): https://cloud.google.com/firestore/docs/solutions/search
Is there something I'm missing? I'm basically looking to index and search about a million documents in a sort of free-form type of search. Is this offered as a product from Google outside of App Engine? If so, how can I access it?

You have pretty much covered it there. There is currently no specific Google service for full-text search. As you mentioned, the App Engine Search API is available for Python 2.7, which will stop being maintained after January 2020, and not for Python 3.
There is one more option you could consider, which is using Lucene for GAE. I found this blog where several possibilities are studied; it might be an interesting read for you.
To sum up, I would recommend Elasticsearch or Algolia, but for the latter you need a Firebase project.

Using Azure Search for PDFs in Azure Blob Storage

We are trying to enable full-text search. Our application stores PDF files in Azure Blob Storage, which is the data source for Azure Search. Most of this works fine; however, the Indexer is not able to extract text from a couple of PDFs. Are there specific kinds of PDFs that the Azure Search Indexer can extract? If yes, what are they?
Any information or help in this regard is greatly appreciated.

Azure Search can extract all text from PDF text elements. Extracting text from embedded images (which requires OCR) or tables is not yet integrated in Azure Search, but it is on the roadmap.
If your PDFs contain images and you want to extract text from those as well, then you can try following the steps here.

Are there any specific kinds of PDFs that Azure Search Indexer can extract?
Based on my experience, there are no specific kinds of PDFs that the Azure Search Indexer can't extract. From your description, I assume you are hitting an Azure Search limit. For more detailed information, please refer to Indexing Documents in Azure Blob Storage with Azure Search.
Azure Search limits how much text it extracts depending on the pricing tier: 32,000 characters for Free tier, 64,000 for Basic, and 4 million for Standard, Standard S2 and Standard S3 tiers. A warning is included in the indexer status response for truncated documents.
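
A quick way to test this theory is to compare the length of the text you expect from a PDF against the limit for your tier. The sketch below hard-codes only the figures quoted above; the tier keys and the `would_truncate` helper are made up for this example and are not part of any Azure SDK.

```python
# Character limits per pricing tier, as quoted in the answer above.
# The dictionary keys are labels chosen for this sketch, not official tier IDs.
EXTRACTION_LIMITS = {
    "free": 32_000,
    "basic": 64_000,
    "standard": 4_000_000,  # Standard, Standard S2 and Standard S3
}


def would_truncate(text, tier):
    """Return True if the text is longer than the tier's extraction limit."""
    return len(text) > EXTRACTION_LIMITS[tier]


# Hypothetical document text of 40,000 characters.
sample_text = "x" * 40_000
for tier in EXTRACTION_LIMITS:
    print(tier, would_truncate(sample_text, tier))
```

If the extracted text of a problem PDF exceeds the limit for your tier, truncation is the likely cause, and the indexer status response should contain the corresponding warning.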

I recently wrote a blog post about my experience with this. I ended up using a Python-based script running in a Docker container within Azure. It is somewhat complicated, but the blog lays it out pretty clearly (and the results have been very good as far as OCR/searchability goes).
http://martyice.github.io/docker-in-azure/

Developing a web directory search engine for enterprise information: what's better, a database or files?

I want to develop a web app for storing information about enterprises, so that it can be searched by keyword and by category, but mainly by keyword, because the interface is going to be as simple as Google's. My question is: is it better to store this information in a database or in text files?

If you want full text search, probably neither. You should look into a search index such as Elasticsearch (http://www.elasticsearch.org/overview/). A search index stores data in a way that is optimized for searching.
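
To give a feel for what "optimized for searching" means, here is a minimal sketch of an inverted index, the core data structure behind such search engines: each keyword maps to the set of documents that contain it. This is a toy illustration with made-up company records, not how Elasticsearch is implemented or used.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical enterprise records.
docs = {
    1: "Acme Bakery, fresh bread and pastries",
    2: "Acme Plumbing, pipe repair",
    3: "Sunrise Bakery, bread and coffee",
}
index = build_index(docs)
print(search(index, "bakery bread"))
```

A keyword lookup touches only the entries for the query terms instead of scanning every record, which is why a search index scales better than scanning text files or running `LIKE` queries over a table.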

Where are the specifications/XSDs for Amazon MWS feed XML processing reports?

Amazon provides a batch of documents describing the format of the feeds we can send via MWS. However, we also need to know what to expect in the responses: which status codes may be reported, what the XML looks like when errors are reported, and so on.
Where can I get this information?

The MWS XML schemata are documented within the Selling on Amazon Guide to XML linked from the Developer Guides section in the Amazon Marketplace Web Service (Amazon MWS) Documentation.
I'm omitting a direct link to the PDF, as it might change once in a while. For the same reason, the XSD files you are looking for are not publicly linked by Amazon either; instead, you'll find links to the most current schema documents within the respective sections of the Selling on Amazon Guide to XML.
You might also be interested in the Amazon MWS Developer Guide, the Feeds API Reference and the guide for the Amazon MWS Scratchpad, which are all available there as well.
Good luck!

I know this is a rather old question but I just wanted to look at the actual XML schema files myself today.
There is an XML Documentation PDF hosted on images-na.ssl-images-amazon.com which I assume will stay there for a while. This PDF contains links to the core schema files amzn-envelope.xsd, amzn-header.xsd, and amzn-base.xsd and some other API schemas like Product.xsd which all appear to be relative to https://images-na.ssl-images-amazon.com/images/G/01/rainier/help/xsd/release_1_9/.
The PDF explicitly states that
The XSD samples shown on the Help pages may not reflect the latest XSDs. We recommend using the provided XSD links to obtain the latest [ve]rsions.
However, the official MWS Feeds API documentation also links to some XSDs but these are relative to https://images-na.ssl-images-amazon.com/images/G/01/rainier/help/xsd/release_4_1/ now, e.g. Price.xsd. Schema references also seem to be relative to this path. For example, Price.xsd includes amzn-base.xsd via <xsd:include schemaLocation="amzn-base.xsd"/> and sure enough there it is.
Unfortunately, I have no idea whether release_4_1 is the latest release of the schemas but the link from the MWS API documentation is a good indicator to me.
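
To see which schema files a given XSD pulls in, the relative `schemaLocation` values can be resolved against the URL the XSD was loaded from. The sketch below does this with the Python standard library; it parses a small inline sample rather than downloading anything, and the sample content is only an assumption modeled on the `xsd:include` line quoted above, not the actual content of Price.xsd.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"


def included_schema_urls(xsd_text, base_url):
    """Resolve every xsd:include / xsd:import schemaLocation against base_url."""
    root = ET.fromstring(xsd_text)
    urls = []
    for tag in ("include", "import"):
        for element in root.iter(XSD_NS + tag):
            location = element.get("schemaLocation")
            if location:
                urls.append(urljoin(base_url, location))
    return urls


# Inline stand-in for a downloaded schema (assumed content, for illustration).
SAMPLE_XSD = (
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:include schemaLocation="amzn-base.xsd"/>'
    "</xsd:schema>"
)
BASE_URL = (
    "https://images-na.ssl-images-amazon.com/images/G/01/rainier/help/xsd/"
    "release_4_1/Price.xsd"
)
print(included_schema_urls(SAMPLE_XSD, BASE_URL))
```

Running the same function over each resolved URL in turn would collect the full set of schemas belonging to one release directory.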

Another way to get the XSDs, which I think is the most "official" one, is to go to your Seller Central and navigate to Help > XML & data exchange > Reference > XSDs.
There you can download all the XSDs available to your account.
Hope it helps!

It seems that these XSD files are outdated.
I just checked the official Seller Central help page for the XSD files: https://sellercentral-europe.amazon.com/gp/help/G1611
For the OrderReport, release_4_1 is still referenced.
Some time ago Amazon added a new field, IsSoldByAB, to OrderReport for EU markets.
I have been using the XSD files for many years for automatic code generation, and this fails from time to time because of new fields like this one. The field is not described in either of these XSD releases:
release_1_9 ($Revision: #7 $, $Date: 2006/05/23 $)
release_4_1 ($Revision: #10 $, $Date: 2007/09/06 $)
and I have not been able to find a version that includes it.
For some years now I have extended the XSD file myself to generate my code. IsSoldByAB is just a boolean field like IsPrime or IsBusinessOrder, so this was an easy task, but not "official"...
