Errors with Watson Visual Recognition, training not possible

I am trying to train two models on Watson VR. One is for recognizing objects (details) within a picture; the other is for estimating an object's class.
I have been able to prepare the object classes for both models.
However, it seems I have multiple issues with training, and I am now stuck. I found a similar post on Stack Overflow, but it relates to data size and type; my data are all in .jpg format, and the whole dataset is below 250 MB.
Classifier:
The classifier is the one giving me the most issues.
First, I tried to train the model, but the server went down. The next day I found the model "trained" but with errors, so I basically started over by preparing the classes again.
All classes have at least 10-12 pictures (10 is the minimum required). When I click on "Train Model" I receive the following error:
In the dashboard I am given an explanation of the failed training:
The data size was originally about 241/250 MB; now it is 18.4/250 MB. I am not sure what caused the change.
Thank you for the help!

Thanks for providing the screenshots, that is very helpful!
It says your "DrinksClassifier" is in a failed state. It's best to delete that collection from Studio, and start over. Make sure you have at least 10 examples of each class... the lower screenshot seems to show it didn't find any examples for "AgedCoffee".

Related

Train an object detection model using dataset from labeling job

I've created a public labeling job so people could help me label objects in 50+ images using 8 different classes.
This job is finished, but I'm still unable to run the training job I've created.
Here's how the job is set up:
Algorithm: built-in object detection
Input data configuration:
Data source: S3
URI: the manifest URL generated by the labeling job on S3
I'm getting this error message: "Missing image files in train channel".
Shouldn't it get the images path from the manifest?
What am I missing?
It's a little difficult to diagnose without some additional information (basically the entire request you're making to the CreateTrainingJob API or, if you're using the AWS SageMaker console, the training job definition).
This is probably the most relevant resource as a starting point: https://aws.amazon.com/blogs/machine-learning/easily-train-models-using-datasets-labeled-by-amazon-sagemaker-ground-truth/. You might also find it worthwhile to read through https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html.
It could be any number of things, but my hunch is that it has something to do with the configuration of the training job you're trying to create, e.g., the S3DataType has to be "AugmentedManifestFile", the InputMode has to be "Pipe", etc. All of these are described in the links above.
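For reference, here is a rough sketch of what the relevant request might look like through boto3. The bucket, manifest path, role ARN, and attribute names are all hypothetical placeholders; the attribute names in particular have to match the keys in your labeling job's output manifest:

    import boto3

    sm = boto3.client("sagemaker")

    sm.create_training_job(
        TrainingJobName="object-detection-from-ground-truth",
        AlgorithmSpecification={
            # Region-specific URI of the built-in object detection image
            "TrainingImage": "<object-detection-image-uri>",
            # Augmented manifests require Pipe mode
            "TrainingInputMode": "Pipe",
        },
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        InputDataConfig=[
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        # Must be "AugmentedManifestFile", not "S3Prefix"
                        "S3DataType": "AugmentedManifestFile",
                        "S3Uri": "s3://my-bucket/labeling-output/output.manifest",
                        # Image reference plus the label attribute name
                        # from the labeling job (both placeholders here)
                        "AttributeNames": ["source-ref", "my-labeling-job"],
                    },
                },
                "RecordWrapperType": "RecordIO",
                "ContentType": "application/x-recordio",
            },
            # A "validation" channel is normally configured the same way
        ],
        OutputDataConfig={"S3OutputPath": "s3://my-bucket/training-output/"},
        ResourceConfig={
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 86400},
        # HyperParameters (num_classes, epochs, etc.) omitted for brevity
    )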

Tool to export an entire JIRA project in a readable text format quickly

I am the lead developer on a project for a 'difficult' client. I will try not to bore anybody with the details, but here is the issue I am facing.
Our client has a team of QA testers who are managing their project through JIRA. We currently have a fixed-bid contract with them to supply the software they requested at a fixed price; any additional features or pre-existing issues will be covered under time and materials.
They have taken the time to raise every defect in the system unrelated to the current fixed-bid work and have tried to get them resolved for free. Each time, we have come to an agreement through JIRA comments that the item is a pre-existing issue or a new feature and will have to be paid for after the project has been completed, which they have agreed to.
The issue is that this client has a history of forgetting conversations and email threads that don't benefit them, which wastes a lot of time on our side digging up proof that we agreed to handle a situation a specific way.
The project will not be complete for several more weeks, but as soon as it is, I will likely be removed from the JIRA project by their administrator. They will then begin asking again for us to complete all this additional work at no cost, and I will lose access to the comments on each issue explaining that it will not be free and recording their agreement.
I am currently exporting each ticket after it closes, but this wastes about 30-40 minutes a day, so I would be interested to know if there is a tool that can export an entire JIRA project to a readable text format, which I could run once near the project's end.
TL;DR: Is there a tool that will allow me to export an entire JIRA project in a readable text format before I lose access to the project and all the information included within it?
Export as CSV doesn't include comments and is limited to 1000 issues by default.
I have used the jira-python library to retrieve all issues, all fields, and all comments from a single project. It missed the attachments, though.
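Roughly, the approach looks like the sketch below; the server URL, credentials, and project key are placeholders:

    from jira import JIRA  # pip install jira

    # Placeholders: point these at your own instance and project.
    jira = JIRA(server="https://jira.example.com",
                basic_auth=("username", "password-or-api-token"))

    with open("project_export.txt", "w", encoding="utf-8") as out:
        start = 0
        while True:
            # Page through the project 100 issues at a time.
            issues = jira.search_issues("project = PROJ ORDER BY key",
                                        startAt=start, maxResults=100)
            if not issues:
                break
            for issue in issues:
                out.write(f"{issue.key}: {issue.fields.summary}\n")
                out.write(f"Status: {issue.fields.status}\n")
                out.write(f"{issue.fields.description or ''}\n")
                # Comments are the part the CSV export drops.
                for comment in jira.comments(issue):
                    out.write(f"  [{comment.author.displayName} {comment.created}] "
                              f"{comment.body}\n")
                out.write("-" * 60 + "\n")
            start += len(issues)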
But what you have is a people problem more than a technical problem. Good luck!
Large exports (e.g. many hundreds of issues) are not recommended.
To change the number of issues that are exported, change the value of the tempMax parameter in the URL.
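For example, the export URL that the Export menu generates looks something like this (host and JQL are placeholders); raising tempMax lifts the default 1000-issue cap:

    https://jira.example.com/sr/jira.issueviews:searchrequest-excel-all-fields/temp/SearchRequest.xls?jqlQuery=project%3DPROJ&tempMax=5000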
To export search results to Microsoft Excel:
Choose Issues > Search for Issues.
Refine your search, as described in Searching for Issues, then choose the Export menu.
Choose one of the following from the dropdown menu:
'Excel (All fields)' — this will create a spreadsheet column for every issue field (excluding comments).
Note: This will only show the custom fields that are available for all of the issues in the search results. For example, if a field is only available for one project and multiple projects are in the search results then that field will not appear in the Excel document. The same goes for fields that are only available for certain issue types.
'Excel (Current fields)' — this will create a spreadsheet column for the issue fields that are currently displayed in your Issue Navigator.
An .xls file will be created. Edit this file using Microsoft Excel and/or save it as required.

How is ElasticSearch supposed to work in CakePHP 3?

I've been trying my best not to ask questions here on Stack Overflow, but I have been stuck on this problem for almost a week and couldn't find any solution.
I already have my working website built with CakePHP 3.2. What the website basically does is scrape Twitter for tweets containing a given search term, check if each tweet is already in my database, and store it if it doesn't yet exist. Twitter's JSON response has a "tweet_id" property, and I've been using that value to check whether I should ignore or append a specific tweet to my DB. While this might be okay while my database is small, I suspect it's going to slow things down considerably as my tables grow. Thus my need for ElasticSearch.
My ElasticSearch server is running on my Arch Linux install, and I've configured my app to point to said server. Also, I have my "Type" object named the same way as my "Tweets" table (I followed the documentation up to the overview part: http://book.cakephp.org/3.0/en/elasticsearch.html). This throws an "Unknown method 'alias'" error, and Google searches led me to create an alternate pagination class, since that was what some found to be the cause of the error (https://github.com/lorenzo/audit-stash/issues/4), but this still doesn't fix things.
I'm not sure if I got this right. I installed the ElasticSearch plugin with the assumption that all I have to do is name the Types the same as my tables, since to me the documentation implies that this should be done on top of the Blog Tutorial to "improve query performance".
TL;DR: How is this supposed to work? Is my assumption above right? Do I name the Types differently and index everything myself? I'm not sure if there's just too much automagic, or I'm just poor at this sort of thing. And yes, I'm new to frameworks (but not to PHP, among other languages).
Thanks in advance!

App Engine backup never finishes only clue is failure in map reduce worker_callback

Over the last few weeks we have repeatedly failed to complete a backup of the datastore using the Datastore Admin tool. We thought the issues had to do with quota errors we were running into, so we switched our application from a free to a paid app, but we still have problems.
Each time, we attempt to back up to the Blobstore, and the process never finishes. We see the backup in our Pending Backups list, but it never actually completes. We only have a total of 43 MB of data right now, so we don't see it as a data-transfer problem. Looking at our default task queues, we have two pending tasks: one is a call to /_ah/mapreduce/controller_callback and the other is a call to /_ah/mapreduce/worker_callback.
The worker_callback racks up its retry count, and the only error clue we have is that the Previous Run tab shows the last HTTP response code to be 500. There is no error message, and nothing shows up in our error logs; it just keeps trying over and over again.
We've been able to narrow the backup problems down to a specific entity kind in a particular namespace, but we can't figure out why that entity kind is failing while the others are not. The major difference is that this entity kind has a large number of embedded entities, but if App Engine is able to read/put those entities, we can't understand why it has problems backing them up. The namespace the error occurs in has the largest amount of data stored for that entity kind compared to the other namespaces we have set up.
We think that if we could see what error is occurring in the worker_callback, we might be able to figure out why the backup is failing, or what is wrong with our data that is preventing the backup. Is there something we need to set up or enable through settings or configuration files to get more detailed information on the backup? Or is there some other avenue we should explore to investigate and fix this problem?
I should mention we are using the Java SDK, as well as Objectify V3, to work with the datastore, and we are backing the data up to the Blobstore.
Thank you.
Well, with the App Engine team's help, we figured out what the problem was and worked around the issue. I want to give the details in case anyone else runs into this problem.
In issue 8363, the App Engine team indicated that their logs showed the MapReduce failing because of the large number of properties our entity kind had. The specific entity kind causing the failure had a large number of variable properties that generated errors when MapReduce tried to write out a schema. They indicated that the solution on their end was to have the backup ignore entities like this so it could complete successfully.
What we did to work around the issue and make the backup work was to change how we told Objectify to store our data. The large number of properties was being created by our use of the @Embedded annotation on a HashMap member field. Since @Embedded breaks classes down into individual components, it was generating a large number of properties. We switched the member field to @Serialized and then ran a conversion process to migrate it to the new serialized property. This made backup/restore work again.
You can read more about the differences between embedded and serialized on Objectify's website.
snielson, would you mind opening an issue on our public issue tracker here? Remember to add your application ID so we can further debug this specific scenario.
Thanks!

Need ideas on retrieving data from a website

I'm stumped and need some ideas on how to do this or even whether it can be done at all.
I have a client who would like to build a website tailored to English-speaking travelers in a specific country (Thailand, in this case). The different modes of transportation (bus and train) have good websites providing their respective information, and both are very static in terms of the data they present (the schedules rarely change). Here's one of the sites I would need to get info from: train schedules. The client wants to give users the ability to search for a beginning and end location and determine, using the external websites' information, how they can best get there, presenting a route with schedule times for the chosen modes of transport.
Now, in my limited experience, I would think the way to do that would be to retrieve the original schedule info from the external sites' servers (via an API or some other means) and retain it in a database, which can be queried as needed. Our first thought was to contact the respective authorities to determine how/if this can be done, but this has proven problematic, mainly due to the language barrier.
My client suggested what is basically "screen scraping", but that sounds like it would be complicated at best: downloading the web page(s) and filtering through the HTML for the relevant data to put into the database. My worry is that the info on these mainly static sites is so static that the data isn't even kept in a database to build the page, and the web page itself is updated (hard-coded) when something changes.
I could really use some help and suggestions here. Thanks!
Screen scraping is always problematic IMO, as you are at the mercy of the person who wrote the page. If the content is static, then I think it would be easier to copy the data manually into your database. If you wanted to keep up to date with changes, you could snapshot the page when you transcribe the info and run a job that periodically checks whether the page has changed from the snapshot. When it does, it sends you an email so you can update the data; a minimal sketch of that checker follows below.
The above method could also be used in conjunction with some sort of screen scraper, which could fall back to a manual process if the page changes too drastically.
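Something along these lines, as a rough sketch; the URL, snapshot path, and mail settings are placeholders, and hashing the whole page means any change at all (even rotating page furniture) will trigger the email:

    import hashlib
    import smtplib
    import urllib.request
    from email.message import EmailMessage

    # Placeholders: the page to watch and where to keep the snapshot hash.
    PAGE_URL = "http://www.example.com/train-schedule"
    SNAPSHOT_FILE = "schedule.sha256"

    def page_hash(url):
        """Fetch the page and return a SHA-256 hash of its raw HTML."""
        html = urllib.request.urlopen(url).read()
        return hashlib.sha256(html).hexdigest()

    def notify(subject, body):
        """Send a plain-text email; assumes a local SMTP relay."""
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = "monitor@example.com"
        msg["To"] = "you@example.com"
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    current = page_hash(PAGE_URL)
    try:
        with open(SNAPSHOT_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = None  # first run: nothing to compare against yet

    if previous is not None and current != previous:
        notify("Schedule page changed",
               f"{PAGE_URL} no longer matches the stored snapshot.")

    # Record the latest hash; run this script daily from cron or similar.
    with open(SNAPSHOT_FILE, "w") as f:
        f.write(current)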
Ultimately, it is a question of how much effort (cost) your client is willing to bear for accuracy.
I have done this for the following site: http://www.buscatchers.com/, so it's definitely more than doable! A key feature of a web-scraping solution for travel sites is that it must send you emails if anything goes wrong during the scraping process. On the site, I use a two-day window so that I have two days to fix the code if the design changes. Only once or twice have I had to change my code, and it's very easy to do.
As for examples: there is some simplified source code here: http://www.buscatchers.com/about/guide. The full source code for the project is here: https://github.com/nicodjimenez/bus_catchers. This should give you some ideas on how to get started.
I can tell that the page is generated dynamically from data; it's too well structured. It's not hard for someone who is familiar with XPath to scrape this site.
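For instance, a hedged sketch of the kind of XPath-based scrape I mean; the URL, table id, and column layout are hypothetical, so you would inspect the real page's structure first (e.g., with the browser's developer tools):

    import requests            # pip install requests lxml
    from lxml import html

    # Hypothetical URL; substitute the real schedule page.
    page = requests.get("http://www.example.com/train-schedule", timeout=30)
    tree = html.fromstring(page.content)

    # Suppose each schedule entry is a row in a table with id="schedule":
    # pull the text out of every cell in each row.
    for row in tree.xpath("//table[@id='schedule']//tr"):
        cells = [cell.text_content().strip() for cell in row.xpath("./td")]
        if cells:
            print(cells)  # e.g. ['Bangkok', 'Chiang Mai', '08:30', '19:30']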
