Differences in volume of audio content - alexa

We are having a Skill built for us which plays podcasts and audio snippets of videos. We currently serve all of this content through other traditional platforms like native mobile apps and a website.
One problem we have is that the volume of all this audio content is much lower than the rest of the sounds outputted from the Alexa device. We don't notice any similar discrepancies on the other aforementioned platforms, and the developers building the Skill say that there's no API which allows you to boost or manipulate the output volume (not the system volume).
Has anyone else had experiences with this sort of issue? We are reluctant to pump up the volume of our source files as it will affect all the other places they are listened to.

Short answer - yes it's tricky and you're not alone.
As detailed in this BBC report* on designing a voice application - "All of our experts have experienced problems in audio levels when mixing and mastering audio for smart speakers."
Official guidence from Amazon is:
Program loudness for Alexa should average -14 dB LUFS/LKFS
The true-peak value should not exceed -2 dBFS
It then goes so far as to say that,
Your skill may be rejected if program loudness:
Is lower than -19 dB LUFS
Is higher than -9dB LUFS
However, this mainly seems to apply to audio played either as part of a Flash Briefing or using the AudioPlayer Interface, and you may get away with deviating from this if using it as part of SSML output.
That said, in practice developers tend to bump the levels up on audio until it sounds good. Sticking strictly to the recommendations usually renders the volume way too low.
So, if you didn't want to increase the loudness of clips as it'd affect other platforms, your best options might be to create an Alexa-version of the audio - mixed slightly louder. Though, I recognize that this would be tedious.
*In the interest of transparency, I wanted to declare that I fed into this report.

Related

Is there a way to assign dB value to a specific audio files for reproduction in speakers?

Most speakers would have a volume knob, but is there a way to modify it for each audio file? My biggest concern is the speaker integrity as to not send too loud of a sound. I can't find a way to get the speakers data so it can be decided if it's safe to play it or not, as I intend to make it work in every speaker. It's a broad question but I dont even know if it's possible so I dont know where to start.
Audio files should ideally be at the maximum level that does not distort. I could see engaging in preprocessing all your recordings so that they're at a consistent sound level. That might be sufficient to account for the concern you are raising about client speaker integrity.
If you want to have the database store a number that reflects how "hot" a particular audio file is, then it seems to me that the amount of effort required to get that number might be comparable or even greater than the effort to preprocess the sounds to a consistent max-with-no-overflow level.
Once you have that number, how to make use of it? The HTML <audio> tag does not have an amplitude property. So, you would have to make use of some library in your client, such as the Web Audio API, or one of the many libraries that are built on top of it. With Web Audio API, the relevant point to change the volume is the gain node.

Libraries for Face Detection

I need to develop a mobile app (primarily for Android, iOS, and Windows Mobile) for face detection. Obviously, OpenCV is the most well known. However, I'm unsure about the compatibility among the different OS'es. Besies OpenCV, are there other options? 2 key requirements:
-Open source/commercial libraries but must run locally/natively in devices without internet connection so Player Service API would not work
-Capable of tracking multiple faces in motion
Anyone can share their experiences/knowledge in this area? Any pointers greatly appreciated!
You are really pushing the margin a whole lot.
Face detection generally consists of three different areas.
1) Recognizing a face as a face (there is a mouth, a nose, eyes) This is useful for focusing a snapshot.
2) Recognizing facial features, looking for emotion (mouth in a smile) or eye tracking.
3) Facial recognition. Using the system to perform identification by attaching a name to a face.
You want to use a face recognition tool to perform tracking and count people entering a particular place, using a mobile phone.
First tracking is pretty difficult. Its one thing to perform simple face identity in a single frame snap shot. That's pretty easy. The problem is, you may find your frame rates so poor that you can only accommodate 1 frame every three or even every five seconds. That will make it nearly impossible to track and count faces. Counting faces is easy, but what's hard is to determine if that face in the screen was counted previously or is a new person entering the screen.
OpenCV has a whole lot of tools and examples out there for facial recognition, image tracking, etc. I'd strongly recommend playing with OpenCV and test its capabilities. I'd recommend the C/C++ versions (unless you are already a Python programmer) Here's a place to start, a blog entry I wrote a few months ago.
I really like the tutorials from Kyle Hounslow... Look him up on youtube. His videos are well thought out, they are interesting and he provides example code for all his work. Go ahead and watch all of those videos, and repeat all of those examples. Get a feel for what is available in frame rates using a laptop.
The next part of your task is porting stuff from OpenCV to Android/iOS. That's no easy task. I'm sure folks have tried, and I'm sure helpful hints are out there.
I don't mean to dissuade you from an awesome investigation but do note what you want to do is mighty difficult. You will have to invest some time to even determine where all the difficult areas are. And unfortunately you won't know effective frame rates and performance until you build some stuff and try it.
Good luck with the journey.

Comparing Flash, HTML, Silverlight, X3D and Unity 3d

I have to prepare a comparison between the following technologies to present it to my Project Manager, but I fell that I'm lost, so if any one can help I will be thankful
I want to compare between them in the following areas:
the support of online video streaming
the budget of using each one
Learning Time will be needed to learn the technology
Which one is the standard and will target a lot of users
The support if I found any problem
Bugs and security issues
connection to DB, SOA and web services
supporting of multi player
The support of online video streaming
Some of the X3D viewers support video streaming (and some even 3D streaming, for things such as augmented reality).
Which one is the standard and will target a lot of users
X3D is a standardized format, such as JPEG with multiple companies being able to manipulate such data and is even officially recommended by HTML5 specs whereas Unity ties you to a single company. Even if most X3D viewers are plugin-based like Flash, there exists also native implementations such as X3DOM to display/interact with X3D files for any browser that supports WebGL.
Connection to DB, SOA and web services
I would usually recommend using a webservice for interfacing with a DB, and yes, X3D can interact with webservices (XML, JSON). There is even a standard binary format that is fast to transfer and parse large contents faster.
Supporting of multi player
Some X3D-supporting providers offer a multiusers service, such as Bitmanagement's BS Collaborate server, but I've seen people using Darkstar/RedDwarf to make multiusers 3D environments as well.
the support of online video streaming
Unity 3D does not support video streaming, unless done through textures, which will give you a really slow frame rate.
I don't know for sure about X3D, but I would doubt it was really made for such tasks.
Silver light has good video support, it should be easy to stream with.
HTML only supports streaming video if using HTML 5, for which it gives the best user experience when user's browser supports it.
Flash is the de-facto for video streaming. It is extensively widespread. They use it for YouTube for example.
the budget of using each one
The cheapest of them all is HTML, it is free. Then you can theoretically set up something for free in flash using Flex SDK and server streaming technology such as Red5 (both open source and free). After that, I believe that all others would probably be on par cost wise, Unity3D coming in as the cheapest of the paid alternatives.
Learning Time will be needed to learn the technology
Listed in order of fastest one to learn to slowest (assuming no prior experience in any):
HTML
Flash/Silverlight
Unity3D
X3D
Which one is the standard and will target a lot of users
Flash is the most widespread. Its only competitor would be HTML 5, as new browsers tend to support it and its the only possible option on iOS. On the other hand, if 3D is what you want, then Unity3D is the standard for now, might be followed by HTML 5 in the future.
The support if I found any problem
Well, Unity3D would offer you good paid support, flash and silver light also (but only when you pay for streaming server licenses). HTML, X3D will not give you any support, but you can find a lot of information on the internet. There is also extensive information about Flash and Silverlight on the internet, but mostly Flash.
Bugs and security issues
All are pretty secure, I'm just not sure about X3D, but all others are comparable in term of security or bug issues.
connection to DB, SOA and web services
Easy to do with HTML, Flash and Silverlight. Harder with Unity3D, and hardest with X3D.
supporting of multi player
Multi-player what? If you are making a game, then clearly I would say your real options are Unity3D if the game is to be in 3D, Flash if it is to be done in 2D. Check out SmartFoxServer for easy multiplayer server.
My 2 cents:
the support of online video streaming:
Some X3D players do support it. Unity does in some ways : http://unity3d.com/unity/features/audio-and-video
the budget of using each one:
X3D and Unity3d are free. You can pay for Unity licenses for extra features and platforms like iOS and Android. If you need to write plugins for Unity, you'll need the $1500 license. There are no costs for distribution of Unity products.
Learning Time will be needed to learn the technology:
Both X3D and Unity3d have active communities and many online resources and offline books. Unfortunately for X3D, the best content creation tool (Vivaty Studio) is no longer supported officially, but X3D is supported in Maya, Max, Blender, and many other 3D programs. Unity's online docs are excellent and the answers.unity3d.com forum (and other forums) are free and fast.
Which one is the standard and will target a lot of users:
'Standard' Well, HTML is the broadest standard. X3D (if including VRML) is the oldest most widely used 3D standard. HTML you have. HTML5 is coming, 'real soon now' (I'm already turning blue). If you mean 'most readily available' the HTML is #1, Flash is #2 (as everyone has a browser, and most computers come with Flash installed already). Flash needs to be installed. Unity needs to be installed too, but it's at least as fast and easy to install as Flash, and it's gotten millions of downloads, so it's getting pretty pervasive. X3D requires a plugin (this should change sometime 'real soon now' with x3dom on HTML5), but the many X3D players are all a little different from each other.
The support if I found any problem:
All have much online community support. X3D has a spec committee but that's not really support per se, you'd have to contact the X3D plugin provider (Bitmanagement, Cortona, Octaga, Exit Reality, Fraunhoffer, etc.) Unity has great online community forums, you can pay for premium support, but I'd only do that if I needed a serious bug or feature that has no work-around.
Bugs and security issues:
X3D's bugs depend on which player you use. Unity has bugs, but the product is pretty solid (I've only crashed it once, and I use is all day, every day, for over a year). Both have a mind toward security, but neither of these are totally secure, especially since you can write scripts that are inherently not secure. So you have a hand in how secure YOUR content will be. Some X3D players support encryption. Unity products are compiled.
connection to DB, SOA and web services:
You can use something like AJAX or JSON or whatever in all these platforms, no? So if it's by web service, sure. If by direct local access, I know Unity can do that. Both Unity and Flash require cross-server xml files on the server to allow access cross-domain (in the web player for Unity anyway).
supporting of multi player:
Unity has excellent multi player networking components. X3D (spec) supports it too, but it really depends on which X3D player you go with as to how well it actually works. Worst case, you can use AJAX or JSON or whatever to roll your own.
Which you choose depends mostly on what you want to do with it. Flash is generally the best route right now, unless it's all about 3D, then I'd try Unity. But a year from now, HTML5 alternatives will begin to take over. Flash DOES support 3D, there are different ways it can be done. Vivaty had a full-featured X3D player written in Flash, so it can be done. There are several good 3rd party 3d plugins for Flash.
I totally agree with wildpeaks : )
Connection to DB, SOA and web services: easy to do with HTML, Flash and Silverlight. Harder with Unity3D, and hardest with X3D.
Reply: I think X3D is not hardest.
X3D(X3DOM) can interact with webservices (XML) as very easy in this example/tutorial
Flash supports hardware accelerated 3d, and comes out of the box with 3d support. In addition, there is the papervision library for more advanced 3d. Unity3d is also supported
as a flash library.
I would consider Flex as a real alternative to Flash. It has the same actionscript language, but uses a tag based syntax called MXML, similar to silverlight. Database remoting is extremely simple. You can access your .Net/Java/Php objects directly on the front end without having to deal with serialisation issues. All of the Flash libraries are accessible.
There is also the X3D player from instantreality.org, supporting video streaming & decoding, XMLHttp request via scripting and its free for non commercial usage.
Flash 3D isn't good 3D for any application of real-time 3D. It is 2.5D with some tricks.
X3D is easy to learn for simple things and harder as complexity goes up. It does have the advantage of being VRML with pointy brackets so the free content, examples and toolkits are easily found. I did comparison tests of the various players. BS Contact is the best for the ability to handle the most complex content with the fastest frame rate and rich color palette. Network support is still non-standard although XMLHTTP and database connections are easy to bolt on. As others have said, Instant Reality is coming on fast and supported by people with a deep understanding of the past implementations and future requirements.
The decision comes down to the project type. A simple comparison rating such as you are is misleading at best but thanks for giving it a shot. I've used VRML through all of its incarnations and now X3D for world building and now as a source for 3D models in video work in combination with Sony Vegas. For cost-benefit without the need to use very expensive modeling toolkits, it is the best of all the choices.

Looking for an example of when screen scraping might be worthwhile

Screen scraping seems like a useful tool - you can go onto someone else's site and steal their data - how wonderful!
But I'm having a hard time with how useful this could be.
Most application data is pretty specific to that application even on the web. For example, let's say I scrape all of the questions and answers off of StackOverflow or all of the results off of Google (assuming this were possible) - I'm left with data that is not very useful unless I either have a competing question and answer site (in which case the stolen data will be immediately obvious) or a competing search engine (in which case, unless I have an algorithm of my own, my data is going to be stale pretty quickly).
So my question is, under what circumstances could the data from one app be useful to some external app? I'm looking for a practical example to illustrate the point.
It's useful when a site publicly provides data that is (still) not available as an XML service. I had a client who used scraping to pull flight tracking data into one of his company's intranet applications.
The technique is also used for research. I had a client who wanted to compare the contents of several online dictionaries by part of speech, and all of these sites had to be scraped.
It is not a technique for "stealing" data. All ordinary usage restrictions apply. Many sites implement CAPTCHA mechanisms to prevent scraping, and it is inappropriate to work around these.
A good example is StackOverflow - no need to scrape data as they've released it under a CC license. Already the community is crunching statistics and creating interesting graphs.
There's a whole bunch of popular mashup examples on ProgrammableWeb. You can even meet up with fellow mashupers (O_o) at events like BarCamps and Hack Days (take a sleeping bag). Have a look at the wealth of information available from Yahoo APIs (particularly Pipes) and see what developers are doing with it.
Don't steal and republish, build something even better with the data - new ways of understanding, searching or exploring it. Always cite your data sources and thank those who helped you. Use it to learn a new language or understand data or help promote the semantic web. Remember it's for fun not profit!
Hope that helps :)
If the site has data that would benefit from being accessible through an API (and it would be free and legal to do so), but they just haven't implemented one yet, screen scraping is a way of essentially creating that functionality for yourself.
Practical example -- screen scraping would allow you to create some sort of mashup that combines information from the entire SO family of sites, since there's currently no API.
Well, to collect data from a mainframe. That's one reason why some people use screen scraping. Mainframes are still in use in the financial world and often it's running software that has been written in the previous century. The people who wrote it might already be retired and since this software is very critical for these organizations, they really hate it when some new code needs to be added. So, screenscraping offers an easy interface to communicate with the mainframe to collect information from the mainframe and then send it onwards to any process that needs this information.
Rewrite the mainframe application, you say? Well, software on mainframes can be very old. I've seen software on mainframes that was over 30 years old, written in COBOL. Often, those applications work just fine and companies don't want to risk rewriting parts because it might break some code that had been working for over 30 years! Don't fix things if they're not broken, please. Of course, additional code could be written but it takes a long time for mainframe code to be used in a production environment. And experienced mainframe developers are hard to find.
I myself had to use screen scraping too in a software project. This was a scheduling application which had to capture the output to the console of every child process it started. It's the simplest form of screen scraping, actually, and many people don't even realize that if you redirect the output of one application to the input of another, that it's still a kind of screen scraping. :)
Basically, screen scraping allows you to connect one (web) application with another one. It's often a quick solution, used when other solutions would cost too much time. Everyone hates it, but the amount of time it saves still makes it very efficient.
Let's say you wanted to get scores from a popular sports site that did not offer the information available with an XML feed or API.
For one project we found a (cheap) commercial vendor that offered translation services for a specific file format. The vendor didn't offer an API (it was, after all, a cheap vendor) and instead had a web form to upload and download from.
With hundreds of files a day the only way to do this was to use WWW::Mechanize in Perl, screen scrape the way through the login and upload boxes, submit the file, and save the returned file. It's ugly and definitely fragile (if the vendor changes the site in the least it could break the app) but it works. It's been working now for over a year.
One example from my experience.
I needed a list of major cities throughout the world with their latitude and longitude for an iPhone app I was building. The app would use that data along with the geolocation feature on the iPhone to show which major city each user of the app was closest to (so as not to show exact location), and plot them on a 3D globe of the earth.
I couldn't find an appropriate list in XML/Excel/CSV type format anywhere easily, but I did find this wikipedia page with (roughly) the info I needed. So I wrote up a quick script to scrape that page and load the data into a database.
Any time you need a computer to read the data on a website. Screen scraping is useful in exactly the same instances that any website API is useful. Some websites, however, don't have the resources to create an API themselves; screen scraping is the developer's way around that.
For instance, in the earlier days of Stack Overflow, someone built a tool to track changes to your reputation over time, before Stack Overflow itself provided that feature. The only way to do that, since Stack Overflow has no API, was to screen scrape.
The obvious case is when a webservice doesn't offer reverse search. You can implement that reverse search over the same data set, but it requires scraping the entire dataset.
This may be fair use if the reverse search also requires significant pre-processing, e.g. because you need to support partial matching. The data source may not have the technical skills or computing resources to provide the reverse search option.
I use screen scraping daily, I run some eCommerce sites and have screen-scraping scripts running daily to gather product lists automatically from my suppliers wholesale sites. This allows me to have upto date information on all the products available to me from several suppliers and allows me to flag non-economical margins due to price changes.

Which resolution to target for a Mobile App?

When desinging UI for mobile apps in general which resolution could be considered safe as a general rule of thumb. My interest lies specifically in web based apps. The iPhone has a pretty high resolution for a hand held, and the Nokia E Series seem to oriented differently. Is 240×320 still considered safe?
Not enough information...
You say you're targeting a "Mobile App" but the reality is that mobile could mean anything from a cell phone with 128x128 resolution to a MID with 800x600 resolution.
There is no "safe" resolution for such a wide range, and if you're truly targeting all of them you need to design a custom interface for each major resolution. Add some scaling factors in and you might be able to cut it down to 5-8 different interface designs.
Further, the UI means "User Interface" and includes a lot more than just the resolution - you can't count on a touchscreen, full keyboard, or even software keys.
You need to either better define your target, or explain your target here so we can better help you.
Keep in mind that there are millions of phone users that don't have PDA resolutions, and you can really only count on 128x128 or better to cover the majority of technically inclined cell phone users (those that know there's a web browser in their phone, nevermind those that use it).
But if you're prepared to accept these losses, go ahead and hit for 320x240 and 240x320. That will give you most current PDA phones and up (older blackberries and palm devices had smaller square orientations). Plan on spending time later supporting lower resolution devices and above all...
Do not tie your app to a particular resolution.
Make sure your app is flexible enough that you can deploy new UI's without changing internal application logic - in other words separate the presentation from the core logic. You will find this very useful later - the mobile world changes daily. Once you gauge how your app is being used you can, for instance, easily deploy an iPhone specific version that is pixel perfect (and prettier than an upscaled 320x240) in order to engage more users. Being able to do this in a few hours (because you don't have to change the internals) is going to put you miles ahead of the competition if someone else makes a swipe at your market.
-Adam
Right now I believe it would make sense for me to target about 2 resolutions and latter learn my customers best needs through feedback?
It's a chicken and egg problem.
Ideally before you develop the product you already know what your customers use/need.
Often not even the customers know what they need until they use something (and more often than not you find out what they don't need rather than what they need).
So in this case, yes, spend a little bit of time developing a prototype app that you can send out there to a few people and get feedback. They will have better feedback because they can try it out, and you will have a springboard to start from. The ability to quickly release UI updates without changing core logic will allow you test several interfaces quickly without a huge time investment.
Further, to customers you will seem really responsive to their needs, which will be a big benefit to people who's jobs depend on reaction time.
-Adam
You mentioned Web based apps. Any particular framework you have in mind?
In many cases, WALL seems to help to large extent.
Here's one Article, Adapting to User Devices Using Mobile Web Technology exploiting WALL.

Resources