I'm trying to build in-browser background removal using TensorFlow.js BodyPix.
While the online demo reaches ~10 frames per second on my computer, my code takes ~3 seconds for a single frame.
I mostly followed the example in the official GitHub repository.
Here is my code:
const remove_background = async (img) => {
  console.time("ML");
  console.time("loadModel");
  const net = await bodyPix.load({
    architecture: 'MobileNetV1',
    outputStride: 16,
    multiplier: 0.5,
    quantBytes: 2
  });
  console.timeEnd("loadModel");
  console.time("segmentPerson");
  const segmentation = await net.segmentPerson(img.imageData, {
    segmentationThreshold: 0.7,
    internalResolution: 0.25
  });
  console.timeEnd("segmentPerson");
  console.time("ApplyMask");
  for (var i=0;i<segmentation.data.length;i++) {
    if(segmentation.data[i] === 0) {
      img.imageData.data[i*4] = 255;
      img.imageData.data[i*4+1] = 0;
      img.imageData.data[i*4+2] = 0;
    }
  }
  console.timeEnd("ApplyMask");
  console.timeEnd("ML");
  return img;
}
And here are the times:
loadModel: 129.35498046875 ms
segmentPerson: 2755.817138671875 ms
ApplyMask: 4.910888671875 ms
ML: 2890.7060546875 ms
I started with a 1700x1700 image and an internal resolution of 0.1, and worked down to a 200x200 image with a resolution of 0.25, with no significant improvement.
What am I doing wrong?
I'm using tfjs#1.2 & body-pix#2.0
In my own BodyPix implementation, I have found that the first detection takes considerably longer than subsequent ones. I believe some one-time initialization happens on the first call to segmentPerson. We are segmenting video, and I found I could save about 500 ms of startup time on the first detection just by passing a blank image to segmentPerson while the stream is being acquired, before passing the video frame by frame once the stream has started.
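The warm-up idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not BodyPix's documented workflow: `createSegmenter` is a hypothetical helper name, `loadModel` is assumed to be something like `() => bodyPix.load({...})`, and `blankInput` is assumed to be any input that `segmentPerson` accepts (for example a small blank canvas). The sketch also loads the model only once, whereas the code in the question loads it inside the function that is called for each image.

```javascript
// Hypothetical helper: load the model once, run one throwaway segmentation,
// and return a function that segments real frames with the warmed-up model.
async function createSegmenter(loadModel, blankInput, options = {}) {
  const net = await loadModel();                // e.g. () => bodyPix.load({...})
  await net.segmentPerson(blankInput, options); // warm-up call; result is discarded
  return (frame) => net.segmentPerson(frame, options);
}

// Assumed browser usage (not run here):
//   const blank = document.createElement('canvas');
//   blank.width = blank.height = 64;
//   const segment = await createSegmenter(() => bodyPix.load(), blank);
//   const segmentation = await segment(videoElement);
```

Timing only the calls made through the returned function should then show the steady-state cost per frame rather than the one-time startup cost.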
I'm in the early stages of producing a rectangular SVG built from <rect> elements. I generate the pixel colours with a loop that steps through all the RGB values in increments of 8. The returned array has 32,768 <rect /> elements in it. The code produces the desired outcome; however, I get an error in:
Chrome
Maximum call stack size exceeded
getTypeSymbol (react_devtools_backend.js:4828)
Firefox
too much recursion
From what I can tell, this isn't a recursion problem; the function doesn't appear to be reloading. I think it's related to the size of the array.
Any thoughts on what I should do here? I've never seen this problem before.
function PixelColour() {
  console.log("start");
  let counter = 0;
  let x = 0;
  let y = 0;
  const colours = [];
  for (let red = 0; red < 256; red += 8) {
    for (let green = 0; green < 256; green += 8) {
      for (let blue = 0; blue < 256; blue += 8) {
        counter++;
        if (x < 256) {
          x++;
        } else {
          x = 0;
        }
        if (x === 256) {
          y++;
        }
        colours.push(
          <rect
            key={counter}
            x={x}
            y={y}
            height="1"
            width="1"
            style={{ fill: `rgb(${red}, ${green}, ${blue})` }}
          />
        );
      }
    }
  }
  return <Fragment>{colours}</Fragment>;
}
class Colours extends React.Component {
  render() {
    return (
      <div>
        <svg width="256" height="128" style={{ border: "2px solid black" }}>
          <PixelColour />
        </svg>
      </div>
    );
  }
}
It's not your function reloading (or rerendering) that is the problem per se; you're running into recursion within React itself, specifically in the way that React DevTools tries to reconcile the DOM in order to build a component map.
This is actually not so easy to replicate on codesandbox, since it appears to be using an experimental version of devtools, and this error only crops up when hot-reloading. The stack trace there is scheduleFibersWithFamiliesRecursively but on my local machine with the standard devtools extension installed I get mountFiberRecursively when the component mounts.
I did some digging and came across this github issue, and a PR addressing it which appears to have been abandoned for the time being.
Perhaps if you go and give them a nudge they might take another look:
Getting maximum call stack exceeded on backend.js when rendering many elements.
Refactored backend renderer to remove most of the recursion
All you can do in the meantime is disable the DevTools extension. I would add that even with it disabled, this component takes several seconds to mount on my local machine. If you don't take the calculation out of the render cycle (the function body), it is going to run on every render. You're trying to mount tens of thousands of DOM nodes, which is never going to perform well - even the PR above only supposes a limit of 15000.
I think a better idea would be to 1) calculate this well in advance if you can, preferably as hard-coded data and nowhere near the UI thread, and 2) draw to a canvas rather than creating nodes in the DOM.
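To illustrate the canvas suggestion, here is a minimal sketch that computes the same 32,768 colours once into a flat RGBA buffer, outside any render function. `buildColourBuffer` is a hypothetical name, and the 256x128 layout is an assumption taken from the <svg> size in the question.

```javascript
// Hypothetical helper: fill a flat RGBA buffer with every colour whose
// channels are multiples of `step`, laid out row by row.
function buildColourBuffer(width = 256, height = 128, step = 8) {
  const data = new Uint8ClampedArray(width * height * 4);
  let pixel = 0;
  for (let red = 0; red < 256; red += step) {
    for (let green = 0; green < 256; green += step) {
      for (let blue = 0; blue < 256; blue += step) {
        // Stop early if there are more colours than pixels.
        if (pixel >= width * height) return { width, height, data };
        const offset = pixel * 4;
        data[offset] = red;
        data[offset + 1] = green;
        data[offset + 2] = blue;
        data[offset + 3] = 255; // fully opaque
        pixel++;
      }
    }
  }
  return { width, height, data };
}

// Assumed browser usage (not run here): draw the whole buffer in one call.
//   const { width, height, data } = buildColourBuffer();
//   const ctx = document.querySelector('canvas').getContext('2d');
//   ctx.putImageData(new ImageData(data, width, height), 0, 0);
```

In a React component, the drawing step would typically live in an effect that runs after mount and uses a ref to the <canvas> element, so that React mounts a single DOM node instead of tens of thousands.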
I'm trying to trim a mp3 file.
using this code:
private void TrimMp3(string open, string save)
{
    using (var mp3FileReader = new Mp3FileReader(open))
    using (var writer = File.Create(save))
    {
        var startPostion = TimeSpan.FromSeconds(60);
        var endPostion = TimeSpan.FromSeconds(90);
        mp3FileReader.CurrentTime = startPostion;
        while (mp3FileReader.CurrentTime < endPostion)
        {
            var frame = mp3FileReader.ReadNextFrame();
            if (frame == null) break;
            writer.Write(frame.RawData, 0, frame.RawData.Length);
        }
    }
}
"open" is the file I'm trimming and "save" is the location I'm saving.
The trimming works, but not fully. The new file does start at 60 seconds, but it keeps going and doesn't stop at 90 seconds. For example, if the file is 3 minutes long, the output starts at 1 minute and ends at 3. It's as if the while condition is always true. What am I doing wrong here?
Thanks in advance!
I have no idea what your Mp3FileReader is doing there, but your while loop looks odd. Does mp3FileReader.ReadNextFrame() also change mp3FileReader.CurrentTime? If not, then there is your problem.
You should at least advance mp3FileReader.CurrentTime by one frame per iteration. Otherwise CurrentTime never changes and the loop condition will always be true.
In NAudio 1.8.0, Mp3FileReader.ReadNextFrame does not progress CurrentTime, although I checked in a fix for that recently.
So you can either get the latest NAudio code, or make use of the SampleCount property on each Mp3Frame to accurately keep track of how far through you are yourself.
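The bookkeeping behind the SampleCount suggestion is simple arithmetic. It is shown here as a language-neutral sketch in JavaScript, not as NAudio code: `framesUntil` and the plain frame objects are hypothetical, each object stands in for an MP3 frame that exposes a sample count and a sample rate, and the sample rate is assumed to be constant across the file.

```javascript
// Hypothetical sketch: keep frames until the accumulated playing time
// reaches `durationSeconds`. Samples are summed as integers and converted
// to seconds only for the comparison.
function framesUntil(frames, durationSeconds) {
  const kept = [];
  let samples = 0;
  for (const frame of frames) {
    if (samples / frame.sampleRate >= durationSeconds) break;
    kept.push(frame);
    samples += frame.sampleCount;
  }
  return { kept, samples };
}
```

Applied to the loop in the question, the same running total would replace the check on CurrentTime: after seeking to the start position, keep writing frames until the accumulated samples divided by the sample rate reach the 30 seconds between the start and end positions.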
I am trying to write a program that will recognize an image on the screen, compare it against a resource library, and then calculate based on the result of the image source.
The first thing that I did was to create the capture screen function which looks like this:
private Bitmap Screenshot()
{
    System.Drawing.Bitmap Table = new System.Drawing.Bitmap(88, 40, PixelFormat.Format32bppArgb);
    System.Drawing.Graphics g = System.Drawing.Graphics.FromImage(Table);
    g.CopyFromScreen(1047, 44, 0, 0, Screen.PrimaryScreen.Bounds.Size);
    return Table;
}
Then, I analyze this picture. The first method I used was to create two for loops and analyze both the bitmaps pixel by pixel. The problem with this method was time, it took a long time to complete 37 times. I looked around and found the convert to bytes and the convert to hash methods. This is the result:
public enum CompareResult
{
    ciCompareOk,
    ciPixelMismatch,
    ciSizeMismatch
};
public CompareResult Compare(Bitmap bmp1, Bitmap bmp2)
{
    CompareResult cr = CompareResult.ciCompareOk;
    //Test to see if we have the same size of image
    if (bmp1.Size != bmp2.Size)
    {
        cr = CompareResult.ciSizeMismatch;
    }
    else
    {
        //Convert each image to a byte array
        System.Drawing.ImageConverter ic = new System.Drawing.ImageConverter();
        byte[] btImage1 = new byte[1];
        btImage1 = (byte[])ic.ConvertTo(bmp1, btImage1.GetType());
        byte[] btImage2 = new byte[1];
        btImage2 = (byte[])ic.ConvertTo(bmp2, btImage2.GetType());
        //Compute a hash for each image
        SHA256Managed shaM = new SHA256Managed();
        byte[] hash1 = shaM.ComputeHash(btImage1);
        byte[] hash2 = shaM.ComputeHash(btImage2);
        for (int i = 0; i < hash1.Length && i < hash2.Length && cr == CompareResult.ciCompareOk; i++)
        {
            if (hash1[i] != hash2[i])
                cr = CompareResult.ciPixelMismatch;
        }
    }
    return cr;
}
After I analyze the two bitmaps in this function, I call it in my main form with the following:
Bitmap Table = Screenshot();
CompareResult success0 = Compare(Properties.Resources.Result0, Table);
if (success0 == CompareResult.ciCompareOk)
{ double result = 0; Num.Text = result.ToString(); goto end; }
The problem I am getting is that once this has all been accomplished, I am always getting a cr value of ciPixelMismatch. I cannot get the images to match, even though the images are identical.
To give you a bit more background on the two bitmaps, they are approximately 88 by 40 pixels, and located at 1047, 44 on the screen. I wrote a part of the program to automatically take a picture of that area so I did not have to worry about the wrong location or size being captured:
Table.Save("table.bmp");
After I took the picture and saved it, I moved it from the bin folder in the project directly to the resource folder and ran the program again. Despite all of this, the result is still ciPixelMismatch. I believe the problem lies in the format the pictures are saved in: even though they are the same image, they may be analyzed in different formats, and one of the pictures may contain a bit more information than the other, which causes the mismatch. Can somebody please help me solve this problem? I am just beginning with C# programming, I am 5 days into the learning process, and I am really at a loss here.
Yours sincerely,
Samuel
Hey, I am using getUserMedia() to capture audio input from the user's microphone. Meanwhile, I want to put the captured values into an array so I can manipulate them. I am using the following code, but the problem is that my array gets filled with the value 128 all the time (I print the results to the console for now), and I can't find my mistake. Can someone help me find it?
//create a new context for audio input
context = new webkitAudioContext();
var analyser = null;
var dataarray = [];
getLiveInput = function() {
  navigator.webkitGetUserMedia({audio: true},onStream,onStreamError);
};
function onStream(stream)
{
  var input = context.createMediaStreamSource(stream);
  analyser = context.createAnalyser();
  var str = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteTimeDomainData(str);
  for (var i = 0; i < str.length; i++) {
    var value = str[i];
    dataarray.push(value);
    console.log(dataarray)
  }//end for loop
}//end function
function onStreamError(e) {
  console.error('Streaming failed: ', e);
};
The values returned from getByteTimeDomainData are 8 bit integers, from 0 to 255. 128, which is half way, basically means "no signal". It is the equivalent of 0 in PCM audio data from -1 to 1.
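As a small illustration of that mapping (a sketch only; `byteToSample` and `hasSignal` are hypothetical helper names, not Web Audio API functions):

```javascript
// Map an unsigned byte from getByteTimeDomainData (0..255) onto roughly
// -1..1, so that 128 becomes 0 (silence).
function byteToSample(value) {
  return (value - 128) / 128;
}

// True if any byte deviates from the silent midpoint by more than `tolerance`.
function hasSignal(bytes, tolerance = 1) {
  return Array.from(bytes).some((v) => Math.abs(v - 128) > tolerance);
}
```

An array filled entirely with 128, as in the question, would therefore be reported as having no signal.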
But ANYWAY - there are a couple problems:
First, you're never connecting the input to the analyser. You need input.connect(analyser) before you call analyser.getByteTimeDomainData().
The second problem isn't with your code so much as it's just an implementation issue.
Basically, the onStream function only gets called once - and getByteTimeDomainData only returns data for 1024 samples' worth of audio (a tiny fraction of a second). The problem is that this all happens so quickly, and for such a short period of time after the stream gets created, that there's no real input yet. Try wrapping the analyser.getByteTimeDomainData() call and the loop that follows it in a 1000ms setTimeout, and then whistle into your microphone as soon as you give the browser permission to record. You should see some values other than 128.
Here's an example: http://jsbin.com/avasav/5/edit
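Both fixes can be combined in one small function. This is a hedged sketch rather than the linked example: `pollAnalyser` is a hypothetical name, `context` is assumed to be an AudioContext, `stream` the MediaStream handed to the getUserMedia success callback, and reading on a repeating timer (instead of a single setTimeout) is a substitution made here so that values keep arriving.

```javascript
// Sketch: connect the microphone source to an analyser and read it repeatedly.
function pollAnalyser(context, stream, onSamples, intervalMs = 100) {
  const input = context.createMediaStreamSource(stream);
  const analyser = context.createAnalyser();
  input.connect(analyser); // without this the analyser never receives audio

  const buffer = new Uint8Array(analyser.frequencyBinCount);
  const timer = setInterval(() => {
    analyser.getByteTimeDomainData(buffer); // overwrites buffer with the latest samples
    onSamples(buffer);
  }, intervalMs);

  return () => clearInterval(timer); // call the returned function to stop polling
}

// Assumed usage inside the stream callback (not run here):
//   const stop = pollAnalyser(context, stream, (bytes) => console.log(bytes[0]));
```

Because the same buffer is reused on every tick, a caller that wants to keep the values should copy them (for example with Array.from) before the next read.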
I'm working with the new Kinect face tracking SDK (the official Microsoft one), and I noticed a difference in detection between the C++ and C#/WPF examples: the first is much faster at recognition than the second (the one I actually want to use). In the C++ version the face tracking is almost on the fly, while in the WPF one it starts ONLY when I put my entire body (so the entire skeleton) in the FOV of the Kinect.
Has anyone found out why? I noticed that the skeleton frame provided shows the property "Trackingmode = default", even though I set the Kinect skeleton stream to seated.
colorImageFrame.CopyPixelDataTo(this.colorImage);
depthImageFrame.CopyPixelDataTo(this.depthImage);
skeletonFrame.CopySkeletonDataTo(this.skeletonData);
// Update the list of trackers and the trackers with the current frame information
foreach (Skeleton skeleton in this.skeletonData)
{
    if (skeleton.TrackingState == SkeletonTrackingState.Tracked
        || skeleton.TrackingState == SkeletonTrackingState.PositionOnly)
    {
        // We want to keep a record of any skeleton, tracked or untracked.
        if (!this.trackedSkeletons.ContainsKey(skeleton.TrackingId))
        {
            this.trackedSkeletons.Add(skeleton.TrackingId, new SkeletonFaceTracker());
        }
        // Give each tracker the updated frame.
        SkeletonFaceTracker skeletonFaceTracker;
        if (this.trackedSkeletons.TryGetValue(skeleton.TrackingId,
                                              out skeletonFaceTracker))
        {
            skeletonFaceTracker.OnFrameReady(this.Kinect,
                                             colorImageFormat,
                                             colorImage,
                                             depthImageFormat,
                                             depthImage,
                                             skeleton);
            skeletonFaceTracker.LastTrackedFrame = skeletonFrame.FrameNumber;
        }
    }
}
The code is the one provided by Microsoft with the 1.5 SDK.
I got some information from other forums, specifically here (thanks to this guy (blog)):
MSDN forum link
Basically, in the C++ example all the methods to track the face are used, both color+depth and color+depth+skeleton, while in the C# one only the latter is used. So it only starts when you stand up.
I did some tests, but the other method is still not working for me. I made some modifications to the code, but with no luck. Here is my modification:
internal void OnFrameReady(KinectSensor kinectSensor, ColorImageFormat colorImageFormat, byte[] colorImage, DepthImageFormat depthImageFormat, short[] depthImage)
{
    if (this.faceTracker == null)
    {
        try
        {
            this.faceTracker = new Microsoft.Kinect.Toolkit.FaceTracking.FaceTracker(kinectSensor);
        }
        catch (InvalidOperationException)
        {
            // During some shutdown scenarios the FaceTracker
            // is unable to be instantiated. Catch that exception
            // and don't track a face.
            //Debug.WriteLine("AllFramesReady - creating a new FaceTracker threw an InvalidOperationException");
            this.faceTracker = null;
        }
    }
    if (this.faceTracker != null)
    {
        FaceTrackFrame frame = this.faceTracker.Track(
            colorImageFormat,
            colorImage,
            depthImageFormat,
            depthImage,
            Microsoft.Kinect.Toolkit.FaceTracking.Rect.Empty);
        //new Microsoft.Kinect.Toolkit.FaceTracking.Rect(100,100,500,400));
        this.lastFaceTrackSucceeded = frame.TrackSuccessful;
        if (this.lastFaceTrackSucceeded)
        {
            if (faceTriangles == null)
            {
                // only need to get this once. It doesn't change.
                faceTriangles = frame.GetTriangles();
            }
            this.facePointsProjected = frame.GetProjected3DShape();
            this.rotationVector = frame.Rotation;
            this.translationVector = frame.Translation;
            this.faceRect = frame.FaceRect;
            this.facepoints3D = frame.Get3DShape();
        }
    }
}
frame.TrackSuccessful is always false. Any ideas?
I finally figured it out and made a post on MSDN forums regarding what else needs to be done to get this working.
It's here.
Hope that helps!