What does the renderingEmSize parameter in GlyphRun specify? - wpf

I'm trying to make a GlyphRun instance for use in a GlyphRunDrawing, but the documentation is just so bad that it's almost comical. For example, the parameter renderingEmSize is described like this:
renderingEmSizeType: System.Double
A value of type Double.
Just... wow.
I know what an "em" is in a font (width of the em dash), but I don't know what the grid units are. Device pixels? Device independent pixels?

Turns out the answer is in the source code. Thanks for MS making this available, if they are going to make eyes bleed on the docs.
Interestingly, all the information we need is contained in the xml doc comments on GlyphRun.cs. The renderingEmSize for example, is as follows:
<param name="renderingEmSize">Font rendering size in drawing surface units (96ths of an inch).</param>
The rest of the file is similarly well-commented, including this seemingly out-of-place but gripping read:
/*
The default branch prediction rules for modern processors specify that forward branches
are not to be taken. If the branch is in fact taken, all of the speculatively executed code
must be discarded, the processor pipeline flushed, and then reloaded. This results in a
processor stall of at least 42 cycles for the P4 Northwood for each mis-predicted branch.
The deeper the processor pipeline the higher the cost, i.e. Prescott processors.
Checking for multiple incorrect parameters in a method with high call count like this one can
easily add significant overhead for no reason. Note that the C# compiler should be able to make
reasonable assumptions about branches that throw exceptions, but the current whidbey
implemenation is weak in this regard. Also the current IBC tools are unable to add branch
prediction hints to improve behavior based on run time information. Also note that adding
branch prediction hints increases code size by a byte per branch and doing this in every
method that is coded without default branch prediction behavior in mind would add an
unacceptable amount of working set.
*/
The whole file can be found here: GlyphRun.cs at webtropy

Related

Why are the inputs to my guess_nonlinear() all 1s?

The N2 diagram for my full problem is below.
The N2 diagram for the coupled portion of the problem is below.
I have a DirectSolver handling the coupling between LLTForces and ImplicitLiftingLine, and an LNBGS solver handling the coupling between LiftingLineGroup and TestCL.
The gist for the problem is here: https://gist.github.com/eufren/31c0e569ed703b2aea3e2ef5360610f7
I have implemented guess_nonlinear() on ImplicitLiftingLine, which should use various outputs from LLTGeometry to give a good initial guess for the vortex strengths based on a linearised form of the governing equations.
def guess_nonlinear(self, inputs, outputs, resids):
freestream_unit_vector = inputs['freestream_unit_vector']
freestream_velocity = inputs['freestream_velocity']
n = inputs['normal_vectors']
A = inputs['surface_areas']
l = inputs['bound_vortices']
ic_tot = inputs['influence_coefficients_total']
v_inf = freestream_velocity
v_inf_vec = v_inf*freestream_unit_vector
lin_numerator = np.pi * v_inf * A * np.sum(n * v_inf_vec, axis=1)
lin_denominator = (np.linalg.norm(np.cross(v_inf_vec, l), axis=1) - np.pi * v_inf * A * np.sum(np.sum(n * ic_tot, axis=2), axis=1))
lin_vtx_str = lin_numerator / lin_denominator
outputs['vortex_strengths'] = lin_vtx_str
However, when the problem is run for the first time, any inputs not explicitly set with p.set_val() are all 1s. This causes guess_nonlinear() to give a bad output and so the system fails to converge:
As far as I can tell, the execution order for the LLT group is correct, and the geometry components should be being executed before the implicit component. I'm confused as to why this doesn't seem to actually be happening when the code is run, and instead these inputs are taking their default values.
What do I need to change to get this to work properly? Additionally, I've found difficulty in getting LNBGS to converge (hence adding guess_nonlinear()) during optimisation - only DirectSolver gets all the way through the optimisation without issues, but it's very slow for large numbers of LLT nodes). How can I improve the linear and nonlinear solver selection, and improve the reliability of the iterative solver?
Note: Thanks for providing a testable example. It made figuring out the answer to your question a lot simpler. Your problem was a bit subtle and I would not have been able to give a good answer without runnable code
Your first question: "Why are all the inputs 1"
"Short" Answer
You have put the nonlinear solver to high in the model hierarchy, which then included a key precurser component that computed your input values. By moving the solver down to a lower level of the model, I was able to ensure that the precurser component (LTTGeometry) ran and had valid outputs before you got to the guess_nonlinear of implicit component.
Here is what you had (Notice the implicit solver included LTTGeometry even though the data cycle does not require that component:
I moved both the nonlinear solver and the linear solver down into the LTTCycle group, which then allows the LTTGeometry component to execute before getting to the nonlinear solver and guess_nonlinear step:
My fix is only partially correct, since there is a secondary cycle from the TestCL component that also needs a solver and does not have one. However, that cycle still does not involve the LTTGeometry group. So the fully correct fix is to restructure you model top run geometry first, and then put the LTTCycle and TestCL groups together so you can run a solver over just them. That was a bit more hacking than I wanted to do on your test problem, but you can see the general idea from the adjusted N2 above.
Long Answer
The guess_nonlinear sequence in OpenMDAO does NOT run the compute method of explicit components or of groups. It follows the execution hierarchy, and calls any guess_nonlinear that it finds. So that means that any explicit components you have in your model will NOT get executed, their outputs will not get updated with computed values, and those computed values will not get passed to the inputs of downstream components.
Things get a little tricky when you have deep model hierarchies. The guess_nonlinear method is called as the first step in the nonlinear solver process. If you have a NonLinearRunOnce solver at the top level, it will follow the compute chain down the line calling compute or solve_nonlinear on each child and doing a data transfer after each one. If one of those children happens to be a group with a nonlinear solver, then that solver will call guess_nonlinear on its children (grandchildren of the top group with the NonLinearRunOnce solver) as the first step. So any outputs that were computed by the siblings of this group will be valid, but none of the outputs from the grandchild level will have been computed yet.
You may be wondering why not just have the guess_nonlinear method call the compute for any explicit components? There is a difficult to balance trade off here. If you assume that all explicit components are very cheap to run, then it might make sense to run the compute methods --- or it might not. A lot depends on the cyclic data structure. If some early component in the group needs guesses from the later one, then running its compute isn't going to help you much at all. Perhaps more importantly though, not all explicit components are cheap to run. You might have a very expensive computation, and calling compute as part of the guess process would be way too costly.
The compromise here, if you need some kind of top level guess process, is that you can implement guess_nonlinear at the group level. It's less common to do, but it gives you total control over what happens. You can call whatever you need to call in whatever sequence.
So the absolute key thing to remember is that the only data you have available to you when a guess_nonlinear is called is any data that was computed before your containing solver was executed. That means any thing that was computed before you got to the model scope of the containing solver (not the scope of the component with the guess_method itself).
Your second question: "How can I speed this up when the number of nodes gets large?"
This one not possible to give a generic answer to at all. I noticed that you have already specified sparse partial derivatives. That is a great start, but if its still not fast enough for you then it means you're reaching the limits of what you can do with a DirectSolver. You note that this solver is the only one that gets you through the optimization without issues, which I will take to mean that ScipyKryloventer link description here and PetscKrylov are not converging the linear system well for you --- at least not by themselves. Thats not surprising, as krylov solvers almost always require some kind of preconditioner... and this is why I can't offer a generic answer. Setting up efficient linear solvers for larger-scale compute is a tricky subject. If you look into the literature, you'll find some good suggestions. You can also study open source implementations like VSPAero for some tips.
effectively, you've reached the limit of what simple linear solvers can offer you. From this point forward, OpenMDAO can help a bit by making it easier to implement some preconditioning, but you'll have to suffer the math side yourself.

React Profiler: What do the timings mean?

I am using react profiler to make my app more efficient. It will commonly spit out a graph like this:
I am confused because the timings do not add up. For example, it would make sense if the total commit time for "Shell" was 0.3ms then "Main" was "0.2ms of 0.3ms." But that is not the case.
What precisely do these timings mean and how do they add up?
(note: I have read "Introducing the React Profiler" but it appears from this section that this time-reporting convention is new since that article.)
The first number (0.2ms) is the self duration and the second number (0.3ms) is the actual duration. Mostly the self duration is the actual duration minus the time spent on the children. I have noticed that the numbers don't always add up perfectly, which I would guess is either a rounding artifact or because some time is spent on hidden work. For example, in your case, the Shell has an actual time of 3.1ms and a self duration of 0.3ms, which means the 2 children (Navbar and Main), should add up to 3.1ms - 0.3ms, or 2.8ms. However, we see that the Navbar is not re-rendered, so it's 0ms, but the actual duration for Main is only 2.7ms, not 2.8ms. It's not going to have any impact in practical terms when you're performance tuning, but it does violate expectations a bit.

changing HM reference software to display some information about the bitstream

I am very new to the HM HEVC (and the JEM) reference software, and I am currently trying to understand the source code. I want to add some lines to display for each component: name of Algo (i.e. inter/intra Algos) + length of the bitstream+ position in output bin file.
To know which component cost more bits to code and how codec is working. I want to do same thing for the JEM also after that.
my problem first is that I am unable of understanding a lot of function there, the comment is not sufficient, so is there any references to understand the code??!! (I already read the Manuel ,doesn’t help).
2nd I don’t know where & how exactly to add these lines; is it in TEncGOP, TEncSlice or TEncCU. Ps: I don’t think in TEncGOP.compressGOP so maybe in the 2 other classes.
(I put the answer to comment that #Mourad put four hours ago here, becuase it will be long)
I assume that you could manage to find where the actual encoding after the RDO loop is implemented. As you correctly mentioned, xEncodeCU is the function you need to refer to make sure you are not still in the RDO.
Now you need to find the exact function in xEncodeCU that is responsible for your target codec tool.
For instance, if you want to count the number of bits for coefficient coding, you should be looking into the m_pcEntropyCoder->encodeCoeff() (It's a JEM function and may have different name in the HM). Once you find this line in the xEncodeCU, you may do this and get the number of bits written inside encodeCoeff() function:
UInt b_before = m_pcEntropyCoder->getNumberOfWrittenBits();
m_pcEntropyCoder->encodeCoeff( ... );
UInt b_after = m_pcEntropyCoder->getNumberOfWrittenBits();
UInt writtenBitsCoeff = b_after - b_before;
One important point: as you cas see, the function getNumberOfWrittenBits() gives you integer rates, which is obtained by rounding sum of fractional rates corresponding to all syntax elements coded inside the function encodeCoeff. This error might or might not be acceptable, depending on your problem. For example, if instead of coefficient coding rate, you wanted to know the rate of CBF, then this error would not be acceptable at all. Because, CBF rate is mostly less than one bit. If this is your case, then you would need to calculate the fractional bits one-by-one. It would be totally different and relatively more complicated than this.
Point 1: There is one rule of tumb that logging coding decisions (e.g. pred mode, MV, IPM, block size) is much easier at the decoder side than encoder. This is because of the fact that you have super complicated RDO process at the encoder side that can easily make you get lost in the loops. But at the decoder side, everything appears only once. However, if you insist on doing it at the encoder side, you may find some tips here: Get some information from HEVC reference software
Point 2: Unlike coding decisions, logging rate (i.e. number of written bits for different syntax elements) is more complicated at the decoder side than encoder. This is particularly true for fractional bits associated to anything that is encoded in non-EP mode (i.e. with CABAC contexts). So you may do this part at the ecoder side. But I am afraid it is not easy.
Point 3: I think the best way to understand the code is to read it line-by-line. It's very time-consuming but if you theoritically know the standard(s), you will probably be able to distiguish important parts and ignore the rest.
PS: I think there are too many questions, mostly too general, in your post. It makes it a bit difficult for me to answer them all together. So you I'll wait for you to take your next step and ask more precise questions.

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure somehow the data loss, you would have to define what you mean by 'data' and how you wanted to measure it. Video compression is quite complex and the approach may even differ frame by frame within a video. Data could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded based on a delay to other frames etc.
Video quality is subjective so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. Its essentially uses a well defined process to try to apply some objectivity to a test audiences subjective experience.

How could we get a variable value from GLSL?

I'm doing a project with a lot of calculation and i got an idea is throw pieces of work to GPU, but i wonder whether could we retrieve results from GLSL, if it is posible, how?
GLSL does not provide outputs besides what is placed in the frame buffer.
To program a GPU and get results more conveniently, use CUDA (NVidia only) or OpenCL (cross-platform).
In general, what you want to do is use OpenCL for general-purpose GPU tasks. However, if you are insistent about pretending that OpenGL is not a rendering API...
Framebuffer Objects make it relatively easy to render to multiple outputs. This of course means that you have to structure your processing such that what gets rendered matches what you want. You can render to 32-bit floating-point "images", so you have access to plenty of precision. The biggest difficulty is what I stated: figuring out how to structure your task to match rendering.
It's a bit easier when using transform feedback. This is the ability to write the output of the vertex (or geometry) shader processing to a buffer object. This still requires structuring your tasks into something like rendering, but it's easier because vertex shaders have a strict one-vertex-to-one-vertex mapping. For every input vertex, there is exactly one output. And if you draw GL_POINTS, it's not too difficult to use attributes to pass the data that changes.
Both easier and harder is the use of shader_image_load_store. This is effectively the ability to read/write from/to arbitrary images "whenever you want". I put that last part in quotes because there are lots of esoteric rules about data race conditions: reading from a value written by another shader invocation and so forth. These are not trivial to deal with. You can try to structure your code to avoid them, by not writing to the same image location in the same shader. But in many cases, if you could do that, you could just render to the framebuffer.
Ultimately, it's pretty much impossible to answer this question in the general case, without knowing what exactly you're trying to actually do. How you approach GPGPU through a rendering API depends greatly on exactly what you're trying to compute.

Resources