How to Initialize the Qubits in IBM Quantum Composer - quantum-computing

Step 1 Qisqit :
Here is the code of Qiskit. I have initialized the four qubits.
qr1 = QuantumRegister(4)
mea = ClassicalRegister(2)
circuit = QuantumCircuit(qr1,mea)
initial=[[1,0],[0,1]]
circuit.initialize(initial[1], 0)
circuit.initialize(initial[1], 1)
# Uc
circuit.x(qr1[1])
circuit.ccx(qr1[0],qr1[1],qr1[2])
circuit.x(qr1[0])
circuit.x(qr1[1])
circuit.ccx(qr1[0],qr1[1],qr1[3])
circuit.x(qr1[0])
Its Circut:
Step 2 IBM Qunatum Composer:
Here is no such tool to initialize the qubis. Please guide me how to initilize here?

The OpenQASM language is the standard for exchange circuits among several quantum computing tools. You can use to move your Qiskit circuit to the IBM Quantum Composer.
print(circuit.qasm())
OPENQASM 2.0;
include "qelib1.inc";
gate multiplex1_reverse_dg q0 { ry(pi) q0; }
...
You can take the output and paste it in the OpenQASM 2.0 box in IBM Quantum Composer.
Note: At the moment, there is an issue in Qiskit that QASM-exports a circuit with initialize instructions as "gates". For your specific case, you need to do decompose first:
print(circuit.decompose().qasm())
Detailed explanation of initialize: Qiskit initialize is a reset followed by a state preparation. Take the following example:
circuit = QuantumCircuit(1)
circuit.initialize([0,1], 0)
print(circuit.decompose().qasm())
OPENQASM 2.0;
include "qelib1.inc";
gate multiplex1_reverse_dg q0 { ry(pi) q0; } #3
gate disentangler_dg q0 { multiplex1_reverse_dg q0; }
gate state_preparation(param0,param1) q0 { disentangler_dg q0; }
qreg q[1];
reset q[0]; #1
state_preparation(0,1) q[0]; #2
In #1, the reset, followed by state_preparation in #2. After some nested calling, the standard ry is called in #3. In this case, circuit.initialize([0,1], 0) is equivalent to reset q0[0]; ry(pi) q0[0].

Related

Frama-C does not recognize valid memory access from bitwise-ANDed index

I am right-shifting an unsigned integer then &ing it with 0b111, so the resulting value must be in the range [0, 7].
When I use that value as an index into an array of length 8, Frama-C is not able to verify the associated rte: mem_access assertion.
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
/*#
requires \valid_read(bitRuleBuf + (0 .. 7));
assigns \nothing;
ensures \forall uint8_t c; (0 <= c < 8) ==> (
((\result >> c) & 0b1) <==> bitRuleBuf[(state >> c) & 0b111]
);
*/
uint8_t GetNewOctet(
const uint16_t state,
const bool bitRuleBuf[const static 8])
{
uint8_t result = 0;
/*
loop invariant 0 <= b <= 8;
loop invariant \forall uint8_t c; (0 <= c < b) ==> (
((result >> c) & 0b1) <==> bitRuleBuf[(state >> c) & 0b111]
);
loop assigns b, result;
loop variant 8 - b;
*/
for (uint8_t b = 0; b < 8; b += 1) {
result |= ((uint8_t)bitRuleBuf[(state >> b) & 0b111]) << b;
// Still seeing an issue if break apart the steps:
/*
const uint16_t shifted = state >> b;
const uint16_t anded = shifted & 0b111;
// "assert rte: mem_access" is not successful here.
const uint8_t value = bitRuleBuf[anded];
const uint8_t shifted2 = value << b;
result |= shifted2;
*/
}
return result;
}
/*#
assigns \nothing;
*/
int main(void) {
// Empty cells with both neighbors empty become alive.
// All other cells become empty.
const bool bitRuleBuf[] = {
1, // 0b000
0, // 0b001
0, // 0b010
0, // 0b011
0, // 0b100
0, // 0b101
0, // 0b110
0 // 0b111
};
const uint8_t newOctet = GetNewOctet(0b0010000100, bitRuleBuf);
//assert(newOctet == 0b00011000); // Can be uncommented to verify.
//# assert newOctet == 0b00011000;
return 0;
}
The failed assertion happens for line 27: result |= ((uint8_t)bitRuleBuf[(state >> b) & 0b111]) << b;.
Changing the & 0b111 to % 8 does not resolve the issue.
I have tried many variations of the code and ACSL involved, but have not been successful.
I'm guessing integer promotion might be involved in the issue.
How can the code/ACSL be modified so that verification is successful?
$ frama-c --version
24.0 (Chromium)
I am running frama-c and frama-c-gui with arguments -wp and -wp-rte.
For background of the included block of code, the least significant 10 bits of the state argument are the state of 10 cells of 1-dimensional cellular automata. The function returns the next state of the middle 8 cells of those 10.
Edit: Alt-Ergo 2.4.1 is used:
$ why3 config detect
Found prover Alt-Ergo version 2.4.1, OK.
1 prover(s) added
Save config to /home/user/.why3.conf
First, a small note: if you're using WP, it is also important to state which solvers you have configured: each of them has strengths and weaknesses, so that it might be easier to complete a proof with the appropriate solver. In particular, my answer is based on the use of Z3 (4.8.14), known as z3-ce by
why3 config detect (note that you have to run this command each time you change the set of solvers you use).
EDIT As mentioned in comments below, the Mod-Mask tactic is not available in Frama-C 24.0, but only in the development version (https://git.frama-c.com/pub/frama-c). As far as I can tell, for a Frama-C 24.0 solution, you need to resort to a Coq script as mentioned at the end of this answer.
Z3 alone is not sufficient to complete the proof, but you can use a WP tactic (see section 2.2 of the WP manual about the interactive proof editor in Frama-C's GUI). Namely, if you select land(7, to_sint32(x)) in the proof editor, the tactics panel will show you a Mod-Mask tactic, which converts bitmasks to modulo operations and vice-versa (see image below). If you apply it, Z3 will complete the two resulting proof obligations, completing the proof of the assertion.
After that, you can save the script in order to be able to replay it later: use e.g. -wp-prover script,z3-ce,alt-ergo to let WP take advantage of existing scripts in addition to automated solvers. The scripts are searched for (and saved to) the script subdirectory of the WP session directory, which defaults to ./.frama-c/wp and can be set with -wp-session.
Another possibility is to use a Coq script to complete the proof. This supposes you have Coq, CoqIDE and the Why3 Coq libraries installed (they are available as opam packages coq, coqide and why3-coq respectively). Once this is done and you have configured Why3 to use Coq (why3 config detect should tell you that it has found Coq), you can use it through the GUI of Frama-C to complete proofs that the built-in interactive prover can't take care of.
For that, you may need to configure Frama-C to display a Coq column in the WP Goals panel: click on the Provers button in this panel, and make sure that Coq is ON, as shown below:
Once this is done, you can double-click in the cell of this column that corresponds to the proof obligation you want to discharge with Coq. This will open a CoqIDE session where you have to complete the proof script at then end of the opened file. Once this is done, save the file and quit CoqIDE. The Coq script will then be saved as part of the WP session, and can be played again if coq is among the arguments given to -wp-prover. For what it's worth, the following script seems to do the trick (Coq 8.13.2, Why3 1.4.1, and of course Frama-C 24.0):
intros alloc mem_int b buf state0 state1 x a1 hpos hval hbmax hbmax1 htyp hlink huint1 huint2 huint3 hvalid htyp1 htyp2 hdef.
assert (0<=land 7 (to_sint32 x) <= 7)%Z.
apply uint_land_range; auto with zarith.
unfold valid_rd.
unfold valid_rd in hvalid.
unfold a1.
unfold shift; simpl.
unfold shift in hvalid; simpl in hvalid.
assert (to_sint32 (land 7 (to_sint32 x)) = land 7 (to_sint32 x))%Z.
- apply id_sint32; unfold is_sint32; auto with zarith.
- intros _; repeat split; auto with zarith.

Solving a simple puzzle in prolog

I solved a puzzle in C and tried to do the same in Prolog but i'm having some trouble expressing the facts and goals in this language.
The very simplified version of the problem is this: there's two levers in a room. Each lever control a mechanism that can move either forward or backward in four different positions (which i noted 0, 1, 2 or 3). If you move a mechanism four times in the same direction, it'll be in the same position as before.
The lever n°1 move the mechanism n°1 two positions forward.
The lever n°2 move the mechanism n°2 one position forward.
Initially, the mechanism n°1 is in position 2 and the mechanism n°2 is in position 1.
The problem is to find the quickest way to move both mechanisms in position 0 and get the sequence of lever that lead to each solution.
Of course here the problem is trivial and you only need to pull the lever n°1 one time and the lever n°2 three times to have a solution.
Here's a simple code in C which gives the sequence of lever to pull to solve this problem by pulling less than 5 levers:
int pos1 = 2, pos2 = 1;
int main()
{
resolve(0,5);
return 0;
}
void lever1(){
pos1 = (pos1 + 2) % 4;
}
void undolever1(){
pos1 = (pos1 - 2) % 4;
}
void lever2(){
pos2 = (pos2 + 1) % 4;
}
void undolever2(){
pos2 = (pos2 - 1) % 4;
}
void resolve(l, k){
if(k == 0){
return;
}
if(pos1 == 0 && pos2 == 0){
printf("Solution: %d\n", l);
return;
}
if(k>0){
k--;
lever1();
resolve(l*10+1,k);
undolever1();
lever2();
resolve(l*10+2,k);
undolever2();
}
}
My code in Prolog looks like this so far:
lever(l1).
lever(l2).
mechanism(m1).
mechanism(m2).
position(m1,2).
position(m2,1).
pullL1() :- position(m1, mod(position(m1,X)+2,4)).
pullL2() :- position(m2, mod(position(m2,X)+1,4)).
solve(k) :- solve_(k, []).
solve_(0, r) :- !, postion(m1, p1), postion(m2, p2), p1 == 0, p2 == 0.
solve_(k, r) :- k > 0, pullL1(), k1 is k - 1, append(r, [1], r1), solve_(k1, r1).
solve_(k, r) :- k > 0, pullL2(), k1 is k - 1, append(r, [2], r2), solve_(k1, r2).
I'm pretty sure there's multiple problems in this code but I'm not sure how to fix it.
Any help would be really appreciated.
I think this is a very interesting Problem. I suppose you want a general solution -> one lever can move multiple mechanisms. In the case the problem is like yours, where one lever only controls one mechanism the solution is trivial. You just move every lever for the amount of time until the mechanism is at state zero.
But I want to provide a more general solution, so where one lever can move multiple mechanisms. But first a little bit math. Don't worry i'll end up doing an example too.
Lets define
as being n levers and
being m mechanisms. Then lets define every lever by a vector:
where is the amount of steps moves forward.
For the mechanisms we define:
beeing the bias of the mechanisms -> so is in initial state and
being the amount of states for every mechanism. So now we can describe our whole system like this:
wher is the amount of times we have to activate . If we want to set all mechanisms to zero. If you are not familiar with the notation this just means that a%m = b%m.
can we rewritten as:
where k can be any natural number. So we can rewrite our system to an equation system:
prolog can solve for us such a equation system.
(there are different solutions to solve diophantine equation systems look at https://en.wikipedia.org/wiki/Diophantine_equation)
Ok now lets make an example: let say we have two levers and three mechanisms with 4 states. The fist lever moves M1 one forward and M3 two forward. The second lever moves M2 one forward and M3 one forward. M1 is in State 2. M2 is in State 3. M3 is in State 3. So our equation system looks like this:
in prolog we can solve this with the clpfd libary.
?- [library(clpfd)].
and then solve like this:
?- X1+(-4)*K1+2 #= 0, 1*X2+(-4)*K2+3 #= 0, 2*X1+X2+(-4)*K3+3 #= 0,Vs = [X1,X2], Vs ins 0..100,label(Vs).
which gives us the solution
Vs = [2, 1]
-> so X1 = 2 and X2 = 1 which is correct. Prolog can give you more solutions.

How to optimize valve simulation logic? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
This is an simple logic-programming and optimisation etude, that I've created for myself, and somewhat stumbled upon it.
I have a numerical simulation of simple scheme. Consider some reservoir (or a capacitor) Cm, which is constantly pumping up with pressure. Lets call its current state Vm:
At its output it has a valve, or a gate, G, that could be open or closed, according to following logic:
Gate opens, when pressure(or voltage) Vm exceeds some threshold, call it Vopen: Vm > Vopen
Gate remains open, while current outrush Ia is greater than some Ihold: Ia > IHold
Gate conducts power only out of reservoir (like a diode)
I am doing numerical ODE solving of this, i.e. determining Vm and Ia at each (equal, small) timestep dt. Have three variants of this:
Variable types:
float Vm=0.0, Ia=0.0, Vopen, Ihold, Ra, dt;
int G=0;
Loop body v1 (serial):
Vm = Vm + (-Ia*G)*dt;
G |= (Vm > Vopen);
Ia = Ia + (Vm*Ra*G)*dt;
G &= (Ia > Ihold);
Loop body v2 (serial, with temp var, ternary conditionals):
int Gv; // temporary var
Vm = Vm + (-Ia*G)*dt;
Gv = (Vm > Vopen) ? 1 : G;
Ia = Ia + (Vm*Ra*Gv)*dt;
G = (Ia > Ihold) ? Gv : 0;
Loop body v3 (parallel, with cache):
// cache new state first
float Vm1 = Vm + (-Ia*G)*dt;
float Ia1 = Ia + (Vm*Ra*G)*dt;
// update memory
G = ( Vm1 > Vopen ) || (( Ia1 > Ihold ) && G);
Vm = Vm1;
Ia = Ia1; // not nesesary to cache, done for readibility
the G was taken from building up the following truth table, plus imagination:
Q:
Which is correct? Are they?
How third variant(parallel logic) differs from first two(serial logic)?
Are there more effective ways of doing this logic?
PS. I am trying to optimize it for SSE, and then (separately) for OpenCL (if that gives optimisation hints)
PPS. For those who is curious, here is my working simulator, involving this gate (html/js)
By the overall description they are the same and should all fulfil your needs.
Your serial code will produce half-steps. That means if you break it down to the discrete description V(t) can be described as being 1/2 dt in front of I(t). The fist one will keep G changing in every half-step and the second one will synchronise it to I. But since you are not evaluating G in between in doesn't really matter. There is also no real problem with V and I being a half step apart but you should keep it in mind but maybe you should use for plotting/evaluation the vector {V(t),(I(t-1)+I(t))/2,G(t)}.
The parallel code will keep them all in the same time step.
For your pure linear problem the direct integration is a good solution. Higher order ode solver won't buy you anything. State-space representation for pure linear systems only write the same direct integration in a different way.
For SIMD optimisation there isn't munch to do. You need to evaluate step-wise since you are updating I by V and V by I. That means you can't run the steps in parallel which makes many interesting optimisation not possible.

Weird angle results when drone is flying

We use the L3GD20 gyro sensor and the LSM303DLHC accelerometer sensor in combination with the complementary filter to measure the angles of the drone.
If we simulate the angles of the drone with our hands, for example, if we tilt the drone forward, our x angle is positive. And if we tilt it backwards, our x angle is negative.
But if we activate our motors, the drone will always go to a negative x-angle. On top of that, the x-angle that used to be positive is now negative. Because the drone tries to compensate this angle, but the angles are inverted, the drone will never go back to its original state.
#define RAD_TO_DEG 57.29578
#define G_GAIN 0.070
#define AA 0.98
#define LOOP_TIME 0.02
void calculate_complementaryfilter(float* complementary_angles)
{
complementary_angles[X] = AA * (complementary_angles[X] + gyro_rates[X] * LOOP_TIME) + (1 - AA) * acc_angles[X];
complementary_angles[Y] = AA * (complementary_angles[Y] + gyro_rates[Y] * LOOP_TIME) + (1 - AA) * acc_angles[Y];
}
void convert_accelerometer_data_to_deg()
{
acc_angles[X] = (float) atan2(acc_raw[X], (sqrt(acc_raw[Y] * acc_raw[Y] + acc_raw[Z] * acc_raw[Z]))) * RAD_TO_DEG;
acc_angles[Y] = (float) atan2(acc_raw[Y], (sqrt(acc_raw[X] * acc_raw[X] + acc_raw[Z] * acc_raw[Z]))) * RAD_TO_DEG;
}
void convert_gyro_data_to_dps()
{
gyro_rates[X] = (float)gyr_raw[X] * G_GAIN;
gyro_rates[Y] = (float)gyr_raw[Y] * G_GAIN;
gyro_rates[Z] = (float)gyr_raw[Z] * G_GAIN;
}
The problem isn't the shaking of the drone. If we put the motors on max speed and simulate the angles by hand, we get the right angles. Thus also the right compensation by the motors.
If we need to add more code, just ask.
Thankyou in advance.
Standard exhaustive methodology for this kind of problems
You can go top-bottom or bottom-top. In this case, I'm more inclined to think in a hardware related problem, but it is up to you:
Power related problem
When you take the drone with your hand and run motors at full throttle, do they have propellers installed?
Motors at full speed without propellers drawn only a fraction of their full load. When lifting drone weight, a voltage drop can cause electronic malfunction.
Alternative cause: shortcircuits/derivations?
Mechanical problem (a.k.a vibrations spoil sensor readings)
In the past, I've seen MEMS sensors suffer a lot under heavy vibrations (amplitude +-4g). And with "suffer a lot" I mean accelerometer not even registering gravity and gyros returning meaningless data.
If vibrations are significative, you need either a better frame for the drone or a better vibration isolation for the sensors.
SW issues (data/algorithm/implementation)
If it is definitely unrelated with power and mechanical aspects, you can log raw sensor data and process it offline to see if it makes sense.
For this, you need an implementation of the same algorithm embedded in the drone.
Here you will be able to discern between:
Buggy/wrong embedded implementation.
Algorithm fails under working conditions.
Other: data looks wrong (problem reading), SW not reaching time cycle constraints.

ARM: Start/Wakeup/Bringup the other CPU cores/APs and pass execution start address?

I've been banging my head with this for the last 3-4 days and I can't find a DECENT explanatory documentation (from ARM or unofficial) to help me.
I've got an ODROID-XU board (big.LITTLE 2 x Cortex-A15 + 2 x Cortex-A7) board and I'm trying to understand a bit more about the ARM architecture. In my "experimenting" code I've now arrived at the stage where I want to WAKE UP THE OTHER CORES FROM THEIR WFI (wait-for-interrupt) state.
The missing information I'm still trying to find is:
1. When getting the base address of the memory-mapped GIC I understand that I need to read CBAR; But no piece of documentation explains how the bits in CBAR (the 2 PERIPHBASE values) should be arranged to get to the final GIC base address
2. When sending an SGI through the GICD_SGIR register, what interrupt ID between 0 and 15 should I choose? Does it matter?
3. When sending an SGI through the GICD_SGIR register, how can I tell the other cores WHERE TO START EXECUTION FROM?
4. How does the fact that my code is loaded by the U-BOOT bootloader affect this context?
The Cortex-A Series Programmer's Guide v3.0 (found here: link) states the following in section 22.5.2 (SMP boot in Linux, page 271):
While the primary core is booting, the secondary cores will be held in a standby state, using the
WFI instruction. It (the primary core) will provide a startup address to the secondary cores and wake them using an
Inter-Processor Interrupt(IPI), meaning an SGI signalled through the GIC
How does Linux do that? The documentation-S don't give any other details regarding "It will provide a startup address to the secondary cores".
My frustration is growing and I'd be very grateful for answers.
Thank you very much in advance!
EXTRA DETAILS
Documentation I use:
ARMv7-A&R Architecture Reference Manual
Cortex-A15 TRM (Technical Reference Manual)
Cortex-A15 MPCore TRM
Cortex-A Series Programmer's Guide v3.0
GICv2 Architecture Specification
What I've done by now:
UBOOT loads me at 0x40008000; I've set-up Translation Tables (TTBs), written TTBR0 and TTBCR accordingly and mapped 0x40008000 to 0x8000_0000 (2GB), so I also enabled the MMU
Set-up exception handlers of my own
I've got Printf functionality over the serial (UART2 on ODROID-XU)
All the above seems to work properly.
What I'm trying to do now:
Get the GIC base address => at the moment I read CBAR and I simply AND (&) its value with 0xFFFF8000 and use this as the GIC base address, although I'm almost sure this ain't right
Enable the GIC distributor (at offset 0x1000 from GIC base address?), by writting GICD_CTLR with the value 0x1
Construct an SGI with the following params: Group = 0, ID = 0, TargetListFilter = "All CPUs Except Me" and send it (write it) through the GICD_SGIR GIC register
Since I haven't passed any execution start address for the other cores, nothing happens after all this
....UPDATE....
I've started looking at the Linux kernel and QEMU source codes in search for an answer. Here's what I found out (please correct me if I'm wrong):
When powering up the board ALL THE CORES start executing from the reset vector
A software (firmware) component executes WFI on the secondary cores and some other code that will act as a protocol between these secondary cores and the primary core, when the latter wants to wake them up again
For example, the protocol used on the EnergyCore ECX-1000 (Highbank) board is as follows:
**(1)** the secondary cores enter WFI and when
**(2)** the primary core sends an SGI to wake them up
**(3)** they check if the value at address (0x40 + 0x10 * coreid) is non-null;
**(4)** if it is non-null, they use it as an address to jump to (execute a BX)
**(5)** otherwise, they re-enter standby state, by re-executing WFI
**(6)** So, if I had an EnergyCore ECX-1000 board, I should write (0x40 + 0x10 * coreid) with the address I want each of the cores to jump to and send an SGI
Questions:
1. What is the software component that does this? Is it the BL1 binary I've written on the SD Card, or is it U-BOOT?
2. From what I understand, this software protocol differs from board to board. Is it so, or does it only depend on the underlying processor?
3. Where can I find information about this protocol for a pick-one ARM board? - can I find it on the official ARM website or on the board webpage?
Ok, I'm back baby. Here are the conclusions:
The software component that puts the CPUs to sleep is the bootloader (in my case U-Boot)
Linux somehow knows how the bootloader does this (hardcoded in the Linux kernel for each board) and knows how to wake them up again
For my ODROID-XU board the sources describing this process are UBOOT ODROID-v2012.07 and the linux kernel found here: LINUX ODROIDXU-3.4.y (it would have been better if I looked into kernel version from the branch odroid-3.12.y since the former doesn't start all of the 8 processors, just 4 of them but the latter does).
Anyway, here's the source code I've come up with, I'll post the relevant source files from the above source code trees that helped me writing this code afterwards:
typedef unsigned int DWORD;
typedef unsigned char BOOLEAN;
#define FAILURE (0)
#define SUCCESS (1)
#define NR_EXTRA_CPUS (3) // actually 7, but this kernel version can't wake them up all -> check kernel version 3.12 if you need this
// Hardcoded in the kernel and in U-Boot; here I've put the physical addresses for ease
// In my code (and in the linux kernel) these addresses are actually virtual
// (thus the 'VA' part in S5P_VA_...); note: mapped with memory type DEVICE
#define S5P_VA_CHIPID (0x10000000)
#define S5P_VA_SYSRAM_NS (0x02073000)
#define S5P_VA_PMU (0x10040000)
#define EXYNOS_SWRESET ((DWORD) S5P_VA_PMU + 0x0400)
// Other hardcoded values
#define EXYNOS5410_REV_1_0 (0x10)
#define EXYNOS_CORE_LOCAL_PWR_EN (0x3)
BOOLEAN BootAllSecondaryCPUs(void* CPUExecutionAddress){
// 1. Get bootBase (the address where we need to write the address where the woken CPUs will jump to)
// and powerBase (we also need to power up the cpus before waking them up (?))
DWORD bootBase, powerBase, powerOffset, clusterID;
asm volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r" (clusterID));
clusterID = (clusterID >> 8);
powerOffset = 0;
if( (*(DWORD*)S5P_VA_CHIPID & 0xFF) < EXYNOS5410_REV_1_0 )
{
if( (clusterID & 0x1) == 0 ) powerOffset = 4;
}
else if( (clusterID & 0x1) != 0 ) powerOffset = 4;
bootBase = S5P_VA_SYSRAM_NS + 0x1C;
powerBase = (S5P_VA_PMU + 0x2000) + (powerOffset * 0x80);
// 2. Power up each CPU, write bootBase and send a SEV (they are in WFE [wait-for-event] standby state)
for (i = 1; i <= NR_EXTRA_CPUS; i++)
{
// 2.1 Power up this CPU
powerBase += 0x80;
DWORD powerStatus = *(DWORD*)( (DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == 0)
{
*(DWORD*) powerBase = EXYNOS_CORE_LOCAL_PWR_EN;
for (i = 0; i < 10; i++) // 10 millis timeout
{
powerStatus = *(DWORD*)((DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == EXYNOS_CORE_LOCAL_PWR_EN)
break;
DelayMilliseconds(1); // not implemented here, if you need this, post a comment request
}
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) != EXYNOS_CORE_LOCAL_PWR_EN)
return FAILURE;
}
if ( (clusterID & 0x0F) != 0 )
{
if ( *(DWORD*)(S5P_VA_PMU + 0x0908) == 0 )
do { DelayMicroseconds(10); } // not implemented here, if you need this, post a comment request
while (*(DWORD*)(S5P_VA_PMU + 0x0908) == 0);
*(DWORD*) EXYNOS_SWRESET = (DWORD)(((1 << 20) | (1 << 8)) << i);
}
// 2.2 Write bootBase and execute a SEV to finally wake up the CPUs
asm volatile ("dmb" : : : "memory");
*(DWORD*) bootBase = (DWORD) CPUExecutionAddress;
asm volatile ("isb");
asm volatile ("\n dsb\n sev\n nop\n");
}
return SUCCESS;
}
This successfully wakes 3 of 7 of the secondary CPUs.
And now for that short list of relevant source files in u-boot and the linux kernel:
UBOOT: lowlevel_init.S - notice lines 363-369, how the secondary CPUs wait in a WFE for the value at _hotplug_addr to be non-zeroed and to jump to it; _hotplug_addr is actually bootBase in the above code; also lines 282-285 tell us that _hotplug_addr is to be relocated at CONFIG_PHY_IRAM_NS_BASE + _hotplug_addr - nscode_base (_hotplug_addr - nscode_base is 0x1C and CONFIG_PHY_IRAM_NS_BASE is 0x02073000, thus the above hardcodings in the linux kernel)
LINUX KERNEL: generic - smp.c (look at function __cpu_up), platform specific (odroid-xu): platsmp.c (function boot_secondary, called by generic __cpu_up; also look at platform_smp_prepare_cpus [at the bottom] => that's the function that actually sets the boot base and power base values)
For clarity and future reference, there's a subtle piece of information missing here thanks to the lack of proper documentation of the Exynos boot protocol (n.b. this question should really be marked "Exynos 5" rather than "Cortex-A15" - it's a SoC-specific thing and what ARM says is only a general recommendation). From cold boot, the secondary cores aren't in WFI, they're still powered off.
The simpler minimal solution (based on what Linux's hotplug does), which I worked out in the process of writing a boot shim to get a hypervisor running on the XU, takes two steps:
First write the entry point address to the Exynos holding pen (0x02073000 + 0x1c)
Then poke the power controller to switch on the relevant core(s): That way, they drop out of the secure boot path into the holding pen to find the entry point waiting for them, skipping the WFI loop and obviating the need to even touch the GIC at all.
Unless you're planning a full-on CPU hotplug implementation you can skip checking the cluster ID - if we're booting, we're on cluster 0 and nowhere else (the check for pre-production chips with backwards cluster registers should be unnecessary on the Odroid too - certainly was for me).
From my investigation, firing up the A7s is a little more involved. Judging from the Exynos big.LITTLE switcher driver, it seems you need to poke a separate set of power controller registers to enable cluster 1 first (and you may need to mess around with the CCI too, especially to have the MMUs and caches on) - I didn't get further since by that point it was more "having fun" than "doing real work"...
As an aside, Samsung's mainline patch for CPU hotplug on the 5410 makes the core power control stuff rather clearer than the mess in their downstream code, IMO.
QEMU uses PSCI
The ARM Power State Coordination Interface (PSCI) is documented at: https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document and controls things such as powering on and off of cores.
TL;DR this is the aarch64 snippet to wake up CPU 1 on QEMU v3.0.0 ARMv8 aarch64:
/* PSCI function identifier: CPU_ON. */
ldr w0, =0xc4000003
/* Argument 1: target_cpu */
mov x1, 1
/* Argument 2: entry_point_address */
ldr x2, =cpu1_entry_address
/* Argument 3: context_id */
mov x3, 0
/* Unused hvc args: the Linux kernel zeroes them,
* but I don't think it is required.
*/
hvc 0
and for ARMv7:
ldr r0, =0x84000003
mov r1, #1
ldr r2, =cpu1_entry_address
mov r3, #0
hvc 0
A full runnable example with a spinlock is available on the ARM section of this answer: What does multicore assembly language look like?
The hvc instruction then gets handled by an EL2 handler, see also: the ARM section of: What are Ring 0 and Ring 3 in the context of operating systems?
Linux kernel
In Linux v4.19, that address is informed to the Linux kernel through the device tree, QEMU for example auto-generates an entry of form:
psci {
method = "hvc";
compatible = "arm,psci-0.2", "arm,psci";
cpu_on = <0xc4000003>;
migrate = <0xc4000005>;
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
};
The hvc instruction is called from: https://github.com/torvalds/linux/blob/v4.19/drivers/firmware/psci.c#L178
static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
which ends up going to: https://github.com/torvalds/linux/blob/v4.19/arch/arm64/kernel/smccc-call.S#L51
Go to www.arm.com and download there evaluation copy of DS-5 developement suite. Once installed, under the examples there will be a startup_Cortex-A15MPCore directory. Look at startup.s.

Resources