I'm trying to erase a NOR Flash memory with the Linux MTD driver in C...
I'm confused about the return status of the ioctl(MEMUNLOCK) call, which returns an error even though the ioctl(MEMERASE) that follows it succeeds.
The following code displays the warning message but works (i.e. the Flash block has been erased):
int erase_MTD_Pages(int fd, size_t size, off_t offset)
{
    mtd_info_t mtd_info;
    erase_info_t ei;
    ioctl(fd, MEMGETINFO, &mtd_info);
    ei.length = mtd_info.erasesize;
    for (ei.start = offset; ei.start < (offset + size); ei.start += mtd_info.erasesize) {
        if (ioctl(fd, MEMUNLOCK, &ei) < 0)
        {
            // logPrintf(FAILURE, "[Flash] Can not unlock MTD (MEMUNLOCK, errno=%d)!\n", errno);
            // return RETURN_FILE_ERROR;
            logPrintf(WARNING, "[Flash] Can not unlock MTD (MEMUNLOCK, errno=%d)!\n", errno);
        }
        if (ioctl(fd, MEMERASE, &ei) < 0)
        {
            logPrintf(FAILURE, "[Flash] Can not erase MTD (MEMERASE, errno=%d)!\n", errno);
            return RETURN_FILE_ERROR;
        }
    }
    return RETURN_SUCCESS;
}
When I look at C code on the net, the return status from MEMUNLOCK is not always checked (e.g. in mtc.c):
ioctl(fd, MEMUNLOCK, &mtdEraseInfo);
if (ioctl(fd, MEMERASE, &mtdEraseInfo)) {
    fprintf(stderr, "Could not erase MTD device: %s\n", mtd);
    close(fd);
    exit(1);
}
flash_unlock also returns an error:
root $ cat /proc/mtd
dev: size erasesize name
mtd0: 00020000 00020000 "X-Loader-NOR"
mtd1: 000a0000 00020000 "U-Boot-NOR"
mtd2: 00040000 00020000 "Boot Env-NOR"
mtd3: 00400000 00020000 "Kernel-NOR"
mtd4: 03b00000 00020000 "File System-NOR"
root $ mtd_debug info /dev/mtd3
mtd.type = MTD_NORFLASH
mtd.flags = MTD_CAP_NORFLASH
mtd.size = 4194304 (4M)
mtd.erasesize = 131072 (128K)
mtd.writesize = 1
mtd.oobsize = 0
regions = 0
root $ flash_unlock /dev/mtd3
Could not unlock MTD device: /dev/mtd3
Am I missing something? Is it normal to get an error from MEMUNLOCK with some configurations?
Notes / Environment:
The read-only flag (MTD_WRITEABLE) is not set on the mtd3 partition (only on mtd0 and mtd1).
flash_lock also returns the same error.
TI AM3505 (ARM Cortex A8, OMAP34).
Linux 2.6.37.
Flash NOR Spansion S29GL512S12DHIV1.
Kernel log:
mtdoops: mtd device (mtddev=name/number) must be supplied
physmap platform flash device: 08000000 at 08000000
physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank. Manufacturer ID 0x000001 Chip ID 0x002301
Amd/Fujitsu Extended Query Table at 0x0040
Amd/Fujitsu Extended Query version 1.5.
Silicon revision: 14
Address sensitive unlock: Required
Erase Suspend: Read/write
Block protection: 1 sectors per group
Temporary block unprotect: Not supported
Block protect/unprotect scheme: 8
Number of simultaneous operations: 0
Burst mode: Not supported
Page mode: 12 word page
Vpp Supply Minimum Program/Erase Voltage: 0.0 V
Vpp Supply Maximum Program/Erase Voltage: 0.0 V
Top/Bottom Boot Block: Uniform, Top WP
number of CFI chips: 1
RedBoot partition parsing not available
Using physmap partition information
Creating 5 MTD partitions on "physmap-flash.0":
0x000000000000-0x000000020000 : "X-Loader-NOR"
0x000000020000-0x0000000c0000 : "U-Boot-NOR"
0x0000000c0000-0x000000100000 : "Boot Env-NOR"
0x000000100000-0x000000500000 : "Kernel-NOR"
0x000000500000-0x000004000000 : "File System-NOR"
For a flash chip that I worked on (drivers/mtd/devices/m25p80.c), I found that UNLOCK was not implemented. The driver's ioctl(UNLOCK) returned -EOPNOTSUPP=95. And code inspection showed mtd_unlock return status being dropped on the floor, as you have found.
These imply assumptions in the m25p80 driver that flash will just never be locked, and in the mtd drivers that it's OK for the device driver to omit UNLOCK. On the board I worked on, flash was being locked by U-Boot after every write, so erase and reprogram from Linux didn't work at all. I looked at the U-Boot driver and the device datasheet and wrote code to implement m25p80_lock and m25p80_unlock; it was not too difficult once I knew what was going on. I did not upstream it.
It does seem like a defect for chip drivers to not implement these.
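As a hedged illustration of the point above: if a chip driver does not implement unlocking and MEMUNLOCK fails with EOPNOTSUPP, a caller may choose to log that and still attempt the erase, while treating any other errno as a real failure. The helper name `unlock_error_is_fatal` is hypothetical and not part of the MTD API, and whether EOPNOTSUPP is what a given driver reports is an assumption to verify by printing errno.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the MTD API): decide whether the errno
 * left by a failed ioctl(fd, MEMUNLOCK, &ei) should abort the erase.
 * Assumption: EOPNOTSUPP means the chip driver has no unlock operation,
 * so there is nothing to unlock and MEMERASE can still be attempted.
 */
static bool unlock_error_is_fatal(int err)
{
    return err != EOPNOTSUPP;
}
```

In the loop from the question, the MEMUNLOCK branch could return RETURN_FILE_ERROR only when `unlock_error_is_fatal(errno)` is true and log a warning otherwise. If the block really is locked, the following MEMERASE is still checked and reports the failure.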
By the way Mousstix, very nice job providing full information in this question.
On newer kernels (tested on 4.1.18) there is a device-tree option named "use-advanced-sector-protection;". When this is set, I was able to erase and write to protected flash regions.
It is also documented in the kernel tree: Documentation/devicetree/bindings/mtd/mtd-physmap.txt
I have a similar issue to the one mentioned here, but with a different behavior.
We have an FPGA (from Altera) that acts as a 32KB memory on the PCIe bus of an IMX8M-Plus CPU (ARM Cortex-A53).
I wrote a simple driver to access the FPGA's memory. As you can see from the lspci output below, the 32KB memory is mapped to Region 4 (BAR4), and I use avalon_ioctl_set_operation() and avalon_ioctl_get_operation() to set and get the BAR content.
static int avalon_ioctl_get_operation(unsigned long arg, uint8_t op_size)
{
    struct avalon_pcie_operation o;
    if (copy_from_user(&o, (void __user*)arg, sizeof(o)))
        return -EFAULT;
    if (o.bar >= PCI_SRIOV_NUM_BARS)
        return -EFAULT;
    if (!io_dev.bar_addrs[o.bar])
        return -ENOMEM;
    if (!IS_ALIGNED(o.offset, op_size))
        return -EFAULT;
    switch (op_size)
    {
    case 1:
        o.data8 = ioread8(io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 2:
        o.data16 = ioread16(io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 4:
        o.data32 = ioread32(io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 8:
        o.data64 = ioread64(io_dev.bar_addrs[o.bar] + o.offset);
        break;
    default:
        return -EFAULT;
    }
    if (copy_to_user((void __user *)arg, &o, sizeof(o)))
        return -EFAULT;
    return 0;
}
static int avalon_ioctl_set_operation(unsigned long arg, uint8_t op_size)
{
    struct avalon_pcie_operation o;
    if (copy_from_user(&o, (void __user*)arg, sizeof(o)))
        return -EFAULT;
    if (o.bar >= PCI_SRIOV_NUM_BARS)
        return -EFAULT;
    if (!io_dev.bar_addrs[o.bar])
        return -ENOMEM;
    if (!IS_ALIGNED(o.offset, op_size))
        return -EFAULT;
    switch (op_size)
    {
    case 1:
        iowrite8(o.data8, io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 2:
        iowrite16(o.data16, io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 4:
        iowrite32(o.data32, io_dev.bar_addrs[o.bar] + o.offset);
        break;
    case 8:
        iowrite64(o.data64, io_dev.bar_addrs[o.bar] + o.offset);
        break;
    default:
        return -EFAULT;
    }
    return 0;
}
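The two handlers above check the alignment of the offset but not whether the access stays inside the BAR. This is probably unrelated to the 64-bit problem, but a range check is a cheap safeguard. Below is a minimal sketch in plain C; the function name and the `bar_len` parameter are hypothetical, and in a driver the length would have to come from somewhere such as pci_resource_len() for the BAR in question.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical range check (names are illustrative, not from the driver):
 * returns true when an access of op_size bytes starting at offset lies
 * entirely inside a BAR that is bar_len bytes long. The comparison is
 * arranged so that offset + op_size is never computed and cannot overflow.
 */
static bool bar_access_in_range(uint64_t offset, uint64_t op_size, uint64_t bar_len)
{
    return op_size != 0 && op_size <= bar_len && offset <= bar_len - op_size;
}
```

For the 32K BAR4 in this question, an 8-byte access at offset 32760 is the last one that fits, and an 8-byte access at offset 32764 is rejected.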
For my testing, I used BAR4 and offset 0. The 8-, 16- and 32-bit variants of read and write all work fine: I can read back whatever I write.
But when I use iowrite64() and ioread64() at offset 0, I read garbage data (0xFFFFFFFFFFFFFFFF), the PCIe config space is altered, and the PCIe device stops functioning (the lspci output in the altered state is at the bottom). This happens immediately after the ioread64() call.
I stepped into ioread64() and saw that it ends up in __raw_readq(), which is defined as
static inline u64 __raw_readq(const volatile void __iomem *addr)
{
    u64 val;
    asm volatile(ALTERNATIVE("ldr %0, [%1]",
                             "ldar %0, [%1]",
                             ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE)
                 : "=r" (val) : "r" (addr));
    return val;
}
We use Linux v5.4 on aarch64, so I believe that 64-bit accesses to the bus should be fine.
uname -a
Linux 5.4.193-0+git.a301219f58a2 #1 SMP PREEMPT Thu Dec 15 14:09:11 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Here is the lspci -vv output at the beginning of the test:
00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 218
Region 0: Memory at 18000000 (32-bit, non-prefetchable) [size=1M]
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: [disabled]
Memory behind bridge: [disabled]
Prefetchable memory behind bridge: 18100000-181fffff [size=1M]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
Expansion ROM at 18200000 [virtual] [disabled] [size=64K]
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Address: bc022000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <8us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt+
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP+, LTR-
10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd-
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn+ NFERptEn+ FERptEn+
RootSta: CERcvd+ MultCERcvd+ UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [148 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [158 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=10us
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: pcieport
01:00.0 RAM memory: Altera Corporation Device e001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 219
Region 0: Memory at 18108000 (64-bit, prefetchable) [size=512]
Region 4: Memory at 18100000 (64-bit, prefetchable) [size=32K]
Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
Address: 00000000bc022000 Data: 0001
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 8GT/s, Width x2, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Kernel driver in use: avalon-dma
Kernel modules: avalon_drv
Corrupted state after ioread64():
00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 218
Memory at 18000000 (32-bit, non-prefetchable) [size=1M]
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: [disabled]
Memory behind bridge: [disabled]
Prefetchable memory behind bridge: 18100000-181fffff [size=1M]
Expansion ROM at 18200000 [virtual] [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Secondary PCI Express
Capabilities: [158] L1 PM Substates
Kernel driver in use: pcieport
01:00.0 RAM memory: Altera Corporation Device e001 (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: avalon-dma
Kernel modules: avalon_drv
Do you know of any restriction that prevents 64-bit accesses to the PCIe bus on a 64-bit CPU with a 64-bit OS? Or is there a special procedure that I need to follow to use ioread64()?
Explanation for @0andriy's comment: I went through the iMX8M-Plus reference manual, and it says that it supports 32- and 64-bit PCI Express addresses and also 64-bit MSI at the hardware level, but I'm not sure whether this is supported at the Linux driver level. Do you know how I can be sure about it? As for your question, the reason I chose the ioreadXX/iowriteXX APIs in the first place is that I have an example Avalon DMA driver that accesses the DMA registers in the FPGA, which are mapped to BAR0, using ioread32/iowrite32. DMA and all MSI interrupts are working fine. Basically I followed the same approach. The only difference is that my custom driver uses ioread64/iowrite64, which the Avalon DMA driver does not, and that is where I have the problem I am trying to figure out.
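One experiment that may help narrow this down is to replace the single 64-bit access with two 32-bit accesses, since the 32-bit path is reported to work. The sketch below is a plain C user-space model of that split (low word first), not driver code. `read64_lo_hi` and `write64_lo_hi` are hypothetical names, and the low word is assumed to sit at the lower address. Whether the FPGA logic behind BAR4 really rejects 8-byte requests is only an assumption that this experiment would test.

```c
#include <stdint.h>

/* Read a 64-bit value as two 32-bit accesses, low word first. */
static uint64_t read64_lo_hi(const volatile uint32_t *addr)
{
    uint32_t lo = addr[0];
    uint32_t hi = addr[1];
    return ((uint64_t)hi << 32) | lo;
}

/* Write a 64-bit value as two 32-bit accesses, low word first. */
static void write64_lo_hi(volatile uint32_t *addr, uint64_t val)
{
    addr[0] = (uint32_t)val;
    addr[1] = (uint32_t)(val >> 32);
}
```

In the kernel, helpers with this behaviour are declared in <linux/io-64-nonatomic-lo-hi.h> (for example lo_hi_readq() and lo_hi_writeq()). If the split access works where ioread64() does not, that points at how the endpoint handles 8-byte requests rather than at the CPU or the OS.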
Situation : board with an Arm CPU that has Nand flash next to it. On power-up, U-boot bootloader starts up and copies the flash contents to RAM, then it transfers control to that code in RAM. A Linux system with some application code, composed through Buildroot, starts running. Its entire filesystem is stored as a single UBIFS file in flash, and it starts using that.
When a certain byte is set, the bootloader keeps in control, and starts a TFTP transfer to download and store a new flash image.
Trigger : a board came back defective. Linux kernel startup clearly shows the issue:
[ 1.931150] Creating 8 MTD partitions on "atmel_nand":
[ 1.936285] 0x000000000000-0x000000040000 : "at91bootstrap"
[ 1.945280] 0x000000040000-0x0000000c0000 : "bootloader"
[ 1.954065] 0x0000000c0000-0x000000100000 : "bootloader env"
[ 1.963262] 0x000000100000-0x000000140000 : "bootloader redundant env"
[ 1.973221] 0x000000140000-0x000000180000 : "spare"
[ 1.981552] 0x000000180000-0x000000200000 : "device tree"
[ 1.990466] 0x000000200000-0x000000800000 : "kernel"
[ 1.999210] 0x000000800000-0x000010000000 : "rootfs"
...
[ 4.016251] ubi0: attached mtd7 (name "rootfs", size 248 MiB)
[ 4.022181] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 4.029040] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 4.035941] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 4.042960] ubi0: good PEBs: 1980, bad PEBs: 4, corrupted PEBs: 0
[ 4.049033] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[ 4.056359] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 861993884
[ 4.065476] ubi0: available PEBs: 0, total reserved PEBs: 1980, PEBs reserved for bad PEB handling: 36
[ 4.074898] ubi0: background thread "ubi_bgt0d" started, PID 77
...
[ 4.298009] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs", R/O mode
[ 4.306415] UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 4.316418] UBIFS (ubi0:0): FS size: 155926528 bytes (148 MiB, 1228 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
[ 4.327197] UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
[ 4.333095] UBIFS (ubi0:0): media format: w4/r0 (latest is w5/r0), UUID AE9F77DC-04AF-433F-92BC-D3375C83B518, small LPT model
[ 4.346924] VFS: Mounted root (ubifs filesystem) readonly on device 0:15.
[ 4.356186] devtmpfs: mounted
[ 4.367038] Freeing unused kernel memory: 1024K
[ 4.371812] Run /sbin/init as init process
[ 4.568143] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 83
[ 4.644809] UBIFS (ubi0:1): recovery needed
[ 4.685823] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 235:4096, read only 126976 bytes, retry
[ 4.732212] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 235:4096, read only 126976 bytes, retry
[ 4.778705] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 235:4096, read only 126976 bytes, retry
[ 4.824159] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 235:4096, read 126976 bytes
... which causes an exception, but the kernel keeps on going, then another error is detected :
[ 5.071518] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 709:4096, read only 126976 bytes, retry
[ 5.118110] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 709:4096, read only 126976 bytes, retry
[ 5.164447] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 709:4096, read only 126976 bytes, retry
[ 5.210987] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 709:4096, read 126976 bytes
... but impressively, the system still comes up alive and behaves almost fine.
Why does the kernel not mark these flash blocks as bad ? Those data can't be read anyway, and at least the next image flashing might skip the bad blocks...
Investigation : so the Kernel found a defective PEB #235 (decimal) in the "rootfs" partition of the flash. Each PEB is 128KB, so the error sits somewhere beyond byte 30,801,920 (decimal). Since the "rootfs" partition only starts from byte 0x800000 of the flash, the actual damaged page must be somewhere beyond byte 39,190,528 (decimal) or 0x2560000. And sure enough, when using the nand read utility within U-boot :
U-Boot> nand read 0x20000000 0x2560000 0x1000
NAND read: device 0 offset 0x2560000, size 0x1000
4096 bytes read: OK
U-Boot> nand read 0x20000000 0x2561000 0x1000
NAND read: device 0 offset 0x2561000, size 0x1000
4096 bytes read: OK
U-Boot> nand read 0x20000000 0x2562000 0x1000
NAND read: device 0 offset 0x2562000, size 0x1000
PMECC: Too many errors
NAND read from offset 2562000 failed -5
0 bytes read: ERROR
so the damaged page sits at offset 8K within that block of flash.
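The address arithmetic used in this investigation can be written down as a short sketch. The constants come from the kernel log above (128 KiB PEBs, "rootfs" starting at 0x800000); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Values taken from the kernel log above. */
#define PEB_SIZE     0x20000u   /* 131072 bytes (128 KiB) per PEB */
#define ROOTFS_START 0x800000u  /* start of the "rootfs" MTD partition */

/* Absolute flash offset of the first byte of a PEB in the rootfs partition. */
static uint32_t peb_to_flash_offset(uint32_t peb)
{
    return ROOTFS_START + peb * PEB_SIZE;
}
```

PEB 235 maps to 0x2560000 and PEB 709 maps to 0x60a0000, which are the two block addresses examined with nand read.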
From various other posts, I learned that nand flash with 2K pages organized in 128K blocks, has an extra 64 "Out Of Band" bytes over every 2048 payload bytes, bringing each page to a gross size of 2112 bytes. Anyway, the entire block of 128K will have to be disused, as this is the erase size. No problem, there is storage to spare, I just want to make sure that the next flashing will skip over this bad block.
Since neither the Linux kernel nor the bootloader bothered to mark the bad block, I'll do it by hand in U-boot:
U-Boot> nand markbad 2562000
block 0x02562000 successfully marked as bad
A similar investigation for the 2nd bad flash page reveals that the other error sits at flash address 0x60a1000 :
U-Boot> nand read 0 60A1000 800
NAND read: device 0 offset 0x60a1000, size 0x800
PMECC: Too many errors
NAND read from offset 60a1000 failed -5
0 bytes read: ERROR
so here too, the nand markbad utility is used to manually put a permanent mark on this block :
U-Boot> nand markbad 60a1000
block 0x060a1000 successfully marked as bad
and to verify that everything is taken into account :
U-Boot> nand bad
Device 0 bad blocks:
02560000
060a0000
Just like it should be - from the start of each 128K block, both blocks are marked.
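The list shows block start addresses rather than the page addresses that were passed to nand markbad. A minimal sketch of that rounding, assuming the 128 KiB erase block size reported in the logs (function names are hypothetical):

```c
#include <stdint.h>

#define NAND_BLOCK_SIZE 0x20000u /* 128 KiB erase block */

/* Start address of the erase block that contains addr. */
static uint32_t nand_block_base(uint32_t addr)
{
    return addr & ~(NAND_BLOCK_SIZE - 1u);
}

/* Byte offset of addr inside its erase block. */
static uint32_t nand_offset_in_block(uint32_t addr)
{
    return addr & (NAND_BLOCK_SIZE - 1u);
}
```

The failing page at 0x2562000 lies 0x2000 (8K) into the block starting at 0x2560000, and the page at 0x60a1000 lies 0x1000 into the block starting at 0x60a0000.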
Problem : so I learned that the 64 OOB bytes are divided in 2 bytes marker, 38 bytes error-correcting code, and 24 bytes journaling. Of all the OOB bytes accompanying each 2048 payload bytes, only the very first piece of 64 bytes, accompanying the first page of 2KB, lends its 2 bytes marker code to indicate the status of the entire 128KB block. These 2 bytes should be modified in the flash device itself so that this status is persistent. So in my U-boot session, instead of launching the Linux system, I restarted the CPU and remained in U-boot :
U-Boot> reset
resetting ...
RomBOOT
ba_offset = 0xc ...
AT91Bootstrap 3.6.0-00029-g0cd4e6a (Wed Nov 12 12:14:04 CET 2014)
NAND: ONFI flash detected
NAND: Manufacturer ID: 0x2c Chip ID: 0x32
NAND: Disable On-Die ECC
PMECC: page_size: 0x800, oob_size: 0x40, pmecc_cap: 0x4, sector_size: 0x200
NAND: Initialize PMECC params, cap: 0x4, sector: 0x200
NAND: Image: Copy 0x80000 bytes from 0x40000 to 0x26f00000
NAND: Done to load image
U-Boot 2013.10-00403-g1f9a20a (Nov 12 2014 - 12:14:27)
CPU: SAMA5D31
Crystal frequency: 12 MHz
CPU clock : 528 MHz
Master clock : 132 MHz
DRAM: 128 MiB
NAND: 256 MiB
MMC: mci: 0
In: serial
Out: serial
Err: serial
Net: macb0
Hit any key to stop autoboot: 0
U-Boot> nand info
Device 0: nand0, sector size 128 KiB
Page size 2048 b
OOB size 64 b
Erase size 131072 b
U-Boot> nand bad
Device 0 bad blocks:
U-Boot>
The bad blocks have been forgotten - the marker code was not applied persistently ?
Granted, this U-boot version seems rather old. Has the nand markbad utility been improved since then ?
Workaround : I modified the OOB bytes of the first page within the bad block myself. I read all 2112 bytes of the first page into RAM, then modified the 2 bytes marker code, and wrote the 2112 bytes back from RAM into flash. Technically, I should have erased the whole 128K flash block and then written back all 128K of contents. But my laziness has been challenged enough today. Nand flash bits can be toggled from 1 to 0 arbitrarily - it's the reverse operation that is hard, requiring an erase to restore a whole 128K block back to all-0xFF. I noticed that all the "block good" markers are encoded as 0xFFFF, so I figured that writing "0x0000" instead should suffice.
U-Boot> nand read.raw 0x20200000 0x2560000 1
NAND read: 2112 bytes read: OK
The format for nand read.raw is a bit quirky: as opposed to nand read, which expects the size in bytes as its last argument, it wants the size expressed in number of pages instead. The first page is all we need, so argument '1' does the trick. The contents, which have now been transferred to RAM, can be inspected with U-boot's md utility :
U-Boot> md 0x20200000 0x210
20200000: 23494255 00000001 00000000 01000000 UBI#............
20200010: 00080000 00100000 9cfb6033 00000000 ........3`......
...
202007e0: 00000000 00000000 00000000 00000000 ................
202007f0: 00000000 00000000 00000000 00000000 ................
20200800: ffffffff ffffffff ffffffff ffffffff ................
20200810: ffffffff ffffffff ffffffff ffffffff ................
20200820: ffffffff b0c9aa24 0008fdb8 00000000 ....$...........
20200830: 00000000 00000000 00000000 00000000 ................
Note how the md utility expects its size argument in yet a different format : this one expects it in units of words. Just to keep us alert.
The dump at address 0x20200800 clearly shows how markbad has failed its purpose: the 2 marker bytes of the bad block are still merrily on 0xFFFF.
Then to modify these bytes, another U-boot utility comes in handy :
U-Boot> mm 0x20200800
20200800: ffffffff ? 00000000
20200804: ffffffff ? q
It's a bit crude: I've changed the first 4 OOB bytes instead of just the first 2 marker bytes. Finally, to write the modified contents back into flash :
U-Boot> nand write.raw 0x20200000 0x2560000 1
NAND write: 2112 bytes written: OK
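The manual edit above amounts to clearing the marker bytes at the start of the OOB area in a raw page image. A minimal sketch on a RAM buffer, assuming the layout described in this post (2048 payload bytes followed by 64 OOB bytes, with a 2-byte marker first); the macro and function names are hypothetical and this is not U-Boot code:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_DATA_SIZE 2048u  /* payload bytes per page */
#define PAGE_OOB_SIZE  64u    /* out-of-band bytes per page */
#define PAGE_RAW_SIZE  (PAGE_DATA_SIZE + PAGE_OOB_SIZE)  /* 2112 bytes */
#define BBM_LEN        2u     /* assumed length of the bad-block marker */

/* A block is treated as good while both marker bytes still read 0xFF. */
static int raw_page_marked_bad(const uint8_t *raw_page)
{
    return raw_page[PAGE_DATA_SIZE] != 0xFF ||
           raw_page[PAGE_DATA_SIZE + 1] != 0xFF;
}

/* Clear the marker bytes: only 1 -> 0 transitions, so no erase is needed. */
static void raw_page_mark_bad(uint8_t *raw_page)
{
    for (size_t i = 0; i < BBM_LEN; i++)
        raw_page[PAGE_DATA_SIZE + i] = 0x00;
}
```

Unlike the mm session, this touches only the first two OOB bytes and leaves the rest of the page image unchanged.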
Funny enough, the nand bad diagnostic doesn't notice the block which has just been marked, even after some nand read attempts which do fail.
U-Boot> nand bad
Device 0 bad blocks:
U-Boot>
But this is no cause for alarm. The 2nd bad block was marked manually in a similar fashion, and upon another reset :
U-Boot> reset
resetting ...
RomBOOT
ba_offset = 0xc ...
AT91Bootstrap 3.6.0-00029-g0cd4e6a (Wed Nov 12 12:14:04 CET 2014)
...
U-Boot 2013.10-00403-g1f9a20a (Nov 12 2014 - 12:14:27)
...
Hit any key to stop autoboot: 0
U-Boot> nand bad
Device 0 bad blocks:
02560000
060a0000
U-Boot>
Lo and behold, the 'bad block' marking has persisted ! The next flash storage operation neatly skipped over the bad blocks, saving a consistent kernel and filesystem in the various partitions of the flash. This was the intention all along, but it seems to require gritty manual work. Is there no automated way ?
U-Boot has changed quite a bit since 2014. Patches possibly of relevance to your problem include:
dc0b69fa9f97 ("mtd: nand: mxs_nand: allow to enable BBT support")
c4adf9db5d38 ("spl: nand: sunxi: remove support for so-called 'syndrome' mode")
8d1809a96699 ("spl: nand: simple: replace readb() with chip specific read_buf()")
Please retest with U-Boot Git HEAD. If something is still missing, please report it to the U-Boot developer list or, even better, send a patch.
I have a question regarding PAPI (Performance Application Programming Interface). I downloaded and installed the PAPI library, but I am still not sure how to use it correctly and what else I need to make it work. I am trying to use it in C. I have this simple program:
int retval;
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT && retval > 0) {
    printf("PAPI error: 1\n");
    exit(1);
}
if (retval < 0)
    printf("PAPI error: 2\n");
retval = PAPI_is_initialized();
if (retval != PAPI_LOW_LEVEL_INITED)
    printf("PAPI error: 2\n");
/* PAPI_num_counters() returns the counter count, or a negative error code. */
int num_hwcntrs = PAPI_num_counters();
if (num_hwcntrs < PAPI_OK)
    printf("PAPI error: %d\n", num_hwcntrs);
else
    printf("This system has %d available counters. \n", num_hwcntrs);
I have included the papi.h header and I am compiling with gcc and the -lpapi flag. I added the library to the path, so the program compiles and runs, but as a result I get this:
This system has 0 available counters.
Though initialization seems to work, as it doesn't give an error code.
Any advice or suggestion would be helpful to determine what I have done wrong or missed. I mean, I should have counters available in my system; more precisely, I need the cache miss and cache hit counters.
I tried to count the available counters after running this other simple program, and it gave error code -25:
int numEvents = 2;
long long values[2];
int events[2] = {PAPI_L3_TCA,PAPI_L3_TCM};
printf("PAPI error: %d\n", PAPI_start_counters(events, numEvents));
UPDATE: I just tried to check from terminal hardware information with command: papi_avail | more; and I got this:
Available PAPI preset and user defined events plus hardware information.
PAPI version : 5.7.0.0
Operating system : Linux 4.15.0-45-generic
Vendor string and code : GenuineIntel (1, 0x1)
Model string and code : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz (78, 0x4e)
CPU revision : 3.000000
CPUID : Family/Model/Stepping 6/78/3, 0x06/0x4e/0x03
CPU Max MHz : 2800
CPU Min MHz : 400
Total cores : 4
SMT threads per core : 2
Cores per socket : 2
Sockets : 1
Cores per NUMA region : 4
NUMA regions : 1
Running in a VM : no
Number Hardware Counters : 0
Max Multiplex Counters : 384
Fast counter read (rdpmc): no
PAPI Preset Events
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 No No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 No No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 No No Level 2 data cache misses
PAPI_L2_ICM 0x80000003 No No Level 2 instruction cache misses
.......
So because Number Hardware Counters is 0, I can't use this tool to count cache misses with PAPI's preset events? Is there any configuration that can be useful or should I forget about it till I change my laptop?
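One thing that may be worth checking before giving up on the hardware (an assumption, since nothing in the output above confirms it): on Linux, PAPI normally reads the counters through the kernel's perf_event interface, and access to that interface is restricted by the value in /proc/sys/kernel/perf_event_paranoid. Below is a minimal sketch that reads such a one-integer file; `read_int_file` is a hypothetical helper, not a PAPI function.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of PAPI): read a single integer from a text
 * file such as /proc/sys/kernel/perf_event_paranoid.
 * Returns 0 on success and stores the value in *out, or -1 on any failure.
 */
static int read_int_file(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

If the value read from /proc/sys/kernel/perf_event_paranoid is restrictive, lowering it as root (for example with sysctl -w kernel.perf_event_paranoid=1) and re-running papi_avail would show whether this is the cause.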
I have a USB device that outputs data in units of one byte, and I want to pass these bytes to an FPGA component that sits on the AXI bridge. The FPGA and CPU are on the same chip: it's an Altera Cyclone V SoC FPGA. The CPU is an ARM Cortex-A9. Kernel version 3.7.0.
There is a piece of software that reads from the USB device and writes to a dump file, and it works just fine. I tried to use mmap() to map the FPGA address into the virtual address space and write to it from userspace. When doing so, after about a minute, the kernel seems to crash.
I also wrote a driver for my FPGA component and passed the driver's device path to that software as a file, so that it writes to it and eventually to my FPGA component, but the result is the same: the kernel crashes again after a random time.
I also wrote a simple program that reads bytes from a local file and passes them to the FPGA. This works fine either way (using mmap() or the driver module): the file passes through to the FPGA with no problems at all, no matter how big the file is.
So the problem appears only when passing data from the USB device to the FPGA, either using mmap() or a driver module.
Here is a sample crash message:
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
Modules linked in: ipv6
CPU: 1 Not tainted (3.7.0 #106)
PC is at scheduler_ipi+0x8/0x4c
LR is at handle_IPI+0x10c/0x19c
pc : [<800521a0>] lr : [<800140d4>] psr: 80000193
sp : bf87ff58 ip : 8056acc8 fp : 00000000
r10: 00000000 r9 : 413fc090 r8 : 00000001
r7 : 00000000 r6 : bf87e000 r5 : 80535018 r4 : 8053eec0
r3 : 8056ac80 r2 : bf87ff58 r1 : 00000482 r0 : 00000481
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5387d Table: 3f0c404a DAC: 00000015
Process swapper/1 (pid: 0, stack limit = 0xbf87e240)
Stack: (0xbf87ff58 to 0xbf880000)
ff40: 00000000 800140d4
ff60: fffec10c 8053e418 bf87ff90 fffec100 8000f6e0 8000851c 8000f708 8000f70c
ff80: 60000013 ffffffff bf87ffc4 8000e180 00000000 00000000 00000001 00000000
ffa0: bf87e000 80565688 803ddfb0 80541fc8 8000f6e0 413fc090 00000000 00000000
ffc0: 8053e9b8 bf87ffd8 8000f708 8000f70c 60000013 ffffffff 00000020 8000f894
ffe0: 3f86c06a 00000015 10c0387d 805658d8 0000406a 003d1ee8 31ca2085 5c1021c3
Code: eaffffad 80564700 e92d4800 e1a0200d (4c4c9b50)
---[ end trace 9e492cde975c41f9 ]---
Other crash messages start like:
Unable to handle kernel paging request at virtual address 2a7a4390
Internal error: Oops - bad syscall: ebcffb [#1] SMP ARM
pgd = bf318000
[2a7a4390] *pgd=00000000
And:
Internal error: Oops - undefined instruction: 0 [#2] SMP ARM
Modules linked in: ipv6
CPU: 1 Tainted: G D (3.7.0 #106)
Here are the full crash messages.
I noticed that all the crash messages I get share the same PC and LR locations, but I have no previous experience with the Linux kernel. I found similar error messages online, but none of the proposed solutions worked for me.
Source Code:
This function is called whenever a new buffer of bytes arrives from USB:
static void rtlsdr_callback(unsigned char *buf, uint32_t len, void *ctx)
{
if (ctx) {
if (do_exit)
return;
if ((bytes_to_read > 0) && (bytes_to_read < len)) {
len = bytes_to_read;
do_exit = 1;
rtlsdr_cancel_async(dev);
}
/* if (fwrite(buf, 1, len, (FILE*)ctx) != len) {
fprintf(stderr, "Short write, samples lost, exiting!\n");
rtlsdr_cancel_async(dev);
}
*/
if (fm_receiver_addr == NULL)
{
virtual_base = mmap(NULL, HPS2FPGA_SPAN, PROT_WRITE, MAP_PRIVATE, fd, HPS2FPGA_BASE);
if (virtual_base == MAP_FAILED)
{
perror("mmap");
close(fd);
exit(1);
}
fm_receiver_addr = (unsigned char*)(virtual_base + FM_DEMOD_OFFSET);
}
int i, j;
for (i = 0; i < len; i++)
{
*fm_receiver_addr = buf[i];
for (j = 0; j < 150; j++);
}
if (bytes_to_read > 0)
bytes_to_read -= len;
}
}
As you can see, I commented out the fwrite() call (the original code uses it to write to files) and replaced it with my code that writes to my FPGA component: *fm_receiver_addr = buf[i];. Before that, I check whether the address has already been mapped and map it if it has not.
For the other way, the driver module, I wrote this code:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/platform_device.h>
#include <linux/uaccess.h>
#include <linux/ioport.h>
#include <linux/io.h>
#define HPS2FPGA_BASE 0xC0000000
#define HPS2FPGA_SPAN PAGE_SIZE
void* fm_demod_addr;
int i;
// Get a driver entry in Sysfs
static struct device_driver fm_demod_driver =
{
.name = "fm-demodulator", // Name of the driver
.bus = &platform_bus_type, // Which bus does the device exist
};
// Function that is used when we read from the file in /sys, but we won't use it
ssize_t fm_demod_read(struct device_driver* drv, char* buf)
{ return 0; }
// Function that is called when we write to the file in /sys
ssize_t fm_demod_write_sample(struct device_driver* drv, const char* buf, size_t count)
{
if (buf == NULL)
{
pr_err("Error! String must not be NULL!\n");
return -EINVAL;
}
for (i = 0; i < count; i++)
{
iowrite8(buf[i], fm_demod_addr);
}
return count;
}
// Set our module's pointers and set permissions mode
static DRIVER_ATTR(fm_demod, S_IWUSR, fm_demod_read, fm_demod_write_sample);
// Set module information
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Siraj Muhammad <sirajmuhammad#outlook.com>");
MODULE_DESCRIPTION("Driver for FPGA component 'FM Demodulator'");
static int __init fm_demod_init(void)
{
int ret;
struct resource* res;
// Register driver in kernel
ret = driver_register(&fm_demod_driver);
if (ret < 0)
return ret;
// Create file system in /sys
ret = driver_create_file(&fm_demod_driver, &driver_attr_fm_demod);
if (ret < 0)
{
driver_unregister(&fm_demod_driver);
return ret;
}
// Request exclusive access to the memory region we want to write to
res = request_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN, "fm-demodulator");
if (res == NULL)
{
driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
driver_unregister(&fm_demod_driver);
return -EBUSY;
}
// Map the address into virtual memory
fm_demod_addr = ioremap(HPS2FPGA_BASE, HPS2FPGA_SPAN);
if (fm_demod_addr == NULL)
{
driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
driver_unregister(&fm_demod_driver);
release_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN);
return -EFAULT;
}
return 0;
}
static void __exit fm_demod_exit(void)
{
// Remove file system from /sys
driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
// Unregister the driver
driver_unregister(&fm_demod_driver);
// Release requested memory
release_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN);
// Un-map address
iounmap(fm_demod_addr);
}
module_init(fm_demod_init);
module_exit(fm_demod_exit);
And I revert the userspace code to its original state, and pass the driver path: /sys/bus/platform/drivers/fm-demodulator/fm_demod to the userspace app to write to it.
Any thought about it?
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
PC is at scheduler_ipi+0x8/0x4c
LR is at handle_IPI+0x10c/0x19c
pc : [<800521a0>] lr : [<800140d4>] psr: 80000193
[snip]
Code: eaffffad 80564700 e92d4800 e1a0200d (4c4c9b50)
---[ end trace 9e492cde975c41f9 ]---
Probably no one can know the answer for certain. Note: undefined instruction!
The PC is at scheduler_ipi+0x8/0x4c, this is hardcore ARM-Linux scheduling; an inter-processor interrupt. You can disassemble the 'Code:' part to help,
0: eaffffad b 0xfffffebc
4: 80564700 subshi r4, r6, r0, lsl #14
8: e92d4800 push {fp, lr}
c: e1a0200d mov r2, sp
10: 4c4c9b50 mcrrmi 11, 5, r9, ip, cr0
The crash is at the mcrrmi instruction, which appears to be nonsense. If you disassemble sched/core.o you will see the real instruction sequence, but I bet that the '4c4c9b50' value is corrupt, i.e. this is not the code the compiler generated.
So the problem is when passing from USB device to FPGA, either using mmap() or a driver module.
I will use a zen move and think a little. Does the USB device use DMA? Your FPGA is probably also somehow in control of the ARM/AXI bus. I would at least consider the possibility that the FPGA is corrupting a bus cycle, perhaps flipping address bits and causing a physical write to kernel code space. This can happen when you use an innocent bystander like a DMA peripheral. The ARM CPU will use cache and burst everything.
Things to check,
Check that the code value reported in (brackets) is what the compiler produced. If it is not, hardware has probably corrupted things. It is hard for Linux code to do this, as the kernel code pages are typically R/O.
You should also disassemble any reported code and see which register is in effect. For instance, the (4c4c9b50) code can be decoded with,
printf '\x50\x9b\x4c\x4c' > arm.bin
objdump -marm -b binary -D arm.bin
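The byte order in the printf command above comes from storing the 32-bit instruction word little-endian. A minimal C sketch of that conversion, which also pulls out the condition field that produces the "mi" suffix; the helper names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 32-bit ARM instruction word into the little-endian byte
 * sequence that a raw binary file for objdump must contain. */
void arm_word_to_le_bytes(uint32_t word, uint8_t out[4])
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));   /* least significant byte first */
}

/* Extract the ARM condition field, bits [31:28].
 * 0x4 is MI (minus), 0xE is AL (always). */
unsigned arm_condition_field(uint32_t word)
{
    return (unsigned)(word >> 28);
}
```

For 0x4c4c9b50 this yields the bytes 50 9b 4c 4c and condition field 0x4 (MI), whereas the surrounding valid instructions such as e92d4800 carry 0xE (always).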
You can just objdump vmlinux to find the assembler for the scheduler_ipi routine and then determine what a pointer might be. For instance, if this_rq() was in R9 and R9 is bogus, then you have a clue.
If the code is corrupt, you need a bus analyzer and/or some routine to monitor the location and report whenever it changes to try and locate the source of corruption.
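A monitoring routine of the kind mentioned in the last point could look roughly like this. It is only a sketch: the function name is hypothetical, and it assumes you already hold a known-good copy of the words (for example, taken from vmlinux) and can call the check periodically from somewhere safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare a live memory region against a known-good snapshot.
 * Returns the index of the first differing 32-bit word,
 * or -1 if the region still matches. */
long first_corrupt_word(const volatile uint32_t *live,
                        const uint32_t *golden, size_t nwords)
{
    assert(live != NULL && golden != NULL);
    for (size_t i = 0; i < nwords; i++) {
        if (live[i] != golden[i])
            return (long)i;
    }
    return -1;
}
```

Logging the index together with a timestamp each time the result changes from -1 gives you something to correlate with USB or FPGA activity.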
I've been banging my head with this for the last 3-4 days and I can't find a DECENT explanatory documentation (from ARM or unofficial) to help me.
I've got an ODROID-XU board (big.LITTLE 2 x Cortex-A15 + 2 x Cortex-A7) board and I'm trying to understand a bit more about the ARM architecture. In my "experimenting" code I've now arrived at the stage where I want to WAKE UP THE OTHER CORES FROM THEIR WFI (wait-for-interrupt) state.
The missing information I'm still trying to find is:
1. When getting the base address of the memory-mapped GIC I understand that I need to read CBAR; But no piece of documentation explains how the bits in CBAR (the 2 PERIPHBASE values) should be arranged to get to the final GIC base address
2. When sending an SGI through the GICD_SGIR register, what interrupt ID between 0 and 15 should I choose? Does it matter?
3. When sending an SGI through the GICD_SGIR register, how can I tell the other cores WHERE TO START EXECUTION FROM?
4. How does the fact that my code is loaded by the U-BOOT bootloader affect this context?
The Cortex-A Series Programmer's Guide v3.0 (found here: link) states the following in section 22.5.2 (SMP boot in Linux, page 271):
While the primary core is booting, the secondary cores will be held in a standby state, using the
WFI instruction. It (the primary core) will provide a startup address to the secondary cores and wake them using an
Inter-Processor Interrupt(IPI), meaning an SGI signalled through the GIC
How does Linux do that? None of the documents give any further details on "It will provide a startup address to the secondary cores".
My frustration is growing and I'd be very grateful for answers.
Thank you very much in advance!
EXTRA DETAILS
Documentation I use:
ARMv7-A&R Architecture Reference Manual
Cortex-A15 TRM (Technical Reference Manual)
Cortex-A15 MPCore TRM
Cortex-A Series Programmer's Guide v3.0
GICv2 Architecture Specification
What I've done by now:
UBOOT loads me at 0x40008000; I've set-up Translation Tables (TTBs), written TTBR0 and TTBCR accordingly and mapped 0x40008000 to 0x8000_0000 (2GB), so I also enabled the MMU
Set-up exception handlers of my own
I've got Printf functionality over the serial (UART2 on ODROID-XU)
All the above seems to work properly.
What I'm trying to do now:
Get the GIC base address => at the moment I read CBAR and I simply AND (&) its value with 0xFFFF8000 and use this as the GIC base address, although I'm almost sure this ain't right
Enable the GIC distributor (at offset 0x1000 from GIC base address?), by writing GICD_CTLR with the value 0x1
Construct an SGI with the following params: Group = 0, ID = 0, TargetListFilter = "All CPUs Except Me" and send it (write it) through the GICD_SGIR GIC register
Since I haven't passed any execution start address for the other cores, nothing happens after all this
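For reference, the register values in the steps above can be composed as follows. This is a sketch, not verified on the ODROID-XU: it assumes the GICv2 GICD_SGIR field layout (TargetListFilter in bits [25:24], CPUTargetList in [23:16], NSATT in bit 15, SGIINTID in [3:0]) and the Cortex-A15 CBAR layout as I read the TRM (PERIPHBASE[31:15] in bits [31:15], PERIPHBASE[39:32] in bits [7:0]). The helper and macro names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from PERIPHBASE on Cortex-A15 (assumption, check your TRM). */
#define GIC_DIST_OFFSET   0x1000u   /* distributor (GICD_*) */
#define GIC_CPUIF_OFFSET  0x2000u   /* CPU interface (GICC_*) */
#define GICD_SGIR_OFFSET  0x0F00u   /* relative to the distributor base */

/* GICD_SGIR TargetListFilter values. */
#define SGIR_FILTER_LIST        0u  /* use CPUTargetList */
#define SGIR_FILTER_ALL_BUT_ME  1u  /* all CPUs except the requester */
#define SGIR_FILTER_SELF        2u  /* requester only */

/* Build the value to write to GICD_SGIR. */
uint32_t gicd_sgir_value(unsigned filter, unsigned target_list,
                         unsigned nsatt, unsigned intid)
{
    assert(filter <= 2 && target_list <= 0xFF && nsatt <= 1 && intid <= 15);
    return (filter << 24) | (target_list << 16) | (nsatt << 15) | intid;
}

/* Reassemble the 40-bit PERIPHBASE from a raw CBAR value. */
uint64_t cbar_to_periphbase(uint32_t cbar)
{
    uint64_t high = (uint64_t)(cbar & 0xFFu) << 32;   /* PERIPHBASE[39:32] */
    uint64_t low  = cbar & 0xFFFF8000u;               /* PERIPHBASE[31:15] */
    return high | low;
}
```

With these assumptions, Group 0, ID 0 and "All CPUs Except Me" gives gicd_sgir_value(SGIR_FILTER_ALL_BUT_ME, 0, 0, 0), i.e. 0x01000000. For a peripheral base below 4 GB, the 0xFFFF8000 mask alone gives the same result as cbar_to_periphbase().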
....UPDATE....
I've started looking at the Linux kernel and QEMU source codes in search for an answer. Here's what I found out (please correct me if I'm wrong):
When powering up the board ALL THE CORES start executing from the reset vector
A software (firmware) component executes WFI on the secondary cores and some other code that will act as a protocol between these secondary cores and the primary core, when the latter wants to wake them up again
For example, the protocol used on the EnergyCore ECX-1000 (Highbank) board is as follows:
**(1)** the secondary cores enter WFI and when
**(2)** the primary core sends an SGI to wake them up
**(3)** they check if the value at address (0x40 + 0x10 * coreid) is non-null;
**(4)** if it is non-null, they use it as an address to jump to (execute a BX)
**(5)** otherwise, they re-enter standby state, by re-executing WFI
**(6)** So, if I had an EnergyCore ECX-1000 board, I should write (0x40 + 0x10 * coreid) with the address I want each of the cores to jump to and send an SGI
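The protocol above can be modelled in plain C to make the address arithmetic concrete. This is a host-side simulation only: `mem` stands in for the physical memory starting at address 0, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of the per-core jump slot: 0x40 + 0x10 * coreid. */
uint32_t pen_slot_address(unsigned coreid)
{
    return 0x40u + 0x10u * coreid;
}

/* Primary core: publish the entry address for one secondary core.
 * On real hardware an SGI would follow to wake the core from WFI. */
void pen_release(volatile uint32_t *mem, unsigned coreid, uint32_t entry)
{
    assert(mem != NULL);
    mem[pen_slot_address(coreid) / 4] = entry;
}

/* Secondary core, after waking: returns the address to jump to,
 * or 0 meaning "nothing published yet, execute WFI again". */
uint32_t pen_poll(const volatile uint32_t *mem, unsigned coreid)
{
    assert(mem != NULL);
    return mem[pen_slot_address(coreid) / 4];
}
```

Core 1 therefore polls address 0x50, core 2 polls 0x60, and so on.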
Questions:
1. What is the software component that does this? Is it the BL1 binary I've written on the SD Card, or is it U-BOOT?
2. From what I understand, this software protocol differs from board to board. Is it so, or does it only depend on the underlying processor?
3. Where can I find information about this protocol for a pick-one ARM board? - can I find it on the official ARM website or on the board webpage?
Ok, I'm back baby. Here are the conclusions:
The software component that puts the CPUs to sleep is the bootloader (in my case U-Boot)
Linux somehow knows how the bootloader does this (hardcoded in the Linux kernel for each board) and knows how to wake them up again
For my ODROID-XU board the sources describing this process are UBOOT ODROID-v2012.07 and the linux kernel found here: LINUX ODROIDXU-3.4.y (it would have been better if I looked into kernel version from the branch odroid-3.12.y since the former doesn't start all of the 8 processors, just 4 of them but the latter does).
Anyway, here's the source code I've come up with. Afterwards I'll list the relevant source files from the above source trees that helped me write it:
typedef unsigned int DWORD;
typedef unsigned char BOOLEAN;
#define FAILURE (0)
#define SUCCESS (1)
#define NR_EXTRA_CPUS (3) // actually 7, but this kernel version can't wake them up all -> check kernel version 3.12 if you need this
// Hardcoded in the kernel and in U-Boot; here I've put the physical addresses for ease
// In my code (and in the linux kernel) these addresses are actually virtual
// (thus the 'VA' part in S5P_VA_...); note: mapped with memory type DEVICE
#define S5P_VA_CHIPID (0x10000000)
#define S5P_VA_SYSRAM_NS (0x02073000)
#define S5P_VA_PMU (0x10040000)
#define EXYNOS_SWRESET ((DWORD) S5P_VA_PMU + 0x0400)
// Other hardcoded values
#define EXYNOS5410_REV_1_0 (0x10)
#define EXYNOS_CORE_LOCAL_PWR_EN (0x3)
BOOLEAN BootAllSecondaryCPUs(void* CPUExecutionAddress){
// 1. Get bootBase (the address where we need to write the address where the woken CPUs will jump to)
// and powerBase (we also need to power up the cpus before waking them up (?))
DWORD bootBase, powerBase, powerOffset, clusterID;
asm volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r" (clusterID));
clusterID = (clusterID >> 8);
powerOffset = 0;
if( (*(DWORD*)S5P_VA_CHIPID & 0xFF) < EXYNOS5410_REV_1_0 )
{
if( (clusterID & 0x1) == 0 ) powerOffset = 4;
}
else if( (clusterID & 0x1) != 0 ) powerOffset = 4;
bootBase = S5P_VA_SYSRAM_NS + 0x1C;
powerBase = (S5P_VA_PMU + 0x2000) + (powerOffset * 0x80);
// 2. Power up each CPU, write bootBase and send a SEV (they are in WFE [wait-for-event] standby state)
for (DWORD i = 1; i <= NR_EXTRA_CPUS; i++)
{
// 2.1 Power up this CPU
powerBase += 0x80;
DWORD powerStatus = *(DWORD*)( (DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == 0)
{
*(DWORD*) powerBase = EXYNOS_CORE_LOCAL_PWR_EN;
for (DWORD retry = 0; retry < 10; retry++) // 10 millis timeout; separate counter so the CPU index i is not clobbered
{
powerStatus = *(DWORD*)((DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == EXYNOS_CORE_LOCAL_PWR_EN)
break;
DelayMilliseconds(1); // not implemented here, if you need this, post a comment request
}
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) != EXYNOS_CORE_LOCAL_PWR_EN)
return FAILURE;
}
if ( (clusterID & 0x0F) != 0 )
{
if ( *(DWORD*)(S5P_VA_PMU + 0x0908) == 0 )
do { DelayMicroseconds(10); } // not implemented here, if you need this, post a comment request
while (*(DWORD*)(S5P_VA_PMU + 0x0908) == 0);
*(DWORD*) EXYNOS_SWRESET = (DWORD)(((1 << 20) | (1 << 8)) << i);
}
// 2.2 Write bootBase and execute a SEV to finally wake up the CPUs
asm volatile ("dmb" : : : "memory");
*(DWORD*) bootBase = (DWORD) CPUExecutionAddress;
asm volatile ("isb");
asm volatile ("\n dsb\n sev\n nop\n");
}
return SUCCESS;
}
This successfully wakes 3 of 7 of the secondary CPUs.
And now for that short list of relevant source files in u-boot and the linux kernel:
UBOOT: lowlevel_init.S - notice lines 363-369, how the secondary CPUs wait in a WFE for the value at _hotplug_addr to be non-zeroed and to jump to it; _hotplug_addr is actually bootBase in the above code; also lines 282-285 tell us that _hotplug_addr is to be relocated at CONFIG_PHY_IRAM_NS_BASE + _hotplug_addr - nscode_base (_hotplug_addr - nscode_base is 0x1C and CONFIG_PHY_IRAM_NS_BASE is 0x02073000, thus the above hardcodings in the linux kernel)
LINUX KERNEL: generic - smp.c (look at function __cpu_up), platform specific (odroid-xu): platsmp.c (function boot_secondary, called by generic __cpu_up; also look at platform_smp_prepare_cpus [at the bottom] => that's the function that actually sets the boot base and power base values)
For clarity and future reference, there's a subtle piece of information missing here thanks to the lack of proper documentation of the Exynos boot protocol (n.b. this question should really be marked "Exynos 5" rather than "Cortex-A15" - it's a SoC-specific thing and what ARM says is only a general recommendation). From cold boot, the secondary cores aren't in WFI, they're still powered off.
The simpler minimal solution (based on what Linux's hotplug does), which I worked out in the process of writing a boot shim to get a hypervisor running on the XU, takes two steps:
First write the entry point address to the Exynos holding pen (0x02073000 + 0x1c)
Then poke the power controller to switch on the relevant core(s): That way, they drop out of the secure boot path into the holding pen to find the entry point waiting for them, skipping the WFI loop and obviating the need to even touch the GIC at all.
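In C the two steps look roughly like this. It is a sketch under assumptions: the register addresses and the 0x3 power-enable value are the ones from the code earlier in this thread, the function names are made up, and the registers are passed in as pointers so the logic can be tried on a host before pointing it at real hardware. `__sync_synchronize()` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXYNOS_HOLDING_PEN        (0x02073000u + 0x1Cu)
#define EXYNOS_PMU_BASE           0x10040000u
/* Per-core power configuration register; the status register is at +4. */
#define EXYNOS_CORE_CONFIG(n)     (EXYNOS_PMU_BASE + 0x2000u + 0x80u * (n))
#define EXYNOS_CORE_LOCAL_PWR_EN  0x3u

/* Step 1: publish the entry point. Step 2: switch the core on. */
void exynos_boot_core(volatile uint32_t *holding_pen,
                      volatile uint32_t *core_config, uint32_t entry)
{
    assert(holding_pen != NULL && core_config != NULL);
    *holding_pen = entry;
    __sync_synchronize();   /* entry must be visible before power-up */
    *core_config = EXYNOS_CORE_LOCAL_PWR_EN;
}

/* Check the status word that follows the configuration register. */
int exynos_core_is_on(const volatile uint32_t *core_config)
{
    assert(core_config != NULL);
    return (core_config[1] & EXYNOS_CORE_LOCAL_PWR_EN) == EXYNOS_CORE_LOCAL_PWR_EN;
}
```

On the board you would pass `(volatile uint32_t *)EXYNOS_HOLDING_PEN` and `(volatile uint32_t *)EXYNOS_CORE_CONFIG(1)` (or their mapped virtual addresses) for CPU 1.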
Unless you're planning a full-on CPU hotplug implementation you can skip checking the cluster ID - if we're booting, we're on cluster 0 and nowhere else (the check for pre-production chips with backwards cluster registers should be unnecessary on the Odroid too - certainly was for me).
From my investigation, firing up the A7s is a little more involved. Judging from the Exynos big.LITTLE switcher driver, it seems you need to poke a separate set of power controller registers to enable cluster 1 first (and you may need to mess around with the CCI too, especially to have the MMUs and caches on) - I didn't get further since by that point it was more "having fun" than "doing real work"...
As an aside, Samsung's mainline patch for CPU hotplug on the 5410 makes the core power control stuff rather clearer than the mess in their downstream code, IMO.
QEMU uses PSCI
The ARM Power State Coordination Interface (PSCI) is documented at: https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document and controls things such as powering on and off of cores.
TL;DR this is the aarch64 snippet to wake up CPU 1 on QEMU v3.0.0 ARMv8 aarch64:
/* PSCI function identifier: CPU_ON. */
ldr w0, =0xc4000003
/* Argument 1: target_cpu */
mov x1, 1
/* Argument 2: entry_point_address */
ldr x2, =cpu1_entry_address
/* Argument 3: context_id */
mov x3, 0
/* Unused hvc args: the Linux kernel zeroes them,
* but I don't think it is required.
*/
hvc 0
and for ARMv7:
ldr r0, =0x84000003
mov r1, #1
ldr r2, =cpu1_entry_address
mov r3, #0
hvc 0
A full runnable example with a spinlock is available on the ARM section of this answer: What does multicore assembly language look like?
The hvc instruction then gets handled by an EL2 handler, see also: the ARM section of: What are Ring 0 and Ring 3 in the context of operating systems?
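The two magic numbers above are not arbitrary. As far as I understand the SMC Calling Convention, bit 31 marks a fast call, bit 30 selects the 64-bit convention, bits [29:24] hold the owner (4 for standard secure services such as PSCI) and the low 16 bits hold the function number. A small C sketch of that encoding, with made-up macro and function names:

```c
#include <assert.h>
#include <stdint.h>

#define SMCCC_FAST_CALL   (1u << 31)
#define SMCCC_SMC64       (1u << 30)
#define SMCCC_OWNER_STD   (4u << 24)   /* standard secure service calls */

#define PSCI_FN_CPU_SUSPEND  1u
#define PSCI_FN_CPU_OFF      2u
#define PSCI_FN_CPU_ON       3u

/* Compose a PSCI function identifier for the 32-bit or 64-bit convention. */
uint32_t psci_function_id(unsigned fn, int is_64bit)
{
    assert(fn <= 0xFFFFu);
    uint32_t id = SMCCC_FAST_CALL | SMCCC_OWNER_STD | fn;
    if (is_64bit)
        id |= SMCCC_SMC64;
    return id;
}
```

psci_function_id(PSCI_FN_CPU_ON, 1) gives 0xc4000003 and psci_function_id(PSCI_FN_CPU_ON, 0) gives 0x84000003, matching the aarch64 and ARMv7 snippets.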
Linux kernel
In Linux v4.19, those function identifiers are passed to the kernel through the device tree. QEMU, for example, auto-generates an entry of the form:
psci {
method = "hvc";
compatible = "arm,psci-0.2", "arm,psci";
cpu_on = <0xc4000003>;
migrate = <0xc4000005>;
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
};
The hvc instruction is called from: https://github.com/torvalds/linux/blob/v4.19/drivers/firmware/psci.c#L178
static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
which ends up going to: https://github.com/torvalds/linux/blob/v4.19/arch/arm64/kernel/smccc-call.S#L51
Go to www.arm.com and download their evaluation copy of the DS-5 development suite. Once installed, under the examples there will be a startup_Cortex-A15MPCore directory. Look at startup.s.