I am working on a SoC with several Cortex-M7 cores. It has SRAM mapped at 0x2000 0000 -> 0x3FFF FFFF and DDR mapped at 0x6000 0000 -> 0xDFFF FFFF.
It seems that configuring the 4 DDR partitions as cached normal memory (WT, WB or WBA) triggers a HW bug which freezes the whole chip after a few seconds. Even the debugger gets disconnected. Note that I do not need to access the DDR to trigger the problem: just running code from SRAM that configures the MPU and does various other things will, after a random delay, trigger the bug.
Is there any limitation to the attributes I set when configuring the MPU?
I would say that only the system space at 0xE000 0000 has fixed attributes and that the others are freely configurable depending on the HW implementation, but I have a doubt, because the ARMv7-M Architecture Reference Manual for instance says this:
B3.1 The system address map (p. 588)
[...] A declared cache type can be demoted but not promoted [...]
I am not sure whether this is only a runtime limitation between two successive configurations, or an absolute limitation that forbids enabling the cache for the partitions at 0xA... and 0xC... even once during init, since they are not cacheable in the default address map.
Also, the documentation indicates that the DDR is normal memory, but is there any problem if I leave it configured as device memory by default (ignoring the slower, non-reordered accesses inherent in that configuration)?
Here is the exact configuration for the 4 DDR partitions:
RBAR -> 60000000 80000000 A0000000 C0000000 (start addresses)
RASR -> 03080039 03080039 13010039 13100039 (default settings)
RASR -> 03020039 03020039 03020039 03020039 (new settings triggering the bug)
Note that for the SRAM I keep the same setting:
RBAR -> 20000000
RASR -> 030B0039
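For reference, this is essentially how I program those regions (a simplified sketch of my init code using the CMSIS register names; the region numbers 4-7 and the header name are just placeholders):

#include <stdint.h>
#include "device.h"                 /* placeholder: the part's CMSIS device header */

static void mpu_set_region(uint32_t rnr, uint32_t rbar, uint32_t rasr)
{
    MPU->RNR  = rnr;                /* select the region to program            */
    MPU->RBAR = rbar;               /* base address (VALID/REGION fields zero) */
    MPU->RASR = rasr;               /* attributes, size, enable                */
}

void mpu_config_ddr_cached(void)
{
    __DMB();
    MPU->CTRL = 0;                  /* disable the MPU while reprogramming */

    /* the "new settings" from above: four 512 MiB DDR regions */
    mpu_set_region(4, 0x60000000, 0x03020039);
    mpu_set_region(5, 0x80000000, 0x03020039);
    mpu_set_region(6, 0xA0000000, 0x03020039);
    mpu_set_region(7, 0xC0000000, 0x03020039);

    MPU->CTRL = MPU_CTRL_ENABLE_Msk | MPU_CTRL_PRIVDEFENA_Msk;
    __DSB();                        /* make sure the new mapping is in effect */
    __ISB();                        /* before fetching the next instruction   */
}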
I am trying to learn how memory is arranged and handled by a computer, and I don't really get the concept of alignment.
For instance, on a 32-bit architecture, why do we say that a short (2 bytes) is unaligned when it is not located at an even address, even though it fits entirely within a single 32-bit word?
Say the processor reads memory 32 bits at a time, and a char is at address 0x0, followed by a short at addresses 0x1 and 0x2, followed by another char at 0x3. Then there seems to be no problem, since no data is split across words: the processor reads all 4 bytes in one go.
So the short is aligned, isn't it?
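To make my example concrete, this is roughly the layout I have in mind in C (using the GCC/Clang packed attribute just to force the short onto the odd offset; the printed offsets are what I expect, not something I have taken from a reference):

#include <stdio.h>
#include <stddef.h>

/* packed so the short really sits at offset 1, as in my example;
   without it the compiler would normally insert a pad byte */
struct __attribute__((packed)) example {
    char  a;   /* offset 0        */
    short b;   /* offsets 1 and 2 */
    char  c;   /* offset 3        */
};

int main(void)
{
    printf("b is at offset %zu, total size %zu\n",
           offsetof(struct example, b), sizeof(struct example));
    /* expected: b is at offset 1, total size 4 */
    return 0;
}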
The question suggests a processor that has 32 wires connected to a bus, for data, with possibly other wires for control. When it wants data from memory, it puts an address on the bus, requests a read from memory, waits for the data, and reads it through those 32 wires.
In typical processor designs, those 32 wires are connected to some temporary internal register which itself has connections to other registers. It is easy to move those 32 bits around as a block, with each bit going on its own wire.
If we want to move some of the bits within the 32, we need to shift them. This might be done with various hardware, such as a shifting unit that we put bits into, request a certain amount of shift, and read a result from. Internally, that shifting unit will have a variety of connections and switches to do its job.
Typically, such a shifting unit will be able to move eight bits from any of four positions (starting at bits 0, 8, 16, or 24) to the base position (0). That way, an instruction such as “load byte” can be effected by reading 32 bits from memory (because it only comes in 32-bit chunks), then using the shifting unit to get the desired byte. That shifting unit might not have the wires and switches needed to move any arbitrary set of bits (say, starting at 7, 13, or 22) to the base position. That would take many more wires and switches.
The processor also needs to be able to effect a load-16-bits instruction. For that, the shifting unit will be able to move 16 bits from positions 0 or 16 to position 0. Certainly the engineers could design it to also move 16 bits from position 8 to position 0. But that requires more wires and switches, which cost money, silicon, and energy. In many processors, a decision was made that this expense was not worthwhile, so the capability is not implemented.
In consequence, the hardware simply cannot shift data from bytes 1 and 2 to bytes 0 and 1 in the course of the loading process. (There might be other shifters in the processor, such as in a general-purpose logic unit for implementing shift instructions, but those are generally separate and accessed through instruction dispatching and control mechanisms. They are not in the line of components used in loading from memory.)
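As a rough software model of that load path (my own sketch, assuming a little-endian machine, not any particular processor's datapath): a byte load is a 32-bit fetch plus a shift of 0, 8, 16 or 24, while the halfword path only provides shifts of 0 or 16.

#include <stdint.h>

/* the "shifter" in the load path only supports the shift amounts built in */
static uint8_t load_byte(uint32_t word, unsigned byte_in_word)      /* 0..3 */
{
    return (uint8_t)(word >> (8 * byte_in_word));   /* shifts of 0, 8, 16, 24 */
}

static uint16_t load_half(uint32_t word, unsigned half_in_word)     /* 0..1 */
{
    return (uint16_t)(word >> (16 * half_in_word)); /* only shifts of 0 or 16 */
}

/* a halfword starting at byte 1 would need a shift of 8 here, which is
   exactly the path the hardware described above chose not to build */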
Alignment is a definition. Assuming 8-bit bytes and byte-addressable memory: an 8-bit byte (unsigned char) cannot be unaligned; a 16-bit halfword is aligned when the lsbit of its address is zero; a 32-bit word needs the lower two bits zero, a 64-bit doubleword the lower three bits, and so on. So if your 16-bit unsigned short is at an odd address, then it is unaligned.
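In code that definition is just a mask test (a generic sketch, nothing target-specific):

#include <stdbool.h>
#include <stdint.h>

/* aligned means the low bits of the address are zero;
   size must be a power of two: 1, 2, 4, 8, ... */
static bool is_aligned(uintptr_t addr, uintptr_t size)
{
    return (addr & (size - 1)) == 0;
}

/* is_aligned(0x1002, 2) -> true    is_aligned(0x1001, 2) -> false
   is_aligned(0x1002, 4) -> false   is_aligned(0x1000, 8) -> true  */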
A "32 bit system" does not mean a 32 bit bus, bus widths do not necessarily match the size of the processor registers or instruction size or whatever. No reason to make that assumption. Saying that though, if you are talking MIPS or ARM then yes the buses are most likely 32 or 64 bit for their 32 bit register processors and 64 or perhaps 128 for 64 bit processors, likely 64 bit. But an x86 has 8 bit instructions with 8,16,32,64 bit registers and variable length instructions when you add up the bytes it can possibly take, there is no way to classify its sizes is it an 8 bit processor with its 8 bit instructions 32 or 64 due to its larger register sizes or 128,256,512 etc due to its bus sizes?
You mentioned 32 bits, so let's stick with that. Say I want to walk through an array of bytes doing writes, and I have a 32-bit-wide data bus, one of the typical designs you see today. Let's say the other side is a cache built of 32-bit-wide SRAMs to line up with the processor-side bus; we won't worry about how the DRAM behind it is implemented. You will likely have a write data bus, a read data bus, and either separate write and read address buses or a single address bus with a way to indicate a read or write transaction.
As far as the bus is concerned, all transactions are 32 bits. You don't expect the unused byte lanes to float in a Z state; you expect them to be driven high or low during valid clocks on that bus (between valid clock cycles the bus may well go high-Z).
A read transaction will typically use, and let's assume it uses, an address aligned to the bus width, so a 32-bit-aligned address (either on the bus or on the far side). There usually isn't a notion of byte-lane enables on a read; the processor internally isolates the bytes of interest and discards the others. Some buses have a length field alongside the address where it makes sense, plus cache control signals and other signals.
An aligned 32-bit read would be, say, address 0x1000 or 0x1004 with a length of 0 (n-1). The address bus does its handshake with a unique transaction ID; later, on the read data bus, ideally in a single clock cycle, those 32 bits of data come back with that ID. The processor sees that, completes the transaction (there might be more handshaking), extracts all 4 bytes, and does what the instruction said to do with them.
A 64-bit access aligned on a 32-bit boundary would have a length of one: one address-bus handshake and two clock cycles' worth of data on the read data bus. An aligned 16-bit transaction at 0x1000 or 0x1002 will, let's say, be a read of 0x1000, and the processor will discard either lanes 0 and 1 or lanes 2 and 3. Some bus designs align the bytes onto the lower lanes, so you might see a bus where the two bytes always come back on lanes 0 and 1 for a 16-bit read.
An unaligned 32-bit read takes two bus transactions: twice the overhead, twice the number of clocks. A 32-bit read at 0x1002 is one read of 0x1000 where the processor keeps 2 of the bytes, then a read of 0x1004 where the processor keeps two more; it combines them into the 32-bit value and then does what the instruction says. So instead of 5 or 8 clocks or whatever the minimum is for this bus, it is now twice as many, and the two transactions are likely not interleaved but back to back.
An unaligned 16-bit read at address 0x1001 would hopefully still be a single 32-bit read, but an unaligned 16-bit read at address 0x1003 is two transactions, twice the clocks and twice the overhead: one at 0x1000 and one at 0x1004, keeping one byte from each.
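If it helps to count them, here is a small sketch (my own, assuming a bus that only does aligned 32-bit beats) of how many bus transactions an access needs:

#include <stdint.h>

/* number of aligned 32-bit bus transactions touched by an access of
   `size` bytes starting at `addr` */
static unsigned bus_beats(uint32_t addr, uint32_t size)
{
    uint32_t first = addr & ~3u;               /* first aligned word touched */
    uint32_t last  = (addr + size - 1) & ~3u;  /* last aligned word touched  */
    return (last - first) / 4 + 1;
}

/* bus_beats(0x1000, 4) == 1   aligned 32-bit access
   bus_beats(0x1002, 4) == 2   unaligned 32-bit access
   bus_beats(0x1001, 2) == 1   16-bit access inside one word
   bus_beats(0x1003, 2) == 2   16-bit access straddling two words */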
Writes are the same but with an additional penalty. An aligned 32-bit write, say at 0x1000, is one bus transaction: address, write data, done. The cache, being 32 bits wide in this example, can simply write those 32 bits to SRAM in one SRAM transaction. An unaligned 32-bit write, say at 0x1001, is two complete bus transactions as expected, taking twice the number of bus clocks, but the SRAM also takes two or more clocks, because you have to read-modify-write the SRAM; you can't just write. To write the bytes at 0x1001 to 0x1003 you need to read 32 bits from SRAM, change three of those bytes without touching the lowest one, and write that back. Then, when the other transaction comes in, you write the 0x1004 byte while preserving the other three bytes in that SRAM location.
Every byte write is a single bus transaction, but each one also incurs the read-modify-write. Note that depending on how many clocks the bus takes and how many transactions can be in flight at a time, the read-modify-write of the SRAM might be invisible: you might not be able to feed the cache fast enough for a bus transaction to actually have to wait on the SRAM read-modify-write. But in another, similar question (this has been asked many times here) there is a platform where this penalty was demonstrated.
So you can now tell me how the 16-bit write transactions are going to go: every one of them also incurs the read-modify-write at the cache, and if the address is, say, 0x1003, then you get two bus transactions and two read-modify-writes.
One of the beauties of the cache, though, is that even though DRAMs come in 8-, 16-, and 32-bit parts (count how many chips are on a DRAM stick: often 8 or 9, 4 or 5, 2 or 3, or some multiple of those; 8 chips is likely a 64-bit-wide bus with 8 bits per part, 16 chips a 64-bit-wide dual-rank bus with 8 bits per part, and so on), the transactions are done at 32- or 64-bit widths, which is kind of the point of a cache. If we had to do the read-modify-write at the DRAM's slow speed, that would be horrible; instead we read-modify-write at the cache/SRAM speed, and then all DRAM transactions, cache line evictions and fills, are multiples of the DRAM bus width, so 64 or 2x64 or 4x64 bits etc. per cache line.
I am working with a 28C16 2 KB parallel EEPROM. It has 11 address pins to select one of the 2048 bytes we want to work with and 8 I/O pins for reading or writing that byte. There is an OE (output enable) pin which, when grounded, drives the selected byte out on the 8 I/O pins. Similarly, there is a WE (write enable) pin which, when given a low pulse shorter than 1 microsecond, writes the data on the I/O pins into the selected byte. The datasheet of this chip says that the width of the pulse on the WE pin must be between 100 and 1000 nanoseconds. The problem is that I want to use an Arduino to program this chip. How can I generate a 100-1000 nanosecond pulse with an Arduino? The lowest delay available on the Arduino is 1 microsecond (1000 ns), plus the time taken by the digitalWrite and digitalRead functions (working with the ports directly still takes more than 120 ns extra), so it exceeds 1 microsecond. Is there any way to generate a pulse narrower than one microsecond?
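For reference, this is roughly the direct-port approach I have been experimenting with (assuming an Uno/Nano at 16 MHz with /WE wired to Arduino pin 8, i.e. PORTB bit 0; I am not sure my cycle counting is right):

#include <avr/io.h>
#include <avr/interrupt.h>

#define WE_MASK _BV(PB0)                      /* assumption: /WE on PORTB bit 0 (pin 8) */

static inline void pulse_we(void)
{
    uint8_t lo = PORTB & (uint8_t)~WE_MASK;   /* precompute both values so the     */
    uint8_t hi = PORTB | WE_MASK;             /* two port writes are back to back  */

    cli();                            /* an interrupt here would stretch the pulse */
    PORTB = lo;                       /* /WE goes low                              */
    __asm__ __volatile__("nop");      /* each nop adds one cycle = 62.5 ns @16 MHz */
    __asm__ __volatile__("nop");
    PORTB = hi;                       /* /WE back high: roughly 150-200 ns low     */
    sei();
}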
I'm currently working with an Atmel SAM3X8 ARM microcontroller that features a dual-banked 2 x 256 KB flash memory. I'm trying to implement a firmware update feature that puts the new firmware into the currently unused flash bank and, when done, swaps the banks using the flash remapping to run the new firmware.
The datasheet states that to do so I need to set the GPNVM2 bit; the MCU will then remap the memory so that Flash 1 is at 0x80000 and Flash 0 at 0xC0000. This should also lead to the MCU executing code from Flash 1.
To cite the datasheet:
The GPNVM2 is used only to swap the Flash 0 and Flash 1. If GPNVM2 is ENABLE, the Flash 1 is mapped at
address 0x0008_0000 (Flash 1 and Flash 0 are continuous). If GPNVM2 is DISABLE, the Flash 0 is mapped at
address 0x0008_0000 (Flash 0 and Flash 1 are continuous).
[...]
GPNVM2 enables to select if Flash 0 or Flash 1 is used for the boot.
Setting GPNVM bit 2 selects the boot from Flash 1, clearing it selects the boot from Flash 0.
But when I set GPNVM2, either via SAM-BA or from my own firmware using flash_set_gpnvm(2) (ASF SAM Flash Service API), the MCU still boots from the program in Flash 0, and the new program still resides at Flash 1's offset 0xC0000. The state of GPNVM2 has been verified with flash_is_gpnvm_set(2).
Flashing the firmware itself to the Flash 1 bank works flawlessly; that has been verified by dumping the whole flash memory with SAM-BA.
There is an erratum from Atmel saying that the flash remapping only works for portions smaller than 64 KB. My code is smaller than that (40 KB), so this shouldn't be an issue.
I've not found any other people having this issue, nor any example of how to use this feature, so maybe somebody could tell me whether I'm doing something wrong here, or what else to check.
I had the same issue (see here: Atmel SAM3X8E dual bank switching for booting different behaviour).
After some more research I found an Application Note (link: http://ww1.microchip.com/downloads/en/AppNotes/Atmel-42141-SAM-AT02333-Safe-and-Secure-Bootloader-Implementation-for-SAM3-4_Application-Note.pdf) which explains the boot behaviour of the SAM3X more clearly. The problem is that the datasheet is a bit misleading (at least it confused me too). The SAM3X has no ability to remap the Flash banks. The booting behaviour is a bit different (see the picture below, a snippet from the Application Note, pages 33/34):
Booting behaviour SAM3X
Figure 3-9 shows the SAM3X's behaviour at boot-up. The GPNVM bits 1 and 2 just determine which memory section (ROM/Flash 0/Flash 1) is mirrored into the boot memory (located at 0x00000000). The mapping of the Flash banks is not changed, so Flash 0 is still mapped at 0x00080000 and Flash 1 at 0x000C0000.
As the Application Note states, some other Atmel microcontrollers are able to really remap the Flash banks (e.g. the SAM3SD8 and SAM4SD32/16). Those processors change the location of the Flash banks, as you can see in Figure 3-10.
To be able to update your firmware it is therefore necessary to implement some kind of bootloader. I implemented one myself and was able to update my firmware even without using the GPNVM bits at all. I also opened a support ticket at Microchip to clarify the booting behaviour. When I receive an answer I hope to tell you more.
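For what it is worth, the core of my bootloader's jump into the other bank looks roughly like the sketch below (simplified; the address and the header name are examples, and it assumes the application was linked for that address, using the usual CMSIS names):

#include <stdint.h>
#include "sam3x8e.h"                      /* assumption: CMSIS device header from ASF */

#define APP_BASE 0x000C0000u              /* example: application linked for Flash 1 */

static void jump_to_app(uint32_t base)
{
    const uint32_t *vectors = (const uint32_t *)base;

    __disable_irq();                      /* no interrupts while switching context   */
    SCB->VTOR = base;                     /* point the vector table at the app       */
    __set_MSP(vectors[0]);                /* word 0: the app's initial stack pointer */
    ((void (*)(void))vectors[1])();       /* word 1: the app's reset handler         */
}

/* called as jump_to_app(APP_BASE) once the new image has been written and verified */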
EDIT:
Here's the answer from the Microchip support:
Setting the GPNVM2 bit in SAM3X will merely make the CPU 'jump to' or start from flash bank 1 i.e. 0xC0000.
No actual swap of memory addresses will take place.
To use flash bank 1, you will need to change the linker file (flash.ld) to reflect the flash start address 0xC0000.
For flash bank 0 application, change:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00080000 /* Flash, 512K */
to:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00040000 /* Flash, 256K */
For flash bank 1 application, change:
rom (rx) : ORIGIN = 0x00080000, LENGTH = 0x00080000 /* Flash, 512K */
to:
rom (rx) : ORIGIN = 0x000C0000, LENGTH = 0x00040000 /* Flash, 256K */
If this is not done, the reset handler in the flash 1 application will point to an address in the flash 0 application.
So, although code will start execution in flash 1 (if GPNVM2 is set), it will jump back to the flash 0 application.
The errata stating the 64kb limitation can be ignored.
Therefore the Application Note is right and no actual change of the memory mapping is performed.
Cheers
Lukas
I have a fundamental doubt regarding a NAND chip.
We are trying to bring up a custom board based on the DM365, and we are trying to boot from NAND.
The NAND used is a Micron MT29F8G08ABABA (1 gigabyte = 8 gigabits).
Organization
– Page size x8: 4320 bytes (4096 + 224 bytes)
– Block size: 128 pages (512K + 28K bytes)
– Plane size: 2 planes x 1024 blocks per plane
– Device size: 8Gb: 2048 blocks
Now, as per the datasheet of the MT29F8G08ABABA, I think the block size is (512K + 28K) bytes.
But in u-boot terminologies they use Sector size for NAND device.
Because when I use the command
nand info
from the U-Boot command line,
I get the following:
Device 0: NAND 1GiB 3,3V 8-bit, sector size 256 KiB
Is this sector size the block size (which is actually 512K as per the datasheet), or the environment sector size?
NAND read/write is working fine from U-Boot; there is no issue as such.
I just want to understand the terminology.
Now, if this is the environment sector size, is there any way to get the block size information from U-Boot?
Can somebody please enlighten me on this?
Thank you,
Regards,
Ankur
It seems the erase block size is nothing but what U-Boot reports as the sector size.
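Just to spell out the terminology with the datasheet numbers (my own summary, not U-Boot source); the "sector size" that nand info prints is the erase block size the driver detected for the chip, shown in KiB:

#include <stdio.h>

int main(void)
{
    unsigned page_size       = 4096;   /* main-area bytes per page (datasheet) */
    unsigned pages_per_block = 128;

    /* the erase block is the smallest unit `nand erase` works on */
    unsigned erase_block = page_size * pages_per_block;

    printf("erase block = %u KiB\n", erase_block / 1024);   /* 512 KiB here */
    return 0;
}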
Check below link for more information.
Updated link to TI E2E forum question
Regards,
Ankur