contact@embeddedgeeks.com
Embedded World

ARM Memory Management

The scope of this article is to understand the Memory Management Unit (MMU) of ARMv8-based processors. The MMU converts a virtual address (in the CPU's logical address space) into a physical address. For example, consider the following program:

#include <stdio.h>

int variable;
printf("Address of variable = %p\n", (void *)&variable);   /* %p is the portable specifier for a pointer */

The printed address could be anything; let's assume 0x40000200. However, 0x40000200 may or may not be the actual address of the variable in physical memory (RAM). That address could also be anything (let's assume 0xA0000200). Thus the CPU produces the logical address 0x40000200, which the Memory Management Unit converts into the physical address 0xA0000200.

Now the question remains: why do we require address translation at all? In other words, in the above program, why don't we operate directly on the physical address 0xA0000200?
Suppose a program requires a large amount of contiguous memory. The RAM may have enough free memory in total for the program/process, but not necessarily in one contiguous block. With an MMU, the program still accesses a contiguous logical address range, while the physical pages backing that range are scattered. The MMU, using page tables, maps each page of the contiguous logical range onto its scattered physical page.
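The following minimal C sketch illustrates the idea with a single-level page table. It is far simpler than the ARMv8 multi-level tables described below. Four virtually contiguous 64KB pages are mapped onto scattered physical frames. The frame numbers, `page_table` and `translate()` are hypothetical and were chosen only for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 16u                        /* assumed 64KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  4u

/* Hypothetical page table: virtual page n -> physical frame page_table[n].
   The virtual range is contiguous; the physical frames are not. */
static const uint32_t page_table[NUM_PAGES] = { 7, 2, 9, 4 };

/* Translate a VA (relative to the start of the mapped range).
   Returns 0 on success, -1 if the VA is outside the mapped range. */
static int translate(uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_SHIFT;       /* virtual page number */
    uint32_t offset = va & (PAGE_SIZE - 1);   /* byte offset inside the page */

    if (vpn >= NUM_PAGES)
        return -1;                            /* no mapping: a "page fault" */
    *pa = (page_table[vpn] << PAGE_SHIFT) | offset;
    return 0;
}
```

The consecutive virtual addresses 0x0FFFF and 0x10000 translate to 0x7FFFF and 0x20000, i.e. to frames 7 and 2, which are far apart in physical memory.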

If we draw a simple diagram of how the MMU is connected to RAM, it looks something like this:

Thus, whenever the CPU produces a virtual address, the MMU looks up the corresponding physical address in a table. Recently used translations are cached in the Translation Lookaside Buffer (TLB), a cache of recently accessed page translations inside the MMU.
Each TLB entry typically contains not just physical and Virtual Addresses, but also attributes such as memory type, cache policies, access permissions, the Address Space ID (ASID), and the Virtual Machine ID (VMID). If the TLB does not contain a valid translation for the Virtual Address issued by the processor, known as a TLB miss, an external translation table walk or lookup is performed. Dedicated hardware within the MMU enables it to read the translation tables in memory. The newly loaded translation can then be cached in the TLB for possible reuse if the translation table walk does not result in a page fault.
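The lookup-then-walk behaviour can be modelled with a simplified sketch. It makes several assumptions:

  • The TLB is tiny and fully associative, with round-robin replacement.
  • Entries are tagged with an ASID only (no VMID or memory attributes).
  • A hypothetical `table_walk()` with a made-up mapping stands in for the real hardware walk.

Real TLBs differ in size, associativity and replacement policy.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8u
#define PAGE_SHIFT  16u                /* assumed 64KB granule */

struct tlb_entry {
    bool     valid;
    uint16_t asid;                     /* Address Space ID of the owning process */
    uint64_t vpn;                      /* virtual page number */
    uint64_t pfn;                      /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];
static unsigned next_victim;           /* round-robin replacement pointer */
static unsigned walks;                 /* number of table walks (TLB misses) */

/* Hypothetical stand-in for the hardware table walk, with a made-up mapping. */
static bool table_walk(uint16_t asid, uint64_t vpn, uint64_t *pfn)
{
    (void)asid;
    walks++;
    *pfn = vpn + 0x100;
    return true;                       /* a real walk may end in a fault instead */
}

static bool tlb_translate(uint16_t asid, uint64_t va, uint64_t *pa)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    uint64_t off = va & ((1ull << PAGE_SHIFT) - 1);

    for (unsigned i = 0; i < TLB_ENTRIES; i++) {       /* TLB hit? */
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | off;
            return true;
        }
    }

    uint64_t pfn;                                      /* TLB miss: walk the tables */
    if (!table_walk(asid, vpn, &pfn))
        return false;                                  /* fault: nothing is cached */

    tlb[next_victim] = (struct tlb_entry){ true, asid, vpn, pfn };
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    *pa = (pfn << PAGE_SHIFT) | off;
    return true;
}
```

A second access to the same 64KB page with the same ASID hits in the TLB and does not walk the tables again. The same VA under a different ASID misses, because entries are tagged per address space.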

ARMv8 MMU Registers

Translation Table Base Register

  • In an ARMv8-based system, the base address of the translation tables in main memory is held in a special register called the Translation Table Base Register (TTBR0_ELx or TTBR1_EL1).
  • TTBR0 is selected if the upper bits of the VA (the bits above the region size set by TxSZ) are all 0s, and TTBR1 is selected if they are all 1s.
  • In ARMv8.0, EL2 and EL3 have a TTBR0 but no TTBR1, which means they use only the lower VA range. With the maximum 48-bit VA size, EL3 uses VAs from 0x0 to 0x0000FFFF_FFFFFFFF.
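The TTBR selection rule can be expressed as a small C function. This is a sketch for the EL1&0 regime only. It assumes TBI is disabled and that T0SZ and T1SZ lie between 16 and 39. `select_ttbr()` and its enum are hypothetical names.

```c
#include <stdint.h>

enum ttbr_sel { USE_TTBR0, USE_TTBR1, TRANSLATION_FAULT };

/* Pick the translation table base for a VA, given TCR_EL1.T0SZ and T1SZ.
   Assumes TBI = 0 and 16 <= TxSZ <= 39, so every shift below is well defined. */
static enum ttbr_sel select_ttbr(uint64_t va, unsigned t0sz, unsigned t1sz)
{
    unsigned bits0 = 64u - t0sz;       /* TTBR0 region covers VA[bits0-1:0] */
    unsigned bits1 = 64u - t1sz;       /* TTBR1 region covers the top 2^bits1 bytes */

    if ((va >> bits0) == 0)
        return USE_TTBR0;                              /* upper bits all 0s */
    if ((va >> bits1) == (UINT64_MAX >> bits1))
        return USE_TTBR1;                              /* upper bits all 1s */
    return TRANSLATION_FAULT;                          /* mixed upper bits */
}
```

With T0SZ = T1SZ = 22 (42-bit regions), 0x40000200 selects TTBR0 and 0xFFFFFFC000000000 selects TTBR1. The address 0x0000040000000000 faults, because its upper bits are neither all 0s nor all 1s.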

Translation Control Register (TCR_ELx)

  • Top Byte Ignore (TBI): when TBI is 1, the top 8 bits of the VA, VA[63:56], are ignored for address translation, so software can use them as a tag. When TBI is 0, the top bits of the VA (the top 16 bits for a 48-bit VA) must be all 0s or all 1s, that is, 0x0000 or 0xFFFF. Any other value triggers a translation fault.
  • IPS (Intermediate Physical Address Size): the maximum output address size of the translation ('000' = 32 bits, '101' = 48 bits).
  • Translation Granule (TG0/TG1): the granule size for the TTBR0 (user space) and TTBR1 (kernel) regions. The two fields use different encodings: TG0 '00' = 4KB, '01' = 64KB, '10' = 16KB; TG1 '01' = 16KB, '10' = 4KB, '11' = 64KB.
  • TxSZ (T0SZ/T1SZ): the size offset of the region, which covers 2^(64 - TxSZ) bytes. Together with the granule size, it determines how many levels of lookup (up to four) a translation requires.
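  • SHx: shareability of the memory used for translation table walks through TTBRx ('00' = Non-shareable, '10' = Outer Shareable, '11' = Inner Shareable).
  • ORGNx: outer cacheability of the memory used for translation table walks through TTBRx.
  • IRGNx: inner cacheability of the memory used for translation table walks through TTBRx.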
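The following sketch shows how these fields could be extracted from a TCR_EL1 value in software. The bit positions follow the ARMv8-A TCR_EL1 layout: T0SZ [5:0], TG0 [15:14], T1SZ [21:16], TG1 [31:30], IPS [34:32], TBI0 [37] and TBI1 [38]. The struct and function names are hypothetical. TCR_EL2 (without VHE) and TCR_EL3 use a different layout, so please verify against the Architecture Reference Manual before reusing this.

```c
#include <stdint.h>

struct tcr_el1_fields {
    unsigned t0sz, t1sz;      /* region size offsets */
    unsigned tg0_kb, tg1_kb;  /* granule sizes in KB, 0 = reserved encoding */
    unsigned ips_bits;        /* output address size in bits, 0 = reserved */
    unsigned tbi0, tbi1;      /* Top Byte Ignore for the TTBR0 / TTBR1 regions */
};

/* Extract 'width' bits of 'reg' starting at bit 'lsb'. */
static unsigned field(uint64_t reg, unsigned lsb, unsigned width)
{
    return (unsigned)((reg >> lsb) & ((1ull << width) - 1));
}

static unsigned decode_tg0(unsigned tg0)    /* 0b00 = 4KB, 0b01 = 64KB, 0b10 = 16KB */
{
    static const unsigned kb[4] = { 4, 64, 16, 0 };
    return kb[tg0 & 3u];
}

static unsigned decode_tg1(unsigned tg1)    /* 0b01 = 16KB, 0b10 = 4KB, 0b11 = 64KB */
{
    static const unsigned kb[4] = { 0, 16, 4, 64 };
    return kb[tg1 & 3u];
}

static unsigned decode_ips(unsigned ips)    /* 0b000 = 32 bits ... 0b101 = 48 bits */
{
    static const unsigned bits[8] = { 32, 36, 40, 42, 44, 48, 52, 0 };  /* 52: ARMv8.2-LPA */
    return bits[ips & 7u];
}

static struct tcr_el1_fields decode_tcr_el1(uint64_t tcr)
{
    struct tcr_el1_fields f;
    f.t0sz     = field(tcr, 0, 6);                /* T0SZ [5:0]   */
    f.tg0_kb   = decode_tg0(field(tcr, 14, 2));   /* TG0  [15:14] */
    f.t1sz     = field(tcr, 16, 6);               /* T1SZ [21:16] */
    f.tg1_kb   = decode_tg1(field(tcr, 30, 2));   /* TG1  [31:30] */
    f.ips_bits = decode_ips(field(tcr, 32, 3));   /* IPS  [34:32] */
    f.tbi0     = field(tcr, 37, 1);               /* TBI0 [37]    */
    f.tbi1     = field(tcr, 38, 1);               /* TBI1 [38]    */
    return f;
}
```

Note how the same 64KB granule is encoded as '01' in TG0 but '11' in TG1, which is an easy mistake to make when programming TCR_EL1 by hand.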

Example of a multi-level translation

Important formulas to consider

  1. Granule = log2(page_size)
  2. inputSize = 64 - UInt(TCR_ELn.TxSZ)
  3. stride = Granule - 3 (a table fills one granule and each descriptor is 8 = 2^3 bytes, so each level resolves Granule - 3 VA bits)
  4. level = 4 - ROUNDUP((inputSize - Granule) / stride), which is the level at which the walk starts
  5. At the starting level, AddrSelTop = inputSize - 1
  6. AddrSelBottom = (3 - level) * stride + Granule
  7. At each subsequent level, AddrSelTop becomes (previous AddrSelBottom - 1)
  8. TBD
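The formulas above can be sketched in C as follows. `va_slices()` and `struct va_slice` are hypothetical helpers, not names from ARM's pseudocode. The sketch assumes 8-byte descriptors and TxSZ values of at least 16, so the walk never starts below level 0.

```c
#include <stdint.h>

struct va_slice { int level; unsigned top, bottom; };  /* VA[top:bottom] indexes 'level' */

/* page_shift = log2(page size) = Granule. Fills out[] and returns the number of levels. */
static int va_slices(unsigned page_shift, unsigned txsz, struct va_slice out[4])
{
    unsigned input_size = 64u - txsz;                    /* inputSize */
    unsigned stride     = page_shift - 3u;               /* 8-byte descriptors */
    unsigned span       = input_size - page_shift;       /* VA bits resolved by tables */
    int level = 4 - (int)((span + stride - 1) / stride); /* 4 - ROUNDUP(span / stride) */
    unsigned bottom = input_size;                        /* so the first top = inputSize - 1 */
    int n = 0;

    for (int lvl = level; lvl < 4; lvl++) {
        unsigned top = bottom - 1;                       /* AddrSelTop */
        bottom = (unsigned)(3 - lvl) * stride + page_shift;  /* AddrSelBottom */
        out[n++] = (struct va_slice){ lvl, top, bottom };
    }
    return n;
}
```

For a 64KB page and TxSZ = 22 this yields two slices: level 2 with VA[41:29] and level 3 with VA[28:16]. For a 4KB page and TxSZ = 16 it yields four levels: VA[47:39], VA[38:30], VA[29:21] and VA[20:12].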

So, in the above diagram, consider a page size of 64KB and TxSZ = 22:

Granule = log2(64 * 1024) = 16
inputSize = 64 - 22 = 42
stride = 16 - 3 = 13
level = 4 - ROUNDUP((42 - 16) / 13) = 4 - ROUNDUP(26 / 13) = 4 - 2 = 2

addrSelBottom = inputSize;   // initial value, so that the first addrSelTop = inputSize - 1
for (level = 2; level < 4; level++) {
    addrSelTop = addrSelBottom - 1;
    addrSelBottom = (3 - level) * stride + Granule;   // level 2: 13 + 16 = 29; level 3: 0 + 16 = 16
    // use VA[addrSelTop:addrSelBottom] to index this level's table and fetch the next table/block/page entry
}
NOTE: In the above diagram, by the formula, addrSelTop for level 2 is bit 41 and addrSelBottom is bit 29 (for level 3 they are bits 28 and 16). Summarizing, we get:

If VA[63:42] are all 1s, TTBR1 provides the base address of the first (level 2) translation table. If VA[63:42] are all 0s, TTBR0 provides it.

The page table contains 8192 64-bit page table entries and is indexed via VA[41:29]. The MMU reads the pertinent level 2 page table entry from the table.

The MMU checks the level 2 page table entry for validity and whether the requested memory access is allowed. Assuming it is valid and the access is permitted, translation continues.

In the above figure, the level 2 page table entry holds the address of the level 3 page table (it is a table descriptor).

Bits [47:16] are taken from the level 2 page table entry and form the base address of the level 3 page table.
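The descriptor checks in these steps can be sketched as follows. It is a simplification that assumes stage 1 descriptors and a 64KB granule. It only looks at bits[1:0] and does not check whether a block descriptor is actually permitted at the given level. `classify()` and `next_address_64k()` are hypothetical helpers.

```c
#include <stdint.h>

enum desc_type { DESC_INVALID, DESC_BLOCK, DESC_TABLE, DESC_PAGE, DESC_RESERVED };

/* Classify a descriptor from bits[1:0] and the level at which it was read. */
static enum desc_type classify(uint64_t desc, int level)
{
    if ((desc & 1u) == 0)
        return DESC_INVALID;                             /* bit 0 clear: translation fault */
    if (level == 3)
        return (desc & 2u) ? DESC_PAGE : DESC_RESERVED;  /* 0b11 page; 0b01 reserved at level 3 */
    return (desc & 2u) ? DESC_TABLE : DESC_BLOCK;        /* 0b11 table; 0b01 block */
}

/* 64KB granule: next-level table base (or output address) comes from bits [47:16]. */
static uint64_t next_address_64k(uint64_t desc)
{
    return desc & 0x0000FFFFFFFF0000ull;                 /* drops attribute bits as well */
}
```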

Bits [28:16] of the VA are used to index the level 3 page table entry. The MMU reads the pertinent level 3 page table entry from the table.

The MMU checks the level 3 page table entry for validity and whether the requested memory access is allowed. Assuming it is valid and the access is permitted, the memory access is allowed.

In the above figure, the level 3 page table entry refers to a 64KB page (it is a page descriptor).

Bits [47:16] are taken from the level 3 page table entry and used to form PA[47:16].

Because we have a 64KB page, VA[15:0] is taken to form PA[15:0].

The full PA[47:0] is returned, along with additional information from the page table entries.
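The steps above can be combined into a hypothetical C simulation of the walk. It makes these assumptions:

  • A 64KB granule and TxSZ = 22, so the walk starts at level 2.
  • The TTBR has already been selected.
  • The two tables sit at made-up physical addresses.

Access permissions, block descriptors and attribute bits are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES   8192u                      /* 64KB table / 8-byte descriptors */
#define OA_MASK   0x0000FFFFFFFF0000ull      /* descriptor bits [47:16] */

/* Hypothetical physical addresses of the two tables in this simulation. */
#define L2_TABLE_PA 0x80000000ull
#define L3_TABLE_PA 0x80010000ull

static uint64_t l2_table[ENTRIES];
static uint64_t l3_table[ENTRIES];

/* Stand-in for "read the table stored at physical address pa". */
static uint64_t *table_at(uint64_t pa)
{
    if (pa == L2_TABLE_PA) return l2_table;
    if (pa == L3_TABLE_PA) return l3_table;
    return NULL;
}

/* Returns 0 and writes *pa on success, -1 on a translation fault. */
static int walk_64k(uint64_t ttbr, uint64_t va, uint64_t *pa)
{
    uint64_t *l2 = table_at(ttbr & OA_MASK);
    if (l2 == NULL) return -1;

    uint64_t d2 = l2[(va >> 29) & 0x1FFF];   /* index with VA[41:29] */
    if ((d2 & 3u) != 3u) return -1;          /* expect a table descriptor */

    uint64_t *l3 = table_at(d2 & OA_MASK);   /* bits [47:16] = level 3 table base */
    if (l3 == NULL) return -1;

    uint64_t d3 = l3[(va >> 16) & 0x1FFF];   /* index with VA[28:16] */
    if ((d3 & 3u) != 3u) return -1;          /* expect a page descriptor */

    *pa = (d3 & OA_MASK) | (va & 0xFFFFu);   /* PA[47:16] from entry, PA[15:0] = VA[15:0] */
    return 0;
}
```

For example, set `l2_table[2] = L3_TABLE_PA | 3` and `l3_table[0] = 0xA0000000 | (1 << 10) | 3`, a page descriptor with the Access Flag (bit 10) set. Then `walk_64k(L2_TABLE_PA, 0x40000200, &pa)` returns PA 0xA0000200, the mapping assumed at the start of this article.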

Presence of EL2

The virtualization extensions to the ARMv8-A architecture introduce a second stage of translation. When a hypervisor is present in the system, one or more guest operating systems might be present. The hypervisor must perform some extra translation steps in a two-stage process to share the physical memory system between the different guest operating systems. In the first stage, a Virtual Address (VA) is translated to an Intermediate Physical Address (IPA). This is usually under OS control. A second stage, controlled by the hypervisor, then translates the IPA to the final Physical Address (PA).
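The two-stage composition can be sketched with two deliberately simplified single-level maps, using 64KB pages and 16-entry address spaces; all of these details are hypothetical. In real hardware each stage is a multi-level walk, and the stage 1 table accesses are themselves IPAs that go through stage 2. This sketch omits that detail.

```c
#include <stdint.h>

#define PAGE_SHIFT 16u                 /* assumed 64KB pages at both stages */
#define NPAGES     16u                 /* tiny hypothetical address spaces */
#define UNMAPPED   UINT32_MAX

static uint32_t stage1[NPAGES];        /* guest OS:   VA page  -> IPA page */
static uint32_t stage2[NPAGES];        /* hypervisor: IPA page -> PA page  */

static void reset_maps(void)
{
    for (unsigned i = 0; i < NPAGES; i++)
        stage1[i] = stage2[i] = UNMAPPED;
}

/* One stage of translation: returns 0 and writes *out, or -1 if unmapped. */
static int lookup(const uint32_t *map, uint64_t in, uint64_t *out)
{
    uint64_t page = in >> PAGE_SHIFT;
    if (page >= NPAGES || map[page] == UNMAPPED)
        return -1;
    *out = ((uint64_t)map[page] << PAGE_SHIFT) | (in & ((1ull << PAGE_SHIFT) - 1));
    return 0;
}

/* Returns 0 on success, -1 for a stage 1 fault, -2 for a stage 2 fault. */
static int translate_two_stage(uint64_t va, uint64_t *pa)
{
    uint64_t ipa;
    if (lookup(stage1, va, &ipa) != 0)
        return -1;                     /* stage 1 fault: reported to the guest OS */
    if (lookup(stage2, ipa, pa) != 0)
        return -2;                     /* stage 2 fault: reported to the hypervisor */
    return 0;
}
```

The two distinct error codes mirror the ownership of each stage. A missing stage 1 mapping is the guest OS's problem, while a missing stage 2 mapping is handled by the hypervisor.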

Secure State EL3_MON

The Secure monitor at EL3 has its own dedicated translation tables. The table base address is specified in TTBR0_EL3 and configured via TCR_EL3. These translation tables can access both Secure and Non-secure Physical Addresses. TTBR0_EL3 is used only in Secure monitor (EL3) mode, not by the trusted kernel itself. When the transition to the Secure world has completed, the trusted kernel uses the EL1 translations, that is, the translation tables pointed to by TTBR0_EL1 and TTBR1_EL1. As these registers are not banked in AArch64, Secure monitor code must configure new tables for the Secure world and save and restore copies of TTBR0_EL1 and TTBR1_EL1.
The EL1 translation regime behaves differently in Secure state compared to its normal operation in Non-secure state. The second stage of translation is disabled, and the EL1 translation regime can point to both Secure and Non-secure Physical Addresses. Entries in the TLB are tagged as Secure or Non-secure, so no TLB maintenance is ever required when you transition between the Secure and Normal worlds.
