Contents
Welcome to the Computer Organisation and Architecture course!
According to the IITM course webpage, this course teaches the fundamentals of Computer Organization and Architecture and elaborates on the Application Binary Interfaces described in course CS2300.
Briefly, we will cover:
- Hardware-software interface, the ISA, assembly programming
- Context switching, interrupts and the corresponding ISA support
- Memory management, paging and the corresponding ISA support
- Memory hierarchy and caches
- Pipelining and instruction-level parallelism
In short, we will see how gate-level hardware is combined to get your laptops and systems running!!
The lab component will build deeper understanding by actually implementing these concepts in code.
Here are some quick links for you to follow:
Schedule
Resources
Textbooks
- PAT, HAN
- Shen
Additional resources
Paging
So far we have seen that an \(n\)-bit CPU can address \(2^n\) locations. For a byte-addressable 64-bit system that amounts to \(2^{64} = 2^{4} \cdot 2^{60}\) bytes, i.e. 16 exabytes (strictly, exbibytes)!!
FYI, Google data centers have 27 petabytes of storage!
We therefore use virtual memory. Memory in our programs is virtual, in the sense that a program can assume that the total available memory is \(2^{64}\) (or \(2^{32}\)) bytes. Typically, the picture we have in mind is the following:
Also, we want to run several programs (including our dear OS) at once, so keeping their code and data separate is extremely important to ensure functional correctness.
One solution is to allocate disjoint segments of memory to every program and keep track of the "address space" (the program's section of memory) associated with every process.
The programs we typically write are a few KBs in size, and perhaps a few MBs of stack are allocated, so allocating \(2^{32}\) bytes for every program is overkill and infeasible.
Another solution is to allocate space to programs dynamically, like the following:
The issue with this scheme is that we need to know beforehand how much space a program occupies and then find a hole (a free region of memory) that can accommodate it. It is possible that no single hole can accommodate the program, even though the total space in all the holes across memory would be sufficient.
Above all, this makes programming difficult, and we are proud lazy engineers!
The solution to the above problems is virtual memory with paging.
Virtual memory: The programmer assumes that the address space of the program is \(2^n\) bytes wide on an \(n\)-bit system. The address space starts at 0x0.
Paging: The main memory is divided into fixed-size chunks, typically 4 KB, called pages.
How does paging work?
The addresses generated by the CPU are virtual addresses; these are what the programmer specifies. To actually access data in RAM, we need a translator in hardware (see figure).
The virtual address space of the process is divided into pages. A 4 KB page holds \(2^{12}\) bytes, so \(2^{12}\) consecutive bytes are grouped together into one page. Assume we have a 32-bit system: the virtual address space then contains \(2^{32} / 2^{12} = 2^{20}\) virtual pages. Each virtual page needs to be mapped to a physical page (aka frame) in RAM, and this translation is done in hardware using a page table.
The frames backing a process's pages need not be contiguous in main memory.
Main memory (DRAM) is small compared to the virtual address space. Most of our programs sit on storage devices such as SSDs, which are meant only for storage; to run a program, we need to load its text section into main memory.
- When not all pages are needed, we can do demand paging, i.e., load only the pages that are required into main memory and write their translations to the page table.
- A present bit has to be stored with every translation to indicate whether the requested page is in main memory.
- Since main memory is limited, if another page of a process has to be loaded, we may need to replace a page. If the replaced page was not modified, we can simply discard it, since its copy already sits on the disk; otherwise we need to update the copy on the disk with the modified version of the replaced page.
- A dirty bit has to be maintained to record whether a given page was modified after being loaded into memory.
What exactly is happening?
Consider the instruction ld x1, 0(x2)
The user has placed some (virtual) address in x2 and wants to load the value stored there into register x1.
Assuming 32-bit addresses, the upper 20 bits (the virtual page number) are used to index the page table, which gives the physical page number (PPN).
Note that:
- the page table is indexed using the virtual page number (upper 20 bits)
- the PPN (frame number), shifted left by 12 bits, is the physical address of the first byte of that frame
- the offset inside the page does not change during translation.
After getting the PPN, the physical address is formed by appending the 12-bit offset: \(\text{PA} = \text{PPN} \times 2^{12} + \text{offset}\). This physical address is accessed and the value stored there is loaded into x1.
// more content ...
References
- Onur Mutlu's slides on virtual memory
- Prof. Gopal's slides for the 2025 offering
- Operating Systems: Three Easy Pieces (OSTEP)
- Prof. Smruti Sarangi's lecture
Lab 2 - paging
Recall from the lectures on paging how main memory is divided into chunks (generally of 4 KB) called pages, for efficient memory management and process isolation.
What does it mean in code?
- Note that the CPU generates addresses through load and store instructions. For instance,
ld x1, 0(x2)
expresses that the value stored at address 0 + (value of x2) should be written to x1. But this address is a virtual address, and we cannot use it directly to access memory.
- Thus we need an address translation system (stored as a look-up table indexed using the virtual page number).
- In any paging system (single- or multi-level), we only need the mapping between virtual and physical page numbers, since the offset within a page remains the same.
- Since each process has its own translation, the satp register specifies where the root page table is located in memory. Note that this is a physical address; in fact, all page table entries contain physical page numbers only, since the hardware page-table walk never deals with virtual addresses.
- So all we have to do is get the physical page number, index into that page using the offset, and we are all set.
Steps for the lab:
- In main, switch from machine mode to supervisor mode.
- Initialise your page tables as given below. We will use Sv39 paging.
  - In the data section of the code, set satp_config to 0x8000000000081000. This tells the hardware where to find the root page table when translation is needed.
    - Note that the top 4 bits specify the mode (8/9/10 for Sv39, Sv48 and Sv57 respectively). The root page table starts at physical address 0x81000000; its lower 12 bits are dropped, since we only need to specify the start of the page (its physical page number, 0x81000). Therefore the value becomes 0x8000000000081000.
    - Make sure to use .align 12. It aligns what follows to a \(2^{12}\) = 4096-byte boundary, which keeps data and instructions in different pages.
    - Map the page table entries to the pages shown. For example, to set the 0th entry of the root page table to point to the page with base address 0x82001000:
li t0, 0x81000000   # physical address of entry 0 of the root page table
li t1, 0x82001      # PPN of the next-level table at 0x82001000
slli t1, t1, 10     # the PPN field of a PTE starts at bit 10
ori t1, t1, 0x1     # V = 1 (non-leaf entry, so R = W = X = 0)
sd t1, 0(t0)        # store the 8-byte PTE
Recall that the lowest 10 bits of a PTE hold the flag bits (V, R, W, X, U, G, A, D and two RSW bits). For non-leaf entries we may get away with setting only the valid bit to 1.
- Now set the leaf page table's entries to the pages that contain your data and instructions.
  - The code section (without translation, bare metal) starts by default at 0x80000000. This should be retained, i.e., virtual address 0x80000000 should still map to physical address 0x80000000 after translation. Virtual address 0x0 should map to where the user code starts, i.e., 0x80001000, and 0x1000 should map to the data section at 0x80002000. (Hint: try to figure out which page table entries should be filled with which values; the three entries you should fill are indicated in the diagram.)
  - Note that now we will need more permissions!
    - DO NOT set the dirty bit to 1.
    - The USER bit should be 1 only in pages containing user code and data.
    - R, W, X, V, G and A should also be 1.
- From supervisor mode, prepare to jump into user mode.
- Copy-paste the following snippet, which configures satp and flushes the TLB:
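la t1, satp_config       # address of the satp value in the data section
ld t2, 0(t1)             # t2 = 0x8000000000081000
sfence.vma zero, zero    # flush all TLB entries
csrrw zero, satp, t2     # enable Sv39 with the root page table at 0x81000000
sfence.vma zero, zero    # flush again so no stale translations remain
li t4, 0
csrrw zero, sepc, t4     # sepc = virtual address 0x0 (start of the user code)
sret                     # jump to sepc in the mode held in sstatus.SPP (U-mode when SPP = 0)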
- Write the following user code (use .align 12):
user_code:
la t1,var1
la t2,var2
la t3,var3
la t4,var4
j user_code
- Add the following to the data section of your code:
.data
.align 12
var1: .word 1
var2: .word 2
var3: .word 3
var4: .word 4
- After jumping into user mode, you should observe in Spike that the addresses are now virtual. This means your translation was successful, and while writing user code you can freely assume addresses to be virtual and write programs accordingly!!
- To compile, use the following commands:
$ riscv64-unknown-elf-gcc -nostartfiles -T linker.ld <your-code>.S
$ riscv64-unknown-elf-objdump -D a.out > dump
Authors
Shishu the great