
Analysis of the ARM64 Linux Kernel Boot Process (Part 1)

  • Linux kernel version: 5.10.90
  • Hardware: NXP S32G-VNP-RDB2 (4 × Cortex-A53, ARM64)

1. ROM code

The ROM code loads the Linux bootloader from an external device (serial port, network, NAND flash, USB storage, or another disk device).

2. BootLoader

  • Initialize the system RAM and pass the RAM information to the kernel
  • Prepare the device tree blob (dtb) and pass its start address to the kernel
  • Decompress the kernel (optional)
  • Load the Linux kernel image and hand control over to it

Before jumping into the kernel, the following conditions must be met:

  • Quiesce all DMA-capable devices, so that memory cannot be corrupted by stray network packets or disk data.
  • Primary CPU general-purpose register settings:
    x0 = physical address of the device tree blob (dtb) in system RAM
    x1 = 0 (reserved for future use)
    x2 = 0 (reserved for future use)
    x3 = 0 (reserved for future use)
  • CPU mode:
    All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, IRQ and FIQ).
    The CPU must be in EL2 (recommended, to allow access to the virtualization extensions) or in non-secure EL1.
  • Caches, MMU:
    The MMU must be off.
    The instruction cache may be on or off.

A sketch of the Image header that a bootloader checks before making this jump follows the head.S excerpt below.

# @file: arch/arm64/kernel/head.S
# MMU off, D-cache off, I-cache may be on or off
The requirements are:
MMU = off, D-cache = off, I-cache = on or off,
x0 = physical address to the FDT blob.

# For performance, the bootloader very likely ran with the MMU and various caches
# enabled, and only put the CPU, caches and MMU into the states required by the
# ARM64 boot protocol when entering the kernel. Stale entries may therefore remain
# in the instruction cache and the TLB, and the kernel must invalidate them later.
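Concretely, what a bootloader inspects before making this jump is the 64-byte header at the start of the Image. A minimal C view of that header, following the layout described in Documentation/arm64/booting.rst (the struct name and helper here are for illustration only), might look like this:

#include <stdint.h>

/* 64-byte header at the start of the arm64 Image (little-endian fields). */
struct arm64_image_header {
	uint32_t code0;       /* executable code (the "b primary_entry" seen later) */
	uint32_t code1;       /* executable code                                    */
	uint64_t text_offset; /* image load offset from the start of RAM            */
	uint64_t image_size;  /* effective size of the kernel image                 */
	uint64_t flags;       /* endianness, page size, placement hints             */
	uint64_t res2;        /* reserved                                           */
	uint64_t res3;        /* reserved                                           */
	uint64_t res4;        /* reserved                                           */
	uint32_t magic;       /* 0x644d5241, i.e. "ARM\x64"                         */
	uint32_t res5;        /* reserved (offset to the PE header when EFI is on)  */
};

/* Illustrative sanity check a loader could perform. */
static inline int is_arm64_image(const struct arm64_image_header *h)
{
	return h->magic == 0x644d5241u;
}

The words visible in the disassembly in section 3.1.1 below match these fields: 0x00d70000 at offset 0x10 is the low word of image_size, 0x0000000a at 0x18 is flags, and 0x644d5241 at 0x38 is the magic.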

3. Linux Kernel Boot

3.1 Where the Linux Kernel Starts

The Linux kernel image comes in several forms; the board used here boots the uncompressed Image.

  • vmlinux: the raw kernel produced by the linker, an uncompressed ELF image that still carries symbol and relocation information
  • Image: uncompressed raw binary, generated from vmlinux with objcopy -O binary; the symbol and relocation information is discarded, leaving only the binary data
  • zImage / Image.gz: the raw binary compressed with gzip

The easiest way to find the kernel's startup entry is to read existing articles online; the entry point also carries a "Kernel startup entry point" comment, so searching for that string leads straight to head.S.

This article instead disassembles vmlinux, finds the first instruction the kernel executes, and works back from that instruction to the kernel entry point.

3.1.1 Disassembling vmlinux to Find the Kernel Entry Point

First, look at the vmlinux ELF header: the entry point address is 0xffffffc010000000.

Note: where is this entry address set? It is the link-time address of the kernel image, KIMAGE_VADDR plus the text offset, which in this configuration sits 256 MB (0x10000000) above the base of the kernel image region.

fzy@fzy-Lenovo:~/Documents/05_NXP_S32G/linux_kernel$ readelf -h ./vmlinux
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
OS/ABI: UNIX - System V
Machine: AArch64
Entry point address: 0xffffffc010000000

Disassemble vmlinux:

<path to>/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -dxh ./vmlinux > ./vmlinux.S
# @file: vmlinux.S
# symbol table
SYMBOL TABLE:
ffffffc010000000 l d .head.text 0000000000000000 .head.text
......
0000000000000000 l df *ABS* 0000000000000000 arch/arm64/kernel/head.o
ffffffc010000000 l .head.text 0000000000000000 _head
ffffffc010b40020 l .init.text 0000000000000020 preserve_boot_args
# ------------------------------------------------------------
Disassembly of section .head.text:

ffffffc010000000 <_text>:
ffffffc010000000: 142d0000 b ffffffc010b40000 <primary_entry>
...
ffffffc010000010: 00d70000 .word 0x00d70000
ffffffc010000014: 00000000 .word 0x00000000
ffffffc010000018: 0000000a .word 0x0000000a
...
ffffffc010000038: 644d5241 .word 0x644d5241
ffffffc01000003c: 00000000 .word 0x00000000

From vmlinux.S, the first instruction the kernel executes is:

ffffffc010000000:       142d0000        b       ffffffc010b40000 <primary_entry>

The instruction lives in the .head.text section at the _text symbol; the symbol table also shows arch/arm64/kernel/head.o, whose source file is head.S.

So execution starts in arch/arm64/kernel/head.S, and the first instruction is b primary_entry.

The entry point can also be read from System.map:

ffffffc010000000 t _head
ffffffc010000000 T _text

// @file: arch/arm64/kernel/head.S
/*
* Kernel startup entry point.
* ---------------------------
*/
__HEAD
_head:
/*
* DO NOT MODIFY. Image header expected by Linux boot-loaders.
*/
#ifdef CONFIG_EFI
/*
* This add instruction has no meaningful effect except that
* its opcode forms the magic "MZ" signature required by UEFI.
*/
add x13, x18, #0x16
b primary_entry
#else
b primary_entry // branch to kernel start, magic; execution jumps from here
.long 0 // reserved
#endif
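As an aside, the "MZ" trick in the CONFIG_EFI branch can be checked with a few lines of C (my own illustration): encode the add x13, x18, #0x16 instruction by hand and look at its first two bytes in memory.

#include <stdint.h>
#include <stdio.h>

/* ADD (immediate), 64-bit: sf | 0 0 | 100010 | sh | imm12 | Rn | Rd.
 * With imm12 = 0x16, Rn = x18, Rd = x13 the little-endian bytes start with
 * 0x4D 0x5A, i.e. the "MZ" signature UEFI expects at offset 0. */
int main(void)
{
	uint32_t insn = (1u << 31) | (0x22u << 23) | (0x16u << 10) | (18u << 5) | 13u;

	printf("opcode = 0x%08x, leading bytes = '%c%c'\n",
	       insn, insn & 0xff, (insn >> 8) & 0xff);
	return 0;
}

On this board the point is moot, since CONFIG_EFI is not set, as the next paragraph shows.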

Note that CONFIG_EFI is indeed not set in this configuration:

# Boot options 
#
CONFIG_CMDLINE="console=ttyLF0"
# CONFIG_CMDLINE_FORCE is not set
# CONFIG_EFI is not set
# end of Boot options

3.2 Stage One (Assembly)

3.2.1 primary_entry

// @file: arch/arm64/kernel/head.S
// around line 105
SYM_CODE_START(primary_entry)
bl preserve_boot_args
bl el2_setup // Drop to EL1, w0=cpu_boot_mode
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl set_cpu_boot_mode_flag
bl __create_page_tables
/*
* The following calls CPU setup code, see arch/arm64/mm/proc.S for
* details.
* On return, the CPU will be ready for the MMU to be turned on and
* the TCR will have been set.
*/
bl __cpu_setup // initialise processor
b __primary_switch
SYM_CODE_END(primary_entry)

3.2.2 preserve_boot_args

Purpose: save the four registers x0..x3 passed in by the bootloader.

  • The ARM64 boot protocol constrains these four registers strictly: x0 holds the physical address of the dtb, and x1..x3 must be 0. They are stored into the boot_args array, and setup_arch() later reads boot_args and validates them (a sketch of that check follows the listing below).

  • A DMB barrier guarantees that the stp stores have completed before the dc ivac cache-maintenance instruction runs.

  • The cache lines covering boot_args are then cleaned and invalidated: for the tail call to __inval_dcache_area, x0 is reused as the start address of boot_args and x1 holds its size (0x20, i.e. 4 x 8 bytes).

For __inval_dcache_area, see arch/arm64/mm/cache.S:
Ensure that any D-cache lines for the interval [kaddr, kaddr+size) are invalidated. Any partial lines at the ends of the interval are also cleaned to PoC to prevent data loss.

/*
* Preserve the arguments passed by the bootloader in x0 .. x3
*/
SYM_CODE_START_LOCAL(preserve_boot_args)
mov x21, x0 // x21=FDT: stash the dtb address in x21 so x0 can be reused
// x0 now holds the address of the boot_args variable
adr_l x0, boot_args // record the contents of
stp x21, x1, [x0] // x0 .. x3 at kernel entry
stp x2, x3, [x0, #16]

dmb sy // needed before dc ivac with
// MMU off

mov x1, #0x20 // 4 x 8 bytes
b __inval_dcache_area // tail call
SYM_CODE_END(preserve_boot_args)
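The check in setup_arch() mentioned above looks roughly like this (paraphrased from arch/arm64/kernel/setup.c, not a verbatim copy; the helper name is mine, in 5.10 the check sits inline in setup_arch()):

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/types.h>

/* Filled in by preserve_boot_args in head.S. */
u64 __cacheline_aligned boot_args[4];

static void __init check_boot_args(void)
{
	/* The boot protocol requires x1..x3 to be zero at kernel entry. */
	if (boot_args[1] || boot_args[2] || boot_args[3])
		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol\n");
}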

3.2.3 el2_setup

If the CPU entered the kernel in EL2, el2_setup configures EL2 state and drops back to EL1; the boot mode (EL1 or EL2) is returned in w0. CurrentEL, read at the top of the routine, encodes the exception level in bits [3:2] (see the note after the listing).

/*
* If we're fortunate enough to boot at EL2, ensure that the world is
* sane before dropping to EL1.
*
* Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in w0 if
* booted in EL1 or EL2 respectively.
*/
SYM_FUNC_START(el2_setup)
msr SPsel, #1 // We want to use SP_EL{1,2}
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eq 1f
mov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)
msr sctlr_el1, x0
mov w0, #BOOT_CPU_MODE_EL1 // This cpu booted in EL1
isb
ret

1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
msr sctlr_el2, x0
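As a side note, a freestanding helper to read the current exception level (my own illustration, not kernel code) could look like the following; head.S compares the raw register value against CurrentEL_EL2, which is 2 << 2:

#include <stdint.h>

/* CurrentEL[3:2] holds the exception level: 0 = EL0, 1 = EL1, 2 = EL2, 3 = EL3. */
static inline unsigned int current_el(void)
{
	uint64_t el;

	__asm__ volatile("mrs %0, CurrentEL" : "=r"(el));
	return (unsigned int)(el >> 2) & 0x3;
}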

3.2.4 set_cpu_boot_mode_flag

Record the boot mode passed in w0 in the global variable __boot_cpu_mode (the C helpers that read it back are sketched after the listings below).

/*
* Sets the __boot_cpu_mode flag depending on the CPU boot mode passed
* in w0. See arch/arm64/include/asm/virt.h for more info.
*/
SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
adr_l x1, __boot_cpu_mode
cmp w0, #BOOT_CPU_MODE_EL2
b.ne 1f
add x1, x1, #4
1: str w0, [x1] // This CPU has booted in EL1
dmb sy
dc ivac, x1 // Invalidate potentially stale cache line
ret
SYM_FUNC_END(set_cpu_boot_mode_flag)
/*
* We need to find out the CPU boot mode long after boot, so we need to
* store it in a writable variable.
*
* This is not in .bss, because we set it sufficiently early that the boot-time
* zeroing of .bss would clobber it.
*/
SYM_DATA_START(__boot_cpu_mode)
.long BOOT_CPU_MODE_EL2
.long BOOT_CPU_MODE_EL1
SYM_DATA_END(__boot_cpu_mode)
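The two slots of __boot_cpu_mode are consumed later from C; the helpers in arch/arm64/include/asm/virt.h behave roughly as follows (paraphrased, constant values copied from that header):

#include <linux/types.h>

#define BOOT_CPU_MODE_EL1	(0xe11)
#define BOOT_CPU_MODE_EL2	(0xe12)

/* Slot 0 defaults to EL2 and is overwritten by any CPU that entered in EL1;
 * slot 1 defaults to EL1 and is overwritten by any CPU that entered in EL2.
 * EL2 features (KVM, ...) are only usable if every CPU booted in EL2. */
extern u32 __boot_cpu_mode[2];

static inline bool is_hyp_mode_available(void)
{
	return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
		__boot_cpu_mode[1] == BOOT_CPU_MODE_EL2);
}

static inline bool is_hyp_mode_mismatched(void)
{
	return __boot_cpu_mode[0] != __boot_cpu_mode[1];
}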

3.2.5 __create_page_tables

Set up the initial page tables.

  • Identity mapping
    • Map the .idmap.text region (from __idmap_text_start to __idmap_text_end) so that the virtual addresses of that code equal its physical addresses
    • The identity mapping guarantees that the code around the point where the MMU is turned on keeps executing smoothly

https://stackoverflow.com/questions/16688540/page-table-in-linux-kernel-space-during-boot/27266309#27266309
1) Before the MMU is enabled, every address the CPU generates is a physical address; once it is enabled, every address is virtual.
2) The CPU is pipelined: while executing the current instruction it has very likely already generated the address of the next one. If that happens before the MMU is switched on, the address is a physical address.
So suppose the current instruction is the one that turns on the MMU: the CPU has already produced the (physical) address of the next instruction. Once the MMU takes effect, that address is treated as a virtual address and looked up in the page tables. With an identity mapping the lookup returns the same physical address, so the fetch still lands on the right instruction.

  • Map the kernel image
    • Map the image itself, from the runtime __pa(_text) to __pa(_end), at its link-time virtual addresses (KIMAGE_VADDR plus any KASLR displacement) in init_pg_dir

The virtual address width can be configured up to 52 bits (36, 39, 42, 47, 48 or 52).
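To make the translation arithmetic concrete, here is a small standalone sketch (my own illustration, not kernel code) that splits a virtual address into the three table indices used by a 39-bit VA, 4 KB-granule configuration:

#include <stdint.h>
#include <stdio.h>

/* 39-bit VA with a 4 KB granule: three levels, 512 entries (9 bits) each.
 * VA = [38:30] PGD index | [29:21] PMD index | [20:12] PTE index | [11:0] offset. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PGDIR_SHIFT	30
#define PTRS_PER_TABLE	512

int main(void)
{
	uint64_t va = 0xffffffc010000000ULL;	/* the kernel entry address seen above */

	printf("pgd=%llu pmd=%llu pte=%llu offset=0x%llx\n",
	       (unsigned long long)((va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1)),
	       (unsigned long long)((va >> PMD_SHIFT) & (PTRS_PER_TABLE - 1)),
	       (unsigned long long)((va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)),
	       (unsigned long long)(va & ((1ULL << PAGE_SHIFT) - 1)));
	return 0;
}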

/*
* Setup the initial page tables. We only setup the barest amount which is
* required to get the kernel running. The following sections are required:
* - identity mapping to enable the MMU (low address, TTBR0)
* - first few MB of the kernel linear mapping to jump to once the MMU has
* been enabled
*/
SYM_FUNC_START_LOCAL(__create_page_tables)
mov x28, lr

/*
* Invalidate the init page tables to avoid potential dirty cache lines
* being evicted. Other page tables are allocated in rodata as part of
* the kernel image, and thus are clean to the PoC per the boot
* protocol.
*/
adrp x0, init_pg_dir
adrp x1, init_pg_end
sub x1, x1, x0
bl __inval_dcache_area

/*
* Clear the init page tables.
*/
adrp x0, init_pg_dir
adrp x1, init_pg_end
sub x1, x1, x0
1: stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
stp xzr, xzr, [x0], #16
subs x1, x1, #64
b.ne 1b

mov x7, SWAPPER_MM_MMUFLAGS

/*
* Create the identity mapping.
*/
adrp x0, idmap_pg_dir
adrp x3, __idmap_text_start // __pa(__idmap_text_start)

#ifdef CONFIG_ARM64_VA_BITS_52
mrs_s x6, SYS_ID_AA64MMFR2_EL1
and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
mov x5, #52
cbnz x6, 1f
#endif
mov x5, #VA_BITS_MIN
1:
adr_l x6, vabits_actual
str x5, [x6]
dmb sy
dc ivac, x6 // Invalidate potentially stale cache line

/*
* VA_BITS may be too small to allow for an ID mapping to be created
* that covers system RAM if that is located sufficiently high in the
* physical address space. So for the ID map, use an extended virtual
* range in that case, and configure an additional translation level
* if needed.
*
* Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
* entire ID map region can be mapped. As T0SZ == (64 - #bits used),
* this number conveniently equals the number of leading zeroes in
* the physical address of __idmap_text_end.
*/
adrp x5, __idmap_text_end
clz x5, x5
cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
b.ge 1f // .. then skip VA range extension

adr_l x6, idmap_t0sz
str x5, [x6]
dmb sy
dc ivac, x6 // Invalidate potentially stale cache line

#if (VA_BITS < 48)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))

/*
* If VA_BITS < 48, we have to configure an additional table level.
* First, we have to verify our assumption that the current value of
* VA_BITS was chosen such that all translation levels are fully
* utilised, and that lowering T0SZ will always result in an additional
* translation level to be configured.
*/
#if VA_BITS != EXTRA_SHIFT
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endif

mov x4, EXTRA_PTRS
create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
#else
/*
* If VA_BITS == 48, we don't have to configure an additional
* translation level, but the top-level table has more entries.
*/
mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
str_l x4, idmap_ptrs_per_pgd, x5
#endif
1:
ldr_l x4, idmap_ptrs_per_pgd
mov x5, x3 // __pa(__idmap_text_start)
adr_l x6, __idmap_text_end // __pa(__idmap_text_end)

map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14

/*
* Map the kernel image (starting with PHYS_OFFSET).
*/
adrp x0, init_pg_dir
mov_q x5, KIMAGE_VADDR // compile time __va(_text)
add x5, x5, x23 // add KASLR displacement
mov x4, PTRS_PER_PGD
adrp x6, _end // runtime __pa(_end)
adrp x3, _text // runtime __pa(_text)
sub x6, x6, x3 // _end - _text
add x6, x6, x5 // runtime __va(_end)

map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14

/*
* Since the page tables have been populated with non-cacheable
* accesses (MMU disabled), invalidate those tables again to
* remove any speculatively loaded cache lines.
*/
dmb sy

adrp x0, idmap_pg_dir
adrp x1, idmap_pg_end
sub x1, x1, x0
bl __inval_dcache_area

adrp x0, init_pg_dir
adrp x1, init_pg_end
sub x1, x1, x0
bl __inval_dcache_area

ret x28
SYM_FUNC_END(__create_page_tables)

3.2.6 __cpu_setup

CPU initialization:

  • Caches and TLB
    • Invalidate them
  • Configure TCR_EL1 and SCTLR_EL1
    • Kernel space and user space use different page tables, so there are two Translation Table Base Registers (TTBR0/TTBR1), forming two address translation regimes; TCR_EL1 controls both of them (a small illustration of the TnSZ arithmetic follows this list)
    • SCTLR_EL1 is the control register for the system as a whole, including the memory system
  • Prepare the CPU for the MMU to be turned on
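TCR_EL1.T0SZ/T1SZ encode each VA range as 64 minus the number of VA bits; a tiny sketch of that arithmetic (mirroring the kernel's TCR_TxSZ macro, values for the 39-bit configuration used here):

#include <stdio.h>

/* TnSZ = 64 - VA_BITS: the number of most-significant address bits that must
 * be all-0 (TTBR0 range) or all-1 (TTBR1 range). */
#define VA_BITS 39

int main(void)
{
	unsigned int tnsz = 64 - VA_BITS;

	printf("T0SZ = T1SZ = %u, each range covers 2^%d bytes (512 GB)\n",
	       tnsz, VA_BITS);
	return 0;
}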
/*
* __cpu_setup
*
* Initialise the processor for turning the MMU on.
*
* Output:
* Return in x0 the value of the SCTLR_EL1 register.
*/
.pushsection ".idmap.text", "awx"
SYM_FUNC_START(__cpu_setup)
tlbi vmalle1 // Invalidate local TLB
dsb nsh

mov x1, #3 << 20
msr cpacr_el1, x1 // Enable FP/ASIMD
mov x1, #1 << 12 // Reset mdscr_el1 and disable
msr mdscr_el1, x1 // access to the DCC from EL0
isb // Unmask debug exceptions now,
enable_dbg // since this is per-cpu
reset_pmuserenr_el0 x1 // Disable PMU access from EL0
reset_amuserenr_el0 x1 // Disable AMU access from EL0

/*
* Memory region attributes
*/
mov_q x5, MAIR_EL1_SET
#ifdef CONFIG_ARM64_MTE
/*
* Update MAIR_EL1, GCR_EL1 and TFSR*_EL1 if MTE is supported
* (ID_AA64PFR1_EL1[11:8] > 1).
*/
mrs x10, ID_AA64PFR1_EL1
ubfx x10, x10, #ID_AA64PFR1_MTE_SHIFT, #4
cmp x10, #ID_AA64PFR1_MTE
b.lt 1f

/* Normal Tagged memory type at the corresponding MAIR index */
mov x10, #MAIR_ATTR_NORMAL_TAGGED
bfi x5, x10, #(8 * MT_NORMAL_TAGGED), #8

/* initialize GCR_EL1: all non-zero tags excluded by default */
mov x10, #(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK)
msr_s SYS_GCR_EL1, x10

/*
* If GCR_EL1.RRND=1 is implemented the same way as RRND=0, then
* RGSR_EL1.SEED must be non-zero for IRG to produce
* pseudorandom numbers. As RGSR_EL1 is UNKNOWN out of reset, we
* must initialize it.
*/
mrs x10, CNTVCT_EL0
ands x10, x10, #SYS_RGSR_EL1_SEED_MASK
csinc x10, x10, xzr, ne
lsl x10, x10, #SYS_RGSR_EL1_SEED_SHIFT
msr_s SYS_RGSR_EL1, x10

/* clear any pending tag check faults in TFSR*_EL1 */
msr_s SYS_TFSR_EL1, xzr
msr_s SYS_TFSRE0_EL1, xzr
1:
#endif
msr mair_el1, x5
/*
* Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for
* both user and kernel.
*/
mov_q x10, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
TCR_TBI0 | TCR_A1 | TCR_KASAN_FLAGS
tcr_clear_errata_bits x10, x9, x5

#ifdef CONFIG_ARM64_VA_BITS_52
ldr_l x9, vabits_actual
sub x9, xzr, x9
add x9, x9, #64
tcr_set_t1sz x10, x9
#else
ldr_l x9, idmap_t0sz
#endif
tcr_set_t0sz x10, x9

/*
* Set the IPS bits in TCR_EL1.
*/
tcr_compute_pa_size x10, #TCR_IPS_SHIFT, x5, x6
#ifdef CONFIG_ARM64_HW_AFDBM
/*
* Enable hardware update of the Access Flags bit.
* Hardware dirty bit management is enabled later,
* via capabilities.
*/
mrs x9, ID_AA64MMFR1_EL1
and x9, x9, #0xf
cbz x9, 1f
orr x10, x10, #TCR_HA // hardware Access flag update
1:
#endif /* CONFIG_ARM64_HW_AFDBM */
msr tcr_el1, x10
/*
* Prepare SCTLR
*/
mov_q x0, SCTLR_EL1_SET
ret // return to head.S
SYM_FUNC_END(__cpu_setup)

3.2.7 __primary_switch

__primary_switch turns the MMU on via __enable_mmu and then jumps to __primary_switched at its link-time (virtual) address. With CONFIG_RANDOMIZE_BASE it may come back here once: if kaslr_early_init() reports a non-zero offset, the MMU is temporarily disabled, the kernel mapping is recreated with the KASLR displacement, and the image is relocated again before the final jump.

SYM_FUNC_START_LOCAL(__primary_switch)
#ifdef CONFIG_RANDOMIZE_BASE
mov x19, x0 // preserve new SCTLR_EL1 value
mrs x20, sctlr_el1 // preserve old SCTLR_EL1 value
#endif

adrp x1, init_pg_dir
bl __enable_mmu // turn on the MMU
#ifdef CONFIG_RELOCATABLE
#ifdef CONFIG_RELR
mov x24, #0 // no RELR displacement yet
#endif
bl __relocate_kernel
#ifdef CONFIG_RANDOMIZE_BASE
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
blr x8

/*
* If we return here, we have a KASLR displacement in x23 which we need
* to take into account by discarding the current kernel mapping and
* creating a new one.
*/
pre_disable_mmu_workaround
msr sctlr_el1, x20 // disable the MMU
isb
bl __create_page_tables // recreate kernel mapping

tlbi vmalle1 // Remove any stale TLB entries
dsb nsh
isb

msr sctlr_el1, x19 // re-enable the MMU
isb
ic iallu // flush instructions fetched
dsb nsh // via old mapping
isb

bl __relocate_kernel
#endif
#endif
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
br x8
SYM_FUNC_END(__primary_switch)

3.2.8 __enable_mmu

Turn on the MMU: TTBR0_EL1 is loaded with idmap_pg_dir (the identity mapping), TTBR1_EL1 with the page directory passed in x1 (init_pg_dir here), and finally SCTLR_EL1 with the value prepared by __cpu_setup.

/*
* Enable the MMU.
*
* x0 = SCTLR_EL1 value for turning on the MMU.
* x1 = TTBR1_EL1 value
*
* Returns to the caller via x30/lr. This requires the caller to be covered
* by the .idmap.text section.
*
* Checks if the selected granule size is supported by the CPU.
* If it isn't, park the CPU
*/
SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
ubfx x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
b.ne __no_granule_support
update_early_cpu_boot_status 0, x2, x3
adrp x2, idmap_pg_dir
phys_to_ttbr x1, x1
phys_to_ttbr x2, x2
msr ttbr0_el1, x2 // load TTBR0
offset_ttbr1 x1, x3
msr ttbr1_el1, x1 // load TTBR1
isb
msr sctlr_el1, x0
isb
/*
* Invalidate the local I-cache so that any instructions fetched
* speculatively from the PoC are discarded, since they may have
* been dynamically patched at the PoU.
*/
ic iallu
dsb nsh
isb
ret
SYM_FUNC_END(__enable_mmu)

3.2.9 __primary_switched

Prepare the environment for C code: set up the init task's stack and SP_EL0, install the exception vector table, save the FDT pointer and the kernel VA/PA offset, clear .bss, handle KASLR if enabled, and finally branch to start_kernel.

/*
* The following fragment of code is executed with the MMU enabled.
*
* x0 = __PHYS_OFFSET
*/
SYM_FUNC_START_LOCAL(__primary_switched)
adrp x4, init_thread_union
add sp, x4, #THREAD_SIZE
adr_l x5, init_task
msr sp_el0, x5 // Save thread_info

#ifdef CONFIG_ARM64_PTR_AUTH
__ptrauth_keys_init_cpu x5, x6, x7, x8
#endif

adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
isb

stp xzr, x30, [sp, #-16]!
mov x29, sp

#ifdef CONFIG_SHADOW_CALL_STACK
adr_l scs_sp, init_shadow_call_stack // Set shadow call stack
#endif

str_l x21, __fdt_pointer, x5 // Save FDT pointer

ldr_l x4, kimage_vaddr // Save the offset between
sub x4, x4, x0 // the kernel virtual and
str_l x4, kimage_voffset, x5 // physical mappings

// Clear BSS
adr_l x0, __bss_start
mov x1, xzr
adr_l x2, __bss_stop
sub x2, x2, x0
bl __pi_memset
dsb ishst // Make zero page visible to PTW

#ifdef CONFIG_KASAN
bl kasan_early_init
#endif
#ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
b.ne 0f
mov x0, x21 // pass FDT address in x0
bl kaslr_early_init // parse FDT for KASLR options
cbz x0, 0f // KASLR disabled? just proceed
orr x23, x23, x0 // record KASLR offset
ldp x29, x30, [sp], #16 // we must enable KASLR, return
ret // to __primary_switch()
0:
#endif
add sp, sp, #16
mov x29, #0
mov x30, #0
b start_kernel
SYM_FUNC_END(__primary_switched)
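The kimage_voffset value saved above is the offset between the kernel image's virtual and physical mappings, and later code uses it to convert kernel-image addresses between the two. A standalone sketch (the physical load address below is purely hypothetical):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t kimage_vaddr   = 0xffffffc010000000ULL;	/* link address of _text       */
	uint64_t phys_base      = 0x80080000ULL;		/* assumed runtime __pa(_text) */
	uint64_t kimage_voffset = kimage_vaddr - phys_base;

	/* Converting a kernel-image virtual address back to physical: */
	uint64_t sym_va = kimage_vaddr + 0x1000;
	printf("pa = 0x%llx\n", (unsigned long long)(sym_va - kimage_voffset));
	return 0;
}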

3.2.10 start_kernel

The assembly stage ends here. The second stage begins, and it is written in C.

3.3 Stage One Summary

TODO

