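Once a block of memory has been allocated to an application, the way code is organized creates a need to divide that memory into sections.
For example, it can be divided into a code section, a data section, a read-only data section, a stack section, an uninitialized data section, and so on. In the GAS assembler, the .section directive specifies the section name. I can't afford the ARM compiler, so I will ignore it here.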
Section description | Default section name |
---|---|
Code section | .text |
Initialized data section | .data |
Uninitialized data section | .bss |
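BSS is short for Block Started by Symbol; the section simply reserves space for symbols.
Why separate the instruction section from the data section? The answer goes back to the von Neumann and Harvard architectures.
In the von Neumann architecture, instructions and data live in the same memory; Intel's 8086, MIPS, and ARM7 are all built this way. Because instructions and data travel over the same bus, they contend for it. The Harvard architecture, by contrast, separates instruction storage from data storage. Starting with ARM9, ARM chips use the Harvard architecture: instructions and data each have their own bus and their own cache, which can improve the cache hit rate.
Even under the von Neumann architecture, code and data are usually stored in different regions of the same memory.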
Let's look at some assembly generated by gcc -S.
The source is the most clichéd example there is, hello world:

    #include <stdio.h>

    int main()
    {
        printf("Hello,Word!\n");
    }

The generated assembly is as follows:

            .file   "hello.cc"
            .section        .rodata
    .LC0:
            .string "Hello,Word!"
            .text
            .globl  main
            .type   main, @function
    main:
    .LFB0:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            movl    $.LC0, %edi
            call    puts
            movl    $0, %eax
            popq    %rbp
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
    .LFE0:
            .size   main, .-main
            .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
            .section        .note.GNU-stack,"",@progbits

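Standard sections such as .text, .data, and .bss have directives of their own, so they need no .section in front.
Placing the string in a region like .rodata, as this example does, requires declaring it with the .section directive. The .rodata section roughly corresponds to the C++ const keyword: it sets aside a dedicated region for constants. We can use objdump -h to see how many sections our hello world program was compiled into. Quite a few, as it turns out: 26 in total.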

    a.out:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .interp       0000001c  0000000000400238  0000000000400238  00000238  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      1 .note.ABI-tag 00000020  0000000000400254  0000000000400254  00000254  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .note.gnu.build-id 00000024  0000000000400274  0000000000400274  00000274  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .gnu.hash     0000001c  0000000000400298  0000000000400298  00000298  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .dynsym       00000060  00000000004002b8  00000000004002b8  000002b8  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      5 .dynstr       0000003d  0000000000400318  0000000000400318  00000318  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .gnu.version  00000008  0000000000400356  0000000000400356  00000356  2**1
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      7 .gnu.version_r 00000020  0000000000400360  0000000000400360  00000360  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 .rela.dyn     00000018  0000000000400380  0000000000400380  00000380  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      9 .rela.plt     00000030  0000000000400398  0000000000400398  00000398  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     10 .init         00000018  00000000004003c8  00000000004003c8  000003c8  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     11 .plt          00000030  00000000004003e0  00000000004003e0  000003e0  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     12 .text         000001d8  0000000000400410  0000000000400410  00000410  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     13 .fini         0000000e  00000000004005e8  00000000004005e8  000005e8  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     14 .rodata       00000010  00000000004005f8  00000000004005f8  000005f8  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     15 .eh_frame_hdr 0000002c  0000000000400608  0000000000400608  00000608  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     16 .eh_frame     000000a4  0000000000400638  0000000000400638  00000638  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     17 .ctors        00000010  0000000000600e28  0000000000600e28  00000e28  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     18 .dtors        00000010  0000000000600e38  0000000000600e38  00000e38  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     19 .jcr          00000008  0000000000600e48  0000000000600e48  00000e48  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     20 .dynamic      00000190  0000000000600e50  0000000000600e50  00000e50  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     21 .got          00000008  0000000000600fe0  0000000000600fe0  00000fe0  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     22 .got.plt      00000028  0000000000600fe8  0000000000600fe8  00000fe8  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     23 .data         00000010  0000000000601010  0000000000601010  00001010  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     24 .bss          00000010  0000000000601020  0000000000601020  00001020  2**3
                      ALLOC
     25 .comment      0000002a  0000000000000000  0000000000000000  00001020  2**0
                      CONTENTS, READONLY

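Fortunately, not every language is this crazy. Let's look at the sections that Go generates.
Apart from the debug-related ones there are only 8 sections, which is somewhat simpler than what GCC produces.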

    main:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         0000fb00  0000000000400c00  0000000000400c00  00000c00  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .rodata       000073b0  0000000000410700  0000000000410700  00010700  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .gosymtab     0000ad5c  0000000000417ab0  0000000000417ab0  00017ab0  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .gopclntab    00002248  0000000000422810  0000000000422810  00022810  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .noptrdata    00000008  0000000000425000  0000000000425000  00025000  2**3
                      CONTENTS, ALLOC, LOAD, DATA
      5 .data         000022b0  0000000000425008  0000000000425008  00025008  2**3
                      CONTENTS, ALLOC, LOAD, DATA
      6 .bss          00011830  00000000004272b8  00000000004272b8  000272b8  2**3
                      ALLOC
      7 .noptrbss     02009a38  0000000000438ae8  0000000000438ae8  00038ae8  2**3
                      ALLOC
      8 .debug_abbrev 000000cc  0000000000000000  0000000000000000  0002c93a  2**0
                      CONTENTS, READONLY, DEBUGGING
      9 .debug_line   00004868  0000000000000000  0000000000000000  0002ca06  2**0
                      CONTENTS, READONLY, DEBUGGING
     10 .debug_frame  00003240  0000000000000000  0000000000000000  0003126e  2**0
                      CONTENTS, READONLY, DEBUGGING
     11 .debug_info   00008828  0000000000000000  0000000000000000  000344ae  2**0
                      CONTENTS, READONLY, DEBUGGING
     12 .debug_pubnames 00001dd3  0000000000000000  0000000000000000  0003ccd6  2**0
                      CONTENTS, READONLY, DEBUGGING
     13 .debug_pubtypes 00000ae0  0000000000000000  0000000000000000  0003eaa9  2**0
                      CONTENTS, READONLY, DEBUGGING
     14 .debug_aranges 00000630  0000000000000000  0000000000000000  0003f589  2**0
                      CONTENTS, READONLY, DEBUGGING
     15 .debug_gdb_scripts 0000002c  0000000000000000  0000000000000000  0003fbb9  2**0
                      CONTENTS, READONLY, DEBUGGING

Fortunately, an OAT file has even fewer sections than the C++ binary. There are 14 main sections:
one instruction section, 7 read-only data sections, and 6 debug sections.

    base.odex:     file format elf64-littleaarch64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .dynsym       00000060  0000000000000200  0000000000000200  00000200  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      1 .dynstr       00000027  0000000000000260  0000000000000260  00000260  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .hash         00000020  0000000000000288  0000000000000288  00000288  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .rodata       0030f000  0000000000001000  0000000000001000  00001000  2**12
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .text         003c9424  0000000000310000  0000000000310000  00310000  2**12
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      5 .dynamic      00000070  00000000006da000  00000000006da000  006da000  2**12
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .eh_frame     000869b0  00000000006db000  00000000006db000  0084d000  2**12
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      7 .eh_frame_hdr 00016744  00000000007619b0  00000000007619b0  008d39b0  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 .debug_info   000427a1  0000000000000000  0000000000000000  008ea0f4  2**0
                      CONTENTS, READONLY, DEBUGGING
      9 .debug_info.oat_patches 00006028  0000000000000000  0000000000000000  0092c895  2**0
                      CONTENTS, READONLY, DEBUGGING
     10 .debug_abbrev 000055bf  0000000000000000  0000000000000000  009328bd  2**0
                      CONTENTS, READONLY, DEBUGGING
     11 .debug_str    001328d8  0000000000000000  0000000000000000  00937e7c  2**0
                      CONTENTS, READONLY, DEBUGGING
     12 .debug_line   00038d42  0000000000000000  0000000000000000  00a6a754  2**0
                      CONTENTS, READONLY, DEBUGGING
     13 .debug_line.oat_patches 00000583  0000000000000000  0000000000000000  00aa3496  2**0
                      CONTENTS, READONLY, DEBUGGING

The listings above show that every section has a type of its own, along with attributes such as writable, read-only, and executable.
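Constant | Value | Meaning |
---|---|---|
SHT_NULL | 0 | Inactive section header (no associated section) |
SHT_PROGBITS | 1 | Program-defined contents (code or data) |
SHT_SYMTAB | 2 | Symbol table |
SHT_STRTAB | 3 | String table |
SHT_RELA | 4 | Relocation entries with explicit addends |
SHT_HASH | 5 | Hash table for the symbol table |
SHT_DYNAMIC | 6 | Dynamic linking information |
SHT_NOTE | 7 | Notes (auxiliary information) |
SHT_NOBITS | 8 | No contents in the file, e.g. the uninitialized .bss section |
SHT_REL | 9 | Relocation entries without explicit addends |
SHT_SHLIB | 10 | Reserved for future use (RFU) |
SHT_DYNSYM | 11 | Symbol table for dynamic linking |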
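As a minimal sketch of how the type values in the table above might be turned back into names, the Python snippet below restates the table as a dictionary. The helper name `section_type_name` is hypothetical, and the handling of values at or above `0x80000000` (the `SHT_LOUSER` range reserved for application-specific types, which readelf prints as `LOUSER+n`) is an assumption about presentation rather than a copy of any tool's logic.

```python
# Section type constants, restated from the table above.
SECTION_TYPES = {
    0: "SHT_NULL",
    1: "SHT_PROGBITS",
    2: "SHT_SYMTAB",
    3: "SHT_STRTAB",
    4: "SHT_RELA",
    5: "SHT_HASH",
    6: "SHT_DYNAMIC",
    7: "SHT_NOTE",
    8: "SHT_NOBITS",
    9: "SHT_REL",
    10: "SHT_SHLIB",
    11: "SHT_DYNSYM",
}

# Start of the application-specific range (assumption: not listed in the table).
SHT_LOUSER = 0x80000000


def section_type_name(sh_type):
    """Return a readable name for an sh_type value (hypothetical helper)."""
    if sh_type in SECTION_TYPES:
        return SECTION_TYPES[sh_type]
    if sh_type >= SHT_LOUSER:
        # Application-specific types are shown relative to SHT_LOUSER.
        return "LOUSER+%x" % (sh_type - SHT_LOUSER)
    return "UNKNOWN(0x%x)" % sh_type


print(section_type_name(1))           # a section such as .text or .rodata
print(section_type_name(8))           # a section such as .bss
print(section_type_name(0x80000000))  # an application-specific section
```

The last call illustrates why a section with a custom type shows up with a `LOUSER+0` type instead of one of the twelve standard names.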
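Section flags
Constant | Value | Meaning |
---|---|---|
SHF_WRITE | 1 | Writable |
SHF_ALLOC | 2 | Occupies memory at run time; some sections do not, but .text, .bss, and .data all do |
SHF_EXECINSTR | 4 | Executable; normally only the .text section has this attribute |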
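Because the flags are single bits, one value can carry several of them at once. The sketch below, using the constant names from `elf.h` (`SHF_WRITE`, `SHF_ALLOC`, `SHF_EXECINSTR`), decodes a flag word into the letters W, A, and X. The function name `flags_to_string` is made up for this illustration.

```python
# Section flag bits, as listed in the table above.
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4


def flags_to_string(sh_flags):
    """Decode a section flag word into letters (hypothetical helper)."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters


print(flags_to_string(SHF_ALLOC | SHF_EXECINSTR))  # typical for .text
print(flags_to_string(SHF_WRITE | SHF_ALLOC))      # typical for .data and .bss
print(flags_to_string(SHF_ALLOC))                  # typical for .rodata
```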
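Now let's put this to use and look at the types and attributes of the sections in the OAT file with readelf -S.
Sections marked A have the SHF_ALLOC attribute, and X marks the executable attribute. Counting the symbol table, the string tables, and the null section, there are 18 sections in total.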

    There are 18 section headers, starting at offset 0xaa3ae0:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
      [ 1] .dynsym           DYNSYM           0000000000000200  00000200
           0000000000000060  0000000000000018   A       2     0     8
      [ 2] .dynstr           STRTAB           0000000000000260  00000260
           0000000000000027  0000000000000000   A       0     0     1
      [ 3] .hash             HASH             0000000000000288  00000288
           0000000000000020  0000000000000004   A       1     0     4
      [ 4] .rodata           PROGBITS         0000000000001000  00001000
           000000000030f000  0000000000000000   A       0     0     4096
      [ 5] .text             PROGBITS         0000000000310000  00310000
           00000000003c9424  0000000000000000  AX       0     0     4096
      [ 6] .dynamic          DYNAMIC          00000000006da000  006da000
           0000000000000070  0000000000000010   A       2     0     4096
      [ 7] .symtab           SYMTAB           0000000000000000  006da070
           00000000000435c0  0000000000000018           8     0     8
      [ 8] .strtab           STRTAB           0000000000000000  0071d630
           000000000012f609  0000000000000000           0     0     1
      [ 9] .eh_frame         PROGBITS         00000000006db000  0084d000
           00000000000869b0  0000000000000000   A       0     0     4096
      [10] .eh_frame_hdr     PROGBITS         00000000007619b0  008d39b0
           0000000000016744  0000000000000000   A       0     0     4
      [11] .debug_info       PROGBITS         0000000000000000  008ea0f4
           00000000000427a1  0000000000000000           0     0     1
      [12] .debug_info.oat_p LOUSER+0         0000000000000000  0092c895
           0000000000006028  0000000000000000           0     0     1
      [13] .debug_abbrev     PROGBITS         0000000000000000  009328bd
           00000000000055bf  0000000000000000           0     0     1
      [14] .debug_str        PROGBITS         0000000000000000  00937e7c
           00000000001328d8  0000000000000000           0     0     1
      [15] .debug_line       PROGBITS         0000000000000000  00a6a754
           0000000000038d42  0000000000000000           0     0     1
      [16] .debug_line.oat_p LOUSER+0         0000000000000000  00aa3496
           0000000000000583  0000000000000000           0     0     1
      [17] .shstrtab         STRTAB           0000000000000000  00aa3a19
           00000000000000c1  0000000000000000           0     0     1
    Key to Flags:
      W (write), A (alloc), X (execute)

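So far we have covered the section, which is the unit of organization inside the ELF file. Next we turn to another concept, the segment.
When sections are mapped into memory, we can save space by grouping sections of the same type and attributes together. There are many sections, and if each one occupied whole pages of its own, a lot of page space would be wasted; once they are merged, only each segment as a whole needs to be rounded up to full pages.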
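The saving can be illustrated with a little arithmetic. The sketch below takes the sizes of four read-only sections from the OAT listing above (.dynsym, .dynstr, .hash, .rodata), assumes a 4 KiB page, and compares giving each section its own pages with packing them back to back. It is a simplification: alignment padding between sections is ignored, and the helper name `pages_needed` is made up for the illustration.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB page

# Section sizes in bytes, taken from the objdump -h listing of base.odex.
sections = {
    ".dynsym": 0x60,
    ".dynstr": 0x27,
    ".hash": 0x20,
    ".rodata": 0x30F000,
}


def pages_needed(size, page_size=PAGE_SIZE):
    """Round a byte count up to whole pages (hypothetical helper)."""
    return (size + page_size - 1) // page_size


# Case 1: every section is rounded up to full pages on its own.
separate = sum(pages_needed(size) for size in sections.values())

# Case 2: the sections are packed together and rounded up once.
packed = pages_needed(sum(sections.values()))

print(separate, packed)  # prints: 786 784
```

With only four sections the difference is two pages; the three tiny tables would otherwise each waste almost an entire page, and the waste grows with the number of small sections.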
The readelf -l option shows how the segments are laid out, as in the following example:

    Elf file type is DYN (Shared object file)
    Entry point 0x0
    There are 7 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                     0x0000000000000188 0x0000000000000188  R      8
      LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x0000000000310000 0x0000000000310000  R      1000
      LOAD           0x0000000000310000 0x0000000000310000 0x0000000000310000
                     0x00000000003c9424 0x00000000003c9424  R E    1000
      LOAD           0x00000000006da000 0x00000000006da000 0x00000000006da000
                     0x0000000000000070 0x0000000000000070  RW     1000
      DYNAMIC        0x00000000006da000 0x00000000006da000 0x00000000006da000
                     0x0000000000000070 0x0000000000000070  RW     1000
      LOAD           0x000000000084d000 0x00000000006db000 0x00000000006db000
                     0x000000000009d0f4 0x000000000009d0f4  R      1000
      GNU_EH_FRAME   0x00000000008d39b0 0x00000000007619b0 0x00000000007619b0
                     0x0000000000016744 0x0000000000016744  R      4

     Section to Segment mapping:
      Segment Sections...
       00
       01     .dynsym .dynstr .hash .rodata
       02     .text
       03     .dynamic
       04     .dynamic
       05     .eh_frame .eh_frame_hdr
       06     .eh_frame_hdr

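Constant | Value | Description |
---|---|---|
PT_NULL | 0 | Unused entry |
PT_LOAD | 1 | Loaded into memory |
PT_DYNAMIC | 2 | Dynamic linking information |
PT_INTERP | 3 | Path of the program interpreter (the dynamic linker) |
PT_NOTE | 4 | Auxiliary notes; not needed for now |
PT_SHLIB | 5 | Reserved for future use (RFU) |
PT_PHDR | 6 | Location and size of the program header table |
GNU_EH_FRAME | 0x6474e550 | Used only for .eh_frame_hdr |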
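To connect the table with the Type and Flags columns that readelf -l prints, here is a small sketch that decodes a program header's `p_type` and `p_flags`. The permission bits `PF_X`, `PF_W`, and `PF_R` and the numeric value of `GNU_EH_FRAME` come from the standard `elf.h` definitions rather than from the text above, and the helper names `segment_flags` and `describe_segment` are hypothetical.

```python
# Segment types from the table above. The GNU_EH_FRAME value is the
# PT_GNU_EH_FRAME constant from <elf.h> (assumption: not given in the text).
SEGMENT_TYPES = {
    0: "NULL",
    1: "LOAD",
    2: "DYNAMIC",
    3: "INTERP",
    4: "NOTE",
    5: "SHLIB",
    6: "PHDR",
    0x6474E550: "GNU_EH_FRAME",
}

# Segment permission bits from <elf.h>.
PF_X = 0x1  # execute
PF_W = 0x2  # write
PF_R = 0x4  # read


def segment_flags(p_flags):
    """Render p_flags as three columns, e.g. 'R E' (hypothetical helper)."""
    return "".join((
        "R" if p_flags & PF_R else " ",
        "W" if p_flags & PF_W else " ",
        "E" if p_flags & PF_X else " ",
    ))


def describe_segment(p_type, p_flags):
    """Return a 'TYPE flags' summary line (hypothetical helper)."""
    name = SEGMENT_TYPES.get(p_type, "0x%x" % p_type)
    return "%-14s %s" % (name, segment_flags(p_flags))


print(describe_segment(1, PF_R))          # read-only data segment
print(describe_segment(1, PF_R | PF_X))   # code segment
print(describe_segment(2, PF_R | PF_W))   # dynamic linking segment
```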
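To summarize, the order is: the data is loaded first, then the code, then the dynamic linking information, and finally some auxiliary system information.
With all this background in place, we can finally move on to the second half of the ELF file header.
As a committed fan of 64-bit, I will use the 64-bit layout to explain the rest of the header, while also noting the 32-bit differences.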
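Offset | Field | Size (bytes) | Description |
---|---|---|---|
0x18 | e_entry | 8 | Program entry point; 4 bytes on 32-bit |
0x20 | e_phoff | 8 | File offset of the program header table, which is really the segment table describing how segments relate to sections; 4 bytes on 32-bit |
0x28 | e_shoff | 8 | File offset of the section header table; 4 bytes on 32-bit |
0x30 | e_flags | 4 | Architecture-specific flags |
0x34 | e_ehsize | 2 | Size of the ELF header: 64 bytes on 64-bit, 52 bytes on 32-bit. The 12-byte difference comes from the three fields above, e_entry, e_phoff, and e_shoff |
0x36 | e_phentsize | 2 | Size of one program header table entry |
0x38 | e_phnum | 2 | Number of entries in the program header table |
0x3A | e_shentsize | 2 | Size of one section header table entry |
0x3C | e_shnum | 2 | Number of entries in the section header table |
0x3E | e_shstrndx | 2 | Index of the section header table entry that holds the section names |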
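The 64-byte versus 52-byte header sizes can be checked with Python's `struct` module. The sketch below writes both header layouts as format strings; it assumes little-endian files and the standard field order (`e_ident`, `e_type`, `e_machine`, `e_version`, followed by the fields in the table above). The variable names are illustrative.

```python
import struct

# Format codes: 16s = 16-byte e_ident, H = 2 bytes, I = 4 bytes, Q = 8 bytes.
# "<" selects little-endian with no implicit padding.
# Field order: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx.
ELF32_HEADER = "<16sHHIIIIIHHHHHH"
ELF64_HEADER = "<16sHHIQQQIHHHHHH"

size32 = struct.calcsize(ELF32_HEADER)
size64 = struct.calcsize(ELF64_HEADER)

# The part before e_entry is identical in both layouts.
offset_of_e_entry = struct.calcsize("<16sHHI")

print(size32, size64, size64 - size32)  # prints: 52 64 12
print(hex(offset_of_e_entry))           # prints: 0x18
```

Only the three address and offset fields change width, which accounts for the entire 12-byte difference.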
Now let's bring back the example from the previous chapter:

    7F 45 4C 46 02 01 01 03 00 00 00 00 00 00 00 00
    03 00 B7 00 01 00 00 00 00 00 00 00 00 00 00 00
    40 00 00 00 00 00 00 00 E0 3A AA 00 00 00 00 00
    00 00 00 00 40 00 38 00 07 00 40 00 12 00 11 00
    06 00 00 00 04 00 00 00 40 00 00 00 00 00 00 00
    40 00 00 00 00 00 00 00

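Do you still remember the first 0x18 bytes that we analyzed last time?
Having reviewed the old, let's apply what this section has taught us to the rest: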
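One way to walk through the bytes from offset 0x18 onward is to let Python's `struct` module split them according to the field table. The sketch below copies the first 64 bytes of the dump and assumes little-endian encoding, which matches the 01 in the sixth byte of the identification block. The names `FIELDS` and `header` are illustrative.

```python
import struct

# The first 64 bytes of the dump above (the complete 64-bit ELF header).
HEX_DUMP = """
7F 45 4C 46 02 01 01 03 00 00 00 00 00 00 00 00
03 00 B7 00 01 00 00 00 00 00 00 00 00 00 00 00
40 00 00 00 00 00 00 00 E0 3A AA 00 00 00 00 00
00 00 00 00 40 00 38 00 07 00 40 00 12 00 11 00
"""
data = bytes.fromhex("".join(HEX_DUMP.split()))

# Fields of the second half of the header, in file order.
FIELDS = ("e_entry", "e_phoff", "e_shoff", "e_flags", "e_ehsize",
          "e_phentsize", "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")

# Three 8-byte values, one 4-byte value, six 2-byte values, from offset 0x18.
values = struct.unpack_from("<QQQIHHHHHH", data, 0x18)
header = dict(zip(FIELDS, values))

for name in FIELDS:
    print("%-12s = 0x%x (%d)" % (name, header[name], header[name]))
```

The decoded values line up with the readelf output shown earlier: the entry point is 0, the 7 program headers of 56 bytes each start at offset 64, the 18 section headers of 64 bytes each start at offset 0xaa3ae0, and entry 17 (.shstrtab) holds the section names.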
Reposted from: http://qvwva.baihongyu.com/