This course is intended for PhD students in cyber security. Its contents are organized around the representation, modeling, and understanding of programs, so that students become familiar with, and confident in tackling, research problems in program analysis and software security, such as reverse engineering, vulnerability analysis, and malware detection. Topics will include:
- Low-level program representation at the binary and assembly code levels: PE and ELF file formats, x86 and ARM assembly instructions, etc.
- Intermediate-level program representation, like the Abstract Syntax Tree (AST) and various intermediate languages targeting different program analysis tasks (such as VEX, BIL, and LLVM IR), etc.
- Transformation between different representations, including compilation, assembly, decompilation, disassembly, code lifting, etc.
- Common program abstraction models, like Control Flow Graph, Data Flow Graph, Call Graph, etc.
- Program representations for machine learning, like feature extraction, code embedding, etc.
- Typical program understanding tasks, like taint analysis, code obfuscation and de-obfuscation, code similarity detection, information recovery, vulnerability discovery, malware detection, etc.
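
As a small illustration of how two of the listed topics relate, the sketch below parses a source snippet into an AST and derives a call graph from it. It is only a minimal example using Python's standard `ast` module, not course material: the sample program and the names `SOURCE`, `build_call_graph`, `parse`, `check`, and `main` are hypothetical, and the sketch assumes that only direct calls by plain name (not method calls or calls through variables) are recorded.

```python
import ast
from collections import defaultdict
from typing import Dict, Set

# Hypothetical sample program to analyze (an assumption for illustration).
SOURCE = '''
def parse(data):
    return data.strip()

def check(data):
    return len(parse(data)) > 0

def main(data):
    if check(data):
        return parse(data)
    return None
'''


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each function defined in `source` to the names it calls directly."""
    tree = ast.parse(source)  # source text -> abstract syntax tree
    graph: Dict[str, Set[str]] = defaultdict(set)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        graph[func.name]  # register the function even if it calls nothing
        for node in ast.walk(func):
            # Simplification: keep calls of the form name(...) only;
            # attribute calls such as data.strip() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name].add(node.func.id)
    return dict(graph)


call_graph = build_call_graph(SOURCE)
for caller in sorted(call_graph):
    print(caller, "->", sorted(call_graph[caller]))
```

The same idea carries over to lower-level representations, where the nodes come from disassembled or lifted code rather than from a source-level AST.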