Linker Code
Linker code is code emitted by the linker at link time. It serves as the "glue" that lets modern programs run smoothly. One helpful characteristic of linker code is that, by definition, it does not appear in object files, since those have not been "linked" yet. Therefore, we should be able to use the differences between the final linked executable and the object file(s) to grep for the linker code. (This can turn into a binary diffing problem.)
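At the symbol level, this diff can be sketched as a set difference: anything defined in the executable but in none of the object files is a linker-code candidate. The sketch below assumes the inputs are the text output of GNU `nm` in its default format (for example `nm a.out` and `nm main.o`). The helper names `defined_symbols` and `linker_code_candidates` are hypothetical, and the sample output is made up for illustration.

```python
from typing import Iterable, Set


def defined_symbols(nm_output: str, text_only: bool = False) -> Set[str]:
    """Names that `nm` lists with an address, i.e. symbols defined in that file."""
    names: Set[str] = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols look like "<address> <type> <name>". Undefined ones,
        # such as "U puts" or "w __gmon_start__", have no address column.
        if len(fields) != 3:
            continue
        _address, sym_type, name = fields
        # Optionally keep only code symbols (nm types T/t: text section).
        if text_only and sym_type not in ("T", "t"):
            continue
        names.add(name)
    return names


def linker_code_candidates(exe_nm: str, obj_nms: Iterable[str],
                           text_only: bool = True) -> Set[str]:
    """Symbols defined in the linked executable but in none of the object files."""
    from_objects: Set[str] = set()
    for obj_nm in obj_nms:
        from_objects |= defined_symbols(obj_nm, text_only)
    return defined_symbols(exe_nm, text_only) - from_objects


if __name__ == "__main__":
    # Hypothetical, abbreviated `nm` output (addresses are made up).
    main_o = """\
0000000000000000 T main
                 U puts
"""
    a_out = """\
0000000000001000 T _init
0000000000001040 T _start
0000000000001070 t deregister_tm_clones
00000000000010e0 t __do_global_dtors_aux
0000000000001120 t frame_dummy
0000000000001129 T main
0000000000001150 T _fini
                 U puts@GLIBC_2.2.5
                 w __gmon_start__
"""
    for name in sorted(linker_code_candidates(a_out, [main_o])):
        print(name)
```

Here `main` drops out because `main.o` defines it, leaving the startup and teardown symbols. Symbol diffing only sees named symbols: PLT stubs have no symbol-table entries of their own (objdump synthesizes `name@plt` labels for them), and stripped binaries keep few symbols at all. A complete answer therefore falls back to byte- and instruction-level binary diffing.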
GCC Linker Artifacts
These linker artifacts remain in the binary even when libc is dynamically linked.
1. Entry and Initialization Functions:
   - **`_start`**: The process entry point set by the linker, invoked by the OS loader.
   - **`_init`** / **`_fini`**: Functions for running global constructors before `main` and global destructors at shutdown.
2. Global Constructors and Destructors Management:
   - **`__do_global_ctors_aux`** / **`__do_global_dtors_aux`**: Helpers for invoking global constructors and destructors. `__do_global_ctors_aux` is a legacy helper from older toolchains; `__do_global_dtors_aux` still appears in modern GCC-built binaries.
   - **`__cxa_atexit`**, **`__cxa_finalize`**: C++ ABI functions that register and run global destructors at program termination.
   - **`_GLOBAL__sub_I_<function_name>`** and **`_GLOBAL__D_<function_name>`**: Internal symbols emitted for per-translation-unit constructor/destructor registration.
3. Dynamic Linking and Procedure Linkage Table:
   - **`.plt`** (Procedure Linkage Table): Jump table stubs for lazily resolved calls to functions in shared libraries.
   - **`__gmon_start__`**: Used for gprof profiling initialization, if present.
4. Transactional Memory Support:
   - **`register_tm_clones`**, **`deregister_tm_clones`**: Manage transactional memory “clone” registrations.
   - **`_ITM_registerTMCloneTable`**, **`_ITM_deregisterTMCloneTable`**: Functions called to register/deregister TM-specific code variants.
5. Guarded Initialization (C++ Static Local Initialization):
   - **`__cxa_guard_acquire`**, **`__cxa_guard_release`**, **`__cxa_guard_abort`**: Support thread-safe initialization of static local objects in C++.
6. Frame and Debugging Helpers:
   - **`frame_dummy`**: Ensures certain sections are correctly associated with debugging/unwinding info.
   - **`__cxx_global_var_init`**: Often generated (by Clang) to call constructors of global objects.
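To connect this list back to the diffing idea, each candidate symbol can be tagged with one of the categories above. The following is a hypothetical Python lookup table that covers only the names in this list. The category strings, the `@plt` suffix check (objdump's labeling convention for PLT stubs), and the version-suffix stripping are our own assumptions, not a standard tool. Name matching is also only a heuristic, since nothing stops user code from defining a function called `frame_dummy`.

```python
from typing import Optional

# Exact symbol names taken from the GCC artifact list above.
EXACT = {
    "_start": "entry/initialization",
    "_init": "entry/initialization",
    "_fini": "entry/initialization",
    "__do_global_ctors_aux": "global constructors/destructors",
    "__do_global_dtors_aux": "global constructors/destructors",
    "__cxa_atexit": "global constructors/destructors",
    "__cxa_finalize": "global constructors/destructors",
    "__gmon_start__": "dynamic linking/PLT",
    "register_tm_clones": "transactional memory",
    "deregister_tm_clones": "transactional memory",
    "_ITM_registerTMCloneTable": "transactional memory",
    "_ITM_deregisterTMCloneTable": "transactional memory",
    "__cxa_guard_acquire": "guarded initialization",
    "__cxa_guard_release": "guarded initialization",
    "__cxa_guard_abort": "guarded initialization",
    "frame_dummy": "frame/debugging helpers",
}

# Families of names that carry a per-translation-unit or per-variable suffix.
PREFIXES = (
    ("_GLOBAL__sub_I_", "global constructors/destructors"),
    ("_GLOBAL__D_", "global constructors/destructors"),
    ("__cxx_global_var_init", "frame/debugging helpers"),
)


def classify(symbol: str) -> Optional[str]:
    """Return the artifact category for a symbol name, or None if unknown."""
    # objdump labels PLT stubs as "<name>@plt".
    if symbol.endswith("@plt"):
        return "dynamic linking/PLT"
    # Drop a symbol-version suffix such as "@GLIBC_2.2.5" or "@@GLIBC_2.2.5".
    name = symbol.split("@", 1)[0]
    if name in EXACT:
        return EXACT[name]
    for prefix, category in PREFIXES:
        if name.startswith(prefix):
            return category
    return None


if __name__ == "__main__":
    for sym in ("_start", "puts@plt", "_GLOBAL__sub_I_main",
                "__cxa_finalize@GLIBC_2.2.5", "main"):
        print(f"{sym:30} -> {classify(sym)}")
```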
MSVC Linker Artifacts
1. CRT Entry Points:
   - **`mainCRTStartup`**, **`wmainCRTStartup`**: Entry points for console applications. They set up the runtime, then call `main` or `wmain`.
   - **`WinMainCRTStartup`**, **`wWinMainCRTStartup`**: Entry points for GUI applications.
   - **`_DllMainCRTStartup`**: Entry point for DLL initialization and finalization.
   - **`_main`** / **`__main`**: Thunk sometimes used to ensure global initializers run before `main`.
2. Security Initialization and Checks:
   - **`__security_init_cookie`**: Initializes the security cookie used to mitigate stack buffer overflows.
   - **`__security_check_cookie`**: Checks the cookie at runtime to detect stack corruption.
3. Structured Exception Handling (SEH):
   - **`__SEH_prolog`**, **`__SEH_epilog`**: Generated code patterns for setting up and tearing down SEH frames.
   - **`_except_handler3`**, **`_except_handler4`**: Runtime functions for handling structured exceptions.
   - **`__CxxFrameHandler3`**, **`__CxxFrameHandler4`**: C++ exception handling routines that integrate with SEH.
4. C Runtime Initialization and Termination:
   - **`_CRT_INIT`**, **`_initterm`**, **`_initterm_e`**: Functions that initialize and terminate various global states, static constructors, and runtime checks.
   - **`_cexit`**, **`_c_exit`**: Functions run on normal or quick program termination.
5. Thread Local Storage (TLS):
   - **`__dyn_tls_init_callback`**: Sets up or cleans up thread-local storage data.
   - **`_tls_used`**: Data structure for TLS directory information.
6. Thunk and Import Stub Functions:
   - Thunk functions are small pieces of code that adjust calling conventions or jump into dynamically loaded functions.
   - **`__imp_`**-prefixed symbols: import address table entries for functions imported from DLLs.
7. Runtime Arithmetic and Utility Routines:
   - **`_chkstk`**: Stack probing for large local allocations.
   - **`__aulldiv`**, **`__aullrem`**, **`__allmul`**: CRT helper routines that compiler-generated code calls for 64-bit integer division, remainder, and multiplication on 32-bit x86.
   - **`_purecall`**: Handler for pure virtual function calls in C++.
8. C++ Runtime Support:
   - **`_CxxThrowException`**: Core C++ exception throw mechanism.
   - **`__scrt_common_main_seh`**, **`__scrt_uninitialize_cfltcvt`**: Internal routines for SCRT (shared CRT) initialization and cleanup.
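The `_initterm` and `_initterm_e` routines walk a table of function pointers. As we understand the MSVC CRT, the tables live in `.CRT$XC*` / `.CRT$XI*` sections, which the linker merges and orders by the text after `$`, so the table layout itself is a link-time product. `_initterm` calls every non-null pointer in the range `[first, last)`. `_initterm_e` does the same for functions returning `int`, stopping at the first non-zero return value. The Python below only models that table-walking logic, with a list standing in for the pointer range. The names `initterm` and `initterm_e` are hypothetical stand-ins, not the CRT's source.

```python
from typing import Callable, List, Optional, Sequence

# Each table slot is either a function or None (a null slot in the real table).
VoidFn = Optional[Callable[[], None]]
IntFn = Optional[Callable[[], int]]


def initterm(table: Sequence[VoidFn]) -> None:
    """Model of _initterm: call every non-null entry, in table order."""
    for fn in table:
        if fn is not None:
            fn()


def initterm_e(table: Sequence[IntFn]) -> int:
    """Model of _initterm_e: like initterm, but stop at the first non-zero result."""
    for fn in table:
        if fn is not None:
            ret = fn()
            if ret != 0:
                return ret
    return 0


if __name__ == "__main__":
    log: List[str] = []
    initterm([None, lambda: log.append("ctor A"), None, lambda: log.append("ctor B")])
    print(log)  # ['ctor A', 'ctor B']
    print(initterm_e([lambda: 0, None, lambda: 7, lambda: 0]))  # 7
```

Null slots matter because the section-boundary markers and padding can leave empty entries in the merged table. A loader or disassembler that ignores these tables will miss every initializer they point to.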
Disregarding Linker Code
Ground-truth generation implementations such as x86-sok and Li et al.'s disregard linker code. In fact, this reflects a larger issue in the binary analysis world: we often treat linker code as mere boilerplate.
Unfortunately, this mindset trickles down into aspects that affect the verification of a program. For example, a tool that handles linker code carelessly often ends up with a naive loader implementation, which in turn leads to missing instructions and subroutines and a lack of portability (see OpenVMS 9.2 ELF programs).
Proof of Concept
To show concretely that excluding linker code from consideration can change the ground truth, we need a proof of concept.
Rather than instrumenting the Gold linker, let's instrument LLD.
Let's build this out
What about LTO (Link-Time Optimization)?
We can justify putting LTO on the back burner because many of the challenges that come from it arise only in the generation of ground truth. In theory, if we can get tools to disassemble code soundly and completely, then LTO should not be much of a factor.