Compiling and linking theory

Posted: Tue Mar 31, 2009 1:15 am
by Chris Corbyn
Not PHP related...

I'm quite used to using gcc to compile and link source code for me, but in the back of my mind I don't *really* know what's going on under-the-surface. Should I care? Or should I just accept that compiling and linking are required steps to produce executable files from source code without really caring how it works?

I mean, what's the compiler doing when it reads source code and produces a binary file (what do the instructions in that file look like?)? And what exactly is the linker doing when it links in shared objects etc? I don't particularly want to know in any great depth, I just want to have more of an understanding about the processes involved.

Would learning assembler help me, or is that just shifting down to a slightly lower level with the same questions left unanswered? Maybe I need to know just a little about the instructions that are sent to the CPU, such as what a simple if..else looks like at the CPU level?

Anybody here ever written a simple (really simple, not very smart) virtual machine with its own defined set of instructions to demonstrate such a thing?

Re: Compiling and linking theory

Posted: Tue Mar 31, 2009 1:34 am
by Benjamin
Chris Corbyn wrote:I don't *really* know what's going on under-the-surface. Should I care?
Only where it concerns you. I know there are situations where you need to drop out of C or C++ and into assembler for a tight loop that is running slow. Things like that. Maybe one compiler generates faster code than another. That's what I would be concerned with. Don't spin wheels learning things that won't benefit you.

Re: Compiling and linking theory

Posted: Tue Mar 31, 2009 1:47 am
by Chris Corbyn
astions wrote:Don't spin wheels learning things that won't benefit you.
I guess I'm hoping that understanding more about what's going on will be beneficial to my writing C code. But maybe not... It's not like I've really delved deep into the guts of the Zend Engine in a quest to find out how PHP works... but I know the general principles for how the interpreter reads source code and what it does with it (PHP is mostly just using yacc to parse the source and run a bunch of callbacks in the Zend Engine).

Executable files are a whole different ballgame though... Apart from the parsing part of the process.

I've always been more of a "behind the scenes" programmer cos that's usually where all the interesting stuff is happening :P

Re: Compiling and linking theory

Posted: Tue Mar 31, 2009 1:51 am
by Benjamin
If I could write everything in assembly without it increasing development time by 1000... I would be all there.. seriously. I'm all into understanding what is going on under the hood and perfecting everything. But.. C and C++ were created for good reason. The compilers save you immense amounts of time by allowing you to write code at a higher level.

Re: Compiling and linking theory

Posted: Wed Apr 01, 2009 8:25 pm
by alex.barylski
Chris Corbyn wrote:I'm quite used to using gcc to compile and link source code for me, but in the back of my mind I don't *really* know what's going on under-the-surface. Should I care? Or should I just accept that compiling and linking are required steps to produce executable files from source code without really caring how it works?
As an applications developer, it's not really a concern, IMHO -- although learning anything new will rarely kill you. :P
Chris Corbyn wrote:what do the instructions in that file look like
Depends on the compiler I suppose, but machine code is a typical result.
Chris Corbyn wrote:Would learning assembler help me, or is that just shifting down to a slightly lower level with the same questions left unanswered? Maybe I need to know just a little about the instructions that are sent to the CPU, such as what a simple if..else looks like at the CPU level?
Learning assembly never hurts, but it won't really help you understand the process much. You would still need to learn the specifics of your particular compiler/linker system(s). The idea behind a linker, for historical reasons, was to allow a programmer to write modular applications: you could test a function library, compile it into machine language once, and then leave it out of the compilation phase, so compilation only occurred on the files you were actually working on. The compiled binaries were then linked into a single unit and the appropriate headers were added for the target platform, such as the Windows PE file format.

Machine language doesn't really have "IF" statements; instead you would use conditional jump instructions. Whether a conditional jump actually "jumps" to its label usually depends on a value in a flags register (might be different on today's architectures). It either jumps or falls through, nothing more, nothing less.

There is a bewildering array of jump mnemonics:

JZ = Jump if Zero
JG = Jump if greater than
JNZ = Jump if not Zero
JLE = Jump if less than or equal
JNG = Jump if not greater than

To make matters worse, many mnemonics use the exact same instruction opcode (JLE and JNG above, for example), and there are usually separate instructions for signed and unsigned comparisons (JG versus JA, for example).

High level language constructs do not translate into machine/assembly very easily, so I cannot easily give an example without firing up VS and viewing the assembly.

For a simple variable test like:

Code:

if($a == $b){
  echo 'Hello World';
}
Might look like:

Code:

 
mov EAX, $a  ; Copy the data at memory offset $a into register
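cmp EAX, $b  ; Compare the value in register EAX to the value at memory offset $b; sets the zero flag if they are equal
je  HELLO_WORLD  ; Jump to the label if the zero flag is set, otherwise fall through
ret  ; Values were not equal: return without printing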
 
HELLO_WORLD:
echo 'Hello World';
ret
 
This is horribly inaccurate, as there is no 'echo' instruction, etc., but it demonstrates the very basics of how involved a simple conditional test can be. I believe cmp, which is what does the test, sets a bit in a flags register if the values are equal. The next instruction is the je (jump if equal): it tests the bit set by cmp and either jumps to the offset or continues on its merry way. Thus you need the first ret to halt execution, or the code under HELLO_WORLD would get executed anyway, regardless of the conditional test.
Benjamin wrote:I know there are situations where you need to drop out of C or C++ and into assembler for a tight loop that is running slow.
Compilers for at least 10 years have been more than capable of optimizing most loops for you. It's very rare that you can actually hand-optimize C/C++ code beyond what the compiler does; if anything, you would be best off just using inline assembler or inlining functions called inside a loop -- better yet, use a macro. :P
Chris Corbyn wrote:I've always been more of a "behind the scenes" programmer cos that's usually where all the interesting stuff is happening
Then it's probably worth reading up on just to satisfy your desire to understand... :)

Re: Compiling and linking theory

Posted: Fri Apr 03, 2009 7:44 am
by alex.barylski
In case anyone is interested, I dug up this little article, which has a graphic explaining the whole C/C++ build process:

http://www.tenouk.com/ModuleW.html

Re: Compiling and linking theory

Posted: Fri Apr 03, 2009 8:40 am
by Chris Corbyn
Cheers PCSpectra. Insightful :)

Re: Compiling and linking theory

Posted: Sat Apr 11, 2009 10:29 am
by Chris Corbyn
I've discovered the iTunes U section of the iTunes store. Stanford have a ridiculous amount of free lecture videos along with the lecture slides, assignments etc online. Their CS107 course (Programming Paradigms) covers C, C++, Scheme and Python, discussing the paradigms used in those languages and also delves down into how they are translated into assembly :) Haven't watched much yet but glad I've stumbled upon all their content. Kudos to Stanford!

I didn't realise there are different forms of assembly too.

Re: Compiling and linking theory

Posted: Sat Apr 11, 2009 3:50 pm
by alex.barylski
Chris Corbyn wrote:I didn't realise there are different forms of assembly too.
I remember playing with TASM (Borland I think) and their assembler supported things like macros and even went as far as to offer OOP...imagine that...using objects in assembly. :P

Re: Compiling and linking theory

Posted: Sat Apr 11, 2009 9:11 pm
by Doug G
Chris Corbyn wrote:I didn't realise there are different forms of assembly too.
Assemblers are specific to the CPU the code runs on. The assembler for an 8086 chip is different from the assembler for a 6502 chip (note: these are ancient, obsolete CPUs, used just as an example). And I know the assembler for my old DEC PDP-8/E wouldn't spit out anything recognizable by a PC :)