Compiling and linking theory

Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Compiling and linking theory

Post by Chris Corbyn »

Not PHP related...

I'm quite used to using gcc to compile and link source code for me, but in the back of my mind I don't *really* know what's going on under-the-surface. Should I care? Or should I just accept that compiling and linking are required steps to produce executable files from source code without really caring how it works?

I mean, what's the compiler doing when it reads source code and produces a binary file (what do the instructions in that file look like?)? And what exactly is the linker doing when it links in shared objects etc? I don't particularly want to know in any great depth, I just want to have more of an understanding about the processes involved.

Would learning assembler help me, or is that just shifting down to a slightly lower level with the same questions left unanswered? Maybe I need to know just a little about the instructions that are sent to the CPU, such as what a simple if..else looks like at the CPU level?

Anybody here ever written a simple (really simple, not very smart) virtual machine with its own defined set of instructions to demonstrate such a thing?
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Compiling and linking theory

Post by Benjamin »

Chris Corbyn wrote: I don't *really* know what's going on under-the-surface. Should I care?
Only where it concerns you. I know there are situations where you need to drop out of C or C++ and into assembler for a tight loop that is running slow. Things like that. Maybe one compiler generates faster code than another. That's what I would be concerned with. Don't spin wheels learning things that won't benefit you.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Compiling and linking theory

Post by Chris Corbyn »

astions wrote: Don't spin wheels learning things that won't benefit you.
I guess I'm hoping that understanding more about what's going on will be beneficial to my writing C code. But maybe not... It's not like I've really delved deep into the guts of the Zend Engine in a quest to find out how PHP works... but I know the general principles for how the interpreter reads source code and what it does with it (PHP is mostly just using yacc to parse the source and run a bunch of callbacks in the Zend Engine).

Executable files are a whole different ballgame though... Apart from the parsing part of the process.

I've always been more of a "behind the scenes" programmer cos that's usually where all the interesting stuff is happening :P
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Compiling and linking theory

Post by Benjamin »

If I could write everything in assembly without it increasing development time by a factor of 1000... I would be all there... seriously. I'm all into understanding what is going on under the hood and perfecting everything. But... C and C++ were created for good reason. The compilers save you immense amounts of time by allowing you to write code at a higher level.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Compiling and linking theory

Post by alex.barylski »

Chris Corbyn wrote: I'm quite used to using gcc to compile and link source code for me, but in the back of my mind I don't *really* know what's going on under-the-surface. Should I care? Or should I just accept that compiling and linking are required steps to produce executable files from source code without really caring how it works?
As an application developer, it's not really a concern, IMHO -- although learning anything new will rarely kill you. :P
Chris Corbyn wrote: what do the instructions in that file look like
Depends on the compiler and the target CPU, I suppose, but machine code is the typical result.
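To give a feel for what "source in, instructions out" means, here is a rough sketch in Python (not from the original post). It borrows Python's own parser as a stand-in for a compiler front end and emits instructions for a made-up stack machine; the instruction names PUSH/ADD/SUB/MUL and the functions compile_expr and execute are hypothetical, and a real compiler such as gcc emits machine code for a real CPU instead.

```python
import ast

def compile_expr(source):
    """Translate an integer arithmetic expression into toy stack-machine instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        # A literal becomes a PUSH of its value.
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        # A binary operation: code for the left side, then the right side, then the operator.
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return emit(node.left) + emit(node.right) + [(ops[type(node.op)],)]
        raise ValueError("unsupported syntax in this toy compiler")

    return emit(ast.parse(source, mode="eval").body)

def execute(instructions):
    """Run the toy instructions on a stack and return the final value."""
    stack = []
    for ins in instructions:
        if ins[0] == "PUSH":
            stack.append(ins[1])
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[ins[0]])
    return stack.pop()

code = compile_expr("1 + 2 * 3")
for ins in code:
    print(*ins)
print("result:", execute(code))
```

The point is only that the nested source expression gets flattened into a linear list of very simple steps, which is roughly the shape of what ends up in a compiled binary.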
Chris Corbyn wrote: Would learning assembler help me, or is that just shifting down to a slightly lower level with the same questions left unanswered? Maybe I need to know just a little about the instructions that are sent to the CPU, such as what a simple if..else looks like at the CPU level?
Learning assembly never hurts, but it won't really help you understand the process much. You would still need to learn the specifics of your particular compiler/linker system(s). Historically, the idea behind a linker was to let a programmer write modular applications: test a function library, compile it into machine language once, and then leave it out of the compilation phase, so that compilation only occurred on the files you were actually working on. The compiled binaries were then linked into a single unit, and the appropriate headers were added for the target platform, such as the Windows PE file format.
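A toy model of that linking step, sketched in Python (not from the original post). This is not a real object-file format like ELF or PE: the dictionaries lib_o and main_o, the link function, and the instruction strings are all invented for illustration. The only idea it tries to show is that each separately compiled unit leaves "holes" for symbols it does not define, and the linker fills them in once everything has been laid out.

```python
def link(objects):
    """Concatenate toy object code and patch symbol references with final addresses."""
    image = []      # the final, single unit of code
    symbols = {}    # symbol name -> absolute address in the image
    pending = []    # (absolute position to patch, symbol name)

    # Pass 1: lay the objects out one after another and record where symbols land.
    for obj in objects:
        base = len(image)
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        for position, name in obj["relocations"]:
            pending.append((base + position, name))
        image.extend(obj["code"])

    # Pass 2: every address is known now, so fill in the holes.
    for position, name in pending:
        if name not in symbols:
            raise ValueError(f"undefined reference to '{name}'")
        image[position] = symbols[name]

    return image, symbols

# Hypothetical pre-compiled, already tested "library" object: defines 'square'.
lib_o = {"code": ["MUL_SELF", "RET"], "defines": {"square": 0}, "relocations": []}

# Hypothetical "main" object: calls 'square' without knowing its address yet.
main_o = {
    "code": ["PUSH 7", "CALL", None, "HALT"],  # None is the hole to be patched
    "defines": {"main": 0},
    "relocations": [(2, "square")],
}

image, symbols = link([main_o, lib_o])
print(image)
print(symbols)
```

Leaving lib_o out of the list makes link raise an "undefined reference" error, which is loosely the same complaint you get from a real linker when you forget to link in a library.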

Machine language doesn't really have "IF" statements; instead you would use conditional jump instructions. Whether a conditional jump actually "jumps" to its label usually depends on a value in a flags register (might be different on today's architectures). It either jumps or falls through, nothing more, nothing less.

There is a bewildering array of jump mnemonics:

JZ = Jump if Zero
JG = Jump if greater than
JNZ = Jump if not Zero
JLE = Jump if less than or equal
JNG = Jump if not greater than

To make matters worse, many mnemonics map to the exact same instruction opcode (JLE and JNG above, for instance), and there are usually separate instructions for signed and unsigned comparisons.
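A small Python sketch of both points (not from the original post). The opcode values are the short-form (8-bit displacement) x86 encodings as listed in the Intel manuals; the names SHORT_JUMP_OPCODES and as_signed32 are made up for this example.

```python
# Short-form x86 conditional jump opcodes. Several mnemonics share one opcode.
SHORT_JUMP_OPCODES = {
    "JE": 0x74, "JZ": 0x74,     # jump if equal / jump if zero
    "JNE": 0x75, "JNZ": 0x75,   # jump if not equal / jump if not zero
    "JLE": 0x7E, "JNG": 0x7E,   # signed: less or equal / not greater
    "JG": 0x7F, "JNLE": 0x7F,   # signed: greater / not less-or-equal
    "JBE": 0x76, "JNA": 0x76,   # unsigned: below or equal / not above
    "JA": 0x77, "JNBE": 0x77,   # unsigned: above / not below-or-equal
}

# Group the mnemonics by opcode to make the aliases visible.
aliases = {}
for mnemonic, opcode in SHORT_JUMP_OPCODES.items():
    aliases.setdefault(opcode, []).append(mnemonic)
for opcode in sorted(aliases):
    print(f"0x{opcode:02X}: {' / '.join(aliases[opcode])}")

def as_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

# The same two bit patterns compare differently depending on the interpretation,
# which is why separate signed (JG) and unsigned (JA) jumps exist.
a, b = 0xFFFFFFFF, 0x00000001
unsigned_greater = a > b                             # the condition JA tests
signed_greater = as_signed32(a) > as_signed32(b)     # the condition JG tests
print("unsigned a > b:", unsigned_greater)
print("signed   a > b:", signed_greater)
```

As an unsigned number 0xFFFFFFFF is the largest 32-bit value, but as a signed number it is -1, so the two comparisons disagree.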

High-level language constructs do not translate into machine/assembly code very easily, so I cannot easily give an accurate example without firing up VS and viewing the assembly.

For a simple variable test like:

Code: Select all

if($a == $b){
  echo 'Hello World';
}
Might look like:

Code: Select all

 
mov EAX, $a      ; Copy the data at memory offset $a into register EAX
cmp EAX, $b      ; Compare EAX with the value at memory offset $b; the zero flag is set if they are equal
je  HELLO_WORLD  ; Jump to HELLO_WORLD if the zero flag is set
ret              ; Otherwise return without printing anything
 
HELLO_WORLD:
echo 'Hello World';
ret
 
This is horribly inaccurate, as there is no 'echo' instruction, etc... but it demonstrates the very basics of how complex a simple conditional test can be. I believe cmp, which is what does the test, sets a bit in a flags register (the zero flag) if the values are equal. The next instruction is je (jump if equal) -- this instruction actually tests the flag set by cmp and either jumps to the offset or continues on its merry way. That is why you need the first ret to halt execution; without it, the code under the HELLO_WORLD label would get executed anyway, regardless of the conditional test.
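Since the question of a really simple virtual machine came up earlier in the thread, here is a minimal sketch of one in Python (not from the original post) that runs the pseudo-assembly above. The instruction set is invented: MOV, CMP, JE, RET and LABEL mimic the listing, and PRINT stands in for the non-existent 'echo' instruction. The run function and its tuple format are assumptions for illustration, not how any real CPU or emulator is structured.

```python
def run(program, memory):
    """Execute a toy instruction list and return the strings it printed."""
    # Resolve label names to instruction indexes up front.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "LABEL"}
    registers = {"EAX": 0}
    zero_flag = False
    output = []
    pc = 0  # program counter: index of the next instruction

    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        pc += 1  # by default, fall through to the next instruction
        if op == "MOV":        # MOV reg, addr: load memory into a register
            registers[ins[1]] = memory[ins[2]]
        elif op == "CMP":      # CMP reg, addr: subtract, keep only the flag
            zero_flag = (registers[ins[1]] - memory[ins[2]]) == 0
        elif op == "JE":       # JE label: jump only if the zero flag is set
            if zero_flag:
                pc = labels[ins[1]]
        elif op == "PRINT":    # stand-in for 'echo'
            output.append(ins[1])
        elif op == "RET":      # stop executing
            break
        elif op == "LABEL":    # labels are markers, not real instructions
            pass
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output

PROGRAM = [
    ("MOV", "EAX", "a"),
    ("CMP", "EAX", "b"),
    ("JE", "HELLO_WORLD"),
    ("RET",),
    ("LABEL", "HELLO_WORLD"),
    ("PRINT", "Hello World"),
    ("RET",),
]

print(run(PROGRAM, {"a": 5, "b": 5}))
print(run(PROGRAM, {"a": 5, "b": 6}))

# Dropping the first RET shows the fall-through problem described above.
NO_RET = [ins for i, ins in enumerate(PROGRAM) if i != 3]
print(run(NO_RET, {"a": 5, "b": 6}))
```

With equal values the jump is taken and the message is printed; with different values the first RET stops execution. In the NO_RET variant the message is printed even though the values differ, because execution simply falls through into the labelled code.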
Benjamin wrote: I know there are situations where you need to drop out of C or C++ and into assembler for a tight loop that is running slow.
Compilers have been more than capable of optimizing most loops for you for at least 10 years. It's very rare that you can actually beat the compiler by hand-optimizing C/C++ code; if anything, you would be best off just using inline assembler or inlining functions called inside a loop -- better yet, use a macro. :P
Chris Corbyn wrote: I've always been more of a "behind the scenes" programmer cos that's usually where all the interesting stuff is happening
Then it's probably worth reading up on just to satisfy your desire to understand... :)
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Compiling and linking theory

Post by alex.barylski »

In case anyone is interested, I dug up this little article, which has a graphic that explains the whole C/C++ build process:

http://www.tenouk.com/ModuleW.html
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Compiling and linking theory

Post by Chris Corbyn »

Cheers PCSpectra. Insightful :)
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: Compiling and linking theory

Post by Chris Corbyn »

I've discovered the iTunes U section of the iTunes store. Stanford have a ridiculous amount of free lecture videos along with the lecture slides, assignments etc online. Their CS107 course (Programming Paradigms) covers C, C++, Scheme and Python, discussing the paradigms used in those languages and also delves down into how they are translated into assembly :) Haven't watched much yet but glad I've stumbled upon all their content. Kudos to Stanford!

I didn't realise there are different forms of assembly too.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Compiling and linking theory

Post by alex.barylski »

Chris Corbyn wrote: I didn't realise there are different forms of assembly too.
I remember playing with TASM (Borland, I think); their assembler supported things like macros and even went as far as offering OOP... imagine that... using objects in assembly. :P
Doug G
Forum Contributor
Posts: 282
Joined: Sun Sep 09, 2007 6:27 pm

Re: Compiling and linking theory

Post by Doug G »

Chris Corbyn wrote: I didn't realise there are different forms of assembly too.
Assemblers are specific to the CPU the code runs on. The assembler for an 8086 chip is different from the assembler for a 6502 chip (note: these are ancient, obsolete CPU chips, used here just as an example). And I know the assembler for my old DEC PDP-8/E wouldn't spit out anything recognizable by a PC :)
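To illustrate with those two chips, here is a tiny Python table (not from the original post) of how two equivalent operations are encoded on each. The byte values are the standard encodings for NOP and for loading an immediate value into the accumulator; the ENCODINGS name and the operation descriptions are just made up for this example.

```python
# Machine-code bytes for two roughly equivalent operations on each CPU.
ENCODINGS = {
    "8086": {
        "do nothing": bytes([0x90]),                    # NOP
        "load 5 into accumulator": bytes([0xB0, 0x05]), # MOV AL, 5
    },
    "6502": {
        "do nothing": bytes([0xEA]),                    # NOP
        "load 5 into accumulator": bytes([0xA9, 0x05]), # LDA #$05
    },
}

for operation in ("do nothing", "load 5 into accumulator"):
    for cpu in ("8086", "6502"):
        print(f"{cpu}: {operation:<24} -> {ENCODINGS[cpu][operation].hex(' ')}")
```

Same idea, different bytes and different mnemonics, so an assembler (and the binary it produces) only makes sense for the CPU family it targets.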