João Silvestre's Silvestre

Understanding CPU Operation: A Programmer's Guide to Program Execution (Part 2)

2023-08-28T20:57:00Z

Introduction

In Part 1 we took a look at key concepts of CPU components and architecture, we also took a look at the instruction cycle and how it works. So we now know how programs go from being written, compiled into instructions that the CPU can execute and how the execution cycle handles the execution of those instructions, but how do we actually make the computer do useful things? We all use a computer, we have monitors, keyboards and mouses, how do we go from keyboard presses to something showing up on the screen? How does a program store things on hard drive? How can we use memory beyond the embedded registers?

Intructions and Data

In the previous post we talked about the instruction cycle, and how the CPU fetches instructions from memory, decodes them and executes them. But how does the CPU know what is an instruction and what is data? How does it know what to execute and what to store?

CPUs are a chip and as most chips do they have pins for power, ground and data input/output, the pin layout is different for each CPU, but they all have a set of pins that are used to communicate with the outside world, these pins are called the bus. The bus is a set of wires that are used to transfer data between the CPU and other components, the bus is divided into three parts:

Address Bus: This is a set of wires that are used to specify a memory address, the CPU uses this to tell the memory what address it wants to read or write to.
Data Bus: This is a set of wires that are used to transfer data between the CPU and other components, the CPU uses this to read or write data to memory or other components.
Control Bus: This is a set of wires that are used to control the flow of data between the CPU and other components, the CPU uses this to tell the memory to read or write data, or to tell other components to do something.

Bus Controller

Between the CPU and the other components, there will typically be something called a bus controller, it's main function is to decide, based on the address bus, where to send the memory access signals from the CPU. If the address is in the range of the memory, it will send the signals to the memory, if it is in the range of the hard drive, it will send the signals to the hard drive, and so on. Visualize the different bus as just a set of wires connecting from the CPU into the bus controller, and from there into the different components.

Address Bus

This is used to tell the memory, or other components, what address it wants to read or write to. This is what the bus controller uses to decide what components are being targeted. Typically memory will be within the first ranges of memory so, if you use any address between 0x0000 and 0x7FFF, the bus controller will send the signals to the memory, and now other components can use different addresses, say 0x8000 to 0x8100 for the PCI devices (graphic cards, sound cards, etc), and so on. Some components might not care about the address, but it will still be used by the controller to decide what to do with the data.

Data Bus

This carries the data that we're sending, or receiving, from the memory or other components. This is what the bus controller uses to send the data to the memory or other components. This is also what the CPU uses to read or write data to the memory or other components.

Control Bus

This carries the control signals that are used to tell the memory or other components what to do, this can typically be a code that is to be understood by the other component. For example, the CPU can send a signal to the memory to read data, or to write data, or to reset the memory, or to tell the memory to wait for a signal from the CPU, and so on.

Making Use of the Address Space: turning our programs useful

Intruction set: Adding memory space access capabilities

Let's define two new instructions for our CPU, these will be responsible for writing and reading from memory.

// Write to memory addr R0, the data R1 and the control bits R2 (only the first 4 bits are used from R2)
WRITE R0, R1, R2

// Read from memory addr R0, store the data in R1 and the control bits R2 (only the first 4 bits are used from R2)
READ R0, R1, R2

We're also defining that we're using the full amount of memory possible (8KB) on the address space 0x0000 to 0x1FFFF and we have a keyboard available at 0x20000.

Using the New Instructions

With the new instructions we can now write a program that takes user input and performs an addition, let's take a look at the following program:

READ 0x20000, R0, 0x0 // Read from keyboard, store in R0
READ 0x20000, R1, 0x0 // Read from keyboard, store in R1
ADD R0, R1, R2 // Add R0 and R1, store in R2

This is a bit more useful than our example in Part 1, it takes user input and adds it together, but how do we get the result? We can't just print it to the screen, we didn't define how to do that yet. We can store it in memory, but how do we know where to store it?

Registers and Address Bus

If the registers are 16-bit in size, and the address bus is also 16-bit in size, how do we address the other components? There are several strategies that can be employed, such as not allowing for the full 16-bits to be used for RAM or another strategy is to have more than one bus controller, one for the memory and another for the other components, and the CPU can switch between them. The first strategy is the simplest, but it limits the amount of memory that can be used, the second strategy is more complex, but it allows for more memory to be used.

In our example CPU, we can have the last 3-bits of the memory address serve as a code for what is being targeted as that would give us a total of 8 potential components, for example, if the last 3-bits are 0, then the memory is targeted, if the last 3-bits are 1, then the hard drive is targeted, if the last 3-bits are 2, then the PCI devices are targeted, and so on. This means that the memory can only be 8KB in size, but it allows for more components to be used.

Introducing ROM: Read-only Memory

I've mentioned in Part 1 that typically a CPU starts executing instructions by looking at address 0 but we have our memory there. So let's complicate our scenario a bit more, we've already decided on the 13-bit address space for the memory but we're taking full advantage of it, we need to make some room for a ROM, so let's shorten a bit more and make it half.

Our ROM is a read-only bit of memory, it's non-volatile but it can hold the code we programmed into it, by taking another bit away from our memory space (and I mean RAM here) we're left with 4KB of RAM and we can now have 4 KB of ROM, we can use the ROM to store our program and the RAM to store the result.

Saving to Memory

So we now have, from 0x0000 to 0x0FFF ROM space where we can store our program and from 0x1000 to 0x1FFF RAM space where we can store our result, 0x2000 for the keyboard. We can write the program and store it in the ROM available, and then we can execute it, the CPU will start executing at address 0x0000, it will read the instruction, decode it and execute it, it will read the next instruction, decode it and execute it, and so on. Because we know our memory starts at 0x1000 we can use a register to keep track of where the last memory available is, and we can use that to store the result of the addition (reminder we only have 5 registers available).

MOV R0, 0x1000 // Move the value 0x1000 to R0, our memory pointer
READ 0x20000, R1, 0x0 // Read from keyboard, store in R1
READ 0x20000, R2, 0x0 // Read from keyboard, store in R2
ADD R1, R2, R3 // Add R1 and R2, store in R3
WRITE R0, R3, 0x0 // Write to memory addr R0, the data R3 and the control bits 0x0

Display: Adding a LCD screen

Now we have a program that can add two numbers and store the result in memory, but how do we know what the result is? We can't just look at the memory, we need to print it to the screen, but we don't have a screen, we only have a keyboard. So let's add a screen to our arquitechture, we'll add it at address 0x3000. How we communicate with the screen it will depend, we won't be talking about the monitor you're used to nowadays, let's focus on an LCD (Liquid Crystal Display), those little green screens typically with two rows of text available.

Defining the Liquid Crystal Display (LCD) protocol

These displays are very simple, but they require a precise set of instructions to work, they have a set of pins that are used to send data to the screen, and a set of pins that are used to send commands to the screen, the data pins are used to send the data that will be displayed on the screen, the command pins are used to tell the screen what to do, for example, to clear the screen, to move the cursor to a specific position, to turn the screen on or off, and so on. Usually you'll find the information about the pins and the communication protocol on the manufacturer manual, but for the sake of simplicity, let's assume that the communication protocol is as follows:

You send the command 0x0 followed by 0x1 to start up the screen
You send 0x2 to clear the screen, this will move the cursor to 0,0
You send 0x3 followed by the position you want to move the cursor, the first 8-bits carry the X and the last 8-bits carry the Y, note that the Y can only be 0 or 1 and the X can only be between 0 and 64
You send 0xF to write to the screen at the current cursor position, we need a way to define text so let's assume that we're using the ASCII table (8-bits), which fits well into our 16-bit data bus, in fact we can send two characters per operation! So we'll send the first character in the first 8-bits and the second character in the last 8-bits, and every character we write will move the cursor forward by one position, if the cursor is at the end of the line, it will move to the next line, if the cursor is at the end of the screen, it will move to the beginning of the screen.

Writing to the Screen

Let's make our program more interactive by, asking for the inputs and printing the result to the screen:

MOV R0, 0x1000 // Move the value 0x1000 to R0, our memory pointer

// Start the screen
WRITE 0x3000, 0x0000, 0x0 // Send the command 0x00 to the screen
WRITE 0x3000, 0x0000, 0x1 // Send the command 0x01 to the screen

// Writes INPUT 1: to the screen
WRITE 0x3000, 0x0000, 0x2 // Clear the screen
WRITE 0x3000, 0x4E49, 0xF // Write IN to the screen
WRITE 0x3000, 0x5055, 0xF // Write PU to the screen
WRITE 0x3000, 0x2053, 0xF // Write T<space> to the screen
WRITE 0x3000, 0x3A31, 0xF // Write 1: to the screen

READ 0x20000, R1, 0x0 // Read from keyboard, store in R1

// Writes INPUT 2: to the screen
WRITE 0x3000, 0x0000, 0x2 // Clear the screen
WRITE 0x3000, 0x4E49, 0xF // Write IN to the screen
WRITE 0x3000, 0x5055, 0xF // Write PU to the screen
WRITE 0x3000, 0x2053, 0xF // Write T<space> to the screen
WRITE 0x3000, 0x3A32, 0xF // Write 2: to the screen

READ 0x20000, R2, 0x0 // Read from keyboard, store in R2

ADD R1, R2, R3 // Add R1 and R2, store in R3
WRITE R0, R3, 0x0 // Write to memory addr R0, the data R3 and the control bits 0x0

// Writes RESULT: to the screen
WRITE 0x3000, 0x0000, 0x2 // Clear the screen
WRITE 0x3000, 0x5245, 0xF // Write RE to the screen
WRITE 0x3000, 0x5355, 0xF // Write SU to the screen
WRITE 0x3000, 0x4C54, 0xF // Write LT to the screen
WRITE 0x3000, 0x3A20, 0xF // Write :<space> to the screen
WRITE 0x3000, R3, 0xF // Write the result to the screen

This is getting harder now to write now, makes you start to appreciate compilers, a huge thank you to Grace Hopper who invented the first compiler.

Subroutines (or Functions)

If we take a look at the previously written assembly code, we can see that we're writing to the screen a lot, and we're writing the same thing over and over again, we can simplify this by creating a subroutine (or as we call them today, a function), a subroutine is a set of instructions that can be called from anywhere in the program, and it will execute the instructions and return to the caller. We can create a subroutine that writes a string to the screen, and we can call it whenever we need to write something to the screen, this will make our program smaller and easier to read.

The x86 Calling Convention

But how do we pass on arguments into the subrotine, and how do we get the result back? At this point, any method that we could divise would be fair game, we could use a predetermined set os registers: R0, R1, R2, R3 and R4 for arguments and R5 for the output, we could even reuse R0 as the output, overwritting whatever is there to have one more register available for arguments, as long as the caller knows it's going to happen but we could also use the memory RAM for this, and that is what happens nowadays. As this was a challenge every programmer faced and, if you wanted to share some assembly code with someone else, or if you wanted to call a subroutine defined by another developer, you could either both agree on how it should work, or you could use a convention, and that's what happened, there's a convention called x86 calling conventions that defines how arguments are passed to a subroutine and how the result is returned, and it's still used today. I'm not going into details about the convention, feel free to read the wikipedia article that does a decent job at explaining it, but it's a set of rules that are agreed upon by the caller and callee of every subroutine. This ensures that the caller does not lose critical registers when it's not expecting to, and it ensures that the callee knows where to find the arguments and where to store the result.

Interoperability: The Key

This convention is followed by all modern compilers, if both the C and Golang compilers follow this convention, then you can call a C function from Golang and vice-versa, and that's how we can use libraries written in C from Golang and vice-versa, this works because at the assembly level the CPU does not care what high-level language you used to write the code, it only cares about the instructions that are being executed and the convention ensures that Golang and C functions can call eachother, but this will always depend on compilers supporting the feature. However the same cannot be said for C# and Java, this is because they run on a virtual execution environment, you can technically call functions in DLLs that were compiled in C, but that does not happen the same way it does for C and Golang, it's just something their respective virtual execution environment's support.

Conclusion

Hopefully this post helped you understand how programs are executed on the CPU, and how the CPU communicates with other components, and how it can use the address space to communicate with other components. This is a very complex topic, and I've only scratched the surface, but I hope this helps you understand how the CPU works a bit better.

If you take anything from this post, it's that CPUs can be anything its creators design them to be, we could create a 128-bit CPU today, or even a 32-bit capable of addressing 64-bit memory space, but every design decision has a trade-off, and it's up to the designers to decide what trade-offs they want to make. Modern CPUs are the result of decades of research and development, decisions on tradeoffs that are reevaluated every generation, but most importantly designed with compatibility in mind, compatibility with other components and with the choosen architecture. Apple's "recent-ish" decision to move from Intel to ARM is a good example of this, ARM is a newer architecture so the baggage from the past is much smaller, and by designing their own CPU they can decide which components to support and how that support will work, meaning they can optimize everything to their own liking and save on both monetary and performance costs.

For the next part I want to expand on the role of the operating system and how it manages the CPU and the memory, and how it allows multiple programs to run at the same time, so stay tuned for the next part.

If you have any questions, comments or suggestions, please let me know, I'm always looking to improve my writing and my knowledge. Reach me at Twitter, GitHub, LinkedIn or via email at blog[at]itssilvestre.com.

Small Nuggets of Computer History

The following topics are not relevant to the main point of the post but add some relevant context to it, so I'm adding them here as a bonus.

Virtual Execution Environment Languages

The virtual execution environment were created to solve a small but noticeable problem for developers, if you've never worked with multiple environments you wouldn't have gone through this but if you write a C program and want it running on Linux for x86, x86-64 and ARM architectures you need to compile the program three times, even if the operating system is Linux for all of them, and you need to do it on a compatible machine (x86 can be compiled in a x86-64 machine, as the name indicates the x86-64 architecture was made to be compatible with it). This is because the CPU can only execute instructions that it understands, and the instructions for x86 are different from the instructions for x86-64 and ARM, so you need to compile the program for each architecture.

This is not a large problem but it's still a problem for the developer, because it means that you need to have a machine with the same architecture as the target machine to compile the program, and that's where virtual machines come in. .NET and Java developed this intermediate executable that needs to be compiled for every architecture it wants to be compatible with. This executable then is able to execute compiled C# and Java code, both virtual machines defined their own intermediate language (IL) that is akin to assembly, but only understandable by the virtual machine execution agent and not by the CPU itself, the execution agent acts as a translator between IL and machine-code, but that translation comes at a cost of performance and memory.

32-bit Windows Could Only Use 3.5 GB of RAM

Let me prephase that this was eventually addressed on some of the last 32-bit CPUs, but it was a limitation at some point.

You read the title right, if you're not from the 32 bit era of computers and I know some of you might not be, you may not be aware of this. Now if you've been following allow, a 32-bit CPU should have 2^32 of address space available which is 4 GB, so why 3.5 GB? Well, for the same reason we had to reduce the memory in our own example CPU, to leave address space available for other components, in this case, the graphics card. The graphics card needs to have it's own memory, and that memory needs to be mapped to the address space of the CPU, so the CPU can send data to the graphics card and the graphics card can send data to the CPU. This is called DMA (Direct Memory Access), and it's a way for the CPU to send data to other components without having to wait for the component to be ready to receive the data, the CPU can just send the data to the component and the component will read it when it's ready.

However, not all 32-bit systems and CPUs were the same, we had examples of servers allowing up to 64 GB of RAM, they allowed this by having an address space on the bus of 36-bits. This means that the CPU could address 2^36 = 64 GB of RAM, but this was not the norm, and even then individual applications running on these systems could only use 4 GB of RAM but the system add more RAM to give every application running, avoiding having to do disc swaps (which are very slow, if you don't it's basically taking a piece of memory and storing it on the disc in order to free that memory space for other uses).

Virtual Memory vs Physical Memory

If you were really really following along then you might have noticed, if on a 32-bit CPU the applications can only use 4 GB and the system can have up to 64 GB of RAM, how does the CPU address the RAM? The answer is that there's a concept called Virtual Memory which is a way for the CPU to address more memory than it's available, each application will have its own address space, and the CPU will map that address space to the actual address space of the RAM, this is done by the OS, and it's a very complex topic, but it's important to know that it exists. This also acts as a security feature because two applications running at the same time won't be able to access other applications memory, they will only be able to access their own memory. The trick played on those 64 GB systems was to map the 4 GB of address space of each application to a different 4 GB of the 64 GB of RAM available, this way each application could use 4 GB of RAM, and it would take longer before a disc swap needed to happen.

The Beeping Keyboard

This is not common nowadays but have you ever had your computer get stuck, you continue to type away at the keyboard hoping that will solve it (it never does) and the computer starts beeping? That's a warning that the keyboard buffer is full, yes your keyboard also has memory, not literally inside the keyboard, but a small bit of computer memory is used to store the keys you press, and when the buffer is full, the computer will beep to warn you that it's full. A fast typer could fill the buffer pretty fast, faster than the CPUs at the time could read from it.

This is a very simple example of how the CPU communicates with other components, if the CPU tried to read directly from the keyboard two things needed to be true: CPU had to be faster at checking for key presses than the user typing, the user needed to have the key pressed in order for the CPU to read it. In order to avoid this headache, the keyboard has a small bit of memory that stores the keys pressed, and the CPU can read from that memory whenever it wants, and the keyboard will store the keys pressed until the CPU reads them. Everytime a key is pressed a signal will be sent to the CPU, called an interrupt (topic for the next post), and the CPU will read the key from the keyboard memory, and the keyboard will store the next key pressed, and so on. This way the CPU can read the keys pressed at it's own pace, and the user can type as fast as they want.

Understanding CPU Operation: A Programmer's Guide to Program Execution (Part 1)

2023-08-15T00:00:00Z

First things first, if you didn't know, CPU stands for central processing unit and it's far easier to type, a term I'll use for brevity.

Introduction

Welcome to the first part of our series understanding how a CPU executes programs. In this part we'll go over the process that makes a program come to life within a CPU. If you're wondering why is this relevant to you as a programmer, software development works in layers and I've found in my experience that by understanding the layers below I'm able to be better at using, and understanding, the layer I'm working at.

Layers in Modern Software Development

When it comes to modern software development, we're dealing with a multi-layered landscape that's designed to make complex tasks more manageable. These layers serve as building blocks, each adding its own level of abstraction and functionality. By understanding these layers, we can navigate the intricacies of software development more effectively.

Abstraction and Simplification

CPUs and Machine Code: At the foundational level, CPUs execute machine code—binary sequences that the hardware can interpret directly. However, crafting machine code can be challenging for humans, leading to the emergence of higher-level languages like Assembly. Assembly provides a text-based representation of machine code instructions, offering developers a more human-readable interface for programming.

High-Level Languages: As software development progressed, we moved beyond Assembly to higher-level languages like C. These languages introduced abstractions that made coding more intuitive. Compilers then translated this code into machine code, bridging the gap between human-friendly languages and hardware-level execution.

Layered Development for Efficiency

Compiler Magic: The magic happens with compilers—the software responsible for translating high-level code into machine code. Compilers enable developers to express complex logic in ways that align with their thinking. The compiler takes care of mapping these expressions to the underlying machine code instructions, making the development process efficient and manageable.

Operating System Abstraction: Modern operating systems operate as a layer above the hardware, providing a bridge between applications and the CPU. They offer libraries that abstract complex operations, ensuring programmers don't need to worry about hardware specifics. Moreover, operating systems provide isolation between programs for security, ensuring one misbehaving program doesn't disrupt others.

Network Layers: The idea of layers extends to other domains, like networking. In networking, we have layers ranging from the physical layer that handles hardware signals to the application layer that deals with user interfaces. This layered approach simplifies network communication by breaking down complex processes into manageable steps.

Striking a Balance

While layers enhance development efficiency, they also create a challenge. Layers abstract lower-level details, shielding developers from complexity. However, this can be a double-edged sword when debugging or understanding how things work under the hood. Striking a balance between abstraction and understanding is key to becoming a proficient developer who can harness these layers effectively.

By grasping the concept of layers, we gain insight into how our programs interact with the underlying hardware and systems. This understanding empowers us to write better code, optimize performance, and navigate the complexities of software development more adeptly. It is far too often that I meet developers that know how to write code but seem totally obliviously to how a computer works and why things work the way they do, and I think that's a shame. As programmers we're not just users of the computer. but we're part of its development, extending its capabilities by writing custom software, and we should know how it all works, at least at a high level.

Disclaimer

Before we move on onto the explanation, the following explanations and examples will be based on a fictitious and simple CPU. Modern CPUs are complex pieces of hardware that rely on several years of development, concepts and techniques, so to make it easier to understand, we'll be using a CPU that only has the most basic features, and we'll be using a simplified instruction set, so that we can focus on the concepts and not on the details.

The CPU: More Than Just the "Brain"

If you know about computers you should already know what the CPU is, many call it the brain of the computer, but that's not really true, it's more akin to the heart, it's the piece of hardware that makes everything else work, by executing user-provided instructions.

CPUs nowadays can perform anything from very data movement to complex ones cryptographic operations, this allows applications to execute faster as the hardware is able to perform this operations faster than if they were implemented in software, and it also allows for more complex applications to be developed, as the CPU can perform operations that would be too complex (time complexity) to be done in software.

Unveiling the Execution Process

Key Concepts to Master

So in order to understand how it works, we need to lay out a few concepts first.

Instruction: The first one is the instruction, an instruction is an operation that is built into the CPU die (chip), and it represents the capabilities the CPU has. An instruction can be something like "add two numbers", "move data from memory to register", "jump to a specific memory address", etc.

An instruction will be documented and will have a name, and it will have a specific format that must be placed in a registry, so that the CPU may look at it and understand what must be done. Part of the instruction definition will be the parameters it takes, also defined in registers, so OP1 which is an ADD may require the operand 1 to be in the register R1 and the operand 2 to be in the register R2, and will output the result into the register R3. So a program that adds three numbers may look like this in Assembly (this is a simplified illustrative example, real assembly instructions will have a specific format and parameters, decided by the specific instruction set in use):
```
// Add 1, 2 and 3 together and store the result in R1
MOV R1, 1 // Stores the value 1 in R1
MOV R2, 2 // Stores the value 2 in R2
ADD R1, R2, R3 // Adds the values in R1 and R2 and stores the result in R3
MOV R4, 3 // Stores the value 3 in R4
ADD R3, R4, R5 // Adds the values in R3 and R4 and stores the result in R5
MOV R1, R5 // Stores the value in R5 in R1
```
Assembly is the lowest level of language we have, but even assembly is compiled into machine code, which is just a bunch of binary data that is loaded and executed by the CPU. See the following example of how that might be structured:

In the example above, you can see, two of the instructions we used above, it's showing those instructions in hexadecimal notation as they would be in machine code, I used the first 4 bits to represent the instruction being used, like an operation code, and then the following 12 bits for parameters, this is very suboptimal but great for showing off. The add has three arguments, 4 bits each, representing the registers R1, R2 and R3, and the move has two arguments, 4 bits for the register and 8 bits for the constant value. This is a simplified example, but it shows how the CPU can understand the instructions and execute them.
Instruction Set: Then we have the instruction set, that is the collection of all operations a certain CPU can perform, now this is not per CPU, otherwise companies like Microsoft, that makes an operative system capable of running on hundreds of CPUs would be unable to support so many of them. There are standard instruction sets that rule which operations are available and how they work, so that any CPU that implements that instruction set can run the same code. The most common instruction set nowadays is the x86-64, which is the one used by Intel and AMD CPUs, but there are others, like ARM, RISC-V, etc. ARM is a licensable instruction set and CPU designer that is used by many producers, while RISC-V aims to be an open-source CPU arquitechture.
Program: The next concept is the program, a program is a collection of instructions that are executed in order to perform a certain task. A program can be anything from a simple calculator to a complex video game.
Clock and Clock Speed Have you ever bought a computer, or a CPU, and saw that it has a clock speed of 4GHz? Well, that's due to a component called the clock generator or crystal oscillator. It's a piece of hardware that generates a signal at a certain frequency, and that signal is used to synchronize all the other components of the CPU. This is important because it means that the CPU can only perform one operation per clock cycle, so if the clock is running at 4GHz, it can only perform 4 billion operations per second. This is why clock speed is so important, the higher the clock speed, the more operations the CPU can perform per second. However, note that instructions take more than one clock cycle to execute, and different instructions can take different amounts of clock cycles, so the clock speed is not the only thing that matters, but it's a good indicator of how fast a CPU is.
Register: It's a very small piece of memory that is inside the CPU, and it's used to store data that is being used by the CPU. Registers are very fast, but they are also very small, so they are used to store data that is being used very often, like the result of an operation, or the data that is being operated on. Registers are also very important because they are the only way for the CPU to interact with the outside world, so if you want to read data from memory, you need to load it into a register, and if you want to write data to memory, you need to write it from a register. There are some special registers with specific purposes such as the PC register, stands for program counter and has the address for the next instruction to be loaded, and the IR register, stands for instruction register and is used to store the instructions that are being executed by the CPU.

Exploring CPU Architecture

CPUs architectures define the instruction set, the size and number of the registers, the memory address size, etc. There are many CPU architectures, but the most common ones are x86, x86-64/AMD64 and ARM. Let's take a look at the x86-64 arquitechture, it was initially created by Intel, but it's now used by both Intel and AMD CPUs (usually called AMD64, but they're mostly similar). It has a known instruction set (however support varies by CPU as they keep introducing new instructions as CPUs evolve), it defines the size of the registers (64 bits), the number of registers (16), the memory address size (64 bits), etc. Memory address size is usually the same as the register size (even if it's not always the case) because when you want to read data from memory, you need to load the address into a register, and if the register is smaller than the memory address, you won't be able to read the correct address.

Navigating the Execution Cycle

So now that we have all the concepts we need, let's see how it all comes together. Let's say we have a CPU that has 5 registers, 16 bits each, and a memory address size of 16 bits. This means that the CPU can only read 64 KB of memory, because 2 ^ 16 = 65536, and 65536 / 1024 KB = 64 KB (32 bits registers are able to handle 4 GB of memory, and 64 bits are able to reach 16 EB, which is 16 exabytes and more than enough for today's needs).

Once started, the CPU will generally be looking at the address 0 to look for it's first instruction, this address will be stored inside a special register meant to hold the current program memory address (PC, program counter). It's not uncommon for this address to contain just be a piece of code that then decides where the actual application is, often called a bootloader. But to understand how does the CPU actually execute anything, for that we need to revisit the concept of clock, see a clock is a digital signal that goes from 0 to 1 and back to 0, every time the clock goes to 1 (or 0 doesn't really matter I believe, so let's assume a 1 for now), every of the following bullets will happen at every clock cycle and it starts from boot (address 0x0000):

Load the instruction at the address stored in the PC register into the instruction register (IR).
Increment the PC register by one (this is a simplification, in current CPUs instructions may have different sizes).
Decode the instruction in the IR register, this means looking at the instruction set and understanding what the instruction means and what it does.
Execute the instruction, this means performing the operation the instruction is meant to do, like adding two numbers, moving data from memory to a register, etc.

This is typically called the "fetch-decode-execute cycle" and will happen for every instruction in the program. The above steps are just a massive loop that the CPU will keep on executing until it's turned off, it's that simple. Here we're assuming every step happens sequentially, and every instruction too, but in modern day CPUs there's also a different technique called "pipelining" that allows the CPU to start executing the next instruction before the current one is finished, but again, for simplification sake let's assume a sequential order in our examples.

Each step may take one or more clock cycles to execute, would be safe to assume that loading the instruction and incrementing the PC register takes one clock cycle, while decoding and executing the instruction may take more than one, but again, decoding maybe takes a fixed amount of clock cycles but execution will vary depending on the instruction. In modern CPUs there will often exist many ways of accomplishing the same task, it's up for the compilers to choose the instructions that take the least amount of clock cycles to improve application performance. This is also why CPUs of newer generations may be faster than older generations, even when they have the same clock speed, they may support new instructions that are faster to execute and that can be used by the compiler to improve performance, even if it requires the application developer to release a new version.

Part Conclusion

As this part comes to an end I hope that you can now have a better understanding of how a program is executed by the CPU, and how the CPU works in general. This is the foundation of knowledge that I believe has enabled me to be a better employee, a better colleague and ultimately a better software engineer.

In the next part we'll take a look at how the CPU interacts with the rest of the computer.