Overview
This course focuses on understanding the hardware components of a modern computer, from the perspective of the programmer. The goal is to provide sufficiently broad understanding of how various components fit together, and sufficient depth to begin to appreciate how to make use of these components to make code run better, safer, and faster.
What do you need for this course?
- Textbook (Required): Computer Systems: A Programmer’s Perspective, 3rd Edition. Available in bookstores and libraries. Students in the 2016–2017 class produced a set of Hebrew notes for future generations taking this course at Bar Ilan. THANKS!
- Linux installed on your machine: dual boot preferred, a virtual machine is OK if you are aiming for mediocrity. I recommend the Linux Mint distribution, but Ubuntu or any other modern Linux distribution is fine.
IMPORTANT: Teaching during COVID-19 (Corona) conditions
During the 2020–21 semester, the course will be taught asynchronously. Pre-recorded lectures will be available for download at least a week before the scheduled class, and it is the students’ responsibility to watch them at their own pace, but before class. During class we will discuss specific topics that require it, answer questions (sent in advance), and expand on the lectures.
You’ll need a quiet room and a computer; a camera on the computer or on a phone is fine.
Lecture Notes
(incrementally available)
Introduction
Main point: Understand why this is important. Why know something about the hardware? Why learn assembly language (a language most programmers never use after taking the course)? Why worry about optimization when we have great compilers?
- Book sections 1.1–1.6, 1.9–1.10
- Ground rules
- Overview
- Optional (and recommended): Book sections 1.7–1.8
Representing Data
- Book sections 2.1–2.3. Section 2.3: There’s a lengthy mathematical explanation of the properties of two’s complement representation. I consider it less important than the bottom line: Understanding when and how to use shifts, and the problems of overflow and underflow.
- Basic terminology: Bits, bytes, nibbles, and words
- Data vs instructions: they are all just bits…
- Integers: one's complement, two's complement, unsigned, casting (see the sketch after this list)
- Characters and strings
- Great tip about how to choose the best integer size
- Book section 2.4
- Floating point number representations: single-precision and double-precision (see the bit-field sketch after this list)
- Optional:
- YouTube video showing how to convert a floating point number from decimal to binary
- Example of the representation of a floating point number
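To make the integer points above concrete, here is a minimal C sketch (not taken from the book; the specific values are only for illustration, and it assumes a 32-bit int):

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* The same bit pattern means different things when read as signed vs. unsigned. */
    int x = -1;
    unsigned int u = (unsigned int)x;  /* the cast keeps the bits, changes the interpretation */
    printf("x = %d, (unsigned)x = %u\n", x, u);  /* -1 vs. 4294967295 with a 32-bit int */

    /* Shifts: right-shifting a negative value is usually an arithmetic shift
       (implementation-defined in C); left-shifting an unsigned value multiplies by 2. */
    int y = -8;
    unsigned int z = 3;
    printf("-8 >> 1 = %d, 3u << 2 = %u\n", y >> 1, z << 2);  /* typically -4 and 12 */

    /* Overflow: signed overflow is undefined behavior in C, but on typical
       two's-complement hardware the value simply wraps around. */
    printf("INT_MAX = %d; INT_MAX + 1 wraps to a large negative value\n", INT_MAX);
    return 0;
}
```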
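And a similar sketch for peeking at the bit fields of a single-precision float (it assumes a 32-bit IEEE 754 float; the value 0.15625 is chosen because its representation is exact):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = 0.15625f;              /* = 1.01 (binary) * 2^-3, exactly representable */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the 4 bytes as an unsigned integer */

    unsigned sign     = bits >> 31;           /* 1 sign bit */
    unsigned exponent = (bits >> 23) & 0xFF;  /* 8 exponent bits, biased by 127 */
    unsigned fraction = bits & 0x7FFFFF;      /* 23 fraction bits */

    printf("bits = 0x%08X\n", (unsigned)bits);  /* expected: 0x3E200000 */
    printf("sign = %u, exponent = %u (unbiased %d), fraction = 0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}
```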
Representing Instructions (Code)
Introduction: Programming and the Machine Level
- Instructions are bit patterns, too.
- Book sections 3.2–3.5
- Optional: 3.1, 3.5.5
- Optional: Additional resources for learning assembly:
- A web resource for learning about assembly language, different assemblers, etc.
- A basic programming book which uses assembly language as the main teaching language
- [The GAS (GNU Assembler) Manual](http://sourceware.org/binutils/docs/as/index.html). Just in case you’re stuck using GAS.
- Optional: Book section 3.11 (Floating Point in Assembly)
- There is no 'loop' construct, nor 'else', nor 'switch': at the machine level it is just IFs (conditional tests) and GOTOs (jumps). See the C sketch at the end of this section.
- Book: Section 3.6
- Branching with a stack
- Passing arguments, returning results: Calling conventions
- Book: Section 3.7 (much of this is also covered in the lecture)
- Optional: A Wikipedia page about different C/C++ compilers’ function calling conventions, and their impact.
- Arrays are stretches of memory, arranged as we want (or as the language standard dictates)
- Structures are also stretches of memory, and their fields need alignment; see the padding sketch at the end of this section
- Book: Sections 3.8-3.9 (also in lectures)
- Required: Debugging, viruses, and secure code (Book: Section 3.10, self-study)
Using assembly language (and some system tools) to do simple reverse engineering.
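To illustrate the "just IFs and GOTOs" point, here is the same loop written twice in C: once with while, and once in the goto style that mirrors what compilers typically emit (a sketch, not actual compiler output):

```c
#include <stdio.h>

/* Sum 0..n-1 using a while loop. */
long sum_while(long n) {
    long s = 0, i = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

/* The same loop in "assembly style": only a conditional test and a jump. */
long sum_goto(long n) {
    long s = 0, i = 0;
    goto test;
loop:
    s += i;
    i++;
test:
    if (i < n)
        goto loop;
    return s;
}

int main(void) {
    printf("%ld %ld\n", sum_while(10), sum_goto(10));  /* both print 45 */
    return 0;
}
```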
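And a small sketch of structure padding and alignment; the sizes in the comments assume a typical x86-64 Linux ABI and may differ elsewhere:

```c
#include <stdio.h>
#include <stddef.h>

struct padded {
    char c;   /* 1 byte, then 3 bytes of padding so i is 4-byte aligned */
    int  i;   /* 4 bytes */
    char d;   /* 1 byte, then trailing padding so arrays of the struct stay aligned */
};

struct reordered {
    int  i;   /* 4 bytes */
    char c;   /* 1 byte */
    char d;   /* 1 byte, then 2 bytes of trailing padding */
};

int main(void) {
    printf("sizeof(struct padded)    = %zu\n", sizeof(struct padded));      /* typically 12 */
    printf("sizeof(struct reordered) = %zu\n", sizeof(struct reordered));   /* typically 8 */
    printf("offsetof(padded, i)      = %zu\n", offsetof(struct padded, i)); /* typically 4 */
    return 0;
}
```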
The Memory Hierarchy, and the Wonderful Cache
Physical Memory, and the CPU-Memory Speed Gap
- The CPU gets instructions and data from the memory
- There are serial and parallel buses, both used in different ways
- We distinguish types of memory: volatile and non-volatile, random access, etc.
- In general, faster memory is much more expensive (in $!)
- Caching is a general mechanism for building hierarchies of small, fast memory and large, slow memory that still behave (mostly) as if the entire memory were fast. This is the memory hierarchy.
- Book:
- 6.1.1 (only at a high level; optional: memory cell organization)
- 6.1.2–6.1.3
- 6.2–6.4 (unions: optional)
- Caching relies on code locality
- We can write code with better or worse locality, and therefore get better or worse running time (see the traversal sketch after this list)
- Differences in speed can be very large
- Book: Sections 6.5–6.7
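A minimal sketch of the locality point: both functions below compute the same sum, but the row-major traversal walks memory sequentially (good spatial locality) while the column-major traversal jumps a full row per access; the actual speed gap depends on the machine and on N:

```c
#include <stdio.h>

#define N 1024

static int a[N][N];

/* Row-major traversal: accesses consecutive memory addresses (good locality). */
long sum_rows(void) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: each access jumps N*sizeof(int) bytes (poor locality). */
long sum_cols(void) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%ld %ld\n", sum_rows(), sum_cols());  /* same result, different cache behavior */
    return 0;
}
```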
Optimizing Code
- Always use a profiler to identify where to invest optimization effort
- Different types of profilers (compile-time, run-time)
- Book: 5.14
- A short section on removing run-time bottlenecks in the Performance Tuning article in Wikipedia.
- The Linux function getrusage() can be used to measure the user CPU time and system CPU time of a process (see the sketch after this list).
- Optional: More on profiling
- A list of profilers and related tools.
- A tutorial on performance tools.
- Valgrind is a set of tools for profiling and debugging. Very powerful. See the Wikipedia article on Valgrind.
- The QPT profiler.
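A minimal sketch of using getrusage() on Linux to read the user and system CPU time of the current process (the busy loop exists only to give it something to measure):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* Burn a little CPU time so there is something to measure. */
    volatile double x = 0.0;
    for (long i = 0; i < 50 * 1000 * 1000; i++)
        x += i * 0.5;

    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        perror("getrusage");
        return 1;
    }
    printf("user CPU time:   %ld.%06ld s\n",
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("system CPU time: %ld.%06ld s\n",
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}
```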
Machine-Independent Optimizations
- Some optimizations are better left to the compiler. That’s not what this talk is about.
- Programmer’s job: remove barriers and bottlenecks (see the code-motion sketch at the end of this section)
- Even small steps can be very effective
- A Linux Journal article on using gcc optimization
- Compiler Optimization article in Wikipedia
- Book: 5.1, 5.3–5.11, 5.13. Optional: Data-flow charts.
- Optional: A recent story about machine-independent optimization of math functions in libc.
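A sketch in the spirit of the book's code-motion examples (not copied from it): calling strlen() in the loop condition makes the loop quadratic overall, and hoisting the loop-invariant call out restores linear time.

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Slow: strlen() is re-evaluated on every iteration -> O(n^2) overall. */
void lower_slow(char *s) {
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}

/* Faster: the loop-invariant strlen() call is hoisted out of the loop -> O(n). */
void lower_fast(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}

int main(void) {
    char s[] = "Hello, World!";
    lower_fast(s);
    printf("%s\n", s);   /* prints: hello, world! */
    return 0;
}
```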
Machine-Dependent Optimizations
- Modern CPUs employ several design ideas which allow faster execution of instructions:
- Pipelining, allowing one instruction to begin execution before the previous one completes.
- Superscalar design, allowing multiple instructions to use specific functional units in parallel.
- As a result, instruction latency and throughput vary between instructions
- Dependencies between instructions can cause delays
- Loop unrolling (multiple accumulators and unrolled steps) can remove dependencies and speed up loops; see the sketch after this list.
- Using SIMD instructions and special instruction sets is also very useful for optimization.
- It is up to the programmer to know when to use what.
- Optional:
- Loop unrolling through Duff’s Device, and a copy of Duff’s original post and some explanation.
- A 2010 story about Fast machine-dependent optimizations of memcpy().
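A sketch of 2x2 loop unrolling with two accumulators, which breaks the serial dependency chain of a running sum; the actual speedup depends on the CPU and compiler flags:

```c
#include <stdio.h>

/* Baseline: each addition depends on the previous one (a serial dependency chain). */
double sum_simple(const double *a, long n) {
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* 2x2 unrolling: two accumulators let independent additions overlap in the pipeline. */
double sum_unrolled(const double *a, long n) {
    double s0 = 0.0, s1 = 0.0;
    long i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    for (; i < n; i++)   /* finish any leftover element */
        s0 += a[i];
    return s0 + s1;
}

int main(void) {
    double a[7] = {1, 2, 3, 4, 5, 6, 7};
    printf("%g %g\n", sum_simple(a, 7), sum_unrolled(a, 7));  /* both print 28 */
    return 0;
}
```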
Linking
- The compilation process (including assembly) results in object files
- The linker's job is to collect them together and resolve symbols
- We distinguish between weak and strong symbols (see the sketch after this list)
- Book: Sections 7.1–7.12. Optional: 7.13
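A sketch of the strong/weak symbol rules (the file names are made up for illustration): an uninitialized global definition is a weak (common) symbol, an initialized one is strong, and the linker resolves references to the strong definition when both exist.

```c
/* --- main.c --------------------------------------------------------- */
/* "int counter;" (uninitialized) is a WEAK (common) symbol.            */
#include <stdio.h>

int counter;            /* weak definition                               */
void bump(void);        /* defined in other.c                            */

int main(void) {
    bump();
    printf("counter = %d\n", counter);   /* prints 11: both files refer  */
    return 0;                            /* to the single strong symbol  */
}

/* --- other.c -------------------------------------------------------- */
/* "int counter = 10;" (initialized) is a STRONG symbol; the linker
   resolves every reference to 'counter' to this definition.            */
int counter = 10;       /* strong definition */

void bump(void) {
    counter += 1;
}

/* Build note: recent gcc defaults to -fno-common, which turns the
   duplicate definition into an error; use -fcommon to see the classic
   weak/strong resolution:
       gcc -fcommon main.c other.c -o symbols && ./symbols              */
```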