Overview
This course focuses on understanding the hardware components of a modern computer, from the perspective of the programmer. The goal is to provide sufficiently broad understanding of how various components fit together, and sufficient depth to begin to appreciate how to make use of these components to make code run better, safer, and faster.
What do you need for this course?
- Textbook (Required): Computer Systems: A Programmer’s Perspective, 3rd Edition. Available in bookstores and libraries. Students in the 2016–2017 class produced a set of Hebrew notes for future generations taking this course at Bar Ilan. THANKS!
- Linux installed on your machine: dual boot preferred, a virtual machine is OK if you are aiming for mediocrity. I recommend the Linux Mint distribution, but Ubuntu or any other modern Linux distribution is fine.
IMPORTANT: Teaching during COVID-19 (Corona) conditions
During the 2020–21 semester, the course will be taught asynchronously. Pre-recorded lectures will be available for download at least a week before the scheduled class, and it is the students’ responsibility to watch them at their own pace, but before class. During class we will discuss specific topics that require it, answer questions (sent in advance), and expand on the lectures.
You’ll need a quiet room and a computer; a camera on the computer or on a phone is fine.
Lecture Notes
(incrementally available)
Introduction
Main point: Understand why this is important. Why know something about the hardware? Why learn assembly language (a language most programmers never use after taking the course)? Why worry about optimization when we have great compilers?
- Book sections 1.1–1.6, 1.9–1.10
- Ground rules
- Overview
- Optional (and recommended): Book sections 1.7–1.8
Representing Data
- Book sections 2.1–2.3. Section 2.3: There’s a lengthy mathematical explanation of the properties of two’s complement representation. I consider it less important than the bottom line: Understanding when and how to use shifts, and the problems of overflow and underflow.
- Basic terminology: Bits, bytes, nibbles, and words
- Data vs instructions: they are all just bits…
- Integers: one's complement, two's complement, unsigned, casting (see the sketch after this list)
- Characters and strings
- Great tip about how to choose the best integer size
- Book section 2.4
- Floating point number representations: single-precision and double-precision (see the bit-field sketch after this list)
- Optional:
- YouTube video showing how to convert a floating point number from decimal to binary
- Example of the representation of a floating point number
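To make the integer points above concrete, here is a minimal C sketch (not taken from the book; the specific values are only for illustration, and it assumes a 32-bit int):

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* The same bit pattern means different things when read as signed vs. unsigned. */
    int x = -1;
    unsigned int u = (unsigned int)x;  /* the cast keeps the bits, changes the interpretation */
    printf("x = %d, (unsigned)x = %u\n", x, u);  /* -1 vs. 4294967295 with a 32-bit int */

    /* Shifts: right-shifting a negative value is usually an arithmetic shift
       (implementation-defined in C); left-shifting an unsigned value multiplies by 2. */
    int y = -8;
    unsigned int z = 3;
    printf("-8 >> 1 = %d, 3u << 2 = %u\n", y >> 1, z << 2);  /* typically -4 and 12 */

    /* Overflow: signed overflow is undefined behavior in C, but on typical
       two's-complement hardware the value simply wraps around. */
    printf("INT_MAX = %d; INT_MAX + 1 wraps to a large negative value\n", INT_MAX);
    return 0;
}
```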
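And a similar sketch for peeking at the bit fields of a single-precision float (it assumes a 32-bit IEEE 754 float; the value 0.15625 is chosen because its representation is exact):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = 0.15625f;              /* = 1.01 (binary) * 2^-3, exactly representable */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the 4 bytes as an unsigned integer */

    unsigned sign     = bits >> 31;           /* 1 sign bit */
    unsigned exponent = (bits >> 23) & 0xFF;  /* 8 exponent bits, biased by 127 */
    unsigned fraction = bits & 0x7FFFFF;      /* 23 fraction bits */

    printf("bits = 0x%08X\n", (unsigned)bits);  /* expected: 0x3E200000 */
    printf("sign = %u, exponent = %u (unbiased %d), fraction = 0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}
```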
Representing Instructions (Code)
Introduction: Programming and the Machine Level
- Instructions are bit patterns, too.
- Book sections 3.2–3.5
- Optional: 3.1, 3.5.5
- Optional: Additional resources for learning assembly:
- A web resource for learning about assembly language, different assemblers, etc.
- A basic programming book which uses assembly language as the main teaching language
- [The GAS (GNU Assembler) Manual](http://sourceware.org/binutils/docs/as/index.html). Just in case you’re stuck using GAS.
- Optional: Book section 3.11 (Floating Point in Assembly)
- There is no 'loop' construct, nor 'else', nor 'switch': at the machine level it is just IFs (conditional tests) and GOTOs (jumps). See the C sketch at the end of this section.
- Book: Section 3.6
- Branching with a stack
- Passing arguments, returning results: Calling conventions
- Book: Section 3.7 (much of this is also covered in the lecture)
- Optional: A Wikipedia page about different C/C++ compilers’ function calling conventions, and their impact.
- Arrays are stretches of memory, arranged as we want (or as the language standard dictates)
- Structures are also stretches of memory, and their fields need alignment; see the padding sketch at the end of this section
- Book: Sections 3.8-3.9 (also in lectures)
- Required: Debugging, viruses, and secure code (Book: Section 3.10, self-study)
Using assembly language (and some system tools) to do simple reverse engineering.
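To illustrate the "just IFs and GOTOs" point, here is the same loop written twice in C: once with while, and once in the goto style that mirrors what compilers typically emit (a sketch, not actual compiler output):

```c
#include <stdio.h>

/* Sum 0..n-1 using a while loop. */
long sum_while(long n) {
    long s = 0, i = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

/* The same loop in "assembly style": only a conditional test and a jump. */
long sum_goto(long n) {
    long s = 0, i = 0;
    goto test;
loop:
    s += i;
    i++;
test:
    if (i < n)
        goto loop;
    return s;
}

int main(void) {
    printf("%ld %ld\n", sum_while(10), sum_goto(10));  /* both print 45 */
    return 0;
}
```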
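And a small sketch of structure padding and alignment; the sizes in the comments assume a typical x86-64 Linux ABI and may differ elsewhere:

```c
#include <stdio.h>
#include <stddef.h>

struct padded {
    char c;   /* 1 byte, then 3 bytes of padding so i is 4-byte aligned */
    int  i;   /* 4 bytes */
    char d;   /* 1 byte, then trailing padding so arrays of the struct stay aligned */
};

struct reordered {
    int  i;   /* 4 bytes */
    char c;   /* 1 byte */
    char d;   /* 1 byte, then 2 bytes of trailing padding */
};

int main(void) {
    printf("sizeof(struct padded)    = %zu\n", sizeof(struct padded));      /* typically 12 */
    printf("sizeof(struct reordered) = %zu\n", sizeof(struct reordered));   /* typically 8 */
    printf("offsetof(padded, i)      = %zu\n", offsetof(struct padded, i)); /* typically 4 */
    return 0;
}
```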
The Memory Hierarchy, and the Wonderful Cache
Physical Memory, and the CPU-Memory Speed Gap
- The CPU gets instructions and data from the memory
- There are serial and parallel buses, both used in different ways
- We distinguish types of memory: volatile and non-volatile, random access, etc.
- In general, faster memory is much more expensive (in $!)
- Caching is a general mechanism for building hierarchies of small, fast memory and large, slow memory that still behave (mostly) as if the entire memory were fast. This is the memory hierarchy.
- Book:
- 6.1.1 (only at a high level; optional: memory cell organization)
- 6.1.2–6.1.3
- 6.2–6.4 (unions: optional)
- Caching relies on code locality
- We can write code with better or worse locality, and therefore get better or worse running time (see the traversal sketch after this list)
- Differences in speed can be very large
- Book: Sections 6.5–6.7
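A minimal sketch of the locality point: both functions below compute the same sum, but the row-major traversal walks memory sequentially (good spatial locality) while the column-major traversal jumps a full row per access; the actual speed gap depends on the machine and on N:

```c
#include <stdio.h>

#define N 1024

static int a[N][N];

/* Row-major traversal: accesses consecutive memory addresses (good locality). */
long sum_rows(void) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: each access jumps N*sizeof(int) bytes (poor locality). */
long sum_cols(void) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%ld %ld\n", sum_rows(), sum_cols());  /* same result, different cache behavior */
    return 0;
}
```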
Optimizing Code
- Always use a profiler to identify where to invest optimization effort
- Different types of profilers (compile-time, run-time)
- Book: 5.14
- A short section on removing run-time bottlenecks in the Performance Tuning article in Wikipedia.
- The Linux function getrusage() can be used to measure the user CPU time and system CPU time of a process (see the sketch after this list).
- Optional: More on profiling
- A list of profilers and related tools.
- A tutorial on performance tools.
- Valgrind is a set of tools for profiling and debugging. Very powerful. See the Wikipedia article on Valgrind.
- The QPT profiler.
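A minimal sketch of using getrusage() on Linux to read the user and system CPU time of the current process (the busy loop exists only to give it something to measure):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* Burn a little CPU time so there is something to measure. */
    volatile double x = 0.0;
    for (long i = 0; i < 50 * 1000 * 1000; i++)
        x += i * 0.5;

    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        perror("getrusage");
        return 1;
    }
    printf("user CPU time:   %ld.%06ld s\n",
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("system CPU time: %ld.%06ld s\n",
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}
```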
Machine-Independent Optimizations
- Some optimizations are better left to the compiler. That’s not what this talk is about.
- Programmer’s job: remove barriers and bottlenecks (see the code-motion sketch at the end of this section)
- Even small steps can be very effective
- A Linux Journal article on using gcc optimization
- Compiler Optimization article in Wikipedia
- Book: 5.1, 5.3–5.11, 5.13. Optional: Data-flow charts.
- Optional: A recent story about machine-independent optimization of math functions in libc.
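A sketch in the spirit of the book's code-motion examples (not copied from it): calling strlen() in the loop condition makes the loop quadratic overall, and hoisting the loop-invariant call out restores linear time.

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Slow: strlen() is re-evaluated on every iteration -> O(n^2) overall. */
void lower_slow(char *s) {
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}

/* Faster: the loop-invariant strlen() call is hoisted out of the loop -> O(n). */
void lower_fast(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}

int main(void) {
    char s[] = "Hello, World!";
    lower_fast(s);
    printf("%s\n", s);   /* prints: hello, world! */
    return 0;
}
```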
Machine-Dependent Optimizations
- Modern CPUs employ several design ideas which allow faster execution of instructions:
- Pipelining, allowing one instruction to begin execution before the previous one completes.
- Superscalar design, allowing multiple instructions to use specific functional units in parallel.
- As a result, instruction latency and throughput vary between instructions
- Dependencies between instructions can cause delays
- Loop unrolling (multiple accumulators and unrolled steps) can remove dependencies and speed up loops; see the sketch after this list.
- Using SIMD instructions and special instruction sets is also very useful for optimization.
- It is up to the programmer to know when to use what.
- Optional:
- Loop unrolling through Duff’s Device, and a copy of Duff’s original post and some explanation.
- A 2010 story about Fast machine-dependent optimizations of memcpy().
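A sketch of 2x2 loop unrolling with two accumulators, which breaks the serial dependency chain of a running sum; the actual speedup depends on the CPU and compiler flags:

```c
#include <stdio.h>

/* Baseline: each addition depends on the previous one (a serial dependency chain). */
double sum_simple(const double *a, long n) {
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* 2x2 unrolling: two accumulators let independent additions overlap in the pipeline. */
double sum_unrolled(const double *a, long n) {
    double s0 = 0.0, s1 = 0.0;
    long i;
    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    for (; i < n; i++)   /* finish any leftover element */
        s0 += a[i];
    return s0 + s1;
}

int main(void) {
    double a[7] = {1, 2, 3, 4, 5, 6, 7};
    printf("%g %g\n", sum_simple(a, 7), sum_unrolled(a, 7));  /* both print 28 */
    return 0;
}
```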
Linking
- The compilation process (including assembly) results in object files
- The linker's job is to collect them together and resolve symbols
- We distinguish between weak and strong symbols (see the sketch after this list)
- Book: Sections 7.1–7.12. Optional: 7.13
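A sketch of the strong/weak symbol rules (the file names are made up for illustration): an uninitialized global definition is a weak (common) symbol, an initialized one is strong, and the linker resolves references to the strong definition when both exist.

```c
/* --- main.c --------------------------------------------------------- */
/* "int counter;" (uninitialized) is a WEAK (common) symbol.            */
#include <stdio.h>

int counter;            /* weak definition                               */
void bump(void);        /* defined in other.c                            */

int main(void) {
    bump();
    printf("counter = %d\n", counter);   /* prints 11: both files refer  */
    return 0;                            /* to the single strong symbol  */
}

/* --- other.c -------------------------------------------------------- */
/* "int counter = 10;" (initialized) is a STRONG symbol; the linker
   resolves every reference to 'counter' to this definition.            */
int counter = 10;       /* strong definition */

void bump(void) {
    counter += 1;
}

/* Build note: recent gcc defaults to -fno-common, which turns the
   duplicate definition into an error; use -fcommon to see the classic
   weak/strong resolution:
       gcc -fcommon main.c other.c -o symbols && ./symbols              */
```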