
Notes on 6502 Assembly

The NES runs a very slightly modified 6502 processor. What follows are some very introductory, and not at all exhaustive notes on 6502 Assembly, or ASM.

If you find this at all interesting, Easy 6502 is a really great introductory primer on 6502 Assembly that lets you get your hands dirty right from a web browser.

Numbers

Numbers prefixed with one of the following:

• `\$` are hexadecimal format
• `#` are literal numbers

Any number without the `#` prefix refers to a memory location, whether or not it is written in hexadecimal.

So,

``    LDA #\$01``

Loads the hex value `\$01` into register `A`.

Registers and flags

There are 3 primary registers,

• `A`
• `X`
• `Y`

`A` is usually called the accumulator.

Each register holds a single byte.

`SP` is the stack pointer, a register that is decremented every time a byte is pushed onto the stack and incremented whenever a byte is popped off the stack.

`PC` is the program counter. `PC` is how the processor keeps track of where in the currently running program it is.

Processor flags

Each flag is 1 bit, so all 7 flags can live in a single byte.

Instructions

In 6502 Assembly, instructions are like words in Forth, or functions in a higher-level programming language. Every instruction takes 0 or 1 arguments.

An example of some instructions,

``````    LDA #\$c0  ; Load the hex value \$c0 into the A register
TAX       ; Transfer the value in the A register to X
INX       ; Increment the value in the X register
ADC #\$c4  ; Add the hex value \$c4 to the A register
BRK       ; Break - we're done``````

For a full list of 6502 ASM instructions see,

6502 ASM has a handful of branching instructions — they almost all rely on flags to determine what branch to follow.

The 6502 has 65536 bytes of available memory. These bytes are typically described using the HEX range \$0000 - \$ffff.

When the 6502 refers to addressing modes, it really means “What is the source of the data used in this instruction?”

The different modes are,

Absolute: `\$c000`

With absolute addressing, the full memory location is used as the argument to the instruction.

Zero page: `\$c0`

All instructions that support absolute addressing (with the exception of the jump instructions) also have the option to take a single-byte address. This type of addressing is called “zero page”: only the first page (the first 256 bytes) of memory is accessible. This is faster, as only one byte needs to be looked up, and takes up less space in the assembled code as well.

Zero page,X: `\$c0,X`

In this mode, a zero page address is given, and then the value of the `X` register is added.

Zero page,Y: `\$c0,Y`

This is the equivalent of zero page,X, but can only be used with `LDX` and `STX`.

Absolute,X and absolute,Y: `\$c000,X` and `\$c000,Y`

These are the absolute addressing versions of zero page,X and zero page,Y.

Immediate: `#\$c0`

Immediate addressing doesn’t strictly deal with memory addresses - this is the mode where actual values are used. For example, `LDX #\$01` loads the value `\$01` into the `X` register. This is very different to the zero page instruction `LDX \$01` that loads the value at memory location `\$01` into the `X` register.

Relative: `\$c0` (or label)

Relative addressing is used for branching instructions. These instructions take a single byte, which is used as an offset from the following instruction.

Implicit

Some instructions don’t deal with memory locations, for example, `INX` - increment the `X` register. These have implicit addressing because the argument is implied by the instruction.

Indirect: `(\$c000)`

Indirect addressing uses an absolute address to look up another address. The first address gives the least significant byte of the address, and the following byte gives the most significant byte.

Indexed indirect: `(\$c0,X)`

This one’s kinda weird. It’s like a cross between zero page,X and indirect. Basically, you take the zero page address, add the value of the `X` register to it, then use that to look up a two-byte address.

Indirect indexed: `(\$c0),Y`

Indirect indexed is like indexed indirect, but instead of adding the `X` register to the address before de-referencing, the zero page address is de-referenced, and the `Y` register is added to the resulting address.

For more on the different modes of addressing,

The stack

The current depth of the stack is measured by the stack pointer, a special register. The stack lives in memory between `\$0100` and `\$01ff`. The stack pointer is initially `\$ff`, which points to memory location `\$01ff`. When a byte is pushed onto the stack, the stack pointer becomes `\$fe`, or memory location `\$01fe`, and so on.

Jumping

Jumping is like branching with two main differences:

• First, jumps are not conditionally executed
• Second, they take a two-byte absolute address

For small programs, this second detail isn’t important, as you’ll be using labels, and the assembler works out the correct memory location from the label. For larger programs though, a branch can only reach as far as its single-byte offset allows, so jumping is the only way to move from one distant section of the code to another.

Other Resources

Because these are but the barest of minimum notes, here are some more resources for continued reference.

Operators in C

Following up my notes on Data Types and Variables in C here are notes on operators in C.

An operator is a symbol that represents a mathematical or logical operation. An operator acts on operands.

C provides a number of operators.

Some arithmetic operators include,

``````+
-
*
/
%``````

`%` is the most exciting of the list. It is called modulo, and it returns the remainder after division. Of note, modulo can only be used on integers, while the others can be used on any number.

This group of arithmetic operators is called “binary arithmetic operators.”

There are also unary operators, or operators that work on just 1 operand.

``````++
--``````

These increment or decrement the value of the operand by 1. These work on both integers and floating point numbers.

Their behavior changes based on their position relative to the operand. E.g.

``b = a++``

Here `a` is post-incremented: `b` gets the old value of `a`, and then `a` goes up by 1.

``b = ++a``

Here `a` is pre-incremented: `a` goes up by 1 first, and `b` gets the new value.

``````int a, b;

a = 0;
b = a++;

// => a is 1
// => b is 0

a = 0;
b = ++a;

// => a is 1
// => b is 1``````

Pre-increment is performed before the value is assigned to the new variable, while post-increment is performed after the assignment.

Next up, comparison operators, or, relational operators.

These are operators that check for a relationship between two operands and return either 1 (true), or 0 (false). They always return `int`s, but can compare numbers or characters.

These operators include,

``````==
!=
>
<
>=
<=``````

Logical operators come from boolean algebra.

``````&&
||
!``````

`&&` is an operation of conjunction — intersection.

`||` is an operation of union.

`!` is an operation of negation.

In boolean algebra variables can only be assigned either true or false values. In C, 1 or 0.

Truth table!

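| x | y | `x && y` | `x \|\| y` | `!x` |
|---|---|----------|------------|------|
| 0 | 0 | 0        | 0          | 1    |
| 1 | 0 | 0        | 1          | 0    |
| 0 | 1 | 0        | 1          | 1    |
| 1 | 1 | 1        | 1          | 0    |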

Each row shows a possible combination of values.

`!` is unique among the logical operators because it is actually a unary operator, taking only 1 operand.

The result of a boolean operator is always an `int`, either 0 or 1. 0 is false. 1 (or any non-0 number) is true.

Boolean operators pair nicely with the `bool` data type if you are using C99 or newer.

Bitwise operators allow for the direct manipulation of bits. This is useful when working with memory addresses. They’re a wee bit complicated, but allow for extremely efficient operations.

The operators include,

``````&
|
^
~
<<
>>``````

These match the logical operators, which perform a boolean operation on an entire number. Bitwise operators also perform boolean operations, but rather than treating each operand as a single true or false value, they work on every single bit of the operands…bit by bit.

This means that if you had 2 operands, `a` and `b` like so,

``````a = 10101010
b = 00001111``````

Where `c = a & b`

The bitwise `&` compares the operands bit by bit, and a bit in the result is 1 only where both operands have a 1, so `c = 00001010`. This is, at first blush, admittedly a little baffling. To make matters a wee bit more confusing, let’s shift focus to `>>` and `<<`, the right and left shift operators.

``b = a >> n``

This shifts the bits in `a` to the “right” by `n` steps, so `b = a >> 3` would shift the bits of `a` by 3 steps.

If `a` started as `11001100` it would finish as `00011001`, with 0’s being introduced to the left as the bits are shifted over. Another way of thinking about this is that `b` is `a`, but missing the 3 least significant bits.

``````1 << 0 = 1
1 << 1 = 2
1 << 2 = 4
1 << 3 = 8``````

The result is the same as multiplying the leading operand (here, 1) by 2 for each shifted bit, e.g. `1 << 1 = 2` can be thought of as `1 * 2`, and `1 << 2 = 4` thought of as `1 * 2 * 2`, `a*2^n`.

On the other hand, shifting right is dividing by 2 for each shifted bit, `a/2^n`.

``````#include <stdio.h>
#include <stdint.h>

int main (void)
{
uint8_t a = 12; // 0000 1100
uint8_t b = 5;  // 0000 0101

// A & B  -->  0000 0100 = 4
// A | B  -->  0000 1101 = 13
// A ^ B  -->  0000 1001 = 9
// A << 1 -->  0001 1000 = 24
// A >> 1 -->  0000 0110 = 6

printf("A = %d\n", a);
printf("B = %d\n", b);
printf("A & B = %d\n", a & b);
printf("A | B = %d\n", a | b);
printf("A ^ B = %d\n", a ^ b);
printf("A << 1 = %d\n", a << 1);
printf("A >> 1 = %d\n", a >> 1);

return 0;
}``````

Bitwise operators are frequently used with bitmasks.

Using the bitwise `&` operator, two different kinds of bitmask can be defined: one for bit clearing (wherever the mask bit is 0, the result bit is 0) and one for bit testing (wherever the mask bit is 1, the result bit shows whether the data bit was set).

The bitwise `|` operator allows for a bitmask useful for bit setting, where, if a bitmask is 1, the result is 1.

Finally, the bitwise `^` operator allows for a bitmask that works as a toggle, switching the value of a bit from 1 to 0 or 0 to 1.

But how does this actually work? How can one actually perform bit manipulation? What if you’d like to set the Nth bit, for instance set the bit in the 6th position to 1?

To do this, use the bitwise `|` with a bitmask that has a 1 in the 6th position.

``````result = data | 0b01000000; // the mask is a binary literal
// where the bit in the 6th position is set to 1``````

Of course…this is grossly impractical and a pain in the butt to read.

Instead, create a bitmask using the `<<` operator!

``result = data | (1 << 6);``

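This’ll set the bit in position 6 to 1 and leave all the other bits of `data` unchanged. The mask `(1 << 6)` itself has a 1 in position 6 and 0 everywhere else.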

A similar process works for clearing the Nth bit. To set the bit at the 5th position to 0 use the bitwise `&` with a mask set to 0 for the bit we’d like to reset.

``result = data & ~(1 << 5); ``

Here the `<<` operator is used again, and `~` then flips the result, creating a bitmask set to 1 everywhere but the 5th bit.

NOTE! Since `~` is outside of the parentheses inside of which the left shift is calculated, the complement operation happens after the bits are shifted. Shift followed by complement.

Next, how to select a subset of bits, e.g. select the bits from position 3 - 5.

To start, shift the bits from position 3 to position 0. In other words, shift right by 3 bits.

Now to select the bits in positions 0, 1, and 2. Bitwise `&` with a bitmask where those same positions are all set to 1 will allow for this.

Put together, this looks like this,

``result = (data >> 3) & 0b111;``

To be totally honest I find a lot of this bit-level stuff baffling. In the programming stuff I do on the day to day I’ve never had to reach for these tools — this may be because they aren’t needed for what I do, or because I don’t understand them enough to even realize when I should be reaching for them 🤷‍♂️

Assignment operators are used to assign a value to a variable. The simplest is `=`. There are other assignment operators, though, like the compound assignment operators,

``````+=
-=
*=
/=
%=``````

`a += b` is the same as `a = a + b`, and `a *= b` is the same as `a = a * b` and so on.

There are also compound forms of the bitwise operators,

``````&=
|=
^=
>>=
<<=``````

These work the same as the previous compound operators.

An entirely different beast when it comes to operators is the `sizeof` operator. The `sizeof` operator returns the number of bytes an operand takes up in memory. The size is determined by the operand’s type, and is known at compile-time, not run-time (C99 variable-length arrays are the exception). The result will be an integer constant, and the operand can be a variable, a basic or a derived data type, or even an expression.

The result is of type `size_t`, an unsigned integer type whose size varies from compiler to compiler. The `sizeof` operator is useful because it allows one to avoid hardcoding certain fixed values into a program; instead, they can be determined from the data itself. This leads to more portable code.

No conversation about operators would be complete without discussing type conversion. C is statically typed, but that doesn’t mean data is stuck forever and always as a specific type after initial declaration. The cast operator allows for the conversion of a value from one data type to another. Be warned, sometimes casting from one type to another can result in a loss of some information because not all data types have the same size in memory, e.g. a `char` is teeny tiny, while a `long long` is pretty big.

Sometimes type conversion happens implicitly; the compiler takes the wheel! I think the most common scenario for this is integer promotion, where a `char`, for instance, is “promoted” to an integer during certain mathematical operations. Similarly, when assigning a `short int` to a `long int` there is an implicit conversion, making the data type “wider.”

There is a hierarchy to data conversion,

``````int                       // smallest
unsigned int
long int
unsigned long int
long long int
unsigned long long int
float
double
long double                // largest``````

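In arithmetic expressions, implicit conversions apply “upwards”: the operand with the smaller type is converted to the “larger” type. Assignment can also convert “downwards” implicitly (assigning a `double` to an `int`, for example), which may silently lose information, so an explicit conversion is the clearer way to show that the narrowing is intended.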

A totally fabricated example of implicit conversion:

``````#include <stdio.h>

int main()
{
int a = 1;
long int b = 2;
double c = 3.3;
b = b + a; // implicit conversion: a is promoted to a long int
c = c * b; // implicit conversion: b is promoted to a double
return 0;
}``````

But wait! There are more operators!?

The ternary operator, the reference operator, the de-reference operator, the array reference operator, the member selection operator, and the member selection operator for pointers!

☠️☠️☠️☠️☠️

Operators have precedence rules, too. They are strict, and never changing. This means that an expression is always evaluated in the same way, no matter what.

Parentheses can be added to help make precedence a bit more obvious and to control precedence, too, because items in parentheses get evaluated first. So, one can change evaluation order by adding parentheses.

Here is a link to a chart that describes operator precedence in C.

Data Types and Variables in C

I’ve been writing a heap of Lua lately, and this has led to my becoming interested, again, in C. Here are some ancient notes I dug up on the most basic of data types and variables in C.

All of a computer’s memory is comprised of bits. A sequence of 8 bits forms a byte. A group of bytes (typically 4 or 8) form a word. Each word is associated with a memory address. The address increases by 1 with each byte of memory.

In C, a byte is an object that is as big as the smallest addressable unit.

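Bytes are the minimum addressable unit, 8 bits wide on virtually every modern platform (C itself only guarantees at least 8 bits, see `CHAR_BIT` in `limits.h`).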

A variable is a container for data. A variable is a symbolic representation of a memory location, or address.

A variable is comprised of a few parts:

First, define the data type, then an identifier, and then, optionally, initialize the variable with some data.

``int number_of_bananas = 124;``

Here, `int` is the data type, `number_of_bananas` is the identifier and `124` is the data.

C is statically typed; this means that the data type of a variable cannot be changed after it is declared. You can also make the value immutable by turning it into a constant using the `const` keyword.

``const int number_of_bananas = 124;``

A data type is a collection of compile-time properties, including:

• memory size and alignment
• set of valid values
• set of permitted operations

Some data types available in C include,

• Numbers (`int`, `float`, `double`, etc.)
• Characters
• Strings
• Arrays
• Complex data types, like structs and pointers

Numbers and characters are called “fundamental data types” in C. All other data types are called “derived data types” because they are derived from the fundamental types.

Integers, `int`, are any non-fractional numbers either negative or positive including 0. You would use an `int` to describe the number of pets you have — you cannot have a fractional number of pets…unless you’ve done something awful and/or are cosplaying as King Solomon.

`int`s come in both signed and unsigned flavors. An `int` can be negative or positive, while an `unsigned int` can be 0 or positive, never negative. `unsigned int`s are useful for when you need to express a very large positive value. So, if you were going to create a variable to represent the temperature in Fahrenheit, you would want to use an `int` since the temperature in Fahrenheit can be negative, positive or exactly 0. While, if you were going to create a variable to represent the temperature in Kelvin you would probably want to use an `unsigned int` since Kelvin starts at 0 and only goes up from 0.

You can define the `unsigned int` data type using the keyword `unsigned int` or just `unsigned`.

Besides coming in signed and unsigned variants, `int` also comes in different sizes:

• `short int`
• `int`
• `long int`
• `long long int`

These describe different byte sizes allotted to the value. These exist in `unsigned` variants, too. See `stdint.h` for waaaaay more on this.

Totally random aside! When displaying a variable you need to use the correct format specifier: `%d` for a plain ol’ `int`, `%ld` for a `long int`. Now, if you wanna format the number a bit more, you can also include a width to help pad the number, e.g. `%7d` will add 6 leading spaces before the number if it is 1 digit long, or 5 if the number is 2 digits long.

``````int the_number = 42;
printf("The Answer to life, the universe and everything is %7d\n", the_number);

// The Answer to life, the universe and everything is      42``````

`float`s and `double`s can also include a number in their format string that defines their precision.

So, with 2 points of precision:

``````float pi = 3.14;
printf("%12.2f | PI\n", pi);
double pi2 = 314E-2;
printf("%12.2e | PI\n", pi2);

//      3.14 | PI
//  3.14e+00 | PI``````

Or with 4!

``````float pi = 3.14;
printf("%12.4f | PI\n", pi);
double pi2 = 314E-2;
printf("%12.4e | PI\n", pi2);

//    3.1400 | PI
//3.1400e+00 | PI``````

While `int`s represent discrete values, floating point numbers (`float`s and `double`s) are used to represent numbers with a fractional part, negative or positive, including 0, e.g. `3.14`. This can also be written in scientific notation as `314E-2`.

``````float pi = 3.14;
double pi2 = 314E-2;``````

`char` variables are represented numerically by an 8 bit integer (1 byte). Whether a plain `char` is signed depends on the compiler; where it is signed, the available numeric range is from -128 to 127. The `ASCII` table itself only uses 0 to 127. BOOM, or from Wikipedia.

While `signed char` ranges from `-128` to `127`, `unsigned char` ranges from `0` to `255`.

A `boolean` is a variable that can only take 1 of 2 values. Either `true` or `false`. C originally didn’t have any booleans, instead `false` was assumed to be `0` and anything other than `0` was considered to be `true`. While modern C supports boolean data types, treating `0` as `false` remains a common idiom. The boolean data type, `_Bool`, was introduced in the C99 standard, with the friendlier names `bool`, `true`, and `false` available from `stdbool.h`.

An enumeration, or `enum` is a list of constants. It is useful for when you want to select exactly 1 option from a list of predefined values. Behind the scenes, `enum`s are nothing more than numbers…this makes sense for a data type called an enumeration. `enums` return an index, not an identifier, e.g.

``````enum Menu {
COFFEE,  // 0
JUICE,   // 1
WAFFLES, // 2
};

enum Menu order = JUICE;
printf("The order: %d\n", order);

// The order: 1``````

If you want to explicitly set an index on an `enum` option you can. Note that the numbering of the following options picks up from whatever you defined.

``````enum months
{
JAN = 1,
FEB,
MAR,
APR,
MAY,
JUN,
JUL,
AUG,
SEP,
OCT,
NOV,
DEC,
};``````

The above ensures that the months are numbered in a sane way…not starting from 0.