
« Oatmeal

Posts tagged c


Operators in C

Following up my notes on Data Types and Variables in C, here are notes on operators in C.

An operator is a symbol that represents a mathematical or logical operation. An operator acts on one or more operands.

C provides a number of operators.

Some arithmetic operators include,

+
-
*
/
%

% is the most exciting of the list. It is called modulo, and it returns the remainder after division. Of note, modulo can only be used on integers, while the others can be used on any number.

These arithmetic operators are called binary arithmetic operators because each works on 2 operands.
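A tiny sketch of % at work. The function names are made up for illustration; the only thing taken from the notes above is that % gives the remainder of an integer division.

```c
/* Hypothetical helpers, just to show % at work. */

/* Wrap a running count of minutes onto a clock face: 0..59. */
int minutes_past_hour(int total_minutes)
{
    return total_minutes % 60;
}

/* An integer is even when dividing by 2 leaves no remainder. */
int is_even(int n)
{
    return n % 2 == 0;
}
```

With these, minutes_past_hour(135) is 15. One wrinkle worth knowing: since C99 the result takes the sign of the left operand, so -7 % 3 is -1, not 2.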

There are also unary operators, or operators that work on just 1 operand.

++
--

These increment or decrement the value of the operand by 1. These work on both integers and floating point numbers.

Their behavior changes based on their position relative to the operand. E.g.

b = a++

Post-increment: b receives the current value of a, and then a is incremented by 1.

b = ++a

Here, pre-increment: a is incremented by 1 first, and b receives the new value.

int a, b;

a = 0;
b = a++;

// => a is 1
// => b is 0

a = 0;
b = ++a;

// => a is 1
// => b is 1

Pre-increment is performed before the value is assigned to the new variable, while post-increment is performed after the assignment.

Next up, relational operators.

These are operators that check for a relationship between two operands and return either 1 (true), or 0 (false). They always return ints, but can compare numbers or characters.

These operators include,

==
!=
>
<
>=
<=
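Because each comparison evaluates to a plain int, the results can be used in arithmetic. Here is a minimal sketch of that idea; compare_ints is a name I made up, not something from the standard library.

```c
/* Hypothetical helper: returns -1, 0 or 1 depending on how a relates to b.
 * Each comparison evaluates to the int 1 (true) or 0 (false),
 * so the two results can simply be subtracted. */
int compare_ints(int a, int b)
{
    return (a > b) - (a < b);
}
```

So compare_ints(3, 5) is -1, compare_ints(5, 3) is 1, and compare_ints(4, 4) is 0.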

Logical operators come from boolean algebra.

&&
||
!

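&& is logical AND (conjunction): the result is true only if both operands are true.

|| is logical OR (disjunction): the result is true if at least one operand is true.

! is logical NOT (negation): it flips true to false and false to true.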

In boolean algebra variables can only be assigned either true or false values. In C, 1 or 0.

Truth table!

x   y   x && y   x || y   !x
0   0   0        0        1
1   0   0        1        0
0   1   0        1        1
1   1   1        1        0

Each row shows a possible combination of values.

! is unique among the logical operators because it is actually a unary operator, taking only 1 operand.

The result of a boolean operator is always an int, either 0 or 1. 0 is false. 1 (or any non-0 number) is true.

Boolean operators pair nicely with the bool data type (from stdbool.h) if you are using C99 or newer.
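To see the three logical operators and bool together, here is a small sketch. The leap-year rule is just a convenient example of mine, and is_leap_year is a hypothetical name.

```c
#include <stdbool.h> /* bool, true, false (C99 and newer) */

/* Hypothetical example: the Gregorian leap-year rule written with &&, || and !. */
bool is_leap_year(int year)
{
    bool divisible_by_4   = (year % 4 == 0);
    bool divisible_by_100 = (year % 100 == 0);
    bool divisible_by_400 = (year % 400 == 0);

    return (divisible_by_4 && !divisible_by_100) || divisible_by_400;
}
```

Here is_leap_year(2024) and is_leap_year(2000) are true, while is_leap_year(1900) is false.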

Bitwise operators allow for the direct manipulation of bits. This is useful when working with memory addresses. They’re a wee bit complicated, but allow for extremely efficient operations.

The operators include,

&
|
^
~
<<
>>

The first few mirror the logical operators, which perform a boolean operation on an entire number. Bitwise operators also perform boolean operations, but rather than treating each operand as a single true or false value, they do so on every single bit of the operands…bit by bit.

This means that if you had 2 operands, A and B like so,

a = 10101010
b = 00001111

Where c = a & b

The bitwise & would check against each bit so that c = 00001010: a bit in c is 1 only where that same bit is 1 in both a and b. This is, at first blush, admittedly a little baffling. To make matters a wee bit more confusing, let’s shift focus to >> and <<, the right and left shift operators.

b = a >> n

This shifts the bits in a “to the right” by n steps, so b = a >> 3 would shift the bits of a by 3 steps.

If a started as 11001100 it would finish as 00011001, with 0’s being introduced to the left as the bits are shifted over. Another way of thinking about this is that b is a, but missing the 3 least significant bits.

1 << 0 = 1
1 << 1 = 2
1 << 2 = 4
1 << 3 = 8

The result is the same as multiplying the leading operand (here, 1) by 2 for each shifted bit, e.g. 1 << 1 = 2 can be thought of as 1 * 2, and 1 << 2 = 4 thought of as 1 * 2 * 2, a*2^n.

On the other hand, shifting right is dividing by 2 for each shifted bit, a/2^n. This is integer division, so any remainder is thrown away.

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  uint8_t a = 12; // 0000 1100
  uint8_t b = 5;  // 0000 0101

  // A & B  -->  0000 0100 = 4
  // A | B  -->  0000 1101 = 13
  // A ^ B  -->  0000 1001 = 9
  // A << 1 -->  0001 1000 = 24
  // A >> 1 -->  0000 0110 = 6
  
  printf("A = %d\n", a);
  printf("B = %d\n", b);
  printf("A & B = %d\n", a & b);
  printf("A | B = %d\n", a | b);
  printf("A ^ B = %d\n", a ^ b);
  printf("A << 1 = %d\n", a << 1);
  printf("A >> 1 = %d\n", a >> 1);

  return 0;
}

Bitwise operators are frequently used with bitmasks.

Using the bitwise & operator, two different bitmasks can be defined, one for bit clearing (a 0 in the mask forces that bit of the result to 0) and one for bit testing (a 1 in the mask lets that bit of the data through so it can be checked).

The bitwise | operator allows for a bitmask useful for bit setting, where, if a bitmask is 1, the result is 1.

Finally, the bitwise ^ operator allows for a bitmask that works as a toggle, switching the value of a bit from 1 to 0 or 0 to 1.

But how does this actually work? How can one actually perform bit manipulation? What if you’d like to set the Nth bit, say, set the bit in the 6th position to 1?

To do this, use the bitwise | with a bitmask that has a 1 in the 6th position.

result = data | 0b01000000; // the mask is a binary literal (a compiler extension before C23)
                            // where the bit in the 6th position is set to 1

Of course…this is grossly impractical and a pain in the butt to read.

Instead, create a bitmask using the << operator!

result = data | (1 << 6);

This’ll set the bit in position 6 of the result to 1, and leave all the other bits exactly as they were in data.

A similar process works for clearing the Nth bit. To set the bit at the 5th position to 0 use the bitwise & with a mask set to 0 for the bit we’d like to reset.

result = data & ~(1 << 5); 

Here, making use of the << operator again, creating a bitmask set to 1 everywhere but the 5th bit.

NOTE! Since ~ is outside of the parentheses in which the left shift is calculated, the complement operation happens after the bits are shifted. Shift followed by complement.
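Pulling the set and clear recipes together with the toggle and test masks described earlier, a minimal sketch might look like the following. The function names are my own invention, and bit positions are counted from 0 at the least significant bit.

```c
#include <stdint.h>

/* Hypothetical helpers; n is the bit position, 0 = least significant.
 * n is assumed to be in the range 0..7 for these 8-bit values. */

/* | with a 1 in position n forces that bit to 1. */
uint8_t set_bit(uint8_t data, unsigned n)
{
    return (uint8_t)(data | (1u << n));
}

/* & with a 0 in position n (and 1s everywhere else) forces that bit to 0. */
uint8_t clear_bit(uint8_t data, unsigned n)
{
    return (uint8_t)(data & ~(1u << n));
}

/* ^ with a 1 in position n flips that bit. */
uint8_t toggle_bit(uint8_t data, unsigned n)
{
    return (uint8_t)(data ^ (1u << n));
}

/* & with a 1 in position n isolates that bit; returns 1 if set, 0 if not. */
int test_bit(uint8_t data, unsigned n)
{
    return (data & (1u << n)) != 0;
}
```

For instance, set_bit(0, 6) gives 64, and clear_bit(0xFF, 5) gives 0xDF.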

Next, how to select a subset of bits, e.g. select the bits from position 3 - 5.

To start, shift the bits from position 3 to position 0. In other words, shift right by 3 bits.

Now to select the bits in positions 0, 1, and 2. Bitwise & with a bitmask where those same positions are all set to 1 will allow for this.

Put together, this looks like this,

result = (data >> 3) & 0b111;
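The same shift-then-mask idea can be generalized a little. This is only a sketch under my own assumptions: extract_bits is a hypothetical name, and the mask is built with a shift instead of a binary literal.

```c
#include <stdint.h>

/* Hypothetical helper: pull `width` bits out of `data`, starting at bit `pos`
 * (0 = least significant). Assumes pos + width <= 8. */
uint8_t extract_bits(uint8_t data, unsigned pos, unsigned width)
{
    unsigned mask = (1u << width) - 1u; /* width = 3 gives binary 111 */
    return (uint8_t)((data >> pos) & mask);
}
```

With data = 0xAC (binary 10101100), extract_bits(data, 3, 3) selects bits 3 through 5, which are 101, so the result is 5.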

To be totally honest I find a lot of this bit-level stuff baffling. In the programming stuff I do on the day to day I’ve never had to reach for these tools — this may be because they aren’t needed for what I do, or because I don’t understand them enough to even realize when I should be reaching for them 🤷‍♂️

Assignment operators are used to assign a value to a variable. The simplest is =. There are other assignment operators, though, like the compound assignment operators,

+=
-=
*=
/=
%=

a += b is the same as a = a + b, and a *= b is the same as a = a * b and so on.

There are also compound forms of the bitwise operators,

&=
|=
^=
>>=
<<=

These work the same as the previous compound operators.
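As a rough illustration of the bitwise compound forms, here is a sketch that packs a few bytes into one number. pack_bytes is a hypothetical helper, not anything standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: pack up to 4 bytes into one 32-bit value,
 * most significant byte first, using <<= and |=. */
uint32_t pack_bytes(const uint8_t *bytes, size_t count)
{
    uint32_t packed = 0;
    for (size_t i = 0; i < count; i++) {
        packed <<= 8;        /* same as packed = packed << 8; */
        packed |= bytes[i];  /* same as packed = packed | bytes[i]; */
    }
    return packed;
}
```

Feeding it the bytes 0x12, 0x34, 0x56, 0x78 produces 0x12345678.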

An entirely different beast when it comes to operators is the sizeof operator. The sizeof operator returns the number of bytes an operand takes up in memory. The size is determined by the operand’s type, and is known at compile-time, not run-time (variable-length arrays, added in C99, are the one exception). The result will be an integer constant, and the operand can be a variable, a basic or a derived data type, or even an expression.

The result is of type size_t, an unsigned integer type whose size varies from platform to platform. The sizeof operator is useful because it allows one to avoid hardcoding certain fixed values into a program; instead, they can be determined from the data itself. This leads to more portable code.
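One common way to avoid a hardcoded value with sizeof is to compute an array’s length from its size. The sketch below assumes a made-up table called primes; ARRAY_LEN is a popular idiom, but it is not part of the standard library.

```c
#include <stddef.h>

/* Number of elements in a true array (this does not work on a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical data, just for the example. */
static const int primes[] = {2, 3, 5, 7, 11};

/* Sums the table without hardcoding its length. */
int sum_primes(void)
{
    int total = 0;
    for (size_t i = 0; i < ARRAY_LEN(primes); i++) {
        total += primes[i];
    }
    return total;
}
```

Adding a sixth prime to the table needs no other change; the loop bound follows the data.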

No conversation about operators would be complete without discussing type conversion. C is statically typed, but that doesn’t mean data is stuck forever and always as a specific type after initial declaration. The cast operator allows for the conversion of one data type to another. Be warned, sometimes casting from one type to another can result in a loss of some information because not all data types have the same size in memory, e.g. a char is teeny tiny, while a long long is pretty big.

Sometimes type conversion happens implicitly: the compiler takes the wheel! I think the most common scenario for this is integer promotion, where a char, for instance, is “promoted” to an int during certain mathematical operations. Similarly, when assigning a short int to a long int there is an implicit conversion, making the data type “wider.”
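Two quick sketches of a cast losing information. Both function names are hypothetical, and the second assumes the usual 8 bit char.

```c
/* Hypothetical helpers showing explicit casts that lose information. */

/* Casting a double to int throws away the fractional part. */
int whole_part(double value)
{
    return (int)value;
}

/* Casting to unsigned char keeps only the low byte
 * (assuming the usual 8-bit char). */
unsigned char low_byte(int value)
{
    return (unsigned char)value;
}
```

whole_part(3.99) is 3, not 4, and low_byte(300) is 44, because 300 does not fit in 8 bits and only 300 - 256 survives.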

There is a hierarchy to data conversion,

int                       // smallest 
unsigned int
long int
unsigned long int
long long int
unsigned long long int
float
double 
long double                // largest

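In arithmetic, implicit conversions apply “upwards,” e.g. when two operands have different types, the “smaller” one is converted to the “larger” type. Converting “downwards” is best done with an explicit cast, which makes the possible loss of information visible.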

A totally fabricated example of implicit conversion:

#include <stdio.h>

int main()
{
    int a = 1;
    long int b = 2;
    double c = 3.3;
    b = b + a; // implicit conversion: a is promoted to a long int
    c = c * b; // implicit conversion: b is promoted to a double
    return 0;
}
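One place where these rules tend to bite is division. The sketch below contrasts the two cases; the function names are made up for the example.

```c
/* Hypothetical helpers contrasting implicit and explicit conversion. */

/* Both operands are int, so the division is integer division;
 * the result is only converted to double afterwards. */
double ratio_truncated(int numerator, int denominator)
{
    return numerator / denominator;
}

/* Casting one operand to double makes the compiler convert the
 * other one too, so the division happens in floating point. */
double ratio(int numerator, int denominator)
{
    return (double)numerator / denominator;
}
```

ratio_truncated(7, 2) comes back as 3.0, while ratio(7, 2) gives 3.5.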

But wait! There are more operators!?

The ternary operator, the reference operator, the de-reference operator, the array reference operator, the member selection operator, and the member selection operator for pointers!
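For a quick taste of that list, here is a sketch that touches each of them. The struct and every function name are hypothetical, chosen only to give the operators something to chew on.

```c
/* Hypothetical struct, only here for the example. */
struct point {
    int x;
    int y;
};

/* Returns the larger of two ints using the ternary operator. */
int max_int(int a, int b)
{
    return (a > b) ? a : b;
}

/* Sums the x members of an array of points. */
int sum_x(const struct point *points, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += points[i].x;   /* [] array reference, . member selection */
    }
    return total;
}

/* Moves a point in place through a pointer. */
void move_right(struct point *p, int steps)
{
    p->x += steps;              /* -> member selection for pointers */
}

/* Doubles an int in place. */
void double_in_place(int *value)
{
    *value = *value * 2;        /* * de-reference */
}
```

The reference operator shows up at the call site: given int n = 21, calling double_in_place(&n) passes the address of n and leaves n at 42.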

☠️☠️☠️☠️☠️

Operators have precedence rules, too. They are strict, and never changing. This means that an expression is always evaluated in the same way, no matter what.

Parentheses can be added to help make precedence a bit more obvious and to control precedence, too, because items in parentheses get evaluated first. So, one can change evaluation order by adding parentheses.
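A small sketch of precedence in action, with hypothetical function names. The last one shows a spot where the parentheses are not optional.

```c
/* Hypothetical examples of precedence and parentheses. */

/* * binds tighter than +, so this is a + (b * c). */
int without_parens(int a, int b, int c)
{
    return a + b * c;
}

/* Parentheses force the addition to happen first. */
int with_parens(int a, int b, int c)
{
    return (a + b) * c;
}

/* == binds tighter than &, so the parentheses around the mask test
 * are required; flags & mask == 0 would mean flags & (mask == 0). */
int is_bit_clear(unsigned flags, unsigned mask)
{
    return (flags & mask) == 0;
}
```

without_parens(1, 2, 3) is 7, while with_parens(1, 2, 3) is 9.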

Here is a link to a chart that describes operator precedence in C.

Data Types and Variables in C

I’ve been writing a heap of Lua lately, and this has led to my becoming interested, again, in C. Here are some ancient notes I dug up on the very basics of data types and variables in C.

All of a computer’s memory is made up of bits. A sequence of 8 bits forms a byte. A group of bytes (typically 4 or 8) forms a word. Each byte is associated with a memory address. The address increases by 1 with each byte of memory.

In C, a byte is an object that is as big as the smallest addressable unit.

On virtually every modern machine that unit is 8 bits wide. C itself only guarantees at least 8 bits; the exact number is CHAR_BIT in limits.h.
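To poke at the idea that addresses count bytes, here is a rough sketch. Both helper names are hypothetical; the trick is that viewing an address as a char pointer makes pointer arithmetic count in bytes.

```c
#include <limits.h> /* CHAR_BIT */
#include <stddef.h>

/* Hypothetical helper: how many bytes apart are two neighbouring ints?
 * Viewing the addresses as char pointers makes the difference count bytes. */
size_t bytes_between_neighbours(void)
{
    int pair[2] = {0, 0};
    const char *first  = (const char *)&pair[0];
    const char *second = (const char *)&pair[1];
    return (size_t)(second - first);
}

/* Number of bits in an int on this machine. */
size_t bits_in_int(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

The first function always returns sizeof(int): the second element starts exactly that many addresses after the first.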

A variable is a container for data. A variable is a symbolic representation of a memory location, or address.

A variable declaration is made up of a few parts:

First, define the data type, then an identifier, and then, optionally, initialize the variable with some data.

int number_of_bananas = 124;

Here, int is the data type, number_of_bananas is the identifier and 124 is the data.

C is statically typed, which means that a variable’s data type cannot be changed after it is declared. You can also make the value immutable by turning it into a constant using the const keyword.

const int number_of_bananas = 124;

A data type is a collection of compile-time properties, including:

  • memory size and alignment
  • set of valid values
  • set of permitted operations

Some data types available in C include,

  • Numbers (int, float, double, etc.)
  • Characters
  • Strings
  • Array
  • Complex data types, like structs and pointers

Numbers and characters are called “fundamental data types” in C. All other data types are called “derived data types” because they are derived from the fundamental types.

Integers, int, are any non-fractional numbers either negative or positive including 0. You would use an int to describe the number of pets you have — you cannot have a fractional number of pets…unless you’ve done something awful and/or are cosplaying as King Solomon.

ints come in both signed and unsigned flavors. An int can be negative or positive, while an unsigned int can be 0 or positive, never negative. unsigned ints are useful for when you need to express a very large positive value, since giving up the negative half roughly doubles the positive range. So, if you were going to create a variable to represent the temperature in Fahrenheit, you would want to use an int, since the temperature in Fahrenheit can be negative, positive or exactly 0. If you were going to create a variable to represent the temperature in Kelvin, on the other hand, you would probably want to use an unsigned int, since Kelvin starts at 0 and only goes up from there.

You can declare an unsigned int using the keywords unsigned int or just unsigned.

Besides coming in signed and unsigned variants, int also comes in different sizes:
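One behavior worth seeing once: unsigned arithmetic wraps around instead of going negative. A minimal sketch, with a hypothetical helper name:

```c
#include <limits.h> /* UINT_MAX */

/* Hypothetical helper: unsigned arithmetic wraps around
 * instead of producing a negative number. */
unsigned int one_below(unsigned int value)
{
    return value - 1u;
}
```

one_below(5) is 4 as expected, but one_below(0) is UINT_MAX, the largest value an unsigned int can hold. That is worth keeping in mind before reaching for unsigned just because a quantity can’t be negative.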

  • short int
  • int
  • long int
  • long long int

These describe different byte sizes allotted to the value. These exist in unsigned variants, too. See stdint.h for waaaaay more on this.
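A small taste of what stdint.h offers: types whose names state their width, plus matching limit macros. The variable names below are mine; the types and macros come from the header.

```c
#include <stdint.h>

/* Fixed-width types from stdint.h say exactly how many bits they hold. */
int8_t   smallest_8  = INT8_MIN;    /* -128 */
uint8_t  largest_u8  = UINT8_MAX;   /* 255 */
uint16_t largest_u16 = UINT16_MAX;  /* 65535 */
int32_t  largest_32  = INT32_MAX;   /* 2147483647 */
```

Unlike short, int and long, whose sizes depend on the platform, an int32_t is 32 bits wherever the type exists.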


Totally random aside! When displaying a variable you need to use the correct format specifier: %d for a plain ol’ int, %ld for a long int. Now, if you wanna format the number a bit more, you can also include a width to help pad the number, e.g. %7d will add 6 leading spaces before the number if it is 1 digit long, or 5 if the number is 2 digits long.

int the_number = 42;
printf("The Answer to life, the universe and everything is %7d\n", the_number);

// The Answer to life, the universe and everything is      42

floats and doubles can also include a number in their format string that defines their precision.

So, with 2 points of precision:

float pi = 3.14;
printf("%12.2f | PI\n", pi);
double pi2 = 314E-2;
printf("%12.2e | PI\n", pi2);

//      3.14 | PI
//  3.14e+00 | PI

Or with 4!

float pi = 3.14;
printf("%12.4f | PI\n", pi);
double pi2 = 314E-2;
printf("%12.4e | PI\n", pi2);

//    3.1400 | PI
//3.1400e+00 | PI

While ints represent discrete values, floating point numbers (floats) are used to represent any number, negative or positive, including 0 and decimals, e.g. 3.14 is a float. The same value can also be written in scientific notation as 314E-2, stored here as a double.

float pi = 3.14;
double pi2 = 314E-2;

char variables are represented numerically by a 1 byte (typically 8 bit) integer. Whether a plain char is signed or unsigned is up to the compiler; when it is signed, the available numeric range is from -128 to 127. The ASCII table fits in the non-negative half of that range, 0 to 127.

While signed char ranges from -128 to 127, unsigned char ranges from 0 to 255.
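Since a char is really just a small number, arithmetic on characters works. A minimal sketch, assuming an ASCII-compatible character set; to_upper_ascii is a made-up name (the standard library’s own version is toupper in ctype.h).

```c
/* Hypothetical helper: converts an ASCII lowercase letter to uppercase
 * by treating the char as the small integer it really is.
 * Assumes an ASCII-compatible character set. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z') {
        return (char)(c - ('a' - 'A')); /* 'a' is 97, 'A' is 65 in ASCII */
    }
    return c;
}
```

to_upper_ascii('b') is 'B', and anything that isn’t a lowercase letter comes back unchanged.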

A boolean is a variable that can only take 1 of 2 values. Either true or false. C originally didn’t have any booleans, instead false was assumed to be 0 and anything other than 0 was considered to be true. While modern C supports boolean data types, treating 0 as false remains a common idiom. The boolean data type was introduced in the C99 standard.
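Here is a small sketch of how an int turns into a bool. The function name is hypothetical, and it borrows number_of_bananas from the example above.

```c
#include <stdbool.h>

/* Hypothetical helper: any non-zero int converts to true (stored as 1),
 * and 0 converts to false. */
bool has_bananas(int number_of_bananas)
{
    bool result = number_of_bananas; /* implicit conversion from int to bool */
    return result;
}
```

has_bananas(124) is true and has_bananas(0) is false. Even a negative count like -3 converts to true, because only 0 counts as false.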

An enumeration, or enum, is a list of constants. It is useful for when you want to select exactly 1 option from a list of predefined values. Behind the scenes, enums are nothing more than numbers…this makes sense for a data type called an enumeration. An enum value prints as its number, not its name, e.g.

enum Menu {
    COFFEE,  // 0
    JUICE,   // 1
    WAFFLES, // 2
};

enum Menu order = JUICE;

printf("The order: %d\n", order);

// The order: 1

If you want to explicitly set an index on an enum option you can. Note that the numbering of the remaining options picks up from whatever you defined.

enum months
{
    JAN = 1,
    FEB,
    MAR,
    APR,
    MAY,
    JUN,
    JUL,
    AUG,
    SEP,
    OCT,
    NOV,
    DEC,    
};

The above ensures that the months are numbered in a sane way…not starting from 0.
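Because enum constants are just ints, they slot straight into a switch. A sketch using the months enum from above; days_in_month is a hypothetical helper and ignores leap years.

```c
enum months {
    JAN = 1, FEB, MAR, APR, MAY, JUN,
    JUL, AUG, SEP, OCT, NOV, DEC
};

/* Hypothetical helper: days in a month of a non-leap year. */
int days_in_month(enum months month)
{
    switch (month) {
    case FEB:
        return 28;
    case APR: case JUN: case SEP: case NOV:
        return 30;
    default:
        return 31;
    }
}
```

With the numbering starting at 1, FEB is 2 and DEC is 12, and days_in_month(SEP) is 30.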

In reply to: chibicc: A Small C Compiler

Each commit of this project corresponds to a section of the book. For this purpose, not only the final state of the project but each commit was carefully written with readability in mind. Readers should be able to learn how a C language feature can be implemented just by reading one or a few commits of this project.

In reply to: The Ferret Lisp System | Irreal

The source of the whole system is a single Org mode file. If you had any doubts as to whether Org could support a literate programming approach in a non-trivial project