Write a Forth in Haskell: Part 01
Bootstrapping
Let us create a cabal project:
$ cabal init --simple -p forth --cabal-version=3.4 \
--language=GHC2021 --libandexe --tests \
--test-dir=test forth
Now, let’s open a new file, src/StateMachine.hs
, and make it a module with the following line:
module StateMachine where
Because we need a proper datatype to represent Forth’s execution model, I chose to implement it with a Linked List-based stack.
import Data.List qualified as List
newtype Stack = Stack {getStack :: [Integer]}
deriving newtype (Show, Eq)
What the above lines mean is that we create a newtype, a wrapper around
[Integer]
, with a constructor named Stack
, and an accessor function
called getStack
.
The deriving newtype
line allows us to inherit the Show
and Eq
instances
of the underlying type instead of the wrapper’s instances.
So what would an empty stack look like? Like this:
initStack :: Stack
initStack = Stack List.empty
We now need a function that will “process”, so to speak, the different operations we receive from our input:
process :: Stack -> Text -> Stack
process stack "+" = add
process stack "-" = sub
process stack a = push item stack
where
item = parseInt a
The
process
function is a pure function that carries its state, the stack
argument, with it.
We pattern-match on defined keywords and operators of the language, and call the
appropriate functions each time. The catch-all case at the bottom is there to grab numerical elements
from our input and simply push them on the stack, which we will then return.
That being said, we also need a couple of helpers.
The first one is
parseInt
. We have encountered it in the above where
-clause.
The core of its definition relies on the
decimal
function from Data.Text.Read
,
as well as
pack
from Data.Text
.
import Data.Text.Read (decimal)
import Data.Text (pack)
-- […]
parseInt :: Text -> Integer
parseInt a = either (error . pack) fst (decimal a)
However, this one-liner may be quite incomprehensible. Here is how we can rewrite this function:
parseInt a =
case decimal a of
Right result -> fst result
Left errorMsg -> error $ pack errorMsg
decimal
has the following type: Text -> Either String (Integer, Text)
,
which means it can either return an error message in its Left parameter (String
),
or return a tuple of (Integer, Text)
when parsing succeeds. The Text
part of
the tuple is used if and only if the number you intend to parse is followed with non-numerical characters. In practice, this translates to:
λ❯ decimal "32ee" :: Either String (Integer, Text)
Right (32,"ee")
Hence the use of
fst
on that result.
We simply do not care about the second part, only about the integer.
Now, you may be wondering about the use of
error
.
It doesn’t really return an integer, does it? Should it? Well, the thing about
this function is that it will return whatever type you ask of it, because it will stop the execution of the program.
Terribly unsafe from a types perspective, morally digusting, but we are going to need it.
At that point, here’s what our file looks like:
module StateMachine where
import Data.Text (pack)
import Data.Text.Read (decimal)
import Data.List (List)
import qualified Data.List as List
newtype Stack = Stack {getStack :: [Integer]}
deriving newtype (Show, Eq)
initStack :: Stack
initStack = Stack []
process :: Stack -> Text -> Stack
process stack "+" = add stack
process stack "-" = sub stack
process stack a = push item stack
where
item = parseInt a
parseInt :: Text -> Integer
parseInt a = either (error . pack) fst (decimal a)
-- Equivalent to
-- parseInt = case parseInt a of
-- Right result -> fst result
-- Left errorMsg -> error $ pack errorMsg
--
-- either :: (a -> c) -> (b -> c) -> Either a b -> c
-- ^^^^^^ ^^^^^^ ^^^^^^^^^^
-- │ │ │
-- this function this function The value
-- is called on is called on to be tested
-- the value in the value in
-- the `Left` the `Right`
Addition, subtraction
The next step is to implement the basics of stack manipulation.
The first function, push
, is implemented as a cons
operation:
push :: Integer -> Stack -> Stack
push item stack = item : stack
Its famous counterpart pop
will not be implemented yet.
Then, let’s take care of addition:
add :: Stack -> Stack
add stack =
if checkSize 2 stack
then
let (elems, newStack) = List.splitAt 2 (getStack stack)
result = sum elems
in push result (Stack newStack)
else
error "Stack underflow!"
Which brings us to our next helper: checkSize
.
checkSize :: Int -> Stack -> Bool
checkSize requiredSize stack =
(length (getStack stack)) >= requiredSize
Fundamentally, we need to be sure that the operation we make is safe at the stack level.
Now that we have all the cards, let’s combine them.
First, with the help of
List.splitAt
, we grab a 2-tuple of lists. The first one supposedly contains
the first two elements, and the second one has the rest of the stack in it.
With the help of
checkSize
, we then make sure to only proceed to the actual sum if and only if the first list, elems
, has two elements.
And finally, we push the result to the stack.
The subtraction function is similar in intent:
sub :: Stack -> Stack
sub stack =
if checkSize 2 stack
then
let (elems, newStack) = List.splitAt 2 (getStack stack)
result = sub' $ List.reverse elems
sub' = List.foldl1 (-)
in push result (Stack newStack)
else
error "Stack underflow!"
With the slight difference that we define our own subtraction function, and we reverse the list beforehand so we get a correct result.
foldl1
iterates over a container and applies the supplied function ((-)
) over those elements while keeping an accumulator. By convention, the 1
suffix tells us that
we do not need to supply a initial accumulator to the recursive function, assuming a non-empty container to start with.
So far, we implemented addition and subtraction. Their definion were a bit convoluted,
unnessarily even, due to a lack of a better abstraction. But be patient.
In part 02, we will explore more traditional Forth operations, such as duplication, drop, and rotation, amongst others.