Dart Neural Network Part 1: Implementation

A neural network created entirely in the Dart programming language and without packages.

Wed, October 5 2022


Jake Landers

Developer and Creator


These are a collection of my thoughts and reflections while working on the project as well as a technical walk-through of some of the functions/classes. If you want to look at the live example, follow this link. If you want to see the source code, follow this link.

NOTE:

This is not meant to be a tutorial on how neural networks work. This is to show the rationale behind my own implementation of a neural network. If you are looking for a more interactive tutorial, then the video by Sebastian Lague or the book by Sentdex linked in the next section are great resources, and they are what I personally used to wrap my head around this implementation.

Introduction (Skippable)

So you have decided to construct your own neural network. You tell yourself, full of optimism and joy: "Hey! If I create my OWN neural network, I'll understand how they work better!" Maybe you spend an hour or two tinkering around and begin to question why this is even valuable. You spend ages trying to determine the best language to use, factoring in that you want some support from a modern language, but you don't want to rely on packages to do the heavy lifting. You get gifted a (wonderful) book written in Python that goes through all of the steps of creating your own neural network. Said book goes on to collect dust on your nightstand for 9 months.

But then there comes a day when you are GOING to accomplish the task of creating your own neural network, whatever that looks like. You know you don't know much going in, but you think you have a solid grasp on computer science and mathematics. Then you start reading your book and work for 2 weeks straight getting this damn thing to work properly. You choose Dart as your language, accepting that the performance hits will be worth being able to easily throw a front-end on your project with Flutter (and you tell yourself "I can always port it to Rust if I need speed," but you never will).

This fictional character is me, and these are the factors and decisions that led up to working on this project. I am not going to lie and say the inspiration was all mine. An excellent video by Sebastian Lague walked through his experience creating a neural network in C#. I watched that video and said "Hey! I can totally do that too!" So a couple of weeks after watching it, I set out on my journey to finally build my neural network.

Inspiration

I followed the Neural Networks from Scratch in Python book pretty heavily. If you have not bought that book from the great artificial intelligence YouTuber Sentdex, I cannot recommend it highly enough. The way he walks through the explanations and math behind how a particular function works before showing the Python implementation was fundamental to me learning the math behind the network.

Vector Class

After the first few chapters of the book, it became extremely evident that I was going to need a class to handle some of the advanced arithmetic that you get for free with NumPy. As my goal with this project was to use as few packages as possible (preferably 0), my own implementation was necessary.

My implementation would not be impossible to extend to arrays of more than two dimensions, but doing so would require some work. An abstract class named Vector was created to share some common functionality between a 1D array and a 2D array. This class implements the Iterable interface, and as such needs a pile of override methods that I did not include here.

Vector Abstract Class

```dart
abstract class Vector<T> implements Iterable {
  /// list the vector is built around
  late List<T> val;
  ...
}
```

This is the base class that powers most of the vector and matrix math needed by the network. After this basic implementation, two classes that implement this class were created: Vector1 and Vector2. This may not have been the most accurate naming choice for what these classes do, but to me they made sense. Vector1 is a 1D vector, and Vector2 is a 2D vector.

The following is from vector1.dart and vector2.dart to show the class along with the constructors I created.

Vector1

```dart
class Vector1 extends Vector<num> {
  /// Create a 1D vector with a size of [size] filled with [fill].
  Vector1(int size, {num fill = 0}) {
    val = List.generate(size, (index) => fill);
  }

  /// Create a 1D vector from a 1D array
  Vector1.from(List<num> list) {
    val = List.from(list);
  }

  /// Create a copy of the vector from the passed [Vector1].
  Vector1.fromVector(Vector1 vec) {
    val = List.from(vec.val);
  }

  /// Create an empty 1D vector
  Vector1.empty() {
    val = [];
  }

  /// Create a 1D vector of size [size] filled with random
  /// double values multiplied by [scaleFactor]. [seed] can be
  /// used to set the seed of the Random class from dart:math.
  Vector1.random(int size, {double scaleFactor = 0.01, int seed = seed}) {
    math.Random rng = math.Random(seed);
    List<num> out = [];
    for (var i = 0; i < size; i++) {
      out.add(rng.nextDouble() * scaleFactor);
    }
    val = out;
  }

  /// Create a vector of the same shape as the passed
  /// vector filled with the value of [fill]
  Vector1.like(Vector1 vec, {num fill = 0}) {
    val = List.generate(vec.length, (index) => fill);
  }

  ...
}
```

Vector2

````dart
class Vector2 extends Vector<Vector1> {
  /// The transpose of this 2D vector
  Vector2 get T {
    List<List<num>> out = List.generate(this[0].length, (_) => []);

    for (var i = 0; i < length; i++) {
      for (var j = 0; j < this[i].length; j++) {
        out[j].add(this[i][j]);
      }
    }

    return Vector2.from(out);
  }

  /// Generate a 2D vector of [rows, cols] filled with [fill]
  Vector2(int rows, int cols, {num fill = 0}) {
    val = List.generate(
      rows,
      (_) => Vector1(cols, fill: fill),
    );
  }

  /// Create a 2D vector from a 2D array
  Vector2.from(List<List<num>> list) {
    val = List.from([for (var i in list) Vector1.from(i)]);
  }

  /// Generate a copy of the passed [vec]
  Vector2.fromVector(Vector2 vec) {
    val = List.from([for (var i in vec.val) Vector1.fromVector(i)]);
  }

  /// Create an empty 2D vector
  Vector2.empty() {
    val = [];
  }

  /// Create a 2D vector of size [rows, cols] filled with
  /// random values multiplied by [scaleFactor]. [seed] can
  /// be used to set the seed of the Random class from dart:math.
  Vector2.random(int rows, int cols,
      {double scaleFactor = 0.01, int seed = seed}) {
    math.Random rng = math.Random(seed);
    Vector2 list = Vector2.empty();
    for (var i = 0; i < rows; i++) {
      Vector1 temp = Vector1.empty();
      for (var j = 0; j < cols; j++) {
        var rand = rng.nextDouble() * scaleFactor;
        if (rng.nextBool()) {
          rand = -rand;
        }
        temp.add(rand);
      }
      list.add(temp);
    }
    val = list.val;
  }

  /// Create a 2D vector that is of size [size, size].
  /// Each diagonal will be filled with the value of [fill] (default 1),
  /// and the rest of the values filled with 0.
  /// For example, an eye of 3 will look like:
  /// ```
  /// [[1,0,0]
  ///  [0,1,0]
  ///  [0,0,1]]
  /// ```
  Vector2.eye(int size, {int fill = 1}) {
    Vector2 out = Vector2.empty();
    for (var i = 0; i < size; i++) {
      Vector1 temp = Vector1.empty();
      for (var j = 0; j < size; j++) {
        if (i == j) {
          temp.add(fill);
        } else {
          temp.add(0);
        }
      }
      out.add(temp);
    }
    val = out.val;
  }

  /// Create a 2D vector from a 1D array in the form of [eye].
  /// This will create a 2D vector in the size of [input.length, input.length],
  /// with the values of the array making up the diagonal and the
  /// rest of the values being filled with 0.
  /// For example, with a passed array = [5,4,3], the created
  /// vector will look like:
  /// ```
  /// [[5,0,0]
  ///  [0,4,0]
  ///  [0,0,3]]
  /// ```
  Vector2.diagonalFlat(List<num> input) {
    Vector2 out = Vector2.empty();
    for (var i = 0; i < input.length; i++) {
      Vector1 temp = Vector1.empty();
      for (var j = 0; j < input.length; j++) {
        if (i == j) {
          temp.add(input[i]);
        } else {
          temp.add(0);
        }
      }
      out.add(temp);
    }
    val = out.val;
  }

  /// Create a vector of the same shape as the passed [vec]
  /// filled with the value of [fill].
  Vector2.like(Vector2 vec, {num fill = 0}) {
    List<Vector1> out = [];
    for (Vector1 i in vec) {
      out.add(Vector1.like(i, fill: fill));
    }
    val = out;
  }

  ...
}
````

As should be evident from the code samples above, the Vector1 class is a basic wrapper around a List<num>. Vector2 is very similar, except instead of a List<List<num>>, it wraps a List<Vector1> to get some free functionality out of the Vector1 class.

Why Create a Vector Class?

Now why would I choose to wrap a List<num> in a seemingly pointless class that would just cause confusion? The answer lies in vector arithmetic. While the concepts behind dot products and vector addition/subtraction/etc. may be simple, having to re-implement them over and over again is a hassle and inefficient. And while most of the functionality could have been accomplished by extending the List<num> type, bundling the arithmetic with the constructors in an object-oriented language just made sense to me.
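
Here is a quick, hypothetical usage sketch (not from the project's source) showing the kind of code these classes enable; it uses the constructors above plus the operators and dot() function covered in the next sections:

```dart
// Hypothetical usage of the Vector classes described in this article.
final a = Vector1.from([1, 2, 3]);
final b = Vector1.from([4, 5, 6]);

// element-wise arithmetic via the overloaded operators
// (the operators return the base Vector type, so a cast is needed)
final sum = (a + b) as Vector1;    // [5, 7, 9]
final scaled = (a * 2) as Vector1; // [2, 4, 6]

// the dot product wrapper covered in the next section
final d = dot(a, b) as Vector1;    // [32]
```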

Dot Product

The first thing I tried to tackle was the dot product. The challenge here was to allow for (1D, 1D), (1D, 2D), and (2D, 2D) products.

```dart
Vector dot(Vector v1, Vector v2) {
  if (v1 is Vector1) {
    if (v2 is Vector1) {
      // both 1D
      return _dot1D1D(v1, v2);
    } else {
      // v1 1D, v2 2D
      return _dot1D2D(v1, v2 as Vector2);
    }
  } else {
    if (v2 is Vector1) {
      // v1 is 2D, v2 is 1D
      return _dot1D2D(v2, v1 as Vector2);
    } else {
      // both 2D
      return _dot2D2D(v1 as Vector2, v2 as Vector2);
    }
  }
}
```
This wrapper function dispatches to one of three private implementations based on the dimensionality of its arguments:

```dart
Vector1 _dot1D1D(Vector1 v1, Vector1 v2) {
  assert(v1.length == v2.length, "Lists must be the same length");
  double out = 0;
  for (int i = 0; i < v1.length; i++) {
    out += v1[i] * v2[i];
  }
  return Vector1.from([out]);
}

Vector1 _dot1D2D(Vector1 v1, Vector2 v2) {
  assert(v1.length == v2[0].length,
      "The elements in v2 should be the same length as v1");
  List<num> out = [];
  for (var i = 0; i < v2.length; i++) {
    out.add(_dot1D1D(v1, v2[i]).first);
  }
  return Vector1.from(out);
}

Vector2 _dot2D2D(Vector2 v1, Vector2 v2) {
  assert(v1[0].length == v2.length,
      "v2 must have the same length as the element length inside of v1");
  Vector2 transposedv2 = v2.T;

  List<List<num>> out = [];
  for (var i in v1.val) {
    List<num> col = [];
    for (var j in transposedv2.val) {
      col.add(_dot1D1D(i, j).first);
    }
    out.add(col);
  }

  return Vector2.from(out);
}
```

Vector Arithmetic

The hardest part to get right was the arithmetic. I had to rewrite this function multiple times because there was always some little issue plaguing me; you can check the commit history to see the evolution of this code. There are four operations implemented here: addition, subtraction, multiplication, and division. The meat of the code is wrapped inside a function that takes an arithmetic function as an argument, so that +, -, *, and / can all be passed into it.

```dart
  /// Arithmetic can be performed on the vector. [other] can be of 3
  /// different types: [Vector1], [Vector2], and [num]. On operators
  /// that are not bi-directional, the workaround is to either
  /// multiply the result by negative 1, or use the `replaceWhere`
  /// method.
  Vector operator +(dynamic other) {
    return _arithmetic(other, (x1, x2) => x1 + x2);
  }

  /// Arithmetic can be performed on the vector. [other] can be of 3
  /// different types: [Vector1], [Vector2], and [num]. On operators
  /// that are not bi-directional, the workaround is to either
  /// multiply the result by negative 1, or use the `replaceWhere`
  /// method.
  Vector operator -(dynamic other) {
    return _arithmetic(other, (x1, x2) => x1 - x2);
  }

  /// Arithmetic can be performed on the vector. [other] can be of 3
  /// different types: [Vector1], [Vector2], and [num]. On operators
  /// that are not bi-directional, the workaround is to either
  /// multiply the result by negative 1, or use the `replaceWhere`
  /// method.
  Vector operator *(dynamic other) {
    return _arithmetic(other, (x1, x2) => x1 * x2);
  }

  /// Arithmetic can be performed on the vector. [other] can be of 3
  /// different types: [Vector1], [Vector2], and [num]. On operators
  /// that are not bi-directional, the workaround is to either
  /// multiply the result by negative 1, or use the `replaceWhere`
  /// method.
  Vector operator /(dynamic other) {
    return _arithmetic(other, (x1, x2) => x1 / x2);
  }
```
```dart
  /// Arithmetic can be performed on the vector. [other] can be of 3
  /// different types: [Vector1], [Vector2], and [num]. On operators
  /// that are not bi-directional, the workaround is to either
  /// multiply the result by negative 1, or use the `replaceWhere`
  /// method.
  Vector _arithmetic(dynamic other, num Function(num x1, num x2) arithmetic) {
    // check if other is of type num
    if (other is num) {
      if (this is Vector1) {
        Vector1 out = Vector1.empty();
        for (num i in val as List<num>) {
          out.add(arithmetic(i, other));
        }
        return out;
      } else if (this is Vector2) {
        Vector2 out = Vector2.fromVector(this as Vector2);
        for (var i = 0; i < length; i++) {
          for (var j = 0; j < out[i].length; j++) {
            out[i][j] = arithmetic(out[i][j], other);
          }
        }
        return out;
      } else {
        throw "Vector is not of type Vector1 or Vector2";
      }
    }
    // determine types of vectors to add
    if (this is Vector1) {
      if (other is Vector1) {
        // both 1D
        assert(
          val.length == other.val.length,
          "Both vectors need to be the same length",
        );
        Vector1 list = Vector1.empty();
        for (var i = 0; i < val.length; i++) {
          list.add(arithmetic((val[i] as num), (other.val[i])));
        }
        return list;
      } else {
        // this is 1D, other is 2D
        assert(
          val.length == (other.val[0] as Vector1).length,
          "The first vector's length needs to match the elements inside the second vector's length",
        );
        Vector2 list = Vector2.empty();
        for (var i = 0; i < other.val.length; i++) {
          Vector1 temp = Vector1.empty();
          for (var j = 0; j < (other.val[i] as Vector1).length; j++) {
            temp.add(arithmetic((val[j] as num), other.val[i][j]));
          }
          list.add(temp);
        }
        return list;
      }
    } else {
      if (other is Vector1) {
        // this is 2D, other is 1D
        assert(
          (val[0] as Vector1).length == other.val.length,
          "The first vector item's length needs to match the second vector's length",
        );
        Vector2 list = Vector2.empty();
        for (var i = 0; i < val.length; i++) {
          Vector1 temp = Vector1.empty();
          for (var j = 0; j < (val[i] as Vector1).length; j++) {
            temp.add(arithmetic((other.val[j]), (val[i] as Vector1)[j]));
          }
          list.add(temp);
        }
        return list;
      } else {
        // both 2D
        Vector2 list = Vector2.empty();

        for (var i = 0; i < val.length; i++) {
          Vector1 temp = Vector1.empty();
          for (var j = 0; j < (val[i] as Vector1).length; j++) {
            if (other[0].length == 1) {
              if (other.length == 1) {
                temp.add(arithmetic(
                    (val[i] as Vector1)[j], (other.val[0] as Vector1)[0]));
              } else {
                temp.add(arithmetic(
                    (val[i] as Vector1)[j], (other.val[i] as Vector1)[0]));
              }
            } else {
              if (other.length == 1) {
                temp.add(arithmetic(
                    (val[i] as Vector1)[j], (other.val[0] as Vector1)[j]));
              } else {
                temp.add(arithmetic(
                    (val[i] as Vector1)[j], (other.val[i] as Vector1)[j]));
              }
            }
          }
          list.add(temp);
        }
        return list;
      }
    }
  }
```

As you may be able to see, this implementation is pretty rough. I manually check for types and throw errors when those types are not found. The result is that the code only has to exist once, but when using this function you need to cast the result to the type you "expect" (either a Vector1 or a Vector2). This is not type-safe at all, but it works if you know the implementation.

Another limitation of this function is the inability to reverse the arithmetic. This is not an issue for addition or multiplication, because 1 + 5 = 5 + 1 and 1 * 5 = 5 * 1, but it is not true for subtraction and division. So I created a function named replaceWhere in each of the vector classes that lets you perform more fine-tuned operations on the elements inside the vector. For example, you can use this function to subtract every value in the vector from 1 and return the results as a new vector.

```dart
  // In Vector1:

  /// Create a [Vector1] containing all of the values
  /// returned by the [logic] function and return this
  /// [Vector1].
  Vector1 replaceWhere(num Function(int index) logic) {
    Vector1 out = Vector1.like(this);
    for (int i = 0; i < length; i++) {
      out[i] = logic(i);
    }
    return out;
  }

  // In Vector2:

  /// Create a [Vector2] containing all of the values
  /// returned by the [logic] function and return this
  /// [Vector2].
  Vector2 replaceWhere(num Function(int i, int j) logic) {
    Vector2 out = Vector2.like(this);
    for (int i = 0; i < length; i++) {
      for (int j = 0; j < this[i].length; j++) {
        out[i][j] = logic(i, j);
      }
    }
    return out;
  }
```
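
As a small, hypothetical usage sketch (not from the project's source), this is how the Vector1 version can express the "1 - v" case described above:

```dart
final v = Vector1.from([0.2, 0.5, 0.9]);

// subtract every element from 1 (i.e. 1 - v), which the - operator
// alone cannot express since the vector must be the left-hand operand
final oneMinusV = v.replaceWhere((i) => 1 - v[i]); // [0.8, 0.5, 0.1] (approximately)
```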

Printing 2D Vectors

While debugging the program, I needed a way to print the contents of a 2D vector so I could tell what was actually contained in the lists. I drew heavy inspiration from NumPy here. This implementation is far from perfect, but it does the job well enough.

```dart
  @override
  String toString({int precision = 6}) {
    String calc(Vector2 list, bool exp, int startIndex) {
      late String out;
      if (startIndex == 0) {
        out = "[";
      } else {
        out = " ";
      }
      for (var i = 0; i < list.length; i++) {
        out += " ${startIndex + i} [";
        for (var j = 0; j < list[i].length; j++) {
          if (!list[i][j].isNegative) {
            out += " ";
          }
          if (exp) {
            out += list[i][j].toStringAsExponential(precision);
          } else {
            out += list[i][j].toStringAsFixed(precision);
          }
          if (j != list[i].length - 1) {
            out += " ";
          }
        }
        out += "]";
        if (i != list.length - 1) {
          out += "\n ";
        } else {
          out += "]";
        }
      }
      return out;
    }

    if (length > 10) {
      Vector2 beg = subVector(0, 5) as Vector2;
      Vector2 end = subVector(length - 5, length) as Vector2;
      String out1 =
          calc(beg, beg.any((e1) => e1.any((e2) => e2 != 0 && e2 < 0.001)), 0);
      String out2 = calc(
          end,
          end.any((e1) => e1.any((e2) => e2 != 0 && e2 < 0.001)),
          val.length - 5);
      return "$out1\n ...\n$out2";
    } else {
      return calc(
          this, val.any((e1) => e1.any((e2) => e2 != 0 && e2 < 0.001)), 0);
    }
  }
```

There is plenty more code inside the vector classes providing extra functionality, like sum(), abs(), flatMax(), pow(), sqrt(), exp(), and log().

Layers

I implemented one type of layer: a fully connected dense layer. Each type of component has an abstract class holding shared functionality, along with specific implementations in corresponding files. For the layer, since there is only one implementation, this does not buy much. But it does pave the way for other layers to be implemented later, such as a dropout layer.

The layer also has an activation function wrapped inside of it so that the function does not need to be called separately. These could be separated easily, and might have to be for more complex networks.

Abstract Class

```dart
abstract class Layer {
  late Vector2 weights;
  late Vector2 biases;
  late Activation activation;
  late int inputSize;
  late int numNeurons;
  Vector2? inputs;
  Vector2? output;

  // back prop values
  Vector2? dweights;
  Vector2? dbiases;
  Vector2? dinputs;

  // momentum values
  Vector2? weightMomentums;
  Vector2? biasMomentums;

  // weight cache values
  Vector2? weightCache;
  Vector2? biasCache;

  // regularization values
  late double weightRegL1;
  late double weightRegL2;
  late double biasRegL1;
  late double biasRegL2;

  // constructors
  Layer();
  Layer.fromMap(Map<String, dynamic> map);

  // required functional implementations
  void forward(Vector2 inputs);
  void backward(Vector2 dvalues);

  @override
  String toString() {
    return "Inputs:\n$inputs\nWeights:\n$weights\nBiases:\n$biases\ndWeights:\n$dweights\ndBiases:\n$dbiases\ndInputs:\n$dinputs";
  }
  ...
}
```

The rest of the class contains functions for saving and loading the layer values as JSON, which is how I save the network.

Dense Layer

Layer Forward Pass

The forward pass is pretty easy to comprehend. The inputs to the layer are saved in memory, then the dot product of the inputs and weights is taken, with the biases added. This value makes up the output, which is in turn passed through the forward pass of the activation function.

```dart
  @override
  void forward(Vector2 inputs) {
    assert(inputs[0].length == weights.length,
        "The elements inside of inputs need to be of the same length as the weights");
    // save the inputs for backprop
    this.inputs = inputs;
    // run forward pass
    // output = (dot(inputs, weights) + biases) as Vector2;

    // send through activation function
    activation.forward((dot(inputs, weights) + biases) as Vector2);
    output = activation.output;
  }
```

Layer Backward Pass

The first operation performed on the backward pass is the activation function's backward pass, the result of which is bound to activation.dinputs. From that, the gradients with respect to the layer's weights, biases, and inputs are computed; the optimizer later uses these to slightly tweak each of the weights and biases.

The last part of the function adds the regularization gradients. Regularization is there to ensure the network does not rely on any one neuron too much. L1 is not used much at all, with L2 being the main way this is accomplished.
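
For reference, these gradients correspond to the standard penalties from the book (this is my summary of the math the code implements, not code from the project): the L2 penalty adds λ₂·Σw² to the loss and the L1 penalty adds λ₁·Σ|w|, giving per-weight derivatives of:

```latex
\frac{\partial}{\partial w}\Big(\lambda_2 \sum_k w_k^2\Big) = 2\,\lambda_2\, w
\qquad\qquad
\frac{\partial}{\partial w}\Big(\lambda_1 \sum_k |w_k|\Big) = \lambda_1\,\operatorname{sign}(w)
```

which is what the replaceWhere sign trick and the `weights * (2 * weightRegL2)` terms compute below.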

```dart
  @override
  void backward(Vector2 dvalues) {
    assert(inputs != null, "inputs is null, run forward() first");
    // send through activation function
    activation.backward(dvalues);

    // gradients on params
    dweights = dot(inputs!.T, activation.dinputs!) as Vector2;
    dbiases = activation.dinputs!.sum(axis: 0, keepDims: true) as Vector2;

    // gradients on regularization
    if (weightRegL1 > 0) {
      var dL1 = Vector2.like(weights, fill: 1);
      dL1 = dL1.replaceWhere((i, j) => weights[i][j] < 0 ? -1 : dL1[i][j]);
      dweights = dweights! + (dL1 * weightRegL1) as Vector2;
    }
    if (weightRegL2 > 0) {
      dweights = dweights! + (weights * (2 * weightRegL2)) as Vector2;
    }
    if (biasRegL1 > 0) {
      var dL1 = Vector2.like(biases, fill: 1);
      dL1 = dL1.replaceWhere((i, j) => biases[i][j] < 0 ? -1 : dL1[i][j]);
      dbiases = dbiases! + (dL1 * biasRegL1) as Vector2;
    }
    if (biasRegL2 > 0) {
      dbiases = dbiases! + (biases * (2 * biasRegL2)) as Vector2;
    }

    // gradients on values
    dinputs = dot(activation.dinputs!, weights.T) as Vector2;
  }
```

Activation Functions

Activation functions are incredibly important for the learning behavior of the network. They determine each neuron's output from its weighted inputs, and their derivatives control how the gradients used to tweak the weights and biases flow backwards through the network.

Multiple activation functions were implemented; ReLU and sigmoid are the two this project ended up using.

Abstract Class

Again, an abstract class was used to hold some common code between all of the activation functions.

```dart
abstract class Activation {
  Vector2? inputs;
  Vector2? output;
  Vector2? dinputs;

  void forward(Vector2 inputs);
  void backward(Vector2 dvalues);

  String name();
}
```

ReLU

I will only go over one activation function, but this is probably the most important one. I use this activation function inside the hidden layer, and a sigmoid activation function for the output.

ReLU Forward Pass

The forward pass for the ReLU activation function is painfully simple. If the value is below 0, set it to 0. If the value is above 0, keep the value the same.

```dart
  @override
  void forward(Vector2 inputs) {
    // save inputs
    this.inputs = Vector2.fromVector(inputs);
    // print(inputs.subVector(1, 2));

    // calculate the output values from the inputs
    output = maximum(0, inputs) as Vector2;
    // print(output!.subVector(1, 2));
  }
```

ReLU Backward Pass

The backward pass is the same concept applied to the gradient: wherever the original input was negative, the gradient is zeroed; everywhere else it passes through unchanged. I did this implementation much later than the forward pass and forgot the maximum() function existed, so I used the replaceWhere() function instead. It produces the exact same result regardless.
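
In other words (my notation, not the book's), the forward pass computes f(x) = max(0, x), and the backward pass multiplies the incoming gradient by f'(x). Since that derivative is either 0 or 1, the "multiplication" reduces to the copy-and-zero that the code below performs:

```latex
f(x) = \max(0, x)
\qquad
f'(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}
\qquad
\texttt{dinputs}_{ij} = \texttt{dvalues}_{ij} \cdot f'(\texttt{inputs}_{ij})
```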

```dart
  @override
  void backward(Vector2 dvalues) {
    // make a copy of dvalues since we will be modifying it
    dinputs = Vector2.fromVector(dvalues);
    // print(dinputs!.subVector(1, 2));

    // zero the gradient where the values are negative
    dinputs =
        dinputs!.replaceWhere((i, j) => inputs![i][j] < 0 ? 0 : dinputs![i][j]);
    // for (var i = 0; i < inputs!.length; i++) {
    //   for (var j = 0; j < inputs![i].length; j++) {
    //     if (inputs![i][j] < 0) {
    //       dinputs![i][j] = 0;
    //     }
    //   }
    // }

    // print(dinputs!.subVector(1, 2));
  }
```

Loss Functions

Another critical component of the network is the loss function. This is the beginning of the backward pass for the network. There is a lot of math that goes into a loss function; the two that were implemented are binary cross-entropy and categorical cross-entropy.

Categorical Cross-Entropy

For the example that I have hosted at nn.jakelanders.com, the network needs to give a confidence for each digit an input could be. In other words, we need the network to "categorize" the values, so we use the categorical cross-entropy loss function.

Loss Forward Pass

For the forward pass, we clip each predicted confidence between 1e-7 and 1 - 1e-7 to avoid taking the log of 0.
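
Written out (my notation), the per-sample loss the code below computes is the negative log of the clipped confidence assigned to the correct class:

```latex
L_i = -\ln\!\Big(\mathrm{clip}\big(\hat{y}_{i,\,y_i},\; 10^{-7},\; 1 - 10^{-7}\big)\Big)
```

where ŷ_{i, y_i} is the predicted confidence for sample i's correct label y_i.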

```dart
  @override
  Vector1 forward(Vector2 predictions, Vector1 labels) {
    assert(predictions.length == labels.length,
        "The prediction length and label length need to match");
    Vector1 loss = Vector1.empty();

    for (var i = 0; i < predictions.length; i++) {
      // need to avoid taking the log of 0
      // make the max value 1 - 1e-7, and min value 1e-7
      loss.add(-math.log(math.min(
          math.max(predictions[i][labels[i].round()], 1e-7), 1 - 1e-7)));
    }
    return loss;
  }
```

Loss Backward Pass

For the backward pass, we need to compare the predicted values (dvalues) against the labels (yTrue). First, the labels are converted to a one-hot vector (where the incorrect indexes are 0 and the correct index is 1). The negated one-hot vector is then divided element-wise by the predicted values, and finally this dinputs value is divided by the number of samples to normalize the gradient.
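
In formula form (again my notation), the gradient computed below for each sample i and class j is:

```latex
\frac{\partial L}{\partial \hat{y}_{ij}}
= -\,\frac{\mathbf{1}[\,j = y_i\,]}{\hat{y}_{ij}} \cdot \frac{1}{N}
```

where 1[j = y_i] is the one-hot indicator and N is the number of samples in the batch.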

```dart
  @override
  void backward(Vector2 dvalues, Vector1 yTrue) {
    var samples = dvalues.length;
    var labels = dvalues[0].length;

    // convert y to one-hot vector
    Vector2 eyeTemp = Vector2.eye(labels);
    Vector2 yTrueOneHot = Vector2.empty();
    for (var i = 0; i < dvalues.length; i++) {
      yTrueOneHot.add(eyeTemp[yTrue[i] as int]);
    }
    Vector2 dinputs = ((yTrueOneHot * -1) / dvalues) as Vector2;
    dinputs = (dinputs / samples) as Vector2;

    this.dinputs = dinputs;
  }
```

Optimizers

Optimizers are what allow a network to learn. This is where you hear terms such as learningRate and decay. By far the most popular optimizer for neural networks is the Adam optimizer. It is a combination of ideas from other optimizers, with various tweaks that make it especially powerful; in particular, it pulls ideas from the SGD optimizer and the RMSProp optimizer.

The following optimizers were implemented: SGD, AdaGrad, RMSProp, and Adam.

Abstract Class

```dart
abstract class Optimizer {
  late double learningRate;
  late double currentLearningRate;
  late double decay;
  late int iterations;

  // constructors
  Optimizer();
  Optimizer.fromMap(Map<String, dynamic> values);

  /// To be called before each layer has been optimized
  void pre();

  /// Update the layer with the specified optimizer function
  void update(Layer layer);

  /// To be called after all layers have been optimized
  void post();

  // basic name
  String name();

  /// to convert the optimizer to a map for json storage
  Map<String, dynamic> toMap();
}
```

Adam Optimizer

The Adam optimizer has some serious math behind its operations. I am not going to get into the technical details, partly because I am not qualified to speak on how it all works. I highly recommend Neural Networks from Scratch in Python; it gives a fantastic explanation of both the how AND the why of all of these optimizers.
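
For reference, the update implemented below is the standard Adam rule (my transcription of the math, matching the code): keep an exponential moving average of the gradients (the momentums) and of the squared gradients (the cache), bias-correct both, then step each parameter:

```latex
m_t = \beta_1\, m_{t-1} + (1-\beta_1)\, g_t
\qquad
v_t = \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2

\hat{m}_t = \frac{m_t}{1-\beta_1^{\,t}}
\qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}
\qquad
\theta_t = \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

Here g_t is the gradient (dweights or dbiases), η is the current (decayed) learning rate, and t = iterations + 1.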

```dart
  @override
  void pre() {
    if (decay != 0) {
      currentLearningRate = learningRate * (1 / (1 + decay * iterations));
    }
  }

  @override
  void update(Layer layer) {
    // initialize momentum and cache values
    layer.weightMomentums ??=
        Vector2(layer.weights.shape[0], layer.weights.shape[1]);
    layer.weightCache ??=
        Vector2(layer.weights.shape[0], layer.weights.shape[1]);
    layer.biasMomentums ??=
        Vector2(layer.biases.shape[0], layer.biases.shape[1]);
    layer.biasCache ??= Vector2(layer.biases.shape[0], layer.biases.shape[1]);

    // update the momentum with the current gradient
    layer.weightMomentums = (layer.weightMomentums! * beta1) +
        (layer.dweights! * (1 - beta1)) as Vector2;
    layer.biasMomentums = (layer.biasMomentums! * beta1) +
        (layer.dbiases! * (1 - beta1)) as Vector2;

    // get correct momentum
    // iterations is 0 at first pass, and we need to start with 1 here
    Vector2 weightMomentumsCorrected = layer.weightMomentums! /
        (1 - math.pow(beta1, iterations + 1)) as Vector2;
    Vector2 biasMomentumsCorrected =
        layer.biasMomentums! / (1 - math.pow(beta1, iterations + 1)) as Vector2;

    // update the cache with the squared current gradients
    layer.weightCache = (layer.weightCache! * beta2) +
        (layer.dweights!.pow(2) * (1 - beta2)) as Vector2;
    layer.biasCache = (layer.biasCache! * beta2) +
        (layer.dbiases!.pow(2) * (1 - beta2)) as Vector2;

    // get corrected cache
    Vector2 weightCacheCorrected =
        layer.weightCache! / (1 - math.pow(beta2, iterations + 1)) as Vector2;
    Vector2 biasCacheCorrected =
        layer.biasCache! / (1 - math.pow(beta2, iterations + 1)) as Vector2;

    // vanilla sgd parameter update + normalization with square root cache
    layer.weights = layer.weights +
        ((weightMomentumsCorrected * -currentLearningRate) /
            (weightCacheCorrected.sqrt() + epsilon)) as Vector2;
    layer.biases = layer.biases +
        ((biasMomentumsCorrected * -currentLearningRate) /
            (biasCacheCorrected.sqrt() + epsilon)) as Vector2;
  }

  @override
  void post() {
    iterations += 1;
  }
```

Neural Network Class

Now to pull this all together. All that composes a neural network is a list of layers (each with its activation function), a loss function, and an optimizer. That's it. So defining a class to hold all of these options is fairly simple.

I added some extra required fields to the network class to protect myself from randomly creating parameters, having the network perform well, and then completely forgetting what that network looked like. These options get written to a metadata file when the network is saved to disk.

```dart
class NeuralNetwork {
  /// list of layers that the model will run through. The first
  /// layer size should accept the size of your input, and the last
  /// layer should be your output layer giving your output.
  late List<Layer> layers;

  /// The loss function to use. The available functions are:
  /// [LossBinaryCrossentropy] and [LossCategoricalCrossentropy].
  late Loss lossFunction;

  /// Which optimizer to use. This will usually be Adam, but other
  /// optimizers are available. This includes: [OptimizerAdaGrad],
  /// [OptimizerAdam], [OptimizerRMSProp], and [OptimizerSGD].
  late Optimizer optimizer;

  /// An accuracy class to use. There is only one option: [Accuracy].
  late Accuracy accuracy;

  /// The batch size to split the training or testing data into.
  /// Using larger batch sizes will usually result in better
  /// network performance, up to a certain limit. A good number
  /// for this is 128. The default is 1.
  int? batchSize;

  /// The seed used to generate any data, will be saved in the model
  /// output for you to reference later.
  late int seed;

  /// The name of the dataset used. This is not used anywhere except
  /// when saving the model, required to make sure you keep
  /// proper track of what data your network was trained on.
  late String dataset;

  /// A description to describe any random information that you may
  /// want to reference later when a model is saved
  late String metadata;

  /// Create a neural network with the list of Layers in [layers],
  /// a [lossFunction], and an [optimizer]. There are some other
  /// parameters that are required in order to properly generate
  /// an output. These consist of [seed], [dataset], and [metadata].
  NeuralNetwork({
    required this.layers,
    required this.lossFunction,
    required this.optimizer,
    required this.seed,
    required this.dataset,
    required this.metadata,
    this.batchSize,
  }) : assert(layers.isNotEmpty, "Layers cannot be an empty list") {
    accuracy = Accuracy();
  }
  ...
}
```
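
To show how the pieces fit together, here is a hypothetical construction sketch. The LayerDense, ActivationReLU, and ActivationSigmoid names and constructor arguments are my assumptions, since only the abstract classes are shown in this article; LossCategoricalCrossentropy and OptimizerAdam are the class names referenced in the doc comments above:

```dart
// Hypothetical: the exact LayerDense/ActivationReLU/ActivationSigmoid
// and OptimizerAdam constructor signatures are not shown in this article.
final network = NeuralNetwork(
  layers: [
    LayerDense(784, 64, activation: ActivationReLU()),
    LayerDense(64, 10, activation: ActivationSigmoid()),
  ],
  lossFunction: LossCategoricalCrossentropy(),
  optimizer: OptimizerAdam(learningRate: 0.005, decay: 5e-5),
  seed: 42,
  dataset: "mnist",
  metadata: "784-64-10 network trained on randomized mnist images",
  batchSize: 128,
);
```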

Network Forward Pass

The forward pass is as simple as running through the layers in order; each layer in turn passes its output through its activation function.

```dart
  Vector2 _forward(Vector2 trainingData) {
    // pass through layers
    for (int i = 0; i < layers.length; i++) {
      if (i == 0) {
        layers[i].forward(trainingData);
      } else {
        layers[i].forward(layers[i - 1].output!);
      }
    }

    return layers.last.output!;
  }
```

Network Backward Pass

The backward pass is simple as well. The loss function's backward pass is called first, then all of the layers are run through in reverse order.

```dart
  void _backward(Vector2 predictions, Vector1 trainingLabels) {
    // backwards pass
    lossFunction.backward(predictions, trainingLabels);
    // loop backwards through all layers
    for (int i = layers.length - 1; i > -1; i--) {
      if (i == layers.length - 1) {
        layers[i].backward(lossFunction.dinputs!);
      } else {
        layers[i].backward(layers[i + 1].dinputs!);
      }
    }
  }
```

Training the Network

These two functions, _forward() and _backward(), are called from both the train() and the test() functions. The train() function is a little more involved: we need to factor in epochs, batches, and optimization. The epochs are simple; an epoch is just one full run through the data. The batch size is a bit trickier. We need to pull sub-vectors of the training data, one per step, where the number of steps is however many batches it takes to pass through the entire training set. The code for this is as follows:

```dart
  batchSize ??= trainingData.shape[0];
  // calculate step number
  int steps = (trainingData.shape[0] / batchSize!).round();
  // include any straggling data
  if (steps * batchSize! < trainingData.shape[0]) {
    steps += 1;
  }
  print("# Beginning training of model:");
  print("# epochs: $epochs, batch size: $batchSize, steps: $steps");

  for (int epoch = 0; epoch < epochs; epoch++) {
    // reset accumulated values
    lossFunction.newPass();
    accuracy.newPass();

    for (int step = 0; step < steps; step++) {
      Vector2 batchData = trainingData.subVector(
          step * batchSize!, (step + 1) * batchSize!) as Vector2;
      Vector1 batchLabels = trainingLabels.subVector(
          step * batchSize!, (step + 1) * batchSize!) as Vector1;
      ...
    }
    ...
  }
  ...
```

Then, the rest of the training pass is as simple as calling the _forward function, calculating the loss and accuracy, running the backwards function, then running the optimizer.

```dart
      // run forward pass
      var predictions = _forward(batchData);

      var dataLoss = lossFunction.calculate(
        layers.last.output!,
        batchLabels,
      );

      // calculate accuracy
      var acc = accuracy.calculate(predictions, batchLabels);

      // backwards pass
      _backward(predictions, batchLabels);

      // optimize
      optimizer.pre();
      for (int i = 0; i < layers.length; i++) {
        optimizer.update(layers[i]);
      }
      optimizer.post();
```

Testing the Network

There are two scenarios we want to cover when testing the network. The first is to test on an entire set of data and get an average of the accuracy. The code for this is pretty similar to the training code.

```dart
  /// Test the model against a list of inputs. The shape
  /// must match that of your training data and the input
  /// layer of the model. You can optionally print data to
  /// the console every x steps with [printEveryStep].
  double test({
    required Vector2 testingData,
    required Vector1 testingLabels,
    int? printEveryStep,
  }) {
    print("# Beginning testing of model:");
    batchSize ??= testingData.shape[0];

    // calculate step number
    int steps = (testingData.shape[0] / batchSize!).round();
    // include any straggling data
    if (steps * batchSize! < testingData.shape[0]) {
      steps += 1;
    }

    double totalAccuracy = 0;

    for (int step = 0; step < steps; step++) {
      Vector2 batchData = testingData.subVector(
          step * batchSize!, (step + 1) * batchSize!) as Vector2;
      Vector1 batchLabels = testingLabels.subVector(
          step * batchSize!, (step + 1) * batchSize!) as Vector1;

      // run through the model
      for (int i = 0; i < layers.length; i++) {
        if (i == 0) {
          layers[i].forward(batchData);
        } else {
          layers[i].forward(layers[i - 1].output!);
        }
      }

      // calculate loss
      var loss = lossFunction.calculate(layers.last.output!, batchLabels);
      var correct = 0;
      for (var i = 0; i < batchLabels.length; i++) {
        if (layers.last.output![i].maxIndex() == batchLabels[i]) {
          correct += 1;
        }
      }
      var accuracy = correct / batchLabels.length;
      totalAccuracy += accuracy;

      if (printEveryStep != null &&
          (step % printEveryStep == 0 || step == steps - 1)) {
        print(
            "validation, acc: ${accuracy.toStringAsPrecision(3)}, loss: ${loss.toStringAsPrecision(3)}");
      }
    }
    double totalAcc = totalAccuracy / steps;
    print("# [Total accuracy: ${(totalAcc * 100).toStringAsPrecision(4)}%]");
    return totalAcc;
  }
```

The second scenario we want to cover is testing a single input. The network runs many inputs in parallel, so we just need to wrap a single list of pixels in a Vector2 with a length of 1.

```dart
  /// Pass a single datapoint through the network to get a
  /// list of confidences corresponding to what the network
  /// 'thinks' the inputted [data] is.
  List<double> getConfidenceSingle(List<double> data) {
    var prediction = _forward(Vector2.from([data]));
    List<double> preds = [];
    for (num i in prediction[0]) {
      preds.add(i.toDouble());
    }
    return preds;
  }
```
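
As a hypothetical usage sketch (the 784-pixel, 0-1 input format is my assumption based on the MNIST section below), the prediction is just the index of the highest confidence:

```dart
// `pixels` is a flattened 28x28 image with values in the 0-1 range
// (an assumption, not confirmed by the snippets in this article).
List<double> pixels = List.filled(784, 0.0);

// `network` is a NeuralNetwork instance like the one sketched earlier
List<double> confidences = network.getConfidenceSingle(pixels);

// the index of the highest confidence is the digit the network "thinks" it saw
int prediction = 0;
for (int i = 1; i < confidences.length; i++) {
  if (confidences[i] > confidences[prediction]) prediction = i;
}
print("predicted digit: $prediction");
```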

MNIST Dataset

This network was trained on the MNIST dataset. Instead of using the raw file format from the original authors, I used a CSV-formatted version, which you can find here.
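
The loading code is not shown in this article, but a minimal sketch might look like the following, assuming each CSV row is a label followed by 784 pixel values in the 0-255 range (the usual layout of the CSV MNIST files, not something confirmed here):

```dart
import 'dart:io';
// plus an import of this project's Vector1/Vector2 classes

void main() {
  // Hypothetical loader: assumes "label,p0,p1,...,p783" rows with pixels 0-255.
  final List<List<num>> rawImages = [];
  final List<num> rawLabels = [];
  for (final line in File('mnist_train.csv').readAsLinesSync()) {
    if (line.trim().isEmpty) continue;
    final parts = line.split(',');
    rawLabels.add(int.parse(parts.first));
    // scale each pixel into the 0-1 range
    rawImages.add(
        [for (var i = 1; i < parts.length; i++) int.parse(parts[i]) / 255.0]);
  }
  final trainingData = Vector2.from(rawImages);
  final trainingLabels = Vector1.from(rawLabels);
  print('loaded ${trainingLabels.length} images');
}
```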

NNImage Class

The images are loaded into a special class created to hold the image data along with its size and label, plus some extra helpers used to randomize the images.

```dart
/// Wrapper to hold image data along with a label
class NNImage {
  late List<double> image;
  late int label;
  late int size;
  ...

  /// Randomize the image position, angle, noise, and scale
  NNImage randomized() {
    ImageProcessor imageProcessor = ImageProcessor();
    return imageProcessor.transformImage(
      this,
      TransformationSettings.random(),
    );
  }
}
```

Image Randomization

After running into some issues with over-fitting on the dataset, I became blocked and did not know what direction to go in. The video by Sebastian Lague really helped me get unblocked, as he faced a similar issue. The solution was to randomize the position, angle, size, and noise of the images. So, the NNImage class has a function to convert the vanilla image into a transformed image, which runs it through this code:

```dart
// heavily pulled from https://github.com/SebLague/Neural-Network-Experiments/blob/main/Assets/Scripts/Data+Handling/ImageProcessor.cs
class ImageProcessor {
  NNImage transformImage(NNImage original, TransformationSettings settings) {
    math.Random rng = math.Random();
    NNImage transformedImage = NNImage.from(original);
    if (settings.scale != 0) {
      Tuple2<double, double> iHat = Tuple2(
        v1: math.cos(settings.angle) / settings.scale,
        v2: math.sin(settings.angle) / settings.scale,
      );
      Tuple2<double, double> jHat = Tuple2(v1: -iHat.v2, v2: iHat.v1);
      for (int y = 0; y < transformedImage.size; y++) {
        for (int x = 0; x < transformedImage.size; x++) {
          double u = x / (transformedImage.size - 1);
          double v = y / (transformedImage.size - 1);

          double uTransformed = iHat.v1 * (u - 0.5) +
              jHat.v1 * (v - 0.5) +
              0.5 -
              settings.offset.v1;
          double vTransformed = iHat.v2 * (u - 0.5) +
              jHat.v2 * (v - 0.5) +
              0.5 -
              settings.offset.v2;
          double pixelValue = sample(original, uTransformed, vTransformed);
          double noiseValue = 0;
          if (rng.nextDouble() <= settings.noiseProbability) {
            noiseValue = (rng.nextDouble() - 0.5) * settings.noiseStrength;
          }
          transformedImage.image[getFlatIndex(transformedImage, x, y)] =
              math.min(math.max(pixelValue + noiseValue, 0), 1);
        }
      }
    }
    return transformedImage;
  }

  double sample(NNImage image, double u, double v) {
    u = math.max(math.min(1, u), 0);
    v = math.max(math.min(1, v), 0);

    double texX = u * (image.size - 1);
    double texY = v * (image.size - 1);

    int indexLeft = texX.toInt();
    int indexBottom = texY.toInt();
    int indexRight = math.min(indexLeft + 1, image.size - 1);
    int indexTop = math.min(indexBottom + 1, image.size - 1);

    double blendX = texX - indexLeft;
    double blendY = texY - indexBottom;

    double bottomLeft =
        image.image[getFlatIndex(image, indexLeft, indexBottom)];
    double bottomRight =
        image.image[getFlatIndex(image, indexRight, indexBottom)];
    double topLeft = image.image[getFlatIndex(image, indexLeft, indexTop)];
    double topRight = image.image[getFlatIndex(image, indexRight, indexTop)];

    double valueBottom = bottomLeft + (bottomRight - bottomLeft) * blendX;
    double valueTop = topLeft + (topRight - topLeft) * blendX;
    double interpolatedValue = valueBottom + (valueTop - valueBottom) * blendY;
    return interpolatedValue;
  }

  int getFlatIndex(NNImage image, int x, int y) {
    return y * image.size + x;
  }
}

class TransformationSettings {
  late double angle;
  late double scale;
  late Tuple2<double, double> offset;
  late int noiseSeed;
  late double noiseProbability;
  late double noiseStrength;

  TransformationSettings({
    required this.angle,
    required this.scale,
    required this.offset,
    required this.noiseSeed,
    required this.noiseProbability,
    required this.noiseStrength,
  });
  ...
}
```

Training and Testing Images

The training of the network included the original 60,000 images along with all 60,000 images randomized, for a total of 120,000 images. The testing dataset was composed of 10,000 images along with all 10,000 images randomized, for a total of 20,000 images.

Lastly, I hand-generated 1,000 images of digits in the format the test program creates them in, to give the network a little diversity in the image pixel values. 900 of these images were added for training and 100 for testing.

Model Performance

The accuracy of the model on the testing data was 93%. This still shows a bit of over-fitting, but the performance tends to be relatively good on images drawn on the website example.

There are some issues when it comes to drawing digits outside the center of the grid, which could be fixed in a variety of ways. One is to add more variety to the training data. Another is to take the pixel input from the user and run a function to center it before feeding it to the network, though I am not sure how effective this would be.

Concluding Thoughts

This was an extremely fun and rewarding project to work on. I hope the information I put here helps someone in the future on the same journey. If there happens to be some part of my implementation that helps you, please let me know! I would love to hear what other similar projects people have done along with suggestions for how to improve my own implementation!
