Lessons Learned and Reinforced from Writing My Own Deep Learning Package


At work, I’ve been rolling my own deep learning package to experiment with graph convolutional neural networks. I did this because in graph-centric deep learning (an idea I picked up from this paper), the inputs, convolution kernels, and much more are still being actively developed, and the standard APIs don’t fit this kind of data.

Here are the lessons I learned (and reinforced) while doing this.

autograd is an amazing package

I am using autograd to write my neural networks. autograd provides a way to automatically differentiate numpy code. As long as I write the forward computation up to the loss, autograd can differentiate that loss w.r.t. all of the parameters used, giving me the direction in which to move the parameters to minimize it.
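
To make that concrete, here’s a minimal sketch of the autograd workflow. The linear model, loss function, and data below are my own invented example, not code from my package:

```python
import autograd.numpy as np  # drop-in numpy wrapper that records operations
from autograd import grad

def loss(params, x, y):
    """Mean squared error of a linear model; params is a (weights, bias) tuple."""
    w, b = params
    y_pred = np.dot(x, w) + b
    return np.mean((y_pred - y) ** 2)

# grad differentiates a function w.r.t. its first argument by default.
dloss = grad(loss)

params = (np.random.randn(3), 0.0)
x, y = np.random.randn(10, 3), np.random.randn(10)
dw, db = dloss(params, x, y)  # gradients, structured just like params
```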

Deep learning is nothing more than chaining elementary differentiable functions

Linear regression is nothing more than a dot product of features with weights, plus a bias term. Logistic regression just chains the logistic function on top of that. Anything deeper than that is what we might call a neural network.
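
Here’s a small illustration of that chaining, written against autograd’s numpy wrapper; the function names and the particular activations are mine, chosen for illustration:

```python
import autograd.numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear(x, w, b):
    # Linear regression: dot product of features with weights, plus a bias.
    return np.dot(x, w) + b

def logistic_regression(x, w, b):
    # Logistic regression: the logistic function chained on top of linear.
    return logistic(linear(x, w, b))

def shallow_net(x, w1, b1, w2, b2):
    # Chain one more differentiable layer, and we have a neural network.
    return logistic_regression(np.tanh(linear(x, w1, b1)), w2, b2)
```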

One interesting thing that I’ve begun to ponder is the shape of the loss function, and how it changes when I change the model architecture, activation functions, and more. I can’t speak intelligently about it right now, but from observing the training performance live (I update a plot of predictions vs. actual values at the end of every x training epochs), different combinations of activation functions seem to cause different behaviours in the outputs, and I can’t think of a first-principles reason why. All told, pretty interesting :).

Defining a good API is hard work

There are many design choices that go into an API. First off, I wanted to build something familiar, so I chose to emulate the functional APIs of Keras, PyTorch, and Chainer. I also wanted composability, where I can define modules of layers and chain them together, so I opted to use Python objects and take advantage of their __call__ method to achieve both goals. At the same time, autograd imposes a constraint: functions must be differentiable with respect to their first argument, an array of parameters. I therefore had to make sure the weights and biases were exposed transparently for autograd to differentiate. As a positive side effect, I can inspect the parameters dictionary quite transparently.
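
As a rough sketch of what such a design could look like (the Dense class, its init_params helper, and the layer names below are all my own invention, not my package’s actual API):

```python
import autograd.numpy as np
from autograd import grad

class Dense:
    """A layer object; calling it like a function gives a Keras/Chainer feel."""
    def __init__(self, name, n_in, n_out):
        self.name = name
        self.n_in, self.n_out = n_in, n_out

    def init_params(self, params):
        # Register this layer's weights and biases in a shared, inspectable dict.
        params[self.name] = dict(
            w=np.random.randn(self.n_in, self.n_out) * 0.1,
            b=np.zeros(self.n_out),
        )

    def __call__(self, params, x):
        p = params[self.name]
        return np.tanh(np.dot(x, p["w"]) + p["b"])

layer1, layer2 = Dense("layer1", 3, 8), Dense("layer2", 8, 1)

def model(params, x):
    # Composability: layers chain together via plain function calls.
    return layer2(params, layer1(params, x))

def loss(params, x, y):
    # params comes first, so autograd can differentiate w.r.t. it.
    return np.mean((model(params, x) - y) ** 2)

params = {}
layer1.init_params(params)
layer2.init_params(params)
gradients = grad(loss)(params, np.random.randn(5, 3), np.random.randn(5, 1))
# gradients mirrors the params dict, so per-layer inspection stays transparent.
```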

Optimizing for speed is a very hard thing

Even though I’m doing my best with the matrix math (and hopefully getting better at mastering 3-dimensional and higher matrix algebra), keeping my API clean and compatible with autograd (meaning no sparse arrays) led me to opt for lists of numpy arrays, which is not the fastest representation.
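
For illustration, here’s roughly what that representation could look like; the graph sizes, stand-in adjacency matrices, and the convolution step are all made up:

```python
import autograd.numpy as np

# One adjacency matrix and one feature matrix per graph, as plain dense arrays.
# The graph sizes here (4, 7, and 5 nodes) are arbitrary.
adjacencies = [np.eye(n) for n in (4, 7, 5)]  # stand-in adjacency matrices
features = [np.random.randn(n, 10) for n in (4, 7, 5)]

def convolve_all(adjacencies, features, w):
    # A plain Python loop over per-graph arrays; autograd differentiates
    # straight through it, and no sparse arrays are needed.
    return [np.dot(np.dot(a, f), w) for a, f in zip(adjacencies, features)]

outputs = convolve_all(adjacencies, features, np.random.randn(10, 4))
```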

Graph convolutions have a connection to network propagation

I will probably explore this more deeply in another blog post, but yes, as I work through the math involved in graph convolutions, I’m noticing that there’s a deep connection there. The short story is basically that “convolutions propagate information across nodes” in almost exactly the same way that “network propagation methods share information across nodes”: through a kernel defined by the adjacency matrix of the graph.

Ok, that’s a lot of jargon, but I promise I will explore this topic at a later time.
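
In the meantime, here’s a minimal sketch of the idea, using one common normalization choice (this is my own illustration, not necessarily the kernel my package uses):

```python
import autograd.numpy as np

def graph_convolution(A, H, W):
    # Network propagation step: each node averages its neighbours' features
    # (with a self-loop so it keeps its own), using the adjacency matrix A.
    A_hat = A + np.eye(A.shape[0])
    propagated = np.dot(np.diag(1.0 / A_hat.sum(axis=1)), np.dot(A_hat, H))
    # Convolution step: a learned linear transform W of the propagated features.
    return np.dot(propagated, W)
```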

Open Sourcing

I’m an avid open source fan, and lots of my work builds on open source tools. However, because this “neural networks on graphs” work was developed on company time and for company use, it will very likely be the first software project that I send to Legal to evaluate whether I can open source or publish it. I’ll naturally make my strongest case for open sourcing the code base (e.g. ensuring no competitive intelligence is leaked), but I will still have to defer to them for the final decision.
