OpenEquivariance [1] is another good baseline, with kernels for the Clebsch–Gordan tensor product and convolution, and it is fully open source. Both kernel implementations have been successfully integrated into existing machine learning interatomic potentials, e.g. [2,3].
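For anyone unfamiliar, the operation these kernels accelerate is easiest to see in its smallest instance. Here is a sketch (plain NumPy, nothing from the actual library) of coupling two l=1 vectors into their l=0, l=1, and l=2 parts, which is what the Clebsch–Gordan tensor product does up to normalization:

```python
import numpy as np

# Coupling two l=1 vectors into l=0 + l=1 + l=2 irreps:
# the dot product (scalar), the cross product (pseudovector),
# and the symmetric traceless outer product (5 components).
# This is a toy illustration, not OpenEquivariance's API.
def cg_product_l1_l1(u, v):
    l0 = u @ v                                    # scalar (l=0)
    l1 = np.cross(u, v)                           # pseudovector (l=1)
    sym = 0.5 * (np.outer(u, v) + np.outer(v, u))
    l2 = sym - (np.trace(sym) / 3.0) * np.eye(3)  # traceless symmetric (l=2)
    return l0, l1, l2

l0, l1, l2 = cg_product_l1_l1(np.array([1.0, 0.0, 0.0]),
                              np.array([0.0, 1.0, 0.0]))
print(l0, l1, np.trace(l2))  # 0.0 [0. 0. 1.] ~0.0
```

The optimized kernels do this for many channels and higher l simultaneously, which is where the performance wins come from.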
A related line of work is "Thinking Like Transformers" [1]. They introduce a primitive programming language, RASP, composed of operations that can be modeled with transformer components, and demonstrate how different programs can be written in it, e.g. histograms and sorting. Sasha Rush and Gail Weiss have an excellent blog post on it as well [2]. Follow-on work demonstrated how RASP-like programs can actually be compiled into model weights without training [3].
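To give a flavor of the idea, here is a toy version of the classic RASP "histogram" program in NumPy (the function names are mine, not the paper's actual primitives): a selector builds an attention pattern matching equal tokens, and aggregating a vector of ones over it counts occurrences.

```python
import numpy as np

# Toy RASP-style histogram: for each token, count how many times
# it appears in the sequence. select_eq plays the role of attention;
# summing ones over the selected positions plays the role of aggregate.
def select_eq(tokens):
    toks = np.array(list(tokens))
    return (toks[:, None] == toks[None, :]).astype(float)  # attention pattern

def histogram(tokens):
    attn = select_eq(tokens)
    return attn @ np.ones(len(tokens))  # count of selected positions

print(histogram("hello"))  # [1. 1. 2. 2. 1.]
```

(RASP's aggregate actually averages over selected positions; summing ones is the equivalent "selector width" trick for counting.)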
Huge fan of RASP et al. If you enjoy this space, it might be fun to take a glance at some of my work on HandCrafted Transformers [1], wherein I hand-pick the weights of a transformer model to do longhand addition, similar to how humans learn to do it in grade school.
Link to paper [1]. It looks like the authors construct a basic autoencoder to predict frames in videos of various systems (double pendulum, lava lamp, etc.) and then use the Levina–Bickel algorithm [2] to estimate the "intrinsic dimension" of the autoencoder's latent space. They then refer to the intrinsic dimension as the "minimum number of variables required by the system to accurately capture the motion", e.g. 24 for a video of a fireplace.
Personally, I wonder how much information this actually provides about a system. Since the neural network is non-linear, a single latent variable may theoretically function as more than one state variable.
Chess experts are only good at remembering valid chess positions that could result from a game in play. Their "expertise" vanishes when trying to remember chess boards set up in random (invalid) positions.
It could be that there are a few valid states that can be highly encoded in each of these situations, and the autoencoders found them.
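For reference, the Levina–Bickel estimator mentioned above is short enough to sketch: each point's dimension estimate is the inverse mean log-ratio of its k-th nearest-neighbor distance to the closer neighbor distances, averaged over the sample. A minimal version (brute-force distances, my own variable names):

```python
import numpy as np

# Minimal sketch of the Levina-Bickel MLE intrinsic-dimension estimator:
# m_k(x) = [ (1/(k-1)) * sum_j log(T_k(x) / T_j(x)) ]^-1,
# where T_j(x) is the distance from x to its j-th nearest neighbor.
def levina_bickel(X, k=10):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    T = d[:, 1:k + 1]                      # skip the self-distance in column 0
    logs = np.log(T[:, -1:] / T[:, :-1])   # log(T_k / T_j) for j = 1..k-1
    m = 1.0 / logs.mean(axis=1)            # per-point MLE
    return m.mean()                        # average over the sample

# sanity check: points on a 2-D subspace embedded in 10-D
rng = np.random.default_rng(0)
X = np.zeros((500, 10))
X[:, :2] = rng.normal(size=(500, 2))
print(levina_bickel(X))  # close to 2
```

Note that the estimate is only as meaningful as the representation it is run on, which is exactly the concern about nonlinear latents above.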
According to [1], the byte pair encoding for “Apoploe vesrreaitais” (the words producing bird images) is "apo, plo, e</w>, ,ve, sr, re, ait, ais</w>", and Apo-didae & Plo-ceidae are families of birds.
On the other hand, the OpenAI tokenizer gives me a different tokenization: ap - opl - oe [0]. If you capitalize the A, the result is A - pop - loe. The DALL-E 2 paper only specifies that it uses a BPE encoding; I would assume they used the same one as for GPT-3.
[0] https://beta.openai.com/tokenizer
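The discrepancy is unsurprising given how BPE works: the split depends entirely on the learned merge table, so different models tokenize the same string differently. A toy sketch (with a made-up merge table, not OpenAI's) of the core loop:

```python
# Toy BPE: repeatedly apply the highest-priority adjacent merge.
# The merge table here is invented for illustration; real tokenizers
# also differ in pre-tokenization, which is why "Apoploe" can split
# as apo-plo-e under one merge table and ap-opl-oe under another.
def bpe(word, merges):
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        best_rank, i = min(pairs, default=(float("inf"), -1))
        if best_rank == float("inf"):
            break  # no learned merge applies
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

merges = [("a", "p"), ("ap", "o"), ("p", "l"), ("pl", "o")]
print(bpe("apoploe", merges))  # ['apo', 'plo', 'e']
```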
I wonder how their advertised "vector database" works. kNN combined with embeddings from pre-trained deep learning models can be very useful for information retrieval, (e.g. searching for duplicate/similar images or text).
In the past I have used a k-d tree [1] for this, which allows O(log n) searches in the vector space. It seems they are offering a k-d-tree-as-a-service.
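The workflow I mean is roughly this (using SciPy's k-d tree as a stand-in; low-dimensional embeddings for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of kNN retrieval over embeddings with a k-d tree:
# build the index once, then run cheap nearest-neighbor queries.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 8))      # n items, 8-D embeddings
tree = cKDTree(embeddings)

# query with a slightly perturbed copy of item 42;
# its nearest neighbor should be item 42 itself
query = embeddings[42] + 0.01 * rng.normal(size=8)
dist, idx = tree.query(query, k=5)             # 5 nearest neighbors
print(idx[0])  # 42
```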
Pinecone stores and searches through dense vector embeddings using a proprietary ANN index. It also has live index updates and metadata filtering, which you’d expect from any database but are surprisingly hard to find or do with vector indexes.
As you said, common use cases include deduplication and image search, and especially semantic search (text).
One quick note: k-d trees are great for indexing low-dimensional data, but for high-dimensional embeddings they tend to be a poor indexing choice since you'll end up visiting more nodes in the tree than you'd like. I found [1] to be a great overview of different indexing types for high-dimensional vectors and the advantages of each.
For image retrieval, have you tried using a model trained with contrastive learning (e.g. SimCLR)? This could produce better embeddings for retrieval, since the model is explicitly trained to minimize the Euclidean distance between similar pairs.
Thanks for the reference! Nice outline of various ANN approaches.
I haven't tried SimCLR, but I did try face embedding models trained with contrastive and triplet loss. For applications where precision is the key metric, I do agree that these loss functions are much better overall.
If discovery or recall is what you're after, a generic image classification model trained with binary cross-entropy might be better. For example, performing reverse image search on a photo of a German Shepherd should always return images of GSheps in the first N pages, but showing other dog breeds in later pages and possibly even cats after that would be a desirable feature for many search/retrieval solutions. An embedding model trained with contrastive loss might have this behavior to a certain extent, but a model based on BCE should be better.
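For concreteness, the margin-based contrastive loss being discussed in this thread looks roughly like this (a minimal NumPy sketch, not any particular library's API): matching pairs are pulled together, mismatched pairs are pushed at least `margin` apart, and anything already farther than the margin contributes nothing, which is part of why recall on "related but not identical" items can suffer.

```python
import numpy as np

# Margin-based contrastive loss over pairs of embeddings.
# same[i] = 1 if (a[i], b[i]) is a matching pair, else 0.
def contrastive_loss(a, b, same, margin=1.0):
    d = np.linalg.norm(a - b, axis=1)                 # Euclidean distances
    pos = same * d ** 2                               # pull matches together
    neg = (1 - same) * np.maximum(margin - d, 0) ** 2 # push non-matches apart
    return 0.5 * (pos + neg).mean()

a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.1, 0.0], [2.0, 0.0]])
same = np.array([1.0, 0.0])   # first pair matches, second does not
print(contrastive_loss(a, b, same))  # 0.0025
```

Note the second (non-matching) pair is already beyond the margin, so it contributes zero loss: there is no gradient signal ordering "dog vs. other dog" against "dog vs. cat", which matches the recall behavior described above.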
Einops looks nice! It reminds me of https://github.com/deepmind/einshape which is another attempt at unifying reshape, squeeze, expand_dims, transpose, tile, flatten, etc under an einsum-inspired DSL.
Somebody also realized that much of the time you can use one single function to describe all 3 of the einops operations. I present to you, einop: https://github.com/cgarciae/einop
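For anyone who hasn't used these libraries, the appeal is that one pattern string replaces a pile of reshape/transpose calls. Spelled out in plain NumPy (the einops patterns appear only as comments, so this runs without einops installed):

```python
import numpy as np

x = np.zeros((2, 4, 4, 3))          # batch, height, width, channels

# rearrange(x, 'b h w c -> b (h w) c')
flat = x.reshape(2, 4 * 4, 3)

# rearrange(x, 'b h w c -> b c h w')
chw = x.transpose(0, 3, 1, 2)

# reduce(x, 'b h w c -> b c', 'mean')
pooled = x.mean(axis=(1, 2))

print(flat.shape, chw.shape, pooled.shape)  # (2, 16, 3) (2, 3, 4, 4) (2, 3)
```

The pattern string documents the axis semantics at the call site, which is the main win over remembering which integer goes where in `transpose`.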
About a year ago I read this and attempted to build a similar model (with more layers ;)) using data I scraped from Hong Kong Jockey Club’s website. Although I used far fewer features, it still produced a profit on held-out races: https://teddykoker.com/2019/12/beating-the-odds-machine-lear.... Obviously there are many caveats when backtesting like this, but I thought it was a fun project!
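For anyone curious what such a backtest looks like, a bare-bones sketch (entirely simulated data, not my actual model or features): bet a flat stake whenever the model's win probability times the decimal odds clears a small edge threshold.

```python
import numpy as np

# Toy expected-value backtest on simulated races.
# p_model: model's win probabilities; odds: decimal odds with noise,
# so the "edge" appears when the market misprices a horse.
rng = np.random.default_rng(0)
n = 1000
p_model = rng.uniform(0.05, 0.4, n)
odds = 1.0 / np.clip(p_model + rng.normal(0, 0.05, n), 0.02, 0.95)
won = rng.uniform(size=n) < p_model        # simulated outcomes

bets = p_model * odds > 1.05               # bet only with a ~5% edge
profit = np.where(won, odds - 1.0, -1.0)   # flat $1 stake payoff
print(round(profit[bets].sum(), 2))
```

The caveats mentioned above (held-out leakage, odds moving after you bet, track take) all live outside this loop, which is exactly why backtests like this flatter the strategy.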
I had a coworker who would prepare for weeks for stakes races and follow a few second tier horses as well. Twitter made some aspects easier as there is a racetrack Twitter community. His specialty was identifying exactas where a long shot would place or win with a favorite.
He’d pay people to film workouts at Belmont and Saratoga and tweak his model (an Excel spreadsheet) based on what he saw. He would develop a sense based on the workouts, weather, etc. and would pick 4-10 races a week.
[1] https://github.com/PASSIONLab/OpenEquivariance
[2] https://arxiv.org/abs/2504.16068
[3] https://arxiv.org/abs/2508.16067