Hacker News | teddykoker's comments

OpenEquivariance [1] is another good baseline, with kernels for the Clebsch-Gordan tensor product and convolution, and it is fully open source. Both kernels have been successfully integrated into existing machine learning interatomic potentials, e.g. [2,3].

[1] https://github.com/PASSIONLab/OpenEquivariance

[2] https://arxiv.org/abs/2504.16068

[3] https://arxiv.org/abs/2508.16067
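
For reference, a minimal sketch of the operation these kernels accelerate, using e3nn's reference implementation (the irreps below are arbitrary, chosen only for illustration):

    # Sketch of the Clebsch-Gordan tensor product that kernels like
    # OpenEquivariance accelerate, via e3nn's reference implementation.
    import torch
    from e3nn import o3

    irreps_in1 = o3.Irreps("8x0e + 8x1o")  # e.g. node features
    irreps_in2 = o3.Irreps("0e + 1o")      # e.g. edge spherical harmonics
    irreps_out = o3.Irreps("8x0e + 8x1o")

    tp = o3.FullyConnectedTensorProduct(irreps_in1, irreps_in2, irreps_out)

    x = irreps_in1.randn(16, -1)  # 16 "edges"
    y = irreps_in2.randn(16, -1)
    out = tp(x, y)                # shape: (16, irreps_out.dim)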


A related line of work is "Thinking Like Transformers" [1]. They introduce a primitive programming language, RASP, composed of operations that transformer components can model, and demonstrate how different programs can be written in it, e.g. histograms and sorting. Sasha Rush and Gail Weiss have an excellent blog post on it as well [2]. Follow-on work demonstrated how RASP-like programs can actually be compiled into model weights without training [3].

[1] https://arxiv.org/abs/2106.06981

[2] https://srush.github.io/raspy/

[3] https://arxiv.org/abs/2301.05062
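
As a rough illustration (NumPy, not actual RASP syntax), the histogram program amounts to a select on token equality followed by a count, which mirrors what a single attention head can compute:

    # Rough sketch of the RASP "histogram" program: select positions with
    # equal tokens, then count how many positions each one attends to.
    # (Real RASP expresses this with select / selector_width primitives.)
    import numpy as np

    tokens = np.array(list("hello"))

    # select(tokens, tokens, ==): attention pattern over equal tokens
    selection = tokens[:, None] == tokens[None, :]

    # selector_width: number of positions each position attends to
    hist = selection.sum(axis=1)
    print(dict(zip(tokens, hist)))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}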


Huge fan of RASP et al. If you enjoy this space, it might be fun to take a glance at some of my work on HandCrafted Transformers [1], wherein I hand-pick the weights in a transformer model to do longhand addition, similar to how humans learn to do it in grade school.

[1] https://colab.research.google.com/github/newhouseb/handcraft...


It seems like a functional language like Haskell would be the right tool for this.

Also, going from a net to code would be super interesting in terms of explainability.


See also Radio2Speech [1], which uses a U-Net to recover audio from an RF beam.

[1] https://zhaorunning.github.io/Radio2Speech/


Link to paper [1]. It looks like the authors construct a basic autoencoder to predict frames in videos of various systems (double pendulum, lava lamp, etc.), and then use the Levina–Bickel algorithm [2] to estimate the "intrinsic dimension" of the autoencoder's latent space. They then refer to this intrinsic dimension as the "minimum number of variables required by the system to accurately capture the motion", e.g. 24 for a video of a fireplace.

Personally, I wonder how much information this actually provides about a system. Since the neural network is non-linear, a single latent variable may theoretically function as more than one state variable.

[1] https://www.nature.com/articles/s43588-022-00281-6.epdf?shar...

[2] https://www.stat.berkeley.edu/~bickel/mldim.pdf
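
For the curious, a minimal sketch of the Levina–Bickel estimator [2], assuming the data is an (n_points, n_features) array; T_j denotes the distance from a point to its j-th nearest neighbor:

    # Levina-Bickel MLE of intrinsic dimension:
    # m_hat(x) = [ (1/(k-1)) * sum_{j=1}^{k-1} log(T_k(x) / T_j(x)) ]^{-1}
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def levina_bickel(X, k=10):
        # k+1 neighbors because each point's nearest neighbor is itself
        dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        T = dists[:, 1:]  # drop the zero self-distance; T[:, j-1] = T_j
        m_hat = 1.0 / np.log(T[:, -1:] / T[:, :-1]).mean(axis=1)
        return m_hat.mean()

    # Sanity check: points on a 2D subspace embedded in 10 dimensions
    X = np.random.randn(2000, 2) @ np.random.randn(2, 10)
    print(levina_bickel(X))  # should be close to 2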


Chess experts are only good at remembering valid chess positions that could result from a game in play. Their “expertise” vanishes when trying to remember chess boards set up in random (invalid) positions.

It could be that each of these situations has only a few valid states, which can be compactly encoded, and the autoencoders found them.

This could be highly useful.


According to [1], the byte pair encoding for “Apoploe vesrreaitais” (the words producing bird images) is "apo, plo, e</w>, ,ve, sr, re, ait, ais</w>", and Apo-didae & Plo-ceidae are families of birds.

[1] https://twitter.com/barneyflames/status/1531736708903051265?...
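
One way to check a BPE split yourself is with the open-source CLIP tokenizer on Hugging Face; whether DALL-E 2 uses exactly this vocabulary is an assumption (see the sibling comments):

    # Inspect the BPE segmentation produced by the open-source CLIP
    # tokenizer (may differ from whatever DALL-E 2 actually uses).
    from transformers import CLIPTokenizer

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
    print(tok.tokenize("Apoploe vesrreaitais"))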


On the other hand, the OpenAI tokenizer gives me a different tokenization: ap - opl - oe [0]. If you capitalize the A, the result is A - pop - loe. The DALL-E 2 paper only specifies that it uses a BPE encoding; I would assume they used the same one as for GPT-3.

[0] https://beta.openai.com/tokenizer


If they use BPE dropout, then the split can be different and not unique.

And for the record, they used BPE dropout for DALL-E 1; see https://arxiv.org/pdf/2102.12092.pdf
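
A toy illustration of the idea (the merge table here is made up): each merge is skipped with some probability p during training, so the same string can be segmented differently on different passes:

    # Toy BPE dropout: apply ordered merges, skipping each candidate
    # merge with probability p, so segmentations vary across runs.
    import random

    merges = [("p", "o"), ("p", "l"), ("a", "po"), ("pl", "o")]  # made up

    def bpe(word, p=0.0):
        pieces = list(word)
        for left, right in merges:
            i = 0
            while i < len(pieces) - 1:
                if (pieces[i], pieces[i + 1]) == (left, right) \
                        and random.random() >= p:
                    pieces[i:i + 2] = [left + right]
                else:
                    i += 1
        return pieces

    print(bpe("apoplo"))         # deterministic: ['apo', 'plo']
    print(bpe("apoplo", p=0.5))  # varies run to run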


I believe they only apply it during training.


Right, that is my point. It is hard to know which token combination triggers the current tokenization to be interpreted as a bird.


I wonder how their advertised "vector database" works. kNN combined with embeddings from pre-trained deep learning models can be very useful for information retrieval, (e.g. searching for duplicate/similar images or text).

In the past I have used a k-d tree [1] for this, which allows O(log n) searches in the vector space. It seems they are offering a k-d-tree-as-a-service.

[1] https://en.wikipedia.org/wiki/K-d_tree
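
For example, with SciPy (array shapes are illustrative):

    # Exact nearest-neighbor search over embeddings with a k-d tree;
    # queries are O(log n) in low dimensions.
    import numpy as np
    from scipy.spatial import cKDTree

    embeddings = np.random.randn(10_000, 8).astype(np.float32)
    tree = cKDTree(embeddings)

    query = np.random.randn(8)
    distances, indices = tree.query(query, k=5)  # 5 nearest neighbors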


Pinecone stores and searches through dense vector embeddings using a proprietary ANN index. It also has live index updates and metadata filtering, which you’d expect from any database but is surprisingly hard to find or do with vector indexes.

As you said, common use cases include deduplication and image search, and especially semantic search (text).


Do you happen to know other implementations that allow for live updates and metadata filtering like Pinecone?




> kNN combined with embeddings from pre-trained deep learning models can be very useful for information retrieval

Indeed! We've been able to build simple reverse image search apps and other solutions using the power of embeddings from pre-trained ML models: https://gist.github.com/fzliu/c9380a7f9ba411adeff0b727cdba15....

One quick note: k-d trees are great for indexing low-dimensional data, but for high-dimensional embeddings they tend to be a poor indexing choice since you'll end up visiting more nodes in the tree than you'd like. I found [1] to be a great overview of different indexing types for high-dimensional vectors and the advantages of each.

[1] https://milvus.io/docs/index.md
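
As a concrete point of comparison, here is a sketch using a FAISS IVF index, one of the ANN index families that overview covers; nlist and nprobe are illustrative, not tuned:

    # Approximate nearest-neighbor search with FAISS IVFFlat, which
    # scales to high-dimensional embeddings better than a k-d tree.
    import numpy as np
    import faiss

    d = 512                                  # embedding dimension
    xb = np.random.randn(100_000, d).astype("float32")

    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, 1024)  # 1024 coarse clusters
    index.train(xb)
    index.add(xb)

    index.nprobe = 8                         # clusters searched per query
    D, I = index.search(xb[:5], 10)          # top-10 neighbors, 5 queries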


For image retrieval, have you tried using a model trained with contrastive learning (e.g. SimCLR)? This could produce better embeddings for retrieval, since the model is explicitly trained to pull the embeddings of similar pairs together.
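
For reference, a sketch of a pairwise contrastive loss in that spirit (Hadsell et al. style; SimCLR itself uses the cosine-similarity-based NT-Xent loss rather than a euclidean margin loss):

    # Pairwise contrastive loss: pull similar pairs together in euclidean
    # distance, push dissimilar pairs apart up to a margin.
    import torch
    import torch.nn.functional as F

    def contrastive_loss(z1, z2, same, margin=1.0):
        # z1, z2: (batch, dim) embeddings; same[i] = 1.0 if pair i is
        # similar, 0.0 otherwise
        dist = F.pairwise_distance(z1, z2)
        return (same * dist.pow(2)
                + (1 - same) * F.relu(margin - dist).pow(2)).mean()

    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    same = torch.randint(0, 2, (8,)).float()
    loss = contrastive_loss(z1, z2, same)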

Thanks for the reference! Nice outline of various ANN approaches.


I haven't tried SimCLR, but I did try face embedding models trained with contrastive and triplet loss. For applications where precision is the key metric, I do agree that these loss functions are much better overall.

If discovery or recall is what you're after, a generic image classification model trained with binary cross-entropy might be better. For example, performing reverse image search on a photo of a German Shepherd should always return images of GSheps in the first N pages, but showing other dog breeds in later pages and possibly even cats after that would be a desirable feature for many search/retrieval solutions. An embedding model trained with contrastive loss might have this behavior to a certain extent, but a model based on BCE should be better.


Also see Einops: https://github.com/arogozhnikov/einops, which uses an einsum-like notation for various tensor operations used in deep learning.

https://einops.rocks/pytorch-examples.html shows how it can be used to implement various neural network architectures in a simpler manner.
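
A few illustrative calls (shapes chosen arbitrarily):

    # rearrange, reduce, and repeat cover most reshape/transpose/pool/tile
    # patterns with one shape-checked notation.
    import torch
    from einops import rearrange, reduce, repeat

    x = torch.randn(2, 3, 32, 32)                 # batch, channels, h, w

    flat = rearrange(x, "b c h w -> b (h w) c")   # flatten spatial dims
    pooled = reduce(x, "b c h w -> b c", "mean")  # global average pool
    tiled = repeat(pooled, "b c -> b c n", n=4)   # broadcast a new axis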


Einops looks nice! It reminds me of https://github.com/deepmind/einshape which is another attempt at unifying reshape, squeeze, expand_dims, transpose, tile, flatten, etc under an einsum-inspired DSL.


Somebody also realized that much of the time you can use one single function to describe all 3 of the einops operations. I present to you, einop: https://github.com/cgarciae/einop


In addition, einsum in jax uses einops.


This might be what you’re looking for: https://gafniguy.github.io/4D-Facial-Avatars/


You can find Benter’s paper about the model here: https://www.gwern.net/docs/statistics/decision/1994-benter.p...

About a year ago I read this and attempted to build a similar model (with more layers ;)) using data I scraped from the Hong Kong Jockey Club’s website. Although I used far fewer features, it still produced a profit on held-out races: https://teddykoker.com/2019/12/beating-the-odds-machine-lear.... Obviously there are many caveats when backtesting like this, but I thought it was a fun project!


So basically the model was just a single layer neural net with extra proprietary data?


Data and meaning are the key.

I had a coworker who would prepare for weeks for stakes races and follow a few second-tier horses as well. Twitter made some aspects easier, as there is a racetrack Twitter community. His specialty was identifying exactas where a long shot would place or win with a favorite.

He’d pay people to film workouts at Belmont and Saratoga and tweak his model (an Excel spreadsheet) based on what he saw. He would have a sense based on the workouts, weather, etc. and would pick 4-10 races a week.


Yup! Just a multinomial logistic regression model with a bunch of proprietary data and engineered features.
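
For anyone curious what that looks like: each race becomes a softmax over its horses' scores, trained against the actual winner. A minimal sketch with made-up features and shapes:

    # Multinomial (conditional) logit over the horses in each race.
    import torch
    import torch.nn.functional as F

    n_races, n_horses, n_features = 1000, 14, 20
    x = torch.randn(n_races, n_horses, n_features)   # per-horse features
    winner = torch.randint(0, n_horses, (n_races,))  # winning horse index

    w = torch.zeros(n_features, requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.01)

    for _ in range(100):
        scores = x @ w                        # (n_races, n_horses)
        loss = F.cross_entropy(scores, winner)
        opt.zero_grad()
        loss.backward()
        opt.step()

    win_prob = F.softmax(x @ w, dim=1)        # model's win probabilities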

