Does elevenlabs have a real-time conversational voice model? It seems like their focus is largely on text to speech and speech to text. That can approximate that type of thing, but it's not at all the same as the native voice-to-voice that 4o does.
[disclaimer, i work at elevenlabs] we specifically went with a cascading model for our agents platform because it's better suited for enterprise use cases where they have full control over the brain and can bring their own llm. with that said, even with a cascading model, we can capture a decent amount of nuance with our asr model, and it also supports capturing audio events like laughter or coughing.
a true speech to speech conversational model will perform better on things like capturing tone, pronunciations, phonetics, etc, but i do believe we'll also get better at that on the asr side over time.
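to make the distinction concrete, here's a rough sketch of what one cascading turn looks like; the function names below are placeholders for illustration, not our actual APIs:

```python
# rough sketch of one cascading voice-agent turn (placeholder functions,
# not actual ElevenLabs APIs): audio -> ASR -> LLM -> TTS -> audio.
# a native speech-to-speech model would collapse all three stages into one.

def transcribe(audio: bytes) -> tuple[str, list[str]]:
    """ASR stage: transcript plus detected audio events (laughter, coughing)."""
    raise NotImplementedError("plug in an ASR model here")

def generate_reply(transcript: str, events: list[str]) -> str:
    """the 'brain': any LLM the customer brings."""
    raise NotImplementedError("plug in an LLM here")

def synthesize(text: str) -> bytes:
    """TTS stage: reply text back to audio."""
    raise NotImplementedError("plug in a TTS model here")

def handle_turn(audio_in: bytes) -> bytes:
    transcript, events = transcribe(audio_in)
    reply = generate_reply(transcript, events)
    return synthesize(reply)
```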
> Does elevenlabs have a real-time conversational voice model?
Yes.
> It seems like their focus is largely on text to speech and speech to text.
They have two broad offerings (“Platforms”); you seem to be looking at what they call the “Creative Platform”. The real-time conversational piece is the centerpiece of the “Agents Platform”.
Substack to me seems to be 40% self-promotion or advertising a service, 40% long-form LinkedIn posts / AI slop, and the remaining 20% is behind a subscription with eventual freebies. Mostly professional writing. It’s far from being a new blogspot.
I agree. Substack feels more like Op Ed writers realised they could make more money by self publishing than by staying at a dying media company with multiple levels of editorial oversight.
To do well on Substack you need to publish pretty regularly, several times a week to keep and build an audience, and the only thing anyone can generate that fast are opinions. So Substack has really just become a decentralised Op Ed page.
Decentralized and expensive. Maybe I’m looking at the wrong blogs, but my impression so far is that a lot of subscriptions are around $5-10 monthly for a single creator. I can get a ton of newspapers (ok, not papers, websites), magazines, etc. for that price or better, and those have way more than one contributor. The video platform Nebula, for example, has 175 creators for $6/month.
It does seem to work for a lot of people, though. Good for them.
The minimum price is enforced by Substack, unfortunately. You can make everything free, but you can't charge, say, $1/month. It definitely pushes the platform toward writers who think "I want to make this my full-time job & income". It also suffers, to a lesser extent, from the Medium problem of way too many people thinking it is some kind of get-rich-quick thing. Somehow the Reddit algorithm started showing me the Substack subreddit, which seemed to mostly be pretty new authors complaining that they aren't making much money from Substack.
That explains a lot. Thank you! What a weird business decision on their part. I would guess the minimum has something to do with payment processing overhead, but Patreon handles $1-2 monthly payments no problem and always has. Strange.
There was a brief moment where it felt like a really fresh take on blogging, but the number of big names that have come in makes it feel a lot less "wild west, anyone could go big on this platform". The addition of Substack Notes (essentially, X / Twitter built in to Substack) also sort of brings it down to earth. It's hard to pretend your "longform reading" is more sophisticated than the low-attention-span Tiktok masses when you've got Twitter baked in.
I still like Substack overall, there is a vibe over there that I certainly like more than Twitter or Instagram. But it also has that air of snooty elitist nerdism that characterized the middle days of Twitter - and it seems like the level of get rich quick self promotion is at least in line with the rest of the net.
Ollama does heavily quantize models and has a very short context window by default, but this has not been my experience with unquantized, full-context versions of Llama 3.3 70B and, particularly, DeepSeek R1, and that is reflected in the benchmarks. For instance, I used DeepSeek R1 671B as my daily driver for several months, and it was on par with o1 and unquestionably better than GPT-4o (o3 is certainly better than all, but typically we've seen open-source models catch up within 6-9 months).
Please shoot me an email at tanya@tinfoil.sh, would love to work through your use cases.
To your point: do I understand correctly that, for example, when running the default Llama 4 model via ollama, the context window is very short even though the model's context is, like, 10M? And that in order to "unlock" the full context version, I need to get the unquantized version?
For reference, here's what `ollama show llama4` returns:
- parameters 108.6B # llama4:scout
- context length 10485760 # 10M
- embedding length 5120
- quantization Q4_K_M
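From what I can tell from the docs (so this is an assumption on my part, happy to be corrected), quantization and context are separate knobs: the Q4_K_M quantization is baked into the downloaded weights, while the context window is a runtime option that defaults to something small unless you raise num_ctx per request, e.g. with the official Python client:

```python
# sketch: requesting a larger context window from a local Ollama server
# (assumes `ollama serve` is running and `pip install ollama`; the num_ctx
# value below is illustrative; real limits depend on the model and your VRAM)
import ollama

response = ollama.chat(
    model="llama4",
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
    options={"num_ctx": 32768},  # override the small default context window
)
print(response["message"]["content"])
```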
Yes, in the US right now. We don't run our own datacenters, though we sometimes consider it in a moment of frustration when the provider is not able to get the correct hardware configuration and firmware versions. Currently renting bare metal servers from neoclouds. We can't use hyperscalers because we need bare metal access to the machine.
That's the best part, you don't. You only need to trust NVIDIA and AMD/Intel.
Modulo difficult-to-mount physical attacks and side channels, which we wrote more about here: https://tinfoil.sh/blog/2025-05-15-side-channels
We're not competing with Gemini or OpenAI or the big cloud providers. For instance, Google is partnering with NVIDIA to ship Gemini on-prem to regulated industries in a CC environment to protect their model weights as well as for additional data privacy on-prem: https://blogs.nvidia.com/blog/google-cloud-next-agentic-ai-r...
We're simply trying to bring similar capabilities to other companies. Inference is just our first product.
> cloud provider can SSH into the VM
The point we were making was that CC was traditionally used to remove trust from cloud providers, but not the application provider. We are further removing trust from ourselves (as the application provider), and we can enable our customers (who could be other startups or neoclouds) to remove trust from themselves and prove that to their customers.
There are a multitude of components between my app and your service. You have secured one of them, arguably the least important. But you can't provide any guarantees over, say, your API server that my requests are going through, or your networking stack, which someone, e.g. a government, could MITM.
I don't know anything about "secure enclaves" but I assume that this part is sorted out. It should be possible to use http with it I imagine. If not, yeah it is totally dumb from a conceptual standpoint.
>Since you rightly open-sourced the code (AGPL) is there anything stopping the cloud vendors from running and selling access to their own instances of your server-side magic?
Sure, they can do that. Despite being open source, CC-mode on GPUs is quite difficult to work with, especially when you start thinking about secrets management, observability, etc., so we’d actually like to work with smaller cloud providers who want to provide this as a service and become competitive with the big clouds.
>Is your secret sauce the tooling to spin up and manage instances and ease customer UX?
Pretty much. Confidential computing has been around a while, and we still don’t see widespread adoption of it, largely because of the difficulty. If we're successful, we absolutely expect there to be a healthy ecosystem of competitors both cloud provider and startup.
>Do you envision an exit strategy that sells that secret sauce to a cloud provider or confidential computing middleware provider?
We’re not really trying to be a confidential computing provider, but rather a verifiably private layer for AI, which means we will try to make integration points as seamless as possible. For inference, that meant OpenAI API-compatible client SDKs; we will eventually do the same for training/post-training, MCP/the OpenAI Agents SDK, etc. We want our integration points to be closely compatible with existing pipelines.
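To illustrate what "OpenAI API compatible" means in practice (the endpoint URL and model name below are placeholders, not our actual values): you keep the standard OpenAI SDK and just point it at a different base URL, leaving the rest of your code unchanged.

```python
# illustrative only: the standard OpenAI Python SDK pointed at an
# OpenAI-compatible endpoint; the base_url and model name are placeholders
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="some-hosted-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```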
> Confidential computing has been around a while, and we still don’t see widespread adoption of it, largely because of the difficulty
This is not the reason at all. Complexity and difficulty are inherent to large companies.
It's because it is a very low priority in an environment where, for example, there are tens of thousands of libraries in use, dozens of which will be in production with active CVEs. And there are many other examples of similar security and risk-management issues that companies have to deal with.
Worrying about the integrity of the hardware or not trusting my cloud provider who has all my data in their S3 buckets anyway (which is encrypted using their keys) is not high on my list of concerns. And if it were I would be simply running on-premise anyway.