It lets you hook into various points in the kernel; ultimately you need to learn how the Linux kernel is structured to make the most of it.
Unlike a module, it can only really read data, not modify data structures, so it's nice for things like tracing kernel events.
The XDP subsystem in particular is designed to let you apply filters to network data before it reaches the network stack, but it still doesn't give you the same level of control or performance as DPDK, since the data still has to go through the kernel.
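To make the read-only tracing use case concrete, here's a minimal sketch of an eBPF program in the libbpf style (assumes a vmlinux.h generated with bpftool and a libbpf build setup; the function name handle_exec is just illustrative). It observes exec events but can't rewrite any kernel state:

    // Sketch: trace execve calls via a tracepoint (read-only observation).
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    SEC("tracepoint/syscalls/sys_enter_execve")
    int handle_exec(struct trace_event_raw_sys_enter *ctx)
    {
        char comm[16];
        bpf_get_current_comm(&comm, sizeof(comm));  // read who is execing
        bpf_printk("execve by %s (pid %d)\n", comm,
                   (int)(bpf_get_current_pid_tgid() >> 32));
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";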
Yep (the 0x.tools author here). If you look at my code, you'll see that I'm not a good developer :-) But I have a decent understanding of Linux kernel flow and kernel/app interaction dynamics, thanks to many years of troubleshooting large (Oracle) database workloads. So I knew exactly what I wanted to measure and how; I just had to learn the eBPF parts. That's why I picked BCC instead of libbpf, as I was somewhat familiar with it already, but a fully dynamic and "self-updating" libbpf loading approach is the goal for v3 (help appreciated!)
Yeah, I already see limitations. The latest one came up yesterday, when I installed earlier Ubuntu versions to see how far back this can go - even Ubuntu 22.04 didn't work out of the box; I ended up with a BCC/kernel header mismatch issue [1] even though the kernel itself supported everything needed. A workaround is to download and compile the latest BCC yourself, but I don't want to go there, as the customers/systems I work on wouldn't go there anyway.
But libbpf with CO-RE should solve these issues, as I understand it: as long as the kernel supports what you need, the same CO-RE binary will work across kernel versions.
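Roughly, CO-RE works because field accesses are compiled as relocations that libbpf resolves at load time against the running kernel's BTF, so struct layout changes between kernels don't break the binary. A minimal sketch of the BPF side (the probe target and variable names are just illustrative):

    // CO-RE sketch: read task->real_parent->tgid portably. BPF_CORE_READ
    // emits relocations resolved against the running kernel's BTF at load
    // time, so no kernel headers are needed on the target machine.
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/do_nanosleep")
    int BPF_KPROBE(trace_sleep)
    {
        struct task_struct *task = (void *)bpf_get_current_task();
        pid_t ppid = BPF_CORE_READ(task, real_parent, tgid); // relocated read
        bpf_printk("sleeper's parent tgid: %d\n", ppid);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";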
This raises another issue for me, though: it's easier (though still not easy) for enterprises to download and run a single Python file plus a single C source file (with fewer than 500 lines of code to review) than a compiled CO-RE binary. My long-term plan/hope is that I (we) get the Red Hats and AWSes of this world to ship the eventual mature release as a standard package.
Myself I've only built simple things, like tracing sched switch events for certain threads, and killing the process if they happen (specifically designed as a safety for pinned threads).
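Hedged sketch of what that can look like (the map name watched_tids is illustrative, and this relies on the detail that at the sched_switch tracepoint the current task is still the thread being switched out, so bpf_send_signal(), available since kernel 5.3, hits the right process):

    // "Pinned thread tripwire": if a watched TID is ever switched out,
    // kill its process as a safety measure.
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 64);
        __type(key, u32);     // TID that must never be switched out
        __type(value, u8);
    } watched_tids SEC(".maps");

    SEC("tracepoint/sched/sched_switch")
    int on_switch(struct trace_event_raw_sched_switch *ctx)
    {
        u32 tid = (u32)bpf_get_current_pid_tgid();
        if (bpf_map_lookup_elem(&watched_tids, &tid))
            bpf_send_signal(9 /* SIGKILL */);  // tripwire fired
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";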
Same here, until now. I built the earlier xcapture v1 (also in the repo) about 5 years ago; it just samples various /proc/PID/task/TID pseudofiles regularly. That approach lets you get pretty far with thread-level activity measurement, especially when combined with always-on, low-frequency on-CPU sampling with perf.
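For anyone unfamiliar with the technique, this is the general shape of it (not xcapture's actual code; the PID/TID values are illustrative):

    /* Sketch of /proc sampling: periodically read each thread's stat file
     * to record its run state without any kernel-side instrumentation. */
    #include <stdio.h>
    #include <unistd.h>

    static void sample_thread(int pid, int tid)
    {
        char path[64], buf[512];
        snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", pid, tid);
        FILE *f = fopen(path, "r");
        if (!f)
            return;                    /* thread may have exited */
        if (fgets(buf, sizeof(buf), f))
            printf("%s", buf);         /* field 3 is the R/S/D state */
        fclose(f);
    }

    int main(void)
    {
        for (;;) {                     /* low-frequency sampling loop */
            sample_thread(1234, 1234); /* illustrative PID/TID */
            sleep(1);
        }
    }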
XDP, in its intended configuration, passes pointers to packets still on the driver DMA rings (or whatever) directly to BPF code, which can modify packets and forward them to other devices, bypassing the kernel stack completely. You can XDP_PASS a packet if you'd like it to hit the kernel, creating an skbuff, and bouncing it through all the kernel's network stack code, but the idea is that you don't want to do that; if you do, just use TC BPF, which is equivalently powerful and more flexible.
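A minimal sketch of that fast path (assuming a vmlinux.h/libbpf build; the drop rule is arbitrary, just for illustration). The verdict is returned before any skbuff exists; XDP_TX or bpf_redirect() would forward the frame the same way:

    // XDP sketch: act on the raw frame before the kernel stack sees it.
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int xdp_filter(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *end  = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;

        if ((void *)(eth + 1) > end)
            return XDP_PASS;           // bounds check for the verifier
        if (eth->h_proto != bpf_htons(0x0800 /* ETH_P_IP */))
            return XDP_PASS;           // hand non-IPv4 to the stack

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > end)
            return XDP_PASS;
        if (ip->protocol == 17 /* IPPROTO_UDP */)
            return XDP_DROP;           // dropped before any skbuff exists
        return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";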
XDP reads the data in the normal NAPI kernel way, integrating with the IRQ system etc., which might or might not be desirable depending on your use case.
Then if you want to forward data to userland, you still need to write it to a ring buffer and have your userland process poll it, at which point it's more akin to using io_uring.
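The kernel side of that handoff looks roughly like this (event layout and names are illustrative); userland pairs it with libbpf's ring_buffer__new() and ring_buffer__poll():

    // Ring-buffer handoff to userland (requires kernel 5.8+ for ringbuf).
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct event {
        u32 pid;
        char comm[16];
    };

    struct {
        __uint(type, BPF_MAP_TYPE_RINGBUF);
        __uint(max_entries, 256 * 1024);   // ring size in bytes
    } events SEC(".maps");

    SEC("tracepoint/sched/sched_process_exit")
    int on_exit(void *ctx)
    {
        struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
        if (!e)
            return 0;                      // buffer full: drop the event
        e->pid = bpf_get_current_pid_tgid() >> 32;
        bpf_get_current_comm(&e->comm, sizeof(e->comm));
        bpf_ringbuf_submit(e, 0);          // hand off; userland polls it
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";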
It's mostly useful if you can write your entire logic in your eBPF program without going through userland, so it's nice for various tracing applications, filters or security checks, but that's about it as far as I can tell.