More

komuW · on Dec 5, 2024

Take a look at FSRS.

The paper: https://dl.acm.org/doi/10.1145/3534678.3539081?cid=996605471...

Various implementations: https://github.com/open-spaced-repetition/awesome-fsrs

Some benchmarks of various srs algorithms: https://expertium.github.io/Benchmark.html

komuW · on Oct 10, 2024

Have you looked at Tyk? https://github.com/TykTechnologies/tyk/

komuW · on Sept 4, 2024

Location: Nairobi, Kenya

Remote: Yes

Willing to relocate: No

Technologies: Golang, Python, Django, SQL, PostgreSQL, AWS

Résumé/CV: https://www.komu.engineer/cv/komu-CV.pdf

Email: komuw05@gmail.com

Blog: https://www.komu.engineer/blog

I'm a software engineer with ~8yrs experience. Here's somethings that I've done;

- Reduced cloud costs for a production kubernetes service by 20%. This was done by analysing pod utilisation on kubernetes and then right-sizing the pods to match the expected usage.

- Read the SMPP specification and implemented a client for it from scratch. Integrated the client into a production application and used it to send out SMS at averaget rates of 800/second.

- Created an open source ACME client based off the ACME specification.

komuW · on Aug 2, 2024

Hi, could you guys please reply to job applications. I had applied in April and up to now I haven't received even a rejection email.

smilliken · on Aug 2, 2024

Apologies. I'll double check with the team and figure out why you didn't receive a response.

komuW · on April 4, 2024

My solution[1] to this problem is to do what they did in the Apollo Guidance Computer; log to a ring buffer and only flush it (to disk or wherever) on certain conditions.

1. https://www.komu.engineer/blogs/09/log-without-losing-contex...

komuW · on Feb 5, 2024

Nice article. If you ever wondered how the python equivalent(celery) works, I wrote a blogpost about that ~4years ago; https://www.komu.engineer/blogs/07/understand-how-celery-wor...

komuW · on Feb 13, 2023

My favorite technique for reducing the cost of logging is the same technique that was employed in The Apollo Guidance Computer(though I'm not sure if they did it for cost purposes).

To quote from Annotations to Eldon Hall's Journey to the Moon[1]: "The Coroner recorded every instruction executed, with its inputs and results, writing over the oldest record when it filled up. When a program crashed, you could punch out a full record of what it was doing in most of its last second and analyze the problem at your ease. I have often wished that PCs offered such an advanced feature."

So essentially buffer all logs into an in-memory circular buffer of capacity N. If a log record is emitted that has a certain severity/level; flush all records from the buffer to disk/clickHouse/grafana/whatever.

The python MemoryHandler[2] almost implements the said technique, except that it also flushes when buffer is full; which is not particularly what I would want.

I also wrote a blogpost[3] about how to log without losing money or context, ~3yrs ago.

1. https://authors.library.caltech.edu/5456/1/hrst.mit.edu/hrs/...

2. https://github.com/python/cpython/blob/v3.11.1/Lib/logging/h...

3. https://www.komu.engineer/blogs/09/log-without-losing-contex...

twic · on Feb 13, 2023

I had a program which occasionally segfaulted (and even raised a SIGILL once, i forget how). By the time it segfaults, it's too late to get logging out (easily, at least). But i didn't want to write an ever-growing log of everything.

So, i did something a bit like the coroner. When the program started, it created a fresh log file, extended it to a certain size, and memory-mapped it. It then logged into this buffer, with new logging overwriting old (it wasn't actually a circular buffer; the program dropped a big blob of logging into the buffer at the top of its main loop).

While alive, the process never closed or msynced the mapping, and it was fixed size, so the kernel was under no particular pressure to write the contents to disk. But when the process crashed, the kernel would preserve the contents.

I admit i never benchmarked this, so i don't know whether it actually avoided excessive writes. But it seemed like a neat idea in principle!

kevin_nisbet · on Feb 13, 2023

It's been quite a few years, but when I started in telco, we had a vendor product that I think sort of worked like this. They were using a "micro-services" architecture within their nodes before it became popular. They also used a crash-only approach to writing software. So lots of asserts for unhandled / unexpected cases.

As I remember it, they wrote their crash handler to include the ring buffer of recent messages sent to the services. So whenever they'd get into an unexpected state, they'd just crash the process, and collect the ring buffer of recent messages along with the other normal things in a mini core. Made it so easy to track down those unexpected / corner cases in that platform.

foobiekr · on Feb 13, 2023

This is a very common practice in embedded code, generally three things:

1. A ring of log-like objects (obviously not rendered strings, since that is a waste of CPU) that can be optionally included in a crash report in structured form that can be dissected later.

2. Compiler-generated enter/exit counters and corresponding table per module, modules linking themselves as init time to the master table, for performance counters [invocations or time spent]; dumpable on demand; lightweight and always on

3. a ring of logs - these actually being rendered logs plus indices into (1) - that have been otherwise rendered, so the retention cost is minimal and you can map back to log files otherwise provided.

The distinction between (1) and (3) should be obvious, but in case it is not, short circuiting log rendering for logs that should otherwise be dropped is a very important practice to avoid debug-level logs consuming the majority of CPU time.

Traditionally, all of these are trivially inspectable in a core dump, but usually you'd like a reduced crash report instead: less wear and tear on the flash and easier for users [and bug management systems] to juggle. Crash reports and cores obviously need to include an unambiguous version [typically a hash of the code rather than a manually managed version #, for dynamically linked ELFs, fingerprints of all libraries as well]; for cores you just make sure to compute this at start and keep it in memory reachable from a pointer out of main().

pmalynin · on Feb 13, 2023

Modern CPUs do actually offer this for the most part, it’s called time travel debugging. Intel’s offering is called Intel Processor Trace. Although it’s not full input output logging.

foobiekr · on Feb 13, 2023

As a rule, it is better to have developers learn one way to work, rather than N.

A problem with time travel debugging is that you generally can't use it in production [of course, there are people who think devs should have direct access to prod, for them there is no help], and you 100% cannot use it for anything deployed at a customer (so for embedded, devices, actual non-SAAS software etc. etc.).

It's better to shore up your tools so that the workflow is very straightforward and leave stuff like time travel for people doing work on a very narrow subset of very hard to understand bugs.

komuW · on Jan 15, 2023

Note that dumping the Vault's process memory is beyond hashicorp/Vault's threat model. See: https://github.com/hashicorp/vault/issues/1446#issuecomment-...

I'm bringing this up because the circleCI blogpost says that the attacker did memory-dump encryption keys from a running process. See https://circleci.com/blog/jan-4-2023-incident-report/

So even if they were using hashicorp/vault, the attacker could probably still have been able to mem-dump vault's process.

robszumski · on Jan 15, 2023

You can run Vault inside of an enclave to protect it's memory: https://edgebit.io/enclaver/docs/0.x/guide-vault/

komuW · on Jan 15, 2023

Note that dumping the Vault's process memory is beyond hashicorp/Vault's threat model. See: https://github.com/hashicorp/vault/issues/1446#issuecomment-...

komuW · on Jan 15, 2023

From the circleCI blogpost[1]: "Our investigation indicates that the malware was able to execute session cookie theft, enabling them to impersonate the targeted employee in a remote location"

I haven't seen much discussion on how this specific attacker entrypoint can be mitigated. So I'm going to make a naive attempt in this comment.

How about storing the client's IP address in the session cookie. Then whenever the server recieves the cookie, it compares the client's IP address against the one stored in the session cookie. The server denies the login if there's a mismatch. The cookie would of-course have to be signed(hmac etc) so that it is tamper proof.

One problem with this is that client IP addresses are easily spoofed[2].

So, instead of storing the client's IP address; how about we instead store the clients' SSL fingerprints[3][4]. I haven't looked much into the literature, but I think those fingerprints are hard to spoof.

1. https://circleci.com/blog/jan-4-2023-incident-report/

2. https://adam-p.ca/blog/2022/03/x-forwarded-for/

3. https://github.com/salesforce/hassh

4. https://github.com/salesforce/ja3

mschuster91 · on Jan 15, 2023

> How about storing the client's IP address in the session cookie. Then whenever the server recieves the cookie, it compares the client's IP address against the one stored in the session cookie. The server denies the login if there's a mismatch.

That doesn't work in environments with multiple NAT origin IPs in place, or when they're using crap like Netskope/some other "security"/"privacy"/"VPN" software, as IPs tend to randomly change with these. It would generate way too many false-positive reports.

> One problem with this is that client IP addresses are easily spoofed[2].

Only if the backend servers are badly set up. For me, I always run haproxy as the frontend and forcibly delete incoming headers (X-Forwarded-*, Forwarded), and as an added precaution the backend software is configured to only trust the haproxy origin IPs - so even in the case an attacker manages somehow to directly access the backend servers directly, they cannot get a spoofed IP past the system.

> So, instead of storing the client's IP address; how about we instead store the clients' SSL fingerprints

That requires client-side SSL authentication, which is theoretically supported by all major browsers, but very rarely used and the UI support is... clunky at best.

komuW · on Jan 15, 2023

> That requires client-side SSL authentication,

I do not think it requires client side SSL. See: https://engineering.salesforce.com/tls-fingerprinting-with-j...

What is been fingerprinted is the TLS negotiation between client and server.

mschuster91 · on Jan 15, 2023

That fingerprints a piece of software (and, if SSL library versions or configurations change, the version), but not a specific client. It's virtually worthless as a security measure if the endpoint is a common browser.

eigenvalue · on Jan 15, 2023

These issues would be addressed by something like Tailscale, but of course they also got hacked recently…

AYBABTME · on Jan 15, 2023

Then the malware will proxy via your machine.

komuW · on Jan 15, 2023

yeah this just kicks the can down the road.