
Yeah, precisely. Knowledge graphs are simple to think about, but as soon as you look into them you realize all the complexity is in creating a meaningful ontology and loading data into it. I actually think LLMs can be massively useful for loading data into the ontology, but probably not for creating the ontology itself (far too ambiguous and large a conceptual task for them right now).


How do we build ontologies using LLMs? Will the building blocks be like the different parts of a brain? P.S. I am assuming that by "creation of the ontology itself" you mean creation of AGI.


Ontologies just define what certain categories, words, and entity types mean. They are commonly used in NLP for data representation ("facts"/triples/etc.) in knowledge graphs and other places where defining an ontology helps provide structure.

This doesn’t have anything to do with AGI or brains. They are typically created or tuned by humans and then models fit/match/resolve entities to match the ontology.


@dbish nailed it, but I can give a more concrete example, continuing from the light example I started with: an ontology for knowing what city a person is currently in. We have two classes of entities, a person and a city, and a single relationship type, "LocatedAt". You can add and remove edges to indicate where a person is, and you can construct verification rules such as "a person can only be in one city at a time".
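A minimal sketch of that ontology in code (the class and method names are my own, purely illustrative): two entity classes, one LocatedAt relationship, and the one-city-at-a-time rule enforced by replacing any existing edge.

```python
class KnowledgeGraph:
    """Toy KG for the Person/City ontology described above."""

    def __init__(self):
        self.persons = set()
        self.cities = set()
        # LocatedAt edges: one per person, enforcing the
        # "a person can only be in one city at a time" rule.
        self.located_at = {}

    def add_person(self, name):
        self.persons.add(name)

    def add_city(self, name):
        self.cities.add(name)

    def set_location(self, person, city):
        if person not in self.persons or city not in self.cities:
            raise ValueError("unknown entity")
        # Overwriting the dict entry removes any previous LocatedAt edge.
        self.located_at[person] = city

    def location_of(self, person):
        return self.located_at.get(person)
```

Note that the verification rule here falls out of the data structure (a dict keyed by person can only hold one edge per person); a real KG store would enforce it as an explicit constraint.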

Now, to have an LLM construct a knowledge graph of where someone is (I know this example is incredibly privacy-invasive, but it's a simple, concrete one, not representative): imagine giving an LLM access to all of your text messages, with a prompt along the lines of "identify who is being discussed in this series of messages; if someone indicates where they are physically located, report that as well" (you'd want to try harder than that; I'm keeping it simple).

You could get an output like `{"John Adams": "Philadelphia, PA, US"}`. If the entity on either side is missing, create it. Then remove any LocatedAt edges for the left side and add one between these two entities. You now have a simple knowledge graph.
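Those update steps can be sketched as a small ingestion function (a sketch under my own assumptions: the graph is a plain dict of sets plus an edge dict, and the LLM's output is the JSON shown above):

```python
import json

def apply_extraction(graph, llm_output):
    """Fold one LLM extraction into the graph.

    graph: {"persons": set, "cities": set, "located_at": dict}
    llm_output: JSON string like '{"John Adams": "Philadelphia, PA, US"}'
    """
    for person, city in json.loads(llm_output).items():
        graph["persons"].add(person)        # create left side if missing
        graph["cities"].add(city)           # create right side if missing
        graph["located_at"][person] = city  # drop old LocatedAt edge, add new

graph = {"persons": set(), "cities": set(), "located_at": {}}
apply_extraction(graph, '{"John Adams": "Philadelphia, PA, US"}')
```

In practice you'd also validate the JSON against the ontology before ingesting it, since LLM output isn't guaranteed to be well-formed.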

Seems easy enough, but try asking slightly harder questions: When did we learn John Adams was in Philadelphia? Have they been there before? Where were they before Philadelphia? The ontology I just developed isn't capable of representing that data. You can of course solve these problems, and there are common ontological patterns for representing it.

The point is, you need to know the kinds of questions you want to ask of your data when you're building your ontology, and you're always going to miss something. Usually you discover the questions you should have anticipated only after you've built your system and started querying it. It's the follow-ups that kill you.

There has been a lot of work on totally unstructured ontologies as well, but that moves the hard problem elsewhere rather than solving it. Instead of high-quality data that can't answer every question, you have arbitrary associations that may mean the same thing, so any query you make is likely _missing_ relevant data and thus inaccurate.

It's a huge headache to go down, but honestly I think it's a worthwhile one. Previously, if you changed your ontology to answer a new question, a human would have to go through and manually, painstakingly migrate your data to the new system. This is boring, tedious, easy-to-get-wrong-due-to-inattention work. It's not complex, it's not hard, and it's very easy to double-check, but it does require an understanding of language. LLMs are VERY capable of doing this kind of work, and likely more accurately.
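That migration step might look something like the following. This is purely a sketch: `call_llm` is a placeholder for whatever completion API you use, and the prompt and ontology names are invented for illustration.

```python
# Hypothetical prompt for migrating one fact from an old ontology
# (Person LocatedAt City) to a new temporal one (Person LocatedAt City AtTime).
MIGRATION_PROMPT = """Rewrite this fact from the old ontology
(Person LocatedAt City) into the new ontology
(Person LocatedAt City AtTime), inferring the timestamp from
the source message's metadata. Answer with JSON only.

Fact: {fact}
Source metadata: {metadata}
"""

def migrate_fact(call_llm, fact, metadata):
    """call_llm: any function taking a prompt string, returning the completion."""
    return call_llm(MIGRATION_PROMPT.format(fact=fact, metadata=metadata))
```

The appeal is exactly what's described above: each migration is a small, easy-to-verify language task, so you can spot-check the LLM's output the same way you'd spot-check a human's.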


Yes! My position on KGs largely flipped post-GPT-3. Before, KGs were mostly a niche thing given the cost vs. reward; now they're an everyone thing.

- The effort needed for KGs has higher general potential value because RAG is useful

- The KG ontology quality-at-scale problem is now solvable by LLMs automating index-time data extraction, ontology design, & integration

This is an area we're actively looking at: self-determining ontologies for optimizing RAG over evolving data such as events, news, logs, emails, and customer conversations. We have active projects across areas here already, like emergency response, cybersecurity, and news mining. If folks are interested, we're definitely looking for design partners with challenging problems. (And we're looking to hire a principal cybersecurity researcher/engineer on it, and later in our other areas too!)


Have any good research on this subject?


We've been working on gov/enterprise/etc. projects here as part of louie.ai that are coming to a head -- I'd expect it's not far from what xAI, Perplexity, and others are also doing. However, while those must focus on staying cheap at consumer scale, we focus on enterprise scale and ROI, so we get to make different trade-offs, and I'm guessing we're closer to how Google and other more mature teams do KG. We're not doing traditional KG, however -- it's a needlessly/harmfully lossy discretization -- but we're coming from lessons in that world, especially the large-scale intel/OSINT side and the graph neural net community.

A bit more concretely, for the LLM era, we're especially oriented around the move from vanilla RAG chunking to graph RAG, hierarchical RAG, "auto-wiki" style projects, and continuous learning LLMs. Separately, we've been working on neurosymbolic query synthesis for accessing this and mixing in (privacy-aware) continuous learning from teams using it. I think the first public details on this were in my keynote at the 2024 Infosec Jupyterthon, and we'll be sharing more at graphtheplanet.com next week as well. We haven't said as much, but we're also looking at the problem that the data itself isn't to be trusted, e.g., blind men and the elephant reporting different things over time on the news/social media/IT incident tickets.

Right now we're just focusing on building and delivering great tech, customer problem by customer problem. There's a lot to do for a truly good end-to-end analyst/analytics experience!



