As others in this thread have mentioned, semantic/vector-based solutions tend to be much better when any of the following is true:
1. There are natural language questions being asked
2. A query term is ambiguous on its own but becomes clear once you understand it in the context of nearby terms (e.g. "I studied IT", "I saw the movie It", "What is it?")
3. There are multiple languages in either the query or the documents (many embedding models can embed to a very similar vector across languages)
4. You don't want to maintain giant synonym lists (e.g. a user might search for, or a document might contain, "movie" or "film" or "motion picture")
Whether you need any of those depends on your use case. If you are just doing e.g. part number search, you probably don't need any of this, and certainly not semantic/vector stuff.
But semantic/vector systems don't work well with terms the model wasn't trained on, e.g. "what color is part NY739DCP?" Fine-tuning is bad at handling this (fine-tuning is generally good for changing the format of the response, and generally not all that good for introducing or curtailing knowledge). Whether you need a keyword search system depends on whether your information is more of the "general knowledge" type or something more specific to your business.
Most companies have some need for both because they're building search on "their" data/business, and you want to combine the results to make sure you're not duplicating. But I'll say I've seen a lot of companies get this sort of combination wrong. Keeping 2 datastores completely in sync is fraught with failure; the models have different limitations than the keyword system, which is good in some ways but can cause unexpected results in others; and the actual blending logic (be it RRF or whatever) can be difficult to implement "right."
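For anyone who hasn't seen RRF (reciprocal rank fusion) before, here's a minimal sketch of the blending step. The function name, the doc IDs, and the k=60 constant are my assumptions for illustration (60 is just a commonly used default), and this isn't how any particular product implements it. Each document scores 1/(k + rank) in every result list it appears in, and the scores are summed, so a document returned by both systems shows up once in the fused list:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a common default; an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A doc found by both systems accumulates score from each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists for "what color is part NY739DCP?"
keyword_hits = ["doc_part_ny739dcp", "doc_catalog", "doc_returns"]
vector_hits = ["doc_paint_colors", "doc_part_ny739dcp", "doc_catalog"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Note that this only uses ranks, not the raw scores from either system, which is a big part of why people reach for it: keyword scores and vector similarities aren't on comparable scales. The hard parts (picking k, deciding how many results to pull from each side, keeping doc IDs consistent across two stores) are not shown here.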
I usually recommend folks use a single system for combining these results as opposed to trying to invent one yourself. Full disclosure: I work for Vectara (vectara.com), which has a complete pipeline that combines both a neural/vector system and traditional keyword search, but I would recommend that folks combine these into a single system even if they don't use our solution, because it just takes so much operational complexity off the table.