Look at computer science. They have invented so many different algorithms that work! This is a sign that computer science lacks a decent unifying theoretical framework.
Having different algorithms for different purposes is fine. For instance, autoencoders can do unsupervised learning, GANs can learn generative models that sample from the entire distribution, recurrent neural networks can handle time series, etc. Also while there are many different types of networks, research has shown which ones work and which ones don't. Few people pretrain autoencoders or bother with RBMs anymore, for instance. And I think we have good theoretical reasons why they aren't as good.
But to continue the analogy to computer science, imagine all the different kinds of sorting algorithms. They will each work better or worse based on how the data you are sorting is arranged. If it's already sorted in reverse, that's a lot different than if it's sorted completely randomly, or if it was sorted and then big chunks were randomly rearranged.
There's no way to prove that one sorting algorithm will always do better than another, because there are always special cases where one does better. The same is true of neural networks: it's impossible to formally prove they will work, because it depends on how real-world problems are distributed.
Type theory is a unifying theory of program semantics. It is one among many, each of which is a facet of one whole. In particular, the theory of algebraic and coalgebraic data types is a unifying theory for computation that allows one to derive efficient algorithms automatically. It does not encompass all of algorithm design by a long shot; it is actually quite niche, but as a coherent mathematical theory that gets to the essence of computation, it is very promising.
I suppose the point of my comment is that theoretical computer science is actually a field with a lot of unifying theories that approach computation in coherent ways. Applied computer science is much, much messier because it is interested in the particularities and flaws of real world computational models and getting practical results now, leaving explanations to come later.
There are unifying theories of inference for AI, but they don't really cover deep neural networks. There are a few tantalizing hints that deep learning is intimately related to profound concepts in physics (renormalization) and functional programming.
I don't think that this is a sign that computer science lacks a decent unifying theoretical framework.
We can even prove that there is no optimal sorting algorithm: "for every sorting algorithm x there exists an input i and a sorting algorithm y such that y sorts i faster than x" is provable.
I think the GP was being sarcastic and casting doubt on the need for or importance of a unifying theoretical framework in CS.
Which is a shame. I think there are unifying frameworks in CS, but we could use more, and I think we would benefit if, when people looked at a body of knowledge and saw just a list of techniques, they would say "hey, this could benefit from a more unified treatment".
Well, that's a typical case of editing without reading the answer I was referring to in the first place. I read Andrew's answer and then changed my comment without re-reading the answer I was replying to.
I normally hate it when this happens, so I have to apologise.
Well, yes, but that's not terribly illuminating because your algorithm Y just has to encode the permutation for I and then check that the result is sorted before falling back to another algorithm. So in another sense you can say that this is true for any problem where the NTIME complexity is less than the DTIME complexity, which is an incredibly large set of problems, because you can just encode an oracle into your machine for a chosen input.
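To make that concrete, here is a minimal Python sketch of the construction (the hard-coded input and its sorting permutation are invented purely for illustration): for the one pre-selected input the algorithm applies the stored permutation and checks in O(n) that the result is sorted, and for everything else it falls back to an ordinary sort.

    # Hypothetical pre-selected input I and the permutation that sorts it,
    # both hard-coded purely for illustration.
    SPECIAL_INPUT = (9, 2, 7, 1)
    SPECIAL_PERM = (3, 1, 2, 0)  # indices that put SPECIAL_INPUT in order

    def sort_y(xs):
        # Still a correct general-purpose sort, just "optimal" on SPECIAL_INPUT.
        if tuple(xs) == SPECIAL_INPUT:
            candidate = [xs[i] for i in SPECIAL_PERM]
            # O(n) check that the encoded permutation really did sort the input.
            if all(a <= b for a, b in zip(candidate, candidate[1:])):
                return candidate
        return sorted(xs)  # fall back to any other sorting algorithm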
Well, a sorting algorithm has to sort arbitrary input, so
def sort(ignore): return [1, 2, 3] does not count.
The details are complicated and I am not sure I am ready to understand the proof, but I think you can prove that there is no optimal algorithm even when you rule out trivial things like checking whether the input matches a given list and returning a pre-sorted one, or otherwise falling back to an ordinary sort.
Clearly, as you say, ignoring the input and producing a single output is not a sorting algorithm.
Comparing the input against pre-defined lists with a generic fallback algorithm is a sorting algorithm, but as you say it's rather 'trivial'.
However, I read klodolph's use of "permutation" as being more subtle than the above. Rather than splitting inputs into "known" (use pre-sorted answer) and "unknown" (use generic fallback), instead you write a valid sorting algorithm which treats all inputs in the same way, but you arrange all of the possible choice-points to perform optimally for your pre-selected inputs.
For example, imagine a quicksort which chooses exactly the right "random" pivots for particular inputs. It's not hard-coded in a particularly 'trivial' way; the fact that it works optimally for those inputs is a (carefully engineered) 'coincidence'.
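Roughly, in Python (the pivot table here is invented; a real one would be precomputed so that it also covers every sub-array the chosen inputs generate recursively):

    import random

    # Hypothetical table of "lucky" pivot indices, engineered offline so the
    # partition is well balanced for a few pre-selected inputs. Illustrative only.
    ENGINEERED_PIVOTS = {
        (3, 7, 1, 8, 5, 2, 6, 4): 4,  # xs[4] == 5 is the upper median here
    }

    def quicksort(xs):
        # An ordinary quicksort; it just "happens" to pick ideal pivots
        # on the pre-selected inputs.
        if len(xs) <= 1:
            return list(xs)
        i = ENGINEERED_PIVOTS.get(tuple(xs), random.randrange(len(xs)))
        pivot = xs[i]
        return (quicksort([x for x in xs if x < pivot])
                + [x for x in xs if x == pivot]
                + quicksort([x for x in xs if x > pivot]))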
In fact, some quicksort implementations randomly shuffle the input first, to lower the probability of malicious input triggering the worst-case O(n^2) behaviour. What if such a shuffle just so happened to "cancel out" the disorder of particular inputs, causing them to become sorted before the quicksort even begins? It's debatable whether that would count as a 'coincidence' or as 'checking if the input matches a given list'.
We could even avoid the extra log-n factor of quicksort's O(n log n) average case if we used that shuffling trick with bubble sort, to get bubble sort's O(n) best-case performance.
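Something like this sketch, say (the table of rigged shuffles is invented for illustration; for the chosen inputs the "random" shuffle undoes their disorder, so bubble sort's early-exit pass finishes in O(n)):

    import random

    # Hypothetical rigged shuffles: for a few pre-selected inputs we know the
    # permutation that undoes their disorder, so the shuffle leaves them sorted.
    INVERSE_SHUFFLES = {
        (3, 1, 2): (1, 2, 0),  # applying this permutation yields (1, 2, 3)
    }

    def shuffled_bubble_sort(xs):
        perm = INVERSE_SHUFFLES.get(tuple(xs))
        if perm is None:
            perm = list(range(len(xs)))
            random.shuffle(perm)        # ordinary random shuffle otherwise
        xs = [xs[i] for i in perm]      # apply the (possibly rigged) shuffle
        # Plain bubble sort with early exit: a single O(n) pass if already sorted.
        for i in range(len(xs)):
            swapped = False
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
                    swapped = True
            if not swapped:
                break
        return xs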
Did anyone ever publish an academic paper about a sorting algorithm after merely showing that it beats mergesort's time by a few percent on a couple of synthetic datasets?
They certainly should be able to, if they can show their algorithm actually does better on real world datasets.
But that's not my point. My point is that CS people can actually formally prove things about their algorithms. They can make reasonable assumptions like "if the data is randomly distributed". Machine learning can't make strong assumptions like that, and they can't formally prove anything. Expecting anything like formal proofs for machine learning algorithms is unreasonable because of the no free lunch theorem.
The same is true for sorting algorithms, if you remove the assumption that the data is randomly distributed. If you don't know how data is distributed in a real world problem, then there is no way of proving what algorithm will do better. You just have to test them empirically.
Houshalter: the no-free-lunch theorem assumes that all possible "real world" problems are equally probable. That does not appear to be true in theory (due to the laws of physics, e.g., see https://arxiv.org/abs/1608.08225 ) nor in practice (e.g., in complicated AI-type problems, the objects of interest always seem to lie on/near lower-dimensional manifolds embedded in a high-dimensional input space).
Despite the difficulty of solving tasks humans are good at, we also want machines to solve tasks that are impossible for humans to solve, so I don't think this is really a general argument against no free lunch.
But one could argue that sorting is a fundamental property of the numbers themselves (presumably used as keys), and therefore any alteration to an algorithm is a play on how the numbers are ordered?
As an example, merge sort establishes an invariant in its divide step which is exploited in its conquer step. Moving this idea forward, comparison-based sorting establishes some version of this invariant. This argument seems to make sense to me, though.
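For instance, a minimal merge sort in Python, with the invariant called out: after the divide step both halves come back sorted, and the conquer step (the merge) relies on exactly that.

    def merge_sort(xs):
        if len(xs) <= 1:
            return list(xs)
        mid = len(xs) // 2
        # Invariant established here: both halves are sorted after recursion.
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        # The conquer step exploits it: only the two heads ever need comparing.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]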
I don't think there is a lack of a unifying theoretical framework. Both the Turing machine and the lambda calculus are well understood and form the foundations of computer science (and they are equivalent to each other in computational power).
I think the problem is more that a gap exists between the theoretical framework and practical programming. For example have fun trying to formally verify Javascript code!
Interesting side note: we are currently unable to prove that there is nothing more powerful than a Turing machine. I don't know if it's even provable (it smells like something undecidable).
That may be the limit of my understanding of English, but I can prove the non-existence of certain things (are they entities?).
Since this is currently my seminar research topic: in Zermelo–Fraenkel set theory there is only one Urelement.
I can prove the non-existence of another Urelement not equal to that Urelement.
Since I only care about Zermelo–Fraenkel, I can define Urelemente (urelements) as things inside Zermelo–Fraenkel set theory that are not sets themselves but are elements of a set. And there is only one element that satisfies this definition. Everything else is not an Urelement, since it does not satisfy the definition.
I would think that disproving the nonexistence of something would be the same as demonstrating the existence of something. That seems doable, in a sense at least.
And I would think that disproving the existence of something could be done by deriving a contradiction from the assumption that the thing exists. This also seems possible.
I'm guessing I am misunderstanding you in some way. Can you help me understand?
Methinks you had a double-negative too many. Your sentence says it's impossible to prove the existence of something. Unless that's what you had intended, in which case I don't see how that's useful philosophy for this discussion.