Totally agree - we are nowhere near achieving true AGI and perhaps never will. Understanding how human consciousness arises is arguably the biggest and hardest unknown in the universe.
That said - I do suspect the description of LLM functioning as simply a “pattern predictor” is too simplistic. I do not know how it would be possible for an LLM to answer novel questions and/or to correctly debug original computer code if it were simply a pattern predictor.
Computer code is extremely predictable, much more so than natural language. The subtasks within a function are even more so: there are only so many ways to write an iterator or to compare two strings.
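A quick illustration of what I mean (a made-up Python snippet of my own, not drawn from any particular codebase): the common subtasks have so few idiomatic shapes that the next token is close to forced.

```python
# Two of the very few idiomatic shapes for "iterate and compare strings".
# They are semantically identical, which is exactly what makes code so
# predictable token-by-token.

def find_matches_loop(items, target):
    matches = []
    for item in items:                      # shape 1: explicit loop
        if item.lower() == target.lower():  # case-insensitive string compare
            matches.append(item)
    return matches

def find_matches_comprehension(items, target):
    # shape 2: the same subtask as a comprehension
    return [item for item in items if item.lower() == target.lower()]

print(find_matches_loop(["Cat", "dog", "cat"], "cat"))           # ['Cat', 'cat']
print(find_matches_comprehension(["Cat", "dog", "cat"], "cat"))  # ['Cat', 'cat']
```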
As for novel natural language questions, what’s novel about them? If you are asking about, say, gravity, there are an enormous number of sentences about gravity in the training corpus. It doesn’t require reasoning to identify “sentences about gravity and black holes” and piece together a new sentence on that topic.
A more interesting analysis, IMO, is to look at the cases where LLMs fail. They don’t fail in the same ways that humans do. Why not, and how does that behavior illuminate the differences between human and machine cognition?
I can ask Perplexity a question about a new piece of tech introduced just this week and ask it to compare it to prior tech. It gives me answers that cannot currently be found on the developer’s website or anywhere else on the web.
It is true that the pattern of Perplexity’s reply is fairly predictable - the structure it uses in its writing and overall conclusion are of a familiar format. But the content itself can clearly be novel.
I don’t need to know what the product is to say that the current generation improves Metric A by 5%, and Metric B by 15%. And those might even be plausible answers, especially if I had a database full of examples showing that Metric B typically improves by 15% in each generation. But that doesn’t mean I’m “thinking” or “analyzing” the performance of the product.
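To make that concrete, here is a toy sketch (all metric names and numbers are hypothetical): with a small table of past generation-over-generation improvements, you can emit plausible-looking figures for the next generation without analyzing the product at all.

```python
from statistics import median

# Hypothetical history of generation-over-generation improvements, in percent.
history = {
    "Metric A": [4, 5, 6, 5],
    "Metric B": [14, 16, 15, 15],
}

# "Predicting" the next generation is just echoing the typical past value;
# no understanding of the product is involved.
for metric, past_gains in history.items():
    print(f"{metric}: next generation likely improves ~{median(past_gains):.0f}%")
```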
More like - "Features X and Y are not compatible but you might want to try Products 1, 2, and 3 which have similar features. Or you can write a script using the API found at URL1. "
That’s more than pattern analysis.
This is an AI example from the new ChatGPT/OpenAI Deep Research platform. The analysis seems like a whole lot more than pattern recognition to me. I also happen to agree with the ultimate conclusion that the study is promising but a bit hard to believe, given the perfect response in a small, uncontrolled study.
This analysis would be worthy of a literature review taking a couple of weeks of full-time work by an MD/PhD. I am a huge stickler for hallucinations or errors by AI; so far I have not found a single error except for a couple of instances where the link back to the original study is blank.
This is pattern recognition? I suppose the clinical practice of medicine or even my work as an expert witness in medical litigation are “just pattern recognition” too, yet I don’t think many people would minimize the cognitive abilities required for those jobs.
That is not to be taken for granted. Our confirmation bias tends to reinforce the long-standing belief that humans are different from machines. However…
Each human being is different. Some humans tend to fail at something whereas others do not. If a GenAI agent fails at that something, how should we categorize that failure?
If we assume that
(a) GenAI learns from humans, and
(b) GenAI is good at what humans are good at because of (a), then it is reasonable to extrapolate that
(c) GenAI struggles where humans struggle, too.
Chinese humor provides an excellent case for point (c) above. The Chinese script is unique in being logographic. Each character within a multi-character word has its own meaning, which is often quite different from that of the word. Characters with similar pronunciations are interchangeable for humorous purposes. These traits make it quite challenging for an LLM agent (and even a human native speaker) to “understand” the true meaning of Chinese humor, for there are so many possibilities yet only one is intended.
DeepSeek is perhaps the best among its competitors when it comes to Chinese-language tasks. Its responses to selected jokes were highly consistent with those of a hypothetical human who is a native speaker but unfamiliar with the genre (see, e.g., this Chinese-language video). Even though DeepSeek did not appreciate the nuances of many of these jokes, it failed in a very human-like way.
They are also not neuroscientists if they just "believe" )
It very much depends on how you compare. Comparing the geometry of human brains with the geometry of an Apple M4 Max betrays not only a “shallow understanding of human cognition” but also a poor “human cognition” on the part of that person.
Absolutely. I don’t like the mechanistic approach, nor the purely holistic one. It’s like saying two people love each other because the atoms they ultimately consist of love each other. Otherwise, where does this «love» come from? That is the first aspect (the emergence effect in the systems approach).
The other aspect is what counts as the «same way». Memory allows us to store and retrieve information, and as an ideal function it is the same. When we materialize this function in «proteins» or in «silicon» we get differences, of course, big or small, depending on many factors. But claiming either that «there is no difference at all» or that «it is totally incomparable» betrays a very shallow understanding of how comparison works as a basic method of cognition.
That’s true. But we need to include in this comparison other things as well:
how long does a human being take, on average, to create its «LLM»? 15 years (on an «incandescent bulb») vs 1 month (on a power plant)
how long does a human being take to give an average answer? 1 month of research (on an «incandescent bulb») vs 15 seconds (on a power plant).
But I guess the human being wins the eco battle, for now… )
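Rough back-of-the-envelope arithmetic on that eco battle (a sketch with assumed round numbers: the brain’s ~20 W draw is well established, but the 10 MW training-cluster figure is purely a placeholder of mine, not any lab’s actual number):

```python
HOURS_PER_YEAR = 24 * 365

# "Incandescent bulb": a human brain runs at roughly 20 W for ~15 years of learning.
brain_watts, brain_years = 20, 15
brain_mwh = brain_watts * brain_years * HOURS_PER_YEAR / 1e6    # ≈ 2.6 MWh

# "Power plant": a hypothetical 10 MW training cluster running for one month.
cluster_megawatts, cluster_days = 10, 30
cluster_mwh = cluster_megawatts * cluster_days * 24             # 7,200 MWh

print(f"Human 'training' energy: {brain_mwh:8.1f} MWh")
print(f"Cluster training energy: {cluster_mwh:8.1f} MWh")
print(f"Cluster / human ratio:   {cluster_mwh / brain_mwh:.0f}x")
```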
And what about DNA as information storage? Storing information at the molecular level… Evolution derived it from the RNA molecule specifically for storage purposes (swapping uracil for thymine and removing a hydroxyl group from the ribose).
All of biological evolution, in this sense, may be roughly divided into three major stages, depending on the “cost of learning” to adapt:
“Die to adapt.” A species accidentally gains an advantage in a changed environment; the rest of the population dies, and this species survives. Very slow and dependent on chance, but you get the feature from birth (it is congenital).
“Repeat to adapt.” Appears mostly with the central nervous system. You just need to repeat the right behavior several times to form a reflex. This is the basis for the next step. Some scientists, like Pavlov (who discovered conditioned reflexes), suggested that thinking is a complex network of conditioned reflexes. No need to die to adapt, so adaptation is much faster, but learning has to come first. The main limitation: there are dangers you cannot survive often enough to repeat.
“Imagine to adapt.” This is where we get consciousness: we build a model of reality that can predict that reality (i.e., knowledge) before we act. Though there was a lot of research on elementary cognitive behavior in animals (in the first half of the 20th century), it is common knowledge that only humans today can do this trick (first imagine something that has never been done before, and then do it).
Well, I guess AI now is somewhere between (2) and (3)… And I don’t see insurmountable obstacles on the way to (3): building and developing (changing and verifying) models. The verification cycle may be the hardest part to implement. In the end, it took tens of thousands of years for biological evolution to get from (2) to (3). “Bio-silicon evolution” is a thousand times faster… )
The nightmare may come when “silicon” realizes that “bio-” has become a restraint…
Training a large LLM actually takes multiple years of GPU time. Massive parallelism is why it only looks like a month.
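Back-of-the-envelope version (the cluster size and run length are made-up round numbers, just to show the scaling, not any real lab’s figures):

```python
# Illustrative only: a month of wall-clock time on a large cluster
# corresponds to centuries' worth of single-GPU compute.
gpus = 10_000          # hypothetical cluster size
wall_clock_days = 30   # hypothetical length of the training run

gpu_days = gpus * wall_clock_days
gpu_years = gpu_days / 365
print(f"{gpu_days:,} GPU-days ≈ {gpu_years:,.0f} GPU-years")  # 300,000 GPU-days ≈ 822 GPU-years
```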
As for the answer time, what’s the question? And is the answer accurate?
Not really. In silicon, you can say these bits of information are stored in this location. Information in human memory is not localized: you can’t say, “these are the cat recognition neurons” or “this is the English dictionary.”
That’s an impressive analysis, and better than most of the ChatGPT results I have seen. It’s worth noting that “DeepResearch” is part of ChatGPT’s “Pro” tier. Its ability to search the web for results outside its training set is a substantial improvement, and the Pro tier buys a significantly larger analysis window.
And yet… this task is easier than it looks. All of the information about the study is contained in the paper itself, which also helpfully provides both links and keywords to relevant material elsewhere on the web. Similarly, there are lots of guides on “how to evaluate research studies” out there on the web, and probably plenty of papers addressing the limits of this kind of trial specifically. (I also don’t think this task would take a human weeks. A full literature review would, but not this kind of analysis focused on a single paper.)
I think one of the reasons why humans tend to think LLMs “have to” exhibit “intelligence” to do what they do is that their training corpuses are incomprehensibly huge. The Entire (public) Web is so vast that patterns emerge that simply aren’t visible at the scales that humans normally deal with.
That said - why do you suppose that OpenAI Operator is a fail whereas OpenAI Deep Research is a huge win?
For that matter, I tried Google Gemini Deep Research - it is awful because it has hallucinations galore. How does OpenAI cure that problem when Google cannot?
Note: as the ideal function.
What you describe next is exactly how it is materialized “in proteins” or “in silicon”. Of course there are differences, as well as some similarities.
As for “cat recognition”: such information is stored not in the neurons themselves but in their network and its dynamics. What’s more, there may not even be a single object called “cat” whose location you could look up - I suspect a confusion of different levels here:
sensation (registering data via the sense organs)
perception (completing the registered data into a whole, e.g., a pattern)
imagination (running a model beyond the “registered data” or the “completed registered data”)