- Hyena AI is a new AI model developed by researchers at Stanford University that promises to be more efficient and powerful than GPT-4.
- Hyena AI can process larger volumes of data and maintain context for longer conversations compared to GPT-4.
- Hyena AI uses a hierarchical filtering approach instead of the attention mechanism used by GPT-4, resulting in faster processing speeds and better handling of complexity.
- Hyena AI has the potential to summarize entire books, handle intricate databases, and engage in lengthy and complex human-to-AI dialogue.
It’s widely agreed that GPT-4 currently represents the gold standard when it comes to AI. GPT-4’s success owes largely to its ability to process and perform logical reasoning on huge amounts of data in mere seconds.
With Code Interpreter, GPT-4 can even do the work of an entry-level data analyst: look at data, clean data, manipulate data, and create sensible visualizations.
But even with GPT-4’s capabilities, the technology has limitations. Everyday users notice these flaws when they witness GPT’s “hallucinations” (stating falsehoods with confidence) or see GPT unable to process information due to its knowledge cutoff.
It’s also not always reliable at creating working code, or code that makes sense for the task — though it does a decent job most of the time.
Towards Larger Convolutional Language Models
Unlike your average GPT-4 subscribers, there’s another group of people much more attuned to the technical aspects of OpenAI’s approach. These are researchers, many clustered around Stanford University, who’ve been following developments in AI — particularly in natural language processing (NLP) — for years.
Michael Poli, Stefano Massaroli, Eric Nguyen, and other scholars have been tinkering with a different way of creating an AI chatbot that they believe could work many times more effectively than GPT-4 or other state-of-the-art AI currently available.
They call their project “Hyena,” and they explain the results of their findings in a scholarly paper titled: Hyena Hierarchy: Towards Larger Convolutional Language Models.
The paper gets highly technical. But if you want to learn about this fascinating advancement in AI without diving into a 40-page manuscript, you’re in the right place. We’ll break it down into simple terms and see how it compares to GPT-4. Let’s start with some basic facts.
Hyena AI: 5 Must-Know Facts
- Unlike GPT-4 which uses an attention model, Hyena operates through a hierarchy of convolutional filters, allowing it to maintain context over longer sequences and avoid overfitting on less relevant data.
- Hyena’s processing power is significantly greater than GPT-4’s. It can handle longer sequences at much faster speeds. For example, at sequence length 64K, the Hyena operator runs 100 times faster than FlashAttention, a highly optimized attention implementation.
- Not only is Hyena faster, but its researchers also report reaching Transformer-level quality with roughly 20% less training compute than attention-based models like GPT-4.
- The goal of Hyena is not only to handle a larger scale but also to work better with increasingly complex requests, such as sophisticated reasoning and logic.
- While Hyena promises many improvements over existing AI models, it remains theoretical and has not been deployed in a live environment. Researchers have made all the relevant Python code publicly available on GitHub for further development and testing.
Hyena AI: Briefly Explained
In case you don’t already know, a hyena is a fierce wild carnivore that lives in large groups, known as clans. Hyenas stalk their prey for many miles and are capable of speeds approaching 40 miles per hour.
But wait, what does this have to do with artificial intelligence? Well, the Stanford researchers we introduced earlier thought that their AI model bore some similarities to the hyena — mainly in terms of speed and range.
Like a hyena, the AI model these researchers are developing can reportedly consume inputs many times larger, and can “pursue” context for much longer, than current natural language models.
The Hyena AI model can also deliver this efficiency with roughly 20% less training compute than GPT-4 and similar attention-based models.
As an end user, it may be difficult to see why this extra heft matters. After all, GPT-4 already seems super intelligent and already appears to work extremely fast. So, what’s the fuss?
A More Powerful Language Model
Michael Poli and his co-authors point out that, because of how GPT is designed, the AI begins to slow down as the sequences it works with grow longer. It will also require more and more computing power as time goes on, and as demand for more complex operations increases.
We may not see these limitations now, but we will at some point in the future — probably sooner rather than later.
Hyena anticipates these coming technical blockers and is gearing up for greater volume and complexity. In doing so, it promises to deliver exponentially greater processing power and awareness of context.
In theory, it should be able to process entire textbooks or databases of information. On the contextual side, it should be able to maintain a conversation thread for hours, maybe even longer.
If all this hype is true, that means Hyena would be much closer to human-level intelligence than current AI models. It would certainly be superior to GPT-4.
To really understand how Hyena works, and how it fundamentally differs from GPT-4, we need to also understand how GPT-4 works on a technical level. That’s where we’ll go next.
How Chat GPT-4 Works
Yes, GPT-4 is AI. More specifically, though, it’s a natural language processing model. In other words, GPT-4 is designed to understand “natural” human dialogue. When asking GPT a question, you don’t need to do so in code as you would with other computer programs — you just use normal text.
NLP doesn’t just come into play when GPT-4 processes your queries, though. It also uses this model on the data it sifts through to find answers to your questions. This means it can draw on both unstructured text and structured data to provide responses.
The Attention Mechanism
GPT-4 comes up with responses by using a mechanism that AI developers call “attention.” The model tries to predict logical answers to problems by focusing on specific parts of input data that the model thinks might be relevant.
GPT-4 chooses focus by assigning different weight to relevant data. The more weight a data point has, the more focus GPT-4 will place on it.
The Stanford researchers point out that, while attention works well in many cases, it does have its weaknesses. They describe attention as “fundamentally a quadratic operation, as it compares each pair of points in a sequence.”
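The quadratic comparison the researchers describe can be sketched in a few lines of NumPy. This is a toy illustration only, not OpenAI’s actual implementation; the function name, shapes, and random inputs below are all assumptions made for the example:

```python
import numpy as np

def attention(Q, K, V):
    # Compare every query with every key: the resulting (L, L) score
    # matrix is why attention is a quadratic operation in sequence length L.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Softmax turns scores into weights; each row sums to 1, so the
    # heaviest-weighted data points get the most "focus."
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted mix of the value vectors

L, d = 8, 16  # a tiny toy sequence
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (8, 16)
```

The key line is the `(L, L)` score matrix: doubling the sequence length quadruples the work, which is exactly the scaling problem the Hyena authors set out to avoid.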
Because attention compares every pair of points in a sequence, GPT-4 and other attention-oriented AI models eventually run out of bandwidth as inputs grow longer.
Maybe these complaints seem minor, but they’re actually significant blockers to progress in AI. Relying solely upon the attention model means we’re highly limited in terms of how much training data and context our AI helpers can take in.
An attention-based model couldn’t absorb a whole book for you, or an entire database full of new information. It also has a pretty short memory in conversations.
Since attention-driven AI requires ever more computing power, it’s also needlessly expensive to maintain. It may even come with greater environmental harm, similar to how the energy required to mint NFTs places severe burdens on current energy infrastructure.
Enter Hyena to save us from these woes. Let’s see how its approach differs from OpenAI’s attention model.
How Hyena Is Different — Maybe Better — Than GPT-4
The authors of Hyena Hierarchy describe a markedly different natural language processing approach from attention-based models like GPT-4. Whereas ChatGPT’s attention mechanism is essentially quadratic, the Stanford researchers designed Hyena to be sub-quadratic.
Basically, Hyena de-emphasizes attention and instead works through a hierarchy of convolutional filters. Employing filters rather than weights means Hyena is far less likely to get trapped overfitting on data that fails to yield the most accurate answer. Hierarchical filtering also allows Hyena to maintain context over much longer sequences.
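As a rough illustration of the core idea (this is not the actual Hyena code; the random filter and function name below are assumptions for the sketch), a convolution with a filter as long as the sequence itself can be applied in O(L log L) time via the FFT, rather than attention’s O(L²):

```python
import numpy as np

def long_conv(u, h):
    # Circular convolution of signal u with filter h via the FFT:
    # O(L log L) work instead of attention's O(L^2) pairwise scores.
    L = len(u)
    return np.fft.irfft(np.fft.rfft(u) * np.fft.rfft(h), n=L)

L = 1024
rng = np.random.default_rng(0)
u = rng.standard_normal(L)  # input sequence
h = rng.standard_normal(L)  # a filter spanning the whole sequence
print(long_conv(u, h).shape)  # (1024,)
```

In the real model, the long filters are generated implicitly by a small network and interleaved with element-wise gating in a recurrence; this sketch only shows why the convolution itself scales sub-quadratically with sequence length.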
Not only can Hyena handle greater sequence lengths, but it can also process them at far greater speeds. At sequence length 64K, the Hyena operator runs 100 times faster than FlashAttention, a highly optimized attention implementation.
Best of all, Hyena’s speed and analytical prowess don’t come at the cost of increased computing. In fact, Hyena’s researchers reported achieving significantly increased context length while also using roughly 20% less training compute than comparable attention-based models.
Hyena’s not merely intended to handle greater scale, but it’s also meant to be “smarter.” Specifically, it’s designed to work better with increasingly complex requests.
Its ability to filter through longer sequences while maintaining contextual awareness essentially makes Hyena more precise, more detail-oriented, and more able to think like a human. This should, in turn, result in far more sophisticated reasoning than existing AI.
Handling Greater Complexity with Fewer Resources
The practical implications here are obvious — and enormous. Owing to its innate ability to handle greater complexity at scale, Hyena takes natural language processing to the next level in every possible way.
For instance, what if AI had the ability to summarize an entire book, rather than a mere article? Due to computational and context limitations, current AI models cannot handle this task. In theory, though, Hyena would be able to.
It would also be able to keep up with ever-evolving and intricate databases, such as a company’s customer database, or perhaps even a medical facility’s patient records. This obviously raises some possible ethical concerns, but the technological implications are exciting to think about.
Hyena would also be far more equipped to hold its own in lengthy and complex human-to-AI dialogue. As many probably know, current chatbots aren’t quite up to this task. They tend to require much hand-holding, and the longer you talk to them, the more likely their memory will begin to break down.
Will Hyena Replace GPT-4?
There’s no question that GPT-4, and even its free predecessor, GPT-3.5, have proved themselves to be game-changers. GPT and other AI services are shaking up the world of work and the private lives of ordinary people. As revolutionary as ChatGPT has been, though, there’s still so much to develop in the world of AI.
Hyena appears to be the best theoretical representation of the next major breakthrough in AI. Note our use of the term “theoretical,” though. Hyena is not a product; it’s a model tested in a limited research environment.
GPT-4, on the other hand, is a live product with a larger library. Sure, we’ve seen its limitations, its failures, its hallucinations, its biases, and so on, but this is inevitable with any technology.
Since nothing like Hyena has actually been deployed in the wild, we can’t say with absolute certainty whether it will stack up against — and then surpass — existing AI models.
Yes, through the course of experimentation, Hyena has achieved impressive results in NLP. But it remains to be seen how Hyena will perform when given as many parameters as GPT-4.
In the meantime, if you’re a developer who’d like to see the nuts and bolts behind how Hyena works, luckily for you, Hyena’s researchers have made all the relevant Python code publicly available on GitHub.