Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
My name is Naman Goel.
Today I want to talk about a profound shift happening in artificial
intelligence: a move beyond simple text generation towards what we call agentic AI,
often powered by advanced language models.
For a while now, we have seen AI primarily as reactive tools.
You give a prompt, you get a text response, but the
horizon is expanding rapidly.
We are witnessing the rise of AI systems that don't just respond, but act.
They are proactive, autonomous agents capable of understanding complex
goals, planning, interacting with their environment, and solving problems
with minimal human intervention.
In this talk, we will explore this evolution: how AI systems
are transforming from passive responders to active problem solvers.
We will delve into the technical foundations enabling this change, examine
key operational frameworks, discuss the crucial ethical considerations,
and look towards the exciting future directions this research is taking.
This isn't just about better chatbots.
It's about redefining human AI collaboration and opening entirely new
frontiers of AI research and application.
So how did we get here?
Let's trace the journey.
Initially we had text generators.
Think of early LLMs like GPT-3 in its initial form.
Powerful, yes, but fundamentally reactive.
They excel at generating human-like text based on specific prompts, but have
limited awareness of broader context and no ability to act independently.
The next shift involved developing deeper contextual understanding.
Models became better at grasping nuances and implicit meaning within
requests, and at maintaining context across longer interactions.
This was crucial, but still largely within the input output paradigm.
The real transformation is the emergence of autonomous agency.
Modern systems, the agentic AI we are discussing, exhibit goal-directed behavior.
You give them an objective and they can figure out the steps needed.
Often with minimal human guidance, they transcend the limitations of
just generating text from past patterns, and this agency often
involves multi-system interaction and integration.
Agentic AI isn't confined to a single interface.
It can coordinate actions across different platforms, tools, and APIs
to accomplish complex multi-step tasks.
This ability to interact with external environments is a hallmark
of this new generation of AI.
This shift from reactive responses to proactive, goal-driven action
is the core of the agentic revolution.
What makes this jump to agency possible?
Several key technical foundations work together, which we can visualize
as layers building from the bottom up.
At the base, we have hierarchical planning.
Instead of tackling a huge goal at once, agent systems break it
down into manageable sub goals.
From high-level strategic objectives to specific actionable tasks, and finally
to the operational execution of individual tasks.
This structured approach, sometimes using techniques like recursive refinement,
allows them to handle complexity and maintain coherence over extended periods.
Research shows models using hierarchical planning achieve
significantly higher success rates, sometimes around 45% higher,
than flat planning approaches on complex tasks.
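To make this concrete, here is a minimal sketch of goal decomposition. The goals and the hard-coded decomposition table are purely illustrative stand-ins for what an LLM-based planner would generate at run time:

```python
# Minimal illustration of hierarchical planning: a goal is recursively
# decomposed into sub-goals until atomic tasks remain, then flattened
# into an ordered execution list. A real agent would generate the
# decomposition table with a language model rather than hard-coding it.

DECOMPOSITION = {
    "write report": ["gather data", "draft sections", "polish"],
    "gather data": ["search web", "read papers"],
    "draft sections": ["write intro", "write body"],
}

def plan(goal):
    """Return a flat, ordered list of atomic tasks for a goal."""
    subgoals = DECOMPOSITION.get(goal)
    if subgoals is None:          # atomic task: nothing left to refine
        return [goal]
    tasks = []
    for sub in subgoals:          # recursive refinement of each sub-goal
        tasks.extend(plan(sub))
    return tasks

tasks = plan("write report")      # flattened, ordered atomic tasks
```

The point is the structure: high-level objectives at the top of the table, executable leaves at the bottom, with the recursion keeping the overall plan coherent.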
Building on planning is long-term memory.
Traditional LLMs are often limited by their context window.
Agentic AI needs persistence: specialized memory structures like
episodic memory for specific events, semantic knowledge bases for
conceptual understanding, and working memory for current context allow
these agents to retain information, learn from past interactions,
and build understanding over time.
Integrating explicit semantic knowledge, for instance, has shown
accuracy improvements of up to 37% on certain domain specific tasks.
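A toy illustration of those three memory types follows. The class and method names are hypothetical; production systems usually back episodic and semantic memory with vector stores or databases rather than plain Python containers:

```python
# Illustrative sketch of the three memory types described above:
# episodic (ordered past events), semantic (facts/concepts), and
# working memory (a small, bounded window of recent context).
from collections import deque

class AgentMemory:
    def __init__(self, working_capacity=3):
        self.episodic = []                             # everything that happened
        self.semantic = {}                             # distilled facts
        self.working = deque(maxlen=working_capacity)  # recent context only

    def record_event(self, event):
        self.episodic.append(event)
        self.working.append(event)   # oldest item falls out when full

    def learn_fact(self, key, value):
        self.semantic[key] = value

    def recall(self, key):
        return self.semantic.get(key)

mem = AgentMemory()
mem.learn_fact("user_timezone", "UTC+2")
for e in ["asked about flights", "chose airline", "booked seat", "paid"]:
    mem.record_event(e)
# working memory now holds only the three most recent events,
# while episodic memory keeps the full history.
```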
Next, we have tool interaction and integration.
This is crucial for breaking free from the limitations of internal knowledge.
Agent AI can leverage external tools, APIs, and knowledge sources.
Performing web searches for real time information, executing code for
calculations, analyzing documents, or interacting with other software.
This capability extension dramatically expands their functional range:
implementations using tools have shown success rates nearly three times
higher on complex information-gathering tasks compared to models without tool access.
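A minimal sketch of a tool registry, with toy stand-ins for real APIs; the registry pattern and the tool names are illustrative, not any particular framework's interface:

```python
# Minimal tool-integration sketch: external capabilities are registered
# by name, and the agent routes a request to the matching tool instead
# of answering purely from internal knowledge.

TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("calculator")
def calculator(expression):
    # Demo only: restricted eval for arithmetic; never eval untrusted input.
    return eval(expression, {"__builtins__": {}})

@tool("search")
def search(query):
    # A real tool would call a web-search API here.
    return f"top result for {query!r}"

def call_tool(name, argument):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](argument)
```

In a real system, the agent's model decides *which* tool to call and with *what* argument; this sketch only shows the dispatch layer underneath that decision.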
Finally, at the top, these foundations enable decision autonomy.
The agent can independently evaluate situations, select actions based on
its goals and context, and decide how and when to use its planning,
memory, and tool capabilities.
It's this autonomy that truly distinguishes agentic systems.
Okay, we have the foundations, but how do these agents actually operate?
Two prominent frameworks illustrate different approaches: ReAct, and
Plan-and-Execute.
The ReAct framework, which stands for Reasoning and Acting, combines
these two elements in tight iterative cycles.
First formalized around late 2022, it mirrors human cognition:
observe, reason (chain-of-thought prompting is often used here),
act, and then observe the result to inform the next cycle.
This continuous loop makes ReAct very effective in dynamic
environments where conditions change frequently and adaptation is key.
It shows strong performance on tasks requiring information gathering
and multi-hop reasoning, like the benchmarks HotpotQA and WebShop,
where ReAct showed 31% higher completion rates than some alternatives.
It also allows for better error recovery, because feedback
is integrated immediately.
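The reason-act-observe cycle can be sketched like this; the "reasoning" here is a hard-coded policy standing in for an LLM's chain of thought, and the environment is a toy dictionary:

```python
# Toy ReAct-style loop: reason about the current state, pick an action,
# observe the result, and repeat until the goal is met or steps run out.

def react_loop(goal, environment, max_steps=10):
    trace = []
    for _ in range(max_steps):
        # Reason: decide what is still missing given the latest observations.
        missing = [item for item in goal if item not in environment["found"]]
        if not missing:
            trace.append(("thought", "all items found; done"))
            return environment["found"], trace
        action = ("search", missing[0])
        trace.append(("act", action))
        # Act + Observe: the result feeds straight into the next cycle,
        # which is what gives ReAct its fast error recovery.
        environment["found"].append(missing[0])
        trace.append(("observe", f"found {missing[0]}"))
    return environment["found"], trace

found, trace = react_loop(["population", "area"], {"found": []})
```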
In contrast, the Plan and Execute framework emphasizes a clear separation
between planning and execution.
Inspired by classical AI planning, the agent first generates a
comprehensive, often detailed plan, and then executes it methodically,
making only minor adjustments along the way.
Significant replanning only happens if major obstacles arise.
This approach excels in more structured, predictable environments where a
good upfront plan remains viable.
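A toy sketch of that separation, with a stand-in planner and a simple retry-on-obstacle rule; all names are illustrative, and a real system would replan with an LLM rather than retrying mechanically:

```python
# Toy Plan-and-Execute sketch: a complete plan is generated up front,
# then run step by step; replanning happens only when a step fails.

def make_plan(goal):
    # Stand-in planner; a real agent would generate this with an LLM.
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute(plan, fails_on=None):
    done, replans = [], 0
    steps = list(plan)
    while steps:
        step = steps.pop(0)
        if step == fails_on:            # major obstacle: adjust the plan
            steps = [step + " (retry)"] + steps
            fails_on = None
            replans += 1
            continue
        done.append(step)
    return done, replans

done, replans = execute(make_plan("deploy"))
```

Note the contrast with ReAct: here no reasoning happens between successful steps, which is exactly why this style is efficient in stable environments and brittle in dynamic ones.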
Studies show it can lead to more coherent solutions, with stakeholder ratings
sometimes 24% higher on consistency metrics in domains like urban planning.
It can also be more efficient if the environment is stable, as it
avoids the overhead of constant re-reasoning. So which is better?
The performance comparison shows there's no single answer.
Specialized frameworks excel in different contexts.
Task complexity, environment dynamism, and the need for adaptability vary
across contexts, and these factors often determine the optimal approach.
We are also seeing novel hybrid models emerging that try
to get the best of both worlds.
Context-sensitive framework selection is crucial for building effective agentic systems.
As these agents become more capable, a critical question arises:
how much autonomy should they have?
We need a way to manage this responsibly.
A graduated autonomy framework provides a structured approach.
It defines different levels of agent independence and human oversight.
At the lowest level, direct supervision, the agent might suggest actions,
but a human has to approve every step.
Under monitored autonomy, the agent operates more independently, but
requires human verification at key points or for certain types of actions.
Next comes bounded independence.
Here, the agent has freedom to act within predefined safety parameters,
constraints, or rules; oversight is less direct, but the boundaries are clear.
And lastly, full autonomy.
At the highest level, the agent operates largely self-directed,
with minimal human intervention, perhaps only for setting initial
goals or handling major exceptions.
The key idea is context sensitivity.
The appropriate level isn't fixed.
It should adapt based on the task's risk, the reversibility of actions,
the agent's confidence, and the sensitivity of the domain.
Research suggests these nuanced, context-sensitive boundaries can
significantly reduce unnecessary human interventions, by as much as 64%
in some studies, while maintaining safety, balancing efficiency with control.
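One way to sketch such context-sensitive level selection follows; the thresholds, argument names, and level names here are illustrative, not taken from any published framework:

```python
# Sketch of graduated, context-sensitive autonomy: the permitted level
# of independence is derived from task risk, action reversibility, and
# the agent's confidence. Thresholds are arbitrary demo values.

LEVELS = ["direct_supervision", "monitored_autonomy",
          "bounded_independence", "full_autonomy"]

def autonomy_level(risk, reversible, confidence):
    """risk and confidence in [0, 1]; reversible is a bool."""
    if risk > 0.7 or (not reversible and confidence < 0.9):
        return "direct_supervision"      # human approves every step
    if risk > 0.4:
        return "monitored_autonomy"      # human verifies key actions
    if confidence > 0.8 and reversible:
        return "full_autonomy"           # minimal human intervention
    return "bounded_independence"        # free within set boundaries

# An irreversible, high-risk action drops straight to full oversight.
level = autonomy_level(risk=0.9, reversible=False, confidence=0.6)
```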
The agentic paradigm doesn't stop at single agents.
One of the most exciting frontiers is multi-agent systems, where multiple
specialized agents collaborate to tackle complex problems far beyond
the reach of any individual agent.
This enables collaborative problem solving.
Think of a team of experts.
Frameworks like MetaGPT demonstrate this by assigning specific roles, like
product manager, architect, and programmer, within a software development
process; each agent brings unique skills or knowledge.
Effective collaboration requires internal dialogue mechanisms.
Agents need structured protocols to communicate, share information, build
consensus, and resolve conflicts.
Experiments within frameworks like AutoGen show that structured, direct
agent-to-agent communication can improve task completion rates by around 37%
compared to centralized approaches.
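A toy example of this kind of role-based, direct agent-to-agent handoff; the roles and the simple linear pipeline are illustrative, not a real framework's protocol:

```python
# Toy multi-agent collaboration: role-specialized agents pass a work
# item directly to each other (peer-to-peer handoff) instead of routing
# everything through a central coordinator.

class Agent:
    def __init__(self, role, transform):
        self.role, self.transform, self.next = role, transform, None

    def handle(self, artifact, log):
        artifact = self.transform(artifact)      # this agent's contribution
        log.append((self.role, artifact))
        if self.next:                            # direct handoff to the peer
            return self.next.handle(artifact, log)
        return artifact

pm = Agent("product_manager", lambda a: a + " -> spec")
dev = Agent("programmer", lambda a: a + " -> code")
qa = Agent("tester", lambda a: a + " -> tested")
pm.next, dev.next = dev, qa

log = []
result = pm.handle("idea", log)
```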
Often these systems exhibit hierarchical organization, mimicking supervisor-worker
relationships for efficient task delegation and coordination.
And lastly, we can see emergent social dynamics: complex group behaviors
and problem-solving strategies can arise from relatively simple interaction
rules between the agents, sometimes leading to surprising and innovative solutions.
Multi-agent systems hold immense potential for tackling multifaceted challenges
in areas like scientific research and complex system design
by leveraging collective intelligence.
Another major frontier is embodied agency: connecting the reasoning power
of LLMs to the physical world through robotics.
This requires several key capabilities.
Environmental perception: agents need advanced sensing, vision, sound, touch,
for real-time situational awareness.
This involves processing multimodal data and context-aware scene understanding.
Physical interaction: moving beyond just text requires precise manipulation,
including adaptive force control and sophisticated object reasoning and handling.
Spatial navigation: agents must move autonomously through dynamic environments
using obstacle avoidance and path-optimization algorithms.
And physical-digital integration: creating seamless bridges between the
virtual and material realms, perhaps through real-time digital twins
or augmented reality interfaces.
Research here focuses on grounding abstract reasoning in physical reality.
Frameworks like vision-language-action models integrate perception,
language, and motor skills.
Systems like PaLM-E, RT-1, and RT-2 are demonstrating impressive
progress, enabling robots to perform complex manipulation tasks based on
natural language instructions, with success rates sometimes comparable
to specialized algorithms while offering far greater flexibility.
Potential applications are vast, from healthcare, patient assistance
and therapy, to complex industrial tasks like manufacturing and
maintenance in potentially hazardous environments.
Embodied agency significantly expands the practical impact of agentic AI.
Traditional models are often static once training ends, but agentic
systems, especially those interacting with the world, need to adapt.
Continual learning focuses on architectures that allow agents to
improve through ongoing experience.
This involves several processes, often working in a cycle.
First, experience acquisition: gathering diverse data from interactions
with users and the environment.
Second, knowledge distillation: extracting meaningful patterns,
principles, and generalizable knowledge from these raw experiences.
Third, model adaptation: updating the agent's internal models and neural
architectures to incorporate this new knowledge.
And finally, catastrophic forgetting prevention.
This is a critical challenge.
It involves ensuring that learning new things does not erase previously
acquired knowledge or skills.
Techniques like retrieval-augmented generation or specialized memory help
preserve critical information. Frameworks like CAMEL and related
reinforcement-learning approaches illustrate how agents refine behavior
based on outcomes and feedback.
The goal is to create systems that become more personalized, effective, and
aligned with user preferences over time, without constant, costly retraining cycles.
Experiments show agents using interactive learning can achieve 24
to 38% higher success rates on complex tasks after multiple interaction
cycles compared to static systems.
This makes AI development more sustainable and adaptive.
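The acquire-distill-adapt cycle above can be sketched as follows; the scoring and "distillation" are deliberately toy-like, and all names are illustrative assumptions rather than any framework's API:

```python
# Sketch of the continual-learning cycle: acquire experiences, distill
# them into a knowledge base, and guard against catastrophic forgetting
# by keeping old knowledge retrievable instead of overwriting it.

class ContinualLearner:
    def __init__(self):
        self.experiences = []        # raw interaction data (acquisition)
        self.knowledge = {}          # distilled, retrievable knowledge

    def acquire(self, interaction, outcome):
        self.experiences.append((interaction, outcome))

    def distill(self):
        # Toy distillation: keep the best-scoring outcome per interaction
        # type, so new-but-worse experiences never erase what was learned.
        for interaction, outcome in self.experiences:
            prev = self.knowledge.get(interaction)
            if prev is None or outcome > prev:
                self.knowledge[interaction] = outcome
        self.experiences.clear()

    def recall(self, interaction):
        return self.knowledge.get(interaction)

agent = ContinualLearner()
agent.acquire("summarize", 0.6)
agent.acquire("translate", 0.8)
agent.distill()
agent.acquire("summarize", 0.4)   # a worse later experience...
agent.distill()                   # ...does not overwrite prior knowledge
```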
Perhaps the most ambitious frontier is meta-learning, or learning to learn.
This aims to create agents that can adapt their own learning strategies
when facing new domains or challenges.
Key capabilities include few-shot adaptation: allowing agents to master
novel tasks with minimal examples, drastically reducing training data needs.
Dynamic architecture modification: enabling systems to autonomously
restructure their internal processing based on the problem type,
optimizing for unseen challenges.
Transferable skill acquisition: learning in one domain enhances
performance in related ones, creating compounding returns on learning.
And hyperparameter self-optimization: models tuning their own configuration
settings, automating previously manual, resource-intensive processes.
Frameworks like AgentVerse and others are exploring how agents
can select learning strategies, like switching between imitation
learning and reinforcement learning,
or identifying transferable knowledge components.
The promise here is unprecedented adaptability: research indicates
meta-learning agents might require 40 to 65% fewer examples to master
new tasks compared to fixed-strategy learners.
This represents a significant step towards more general, versatile and efficient ai.
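A toy sketch of strategy selection, arguably the simplest form of "learning to learn"; the strategy and task-family names are illustrative assumptions:

```python
# Toy meta-learning sketch: the agent tracks how well each learning
# strategy has worked per task family, then picks the best performer
# for new tasks in that family, i.e. it adapts *how* it learns.
from collections import defaultdict

class StrategySelector:
    def __init__(self, strategies):
        self.scores = {s: defaultdict(list) for s in strategies}

    def report(self, strategy, task_family, reward):
        # Record the outcome of using a strategy on a task family.
        self.scores[strategy][task_family].append(reward)

    def choose(self, task_family):
        # Pick the strategy with the best average reward for this family.
        def avg(strategy):
            rewards = self.scores[strategy][task_family]
            return sum(rewards) / len(rewards) if rewards else 0.0
        return max(self.scores, key=avg)

sel = StrategySelector(["imitation", "reinforcement"])
sel.report("imitation", "navigation", 0.4)
sel.report("reinforcement", "navigation", 0.7)
sel.report("imitation", "dialogue", 0.9)
```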
While much of this is cutting-edge research, agentic AI is already finding
practical applications across industries. In healthcare, agents can
integrate patient history, research, and imaging to aid diagnosis, or
coordinate complex care plans across specialists. In finance,
portfolio-optimization agents can balance risk and goals independently,
executing complex trading strategies across markets. In scientific research,
hypothesis-generating agents can design and conduct experiments, adapting
protocols based on emerging data, potentially accelerating discovery.
These examples showcase the potential for agentic systems to handle complex
multi-step tasks requiring reasoning, planning, and tool use in real-world scenarios.
With great capability comes great responsibility.
The rise of autonomous agents introduces significant ethical considerations
that we must tackle proactively.
We group these into key areas. First, agency alignment:
how do we ensure agent goals remain consistent with
human values and intentions?
We need mechanisms for value-drift prevention and purpose validation.
Next, control mechanisms: as agents become more capable,
how do we maintain reliable oversight and the ability to intervene?
This includes kill switches, defining clear behavioral boundaries, and
designing effective human-AI interaction protocols.
The paper identifies control risk as a critical category.
Then, societal impact.
What are the effects on the workforce, including potential job
displacement and issues of access and equity?
We need to consider displaced-worker transitions and ensure
equitable access to these powerful technologies.
And accountability frameworks: who is responsible when an autonomous
agent makes a mistake?
We need clear frameworks defining responsibility, including decision
audit trails and stakeholder recourse mechanisms.
We should also highlight key risk categories: misalignment,
unintended consequences, deception, and inadequate control.
Addressing these requires interdisciplinary effort and embedding ethical
considerations throughout the design, development, and deployment life
cycle, not just as an afterthought.
Frameworks like value-sensitive design are crucial here.
This slide gives us a snapshot of current research focus based on publication trends.
As you can see, there's immense activity in developing novel agent architectures,
the core structures that enable agency.
Multi-agent systems are also a major focus, reflecting the
interest in collaborative AI.
We also see significant research in embodied AI, connecting these agents
to the physical world, and in advanced learning paradigms like meta-learning.
Crucially, ethics and governance is recognized as a vital research
area, though perhaps still needing more attention relative
to capability development overall.
This landscape clearly shows the significant shift in AI research
towards autonomous agency, moving far beyond traditional LLM capabilities.
Looking ahead, where is agentic AI heading?
We anticipate collaborative agent ecosystems: self-organizing communities
of specialized agents tackling complex problems through coordinated
division of labor.
Human-agent co-evolution: deeper symbiotic relationships will emerge where
humans and AI adapt together, redefining knowledge work and collaboration.
Cognitive architecture convergence: AI systems may increasingly mirror
aspects of human cognition, potentially incorporating elements like
emotion, creativity, or intuition alongside logic.
And embedded ethical frameworks: agents themselves might develop more
nuanced moral reasoning capabilities, enabling principled decision-making
even in ambiguous situations.
In conclusion, the emergence of agentic AI represents a fundamental expansion
of what artificial intelligence can be and do. By moving from reactive text
generation to proactive, goal-directed behavior, these systems unlock new
possibilities for human-AI collaboration and autonomous problem solving
across countless domains.
As hierarchical planning, memory, tool use and learning continue to advance, these
agents will become increasingly capable.
However, this progress must be guided by thoughtful evaluation,
robust safety mechanisms, and strong ethical frameworks to ensure
alignment with human values.
The transition from passive responder to active problem solver is
not just an incremental improvement.
It is potentially one of the most transformative developments in AI's
evolution, fundamentally reshaping how these systems participate in and
contribute to human endeavors.
Thank you for attending the talk.