AgenticUniverse - Previously Formi
    • Our Technical Note
      • Why Open AI is not Enough
      • How business Outcomes would Change Radically with AgenticUniverse
      • Our Research
        • STT - Nuances and Insights
        • Solving for STT Constraints
    • Generate Token
      • Login/Generate Token
        POST
    • Agent Configuration
      • Model Configuration
        • Configuration Helpers
          • Supported Providers
          • Supported Models
          • Supported Parameters
        • Get Model Configuration
        • Set Model Configuration
      • State Machine
        • Edge
          • Legacy
            • Create Edge
            • Edge Details
            • Update Edge
            • Delete Edge
          • Update edge properties
          • Get edge details
          • Delete an edge
          • Create an edge (transition) between two states
        • State
          • Create State from Template
          • Get State Information
          • Update State
          • Delete State
        • Get State Machine Structure
      • Prompt Templates
        • Get All Templates
        • Render Template
      • Tools
        • Get Tools List
        • Add Tool
        • Update Tool
        • Delete Tool
      • Get All Agents
        GET
      • Single Agent Details
        GET
      • Create Agent
        POST
      • Update Agent Details
        PUT
      • Enable Dashboard For An Outlet
        POST
      • Disable Dashboard For An Outlet
        POST
      • Get Call queue Sheet ID
        GET
    • Interactions
      • Pre-Interaction Context
        • Schedule an Interaction
        • Update an Interaction Id
        • Delete an Interaction Id
        • Clear all interactions
        • Get Summarized Interaction Info
      • Interaction Modalities
        • Video
          • Generation
            • Generate Welcome Video
        • Text
          • Start Interaction
          • Create Response
          • End Interaction
        • Voice
          • Connection Configuration
            • Quickstart
            • Connecting Twilio
            • Connecting Exotel
            • Formi WebSocket Configuration Guide
            • Create a New Connection Vendor
            • Get All Connection Vendors
            • Update a Connection Vendor
            • Delete a Connection Vendor
            • Get Agent's Connection Config
            • Add or Update Agent's Connection Config
      • Post Interaction Configuration
        • Email Destination Configuration
        • Variables CRUD
          • Get all required variables for the outlet with map
          • Modify variable definition for outlet
          • Add a new variable for the outlet
          • DELETE variable for outlet
          • Connect Variable to a destination
        • Destinations CRUD
          • Get all destinations for the outlet
          • Modify Destination for outlet
          • Add a new Destination for the outlet
          • DELETE Destinations for outlet
      • Get Interaction Summary
        GET
      • Resolve an Escalated Interaction
        POST
      • Get the Interaction list
        GET
      • Get Information regarding Single Interaction
        GET
    • Agent Utilisation
      • Get Credits Available
        GET
      • Interaction Utilisation
        GET
      • Model Utilisation
        GET
    • Webhooks
      • Get webhook URL
      • Update webhook URL
      • Get webhook metadata
      • Modify webhook metadata
      • Get reservation ingestion metadata
    • Untitled Endpoint
      POST

    Our Technical Note

    The Global Problem We're Solving

    The Enterprise AI Implementation Crisis: Despite $330B in projected enterprise AI investments, 70% of projects are failing due to fundamental architectural limitations rather than execution issues. The core challenge isn't a talent shortage; it's that current AI paradigms are fundamentally inadequate for enterprise transformation.

    The Data Gap: Traditional AI systems operate on incomplete data representations, missing critical behavioral signals (hesitation patterns, contextual pauses, multi-modal interactions) that determine genuine customer intent. This creates a massive opportunity for 10x differentiation through signal-complete AI architectures.

    The Orchestration Paradox: While enterprises demand end-to-end customer journey automation, current tools create fragmented experiences—handling isolated touchpoints but failing to maintain contextual continuity across voice, text, and email interactions.

    Our Contrarian Technical Thesis: Challenging Silicon Valley's Core Assumptions

    We believe the AI industry is building on three fundamentally flawed assumptions that create massive market opportunities for IP-led disruption:
    1. The Scaling Fallacy

    Assumption: Larger models + more data = eventual AGI

    Reality: Scaling current transformer architectures hits diminishing returns without addressing data completeness and causal understanding. Most publicly available training data has already been consumed, so further scaling yields little noticeable improvement.

    2. The Specialization Trap

    Assumption: Task-specific AI systems are the practical path forward

    Reality: Fragmented systems create customer journey disconnects and prevent true business transformation

    3. The Intelligence Without Theory Myth

    Assumption: We can achieve general intelligence by solving isolated components

    Reality: Without unified frameworks for data mapping, processing and causal learning, we're building sophisticated but brittle systems.

    Deep Dive: Why OpenAI and Fine-Tuning Are Not Enough

    Background

    Customer decisions are driven by four completely different types of data that current AI systems cannot process together effectively.

    When a customer interacts with your business, they're simultaneously sending signals through multiple channels that tell different parts of their story:

    Signal types, with descriptions and examples:

    Behavioral signals (irregular time series on non-uniform grids, e.g. clicks and engagement patterns):
    - Click hesitation patterns
    - Page scroll velocity
    - Cart abandonment timing
    - Support ticket frequency
    - Feature usage sequences
    - Login patterns & session duration
    - Mobile vs desktop switching
    - Document download patterns

    Emotional signals (continuous manifolds in prosodic space, e.g. vocal confidence and speech patterns):
    - Vocal confidence levels
    - Speech rate fluctuations
    - Tone escalation patterns
    - Frustration markers (sighs, pauses)
    - Excitement indicators (pitch changes)
    - Stress detection in voice

    Contextual signals (discrete categorical distributions, e.g. demographics and channel preferences):
    - Geographic location & time zone
    - Channel preference history
    - Purchase history categories
    - Referral source patterns
    - Social media activity level

    Linguistic signals (sequential token embeddings with attention structure):
    - Vocabulary complexity
    - Question formulation patterns
    - Technical vs casual language
    - Urgency keywords usage
    - Politeness/formality levels
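    For concreteness, the four signal classes above can be sketched as container types. This is a minimal illustrative sketch; the field names and shapes are our assumptions here, not the production schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BehavioralSignal:
    # Irregular time series on a non-uniform grid: parallel lists of
    # (timestamp, event) pairs, e.g. clicks and cart events.
    timestamps: List[float]
    events: List[str]

@dataclass
class EmotionalSignal:
    # Continuous prosodic features sampled from the audio stream.
    pitch_hz: List[float]
    speech_rate_wpm: float
    pause_durations_s: List[float]

@dataclass
class ContextualSignal:
    # Discrete categorical attributes (region, channel, segment, ...).
    categories: Dict[str, str] = field(default_factory=dict)

@dataclass
class LinguisticSignal:
    # Sequential token embeddings: one vector per token.
    token_embeddings: List[List[float]]

@dataclass
class CustomerSnapshot:
    # One synchronized multi-modal observation of a customer interaction.
    behavioral: BehavioralSignal
    emotional: EmotionalSignal
    contextual: ContextualSignal
    linguistic: LinguisticSignal
```

    The point of the sketch is that each modality has a genuinely different shape (irregular series, continuous features, categorical maps, token sequences), which is why a single text-only representation loses information.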

    Our Current Approach

    Current Engineering Path: Currently we're proving the thesis through sophisticated orchestration systems and collecting the data necessary to train the model:

    • Reasoning Models: Advanced prompt engineering and chain-of-thought processing to handle complex decision trees
    • Multi-Model Orchestration: Intelligent routing between specialized models based on signal types and business context
    • Human-in-the-Loop Validation: Expert feedback loops that teach our system which signal combinations predict successful outcomes
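    The multi-model orchestration step above can be sketched as a signal-type router. The model and signal names here are illustrative placeholders we made up for the sketch, not an actual model registry.

```python
from typing import Dict

# Hypothetical routing table: dominant signal type -> specialist model name.
ROUTES = {
    "emotional": "prosody-specialist",
    "behavioral": "timeseries-specialist",
    "contextual": "reasoning-llm",
    "linguistic": "reasoning-llm",
}

def route(signal_scores: Dict[str, float]) -> str:
    """Route to the specialist for the strongest detected signal;
    fall back to the general reasoning model when nothing is detected."""
    if not signal_scores:
        return "reasoning-llm"
    dominant = max(signal_scores, key=signal_scores.get)
    return ROUTES.get(dominant, "reasoning-llm")
```

    In practice the routing decision would also consider business context, but even this toy version shows the shape of the orchestration layer: classify the signal mix, then dispatch.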

    We've built a human-in-the-loop system that simultaneously:

    • Meets the ROI standards for businesses we are working with (current customers)
    • Collects ground-truth causal labels for signal → action mappings
    • Captures all four signal types synchronously

    Dataset:

    • We have 100,000+ labeled signal-action pairs with causal annotations for Customer x Business interactions.
    • 60-70 distinct actions across 65 businesses.
    • Only dataset with synchronized multi-modal signals and causal ground truth

    Key Observations:

    • Signal combinations show 85% consistency within business contexts
    • Emotional-behavioral signal interactions dominate action selection (67% of variance)

    This orchestration is constrained to the signal-to-action mappings identified by humans in the loop. To achieve superhuman performance at scale, we need a fundamentally different way to think about the challenge.

    The Fundamental Limitations of Human-in-the-Loop Approach

    Cognitive Processing Bottleneck
    The Problem: Humans cannot consciously capture all signal-to-action mappings because most decision-making occurs in the subconscious (System 1 thinking). Expert operators make intuitive decisions based on pattern recognition they cannot fully articulate or remember.
    Business Impact: Critical signal combinations remain unlabeled, creating gaps that only surface, and must then be stitched closed, once the business takes the agent live in a real environment.

    Degrading Performance Under Scale
    The Problem: As customer interactions increase, human validators cannot maintain consistency when labeling complex signal combinations. Each validation decision carries a cognitive load that decreases accuracy over time.
    Business Impact: System performance degrades daily without constant human recalibration, requiring exponentially more human resources to maintain quality standards.

    Time-to-Value Mismatch
    The Problem: The current mechanism requires 3-6 months of human training and validation before businesses see meaningful ROI, while most enterprises need immediate value from AI investments and cannot justify extended implementation periods.
    Business Impact: Market adoption becomes limited to businesses with exceptional patience and resources, constraining scalability.

    Economic and Risk Barriers
    The Problem: Our current approach is expensive (high human-resource costs), and businesses cannot budget for indefinite human-in-the-loop costs with uncertain outcomes.
    Business Impact: We currently absorb these costs through engineering and human resources to meet customer benchmarks, but this model cannot scale to hundreds of enterprise customers.

    The Breaking Point

    This approach works for our current 65 business implementations because we can absorb the human overhead costs. However, it creates an impossible scaling equation: each new customer requires exponentially more human validation effort while providing only linear revenue growth.

    This is precisely why we need the native signal-processing model - to eliminate the human bottleneck and create truly scalable AI that learns signal-to-action mappings automatically, rather than depending on human cognitive limitations.

    The 10x in ROI

    Human operators can only handle and optimise 20-30% of customer interactions effectively, leaving roughly 70% of touchpoints uncaptured due to human error or limited experience with the coverage possible. Think of a junior sales rep versus a country head who knows every trick for optimising the outcome.

    1. Current agents are limited to conscious human decision-making patterns and what humans can observe. Our model would capture subconscious signal nuances that humans cannot articulate but that significantly impact outcomes, letting our system outperform humans at hitting business goals.
    2. Capturing complete signal-action combinations ensures 95%+ customer journey coverage without human involvement.

    Potential Business Metrics Impact:

    • Conversion Rate: 15% standard → 52% (3.5x improvement from complete signal processing)
    • Customer Satisfaction: 72% → >95% (seamless experience across all touchpoints)
    • Revenue per Customer: 40% increase from optimal action selection at every interaction
    • Cost Reduction: 98% reduction in per-interaction costs

    What Should Change: Signal-To-Action Mapping and Model

    Our Vision: We aim to build the first AI model capable of processing the complete spectrum of human behavioral signals to drive accurate business actions. This isn't theoretical—we're systematically building toward this through deliberate data collection and validation.

    What would become the 10X Differentiator?

    Current AI models fail because they're trained on text-heavy datasets that strip away crucial behavioral context. We're collecting signal-rich datasets that capture:

    • Temporal patterns (hesitation, response timing, conversation flow)
    • Multi-modal behavioral cues (tone shifts, engagement patterns, decision points)
    • Contextual business outcomes tied to specific signal combinations
    • Real customer journey progressions with complete signal histories

    We aim to build the foundational AI model that solves the fundamental mathematical problem of multi-modal causal inference: the first system that can natively learn the necessary causal mapping across heterogeneous signal spaces and outperform humans on business outcomes by a wide margin.


    The Mathematical Challenge We've Uncovered

    The core problem: No existing method can learn the causal mapping

    G: S₁ × S₂ × S₃ × S₄ → A

    where:

    • Each Sᵢ has a different topology and sampling rate
    • The mapping involves hidden, heterogeneous reward functions
    • The action space A is compositional (60-70 base actions, which our agent currently performs, each with continuous parameters)
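    The type of G can be made concrete with a toy stand-in. The action names and thresholds below are invented for illustration; the real G is the learned decoder described later, not a rule.

```python
from typing import NamedTuple, Tuple

class Action(NamedTuple):
    # Compositional action: one of the ~60-70 discrete base actions,
    # plus continuous parameters (e.g. a discount percentage).
    base_action: str
    params: Tuple[float, ...]

def toy_G(s1_hesitation: float, s2_stress: float,
          s3_segment: str, s4_urgency: float) -> Action:
    """Toy stand-in for G: S1 x S2 x S3 x S4 -> A.
    Each argument is a crude scalar/categorical summary of one signal class.
    Rule: escalate to a human when stress and urgency are both high."""
    if s2_stress > 0.7 and s4_urgency > 0.7:
        return Action("escalate_to_human", ())
    return Action("continue_dialogue", ())
```

    The point of the sketch is the signature: four heterogeneous inputs in, one discrete base action plus continuous parameters out.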

    High Level Representation
    [Two figures omitted: a Mermaid high-level architecture chart and a photographed sketch]

    Why Current Approaches Are Failing

    1. Concatenated embeddings + transformers: Destroys causal structure, treats all signals as independent and identically distributed.
    2. Standard RL: Assumes single reward function, fails on population heterogeneity.
    3. Existing causal discovery (PC, GES algorithms): Requires homogeneous data types.
    4. Multi-modal fusion: No framework preserves causal relationships across modalities.

    High Level Representation

    [Figure omitted: a Mermaid high-level architecture chart]

    • Deshpande et al. (2022) address images/text, but customer behavior involves time-series behavioral data, emotional signals, and contextual factors that their method doesn't handle.
    • Transformer causal encoding (Nichani et al.) works for structured data but breaks down with heterogeneous sampling rates and signal modalities.

    Based on our review of current causality research:

    • No existing method handles our specific four-signal combination (behavioral time series + emotional prosodics + discrete context + linguistic embeddings)
    • Academic solutions don't scale to 65+ business implementations with real-time requirements
    • Theoretical frameworks lack the engineering infrastructure for practical deployment

    Mathematical Assumptions Underlying the Framework

    Signal Completeness and Universality
    ∀ customers c ∈ C, ∀ businesses b ∈ B: P(a | s₁, s₂, s₃, s₄, c, b) = P(a | s₁, s₂, s₃, s₄, θc, θb)
    where θc, θb are finite-dimensional parameters. This assumes the four signal classes S = {S₁, S₂, S₃, S₄} are sufficient (no hidden confounders) and that no business-specific signal types exist outside our taxonomy.

    Reward Function Separability
    R(s, a | c) = R_base(s, a) + ΔR(s, a | θc)
    Individual reward functions decompose into a universal base function plus bounded personal variations. This assumes humans are "mostly similar" in their decision-making.

    Finite Compositional Action Space
    A = {f₁(θ₁), f₂(θ₂), ..., fₙ(θₙ)} where n ≤ 70 and |Θᵢ| < ∞ for all i
    Actions are parameterized functions with finite parameter spaces, not truly continuous.

    Temporal Markov Property
    P(aₜ | s₁:ₜ, a₁:ₜ₋₁) = P(aₜ | sₜ, hₜ)
    where hₜ is a finite-dimensional sufficient statistic of the history. Future actions depend only on current signals and compressed history.

    Stationarity Within Context
    P(A | S, t, context) = P(A | S, context) for all t ∈ [t₀, t₀ + τ]
    The causal relationships are stationary within a business context over reasonable time windows τ.
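    One common way to realize a finite-dimensional history statistic hₜ, as required by the Temporal Markov assumption, is an exponentially decayed running summary of past signal vectors. This is a sketch of one such choice (the decay constant is an assumption), not the specific statistic our framework commits to.

```python
def update_history(h, s, decay=0.9):
    """One step of a finite-dimensional history statistic:
        h_t = decay * h_{t-1} + (1 - decay) * s_t
    h and s are equal-length lists of floats; the dimension of h never
    grows, no matter how long the interaction runs."""
    return [decay * hi + (1.0 - decay) * si for hi, si in zip(h, s)]
```

    Because hₜ stays the same size at every step, a policy conditioned on (sₜ, hₜ) satisfies the compressed-history form of the Markov property above.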

    Our Proposed Mathematical Framework

    1. Causal Kernel Embeddings

    φ: S₁ × S₂ × S₃ × S₄ → ℋ

    Maps heterogeneous signals into a shared reproducing-kernel Hilbert space ℋ while preserving causal structure.
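    A minimal sketch of the kernel-embedding idea behind φ, under two simplifying assumptions of ours: plain RBF kernels per modality, and a direct-sum combination across modalities. This illustrates mapping heterogeneous sample sets into a shared space and comparing them there; it is not the proposed causal kernel itself.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel matrix between sample sets x: (n, d) and y: (m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(diff * diff, axis=-1))

def mean_embedding_inner(xa, xb, gamma=1.0):
    """Estimate the RKHS inner product <mu_a, mu_b> between the kernel
    mean embeddings of two sample sets: mean of the cross-kernel matrix."""
    return rbf_kernel(xa, xb, gamma).mean()

def embedding_distance(modalities_a, modalities_b, gammas):
    """Squared distance ||mu_a - mu_b||^2 between joint embeddings using a
    direct-sum kernel: one RBF kernel per modality, distances summed.
    Each modality may have its own dimension and sampling rate."""
    total = 0.0
    for xa, xb, g in zip(modalities_a, modalities_b, gammas):
        total += (mean_embedding_inner(xa, xa, g)
                  - 2.0 * mean_embedding_inner(xa, xb, g)
                  + mean_embedding_inner(xb, xb, g))
    return total
```

    The appeal of mean embeddings here is that each modality contributes through its own kernel, so differently shaped signals land in one common space where distances (and downstream operators) are well defined.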

    2. Population-Level Inverse RL

    Learn P(R | context, customer_type) from observed (s, a) pairs

    Discovers the distribution of reward functions across customer populations.
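    A drastically simplified stand-in for this step, under assumptions that are ours, not the framework's: rewards are linear in known features, choices follow a softmax model, and the population distribution P(R) is summarized as a Gaussian over per-customer weights.

```python
import numpy as np

def fit_reward_weights(feats, chosen, iters=200, lr=0.5):
    """Minimal inverse-RL stand-in. feats: (T, K, D) candidate-action
    features per step; chosen: (T,) index of the action actually taken.
    Fits w for a softmax choice model P(a|s) proportional to exp(w . phi(s,a))
    by gradient ascent on the average log-likelihood."""
    T, K, D = feats.shape
    w = np.zeros(D)
    for _ in range(iters):
        logits = feats @ w                                # (T, K)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # gradient: chosen features minus expected features under the model
        grad = (feats[np.arange(T), chosen]
                - (p[..., None] * feats).sum(axis=1)).mean(axis=0)
        w += lr * grad
    return w

def population_reward_distribution(per_customer_data):
    """Fit w per customer, then summarize P(R) as a Gaussian over weights."""
    ws = np.array([fit_reward_weights(f, c) for f, c in per_customer_data])
    return ws.mean(axis=0), np.cov(ws, rowvar=False)
```

    The real problem is harder (hidden, heterogeneous, possibly nonlinear rewards), but the sketch shows the two levels: per-customer reward recovery, then a population-level distribution over the recovered rewards.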

    3. Causal Action Decoder (THE MISSING PIECE)

    π*: ℋ × P(R) → A(θ)

    Maps from embedded signal space + reward distribution to compositional actions.

    This could be implemented as either Direct Causal Mapping or Causal Policy Network.
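    A sketch of the decoder's shape along the "Causal Policy Network" direction, with everything simplified: linear scoring heads per base action, the population reward summarized by its mean weight vector, and all weights as illustrative placeholders rather than trained values.

```python
import numpy as np

def decode_action(z, w_mean, heads):
    """Causal action decoder sketch: pi*: H x P(R) -> A(theta).
    z: embedded signal vector (a point in the shared space H).
    w_mean: mean of the population reward distribution (crude summary of P(R)).
    heads: dict mapping base-action name -> (score_vec, param_mat), one
    linear scoring head and one parameter head per base action.
    Returns (base_action, continuous_params)."""
    x = np.concatenate([z, w_mean])
    best, best_score = None, -np.inf
    for name, (score_vec, _) in heads.items():
        s = float(score_vec @ x)
        if s > best_score:
            best, best_score = name, s
    _, param_mat = heads[best]
    params = np.tanh(param_mat @ x)   # bounded continuous parameters
    return best, params
```

    The structure mirrors the compositional action space A: a discrete choice over base actions, then continuous parameters emitted for whichever action won.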


    The 10x Impact for Solving the Math

    How Business Outcomes would Change Radically with this Approach

    The causal embedding approach creates a world where AI delivers hyper-personalized, empathetic, and seamless customer experiences.

    By integrating behavioral, emotional, and contextual signals, it anticipates needs, eliminates friction, and fosters trust, transforming interactions across industries into delightful, inclusive, and empowering journeys that boost loyalty and engagement.

    Overall, it shifts AI from resource-intensive and limited-text-based processing to a more holistic, efficient, and predictive paradigm.

    Cost Reduction
    Expected improvement: 100-1000x
    Current limitation: current systems require expensive reasoning models (e.g., GPT-4, Claude) for each inference, as they must reconstruct causal relationships from scratch every time.

    Accuracy Improvement
    Expected improvement: 60% → 95%
    Current limitation: current LLMs only process linguistic and partial contextual signals, missing 75% of decision-relevant information, including:
    - Behavioral signals: completely ignored by LLMs.
    - Emotional signals: not representable in text.
    - Full contextual signals: only surface-level in current systems.

    Why We Will Win

    Operational Success: Our customer implementations demonstrate ROI in live business environments—critical proof beyond laboratory conditions.

    Academic Credibility: Team members and advisors with published research from IIT Delhi and IISc provide essential scientific foundation.

    Proprietary Dataset Advantage: 100,000+ labeled multi-modal dataset creates a defensible data and time moat—an IP-led competitive advantage that cannot be easily replicated.

    Research-Led AI Adoption Breakthroughs:

    • STT 5x Accuracy Jump: Solved fundamental architectural limitations of European/US-trained models in the Indian market, improving Deepgram's recognition accuracy for Indian locations from under 15% to 75%, a 5x breakthrough enabling AI adoption. Research paper publication forthcoming (references below).
    • 10x Delight with TTS Models: Achieved 85% improvement in Indian name/location pronunciation versus Google TTS/Amazon Polly (Indian-configured), significantly increasing customer delight and business trust in AI for critical customer journey touchpoints.

    References:

    • Filed Patent for our CTO: https://www.patentguru.com/US10993017B2
    • Research Paper Co-Authored by our Engineer: https://www.mrs.org/meetings-events/annual-meetings/archive/meeting/presentations/view/2022-mrs-fall-meeting/2022-mrs-fall-meeting-3784443

    Additional Readings:

    1. STT Improvement References
    2. STT Nuances
    Modified at 2025-08-12 05:48:22