Building an AI-Powered Query System on Salesforce Objects

December 09, 2025

Salesforce Data Engineering AI LLM CRM SOQL

Building an AI-Powered Query System on Salesforce Objects

While understanding Salesforce objects is foundational, the real power emerges when you combine object knowledge with LLMs to create intelligent querying systems. This is the core of what Divami is building.

The Problem: From Manual Dashboards to Dynamic Intelligence

Traditional Salesforce dashboards require manual configuration. Want to answer a new business question? You must:

Navigate to Reports or Dashboards
Manually select objects and fields
Define filters and aggregations
Create visualizations

This workflow is static and slow. Modern executives need dynamic, ad-hoc analysis - the ability to ask natural language questions and get instant answers.

The Solution: LLM-Powered Salesforce Querying

Divami's approach bridges this gap by leveraging three key technical capabilities:

1. Object Metadata Extraction - Understanding what data exists and how it's structured

2. LLM-Assisted Query Generation - Converting natural language questions into SOQL (Salesforce Object Query Language)

3. Dynamic Visualization - Rendering results as charts and dashboards without pre-configuration

Real-World Example: From Questions to Insights

Consider this workflow powered by helper functions that interact with Salesforce:

def get_salesforce_tables():
    """Get list of all queryable Salesforce tables"""
    tables = [obj['name'] for obj in sf.describe()['sobjects'] if obj['queryable']]
    return tables

def get_table_schema(table_name: str):
    """Get schema details for a specific Salesforce table"""
    table = getattr(sf, table_name)
    fields = table.describe()['fields']
    schema = []
    for f in fields:
        field_info = dict(name=f['name'], type=f['type'], label=f['label'])
        if f['type'] == 'picklist': 
            field_info['values'] = [v['value'] for v in f['picklistValues']]
        if f['type'] == 'reference': 
            field_info['references'] = f['referenceTo']
        schema.append(field_info)
    return schema

def run_salesforce_query(soql_query: str):
    """Execute a SOQL query and return results"""
    result = sf.query(soql_query)
    return result['records']

These three functions become the interface between natural language and Salesforce data. An LLM can:

Call get_salesforce_tables() to understand what objects are available
Call get_table_schema() to understand field types and relationships
Generate appropriate SOQL and execute via run_salesforce_query()

Case Study: Analyzing Sales Pipeline Data

Let's walk through a real example of how this system works by asking progressively complex business questions.

Question 1: "How many opportunities do we have by stage?"

Initial Query (with error):

SELECT Stage__c, COUNT(Id) cnt FROM Custom_Opportunity__c GROUP BY Stage__c

The LLM initially tried the wrong field name. After receiving a malformed request error, it self-corrected by referencing the schema to find Opportunity_Stage__c instead:

SELECT Opportunity_Stage__c, COUNT(Id) cnt FROM Custom_Opportunity__c WHERE Opportunity_Stage__c != null GROUP BY Opportunity_Stage__c

Results:

Nurture: 66 opportunities (52%) - most leads still in early stage
SQL: 18 opportunities (14%) - qualified sales leads
Proposal: 14 opportunities (11%) - actively pitching
Solution/PoC: 12 opportunities (10%) - proof of concept phase
Won: 8 opportunities (6%) - closed deals
Discovery: 3 opportunities (2%)
Contract: 2 opportunities (2%)
Lost: 2 opportunities (2%)
Negotiation: 1 opportunity (1%)

This reveals the sales funnel shape - most opportunities are still in nurture, with a significant drop-off at proposal stage.

Question 2: "What's the total value of opportunities by owner and their performance?"

Query:

SELECT Owner.Name owner_name, Owner.UserRole.Name role_name, SUM(Revenue__c) total_revenue, COUNT(Id) opp_cnt FROM Custom_Opportunity__c GROUP BY Owner.Name, Owner.UserRole.Name ORDER BY SUM(Revenue__c) DESC

Key Findings:

Owner	Opportunities	Total Value	Avg Value per Opp
John	9	$554.8M	$61.6M
Alice	48	$50.4M	$1.0M
Bob	4	$44.1M	$11.0M
Charlie	14	$17.9M	$1.3M
Dave	24	$9.8M	$408K

Business Insight: John manages fewer deals but significantly higher-value ones, suggesting he focuses on enterprise accounts.

Question 3: "Which opportunities are closing this month?"

Query:

SELECT Name, Company__r.Name, Revenue__c, Opportunity_Stage__c, Target_Close_Date__c FROM Custom_Opportunity__c WHERE Target_Close_Date__c >= 2025-12-01 AND Target_Close_Date__c <= 2025-12-31

Results (December 2025):

Acme Corp - $26M (Proposal, closes Dec 22)
BetaTech - $900K (Proposal, closes Dec 23)
Gamma Inc - $4M (Proposal, closes Dec 9)
Delta LLC - $800K (Won, closed Dec 8)
Epsilon Co - $82.5K (Proposal, closes Dec 5)

Total value closing this month: $3.17M

Additional Insights from Related Objects

Lead Conversion Performance:

Total leads: 352
Converted leads: 6
Conversion rate: 1.7%

Top lead sources for conversions: N/A (50%), Research (33%), Partner (17%)

Account-Opportunity Relationships:

Placeholder account: 57 opportunities ($55.6M)
Top contacts per account: Tata Communications (54), Blue Yonder (11)
Industries represented: Technology (29%), Healthcare (13%), Computer Software (11%)

Why This Matters for Enterprise Intelligence

The key insight here is that objects are not just database tables - they're the foundation for intelligent systems. When an LLM understands:

What objects exist (Account, Opportunity, Lead, Contact, etc.)
How they're structured (fields, types, relationships)
How to query them (SOQL syntax)

...it becomes possible to answer complex business questions in natural language without manual dashboard configuration.

Challenges Encountered and Solutions

Challenge 1: Field Name Ambiguity

Problem: Multiple possible field names for similar concepts (e.g., Stage__c vs Opportunity_Stage__c)
Solution: Always call get_table_schema() first to validate field names before query generation

Challenge 2: Query Failures and Self-Correction

Problem: LLM generates syntactically valid but semantically incorrect queries
Solution: Implement error handling that feeds back to the LLM with schema information, allowing self-correction

Challenge 3: Complex Relationship Traversal

Problem: Opportunity → Company (Account) relationships require understanding lookup fields
Solution: get_table_schema() returns relationship metadata, enabling the LLM to generate proper query paths using __r notation

Challenge 4: Data Quality

Problem: Many fields unpopulated (e.g., 349/350 accounts have no geographic data)
Solution: Filter with != null in queries and handle missing data gracefully

The Future: From Query Generation to Predictive Intelligence

This foundation opens doors to more sophisticated capabilities:

Predictive scoring: "Which opportunities are most likely to close?"
Anomaly detection: "Which accounts have unusual purchasing patterns?"
Automated recommendations: "Who should follow up with these leads?"

All of these become possible once you've mastered the fundamentals of querying Salesforce objects programmatically.

Key Takeaway

Salesforce objects are more than just database tables - they're the semantic layer that enables intelligent systems. Understanding their structure, relationships, and metadata is the first step toward building AI-powered business intelligence tools that can answer any question executives ask, in real-time, without manual configuration.