Data Tech June 3, 2025

SenseCheck: AI-Powered Comms Stress-Testing

Designed and built a demographic simulation platform that lets organisations stress-test draft announcements against 148,000 synthetic Singapore personas before publication. SenseCheck identifies which phrases trigger negative reactions in which segments and recommends targeted rewrites with validated improvements.

Task

Conceptualized, designed, and developed a comprehensive banking calculator tool that enables users to compare interest rates and potential returns from different banks, considering multiple qualification criteria.

  • Role

    AI Engineer, Product Owner & Developer

  • Tech

    LangChain, LangGraph, Arize Phoenix (OpenTelemetry)

  • Status

    Iteration 2 in PRD

Discovery

Why Organisations Fly Blind on Language

Government agencies and large organisations draft policy announcements without knowing how specific resident segments will react until after publication. A single dismissive phrase can dominate public discourse for weeks. By the time you know it’s a problem, the damage is done.

The traditional alternatives are surveys and focus groups. They’re slow (weeks to months), expensive, and biased toward respondents who self-select — meaning the voices most affected by a policy change are often the least represented.

Existing synthetic persona tools (notably NVIDIA’s Nemotron-Personas-Singapore demos) had shown that AI-driven simulation could produce demographically differentiated sentiment. But these were one-shot explorations. None offered the thing a communications officer actually needs: which specific phrases are the problem, for which specific segments, and what should I change them to — with proof that the change actually helps.

That was the gap SenseCheck was designed to close: an iterative testing pipeline that goes from diagnosis to validated fix in minutes.

Personas

The Baseline: Can Synthetic Personas React to Language?

The simplest useful version. A communications officer pastes their draft announcement. The system samples 50 personas from a dataset of 148,000 synthetic Singaporeans (NVIDIA Nemotron-Personas-Singapore, CC BY 4.0), using population-weighted stratified sampling so the persona mix reflects Singapore’s actual demographic distribution per planning area. Each persona is run through Claude Sonnet with role-based prompting and returns structured output: sentiment direction, intensity score, trigger phrase, and reasoning.
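The population-weighted stratified sampling step could be sketched as follows. This is a minimal illustration, not the production sampler: the `planning_area` field name, the per-area weight table, and the quota-rounding strategy are all assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(personas, n, area_weights, seed=0):
    """Sample n personas so the planning-area mix tracks population shares.

    personas: list of dicts with a 'planning_area' key (hypothetical schema).
    area_weights: {planning_area: population share}, shares summing to ~1.0.
    """
    rng = random.Random(seed)

    # Group the 148k-persona pool by planning area.
    by_area = defaultdict(list)
    for p in personas:
        by_area[p["planning_area"]].append(p)

    # Allocate a per-area quota proportional to its population share.
    quotas = {area: round(n * w) for area, w in area_weights.items()}

    sample = []
    for area, quota in quotas.items():
        pool = by_area.get(area, [])
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample[:n]
```

Rounding per-area quotas can over- or under-shoot `n` by a few personas; a production version would reconcile the remainder (e.g. largest-remainder apportionment) rather than truncate.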

The results render on a choropleth map of Singapore’s planning areas, colour-coded by dominant sentiment. A persona feed sidebar shows each individual reaction with sentiment badges and the exact phrase that drove it.

Architecture: Draft Announcement → Persona Sampler (stratified by planning area) → Claude Sonnet (role-based prompt, zero-shot) → Structured Output (SENTIMENT / INTENSITY / TRIGGER_PHRASE / REASONING) → Choropleth Map + Persona Feed + Phrase Attribution
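The structured output stage above could be parsed into a typed record like this. A minimal sketch assuming the model emits one `KEY: value` field per line matching the prompt contract; the dataclass fields mirror the SENTIMENT / INTENSITY / TRIGGER_PHRASE / REASONING format, but the exact parsing is illustrative.

```python
from dataclasses import dataclass

@dataclass
class PersonaReaction:
    sentiment: str       # e.g. "positive" | "neutral" | "negative"
    intensity: float     # signed intensity score
    trigger_phrase: str  # exact phrase that drove the reaction
    reasoning: str

def parse_reaction(raw: str) -> PersonaReaction:
    """Parse the model's tagged output into a structured record."""
    fields = {}
    for line in raw.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().upper()] = value.strip()
    return PersonaReaction(
        sentiment=fields["SENTIMENT"].lower(),
        intensity=float(fields["INTENSITY"]),
        trigger_phrase=fields["TRIGGER_PHRASE"],
        reasoning=fields["REASONING"],
    )
```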

Design decisions that mattered:

Parallel API calls with a batch size of 48, targeting 50 personas simulated in under 15 seconds. Claude Sonnet was chosen for persona reasoning — best instruction-following for role-based simulation — while MiniMax and GPT-4o-mini handle cheaper sub-tasks like theme extraction and concern tagging at 3–4x lower cost.
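The parallel-call pattern with a bounded batch size could look like this. A sketch only: `simulate_one` stands in for the real LLM call, and a semaphore caps in-flight requests at the batch size of 48.

```python
import asyncio

async def simulate_all(personas, simulate_one, batch_size=48):
    """Run persona simulations concurrently, at most batch_size in flight.

    simulate_one: async callable (persona -> reaction); a placeholder
    for the real model call.
    """
    sem = asyncio.Semaphore(batch_size)

    async def bounded(persona):
        async with sem:
            return await simulate_one(persona)

    # gather preserves input order, so results line up with personas.
    return await asyncio.gather(*(bounded(p) for p in personas))
```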

What worked:

The phrase-level attribution was immediately useful. In a test scenario with a hawker centre closure announcement for Toa Payoh, 72% of elderly low-income personas reacted negatively with an average intensity of -3.8. The phrase “explore nearby dining options” was flagged as the top trigger — personas interpreted it as dismissive of their daily food access needs. That level of specificity — not just “elderly people are unhappy” but “this phrase is the reason” — was the core value proposition validated.

What the evaluation revealed was missing:

When I compared responses across planning areas, a Toa Payoh elderly persona sounded nearly identical to a Punggol elderly persona. They were reacting to the same phrases with the same reasoning, just with different names. The grounding delta eval — comparing whether responses were actually differentiated by location — failed. Without real local context, every persona was reasoning from the same generic understanding of Singapore.

That signal triggered Iteration 2.

Persona 1: Beginner Saver
Young, tech-savvy professional
  • Looking to save for a BTO flat down payment in the next 2-3 years
  • Already has credit cards with several banks but isn’t strategically using them to qualify for higher interest rates
  • Prefers digital solutions for financial management

“I need to understand how different banking decisions will affect my financial future.”

Persona 2: Time-Pressed Tactician
Mid-career professional with children
  • Manages family finances and wants to optimize returns without spending hours researching
  • Has insurance policies and uses GIRO for utility bills and children’s school fees
  • Time-constrained and needs quick, clear financial information

“I need to quickly determine which bank will give me the best return for my specific situation.”

Persona 3: Optimization Seeker
Small F&B business owner
  • Has fluctuating monthly income
  • Uses multiple banks for business and personal finances
  • Interested in optimizing distribution of funds across banks to maximize interest while maintaining liquidity

“I need to optimize my banking setup to maximize interest without spending hours on calculations.”

Persona 4: Retiree
Has substantial savings of $200,000+
  • Less comfortable with technology but willing to use intuitive digital tools
  • Looking for ways to generate passive income through interest as she transitions to retirement
  • Concerned about minimum balance requirements and maintaining access to funds

“I want to keep my money with me but try to maximise my savings. I don’t understand all these banking terms and conditions.”

Strategy

Iteration 2: Grounding in Reality — and the Rewrite Problem

RAG: Making Personas Sound Like They Live Where They Live

Before each persona call, the system now retrieves real hyperlocal data from data.gov.sg by planning area: nearby hawker centres, MRT stations, supermarkets, parks, median household income, age distribution, and flat type mix. This is injected as a LOCAL ENVIRONMENT block in the system prompt, so each persona responds with awareness of their actual neighbourhood.

A deliberate retrieval decision: this is a direct JSON lookup by planning area, not semantic search. Planning area demographics are structured tabular data — income bands, amenity counts, flat type distributions. Embeddings can’t capture the semantics of exact values. A direct lookup is faster, cheaper, and more accurate. This was informed by the principle that semantic retrieval struggles on tabular data where precision matters.
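The lookup-and-inject step could be sketched as below, assuming a JSON index keyed by planning area (the field names mirror the data.gov.sg-derived stats but are illustrative).

```python
import json

def load_area_index(path):
    """Load the planning-area JSON once at startup (keyed by area name)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def local_environment_block(area_index, planning_area):
    """Direct keyed lookup plus prompt-block rendering - no embeddings.

    Exact values (income bands, amenity counts) survive intact, which
    semantic retrieval over tabular data would not guarantee.
    """
    stats = area_index[planning_area]
    lines = [f"LOCAL ENVIRONMENT ({planning_area}):"]
    for key, value in stats.items():
        lines.append(f"- {key.replace('_', ' ')}: {value}")
    return "\n".join(lines)
```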

A cost optimisation followed naturally: the demographic block is identical for all personas in the same planning area, so it’s placed at the top of the prompt as a cacheable prefix. Repeated calls for Toa Payoh personas reuse the cached context, reducing cost by roughly 90% for same-area batches.
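The prompt layout that makes the shared prefix cacheable could look like this. A sketch following Anthropic-style prompt caching, where the provider caches the prompt prefix up to a `cache_control` marker; the function and argument names are assumptions, and no API call is made here.

```python
def build_messages(shared_area_block, persona_profile, draft):
    """Order the system prompt so the per-area block is a cacheable prefix.

    The shared block is identical for every persona in the same planning
    area, so it goes first with a cache marker; the persona profile, which
    varies per call, comes after it.
    """
    system = [
        {
            "type": "text",
            "text": shared_area_block,                 # same for whole area
            "cache_control": {"type": "ephemeral"},    # cache boundary
        },
        {"type": "text", "text": persona_profile},     # varies per persona
    ]
    messages = [{"role": "user", "content": draft}]
    return system, messages
```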

The grounding delta was immediately visible. A Toa Payoh elderly persona now references actual hawker centres and real walking distances to alternatives. A Punggol young family persona references Waterway Point food court. The responses became location-specific rather than generic. The eval that had failed in Iteration 1 now passed.

Design

Analytics-First Design

Unlike typical calculators that simply present data, I took an analytics-first design approach focused on transforming raw calculations into actionable insights: a logical flow from input variables to insight presentation, a progressive disclosure hierarchy for complex information, and a comparative visualization framework that highlights key differences.

The technical architecture was designed to support complex calculations while maintaining performance and scalability:

  • Modular Calculation Engine: Separated core calculation logic from presentation layer
  • Data Management: Used Supabase for structured storage of rates and criteria
  • Performance Optimization: Implemented calculation memoization to improve response time
  • Analytics Implementation: Designed event tracking for feature usage patterns
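The calculation memoization mentioned above could be as simple as caching a pure interest function. A minimal sketch with an illustrative flat-rate model (the real tiered-rate logic is not shown); integer cents and basis points keep the arguments hashable and avoid float drift.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def monthly_interest(balance_cents: int, annual_rate_bp: int) -> int:
    """Memoized monthly interest in cents (illustrative flat-rate model).

    Repeated what-if runs with the same balance and rate hit the cache
    instead of recomputing.
    """
    return balance_cents * annual_rate_bp // (10_000 * 12)
```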
Analytics

Measurement Framework

I established a comprehensive measurement framework to evaluate product success, focusing on these primary metrics:

  • User Confidence Score: Measured through post-calculation survey
  • Decision Time: Time from initial visit to completion of comparison
  • Optimization Value: Percentage improvement through fund distribution optimization
  • Satisfaction Rating: Post-usage satisfaction survey results

I implemented a robust analytics architecture to measure both usage and impact, with an event tracking framework aligned with the user journey, a feedback collection system, and performance monitoring for calculation response time.

Security

Security Implementation

Input Sanitization
  • DOMPurify Implementation: Added robust sanitization for all user inputs, configured to strip HTML tags and attributes to prevent XSS attacks
  • Client-Side Sanitization: Sanitized user inputs in chat messages and calculator inputs before processing
  • Server-Side Sanitization: Implemented middleware to sanitize all incoming requests including body, query parameters, and URL parameters
Rate Limiting
  • API Rate Limiting: Implemented rate limiting on sensitive APIs (5 requests per minute per user) to prevent abuse
  • User Experience: Added user-friendly error messages and visual styling for rate limit notifications
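A per-user 5-requests-per-minute limit like the one above could be enforced with a sliding window. This is a single-process sketch (the class name and in-memory store are assumptions; a deployed version would typically back this with Redis or similar).

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter: at most max_calls per window_s per user."""

    def __init__(self, max_calls=5, window_s=60):
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = defaultdict(deque)  # user_id -> recent call timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        window = self._calls[user_id]
        # Evict timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_s:
            window.popleft()
        if len(window) >= self.max_calls:
            return False  # caller should surface a friendly 429 message
        window.append(now)
        return True
```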
Authentication Security
  • Supabase Authentication: Implemented secure email/password authentication with Google OAuth integration
  • Auth Flow Protection: Created secure authentication redirects and properly managed sessions and tokens
  • Production Configuration: Configured OAuth providers to only accept redirects to trusted domains
API & Data Security
  • CORS Configuration: Properly configured CORS headers to restrict API access to trusted domains
  • Secure Error Handling: Implemented error handling that doesn’t leak implementation details
  • Input Validation: Added checks for input types and formats before processing
  • Protected Routes: Implemented access controls to protect sensitive functionality
Analytics

Results & User Impact

Early results demonstrate significant impact on user financial decision-making:

  • 76% of users report increased confidence in their banking decisions
  • Average optimization value of 21% through the fund distribution algorithm
  • Decision time reduced by 89% compared to manual comparison methods
  • 4.7/5 average satisfaction rating from post-usage surveys

User feedback revealed several key insights: transparency drives trust, with users particularly valuing understanding exactly how interest is calculated; the fund distribution optimization feature receives the highest satisfaction ratings; and mobile usage exceeds expectations, with 68% of users accessing the tool via mobile devices and spending 24% more time exploring different scenarios.

Product Roadmap

Evolution & Future Plans

SmartSaver has evolved significantly from its initial concept, with each release adding valuable functionality based on user feedback and strategic priorities:
