We built an AI-powered travel interface from scratch. Not a chatbot. Not a widget. Not a GPT-wrapper with a smiley face bolted onto a website. A full conversational interface with its own vector database, its own retrieval pipeline, its own memory system, and real-time streaming over WebSockets. Not to replace anything. To test whether conversational AI is actually ready to be a meaningful channel for travel. Here's what we found, what it took, and why we think most of what's out there right now is noise.
Long read ahead. Grab a coffee. This one covers architecture diagrams, vector databases, memory systems, and why most AI chatbots in travel are just expensive search bars.
What are you actually trying to solve?
That's the question I keep coming back to. Every time I see a company announce their new "AI-powered" feature. Every time someone asks me what we're doing with AI. Every time a vendor pitches yet another plugin.
It's the most important question in tech right now. And almost nobody is asking it.
Instead, the entire industry is working backwards: starting with "we need to do something with AI" and then looking for a place to put it. Companies hear "AI" and they hear "innovation" and they think they're falling behind if they don't do something. Anything. Now.
The pressure to "do something with AI" has become louder than the question of whether you should.
And so what happens? Companies reach for the most visible, most marketable application they can find: a chatbot. Slap it on the website. Give it a name and a friendly face. Connect it to a knowledge base. Done. "We do AI now."
In travel, this plays out in a specific way. Most travel companies don't have their own developers or IT infrastructure. So third-party vendors, companies that often have nothing to do with travel, bolt thin wrappers onto LLMs, package them as widgets, and sell them as plugins. "Install our AI assistant on your website! Smart. Conversational. 24/7 availability."
Here's the problem with these chatbots: the integration is always limited. Some connect to a product database. Some pull live availability. But as a plugin, there's a ceiling to how deep that integration goes. It's always working with whatever data it gets access to, in whatever format the API provides, with whatever context the vendor decided was enough. The result is an interaction that looks conversational on the surface but is fundamentally constrained in what it can actually do.
And because it's a plugin, it can never be fully seamless. It will always be a separate layer on top of your website, not part of it. A different design language. A different interaction model. A different experience. The customer feels it, even if they can't articulate it. It's the difference between a room that was designed as a whole and a room where someone placed a kiosk in the corner.
A smarter gatekeeper is still a gatekeeper. Your customers wanted a door.
We've been down this road before. The old-fashioned chatbot was already universally hated because it stood between the customer and an actual answer. Making it "smarter with AI" doesn't fix the fundamental problem. And these plugins often make things worse. They hallucinate. They give wrong information about trips. They make promises that travel companies can't keep. And when something goes wrong, the travel company has zero control, because the chatbot isn't theirs. It's the vendor's. Built on someone else's tech. With someone else's priorities.
The real question was never "how do we add AI to our website?" The real question is: where does AI genuinely create value that wasn't possible before? That's what we set out to test.
Why I built what I built
I didn't build this because I think websites are dead or because I wanted to say we "do something with AI." I've written before about how much I despise that mentality.
I built this because of a question I've been testing for over a year now: is conversational AI ready to be a real channel?
In my Web 4.0 post, I wrote about the autonomous web, a world where AI assistants don't just retrieve information but act as independent digital consumers on behalf of their humans. Where Lisa tells her AI assistant "find me a two-week trip to Thailand" and the AI contacts travel companies through MCP-enabled interfaces, negotiates, refines, and presents options. Without Lisa ever opening a browser.
That might sound like removing humans from the equation. It's not. The technology enables full autonomy. But that doesn't mean we want to remove the human. It means the customer gets to choose. And our default will always be: lead to a person. More on that later.
That future is coming. Maybe not tomorrow, but it's coming. And if you're a travel company that takes technology seriously, you'd better be ready for it.
So I built Joy, an AI travel assistant for 333travel, not as a product launch, not as a marketing play, but as a serious test. Can we build a conversational interface that actually adds value? That understands our products? That doesn't hallucinate? That creates a genuinely useful experience as an additional channel next to our website, phone, and email?
I didn't build this to replace our website. I built this to answer a question: can a conversation be a meaningful channel for personalized travel?
What this actually is
Let me be specific about what Joy does, because calling it a chatbot would be like calling a restaurant a vending machine.
Joy sits on top of our entire product database. Every roundtrip, every hotel, every excursion, every destination, all embedded as vectors in a dedicated database. When a customer says "I want something adventurous in Southeast Asia, around two weeks, not too touristy," Joy doesn't keyword-match against a PDF. She searches semantically across hundreds of products, reranks results based on actual relevance, and responds with options that genuinely fit the intent.
And she remembers the conversation. Not just the last message, the whole thread. She knows what trips were discussed, what was saved, what was dismissed. When you say "that trip you mentioned earlier with the cooking class," she understands what you mean even if she originally called it a "culinary experience."
But here's the key difference from any chatbot out there: she doesn't just know travel in general. She knows our travel. Our specific products. Our specific itineraries. Our destinations that our team personally visited. That's not something you can bolt on with a plugin, no matter how smart the LLM behind it is.
And the moment you want human contact? Joy hands the full context to a travel specialist who knows exactly what you've been exploring. No "please describe your inquiry again." No cold transfer. Full continuity.
Joy doesn't replace conversations with specialists. She makes them better.
What she doesn't do (yet): she doesn't handle customer service queries, airline information, or post-booking support. This is purely a product discovery and inspiration channel. We'll get there. But we're not going to pretend she does things she doesn't. And honestly, when someone reaches out to customer service, there's already a problem. That's the worst possible moment to put a machine between you and your customer. That's when you show up. Personally.
Why you can't get here with a plugin
You cannot achieve what we built by installing a plugin. Not with ChatGPT. Not with any off-the-shelf "AI widget." Not with any SaaS tool that promises to "make your website smarter in minutes."
And it's not because the AI models aren't good enough. The models are fine. The problem is everything around them.
The entire system depends on data that we own, structure, and control. Every trip in our database has been visited by our own team. Every itinerary is our own creation. Every piece of destination knowledge lives in our own systems. This isn't data we license from a third party or pull from a brochure catalog.
That means we can embed it. We can structure it exactly how we need it for semantic search. We can update it the moment something changes. We control the full pipeline from raw product data to vector embedding to retrieval to response.
You can't build a smart interface on top of dumb data.
If your product data lives in someone else's system, if you're reselling trips from a catalog, if your "database" is a collection of PDFs from tour operators, you're at the mercy of whatever structure someone else decided on. You can't build intelligence on top of data you don't understand or control.
And many companies aren't just starting from zero, they're starting from a deficit. Their existing systems are already inadequate. The booking engine is outdated. The product data is scattered across spreadsheets and PDFs. The customer journey has gaps everywhere. And instead of fixing those fundamentals, they layer an AI chatbot on top of it. As if intelligence on top of dysfunction creates something functional. It doesn't. It just makes the dysfunction harder to diagnose.
You're not solving a problem. You're decorating one.
Building what we built required three things:
Owning our data. Every trip is ours. Visited by our team. Structured in our database. Embedded in our vector store. You can't build semantic search over data you don't structure yourself.
Owning our systems. Our own backend, frontend, deployment pipeline. When we needed a dual reranker, we built it. When we needed semantic conversation memory, we built it. When we needed an MCP server for Web 4.0 readiness, we built it. No vendor approval needed. No feature requests. No roadmap dependencies.
Owning the entire chain. From the moment a customer opens the chat to the moment they're talking to a specialist, every step is ours. The AI knows our products because we made the products. The specialist knows the conversation because our system feeds it to them.
The AI is the last 10% of the work. The first 90% is building a company that can support it.
The architecture
Since this is a tech blog and not a marketing page, let me show you what's actually running. This isn't a weekend project. This is production infrastructure.
```
Frontend (React + TypeScript + Zustand)
        │
        │  WebSocket (real-time, bidirectional)
        │
FastAPI Backend
├── Query Orchestrator
│   ├── LLM Client (multi-provider with automatic fallback)
│   ├── Tool Registry (9 specialized tools)
│   └── Observer System (async post-turn analysis)
├── RAG Pipeline
│   ├── Vector Database (3 content collections)
│   ├── Embedding Engine
│   └── Reranker (dual provider, switchable)
├── Conversation Memory (dual storage)
│   ├── SQL (message history)
│   └── Vector DB (semantic search over past messages)
├── Security Layer
│   ├── Input Guard (prompt injection defense)
│   ├── Output Guard (credential leak prevention)
│   └── Rate Limiting + IP Management
├── MCP Server (Model Context Protocol)
│   └── External AI assistant access to our data
└── Lead Management
    ├── Proposal Generation
    └── Specialist Handoff
```
The frontend is a React application with real-time WebSocket communication. No polling. No request-response cycles where you wait for a full answer. Every token streams in real-time, buffered at 50ms intervals to prevent UI flicker. Product results push to the interface the moment they're found, before the LLM even starts composing its response.
The backend is Python/FastAPI with a modular tool system. The LLM doesn't just generate text. It has access to 9 specialized tools and decides autonomously which ones to use. Need to search trips? It calls search_products. Need destination information? It calls search_knowledge. Need to recall what was discussed earlier? It queries conversation memory semantically.
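To make the tool-orchestration idea concrete, here's a minimal sketch of a tool registry with dispatch. The tool names `search_products` and `search_knowledge` come from the post; the registry and dispatch shape are assumptions, not our production code.

```python
# Minimal sketch of a tool registry: the LLM emits a tool call by name,
# and the orchestrator routes it to the registered function.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., dict]] = {}

def tool(name: str):
    """Register a function under a tool name the LLM can call."""
    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = fn
        return fn
    return decorator

@tool("search_products")
def search_products(query: str) -> dict:
    # Real version: embed the query and hit the vector DB.
    return {"tool": "search_products", "query": query, "results": []}

@tool("search_knowledge")
def search_knowledge(query: str) -> dict:
    # Real version: semantic search over the knowledge collection.
    return {"tool": "search_knowledge", "query": query, "results": []}

def dispatch(tool_call: dict) -> dict:
    """Route an LLM-emitted tool call to the registered function."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])
```

The point of the registry pattern is that adding a tenth tool is one decorated function, with no change to the orchestrator itself.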
The LLM isn't generating answers. It's orchestrating tools and composing a response from real data.
The LLM layer runs with a multi-provider fallback system. If the primary provider goes down, the system automatically switches to the secondary. No downtime. No errors. The customer never notices.
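The fallback logic is conceptually simple: try providers in priority order, return the first success. A hedged sketch, where the provider interface (a callable taking a prompt) is an assumption for illustration:

```python
# Sketch of multi-provider fallback: call each provider in priority
# order and fall through to the next on any failure.
from typing import Callable, Optional, Sequence

class AllProvidersFailed(Exception):
    pass

def complete_with_fallback(
    providers: Sequence[Callable[[str], str]], prompt: str
) -> str:
    """Return the first successful completion; raise only if all fail."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # real code catches provider-specific errors
            last_error = exc
    raise AllProvidersFailed(str(last_error))

def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider down")

def stable_secondary(prompt: str) -> str:
    return f"answer to: {prompt}"
```

With `flaky_primary` down, `complete_with_fallback([flaky_primary, stable_secondary], ...)` still answers, which is exactly the "customer never notices" property.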
And there's already an MCP server running, the protocol I described in my Web 4.0 post. External AI assistants can already query our travel data through it. It's early days, but the infrastructure for AI-to-business communication is live. When Lisa's AI assistant wants to search 333travel's products, the endpoint already exists.
The RAG pipeline: where the real work happens
RAG, Retrieval-Augmented Generation, is the backbone of the entire system. It's what separates "GPT with a knowledge base" from "an AI that actually knows your product." But RAG done poorly is almost worse than no RAG at all.
We maintain three separate vector collections:
| Collection | What's in it | What it does |
|---|---|---|
| Travel Products | Roundtrips, hotels, tours, cruises | Powers the main product search |
| Travel Details | Day programs, excursions, inclusions | Answers specific itinerary questions |
| Travel Knowledge | Blogs, destination guides, travel info | Provides context and inspiration |
When a customer asks something, the query gets embedded and thrown against the relevant collection. But here's the critical part: we don't just take the top vector matches and call it a day.
Vector similarity alone is mediocre. It gets you in the right ballpark, but it doesn't get you the best results. So we do a broad fetch, significantly more results than we need, and then run them through a reranker. The reranker evaluates each query-document pair for actual semantic relevance, not just embedding proximity.
The difference is enormous. It's the difference between "here are trips that contain some of the words you used" and "here are trips that actually match what you're looking for."
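The broad-fetch-then-rerank step can be sketched as follows. The scoring function here is a stand-in for a real cross-encoder reranker; the shape of the candidates is assumed.

```python
# Sketch of broad-fetch-then-rerank: pull far more candidates than
# needed via vector similarity, then score each (query, doc) pair
# with a reranker and keep only the top slice.
from typing import Callable, List, Tuple

def broad_fetch_and_rerank(
    query: str,
    candidates: List[dict],
    score: Callable[[str, dict], float],
    top_k: int = 10,
) -> List[dict]:
    """Rerank vector-search candidates by pairwise relevance score."""
    scored: List[Tuple[float, dict]] = [
        (score(query, doc), doc) for doc in candidates
    ]
    # Highest relevance first; keep only what the pipeline needs downstream.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The expensive pairwise scoring only runs over the broad-fetch set, not the whole collection, which is what keeps reranking affordable per query.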
But it goes further than that. Before the query even hits the vector database, it runs through a fuzzy normalization layer. Every country and location name in our system is cached at startup. When a customer types "Bali" we know they mean Indonesia. When they write "Tailand" we know what they meant. Aliases, spelling variations, language differences, all resolved before the search begins. Without this, you get empty results for queries that should have matched. And on top of that, the search combines semantic similarity with metadata filters. Country, duration, product type, these aren't just fields in a database. They're active filters that work together with the vector search to narrow results before the reranker even starts.
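The fuzzy normalization idea can be sketched with the standard library's `difflib`. The alias table below is illustrative; the real system caches every country and location name at startup.

```python
# Sketch of fuzzy location normalization: resolve aliases and typos
# to a canonical country before the query hits the vector DB.
from difflib import get_close_matches
from typing import Optional

ALIASES = {
    "bali": "Indonesia",
    "thailand": "Thailand",
    "tailand": "Thailand",   # common misspelling, mapped explicitly
    "viet nam": "Vietnam",
}

def normalize_location(raw: str) -> Optional[str]:
    """Return the canonical country for a (possibly misspelled) location."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching against the known aliases.
    close = get_close_matches(key, list(ALIASES), n=1, cutoff=0.8)
    return ALIASES[close[0]] if close else None
```

So "Bali" resolves to Indonesia via the alias table, and an unseen typo like "thaland" still resolves to Thailand via the fuzzy fallback, instead of producing an empty result set.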
After reranking, the results split into two paths. The full set of reranked results pushes to the frontend immediately, but blurred. The customer sees product cards appearing while Joy is still thinking. It signals progress without overwhelming. Meanwhile, the LLM gets a stripped-down version of those results, only the fields it needs to reason about. This projection saves roughly a third of the tokens, which means faster responses and lower costs. Then Joy composes her answer and discusses the 4 or 5 trips that genuinely fit the question. Only those discussed products become visible to the customer. The rest disappear. What the customer sees is a curated, considered selection. Not a list of 10 search results. A recommendation of 5 that actually make sense.
Broad fetch → Rerank → Project. Three steps that make the difference between a search result and a recommendation.
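The projection step is the simplest of the three but pays for itself on every turn. A sketch, with illustrative field names (the real field set is whatever the LLM needs to reason about):

```python
# Sketch of result projection: the frontend gets full product cards,
# the LLM gets only the reasoning-relevant fields, cutting prompt tokens.
from typing import List

LLM_FIELDS = ("id", "title", "country", "days", "summary")

def project_for_llm(products: List[dict]) -> List[dict]:
    """Strip product dicts down to the fields the LLM actually needs."""
    return [{k: p[k] for k in LLM_FIELDS if k in p} for p in products]
```

Image URLs, price tables, and booking metadata never reach the prompt; they only travel over the WebSocket to the UI.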
The memory problem
This is the part that took the most iterations to get right.
Conversational AI has a dirty secret: it doesn't remember anything. Every LLM call is stateless. The model has no idea what was said before unless you explicitly feed it the context. And there's a limit to how much context you can feed.
For a simple Q&A chatbot, this doesn't matter. Someone asks a question, gets an answer, done.
But for a travel assistant? Memory is everything.
A customer says: "I want to go to Thailand for two weeks." Three turns later: "Actually, make it three weeks. And I want to include Laos." Five turns later: "What was that trip you showed me earlier with the cooking class?"
Without memory, every turn is a blank slate. The system doesn't know about Thailand. Doesn't know about the three-week change. Doesn't know about Laos. Definitely doesn't know about the cooking class trip from seven messages ago.
Without memory, every message is a first date. Your AI has no idea what happened before.
This wasn't solved in one attempt. Not in two either.
The first version only had SQL-based memory. We stored every message and fed the recent history back to the LLM. Simple, but brittle. You could reference things that were said recently, but the moment a customer said "that trip you mentioned" without being specific, the system was lost. It could recall what was said. It couldn't understand what was meant.
So we added a second layer: semantic memory. Every message gets embedded as a vector in the same database we use for product search. When the customer says "that trip with the cooking class," the system runs a semantic search across all past messages in that session. It finds the relevant message even if the exact words don't match. Maybe Joy called it a "culinary experience" three turns ago. Vector search doesn't care about word matching. It matches meaning.
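The two layers can be sketched together. Here `embed` is a stand-in for a real embedding model, which is what actually lets "cooking class" match "culinary experience"; the in-memory log stands in for the SQL store.

```python
# Sketch of dual memory: an append-only message log (SQL in production)
# plus vector recall over past messages in the same session.
import math
from typing import Callable, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ConversationMemory:
    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.log: List[Tuple[str, List[float]]] = []  # (message, vector)

    def add(self, message: str) -> None:
        self.log.append((message, self.embed(message)))

    def recent(self, n: int = 10) -> List[str]:
        """Layer 1: literal recall of the last n messages."""
        return [m for m, _ in self.log[-n:]]

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Layer 2: semantic recall over the whole session."""
        qv = self.embed(query)
        ranked = sorted(self.log, key=lambda it: cosine(qv, it[1]), reverse=True)
        return [m for m, _ in ranked[:k]]
```

`recent()` answers "what was just said"; `recall()` answers "what was meant, somewhere in this conversation", regardless of wording.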
AI memory isn't one problem. It's two: recalling what was said, and understanding what was meant. We needed two different systems to solve them.
UI/UX: the invisible work
Here's where most AI projects die, and they don't even realize it.
You can have the most sophisticated RAG pipeline in the world. The smartest retrieval. The best reranking. But if the interface feels clunky, slow, or confusing, none of it matters. The customer doesn't care about your architecture. They care about how it feels.
If your AI is smart but your interface is slow, you built a genius locked in a closet.
We went deep on this. And "deep" means dozens of iterations on things most people would consider details. But details are the product.
Streaming that feels natural. LLM responses stream token by token over WebSocket. But raw streaming creates flickering text that's jittery and unpleasant. We buffer at 50ms intervals. Fast enough to feel real-time, slow enough to prevent constant re-renders. The difference is subtle but massive in terms of perceived quality.
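The buffering idea is small enough to sketch. This is an assumed shape, not our production code: accumulate tokens, flush the buffer at most once per interval, and always flush the tail.

```python
# Sketch of interval-buffered token streaming: the UI re-renders about
# 20 times per second instead of once per token.
import asyncio
from typing import AsyncIterator

async def buffered_stream(
    tokens: AsyncIterator[str], interval: float = 0.05
) -> AsyncIterator[str]:
    """Yield concatenated chunks at most once per interval."""
    buffer: list = []
    loop = asyncio.get_running_loop()
    last_flush = loop.time()
    async for token in tokens:
        buffer.append(token)
        now = loop.time()
        if now - last_flush >= interval:
            yield "".join(buffer)
            buffer.clear()
            last_flush = now
    if buffer:  # flush whatever arrived after the last interval boundary
        yield "".join(buffer)
```

The 50ms value is a perceptual trade-off: short enough that text still feels live, long enough that the frontend isn't re-rendering on every token.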
Joy shows her work. While Joy is thinking, the interface doesn't just show a spinner. It shows what she's doing. Searching products. Looking up destination information. Checking conversation history. Every tool call is visible to the customer as it happens. Product cards push in blurred while she's still composing. Once the answer is ready, only the trips Joy actually discusses become visible. The rest fade away. The customer never stares at a blank screen wondering if something is broken. They see the process. It builds trust in a way that a loading animation never will.
Instant page loads. The entire conversation persists in local storage. When you refresh the page or come back later, your conversation is already there. Instantly. No loading spinner. No "reconnecting..." message. The sync with the server happens silently in the background.
Markdown normalization. LLMs have opinions about formatting. Claude loves bold headers with emoji. GPT loves numbered lists. We normalize everything server-side. Bold-with-emoji gets converted to clean headers. Horizontal rules get stripped. Font weights get tuned down. The result looks like it was designed, not generated. The customer should never feel like they're reading AI output.
Session continuity. Your session survives across page refreshes, browser closes, and reconnections. Come back a day later, your conversation is still there. Your saved trips are still there. And if a session does expire, you get a clean overlay explaining what happened, not a broken page or a cryptic error.
The lead flow. When a customer is ready to talk to a specialist, the system generates a travel proposal pre-filled with what it already knows from the conversation. Destinations discussed, trip preferences, products explored. Minimal friction. Maximum context. The human specialist picks up exactly where Joy left off. That's not a handoff. That's the whole point. Joy exists to make that human conversation better, not to avoid it.
Every one of these details took multiple iterations. Every one of them is invisible to the customer. That's the point.
Security: an honest note
I'll keep this section brief and honest. Security isn't something you finish. It's something you work on constantly.
We have a dual guard system. An input guard before the LLM that catches prompt injection attempts, and an output guard after the LLM that catches credential leaks or system prompt fragments in responses. Both are deterministic and add virtually zero latency.
The system prompt is hardened with anti-jailbreak instructions. There's rate limiting, bot protection, and IP management for persistent abuse.
Is it bulletproof? No. Nothing is. But it's built with defense in depth in mind, and we iterate on it as new attack patterns emerge. If you're building any AI interface that talks to the public and you haven't thought about input/output guards, you have a problem waiting to happen.
The bigger picture: an extra channel, not a replacement
Let me be very clear about what Joy is and what she isn't.
Joy is not a replacement for our website. She's not a replacement for phone or email. She's an additional channel. A new way for customers to explore our products, alongside everything that already exists. She will never be a replacement for human contact. We value that the most in our company. Joy is a means to human contact. She helps customers figure out what they want, so that when they talk to one of our specialists, that conversation starts at a completely different level.
Joy isn't the destination. She's the road that leads to a real conversation with a real person.
The MCP server is already live, which means AI assistants can already query our data programmatically. When the Web 4.0 vision materializes, when Lisa's AI assistant autonomously searches for her perfect Thailand trip, the endpoint already exists. Not because we scrambled to build it. Because we've been exploring this infrastructure for over a year.
Joy isn't the end product. She's the beginning of a channel strategy that includes both human-to-AI and AI-to-AI interactions. And she's a test, a very serious, very thorough test, of whether conversational AI can actually add value for travel customers today.
The results so far have been better than I expected. Not flawless, nothing is, but genuinely impressive in how natural the conversations feel and how relevant the recommendations are. When you see a customer describe a vague idea and Joy comes back with trips that actually fit, you realize this isn't a gimmick. It works. And it works because of everything underneath it.
So, what are we trying to solve again?
I opened with this question because it's the one that guided everything we built. Not "how do we use AI?" Not "what are our competitors doing?" Not "how do we look innovative?"
Just: what problem are we solving, and does this solution actually work?
We built Joy to find out. With real architecture. Real data. Real customers. And the willingness to say "it's not ready yet" if that turned out to be the case.
If you're thinking about adding AI to your product, start there. Not with the tool. Not with the vendor pitch. Not with the pressure to keep up.
Start with the question. Build from the answer.
Your customers deserve better than a FAQ with a face. They deserve a real conversation. And eventually, a real person. Make sure your AI leads them there, not away from it.