Arabic is not one language
Every chatbot marketing page says "supports Arabic." Most of them mean they can parse Modern Standard Arabic (فصحى), the formal written language of newspapers and official documents that nobody actually speaks in conversation. Real customers write the way they talk, and the way they talk depends on where they grew up.
The linguistic reality: Arabic dialects are further apart than many pairs of European languages that get separate Wikipedia pages. A Moroccan saying "بغيت نعرف الثمن" and a Kuwaiti saying "أبي أعرف السعر" are both saying "I want to know the price" — but they share almost no words.
The five dialects worth planning for
Khaleeji (Gulf), Egyptian, Levantine, Maghrebi, and Iraqi. These are the five Thikaa detects automatically, alongside MSA, English, and Arabizi.
Where generic chatbots fail
Three failure modes we see in chatbots that "support Arabic" without dialect-awareness:
The Fusha Reply
Customer writes "شلونك" (Khaleeji for "how are you"). Bot replies in formal newspaper Arabic. Customer feels like they are talking to a legal document.
The Wrong Dialect
Customer writes Egyptian; bot answers in Levantine. Like asking a question in English and getting the reply in a thick Glasgow accent — parseable, but jarring.
The Misunderstood Word
"عندي شمعة" in Egyptian means "I have a candle." In Gulf slang "شمعة" can colloquially mean a problem. Context and dialect together decide which meaning applies. Platforms that ignore dialect guess wrong.
How Thikaa matches dialect
We do not pick a dialect centrally. The bot matches whatever the customer uses, automatically. The mechanics:
Step 1 — detect
First message gets fast dialect detection (keyword + model-based). Result: Khaleeji / Egyptian / Levantine / Maghrebi / Iraqi / MSA / English / Arabizi.
Step 2 — inject
The detected dialect is added to the system prompt: "Reply in Khaleeji Arabic. Use أبي, شلون, وايد naturally. Avoid formal Fusha."
Step 3 — anchor
A few dialect-specific example exchanges are pulled from the RAG bundle and prepended to the prompt so the model has concrete anchors, not abstract instructions.
Step 4 — refresh
If the customer switches (say, types first in Egyptian, then switches to English mid-conversation), the detection re-runs and the prompt updates.
Step 5 — guard
Hallucination guards check the reply for dialect-inappropriate phrasing (e.g., Fusha leaking into a Khaleeji conversation) and soft-correct.
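The detect-and-inject steps above can be sketched in a few lines. This is an illustrative Python sketch, not Thikaa's actual implementation: the marker lists, template strings, and function names are all assumptions, and a production detector would combine these keyword rules with a model-based classifier.

```python
# Illustrative only: marker lists and templates are assumptions,
# not Thikaa's production detector.
DIALECT_MARKERS = {
    "khaleeji": ["شلون", "وايد", "أبي"],
    "egyptian": ["إزيك", "عايز", "دلوقتي", "كده"],
    "levantine": ["كيفك", "بدي", "هلق"],
    "maghrebi": ["بغيت", "واش", "بزاف"],
    "iraqi": ["شكو", "اكو", "هواية"],
}

PROMPT_TEMPLATES = {
    "khaleeji": "Reply in Khaleeji Arabic. Use أبي, شلون, وايد naturally. Avoid formal Fusha.",
    "egyptian": "Reply in Egyptian Arabic. Use عايز, إزيك, كده naturally. Avoid formal Fusha.",
    # ...one template per dialect; MSA is the fallback below
}

def detect_dialect(message: str) -> str:
    """Step 1: cheap keyword pass; fall back to MSA when nothing matches."""
    scores = {
        dialect: sum(marker in message for marker in markers)
        for dialect, markers in DIALECT_MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "msa"

def build_system_prompt(base_prompt: str, message: str) -> str:
    """Steps 2 and 4: re-run detection on each message and refresh the prompt."""
    dialect = detect_dialect(message)
    instruction = PROMPT_TEMPLATES.get(dialect, "Reply in Modern Standard Arabic.")
    return f"{base_prompt}\n\n{instruction}"
```

Because `build_system_prompt` re-runs detection on every message, the step-4 "refresh" behavior falls out for free: a customer who switches dialect mid-conversation gets an updated instruction on the next turn.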
Claude Opus 4.5 and GPT-4o both handle this well when the dialect is explicit in the prompt. Without the explicit instruction, both default to Fusha most of the time — which is why most platforms sound robotic.
Arabizi and mixed-language conversations
Arabizi — the habit of writing Arabic in Latin letters with digits for missing sounds (3 for ع, 7 for ح, 9 for ق) — is enormously common among young Gulf customers on WhatsApp and Instagram. "7abibi sh5obarik" is Khaleeji for "my dear, how are you."
Thikaa treats Arabizi as a first-class input. The bot reads it, maps it back to Arabic internally for reasoning, and replies — by default — in whichever script the customer used. Some merchants force the bot to always reply in Arabic script even when the customer writes in Arabizi; that is a setting per bot.
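The digit half of that mapping is simple enough to show directly. The table below reflects the common Arabizi convention quoted above; the function name and the simplification (digits only, no Latin-letter transliteration or ambiguity handling) are ours for illustration, not Thikaa's internals.

```python
# Common Arabizi digit-to-letter convention; the function is a
# simplified sketch, not a full transliterator.
ARABIZI_DIGITS = {
    "3": "ع",  # ayn
    "7": "ح",  # haa (as in "7abibi")
    "9": "ق",  # qaf
    "5": "خ",  # khaa (as in "sh5obarik")
    "2": "ء",  # hamza
}

def normalize_arabizi_digits(text: str) -> str:
    """Replace digit stand-ins with their Arabic letters before reasoning.
    A real pipeline would also map the Latin letters around them."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)
```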
The other mixed case: code-switching mid-sentence ("I want طلبية شاورما مع delivery بليز", roughly "I want a shawarma order with delivery please"). All frontier models handle this well; the anti-hallucination guards keep the reply coherent rather than the bot awkwardly picking one language for the whole reply.
Building dialect-aware Q&A into your bot
For bots that must be precise (e.g., medical clinics, legal firms) we recommend creating explicit Q&A pairs in the dialect your customers use. The bot prefers these direct pairs over generated answers:
| Stage | Example | Why it works |
| --- | --- | --- |
| Customer intent | "أبي أحجز موعد مع الطبيب" | Khaleeji / Gulf for "I want to book an appointment with the doctor" |
| Matched Q&A | Q: booking request in Khaleeji · A: "حياك الله، في أي تخصص تبي تحجز؟ (عام/أسنان/جلدية/نسائية)" | Natural Khaleeji reply that asks for the specialty (general/dental/dermatology/gynecology) |
| Result | The bot responds in the customer's dialect with a concrete next step instead of a generic Fusha "نرحب بك، ما هو استفسارك؟" ("Welcome, what is your inquiry?") | Feels like a local clinic receptionist, not a call center |
Upload Q&A pairs from the admin panel or import a WhatsApp chat export — Thikaa extracts question/answer pairs automatically and you review them before they go live. Five or ten good pairs in the right dialect change the bot's personality more than any system prompt tweak.
FAQ
Which dialects does Thikaa detect automatically?
Khaleeji (Gulf), Egyptian, Levantine, Maghrebi, Iraqi, Modern Standard Arabic, English, and Arabizi. Detection runs on every incoming message and updates as the customer shifts.
Can I force the bot to always use one dialect?
Yes. In bot settings, set a fixed dialect. The bot will reply in that dialect regardless of what the customer uses. Useful for merchants who want to brand a specific voice.
Does dialect-awareness cost more?
No. Detection is cheap and runs inline. You pay only for the LLM tokens like any other message.
What about voice notes in dialects?
Voice notes are transcribed by Whisper large-v3, which handles all major Arabic dialects. The resulting transcript flows through the same dialect-detection pipeline as text, so the reply matches the spoken dialect.
How do I know the bot is actually doing this?
Every conversation log shows the detected dialect and the prompt injection that was applied. Admins can review and adjust.
Let your bot sound like a local
Start a 14-day trial. Send your bot a voice note in Khaleeji or a text in Egyptian slang — it will reply in kind.
Start Free Trial