Arabic Dialects

Kuwaiti, Gulf, Egyptian: How Thikaa Handles Arabic Dialects

A customer in Jeddah says "أبي" when they want something. A customer in Cairo says "عايز". A customer in Beirut says "بدي". Same intent, three different words. Here is how AI chatbots actually handle that — and where most platforms quietly fail.

April 18, 2026 · 8 min read · Arabic · Dialects · Khaleeji · Egyptian · Levantine

Arabic is not one language

Every chatbot marketing page says "supports Arabic." Most of them mean: they can parse Modern Standard Arabic (فصحى) — the formal written language of newspapers, official documents, and nowhere else humans actually speak. Real customers write the way they talk. The way they talk depends on where they grew up.

The linguistic reality: Arabic dialects are further apart than many pairs of European languages that get separate Wikipedia pages. A Moroccan saying "بغيت نعرف الثمن" and a Kuwaiti saying "أبي أعرف السعر" are both saying "I want to know the price" — but they share almost no words.

The five dialects worth planning for

Khaleeji (خليجي)
Saudi Arabia, UAE, Kuwait, Bahrain, Qatar, Oman
"أبي" (I want) · "شلون" (how) · "وايد" (a lot) · "يبي لي" (I need)
Heaviest consonant reduction and dropped final vowels. The main tripping point for most AI platforms.
Egyptian (مصري)
Egypt (also widely understood across MENA due to media)
"عايز" (I want) · "إزاي" (how) · "قوي" (very) · "كده" (like this)
Most widely understood dialect thanks to films and TV. Strong "ج" → "g" shift.
Levantine (شامي)
Lebanon, Syria, Jordan, Palestine
"بدي" (I want) · "كيف" (how) · "كتير" (a lot) · "هيك" (like this)
Softer phonetics. Heavy Western loanword usage especially in Lebanon.
Maghrebi (مغربي)
Morocco, Tunisia, Algeria
"بغيت" (I want) · "كيفاش" (how) · "بزاف" (a lot) · "هكذاك" (like this)
Heavy Berber + French influence. The hardest dialect for Mashriq speakers to understand.
Iraqi (عراقي)
Iraq
"أريد" (I want, shared with MSA) · "اشلون" (how) · "هواية" (a lot) · "هيج" (like this)
Mix of Gulf and Levantine features with strong Persian/Turkish loanwords.

Where generic chatbots fail

Three failure modes we see in chatbots that "support Arabic" without dialect-awareness:

The Fusha Reply

Customer writes "شلونك" (Khaleeji for "how are you"). Bot replies in formal newspaper Arabic. Customer feels like they are talking to a legal document.

The Wrong Dialect

Customer writes Egyptian; bot answers in Levantine. Like asking a question in English and getting the reply in a thick Glasgow accent — parseable, but jarring.

The Misunderstood Word

"عندي شمعة" in Egyptian means "I have a candle." In Gulf slang "شمعة" can colloquially mean a problem. Context and dialect together decide which meaning applies. Platforms that ignore dialect guess wrong.

How Thikaa matches dialect

We do not pick a dialect centrally. The bot matches whatever the customer uses, automatically. The mechanics:

Step 1 — detect

First message gets fast dialect detection (keyword + model-based). Result: Khaleeji / Egyptian / Levantine / Maghrebi / Iraqi / MSA / English / Arabizi.
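A minimal sketch of what the keyword half of that detection could look like (the names and marker lists here are illustrative, not Thikaa's actual implementation, and the model-based half is omitted): each dialect gets a set of high-signal marker words, and the dialect with the most hits wins.

```python
# Keyword-based dialect scoring (sketch). Marker lists are
# illustrative examples drawn from the dialect table above.
DIALECT_MARKERS = {
    "khaleeji":  {"أبي", "شلون", "وايد", "تبي"},
    "egyptian":  {"عايز", "إزاي", "قوي", "كده"},
    "levantine": {"بدي", "كتير", "هيك"},
    "maghrebi":  {"بغيت", "كيفاش", "بزاف"},
    "iraqi":     {"اشلون", "هواية", "هيج"},
}

def detect_dialect(message: str) -> str:
    tokens = set(message.split())
    scores = {d: len(tokens & markers) for d, markers in DIALECT_MARKERS.items()}
    best = max(scores, key=scores.get)
    # No marker hit at all: fall back to MSA. A real system would
    # hand the message to a model-based classifier at this point.
    return best if scores[best] > 0 else "msa"

print(detect_dialect("أبي أعرف السعر"))   # khaleeji
print(detect_dialect("عايز أعرف السعر"))  # egyptian
```

The point of the keyword pass is latency: a handful of set intersections resolves most messages before any model is involved.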

Step 2 — inject

The detected dialect is added to the system prompt: "Reply in Khaleeji Arabic. Use أبي, شلون, وايد naturally. Avoid formal Fusha."
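In code, the injection step can be as simple as mapping the detected dialect to an instruction and prepending it to the base system prompt. This is an assumed structure for illustration, not Thikaa's actual templates:

```python
# Per-dialect instructions prepended to the system prompt (sketch).
DIALECT_INSTRUCTIONS = {
    "khaleeji": "Reply in Khaleeji Arabic. Use أبي, شلون, وايد naturally. Avoid formal Fusha.",
    "egyptian": "Reply in Egyptian Arabic. Use عايز, إزاي, كده naturally. Avoid formal Fusha.",
    "msa":      "Reply in clear Modern Standard Arabic.",
}

def build_system_prompt(base_prompt: str, dialect: str) -> str:
    # Unknown dialect falls back to MSA rather than failing.
    instruction = DIALECT_INSTRUCTIONS.get(dialect, DIALECT_INSTRUCTIONS["msa"])
    return f"{instruction}\n\n{base_prompt}"

prompt = build_system_prompt("You are a helpful shop assistant.", "khaleeji")
```

Keeping the dialect instruction first in the prompt matters: models weight early system-prompt instructions heavily, which is what keeps Fusha from creeping back in.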

Step 3 — anchor

A few dialect-specific example exchanges are pulled from the RAG bundle and prepended to the prompt so the model has concrete anchors, not abstract instructions.

Step 4 — refresh

If the customer switches (say, types first in Egyptian, then switches to English mid-conversation), the detection re-runs and the prompt updates.

Step 5 — guard

Hallucination guards check the reply for dialect-inappropriate phrasing (e.g., Fusha leaking into a Khaleeji conversation) and soft-correct.
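A guard like that can start as a simple heuristic before escalating to a model check. The sketch below is illustrative (the marker list and function names are assumptions, not Thikaa's implementation): it flags replies to a dialect-speaking customer that contain tell-tale Fusha constructions, so the reply can be regenerated or rewritten.

```python
# Tell-tale Fusha markers that should not appear in a casual
# dialect reply (illustrative list).
FUSHA_MARKERS = {"سوف", "الذي", "التي", "ما هو استفسارك"}

def fusha_leak(reply: str, dialect: str) -> bool:
    if dialect == "msa":
        return False  # formal Arabic is expected here, not a leak
    return any(marker in reply for marker in FUSHA_MARKERS)

# A formal "we will respond to you" leaking into a Khaleeji chat:
print(fusha_leak("سوف نقوم بالرد عليك", "khaleeji"))  # True
```

In practice a flagged reply would be soft-corrected, e.g. by re-prompting the model with the offending phrase quoted back.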

Claude Opus 4.5 and GPT-4o both handle this well when the dialect is explicit in the prompt. Without the explicit instruction, both default to Fusha most of the time — which is why most platforms sound robotic.

Arabizi and mixed-language conversations

Arabizi — the habit of writing Arabic in Latin letters with digits for missing sounds (3 for ع, 7 for ح, 9 for ق) — is enormously common among young Gulf customers on WhatsApp and Instagram. "7abibi sh5obarik" is Khaleeji for "my dear, how are you."

Thikaa treats Arabizi as a first-class input. The bot reads it, maps it back to Arabic internally for reasoning, and replies — by default — in whichever script the customer used. Some merchants force the bot to always reply in Arabic script even when the customer writes in Arabizi; that is a setting per bot.
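The digit substitutions are the mechanical part of that mapping. A minimal sketch, covering only the digit-to-letter conventions mentioned above (full transliteration of Latin consonants and vowels is the harder, model-driven part and is omitted here):

```python
# Arabizi digit conventions: digits stand in for Arabic letters
# that Latin script lacks (simplified; real usage varies by region).
ARABIZI_DIGITS = {"3": "ع", "7": "ح", "9": "ق", "2": "ء", "5": "خ"}

def map_arabizi_digits(text: str) -> str:
    # Replace only the digits; Latin letters still need a separate
    # transliteration pass to become Arabic script.
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)

print(map_arabizi_digits("7abibi"))  # حabibi (digits mapped, letters untouched)
```

The intermediate mixed-script form is fine as an internal representation: what matters is that "7" and "ح" resolve to the same token for reasoning, while the customer-facing reply uses whichever script they wrote in.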

The other mixed case: code-switching mid-sentence ("I want طلبية شاورما مع delivery بليز", i.e. "I want a shawarma order with delivery please"). All frontier models handle this well; the anti-hallucination guards keep the reply coherent instead of letting the bot awkwardly pick one language for the whole reply.

Building dialect-aware Q&A into your bot

For bots that must be precise (e.g., medical clinics, legal firms) we recommend creating explicit Q&A pairs in the dialect your customers use. The bot prefers these direct pairs over generated answers:

Customer intent: "أبي أحجز موعد مع الطبيب" (Khaleeji / Gulf: "I want to book an appointment with the doctor")
Matched Q&A: Q: a booking request in Khaleeji · A: "حياك الله، في أي تخصص تبي تحجز؟ (عام/أسنان/جلدية/نسائية)" ("Welcome! Which specialty would you like to book? General / dental / dermatology / OB-GYN"). A natural Khaleeji reply that asks for the specialty.
Result: the bot responds in the customer's dialect with a concrete next step instead of a generic Fusha "نرحب بك، ما هو استفسارك؟" ("Welcome, what is your inquiry?"). It feels like a local clinic receptionist, not a call center.
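The "prefer direct pairs over generation" behavior can be sketched as a lookup that short-circuits before the LLM is called. This is a hypothetical shape for illustration; a real matcher would use embeddings rather than substring patterns:

```python
# Explicit dialect-tagged Q&A pairs (sketch). The answer text is the
# Khaleeji clinic reply from the example above.
QA_PAIRS = [
    {
        "dialect": "khaleeji",
        "patterns": ["أحجز موعد", "أبي موعد"],
        "answer": "حياك الله، في أي تخصص تبي تحجز؟ (عام/أسنان/جلدية/نسائية)",
    },
]

def answer(message: str, dialect: str):
    for pair in QA_PAIRS:
        if pair["dialect"] == dialect and any(p in message for p in pair["patterns"]):
            return pair["answer"]  # direct pair wins over generation
    return None  # fall through to normal LLM generation
```

The key design choice is that the pairs are dialect-tagged: a Khaleeji booking pair never fires for an Egyptian customer, so each audience gets an answer written in its own register.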

Upload Q&A pairs from the admin panel or import a WhatsApp chat export — Thikaa extracts question/answer pairs automatically and you review them before they go live. Five or ten good pairs in the right dialect change the bot's personality more than any system prompt tweak.

FAQ

Which dialects does Thikaa detect automatically?

Khaleeji (Gulf), Egyptian, Levantine, Maghrebi, Iraqi, Modern Standard Arabic, English, and Arabizi. Detection runs on every incoming message and updates as the customer shifts.

Can I force the bot to always use one dialect?

Yes. In bot settings, set a fixed dialect. The bot will reply in that dialect regardless of what the customer uses. Useful for merchants who want to brand a specific voice.

Does dialect-awareness cost more?

No. Detection is cheap and runs inline. You pay only for the LLM tokens like any other message.

What about voice notes in dialects?

Voice notes are transcribed by Whisper large-v3, which handles all major Arabic dialects. The resulting transcript flows through the same dialect-detection pipeline as text, so the reply matches the spoken dialect.

How do I know the bot is actually doing this?

Every conversation log shows the detected dialect and the prompt injection that was applied. Admins can review and adjust.

Let your bot sound like a local

Start a 14-day trial. Send your bot a voice note in Khaleeji or a text in Egyptian slang — it will reply in kind.

Start Free Trial