[2603.13793] GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages

Computer Science > Computation and Language
arXiv:2603.13793 (cs)
[Submitted on 14 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages

Authors: Lawrence Adu Gyamfi, Paul Azunre, Stephen Edward Moore, Joel Budu, Akwasi Asare, Mich-Seth Owusu, Jonathan Ofori Asiamah

Abstract: Low resource languages present unique challenges for natural language processing due to the limited availability of digitized and well structured linguistic data. To address this gap, the GhanaNLP initiative has developed and curated 41,513 parallel sentence pairs for the Twi, Fante, Ewe, Ga, and Kusaal languages, which are widely spoken across Ghana yet remain underrepresented in digital spaces. Each dataset consists of carefully aligned sentence pairs between a local language and English. The data were collected, translated, and annotated by human professionals and enriched with standard structural metadata to ensure consistency and usability. These corpora are designed to support research, educational, and commercial applications, including machine translation, speech technologies, and language preservation. This paper documents the dataset creation methodology, structure, intended use cases, and evaluation, a...

Originally published on March 31, 2026. Curated by AI News.

