Building a Knowledge Graph from Scratch: Lessons from a Learning Journey

Author: Kareem Alnassag
Published: January 22, 2026

What happens when a group of language operations professionals decides to tackle knowledge graphs with zero expertise? Turns out, quite a lot.

Back in August 2024, our workgroup at the Language Operations Institute embarked on an ambitious project: build a multilingual knowledge graph from the ground up. None of us were experts—that was precisely the point. We wanted to demystify knowledge graphs and create something practical that demonstrates their value in real business contexts.

Why Knowledge Graphs?

The appeal was simple: knowledge graphs encode explicit, verifiable connections between data points, in contrast to the probabilistic outputs of LLMs. They organize information as nodes, properties, and relationships, which makes them particularly powerful for complex data retrieval and AI applications. We saw potential applications everywhere: terminology management, product information systems, AI-powered chatbots.

Choosing Our Domain

After considering medical devices (too complex), regulated industries (too dry), and general retail, we settled on electronics products. The domain offered several advantages:

  • Relatively accessible data sources
  • Rich relationship opportunities (products, components, accessories)
  • Potential for both structured specs and unstructured content like manuals
  • Something we could all relate to

The Technical Journey

From Protégé to Neo4j

We started with Protégé, Stanford's open-source ontology tool, learning the fundamentals: RDF, RDFS, OWL, and semantic triples (subject-predicate-object relationships). It was excellent for understanding the concepts but limited for practical implementation.
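The core idea behind semantic triples can be sketched in a few lines of plain Python. The triples below are hypothetical examples, not taken from our actual graph:

```python
# A semantic triple is a (subject, predicate, object) statement.
# These example triples are illustrative, not from the real dataset.
triples = [
    ("Laptop-X100", "has_brand", "Acme"),
    ("Laptop-X100", "has_component", "Battery-B7"),
    ("Battery-B7", "has_capacity", "56 Wh"),
]

def objects_of(subject, predicate, triples):
    """Return every object linked to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("Laptop-X100", "has_component", triples))  # ['Battery-B7']
```

Tools like Protégé layer schemas (RDFS, OWL) on top of exactly this subject-predicate-object structure.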

After several sessions wrestling with Protégé's constraints, we pivoted to Neo4j. The difference was night and day:

  • CSV import capabilities
  • Visual graph interface
  • Cypher query language
  • Better integration options with AI tools
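To give a flavor of the CSV-to-graph workflow, here is a small sketch that turns CSV rows into Cypher MERGE statements. The field names and labels are made up for illustration, and a real pipeline should use Neo4j query parameters rather than string interpolation:

```python
import csv
import io

# Hypothetical product CSV of the kind Neo4j can ingest.
raw = """sku,name,brand
X100,Laptop X100,Acme
S20,Speaker S20,Soundly
"""

def to_cypher(row):
    """Build a MERGE statement for one CSV row (sketch only; real code
    should use query parameters to avoid injection)."""
    return (f"MERGE (p:Product {{sku: '{row['sku']}'}}) "
            f"SET p.name = '{row['name']}', p.brand = '{row['brand']}'")

statements = [to_cypher(r) for r in csv.DictReader(io.StringIO(raw))]
print(statements[0])
```

In practice Neo4j's own `LOAD CSV` clause does this ingestion server-side; generating statements in Python like this is just one way to see what the import is doing.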

The Data Challenge

Data quality became our unexpected nemesis. We quickly learned that building a knowledge graph isn't just about structure—it's about data normalization, standardization, and cleanup. We spent weeks:

  • Identifying which of 553 properties actually mattered
  • Standardizing measurement units
  • Removing duplicates and null values
  • Deciding what should be nodes versus properties
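The cleanup steps above can be sketched in plain Python. The field names and rows are invented for illustration; the point is the shape of the work, not our real schema:

```python
# Sketch of the cleanup we needed: unit standardization, duplicate
# removal, and explicit null handling (field names are made up).
rows = [
    {"sku": "X100", "weight": "1.4 kg", "color": "Silver"},
    {"sku": "X100", "weight": "1.4 kg", "color": "Silver"},   # duplicate
    {"sku": "S20",  "weight": "800 g",  "color": None},       # null value
]

def normalize_weight(value):
    """Standardize weights to grams."""
    number, unit = value.split()
    grams = float(number) * (1000 if unit == "kg" else 1)
    return f"{grams:g} g"

seen, clean = set(), []
for row in rows:
    if row["sku"] in seen:
        continue                     # drop duplicates by key
    seen.add(row["sku"])
    clean.append({
        "sku": row["sku"],
        "weight": normalize_weight(row["weight"]),
        "color": row["color"] or "unknown",   # make nulls explicit
    })

print(clean)
```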

A key insight: common entities such as brands and colours work better as nodes than as properties, because a single shared node can anchor searches across many products at once.
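The difference shows up directly in the Cypher you end up writing. The queries below are illustrative, against a hypothetical schema:

```python
# With brand stored as a property, matching means comparing a string
# on every Product node:
brand_as_property = "MATCH (p:Product {brand: 'Acme'}) RETURN p.name"

# With Brand as its own node, one shared node anchors the traversal,
# and every product that links to it is found by following edges:
brand_as_node = (
    "MATCH (b:Brand {name: 'Acme'})<-[:HAS_BRAND]-(p:Product) "
    "RETURN p.name"
)

print(brand_as_node)
```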

The Multilingual Dimension

We implemented translation support using a relationship-based approach rather than storing translations as properties. This meant creating separate language nodes connected via "translated_to" relationships, allowing us to retrieve information in any supported language (Arabic, Chinese, German, Spanish, French, Russian) without duplicating the entire graph structure.
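A minimal sketch of that relationship-based model, generating the Cypher for one translation link. The node labels and property names here are our hypothetical choices, not a standard; only the "translated_to" relationship name comes from the design described above:

```python
# Sketch: link a source term to a translation node via "translated_to"
# rather than storing translations as properties on the term itself.
def translation_cypher(term, lang, translated):
    """Build a MERGE statement linking `term` to its translation in `lang`."""
    return (
        f"MATCH (t:Term {{text: '{term}'}}) "
        f"MERGE (tr:Translation {{text: '{translated}', lang: '{lang}'}}) "
        f"MERGE (t)-[:translated_to]->(tr)"
    )

query = translation_cypher("battery", "de", "Akku")
print(query)
```

Because each translation is its own node, adding a seventh language is just more nodes and edges; nothing about the existing graph has to be duplicated or reshaped.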

AI Integration: The Game Changer

What started as a translation use case evolved into something more exciting: combining the knowledge graph with an LLM using RAG (Retrieval-Augmented Generation). The workflow:

  1. User asks a question in natural language
  2. LLM converts it to a Cypher query
  3. Knowledge graph returns factual data
  4. LLM generates a natural language response

This approach dramatically reduces hallucinations while providing conversational output—the knowledge graph supplies the truth, the LLM supplies the fluency.
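The four-step loop above can be sketched with stubbed components. The stubs stand in for a real LLM API call and a live Neo4j query, so only the control flow is real:

```python
# Stubbed RAG loop: a real system would call an LLM and the Neo4j
# driver; here both return canned values so the flow is visible.
def llm_to_cypher(question):
    """Step 2: the LLM turns the question into Cypher (canned here)."""
    return "MATCH (p:Product {sku: 'X100'}) RETURN p.weight"

def run_query(cypher):
    """Step 3: the knowledge graph returns factual data (canned result)."""
    return [{"p.weight": "1400 g"}]

def llm_answer(question, facts):
    """Step 4: the LLM phrases the facts conversationally (canned template)."""
    return f"The product weighs {facts[0]['p.weight']}."

def ask(question):
    cypher = llm_to_cypher(question)    # step 2
    facts = run_query(cypher)           # step 3
    return llm_answer(question, facts)  # step 4

print(ask("How heavy is the X100?"))    # step 1: natural-language question
```

The division of labor is the whole trick: the graph never invents facts, and the LLM never has to remember any.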

Key Learnings

Technical:

  • Data quality matters more than we expected
  • Relationship-based translations scale better than property-based
  • Docker containerization is essential for sustainable deployment
  • Format compatibility between Neo4j editions is crucial

Process:

  • Start with use cases and questions before building the ontology
  • Iterative problem-solving beats perfect planning
  • Documentation for non-technical audiences is as important as the code

Team:

  • Learning together creates better educational materials
  • Diverse perspectives improve problem-solving
  • Real challenges make better teaching moments than sanitized tutorials

What's Next

We're currently packaging the system for broader distribution, implementing AI chat capabilities using the GenAI Stack template, and creating comprehensive documentation. The goal isn't just to build a knowledge graph—it's to show others how they can build one too.

The project continues to evolve, now exploring triple stores for even better multilingual support and reasoning capabilities. What started as a learning exercise has become a practical demonstration that knowledge graphs aren't just for enterprise tech companies with massive budgets—they're accessible to anyone willing to learn.

Interested in knowledge graphs or multilingual AI systems? The journey continues, and new learners are always welcome. No expertise required—just curiosity and willingness to debug alongside the rest of us.

Lead the change: join our community today!
