Building a Knowledge Graph from Scratch: Lessons from a Learning Journey

Author: Kareem Alnassag
Published: January 22, 2026

What happens when a group of language operations professionals decides to tackle knowledge graphs with zero expertise? Turns out, quite a lot.

Back in August 2024, our workgroup at the Language Operations Institute embarked on an ambitious project: build a multilingual knowledge graph from the ground up. None of us were experts—that was precisely the point. We wanted to demystify knowledge graphs and create something practical that demonstrates their value in real business contexts.

Why Knowledge Graphs?

The appeal was simple: knowledge graphs encode explicit, verifiable connections between data points, in contrast to the probabilistic outputs of LLMs. They organize information as nodes, properties, and relationships, which makes them particularly powerful for complex data retrieval and AI applications. We saw potential applications everywhere: terminology management, product information systems, AI-powered chatbots.

Choosing Our Domain

After considering medical devices (too complex), regulated industries (too dry), and general retail, we settled on electronics products. The domain offered several advantages:

  • Relatively accessible data sources
  • Rich relationship opportunities (products, components, accessories)
  • Potential for both structured specs and unstructured content like manuals
  • Something we could all relate to

The Technical Journey

From Protégé to Neo4j

We started with Protégé, Stanford's open-source ontology tool, learning the fundamentals: RDF, RDFS, OWL, and semantic triples (subject-predicate-object relationships). It was excellent for understanding the concepts but limited for practical implementation.
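The core idea behind semantic triples can be sketched in a few lines of plain Python. The triples below are hypothetical examples, not taken from our actual graph:

```python
# A semantic triple is a (subject, predicate, object) statement.
# These example triples are illustrative, not from the real dataset.
triples = [
    ("Laptop-X100", "has_brand", "Acme"),
    ("Laptop-X100", "has_component", "Battery-B7"),
    ("Battery-B7", "has_capacity", "56 Wh"),
]

def objects_of(subject, predicate, triples):
    """Return every object linked to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("Laptop-X100", "has_component", triples))  # ['Battery-B7']
```

Tools like Protégé layer schemas (RDFS, OWL) on top of exactly this subject-predicate-object structure.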

After several sessions wrestling with Protégé's constraints, we pivoted to Neo4j. The difference was night and day:

  • CSV import capabilities
  • Visual graph interface
  • Cypher query language
  • Better integration options with AI tools
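To give a flavor of the CSV-to-graph workflow, here is a small sketch that turns CSV rows into Cypher MERGE statements. The field names and labels are made up for illustration, and a real pipeline should use Neo4j query parameters rather than string interpolation:

```python
import csv
import io

# Hypothetical product CSV of the kind Neo4j can ingest.
raw = """sku,name,brand
X100,Laptop X100,Acme
S20,Speaker S20,Soundly
"""

def to_cypher(row):
    """Build a MERGE statement for one CSV row (sketch only; real code
    should use query parameters to avoid injection)."""
    return (f"MERGE (p:Product {{sku: '{row['sku']}'}}) "
            f"SET p.name = '{row['name']}', p.brand = '{row['brand']}'")

statements = [to_cypher(r) for r in csv.DictReader(io.StringIO(raw))]
print(statements[0])
```

In practice Neo4j's own `LOAD CSV` clause does this ingestion server-side; generating statements in Python like this is just one way to see what the import is doing.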

The Data Challenge

Data quality became our unexpected nemesis. We quickly learned that building a knowledge graph isn't just about structure—it's about data normalization, standardization, and cleanup. We spent weeks:

  • Identifying which of 553 properties actually mattered
  • Standardizing measurement units
  • Removing duplicates and null values
  • Deciding what should be nodes versus properties
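The cleanup steps above can be sketched in plain Python. The field names and rows are invented for illustration; the point is the shape of the work, not our real schema:

```python
# Sketch of the cleanup we needed: unit standardization, duplicate
# removal, and explicit null handling (field names are made up).
rows = [
    {"sku": "X100", "weight": "1.4 kg", "color": "Silver"},
    {"sku": "X100", "weight": "1.4 kg", "color": "Silver"},   # duplicate
    {"sku": "S20",  "weight": "800 g",  "color": None},       # null value
]

def normalize_weight(value):
    """Standardize weights to grams."""
    number, unit = value.split()
    grams = float(number) * (1000 if unit == "kg" else 1)
    return f"{grams:g} g"

seen, clean = set(), []
for row in rows:
    if row["sku"] in seen:
        continue                     # drop duplicates by key
    seen.add(row["sku"])
    clean.append({
        "sku": row["sku"],
        "weight": normalize_weight(row["weight"]),
        "color": row["color"] or "unknown",   # make nulls explicit
    })

print(clean)
```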

A key insight: common entities such as brands and colours work better as nodes than as properties, because a single shared node can anchor searches across many products at once.
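The difference shows up directly in the Cypher you end up writing. The queries below are illustrative, against a hypothetical schema:

```python
# With brand stored as a property, matching means comparing a string
# on every Product node:
brand_as_property = "MATCH (p:Product {brand: 'Acme'}) RETURN p.name"

# With Brand as its own node, one shared node anchors the traversal,
# and every product that links to it is found by following edges:
brand_as_node = (
    "MATCH (b:Brand {name: 'Acme'})<-[:HAS_BRAND]-(p:Product) "
    "RETURN p.name"
)

print(brand_as_node)
```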

The Multilingual Dimension

We implemented translation support using a relationship-based approach rather than storing translations as properties. This meant creating separate language nodes connected via "translated_to" relationships, allowing us to retrieve information in any supported language (Arabic, Chinese, German, Spanish, French, Russian) without duplicating the entire graph structure.
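A minimal sketch of that relationship-based model, generating the Cypher for one translation link. The node labels and property names here are our hypothetical choices, not a standard; only the "translated_to" relationship name comes from the design described above:

```python
# Sketch: link a source term to a translation node via "translated_to"
# rather than storing translations as properties on the term itself.
def translation_cypher(term, lang, translated):
    """Build a MERGE statement linking `term` to its translation in `lang`."""
    return (
        f"MATCH (t:Term {{text: '{term}'}}) "
        f"MERGE (tr:Translation {{text: '{translated}', lang: '{lang}'}}) "
        f"MERGE (t)-[:translated_to]->(tr)"
    )

query = translation_cypher("battery", "de", "Akku")
print(query)
```

Because each translation is its own node, adding a seventh language is just more nodes and edges; nothing about the existing graph has to be duplicated or reshaped.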

AI Integration: The Game Changer

What started as a translation use case evolved into something more exciting: combining the knowledge graph with an LLM using RAG (Retrieval-Augmented Generation). The workflow:

  1. User asks a question in natural language
  2. LLM converts it to a Cypher query
  3. Knowledge graph returns factual data
  4. LLM generates a natural language response

This approach dramatically reduces hallucinations while providing conversational output—the knowledge graph supplies the truth, the LLM supplies the fluency.
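The four-step loop above can be sketched with stubbed components. The stubs stand in for a real LLM API call and a live Neo4j query, so only the control flow is real:

```python
# Stubbed RAG loop: a real system would call an LLM and the Neo4j
# driver; here both return canned values so the flow is visible.
def llm_to_cypher(question):
    """Step 2: the LLM turns the question into Cypher (canned here)."""
    return "MATCH (p:Product {sku: 'X100'}) RETURN p.weight"

def run_query(cypher):
    """Step 3: the knowledge graph returns factual data (canned result)."""
    return [{"p.weight": "1400 g"}]

def llm_answer(question, facts):
    """Step 4: the LLM phrases the facts conversationally (canned template)."""
    return f"The product weighs {facts[0]['p.weight']}."

def ask(question):
    cypher = llm_to_cypher(question)    # step 2
    facts = run_query(cypher)           # step 3
    return llm_answer(question, facts)  # step 4

print(ask("How heavy is the X100?"))    # step 1: natural-language question
```

The division of labor is the whole trick: the graph never invents facts, and the LLM never has to remember any.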

Key Learnings

Technical:

  • Data quality matters more than we expected
  • Relationship-based translations scale better than property-based
  • Docker containerization is essential for sustainable deployment
  • Format compatibility between Neo4j editions is crucial

Process:

  • Start with use cases and questions before building the ontology
  • Iterative problem-solving beats perfect planning
  • Documentation for non-technical audiences is as important as the code

Team:

  • Learning together creates better educational materials
  • Diverse perspectives improve problem-solving
  • Real challenges make better teaching moments than sanitized tutorials

What's Next

We're currently packaging the system for broader distribution, implementing AI chat capabilities using the GenAI Stack template, and creating comprehensive documentation. The goal isn't just to build a knowledge graph—it's to show others how they can build one too.

The project continues to evolve, now exploring triple stores for even better multilingual support and reasoning capabilities. What started as a learning exercise has become a practical demonstration that knowledge graphs aren't just for enterprise tech companies with massive budgets—they're accessible to anyone willing to learn.

Interested in knowledge graphs or multilingual AI systems? The journey continues, and new learners are always welcome. No expertise required—just curiosity and willingness to debug alongside the rest of us.

Lead the change: join our community today!
