What happens when a group of language operations professionals decides to tackle knowledge graphs with zero expertise? Turns out, quite a lot.
Back in August 2024, our workgroup at the Language Operations Institute embarked on an ambitious project: build a multilingual knowledge graph from the ground up. None of us were experts—that was precisely the point. We wanted to demystify knowledge graphs and create something practical that demonstrates their value in real business contexts.
Why Knowledge Graphs?
The appeal was simple: knowledge graphs encode explicit, verifiable connections between data points, unlike LLMs, which generate probabilistic output. They organize information as nodes, properties, and relationships, which makes them particularly powerful for complex data retrieval and AI applications. We saw potential applications everywhere, from terminology management to product information systems to AI-powered chatbots.
Choosing Our Domain
After considering medical devices (too complex), regulated industries (too dry), and general retail, we settled on electronics products. The domain offered several advantages:
- Relatively accessible data sources
- Rich relationship opportunities (products, components, accessories)
- Potential for both structured specs and unstructured content like manuals
- Something we could all relate to
The Technical Journey
From Protégé to Neo4j
We started with Protégé, Stanford's open-source ontology tool, learning the fundamentals: RDF, RDFS, OWL, and semantic triples (subject-predicate-object relationships). It was excellent for understanding the concepts but limited for practical implementation.
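To make triples concrete, here is a minimal sketch in Python using the rdflib library rather than Protégé itself. The namespace, product, and property names are our own illustrations, not part of the actual graph:

```python
# A semantic triple is just (subject, predicate, object).
# Everything below (namespace, names, values) is illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/electronics/")

g = Graph()
g.bind("ex", EX)

# Class and instance: "HX-100 is a Headphone"
g.add((EX.Headphone, RDF.type, RDFS.Class))
g.add((EX["HX-100"], RDF.type, EX.Headphone))

# A relationship triple: "HX-100 has brand AcmeAudio"
g.add((EX["HX-100"], EX.hasBrand, EX.AcmeAudio))

# A literal-valued triple: "HX-100 weighs '250 g'"
g.add((EX["HX-100"], EX.weight, Literal("250 g")))

print(g.serialize(format="turtle"))
```

Seeing the same statement three ways (class membership, relationship, literal value) was the moment triples clicked for most of us.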
After several sessions wrestling with Protégé's constraints, we pivoted to Neo4j. The difference was night and day:
- CSV import capabilities (see the import sketch after this list)
- Visual graph interface
- Cypher query language
- Better integration options with AI tools
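For a flavour of what that CSV import looks like in practice, here is a minimal sketch using the official Python driver. The connection details, file name, and column names are placeholders, and LOAD CSV expects the file to sit in the database's configured import directory:

```python
# Minimal CSV import sketch with the official neo4j Python driver.
# URL, credentials, file name, and columns are all placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

import_products = """
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (p:Product {sku: row.sku})
SET p.name  = row.name,
    p.price = toFloat(row.price)
"""

with driver.session() as session:
    session.run(import_products)

driver.close()
```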
The Data Challenge
Data quality became our unexpected nemesis. We quickly learned that building a knowledge graph isn't just about structure—it's about data normalization, standardization, and cleanup. We spent weeks:
- Identifying which of 553 properties actually mattered
- Standardizing measurement units
- Removing duplicates and null values
- Deciding what should be nodes versus properties
A key insight: common entities like brands and colours work better as nodes than properties, enabling more efficient searching across multiple products.
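The contrast is easiest to see side by side. In this sketch the labels, relationship names, and values are illustrative rather than our exact schema:

```python
# Colour as a shared node: every matching product is one hop away,
# instead of a string comparison against a property on every Product.
# All labels and values here are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Property style: colour stored on each product individually.
property_style = """
MATCH (p:Product)
WHERE p.colour = 'black'
RETURN p.name AS name
"""

# Node style: one shared Colour node reached via a relationship.
node_style = """
MATCH (p:Product)-[:HAS_COLOUR]->(:Colour {name: 'black'})
RETURN p.name AS name
"""

with driver.session() as session:
    for record in session.run(node_style):
        print(record["name"])

driver.close()
```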
The Multilingual Dimension
We implemented translation support using a relationship-based approach rather than storing translations as properties. This meant creating separate language nodes connected via "translated_to" relationships, allowing us to retrieve information in any supported language (Arabic, Chinese, German, Spanish, French, Russian) without duplicating the entire graph structure.
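Here is a rough sketch of that pattern. Only the "translated_to" relationship itself comes from our design; the node labels and property names below are assumptions for illustration:

```python
# Relationship-based translations: source and target terms are separate
# nodes linked by a translated_to relationship (written TRANSLATED_TO
# per Cypher convention). Labels and properties are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

create_pair = """
MERGE (en:Term {text: 'wireless headphones', lang: 'en'})
MERGE (de:Term {text: 'kabellose Kopfhörer', lang: 'de'})
MERGE (en)-[:TRANSLATED_TO]->(de)
"""

# Fetch a term in any supported language without duplicating the graph.
lookup = """
MATCH (en:Term {lang: 'en', text: $text})
      -[:TRANSLATED_TO]->(t:Term {lang: $lang})
RETURN t.text AS translation
"""

with driver.session() as session:
    session.run(create_pair)
    for record in session.run(lookup,
                              text="wireless headphones", lang="de"):
        print(record["translation"])  # -> kabellose Kopfhörer

driver.close()
```

Adding a new language then means adding new Term nodes and relationships, not touching the product structure at all.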
AI Integration: The Game Changer
What started as a translation use case evolved into something more exciting: combining the knowledge graph with an LLM using RAG (Retrieval-Augmented Generation). The workflow:
1. User asks a question in natural language
2. The LLM converts it to a Cypher query
3. The knowledge graph returns factual data
4. The LLM generates a natural-language response
This approach dramatically reduces hallucinations while providing conversational output—the knowledge graph supplies the truth, the LLM supplies the fluency.
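Below is a condensed sketch of that loop, assuming an OpenAI-compatible chat client and the toy schema from earlier; the model name, prompts, and error handling are illustrative only, not the project's actual configuration:

```python
# RAG over a knowledge graph: LLM writes Cypher, graph supplies facts,
# LLM phrases the answer. All names and prompts here are assumptions.
from neo4j import GraphDatabase
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

SCHEMA = "(:Product)-[:HAS_BRAND]->(:Brand), (:Product)-[:HAS_COLOUR]->(:Colour)"

def ask(question: str) -> str:
    # 1. LLM converts the natural-language question into Cypher.
    cypher = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content":
                "Translate the user's question into one Cypher query for "
                f"this schema: {SCHEMA}. Return only the Cypher, no prose."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # 2. The knowledge graph returns factual data.
    # (In production you would validate the generated Cypher first.)
    with driver.session() as session:
        rows = [record.data() for record in session.run(cypher)]

    # 3. LLM phrases a conversational answer grounded in the rows.
    return llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content":
                f"Answer using only this query result: {rows}"},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

print(ask("Which black headphones do you carry?"))
```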
Key Learnings
Technical:
- Data quality matters more than we expected
- Relationship-based translations scale better than property-based
- Docker containerization is essential for sustainable deployment
- Format compatibility between Neo4j editions is crucial
Process:
- Start with use cases and questions before building the ontology
- Iterative problem-solving beats perfect planning
- Documentation for non-technical audiences is as important as the code
Team:
- Learning together creates better educational materials
- Diverse perspectives improve problem-solving
- Real challenges make better teaching moments than sanitized tutorials
What's Next
We're currently packaging the system for broader distribution, implementing AI chat capabilities using the GenAI Stack template, and creating comprehensive documentation. The goal isn't just to build a knowledge graph—it's to show others how they can build one too.
The project continues to evolve, now exploring triple stores for even better multilingual support and reasoning capabilities. What started as a learning exercise has become a practical demonstration that knowledge graphs aren't just for enterprise tech companies with massive budgets—they're accessible to anyone willing to learn.
Interested in knowledge graphs or multilingual AI systems? The journey continues, and new learners are always welcome. No expertise required—just curiosity and willingness to debug alongside the rest of us.
