Figure 1: The GLOW-QA pipeline phases.
Table 1: Average accuracy (%) on open-world KGQA tasks, grouped by reasoning hop count.
GLOW-GN significantly outperforms the baseline methods on both 1-hop and 2-hop questions.
All methods use Qwen3-8B as the underlying LLM.
To run the GLOW-QA pipeline:
python src/GLOW.py --llm_model qwen3:8b --dataset_name biokg --runs 3 --glow-v All --top-k 3
--llm_model choices: [gpt-4o-mini, deepseek-chat, deepseek-r1, granite3.3, gemini-1.5-flash, llama3.2:3b, qwen3:8b, phi4-mini]
--dataset_name choices: [biokg, linkedIMDB, yago4-person, yago4-creativwork, crunchbase, arxiv2023, ogbnArxiv, ogbnProduct]
--glow-v choices: [L, GCR, GN, G, N, LLM, All]
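The flag choices above can be captured with a standard `argparse` parser. The sketch below is a hypothetical reconstruction of how `src/GLOW.py` might validate its command-line options; the defaults and the `build_parser` helper are assumptions for illustration, not the actual implementation.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the GLOW.py argument parser; flag names come
    # from the usage above, defaults are assumed for illustration only.
    parser = argparse.ArgumentParser(description="Run the GLOW-QA pipeline")
    parser.add_argument("--llm_model", default="qwen3:8b",
                        choices=["gpt-4o-mini", "deepseek-chat", "deepseek-r1",
                                 "granite3.3", "gemini-1.5-flash", "llama3.2:3b",
                                 "qwen3:8b", "phi4-mini"])
    parser.add_argument("--dataset_name", default="biokg",
                        choices=["biokg", "linkedIMDB", "yago4-person",
                                 "yago4-creativwork", "crunchbase", "arxiv2023",
                                 "ogbnArxiv", "ogbnProduct"])
    parser.add_argument("--glow-v", dest="glow_v", default="All",
                        choices=["L", "GCR", "GN", "G", "N", "LLM", "All"])
    parser.add_argument("--runs", type=int, default=3)
    parser.add_argument("--top-k", dest="top_k", type=int, default=3)
    return parser

# Parse the example invocation shown above.
args = build_parser().parse_args(
    ["--llm_model", "qwen3:8b", "--dataset_name", "biokg",
     "--runs", "3", "--glow-v", "All", "--top-k", "3"])
print(args.llm_model, args.dataset_name, args.runs, args.glow_v, args.top_k)
```

Invalid values (e.g. an unlisted dataset name) are rejected by `argparse` with a usage error, which keeps the run configurations restricted to the supported combinations.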
