Overview

This tutorial walks you through using Knowledge Graph to support an advanced and highly requested use case: “For a given standard, I want to know what its prerequisites are, so I can create a differentiated content or product experience.” When working with learning progressions data for this use case, there are a few key things to know in advance:
  • Our current learning progressions dataset, which comes from Student Achievement Partners, maps Common Core State Standards for Mathematics into logical sequences.
  • The sequences do not name definitive prerequisites. In other words, it is not necessarily true that students must master an earlier standard before they are ready for the standards it supports.
  • Instead, the relationships shown indicate what might be helpful in a given circumstance.
To learn more about these entities, please read the data references for Academic Standards and Learning Progressions.
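To make the two relationship types concrete, here is a small sketch of what rows from the relationships data look like and how a buildsTowards edge is read. The field names match the columns used later in this tutorial; the UUID-like values and standard codes are made up for illustration.

```javascript
// Hypothetical example rows illustrating the two relationship types used in
// this tutorial. The identifiers below are placeholders, not real dataset values.
const exampleRelationships = [
  {
    relationshipType: 'buildsTowards',
    sourceEntityValue: 'uuid-of-4.OA.B.4', // earlier standard
    targetEntityValue: 'uuid-of-6.NS.B.4'  // later standard it builds towards
  },
  {
    relationshipType: 'supports',
    sourceEntityValue: 'uuid-of-a-learning-component',
    targetEntityValue: 'uuid-of-4.OA.B.4'  // standard the component supports
  }
];

// A buildsTowards edge points from a supporting standard to the standard it
// helps with -- it suggests readiness, not a hard prerequisite.
const priorStandards = exampleRelationships
  .filter(r => r.relationshipType === 'buildsTowards')
  .map(r => r.sourceEntityValue);

console.log(priorStandards); // ['uuid-of-4.OA.B.4']
```

Reading the edge direction correctly matters: to find prerequisites for a target standard, you look for rows whose targetEntityValue is the target, then collect the sourceEntityValue side.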
[Figure: Coherence Map dataset data model diagram (high level)]

Key demonstrated capabilities

  1. Navigating learning progressions relationships
  2. Unpacking standards into learning components
  3. Inserting Knowledge Graph data into LLM context for content generation

Prerequisites

  • This tutorial assumes you’ve already downloaded Knowledge Graph. If you haven’t, please see the download instructions.
  • You can use either Node or Python to follow this tutorial
  • If using Node
    • node 14+
    • openai
    • dotenv
    • arquero
    • csv-parse
  • If using Python
    • python 3.9+
    • openai
    • pandas
    • python-dotenv
  • OpenAI API key
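If you are following along in Node, the dependencies above can be installed with npm (assuming npm is your package manager):

```shell
npm install openai dotenv arquero csv-parse
```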

Step 1: setup

Load dependencies and env variables

First you’ll need to set up your environment variables for an LLM (this tutorial assumes OpenAI) and either the data files you have downloaded or a PostgreSQL database. This tutorial assumes your variable names look like the following:
# .env

# Knowledge Graph data path - update with your actual path to CSV files
KG_DATA_PATH=/path/to/your/knowledge-graph/csv/files

# OpenAI API key for generating practice questions
OPENAI_API_KEY=your_openai_api_key_here
Then set up the JavaScript app, constants, and helper functions.
// Dependencies
const fs = require('fs');
const path = require('path');
const { parse } = require('csv-parse/sync');
const OpenAI = require('openai');
require('dotenv').config();

// Constants
const GENERATE_PRACTICE = true;
// Filter criteria for mathematics standards
const JURISDICTION = 'Multi-State';
const ACADEMIC_SUBJECT = 'Mathematics';
const TARGET_CODE = '6.NS.B.4';
// OpenAI configuration
const OPENAI_MODEL = 'gpt-4';
const OPENAI_TEMPERATURE = 0.7;

// Environment setup
const dataDir = process.env.KG_DATA_PATH;
if (!dataDir) {
  console.error('❌ KG_DATA_PATH environment variable is not set.');
  process.exit(1);
}

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Helper functions
function loadCSV(filename) {
  try {
    const content = fs.readFileSync(path.join(dataDir, filename), 'utf8');
    return parse(content, { columns: true, skip_empty_lines: true });
  } catch (error) {
    console.error(`❌ Error loading CSV file ${filename}: ${error.message}`);
    throw error;
  }
}

Read and filter data files

Now let’s load the relevant files and the specific data that will be used to explore prerequisite standards and learning components, and ultimately generate practice content with OpenAI. If you completed the previous tutorial, this one will feel a bit more advanced. The main difference is that instead of loading all the standards data, we’ll focus only on a specific subset of standards. Here’s what we’ll do:
  • Load only the StandardsFrameworkItems that are part of the Common Core Math Standards.
  • Pull in the buildsTowards and supports relationships that connect those standards.
  • Include the LearningComponents that support the StandardsFrameworkItems we just loaded.
function loadData(aq) {
  /* Load CSV data files and build filtered dataset
   */
  
  // First load the data in arquero 
  const standardsFrameworkItems = aq.from(loadCSV('StandardsFrameworkItem.csv'));
  const learningComponents = aq.from(loadCSV('LearningComponent.csv'));
  const relationships = aq.from(loadCSV('Relationships.csv'));

  console.log('✅ Files loaded from KG CSV files');

  // Filter for relevant StandardsFrameworkItem by jurisdiction and subject 
  const relevantStandards = standardsFrameworkItems
    .params({ jurisdiction: JURISDICTION, academicSubject: ACADEMIC_SUBJECT })
    .filter(d => d.jurisdiction === jurisdiction && d.academicSubject === academicSubject);

  // Get array of relevant identifiers for filtering
  const relevantStandardIds = relevantStandards.array('caseIdentifierUUID');
  const relevantStandardSet = new Set(relevantStandardIds);

  // Filter relationships for buildsTowards and supports relationships
  const relevantRelationships = relationships
    .filter(aq.escape(d =>
      (d.relationshipType === 'buildsTowards' &&
        relevantStandardSet.has(d.sourceEntityValue) &&
        relevantStandardSet.has(d.targetEntityValue)) ||
      (d.relationshipType === 'supports' &&
        relevantStandardSet.has(d.targetEntityValue))
    ));

  // Get learning component IDs from supports relationships
  const supportRelationships = relevantRelationships
    .filter(d => d.relationshipType === 'supports');
  const linkedLearningComponentIds = supportRelationships.array('sourceEntityValue');
  const linkedLearningComponentSet = new Set(linkedLearningComponentIds);

  // Filter learning components by identifier
  const relevantLearningComponents = learningComponents
    .filter(aq.escape(d => linkedLearningComponentSet.has(d.identifier)));

  console.log('✅ Retrieved scoped graph:');
  console.log({
    standardsFrameworkItems: relevantStandards.numRows(),
    learningComponents: relevantLearningComponents.numRows(),
    relationships: relevantRelationships.numRows()
  });

  return {
    relevantStandards,
    relevantRelationships,
    relevantLearningComponents
  };
}
The loaded data from dataset files should look something like this (though it’s possible the counts might be different):
{
  "standardsFrameworkItems": 836,
  "learningComponents": 1481,
  "relationships": 2238
}

Step 2: find prerequisites

Get prerequisite standards for 6.NS.B.4

Now that the data is loaded, let’s get a target standard and filter the buildsTowards relationships to find its prerequisite standards.
function getStandardAndPrerequisites(relevantStandards, relevantRelationships) {
  const targetStandardTable = relevantStandards
    .params({ targetCode: TARGET_CODE })
    .filter(d => d.statementCode === targetCode);

  if (targetStandardTable.numRows() === 0) {
    console.error(`❌ No StandardsFrameworkItem found for statementCode = "${TARGET_CODE}"`);
    return null;
  }

  const targetStandard = targetStandardTable.object();
  console.log(`✅ Found StandardsFrameworkItem for ${TARGET_CODE}:`);

  const prerequisiteLinks = relevantRelationships
    .params({ targetIdentifier: targetStandard.caseIdentifierUUID })
    .filter(d => d.relationshipType === 'buildsTowards' &&
      d.targetEntityValue === targetIdentifier);

  const prerequisiteStandards = prerequisiteLinks
    .join(relevantStandards, ['sourceEntityValue', 'caseIdentifierUUID'])
    .select('sourceEntityValue', 'statementCode', 'description_2')
    .rename({ sourceEntityValue: 'caseIdentifierUUID', description_2: 'standardDescription' });

  console.log(`✅ Found ${prerequisiteStandards.numRows()} prerequisite(s) for ${targetStandard.statementCode}:`);
  console.log(prerequisiteStandards.objects());

  return { targetStandard, prerequisiteStandards };
}
The data for the specific standard should look like this:
[
  {
    "caseIdentifierUUID": "6b9ed00e-d7cc-11e8-824f-0242ac160002",
    "statementCode": "4.OA.B.4",
    "standardDescription": "A buildsTowards relationship indicates that proficiency in one entity supports the likelihood of success in another, capturing a directional progression without requiring strict prerequisite order."
  },
  // ...
]

Get the learning components that support the prerequisite standards

Now we’re going to filter the relationships table to find the learning components that support the prerequisite standards we found in the previous step. Please note, description_2 is auto-generated by the arquero library, which appends _n when multiple joined tables contain the same column name.
function getLearningComponentsForPrerequisites(prerequisiteStandards, relevantRelationships, relevantLearningComponents) {
  const prerequisiteLearningComponents = prerequisiteStandards
    .join(relevantRelationships, ['caseIdentifierUUID', 'targetEntityValue'])
    .params({ supportsType: 'supports' })
    .filter(d => d.relationshipType === supportsType)
    .join(relevantLearningComponents, ['sourceEntityValue', 'identifier'])
    .select('caseIdentifierUUID', 'statementCode', 'standardDescription', 'description_2')
    .rename({ description_2: 'learningComponentDescription' });

  console.log(`✅ Found ${prerequisiteLearningComponents.numRows()} supporting learning components for prerequisites:`);
  console.log(prerequisiteLearningComponents.objects());

  return prerequisiteLearningComponents;
}
The learning components and prerequisite standards data should look like this:
[
  {
    "caseIdentifierUUID": "6b9d5f43-d7cc-11e8-824f-0242ac160002",
    "statementCode": "5.OA.A.2",
    "standardDescription": "A buildsTowards relationship indicates that proficiency in one entity supports the likelihood of success in another, capturing a directional progression without requiring strict prerequisite order.",
    "learningComponentDescription": "Write simple expressions of two or more steps and with grouping symbols that record calculations with numbers"
  },
  // ...
]

Packaging the prerequisite functions

function queryPrerequisiteData(aq, relevantStandards, relevantRelationships, relevantLearningComponents) {
  const standardAndPrereqData = getStandardAndPrerequisites(relevantStandards, relevantRelationships);
  if (!standardAndPrereqData) {
    return null;
  }

  const { targetStandard, prerequisiteStandards } = standardAndPrereqData;
  const prerequisiteLearningComponents = getLearningComponentsForPrerequisites(prerequisiteStandards, relevantRelationships, relevantLearningComponents);

  return { targetStandard, prerequisiteLearningComponents };
}

Step 3: generate practice

Now that you’ve identified prerequisite standards and their supporting learning components, you can use them for downstream applications, such as generating practice problems. But remember the caveats discussed above and apply your judgment to create appropriate learning experiences.

Package the data

Let’s package the data as clean JSON so it will be easy for the LLM to parse and will preserve the relationship structure of the data.
function packageContextData(targetStandard, prerequisiteLearningComponents) {
  /* Package the standards and learning components data for text generation
   * This creates a structured context that can be used for generating practice questions
   */

  // Convert dataframe to context format for LLM
  const allRows = prerequisiteLearningComponents.objects();
  const standardsMap = new Map();

  // Group learning components by standard for context
  for (const row of allRows) {
    if (!standardsMap.has(row.caseIdentifierUUID)) {
      standardsMap.set(row.caseIdentifierUUID, {
        statementCode: row.statementCode,
        description: row.standardDescription || '(no statement)',
        supportingLearningComponents: []
      });
    }

    standardsMap.get(row.caseIdentifierUUID).supportingLearningComponents.push({
      description: row.learningComponentDescription || '(no description)'
    });
  }

  const fullStandardsContext = {
    targetStandard: {
      statementCode: targetStandard.statementCode,
      description: targetStandard.description || '(no statement)'
    },
    prereqStandards: Array.from(standardsMap.values())
  };
  
  return fullStandardsContext;
}
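For 6.NS.B.4, the packaged context should have roughly this shape (the descriptions below are placeholders standing in for the actual dataset text):

```json
{
  "targetStandard": {
    "statementCode": "6.NS.B.4",
    "description": "(standard description text)"
  },
  "prereqStandards": [
    {
      "statementCode": "4.OA.B.4",
      "description": "(standard description text)",
      "supportingLearningComponents": [
        { "description": "(learning component description)" }
      ]
    }
  ]
}
```

Grouping the learning components under their parent standard, rather than passing a flat list of rows, keeps the standard-to-component relationships explicit in the prompt.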

Generate practice questions

Finally, we’ll inject that JSON into a prompt so the LLM has the full context for the practice problems the user wants to create.
function generatePracticeData(fullStandardsContext) {
  /* Generate practice questions using OpenAI API
   * This creates educational content based on prerequisite data
   */
  return async function generatePractice() {
    console.log(`🔄 Generating practice questions for ${fullStandardsContext.targetStandard.statementCode}...`);

    try {
      // Build prompt inline
      let prerequisiteText = '';
      for (const prereq of fullStandardsContext.prereqStandards) {
        prerequisiteText += `- ${prereq.statementCode}: ${prereq.description}\n`;
        prerequisiteText += '  Supporting Learning Components:\n';
        for (const lc of prereq.supportingLearningComponents) {
          prerequisiteText += `    • ${lc.description}\n`;
        }
      }

      const prompt = `You are a math tutor helping middle school students. Based on the following information, generate 3 practice questions for the target standard. Questions should help reinforce the key concept and build on prerequisite knowledge.

Target Standard:
- ${fullStandardsContext.targetStandard.statementCode}: ${fullStandardsContext.targetStandard.description}

Prerequisite Standards & Supporting Learning Components:
${prerequisiteText}`;

      const response = await openai.chat.completions.create({
        model: OPENAI_MODEL,
        messages: [
          { role: 'system', content: 'You are an expert middle school math tutor.' },
          { role: 'user', content: prompt }
        ],
        temperature: OPENAI_TEMPERATURE
      });

      const practiceQuestions = response.choices[0].message.content.trim();

      console.log(`✅ Generated practice questions:\n`);
      console.log(practiceQuestions);

      return {
        aiGenerated: practiceQuestions,
        targetStandard: fullStandardsContext.targetStandard.statementCode,
        prerequisiteCount: fullStandardsContext.prereqStandards.length
      };
    } catch (err) {
      console.error('❌ Error generating practice questions:', err.message);
      throw err;
    }
  };
}
The resulting generated problems should look something like this:
Question 1: 
Find the greatest common factor of 36 and 90. Then use the distributive property to express the sum of these two numbers as a multiple of a sum of two whole numbers with no common factor.

Question 2: 
Write the expression "add 12 and 15, then multiply by 3" as an algebraic expression. After that, recognize that this expression is three times as large as 12 + 15, without having to calculate the indicated sum or product.

Question 3: 
Determine whether the number 72 is a multiple of the digit 8. Find all factor pairs of 72. Recognize that 72 is a multiple of each of its factors and determine whether 72 is a prime or a composite number.

Step 4: pulling it all together

Now let’s create a final function to run everything.
async function main() {
  console.log('\n=== GENERATE PREREQUISITE PRACTICE TUTORIAL ===\n');

  console.log('🔄 Step 1: Loading data...');
  const aq = await import('arquero');
  const { relevantStandards, relevantRelationships, relevantLearningComponents } = loadData(aq);

  console.log('\n🔄 Step 2: Querying prerequisite data...');
  const prerequisiteData = queryPrerequisiteData(aq, relevantStandards, relevantRelationships, relevantLearningComponents);

  if (!prerequisiteData) {
    console.error('❌ Failed to find prerequisite data');
    return;
  }

  const { targetStandard, prerequisiteLearningComponents } = prerequisiteData;

  console.log('\n🔄 Step 3: Generating practice...');
  const fullStandardsContext = packageContextData(targetStandard, prerequisiteLearningComponents);
  const generatePractice = generatePracticeData(fullStandardsContext);
  if (GENERATE_PRACTICE) {
    await generatePractice();
  } else {
    console.log('🚫 Practice question generation disabled');
  }
}

main().catch(console.error);

Conclusion

In this tutorial you started with standards data, followed the buildsTowards relationships to see what comes before a target standard, joined in the learning components that support those prerequisites, and then structured that context to generate new practice problems. Everything here was scoped to a single standard for clarity, but the same steps work across grade levels, subject areas, or even larger parts of the dataset. As you keep experimenting, try extending the queries to include lessons, assessments, or instructional routines to create more complete learning experiences.