Overview
This tutorial walks you through using Knowledge Graph to support one of its most common use cases and data access patterns: “I want access to the contents of state standards, to tag my content or otherwise help in my content creation process.”
State standards are modeled in Knowledge Graph as two core entities: StandardsFramework and StandardsFrameworkItem. To learn more about these entities please read the data reference for Academic Standards.
Here is a diagram that explains how the standards data is modeled.
Key demonstrated capabilities
- Use standards identifiers to find deeper-level relationships
- Generate embeddings of standards data
- Run a basic vector search to find standards from a description
Prerequisites
- This tutorial assumes you’ve already downloaded Knowledge Graph. If you haven’t, please see the download instructions.
- You can use either Node or Python to go through this tutorial
- If using Node:
  - node 14+
  - dotenv
  - arquero
  - csv-parse
  - @xenova/transformers
  - fast-cosine-similarity
- If using Python:
  - python 3.9+
  - transformers
  - pandas
  - torch
  - scikit-learn
  - python-dotenv
Step 1: setup
Load dependencies and env variables
First you’ll need to set up your environment variables for the location of the data files you have downloaded. This tutorial assumes your variable names look like the following:
# .env
# Knowledge Graph data path - update with your actual path to CSV files
KG_DATA_PATH=/path/to/your/knowledge-graph/csv/files
Then set up the JavaScript app and define the constants and helper functions.
// Dependencies
const fs = require('fs');
const path = require('path');
const { parse } = require('csv-parse/sync');
const { cosineSimilarity } = require('fast-cosine-similarity');
require('dotenv').config();
// Constants
const GENERATE_EMBEDDINGS = true;
const MIDDLE_SCHOOL_GRADES = ['6', '7', '8'];
// For this tutorial, we use 'all-MiniLM-L6-v2' which provides good quality embeddings
// for short text. You can substitute any compatible embedding model.
const EMBEDDING_MODEL = 'Xenova/all-MiniLM-L6-v2';
// Environment setup
const dataDir = process.env.KG_DATA_PATH;
if (!dataDir) {
  console.error('❌ KG_DATA_PATH environment variable is not set.');
  process.exit(1);
}
const EMBEDDING_FILE_PATH = path.join(dataDir, 'california_math_embeddings.json');
// Initialize embedding pipeline (will be loaded on first use)
let embedder = null;
let pipeline = null;
// Helper functions
function loadCSV(filename) {
  try {
    const content = fs.readFileSync(path.join(dataDir, filename), 'utf8');
    return parse(content, { columns: true, skip_empty_lines: true });
  } catch (error) {
    console.error(`❌ Error loading CSV file ${filename}: ${error.message}`);
    throw error;
  }
}
function findFrameworkItem(caseIdentifierUUID, standardsFrameworkItemsData) {
  return standardsFrameworkItemsData.find(item => item.caseIdentifierUUID === caseIdentifierUUID);
}
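The findFrameworkItem helper scans the whole array on every call, which is fine for a handful of lookups. If you expect many lookups, one option is to build an index once and reuse it. The sketch below is not part of the tutorial code: buildFrameworkItemIndex is a hypothetical helper, and the sample rows are invented for illustration.

```javascript
// Hypothetical helper (not part of the tutorial code): index rows by
// caseIdentifierUUID so repeated lookups avoid a full array scan.
function buildFrameworkItemIndex(standardsFrameworkItemsData) {
  const index = new Map();
  for (const item of standardsFrameworkItemsData) {
    // If a UUID appears more than once, keep the first row,
    // which matches the behavior of Array.prototype.find.
    if (!index.has(item.caseIdentifierUUID)) {
      index.set(item.caseIdentifierUUID, item);
    }
  }
  return index;
}

// Illustrative rows only; real rows come from StandardsFrameworkItem.csv.
const sampleItems = [
  { caseIdentifierUUID: 'uuid-1', statementCode: '6.RP.A.1' },
  { caseIdentifierUUID: 'uuid-2', statementCode: '6.RP.A.2' }
];

const itemIndex = buildFrameworkItemIndex(sampleItems);
console.log(itemIndex.get('uuid-2').statementCode); // prints "6.RP.A.2"
console.log(itemIndex.get('missing'));              // prints undefined
```

Building the Map costs one pass over the data, so it only pays off when the number of lookups is large relative to a single scan.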
Read and filter data files
Let’s load the data into memory for use with the rest of the tutorial. We’ll load StandardsFramework and StandardsFrameworkItem directly.
function loadData(aq) {
  // Load CSV data files needed for the tutorial
  const standardsFrameworksData = loadCSV('StandardsFramework.csv');
  const standardsFrameworkItemsData = loadCSV('StandardsFrameworkItem.csv');
  console.log('✅ Files loaded from KG CSV files');
  console.log({
    standardsFrameworks: standardsFrameworksData.length,
    standardsFrameworkItems: standardsFrameworkItemsData.length
  });
  return { standardsFrameworksData, standardsFrameworkItemsData };
}
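Because loadCSV throws on the first file it cannot read, it can be convenient to check up front that both files are present in the data directory. The following is a minimal sketch under that assumption: findMissingFiles is a hypothetical helper, the file names are the two used by loadData, and the demo runs against a throwaway temporary directory rather than your real KG_DATA_PATH.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// File names taken from the loadData function in this tutorial.
const REQUIRED_FILES = ['StandardsFramework.csv', 'StandardsFrameworkItem.csv'];

// Hypothetical helper: return the required files that are missing from dir.
function findMissingFiles(dir, requiredFiles = REQUIRED_FILES) {
  return requiredFiles.filter(name => !fs.existsSync(path.join(dir, name)));
}

// Demo against a temporary directory so the sketch runs on its own.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'kg-demo-'));
const demoFile = path.join(demoDir, 'StandardsFramework.csv');
fs.writeFileSync(demoFile, 'caseIdentifierUUID,name\n');

const missing = findMissingFiles(demoDir);
console.log(missing); // prints [ 'StandardsFrameworkItem.csv' ]

// Clean up the temporary files created for the demo.
fs.unlinkSync(demoFile);
fs.rmdirSync(demoDir);
```

In the tutorial script you could call findMissingFiles(dataDir) before loadData and exit with a clear message if the returned array is not empty.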
Step 2: query for standards data
Retrieving standards frameworks
Let’s get all the state frameworks in Knowledge Graph for math standards. These are the various groupings of standards for the different states and academic subjects.
function getMathStandardsFrameworks(aq, standardsFrameworksData) {
  // First get all math standards frameworks
  const mathFrameworks = aq.from(standardsFrameworksData)
    .filter(d => d.academicSubject === 'Mathematics')
    .select('caseIdentifierUUID', 'name', 'jurisdiction', 'academicSubject');
  console.log(`✅ Retrieved ${mathFrameworks.numRows()} state standard frameworks for math (dataframe):`);
  console.log('Sample of first 5 frameworks:');
  console.log(mathFrameworks.slice(0,5).objects());
  // Then get just the California math framework
  const californiaFrameworkTable = aq.from(standardsFrameworksData)
    .filter(d => d.jurisdiction === 'California' && d.academicSubject === 'Mathematics')
    .select('caseIdentifierUUID', 'name', 'jurisdiction', 'academicSubject');
  // object() returns the first row of the table as a plain object
  const californiaFramework = californiaFrameworkTable.object();
  console.log(`✅ Retrieved ${californiaFramework ? 1 : 0} California math standards framework:`);
  if (californiaFramework) {
    console.log(californiaFramework);
  }
  return { mathFrameworks, californiaFramework };
}
The returned data for California math standards framework should look like this:
{
  "caseIdentifierUUID": "c6487102-d7cb-11e8-824f-0242ac160002",
  "name": "California Common Core State Standards - Mathematics",
  "jurisdiction": "California",
  "academicSubject": "Mathematics"
}
Retrieving standards groupings
Let’s now get all the standards groupings within the California math standards framework. The code below selects that framework’s items by filtering on jurisdiction and academicSubject; to keep the example specific, it also restricts results to middle school grade levels. Standards groupings are useful when you want a particular set of standards, which states tend to organize under names like clusters, domains, etc.
function getMiddleSchoolStandardsGroupings(aq, standardsFrameworkItemsData) {
  const groupings = aq.from(standardsFrameworkItemsData)
    .filter(aq.escape(item => item.jurisdiction === 'California' &&
      item.academicSubject === 'Mathematics' &&
      item.normalizedStatementType === 'Standard Grouping' &&
      (JSON.parse(item.gradeLevel || '[]')).some(level => MIDDLE_SCHOOL_GRADES.includes(level))))
    .select('caseIdentifierUUID', 'statementCode', 'description', 'normalizedStatementType', 'statementType', 'gradeLevel');
  console.log(`✅ Retrieved ${groupings.numRows()} standard groupings for middle school math in California (dataframe):`);
  console.log('Sample of first 5 standard groupings:');
  console.log(groupings.slice(0,5).objects());
  return groupings;
}
Some StandardsFrameworkItems won’t have a statementCode, depending on how a given state has published its standards framework.
The returned data for California middle school math standard clusters and domains should look like this:
[
  {
    "caseIdentifierUUID": "5ebeb890-d7cc-11e8-824f-0242ac160002",
    "statementCode": "6.RP.A",
    "description": "Understand ratio concepts and use ratio reasoning to solve problems.",
    "normalizedStatementType": "Standard Grouping",
    "statementType": "Cluster",
    "gradeLevel": ["6"]
  },
  //...
]
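The query code calls JSON.parse(item.gradeLevel || '[]'), which implies that gradeLevel arrives from the CSV as a JSON-encoded array string. JSON.parse throws on a malformed value, so a single bad row would abort the whole filter. A more defensive variant might look like the sketch below; parseGradeLevels and isMiddleSchool are hypothetical helpers, and treating malformed values as "no grade levels" is an assumption you may want to change.

```javascript
// Hypothetical helper: parse the gradeLevel column defensively.
// Assumes the CSV stores grade levels as a JSON-encoded array string,
// as implied by the JSON.parse call in the query functions.
function parseGradeLevels(raw) {
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.map(String) : [];
  } catch (err) {
    return []; // treat malformed values as "no grade levels"
  }
}

const MIDDLE_SCHOOL_GRADES = ['6', '7', '8'];

// True when at least one of the item's grade levels is a middle school grade.
function isMiddleSchool(item) {
  return parseGradeLevels(item.gradeLevel)
    .some(level => MIDDLE_SCHOOL_GRADES.includes(level));
}

console.log(isMiddleSchool({ gradeLevel: '["6"]' }));      // prints true
console.log(isMiddleSchool({ gradeLevel: '["9","10"]' })); // prints false
console.log(isMiddleSchool({ gradeLevel: '' }));           // prints false
console.log(isMiddleSchool({ gradeLevel: 'not json' }));   // prints false
```

A predicate like isMiddleSchool could replace the inline JSON.parse expression inside the aq.escape callbacks, which also removes the duplication across the three filters in this tutorial.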
Retrieving all standards from a standard cluster
Lastly, let’s get all of the individual standards for middle school math in California. If we wanted to, we could also filter on a specific Standard Grouping to narrow the results further.
function getMiddleSchoolStandards(aq, standardsFrameworkItemsData) {
  const standards = aq.from(standardsFrameworkItemsData)
    .filter(aq.escape(item => item.jurisdiction === 'California' &&
      item.academicSubject === 'Mathematics' &&
      item.normalizedStatementType === 'Standard' &&
      (JSON.parse(item.gradeLevel || '[]')).some(level => MIDDLE_SCHOOL_GRADES.includes(level))))
    .select('caseIdentifierUUID', 'statementCode', 'description', 'normalizedStatementType', 'gradeLevel');
  console.log(`✅ Retrieved ${standards.numRows()} standards for California middle school mathematics (dataframe):`);
  console.log('Sample of first 5 standards:');
  console.log(standards.slice(0, 5).objects());
  return standards;
}
The returned data for California middle school math standards should look like this:
[
  {
    "caseIdentifierUUID": "5ec25ed1-d7cc-11e8-824f-0242ac160002",
    "statementCode": "8.EE.B.5",
    "description": "Graph proportional relationships, interpreting the unit rate...",
    "normalizedStatementType": "Standard",
    "gradeLevel": ["8"]
  },
  // ...
]
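To narrow the results to a single Standard Grouping, one rough approach is to match on statementCode. The sketch below assumes that a standard's code begins with its grouping's code followed by a dot (as with the cluster 6.RP.A and the standard 6.RP.A.1 in the sample output). That is a naming heuristic, not something the data guarantees: items without a statementCode never match, and states that code their standards differently would need another approach. standardsInGrouping is a hypothetical helper and the rows are illustrative.

```javascript
// Hypothetical helper: keep standards whose statementCode sits under a
// grouping code. Assumes child codes look like "<groupingCode>.<suffix>".
function standardsInGrouping(standards, groupingCode) {
  const prefix = `${groupingCode}.`;
  return standards.filter(item => (item.statementCode || '').startsWith(prefix));
}

// Illustrative rows only; in the tutorial these would come from
// getMiddleSchoolStandards(...).objects().
const sampleStandards = [
  { statementCode: '6.RP.A.1' },
  { statementCode: '6.RP.A.2' },
  { statementCode: '8.EE.B.5' },
  { statementCode: '' }
];

const inCluster = standardsInGrouping(sampleStandards, '6.RP.A');
console.log(inCluster.map(s => s.statementCode)); // prints [ '6.RP.A.1', '6.RP.A.2' ]
```

Appending the dot to the prefix keeps a grouping such as 6.RP.A from also matching a hypothetical sibling code like 6.RP.AB.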
Run all standards query functions
We have three separate functions that query for different levels of standards data in Knowledge Graph. Let’s create a wrapper function that calls all three at once.
function queryForStandardsData(aq, standardsFrameworksData, standardsFrameworkItemsData) {
  const { mathFrameworks, californiaFramework } = getMathStandardsFrameworks(aq, standardsFrameworksData);
  const groupings = getMiddleSchoolStandardsGroupings(aq, standardsFrameworkItemsData);
  const standards = getMiddleSchoolStandards(aq, standardsFrameworkItemsData);
  return { californiaFramework, groupings, standards };
}
Step 3: standards vector search
Generate embeddings of standards
One of the most common steps in using Knowledge Graph within LLM pipelines is generating embeddings of the datasets for use in RAG and general vector search applications.
For the purposes of this tutorial we’re going to generate embeddings only for middle school math standards in California, but you can use this pattern to scale up to all standards if you’d like. This will allow us to explore how vector search works for skills, concepts, topics, etc. in middle school math, rather than only being able to perform text matching.
The code below shows vector embeddings being saved as a local JSON file. This is sufficient for the tutorial, but you might also want to store embeddings in a vector database such as Pinecone or Chroma.
function embedStandardData(aq, standardsFrameworkItemsData) {
  // Generate embeddings for California middle school mathematics standards
  const embeddingStandards = aq.from(standardsFrameworkItemsData)
    .filter(aq.escape(item => item.jurisdiction === 'California' &&
      item.academicSubject === 'Mathematics' &&
      item.normalizedStatementType === 'Standard' &&
      (JSON.parse(item.gradeLevel || '[]')).some(level => MIDDLE_SCHOOL_GRADES.includes(level)) &&
      !!item.description))
    .objects();
  /* Generate and save embeddings for each standard
   * This creates vector representations of standard descriptions for semantic search
   */
  return async function generateEmbeddings() {
    const results = [];
    console.log(`🔄 Generating embeddings for ${embeddingStandards.length} standards...`);
    // Initialize embedder if not already done
    if (!embedder) {
      console.log('📥 Loading embedding model (first time only)...');
      if (!pipeline) {
        const { pipeline: pipelineImport } = await import('@xenova/transformers');
        pipeline = pipelineImport;
      }
      embedder = await pipeline('feature-extraction', EMBEDDING_MODEL);
      console.log('✅ Embedding model loaded');
    }
    for (const standard of embeddingStandards) {
      const code = standard.statementCode || '(no code)';
      try {
        const output = await embedder(standard.description, { pooling: 'mean', normalize: true });
        const embedding = Array.from(output.data);
        results.push({
          caseIdentifierUUID: standard.caseIdentifierUUID,
          statementCode: standard.statementCode,
          embedding: embedding
        });
        console.log(`✅ ${code}`);
      } catch (err) {
        console.error(`❌ ${code}: ${err.message}`);
        throw err;
      }
    }
    // Save embeddings to file
    fs.writeFileSync(EMBEDDING_FILE_PATH, JSON.stringify(results, null, 2));
    console.log(`✅ Saved ${results.length} embeddings to ${EMBEDDING_FILE_PATH}`);
  };
}
If everything worked well, you should see this:
✅ Embedding model loaded
✅ MP1
✅ MP2
✅ 6.RP.A.1
✅ 6.RP.A.2
...
Find a standard from a description
Now that we have embeddings generated and stored, we can perform a basic vector search to find data in Knowledge Graph from a description of a standard or skill. Since the embeddings are stored in a local JSON file, we’ll compute cosine similarity ourselves; if you’re using a vector database, this step is typically built in and handled automatically.
function vectorSearchStandardData(standardsFrameworkItemsData) {
  // Perform vector search using cosine similarity
  return async function vectorSearch(query, topK = 5) {
    // Initialize embedder if not already done
    if (!embedder) {
      console.log('📥 Loading embedding model...');
      if (!pipeline) {
        const { pipeline: pipelineImport } = await import('@xenova/transformers');
        pipeline = pipelineImport;
      }
      embedder = await pipeline('feature-extraction', EMBEDDING_MODEL);
      console.log('✅ Embedding model loaded');
    }
    let queryEmbedding;
    try {
      const output = await embedder(query, { pooling: 'mean', normalize: true });
      queryEmbedding = Array.from(output.data);
    } catch (error) {
      console.error(`❌ Error generating embedding for query "${query}": ${error.message}`);
      return;
    }
    let storedEmbeddings;
    try {
      storedEmbeddings = JSON.parse(fs.readFileSync(EMBEDDING_FILE_PATH, 'utf8'));
    } catch (error) {
      console.error(`❌ Error loading embeddings from ${EMBEDDING_FILE_PATH}: ${error.message}`);
      console.error('💡 Make sure to run the embedding generation step first (Step 3)');
      return;
    }
    const topResults = storedEmbeddings
      .map(item => ({
        caseIdentifierUUID: item.caseIdentifierUUID,
        score: cosineSimilarity(queryEmbedding, item.embedding)
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
    console.log(`\nTop ${topK} results for "${query}":\n`);
    topResults.forEach((result, i) => {
      const frameworkItem = findFrameworkItem(result.caseIdentifierUUID, standardsFrameworkItemsData);
      const statementCode = frameworkItem?.statementCode || '(no code)';
      const description = frameworkItem?.description || '(no statement)';
      topResults[i].statementCode = statementCode;
      topResults[i].description = description;
    });
    console.log(topResults);
  };
}
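To make the scoring step less of a black box, here is a dependency-free sketch of what cosine similarity and the top-K ranking compute. It is written out for illustration and is not the implementation inside fast-cosine-similarity; cosine and rankBySimilarity are hypothetical names, and the two-dimensional vectors are toy data rather than real embeddings.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored item against the query, then keep the topK best.
function rankBySimilarity(queryVector, items, topK = 5) {
  return items
    .map(item => ({ id: item.id, score: cosine(queryVector, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy data: 'a' points the same way as the query, 'b' is orthogonal,
// and 'c' sits at 45 degrees between them.
const toyItems = [
  { id: 'a', embedding: [1, 0] },
  { id: 'b', embedding: [0, 1] },
  { id: 'c', embedding: [1, 1] }
];

const ranked = rankBySimilarity([1, 0], toyItems, 2);
console.log(ranked.map(r => r.id)); // prints [ 'a', 'c' ]
```

Because the tutorial requests embeddings with normalize: true, the stored vectors should already be unit length, in which case cosine similarity reduces to a plain dot product.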
Step 4: pulling it all together
Now let’s create a final function to run everything, as well as pass a search string to the vector search function.
async function main() {
  const aq = await import('arquero');
  console.log('\n=== WORKING WITH STATE STANDARDS TUTORIAL ===\n');
  console.log('🔄 Step 1: Loading data...');
  const { standardsFrameworksData, standardsFrameworkItemsData } = loadData(aq);
  console.log('\n🔄 Step 2: Querying for standards data...');
  queryForStandardsData(aq, standardsFrameworksData, standardsFrameworkItemsData);
  console.log('\n🔄 Step 3: Embedding standard data...');
  const generateEmbeddings = embedStandardData(aq, standardsFrameworkItemsData);
  if (GENERATE_EMBEDDINGS) {
    await generateEmbeddings();
  } else {
    console.log('🚫 Embedding generation disabled');
  }
  console.log('\n🔄 Step 4: Vector searching standard data...');
  const vectorSearch = vectorSearchStandardData(standardsFrameworkItemsData);
  await vectorSearch('linear equations');
}
main().catch(console.error);
If everything was successful, you should see the results of the vector search:
[
  {
    "caseIdentifierUUID": "5ec3386d-d7cc-11e8-824f-0242ac160002",
    "score": 0.8438616923542105,
    "statementCode": "8.EE.C.7",
    "description": "Solve linear equations in one variable."
  },
  {
    "caseIdentifierUUID": "5ec387e8-d7cc-11e8-824f-0242ac160002",
    "score": 0.6576467155206818,
    "statementCode": "8.EE.C.8",
    "description": "Analyze and solve pairs of simultaneous linear equations."
  },
  // ...
]
Conclusion
In this walkthrough you loaded state standards, pulled the math frameworks, filtered on California middle school, and generated embeddings so you could run a simple vector search from a description. That gives you a working path from raw standards data to something you can query semantically. From here, you can try scaling the embeddings to additional grades or subjects, swapping in another state, or moving the vectors into a dedicated store (e.g., Pinecone/Chroma) and wiring this up to your RAG pipeline.