
How to Automate Exam Grading with RAG and CLIP

2025/12/16 03:00

Grading is the bottleneck of education. Teachers spend hundreds of hours manually reviewing descriptive answers, checking diagrams, and deciphering handwriting. It’s subjective, exhausting, and prone to inconsistency.

While Multiple Choice Questions (MCQs) are easy to automate, descriptive and diagrammatic answers have always been the "final boss" for EdTech.

Most existing solutions rely on simple keyword matching (TF-IDF) or basic BERT models, which fail to understand context or evaluate visual diagrams. In this guide, we are going to build a system that solves this using Retrieval-Augmented Generation (RAG) and Multimodal AI.

We will architect a solution that:

  1. Ingests textbooks to create a "Ground Truth" knowledge base.
  2. Uses Local LLMs (Mistral via Ollama) to generate model answers.
  3. Uses Semantic Search to grade text.
  4. Uses CLIP to grade student diagrams.

Let’s build.

The Architecture: A Dual-Pipeline System

We are building a pipeline that handles two distinct data types: Text and Images. We cannot rely on the LLM's internal knowledge alone (hallucination risk), so we ground it in a Vector Database created from the course textbooks.

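At a high level, data flows through two pipelines. Text: textbook PDF → chunks → FAISS index → retrieved context → Mistral model answer → semantic similarity with the student's answer. Images: student diagram → CLIP embedding → similarity with the textbook diagram. The two scores are then combined into a final grade.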

The Tech Stack

  • LLM Runtime: Ollama (running Mistral 7B)
  • Orchestration: LangChain
  • Vector DB: FAISS (CPU optimized)
  • Embeddings (Text): thenlper/gte-base or all-MiniLM-L6-v2
  • Embeddings (Image): OpenAI CLIP (ViT-B-32)
  • OCR: PaddleOCR (for extracting labels from diagrams)

Phase 1: The Knowledge Base (Ingestion)

First, we need to turn a static PDF textbook into a searchable knowledge base. Text alone is not enough: we also need to extract diagrams and their captions so we can grade visual questions later.

The Extraction Logic

We use pdfplumber for text and PaddleOCR to find diagram labels.

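```python
import io

import fitz  # PyMuPDF, used to pull the binary data of embedded images
import numpy as np
import pdfplumber
from paddleocr import PaddleOCR  # PaddleOCR 2.x API (ocr.ocr with cls=True)
from PIL import Image


def extract_images_from_page(doc, page_index):
    """Return the images embedded in one page as BGR numpy arrays."""
    images = []
    for img_info in doc[page_index].get_images(full=True):
        xref = img_info[0]
        data = doc.extract_image(xref)["image"]
        rgb = np.array(Image.open(io.BytesIO(data)).convert("RGB"))
        images.append(rgb[:, :, ::-1].copy())  # PaddleOCR expects OpenCV-style BGR
    return images


def ingest_textbook(pdf_path):
    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    documents = []
    fitz_doc = fitz.open(pdf_path)
    with pdfplumber.open(pdf_path) as pdf:
        for page_index, page in enumerate(pdf.pages):
            # 1. Extract text (extract_text() returns None on image-only pages)
            text = page.extract_text() or ""
            # 2. Extract embedded images and 3. OCR them to get captions/labels
            captions = []
            for img in extract_images_from_page(fitz_doc, page_index):
                result = ocr.ocr(img, cls=True)
                if result and result[0]:  # result[0] is None when no text is found
                    captions.append(" ".join(line[1][0] for line in result[0]))
            # Associate diagram labels with the page's text context; pages
            # without images are kept too, so no textbook text is lost
            content = text + "".join(f"\n[DIAGRAM: {c}]" for c in captions)
            documents.append({"content": content, "type": "mixed" if captions else "text"})
    fitz_doc.close()
    return documents
```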

Once extracted, we chunk the text (500 characters with overlap) and store it in FAISS.
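The overlap size isn't specified, so the sketch below assumes 50 characters. It shows what fixed-size chunking with overlap does: each 500-character window starts 450 characters after the previous one, so text near a boundary appears in two neighbouring chunks and is less likely to lose its context. `chunk_text` is a hypothetical helper for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters; the last window may be shorter.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

In a LangChain pipeline you would more likely use `RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)` from the `langchain_text_splitters` package, then `FAISS.from_texts(chunks, embeddings)` and `vectorstore.save_local("textbook_index")` to produce the index loaded in Phase 2. That splitter also prefers to cut at paragraph and sentence boundaries rather than mid-word.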

Phase 2: Generating the "Perfect" Answer (RAG)

To grade a student, we first need to know what the correct answer looks like. Rather than relying on a teacher's answer key alone, we generate a model answer from the textbook itself so that it stays aligned with the curriculum.

We use LangChain to retrieve the relevant context and Mistral to synthesize the answer.

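```python
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

# 1. Set up embeddings and load the FAISS index built in Phase 1
embeddings = HuggingFaceEmbeddings(model_name="thenlper/gte-base")
vectorstore = FAISS.load_local(
    "textbook_index",
    embeddings,
    allow_dangerous_deserialization=True,  # required by recent versions; only load indexes you built
)

# 2. Set up the local LLM served by Ollama
llm = Ollama(model="mistral")

# 3. Prompt tuned for academic precision; {context} receives the retrieved chunks
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are a science teacher. Answer the following question based ONLY "
        "on the context provided.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer within 50-80 words."
    ),
)

# 4. Create the RAG chain ("stuff" = concatenate the top-k chunks into one prompt)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)


def generate_model_answer(question):
    # Only the bare question is embedded for retrieval, not the instructions
    result = qa_chain.invoke({"query": question})
    return result["result"]
```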
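The `"stuff"` chain type is the simplest RAG strategy: retrieve the top-k chunks and concatenate ("stuff") them into a single prompt. The dependency-free sketch below mimics that step with toy embedding vectors. `cosine` and `stuff_prompt` are hypothetical helpers, and a brute-force cosine ranking stands in for FAISS's index search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def stuff_prompt(question, question_vec, chunks, chunk_vecs, k=3):
    """Rank chunks by similarity to the question and stuff the top k into one prompt."""
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(question_vec, chunk_vecs[i]),
        reverse=True,
    )
    context = "\n\n".join(chunks[i] for i in ranked[:k])
    return (
        "You are a science teacher. Answer the following question based ONLY "
        "on the context provided.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer within 50-80 words."
    )
```

Everything retrieved lands in one prompt, which is why `k` is kept small: with three 500-character chunks the context stays well within a 7B model's window.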

Phase 3: Grading the Text (Semantic Similarity)

Now we compare the Student's Answer against the Model's Answer.

We avoid exact keyword matching because students phrase things differently. Instead, we use Cosine Similarity on sentence embeddings.
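For reference, with $\mathbf{s}$ the student-answer embedding and $\mathbf{m}$ the model-answer embedding (both $d$-dimensional; $d = 384$ for all-MiniLM-L6-v2), the score is:

$$
\cos(\mathbf{s}, \mathbf{m}) = \frac{\mathbf{s} \cdot \mathbf{m}}{\lVert \mathbf{s} \rVert \, \lVert \mathbf{m} \rVert} = \frac{\sum_{i=1}^{d} s_i m_i}{\sqrt{\sum_{i=1}^{d} s_i^2}\,\sqrt{\sum_{i=1}^{d} m_i^2}}
$$

Only the direction of the two vectors matters, not their magnitude, and the value lies in $[-1, 1]$.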

```python
from sentence_transformers import SentenceTransformer, util

# Named text_model so it doesn't clash with the CLIP model loaded in Phase 4
text_model = SentenceTransformer("all-MiniLM-L6-v2")


def grade_text_response(student_ans, model_ans):
    # Encode both answers into sentence embeddings
    embedding_1 = text_model.encode(student_ans, convert_to_tensor=True)
    embedding_2 = text_model.encode(model_ans, convert_to_tensor=True)

    # Cosine similarity lies in [-1, 1]; negative values are uncommon
    # for natural-language answers
    score = util.cos_sim(embedding_1, embedding_2)
    return score.item()
```

Note: In our experiments, a raw similarity score of 0.85 or higher usually corresponded to full marks. We therefore rescale the scores: anything above 0.85 counts as 100%, and anything below 0.4 counts as 0%.
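Only the two cut-off points are given; the sketch below assumes a straight-line interpolation between them, since the curve actually used for scores between 0.4 and 0.85 isn't stated. `scale_similarity` is a hypothetical helper.

```python
def scale_similarity(score: float, low: float = 0.4, high: float = 0.85) -> float:
    """Map a raw cosine score to a 0-1 grade fraction.

    Scores at or above `high` earn full credit and scores at or below `low` earn none.
    Scores in between are interpolated linearly (an assumption, not the article's rule).
    """
    if score >= high:
        return 1.0
    if score <= low:
        return 0.0
    return (score - low) / (high - low)
```

For example, a raw score of 0.625, halfway between the cut-offs, maps to 0.5 (50%) under this assumption.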

Phase 4: Grading the Diagrams (CLIP)

This is the hardest part. How do you grade a hand-drawn diagram of a "Neuron" or "Flower"?

We use CLIP (Contrastive Language-Image Pre-Training). CLIP is trained to embed images and text in a shared vector space, so images with similar content end up with similar embeddings. We compare the embedding of the student's drawing (or uploaded image) against the embedding of the "Gold Standard" diagram from the textbook.

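```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load CLIP (ViT-B/32) once; named clip_model to avoid clashing with the text model
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model.eval()


def grade_diagram(student_img_path, textbook_img_path):
    # Convert to RGB so grayscale scans and PNGs with alpha channels are accepted
    image1 = Image.open(student_img_path).convert("RGB")
    image2 = Image.open(textbook_img_path).convert("RGB")

    # Resize, crop, and normalize both images into one batch
    inputs = processor(images=[image1, image2], return_tensors="pt")

    # Get image embeddings (no gradients needed for inference)
    with torch.no_grad():
        features = clip_model.get_image_features(**inputs)

    # L2-normalize so the dot product equals cosine similarity
    features = features / features.norm(p=2, dim=-1, keepdim=True)

    # Cosine similarity between the student's and the textbook's diagram
    similarity = (features[0] @ features[1]).item()
    return similarity
```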

Phase 5: The Final Grading Algorithm

Finally, we aggregate the scores based on the question type. If a question requires both text and a diagram, we apply weights.

The Logic:

  1. Length Check: If the student's answer is too short (<30% of expected length), apply a penalty.
  2. Weighted Scoring: Final Score = (TextScore * 0.7) + (DiagramScore * 0.3)
  3. Thresholding:

| Similarity Score | Grade Percentage |
|----|----|
| > 0.85 | 100% (Full Marks) |
| 0.6 - 0.85 | 50% (Half Marks) |
| 0.25 - 0.6 | 25% |
| < 0.25 | 0% |
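As a worked example of how the 70/30 weighting interacts with the thresholds, take a text similarity of 0.90 and a diagram similarity of 0.70 (illustrative numbers, not results from our tests):

$$
0.7 \times 0.90 + 0.3 \times 0.70 = 0.63 + 0.21 = 0.84
$$

0.84 falls in the 0.6 - 0.85 band, so the answer earns half marks even though the text score alone would have cleared the full-marks threshold.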

```python
def calculate_final_grade(text_sim, img_sim, max_marks, has_diagram=False):
    """Combine similarity scores and map them to marks using the threshold table."""
    if has_diagram:
        # 70% weight to text, 30% to diagram
        combined_score = (text_sim * 0.7) + (img_sim * 0.3)
    else:
        combined_score = text_sim

    # Apply thresholds (a score of exactly 0.85 falls into the half-marks band)
    if combined_score > 0.85:
        marks = max_marks
    elif combined_score > 0.6:
        marks = max_marks * 0.5
    elif combined_score > 0.25:
        marks = max_marks * 0.25
    else:
        marks = 0
    return round(marks, 1)
```
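The length check from step 1 of the logic isn't implemented in `calculate_final_grade`. A minimal sketch, assuming length is counted in words against the model answer and using a hypothetical penalty factor of 0.5 (only the 30% cut-off is specified); `apply_length_penalty` is a hypothetical helper:

```python
def apply_length_penalty(combined_score: float, student_ans: str, expected_ans: str,
                         min_ratio: float = 0.3, penalty: float = 0.5) -> float:
    """Scale the score down when the student's answer is much shorter than expected.

    Length is measured in words. `min_ratio` is the 30% cut-off; `penalty`
    is a placeholder value, not a figure from the article.
    """
    expected_len = len(expected_ans.split())
    if expected_len == 0:
        return combined_score  # nothing to compare against
    ratio = len(student_ans.split()) / expected_len
    return combined_score * penalty if ratio < min_ratio else combined_score
```

It would be applied to `combined_score` before the threshold checks. Whether to count words or characters is a design choice; words are less sensitive to spelling and spacing differences in handwritten answers.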

Results and Reality Check

We tested this on CBSE Class 10 Science papers.

  • Time Saved: Manual grading took ~20 minutes per paper. The AI took 5-6 minutes.
  • Accuracy: The system achieved high alignment with human graders on descriptive answers.
  • Challenge: CLIP struggles if the student's diagram is rotated or poorly lit. The text grader can sometimes be too lenient if the student uses the right keywords but in the wrong order.

Conclusion

We have moved beyond simple multiple-choice scanners. By combining RAG for factual grounding and CLIP for visual understanding, we can build automated grading systems that are fair, consistent, and tireless.

This architecture isn't just for schools; it also applies to technical interviews, certification exams, and automated compliance checking.

Ready to build? Start by installing Ollama and getting your vector store running. The future of education is automated.
