RAG & AI Guide

How to Convert Lotus 1-2-3 Spreadsheets for RAG Pipelines and LLM Ingestion

A practical guide to turning legacy .123, .wk1, .wk3, and .wk4 archives into AI-ready tabular data — locally, securely, and at scale.

TL;DR

Convert Lotus 1-2-3 workbooks to CSV for value-only analytics and embeddings, and to Markdown when you want LLMs to keep table structure. Avoid PDF as an intermediate — it destroys the grid the model actually needs. Lotus Converter does this locally in bulk so regulated rows never leave your network.

The Problem: Legacy Spreadsheets Are Invisible to AI

If your finance, ops, or engineering team has been around for 20+ years, a non-trivial slice of your institutional knowledge is still in Lotus 1-2-3 files: chart-of-accounts ledgers, actuarial tables, plant cost models, rate-case workpapers, customer pricing books. These binary workbooks cannot be read by modern AI systems, vector databases, or embedding models. To an LLM, your historical numbers simply do not exist.

Building a RAG (Retrieval-Augmented Generation) system or private LLM that ignores your legacy spreadsheets means your AI is missing decades of quantitative context — the very numbers analysts ask follow-up questions about.

Step-by-Step: Lotus 1-2-3 to Vector Database

The workflow for making legacy Lotus workbooks AI-ready:

Step 1: Inventory Your Source Archives

Locate your .123, .wk1, .wk3, .wk4, .wks, .wb1, .wb2, and .wb3 files. They're typically scattered across departmental network drives, retired file servers, and backup media. Lotus Converter scans entire folder trees recursively and can even detect Lotus files that have lost their extension by inspecting header bytes.
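The inventory pass above can be sketched in a few lines of Python. This is a simplified stand-in, not Lotus Converter's actual detection logic: the header check below only tests that the file begins with a 0x0000 BOF record type word (two zero bytes), which Lotus 1-2-3 family files share, whereas real detection inspects the full BOF record.

```python
from pathlib import Path

# Extensions named in this guide. Header sniffing below is a simplified
# illustration -- production detection reads the full BOF record.
LOTUS_EXTS = {".123", ".wk1", ".wk3", ".wk4", ".wks", ".wb1", ".wb2", ".wb3"}

def looks_like_lotus(path: Path) -> bool:
    """Heuristic: Lotus workbooks open with a BOF record whose type word
    is 0x0000 (little-endian), so the first two bytes are 00 00."""
    try:
        head = path.read_bytes()[:4]
    except OSError:
        return False
    return len(head) == 4 and head[:2] == b"\x00\x00"

def inventory(root: str) -> list[Path]:
    """Recursively collect files matching Lotus extensions, plus
    extensionless files whose header bytes look like a Lotus BOF."""
    found = []
    for p in Path(root).rglob("*"):
        if not p.is_file():
            continue
        if p.suffix.lower() in LOTUS_EXTS or (not p.suffix and looks_like_lotus(p)):
            found.append(p)
    return sorted(found)
```

Running this over a departmental share gives you the candidate list before any conversion happens.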

Step 2: Batch-Convert to CSV and Markdown

Run Lotus Converter against the archive. Pick CSV when each sheet is one logical table and you want maximum token efficiency. Pick Markdown when the workbook has captions, totals, and notes that an LLM should read alongside the grid. Everything happens locally — no workbooks leave your machine.
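To make the CSV-versus-Markdown trade-off concrete: once a sheet is out as CSV, turning it into a Markdown table for LLM ingestion is trivial. A stdlib-only sketch (a real pipeline might use pandas instead):

```python
import csv
from io import StringIO

def csv_to_markdown(csv_text: str) -> str:
    """Render a converted CSV sheet as a Markdown table so the LLM
    keeps the header/row structure alongside any surrounding notes."""
    rows = list(csv.reader(StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

sample = "Year,Steel ($/ton)\n1998,310\n1999,295"
print(csv_to_markdown(sample))
```

CSV stays the more token-efficient form; the Markdown rendering is worth the extra tokens when the grid needs to sit next to captions and totals.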

Step 3: Chunk Per Sheet, Not Per File

A single .wk4 file usually contains multiple worksheets. Treat each sheet as its own document for chunking — that keeps row context coherent and prevents the "summary tab" from polluting the "detail tab" embeddings. Markdown headings and CSV file names give you natural breakpoints.
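A minimal sketch of per-sheet chunking, assuming the converter writes one CSV per sheet with a hypothetical `workbook__sheet.csv` naming scheme (your converter's actual naming may differ):

```python
from pathlib import Path

def chunk_per_sheet(export_dir: str) -> list[dict]:
    """One retrieval document per converted sheet, not per workbook,
    so the summary tab never bleeds into the detail tab's embeddings."""
    chunks = []
    for f in sorted(Path(export_dir).glob("*.csv")):
        # Assumed naming convention: plant_costs__Summary.csv
        workbook, _, sheet = f.stem.partition("__")
        chunks.append({
            "id": f.stem,
            "text": f.read_text(encoding="utf-8"),
            "metadata": {"workbook": workbook,
                         "sheet": sheet or "Sheet1",
                         "source": str(f)},
        })
    return chunks
```

Each dict becomes one document in the embedding step, with its sheet identity preserved as metadata.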

Step 4: Generate Embeddings

Run your chunks through an embedding model (OpenAI, Cohere, or local models such as Sentence-BERT and BGE). Clean tabular text in means higher-quality vectors out, which means better numeric retrieval. Tag each chunk with file, sheet, year, and business unit so retrieval can filter on metadata.
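The shape of this step, with a deliberately crude hashing stand-in where a real embedding model (OpenAI, Cohere, Sentence-BERT, BGE) would go -- the point is the metadata plumbing, not the vector math:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Placeholder embedding: hash tokens into a fixed-size bag-of-words
    vector and L2-normalize. Swap in a real model in production."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_chunks(chunks: list[dict]) -> list[dict]:
    """Attach a vector to each chunk while keeping its file/sheet/year/
    business-unit metadata intact for filtered retrieval later."""
    return [{**c, "vector": embed(c["text"])} for c in chunks]
```

Whatever model you substitute, keep the metadata dict riding alongside the vector; it is what lets retrieval answer "1998 only" questions cheaply.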

Step 5: Load Into Your Vector Store

Store embeddings in Pinecone, Weaviate, Chroma, pgvector, or any vector database. Your legacy quantitative knowledge is now queryable by your RAG pipeline alongside modern XLSX, DOCX, and PDF sources.
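What the store has to do is simple enough to show in miniature. This toy in-memory class is a stand-in for Pinecone, Weaviate, Chroma, or pgvector (each of which has its own client API): brute-force cosine search with an optional metadata filter.

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5) or 1.0
    return num / den

class MiniVectorStore:
    """Toy stand-in for a real vector database: upsert chunks,
    then query by vector with an optional exact-match metadata filter."""
    def __init__(self):
        self.items = []

    def upsert(self, item: dict):  # item: {"id", "vector", "text", "metadata"}
        self.items.append(item)

    def query(self, vector, top_k=3, where=None):
        pool = [i for i in self.items
                if not where
                or all(i["metadata"].get(k) == v for k, v in where.items())]
        return sorted(pool, key=lambda i: cosine(vector, i["vector"]),
                      reverse=True)[:top_k]
```

Real stores add persistence and approximate-nearest-neighbor indexes, but the upsert/filter/query contract is the same one your RAG pipeline will code against.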

Step 6: Query with RAG (and Tools)

When an analyst asks "what did our 1998 plant cost model assume for steel input prices?", your RAG system retrieves the relevant CSV/Markdown chunks. For numeric reasoning, pair retrieval with a sandboxed code-interpreter tool so the model can actually re-run the math instead of guessing.
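The tool-side half of that pairing is worth spelling out. Once retrieval surfaces the right CSV chunk, a code-interpreter tool re-runs the arithmetic over the actual rows instead of letting the model estimate from the text. A stdlib sketch (a sandboxed interpreter would typically reach for pandas or DuckDB):

```python
import csv
from io import StringIO

def recompute_mean(chunk_text: str, column: str) -> float:
    """Re-run a numeric aggregate over the retrieved CSV chunk so the
    answer comes from the data, not from the model's guess."""
    rows = list(csv.DictReader(StringIO(chunk_text)))
    values = [float(r[column]) for r in rows]
    return sum(values) / len(values)

chunk = "Year,Steel ($/ton)\n1998,310\n1999,295\n"
print(recompute_mean(chunk, "Steel ($/ton)"))  # 302.5
```

The retrieved chunk grounds the answer; the tool call makes the number exact.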

Format Comparison: Which Output is Best for AI?

Not all spreadsheet outputs are created equal for LLM ingestion. Here's how the common targets compare:

| Factor | CSV | Markdown | XLSX | PDF | Raw .wk4 / .123 |
| --- | --- | --- | --- | --- | --- |
| LLM Readability | Excellent | Excellent | Good (needs parser) | Fair | None |
| Token Efficiency | Highest | High | Low (XML overhead) | Low (extraction noise) | N/A |
| Structure Preservation | Rows and columns | Tables, headings, captions | Sheets, formulas, formats | Layout-dependent | Binary format |
| Embedding Quality | High | High | Medium (after parsing) | Medium (noisy) | N/A |
| Tool / Code-Interpreter | Native (pandas, DuckDB) | Convertible | Native (openpyxl) | Requires OCR | No tooling |
| Processing Complexity | Direct ingestion | Direct ingestion | XLSX parser | PDF parser / OCR | Needs Lotus library |
| Best For | RAG over numeric tables; agents with code tools | RAG over annotated workbooks with notes | Re-using the workbook in Excel | Human reading, archiving | Nothing (legacy only) |

What's the best format to feed legacy spreadsheets into an LLM?

CSV for raw tabular numerics where each sheet is a clean table and you want analysts' agents to query with pandas or DuckDB. Markdown for workbooks where comments, totals, and section headings carry meaning the model should keep. Avoid PDF as an intermediate — PDF extraction destroys the row/column grid that makes spreadsheets useful to AI.

Why Local Processing Matters for AI Pipelines

Most teams want to build private RAG systems specifically to keep regulated data — financial models, customer pricing, employee records — off third-party servers. Using a cloud-based converter to prepare those files for a private AI defeats the purpose. Lotus Converter processes everything on your machine, maintaining a complete chain of custody from legacy workbook to vector database.

This is especially critical for banks and credit unions (GLBA), healthcare and benefits admins (HIPAA), regulated utilities and public-sector records (rate cases and FOIA), and any enterprise with SOX or GDPR obligations.


Ready to make your legacy spreadsheets AI-ready?

Download the free trial and convert up to 25 workbooks. See how quickly Lotus 1-2-3 becomes clean, structured CSV and Markdown for your RAG pipeline.

Free trial: full app features, up to 25 files, on Windows 10 or 11.

The same free trial is available whether you install from the Microsoft Store (recommended) or the offline MSI (suited to air-gapped or scripted deployments; coming soon) - pick the option that fits your PC or IT policy.