When you upload a file to RelayHub, it goes through an automated pipeline that extracts text, generates vector embeddings, and builds knowledge graph connections. This page explains how to upload files and what happens behind the scenes.

How to Upload Files

There are three ways to get files into RelayHub:
1. Upload during a chat

Click the paperclip icon in the chat input bar, select one or more files, and send your message. The files are uploaded, processed, and immediately available for the AI to reference in that conversation.

2. Upload through File Hub

Open File Hub from the sidebar and click the Upload button. Drag and drop files or browse your filesystem. Files uploaded here are added to your personal library and available across all your conversations.

3. Upload to a workspace

Inside a workspace, navigate to the files section and upload there. These files are scoped to the workspace and accessible to all workspace members.
You can upload multiple files at once. Each file is processed independently, so a slow-to-process PDF will not block a quick CSV from becoming available.

Supported Formats

Format     | Extensions   | What Gets Extracted
PDF        | .pdf         | Full text, tables (via PdfPlumber), images sent to vision AI
Word       | .docx        | Text, headings, tables, embedded images
Excel      | .xlsx, .csv  | All sheets parsed into queryable tabular data
PowerPoint | .pptx        | Slide text, speaker notes, table content
Images     | .png, .jpg   | OCR text extraction, chart/diagram interpretation via vision AI
JSON       | .json        | Parsed and indexed as structured data
Plain text | .txt, .md    | Direct text indexing
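The format table above implies a routing step: each extension is mapped to a format-specific parser. As a rough sketch only (the parser names and `pick_parser` helper are invented for illustration, not RelayHub's actual internals), that routing might look like:

```python
from pathlib import Path

# Hypothetical extension-to-parser mapping mirroring the table above.
PARSERS = {
    ".pdf": "pdf_parser",          # PyMuPDF text + PdfPlumber tables
    ".docx": "docx_parser",
    ".xlsx": "spreadsheet_parser",
    ".csv": "spreadsheet_parser",
    ".pptx": "pptx_parser",
    ".png": "vision_parser",       # OCR via vision AI
    ".jpg": "vision_parser",
    ".json": "json_parser",
    ".txt": "text_parser",
    ".md": "text_parser",
}

def pick_parser(filename: str) -> str:
    """Return the parser name for a file, or raise for unsupported types."""
    ext = Path(filename).suffix.lower()
    try:
        return PARSERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file type: {ext}")

print(pick_parser("report.PDF"))  # pdf_parser (extension match is case-insensitive)
print(pick_parser("data.csv"))    # spreadsheet_parser
```

Unsupported extensions fail fast rather than silently producing empty results, which is why an unrecognized file type is rejected at upload time.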

The Processing Pipeline

Every uploaded file passes through three stages automatically:
1. Text extraction

RelayHub reads the file contents using format-specific parsers. PDFs are processed with PyMuPDF for text and PdfPlumber for table extraction. Word and PowerPoint files are parsed for all text elements, including headers, footers, and notes. Spreadsheets are converted into structured tabular representations. Images are sent through vision AI for OCR and content recognition.
2. Embedding generation

The extracted text is split into semantically meaningful chunks and converted into vector embeddings using a language model. These embeddings are stored in pgvector (PostgreSQL with vector extensions), enabling fast semantic search. When you ask the AI a question, it searches these embeddings to find the most relevant passages from your documents.
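The embedding step above can be sketched in miniature. This is a simplification under stated assumptions: RelayHub's real chunker is semantic rather than a fixed word window, and the similarity math runs inside pgvector, which supports cosine distance among other metrics. The helpers below are illustrative, not RelayHub's API:

```python
import math

def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping fixed-size word windows.

    A stand-in for semantic chunking: overlap keeps context that
    straddles a chunk boundary retrievable from either side.
    """
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap
    return chunks

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

At query time, the question is embedded with the same model and the chunks whose vectors score highest against it are handed to the AI as context.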
3. Knowledge graph extraction

An AI model reads the extracted text and identifies entities (people, companies, products, concepts, dates) and the relationships between them. These are stored as nodes and edges in a knowledge graph that you can explore visually. This step runs as a background worker task and may complete a few seconds after the embeddings are ready.
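The nodes-and-edges shape of the knowledge graph step can be illustrated with a minimal sketch. The class and the sample entities below are invented for illustration; RelayHub's actual graph schema is not documented here:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """Toy nodes-and-edges store: entities are nodes, relations are edges."""
    nodes: set = field(default_factory=set)    # entity names
    edges: list = field(default_factory=list)  # (source, relation, target) triples

    def add_relation(self, source: str, relation: str, target: str) -> None:
        # Adding an edge implicitly registers both endpoints as nodes.
        self.nodes.add(source)
        self.nodes.add(target)
        self.edges.append((source, relation, target))

    def neighbors(self, entity: str) -> list:
        """Everything this entity points at, for visual exploration."""
        return [(rel, tgt) for src, rel, tgt in self.edges if src == entity]

g = KnowledgeGraph()
g.add_relation("Acme Corp", "acquired", "Widget Inc")
g.add_relation("Acme Corp", "founded_in", "1999")
```

Because this step runs as a background worker, a file can already be searchable via embeddings while its graph edges are still being written.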

Processing Times

Most files complete processing in under 30 seconds. Factors that affect processing time:
  • File size — A 5-page PDF processes in seconds; a 200-page document takes longer
  • Tables — PDFs with complex tables require additional extraction passes
  • Images — Files containing images trigger vision AI analysis, adding a few seconds per image
  • Spreadsheets — Large datasets (10,000+ rows) take longer to chunk and embed

Reprocessing Files

If a file shows an Error status or if you suspect the initial processing missed content, you can reprocess it:
  1. Open the file in File Hub
  2. Click Reprocess in the file detail panel
  3. The file goes back through the full pipeline — extraction, embeddings, and knowledge graph
Reprocessing replaces all existing embeddings and knowledge graph data for that file. This is useful after platform updates that improve extraction quality.
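The replace-then-rebuild behavior described above is what makes reprocessing safe to repeat. As a hypothetical sketch only (the in-memory dicts and `reprocess` function stand in for RelayHub's actual storage), the key property is that stale data is dropped before the pipeline re-runs:

```python
# Toy stand-ins for the embedding and graph stores, keyed by file.
embeddings_store: dict[str, list[str]] = {"report.pdf": ["old chunk A", "old chunk B"]}
graph_store: dict[str, list[tuple]] = {"report.pdf": [("Acme", "mentions", "Q3")]}

def reprocess(file_id: str, new_chunks: list[str], new_edges: list[tuple]) -> None:
    """Drop all stale data for a file, then store the freshly extracted results."""
    embeddings_store.pop(file_id, None)  # no stale chunks survive
    graph_store.pop(file_id, None)       # no stale edges survive
    embeddings_store[file_id] = new_chunks
    graph_store[file_id] = new_edges

reprocess("report.pdf", ["fresh chunk"], [("Acme", "reports", "Q4")])
```

Because the old rows are removed first, running reprocessing twice leaves the same state as running it once.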

File Size Limits

The maximum upload size depends on your instance configuration. The default limit is 50 MB per file. Contact your administrator if you need to upload larger files.
Extremely large spreadsheets (100,000+ rows) are supported but may take several minutes to process. Consider splitting very large datasets if you only need to analyze a subset of the data.
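If you script uploads, a client-side pre-check against the size limit avoids a failed round trip. The helper below is illustrative, not part of RelayHub, and assumes the 50 MB default described above; substitute your instance's configured limit:

```python
MAX_UPLOAD_MB = 50  # default per-file limit; your instance may be configured differently

def within_upload_limit(size_bytes: int, limit_mb: int = MAX_UPLOAD_MB) -> bool:
    """Pre-flight check: would a file of this size be accepted?"""
    return size_bytes <= limit_mb * 1024 * 1024

within_upload_limit(10 * 1024 * 1024)               # True under the 50 MB default
within_upload_limit(60 * 1024 * 1024, limit_mb=100)  # True on an instance raised to 100 MB
```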