FabriXWork User Guide

Extract & Analyze Data from Multiple PDFs

Extract data from multiple PDFs and create summary tables with basic analysis. No manual data entry, no copy-paste, no hours of tedious work.

Why This Matters

The problem: Manually extracting data from multiple PDFs is time-consuming and error-prone, often taking 2-3 hours with no guarantee of accuracy.

The FabriXWork way: Your agent extracts data from all PDFs and creates summary tables with basic analysis in minutes. Add more PDFs, re-extract instantly.

See It in Action: Invoice Data Extraction

This demo shows how 20 invoice PDFs become a summary table with analysis in minutes.

Note: Some scenes in this video have been accelerated to 10× speed to enhance the viewing experience. The prompt used in this video can be found in the Featured Example section below.

How It Works:

┌──────────────────────────────────────────────────────────┐
│  INPUT                          OUTPUT                   │
│  ┌──────────────┐               ┌─────────────────────┐  │
│  │ invoice-001. │               │  Summary Table      │  │
│  │   pdf        │               │  • All Invoices     │  │
│  │ invoice-002. │               │  • Totals           │  │
│  │   pdf        │    ──────►    │  • Averages         │  │
│  │     ...      │     Agent     │  • By Vendor        │  │
│  │ invoice-020. │   Extract     │  • Insights         │  │
│  │   pdf        │               │                     │  │
│  └──────────────┘               │  CSV / Excel        │  │
│                                 │  (Ready to use)     │  │
│                                 └─────────────────────┘  │
└──────────────────────────────────────────────────────────┘

The Pattern:

  1. You have multiple PDFs with similar structure (invoices, reports, forms)
  2. Agent extracts specified fields from all PDFs
  3. Agent creates summary table with basic analysis
  4. You review, export, and use for reporting/decision-making

Try It Out

Tip

Choose the scenario that best matches your needs, then adapt the prompt to fit your content and goals.

  1. Prepare your PDFs — Collect all PDFs in one folder (ensure they're text-based, not scanned)
  2. Choose an agent — All agents support data extraction (Oscar - Operations Analyst is recommended for data analysis, and Claire - Claims Specialist is recommended for claims-related tasks)
  3. Connect your folder — Select agent → Click "Browse" → Choose folder with PDFs
  4. Enter your prompt - Use the examples below as inspiration. Adapt them to your content and goals

Featured Example: Invoice Data Extraction

Scenario: You have 20 invoice PDFs from different vendors and need to extract key data into a summary table for accounts payable processing.

Example Files:

  • invoices/ — Folder with 20 invoice PDFs (invoice-001.pdf through invoice-020.pdf)

Example Prompt:

Tip

Use Plan Mode first to review the proposed extraction structure before building. Learn more about the different modes in "How to Interact with an AI Agent".

Extract data from all invoice PDFs in the invoices folder and create a summary table.

**Fields to extract:**
Invoice Number, Invoice Date, Vendor Name, Total Amount, Due Date, PO Number

**Output:**
- CSV file: invoice-summary.csv
- Summary: total amount, average invoice, count by vendor, overdue invoices
- Flag invoices over $10,000 for review
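
One way to sanity-check the summary the agent returns is to recompute it from the exported CSV. The sketch below is illustrative only: it assumes the file has columns named `Invoice Number`, `Vendor Name`, `Total Amount`, and `Due Date` (mirroring the fields in the prompt), that amounts are plain decimals, and that dates are already in YYYY-MM-DD format. The function name `summarize_invoices` and its parameters are hypothetical and not part of FabriXWork.

```python
import csv
import io
from collections import Counter
from datetime import date

def summarize_invoices(csv_text, today, review_threshold=10_000.0):
    """Recompute the summary requested in the prompt from CSV text.

    Assumed columns: Invoice Number, Vendor Name, Total Amount, Due Date.
    Assumed formats: plain decimal amounts, YYYY-MM-DD dates.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    amounts = [float(r["Total Amount"]) for r in rows]
    total = sum(amounts)
    return {
        "total": total,
        "average": total / len(rows) if rows else 0.0,
        "count_by_vendor": dict(Counter(r["Vendor Name"] for r in rows)),
        # Overdue: the due date is strictly before the reference date.
        "overdue": [r["Invoice Number"] for r in rows
                    if date.fromisoformat(r["Due Date"]) < today],
        # Flagged: the amount is strictly over the review threshold.
        "flagged": [r["Invoice Number"] for r, amount in zip(rows, amounts)
                    if amount > review_threshold],
    }
```

To use it, read `invoice-summary.csv` into a string and pass it with `date.today()`. If the numbers disagree with the agent's summary, check whether the agent defined "overdue" or the review threshold differently.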

Make It Your Own

Don't simply copy this prompt; adapt it. Ask yourself:

  • What PDFs are you processing: invoices, reports, forms, statements?
  • What fields do you need: amounts, dates, names, numbers, codes?
  • What analysis do you want: totals, averages, grouping, trends?

Examples:

  • Expense reports → "Extract employee, date, amount, category. Summarize by employee and category"
  • Purchase orders → "Extract PO number, vendor, items, total. Flag POs over budget"

More Examples to Inspire You

Example 2: Quarterly Financial Report Comparison — See how to compare data across time periods

Scenario: You have quarterly financial reports (PDFs) and need to compare key metrics across quarters to identify trends.

Example Files:

  • financial-reports/ — Folder with 4 quarterly report PDFs (Q1-2025.pdf through Q4-2025.pdf)

Example Prompt:

Extract and compare data from all quarterly financial report PDFs.

**Fields to extract:**
Revenue, Operating Expenses, Net Profit, Cash Flow, Headcount

**Output:**
- Excel file: financial-comparison.xlsx (quarterly data + comparisons)
- Summary: QoQ growth %, trends, best/worst quarter
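
QoQ growth % is conventionally calculated as (current quarter - previous quarter) / previous quarter × 100. The sketch below shows that calculation for a list of values ordered Q1 to Q4, so you can verify the figures in the agent's summary. The function name and the sample revenue numbers are made up for illustration.

```python
def qoq_growth(values):
    """Return quarter-over-quarter growth in percent for consecutive values.

    The result has one fewer entry than the input. A previous value of
    zero yields None because the percentage is undefined.
    """
    growth = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            growth.append(None)
        else:
            growth.append(round((current - previous) / previous * 100, 2))
    return growth

# Hypothetical revenue for Q1..Q4 (not taken from any real report).
revenue = [100.0, 110.0, 99.0, 120.0]
print(qoq_growth(revenue))
```

The same function works for Operating Expenses, Net Profit, or any other extracted metric, as long as the values are in chronological order.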

Make It Your Own

Adapt this for:

  • Monthly sales reports: "Extract revenue by region, compare month-over-month growth"
  • Budget vs actual: "Extract budget and actual from each report, calculate variances"
  • KPI dashboards: "Extract KPIs from each period, track progress toward targets"

Example 3: Claims Pattern Analysis — See how to find patterns in form data

Scenario: You have 50 claim form PDFs and need to analyze patterns in claim amounts, types, and frequencies.

Example Files:

  • claims/ — Folder with 50 claim form PDFs

Example Prompt:

Extract data from all claim form PDFs and analyze patterns.

**Fields to extract:**
Claim Number, Claim Date, Claim Type, Claim Amount, Claimant Name, Status

**Output:**
- CSV file: claims-data.csv
- Summary: total claims, approval rate %, by claim type, top 5 highest, outliers
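
The prompt asks for an approval rate, the top 5 claims, and outliers, but does not say how an outlier is defined. The sketch below shows one possible reading so you can cross-check the agent's output. It assumes approved claims carry the literal status `Approved` and treats any amount more than 1.5 × IQR outside the quartiles as an outlier. Both are assumptions, and the function name is hypothetical.

```python
import statistics

def analyze_claims(claims):
    """claims: list of dicts with 'Claim Number', 'Claim Amount', 'Status'.

    Assumptions: Status is the literal string "Approved" for approved
    claims, and an outlier is any amount more than 1.5 x IQR outside
    the first or third quartile. Needs at least two claims.
    """
    amounts = [c["Claim Amount"] for c in claims]
    approved = sum(1 for c in claims if c["Status"] == "Approved")
    q1, _, q3 = statistics.quantiles(amounts, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return {
        "total_claims": len(claims),
        "approval_rate_pct": round(approved / len(claims) * 100, 1),
        "top_5": sorted(claims, key=lambda c: c["Claim Amount"],
                        reverse=True)[:5],
        "outliers": [c["Claim Number"] for c in claims
                     if not low <= c["Claim Amount"] <= high],
    }
```

If you want the agent to use a specific outlier rule, state it in the prompt (for example, "flag claims more than 1.5 × IQR above the third quartile").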

Make It Your Own

Adapt this for:

  • Customer feedback: "Extract ratings and comments, analyze sentiment by product"
  • Inspection reports: "Extract pass/fail status, defect types, analyze by location"
  • Time sheets: "Extract hours by employee and project, analyze utilization"

Make It Even Better

Quick Wins

  • Use text-based PDFs — Scanned PDFs need OCR first. Ensure PDFs have selectable text
  • Specify exact fields — e.g. "Extract: Invoice Number, Date, Vendor Name, Total Amount"
  • Request summary statistics — e.g. "Calculate: totals, averages, counts by category"
  • Define output format upfront — e.g. "Save as CSV with columns: [list columns]"
  • Ask for outlier detection — e.g. "Flag any values over [threshold] for review"

Review & Refine

Always verify extracted data before using for reporting or decisions.

What to Check:

  • Completeness — All PDFs were processed, none skipped
  • Accuracy — Extracted values match the source PDFs
  • Format consistency — Dates, currency, numbers formatted correctly
  • Missing data — Fields marked "Not Found" are actually missing
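
The completeness check can be partly automated. The sketch below compares the PDF names in your folder with the rows in the output CSV. It assumes the CSV has a `Source File` column, which is a hypothetical name: the output only contains such a column if your prompt asked the agent to record the source file for each row.

```python
import csv
import io
from pathlib import Path

def list_pdf_names(folder):
    """Names of all .pdf files (any letter case) directly inside folder."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.suffix.lower() == ".pdf")

def find_unprocessed(pdf_names, csv_text, file_column="Source File"):
    """Return the PDF names that have no matching row in the CSV text.

    file_column is a hypothetical column name; the CSV only has it if
    your prompt asked for the source file name on each row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    processed = {row[file_column] for row in reader}
    return sorted(set(pdf_names) - processed)
```

A typical call would be `find_unprocessed(list_pdf_names("invoices"), Path("invoice-summary.csv").read_text())`. Any names it returns are the files to mention when you ask the agent to re-extract.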

How to Request Corrections:

For missing PDFs:

"The extraction only processed 18 of 20 PDFs. Please check if [file names] were included and re-extract."

For incorrect extraction:

"Invoice amounts for [vendor] are incorrect. The correct amounts should be [values]. Please re-extract."

For missing fields:

"The PO Number field shows 'Not Found' for all invoices, but most invoices have PO numbers. Please re-extract this field."

For format issues:

"Dates are in different formats. Please standardize all dates to YYYY-MM-DD format."

PDF Analysis Tips

  • Invoices: Extract vendor, date, amount, PO number. Summarize by vendor and month
  • Financial Reports: Extract revenue, expenses, profit. Compare periods, calculate growth %
  • Forms/Claims: Extract type, amount, date, status. Analyze patterns and outliers

Reference & Details

Advanced Prompting Tips — Get better results with these techniques

1. Specify Exact Fields

✅ Good: "Extract: Invoice Number, Invoice Date, Vendor Name, Total Amount, Due Date"

❌ Vague: "Extract all the important data from the invoices"

2. Define Output Structure

✅ Good: "Save as CSV with columns: Invoice#, Date, Vendor, Amount, Due Date, Status"

❌ Vague: "Create a summary table"

3. Request Calculations

✅ Good: "Calculate: total amount, average invoice, count by vendor, overdue count"

❌ Vague: "Add some summary statistics"

4. Handle Missing Data

✅ Good: "If a field is not found, mark it as 'Not Found' and continue processing"

❌ Vague: "Handle missing data"

5. Define Thresholds

✅ Good: "Flag invoices over $10,000 for review. Highlight vendors with > 5 invoices"

❌ Vague: "Flag unusual items"

6. Specify Formatting

✅ Good: "Format dates as YYYY-MM-DD. Format currency with 2 decimal places. Sort by date"

❌ Vague: "Format it nicely"

7. Iterate on Extraction

After the first extraction:

✅ Good: "Good start! Now also extract the payment terms and add a column for days until due"

❌ Vague: "Add more information"
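
Tip 6 above asks for dates in YYYY-MM-DD format. If an export still contains mixed date formats, a small script can standardize them after the fact. This is a minimal sketch: the list of input formats is an assumption and should be adjusted to what actually appears in your PDFs, and the function name is hypothetical.

```python
from datetime import datetime

# Candidate input formats, tried in order. This list is an assumption;
# extend it to match the formats that actually appear in your PDFs.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y")

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%d")
    return None
```

Note that slash-separated dates are ambiguous: this sketch reads `03/04/2025` as day-first (3 April). If your vendors use month-first dates, swap in `%m/%d/%Y` instead. Values that return `None` are worth reviewing by hand.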


Common PDF Types — What you can extract from

Financial Documents

| PDF Type | Common Fields | Typical Analysis |
| --- | --- | --- |
| Invoices | Invoice #, Date, Vendor, Amount, Due Date, PO # | Totals by vendor, aging, overdue |
| Receipts | Date, Vendor, Amount, Category, Payment Method | Totals by category, monthly spend |
| Bank Statements | Date, Description, Debit, Credit, Balance | Cash flow, categorization, trends |
| Financial Reports | Revenue, Expenses, Profit, Assets, Liabilities | Period comparison, growth %, ratios |

Business Forms

| PDF Type | Common Fields | Typical Analysis |
| --- | --- | --- |
| Claims Forms | Claim #, Date, Type, Amount, Status | Approval rates, patterns, outliers |
| Purchase Orders | PO #, Date, Vendor, Items, Total | Spend by vendor, budget variance |
| Expense Reports | Employee, Date, Amount, Category, Project | By employee, by project, policy compliance |
| Time Sheets | Employee, Date, Project, Hours | Utilization, project hours, overtime |

Reports & Statements

| PDF Type | Common Fields | Typical Analysis |
| --- | --- | --- |
| Sales Reports | Date, Product, Region, Quantity, Revenue | By product, by region, trends |
| Inventory Reports | SKU, Description, Quantity, Location, Value | Stock levels, turnover, valuation |
| Customer Statements | Customer, Date, Transaction, Amount, Balance | AR aging, payment patterns |
| Compliance Reports | Metric, Target, Actual, Status | Compliance rate, gaps, trends |

Troubleshooting — Common issues and solutions

Quick Fixes

| Issue | What to Try |
| --- | --- |
| PDF not processed | "Check if [file name] is in the folder and is a valid PDF. Re-process all files" |
| Scanned PDF (image) | "This is a scanned image PDF. Use OCR tool first or convert to text-based PDF" |
| Field not found | "The field [name] is labeled as [different name] in the PDFs. Update field name" |
| Inconsistent formats | "PDFs have different formats. Extract what's common, note variations in report" |
| Wrong data extracted | "The extracted [field] is incorrect. It should be from [section/location] in PDF" |
| Missing calculations | "Add calculations: [list calculations]. Update the summary report" |
| Export not working | "Save the output as [CSV/Excel] format. Ensure file is not locked" |

PDF-Specific Issues

| Issue | PDF Type | What to Try |
| --- | --- | --- |
| Tables not extracted correctly | Financial reports | "Extract table data row by row. Preserve column structure" |
| Multi-page PDFs | Long reports | "Process all pages. Combine data from entire document" |
| Different layouts | Mixed vendors | "Extract based on field labels, not position. Handle format variations" |
| Handwritten notes | Forms | "Skip handwritten fields or mark as 'Manual Review Required'" |
| Password protected | Secure PDFs | "Remove password protection first or provide password" |
| Corrupted files | Any PDF | "Skip corrupted files, list them in report for manual processing" |

Technical Details — How the output works

Output Formats

CSV (Comma-Separated Values)

  • ✅ Universal format (opens in Excel, Google Sheets, any spreadsheet)
  • ✅ Easy to import into databases or BI tools
  • ✅ Lightweight, fast to process
  • ❌ No formatting (no colors, formulas, multiple sheets)
  • Best For: Data extraction, further analysis, importing to other systems

Excel (XLSX)

  • ✅ Multiple sheets (data + summary + analysis)
  • ✅ Formulas and calculations
  • ✅ Formatting (colors, conditional formatting, charts)
  • ✅ Pivot tables for analysis
  • Best For: Financial analysis, dashboards, sharing with stakeholders

Markdown Report

  • ✅ Readable in any text editor
  • ✅ Easy to convert to PDF or HTML
  • ✅ Version control friendly
  • ✅ Can include tables, charts (as code)
  • Best For: Summary reports, insights documentation, sharing findings

Data Quality Considerations

Accuracy Checks:

  • Spot-check 10% of extractions against source PDFs
  • Verify totals and calculations
  • Check for duplicate entries
  • Validate date ranges and amount ranges
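
Two of these checks are easy to script once the output is loaded as a list of rows. The sketch below picks a random 10% sample to compare by hand against the source PDFs and lists any key values that occur more than once. The function names and the default `Invoice Number` key are illustrative assumptions.

```python
import random
from collections import Counter

def pick_spot_check(rows, fraction=0.10, seed=None):
    """Pick a random sample of rows (at least one) to compare by hand."""
    if not rows:
        return []
    size = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, size)

def find_duplicates(rows, key="Invoice Number"):
    """Return key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)
```

Passing a fixed `seed` makes the sample repeatable, which helps if you need to document which rows were verified.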

Handling Variations:

  • Different PDF layouts: Extract by field label, not position
  • Missing fields: Mark as "Not Found" or "N/A"
  • Format differences: Standardize in output (dates, currency, numbers)
  • Multi-currency: Convert to base currency or note currency per row
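
To illustrate what "by field label, not position" and the "Not Found" fallback mean in practice, here is a small sketch that works on plain text already pulled out of a PDF. It is not how the FabriXWork agent is implemented (that is not documented here). It assumes each field appears on its own line as `Label: value`, and the function name is hypothetical.

```python
import re

def extract_fields(text, labels):
    """Find 'Label: value' pairs in plain text extracted from a PDF.

    Matching is by label rather than position, so line order does not
    matter. Fields with no match are reported as "Not Found".
    """
    result = {}
    for label in labels:
        # Label, optional spaces, a colon, then the rest of the line.
        pattern = rf"^\s*{re.escape(label)}\s*:\s*(.+?)\s*$"
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        result[label] = match.group(1) if match else "Not Found"
    return result
```

Because the lookup is keyed on the label, two vendors can place the invoice number in different parts of the page and still produce the same output columns. A field that one vendor labels differently (for example "Inv. No.") would come back as "Not Found", which is the case the troubleshooting table addresses with "Update field name".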

Best Practices:

  1. Keep original PDFs organized in folders
  2. Name output files with date (e.g., invoice-summary-2026-03-27.csv)
  3. Document any manual adjustments made
  4. Save extraction prompts for future use
  5. Version control: Keep track of extraction iterations

Processing Limits

Typical Capacity:

  • 10-50 PDFs: Quick processing (2-5 minutes)
  • 50-200 PDFs: Moderate processing (5-15 minutes)
  • 200+ PDFs: Consider batching into groups

File Size:

  • Individual PDF: Up to 10 MB recommended
  • Total batch: Up to 500 MB for optimal performance
  • Larger batches: Split into multiple folders
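
If you need to split a large collection before connecting a folder, the grouping can be scripted. The sketch below uses the guidance above (200 PDFs and 500 MB per batch) as default limits. The function name and the `(name, size)` input shape are assumptions made for illustration.

```python
def make_batches(files, max_files=200, max_bytes=500 * 1024 * 1024):
    """Group (name, size_in_bytes) pairs into batches, keeping input order.

    A new batch starts when adding a file would exceed either limit.
    A single file larger than max_bytes still gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        too_many = len(current) >= max_files
        too_big = bool(current) and current_bytes + size > max_bytes
        if too_many or too_big:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

File sizes can be gathered with `pathlib`, for example `[(p.name, p.stat().st_size) for p in Path("invoices").glob("*.pdf")]`. Each returned batch can then be copied into its own folder and processed separately.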

Related Use Cases

  • Customer Insight & Strategy Recommendations — Analyze extracted data to generate deeper insights and recommendations
  • Auto-Fill Forms (DOCX, Excel, PDF) — Populate forms with extracted data
  • Check Documents for Compliance — Validate extracted data against compliance rules
  • Create Presentation Slides from Your Documents — Present your analysis findings to stakeholders