The Problem
An accounting firm receives 200-300 documents per week from clients during tax season: bank statements, W-2 forms, 1099s, receipts, mortgage interest statements, and charitable donation letters. Clients dump everything into a shared folder as multi-page PDFs, sometimes combining 15 different document types into a single scan.
Staff spend 12 hours per week just sorting, splitting, and renaming files before any actual accounting work begins. Last year, a missing 1099-DIV was not caught until April 10th, forcing an extension filing and an unhappy client conversation.
The Solution
Use the doc-parser skill to identify document types and extract metadata from each page, and the pdf-merge-split skill to automatically split combined documents, rename files based on content, and merge related documents into organized client folders.
Step-by-Step Walkthrough
1. Parse and classify incoming documents
Analyze each uploaded PDF to identify document types page by page:
Process all PDF files in /incoming/2026-02/. For each file, analyze every page and classify it as one of: W-2, 1099-INT, 1099-DIV, 1099-MISC, 1099-NEC, bank-statement, mortgage-interest-1098, charitable-donation, medical-receipt, business-expense, property-tax, or unknown. Extract the key metadata from each page: document date, payer/institution name, recipient name, and dollar amounts. Output a classification report showing file name, page ranges for each document type detected, and extracted metadata.
The classification must handle the reality that clients scan documents in arbitrary order. A single PDF might contain a W-2, three receipts, two pages of a bank statement, and then another W-2 from a different employer.
2. Split combined scans into individual documents
Separate multi-document PDFs at the boundaries identified in the classification step:
Using the classification results, split each multi-document PDF into individual files. The file "johnson_family_docs.pdf" contains pages 1-2 (W-2 from Medtronic), pages 3-4 (1099-INT from Chase Bank), pages 5-7 (mortgage statement from Wells Fargo), and pages 8-15 (8 pages of receipts). Split into 4 separate files. Name each output file with the pattern: [client-name][doc-type][institution]_[year].pdf. For example: johnson_w2_medtronic_2025.pdf.
Consistent naming eliminates the "which file has the Chase 1099?" question that accountants ask multiple times per client. The naming pattern also enables quick visual scanning of a folder to confirm completeness.
3. Validate extracted data against expected documents
Cross-reference parsed documents against the client's tax checklist to find missing items:
Compare the documents received from the Johnson family against their tax preparation checklist. Checklist expects: 2 W-2s (both spouses), 1099-INT from Chase Bank, 1099-DIV from Vanguard, mortgage interest statement from Wells Fargo, property tax receipt from Franklin County, and charitable donation receipts totaling at least $3,200. Report which items have been received, which are missing, and flag any unexpected documents that were not on the checklist.
Missing document detection is the highest-value part of this pipeline. Catching a missing 1099-DIV in February prevents a frantic scramble in April when the tax preparer discovers the gap mid-filing.
4. Merge related documents into client packages
Combine classified documents into organized packages for each tax preparer:
For each client folder in /clients/, merge all documents of the same type into combined files. Create: [client]_income_documents.pdf (all W-2s and 1099s), [client]_deductions.pdf (mortgage interest, property tax, charitable donations, medical receipts), and [client]_business.pdf (business expenses and 1099-NECs). Add a cover page to each merged file listing the included documents with page numbers. Place all merged files in /ready-for-review/[client-name]/.
The cover page with document index is essential. Tax preparers need to flip between documents during review, and a table of contents with page numbers turns a 40-page merged PDF into a navigable reference.
5. Generate a weekly intake summary
Produce a status report showing document processing volume and outstanding items:
Generate the weekly intake report for February 10-14, 2026. Show: total documents received (count and pages), documents successfully classified vs flagged as unknown, documents processed per client, clients with complete document sets ready for tax preparation, and clients with missing required documents. List the top 5 clients by document volume and any files that failed parsing and need manual review.
The weekly report drives the firm's workflow. Clients with complete document sets move into the preparation queue immediately. Clients with missing items get automated reminder emails specifying exactly what is needed.
Real-World Example
Monday morning, 47 new files land in the incoming folder from 12 clients. The parser classifies 189 pages across those files, identifying 67 individual documents including 14 W-2s, 23 1099 variants, 8 mortgage statements, and 22 receipt scans. Three clients uploaded everything as a single combined scan; those get split into 31 individual files automatically. The validation step finds that 8 of the 12 clients are still missing at least one required document, and the system generates reminder emails listing exactly what is needed.
The merged client packages go into the review queue with cover sheets. The firm's senior accountant reports that the organized packages cut her per-client review time by 30% because she no longer spends time hunting for specific documents within disorganized folders. What used to take a staff member all of Monday and half of Tuesday is done before the first coffee break.