mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-02-17 11:43:26 +08:00
Add nutrient-document-processing skill for PDF conversion, OCR, redaction, signing, and form filling via Nutrient DWS API.
166 lines
5.7 KiB
Markdown
---
name: nutrient-document-processing
description: Process, convert, OCR, extract, redact, sign, and fill documents using the Nutrient DWS API. Works with PDFs, DOCX, XLSX, PPTX, HTML, and images.
---

# Nutrient Document Processing

Process documents with the [Nutrient DWS Processor API](https://www.nutrient.io/api/). Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms.

## Setup

Get a free API key at **https://dashboard.nutrient.io/sign_up/?product=processor**

```bash
export NUTRIENT_API_KEY="pdf_live_..."
```

All requests go to `https://api.nutrient.io/build` as multipart POST with an `instructions` JSON field.

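
Since every operation shares the same endpoint, header, and multipart layout, the request can be wrapped in a small shell function. The following is a minimal sketch, not part of the Nutrient tooling: the name `nutrient_build` and its argument order are hypothetical, and it assumes the uploaded part is named after the file's basename, as in the examples in this document.

```bash
# Hypothetical helper: nutrient_build <input-file> <instructions-json> <output-file>
nutrient_build() {
  input="$1"
  instructions="$2"
  output="$3"

  # Fail early instead of sending an unauthenticated request.
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi

  # --fail makes curl exit non-zero on HTTP error responses.
  # --form-string sends the JSON literally, without curl's special
  # handling of leading "@" or "<" in form values.
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$(basename "$input")=@$input" \
    --form-string "instructions=$instructions" \
    -o "$output"
}
```

Usage would look like `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf`. Checking the key and the input file before calling curl keeps failures local and cheap.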
## Operations

### Convert Documents

```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf

# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx

# HTML to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.

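
In the conversion examples above, the `instructions` JSON differs only in the optional `output` object: it is omitted for PDF output and set to a `type` otherwise. A small sketch of generating that JSON follows. The function name `convert_instructions` is hypothetical, and it assumes file names contain no characters that need JSON escaping.

```bash
# Hypothetical helper: convert_instructions <uploaded-file-name> [output-type]
# Prints the instructions JSON for a file-to-file conversion.
convert_instructions() {
  file="$1"
  type="${2:-pdf}"   # the examples in this document omit "output" for PDF

  if [ "$type" = "pdf" ]; then
    printf '{"parts":[{"file":"%s"}]}\n' "$file"
  else
    printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}\n' "$file" "$type"
  fi
}
```

The result can be passed straight to curl, for example `-F "instructions=$(convert_instructions document.pdf docx)"`. HTML input is not covered here because it uses an `html` part rather than a `file` part.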
### Extract Text and Data

```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt

# Extract tables as Excel
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```

### OCR Scanned Documents

```bash
# OCR to searchable PDF (supports 100+ languages)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.

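
Because the OCR language is a single string in the `ocr` action, switching languages only changes one value in the JSON. The sketch below parameterizes it. The function name `ocr_instructions` is hypothetical, and the default of `english` simply mirrors the example above.

```bash
# Hypothetical helper: ocr_instructions <uploaded-file-name> [language]
# The language may be a code such as "deu" or a full name such as "german".
ocr_instructions() {
  file="$1"
  language="${2:-english}"
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"}]}\n' \
    "$file" "$language"
}
```

For a German scan, `-F "instructions=$(ocr_instructions scanned.pdf deu)"` would replace the hard-coded instructions string.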
### Redact Sensitive Information

```bash
# Pattern-based (SSN, email)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf

# Regex-based
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf
```

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.

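
Long redaction instructions are easy to mistype by hand, particularly the doubled backslashes a regex needs inside JSON. The sketch below builds both variants shown above. The function names are hypothetical, and the escaping handles only backslashes and double quotes, which is an assumption about what the pattern contains.

```bash
# Hypothetical helper: redact_preset_instructions <uploaded-file-name> <preset>...
# Emits one redaction action per preset, joined with commas.
redact_preset_instructions() {
  file="$1"
  shift
  actions=""
  for preset in "$@"; do
    action=$(printf '{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"%s"}}' "$preset")
    actions="${actions:+$actions,}$action"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[%s]}\n' "$file" "$actions"
}

# Hypothetical helper: redact_regex_instructions <uploaded-file-name> <raw-regex>
# Escapes backslashes and double quotes so the regex is a valid JSON string.
redact_regex_instructions() {
  file="$1"
  escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"%s"}}]}\n' \
    "$file" "$escaped"
}
```

Calling `redact_regex_instructions document.pdf '\b[A-Z]{2}\d{6}\b'` lets the pattern be written once in its natural form, with the JSON escaping applied in one place.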
### Add Watermarks

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf
```

### Digital Signatures

```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf
```

### Fill PDF Forms

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf
```

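
When form values come from script variables, the `formFields` object can be assembled from `key=value` arguments rather than typed inline. This is a minimal sketch: the function name `fill_form_instructions` is hypothetical, and it assumes keys and values contain no double quotes or backslashes, since it performs no JSON escaping.

```bash
# Hypothetical helper: fill_form_instructions <uploaded-file-name> <key=value>...
fill_form_instructions() {
  file="$1"
  shift
  fields=""
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "="
    field=$(printf '"%s":"%s"' "$key" "$value")
    fields="${fields:+$fields,}$field"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"fillForm","formFields":{%s}}]}\n' \
    "$file" "$fields"
}
```

For example, `fill_form_instructions form.pdf "name=Jane Smith" "email=jane@example.com"` yields instructions in the same shape as the request above. Splitting on the first `=` only means values may themselves contain `=`.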
## MCP Server (Alternative)

For native tool integration, use the MCP server instead of curl:

```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```

## When to Use

- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
- Extracting text, tables, or key-value pairs from PDFs
- OCR on scanned documents or images
- Redacting PII before sharing documents
- Adding watermarks to drafts or confidential documents
- Digitally signing contracts or agreements
- Filling PDF forms programmatically

## Links

- [API Playground](https://dashboard.nutrient.io/processor-api/playground/)
- [Full API Docs](https://www.nutrient.io/guides/dws-processor/)
- [Agent Skill Repo](https://github.com/PSPDFKit-labs/nutrient-agent-skill)
- [npm MCP Server](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)