DataDistill API Documentation (v1)

Welcome to the DataDistill API! This guide provides detailed documentation for our v1 REST API, which offers programmatic access to our powerful text processing and artifact management services.

API Base URL: https://api.datadistill.co/api/v1

Authentication

All requests to the /api/v1 gateway (excluding the public health check endpoint) must be authenticated using HTTP Basic Authentication.

  • Username: Your API Key
  • Password: Your API Secret (You must use the original, unhashed secret provided when you first created the key.)

You can generate and manage your API credentials from the “API Keys” section within your project settings on the DataDistill dashboard.

The Authorization header must be formatted as Basic <credentials>, where <credentials> is the Base64-encoded string of your-api-key:your-api-secret.
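
If your HTTP client does not build Basic credentials for you, the header can be assembled by hand. The following is a minimal sketch of the encoding described above; the function name build_basic_auth_header is hypothetical and not part of any DataDistill SDK.

```python
import base64


def build_basic_auth_header(api_key: str, api_secret: str) -> dict:
    """Return an Authorization header for HTTP Basic Authentication.

    Hypothetical helper: joins key and secret with a colon, then
    Base64-encodes the result, as the paragraph above describes.
    """
    raw = f"{api_key}:{api_secret}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder credentials; substitute your own key and secret.
headers = build_basic_auth_header("your-api-key", "your-api-secret")
```

The resulting dictionary can be passed as the headers argument of most HTTP clients.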

Authentication & Dynamic URL Examples

import json
import os

import requests

# --- Configuration ---
# Best practice: Store credentials as environment variables
API_KEY = os.environ.get("DATADISTILL_API_KEY", "YOUR_API_KEY")
API_SECRET = os.environ.get("DATADISTILL_API_SECRET", "YOUR_API_SECRET")

# Base URL for all API v1 endpoints
BASE_URL = "https://api.datadistill.co/api/v1"

# The library handles the Base64 encoding for you.
auth_credentials = (API_KEY, API_SECRET)


# --- Dynamic Request Function ---
def make_api_request(method, endpoint_path, params=None, json_payload=None, files=None):
    """A helper function to make authenticated API requests."""
    full_url = f"{BASE_URL}{endpoint_path}"
    print(f"Making {method} request to: {full_url}")
    try:
        response = requests.request(
            method,
            full_url,
            auth=auth_credentials,
            params=params,
            json=json_payload,
            files=files,
        )
        response.raise_for_status()  # Raises an exception for bad status codes (4xx or 5xx)
        print(f"Status Code: {response.status_code}")
        # For 204 No Content, there is no body
        if response.status_code != 204:
            print("Response JSON:")
            print(json.dumps(response.json(), indent=2))
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        print(f"Response body: {response.text}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")

Metering and Credit System

API usage is metered through a credit-based system. Each request consumes credits, which are automatically debited from your account. If your balance is insufficient, the API responds with HTTP 402 Payment Required. You can manage your credits at https://app.datadistill.co.

General Error Response Format (for 4xx/5xx errors):

{ "error": { "code": "INSUFFICIENT_CREDITS", "message": "Your account does not have enough credits to process this request.", "details": "Current balance: 0 credits. Required: 5 credits." } }
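
A client can turn this error body into a structured value before deciding how to react. The sketch below assumes every 4xx/5xx body follows the format shown above; the ApiError class, the parse_error_body and is_out_of_credits functions, and the "UNKNOWN" fallback code are illustrative names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Hypothetical container for the documented error fields."""
    http_status: int
    code: str
    message: str
    details: Optional[str] = None


def parse_error_body(http_status: int, body_text: str) -> ApiError:
    """Parse an error response body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error", {}) if isinstance(payload, dict) else {}
    if not isinstance(error, dict):
        error = {}
    return ApiError(
        http_status=http_status,
        # "UNKNOWN" is an assumed fallback, not a documented error code.
        code=error.get("code", "UNKNOWN"),
        message=error.get("message", body_text),
        details=error.get("details"),
    )


def is_out_of_credits(err: ApiError) -> bool:
    """True when the status is the documented 402 Payment Required."""
    return err.http_status == 402
```

Checking the HTTP status rather than the code string keeps the test tied to the one behavior this guide states explicitly, namely that an insufficient balance yields a 402.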

API Health Check

GET / - Check API Status

  • Description: Confirms that the API is running and accessible. This is the only endpoint that does not require authentication.
  • Credit Cost: 0
  • Parameters: None
  • Use Case: Use this as a basic health check in your monitoring systems to ensure the API is reachable before attempting authenticated requests.
make_api_request("GET", "/")
  • Success Response (200 OK):
    { "status": "API is running", "timestamp": "2025-09-22T14:00:00.123456Z", "version": "v1" }

Text Processing API

Base Path: /api/v1/text-processing

These endpoints stream their responses using Server-Sent Events (SSE). The stream opens with an initial event, may emit progress events while the text is processed, delivers the final JSON payload in a data event, and closes with a data: [DONE] marker.

General Streaming Response Format:

  • Events: data: {"progress": "processing", "eta": "2s"}
  • Final: data: {"result": {...}, "status": "completed"}
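
One way to consume such a stream is to scan the data lines and keep the payload whose status is "completed". The sketch below assumes each event arrives on a single line prefixed with data:, as in the format above; read_final_result is a hypothetical name, and the handling of malformed lines is a design choice rather than documented behavior.

```python
import json
from typing import Iterable, Optional


def read_final_result(lines: Iterable[str]) -> Optional[dict]:
    """Return the final 'completed' payload from SSE data lines, if any."""
    final = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separators and non-data fields.
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # End-of-stream marker.
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # Assumption: ignore lines that are not valid JSON.
        if isinstance(event, dict) and event.get("status") == "completed":
            final = event
    return final
```

With the requests library, the lines could come from a response opened with stream=True, for example read_final_result(response.iter_lines(decode_unicode=True)).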

POST /ai-detect

  • Description: Analyzes text to determine the probability that it was generated by an AI model.
  • Credit Cost: 1
  • Parameters:
    • text (string, required): The text to analyze for AI generation.
  • Use Case: Integrate into a content management system to flag submissions that may be AI-generated for editorial review, helping to maintain content authenticity.
make_api_request("POST", "/text-processing/ai-detect", json_payload={"text": "This text might be AI generated."})
  • Success Response (200 OK, streamed final data):
    { "result": { "ai_probability": 0.85, "verdict": "likely_ai", "reasoning": "Repetitive structure and unnatural phrasing detected." }, "status": "completed", "credits_consumed": 1 }

POST /clone-writing-style

  • Description: Analyzes a sample text and generates new content in the same style.
  • Credit Cost: 5
  • Parameters:
    • sample_text (string, required): The reference text exemplifying the desired style.
    • new_text_prompt (string, required): The prompt for generating new content.
  • Use Case: Create marketing copy that matches your brand’s established tone of voice by providing a sample of past successful content.
make_api_request("POST", "/text-processing/clone-writing-style", json_payload={"sample_text": "The quick brown fox jumps over the lazy dog in a whimsical manner.", "new_text_prompt": "Write a short story about a cat."})
  • Success Response (200 OK, streamed final data):
    { "result": { "cloned_text": "The sly tabby cat leaps across the moonlit fence with a playful twirl.", "style_match_score": 0.92 }, "status": "completed", "credits_consumed": 5 }

POST /humanize

  • Description: Makes AI-generated text sound more natural and less robotic.
  • Credit Cost: 3
  • Parameters:
    • text (string, required): The AI-generated text to humanize.
    • target_audience (string, optional, default: “general”): The intended audience (e.g., “general”, “professional”, “casual”).
  • Use Case: Refine AI-generated blog post drafts or product descriptions to make them more engaging and relatable to a human audience.
make_api_request("POST", "/text-processing/humanize", json_payload={"text": "The product is very efficient and performs optimally.", "target_audience": "general"})
  • Success Response (200 OK, streamed final data):
    { "result": { "humanized_text": "This product works like a charm—super efficient and gets the job done without any fuss.", "changes_made": 3 }, "status": "completed", "credits_consumed": 3 }

POST /grammar-check

  • Description: Corrects grammatical errors in the provided text.
  • Credit Cost: 2
  • Parameters:
    • text (string, required): The text to check and correct for grammar.
  • Use Case: Build a writing assistant application that provides real-time grammar and spelling suggestions to users as they type.
make_api_request("POST", "/text-processing/grammar-check", json_payload={"text": "he walk to store."})
  • Success Response (200 OK, streamed final data):
    { "result": { "corrected_text": "He walks to the store.", "corrections": [ { "original": "he walk", "corrected": "He walks", "reason": "Subject-verb agreement and capitalization" } ] }, "status": "completed", "credits_consumed": 2 }

POST /summarize

  • Description: Creates a concise summary of a longer piece of text.
  • Credit Cost: 4
  • Parameters:
    • text (string, required): The long text to summarize.
    • max_length (integer, optional, default: 100): Maximum length of the summary in words.
  • Use Case: Automatically generate executive summaries for long business reports or abstracts for academic papers.
make_api_request("POST", "/text-processing/summarize", json_payload={"text": "A long article about climate change...", "max_length": 150})
  • Success Response (200 OK, streamed final data):
    { "result": { "summary": "Climate change poses significant risks to global ecosystems, requiring immediate action.", "word_count": 11, "coverage_score": 0.95 }, "status": "completed", "credits_consumed": 4 }

POST /paraphrase

  • Description: Rewrites text while retaining the original meaning.
  • Credit Cost: 3
  • Parameters:
    • text (string, required): The text to paraphrase.
    • creativity (float, optional, default: 0.5, range: 0.0-1.0): Level of creative rephrasing (0.0: minimal change, 1.0: highly creative).
  • Use Case: Avoid plagiarism by rephrasing source material for research papers or generate multiple unique versions of a product description for A/B testing.
make_api_request("POST", "/text-processing/paraphrase", json_payload={"text": "The weather is nice today.", "creativity": 0.8})
  • Success Response (200 OK, streamed final data):
    { "result": { "paraphrased_text": "Today's climate is delightfully pleasant.", "similarity_score": 0.88 }, "status": "completed", "credits_consumed": 3 }
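
Because each text-processing endpoint lists a fixed credit cost, a batch can be budgeted before it is sent. The sketch below copies the costs from the endpoint descriptions above and assumes they apply per request regardless of text length; estimate_credits is a hypothetical helper.

```python
# Credit costs as listed for each text-processing endpoint above.
TEXT_PROCESSING_COSTS = {
    "ai-detect": 1,
    "clone-writing-style": 5,
    "humanize": 3,
    "grammar-check": 2,
    "summarize": 4,
    "paraphrase": 3,
}


def estimate_credits(planned_calls: dict) -> int:
    """Sum the credits needed for a mapping of endpoint name to call count."""
    total = 0
    for endpoint, count in planned_calls.items():
        if endpoint not in TEXT_PROCESSING_COSTS:
            raise KeyError(f"Unknown endpoint: {endpoint}")
        if count < 0:
            raise ValueError(f"Call count must be non-negative: {count}")
        total += TEXT_PROCESSING_COSTS[endpoint] * count
    return total


# Ten summaries plus five grammar checks.
needed = estimate_credits({"summarize": 10, "grammar-check": 5})
```

Comparing the estimate with your current balance before submitting a batch helps avoid a 402 response partway through.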

Artifact Management (API)

Base Path: /api/v1/artifacts

GET / - List Artifacts

  • Description: Lists all artifacts for the user with pagination and filtering.
  • Credit Cost: 1
  • Parameters (query):
    • status (string, optional): Filter by status (e.g., “processing”, “ready”, “failed”).
    • limit (integer, optional, default: 10, max: 100): Number of artifacts per page.
    • page (integer, optional, default: 1): Page number.
  • Use Case: Populate a dashboard in your application that shows a user all the files they have previously uploaded.
make_api_request("GET", "/artifacts", params={"status": "ready", "limit": 10})
  • Success Response (200 OK):
    { "artifacts": [ { "id": "art_12345678-1234-1234-1234-123456789abc", "filename": "document.pdf", "status": "ready", "size_bytes": 102400, "upload_date": "2025-09-30T10:00:00Z", "content_type": "application/pdf" } ], "pagination": { "page": 1, "limit": 10, "total": 50, "pages": 5 }, "credits_consumed": 1 }
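
The pagination object makes it possible to walk every page in turn. The following sketch assumes the pages field is the total page count, as the sample response suggests; iter_artifacts and the injected fetch_page callable are hypothetical, which keeps the loop independent of any particular HTTP client.

```python
from typing import Callable, Iterator


def iter_artifacts(
    fetch_page: Callable[[int, int], dict], limit: int = 10
) -> Iterator[dict]:
    """Yield artifacts from every page of the list endpoint.

    fetch_page(page, limit) is a caller-supplied function that returns
    the parsed JSON body of one list request.
    """
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    page = 1
    while True:
        body = fetch_page(page, limit)
        yield from body.get("artifacts", [])
        # Stop when the reported page count is reached (or is missing).
        total_pages = body.get("pagination", {}).get("pages", page)
        if page >= total_pages:
            break
        page += 1
```

In practice, fetch_page would wrap an authenticated GET to /artifacts with the page and limit query parameters and return the decoded JSON. Each page request costs 1 credit, so a larger limit reduces total spend.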

GET /search - Search Artifacts

  • Description: Provides advanced search for a user’s artifacts.
  • Credit Cost: 2
  • Parameters (query):
    • q (string, required): Search query (filename, metadata, or content keywords).
    • limit (integer, optional, default: 10): Number of results.
  • Use Case: Implement a search bar in your application that allows users to find specific documents by filename, metadata, or content keywords.
make_api_request("GET", "/artifacts/search", params={"q": "invoice"})
  • Success Response (200 OK):
    { "results": [ { "id": "art_87654321-4321-4321-4321-cba987654321", "filename": "invoice_2025.pdf", "relevance_score": 0.95, "snippet": "Invoice #INV-123 for services rendered." } ], "total": 3, "credits_consumed": 2 }

GET /{artifact_id} - Get Artifact Details

  • Description: Retrieves all metadata and job history for a specific artifact.
  • Credit Cost: 1
  • Path Parameters:
    • artifact_id (string, required): The unique ID of the artifact.
  • Use Case: Display a detailed view of a selected file, showing its status, type, size, and a history of all processing jobs performed on it.
ARTIFACT_ID = "art_12345678-1234-1234-1234-123456789abc"
make_api_request("GET", f"/artifacts/{ARTIFACT_ID}")
  • Success Response (200 OK):
    { "artifact": { "id": "art_12345678-1234-1234-1234-123456789abc", "filename": "document.pdf", "status": "ready", "size_bytes": 102400, "upload_date": "2025-09-30T10:00:00Z", "content_type": "application/pdf", "metadata": { "description": "Sample invoice" }, "job_history": [ { "job_id": "job_abc123", "type": "extraction", "status": "completed", "timestamp": "2025-09-30T10:05:00Z" } ] }, "credits_consumed": 1 }

PATCH /{artifact_id} - Update Artifact Metadata

  • Description: Updates an artifact’s mutable metadata.
  • Credit Cost: 1
  • Path Parameters:
    • artifact_id (string, required): The unique ID of the artifact.
  • Request Body:
    • description (string, optional): Updated description.
    • tags (array of strings, optional): Updated tags.
  • Use Case: Allow users to add descriptive notes or tags to their uploaded files for better organization within your application.
ARTIFACT_ID = "art_12345678-1234-1234-1234-123456789abc"
make_api_request("PATCH", f"/artifacts/{ARTIFACT_ID}", json_payload={"description": "Updated invoice description."})
  • Success Response (200 OK):
    { "updated_artifact": { "id": "art_12345678-1234-1234-1234-123456789abc", "description": "Updated invoice description." }, "credits_consumed": 1 }

DELETE /{artifact_id} - Delete Artifact

  • Description: Deletes an artifact and its associated data.
  • Credit Cost: 2
  • Path Parameters:
    • artifact_id (string, required): The unique ID of the artifact.
  • Use Case: Provide a “delete” button for users to permanently remove their files and associated data from the system.
ARTIFACT_ID = "art_12345678-1234-1234-1234-123456789abc"
make_api_request("DELETE", f"/artifacts/{ARTIFACT_ID}")
  • Success Response (204 No Content)

POST /upload - Upload a Single File

  • Description: Uploads a single file directly and begins processing.
  • Credit Cost: 5 (includes initial upload and processing start)
  • Parameters (multipart/form-data):
    • file (file, required): The file to upload (supports PDF, DOCX, images, etc.).
  • Use Case: The primary method for getting user files into the DataDistill system for further processing and extraction.
with open("path/to/doc.pdf", "rb") as f:
    files = {"file": ("doc.pdf", f, "application/pdf")}
    make_api_request("POST", "/artifacts/upload", files=files)
  • Success Response (201 Created):
    { "artifact": { "id": "art_12345678-1234-1234-1234-123456789abc", "filename": "doc.pdf", "status": "processing", "upload_date": "2025-09-30T10:00:00Z" }, "credits_consumed": 5 }
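
Since an upload returns while the artifact is still "processing", a client usually polls the artifact details until processing finishes. The sketch below assumes "ready" and "failed" are the terminal statuses, based on the filter values listed for the list endpoint; wait_for_artifact, its parameters, and the injected fetch_artifact callable are hypothetical.

```python
import time
from typing import Callable

# Assumed terminal statuses, taken from the list endpoint's filter examples.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_artifact(
    fetch_artifact: Callable[[str], dict],
    artifact_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll artifact details until a terminal status is seen.

    fetch_artifact(artifact_id) is a caller-supplied function returning
    the parsed JSON body of a details request.
    """
    for attempt in range(max_attempts):
        body = fetch_artifact(artifact_id)
        status = body.get("artifact", {}).get("status")
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"Artifact {artifact_id} still not finished after {max_attempts} checks"
    )
```

Each details request costs 1 credit, so a longer polling interval trades responsiveness for lower credit usage.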

POST /{artifact_id}/cancel - Cancel Artifact Processing

  • Description: Cancels an artifact stuck in the ‘processing’ state.
  • Credit Cost: 1 (partial refund may apply for unused processing)
  • Path Parameters:
    • artifact_id (string, required): The unique ID of the artifact.
  • Use Case: Provide a way for users or administrators to stop a processing job that seems to be taking too long, preventing it from consuming further resources.
ARTIFACT_ID = "art_12345678-1234-1234-1234-123456789abc"
make_api_request("POST", f"/artifacts/{ARTIFACT_ID}/cancel")
  • Success Response (200 OK):
    { "status": "cancelled", "message": "Processing job has been cancelled.", "credits_consumed": 1, "refunded_credits": 2 }