Documentation — DeepTrace

Official technical documentation for DeepTrace v1.0.0 — Deterministic Research Engine. This document covers architecture, usage, and technical specifications.

What is DeepTrace?

DeepTrace is a deterministic web research engine designed for systematic, reproducible exploration of web content. It operates without AI, LLMs, or machine learning components in version 1.

The engine follows rule-based procedures to generate structured question trees, inspect web pages methodically, and produce verifiable research outputs. Every operation is traceable and reproducible: identical inputs yield identical outputs.

Version 1 establishes a foundation of deterministic research, prioritizing transparency and control over convenience. The system is designed for researchers who require consistent, verifiable results rather than probabilistic or conversational interaction.

What DeepTrace is NOT

Not a chatbot: DeepTrace does not engage in conversational interaction. It follows deterministic research workflows with structured inputs and outputs.

Not an AI assistant: The engine contains no neural networks, language models, or machine learning algorithms in version 1. All operations are rule-based.

Not a search engine: While it uses web content, DeepTrace performs active research by visiting and analyzing pages rather than indexing and ranking links.

Not a scraper for illegal content: The tool is designed for legitimate research purposes. Users must comply with all applicable laws and website terms of service.

How DeepTrace v1 Works

The research process follows a deterministic sequence:

  1. CLI-based workflow: All interaction occurs through a command-line interface.
  2. Deterministic question generation: Research topics are decomposed into structured question trees using rule-based algorithms.
  3. Structured page inspection: Web pages are visited and analyzed according to predefined inspection patterns.
  4. Controlled research depth: Exploration depth is configurable and follows predictable branching patterns.
  5. JSON-based outputs: All findings are stored as structured JSON data with source attribution.

The engine maintains a research log documenting every operation, allowing complete traceability from input to output.
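The deterministic question-generation step can be illustrated with a minimal rule-based sketch. The templates and function names below are illustrative, not the actual questioner.js API; the point is that the same topic always produces the same tree:

```javascript
// Minimal sketch of rule-based question decomposition. The template
// list is an assumption for illustration; the real questioner.js may
// use different rules, but the principle is the same: identical input
// always yields an identical tree, with no randomness involved.
const TEMPLATES = [
  "What is {topic}?",
  "How does {topic} work?",
  "What are the limitations of {topic}?",
];

function buildQuestionTree(topic, depth = 1) {
  return {
    topic,
    questions: TEMPLATES.map((t) => ({
      text: t.replace("{topic}", topic),
      // Controlled research depth: branch predictably until depth runs out.
      children: depth > 1 ? buildQuestionTree(topic, depth - 1).questions : [],
    })),
  };
}

const tree = buildQuestionTree("WebAssembly", 2);
```

Because the function is pure, running it twice on the same topic produces byte-identical JSON, which is what makes the research log verifiable end to end.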

Installation

DeepTrace requires Node.js and runs entirely locally without cloud services:

# Clone the repository
git clone https://github.com/ximantaxyz/deeptrace.git
cd deeptrace

# Install dependencies
npm install

# Verify installation
node cli.js --version

System requirements: Node.js 16+ with npm. No additional services or APIs are required for operation.

Note: The engine operates within your local environment. All data remains on your machine unless explicitly shared.

Running the Engine

Start the research engine from the command line:

node cli.js

The engine will prompt for three inputs:

  1. Research topic: The primary subject for investigation.
  2. Optional instruction: Specific constraints or focus areas for the research.
  3. Optional URLs: Starting points for web exploration (if omitted, the engine will begin with search results).

Once initiated, the engine proceeds step-by-step:

  1. Generates a question tree based on the topic
  2. Systematically visits relevant web pages
  3. Extracts and verifies information
  4. Stores findings in a structured format
  5. Produces a synthesis output

The process is fully logged and can be interrupted or resumed at any point.
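Because every operation is logged, resuming an interrupted run reduces to replaying the log and skipping completed steps. A simplified sketch of that idea (the log-line format and step shape here are hypothetical, not the actual research_log.txt format):

```javascript
// Hypothetical resume logic: each log line starts with a step id,
// so completed steps can be filtered out of the planned sequence.
// The real research_log.txt format may differ; this shows the idea.
function resume(plannedSteps, logLines) {
  const done = new Set(logLines.map((line) => line.split(" ")[0]));
  return plannedSteps.filter((step) => !done.has(step.id));
}

const steps = [
  { id: "q1", action: "visit", url: "https://example.com/a" },
  { id: "q2", action: "visit", url: "https://example.com/b" },
  { id: "q3", action: "extract", url: "https://example.com/b" },
];

// Log from an interrupted session: q1 and q2 already completed.
const log = [
  "q1 visited https://example.com/a",
  "q2 visited https://example.com/b",
];

const remaining = resume(steps, log);
```

Since the step sequence is deterministic, the resumed run continues exactly where the interrupted one stopped, with no re-fetching of already-visited pages.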

Output Files

DeepTrace generates several output files in the outputs/ directory:

  • research_data.json: Structured data containing all extracted information with source URLs and timestamps.
  • research_log.txt: Complete log of all operations including page visits, extractions, and decisions.
  • synthesis_output.json: Consolidated research findings organized by the question tree structure.
  • question_tree.json: The generated question structure that guided the research process.

All files use consistent naming conventions with timestamps to allow multiple research sessions:

outputs/
├── research_20240128_152430_data.json
├── research_20240128_152430_log.txt
├── research_20240128_152430_synthesis.json
└── research_20240128_152430_questions.json

Outputs are designed for further analysis, integration with other tools, or manual examination.
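Downstream tools can consume these files directly. The sketch below parses a research_data.json-style record and groups findings by source URL; the field names are illustrative assumptions, since the actual schema is defined by storage.js:

```javascript
// Illustrative record shape -- the real research_data.json schema is
// whatever storage.js emits; these field names are assumptions.
const raw = JSON.stringify([
  {
    question: "What is WebAssembly?",
    finding: "A binary instruction format for a stack-based VM.",
    source: "https://webassembly.org/",
    timestamp: "2024-01-28T15:24:30Z",
  },
]);

const records = JSON.parse(raw);

// Group findings by source URL for further analysis.
const bySource = {};
for (const r of records) {
  (bySource[r.source] ||= []).push(r.finding);
}
```

In practice `raw` would come from `fs.readFileSync` on a file in outputs/; a literal string is used here so the example is self-contained.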

Folder Structure

The project follows a modular architecture with clear separation of concerns:

deeptrace/
├── cli.js                    # Command-line interface entry point
├── questioner.js             # Deterministic question tree generation
├── refiner.js                # Question refinement and optimization
├── inspector.js              # Web page inspection and extraction
├── storage.js                # Structured data storage and retrieval
├── synthesizer.js            # Research synthesis and output generation
├── outputs/                  # Generated research files
├── package.json              # Dependencies and project configuration
└── README.md                 # Project overview and quick start

Each module has a single responsibility and can be examined, tested, or modified independently. The architecture supports deterministic behavior through predictable module interactions.
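The module layout suggests a simple linear flow. A toy sketch of that pipeline, with pure-function stubs standing in for the real modules (the names mirror the file names but are not the actual exports):

```javascript
// Toy pipeline mirroring the module layout. Each stage is a pure
// function, so the composed chain is deterministic. These stubs are
// illustrative stand-ins, not the real module exports.
const questioner = (topic) => [`What is ${topic}?`];
const refiner = (questions) => questions.map((q) => q.trim());
const inspector = (questions) =>
  questions.map((q) => ({ question: q, finding: "(extracted text)" }));
const synthesizer = (results) => ({
  findings: results,
  count: results.length,
});

function runPipeline(topic) {
  return synthesizer(inspector(refiner(questioner(topic))));
}

const out = runPipeline("DeepTrace");
```

Keeping each stage a pure transformation is what lets the modules be tested independently while the end-to-end run stays reproducible.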

Limits of v1

Version 1 has intentional design boundaries:

  • No AI reasoning: All decisions are rule-based without neural network inference.
  • No natural language summaries: Outputs are structured data, not narrative text.
  • Deterministic only: Probabilistic or generative approaches are excluded.
  • Foundational design: The architecture prioritizes reliability over complexity.

These limits ensure consistent, verifiable operation at the expense of flexibility and conversational interaction. Version 1 serves as a stable baseline for deterministic web research.

Future Direction

While v1 establishes deterministic operation, future versions may explore:

  • v2 with AI-assisted synthesis: Optional integration of language models for summary generation while maintaining deterministic core processes.
  • Enhanced research patterns: Additional deterministic algorithms for different research methodologies.
  • Extended output formats: Support for different data structures and export options.

Version 1 will remain available as a stable, deterministic baseline. Any future development will maintain transparency and reproducibility as core principles, with clear documentation of any probabilistic components.

Safety & Ethics

DeepTrace is a tool that requires responsible usage:

  • User responsibility: Researchers are responsible for compliance with all applicable laws and regulations.
  • Respect website terms: The engine should only be used in accordance with the terms of service of visited websites.
  • No automated abuse: The tool includes rate limiting and respectful crawling patterns, but users must ensure their research does not disrupt web services.
  • Ethical research: DeepTrace should be used for legitimate research purposes with respect for privacy and intellectual property.
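A respectful crawling delay can be as simple as enforcing a minimum interval between successive requests. A minimal sketch of that pattern (the interval value and function names are illustrative, not the engine's actual policy):

```javascript
// Minimal fixed-interval rate limiter: guarantees at least
// `minIntervalMs` between successive calls. Illustrative only --
// the engine's real crawling policy may be more sophisticated.
function makeRateLimiter(minIntervalMs) {
  let nextAllowed = 0;
  return async function limited(fn) {
    const now = Date.now();
    const wait = Math.max(0, nextAllowed - now);
    nextAllowed = Math.max(now, nextAllowed) + minIntervalMs;
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
    return fn();
  };
}

// Usage: wrap each page fetch so bursts are spread out.
const limit = makeRateLimiter(1000); // at most ~1 request per second
```

A fixed interval is the simplest respectful default; per-host limits or honoring `Retry-After` headers would be natural refinements.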

The open-source nature of the project enables community review and supports transparent, accountable tool development.