Implementing ArtifactStore: A Core Service Guide

by Alex Johnson

Overview

This document describes the implementation of a foundational ArtifactStore service that manages the artifacts generated during pipeline execution, following the established WCF service patterns. The service decouples storage implementation from execution logic, adds proper lifecycle management and error handling, and lays the groundwork for future distributed execution.

What

We are creating a service abstraction for artifact storage. This involves defining an abstract interface, implementing a concrete in-memory storage solution, and employing a factory pattern for selecting different backends. This approach mirrors the successful architecture of LLMService. The abstract interface will define the core operations for interacting with the artifact store, while the in-memory implementation provides a simple, yet functional, storage mechanism for initial development and testing. The factory pattern allows for easy swapping of storage backends without modifying the core service logic.
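As a rough illustration of the first two pieces, the sketch below shows an abstract interface with the four core operations, the two exceptions listed under Changes, and a lock-guarded in-memory backend. The method names and the step_id/message parameters follow the usage example in this guide; the type hints, the use of threading.Lock, and the decision to raise ArtifactStoreNotFoundError from get are assumptions, not the actual implementation.

```python
import threading
from abc import ABC, abstractmethod
from typing import Any, Dict


class ArtifactStoreError(Exception):
    """Base class for artifact store failures."""


class ArtifactStoreNotFoundError(ArtifactStoreError):
    """Raised when no artifact is stored for the requested step."""


class ArtifactStore(ABC):
    """Core operations every storage backend is expected to provide."""

    @abstractmethod
    def save(self, step_id: str, message: Any) -> None: ...

    @abstractmethod
    def get(self, step_id: str) -> Any: ...

    @abstractmethod
    def exists(self, step_id: str) -> bool: ...

    @abstractmethod
    def clear(self) -> None: ...


class InMemoryArtifactStore(ArtifactStore):
    """Dict-backed store; a single lock guards every access (assumed design)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def save(self, step_id: str, message: Any) -> None:
        with self._lock:
            self._artifacts[step_id] = message

    def get(self, step_id: str) -> Any:
        with self._lock:
            if step_id not in self._artifacts:
                raise ArtifactStoreNotFoundError(
                    f"No artifact stored for step '{step_id}'"
                )
            return self._artifacts[step_id]

    def exists(self, step_id: str) -> bool:
        with self._lock:
            return step_id in self._artifacts

    def clear(self) -> None:
        with self._lock:
            self._artifacts.clear()
```

Raising a store-specific exception instead of letting KeyError escape keeps callers independent of the fact that this particular backend happens to be a dictionary.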

Why

Currently, the Executor utilizes a local dictionary for artifact storage. This approach suffers from several limitations:

  • It tightly couples the storage implementation to the execution logic.
  • It prevents future distributed execution capabilities.
  • It lacks proper lifecycle management and error handling.

Introducing an ArtifactStore service addresses these limitations and enables a unified connector architecture. Because the Executor depends only on the service interface, a distributed backend can later replace the in-memory one without changes to execution logic. Explicit lifecycle operations and a dedicated exception hierarchy also make storage failures easier to detect and handle than with a bare dictionary.

Goal

The primary goal is to create a clean service abstraction that supports multiple storage backends while maintaining simplicity for the initial in-memory implementation. The service abstraction should be well-defined and easy to understand, allowing developers to quickly integrate it into their pipelines. The in-memory implementation should be lightweight and efficient, providing a practical solution for local development and testing. By focusing on simplicity and flexibility, we can ensure that the ArtifactStore service is both easy to use and adaptable to future needs.

Changes

To achieve this goal, the following changes will be implemented:

  • ArtifactStore abstract base class with core operations (save, get, exists, clear)
  • Exception hierarchy (ArtifactStoreError, ArtifactStoreNotFoundError)
  • ArtifactStoreConfiguration class with environment variable support
  • ArtifactStoreFactory implementing ServiceFactory protocol
  • InMemoryArtifactStore with thread-safe dict-based storage
  • Comprehensive test suite covering configuration, factory, and implementation

The ArtifactStore abstract base class will define the core operations for interacting with the artifact store, providing a consistent interface for all storage backends. The exception hierarchy will provide detailed error information, making it easier to diagnose and resolve issues. The ArtifactStoreConfiguration class will allow users to configure the service using environment variables, providing a flexible and portable configuration mechanism. The ArtifactStoreFactory will be responsible for creating instances of the ArtifactStore based on the configuration, allowing for easy swapping of storage backends. The InMemoryArtifactStore will provide a simple, yet functional, storage mechanism for initial development and testing. Finally, the comprehensive test suite will ensure that the service is robust and reliable.
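The configuration and factory pieces could look roughly like the sketch below. It is only an illustration: the environment variable name ARTIFACT_STORE_BACKEND, the resolve helper, and the ValueError raised for unknown backends are hypothetical, the store class is a stand-in, and the real ServiceFactory protocol may require methods that are not shown here.

```python
import os
from dataclasses import dataclass
from typing import Optional

ENV_BACKEND = "ARTIFACT_STORE_BACKEND"  # hypothetical variable name
DEFAULT_BACKEND = "memory"


@dataclass(frozen=True)
class ArtifactStoreConfiguration:
    backend: str = DEFAULT_BACKEND

    @classmethod
    def resolve(cls, backend: Optional[str] = None) -> "ArtifactStoreConfiguration":
        """Hypothetical helper: explicit value > environment > default."""
        chosen = backend or os.environ.get(ENV_BACKEND) or DEFAULT_BACKEND
        return cls(backend=chosen.lower())


class InMemoryArtifactStore:
    """Stand-in for the real in-memory backend."""

    def __init__(self) -> None:
        self._artifacts = {}


class ArtifactStoreFactory:
    # Registry of known backends; new ones would be added here.
    _backends = {"memory": InMemoryArtifactStore}

    def __init__(self, config: Optional[ArtifactStoreConfiguration] = None) -> None:
        # Fall back to environment/defaults when no config is passed.
        self._config = config or ArtifactStoreConfiguration.resolve()

    def create(self):
        try:
            backend_cls = self._backends[self._config.backend]
        except KeyError:
            raise ValueError(
                f"Unknown artifact store backend: {self._config.backend!r}"
            ) from None
        return backend_cls()
```

With this shape, adding a backend means registering one more class in the factory; code that calls create() stays unchanged.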

Expected Outcome & Usage Example

The expected outcome is a fully functional ArtifactStore service that can be easily integrated into existing pipelines. Here's an example of how to use the service:

from waivern_artifact_store import (
    ArtifactStoreConfiguration,
    ArtifactStoreFactory,
)

# Create store via factory (supports env vars and defaults)
config = ArtifactStoreConfiguration(backend="memory")
factory = ArtifactStoreFactory(config)
store = factory.create()

# Save artifact (output_message is the message produced by the "extract" step)
store.save(step_id="extract", message=output_message)

# Retrieve artifact
message = store.get(step_id="extract")

# Check existence
if store.exists(step_id="extract"):
    pass  # Process artifact

# Cleanup
store.clear()

This example creates an ArtifactStore instance through the factory, saves an artifact, retrieves it, checks that it exists, and clears the store. Because the backend is selected by the configuration, switching storage backends does not change the calling code. The save and get methods are keyed by step ID, the exists method lets callers check for an artifact before attempting to retrieve it, and the clear method removes all stored artifacts so that resources are released once a run is finished.

Testing

The following criteria will be used to verify the quality and reliability of the ArtifactStore service:

  • All quality checks pass
  • All existing and new tests pass
  • Thread safety verified with concurrent operations
  • Configuration tested with layered precedence (explicit > env > defaults)

The quality checks will ensure that the code meets the required coding standards and best practices. The existing and new tests will verify the functionality of the service, ensuring that it behaves as expected. Thread safety will be verified with concurrent operations to ensure that the service can handle multiple requests simultaneously without any data corruption. Configuration will be tested with layered precedence to ensure that the service is configured correctly based on the available configuration sources.
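A concurrency check of the kind described above might be sketched as follows. LockedStore is a stand-in for the real in-memory store, and the count method and hammer helper are hypothetical names introduced for this illustration. A test like this can raise confidence but cannot prove thread safety, since a race may simply fail to show up in a given run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedStore:
    """Stand-in for the in-memory store: a dict guarded by one lock."""

    def __init__(self):
        self._artifacts = {}
        self._lock = threading.Lock()

    def save(self, step_id, message):
        with self._lock:
            self._artifacts[step_id] = message

    def get(self, step_id):
        with self._lock:
            return self._artifacts[step_id]

    def count(self):  # hypothetical helper used only by this check
        with self._lock:
            return len(self._artifacts)


def hammer(store, workers=8, per_worker=200):
    """Write and read back distinct artifacts from several threads at once."""

    def write_then_read(worker):
        for i in range(per_worker):
            step_id = f"step-{worker}-{i}"
            store.save(step_id=step_id, message=(worker, i))
            # Each thread must read back exactly what it wrote.
            assert store.get(step_id=step_id) == (worker, i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the results so any worker exception is re-raised here.
        list(pool.map(write_then_read, range(workers)))
    return store.count()
```

Every worker uses its own step IDs, so the final count should equal workers times per_worker; a lower number would indicate lost writes.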

Documentation

Detailed documentation is available for the ArtifactStore service:

  • Full specification: docs/development/active/artifact-store-service-plan.md
  • Task details: docs/development/active/artifect-store/artifact-store-task-1-core-service.md

The documentation provides a comprehensive overview of the service, including its architecture, design, and usage. It also includes detailed information about the configuration options and the available APIs. The documentation is intended to be a valuable resource for developers who are integrating the ArtifactStore service into their pipelines.

Dependencies

The ArtifactStore service depends on, and is depended on by, the following:

  • Builds on WCF service patterns (LLMService architecture)
  • Foundation for unified connector architecture

The WCF service patterns, already used by LLMService, supply the structure reused here: an abstract service interface, a configuration class, and a factory. The unified connector architecture in turn builds on ArtifactStore to simplify the integration of different storage solutions.

Related

The ArtifactStore service is related to the following initiatives:

  • Part of #227 (Unified Connector Architecture)
  • Enables pipeline artifact management
  • Prerequisite for distributed execution capabilities

The unified connector architecture is a key initiative that aims to simplify the integration of different services and components. The ArtifactStore service plays a crucial role in this architecture by providing a centralized and consistent way to manage pipeline artifacts. This, in turn, enables pipeline artifact management and paves the way for future distributed execution capabilities.
