How It Works
We believe in transparent, understandable systems. This diagram shows the core data flow:
[ GitHub Repo ]
        |
        | (1) Polls for changes (git push, PR)
        |
[ Polling Dispatcher ]
        |
        | (2) Enqueues job
        |
[ SQLite Job Queue ]
        |
        | (3) Dequeues job
        |
[ HorseCI Runner ] --- (git worktree) ---> [ Local Cache ]
        |
        | (4) Reports status/logs
        |
[ GitHub Checks API ]
HorseCI's architecture prioritizes simplicity and robustness through deliberate technical trade-offs. Our goal is a system that is both fast and easy to reason about.
* Polling Dispatcher: For our initial implementation, we chose polling over webhooks for its implementation simplicity and reliability guarantees. While this introduces some latency (typically 10-30 seconds), it ensures no events are missed without the complexity of managing a webhook consumer service with retries and dead-letter queues. For side projects and small teams, this trade-off favors simplicity over sub-second trigger latency.
* SQLite Job Queue: A simple, robust, embedded database manages the job queue on each host. This avoids the operational complexity and network latency of a separate database server or message broker for this core task. We acknowledge this creates a single point of failure for the dispatcher—if the host fails, queued jobs are lost. Runners already executing jobs will complete normally. For our current scale and target use case (side projects, small teams), this is an acceptable trade-off. We plan to add replication for higher availability as we grow.
* Isolated git worktree Checkouts: Instead of fresh clones, we use git worktree to create clean, isolated filesystem environments for each job from a single shared bare repository. Because worktrees share one object store, they are significantly faster to create than a full clone and use far less disk, especially for large repositories.
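To make the polling trade-off concrete, here is a minimal sketch of a dispatcher loop. The function and parameter names (poll_for_changes, fetch_head, enqueue) are illustrative, not HorseCI's actual API; in practice fetch_head would wrap something like `git ls-remote` against the repository.

```python
# Minimal polling-dispatcher sketch, assuming fetch_head returns the
# remote HEAD SHA (or None on a transient failure). Names are illustrative.
import time

def poll_for_changes(fetch_head, enqueue, interval=15.0, iterations=None, sleep=time.sleep):
    """Enqueue a job whenever the remote HEAD SHA changes."""
    last_seen = None
    n = 0
    while iterations is None or n < iterations:
        head = fetch_head()            # e.g. SHA parsed from `git ls-remote origin HEAD`
        if head is not None and head != last_seen:
            enqueue(head)              # hand the new commit to the job queue
            last_seen = head
        sleep(interval)                # polling latency: a job can start up to `interval` late
        n += 1
```

The `interval` parameter is the source of the 10-30 second latency discussed above: shortening it trades more API/network traffic for faster job starts.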
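The job-queue idea can be sketched with nothing but the standard library. This is an illustrative schema (a `jobs` table with `payload` and `status` columns), not HorseCI's actual one; the key point is that `BEGIN IMMEDIATE` gives an atomic claim so two runners can never dequeue the same job.

```python
# Embedded SQLite job queue sketch. Table/column names are illustrative.
import sqlite3

def connect(path=":memory:"):
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit; we issue BEGIN ourselves
    conn.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            status  TEXT NOT NULL DEFAULT 'queued'
        )""")
    return conn

def enqueue(conn, payload):
    return conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid

def dequeue(conn):
    """Atomically claim the oldest queued job; returns (id, payload) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the queue lives in a file on the dispatcher host, durability follows SQLite's journaling, which is also why a host failure loses only queued (not running) jobs, as noted above.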
Limitations & Future Work
We are upfront about our current limitations:
Single Point of Failure: The polling dispatcher currently runs on a single host. If it fails, new jobs will not be scheduled until it recovers. Runners already executing jobs will complete normally. This is acceptable for side projects but not for mission-critical enterprise CI. We are working on a replicated dispatcher design.
Polling Latency: Jobs start 10-30 seconds after the triggering event. This is fine for most development workflows but not suitable for urgent production hotfixes requiring immediate CI feedback.
No Horizontal Scaling (Yet): The current architecture is designed for single-tenant, small-to-medium scale. We will add horizontal scaling as we approach capacity limits.
Runner Performance
Our runners are designed for maximum speed and efficiency.
- git worktree: Instead of performing a full git clone for every job, we maintain a single bare repository for your project. Each job runs in a clean, isolated git worktree, which is significantly faster to create and tear down.
- Workspace Caching: Artifacts and dependencies can be cached intelligently between runs of the same branch or PR, further speeding up build and test cycles.
- Performance-Aware Concurrency: We monitor I/O performance on runner hosts using Linux's /proc/diskstats. If the 5-second rolling average of await (the average I/O wait time, in ms) exceeds 20 ms, we pause new job scheduling until the backlog clears. This prevents the "thundering herd" problem when many jobs start simultaneously and overwhelm the storage subsystem.
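The await check above can be sketched from two /proc/diskstats samples, using iostat's definition: (Δ time spent reading + Δ time spent writing) / (Δ reads completed + Δ writes completed). Field positions follow the kernel's documented /proc/diskstats format; the function names and 20 ms threshold here mirror the description above but are otherwise illustrative.

```python
# Compute per-I/O await from two /proc/diskstats samples (iostat's formula).

def parse_diskstats(text, device):
    """Return (ios_completed, ms_spent_on_io) for one device."""
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 11 and f[2] == device:
            reads, read_ms = int(f[3]), int(f[6])     # fields 4 and 7: reads, ms reading
            writes, write_ms = int(f[7]), int(f[10])  # fields 8 and 11: writes, ms writing
            return reads + writes, read_ms + write_ms
    raise ValueError(f"device {device!r} not found")

def average_await(sample_before, sample_after, device):
    """Average ms per completed I/O between two samples; None if no I/O occurred."""
    ios0, ms0 = parse_diskstats(sample_before, device)
    ios1, ms1 = parse_diskstats(sample_after, device)
    d_ios = ios1 - ios0
    if d_ios == 0:
        return None
    return (ms1 - ms0) / d_ios

def should_pause_scheduling(await_ms, threshold_ms=20.0):
    """Pause new job starts while average await exceeds the threshold."""
    return await_ms is not None and await_ms > threshold_ms
```

A production version would sample every second and feed `average_await` into a 5-second rolling window before comparing against the threshold.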