---
phase: 01-poc
plan: "01"
subsystem: scaffold
tags:
  - scaffolding
  - postgres
  - alembic
  - sqlalchemy
  - uv
dependency_graph:
  requires: []
  provides:
    - govcrawler.models.Base
    - govcrawler.models.Site
    - govcrawler.models.Column_
    - govcrawler.models.Article
    - govcrawler.models.Attachment
    - govcrawler.models.CrawlLog
    - govcrawler.db.get_engine
    - govcrawler.db.get_sessionmaker
    - govcrawler.settings.get_settings
    - alembic migration 0001_initial (5 tables)
  affects: []
tech_stack:
  added:
    - patchright==1.58.2 (Apache-2.0)
    - parsel==1.11.0 (BSD-3)
    - GeneralNewsExtractor==0.1.3 (MIT)
    - trafilatura==2.0.0 (Apache-2.0)
    - SQLAlchemy==2.0.49 (MIT)
    - alembic==1.18.4 (MIT)
    - psycopg==3.3.3 / psycopg-binary==3.3.3 (LGPL-3, dynamic link)
    - simhash==2.1.2 (MIT)
    - httpx==0.28.1 (BSD-3)
    - pydantic-settings==2.14.0 (MIT)
    - python-slugify==8.0.4 (MIT)
    - tldextract==5.3.1 (BSD-3)
    - lxml==5.4.0 (BSD-3)
  patterns:
    - SQLAlchemy 2.0 DeclarativeBase + Mapped[] typed columns
    - Alembic hand-written migration (no autogenerate) aligned 1:1 with ORM
    - pydantic-settings BaseSettings reading from .env file
    - Docker Compose single-service postgres:16-alpine on host port 5433
key_files:
  created:
    - pyproject.toml
    - uv.lock
    - .env.example
    - .gitignore
    - docker-compose.yml
    - README.md
    - govcrawler/__init__.py
    - govcrawler/settings.py
    - govcrawler/db.py
    - govcrawler/models.py
    - alembic.ini
    - alembic/env.py
    - alembic/script.py.mako
    - alembic/versions/0001_initial_schema.py
  modified: []
decisions:
  - "Port 5433 used instead of 5432 (plan default): host port 5432 occupied by existing postgres_local container"
  - "DB_URL in .env.example updated to localhost:5433 accordingly"
  - "LGPL-3 psycopg[binary] accepted per user decision in CONTEXT.md additional_locked_decisions"
  - "Hand-wrote migration (not autogenerate) for deterministic column order and alignment with ORM"
metrics:
  duration: "42 minutes 18 seconds (dominated by patchright 39MB + postgres:16-alpine 109MB downloads)"
  completed: "2026-04-22"
  tasks_completed: 2
  tasks_total: 2
  files_created: 14
---

# Phase 1 Plan 01: Project Scaffold + PG Schema Summary

**One-liner:** uv 项目骨架 + psycopg3 同步 + SQLAlchemy 2.0 ORM + Alembic 迁移，在本地 PG16 容器上建立 sites/columns/articles/attachments/crawl_logs 五张表，upgrade/downgrade 双向验证通过。

## What Was Built

### Task 1: 项目骨架

创建了完整的 Python 项目骨架：

- **pyproject.toml**: PEP 621 格式，13 个运行依赖 + 3 个开发依赖，uv 兼容
- **docker-compose.yml**: 单 `db` 服务，`postgres:16-alpine`，端口 **5433:5432**（宿主机 5432 已被 `postgres_local` 占用）
- **.env.example**: 含 `DB_URL=postgresql+psycopg://govcrawler:govcrawler@localhost:5433/govcrawler`
- **govcrawler/settings.py**: pydantic-settings v2 `BaseSettings`，从 `.env` 读取 `DB_URL / DATA_DIR / USER_AGENT`
- **govcrawler/db.py**: `get_engine()` + `get_sessionmaker()` 同步工厂（psycopg3）

### Task 2: ORM 模型 + Alembic 迁移

- **govcrawler/models.py**: SQLAlchemy 2.0 DeclarativeBase 风格，5 个 ORM 类（Site, Column_, Article, Attachment, CrawlLog），完整 `Mapped[...]` 类型标注
- **alembic/versions/0001_initial_schema.py**: 手写迁移，创建全部 5 张表 + 6 个索引，downgrade 按依赖逆序 drop

## Actual Dependency Versions (from uv.lock)

| Package | Resolved Version | License |
|---------|-----------------|---------|
| patchright | 1.58.2 | Apache-2.0 |
| parsel | 1.11.0 | BSD-3 |
| GeneralNewsExtractor | 0.1.3 | MIT |
| trafilatura | 2.0.0 | Apache-2.0 |
| SQLAlchemy | 2.0.49 | MIT |
| alembic | 1.18.4 | MIT |
| psycopg / psycopg-binary | 3.3.3 | LGPL-3 (dynamic) |
| simhash | 2.1.2 | MIT |
| httpx | 0.28.1 | BSD-3 |
| pydantic-settings | 2.14.0 | MIT |
| python-slugify | 8.0.4 | MIT |
| tldextract | 5.3.1 | BSD-3 |
| lxml | 5.4.0 | BSD-3 |
| **Total** | **54 packages** | |

## Migration Details

- **Path:** `alembic/versions/0001_initial_schema.py`
- **Revision ID:** `0001_initial`
- **down_revision:** `None` (root migration)
- **Tables created (5):** sites, columns, articles, attachments, crawl_logs
- **Indexes created (6):** ix_sites_site_id, ix_columns_site_col, ix_articles_url_hash, ix_articles_site_col_pub, ix_articles_exported, ix_attachments_file_hash

## Verified End-to-End Commands

```bash
# 1. Install dependencies (54 packages, ~1 min on good network)
uv sync --dev

# 2. Copy env file (DB_URL points to port 5433)
cp .env.example .env

# 3. Start PostgreSQL 16 on port 5433
docker compose up -d db
# -> website-db-1  postgres:16-alpine  Up (healthy)  0.0.0.0:5433->5432/tcp

# 4. Create schema
uv run alembic upgrade head
# -> INFO Running upgrade  -> 0001_initial, Initial schema: sites, columns, articles, attachments, crawl_logs

# 5. Verify 5 tables
uv run python -c "
from sqlalchemy import inspect; from govcrawler.db import get_engine
print(sorted(inspect(get_engine()).get_table_names()))
"
# -> ['alembic_version', 'articles', 'attachments', 'columns', 'crawl_logs', 'sites']

# 6. Test rollback
uv run alembic downgrade base
# -> INFO Running downgrade 0001_initial -> , Initial schema: ...

# 7. Re-apply for Plan 02 readiness
uv run alembic upgrade head
```

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Environment Constraint] Docker host port changed from 5432 to 5433**

- **Found during:** Task 1 (environment notes pre-warned)
- **Issue:** Port 5432 occupied by existing `postgres_local` container on dev machine
- **Fix:** `docker-compose.yml` uses `5433:5432`; `.env.example` `DB_URL` updated to `localhost:5433`
- **Files modified:** `docker-compose.yml`, `.env.example`
- **Impact:** Developer must ensure `.env` uses port 5433; plan documentation noted this explicitly

No other deviations. All other plan instructions executed as-written.

## Known Stubs

None. This plan delivers infrastructure only (schema + config); no data-flow logic exists yet to stub.

## Threat Flags

None. No new network endpoints, auth paths, or trust boundary crossings introduced beyond what the threat model already covers (`.env` gitignore for T-01-01-01; ORM parameterization for T-01-01-02).

## Self-Check: PASSED

All files verified present and committed.

- `pyproject.toml` — commit 19aae31
- `docker-compose.yml` — commit 19aae31
- `govcrawler/models.py` — commit 6a67190
- `alembic/versions/0001_initial_schema.py` — commit 6a67190
- `uv.lock` — commit 6a67190
- `alembic upgrade head` — tables verified in live PG16 container
- `alembic downgrade base` — tables verified dropped
