# AGENTS.md

## Directory Structure

```
arxiv/
├── AGENTS.md
├── scripts/
│   ├── download_all.py                          # Downloads all info for a paper in one step
│   ├── download_arxiv_pdf.py                    # Downloads PDF from arXiv
│   ├── download_inspire_abstract.py             # Downloads abstract from INSPIRE-HEP by recid
│   ├── download_inspire_bib.py                  # Downloads paper BibTeX from INSPIRE-HEP
│   ├── download_inspire_references.py           # Downloads references from INSPIRE-HEP
│   ├── download_inspire_citations.py            # Downloads citations from INSPIRE-HEP
│   ├── get_all_paper_arxiv_id.py                # Lists arXiv IDs for all downloaded papers
│   ├── get_arxiv_id_from_bibtex.py              # Extracts arXiv IDs from BibTeX files
│   ├── merge_all_paper_info.py                  # Merges abstracts, refs, and citations
│   ├── refresh_all_citations.py                 # Re-downloads citations for all papers
│   ├── convert_bibtex_to_markdown.py            # Converts .bib files to numbered .md lists
│   ├── convert_bibtex_to_html.py                # Converts .bib files to numbered .html lists
│   └── inspire-hep-api.md                       # INSPIRE-HEP REST API documentation
└── YYYY/
    └── ARXIV_ID/            # Subfolder by year/arxiv_id (e.g., 2026/2604.20797/)
        ├── {id}.pdf              # Downloaded paper PDF
        ├── {id}.json             # Raw INSPIRE-HEP JSON record
        ├── {id}_abstract.md      # Metadata: title, authors, date, subjects, abstract
        ├── {id}_abstract.html    # Same metadata in HTML format
        ├── {id}_bib.bib          # BibTeX entry for this paper only
        ├── {id}_references.bib   # References in BibTeX format
        ├── {id}_citations.bib    # Citations in BibTeX format
        ├── {id}_references.md    # Numbered list of all references
        ├── {id}_references.html  # Numbered list of all references in HTML
        ├── {id}_citations.md     # Numbered list of all citations
        └── {id}_citations.html   # Numbered list of all citations in HTML
```

Each paper's files go in a `YYYY/ARXIV_ID/` folder (e.g., `2604.20797` → `2026/2604.20797/2604.20797.pdf` and `2026/2604.20797/2604.20797_abstract.md`). The year comes from the first two digits of the arXiv ID (`26` → `2026`).
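The mapping from a new-style ID to its folder and file paths is mechanical. Below is a minimal sketch of that mapping, assuming IDs of the form `YYMM.NNNN` or `YYMM.NNNNN` with no version suffix. The helpers `paper_dir` and `paper_file` are hypothetical names for illustration; they are not functions in `scripts/`.

```python
import re
from pathlib import Path

# Hypothetical helper: new-style IDs are YYMM.NNNN or YYMM.NNNNN;
# the folder year is 20YY (new-style IDs start in 2007).
NEW_STYLE_ID = re.compile(r"^(\d{2})\d{2}\.\d{4,5}$")


def paper_dir(arxiv_id: str, root: Path = Path(".")) -> Path:
    """Return the YYYY/ARXIV_ID/ folder for a new-style arXiv ID."""
    match = NEW_STYLE_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    year = 2000 + int(match.group(1))  # "26" -> 2026
    return root / str(year) / arxiv_id


def paper_file(arxiv_id: str, suffix: str, root: Path = Path(".")) -> Path:
    """Return a file path inside the paper folder, e.g. suffix '_abstract.md'."""
    return paper_dir(arxiv_id, root) / f"{arxiv_id}{suffix}"
```

For example, `paper_file("2604.20797", ".pdf")` yields `2026/2604.20797/2604.20797.pdf`. Legacy IDs raise `ValueError` here because they need the separate handling described next.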

### Old-style arXiv IDs

Some older papers use the legacy format `{category}/{YYMMNNN}` (e.g., `hep-lat/9810026`). For these:

- **Folder:** Take the year from the first two digits of the numeric part (`98` → `1998`) and use the filename-safe ID as the folder name: `hep-lat/9810026` → `1998/hep-lat_9810026/`
- **Filenames:** Replace `/` with `_` in the ID — `hep-lat/9810026` → `hep-lat_9810026.pdf`, `hep-lat_9810026_abstract.md`, etc.
- **URLs:** The old-style ID works directly with arXiv URLs (`https://arxiv.org/abs/hep-lat/9810026`)
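The three rules for legacy IDs can be combined into a small sketch. It assumes that two-digit years of 91 or higher mean 19YY and lower values mean 20YY. arXiv started in 1991 and legacy IDs ended in 2007, so the two ranges do not overlap. `safe_id`, `old_style_dir`, `old_style_file`, and `abs_url` are hypothetical helper names.

```python
import re
from pathlib import Path

# Hypothetical helper for legacy IDs: {category}/{YYMMNNN}, e.g. hep-lat/9810026.
OLD_STYLE_ID = re.compile(r"^([a-z-]+(?:\.[A-Za-z-]+)?)/(\d{2})(\d{5})$")


def safe_id(arxiv_id: str) -> str:
    """Filename-safe form of the ID: '/' becomes '_'."""
    return arxiv_id.replace("/", "_")


def old_style_dir(arxiv_id: str, root: Path = Path(".")) -> Path:
    match = OLD_STYLE_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not an old-style arXiv ID: {arxiv_id!r}")
    yy = int(match.group(2))
    year = 1900 + yy if yy >= 91 else 2000 + yy  # 98 -> 1998, 06 -> 2006
    return root / str(year) / safe_id(arxiv_id)


def old_style_file(arxiv_id: str, suffix: str, root: Path = Path(".")) -> Path:
    return old_style_dir(arxiv_id, root) / f"{safe_id(arxiv_id)}{suffix}"


def abs_url(arxiv_id: str) -> str:
    # arXiv URLs take the original ID, slash included.
    return f"https://arxiv.org/abs/{arxiv_id}"
```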

## Actions

### List recent hep-lat papers
Fetch the arXiv recent-submissions listing for hep-lat at `https://arxiv.org/list/hep-lat/recent` and summarize each listed paper: title, authors, subjects, and a brief description.
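The listing page is HTML meant for browsers. arXiv's public export API returns the same kind of information as Atom XML, which is easier to parse. The sketch below uses that API instead of the listing page. Its results are ordered by submission date and may not match the `/recent` page entry for entry. `recent_query_url`, `parse_entries`, and `fetch_recent` are hypothetical names, and only `fetch_recent` touches the network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API


def recent_query_url(category: str = "hep-lat", max_results: int = 25) -> str:
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": str(max_results),
    }
    return "https://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml: bytes) -> list[dict]:
    """Extract id, title, authors, subjects, and abstract from an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            # Entry ids look like http://arxiv.org/abs/2604.20797v1
            "id": entry.findtext(f"{ATOM}id", "").rsplit("/abs/", 1)[-1],
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "authors": [a.findtext(f"{ATOM}name", "") for a in entry.findall(f"{ATOM}author")],
            "subjects": [c.get("term") for c in entry.findall(f"{ATOM}category")],
            "summary": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
        })
    return papers


def fetch_recent(category: str = "hep-lat", max_results: int = 25) -> list[dict]:
    with urllib.request.urlopen(recent_query_url(category, max_results)) as resp:
        return parse_entries(resp.read())
```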

### Download an arXiv paper and its abstract
Given an arXiv ID (e.g., `2604.20797`):

1. **Download PDF**: Run `python scripts/download_arxiv_pdf.py {id}` to fetch from arXiv and save to `{YYYY}/{ARXIV_ID}/{id}.pdf`
2. **Fetch abstract from INSPIRE-HEP**: Run `python scripts/download_inspire_abstract.py {id}` to fetch metadata and save to `{YYYY}/{ARXIV_ID}/{id}_abstract.md`, `{YYYY}/{ARXIV_ID}/{id}_abstract.html`, and `{YYYY}/{ARXIV_ID}/{id}.json`
3. **Fetch paper BibTeX**: Run `python scripts/download_inspire_bib.py {id}` to fetch the BibTeX entry for the paper itself from INSPIRE-HEP and save to `{YYYY}/{ARXIV_ID}/{id}_bib.bib`
4. **Fetch references from INSPIRE-HEP**: Run `python scripts/download_inspire_references.py {id}` to save to `{YYYY}/{ARXIV_ID}/{id}_references.bib`
5. **Fetch citations from INSPIRE-HEP**: Run `python scripts/download_inspire_citations.py {id}` to save to `{YYYY}/{ARXIV_ID}/{id}_citations.bib`
6. **Convert references to lists**: Run `python scripts/convert_bibtex_to_markdown.py {YYYY}/{ARXIV_ID}/{id}_references.bib` and `python scripts/convert_bibtex_to_html.py {YYYY}/{ARXIV_ID}/{id}_references.bib` to generate `_references.md` and `_references.html`
7. **Convert citations to lists**: Run `python scripts/convert_bibtex_to_markdown.py {YYYY}/{ARXIV_ID}/{id}_citations.bib` and `python scripts/convert_bibtex_to_html.py {YYYY}/{ARXIV_ID}/{id}_citations.bib` to generate `_citations.md` and `_citations.html`
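The seven steps form a fixed sequence of commands. Below is a minimal driver sketch, assuming it runs from the repository root and that the caller already knows the paper's `YYYY/ARXIV_ID/` folder. It also assumes step 3 is handled by `scripts/download_inspire_bib.py`. `build_commands` and `run_all` are hypothetical names. `scripts/download_all.py` is the supported way to do this.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(arxiv_id: str, paper_dir: Path) -> list[list[str]]:
    """Return argv lists for steps 1-7, in order."""
    safe = arxiv_id.replace("/", "_")  # filename-safe form for legacy IDs
    py = sys.executable
    commands = [
        [py, "scripts/download_arxiv_pdf.py", arxiv_id],           # step 1
        [py, "scripts/download_inspire_abstract.py", arxiv_id],    # step 2
        [py, "scripts/download_inspire_bib.py", arxiv_id],         # step 3
        [py, "scripts/download_inspire_references.py", arxiv_id],  # step 4
        [py, "scripts/download_inspire_citations.py", arxiv_id],   # step 5
    ]
    for kind in ("references", "citations"):                       # steps 6 and 7
        bib = str(paper_dir / f"{safe}_{kind}.bib")
        commands.append([py, "scripts/convert_bibtex_to_markdown.py", bib])
        commands.append([py, "scripts/convert_bibtex_to_html.py", bib])
    return commands


def run_all(arxiv_id: str, paper_dir: Path) -> None:
    for argv in build_commands(arxiv_id, paper_dir):
        subprocess.run(argv, check=True)  # stop at the first failing step
```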

Steps 1–5 each have a dedicated script:
- `python scripts/download_arxiv_pdf.py {id}` – download PDF from arXiv
- `python scripts/download_inspire_abstract.py {id}` – fetch abstract and metadata from INSPIRE-HEP
- `python scripts/download_inspire_bib.py {id}` – fetch paper BibTeX from INSPIRE-HEP
- `python scripts/download_inspire_references.py {id}` – fetch references from INSPIRE-HEP
- `python scripts/download_inspire_citations.py {id}` – fetch citations from INSPIRE-HEP
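The INSPIRE-HEP lookups behind these scripts can be written as literature-search URLs. Below is a sketch of plausible queries, assuming these operators: `arxiv:` for the paper itself, `citedby:recid:N` for its references, and `refersto:recid:N` for its citations. It also assumes the `format`, `size`, and `page` parameters. Check these against `scripts/inspire-hep-api.md`, since the scripts may send different queries. `recid` is the record's `control_number` from the JSON record. All function names are hypothetical.

```python
import urllib.parse

# Assumed INSPIRE-HEP literature search endpoint.
INSPIRE_LITERATURE = "https://inspirehep.net/api/literature"


def literature_url(query: str, fmt: str = "json", size: int = 25, page: int = 1) -> str:
    """Build a search URL; fmt is 'json' for metadata or 'bibtex' for BibTeX."""
    params = {"q": query, "format": fmt, "size": size, "page": page}
    return f"{INSPIRE_LITERATURE}?{urllib.parse.urlencode(params)}"


def record_url(arxiv_id: str, fmt: str = "json") -> str:
    # The paper itself: JSON metadata, or its own BibTeX entry with fmt="bibtex".
    return literature_url(f"arxiv:{arxiv_id}", fmt=fmt, size=1)


def references_url(recid: int, page: int = 1) -> str:
    # Records cited BY the paper (its reference list).
    return literature_url(f"citedby:recid:{recid}", fmt="bibtex", page=page)


def citations_url(recid: int, page: int = 1) -> str:
    # Records that REFER TO the paper (its citations).
    return literature_url(f"refersto:recid:{recid}", fmt="bibtex", page=page)
```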

Alternatively, run `python scripts/download_all.py {id}` to perform all of the above in a single command (supports multiple IDs). Use `--force` to re-download even if output files already exist.

### Merge all paper info
Run `python scripts/merge_all_paper_info.py` to combine all `*_abstract.md`, `*_bib.bib`, `*_references.bib`, and `*_citations.bib` files across all `YYYY/ARXIV_ID/` subfolders. The script deduplicates BibTeX entries by citation key, sorts them by arXiv date, and writes eleven output files to the repository root: `all_abstracts.md`, `all_abstracts.html`, `all_papers.bib`, `all_papers.md`, `all_papers.html`, `all_references.bib`, `all_references.md`, `all_references.html`, `all_citations.bib`, `all_citations.md`, and `all_citations.html`.
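Deduplication by citation key works without a full BibTeX parser if every entry starts with `@type{key,` at the beginning of a line. Below is a regex-based sketch of that step under this assumption. The first occurrence of each key is kept. The date sort is not shown. `split_entries` and `merge_unique` are hypothetical names.

```python
import re

# Start of an entry such as "@article{Smith:2024abc," with the key captured.
ENTRY_START = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def split_entries(bib_text: str) -> list[tuple[str, str]]:
    """Split BibTeX text into (citation_key, entry_text) pairs."""
    starts = list(ENTRY_START.finditer(bib_text))
    entries = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        entries.append((match.group(1), bib_text[match.start():end].strip()))
    return entries


def merge_unique(bib_texts: list[str]) -> list[str]:
    """Merge several .bib texts, keeping the first entry seen for each key."""
    seen: dict[str, str] = {}
    for text in bib_texts:
        for key, entry in split_entries(text):
            seen.setdefault(key, entry)
    return list(seen.values())
```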

### List all downloaded paper IDs
Run `python scripts/get_all_paper_arxiv_id.py` to print the arXiv IDs of all downloaded papers (one per line), sorted by date.
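The IDs can be recovered from the folder layout alone. The sketch below walks the four-digit year folders, turns `_` back into `/` for legacy IDs, and sorts oldest first. It approximates "by date" with the year folder plus the `YYMM` and sequence number from the ID. The real script may use the abstract's date instead. `folder_to_id`, `date_key`, and `list_paper_ids` are hypothetical names.

```python
import re
from pathlib import Path

YEAR_DIR = re.compile(r"^\d{4}$")


def folder_to_id(folder_name: str) -> str:
    # Legacy folders use "_" for "/" (hep-lat_9810026 -> hep-lat/9810026);
    # new-style IDs contain no "_" and pass through unchanged.
    return folder_name.replace("_", "/", 1)


def date_key(year: str, arxiv_id: str) -> tuple[int, int, int]:
    # Both ID formats end in YYMM followed by a sequence number.
    digits = arxiv_id.rsplit("/", 1)[-1].replace(".", "")
    return (int(year), int(digits[2:4]), int(digits[4:]))


def list_paper_ids(root: Path = Path(".")) -> list[str]:
    found = []
    for year_dir in root.iterdir():
        if not (year_dir.is_dir() and YEAR_DIR.match(year_dir.name)):
            continue  # skip scripts/ and other non-year folders
        for paper in year_dir.iterdir():
            if paper.is_dir():
                arxiv_id = folder_to_id(paper.name)
                found.append((date_key(year_dir.name, arxiv_id), arxiv_id))
    return [arxiv_id for key, arxiv_id in sorted(found)]
```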

### Extract arXiv IDs from BibTeX files
Run `python scripts/get_arxiv_id_from_bibtex.py refs1.bib refs2.bib` to extract arXiv IDs from one or more `.bib` files and print them to stdout (one per line). Entries are deduplicated by citation key and sorted by date (most recent first). Entries without an `eprint` field are silently skipped.
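The extraction rule can be sketched with two regular expressions: one finds entry starts and keys, and one finds the `eprint` field inside each entry. The sketch keeps the first `eprint` seen per key and silently skips entries without one. It sorts newest first by a date derived from the arXiv ID itself, which is an assumption about what "by date" means here. `arxiv_ids_from_bibtex` and `id_date_key` are hypothetical names.

```python
import re

ENTRY_START = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)
EPRINT = re.compile(r'\beprint\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)


def id_date_key(arxiv_id: str) -> tuple[int, int, int]:
    """(year, month, sequence) derived from a new- or old-style arXiv ID."""
    digits = arxiv_id.rsplit("/", 1)[-1].replace(".", "")
    yy = int(digits[:2])
    if "/" in arxiv_id:  # legacy IDs span 1991-2007
        year = 1900 + yy if yy >= 91 else 2000 + yy
    else:
        year = 2000 + yy
    return (year, int(digits[2:4]), int(digits[4:]))


def arxiv_ids_from_bibtex(bib_texts: list[str]) -> list[str]:
    by_key: dict[str, str] = {}
    for text in bib_texts:
        starts = list(ENTRY_START.finditer(text))
        for i, match in enumerate(starts):
            end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
            eprint = EPRINT.search(text, match.end(), end)  # search this entry only
            if eprint:  # entries without eprint are skipped
                by_key.setdefault(match.group(1), eprint.group(1))
    return sorted(by_key.values(), key=id_date_key, reverse=True)
```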

### Refresh all citations
Run `python scripts/refresh_all_citations.py` to re-download citations from INSPIRE-HEP for all papers that already have a `_citations.bib` file. Skips papers where the citation count has not changed. Use `--force` to re-download regardless of count. After refreshing, merges all citation files into `all_citations.bib`, `all_citations.md`, and `all_citations.html` at the repository root.
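The "skip if the count has not changed" check compares a local count with a remote one. Below is a sketch under two assumptions. First, the local count is the number of entries in `_citations.bib`. Second, the remote count is the `citation_count` field of the INSPIRE literature record. These two numbers can legitimately differ, for example if some citing records do not export, so the real script's comparison may work differently. Function names are hypothetical, and only `remote_citation_count` needs the network.

```python
import json
import re
import urllib.parse
import urllib.request
from pathlib import Path


def local_citation_count(bib_path: Path) -> int:
    # Count BibTeX entries: lines that begin with "@type{".
    text = bib_path.read_text(encoding="utf-8")
    return len(re.findall(r"^@\w+\s*\{", text, flags=re.MULTILINE))


def remote_citation_count(arxiv_id: str) -> int:
    # Assumed endpoint and fields; returns 0 if INSPIRE has no matching record.
    params = urllib.parse.urlencode({"q": f"arxiv:{arxiv_id}", "fields": "citation_count"})
    with urllib.request.urlopen(f"https://inspirehep.net/api/literature?{params}") as resp:
        hits = json.load(resp)["hits"]["hits"]
    return hits[0]["metadata"]["citation_count"] if hits else 0


def should_refresh(local: int, remote: int, force: bool = False) -> bool:
    # --force always refreshes; otherwise refresh only when the counts differ.
    return force or local != remote
```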

### INSPIRE-HEP API documentation
See `scripts/inspire-hep-api.md` for detailed documentation of the INSPIRE REST API, including endpoints, query syntax, output formats, rate limiting, and citation/reference export.

## Code Style

- Python function names should not start with `_`.
