# Web Scraper Prototype Notes

## Implemented Files

- UI: [scrapper.html](../scrapper.html)
- API: [api/scrape-page.php](../api/scrape-page.php)
- DJI endpoint: [api/getDji.php](../api/getDji.php)
- Requirements: [SCRAPER_REQUIREMENTS.md](./SCRAPER_REQUIREMENTS.md)

## Prototype Flow

1. Enter the target URL.
2. Add one or more capture rules.
3. Click **Load + Scrape**:
   - the backend fetches the page HTML,
   - applies the rules,
   - returns JSON plus sanitized HTML,
   - and the UI renders the extracted values and a preview.
4. Hover over page elements in the preview to inspect their selector and HTML snippet.
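
The "applies rules" step can be pictured with a minimal sketch. This is not the code in `api/scrape-page.php`; it is a Python illustration under stated assumptions: selectors are bare, non-void tag names such as `h1`, only `extract: "text"` is handled, and a rule with `multiple: false` yields the first match (or `None` when nothing matches). The names `TagTextCollector` and `apply_rules` are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text content of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # > 0 while inside a matching element
        self.current = []       # text pieces of the element being read
        self.results = []       # finished text values, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def apply_rules(html, rules):
    """Hypothetical helper: bare tag selectors and extract == "text" only."""
    extracted = {}
    for rule in rules:
        collector = TagTextCollector(rule["selector"])
        collector.feed(html)
        collector.close()
        values = collector.results
        if rule.get("multiple"):
            extracted[rule["name"]] = values
        else:
            # Assumption: a single-value rule returns the first match.
            extracted[rule["name"]] = values[0] if values else None
    return extracted
```

Calling `apply_rules(page_html, rules)` with the rule list from a request returns a dictionary keyed by rule `name`, which is the shape of the `extracted` object in the API contract.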

## DJI Shortcut in Scraper UI

- In the **Target** panel, use **Open DJI JSON Endpoint** to open:
  - `./api/getDji.php?id=a6qja2`
- This returns the current DJI signal payload (`close`, `change`, `pct`) directly.
- The button is a quick way to verify the source before running full scrape rules.
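
The same verification can be scripted. The sketch below assumes the endpoint returns a flat JSON object with `close`, `change`, and `pct` as top-level keys; the actual nesting is not documented here, and `missing_dji_fields` is a hypothetical helper name.

```python
import json

# Field names taken from the payload description above.
REQUIRED_FIELDS = ("close", "change", "pct")


def missing_dji_fields(body):
    """Return the required fields absent from a JSON response body.

    Assumption: the payload is a flat JSON object; anything else is
    treated as missing every field.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in payload]
```

An empty list means the payload carries all three fields; the input passed to the function would be the raw text returned by `./api/getDji.php?id=a6qja2`.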

## API Contract

### Request

`POST api/scrape-page.php`

```json
{
  "url": "https://example.com",
  "includeHtml": true,
  "rules": [
    {
      "name": "pageTitle",
      "type": "css",
      "selector": "h1",
      "extract": "text",
      "attr": "",
      "multiple": false
    }
  ]
}
```
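
A client could assemble this body programmatically. The following is a minimal sketch, not part of the prototype: `make_rule` and `build_scrape_request` are hypothetical helpers, and the client-side checks (http/https URL, at least one rule) are assumptions drawn from the prototype flow rather than documented server behavior.

```python
import json
from urllib.parse import urlparse


def make_rule(name, selector, extract="text", attr="", multiple=False):
    """Build one capture rule using the field names from the request example."""
    return {
        "name": name,
        "type": "css",
        "selector": selector,
        "extract": extract,
        "attr": attr,
        "multiple": multiple,
    }


def build_scrape_request(url, rules, include_html=True):
    """Return the JSON request body as a string, after basic sanity checks."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Invalid URL")
    if not rules:
        raise ValueError("At least one rule is required")
    return json.dumps({"url": url, "includeHtml": include_html, "rules": rules})
```

For example, `build_scrape_request("https://example.com", [make_rule("pageTitle", "h1")])` produces a body equivalent to the JSON above.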

### Response (success)

```json
{
  "success": true,
  "url": "https://example.com",
  "meta": {
    "httpStatus": 200,
    "rulesCount": 1,
    "fetchedAt": "2026-05-23T00:00:00Z"
  },
  "extracted": {
    "pageTitle": "Example Domain"
  },
  "html": "..."
}
```

### Response (error)

```json
{
  "success": false,
  "error": "Invalid URL"
}
```
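
A caller can branch on `success` to handle both response shapes. This is a sketch under the assumption that the two examples above cover every response; `ScrapeError` and `read_scrape_response` are hypothetical names, and the fallback message is illustrative.

```python
import json


class ScrapeError(Exception):
    """Raised when the API reports success == false."""


def read_scrape_response(body):
    """Parse a response body and return (extracted, html).

    html is None when the response carries no "html" field.
    """
    data = json.loads(body)
    if not data.get("success"):
        # Fallback text is an assumption for responses lacking "error".
        raise ScrapeError(data.get("error", "Unknown error"))
    return data["extracted"], data.get("html")
```

With the success example, the function returns `{"pageTitle": "Example Domain"}` and the HTML string; with the error example, it raises `ScrapeError("Invalid URL")`.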

## Next Steps

1. Add richer CSS selector support by adopting a full parser library.
2. Add saved presets per site.
3. Add CSV/JSON export buttons.
4. Add extraction test mode per single rule.
5. Add authentication/session support for protected pages.
