Implement MVP DOM.RIA parser with curl_cffi scraping
- Add src/ modules: config, session, parser, normalizer, collector, storage, crawler, main
- Implement curl_cffi session with chrome124 impersonation and cookie warmup
- Extract __INITIAL_STATE__ using bracket-counting JSON parser
- Normalize DOM.RIA data into data_collector payload format
- Add SQLite resume storage for deduplication
- Implement rate limiting: 10s between pages/details, 2min pause every 50 pages
- Add CLI entry point with argparse for city/category/operation selection
- Add Dockerfile and requirements.txt
- Update TECH_SPEC.md with v3.0 post-MVP status and empirical data findings
- Update README.md with usage instructions
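
For reference, the bracket-counting extraction of __INITIAL_STATE__ could look roughly like the sketch below. It is an illustration, not the code in src/parser: the function name `extract_initial_state` is hypothetical, and it assumes the page embeds the state as a strict JSON object literal following the marker. It tracks brace depth while skipping over string contents, so braces inside string values do not end the match early.

```python
import json


def extract_initial_state(html, marker="__INITIAL_STATE__"):
    """Return the JSON object that follows `marker`, or None if absent.

    Hypothetical helper: assumes the embedded state is valid JSON.
    """
    pos = html.find(marker)
    if pos == -1:
        return None
    start = html.find("{", pos)
    if start == -1:
        return None

    depth = 0          # current nesting level of { }
    in_string = False  # True while inside a double-quoted JSON string
    escaped = False    # True right after a backslash inside a string
    for i in range(start, len(html)):
        ch = html[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: parse the balanced slice.
                return json.loads(html[start:i + 1])
    return None  # unbalanced braces, e.g. truncated HTML
```

Only curly braces are counted; square brackets need no tracking because the slice starts at an object and json.loads validates the rest.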
Co-Authored-By: Claude <noreply@anthropic.com>
Eugene Sukhodolskiy committed 1 day ago