dify/api/core/rag/extractor
李龙飞 81832c14ee
Fix: Correctly handle merged cells in DOCX tables to prevent content duplication and loss (#27871)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-11-13 15:56:24 +08:00
blob chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
entity use model_validate (#26182) 2025-10-10 17:30:13 +09:00
firecrawl refactor: Update Firecrawl to use v2 API (#24734) 2025-10-15 10:48:54 +08:00
unstructured refactor: use dynamic max characters for chunking in extractors (#26782) 2025-10-13 10:22:59 +08:00
watercrawl change all to httpx (#26119) 2025-10-10 23:41:16 +08:00
csv_extractor.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
excel_extractor.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
extract_processor.py remove .value (#26633) 2025-10-11 09:08:29 +08:00
extractor_base.py chore(api/core): apply ruff reformatting (#7624) 2024-09-10 17:00:20 +08:00
helpers.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
html_extractor.py chore: cleanup unnecessary mypy suppressions on imports (#24712) 2025-08-28 23:17:25 +08:00
jina_reader_extractor.py feat: knowledge pipeline (#25360) 2025-09-18 12:49:10 +08:00
markdown_extractor.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
notion_extractor.py change all to httpx (#26119) 2025-10-10 23:41:16 +08:00
pdf_extractor.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
text_extractor.py chore: add ast-grep rule to convert Optional[T] to T | None (#25560) 2025-09-15 13:06:33 +08:00
word_extractor.py Fix: Correctly handle merged cells in DOCX tables to prevent content duplication and loss (#27871) 2025-11-13 15:56:24 +08:00