dify/api/core/rag/extractor
wangxiaolei 45911ab0af
feat: using charset_normalizer instead of chardet (#29022)
2025-12-05 11:19:19 +08:00
blob
entity
firecrawl refactor: Update Firecrawl to use v2 API (#24734) 2025-10-15 10:48:54 +08:00
unstructured refactor: use dynamic max characters for chunking in extractors (#26782) 2025-10-13 10:22:59 +08:00
watercrawl change all to httpx (#26119) 2025-10-10 23:41:16 +08:00
csv_extractor.py
excel_extractor.py
extract_processor.py remove .value (#26633) 2025-10-11 09:08:29 +08:00
extractor_base.py
helpers.py feat: using charset_normalizer instead of chardet (#29022) 2025-12-05 11:19:19 +08:00
html_extractor.py
jina_reader_extractor.py
markdown_extractor.py
notion_extractor.py change all to httpx (#26119) 2025-10-10 23:41:16 +08:00
pdf_extractor.py
text_extractor.py
word_extractor.py Fix: Correctly handle merged cells in DOCX tables to prevent content duplication and loss (#27871) 2025-11-13 15:56:24 +08:00