dify/models at feat/chunk-deduplication

Frederick2313072 626e71cb3b feat: implement content-based deduplication for document segments

- Add database index on (dataset_id, index_node_hash) for efficient deduplication queries
- Add deduplication check in SegmentService.create_segment and multi_create_segment methods
- Add deduplication check in DatasetDocumentStore.add_documents method to prevent duplicate embedding processing
- Skip creating segments with identical content hashes across the entire dataset

This prevents duplicate content from being re-processed and re-embedded when uploading documents with repeated content, improving efficiency and reducing unnecessary compute costs.

2025-09-20 06:28:14 +08:00
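
The deduplication described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this branch: the `segments` table, the `content_hash`, `make_store`, and `add_segments` helpers, and the choice of SHA-256 are all assumptions made for illustration. Only the `(dataset_id, index_node_hash)` index and the idea of skipping segments whose hash already exists in the dataset come from the commit message.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    # Assumption: SHA-256 over the raw text; the real project may hash differently.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_store() -> sqlite3.Connection:
    # Hypothetical in-memory stand-in for the segment table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        "id INTEGER PRIMARY KEY, dataset_id TEXT, index_node_hash TEXT, content TEXT)"
    )
    # Composite index so the per-dataset hash lookup does not scan the table.
    conn.execute(
        "CREATE INDEX segment_dedup_idx ON segments (dataset_id, index_node_hash)"
    )
    return conn


def add_segments(conn: sqlite3.Connection, dataset_id: str, contents: list[str]) -> list[str]:
    """Insert only segments whose content hash is new for this dataset."""
    created = []
    for text in contents:
        digest = content_hash(text)
        exists = conn.execute(
            "SELECT 1 FROM segments WHERE dataset_id = ? AND index_node_hash = ? LIMIT 1",
            (dataset_id, digest),
        ).fetchone()
        if exists is not None:
            continue  # duplicate content: skip it, so it is never re-embedded
        conn.execute(
            "INSERT INTO segments (dataset_id, index_node_hash, content) VALUES (?, ?, ?)",
            (dataset_id, digest, text),
        )
        created.append(text)
    conn.commit()
    return created
```

In this sketch the check is scoped to a single dataset, so the same text can still be stored once in each of two different datasets.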
__init__.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
_workflow_exc.py	feat: Persist Variables for Enhanced Debugging Workflow (#20699)	2025-06-24 09:05:29 +08:00
account.py	chore: add ast-grep rule to convert Optional[T] to T | None (#25560)	2025-09-15 13:06:33 +08:00
api_based_extension.py	replace db with sa to get typing support (#23240)	2025-08-02 23:54:23 +08:00
base.py	example add more type check (#24999)	2025-09-02 19:13:43 +08:00
dataset.py	feat: implement content-based deduplication for document segments	2025-09-20 06:28:14 +08:00
engine.py	feat(api): Add image multimodal support for LLMNode (#17372)	2025-04-30 17:28:02 +08:00
enums.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
model.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
oauth.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
provider.py	chore: add ast-grep rule to convert Optional[T] to T | None (#25560)	2025-09-15 13:06:33 +08:00
provider_ids.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
source.py	chore: add ast-grep rule to convert Optional[T] to T | None (#25560)	2025-09-15 13:06:33 +08:00
task.py	chore: add ast-grep rule to convert Optional[T] to T | None (#25560)	2025-09-15 13:06:33 +08:00
tools.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00
types.py	[Chore/Refactor] Improve type annotations in models module (#25281)	2025-09-08 09:42:27 +08:00
web.py	replace db with sa to get typing support (#23240)	2025-08-02 23:54:23 +08:00
workflow.py	feat: knowledge pipeline (#25360)	2025-09-18 12:49:10 +08:00