Purpose
api/controllers/console/datasets/datasets_document.py contains the console (authenticated) APIs for managing dataset documents (list/create/update/delete, processing controls, estimates, etc.).
Storage model (uploaded files)
- For local file uploads into a knowledge base, the binary is stored via extensions.ext_storage.storage under the key: upload_files/<tenant_id>/<uuid>.<ext>
- File metadata is stored in the upload_files table (UploadFile model), keyed by UploadFile.id.
- Dataset Document records reference the uploaded file via: Document.data_source_info.upload_file_id
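As a rough illustration of the key layout above, the following sketch builds a key of the form upload_files/<tenant_id>/<uuid>.<ext>. The function name build_upload_storage_key is hypothetical, and lowercasing the extension is an assumption made for the example; the actual upload code may derive the extension differently.

```python
import uuid


def build_upload_storage_key(tenant_id: str, filename: str) -> str:
    """Return a key shaped like upload_files/<tenant_id>/<uuid>.<ext>.

    Hypothetical helper for illustration only; not the project's real function.
    """
    # Split off the extension; `dot` is empty when the name contains no ".".
    _, dot, ext = filename.rpartition(".")
    # Assumption: extensions are normalized to lowercase.
    suffix = f".{ext.lower()}" if dot else ""
    return f"upload_files/{tenant_id}/{uuid.uuid4()}{suffix}"


key = build_upload_storage_key("tenant-123", "report.PDF")
```

Because the key uses a fresh UUID rather than the original filename, two uploads with the same name never collide in storage; the original name is kept only in the metadata row.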
Download endpoint
- GET /datasets/<dataset_id>/documents/<document_id>/download
  - Only supported when Document.data_source_type == "upload_file".
  - Performs dataset permission + tenant checks via DocumentResource.get_document(...).
  - Delegates Document -> UploadFile validation and signed URL generation to DocumentService.get_document_download_url(...).
  - Applies cloud_edition_billing_rate_limit_check("knowledge") to match other KB operations.
  - Response body is only: { "url": "<signed-url>" }.
- POST /datasets/<dataset_id>/documents/download-zip
  - Accepts { "document_ids": ["..."] } (upload-file only).
  - Returns application/zip as a single attachment download.
  - Rationale: browsers often block multiple automatic downloads; a ZIP avoids that limitation.
  - Applies cloud_edition_billing_rate_limit_check("knowledge").
  - Delegates dataset permission checks, document/upload-file validation, and download-name generation to DocumentService.prepare_document_batch_download_zip(...) before streaming the ZIP.
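The upload-file gate and the single-key response body of the GET endpoint can be sketched as below. DocumentStub, get_upload_file_id, and build_download_response are hypothetical names, and data_source_info is modeled as a plain dict for simplicity; the real model and service methods may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStub:
    """Hypothetical stand-in for the Document model; not the real ORM class."""

    id: str
    data_source_type: str
    data_source_info: dict = field(default_factory=dict)


def get_upload_file_id(document: DocumentStub) -> str:
    """Return the referenced upload file id, rejecting non-upload documents."""
    if document.data_source_type != "upload_file":
        raise ValueError("download is only supported for upload_file documents")
    upload_file_id = document.data_source_info.get("upload_file_id")
    if not upload_file_id:
        raise ValueError("document has no upload_file_id")
    return str(upload_file_id)


def build_download_response(signed_url: str) -> dict:
    """Response body carries only the signed URL, matching the notes above."""
    return {"url": signed_url}
```

Keeping the response to a single url key means the client never sees storage keys or file metadata, only a link it can follow.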
Verification plan
- Upload a document from a local file into a dataset.
- Call the download endpoint and confirm it returns a signed URL.
- Open the URL and confirm:
  - Response headers force download (Content-Disposition), and
  - Downloaded bytes match the uploaded file.
- Select multiple uploaded-file documents and download as ZIP; confirm all selected files exist in the archive.
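The header and ZIP checks in this plan could be automated along these lines. This is a minimal sketch using only the standard library; forces_download and find_zip_mismatches are hypothetical helper names, and the in-memory archive stands in for a real response body.

```python
import io
import zipfile


def forces_download(content_disposition: str) -> bool:
    """True when a Content-Disposition value requests an attachment download."""
    return content_disposition.strip().lower().startswith("attachment")


def find_zip_mismatches(zip_bytes: bytes, expected: dict[str, bytes]) -> list[str]:
    """Return expected entry names that are missing or whose bytes differ."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        return [
            name
            for name, data in expected.items()
            if name not in names or archive.read(name) != data
        ]


# Build a small archive in memory to stand in for the endpoint's response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", b"alpha")
    archive.writestr("b.txt", b"beta")
sample_zip = buffer.getvalue()
```

An empty list from find_zip_mismatches means every selected file is present in the archive with matching bytes.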
Shared helper
- DocumentService.get_document_download_url(document) resolves the UploadFile and signs a download URL.
- DocumentService.prepare_document_batch_download_zip(...) performs dataset permission checks, batches document + upload file lookups, preserves request order, and generates the client-visible ZIP filename.
- Internal helpers now live in DocumentService (_get_upload_file_id_for_upload_file_document(...), _get_upload_file_for_upload_file_document(...), _get_upload_files_by_document_id_for_zip_download(...)).
- ZIP packing is handled by FileService.build_upload_files_zip_tempfile(...), which also:
  - sanitizes entry names to avoid path traversal, and
  - deduplicates names while preserving extensions (e.g., doc.txt → doc (1).txt).
- Streaming the response and deferring cleanup is handled by the route via send_file(path, ...) + ExitStack + response.call_on_close(...) (the file is deleted when the response is closed).
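The entry-name handling described for the ZIP packer can be approximated as follows. This is a sketch under assumptions, not the FileService implementation: sanitize_entry_name and dedupe_entry_names are hypothetical names, and the "file" fallback for empty names is invented for the example.

```python
import os


def sanitize_entry_name(name: str) -> str:
    """Keep only the final path component so entries cannot escape the archive root."""
    # Treat Windows separators like POSIX ones before taking the last component.
    base = name.replace("\\", "/").split("/")[-1]
    # Assumption: fall back to a placeholder when nothing usable remains.
    if base in ("", ".", ".."):
        return "file"
    return base


def dedupe_entry_names(names: list[str]) -> list[str]:
    """Sanitize names and suffix duplicates as 'stem (n).ext', preserving order."""
    used: set[str] = set()
    result: list[str] = []
    for raw in names:
        candidate = sanitize_entry_name(raw)
        stem, ext = os.path.splitext(candidate)
        counter = 1
        # Increment the counter until the name is unused in this archive.
        while candidate in used:
            candidate = f"{stem} ({counter}){ext}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result
```

Inserting the counter before the extension keeps the file type recognizable after extraction, which is why doc.txt becomes doc (1).txt and not doc.txt (1).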