Part 1: Document Description

| Citation | |
|---|---|
| Title: | GLOBALISE - VOC Document Segmentation Dataset |
| Identification Number: | hdl:10622/XMCZLZ |
| Distributor: | IISH Data Collection |
| Date of Distribution: | 2025-09-22 |
| Version: | 1 |
| Bibliographic Citation: | Smit, Renate, 2025, "GLOBALISE - VOC Document Segmentation Dataset", https://hdl.handle.net/10622/XMCZLZ, IISH Data Collection, V1, UNF:6:6pNG8HfphDeciynT9YU+RA== [fileUNF] |


| Citation | |
|---|---|
| Title: | GLOBALISE - VOC Document Segmentation Dataset |
| Identification Number: | hdl:10622/XMCZLZ |
| Authoring Entity: | Smit, Renate (Huygens Institute) |
| Distributor: | IISH Data Collection |
| Access Authority: | Pepping, Kay |
| Depositor: | Pepping, Kay |
| Date of Deposit: | 2025-09-01 |
| Holdings Information: | https://hdl.handle.net/10622/XMCZLZ |


| Study Scope | |
|---|---|
| Keywords: | Arts and Humanities |
| Topic Classification: | archive, Dutch East India Company |
| Abstract: | This dataset contains detailed annotations of Dutch East India Company (VOC) archival documents based on the TANAP (Towards a New Age of Partnership) project. The dataset provides precise boundaries and classifications for documents within digitized archival volumes, serving as training data for machine learning approaches to historical document segmentation and classification. This work supports the broader goal of making VOC archives more accessible beyond traditional finding aids that often reflect colonial perspectives. |

Other Study Description Materials |
|
|
Related Publications |
|
|

| Citation | |
|---|---|
| Bibliographic Citation: | Schnober, C., Smit, R., Kuruppath, M., Pepping, K., van Wissen, L., & Petram, L. (2024). Page Embeddings: Extracting and Classifying Historical Documents with Generic Vector Representations. In Proceedings of the Computational Humanities Research Conference 2024: Aarhus, Denmark, December 4-6, 2024 (Vol. 3834, pp. 999-1011). (CEUR Workshop Proceedings). https://ceur-ws.org/Vol-3834/paper73.pdf |


| File Description | File: | Notes: |
|---|---|---|
| f34885 | 1120 - Document Segmentation.tab | UNF:6:PaALHGHcxn8vZzdIoK338Q== |
| f34878 | 1267 - Document Segmentation.tab | UNF:6:mbuRDVkEBgHKzZip9MFlbw== |
| f34896 | 1274 - Document Segmentation.tab | UNF:6:0CRaDzC5pJN/To8jr+/5aA== |
| f34889 | 1539 - Document Segmentation.tab | UNF:6:EfwH8NmoW4Itv6flCBnuow== |
| f34898 | 1547 - Document Segmentation.tab | UNF:6:HY7HTyDWzOhdpb0O4TYIPQ== |
| f34884 | 1557 - Document Segmentation.tab | UNF:6:AtuvqqF04kp2zCeUsHK6Mg== |
| f34880 | 2448 - Document Segmentation.tab | UNF:6:/jd92CDEedNui+xXLnJYDw== |
| f34886 | 2548 - Document Segmentation.tab | UNF:6:fpxgSlWx2cLLT64IDyoP+Q== |
| f34881 | 2555 - Document Segmentation.tab | UNF:6:r9a3wBxqJMfJx+cMI32S9g== |
| f34882 | 2775 - Document Segmentation.tab | UNF:6:um34/99KE/qyz4SrYcHGMg== |
| f34887 | 3142 - Document Segmentation.tab | UNF:6:XpiE2Tq7UgwCxZmu5aKeVQ== |
| f34888 | 3891 - Document Segmentation.tab | UNF:6:HiQtpXQp6H0YB6g8WYVHTQ== |
| f34895 | 7923 - Document Segmentation.tab | UNF:6:GOJcy4ef7RrxF7beQlWc5Q== |
| f34897 | 8023 - Document Segmentation.tab | UNF:6:BJ3ukW8BMVUg5oCWmk06Ig== |
| f34891 | 8121 - Document Segmentation.tab | UNF:6:pJbJM07imrdbYvheLKJl2Q== |
| f34879 | 8237 - Document Segmentation.tab | UNF:6:cIuvOf6jG28J3Gj4eZe/9Q== |
| f34893 | 8276 - Document Segmentation.tab | UNF:6:p0nklJvNDQsJZbEcYgTA7w== |
| f34883 | 8284 - Document Segmentation.tab | UNF:6:WulLRLNd1GHvC6YXJNxpWg== |
| f34892 | 8697 - Document Segmentation.tab | UNF:6:/P0ICj2jlWffC4BpSZTM3g== |
| f34890 | 8834 - Document Segmentation.tab | UNF:6:hERnkAlAKFo5YYdwWoj1DQ== |

List of Variables:

| Variables | File | Variable Format: | Notes: |
|---|---|---|---|
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34885 | character | UNF:6:PaALHGHcxn8vZzdIoK338Q== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34878 | character | UNF:6:mbuRDVkEBgHKzZip9MFlbw== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34896 | character | UNF:6:0CRaDzC5pJN/To8jr+/5aA== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34889 | character | UNF:6:EfwH8NmoW4Itv6flCBnuow== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34898 | character | UNF:6:HY7HTyDWzOhdpb0O4TYIPQ== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34884 | character | UNF:6:AtuvqqF04kp2zCeUsHK6Mg== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34880 | character | UNF:6:/jd92CDEedNui+xXLnJYDw== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page;; | f34886 | character | UNF:6:fpxgSlWx2cLLT64IDyoP+Q== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34881 | character | UNF:6:r9a3wBxqJMfJx+cMI32S9g== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34882 | character | UNF:6:um34/99KE/qyz4SrYcHGMg== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34887 | character | UNF:6:XpiE2Tq7UgwCxZmu5aKeVQ== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34888 | character | UNF:6:HiQtpXQp6H0YB6g8WYVHTQ== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34895 | character | UNF:6:GOJcy4ef7RrxF7beQlWc5Q== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34897 | character | UNF:6:BJ3ukW8BMVUg5oCWmk06Ig== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34891 | character | UNF:6:pJbJM07imrdbYvheLKJl2Q== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34879 | character | UNF:6:cIuvOf6jG28J3Gj4eZe/9Q== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34893 | character | UNF:6:p0nklJvNDQsJZbEcYgTA7w== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34883 | character | UNF:6:WulLRLNd1GHvC6YXJNxpWg== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34892 | character | UNF:6:/P0ICj2jlWffC4BpSZTM3g== |
| Scan File_Name;TANAP Boundaries;TANAP ID;Subdocument boundaries;Type of non-document page | f34890 | character | UNF:6:hERnkAlAKFo5YYdwWoj1DQ== |

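
The variable listing above suggests that each file was ingested as a single character column whose name is the whole semicolon-separated header row, and that one file (f34886) carries two extra trailing separators. Assuming the data rows are packed the same way, which this record does not confirm, a minimal Python sketch for splitting them back into the five named columns could look like the following. The function name `split_packed_rows` and the sample row are hypothetical placeholders, not values taken from the dataset.

```python
import csv
import io

# Column names as they appear in the packed variable name of this record.
EXPECTED_COLUMNS = [
    "Scan File_Name",
    "TANAP Boundaries",
    "TANAP ID",
    "Subdocument boundaries",
    "Type of non-document page",
]


def split_packed_rows(text):
    """Split semicolon-packed lines into dicts keyed by the header fields.

    Assumption: the first line is the header and every later line uses
    ';' as the field separator, as the variable name in this record implies.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [field.strip() for field in next(reader)]
    # Drop empty trailing header fields, e.g. those left by a trailing ';;'.
    while header and header[-1] == "":
        header.pop()
    rows = []
    for fields in reader:
        if not any(field.strip() for field in fields):
            continue  # skip blank lines
        # Truncate surplus trailing fields, pad short rows with empty strings.
        fields = fields[: len(header)] + [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return header, rows


if __name__ == "__main__":
    # Hypothetical sample: the header from this record plus an invented row.
    sample = (
        "Scan File_Name;TANAP Boundaries;TANAP ID;"
        "Subdocument boundaries;Type of non-document page;;\n"
        "scan-0001;a;b;;;;\n"
    )
    header, rows = split_packed_rows(sample)
    print(header == EXPECTED_COLUMNS)
    print(rows)
```

Comparing the parsed header against `EXPECTED_COLUMNS` is a cheap way to notice a file whose layout differs from the one listed here before any rows are used.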
Label: |
README - GLOBALISE - VOC Document Segmentation Dataset.pdf |
|
Notes: |
application/pdf |