| fake_1000 | Fake 1000 from the Splink demos. The records represent 250 simulated people, each with a varying number of labelled duplicates. | 1,000 | 250 | source |
| historical_50k | The data is based on historical persons scraped from Wikidata. Duplicate records are introduced with a variety of errors. | 50,000 | 5,156 | source |
| febrl3 | The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. The FEBRL3 dataset contains 5,000 records (2,000 originals and 3,000 duplicates), with at most 5 duplicates per original record. | 5,000 | 2,000 | source |
| febrl4a | The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. FEBRL4a contains 5,000 original records. | 5,000 | 5,000 | source |
| febrl4b | The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. FEBRL4b contains 5,000 duplicate records, one for each record in FEBRL4a. | 5,000 | 5,000 | source |
| transactions_origin | This data has been generated to resemble bank transactions leaving an account. There are no duplicates within the dataset, and each transaction is designed to have a counterpart arriving in 'transactions_destination'. The memo is sometimes truncated or missing. | 45,326 | 45,326 | source |
| transactions_destination | This data has been generated to resemble bank transactions arriving in an account. There are no duplicates within the dataset, and each transaction is designed to have a counterpart sent from 'transactions_origin'. There may be a delay between the source and destination accounts, and the amount may vary due to hidden fees and foreign exchange rates. The memo is sometimes truncated or missing. | 45,326 | 45,326 | source |
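
The row and unique-entity counts in the table above indicate how much duplication to expect within each dataset. The sketch below is illustrative only: `DATASETS` and `duplicate_stats` are hypothetical names that are not part of Splink, and the figures are copied by hand from the table rather than read from the data.

```python
# (rows, unique entities) copied by hand from the table above.
# DATASETS and duplicate_stats are hypothetical helpers, not part of Splink.
DATASETS = {
    "fake_1000": (1_000, 250),
    "historical_50k": (50_000, 5_156),
    "febrl3": (5_000, 2_000),
    "febrl4a": (5_000, 5_000),
    "febrl4b": (5_000, 5_000),
    "transactions_origin": (45_326, 45_326),
    "transactions_destination": (45_326, 45_326),
}


def duplicate_stats(rows, entities):
    """Return (records beyond the first per entity, mean records per entity)."""
    return rows - entities, rows / entities


for name, (rows, entities) in DATASETS.items():
    extra, mean = duplicate_stats(rows, entities)
    print(f"{name:<26} extra_records={extra:>6,} mean_per_entity={mean:.2f}")
```

Datasets where rows equal unique entities (febrl4a, febrl4b and the two transactions datasets) contain no duplicates within a single table, so they are suited to linking one table against its counterpart rather than to deduplication.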