Getting Started

Provides a REST API for persons/beneficiaries data to be indexed and queried. A typical use case will:

index all enrolled beneficiaries into the deduplication service
query the deduplication service to assert that an applicant is not already enrolled before proceeding
use for deduplicating existing beneficiary lists

API

Indexing Beneficiary

Adding a beneficiary to the deduplication service, e.g., beneficiary add to your program

POST /index
{
    "city": "Freetown",
    "email": "saltonmassally@gmail.com",
    "first_name": "Salton",
    "id": "{id}",
    "last_name": "Massally",
    "phone": "07722015",
    "state": "Freetown",
    "street": "5 Foday Drive",
    "street2": "Hill Station"
}

NOTE id must be unique, ideally beneficiary ID in your database. This is what is returned for matches.

Searching for Beneficiary

Checks indexed beneficiaries for similar matches

POST /index/search
{
  "attributes": {
    "first_name": "Salton",
    "last_name": "Massally",
    "phone": ["202-555-1234", "317-555-1234"],
    "email": "saltonmassally@gmail.com"
  }
}

RESPONSE: HTTP 200

[
    {
        "beneficiary": "{id}",
        "reasons": [
            "email ->  Input: saltonmassally@gmail.com | Match: saltonmassally@gmail.com | Type: match_fuzzy ",
            "first_name.metaphone ->  Input: Salton | Match: Salton | Type: match ",
            "first_name.nysiis ->  Input: Salton | Match: Salton | Type: match ",
            "first_name.soundex ->  Input: Salton | Match: Salton | Type: match ",
            "first_name ->  Input: Salton | Match: Salton | Type: match_fuzzy ",
            "last_name.metaphone ->  Input: Massally | Match: Massally | Type: match ",
            "last_name.nysiis ->  Input: Massally | Match: Massally | Type: match ",
            "last_name.soundex ->  Input: Massally | Match: Massally | Type: match ",
            "last_name ->  Input: Massally | Match: Massally | Type: match_fuzzy "
        ]
    }
]

Returns a list of map of beneficiaries with the id used to index the record and a list of reasons why the record matched the query.

Performance of queries depend on data quality of both the index and they query

Querying with as much data as possible increases the likelihood of finding duplicates if any exists.

Not finding a duplicate is not a certainty that none exist.

Especially if unique IDs are lacking or multiple IDs are supported, the frameworks approach to deduplication is probabilistic and depends on factors like data quality, and matching strategies employed (e.g. biometric provides greater assurances than entity resolution). Inversely, matching a duplicate simply means that there is a high probability that they both refer to the same person (usually > 95% if data quality is ideal) and not that its a certain match. Consequently some human intervention is necessary. {: .important}

De-indexing Beneficiary

Removing a beneficiary from the deduplication service, e.g., beneficiary disenrolled from your program

DELETE /index/{id}

Where id is what was used to index the beneficiary

Allowed fields

Table shows a list of beneficiary attributes allowed for indexing and querying:

Attribute	Type	Note
id	String	Required; beneficiary’s unique ID in your database
identity	String	Identity records in the form type-number. e.g. passport-1234567
first_name	String
middle_name	String
last_name	String
phone	String	Please remove the country code e.g. 0778763839 and not +232778763839
email	String
street	String
street2	String
city	String	City or town depending on context
state	String	State or District depending on context
postal_code	String
dob	String	please use form 1990/07/23
bank	String	Name of bank or any organization that payment is sent to
bank_account	String	Bank account number
emergency_contact_name	String	Name of person listed as emergency contact
emergency_contact_phone	String	Phone of person listed as emergency contact; without country code

You do not need to provide all this data when indexing or querying for records; however the better your data quality, the better the precision of your result; i.e. try to supply all these fields for both index and query operations!

Kibana UI

The docker-compose ships with Kibana, providing the power to visualize your beneficiary data in custom ways and run queries against the elasticsearch backend.

Navigate to http://<you-ip-address>:5601

kibana_query_ui

How it all works

The default implementation works via the process of entity resolution. It compares attributes of the query provided against its database of indexed beneficiaries to find a match very likely referencing the same person.

Below is the set of matching rules employed:

identity (ID Type Number)
first_name, last_name, phone
first_name, phone
first_name, last_name, phone
last_name, phone
first_name, last_name, email
first_name, phone
last_name, phone
first_name, last_name, middle_name, dob
first_name, last_name, street, street2, city, state
first_name, last_name, street, city, state
first_name, last_name, street, state
first_name, last_name, middle_name, street2, state
first_name, last_name, street, postal_code
first_name, last_name, dob, city, state
first_name, last_name, emergency_contact_phone
first_name, last_name, middle_name, emergency_contact_name
first_name, last_name, bank_account_bank, bank_account_number
first_name, bank_account_bank, bank_account_number
last_name, bank_account_bank, bank_account_number

NOTE: To compensate for typos and near infinite representation of the same information, we employ a selection of fuzzy matching and phonetics algorithms when indexing and querying