  #5

Pre-process using Jaccard index for address matching

    • Created on: Fri, 19 Jun 2009 (over 2 years ago)
    • Reported by: AndreSomers
    • Assigned to: -
    • Milestone: Backlog
    • Status: New
    • Priority: Normal (3)
    • Component: Record Grouper
    • Report type: Idea
    Matching addresses is a time-consuming process. Every address is matched against every other address, so the amount of work grows quadratically with the number of records. It might help to pre-filter the candidate pairs: first split each address into separate words, then calculate a Jaccard index (the number of shared words divided by the number of distinct words in both addresses) for each address combination. Only pairs with a sufficiently high Jaccard score would need to be processed further. This might give a speed increase. Currently, processing 90k addresses takes about 7 hours on a quad-core machine.
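
A minimal sketch of the proposed pre-filter, in Python for illustration only; it is not the Record Grouper's actual code. The names `tokenize`, `jaccard`, and `candidate_pairs`, the whitespace/comma tokenization, and the 0.5 threshold are all assumptions made for this example. For word sets A and B the score is |A ∩ B| / |A ∪ B|.

```python
from itertools import combinations


def tokenize(address):
    """Split an address into a set of lower-cased words (assumed tokenization)."""
    return frozenset(address.lower().replace(",", " ").split())


def jaccard(a, b):
    """Jaccard index of two word sets: shared words / distinct words overall."""
    if not a and not b:
        return 0.0  # two empty addresses: treat as no evidence of a match
    return len(a & b) / len(a | b)


def candidate_pairs(addresses, threshold=0.5):
    """Return index pairs (i, j), i < j, whose Jaccard score meets the threshold.

    The threshold of 0.5 is a hypothetical default and would need tuning.
    """
    # Tokenize once per address instead of once per pair.
    token_sets = [tokenize(a) for a in addresses]
    return [
        (i, j)
        for i, j in combinations(range(len(addresses)), 2)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    ]


if __name__ == "__main__":
    sample = [
        "12 Main Street, Springfield",
        "12 Main St Springfield",
        "99 Ocean Drive, Miami",
    ]
    # Only the surviving pairs would go on to the expensive matcher.
    print(candidate_pairs(sample))
```

Note that this sketch still visits every pair, so it only helps if the Jaccard score is much cheaper to compute than the full match. Avoiding the all-pairs loop altogether would need an additional indexing step, which is outside the scope of this idea.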
    • Followers: AndreSomers