DevoxxFR 2024 Reproducible Builds with Apache Maven
An adaptive algorithm for detection of duplicate records
1. An Adaptive Algorithm for Detection of Duplicate Records Presented By: Rama kanta Behera IT200127207 Under the guidance of : Miss Ipsita Mishra
2.
3.
4.
5. OUTLINE OF THE PROPOSED SOLUTION The central idea behind the present algorithm is based on the fundamental property of primality of numbers I f(x) Record set Integer number space Fig: hashing I P Record set Integer number Prime number f(x) g(x) Fig: Extended hashing into prime space
6. r1 r2 … rn I1 I2 … In P1 P2 … Pn PRODUCT( P prior) f(x) g(x) P1*p2 …*pn= P prior Fig: The complete algorithm
7.
8. STEPS OF THE ALGORITHM Step 1 : For each new record, hash is performed and unique hash value (Hnew) for each distinct record is obtained. Step 2 : Hnew is mapped to its corresponding unique prime (Pnew). Step 3 : Pprior is divided with Pnew. If Pnew exactly divides Pprior, then the corresponding record to Pnew is a duplicate and already exists in Pprior. Else, Pnew is a distinct record. Step 4 : If Pnew is a distinct record, Pprior is multiplied with Pnew and the result is stored back in Pprior. Thus updating Pprior renders the algorithm adaptive.