Python 2.7 code and learning notes for Spark 2.1.1
Updated Aug 21, 2018 - Python
Insight Data Engineering fellow project
An easy-to-use script for fast similarity search in textual data (and embedding space) with GPU and multi-core support.
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality-Sensitive Hashing
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.
Scalable Data Mining - Assignment submissions
Recommendation systems for Yelp (collaborative filtering & content-based)
ETH Zurich Fall 2017
Implementing Locality Sensitive Hashing for DNA Sequences.
Text similarity (Jaccard similarity, MinHash, LSH)
Deduplication: MinHash with LSH
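
For readers new to the topic, the technique these repositories share can be sketched in a few lines of Python. This is a minimal illustration and is not taken from any project listed above: the function names (`shingles`, `minhash_signature`, `candidate_pairs`), the choice of BLAKE2b as the hash family, and the parameter values (64 hash functions, 16 bands) are all assumptions made for the example. Documents are turned into sets of word shingles, each set is summarized by a MinHash signature, and the signature is split into bands so that documents sharing any whole band become candidate pairs.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word k-shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def _hash(seed, item):
    # One member of a seeded hash family; BLAKE2b is an assumption chosen
    # because it is deterministic across runs, unlike the built-in hash().
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=64):
    """MinHash signature: the minimum hash value under each seeded hash."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_perm)]


def candidate_pairs(signatures, bands=16):
    """LSH banding: keys whose signatures agree on a whole band are candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    pairs = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "spark streaming jobs read records from kafka topics",
}
signatures = {key: minhash_signature(shingles(text)) for key, text in docs.items()}
pairs = candidate_pairs(signatures)
print(sorted(pairs))
```

The fraction of positions at which two signatures agree estimates the Jaccard similarity of the underlying sets, and raising the number of bands (with fewer rows per band) makes the candidate filter more permissive. Candidate pairs would normally be verified afterwards with the exact `jaccard` function.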