TY  - RPRT
T1  - b-Bit Minwise Hashing in Practice
Y1  - 2013
A1  - Li, Ping
A1  - Shrivastava, Anshumali
A1  - König, Arnd Christian
AB  - Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/37986
ER  - 

TY  - RPRT
T1  - Fast Near Neighbor Search in High-Dimensional Binary Data
Y1  - 2013
A1  - Shrivastava, Anshumali
A1  - Li, Ping
AB  - Numerous applications in search, databases, machine learning, and computer vision can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/37987
ER  - 

TY  - RPRT
T1  - Presentation: Predicting Multiple Responses with Boosting and Trees
Y1  - 2013
A1  - Li, Ping
A1  - Abowd, John
AB  - Presentation by Ping Li and John Abowd at FCSM on November 4, 2013.
PB  - Cornell University
UR  - http://hdl.handle.net/1813/40255
ER  - 