Project ID:
1435886
Project Type:
Fixed
Budget:
min $5000 USD
(Approx. €3953 EUR)
Project Description:
Help us build the people search engine. You will be at the heart of our back-end core team, building a data storage system that holds content from the web, runs regular map/reduce computations for scoring and data analysis, and indexes that content for full-text search. You not only know what it takes to build, administer and grow such a system; you know how to leverage it to its full potential and how to help us get there.
Your background:
You have already built a distributed data store system, preferably using a column-based database such as HBase
You can develop tools to automate the deployment, administration and performance monitoring of large Hadoop/HBase clusters
You can keep a distributed data store up even when EC2 goes down!
You understand bottlenecks in distributed systems (application, OS or network) and can design software to avoid them
You not only know how to write map/reduce jobs on Hadoop, you know how to tweak them for optimal performance
You know ZooKeeper is not a job title
You are familiar with Solr, Lucene and their underlying index storage format
You clearly articulate your design choices and educate the rest of the team on them
What makes you get up in the morning:
You see our HBase secondary-indexing code and want to rewrite it using HBase coprocessors
You see Apache Mahout as a never-ending stream of opportunities to uncover a treasure trove of information from our data
You like using technology to empower people
Skills required:
Apache Solr
Data Mining