Job Description
I am a researcher conducting a large-scale literature review based on academic papers (PDFs).
I already have a predefined set of research questions. I am not looking for AI to write the review. I am looking to build a robust, reproducible system that:
- Applies my question set to each paper
- Extracts answers directly from the documents
- Provides supporting real page references
- Flags missing or ambiguous information
- Outputs everything into a structured master Excel file
This system will function as an additional analytical layer, an “extra pair of eyes”, to support our review process.
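To make the expected output concrete, here is a minimal sketch of how one row per paper and question could be assembled, including the "missing" and "ambiguous" flags. It is an illustration only, not a required design: the names `AnswerRow`, `make_row`, and `build_master_rows`, the column set, and the rule that disagreeing candidate answers count as ambiguous are all assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AnswerRow:
    """One row of the master sheet (hypothetical column set)."""
    paper: str
    question_id: str
    answer: Optional[str]
    page: Optional[int]   # 1-based page number in the PDF
    quote: Optional[str]  # verbatim supporting text from that page
    status: str           # "answered", "missing", or "ambiguous"


def make_row(paper, question_id, candidates):
    """Build a row from candidate (answer, page, quote) tuples for one question."""
    if not candidates:
        # Nothing was found in the paper: flag it instead of guessing.
        return AnswerRow(paper, question_id, None, None, None, "missing")
    # Assumption: candidates that disagree after trimming and lowercasing
    # are treated as ambiguous and left for a human reviewer.
    distinct = {answer.strip().lower() for answer, _, _ in candidates}
    answer, page, quote = candidates[0]
    status = "answered" if len(distinct) == 1 else "ambiguous"
    return AnswerRow(paper, question_id, answer, page, quote, status)


def build_master_rows(extractions):
    """Flatten {paper: {question_id: candidates}} into a list of dict rows."""
    rows = []
    for paper in sorted(extractions):
        for question_id in sorted(extractions[paper]):
            row = make_row(paper, question_id, extractions[paper][question_id])
            rows.append(asdict(row))
    return rows
```

Rows in this shape could then be written to a workbook, for example with `pandas.DataFrame(rows).to_excel("master.xlsx", index=False)`, assuming pandas and openpyxl are installed; the file name is a placeholder.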
Deliverables
The final output must be a master Excel file with a structured format. In addition, I will need:
- Clean, documented source code
- Clear instructions to re-run the pipeline
- Ability to add new PDFs and re-run
- Ability to modify or add questions
You should be comfortable with:
- Scientific PDF i
- Extracting data from tables inside PDFs
- LLM outputs with enforced JSON schema
- Hallucination mitigation strategies
- Citation grounding at chunk level
Python preferred (other languages are also OK).
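As an illustration of what schema enforcement and citation grounding could look like in practice, the sketch below parses a model response, checks it against a small set of required fields, and confirms that the quoted evidence really appears on the cited page. It uses only the standard library; the field names (`answer`, `page`, `quote`), the function name `check_llm_answer`, and the whitespace-insensitive substring match are assumptions, not a prescribed approach.

```python
import json
import re

# Hypothetical minimal schema: field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "page": int, "quote": str}


def _normalize(text):
    """Collapse whitespace and lowercase so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_llm_answer(raw_json, pages):
    """Validate one model response against the schema and the source pages.

    pages maps a 1-based page number to the text extracted from that page.
    Returns (record, problems); an empty problems list means the answer is grounded.
    """
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError:
        return None, ["invalid JSON"]
    if not isinstance(record, dict):
        return None, ["not a JSON object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected):
            problems.append(f"bad or missing field: {field}")
    if problems:
        return record, problems

    page_text = pages.get(record["page"])
    quote = _normalize(record["quote"])
    if page_text is None:
        problems.append("cited page does not exist")
    elif not quote:
        problems.append("empty quote")
    elif quote not in _normalize(page_text):
        # The quote is not on the page the model cited: treat as ungrounded.
        problems.append("quote not found on cited page")
    return record, problems
```

Any response with a non-empty problem list would be flagged for manual review rather than written to the master file as a confirmed answer. Exact substring matching is deliberately strict; a fuzzy match may be needed where PDF extraction alters hyphenation or ligatures.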
To apply, please include:
1. A description of a similar system you built.
2. Your proposed architecture and tool stack.
3. How you will prevent hallucinations.
4. Your estimated timeline.