Language Data Scientist
Job description
Job Title: Language Data Scientist
Location: Fully Remote within the U.S. (excluding California, Washington, Alaska, Colorado, Montana, New York, Puerto Rico, Nevada, Nebraska)
Employment Type: Full-Time (40 hours per week) Fixed-Term
Who we are:
Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are an AI technology solutions provider-of-choice for 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine.
By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping usher in the promise of AI. Innodata offers a powerful combination of both digital data solutions and easy-to-use, high-quality platforms.
Our global workforce includes over 7,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel and Germany. We’re poised for a period of explosive growth over the next few years.
About the Role:
Innodata is building a team of Language Data Scientists and Gen AI experts to help our customers advance GenAI applications. You will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate must have the right mix of skills in (computational) linguistics and human evaluation tasks, data science, and data engineering.
Key Responsibilities:
Design and improve workflows that create data for AI/ML training and evaluation, including human annotation and data collection workflows as well as synthetic ones.
Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
Critically assess annotation tooling and workflows
Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions and executing them.
Job requirements
Qualifications:
Knowledge of how components of GenAI products or services combine to work
Experience collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred
Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
Deep understanding of language and its relationship with culture
Ability to identify ambiguity and subjectivity in language
Ability to work with multi-lingual and multi-modal projects
Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
Technical skills:
Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
Proficiency in Python to:
handle / transform large datasets (e.g. pre- and postprocessing data, pandas)
perform quantitative analyses
visualize data (for example matplotlib, seaborn)
Data processing:
Deep understanding of data pipelines to support ML and NLP workflows
Knowledge of efficient data collection, transformation, and storage
Knowledge of data structures, algorithms, and data engineering principles
Excellent interpersonal skills for effective cross-functional stakeholder engagement
Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
Ability to work independently and collaborate as part of a team
Adaptable to changing technologies and methodologies
Ability to translate experience, research and development information to understand client products and services.
Preferred Qualifications:
Conducting research to stay up to date with the latest advancements in generative AI, machine learning, and deep learning techniques
Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency
Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation
Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance
Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders
Contributing to establishing best practices and standards for generative AI development with customers and within the organization
Providing technical mentorship and guidance to junior team members
Understanding of generative model families such as GPT, VAEs, and GANs
Salary Range: Up to $95k USD
Rates at Innodata vary depending on a wide array of factors, which may include but are not limited to the role, skill set, educational background and geographic location.