Job Requisition: Collection Technology Specialist (Webscraper)
JOB DESCRIPTION:
The Mission Support Operation is seeking experienced Collection Technology Specialists to join a Data Exploitation Technology Team on a language services contract supporting an Intelligence Community customer in the National Capital Region (NCR). Collection Technology Specialists will use technical tools to collect Internet information from publicly accessible websites, format the data, and send the results to ingest points. The data collection occurs in a variety of foreign languages. Languages Supported: Any and all languages of Africa; East Asia and Pacific; Europe and Central Asia; Latin America and Caribbean; Middle East and North Africa; South Asia.
PRIMARY RESPONSIBILITIES:
Web collection specialists configure, test, and monitor webscraping tools and the workflows that ingest the data. Language proficiency is not required for all positions: specialists with language proficiency independently confirm that vernacular data is collected and formatted correctly, while those without it may be required to work closely with a translator to satisfy specific web collection requests. The successful candidate will perform a variety of data collection tasks, including crawling selected web sources to determine their structure and metadata and to support data discovery; formatting data to be compatible with downstream systems; and performing API pulls and posting the resulting data to cloud-based systems. Some positions may require intermediate foreign language skills (ILR level 3/3+).
EDUCATION & EXPERIENCE:
MINIMUM QUALIFICATIONS:
- US citizen
- 2-year technical degree or 3 years of experience related to web technologies or technology-based data collection
- In-depth understanding of webscraping and data transfers via APIs, and the ability to work with technology-based data collection in a variety of formats and interfaces, such as HTML, JSON, XML, and REST
- Familiarity with processing foreign language text with Unicode standards
- Ability to write scripts or software routines that perform and control reliable web collection
- Ability to work independently and as a contributor to a virtual team
- Self-starter with curious mindset and desire to solve problems and learn new skills
- Strong organizational skills, time-management skills, attention to detail, and strong initiative
- Effective communicator in both written and verbal formats for a variety of audiences
- Bachelor's degree
- Training or coursework in digital curation, metadata, digital libraries, electronic records management, big data analytics for the Web and text, ontologies, data semantics, information visualization, digital publishing standards and systems, and data management for big data
- Experience with Human Language Technology (HLT), such as computer-assisted translation (CAT) tools, post-edited machine translation, and adaptive machine translation tools
- Intermediate foreign language skills (ILR level 3/3+)