We are looking for a Web Scraping Engineer to join our team!
As a Data Engineer focused on web scraping, you will be responsible for extracting data from websites using web crawling tools, transforming it, and storing it in a NoSQL database.
In this role you will own the creation of the tools, services, and workflows that improve crawl/scrape analysis, reporting, and data management.
You will also be responsible for testing the data to ensure accuracy and quality. You will own the process of identifying and fixing broken scrapes, and of scaling scrapes as needed.
Write bots to source publicly available data (scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) in order to create new data feeds, and also help solve problems with our existing feeds.
2+ years of web scraping experience
Production experience with one or more of the following web scraping frameworks and tools: Scrapy, Puppeteer, Selenium, ScrapingHub, BeautifulSoup, Import.io, Webhose.io
Basic knowledge of data engineering (database ingestion, ETL, etc.)
Experience with data testing/quality assurance processes, scripting & tools
Experience with bypassing bot detection (HTTP proxies, CAPTCHAs, etc.)
Query and understand structured data: SQL (SQLite/MySQL or similar), JSON, XML
Familiarity with NoSQL databases (graph databases are even better)
Solution-oriented, "can do" attitude, with a desire to tackle complex problems
Advantage: Experience with data DevOps tools such as Airflow
Advantage: Experience with cloud environments (Azure, AWS)
Advantage: Experience extracting data from multiple disparate sources, including web pages, PDFs, and spreadsheets
Possibility to work from home
Advancement and professional development