Data Engineer (Python)

ENVISN INCORPORATED

Houston, TX
Full Time
Paid
  • Responsibilities

    Job Title: Python Data Engineer

    Location: Houston, TX (ONSITE ROLE)

    Duration: Long-term contract

    Job Description: We are looking for a talented Data Engineer with expertise in Python data processing. The ideal candidate will have a strong background in Python API development, parallel data processing, and distributed systems design. You will be responsible for building and maintaining systems that handle large-scale data processing tasks, ensuring high performance and scalability.

    Key Responsibilities:

    Python API Development:
      o Develop and maintain RESTful APIs using Python web frameworks such as FastAPI or Django.
      o Collaborate with front-end developers to integrate user-facing elements with server-side logic.

    Parallel Data Processing:
      o Utilize Pandas, NumPy, and other libraries to process large datasets efficiently.
      o Implement multithreading, multiprocessing, and asynchronous programming techniques.
      o Optimize data processing pipelines to handle millions of rows with minimal latency.

    Distributed Systems Design:
      o Design and implement distributed systems with a focus on scalability and reliability.
      o Understand and apply core concepts such as load balancing and task queues.
      o Use Docker to containerize applications and manage dependencies.
      o (Preferred) Experience with Kubernetes for container orchestration.

    Technical Communication:
      o Clearly articulate complex technical concepts to team members and stakeholders.
      o Document system designs, processes, and code effectively.
      o Collaborate with cross-functional teams to align on project goals and deliverables.

    Must-Have Qualifications:

    Experience in Python Web Frameworks:
      o Proficiency with FastAPI, Django, or similar frameworks.
      o C# coding.
      o Understanding of RESTful API principles and best practices.

    Docker Knowledge:
      o Ability to create and manage Dockerfiles.
      o Experience with containerization for deployment and development workflows.

    Systems Design Understanding:
      o Basic knowledge of load balancing, task queues, and distributed system concepts.
      o Ability to design systems that are scalable and maintainable.

    Concurrent and Parallel Computing Skills:
      o Proficiency in multithreading and multiprocessing without relying solely on external libraries or frameworks.
      o Familiarity with asynchronous programming, particularly asyncio in Python.

    Communication Skills:
      o Excellent technical communication abilities.
      o Experience collaborating in team environments and conveying complex ideas clearly.

    Preferred Qualifications:

    Education:
      o BS or MS in Computer Science.

    Advanced Data Processing Tools:
      o Experience with Polars, PySpark, or similar tools.
      o Ability to handle large-scale data processing tasks efficiently.

    Distributed Computing Experience:
      o Hands-on experience with distributed architectures in Docker.
      o Familiarity with concepts like task queuing, MapReduce, and saga patterns.

    Kubernetes Experience:
      o Knowledge of container orchestration using Kubernetes.
      o Experience deploying and managing applications in a Kubernetes cluster.

    Problem-Solving at Scale:
      o Demonstrated ability to solve complex problems using parallel or distributed computing.
      o Innovative thinking beyond single-threaded processes.