In this month’s Spotlight, System1 Lead Machine Learning Scientist Xi Wang talks programming, tackling challenges and data science team culture.

Coding is an essential part of a data scientist’s job: we use programming tools to work efficiently and effectively with data, especially when dealing with large datasets.
Python is my go-to programming language. I also use VS Code as my development environment, Snowflake (SQL) for pulling data, and Jupyter Notebook for data analysis, visualization and some preliminary modeling work.
Once I decide on a solution, I write the data processing pipeline and model training code as Python scripts so that we can run them inside a Docker container on any remote instance.
At System1, we build models, tools and packages for the following main tasks:
- Data manipulation and visualization: We handle millions of events every hour, arriving from various sources and in different data formats. Cleaning and processing this data to extract meaningful insights usually requires programming skills. We also use Python data visualization packages to create charts and graphs that communicate our findings to other teams.
- Machine Learning: Many projects involve building advanced machine learning (ML) models to solve business problems. Coding is required to implement and tune ML algorithms as well as to evaluate model performance.
- Automation: We build Python tools to automate our data processing and modeling workflows, so that the most up-to-date ML models can be trained efficiently and used in production.
I recently worked on a challenging project that focused on improving real-time keyword recommendations in System1’s advertising business. Having an impactful recommender system is important when it comes to better understanding user intent and maximizing revenue.
We deal with data of high volume, velocity and variety: millions of requests every hour from users across different geolocations, campaigns and devices.
For each request, we recommend the top keywords from hundreds of thousands of candidates in a time-sensitive manner. This requires a well-designed and robust infrastructure that can support this type of recommendation.
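The interview does not describe System1’s serving stack, but the core "pick the top keywords out of hundreds of thousands" step can be illustrated with a partial sort. The sketch below assumes each candidate keyword already has a model score in a NumPy array; the function name `top_k_keywords` and the input layout are hypothetical. `np.argpartition` finds the k best candidates in linear average time, and only those k are then fully sorted, which avoids sorting the whole candidate list on every request.

```python
import numpy as np


def top_k_keywords(scores: np.ndarray, keywords: list, k: int) -> list:
    """Return the k highest-scoring keywords, best first.

    Hypothetical helper: assumes scores[i] is the model score for keywords[i].
    """
    if k <= 0:
        return []
    k = min(k, len(scores))
    # Indices of the k largest scores, in arbitrary order (linear average time).
    idx = np.argpartition(scores, -k)[-k:]
    # Sort only those k indices by descending score.
    idx = idx[np.argsort(scores[idx])[::-1]]
    return [keywords[i] for i in idx]


# Usage: 200,000 randomly scored candidates, keep the best 5.
rng = np.random.default_rng(0)
candidate_scores = rng.random(200_000)
candidate_keywords = [f"kw{i}" for i in range(200_000)]
best = top_k_keywords(candidate_scores, candidate_keywords, 5)
```

In a real-time system the scoring itself usually dominates latency; the partial sort simply keeps the final selection step cheap.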
The first thing we do is gather information and data to really understand the problem and the goal. Then we can break it down into smaller parts and identify the underlying issues.
Depending on the time frame, we prioritize simple solutions based on statistics or pre-trained models. These are unlikely to be perfect, but they are effective and deliver good impact in the first round.
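The interview does not say which statistics were used for the first-round solution. As one illustrative example of a "simple solution based on statistics", the sketch below ranks keywords per (geo, device) segment by a smoothed click-through rate. The event tuple format, the function name and the `prior_ctr`/`strength` values are all hypothetical assumptions. The additive smoothing pulls keywords with few impressions toward a prior CTR so that a single lucky click does not push a rare keyword to the top.

```python
from collections import defaultdict


def smoothed_ctr_ranking(events, prior_ctr=0.02, strength=100.0):
    """Rank keywords per (geo, device) segment by smoothed CTR.

    Hypothetical input: iterable of (geo, device, keyword, clicked) tuples.
    Smoothed CTR = (clicks + strength * prior_ctr) / (impressions + strength).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for geo, device, keyword, clicked in events:
        key = (geo, device, keyword)
        impressions[key] += 1
        clicks[key] += int(clicked)

    rows_by_segment = defaultdict(list)
    for (geo, device, keyword), n in impressions.items():
        ctr = (clicks[(geo, device, keyword)] + strength * prior_ctr) / (n + strength)
        rows_by_segment[(geo, device)].append((ctr, keyword))

    # Highest smoothed CTR first within each segment.
    return {
        segment: [kw for _, kw in sorted(rows, key=lambda r: r[0], reverse=True)]
        for segment, rows in rows_by_segment.items()
    }


# Usage with a toy log.
log = [
    ("US", "mobile", "shoes", 1),
    ("US", "mobile", "shoes", 1),
    ("US", "mobile", "hats", 0),
    ("US", "mobile", "hats", 0),
]
ranking = smoothed_ctr_ranking(log)
```

A baseline like this needs no training pipeline, and the impressions and clicks it serves become feedback data for a more sophisticated model later on.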
After we test the simple solution, we iteratively improve results by enhancing our training pipeline, adding more useful features and building a more sophisticated model based on the feedback data collected by the current solution.
System1 provides access to technology and resources that help me perform my job more efficiently and effectively.
System1 also has training and development programs that help me acquire new skills or improve existing ones. These include online courses, conferences and seminars.
The overall working environment is very flexible, so I can make the best use of my time and excel at my work.
Supportive, open-minded and respectful: we celebrate our successes and support each other through challenges, creating a positive and uplifting team culture.
We also have an open-minded team culture that encourages new ideas and perspectives from everyone. We have a lot of smart people here who are humble and like to learn from each other.