Meet Charlene Tay, Data Scientist at High Alpha
Employer: High Alpha
Job Title: Data Scientist
Degree Path: MS in Data Science, Indiana University
Hometown: Singapore
City: Indianapolis, IN
What led you to get into tech and this occupation? What was your first job in tech?
My first job out of college was doing computational research in cognitive science. The research was cutting-edge and data-intensive: I had to learn and find ways to clean, manage and analyze very large amounts of data (images, audio, IoT time-series data, etc.) to build models, answer questions, and support hypotheses. I loved how computational methods can complement human intelligence, both by stepping in where we have cognitive limitations and by automating the work that must be done to prepare the way for insights and decisions. I became excited by how promising the field of data science is as a way of applying curiosity and technical tradecraft to create value from data, and I knew that I wanted to use this toolbox of skills to help others. This led me to enroll in the graduate program in Data Science at Indiana University, where I began training to become a data scientist.
What has been your career path so far?
While in grad school, my classmates and I realized the need to find ways to extend our classroom knowledge to real-world problems and data. A few of us started a consulting group and began working with partners from academia and industry on a wide range of project-based learning experiences that provided solutions to clients. Through this arrangement, I was fortunate to build a relationship with High Alpha and have since joined them as a data scientist. I am now part of a small team that works across High Alpha’s portfolio of companies, serving them by working with their teams to incorporate data science and machine learning capabilities into their products and processes. It has been a fun ride!
When you think of a day in your life, what are the main work activities you do or responsibilities you have?
In my daily work, I help build software frameworks and automated processes that enable the High Alpha portfolio companies we work with to extract, interpret and manage data, and to solve complex problems using data science. My typical day usually involves work on at least two different projects: I focus on one in the morning and switch to a different project in the afternoon. We take on projects as a team, giving us the freedom to collaborate and work on aspects of the project that play to our strengths or interests. The format of my day then depends on how deep I am into a project.
In the early stages of a project, I spend a lot of time meeting with the company’s team in order to understand their vision, their problems, and how we might best augment their work. Once we’ve scoped out the project, I spend my time coding (Python) in order to perform and automate statistical analyses, build data products, and generate data visualizations. When that work is complete or almost complete, we share our findings with the stakeholders and work together to develop an action plan for implementing the work in the product or processes.
Help us picture your work environment.
Due to the nature of our work, we try to embed with the companies we are working with on a regular basis. This means that I have a desk at most of the companies I do projects with, but we are free to work where we like. On most days, approximately 20 percent of our day is spent in meetings and the rest of the time is spent in our data science office working or coding. On some weeks, we take a day to work from home in order to get away from meetings and focus on engineering.
What do you love about the work you do?
Working at High Alpha is incredibly exciting, especially because I get to work at the portfolio level to solve a variety of different problems with various teams. I get to regularly collaborate not just with the data scientists on my team, but also with leadership, engineering, sales and other departments within companies across the High Alpha portfolio. This keeps things very interesting!
Which personality traits, interests, and abilities are important or common for a person to succeed in and enjoy this occupation?
Curiosity and creativity are necessary: we are always looking for new and optimal ways to apply our knowledge to a wide range of problems. A healthy dose of skepticism is also helpful in reining in that creativity, understanding what we are doing, and avoiding traps like assumptions and bias in data and models. Importantly, the best data scientists that I have worked with are full of humility. These people listen and engage with others, recognizing that they need to collaborate with others who understand the business better than they do in order to be effective. They also understand that this line of work is all about experimentation, and they persevere despite failures in experimentation.
Which tools/technologies or technical skills are particularly important for a person to be proficient in for data science jobs?
A data scientist should know how to write code (usually Python or R) and be comfortable handling a variety of programming tasks, including querying databases, cleaning and transforming data, dealing with large volumes of data, and performing analyses on the data. Good knowledge of statistics, linear algebra and calculus is vital, as are the foundations of machine learning.
Which soft skills (aka general business skills or employability skills) are particularly important for a person to be proficient in for data science jobs?
Problem-solving, communication, teamwork and project management skills are all essential in this line of work. In addition, the ability to understand the business needs of an organization and communicate your work and findings in a way that is approachable and actionable to the people you work with is incredibly valuable.
From your experience with new grads applying for and beginning data science jobs, are they missing any particular knowledge, skills, or experiences that hold them back? Please describe.
As I myself experienced in school, many new grads do not have wide experience with applying the knowledge gained from the classroom to real problems and data. Additionally, the data that we use in school is much smaller, cleaner and more structured than the kind of data that we often have to work with in industry. Taking your skills outside of the classroom and applying them to interesting data and hard problems will go a long way!
Which resources, people, books, websites, etc. would you recommend to those who want to learn more or advance their skills in this occupation?
For online resources, I would recommend Kaggle.com, Andrew Ng’s Machine Learning course on Coursera, and Fast.AI. Textbooks I would recommend include Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, and Deep Learning by Goodfellow, Bengio and Courville.
What encouragement or advice would you offer to others considering this occupation or wanting to stand out amongst others?
There is so much to learn in data science, so I would advise focusing most of your time on building things (projects, analyses, visualizations, code) rather than trying to study and memorize everything. There is definitely a lot of value in theoretical study, but the bulk of my data science foundation has come from practice: applying acquired skills and knowledge to real-world problems. The added upside to this approach is that it helps you build a portfolio to show to potential employers.
The second piece of advice that I would offer is to be unafraid of digging into the tools and packages that you use and asking how they work. A lot of tools in data science have already been built and packaged for us, which is a huge boon when trying to move fast on analyses and building products; however, models and methods each have their own limitations and quirks. Having an intuition for them, understanding them and knowing where they should or should not be used will help you progress from proficiency to mastery.
Finally, try to actively resist imposter syndrome. The field of data science has more facets and tools than any one person can possibly master. While you should strive to keep learning and growing, don’t beat yourself up about having to learn everything, and never think that you are not good enough!