
Data science is a hot new field, but what skills and background do you need to break into it? Essentially, data science, data engineering and data analytics are broad, and sometimes ambiguous, terms that describe a litany of skills and job titles in the world of data.
“The title of ‘data scientist’ is broadly applied within different organizations, making it difficult to provide a complete and noncontroversial list of required skills. At a high level, a data scientist needs a mastery of the tools and techniques to access, transform, analyze and leverage the data of their organization,” says Kyle Polich, principal data scientist at DataScience.
If your company is looking to hire data scientists or analysts, it’s important to know what you’re hiring for. Data jobs often encompass a lot more than just data; there are people specifically dedicated to each stage of the process, from collecting and warehousing data to analyzing it and using it to transform the business. Ultimately, a good data strategy relies on a number of qualified individuals who can write algorithms, manage and collate data, interpret the data and communicate it to key stakeholders.
Data warehousing
Warehousing data is a task in and of itself, because the more data you have, the more servers, hardware and third-party services you will need to store it. However, data warehousing skills include more than just the ability to capture and store data. They also involve interpreting the data and possibly even making critical decisions and tough choices to make sure data retrieval and analysis can remain cost-effective, according to Polich.
“Data warehousing roles, which focus on Extract, Transform and Load (ETL) and data ingestion, are generally distinct from data science roles. The former focus on capturing, storing, and pre-processing the data while the latter focus on extracting insight from the data,” says Sham Mustafa, CEO of Correlation One, a company that is focused on matching data scientists and hiring companies.
Ashish Thusoo, co-founder & CEO, Qubole, a cloud-scaling data processing company, has worked in data science roles throughout his career. For him, one of the most important skills around data warehousing includes “understanding the capabilities and limitations of the technology.” Beyond that, he says it’s crucial that employees working in this area also understand how to translate business requests into SQL queries, so that data can be quickly retrieved when it’s needed.
Essentially, hiring the right person for data warehousing will mean finding a candidate who can strike a comfortable balance between understanding how to capture and store data and how to meaningfully interpret it, rather than being completely focused on one or the other.
“They do not necessarily have to be experts in the subject, or know how to create, run and maintain the warehouse independently, but they need to know how to inspect them and query efficiently to get their results,” says Thusoo.
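As a rough illustration of what translating a business request into a SQL query can look like, the sketch below answers the question "Which regions earned the most in a given month?" against a small in-memory SQLite database. The `orders` table, its columns and the function names are hypothetical and chosen only for this example; a real warehouse would have its own schema and SQL dialect.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("East", "2024-01-05", 120.0),
            ("West", "2024-01-09", 300.0),
            ("East", "2024-01-20", 250.0),
            ("North", "2024-02-01", 999.0),
        ],
    )
    return conn


def top_regions_by_revenue(conn, month, limit=3):
    """Business request: which regions earned the most in a given month?

    `month` is a 'YYYY-MM' string. Returns (region, revenue) rows,
    highest revenue first.
    """
    query = """
        SELECT region, SUM(amount) AS revenue
        FROM orders
        WHERE strftime('%Y-%m', order_date) = ?
        GROUP BY region
        ORDER BY revenue DESC
        LIMIT ?
    """
    return conn.execute(query, (month, limit)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for region, revenue in top_regions_by_revenue(conn, "2024-01"):
        print(region, revenue)
```

The point is less the syntax than the habit: filtering and aggregating inside the query keeps retrieval fast, which is the kind of efficient querying Thusoo describes.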
Data collection
Data collection is an enormous undertaking, especially considering that companies tend to collect far more data than they can actually use or need. Before you can hire the right employees to help with data collection, you actually need to know what data you want to collect, says Mustafa.
But the biggest problems in data collection arise when businesses are faced with the “four V’s of big data: volume, variety, velocity and veracity,” says Polich. And one person can’t deal with all four. For example, figuring out a strategy to deal with the velocity and volume of data is typically an area for data engineers, rather than data scientists or data analysts, says Mustafa.
And before you can even determine what skills you need for data collection, it’s important to first consider your audience and customer base. Polich gives the example of a bank, which can’t withstand any down time or lag in data retrieval, so companies need to hire accordingly. That might mean hiring people who have worked in similar high-stress environments, where certain aspects of data matter more than in other industries.
Alternatively, he gives the example of a social media network, which can probably withstand a minimal amount of lag or inconsistency in data retrieval, especially if it results in cost savings. That might mean you can hire someone with other skills that are important to your business or someone more accustomed to working in agile and innovative environments. Taking time to consider how your business can use data and what data you actually need to collect will help you hire the right person for the job.
Thusoo says he looks for workers who understand the intricacies of data collection, and everything that can go wrong with or taint data. “There is an old saying in computing, ‘Garbage in, garbage out’. More than anything else, this applies to data. Your resume should not only show that you have worked with systems that are involved in this process, but also that you are adept at finding data quality issues and resolving them.”
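To make "finding data quality issues" a little more concrete, here is a minimal sketch of two common checks: required fields that are missing or empty, and duplicate keys. The function name, field names and sample records are hypothetical, and real pipelines typically check much more (types, ranges, freshness).

```python
def find_quality_issues(records, required_fields, key_field):
    """Return a list of (row_index, problem) tuples for simple quality checks.

    Flags required fields that are missing or empty, and rows whose
    key_field value has already appeared in an earlier row.
    """
    issues = []
    seen_keys = set()
    for index, record in enumerate(records):
        for field in required_fields:
            if record.get(field) in (None, ""):
                issues.append((index, f"missing {field}"))
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                issues.append((index, f"duplicate {key_field}"))
            seen_keys.add(key)
    return issues


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 1, "email": "b@example.com"},
    ]
    for row, problem in find_quality_issues(sample, ["id", "email"], "id"):
        print(f"row {row}: {problem}")
```

Checks like these are cheap to run at ingestion time, before the "garbage" reaches any downstream analysis.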
Data analysis
Having data is great, but if you can’t understand what it means for your company, then it’s ultimately a waste of resources. In the past, Thusoo says that it was important to find data analysts with skills in SQL and statistical and modeling tools like SAS and SPSS. But now, he says, as programming becomes more ubiquitous in the industry, and easier to learn, companies will want to look for other skills.
“Companies building modern data science capabilities should look for employees with programming abilities in Python, deep learning libraries and who can work with big data tools and infrastructure such as Spark, Hadoop and Hive apart from the traditional tools such as SQL,” says Thusoo.
Part of hiring the right person to assist in data analysis also includes determining how high-level you want your data analysis to be. For example, Mustafa says companies interested in a high-level interpretation, one that looks at user activity and engagement or predicts trends, might want someone with a broader knowledge of data science. However, businesses that want to home in on large amounts of data, or focus on predictions, will want to hire people with more specific skills. Mustafa says candidates with knowledge of optimization theory and machine learning will help build sound prediction models. Meanwhile, businesses that are tackling large amounts of data will want people well versed in tools like Hadoop and Apache Spark.
Data transformation
Data has had a huge impact on businesses. It has ushered in the age of digital transformation, and companies are scrambling to keep up with the rapid pace of technology. Part of that digital transformation revolves around data and properly integrating it into the day-to-day business. It’s something that requires not only a solid foundation in technology, but also a deep understanding of the business side of the company.
Transformation is about how data can help shape the future of the business and keep the company modern and innovative. That means you’ll want to hire people who can show their ability to assess a complex data situation, often drawing on multiple sources, and determine what is important, says Thusoo.
“Moreover, a data scientist must be able to visualize data in an informationally compact way. Visualization skill is the key to telling a story with data, which is the single most important skill for a data scientist. Telling a story with data, or communicating what the data is saying, is how data scientists ultimately add value for their employers,” says Mustafa.
Soft skills
While most of the skills mentioned are, for the most part, technical, it’s important not to overlook soft skills. The people you hire who are tasked with collecting, housing and interpreting that data are also going to be responsible for communicating it effectively to business executives. You’ll want to hire someone with strong communication skills to help balance out the more technical side, especially as big data is an emerging trend in businesses — not everyone in the company will be up to speed.
You want people with the right technical skills, of course, but it’s just as important to make sure you have employees who are willing to challenge the status quo in data and push boundaries.
“Companies need to hire new team members based not just on their skillset and tool knowledge, but on their dedication to staying on top of the field. A candidate who is a good match should be familiar with the tools that are currently employed in the organization, but also bring something innovative to the table,” says Polich.
This article was originally published on www.cio.com and can be viewed in full

