
Is today's world all about creativity and ideation?

Are they the seeds to be nurtured to bring in automation, innovation and transformation? There is a saying that necessity is the mother of invention; I would say innovation is an amalgamation of creativity and necessity. We need to understand the ecosystem in order to apply creativity and identify the ideas that bring change. We need to keep pace with the changing ecosystem and think beyond the possible. What is the biggest challenge in doing this? "Unlearning and learning": we tend to think the current ecosystem is the best. Be it health, finserv, agriculture or the mechanical domain, we need to empathize with the stakeholders to come up with a strategy to drive change. The most evident example is that the quality of life is changing every moment. A few decades back a phone connection was limited to a few, but today practically every millennial has a mobile phone. Now a phone is not just a medium to talk; it is such a powerful device that innovative solutions can be built on it.

What skills does one need to be a successful Data Scientist?

Hello Data Scientists,

Let me continue from my last blog http://outstandingoutlier.blogspot.in/2017/08/how-data-scientist-can-help.html :: “How Data Scientist can help organization grow?”, where I wrote about the importance of Data Scientists and their role in growth. To conclude that blog: Data Scientists are skilled with the analytical capability to analyze huge data clouds and intelligently articulate the outcomes, allowing leaders to enrich their choices. They empower everyone by unleashing hidden data views to the outer world for ready consumption.

To be a good Data Scientist, start with A to V…Z, i.e., Analytical skills, Industry Domain, Mathematics, Statistical algorithms, Power Tools, user-friendly and easy-to-read Visualization, with Z as the Zeal to learn the power of data. –  AG

To be an expert in any field one should be qualified in certain skills, and it is the same for a Data Scientist. From our school days we are taught a few concepts around Statistics; at graduation we are taught industry domains; during specialization we get exposure to power tools like Excel and reporting tools; and with the fast pace of technology change there are now many new powerful tools like R, Python, SAS, etc.

Let me articulate what skills (in no particular order) one should possess to be a Data Scientist. Kindly bear in mind that this revolves around “Data Science”, which is a combination of “DATA” and “SCIENCE”, hence most of the skills revolve around these two key areas.

MATHEMATICS:
To be a Data Scientist, basic concepts of Mathematics are very important, as this profession is all about dealing in numbers. One should know basics like different data types, data operations, probability, and probability distributions. It is GOOD TO HAVE hands-on experience with algebra, equations and probability fundamentals. That said, one does not need a degree in Mathematics; school-level math knowledge is good enough to start the analytical journey towards Data Science.
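As a minimal sketch of the school-level probability in play here (my own illustrative example in Python, one of the tools named later in this post):

```python
import math

def binomial_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): the probability of exactly
    k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Example: the chance of exactly 5 heads in 10 fair coin tosses.
print(binomial_pmf(10, 5, 0.5))
```

Nothing beyond `math.comb` and exponents is needed, which is exactly the point: basic probability fundamentals go a long way.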

ANALYTICAL SKILLS:
An analytical mindset is one of the important skills (though not a must at the start). Anyone who enjoys solving puzzles or working on brain teasers is better placed and will have an edge over those who do not find it interesting. I love puzzle solving; my favorite pastime is successfully finishing a Sudoku. GOOD TO HAVE to get going, but a MUST HAVE skill from a long-term perspective.

STATISTICAL ANALYSIS:
Statistical Analysis complements analytical skills. This is the most important and a MUST HAVE skillset for a good Data Scientist. Individuals with good statistical analysis skills can infer better outcomes from raw data using the right techniques.
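To make "inferring from raw data" concrete, here is a small sketch (the data is hypothetical and purely illustrative) that turns raw observations into an estimate with a rough 95% interval using the normal approximation:

```python
import math
import statistics

# Hypothetical raw data: daily support tickets over eight days.
tickets = [12, 15, 11, 14, 13, 16, 12, 15]

mean = statistics.mean(tickets)                 # central tendency
sd = statistics.stdev(tickets)                  # sample standard deviation
se = sd / math.sqrt(len(tickets))               # standard error of the mean
low, high = mean - 1.96 * se, mean + 1.96 * se  # rough 95% interval

print(f"mean={mean}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

The interval, not just the average, is what lets a leader judge how much to trust the number.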

POWER TOOLS:
There are 2 layers of tools a proficient Data Scientist should know: Data Collection tools and Data Consumption/Processing tools.

Data Collection Tools:
Data can be collected either as structured data or unstructured data. Unstructured data is collected primarily as secondary data from applications, whereas structured data is well-designed data collected as applications are used. Unstructured data can later be processed using tools and converted into structured data. CSV, XML, logs and JSON are semi-structured formats that need such processing, whereas SQL and other relational databases hold structured data. GOOD TO KNOW tools.
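A tiny sketch of that conversion step, using hypothetical CSV and JSON snippets and only the Python standard library:

```python
import csv
import io
import json

# Hypothetical semi-structured inputs (illustrative only).
csv_text = "name,age\nAsha,34\nRavi,29\n"
json_text = '{"name": "Meena", "age": 41}'

# csv.DictReader turns each CSV row into a dict, i.e. a structured record;
# json.loads does the same for a JSON document.
rows = list(csv.DictReader(io.StringIO(csv_text)))
rows.append(json.loads(json_text))

print(rows)
```

Once everything is a list of uniform records, it can be loaded into a relational table or fed straight into analysis.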

Data Processing Tools:
Once data is collected, it is a MUST for a Data Scientist to know at least one statistical tool like R, Python or SAS. This helps gain productivity and avoids performing all calculations manually. There are easy-to-use basic tools like Excel, SQL and scientific calculators, but each has its limitations. GO FOR R, Python or any other statistical programming language. It is also good to understand a big data tool like Hadoop, where data is stored across multiple servers because of its size but coordinated by a master node so it can be worked with as one dataset.
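To show what "avoiding manual calculations" buys you, here is the same statistic computed by hand and with one library call (illustrative data, Python standing in for whichever tool you pick):

```python
import math
import statistics

data = [4.0, 8.0, 6.0, 5.0, 3.0]  # hypothetical measurements

# The manual calculation a statistical tool saves you from:
mean = sum(data) / len(data)
manual_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))

# One library call does the same work:
library_sd = statistics.stdev(data)

print(manual_sd, library_sd)
```

On five numbers the difference is trivial; on five million rows the library version is the only practical option.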

INDUSTRY KNOWLEDGE OR DOMAIN:
As a scientist, it is always advantageous to understand the industry domain, as statistical thresholds are driven by industry parameters. For example, in many domains a significance level around 95%, i.e., within the 2σ to 3σ limits, is acceptable, whereas in Healthcare and Aviation anything less than 6σ can be disastrous. Outcomes become more relevant and accurate as one understands the domain. This is a good-to-have skill, but I personally will keep it under MUST HAVE.
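The sigma limits above can be sketched numerically: for a normal distribution, the probability of falling within k sigma of the mean is erf(k/√2), which Python's standard library computes directly:

```python
import math

def sigma_coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3, 6):
    print(f"{k} sigma -> {sigma_coverage(k):.10f}")
```

2σ covers about 95.45% and 3σ about 99.73%, while 6σ leaves only a couple of defects per billion under this pure-normal view. (The Six Sigma quality convention additionally assumes a 1.5σ process shift, so its quoted defect rates differ from these raw coverages.)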
   
UX/UI OR VISUALIZATION:
If data is not presented to the right forum in the RIGHT FORM, it is not useful. Data visualization is a GOOD TO HAVE skill, as it can be complemented by data writers for final consumption by leaders. R has good graphical visualization capabilities that can be leveraged to build fantastic-looking dashboards.
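Even without a charting library, the idea of the "right form" can be sketched in a few lines: the same counts that are unreadable as a raw list become obvious as bars (hypothetical data, a toy text chart rather than a real dashboard):

```python
from collections import Counter

# Hypothetical sales records (illustrative only).
sales = ["North", "South", "North", "East", "North", "South"]

# A bar chart reduced to its essence: one row per category,
# bar length proportional to the count.
for region, count in sorted(Counter(sales).items()):
    print(f"{region:<6} {'#' * count}")
```

A real tool (ggplot2 in R, matplotlib in Python) applies exactly this mapping from values to marks, just with better output.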

In my next blog, I will share more insight into how to pick a statistical programming language. Though it is a matter of choice, I will share my rationale for picking R and work through it.

Thank you once again for sparing the time to go through this article; I hope it has helped you understand what it takes to be a successful Data Scientist. Kindly share your views and what you would like to see and hear from me in my future blogs.

Thank you...
Outstanding Outliers "AG".  

