Hello Data Experts,
Let me continue from my last blog,
http://outstandingoutlier.blogspot.in/2017/08/normality-test-for-data-using-r.html
“Normality test using R as part of advanced Exploratory Data Analysis”, where I covered the four moments of statistics and the key concepts of probability distribution, normal distribution and standard normal distribution. Finally, I also touched upon how to transform data to run a normality test. Let me recap those four moments of statistics.
- The first moment covers Mean, Median and Mode; it is a measure of central tendency.
- The second moment covers Variance, Standard Deviation and Range; it is a measure of dispersion.
- The third moment covers Skewness; it is a measure of asymmetry.
- The fourth moment covers Kurtosis; it is a measure of peakedness.
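As a quick illustration of the four moments listed above, here is a minimal sketch in base R. The vector x holds made-up values chosen only for this example, and skewness and kurtosis are computed with the simple moment formulas (an assumption on my part; packages may apply small-sample corrections and give slightly different numbers).

```r
# Hypothetical sample values, used only for illustration
x <- c(2, 4, 4, 5, 7, 9, 12)
n <- length(x)
m <- mean(x)

# First moment: central tendency
# (base R has no statistical mode function, so we take the most frequent value)
central <- c(mean   = m,
             median = median(x),
             mode   = as.numeric(names(which.max(table(x)))))

# Second moment: dispersion
dispersion <- c(variance = var(x),
                sd       = sd(x),
                range    = diff(range(x)))

# Third moment: skewness (positive means a longer right tail)
skewness <- (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^(3 / 2)

# Fourth moment: kurtosis (a normal distribution scores about 3)
kurtosis <- (sum((x - m)^4) / n) / (sum((x - m)^2) / n)^2

print(central)
print(dispersion)
print(c(skewness = skewness, kurtosis = kurtosis))
```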
We have learned thus far that, for a continuous distribution, the probability of any single exact value is always zero, but we can get the probability of a value being less than or greater than a given point from the standard normal distribution using the pnorm function. Generally, in the industry, 95% is the starting benchmark for the confidence that an expected outcome will fall within a given range. In statistical terms this is called the confidence level. Put simply, if we drew many samples and built an interval from each one, about 95% of those intervals would contain the population mean.
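To make the "less than or greater than" idea concrete, here is a small sketch using pnorm on the standard normal distribution. The cut-off of 1.96 is my own choice for illustration because it lines up with the 95% benchmark.

```r
# Cut-off on the standard normal scale (chosen for illustration)
z <- 1.96

# Probability of a value being less than the cut-off
p_below <- pnorm(z)

# Probability of a value being greater than the cut-off
p_above <- pnorm(z, lower.tail = FALSE)

# Probability of a value falling between -z and +z
p_between <- pnorm(z) - pnorm(-z)

print(c(below = p_below, above = p_above, between = p_between))
```

The "between" probability comes out at roughly 0.95, which is why 1.96 keeps showing up whenever a 95% confidence level is used.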
We will touch upon the Z distribution and T distribution techniques. There is always an open question about when to use which technique. As a matter of experience and usage, I follow the guiding principle below: if the sample size is < 30 (a sample of fewer than 30 is categorized as small in the statistical world) and the standard deviation of the population is unknown, the T distribution should be the first choice, whereas if the sample size is large, i.e., > 30, and the SD of the population is known, the Z distribution should be the technique. As the sample size increases, the two techniques produce closer and closer results.
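To see how the two techniques move closer as the sample grows, here is a small sketch comparing the T score with the Z score at 95% confidence. The degrees-of-freedom values are arbitrary picks for illustration.

```r
# Z score for 95% confidence (two-sided)
z <- qnorm(0.975)

# Degrees of freedom to compare (arbitrary illustrative values)
df <- c(5, 10, 29, 99, 199, 999)

# T score for the same confidence at each degree of freedom
t_scores <- qt(0.975, df)

# Gap between T and Z shrinks as degrees of freedom grow
gap <- t_scores - z

print(data.frame(df = df, t = round(t_scores, 4), gap = round(gap, 4)))
```

The T score is always a little larger than the Z score, so the T interval is a little wider, but the difference becomes very small for large samples.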
Confidence Interval = Sample mean ± Margin of Error
Z Distribution = Sample mean ± Z(1-α/2) value * (Population SD / square root of (sample size))
T Distribution = Sample mean ± T(1-α/2, n-1) value * (Sample SD / square root of (sample size))
Let us consider an e-retailer who has 10500 registered customers and wants to launch a new offer for them, but before doing so she would like to gauge the confidence level of success. Before going for the full launch, she chose 200 customers and granted them access to the new promotion, where on average 5 new products were purchased during this pilot launch, with a standard deviation of 6. The e-retailer typically launches a new promotion every month, hence she has an SD of 5.5 from the last launch to the full population. Before the new full launch she wants a 95% confidence level to go full scale.
Here we have a sample size > 30 (a big sample) and the population SD is also known, so the Z distribution is the appropriate option here.
We can take the manual route and, using a Z table, come up with the Z score for a 95% confidence level and then calculate the confidence interval, but using R it is simple to get the Z score by executing the “qnorm” command.
# for 95% confidence, the value to pass will be .975 (for easy remembrance follow 95+(100-95)/2 = 97.5%)
qnorm(.975) # result will be 1.959964
Once we get the Z value (1.959964), with the sample mean as 5, population SD (5.5) and sample size (200), applying the formula gives the confidence interval.
5 - (1.959964*(5.5/Square root (200))) to 5 + (1.959964*(5.5/Square root (200)))
The pilot launch showed the e-retailer that there is 95% confidence that the average sale will fall in the range from 4.24 to 5.76.
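The Z calculation for the e-retailer can be sketched in R as below. The numbers come from the example (sample mean 5, population SD 5.5, sample size 200); the variable names are my own and only for illustration.

```r
# Inputs from the e-retailer example
x_bar <- 5      # sample mean from the pilot launch
sigma <- 5.5    # population SD known from the last launch
n     <- 200    # pilot sample size
conf  <- 0.95   # desired confidence level

# Z score for a two-sided interval: 1 - (1 - 0.95)/2 = 0.975
z <- qnorm(1 - (1 - conf) / 2)

# Margin of error and the resulting confidence interval
margin <- z * sigma / sqrt(n)
ci_z <- c(lower = x_bar - margin, upper = x_bar + margin)

print(round(ci_z, 2))
```

Rounded to two decimals, this gives the same 4.24 to 5.76 range as the manual calculation.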
Let us assume there was no earlier launch and hence it is the first time the e-retailer is trying to launch a promotion. In this case the population SD is not available, so the only change is that, instead of the population SD, it is recommended to use the sample SD together with the degrees of freedom. Degrees of freedom can be taken as n-1 because, once the mean is known and we have n-1 values, the last value is fixed.
We can take the manual route and, using a T table, come up with the T score for 95% confidence and 199 degrees of freedom and then calculate the confidence interval, but using R it is simple to get the T score by executing the “qt” command.
# for 95% confidence, the value to pass will be .975 (for easy remembrance follow 95+(100-95)/2 = 97.5%), whereas degrees of freedom will be 199, i.e., sample size minus 1
qt(.975, 199) # result will be 1.971957
Once we get the T value (1.971957), with the sample mean as 5, sample SD (6) and sample size (200), applying the formula gives the confidence interval.
5 - (1.971957*(6/Square root (200))) to 5 + (1.971957*(6/Square root (200)))
The pilot launch showed the e-retailer that there is 95% confidence that the average sale will fall in the range from 4.16 to 5.84.
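The same calculation with the T distribution can be sketched in R as below. The inputs are from the example (sample mean 5, sample SD 6, sample size 200); the variable names are my own.

```r
# Inputs from the e-retailer example, population SD unknown
x_bar <- 5      # sample mean from the pilot launch
s     <- 6      # sample SD from the pilot launch
n     <- 200    # pilot sample size
conf  <- 0.95   # desired confidence level

# T score for a two-sided interval with n - 1 degrees of freedom
t_score <- qt(1 - (1 - conf) / 2, df = n - 1)

# Margin of error and the resulting confidence interval
margin <- t_score * s / sqrt(n)
ci_t <- c(lower = x_bar - margin, upper = x_bar + margin)

print(round(ci_t, 2))
```

If the raw purchase counts for the 200 customers were available, base R's t.test function would report this kind of interval directly; here we only have the summary numbers, so the formula is applied by hand.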
If we know the benchmark confidence level, we can proceed with the range, but if we would like to understand the confidence level for a given LCL or UCL, we can use the “pt” command.
pt(1.971957, 199) # result will be .975, i.e., 95% confidence for the two-sided interval
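Going in the reverse direction can be sketched as below: start from a limit, convert it to a T score, and read off the confidence with pt. The UCL of 5.84 is the rounded value from the example, so the result is only approximately 95%; the variable names are my own.

```r
# Inputs from the e-retailer example
x_bar <- 5      # sample mean
s     <- 6      # sample SD
n     <- 200    # sample size
ucl   <- 5.84   # upper confidence limit (rounded value from the example)

# Convert the limit to a T score
t_ucl <- (ucl - x_bar) / (s / sqrt(n))

# Cumulative probability below the limit (about 0.975)
cum_prob <- pt(t_ucl, df = n - 1)

# Two-sided confidence level: remove both tails
conf_level <- 2 * cum_prob - 1

print(c(t = t_ucl, cumulative = cum_prob, confidence = conf_level))
```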
I hope this topic was helpful in understanding the Z and T distribution concepts and how to derive the Z score and T score using R. Sample size and whether the standard deviation of the population is known play a key role in deciding which technique to opt for.
Thank you for going through this blog. I hope it helped you build a sound foundation of the Z and T distributions using R. Kindly share your valuable opinion, and please do not forget to suggest what you would like to understand and hear from me in my future blogs.
Thank you...
Outstanding Outliers::
"AG".