SeekACE Academy · 1st Jul, 2021

DATA SCIENCE INTERVIEW PART-2

21) What are the drawbacks of the linear model?

  • It assumes a linear, straight-line relationship between the dependent and independent variables, which doesn’t hold true all the time.
  • It cannot be used directly for count outcomes or binary outcomes; generalized linear models such as Poisson or logistic regression are needed for those.
  • It is prone to overfitting, especially with many features, and has no built-in mechanism to prevent it.
  • It assumes that the observations are independent, which is usually, but not always, true.
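
The linearity drawback can be seen in a minimal sketch (synthetic data, NumPy assumed available): a straight line fitted to clearly non-linear data explains almost none of the variance.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 200)
y = x ** 2 + rng.randn(200) * 0.3   # clearly non-linear relationship

# Fit a straight line y = a*x + b by least squares.
a, b = np.polyfit(x, y, deg=1)
pred = a * x + b

# R^2 is near zero: the linearity assumption fails on this data.
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("slope:", round(a, 2), "R^2:", round(r2, 2))
```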

22) Explain what regularization is and why it is useful?

Regularization is the process of making modifications to the learning algorithm so that it generalizes better to unseen data, even though it may not improve, and can even slightly worsen, performance on the training set. The process regularizes, or shrinks, the model coefficients towards zero. In simple words, regularization discourages learning an overly complex or flexible model in order to prevent overfitting.

23) Explain L1 and L2 Regularization?

Both L1 and L2 regularization are used to avoid overfitting in the model. The key difference between L1 regularization (Lasso) and L2 regularization (Ridge) is the penalty term: L1 adds the sum of the absolute values of the coefficients to the loss, while L2 adds the sum of their squares. Because the L1 penalty can shrink coefficients exactly to zero, Lasso effectively removes features from the model, which makes it work much better for feature selection when we have a huge number of features; Ridge only shrinks coefficients towards zero and retains all features.
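
A minimal sketch of the difference, assuming scikit-learn is available and using synthetic data where only the first two of five features actually matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 5 features, only the first two influence the target.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.randn(200) * 0.1

lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty

# L1 drives irrelevant coefficients exactly to zero (feature selection);
# L2 only shrinks them towards zero.
print("Lasso:", lasso.coef_.round(2))
print("Ridge:", ridge.coef_.round(2))
```

With these illustrative settings, Lasso zeroes out the three irrelevant coefficients, while Ridge leaves them small but nonzero.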

24) Differentiate between univariate, bivariate and multivariate analysis?

Descriptive statistical techniques that examine a single variable at a time are called univariate analysis, one of the simplest forms of statistical analysis. A pie chart of sales by territory involves only one variable; bar charts, histograms, etc. are other examples of univariate analysis.

Quantitative analysis that attempts to understand the changes that occur between two variables, and to what extent, is denoted bivariate analysis. The variables are generally denoted x and y, where one is dependent and the other independent. Analysing sales volume against spending is an example of bivariate analysis. These analyses are often used in quality-of-life research.

Analysis that studies more than two variables at once, typically several independent variables used to predict the effect on a dependent variable, is referred to as multivariate analysis. Due to the size and complexity of the underlying data sets, multivariate analysis requires considerable computational effort.
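
The three levels of analysis can be sketched with NumPy on hypothetical sales data (all variable names and figures here are illustrative):

```python
import numpy as np

rng = np.random.RandomState(1)
spend = rng.uniform(10, 100, 50)          # advertising spend
footfall = rng.uniform(100, 500, 50)      # store visitors
sales = 2 * spend + 0.5 * footfall + rng.randn(50) * 5

# Univariate: summarise one variable at a time.
print("mean sales:", round(sales.mean(), 1))

# Bivariate: relationship between two variables.
print("corr(spend, sales):", round(np.corrcoef(spend, sales)[0, 1], 2))

# Multivariate: several predictors of one outcome (multiple regression).
X = np.column_stack([spend, footfall, np.ones_like(spend)])
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)
print("fitted coefficients:", coefs.round(2))
```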

25) What do you mean by the law of large numbers?

This is a principle of probability according to which, as the number of trials grows large, the observed frequency of an event converges to its true probability; chance fluctuations even out after a significant number of trials.
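
A quick simulation for a fair coin (Python standard library only): the fraction of heads drifts towards the true probability 0.5 as the number of flips grows.

```python
import random

random.seed(42)

# Simulate n fair coin flips and return the proportion of heads.
def heads_fraction(n_flips):
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# The running proportion settles near 0.5 as trials increase.
for n in (10, 1000, 100000):
    print(n, round(heads_fraction(n), 3))
```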

26) What is a statistical interaction?

It is a statistically established relationship between two or more variables in which the effect of one factor (input variable) on the dependent variable (output variable) differs across the levels of another factor.
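
A sketch of detecting an interaction with NumPy on simulated data, where the effect of a dose variable on the outcome differs between two groups (all names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.RandomState(2)
n = 500
dose = rng.uniform(0, 1, n)       # factor 1: continuous input
group = rng.randint(0, 2, n)      # factor 2: two levels

# Interaction: the slope of dose on the outcome depends on group
# (slope 1 in group 0, slope 3 in group 1).
outcome = 1 * dose + 2 * dose * group + rng.randn(n) * 0.1

# Fit a regression that includes a dose*group interaction term.
X = np.column_stack([dose, group, dose * group, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print("interaction coefficient:", round(coefs[2], 2))
```

A clearly nonzero interaction coefficient indicates that the dose effect is not the same in both groups.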

27) What do you mean by the term Normal Distribution?

An arrangement of data in which most values cluster in the middle of the range and the rest taper off symmetrically towards the left and right is called a normal distribution. Graphically, the random variable's distribution is a symmetrical bell-shaped curve with its peak always in the middle. For example, if we measure height, most people are of average height and only small numbers of people are much taller or much shorter than average. The mean, median, and mode are the same in a normal distribution.
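
A small NumPy sketch of these properties, using simulated heights (the 170 cm mean and 10 cm standard deviation are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
# Simulated heights: normally distributed around 170 cm with sd 10 cm.
heights = rng.normal(loc=170, scale=10, size=100000)

# For a normal distribution, mean, median and mode coincide.
print("mean:  ", round(heights.mean(), 1))
print("median:", round(float(np.median(heights)), 1))

# About 68% of values fall within one standard deviation of the mean.
within_1sd = np.mean(np.abs(heights - 170) < 10)
print("within +/- 1 sd:", round(float(within_1sd), 3))
```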

 

28) What is the goal of A/B Testing?

The goal of A/B testing is to compare two versions of a web page and identify which changes maximise the result of interest. A/B testing is a fantastic method for determining the most effective online promotional and marketing strategies for your business. It can be used to test everything from website copy to sales emails to search ads, and has thus become a staple for marketers.

An example of this could be identifying the click-through rate for a banner ad.
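
For instance, the banner-ad example can be analysed with a two-proportion z-test (the click and view counts below are hypothetical; SciPy assumed available):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical banner-ad experiment: each variant shown 10,000 times.
clicks_a, views_a = 200, 10000
clicks_b, views_b = 260, 10000

p_a, p_b = clicks_a / views_a, clicks_b / views_b
p_pool = (clicks_a + clicks_b) / (views_a + views_b)

# Two-proportion z-test for a difference in click-through rate.
se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))
print(f"CTR A={p_a:.3%}, CTR B={p_b:.3%}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value here would suggest the difference in click-through rate between the two banners is unlikely to be due to chance alone.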

29) What is p-value?

When you perform a hypothesis test in statistics, the p-value helps you gauge the strength of your results and decide whether there is evidence to reject the null hypothesis. The p-value is a number between 0 and 1: the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.

A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so we reject it. A high p-value (> 0.05) indicates weak evidence against the null hypothesis, so we fail to reject it (strictly speaking, we never accept the null hypothesis; we only fail to reject it). A p-value near 0.05 is marginal and could go either way. Therefore, with high p-values your data are consistent with a true null, whereas with low p-values they are not.
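
A minimal sketch of this decision rule using a one-sample t-test from SciPy (the sample and the hypothesised mean of 50 are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
# Simulated sample whose true mean (55) differs from the null value (50).
sample = rng.normal(loc=55, scale=5, size=40)

# One-sample t-test of H0: population mean equals 50.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value <= 0.05:
    print("Low p-value: reject the null hypothesis")
else:
    print("High p-value: fail to reject the null hypothesis")
```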

30) What is the difference between Point Estimates and Confidence Interval?

A point estimate gives statisticians a single value as the estimate of a given population parameter. Point estimates are subject to bias, where the bias is the difference between the expected value of the estimator and the true value of the population parameter. A well-defined formula is used to calculate a point estimate; the Method of Moments and Maximum Likelihood Estimation are commonly used to derive point estimators for population parameters.

A confidence interval gives us a range of values that is likely to contain the population parameter. It is generally preferred to a point estimate because its lower and upper limits bound the interval and come with a statement of how likely the interval is to contain the parameter. This probability is termed the confidence level (or confidence coefficient) and is written 1 − alpha, where alpha denotes the level of significance. How precise the interval is depends on the sample statistics and the margin of error.
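
A sketch of both ideas with SciPy, computing a point estimate and a 95% confidence interval for a population mean (simulated data with illustrative parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
data = rng.normal(loc=100, scale=15, size=200)

# Point estimate: a single number for the population mean.
point_estimate = data.mean()

# 95% confidence interval: point estimate +/- margin of error, where the
# margin uses the t critical value and the standard error of the mean.
confidence = 0.95
sem = stats.sem(data)
t_crit = stats.t.ppf((1 + confidence) / 2, df=len(data) - 1)
margin = t_crit * sem
lower, upper = point_estimate - margin, point_estimate + margin
print(f"point estimate: {point_estimate:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```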
