When we don't have enough data points to compute a confidence interval directly, we can use the idea of bootstrapping. Bootstrapping: the practice of sampling from the observed data $(X, Y)$ to estimate statistical properties of an estimator.
For instance, we can sample with replacement to create multiple datasets, train on each one, and obtain multiple fitted models.
This also reduces the influence of outliers.
Once we have multiple regression models, we can use the standard error to quantify the variability of the estimated coefficients.
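As a sketch of the resampling idea (the dataset, true coefficients, and noise level below are illustrative assumptions, not from the notes), a percentile bootstrap for the slope might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: y = 2 + 3x + noise (assumed for this example)
n = 30
x = rng.uniform(0, 10, size=n)
y = 2 + 3 * x + rng.normal(0, 2, size=n)

B = 1000  # number of bootstrap resamples
slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)      # sample (x, y) pairs with replacement
    s, _ = np.polyfit(x[idx], y[idx], 1)  # refit the line on each resample
    slopes[b] = s

# Percentile bootstrap 95% confidence interval for the slope
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"slope CI: [{lo:.2f}, {hi:.2f}]")
```

The spread of the resampled slopes plays the same role as the closed-form standard error derived below.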
Assume $y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon$ is modeled as a random variable with mean zero and variance $\sigma^2$.
$$ SE(\hat\beta_1) = \sqrt{\text{VAR}(\hat\beta_1)}\\
\text{VAR}(\hat\beta_1) = \text{VAR}\left(\frac{\sum_i(x_i-\bar x)(y_i-\bar y)}{\sum_i(x_i-\bar x)^2}\right) \\ \vdots\\ \text{VAR}(\hat\beta_1)=\frac{\sigma^2}{\sum_i(x_i-\bar x)^2} $$
so the standard error is thus
$$ SE(\hat\beta_1) = \frac{\sigma}{\sqrt{\sum_i(x_i-\bar x)^2}} $$
Similarly,
$$ \text{VAR}(\hat\beta_0)=\text{VAR}(\bar y-\hat\beta_1\bar x)\\\vdots\\ SE(\hat\beta_0) = \sigma\sqrt{\frac{1}{n}+\frac{\bar x^2}{\sum_i(x_i-\bar x)^2}} $$
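We can sanity-check these closed-form standard errors against a Monte Carlo simulation (the design $x$, true coefficients, and $\sigma$ below are assumptions chosen for illustration): refitting on fresh noise many times, the empirical spread of the estimates should match the formulas.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (assumed): fixed design x, known noise sd sigma
n, sigma = 50, 2.0
x = rng.uniform(0, 10, size=n)

sxx = np.sum((x - x.mean()) ** 2)

# Closed-form standard errors from the derivation above
se_beta1 = sigma / np.sqrt(sxx)
se_beta0 = sigma * np.sqrt(1 / n + x.mean() ** 2 / sxx)

# Monte Carlo check: regenerate the noise, refit, and compare the
# empirical standard deviation of the estimates to the formulas
reps = 2000
b1s, b0s = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 2 + 3 * x + rng.normal(0, sigma, size=n)
    b1s[r], b0s[r] = np.polyfit(x, y, 1)  # returns (slope, intercept)

print(se_beta1, b1s.std())  # should agree up to Monte Carlo error
print(se_beta0, b0s.std())
```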
- better data → $\sigma^2$ goes down, SE goes down
- more data → $n$ goes up, SE goes down
- larger coverage → the spread of the $x_i$ around $\bar x$ goes up, SE goes down
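The three levers can be read directly off the formula $SE(\hat\beta_1) = \sigma/\sqrt{\sum_i(x_i-\bar x)^2}$; a quick numeric check (the specific designs below are arbitrary illustrations):

```python
import numpy as np

# SE(beta1) = sigma / sqrt(sum (x - xbar)^2); vary each factor in turn
def se_slope(sigma, x):
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

x = np.linspace(0, 10, 20)
base = se_slope(2.0, x)

print(se_slope(1.0, x) < base)                        # better data (smaller sigma)
print(se_slope(2.0, np.linspace(0, 10, 80)) < base)   # more data points
print(se_slope(2.0, np.linspace(0, 20, 20)) < base)   # wider x coverage
```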
If the noise variance is unknown (not uncommon), it too needs to be estimated from the data, using the residuals of the fitted model:
$$ \hat\sigma = \sqrt{\frac{\sum_i (y_i - \hat f(x_i))^2}{n-2}} $$
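A minimal check of this estimator (the data-generating process with known $\sigma = 2$ is an assumption, used so we can compare the estimate to the truth): dividing the residual sum of squares by $n-2$ accounts for the two fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data with known noise sd sigma = 2 (assumed)
n, sigma = 200, 2.0
x = rng.uniform(0, 10, size=n)
y = 2 + 3 * x + rng.normal(0, sigma, size=n)

b1, b0 = np.polyfit(x, y, 1)  # slope, intercept
resid = y - (b0 + b1 * x)

# Residual standard error: n - 2 degrees of freedom (two fitted parameters)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))
print(sigma_hat)  # should be close to the true sigma
```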