Random Forest different results for same observation
$begingroup$
Hi, I am fairly new to random forest estimation, and I could not find a question similar to mine. I was surprised that the predictions differ for observations with the same predictors; I would have expected them to be identical. I understand that the model will be different with each estimation, but why do I get different predictions for the same predictors?
library(randomForest)
set.seed(100)
df <- mtcars
# Fit on the original 32 rows
rt.est <- randomForest(mpg ~ .,
                       data = df,
                       ntree = 1000)
predict(rt.est)
# Append a copy of row 32 (Volvo 142E) and refit
df.double <- rbind(df, df[32, ])
rt.est <- randomForest(mpg ~ .,
                       data = df.double,
                       ntree = 1000)
predict(rt.est)
The predictions for the last two observations, both for the Volvo 142E, are similar but not the same. Why?
r random-forest
$endgroup$
$begingroup$
Once you have an answer to the question as stated, you shouldn't really change the substance of the question in a way that breaks the connection with the existing answers. If you want to ask another question as a fork of this one, just start a new thread. You can link back for context, if you want.
$endgroup$
– gung♦
2 hours ago
$begingroup$
I did not alter the question substantially. It was always about getting a different prediction for the same predictors.
$endgroup$
– Max M
2 hours ago
asked 8 hours ago by Max M; edited 2 hours ago by gung♦
1 Answer
$begingroup$
If you supply the argument newdata, the discrepancy disappears: predict(rt.est, newdata=df.double) gives, for the last two rows,
 Volvo 142E  Volvo 142E1
   22.15557     22.15557
When you do not supply newdata, predict returns the out-of-bag predictions; the help page for predict.randomForest notes this in its description of the newdata argument. Samples that are "in-bag" for a tree were included in that tree's training data as a result of the bootstrap resampling procedure; out-of-bag samples were omitted from it.
We can verify this by inspecting rt.est$predicted, which stores the out-of-bag predictions. Its values match predict(rt.est):
 Volvo 142E  Volvo 142E1
   22.83609     22.85975
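The comparison above can be reproduced with a short sketch like the following. It assumes the randomForest package is installed and refits the forest once on the 33-row data, so the exact numbers may differ from those quoted; the object names rf, oob, full and twins are illustrative only.

```r
library(randomForest)

set.seed(100)
df.double <- rbind(mtcars, mtcars[32, ])  # row 33 is the copy, "Volvo 142E1"
rf <- randomForest(mpg ~ ., data = df.double, ntree = 1000)

twins <- 32:33  # the original Volvo 142E row and its duplicate

# No newdata: the stored out-of-bag predictions are returned
oob <- predict(rf)
print(all.equal(oob, rf$predicted))

# newdata given: every tree predicts every row, so identical rows agree
full <- predict(rf, newdata = df.double)
print(oob[twins])
print(full[twins])
```

The two entries of full[twins] should be equal, while the two entries of oob[twins] will generally differ slightly.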
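One way to think of predict.randomForest is that it's a shortcut to the out-of-bag predictions unless you supply newdata. Because the two Volvo rows are distinct rows of the training data, each is out-of-bag for a different subset of trees, so their out-of-bag predictions are averages over different trees and need not agree.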
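One way to see why the duplicated rows can get different out-of-bag predictions is to check which trees each row was left out of. The sketch below is illustrative rather than definitive: it assumes the keep.inbag = TRUE option of randomForest, which stores a rows-by-trees matrix in rf$inbag with zero entries marking out-of-bag rows. The names oob.a and oob.b are made up for this example.

```r
library(randomForest)

set.seed(100)
df.double <- rbind(mtcars, mtcars[32, ])
# keep.inbag = TRUE records, per row and tree, whether the row was drawn
rf <- randomForest(mpg ~ ., data = df.double, ntree = 1000, keep.inbag = TRUE)

oob.a <- rf$inbag[32, ] == 0  # trees that never drew "Volvo 142E"
oob.b <- rf$inbag[33, ] == 0  # trees that never drew "Volvo 142E1"

# How many trees each row is out-of-bag for, and how many they share
print(c(a = sum(oob.a), b = sum(oob.b), both = sum(oob.a & oob.b)))

# Trees where one twin is out-of-bag while its copy was used for training
print(sum(oob.a & !oob.b))
```

If the two logical vectors are not identical, the two out-of-bag predictions are averages over different sets of trees, and in some of those trees the twin row was part of the training sample.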
The OP originally asked about two different random forest ensembles. The rest of this answer addresses why two random forest ensembles might make different predictions on the same data.
The trees are different.
First, randomForest is a random procedure: the samples chosen for each tree differ (bootstrap resampling), and the features considered at each split are chosen at random (randomized feature subspaces). Without fixing the random seed, we would expect two randomForest runs to produce different results with high probability, for the same reason that two rounds of 1000 fair coin flips will almost surely give different sequences of heads and tails. (You have fixed the seed, but the two randomForest fits will still differ because the first fit advances the random state.)
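The effect of the random state can be checked with a small sketch. It assumes randomForest draws all of its randomness from R's global random number generator, so re-seeding before an identical call reproduces the fit. The helper fit and the object names a, b and a.again are hypothetical.

```r
library(randomForest)

# Hypothetical helper: the same call each time, only the RNG state varies
fit <- function() randomForest(mpg ~ ., data = mtcars, ntree = 200)

set.seed(100)
a <- fit()
b <- fit()         # the RNG state has advanced, so this forest differs from a

set.seed(100)
a.again <- fit()   # same seed, same call: should reproduce a

print(identical(a$predicted, a.again$predicted))
print(identical(a$predicted, b$predicted))
```

To compare two forests on an equal footing, call set.seed immediately before each fit. Even then, changing the data (for example by adding a row) changes the bootstrap samples and therefore the trees.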
Second, the data used to train the two models are different: you've added a row. Different data make for a different model, which makes for different predictions. When randomForest draws the bootstrap sample for a tree, the probability that the observation in df[32,] is in-bag is larger than for any non-duplicated observation, because either of its two copies can be drawn. This change to the data will also change the trees, because the choice of splits is influenced by the increased weight of this observation.
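The size of this effect can be estimated with a little arithmetic. The sketch assumes the default bootstrap, in which each tree draws n rows with replacement from the n training rows; the names p.single and p.twin are illustrative.

```r
n <- 33  # rows in df.double: 32 cars plus one duplicate

# P(a specific row is drawn at least once in n draws with replacement)
p.single <- 1 - (1 - 1 / n)^n

# P(at least one of the two identical Volvo rows is drawn)
p.twin <- 1 - (1 - 2 / n)^n

print(round(c(single = p.single, twin = p.twin), 3))
```

Under this assumption the duplicated observation is noticeably more likely to appear in a tree's training sample than any single non-duplicated row.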
Different trees make different predictions.
Having the same feature values is only half the battle. The other half is how the trees are constructed.
As an example, suppose I have 3 trees constructed with the random forest procedure, each with 1 split.
- Tree 1 has a bootstrap resample and randomly samples cyl, disp and hp. It picks splitting on cyl at 5 as the best split.
- Tree 2 has a bootstrap resample and randomly samples cyl, disp and hp. It picks splitting on cyl at 7 as the best split.
- Tree 3 has a bootstrap resample and randomly samples wt, disp and hp. It picks splitting on hp at 123 as the best split.
Clearly there will be different predictions whenever the splits change the decision of a sample. A sample with cyl 6 might go "right" for tree 1, but "left" for tree 2. Feature hp doesn't have a one-to-one relationship to cyl, so a split on hp won't generally match splits on cyl.
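The three-tree example can be written out as a toy sketch. Everything beyond the split features and thresholds is made up for illustration: the leaf values, the sample car, and the convention that values at or below the threshold go left. The helper stump is hypothetical and is not part of randomForest.

```r
# Build a one-split tree; rows with feature <= threshold go left
stump <- function(feature, threshold, left, right) {
  function(x) if (x[[feature]] <= threshold) left else right
}

# Leaf values (mpg) are invented for illustration
trees <- list(
  stump("cyl", 5,   left = 27, right = 17),
  stump("cyl", 7,   left = 24, right = 15),
  stump("hp",  123, left = 23, right = 16)
)

car <- list(cyl = 6, hp = 110)  # hypothetical sample
preds <- sapply(trees, function(tree) tree(car))
print(preds)        # per-tree predictions
print(mean(preds))  # the forest averages them
```

The same car goes right in tree 1 and left in tree 2, so each tree returns a different value. A forest averages these per-tree values, which is why changing the set of trees changes the prediction.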
$endgroup$
$begingroup$
That I understand. I added one observation, ran the model again, and would have expected the same result for observation #32 because it has the same predictors.
$endgroup$
– Max M
7 hours ago
$begingroup$
If you have two completely different decision trees, are they guaranteed to make the same predictions?
$endgroup$
– Sycorax
7 hours ago
$begingroup$
I edited my code to make it clearer what confuses me. I do understand that each run of the randomForest command yields a different model if I do not set a seed, but why is the prediction for my last two observations different? So you are saying that the prediction can deviate slightly at the cutoff point? But wouldn't this be more likely for categorical predictors, then?
$endgroup$
– Max M
3 hours ago
$begingroup$
Oh, that's surprising to me. So is it using a kind of mixed, randomly drawn sample to get the predictions for the model? Doesn't this contradict your original answer, then?
$endgroup$
– Max M
3 hours ago
$begingroup$
No, that's absolutely not what it's doing. Please read the explanation in my answer. The behavior of the function changes depending on which arguments you do or do not give it. Your original question asked about a comparison between two random forest ensembles. Your revised question asks about how the function predict works.
$endgroup$
– Sycorax
3 hours ago
answered 8 hours ago by Sycorax; edited 3 hours ago
$begingroup$
I edited my Code to make it clearer what confuses me. I do understand that each run of the randomforest command yields a different model if I do not set a seed, but why is the prediction for my last two observations different? So you are saying that the prediction can slighly deviate at the cutoff Point? But this would be more likely for categorical predictors then?
$endgroup$
– Max M
3 hours ago
$begingroup$
Oh thats surprising to me. So ist a using kind of a mixed random drawn sample to get the predictions for the model? Is this not contradicting your original answer then???
$endgroup$
– Max M
3 hours ago
$begingroup$
Oh thats surprising to me. So ist a using kind of a mixed random drawn sample to get the predictions for the model? Is this not contradicting your original answer then???
$endgroup$
– Max M
3 hours ago
$begingroup$
No, that's absolutely not what it's doing. Please read the explanation in my answer. The behavior of the function changes depending on what arguments you do or do not give to it. Your original question asked about a comparison between 2 random forest ensembles. Your revised question asks about how the function
predict works.$endgroup$
– Sycorax
3 hours ago
$begingroup$
No, that's absolutely not what it's doing. Please read the explanation in my answer. The behavior of the function changes depending on what arguments you do or do not give to it. Your original question asked about a comparison between 2 random forest ensembles. Your revised question asks about how the function
predict works.$endgroup$
– Sycorax
3 hours ago
|
show 2 more comments
Max M is a new contributor. Be nice, and check out our Code of Conduct.
$begingroup$
Once you have an answer to the question as stated, you shouldn't really change the substance of the question in a way that breaks the connection with the existing answers. If you want to ask another question as a fork of this, just start a new thread. You can link back for context, if you want.
$endgroup$
– gung♦
2 hours ago
$begingroup$
I did not alter the question substantially. It was always about getting a different prediction for the same predictors.
$endgroup$
– Max M
2 hours ago