Identify and count spells (Distinctive events within each group) The Next CEO of Stack...
What difference does it make matching a word with/without a trailing whitespace?
How to coordinate airplane tickets?
Can a PhD from a non-TU9 German university become a professor in a TU9 university?
Physiological effects of huge anime eyes
Why do we say “un seul M” and not “une seule M” even though M is a “consonne”?
How can I separate the number from the unit in argument?
How to find if SQL server backup is encrypted with TDE without restoring the backup
Find a path from s to t using as few red nodes as possible
Compensation for working overtime on Saturdays
MT "will strike" & LXX "will watch carefully" (Gen 3:15)?
Is it possible to make a 9x9 table fit within the default margins?
Find the majority element, which appears more than half the time
Does int main() need a declaration on C++?
Creating a script with console commands
Would a grinding machine be a simple and workable propulsion system for an interplanetary spacecraft?
Can this transistor (2N2222) take 6 V on emitter-base? Am I reading the datasheet incorrectly?
Which acid/base does a strong base/acid react when added to a buffer solution?
Is it possible to create a QR code using text?
Can Sri Krishna be called 'a person'?
What steps are necessary to read a Modern SSD in Medieval Europe?
Gauss' Posthumous Publications?
What does this strange code stamp on my passport mean?
Could you use a laser beam as a modulated carrier wave for radio signal?
Upgrading From a 9 Speed Sora Derailleur?
Identify and count spells (Distinctive events within each group)
The Next CEO of Stack OverflowR - list to data frameCount number of rows within each groupCounting unique / distinct values by group in a data frameR: find relative weight within each group and within the entire dataframeR: how to calculate summary for each group and all the data?count the number of distinct variables in a groupusing tidyverse; counting after and before change in value, within groups, generating new variables for each unique shiftDistinct in r within groups of datahow to get count and distinct count with group by in dataframe RNest a dataframe by group, but include extra rows within each groupChange value by group based in reference within group
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell
is what I'm trying to compute. I've tried using dplyr
's lead
and lag
, but that gets too complicated. I've tried rle
but got nowhere.
ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse
solution.
Assumptions
Data is sorted by
group
and then bytime
There are no gaps in
time
within each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rle
approach by @markus took 0.53 seconds - the
cumsum
approach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rle
anddense_rank
by @tmfmnk took 0.89
r dataframe dplyr time-series tidyverse
r dataframe dplyr time-series tidyverse
edited 2 hours ago
Thomas Speidel
asked 7 hours ago
Thomas SpeidelThomas Speidel
359216
359216
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago
add a comment |
2
For someone who is not familiar with how thespell
is computed, can you share a formula or description?
– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago
2
2
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago
add a comment |
6 Answers
6
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = {
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
}
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values
with the cumulative sum where values == 1
while values
should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values)
with r$values
; where the latter is a vector of 0
s and 1
s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle
to get back a vector of the same length as is.5
.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group
.
edited 5 hours ago
answered 7 hours ago
markusmarkus
15k11336
15k11336
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
1
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
5 hours ago
1
1
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
@M-M Added some explanation. Thanks for the comment.
– markus
5 hours ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
Here's a helper function that can return what you are after
spell_index <- function(time, flag) {
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
}
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag()
to look for changes. We use cumsum()
to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
answered 7 hours ago
MrFlickMrFlick
124k11141173
124k11141173
add a comment |
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat){
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0 | change == -1, 0, cumsum(flag))) %>%
dplyr::select(time, group, is.5, spell)
return(rst)
}
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
edited 7 hours ago
answered 7 hours ago
Hector HaffendenHector Haffenden
569216
569216
add a comment |
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
add a comment |
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
A somehow different possibility could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
answered 6 hours ago
tmfmnktmfmnk
3,6211516
3,6211516
add a comment |
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
add a comment |
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
One options is using cumsum
:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5
this takes care of assigning a new id (i.e. spell
) whenever is.5
changes; but we want to avoid assigning new ones to those rows is.5
equal to 0
and that's why I have the second rule in cumsum
function (i.e. (is.5!=0)
).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0
. That's why I have multiplied the answer by is.5
.
answered 5 hours ago
M-MM-M
7,17962146
7,17962146
add a comment |
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
Here is one option with rleid
from data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'group', get the run-length-id (rleid
) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i
with a logical vector to select rows that have 'spell' values not zero, match
those values of 'spell' with unique
'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
edited 1 hour ago
answered 1 hour ago
akrunakrun
418k13207282
418k13207282
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
For someone who is not familiar with how the
spell
is computed, can you share a formula or description?– nsinghs
7 hours ago
@nsinghs I think they mean "hospital spell"
– zx8754
6 hours ago