How to use Pandas to get the count of every combination inclusiveHow to get all possible combinations of a...

Does the average primeness of natural numbers tend to zero?

Crop image to path created in TikZ?

COUNT(*) or MAX(id) - which is faster?

Unbreakable Formation vs. Cry of the Carnarium

Pristine Bit Checking

Is domain driven design an anti-SQL pattern?

Is it legal to have the "// (c) 2019 John Smith" header in all files when there are hundreds of contributors?

Why did the Germans forbid the possession of pet pigeons in Rostov-on-Don in 1941?

Copycat chess is back

What does 'script /dev/null' do?

aging parents with no investments

How to deal with fear of taking dependencies

When blogging recipes, how can I support both readers who want the narrative/journey and ones who want the printer-friendly recipe?

How to make payment on the internet without leaving a money trail?

Could a US political party gain complete control over the government by removing checks & balances?

Is "plugging out" electronic devices an American expression?

How to answer pointed "are you quitting" questioning when I don't want them to suspect

Is there a familial term for apples and pears?

Patience, young "Padovan"

I’m planning on buying a laser printer but concerned about the life cycle of toner in the machine

Are white and non-white police officers equally likely to kill black suspects?

Is it wise to focus on putting odd beats on left when playing double bass drums?

What is the command to reset a PC without deleting any files

How can I fix this gap between bookcases I made?



How to use Pandas to get the count of every combination inclusive


How to get all possible combinations of a list’s elements?How to get the ASCII value of a character?How to get the current time in PythonHow to get line count cheaply in Python?How do I get the number of elements in a list in Python?How can I count the occurrences of a list item?How to drop rows of Pandas DataFrame whose value in certain columns is NaNHow do I get the row count of a Pandas dataframe?How to iterate over rows in a DataFrame in Pandas?Get list from pandas DataFrame column headersHow to deal with SettingWithCopyWarning in Pandas?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







10















I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.



For example, I have:



Cust_num  Item    Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40


This should result in:



Combo                  Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2


The best I can do is unique combinations:



Combo                 Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1


I tried:



df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')


But that is just the unique counts.










share|improve this question







New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 3





    I feel like this is one sort of problem pandas would not be suitable for.

    – coldspeed
    23 hours ago


















10















I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.



For example, I have:



Cust_num  Item    Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40


This should result in:



Combo                  Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2


The best I can do is unique combinations:



Combo                 Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1


I tried:



df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')


But that is just the unique counts.










share|improve this question







New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 3





    I feel like this is one sort of problem pandas would not be suitable for.

    – coldspeed
    23 hours ago














10












10








10








I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.



For example, I have:



Cust_num  Item    Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40


This should result in:



Combo                  Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2


The best I can do is unique combinations:



Combo                 Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1


I tried:



df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')


But that is just the unique counts.










share|improve this question







New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.



For example, I have:



Cust_num  Item    Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40


This should result in:



Combo                  Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2


The best I can do is unique combinations:



Combo                 Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1


I tried:



df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')


But that is just the unique counts.







python pandas






share|improve this question







New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question







New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question






New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









lys_dadlys_dad

565




565




New contributor




lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 3





    I feel like this is one sort of problem pandas would not be suitable for.

    – coldspeed
    23 hours ago














  • 3





    I feel like this is one sort of problem pandas would not be suitable for.

    – coldspeed
    23 hours ago








3




3





I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
23 hours ago





I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
23 hours ago












4 Answers
4






active

oldest

votes


















8














Using pandas.DataFrame.groupby:



grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count


Output:



  Cust_num                   Item  Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2





share|improve this answer
























  • How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

    – Lee Mtoti
    20 hours ago













  • @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

    – Lee Mtoti
    20 hours ago











  • @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

    – Chris
    20 hours ago











  • I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

    – Chris
    20 hours ago











  • Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

    – lys_dad
    12 hours ago



















2














Late answer, but you can use:



df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')




combo                   Count                 
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2





share|improve this answer

































    2














    I think you need to create a combination of items first.



    How to get all possible combinations of a list’s elements?



    I used the function from Dan H's answer.



    from itertools import chain, combinations
    def all_subsets(ss):
    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

    uq_items = df.Item.unique()

    list(all_subsets(uq_items))

    [(),
    ('Shirt1',),
    ('Shirt2',),
    ('Shorts1',),
    ('Shirt1', 'Shirt2'),
    ('Shirt1', 'Shorts1'),
    ('Shirt2', 'Shorts1'),
    ('Shirt1', 'Shirt2', 'Shorts1')]


    And use groupby each customer to get their items combination.



    ls = []

    for _, d in df.groupby('Cust_num', group_keys=False):
    # Get all possible subset of items
    pi = np.array(list(all_subsets(d.Item)))

    # Fliter only > 1
    ls.append(pi[[len(l) > 1 for l in pi]])


    Then convert to Series and use value_counts().



    pd.Series(np.concatenate(ls)).value_counts()

    (Shirt1, Shorts1) 2
    (Shirt2, Shorts1) 1
    (Shirt1, Shirt2, Shorts1) 1
    (Shirt1, Shirt2) 1





    share|improve this answer


























    • This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

      – lys_dad
      13 hours ago











    • I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

      – lys_dad
      12 hours ago











    • @lys_dad The accepted answer has already solved your memory problem, right?

      – ResidentSleeper
      9 hours ago











    • It does, yes. But thank you for your elegant solution!

      – lys_dad
      7 hours ago



















    0














    My version which I believe is easier to understand



    new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})

    new_df ['count'] = range(1, len(new_df ) + 1)


    Output:



                                Item      Rev count
    <lambda> <lambda>
    Cust_num
    Cust1 Shirt1 Shirt2 Shorts1 $40 1
    Cust2 Shirt1 Shorts1 $40 2


    Since you do not need the Rev column, you can drop it:



    new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()

    new_df


    Output:



      Cust_num                    Item count
    <lambda>
    0 Cust1 Shirt1 Shirt2 Shorts1 1
    1 Cust2 Shirt1 Shorts1 2


    This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:



    [{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]


    Then next step finds the subsets:



    for s1 in subsets:
    for s2 in subsets:
    if s2.issubset(s1):
    print("{}: {}".format(s2,s2.issubset(s1)))


    Output:



    {' Shirt2', ' Shorts1', ' Shirt1'}: True
    {' Shorts1', ' Shirt1'}: True
    {' Shorts1', ' Shirt1'}: True


    You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.






    share|improve this answer


























    • @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

      – Lee Mtoti
      20 hours ago











    • @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

      – Lee Mtoti
      19 hours ago














    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    lys_dad is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    8














    Using pandas.DataFrame.groupby:



    grouped_item = df.groupby('Cust_num')['Item']
    subsets = grouped_item.apply(lambda x: set(x)).tolist()
    Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
    combo = grouped_item.apply(lambda x:','.join(x))
    combo = combo.reset_index()
    combo['Count']=Count


    Output:



      Cust_num                   Item  Count
    0 Cust1 Shirt1,Shirt2,Shorts1 1
    1 Cust2 Shirt1,Shorts1 2





    share|improve this answer
























    • How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

      – Lee Mtoti
      20 hours ago













    • @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

      – Lee Mtoti
      20 hours ago











    • @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

      – Chris
      20 hours ago











    • I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

      – Chris
      20 hours ago











    • Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

      – lys_dad
      12 hours ago
















    8














    Using pandas.DataFrame.groupby:



    grouped_item = df.groupby('Cust_num')['Item']
    subsets = grouped_item.apply(lambda x: set(x)).tolist()
    Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
    combo = grouped_item.apply(lambda x:','.join(x))
    combo = combo.reset_index()
    combo['Count']=Count


    Output:



      Cust_num                   Item  Count
    0 Cust1 Shirt1,Shirt2,Shorts1 1
    1 Cust2 Shirt1,Shorts1 2





    share|improve this answer
























    • How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

      – Lee Mtoti
      20 hours ago













    • @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

      – Lee Mtoti
      20 hours ago











    • @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

      – Chris
      20 hours ago











    • I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

      – Chris
      20 hours ago











    • Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

      – lys_dad
      12 hours ago














    8












    8








    8







    Using pandas.DataFrame.groupby:



    grouped_item = df.groupby('Cust_num')['Item']
    subsets = grouped_item.apply(lambda x: set(x)).tolist()
    Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
    combo = grouped_item.apply(lambda x:','.join(x))
    combo = combo.reset_index()
    combo['Count']=Count


    Output:



      Cust_num                   Item  Count
    0 Cust1 Shirt1,Shirt2,Shorts1 1
    1 Cust2 Shirt1,Shorts1 2





    share|improve this answer













    Using pandas.DataFrame.groupby:



    grouped_item = df.groupby('Cust_num')['Item']
    subsets = grouped_item.apply(lambda x: set(x)).tolist()
    Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
    combo = grouped_item.apply(lambda x:','.join(x))
    combo = combo.reset_index()
    combo['Count']=Count


    Output:



      Cust_num                   Item  Count
    0 Cust1 Shirt1,Shirt2,Shorts1 1
    1 Cust2 Shirt1,Shorts1 2






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered 23 hours ago









    ChrisChris

    3,791522




    3,791522













    • How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

      – Lee Mtoti
      20 hours ago













    • @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

      – Lee Mtoti
      20 hours ago











    • @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

      – Chris
      20 hours ago











    • I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

      – Chris
      20 hours ago











    • Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

      – lys_dad
      12 hours ago



















    • How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

      – Lee Mtoti
      20 hours ago













    • @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

      – Lee Mtoti
      20 hours ago











    • @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

      – Chris
      20 hours ago











    • I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

      – Chris
      20 hours ago











    • Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

      – lys_dad
      12 hours ago

















    How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

    – Lee Mtoti
    20 hours ago







    How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

    – Lee Mtoti
    20 hours ago















    @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

    – Lee Mtoti
    20 hours ago





    @Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

    – Lee Mtoti
    20 hours ago













    @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

    – Chris
    20 hours ago





    @LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

    – Chris
    20 hours ago













    I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

    – Chris
    20 hours ago





    I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

    – Chris
    20 hours ago













    Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

    – lys_dad
    12 hours ago





    Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

    – lys_dad
    12 hours ago













    2














    Late answer, but you can use:



    df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
    df['Count'] = df['Count'].str.replace(r'Cust','')




    combo                   Count                 
    Shirt1,Shirt2,Shorts1 1
    Shirt1,Shorts1 2





    share|improve this answer






























      2














      Late answer, but you can use:



      df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
      df['Count'] = df['Count'].str.replace(r'Cust','')




      combo                   Count                 
      Shirt1,Shirt2,Shorts1 1
      Shirt1,Shorts1 2





      share|improve this answer




























        2












        2








        2







        Late answer, but you can use:



        df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
        df['Count'] = df['Count'].str.replace(r'Cust','')




        combo                   Count                 
        Shirt1,Shirt2,Shorts1 1
        Shirt1,Shorts1 2





        share|improve this answer















        Late answer, but you can use:



        df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
        df['Count'] = df['Count'].str.replace(r'Cust','')




        combo                   Count                 
        Shirt1,Shirt2,Shorts1 1
        Shirt1,Shorts1 2






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 22 hours ago

























        answered 23 hours ago









        Pedro LobitoPedro Lobito

        50.6k16138172




        50.6k16138172























            2














            I think you need to create a combination of items first.



            How to get all possible combinations of a list’s elements?



            I used the function from Dan H's answer.



            from itertools import chain, combinations
            def all_subsets(ss):
            return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

            uq_items = df.Item.unique()

            list(all_subsets(uq_items))

            [(),
            ('Shirt1',),
            ('Shirt2',),
            ('Shorts1',),
            ('Shirt1', 'Shirt2'),
            ('Shirt1', 'Shorts1'),
            ('Shirt2', 'Shorts1'),
            ('Shirt1', 'Shirt2', 'Shorts1')]


            And use groupby each customer to get their items combination.



            ls = []

            for _, d in df.groupby('Cust_num', group_keys=False):
            # Get all possible subset of items
            pi = np.array(list(all_subsets(d.Item)))

            # Fliter only > 1
            ls.append(pi[[len(l) > 1 for l in pi]])


            Then convert to Series and use value_counts().



            pd.Series(np.concatenate(ls)).value_counts()

            (Shirt1, Shorts1) 2
            (Shirt2, Shorts1) 1
            (Shirt1, Shirt2, Shorts1) 1
            (Shirt1, Shirt2) 1





            share|improve this answer


























            • This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

              – lys_dad
              13 hours ago











            • I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

              – lys_dad
              12 hours ago











            • @lys_dad The accepted answer has already solved your memory problem, right?

              – ResidentSleeper
              9 hours ago











            • It does, yes. But thank you for your elegant solution!

              – lys_dad
              7 hours ago
















            2














            I think you need to create a combination of items first.



            How to get all possible combinations of a list’s elements?



            I used the function from Dan H's answer.



            from itertools import chain, combinations
            def all_subsets(ss):
            return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

            uq_items = df.Item.unique()

            list(all_subsets(uq_items))

            [(),
            ('Shirt1',),
            ('Shirt2',),
            ('Shorts1',),
            ('Shirt1', 'Shirt2'),
            ('Shirt1', 'Shorts1'),
            ('Shirt2', 'Shorts1'),
            ('Shirt1', 'Shirt2', 'Shorts1')]


            And use groupby each customer to get their items combination.



            ls = []

            for _, d in df.groupby('Cust_num', group_keys=False):
            # Get all possible subset of items
            pi = np.array(list(all_subsets(d.Item)))

            # Fliter only > 1
            ls.append(pi[[len(l) > 1 for l in pi]])


            Then convert to Series and use value_counts().



            pd.Series(np.concatenate(ls)).value_counts()

            (Shirt1, Shorts1) 2
            (Shirt2, Shorts1) 1
            (Shirt1, Shirt2, Shorts1) 1
            (Shirt1, Shirt2) 1





            share|improve this answer


























            • This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

              – lys_dad
              13 hours ago











            • I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

              – lys_dad
              12 hours ago











            • @lys_dad The accepted answer has already solved your memory problem, right?

              – ResidentSleeper
              9 hours ago











            • It does, yes. But thank you for your elegant solution!

              – lys_dad
              7 hours ago














            2












            2








            2







            I think you need to create a combination of items first.



            How to get all possible combinations of a list’s elements?



            I used the function from Dan H's answer.



            from itertools import chain, combinations
            def all_subsets(ss):
            return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

            uq_items = df.Item.unique()

            list(all_subsets(uq_items))

            [(),
            ('Shirt1',),
            ('Shirt2',),
            ('Shorts1',),
            ('Shirt1', 'Shirt2'),
            ('Shirt1', 'Shorts1'),
            ('Shirt2', 'Shorts1'),
            ('Shirt1', 'Shirt2', 'Shorts1')]


            And use groupby each customer to get their items combination.



            ls = []

            for _, d in df.groupby('Cust_num', group_keys=False):
            # Get all possible subset of items
            pi = np.array(list(all_subsets(d.Item)))

            # Fliter only > 1
            ls.append(pi[[len(l) > 1 for l in pi]])


            Then convert to Series and use value_counts().



            pd.Series(np.concatenate(ls)).value_counts()

            (Shirt1, Shorts1) 2
            (Shirt2, Shorts1) 1
            (Shirt1, Shirt2, Shorts1) 1
            (Shirt1, Shirt2) 1





            share|improve this answer















            I think you need to create a combination of items first.



            How to get all possible combinations of a list’s elements?



            I used the function from Dan H's answer.



            from itertools import chain, combinations
            def all_subsets(ss):
            return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

            uq_items = df.Item.unique()

            list(all_subsets(uq_items))

            [(),
            ('Shirt1',),
            ('Shirt2',),
            ('Shorts1',),
            ('Shirt1', 'Shirt2'),
            ('Shirt1', 'Shorts1'),
            ('Shirt2', 'Shorts1'),
            ('Shirt1', 'Shirt2', 'Shorts1')]


            And use groupby each customer to get their items combination.



            ls = []

            for _, d in df.groupby('Cust_num', group_keys=False):
            # Get all possible subset of items
            pi = np.array(list(all_subsets(d.Item)))

            # Fliter only > 1
            ls.append(pi[[len(l) > 1 for l in pi]])


            Then convert to Series and use value_counts().



            pd.Series(np.concatenate(ls)).value_counts()

            (Shirt1, Shorts1) 2
            (Shirt2, Shorts1) 1
            (Shirt1, Shirt2, Shorts1) 1
            (Shirt1, Shirt2) 1






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited 20 hours ago

























            answered 23 hours ago









            ResidentSleeperResidentSleeper

            37210




            37210













            • This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

              – lys_dad
              13 hours ago











            • I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

              – lys_dad
              12 hours ago











            • @lys_dad The accepted answer has already solved your memory problem, right?

              – ResidentSleeper
              9 hours ago











            • It does, yes. But thank you for your elegant solution!

              – lys_dad
              7 hours ago



















            • This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

              – lys_dad
              13 hours ago











            • I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

              – lys_dad
              12 hours ago











            • @lys_dad The accepted answer has already solved your memory problem, right?

              – ResidentSleeper
              9 hours ago











            • It does, yes. But thank you for your elegant solution!

              – lys_dad
              7 hours ago

















            This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

            – lys_dad
            13 hours ago





            This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

            – lys_dad
            13 hours ago













            I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

            – lys_dad
            12 hours ago





            I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

            – lys_dad
            12 hours ago













            @lys_dad The accepted answer has already solved your memory problem, right?

            – ResidentSleeper
            9 hours ago





            @lys_dad The accepted answer has already solved your memory problem, right?

            – ResidentSleeper
            9 hours ago













            It does, yes. But thank you for your elegant solution!

            – lys_dad
            7 hours ago





            It does, yes. But thank you for your elegant solution!

            – lys_dad
            7 hours ago











            0














            My version which I believe is easier to understand



            new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})

            new_df ['count'] = range(1, len(new_df ) + 1)


            Output:



                                        Item      Rev count
            <lambda> <lambda>
            Cust_num
            Cust1 Shirt1 Shirt2 Shorts1 $40 1
            Cust2 Shirt1 Shorts1 $40 2


            Since you do not need the Rev column, you can drop it:



            new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()

            new_df


            Output:



              Cust_num                    Item count
            <lambda>
            0 Cust1 Shirt1 Shirt2 Shorts1 1
            1 Cust2 Shirt1 Shorts1 2


            This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:



            [{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]


            Then next step finds the subsets:



            for s1 in subsets:
            for s2 in subsets:
            if s2.issubset(s1):
            print("{}: {}".format(s2,s2.issubset(s1)))


            Output:



            {' Shirt2', ' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True


            You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.






            share|improve this answer


























            • @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

              – Lee Mtoti
              20 hours ago











            • @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

              – Lee Mtoti
              19 hours ago


















            0














            My version which I believe is easier to understand



            new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})

            new_df ['count'] = range(1, len(new_df ) + 1)


            Output:



                                        Item      Rev count
            <lambda> <lambda>
            Cust_num
            Cust1 Shirt1 Shirt2 Shorts1 $40 1
            Cust2 Shirt1 Shorts1 $40 2


            Since you do not need the Rev column, you can drop it:



            new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()

            new_df


            Output:



              Cust_num                    Item count
            <lambda>
            0 Cust1 Shirt1 Shirt2 Shorts1 1
            1 Cust2 Shirt1 Shorts1 2


            This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:



            [{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]


            Then next step finds the subsets:



            for s1 in subsets:
            for s2 in subsets:
            if s2.issubset(s1):
            print("{}: {}".format(s2,s2.issubset(s1)))


            Output:



            {' Shirt2', ' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True


            You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.






            share|improve this answer


























            • @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

              – Lee Mtoti
              20 hours ago











            • @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

              – Lee Mtoti
              19 hours ago
















            0












            0








            0







            My version which I believe is easier to understand



            new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})

            new_df ['count'] = range(1, len(new_df ) + 1)


            Output:



                                        Item      Rev count
            <lambda> <lambda>
            Cust_num
            Cust1 Shirt1 Shirt2 Shorts1 $40 1
            Cust2 Shirt1 Shorts1 $40 2


            Since you do not need the Rev column, you can drop it:



            new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()

            new_df


            Output:



              Cust_num                    Item count
            <lambda>
            0 Cust1 Shirt1 Shirt2 Shorts1 1
            1 Cust2 Shirt1 Shorts1 2


            This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:



            [{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]


            Then next step finds the subsets:



            for s1 in subsets:
            for s2 in subsets:
            if s2.issubset(s1):
            print("{}: {}".format(s2,s2.issubset(s1)))


            Output:



            {' Shirt2', ' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True


            You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.






            share|improve this answer















            My version which I believe is easier to understand



            new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})

            new_df ['count'] = range(1, len(new_df ) + 1)


            Output:



                                        Item      Rev count
            <lambda> <lambda>
            Cust_num
            Cust1 Shirt1 Shirt2 Shorts1 $40 1
            Cust2 Shirt1 Shorts1 $40 2


            Since you do not need the Rev column, you can drop it:



            new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()

            new_df


            Output:



              Cust_num                    Item count
            <lambda>
            0 Cust1 Shirt1 Shirt2 Shorts1 1
            1 Cust2 Shirt1 Shorts1 2


            This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:



            [{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]


            Then next step finds the subsets:



            for s1 in subsets:
            for s2 in subsets:
            if s2.issubset(s1):
            print("{}: {}".format(s2,s2.issubset(s1)))


            Output:



            {' Shirt2', ' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True
            {' Shorts1', ' Shirt1'}: True


            You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited 19 hours ago

























            answered 22 hours ago









            Lee MtotiLee Mtoti

            13410




            13410













            • @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

              – Lee Mtoti
              20 hours ago











            • @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

              – Lee Mtoti
              19 hours ago





















            • @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

              – Lee Mtoti
              20 hours ago











            • @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

              – Lee Mtoti
              19 hours ago



















            @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

            – Lee Mtoti
            20 hours ago





            @Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

            – Lee Mtoti
            20 hours ago













            @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

            – Lee Mtoti
            19 hours ago







            @Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

            – Lee Mtoti
            19 hours ago












            lys_dad is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            lys_dad is a new contributor. Be nice, and check out our Code of Conduct.













            lys_dad is a new contributor. Be nice, and check out our Code of Conduct.












            lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            El tren de la libertad Índice Antecedentes "Porque yo decido" Desarrollo de la...

            Castillo d'Acher Características Menú de navegación

            Connecting two nodes from the same mother node horizontallyTikZ: What EXACTLY does the the |- notation for...