User Stories - Why they are Important

Good Day!

This is mostly a nice little lesson learned for developing apps without having all your ducks in a row first. I'm sure that I'm not the first to say it, but I'm confident that I will also not be the last!

I have an app written for my company that is using the MEAN stack. MongoDB / Express / Angular / Node.js. Pretty standard for my blog - it's my stack of choice lately.

Anyways, this application is used as an internal 'call for submissions' and 'rewards' system. The system was designed originally to simply allow one person to make a submission and we could track, judge, and declare winners as such.

Pretty easy -

let submissionSchema = mongoose.Schema({  
  //... omitted for brevity
  submittedBy:{
    firstName:{type:String, required:true},
    lastName:{type:String, required:true},
    email:{type:String, required:true}
  }
  //...omitted
})

Then we decided to allow 'team' submissions. Sure - let's just adjust our model on the fly

let submissionSchema = mongoose.Schema({  
  //... omitted for brevity
  submittedBy:{
    firstName:{type:String, required:true},
    lastName:{type:String, required:true},
    email:{type:String, required:true}
  },
  team:{
    name:{type:String, required:false},
    isTeamSubmission:{type:Boolean, required:true, default:false},
    members:[{
      //Note: Mongoose adds an _id field to objects in arrays now
      firstName:{type:String, required:true},
      lastName:{type:String, required:true},
      email:{type:String, required:true}
    }]
  }
  //...omitted
})

Best way? Maybe not, but we were able to make an adjustment after a few months of production without requiring any real data migrations. It was functional though.

Now a new feature request came in which, honestly, should have been in the front of our minds during design -- We want to list out all the users and the number of submissions they were involved in. Pretty standard I guess, but a lot more difficult than I would have imagined.

The aggregation pipeline for this got annoying. It was super helpful breaking it down into pseudo code though.

1. Unwind all the team members

  • We need to blow up our arrays so we can perform some groupings and sums on common data grounds
{
  $unwind:{
    path:'$submittedBy.team.members', 
    preserveNullAndEmptyArrays:true
  }
}

2. Convert the main user to an array

  • A possibly frivolous step but I can't seem to logic my way around it right now, however, Group the submissions by id, submitted by and push the set of the team. create a new array in this aggregation from the user information. You will see below. It's admittedly wonky.
{
  $group:{
    _id:{
      _id:'$_id',
      user:[
        {
          firstName:'$submittedBy.firstName', 
          lastName:'$submittedBy.lastName', 
          email:'$submittedBy.email'
        }
      ]
    },
    members:{$push:'$submittedBy.team.members'}
  }
}

3. Project and Merge

  • Use a project to merge the new user array with the other team members array. Project the submission ID as well
{
  $project:{
    _id:'$_id._id',
    all_members:{
      $setUnion:['$_id.user', '$members']
    }
  }
}

4. Unwind the members array.

  • The previous structure should have 1 user per submission with all users represented by one array.
{
  $unwind:{path:'$all_members'}
}

5. Final Grouping

  • Group the users by their identifying information and count up the number of times they appear.
{
  $group:{
    _id:{
      firstName:'$all_members.firstName', 
      lastName:'$all_members.lastName', 
      email:'$all_members.email'
    },
    count:{$sum:1}
  }
}

This feels like I'm just padding the post but for people that want to see what it all looks like together, here's the whole pipeline:

db.submissions.aggregate([  
  {
    $unwind:{
      path:'$submittedBy.team.members', 
      preserveNullAndEmptyArrays:true
    }
  },
  {
    $group:{
      _id:{
        _id:'$_id',
        user:[
          {
            firstName:'$submittedBy.firstName', 
            lastName:'$submittedBy.lastName', 
            email:'$submittedBy.email'}
        ]
      },
      members:{$push:'$submittedBy.team.members'}
    }
  },
  {
    $project:{
      _id:'$_id._id',
      all_members:{
        $setUnion:['$_id.user', '$members']
      }
    }
  },
  {
    $unwind:{path:'$all_members'}
  },
  {
    $group:{
      _id:{
        firstName:'$all_members.firstName', 
        lastName:'$all_members.lastName', 
        email:'$all_members.email'
      },
      count:{$sum:1}
    }
  }

My real question here today is, would it have been easier to perform this aggregation pipeline if I had maybe designed my schema better? If I had perhaps designed it so that we assumed that a team could have multiple members and that the first person in the array would be the primary, or maybe a boolean flag on one to be primary and some business logic to ensure one only primary existed?

I just don't know.
I had a few things to learn though: The unwind aggregation's presereNullAndEmptyArrays was a life saver for submissions without a team. I was definitely glad when I found that little tidbit.

Hopefully this inspired anyone that stumbles across it to make sure they have the whole story before they start designing, or at the very least gave you an idea of how to handle a potentially difficult pipeline caused by a similar design flaw as I have encountered.

Hack Away, Friends!