The video uses a mix of hand-modelled items and 3D scans to build the scenes. Most of the time was spent on materials, as each model had to match a designer's mock-ups, which were made with stock images.
To save time, the isometric view was built from smaller turntable renders of each object, rendered with an orthographic camera. These were then assembled in After Effects, which gave more control over timing in post-production.