Wellio with Sivan Aldor-Noiman and Erik Andrejko
In our last (but not least!) interview from NEXT, Mark and Melanie talked with Sivan Aldor-Noiman and Erik Andrejko about Wellio, an awesome new platform that combines AI and healthy eating. Wellio was developed as a way to not only educate users on the importance of proper nutrition for well-being but to give them their own personal nutritionist.
The data scientists at Wellio started from scratch (pun intended) to create their own food-related database and then began training models so the data could be organized and personalized. Using a combination of human power and machine learning techniques, Wellio learns your preferences, allergies, diets, etc. and will make healthy decisions for you based on these key facts. It chooses recipes, populates a grocery list, and even has the ingredients delivered to your door in time for dinner!
Sivan heads Data Science for Wellio, an early stage startup in the FoodTech space that is helping people eat better. In Wellio, her team delivers models that help inspire, empower and adapt to people’s eating needs, cooking abilities and health constraints. She began her career in the Israeli military serving as an instructor for an anti-tank missile unit (please don’t think Rambo, think more like a classroom teacher). Sivan then transitioned to school and received her undergraduate degree in Industrial Engineering and a Master in Statistics from the Technion, Israel Institute of Technology. She moved to the U.S. to complete a Ph.D. degree in Statistics from The Wharton School, University of Pennsylvania. In her previous job, Sivan ended up leading several Data Science teams and learned that she really liked leading technical people since she got to learn a lot from them. Ultimately, she missed the smaller company mentality, so she is back in the startup world. Sivan was once asked to define herself so here goes: “I am an enthusiastic disagreeable giver and a constant empirical driven learner”.
Erik has spent his career making a positive impact on the world through mathematics. He is a co-founder and Chief Technology Officer of Wellio - an early stage startup applying AI to the intersection of food and human health. Previously, Erik led the data science and research organization at The Climate Corporation, which applies data science to solve challenging problems in numerous domains including climatology, agronomic modeling and geospatial applications. When not analyzing interesting datasets, Erik can often be found riding up some incline on a bicycle or cooking.
Cool things of the week
- Summary of Google Cloud Next Tokyo site
- Deep Learning Indaba GCP Credit Awards site
- Data Studio and Dataprep are now generally available blog
- Announcing general availability of Cloud Memorystore for Redis blog
- Coursera Advanced Machine Learning with TensorFlow with GCP blog
- Webinar on October 9th at 9AM PST to learn more
- Simplifying ML predictions with Google Cloud Functions blog
- 50 Best Cloud Security Podcasts site
- GCP Podcast Episode #100: Vint Cerf: past, present, and future of the internet podcast
- Wellio site
- GKE site
- Cloud Storage site
- Pub/Sub site
- Cloud Composer site
- Cloud ML Engine site
- Stackdriver site
- Cloud Functions site
- TensorFlow site
- Keras site
- Scikit Learn site
- Cloud TPU site
- Cloud AutoML site
- Cloud Vision site
- DevOps201 for Application Developers video
- Cloud Firestore site
- Day 3 Keynote: Made Here Together video
- Spinnaker site
- Contact Wellio email
Questions of the week
Is Inbox going away?
- Inbox is signing off: find your favorite features in the new Gmail blog
- 5 ways the new Gmail can help you get more done blog
Where can you find us next?
We’ll both be at Strangeloop.
Mark will probably be at Unite L.A. in October.
Melanie is speaking at Monktoberfest on Oct 4th in Portland, Maine.
Transcript
[MUSIC PLAYING] MARK: Hi, and welcome to Episode Number 148 of the weekly Google Cloud Platform podcast. I'm Mark Mandel. And I'm here with my colleague, Melanie Warrick, yet again, after a short hiatus. How you doing, Melanie?
MELANIE: Hi, Mark. Welcome back.
MARK: Welcome back to you, too. I think we've both been traveling a whole lot.
MELANIE: Yes, we were. And I know last week you were a little--
--tied up over in Tokyo. Where were you in Tokyo? What were you up to?
MARK: So, yeah. I was over in Tokyo. I was at Cloud Next, presenting at the inside version of Cloud Next, so the game conference that was, like, a little side thing on Cloud Next. It was a pretty awesome event. It was really great, I think. There were thousands of people, lots of Cloud people from Tokyo.
And I also learned something very important. I didn't realize this, but their local GCP community is called GCPUG--
MELANIE: That's cute.
MARK: --user group. And they have a logo with a pug inside the hexagon. And it's a pug. It's awesome.
MELANIE: So that's a group I need to be involved in.
MARK: Yes. I have a sticker on my laptop now.
MELANIE: Nice. Well, so overall it was a great experience, it sounds like. And I'm glad you--
MARK: Yeah, it was fantastic.
MELANIE: I'm glad you made it back.
MARK: How was Indaba?
MELANIE: Yeah. I was going to say-- last week, we were sharing the episode we recorded in Stellenbosch, South Africa, which was at Deep Learning Indaba. And it was great. And one thing I forgot to mention last week was that one of the awards that was given out for the best posters was Google Cloud Platform credits.
We gave out $1,000 worth of credits for 25 different posters. So it was really great. And I'm glad we were able to do the podcast and record from there. We have a couple more podcasts that we'll be sharing out later next month.
MARK: And thank you, Melanie, for taking care of the podcast while I was in Japan. I really appreciate it.
MELANIE: It was fun! For anybody who wants to know the difference between a South African accent and an Australian accent, you should listen to last week's episode, and then all the other episodes.
MARK: That sounds about right.
MELANIE: Then you can compare.
Anyway, it's good. We had some help from a few folks over there. So Mark, what's going on this week?
MARK: So this week we're bringing back an interview that we did while we were at Next earlier this year-- so the Next in San Francisco this time. We had a great discussion with the team from Wellio, with Sivan and Erik, talking about all the data science things they do over at Wellio. It was really, really cool.
MELANIE: It was. It was this full picture. That was the nice thing. It's seeming like applied machine learning. But from the start to the end, from coming up with the concept and the problem and then actually to the deployment, it was a really valuable interview. So I'm glad we're finally going to get a chance to share that with everyone.
MARK: Yeah. Yeah, a really great one-- and then we have a Question of the Week this afternoon. We're going to step away from the GCP side of things so we can move back maybe a little to the Google Cloud. So we did hear that Inbox is going away, which makes me very sad. But we're going to talk about what can we do with the Gmail features that are moved across, and talk a little bit about that and what sort of things you can use once Inbox is gone.
MELANIE: Yes, how to move forward with Gmail as Inbox goes away. We're going to have a therapy session. It's all good. All right. So as always, we start off with our Cool Things of the Week.
And we have a couple of things that are generally available now. Data Studio and Dataprep-- so Data Studio being this great BI tool that we've been providing. It's working with more than 500 data sources and more than 100 partner-built connectors. And it's used by over a million people. It's a pretty well-used tool that's out there in terms of business intelligence.
And Dataprep, of course, allowing you to prepare your data when you're doing any type of analysis or machine learning with it. There's a lot of work, sometimes, you need to do to clean it up. And the Dataprep tool allows you to do that.
And now that it's generally available, it's got a new look. It allows for things like team collaboration and some additional features. So yeah, those are nice.
MARK: That is nice.
MARK: And speaking of generally available, we also have Cloud Memorystore for Redis going generally available. So if you're looking for a managed Redis system, you can come to us for that, as well. And we also expanded it to several new regions. So now Cloud Memorystore is supported in Tokyo, Singapore, and Netherlands, which are new, as well as Oregon, Iowa, South Carolina, Belgium, and Taiwan.
MELANIE: Nice. Another Cool Thing of the Week-- I know earlier this year, in May, we were talking about the Coursera courses that came out for TensorFlow. And these were the machine learning with TensorFlow on Google Cloud Platform course. That's a five course specialization. And it's now in the top 10 data science specializations on Coursera that's out there.
Following onto that, they went with a more advanced machine learning TensorFlow course that has been released recently. And this advanced course covers topics like end-to-end machine learning with TensorFlow on Google Cloud Platform, production machine learning systems, image understanding with TensorFlow and Google Cloud, sequence models for time series and natural language processing, and recommendation systems with TensorFlow. So this is something that will help take you another step down the path of expanding your machine learning expertise. And there's going to be a webinar on October 9 at 9 AM to learn more about the course.
MARK: Fantastic. Speaking of machine learning, as well, teammates of ours, Sara Robinson and Zack Akil, recently released a blog post on the Google Cloud blog called Simplifying ML Predictions with Google Cloud Functions. This is a really great, hands-on blog post that takes you through all the code that they use to write Cloud Functions, calling Cloud ML from the Cloud Function, and going all the way through. It's actually a really great read.
MELANIE: Yes, it is. And it's great to get this out from our teammates.
MELANIE: And the last thing we want to mention is that Mark Moore gave us a shout out as one of the 50 best Cloud security podcasts in an article that he wrote. And he gave us a nice little overview in terms of some of the podcast episodes, in particular, that he found especially useful. And one of those, of course, being the Vint Cerf episode, which I agree was a very valuable episode that you guys captured last year.
MARK: Thank you very much for including us on your list. It was much appreciated.
MELANIE: Thank you. All right, Mark. I think it's time for us to go and talk with Wellio.
MARK: Yeah. I think we did enough Cool Things of the Week. Let's go talk with Wellio.
MARK: So it's day three here at Next. And it's the final interview, but I'm super excited. We have two wonderful people with us. I'm going to probably massacre your names, but we'll try our very best. Sivan Aldor-Noiman?
MARK: Wonderful. VP of Data Science at Wellio-- and Erik Andrejko?
MARK: Oh, my god. I'm doing really well.
SIVAN: You're amazing.
MARK: Also CTO of Wellio-- thank you so much for joining us and taking time out of Next to come hang out with us.
ERIK: Thanks for having us.
SIVAN: That's our pleasure, yeah.
MARK: Awesome. We want to hear all about what you're doing at Wellio, how you do it, how all the stuff works, basically. But do you want to tell us a little bit about who both you are and what it is you do? Sivan, you want to go first?
SIVAN: Sure. So my name's Sivan. And it's a nice Israeli name, Hebrew name, Hebrew month. And my background is in statistics. So I have a master and a PhD in the statistics world.
And I was intrigued in applying statistics to temporal spatial models, so that's where I started my career. And I worked for several years in the agriculture technology space, alongside Erik actually. And after several years there, I transitioned to Wellio.
And in Wellio, we work on the food technology space, which is kind of in a similar space, right? They're related to each other. But one tackles more the demand side and the other the supply side. So after several years in the supply side of food, I really wanted to move to the demand. Because if you want to change supply, you really should just change demand. That's much easier.
So I joined Wellio. And I've been developing models for, and recommendation systems for food. And I think we'll tell you a little bit more about Wellio in a minute. But yeah, I build models for life. I am a super duper, happy statistician, data scientist slash machine learner, and all the words above that describe the same position.
MARK: And you like chocolate.
SIVAN: And I love dark chocolate, correct. It's true.
ERIK: And I'm Erik. I am CTO and co-founder of Wellio. My background is very similar to Sivan's in some ways. I'm trained as a computer scientist. I did a PhD, actually, in pure math. I think of myself as a disillusioned mathematician.
I got a little tired of staring at the wall and wanted to do something a little more impactful. So I came out to the Bay Area, and I've been working in machine learning ever since. And Wellio kind of grew out of a personal problem that I had that I think is shared by many people, which is to feed my family at home healthy meals and make good decisions about food and ways to make eating at home the easiest and healthiest choice. And especially when making decisions under duress, like, with a crying kid, and hungry-- that's exactly the moment when I want machines to make decisions for me, because I know I won't make a good one.
And I had all this exposure to technology that was helping make nutrition decisions for clients, but I had none of that technology for my own family. And so I wanted to create the experience of having a personal chef and a nutritionist in the house to help with meal planning and grocery shopping and those sorts of things. And I imagined a way in which that could be embodied in an intelligent machine. So that's kind of how we got to here.
SIVAN: When we talk about Wellio, we talk about how can we make a personalized family nutritionist available to everyone in the world, not just someone that can afford it. We both have families. I have two kids. So we've tackled the problem of how do you feed everyone in the family.
How do you personalize it? I'm on a diet. My husband likes these kind of foods. My kids don't eat these types of foods. So how do we really cook at home so that everyone's happy, everyone's healthy? And also, the other side is how do we make people learn the relationship between food and their well-being, which is very hard. It's very hard to know.
And you look at trends of how people have thought about diets over time. It changes, right? One decade you need more Vitamin C. And the next decade, it's Vitamin D.
And oh, eat more meat. Eat less meat. There's not a lot of consistent science in this domain and definitely very hard to personalize that domain. And that's sort of the problem we're trying to tackle.
MELANIE: Well, how does Wellio work?
ERIK: That's a very good question. There's two things to Wellio. There's the technology platform, which is really ultimately designed to be an AI for food, if you excuse the buzzwords for this type of audience.
MARK: What does the user experience look like?
ERIK: Yeah. And so on top of the user experience, one user experience we are testing in the market today-- it's essentially a virtual culinary assistant. So you say something like, I would like to eat shrimp tacos tonight. And then it knows, oh, you probably want corn tortillas because there's someone in the house who is gluten free.
You like these brands. You need this number of servings. You'll probably like this recipe, but it needs to be modified a little bit to make it spicy, less spicy, reduce the fat content, increase the protein content, something like that.
And the system goes out, translates that high level request into a specific grocery order, it executes the grocery order for you. And then through one of our partners, we deliver a bag of groceries and customized recipes to you the same day. So it translates a high level intent into dinner tonight.
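As a rough illustration of the flow Erik describes here, the following Python sketch turns a high-level request into a grocery list under household constraints. Everything in it is hypothetical: the toy recipe, the substitution table, and the names `RECIPES`, `GLUTEN_FREE_SWAPS`, `Household`, and `build_grocery_list` are invented for this example. Wellio's actual system relies on learned models, not lookup tables.

```python
from dataclasses import dataclass, field

# Hypothetical toy data; a real system would draw on a large recipe database.
RECIPES = {
    "shrimp tacos": {
        "servings": 2,
        "ingredients": {"shrimp (g)": 300, "flour tortillas": 4, "lime": 1},
    },
}
# Hypothetical substitution table for gluten-free households.
GLUTEN_FREE_SWAPS = {"flour tortillas": "corn tortillas"}


@dataclass
class Household:
    servings: int
    gluten_free: bool = False
    dislikes: set = field(default_factory=set)


def build_grocery_list(dish, household):
    """Scale a recipe to the household and apply its dietary constraints."""
    recipe = RECIPES[dish]
    scale = household.servings / recipe["servings"]
    groceries = {}
    for item, quantity in recipe["ingredients"].items():
        if item in household.dislikes:
            continue  # drop ingredients the household has ruled out
        if household.gluten_free:
            item = GLUTEN_FREE_SWAPS.get(item, item)
        groceries[item] = quantity * scale
    return groceries


order = build_grocery_list("shrimp tacos", Household(servings=4, gluten_free=True))
print(order)
```

The point of the sketch is only the shape of the translation: one short request in, a scaled and constraint-aware list of purchasable items out.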
MELANIE: And what markets do you serve?
ERIK: Right now, we're not publicly available, although you can sign up for our beta list. And we can currently serve about 60% of the US market. We're mostly limited on availability of a delivery partner. And we partner with Amazon Fresh, Instacart, people like that to actually do the last mile delivery.
MELANIE: Nice. So in terms of data science, how does data science play a role into this?
SIVAN: So from minute zero, data and data science played a really big role. So I think you can imagine you can build this company two ways. One is you can start with very limited meals. Bring an expert in the domain, like a chef, and start creating very tiny, little recipes and start promoting those recipes.
And over time, you would increase your database and make it bigger, and learn intent, and so on, and so on. We did not go that path. We didn't feel that's a scalable path. And really, from the get go, we said we really want to understand the culinary space and the food space. And in order to do that as a data scientist, what you first need is a big data set.
And so we went out and got a vast amount of data, recipe data, for example, and blog data on food, and articles on food. And we generated a gigantic database with food related information. And then we could start training models on top of it.
And when Erik says, like, translate intent to a shopping list and doing all of that, there are a lot of models that come into play. Taking this unstructured data, whether it's the intent, or even if you think about the data that we've accumulated. If you look at recipe data from different varieties and you really think about not a single web site, but many, many web sites, it's unstructured data.
So the first thing you have to do is apply some way to structure it. And we took a very machine learning, deep learning approaches. And we structure it using models.
And once the data is structured, then you can start building recommendation systems on top of them that would recommend what recipe, how to adapt the recipe, personalize it to a certain person or a family. Even how to make it into a shopping list-- so now that you have a recipe, how do you personalize the shopping list? Maybe it's particular stores that people like, or brands or particular items that they prefer, organic, for example, or traditional. So all these personalizations on top is another part where the machine learning and data science comes in.
MELANIE: Wait, and where does your data come from?
ERIK: The data comes from proprietary sources and also from the web.
MELANIE: Got it.
SIVAN: So it's a combo.
MARK: So when you're doing-- and this is maybe my naive understanding of the machine learning side of things-- but when you're doing these deep model analysis of the data that you have, are you essentially categorizing it, so that people can use it? Or tagging it?
SIVAN: For us, there are different levels of structuring. So when you think of a recipe, you can think of ingredients and preparation lines. So one type of structuring is to take, for example, an ingredient line, and extract certain information from that alone. Like, for example, if it says one cup of sugar, you would need to know that sugar is the actual food entity that was in the line. And the cup is the unit of measure, and one is the amount. So one is that kind of structuring, which is kind of annotation, things like that.
And the other one is deriving additional labels. So for example, cuisine type, or the type of course, or things like that-- so that's another type of models, classification models that we have to develop. And you're right. Yeah.
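To make the "one cup of sugar" example concrete, here is a minimal Python sketch of the target structure: amount, unit of measure, and food entity. It uses hand-written rules purely for illustration; Sivan says Wellio structures this data with machine learning models, so the rule-based approach, the small `NUMBER_WORDS` and `UNITS` tables, and the function names are all assumptions of this example.

```python
from fractions import Fraction

# Tiny illustrative vocabularies; a real parser would need far more coverage.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "half": Fraction(1, 2)}
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "gram", "grams"}


def parse_amount(token):
    """Return the token as a Fraction, or None if it is not a quantity."""
    if token in NUMBER_WORDS:
        return Fraction(NUMBER_WORDS[token])
    try:
        return Fraction(token)  # handles "2", "1/2", and "0.5"
    except ValueError:
        return None


def parse_ingredient_line(line):
    """Split an ingredient line into amount, unit, and food entity."""
    tokens = line.lower().split()
    amount = parse_amount(tokens[0]) if tokens else None
    if amount is not None:
        tokens = tokens[1:]
    unit = None
    if tokens and tokens[0] in UNITS:
        unit, tokens = tokens[0], tokens[1:]
    if tokens and tokens[0] == "of":
        tokens = tokens[1:]
    return {"amount": amount, "unit": unit, "food": " ".join(tokens)}


parsed = parse_ingredient_line("one cup of sugar")
print(parsed)
```

Rules like these break quickly on real recipe text from many web sites, which is presumably why a learned annotator is the better fit at scale.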
ERIK: And I mean, there's some other things that we've done that were very interesting, that are more on the latent semantics space. So for example, on top of this parsing of recipes, you now have a data set where you have co-occurrence of food terms. So Parmesan and tomatoes co-occur quite frequently for some reason. And you can actually build, essentially, something just like Word2Vec. You can build vector representations of these food terms.
And the vector representations capture culinary use. So if you look in the vector space for things that are very close to Parmesan, you find all of the other Italian cheeses-- reggiano, et cetera, et cetera. And so by taking this even semi-structured data, and then using some of the techniques we all know and love, you can get pretty rich representations that you can then use as good signals or features for other machine learning models that maybe are more--
SIVAN: Yeah, for example, ingredient substitutions-- you can think about it that way, right? You can think of generative recipes, all sorts of these kind of applications on top of that.
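The idea of representing food terms by the company they keep can be sketched in a few lines of Python. This is not Word2Vec itself: it uses raw co-occurrence counts as the vectors, a much simpler stand-in chosen so the example stays dependency-free. The six toy recipes and the function names are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Hypothetical toy corpus: each recipe is reduced to its list of food terms.
recipes = [
    ["parmesan", "tomato", "basil", "pasta"],
    ["pecorino", "tomato", "basil", "pasta"],
    ["parmesan", "tomato", "garlic", "pasta"],
    ["pecorino", "tomato", "garlic", "pasta"],
    ["soy sauce", "ginger", "rice", "garlic"],
    ["soy sauce", "ginger", "rice", "scallion"],
]

# cooc[a][b] counts how many recipes contain both a and b.
cooc = defaultdict(Counter)
for ingredients in recipes:
    for a, b in permutations(set(ingredients), 2):
        cooc[a][b] += 1


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def nearest(term, k=3):
    """Return the k terms whose co-occurrence vectors are closest to term's."""
    target = cooc.get(term, Counter())
    scores = {other: cosine(target, vec) for other, vec in cooc.items() if other != term}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(nearest("parmesan", k=1))
```

In this toy corpus the closest neighbor of Parmesan is the other Italian cheese, because the two are used in the same contexts even though they never appear together. That same property is what makes such vectors a plausible signal for ingredient substitution.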
MELANIE: Are you doing generative recipes?
SIVAN: I would not say that we've conquered generative recipes.
MELANIE: But you're exploring it?
SIVAN: But we're exploring it, yeah.
MELANIE: Yeah, especially from ingredients-- I know for people who-- they may have certain types of ingredients in their house. And some of these recipes can be very-- like a barrier to entry sometimes, with some of the ingredients they ask for. It's nice when you can see what's easy to swap.
ERIK: Yeah. And I mean, there's also very practical things. Because oftentimes you go to the grocery store and the grocery store themselves don't know what inventory they have.
ERIK: And they may have run out of the thing that you intended to buy. So you need to make a decision in the moment about what to do.
MARK: And so, OK, once you've got the data set for the food, essentially the ingredients and the recipes, I'm guessing there's another side which is the consumers' wants and needs, and how do you mush the two things together?
SIVAN: Yeah. That's the personalization step, right? So there's kind of two types that Erik mentioned when he sort of described the journey. One is more about the type of things I like, my taste. I like these types of recipes. Or I'm following a certain diet, paleo or keto.
SIVAN: There's the health conditions that could sort of also dictate certain things. That's more on the food domain, when you ask people. And then there's more on the shopping domain, which is what kind of stores do I like to shop in, and what kind of grocery items do I usually buy--
SIVAN: --what my budget is. I have other constraints that may come in. Or what do I have in my pantry and how do I bring it?
Or I dislike these types-- like, I hate pickles. So if you bring me a recipe with pickles, you're going to have to recommend some alternative to those pickles. So there's all these information-- some of it we can ask directly.
A lot of other companies do. Very, very directly, like a diet-- are you on this specific diet, dislikes and likes, things like that. In others, I think, we've taken a more different approach. For example, taste, for me, people can describe what they like. But it's much easier and maybe much more informative to not ask it directly, but indirectly by asking them to choose a set of recipes from a list, or to learn it over time, because they're selecting more and more recipes, so you can over time adapt to their taste better.
Or explore their taste varieties that they like, as opposed to asking if you prefer savory or sweet or things like that. The people may say it, but they may not always exactly want that one thing. They may want to venture out. So some things we collect very directly by asking and others kind of indirectly through the interaction with the app itself.
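One simple way to picture learning taste indirectly is a profile that is updated every time someone picks a recipe, with older choices gradually counting for less. The Python sketch below assumes recipes carry descriptive tags and uses an exponentially decayed tag weight; the `TasteProfile` class, the decay value, and the tags are hypothetical and are not a description of Wellio's recommender.

```python
from collections import defaultdict


class TasteProfile:
    """Tracks tag preferences inferred from the recipes a user selects."""

    def __init__(self, decay=0.9):
        self.decay = decay  # hypothetical value: how fast old choices fade
        self.weights = defaultdict(float)

    def record_selection(self, recipe_tags):
        # Fade every existing preference, then reinforce the chosen tags.
        for tag in self.weights:
            self.weights[tag] *= self.decay
        for tag in recipe_tags:
            self.weights[tag] += 1.0

    def score(self, recipe_tags):
        return sum(self.weights.get(tag, 0.0) for tag in recipe_tags)


profile = TasteProfile()
for chosen in (["spicy", "mexican"], ["spicy", "thai"], ["mexican", "seafood"]):
    profile.record_selection(chosen)

candidates = {
    "shrimp tacos": ["mexican", "seafood", "spicy"],
    "cheesecake": ["sweet", "dessert"],
}
ranked = sorted(candidates, key=lambda name: profile.score(candidates[name]), reverse=True)
print(ranked)
```

Nobody was asked whether they prefer savory or sweet; the ranking falls out of what was actually chosen.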
MARK: Yeah. And that makes sense. I'm just thinking if you asked me directly what movies I watch, I would probably tell you art house, because I'm cultured. But if I'm at home, I'm probably watching "Terminator 2."
MELANIE: Or "Aglow--" anyways, no plugging there-- so this is great. What are some of the GCP products that you're using?
ERIK: I think it may be easier to say which GCP products we're not using.
SIVAN: We're not using, yeah, I think--
MELANIE: That's a new one. Let's try that.
ERIK: What are we not using?
ERIK: We're not using Bigtable. That's about it.
MELANIE: Oh, wow! So you're using everything else?
ERIK: Well, we're not using Bigtable. We're not using Spanner.
MELANIE: Do you use TPUs?
MARK: Why don't we start on the journey where you started, which is the data side of things. Where does that data go, and how do you process it?
ERIK: Yes. Actually-- so the first thing we did build was the crawling system. Because I'm a data scientist. I've worked with many data scientists. I've never met a data scientist who didn't want more data. So I figured we better have data if we're going to be successful.
So we built a crawling system that's built in Python. The actual operation of it is heavily influenced by the availability of things like GKE. That runs on GKE and autoscales on GKE. And all of the data coming out of that is dumped on to GCS.
And then all of the post-processing happens via Pub/Sub so that there's various stages. It goes through kind of a staged pipeline that's all coordinated by Pub/Sub and there are various GKE clusters that read the messages--
MELANIE: And this is stored on BigQuery, even when it's unstructured?
ERIK: Yeah. There's a whole bunch of side channels that, out of this pipeline, we collect data at various levels of normalization. That is available on GCS in some rawer form, which we can then access, and we do access via Dataproc. But primarily, it ends up in BigQuery. And then most of the machine learning, either kind of in an ad hoc, or in a more formal repeated training--
MELANIE: Are you using Dataproc with more like a Hadoop or with a Spark?
ERIK: With Spark.
MELANIE: Got it.
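The staged pipeline Erik outlines, where each stage reads messages, does its work, and publishes to the next stage, can be sketched with in-process queues. This Python example deliberately does not use the Pub/Sub, GCS, or BigQuery client libraries: `queue.Queue` objects stand in for topics, a plain list stands in for the warehouse table, and the stage names and record fields are invented for illustration.

```python
import json
from queue import Queue

raw_topic, parsed_topic = Queue(), Queue()  # stand-ins for Pub/Sub topics
warehouse = []                              # stand-in for a BigQuery table


def crawl_stage(pages):
    """Publish each crawled page as a message for the next stage."""
    for page in pages:
        raw_topic.put(json.dumps(page))


def parse_stage():
    """Consume raw pages and publish normalized records."""
    while not raw_topic.empty():
        page = json.loads(raw_topic.get())
        record = {"url": page["url"], "title": page["html"].strip().title()}
        parsed_topic.put(json.dumps(record))


def load_stage():
    """Consume normalized records and append them to the warehouse."""
    while not parsed_topic.empty():
        warehouse.append(json.loads(parsed_topic.get()))


crawl_stage([{"url": "https://example.com/r/1", "html": " shrimp tacos "}])
parse_stage()
load_stage()
print(warehouse)
```

Because stages only communicate through messages, each one could run on its own cluster and scale independently, which matches the arrangement described in the conversation.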
ERIK: Yeah. Actually, we're increasingly kind of automating a lot of our model training, evaluation, and deployment pipelines on Composer.
MARK: Tell us about Composer, because I think that's very new. In fact, I have no idea what it is.
ERIK: Composer is essentially an orchestration layer. It's hosted Airflow. I mean, Airflow seems to have been widely-- increasingly widely used in the industry for doing machine learning pipelines.
Where I have Step A, I transform some data. Step B, I maybe need to train a model. Next step is maybe I do model evaluation and then model selection. And then I need to deploy it.
So if you want to run model training on some schedule like that, we've been doing that recently in Composer, and that's worked very well. There's a lot of nice hooks for some of the other GCP offerings. For instance, like Cloud ML Engine-- we use a lot of Cloud ML Engine with Keras, actually, increasingly. And we launch our training jobs directly from Composer as we get new data coming in, new label data, or new unstructured data, depending on the pipeline.
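The step sequence Erik lists (transform, train, evaluate, select, deploy) is a dependency graph, which is the core idea behind an orchestrator such as Airflow. The Python sketch below does not use Airflow or Composer at all; it uses the standard library's `graphlib.TopologicalSorter` as a stand-in scheduler, and the toy "model" (a least-squares slope through the origin) and the 0.1 tolerance are assumptions made only so the steps have something to do.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

state = {}    # shared scratch space passed between steps
run_log = []  # records the order in which steps executed


def transform():
    state["rows"] = [(x, 2.0 * x) for x in range(1, 6)]


def train():
    rows = state["rows"]
    # Least-squares slope for a line through the origin.
    state["slope"] = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def evaluate():
    state["max_error"] = max(abs(y - state["slope"] * x) for x, y in state["rows"])


def select():
    state["approved"] = state["max_error"] < 0.1  # hypothetical tolerance


def deploy():
    state["deployed"] = state["approved"]


TASKS = {"transform": transform, "train": train, "evaluate": evaluate,
         "select": select, "deploy": deploy}
# Each task maps to the set of tasks that must finish before it can start.
DEPENDS_ON = {"train": {"transform"}, "evaluate": {"train"},
              "select": {"evaluate"}, "deploy": {"select"}}

for name in TopologicalSorter(DEPENDS_ON).static_order():
    TASKS[name]()
    run_log.append(name)

print(run_log, state["deployed"])
```

A real orchestrator adds scheduling, retries, and hooks into managed services on top of this ordering, but the dependency declaration is the part a pipeline author writes.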
SIVAN: Yeah. And we do have several models that really are kind of continuous integration development and deployment in the sense that we've developed them, we train them. We get all of that from Composer. And then there's a human in the loop that, once the output of the model is shown, we have an internal tool that would show the model output.
Internally, on our team, we have experts-- domain experts, chefs, and people in the culinary space-- that would review the labels that the model produced. For example, we talked about-- it could say the type of cuisine or things like that. And they would correct and we would collect that corrected data. And then we can train, again, the model with more data. So that over time, the model improves, and we can monitor that improvement. So we definitely have kind of those mechanisms.
MARK: That's actually really interesting. So you're not just ML. And as you said before, you were, like, we're not just [? doing human. ?] It's kind of the combination of both or a little bit of tweaking.
SIVAN: Yeah. I think--
MELANIE: A lot of companies do that, too.
SIVAN: Yeah. I think even with our experience from our previous company, it's very clear to us that machine learnings and models-- I guess that's also my background-- I cannot, in good conscience, have a model that I believe will always be right. There's always going to be cases where the model is going to be wrong. And as Erik said, as a data scientist, I'm always going to want more data, so having experts in the room talking to you, helping you understand the data.
And I would admit, I cooked before coming to Wellio, but I am not a chef. And I am not an expert in this domain. So having the domain expert sit next to me and asking lots of questions about the data so that I understand how to properly utilize it really helps. And so yeah, we've devised a mechanism so that we can kind of get the most from this domain expert and scale in some sense.
MELANIE: When the data is being flowed through your ETL and through the data and doing the training, is it batched? Is it streaming? Is it a mix?
SIVAN: It's a mix.
ERIK: Yeah. It's a mix. It's primarily batched today, especially on the training side. I think going forward, it'll be more online and streaming.
We do do streaming in another way, which we do by human-in-the-loop. And that is to have human-in-the-loop kind of operationally. So we have a system where a model will be selecting a grocery list, say, for an individual. And then the model also outputs confidence scores with respect to the rankings it has made in terms of the grocery items.
And then if that falls below some threshold, we actually will shunt some of that work to a human expert. And the human expert can review and override some of the model-based decisions. That process is entirely streaming.
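The confidence-based hand-off to a human expert comes down to a threshold check on each model decision. A minimal Python sketch follows; the 0.8 threshold, the example items, and the `route` function are hypothetical, since the conversation does not say what values or interfaces Wellio uses.

```python
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off, not a value from the interview


def route(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split (item, confidence) pairs into auto-accepted and human-review lists."""
    auto_accepted, needs_review = [], []
    for item, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append(item)
        else:
            needs_review.append(item)
    return auto_accepted, needs_review


auto_accepted, needs_review = route([
    ("corn tortillas", 0.97),
    ("shrimp, 1 lb", 0.91),
    ("hot sauce", 0.55),
])
print(auto_accepted, needs_review)
```

Raising the threshold sends more work to experts and lowers the error rate; lowering it does the opposite, so the value is a cost and quality trade-off.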
SIVAN: I would say we have sufficient data that we don't get a lot from training online, like, from every observation. We probably do need enough quantity to sort of move the needle again. Because the needle is already quite high, so to move it again takes a while. And so I think the system that Erik describes helps us deal with outliers and things that don't work all the way, like they should, because they're outliers. And it will take time for the machine learning to sort of go get all those kind of edge cases.
MELANIE: And I'm also curious, in terms of your setup, are you using from the more, like, data engineering side of things, Stackdriver and other performance tools, too?
SIVAN: Yeah. Yeah.
MELANIE: To make sure it's--
MELANIE: --the system's functioning the way you want it to?
SIVAN: When Erik said-- asked us what we're not using, it's actually right. It's probably true. You know, there's a presentation with all the Google icons in them?
SIVAN: But I think we can highlight many, many of the icons that we use. We're really, not just in data science but also on the engineering side, very adventurous and up to date with the current technology that Google is offering so that we can adapt quickly and move fast. And I think we've been really, really good at-- I know a lot of beta, like Composer and other things, that are in pure beta that we're, like, really adopting already into our stack.
I got excited in this conference to hear more about Cloud Functions and bringing Python into the mix of it. So I'm eager to test that in our infrastructure. We don't have that yet. But definitely, other components from Google have made it very easy as a data scientist to sort of explore data, push models into production. It's been a really nice platform to work with.
MELANIE: And you mentioned some of the models that you're working with, or recommend, or-- well, a lot of these end up being some form of recommendation or classification, anyways. But it sounds like you're using ML Engine, or you said you were using ML Engine. So you're using neural nets, then?
SIVAN: Some are deep learning nets, and some are-- I like to call them shallow models, now that everyone really loves the buzzwords of deep models.
So yeah, we use a combination of shallow and deep models, depending on the application. But a lot of it--
ERIK: Most are DNNs.
SIVAN: --we have these semantic representations of certain things. And then we might train a more shallow model on top of it, right-- so, like, transfer learn and things like that. So we have a combo of these approaches.
MELANIE: Are you using other things, like random forests and linear regressions, based on the types of other problems that you're working on?
SIVAN: Yeah, yeah, yeah.
MELANIE: Most companies would be, yeah.
SIVAN: It's a combination. I really am not-- I'm trained as a statistician. So I'd rather be-- and I'm practical. So I'm, like, OK, what is the right approach for this problem?
And I don't believe that deep learning is the solution to all of the problems in the world. I believe that we need to understand the data, you need to understand what you're trying to solve. Sometimes simple approaches can get you what you need in order to solve the business problem.
ERIK: It tends to be a very layered approach, in that the models that are consuming essentially unstructured data-- that could be raw text; that could be images, for example-- tend to be deep neural networks. And then they output some vector representation or some additional auxiliary data or labeling that then can be fed into an application-specific model. And sometimes a model is a deep neural network. But very often, it's not.
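Editor's sketch (not from the speakers): the layered pattern Erik describes can be illustrated with a toy example. A hand-written embedding table stands in for the vector output of a deep network, and a small logistic regression plays the application-specific "shallow" model trained on top. The table `PRETRAINED_VECTORS`, the helper names, and the vegetarian/non-vegetarian task are all hypothetical and are not Wellio's models or data.

```python
import math

# Hypothetical stand-in for the output of a deep encoder: in a real system these
# vectors would come from a trained neural network, not a hand-written table.
PRETRAINED_VECTORS = {
    "tofu": [1.0, 0.1],
    "lentil": [0.9, 0.2],
    "beef": [0.1, 1.0],
    "chicken": [0.2, 0.9],
}

def embed(text):
    """Average the vectors of known tokens; unknown tokens are ignored."""
    vectors = [PRETRAINED_VECTORS[t] for t in text.lower().split() if t in PRETRAINED_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train_logistic(features, labels, epochs=200, lr=0.5):
    """Shallow application-specific model: logistic regression fitted with SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(model, text):
    """Probability that the text describes a vegetarian ingredient list."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, embed(text))) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy training set: 1 = vegetarian, 0 = not vegetarian.
TRAIN_TEXTS = ["tofu lentil", "beef chicken", "lentil", "chicken"]
TRAIN_LABELS = [1, 0, 1, 0]
MODEL = train_logistic([embed(t) for t in TRAIN_TEXTS], TRAIN_LABELS)
```

Calling `predict(MODEL, "tofu")` should give a probability above 0.5 and `predict(MODEL, "beef")` one below it. The point of the split is that the embedding step can be reused across several small downstream models.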
MARK: How are you serving these models? Like, how do you end up using them at the end?
ERIK: Yeah, we serve them-- well, we have a microservices architecture. And so everything, for the most part, ends up in GKE in some form or another. We're starting to increasingly use Cloud ML Engine as a deployment platform when we can.
I'm excited to try that with more frameworks than TensorFlow. We've done it with TensorFlow. I'm excited to try that outside of TensorFlow. Because we like to think about the models we produce as essentially appliances, and then the engineering organization can just use them as any other service. It's just another API that you call.
We're a very small team. We have no platform engineering team. GCP is our platform engineering team. So we either have none, or we have a giant one, depending on how you look at it. And we have no operations team. Everyone kind of owns operations and that includes all of the data scientists. So on the machine learning operations, online model monitoring, performance monitoring with Stackdriver, for example, that's all done by--
ERIK: --by the-- yeah.
SIVAN: Yeah, by our data scientists.
MARK: Yeah. So that's really interesting. And you're using a lot of very different pieces of GCP. But you're all learning it with this small team. Are you facing any challenges about basically using so many things, or are you finding it nice, or what is that experience like?
ERIK: Well, I mean, some things are kind of new, I think, to the data science industry in some ways. I mean, most times when we talk to other data scientists, they're not super familiar with Kubernetes, for example. So that's just new.
But I think what we found is very nice is that the abstraction layers in the GCP seem to be well-considered in that you don't have to re-learn API primitives or command line interfaces over and over again to use different parts of the system. There's a fair amount of consistency. And so once our engineers or data scientists kind of grok the underlying conceptual space, using it tends to be pretty easy.
SIVAN: And there's some learning, right, that you need to do in order to ramp up on the different technologies. But I feel like-- like, I never worked on the Google Cloud platform before joining Wellio. And yeah, it took a while but not as much as it took me on other platforms. So that's a good thing.
MARK: That's the sound bite. Thank you, [? we're done. ?]
MELANIE: Well, I have two different questions I wanted to ask. Earlier before, you were saying you want to try to make it accessible outside of TensorFlow. Is there a reason why you want to be able to explore beyond TensorFlow, and is there a specific software that you'd be more interested in?
ERIK: Well. As Sivan was saying, our quote unquote "shallow" models-- I love this brand name, by the way.
SIVAN: I say constantly-- shallow and deep. And I find it very funny that people call it deep and not know there's no alternative.
ERIK: Yeah. Once you go deep, it's hard--
SIVAN: It's shallow.
ERIK: --to say I'm not doing anything deep. So our non-deep neural network models are primarily scikit-learn. And so to be able to deploy those as individual [INAUDIBLE] interfaces would be very useful.
MELANIE: Scikit-learn is very popular, yes.
ERIK: Yes. Scikit-learn is very popular. I think if we could do both of those, and then, that allows us also to do model versioning so that we can do A/B testing or canary deployments. That's something we want to do more of.
And it provides a safety net so that we can more quickly deploy new models to production. And we don't have to be so careful in terms of the testing, but we can deploy and monitor. And then we have the safety to roll back if we need to.
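Editor's sketch (not from the speakers): the "deploy, monitor, roll back" idea can be shown with a small in-process router. It sends a fraction of requests to a canary model version, counts canary failures, and stops using the canary once its error rate crosses a threshold. The class `ModelRouter`, its parameters, and the thresholds are hypothetical; a managed platform or service mesh would normally do this at the infrastructure level.

```python
import random

class ModelRouter:
    """Splits traffic between a stable model version and a canary version."""

    def __init__(self, stable, canary, canary_fraction=0.1,
                 max_error_rate=0.2, min_requests=20, seed=0):
        self.stable = stable                  # callable: features -> prediction
        self.canary = canary                  # callable: features -> prediction
        self.canary_fraction = canary_fraction
        self.max_error_rate = max_error_rate  # hypothetical rollback threshold
        self.min_requests = min_requests      # wait for this many canary calls first
        self.canary_requests = 0
        self.canary_errors = 0
        self.rolled_back = False
        self._rng = random.Random(seed)

    def predict(self, features):
        use_canary = (not self.rolled_back
                      and self._rng.random() < self.canary_fraction)
        if not use_canary:
            return self.stable(features)
        self.canary_requests += 1
        try:
            return self.canary(features)
        except Exception:
            self.canary_errors += 1
            self._maybe_roll_back()
            # Fall back to the stable version so the caller still gets an answer.
            return self.stable(features)

    def _maybe_roll_back(self):
        if self.canary_requests < self.min_requests:
            return
        if self.canary_errors / self.canary_requests > self.max_error_rate:
            self.rolled_back = True


def stable_model(features):
    return "v1"

def healthy_canary(features):
    return "v2"

def broken_canary(features):
    raise RuntimeError("simulated failure in the new model version")
```

With `broken_canary`, every caller still receives the stable answer and the router flips `rolled_back` after enough failed canary requests. With `healthy_canary`, a share of callers see the new version and no rollback happens.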
MELANIE: And you were saying how it took a little bit of time to get up to speed. Any tips or tricks or advice for those who are exploring GCP for the first time, trying to build out their own solutions for their company?
SIVAN: So I was in a bigger company before. So in a bigger company, data scientists, at least in my experience, they really have a very specific role. They don't really do the data engineering; that's done by someone else. They're very, very data science. Meaning you look at the data, you build the model, and you push it to someone else.
And I think my biggest problem was, oh, I now need to learn all the data engineering side, because no one's going to do it for me. Oh, I need to learn DevOps. There's no DevOps team. Oh, I need to do all these things. So it was more tackling those things.
I think from a data science perspective, if my role in this company was just look at data, develop models, and push it, this platform makes it extremely easy to do those tasks. Really, like, accessing the data in different formats from different locations has become extremely easy and simple. Spinning up your own notebooks to be able to explore the data from different places is very easy. Deploying the model now becomes extremely easy, too, with Cloud ML. So it's a platform that's designed in some sense for our data scientists to play around with.
I think it becomes more challenging when you're, like, yeah, oh, you're also the engineer and the DevOp. So you need to learn about Kubernetes. And oh, yeah, you do need to understand all these things. Then it becomes a little bit more complicated.
MELANIE: You also mentioned earlier that you are using TPUs. How are you using TPUs?
SIVAN: For training purposes.
ERIK: Yeah, well, one thing that's very important for us is to think about our biggest costs. And our biggest cost is opportunity cost. In that we don't have an infinite number of software engineers. So if a software engineer is doing x, they're not doing y.
And that means, especially for people building models, that the faster they can get to the next iteration result, the better. So that is reducing training time. And we use accelerators and TPUs and GPUs for reducing time to answer.
And the other thing that we increasingly use, which has been very beneficial for us, is the hyperparameter optimization.
ERIK: So it used to be that we would have these long discussions.
SIVAN: That's true.
ERIK: People thought that their job as a data scientist was to know and have intuition about what type of parameters to choose, and now I tell them, yeah, don't really worry about that. We'll just run a big experiment. And I'd rather have the experiment running than spend another couple of days thinking about which parameter should be used.
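Editor's sketch (not from the speakers): "just run a big experiment" can be illustrated with random search, the simplest form of hyperparameter optimization. This is a swapped-in technique for illustration and is not a description of how any managed tuning service works. The search space and the `validation_loss` function are hypothetical; in practice that function would train a model and return a measured validation metric.

```python
import math
import random

# Hypothetical search space for a small neural network.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh"],
    "learning_rate": [0.1, 0.01, 0.001],
}

def sample_config(rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def validation_loss(config):
    """Toy stand-in for 'train the model and measure validation loss'.

    The made-up optimum is 3 layers, 128 units, relu, learning rate 0.01.
    """
    loss = abs(config["num_layers"] - 3)
    loss += abs(config["units_per_layer"] - 128) / 128
    loss += 0.0 if config["activation"] == "relu" else 0.5
    loss += abs(math.log10(config["learning_rate"]) + 2)
    return loss

def random_search(objective, num_trials, seed=0):
    """Evaluate num_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```

A call such as `random_search(validation_loss, num_trials=50)` returns the best configuration found and its loss. Each trial is independent, so trials can run in parallel, which is what makes "let the experiment run" cheaper than days of discussion.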
MARK: What's the issue there, for people who aren't familiar with that space?
ERIK: When you build-- say, take a deep neural network, you have to decide how many layers you should have, how big each layer should be, which units you have in each layer. You could choose from different units.
MELANIE: Activation functions.
ERIK: Yeah, I mean, there's a smorgasbord of--
MELANIE: The way I like to explain it, and I'm just going to jump in with this, is it's almost like you have one of those very complex radios that you can tune all the things for the radio to make the sounds sound like a certain quality. But you've got to be an expert on how to tune it. And that can be a challenge.
ERIK: Yeah. That can be a challenge. And the alternative to being an expert is, you know, let the machine worry about that. Because then you can go on to doing things that I think are higher value, which is, well, what problems would we be solving?
Which work should we prioritize? Which new model should we build? And let the platform do that fine tuning. I think we've seen the same thing, by the way, with feature engineering. And I've had discussions with data scientists, even Google data scientists, who told me that they think their job is getting automated away. Because their job used to be to look at some raw data, and then they could figure out how to encode the raw data in these features that would work well in a machine learned model.
And now with deep neural networks, the network does most of the encoding. And it does a better job of encoding the raw input than most data scientists do. So that a data scientist's job is no longer to be a feature engineer. And I think the next phase is the data scientist's job is not going to be to be choosing hyperparameters. And then it's probably going to be not to be choosing model architectures.
MELANIE: Well, and that starts to touch into AutoML, too. Have you explored that yet?
SIVAN: A little.
ERIK: Yes. That's on the list of things we have used, yes.
SIVAN: [LAUGHS] We've tried once or twice. There might be good use cases for it, and I agree that some automation is good. But maybe, like, my own internal thing is I want to see the model. I need to see the actual architecture and what it does.
And I believe in the technology of deep learning. I understand it mathematically. I understand where it's going.
But I also believe that the value of a data scientist is in interpreting some of these things sometimes. And the more deep learning is applied, the less people have an understanding-- even the people who actually developed these models have less understanding of what's going on internally, what are the actual features it's creating, how to explain it to anyone, the causal things that people like to interpret with-- causality becomes really hard. So I think with AutoML it's even more obscure than normal.
I see a lot of benefit for certain places, especially if you want to accelerate certain things that are very common tasks in machine learning. Then you should use it because you don't need to hire today a whole suite of data scientists just to build a model. That's already kind of implemented for you.
SIVAN: And it takes away that magic of feature selection and all of these things. So I think it's very useful for those kinds of use cases. I feel like if you have a data scientist and they know what they're doing, letting them play around with TensorFlow and building models that you can better grasp when they fail and when they're good-- it's good for the organization.
So this is why, I guess, Erik and I are a good pairing. He pushes in that direction. And I push in the other direction. So on average, it's fine.
MELANIE: And one other thing on the TPU side-- do you combine those with GPUs and CPUs in the calculations and the computations that you're doing?
ERIK: Yeah. That's one thing that I would say is still a challenge for us. Which is, when you have a particular model training job, having to decide the compute infrastructure. Because sometimes it's actually-- especially in dollar terms-- that requires a lot of trial and error on our part. I'd love to be able to just set a budget. I would like to train this model for this many dollars and be able to trade off time to convergence in dollars.
MELANIE: Ooh. It sounds like a good product to build.
ERIK: Yeah. Because--
SIVAN: A forecaster.
ERIK: Yeah. Because what we do now is, we'll run a job and we'll use just CPU, distributed CPU training. And then we'll compare that with a single GPU. And then you can choose your flavors of the GPUs or TPU. And then we have to decide for that class of models which one we think we should use as the standard in our Composer workflow, which is in the automated training. That's a very operational, optimization decision that most, I think-- most data scientists probably don't want to do.
SIVAN: But I've seen talks in this conference talk a lot about the engineering costs. And they focus less on these topics of, like, oh, how much does it take to train your model? It's more about if you have an app, where should you deploy it, and how much it will cost you. Or operationally, what's going on. And I think this is kind of an area that hasn't yet been fully explored, maybe.
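Editor's sketch (not from the speakers): the budget trade-off Erik wishes for can be written down as a small selection rule. Given a measured time to convergence and an hourly price for each hardware option, pick the fastest option whose total cost fits the budget. All prices and training times below are invented for illustration and are not real GCP pricing or Wellio measurements.

```python
from dataclasses import dataclass

@dataclass
class HardwareOption:
    name: str
    hourly_cost_usd: float     # hypothetical price per hour
    hours_to_converge: float   # hypothetical measured training time

    @property
    def total_cost_usd(self):
        return self.hourly_cost_usd * self.hours_to_converge

def fastest_within_budget(options, budget_usd):
    """Return the option with the shortest training time that fits the budget."""
    affordable = [o for o in options if o.total_cost_usd <= budget_usd]
    if not affordable:
        return None  # nothing fits; the caller has to raise the budget
    return min(affordable, key=lambda o: o.hours_to_converge)

# Made-up numbers from an imagined trial run of one model class.
OPTIONS = [
    HardwareOption("distributed-cpu", hourly_cost_usd=2.0, hours_to_converge=10.0),  # $20 total
    HardwareOption("single-gpu", hourly_cost_usd=3.0, hours_to_converge=4.0),        # $12 total
    HardwareOption("tpu", hourly_cost_usd=8.0, hours_to_converge=2.0),               # $16 total
]
```

With these numbers a 15 dollar budget selects the single GPU, while a 20 dollar budget selects the TPU because it converges sooner and still fits. The trial-and-error Erik describes is what produces the `hours_to_converge` column in the first place.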
MELANIE: Well, and off that point, what have you seen at Next that you've been excited about or that you're interested in looking into further?
SIVAN: We actually made a list--
SIVAN: --just before coming here, without knowing that this was going to be a question. And there were quite a few things. So I got really excited about Cloud Functions. I can see immediate benefits. And I like that there's a Python beta version.
So I'm, like, OK, we might want to try that. We might want to venture into that. There's some Google vision that we looked at that we were, like, oh, we might be able to capitalize on it. There's some really cool, new things that were shown there.
I actually went to an amazing talk that maybe is not new, but just explained it in a better way. She was a DevOps advocate. But she was one of the keynote speakers today.
MARK: Was it Aja?
MARK: Aja's great. She's on our team.
SIVAN: She gave a talk yesterday with kind of tips for DevOps for people who are not DevOps. And it's going to be on YouTube, I heard.
MARK: We'll find it and [INAUDIBLE].
SIVAN: So I would recommend that one. It was a very, very good talk. And so I'm excited to take some of the learnings she suggested there.
I was telling Erik there was a talk about Traffic Director. We Googled it and couldn't find it. So there's an alpha version of something called Traffic Director-- Istio and it works [? well ?] with Envoy. And so all of these things, we're, like, oh, those are interesting things we should start thinking about.
And then the last one was the Firestore stuff that was mentioned throughout. Lots of things here in Next were talking about Firestore. So I thought there were some nifty ideas in how you kind of immediately go from Firestore into BigQuery and other kinds of storage so that the data can get shown more quickly to the data scientist and into the models. So I'm kind of excited about that one, too-- so lots of things.
MARK: Anything on your end or is that the entire list?
MARK: It was a great list.
ERIK: It's pretty similar. I mean, I think we're increasingly, as a business, providing our platform to partners. And that means giving them access to APIs and that creates kind of additional operational needs. So I'm really excited about Istio.
In terms of talks and kind of ways to use Kubernetes well, the developer keynote, Kelsey Hightower's talk, was pretty awesome.
ERIK: I actually had the chance to, a few weeks ago, serendipitously, pair program with him at the end of dinner for 15 minutes. And I can say that the talk is exactly like the pair programming experience.
ERIK: So I'm excited to be able to use more of the operational elements, things like Istio and Spinnaker-- kind of really extend GKE as an operations platform and be able to do less operations, especially as we do B2B integrations.
MARK: Fantastic. Well, before we wrap up today, is there anything we've missed, or anything you want to make sure that our listeners know about? Are you hiring?
SIVAN: Only eat dark chocolate above a certain percent, otherwise the experience is very bad. And don't eat 100%, because that's also a very bad experience.
That's one very important tip. We are hiring. We're looking for back end engineering slash data engineering to join my team, so definitely that. We also have the--
ERIK: But we're hiring in all kinds of roles.
SIVAN: --senior application developer, so definitely people on that spectrum. And I think if you're generally interested in the food tech space, it would be great to get in touch with us. I mean, I assume the email is going to be somewhere visible. And so we'd be always happy to talk to people who are passionate about this space and have opinions, even if they're not technology kind of people. That's totally fine, too.
MELANIE: Well, thank you both for joining.
ERIK: Thank you.
MARK: So thanks again to Sivan and Erik for joining us on the podcast. This was a great episode. I'm so glad that you had the time to sit down with us at Next and really share your knowledge with how Wellio was built. It was really awesome.
MELANIE: Great. And I think that wraps it up for our Next interviews finally. So thank you for being our last, but definitely not the least episode to be able to share with everyone.
OK, Mark, let's talk about Inbox. Inbox is going to be going away in the next six months is my understanding. And so there's been a lot of work to make sure some of the features and functionality that Inbox has been moved over to Gmail. So what does that look like?
MARK: Yes. So that's a good question. So first, Melanie, are you an Inbox user?
MELANIE: I have been.
MARK: I am definitely an Inbox user.
MELANIE: How do you feel?
MARK: I-- [SIGH] um-- [SIGHS] I'm OK. I'm OK.
MELANIE: You're OK. [LAUGHS]
MARK: I'm OK. Most of everything's been moved across. The only feature that I would like to see moved across, if anyone is listening, I would love to see Bundles move across. If that happens, then I'm fine.
MELANIE: They do say that there's some more additional features that they're working on bringing over. But I agree. That in Inbox, the snooze feature was my favorite. And so when that came into Gmail, it made it easier for me to start it up in Gmail again.
MARK: Yeah. So definitely Inbox is my life. Inbox is my to do list. So there is-- yes, March 2019, Inbox is being shut down. It was a wonderful experiment.
And so they're moving a lot of the features over, which is really good. And a lot of the features have been moved over, as well. There is a transition guide that has been written. So if you're looking for like snoozing of emails, yep, we can do that, which is really good. Because yeah, snoozing is the best.
MELANIE: You can work offline.
MARK: Yeah, you can do stuff offline, which is awesome, too. Though I do particularly appreciate coming back to Gmail. The new smart replies-- I really like that.
MELANIE: So yes, we've got a transition guide that we're going to link for everyone to see what types of features have been brought over to Gmail, so you can have an understanding of how that will look, how that will work.
MARK: Yeah. And pretty much everything has been moved over, right? Like reminders-- if you use those, which I do. Moving over to Google Tasks is pretty seamless.
MARK: So you just have to move that stuff across, which is pretty good. Snooze is still there. You can replicate Bundles using tags and filters. So-- [PIANO PLAYING] it's not-- it's not the end of the world. [SIGH] It's not the end of the world. I'm OK.
MELANIE: It is not the end of the world. And we highly recommend to everyone, if you are using Inbox, to start taking a look at Gmail again and see how that is transformed.
MARK: Yeah. You might actually not recognize it, because it did go through a big redesign a little while ago.
MARK: And there's a lot of really cool features in there.
MELANIE: All right, Mark, where are you going to be? What are you up to?
MARK: You and I are going to be at Strange Loop in--
MELANIE: We are.
MARK: --basically on the day this comes out. I think I'm flying in later than you are.
MELANIE: Yes, I know. You're a little bummed because you're missing out on the museum party they do every year.
MARK: Yeah, apparently people want to meet me and talk to me about stuff. So yes, I will be late. I am a little sad.
MELANIE: But why are you going to be late?
MARK: Basically, internal games meetings stuff is basically the reason. I'm chairing our internal Game Summit.
MELANIE: That's fun.
MARK: Yeah, it's good. It's going to be awesome.
MELANIE: It is going to be awesome. And Strange Loop will be really wonderful, too. And we're planning on recording a podcast there that we will share soon. So you'll hear more about that later. And Mark, anywhere else you're going to be in the next month?
MARK: I don't think so. I think October is going to be relatively quiet. There is a chance that I will be at Unite LA. That's probably where I will be. I have no idea what I'm doing there or what I'm doing. But it seems likely that I'll probably be at Unite LA. Are you going anywhere?
MELANIE: I am. I'm going to be in Portland, Maine the week after Strange Loop. I'm going to be speaking at Oktoberfest.
MARK: Oh, cool.
MELANIE: Yeah, should be fun. All right, Mark. I think that's it for us for this week.
MARK: I think so, too. So Melanie, thank you so much for joining me for yet another episode of the podcast.
MELANIE: Thank you.
MARK: And thank you all for listening. And we'll see you all next week.
Mark Mandel and Melanie Warrick