Job Title: Senior Software Engineer
Company Name: Parakeet Health
Job Url: https://app.dover.com/apply/parakeet-health/95582c79-0cca-4fe4-8c5f-39d0506f9a92?rs=42706078&jr_id=69b0cd3b0b2db6275c056a83
Job Description:

Senior Software Engineer at Parakeet Health
In Office (San Francisco, CA)
Full-time
$170,000 - $230,000 / yr
parakeethealth.com

Every keystroke you spend writing a line of code is one that could be better spent writing instructions for your army of agents. (Ask us how we know.)

Who We Are

A few weeks ago a 95-year-old patient called one of our practices. She apologized for being long-winded before she’d said much of anything. “I promise my mind is still very sharp,” she said; you could hear in her voice that she’d been made to feel otherwise. Our agent didn’t cut her off. It listened. She thanked it at the end of the call for being so patient. She probably didn’t know she was talking to AI. But she felt that someone had listened.

Most voice agent companies in healthcare are racing to solve one thing: getting patients off hold fast enough to schedule an appointment. We think that’s the wrong finish line. The moment a patient calls is just the beginning: there’s the referral fax, the insurance check, the prescription refill, the follow-up, the billing. Most of it is invisible to the patient and exhausting for the practice. We’re building the platform that handles all of it, with the patient at the center of every touchpoint.

We’re 1.5 years in and working with 400+ medical practices across 27 specialties, including primary care, dermatology, optometry, and mental health. One Medical’s founder is an investor. We just raised our Series A (which means now is a very good time to join).

The Role

Honest take: the tools we’re building with today could be obsolete in six months, maybe sooner.
The engineers who thrive here are the ones who find that energizing, and who treat LLMs as force multipliers for everything: thinking through problems, catching edge cases, moving faster than should be possible.

In your first two weeks, you’ll launch a new end-to-end agent use case for a live customer. Not a toy project; a real one, for a practice that serves millions of patients.

The gray-area decisions around patient experience are the ones LLMs consistently fumble: what the agent says when something goes wrong, how it recovers mid-conversation, when it hands off to a human. Someone with actual judgment has to own those calls, and that someone is you.

Your job, more broadly, is to harness LLMs to give yourself as much leverage as possible across everything you do. Use them to uncover product gaps, pressure-test your own ideas, and compress the distance between a thought and a shipped thing. (Use them to order your lunch if that’s what it takes. We mean that. Any time spent increasing leverage while pushing the quality bar higher is time well spent here.)

You’ll also be closer to customers than most engineers at your level, because the best product decisions here come from engineers who heard the problem firsthand. No feedback filtered through three layers of telephone.

What it looks like day to day: our engineers plan and build using an internal agent harness with custom skills. We already have agents crawling our Datadog logs and fixing bugs autonomously. Non-engineers are one-shotting bug fixes in Slack (I’ll be honest, I was skeptical of this until I watched it happen). Even our onboarding docs are agentic: we have a custom agent that conjures up a design diagram in your favorite color on demand. Getting to this point required a lot of upfront investment, including building out our harness with rich context graphs and MCPs, and we’re starting to enjoy the fruits of that. But we’ve only scratched the surface, and we know it.
Our North Star is that an engineer here could ship meaningful work without manually writing a single line of code… maybe ever. We’re probably 15% of the way there today, and we think we could be at 70% in four months. Getting us there is part of the job.

What makes this harder than it sounds: our agents aren’t handling toy interactions. They’re speaking with real patients—people who are afraid, in pain, or navigating something complicated—and what we ship runs on their most sensitive health data. A fast-moving, LLM-heavy engineering culture has to coexist with a genuinely low margin for error. We’ve spent a lot of time figuring out how to make that work. The short version: it’s possible, it’s working, and there’s still a lot of runway ahead.

What You’d Work On

One of our customers was processing 50,000 faxes a month. Each one required a person to spend five minutes reading, sorting, and inputting the relevant information—armies of 10 to 20 people, eight hours a day, five days a week, just to keep up. The day after they brought this problem to us, we had them forwarding faxes into our system. Within two weeks we had an MVP running on actual production faxes: a tool that reads incoming referrals and prior authorizations and calls the patient to schedule them. What used to take 5 to 7 days now takes one minute. That product is live today, and we didn’t cut corners on compliance or quality to get there. Building on and expanding that platform is the work.

Specific problems you’d wrestle with:

Voice and conversation design at scale. Our agents speak to tens of thousands of patients daily. The gap between “technically correct” and “actually human” is enormous, and it’s not a problem you can prompt your way out of. One of our engineers spent three or four hours generating iterations of filler phrases (“mmm, let me look that up for you”) because cadence and intonation matter to a patient calling about their health. An LLM can generate the options.
Someone with ears has to choose.

Knowing when the agent should stop. Healthcare conversations move through predictable stages—intake, verification, scheduling, follow-up—and what’s appropriate to say at each stage isn’t always obvious from the outside. A patient’s diagnosis might be relevant at one point in the conversation and completely out of bounds thirty seconds earlier. A model optimizing for correctness will flatten all of that. Building systems that know not just what to say but when—and that route edge cases to a human without grinding everything to a halt—is some of the most consequential work we’re doing. Get it wrong and it’s not just a bad experience. Depending on what gets said to whom, it can be a compliance violation.

Building the infrastructure that makes all of this possible. Getting an agent to fix a bug autonomously sounds simple. Getting it to do that reliably, on a production codebase that handles real patient data, is a different problem. It requires evaluation systems that can tell the difference between a good fix and a confident wrong answer, context that’s rich enough for the agent to understand what it’s touching, and guardrails that fail safely when it doesn’t. We’ve built a lot of this. There’s more to build. And the honest truth is that the agents are only as trustworthy as the humans who designed the systems around them.

A Day in Your Life

No two days look the same, but here’s a realistic sketch:

Morning: You’re building out a new capability on the agent platform: a way to handle prescription refill requests that a practice flagged as fully manual and error-prone. You’re using the internal agent harness to scaffold the initial implementation, then reviewing what it produced and making the judgment calls it couldn’t. If you need to test something under real-world conditions—noise reduction, how the agent handles a difficult accent—you might head to the cafe downstairs. (We get out of the lab.)
Late morning: An alert surfaces from the Datadog agent: something’s off in how insurance information is being collected in a specific scenario. The symptom points to one place; the root cause turns out to be somewhere completely different. The agent had context across the entire codebase, meeting transcripts, and Notion pages, and it traced the problem faster than any engineer who knew the stack could have. You verify the fix and ship it.

After lunch: One of the QA folks who audits patient calls posts in Slack that she needs a “Mark as Reviewed” button. It takes a few seconds per call, but she reviews thousands of them. She tags the Slack bot hooked to our codebase. Five minutes later the button exists. You give it a once-over and it’s live.

End of day: You sync with CSMs and Eric on what they heard from customers. One conversation surfaces a new product opportunity. You sketch what a prototype might look like. (You’ll be building it tomorrow.)

Who Thrives Here

You treat LLMs as force multipliers and you’re already working that way.

You have taste. It’s the thing that’s hardest to hire for and most important to get right. It means you can tell the difference between code that’s technically correct and a patient experience that’s genuinely good. You notice the thing that’s slightly wrong and then you can’t unknow it (and you won’t ship until it’s right). You understand that what we’re building is a science and an art, and you take both seriously.

You question things. Not performatively, but you genuinely don’t accept the status quo when you can see a better way. You’re the counterbalance to people who defer too quickly, and you make the team better for it.

You care about shipping things that reach real people, not abstract users. That context should inform how you think about every decision you make here. If it doesn’t, this probably won’t be the right fit.

You’re honest about what you don’t know. “You might have a point; I may be wrong” should be something you’ve said recently and meant.
We debate, we disagree, we challenge each other, but we do it with respect and with the assumption of good intent. Someone who can’t say they’re wrong won’t last long here, and frankly won’t enjoy it much either.

You know there’s unglamorous work and you don’t think it’s beneath you. No matter your seniority, everyone rotates through the mundane stuff. That’s just how it works at this stage, and the people who are great here are the ones who do it without making it a thing.

You shouldn’t apply if:

- You’re drawn to heavy ML research. That’s not what this is, and it would probably frustrate you. We’re building products that leverage models, not the models themselves, and a generalist engineer with strong instincts and genuine curiosity will outperform an ML specialist here almost every time.
- You want someone to define the problem in detail before you start. We’re moving too fast for that, and the most interesting problems here are the ones nobody has fully defined yet.
- You’ve never taken real ownership of a project. If you’re used to working with an EM or PM who assigns you detailed Jira tickets spelling out exactly how to build something, this will be disorienting.
- You need a detailed roadmap three years out. We shoot for a north star but we’re constantly adapting. Agility is our advantage.

The Team

Our honest assessment: this team skews high EQ. (Maybe too high. We probably need more shoelace staring.) But it’s also what makes working here feel different from most engineering environments: people genuinely like each other, go climbing together, form bands, bake for each other. The culture isn’t declared. It’s just what happens when you hire people you actually want to spend time with.

Cecilia has been here the longest and is, by general consensus, the heartbeat of the culture. She has a cat named Kiwi (she’s allergic to cats, so this is real commitment) and strong opinions about where to eat lunch.
Jeffrey flies in from Kansas once a month and takes each of us on a workout when he visits. He’s climbed Mt. Whitney twice and survived one thunderstorm-induced retreat. (He came down the mountain like a reasonable person. He went back like Jeffrey.) Pack your office bag accordingly when he’s in town.

Sumedh joined because he’d been watching what we were building and couldn’t stay on the sidelines. He’s senior enough to mentor everyone, but humble enough that nobody feels mentored. Also inexplicably West Coast for an East Coast guy.

Tyrone commutes from Berkeley by bike every day, which is either impressive or unhinged depending on how you feel about the Bay Bridge. He loves climbing and will invite every new hire to go with him.

Tom runs Customer Success and is technical enough to do real triage before anything reaches you, which means the problems that land on your desk are actually worth solving. He also has a dachshund named Wanda who is, by any measurable metric, our most effective marketing asset.

Jung (co-founder, CEO) was patient number one at One Medical: he helped the founder build the first financial model before the practice had a single location. He also built the compliance program there from scratch, which is why we’re already SOC 2 and HIPAA compliant. He has a PhD in operations research, which means the person running ops here thinks like an engineer. On the weekends you can find him at the farmers market selling kimchi in honor of his mother.

Eric (co-founder, CPO) was my college classmate, and we played on the same rec league basketball team before we decided to build together. He was working in ML/NLP before ChatGPT existed, which means he knew this business was possible before most people were even asking the question. Eric’s the reason we’re building in healthcare specifically… but I’ll let him tell you that story.

Me (Aaron, co-founder, CTO): Eric and I have been friends since before any of this existed.
I think the best thing I can say about what it’s like to work with me is that he’s still here.

Requirements

- 3-7 years of relevant engineering experience.
- Expertise in Python, Django, and React.
- Strong backend fundamentals: you’ve designed and implemented scalable, robust systems.
- Previous experience at an early-stage startup is a plus.
- Genuine belief that agentic tools are the future of how software gets built.

Interview Process

We move quickly: you’ll hear back within 48 hours after each round. This can be as fast as two weeks end-to-end.

- 30-minute founder intro call
- 1-hour technical phone screen: you’ll problem-solve through an evolving technical challenge
- Full-day onsite in San Francisco. We’ll work through real customer problems together. If your schedule allows, we’d love to make this a paid 3-day work trial. We think it’s the best way for both sides to know if there’s a real fit.
- Reference checks → Offer

Benefits

- Excellent health, dental, and vision insurance
- Free dinner
- Free illy espresso, coffee, and beer
- Fitness stipend
- Commuter benefits
- Unlimited PTO
- 401(k)
- Relocation stipend

Our Office (The Nest)

In-person, Downtown SF by Montgomery BART. After years of remote work, we wanted to be in the same room again to brainstorm, whiteboard, and actually enjoy each other’s company. There’s high ownership, no BS meetings, and a ton of gourmet snacks. Once a week we also get out of the office for a team lunch… because some of the best conversations happen away from the whiteboards.

Include a 🦜 in your application for extra credit.

- Aaron, Co-founder & CTO