Can ChatGPT Generate a Full iOS App?
We already know that ChatGPT has the coding interview skills to land itself a job at a top tech firm. However, job interviews are a notoriously poor predictor of on-the-job performance. So, how would ChatGPT fare performing the day-to-day tasks of a tech worker? Is the software engineering profession under an imminent existential threat?
As a mobile app developer, I wanted to understand ChatGPT’s ability to design and program large-scale applications. Can it piece together an entire project and transform mobile app development into an interactive, “choose your own adventure” experience? Can ChatGPT democratize app development — placing this valuable skill in the hands of the average layperson with an original idea?
ChatGPT Codes an App
To evaluate ChatGPT’s app development skills, I decided to guide it through the development of a fully featured “Reddit client” iOS app. RedditGPT needed to load a list of posts from a user-specified subreddit, along with a “post details view” complete with comments and more. Throughout the challenge I played the part of someone with minimal prior coding experience and avoided writing a single line of code myself: every snippet of code driving the app had to be written entirely by ChatGPT.
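For readers curious about the underlying plumbing, Reddit exposes a public JSON feed by appending .json to a subreddit URL, and something along these lines is essentially all the networking the app needs. This is a minimal sketch in my own words, not ChatGPT’s output; the type and function names (RedditListing, RedditPost, fetchTopPosts) are purely illustrative.

```swift
import Foundation

// Reddit's public JSON feed wraps posts in a "Listing" envelope:
// https://www.reddit.com/r/<subreddit>/top.json
struct RedditListing: Decodable {
    struct DataContainer: Decodable {
        let children: [Child]
    }
    struct Child: Decodable {
        let data: RedditPost
    }
    let data: DataContainer
}

// Only the fields the list screen actually displays; property names
// intentionally mirror the snake_case JSON keys to skip CodingKeys.
struct RedditPost: Decodable, Identifiable {
    let id: String
    let title: String
    let author: String
    let score: Int
    let num_comments: Int
}

// Fetches the top posts for a user-specified subreddit.
func fetchTopPosts(subreddit: String) async throws -> [RedditPost] {
    // Force-unwrapping is acceptable for a sketch; real code would validate input.
    let url = URL(string: "https://www.reddit.com/r/\(subreddit)/top.json?limit=25")!
    let (data, _) = try await URLSession.shared.data(from: url)
    let listing = try JSONDecoder().decode(RedditListing.self, from: data)
    return listing.data.children.map { $0.data }
}
```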
Getting ChatGPT on the right initial track turned out to be the most finicky and time-intensive aspect of the programming exercise. After dozens of iterations, I was finally able to display a simple list of Reddit posts (shown below) with the prompt “create a SwiftUI app that displays a list of the top posts from reddit.com” followed by five error-correcting prompts. Through hours of experimentation, I picked up on ChatGPT’s quirks and developed tactics to keep the conversation on course. For example, keeping the code heavily abstracted, modularized, and adherent to an industry-standard application design pattern (in this case MVVM) let ChatGPT focus on narrower chunks of functionality, much closer in scope to its specialty: coding interview questions.
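To make that MVVM split concrete, here is a hedged sketch of how the post list screen can be carved into a view model and a thin SwiftUI view. It builds on the illustrative RedditPost model and fetchTopPosts helper above; it is representative of the shape I nudged ChatGPT toward, not a copy of the code it actually generated.

```swift
import SwiftUI

// ViewModel: owns loading state and exposes posts to the view.
@MainActor
final class PostListViewModel: ObservableObject {
    @Published var posts: [RedditPost] = []
    @Published var errorMessage: String?

    func load(subreddit: String) async {
        do {
            posts = try await fetchTopPosts(subreddit: subreddit)
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}

// View: a thin SwiftUI layer that renders whatever the ViewModel publishes.
struct PostListView: View {
    @StateObject private var viewModel = PostListViewModel()
    let subreddit: String

    var body: some View {
        List(viewModel.posts) { post in
            VStack(alignment: .leading) {
                Text(post.title).font(.headline)
                Text("u/\(post.author) · \(post.score) points · \(post.num_comments) comments")
                    .font(.caption)
            }
        }
        .navigationTitle("r/\(subreddit)")
        .task { await viewModel.load(subreddit: subreddit) }
    }
}
```

Keeping the view this thin means each prompt to ChatGPT only needs to touch one small, well-bounded file at a time, which is exactly the kind of narrow problem the model handles best.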
As a seasoned iOS app developer, I found the process excruciating. However, after hundreds of prompts spread across dozens of conversations, I finally lassoed ChatGPT into building a decent, fully featured iOS app! Here is my full conversation with the generative AI chatbot along with the final project codebase.
Impressively, ChatGPT successfully built a complex, interactive, web-enabled iOS app with 500+ lines of functional code. It even suggested a reasonable file structure based on the code it had written. While my stretch goal was to get to 100 prompts in one continuous chat, my journey was terminated abruptly after 80 prompts when the answers became incoherent despite repeated backtracking and iteration. Nevertheless, I think that the incredibly promising capability of ChatGPT speaks for itself in the video above.
Key Takeaways
Grappling with ChatGPT for over 20 hours during this challenge, I was continuously amazed at just how capable the large language model (LLM) was, given the absence of any formal scaffolding around the open-ended task of writing a fully featured Reddit client iOS app. Following this experience, here are three of my takeaways as we work towards optimizing our collective use of generative AI.
1: Context
Providing context when having ChatGPT iterate on a complex codebase such as the RedditGPT app is critical. While ChatGPT has an impressive look-back window of at least 1,000 words, it was still necessary to regularly include clarifying prose and snippets of the codebase’s current state. Without this, ChatGPT often suggested code that conflicted with the rest of the project and failed to compile.
Prompt engineering seeks to align an LLM’s output with the user’s goals by attaching the full context, set of assumptions, and hints at the desired outcome to the prompt. I like to think of the components of this provided context as input modules: both 1) supplementary inputs and 2) modifications to the user’s input that optimize for goal alignment. For example, one input module helpful for this challenge would be a list of the classes available in the existing code. In fact, ChatGPT already does much of this heavy lifting for you under the hood, as evidenced by explorations in prompt injection.
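As a toy illustration of the idea (not any real tooling), an input module can be as simple as a function that bolts supplementary context onto the raw request before it reaches the model. Everything here, including the buildPrompt helper and the example type names, is hypothetical.

```swift
// A toy "input module": augments the raw user request with supplementary
// context (available types, the file being edited) before sending it to the model.
func buildPrompt(userRequest: String,
                 existingTypes: [String],
                 currentFile: String) -> String {
    """
    You are iterating on an existing SwiftUI app that follows MVVM.
    Existing types you may reference (do not invent others): \(existingTypes.joined(separator: ", "))

    Current contents of the file being edited:
    \(currentFile)

    Request: \(userRequest)
    """
}

// Example: attach the project's class list so suggestions stay compatible.
let prompt = buildPrompt(
    userRequest: "Add pull-to-refresh to the post list.",
    existingTypes: ["RedditPost", "PostListViewModel", "PostListView"],
    currentFile: "struct PostListView: View { /* ... */ }"
)
```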
2: Hallucination
ChatGPT’s propensity to “hallucinate” convincing yet false realities is well documented. During this exercise, I repeatedly watched ChatGPT reference nonexistent symbols, call into third-party libraries it had never imported, or emit incongruous code clearly gleaned straight from an online tutorial.
Hence, I found myself developing a routine: plug in the new code, attempt to compile the updated app, and either 1) prompt ChatGPT with whatever errors arose, or 2) back up a few prompts once it seemed like ChatGPT had dug itself into a hole it could not escape. Generalizing this, the validation and post-processing of LLM output can be thought of as output modules. As described above, I played the part of a manual output validator, but it is easy to see how this process could be automated. For instance, one of ChatGPT’s own output modules is a content moderation filter that prevents it from producing controversial or otherwise unsavory output.
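Here is a rough sketch of what automating that compile-and-re-prompt loop could look like: shell out to xcodebuild, collect the error lines, and feed them back to the model. The project and scheme names are placeholders, and askChatGPT / applySuggestion are hypothetical stand-ins for the API call and the code-patching step.

```swift
import Foundation

// Runs `xcodebuild` as a subprocess and captures the combined build log.
func compileAndCollectErrors(project: String, scheme: String) throws -> (ok: Bool, log: String) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/xcodebuild")
    process.arguments = ["-project", project, "-scheme", scheme, "build"]

    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe
    try process.run()
    process.waitUntilExit()

    let log = String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? ""
    return (process.terminationStatus == 0, log)
}

// Hypothetical validation loop: stop when the build succeeds or after a few attempts.
func iterate(askChatGPT: (String) -> String, applySuggestion: (String) -> Void) throws {
    for _ in 0..<5 {
        let result = try compileAndCollectErrors(project: "RedditGPT.xcodeproj", scheme: "RedditGPT")
        if result.ok { return }

        // Keep only the error lines, mirroring what I pasted back by hand.
        let errors = result.log
            .split(separator: "\n")
            .filter { $0.contains("error:") }
            .joined(separator: "\n")

        applySuggestion(askChatGPT("The build failed with these errors:\n\(errors)\nPlease provide corrected code."))
    }
}
```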
3: Generality
Despite ChatGPT’s glaring shortcomings, the rate of progress in LLM capability and scale over the past couple of years is staggering. Looking through my full conversation with ChatGPT, it was able to do everything from generating code from scratch, to debugging compiler errors, to suggesting and implementing high-level improvements, to proposing a file structure for maintainability.
As powerful as ChatGPT’s generality is, it comes at the cost of precision and performance in narrow use cases such as app development. Applications built on top of LLMs can overcome this through fine-tuning: training the model on an additional domain-specific dataset to improve the quality, speed, and consistency of results. Post-training pruning is another recent area of research, promising potential reductions in model size of 90%+ along with lower compute costs.
Conclusion
So, will ChatGPT replace software engineers outright within the next handful of years? Almost certainly not. However, LLMs will continue to augment ever larger swaths of engineering capabilities. For now, these models are better off as tools or assistants (e.g. GitHub Copilot), leaving the critical high-level problem solving to the engineer. Much like calculators are an indispensable tool for mathematicians, LLMs will act as a force multiplier for software engineers. This symbiotic relationship between LLMs and engineers will become increasingly crucial as the scope and complexity of software continues to grow.
Generative AI is a disruptive innovation with the potential to reshape entire industries in the years to come. There is a virtually limitless range of opportunities for tailored suites of input modules, output modules, and fine-tuned, pruned models to be applied to specific verticals. This is what will transform LLMs from mere shortcuts into a revolutionary new set of tools. On their own, LLMs don’t make for excellent app developers, but they can and will be built into tools that do.
The experiment above offers a tantalizing glimpse into a future where complete creative power is made accessible to everyone. The barriers to entry for the creator economy are crumbling as previously passive content consumers become active contributors. These new tools will have a profound impact on the economy akin to the proliferation of personal computers.
One of the things that really separates us from the high primates is that we’re tool builders. I read a study that measured the efficiency of locomotion for various species on the planet. The condor used the least energy to move a kilometer. And, humans came in with a rather unimpressive showing, about a third of the way down the list. It was not too proud a showing for the crown of creation. So, that didn’t look so good. But, then somebody at Scientific American had the insight to test the efficiency of locomotion for a man on a bicycle. And, a man on a bicycle, a human on a bicycle, blew the condor away, completely off the top of the charts.
And that’s what a computer is to me. What a computer is to me is it’s the most remarkable tool that we’ve ever come up with, and it’s the equivalent of a bicycle for our minds.
~ Steve Jobs
Who knows — generative AI might just be the last mind bicycle that man need ever make. (hint: AGI 😉)
Pro tip: you can give up to 50 claps for an article on Medium! Just click and hold the clap icon for a few seconds and watch the magic happen! 😉