Jason Voorhees
- Joined: May 15, 2020
- Posts: 85,587
- Reputation: 254,840
Title. It was released just last week and it is extremely good in coding tests, so even I wanted to see how fucked us ITcels are. I tested it last weekend with an assignment that my friend's company gives to interns for one of their SDE-1 roles. This is the assignment; it is for a YC-backed startup with very good pay, and here are all the test cases. If you want to solve it, DM me and I'll send you the assignment.
The main problem in this assignment is the non-standardized columns, plus the fact that they want you to derive and calculate the taxes individually and flag the mismatches, while also adding support for Excel files, PDFs and images.
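To make the "derive the tax and flag the mismatches" part concrete, here is a rough sketch. It is not the actual submission: the field names (`taxable_amount`, `tax_rate`, `tax_amount`), the one-cent tolerance and the sample rows are all made up for illustration, and the real assignment may define tax differently.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical tolerance: differences up to one cent count as rounding noise.
TOLERANCE = Decimal("0.01")


def expected_tax(taxable_amount: Decimal, rate_percent: Decimal) -> Decimal:
    """Derive the tax from the taxable amount and the rate, rounded to 2 places."""
    raw = taxable_amount * rate_percent / Decimal("100")
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def flag_mismatches(rows: list[dict]) -> list[dict]:
    """Return a copy of each row with the derived tax and a mismatch flag."""
    checked = []
    for row in rows:
        derived = expected_tax(Decimal(row["taxable_amount"]), Decimal(row["tax_rate"]))
        stated = Decimal(row["tax_amount"])
        checked.append({
            **row,
            "derived_tax": str(derived),
            "mismatch": abs(derived - stated) > TOLERANCE,
        })
    return checked


# Made-up sample rows; amounts are kept as strings so Decimal parses them exactly.
rows = [
    {"invoice": "A-1", "taxable_amount": "1000.00", "tax_rate": "18", "tax_amount": "180.00"},
    {"invoice": "A-2", "taxable_amount": "250.50", "tax_rate": "5", "tax_amount": "15.00"},
]
result = flag_mismatches(rows)
for r in result:
    print(r["invoice"], r["derived_tax"], r["mismatch"])
```

Decimal is used instead of float so that cent-level comparisons do not trip over binary rounding.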
I solved it last month using GCP Document AI for OCR: its built-in invoice parser reads the PDFs and images along with their metadata, and Excel files go straight into pandas to extract their content. Then I used Gemini to normalize the columns and fed them into Redux to make real-time updates possible. I had the site deployed, and it ran all the test cases perfectly and also preemptively flagged the ones it was not sure about.
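The shape of that pipeline (route each file by type, map messy headers onto one schema, flag anything uncertain) can be sketched without any cloud services. This is only an illustration under loud assumptions: the canonical column names, the routing table and the 0.6 review threshold are hypothetical, and plain string similarity from `difflib` stands in for the Gemini step, which it does not replicate.

```python
import re
from difflib import SequenceMatcher
from pathlib import Path

# Hypothetical canonical schema and routing table; neither comes from the assignment.
CANONICAL = ["invoice_number", "taxable_amount", "tax_rate", "tax_amount"]
ROUTES = {
    ".pdf": "ocr", ".png": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
    ".xlsx": "spreadsheet", ".xls": "spreadsheet",
}
REVIEW_THRESHOLD = 0.6  # assumed cut-off below which a human should look


def route(filename: str) -> str:
    """Pick an extraction path from the file extension."""
    return ROUTES.get(Path(filename).suffix.lower(), "unsupported")


def normalize_column(raw: str) -> tuple[str, float, bool]:
    """Map a raw header to the closest canonical name.

    Returns (canonical_name, similarity, needs_review).
    """
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    scored = [(SequenceMatcher(None, cleaned, c).ratio(), c) for c in CANONICAL]
    score, best = max(scored)
    return best, score, score < REVIEW_THRESHOLD


for header in ["Invoice Number", " Tax-Rate ", "xyz"]:
    print(header, normalize_column(header))
```

The point of returning `needs_review` rather than silently picking the best guess is the same as in the post: uncertain mappings get surfaced instead of hidden.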
I had a few minor hiccups here and there where I forgot some Mantine UI stuff. Also, like always, I generated the entire frontend design and styling using ChatGPT and only did the Redux Toolkit part and the gluing of the code together myself. I did do all the logic. Overall it was no big deal; I didn't feel the assignment was particularly difficult, just a bit lengthy and boring. It took me some 4-5 hours: 300 lines of clean code, with a proper file structure, that anyone can read, understand and debug. For a beginner who doesn't know about these things it might have taken maybe 8-10 hours.
But now let's come to Claude Opus. With all the praise I had heard about it, even I wanted to know how fucked all us ITcels are and whether I should go back to my Kerala town and start fishing for a living again. So I uploaded all the test cases and the assignment and gave it a prompt to make it as simple as possible. Out came the code within seconds, but the kind of bugs it had made my head spin. Debugging it was a bigger problem than doing the whole thing myself. It generated this.
If you don't know what this is: it basically hallucinated an entire npm package that does not even exist. Holy shit. This is where it got it from:
GitHub - run-llama/llama_cloud_services: Knowledge Agents and Management in the Cloud
When I looked up where it got this idea from, the link pointed to the run-llama/llama_cloud_services GitHub repo. Claude basically hallucinated all those parts, which completely breaks the app. And right there, on that exact repository, they have slapped a massive DEPRECATION NOTICE. Claude simply ignored it because it was fed outdated data.
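A cheap guard against this kind of thing is to check every dependency against the npm registry before trusting generated code. The sketch below only covers the offline parts: building the lookup URL and reading a deprecation message out of registry metadata that has already been fetched. It assumes the public registry at `registry.npmjs.org` (which answers 404 for a package that does not exist) and that a deprecated release carries a `deprecated` string in its version entry, which is my understanding of the registry metadata format. The package name `@example/parser` and the sample metadata are hypothetical.

```python
import json
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org/"


def registry_url(package_name: str) -> str:
    """Build the metadata URL; the slash in a scoped name is percent-encoded."""
    return REGISTRY + quote(package_name, safe="@")


def declared_dependencies(package_json_text: str) -> list[str]:
    """List every package named in dependencies and devDependencies."""
    manifest = json.loads(package_json_text)
    names = set(manifest.get("dependencies", {})) | set(manifest.get("devDependencies", {}))
    return sorted(names)


def deprecation_message(packument: dict):
    """Return the deprecation message of the latest version, or None."""
    latest = packument.get("dist-tags", {}).get("latest")
    version_info = packument.get("versions", {}).get(latest, {})
    return version_info.get("deprecated")


# Hypothetical manifest and registry response, for illustration only.
manifest_text = '{"dependencies": {"@example/parser": "^1.0.0", "react": "^18.0.0"}}'
sample_packument = {
    "dist-tags": {"latest": "1.2.0"},
    "versions": {"1.2.0": {"deprecated": "moved to another package"}},
}

for name in declared_dependencies(manifest_text):
    print(name, registry_url(name))
print(deprecation_message(sample_packument))
```

Fetching each URL (for example with `urllib.request.urlopen`) and treating a 404 as "this package does not exist" is left out here so the sketch runs without network access.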
When I prompted it again, it immediately started apologizing, and after I prompted once more it hallucinated yet again. This time it gave me garbage code that changed the schema and invented fields that never existed, which crashed both the frontend and the backend.
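Schema drift like that is easy to catch mechanically before it reaches the frontend. A minimal sketch, assuming a flat record with a fixed set of typed fields; the field names and types here are hypothetical and not taken from the assignment:

```python
# Hypothetical expected schema: field name -> required Python type.
EXPECTED_FIELDS = {
    "invoice_number": str,
    "taxable_amount": float,
    "tax_rate": float,
    "tax_amount": float,
}


def schema_problems(record: dict) -> list[str]:
    """Describe every way a record deviates from the expected schema."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    # Fields nobody asked for are reported too, since invented fields were the issue.
    for field in sorted(set(record) - set(EXPECTED_FIELDS)):
        problems.append(f"unexpected field: {field}")
    return problems


good = {"invoice_number": "A-1", "taxable_amount": 1000.0, "tax_rate": 18.0, "tax_amount": 180.0}
drifted = {"invoice_number": "A-1", "taxable_amount": "1000", "tax_rate": 18.0, "gst_total": 180.0}

print(schema_problems(good))
print(schema_problems(drifted))
```

Running a check like this on every response, and failing loudly, turns a silent schema change into an error message instead of a crash two layers away.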
All of these are career-ending, million-dollar mistakes, by the way. If someone caught you making them, they would kick you out of the company that very second.
My friend was watching all this and started laughing about how AI is going to take our jobs. Almost 4 hours of continuous prompting in, there were still tons of bugs, errors and long garbage code solving completely unnecessary edge cases. It gave me something like 800 lines of Frankenstein zombie code with ternary operators, deprecated libraries, unused imports, looped statements and variables out of nowhere. It was such a mess that I took one look and said nope. Zero interest in debugging it.
So yeah, that about answers it for LLMs in general. Do I think Opus is a waste? Absolutely not. It is amazing, and I am telling you openly that it can do stuff that I, or even a senior engineer with 10+ years of experience, would struggle to do.
It is very powerful and very impressive, but it lacks comprehension and a proper understanding of system design. It doesn't understand what exactly it spits out.
In simple words, Opus is like that over-smart, hyperactive kid in class who doesn't actually know anything but still blurts out answers with the kind of confidence that makes even the teacher double-check whether they were wrong. When you actually look at his work, it's all BS and filled with stuff he invented, but it does show flashes of brilliance.
For real projects I'd pick GitHub Copilot. It makes way fewer dumb mistakes and is more reliable, but it is also way less creative and bold. It can't create an entire app like Opus, but at least it doesn't do shit like this. Copilot is the quiet kid who only speaks when 100% sure, and that's what you want. So yeah, LLMs even now need a lot of human hand-holding.