dawooddX
How advanced do you think AI is so far, and what's its rate of improvement like?

Title. It just got released last week and it's extremely good in coding tests, so even I wanted to see how fucked us ITcels are. I was actually testing this last weekend with an assignment that my friend's company gives its interns for one of their SDE-1 roles. This is the assignment. It is for a YC-backed startup with very good pay, and here are all the test cases. If you want to solve it, DM me and I'll send you the assignment.
View attachment 4673695
The main problem in this assignment is the non-standardized columns, plus the fact that they want you to derive and calculate the taxes individually and flag the mismatches, while also adding support for Excel, PDFs and images.
View attachment 4673759
View attachment 4673768
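To make the "derive the tax and flag the mismatches" part concrete, here is a minimal sketch of the check. The field names (`taxable_amount`, `tax_rate_percent`, `tax_amount`), the rounding rule and the tolerance are my assumptions for illustration; the real assignment may define these differently.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def check_invoice_tax(row, tolerance=Decimal("0.01")):
    """Recompute the tax from amount and rate, then compare it with the stated tax.

    `row` is a dict with hypothetical keys: taxable_amount, tax_rate_percent,
    tax_amount. Values go through str() so floats do not leak binary noise
    into the Decimal arithmetic.
    """
    taxable = Decimal(str(row["taxable_amount"]))
    rate = Decimal(str(row["tax_rate_percent"]))
    stated = Decimal(str(row["tax_amount"]))

    # Derived tax, rounded to two decimal places (assumed rounding rule).
    expected = (taxable * rate / Decimal("100")).quantize(CENT, rounding=ROUND_HALF_UP)

    return {
        "expected_tax": expected,
        "stated_tax": stated,
        # Flag the row when the stated tax is off by more than the tolerance.
        "mismatch": abs(expected - stated) > tolerance,
    }


ok = check_invoice_tax({"taxable_amount": 1000, "tax_rate_percent": 18, "tax_amount": 180})
bad = check_invoice_tax({"taxable_amount": 1000, "tax_rate_percent": 18, "tax_amount": 150})
```

Using `Decimal` instead of floats keeps the comparison exact, which matters when the whole point is to flag small discrepancies.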
I solved it last month. I used GCP Document AI for OCR, reading the PDFs and images with the built-in invoice parser and its metadata, and read the Excel files as-is with pandas to extract their content. Then I used Gemini to normalize the columns and fed the normalized columns to Redux to make real-time updates possible. I had the site deployed, and it ran all the test cases perfectly and also preemptively flagged the ones it was not sure about.
View attachment 4673764
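For anyone wondering what "normalize the columns and flag the unsure ones" looks like in code, here is a rough sketch. To be clear, this is not what I ran: I used Gemini for this step, and the sketch swaps in a plain alias table plus fuzzy matching from Python's standard `difflib`. The canonical names and aliases are made up for illustration.

```python
import difflib
import re

# Hypothetical canonical schema with a few known header variants.
CANONICAL = {
    "invoice_number": ["invoice no", "inv no", "invoice #", "bill number"],
    "taxable_amount": ["taxable value", "amount before tax", "subtotal"],
    "tax_amount": ["tax", "total tax"],
}


def _clean(name):
    """Lowercase a header and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Cleaned alias -> canonical column name.
LOOKUP = {}
for canon, aliases in CANONICAL.items():
    LOOKUP[_clean(canon)] = canon
    for alias in aliases:
        LOOKUP[_clean(alias)] = canon


def normalize_columns(headers, cutoff=0.8):
    """Map raw headers to canonical names; return (mapping, unsure_headers)."""
    mapping, unsure = {}, []
    for header in headers:
        key = _clean(header)
        if key in LOOKUP:  # exact alias hit
            mapping[header] = LOOKUP[key]
            continue
        # Fuzzy fallback for typos; anything below the cutoff is left unmapped.
        close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
        if close:
            mapping[header] = LOOKUP[close[0]]
        else:
            mapping[header] = None
            unsure.append(header)  # flag for a human to review
    return mapping, unsure


mapping, unsure = normalize_columns(["Invoice No.", "Txable Value", "Remarks"])
```

The important design choice is the `unsure` list: a header that cannot be matched confidently is surfaced for review instead of being silently guessed.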
I had a few minor hiccups here and there where I forgot some Mantine UI stuff. Also, like always, I generated the entire frontend design and styling using ChatGPT and only did the Redux Toolkit part and gluing the code together myself. I did do all the logic. Overall it was no big deal; I didn't feel the assignment was particularly difficult, just a bit lengthy and boring. It took me some 4-5 hours: 300 lines of clean code with a proper file structure that anyone can read, understand and debug. For a beginner who doesn't know about these things it might have taken maybe 8-10 hours.
View attachment 4673716
View attachment 4673721
But now let's come to this Claude Opus. With all the praise I had heard about it, even I wanted to know how fucked all us ITcels are and whether I should go back to my Kerala town and start fishing to make a living again. So I uploaded all the test cases and the assignment and gave it a prompt to make it as simple as possible. Out came the code within seconds, but the kind of bugs it had made my head spin. Debugging it was a bigger problem than doing it myself. It generated this.
View attachment 4673732
If you don't know what this is, it basically hallucinated an entire npm package that does not even exist. Holy shit. This is where it got it from:
GitHub - run-llama/llama_cloud_services: Knowledge Agents and Management in the Cloud (github.com)
It is just internal and public docs for LlamaIndex's backend APIs, showing the endpoints; it is not a maintained SDK at all.
View attachment 4673734
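A cheap guard against this kind of thing is to check every dependency an LLM suggests against the public npm registry before installing anything. Here is a minimal sketch, assuming the registry answers `https://registry.npmjs.org/<name>` with HTTP 404 for names it does not know. The function names are mine, and the package names in the example are placeholders, not the package from the screenshot.

```python
import urllib.error
import urllib.parse
import urllib.request

REGISTRY = "https://registry.npmjs.org/"


def registry_status(package_name):
    """Return the HTTP status the public npm registry gives for a package name."""
    # Scoped names like @scope/pkg are requested with the slash escaped.
    url = REGISTRY + urllib.parse.quote(package_name, safe="@")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def missing_packages(names, get_status=registry_status):
    """List the names the registry does not know about (HTTP 404).

    `get_status` is injectable so the check can be exercised without network access.
    """
    return [name for name in names if get_status(name) == 404]


# Offline example with a stubbed registry: only "react" is treated as existing.
def fake_status(name):
    return 200 if name == "react" else 404


suspicious = missing_packages(["react", "made-up-pkg"], get_status=fake_status)
```

Calling `missing_packages([...])` without the stub hits the real registry. This only proves that a name exists, not that the package is maintained or safe, so it is a first filter and nothing more.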
When I prompted it again, it immediately started apologizing.
View attachment 4673745
View attachment 4673746
After prompting once more, it hallucinated again and gave me some garbage code that changed the schema and randomly generated stuff that never existed, causing both the frontend and the backend to crash.
All these are career-ending, million-dollar mistakes, by the way. If someone caught you making these mistakes, they would kick you out of the company that very second.
View attachment 4673747
My friend was watching all this and started laughing about how AI is going to take our jobs. Almost 4 hours of continuous prompting in, there were still tons of bugs, errors and long garbage code solving completely unnecessary edge cases. It gave me some 800 lines of Frankenstein zombie code that I had zero interest in debugging.
View attachment 4673795
So yeah, that about answers it for LLMs in general. Do I think Opus is a waste? Absolutely not. It is amazing, and I am telling you openly that it can do stuff that I, or even senior engineers with 10+ years of experience, would struggle to do.
It is very powerful and very impressive, but it lacks comprehension and a proper understanding of system design; it doesn't understand what exactly it spits out.
In simple words, Opus is like that oversmart, hyperactive kid in class who doesn't actually know anything but still blurts out answers with the kind of confidence that makes even the teacher double-check whether they were wrong. But when you actually look at his work, it's all BS and full of stuff he invented.
For real projects I'd pick GitHub Copilot. It makes way fewer dumb mistakes and is more reliable, but it is also way less creative and bold. It can't create an entire app like Opus, but at least it doesn't do shit like this. Copilot is the quiet kid who only speaks when 100% sure, and that's what you want. So yeah, LLMs even now need a lot of human hand-holding.
They slapped a massive DEPRECATION NOTICE on that exact repository. Claude simply ignored it because it was fed outdated data.