I tested Claude Opus 4.6 on an assignment they give to interns. Detailed thread

Title. It was released just last week and it's supposed to be extremely good in coding tests, so I wanted to see how fucked us ITcels are. I tested it last weekend with an assignment that my friend's company gives interns for one of its SDE-1 roles. It's a YC-backed startup with very good pay. This is the assignment, and here are all the test cases. If you want to solve it, DM me and I'll send you the assignment.

View attachment 4673695

The main problem in this assignment is the non-standardized columns, plus the fact that they want you to derive and calculate the taxes individually and flag the mismatches, while also adding support for Excel, PDFs and images.
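
To make the "derive the taxes and flag the mismatches" requirement concrete, here is a minimal sketch in Python. The assignment's real schema is not shown in the thread, so the field names (`taxable_amount`, `rate_percent`, `reported_tax`), the one-cent tolerance and the sample rows are all assumptions made for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical field names and tolerance; the real schema is not shown in the thread.
TOLERANCE = Decimal("0.01")


def expected_tax(taxable_amount: Decimal, rate_percent: Decimal) -> Decimal:
    """Derive the tax for one row from its taxable amount and percentage rate."""
    tax = taxable_amount * rate_percent / Decimal("100")
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def flag_mismatches(rows: list[dict]) -> list[dict]:
    """Return a copy of each row with the derived tax and a mismatch flag added."""
    checked = []
    for row in rows:
        derived = expected_tax(Decimal(row["taxable_amount"]), Decimal(row["rate_percent"]))
        reported = Decimal(row["reported_tax"])
        checked.append({
            **row,
            "derived_tax": derived,
            # Flag the row when the reported tax differs from the derived one.
            "mismatch": abs(derived - reported) > TOLERANCE,
        })
    return checked


# Made-up sample rows: the first is consistent, the second is not.
rows = [
    {"taxable_amount": "1000.00", "rate_percent": "18", "reported_tax": "180.00"},
    {"taxable_amount": "250.00", "rate_percent": "5", "reported_tax": "15.00"},
]
for result in flag_mismatches(rows):
    print(result["derived_tax"], result["mismatch"])
```

Using `Decimal` instead of floats keeps the comparison free of binary rounding noise, which matters when the whole point is to decide whether two money amounts agree.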

View attachment 4673759
View attachment 4673768

I solved it last month using GCP Document AI for OCR: its built-in invoice parser read the PDFs and images along with their metadata, and I read the Excel files as-is with pandas to extract their content. Then I used Gemini to normalize the columns and fed the result into Redux to make real-time updates possible. I had the site deployed, and it ran all the test cases perfectly and also preemptively flagged the ones it was not sure about.
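
The column-normalization step can be sketched without any LLM at all. Note the swap: the post says Gemini did this job, whereas the sketch below uses a fixed alias table as a stand-in. The canonical names, the aliases and the sample headers are hypothetical, since the thread does not list the real columns. Headers that match nothing are collected for review rather than guessed, which mirrors the "flag the ones it was not sure about" behaviour described above.

```python
import re

# Hypothetical canonical names and aliases; the thread does not list the real columns,
# and the author used Gemini for this step rather than a fixed lookup table.
ALIASES = {
    "invoice_number": {"invoice no", "inv no", "invoice number", "bill no"},
    "taxable_amount": {"taxable amount", "taxable value", "amount before tax"},
    "tax_amount": {"tax", "tax amount", "gst amount"},
}


def clean(header: str) -> str:
    """Lowercase a header and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def normalize_headers(headers: list[str]) -> tuple[dict[str, str], list[str]]:
    """Map raw headers to canonical names and collect the ones that need review."""
    lookup = {alias: canonical for canonical, names in ALIASES.items() for alias in names}
    mapped, unsure = {}, []
    for header in headers:
        canonical = lookup.get(clean(header))
        if canonical is None:
            unsure.append(header)  # flag for a human instead of guessing
        else:
            mapped[header] = canonical
    return mapped, unsure


mapped, unsure = normalize_headers(["Invoice No.", "Taxable_Value", "GST Amount", "Remarks"])
print(mapped)
print(unsure)
```

A lookup table like this only covers spellings someone has already seen, which is presumably why the author reached for a model instead. The unsure list is the part worth keeping either way.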

View attachment 4673764

I had a few minor hiccups here and there where I had forgotten some Mantine UI stuff. Also, like always, I generated the entire frontend design and styling using ChatGPT and only did the Redux Toolkit part and the gluing together of the code myself. I did do all the logic. Overall it was no big deal; the assignment didn't feel particularly difficult, just a bit lengthy and boring. It took me some 4-5 hours and came to 300 lines of clean code with a proper file structure that anyone can read, understand and debug. For a beginner who doesn't know about these things, it might have taken maybe 8-10 hours.

View attachment 4673716
View attachment 4673721

But now let's come to Claude Opus. With all the praise I'd heard about it, I wanted to know how fucked all us ITcels are and whether I should go back to my Kerala town and start fishing for a living again. So I uploaded all the test cases and the assignment and prompted it to make the solution as simple as possible. Out came the code within seconds, but the kind of bugs it had made my head spin. Debugging it was a bigger problem than doing the whole thing myself. It generated this.

View attachment 4673732


If you don't know what this is, it basically hallucinated an entire npm package that does not even exist. Holy shit. This is where it got it from:

It is just internal and public docs for LlamaIndex's backend APIs that list endpoints. It is not a maintained SDK at all.

View attachment 4673734

When I prompted it again it immediately started apologizing
View attachment 4673745
View attachment 4673746

After I prompted it once more, it hallucinated again and gave me garbage code that changed the schema and randomly generated stuff that never existed, crashing both the frontend and the backend.

All of these are career-ending, million-dollar mistakes, by the way. If someone caught you making them, they would kick you out of the company that second.

View attachment 4673747

My friend was watching all this and started laughing about how AI is going to take our jobs. Almost 4 hours of continuous prompting in, there were still tons of bugs, errors and long garbage code solving completely unnecessary edge cases. It gave me something like 800 lines of Frankenstein zombie code that I had zero interest in debugging.

View attachment 4673795

So yeah, that about answers it for LLMs in general. Do I think Opus is a waste? Absolutely not. It is amazing. I am telling you openly that it can do stuff that I, or even a senior engineer with 10+ years of experience, would struggle to do.

It is very powerful and very impressive, but it lacks comprehension and a proper understanding of system design. It doesn't understand what exactly it spits out.

In simple words, Opus is like that oversmart, hyperactive kid in class who doesn't actually know anything but still blurts out answers with the kind of confidence that makes even the teacher double-check whether they were wrong. But when you actually look at his work, it's all BS and filled with stuff he invented.

For real projects I'd pick GitHub Copilot. Way fewer dumb mistakes and more reliable, but also way less creative and bold. It can't create an entire app like Opus, but at least it doesn't do shit like this. Copilot is the quiet kid who only speaks when 100% sure, and that's what you want. So yeah, even now LLMs need a lot of human hand-holding.
how advanced do u think ai is so far? and whats its rate of improving like
 
One more thing: a true SDE/SWE doesn't actually spend the entire day coding. It's only 10% of your responsibility as a software engineer. Most of your time will be spent just reading documents and talking to people.
i have a question can I pm you about it
 
how advanced do u think ai is so far? and whats its rate of improving like
Nobody can predict the future, bro, but one thing is for sure: this AI thing is here to stay, and people who do not adapt will get crushed eventually.
 
Nice to see someone's experimenting with Claude just like me, though you're doing it based on ITcelling, whereas here I'm tryna predict my exam papers lol

Since you're a high-IQ user, I wanted to ask you a few things related to AI. Can I PM you?
 
Ok
 
The issue with relying on AI to do complex work you cannot do yourself is that if it fails, you will not know until it's too late.
 
The problem is that it fails so spectacularly that debugging it takes more time than doing the work on your own.
 
I feel dumb, I don't understand these abbreviations you're using
 
YC means Y Combinator and SDE means software development engineer
Thanks, I ended up searching it on Google.

Is this work hard? I just read the thread. How competitive is this field?
 
For a lot of white-collar fields, offshore outsourcing is a greater threat than AI. The hallucination issue creates far too many problems, and in some cases it's extremely serious: it can lead to things like outright fraud.

AI is most beneficial for low-level review, e.g. contracts or legal compliance, which represents a minor, low-value part of the professional workload that no one likes anyway.
 
Is this work hard? Just read the thread how competitive is this field
Yes, but if you are dedicated, willing to put in the work and upskill yourself every day, you can score an SDE-1 role too. It pays very well, but it will be hard.
 
Is AI not taking these jobs?

Do you think its worth getting into this now
 
If you read this thread it should tell you enough if AI will take these jobs
 
I did, I'm more so talking about the future.

AI will only continue to evolve and improve over time, putting people out of jobs. Is that the wrong way of looking at it?
 
I think AI will take a lot of lower-level jobs, and there will be fewer jobs in the future as the bar gets raised. But software development and all the tech-adjacent jobs will still remain; their nature will just change, with a higher entry bar.
 
Is AI not taking these jobs?

Do you think its worth getting into this now
If you are motivated, yes. I know of so many programs for freshman/sophomore students that I wish I had known about and could have applied to in order to land a FANG internship. You can do good projects and also larp about them, as long as you do well in the interview. Once you get a good company on your resume you are set up, because the name/aura carries.
 
LLMs come in two flavors: wildly creative geniuses who hallucinate like crazy, or boringly reliable assistants who are limited in their abilities.
This is why codex is better than Claude code imo
 
When I looked up where it got this idea from, the GitHub link pointed to the runllama/llama_cloud_services GitHub repo. Claude basically hallucinated all those parts, which completely breaks the app. And right there, with a ⚠️, they had slapped a massive DEPRECATION NOTICE on that exact repository, which Claude simply ignored because it was fed outdated data.

View attachment 4674011

My friend was watching all this and started laughing about how AI is going to take our jobs. Almost 4 hours of continuous prompting in, there were still tons of bugs, errors and long garbage code solving completely unnecessary edge cases. It gave me something like 800 lines of Frankenstein zombie code with ternary operators, deprecated libraries, unused imports, looped statements and variables out of nowhere. It was such a mess that I took one look and said nope. Zero interest in debugging it.

In simple words, Opus is like that oversmart, hyperactive kid in class who doesn't actually know anything but still blurts out answers with the kind of confidence that makes even the teacher double-check whether they were wrong. But when you actually look at his work, it's all BS and filled with stuff he invented, though it does show flashes of brilliance.

Mirin

Tbh if anyone relies totally on ai to solve the assignment they're beyond fucked

Just a small tip to newbies

A good system architecture from GPT 1, 2 or 3 >>>>>> meaningless code from Claude, Gemini Pro or even kiwi 2.5
 
This is why codex is better than Claude code imo
In enterprise land, especially at bigger shops like ours, the company picks the tools, not you. I'm only allowed to use Copilot because it's already baked into VS Code/GitHub, has solid IP protection, satisfies compliance standards, etc. Claude often gets blocked at the firewall or stuck in endless review cycles since it's not as plug-and-play. Model-vs-model mog battles and comparisons are fun to talk about, but in deployment reality zero friction is what they want. Copilot just works everywhere. Many companies also host their own LLMs these days.
 
saaar

we are replace u saar

buy my ai saar
 
now try it with claude code
 
God damn it man I just want AI powered UBI NEETbux so I can rot in peace
 
now try it with claude code
I don't think you understand what Claude Code is. Claude Code is a CLI tool; Claude Opus is an LLM.
 
God damn it man I just want AI powered UBI NEETbux so I can rot in peace
I am okay with AI taking all jobs as long as it takes everyone's jobs bar none. I'll happily live through it. No one will feel bad. No LinkedIn showoffs, no "big package" flexes. We'll all be in the same boat with collective unemployment that's okay with me.
 
Even tho I didn't understand half of what u said I still read all of it

Good to know humans will still be needed and all our jobs haven't been eviscerated quite yet
 
Keep hustling!
 
Yeah, they don't really have long-term memory, and that also often creates issues, especially when we deal with 'project'-scale work.
 
AI still has a long way to go. It can do simple tasks, but for anything beyond that it will consistently give you wrong answers, and if you confront it about them, it will go like this: "Ah, my bad, sorry for my incompetence."
 
THIS SHIT SENT ME FLYING :forcedsmile:

@Swarthy Knight @Wuzzdio
Might be joever for us techcels cuhh :fuk:

You should have your backup plan of joining the Taliban and I should start contacting the Tamil Tigers to see if they have any openings :feelscry:
 
i got asked bout al qaeda for US visa :feelshah:
 